This is an expired proposal, written when applying for a position that interested me.
Social Media Has Been a Battlefield for Public Opinion
Public opinion can be manufactured, because the "truth" and the information
presented through various channels are not always the same. Since the emergence
of the Internet, social media has become the new battlefield for public opinion.
Influence that once started on online forums and portal websites now happens
mainly on platforms such as Twitter, Facebook, WeChat, and Telegram. The 2016 US
presidential election, for example, made people far more aware that social media
has become a global battlefield of public opinion. On these platforms, public
opinion is usually influenced by individual accounts and by customized news
feeds. Under the cover of freedom of thought and speech, new problems arise. On
the one hand, the credibility of individual accounts becomes a major issue. On
the other hand, news feeds powered by recommender engines can potentially steer
people toward certain thinking patterns, which may interfere with their
independent thinking.
Consequently, systematic research is needed to bridge the gap between social
media platforms and the credibility of the accounts and news feeds that
influence opinion. Such research would provide actionable guidelines and law
enforcement strategies for detecting “cyber-attacks” on social media. This
proposal suggests an interdisciplinary study that integrates psychology, social
intelligence, and artificial intelligence to tackle this problem. Specifically,
it adopts a data-driven approach: it collects data from popular social media
platforms and builds a machine learning model to monitor abnormal social media
attacks. The ultimate goal is to provide solid quantitative analysis that helps
policy makers judge and monitor social media, so that the public can stay
informed while forming their own independent opinions.
Keywords: Cybersecurity, Social Media, Public Opinion,
Artificial Intelligence, Big Data
Motivation
Social media is the new battlefield, and the fight there resembles a real war. A
cyber-attack is not only about hacking traditional information systems. It is
also about influencing public opinion. Social media has been influencing our
thoughts, our actions, and our security. It is reported that 22% of teenagers
log on to their favorite social media site more than 10 times a day, and more
than half of adolescents log on to a social media site more than once a day [1].
There is no doubt that social media has a profound impact on society today, in
both positive and negative ways.
Public opinion monitoring tools have been developed for years
Information retrieval has been a research topic closely related to data mining
for many years. Information retrieval (IR) is the activity of obtaining
information system resources that are relevant to an information need from a
collection of those resources [2].
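The sketch below makes the IR definition concrete. It ranks collected posts against a monitoring query using TF-IDF, which is one classic relevance measure. It is only an illustration. The tokenizer, the smoothed IDF formula, and the sample posts are all assumptions. Real monitoring tools would more likely sit on top of a search engine than use hand-written scoring.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_tfidf(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, most relevant first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                # Smoothed IDF (an assumption): rarer terms get larger weights.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
                score += tf[term] / len(toks) * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented sample posts for illustration.
posts = [
    "Vote early in the election, polls open at 7am",
    "Great recipe for apple pie",
    "Election fraud claims spread on social media",
]
for idx, score in rank_by_tfidf("election social media", posts):
    print(f"{score:.3f}  {posts[idx]}")
```

In this example, the third post matches all three query terms and ranks first. The recipe post scores zero.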
Data mining technologies have been applied to different domains, including
detecting potential terrorists. Big Data technology has also matured, which
makes processing massive amounts of data much easier and less costly. Building
on these technologies, many public opinion monitoring tools have been developed.
Particularly in countries like China, dozens of public opinion monitoring tools
have been developed and used [3]. Examples in China include Sina
(www.yqt365.com), Baidu (yuqing.baidu.com), People (yuqing.people.com.cn),
Tianya (yq.tianya.cn), and so on. In the USA, we can also find companies that
claim to offer public opinion monitoring tools. It would be interesting to
analyze in depth how different law enforcement regimes lead to different
circumstances and functions for public opinion monitoring. The Internet has no
national borders, but countries and their law enforcement do. How do different
forms of government obtain different results from the same open Internet, in
both positive and negative ways?
We need tools that can help people predict cyber-attacks through social media
Clearly, many governments are aware of the impact of the Internet and social
media, and new challenges are emerging. Certain countries have released laws and
policies related to this issue. For example, Singapore's fake news law, the
Protection from Online Falsehoods and Manipulation Act, went into effect in
2019 [4]. Regardless of whether governments should have this type of law, I
believe that a certain degree of monitoring is necessary. The question is how to
distinguish private information from public information. The European Union has
addressed this through a regulation in EU law. The GDPR [5] is a regulation in
EU law on data protection in the European Union and the European Economic Area.
It also restricts the transfer of personal data collected in the EU to places
outside the EU.
One famous example is the 2016 United States presidential election. Many people
believe that the Russian government interfered in the 2016 U.S. presidential
election with the goal of harming the campaign of Hillary Clinton, boosting the
candidacy of Donald Trump, and increasing political and social discord in the
United States [6]. This shows that a social media attack monitoring system for
crisis management is vitally important.
It is good for governments to make policies that protect both national security
and citizens' privacy. It is also vitally important to have tools for law
enforcement, and it is even more important to predict potential cyber-attacks
through social media. Collecting certain keywords from social media forums is
not enough; a sophisticated prediction model needs to be developed as well.
Here, certain AI technologies could be employed. These could include natural
language processing (NLP), opinion mining, sequence-to-sequence neural networks,
and so on.
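The sketch below shows the gap between keyword collection and opinion mining. It first filters posts by a watched keyword, then assigns each matching post a polarity using a lexicon. The word lists and the one-word negation rule are hypothetical placeholders. A real study would use a validated lexicon or a trained NLP model.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"great", "support", "win", "honest", "strong"}
NEGATIVE = {"fraud", "corrupt", "lie", "weak", "rigged"}
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> int:
    """Positive minus negative lexicon hits; a word right after a negator is flipped."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def opinion_by_keyword(posts: list[str], keyword: str) -> dict[str, int]:
    """Tally positive/negative/neutral stance among posts that mention the keyword."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        if keyword.lower() not in post.lower():
            continue  # keyword collection step: ignore posts that do not mention it
        score = polarity(post)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        counts[label] += 1
    return counts


posts = [
    "The election was rigged",
    "Election officials are honest",
    "The election is not rigged",
    "Lunch was great",
]
print(opinion_by_keyword(posts, "election"))  # {'positive': 2, 'negative': 1, 'neutral': 0}
```

A plain keyword count would treat "was rigged" and "is not rigged" as the same signal. Even this crude polarity step separates them.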
Content recommendation engines may bias opinion
Content recommendation engines are another subject we can investigate. A
recommender system, or recommendation system, is a subclass of information
filtering system that seeks to predict the "rating" or "preference" a user would
give to an item [7].
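User-based collaborative filtering is one common recommender technique, and it illustrates the "rating" prediction in this definition. It is not necessarily what YouTube or Amazon use. The sketch predicts a user's rating for an unseen item as the similarity-weighted average of other users' ratings. The users, items, and ratings are made up.

```python
import math
from typing import Optional

# Hypothetical user -> {item: rating} data on a 1-5 scale.
ratings = {
    "alice": {"video_a": 5, "video_b": 3, "video_c": 4},
    "bob": {"video_a": 4, "video_b": 2, "video_c": 5, "video_d": 4},
    "carol": {"video_a": 1, "video_b": 5, "video_d": 2},
}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user: str, item: str, data: dict[str, dict[str, float]]) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their_ratings in data.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(data[user], their_ratings)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else None


print(predict_rating("alice", "video_d", ratings))
```

Here Alice's predicted rating for `video_d` comes out at about 3.18. Because all ratings are positive, raw cosine similarity still gives Carol, who mostly disagrees with Alice, a sizeable positive weight. Mean-centering each user's ratings (Pearson correlation) is a common refinement.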
When we visit a website such as youtube.com, we can see that it is smart enough
to recommend certain content based on the visitor's profile and behavior.
Amazon is another obvious example. When we browse the Amazon website and
purchase books and other goods, we notice that similar goods are recommended to
us. This raises a question: could these recommendations guide users toward
extremism? In other words, do recommendation algorithms account for cases in
which they lead users to increasingly biased content? This is not just about
business and technology. It is also about human mental health and, beyond that,
about public opinion.
Research suggests that social media and the online world may affect the human
brain. Some researchers assert that social media is harmful to the human brain
and to relationships [8]. Whether the online world affects the human brain in a
good way or a bad way, one fact holds: the online world, including social media
platforms and recommendation systems, can affect the human brain. On social
media platforms, people interact with other people and with chatbots.
Recommendation systems affect people through their algorithms, combined with
each person's own profile and behavior.
Individuals come together to form our society. Beyond how social media and
recommendation systems affect the human brain, policy makers may be more
interested in how they affect human society and what we need to prevent.
What to do?
Since social media and other online mechanisms, such as content recommendation
systems, can clearly affect the human brain, we need to ask how they will affect
human society. Once we have the answer, we must think about what to do. I
believe government clearly needs to act. Using United States antitrust law [9]
as a reference, we may need a law that prevents giant data companies from
controlling public opinion. If tech giants refuse to share certain information,
will they carry out their duty to protect our society?
This topic has been discussed in the media for a long time. I believe government
needs more data and supporting evidence, because data can reveal many facts. So,
for online crisis management, we need tools to collect data and a model that
produces predictions as objectively and scientifically as possible. There is a
lot we need to do, and there is a lot we can do.
Goal
Conducting an interdisciplinary study
To explore this domain, the study must be interdisciplinary. It combines several
fields, including but not limited to computer science, social psychology, and
public policy. It requires collaboration among researchers, software engineers,
government staff, and others. Support from organizations like the Harvard
Kennedy School's Belfer Center is also essential. Technology today is developing
extremely rapidly. While we enjoy the benefits of highly developed technology,
keeping the reins in hand for the sake of public society is a challenge of this
age.
Researchers have been studying how the Internet influences children's brains.
Psychologists are working on the public mind. There are also outcries about
foreign governments influencing the United States presidential election. Beyond
emotional reactions, we can conduct a solid study of how social media influences
public opinion. We need data. We need scientific research results and proper
responses based on the study. The book “The Crowd” states that organized crowds
have always played an important part in the life of peoples, but this part has
never been of such moment as at present [10].
Today, in the age of the Internet, people's minds are connected through the
Internet regardless of nation, race, age, and so on! This crowd is even larger
and much more complex. Studies need to address this new challenge for human
society, and they need to have a positive impact on our society. One example is
ensuring that presidential and gubernatorial elections do not take place in a
misleading public opinion environment. We need quantitative analysis instead of
just emotional reactions!
Developing data collection tools
To achieve the study goal, it is necessary to quickly develop small convenience
tools that speed up the study. For instance, out of curiosity, I developed a
small tool to capture tweets from twitter.com. Using the captured data, I
discovered some interesting phenomena during the 2016 presidential election. One
of them is that a significantly large number of Donald Trump-supporting accounts
had Russian registered as their profile language [11]. A sample data plot
follows. Figure 1 clearly shows that the Trump campaign attracted more support
from Russian-speaking Twitter users. The Hillary campaign received much less
support from them. However, it seems that Spanish-speaking users gave the
Hillary campaign more support.
Figure 1. Different language-speaking communities support different candidates.
Russian-speaking users supported Donald Trump significantly more than Hillary
Clinton.
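A comparison like Figure 1 depends on normalizing by each candidate's number of supporters, because absolute counts differ. The sketch below assumes each collected tweet has already been reduced to a hypothetical (candidate, profile_language) pair. The sample records are invented and are not the data behind Figure 1.

```python
from collections import Counter, defaultdict


def language_share(records):
    """Map each candidate to {profile_language: share of that candidate's supporters}."""
    counts = defaultdict(Counter)
    for candidate, lang in records:
        counts[candidate][lang] += 1
    shares = {}
    for candidate, by_lang in counts.items():
        total = sum(by_lang.values())
        shares[candidate] = {lang: n / total for lang, n in by_lang.items()}
    return shares


def over_representation(shares, lang, a, b):
    """How many times more common `lang` is among a's supporters than among b's."""
    share_b = shares[b].get(lang, 0.0)
    return shares[a].get(lang, 0.0) / share_b if share_b else float("inf")


# Invented sample records, NOT the data behind Figure 1.
records = [
    ("trump", "en"), ("trump", "ru"), ("trump", "ru"), ("trump", "en"),
    ("hillary", "en"), ("hillary", "es"), ("hillary", "en"), ("hillary", "ru"),
]
shares = language_share(records)
print(shares)
print(over_representation(shares, "ru", "trump", "hillary"))
```

A ratio well above 1 for a language only says the language is over-represented among one side's accounts. It does not by itself say who operates those accounts.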
The data collected by the same small tool also clearly shows that the Trump
campaign attracted more automated applications and bots posting tweets. On the
Hillary campaign side, an auto-posting application called “Monkey Thank U”
stands out. On the Trump campaign side, Zapier.com caught our attention. So, who
are they? What did they post on Twitter? Why did they do this? All these
questions interest me, and other people will likely want answers too. We can use
cutting-edge AI technologies to answer these questions. We should be able to
reveal more insightful details after enhancing the tool and analyzing the data
more deeply.
Table 1. Devices/applications used to post tweets for Trump and Hillary, respectively [12]
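The sketch below counts posting applications per campaign, as in Table 1. It assumes tweets are stored as dicts with a `source` field holding an HTML anchor, which is how Twitter's v1.1 API reported the posting client. The set of "official" clients is a hypothetical choice. A third-party client is not proof of automation, since people also use scheduling tools by hand.

```python
import re
from collections import Counter

ANCHOR_TEXT = re.compile(r">([^<]*)</a>")

# Hypothetical choice of which clients count as first-party Twitter apps.
OFFICIAL_CLIENTS = {"Twitter for iPhone", "Twitter for Android", "Twitter Web Client"}


def source_name(source_html: str) -> str:
    """Extract the client name from an anchor such as '<a href="...">Zapier.com</a>'."""
    match = ANCHOR_TEXT.search(source_html)
    return match.group(1).strip() if match else source_html.strip()


def source_profile(tweets):
    """Return (Counter of posting clients, share of tweets sent via third-party clients)."""
    counts = Counter(source_name(t["source"]) for t in tweets)
    total = sum(counts.values())
    third_party = sum(n for name, n in counts.items() if name not in OFFICIAL_CLIENTS)
    return counts, (third_party / total if total else 0.0)


# Invented sample tweets for illustration.
tweets = [
    {"source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"source": '<a href="https://zapier.com/" rel="nofollow">Zapier.com</a>'},
    {"source": '<a href="https://zapier.com/" rel="nofollow">Zapier.com</a>'},
    {"source": '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'},
]
counts, third_party_share = source_profile(tweets)
print(counts.most_common(), third_party_share)
```

Running `source_profile` separately on each campaign's tweets gives the per-column counts of a table like Table 1. The third-party share gives a single number to compare across campaigns.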
Besides Twitter, it is technically feasible to gather more information from
other social media platforms like Facebook, WeChat, and so on. With help from
rapidly developing AI technologies, it is also possible to perform opinion
mining and deploy chatbots on those popular platforms. Once these tools are
developed, massive amounts of information can be collected and stored on a Big
Data platform for further analysis.
Selecting features and training models
Our goal is to perform quantitative analysis of social media's impact on public
opinion. To do this, we first need to define the target problem clearly and in
detail. After narrowing down and highlighting the key problem, we need to
evaluate all the information that could be fetched. Then the right algorithm
must be selected and the right features identified. Discovering the right
feature set is the key step in training a model that predicts social media
attacks precisely. More domain-specific knowledge will be very helpful for
identifying those features. The model will be used to forecast social media
attacks, such as large volumes of online posts created by foreign government
agents to influence presidential or gubernatorial elections.
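The sketch below makes the feature-and-model step concrete. It trains a logistic regression classifier by gradient descent on account-level features. Logistic regression is only a stand-in for whichever algorithm the study eventually selects. The three features (posting rate, share of third-party-client posts, retweet share) and the six labeled accounts are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Gradient of log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that an account belongs to the 'coordinated' class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical features per account, each scaled to [0, 1]:
# [posts_per_day / 100, share of posts from third-party clients, retweet share]
X = [
    [0.90, 0.95, 0.90], [0.80, 0.90, 0.85], [0.70, 0.80, 0.95],  # labeled coordinated
    [0.05, 0.10, 0.20], [0.10, 0.00, 0.30], [0.02, 0.05, 0.10],  # labeled organic
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print(round(predict_proba(w, b, [0.85, 0.90, 0.90]), 3))
```

The learned weights also show which features drive the prediction. This fits the proposal's emphasis on measurable, explainable features. A real study would evaluate the model on held-out accounts rather than on the training set.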
Social media cyber-attack alarm platform
Once tool development and model building succeed, it will be possible to create
a fully functional public opinion monitoring platform with minimal technical
debt. It should be run by a politically independent organization. It should only
monitor social media and should not jump directly into social media forums to
influence public opinion. It can, however, supply insightful and precise
information to policy makers as a reference.
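A volume spike detector is one simple alarm rule such a platform could start from. It flags an hour whose post count on a topic is far above the recent baseline. The sketch uses a rolling z-score. The 24-hour window and the threshold of 3 are assumptions that would need tuning. The function only reports, which matches the platform's monitor-only role.

```python
import statistics
from collections import deque


def volume_alarms(hourly_counts, window=24, threshold=3.0):
    """Yield (hour_index, count, z_score) for hours whose post count is more than
    `threshold` standard deviations above the mean of the preceding `window` hours."""
    history = deque(maxlen=window)
    for hour, count in enumerate(hourly_counts):
        if len(history) == window:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            if stdev > 0:  # a perfectly flat baseline gives no usable z-score
                z = (count - mean) / stdev
                if z > threshold:
                    yield hour, count, z
        history.append(count)


# Invented hourly post counts for one topic: a steady baseline, then a sudden burst.
counts = [10, 12, 11, 9, 10, 11, 12, 10] * 3 + [60]
for hour, count, z in volume_alarms(counts):
    print(f"ALARM hour={hour} count={count} z={z:.1f}")
```

Volume alone cannot distinguish a coordinated campaign from genuine breaking news. In practice an alarm like this would be combined with account-level signals such as the classifier features above.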
Publications and impact
With support from experimental results, data analysis, and policy study, a
number of publications can be expected. Both academic publications and social
media articles are good ways to reach the public. In fact, social media itself,
the target of investigation in this study, is an important venue for
publication. Studies show that younger generations use social media more than
older generations, and these younger generations will grow up to lead the world.
Challenges
To successfully achieve the goal, we face many challenges that could affect the
final result of the study. We list them here and seek ways to overcome them.
Identifying problems outside of metaphysics
In social studies, metaphysical topics can easily creep in. Scientists often
seem to be discussing topics which have traditionally fallen within the purview
of metaphysics, such as time, space, matter, causation, and composition [13].
However, our goal is to perform quantitative analysis of social media's impact.
To keep our focus sharp, we need to narrow our study objects so that they can be
quantitatively analyzed. We need to draw a clear line around which features the
model will consider. It is a challenge to identify and include as many
measurable features as possible.
Finding the balance point between healthy business activity and harmful influence
After repeated privacy breaches and disclosures that Russia used social media
platforms to distribute propaganda meant to influence the 2016 presidential
election, tech giants face a so-called techlash of greater congressional and
regulatory scrutiny. Tech giants, including Google, Amazon.com Inc., and Facebook
Inc., set company records for lobbying spending in 2018 as Washington's scrutiny
of Big Tech intensified [14].
Clearly, it is hard to find the balance point between healthy business activity
and harmful influence on public opinion. At the same time, identifying that
balance point is vitally important. Only after we can clearly define it will it
be possible to study and select features for a platform that predicts social
media cyber-attacks.
Limited technical support and privacy policy
Tech giants certainly cannot open all of their APIs to third-party vendors. Part
of the reason is that they have to protect users' privacy, even though they use
that user data as valuable company property to earn money. However, as long as
their services are open to the public, we will have techniques to achieve our
goal without breaching their End User Agreement (EUA).
On the other hand, the proposed monitoring system itself should not infringe on
people's privacy. In fact, as a tool for an independent agency, it should not
directly influence public opinion. It should only generate reports and alarms.
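One way to reduce the system's privacy footprint is to pseudonymize account identifiers at ingestion. The system would then keep only the fields the aggregate analysis needs. The sketch uses HMAC-SHA256 with a secret key. A plain, unkeyed hash of a public user ID is not enough, because anyone can reverse it by hashing candidate IDs. The field names and key handling are hypothetical. Under the GDPR [5], pseudonymized data still counts as personal data, so this step reduces risk without removing legal obligations.

```python
import hashlib
import hmac
import secrets

# Hypothetical key handling: in practice the key would be stored securely by the
# independent organization running the platform; here it is random per run.
PSEUDONYM_KEY = secrets.token_bytes(32)

# Hypothetical whitelist of fields the aggregate analysis needs.
KEPT_FIELDS = ("lang", "source", "created_at")


def pseudonymize(user_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a platform user ID with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, key: bytes = PSEUDONYM_KEY) -> dict:
    """Keep a pseudonymous user key plus whitelisted fields; drop everything else."""
    slim = {"user": pseudonymize(record["user_id"], key)}
    slim.update({f: record.get(f) for f in KEPT_FIELDS})
    return slim


# Invented raw record for illustration.
raw = {"user_id": "12345", "screen_name": "someone", "bio": "free text",
       "lang": "en", "source": "Twitter for iPhone", "created_at": "2016-09-16"}
print(minimize(raw))
```

The same key always maps an account to the same pseudonym, so per-account features can still be computed. Names and free-text profile fields never enter storage.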
Strategy
Define terms
Many terms are in wide use, but some of them may never have been seriously
analyzed and strictly defined. For example, terms such as privacy, security, and
social media attack need more rigorous academic definitions. Without clearly
defining these terms, it is difficult to define the problem and create the right
model for prediction. So, with support from experts in different domains, this
should be the first step.
Seeking support
Since this study must be interdisciplinary, it will be wise to seek support from
experts in other domains at an early stage. For instance, there is likely
research on how the human brain is affected by its surrounding environment,
including the online world. There is also likely research on how content
recommendation systems activate the brain's reward center. Existing models from
that work could serve as a reference.
Finding support from privacy policy experts is also important. Understanding
privacy policy will help build trust and avoid breaking the law. A social media
attack monitoring tool may itself probe into online users' private information,
so it is essential to understand privacy policy well and not violate it.
Develop tools
It is essential to have tools for collecting and analyzing data. Analysis
reports are the supporting evidence for studying policy and for building a
social media attack monitoring platform. Several relevant tools are already in
my code repository, and we can reuse them to develop a prototype quickly.
We can select several popular social media platforms as targets at the
beginning. For example, Figure 2 shows the most popular social media platforms
in the USA. We can also expand to other countries, such as China, where
different social media platforms may hold the dominant position. In China, for
example, WeChat and Weibo dominate. Tools such as chatbots can be developed to
fetch data and monitor public opinion there.
Figure 2. Instagram, Snapchat remain
especially popular among those ages 18 to 24 [15] .
Furthermore, it would be valuable to divide the project into smaller tasks and
run several campus projects, which would benefit both the students and this
project. Using cutting-edge technologies will reduce technical debt.
Treat it as a startup instead of pure academic research
This project can evolve into a sophisticated quantitative analysis platform for
measuring the political impact of social media. Beyond political issues, the
same technology can be applied to other social issues such as online bullying.
It would be best to treat this as a startup project instead of a pure academic
research project. It would be wonderful to work on a project that may affect our
real lives. The 2020 presidential election is coming soon, and there will also
be gubernatorial elections. It would be great if this project could be
successfully implemented and deployed before then.
[1] G. S. O'Keeffe and K. Clarke-Pearson, "The Impact of Social Media on Children, Adolescents, and Families," Pediatrics, vol. 127, no. 4, pp. 800-804, 2011.
[2] Wikipedia, "Information retrieval," [Online]. Available: https://en.wikipedia.org/wiki/Information_retrieval.
[3] zhihu.com, "list of public opinion monitoring tools," [Online]. Available: https://www.zhihu.com/question/28406057.
[4] Protection from Online Falsehoods and Manipulation Act, 2019.
[5] General Data Protection Regulation (EU), 2016.
[6] Wikipedia, "Russian interference in the 2016 United States elections," [Online]. Available: https://en.wikipedia.org/wiki/Russian_interference_in_the_2016_United_States_elections. [Accessed 14 January 2020].
[7] Wikipedia, "Recommender system," April 2019. [Online]. Available: https://en.wikipedia.org/wiki/Recommender_system. [Accessed 14 January 2020].
[8] B. Gordon, "Social Media Is Harmful to Your Brain and Relationships," Psychology Today, 20 October 2017. [Online]. Available: https://www.psychologytoday.com/us/blog/obesely-speaking/201710/social-media-is-harmful-your-brain-and-relationships. [Accessed 14 January 2019].
[9] Wikipedia, "United States antitrust law," [Online]. Available: https://en.wikipedia.org/wiki/United_States_antitrust_law. [Accessed 14 January 2019].
[10] G. Le Bon, The Crowd: A Study of the Popular Mind, Batoche Books, 1896.
[11] Y. Jia, "Trump vs Hillary week02," Boston Info Pro LLC, 22 September 2016. [Online]. Available: http://yiyujia.blogspot.com/2016/09/trump-vs-hillary-week02.html. [Accessed 14 January 2019].
[12] Y. Jia, "Trump vs Hillary on Twitter," 16 September 2016. [Online]. Available: http://yiyujia.blogspot.com/2016/09/trump-vs-hillary-on-twitter.html. [Accessed 14 January 2020].
[13] K. Hawley, "Social Science as a Guide to Social Metaphysics?," Journal for General Philosophy of Science, vol. 49, no. 2, pp. 187-198, 2018.
[14] Bloomberg, "Google, Facebook Set 2018 Lobbying Records as Tech Scrutiny Intensifies," 22 January 2019. [Online]. Available: https://www.bloomberg.com/news/articles/2019-01-22/google-set-2018-lobbying-record-as-washington-techlash-expands. [Accessed 14 January 2020].
[15] A. Perrin and M. Anderson, "Share of U.S. adults using social media, including Facebook, is mostly unchanged since 2018," Pew Research Center, 10 April 2019. [Online]. Available: https://www.pewresearch.org/fact-tank/2019/04/10/share-of-u-s-adults-using-social-media-including-facebook-is-mostly-unchanged-since-2018/. [Accessed 14 January 2020].