Saturday, February 1, 2020

Quantitative Analysis of Social Media Political Impact

This is an expired proposal that I wrote when applying for a position I was interested in.

Since the emergence of the Internet, social media has become the new battlefield for public opinion. Public opinion can be manufactured, because the "truth" and the information presented by various channels are not always the same. Starting from online forums and portal websites, opinion influence nowadays happens mainly on platforms such as Twitter, Facebook, WeChat, and Telegram. The 2016 US presidential election, for example, made people more aware that social media has become a global battlefield of public opinion. On these platforms, public opinion is usually influenced by individual accounts and by customized news feeds. Under the cover of freedom of thought and speech, new problems arise. On the one hand, the credibility of individual accounts becomes a major issue. On the other hand, news feeds powered by recommendation engines can steer people toward certain thinking patterns, which may interfere with their independent thinking.
Consequently, systematic research is needed to bridge the gap between social media platforms and the credibility of influencing accounts or news feeds, in order to provide actionable guidelines and law enforcement strategies for detecting "cyber-attacks" on social media. This proposal suggests an interdisciplinary study that integrates psychology, social intelligence, and artificial intelligence to tackle this problem. Specifically, it aims to adopt a data-driven approach: collecting data from popular social media platforms and building a machine learning model for monitoring abnormal social media attacks. The ultimate goal is to provide solid quantitative analysis to policy makers for judging and monitoring social media, so that the public can stay informed while forming their own independent opinions.
Keywords: Cybersecurity, Social Media, Public Opinion, Artificial Intelligence, Big Data

Motivation

Social media is the new battlefield, and the fight on it is like a real war. Cyber-attacks are not only about traditional hacking of information systems, but also about influencing public opinion. Social media has been influencing our thoughts, our actions, and our security. It has been reported that 22% of teenagers log on to their favorite social media site more than 10 times a day, and more than half of adolescents log on to a social media site more than once a day [1]. There is no doubt that social media has a profound impact on society today, in both positive and negative ways.

Public opinion monitoring tools have been developed for years

Information retrieval has been a research topic in data mining for many years. Information retrieval (IR) is the activity of obtaining information system resources that are relevant to an information need from a collection of those resources [2]. Data mining technologies have been applied to different domains, including detecting potential terrorists. Big Data technology has also matured, making the processing of massive amounts of data much easier and cheaper. Benefiting from these technologies, many public opinion monitoring tools have been developed. In China in particular, dozens of public opinion monitoring tools have been developed and used [3], for example those from Sina (www.yqt365.com), Baidu (yuqing.baidu.com), People (yuqing.people.com.cn), and Tianya (yq.tianya.cn). In the USA, we can also find companies that claim to have public opinion monitoring tools. It would be interesting to analyze in depth how different legal systems and law enforcement lead to different circumstances and functions of public opinion monitoring. The Internet has no national borders, but countries and their law enforcement do. How do different forms of government obtain different results from the open Internet, in either positive or negative ways?

We need tools that help predict cyber-attacks through social media

Clearly, the governments of many countries are aware of the impact of the Internet and social media, and new challenges are emerging. Certain countries have released laws and policies related to this issue. For example, Singapore's fake news law went into effect in 2019 [4]. Regardless of whether governments should have this type of law, I believe a certain degree of monitoring is necessary. The question is how to distinguish private information from public information. The European Union has made an effort to address this in EU law: the GDPR [5] is a regulation on data protection in the European Union and the European Economic Area. It also restricts the transfer of personal data collected in the EU to outside the EU.
One famous example is the 2016 United States presidential election. Many people believe that the Russian government interfered in the 2016 U.S. presidential election with the goal of harming the campaign of Hillary Clinton, boosting the candidacy of Donald Trump, and increasing political and social discord in the United States [6]. This shows that it is vitally important to have a social media attack monitoring system for crisis management.
It is good for governments to make policies that protect both national security and citizens' privacy. It is also vitally important to provide tools for law enforcement, and even more important to predict potential cyber-attacks through social media. Beyond simply collecting certain keywords from social media forums, a sophisticated prediction model needs to be developed as well. Here, certain AI technologies could be employed, including natural language processing (NLP), opinion mining, sequence-to-sequence neural networks, and so on.

Content recommendation engines may cause biased opinions

Content recommendation engines are another thing we can investigate. A recommender system, or recommendation system, is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item [7]. When we visit a website such as youtube.com, we can see that it is smart enough to recommend certain content based on the visitor's profile and behavior. Another obvious example is Amazon: when we browse the Amazon website and purchase books and other goods, similar goods are recommended to us. This raises a question: could such recommendations guide users toward extremism? In other words, do these recommendation algorithms take into account the situation where users are led to more and more biased content? This is not just about business and technology. It is also about people's mental health and, furthermore, about its effect on public opinion.
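The filtering idea in the definition above can be illustrated with a minimal sketch of item-based collaborative filtering, one common recommendation technique; it is not claimed to be what YouTube or Amazon actually run. The ratings data and item names are hypothetical. Because unseen items are scored only by their similarity to what the user has already rated, content with no connection to the user's history is never surfaced, which is the narrowing effect the question above is concerned with.

```python
import math

# Hypothetical user -> {item: rating} data; the item names are illustrative only.
ratings = {
    "alice": {"news_a": 5, "news_b": 4},
    "bob": {"news_a": 4, "news_b": 5, "news_c": 5},
    "carol": {"news_d": 5, "news_e": 4},
    "dave": {"news_c": 4, "news_d": 2},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity to the items the user has already rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in vectors.items():
        if candidate not in seen:
            scores[candidate] = sum(cosine(vec, vectors[i]) * r for i, r in seen.items())
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Items with zero similarity to the user's history are never surfaced.
    return [item for item in ranked if scores[item] > 0][:top_n]

print(recommend("alice", ratings))  # ['news_c']
```

Here Alice, who has only rated items from Bob's cluster, is offered news_c from that same cluster, while news_d and news_e never appear.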
Research suggests that social media, and the online world in general, may affect the human brain. Some researchers assert that social media is harmful to the brain and to relationships [8]. Whether the online world affects the brain in a good way or a bad way, one fact holds: the online world, including social media platforms and recommendation systems, can affect the human brain. On social media platforms, people interact with other people and with chatbots. Recommendation systems affect people through their algorithms combined with people's own profiles and behavior.
Individuals come together to form our society. Beyond how social media and recommendation systems affect the human brain, policy makers may be more interested in how they affect society and what we need to prevent.

What to do?

Since social media and other online mechanisms like content recommendation systems can clearly affect the human brain, we certainly need to ask how they will affect human society. Once we have the answer, we have to think about what we need to do. I believe there is no doubt that the government needs to do something. Using United States antitrust law [9] as a reference, we may need a law that prevents giant data companies from controlling public opinion. If tech giants refuse to share certain information, will they carry out their duty to protect our society?
This topic has been discussed in the media for a long time. I believe the government needs more data and supporting evidence, since many facts can be revealed by data. So, for online crisis management, we need tools to collect data and a model that produces predictions in as objective and scientific a way as possible. There are many things we need to do, and many things we can do.

Goal

Conducting an interdisciplinary study

Exploring this domain requires an interdisciplinary study that combines several disciplines, including but not limited to computer science, social psychology, and public policy. It requires collaboration among researchers, software engineers, government staff, and others. Support from organizations like the Harvard Kennedy School's Belfer Center is essential too. Today, technology is developing extremely rapidly. While we enjoy the benefits of highly developed technology, keeping the reins in the hands of the public is a challenge of this age.
Some researchers have been studying how the Internet influences children's brains. There are psychologists working on the public mind. There is also an outcry about foreign governments influencing the United States presidential election. Beyond emotional reactions, we can conduct a solid study of how social media influences public opinion. We need data. We need scientific research results and proper responses based on the study. In the book "The Crowd", Le Bon writes that organized crowds have always played an important part in the life of peoples, but this part has never been of such moment as at present [10]. Today, in the age of the Internet, people's minds are connected through the Internet regardless of nation, race, age, and so on! This crowd is even larger and much more complex, and studies need to be conducted on this new challenge for human society. This study should make a positive impact on our society, for example by helping to avoid a misleading public opinion environment in presidential and governor elections. We need quantitative analysis instead of just emotional reactions!

Developing data collecting tools

To achieve the study's goals, it is necessary to quickly develop some small, convenient tools that speed up the study. For instance, out of curiosity, I developed a small tool to collect tweets from twitter.com. Using the collected data, I discovered some interesting phenomena during the 2016 presidential election. One of them is that a significantly large number of accounts supporting Donald Trump had Russian registered as their profile language [11]. A sample data plot follows. In Figure 1 below, we can clearly see that the Trump campaign attracted more support from Russian-speaking Twitter users, while the Hillary campaign received much less support from them. However, it seems that Spanish-speaking users gave the Hillary campaign more support.
Figure 1. Different language-speaking communities support different candidates: Russian-speaking Twitter users supported Donald Trump significantly more than Hillary Clinton.
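The language comparison shown in Figure 1 can be approximated with a simple aggregation. This is a minimal sketch, assuming each collected tweet has already been reduced to a hypothetical (account id, candidate, profile language) record; the records below are invented for illustration and are not the data behind Figure 1. It counts each account once per candidate and reports shares rather than raw counts, because the two campaigns attracted very different tweet volumes.

```python
from collections import Counter, defaultdict

# Hypothetical records: (account id, candidate the tweet supports, profile language code).
records = [
    ("u1", "trump", "en"), ("u2", "trump", "ru"), ("u2", "trump", "ru"),
    ("u3", "trump", "ru"), ("u4", "trump", "en"),
    ("u5", "hillary", "en"), ("u6", "hillary", "es"),
    ("u7", "hillary", "en"), ("u8", "hillary", "es"),
]

def language_share(records):
    """Return {candidate: {language: share of that candidate's distinct supporting accounts}}."""
    seen = set()
    counts = defaultdict(Counter)
    for account, candidate, lang in records:
        if (account, candidate) in seen:
            continue  # count each account once per candidate, not once per tweet
        seen.add((account, candidate))
        counts[candidate][lang] += 1
    return {
        candidate: {lang: n / sum(counter.values()) for lang, n in counter.items()}
        for candidate, counter in counts.items()
    }

shares = language_share(records)
print(shares["trump"].get("ru", 0.0), shares["hillary"].get("ru", 0.0))  # 0.5 0.0
```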

Based on data collected by the same tool, we can also see that the Trump campaign attracted more tweets from automated applications and bots. On the Hillary campaign side, an automated posting application called "Monkey Thank U" stands out. On the Trump campaign side, Zapier.com caught our attention. So, who are they? What did they post on Twitter? Why did they do this? All these questions are interesting to me, and others will surely be curious to find the answers too. We can use cutting-edge AI technologies to answer these questions, and we should be able to reveal more insightful details after enhancing the tool and doing deeper analysis of the data.

Table 1. Device/application used to post tweets for Trump and Hillary, respectively [12]

Hillary
Rank  Application                 Tweets
1     Twitter Web Client          181518
2     Twitter for iPhone          161013
3     Twitter for Android         151083
4     Twitter for iPad            62717
5     Monkey Thank u              10782
6     Mobile Web (M5)             10512
7     TweetDeck                   9591
8     RoundTeam                   5058
9     Annie Green 1.0             4708
10    diana aquaviva 1.0          4161
11    http://ussanews.com/news1/  4040
12    Hootsuite                   2839
13    Twitter for Windows Phone   2717
14    http://nyc.epeak.in/        2679
15    prohiggins1.0               2678
16    Twitter for Windows         2476
17    DropOuthillary1             2252
18    DeviationStand 1.0          2239
19    Daniel Addison 1.0          2232
20    zerosum 1.0                 2206

Trump
Rank  Application                 Tweets
1     Twitter Web Client          995545
2     Twitter for iPhone          922009
3     Twitter for Android         830775
4     Twitter for iPad            282343
5     Mobile Web (M5)             72408
6     Zapier.com                  61257
7     IFTTT                       32233
8     Tweet Jukebox               26551
9     Twitter for Windows         16821
10    TweetDeck                   13780
11    RoundTeam                   11398
12    PinkPoniesGreat             10120
13    Mobile Web (M2)             9350
14    Hootsuite                   8373
15    StopMadness                 7841
16    Twitter for Windows Phone   7800
17    thebestappsever             7644
18    oneoftheapps                7437
19    rollingtwitter              6585
20    WhytePantherTest1           6337
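The application counts in Table 1 come from the client name attached to each tweet. A minimal sketch of that tally follows, assuming the raw `source` value is stored either as an HTML anchor (the form Twitter's v1.1 tweet objects used) or as plain text. The set of "official" clients is an assumption drawn from the names in Table 1, and the sample records and example.com URLs are hypothetical.

```python
import re
from collections import Counter

# Assumed list of Twitter's own clients, taken from names in Table 1 (TweetDeck is Twitter-owned).
OFFICIAL_CLIENTS = {
    "Twitter Web Client", "Twitter for iPhone", "Twitter for Android",
    "Twitter for iPad", "Twitter for Windows", "Twitter for Windows Phone",
    "Mobile Web (M2)", "Mobile Web (M5)", "TweetDeck",
}

ANCHOR = re.compile(r">([^<]*)</a>")

def client_name(source):
    """Extract the visible client name from an HTML anchor; fall back to the raw text."""
    match = ANCHOR.search(source)
    return match.group(1) if match else source

def third_party_counts(sources):
    """Count tweets per posting client, keeping only clients outside OFFICIAL_CLIENTS."""
    names = Counter(client_name(s) for s in sources)
    return Counter({name: n for name, n in names.items() if name not in OFFICIAL_CLIENTS})

# Hypothetical sample of source fields from collected tweets.
sample = [
    '<a href="https://example.com/iphone" rel="nofollow">Twitter for iPhone</a>',
    '<a href="https://example.com/zapier" rel="nofollow">Zapier.com</a>',
    '<a href="https://example.com/zapier" rel="nofollow">Zapier.com</a>',
    '<a href="https://example.com/ifttt" rel="nofollow">IFTTT</a>',
    "Twitter Web Client",
]
print(third_party_counts(sample).most_common())  # [('Zapier.com', 2), ('IFTTT', 1)]
```

A third-party client is not proof of automation: scheduling tools such as Hootsuite are also used by people, so the output is a list of candidates for closer inspection rather than a bot verdict.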

Besides Twitter, it is technically feasible to gather more information from other social media platforms like Facebook, WeChat, and so on. With help from rapidly developing AI technologies, it is also possible to deploy opinion mining and chatbots on those popular platforms. Once these tools are developed, massive amounts of information can be collected and stored on a Big Data platform for further analysis.

Selecting features and training models.

Our goal is to perform quantitative analysis of social media's impact on public opinion. To do this, we first need to clearly define the target problem in detail. After narrowing down and highlighting the key problem, we need to evaluate all the information that could be fetched. Then the right algorithm should be selected and the right features identified. To build a model that precisely predicts social media attacks, discovering the right feature set is key to training the model, and domain-specific knowledge will be very helpful for identifying those features. This model will be used to forecast social media attacks, such as massive numbers of posts created by foreign government agents to influence presidential or governor elections.
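The feature-then-model workflow above can be sketched with a deliberately simple model. This is an illustration under stated assumptions, not the algorithm this proposal has selected: logistic regression trained by plain batch gradient descent stands in for "the right algorithm", and the three features (posting rate, share of posts from third-party clients, whether the account is under 30 days old) and the labelled accounts are all hypothetical.

```python
import math

def features(account):
    """Turn a hypothetical account summary into a numeric feature vector."""
    return [
        account["posts_per_day"] / 100.0,                  # scaled posting rate
        account["third_party_share"],                      # fraction of posts from non-official clients
        1.0 if account["account_age_days"] < 30 else 0.0,  # very new account
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Estimated probability that the account belongs to a coordinated campaign."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logistic(samples, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in samples:
            err = predict(w, b, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b

def acct(rate, share, age):
    """Build a hypothetical account summary record."""
    return {"posts_per_day": rate, "third_party_share": share, "account_age_days": age}

# Hypothetical labelled accounts: 1 = coordinated campaign, 0 = ordinary user.
training = [
    (features(acct(5, 0.0, 900)), 0), (features(acct(10, 0.1, 400)), 0),
    (features(acct(2, 0.0, 10)), 0), (features(acct(20, 0.05, 2000)), 0),
    (features(acct(150, 0.9, 5)), 1), (features(acct(80, 1.0, 12)), 1),
    (features(acct(120, 0.7, 300)), 1), (features(acct(200, 0.95, 3)), 1),
]

w, b = train_logistic(training)
print(round(predict(w, b, features(acct(115, 0.95, 7))), 2))
```

Logistic regression is used here only because its weights are easy to inspect; a real model would need labelled accounts with trustworthy ground truth and features validated with domain experts.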

Social media cyber-attack alarm platform

Once tool development and model building succeed, it will be possible to create a fully functional public opinion monitoring platform with minimal technical debt. It should be run by a nonpartisan, independent organization. It should only monitor social media and avoid jumping directly into social media forums to influence public opinion, but it can certainly supply insightful and precise information to policy makers as a reference.
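One simple way such a platform could raise an alarm is to flag sudden bursts in posting volume on a topic. The sketch below swaps in a basic trailing-window z-score detector, which is not the prediction model proposed above; the 24-hour window, the threshold of 3 standard deviations, and the hourly counts are hypothetical choices.

```python
from statistics import mean, stdev

def volume_alarms(counts, window=24, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean by more than
    `threshold` sample standard deviations (a simple z-score spike detector)."""
    alarms = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat history, where the z-score is undefined.
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            alarms.append(i)
    return alarms

# Hypothetical hourly post counts for one topic: a steady pattern, then a burst at hour 30.
hourly = [100, 105, 95, 110, 90] * 6 + [400, 100, 105]
print(volume_alarms(hourly))  # [30]
```

Consistent with the platform's monitoring-only role, the detector only reports time indices for human review; it does not act on the social media platform itself.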

Having publications and impact

Supported by experimental results, data analysis, and policy study, a certain number of publications can be expected. Both academic publications and social media articles can be good ways to make an impact on the public. In fact, social media itself, the target of this study, is an important venue for publication. Studies show that the younger generation uses social media more than the older generation, and they will grow up to lead the world in the future.

Challenges

Many challenges may affect whether the study achieves its goal. We can list them here and seek ways to overcome them.

Identifying problems outside of metaphysics.

In the social sciences, metaphysical topics can easily creep in. Scientists often seem to be discussing topics which have traditionally fallen within the purview of metaphysics, such as time, space, matter, causation, and composition [13].
However, our goal is quantitative analysis of social media's impact. To keep the focus sharp, we need to narrow our objects of study so that they can be analyzed quantitatively, and draw a clear line around which features will be considered in the model. It is a challenge to identify and include as many measurable features as possible.

Finding the balance point between healthy business activity and harmful influence.

After repeated privacy breaches and disclosures that Russia used social media platforms to distribute propaganda meant to influence the 2016 presidential election, tech giants face a so-called techlash of greater congressional and regulatory scrutiny. Tech giants, including Google, Amazon.com Inc., and Facebook Inc., set company records for lobbying spending in 2018 as Washington's scrutiny of Big Tech intensified [14].
Clearly, it is hard to find the balance point between healthy business activity and harmful influence on public opinion, yet it is vitally important to identify it. Only after this balance point is clearly defined will it be possible to study and select features for modeling a platform that predicts social media cyber-attacks.

Limited technical support and privacy policy

Tech giants certainly cannot open all of their APIs to third-party vendors. Part of the reason is that they have to protect users' privacy, even though they use that user data as valuable company property to earn money. However, as long as their services are open to the public, we should have techniques to achieve our goal without breaking their end-user agreements.
On the other hand, the proposed monitoring system itself should not infringe on people's privacy. As a tool for an independent agency, it should not directly influence public opinion either; it should only generate reports and alarms.

Strategy

Define terms

Many terms are in common use, but some of them may not have been seriously analyzed and strictly defined. For example, terms about privacy, security, and social media attacks need to be defined in a more academically rigorous way. Without clearly defining these terms, it is difficult to define the problem and create the right prediction model. So defining terms, with support from experts in different domains, should be the first step.

Seeking support

Since this must be an interdisciplinary study, it would be wise to seek support from experts in other domains at an early stage. For instance, there is likely research on how the human brain is affected by its surrounding environment, including the online world, and on how content recommendation systems activate the brain's reward center. There may be existing models we can use as references.
Finding support from privacy policy experts is also important. Understanding privacy policy will help build trust and avoid breaking the law. A social media attack monitoring tool may itself probe into online users' private information, so it is essential to understand privacy policy well and not violate it.

Develop tools.

It is essential to have tools for collecting and analyzing data. Analysis reports provide supporting evidence for policy study and for building a social media attack monitoring platform. Several relevant tools are already in my code repository, and we can reuse them to quickly develop a prototype.
We can select several popular social media platforms as targets at the beginning. For example, Figure 2 shows the most popular social media platforms in the USA. We can also expand to other countries such as China, where different social media platforms may hold the dominant positions; WeChat and Weibo, for example, are dominant in China. Tools like chatbots can be developed to fetch data and monitor public opinion there.

Figure 2. Instagram, Snapchat remain especially popular among those ages 18 to 24 [15].
Furthermore, it would be great to divide the project into smaller tasks and run them as several campus projects, which would benefit both the students and this project. Using cutting-edge technologies will reduce technical debt.

Treat it as a startup instead of pure academic research.

This project can evolve into a sophisticated platform for quantitative analysis of social media's political impact. Beyond political issues, the same technology can be applied to other social issues like online bullying. It would be great to treat this as a startup project instead of a purely academic research project, and wonderful to work on a project that may have an impact on real life. The 2020 presidential election is coming soon, and there will also be governor elections. It would be great if this project could be successfully implemented and deployed.
[1] G. S. O'Keeffe and K. Clarke-Pearson, "The Impact of Social Media on Children, Adolescents, and Families," American Academy of Pediatrics, vol. 1, no. 1, pp. 1-20, 2015.
[2] Wikipedia, "Information retrieval," [Online]. Available: https://en.wikipedia.org/wiki/Information_retrieval.
[3] zhihu.com, "List of public opinion monitoring tools," [Online]. Available: https://www.zhihu.com/question/28406057.
[4] Protection from Online Falsehoods and Manipulation Act (Singapore), 2019.
[5] General Data Protection Regulation (EU), 2016.
[6] Wikipedia, "Russian interference in the 2016 United States elections," [Online]. Available: https://en.wikipedia.org/wiki/Russian_interference_in_the_2016_United_States_elections. [Accessed 14 January 2020].
[7] Wikipedia, "Recommender system," April 2019. [Online]. Available: https://en.wikipedia.org/wiki/Recommender_system. [Accessed 14 January 2020].
[8] B. Gordon, "Social Media Is Harmful to Your Brain and Relationships," Psychology Today, 20 October 2017. [Online]. Available: https://www.psychologytoday.com/us/blog/obesely-speaking/201710/social-media-is-harmful-your-brain-and-relationships. [Accessed 14 January 2019].
[9] Wikipedia, "United States antitrust law," [Online]. Available: https://en.wikipedia.org/wiki/United_States_antitrust_law. [Accessed 14 January 2019].
[10] G. Le Bon, The Crowd: A Study of the Popular Mind, Batoche Books, 1896.
[11] Y. Jia, "Trump vs Hillary week02," Boston Info Pro LLC, 22 September 2016. [Online]. Available: http://yiyujia.blogspot.com/2016/09/trump-vs-hillary-week02.html. [Accessed 14 January 2019].
[12] Y. Jia, "Trump vs Hillary on Twitter," 16 September 2016. [Online]. Available: http://yiyujia.blogspot.com/2016/09/trump-vs-hillary-on-twitter.html. [Accessed 14 January 2020].
[13] K. Hawley, "Social Science as a Guide to Social Metaphysics?," Journal for General Philosophy of Science, vol. 49, no. 2, pp. 187-198, 2018.
[14] Bloomberg, "Google, Facebook Set 2018 Lobbying Records as Tech Scrutiny Intensifies," 22 January 2019. [Online]. Available: https://www.bloomberg.com/news/articles/2019-01-22/google-set-2018-lobbying-record-as-washington-techlash-expands. [Accessed 14 January 2020].
[15] A. Perrin and M. Anderson, "Share of U.S. adults using social media, including Facebook, is mostly unchanged since 2018," Pew Research Center, 10 April 2019. [Online]. Available: https://www.pewresearch.org/fact-tank/2019/04/10/share-of-u-s-adults-using-social-media-including-facebook-is-mostly-unchanged-since-2018/. [Accessed 14 January 2020].