
Thursday, March 22, 2018

Access Jupyter notebooks on Azure DSVM

Data Science VM (DSVM) is a series of VM offerings from the Microsoft Azure cloud platform. It comes preinstalled with a comprehensive set of tools for data science projects and, by default, includes several IPython notebooks as sample scenarios to help users get started. How do you access these notebooks and run the experiments in them? This can be tricky for beginners. In this post, we answer that question specifically for the Linux Ubuntu DSVM and the Linux DLVM (the Deep Learning version, NC series).

When you log in to your VM, you will see two directories, "Desktop" and "notebooks". All the sample notebooks are in the "notebooks" directory. When we open Jupyter notebooks, we are actually accessing them through a Jupyter server. We can reach the Jupyter server by typing https://[vm_ip]:[port_no] in a browser tab.

In DSVMs, the default port 8000 is already configured, and the Jupyter server is launched automatically when the DSVM is provisioned. You can access the "notebooks" directory at https://[vm_ip]:8000 and run all the notebooks under this directory right away.
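
If the page does not load, it can help to confirm that the port is reachable from your machine at all. The snippet below is a minimal sketch using only the Python standard library. The helper names `jupyter_url` and `port_is_open` are hypothetical, and the IP address is a placeholder to replace with your own VM's public IP.

```python
import socket


def jupyter_url(vm_ip: str, port: int = 8000) -> str:
    """Build the address to type into the browser (8000 is the DSVM default)."""
    return f"https://{vm_ip}:{port}"


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; substitute your VM's public IP.
    print(jupyter_url("203.0.113.10"))
```

Calling `port_is_open(vm_ip, 8000)` from your own machine and getting `False` suggests that the server is not running or that the port is blocked somewhere between you and the VM.
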
Sometimes you need to start the Jupyter server manually for other reasons (e.g., your notebooks are stored in a different directory). In that case, perform the following steps.

  1. Navigate to your home directory and locate the Jupyter configuration file (see screenshot below). If the file does not exist, execute "jupyter notebook --generate-config" in the console.
  2. Add c.NotebookApp.notebook_dir = '/home/mylogin/' to the file so that the Jupyter server points to that directory when it starts (see screenshot below). You can use a different directory path as well.
  3. Execute "jupyter notebook password" in the console and set the password.
  4. Execute "jupyter notebook" in the console to start the Jupyter server. When the server has launched, the output will look similar to the screenshot below. Locate the port number, which is usually 9999 as shown below.
  5. Go to the overview page of your VM in the Azure portal and add an inbound rule for port 9999 (it can be another port if you have a different configuration).
  6. Now you can access the directory set above at https://[vm_ip]:9999
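
Because the Jupyter configuration file is itself a Python file, the settings from the steps above can be sketched in Python. The helper below is hypothetical and only produces the lines you might add to the file. Only `notebook_dir` comes from the steps above. The `port` and `ip` lines are assumptions: they pin the port instead of reading it from the startup output, and make the server listen on all interfaces, which a freshly generated configuration does not do by default.

```python
from typing import List


def jupyter_config_lines(notebook_dir: str, port: int = 9999) -> List[str]:
    """Return lines to append to the Jupyter configuration file (a sketch)."""
    return [
        # From the steps above: the directory the server starts in.
        f"c.NotebookApp.notebook_dir = {notebook_dir!r}",
        # Assumption: fix the port so the inbound rule in Azure stays valid.
        f"c.NotebookApp.port = {port}",
        # Assumption: listen on all interfaces so the VM's public IP works.
        "c.NotebookApp.ip = '0.0.0.0'",
    ]


if __name__ == "__main__":
    for line in jupyter_config_lines("/home/mylogin/"):
        print(line)
```

With the default settings, `jupyter notebook --generate-config` writes the file to `~/.jupyter/jupyter_notebook_config.py`. Paste the printed lines there and restart the server for them to take effect.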

Friday, October 21, 2016

Trend of support rates on Twitter for Hillary and Trump

In this post, I want to report the support rates for Hillary and Trump over the past several weeks (9/5/2016-10/23/2016), based on the number of tweets using a set of pre-selected hashtags, ten for each side. Details about the selected hashtags can be found in my previous post.

In the following figure, by looking at the percentage of supporting tweets on a daily basis, we can observe whether each side's support rate has changed significantly over time. There has been no significant change in either side's support rate since the first presidential debate of 2016, which took place on Sept. 26, 2016. However, there is a clear pattern: the number of active Twitter users always increases significantly after each debate.
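
One way to read the daily percentage is as each side's share of that day's tweets across both hashtag sets. The post does not spell out the exact formula, so the sketch below is an assumption rather than the method actually used, and the counts in the usage example are made-up toy numbers, not real data.

```python
from typing import Dict, Tuple


def daily_support_share(
    daily_counts: Dict[str, Tuple[int, int]]
) -> Dict[str, Tuple[float, float]]:
    """Convert per-day tweet counts (side_a, side_b) into percentages."""
    shares = {}
    for day, (side_a, side_b) in daily_counts.items():
        total = side_a + side_b
        if total == 0:
            # No tweets collected that day, so no percentage can be computed.
            continue
        shares[day] = (100.0 * side_a / total, 100.0 * side_b / total)
    return shares


if __name__ == "__main__":
    # Toy counts for illustration only.
    toy_counts = {"day 1": (600, 400), "day 2": (0, 0)}
    print(daily_support_share(toy_counts))
```

Days with no collected tweets are skipped rather than reported as zero, so gaps in data collection do not show up as swings in the trend.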

Before the second debate on Oct. 9, 2016, Hillary's support rate went up. After the debate, it came down again. Overall, it looks like her support rate may be trending upward?

Edit: Based on the figure below, it looks like Hillary's support rate is going down.

Edit: It looks like Hillary gained more support after the third debate, as the trend shows her support rate going up. Or, in other words, did Trump lose some supporters on Twitter? It will be interesting to study whether those "supporters" will keep posting tweets or will simply state their opinion and wait for voting day.

Edit: It looks like Hillary's support rate is going down.

Fig. 1 shows the percentage of supporters on Twitter. The percentage values may be biased because of the sampled population. However, the trend in each side's support rate may still reveal some truth.


Fig. 2 shows the number of active Twitter users each day. It reveals how people's passion for the election is affected by major events such as the debates.