Online News Popularity.

Business Goal

-> Increase the popularity of news articles by tailoring them to the formats that prove most popular.

Description of Dataset

-> Raw data: 39644 obs, 61 variables
-> Left outer join on 'channel' with 'frequency', the probability that people read news from a given channel online.
-> Some values of channel are missing; those observations were simply deleted.
-> Dataset after join: 33510 obs, 64 variables
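
The join and clean-up described above can be sketched in plain Python. This is only an illustration, not the SAS program used in the project: the sample rows, the 'frequency' values, and the function names are all hypothetical.

```python
# Hypothetical toy rows; the real raw data has 39644 observations.
articles = [
    {"url": "a1", "channel": "tech", "shares": 1200},
    {"url": "a2", "channel": None, "shares": 800},   # missing channel
    {"url": "a3", "channel": "world", "shares": 500},
]

# Hypothetical lookup table: channel -> 'frequency' (made-up values).
frequency = {"tech": 0.30, "world": 0.20}


def left_join_frequency(rows, freq_by_channel):
    """Left outer join: keep every article, attach frequency when available."""
    joined = []
    for row in rows:
        out = dict(row)  # copy so the input rows are not modified
        out["frequency"] = freq_by_channel.get(row["channel"])  # None if no match
        joined.append(out)
    return joined


def drop_missing_channel(rows):
    """Delete observations whose channel value is missing."""
    return [r for r in rows if r["channel"] is not None]


joined = left_join_frequency(articles, frequency)
cleaned = drop_missing_channel(joined)
```

A left outer join keeps every article even when no frequency matches, so the drop in row count comes only from the deletion step.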

Review of Dataset Analysis

Key Steps -> PROCs used and SAS program customization
-> Import data and data treatment -> Import, Data steps, if else statement
-> Summary of Variables -> Means, Freq
-> Draw summary plots of variables -> Sgplot, Univariate
-> Join tables -> Sql, merge statement in Data steps
-> Compute correlation coefficient -> Corr
-> Linear regression and ANOVA -> Reg, Glmselect
-> Binary logistic regression -> Univariate, Logistic
-> Rapid Predictive Modeler -> Delete
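
As an illustration of the correlation step listed above, the sketch below computes the Pearson correlation coefficient by hand. It is a minimal Python stand-in, not the project's SAS code; the function name and the idea of pairing a numeric predictor with the number of shares are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r(num_images, shares)` would measure the linear association between two hypothetical columns of the dataset.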

Dashboard

Characterize Data

Collect a heterogeneous set of data about articles published by the website over a period of two years. Develop a causal relationship between webpage data and the number of shares. Develop a plan to increase website popularity by adjusting website layout, wording of text, and time of release.

Linear Regression

Improve website popularity and public awareness; identify which day of the week results in the most shares; compare the popularity of different channels on Mashable; explore how other factors, such as the number of words in the title and the polarity of the text, influence shares on social media.

ANOVA 1

Analysis of variance on the categorical variables weekday and channel, followed by multiple comparisons, to gain insight into the number of shares.
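
To make the idea concrete, here is a minimal Python sketch of the one-way ANOVA F statistic, which compares the variation between group means with the variation within groups. It is not the project's SAS program; the function name and the toy groups are hypothetical, and it covers only a single factor without the multiple-comparison step.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of group -> values."""
    values = [v for vals in groups.values() for v in vals]
    n_total = len(values)
    k = len(groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for vals in groups.values():
        group_mean = sum(vals) / len(vals)
        ss_between += len(vals) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in vals)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical toy data: shares grouped by channel.
toy = {"social": [6, 7, 8], "tech": [2, 3, 4], "world": [1, 2, 3]}
f_stat = one_way_anova_f(toy)
```

A large F value suggests that at least one group mean differs from the others; the multiple comparisons then indicate which pairs differ.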

ANOVA 2

Find more variables to better explain the dependent variable, number of shares; companies may use the insights to choose when to prioritize certain categories of news.

Binary Logistic Regression

Identify factors that influence news website popularity and awareness, measured by the number of shares on social media.

Rapid Predictive Modeler

Identify factors that influence news website popularity and awareness, measured by the number of shares on social media.

Insights and Recommendations


Increase:
-> Number of keywords.
-> Number of embedded links.
-> Number of images.
-> References to highly popular articles.
-> Subjectivity and positivity of the title.

Time of publication:
-> Postpone non-time-sensitive articles (features etc.) to the weekend. Weekend articles receive more shares than weekday articles.
-> Focus more on social media articles during the weekdays.
      Monday:    Social > Lifestyle/Tech > Business > World/Entertainment
      Tuesday:   Social > Lifestyle/Tech > Business > Entertainment > World
      Wednesday: Social > Lifestyle/Tech > Business > Entertainment > World
      Thursday:  Social > Lifestyle/Tech > Business > Entertainment > World
      Friday:    Social > Lifestyle/Tech > Business > World/Entertainment
      Saturday:  Lifestyle/Business/Social/Tech > Entertainment > World
      Sunday:    Lifestyle/Business/Social/Tech > Entertainment > World
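
A per-day ordering like the one above could be approximated by ranking channels on their mean shares for each weekday. The Python sketch below does only that, under stated assumptions: the function name and the (day, channel, shares) row layout are hypothetical, and it ranks on plain means, so it does not reproduce the ties shown with "/" in the list.

```python
from collections import defaultdict


def rank_channels_by_day(rows):
    """Rank channels by mean shares within each day.

    rows: iterable of (day, channel, shares) tuples (hypothetical layout).
    Returns a dict: day -> list of channels, highest mean shares first.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (day, channel) -> [sum, count]
    for day, channel, shares in rows:
        entry = totals[(day, channel)]
        entry[0] += shares
        entry[1] += 1

    means_by_day = defaultdict(list)
    for (day, channel), (total, count) in totals.items():
        means_by_day[day].append((channel, total / count))

    return {
        day: [channel for channel, _ in
              sorted(pairs, key=lambda p: p[1], reverse=True)]
        for day, pairs in means_by_day.items()
    }


# Hypothetical toy rows, not taken from the dataset.
sample = [
    ("Monday", "social", 3000), ("Monday", "social", 1000),
    ("Monday", "world", 500), ("Monday", "tech", 1500),
    ("Sunday", "tech", 100),
]
ranking = rank_channels_by_day(sample)
```

Deciding whether two channels are genuinely different on a given day would still need a significance test, such as the multiple comparisons from the ANOVA step.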

Channel:
-> Editors may want to put more emphasis on articles of a specific channel.
-> Social media > technology > lifestyle > business > entertainment > world

Next Step

Continue to refine the model by including more independent variables.

Extend the time interval; the data currently cover only two years.

Further subdivide news according to topic and find which factors influence news on a particular topic.

Better understand the differences between online news with large text sentiment polarity and those with small text sentiment polarity.

Thank you!



