The Philippines is the social media capital of the world. According to this Huffington Post article[^1], “from a global average of 4.4 hours/day, the Filipino spends an average of 6.3 hours/day online via laptop and 3.3 hours/day via mobile.” It is no surprise, then, that social media has become one of the main news sources for many Filipinos. With the tumultuous 2016 Presidential Election, many issues have cropped up: the newly elected President attacking the media for “biased” news, the introduction of legislation on the spread of “fake news” on social media, and a campaign engineering a social media machine designed to weaponize hatred.
Apart from what we can gather from investigative journalism, how much do we really know about the Philippine news landscape on social media? This series (and yes, I intend to finish this one) sets out to explore that question using data.
We use the Facebook Graph API to extract all relevant 2016 data from key news pages, selected by their number of likes. We extract posts, comments, post reactions, comment replies, and comment reactions, and store them in a structured format in a SQLite database.
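To make the storage step concrete, here is a minimal sketch of one possible layout. It is not the schema this project actually uses: the table and column names are hypothetical, and it assumes each Graph API edge comes back as a JSON page with a `data` list and an optional `paging.next` URL, and that reaction items carry `id` and `type` fields. The HTTP call itself is left to an injected `fetch` function (in practice, a GET with your access token), so the paging and storage logic can be exercised offline.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, Optional

# Hypothetical schema: posts, comments (replies share the table via parent_id),
# and reactions keyed on whichever post or comment they target.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id TEXT PRIMARY KEY,
    page_id TEXT NOT NULL,
    created_time TEXT,
    message TEXT
);
CREATE TABLE IF NOT EXISTS comments (
    id TEXT PRIMARY KEY,
    post_id TEXT NOT NULL REFERENCES posts(id),
    parent_id TEXT,           -- NULL for top-level comments, set for replies
    created_time TEXT,
    message TEXT
);
CREATE TABLE IF NOT EXISTS reactions (
    target_id TEXT NOT NULL,  -- id of the post or comment reacted to
    user_id TEXT NOT NULL,
    type TEXT NOT NULL,       -- assumed values such as LIKE, HAHA, ANGRY
    PRIMARY KEY (target_id, user_id)
);
"""


def iter_edges(first_page: Dict, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every item of a paginated edge, following paging.next until it runs out."""
    page = first_page
    while True:
        yield from page.get("data", [])
        next_url = page.get("paging", {}).get("next")
        if not next_url:
            break
        page = fetch(next_url)


def store_post(conn: sqlite3.Connection, page_id: str, post: Dict) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO posts (id, page_id, created_time, message) VALUES (?, ?, ?, ?)",
        (post["id"], page_id, post.get("created_time"), post.get("message")),
    )


def store_comments(conn: sqlite3.Connection, post_id: str,
                   comments: Iterable[Dict], parent_id: Optional[str] = None) -> None:
    # Top-level comments use parent_id=None; replies pass the parent comment's id.
    for c in comments:
        conn.execute(
            "INSERT OR REPLACE INTO comments (id, post_id, parent_id, created_time, message) "
            "VALUES (?, ?, ?, ?, ?)",
            (c["id"], post_id, parent_id, c.get("created_time"), c.get("message")),
        )


def store_reactions(conn: sqlite3.Connection, target_id: str, reactions: Iterable[Dict]) -> None:
    conn.executemany(
        "INSERT OR REPLACE INTO reactions (target_id, user_id, type) VALUES (?, ?, ?)",
        [(target_id, r["id"], r["type"]) for r in reactions],
    )
```

Keeping replies in the same table as comments, distinguished only by `parent_id`, means a single query can pull a whole thread; whether that suits the later analysis is a design choice, not something the project prescribes.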
We apply Latent Dirichlet Allocation (LDA) to the unstructured text of news articles in order to uncover latent “topics” in the corpus. We then explore the overall topic distribution, how topics trend over time, and how concentrated each news page is on particular topics, to test the assertion that the media is “biased.”
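For readers new to LDA, the sketch below shows the core idea with a small collapsed Gibbs sampler in pure Python. The source does not say which LDA implementation or inference method the project uses; Gibbs sampling is swapped in here only because it fits in a few lines (libraries such as gensim and scikit-learn use variational inference instead). Each token's topic is resampled with probability proportional to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta). The `topic_concentration` helper is also an assumption: it uses a Herfindahl-Hirschman-style index (sum of squared topic shares) as one possible way to measure how focused a news page is, ranging from 1/K (evenly spread) to 1 (a single topic).

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0
              ) -> Tuple[List[str], List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA; returns (vocab, doc-topic theta, topic-word phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]    # word counts per topic
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # current topic of every token

    # Random initial assignment of a topic to every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional for this token's topic, up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in topics]
    return vocab, theta, phi


def topic_concentration(theta_rows: List[List[float]]) -> float:
    """Sum of squared mean topic shares over one page's articles (hypothetical metric)."""
    n_topics = len(theta_rows[0])
    mean = [sum(row[k] for row in theta_rows) / len(theta_rows) for k in range(n_topics)]
    return sum(p * p for p in mean)
```

In practice you would tokenize and clean each article first, group the rows of `theta` by news page, and compare `topic_concentration` across pages; averaging `theta` by publication month gives the time trends in the same way.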
For details on the license and permission requests, please see the license file.