I love computers. That’s the geeky side of me. I find intrinsic joy in building Raspberry Pi desktop PCs and assembling automated webcams for my three guinea pigs. I was elated to learn that Francis Fukuyama, the world-renowned political economist, is a Raspberry Pi fan. He uses a $50 Raspberry Pi-based computer to read the newspaper in the morning and advocates for Linux. He said he owns four Linux desktop machines. Here, I want to brag: I have seven, running four different Linux distros.

The beauty of gadget-making

People who are drawn to the open-source community see the beauty…


It is the Lunar New Year, the year of the Ox. “Wish You a Bullish 2021” has become the standard greeting on this celebratory day. As at every new year, we in the Christian community like to use Psalm 65:11 to thank God as He crowns the year with a bountiful harvest. However, Psalm 65:11 wasn’t the first verse that came to my mind today. It was Psalm 137, “By the rivers of Babylon,” a song of lament by a people in exile.

Every New Year’s Eve, people in mainland China gather to watch the nationally televised Spring Gala. It is the Super…


Last week I did two things: I bought Dogecoin, and I posted on Facebook asking for a Clubhouse invite. Two unrelated things, but since Elon Musk tweeted about both, I guess I acted like his “disciple.”

There is something to be admired about Clubhouse. It is an invite-only audio-chat app that first became popular among the elite. The early adopters were influencers, tech entrepreneurs, venture capitalists, political insiders, gurus, and heroes. These people are the idols of our time, who define cultures, drive trends, and make history. Who wouldn’t want to be in the same club with them?

As a…


There are eleven computing devices in my home office: my two work laptops, three spare Linux laptops, a Raspberry Pi desktop, a smartphone, an old and a new iPad, a Chromebook, and a Kindle. They are all vying for the home WiFi and my attention.

As a tech aficionado, I work and live with many screens, browser tabs, and cloud servers. Digital distraction is real and feels like enslavement. This has gotten me thinking about digital minimalism. Don’t get me wrong. It is not a Tech Sabbath (tech-free days/hours) or digital distancing (rationing screen time). What I want to practice is with…


This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

What is a topic model?

Have you dreamed of a day when algorithms can quickly scan through your textbooks and give you a bullet-point summary? How convenient! No more tedious reading! Actually, there are algorithms out there that automatically summarize large-scale corpora. They are called topic models. In building topic models, we basically ask computers to discover abstract topics in the text. …
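Topic models come in several flavors, and latent Dirichlet allocation (LDA) is the best known. The following is a minimal collapsed Gibbs sampler for LDA in base R. It shows what "discovering topics" means under the hood. This is a teaching sketch, not the tutorial's code. The toy documents, the number of topics (K = 2), the priors (alpha = 0.1, beta = 0.01), and the 200 iterations are all arbitrary assumptions.

```r
# Minimal collapsed Gibbs sampler for LDA (teaching sketch, base R only).
set.seed(42)

# Hypothetical toy corpus: each document is a vector of already-cleaned tokens
docs <- list(
  c("clean", "quiet", "room", "bed", "quiet"),
  c("noisy", "street", "traffic", "noisy", "night"),
  c("bed", "clean", "room", "comfortable"),
  c("traffic", "street", "noise", "night", "noisy")
)
vocab <- sort(unique(unlist(docs)))
V <- length(vocab); D <- length(docs)
K <- 2; alpha <- 0.1; beta <- 0.01; n_iter <- 200  # assumed settings

w <- lapply(docs, function(d) match(d, vocab))                        # word ids
z <- lapply(w, function(d) sample.int(K, length(d), replace = TRUE))  # random initial topics

# Count matrices: doc-topic, topic-word, and total tokens per topic
ndk <- matrix(0, D, K); nkw <- matrix(0, K, V); nk <- numeric(K)
for (d in seq_len(D)) for (i in seq_along(w[[d]])) {
  k <- z[[d]][i]; v <- w[[d]][i]
  ndk[d, k] <- ndk[d, k] + 1; nkw[k, v] <- nkw[k, v] + 1; nk[k] <- nk[k] + 1
}

for (iter in seq_len(n_iter)) {
  for (d in seq_len(D)) for (i in seq_along(w[[d]])) {
    k <- z[[d]][i]; v <- w[[d]][i]
    # Take the current token out of the counts
    ndk[d, k] <- ndk[d, k] - 1; nkw[k, v] <- nkw[k, v] - 1; nk[k] <- nk[k] - 1
    # Unnormalized full conditional p(z_i = k | all other assignments)
    p <- (ndk[d, ] + alpha) * (nkw[, v] + beta) / (nk + V * beta)
    k <- sample.int(K, 1, prob = p)
    # Put the token back under its newly sampled topic
    z[[d]][i] <- k
    ndk[d, k] <- ndk[d, k] + 1; nkw[k, v] <- nkw[k, v] + 1; nk[k] <- nk[k] + 1
  }
}

theta <- (ndk + alpha) / rowSums(ndk + alpha)  # document-topic proportions
phi <- (nkw + beta) / rowSums(nkw + beta)      # topic-word distributions
colnames(phi) <- vocab
top_words <- apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[1:3])
print(top_words)  # one column of top words per topic
```

On real data you would normally use a dedicated package rather than a hand-rolled sampler. The point here is only that each topic ends up as a probability distribution over words, and each document ends up as a mix of topics.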


This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

To understand what a semantic network looks like, go ahead and run the code below.

library(quanteda)
library(ggplot2)
reviews_tok <- tokens(review_corpus, remove_punct = TRUE, remove_numbers = TRUE,
                      remove_symbols = TRUE, remove_twitter = TRUE, remove_url = TRUE)
reviews_tok <- tokens_select(reviews_tok, pattern = stopwords('en'), selection = 'remove')
reviews_tok <- tokens_select(reviews_tok, min_nchar = 3, selection = 'keep')
reviews_dfm <- dfm(reviews_tok)
# create a feature co-occurrence matrix (FCM)
review_fcm <- fcm(reviews_dfm)
# extract the top 50 frequent terms from the FCM object
feat <- names(topfeatures(review_fcm, 50))
# trim the old FCM object into one that…
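Conceptually, an FCM records how often pairs of terms appear together. The following is a rough base-R sketch using toy data, not the Airbnb reviews. For every pair of terms, it counts the number of documents that contain both. quanteda's fcm() supports other counting schemes and context windows, so its output on real data may differ from this simplified variant.

```r
# Hypothetical tokenized reviews (toy data)
toks <- list(
  c("clean", "quiet", "room"),
  c("noisy", "street", "room"),
  c("clean", "room", "host")
)
vocab <- sort(unique(unlist(toks)))

# Binary document-term matrix: 1 if the term occurs in the document, else 0
dtm <- t(sapply(toks, function(d) as.integer(vocab %in% d)))
colnames(dtm) <- vocab

# Cell [i, j] = number of documents containing both term i and term j
fcm_mat <- crossprod(dtm)
diag(fcm_mat) <- 0  # ignore each term's co-occurrence with itself
print(fcm_mat)
```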

This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

Now you are ready to try basic text mining techniques to extract insights from textual data. In this tutorial, we will try four techniques: simple word frequency, word clouds, n-grams, and keyness.

Simple word frequency

Suppose we want to see how often the word “noisy” appears in Airbnb reviews from each of the three cities. We first create a tokenized corpus called reviews_tok and then use the dfm() function to create a DFM. Subsequently, we ask R to group the text…
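The tutorial does this with quanteda's dfm() and a grouping step. In base R, the underlying computation reduces to two steps. First, count the token in each review. Then, sum those counts by a grouping variable. The reviews and city labels below are made-up placeholders, not the tutorial's data.

```r
# Made-up reviews; city labels are placeholders
reviews <- data.frame(
  city = c("city_A", "city_A", "city_B", "city_B", "city_C"),
  text = c("Noisy street, but a great host.",
           "Quiet and clean.",
           "Very noisy at night. Noisy neighbors!",
           "Nice view.",
           "A bit noisy."),
  stringsAsFactors = FALSE
)

# Lowercase, strip punctuation, split on whitespace
toks <- strsplit(gsub("[[:punct:]]", "", tolower(reviews$text)), "\\s+")

# Count "noisy" in each review, then sum the counts within each city
noisy_per_review <- vapply(toks, function(tk) sum(tk == "noisy"), integer(1))
noisy_by_city <- tapply(noisy_per_review, reviews$city, sum)
print(noisy_by_city)
```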


This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

Why text cleaning?

Textual data are always messy. The data may contain words that would be meaningless if taken out of context. You may also encounter groups of different words that convey the same meaning. Or you might have to convert slang and acronyms into standard English, or emojis into something a computer can recognize. Only by cleaning up the mess and the noise in the text will you be able to discern useful patterns and signals.
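Here is a bare-bones version of such a cleaning pipeline in base R. It performs five steps:

- case folding
- replacing punctuation, digits, and symbols with spaces
- whitespace tokenization
- stopword removal
- removal of very short tokens

The stopword list and the three-character cutoff are assumptions for illustration. This sketch simply deletes emoticons and emojis rather than translating them, and it does not handle slang or acronyms.

```r
# Two messy example reviews (made up)
raw <- c("The room was CLEAN and quiet!!!",
         "Wifi was slow... but the host was great :)")

# Hypothetical, deliberately tiny stopword list
stopwords_demo <- c("the", "was", "and", "but", "a", "an")

clean_tokens <- function(x, stop_words = stopwords_demo, min_nchar = 3) {
  x <- tolower(x)                        # case folding
  x <- gsub("[^a-z[:space:]]", " ", x)   # replace punctuation, digits, symbols with spaces
  toks <- strsplit(trimws(x), "\\s+")    # whitespace tokenization
  # drop stopwords and very short tokens
  lapply(toks, function(tk) tk[!(tk %in% stop_words) & nchar(tk) >= min_nchar])
}

print(clean_tokens(raw))
```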

Tokenization

The text cleaning process…


This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

Text mining: From corpus to DFM

There is a lot of interest in quantifying and visualizing textual data. Texts reveal our thoughts, our personalities, and the pulse of a society. We broadly refer to the quantification of text as text mining. Thanks to developments in natural language processing and information retrieval, we now have a wide selection of easy-to-use R libraries for cleaning, transforming, quantifying, and visualizing text.
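A document-feature matrix (DFM) has one row per document, one column per feature (usually a word), and term counts in the cells. The base-R sketch below builds one from a three-document toy corpus. In the tutorials, quanteda's tokens() and dfm() do this job on real data.

```r
# Toy three-document corpus (hypothetical)
corpus_demo <- c(doc1 = "clean room clean bed",
                 doc2 = "noisy room",
                 doc3 = "clean and quiet")

toks <- strsplit(corpus_demo, "\\s+")     # list of token vectors
features <- sort(unique(unlist(toks)))    # the DFM's columns

# One row per document, one column per feature, term counts in the cells
dfm_demo <- t(vapply(toks,
                     function(tk) as.numeric(table(factor(tk, levels = features))),
                     numeric(length(features))))
rownames(dfm_demo) <- names(corpus_demo)
colnames(dfm_demo) <- features
print(dfm_demo)
```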

Which R library?

Throughout the tutorials on text mining, we will use the library quanteda, which stands for…


This post is a static and abbreviated version of this interactive tutorial on using R for social data analytics.

During the 2012 US presidential election, Twitter, in partnership with several polling agencies, launched something called the Twitter Political Index. The idea was to track candidates’ popularity among voters based on the sentiment expressed in tweets. Back then, such an idea was a novelty. Nowadays, sentiment analysis of social media text is widely applied in marketing/PR, electoral forecasting, and sports analytics. The NPR show Planet Money even built a Twitter bot to automatically trade stocks based on the sentiment of Trump’s tweets.
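The simplest form of sentiment analysis is dictionary (lexicon) based. You look up each word in a list of scored words and add up the scores. The base-R sketch below does this with a made-up six-word lexicon. Real sentiment dictionaries are much larger. Serious applications also have to deal with negation ("not good"), intensifiers, and sarcasm, which this sketch ignores. It is not meant to reproduce how the Twitter Political Index or the Planet Money bot worked.

```r
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words
lexicon <- c(great = 1, love = 1, clean = 1,
             bad = -1, noisy = -1, dirty = -1)

texts <- c("Great host, love the clean room!",
           "Noisy street and a dirty bathroom.",
           "The room was fine.")

score_sentiment <- function(x, lex = lexicon) {
  toks <- strsplit(gsub("[[:punct:]]", "", tolower(x)), "\\s+")
  # Sum the scores of tokens found in the lexicon; unknown words count as 0
  vapply(toks, function(tk) sum(lex[tk], na.rm = TRUE), numeric(1))
}

print(score_sentiment(texts))
```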

There are…

Wayne Xu

Assistant professor studying discord, distrust, and dishonesty on internet platforms. I occasionally write about everyday tech and open-source computing.