Data Vis: Trump’s Favourite Twitter Insults

Few voices on Twitter have become as recognisable as that of the American President-elect, Donald Trump. His signature style of banter has been splattered across nearly every national newspaper in the world over the past year, so much so that some publishers have devoted entire graphics to Trump’s Twitter account.

One such graphic is the New York Times’ live interactive: The 282 People, Places and Things Donald Trump has Insulted on Twitter: A Complete List. The page is a scrolling mass of links to original tweets sent by the President-elect, categorised by the intended recipient of said insult.

Despite being only text, the page still gives a good general sense of who Donald Trump has insulted most since announcing his candidacy last June. Hillary Clinton takes a clear number one spot on that front. But I was curious to see what kinds of insults Trump (and his campaign team) began to fall back on, a sort of branding of the Trump banter.

Most people are now quite familiar with zingers like “crooked Hillary” and “lyin’ Ted”. But I wanted to get a better overall picture of the other, less publicized insults. After scraping some data, cleaning up the csv file and playing around in Tableau a bit, here’s what I found.

trump-twitter-insults

To see all the insults with hover interactivity, click on the graphic to open the original Tableau sheet.

Unsurprisingly, “crooked” topped the list of insults at 402 instances. But some other interesting Trumpisms also stood out from the vis: calling someone a “clown” 28 times, for example, or using the word “dummy” 34 times.

trump-insults-pie

To better understand his style of insulting, I broke down the types of insults into four categories: dishonest, failure, general insult and weak. Though not very descriptive, the category of “general insult” becomes necessary when you look at the types of names that fall into this color. To keep the bubble chart from becoming overwhelming with too many colors, I lumped all the bizarre insults that didn’t fit in the other three categories here.

Trump’s bread and butter on Twitter appears to be attacking the “crooked”, “liars”, and generally “dishonest” accounts online. Then come general insults ranging from “bad” to “dopey”, followed in frequency by insults relating to being a failure or incompetent. Finally, a fair number of tweets also fell into a category of “weakness”: belittling insults befitting a masculine demagogue, such as “lightweight”, “pathetic”, and “weak”.

How it’s made

A couple of caveats before I dive into how I went about creating the data vis:

  • The word categorisation is not an exact science. I used my best judgement as far as lumping similar words into a category.
  • I also took the liberty of combining words like “lies, lyin’, liar, lied” into the single insult of “liar”, since they all seem to be getting at a similar idea.
  • Some words may be taken out of their original context, and I acknowledge this. For example, the word “little” could be used as an insult or as an adjective. For exact tweets, it’s best to check the context of the full tweet which can all be found on the NYT page.
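For anyone who would rather script the merging described in the second caveat, it could be expressed as a small lookup table. This is only a minimal Python sketch, not the process used for the chart (that was done by hand in a spreadsheet). The names `VARIANTS` and `canonical_insult` are made up for illustration, and the mapping only covers the example words mentioned above.

```python
# Hypothetical mapping from word variants to one canonical insult.
# Only the variants named in the caveat above are included.
VARIANTS = {
    "lies": "liar",
    "lyin'": "liar",
    "liar": "liar",
    "lied": "liar",
}


def canonical_insult(word: str) -> str:
    """Return the canonical form of a word, or the cleaned word itself."""
    # Lowercase, trim whitespace, and normalise curly apostrophes.
    cleaned = word.strip().lower().replace("’", "'")
    return VARIANTS.get(cleaned, cleaned)
```

Calling `canonical_insult("Lyin’")` gives `"liar"`, while a word with no entry in the table, such as `"crooked"`, passes through unchanged.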

Now for the fun part. Here are the tools and processes I used in order to collect the data and visualise it.

    1. I scraped the text off the NYT interactive using the Data Miner Chrome extension. It’s not nearly as fancy as using a language like Python, but for scraping single pages it’s fast and easy. The extension allowed me to scrape all the link text on the page into a single csv file.
    2. The formatting of the csv was terrible upon first opening, so I had to do a fair bit of cleanup work before sorting. I used the Find & Replace option in Excel to get rid of any weird characters that were muddled in with the text I scraped.
    3. After cleaning, I copied and pasted the results into a simple word frequency counter. Once each word was counted from all the tweets, I used Data Miner again to scrape each word with its count into a different csv file.
    4. The word counter I used isn’t picky, so it even counts articles like “the” and “a”. The first step here was to use Find & Replace to get rid of all the words I knew weren’t insults. Then I had to do a bit of manual searching to get rid of other words that I didn’t consider insulting.
    5. Once I had my csv file looking decent, I started to categorise the words, combining similar insults into one word and adding the counts together.
    6. Once my spreadsheet was ready to go, I connected it to Tableau Public and began looking for the best vis option. I dragged the “Frequency” column to Rows and “Word” to Columns. I liked the way that the bubble chart grouped the words together, so I chose that option and added the “Category” column as the color differentiator.
    7. Finally, I did a bit of formatting on the Dashboard view of Tableau to make sure I included the Source at the bottom and the Category legend on the chart to reduce whitespace. You can do this by selecting the tiny triangle button by category, and choosing the “Floating” option for the legend.
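
Steps 3 to 5 above were done with a web word counter and Excel, but the same idea can be sketched in a few lines of Python. Treat this as a rough illustration under stated assumptions rather than the exact process: the `STOP_WORDS`, `MERGE` and `CATEGORIES` tables are tiny made-up samples (only words mentioned in this post), the function names are hypothetical, and the output rows simply mirror the “Word”, “Frequency” and “Category” columns used in Tableau.

```python
import re
from collections import Counter

# Illustrative samples only, not the full lists behind the chart.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to"}
MERGE = {"lies": "liar", "lyin": "liar", "lied": "liar", "liars": "liar"}
CATEGORIES = {
    "crooked": "dishonest",
    "liar": "dishonest",
    "dishonest": "dishonest",
    "bad": "general insult",
    "dopey": "general insult",
    "lightweight": "weak",
    "pathetic": "weak",
    "weak": "weak",
}


def count_insults(lines):
    """Count categorised insult words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Normalise case and curly apostrophes, then split into words.
        text = line.lower().replace("’", "'")
        for word in re.findall(r"[a-z']+", text):
            word = word.strip("'")
            if word in STOP_WORDS:
                continue  # step 4: drop articles and other filler
            word = MERGE.get(word, word)  # step 5: combine similar insults
            if word in CATEGORIES:
                counts[word] += 1  # keep only words treated as insults
    return counts


def to_rows(counts):
    """Return (Word, Frequency, Category) rows, most frequent first."""
    return [(w, n, CATEGORIES[w]) for w, n in counts.most_common()]
```

Feeding `count_insults` the scraped link text, one tweet excerpt per line, and passing the result to `to_rows` would produce rows that could be saved to a csv and connected to Tableau Public as in step 6. The allow-list in `CATEGORIES` stands in for the manual judgement calls mentioned in the caveats, so it inherits the same limits: a word like “little” would be counted regardless of context.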

Want to do your own vis? I loaded the data on Octopub.io, a fantastic tool for open datasets. It’s hosted by the Open Data Institute. Find the dataset here.
