North American Dialects On Twitter and YouTube

Using data from the Atlas of North American English (ANAE) by William Labov, Sharon Ash, and Charles Boberg, combined with his own research, linguist Rick Aschmann created the detailed map above to show regional dialects throughout North America. One of its coolest features is the more than 600 YouTube videos he has linked to the map: clicking a region takes you to clips of (mostly famous) people raised in that area, so you can hear a sample of the dialect.

Researchers at Carnegie Mellon have done similar work, though they use social media (Twitter specifically) as the data source rather than just as a way to illustrate linguistic nuance. Jacob Eisenstein and his colleagues recently looked at 380,000 geotagged tweets and explored the geographical dialects represented within them. They saw regional differences in how people abbreviate words to fit the short medium and in the slang they use in informal messaging. From that variation they built a statistical model that could predict a user’s location to within about 300 miles based on the dialect used.
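To make the idea of guessing location from word choice concrete, here is a minimal sketch in Python. It is not the researchers' model, which this post does not describe in detail. It swaps in a plain naive Bayes word-count classifier, and the region labels, the example messages, and the function names `train` and `predict` are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: (region label, message). Not drawn from the study.
TRAINING = [
    ("region_a", "hella tired after the game"),
    ("region_a", "that burrito was hella good"),
    ("region_b", "yall coming to the game"),
    ("region_b", "fixin to eat yall"),
]


def train(examples):
    """Count how often each word appears in each region's messages."""
    word_counts = defaultdict(Counter)
    region_counts = Counter()
    for region, text in examples:
        region_counts[region] += 1
        word_counts[region].update(text.lower().split())
    return word_counts, region_counts


def predict(text, word_counts, region_counts):
    """Return the region with the highest naive Bayes log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(region_counts.values())
    best_region, best_score = None, -math.inf
    for region, counts in word_counts.items():
        # Start from how common the region is in the training data.
        score = math.log(region_counts[region] / total_messages)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not rule out a region.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_region, best_score = region, score
    return best_region


word_counts, region_counts = train(TRAINING)
print(predict("hella good game", word_counts, region_counts))
```

With this toy data, a message containing "hella" is assigned to `region_a` and one containing "yall" to `region_b`. A real system would need far more data and would predict coordinates rather than a handful of labels.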

Twitter and other informal microblogging platforms give linguistics researchers a newly accessible, low-cost source of data, since uncovering patterns of informal speech there doesn’t require labor-intensive in-person interviews:

Studies of regional dialects traditionally have been based primarily on oral interviews, Eisenstein said, noting that written communication often is less reflective of regional influences because writing, even in blogs, tends to be formal and thus homogenized. But Twitter offers a new way of studying regional lexicon, he explained, because tweets are informal and conversational. Furthermore, people who tweet using mobile phones have the option of geotagging their messages with GPS coordinates.

Carnegie Mellon University

Eisenstein also points out that the identifiable regional variation could be an indicator that the internet is less a force for homogenization than often thought.

The Georgetown University Round Table on Languages and Linguistics later this year will explore many ways in which these “new worlds of words occasion innovative uses of language and new spaces for constructing identities, forming relationships, and expressing social meanings.” (GURT 2011)

So expect to see plenty more research mining social media, and remember to act normal online so you don’t throw off the results.
