Displaying the lexicon of a nation

We recently came across Brad Borevitz’s State of the Union (SOTU), a remarkable web project that lets users explore changes in the use of the English language in the United States over the past 200 years, using the State of the Union address as a window. As the site’s author puts it, “SOTU allows you to explore how specific words gain and lose prominence over time, and to link to information on the historical context for their use”. The project synthesizes a great deal of data (in this case, every word in over 200 years of State of the Union addresses) and displays it in a visual format that is easy to digest and easy to navigate. Because of the way the author has graphed each speech, users can see at a glance the relative abundance and frequency of specific words in specific speeches.

We took a quick tour through the decades and were immediately struck by how much word usage differs from one era of our nation’s history to the next, and by how the words shed light on the priorities and concerns of the country at the time, or at least those of the presidential administration. This connection is further highlighted by a one-click feature in each graph that reveals a timeline in the sidebar describing events that happened in and around the year of that particular address.

Aside from its value as a tool for both historical evaluation and linguistic study, this project is also a testament to the power (existing and potential) of open resources across the web, including the very software used to write the interface. To that point, here is a statement taken from the site’s “About” page, where such sources are credited:

State of the Union was authored by Brad Borevitz using Java for the analysis and Processing for the graphic user interface. A project such as this, although executed directly by a single person, is a collaborative effort. The work would not be possible without the kind of open resources that are currently available on the web. It required a good deal of research on hundreds of websites to build the knowledge-base that produced this project.

I have relied directly on the following resources to implement the code:

The interface is written using Processing.

The search engine is Sphider.

Timelines are from Wikipedia.

Most of the text was from Project Gutenberg.

JavaScript for highlighting is based on code from Kryogenix.

JavaScript for switching style sheets is from A List Apart.

Methods for building frequency word lists were based on code in Andrew Roberts’ aConCorde.

The syllable counting algorithm is by Daniel Shiffman.

Thanks to Christiane Paul & Martin Wattenberg for feedback on the new version.
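
For readers curious about what building a frequency word list involves, here is a minimal sketch in Python. It is not Borevitz’s code (his analysis was written in Java, and we have not seen it); the function name `relative_frequencies`, the simple tokenizing rule, and the two short sample texts are our own assumptions, invented purely for illustration. It shows one plausible way to measure how prominent a word is within a single speech: count each word, then divide by the total number of words in that speech.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return each word's share of the total word count in one text.

    Hypothetical helper for illustration; not taken from the SOTU project.
    """
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Two invented snippets standing in for addresses from different years.
speeches = {
    "year_a": "The union is strong and the union endures.",
    "year_b": "Commerce and industry sustain the union.",
}

for label, text in speeches.items():
    freqs = relative_frequencies(text)
    # Compare the prominence of one word across the two texts.
    print(label, round(freqs.get("union", 0.0), 3))
```

Dividing by the length of each speech, rather than comparing raw counts, keeps a long address from dominating a short one, which is presumably why the site speaks of relative abundance rather than totals.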
