Color-coded text reveals the foreign origins of your words


Your language is not your own. The words you speak have been borrowed, modified, and molded by the forces of linguistic evolution. And the sentences they form are not so much "English" as they are a shapeshifting hodgepodge of different languages that have intersected with English over the years.

Not that many of us would ever know it. Sure, the etymologies and histories of many words may be only a dictionary lookup away, but few of us have the time or inclination to investigate where these words, let alone entire sentences, actually come from.

Unless, of course, you're Mike Kinde, who maintains the ridiculously enthralling data visualization blog Ideas Illustrated. Looking to better understand the role of foreign words in his day-to-day use of the English language, Kinde whipped up a program that would allow him to actually see precisely that:

Using Douglas Harper's online dictionary of etymology, I paired up words from various passages I found online with entries in the dictionary. For each word, I pulled out the first listed language of origin and then re-constructed the text with some additional HTML infrastructure. The HTML would allow me to associate each word (or word fragment) with a color, title, and hyperlink to a definition.
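Kinde's own code isn't shown, but the approach he describes can be sketched in a few lines of Python. What follows is a minimal illustration under stated assumptions, not his implementation: the tiny `ORIGINS` lookup table stands in for real dictionary entries, the `COLORS` mapping and the `colorize` function are hypothetical names, and the hyperlink to a definition is left out because no URL scheme is given.

```python
import html
import re

# Hypothetical stand-in for a real etymology lookup: word -> first listed origin.
# A handful of sample entries only; this is not data pulled from any dictionary.
ORIGINS = {
    "the": "Old English",
    "quick": "Old English",
    "brown": "Old English",
    "fox": "Old English",
    "over": "Old English",
    "dog": "Old English",
}

# Assumed origin -> CSS color mapping (only one origin is needed for this sample).
COLORS = {"Old English": "pink"}


def colorize(text, origins=ORIGINS, colors=COLORS):
    """Rebuild text as HTML, wrapping each known word in a colored, titled span."""
    parts = []
    # Split into alternating runs of word characters and everything else,
    # so spacing and punctuation are preserved exactly.
    for token in re.findall(r"[A-Za-z']+|[^A-Za-z']+", text):
        origin = origins.get(token.lower())
        if origin is None:
            # Unknown words and punctuation pass through uncolored.
            parts.append(html.escape(token))
        else:
            color = colors.get(origin, "inherit")
            parts.append(
                f'<span style="background:{color}" '
                f'title="{html.escape(origin)}">{html.escape(token)}</span>'
            )
    return "".join(parts)


if __name__ == "__main__":
    print(colorize("The quick brown fox jumps over the lazy dog."))
```

Run on the fox sentence, this wraps each word found in the sample table in a pink span and leaves "jumps" and "lazy" untouched, since they are missing from the table. A real version would need a far larger lookup and some handling of inflected forms and word fragments.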


Kinde associated Old English with pink, Middle English with red, Anglo-French with orange, Old French with light orange, Middle French with pale orange, Classical & Medieval Latin with yellow, Gallo-Roman & Middle Low German with gray, and American with green. His system allowed him to analyze everything from simple, etymologically homogeneous-looking sentences:

The quick brown fox jumps over the lazy dog.

To complex Monty Python quotes:

Supreme executive power derives from a mandate from the masses, not from some farcical aquatic ceremony.

To passages from classic American literature, like this excerpt from Mark Twain's The Adventures of Tom Sawyer:

Tom gave up the brush with reluctance in his face, but alacrity in his heart. And while the late steamer Big Missouri worked and sweated in the sun, the retired artist sat on a barrel in the shade close by, dangled his legs, munched his apple, and planned the slaughter of more innocents. There was no lack of material; boys happened along every little while; they came to jeer, but remained to whitewash. By the time Ben was fagged out, Tom had traded the next chance to Billy Fisher for a kite, in good repair; and when he played out, Johnny Miller bought in for a dead rat and a string to swing it with - and so on, and so on, hour after hour. And when the middle of the afternoon came, from being a poor poverty-stricken boy in the morning, Tom was literally rolling in wealth. He had beside the things before mentioned, twelve marbles, part of a jews-harp, a piece of blue bottle-glass to look through, a spool cannon, a key that wouldn't unlock anything, a fragment of chalk, a glass stopper of a decanter, a tin soldier, a couple of tadpoles, six fire-crackers, a kitten with only one eye, a brass door-knob, a dog-collar - but no dog - the handle of a knife, four pieces of orange-peel, and a dilapidated old window sash.


Things get even more interesting when Kinde starts creating pie charts that compare the word origins in work by American vs. non-American authors, or in legal texts, medical publications, and sports articles. The two pie charts shown here, for example, illustrate the marked difference between word origins in the Tom Sawyer passage and a paragraph from a medical journal. With etymological underpinnings like these, it's no wonder people can find medical and scientific articles so impenetrable; only about half of the words in the medical paragraph have origins in Old English. Compare that to something like a sports article, where Kinde finds that figure hovering around 80%.
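The percentages behind charts like these come down to a simple tally: count the words attributed to each origin and divide by the total. Here is a rough sketch of that arithmetic, assuming a word-to-origin lookup is already available; the `origin_shares` name, the "Unknown" bucket, and the sample lookup are all illustrative choices, not Kinde's.

```python
import re
from collections import Counter


def origin_shares(text, origins):
    """Return each origin's share of the words in text, as fractions of 1.0."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    # Words missing from the lookup are grouped under "Unknown" (an assumption).
    counts = Counter(origins.get(w, "Unknown") for w in words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {origin: n / total for origin, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample lookup, for illustration only.
    sample = {"the": "Old English", "fox": "Old English"}
    for origin, share in origin_shares("the fox the cat", sample).items():
        print(f"{origin}: {share:.0%}")
```

Whether unmatched words are counted as "Unknown" or dropped from the total changes the resulting percentages, so any comparison between texts needs to apply the same rule to both.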


Kinde says a website where you can upload your own passages and have them analyzed and color-coded is in the works. In the meantime, however, you'll find many more word-origin visualizations and distribution breakdowns on his blog, Ideas Illustrated. (By the way: I wasn't kidding about it being enthralling; if you have the slightest interest in data science, design, or visualization, Kinde's blog entries will consume hours of your time. You've been warned. Proceed with caution.)

All figures via Ideas Illustrated.


I am reminded of Poul Anderson's "Uncleftish Beholding," which was an introduction to atomic physics written with all the technical terms derived from Germanic rather than Romance roots.