What languages sounded like more than a few thousand years ago is one of the great unsolvable mysteries of modern science. Now two linguists have come up with a bold hypothesis: the speakers of the oldest language we can reconstruct spoke like Yoda.

Update: The original version of this article indicated the researchers were attempting to reconstruct what language sounded like 50,000 years ago, the estimated beginning of human language. I have been informed by a representative of the Santa Fe Institute that this is incorrect - rather, the authors believe "a much more recent human evolutionary bottleneck was likely the source of the proto-language", meaning the 50,000-year estimate is not the one they are working with.

However, mainstream linguistic opinion does place 50,000 years ago as a likely origin point, and any proto-language that encompasses all living languages would have to originate around then, as humans began migrating out of Africa shortly afterward. None of this invalidates the research discussed here - and I do think it puts forward some intriguing ideas - but I would still say there's plenty of room for skepticism about its findings. In fairness to the researchers, I have amended the article slightly to correct my initial errors.

End update.

Murray Gell-Mann of the Santa Fe Institute and Merritt Ruhlen of Stanford have come up with a possible way to peer all the way back to a language that predates all known tongues. They focused on one of the most fundamental aspects of language: word order. Your basic sentence features a subject, a verb, and an object. For instance, in the sentence "the man eats a sandwich", "the man" is the subject, "eats" is the verb, and "a sandwich" is the object.


In English, we subscribe to a very strict word order: first the subject, next the verb, and then the object. This is known as SVO, and any attempt to deviate from that in English creates nothing but confusion. Rewriting that sentence as subject, object, then verb (SOV) gives us "the man a sandwich eats", which is complete gibberish coming from anybody other than Yoda. Meanwhile, flipping the subject and the object around (OVS) is "a sandwich eats the man", which takes on a completely different (and nonsensical) meaning.

A lot of living languages are SVO, just like English, although a decent portion are SOV. SOV is also especially common among dead languages - Latin is a well-known example. The only other significant word order type is verb-subject-object (VSO). Verb-object-subject (VOS) and object-subject-verb (OSV) are both quite rare, but even they are practically common compared to object-verb-subject (OVS), which is almost unknown outside the Dravidian language Tamil and, well, Klingon.
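To make the six possible orderings concrete, here is a small Python sketch that shuffles the example sentence from earlier into each one. The `parts` dictionary and the `reorder` function are names made up purely for illustration; they have nothing to do with the researchers' own methods.

```python
from itertools import permutations

# Constituents of the article's example sentence, keyed by grammatical role.
parts = {"S": "the man", "V": "eats", "O": "a sandwich"}


def reorder(order: str) -> str:
    """Arrange the example sentence in the given word order, e.g. "SOV"."""
    return " ".join(parts[role] for role in order)


# All six possible orderings of subject, verb, and object.
orders = ["".join(p) for p in permutations("SVO")]

for order in orders:
    print(f"{order}: {reorder(order)}")
```

Running it prints one line per word order, including the Yoda-style SOV version and the sandwich-bites-back OVS version.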


Here's why this matters. Gell-Mann and Ruhlen constructed a tree of 2,200 living and dead languages, tracing how word order evolved along all the different paths. They argue that all living languages, no matter what their current word order is, come from an SOV language. According to their research, the verb-first languages (VSO and VOS) were originally SVO languages like English, but those can be traced still further back to SOV languages, as can almost all other SVO languages. The object-first languages (OSV and OVS) all derive directly from SOV. Basically, all languages eventually lead back to a subject-object-verb word order.

Their contention, then, is that this has held true for the entire history of human language, meaning the earliest language was likely SOV. It's an intriguing idea, to be sure, and certainly one of the more novel ways to tackle such a gigantic problem: a bold attempt to bridge an impossibly vast gap in our knowledge.

Of course, it should be stressed that this still probably can't tell us too much about the deeper origins of human language - something, in fairness, the researchers themselves do not claim. Even if this represents an accurate accounting of the last few thousand years of language change - and it's not at all clear that the mainstream linguistic community is even willing to go that far - there really isn't any way to know that this accurately describes the last fifty thousand years of language change.


After all, there are any number of hypothetical scenarios one could imagine. The researchers themselves say VSO and VOS languages occasionally revert to SVO - there's no known case of a language reverting to SOV without contact with an outside language, but that hardly makes it impossible. We also don't know much about the verb-first languages because there are so few examples of them - over a 30,000-year timescale, could a VSO language become SOV? Maybe the last 10,000 years of language change can all be traced back to SOV, but maybe the 10,000 before that were dominated by OSV, while the 10,000 before that were all about VOS, and so on. There's really no way to tell.

Peering back that far most likely demands more than our current data can give us, and it's hard to imagine what new information could ever allow us to probe the most ancient languages with even the vaguest sense of accuracy. (I suppose finding 50,000-year-old examples of translatable writing would do it, but that's about as massive a pipe dream as one could possibly imagine.) Still, this research represents an interesting attempt to take us further back than we've gone before. Now we'll have to see how it holds up to wider linguistic scrutiny.

Via PNAS. Painting of the Tower of Babel by Lucas van Valckenborch.