Genealogist Builds a Family Tree Connecting 13 Million People

By using data culled from genealogy websites, computational biologist Yaniv Erlich has put together some of the largest family trees ever seen, including a single pedigree comprising 13 million individuals — some of whom date back 500 years.


Erlich recently presented his work at the American Society of Human Genetics annual meeting in Boston. His work could provide a new tool for analyzing the extent to which genes influence certain traits, like personality, longevity, and facial features. Erlich recently made his database available to other researchers, but he removed all the names to ensure privacy. And indeed, other work by Erlich has considered the end of genetic anonymity.


Called FamiLinx, it's a database of crowd-sourced genealogy that contains pedigrees, demographic data, and simple phenotypic information. Websites used include MyHeritage. As of yet, FamiLinx does not contain any DNA information, but the hope is that this will soon change.

Heidi Ledford from Nature News explains more:

Pedigrees provide clues about genetic inheritance. For instance, by comparing an individual to their more distant relatives on the family tree, the change in frequency of a given trait, such as fertility, can indicate to what extent the trait has its roots in genetics. It can also provide clues as to whether the trait is controlled by a few genes that have large effects, or by many genes that each make smaller contributions.

But it takes years to assemble genealogical data for even just a few thousand individuals, said Erlich during a presentation at the meeting on 24 October. In the past, researchers have painstakingly gathered such data from church records and individual volunteers. Erlich and his team decided to streamline the process by collecting data from more than 43 million public profiles on a genealogy website. The profiles typically included birth and death dates, as well as locations and, occasionally, photos uploaded by the users.

The team assembled the data into family trees that ranged from a few thousand individuals up to 13 million people in size. Erlich says that pedigrees previously available for genetic studies contained hundreds of thousands of family members at best.
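To make the idea of assembling profiles into family trees concrete, here is a minimal sketch in Python. It is not Erlich's actual pipeline, which the article does not describe. It assumes each profile has an ID and that relationships arrive as simple pairs of linked IDs; the names `pedigree_sizes`, `profiles`, and `links`, and the toy data, are all hypothetical. It groups linked profiles with a union-find structure and reports the size of each resulting tree.

```python
from collections import Counter


def find(parent, x):
    # Follow parent pointers to the group's root, shortening the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def pedigree_sizes(profiles, links):
    """Return the sizes of the connected family trees, largest first.

    profiles: iterable of profile IDs (hypothetical input format)
    links: iterable of (id_a, id_b) pairs for any recorded family relationship
    """
    parent = {p: p for p in profiles}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two trees
    sizes = Counter(find(parent, p) for p in profiles)
    return sorted(sizes.values(), reverse=True)


# Hypothetical toy data: six profiles, three recorded relationships.
profiles = ["ann", "bob", "cat", "dan", "eve", "fay"]
links = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
sizes = pedigree_sizes(profiles, links)
print(sizes)
```

In this toy case the profiles fall into three separate trees. A real data set would also need duplicate profiles merged and implausible links filtered out before the tree sizes meant anything.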

It's not immediately clear how useful this data will be, or what kinds of experiments it can support, especially considering potential inaccuracies in the crowd-sourced reporting of family information. But as Ledford notes in her article, genealogical analysis will likely play a big part in genetic studies in the future, particularly as people become more willing to contribute data and medical records.

Top image: PSV/Shutterstock.


By using data culled from genealogy websites

...and *that's* where you lost me. Anybody who has been working on genealogy longer than a year knows that there is so much bad and oft-repeated wrongness in published genealogy data on the internet that you can literally connect anyone to anyone if you try. Hell, there are "genealogists" who claim to be related to fictional characters. If you don't use real document transcriptions and official public records to do ALL of your work, it's probably not true. Even if you do, it's very likely to not be true. If you take data from a pedigree someone else made without finding all of the documented evidence yourself, you're pretty much making things up.

Note that I've never said the word 'proof' - Usually the people who recorded the information you rely on for Genealogy were far less interested in the facts they recorded than you are. I remember watching Kris Kristofferson once joke with Willie Nelson that "Half of Lead Guitar is recovery." Half of Genealogy as it is practiced on the internet is wishful thinking. But it looks good.