Saturday, April 28, 2012


Paleoanthropology, genealogy and the miracle of DNA

Part Two.

William Hudson. Latest update 28th April 2012

Cellular structure

Cells can be thought of as the building blocks of living organisms. They consist primarily of the body of the cell, or cytoplasm, and the nucleus. The human nucleus contains 23 pairs of chromosomes (22 pairs of autosomes and one pair of sex chromosomes), giving a total of 46 per cell. The sex chromosomes are X and Y: two X chromosomes produce a female; an X and a Y produce a male. The Y-chromosome can only be passed on from father to son and is usually passed on without alteration or genetic mixing.

Chromosomes are, in turn, made up of DNA molecules. DNA is made up of four chemical bases: adenine (A), cytosine (C), guanine (G) and thymine (T). The DNA molecule is arranged as a double helix, like a twisted ladder, with the bases making up the “rungs” and sugar phosphates making up the “rails”. The building blocks of DNA (nucleotides) each consist of a base plus a sugar and a phosphate. Each rung of the DNA ladder consists of two bases (a base pair), but A can only bond with T and C always bonds with G. Thus only four “rung combinations” are possible: AT, TA, CG and GC.
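Because A only pairs with T and C only pairs with G, one strand of the ladder fully determines the other. The short sketch below illustrates that rule; the function names are purely illustrative, and for simplicity it reads the partner strand in the same direction as the original (in a real molecule the two strands run in opposite directions).

```python
# Base-pairing rule from the text: A bonds only with T, and C bonds only with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the partner base for each position of `strand`."""
    return "".join(PAIR[base] for base in strand.upper())

def rungs(strand: str) -> list[str]:
    """List each rung of the ladder as a two-letter base pair, e.g. 'AT'."""
    return [base + PAIR[base] for base in strand.upper()]

print(complement("GATTACA"))           # CTAATGT
print(sorted(set(rungs("GATTACA"))))   # ['AT', 'CG', 'GC', 'TA']
```

Whatever sequence is supplied, only the four rung combinations AT, TA, CG and GC can ever appear.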

The genetic code of organisms is determined by the arrangement of these base pairs within the DNA of the chromosomes. A gene is the fundamental unit of heredity and is simply a sequence of nucleotides on a chromosome (the coding region). However, more than 95% of the DNA in the human genome is non-coding (aka “junk”). Outside of the nucleus, the cytoplasm of the cell includes mitochondria, which have their own DNA (mtDNA), independent of the DNA in the nuclear chromosomes. Mitochondrial DNA is divided into two parts, the control region and the coding region. mtDNA can only be passed on by mothers, but it is passed to both sons and daughters.

A genetic marker is a distinctive feature of the DNA molecule that allows a particular position (or locus) on the molecule to be flagged.

DNA provides the “instructions” for cells to make identical copies of themselves. When spontaneous or random changes do occur during copying, they are known as mutations. Mutations in the coding regions of chromosomes can change the traits an individual inherits, whereas most mutations in the non-coding or junk DNA have little or no known effect.

Y-chromosome testing

The Y-chromosome is inherited from father to son with minimal changes from one generation to the next. One type of Y-chromosome marker, the Short Tandem Repeat (STR), mutates at a relatively fast rate and is therefore of use in a genealogical timeframe. STRs are short sequences, usually 2-5 bases in length, repeated back to back; for example, in GATAGATAGATA the four-base sequence GATA is repeated three times. Results are expressed as the number of repeats (or allele) at a given marker. For example, DYS390 – 24 would be 24 repeats at the marker in position DYS390. The complete set of a subject’s results is known as his haplotype.
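Reading an STR allele amounts to counting how many times a motif repeats back to back. The sketch below does this on a plain text sequence; it is a simplified illustration under the assumption that the motif is already known, and real laboratory allele calling follows marker-specific conventions that it does not attempt to reproduce.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count, pos = 0, start
        # Step forward one motif length at a time while the motif keeps repeating.
        while sequence.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best

print(count_tandem_repeats("GATAGATAGATA", "GATA"))          # 3
print(count_tandem_repeats("CCGATAGATAGATAGATATT", "GATA"))  # 4
```

The first call reproduces the GATA example above; the second shows that surrounding (flanking) bases do not affect the count.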

Just knowing the haplotype numbers in and of itself means little. However, the differences in STRs at select markers on the Y-chromosome (or polymorphisms) can provide a basis for comparison among individuals and populations. If the mutation rate is known, the time frame in which two individuals shared a most recent common ancestor (MRCA) can be estimated. If their test results are a perfect or near-perfect match, they are likely to be related within genealogy's time frame. For example, using Family Tree DNA’s 67-marker comparison, two men will most likely share a common ancestor within a genealogical timeframe with a match of 60 out of 67 or better. Probabilities can also be assigned. For example, with an exact match in a high-resolution 111-marker test, there is a 95% probability that the common ancestor lived within five generations.
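A comparison such as “60 out of 67” can be sketched as a marker-by-marker check of two haplotypes. The version below counts identical markers and also a simple step-wise distance (the total difference in repeat counts). The allele values are hypothetical and only four markers are used; testing companies may count some markers differently, whereas this sketch treats every marker the same way.

```python
def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Step-wise distance: total repeat-count difference over markers both men tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)

def marker_matches(a: dict[str, int], b: dict[str, int]) -> tuple[int, int]:
    """Return (markers with an identical allele, markers compared), as in '60 out of 67'."""
    shared = a.keys() & b.keys()
    return sum(1 for m in shared if a[m] == b[m]), len(shared)

# Hypothetical allele values for two men, four markers only.
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10}

print(genetic_distance(man_1, man_2))  # 2
print(marker_matches(man_1, man_2))    # (2, 4)
```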

Another type of mutation is the Single Nucleotide Polymorphism (SNP), a change at a single nucleotide at a specific position on a chromosome, which is assumed to have occurred only once, in a single individual. An SNP is one type of Unique Event Polymorphism (UEP), which has a mutation rate so low that it can be treated as a one-time event and is more applicable to “Deep Ancestry” studies. A group of descendants who all share the same UEP is known as a haplogroup. Y-DNA haplogroups are identified by the letters A through T, with A being the African group from which all modern haplogroups are descended. Note: STR results can sometimes predict a likely haplogroup, but this can only be confirmed by SNP testing.

mtDNA testing

Mitochondrial DNA is passed only through the maternal line, also with minimal changes from one generation to another. Testing can be done in one or both of two areas of the control region known as the Hypervariable Regions (HVR1 and HVR2), although it is now possible to obtain a complete mtDNA sequence (16,569 bases or nucleotides). The test result is a string of bases, identified by their letters and ranging in length from a few hundred to upwards of a thousand, that is compared to the Cambridge Reference Sequence (CRS); the differences (which represent substitutions of bases) are noted. In absolute terms, the mtDNA mutation rate is low and most people have only a handful of differences from the CRS. Results are therefore often more applicable to “Deep Ancestry” studies than to shorter-timeframe genealogical projects. The complete set of mtDNA polymorphisms then represents the individual’s haplotype. mtDNA haplogroups are also identified with letters, but the sequence denotes the order in which they were discovered. As with the Y-DNA test, haplogroups can be indicated by mtDNA haplotypes, but confirmation can only be obtained by SNP testing, which is usually carried out in the coding region.
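Noting the differences from the CRS can be pictured as a position-by-position comparison, with each substitution written as the reference position followed by the sample's base (for example 16519C). The sketch below assumes the two sequences are already aligned and of equal length (no insertions or deletions), and the eight-base “reference” is a made-up toy fragment, not the real CRS.

```python
def differences_from_reference(sample: str, reference: str, start: int = 1) -> list[str]:
    """List substitutions as reference position + sample base, e.g. '16519C'.

    Assumes both sequences are aligned and equal in length; `start` is the
    reference position of the first base in the fragment.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned and of equal length")
    return [f"{start + i}{s}"
            for i, (s, r) in enumerate(zip(sample, reference)) if s != r]

toy_reference = "ACGTTCGA"   # hypothetical fragment, not the real CRS
sample = "ACGCTCGG"
print(differences_from_reference(sample, toy_reference, start=16001))
# ['16004C', '16008G']
```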

In some instances, mtDNA tests can have genealogical relevance, but a nearly perfect match is not as helpful as it is in the Y-DNA case above. In the matrilineal case, it takes a perfect match to be really useful, and even then the MRCA could have lived hundreds of years ago. The higher the resolution of the test, the higher the chance that an exact match indicates a maternal common ancestor within a genealogical timeframe.

Autosomal testing

This test is carried out on the 22 pairs of autosomes (the chromosomes that do not determine sex), which do undergo changes (known as recombination) from one generation to another. Autosomal DNA is inherited from both parents in a roughly equal mix, but it is shuffled with each generation. Thus the test crosses gender lines; it is not restricted to the paternal or maternal line only. Conclusions from autosomal testing tend to be somewhat generic and the method suffers from a high error rate. Two types of test are available.

One identifies the number of times a given sequence repeats at each location (Short Tandem Repeats or STRs).  These tests would be applicable, for example, to paternity or sibling verification or adoption issues.

The second method tests for Single Nucleotide Polymorphisms (SNPs) and identifies the number and length of DNA segments that are shared between individuals. The more shared segments there are, and the longer those segments, the more closely the two individuals are likely to be related.
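Why more shared DNA points to a closer relationship can be seen from the average amount two relatives are expected to share: each parent-to-child transmission passes on about half of the autosomal DNA, so the expected share halves with every generation separating two people, and doubles when they descend from a couple rather than a single ancestor. The sketch below computes these textbook averages; the function name is illustrative, real sharing varies around these values because of recombination, and testing companies report segment lengths in their own units, which this sketch does not model.

```python
def expected_shared_fraction(meioses: int, common_ancestors: int = 1) -> float:
    """Average fraction of autosomal DNA two relatives are expected to share.

    meioses: parent-to-child transmissions on the path linking the two people
             through a common ancestor (parent/child = 1, grandparent = 2,
             first cousins = 4).
    common_ancestors: 1 for direct-line or half relationships, 2 when both
             descend from the same couple (full siblings, full cousins).
    """
    return common_ancestors * 0.5 ** meioses

RELATIONSHIPS = {
    "parent/child": (1, 1),
    "grandparent/grandchild": (2, 1),
    "half siblings": (2, 1),
    "full siblings": (2, 2),
    "first cousins": (4, 2),
    "second cousins": (6, 2),
}

for name, (meioses, ancestors) in RELATIONSHIPS.items():
    print(f"{name}: {expected_shared_fraction(meioses, ancestors):.1%}")
```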

Family Tree DNA

There are several genealogical DNA testing companies, most of which are in the US. Of all of these, it is difficult to argue against using Family Tree DNA, based in Houston. They offer the most complete suite of tests and, although not cheap, are competitively priced. Results are stored free for 25 years; they host many genealogical test projects, manage the largest DNA databases and are in partnership with the National Geographic Genographic Project.

Y-DNA testing is offered at three levels with 37, 67 and 111 markers. If a customer’s Y-DNA STR haplogroup cannot be predicted with 100% confidence, the Backbone SNP deep ancestry test is offered at no charge.  mtDNAPlus is a mid-level maternal line test that includes HVR1 and HVR1+HVR2 matches. mtFullSequence also tests the Coding Region. Family Finder is Family Tree DNA’s version of the autosomal DNA test.

Monday, April 23, 2012

Paleoanthropology, genealogy and the miracle of DNA


So you think that finding your great-great grandparents was tough?


Part One. William Hudson. Latest update 23rd April 2012

Did you know that the human race originated in Africa and from there migrated throughout the world? Or that 95% of modern Europeans fit into one of seven maternal ancestor groups, of ages ranging from 10,000 to 45,000 years? Have you heard of the prehistoric cave-man from the Cheddar Gorge in southwest Britain and a modern history schoolteacher (living only a few miles away) who are descended from the same female ancestor? In kinship terms, they are some degree of cousin, some 315 times removed. Have you heard of the Iceman, found in the Italian Alps in 1991 and dated to about 5,300 years ago, and that he is a proven maternal-line relative of a modern Irish woman? Did you know that there is convincing evidence that the Bantu-speaking Lemba people of southern Africa have at least some Jewish ancestry?

Moving to more recent times, can you imagine that 8% of all males in a vast region of Asia, stretching from the Pacific to the Caspian Sea, are descended from Genghis Khan, the ruler of the Mongol Empire? Did you read about the exhumed bodies in Russia that were proven (with the help of Prince Philip, the Queen of England’s husband) to be members of the executed Tsarist royal family? By now you have probably figured out that the bond linking all these topics is DNA, or deoxyribonucleic acid.

DNA is the genetic material carried by all living things, including us, which allows inheritance of characteristics from one generation to the next. A DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder, the famous double helix. Strands of DNA in the nucleus of the cell, which function in the transmission of hereditary information, are called chromosomes. A gene is the fundamental unit of heredity passed from parent to offspring and consists of a sequence of DNA that occupies a specific location on a chromosome. The DNA sequence within the genes of an organism can change over time, resulting in the creation of a new character or trait not found in the parental type. This is known as a mutation. If we then know how often mutations occur (the rate of mutation), we can compare the DNA of two related organisms and estimate the age of their common ancestor.

There are two primary genetic methods of determining whether you are related to another person, living or deceased, through a common ancestor. This can be on either a genealogical (historical) scale or an archaeological one, depending on the mutation rates applicable to the particular method being used. In both methods, similarities and differences between DNA signatures are identified. These can indicate the time to the Most Recent Common Ancestor (MRCA) of the two individuals or groups.

If the genetic make-up of two individuals is compared, the differences between them (polymorphisms) are due to mutations (changes in the DNA sequence). If these differences can be identified and the rate of mutation is known, then the elapsed time back to a common ancestor can be estimated. The assumption is made that the number of nucleotides (the repeating units in a DNA strand) that differ between two individuals increases with the time elapsed since their last common ancestor. In other words, the more closely they are related, the higher the number of matching nucleotides. Depending on the number of markers (segments of DNA with identifiable locations on a chromosome) tested and the number of matches identified, a probability can be assigned as to how long ago this common ancestor existed.
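The reasoning above can be turned into a rough point estimate. If two people descend from a common ancestor over t generations each, mutations accumulate along both lines of descent, so about 2 × markers × rate × t differences are expected between them; solving for t gives the estimate below. The mutation rate and the 25-year generation length are assumptions chosen only for illustration, and the sketch ignores back-mutations and the wide uncertainty that real probability calculations take into account.

```python
def tmrca_generations(differences: int, markers: int, rate: float) -> float:
    """Point estimate of generations back to the most recent common ancestor.

    Each of the two lines of descent has run for t generations since the MRCA,
    so about 2 * markers * rate * t mutations are expected between them.
    Solving observed differences = 2 * markers * rate * t for t gives the estimate.
    """
    if markers <= 0 or rate <= 0:
        raise ValueError("markers and rate must be positive")
    return differences / (2 * markers * rate)

ASSUMED_RATE = 0.004          # hypothetical average mutations per marker per generation
YEARS_PER_GENERATION = 25     # assumed generation length

g = tmrca_generations(differences=2, markers=37, rate=ASSUMED_RATE)
print(f"{g:.1f} generations, roughly {g * YEARS_PER_GENERATION:.0f} years")
# 6.8 generations, roughly 169 years
```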

The first method is mitochondrial DNA (mtDNA) analysis, which examines mtDNA, a circular strand of DNA found in the mitochondria outside the cell’s nucleus. This method traces ancestors through the maternal line because mitochondrial DNA is passed only from the mother to her children, both sons and daughters. Because mtDNA mutation rates are relatively slow, the test is more often used to study long-term population developments such as human migrations and can reveal details about the distant origins of maternal ancestors.

The second method uses the Y chromosome which is passed only through the male line.  This characteristic allows us to trace a direct genetic line of inheritance from fathers to sons. Because women don’t carry the Y-chromosome, their patrilineal ancestry can be traced only through a DNA sample from a father or brother. The second characteristic that makes the Y chromosome unique is that the information carried on this chromosome is inherited largely intact over time. Unlike other chromosomes, in most instances the genetic material on the Y is not mixed with each new generation. However, during the DNA copying process from one generation to another, small changes or mutations do occasionally occur and it is these mutational differences that allow us to distinguish the Y chromosome of an individual from his ancestor's. Depending on the number of DNA markers tested and the number of matches between individuals, the tests will indicate with a certain degree of probability how long ago their common ancestor existed.

Y-chromosome analysis is generally more suitable for genealogical study, as the faster-mutating Y-DNA patterns persist for hundreds of years whereas the slower-mutating mtDNA patterns last for thousands of years. However, recovering the Y-chromosome from ancient remains is very difficult, whereas it is often possible to recover mtDNA, depending on the conditions of burial, because each cell holds many copies of its mitochondrial DNA but only one Y-chromosome.

A third, solely genealogical, method is relatively recent and is more restricted in application. It tests the autosomal chromosomes. Compared to Y-DNA and mtDNA tests, it is broader (can find matches in any branch of a family and is not limited to just the paternal or maternal lines) but also shallower (only works when people share relatively recent ancestors). Both men and women can take this test.

To first address the archaeological time scale, consider the evolutionary route by which modern humans arrived. The path by which Homo sapiens evolved is much like a family tree. Some branches die out, while others have progeny that continue the line. In some instances, the successful line might be living at the same time as one which later becomes extinct. Homo sapiens evolved about 200,000 years ago, possibly in Ethiopia. At that time there was at least one other, older “cousin” still sharing the earth with us, namely Homo neanderthalensis, who lived in Europe until about 25,000 years ago, around the time of the last Ice Age. Two key questions have kept surfacing. First, are we descended directly from the Neanderthals or, alternatively, did we share common ancestors with them? Second, did humans evolve in one part of the earth and then migrate to the other continents, or did we evolve concurrently on several continents?


DNA analysis takes us closer to the answers to both of these questions. Svante Pääbo of the Max Planck Institute for Evolutionary Anthropology in Leipzig sequenced Neanderthal DNA from bones found in the Vindija cave in Croatia. He and his team concluded that between 1% and 4% of the DNA of people today who live outside Africa came from Neanderthals, the result of interbreeding between them and early modern humans.

Moreover, our species most likely originated in Africa and then migrated throughout the rest of the world (the “out of Africa” theory) rather than simultaneously evolving from a prior hominid in a number of locales (the less likely multi-regional theory). Modern humans evolved relatively recently from a small founding population of a few thousand people living in Africa. 

Groups of people descended from specific early migrations are known as haplogroups, which are usually associated with a geographic region. Both mtDNA and Y-DNA tests provide haplogroup information but use different nomenclatures. A Y-DNA haplogroup is defined as all of the male-line descendants of the single man who first showed a particular genetic mutation. Similarly, an mtDNA haplogroup is defined as all of the female-line descendants of the single woman who first showed a particular genetic polymorphism.

Worldwide, there were probably at least 36 maternal ancestral groups as defined by mtDNA analysis. The “clan mothers” were clearly not the only females alive at the time, but they were the only women to have direct maternal descendants living through to the present day. The other women around, or their descendants, either had no children at all or had only sons, who could not pass on their mtDNA. Mitochondrial DNA analysis through the female line can therefore help identify a subject’s matrilineal ancestral group. For example, one author claims that most inhabitants of Europe are descended from just seven women who arrived on the continent at different times during the last 45,000 years. The claim is based on an analysis of 6,000 mtDNA samples, which found that the seven “ancestral mothers” have strong links to one of three groups in Africa today. Virtually all European populations have representatives of all seven “mothers”. If you have European ancestry, modern DNA sampling can thus place your family into one of these archaic groups. Similar studies are being carried out on other geographic and ethnic groups around the world. Obviously, all the clan mothers had ancestors themselves. Their genealogies show how everyone alive on the planet today can trace their maternal ancestry back to just one woman (“Mitochondrial Eve”), who lived in Africa about 150,000 or more years ago.
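Why only a few clan mothers have surviving maternal lines can be illustrated with a small simulation. The sketch below uses the Wright-Fisher model, a standard simplification in population genetics: the population size stays constant, only women are tracked, and each woman in a new generation picks her mother at random from the previous one. The population size, number of generations and random seed are arbitrary assumptions, not figures from any study.

```python
import random

def surviving_maternal_lines(population: int, generations: int, seed: int = 1) -> list[int]:
    """Return, per generation, how many founding women still have maternal-line descendants.

    Wright-Fisher model: each generation has `population` women, and each one
    picks her mother at random from the previous generation, inheriting her
    mother's founder label (her mtDNA line).
    """
    rng = random.Random(seed)
    lines = list(range(population))   # generation 0: every woman starts her own line
    counts = [len(set(lines))]
    for _ in range(generations):
        lines = [rng.choice(lines) for _ in range(population)]
        counts.append(len(set(lines)))
    return counts

history = surviving_maternal_lines(population=200, generations=1000)
print(history[0], history[10], history[100], history[-1])
```

The count can only stay the same or fall: a line is lost whenever none of a woman's granddaughters-through-daughters carries it on, and lost lines never reappear. Over enough generations the count tends towards a single surviving line, the analogue of “Mitochondrial Eve”.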

Ancestral men were similarly clustered in a relatively small number of groups, perhaps about 18 in total, which can be defined by the genetic signature of their Y-DNA. The men within each of these groups are all ultimately descended from just one man, their “clan father”. Again, these ancestral clan fathers were not the only men around at the time, but they were the only ones to have direct male descendants living today. The other men around at the time, or their descendants, either had no children at all or had only daughters. For example, one study has concluded that most European men alive today are probably descended from five ancient groups of forefathers. Furthermore, 80% of European men inherited their Y-chromosomes from primitive hunter-gatherers who lived up to 40,000 years ago. The remaining 20% of male ancestors are likely to have been migrants who arrived in Europe from the Near East about 10,000 years ago. These clan fathers themselves had male ancestral lines and these ultimately converge on the common paternal ancestor of every man alive today (“Y-chromosomal Adam”). This man is believed to have lived in Africa more than 60,000 years ago.

On a historical or genealogical time-scale, Y-chromosome tests can help determine:

  1. Whether individual males share a common male ancestor (the Most Recent Common Ancestor, or MRCA).  An analysis of the mutations in the Y-chromosome can be used to estimate the degree of separation between the men, expressed as the number of generations since the separation of their lineages occurred.
  2. Whether a set of men with the same or similar surname are directly related through a common ancestor.
  3. How many different common male ancestors are shared by any given male group.
  4. Answers to paternity and name-change uncertainties.
  5. To which broad haplogroup an individual male belongs, possibly including his geographic origins in another continent or country.
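Questions 2 and 3 amount to sorting a group of men into separate lineages. One simple way to sketch this is single-linkage grouping: two men are linked when their haplotypes differ by no more than a chosen number of steps, and everyone connected through a chain of links is treated as one lineage. The haplotypes, member names and the threshold of 2 below are all hypothetical; real projects apply judgment and the testing company's own match guidelines rather than a fixed cut-off.

```python
def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Step-wise distance: total repeat-count difference over markers both men tested."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())

def group_lineages(haplotypes: dict[str, dict[str, int]], max_distance: int) -> list[set[str]]:
    """Single-linkage grouping: men within `max_distance` steps are linked, and
    everyone reachable through a chain of links ends up in the same group."""
    assigned: set[str] = set()
    groups: list[set[str]] = []
    for start in haplotypes:            # dict order keeps the output deterministic
        if start in assigned:
            continue
        assigned.add(start)
        group, to_visit = {start}, [start]
        while to_visit:                 # follow links outward from each new member
            current = to_visit.pop()
            for other in haplotypes:
                if other not in assigned and genetic_distance(
                        haplotypes[current], haplotypes[other]) <= max_distance:
                    assigned.add(other)
                    group.add(other)
                    to_visit.append(other)
        groups.append(group)
    return groups

# Hypothetical four-marker haplotypes for four members of a one-name project.
project = {
    "A": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "B": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
    "C": {"DYS393": 12, "DYS390": 22, "DYS19": 15, "DYS391": 10},
    "D": {"DYS393": 12, "DYS390": 22, "DYS19": 15, "DYS391": 11},
}
groups = group_lineages(project, max_distance=2)   # arbitrary illustrative threshold
print([sorted(g) for g in groups])  # [['A', 'B'], ['C', 'D']]
```

In this made-up project the four men fall into two lineages, suggesting two different common male ancestors.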

The use of mtDNA on a genealogical time scale is rather more challenging due to the slower mutation rate. When mitochondrial DNA sequencing is used for genealogical purposes, the results are usually reported as differences from the revised Cambridge Reference Sequence (rCRS), a corrected version of the first human mtDNA to be completely sequenced. However, as mtDNA is more likely to be preserved in the remains of deceased people, it has been used to resolve historical mysteries such as the Titanic Baby and the identification of the Unknown Soldier, in addition to the case histories mentioned above.

There are now dozens of genealogical “one-name” groups that are pooling resources and building DNA databases of all their members, based on the fact that, at least in many Western societies, both surnames and Y-chromosomes are passed down via the male line. “In a medium resolution test, an exact match on all markers by two men sharing the same surname generally implies that they share a common male ancestor within a genealogically relevant time frame” (Pomery, 2004). Such projects also assume that the surname has a unique origin (it would not work for “Smith”, for example) and that there are few illegitimacies in the pedigree.

Groups of specific ethnicity are also using DNA to determine their ancestry. For example, black Americans are using the latest genetic research to make once-impossible connections to their ancestral homelands. African Ancestry claims it can usually trace at least one family bloodline to specific geographic areas on the African continent. Similarly, several Native American groups have embarked on DNA projects. Trace Genetics is one company that targets the Native American segment of this market.