April 21, 2015

PCA and natural selection

arXiv:1504.04543 [q-bio.PE]

Detecting genomic signatures of natural selection with principal component analysis: application to the 1000 Genomes data

Nicolas Duforet-Frebourg et al.

(Submitted on 8 Apr 2015)

Large-scale genomic data offers the perspective to decipher the genetic architecture of natural selection. To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis. We show that the common Fst index of genetic differentiation between populations can be viewed as a proportion of variance explained by the principal components. Looking at the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) after removal of recently admixed individuals resulting in 850 individuals coming from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 millions obtained with a low-coverage sequencing depth (3X). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and non-coding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). PCA-based statistics retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing project, especially in non-model species for which defining populations can be difficult. Genome scan based on PCA is implemented in the open-source and freely available PCAdapt software.
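The core idea of the abstract, correlating each variant with each principal component, can be illustrated with a short Python sketch. This is not the PCAdapt implementation; the function name `snp_pc_correlations` and the choice to standardize each SNP to unit variance are assumptions made for illustration only.

```python
import numpy as np

def snp_pc_correlations(genotypes, n_components=2):
    """Correlate each SNP with the top principal components.

    genotypes: array of shape (n_individuals, n_snps) holding 0/1/2
    allele counts. Returns an (n_snps, n_components) array of Pearson
    correlations; monomorphic SNPs get NaN.
    """
    G = np.asarray(genotypes, dtype=float)
    n, m = G.shape
    centered = G - G.mean(axis=0)
    sd = centered.std(axis=0)
    polymorphic = sd > 0
    # Standardize polymorphic SNPs to mean 0 and variance 1.
    Z = centered[:, polymorphic] / sd[polymorphic]
    # Left singular vectors are the PC scores, each scaled to unit length.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components]
    # Each column of Z has squared norm n and each score vector has norm 1,
    # so dividing the dot products by sqrt(n) gives Pearson correlations.
    corr = np.full((m, n_components), np.nan)
    corr[polymorphic] = Z.T @ scores / np.sqrt(n)
    return corr
```

SNPs with the largest squared correlation on a given PC are the candidates in such a scan, and no population labels are needed at any point, which is the feature the abstract emphasizes.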


bioRxiv http://dx.doi.org/10.1101/018143

Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia

Kevin J Galinsky et al.

Principal components analysis (PCA) is a widely used tool for inferring population structure and correcting confounding in genetic data. We introduce a new algorithm, FastPCA, that leverages recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using a new test for natural selection based on population differentiation along these PCs, we replicate previously known selected loci and identify three new signals of selection, including selection in Europeans at the ADH1B gene. The coding variant rs1229984 has previously been associated to alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents.
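To give a feel for how top PCs can be approximated without ever forming the individuals-by-individuals matrix, here is a generic randomized-projection sketch in Python. It is not FastPCA itself; the function name, the oversampling of 10 extra directions, and the 4 power iterations are all assumptions chosen for illustration.

```python
import numpy as np

def randomized_top_pcs(genotypes, k, oversample=10, n_iter=4, seed=0):
    """Approximate the top-k PC scores and singular values.

    genotypes: (n_individuals, n_snps) array. Columns are centered here;
    any further normalization is left to the caller. Requires
    k + oversample <= min(n_individuals, n_snps).
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(genotypes, dtype=float)
    Z = Z - Z.mean(axis=0)
    n, m = Z.shape
    # Random test matrix: project the SNP space onto a few random directions.
    Y = Z @ rng.standard_normal((m, k + oversample))
    # Power iterations sharpen the separation between top and remaining PCs;
    # re-orthonormalizing each round keeps the computation numerically stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        Y = Z @ (Z.T @ Q)
    Q, _ = np.linalg.qr(Y)
    # Exact SVD of the small projected matrix, then map back.
    Ub, S, _ = np.linalg.svd(Q.T @ Z, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k]
```

Every step multiplies the data matrix by a thin matrix with only k + oversample columns, so the work grows linearly with the number of individuals, in line with the scaling the abstract describes.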


April 19, 2015

mtDNA of Alaskan Eskimos

AJPA DOI: 10.1002/ajpa.22750

Mitochondrial diversity of Iñupiat people from the Alaskan North Slope provides evidence for the origins of the Paleo- and Neo-Eskimo peoples 

Jennifer A. Raff et al.



All modern Iñupiaq speakers share a common origin, the result of a recent (∼800 YBP) and rapid trans-Arctic migration by the Neo-Eskimo Thule, who replaced the previous Paleo-Eskimo inhabitants of the region. Reduced mitochondrial haplogroup diversity in the eastern Arctic supports the archaeological hypothesis that the migration occurred in an eastward direction. We tested the hypothesis that the Alaskan North Slope served as the origin of the Neo- and Paleo-Eskimo populations further east.

Materials and Methods:

We sequenced HVR I and HVR II of the mitochondrial D-loop from 151 individuals in eight Alaska North Slope communities, and compared genetic diversity and phylogenetic relationships between the North Slope Inupiat and other Arctic populations from Siberia, the Aleutian Islands, Canada, and Greenland.


Mitochondrial lineages from the North Slope villages had a low frequency (2%) of non-Arctic maternal admixture, and all haplogroups (A2, A2a, A2b, D2a, and D4b1a–formerly known as D3) found in previously sequenced Neo- and Paleo-Eskimos and living Inuit and Eskimo peoples from across the North American Arctic. Lineages basal for each haplogroup were present in the North Slope. We also found the first occurrence of two haplogroups in contemporary North American Arctic populations: D2a, previously identified only in Aleuts and Paleo-Eskimos, and the pan-American C4.


Our results yield insight into the maternal population history of the Alaskan North Slope and support the hypothesis that this region served as an ancestral pool for eastward movements to Canada and Greenland, for both the Paleo-Eskimo and Neo-Eskimo populations.


April 13, 2015

Haplogroup G1, Y-chromosome mutation rate and migrations of Iranic speakers

The origin of Iranian speakers is a big puzzle as in ancient times there were two quite different groups of such speakers: nomadic steppe people such as Scythians and settled farmers such as Persians and Medes.

My guess is that the puzzle of Iranian origins will only be solved together with that of their Indo-Aryan brethren and their more distant Indo-European relations.

Clearly, G1 cannot be Proto-Indo-European as it has a rather limited distribution in Eurasia, but it could very well have been a marker of a subset of Indo-Europeans. If it was present in ancestral Iranians, then this would geographically constrain the places where ancestral Iranians were formed.

PLoS ONE 10(4): e0122968. doi:10.1371/journal.pone.0122968

Deep Phylogenetic Analysis of Haplogroup G1 Provides Estimates of SNP and STR Mutation Rates on the Human Y-Chromosome and Reveals Migrations of Iranic Speakers

Oleg Balanovsky et al.

Y-chromosomal haplogroup G1 is a minor component of the overall gene pool of South-West and Central Asia but reaches up to 80% frequency in some populations scattered within this area. We have genotyped the G1-defining marker M285 in 27 Eurasian populations (n = 5,346), analyzed 367 M285-positive samples using 17 Y-STRs, and sequenced ~11 Mb of the Y-chromosome in 20 of these samples to an average coverage of 67X. This allowed detailed phylogenetic reconstruction. We identified five branches, all with high geographical specificity: G1-L1323 in Kazakhs, the closely related G1-GG1 in Mongols, G1-GG265 in Armenians and its distant brother clade G1-GG162 in Bashkirs, and G1-GG362 in West Indians. The haplotype diversity, which decreased from West Iran to Central Asia, allows us to hypothesize that this rare haplogroup could have been carried by the expansion of Iranic speakers northwards to the Eurasian steppe and via founder effects became a predominant genetic component of some populations, including the Argyn tribe of the Kazakhs. The remarkable agreement between genetic and genealogical trees of Argyns allowed us to calibrate the molecular clock using a historical date (1405 AD) of the most recent common genealogical ancestor. The mutation rate for Y-chromosomal sequence data obtained was 0.78×10⁻⁹ per bp per year, falling within the range of published rates. The mutation rate for Y-chromosomal STRs was 0.0022 per locus per generation, very close to the so-called genealogical rate. The “clan-based” approach to estimating the mutation rate provides a third, middle way between direct father-to-son comparisons and using archeologically known migrations, whose dates are subject to revision and of uncertain relationship to genetic events.


Neandertal flutes debunked

Royal Society Open Science DOI: 10.1098/rsos.140022

‘Neanderthal bone flutes’: simply products of Ice Age spotted hyena scavenging activities on cave bear cubs in European cave bear dens

Cajus G. Diedrich

Punctured extinct cave bear femora were misidentified in southeastern Europe (Hungary/Slovenia) as ‘Palaeolithic bone flutes’ and the ‘oldest Neanderthal instruments’. These are not instruments, nor human made, but products of the most important cave bear scavengers of Europe, hyenas. Late Middle to Late Pleistocene (Mousterian to Gravettian) Ice Age spotted hyenas of Europe occupied mainly cave entrances as dens (communal/cub raising den types), but went deeper for scavenging into cave bear dens, or used in a few cases branches/diagonal shafts (i.e. prey storage den type). In most of those dens, about 20% of adult to 80% of bear cub remains have large carnivore damage. Hyenas left bones in repeating similar tooth mark and crush damage stages, demonstrating a butchering/bone cracking strategy. The femora of subadult cave bears are intermediate in damage patterns, compared to the adult ones, which were fully crushed to pieces. Hyenas produced round–oval puncture marks in cub femora only by the bone-crushing premolar teeth of both upper and lower jaw. The punctures/tooth impact marks are often present on both sides of the shaft of cave bear cub femora and are simply a result of non-breakage of the slightly calcified shaft compacta. All stages of femur puncturing to crushing are demonstrated herein, especially on a large cave bear population from a German cave bear den.


April 05, 2015

Biology of Genomes titles

have been announced. A sample of interest is below:

  • Population structure in African-Americans
  • Contrasting patterns in the high-resolution variation of uniparental markers in European populations highlight very recent male-specific expansions
  • Is Sanger sequencing still a gold standard?
  • The time and place of European gene flow into Ashkenazi Jews
  • 65,222 whole genome haplotypes from the Haplotype Reference Consortium and efficient algorithms to use them
  • The expansion of human populations out of Africa might have led to the progressive build-up of a recessive mutation load
  • An early modern human with a recent Neandertal ancestor
  • Great ape Y chromosome diversity reflects social structure and sex-biased behaviours
  • Theoretical analysis indicates human genome is not a blueprint but a storage of genes, and human oocytes have an instruction
  • Modeling population size changes leads to accurate inference of sex-biased demographic events
  • Exploring population structure through large pedigrees
  • Better, faster, stronger—Mixed models and PCA in the year 2015
  • Denisovan ancestry in East Eurasian and Native American populations
  • Measuring the rate and heritability of aging in Sardinians using pattern recognition
  • Dog diversity is shaped by a Central Asian origin followed by geographical isolation and admixture
  • Comparative analysis of the Y chromosome genomes of greater apes
  • Genomic analysis of ‘Paleoamerican relicts’ reveals close ancestry with Native Americans
  • Analysis of genetic history of Siberian and Northeastern European populations

April 04, 2015

In search of the source of Denisovan ancestry

bioRxiv http://dx.doi.org/10.1101/017475

Denisovan Ancestry in East Eurasian and Native American Populations.

Pengfei Qin, Mark Stoneking

Although initial studies suggested that Denisovan ancestry was found only in modern human populations from island Southeast Asia and Oceania, more recent studies have suggested that Denisovan ancestry may be more widespread. However, the geographic extent of Denisovan ancestry has not been determined, and moreover the relationship between the Denisovan ancestry in Oceania and that elsewhere has not been studied. Here we analyze genome-wide SNP data from 2493 individuals from 221 worldwide populations, and show that there is a widespread signal of a very low level of Denisovan ancestry across Eastern Eurasian and Native American (EE/NA) populations. We also verify a higher level of Denisovan ancestry in Oceania than that in EE/NA; the Denisovan ancestry in Oceania is correlated with the amount of New Guinea ancestry, but not the amount of Australian ancestry, indicating that recent gene flow from New Guinea likely accounts for signals of Denisovan ancestry across Oceania. However, Denisovan ancestry in EE/NA populations is equally correlated with their New Guinea or their Australian ancestry, suggesting a common source for the Denisovan ancestry in EE/NA and Oceanian populations. Our results suggest that Denisovan ancestry in EE/NA is derived either from common ancestry with, or gene flow from, the common ancestor of New Guineans and Australians, indicating a more complex history involving East Eurasians and Oceanians than previously suspected.


March 30, 2015

Ice age Europeans on the brink of extinction

Ice-age Europeans roamed in small bands of fewer than 30, on brink of extinction (Horizon magazine)
In some cases, small bands of potentially as few as 20 to 30 people could have been moving over very large areas, over the whole of Europe as a single territory, according to Professor Ron Pinhasi, principal investigator on the EU-funded ADNABIOARC project.

This demographic model is based on new evidence that suggests populations were much smaller than is generally thought to be a stable size for healthy reproduction, usually around 500 people. Such small groupings may have led to reduced fitness and even extinctions.

‘As an archaeologist and anthropologist, I was quite shocked to see how limited, how small the population numbers were. You know, shockingly small,’ said Prof. Pinhasi, based at University College Dublin, Ireland.


Prof. Pinhasi’s team has found that the genomes sequenced from hunter-gatherers from Hungary and Switzerland between 14 000 to 7 500 years ago are very close to specimens from Denmark or Sweden from the same period.

These findings suggest that genetic diversity between inhabitants of most of western and central Europe after the ice age was very limited, indicating a major demographic bottleneck triggered by human isolation and extinction during the ice age.

‘We’re starting to be able to reconstruct the actual dynamics of migrations and colonisation of the continent by modern humans and that’s never been done before the genomic era,’ explained Prof. Pinhasi.

He believes that early humans crossed the continent in small groups that were cut off while the ice was at its peak, then successively dispersed and regrouped over thousands of years, with dwindling northern populations invigorated by humans arriving from the south, where the climate was better.

‘You see a real reduction in population numbers and diversity, so you see the few lineages that probably split or separated before the ice age, and then stayed isolated during the ice age,’ he said. ‘Some time after the ice age, they kind of re-emerge, or disperse, and get together, as we see new contributions to European lineages from Asia and in particular the Near East.’
The last couple of statements are interesting because they hint at the post-glacial recolonization of Europe. So far, we are in the dark about what happened in Europe between the time of Kostenki and 8kya. Hopefully another interesting study is on its way to shed some light on the latter part of this time interval.

March 28, 2015

Afanasievo, Okunev, Andronovo, Sintashta DNA?

A reader alerts me to this article in Russian, but you can use Google Translate to get the gist of it. Some interesting bits (note that "pit"=Yamna):
I can not ignore the question I now have is particularly exciting - the origin of the Indo-Europeans. Community Indo-Europeists animatedly discussing just appeared as a preprint work of David Raika and his colleagues discovered by studying the genomes of people Neolithic and Bronze Age that a decisive influence on the genetic landscape of Europe has had a migration of people pit culture to the north and west in the middle of the III millennium. BC .e. As a result, according to geneticists, there was a population associated with the Corded Ware culture, and from it are the origin of the later Indo-European. By the same conclusions about the same time came the other team's leading geneticists led by Eske Villerslevom.

A steppe, we thought had long been a special world, and differs sharply from the Middle East, and from the European. Migration from there - so it seemed - were mainly directed not to the west and to the east, along the steppes, in the direction of Central Asia, which the ancient Indo-Europeans, Afanasiev media culture (descendants of the people of the pit culture or their ancestors steppe) reached no later turn IV- III millennium BC. It is now confirmed and the group Villersleva.

By the way, it also happens that paleoanthropologists prompted geneticists way of research - and turned out to be right. As it happens, for example, with native Okunevskaya culture of South Siberia. When 20 years ago, we found that craniologically (by a combination of traditional measurement and we proposed new informative features of the structure of the cranial sutures and holes) okunevtsy - "cousins" of American Indians, few believed us. Firstly, in okunevtsah ever seen Caucasoid-Mongoloid Métis like the Kazakhs, and secondly, the ancestors of the Indians withdrew from Siberia to the New World at least 10 thousand. Before the Yenisey there Okunevskaya culture.

Eske Willerslev Now and his colleagues have fully confirmed our conclusion. They confirmed the close relationship between the carriers and the pit Afanasiev cultures and migration ancestors sintashtintsev and Andronov from Europe in the Urals and further to Siberia - but this is already a long time, few archaeologists and anthropologists doubted.
I hope more details will appear soon on what promises to be a very interesting new study. The author seems to be referring to his theory of a relationship between Okunev and Amerindians, and I'm wondering if this is simply "Ancient North Eurasian" ancestry or an even more specific link. Any Russian readers who can dig up more information are invited to post in the comments.

March 25, 2015

Icelanders galore

A set of four papers in Nature Genetics today. All open access. Of interest from the Y-chromosome paper:
When this rate was applied to estimate the TMRCA between two Y chromosomes that encompass the oldest known patrilineal bifurcation between any humans (representing haplogroups A00 and A0, with 75 derived mutational differences in 180 kb of XDG sequence) [19], we obtained a maximum-likelihood estimate [21] of 239,000 years ago and a 95% CI of 188,000–296,000 years ago (174,000–321,000 years ago when incorporating the 95% CI of our mutation rate).
This seems similar to the 254kya estimated by Karmin et al.
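As a back-of-the-envelope check on the quoted figure, one can divide the number of differences by twice the sequence length times the rate. The Python sketch below assumes the combined rate of 8.71 × 10⁻¹⁰ per position per year quoted in the Y-chromosome abstract below; the function name is hypothetical, and this naive point estimate is not the paper's maximum-likelihood procedure.

```python
def tmrca_years(pairwise_differences, sequence_length_bp, rate_per_bp_per_year):
    """Point estimate of the time to the most recent common ancestor.

    Differences accumulate along both lineages since the split, hence
    the factor of 2 in the denominator.
    """
    return pairwise_differences / (2 * sequence_length_bp * rate_per_bp_per_year)

# 75 differences in 180 kb of sequence, rate taken from the abstract (assumed
# to be the one the authors applied).
estimate = tmrca_years(75, 180_000, 8.71e-10)
print(f"{estimate:,.0f} years")  # roughly 239,000 years
```

The naive calculation lands on essentially the same value as the quoted estimate of 239,000 years; the confidence interval requires the full likelihood treatment and is not reproduced here.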

Nature Genetics (2015) doi:10.1038/ng.3247

Large-scale whole-genome sequencing of the Icelandic population 

Daniel F Gudbjartsson et al.

Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.


Nature Genetics (2015) doi:10.1038/ng.3171

The Y-chromosome point mutation rate in humans

Agnar Helgason et al.

Mutations are the fundamental source of biological variation, and their rate is a crucial parameter for evolutionary and medical studies. Here we used whole-genome sequence data from 753 Icelandic males, grouped into 274 patrilines, to estimate the point mutation rate for 21.3 Mb of male-specific Y chromosome (MSY) sequence, on the basis of 1,365 meioses (47,123 years). The combined mutation rate for 15.2 Mb of X-degenerate (XDG), X-transposed (XTR) and ampliconic excluding palindromes (rAMP) sequence was 8.71 × 10⁻¹⁰ mutations per position per year (PPPY). We observed a lower rate (P = 0.04) of 7.37 × 10⁻¹⁰ PPPY for 6.1 Mb of sequence from palindromes (PAL), which was not statistically different from the rate of 7.2 × 10⁻¹⁰ PPPY for paternally transmitted autosomes [1]. We postulate that the difference between PAL and the other MSY regions may provide an indication of the rate at which nascent autosomal and PAL de novo mutations are repaired as a result of gene conversion.
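The figures in this abstract also imply an average paternal generation interval and a per-generation rate. The short Python sketch below only rearranges the numbers quoted above; the function names are hypothetical, and the per-generation value is a simple conversion, not a figure reported in the abstract.

```python
def years_per_meiosis(total_years, meioses):
    """Average number of years spanned by one father-to-son transmission."""
    return total_years / meioses

def rate_per_generation(rate_per_year, generation_years):
    """Convert a per-position-per-year rate to per-position-per-generation."""
    return rate_per_year * generation_years

generation = years_per_meiosis(47_123, 1_365)
per_generation = rate_per_generation(8.71e-10, generation)
print(f"{generation:.1f} years per generation")            # about 34.5
print(f"{per_generation:.2e} per position per generation")  # about 3.0e-08
```

So the combined rate corresponds to roughly 3 × 10⁻⁸ mutations per position per generation, given the long average male generation interval of about 34.5 years in these genealogies.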


Nature Genetics (2015) doi:10.1038/ng.3246

Loss-of-function variants in ABCA7 confer risk of Alzheimer's disease

Stacy Steinberg et al.

We conducted a search for rare, functional variants altering susceptibility to Alzheimer's disease that exploited knowledge of common variants associated with the same disease. We found that loss-of-function variants in ABCA7 confer risk of Alzheimer's disease in Icelanders (odds ratio (OR) = 2.12, P = 2.2 × 10⁻¹³) and discovered that the association replicated in study groups from Europe and the United States (combined OR = 2.03, P = 6.8 × 10⁻¹⁵).


Nature Genetics (2015) doi:10.1038/ng.3243

Identification of a large set of rare complete human knockouts 

Patrick Sulem et al.

Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF less than 2%, 95% confidence interval (CI) = 10–261).


Long Live the 25th March 1821

March 21, 2015

Ancient mtDNA from cis-Baikal area

Russian Journal of Genetics: Applied Research January 2015, Volume 5, Issue 1, pp 26-32

Mitochondrial DNA diversity in the gene pool of the Neolithic and Early Bronze Age Cisbaikalian human population

R. O. Trapezov, A. S. Pilipenko, V. I. Molodin

This paper presents the results of a study of a mitochondrial DNA sample (N = 15) from the remains of representatives of the Neolithic and Early Bronze Age (VI–III millennia BC) Cisbaikalian human population. It was found that the mitochondrial gene pool of the ancient population under study contains lineages of East Eurasian haplogroups D, G2a, C, Z, and F1b. The results of the comparative analysis of the group under study with ancient and modern Eurasian populations suggest that the development of autochthonous East Eurasian genetic components was the main mechanism of the formation of the population of the Baikal region. Genetic contacts with populations of neighboring regions of Central Asia also contributed to the formation of the gene pool of the Cisbaikalian population.


March 20, 2015

Campanian Ignimbrite and Neandertal demise

Geology doi:10.1130/G36514.1

Campanian Ignimbrite volcanism, climate, and the final decline of the Neanderthals

Benjamin A. Black, Ryan R. Neely, and Michael Manga

The eruption of the Campanian Ignimbrite at ca. 40 ka coincided with the final decline of Neanderthals in Europe. Environmental stress associated with the eruption of the Campanian Ignimbrite has been invoked as a potential driver for this extinction as well as broader upheaval in Paleolithic societies. To test the climatic importance of the Campanian eruption, we used a three-dimensional sectional aerosol model to simulate the global aerosol cloud after release of 50 Tg and 200 Tg SO2. We coupled aerosol properties to a comprehensive earth system model under last glacial conditions. We find that peak cooling and acid deposition lasted one to two years and that the most intense cooling sidestepped hominin population centers in Western Europe. We conclude that the environmental effects of the Campanian Ignimbrite eruption alone were insufficient to explain the ultimate demise of Neanderthals in Europe. Nonetheless, significant volcanic cooling during the years immediately following the eruption could have impacted the viability of already precarious populations and influenced many aspects of daily life for Neanderthals and anatomically modern humans.


March 18, 2015

British origins (Leslie et al. 2015)

The long-awaited paper on the People of the British Isles has just appeared in Nature. I will update this entry with more information.


The authors write:
Consistent with earlier studies of the UK, population structure within the PoBI collection is very limited. The average of the pairwise FST estimates between each of the 30 sample collection districts is 0.0007, with a maximum of 0.003 (Supplementary Table 1).
These are extremely small differences in the European (let alone global) context. So, the British are, overall, a very homogeneous population. This is what led the researchers to use methods such as ChromoPainter/fineSTRUCTURE/GLOBETROTTER that can squeeze out fine-scale population structure by exploiting linkage disequilibrium. Thus, the authors are able to detect 17 main clusters among the British.
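To get a sense of how small an FST of 0.0007 is, here is a minimal Python sketch of the textbook single-locus definition, FST = (H_T − H_S) / H_T, for two equally weighted populations. The function name is hypothetical, and the paper's genome-wide estimator is almost certainly more elaborate; this only illustrates the scale.

```python
import math

def fst_two_populations(p1, p2):
    """Single-locus FST for two equally weighted populations.

    p1, p2: frequencies of the same allele in each population.
    Returns NaN when the allele is fixed or absent in both.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        return math.nan
    return (h_total - h_within) / h_total

print(fst_two_populations(0.49, 0.51))  # close to 0.0004
```

Under this definition, for an allele at an average frequency of 50%, FST equals the squared frequency difference, so an FST of 0.0007 corresponds to two districts differing by only about 2.6 percentage points.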

Most of the clusters are geographical, but some span different regions (e.g., the "yellow circle" cluster). The elephant in the room is the "red square" cluster which spans Central/South England. The authors write:
There is a single large cluster (red squares) that covers most of central and southern England and extends up the east coast. Notably, even at the finest level of differentiation returned by fineSTRUCTURE (53 clusters), this cluster remains largely intact and contains almost half the individuals (1,006) in our study.
The authors then tried to infer the ancestry of the British clusters in terms of continental European clusters, which is to be published separately. In the plot on the right, you see the British clusters (columns) and their continental European sources (rows). The authors observe that clusters that are widely represented in Britain are likely to be older, while those that are missing in some populations are likely to be younger, because they didn't have the chance to spread across Britain. For example, a couple of Norwegian clusters are strongly represented in the Orkney islands, and these are likely to reflect Viking colonization.

The authors draw conclusions on several historical episodes of British history. The big one is the extent of Anglo-Saxon ancestry:
After the Saxon migrations, the language, place names, cereal crops and pottery styles all changed from that of the existing (Romano-British) population to those of the Saxon migrants. There has been ongoing historical and archaeological controversy about the extent to which the Saxons replaced the existing Romano-British populations. Earlier genetic analyses, based on limited samples and specific loci, gave conflicting results. With genome-wide data we can resolve this debate. Two separate analyses (ancestry profiles and GLOBETROTTER) show clear evidence in modern England of the Saxon migration, but each limits the proportion of Saxon ancestry, clearly excluding the possibility of long-term Saxon replacement. We estimate the proportion of Saxon ancestry in Cent./S England as very likely to be under 50%, and most likely in the range of 10–40%.
Two other details are the lack of Danish Viking ancestry in England:
In particular, we see no clear genetic evidence of the Danish Viking occupation and control of a large part of England, either in separate UK clusters in that region, or in estimated ancestry profiles, suggesting a relatively limited input of DNA from the Danish Vikings and subsequent mixing with nearby regions, and clear evidence for only a minority Norse contribution (about 25%) to the current Orkney population.
And, the absence of a unified pre-Saxon "Celtic" population. What seems to unify "Celts" is lower levels/absence of the Saxon influence, rather than belonging to a homogeneous "Celtic" population:
We saw no evidence of a general ‘Celtic’ population in non-Saxon parts of the UK. Instead there were many distinct genetic clusters in these regions, some amongst the most different in our study, in the sense of being most separated in the hierarchical clustering tree in Fig. 1. Further, the ancestry profile of Cornwall (perhaps expected to resemble other Celtic clusters) is quite different from that of the Welsh clusters, and much closer to that of Devon, and Cent./S England. However, the data do suggest that the Welsh clusters represent populations that are more similar to the early post-Ice-Age settlers of Britain than those from elsewhere in the UK.
Unfortunately, the authors have decided not to make their data publicly available. This will keep the research out of the hands of many people who would be interested in analyzing the data. I can already guess the disappointment of people of British ancestry from around the world who have a genealogical interest in tracing their British ancestors to particular areas of the UK. The data is apparently deposited in the EGA archive, where access requires red tape and is limited to institutional researchers. Thus, this data, perhaps the richest genetic survey of any country to date, will not be fully utilized to further science.

Nature 519, 309–314 (19 March 2015) doi:10.1038/nature14230

The fine-scale genetic structure of the British population

Stephen Leslie et al.

Fine-scale genetic variation between human populations is interesting as a signature of historical demographic events and because of its potential for confounding disease studies. We use haplotype-based statistical methods to analyse genome-wide single nucleotide polymorphism (SNP) data from a carefully chosen geographically diverse sample of 2,039 individuals from the United Kingdom. This reveals a rich and detailed pattern of genetic differentiation with remarkable concordance between genetic clusters and geography. The regional genetic differentiation and differing patterns of shared ancestry with 6,209 individuals from across Europe carry clear signals of historical demographic events. We estimate the genetic contribution to southeastern England from Anglo-Saxon migrations to be under half, and identify the regions not carrying genetic material from these migrations. We suggest significant pre-Roman but post-Mesolithic movement into southeastern England from continental Europe, and show that in non-Saxon parts of the United Kingdom, there exist genetically differentiated subgroups rather than a general ‘Celtic’ population.


March 15, 2015

Natural selection and ancient European DNA

A new preprint on the bioRxiv studies the same data as the recent Haak et al. paper, but focuses on natural selection in Europe. Until recently, selection could only be studied by looking at modern populations, but since selection is genetic change over time effected by the environment, it's possible that studies like this will be the norm in the future.

The new study seems to confirm the results of Wilde et al. on steppe groups, as the Yamnaya had a very low frequency of the HERC2 derived "blue eye" allele and a lower frequency of the SLC45A2 "light skin" allele than any modern Europeans. The Yamnaya seem to have been fixed for the other "light skin" allele, SLC24A5, which seems to have been at high frequency in all ancient groups except the "Western Hunter Gatherers".

It seems that light pigmentation traits had already existed in pre-Indo-European Europeans (both farmers and hunter-gatherers), and so long-standing philological attempts to correlate them with the arrival of light-pigmented Indo-Europeans from the steppe (or indeed anywhere), and to contrast them with darker pre-Indo-European inhabitants of Europe, were misguided. If anything, it seems that the "fairest of them all" were the Scandinavian hunter-gatherers, while light and dark pigmentation traits were present in various combinations in Neolithic farmers and Western Hunter Gatherers.

It also seems that both the theory that lactose tolerance started with LBK farmers and the theory that it came to Europe from milk-drinking steppe Indo-Europeans were wrong, as this trait seems to be altogether absent in European hunter-gatherers, farmers, and Yamnaya, and makes a very timid appearance in the Late Neolithic/Bronze Age before shooting up in frequency to the present.

Another new development is the ability to predict "genetic height" from ancient DNA. I think this may be a little bit superfluous as you can predict "actual height" by measuring long bone lengths. On the other hand, actualized height depends not only on genetics but also on diet, disease, etc., so it's useful to look at genetic changes in such polygenic traits directly.
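To make the idea of "genetic height" concrete, here is a minimal sketch of a generic additive score: for each height-associated variant, multiply the number of copies of the effect allele an individual carries by that variant's estimated effect, and add everything up. This is not the method used in the preprint, which I have not reproduced here; the function name, SNP identifiers, and effect sizes are all made up for illustration.

```python
def genetic_height_score(dosages, effects):
    """Additive score: sum of (effect size x effect-allele count) per variant.

    dosages: dict mapping SNP id -> 0, 1 or 2 copies of the effect allele,
             or None when the genotype is missing (common in ancient DNA).
    effects: dict mapping SNP id -> per-allele effect on height (hypothetical).
    Returns the score and the number of variants that contributed to it.
    """
    total = 0.0
    used = 0
    for snp, beta in effects.items():
        dosage = dosages.get(snp)
        if dosage is None:
            # Skip variants with no genotype call instead of guessing.
            continue
        total += beta * dosage
        used += 1
    return total, used


# Hypothetical inputs, purely illustrative.
example_effects = {"snpA": 0.5, "snpB": -0.25, "snpC": 1.0}
example_dosages = {"snpA": 2, "snpB": 1, "snpC": None}
score, n_used = genetic_height_score(example_dosages, example_effects)
```

A score like this only captures the genetic part of the trait, which is the point made above: bones record the height that was actually achieved, while the score is unaffected by diet or disease.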

A big surprise was the presence of the derived EDAR allele in Swedish hunter-gatherers. This allele is very rare in modern Europeans and seems to have pleiotropic effects in East Asians. This raises the question of why this allele, which was so successful in East Asians, never "took hold" in Europeans. One possibility is that it never provided an advantage to Europeans (I don't think anyone really knows what it's actually good for). Another is that Swedish hunter-gatherers simply didn't contribute much ancestry to modern Europeans and so the allele never got the chance to rise in frequency by much.

bioRxiv http://dx.doi.org/10.1101/016477

Eight thousand years of natural selection in Europe

Iain Mathieson et al.

The arrival of farming in Europe beginning around 8,500 years ago required adaptation to new environments, pathogens, diets, and social organizations. While evidence of natural selection can be revealed by studying patterns of genetic variation in present-day people, these patterns are only indirect echoes of past events, and provide little information about where and when selection occurred. Ancient DNA makes it possible to examine populations as they were before, during and after adaptation events, and thus to reveal the tempo and mode of selection. Here we report the first genome-wide scan for selection using ancient DNA, based on 83 human samples from Holocene Europe analyzed at over 300,000 positions. We find five genome-wide signals of selection, at loci associated with diet and pigmentation. Surprisingly in light of suggestions of selection on immune traits associated with the advent of agriculture and denser living conditions, we find no strong sweeps associated with immunological phenotypes. We also report a scan for selection for complex traits, and find two signals of selection on height: for short stature in Iberia after the arrival of agriculture, and for tall stature on the Pontic-Caspian steppe earlier than 5,000 years ago. A surprise is that in Scandinavian hunter-gatherers living around 8,000 years ago, there is a high frequency of the derived allele at the EDAR gene that is the strongest known signal of selection in East Asians and that is thought to have arisen in East Asia. These results document the power of ancient DNA to reveal features of past adaptation that could not be understood from analyses of present-day people.

Link (pdf)

March 14, 2015

Bottleneck in human Y-chromosomes in the last 10,000 years

A very exciting new paper has just been published in Genome Research on 456 full sequence Y-chromosomes from around the world. The authors date the MRCA of Y-chromosomes ("Y chromosome Adam") to 254 (95% CI 192–307) kya, find coalescences of major non-African haplogroups to 47–52 kya (which clearly corresponds to the Upper Paleolithic revolution), but also infer a second bottleneck that occurred in the last 10 thousand years.

The contrast (left) between mtDNA (red) and Y-chromosome (yellow) coalescences is quite noticeable. The little "dip" in the yellow curve on the right of many of the regional plots corresponds to the second bottleneck event, which was really not "one" event, but rather shows that many modern men descend from a small number of "patriarchs" of the Neolithic and Bronze Age worlds. The "when" of the dip is important:

Most human mythologies contain stories of "first men" and eponymous founders of nations; these were often ridiculed in recent times as invented stories whose purpose was to engender social cohesion through a story of shared descent. But, now it seems that these stories were at least in part true, and such ultra-prolific patriarchs do indeed stand at the beginning of many later lines of descent.

Figure 1 from the paper is an extremely useful overview of human Y-chromosome phylogeny.

The split between DT and B2'5 is placed at ~100 thousand years ago. This corresponds perfectly (in my opinion) to the Out-of-Africa event from which most Eurasian men are probably descended. For the next thirty thousand years, Eurasians were probably confined to Arabia and the Middle East. The next major event is the foundation of the unambiguously Eurasian CT lineage ~70 thousand years ago (coinciding with the Toba eruption and the onset of super-arid conditions at the start of MIS 4). And the final event of the "grand picture" of Eurasian prehistory was the Upper Paleolithic at ~50 thousand years ago, when Eurasians finally got the "tech" to exhibit complex behavior, invent new tools, conquer diverse environments and ultimately colonize the entire planet while driving Eurasian archaics to extinction.

An important detail in this grand picture is the fact that the authors estimate the coalescence date between D and E1'4 to ~70,000 years, coinciding with the C/GT split from the same time. These lineages are all found in Eurasia, but only E1'4 is found in Africa. I think this points clearly to back-migration of Eurasians into Africa, perhaps as environmental refugees following the c. 70kya Arabian ecological catastrophe. In any case, the fact that two separate Eurasian-specific lineages (CT and D) coalesce to ~70kya destroys the theory that the spread of modern humans into Eurasia happened together with UP-related technologies, a theory that was already on its last legs given the evidence that pre-UP admixture with Neandertals had taken place (as such admixture would have been impossible if pre-UP Eurasians were not already present outside of Africa at that time).

Genome Research doi:10.1101/gr.186684.114

A recent bottleneck of Y chromosome diversity coincides with a global change in culture

Monika Karmin et al.

It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50–100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192–307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47–52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.