It has become the geneticist’s version of, “What, me not inclusive? But I have a black friend!” In the case of modern studies of DNA, researchers called to task for studying mainly European populations often defend themselves by pointing out that they included some Yoruba (or Khoisan or Bantu or other African) DNA in their analysis, too.

Research presented at the annual meeting of the American Society of Human Genetics (ASHG) on Tuesday and Wednesday shows how “woefully inadequate” such tokenism is, said Dr. Neil Hanchard of Baylor College of Medicine, who led one of the studies.

He and his colleagues sequenced the genomes of 426 people from 13 African countries and 50 ethnolinguistic groups from across the continent, producing “an unprecedented, in-depth cataloging of the genetic diversity of people across the African continent,” said Dr. Kiran Musunuru, a medical geneticist at the University of Pennsylvania and chair of the ASHG program committee who was not involved in the study.

The research was done as part of Human Heredity and Health in Africa (H3Africa), a consortium that was launched in 2013 to remedy the underrepresentation of that continent in genetics research.

“There is so much genetic diversity across the African continent, if you sample from just one or two ethnolinguistic groups you know something about one or two groups,” Hanchard said. Because most studies linking a DNA variant to a disease or other trait have been done on Europeans, and the human “reference genome” — hailed as the blueprint of humankind — is missing millions of DNA sequences that are found only in Africans, people of African ancestry are at risk of not benefitting from gene-based personalized medicine.

The DNA sequences from the 50 groups told stories of both history and health. About 14% of the genome of the Berom of Nigeria, for instance, traces to East Africa, suggesting migrations 50 to 70 generations ago from there to central Nigeria. “That differentiates them from the Yoruba and other West African groups,” who have been in place for much longer, Hanchard said. The finding also suggests that the Berom hail from Chad.

The genomes of the 50 groups, the most ever analyzed in such detail in a single study, each had unique genetic variants, underlining why the Human Genome Project does such a poor job of representing Africa. Although some 70% of the sequence in the “reference genome” the project produced came from an African-American man living in Buffalo, N.Y., a single individual of African ancestry can no more encompass the diversity of African DNA than one page of Shakespeare can show the diversity of Western literature.

Among the genetic variants in the 50 African groups that differed most from non-Africans’ were those involved in infectious disease, particularly viral infection. Evolution selected for DNA variants that gave their owners the best chance to survive and reproduce in the face of viral threats. In that way, “infectious diseases left behind unique imprints in people’s genomes,” said Musunuru.

The H3Africa study confirmed that failing to study African genomes can hinder the use of genetic medicine in people of African descent. It found more than 3 million DNA variants that had not been seen in other, mostly European, genomes. Groups from Botswana and Mali each had at least 6,000 novel variants, for instance. If a clinical test found one of the variants in a patient, its rareness might lead scientists to conclude that it causes disease, since “one of the things often used to infer pathogenicity is that a variant is very rare,” Hanchard said.

But that conclusion would likely be wrong. “With better information on African genomes, we can say, this variant is common among Africans and so is probably not a big deal,” Hanchard said.
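
The reasoning Hanchard describes can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the H3Africa study: the 1% rarity cutoff, the population labels, and the example frequencies are all hypothetical, chosen only to show how the same variant can look rare or common depending on which populations the database covers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical cutoff: below this frequency a variant is treated as "rare".
RARE_THRESHOLD = 0.01


@dataclass
class Variant:
    name: str
    # Fraction of people carrying the variant, keyed by population label.
    freq_by_population: Dict[str, float]


def looks_rare(variant: Variant, populations: List[str],
               threshold: float = RARE_THRESHOLD) -> bool:
    """Return True if the variant is below the threshold in every listed
    population that has data. A variant with no data at all also counts
    as rare, which mirrors the blind spot described in the article."""
    freqs = [variant.freq_by_population[p]
             for p in populations
             if p in variant.freq_by_population]
    return all(f < threshold for f in freqs)


# Made-up example: rare in a European database, common in an African one.
example = Variant("hypothetical_variant",
                  {"european": 0.001, "african": 0.05})

# Judged against European data only, the variant is flagged as rare.
rare_in_europe_only = looks_rare(example, ["european"])

# With African data included, it no longer looks rare.
rare_with_african_data = looks_rare(example, ["european", "african"])
```

In this toy case the variant is flagged as rare when only the European frequencies are consulted, and the flag disappears once the African frequencies are added, which is the correction Hanchard says better data makes possible.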

Half of the participants in the H3Africa study had more than seven “reportedly pathogenic” (according to databases used in clinical genetics) variants. Some of those variants, which are rare among Europeans, were 10 times more common in Africans than European-derived databases indicate. That suggests the Africa-only variants, despite being rare by Western standards, “are probably not causing disease in this population,” said geneticist Steven Salzberg of Johns Hopkins University, who last year reported that the reference genome is missing vast amounts of African DNA.

In a related study, presented at ASHG on Wednesday, scientists at the University of California, San Francisco, took another stab at making the reference genome better reflect global genetic diversity. They sequenced 220 genomes from around the world, and found unique DNA sequences totaling 7 million base pairs (the A’s, T’s, C’s, and G’s that make up the DNA molecule), where “unique” means it’s missing from the reference genome.

Seven million isn’t much in the 3 billion base pairs of the human genome, but it’s not nothing, and it is likely only the tip of the iceberg of DNA sequences found in non-Europeans.

“It turns out that we’re still missing important pieces” of the human genome, Musunuru said. “Right now, if you sequence a person’s genome, you compare it to that [reference] genome, and anything that doesn’t match” — such as bits of DNA that underrepresented groups have and Europeans don’t — “is thrown out.” But the discarded sequences “can have a big impact on health. Sequencing of hundreds of people from around the world allows us to fill in the blanks, so that in the future we don’t throw out that important information when we sequence a patient’s genome.”
