De novo gene genesis

Researchers identify primate-specific sequences in the human genome in comparative genomics study

Amy Swinderman
DETROIT, Mich. - In a comparative genomics research project using data already available in the public domain, researchers from Wayne State University (WSU) and the Genome Institute of Singapore have identified primate-specific sequences in the human genome. According to the researchers, who summarized their findings in the July 6 online edition of the Proceedings of the National Academy of Sciences, the study has many implications for the field of genomics research.

According to the researchers, the study, "Global discovery of primate-specific genes in the human genome," offers an explanation for lineage-specific uniqueness that is based on something completely new in evolution, not on changes to old sequences or structures. Perhaps more importantly, the researchers believe the study itself provides an interesting critique of current genomic research methods.

The researchers began their quest to find primate-specific genes by noting that despite the increasing availability of genome and transcriptome sequence data, the genomic basis of primate phenotypic uniqueness remains obscure. According to Dr. Leonard Lipovich, assistant professor of the Center for Molecular Medicine and Genetics and Department of Neurology at WSU's School of Medicine and principal investigator of the study, this challenge is due to multiple factors.

First, searching for non-conserved genes isn't emphasized by any of the major players in genomics research, Lipovich says. Although factors such as segmental duplications and positive selection have received much attention as potential drivers of primate phenotypes, single-copy primate-specific genes are poorly characterized, he says.

"There is a seldom-challenged assumption in the genomics field that functional genes must be broadly evolutionarily conserved and protein-coding," Lipovich says. "You hear a lot about using genome and transcriptome data to look at conserved genes, but investigators tend to ignore genomic intervals outside of those known genes. The efforts that are out there are unimaginative and focus primarily on finding homologs of known protein-coding genes in additional species, not on non-conserved genes and their possible role in the genomic basis of interspecies distinctions."

A second challenge, Lipovich notes, is that too much genomic and transcriptome sequencing is being done without sufficient downstream efforts to analyze the sequence data.

"The fact that we have genome and transcriptome databases is not, by itself, helpful," he notes. "What might be helpful is developing new algorithmic approaches. In addition, these datasets frequently are not put together in a way that can help test specific hypotheses."

The Genome Institute of Singapore's Sen-Kwan Tay, who worked on the study as an extension of a dissertation for his M.Sc. degree in bioinformatics, adds that data on the genomes of humans and our nearest relative, the chimpanzee, show a 99 percent similarity in their sequences. Explanations for the substantial phenotypic differences between the two species include not only sequence differences, but also regulatory and genome structure differences and species-specific indels, Tay says.

"While the genome and transcriptome sequence data provide a lot of what we know about interspecies sequence and genomic structure differences, we still don't understand exactly how, mechanistically, these differences lead to phenotypic differences such as the uniquely higher cognitive capacity in humans, etc.," Tay says.

To address both of these concerns, the researchers screened a catalog of 38,037 human transcriptional units (TUs), compiled from EST and cDNA sequences in conjunction with the FANTOM3 transcriptome project, and interrogated the intersection of transcriptome data and multispecies genome alignments to search for primate-specific genes. Using transcriptome sequencing and transcript-to-genome alignments, the comparative study mapped the human transcripts from FANTOM against the genomes of a number of organisms, including the chimpanzee, to look for evidence of de novo gene genesis.

"We searched for new classes of interspecies differences, specifically entirely new genes in primates, because such genes might provide another explanation for lineage-specific uniqueness that is based on something completely new in evolution, not on changes to old sequences or structures," Tay explains.

The researchers identified 131 TUs from transcribed sequences residing within primate-specific insertions in nine-species sequence alignments and outside of segmental duplications. Exons of 120 (92 percent) of the TUs contained interspersed repeats, indicating that repeat insertions may have contributed to primate-specific gene genesis. The researchers also found that 59 (46 percent) of the primate-specific TUs may encode proteins. Although primate-specific TU transcript lengths were comparable to known human gene mRNA lengths overall, 92 (70 percent) primate-specific TUs were single-exon. Thirty-two (24 percent) primate-specific TUs were localized to subtelomeric and pericentromeric regions. Forty (31 percent) of the TUs were nested in introns of known genes, indicating that primate-specific TUs may arise within older, protein-coding regions. Primate-specific TUs were preferentially expressed in reproductive organs and tissues, consistent with the expectation that emergence of new, lineage-specific genes may accompany speciation or reproduction. Of the 33 primate-specific TUs with human Affymetrix microarray probe support, 21 were differentially expressed in human teratozoospermia.
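
For readers who want a feel for the screening logic, the two filters described above (inside a primate-specific insertion, outside any segmental duplication) can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual pipeline: the `Interval` type, the `screen_primate_specific` function, the single-interval treatment of each TU and the toy coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval; coordinates are 0-based, end-exclusive (assumed)."""
    chrom: str
    start: int
    end: int


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


def overlaps(a, b):
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def screen_primate_specific(tus, insertions, seg_dups):
    """Keep TUs inside a primate-specific insertion and outside every
    segmental duplication. Hypothetical helper: each TU is simplified
    to one interval rather than a set of exons."""
    hits = []
    for name, tu in tus.items():
        in_insertion = any(contains(ins, tu) for ins in insertions)
        in_dup = any(overlaps(tu, dup) for dup in seg_dups)
        if in_insertion and not in_dup:
            hits.append(name)
    return hits


# Toy data, invented for illustration only.
tus = {
    "TU_A": Interval("chr1", 100, 200),  # inside an insertion, no duplication
    "TU_B": Interval("chr1", 500, 600),  # not inside any insertion
    "TU_C": Interval("chr2", 100, 200),  # inside an insertion, but duplicated
}
insertions = [Interval("chr1", 50, 250), Interval("chr2", 50, 250)]
seg_dups = [Interval("chr2", 150, 300)]

candidates = screen_primate_specific(tus, insertions, seg_dups)
print(candidates)
```

In this toy run only TU_A passes both filters. A real screen would work exon by exon against multispecies alignments and would use an interval index rather than the nested loops shown here.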

"This paper suggests the emergence of primate-specific and functional transcripts that are due to de novo insertions, not arising from duplication and subsequent accelerated sequence evolution," Tay says. "By excluding segmental duplications often synonymous with gene genesis, we have also shown that there exist single-copy transcripts which are also unique to primates and presented initial evidence for function for these transcripts. For example, 21 of our 131 primate-specific transcripts were found to be differentially expressed in a separate study on severe teratozoospermia in men. A comparison of our primate-specific transcripts with primate orphan genes identified in a recent paper (Toll-Riera, et al.) shows no overlap, an indication that the global primate-specific transcript catalog is far from saturated and many primate-specific genes are still to be discovered."

The broader implication of the study is that not all genes are necessarily conserved and protein-coding, Tay says.

"There are genes that are 'neither,' but they are interesting because of their recent origin and possibly functional roles in reproduction and behavior," he says. "Such genes need to be included in drug target screens, RNA structure analyses, etc. We need to understand the mechanisms underlying the birth of these insertions, especially their non-repetitive portions."

To accomplish that, researchers will now need to update the set of primate-specific transcripts as new data becomes available, Tay says. This will enable the researchers to confirm that such evolutionary novelties are expressed, he adds.

"Additionally, there is a set of human transcripts which are deleted in chimpanzees but conserved in the rhesus macaque, and possibly other primate genomes," Tay says. "Such gene loss in the chimpanzees may also contribute to the phenotypic differences between them and us."

The study may also serve as a paradigm of how research can be conducted differently, Lipovich says.

"This is such an underrepresented area of research, and one take-home message we have is that people should be looking at publicly available data more," Lipovich says. "We established a generally applicable paradigm for exploiting the union of two publicly available resources: genome-wide sequence alignments and transcriptome data. Our approach was unbiased in that it considered all publicly available human transcriptome data, not just transcriptome data supporting already-known genes. Mapping this transcriptome data onto multispecies genomic alignments enabled us to discover primate-specific genes outside of annotated, known genes."
 
