Inferring horizontal gene transfer

From PLoSWiki

Matt Ravenhall1, Nives Škuncatba, Christophe Dessimoztba*

1To be added

2To be added

* Corresponding author, email: to be added

Horizontal or Lateral Gene Transfer (HGT or LGT) occurs when a host genome obtains foreign DNA in a process that circumvents vertical inheritance. In contrast to vertical transfer, HGT is not strictly intraspecific; the presence of HGT events can therefore complicate investigations of evolutionary history. Furthermore, HGT events are often implicated in the transfer of antibiotic resistance and pathogenicity.

Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based (parametric) methods search for deviations from the genomic average whereas evolutionary history-based (phylogenetic) approaches identify genes whose evolutionary history significantly differs from that of the host species. Benchmarking for both types of methods typically relies upon simulated genomes. Currently, different inference methods tend to identify conflicting sets of HGT events, and it can be difficult to ascertain all but very simple HGT events.



Horizontal gene transfer was first observed in 1928, in Frederick Griffith's experiment: showing that virulence was able to pass from virulent to non-virulent strains of Streptococcus pneumoniae, Griffith demonstrated that genetic information can be horizontally transferred between bacteria via a mechanism known as transformation.[1] Similar observations in the 1940s[2] and 1950s[3] showed evidence that conjugation and transduction are additional mechanisms of horizontal gene transfer.

To infer HGT events, which may not necessarily result in changes of externally visible traits, most contemporary methods are based on analyses of genomic data. These methods can be broadly separated into two groups: parametric and phylogenetic methods. Parametric methods search for sections of a genome that differ significantly from the genomic average in properties such as GC content or codon usage. Phylogenetic methods examine the evolutionary histories of the genes involved and identify conflicting phylogenies. Phylogenetic methods can be further divided into those that reconstruct and compare phylogenetic trees explicitly, and those that use surrogate measures in place of the phylogenetic trees.

A considerable advantage of parametric methods is that they only require the genome under study to infer HGT events. However, because they rely on the uniformity of the host's signature to infer HGT events, not accounting for the host's intra-genomic variability will result in overpredictions—flagging native segments as possible HGT events.[4] Similarly, the transferred segments need to exhibit the donor's signature. However, this might not be the case for ancient HGT events: the transferred segments are subject to the same mutational processes as the rest of the host genome so their distinct signatures ameliorate.[5]

Phylogenetic methods benefit from many recently sequenced genomes and can identify the donor in the HGT event. However, their use in detecting HGT is fraught with caveats: the reference tree topology is not trivial to obtain; the conflicting phylogenies can be the result of an unrecognized paralogy or a gene loss, as well as a HGT event; and computational costs of reconstructing many gene/species trees can be prohibitively expensive.

Because of their complementary approaches (and often non-overlapping sets of HGT candidates), combining predictions from parametric and phylogenetic methods could provide a more comprehensive set of HGT candidate genes. In fact, combining different parametric methods has already been shown to significantly improve the quality of predictions.[6][7] Moreover, in the absence of a comprehensive set of true horizontally transferred genes, discrepancies between different methods[8][9] might be resolved through combining parametric and phylogenetic methods.

Highlighting the most successful methods provides a good basis for improvements of HGT detection methods. Simulated genomes with known non-native insertions can provide a powerful evaluation tool. A database that collates predictions of parametric and phylogenetic methods can be used for benchmarking of HGT detection methods. Combined with the increases in the available computation power, the methodological improvements will result in further advancement of HGT detection.

Parametric methods

Figure 1. GC content of coding regions compared to the genome size for selected Bacteria. Full data can be browsed. The slight positive correlation could exist because G and C nucleotides are more expensive to produce (obligate endosymbiotic species always have a very small genome); the benefit of having more GC pairs is a more stable DNA helix, as each GC pair is held together by three hydrogen bonds.

Parametric methods to infer HGT use species-specific sequence indicators, often called genomic signatures. If a fragment of the genome strongly deviates from the genomic signature, it is a possible horizontal transfer. For example, because bacterial GC content falls within a wide range (Fig 1), GC content of a genome segment is the simplest signature of its host genome. Commonly used genomic signatures include nucleotide composition,[10] oligonucleotide frequencies,[11] or structural features of the genome.[12]

To detect HGT using parametric methods, the host's genomic signature needs to be clearly recognizable. However, the host's genome is not always uniform with respect to the genomic signature: for example, GC content of the third codon position is lower close to the replication terminus[13] and GC content tends to be higher in highly expressed genes.[14] Not accounting for such intra-genomic variability in the host can result in over-predictions, flagging native segments as HGT candidates.[4] Larger sliding windows can account for this variability at the cost of a reduced ability to detect smaller HGT regions.[8]

Just as importantly, horizontally transferred segments need to exhibit the donor's genomic signature. This might not be the case for ancient transfers: they were subjected to the same mutational processes as the rest of the host genome, so their distinct signatures might have ameliorated[5] and consequently become undetectable using parametric methods. For example, Bdellovibrio bacteriovorus, a predatory δ-Proteobacterium, has homogeneous GC content, from which one might conclude that its genome is resistant to HGT.[15] However, subsequent analysis using phylogenetic methods identified a number of ancient HGT events in the genome of B. bacteriovorus.[16] Similarly, if the inserted segment was previously adapted to the host's genome, as is the case for prophage insertions,[17] parametric methods might miss these HGT events.

Nucleotide composition

Figure 2. Nucleotide composition methods to detect HGT rely on the small natural variations in the percentage of GC nucleotides within a genome, denoted by the blue limits; they search for deviations from the natural variations to detect atypical regions, denoted by a red asterisk, thereby highlighting potentially non-native sections of a genome.

Bacterial GC content falls within a wide range with Carsonella ruddii having GC content of 16.5%[18] and Anaeromyxobacter dehalogenans having GC content of 75%[19] (see figure 1). Even within a closely related group of α-Proteobacteria values range from about 30% to about 65%.[20] These differences can be exploited when detecting HGT events as a significantly different GC content for a genome segment can be an indication of foreign origin[10] (see figure 2).
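The idea can be illustrated with a minimal sketch. It assumes a genome given as a plain DNA string, and it flags windows by a simple z-score against the mean of all windows. The function names, the window and step sizes, and the cutoff are illustrative assumptions, not values taken from the cited studies.

```python
import statistics


def gc_content(seq):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc, z) for windows whose GC content deviates from the
    mean over all windows by more than z_cutoff standard deviations."""
    starts = list(range(0, len(genome) - window + 1, step))
    values = [gc_content(genome[s:s + window]) for s in starts]
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # perfectly uniform genome: nothing stands out
    return [(s, gc, (gc - mean) / sd)
            for s, gc in zip(starts, values)
            if abs(gc - mean) / sd > z_cutoff]
```

Because the mean and standard deviation are computed from the genome itself, a host with strong native variability will need a wider window or a stricter cutoff, which mirrors the trade-off described for parametric methods in general.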

Oligonucleotide frequencies

Oligonucleotide frequency measures how often each short nucleotide stretch of a given length occurs in the genome. These frequencies tend to vary less within genomes than between genomes and can therefore also be used as a genomic signature.[21] A deviation from this signature suggests that a genomic segment might have arrived through horizontal transfer.

Oligonucleotide frequency owes much of its discriminatory power to the number of possible oligonucleotides: if n is the size of the vocabulary and w is the oligonucleotide size, the number of possible distinct oligonucleotides is n^w; for example, there are 4^5 = 1024 possible pentanucleotides.

Codon usage bias, which can be viewed as a special case of oligonucleotide frequency, was one of the first detection methods used in methodical assessments of HGT.[11] This approach requires a host genome with a strong bias towards certain synonymous codons (different codons which code for the same amino acid), and this bias must be distinct from the bias found within the donor genome. In contrast, the simplest oligonucleotide used as a genomic signature is the dinucleotide. For example, the third nucleotide in a codon and the first nucleotide in the following codon form the dinucleotide least restricted by amino acid preference and codon usage.[22]

It is important to optimise the size of the sliding window in which to count the oligonucleotide frequency: a larger sliding window will better buffer the small variability in the host genome (see figure 2) at the cost of being worse at detecting smaller HGT regions.[23] A good compromise has been reported using tetranucleotide frequencies in a sliding window of 5kb with a step of 0.5kb.[24]
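A sliding-window oligonucleotide scan can be sketched as follows. This is only an illustration under stated assumptions: the genome is a plain DNA string, each window is compared with the whole-genome profile using a Manhattan distance (an arbitrary choice, not necessarily the measure used in the cited work), and the function names are hypothetical. The defaults follow the window and step sizes quoted above.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers (overlapping counts)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # ignores k-mers with N etc.
    return [counts[m] / total if total else 0.0 for m in kmers]


def window_distances(genome, k=4, window=5000, step=500):
    """Manhattan distance between each window's k-mer profile and the
    whole-genome profile; large values mark atypical regions."""
    reference = kmer_profile(genome, k)
    result = []
    for start in range(0, len(genome) - window + 1, step):
        profile = kmer_profile(genome[start:start + window], k)
        distance = sum(abs(a - b) for a, b in zip(profile, reference))
        result.append((start, distance))
    return result
```

With k = 4 the profile has 4^4 = 256 entries, so a 5kb window still observes most tetranucleotides many times; longer oligonucleotides would need larger windows to be estimated reliably.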

A convenient way of modelling oligonucleotide genomic signatures is to use Markov chains. Transition probability matrices can be derived for endogenous and acquired genes,[25] from which Bayesian posterior probabilities for particular stretches of DNA can be obtained.[26]
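As a rough illustration of this idea, the sketch below trains two first-order Markov chains, one on sequences assumed to be native and one on sequences assumed to be acquired, and combines their likelihoods with a prior through Bayes' rule. The chain order, the pseudocount, the prior of 0.1 and all function names are assumptions made for the example; they are not the settings of the cited methods.

```python
import math


def train_markov(seqs, pseudocount=1.0):
    """First-order transition probabilities P(next | current) over ACGT."""
    counts = {a: {b: pseudocount for b in "ACGT"} for a in "ACGT"}
    for seq in seqs:
        seq = seq.upper()
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in "ACGT"}
            for a in "ACGT"}


def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under a trained model."""
    seq = seq.upper()
    return sum(math.log(model[a][b])
               for a, b in zip(seq, seq[1:])
               if a in model and b in model[a])


def posterior_foreign(seq, native, foreign, prior_foreign=0.1):
    """Posterior probability that seq was generated by the foreign model."""
    log_odds = (log_likelihood(seq, foreign) - log_likelihood(seq, native)
                + math.log(prior_foreign / (1.0 - prior_foreign)))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

In practice the hard part is obtaining the two training sets, since the acquired genes are exactly what is unknown; the sketch simply assumes they are given.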

Structural features

Just as the nucleotide composition of a DNA molecule can be represented by a sequence of letters, its structural features can be encoded in a numerical sequence. The structural features include interaction energies between neighbouring base pairs,[27] the twist that makes two bases of a pair non-coplanar,[28] or DNA deformability induced by the proteins shaping the chromatin.[29] The autocorrelation analysis of this numerical sequence shows characteristic periodicities in complete genomes.[30] In fact, after detecting archaea-like regions in the thermophilic bacterium Thermotoga maritima,[31] periodicity spectra of these regions were compared to the periodicity spectra of the homologous regions in the archaeon Pyrococcus horikoshii.[12] The revealed similarities in the periodicity were strong supporting evidence for a case of massive HGT between two kingdoms: bacteria and archaea.[12]

Genomic context

The existence of genomic islands, short (typically 10-200kb long) regions of a genome which have been acquired horizontally, lends support to the ability to identify non-native genes by their location in a genome.[32] For example, a gene of ambiguous origin which forms part of a non-native operon could be considered to be non-native. Alternatively, flanking repeat sequences or the presence of nearby integrases or transposases can indicate a non-native region.[33] A context-aware approach has been considered as a secondary identification method, after removal of genes which are significantly native or non-native through the use of other parametric methods.[6]

Phylogenetic methods

The use of phylogenetic analysis in the detection of HGT was advanced by the availability of many newly sequenced genomes. Phylogenetic methods detect inconsistencies in gene and species evolutionary history in two ways: explicitly, by reconstructing the gene tree and reconciling it with the reference species tree, or implicitly, by examining aspects that correlate with the evolutionary history of the genes in question, e.g. patterns of presence/absence across species, or unexpectedly short or distant pairwise evolutionary distance.

Explicit phylogenetic methods

Figure 3. Explicit phylogenetic approaches focus on identifying trees which conflict with more reliable trees. Here a hypothetical gene tree depicts a frog gene within a group of mammals, a clear conflict which would suggest an HGT event from mammals to frogs.

The aim of explicit phylogenetic methods is to compare gene trees with their associated species trees. A significant difference between the two can be suggestive of a HGT event (see figure 3). Such an approach can produce more detailed results than parametric approaches because the involved species, time and direction of transfer can potentially be identified. For example, if two genes from different species share the lowest connecting node in the gene tree, but the respective species are spaced apart in the species tree, multiple duplication and loss events would need to be invoked to explain the resulting gene tree topology, whereas a single HGT event might provide a more plausible explanation.

As discussed in more detail below, phylogenetic methods range from simple methods merely identifying discordance between gene and species trees to mechanistic models inferring probable sequences of HGT events. An intermediate strategy entails deconstructing the gene tree into smaller parts until it matches the species tree (genome spectral approaches).

Explicit phylogenetic methods rely upon the accuracy of the input rooted gene and species trees, yet these can be challenging to build.[34] Even when there is no doubt in the input tree, the conflicting phylogenies can be the result of evolutionary processes other than HGT, such as duplications and losses, causing these methods to erroneously infer HGT events when paralogy is the correct explanation. Similarly, in the presence of incomplete lineage sorting, explicit phylogeny methods can erroneously infer HGT events. Finally, when the donor species is not represented among the analysed species or their ancestors, explicit methods can fail to predict HGT events.

Tests of topologies

To detect sets of genes that fit poorly to the reference tree, one can use statistical tests of topology, such as Kishino-Hasegawa (KH),[35] Shimodaira-Hasegawa (SH),[36] and Approximately Unbiased (AU).[37] These tests assess the likelihood of the gene sequence alignment when the reference topology is given as the null hypothesis.

The rejection of the reference topology is an indication that the evolutionary history for that gene family is inconsistent with the reference tree. When these inconsistencies cannot be explained using a small number of non-horizontal events, such as gene loss and duplication, an HGT event is inferred.

One such analysis checked for HGT in groups of homologs, as best bidirectional hits, of the γ-Proteobacterial lineage.[38] Here six reference trees were reconstructed using either the highly conserved small subunit ribosomal RNA sequences, a consensus of the available gene trees or concatenated alignments of orthologs. The failure to reject the six evaluated topologies, and the rejection of seven alternative topologies, was interpreted as evidence for a small number of HGT events in the selected groups.

Tests of topology provide a way to account for the uncertainty in tree reconstruction, but they do not indicate where in the tree any HGT events may have occurred. For that, genome spectral or subtree pruning and regrafting methods are required.

Genome spectral approaches

In order to identify the location of HGT events, genome spectral approaches decompose a gene tree into substructures (such as bipartitions or quartets) and identify those that are consistent or inconsistent with the species tree.

Bipartitions. Removing one edge from a reference tree produces two unconnected sub-trees, each a disjoint set of nodes; this split is a bipartition. If a bipartition is present in both the gene and the species trees, it is compatible, otherwise it is conflicting. These conflicts can indicate an HGT event or may be the result of uncertainty in gene tree inference. To reduce uncertainty, bipartition analyses typically focus on strongly supported bipartitions, such as those associated with branches with bootstrap values or posterior probabilities above a certain threshold. Any gene family found to have one or several conflicting, but strongly supported, bipartitions is considered an HGT candidate.[39][40]

Quartet decomposition. Quartets are trees consisting of four leaves. In bifurcating (fully resolved) trees, each internal branch induces a quartet whose leaves are either subtrees of the original tree or actual leaves of the original tree. If the topology of a quartet extracted from the reference species tree is embedded in the gene tree, the quartet is compatible with the gene tree. Conversely, incompatible strongly supported quartets indicate potential HGT events.[41]
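The bipartition comparison described above can be sketched in a few lines. The example assumes that both trees are given as nested tuples of leaf names over the same set of species, and it ignores branch support, so every conflicting split is reported; a real analysis would first filter by bootstrap or posterior support. The function names are hypothetical.

```python
def bipartitions(tree):
    """Non-trivial bipartitions (splits) of a tree given as nested tuples.

    Each split is a frozenset holding the two leaf sets on either side of
    an internal edge, so the result does not depend on where the tree is
    rooted or on the order of children.
    """
    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:  # skip trivial splits
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def conflicting_bipartitions(gene_tree, species_tree):
    """Splits present in the gene tree but absent from the species tree."""
    return bipartitions(gene_tree) - bipartitions(species_tree)
```

For instance, with the species tree ((("A", "B"), "C"), ("D", "E")), a gene tree that groups B with D yields conflicting splits, whereas a gene tree that only lists the same clades in a different order yields none.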

Subtree pruning and regrafting

A mechanistic way of modelling an HGT event on the reference tree is to first cut an internal branch (i.e., prune the tree) and then regraft it onto another edge, an operation referred to as subtree pruning and regrafting (SPR).[42] If the gene tree was topologically consistent with the original reference tree, the editing results in an inconsistency. Similarly, when the original gene tree is inconsistent with the reference tree, it is possible to prune and regraft the reference tree to obtain a consistent topology. By interpreting the edit path of pruning and regrafting, HGT candidate nodes can be flagged and the host and donor genomes inferred.[40]
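A single SPR move can be illustrated with a deliberately simplified sketch: trees are nested tuples of leaf names, only a single leaf is pruned, and it is regrafted as the sister of another leaf. General SPR moves whole subtrees onto arbitrary edges; the restriction to leaves and the function names are assumptions made to keep the example short.

```python
def prune(tree, leaf):
    """Remove a leaf and collapse the internal node left with one child."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    kept = [k for k in (prune(child, leaf) for child in tree) if k is not None]
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


def regraft(tree, leaf, target):
    """Attach leaf as the sister of the leaf named target."""
    if tree == target:
        return (tree, leaf)
    if isinstance(tree, str):
        return tree
    return tuple(regraft(child, leaf, target) for child in tree)


def spr_leaf(tree, leaf, target):
    """One leaf-level SPR move: prune leaf, then regraft it next to target."""
    return regraft(prune(tree, leaf), leaf, target)
```

Applied to the species tree ((("A", "B"), "C"), ("D", "E")), moving B next to D gives (("A", "C"), (("D", "B"), "E")), the topology one would expect for a gene that B acquired from the lineage of D.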

Because computing the minimum number of SPR operations that converts one tree into another is NP-hard,[43] the problem becomes considerably more difficult as more nodes are considered. The computational challenge lies in finding the optimal edit path, the one that requires the fewest steps,[44][45] and different strategies are used in solving the problem. For example, the HorizStory algorithm reduces the problem by first eliminating the consistent nodes;[46] recursive pruning and regrafting then reconciles the reference tree with the gene tree, and optimal edits are interpreted as HGT events.

Implicit phylogenetic methods

Figure 4. A strikingly irregular distance between two leaves of a phylogenetic gene tree, when compared to a tree for their associated genomes, is highly suggestive of an HGT event for that gene. Here an HGT event is likely to have occurred prior to C and D's separation.

In contrast to explicit phylogenetic methods, which compare the agreement between gene and species trees, implicit phylogenetic methods compare evolutionary distances or sequence similarity. Here, an unexpectedly short or long distance from a given reference compared to the average can be suggestive of a HGT event (see figure 4). Because tree construction is not required, implicit approaches tend to be simpler and faster than explicit methods.

However, implicit methods can be limited by disparities between the underlying correct phylogeny and the evolutionary distances considered. For instance, the most similar sequence as obtained by the highest-scoring BLAST hit is not always the evolutionarily closest one.[47]

Top sequence match in a distant species

A simple way of identifying HGT events is by looking for high-scoring sequence matches in distantly related species. For example, an analysis of the top BLAST hits of protein sequences in the bacterium Thermotoga maritima revealed that most hits were in archaea rather than closely related bacteria, suggesting extensive HGT between the two;[31] these predictions were later supported by an analysis of the structural features of the DNA molecule.[12]

However, this method is limited to detecting relatively recent HGT events. Indeed, if the HGT occurred in the common ancestor of two or more species included in the database, the closest hit will reside within that clade and therefore the HGT will not be detected by the method.

Discrepancy between gene and species distances

When considering orthologous sequences, the molecular clock hypothesis posits that the evolutionary distance between the genes should be approximately proportional to the evolutionary distances between their respective species.[48] If a putative group of orthologs contains xenologs, the proportionality of evolutionary distances may only hold among the orthologs, not the xenologs.[49]

Simple approaches compare the distribution of similarity scores for a particular Open Reading Frame (ORF) to those of other ORFs in the genome and infer LGT from outliers.[50][51]
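The proportionality argument can be turned into a very simple outlier test, sketched below. It assumes that pairwise gene and species distances are already available as dictionaries keyed by species pair, and it flags pairs whose gene-to-species distance ratio lies far from the median, measured in units of the median absolute deviation. This is an illustrative heuristic with hypothetical names and an arbitrary cutoff, not the procedure of the cited studies.

```python
import statistics


def distance_outliers(gene_dist, species_dist, cutoff=3.0):
    """Species pairs whose gene/species distance ratio is an outlier.

    Under a molecular clock the ratio should be roughly constant across
    pairs of orthologs, so pairs that deviate strongly are HGT candidates.
    """
    ratios = {pair: gene_dist[pair] / species_dist[pair]
              for pair in gene_dist
              if species_dist.get(pair, 0) > 0}
    if len(ratios) < 3:
        return []
    median = statistics.median(ratios.values())
    mad = statistics.median(abs(r - median) for r in ratios.values())
    if mad == 0:
        return []  # no spread to compare against
    return sorted(pair for pair, r in ratios.items()
                  if abs(r - median) / mad > cutoff)
```

A pair flagged because its gene distance is unexpectedly short, relative to the species distance, is the pattern expected after a recent transfer between the two lineages.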

The more sophisticated DLIGHT method considers simultaneously the effect of LGT on all sequences within groups of putative orthologs[52]: if a likelihood-ratio test of the HGT hypothesis and a hypothesis of no HGT is significant, a putative HGT event is inferred. In addition, the method allows inference of potential donor and recipient species and provides an estimation of the time since the HGT event.

Phylogenetic profiles

A group of orthologs or homologs can be analysed in terms of the presence or absence of group members in the reference genomes; such patterns are called phylogenetic profiles.[53] To find HGT events, phylogenetic profiles are scanned for an unusual distribution of genes. Absence of a homolog in a group of closely related species is an indication that the examined gene might have arrived via an HGT event. For example, the genomes of the three facultatively symbiotic Frankia sp. strains are of strikingly different sizes: 5.43 Mbp, 7.50 Mbp and 9.04 Mbp, depending on their range of hosts.[54] Marked portions of strain-specific genes had no significant hit in the reference database, and were possibly acquired by HGT from other bacteria. Similarly, the three phenotypically diverse Escherichia coli strains (uropathogenic, enterohemorrhagic and benign) share about 40% of the total combined gene pool, with the other 60% being strain-specific genes and consequently HGT candidates.[55] Further evidence that these genes resulted from HGT was their strikingly different codon usage patterns from the core genes and a lack of the gene order conservation typical of vertically evolved genes.[55]
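A phylogenetic profile is easy to build once genes have been grouped into families. The sketch below assumes each genome is represented as a set of gene-family identifiers (how families are defined is outside its scope) and reports families found in exactly one genome. Function names and the input layout are assumptions for the example.

```python
def phylogenetic_profiles(genomes):
    """Presence/absence profile of every gene family.

    genomes maps a genome name to the set of gene families it contains.
    Returns the sorted genome names and, per family, a tuple of 0/1 flags
    in that order.
    """
    names = sorted(genomes)
    families = set().union(*genomes.values())
    profiles = {fam: tuple(int(fam in genomes[name]) for name in names)
                for fam in sorted(families)}
    return names, profiles


def strain_specific(genomes):
    """Families present in exactly one genome, listed per genome."""
    names, profiles = phylogenetic_profiles(genomes)
    specific = {name: [] for name in names}
    for fam, profile in profiles.items():
        if sum(profile) == 1:
            specific[names[profile.index(1)]].append(fam)
    return specific
```

Strain-specific families found this way are only candidates: the same pattern arises when a gene has been lost in all the other genomes, so further evidence such as codon usage is needed.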

Impact of polymorphic sites

Genes are commonly regarded as the basic units transferred through an HGT event. However, it is also possible for HGT to occur within genes. For example, it has been shown that horizontal transfer between closely related species often results in the exchange of ORF fragments.[56][57] The analysis of a group of four Escherichia coli and two Shigella flexneri strains also revealed that the sequence stretches common to all six strains contain polymorphic sites, consequences of homologous recombination.[58] This method of detection is, however, restricted to the sites in common to all analysed species, limiting the analysis to a group of closely related organisms.


The existence of numerous and varied methods to infer HGT raises the question of how to validate individual inferences and how to compare the different methods. These issues are non-trivial because the extent of HGT in nature remains largely unknown. As with other types of phylogenetic inference, the true evolutionary history cannot be established with certainty. As a result, it is difficult to obtain a representative test set of HGT events. Furthermore, HGT inference methods vary considerably in the information they consider and often identify inconsistent groups of HGT candidates:[59][60] it is not clear to what extent taking the intersection, the union, or some other combination of the individual methods affects the false positive and false negative rates.[61]

Even so, several approaches to validating individual HGT inferences and benchmarking methods have been adopted, typically relying on various forms of simulation. Because the truth is known in simulation, the number of false positives and the number of false negatives can be straightforwardly computed.
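When the simulated truth is available, the counts mentioned above reduce to set operations. The following minimal sketch assumes that genes are identified by hashable labels and that the predicted set, the true set and the full gene list are given; the function name and the returned keys are illustrative choices.

```python
def benchmark(predicted, truth, all_genes):
    """Confusion counts, precision and recall for one HGT detection run."""
    predicted, truth, all_genes = set(predicted), set(truth), set(all_genes)
    tp = len(predicted & truth)              # transferred and detected
    fp = len(predicted - truth)              # native but flagged
    fn = len(truth - predicted)              # transferred but missed
    tn = len(all_genes - predicted - truth)  # native and not flagged
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

The same function can be applied to the union or the intersection of several methods' predictions to see how combining them shifts the balance between false positives and false negatives on a given simulation.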

Standard tools to simulate sequence evolution along trees, such as INDELible[62] or PhyloSim,[63] can be adapted to simulate HGT. HGT events cause the relevant gene trees to conflict with the species tree. Such HGT events can be simulated through subtree pruning and regrafting rearrangements of the species tree.[64] Some programs, such as the genome evolution simulator ALF,[65] directly generate gene families subject to HGT.

Simulation of HGT events can also be performed by manipulating the biological sequences themselves. Artificial chimeric genomes can be obtained by inserting known foreign genes into random positions of a host genome.[66][67][8] The donor sequences are inserted into the host unchanged or can be further evolved by simulation,[52] e.g. using the tools described above.
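A chimeric test genome of this kind can be assembled with a few lines of code. The sketch below assumes the host and the donor genes are plain DNA strings, inserts the donor genes unchanged at random positions, and returns the coordinates of each insert so that predictions can later be scored. The function name, the fixed random seed and the choice to allow insertion anywhere (including inside host genes) are simplifying assumptions.

```python
import random


def make_chimeric_genome(host, donor_genes, seed=0):
    """Insert donor genes at random host positions.

    Returns the chimeric sequence and a list of (start, end) coordinates,
    in chimera coordinates, one per donor gene in input order.
    """
    rng = random.Random(seed)
    # Draw insertion points in host coordinates and handle them left to
    # right, so earlier inserts do not shift the points of later ones.
    points = sorted(rng.randrange(len(host) + 1) for _ in donor_genes)
    pieces, truth = [], []
    previous, length = 0, 0
    for point, gene in zip(points, donor_genes):
        pieces.append(host[previous:point])
        length += point - previous
        truth.append((length, length + len(gene)))
        pieces.append(gene)
        length += len(gene)
        previous = point
    pieces.append(host[previous:])
    return "".join(pieces), truth
```

The returned coordinates play the role of the known truth in a benchmark: any window or gene overlapping them counts as a true positive when flagged.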


Fran, Jelena, Daniel, Steffan provided constructive comments.


  1. ^ Griffith, Fred. 1928. The Significance of Pneumococcal Types. Journal of Hygiene 27(2):113-159.
  2. ^ Lederberg, J, and Tatum, E. 1946. Gene Recombination in Escherichia coli. Nature 158:558.
  3. ^ Zinder, N, and Lederberg, J. 1952. Genetic Exchange in Salmonella. Journal of Bacteriology 64(5):679-699.
  4. ^ a b Guindon, S, and G Perrière. 2001. Intragenomic base content variation is a potential source of biases when searching for horizontally transferred genes. Molecular Biology and Evolution 18 (9) (September): 1838-1840.
  5. ^ a b Lawrence, J G, and H Ochman. 1997. Amelioration of bacterial genomes: rates of change and exchange. Journal of Molecular Evolution 44 (4) (April): 383-397.
  6. ^ a b Azad, Rajeev K., and Jeffrey G. Lawrence. 2011. Towards more robust methods of alien gene detection. Nucleic Acids Research (February 4). doi:10.1093/nar/gkr059.
  7. ^ Xiong D, Xiao F, Liu L, Hu K, Tan Y, He S, Gao X. 2012. Towards a Better Detection of Horizontally Transferred Genes by Combining Unusual Properties Effectively. PLoS One. 7(8):e43126. doi: 10.1371/journal.pone.0043126.
  8. ^ a b c Becq, Jennifer, Cécile Churlaud, and Patrick Deschavanne. 2010. A Benchmark of Parametric Methods for Horizontal Transfers Detection. PLoS ONE 5 (4) (April 1): e9989. doi:10.1371/journal.pone.0009989.
  9. ^ Poptsova, M. 2009. Testing Phylogenetic Methods to Identify Horizontal Gene Transfer. Methods in Molecular Biology 532:227-240 doi:10.1007/978-1-60327-853-9_13.
  10. ^ a b Daubin, Vincent, Emmanuelle Lerat, and Guy Perrière. 2003. The source of laterally transferred genes in bacterial genomes. Genome Biology 4 (9): R57. doi:10.1186/gb-2003-4-9-r57.
  11. ^ a b Lawrence, J G, and H Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proceedings of the National Academy of Sciences of the United States of America 95 (16) (August 4): 9413-9417.
  12. ^ a b c d Worning, P, L J Jensen, K E Nelson, S Brunak, and D W Ussery. 2000. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima. Nucleic Acids Research 28 (3) (February 1): 706-709.
  13. ^ Deschavanne, P, and J Filipski. 1995. Correlation of GC content with replication timing and repair mechanisms in weakly expressed E.coli genes. Nucleic Acids Research 23 (8) (April 25): 1350-1353.
  14. ^ Wuitschick JD, Karrer KM. 1999. Analysis of genomic G + C content, codon usage, initiator codon context and translation termination sites in Tetrahymena thermophila. Journal of Eukaryotic Microbiology 46(3):239–47. doi:10.1111/j.1550-7408.1999.tb05120.x. PMID 10377985.
  15. ^ Rendulic, Snjezana, Pratik Jagtap, Andrea Rosinus, Mark Eppinger, Claudia Baar, Christa Lanz, Heike Keller, et al. 2004. A predator unmasked: life cycle of Bdellovibrio bacteriovorus from a genomic perspective. Science (New York, N.Y.) 303 (5658) (January 30): 689-692. doi:10.1126/science.1093027.
  16. ^ Gophna, Uri, Robert L. Charlebois, and W. Ford Doolittle. 2006. Ancient lateral gene transfer in the evolution of Bdellovibrio bacteriovorus. Trends in Microbiology 14 (2) (February): 64-69. doi:10.1016/j.tim.2005.12.008.
  17. ^ Vernikos, Georgios S, Nicholas R Thomson, and Julian Parkhill. 2007. Genetic flux over time in the Salmonella lineage. Genome Biology 8 (6): R100. doi:10.1186/gb-2007-8-6-r100.
  18. ^ Nakabachi, Atsushi, Atsushi Yamashita, Hidehiro Toh, Hajime Ishikawa, Helen E Dunbar, Nancy A Moran, and Masahira Hattori. 2006. The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science (New York, N.Y.) 314 (5797) (October 13): 267. doi:10.1126/science.1134196.
  19. ^ Liu, Zhandong, Santosh S Venkatesh, and Carlo C Maley. 2008. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples. BMC Genomics 9: 509. doi:10.1186/1471-2164-9-509.
  20. ^ Bentley, Stephen D, and Julian Parkhill. 2004. Comparative genomic structure of prokaryotes. Annual Review of Genetics 38: 771-792. doi:10.1146/annurev.genet.38.072902.094318.
  21. ^ Karlin, S and Burge, C. 1994. Dinucleotide relative abundance extremes: a genomic signature. Trends in Genetics 11(7):283-290. doi:10.1016/S0168-9525(00)89076-9
  22. ^ Hooper, Sean D, and Otto G Berg. 2002. Detection of genes with atypical nucleotide sequence in microbial genomes. Journal of Molecular Evolution 54 (3) (March): 365-375. doi:10.1007/s00239-001-0051-8.
  23. ^ Deschavanne, P, Giron, A, Vilain, J, Fagot, G, and Fertil B. 1999. Genomic Signature: Characterization and Classification of Species Assessed by Chaos Game Representation of Sequences. Molecular Biology and Evolution 16(10):1391-1399.
  24. ^ Dufraigne, Christine, Bernard Fertil, Sylvain Lespinats, Alain Giron, and Patrick Deschavanne. 2005. Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Research 33 (1): e6-e6. doi:10.1093/nar/gni004.
  25. ^ Cortez, Diego, Patrick Forterre, and Simonetta Gribaldo. 2009. A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes. Genome Biology 10 (6): R65. doi:10.1186/gb-2009-10-6-r65.
  26. ^ Nakamura, Yoji, Takeshi Itoh, Hideo Matsuda, and Takashi Gojobori. 2004. Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet 36 (7) (July): 760-6. doi:10.1038/ng1381.
  27. ^ Ornstein, Rick L, Robert Rein, Donnal L Breen, and Robert D Macelroy. 1978. An optimized potential function for the calculation of nucleic acid interaction energies I. Base stacking. Biopolymers 17 (10) (October 1): 2341-2360. doi:10.1002/bip.1978.360171005.
  28. ^ el Hassan, M A, and C R Calladine. 1996. Propeller-twisting of base-pairs and the conformational mobility of dinucleotide steps in DNA. Journal of Molecular Biology 259 (1) (May 31): 95-103. doi:10.1006/jmbi.1996.0304.
  29. ^ Olson, Wilma K., Andrey A. Gorin, Xiang-Jun Lu, Lynette M. Hock, and Victor B. Zhurkin. 1998. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proceedings of the National Academy of Sciences 95 (19): 11163-11168. doi:10.1073/pnas.95.19.11163.
  30. ^ Herzel, H, O Weiss, and E N Trifonov. 1999. 10-11 bp periodicities in complete genomes reflect protein structure and DNA folding. Bioinformatics (Oxford, England) 15 (3) (March): 187-193.
  31. ^ a b Nelson, K E, R A Clayton, S R Gill, M L Gwinn, R J Dodson, D H Haft, E K Hickey, et al. 1999. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399 (6734) (May 27): 323-329. doi:10.1038/20601.
  32. ^ Langille MG, Hsiao WW, Brinkman FS. 2010. Detecting genomic islands using bioinformatics approaches. Nature Reviews Microbiology 8(5):373-82. doi: 10.1038/nrmicro2350.
  33. ^ Hacker J, Blum-Oehler G, Mühldorfer I, Tschäpe H. 1997. Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Molecular Microbiology 23(6):1089-97. doi: 10.1046/j.1365-2958.1997.3101672.x
  34. ^ Altenhoff, Adrian M. and Dessimoz, Christophe. 2012. Inferring Orthology and Paralogy. In, Anisimova,M. (ed), Evolutionary Genomics, Methods in Molecular Biology. Humana Press, pp. 259–279.
  35. ^ Goldman, N, J P Anderson, and A G Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49 (4) (December): 652-670.
  36. ^ Shimodaira, H, and M Hasegawa. 1999. Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. Molecular Biology and Evolution 16 (8): 1114.
  37. ^ Shimodaira, Hidetoshi. 2002. An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51 (3) (June): 492-508. doi:10.1080/10635150290069913.
  38. ^ Lerat, Emmanuelle, Vincent Daubin, and Nancy A Moran. 2003. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biology 1 (1) (October): E19. doi:10.1371/journal.pbio.0000019.
  39. ^ Zhaxybayeva, Olga, Lutz Hamel, Jason Raymond, and J Peter Gogarten. 2004. Visualization of the phylogenetic content of five genomes using dekapentagonal maps. Genome Biology 5 (3): R20. doi:10.1186/gb-2004-5-3-r20.
  40. ^ a b Beiko, Robert G, Timothy J Harlow, and Mark A Ragan. 2005. Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA 102 (40) (October): 14332-7. doi:10.1073/pnas.0504068102.
  41. ^ Zhaxybayeva, Olga, J Peter Gogarten, Robert L Charlebois, W Ford Doolittle, and R Thane Papke. 2006. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res 16 (9) (September): 1099-108. doi:10.1101/gr.5322306.
  42. ^ Abby, Sophie, Eric Tannier, Manolo Gouy, and Vincent Daubin. 2010. Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests. BMC Bioinformatics 11 (1) (June): 324. doi:10.1186/1471-2105-11-324.
  43. ^ Hickey G, Dehne F, Rau-Chaplin A, and Blouin C. 2008. SPR Distance Computation for Unrooted Trees. Evolutionary Bioinformatics Online 4:17–27.
  44. ^ Hein, Jotun, Tao Jiang, Lusheng Wang, and Kaizhong Zhang. 1995. On the Complexity of Comparing Evolutionary Trees.
  45. ^ Allen, Benjamin L., and Mike Steel. 2001. Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees. Annals of Combinatorics 5 (1) (June): 1-15. doi:10.1007/s00026-001-8006-8.
  46. ^ MacLeod, Dave, Robert L Charlebois, Ford Doolittle, and Eric Bapteste. 2005. Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement. BMC Evolutionary Biology 5: 27. doi:10.1186/1471-2148-5-27.
  47. ^ Koski, L B, and G B Golding. 2001. The closest BLAST hit is often not the nearest neighbor. Journal of Molecular Evolution 52 (6) (June): 540-542. doi:10.1007/s002390010184.
  48. ^ Zuckerkandl, E. and Pauling, L.B. 1965. Evolutionary divergence and convergence in proteins. In Bryson, V. and Vogel, H.J. (editors). Evolving Genes and Proteins. Academic Press, New York. pp. 97–166.
  49. ^ Novichkov, Pavel, Marina Omelchenko, Mikhail Gelfand, Andrei Mironov, Yuri Wolf, and Eugene Koonin. 2004. Genome-Wide Molecular Clock and Horizontal Gene Transfer in Bacterial Evolution. The Journal of Bacteriology 186 (19) (October): 6575. doi:10.1128/JB.186.19.6575-6585.2004.
  50. ^ Lawrence, J. G., & Hartl, D. L. 1992. Inference of horizontal genetic transfer from molecular data: an approach using the bootstrap. Genetics, 131(3), 753–760.
  51. ^ Clarke, G D Paul, Robert G Beiko, Mark A Ragan, and Robert L Charlebois. 2002. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. Journal of Bacteriology 184 (8) (April): 2072-2080.
  52. ^ a b Dessimoz, Christophe, Daniel Margadant, and Gaston H Gonnet. 2008. DLIGHT - Lateral Gene Transfer Detection Using Pairwise Evolutionary Distances in a Statistical Framework. Springer, 4955:315-330.
  53. ^ Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. 1999. Assigning protein functions by comparative genome analysis. Proc Natl Acad Sci U S A. 96(8):4285-8.
  54. ^ Normand, Philippe, Pascal Lapierre, Louis S Tisa, Johann Peter Gogarten, Nicole Alloisio, Emilie Bagnarol, Carla A Bassi, et al. 2007. Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography. Genome Research 17 (1) (January): 7-15. doi:10.1101/gr.5798407.
  55. ^ a b Welch, R A, V. Burland, G. Plunkett, P. Redford, P. Roesch, D. Rasko, E L Buckles, et al. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 99 (26) (December): 17020-4. doi:10.1073/pnas.252529799.
  56. ^ Ochman, H, J G Lawrence, and E A Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405 (6784) (May 18): 299-304. doi:10.1038/35012500.
  57. ^ Papke, R Thane, Jeremy E Koenig, Francisco Rodríguez-Valera, and W Ford Doolittle. 2004. Frequent recombination in a saltern population of Halorubrum. Science (New York, N.Y.) 306 (5703) (December 10): 1928-1929. doi:10.1126/science.1103289.
  58. ^ Mau, Bob, Jeremy D Glasner, Aaron E Darling, and Nicole T Perna. 2006. Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli. Genome Biology 7 (5): R44. doi:10.1186/gb-2006-7-5-r44.
  59. ^ Ragan, Mark. 2001. On surrogate methods for detecting lateral gene transfer. FEMS Microbiology Letters 201(2):187-191.
  60. ^ Lawrence, J G, and H Ochman. 2002. Reconciling the many faces of lateral gene transfer. Trends in Microbiology 10(1):1-4.
  61. ^ Poptsova, Maria S, and J Peter Gogarten. 2007. The power of phylogenetic approaches to detect horizontally transferred genes. BMC Evolutionary Biology 7: 45. doi:10.1186/1471-2148-7-45.
  62. ^ Cite error: Invalid <ref> tag; no text was provided for refs named Fletcher2009
  63. ^ Cite error: Invalid <ref> tag; no text was provided for refs named Sipos2011
  64. ^ Beiko, Robert G, and Nicholas Hamilton. 2006. Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology 6: 15. doi:10.1186/1471-2148-6-15.
  65. ^ Cite error: Invalid <ref> tag; no text was provided for refs named Dalquen2012
  66. ^ Cortez, Diego Q, Antonio Lazcano, and Arturo Becerra. 2005. Comparative analysis of methodologies for the detection of horizontally transferred genes: a reassessment of first-order Markov models. In Silico Biology 5 (5-6): 581-592.
  67. ^ Azad, Rajeev K, and Jeffrey G Lawrence. 2005. Use of Artificial Genomes in Assessing Methods for Atypical Gene Detection. PLoS Comput Biol 1 (6) (November 11): e56. doi:10.1371/journal.pcbi.0010056.
