Inferring horizontal gene transfer

From PLoSWiki

Matt Ravenhall^1, Nives Škunca^tba, Christophe Dessimoz^tba*

^1 To be added

^2 To be added

* Corresponding author, email: to be added

Horizontal or Lateral Gene Transfer (HGT or LGT) occurs when a host genome obtains foreign DNA in a process that circumvents vertical inheritance. In contrast to vertical transfer, HGT is not strictly intraspecific; the presence of HGT events can therefore complicate investigations of evolutionary history. Furthermore, HGT events are often implicated in the transfer of antibiotic resistance and pathogenicity.

Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based (parametric) methods search for deviations from the genomic average whereas evolutionary history-based (phylogenetic) approaches identify genes whose evolutionary history significantly differs from that of the host species. Benchmarking for both types of methods typically relies upon simulated genomes. Currently, different inference methods tend to identify conflicting sets of HGT events, and it can be difficult to ascertain all but very simple HGT events.



Initial discoveries of horizontal gene transfer events relied upon observation of a particular trait, such as virulence, moving from trait-positive to trait-negative organisms. Most prominently, the first evidence that genetic information could pass between bacteria was witnessed in 1928 when Frederick Griffith demonstrated that virulence was able to pass to non-virulent strains of "Streptococcus pneumoniae" in what is now known as "Griffith's Experiment"[1]. Later evidence for conjugation and transduction, the other known methods of horizontal gene transfer, was found in the 1940s[2] and 1950s[3] through similar observations. Given that HGT is not limited to the transfer of externally visible traits, other methods are required to detect these events. Contemporary methods rely upon genomic data and can broadly be separated into two groups: parametric and phylogenetic.

Parametric methods identify sections of a genome that differ significantly from a genomic average, such as GC percentage or codon usage, whilst phylogenetic approaches examine evolutionary histories and identify conflicting phylogenies. Phylogenetic methods can be further divided into those that reconstruct and compare phylogenetic trees explicitly, and those that use surrogate measures in place of the phylogenetic trees. Whilst the parametric approaches benefit from only requiring the genome under study they are limited, due to amelioration of transferred sequences, to only discovering recent HGT events and must take intra-genomic variation into account to reduce false positives [4]. On the other hand, the phylogenetic approaches benefit from many recently sequenced genomes and hold an improved ability to identify the specific donor, time and direction of horizontal transfer but also bring significant computational demands, especially when considering explicit methods.

Significant improvements have occurred through combining different parametric methods[5][6], offering a potential solution to discrepancies between different methods[7][8]. Future advances may occur through wider combinations, perhaps with parametric and phylogenetic methods together. For this a database could effectively collate benchmarking data for future evaluation. Advances in HGT detection should also benefit from future increases in computational power but must be founded upon effective calculation of false positive and negative results. Simulated genomes with known non-native insertions provide a powerful evaluation tool but it is vital that non-native sequences are incorporated as realistically as possible to draw meaningful conclusions.

Parametric methods

Figure 1. GC content of coding regions compared to the genome size for selected Bacteria. Full data can be browsed. The slight positive correlation could exist because G and C nucleotides are more expensive to produce (obligate endosymbiotic species always have a very small genome), whereas the benefit of having more GC pairs is a more stable DNA helix, as each GC pair is held together by three hydrogen bonds.

Many aspects of genome sequence composition are species-specific "genomic signatures" which can be utilised to identify sequences that have arrived through horizontal transfer. Commonly used signatures include nucleotide composition[9], oligonucleotide frequencies[10], or structural features of the genome[11]. Parametric HGT inference methods identify fragments of a genome with atypical signatures.

A requirement for parametric methods is that the host's genomic signature is clearly recognizable whilst still taking intra-genomic variability into account, so as to reduce false positives[4]. For example, it has been observed that the GC content of the third codon position is lower closer to the replication terminus[12]. False positives may also occur for genes with significantly high or low rates of expression as GC content has been found to be higher with greater expression[13]. Larger sliding windows can account for this variability at the cost of a reduced ability to detect smaller HGT regions [7].

Just as importantly, horizontally transferred segments need to exhibit the donor's genomic signature, although specific identification of the donor still may not be possible. This requirement can represent an issue for ancient transfer events as the transferred segments will have been subjected to the same mutational processes as the rest of the host genome, causing their distinct signatures to ameliorate[14]. A parametric approach will therefore be restricted to the identification of more recent transfers. Similarly, if the inserted segment was previously adapted to the host's genome, as is the case for prophage insertions[15], the power of sequence composition methods in detecting HGT is reduced.

One notable example of the potential flaws with parametric methods is that of Bdellovibrio bacteriovorus, a predatory δ-Proteobacterium. Its initial analysis, based on homogeneous GC content, found that its genome is resistant to HGT[16]. However, subsequent research with phylogenetic analysis identified a number of ancient HGT events in the genome[17].

Nucleotide composition

Figure 2. Nucleotide composition methods to detect HGT rely on the small natural variations in the percentage of GC nucleotides within a genome, denoted by the blue limits; they search for deviations from the natural variations to detect atypical regions, denoted by a red asterisk, thereby highlighting potentially non-native sections of a genome.

Bacterial GC content falls within a wide range with Carsonella ruddii having GC content of 16.5%[18] and Anaeromyxobacter dehalogenans having GC content of 75%[19] (see figure 1). Even within a closely related group of α-Proteobacteria values range from about 30% to about 65%[20]. These differences can be exploited when detecting HGT events as a significantly different GC content for a genome segment can be an indication of foreign origin[9] (see figure 2).
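The sliding-window idea illustrated in figure 2 can be sketched in a few lines of Python. This is a minimal illustration rather than any published method: the function names, the window and step sizes, and the z-score cutoff are all assumptions chosen for the example, and a real analysis would need to account for the intra-genomic variability discussed above.

```python
from statistics import mean, stdev

def gc_fraction(seq):
    """Fraction of G and C nucleotides in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_gc_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc, z) for windows whose GC content deviates from the
    mean over all windows by at least z_cutoff standard deviations.
    window, step and z_cutoff are illustrative values, not recommendations."""
    starts = range(0, len(genome) - window + 1, step)
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    if len(gcs) < 2:
        return []
    mu, sigma = mean(gcs), stdev(gcs)
    if sigma == 0:
        return []  # perfectly homogeneous genome: nothing stands out
    return [(s, gc, (gc - mu) / sigma)
            for s, gc in zip(starts, gcs)
            if abs((gc - mu) / sigma) >= z_cutoff]

# Toy genome: an AT-rich host with a 1 kb GC-rich insert at position 4000.
toy_genome = "AT" * 2000 + "GC" * 500 + "AT" * 2500
flagged = atypical_gc_windows(toy_genome)
```

In this toy case only the window that lies entirely inside the insert is flagged; windows that merely overlap it fall below the cutoff, which illustrates how window size limits the resolution of the method.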

Oligonucleotide frequencies

Oligonucleotide frequency varies less within a genome than between genomes and therefore represents a valid genomic signature[21]. Any deviation from this signature suggests that a genomic segment may have arrived through horizontal transfer. This discriminatory power relies upon the large number of possible oligonucleotides. To demonstrate, if 'n' is the size of the vocabulary and 'w' is oligonucleotide size, the number of possible distinct oligonucleotides is n^w; for example, there are 4^4 = 256 possible tetranucleotides.
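As a small illustration of the counting argument, the following sketch enumerates all words of length w over the DNA alphabet and computes their relative frequencies in a sequence. The function names are hypothetical, and overlapping words are counted, which is one common convention but an assumption here.

```python
from collections import Counter
from itertools import product

def possible_oligos(w, alphabet="ACGT"):
    """All n^w distinct words of length w over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=w)]

def oligo_frequencies(seq, w):
    """Relative frequency of every possible w-mer, counting overlapping words.
    Words containing characters outside the alphabet are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + w] for i in range(len(seq) - w + 1))
    words = possible_oligos(w)
    total = sum(counts[word] for word in words)
    return {word: (counts[word] / total if total else 0.0) for word in words}

tetranucleotides = possible_oligos(4)          # 4^4 words
dinuc = oligo_frequencies("ACGTACGT", 2)       # AC, CG, GT, TA, AC, CG, GT
```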

One of the first detection methods used in methodical assessments of HGT was codon usage bias, which uses trinucleotide frequencies[10]. This approach requires that the host genome has a strong bias towards certain synonymous codons (different codons which code for the same amino acid) and that this bias is distinct from the bias found within the donor genome. In contrast, the simplest oligonucleotide used as a genomic signature is the dinucleotide. For example, the dinucleotide formed by the third nucleotide of a codon and the first nucleotide of the following codon is the one least restricted by amino acid preference and codon usage[22].
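Both signatures can be computed directly from a coding sequence. The sketch below assumes the input is an in-frame coding sequence starting at the first codon position; the function names are hypothetical, and no attempt is made to group codons by amino acid as a full codon usage analysis would.

```python
from collections import Counter

def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()}

def dinucleotide_3_1(cds):
    """Relative frequency of the dinucleotide spanning codon position 3
    and position 1 of the following codon."""
    cds = cds.upper()
    pairs = [cds[i + 2:i + 4] for i in range(0, len(cds) - 3, 3)]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}

codons = codon_frequencies("ATGGCCGCC")   # ATG, GCC, GCC
junctions = dinucleotide_3_1("ATGGCCTAA") # G|G and C|T
```

A gene whose codon or junction-dinucleotide frequencies differ strongly from those of the rest of the genome would then be a candidate for closer inspection.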

Optimising the size of the sliding window is of great importance, as a larger sliding window can better account for the variability in the host genome (see figure 2) at the cost of a reduced ability to detect smaller HGT regions[23]. To balance reliability with computational demand, one suggested compromise is to compute tetranucleotide frequencies in a sliding window, such as a 5 kb window with a step of 0.5 kb[24].
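A rough sketch of such a scan is given below, using the 5 kb window and 0.5 kb step mentioned above. The choice of distance (the sum of absolute frequency differences to the genome-wide signature) is an assumption made for simplicity and is not necessarily the measure used in the cited work; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

WORDS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 tetranucleotides

def signature(seq):
    """Tetranucleotide frequency vector of a sequence (overlapping words)."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[w] for w in WORDS) or 1
    return [counts[w] / total for w in WORDS]

def window_distances(genome, window=5000, step=500):
    """Distance of each window's signature to the whole-genome signature.
    Returns a list of (window_start, distance)."""
    genome = genome.upper()
    reference = signature(genome)
    result = []
    for start in range(0, len(genome) - window + 1, step):
        local = signature(genome[start:start + window])
        distance = sum(abs(a - b) for a, b in zip(local, reference))
        result.append((start, distance))
    return result

# Toy genome: a 5 kb segment with a different word usage inserted at 10 kb.
toy = "ACGT" * 2500 + "GGCC" * 1250 + "ACGT" * 1250
distances = window_distances(toy)
most_atypical_start = max(distances, key=lambda item: item[1])[0]
```

Windows with the largest distances are the candidates; where to draw the threshold is a separate question that depends on the natural variability of the host genome.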

A more complex method of characterising a genomic signature utilises a set of typical host genes[citation needed]. In a Markov model-based approach, a transition probability matrix is derived from the typical genes[25]; in a Bayesian model, the posterior probabilities of a sequence are calculated based upon the typical genomic signatures[26].
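To make the Markov model idea concrete, the following sketch estimates a first-order transition matrix from a set of typical genes and scores a query sequence by its mean log-likelihood per transition. The first-order model, the pseudocount and the function names are simplifying assumptions for illustration; the cited methods may use higher-order models and different scoring.

```python
import math

ALPHABET = "ACGT"

def train_transitions(typical_genes, pseudocount=1.0):
    """Estimate P(next nucleotide | current nucleotide) from typical genes.
    A pseudocount avoids zero probabilities for unseen transitions."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for gene in typical_genes:
        gene = gene.upper()
        for a, b in zip(gene, gene[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET}

def mean_log_likelihood(seq, transitions):
    """Average log-probability per transition of seq under the model.
    Lower values mean the sequence looks less like the typical genes."""
    seq = seq.upper()
    steps = [(a, b) for a, b in zip(seq, seq[1:])
             if a in transitions and b in transitions[a]]
    if not steps:
        return float("nan")
    return sum(math.log(transitions[a][b]) for a, b in steps) / len(steps)

model = train_transitions(["ACGT" * 50])
typical_score = mean_log_likelihood("ACGTACGT", model)
atypical_score = mean_log_likelihood("GGCCGGCC", model)
```

Genes scoring well below the distribution of scores for the typical genes would be treated as atypical, and hence as HGT candidates.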

Structural features

Just as the nucleotide composition of a DNA molecule can be represented by a sequence of letters, its structural features can be encoded in a numerical sequence. The structural features include interaction energies between neighbouring base pairs[27], the twist that makes two bases of a pair non-coplanar[28], or DNA deformability induced by the proteins shaping the chromatin[29]. The autocorrelation analysis of this numerical sequence shows characteristic periodicities in complete genomes[30]. In fact, upon detecting archaea-like regions in the thermophilic bacterium Thermotoga maritima[31], periodicity spectra of these regions were compared to the periodicity spectra of the homologous regions in the archaeon Pyrococcus horikoshii[11]. The revealed similarities in the periodicity were strong supporting evidence for a case of massive HGT between two kingdoms: bacteria and archaea[11].

Genomic context

The existence of genomic islands, short (typically 10-200kb long) regions of a genome which have been acquired horizontally, lends support to the ability to identify non-native genes by their location in a genome[32]. For example, a gene of ambiguous origin which forms part of a non-native operon could be considered to be non-native. Alternatively, flanking repeat sequences or the presence of nearby integrases or transposases can indicate a non-native region[33]. A context-aware approach has been considered as a secondary identification method, after removal of genes which are significantly native or non-native through the use of other parametric methods[5].
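One simple context-aware heuristic, flagging genes that lie close to an annotated integrase or transposase, can be sketched as follows. The input format (an ordered list of gene identifiers with free-text product descriptions), the keyword list and the neighbourhood radius are all assumptions made for this example.

```python
MOBILITY_KEYWORDS = ("integrase", "transposase")  # illustrative keyword list

def genes_near_mobility_genes(annotations, radius=2):
    """Return the identifiers of genes within `radius` genes of a gene whose
    product description mentions an integrase or transposase.

    annotations: list of (gene_id, product_description) in chromosomal order.
    """
    mobile_positions = [
        index for index, (_, product) in enumerate(annotations)
        if any(keyword in product.lower() for keyword in MOBILITY_KEYWORDS)
    ]
    flagged = set()
    for index in mobile_positions:
        low = max(0, index - radius)
        high = min(len(annotations), index + radius + 1)
        flagged.update(gene_id for gene_id, _ in annotations[low:high])
    return flagged

# Hypothetical annotation of seven consecutive genes.
example = [
    ("g1", "DNA polymerase III"), ("g2", "hypothetical protein"),
    ("g3", "ABC transporter"), ("g4", "phage integrase"),
    ("g5", "hypothetical protein"), ("g6", "ribosomal protein L2"),
    ("g7", "ATP synthase subunit"),
]
context_candidates = genes_near_mobility_genes(example, radius=1)
```

Such a flag is only circumstantial evidence, which is consistent with its suggested use as a secondary step after other parametric methods.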

Explicit phylogenetic methods

Figure 3. Explicit phylogenetic approaches focus on identifying trees which conflict with more reliable trees. Here a hypothetical gene tree depicts a frog gene within a group of mammals, a clear conflict which would suggest an HGT event from mammals to frogs.

The aim of explicit phylogenetic methods is to compare phylogenetic trees for various genes with the tree for their associated species. Significant differences between the two can be suggestive of an HGT event (see figure 3). Such an approach can produce more detailed results than parametric approaches because the involved species, time and direction of transfer can potentially be identified.

As discussed in more detail below, phylogenetic methods range from simple and efficient methods of discordance identification to complex mechanistic models that infer probable sequences of HGT events. An intermediate strategy consists of deconstructing the gene tree into smaller parts until it matches the species tree (genome spectral approaches). Explicit phylogenetic methods rely upon the accuracy of the input species and gene trees. However, reconstructing a well-resolved, rooted gene tree or species tree can be computationally challenging.

Even if there is no doubt about the input trees, conflicting phylogenies can be the result of evolutionary processes other than HGT, such as gene duplication and loss (which can lead to undetected paralogy) or incomplete lineage sorting. An additional complication arises if the donor species is not represented among the set of species (or their ancestors) considered.

Tests of topologies

To detect sets of genes that fit poorly to the reference tree, one can use statistical tests of topology, such as Kishino-Hasegawa (KH)[34], Shimodaira-Hasegawa (SH)[35], and Approximately Unbiased (AU)[36]. These tests assess the likelihood of the gene sequence alignment when the reference topology is given as the null hypothesis.

The rejection of the reference topology is an indication that the evolutionary history for that gene family is inconsistent with the reference tree. When these inconsistencies cannot be explained using a small number of non-horizontal events such as gene loss or mutational change, an HGT event is inferred.

[To do: check for likelihood ratio tests as well; include EEEP (uses Bayesian posterior probability).]

One such analysis checked for HGT in groups of homologs, as best bidirectional hits, of the γ-Proteobacterial lineage[37]. Here six reference trees were reconstructed using either the highly conserved small subunit ribosomal RNA sequences, a consensus of the available gene trees or concatenated alignments of orthologs. The failure to reject the six evaluated topologies, and the rejection of seven alternative topologies, was interpreted as evidence for a small number of HGT events in the selected groups.

Tests of topology provide a way to account for the uncertainty in tree reconstruction but they do not indicate where on the tree any HGT events may have occurred. For that, genome spectral or subtree pruning and regrafting methods are required.

Genome spectral approaches

In order to identify the location of HGT events, genome spectral approaches decompose a gene tree into substructures (such as bipartitions or quartets) and identify those that are consistent or inconsistent with the species tree.

Bipartitions: Removing one edge from a reference tree produces two unconnected sub-trees, whose leaves form two disjoint sets (a bipartition). If a bipartition can exist in both the gene tree and the species tree it is compatible; otherwise it is conflicting. These conflicts can indicate an HGT event, or may be the result of uncertainty in gene tree inference. To reduce uncertainty, bipartition analyses typically focus on strongly supported bipartitions, such as those associated with a branch with a bootstrap value above a certain threshold. Any gene family found to have one or several conflicting, but strongly supported, bipartitions is considered an HGT candidate[38]. By considering how a particular bipartition conflicts with the reference tree (e.g. which 'leaves' are on the 'wrong' side), a plausible HGT scenario can be inferred.
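The compatibility check can be sketched on trees written as nested tuples of leaf names. This representation, the function names and the absence of any bootstrap filtering are assumptions made to keep the example short; two bipartitions A|B and C|D are treated as compatible when at least one of the four pairwise intersections is empty.

```python
def leaves(tree):
    """Set of leaf names below a node (leaves are strings, nodes are tuples)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def bipartitions(tree):
    """Non-trivial bipartitions induced by the internal edges of the tree."""
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits

def compatible(split1, split2):
    """Two bipartitions can coexist in one tree if some pair of sides is disjoint."""
    a, b = tuple(split1)
    c, d = tuple(split2)
    return any(not (x & y) for x in (a, b) for y in (c, d))

def conflicting_bipartitions(gene_tree, species_tree):
    """Gene-tree bipartitions that cannot exist on the species tree."""
    reference = bipartitions(species_tree)
    return [g for g in bipartitions(gene_tree)
            if not all(compatible(g, s) for s in reference)]

species = ((("A", "B"), "C"), ("D", "E"))
gene = ((("A", "D"), "C"), ("B", "E"))
conflicts = conflicting_bipartitions(gene, species)
```

In practice only bipartitions with strong support would be passed to such a check, so that conflicts caused by poorly resolved gene trees are not mistaken for HGT.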

Quartet decomposition: Quartets are trees consisting of four leaves. In bifurcating (fully resolved) trees, each internal branch induces a quartet whose leaves are either subtrees of the original tree or actual leaves of the original tree. Quartets are often utilised in the construction of larger phylogenies[39]. By deconstructing candidate phylogenies into quartets and comparing these to all candidate trees, potential HGT events can be flagged within incompatible quartets[40].
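A simplified version of quartet comparison, restricted to quartets of individual leaves, might look as follows. Trees are again assumed to be nested tuples of leaf names, and the function names are hypothetical. For four leaves, the induced topology is found by looking for a clade that contains exactly two of them.

```python
from itertools import combinations

def leaf_set(tree):
    """Leaves below a node; trees are nested tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """Leaf sets of every internal node of the tree."""
    if isinstance(tree, str):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= clades(child)
    return found

def quartet_topology(tree_clades, quartet):
    """Return the split {{a, b}, {c, d}} induced on four leaves, or None if
    the tree does not resolve them."""
    four = frozenset(quartet)
    for clade in tree_clades:
        inside = clade & four
        if len(inside) == 2:
            return frozenset([inside, four - inside])
    return None

def conflicting_quartets(gene_tree, species_tree):
    """Quartets of shared leaves resolved differently by the two trees."""
    shared = sorted(leaf_set(gene_tree) & leaf_set(species_tree))
    gene_clades, species_clades = clades(gene_tree), clades(species_tree)
    conflicts = []
    for quartet in combinations(shared, 4):
        in_gene = quartet_topology(gene_clades, quartet)
        in_species = quartet_topology(species_clades, quartet)
        if in_gene is not None and in_species is not None and in_gene != in_species:
            conflicts.append(quartet)
    return conflicts

species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = (("A", "B"), (("C", "D"), "E"))
quartet_conflicts = conflicting_quartets(gene_tree, species_tree)
```

Here both conflicting quartets contain C, D and E, which points to the part of the tree where the gene history departs from the species history.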

Subtree pruning and regrafting

A mechanistic way of modelling an HGT event on the reference tree is to first cut an edge (prune the tree) and then regraft the sub-tree onto another edge[41]. If the gene tree was topologically consistent with the original reference tree, the editing results in an inconsistency. Conversely, when the original gene tree is inconsistent with the reference tree, it is possible to prune and regraft the reference tree to obtain a consistent topology. By interpreting the edit path of pruning and regrafting, HGT candidate nodes can be flagged and the host and donor genomes inferred[42].

Computing the SPR distance is NP-hard[43], so the problem becomes considerably more difficult as more nodes are considered. The computational challenge lies in finding the optimal edit path, the one that requires the fewest steps[44][45], and different strategies are used to solve the problem. For example, the HorizStory algorithm reduces the problem by first eliminating the consistent nodes[46]; recursive pruning and regrafting then reconciles the reference tree with the gene tree, and the optimal edits are interpreted as HGT events.
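A single prune-and-regraft move can be illustrated on a rooted tree written as nested tuples. The sketch below handles only the simplest case, in which the pruned sub-tree is one leaf and it is regrafted as the sister of another leaf; the function names are hypothetical, and finding the shortest sequence of such moves between two trees (the hard part) is not attempted.

```python
def prune(tree, name):
    """Remove leaf `name` and suppress any node left with a single child."""
    if isinstance(tree, str):
        return None if tree == name else tree
    kept = [child for child in (prune(c, name) for c in tree) if child is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)

def regraft(tree, name, sister):
    """Attach leaf `name` on the edge leading to leaf `sister`."""
    if isinstance(tree, str):
        return (tree, name) if tree == sister else tree
    return tuple(regraft(child, name, sister) for child in tree)

def spr_move(tree, name, sister):
    """One subtree-prune-and-regraft move for a single-leaf sub-tree."""
    return regraft(prune(tree, name), name, sister)

reference = ((("A", "B"), "C"), ("D", "E"))
# Model a transfer that places the gene of A next to that of E.
edited = spr_move(reference, "A", "E")
```

Read as an HGT scenario, the move above would correspond to the lineage of E donating the gene to A, replacing the vertically inherited copy.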

Implicit phylogenetic methods

Figure 4. A strikingly irregular distance between two leaves of a phylogenetic gene tree, when compared to a tree for their associated genomes, is highly suggestive of an HGT event for that gene. Here an HGT event is likely to have occurred prior to C and D's separation.

In contrast to explicit phylogenetic methods, which rely upon the creation and compatibility of phylogenetic trees, implicit methods compare evolutionary distances. Here an unexpected distance from a given reference, which can be the gene family or genomic average, is suggestive of an HGT event (see figure 4). Because tree construction is not required, implicit approaches are faster and arguably more robust than explicit methods.

Implicit methods can, however, be limited by disparities between the phylogeny and evolutionary distances, or by over-reliance upon top BLAST hits that reflect closely-related rather than donor species[47]. Additionally, whilst a list of top sequence similarity hits is used to create phyletic patterns that detect if a gene was lost from the genome, considering gene remnants as lost genes could inflate the predicted number of HGT events.

In principle, implicit methods can detect all three types of HGT events: acquisition of a novel gene, acquisition of a paralog, and xenologous displacement of an ortholog. However, analysis is limited to the detection of xenologous displacement if exceptionally small groups, or those without a representative in all taxa (such as 'ORFans'), are omitted.

Sequence similarity

The potential identification of HGT events through sequence comparison is achieved when the top-scoring BLAST hits are associated with a distantly related species. For example, phyletic profiles of the bacterium Thermotoga maritima have shown that most of the best BLAST matches are in archaea rather than closely related bacteria[31]; these predictions were later supported by an analysis of the structural features of the DNA molecule[11].

However, this approach can be limited to uncovering relatively recent HGT events as speciation after a transfer will result in the top BLAST hit being a more closely related species, therefore potentially registering as a false negative.
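The top-hit criterion can be sketched without reference to any particular search tool, assuming the similarity search has already been run and summarised as a list of (taxon, score) pairs per gene. The data structure, the function name and the example values are hypothetical.

```python
def top_hit_candidates(hits_by_gene, close_relatives):
    """Flag genes whose best-scoring hit lies outside the close relatives.

    hits_by_gene: {gene_id: [(taxon, score), ...]} from a similarity search
                  that excludes the query genome itself.
    close_relatives: set of taxa in which the best hit would be expected.
    Returns {gene_id: taxon_of_best_hit} for the flagged genes.
    """
    flagged = {}
    for gene, hits in hits_by_gene.items():
        if not hits:
            continue  # no hit at all: handled by other criteria (e.g. ORFans)
        best_taxon, _ = max(hits, key=lambda hit: hit[1])
        if best_taxon not in close_relatives:
            flagged[gene] = best_taxon
    return flagged

# Hypothetical search summary for three genes of a bacterial genome.
search_summary = {
    "gene1": [("Bacteria", 410.0), ("Archaea", 120.0)],
    "gene2": [("Archaea", 380.0), ("Bacteria", 150.0)],
    "gene3": [],
}
similarity_candidates = top_hit_candidates(search_summary, {"Bacteria"})
```

As noted above, the best hit is not necessarily the nearest phylogenetic neighbour, so a flag of this kind is a starting point for further analysis rather than a conclusion.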

Outliers within orthologous groups

For a group of orthologs the molecular clock hypothesis states that the evolutionary distances of genes are proportional to the evolutionary distances of their respective genomes[48]. If a group of orthologs contains xenologs the proportionality of evolutionary relationships will only hold for orthologs, not the xenologs [49].

One approach finds violations of the expected evolutionary distances by ranking similarity scores of Open Reading Frames (ORFs) to a "virtual genome", a collection of ORFs of the respective strains from the GenBank database[50]. If an ORF's evolutionary distance to the virtual genome is inconsistent with the distances of other ORFs from the same genome, an HGT event is inferred. Another algorithm for HGT detection compares all pairs of genes in predefined groups of orthologs[51]: if a likelihood ratio test of the HGT hypothesis against the null hypothesis of no HGT rejects the null, a putative HGT event is inferred. In addition, a pair-wise comparison allows inference of potential donors and provides an estimate of the time since the HGT event.
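The proportionality argument can be turned into a crude outlier check: for each pair of genomes, divide the distance between the genes by the distance between the genomes, and flag pairs whose ratio is far from the typical ratio for the group. This is a deliberately simple stand-in for the likelihood ratio test described above; the fold threshold, the input format and the function name are assumptions.

```python
from statistics import median

def distance_outliers(gene_distances, genome_distances, fold=2.0):
    """Flag genome pairs whose gene distance is out of proportion.

    gene_distances, genome_distances: {(genome_a, genome_b): distance}.
    A pair is flagged when its gene/genome distance ratio differs from the
    median ratio of the orthologous group by more than `fold`.
    """
    ratios = {
        pair: gene_distances[pair] / genome_distances[pair]
        for pair in gene_distances
        if genome_distances.get(pair, 0) > 0
    }
    if not ratios:
        return []
    typical = median(ratios.values())
    if typical <= 0:
        return []
    return sorted(pair for pair, ratio in ratios.items()
                  if ratio < typical / fold or ratio > typical * fold)

# Hypothetical distances: the gene copies in A and D are far more similar
# than the A and D genomes are.
genome_d = {("A", "B"): 0.1, ("A", "C"): 0.2, ("B", "C"): 0.2,
            ("A", "D"): 0.4, ("B", "D"): 0.4, ("C", "D"): 0.4}
gene_d = {("A", "B"): 0.1, ("A", "C"): 0.2, ("B", "C"): 0.2,
          ("A", "D"): 0.05, ("B", "D"): 0.4, ("C", "D"): 0.4}
outlier_pairs = distance_outliers(gene_d, genome_d)
```

An unexpectedly small ratio, as for the pair (A, D) here, is the pattern expected after a recent transfer between the two lineages.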

Phyletic profiles

A group of orthologs or homologs can be analyzed in terms of the presence/absence of group members in the reference genomes; such patterns are called phyletic profiles[52]. To find HGT events, phyletic profiles are scanned for an unusual distribution of genes. Absence of a homolog in a group of closely related species is an indication that the examined gene might have arrived via an HGT event. For example, the genomes of three facultatively symbiotic Frankia sp. strains are of strikingly different sizes (5.43 Mbp, 7.50 Mbp and 9.04 Mbp), depending on their range of hosts[53]. Marked portions of strain-specific genes had no significant hit in the reference database, and were possibly acquired by HGT from other bacteria. Similarly, the three phenotypically diverse E. coli strains (uropathogenic, enterohemorrhagic and benign) share about 40% of the total combined gene pool, with the other 60% being strain-specific genes and HGT candidates[54]. Further evidence for these genes being present due to HGT was provided by strikingly different codon usage patterns from the core genes and a mostly conserved gene order[54].
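A scan for unusual phyletic profiles can be sketched as a set operation: flag families that are present in the genome of interest, absent from all of its close relatives, and present in at least one more distant genome. The input format, the function name and the example genomes are hypothetical.

```python
def patchy_families(profiles, focal, close_relatives):
    """Gene families with a patchy distribution around the focal genome.

    profiles: {family_id: set of genomes in which the family is present}.
    focal: the genome under study.
    close_relatives: set of genomes closely related to the focal genome.
    """
    flagged = []
    for family, genomes in profiles.items():
        if focal not in genomes:
            continue                      # family not in the genome of interest
        if genomes & close_relatives:
            continue                      # shared with a close relative
        if genomes - {focal}:
            flagged.append(family)        # only found in more distant genomes
    return sorted(flagged)

# Hypothetical presence/absence data for four gene families.
example_profiles = {
    "famA": {"focal", "rel1", "rel2", "distant1"},
    "famB": {"focal", "distant1", "distant2"},
    "famC": {"focal"},
    "famD": {"rel1", "distant1"},
}
profile_candidates = patchy_families(example_profiles, "focal", {"rel1", "rel2"})
```

The same pattern can also arise from gene loss in the close relatives, so such families are candidates rather than confirmed transfers.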

Impact of polymorphic sites

It is commonly considered that genes are the units transferred through an HGT event, however it is also possible for recombination to occur within genes. For example, it has been shown that horizontal transfer between closely related species often results in the exchange of ORF fractions[55][56]. The analysis of a group of four E. coli and two Shigella flexneri strains also revealed that the sequence stretches common to all six strains contain polymorphic sites, consequences of homologous recombination[57]. This method of detection is, however, restricted to the sites in common to all analysed species, limiting the analysis to a group of closely related organisms.


Assessing the methods used for detecting HGT events is crucial for the interpretation of their results but represents a significant challenge. Heterogeneity of the current methods has so far prevented a comprehensive assessment of all principles, although case studies on nitrogen fixation genes[58] and the use of artificial genomes[7] have shown the potential for benchmarking and subsequent refinement. There is therefore potential for a database of benchmarking results to be used in the development of new methods, but for now conclusions about the power of various HGT detection principles largely depend upon the theoretical considerations employed.

One major issue with existing HGT detection methods is the high rate of false positive and false negative results[59]. Related to this is the tendency for some inference methods to identify conflicting groups of potentially non-native genes[60][61]. To determine these rates, the number of non-native genes within a genome must be known. Whilst some HGT mechanisms leave tell-tale clues in the genome[62], obtaining an unbiased benchmark is hindered by the large evolutionary scale on which HGT operates. Evaluation strategies therefore utilise artificial genomes and phylogenetic trees to simulate known HGT events.

Artificial genomes: Inserting known donor genes into a known position in the host genome results in a chimeric genome. The donor and host sequences can either be obtained from a sequence database[63] or be simulated in silico, for example using Markov models[64] or by simulating whole-genome evolution[65][40]. Because the number of non-native genes in these altered genomes is known, both type I and type II errors can be identified.

Subtree pruning and regrafting: The presence of an HGT event will cause the phylogenetic tree for that gene to conflict with the reference tree for the host species. With this considered, the effectiveness of a phylogeny-based method can be determined through the creation of trees which simulate HGT events. For example, by switching branches within a phylogenetic tree, known HGT events are simulated, allowing explicit phylogenetic methods to be tested[66].
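The bookkeeping behind such a benchmark can be sketched as follows: donor genes are inserted at random positions in a list of host genes, the true origin of every gene is recorded, and the predictions of any detection method are scored against that record. Genes are represented by identifiers only; the function names, the random insertion scheme and the fixed seed are assumptions for illustration.

```python
import random

def make_chimera(host_genes, donor_genes, n_insert, seed=0):
    """Insert n_insert randomly chosen donor genes at random positions.
    Returns a list of (gene_id, is_foreign) in genome order."""
    rng = random.Random(seed)
    chimera = [(gene, False) for gene in host_genes]
    for gene in rng.sample(donor_genes, n_insert):
        chimera.insert(rng.randrange(len(chimera) + 1), (gene, True))
    return chimera

def error_rates(chimera, predicted_foreign):
    """False positive (type I) and false negative (type II) rates of a
    set of genes predicted to be foreign."""
    false_pos = sum(1 for gene, foreign in chimera
                    if not foreign and gene in predicted_foreign)
    false_neg = sum(1 for gene, foreign in chimera
                    if foreign and gene not in predicted_foreign)
    n_native = sum(1 for _, foreign in chimera if not foreign)
    n_foreign = len(chimera) - n_native
    return {
        "false_positive_rate": false_pos / n_native if n_native else 0.0,
        "false_negative_rate": false_neg / n_foreign if n_foreign else 0.0,
    }

host = ["h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"]
donor = ["d1", "d2", "d3", "d4"]
artificial = make_chimera(host, donor, n_insert=2)
truth = {gene for gene, foreign in artificial if foreign}
perfect = error_rates(artificial, truth)
one_mistake = error_rates(artificial, truth | {"h1"})
```

How realistic the benchmark is depends on how the donor sequences are chosen and inserted, which this sketch does not address.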


Fran, Jelena, Daniel, Steffan provided constructive comments.


  1. ^ Griffith, Fred. 1928. The Significance of Pneumococcal Types. Journal of Hygiene 27(2):113-159.
  2. ^ Lederberg, J, and Tatum, E. 1946. Gene Recombination in Escherichia coli. Nature 158:558.
  3. ^ Zinder, N, and Lederberg, J. 1952. Genetic Exchange in Salmonella. Journal of Bacteriology 64(5):679-699.
  4. ^ a b Guindon, S, and G Perrière. 2001. Intragenomic base content variation is a potential source of biases when searching for horizontally transferred genes. Molecular Biology and Evolution 18 (9) (September): 1838-1840.
  5. ^ a b Azad, Rajeev K., and Jeffrey G. Lawrence. 2011. Towards more robust methods of alien gene detection. Nucleic Acids Research (February 4). doi:10.1093/nar/gkr059.
  6. ^ Xiong D, Xiao F, Liu L, Hu K, Tan Y, He S, Gao X. 2012. Towards a Better Detection of Horizontally Transferred Genes by Combining Unusual Properties Effectively. PLoS One. 7(8):e43126. doi: 10.1371/journal.pone.0043126.
  7. ^ a b c Becq, Jennifer, Cécile Churlaud, and Patrick Deschavanne. 2010. A Benchmark of Parametric Methods for Horizontal Transfers Detection. PLoS ONE 5 (4) (April 1): e9989. doi:10.1371/journal.pone.0009989.
  8. ^ Poptsova, M. 2009. Testing Phylogenetic Methods to Identify Horizontal Gene Transfer. Methods in Molecular Biology 532:227-240 doi:10.1007/978-1-60327-853-9_13.
  9. ^ a b Daubin, Vincent, Emmanuelle Lerat, and Guy Perrière. 2003. The source of laterally transferred genes in bacterial genomes. Genome Biology 4 (9): R57. doi:10.1186/gb-2003-4-9-r57.
  10. ^ a b Lawrence, J G, and H Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proceedings of the National Academy of Sciences of the United States of America 95 (16) (August 4): 9413-9417.
  11. ^ a b c d Worning, P, L J Jensen, K E Nelson, S Brunak, and D W Ussery. 2000. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima. Nucleic Acids Research 28 (3) (February 1): 706-709.
  12. ^ Deschavanne, P, and J Filipski. 1995. Correlation of GC content with replication timing and repair mechanisms in weakly expressed E.coli genes. Nucleic Acids Research 23 (8) (April 25): 1350-1353.
  13. ^ Wuitschick JD, Karrer KM. 1999. Analysis of genomic G + C content, codon usage, initiator codon context and translation termination sites in Tetrahymena thermophila. Journal of Eukaryotic Microbiology 46(3):239–47. doi:10.1111/j.1550-7408.1999.tb05120.x. PMID 10377985.
  14. ^ Lawrence, J G, and H Ochman. 1997. Amelioration of bacterial genomes: rates of change and exchange. Journal of Molecular Evolution 44 (4) (April): 383-397.
  15. ^ Vernikos, Georgios S, Nicholas R Thomson, and Julian Parkhill. 2007. Genetic flux over time in the Salmonella lineage. Genome Biology 8 (6): R100. doi:10.1186/gb-2007-8-6-r100.
  16. ^ Rendulic, Snjezana, Pratik Jagtap, Andrea Rosinus, Mark Eppinger, Claudia Baar, Christa Lanz, Heike Keller, et al. 2004. A predator unmasked: life cycle of Bdellovibrio bacteriovorus from a genomic perspective. Science (New York, N.Y.) 303 (5658) (January 30): 689-692. doi:10.1126/science.1093027.
  17. ^ Gophna, Uri, Robert L. Charlebois, and W. Ford Doolittle. 2006. Ancient lateral gene transfer in the evolution of Bdellovibrio bacteriovorus. Trends in Microbiology 14 (2) (February): 64-69. doi:10.1016/j.tim.2005.12.008.
  18. ^ Nakabachi, Atsushi, Atsushi Yamashita, Hidehiro Toh, Hajime Ishikawa, Helen E Dunbar, Nancy A Moran, and Masahira Hattori. 2006. The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science (New York, N.Y.) 314 (5797) (October 13): 267. doi:10.1126/science.1134196.
  19. ^ Liu, Zhandong, Santosh S Venkatesh, and Carlo C Maley. 2008. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples. BMC Genomics 9: 509. doi:10.1186/1471-2164-9-509.
  20. ^ Bentley, Stephen D, and Julian Parkhill. 2004. Comparative genomic structure of prokaryotes. Annual Review of Genetics 38: 771-792. doi:10.1146/annurev.genet.38.072902.094318.
  21. ^ Karlin, S and Burge, C. 1995. Dinucleotide relative abundance extremes: a genomic signature. Trends in Genetics 11(7):283-290. doi:10.1016/S0168-9525(00)89076-9
  22. ^ Hooper, Sean D, and Otto G Berg. 2002. Detection of genes with atypical nucleotide sequence in microbial genomes. Journal of Molecular Evolution 54 (3) (March): 365-375. doi:10.1007/s00239-001-0051-8.
  23. ^ Deschavanne, P, Giron, A, Vilain, J, Fagot, G, and Fertil B. 1999. Genomic Signature: Characterization and Classification of Species Assessed by Chaos Game Representation of Sequences. Molecular Biology and Evolution 16(10):1391-1399.
  24. ^ Dufraigne, Christine, Bernard Fertil, Sylvain Lespinats, Alain Giron, and Patrick Deschavanne. 2005. Detection and characterization of horizontal transfers in prokaryotes using genomic signature. Nucleic Acids Research 33 (1): e6-e6. doi:10.1093/nar/gni004.
  25. ^ Cortez, Diego, Patrick Forterre, and Simonetta Gribaldo. 2009. A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes. Genome Biology 10 (6): R65. doi:10.1186/gb-2009-10-6-r65.
  26. ^ Nakamura, Yoji, Takeshi Itoh, Hideo Matsuda, and Takashi Gojobori. 2004. Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet 36 (7) (July): 760-766. doi:10.1038/ng1381.
  27. ^ Ornstein, Rick L, Robert Rein, Donnal L Breen, and Robert D Macelroy. 1978. An optimized potential function for the calculation of nucleic acid interaction energies I. Base stacking. Biopolymers 17 (10) (October 1): 2341-2360. doi:10.1002/bip.1978.360171005.
  28. ^ el Hassan, M A, and C R Calladine. 1996. Propeller-twisting of base-pairs and the conformational mobility of dinucleotide steps in DNA. Journal of Molecular Biology 259 (1) (May 31): 95-103. doi:10.1006/jmbi.1996.0304.
  29. ^ Olson, Wilma K., Andrey A. Gorin, Xiang-Jun Lu, Lynette M. Hock, and Victor B. Zhurkin. 1998. DNA sequence-dependent deformability deduced from protein DNA crystal complexes. Proceedings of the National Academy of Sciences 95 (19): 11163 -11168. doi:10.1073/pnas.95.19.11163.
  30. ^ Herzel, H, O Weiss, and E N Trifonov. 1999. 10-11 bp periodicities in complete genomes reflect protein structure and DNA folding. Bioinformatics (Oxford, England) 15 (3) (March): 187-193.
  31. ^ a b Nelson, K E, R A Clayton, S R Gill, M L Gwinn, R J Dodson, D H Haft, E K Hickey, et al. 1999. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399 (6734) (May 27): 323-329. doi:10.1038/20601.
  32. ^ Langille MG, Hsiao WW, Brinkman FS. 2010. Detecting genomic islands using bioinformatics approaches. Nature Reviews Microbiology 8(5):373-82. doi: 10.1038/nrmicro2350.
  33. ^ Hacker J, Blum-Oehler G, Mühldorfer I, Tschäpe H. 1997. Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Molecular Microbiology 23(6):1089-97. doi: 10.1046/j.1365-2958.1997.3101672.x
  34. ^ Goldman, N, J P Anderson, and A G Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49 (4) (December): 652-670.
  35. ^ Shimodaira, H, and M Hasegawa. 1999. Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. Molecular Biology and Evolution 16 (8): 1114.
  36. ^ Shimodaira, Hidetoshi. 2002. An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51 (3) (June): 492-508. doi:10.1080/10635150290069913.
  37. ^ Lerat, Emmanuelle, Vincent Daubin, and Nancy A Moran. 2003. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biology 1 (1) (October): E19. doi:10.1371/journal.pbio.0000019.
  38. ^ Zhaxybayeva, Olga, Lutz Hamel, Jason Raymond, and J Peter Gogarten. 2004. Visualization of the phylogenetic content of five genomes using dekapentagonal maps. Genome Biology 5 (3): R20. doi:10.1186/gb-2004-5-3-r20.
  39. ^ Ranwez, Vincent and Gascuel, Olivier. 2001. Quartet-Based Phylogenetic Inference: Improvements and Limits. Molecular Biology and Evolution 18 (6): 1103-1116.
  40. ^ a b Zhaxybayeva, Olga, J Peter Gogarten, Robert L Charlebois, W Ford Doolittle, and R Thane Papke. 2006. Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res 16 (9) (September): 1099-108. doi:10.1101/gr.5322306.
  41. ^ Abby, Sophie, Eric Tannier, Manolo Gouy, and Vincent Daubin. 2010. Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests. BMC Bioinformatics 11 (1) (June): 324. doi:10.1186/1471-2105-11-324.
  42. ^ Beiko, Robert G, Timothy J Harlow, and Mark A Ragan. 2005. Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA 102 (40) (October): 14332-7. doi:10.1073/pnas.0504068102.
  43. ^ Hickey G, Dehne F, Rau-Chaplin A, and Blouin C. 2008. SPR Distance Computation for Unrooted Trees. Evolutionary Bioinformatics Online 4:17–27.
  44. ^ Hein, Jotun, Tao Jiang, Lusheng Wang, and Kaizhong Zhang. 1995. On the Complexity of Comparing Evolutionary Trees.
  45. ^ Allen, Benjamin L., and Mike Steel. 2001. Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees. Annals of Combinatorics 5 (1) (June): 1-15. doi:10.1007/s00026-001-8006-8.
  46. ^ MacLeod, Dave, Robert L Charlebois, Ford Doolittle, and Eric Bapteste. 2005. Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement. BMC Evolutionary Biology 5: 27. doi:10.1186/1471-2148-5-27.
  47. ^ Koski, L B, and G B Golding. 2001. The closest BLAST hit is often not the nearest neighbor. Journal of Molecular Evolution 52 (6) (June): 540-542. doi:10.1007/s002390010184.
  48. ^ Bromham, Lindell, and David Penny. 2003. The modern molecular clock. Nature Reviews. Genetics 4 (3) (March): 216-224. doi:10.1038/nrg1020.
  49. ^ Novichkov, Pavel, Marina Omelchenko, Mikhail Gelfand, Andrei Mironov, Yuri Wolf, and Eugene Koonin. 2004. Genome-Wide Molecular Clock and Horizontal Gene Transfer in Bacterial Evolution. The Journal of Bacteriology 186 (19) (October): 6575. doi:10.1128/JB.186.19.6575-6585.2004.
  50. ^ Clarke, G D Paul, Robert G Beiko, Mark A Ragan, and Robert L Charlebois. 2002. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. Journal of Bacteriology 184 (8) (April): 2072-2080.
  51. ^ Dessimoz, Christophe, Daniel Margadant, and Gaston H Gonnet. 2008. DLIGHT - Lateral Gene Transfer Detection Using Pairwise Evolutionary Distances in a Statistical Framework. Springer, 4955:315-330.
  52. ^ Kensche, Philip R, Vera van Noort, Bas E Dutilh, and Martijn A Huynen. 2008. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. Journal of the Royal Society, Interface / the Royal Society 5 (19) (February 6): 151-170. doi:10.1098/rsif.2007.1047.
  53. ^ Normand, Philippe, Pascal Lapierre, Louis S Tisa, Johann Peter Gogarten, Nicole Alloisio, Emilie Bagnarol, Carla A Bassi, et al. 2007. Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography. Genome Research 17 (1) (January): 7-15. doi:10.1101/gr.5798407.
  54. ^ a b Welch, R A, V. Burland, G. Plunkett, P. Redford, P. Roesch, D. Rasko, E L Buckles, et al. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 99 (26) (December): 17020-4. doi:10.1073/pnas.252529799.
  55. ^ Ochman, H, J G Lawrence, and E A Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405 (6784) (May 18): 299-304. doi:10.1038/35012500.
  56. ^ Papke, R Thane, Jeremy E Koenig, Francisco Rodríguez-Valera, and W Ford Doolittle. 2004. Frequent recombination in a saltern population of Halorubrum. Science (New York, N.Y.) 306 (5703) (December 10): 1928-1929. doi:10.1126/science.1103289.
  57. ^ Mau, Bob, Jeremy D Glasner, Aaron E Darling, and Nicole T Perna. 2006. Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli. Genome Biology 7 (5): R44. doi:10.1186/gb-2006-7-5-r44.
  58. ^ Kechris, Katherina J, Jason C Lin, Peter J Bickel, and Alexander N Glazer. 2006. Quantitative exploration of the occurrence of lateral gene transfer by using nitrogen fixation genes as a case study. Proceedings of the National Academy of Sciences of the United States of America 103 (25) (June 20): 9584-9589. doi:10.1073/pnas.0603534103.
  59. ^ Poptsova, Maria S, and J Peter Gogarten. 2007. The power of phylogenetic approaches to detect horizontally transferred genes. BMC Evolutionary Biology 7: 45. doi:10.1186/1471-2148-7-45.
  60. ^ Ragan, Mark. 2001. On surrogate methods for detecting lateral gene transfer. FEMS Microbiology Letters 201(2):187-191.
  61. ^ Lawrence, J G, and H Ochman. 2002. Reconciling the many faces of lateral gene transfer. Trends in Microbiology 10 (1): 1-4.
  62. ^ Zaneveld, Jesse R, Diana R Nemergut, and Rob Knight. 2008. Are all horizontal gene transfers created equal? Prospects for mechanism-based studies of HGT patterns. Microbiology (Reading, England) 154 (Pt 1) (January): 1-15. doi:10.1099/mic.0.2007/011833-0.
  63. ^ Cortez, Diego Q, Antonio Lazcano, and Arturo Becerra. 2005. Comparative analysis of methodologies for the detection of horizontally transferred genes: a reassessment of first-order Markov models. In Silico Biology 5 (5-6): 581-592.
  64. ^ Azad, Rajeev K, and Jeffrey G Lawrence. 2005. Use of Artificial Genomes in Assessing Methods for Atypical Gene Detection. PLoS Comput Biol 1 (6) (November 11): e56. doi:10.1371/journal.pcbi.0010056.
  65. ^ Galtier, Nicolas. 2007. A model of horizontal gene transfer and the bacterial phylogeny problem. Systematic Biology 56 (4) (August): 633-642. doi:10.1080/10635150701546231.
  66. ^ Beiko, Robert G, and Nicholas Hamilton. 2006. Phylogenetic identification of lateral genetic transfer events. BMC Evolutionary Biology 6: 15. doi:10.1186/1471-2148-6-15.
