Several alignment-free methods for computing DNA distances have been proposed. In general, these methods rely on statistics of word frequencies (i.e., k-tuples) combined with metrics such as weighted Euclidean distance, correlation, covariance, information-theoretic measures, and angle metrics. Other methods, based on graphical DNA representations, instead use dinucleotide (doublet) histograms, graph theory, trinucleotide (triplet) curves, or the average bandwidth of distance/distance (D/D) matrices. A widely used tool for computing phylogenetic trees is the Phylogeny Inference Package (Phylip), which implements methods such as parsimony, jackknifing, bootstrapping, and consensus trees, and accepts molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters as input.
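To make the word-frequency approach concrete, the following is a minimal sketch, assuming the simplest variant: each sequence becomes a vector of relative k-tuple frequencies over the alphabet ACGT, and two vectors are compared with an unweighted Euclidean distance and an angle (cosine-based) metric. The function names `kmer_vector`, `euclidean`, and `angle` are hypothetical and do not correspond to any specific published method; the weighting schemes mentioned above are omitted.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Relative frequency of every possible k-tuple, in lexicographic order."""
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(max(windows, 0)))
    total = max(windows, 1)  # avoid division by zero for very short sequences
    return [counts["".join(w)] / total for w in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Unweighted Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle(u, v):
    """Angle (radians) between two frequency vectors; 0 means identical composition."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("sequence shorter than k yields an all-zero vector")
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


if __name__ == "__main__":
    a = kmer_vector("ACGTACGTAC", k=2)
    b = kmer_vector("ACGTTCGTAC", k=2)
    print(euclidean(a, b), angle(a, b))
```

Note that both metrics ignore where a k-tuple occurs in the sequence, which is what makes the approach alignment-free.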
Several elements may affect GAFD results to different extents. Although related, these elements can be grouped by the source of two critical phenomena: those related to the method, such as sequence-to-signal mapping and zero padding, and those related to the nature of the genomic grammar. Regarding the former, GAFD sequence-to-signal mapping is based on uniformly Euclidean, one-dimensional mappings. Although higher-order NN mappings may yield different comparison results under particular sequence conditions, they are very difficult to implement and analyze. Additionally, since biologically meaningful sequences are not necessarily of the same length, power spectrum comparisons are influenced by zero padding. Regarding the latter, genomic information typically contains challenging sequence features, e.g., palindromes, inversions, translocations, repeats, duplications, and indels. Each of these produces distinct characteristics in the power spectrum, which can in turn lead to inconsistencies when several sequence comparisons are performed. For example, differences among inversions, translocations, and palindromes may not be observable, whereas repeats and duplications will display specific frequency peaks. How the interaction among all of these elements affects GAFD analysis is beyond the scope of this paper. Moreover, NN mappings that use context-sensitive information, as well as DNA distance determination through power spectrum comparison, should be explored in future work.
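The interplay between one-dimensional mapping, zero padding, and power spectrum comparison can be illustrated with a small sketch. It assumes a hypothetical evenly spaced mapping (A=1, C=2, G=3, T=4), which is not necessarily the mapping used by GAFD; nonzero values are chosen so that padded zeros remain distinguishable from any nucleotide. The shorter signal is zero-padded to the length of the longer one, power spectra are computed with a naive discrete Fourier transform, and the spectra are compared with a Euclidean distance. The names `MAPPING`, `to_signal`, `power_spectrum`, and `spectral_distance` are illustrative only.

```python
import cmath
import math

# Hypothetical evenly spaced 1-D mapping (an assumption, not the paper's mapping).
# Zero is avoided so that zero padding cannot be confused with a nucleotide.
MAPPING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def to_signal(seq):
    """Map a DNA string to a real-valued signal."""
    return [MAPPING[base] for base in seq.upper()]


def power_spectrum(signal, n):
    """|DFT|^2 of `signal` zero-padded to length n (naive O(n^2) DFT)."""
    if n < len(signal):
        raise ValueError("padded length must be at least the signal length")
    padded = signal + [0.0] * (n - len(signal))
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(padded))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between power spectra after padding to a common length."""
    n = max(len(seq_a), len(seq_b))
    pa = power_spectrum(to_signal(seq_a), n)
    pb = power_spectrum(to_signal(seq_b), n)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb)))


if __name__ == "__main__":
    seq = "ACGTTGCAAG"
    # Padding changes the spectrum even when one sequence is a prefix of the other.
    print(spectral_distance("ACGT", "ACGTA"))
    # Reversal leaves the power spectrum of a real signal unchanged.
    print(spectral_distance(seq, seq[::-1]))
```

The second comparison returns a distance of essentially zero because, for a real-valued signal, reversing the sequence only changes the phase of each Fourier coefficient, not its magnitude. This is one concrete way in which some rearrangements can become invisible to a power spectrum comparison.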