Hierarchical multifractal representation of symbolic sequences and application to human chromosomes
Source: Phys. Rev. E 81, 026102 (2010); doi:10.1103/PhysRevE.81.026102
Published 8 February 2010
The two-dimensional density correlation matrix is constructed for symbolic sequences using contiguous segments of arbitrary size. The multifractal spectrum obtained from this matrix motif is shown to characterize the correlations in the symbolic sequences. The method is applied to entire human chromosomes, shuffled human chromosomes, reconstructed human genomic sequences, and artificial random sequences. All human chromosomes are shown to share common characteristics in their multifractal spectra and to deviate substantially from random, uncorrelated sequences of the same size. Small deviations are observed between the longer and the shorter chromosomes, especially for the statistical moments of higher order (in absolute value). The correlations are crucial for the form of the multifractal spectrum: surrogate shuffled chromosomes present a random-like spectrum, distinctly different from that of the actual chromosomes. Analytical approaches based on hierarchical superposition of tensor products show that retaining pair correlations in the sequences leads to a closer representation of the genomic multifractal spectra, especially in the region of negative exponents, owing to the underrepresentation of various functional units (such as the cytosine-guanine CG combination and its complementary GC complex). Retaining higher-order correlations in the construction of the tensor products brings the model spectra still closer to the multifractal spectra of the actual genomic sequences. This hierarchical approach is generic and applicable to other correlated symbolic sequences.
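The abstract gives only an outline of the method, so the Python sketch below illustrates the general idea under assumptions of its own. Overlapping words of length n are placed on a 2^n x 2^n grid by a nested quadrant rule (the letter-to-quadrant table `QUADRANT` is hypothetical). The partition sum Z(q, n) = Σ p_i^q is taken over non-empty cells, and τ(q) is fitted as the slope of log Z against log ε with ε = 2^-n, so that D_q = τ(q)/(q - 1) for q ≠ 1. The helper `tau_iid` shows the tensor-product baseline for an uncorrelated sequence: the expected level-n matrix is then the n-fold Kronecker product of the 2 x 2 single-letter matrix, giving τ(q) = -log2 Σ_a p_a^q. All function names are illustrative, and the paper's exact grid layout, normalization, and fitting procedure may differ.

```python
import math
import random
from collections import Counter

# Hypothetical letter-to-quadrant assignment (row, col) within a 2x2 block.
# The paper's actual layout may differ; any fixed bijection works for this sketch.
QUADRANT = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def word_cell(word):
    """Map a word of length n to a (row, col) cell of a 2**n x 2**n grid.

    Each letter selects a quadrant inside the block chosen by the previous
    letters, giving a hierarchical (nested) placement of all 4**n words.
    """
    row = col = 0
    for letter in word:
        r, c = QUADRANT[letter]
        row = 2 * row + r
        col = 2 * col + c
    return row, col


def density_matrix(seq, n):
    """Relative frequencies of all overlapping length-n words, keyed by grid cell."""
    counts = Counter(word_cell(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {cell: k / total for cell, k in counts.items()}


def partition_sum(probs, q):
    """Z(q, n) = sum of p**q over non-empty cells (empty cells are skipped)."""
    return sum(p ** q for p in probs.values())


def tau_estimate(seq, q, levels):
    """Least-squares slope of log Z(q, n) versus log(eps_n), with eps_n = 2**-n."""
    xs = [-n * math.log(2) for n in levels]
    ys = [math.log(partition_sum(density_matrix(seq, n), q)) for n in levels]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def tau_iid(letter_probs, q):
    """tau(q) for an ideal uncorrelated (i.i.d.) source.

    The expected level-n matrix is the n-fold Kronecker product of the 2x2
    single-letter matrix, so Z(q, n) = (sum_a p_a**q)**n and
    tau(q) = -log2(sum_a p_a**q), independently of n.
    """
    return -math.log2(sum(p ** q for p in letter_probs.values()))


if __name__ == "__main__":
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(100_000))
    uniform = {b: 0.25 for b in "ACGT"}
    for q in (-2, 2, 4):
        est = tau_estimate(seq, q, levels=range(1, 6))
        print(f"q={q:+d}  fitted tau={est:.3f}  iid tau={tau_iid(uniform, q):.3f}")
```

For a uniform random sequence both columns should be close to 2(q - 1), that is D_q ≈ 2. A sequence depleted in a particular word such as CG would be expected to depart from this baseline mainly at negative q, where rare words dominate Z(q, n); replacing the single-letter factor in `tau_iid` with a pair (4 x 4) factor is one way to sketch the pair-correlation refinement the abstract describes.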
©2010 The American Physical Society
History: Received 23 September 2009; published 8 February 2010
Permalink: http://link.aps.org/abstract/PRE/v81/e026102