MEPI-24

Compositional Complexity in Genomic Patterns and Classification


Somdatta Sinha

IISER Kolkata, India
A genome consists of a long string of four letters (the bases A, T, C, G). How information about biochemical processes is stored in this string of bases is an open question. Are there higher-order structures, such as words, sentences, semantics, and a grammar, in the DNA language (compositional complexity)? DNA from different species exhibits differences in global sequence composition, and these differences are used as markers to align larger sequences and to group genomes based on homology. Classification of genomes by similarity and dissimilarity is at the heart of phylogenetics and genomic epidemiology. It relies on several statistical and mathematical methods that align and compare the base sequences of multiple genomes; these methods are both computationally intensive and time consuming for similar sequences. We develop and use an “alignment-free” method based on the Chaos Game Representation (CGR) from statistical physics to successfully classify very closely related genomes: sub- and sub-sub-species of HIV-1 and mutants of SARS-CoV-2 (the COVID-19 virus). This approach requires fewer computational resources and less time for analysis.
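To make the alignment-free idea concrete, the following is a minimal sketch of the textbook CGR construction, not the pipeline used in this work. It assumes the common corner layout (A, C, G, T on the corners of the unit square), a start at the centre of the square, and a plain Euclidean distance between binned CGR images; the function names, the grid resolution `k`, and the handling of ambiguous bases are all illustrative choices.

```python
from math import sqrt

# Assumed corner layout (a common convention; other layouts are in use).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of a DNA string.

    Starting from the centre of the unit square, each new point is the
    midpoint between the previous point and the corner of the next base.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: silently skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def fcgr(seq, k=3):
    """Bin CGR points into a 2^k x 2^k grid of relative frequencies.

    Each grid cell corresponds to one k-mer, so the grid is a compact
    summary of the k-mer composition of the sequence.
    """
    n = 2 ** k
    grid = [[0.0] * n for _ in range(n)]
    # The first k-1 points do not yet encode a complete k-mer.
    pts = cgr_points(seq)[k - 1:]
    for x, y in pts:
        col = min(int(x * n), n - 1)  # clamp guards against rounding to 1.0
        row = min(int(y * n), n - 1)
        grid[row][col] += 1.0
    if pts:
        grid = [[count / len(pts) for count in row] for row in grid]
    return grid


def fcgr_distance(a, b):
    """Euclidean distance between two frequency grids of equal size."""
    return sqrt(sum((u - v) ** 2
                    for row_a, row_b in zip(a, b)
                    for u, v in zip(row_a, row_b)))
```

In this sketch, two genomes are compared by computing `fcgr_distance(fcgr(s1), fcgr(s2))`; the resulting pairwise distances could then be passed to any standard clustering or tree-building routine, with no sequence alignment step.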
Annual Meeting for the Society for Mathematical Biology, 2025.