Siddharth Jain
Research Interests
The human genome is a complex body of information with many evolutionary secrets hidden within it. Uncovering these secrets has applications in predicting disease, discerning ancestry, realizing DNA storage, and designing synthetic biology devices for computation. In the past, many studies drew evolutionary inferences from only small portions of DNA, because whole human genomes were not readily available. Following the success of the Human Genome Project and super-exponential progress in sequencing technology, a high-quality whole human genome can now be obtained at very low cost. Genomic studies have largely focused on inferring evolution from the genome by comparing observed phenotypes (such as diseases, physical traits, race, and gender), a scientific philosophy motivated by backward inference. Now that whole genomes are available, however, forward inference about evolution can be made directly. My interests lie in using the ideas and philosophies of information theory, coding theory, theoretical computer science, and signal processing to make this forward inference by understanding the evolution channel of each genome. More precisely, I want to describe an individual's genome with a persona by viewing it as a signal of mutation activity.
News!
Jan 11, 2019: Our article on cancer classification using healthy DNA is available on bioRxiv [Link]
Jan 11, 2019: Our article on the statistical bias present in short tandem repeats in amplified samples in TCGA is now available on bioRxiv [Link]
Current Research
Disease Risk Estimation
Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Balanced Sequences
Many genomes, including the human genome, share a property termed the 2nd Chargaff Rule. According to this rule, the number of occurrences of a k-mer is almost equal to the number of occurrences of its reverse complement. For example, #A = #T, #C = #G, #AC = #GT, and so on. This property is observed to hold up to a certain k, known as the KLimit, which depends on the length of the genome. For the human genome, the KLimit is 10. In this project, we investigate sequence generation models based on reversed tandem and interspersed duplications that can produce sequences satisfying the 2nd Chargaff Rule. We find a fundamental limit on the number of generations required to generate such balanced sequences for a given KLimit. We compare the sequences generated by these models with real genome sequences and obtain results that give valuable insights into the evolution of genomes.
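The balance property described above can be checked directly on a sequence. The following is a minimal sketch, not the project's own code: the function names and the choice of "largest relative gap" as the imbalance measure are assumptions made for illustration.

```python
from collections import Counter

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement every base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def chargaff_imbalance(seq, k):
    """Largest relative gap between the count of a k-mer and the count
    of its reverse complement (0.0 means perfectly balanced).
    This particular measure is an illustrative assumption."""
    counts = kmer_counts(seq, k)
    worst = 0.0
    for kmer, n in counts.items():
        m = counts.get(reverse_complement(kmer), 0)
        worst = max(worst, abs(n - m) / (n + m))
    return worst
```

Under these assumptions, a KLimit could be estimated by increasing k until `chargaff_imbalance` exceeds a chosen tolerance; the tolerance itself would have to be picked by the user.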
String Duplication Systems
Motivated by the phenomenon of tandem duplications observed in DNA, we investigate the diversity of sequences that can be generated by a series of tandem duplications applied to a seed. This diversity is characterized by mathematical measures of capacity and expressiveness. Given a sequence, we calculate upper and lower bounds on the number of steps required to reduce the sequence to its seed by omitting tandem repeats. We also develop algorithms that perform near-optimal reduction.
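To make the duplication and reduction operations concrete, here is a small sketch. The greedy rule (always remove the longest tandem repeat found first) is an assumption chosen for simplicity; it is not claimed to be optimal, nor to be one of the algorithms developed in this project, and all function names are hypothetical.

```python
def tandem_duplicate(seq, start, length):
    """Insert a copy of seq[start:start+length] immediately after itself."""
    block = seq[start:start + length]
    return seq[:start + length] + block + seq[start + length:]


def find_tandem_repeat(seq):
    """Return (start, length) of a tandem repeat, trying the longest
    repeat lengths first, or None if the sequence has no tandem repeat."""
    for length in range(len(seq) // 2, 0, -1):
        for start in range(len(seq) - 2 * length + 1):
            first = seq[start:start + length]
            second = seq[start + length:start + 2 * length]
            if first == second:
                return start, length
    return None


def greedy_reduce(seq):
    """Repeatedly delete one copy of a tandem repeat until none remain.
    Returns the repeat-free sequence and the number of steps taken."""
    steps = 0
    while True:
        hit = find_tandem_repeat(seq)
        if hit is None:
            return seq, steps
        start, length = hit
        seq = seq[:start + length] + seq[start + 2 * length:]
        steps += 1
```

The step count returned by `greedy_reduce` is only the number of deletions this particular greedy strategy used, so it should be read as one achievable count rather than the minimum.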
Codes for Live DNA Storage
Advances in CRISPR have made it possible to store information inside the DNA of a living organism. As the cell replicates, this information can become corrupted by mutations. Recovering the stored information requires error-correcting schemes designed to correct these evolutionary errors. We investigated an evolution channel with unbounded duplication errors and constructed error-correcting codes for it. We also investigated errors caused by substitution mutations alongside duplications and derived sphere-packing bounds.
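As a toy illustration of correcting duplication errors, consider a heavily simplified channel that only duplicates single symbols in place. This restriction, the code below, and all names in it are assumptions for illustration and do not reproduce the codes constructed in this work. If codewords contain no two equal adjacent symbols, any number of such duplications can be undone by collapsing runs.

```python
import random
from itertools import groupby


def is_codeword(word):
    """Toy code: a word is valid if no two adjacent symbols are equal."""
    return all(a != b for a, b in zip(word, word[1:]))


def duplicate_channel(word, n_errors, rng):
    """Apply n_errors tandem duplications of length 1 at random
    positions (a simplified stand-in for an evolution channel)."""
    chars = list(word)
    for _ in range(n_errors):
        i = rng.randrange(len(chars))
        chars.insert(i, chars[i])  # copy lands next to the original
    return "".join(chars)


def collapse_runs(word):
    """Decode by replacing every run of equal symbols with one symbol."""
    return "".join(symbol for symbol, _ in groupby(word))
```

A usage example under the same assumptions: `collapse_runs(duplicate_channel("ACGTACG", 5, random.Random(0)))` recovers `"ACGTACG"`, because single-symbol duplications only lengthen runs and never change their order.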
Past Research
Match Lengths, Zero Entropy and Large Deviations
We investigate the match length expression in the context of the sliding-window Lempel-Ziv algorithm for zero-entropy sequences. We also prove a large deviation property for recurrence times and propose an entropy estimator based on them under certain mixing conditions.
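For readers unfamiliar with match lengths, the sketch below computes the sliding-window match length at a position and turns it into a rough entropy estimate using the classical relation that, for positive-entropy sources, match lengths grow like log(window) / H. Averaging the match lengths and the function names are assumptions made here; this is not the estimator proposed in this work, and the relation is exactly what needs care in the zero-entropy setting.

```python
import math


def match_length(seq, i, window):
    """Length of the longest prefix of seq[i:] that also starts at some
    position j with i - window <= j < i (the match may overlap seq[i:])."""
    best = 0
    for j in range(max(0, i - window), i):
        k = 0
        while i + k < len(seq) and seq[j + k] == seq[i + k]:
            k += 1
        best = max(best, k)
    return best


def entropy_estimate(seq, window):
    """Rough estimate in bits per symbol: log2(window) divided by the
    average match length over all positions with a full window."""
    lengths = [match_length(seq, i, window) for i in range(window, len(seq))]
    mean_len = sum(lengths) / len(lengths)
    if mean_len == 0:
        return math.inf  # no matches at all in the window
    return math.log2(window) / mean_len
```

Note that matches are truncated at the end of the sequence, so estimates on very short inputs are biased; the sketch is meant only to show how the quantities are defined.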
Fractional Calculus Based Approaches in Control
We propose a fractal optimal controller for heart rate control through a pacemaker. The dynamics of the heart rate are modeled by a fractional differential equation. We also use a similar approach to perform power management for multiprocessor platforms with multiple voltage and frequency islands.
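To show what simulating a fractional differential equation involves, here is a minimal sketch using the standard explicit Grünwald-Letnikov discretization. The system D^alpha x = -a*x + u with x(0) = 0 is a generic linear example assumed for illustration; it is not the heart rate model or the controller from this work, and the parameter names are hypothetical.

```python
def gl_weights(alpha, n):
    """Grünwald-Letnikov weights w_0..w_n, computed by the recurrence
    w_0 = 1, w_j = w_{j-1} * (1 - (alpha + 1) / j)."""
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / j))
    return w


def simulate(alpha, a, u, h, steps):
    """Explicit scheme for D^alpha x = -a*x + u with x(0) = 0.
    Each new sample depends on the whole history through the weights,
    which is the memory effect of a fractional derivative."""
    w = gl_weights(alpha, steps)
    x = [0.0]
    for k in range(1, steps + 1):
        memory = sum(w[j] * x[k - j] for j in range(1, k + 1))
        x.append(h ** alpha * (-a * x[k - 1] + u) - memory)
    return x
```

As a sanity check on the scheme, setting `alpha = 1` makes all weights beyond the first two vanish, and the update reduces to the ordinary forward Euler method.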
Talks
Posters
