Siddharth Jain
Research Interests
 Information and Coding Theory
 Intelligence Amplification (IA) and Causal Inference
 Disease Genomics, DNA Storage and Computational Biology
News!
Feb 13, 2019: I gave an invited Graduation Day talk at the ITA Workshop in San Diego
Jan 11, 2019: Our article on Cancer Classification using healthy DNA is available on bioRxiv [Link]
Jan 11, 2019: Our article on the statistical bias present in short tandem repeats of amplified samples in TCGA is now available on bioRxiv [Link]
Current Research
Disease Risk Estimation
Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Disease Risk Estimation from Mutation Profile of the Genome
Provisional patent application filed, 2018.
Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Cancer Classification from Healthy DNA using Machine Learning
bioRxiv [Link]
Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Short Tandem Repeats Information in TCGA is Statistically Biased by Amplification
bioRxiv [Link]
Balanced Sequences
Many genomes, including the human genome, share a property known as the 2nd Chargaff Rule. According to this rule, the number of occurrences of a k-mer is almost equal to the number of occurrences of its reverse complement. For example, #A = #T, #C = #G, #AC = #GT and so on. This property is observed to hold up to a certain k, known as the KLimit, which depends on the length of the genome. For the human genome, the KLimit is 10. In this project, we investigate sequence-generation models based on reversed tandem and interspersed duplications that can produce sequences satisfying the 2nd Chargaff Rule. We find a fundamental limit on the number of generations required to generate such balanced sequences for a given KLimit. We compare the sequences generated by these models with real genome sequences and find results that give valuable insights into the evolution of genomes.
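As an illustration of the balance property described above, the following is a minimal Python sketch that counts k-mers and compares each count with that of its reverse complement. The function names and the particular imbalance measure (worst-case relative difference) are assumptions made for this example; they are not taken from the project itself.

```python
from collections import Counter

# Assumes an uppercase sequence over the alphabet A, C, G, T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def chargaff_imbalance(seq: str, k: int) -> float:
    """Hypothetical measure: the largest relative gap between the count of a
    k-mer and the count of its reverse complement (0.0 = perfectly balanced)."""
    counts = kmer_counts(seq, k)
    worst = 0.0
    for kmer, n in counts.items():
        m = counts.get(reverse_complement(kmer), 0)
        worst = max(worst, abs(n - m) / (n + m))
    return worst
```

For instance, `chargaff_imbalance("ACGT", 2)` is 0.0 because the sequence equals its own reverse complement, whereas `chargaff_imbalance("AAAA", 1)` is 1.0 because A occurs without any matching T. Scanning k = 1, 2, ... until the imbalance exceeds a chosen tolerance would give a rough, tolerance-dependent stand-in for the KLimit.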
String Duplication Systems
Motivated by the phenomenon of tandem duplications observed in DNA, we investigate the diversity of sequences that can be generated by a series of tandem duplications applied to a seed. This diversity is characterized by the mathematical measures of capacity and expressiveness. Given a sequence, we calculate upper and lower bounds on the number of steps required to reduce it to its seed by removing tandem repeats. We also develop algorithms that perform near-optimal reduction.
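To make the two operations above concrete, here is a minimal Python sketch of a tandem duplication and of a naive greedy reduction that keeps removing one copy of a tandem repeat until none is left. The function names are hypothetical, and the greedy rule (shortest repeat first, leftmost first) is only an assumption for illustration; it is not one of the near-optimal algorithms mentioned above.

```python
def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Insert a copy of s[start:start+length] immediately after itself."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("block must lie inside the sequence")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


def find_tandem_repeat(s: str):
    """Return (start, length) of a tandem repeat, shortest and leftmost
    first, or None if s contains no tandem repeat."""
    n = len(s)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            if s[start:start + length] == s[start + length:start + 2 * length]:
                return start, length
    return None


def reduce_greedily(s: str):
    """Remove one copy of a tandem repeat at a time until none remains.
    Returns the repeat-free result and the number of steps used."""
    steps = 0
    hit = find_tandem_repeat(s)
    while hit is not None:
        start, length = hit
        s = s[:start + length] + s[start + 2 * length:]
        steps += 1
        hit = find_tandem_repeat(s)
    return s, steps
```

For example, `tandem_duplicate("ACG", 1, 2)` gives `"ACGCG"`, and `reduce_greedily("ACGCG")` returns `("ACG", 1)`. On `"0000"` this greedy rule uses 3 steps, although 2 suffice (0000 to 00 to 0), which shows why the minimum number of steps is a non-trivial quantity to bound.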
Siddharth Jain, F. Farnoud, Jehoshua Bruck
Capacity and Expressiveness of Genomic Tandem Duplication [pdf] IEEE Transactions on Information Theory, vol. 63, no. 10, pp. 6129-6138, October 2017. Shorter version in Proceedings of 2015 IEEE International Symposium on Information Theory (ISIT), pp. 1946-1950, Hong Kong.

Noga Alon, Jehoshua Bruck, F. Farnoud, Siddharth Jain
Duplication Distance to the Root for Binary Sequences [pdf] IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7793-7803, December 2017.

Noga Alon, Jehoshua Bruck, F. Farnoud, Siddharth Jain
On the Duplication Distance of Binary Strings [pdf] in Proceedings of 2016 IEEE International Symposium on Information Theory (ISIT), pp. 260-264, Barcelona, Spain.
Codes for live-DNA Storage
Advances in CRISPR have made it possible to store information inside the DNA of a living organism. As the cell replicates, this information can become corrupted by mutations. Recovering the stored information requires error-correcting schemes that can correct these evolutionary errors. We investigated an evolution channel with an unbounded number of duplication errors and constructed error-correcting codes for it. We also investigated errors caused by substitution mutations alongside duplications and derived sphere-packing bounds.

Siddharth Jain, F. Farnoud, M. Schwartz, Jehoshua Bruck
Noise and Uncertainty in String-Duplication Systems [pdf] in Proceedings of 2017 IEEE International Symposium on Information Theory (ISIT), pp. 3120-3124, Aachen, Germany.

Siddharth Jain, F. Farnoud, M. Schwartz, Jehoshua Bruck
Duplication-Correcting Codes for Data Storage in the DNA of a Living Organism [pdf]
IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996-5010, August 2017. Shorter version in Proceedings of 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1028-1032, Barcelona, Spain.
Past Research
Match Lengths, Zero Entropy and Large Deviations
We investigate the match length expression in the context of the Sliding Window Lempel-Ziv Algorithm for zero entropy sequences. We also prove a large deviation property for recurrence times and, under certain mixing conditions, propose an entropy estimator based on them.
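The two quantities named above can be sketched in a few lines of Python. The sketch assumes the textbook definitions: the recurrence time of the first n-block is the smallest shift at which that block reappears, and the match length at a position is the longest prefix of the upcoming symbols that also starts somewhere in the preceding window. The entropy estimate shown, log2 of the recurrence time divided by n, is the classical recurrence-time relation for stationary ergodic sources; it is used here only as an illustration and is not claimed to be the estimator proposed in the papers below. All function names are hypothetical.

```python
import math


def recurrence_time(seq, n):
    """Smallest r >= 1 with seq[r:r+n] == seq[:n], or None if the first
    n-block does not recur within seq."""
    if n <= 0:
        raise ValueError("n must be positive")
    block = seq[:n]
    for r in range(1, len(seq) - n + 1):
        if seq[r:r + n] == block:
            return r
    return None


def entropy_estimate(seq, n):
    """Illustrative estimate in bits per symbol: log2(recurrence time) / n."""
    r = recurrence_time(seq, n)
    return None if r is None else math.log2(r) / n


def match_length(seq, pos, window):
    """Longest L such that seq[pos:pos+L] also starts at some index j with
    pos - window <= j < pos (the match may overlap position pos)."""
    best = 0
    for j in range(max(0, pos - window), pos):
        length = 0
        while pos + length < len(seq) and seq[j + length] == seq[pos + length]:
            length += 1
        best = max(best, length)
    return best
```

On the constant sequence `"aaaa"`, `entropy_estimate("aaaa", 2)` is 0.0, consistent with a zero entropy source, and on `"abcabcabx"` the call `match_length("abcabcabx", 3, 3)` returns 5.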
Siddharth Jain, R. K. Bansal
On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm [Link] [pdf]
IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 120-132, January 2015.
Siddharth Jain, R. K. Bansal
On Large Deviation Property of Recurrence Times [Link] [pdf]
Proceedings of IEEE International Symposium on Information Theory (ISIT) 2013, pp. 2880-2884, Istanbul, Turkey.
Siddharth Jain, R. K. Bansal
On Match Lengths and Asymptotic Behavior of Sliding Window Lempel-Ziv Algorithm for Zero Entropy Sequences [Link] [pdf]
Proceedings of IEEE International Symposium on Information Theory (ISIT) 2013, pp. 2885-2889, Istanbul, Turkey.
Fractional Calculus based approaches in Control
We propose a fractal optimal controller for heart rate control through a pacemaker. The dynamics of the heart rate are modeled by a fractional differential equation. We also use a similar approach to perform power management for multiprocessor platforms with multiple voltage and frequency islands.
Paul Bogdan, Siddharth Jain, Kartikeya Goyal, Radu Marculescu
Implantable Pacemakers Control and Optimization via Fractional Calculus Approach: A Cyber-Physical Systems (CPS) Perspective [Link] [pdf]
International Conference on Cyber-Physical Systems (ICCPS), pp. 23-32, CPS Week 2012, Beijing, China.
Paul Bogdan, Siddharth Jain, Radu Marculescu
Pacemaker Control of Heart Rate Variability: A CPS Perspective [Link] [pdf]
ACM Transactions on Embedded Computing Systems (TECS), vol. 12, no. 1s, Article 50, March 2013.
Paul Bogdan, Radu Marculescu, Siddharth Jain
Dynamic Power Management for Multidomain Processor Systems-on-Chip Platforms: An Optimal Control Approach [Link] [pdf]
ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 18, no. 4, Article 46, October 2013.
Paul Bogdan, Radu Marculescu, Siddharth Jain, Rafael Tornero Gavila
An Optimal Control Approach to Power Management for Multi-Voltage and Frequency Islands Multi-Processor Platforms under Highly Variable Workloads [Best Paper Award] [Link] [pdf]
International Symposium on Networks-on-Chip (NOCS) 2012, pp. 35-42, Lyngby, Denmark.
Talks

Duplication Channel of The Genome.
Conference on Information Sciences and Systems, Johns Hopkins, March 2019 [Invited]

Decoding the Past.
Information Theory and Applications (ITA) Workshop, San Diego, Feb 13, 2019 [Invited]

Decoding the Past.
Molecular Programming Project (MPP) Workshop, Boston, Massachusetts, December 9-11, 2016 [Invited]

Biological Information Channel.
IPAM Computational Genomics Summer Institute (CGSI), UCLA, July 2016

Duplication Correcting Codes for DNA Storage.
Molecular Programming Project (MPP) Workshop, Seattle, Washington, January 15-18, 2016 [Invited]
Posters

Siddharth Jain, F. Farnoud, M. Schwartz, Jehoshua Bruck
Capacity and Diversity of Tandem Duplication. Invited Poster at the Molecular Programming Project (MPP) Workshop, San Francisco, California, January 9-11, 2015.
