Siddharth Jain

PhD Candidate
Advisor: Professor Jehoshua (Shuki) Bruck
Electrical Engineering Department
Caltech
Email: sidjain at caltech dot edu
Phone: +1-6266521958
Photo Credits: Prachi Parihar
Erdős Number: 2

I am on the academic job market this year!

Research Interests

The human genome is a complex body of information with many evolutionary secrets hidden within it. Uncovering these secrets has applications in predicting disease, discerning ancestry, realizing DNA storage, and designing synthetic biology devices for computation. In the past, many studies drew evolutionary inferences from only small portions of DNA, because the whole human genome was not readily available. Following the success of the Human Genome Project and super-exponential progress in sequencing technology, a high-quality whole human genome can now be obtained at very low cost. Genomic studies have largely focused on making evolutionary inferences from the genome by comparing observed phenotypes (such as diseases, physical traits, race, and gender), a scientific philosophy motivated by backward inference. Now that the whole genome is available, however, forward inference about evolution can be made directly. My interests lie in using the ideas and philosophies of information theory, coding theory, theoretical computer science, and signal processing to make this forward inference by understanding the evolution channel of each genome. More precisely, I want to describe an individual's genome with a persona by viewing it as a signal of mutation activity.

Current Research

Disease Risk Estimation

Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Disease Risk Estimation from Mutation Profile of the Genome
Provisional patent application filed, 2018.

Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Cancer Risk Estimation from Mutation Profile of the Genome
in preparation

Siddharth Jain, B. Mazaheri, N. Raviv, Jehoshua Bruck
Short Tandem Repeat Information on TCGA is biased!
to appear on bioRxiv shortly

Balanced Sequences

Many genomes, including the human genome, share a property known as the 2nd Chargaff Rule. According to this rule, the number of occurrences of a k-mer is almost equal to the number of occurrences of its reverse complement. For example, #A = #T, #C = #G, #AC = #GT, and so on. This property is observed to hold up to a certain k, known as the K-Limit, which depends on the length of the genome. For the human genome, the K-Limit is 10. In this project, we investigate sequence-generation models based on reversed tandem and interspersed duplications that can produce sequences satisfying the 2nd Chargaff Rule. We find a fundamental limit on the number of generations required to generate such balanced sequences for a given K-Limit. We compare the sequences generated by these models with real genome sequences and find results that give valuable insights into the evolution of genomes.
  • Siddharth Jain, N. Raviv, Jehoshua Bruck
    Attaining the 2nd Chargaff Rule by Tandem Duplications
    accepted in 2018 IEEE International Symposium on Information Theory (ISIT)
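
The balance property described above can be checked numerically. The following is a minimal sketch, not the method of the paper: the helper names (`reverse_complement`, `kmer_counts`, `chargaff_imbalance`) and the normalized imbalance score are assumptions made for illustration. It counts every k-mer of a DNA string and compares each count with the count of its reverse complement.

```python
from collections import Counter

# Translation table for complementing bases (assumes an A/C/G/T alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def chargaff_imbalance(seq: str, k: int) -> float:
    """Hypothetical score in [0, 1]: 0 means every k-mer occurs exactly
    as often as its reverse complement, 1 means no balance at all."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Include reverse complements that never occur, so that each
    # (k-mer, reverse complement) pair is visited from both sides.
    keys = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in keys)
    return diff / (2 * total)


if __name__ == "__main__":
    s = "AACGTTTGCA"
    balanced = s + reverse_complement(s)  # a string equal to its own reverse complement
    for k in range(1, 5):
        print(k, chargaff_imbalance(balanced, k))
```

A string that equals its own reverse complement is perfectly balanced for every k, which makes it a convenient sanity check; real genomes are only approximately balanced, and only up to the K-Limit.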

String Duplication Systems

Motivated by the phenomenon of tandem duplications observed in DNA, we investigate the diversity of sequences that can be generated by a series of tandem duplications applied to a seed. This diversity is characterized by the mathematical measures of capacity and expressiveness. Given a sequence, we calculate upper and lower bounds on the number of steps required to reduce the sequence to its seed by removing tandem repeats. We also develop algorithms that perform near-optimal reduction.
  • Siddharth Jain, F. Farnoud, Jehoshua Bruck
    Capacity and Expressiveness of Genomic Tandem Duplication [pdf]
    IEEE Transactions on Information Theory, vol. 63, no. 10, pp. 6129-6138, October 2017.
    shorter version in Proceedings of 2015 IEEE International Symposium on Information Theory (ISIT), pp. 1946-1950, Hong Kong.

  • Noga Alon, Jehoshua Bruck, F. Farnoud, Siddharth Jain
    Duplication Distance to the Root for Binary Sequences[pdf]
    IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7793-7803, December 2017.

  • Noga Alon, Jehoshua Bruck, F. Farnoud, Siddharth Jain
    On the Duplication Distance of Binary Strings[pdf]
    in Proceedings of 2016 IEEE International Symposium on Information Theory (ISIT), pp. 260-264, Barcelona, Spain.
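
To make the notions of tandem duplication and reduction to a seed concrete, here is a small sketch. It is a naive greedy procedure written for illustration only, not the near-optimal algorithms from the papers above; the function names are hypothetical. A duplication copies a substring and inserts the copy right after the original; a reduction step removes one copy of an adjacent repeated block.

```python
from typing import Optional, Tuple


def tandem_duplicate(seq: str, start: int, length: int) -> str:
    """Insert a copy of seq[start:start+length] immediately after itself."""
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("duplicated block must lie inside the sequence")
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


def remove_one_repeat(seq: str) -> Optional[str]:
    """Remove one copy of the shortest, leftmost tandem repeat.
    Returns None when seq contains no tandem repeat (it is square-free)."""
    n = len(seq)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            mid = start + length
            if seq[start:mid] == seq[mid:mid + length]:
                return seq[:mid] + seq[mid + length:]
    return None


def greedy_reduce(seq: str) -> Tuple[str, int]:
    """Repeatedly remove tandem repeats until none remain.
    Returns the square-free string reached and the number of steps used.
    The step count is only an upper bound on the minimum number of
    deduplications needed to reach that string."""
    steps = 0
    while True:
        shorter = remove_one_repeat(seq)
        if shorter is None:
            return seq, steps
        seq = shorter
        steps += 1


if __name__ == "__main__":
    grown = tandem_duplicate("ACGT", 1, 2)
    print(grown)                 # ACGCGT
    print(greedy_reduce(grown))  # ('ACGT', 1)
```

The greedy choice (shortest repeat first) keeps the code short; finding the true minimum number of steps is the harder problem that the bounds above address.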

Codes for DNA Storage

We design error-correcting codes for DNA storage under duplication and substitution errors.
  • Siddharth Jain, F. Farnoud, M. Schwartz, Jehoshua Bruck
    Noise and Uncertainty in String-Duplication Systems
    in Proceedings of 2017 IEEE International Symposium on Information Theory (ISIT), pp. 3120-3124, Aachen, Germany.

  • Siddharth Jain, F. Farnoud, M. Schwartz, Jehoshua Bruck
    Duplication-Correcting Codes for Data Storage in the DNA of a Living Organism[pdf]
    IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996-5010, August 2017.
    shorter version in Proceedings of 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1028-1032, Barcelona, Spain.

Past Research

Match Lengths, Zero Entropy and Large Deviations

We investigate the match length expression in the context of the Sliding Window Lempel-Ziv Algorithm for zero entropy sequences. We also prove a large deviation property for recurrence times and, under certain mixing conditions, propose an entropy estimator based on them.
  • Siddharth Jain, R. K. Bansal
    On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm[Link][pdf]
    IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 120-132, January 2015.

  • Siddharth Jain, R. K. Bansal
    On Large Deviation Property of Recurrence Times[Link] [pdf]
    Proceedings of IEEE International Symposium on Information Theory (ISIT) 2013, pp. 2880-2884, Istanbul, Turkey.

  • Siddharth Jain, R. K. Bansal
    On Match Lengths and Asymptotic Behavior of Sliding Window Lempel-Ziv Algorithm for Zero Entropy Sequences[Link][pdf]
    Proceedings of IEEE International Symposium on Information Theory (ISIT) 2013, pp. 2885-2889, Istanbul, Turkey.
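
As a rough illustration of how recurrence times relate to entropy, the sketch below computes the first recurrence time R_n of the initial n-block of a string and the classical estimate (1/n) log2 R_n, which converges to the entropy rate for stationary ergodic sources (the Ornstein-Weiss result). This is the textbook estimate, not necessarily the estimator proposed in the papers above, and the function names are hypothetical.

```python
import math
from typing import Optional


def recurrence_time(seq: str, n: int) -> Optional[int]:
    """Smallest t >= 1 such that seq[t:t+n] equals the initial block seq[:n].
    Returns None if the block does not recur within seq."""
    if n <= 0 or n > len(seq):
        raise ValueError("block length must be between 1 and len(seq)")
    # str.find with start=1 returns the first (possibly overlapping)
    # reoccurrence of the initial block, or -1 if there is none.
    pos = seq.find(seq[:n], 1)
    return pos if pos != -1 else None


def recurrence_entropy_estimate(seq: str, n: int) -> Optional[float]:
    """Estimate the entropy rate in bits per symbol as log2(R_n) / n."""
    t = recurrence_time(seq, n)
    if t is None:
        return None
    return math.log2(t) / n


if __name__ == "__main__":
    print(recurrence_time("0100101", 2))           # 3
    print(recurrence_entropy_estimate("0000", 2))  # 0.0
```

A constant sequence recurs immediately, so the estimate is zero, which is consistent with a zero entropy source; for such sources the finer behavior of these quantities is what the work above studies.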

Fractional Calculus based approaches in Control

We propose a fractal optimal controller for heart rate control through a pacemaker. The dynamics of the heart rate are modeled by a fractional differential equation. We also use a similar approach to perform power management for multiprocessor platforms with multiple voltage and frequency islands.
  • Paul Bogdan, Siddharth Jain, Kartikeya Goyal, Radu Marculescu
    Implantable Pacemakers Control and Optimization via Fractional Calculus Approach: A Cyber Physical Systems (CPS) Perspective[Link] [pdf]
    International Conference on Cyber Physical Systems (ICCPS), pp. 23-32, CPS Week 2012, Beijing, China.

  • Paul Bogdan, Siddharth Jain, Radu Marculescu
    Pacemaker Control of Heart Rate Variability: A CPS Perspective[Link][pdf]
    ACM Transactions on Embedded Computing Systems (TECS), vol. 12, no. 1s, Article 50, March 2013.

  • Paul Bogdan, Radu Marculescu, Siddharth Jain
    Dynamic Power Management for Multi-domain Processor Systems-on-Chip Platforms: An Optimal Control Approach[Link][pdf]
    ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 18, no. 4, Article 46, October 2013.

  • Paul Bogdan, Radu Marculescu, Siddharth Jain, Rafael Tornero Gavila
    An Optimal Control Approach to Power Management for Multi-Voltage and Frequency Islands Multi Processor platforms under highly variable workloads[Best Paper Award][Link] [pdf]
    International Symposium on Networks-On-Chip (NOCS) 2012, pp. 35-42, Lyngby, Denmark.
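
For readers unfamiliar with fractional differential equations, the sketch below shows one standard way to approximate a fractional derivative of order alpha on sampled data, the Grünwald-Letnikov discretization. It is included only to illustrate the kind of operator involved; whether the papers above use this particular discretization is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def gl_weights(alpha: float, count: int) -> List[float]:
    """First count + 1 Grünwald-Letnikov weights w_j = (-1)^j * binom(alpha, j),
    computed with the recurrence w_j = w_{j-1} * (1 - (alpha + 1) / j)."""
    weights = [1.0]
    for j in range(1, count + 1):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / j))
    return weights


def gl_derivative(samples: Sequence[float], alpha: float, h: float) -> List[float]:
    """Approximate the order-alpha derivative at every sample point:
    D^alpha x(t_k) ~ h^(-alpha) * sum_{j=0..k} w_j * x(t_{k-j}).
    The sum runs over the whole history, which is the long-memory
    feature that distinguishes fractional from integer-order dynamics."""
    if h <= 0:
        raise ValueError("step size h must be positive")
    w = gl_weights(alpha, len(samples) - 1) if samples else []
    scale = h ** (-alpha)
    return [
        scale * sum(w[j] * samples[k - j] for j in range(k + 1))
        for k in range(len(samples))
    ]


if __name__ == "__main__":
    ramp = [0.0, 1.0, 2.0, 3.0]
    print(gl_derivative(ramp, 1.0, 1.0))  # order 1 reduces to a backward difference
    print(gl_weights(0.5, 2))             # weights for a half-order derivative
```

For alpha = 1 the weights collapse to (1, -1, 0, 0, ...), so the formula reduces to the ordinary backward difference; for non-integer alpha every past sample contributes.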

Talks

  • Decoding the Past.
    Molecular Programming Project (MPP) Workshop, Boston, Massachusetts, December 9-11, 2016.

  • Biological Information Channel.
    IPAM Computational Genomics Summer Institute (CGSI), UCLA, July 2016.

  • Duplication Correcting Codes for DNA Storage.
    Molecular Programming Project (MPP) Workshop, Seattle, Washington, January 15-18, 2016.

Posters

  • Siddharth Jain, F. Farnoud, M. Schwartz, Jehoshua Bruck
    Capacity and Diversity of Tandem Duplication.
    Invited Poster in Molecular Programming Project (MPP) Workshop, San Francisco, California, January 9-11, 2015.