Toy models and transcription factor binding sites
Frederic Schuetz and Mauro Delorenzi
This model is inspired by the HMM for the "dishonest casino" presented by Durbin, Eddy, Krogh and Mitchison
in "Biological sequence analysis" (Cambridge University Press, 1998).
MAMOT model file for CASINO
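
To make the casino model concrete, here is a minimal Python sketch of a two-state HMM (fair die "F", loaded die "L") with Viterbi decoding. The parameter values are the commonly quoted textbook values for this example and are an assumption here; the linked MAMOT file may use different numbers. The uniform start distribution and the function name `viterbi` are also choices made for this illustration only.

```python
import math

STATES = ("F", "L")  # F = fair die, L = loaded die

# Assumed parameters (textbook-style values, not read from the MAMOT file).
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in "123456"},
    "L": {**{r: 0.1 for r in "12345"}, "6": 0.5},
}


def viterbi(rolls):
    """Return the most probable state path for a string of rolls '1'..'6'."""
    if not rolls:
        return ""
    # Log-probabilities avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for r in rolls[1:]:
        prev = score
        score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][r])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

With these assumed parameters, a long run of sixes is decoded as the loaded die, while a run without sixes is decoded as the fair die.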
AT- and GC-rich regions in DNA
A model for different GC isochores in genome sequences.
The model defines a "NULLMODEL" for regularization of emission
probabilities by pseudocounts when re-training the model.
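
The idea of regularizing emissions with pseudocounts can be sketched as follows. This is an illustration of the general technique only: the formula (pseudocounts proportional to the null-model probabilities, scaled by a `weight`) and the function name `regularized_emissions` are assumptions, and MAMOT's actual re-estimation rule may differ.

```python
def regularized_emissions(counts, null_model, weight=1.0):
    """Blend observed emission counts with null-model pseudocounts.

    counts:     expected emission counts per base for one state
    null_model: background probabilities per base (should sum to 1)
    weight:     total pseudocount mass (hypothetical parameter)
    """
    observed = sum(counts.get(base, 0.0) for base in null_model)
    total = observed + weight
    return {
        base: (counts.get(base, 0.0) + weight * null_model[base]) / total
        for base in null_model
    }


# Example: a GC-rich state that never emitted A or T during training.
null = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
probs = regularized_emissions({"C": 6, "G": 4}, null, weight=2.0)
```

Under this assumed scheme, bases that were never observed still receive a small non-zero probability, so a re-trained state cannot become completely unable to emit them.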
MAMOT model file for "ATGC"
Transcription factor binding sites
Many TFs are multimeric proteins, for example homo- or
hetero-dimers, and bind to DNA in different configurations.
A full model should include all possibilities for binding, for example in both
orientations relative to the DNA double helix.
An added layer of complexity is present when a TF binds to two "half-sites" with
a "spacer" in between, possibly of variable length.
This is frequently the case, for example for nuclear receptors or p53.
Sometimes the "half-sites" are different and their order can be inverted,
for example for binding of SOX2-Oct4.
There are then four binding modes on the DNA double helix,
resulting in four different sequence patterns when only one strand is considered:
1. standard order of the half-sites,
2. inverted half-sites,
3. reverse complement of the standard order (other direction),
4. reverse complement of the inverted half-sites (other direction).
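
The four patterns can be generated from two half-site sequences as in the short Python sketch below. The half-site strings used in the example are made-up placeholders, not real SOX2 or Oct4 motifs, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_patterns(half_s, half_o, spacer=""):
    """Return the four single-strand patterns for two different half-sites."""
    standard = half_s + spacer + half_o
    inverted = half_o + spacer + half_s
    return {
        "standard": standard,
        "inverted": inverted,
        "rc_standard": reverse_complement(standard),
        "rc_inverted": reverse_complement(inverted),
    }


# Placeholder half-sites chosen only to make the four patterns distinct.
patterns = binding_patterns("AAC", "GGT")
```

Here `patterns` holds "AACGGT", "GGTAAC", "ACCGTT" and "GTTACC". Note that taking the reverse complement also swaps the order in which the two half-sites appear on the strand being read.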
An example for a MAMOT file that defines such a model is linked below.
It has different series of binding states corresponding to the four configurations:
1. first half-site "Bs1..7" - spacer("SP") - second half-site "Bo1..8",
2. "rBo1..8" - spacer("rSP") - "rBs1..7",
3. "aBo8..1" - spacer("aSP") - "aBs7..1"
4. "arBs7..1" - spacer("arSP") - "arBo8..1"
Reverse complementarity of model states can be imposed during training with a series of "compl_em"
specifications in the model file.
Similarly, states that represent the same binding interactions between DNA and protein
("Bo" and "rBo", "Bs" and "rBs")
can be restricted to have the same base preferences; this is realized with "tie_em" rules.
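
What reverse complementarity means for emission probabilities can be sketched in Python as follows: the states are read in reverse order, and within each state the probability of a base moves to its complement. This illustrates the constraint itself, not MAMOT's implementation or its model-file syntax; the function names and the example numbers are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_emissions(emission):
    """Swap probabilities between complementary bases for one state."""
    return {COMPLEMENT[base]: p for base, p in emission.items()}


def reverse_complement_states(emissions):
    """Map emissions of states 1..n to those of the reverse-complement states.

    The first returned state corresponds to the complement of state n,
    the last to the complement of state 1.
    """
    return [complement_emissions(e) for e in reversed(emissions)]


# Two made-up binding states, e.g. the first two positions of a half-site.
bs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
]
a_bs = reverse_complement_states(bs)
```

Tying emissions is the simpler case: two tied states share one and the same emission distribution, so in a sketch like this they would simply reference the same dictionary.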
The spacer states are linked so that the length of the spacer can be 0, 3 or 6.
In this example we fixed the emission probabilities of the spacer states during training
with "fix_em" directives at the end of the file.
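
One way to obtain spacer lengths of exactly 0, 3 or 6 is a chain of six spacer states with two extra entry points, as in the Python sketch below. This wiring and the state names "SP1".."SP6" are assumptions made for illustration; the linked MAMOT file may connect its spacer states differently.

```python
# Hypothetical topology: the last state of the first half-site ("Bs7") can
# jump directly to "Bo1" (spacer 0), enter the chain at "SP4" (spacer 3),
# or enter at "SP1" (spacer 6).
TRANSITIONS = {
    "Bs7": ["SP1", "SP4", "Bo1"],
    "SP1": ["SP2"], "SP2": ["SP3"], "SP3": ["SP4"],
    "SP4": ["SP5"], "SP5": ["SP6"], "SP6": ["Bo1"],
}


def spacer_lengths(transitions, start="Bs7", end="Bo1"):
    """Return all possible numbers of spacer states visited between start and end.

    Assumes the transition graph between start and end has no cycles.
    """
    lengths = set()
    stack = [(start, 0)]  # (current state, spacer states visited so far)
    while stack:
        state, n = stack.pop()
        for nxt in transitions.get(state, []):
            if nxt == end:
                lengths.add(n)
            else:
                stack.append((nxt, n + 1))
    return sorted(lengths)
```

For this assumed wiring, `spacer_lengths(TRANSITIONS)` returns `[0, 3, 6]`, so with half-sites of 7 and 8 positions the full site would span 15, 18 or 21 bases.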
MAMOT model file for a transcription factor with 4 binding configurations
Reference describing HMMs for nuclear hormone receptors:
Sandelin A and Wasserman W.
Prediction of Nuclear Hormone Receptor Response Elements.
Molecular Endocrinology 19:595-606, 2005.
For questions and comments: