Related terms

  • PWM:
  • Position Specific Scoring Matrix (PSSM):
  • odds:
  • log-odds:

How to determine the occurrences of motifs in sequences

odds score-based

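The most straightforward way to call motif occurrences is to threshold the odds score (usually on a log scale) calculated using the PSSM: a site counts as a match when its score reaches the threshold. Some tools take this log-odds detection threshold as a mandatory parameter, used to separate bound from unbound sites (example value: 8.059752).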
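The thresholding described above can be sketched in a few lines of Python. Everything here is illustrative: the 4-position PWM, the uniform background, the test sequence, and the threshold of 4.0 are made-up values, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-position PWM (per-position base probabilities); not from any real motif.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed zero-order background with uniform base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def to_pssm(pwm, background):
    """Convert probabilities to log2 odds against the background."""
    return [{b: math.log2(col[b] / background[b]) for b in col} for col in pwm]


def log_odds(kmer, pssm):
    """Sum the per-position log-odds of a k-mer."""
    return sum(col[base] for col, base in zip(pssm, kmer))


def scan(sequence, pssm, threshold):
    """Return (start, k-mer, score) for every window scoring at or above the threshold."""
    k = len(pssm)
    hits = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        s = log_odds(kmer, pssm)
        if s >= threshold:
            hits.append((i, kmer, s))
    return hits


PSSM = to_pssm(PWM, BACKGROUND)
hits = scan("TTACGTAACGA", PSSM, threshold=4.0)
for start, kmer, s in hits:
    print(start, kmer, round(s, 3))
```

With these toy values only the window `ACGT` at index 2 passes; the one-mismatch window `ACGA` scores about 3.13 and falls below the threshold, which shows how strongly the call depends on the chosen cutoff.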

\(p\)-value-based

The \(p\)-value of a match for a \(k\)-mer \(x\) is the probability of obtaining a log-odds score at least as high as \(S(x)\) with the PWM on a random/control sequence: \[p\text{-value}(x)=\sum\limits_{S(z)\ge S(x)}P_{bg}(z),\] where the sum runs over all \(k\)-mers \(z\) and \(P_{bg}(z)\) denotes the probability of observing \(z\) under the background model.
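For a short motif the sum in this definition can be evaluated directly by enumerating all \(4^k\) \(k\)-mers. The sketch below does exactly that; the PWM and the uniform zero-order background are hypothetical toy values, and the small tolerance `tol` is an assumption added to guard against floating-point ties.

```python
import math
from itertools import product

# Hypothetical toy PWM and uniform background (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PSSM = [{b: math.log2(col[b] / BACKGROUND[b]) for b in "ACGT"} for col in PWM]


def score(kmer):
    """Log-odds score S(x) of a k-mer."""
    return sum(col[b] for col, b in zip(PSSM, kmer))


def background_prob(kmer):
    """P_bg(z) under a zero-order background: product of base frequencies."""
    return math.prod(BACKGROUND[b] for b in kmer)


def p_value(x, tol=1e-9):
    """Sum P_bg(z) over all k-mers z with S(z) >= S(x), by brute-force enumeration."""
    s_x = score(x)
    return sum(
        background_prob(z)
        for z in product("ACGT", repeat=len(PSSM))
        if score(z) >= s_x - tol
    )


p_best = p_value("ACGT")          # only the consensus scores this high
p_one_mismatch = p_value("ACGA")  # consensus plus all one-mismatch k-mers
print(p_best, p_one_mismatch)
```

Enumeration costs \(4^k\) score evaluations, so it is only practical for small \(k\); it is mainly useful as a reference against which faster methods can be checked.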

The program computes a log-likelihood ratio score (often referred to incorrectly as a 'log-odds score') for each motif with respect to each sequence position and converts these scores to \(p\)-values using dynamic programming (Staden, 1994), assuming a zero-order null model in which sequences are generated at random with user-specified per-letter background frequencies. The default match \(p\)-value threshold is 0.0001.
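The dynamic-programming idea can be sketched as follows: round each per-position score to an integer, then build the distribution of total scores one motif column at a time under the zero-order null model. This is a minimal illustration of the general technique, not the program's actual implementation; the PWM, the background frequencies, and the scaling factor `SCALE = 100` are assumptions chosen for the example.

```python
import math
from collections import defaultdict

# Hypothetical toy PWM and user-specified background frequencies (illustrative only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
SCALE = 100  # assumed discretization: scores are rounded to 1/100 of a bit

# Integer log-likelihood ratio scores per position and base.
INT_PSSM = [
    {b: round(SCALE * math.log2(col[b] / BACKGROUND[b])) for b in "ACGT"}
    for col in PWM
]


def score_distribution(int_pssm, background):
    """Probability of each total integer score under the zero-order null model."""
    dist = {0: 1.0}
    for col in int_pssm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for base, q in background.items():
                nxt[s + col[base]] += p * q
        dist = dict(nxt)
    return dist


DIST = score_distribution(INT_PSSM, BACKGROUND)


def p_value(kmer):
    """Tail probability: P(score >= score of kmer) under the null."""
    target = sum(col[b] for col, b in zip(INT_PSSM, kmer))
    return sum(p for s, p in DIST.items() if s >= target)


print(p_value("ACGT"), p_value("ACGA"))
```

The work grows with motif length times the number of distinct integer scores rather than with \(4^k\), which is what makes the conversion feasible for long motifs. Note that with a motif as short as this toy one, even the best possible match has a \(p\)-value of 1/256, far above a 0.0001 threshold.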

When sequences are longer than 1kb.

How to tell whether motifs are enriched compared to control (background) sequences

\(de\text{ }novo\) motif discovery