Symbols Used for TRANSFAC Binding Sequences
Consensus binding sequences are given in IUPAC code; many are taken from the compilation by Faisst and Meyer, 1992. In addition to A, C, G, and T, the following symbols are used in consensus sequences:
Symbols Used for Consensus Binding Sequences and Their Meanings
Symbol | Meaning |
---|---|
B | C, G, or T |
D | A, G, or T |
H | A, C, or T |
K | G or T |
M | A or C |
N | A, C, G, or T |
R | A or G |
S | C or G |
V | A, C, or G |
W | A or T |
Y | C or T |
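To illustrate how the table can be applied, the following Python sketch encodes each symbol as the set of bases it allows and scans one strand of a sequence for matches to a consensus. It is a minimal illustration only: the names `IUPAC`, `matches_consensus`, and `find_sites` are hypothetical, the example consensus `TGASTCA` is made up, and reverse-complement matching is not handled.

```python
# Each IUPAC symbol from the table, mapped to the set of bases it allows.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "B": {"C", "G", "T"},
    "D": {"A", "G", "T"},
    "H": {"A", "C", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
    "N": {"A", "C", "G", "T"},
    "R": {"A", "G"},
    "S": {"C", "G"},
    "V": {"A", "C", "G"},
    "W": {"A", "T"},
    "Y": {"C", "T"},
}


def matches_consensus(consensus: str, site: str) -> bool:
    """True if every base of `site` is allowed by the symbol at the same position."""
    if len(consensus) != len(site):
        return False
    # Unknown consensus symbols raise KeyError; unknown site letters simply fail to match.
    return all(base.upper() in IUPAC[code.upper()] for code, base in zip(consensus, site))


def find_sites(consensus: str, sequence: str) -> list:
    """0-based start positions on the given strand where `sequence` matches `consensus`."""
    k = len(consensus)
    return [i for i in range(len(sequence) - k + 1)
            if matches_consensus(consensus, sequence[i:i + k])]
```

For example, `find_sites("TGASTCA", "ccTGACTCAgg")` returns `[2]`, because the `S` at the fourth position accepts the `C` in `TGACTCA`.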
A number of consensus sequences have been generated by the TRANSFAC team, generally derived from DNA binding profiles represented as matrices. For these, degenerate codes are assigned according to the following rules, adapted from Cavener, 1987:
- A single nucleotide is shown if its frequency is at least 50% and at least twice that of the second most frequent nucleotide.
- A double-degenerate code indicates that the two corresponding nucleotides together occur in at least 75% of the underlying sequences and the first rule does not apply.
- Triple-degenerate codes are used only at positions where one of the four nucleotides does not occur at all in the sequence set and neither of the preceding rules applies.
- All other frequency distributions are represented by the letter "N".
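The degeneracy rules can be sketched in Python as a function that assigns one code per matrix column. This is a minimal illustration, not TRANSFAC's own implementation. It assumes each column is given as a dict of raw counts for A, C, G, and T, and it assumes that the pair considered by the second rule is the two most frequent nucleotides. The names `CODE_FOR`, `column_code`, and `consensus` are hypothetical.

```python
# Inverse of the symbol table: set of allowed bases -> IUPAC letter.
CODE_FOR = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V",
}


def column_code(counts: dict) -> str:
    """Apply the four degeneracy rules to one matrix column of nucleotide counts."""
    full = {b: counts.get(b, 0) for b in "ACGT"}
    total = sum(full.values())
    if total == 0:
        return "N"
    ranked = sorted(full, key=full.get, reverse=True)
    first, second = ranked[0], ranked[1]
    f1, f2 = full[first], full[second]
    # Rule 1: frequency >= 50% and at least twice the second most frequent base.
    if 2 * f1 >= total and f1 >= 2 * f2:
        return first
    # Rule 2 (assumption: the pair is the two most frequent bases): together >= 75%.
    if 4 * (f1 + f2) >= 3 * total:
        return CODE_FOR[frozenset((first, second))]
    # Rule 3: exactly one base is absent from the column.
    present = [b for b in "ACGT" if full[b] > 0]
    if len(present) == 3:
        return CODE_FOR[frozenset(present)]
    # Rule 4: every other distribution.
    return "N"


def consensus(matrix: list) -> str:
    """Build a consensus string from a list of per-column count dicts."""
    return "".join(column_code(col) for col in matrix)
```

For instance, columns with counts A=8/C=1/G=1, A=5/G=4/C=1, A=4/C=3/G=3, and A=3/C=3/G=2/T=2 yield `ARVN`. Comparing integer counts (for example `2 * f1 >= total` rather than `f1 / total >= 0.5`) avoids floating-point rounding at the exact 50% and 75% thresholds.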