Symbols Used for TRANSFAC Binding Sequences

Consensus binding sequences are provided in the IUPAC code, and many are taken from the compilation of Faisst and Meyer, 1992. In addition to A, C, G, and T, the following symbols are used for consensus sequences:

Symbols Used for Consensus Binding Sequences and Their Meanings

Symbol  Meaning
B       C, G, or T
D       A, G, or T
H       A, C, or T
K       G or T
M       A or C
N       A, C, G, or T
R       A or G
S       C or G
V       A, C, or G
W       A or T
Y       C or T
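
As an illustration of how the table can be applied, the following Python sketch stores the symbols in a dictionary and turns a consensus string into a regular expression. The names IUPAC and consensus_to_regex, and the example consensus "TGASTCA", are chosen here for illustration only and are not part of TRANSFAC.

```python
import re

# IUPAC codes from the table above, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "K": "GT",
    "M": "AC", "N": "ACGT", "R": "AG", "S": "CG",
    "V": "ACG", "W": "AT", "Y": "CT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression.

    Hypothetical helper: raises KeyError for symbols outside the table.
    """
    parts = []
    for symbol in consensus.upper():
        bases = IUPAC[symbol]
        # A single base stands for itself; several bases become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


# Illustrative consensus: S matches either C or G at the fourth position.
pattern = re.compile(consensus_to_regex("TGASTCA"))
print(bool(pattern.fullmatch("TGACTCA")))  # True
print(bool(pattern.fullmatch("TGATTCA")))  # False
```

The sketch matches one strand only; scanning the reverse complement would need an additional step.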

The TRANSFAC team has generated a number of consensus sequences, generally derived from DNA-binding profiles consisting of matrices. For these sequences, the use of degenerate codes follows the rules below, adapted from Cavener, 1987:

  1. A single nucleotide is shown if its frequency is at least 50% and at least twice as high as the second most frequent nucleotide.

  2. A double-degenerate code indicates that the corresponding two nucleotides occur in at least 75% of the underlying sequences and rule 1 does not apply.

  3. A triple-degenerate code is used only at positions where one of the nucleotides does not occur at all in the sequence set and neither of the rules above applies.

  4. All other frequency distributions are represented by the letter "N".
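
The four rules can be sketched as a Python function that assigns a symbol to a single matrix column. This is only one reading of the rules, not the TRANSFAC implementation: it assumes that the two nucleotides in rule 2 are the two most frequent ones and that their combined share must reach 75%, and it breaks ties alphabetically. The names CODE_FOR and consensus_symbol are hypothetical.

```python
# Map each alphabetically sorted set of bases to its IUPAC symbol.
CODE_FOR = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B",
    "ACGT": "N",
}


def consensus_symbol(counts):
    """Return the consensus symbol for one matrix column.

    counts: dict with integer counts for the keys "A", "C", "G", "T".
    """
    total = sum(counts[b] for b in "ACGT")
    if total <= 0:
        raise ValueError("column has no observations")

    # Rank bases by descending count; ties fall back to alphabetical order
    # (an assumption, the rules do not say how to break ties).
    ranked = sorted("ACGT", key=lambda b: (-counts[b], b))
    first, second = counts[ranked[0]], counts[ranked[1]]

    # Rule 1: at least 50% and at least twice the runner-up.
    if 2 * first >= total and first >= 2 * second:
        return ranked[0]

    # Rule 2 (assumed reading): the top two bases together reach 75%.
    if 4 * (first + second) >= 3 * total:
        return CODE_FOR["".join(sorted(ranked[:2]))]

    # Rule 3: exactly one base is absent from the column.
    present = [b for b in "ACGT" if counts[b] > 0]
    if len(present) == 3:
        return CODE_FOR["".join(present)]

    # Rule 4: everything else.
    return "N"


print(consensus_symbol({"A": 10, "C": 2, "G": 2, "T": 2}))  # A
print(consensus_symbol({"A": 8, "C": 7, "G": 1, "T": 0}))   # M
print(consensus_symbol({"A": 6, "C": 5, "G": 5, "T": 0}))   # V
print(consensus_symbol({"A": 5, "C": 4, "G": 4, "T": 3}))   # N
```

The thresholds are compared with integer arithmetic so that columns sitting exactly on 50% or 75% are not affected by floating-point rounding.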

