Genes and Molecules Described in the BKL

The characteristics shared by the components of a group define those components as members of the group and allow researchers to identify relationships among the members. For example, molecules with certain shared characteristics are grouped into families or superfamilies. Similarly, reactions with shared characteristics can be assigned to general reaction classes, as has been done in the EC nomenclature. Lastly, combinations of molecules and reactions can be grouped, and the grouping may correspond to a particular network or pathway.

Genes and molecules in the BKL are grouped into the following categories:


Genes and Orthogenes

Gene describes a single gene from a particular taxon. The prefix ortho is added (orthogene) to describe a group of orthologous genes from multiple species or higher taxons. Genes and orthogenes are described on Locus Reports.


Family and Orthofamily Molecules

Grouping molecules into families and orthofamilies allows us to write algorithms that exploit these relationships and infer properties for the individual family members. Family describes a group of molecules from a single species or higher taxon that make up a family or superfamily. Orthofamily describes a group of homologous families or superfamilies from multiple species or higher taxons. Families and orthofamilies are described on Family Reports.

Isogroup and Orthogroup Molecules

Several isoforms may exist for a particular gene product. Occasionally, a signaling activity is attributed to a single gene product, and researchers discover later that several molecules are produced by that gene. The BKL has molecule categories that refer to the products of a particular gene, where the particular isoform is not known or specified in the scientific literature. Isogroup describes a group of species- or higher taxon-specific products of a single gene. Orthogroup describes a group of products from orthologous genes from multiple species or higher taxons. Isogroup and orthogroup molecules are described on Locus Reports.

Protein and Orthobasic Molecules

Often the isoform, a particular splice variant for example, is specified in the scientific literature. Protein describes a specific isoform from a particular species or higher taxon. Orthobasic describes a group of specific isoforms produced by orthologous genes from multiple species or higher taxons. Proteins and orthobasic molecules are described on Locus Reports.


Complex and Orthocomplex Molecules

Non-covalently bound molecules from a particular species or higher taxon are referred to as a complex. Homologous complexes from multiple species or higher taxons are referred to as an orthocomplex. Complexes and orthocomplexes are described on Complex Reports.

The figure below illustrates the relationships between and among various genes, proteins, and complexes in the BKL.

Molecule Types in the BKL

Genes, Proteins, and Complexes in the BKL. Types that describe groups of orthologous genes and molecules appear in gray. Types that specify particular taxons appear in red, with the taxon designated in parentheses. Here, (h) indicates human. Arrows indicate that the gene or molecule is a component of a particular group. A. Genes in the BKL. Genes and orthogenes are described on Locus Reports. B. Proteins in the BKL. Family and orthofamily molecules are described on Family Reports, while isogroup, orthogroup, orthobasic, and protein molecules are described on Locus Reports. C. Complexes in the BKL. Complexes are indicated with a colon separating the individual components (A1A:B). Complex and orthocomplex molecules are described on Complex Reports.

Small Molecules

Non-proteinaceous molecules that play regulatory roles in signaling pathways are also described in the BKL, on Small Molecule Reports.
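
The categories above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names Report, TYPE_PAIRS, REPORT_FOR_TYPE, ORTHO_COUNTERPART, and report_for are hypothetical and are not part of any BKL interface. Only the type names and their report assignments are taken from the descriptions above.

```python
from enum import Enum


class Report(Enum):
    """Report types named in this section."""
    LOCUS = "Locus Report"
    FAMILY = "Family Report"
    COMPLEX = "Complex Report"
    SMALL_MOLECULE = "Small Molecule Report"


# Each row: (taxon-specific type, multi-taxon "ortho" type, report type).
TYPE_PAIRS = [
    ("gene", "orthogene", Report.LOCUS),
    ("family", "orthofamily", Report.FAMILY),
    ("isogroup", "orthogroup", Report.LOCUS),
    ("protein", "orthobasic", Report.LOCUS),
    ("complex", "orthocomplex", Report.COMPLEX),
]

# Small molecules have no ortho counterpart in the text above.
REPORT_FOR_TYPE = {"small molecule": Report.SMALL_MOLECULE}
ORTHO_COUNTERPART = {}

for specific, ortho, report in TYPE_PAIRS:
    REPORT_FOR_TYPE[specific] = report
    REPORT_FOR_TYPE[ortho] = report
    ORTHO_COUNTERPART[specific] = ortho


def report_for(molecule_type: str) -> str:
    """Return the name of the report that describes the given type."""
    return REPORT_FOR_TYPE[molecule_type.lower()].value
```

For instance, report_for("orthobasic") returns the Locus Report name, and ORTHO_COUNTERPART["complex"] gives "orthocomplex".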

Preserving Scientific Context with Molecule Types in TRANSPATH
Examining the sequence of a gene or protein is easier than investigating its function. Typically, certain characteristics are experimentally determined for a few members of a protein family, and homologs are added to the functional group by sequence or structural similarity. Various databases classify proteins and map sequence motifs to functional annotation using this premise. They cluster proteins by multiple sequence alignments and use common structural motifs, "profile" patterns, or Hidden Markov Models derived from these alignments to classify new proteins. These methods sometimes predict function correctly and sometimes do not. As a result, molecules are commonly grouped into families on the basis of sequence similarity even when they do not share functional characteristics.
For TRANSPATH, it is advantageous to group molecules that show common signaling characteristics, because we are primarily interested in function. At the same time, we want to draw on expert knowledge and remain consistent with groupings that already exist. To resolve this dilemma, we group the molecules as is done traditionally, but link signaling information only to those molecules for which it has been shown experimentally.
Given the relationships described in the figure above, if a reaction has been demonstrated for A1a(h), we link that reaction to the A1a(h) molecule only. We link statements made on a general level, such as those derived from reviews, to nodes on a higher level in the molecule hierarchy. For example, we can link the general activation of A-like proteins to the orthofamily entry "A-like family", so that the context of the original statement in the scientific literature is preserved.
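
The linking rule described above can be sketched in Python. This is a minimal illustration, not the TRANSPATH implementation. The Node class and the helper functions are hypothetical, and the intermediate entries A1(h) and A(h) are assumed for the example because the figure is not reproduced here. Only A1a(h) and the "A-like family" orthofamily come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in the molecule hierarchy (hypothetical structure)."""
    name: str
    kind: str
    parents: list = field(default_factory=list)
    reactions: list = field(default_factory=list)


def add_member(group: Node, member: Node) -> None:
    """Record that 'member' is a component of 'group'."""
    member.parents.append(group)


def link(node: Node, reaction: str) -> None:
    """Attach a reaction to exactly one node; nothing is propagated."""
    node.reactions.append(reaction)


def ancestors(node: Node) -> list:
    """Collect every group the node belongs to, directly or indirectly."""
    seen, stack, found = set(), list(node.parents), []
    while stack:
        current = stack.pop()
        if id(current) not in seen:
            seen.add(id(current))
            found.append(current)
            stack.extend(current.parents)
    return found


def general_statements(node: Node) -> list:
    """Statements linked to higher-level nodes, with their original level."""
    return [(a.name, r) for a in ancestors(node) for r in a.reactions]


# Assumed chain: protein -> isogroup -> family -> orthofamily.
orthofamily = Node("A-like family", "orthofamily")
family_h = Node("A(h)", "family")       # assumed intermediate entry
isogroup_h = Node("A1(h)", "isogroup")  # assumed intermediate entry
protein_h = Node("A1a(h)", "protein")
add_member(orthofamily, family_h)
add_member(family_h, isogroup_h)
add_member(isogroup_h, protein_h)

# Experimental result: linked to the specific isoform only.
link(protein_h, "reaction demonstrated for A1a(h)")
# Review-level statement: linked to the orthofamily only.
link(orthofamily, "general activation of A-like proteins")
```

Because each statement stays on the node where it was linked, a query on A1a(h) can report the general statement together with the level at which it was originally made, rather than presenting it as a demonstrated property of the isoform.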

Additional details about the TRANSPATH data model used to describe molecule types in the BKL can be found in:

Choi C, Crass T, Kel A, Kel-Margoulis O, Krull M, Pistor S, Potapov A, Voss N, Wingender E. (2004) Consistent re-modeling of signaling pathways and its implementation in the TRANSPATH database, Genome Inf. Ser. 15: 244-254.

Copyright © geneXplain. All rights reserved.