BLAST Details
Alignments against proteins from S. cerevisiae, S. pombe, human fungal pathogens, C. elegans, human, mouse and rat, and from the plant species A. thaliana, O. sativa, Z. mays, G. max, and S. bicolor are precalculated for presentation and updated periodically.
Sources for Protein Sequences
- S. cerevisiae protein sequences are from the genomic sequence data at the Saccharomyces Genome Database (SGD), and in some cases from data published by individual research labs.
- S. pombe protein sequences are from PomPep, translated from the genomic sequence data, and in some cases from data published by individual research labs.
- C. albicans protein sequences are from the Stanford Genome Technology Center, and in some cases from data published by individual research labs.
- Protein sequences for human fungal pathogens other than C. albicans are from data published by individual research labs.
- C. elegans protein sequences are from Wormpep, translated from the genomic sequence data, and in some cases from data published by individual research labs.
- Human, mouse and rat protein sequences are obtained from Entrez Gene.
- A. thaliana, O. sativa, and Z. mays protein sequences are obtained from Entrez Gene.
- G. max and S. bicolor protein sequences are obtained from the Joint Genome Institute (JGI) Phytozome database.
Alignments
The sequences are aligned with Gapped BLAST (v2.0.10) with SEG and COIL filtering. Alignments with an expectation value (E-value) less than or equal to 1e-3 are then refined, with no filtering, by Smith-Waterman alignment algorithms from the USC Sequence alignment package v2.0. Only alignments with at least 20% identity or at least 40% similarity are shown.
The SEG filter masks short regions of low complexity in the sequence. Low-complexity regions are runs composed of only a few of the 20 amino acids. The COIL filter masks potential coiled-coil domains. Both kinds of region tend to show significant BLAST similarity to many unrelated proteins, which often obscures real similarities that may be present.
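The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated cutoffs, not the code used to build the database: the `Hit` record and the function names are hypothetical, and the sketch assumes that percent identity and percent similarity are taken from the Smith-Waterman refined alignment.

```python
from dataclasses import dataclass

# Cutoffs as stated in the text.
E_VALUE_CUTOFF = 1e-3     # BLAST stage: keep hits with E-value <= 1e-3
MIN_IDENTITY = 20.0       # display stage: at least 20% identity ...
MIN_SIMILARITY = 40.0     # ... OR at least 40% similarity


@dataclass
class Hit:
    """Hypothetical record for one query/subject alignment."""
    subject: str
    e_value: float          # from gapped BLAST
    pct_identity: float     # assumed to come from the refined alignment
    pct_similarity: float   # assumed to come from the refined alignment


def passes_blast_stage(hit: Hit) -> bool:
    """True if the hit is significant enough to be refined."""
    return hit.e_value <= E_VALUE_CUTOFF


def is_displayed(hit: Hit) -> bool:
    """True if the refined hit meets the identity OR similarity cutoff."""
    return passes_blast_stage(hit) and (
        hit.pct_identity >= MIN_IDENTITY or hit.pct_similarity >= MIN_SIMILARITY
    )
```

Note that the two percentage cutoffs are joined by OR, so an alignment with low identity is still shown if its similarity reaches 40%.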
PDB comparisons: BLAST comparisons are performed against sequences corresponding to various known structural regions. Sequences are obtained from ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt.Z. The results are shown in a table similar in format to that used for other BLAST comparisons within the databases.
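For readers who want to work with the same sequence file, the following Python sketch reads chain sequences from a decompressed copy of `pdb_seqres.txt`. It assumes the file is FASTA-like, with header lines of the form `>1abc_A mol:protein length:154  DESCRIPTION` followed by the sequence; that layout, the function name `read_seqres`, and the sample identifiers are assumptions made for illustration, so check them against the file you download.

```python
from typing import Dict, Iterable


def read_seqres(lines: Iterable[str], protein_only: bool = True) -> Dict[str, str]:
    """Return {chain_id: sequence} from FASTA-like SEQRES lines.

    Assumes headers look like '>1abc_A mol:protein length:154  NAME'.
    """
    records: Dict[str, str] = {}
    chain_id = None
    keep = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            chain_id = fields[0]
            # Skip nucleic acid chains when only proteins are wanted.
            keep = (not protein_only) or ("mol:protein" in fields)
            if keep:
                records[chain_id] = ""
        elif keep and chain_id is not None:
            # Sequence lines are appended to the current chain.
            records[chain_id] += line
    return records


# Made-up sample records, for illustration only.
sample = [
    ">1abc_A mol:protein length:10  EXAMPLE PROTEIN",
    "MVLSEGEWQL",
    ">1abc_B mol:na length:4  EXAMPLE DNA",
    "ACGT",
]
chains = read_seqres(sample)
```

With the sample above, `chains` holds only the protein chain `1abc_A`; passing `protein_only=False` would keep both records.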
Score and E-value (Expectation Value) are as determined by the gapped BLAST algorithm. To help in judging the significance of a match, additional columns in the output table give the alignment length (Match Length) and the percent identity (% Iden) and percent similarity (% Sim) across the alignment. Another novel feature is the Position Line above each alignment, which marks the position of the alignment (with # symbols) relative to the full length of the query protein.
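As a rough illustration of these columns and of the Position Line, the Python sketch below derives Match Length, % Iden and % Sim from a BLAST-style midline (a letter for an identical pair, `+` for a positively scoring pair, a space otherwise) and draws a scaled line of `#` symbols. The function names, the line width, and the use of `-` for unaligned positions are assumptions made for this example; the actual display may differ.

```python
def alignment_stats(midline: str):
    """Return (match_length, pct_identity, pct_similarity) from a midline."""
    length = len(midline)
    identities = sum(1 for c in midline if c not in " +")
    # Similar positions are identities plus positively scoring pairs.
    positives = identities + midline.count("+")
    return length, 100.0 * identities / length, 100.0 * positives / length


def position_line(query_len: int, start: int, end: int, width: int = 60) -> str:
    """Mark the aligned region (1-based, inclusive) on a line of fixed width."""
    first = (start - 1) * width // query_len
    # Ceiling division so that short alignments still get at least one '#'.
    last = max(first + 1, -(-end * width // query_len))
    return "-" * first + "#" * (last - first) + "-" * (width - last)


# Hypothetical alignment covering residues 51-100 of a 200-residue query.
length, pct_iden, pct_sim = alignment_stats("MK+ L")
line = position_line(query_len=200, start=51, end=100, width=40)
```

Here the midline `MK+ L` has three identities and one additional positive over five columns, and `line` places the `#` symbols in the second quarter of the query.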
BLAST and Smith-Waterman Details
BLAST Details | |
---|---|
Program | blastp |
Expected value | 1e-03 |
Filter | seg |
Cost to open a gap | 10 |
Cost to extend a gap | 1 |
Max alignments | 250 |
Substitution matrix | BLOSUM62 |
Smith-Waterman Details | |
Filter | none |
Cost to open a gap | 11 |
Cost to extend a gap | 1 |
Substitution matrix | BLOSUM62 |
Cutoffs | 20% identity OR 40% similarity |
Updates
BLAST alignments are updated, and new protein sequences are loaded, at biweekly intervals.
Additional Resources
To expand these alignments to other species, or to vary the parameters of the search, you may wish to consult the resources listed below.
The National Center for Biotechnology Information
Baylor College of Medicine Human Genome Center BCM Search Launcher
References
BLAST
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol 215:403-410.
gapped BLAST
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
SEG filter program
Wootton JC, Federhen S. (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol 266:554-571.
Smith-Waterman algorithm
Waterman MS. (1995) Introduction to Computational Biology: Maps, sequences and genomes. Chapman & Hall, London. ISBN 0-412-99391-0.
PDB
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. (2000) The Protein Data Bank. Nucleic Acids Res 28:235-242.