BLAST Details

Alignments against proteins from S. cerevisiae, S. pombe, human fungal pathogens, C. elegans, human, mouse, and rat, and from the plant species A. thaliana, O. sativa, Z. mays, G. max, and S. bicolor, are precalculated for display and updated periodically.

Sources for Protein Sequences

S. cerevisiae protein sequences are from the genomic sequence data at the Saccharomyces Genome Database (SGD) and, in some cases, from data published by individual research labs.
S. pombe protein sequences are from PomPep, translated from the genomic sequence data, and in some cases from data published by individual research labs.
C. albicans protein sequences are from the Stanford Genome Technology Center and, in some cases, from data published by individual research labs.
Protein sequences for human fungal pathogens other than C. albicans are from data published by individual research labs.
C. elegans protein sequences are from Wormpep, translated from the genomic sequence data, and in some cases from data published by individual research labs.
Human, mouse and rat protein sequences are obtained from Entrez Gene.
A. thaliana, O. sativa, and Z. mays protein sequences are obtained from Entrez Gene.
G. max and S. bicolor protein sequences are obtained from the Joint Genome Institute (JGI) Phytozome database.

Alignments

The sequences are aligned with Gapped BLAST (v2.0.10) using SEG and COIL filtering. Alignments with an expectation value (E-value) of 1e-3 or less are then refined by Smith-Waterman alignment using the USC Sequence alignment package v2.0, with no filtering. Only alignments with at least 20% identity or at least 40% similarity are shown. The SEG filter masks short regions of low complexity, that is, stretches composed of only a few of the 20 amino acids. The COIL filter masks potential coiled-coil domains. Low-complexity regions and coiled-coil domains tend to produce significant BLAST similarities to many unrelated proteins, which can obscure real similarities that may be present.
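The idea behind low-complexity masking can be illustrated with a short Python sketch. It is a simplified stand-in loosely modeled on the first step of SEG, not SEG itself: each 12-residue window is scored by the Shannon entropy of its amino-acid composition, and every residue in a window scoring at or below 2.2 bits is replaced with `X`. The window length, the threshold, and the function names are illustrative assumptions; real SEG also extends and trims the flagged segments, and COIL masking is not shown.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace residues lying in low-entropy windows with 'X'.

    Simplified, SEG-inspired sketch (window and threshold are assumptions):
    any window whose composition entropy is <= threshold bits is treated as
    low complexity, and all of its residues are masked.
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("X" if m else aa for aa, m in zip(seq, masked))


if __name__ == "__main__":
    # Hypothetical protein with a poly-Q tract in the middle.
    print(mask_low_complexity("ACDEFGHIKLMNPRSTVWY" + "Q" * 15 + "ACDEFGHIKLMNPRSTVWY"))
```

A window of 12 distinct residues has an entropy of log2(12) ≈ 3.58 bits, while a homopolymer run scores 0, so the threshold separates ordinary sequence from runs such as poly-Q tracts. A few residues flanking such a run may also be masked when they share a window with it.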

PDB comparisons: BLAST comparisons are also performed against the sequences of proteins with known three-dimensional structures. These sequences are obtained from ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt.Z. The results are shown in a table formatted like those used for the other BLAST comparisons in the database.
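As an illustration of how that sequence file might be read, the Python sketch below parses FASTA records and keeps only protein chains. It assumes the file has already been decompressed to plain text and that each header follows the pattern `>101m_A mol:protein length:154  MYOGLOBIN` (entry and chain identifier, then a `mol:` field). The function name is hypothetical, and nucleic-acid records (`mol:na`) are skipped because only protein sequences are searched with blastp.

```python
from typing import Dict, Iterable, List, Optional


def parse_pdb_seqres(lines: Iterable[str]) -> Dict[str, str]:
    """Map chain identifiers to sequences, keeping protein chains only.

    Assumes pdb_seqres-style FASTA headers such as
    '>101m_A mol:protein length:154  MYOGLOBIN'; records whose second
    header field is not 'mol:protein' (e.g. 'mol:na') are skipped.
    """
    chains: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            is_protein = len(fields) > 1 and fields[1] == "mol:protein"
            current = fields[0] if is_protein else None
            if current is not None:
                chains[current] = []
        elif current is not None:
            # Concatenate sequence lines in case a record spans several lines.
            chains[current].append(line)
    return {chain: "".join(parts) for chain, parts in chains.items()}


if __name__ == "__main__":
    # Invented records for illustration; they are not real PDB entries.
    example = """>test_A mol:protein length:10  HYPOTHETICAL PROTEIN
MVLSEGEWQL
>test_B mol:na length:4  HYPOTHETICAL DNA
ACGT
"""
    print(parse_pdb_seqres(example.splitlines()))
```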

The Score and E-value (expectation value) columns are as reported by the gapped BLAST algorithm. To help assess the significance of a match, additional columns in the output table give the alignment length (Match Length) and the percent identity (% Iden) and percent similarity (% Sim) across the alignment. A Position Line above each alignment uses # symbols to show where the aligned region lies along the full length of the query protein.
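These output columns can be illustrated with a small Python sketch that takes one gapped alignment, given as two equal-length strings with `-` for gaps, and computes the match length, % identity, % similarity, and a position line. Several details are assumptions, since the text does not define them: match length is taken as the number of alignment columns including gaps; "similar" pairs are identical residues plus pairs drawn from the hypothetical `SIMILAR_GROUPS` below (BLAST's own "positives" instead count pairs with a positive substitution-matrix score); and the position line is drawn 50 characters wide, with `.` marking the unaligned parts of the query.

```python
from typing import Callable, Tuple

# Hypothetical similarity groups, for illustration only: the text does not
# say which residue pairs count toward % Sim.
SIMILAR_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def default_similar(a: str, b: str) -> bool:
    """True if residues a and b are identical or share a group above."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def alignment_stats(q_aln: str, s_aln: str,
                    similar: Callable[[str, str], bool] = default_similar
                    ) -> Tuple[int, float, float]:
    """Return (match length, % identity, % similarity) for one gapped alignment.

    Match length counts every alignment column, gaps included (an assumption).
    Columns containing a gap ('-') never count as identical or similar.
    """
    if len(q_aln) != len(s_aln) or not q_aln:
        raise ValueError("aligned strings must be non-empty and of equal length")
    pairs = [(a, b) for a, b in zip(q_aln, s_aln) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    similar_count = sum(similar(a, b) for a, b in pairs)
    length = len(q_aln)
    return length, 100.0 * identical / length, 100.0 * similar_count / length


def position_line(query_len: int, q_start: int, q_end: int, width: int = 50) -> str:
    """Mark the aligned span q_start..q_end (1-based, inclusive) of the query.

    Residue i maps to columns [(i-1)*width/query_len, i*width/query_len), so
    the '#' run covers every column overlapped by the aligned residues.
    """
    first = (q_start - 1) * width // query_len
    last = -(-q_end * width // query_len) - 1  # ceiling division, minus one
    return "." * first + "#" * (last - first + 1) + "." * (width - last - 1)


if __name__ == "__main__":
    print(alignment_stats("MKV-LST", "MRVALTT"))
    print(position_line(query_len=120, q_start=31, q_end=90))
```

Dividing by the full column count means that gaps lower both percentages, which is one reasonable convention; a site could equally divide by the number of gap-free columns.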

BLAST and Smith-Waterman Details

BLAST Details
Program	blastp
Expected value	1e-03
Filter	seg
Cost to open a gap	10
Cost to extend a gap	1
Max alignments	250
Substitution matrix	BLOSUM62
Smith-Waterman Details
Filter	none
Cost to open a gap	11
Cost to extend a gap	1
Substitution matrix	BLOSUM62
Cutoffs	20% identity OR 40% similarity

Updates

BLAST alignments are updated, and new protein sequences are loaded, at biweekly intervals.

Additional Resources

To expand these alignments to other species, or to vary the parameters of the search, you may wish to consult the resources listed below.

The National Center for Biotechnology Information

Baylor College of Medicine Human Genome Center BCM Search Launcher

Saccharomyces Genome Database

References

BLAST
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol 215:403-410.

Gapped BLAST
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.

SEG filter program
Wootton JC, Federhen S. (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol 266:554-571.

Smith-Waterman algorithm
Waterman MS. (1995) Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman & Hall, London. ISBN 0-412-99391-0

PDB
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. (2000) The Protein Data Bank. Nucleic Acids Res 28:235-242.