http://compbio.soe.ucsc.edu/sam.html

Sequence Alignment and Modeling System
SAM-T02
HMM WWW Servers
SAM 3.5 (July 2005) is available!
The SAM
documentation (the 175-page
manual is also available in PDF and PS)
discusses the changes from previous versions.
If you are a college, university, U.S. government lab, or nonprofit,
you can download the software from the SAM distribution page.
If you are interested in SAM for commercial use, please request more
information from [email protected]
Martin Madera and Julian Gough have written a Perl converter
between
SAM and HMMer 2.0 formats. You can
get it from them (be sure to read their excellent documentation!)
or download
a 10/24/2000 copy.
Please read the ISMB99
tutorial on using HMMs.
A linear hidden Markov model is a sequence of nodes, each
corresponding to a column in a multiple alignment. In our HMMs, each
node has a match state (square), insert state (diamond) and delete
state (circle). Each sequence uses a series of these states to
traverse the model from start to end. Using a match state indicates
that the sequence has a character in that column, while using a delete
state indicates that the sequence does not. Insert states allow
sequences to have additional characters between columns. In
many ways, these models correspond to profiles.
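The way a sequence "uses" states can be made concrete with a short sketch. The Python below is hypothetical: the `(kind, node)` state tuples and the rendering convention (uppercase for a match, `-` for a delete, lowercase for an insert, in the style of A2M alignments) are assumptions for illustration, not SAM's internal representation. It turns one sequence's path through an L-node model into an alignment row.

```python
from typing import List, Tuple

# Hypothetical encoding: a state is (kind, node) with kind "M" (match),
# "D" (delete) or "I" (insert). Nodes are numbered 1..L; insert state
# ("I", k) sits between column k and column k+1 (k = 0 means before column 1).
State = Tuple[str, int]


def path_to_alignment_row(seq: str, path: List[State], length: int) -> str:
    """Render a sequence's path through an L-node linear HMM as one alignment row."""
    out = []
    pos = 0       # index of the next unconsumed character of seq
    expected = 1  # the next node (column) the path must visit
    for kind, node in path:
        if kind == "I":
            if node != expected - 1:
                raise ValueError(f"insert state at node {node} is out of order")
            out.append(seq[pos].lower())  # extra character between columns
            pos += 1
            continue
        if node != expected:
            raise ValueError(f"path skips or revisits node {expected}")
        if kind == "M":
            out.append(seq[pos].upper())  # the sequence has a character in this column
            pos += 1
        elif kind == "D":
            out.append("-")               # the sequence has no character in this column
        else:
            raise ValueError(f"unknown state kind {kind!r}")
        expected += 1
    if expected != length + 1 or pos != len(seq):
        raise ValueError("path must visit every node and consume the whole sequence")
    return "".join(out)


# A 4-node model: "ACGT" matches columns 1, 2 and 4, with an extra G
# inserted after column 2 and column 3 deleted.
row = path_to_alignment_row("ACGT", [("M", 1), ("M", 2), ("I", 2), ("D", 3), ("M", 4)], 4)
print(row)  # ACg-T
```

Every path visits each node exactly once (through its match or its delete state), which is why rows produced from different sequences line up column by column.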
The primary advantage of these models over
standard methods of
sequence search
is their ability to characterize an entire family of sequences.
To do so,
each position has its own distribution of characters, as do the transitions
between states. That is, these linear HMMs have position-dependent
character distributions and position-dependent insertion and deletion
gap penalties. Aligning each member of a family to a trained model
automatically yields a multiple alignment among those sequences.
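A minimal sketch of how aligning sequences to a model yields a multiple alignment, assuming a hypothetical toy DNA model built from a consensus string. The emission and transition probabilities are made up, and the Viterbi algorithm (most probable state path) used here is one standard way to align a sequence to a linear HMM; it is not necessarily what SAM's own programs do by default. Each node keeps one character or gap per sequence, so the resulting rows line up; insert characters are simply dropped here for brevity.

```python
import math
from typing import Dict, List, Tuple

KINDS = ("M", "I", "D")
NEG_INF = float("-inf")


def make_toy_model(consensus: str, alphabet: str = "ACGT"):
    """Hypothetical toy model: match state k prefers consensus[k-1]; inserts emit uniformly.

    There is one transition table per node 0..L (node 0 is Begin), so a real model
    could give every position its own gap penalties. All numbers are illustrative.
    """
    mismatch = 0.15 / (len(alphabet) - 1)
    match_emit: List[Dict[str, float]] = [{}]  # index 0 unused: Begin emits nothing
    for c in consensus:
        match_emit.append({a: math.log(0.85 if a == c else mismatch) for a in alphabet})
    insert_emit = {a: math.log(1 / len(alphabet)) for a in alphabet}
    t = {("M", "M"): 0.90, ("M", "I"): 0.05, ("M", "D"): 0.05,
         ("I", "M"): 0.60, ("I", "I"): 0.35, ("I", "D"): 0.05,
         ("D", "M"): 0.60, ("D", "I"): 0.05, ("D", "D"): 0.35}
    trans = [{key: math.log(p) for key, p in t.items()} for _ in range(len(consensus) + 1)]
    return match_emit, insert_emit, trans


def viterbi(seq, match_emit, insert_emit, trans):
    """Most probable state path of seq through the model: (log prob, [(kind, node), ...])."""
    L, n = len(match_emit) - 1, len(seq)
    # V[kind][k][i]: best log prob of a path ending in state (kind, k) having emitted seq[:i]
    V = {s: [[NEG_INF] * (n + 1) for _ in range(L + 1)] for s in KINDS}
    back: Dict[Tuple[str, int, int], Tuple[str, int, int]] = {}
    V["M"][0][0] = 0.0  # Begin is treated as match state 0

    def best_from(k: int, i: int, to: str):
        """Best predecessor kind at (node k, position i) for a move into a `to` state."""
        src = max(KINDS, key=lambda s: V[s][k][i] + trans[k][(s, to)])
        return src, V[src][k][i] + trans[k][(src, to)]

    for i in range(n + 1):
        for k in range(L + 1):
            if k >= 1 and i >= 1:  # match: advance to node k and emit seq[i-1]
                src, v = best_from(k - 1, i - 1, "M")
                V["M"][k][i] = v + match_emit[k][seq[i - 1]]
                back[("M", k, i)] = (src, k - 1, i - 1)
            if i >= 1:             # insert: stay at node k and emit seq[i-1]
                src, v = best_from(k, i - 1, "I")
                V["I"][k][i] = v + insert_emit[seq[i - 1]]
                back[("I", k, i)] = (src, k, i - 1)
            if k >= 1:             # delete: advance to node k without emitting
                src, v = best_from(k - 1, i, "D")
                V["D"][k][i] = v
                back[("D", k, i)] = (src, k - 1, i)

    end_kind, score = best_from(L, n, "M")  # End is treated as match state L+1
    path, state = [], (end_kind, L, n)
    while state != ("M", 0, 0):
        path.append(state[:2])
        state = back[state]
    return score, path[::-1]


def match_columns(seq, path):
    """Keep one character (or '-') per model node; insert characters are dropped."""
    row, i = [], 0
    for kind, _ in path:
        if kind == "M":
            row.append(seq[i])
        elif kind == "D":
            row.append("-")
        if kind != "D":
            i += 1  # match and insert states both consume a character
    return "".join(row)


model = make_toy_model("ACGT")
rows = [match_columns(s, viterbi(s, *model)[1]) for s in ("ACGT", "ACT", "ACGGT")]
print("\n".join(rows))
```

Because every path passes through each node exactly once, the rows all have one entry per node, which is what turns independent sequence-to-model alignments into a multiple alignment.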
The SAM software system is a collection of tools for
creating and using these models.
The algorithms and methods used by SAM and other HMM systems
were initially described in several papers from the University of
California, Santa Cruz.
These papers, several of which are described below, are available in
the UCSC
Computational Biology group's Protein FTP
directory.
The complete SAM documentation
is available in compressed
(.gz) postscript and as a series of WWW
pages.
We also have a
2-page
overview of SAM in
postscript.
SAM runs on Unix workstations. Building a model using SAM
can take from minutes to several hours
on a workstation, depending on the length of the model, the number of
sequences, and other factors.
SAM makes use of UCSC's research on Dirichlet
mixture regularizers.
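The core computation behind a Dirichlet mixture regularizer can be sketched in a few lines. Given the observed character counts in one alignment column, each mixture component's posterior weight is proportional to its prior weight times the probability of the counts under that component (a Dirichlet-multinomial likelihood), and the regularized estimate mixes the components' posterior means. The two-component mixture below is hypothetical; the mixtures actually used with SAM are trained on real alignments and are not reproduced here, nor are SAM's options for selecting them.

```python
import math
from typing import List, Sequence


def dirichlet_mixture_estimate(counts: Sequence[float],
                               mix_weights: Sequence[float],
                               alphas: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-mean character probabilities for one column under a Dirichlet mixture prior.

    counts[i]:      observed count of character i in the column
    mix_weights[j]: prior probability q_j of mixture component j
    alphas[j][i]:   Dirichlet parameter of component j for character i
    """
    n_total = sum(counts)
    # log P(component j | counts), up to a constant: log q_j + log P(counts | alpha_j).
    # The multinomial coefficient is the same for every component, so it is omitted.
    log_post = []
    for q, alpha in zip(mix_weights, alphas):
        a_total = sum(alpha)
        lp = math.log(q) + math.lgamma(a_total) - math.lgamma(n_total + a_total)
        lp += sum(math.lgamma(n + a) - math.lgamma(a) for n, a in zip(counts, alpha))
        log_post.append(lp)
    top = max(log_post)  # subtract the max before exponentiating, for numerical stability
    post = [math.exp(lp - top) for lp in log_post]
    z = sum(post)
    post = [p / z for p in post]

    # Mix the per-component posterior means (n_i + alpha_ji) / (|n| + |alpha_j|).
    est = [0.0] * len(counts)
    for w, alpha in zip(post, alphas):
        a_total = sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            est[i] += w * (n + a) / (n_total + a_total)
    return est


# Hypothetical mixture over a 4-letter alphabet: one broad component and one
# component concentrated on the first character.
weights = [0.7, 0.3]
alphas = [[1.0, 1.0, 1.0, 1.0], [5.0, 0.2, 0.2, 0.2]]
print(dirichlet_mixture_estimate([3, 0, 0, 0], weights, alphas))
```

Characters never seen in the column still receive nonzero probability, and a component that explains the observed counts well gains weight, which is what makes the estimate useful when only a few sequences are available.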
The creation and distribution of SAM has been supported in
part by NSF grants CDA-9115268,
IRI-9123692, DBI-9408579 and
DBI-9808007; ONR grant
N00014-91-J-1162; NIH grants GM17129
and 1 R01 GM068570-01; DOE
grant DE-FG03-95ER62112; a grant from the Danish Natural Science
Research Council; and the UCSC
Center for Biomolecular
Science and Engineering.
Sean Eddy has written another program suite based on these methods,
called HMMER,
which may also be of interest. SAM includes conversion programs
between the two systems' formats.
Hidden Markov models are used extensively in
speech recognition.
- Hidden
Markov models in computational biology: Applications to
protein modeling.
A. Krogh, M. Brown, I. S. Mian, K. Sjolander, and D. Haussler.
Journal of Molecular Biology, 235:1501--1531, February
1994.
The original journal article.
- Hidden Markov models for sequence analysis:
Extension and
analysis of the basic method. R. Hughey and A. Krogh,
CABIOS 12(2): 95-107, 1996.
(HTML
version)
or
(POSTSCRIPT
version)
Experimental evaluation of noise methods and regularizers, with
discussions of surgery, the parallel SAM code, and finding motifs.
-
Hidden Markov Models for Detecting Remote Protein Homologies
K. Karplus, C. Barrett, and R. Hughey, Bioinformatics
14(10):846--856, 1998.
(HTML
version) or (postscript).
Detailed discussion of the SAM-T98 method we applied to CASP3 to
predict protein structure.
-
Predicting protein structure using hidden Markov models
K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler,
R. Hughey, L. Holm, C. Sander, Proteins: Structure, Function, and
Genetics. Pp. 134--139, Supplement 1, 1997
(HTML
version)
Discussion of our CASP2 methods for using hidden Markov models to
predict protein structure.
- Weighting Hidden Markov Models for Maximum
Discrimination. R. Karchin and R. Hughey,
Bioinformatics, 14(9):772--782, 1998.
(HTML
version with mangled table headings) and postscript.
Adding internal weighting to SAM to create SAM Version 2.0. Includes
a comparison of SAM to HMMer, Meta-MEME, and Probabilistic Smith-Waterman
(from the Agarwal and States paper) based on 67 discrimination
tests from Pearson.
- C. Tarnas and R. Hughey
Reduced
space hidden Markov model training
Bioinformatics 14(5):401--406, 1998.
Also available in postscript
and pdf.
Discussion and analysis of the implementation of the checkpoint method
(see Grice, below) in SAM.
-
Transparencies
from our
CASP2 talk, at
which UCSC's hidden Markov model methods achieved some of
the very top overall scores among threading-based predictions of
protein structure.
-
Scoring Hidden Markov Models
C. Barrett, R. Hughey, and K. Karplus
CABIOS 13(2):191-199, 1997.
Available in
postscript and compressed
(.gz) postscript as well.
Experimental evaluation of several different scoring methods
using
both SAM and HMMer.
-
Tutorial:
Stochastic Modeling Techniques: Understanding and using hidden
Markov models.
L. Grate, R. Hughey, K. Karplus, K. Sjölander. University of
California, Santa Cruz, CA, June 1996. SAM and HMMER tutorial used at
ISMB in June 1996. (compressed
postscript (.ps.Z))
-
"A Flexible Motif Search Technique based on Generalized Profiles"
(compressed postscript) Philipp Bucher, Kevin Karplus, Nicolas
Moeri, and Kay Hofmann, Computers and Chemistry Jan 1996,
20(1) 3--24. (
postscript).
An evaluation of search techniques for linear hidden Markov
models and generalized profiles.
- J. Alicia Grice, Richard
Hughey, and Don Speck
Reduced
Space Sequence Alignment
CABIOS 13(1):45-53, 1997.
This checkpoint method, to be part of SAM 2.0, has many advantages over
the divide-and-conquer method.
-
SAM : Sequence alignment and modeling software system.
R. Hughey and A. Krogh, Technical Report UCSC-CRL-95-7,
University of California,
Santa Cruz, CA, January 1995. (Regularly updated.)
The SAM documentation.
-
Dirichlet Mixtures: A Method for Improving
Detection of Weak but Significant Protein Sequence Homology.
Sjolander, K.,
Karplus, K., Brown, M., Hughey, R., Krogh, A., Mian, I.S., and
Haussler, D.
The most up-to-date discussion of Dirichlet Mixtures. The
method is
an option in SAM.
- Using
Dirichlet mixture priors to derive hidden Markov models for protein
families.
M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, K. Sjolander, and
D. Haussler. In L. Hunter, D. Searls, and J. Shavlik, editors, Proc.
of First Int. Conf. on Intelligent Systems for Molecular Biology,
pages 47--55, Menlo Park, CA, July 1993. AAAI/MIT Press.
The original Dirichlet paper.
- Massively parallel biosequence analysis.
R. Hughey. Technical Report UCSC-CRL-93-14, University of
California, Santa Cruz, CA, April 1993.
(HTML version)
or
(POSTSCRIPT
version)
Parallel sequence analysis on specialized hardware, and the parallel
SAM code.
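The checkpoint method discussed in the Tarnas and Hughey and the Grice, Hughey, and Speck papers above can be illustrated on a simpler dynamic-programming problem. The sketch below applies the idea to unit-cost edit distance between two strings rather than to SAM's HMM training computations, and the default checkpoint spacing of about √n rows is an assumption chosen for illustration. The forward pass keeps only every k-th row; the traceback recomputes one block of rows at a time from the nearest checkpoint, so memory falls from O(nm) to roughly O((n/k + k)·m) at the cost of computing each row about one extra time.

```python
import math
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]  # (index in a or None, index in b or None)


def next_row(prev: List[int], a_char: str, b: str) -> List[int]:
    """Compute one row of the unit-cost edit-distance matrix from the row above it."""
    row = [prev[0] + 1]
    for j in range(1, len(b) + 1):
        row.append(min(prev[j] + 1,                          # a_char aligned to a gap
                       row[j - 1] + 1,                       # b[j-1] aligned to a gap
                       prev[j - 1] + (a_char != b[j - 1])))  # match or mismatch
    return row


def checkpoint_align(a: str, b: str, k: int = 0) -> Tuple[int, List[Pair]]:
    """Optimal alignment of a and b, storing only every k-th DP row (default k ~ sqrt(len(a)))."""
    n = len(a)
    k = k or max(1, math.isqrt(n))
    # Forward pass: keep rows 0, k, 2k, ... and discard the rest.
    row = list(range(len(b) + 1))
    checkpoints = {0: row}
    for i in range(1, n + 1):
        row = next_row(row, a[i - 1], b)
        if i % k == 0:
            checkpoints[i] = row
    cost = row[-1]

    # Backward pass: rebuild one block of rows from its checkpoint, trace back
    # through it, then discard it, so at most k + 1 recomputed rows exist at once.
    pairs: List[Pair] = []
    i, j = n, len(b)
    while i > 0:
        start = (i - 1) // k * k  # nearest stored checkpoint above row i
        block = [checkpoints[start]]
        for r in range(start + 1, i + 1):
            block.append(next_row(block[-1], a[r - 1], b))
        while i > start:
            cur, prev = block[i - start], block[i - start - 1]
            if j > 0 and cur[j] == prev[j - 1] + (a[i - 1] != b[j - 1]):
                pairs.append((i - 1, j - 1))  # a[i-1] aligned with b[j-1]
                i, j = i - 1, j - 1
            elif cur[j] == prev[j] + 1:
                pairs.append((i - 1, None))   # a[i-1] aligned to a gap
                i -= 1
            else:
                pairs.append((None, j - 1))   # b[j-1] aligned to a gap (same row)
                j -= 1
    while j > 0:  # leftover characters of b before a[0]
        pairs.append((None, j - 1))
        j -= 1
    return cost, pairs[::-1]


cost, pairs = checkpoint_align("kitten", "sitting")
print(cost, pairs)
```

The same trade of recomputation for memory is what makes long sequences and large models feasible when the full dynamic-programming matrix would not fit in memory.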
Other papers and pointers of interest (please email new pointers!)
-
"Profile Hidden Markov Models" Sean R. Eddy (1998) Bioinformatics
14(9), review of HMMs.
-
"Maximum Discrimination Hidden Markov Models of Sequence Consensus"
Sean R. Eddy, Graeme Mitchison, and Richard Durbin (1995). J.
Computational Biology 2:9-23. PostScript; 30 pages. Describes an
alternative to maximum likelihood parameter optimization for HMMs which
compensates for the biased sequence representation caused by
phylogenetic relationships.
-
"Multiple Alignment Using Hidden Markov Models" Sean R. Eddy
(1995). Proc. Third Int. Conf. Intelligent Systems for Molecular
Biology, C. Rawlings et al., eds. AAAI Press, Menlo Park. pp. 114-120.
PostScript; 7 pages. Describes a simulated annealing algorithm for HMM
training and a probabilistic suboptimal alignment algorithm. Compares
HMM-based multiple alignment to CLUSTALW.
- Parameterization
studies for the SAM and HMMER methods of hidden Markov
model generation Marcella A. McClure, Chris Smith, and Pete Elton.
Proc. Fourth Int. Conf. Intelligent Systems for Molecular Biology, D.
States et al., eds. AAAI Press, Menlo Park. pp. 155-164. A detailed
comparison of HMM training methods for constructing multiple
alignments.
-
"Fitting a mixture model by expectation maximization to discover motifs
in biopolymers", Timothy L. Bailey and Charles Elkan, Proceedings
of the Second International Conference on Intelligent Systems for
Molecular Biology, (28-36), AAAI Press, 1994, and an associated MEME server.
-
"Meta-MEME: Motif-based Hidden Markov Models of Protein Families".
Grundy, William N., Timothy L. Bailey, Charles P. Elkan and Michael E.
Baker. Computer Applications in the Biosciences, 13(4):397-406,
1997, and an associated Meta-MEME
server.
-
Searching for statistically significant regulatory modules. Timothy
L. Bailey and William Stafford Noble. Bioinformatics (Proceedings of
the European Conference on Computational Biology), 19(Suppl.
2):ii16-ii25, 200 and an associated MCAST
server.