One might describe the goal of functional genomics as defining the cellular role of every protein encoded by the DNA of a genome, including:
The middle three items represent the base categories of the Gene Ontology (GO): molecular function, biological process, and cellular component.
Why do we want to automate prediction of protein structure and function?
In a recent review of the state of the art, David Baker [Science, 2005, 310, 638] makes the point that structure prediction and design are complementary processes, and employ essentially the same methodology.

Predicting Secondary Structure is relatively easy. For example:
For greater accuracy we need:
Results tend to be right 70-75% of the time.
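The "right 70-75% of the time" figure is a per-residue accuracy, usually reported as Q3: the fraction of residues whose predicted state (helix H, strand E, or coil C) matches the experimentally assigned state. A minimal sketch of that measure follows; the two state strings are made up for illustration and are not output from any prediction server.

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical 20-residue example (H = helix, E = strand, C = coil)
predicted = "CCHHHHHHHHCCEEEEECCC"
observed = "CCCHHHHHHCCCCEEEEECC"

score = q3(predicted, observed)
print(f"Q3 = {score:.0%}")
```

Most of the mismatches in an example like this fall at the ends of helices and strands, which is typically where secondary structure predictions are least reliable.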
As an example for this and other prediction methods, I have used the sequence of the amyloid precursor protein (APP); one of its proteolysis products is the Alzheimer's amyloid protein:
MLPGLALLLLAAWTARALEVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKT
CIDTKEGILQYCQEVYPELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSD
ALLVPDKCKFLHQERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCC
PLAEESDNVDSADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDE
DDEDGDEVEEEAEEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYF
DVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCGSAMSQSLLKTTQEPLARDPVKLPTTAAST
PDAVDKYLETPGDENEHAHFQKAKERLEAKHRERMSQVMREWEEAERQAKNLPKADKKAVI
QHFQEKVESLEQEAANERQQLVETHMARVEAMLNDRRRLALENYITALQAVPPRPRHV
FNMLKKYVRAEQKDRQHTLKHFEHVRMVDPKKAAQIRSQVMTHLRVIYERMNQSLSLLYNVPA
VAEEIQDEVDELLQKEQNYSDDVLANMISEPRISYGNDALMPSLTETKTTVELLPVNGEFSLDDLQ
PWHSFGADSVPANTENEVEPVDARPAADRGLTTRPGSGLTNIKTEEISEVKMDAEFRHDSGYEV
HHQKLVFFAEDVGSNKGAIIGLMVGGVVIATVIVITLVMLKKKQYTSIHHGVVEVDAAVTPEERHLS
KMQQNGYENPTYKFFEQMQN
Submitted to the PSIPRED server (link on the main page), this sequence returned the following result by email:
Going after tertiary structure is MUCH more difficult.
The difficulty is related to the overall protein folding problem.
Levinthal's Paradox [J. Chim. Phys., 1968, 65, 44] observes that, given the size of the conformation space available to a protein of any significant size, finding one particular conformation by random search from an arbitrary starting conformation could take longer than the age of the Universe.
Consider a polypeptide with 100 amino acid residues and two equally possible conformations per residue: it has 2^100, or about 1.3 x 10^30, possible conformations.
Hence, simply searching the conformation space of a protein for the "best" tertiary structure is futile, even with a supercomputer.
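The arithmetic behind this argument can be checked in a few lines. This is only a sketch: the sampling rate of 10^12 conformations per second and the age of the Universe (about 1.38 x 10^10 years) are assumptions chosen for illustration, not figures from Levinthal's paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate value, assumed here


def exhaustive_search_years(n_residues, states_per_residue, rate_per_second):
    """Return (number of conformations, years needed to visit each one once)."""
    n_conformations = states_per_residue ** n_residues
    years = n_conformations / rate_per_second / SECONDS_PER_YEAR
    return n_conformations, years


# 100 residues, 2 conformations each, assumed 1e12 conformations sampled per second
n_conf, years = exhaustive_search_years(100, 2, 1e12)

print(f"conformations:     {n_conf:.3e}")
print(f"search time:       {years:.3e} years")
print(f"ages of Universe:  {years / AGE_OF_UNIVERSE_YEARS:.1f}")
```

Allowing three conformations per residue instead of two raises the count to 3^100, roughly 5 x 10^47, so the conclusion gets rapidly worse under more realistic assumptions.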
Several methodologies have been developed for dealing with this problem.
We will take a brief look at:
Comparative Modeling consists of four steps:
Two general methodologies are used for finding the template:
Software for implementing homology or comparative modeling includes
My group at Maine is working with Profs David Neivandt and John Vetelino to develop a detection method for the toxin produced by "red tide" algae, saxitoxin:
| Saxitoxin |
|---|
| A Gramicidin Channel, side view | Looking Through the Channel |
|---|---|
A sequence, but no structure, is available for saxiphilin. We carried out a BLAST search and found that it has 34% sequence identity to an ovotransferrin (PDB entry 1AIV), with the identities well distributed throughout the sequence:
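Both quantities mentioned here, the overall percent identity and how evenly the identities are spread, can be computed directly from a pairwise alignment. The sketch below uses two short made-up aligned strings, not the saxiphilin and ovotransferrin sequences. It counts identities over columns where neither sequence has a gap; this is one convention among several, and BLAST itself reports identities over the full alignment length.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def windowed_identity(aln_a: str, aln_b: str, window: int = 10) -> list:
    """Percent identity in consecutive, non-overlapping alignment windows."""
    return [
        percent_identity(aln_a[i:i + window], aln_b[i:i + window])
        for i in range(0, len(aln_a), window)
    ]


# Hypothetical aligned fragments ('-' marks a gap)
query = "MKLAVT-GLLCAAWQ"
subject = "MRLSVTAGLICS-WE"

overall = percent_identity(query, subject)
windows = windowed_identity(query, subject, window=5)
print(f"overall identity: {overall:.1f}%")
print("per-window identity:", windows)
```

Roughly similar values across the windows indicate that the identities are well distributed, which makes a template more useful for modeling the whole chain than one that matches only a single domain.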

Using the ovotransferrin as a template, we built a homology model at the SWISS-MODEL web site. The initial model was then subjected to 100 cycles of energy minimization using the Amber force field within UCSF Chimera:
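To give a feel for what a "cycle" of energy minimization does, here is a toy steepest-descent minimizer. It is not the Amber force field and not Chimera's implementation: the energy is a single harmonic bond term, and the force constant, equilibrium length, step size, and starting length are all hypothetical values chosen so the example converges.

```python
def bond_energy(r, k=300.0, r0=1.5):
    """Harmonic bond energy E = 0.5 * k * (r - r0)^2 (toy, one-dimensional)."""
    return 0.5 * k * (r - r0) ** 2


def bond_gradient(r, k=300.0, r0=1.5):
    """Derivative dE/dr of the harmonic bond energy."""
    return k * (r - r0)


def steepest_descent(r, cycles=100, step=0.001):
    """Move downhill along the negative gradient for a fixed number of cycles."""
    energies = [bond_energy(r)]
    for _ in range(cycles):
        r -= step * bond_gradient(r)  # one minimization cycle
        energies.append(bond_energy(r))
    return r, energies


# Start from a stretched bond (hypothetical starting length of 2.0)
r_final, energies = steepest_descent(2.0, cycles=100)
print(f"initial energy: {energies[0]:.3f}")
print(f"final energy:   {energies[-1]:.3e}")
print(f"final length:   {r_final:.6f}")
```

In a real model the same idea is applied to thousands of coordinates at once, with bond, angle, torsion, and non-bonded terms in the energy, so a short run mainly relieves bad contacts and strained geometry rather than finding a global minimum.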
| The Refined Model |
|---|
The next step is to dock saxitoxin to saxiphilin and determine the mode of binding. Then we will attempt to prepare a simple compound containing the right functionalities in the right geometry.