Whether we call it predicting or folding, the problem is the same one. Given the primary structure of a protein, how do we find, theoretically (that is, without X-rays), what the tertiary structure is?
Why do we want to predict protein structure?
In a recent review of the state of the art, Baker [Science, 2005, 310, 638] makes the point that prediction and design are complementary processes that employ essentially the same methodology.

Predicting Secondary Structure is relatively easy. For example:
For greater accuracy we need:
Results tend to be right 70-75% of the time.
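To make the accuracy figure concrete, here is a minimal sketch that scores a prediction residue by residue. It assumes the 70-75% figure refers to per-residue, three-state accuracy (helix H, strand E, coil C), a measure often called Q3. The function name `q3_accuracy` and the two example strings are made up for illustration and are not output from any real server.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state matches the observed one.

    Both strings use one letter per residue: H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty prediction")
    # Count positions where the two state strings agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical strings, for illustration only.
predicted = "CCHHHHHHCCEEEECC"
observed = "CCHHHHHCCCEEECCC"
print(f"Per-residue accuracy: {q3_accuracy(predicted, observed):.1%}")
```

A score of this kind says nothing about where the errors fall; mistakes tend to cluster at the ends of helices and strands, so two predictions with the same score can differ in usefulness.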
For an example of this and other prediction methodologies, I have used the sequence of the amyloid precursor protein (APP), one of the proteolysis products of which is the Alzheimer's amyloid protein:
MLPGLALLLLAAWTARALEVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKT
CIDTKEGILQYCQEVYPELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSD
ALLVPDKCKFLHQERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCC
PLAEESDNVDSADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDE
DDEDGDEVEEEAEEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYF
DVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCGSAMSQSLLKTTQEPLARDPVKLPTTAAST
PDAVDKYLETPGDENEHAHFQKAKERLEAKHRERMSQVMREWEEAERQAKNLPKADKKAVI
QHFQEKVESLEQEAANERQQLVETHMARVEAMLNDRRRLALENYITALQAVPPRPRHV
FNMLKKYVRAEQKDRQHTLKHFEHVRMVDPKKAAQIRSQVMTHLRVIYERMNQSLSLLYNVPA
VAEEIQDEVDELLQKEQNYSDDVLANMISEPRISYGNDALMPSLTETKTTVELLPVNGEFSLDDLQ
PWHSFGADSVPANTENEVEPVDARPAADRGLTTRPGSGLTNIKTEEISEVKMDAEFRHDSGYEV
HHQKLVFFAEDVGSNKGAIIGLMVGGVVIATVIVITLVMLKKKQYTSIHHGVVEVDAAVTPEERHLS
KMQQNGYENPTYKFFEQMQN
Submitted to the PsiPred server (link on the main page), this brought the result by email:
Going after tertiary structure is MUCH more difficult.
The difficulty is related to the overall protein folding problem.
Levinthal's Paradox [J. Chim. Phys., 1968, 65, 44] suggests that given the size of the conformation space available to a protein of any significant size, finding a particular conformation starting from any other conformation could take longer than the age of the Universe.
Consider a polypeptide with 100 amino acid residues, and two equally possible conformations per residue
Hence, simply searching the conformation space of a protein for the "best" tertiary structure is futile, even with a supercomputer.
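The arithmetic behind the 100-residue example can be sketched as follows. The number of conformations comes straight from the example (two per residue, 100 residues). The sampling rates and the approximate age of the Universe are assumptions added here for illustration, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate value, an assumption of this sketch


def conformation_count(residues: int, states_per_residue: int) -> int:
    """Number of distinct conformations if every residue is independent."""
    return states_per_residue ** residues


def exhaustive_search_years(residues: int, states_per_residue: int,
                            rate_per_second: float) -> float:
    """Years needed to visit every conformation once at the given sampling rate."""
    total = conformation_count(residues, states_per_residue)
    return total / rate_per_second / SECONDS_PER_YEAR


total = conformation_count(100, 2)
print(f"Conformations for 100 residues, 2 states each: {total:.3e}")

# Assumed sampling rates (conformations per second); not taken from the text.
for rate in (1e12, 1e13):
    years = exhaustive_search_years(100, 2, rate)
    ratio = years / AGE_OF_UNIVERSE_YEARS
    print(f"rate {rate:.0e}/s: {years:.2e} years ({ratio:.2f} x age of Universe)")
```

With only two states per residue the estimate is sensitive to the assumed sampling rate. Allowing three or more states per residue, or a longer chain, raises the count by many orders of magnitude, which is why exhaustive search is ruled out in practice.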
Several methodologies have been developed for dealing with this problem.
We will take a brief look at:
Comparative Modeling consists of four steps:
Two general methodologies are used for finding the template:
Software for implementing homology or comparative modeling includes
Reading the raw APP sequence into Deep View and submitting it to SwissMod brought the result:
Satyavan Singh in my group at Maine is applying homology modeling to the structure of a quinone reductase from brown rot fungus. The sequence was determined at the USDA Forest Products Lab in Madison by Ken Hammel and his group:
MCFPSKRRKDGSPEEGGRIKRSRSAQEPAESTNTPAPPTSTGTKPTTTQTTDTTMSSPRLAIVIYTMYGH
VAKLAEAIKSGIEGAGGNASIFQVAETLSPEILNLVKAPPKPDYPVMDPLDLKNYDGFLFGIPTRYGNFP
VQWKAFWDSTGPLWASTALCGKYAGLFVSTGSPGGGQESTLMAAMSTLVHHGVIYVPLGYKYTFAQLANL
TEVRGGSPWGAGTFANSDGSRQPTPLELEIANLQGKSFYEYVARVKW
A first pass through the SwissMod site found three template proteins, each with approximately 40% sequence identity to our sequence. When modeled against those templates, our sequence produced the structure:
[Figure: Quinone Reductase from Brown Rot Fungus, G. trabeum]
The enzyme is known to require FAD as a cofactor; our next step is docking FAD into this structure.
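The "approximately 40% sequence identity" quoted for the templates is a number computed from a pairwise alignment. A minimal sketch of one common way to compute it is given below. It assumes the two sequences have already been aligned (gaps written as `-`) and counts identical residues over the columns where neither sequence has a gap; other conventions for the denominator exist, and the one a given server uses is not stated here. The function name and the toy alignment are made up and are not the quinone reductase or its templates.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy alignment, for illustration only.
query = "MKT-AYIAKQR"
template = "MRTLAYLA-QK"
print(f"Identity: {percent_identity(query, template):.1f}%")
```

Because the result depends on the alignment and on the choice of denominator, identity values reported by different tools for the same pair of proteins need not agree exactly.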