Predicting Protein Structure

One might suggest that the goal of functional genomics is to define cellular roles for all proteins encoded by the DNA in a genome, including:

The middle three items represent the base categories of the Gene Ontology (GO).

Why do we want to automate prediction of protein structure and function?

In a recent review of the state of the art, David Baker [Science, 2005, 310, 638] makes the point that structure prediction and design are complementary processes that employ essentially the same methodology.

 

Predicting Secondary Structure is relatively easy. For example:

For greater accuracy we need:

Results tend to be right 70-75% of the time.

As an example for this and other prediction methodologies, I have used the sequence of the amyloid precursor protein (APP); one of its proteolysis products is the Alzheimer's amyloid protein:

MLPGLALLLLAAWTARALEVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKT
CIDTKEGILQYCQEVYPELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSD
ALLVPDKCKFLHQERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCC
PLAEESDNVDSADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDE
DDEDGDEVEEEAEEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYF
DVTEGKCAPFFYGGCGGNRNNFDTEEYCMAVCGSAMSQSLLKTTQEPLARDPVKLPTTAAST
PDAVDKYLETPGDENEHAHFQKAKERLEAKHRERMSQVMREWEEAERQAKNLPKADKKAVI
QHFQEKVESLEQEAANERQQLVETHMARVEAMLNDRRRLALENYITALQAVPPRPRHV
FNMLKKYVRAEQKDRQHTLKHFEHVRMVDPKKAAQIRSQVMTHLRVIYERMNQSLSLLYNVPA
VAEEIQDEVDELLQKEQNYSDDVLANMISEPRISYGNDALMPSLTETKTTVELLPVNGEFSLDDLQ
PWHSFGADSVPANTENEVEPVDARPAADRGLTTRPGSGLTNIKTEEISEVKMDAEFRHDSGYEV
HHQKLVFFAEDVGSNKGAIIGLMVGGVVIATVIVITLVMLKKKQYTSIHHGVVEVDAAVTPEERHLS
KMQQNGYENPTYKFFEQMQN
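The sequence above is wrapped across lines of uneven length, and prediction servers generally expect either a plain one-letter sequence or FASTA text. The following is a minimal sketch of how the lines might be joined, checked, and rewrapped before submission. The function name `to_fasta`, the header `APP_fragment`, and the 60-column width are illustrative assumptions, and only the first two lines of the sequence are used to keep the example short.

```python
import textwrap

# The 20 standard one-letter amino acid codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def to_fasta(header, raw_sequence, width=60):
    """Join a wrapped sequence, validate it, and return FASTA-formatted text."""
    # Remove line breaks and any other whitespace, then normalize case.
    sequence = "".join(raw_sequence.split()).upper()
    unknown = set(sequence) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    # textwrap.wrap breaks the single long "word" every `width` characters.
    lines = textwrap.wrap(sequence, width)
    return f">{header}\n" + "\n".join(lines) + "\n"


# First two lines of the APP sequence, copied from the text above.
app_fragment = """
MLPGLALLLLAAWTARALEVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKT
CIDTKEGILQYCQEVYPELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSD
"""

fasta = to_fasta("APP_fragment", app_fragment)
print(fasta)
```

Validating the residue codes first catches stray characters introduced by copying from a web page before they reach the server.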

Submitted to the PsiPred server (link on the main page), this sequence returned the following result by email:

PsiPred Prediction

Going after tertiary structure is MUCH more difficult.

The difficulty is related to the overall protein folding problem.

Levinthal's Paradox [J. Chim. Phys., 1968, 65, 44] suggests that, given the size of the conformation space available to a protein of any significant size, finding a particular conformation starting from any other conformation could take longer than the age of the Universe.

Consider a polypeptide with 100 amino acid residues and two equally possible conformations per residue.
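The arithmetic behind this estimate can be sketched in a few lines. The chain length and the two states per residue come from the text; the sampling rate of 10^13 conformations per second, the three-state comparison, and the rounded constants for the length of a year and the age of the Universe are assumptions added here for illustration.

```python
SECONDS_PER_YEAR = 3.156e7        # approximate length of one year
AGE_OF_UNIVERSE_YEARS = 1.38e10   # approximate, for comparison only
ASSUMED_RATE = 1e13               # conformations sampled per second (assumption)


def exhaustive_search_years(n_residues, states_per_residue, rate_per_second):
    """Return (number of conformations, years needed to visit each one once)."""
    n_conformations = states_per_residue ** n_residues
    seconds = n_conformations / rate_per_second
    return n_conformations, seconds / SECONDS_PER_YEAR


# Two states per residue as in the text; three states added for comparison.
results = {s: exhaustive_search_years(100, s, ASSUMED_RATE) for s in (2, 3)}

for states, (n_conf, years) in results.items():
    ratio = years / AGE_OF_UNIVERSE_YEARS
    print(f"{states} states/residue: {n_conf:.2e} conformations, "
          f"{years:.2e} years ({ratio:.2e} x age of Universe)")
```

With these assumed values, the two-state case already comes out at a few billion years, the same order of magnitude as the age of the Universe. Allowing three states per residue pushes the search time far beyond it. The result is very sensitive to both the assumed rate and the number of states.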

Hence, simply searching the conformation space of a protein for the "best" tertiary structure is futile, even with a supercomputer.

Several methodologies have been developed for dealing with this problem.

We will take a brief look at:

Comparative Modeling consists of four steps:

Two general methodologies are used for finding the template:

  • Sequence comparison plus distance geometry data from NMR

    • Experimental data allow use of less than optimum alignments

    • However, they require that the protein have been isolated and purified

    Software for implementing homology or comparative modeling includes:

    • Modeller, from Andrej Sali's lab (link on main page), available for free download

    • SWISS-MODEL, which can be accessed from within the Swiss PDB Viewer

    My group at Maine is working with Profs David Neivandt and John Vetelino to develop a detection method for the toxin produced by "red tide" algae, saxitoxin:

    Saxitoxin

    • Saxitoxin accumulates in shellfish that filter the water containing it; when humans ingest the shellfish, the toxin blocks sodium channels, causing paralysis.

    • The infamous toxin from the Japanese pufferfish, tetrodotoxin, also contains a guanidinium group and acts similarly.

    • One approach is to use gramicidin, a 15-amino acid peptide secreted by several bacteria, which forms sodium channels when it becomes embedded in a cell membrane.

      A Gramicidin Channel: Side View, and Looking Through the Channel

    • The other approach uses saxiphilin, a protein found in various amphibians that binds saxitoxin extremely tightly (making the amphibians immune to the toxin).

    A sequence, but no structure, is available for saxiphilin. We carried out a BLAST search and found that it has 34% sequence identity to an ovotransferrin (PDB 1AIV), with the identities well distributed throughout the sequence:
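A percent-identity figure such as the 34% quoted above is computed from a pairwise alignment. The sketch below shows one common convention, identical columns divided by total alignment length, together with a windowed version that gives a rough sense of whether identities are spread along the sequence or clustered. The function names and the two short aligned strings are hypothetical and are not the saxiphilin or ovotransferrin sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A column counts as identical only if it holds the same residue in both
    # sequences; columns containing a gap never count.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


def identity_by_window(aligned_a, aligned_b, window=4):
    """Percent identity for consecutive, non-overlapping windows of the alignment."""
    return [
        percent_identity(aligned_a[i:i + window], aligned_b[i:i + window])
        for i in range(0, len(aligned_a), window)
    ]


# Hypothetical toy alignment, for illustration only.
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAK-R"

overall = percent_identity(seq_a, seq_b)
per_window = identity_by_window(seq_a, seq_b)
print(f"overall identity: {overall:.1f}%")
print("per-window identity:", [f"{p:.0f}%" for p in per_window])
```

Roughly similar values across windows indicate well-distributed identities. A few high windows surrounded by very low ones would suggest that only part of the template is a reliable guide.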

    Using the ovotransferrin as a template, we built a homology model at the SWISS-MODEL web site; the initial model was then subjected to 100 cycles of energy minimization using the Amber force field within UCSF Chimera:

    The Refined Model
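To give a feel for what "cycles of energy minimization" means, here is a toy steepest-descent loop. It is not the Amber force field and not Chimera's minimizer. It relaxes a single harmonic bond-stretch term, and the force constant, equilibrium length, step size, and starting length are all hypothetical values chosen so that the example converges.

```python
def bond_energy(r, k=300.0, r0=1.5):
    """Harmonic bond-stretch energy for bond length r (arbitrary units)."""
    return 0.5 * k * (r - r0) ** 2


def bond_gradient(r, k=300.0, r0=1.5):
    """Derivative of the harmonic energy with respect to r."""
    return k * (r - r0)


def steepest_descent(r, cycles=100, step=0.001, k=300.0, r0=1.5):
    """Move r downhill along the negative gradient for a fixed number of cycles."""
    for _ in range(cycles):
        r -= step * bond_gradient(r, k, r0)
    return r


r_start = 2.0                              # deliberately stretched bond
r_final = steepest_descent(r_start, cycles=100)

print(f"start: r = {r_start:.3f}, E = {bond_energy(r_start):.4f}")
print(f"final: r = {r_final:.3f}, E = {bond_energy(r_final):.4f}")
```

A real force field sums many such terms (bonds, angles, torsions, nonbonded interactions) over all atoms. A fixed, small number of cycles removes bad contacts in the initial model rather than searching for a global minimum.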

    The next step is to dock saxitoxin to saxiphilin and determine the mode of binding. Then we will attempt to prepare a simple compound containing the right functionalities in the right geometry.


    This page last modified 1:43 PM on Tuesday September 29th, 2009.
    Webmaster, Department of Chemistry, University of Maine, Orono, ME 04469