Primary Structure of Proteins
Proteins are built by forming amide, or peptide, bonds between a series of amino acids.
- Amino acids differ only in the nature of the side chain attached to the α-carbon; about 20 different ones occur.
- Therefore, all proteins share the same backbone structure, a repeating chain of amide nitrogen, α-carbon, and carbonyl carbon:
- One end has an amino group that is not part of an amide link; this is called the N-terminal end.
- The other end has a free carboxyl, called the C-terminal end.
- By custom, proteins are written with the N-terminal amino acid at the left.
The sequence in which the amino acids are connected is called the "primary structure". The sequence can be determined by
- Chemical degradative steps performed by automated machines (Edman degradation)
- NMR (for oligomers of up to about 20 amino acids)
- Mass spectrometry
Frederick Sanger, an English biochemist, was the first person to completely sequence a protein, insulin, in 1953.
- The Nobel Committee was unusually rapid in awarding him the Prize in Chemistry in 1958 for this achievement.
- Sanger won a second Nobel in 1980 for developing methodology for sequencing nucleic acids.
- [Three other people have won two Nobels: a Polish woman, and two American men. Can you name them?]
Sequencing nucleic acids has become the easiest way to determine the primary structure of a protein. The steps:
- Identify the gene coding for the protein
- Create cDNA from the mRNA transcribed from the gene
- Sequence the cDNA and translate
Note, however:
- cDNA does not encode post-translational processing; for example, many enzymes are synthesized as proenzymes
- cDNA does not locate disulfide bonds, which are crucial to structure and function
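The "sequence the cDNA and translate" step can be illustrated with a short Python sketch. It assumes the standard genetic code, a coding strand that is already in frame, and a made-up 18-base example sequence; the names CODON_TABLE and translate are introduced here for illustration only. In keeping with the cautions above, the output is only the raw amino acid sequence and says nothing about later processing or disulfide bonds.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate an in-frame coding sequence into one-letter amino acid codes.

    Reading stops at the first stop codon. The result is written with the
    N-terminal residue at the left, following the usual convention.
    """
    seq = cdna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence: ATG GGC ATT GTG GAA TAA
print(translate("ATGGGCATTGTGGAATAA"))  # MGIVE
```

A real workflow would also have to locate the correct reading frame and start codon, which this sketch takes as given.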
Only about 20 amino acids are found in naturally occurring proteins. Nonetheless, an enormous variety of primary structures is possible. Consider a tripeptide (three amino acids) as a simple example.
- The probability of finding any particular one of the amino acids at the N-terminal end is 1/20.
- The probability is the same for each of the other two positions.
- So the probability of finding a tripeptide with a particular amino acid sequence is 1/20 x 1/20 x 1/20 = 1/8000. Thus, 8000 different tripeptides can be built from the basic set of amino acids.
- For longer polypeptides of N residues, the number of possibilities is 20^N.
- For a typical length, say 100 amino acids, 20^100 is roughly 10^130.
- This is more than the estimated number of particles in the universe! So, 20 amino acids give us all the variety we need.
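The counting argument above can be checked with a few lines of Python. This is a minimal sketch: the function name count_sequences is chosen here for illustration, and the figure of about 10^80 particles in the observable universe is a commonly quoted rough estimate, used as an assumption.

```python
import math

NUM_AMINO_ACIDS = 20


def count_sequences(length, alphabet_size=NUM_AMINO_ACIDS):
    """Number of distinct sequences of the given length (alphabet_size ** length)."""
    return alphabet_size ** length


# Tripeptides: 20 x 20 x 20
print(count_sequences(3))  # 8000

# For 100 residues, work with the base-10 exponent to keep the output readable.
exponent = 100 * math.log10(NUM_AMINO_ACIDS)
print(f"20^100 is about 10^{exponent:.1f}")  # 20^100 is about 10^130.1

# Assumption: roughly 10^80 particles in the observable universe.
print(count_sequences(100) > 10 ** 80)  # True
```

Python integers have arbitrary precision, so 20^100 is computed exactly here rather than approximated.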
Since all proteins have the same backbone, the overall three-dimensional structure of proteins must be determined by the nature of the side chains attached to that backbone.
So, in principle, if one knows the sequence, one should be able to predict the entire structure: secondary, tertiary, and perhaps even quaternary. Good luck!
This page last modified 11:40 AM on Sunday October 10th, 2004.
Webmaster, Department of Chemistry, University of Maine, Orono, ME 04469