The primary events in the development of biological diversity are the mutation, insertion, and deletion of nucleotides in DNA.
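The primary structure of a protein is determined by the gene (DNA) that encodes it. A change in the gene may produce a protein that functions better, one that functions worse, or one whose function is essentially unchanged.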
That is, evolution of protein structures, in parallel with evolution of the organisms, may occur through positive or negative selection, or by fixation of random neutral variations.
Patterns of conservation or variation at individual positions provide clues to selective constraints: constraints that maintain or improve function.
Consider myoglobin, the oxygen storage protein. Here are the sequences of two globins: myoglobin from a whale and leghemoglobin from a plant, lupin (the whale sequence is on top):
| Sequence Alignment: Whale Myoglobin and Lupin Leghemoglobin |
|---|
| [Figure: aligned sequences] |
Few residues are conserved (25% are identical) between these two proteins, whose source organisms diverged from a common ancestor some hundreds of millions of years ago. Despite this low sequence identity, the two backbone folds superimpose closely (RMSD = 1.39 Å).
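The conservation figure quoted above is a percent identity over an alignment. A minimal sketch of one common way to compute it follows, assuming the two sequences are already aligned and padded with `-` for gaps, and that gapped columns are excluded from the denominator (other conventions exist). The function name `percent_identity` and the toy strings are illustrative only; they are not the real whale or lupin sequences.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (assumption for illustration, not real globin data).
toy_a = "GLSDGEW-Q"
toy_b = "GALTEEQKA"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identical")
```

Excluding gapped columns keeps insertions and deletions from counting as mismatches; a stricter convention would divide by the full alignment length instead.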
| Sperm Whale Myoglobin | Lupin Leghemoglobin |
|---|---|
| [Figure: structure] | [Figure: structure] |
| Backbone Alignment of Whale (1mbd) and Lupin (2gdm) Globins, RMSD = 1.39 Å |
|---|
| [Figure: superimposed backbones] |
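The RMSD in the caption is the root-mean-square distance between paired backbone atoms after the two structures have been superimposed. The sketch below shows only the final distance calculation, assuming the atoms are already paired one to one and the superposition step (for example by the Kabsch algorithm) has already been done. The function name `rmsd` and the toy coordinates are assumptions for illustration, not data from 1mbd or 2gdm.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes the two coordinate sets are already superimposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in angstroms: the second set is the first
# shifted by 1 Å along z, so every paired distance is exactly 1 Å.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
print(f"RMSD = {toy_rmsd:.2f} Å")
```

Because the squared distances are averaged before the square root, a few badly matched atoms raise the value more than many small deviations do.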
Another example is insulin, which is synthesized initially as proinsulin. Proinsulin contains a sequence of residues that is excised to form the mature, functional hormone.
This "C chain" actually connects an "A chain" and a "B chain" that in the mature hormone are held together only by disulfide bonds (shown in yellow):
| Human Insulin (1guj) |
|---|
| [Figure: insulin structure, disulfide bonds in yellow] |
| (The protein crystallizes as a dimer.) |
If we compare insulin from several species, we might expect variation to be greatest in the C chain, which is not part of the functional hormone, and conservation to be greatest in the A and B chains, which are. Here is a table comparing the B and C chains of five species:
| B Chain Sequences | |||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | F | V | N | Q | H | L | C | G | S | H | L | V | E | A | L | Y | L | V | C | G | E | R | G | F | F | Y | T | P | K | T | |||||
| Pig | F | V | N | Q | H | L | C | G | S | H | L | V | E | A | L | Y | L | V | C | G | E | R | G | F | F | Y | T | P | K | A | |||||
| Cow | F | V | N | Q | H | L | C | G | S | H | L | V | E | A | L | Y | L | V | C | G | E | R | G | F | F | Y | T | P | K | A | |||||
| Guinea Pig | F | V | S | R | H | L | C | G | S | N | L | V | E | T | L | Y | S | V | C | Q | D | D | G | F | F | Y | I | P | K | D | |||||
| Rat | F | V | K | Q | H | L | C | G | P | H | L | V | E | A | L | Y | L | V | C | G | E | R | G | F | F | Y | T | P | K | S | |||||
| C Chain Sequences | |||||||||||||||||||||||||||||||||||
| Human | R | R | E | A | E | D | L | Q | V | G | Q | V | E | L | G | G | G | P | G | A | G | S | L | Q | P | L | A | L | E | G | S | L | Q | K | R |
| Pig | R | R | E | A | E | N | P | Q | A | G | A | V | E | L | G | G | G | - | - | L | G | G | L | Q | A | L | A | L | E | G | P | P | Q | K | R |
| Cow | R | R | E | V | E | G | P | Q | V | G | A | L | E | L | A | G | G | P | G | A | G | G | L | - | - | - | - | - | E | G | P | P | Q | K | R |
| Guinea Pig | R | R | E | L | E | D | P | Q | V | E | Q | T | E | L | G | M | G | L | G | A | G | G | L | Q | P | L | A | L | E | M | A | L | Q | K | R |
| Rat | R | R | E | V | E | D | P | Q | V | P | Q | L | E | L | G | G | G | P | E | A | G | D | L | Q | T | L | A | L | E | V | A | R | Q | K | R |
Note the conserved pairs of basic residues at each end of the C chain (RR, Arg-Arg, and KR, Lys-Arg), which mark the sites where the chain is excised.
Variability in the rest of the C chain is clearly greater than in the B chain.
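The comparison can be checked directly from the table. The sketch below transcribes the B and C chain rows as strings and counts the columns in which all five species share the same residue, treating a gap (`-`) as a difference. The helper name `invariant_columns` and the choice to count gapped columns as variable are assumptions made for this illustration.

```python
# Rows transcribed from the table above (order: human, pig, cow, guinea pig, rat).
b_chains = [
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKT",
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKA",
    "FVNQHLCGSHLVEALYLVCGERGFFYTPKA",
    "FVSRHLCGSNLVETLYSVCQDDGFFYIPKD",
    "FVKQHLCGPHLVEALYLVCGERGFFYTPKS",
]
c_chains = [
    "RREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKR",
    "RREAENPQAGAVELGGG--LGGLQALALEGPPQKR",
    "RREVEGPQVGALELAGGPGAGGL-----EGPPQKR",
    "RRELEDPQVEQTELGMGLGAGGLQPLALEMALQKR",
    "RREVEDPQVPQLELGGGPEAGDLQTLALEVARQKR",
]


def invariant_columns(seqs):
    """Return 0-based indices of columns where every sequence has the same residue.

    Assumption: a column containing a gap is counted as variable.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [
        i for i, col in enumerate(zip(*seqs))
        if len(set(col)) == 1 and col[0] != "-"
    ]


b_invariant = invariant_columns(b_chains)
c_invariant = invariant_columns(c_chains)
b_fraction = len(b_invariant) / len(b_chains[0])
c_fraction = len(c_invariant) / len(c_chains[0])
print(f"B chain: {len(b_invariant)} of {len(b_chains[0])} columns invariant")
print(f"C chain: {len(c_invariant)} of {len(c_chains[0])} columns invariant")

# The dibasic pairs at both ends of the C chain are shared by all five species.
dibasic_conserved = all(s.startswith("RR") and s.endswith("KR") for s in c_chains)
print("Dibasic ends conserved:", dibasic_conserved)
```

With these five species, 19 of 30 B chain columns are invariant against 14 of 35 in the C chain, and five of the C chain's invariant columns are the dibasic ends and the residue next to the first pair.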