Lecture 3

The Polymerase Chain Reaction

Since you are sophisticated students, I won't be able to surprise you with the news that there is a technique called the Polymerase Chain Reaction, and that it has revolutionized the way that molecular biology is conducted.

I'm always amused by the science instructional materials for young children that profess that they teach the nature of science, but for some reason they never get around to talking about Kary Mullis, LSD, California's Highway 128, and mile-marker 46.58.

Who is Kary Mullis?


While driving in his Honda Civic, Mullis had the flash of insight that two short DNA segments, oligonucleotide primers, could direct the amplification of a segment of DNA. At high temperature, the DNA segment would be denatured, and it could be annealed to short primers at a lower temperature. A DNA polymerase could be added, along with nucleotide substrates (dATP, dTTP, dCTP, and dGTP), to cause synthesis from the 3' end of each primer, using the annealed DNA as template. Where does the LSD come in shaping his imagination? I don't want to spoil the story - you'll just have to read about it.

Back in the early days of PCR work, the polymerase would need to be added fresh to the reaction at each temperature cycle, because thermostable enzymes were not commercially available. Of course, there were also no thermocyclers then, and so moving the tubes from one temperature bath to another every 30 seconds to a minute for several hours was a mind numbing job that fell to graduate students.

Polymerase Chain Reaction - the early years.

Nowadays, we have instruments for changing temperature that are much more sophisticated, but it is fundamentally no different than having a few pots of hot water at different temperatures. When our PCR machines start leaking their "coolant" all over the bench, the two become one and the same! But I digress.

In the polymerase chain reaction, a DNA template is repetitively:

  • denatured into single stranded molecules,
  • annealed to specific oligonucleotide primers (one specific primer per strand),
  • copied with DNA polymerase to extend the primers to the end of the DNA strand.

The reaction-in brief

And for all you visual learners, take a look at this step-by-step animation

Step 1: Denaturation

Why must the DNA be denatured into single strands? Because without separation of strands, you would not be able to anneal (i.e. hybridize) specific primers in the next step.

Step 2: Annealing

Primers in excess
The annealing reaction is very efficient because the primers are "in excess" in the reaction. In a typical PCR reaction, 10,000 molecules of a template may be used, which is 1.6 x 10^-20 moles (0.016 attomoles). On the other hand, 5 picomoles of each primer may be used (5 x 10^-12 moles) -- that is a 3 x 10^8 fold excess.
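
If you would like to check that arithmetic yourself, here is a minimal sketch. The only number not taken from the paragraph above is Avogadro's number, and the function name is just something I made up for the illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molecules_to_moles(n_molecules):
    """Convert a count of molecules into moles."""
    return n_molecules / AVOGADRO

template_moles = molecules_to_moles(10_000)  # 10,000 template molecules
primer_moles = 5e-12                         # 5 picomoles of each primer
excess = primer_moles / template_moles       # primer-to-template molar ratio

print(f"template: {template_moles:.2e} mol")
print(f"primer excess: {excess:.1e}-fold")
```

The same two lines of arithmetic work for any template amount, so you can see how quickly the excess shrinks if you load much more template.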

Temperature controls annealing rate

The rate of annealing is controlled by adjusting the temperature of the solution. At 55 C under most PCR salt conditions, typical primers of 18 nt. in length efficiently hydrogen bond to a DNA template. Adjustments in the protocol are made to account for the G/C vs. A/T richness of the primer and the overall length. There are many programs or Web sites at which one may calculate the Tm (melting temperatures) based on sequence and salt condition.

Step 3: Extension

You can see, from a comparison of the figures for step 1 and step 3, that we now have two double-stranded DNA copies of the sequences between the specific primers. By denaturing these two copies and repeating the annealing and synthesis steps, we can obtain four copies.


Now if we repeat the process again, we can obtain eight copies.

Note that the sequence between the two primers is being copied or "amplified" exponentially, whereas the original template is not.

So, here's the thing. When you perform a PCR reaction, you have two oligonucleotides that anneal to two different strands of the DNA, and have 3' ends that point towards each other.

Here's the image to keep in mind. If you start with "red colored" oligonucleotide primers, and had "blue" substrates (dNTPs), then your final product would look like this:

You managed to amplify everything between (and including) the oligonucleotide sequences. The 5' ends of the two oligos are exactly the 5' ends of the product. Let me reiterate: exactly.

Someone in the class always worries about the following problem, so if it happens that you are the worried one this year, then you'll want to read this:

During synthesis from the original template, the polymerase goes past the position where the other primer will bind. Like this:

You are about to ask me what happens to these "loose ends" that are longer than the intended product? Well, for one thing, you're really worrying about a very small problem. You should probably stop washing your hands so many times a day too! These longer versions are generated only from the original template, and not from the copies; as a result they are not generated at an exponential rate. After 30 cycles, you will have only made about 30 copies of these longer products (per template), but you will have made about 2 to the 30th power (about 1 billion) copies of the shorter version (per template). Why? Because the longer products can't be used to make more of themselves; they are only templates for the shorter product. If you are still worried, don't worry! Be happy!
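
For the still-worried, the bookkeeping can be sketched in a few lines. This is an idealized model of my own, not a description of a real tube: it follows one original template strand and assumes every strand present is copied exactly once in every cycle.

```python
def count_strands(cycles):
    """Count long and short products descended from ONE original strand.

    Idealized: every strand is copied once per cycle (100% efficiency).
    """
    original, long_products, short_products = 1, 0, 0
    for _ in range(cycles):
        # Copying the original runs past the other primer site: a long product.
        new_long = original
        # Copying a long or a short strand stops at that strand's 5' end,
        # which is the 5' end of a primer: a short product.
        new_short = long_products + short_products
        long_products += new_long
        short_products += new_short
    return long_products, short_products

long_products, short_products = count_strands(30)
print(long_products, short_products)
```

The long products grow by one per cycle, while everything else doubles, which is the whole argument in two lines of arithmetic.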

Gaze at this picture some more:

Trace it with your fingers on the screen. Make a model of it out of mashed potatoes. Build a larger model of it out of dirt, in your living room. Feel the overpowering urge to go to Wyoming to see the original. This is a really important image, and once you understand it, everything else will fall into place in your life.

An example:

Consider the following sequence - a little bit from Yersinia pestis. Only one strand is shown, and you'll have to imagine what the other strand looks like. By convention, the first nucleotide is a 5' end.

  1 caggaaggcg gcaacgaagc gagtccccag gagcttacat cagtaagtga
 51 ctggggtgaa cgacggcagc caacgcacat gcaactcgaa gtatgacggg
101 taaatgggtg agcttatgct cggaacaacg cattattcag ggggttaatt
151 gatagattga ccagccactg ccggacgtca tttgaggatt gtctgaatct
201 cttgccactc tagtttaatc atgtgaggta acaatatatg ctgatgaaga
251 gcaaatgggc tttaagcaat aattaacacg ataacaagga tgtactatgg
301 atacagttga agagctgggc gggacgtact tttatgcagg caagccgaat
351 ttaaaagcca gtgagctact atttatgatt ttctgtgaga atactgccag
401 ccagtttggt atgcaggatt tcggggctgt ggttgcgatt atttcgggta
451 gaagcaatct cttaacaaga ggaaagccca taggtgcaac aaaaggcaca
501 tcatacgcct ccaaagcggc acgaagtgta tttaagaaaa caaaattccc
551 atttgggatg tcgctaccta catggttagg cgggtatact ccatggacgg
601 cgagaaaagc gatggtgcgt aatatcgccc cgtttgttgg tcgttcgatc
651 cccctacttg gcctcatcat tattgctgct gacgtatcag caatcactta
701 ccgtacaatt cgtgactaca acatgatagc aagggggggc gataagctat
751 ggtagatgat attgaacaac gaatttatga tcttgtccgg ccttatgctg
801 gcgtctatgt gttcaagaga aagccagtat ctttgacccc tgatacagac
851 ttagacactg acctaagtat tgatgagctt gaaatagaag atttaatgaa
901 tgacttcttt aaagagttct ctgtacaaag aggtaatttc aatattaaaa
951 attacttccc tgacgttcct ttttctttca atccattcaa aaaaacagcg

It contains a coding sequence of "unknown function" from nt 297 (an ATG start codon) to nt 755 (a TAG stop codon). How would you design oligonucleotides, each 18 nt in length, that would allow you to extract that coding sequence by PCR? You want only that sequence.

Well, it is easy, and you only have to think about that really important picture (see above). The first oligonucleotide will start with nt 297 of the sequence shown, and go until nt 314. That gives you 18 nt as follows:

5' atggatacagttgaagag

The other oligo will have a 5' end at nt 755, but it will be sequence from the other strand. Let me say that again, because it is the greatest cause of wrong oligonucleotides: It will be sequence from the other strand.

It will look like this, and please note that the 5' end is on the right in this representation:

                ccgctattcgataccatc 5'

So, I hear you thinking, "Where did that come from?" It is sequence from the strand that you have to imagine - the one that is base paired to the 1 kbp sequence that is shown. Where the Yersinia sequence has a TAG stop codon at 755, the G nucleotide at 755 is at the 3' end of the coding sequence. It is base paired to the C nucleotide at the 5' end of the oligonucleotide (at the right end, above). For example, the 5'-TAG base pairs to the ATC-5'. OK?

What PCR product would we get? Here it is, in double-stranded form, and it should remind you strongly of that picture you can't get out of your mind. The red sequence is from the oligonucleotides, and the blue sequence was contributed by synthesis in the reaction:
297                                                5' atgg
                                                      tacc

301 atacagttga agagctgggc gggacgtact tttatgcagg caagccgaat
    tatgtcaact tctcgacccg ccctgcatga aaatacgtcc gttcggctta

351 ttaaaagcca gtgagctact atttatgatt ttctgtgaga atactgccag
    aattttcggt cactcgatga taaatactaa aagacactct tatgacggtc

401 ccagtttggt atgcaggatt tcggggctgt ggttgcgatt atttcgggta
    ggtcaaacca tacgtcctaa agccccgaca ccaacgctaa taaagcccat

451 gaagcaatct cttaacaaga ggaaagccca taggtgcaac aaaaggcaca
    cttcgttaga gaattgttct cctttcgggt atccacgttg ttttccgtgt

501 tcatacgcct ccaaagcggc acgaagtgta tttaagaaaa caaaattccc
    agtatgcgga ggtttcgccg tgcttcacat aaattctttt gttttaaggg

551 atttgggatg tcgctaccta catggttagg cgggtatact ccatggacgg
    taaaccctac agcgatggat gtaccaatcc gcccatatga ggtacctgcc

601 cgagaaaagc gatggtgcgt aatatcgccc cgtttgttgg tcgttcgatc
    gctcttttcg ctaccacgca ttatagcggg gcaaacaacc agcaagctag

651 cccctacttg gcctcatcat tattgctgct gacgtatcag caatcactta
    ggggatgaac cggagtagta ataacgacga ctgcatagtc gttagtgaat

701 ccgtacaatt cgtgactaca acatgatagc aagggggggc gataagctat
    ggcatgttaa gcactgatgt tgtactatcg ttcccccccg ctattcgata

751 ggtag
    ccatc 5'

Well it would be really nice if we could just draw a picture of what we wanted and could send that to a company that makes oligonucleotides - let them figure out the rest of it for us! Unfortunately, they want the order placed in a standard format, and that means we need to do one more little task. The company will want us to give them the sequences to make, written 5' to 3'. For the first oligo that is easy because it is already in that format:

5' atggatacagttgaagag

The second oligo needs to be rotated so the 5' end is on the left. Remember that what we want is this:

ccgctattcgataccatc 5'

So when it is rotated 180 degrees it looks like this:

5' ctaccatagcttatcgcc

That's what we tell the company we want for the second oligo. It is hard to look at this and see that it really amounts to nt 738 to nt 755 of the Yersinia sequence, because it seems upside down and backwards. In fact, we call it the "reverse complement" which means pretty much the same thing.
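
If you don't trust yourself to do the "upside down and backwards" step by eye, here is a minimal sketch that does it for this example. The helper names (`reverse_complement`, `window`) are my own inventions, and to keep it short I have pasted in only the four rows of the Yersinia sequence that contain the two primer sites (the rows numbered 251, 301, 701 and 751 above).

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Complement every base, then reverse, so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def window(fragment, first_nt, start, end):
    """Return nt start..end (1-based, inclusive) from a fragment whose
    first base carries the number first_nt in the full sequence."""
    return fragment[start - first_nt : end - first_nt + 1]

# Rows 251 and 301 of the sequence shown above (nt 251-350).
upstream = ("gcaaatgggc tttaagcaat aattaacacg ataacaagga tgtactatgg"
            "atacagttga agagctgggc gggacgtact tttatgcagg caagccgaat").replace(" ", "")
# Rows 701 and 751 of the sequence shown above (nt 701-800).
downstream = ("ccgtacaatt cgtgactaca acatgatagc aagggggggc gataagctat"
              "ggtagatgat attgaacaac gaatttatga tcttgtccgg ccttatgctg").replace(" ", "")

# First oligo: nt 297-314, read straight off the strand shown.
forward_primer = window(upstream, 251, 297, 314)
# Second oligo: nt 738-755, but taken from the other strand.
reverse_primer = reverse_complement(window(downstream, 701, 738, 755))

print("5'", forward_primer)
print("5'", reverse_primer)
```

Both primers come out written 5' to 3', which is the format the oligo company wants.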

Remember -

This image tells you how to solve the problem of oligo design.

An overview of what is needed:

Here's the recipe:

  • A pair of short oligonucleotide primers specific for a DNA sequence, with the ability to hybridize to the opposite strands of that molecule (3' ends pointing "towards" each other).
  • A DNA template.
  • A thermostable DNA polymerase (such as Taq or Pfu polymerase), and all four dNTP substrates (meaning dATP, dGTP, dCTP, dTTP)
  • A machine that can change the incubation temperature of the reaction tube automatically, cycling between approximately 98 C (for denaturation), 55 C (for oligonucleotide annealing), and 72 C (for synthesis).

Where does the thermostable polymerase come from?

These days, the thermostable polymerases are commercial products, but not too long ago they were only found in hot springs (such as this one at Yellowstone National Park).


The temperature changes:

When you program the thermocycler, you specify a series of temperatures and times, such as:

Temperature        Time
   98 C            30 seconds
   55 C            30 seconds
   72 C            60 seconds

...and specify the number of times this series should be repeated (for example, 35 times). Multiple program segments may be linked together, as in the following example:

Temperature         Time

program segment 1, do 1 time:
   98 C             5 minutes

program segment 2, do 35 times:
   98 C            30 seconds
   55 C            30 seconds
   72 C            60 seconds

program segment 3, do 1 time:
   72 C            10 minutes

program segment 4, do 1 time:
    4 C           999 minutes
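
To get a feel for how long such a program keeps the machine busy, here is a small sketch that writes the linked segments above as plain data and adds up the programmed hold times. The data layout and the function name are my own; real thermocyclers have their own program formats. The final 4 C hold is left out because it only ends when somebody collects the tubes, and the time spent changing between temperatures is not counted.

```python
# Each segment: how many times to repeat, and a list of (temperature C, seconds).
program = [
    {"repeats": 1,  "steps": [(98, 5 * 60)]},
    {"repeats": 35, "steps": [(98, 30), (55, 30), (72, 60)]},
    {"repeats": 1,  "steps": [(72, 10 * 60)]},
]

def programmed_seconds(segments):
    """Sum the hold times only; temperature-change time is ignored."""
    return sum(seg["repeats"] * sum(seconds for _, seconds in seg["steps"])
               for seg in segments)

total_seconds = programmed_seconds(program)
print(f"{total_seconds} s of programmed holds ({total_seconds / 60:.0f} minutes)")
```

The actual run will take longer than this figure, by however long your machine needs for each temperature change.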

What was the purpose of each of these segments?

Program segment #1: To denature the template fully, prior to the first synthetic step. If this step is excessive, there can be damage to the enzyme or template, reducing the efficiency. If too short, the template is not available for synthesis.

Program segment #2: To amplify the DNA fragment, with the following steps taken:

  • Denature DNA at 98 C
  • Anneal oligonucleotides at 55 C
  • Synthesize DNA at 72 C

The individual temperatures (98, 55, 72) may be optimized for each reaction; however, the thermostable polymerases generally work well at 72-74 C, and typical oligonucleotides anneal well at 45-65 C.

Program segment #3: Finish synthesis of any partially completed fragments.

Program segment #4: Cool samples while waiting for researcher to finish nap.

In this type of program, the temperature changes are as rapid as the machine can manage, usually taking 30 seconds to a minute to complete. Some advanced machines can change the temperature between these steps in just seconds, and these speed up the PCR process considerably. This type of temperature profile could be represented by a square wave plot.

Step cycle file

There are times when you don't want the temperature changes to be rapid, and here's an example: Suppose we are trying to work out the conditions for a polymerase chain reaction using two degenerate oligonucleotides that are approximately 1000-fold degenerate.

What do we mean by degenerate? We mean that there are various different versions of the oligonucleotide in the reaction. We do this when we have a pretty good idea of the protein sequence but not the DNA sequence encoding it. For example, we might use a collection of oligonucleotide primers that was degenerate at the third, sixth and ninth positions, to cover all of the degeneracies of the genetic code in that region, as shown below. There are 32 different ones, and they all encode Thr-His-Ala-Thr-Ser-Thr-Ala-Asn ("THATSTAN" in one-letter amino acid code).
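
Here is a minimal sketch of where the number 32 comes from. I am assuming the degenerate stretch is written with the standard IUPAC codes as ACN CAY GCN (Thr-His-Ala, with N meaning any base and Y meaning C or T). That string is my own reconstruction from the genetic code, covering only the first three codons, and the function name is made up for the illustration.

```python
from itertools import product

# Standard IUPAC nucleotide codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

def expand(degenerate):
    """List every plain sequence covered by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in degenerate]
    return ["".join(combo) for combo in product(*choices)]

# Thr = ACN, His = CAY, Ala = GCN: degenerate at positions 3, 6 and 9.
variants = expand("ACNCAYGCN")
print(len(variants), "versions, for example", variants[0])
```

The count is simply the product of the choices at each position (4 x 2 x 4), which is why degeneracy climbs so quickly as more codons are included.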



Who knew thatstan could be so degenerate?

It gets worse! We would like to use that handy web site to determine the Tm, so we would know what annealing temperature to program into the machine, but we don't actually know which of the 1000 versions of each oligonucleotide will be an exact match to the target sequence. What we would really like to do is to introduce some flexibility into the temperature cycle, so that every potential oligonucleotide has a fair chance of annealing to the target.

What do we do?

Answer: We program the temperature cycler so that it gradually changes the annealing temperature, thereby exposing the reaction to a range of temperatures. In this kind of program, the timing at each temperature and between each temperature are specified. Rapid changes can be programmed by setting a "between temperature" time of only one second (it obviously takes longer than that to change temperatures, so the machine just does its best). In this example, the temperature gradually increases from 55 C to 65 C over a 1 minute period.

Thermocycle file

Otherwise, you will notice how similar it is to the square-wave version described before. There's always a bit of time taken in changing temperatures, and sometimes it makes the reaction work better.

To get very fast temperature changes, you need to have good contact between the tube and the walls of the heating block. They need to fit together perfectly (not too much air space). Also, it is common to use "thin walled" tubes for polymerase chain reactions. These tubes transmit the temperature changes faster, but the downside is that they aren't very strong. You wouldn't want to use them in a high speed centrifuge for example. Another factor in temperature changes is the thermal mass of your reaction. A 10 microliter reaction can be brought to temperature faster than a 100 microliter reaction.

  If you've got your MasterCard ready, here are some of the models (past and present) from the Perkin Elmer (Applied Biosystems) showroom:

DNA Thermal Cycler 480

GeneAmp PCR System 2400

GeneAmp PCR System 9600

Dual 384-Well GeneAmp® PCR System 9700

GeneAmp® PCR System 2700

The machines have a metal "hot block" that conducts heat to the samples, and a heating/cooling fluid that circulates through the inside of the metal block to change the temperature.

The vapor pressure problem

With the Perkin Elmer 480, it is often recommended that you use some sort of mineral oil overlay on your sample, to keep water from evaporating and condensing on the top of the tube (which is cooler than the bottom). The reason to worry about condensation is that it removes water from your reaction mixture by distillation, and makes the salts (and other components) more concentrated. On a more modern instrument such as the Perkin Elmer 2400, there is a heated cover that keeps the top of the tube warm, and prevents it from acting as a site of condensation. Nonetheless, loss of solution volume can be troubling and a source of irreproducibility.

If you do add mineral oil or waxes, it is extra thermal mass, and it is a bit messy to remove later. If you don't add it, your volumes may drop a bit during the course of the reaction.

Actual yield is less than the theoretical maximum

You may recall our discussion about the maximal theoretical yield, which is to double the amount of product every cycle. In practice you do not achieve that level of synthesis.

Here is an example of synthesis specifications from Perkin Elmer Corp.:

Using the DNA Thermal Cycler 480 and the GeneAmp® PCR Reagent Kit, an amplification yield of at least 100,000 fold of the Lambda Control DNA target can be achieved with:

  • 0.2 µM each of the Lambda Control Primers.
  • 0.1 ng of Lambda Control DNA target.
  • 100 µL reaction volume with a 50 µL mineral oil overlay in a 0.5 mL GeneAmp® Thin-Walled Reaction Tube.
  • >25 thermal cycles: 94 degrees C for 1 minute and 68 degrees C for 2 minutes.

An amplification yield of 100,000x after 25 cycles would mean at each cycle 1 template would yield 1.58 templates for the next round of synthesis.

How was this calculated? If c is the number of copies made per round of synthesis then
c^25 = 100,000 = 10^5
so c^5 = 10
and so 5(log c) = log 10 = 1
so log c = 0.2 and c = 1.58 (approximately)

(Or you could calculate the 25th root of 100,000 on a calculator, if you prefer.)

If we obtain 1.58 copies instead of the theoretical maximum of 2 copies, then the efficiency of the reaction could be said to be 79% (because 1.58/2.00 = 0.79).

One reason this calculation is important is that a slight loss of efficiency is magnified through the amplification. A reaction may appear not to have worked if the efficiency drops (in each cycle) by just a few percent. Optimization is critically important in the polymerase chain reaction.
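
Here is a minimal sketch of both halves of that argument: taking the 25th root to get the per-cycle factor, and then seeing what a slightly smaller factor does over 25 cycles. The function names are mine, and the 1.50 figure is just an illustrative "slightly worse" reaction, not a number from any manual.

```python
def per_cycle_factor(fold, cycles):
    """Copies made per template per cycle, given the overall fold amplification."""
    return fold ** (1 / cycles)

def fold_amplification(factor, cycles):
    """Overall fold amplification for a constant per-cycle factor."""
    return factor ** cycles

c = per_cycle_factor(100_000, 25)
print(f"per-cycle factor: {c:.3f}")
print(f"efficiency relative to doubling: {c / 2:.0%}")

# A hypothetical reaction that manages only 1.50 copies per cycle.
worse = fold_amplification(1.50, 25)
print(f"yield at 1.50 per cycle: {worse:,.0f}-fold")
```

Dropping from about 1.58 to 1.50 copies per cycle looks harmless, but compounded over 25 cycles it costs you roughly three quarters of your product.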

Here are some comments, abstracted from the PfuTurbo DNA polymerase manual from Stratagene Co.

"Extension time is one of the most critical parameters affecting the yield of PCR product obtained. For optimal yield with minimal smearing using PfuTurbo DNA polymerase, use an extension time of 1.0 minute/kb for vector targets up to 10 kb and genomic targets up to 6 kb. When amplifying vector targets between 10 and 15 kb or genomic targets between 6 and 10 kb in length, use an extension time of 2.0 minutes/kb.

The most successful PCR results are achieved when the amplification reaction is performed using purified primers and templates that are essentially free of extraneous salts. Gel-purified primers, generally > 18 nucleotides in length, are strongly recommended. Additionally, an adequate concentration of primers and template should be used to ensure a good yield of desired PCR products. When DNA of known concentration is available, amounts of 50-1000 ng of DNA template/100 µL reaction are typically used for amplifications of single-copy chromosomal targets. Stratagene suggests using primers at a final concentration of 0.1-0.5 µM."

Specificity problem

The appropriate annealing temperature can be calculated from the base sequence and length, noting that longer oligonucleotides can form more hydrogen bonds with a target and therefore have a higher annealing temperature. Similarly, the fraction of G or C nucleotides in an oligonucleotide affects the annealing temperature because GC base pairs form three H bonds and AT base pairs form only two. If there is any degeneracy or mis-match between the oligonucleotide and the target, the annealing temperature will be lower.

A typical annealing temperature one might use is 50 to 60 degrees Celsius. There is a problem with using annealing temperatures lower than that, because nonspecific products may be amplified. These come from side reactions in which an oligonucleotide forms just a few Watson-Crick base pairs with a template, and they are usually unwanted. Here is an example:

                        ||||||   || || 
      synthesis   <---- CTAGGGATTATAGCACATT-5'

If another starting site is found further to the left, and pointing in the opposite direction (with either this oligo or its partner in the reaction), then a nonspecific product may be made in the reaction. The ability of that side reaction to compete with the specific reaction will depend on the length of the product. Smaller products are synthesized faster, and are therefore more competitive. Note that even though this side reaction was only initiated with a few H bonds, the product of that first synthesis has the complete oligonucleotide sequence fused to its end. That means copies made from it in subsequent cycles will be able to anneal "end to end" with the oligonucleotide.

What do you get if you have numerous nonspecific side reactions?

Answer: When you run your reaction products on a gel, you will get a smear instead of a band.

How can you solve this problem?
Answer: There are several things you can try.

  • Raise your annealing temperature
  • Perform a "hot start", meaning that the reaction is not initiated until the tube is at a temperature above approximately 50 C. Read Dr. Laura Ruth's column on Hot Start.
    • The cheap way: Sometimes this is done by leaving out one of the reaction components (such as the Taq polymerase) until a certain temperature is reached. When I want to do this, I just program in a brief 70 C temperature hold after the initial denaturation. It gives me time to add the polymerase, and the temperature is low enough that the vapor pressure of the reaction is reduced. That's a good idea, since it is hard to pipet into a solution that wants to boil.
    • Another way: There are also waxes you can buy to keep two halves of a reaction mix apart (until the wax melts that is). For example, Invitrogen's "HotWax" beads contain MgCl2 emulsified in paraffin that melts after the reaction is underway, releasing the MgCl2 so that the reaction starts. Another example is AmpliWax PCR Gem 100 from Applied Biosystems.
    • The "invisible" hot start: There are also polymerases commercially available that are complexed with an inactivating monoclonal antibody - for example Clontech's Titanium™ Taq DNA polymerase, or Perkin Elmer AmpliTaq™ Gold. When the temperature rises during the initial denaturation, the antibody is denatured (it is not thermostable) and the polymerase is free to do its job.

    Why is this important? Because it prevents low-temperature annealing and synthesis when your PCR machine is just warming up the tubes for the first cycle. Once a nonspecific product has been made, and at temperatures between 20 and 45 C there is plenty of room for that type of mischief, it is likely to contaminate your reaction and be amplified along with your desired product. To say this indelicately, you'll get a smear on your gel!

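  • Design new oligonucleotides that have fewer matches with nonspecific sequences. A formula for the approximate Tm, given by Stratagene Co., is:

    Tm = 2(N_A + N_T) + 4(N_G + N_C)
    [in degrees Celsius, where N_A, N_T, N_G and N_C are the numbers of A, T, G and C residues in the oligonucleotide]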

    For example, an oligonucleotide containing 5 A residues, 7 T residues, 6 G residues and 3 C residues would have an approximate Tm of:
    Tm = 2(5+7) + 4(6+3) = 24+36 = 60 C

    Primers must be designed with the least degeneracy or mis-match at the 3' end, and for best results they should be designed to have similar melting temperatures.

  • Or, you can put on your body armor and try "touchdown PCR"!
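
Before we get to touchdown PCR, here is a minimal sketch of the 2-and-4 Tm formula from the list above, applied to the worked example and to the two Yersinia primers we designed earlier. The function name is my own, and this rule of thumb is only a rough estimate for short oligonucleotides; it is not a substitute for a calculator that takes salt conditions into account.

```python
def approximate_tm(oligo):
    """Tm = 2(N_A + N_T) + 4(N_G + N_C), in degrees Celsius."""
    seq = oligo.lower()
    at = seq.count("a") + seq.count("t")
    gc = seq.count("g") + seq.count("c")
    return 2 * at + 4 * gc

# The worked example: 5 A, 7 T, 6 G and 3 C residues (order does not matter here).
example = "a" * 5 + "t" * 7 + "g" * 6 + "c" * 3
print(approximate_tm(example))

# The two primers from the Yersinia example.
tm_forward = approximate_tm("atggatacagttgaagag")
tm_reverse = approximate_tm("ctaccatagcttatcgcc")
print(tm_forward, tm_reverse)
```

The two Yersinia primers come out a few degrees apart by this estimate, which is the kind of thing to check when you want a pair with similar melting temperatures.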

What is "touchdown PCR?"
It is a method where you program your thermocycler to deliver slightly lower annealing temperatures in each subsequent cycle:

A     30" at 98 C     30" at 65 C     30" at 72 C
B     30" at 98 C     30" at 64 C     30" at 72 C
C     30" at 98 C     30" at 63 C     30" at 72 C
D     30" at 98 C     30" at 62 C     30" at 72 C
... and so on ...
K     30" at 98 C     30" at 55 C     30" at 72 C

Why does touchdown PCR help?
In the first few cycles, nothing works because the annealing temperature is too high. At some point however, the reactions will work because an appropriate temperature will have been reached. It is more likely that the first reaction that works will be specific, because it will take place at a relatively high temperature. Therefore, the specific reactions get a bit of a headstart on the nonspecific reactions, which may not get underway until the annealing temperature drops still further.
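
Typing in eleven nearly identical cycles by hand is tedious, so here is a minimal sketch that generates the touchdown schedule shown above. The function and its parameter names are mine, chosen for the illustration; how you load such a schedule into a particular machine is a separate matter.

```python
def touchdown_program(start_anneal=65, end_anneal=55, step=1,
                      denature=98, extend=72, hold_seconds=30):
    """Build one cycle per annealing temperature, stepping downward.

    Each cycle is a list of (temperature C, seconds) tuples.
    """
    cycles = []
    anneal = start_anneal
    while anneal >= end_anneal:
        cycles.append([(denature, hold_seconds),
                       (anneal, hold_seconds),
                       (extend, hold_seconds)])
        anneal -= step
    return cycles

schedule = touchdown_program()
for label, cycle in zip("ABCDEFGHIJK", schedule):
    print(label, "   ".join(f'{secs}" at {temp} C' for temp, secs in cycle))
```

With the defaults this gives cycles A through K, with annealing temperatures running from 65 C down to 55 C in one-degree steps.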

Why are a helmet and shoulder pads a good idea in touchdown PCR?
Because on some PCR machines you have to erase everyone else's programs on the machine to make room for your incremental temperature steps. Your lab mates may do something unpredictable!


Error rate

The following comparison of error rates is offered by Stratagene Co.

DNA polymerase      Error rate        % of mutated PCR products (1 kb target, over 20 cycles)
PfuTurbo            1.3 x 10^-6
Pfu                 1.3 x 10^-6
Taq                 8 x 10^-6
U___ (brand X)      55.3 x 10^-6

  Which of these (allegedly) has the highest error rate?
Brand X
This table might be taken with a grain of salt (it is an advertisement, after all); however, there is truth to the differences in polymerase fidelity. Taq DNA polymerase lacks a 3'-5' exonuclease activity, and so is unlikely to fix errors in a product. Pfu DNA polymerase has such an error correcting activity, and so makes fewer mistakes overall.

The rate of error is sensitive to Mg concentration (if lower than 1.5 mM in the case of Pfu enzyme, the error rate increases), as well as nucleotide concentration and buffer conditions. Fewer cycles also naturally lead to fixation of fewer errors; however, the yield of product may drop as a result, and the amount of template may need to be increased by way of compensation.
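
To see roughly what such error rates mean for a 1 kb product, here is a sketch under assumptions of my own, not taken from the advertisement: that the error rate is errors per base per duplication, that each of the 20 cycles counts as one duplication, and that errors land independently (a Poisson model). Treat the output as an order-of-magnitude estimate only.

```python
import math

def expected_mutations(error_rate, target_bp, duplications):
    """Average number of errors accumulated per product molecule."""
    return error_rate * target_bp * duplications

def fraction_mutated(error_rate, target_bp, duplications):
    """Fraction of products carrying at least one error (Poisson assumption)."""
    return 1 - math.exp(-expected_mutations(error_rate, target_bp, duplications))

for name, rate in [("Pfu", 1.3e-6), ("Taq", 8e-6)]:
    share = fraction_mutated(rate, 1000, 20)
    print(f"{name}: about {share:.1%} of 1 kb products carry a mutation")
```

Because the estimate scales with both target length and the number of duplications, it also shows why fewer cycles means fewer fixed errors.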

Stan Metzenberg
Department of Biology
California State University Northridge
Northridge CA 91330-8303

© 1996, 1997, 1998, 1999, 2000, 2001, 2002