Lecture 24

DNA Chips and Microarrays

Data by the truckload!

What is a DNA microarray?

A set of DNA oligonucleotides or cloned cDNAs can be spotted on a membrane or a glass slide, and this array can be hybridized to a labeled RNA or DNA.

Here are a couple of applications that we will discuss:

RNA expression

One can spot 10,000 individual cDNA clones onto a glass slide and hybridize them to labeled total RNA from a cell (or a cDNA copy of it). The level of hybridization to each spot reflects the amount of the corresponding RNA present in the total RNA.

DNA variation and sequencing

One can construct a "tiling" array to scan a given sequence for variations, taking advantage of the fact that mismatches can be detected during hybridization. A sequence can be broken into 25 nt segments, and each segment expressed as four variants, with G, A, T, or C at the central nucleotide. Over the entire array, all possible single nucleotide polymorphisms can be detected in a sample.
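As a rough illustration of the tiling idea, here is a minimal Python sketch that slides a 25 nt window along a reference sequence and writes out the four center-position variants for each window. The function name, the example sequence, and the choice to ignore strand orientation (real probes would be complementary to the labeled target) are assumptions made for illustration only.

```python
def tiling_probes(reference, probe_len=25):
    """Yield (center_index, {base: probe}) for every full-length window.

    Each window gives four probes that are identical except for the
    nucleotide at the central position, which is set to G, A, T, or C.
    """
    half = probe_len // 2
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        center = start + half
        variants = {
            base: window[:half] + base + window[half + 1:]
            for base in "GATC"
        }
        yield center, variants


# Hypothetical 30 nt reference sequence, invented for this example.
reference = "ATGGCGTACGTTAGCCTAGGATCCGATTAC"
probe_sets = list(tiling_probes(reference))

for center, variants in probe_sets[:2]:
    print("position", center, "reference base", reference[center])
    for base, probe in variants.items():
        print("  ", base, probe)
```

In each set, exactly one of the four probes matches the reference perfectly; a sample carrying a different base at that position would be expected to hybridize best to one of the other three.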

Where does the DNA come from?

Sources of DNA for spotting

  • cDNA, from reverse transcription of RNA using an oligo(dT) primer
  • oligonucleotides
    • pre-made oligos, spotted with a pen
    • synthesized in situ

Note that the use of the words "probe" and "target" may be a bit misleading. In a Southern blot experiment the target DNA is affixed to a nylon or nitrocellulose membrane and the probe is a labeled DNA in solution. The probe is the "known" nucleic acid that is being used to interrogate the "unknown" nucleic acid sample.

In fluorescence detection, the probe (black lines - affixed to the surface) is hybridized to a labeled target (red lines - hydrogen bonded to probe).

UV light is used to excite the fluorescent dye conjugated to the (red) target nucleic acid.

Advantages of using glass slides

  • DNA covalently attached to treated glass surface
  • Durability of glass
  • non-porous surface, allows more favorable volume and kinetics of hybridization


Commercially available arrayers have price tags between about $45,000 and $190,000, depending on their capabilities of speed and slide capacity. The first glass slide arrays were made by Pat Brown's laboratory at Stanford. Additional sites of interest include:

Jeff Trent's lab (NHGRI)

Vivian Cheung's lab (Univ. Penn)

Geoff Childs' lab (Albert Einstein College of Medicine)

Robotic pens

The quality of the pen in an arrayer is critical for performance, since sloppy spots are difficult to analyze. The pens are responsible for picking up a small amount of liquid from a 96-well or 384-well plate and spotting the liquid (containing DNA) onto a series of slides. Typically 12 pens each pick up approximately 500 nl (0.5 ul) of sample and deliver 0.25 to 1 nl to each slide dot, which is about 100 um in diameter. The centers of the dots are separated on the slide by about 250 um (one quarter millimeter). Looking at the array up-close, the regularly spaced spots of DNA might look like this:

Between samples the pens are washed in a water bath and dried (all robotically and computer controlled).
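The pen volumes and spot spacing quoted above can be turned into a quick back-of-the-envelope calculation. The sketch below is only arithmetic on those numbers; it assumes no liquid is lost to dead volume or evaporation, and the function names are made up for this example.

```python
def spots_per_load(pickup_nl, spot_nl):
    """Number of spots one pen load could deliver, ignoring any losses."""
    return pickup_nl / spot_nl


def spots_per_cm2(pitch_um):
    """Spot density for a square grid with the given center-to-center pitch."""
    per_cm = 10_000.0 / pitch_um  # 1 cm = 10,000 um
    return per_cm ** 2


# One 500 nl pickup, delivering between 1 nl and 0.25 nl per spot.
print(spots_per_load(500.0, 1.0))    # 500.0
print(spots_per_load(500.0, 0.25))   # 2000.0

# Spots centered 250 um apart.
print(spots_per_cm2(250.0))          # 1600.0
```

So a single pickup is enough for several hundred to a couple of thousand spots, and a 250 um pitch corresponds to roughly 1,600 spots per square centimeter.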

Here is an actual image of DNA spots after hybridization to fluorescent DNAs.

The whole chip

An entire glass slide of about 6,000 spots might look like this (with the DNA spots nearly too small to see).

Keep in mind that this slide would represent 6,000 different tests for hybridization to a target RNA or DNA. That's a lot of data!

The table below shows how the spot volume increases with the third power of the spot radius, and the amount of DNA deposited is directly proportional to the volume. The tabulated volumes correspond to a spot that is initially a hemisphere, with volume (2/3) x pi x r^3.

DNA quantity and spot sizes

Spot Radius    Spot Volume    Amount of DNA
250 um         33 nl          16 ng
150 um         7 nl           3.5 ng
50 um          0.26 nl        0.13 ng

(source: Cheung et al., Nature Genetics 21:15-19 [supplement], January 1999)
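The numbers in the table can be reproduced with a short calculation. The sketch below assumes a hemispherical droplet, since (2/3) x pi x r^3 is the formula that matches the tabulated volumes, and a DNA concentration of 0.5 ng/nl, which is inferred from the ratio of the last two columns rather than stated in the source.

```python
import math

# Assumption: inferred from the table (16 ng / 33 nl is about 0.5 ng/nl).
DNA_NG_PER_NL = 0.5


def hemisphere_volume_nl(radius_um):
    """Volume of a hemispherical droplet in nanoliters."""
    volume_um3 = (2.0 / 3.0) * math.pi * radius_um ** 3
    return volume_um3 * 1e-6  # 1 nl = 1e6 cubic micrometers


for radius_um in (250, 150, 50):
    volume_nl = hemisphere_volume_nl(radius_um)
    dna_ng = volume_nl * DNA_NG_PER_NL
    print(f"{radius_um} um  {volume_nl:.2f} nl  {dna_ng:.2f} ng")
```

Because the volume scales with the cube of the radius, shrinking the radius fivefold (250 um to 50 um) cuts the volume, and the DNA deposited, by a factor of 125.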

The slides are air-dried, then the DNA is cross-linked to the glass by UV light.

RNA expression profiling

Here is an example of something you can do. Suppose that you want to better understand the action of a pharmaceutical drug on human cells (let's call this imaginary drug "Killimol™", which is being marketed with the prefix "killi" to make consumers believe that it is "1000x better" than the other drugs). You could make a series of identical chips that have cDNAs of most human genes, and use them to probe the RNAs expressed by human cells under different growth conditions in Killimol™. Here are some imaginary cell growth profiles, in a test of the effectiveness of Killimol™ over a period of 4 days. The amount of Killimol™ added varies from 0 to 1000 units:

The question might be "How does Killimol™ affect the gene regulation in a cell?" "How does Killimol™ work?" "How can we manipulate the value of employee stock options at Killicell Inc., the company making the drug Killimol™?"

O.K., that was actually three questions, but we'll try to tackle the first one...

Suppose that you repeated the experiment and collected cell samples for RNA extraction, at each of the points shown below with a blue square (that is, twice a day for each growth condition). These RNAs could be converted to labeled cDNAs by priming with oligo(dT) and extending the primers with fluorescently labeled nucleotides and the enzyme reverse transcriptase (alternatively, some protocols have the initial cDNA be a template for additional amplification synthesis via PCR). The labeled cDNAs are the "target" in this case because they are the "unknown". The slides containing about 10,000 cDNA spots are the "probes" even though they are fixed to a solid support.

Labeled cDNA from each time point is hybridized to its own replicate DNA chip, so the regulation of about 10,000 genes is assayed simultaneously across samples from seven 12-hour periods and four different concentrations of Killimol™. Here's the amount of data you've produced (imagine that each slide-chip has 10,000 cDNA spots).

Just looking at a few cDNA spots at a few time points, you can see that the magnitude of data is difficult to imagine:

Many of the cDNAs are known genes with functions that are characterized. The pattern of expression is more than a "fingerprint" since each spot has a tale to tell.

And if that's not enough...the consultants hired by Killicell Inc. have just told you to repeat the experiment three times per cell line, and test 10 different cell lines!
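To put a number on it, here is a small sketch that multiplies out the experiment as described: 10,000 spots per slide, seven time points, four drug concentrations, and then three repeats in each of 10 cell lines. It assumes one slide per sample and one intensity value per spot; the function name is invented for this example.

```python
def measurement_count(spots_per_slide, time_points, drug_levels,
                      replicates=1, cell_lines=1):
    """Return (number of slides, number of spot measurements)."""
    slides = time_points * drug_levels * replicates * cell_lines
    return slides, slides * spots_per_slide


# The original experiment: one cell line, no repeats.
print(measurement_count(10_000, 7, 4))
# (28, 280000)

# After the consultants have had their say.
print(measurement_count(10_000, 7, 4, replicates=3, cell_lines=10))
# (840, 8400000)
```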

Want a real-life example? Look at the variation in expression of 1753 genes in 84 human breast tumors.

Laser scanning

An image of the slide is created in which (it is hoped) the intensity of each pixel is proportional to the number of dye molecules bound at that position. Scanners are not cheap either - they cost $40,000 to $110,000 depending on their capabilities.


How do you organize huge data sets, of the type generated by microarrays? In the example of RNA expression screening, a particular action on a cell (for example, cell growth or increasing drug or nutrient condition) may have a variety of effects. One way to simplify the problem is to try to organize expression patterns by classifying them. Some genes may be turned off at low drug concentrations and turned on at high drug concentrations. Others may start out high, then decrease in expression, then increase again.

This is meant to represent four different patterns of expression - classes of expression behavior. In real life there would be many more, and it would be the job of a computer to try to put the genes into these classes, based on expression pattern. How do you sort a gene that doesn't match one of the patterns well? Either you assign the gene to its own class (making the simplification less simple) or you assign it to the closest matching class (thereby losing some information). This is tricky, and there is no definitive method.
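One simple way to do this kind of sorting is to compare each gene's expression profile to a set of template patterns and pick the best match, leaving the gene unclassified if nothing matches well. The sketch below uses Pearson correlation for the comparison. The four templates, the 0.8 cutoff, and all names are assumptions chosen for illustration; this is one of many possible methods, not a standard one.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical template patterns over four conditions.
TEMPLATES = {
    "rising": [0, 1, 2, 3],
    "falling": [3, 2, 1, 0],
    "dip": [3, 1, 1, 3],
    "peak": [0, 2, 2, 0],
}


def classify(profile, templates=TEMPLATES, min_r=0.8):
    """Return (class name, correlation) for the best-matching template.

    Profiles whose best correlation is below min_r are left "unclassified"
    instead of being forced into the nearest class.
    """
    scores = ((name, pearson(profile, t)) for name, t in templates.items())
    best_name, best_r = max(scores, key=lambda pair: pair[1])
    if best_r < min_r:
        return "unclassified", best_r
    return best_name, best_r


print(classify([1.0, 2.1, 2.9, 4.2])[0])  # rising
print(classify([5.0, 2.0, 2.5, 6.0])[0])  # dip
print(classify([1.0, 3.0, 1.0, 3.0])[0])  # unclassified
```

The cutoff is where the trade-off described above shows up: lowering it forces more genes into the nearest class and loses information, while raising it leaves more genes in classes of their own.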

Nonetheless, you may gather information about which genes tend to increase or decrease in concert with each other, and that may be useful.

Is it good science?

This is important to ask. If you are just "collecting data" and intend to dredge through it to find something interesting, that may not be good science. Science does not mean having mountains of data - it means having a testable hypothesis.

High density synthetic oligonucleotide arrays (Affymetrix)

Arrays of oligonucleotides can be synthesized in situ, using light-directed combinatorial chemical reactions. The techniques used include a combination of photolithography and solid-phase DNA synthesis. Synthesis proceeds by use of photochemically removable protecting groups on the 5' end of a growing oligonucleotide, and a photolithographic mask is used to specify which "spots" are deprotected for the next round of synthesis.

Here's an example of how you would use such a chip for DNA sequencing. Suppose you prepare a chip with four rows (as shown below), with 17-mer oligonucleotides synthesized in situ. In any given column, the oligonucleotides are identical except for a single nucleotide change in the middle of the 17-mer. Going from one column to the next one to its right, the sequence is offset by one nucleotide and a different nucleotide is in the center (and variable). Two columns are shown below as examples, with the variable nucleotides shown in red. The one sequence in each column that matches a "Target" is highlighted in pink.

Suppose the pink color represents hybridization of the labeled target to the spot of probe. This could be extrapolated well beyond these two columns, as follows, and the sequence could be read from left to right by a scanner and computer.

(redrawn from Lipschutz et al., Nature Genetics 21:20-24 [supplement], January 1999)

So... the laser scan image of the slide is interpreted by a computer program, which reads the following sequence (complementary to the pattern of hybridization): GATGAACTGTATCCGACATCT
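The computer's job here can be sketched in a few lines: for each column, find the brightest of the four spots, and report the complement of that probe's variable nucleotide. The row order (A, C, G, T), the intensity values, and the function name below are assumptions made for illustration, and the sketch ignores strand direction and background correction.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def call_bases(columns, row_bases="ACGT"):
    """Call one target base per column of four spot intensities.

    columns   -- list of columns; each column lists four intensities,
                 ordered like row_bases (the probe's central nucleotide).
    Returns the target sequence, i.e. the complement of the brightest
    probe's central nucleotide in each column.
    """
    called = []
    for column in columns:
        brightest = max(range(len(row_bases)), key=lambda i: column[i])
        called.append(COMPLEMENT[row_bases[brightest]])
    return "".join(called)


# Made-up intensities for three columns (rows: A, C, G, T probes).
scan = [
    [0.1, 0.9, 0.1, 0.1],  # C probe brightest -> target base G
    [0.1, 0.1, 0.2, 0.8],  # T probe brightest -> target base A
    [0.9, 0.1, 0.1, 0.1],  # A probe brightest -> target base T
]
print(call_bases(scan))  # GAT
```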

Note the scale of the segment of chip - each in situ oligo spot has a width of approximately 20-25 microns (about four times the width of a red blood cell). 300,000 polydeoxynucleotides could be placed on an array measuring 1.28 cm on each side. Since each sequence position requires four probes, that would be sufficient to determine the sequence of about 75 kb once.

Alternatively, the 75,000 determinations of sequence can be used to "fingerprint" a sample of DNA at single nucleotide polymorphism sites distributed throughout the genome.
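The capacity figures can be checked with simple arithmetic. The sketch below assumes a square array 1.28 cm (12,800 um) on a side, square features packed edge to edge, and four probes per sequence position; the function names are made up for this example.

```python
def features_per_array(side_um, feature_um):
    """Number of square features that fit on a square array."""
    per_side = side_um // feature_um
    return per_side * per_side


def positions_interrogated(n_probes, probes_per_position=4):
    """Sequence positions covered when each one needs four probes."""
    return n_probes // probes_per_position


print(features_per_array(12_800, 25))   # 262144
print(features_per_array(12_800, 20))   # 409600
print(positions_interrogated(300_000))  # 75000
```

The figure of 300,000 probes falls between the two feature counts, which is consistent with spots of 20-25 microns, and dividing by four gives the 75,000 positions (about 75 kb) mentioned in the text.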