Lecture 27

Combinatorial genetic engineering

Consider the natural example of clonal selection of B-lymphocytes in your immune system. A foreign antigen is tested for fit against the surface antibodies on many different B-lymphocytes. These B-lymphocytes express immunoglobulin genes that have been rearranged and mutated, and the repertoire of possible binding surfaces is quite large. Will one of them bind to the antigen?

Maybe so. That's one of the beautiful things about the human immune system - its capacity to respond to new antigens. If the fit of the antigen is good, that B-lymphocyte will be activated and will replicate, making antibody-secreting plasma cells and memory-B-cells.

We can make use of a similar strategy in molecular biology - building a random collection of molecules and selecting one (or more) that has the properties we want.

Degenerate oligos

A simple example that we've already discussed is the use of degenerate oligonucleotides during PCR synthesis. Remember- we discussed an example in which we had isolated a protein from an organism, determined a bit of amino acid sequence from several peptide fragments by Edman degradation, and cloned the gene using degenerate oligonucleotides.

We looked at this example:

The protein sequence PRETTYFLY could be encoded as follows, with the possible codon differences indicated vertically. As it turned out, there were (4)(2)(4)(2)(4)(4)(2)(2)(2)(4)(2) = 65,536 versions of the oligonucleotide.

      P   R   E   T   T   Y   F   L   Y

     CCA CGA GAA ACA ACA TAC TTC CTA TAC
       G   G   G   G   G   T   T   G   T
       C   C       C   C           C
       T   T       T   T           T
         A                       T

So - if we know that a gene has PRETTYFLY in it, one of these 65,536 oligonucleotides ought to match exactly. Probably many of them will match well enough at their 3' ends to get synthesis in a PCR reaction. If we only need 15 matching nucleotides, maybe we would get lots of different products that were PRETTYFLY:

      P   R   E   T   T   Y   F   L   Y

     CCA CGA GAA ACA ACA TAC TTC CTA TAC
       G   G   G
       C   C
       T   T
         A

In this example, there might be (4)(2)(4)(2)=64 versions of the oligo that work in a PCR, all corresponding to allowable degeneracy at the 5' ends of the oligos. It isn't perfect, though.
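The counting can be checked with a short script. This is only a sketch: it writes the degenerate oligo in IUPAC ambiguity codes (N = any base, R = A or G, Y = C or T, M = A or C), a notation these notes do not use, and the function name degeneracy is made up for illustration.

```python
# Number of bases that each IUPAC ambiguity code stands for.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def degeneracy(oligo):
    """Return how many distinct sequences a degenerate oligo represents."""
    count = 1
    for base in oligo:
        count *= IUPAC_SIZE[base]
    return count

# PRETTYFLY codon by codon (an assumed IUPAC rendering of the table above):
#          P      R      E      T      T      Y      F      L      Y
codons = ["CCN", "MGN", "GAR", "ACN", "ACN", "TAY", "TTY", "YTN", "TAY"]
full_oligo = "".join(codons)

print(degeneracy(full_oligo))            # all nine codons degenerate: 65536
print(degeneracy("".join(codons[:3])))   # only P, R, E degenerate: 64
```

The second number only counts the variable 5' portion. It assumes that the 3' portion of the oligo is held to a single sequence.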

There's a bit of a problem with leucines, serines, and arginines, in that they are 6-fold degenerate. If we used the degenerate oligo shown above, we would not only be specifying PRETTYFLY, but also PSETTYFLY, PRETTYFFY, and PSETTYFFY. That's because the second codon could be CGN or AGR, which is arginine, or AGY, which is serine. The eighth codon could be CTN or TTR, which is leucine, or TTY, which is phenylalanine. Well, it's really only a problem if you have competition in the reaction - how likely is it that your genome encodes both a PRETTYFFY protein and a PRETTYFLY protein?
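To see how the extra peptides arise, one can expand the degenerate oligo into every sequence it represents and translate each one. The sketch below assumes the standard genetic code and an IUPAC rendering of the oligo (CCN MGN GAR ACN ACN TAY TTY YTN TAY); the function name encoded_peptides is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Bases allowed by each ambiguity code used in this oligo.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "M": "AC", "N": "ACGT"}

def encoded_peptides(degenerate_oligo):
    """Translate every expansion of the oligo; return {peptide: count}."""
    counts = {}
    for bases in product(*(IUPAC[b] for b in degenerate_oligo)):
        seq = "".join(bases)
        peptide = "".join(CODON_TABLE[seq[i:i + 3]]
                          for i in range(0, len(seq), 3))
        counts[peptide] = counts.get(peptide, 0) + 1
    return counts

oligo = "CCNMGNGARACNACNTAYTTYYTNTAY"
for peptide, n in sorted(encoded_peptides(oligo).items()):
    print(peptide, n)
```

Under these assumptions, 36,864 of the 65,536 versions encode PRETTYFLY. The rest are split among the three unintended peptides.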

Making aptamers

Have I mentioned the RNA world?

RNA can do more than bind to other RNA molecules - it has the capability of driving reactions catalytically. For the moment, let's just focus on its ability to recognize other molecules. Suppose that we made a random collection of RNAs, using a T7 RNA polymerase promoter and a random collection of DNAs.


The red sequence on the left is the T7 RNA polymerase promoter. The two blue sequences might just be considered "tags" for recovery of products by PCR. The black "N" residues are a random segment - perhaps there might be 30 to 40 of them.

Suppose that you transcribed RNA from a mixture such as this, in vitro, using T7 RNA polymerase. What would you get? A lot of different RNAs, which would look something like this:
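A rough sketch of such a library, in Python, might help. The promoter is the usual T7 consensus, with transcription starting at the G that follows it. The two tag sequences and the function names are invented for illustration and are not the ones used in any real experiment.

```python
import random

T7_PROMOTER = "TAATACGACTCACTATA"   # T7 promoter; +1 is the next base (a G)
LEFT_TAG = "GGGACGTTCAGCATTCGA"     # hypothetical constant 5' tag
RIGHT_TAG = "TCGGATCCTGAGCATGAC"    # hypothetical constant 3' tag

def random_template(n_random=30, rng=random):
    """Sense strand of one DNA template: promoter, tag, N region, tag."""
    n_region = "".join(rng.choice("ACGT") for _ in range(n_random))
    return T7_PROMOTER + LEFT_TAG + n_region + RIGHT_TAG

def transcribe(template):
    """RNA copy of everything downstream of the promoter (T becomes U)."""
    return template[len(T7_PROMOTER):].replace("T", "U")

rng = random.Random(0)  # fixed seed so the example is repeatable
for _ in range(3):
    print(transcribe(random_template(32, rng)))
```

Every transcript shares the same two ends and differs only in the random middle segment.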


If you had 32 degenerate "N" residues, you could get 4^32 different products. That's about 1.8 x 10^19. Well, it would actually be difficult to get full representation, what with the Poisson distribution problem that we discussed a few weeks ago. Also, if you had just one of each of the possible sequences, you would have about half a gram of RNA! You would need about 100 ml of solution volume just to get it all into solution, and then each one would be so dilute that it would be hard to find.
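Here is the arithmetic as a small script. The transcript length and the average mass per nucleotide are assumptions chosen for illustration, not values given in the lecture.

```python
AVOGADRO = 6.022e23          # molecules per mole

n_random = 32
n_sequences = 4 ** n_random  # every possible 32-nt random segment

# Assumptions, not values from the lecture:
transcript_length_nt = 50    # random segment plus short constant tags
mass_per_nt = 330.0          # approximate g/mol per RNA nucleotide

moles = n_sequences / AVOGADRO
grams = moles * transcript_length_nt * mass_per_nt

print(f"{n_sequences:.2e} sequences")
print(f"{moles * 1e6:.1f} micromoles, about {grams:.2f} g of RNA")
```

With these assumed values the mass comes out near half a gram. A longer transcript would scale the mass up in proportion.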

Still, even a millionth of 1.8 x 10^19 is a lot of different RNA molecules! Suppose that you put this big collection of RNA molecules over a column to which was bound a ligand of some sort. Maybe a small target molecule. Some of the RNA molecules may wrap around the ligand and stick to the column. You can wash away the ones that don't stick.

You keep the ones that do stick - releasing them from the column and amplifying them in a RT-PCR by virtue of their constant "blue" ends.

You use reverse transcriptase for the first strand, of course, then you use polymerase chain reaction. When you do amplify them, you will want to use a long oligonucleotide that restores the T7 RNA polymerase promoter.

That way, you can transcribe the sequences after amplification, and have a collection of RNA molecules again.

So - you do that, and you try the binding experiment again, using all of the candidates that seemed to stick the first time around. You haven't isolated them as individual sequences - they're still all mixed together. This time, though, you have many copies of each candidate because you've amplified the mixture by PCR. After you have conducted your binding reaction, the ones that are more successful in sticking to the target will be better represented in the RNAs that are eluted from the column.

You amplify them the same way, and try it again. And again. And again.

We call this SELEX (Systematic Evolution of Ligands by EXponential enrichment).

Each time you amplify the mixture and apply it to the column, you get selection for the collection of oligonucleotides that bind the best.
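A toy model shows why repeated rounds are so effective. In this sketch each round keeps a fixed fraction of every species on the column and then amplifies the survivors back to a full pool. The retention probabilities and the starting frequency are invented numbers, and the model ignores PCR bias and other real-world complications.

```python
def selex_round(pool, retention):
    """One round: bind, wash, then amplify back to a normalized pool.

    pool      -- {name: fraction of the pool}
    retention -- {name: probability that a molecule stays on the column}
    """
    retained = {name: frac * retention[name] for name, frac in pool.items()}
    total = sum(retained.values())
    return {name: amount / total for name, amount in retained.items()}

# Hypothetical numbers: one good binder hidden at one part per million.
pool = {"binder": 1e-6, "background": 1 - 1e-6}
retention = {"binder": 0.5, "background": 0.01}

for round_number in range(1, 7):
    pool = selex_round(pool, retention)
    print(round_number, f"{pool['binder']:.4f}")
```

With these numbers the binder is enriched about 50-fold per round, so it goes from one part per million to the majority of the pool by the fourth round.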

The members of this collection would be called aptamers - molecules that are engineered to fit a target. Aptamers can be RNA or DNA, but let's concentrate on some of the examples of RNA aptamers that have been uncovered. They are organized on the RNAbase.org site, and here are a few examples:

Vitamin B12 RNA aptamer (D. Sussman et al.), 5 chains, PDB: 1DDY

Theophylline-binding RNA (G.R. Zimmermann et al.), PDB: 1EHT

Biotin-binding RNA pseudoknot (J. Nix et al.), PDB: 1F27

For practical (that is to say, commercial) uses, the sensitivity of RNA to environmental nucleases is a problem, but there are steps you can take to make RNA aptamers more stable. For example you can make the 2' hydroxyls into 2'-O-methyl groups, or amino or fluoro groups. That makes the RNA resistant to many environmental RNases. Or, you can make the enantiomer of the normal RNA, which will be resistant to many RNases.

Or - you can switch to DNA aptamers and go through a similar SELEX process.


Environmental remediation

"SELEX DNA Aptamer Filter for Removal of Pesticides and Chloroaromatics. OmniSite BioDiagnostics, Inc. (OmniSite) proposes to develop artificial receptors composed of DNA oligomers (called "aptamers") for binding and removal of organophosphorous and chlorinated pesticides."


"Eyetech's lead product Macugen(TM) (pegaptanib sodium) is an anti-VEGF [Vascular Endothelial Growth Factor] aptamer. The aptamer was discovered using SELEX technology. Macugen(TM) (pegaptanib sodium) is an oligonucleotide that acts like a high affinity antibody to VEGF. This anti-VEGF aptamer blocks blood vessel growth and inhibits neovascularization in pre-clinical models." http://www.eyetechpharmaceuticals.com/products/product_lead.asp

Infectious Disease Research

"Bent pseudoknots and novel RNA inhibitors of type 1 human immunodeficiency virus (HIV-1) reverse transcriptase" Donald H. Burke, Lori Scates, Katy Andrews, and Larry Gold. J. Mol. Biol. 264:650-666 (1996). ..."We have found several new RNA inhibitors of HIV-1 RT that differ significantly from the pseudoknot ligands found previously, along with a wide variety of pseudoknot variants." http://bl-chem-ernie.chem.indiana.edu/~dhburke/pub10.htm

"Expressing SELEX-derived aptamers and ribozymes in cells lets us model RNA World organisms and exploit the power of biological selection and rapid in vivo screens to optimize RNA function. In the experiment below, a collection of aptamers that bind the RT protein has been inserted next to the control signals for a reporter gene that turns cells blue. Protein expression is blocked by aptamers that bind the protein, turning those cells white." http://bl-chem-ernie.chem.indiana.edu/~dhburke/research.htm

picture source: http://bl-chem-ernie.chem.indiana.edu/~dhburke/research.htm


Anti-thrombin DNA aptamers "A fiber-optic biosensor based on DNA aptamers used as receptors was developed for the measurement of thrombin concentration. Anti-thrombin DNA aptamers were immobilized on silica microspheres, placed inside microwells on the distal tip on an imaging optical fiber, coupled to a modified epifluorescence microscope through its proximal tip. Thrombin concentration is determined by a competitive binding assay using a fluorescein-labeled competitor." http://www.protein.bio.msu.su/biokhimiya/contents/v67/full/67060850.htm

Resource reading:
Here is a tutorial on SELEX, from Indiana University.

RNA can do more than just hang on to things, of course. RNA has the capability of catalysis - we call RNA enzymes ribozymes.

Example of the SELEX method - isolation of a sulfur alkylating ribozyme (Wecker et al. 1996)
Substitution of 5'-phosphorothioate-RNA in N-bromoacetyl-bradykinin

It reacts ---> so it sticks!

SELEX approach:
1. Prepare a pool of 5 x 10^13 different phosphorothioate-RNAs, 76 nt in length, with the internal 30 nucleotides randomized (the ends are identical and used as tags, as before).
2. Incubate RNA pool with N-bromoacetyl-bradykinin
3. Partition reacted molecules on thiopropyl sepharose
4. Amplify pool of functional RNAs by RT-PCR
5. Transcribe in vitro to produce enriched RNA pool (5'-phosphorothioate-capped)
6. Repeat steps 2-5.

What just happened? Wecker et al. started with a combinatorial collection of RNA molecules, and gave them a job to do. Those that could do the job were retained on a column, and those that could not were washed away. Through multiple rounds of SELEX, an RNA with a specific desired catalytic activity was found. That sure beats protein engineering!
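It is worth noticing how little of the possible sequence space that starting pool covers. The sketch below simply compares the pool size from step 1 with the number of possible 30-nt random segments, assuming that every molecule in the pool really is different.

```python
pool_size = 5e13          # different RNAs in the starting pool (step 1)
n_random = 30             # randomized internal positions
possible = 4 ** n_random  # all possible 30-nt sequences

fraction_sampled = pool_size / possible
print(f"possible sequences: {possible:.2e}")
print(f"fraction sampled:   {fraction_sampled:.1e}")
```

Only a few sequences in every hundred thousand are ever tested, and the selection still finds a working ribozyme.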

There are many natural ribozymes - some of the earliest discovered by Cech et al. were self-splicing RNA molecules. Now we know that even the rRNA is catalytic in the peptidyl transferase reaction.
The ribosome is a ribozyme!

Another example - viroids are infectious RNA molecules (e.g. the 359 nt RNA of Satellite Tobacco Ringspot Virus) that replicate by rolling circle transcription from an RNA template. The polymeric product self-cleaves to yield individual monomer units of the genome.

Self-cleavage of viroid RNAs

Processing of the viroid genome depends on the presence of 13 required nucleotides, and the formation of a specific secondary structure surrounding the cleavage site.

Detail of the cleavage site

So here we have a secondary structure, a "hammerhead", that chelates a magnesium, and forms a self-cleaving structure. This can be put to use in many ways:


anti-Hepatitis C ribozyme May 11, 2000 "Administration of LY466700 to chronic Hepatitis C patients has now been initiated in a clinical trial designed to study safety and to assess the effect of the compound on HCV viral RNA levels following a 28 day dose-response regimen. The drug will be administered by a daily subcutaneous injection to approximately 20 patients."

ANGIOZYME(TM) "is the first chemically synthesized ribozyme to be studied in human clinical trials. ANGIOZYME(TM) specifically inhibits formation of the VEGF-r (Vascular Endothelial Growth Factor receptor), a key component in the angiogenesis pathway." http://www.slip.net/~mcdavis/database/angio183.htm

Anti-HIV ribozyme "The Hammerhead anti-gag ribozyme catalytically cleaves HIV-1 RNA within the gag open reading frame, blocking protein synthesis of the gag-encoded p24 capsid protein (1). The Hammerhead anti-gag ribozyme is introduced into cells through transformation of target cells with a ribozyme RNA expression vector" http://www.niaid.nih.gov/daids/dtpdb/000681.htm


Allozymes - Allosteric ribozymes "are a class of ribozymes that are activated to cleave a reporter RNA in the presence of a target analyte. The resulting signal from the cleavage of the reporter RNA can be readily measured. Allosteric ribozymes have multiple diagnostic applications, including detecting and quantifying a wide range of nucleic acids, proteins and small molecules." http://www.rpi.com/diagnost.jsp

Halfzymes(TM) - broadly applicable and is well suited for direct nucleic acid screening of blood products for viral contamination, determination of viral drug resistance, and for the detection of single nucleotide polymorphisms (SNPs) relevant to human health. ... In the absence of nucleic acid target, the technology lacks sequences required to form the catalytic core and to properly dock a tethered substrate RNA that serves as a reporter. A target nucleic acid supplies these sequences. http://www.rpi.com/diagnost.jsp

The idea of an RNA molecule binding to a ligand (similar to the allozyme) is not going to seem strange for very long! Winkler et al. described, in the Oct 31, 2002 issue of Nature, that mRNA can be involved in allosteric regulation. The example given was vitamin B1 biosynthesis in E. coli, where the mRNA encoding enzymes for biosynthesis can bind to thiamine. When this binding occurs, the mRNA changes conformation and the ribosome can no longer bind to the ribosome binding site.

Halfzymes - maybe they work like this. You start with a target, and design a ribozyme to match it so that you can form an activating secondary structure.

The half-ribozyme might look like this, with a fluorescent dye conjugated to one end and a quenching dye (one that prevents the fluorescence) on the other end. With the dye and the quench molecule in close proximity, the half-ribozyme is not fluorescent.

Then you add a sample that might have the target RNA. If it does, you might form a structure like this:

This might lead to cleavage of the ribozyme:

And then the fluorescent dye-labeled end would be released and could float away from the 3' quencher:

That would give a fluorescent signal that could be detected. Alternatively, the half-ribozyme could be affixed to a solid support, and a dye could simply be released for quantitation.

Here's an idea I had a few years back- couldn't get anyone interested in it, but I think it's nice. You could make two half-ribozymes, combining the target from one (T1 or T2) with the cleavage site from another (R2 or R1). It might look like this:

These two RNA molecules are tethered to a solid support so that they cannot reach each other and react. Obviously, if T1 happened to bind to R1, or if T2 happened to bind to R2, then a cleavage reaction would occur.

Now R1 happens to be designed so that it matches a target sequence (T1) that is of some interest - perhaps something we wish to see in a diagnostic test. Suppose a SINGLE molecule of the untethered, native target T1 happens to drift into this peaceful scene:

Then, the first ribozyme is complete, and it cleaves itself, releasing most of the molecule into solution. This released molecule contains the T2 target, which can then diffuse over to R2:

Well- there's a pretty sight. Now the second ribozyme self-cleaves, and releases its T1. That T1 goes on to release more T2, which goes on to release more T1 from the solid support. The reaction accelerates as a chain reaction! And all from a single molecular trigger.

Completely impractical, but interesting on a theoretical level.

Protein combinatorics

One way to work with libraries is to make a collection of synthetic products, and to plan the synthesis so that multiple versions are included. Here's an example:

Suppose we are looking at a protein that has a biochemical function, for example a potential drug that has a target, and we suspect that the binding function is associated with a specific loop, reading Glu-His-Cys-Pro-Asp.

Suppose we would like to find out what other amino acids (if any) will work in this loop, keeping the cysteine intact in all of the mutated versions. We could make a small library of all possible amino acid sequences. The original DNA might have looked like this:

sequence on left GAG CAC TGT CCA GAC sequence on right

We could make a degenerate oligonucleotide that covers this region and changes four out of the five codons:

5' sequence on left NNN NNN TGT NNN NNN sequence on right 3'

We could use this degenerate oligonucleotide (which might be about 45 nt in length overall) in PCR mutagenesis, or embed it into the sequence in place of the original.

Having changed four codons to "NNN" means that they can be any of the 20 amino acids. There are 20^4 different products, or 160,000. Of course occasionally the NNN will be something like "TAA" or "TAG" or "TGA", which will be a stop codon, and the amino acids will be represented by their frequency in the codon table (serine would be six times more common than methionine, for example). This is called a combinatorial library because you are testing all possible combinations.
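These numbers can be worked out from the codon table. The sketch below assumes the standard genetic code and that all 64 codons are equally likely at each NNN position.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
codon_table = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Codons per amino acid ('*' marks a stop codon).
per_residue = Counter(codon_table.values())

n_positions = 4  # the four NNN codons around the fixed cysteine
print("peptide sequences:", 20 ** n_positions)
print("DNA sequences:    ", 64 ** n_positions)
print("Ser/Met codon ratio:", per_residue["S"] / per_residue["M"])

p_no_stop = (1 - per_residue["*"] / 64) ** n_positions
print(f"clones with no stop codon: {p_no_stop:.1%}")
```

Under these assumptions roughly one clone in six carries at least one stop codon in the randomized loop.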

You could prepare this library and screen clones on the basis of function. If you are trying to improve a potential drug so that it has a higher binding activity, you might see which clones produce a product with the highest affinity constant.

Protein display systems

While those biochemists were busy, watching their columns go "drip drip drip", the molecular biologists did a favor for them! They created phage display libraries.

Think for a moment, about the problem of working with proteins. Aside from the bone-chilling time you have to spend in the cold-room, the molecules you work on don't even carry their genetic information with them. Wouldn't it be terrific if a protein just carried its nucleic acid coding sequence with itself, like a suitcase? Then if you found a protein you were particularly interested in, you could just look into the suitcase and pull out the gene sequence that encoded the protein.

That's essentially what we have with phage display libraries (and similar pili display libraries in E. coli). If you clone a random collection of coding sequences into a T7 phage vector coat protein (i.e. as a fusion between the coat protein of T7 phage and your random collection of sequences) then the protein encoded by the inserted sequence will be displayed on the outside of the phage. Why? Because the coat protein, which is assembled on the outside of the phage capsid, is now fused to the peptide encoded by the inserted sequence.

Why is this any help? Because now we can screen a library for phage that are displaying the very protein we are interested in, by any sort of binding assay. In the schematic below, phage are allowed to interact with a ligand bound to a solid support. Those that don't bind are washed away. Those that do bind are isolated, and their fusion gene is sequenced. After several sequential rounds of isolation, a pattern may emerge.

The "biopan" model

This method is called the Ph.D.(TM) kit (for "Phage Display") by New England Biolabs, Incorporated.

Here's an example of how this has worked in epitope mapping. In the given example, the phage display library contains random short segments of amino acids. Larger sequences may also be cloned into the phage display system, to assay native cDNAs for example, but a different T7 vector must be used.

Epitope Mapping of an Anti-Beta-Endorphin Monoclonal Antibody

The Ph.D.-12 library was panned against anti-beta-endorphin antibody in solution (10 nM antibody), followed by affinity capture of the antibody-phage complexes onto Protein A-agarose (rounds 1 and 3) or Protein G-agarose (round 2). Bound phage were eluted with 0.2 M glycine-HCl, pH 2.2. Selected 12-mer sequences from each round are shown aligned with the first 12 residues of beta-endorphin; consensus elements are boxed.

The results clearly show that the epitope for this antibody spans the first 7 residues of beta-endorphin, and that the bulk of the antibody-antigen binding energy is contributed by the first 4 residues (YGGF), with some flexibility allowed in the third position. Additionally, the conserved position of the selected sequences within the 12 residue window suggests that the free alpha-amino group of the N-terminal tyrosine is part of the epitope.


Description of Phage Display (from New England Biolabs)

Phage display describes a selection technique in which a peptide or protein is expressed as a fusion with a coat protein of a bacteriophage, resulting in display of the fused protein on the exterior surface of the phage virion, while the DNA encoding the fusion resides within the virion. Phage display has been used to create a physical linkage between a vast library of random peptide sequences to the DNA encoding each sequence, allowing rapid identification of peptide ligands for a variety of target molecules (antibodies, enzymes, cell-surface receptors, etc.) by an in vitro selection process called biopanning.

In its simplest form, biopanning is carried out by incubating a library of phage-displayed peptides with a plate (or bead) coated with the target, washing away the unbound phage, and eluting the specifically-bound phage. (Alternatively the phage can be reacted with the target in solution, followed by affinity capture of the phage-target complexes onto a plate or bead that specifically binds the target.) The eluted phage is then amplified and taken through additional cycles of biopanning and amplification to successively enrich the pool of phage in favor of the tightest binding sequences. After 3-4 rounds, individual clones are characterized by DNA sequencing and ELISA.

The Ph.D.-7 linear 7-mer library contains 2.0 x 10^9 independent clones, while the Ph.D.-C7C disulfide-constrained library contains 3.7 x 10^9 independent clones. Both libraries are sufficiently complex to contain most if not all of the 20^7 = 1.28 x 10^9 possible 7-mer sequences. In contrast, the Ph.D.-12 library, with 1.9 x 10^9 independent clones, represents only a very small sampling of the potential sequence space of 20^12 = 4.1 x 10^15 12-mer sequences.
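A rough estimate of how much of each sequence space is actually represented can be made with the Poisson approximation. This is a sketch under a simplifying assumption that every peptide is equally likely in each clone, which real libraries only approximate because codon usage is uneven. The function name expected_coverage is made up for illustration.

```python
import math

def expected_coverage(n_clones, n_possible):
    """Expected fraction of sequences present at least once.

    Uses 1 - exp(-n_clones / n_possible), assuming every sequence is
    equally likely in each independent clone (Poisson sampling).
    """
    return -math.expm1(-n_clones / n_possible)

libraries = {
    "Ph.D.-7":   (2.0e9, 20 ** 7),
    "Ph.D.-C7C": (3.7e9, 20 ** 7),
    "Ph.D.-12":  (1.9e9, 20 ** 12),
}
for name, (clones, possible) in libraries.items():
    print(f"{name}: {expected_coverage(clones, possible):.3g}")
```

Under this idealized model the two 7-mer libraries would cover most of the possible sequences, while the 12-mer library would cover well under one millionth of its sequence space.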


Stan Metzenberg
Department of Biology
California State University Northridge
Northridge CA 91330-8303

© 1996, 1997, 1998, 1999, 2000, 2001, 2002