Cistrons, Genes, and Exons
What is a gene?
Is a gene defined functionally or structurally?
Some people would define a gene as the segment of DNA that specifies the amino acid sequence of an expressed protein, but that leaves out rRNA "genes" and tRNA "genes". Such a definition would neglect the regulatory regions such as transcriptional promoters that are not transcribed but have functional importance in gene expression. It would also tend to ignore elements that are transcribed but not translated, such as 5' and 3' untranslated regions of mRNAs that may have roles in controlling the stability of the RNA, or intervening sequences (introns).
What is a "cis-trans" test?
This is a rather old example, but you should learn it to impress your elder colleagues at scientific meetings (i.e. those who sit at the bar after hours, talking about the good old days and wondering why young people don't know anything anymore).
Suppose you were studying a biochemical pathway that takes a substrate A to product B, and then uses B as a substrate in a second reaction to make product C:
A -----> B -----> C
You isolate two mutations, which will be called x and y, each of which can prevent A from being converted to C. Are they in the same gene? Here's how people used to find out.
It is possible to make bacteria partially diploid for a set of genes, using bacteriophage methods. If loss-of-function (recessive) mutations x and y are on the same DNA molecule we say that they are in "cis," and if they are on different DNA molecules (one on each copy) we say that they are in "trans."
That is true regardless of whether the x and y mutations are in the same functional unit or not, as the two examples show. In the cis configuration, the blue and magenta ovals (i.e. enzymes) can be made from the wild-type copy.
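Now the fun begins: if x and y are placed in trans, different results are possible. If x and y lie in different functional units, each DNA copy still supplies one working enzyme, and A is converted to C. If x and y lie in the same functional unit, neither copy can make that enzyme, and the pathway is blocked.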
This is called the cis-trans test, and mutations that behaved differently in the cis and trans configurations (as shown in the case on the right) were said to lie in the same "cistron," a word derived from "cis-trans".
(For more reading on complementation and cis-trans test, see the web site by Holmes and Jobling)
Remember that if you performed the cis-trans test, you didn't have the benefit of knowing where the blue and magenta boxes were. You only knew whether the biochemical reaction converted A to C, and whether the mutations were in cis or trans. It was the difference in behavior in "cis" and "trans" that led you to know whether x and y were mutations in the same cistron. The cistron was a functional definition (at the time) of a gene, and cistron is a word that remains in our vocabulary.
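The logic of the test can be written out as a small sketch. This is only an illustration, not a historical protocol: the function names, the "enzyme1"/"enzyme2" labels, and the idea of representing each DNA copy as a set of mutations are all assumptions made for the example. The `gene_of` mapping is exactly the information the experimenter did not have; the point is that the cis/trans comparison recovers it.

```python
def pathway_works(copy1, copy2, gene_of):
    """Return True if A is converted to C in a partial diploid.

    copy1, copy2: sets of recessive loss-of-function mutations carried on
    each of the two DNA copies.
    gene_of: maps a mutation name to the functional unit it inactivates
    (hypothetical labels, hidden from the experimenter).
    """
    for gene in set(gene_of.values()):
        copy1_intact = all(gene_of[m] != gene for m in copy1)
        copy2_intact = all(gene_of[m] != gene for m in copy2)
        # The mutations are recessive: one intact copy of a gene is enough.
        if not (copy1_intact or copy2_intact):
            return False
    return True


def same_cistron(gene_of):
    """Apply the cis-trans test to mutations x and y."""
    cis = pathway_works({"x", "y"}, set(), gene_of)
    trans = pathway_works({"x"}, {"y"}, gene_of)
    # A difference between cis and trans places x and y in one cistron.
    return cis != trans


print(same_cistron({"x": "enzyme1", "y": "enzyme2"}))  # False: they complement in trans
print(same_cistron({"x": "enzyme1", "y": "enzyme1"}))  # True: no complementation in trans
```

In both cases the cis arrangement works, because the second copy is fully wild type. Only the trans arrangement distinguishes the two situations.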
The cis-trans test was used in bacterial genetics, where genes are represented by a single block of contiguous sequence. As you know, starting in 1977 it was discovered that many eukaryotes have non-contiguous protein-coding genes that are assembled into a single coding region by the process of RNA splicing.
Looking just at the DNA surrounding a gene, we see that in eukaryotes the coding regions are divided into blocks of sequence (exons) that are separated by introns. They are shown below as colored blocks. Note also the regulatory sequences that affect expression, such as an enhancer, promoter, and transcription terminator (ter). This is a simplification - there may be multiple promoters with additional surrounding sequences responsible for activation and repression, and there may be multiple terminators.
It is a general rule that each gene in a eukaryotic organism needs to have its own transcriptional promoter.
Generally speaking, polycistronic transcription (in which multiple cistrons or genes are included on a single pre-mRNA transcript) is not performed in eukaryotes (there are exceptions to every rule of course).
As you know from our discussion of operons, prokaryotes generally do have polycistronic transcription, and prokaryotic ribosomes are capable of internal initiation in a transcript.
In prokaryotes, a single transcript may encode many different proteins, as genes are organized in operons under the control of a single transcriptional promoter.
When a pre-mRNA transcript is generated in eukaryotes, it may carry a number of regulatory elements (as shown below: SD = splice donor, SA = splice acceptor, PA = cleavage and polyadenylation site).
Cleavage of the transcript occurs approximately 30 nucleotides after an "AAUAAA" sequence (the distance varies, commonly 10 to 30 nucleotides), and a polyadenylate polymerase then adds the poly(A) "tail," which is approximately 20-250 adenylate residues in length.
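As a rough sketch of how one might look for a likely cleavage site in a sequence, the snippet below searches for the "AAUAAA" signal and reports a position a fixed distance downstream. The function name and the fixed `offset` of 30 are assumptions for illustration; the real distance varies, and other sequence elements also influence where cleavage occurs.

```python
def predicted_cleavage_sites(rna, signal="AAUAAA", offset=30):
    """Return candidate cleavage positions (0-based indices into rna).

    Each candidate lies `offset` nucleotides past the end of an
    occurrence of `signal`. Occurrences too close to the 3' end of
    the sequence to allow that offset are skipped.
    """
    sites = []
    start = rna.find(signal)
    while start != -1:
        site = start + len(signal) + offset
        if site <= len(rna):
            sites.append(site)
        # Continue the search one nucleotide further along.
        start = rna.find(signal, start + 1)
    return sites


# Toy transcript: 2 nt, the signal, then 40 nt of filler.
toy = "GG" + "AAUAAA" + "C" * 40
print(predicted_cleavage_sites(toy))  # [38]
```

Everything upstream of the reported index would be kept, and the poly(A) tail would be added at that point.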
Splicing depends on recognition of splice donor and splice acceptor sites, which are specified by a variety of nearby sequences. By and large, splice donors have a GU dinucleotide at the 5' end of the intervening sequence and splice acceptors have an AG dinucleotide at the 3' end of the intervening sequence.
A pre-mRNA might look like this:
Where the green nucleotides represent sequences in exons, and the aforementioned GU and AG are in red, then the spliced product would have the following structure:
I've overlaid the green mRNA sequence with red lines to indicate how a ribosome might read the triplet code. Note that splice junctions need not be placed right at the junctions between codons - there is really no relationship between the two. A codon at a splice can be assembled from two pieces.
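To make the point about codons and splice junctions concrete, here is a minimal sketch using an invented pre-mRNA. The sequences, the function names, and the convention of giving the intron as a (start, end) index pair are all assumptions for the example.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) half-open index pairs."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Check the GU ... AG rule at the ends of the intervening sequence.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError("intron does not follow the GU...AG rule")
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def codons(mrna):
    """Split a sequence into complete triplets, reading from the first base."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Invented example: exon 1 ends two bases into a codon.
exon1 = "AUGGC"
intron = "GUAAGUACUAACCCUUUCAG"
exon2 = "UUUAUAA"
pre_mrna = exon1 + intron + exon2

mrna = splice(pre_mrna, [(len(exon1), len(exon1) + len(intron))])
print(mrna)          # AUGGCUUUAUAA
print(codons(mrna))  # ['AUG', 'GCU', 'UUA', 'UAA']
```

The second codon, GCU, takes its first two bases from exon 1 and its last base from exon 2.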
Finding a splice junction is more difficult than just looking for GU and AG dinucleotides. While those are nearly universal consensus sequences, not every GU is a splice donor and not every AG is a splice acceptor. There are also many examples of "alternative" splicing in which a gene may be spliced in many different ways. Predicting splice junctions is an art (see several prediction programs online: pasteur and cshl) and it is critical in making sense of the large genome sequencing projects.
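A naive search shows why the dinucleotides alone are not enough. The sketch below lists every GU...AG pair in an invented pre-mRNA as a candidate intron. The sequence, the function name, and the `min_length` cutoff are assumptions for illustration; real prediction programs use far more information than this.

```python
def candidate_introns(pre_mrna, min_length=4):
    """List every (start, end) span that begins with GU and ends with AG.

    Indices are 0-based and half-open, so pre_mrna[start:end] is the
    candidate intervening sequence.
    """
    last = len(pre_mrna) - 1
    donors = [i for i in range(last) if pre_mrna[i:i + 2] == "GU"]
    acceptor_ends = [i + 2 for i in range(last) if pre_mrna[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends if a - d >= min_length]


# Invented pre-mRNA whose one "true" intron spans indices 5 to 25.
pre_mrna = "AUGGC" + "GUAAGUACUAACCCUUUCAG" + "UUUAUAA"
for start, end in candidate_introns(pre_mrna):
    print(start, end, pre_mrna[start:end])
```

Even in this 32-nucleotide toy sequence the true intron is only one of several candidates, and the number of false candidates grows quickly with the length of the transcript.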
The overall reaction may look something like this:
...Except that there is something wrong with the picture. The process requires a set of small nuclear ribonucleoprotein particles (snRNPs):
Determining the size of RNAs is often an important component of analysis. RNA can be analyzed on a gel in much the same way as DNA, but there are a few modifications in approach:
How does one "flatten out" the RNA so that it will run in a predictable fashion on a gel?
The denaturation of RNA is usually started prior to loading of a gel, and then steps are also taken to continue the denatured state during running of the gel.
When formamide is added to an aqueous solution, in a final concentration of approximately 40-50% v/v, it causes the hydrogen bonds in nucleic acids to be less stable. That is, the strands of nucleic acid (whether DNA or RNA) fall apart more easily, or we say that the "melting point has dropped." In the case of an RNA molecule, the prevention of hydrogen bonding between different parts of the same molecule prevents secondary structure formation.
A typical treatment for RNA before it is loaded on a gel is to resuspend it in a buffer containing 50% formamide, heat it to 95-100 C for 3 to 5 minutes, then plunge the sample tube into ice. Why ice? Because we do not want the sample to "reanneal," and a rapid drop in temperature discourages that from happening.
The gel often contains 7 to 8 M urea as a denaturant, if the matrix of the gel is polyacrylamide (i.e. a vertical gel). In that case the gel may be run "hot" as well -- approximately 50 to 55 C. The wattage of the gel electrophoresis provides the heat, and the front or back of the gel plates may have a large heat sink or other mechanism for distributing heat evenly over the surface. The heat and urea both prevent hydrogen bonding in the sample, so the bands are "sharp" and well-resolved.
If the gel is prepared with agarose, formaldehyde is often used as a denaturant (though that requires running the gel in a hood to avoid exposing your nose and eyes to the fumes). Formaldehyde is added to the agarose mix after it is boiled and cooled to approximately 60 C (you really don't want to melt your agarose with the formaldehyde mixed in!).
An alternate method is to pre-treat your RNA with glyoxal, which chemically modifies the bases to prevent secondary structure formation. After glyoxal treatment, the sample can be run on a more traditional gel that lacks denaturants.
In the good old days (pre 1980), people used to run "methyl mercury gels" as a denaturing gel electrophoresis method, but that was extremely dangerous.
A Northern blot is really the same sort of idea as a Southern blot, in the sense that a nucleic acid is being run on a gel and transferred to a membrane. What is different is that there are no restriction enzymes involved in the analysis. One is usually looking at an intact structure. The issues of probes and hybridization are largely the same, although one must remember that only one strand of RNA is usually present on the gel, unlike DNA, in which both strands may be present after denaturation.
What are cDNAs?
It is not feasible to clone RNA sequences directly. As the next best thing, we prepare a complementary DNA (cDNA) that contains the same genetic information but is in a form that can be ligated into a plasmid.
The method of cDNA synthesis
In order to copy the RNA sequence into complementary DNA sequence, we need to use an enzyme called "reverse transcriptase," which is an RNA-dependent DNA polymerase. This enzyme requires a primer, and for the generation of cDNAs from a collection of polyadenylated RNAs, we often use oligo(dT), a short stretch of deoxythymidylate residues.
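In sequence terms, the first strand is simply the DNA reverse complement of the mRNA, starting from the oligo(dT) primer annealed to the poly(A) tail. The sketch below is an idealized illustration: the function name, the `primer_length` of 12, and the assumption that the enzyme copies the whole template without stopping are all choices made for the example.

```python
# Base-pairing rules for copying RNA into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna, primer_length=12):
    """Return the first-strand cDNA, written 5' to 3'.

    Assumes the oligo(dT) primer anneals at the very 3' end of the
    poly(A) tail and that reverse transcriptase reaches the 5' end
    of the template (an idealization).
    """
    tail_length = len(mrna) - len(mrna.rstrip("A"))
    if tail_length < primer_length:
        raise ValueError("poly(A) tail too short for the oligo(dT) primer")
    # Read the template 3' to 5' and pair each base with its complement.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


mrna = "AUGGCUUUAUAA" + "A" * 15
print(first_strand_cdna(mrna))  # TTTTTTTTTTTTTTTTTATAAAGCCAT
```

The run of T residues at the 5' end of the product corresponds to the primer plus the rest of the copied poly(A) tail.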
The reaction shown above is called the "first strand synthesis"; it yields a heteroduplex of DNA and RNA base paired to each other. There are three common methods for initiating synthesis of a second strand to make the cDNA double stranded. Since one is now copying DNA, it is not necessary to use reverse transcriptase -- the issue at hand is how to prime the synthesis of the second strand.
The second strand synthesis
In the first method, the RNA still annealed to the cDNA is nicked by the enzyme RNase H. Each nick (free 3' hydroxyl) can be used as a primer by DNA polymerase I.
The DNA polymerase step also suffers from problems of extension, in that secondary structures can cause the enzyme to stop prematurely. If RNase H is used to generate free hydroxyls as primers, the 5'-most random nick in the RNA is as far as one can go towards copying the 3' end of the first cDNA strand. That is, one cannot expect complete copying of a second strand because the primers are randomly created.
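The consequence of random nicking can be put in numbers with a very small sketch. The coordinate convention (positions counted from the 5' end of the RNA), the function names, and the example nick positions are assumptions for illustration, and the sketch ignores premature stops by the polymerase.

```python
def second_strand_extent(rna_length, nick_positions):
    """Return the (start, end) span, in RNA coordinates counted from the
    RNA 5' end, that ends up represented in the second DNA strand.

    DNA polymerase I extends from each nick toward the RNA's 3' end,
    so coverage begins at the 5'-most nick. Returns None when there
    is no nick, because there is then no primer.
    """
    if not nick_positions:
        return None
    return (min(nick_positions), rna_length)


def uncopied_fraction(rna_length, nick_positions):
    """Fraction of the sequence, at the RNA 5' end, left uncopied."""
    extent = second_strand_extent(rna_length, nick_positions)
    if extent is None:
        return 1.0
    return extent[0] / rna_length


# Hypothetical 1000-nt RNA with three nicks.
print(second_strand_extent(1000, [750, 120, 430]))  # (120, 1000)
print(uncopied_fraction(1000, [750, 120, 430]))     # 0.12
```

Only the position of the 5'-most nick matters for how much of the end is lost, which is why a second strand made this way is rarely complete.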
Cloning a messy duplex DNA
A modern twist on the story is that if you know some of the sequence in your gene of interest, you can simply perform a PCR reaction on the first strand cDNA. This is called RT-PCR (for Reverse Transcription-Polymerase Chain Reaction).
The first strand would typically be made using a specific primer:
From this point, a PCR reaction can be used to make additional product, and the net result is that an RNA sequence can be determined from very little template.
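The PCR step can be mimicked on paper (or in a few lines of code) to predict which product a primer pair should give. The sketch below is a simplification: the sequences and function names are invented, primers must match exactly, and the template is written as the sense strand in DNA letters (the mRNA sequence with T in place of U).

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def amplicon(sense_template, forward_primer, reverse_primer):
    """Return the predicted PCR product (sense strand), or None.

    forward_primer matches the sense strand directly; reverse_primer
    is complementary to it, so its binding site on the sense strand
    is the primer's reverse complement.
    """
    start = sense_template.find(forward_primer)
    site = sense_template.find(reverse_complement(reverse_primer))
    if start == -1 or site == -1 or site < start:
        return None
    return sense_template[start:site + len(reverse_primer)]


template = "GGATGGCTTTATAAGCACTGAACC"
print(amplicon(template, "ATGGCT", "CAGTGC"))  # ATGGCTTTATAAGCACTG
```

In a real RT-PCR the gene-specific primer used for first strand synthesis can also serve as the reverse primer in the PCR.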
Department of Biology
California State University Northridge
Northridge CA 91330-8303
© 1996, 1997, 1998, 1999, 2000, 2001, 2002