One of the cool things about science is using new technologies to address previously intractable questions. In a paper just published, we did exactly that, creating the first genome-wide map of splicing branchpoints in the human genome. But what are branchpoints and why are they important?
Like many things in biology, RNA splicing is a pretty wondrous process. Small exonic regions are picked out of a vast intronic sea and pasted together to create a mature RNA. Many factors are required for splicing, including the proteins and small RNAs of the spliceosome, as well as nucleic acid sequence motifs that mark the borders of introns and exons and localise spliceosomal components on the RNA.
Crucial sequence features in introns include the 5′ and 3′ splice sites, which lie at the intron ends next to the upstream and downstream exons, and the branchpoint. The branchpoint, which is generally close to the 3′ end of the intron, is recognised early in the splicing process, and in recognising it the spliceosome selects the nearby downstream exon for inclusion in the mRNA. During splicing, the 5′ splice site and the branchpoint nucleotide are brought together and joined to form an intron lariat. This both frees the upstream exon and brings it into close proximity with the downstream exon, allowing the two exons to be joined and the intron lariat to be cut adrift (see figure below).
Branchpoints are a vital component of RNA splicing, and branchpoint mutations can disrupt proper splicing and cause diseases such as cancer. In yeast, branchpoints are easy to find because their sequence is almost always the same. This is not the case in humans, where the branchpoint sequence motif is known to vary considerably, making branchpoints difficult to identify confidently by sequence analysis. Compounding this issue, intron lariats are rare and transient, which makes branchpoints difficult to pinpoint experimentally. So, although the human genome contains hundreds of thousands of introns, only a few hundred human splicing branchpoints had previously been identified.
In our study, we tackled this issue using two complementary experimental techniques that enrich for branchpoint sequences within intron lariats. We identify ~60,000 branchpoints in >10,000 genes, providing the first genome-wide map of splicing branchpoints in the human genome. Having this many branchpoints allowed us to perform a much more comprehensive analysis than was previously possible, providing some cool new insights into branchpoints and their role in RNA splicing.
So how did we do it?
Firstly, we used RNA Capture Sequencing (RNA CaptureSeq), a form of targeted RNA sequencing. RNA CaptureSeq uses oligonucleotide probes as baits to pull out RNA regions of interest and is super sensitive. CaptureSeq has previously been used to find rare mRNAs, but it is also ideal for rare RNA processing intermediates. Here, instead of capturing exonic sequence, we targeted the 5′ ends of introns (which loop around to join the branchpoint) and the 3′ ends of introns (where we predict most branchpoints to be).
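In coordinate terms, targeting intron ends means turning each intron into two strand-aware capture windows. The sketch below assumes 0-based, half-open genomic coordinates; `Interval`, `capture_targets` and the default window lengths are hypothetical choices for illustration, not the probe design used in the paper.

```python
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def capture_targets(intron: Interval, five_len: int = 100,
                    three_len: int = 100) -> Tuple[Interval, Interval]:
    """Return (5' window, 3' window) for an intron, in genomic coordinates.

    The window lengths are hypothetical defaults; both are clipped to the
    intron length so that short introns still give valid intervals.
    """
    chrom, start, end, strand = intron
    length = end - start
    five_len, three_len = min(five_len, length), min(three_len, length)
    if strand == "+":
        # Plus strand: the intron's 5' end is its leftmost genomic base.
        five = Interval(chrom, start, start + five_len, strand)
        three = Interval(chrom, end - three_len, end, strand)
    elif strand == "-":
        # Minus strand: the intron's 5' end is its rightmost genomic base.
        five = Interval(chrom, end - five_len, end, strand)
        three = Interval(chrom, start, start + three_len, strand)
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return five, three


five_window, three_window = capture_targets(Interval("chr1", 1000, 2000, "+"), 100, 150)
print(five_window, three_window)
```

Flipping the windows on the minus strand matters because "5′ of the intron" is defined by the transcript, not by the genome's left-to-right order.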
Secondly, we used RNase R digestion. RNase R digests linear (but not circular) RNA, removing most RNA species and leaving behind lariats and other circRNAs. Although we don’t expect RNase R to home in on the branchpoint any more than on the rest of the intron (unlike CaptureSeq), it does give nice enrichment for intronic sequence, and hence for branchpoints too.
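Why lariats survive can be shown with a toy model. RNase R is a 3′→5′ exoribonuclease, so it needs a free 3′ end: linear RNAs are chewed away, circles are untouched, and in a lariat only the linear 3′ tail is exposed. The sketch below is a hypothetical illustration, not a model of real enzyme kinetics; in particular it assumes trimming stops exactly at the branchpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Molecule:
    seq: str                            # written 5' -> 3'
    kind: str                           # "linear", "circular" or "lariat"
    branch_index: Optional[int] = None  # lariats only: index of the branch nucleotide


def rnase_r_survivor(m: Molecule) -> str:
    """Toy RNase R digestion: return the part of the molecule left over."""
    if m.kind == "linear":
        return ""        # free 3' end: degraded all the way
    if m.kind == "circular":
        return m.seq     # no free end: untouched
    if m.kind == "lariat":
        if m.branch_index is None:
            raise ValueError("a lariat needs a branch_index")
        # The 3' tail is trimmed back to the branch (an assumption of this toy);
        # the loop (intron 5' end ... branchpoint) has no free end and survives.
        return m.seq[: m.branch_index + 1]
    raise ValueError(f"unknown kind: {m.kind!r}")


mix = [
    Molecule("AUGGCCAAGUAA", "linear"),                      # e.g. an mRNA fragment
    Molecule("GUAAGUUUCUAACAG", "lariat", branch_index=11),  # made-up intron
]
print([rnase_r_survivor(m) for m in mix])
```

Note that the surviving loop is all intronic sequence, which is why digestion enriches for introns (and branchpoints) without specifically targeting them.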
The figure below gives you an idea of how these methods work compared to standard RNA-seq:
The reason this works at all is that reverse transcriptase can read across the unusual 2′-5′ bond that joins the 5′ end of the intron to the branchpoint.
This means branchpoint sequences are present in RNA sequencing libraries. The difficulty (see figure below) is that these reads do not align to the genome in the standard way: the sequence ending at the branchpoint is joined directly to the start of the intron, in the reverse of their genomic order. But such a read gives you both the 5′ end of the intron and the branchpoint, so you know not only where the branchpoint is, but also which upstream exon was involved in this splicing event.
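Recovering a branchpoint from such a read amounts to finding a split where the downstream piece matches the intron start and the upstream piece ends somewhere further into the intron. The sketch below is a hypothetical, simplified illustration: it uses exact matching on a single known intron, ignores strand, sequencing errors and any mismatch at the branchpoint, and `branchpoint_candidates` and `min_anchor` are names invented here rather than the paper's pipeline.

```python
from typing import List


def branchpoint_candidates(read: str, intron: str, min_anchor: int = 5) -> List[int]:
    """Return 0-based intron positions consistent with `read` being a lariat read.

    A lariat read (sense orientation) is modelled as
        [sequence ending at the branchpoint] + [start of the intron],
    i.e. the two pieces appear in the reverse of their genomic order.
    Each piece must be at least `min_anchor` nucleotides long.
    """
    hits = set()
    for k in range(min_anchor, len(read) - min_anchor + 1):
        bp_part, five_part = read[:k], read[k:]
        # The downstream piece of the read must be the intron's 5' end.
        if not intron.startswith(five_part):
            continue
        # The upstream piece must end at a branchpoint lying beyond the 5' piece.
        start = intron.find(bp_part)
        while start != -1:
            bp = start + len(bp_part) - 1  # last base of the upstream piece
            if bp >= len(five_part):
                hits.add(bp)
            start = intron.find(bp_part, start + 1)
    return sorted(hits)


intron = "GTAAGTGGCC" + "TTTCTACTAAC" + "GCTTTTTCAG"  # made-up intron, DNA alphabet
read = intron[12:20] + intron[:6]                       # "TCTACTAA" + "GTAAGT"
print(branchpoint_candidates(read, intron))             # [19]
```

An ordinary, collinear read from the same intron yields no candidates, which is what separates lariat reads from the background.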
Our analysis of ~60,000 branchpoints shows they are “predominantly adenosine, highly conserved, and closely distributed to the 3′ splice site.” We find that multiple branchpoints within an intron are common.
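Statements such as "close to the 3′ splice site" and "multiple branchpoints per intron" rest on a strand-aware distance and a per-intron grouping. The sketch below is a hypothetical illustration: it assumes 0-based, half-open intron coordinates and measures distance to the intron's last base, a convention that may differ from the paper's, and the coordinates are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def distance_to_3ss(bp: int, start: int, end: int, strand: str) -> int:
    """Nucleotides between a branchpoint and the intron's last base.

    The intron spans [start, end) in 0-based genomic coordinates.
    """
    if not start <= bp < end:
        raise ValueError("branchpoint lies outside the intron")
    if strand == "+":
        return (end - 1) - bp
    if strand == "-":
        return bp - start  # minus strand: the intron's 3' end is its leftmost base
    raise ValueError(f"unknown strand: {strand!r}")


def per_intron_distances(
    records: Iterable[Tuple[str, int, int, str, int]]
) -> Dict[str, List[int]]:
    """Group (intron_id, start, end, strand, bp) records by intron."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for intron_id, start, end, strand, bp in records:
        grouped[intron_id].append(distance_to_3ss(bp, start, end, strand))
    return {k: sorted(v) for k, v in grouped.items()}


toy = [  # hypothetical coordinates, not data from the paper
    ("intronA", 1000, 2000, "+", 1975),
    ("intronA", 1000, 2000, "+", 1980),
    ("intronB", 5000, 6000, "-", 5030),
]
print(per_intron_distances(toy))
```

Here `intronA` carries two branchpoints, the kind of intron the text describes as common.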
During splicing, U2 snRNA base-pairs with the sequence around the branchpoint, an essential step in productive splicing. We analysed the conserved sequence motif that overlaps the branchpoint and interacts with U2 snRNA, which we term the Beta-box (B-box), and identified the following distinct features:
- The high density of G and U residues in U2 snRNA allows more base-pairing possibilities with Beta-boxes through G-U wobble base pairs. This allows high sequence diversity among Beta-boxes while maintaining Beta-box function, similar to previous observations in microRNA seed sequences.
- The type of base-pairing allowed between U2 snRNA and Beta-boxes makes Beta-box function resistant to disruption by common transition mutations (A-to-G and C-to-U changes in the RNA).
- The abundance of Beta-box families differs widely within the human genome and diverges between metazoan lineages. “Branchpoints with strong U2 binding (strong B-boxes) outcompete those with weak B-boxes … to specify exon inclusion. In addition, U2 binding strength positively correlates with both B-box occurrence and conservation, supporting the importance of the B-box to efficient splicing.”
- “B-box families preferentially associate with distinct classes of intron–exon architecture that can be distinguished by polypyrimidine tract nucleotide content, GC content, and conservation. It has been proposed these alternative architectures correspond to intron- and exon-defined splicing mechanisms. Therefore, B-box motifs contribute a further distinction between these two alternative architectures. This integration of multiple splicing features suggests the coevolution of B-box motifs with the surrounding sequence and their integration into the competitive and compensatory mechanisms that regulate splicing.”
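The first two points can be made concrete by counting. G pairs with C or U, and U pairs with A or G, so every G or U on the U2 side doubles the number of motif sequences that can pair with it; and because the two acceptable partners are always A/G or C/U, swapping one for the other (a transition) keeps the pair, while a transversion breaks it. The sketch below is a hypothetical illustration: it assumes the U2 branch-site-pairing region is `GUAGUA`, treats the duplex as a simple antiparallel helix, and leaves out the bulged branchpoint adenosine.

```python
from itertools import product

# Motif bases able to pair with each base on the U2 snRNA side.
WOBBLE_PARTNERS = {
    "A": {"U"},
    "C": {"G"},
    "G": {"C", "U"},  # G-C Watson-Crick or G-U wobble
    "U": {"A", "G"},  # U-A Watson-Crick or U-G wobble
}
WATSON_CRICK = {"A": "U", "C": "G", "G": "C", "U": "A"}
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}


def fully_pairs(motif: str, u2: str, wobble: bool = True) -> bool:
    """True if every motif base pairs with U2 in an antiparallel duplex."""
    if len(motif) != len(u2):
        return False
    for m, u in zip(motif, reversed(u2)):
        ok = (m in WOBBLE_PARTNERS[u]) if wobble else (m == WATSON_CRICK[u])
        if not ok:
            return False
    return True


def compatible_motifs(u2: str, wobble: bool = True) -> list:
    """Enumerate every motif of len(u2) that fully pairs with u2."""
    return ["".join(p) for p in product("ACGU", repeat=len(u2))
            if fully_pairs("".join(p), u2, wobble)]


def point_mutation_tolerance(motif: str, u2: str) -> dict:
    """Count [tolerated, total] single-base substitutions by mutation type."""
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for i, base in enumerate(motif):
        for alt in "ACGU":
            if alt == base:
                continue
            kind = "transition" if TRANSITION[base] == alt else "transversion"
            mutant = motif[:i] + alt + motif[i + 1:]
            counts[kind][1] += 1
            counts[kind][0] += int(fully_pairs(mutant, u2))
    return counts


U2_SEGMENT = "GUAGUA"  # assumed branch-site-pairing region of U2 snRNA
print(len(compatible_motifs(U2_SEGMENT, wobble=False)))  # 1
print(len(compatible_motifs(U2_SEGMENT)))                 # 16
print(point_mutation_tolerance("UACUAC", U2_SEGMENT))
```

With wobble pairing allowed, four of the six possible transitions in the perfectly paired motif keep full pairing, while none of the twelve transversions do.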
Looking at common and disease-associated SNPs at branchpoints, we find that branchpoints are ~3.1-fold depleted in common SNPs, while disease-associated SNPs are 16.5-fold enriched at branchpoints, where they can cause aberrant splicing in patients. A potential outcome of losing a branchpoint sequence is exon skipping, and we confirm that previously identified cancer-driving mutations in RB1 and the MET oncogene involve the elimination of branchpoint nucleotides.
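A fold enrichment or depletion of this kind compares the per-base density of variants at branchpoints with the density in some background. This post doesn't specify the background the paper used, so the sketch below is a generic, hypothetical illustration with made-up counts, not a reproduction of the reported numbers.

```python
def fold_change(hits_in_set: int, bases_in_set: int,
                hits_in_background: int, bases_in_background: int) -> float:
    """Per-base variant density in a site set relative to a background.

    A value > 1 means enrichment; a value < 1 means depletion, usually
    reported as "1 / value"-fold depleted.
    """
    if min(bases_in_set, bases_in_background, hits_in_background) <= 0:
        raise ValueError("need positive sizes and at least one background hit")
    # Cross-multiply so integer inputs need only a single division.
    return (hits_in_set * bases_in_background) / (bases_in_set * hits_in_background)


# Hypothetical counts, for illustration only.
enriched = fold_change(10, 100, 50, 5000)
depleted = fold_change(1, 310, 1, 100)
print(enriched, round(1 / depleted, 1))
```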
Finally, we looked at branchpoint usage in primate-specific exons, specifically Alu element exonizations. It has previously been noted that Alu elements are “pre-exons”, well placed for inclusion into mature RNA transcripts in the inverted orientation. Our results build on this, showing that inverted Alus have a strong Beta-box element just 5′ of an internal polyT tract in their native sequence. This cryptic Beta-box is widely used in exonized Alu elements and likely promotes their exonization.
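The positional part of that finding, a candidate motif sitting just 5′ of an internal polyT tract, can be sketched as a simple scan. This hypothetical helper only locates the candidate window and does not score Beta-box strength; `min_t`, `window` and the test sequence are arbitrary illustrative choices, not values from the paper or a real Alu sequence.

```python
import re
from typing import Optional, Tuple


def window_upstream_of_polyt(seq: str, min_t: int = 8,
                             window: int = 10) -> Optional[Tuple[int, str]]:
    """Find the first run of >= min_t T's and return the window just 5' of it.

    Returns (window_start, window_sequence), or None if no such run exists.
    """
    if min_t < 1:
        raise ValueError("min_t must be at least 1")
    match = re.search(f"T{{{min_t},}}", seq)  # e.g. the pattern "T{8,}"
    if match is None:
        return None
    start = max(0, match.start() - window)
    return start, seq[start:match.start()]


toy = "GCGCACTAAC" + "T" * 9 + "CAGG"  # made-up sequence, not a real Alu
print(window_upstream_of_polyt(toy))
```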
Reference: Mercer, Clark et al. Genome-wide discovery of human splicing branchpoints. Genome Research. 2015;25:290-303.