from Wikipedia, the free encyclopedia

RNA-Seq, also called "whole transcriptome shotgun sequencing", is the determination of the nucleotide sequence of RNA using high-throughput methods (next-generation sequencing). For this, the RNA is reverse-transcribed into cDNA so that DNA sequencing methods can be used. RNA-Seq reveals information on gene expression, such as how different alleles of a gene are expressed, the recognition of post-transcriptional modifications or the identification of fusion genes.

RNA-Seq is a modern sequence-based method built on next-generation sequencing, and it has clear advantages over other methods of transcriptome analysis. It helps to investigate complex transcriptomes and provides information about which exons are joined together in the messenger RNA. Low background noise, higher resolution and high reproducibility across technical as well as biological replicates are its main advantages.

Biological background

A cell uses only part of its genes: the housekeeping genes and the genes required for its specialization. For example, muscle cells have mechanical properties, while the β-cells of the islets of Langerhans produce insulin. All of these cells have identical genes, but they differ in their gene expression. Gene expression is the synthesis of gene products, such as proteins, from the information encoded in the DNA. Gene expression analysis, or transcriptome analysis, measures which genes are switched on or off. When a gene is switched on, parts of the gene are transcribed into mRNA. Methods of gene expression analysis, such as RNA-Seq, measure the concentration of the mRNA under various experimental conditions (e.g. with or without a drug). Gene expression analysis thus addresses the question of how the mRNA concentration changes in response to drugs, at different developmental stages of the cell, or in a healthy versus a diseased state.
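
The comparison of mRNA levels between two conditions can be sketched as follows. This is only a minimal illustration, not a method described in this article: the gene names and read counts are invented, and the counts-per-million scaling, the pseudocount and the function names are assumptions chosen for the example.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene after scaling both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes without reads.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }

# Hypothetical read counts, invented purely for illustration.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 400, "geneB": 400, "geneC": 200}

lfc = log2_fold_change(control, treated)
for gene, value in lfc.items():
    print(f"{gene}: {value:+.2f}")
```

A positive value indicates more mRNA of that gene in the treated sample, a negative value less, and a value near zero no change. Real analyses additionally model the variation between replicates.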

RNA-Seq can be used to better understand the mechanism of alternative splicing and fusion genes. Alternative splicing is the process in which a pre-mRNA is converted into different mRNAs and thus also into different proteins. Fusion genes are hybrid genes formed from two previously separate genes that have been joined into one. They arise through translocation, interstitial deletion or chromosomal inversion.
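
How one pre-mRNA can give rise to several mRNAs can be illustrated with a small sketch. The exon names and sequences are invented, and the model is deliberately simplified (an assumption of this example): exons marked as optional are either kept or skipped, while all other exons are always retained in their original order.

```python
from itertools import product

def splice_isoforms(exons, optional):
    """Enumerate mRNAs formed by keeping or skipping each optional exon.

    exons:    ordered list of (name, sequence) pairs
    optional: set of exon names that may be skipped
    """
    optional_names = [name for name, _ in exons if name in optional]
    isoforms = {}
    for choice in product([True, False], repeat=len(optional_names)):
        keep = dict(zip(optional_names, choice))
        # Exons that are not optional are always kept (default True).
        kept = [(name, seq) for name, seq in exons if keep.get(name, True)]
        label = "-".join(name for name, _ in kept)
        isoforms[label] = "".join(seq for _, seq in kept)
    return isoforms

# Hypothetical exons, invented purely for illustration.
exons = [("E1", "AUGGCU"), ("E2", "GAUUAC"), ("E3", "CCGUAA")]
iso = splice_isoforms(exons, optional={"E2"})
for label, sequence in iso.items():
    print(label, sequence)
```

A read that spans the junction between E1 and E3 can only come from the isoform that skips E2, which is the kind of evidence RNA-Seq provides about which exons are joined.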



Sample preparation

Most studies focus on the mRNA, which serves as the blueprint for proteins. However, about 90% of a cell's RNA is rRNA. To separate the mRNA from the rRNA, there are standardized methods for RNA purification, so-called "ribosomal depletion kits". For the subsequent sequencing the mRNA has to be fragmented, because sequencing techniques only support a limited read length. Fragmentation can take place either before conversion into cDNA (RNA fragmentation) or after it (cDNA fragmentation). cDNA fragmentation gives better results at the 5′ end but poor quality in the middle of the transcript, where RNA fragmentation performs better.
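
Why fragmentation is needed for a limited read length can be sketched with a toy simulation. It does not model any real protocol: the sequence, the fragment size range, the read length of 32 and the function names are all assumptions made for this example, and real fragmentation is neither uniform nor free of bias.

```python
import random

def fragment(sequence, min_len, max_len, seed=0):
    """Cut a sequence into consecutive pieces of random length.

    Every piece is at most max_len long; the last piece may be
    shorter than min_len.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    fragments = []
    pos = 0
    while pos < len(sequence):
        size = rng.randint(min_len, max_len)
        fragments.append(sequence[pos:pos + size])
        pos += size
    return fragments

def reads_from_fragments(fragments, read_length):
    """Sequence only the first read_length bases of each fragment."""
    return [f[:read_length] for f in fragments]

# Hypothetical transcript, invented purely for illustration.
transcript = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG" * 5
fragments = fragment(transcript, min_len=20, max_len=40, seed=1)
reads = reads_from_fragments(fragments, read_length=32)
print(len(transcript), "bases ->", len(fragments), "fragments")
```

Without fragmentation, a sequencer limited to 32 bases would only ever see the first 32 bases of the transcript; after fragmentation, reads start at many positions along it.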

When preparing the sample, one must decide whether to preserve the reading direction, i.e. strand-specific information. Doing so makes it possible to exclude artifacts that originate from antisense RNA (aRNA). By base pairing with its complementary mRNA, the aRNA inhibits the translation of that mRNA in the cell and thereby influences the expression of individual genes. However, strand-specific preparation is a very time-consuming and labor-intensive step.
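
The role of strand information can be sketched in a few lines. An antisense transcript is the reverse complement of the sense transcript, so its reads only match the sense sequence after being reverse-complemented. The sequences and the function names below are invented for illustration, and exact substring matching stands in for a real alignment.

```python
# Translation table for complementing a DNA (cDNA) sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def strand_of_origin(read, sense_transcript):
    """Classify a read as 'sense', 'antisense' or None (no exact match).

    A read that matches in both orientations is reported as 'sense'.
    """
    if read in sense_transcript:
        return "sense"
    if reverse_complement(read) in sense_transcript:
        return "antisense"
    return None

# Hypothetical sense transcript and reads, invented for illustration.
sense = "ATGGCCATTGTAATGGGCCGC"
print(strand_of_origin("GCCATTG", sense))  # read from the sense strand
print(strand_of_origin("CAATGGC", sense))  # read from the antisense strand
```

If the library preparation does not preserve the strand, both reads are simply reported as covering the same locus, and antisense transcription cannot be told apart from expression of the gene itself.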


There are now many high-throughput methods that convert the incorporation of a single nucleotide into the DNA into a measurable signal. These methods differ in how they are carried out. The following steps are an example of sequencing on the Illumina Genome Analyzer II:

  • Fragmentation of the cDNA
  • Purification and repair of the fragment ends
  • Ligation of adapters to the fragments
  • Size selection of the fragments on an agarose gel
  • PCR amplification
  • Purification and sequencing

Read mapping

Probably the greatest challenge in RNA-Seq data analysis is assigning the reads to the reference genome. This may seem trivial for a single read, but not for millions of reads: an established alignment tool such as BLAST needs 43 hours to assign 10 million reads with a length of 32 bp to the reference genome. It was therefore necessary to design new algorithms for read mapping.
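
The figures above can be turned into a per-read cost with a short calculation. Only the 43 hours and the 10 million reads come from the text; the assumption that runtime grows linearly with the number of reads, and the 100 million reads used in the extrapolation, are illustrative.

```python
HOURS = 43            # runtime quoted in the text
READS = 10_000_000    # number of reads quoted in the text

seconds_total = HOURS * 3600
ms_per_read = seconds_total / READS * 1000
reads_per_second = READS / seconds_total

def hours_needed(n_reads):
    """Extrapolate the runtime, assuming it scales linearly with read count."""
    return n_reads / reads_per_second / 3600

print(f"{ms_per_read:.2f} ms per read")
print(f"{reads_per_second:.1f} reads per second")
print(f"{hours_needed(100_000_000):.0f} hours for a hypothetical 100 million reads")
```

At roughly 15 ms per read, a sequencing run with ten times as many reads would occupy the aligner for well over two weeks, which is why faster mapping algorithms were needed.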

Read-mapping algorithms fall into two groups: those that take splicing into account, e.g. exon-first or seed-and-extend methods, and those that do not consider splicing, e.g. seed methods or Burrows-Wheeler aligners.
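
The idea behind Burrows-Wheeler aligners can be sketched with a toy index. This is a simplified illustration under stated assumptions, not the implementation of any particular tool: the suffix array is built by naive sorting, only exact matches are found, splicing is ignored, and the reference sequence and function names are invented.

```python
def build_index(reference):
    """Build a small Burrows-Wheeler index for exact matching."""
    text = reference + "$"  # "$" marks the end and sorts before A, C, G, T
    # Naive suffix array: fine for a toy example, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text that sort before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (1 if ch == c else 0)
    return sa, first, occ

def find_exact(read, index):
    """Return all start positions of read, using backward search."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffixes
    for c in reversed(read):  # extend the match one character to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

# Hypothetical reference sequence, invented purely for illustration.
index = build_index("ACGTACGTTAGC")
print(find_exact("ACGT", index))
print(find_exact("TAGC", index))
print(find_exact("GGG", index))
```

The search visits each character of the read once, so its cost depends on the read length rather than on the length of the reference. A read that spans an exon junction would not be found by such an unspliced search, which is the case the spliced methods address.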
