Gene expression analysis

from Wikipedia, the free encyclopedia
Heatmaps of gene expression data show how experimental conditions affect the production (expression) of mRNA for a group of genes. Green indicates reduced expression, while red stands for increased expression. Cluster analysis placed a group of downregulated genes in the upper left corner.

Gene expression analysis is the investigation of how genetic information is realized (gene expression) using molecular-biological and biochemical methods. It can be applied both to individual transcripts and to the entire transcriptome, and it allows qualitative and quantitative statements about the activity of genes. For a single transcript, only the sequence of that transcript needs to be known; transcriptome analysis requires the sequence information of the entire transcriptome.

Properties

In molecular biology, gene expression analysis can measure the activity and expression of thousands of genes simultaneously, which gives an overview of cellular functions. Gene expression profiles can be used, for example, to identify cells that are actively dividing or to show how cells respond to a specific treatment. Many such experiments examine the entire genome, i.e. every gene of a specific cell.

The DNA microarray technique measures the relative activity of previously identified target genes. Sequence-based methods such as serial analysis of gene expression (SAGE, SuperSAGE) are also used for gene expression analysis. SuperSAGE is particularly accurate because it is not limited to predefined genes but can measure any active gene. Since the introduction of next-generation sequencing, sequence-based “total transcriptome shotgun sequencing”, including RNA-Seq, has become increasingly popular as a digital alternative to microarrays. Nevertheless, a comparison of PubMed articles from 2015 containing the terms “microarray” or “RNA-Seq” shows that microarrays were still used about three times more often in publications.

Qualitative analysis:

  • Is a gene even expressed under the circumstances examined?
  • In which cells does expression take place (for example in situ hybridization )?

Quantitative analysis:

  • How great is the difference in expression compared to a defined reference?
    • healthy vs. diseased cells
    • wild-type vs. mutant cells
    • unstimulated vs. stimulated cells
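The quantitative question above (how large is the difference compared to a defined reference?) is commonly expressed as a log2 fold change. The following is only a minimal sketch; the function name and the example expression values are hypothetical and not taken from any particular dataset or library.

```python
import math


def log2_fold_change(sample_level: float, reference_level: float) -> float:
    """Return log2(sample / reference) for two positive expression levels.

    Positive values mean higher expression than the reference,
    negative values mean lower expression.
    """
    if sample_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(sample_level / reference_level)


# Hypothetical measurements: stimulated vs. unstimulated cells.
lfc_up = log2_fold_change(400.0, 100.0)    # fourfold increase
lfc_down = log2_fold_change(50.0, 100.0)   # halved expression
print(lfc_up, lfc_down)
```

On the log2 scale, a doubling and a halving are symmetric (+1 and -1), which is one reason this scale is often preferred over raw ratios.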

Depending on the method, the products of the different levels of gene expression are analyzed:

Expression analysis is the logical next step after sequencing a genome: the sequence shows what a cell could possibly do, while an expression profile shows what the cell is actually doing. Genes contain the information needed to make messenger RNA (mRNA), but at any given time a cell transcribes only a fraction of its genes into mRNA. A gene that is being read (expressed) to produce mRNA is considered “on”; otherwise it is considered “off”. Many factors determine whether a gene is on or off, including the time of day, the local environment, chemical signals from other cells, and whether the cell is actively dividing. Skin cells, liver cells, and nerve cells express partly different sets of genes, which explains a large part of the differences between these cells. An expression analysis therefore allows conclusions about, among other things, the cell type and the condition and environment of a cell.

Expression analyses often examine the relative amounts of mRNA expressed under two or more experimental conditions. An altered level of a specific mRNA indicates a changed demand for the protein encoded by that mRNA. Such a change could reflect either a homeostatic response or a pathological condition. For example, increased levels of the mRNA coding for alcohol dehydrogenase suggest that the cells or tissues examined are responding to an increased alcohol concentration. If breast cancer cells express more mRNA for a specific transmembrane receptor than normal cells do, this could indicate that the receptor plays a role in breast cancer. A drug that inhibits this receptor could then be used to prevent or treat breast cancer. During drug development, gene expression analysis can help assess a compound's toxicity by looking for changes in the expression level of a biomarker for drug metabolism, such as cytochrome P450. Gene expression analysis could thus become an important diagnostic tool.

Methods

  • With in situ hybridization, the sequence-specific RNA of a defined gene or gene set is detected in the tissue and the local gene expression pattern is determined.

Various methods can be used after RNA purification:

  • In the Northern blot method, RNA is first isolated and separated electrophoretically by size in a gel. After transfer to a membrane (blotting), the RNA sequence of interest is detected with labeled probes (radioisotopes, fluorescent dyes) made of complementary RNA or DNA that bind to it by base pairing. As a rule, only a small number of sequences are examined simultaneously.
  • In the RNase protection assay, RNA that hybridizes with specific, radioactively labeled antisense RNA probes is protected from degradation by RNases that degrade single strands. The protected RNA molecules are separated by gel electrophoresis, detected by autoradiography, and quantified.
  • With the nuclear run-on assay, RNA segments associated with transcriptionally active RNA polymerase II can be identified in the genome. The method is based on labeling and detecting nascent RNA. Detection is done either by hybridizing the RNA with individual target sequences in a dot blot or, as nuclear run-on sequencing based on next-generation sequencing, at the level of the whole transcriptome.
  • With DNA microarrays or macroarrays, the amount of mRNA of a large number of genes from cells of a culture or tissue can be determined simultaneously. To do this, the mRNA is isolated and reverse-transcribed into cDNA. Detection takes place via complementary hybridization of the labeled cDNA (radioisotopes, fluorescent dyes) with the probes on the DNA array.
  • With serial analysis of gene expression (SAGE), and in particular SuperSAGE, the expression of theoretically all genes of a cell can be determined very precisely: a short sequence fragment (the "tag") is generated from each transcript, and as many of these tags as possible are sequenced. The advantages over microarrays are the much more accurate quantification of transcripts and the possibility (especially with SuperSAGE) of identifying new transcripts (e.g. non-coding ribonucleic acids such as microRNAs or antisense RNAs) and of examining organisms with previously unknown genomes.
  • In differential display (also called DDRT-PCR or DD-PCR), the change in gene expression at the mRNA level is compared between two eukaryotic cell samples. Because randomly selected primers are used, no prior knowledge of the mRNAs present is required.
  • Real-time quantitative PCR (qPCR) is a variant of the polymerase chain reaction (PCR). Dyes or special probes added to the reaction mixture monitor the product concentration during the PCR. The change in concentration over time allows conclusions about the initial concentration of the nucleic acid in question. Alternatively, digital PCR is used for quantification.
  • With "total transcriptome shotgun sequencing", also known as RNA-Seq, the aim is to determine the transcriptome of a cell or tissue, i.e. the quantitative distribution of as many transcripts as possible. The method places high demands on bioinformatic analysis.
  • Ribosome profiling, the detection of all RNAs of a cell that are bound to ribosomes at a given time and are therefore probably being translated, was made possible by RNA-Seq. Sequencing the ribosome footprints (Ribo-Seq) allows mapping at nucleotide resolution.
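The real-time quantitative PCR entry in the list above states that the course of product concentration allows conclusions about the starting amount. A minimal sketch of that reasoning follows, assuming an idealized reaction in which the product grows by a constant factor (1 + efficiency) per cycle and both samples are read at the same fluorescence threshold. The function name, the threshold-cycle values, and the default efficiency of 1.0 (perfect doubling) are assumptions for illustration; real analyses usually also normalize against a reference gene.

```python
def relative_start_amount(ct_sample: float, ct_reference: float,
                          efficiency: float = 1.0) -> float:
    """Ratio of starting template (sample / reference) from threshold cycles.

    Model: N_c = N_0 * (1 + efficiency) ** c. Both reactions cross the same
    threshold amount, so N_0(sample) / N_0(reference)
    = (1 + efficiency) ** (ct_reference - ct_sample).
    """
    if efficiency <= 0:
        raise ValueError("efficiency must be positive")
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


# Hypothetical threshold cycles: the sample crosses the threshold
# three cycles earlier than the reference.
ratio = relative_start_amount(ct_sample=22.0, ct_reference=25.0)
print(ratio)
```

A sample that reaches the threshold earlier started with more template; under the doubling assumption, each cycle of difference corresponds to a factor of two.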

At the protein level, too, there are a number of methods for comparing the occurrence of individual or a large number of proteins.

  • The occurrence of individual proteins in different biological samples can be compared using a classic Western blot analysis. Proteins are separated according to properties such as size, electrical charge, or isoelectric point and then detected with antibodies. Usually only a manageable number of gene products are examined simultaneously.
  • Differential analysis of 2D gels allows the expression of up to 10,000 proteins to be examined at the same time. For this purpose, protein extracts are obtained from cell cultures or tissues, separated two-dimensionally according to isoelectric point and molecular mass, and detected and quantified by labeling or various (fluorescence) staining techniques. The measured intensities of different samples are compared with each other, and the expression behavior is thus monitored across different conditions.
  • In protein arrays, analogous to DNA arrays, the amount of certain proteins is examined. The numerous interactions between proteins and other molecules are used for detection, e.g. enzyme-substrate, antibody-antigen, or receptor-messenger interactions.
  • In recent years, the mass spectrometric analysis of protein mixtures has become increasingly important. The “spectral abundance” is used to count protein fragments, i.e. peptides, in a way similar to the counting of DNA fragments derived from the RNAs present in SAGE, and their frequency is thus quantified. Other methods use the signal intensity of certain peptides in the mass spectrum as a measure of frequency.

Many methods use fluorescent dyes that are coupled to the probes (RNA probes, antibodies, etc.) and are made visible by fluorescence spectroscopy or fluorescence microscopy. The latter offers the advantage of high spatial resolution. In addition, radioactively labeled probes are used, as are probes that convert chromogens into dyes by means of coupled enzymes.

Comparison with proteomics

The human genome contains around 25,000 genes that work together to produce around 1,000,000 different proteins. This diversity arises mainly through post-translational modifications, so that a single gene can serve as a template for many different versions of a protein. About 2,000 proteins (0.2% of the total) can be identified in a single mass spectrometry experiment. Knowing which individual proteins a cell produces (proteomics) is more relevant than knowing how much mRNA each gene produces. However, gene expression analysis gives the most comprehensive overview that can be obtained in a single experiment.

Use to develop and test hypotheses

Sometimes a scientist already has a hypothesis and performs a gene expression analysis to test it. In other words, the scientist makes a specific prediction about the expected expression levels, which may turn out to be right or wrong. More often, gene expression analyses are carried out before it is known how the test conditions affect the expression of particular genes. In that case there is initially no testable hypothesis, but the analysis can help identify candidate genes for future experiments. Most early and many current gene expression analyses have this form, which is referred to as class discovery. A popular approach to class discovery is to divide similar genes or samples into clusters using the k-means algorithm or hierarchical cluster analysis. The figure (see above) shows the result of a two-dimensional clustering in which similar samples (rows) and similar genes (columns) are arranged close together. The simplest form of class discovery is a list of all genes whose expression changed by more than a certain value between experimental conditions. Class prediction is more difficult than class discovery, but it can answer questions of direct clinical relevance, such as how likely it is that a patient with a certain profile will respond to a drug under investigation. This requires many example profiles from patients who did or did not respond to the drug, as well as cross-validation methods to distinguish between these profiles.
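The k-means clustering mentioned above can be sketched in a few lines. This is a deliberately small illustration, not the procedure of any specific analysis package: the gene names and log-ratio profiles are hypothetical, and the first k profiles are used as starting centroids purely to keep the example deterministic (practical implementations usually use random or more careful initialization).

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points (lists of floats) into k groups.

    Assumption for this sketch: the first k points serve as initial centroids.
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log2 expression ratios of four genes under three conditions.
profiles = {
    "geneA": [2.1, 1.9, 2.3],
    "geneB": [-1.8, -2.2, -2.0],
    "geneC": [2.0, 2.2, 1.8],
    "geneD": [-2.1, -1.9, -2.3],
}
labels, centroids = kmeans(list(profiles.values()), k=2)
clusters = dict(zip(profiles, labels))
print(clusters)
```

In this toy data set the upregulated genes (geneA, geneC) and the downregulated genes (geneB, geneD) end up in separate clusters, which mirrors the grouping of downregulated genes described for the heatmap.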

Limitations

In general, gene expression analyses reveal genes whose expression level differs in a statistically significant way between test conditions. For several reasons, this is usually only a small fraction of the entire genome. First, as a direct consequence of their differentiation, different cell types and tissues express only part of all genes, while the remaining genes are switched off. Second, many genes code for proteins that must stay within a certain concentration range for the cell to survive, so their expression level does not change. Third, besides changing the amount of mRNA, a cell has alternative mechanisms for regulating proteins, so some genes are expressed at a constant level even though the amount of the protein they encode fluctuates. Fourth, financial constraints limit gene expression analysis to a small number of observations of the same gene under identical experimental conditions, which weakens the statistical power of the experiment and makes minor changes in expression impossible to identify. Finally, discussing the biological significance of every regulated gene would be too laborious, so scientists often restrict the discussion of results to a specific group of genes. Although newer microarray analysis methods automate certain aspects of interpreting the biological relevance of the results, this remains a very difficult challenge. Because the gene lists published from gene expression analyses are usually relatively short, the extent to which they can be compared with results from another laboratory is limited. Depositing the results in a public database (microarray database) allows scientists to compare expression patterns beyond the information contained in publications and, where applicable, to find matches with their own data.

Validation of high throughput methods

Both DNA microarrays and qPCR use the preferred binding of complementary nucleic acid sequences ("base pairing") and both methods are used, often in succession, to create gene expression profiles. While DNA microarrays have a high throughput rate, they lack the high quantitative accuracy of qPCR. However, in the same time it takes to determine the expression of a few dozen genes with qPCR, the entire genome can be examined using DNA microarrays. For this reason, it often makes sense to first carry out a semi-quantitative DNA microarray analysis to identify candidate genes, which can then be validated and more precisely quantified using qPCR. Additional experiments, such as Western blotting of the proteins of differentially expressed genes, can help to substantiate the results of the gene expression analysis, since mRNA concentrations do not necessarily correlate with the amount of protein expressed.

Statistical analysis

The analysis of microarray data has become an intensive research area. The earlier practice of simply stating that a group of genes was up- or downregulated by a factor of 2 lacks a solid statistical basis. With the 5 or fewer replicates per group that are typical for microarrays, a single outlier can produce an apparent difference of more than twofold. In addition, the arbitrary rule that changes of at least a factor of 2 are significant has no biological basis, since it excludes many genes of obvious biological importance whose expression varies too little. Instead of identifying differentially expressed genes with a fold-change threshold, various statistical tests or omnibus tests such as ANOVA can be used. Such tests take into account both the fold change and the variability to generate a p-value, a measure of how often data like these would be observed purely by chance. Applying these tests to microarray data is complicated by the large number of comparisons made across the individual genes. For example, a p-value of 0.05 is usually taken as a sign that the data are significant, since it indicates only a 5% probability of observing such data purely by chance. However, with 10,000 genes on a microarray, a threshold of p < 0.05 means that about 500 genes are expected to be incorrectly identified as significantly up- or downregulated even though there is no real difference between the test conditions. An obvious solution is to consider only genes that meet more stringent p-value criteria, e.g. by applying a Bonferroni correction or by adjusting the p-values to the number of parallel tests while taking the false-positive rate into account. Unfortunately, such approaches can reduce the number of significant genes to zero even if some of these genes are actually differentially expressed.

Common statistical methods such as the rank product method try to strike a balance between falsely calling genes that changed only by chance (false positives) and failing to detect genes that are differentially expressed (false negatives). Frequently cited methods include the significance analysis of microarrays (SAM) as well as numerous other methods available in software packages from Bioconductor or from various bioinformatics companies. Choosing a different test usually yields a different list of significant genes, because each test makes specific assumptions and emphasizes different properties of the data. Many tests assume that the data are normally distributed, since this is often a reasonable starting point and gives results that appear more significant. Some tests consider the joint distribution of all gene observations to estimate the general variability of the measurements, while others look at each gene individually. Many modern microarray analysis techniques involve bootstrapping, machine learning, or Monte Carlo methods. The more replicates a microarray experiment has, the more similar the results of the various statistical methods become. Conversely, when the results of different statistical methods agree poorly, microarrays appear less reliable. The MAQC Project gives scientists recommendations for choosing standard methods so that experiments in different laboratories become more consistent.
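The multiple-testing problem described in this section can be made concrete with a short sketch. It computes the expected number of false positives for 10,000 genes at p < 0.05 and compares a Bonferroni correction with the Benjamini-Hochberg procedure. Benjamini-Hochberg is used here as one plausible example of a less strict adjustment; the text itself does not name it, and the five p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag p-values that pass the Bonferroni-corrected threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value lies under alpha * rank / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank
    # Every p-value up to that rank is called significant.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Expected false positives if none of 10,000 genes truly differs.
expected_false_positives = 10_000 * 0.05

# Hypothetical p-values for five genes.
p_values = [0.001, 0.008, 0.02, 0.045, 0.30]
strict = bonferroni(p_values)
relaxed = benjamini_hochberg(p_values)
print(expected_false_positives, strict, relaxed)
```

With these example values, Bonferroni keeps two genes while Benjamini-Hochberg keeps three, illustrating the trade-off between false positives and false negatives discussed above.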

Gene annotation

With the help of statistical methods, genes whose products change under experimental conditions can be reliably identified. For a meaningful interpretation of expression profiles, it is essential to know which protein is encoded by which gene and what function it has. This process is called gene annotation. Some annotations are more reliable than others, and sometimes they are completely absent. Gene annotation databases are constantly changing, and different databases use different names for the same protein, reflecting a change in understanding of its function. The use of standardized gene nomenclature avoids the problem of different naming, but the exact assignment of transcripts to genes remains an important challenge.

Classification of Regulated Genes

The next step after identifying a group of differentially regulated genes is to look for patterns within that group. Do the proteins these genes code for have similar functions? Are they chemically similar? Are they located in similar cell compartments? Gene ontology analysis provides a common way of defining these relationships. Gene ontology starts with a very broad top-level category, e.g. "Metabolic process", and then subdivides it into smaller subcategories such as "Carbohydrate metabolism", which in turn can be divided into more specific subgroups such as "Phosphorylation of inositol and derivatives". In addition to their biological function, chemical properties, and cellular location, genes have other properties. For example, genes can be classified into groups based on their relationship to other genes, their connection with diseases, or their interaction with drugs or toxins. The Molecular Signature Database and the Comparative Toxicogenomics Database allow genes to be categorized in a wide variety of ways.
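The nesting of categories described above can be represented as a simple parent lookup. The sketch below uses only the three category names from the text; the single-parent structure and the function name are simplifying assumptions, since the actual Gene Ontology allows a term to have several parents.

```python
# Child -> parent mapping built from the example categories in the text.
# Assumption: one parent per term (the real ontology is a graph, not a tree).
PARENT = {
    "Phosphorylation of inositol and derivatives": "Carbohydrate metabolism",
    "Carbohydrate metabolism": "Metabolic process",
    "Metabolic process": None,
}


def lineage(term):
    """Return the term followed by all of its ancestors, most specific first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


path = lineage("Phosphorylation of inositol and derivatives")
print(" < ".join(path))
```

Walking up the hierarchy in this way lets a gene annotated with a very specific term also be counted under each broader category.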

Pattern recognition between regulated genes

The Ingenuity Gene Network Diagram arranges genes with a known relationship in a dynamic network. Green stands for reduced expression, while red indicates increased expression. The algorithm also includes non-regulated genes (white) to improve connectivity

When regulated genes are sorted by what they are and what they do, important relationships between different genes can emerge. For example, you could get an indication that a particular gene encodes a protein that creates an enzyme that in turn activates a protein that regulates a second gene on our list. This second gene could be a transcription factor , which in turn regulates another of our candidate genes . By observing these cross-connections, we can surmise that these are more than random associations and that all of these genes are on our list because they are part of an underlying biological process. On the other hand, genes that are independent of one another and that are completely randomly selected could also give the impression that they are part of a common process, although this is not the case.

Cause and effect relationships

With simple statistical tools, one can estimate whether associations between genes in a list are stronger than would be expected purely by chance. These simple statistics are interesting even though they greatly oversimplify the actual situation. Here is an example: suppose an experiment covers 10,000 genes, of which only 50 (0.5%) are known to play a role in cholesterol synthesis. The experiment identifies 200 regulated genes, and 40 (20%) of these are also on the list of cholesterol genes. Based on the overall frequency of cholesterol genes (0.5%), one would expect on average only 1 cholesterol gene per 200 regulated genes. This is just an average, i.e. sometimes more than 1 gene in 200 can be expected. The question is how often one would see 40 genes instead of 1 by pure chance. According to the hypergeometric distribution, one would need about 10^57 experiments (a 10 followed by 56 zeros) before a random selection of 200 genes from a pool of 10,000 contained 39 or more cholesterol genes. Whether or not one appreciates how negligibly small this probability is, one would in any case conclude that the list of regulated genes is enriched in cholesterol-associated genes.

It could further be hypothesized that the experimental conditions influence cholesterol regulation, since they selectively regulate genes that are associated with cholesterol. While this may be true, there are several reasons why it would be inappropriate to jump to this conclusion. On the one hand, gene regulation does not necessarily have a direct influence on protein regulation. Even if the proteins these genes code for do nothing but produce cholesterol, a change in their mRNA concentration says nothing about changes at the protein level. It is quite possible that the amount of these cholesterol-associated proteins remains constant under the experimental conditions. On the other hand, even if the amount of protein changes, there may still be enough protein to maintain cholesterol synthesis at maximum speed, because the rate-determining step and limiting factor in cholesterol synthesis is not one of these proteins but another protein that is not on the list. In addition, proteins typically have many different functions, so these genes may be regulated not because of their common association with cholesterol synthesis but because of a common function in a completely independent process.

While gene profiles by themselves cannot prove a causal relationship between experimental conditions and biological effects, for the reasons given above, they do offer unique biological insights into relationships that would be very difficult to obtain in any other way. These relationships between different genes via the regulator proteins they express are mapped graphically using gene regulation networks. Such network models are derived with bioinformatic methods from the results of gene expression analysis in conjunction with prior knowledge from molecular-biological databases. This data-based network modeling is called network inference.
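The cholesterol example in this section can be checked numerically. The sketch below evaluates the hypergeometric tail probability with exact integer arithmetic from the Python standard library; the function name is hypothetical, and the numbers (10,000 genes, 50 cholesterol genes, 200 regulated genes, 39 or more hits) are those given in the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(population, successes, draws, k_min):
    """P(X >= k_min) when drawing `draws` items without replacement
    from `population` items, of which `successes` are of interest."""
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(k_min, min(successes, draws) + 1)
    )
    return Fraction(favourable, total)


# Expected number of cholesterol genes among 200 randomly chosen genes.
expected_hits = 200 * 50 / 10_000

# Probability of 39 or more cholesterol genes among 200 random genes.
p_tail = float(hypergeom_tail(10_000, 50, 200, 39))
print(expected_hits, p_tail, 1 / p_tail)
```

The reciprocal of the tail probability is the number of random draws one would expect to need before seeing such an overlap, which can be compared with the order of magnitude quoted in the text.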

Using patterns to recognize regulated genes

As described above, one can first identify genes with significantly altered expression and then look for relationships by comparing the list of significant genes with a group of genes that have a known association. The problem can also be approached the other way around. Here is an example: 40 genes are associated with a known process, e.g. a predisposition to diabetes. Take two sets of expression profiles, one from mice fed a high-carbohydrate diet and one from mice on a low-carbohydrate diet. Comparing the two sets shows that all 40 diabetes genes are expressed at a higher level in the high-carbohydrate group than in the low-carbohydrate group. Regardless of whether any of these genes would have ended up on a list of significantly altered genes, the fact that all 40 were upregulated and none downregulated can hardly be the result of sheer coincidence. It would be like tossing "heads" 40 times in a row, which has a probability of about one in a trillion.

A group of genes whose combined expression pattern uniquely characterizes the experimental conditions represents the gene signature of these conditions for a particular cell type. Ideally, such a gene signature can be used to identify a group of patients who are at a certain stage of a disease, which makes it easier to select the appropriate treatment. Gene Set Enrichment Analysis (GSEA) is based on this type of logic but uses more sophisticated statistical methods, since genes often behave in more complex ways than simply being up- or downregulated as a group. In addition, the extent of the up- or downregulation matters, not just its direction. In any case, such statistical methods measure how much the behavior of a small group of genes differs from that of genes that do not belong to the group. GSEA uses a Kolmogorov-Smirnov-style statistic to determine whether any previously defined gene groups behave unusually in the current expression profile. This creates the challenge of testing multiple hypotheses, but adequate methods exist to address this problem.
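Both ideas from this section can be illustrated briefly: the coin-toss probability for 40 genes all changing in the same direction, and a running-sum statistic in the Kolmogorov-Smirnov style. The running sum below is an unweighted simplification written for illustration only; it is not the GSEA software's implementation, and the gene names and ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted Kolmogorov-Smirnov-style running-sum statistic.

    Walk down the ranked list, stepping up for genes in the set and down
    for all others; return the most extreme deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need genes both inside and outside the set")
    running = 0.0
    most_extreme = 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(most_extreme):
            most_extreme = running
    return most_extreme


# Chance that 40 independent genes all change in the same given direction,
# assuming up and down are equally likely for each gene.
p_all_up = 0.5 ** 40

# Hypothetical ranking of 100 genes from most up- to most downregulated.
ranked = [f"g{i}" for i in range(100)]
es_top = enrichment_score(ranked, ranked[:10])      # set at the top
es_bottom = enrichment_score(ranked, ranked[-10:])  # set at the bottom
print(p_all_up, es_top, es_bottom)
```

A gene set concentrated at the top of the ranking yields a score near +1, one concentrated at the bottom a score near -1, and a set scattered evenly through the list a score near zero.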

Conclusions

Gene expression analysis provides new information about how genes behave under various conditions. By and large, microarray techniques produce reliable expression profiles. Based on these data, new biological hypotheses can be formulated or existing hypotheses tested. However, the scope and complexity of these experiments often lead to a variety of different interpretations. In many cases, analyzing the expression profiles requires significantly more time and effort than the original experiment that generated the data. Many scientists use several statistical methods and preliminary data analysis and consult biostatisticians or other experts in microarray technology before publishing the results of gene expression analyses. A good experimental design, an adequate number of biological replicates, and repeated experiments play a key role in performing successful gene expression analyses.
