MA plot

Within computational biology, an MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M (log ratio) and A (mean average) scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.^[1]^[2]

Explanation

Microarray data is often normalized within arrays to control for systematic biases in dye coupling and hybridization efficiencies, as well as other technical biases in the DNA probes and the print tip used to spot the array.^[3] By minimizing these systematic variations, true biological differences can be found. To determine whether normalization is needed, one can plot Cy5 (R) intensities against Cy3 (G) intensities and see whether the slope of the line is around 1. An improved method, which is basically a scaled, 45 degree rotation of the R vs. G plot is an MA-plot.^[4] The MA-plot is a plot of the distribution of the red/green intensity ratio ('M') plotted by the average intensity ('A'). M and A are defined by the following equations.

M=\log _{2}(R/G)=\log _{2}(R)-\log _{2}(G)

A={\frac {1}{2}}\log _{2}(RG)={\frac {1}{2}}(\log _{2}(R)+\log _{2}(G))

M is, therefore, the binary logarithm of the intensity ratio (or difference between log intensities) and A is the average log intensity for a dot in the plot. MA plots are then used to visualize intensity-dependent ratio of raw microarray data (microarrays typically show a bias here, with higher A resulting in higher |M|, i.e. the brighter the spot the more likely an observed difference between sample and control). The MA plot puts the variable M on the y-axis and A on the x-axis and gives a quick overview of the distribution of the data.

In many microarray gene expression experiments, an underlying assumption is that most of the genes would not see any change in their expression; therefore, the majority of the points on the y-axis (M) would be located at 0, since log(1) is 0. If this is not the case, then a normalization method such as LOESS should be applied to the data before statistical analysis. (On the diagram below see the red line running below the zero mark before normalization, it should be straight. Since it is not straight, the data should be normalized. After being normalized, the red line is straight on the zero line and shows as pink/black.)

Packages

Several Bioconductor packages, for the R software, provide the facility for creating MA plots. These include affy (ma.plot, mva.pairs), limma (plotMA), marray (maPlot), and edgeR(maPlot)

Similar "RA" plots can be generated using the raPlot function in the caroline CRAN R package.

An interactive MA plot to filter genes by M, A and p-values, search by names or with a lasso, and save selected genes, is available as an R-Shiny code Enhanced-MA-Plot.

Example in the R programming language

library(affy)

if (require(affydata)) 
{
     data(Dilution)
}

y <- (exprs(Dilution)[, c("20B", "10A")])

x11()

ma.plot( rowMeans(log2(y)), log2(y[, 1])-log2(y[, 2]), cex=1 )

title("Dilutions Dataset (array 20B v 10A)")

library(preprocessCore)

#do a quantile normalization
x <- normalize.quantiles(y)

x11()

ma.plot( rowMeans(log2(x)), log2(x[, 1])-log2(x[, 2]), cex=1 ) 
title("Post Norm: Dilutions Dataset (array 20B v 10A)")

MA Plot for raw data, note of the distortion in the upper ranges.

Pre Normalization

Post Normalization

MA Plots.

References

↑ Robinson, M. D.; McCarthy, D. J.; Smyth, G. K. (11 November 2009). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics. 26 (1): 139–140. doi:10.1093/bioinformatics/btp616. PMC 2796818. PMID 19910308.
↑ Love, Michael I; Huber, Wolfgang; Anders, Simon (5 December 2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". Genome Biology. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMC 4302049. PMID 25516281.
↑ YH Yang, S Dudoit, P Luu, DM Lin, V Peng, J Ngai, TP Speed. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research vol. 30 (4) pp. e15.
↑ Dudoit, S, Yang, YH, Callow, MJ, Speed, TP. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sin. 12:1 111–139

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Robinson, M. D.; McCarthy, D. J.; Smyth, G. K. (11 November 2009). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics. 26 (1): 139–140. doi:10.1093/bioinformatics/btp616. PMC 2796818. PMID 19910308.

[2] Love, Michael I; Huber, Wolfgang; Anders, Simon (5 December 2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". Genome Biology. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMC 4302049. PMID 25516281.

[3] YH Yang, S Dudoit, P Luu, DM Lin, V Peng, J Ngai, TP Speed. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research vol. 30 (4) pp. e15.

[4] Dudoit, S, Yang, YH, Callow, MJ, Speed, TP. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sin. 12:1 111–139

[1]

[2]

[3]

[4]

Explanation

Packages

Example in the R programming language

See also

References