NMR-Based Metabolomics Studies: From Class Separation To Biological Significance
by Yann Bidault, Michelle D'Souza, Chen Peng, and Gregory M. Banik

Introduction
Metabolomics studies apply mathematical and statistical treatments to spectroscopic data in order to correlate variations in the spectroscopic signals with changes in biological conditions. NMR has emerged as a major analytical technique for measuring metabolic changes in biological systems. Although NMR spectroscopy is usually less sensitive than mass spectrometry (MS), it requires little sample preparation and provides quick results from small samples, with very good reproducibility within and across labs. In addition, NMR is more quantitative than MS.

Transforming NMR spectra into multivariate distributions is relatively straightforward, but deriving biological significance from these data is very challenging. We present how advanced chemometrics tools, closely integrated within a spectroscopy data management and analysis platform, can lead to efficient interpretation of high-resolution 1H NMR data. Figure 1 presents the workflow we used to reach this goal. Two examples illustrate the flexibility and efficiency of this approach and its capacity to extract useful information starting from raw data.

Materials and methods
The ATHK dataset (dataset 2) consists of NMR spectra from metabolic extracts of wild type (wt), ATHK1 knockout (At), and ATHK1 overexpressor (35s) Arabidopsis thaliana plants that were germinated and grown in sterile liquid cultures. This study aims to understand the role of ATHK1, a putative membrane histidine kinase, as the osmolyte sensor for the plant HOG1 pathway. It was hypothesized that ATHK1 mutants would have altered steady-state concentrations of the established osmolytes when the plants were exposed to saline media.

For both datasets, original FID signals were processed into NMR spectral databases using the KnowItAll Informatics System (Bio-Rad Laboratories, Informatics Division, Philadelphia, PA), version 7.8.4. The strong water peak at 4.6-5.0 ppm was removed from all spectra.

Full-resolution spectra can be used directly for principal component analysis (PCA). However, experimental variations can leave spectra slightly misaligned even after a global spectrum alignment, so it can become necessary to bin the signals before transferring them for PCA. The "IntelliBucket" method is a variable-width binning algorithm developed in-house. The bin width can be tuned automatically within a user-defined range so that a whole peak falls in a single bin. The KnowItAll system also offers fixed-width bucketing and AFNS (Automated Filtering of NMR Spectra), a method first proposed by Ian Lewis of John Markley's group at the University of Wisconsin.(4) AFNS uses a rolling binning algorithm, multiple bin widths, and t-statistic-based filtering to identify significant features in complex spectra. It can convert large spectral datasets (normally 32K or 64K points per spectrum) into a few hundred statistically significant points while keeping the original digital resolution.
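The pre-processing ideas described above (removing the water region, fixed-width bucketing, and filtering variables by a t-statistic) can be illustrated with a short NumPy sketch. It is not the IntelliBucket or AFNS algorithm, whose internals are not described here; it uses plain fixed-width bins and a simple Welch t-statistic as stand-ins. The function names, the 0.04 ppm bin width, the |t| threshold of 5, and the synthetic spectra are all assumptions made for illustration.

```python
import numpy as np

def remove_region(ppm, spectra, lo=4.6, hi=5.0):
    """Drop every point whose chemical shift lies in [lo, hi] ppm."""
    keep = (ppm < lo) | (ppm > hi)
    return ppm[keep], spectra[:, keep]

def fixed_width_buckets(ppm, spectra, width=0.04):
    """Sum intensities in consecutive bins of `width` ppm (assumed width).

    Returns the centres of the non-empty bins and an
    (n_spectra, n_bins) matrix of bucket sums.
    """
    start = ppm.min()
    idx = np.floor((ppm - start) / width).astype(int)  # bin index per point
    bins = np.unique(idx)                               # skip empty bins
    binned = np.stack([spectra[:, idx == b].sum(axis=1) for b in bins], axis=1)
    centres = start + (bins + 0.5) * width
    return centres, binned

def welch_t(a, b):
    """Per-variable Welch t-statistic between two groups of spectra (rows)."""
    va = a.var(axis=0, ddof=1) / a.shape[0]
    vb = b.var(axis=0, ddof=1) / b.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb)

def significant_variables(a, b, threshold=5.0):
    """Boolean mask of variables whose |t| exceeds an assumed threshold."""
    return np.abs(welch_t(a, b)) > threshold

# Synthetic demo: two groups of 5 spectra; group b has an extra peak at 3.0 ppm.
rng = np.random.default_rng(0)
ppm = np.linspace(0.0, 10.0, 1001)
spectra = rng.normal(0.0, 0.1, size=(10, ppm.size))
spectra[5:] += 10.0 * np.exp(-0.5 * ((ppm - 3.0) / 0.02) ** 2)

ppm_cut, cut = remove_region(ppm, spectra)
centres, buckets = fixed_width_buckets(ppm_cut, cut)
mask = significant_variables(buckets[:5], buckets[5:])
print("bucket centres flagged as significant (ppm):", centres[mask])
```

Fixed-width bins are the simplest option. They can split a peak across two buckets, which is the limitation that variable-width approaches are meant to address.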
Results and discussion

Figure 2 presents PCA results for dataset 1 at full resolution (2a) and for dataset 2 after AFNS application (2b). Dataset 1 did not require any specific binning because the sample spectra were reasonably well aligned. PCA was therefore conducted at full resolution, with data pre-processing kept to a minimum (mean centering, data normalization, and variable subtraction to set the baseline Y value to 0), resulting in very good diabetic/non-diabetic class separation.

Dataset 2 (ATHK) comprises six combinations of genotype and growth condition. The samples were produced under controlled concentration and the spectra were normalized to the area of the DSS peaks. No further Y-transform was needed prior to PCA. Dataset 2 could not be separated very well using full-resolution or "IntelliBucketed" spectra because of local peak misalignment. Applying the AFNS algorithm prior to PCA, which selected about 10K significant points from the original 17K points, achieved the best separation. Although AFNS applies various bin widths, the resulting loadings plot retains the same resolution as the original spectra and is suitable for high-quality automatic or visual interpretation. PCA following AFNS resulted in good class separation, except for two classes, wt (wild type) and At (ATHK1 knockout) in low saline media, which is consistent with the biological interpretation presented below.

From separation to biological interpretation: correlating PCA loadings to identified metabolite peaks

In the ATHK study (dataset 2), it was hypothesized that ATHK mutants would have distinct osmolyte profiles when exposed to high salt media but more similar metabolic profiles under low salt conditions. As seen in Figure 2b, AFNS followed by PCA supported this prediction and demonstrated that the high and low salt conditions for every genotype could be discriminated with a single principal component.
The Factor 1 PCA loadings plot, where peaks of identified metabolites were labeled (Figure 3), confirms there are significant differences in osmolyte concentrations between the three genotypes. These data support the observation of lower ATHK1 knockout viability and higher over-expressor viability in 100 mM NaCl in comparison to the small phenotypic variation observed in low salt media.
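How a scores plot and a Factor 1 loadings plot relate to the underlying spectra can be sketched with a plain SVD-based PCA on mean-centered data. This is a minimal illustration, not the chemometrics package used in the study: the helper names `pca` and `top_loading_variables` are hypothetical, and the data are synthetic (two classes that differ in a single variable).

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a (samples x variables) matrix via SVD of the mean-centered data.

    Returns scores (samples x components), loadings (components x variables),
    and the fraction of variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # mean centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

def top_loading_variables(loadings, axis_values, factor=0, k=5):
    """Variables with the largest absolute loading on one factor.

    Returns (axis value, loading) pairs, e.g. (ppm, loading) for spectra.
    """
    order = np.argsort(np.abs(loadings[factor]))[::-1][:k]
    return [(float(axis_values[i]), float(loadings[factor, i])) for i in order]

# Synthetic demo: 10 samples x 50 variables; the second class (rows 5-9)
# has a higher intensity at variable 30 only.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 0.1, size=(10, 50))
X[5:, 30] += 5.0

scores, loadings, explained = pca(X)
top = top_loading_variables(loadings, np.arange(50), k=3)
print("Factor 1 scores:", np.round(scores[:, 0], 2))
print("largest |loading| on Factor 1 (variable, loading):", top)
```

In this toy case the two classes separate along Factor 1, and the variable with the largest absolute loading is the one that differs between them. The same reasoning underlies reading metabolite peaks from a loadings plot.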
Converting PCA loadings into query spectrum and searching a reference metabolite database

In the diabetic study, the PCA scores plot presented in Figure 2a cleanly separated the diabetic and non-diabetic classes along Factor 1. The Factor 1 loadings were then converted into a query spectrum (Figure 4a) that was used to search our current metabolite reference database of 279 NMR spectra (original data obtained from the University of Wisconsin, Madison).(6) Glucose was identified as one of the top chemicals distinguishing the diabetic class (Figure 4b).

Projecting a reference database into the PCA scores space

Database projection(7) represents an alternative method to confirm these findings. It consists of projecting a database of reference metabolites into the PCA scores space and filtering the data by distance and X-residuals to select only those metabolites that best account for class separation. As can be seen in the database projections resulting from the study, glucose and related compounds were in the vicinity of the "diabetic" space, confirming the results of the metabolite database query.

Conclusion

Our research indicates that, depending on the nature of the dataset under investigation, successful class discrimination requires a flexible approach with multiple binning/bucketing and data filtering options. A dedicated cheminformatics platform that integrates spectroscopy data processing and management, chemometrics, and access to metabolite reference databases is a powerful working environment for extracting biological meaning from spectra and ultimately turning data into knowledge.

Acknowledgements

We appreciate the chemometrics technology provided by Infometrix Inc. and the help from Dr. Brian Rohrback and Dr. Scott Ramos. We gratefully acknowledge the support from our users Dr. Bin Xia and Tao Wang of Beijing University, and Dr. John Markley and Ian Lewis of the University of Wisconsin.
Their ideas and guidance have been critical for us to understand the science of NMR-based metabolomics and to continually fine-tune our approach.

About the authors

Yann Bidault, a former product manager with Bio-Rad Laboratories, is a cheminformatics consultant based in Gap, France. Michelle D'Souza, Chen Peng, and Gregory M. Banik are all with Bio-Rad Laboratories, Informatics Division.

More information about the KnowItAll Metabolomics edition is available from: Bio-Rad Laboratories, Informatics Div. 888 524 6723 www.knowitall.com/metabolomics

References

1. Robertson, D.G. Metabonomics in Toxicology, a review. Toxicol. Sci. 85(2):809-822 (2005).
2. Nicholson, J.K., Lindon, J.C., Holmes, E. 'Metabonomics': understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29:1181-1189 (1999).
3. Fiehn, O. Combining genomics, metabolome analysis, and biochemical modeling to understand metabolic networks. Comp. Func. Genom. 2:155-168 (2001).
4. Bidault, Y., Peng, C., Banik, G.M., Ramos, S., Rohrback, B., Lewis, I., Markley, J. Variable-width binning or no binning: a study of different binning methods in NMR-based metabolomics analysis. Metabomeeting3, 2006 (poster).
5. IPAK 4.0, http://www.infometrix.com/software/softdesc.html.
6. Original data available at: http://www.bmrb.wisc.edu/metabolomics/metabolomics_standards.html.
7. Dieterle, F., Ross, A., Schlotterbeck, G., Senn, H. Metabolite Projection Analysis for Fast Identification of Metabolites in Metabonomics. Application in an Amiodarone Study. Anal. Chem. 78(11):3551-3561 (2006).
© 2006 Advantage Business Media. All rights reserved.