MALDI Quality Tests
Introduction
In MicrobeMS, spectral quality tests can be used to systematically evaluate the quality of bacterial MALDI-ToF mass spectra against predefined quality criteria and, if necessary, to exclude spectra from further analysis. For this purpose, MicrobeMS automates four different quality tests (QT) on selected MALDI-ToF mass spectra and determines a quality score for each test. The following quality criteria of a mass spectrum are assessed: (a) mass spectral noise, (b) baseline shape, (c) number of peaks per spectrum, and (d) mean resolving power of the identified mass peaks. Each score ranges from 0 (poor quality, test failed) to 100 (excellent quality). A weighted overall QT score is then calculated from the four individual scores. All QT parameters are adjustable through the MicrobeMS software. The results of quality testing are presented in an HTML document in which spectral quality is encoded by a traffic light scheme. Furthermore, QT results can be stored in a Matlab data format, which is intended to facilitate subsequent statistical analysis of spectral quality parameters.
How exactly is the total quality score calculated? The overall QT score is the weighted average of the individual QT scores, calculated with weighting factors that can be set in the software. Note that the weightings must always sum to 100% (or 1); otherwise an error message is displayed.
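The weighting can be sketched in a few lines of Python. This is a minimal illustration and not MicrobeMS code: the function name `overall_qt_score`, the dictionary keys, the numerical tolerance and the example scores are assumptions, while the weights are the default values quoted in the test descriptions on this page.

```python
# Default weighting factors as quoted on this page (they sum to 1).
DEFAULT_WEIGHTS = {
    "noise": 0.25,
    "baseline": 0.20,
    "number_of_peaks": 0.30,
    "resolving_power": 0.25,
}

def overall_qt_score(scores, weights=DEFAULT_WEIGHTS):
    """Return the weighted average of the individual QT scores (0..100)."""
    total_weight = sum(weights.values())
    # Mirror the rule that the weightings have to sum to 100% (or 1).
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total_weight}")
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical individual scores for one spectrum.
example_scores = {
    "noise": 100.0,
    "baseline": 50.0,
    "number_of_peaks": 40.0,
    "resolving_power": 80.0,
}
print(overall_qt_score(example_scores))
```

For the example scores the contributions are 25 + 10 + 12 + 20, i.e. an overall score of about 67.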
QT threshold 1 and 2 - what do these specifications mean? These settings merely define the color coding in the resulting HTML report. Test results with a score greater than QT threshold 2 are shown in green, and scores lower than QT threshold 1 are shown in red. All other QT results are shown in yellow, in line with the traffic light scheme. It should be noted that the predefined (default) color coding settings are quite strict. It was observed that even spectra with low (red) QT scores were sometimes quite suitable for identification analysis.
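The color assignment can be expressed as a small function. The sketch below is only an illustration of the rule stated above; the function name `traffic_light` and the threshold values 40 and 70 are hypothetical placeholders, not the MicrobeMS defaults.

```python
def traffic_light(score, threshold_1, threshold_2):
    """Map a QT score to the color used in the HTML report."""
    if score > threshold_2:
        return "green"
    if score < threshold_1:
        return "red"
    # Everything from threshold 1 up to and including threshold 2.
    return "yellow"

# Hypothetical thresholds, chosen for illustration only.
for score in (85, 55, 20):
    print(score, traffic_light(score, threshold_1=40, threshold_2=70))
```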
What does the parameter number of bins mean? Each QT produces a specific test parameter as its result. For example, the number of peaks test returns the number of peaks found. These values then have to be converted into scores. With the default settings (see screenshot below), the highest score is assigned to spectra with 80 or more peaks, and a score of 0 is given if fewer than 5 peaks are found. For values in between, the range [5 80] is divided into intervals, each of which is assigned a score. The parameter number of bins specifies how many intervals are used in total. For example, if number of bins is set to 10, 9 limit values are determined, since 9 limits are needed to define 10 intervals. The limits are spaced logarithmically. For the number of peaks example, the following limit values would be calculated automatically: [5 7 10 14 20 28 40 57 80]. Thus, a QT score of 0 would be assigned if only 4 peaks were found in the mass spectrum under investigation, whereas 15 peaks would yield a score of 44.4% and 70 peaks a score of 88.9%.
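The binning scheme can be reproduced with a short Python sketch. It is not the MicrobeMS implementation: the function names, the rounding of the limits to integers, and the rule that the score equals the share of limits reached are assumptions, chosen because they reproduce the limit values and the scores for 4 and 15 peaks given above.

```python
import math

def bin_limits(low, high, n_bins):
    """Return n_bins - 1 limits spaced evenly on a log scale from low to high."""
    n_limits = n_bins - 1
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (n_limits - 1)
    # Rounding to integers is an assumption that reproduces the listed limits.
    return [round(10 ** (log_low + i * step)) for i in range(n_limits)]

def binned_score(value, limits):
    """Score in percent: share of limits that the value reaches or exceeds."""
    reached = sum(1 for limit in limits if value >= limit)
    return 100.0 * reached / len(limits)

limits = bin_limits(5, 80, 10)
print(limits)
for n_peaks in (4, 15, 80):
    print(n_peaks, round(binned_score(n_peaks, limits), 1))
```

With 10 bins, 15 peaks reach 4 of the 9 limits (4/9, about 44.4%), while 4 peaks reach none and 80 peaks reach all of them.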
Noise quality test
This first QT involves the following preprocessing steps:
- Initial baseline correction by an asymmetric least squares (AsLS) algorithm
- Cutting the spectra between two m/z values, usually between m/z 2000 and 13000
- Normalization (2-norm)
- Offset correction using the high m/z region to determine the spectrum offset value
Spectrum noise is then determined as the standard deviation of the preprocessed spectrum after the top 20% of the intensity values, which usually correspond to peaks, have been removed.
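The final step can be sketched as follows. The example assumes that the intensity values have already been preprocessed as listed above; the function name `estimate_noise`, the use of the population standard deviation, and the example data are assumptions made for illustration.

```python
import statistics

def estimate_noise(intensities, top_fraction=0.20):
    """Standard deviation of the intensities after dropping the highest ones."""
    ordered = sorted(intensities)
    n_drop = int(round(top_fraction * len(ordered)))
    kept = ordered[:len(ordered) - n_drop]
    # Population standard deviation; the exact estimator is an assumption.
    return statistics.pstdev(kept)

# Hypothetical preprocessed spectrum: low-level noise plus two peak values.
spectrum = [0, 1, 0, 1, 0, 1, 0, 1, 50, 60]
print(estimate_noise(spectrum))
```

In this toy spectrum the two largest values (the peaks) are discarded, so only the alternating noise values contribute to the estimate.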
Noise values of 20 and 80 correspond to scores of 100 (excellent) and 0 (very poor), respectively, and the noise weighting factor is set to 25% (default settings, see screenshot below).
P. H. C. Eilers and H. F. M. Boelens. Baseline Correction with Asymmetric Least Squares Smoothing. Leiden University Medical Centre Report 1(1), 2005, p. 5.
Baseline quality test
Baseline deviation is calculated as the integral under the baseline curve. Prior to this, the spectrum is preprocessed, i.e. normalized (2-norm) and offset corrected. Note that the peak intensities are taken into account when the mass spectra are normalized.
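As an illustration, the integral under a sampled baseline curve can be approximated with the trapezoidal rule. The integration method used by MicrobeMS is not stated here, so the rule, the function name `baseline_deviation` and the example values are assumptions.

```python
def baseline_deviation(mz_values, baseline):
    """Approximate the area under the baseline curve (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(mz_values)):
        width = mz_values[i] - mz_values[i - 1]
        area += 0.5 * width * (baseline[i] + baseline[i - 1])
    return area

# Hypothetical baseline sampled at three m/z positions.
mz = [2000.0, 3000.0, 4000.0]
baseline = [0.0, 0.2, 0.0]
print(baseline_deviation(mz, baseline))
```

A flat baseline at zero gives a deviation of 0, and the value grows with both the height and the m/z extent of the baseline elevation.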
Baseline deviation values of 150 and 1500 correspond to baseline quality test scores of 100 and 0, respectively, and the baseline weighting factor is set to 20% (all default settings). Since baselines can usually be corrected well by appropriate baseline correction routines such as AsLS, the weight of the baseline score could be reduced below the default value (0.2). Note that the sum of all weightings must always equal 100% (or 1); otherwise an error message is displayed.
Test number of peaks
The third quality test, number of peaks, starts with the following processing steps:
- Initial baseline correction by an asymmetric least squares (AsLS) algorithm
- Determination of a so-called peak threshold intensity function
In this QT, a peak is registered wherever the spectral intensity exceeds the threshold intensity function at a given m/z position. Note that the calculation of the threshold intensity function involves not only the AsLS baseline curve but also the spectrum noise. In addition, the threshold function is defined as a generalized logistic function, which models the higher sensitivity of the MALDI-ToF MS technique at lower m/z values and the correspondingly lower spectrum intensities (peaks, noise) in the higher m/z range.
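The idea of an m/z-dependent threshold can be illustrated with a simplified sketch. It uses a plain decreasing logistic curve, which is a special case of the generalized logistic function, and counts local maxima above it. All function names, parameters and values are hypothetical; how MicrobeMS derives the threshold from the AsLS baseline and the noise is not described here and is not reproduced.

```python
import math

def logistic_threshold(mz, upper, lower, midpoint, steepness):
    """Decreasing logistic curve: near `upper` at low m/z, near `lower` at high m/z."""
    return lower + (upper - lower) / (1.0 + math.exp(steepness * (mz - midpoint)))

def count_peaks(mz_values, intensities, threshold):
    """Count local maxima whose intensity exceeds threshold(m/z)."""
    count = 0
    for i in range(1, len(intensities) - 1):
        is_local_max = intensities[i - 1] < intensities[i] >= intensities[i + 1]
        if is_local_max and intensities[i] > threshold(mz_values[i]):
            count += 1
    return count

def threshold(mz):
    # Hypothetical parameter values, chosen for illustration only.
    return logistic_threshold(mz, upper=5.0, lower=1.0, midpoint=6000.0, steepness=0.002)

mz = [2000, 3000, 4000, 5000, 6000, 7000, 8000]
intensities = [0, 10, 0, 3, 0, 3, 0]
print(count_peaks(mz, intensities, threshold))
```

In this toy example the maximum of intensity 3 at m/z 5000 stays below the threshold, whereas the equally intense maximum at m/z 7000 is counted because the threshold is lower in the high m/z range.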
Furthermore, it is important to note that the parameter number of peaks is considered a relative value, intended mainly for comparing spectra. The parameter number of peaks given by the QT does not necessarily correspond to the actual number of peaks that can be extracted from a given mass spectrum. This parameter depends on many other parameters, such as the distance of the threshold function from the spectral curve (defined, among others, by the noise parameter).
Peak numbers of 5 and 80 are assigned to peak number score values of 0 and 100, respectively, while the corresponding weighting factor has been set by default to 30%. Since the peak number score is considered the most important parameter for calculating the overall QT score, the weight of this specific test could be increased beyond this value. Again, the sum of weightings should always equal 100% (or 1).
Quality test resolving power
The parameter resolving power, defined as the ratio of the m/z position of a peak to its full width at half maximum (FWHM), is effectively a by-product of the procedure described in the previous section. The resolving power reported by the QT is the mean of the values for all peaks found above the intensity threshold function. A high resolving power, corresponding to low FWHM values, is thought to be beneficial for peak detection. Large FWHM values are obtained for broad peak features, often when high laser power has been applied, and may indicate sub-optimal settings during acquisition of the mass spectral data.
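A minimal sketch of this calculation is shown below. It assumes that the m/z position and the FWHM of each detected peak are already known; the function name `mean_resolving_power` and the example peak list are hypothetical.

```python
def mean_resolving_power(peaks):
    """Mean of m/z divided by FWHM over all (mz, fwhm) pairs."""
    values = [mz / fwhm for mz, fwhm in peaks]
    return sum(values) / len(values)

# Hypothetical peaks given as (m/z position, FWHM) pairs.
peaks = [(4000.0, 8.0), (6000.0, 10.0)]
print(mean_resolving_power(peaks))
```

The two example peaks have resolving powers of 500 and 600, so the mean is 550; doubling the FWHM of a peak halves its contribution.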
Mean resolving power values of 200 and 800 correspond to test scores of 0 and 100, respectively, and the weighting factor is 25% (all default settings).
Useful links
- Example of an HTML-formatted quality test report: https://wiki.microbe-ms.com/uploads/report-quality-29-Oct-2024-16-33-26.html
- Description of how to modify the default settings for quality testing by modifying the parameter file microbems.opt
- Description of the format of quality test result files (*.mat)