Data Format of Peak List Files

Latest revision as of 10:32, 11 April 2025

Peak list files combine multiple peak lists in a single file. They are stored in a Matlab™-specific data format and contain the peak data together with the respective metadata. A peak list file can be loaded by entering the following command at the Matlab command prompt:

  >> load('pkffname','-mat');

where ''pkffname'' denotes the name of the peak list multifile. For example, the command ''load('RKI-ring-trial-test-data.pkf','-mat')'' opens the file ''RKI-ring-trial-test-data.pkf'', a peak list multifile containing 24 individual peak tables obtained from experimental MALDI-ToF mass spectra. These spectra were recorded within the so-called [https://pubmed.ncbi.nlm.nih.gov/26063856/ '''RKI ring trial study''']. The file ''RKI-ring-trial-test-data.pkf'' can be downloaded [https://wiki.microbe-ms.com/upload/RKI-ring-trial-test-data.pkf: '''here''']. <br>
If loading was successful, a new Matlab variable ''C'' (structure array) is available in the workspace. Details of the structure of ''C'' are described next.
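The loading step can be tried without the example file. The following is a minimal sketch that first writes a small stand-in file and then reads it back; the field values and the temporary file name are made up for illustration and are not taken from a real peak list file.

```matlab
% Build a tiny stand-in for a peak list file (all values are made up).
C = struct('nam', {'spec-01', 'spec-02'}, 'gen', {'Escherichia', 'Bacillus'});
fname = [tempname '.pkf'];      % hypothetical temporary file with .pkf extension
save(fname, 'C', '-mat');       % MAT-format data stored under a non-.mat extension

% '-mat' makes load treat the file as a MAT-file regardless of its extension.
S  = load(fname, '-mat');       % load into a struct instead of the workspace
C2 = S.C;

fprintf('%d peak lists loaded\n', numel(C2));
disp(fieldnames(C2));           % list the fields that are present
delete(fname);                  % remove the temporary file
```

Loading into the output variable ''S'' avoids overwriting an existing workspace variable named ''C''.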
 


== Fields of the structure array - '''''C''''' ==

{| class="wikitable" width=1100
!width=100| Fields
! Description
! Data type
|-
| nam
| spectrum id
| char array
|-
| gen
| genus information
| char array
|-
| spe
| species info
| char array
|-
| str
| strain info
| char array
|-
| typ
| type
| char array
|-
| uid
| taxonomy identification number for species as used by the NCBI (see [https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi]), can be modified
| char array
|-
| uie
| unmodified taxonomy identification number for strains as used by the NCBI (see [https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi])
| char array
|-
| gti
| cultivation conditions: growth time
| char array
|-
| tem
| cultivation conditions: cultivation temperature
| char array
|-
| air
| cultivation conditions: cultivation under aerobic or anaerobic conditions
| char array
|-
| med
| cultivation conditions: cultivation medium
| char array
|-
| spo
| spore former (Yes or No)
| char array
|-
| con
| sample concentration
| char array
|-
| trt
| sample treatment
| char array
|-
| ext
| extra information
| char array
|-
| las
| laser parameters (power, spot diameter, frequency, etc.)
| char array
|-
| cal
| calibration info
| char array
|-
| met
| measurement method
| char array
|-
| cus
| customer info
| char array
|-
| tim
| date and time of measurement
| char array
|-
| pth
| path to spectrum
| char array
|-
| cls
| class assignment (valid values are 0, 1, 2, 3 and 4)
| float32
|-
| lst
| formatted text containing the peak table info
| char array
|-
| seq
| sequence of pre-processing steps
| char array
|-
| smo
| number of smoothing points (Savitzky-Golay smoothing)
| char array
|-
| bas
| number of intervals used for baseline correction
| float32
|-
| nrm
| normalization parameter (Yes: 1, No: 0)
| float32
|-
| clb
| calibration parameters (not used)
| float32
|-
| red
| data reduction factor (spectral binning)
| char array
|-
| cut
| cut in the spectral domain, m/z range
| char array
|-
| tmp
| temporary info (not always present)
| char array
|-
| mod
| original data modified by cutting spectra or reducing resolution (Yes: 1, No: 0)
| float32
|-
| lms
| MALDI-ToF MS or LC-MS&sup1; data? (0: MALDI, 1: LC-MS&sup1;)
| float32
|-
| pik
| [[#peak table format|peak table]], an array of the dimension [4 x npeaks] or [6 x npeaks], where npeaks denotes the number of peaks
| float32
|-
| ccl
| [[#structure array ccl|calibration information]]
| struct array
|-
| avr
| [[#structure array avr|average spectrum]]
| struct array
|-
| dbs
| [[#structure array dbs|database spectrum]]
| struct array
|-
| prm
| parameters of peak detection
| char array
|-
| qt
| [[#structure array qt|quality test parameters]]
| struct array
|}

[[File:Multifile-format-spec-struc.png|250px|thumb|center|Screenshot showing the contents of the structure array ''C'' that is stored in so-called spectrum multi files (*.muf). Fields of ''C'' contain spectral data (original, i.e. unmodified, and pre-processed), spectrum metadata as well as peak lists, calibration information, results of quality tests, and information collected during the creation of average or database spectra. The example screenshot shows the contents of a database spectrum.]]
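Because ''C'' is a structure array with one element per peak list, metadata can be read per element or collected across all elements. The sketch below uses a synthetic two-element array in which only a few of the documented fields are set; the values are illustrative and not taken from a real file.

```matlab
% Synthetic two-entry structure array; only some documented fields are set.
C = struct( ...
    'nam', {'spec-01', 'spec-02'}, ...
    'gen', {'Escherichia', 'Bacillus'}, ...
    'spe', {'coli', 'subtilis'}, ...
    'uid', {'562', '1423'}, ...        % stored as char arrays, not as numbers
    'lms', {0, 0});                    % 0: MALDI, 1: LC-MS

% Collect one field across all entries of the structure array.
names  = {C.nam};
taxids = str2double({C.uid});          % convert the char arrays to numeric values

for i = 1:numel(C)
    fprintf('%s: %s %s (taxid %d)\n', C(i).nam, C(i).gen, C(i).spe, taxids(i));
end

isMaldi = [C.lms] == 0;                % logical mask of the MALDI-ToF entries
```

Since most metadata fields are char arrays, numeric values such as the taxonomy id have to be converted explicitly before they can be compared or sorted numerically.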

== Peak table format - '''''C.pik''''' ==

<span class="mw-headline" id="peak table format"></span>

{| class="wikitable" width=800
!width=100| Fields
! Description
|-
| C.pik(1,:)
| m/z positions of the peaks in the peak table
|-
| C.pik(2,:)
| absolute intensities of these peaks
|-
| C.pik(3,:)
| weighting factors (the sum of these factors equals 100)
|-
| C.pik(4,:)
| for single spectra (i.e. neither database nor average spectra): baseline-corrected absolute intensities of the peaks; for average or database spectra: the relative peak frequency
|-
| C.pik(5,:)
| full width at half height (FWHH) of the given peak (not always present, requires QT)
|-
| C.pik(6,:)
| resolving power of the given peak (not always present, requires QT)
|}
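The row layout of a peak table can be illustrated with a small synthetic example. The numbers below are made up, and double precision is used for simplicity although the stored type is listed as float32.

```matlab
% Synthetic [4 x npeaks] peak table; all numbers are illustrative only.
pik = [ 4365.2  5096.8  6255.4  9742.1;    % row 1: m/z positions
         820     1510    430     2260;     % row 2: absolute intensities
          20      30      10      40;      % row 3: weighting factors (sum = 100)
         790     1480    400     2200 ];   % row 4: baseline-corrected intensities

mz        = pik(1, :);
intensity = pik(2, :);

% Rows 5 and 6 are only present if a quality test has been run.
hasQT = size(pik, 1) >= 6;
if hasQT
    fwhh           = pik(5, :);
    resolvingPower = pik(6, :);
end

% m/z positions of the two most intense peaks.
[~, order]  = sort(intensity, 'descend');
strongestMz = mz(order(1:2));
```

Checking the number of rows before accessing rows 5 and 6 avoids an index error for peak tables that were created without a quality test.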
 

== Calibration information - '''''C.ccl''''' ==

<span class="mw-headline" id="structure array ccl"></span>

{| class="wikitable" width=1100
!width=100| Fields
! Description
! Type
|-
| cl1
| calibration constant 1
| float32
|-
| cl2
| calibration constant 2
| float32
|-
| cl3
| calibration constant 3
| float32
|-
| del
| delay time [ns]
| float32
|-
| npt
| number of data points
| float32
|-
| res
| time resolution [ns]
| float32
|-
| ncl
| calibration info required to store the spectrum in a Bruker-specific data format
| char array
|-
| ncr
| calibration info required to store the spectrum in a Bruker-specific data format
| char array
|-
| bid
| hardware id of the spectrum ('Bruker ID')
| char array
|-
| mid
| MicrobeMS id of the spectrum
| char array
|-
| org
| manufacturer info
| char array
|-
| tfu
| 'ToF user'
| char array
|-
| spm
| not used
| char array
|-
| stp
| type of measurement (should be 'TOF')
| char array
|-
| acq
| further acquisition info
| char array
|}

[[File:calibration-format-spec-struc.png|250px|thumb|center|Screenshot showing the contents of the structure array ''C.ccl'' containing the calibration info (calibration constants, delay time, number of spectrum data points, etc.)]]
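As a rough illustration of how the fields ''del'', ''npt'' and ''res'' relate to each other, the sketch below reconstructs a time-of-flight axis. It assumes that the data points are equally spaced and that the first point is recorded at the delay time; this page does not confirm that interpretation, and the conversion to m/z via ''cl1'' to ''cl3'' is deliberately left out because the calibration function is not described here. All values are made up.

```matlab
% Made-up calibration block using the documented field names.
ccl = struct('del', 20000, 'npt', 5, 'res', 2);   % delay [ns], points, step [ns]

% Assumption: uniform sampling, first data point taken at the delay time.
t = ccl.del + (0:ccl.npt - 1) * ccl.res;          % time of flight per point [ns]
```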

== Database spectra - '''''C.dbs''''' ==

<span class="mw-headline" id="structure array dbs"></span>

A [[Create database spectra|database spectrum]] is usually created from many (>3) individual mass spectra. As in regular experimental spectra, spectral data and metadata of database spectra are stored in specific fields of the structure array ''C''. In database spectra the field ''C(i).dbs'' stores relevant data of the experimental source spectra from which the given database spectrum has been derived. This field is left empty in experimental and average spectra. Details of the structure of ''C.dbs'' are given in the table below.

{| class="wikitable" width=1100
!width=100| Fields
! Description
! Type
|-
| mem
| specifies whether the current spectrum is a database spectrum (1) or not (0)
| char array
|-
| ids
| id of the individual mass spectrum that contributed to the given database spectrum
| char array
|-
| tax
| taxonomic information (genus, species, and strain)
| char array
|-
| pik
| peak table of the given source spectrum
| float32
|-
| prm
| parameters used for peak detection
| char array
|}

[[File:dbs-format-spec-struc.png|250px|thumb|center|Screenshot of the structure array ''C.dbs''. The screenshot shows information such as the spectrum id, taxonomic information, peak tables, and the respective peak detection parameters of mass spectrum #1 [C(1).dbs(1,1)] that was used together with others to obtain a database spectrum.]]
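The source spectra of a database spectrum can be listed by looping over the elements of ''C(i).dbs''. The following sketch uses a synthetic database entry built from two source spectra; ids, taxonomy strings and peak values are made up.

```matlab
% Synthetic database spectrum derived from two source spectra (made-up values).
src = struct( ...
    'ids', {'spec-01', 'spec-02'}, ...
    'tax', {'Escherichia coli K12', 'Escherichia coli K12'}, ...
    'pik', {[4365; 100; 100; 95], [4366 5097; 80 120; 40 60; 75 110]});
C = struct('nam', {'db-entry-01'}, 'dbs', {src});

% Print id, taxonomy and number of peaks of every contributing spectrum.
for k = 1:numel(C(1).dbs)
    s = C(1).dbs(k);
    fprintf('%s (%s): %d peaks\n', s.ids, s.tax, size(s.pik, 2));
end

% Number of peaks per source spectrum (columns of each peak table).
nPeaks = arrayfun(@(s) size(s.pik, 2), C(1).dbs);
```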

== Average spectra - '''''C.avr''''' ==

<span class="mw-headline" id="structure array avr"></span>

An [[Averaging Mass Spectra|average spectrum]] is usually created from many (>3) individual mass spectra. As in regular experimental spectra, spectral data and metadata of average spectra are stored in specific fields of the structure array ''C''. In average spectra the field ''C(i).avr'' stores relevant data of the experimental source spectra from which the given average spectrum has been derived. This field is empty in experimental and database spectra. Details of the structure of ''C.avr'' are given in the table below.

{| class="wikitable" width=1100
!width=100| Fields
! Description
! Type
|-
| mem
| specifies whether the contributing spectrum is an average spectrum (1) or not (0)
| char array
|-
| ids
| id of the individual mass spectrum that contributed to the given average spectrum
| char array
|-
| tax
| taxonomic information (genus, species, and strain)
| char array
|-
| pik
| peak table of the given source spectrum
| float32
|-
| prm
| parameters used for peak detection
| char array
|}

[[File:avr-format-spec-struc.png|250px|thumb|center|Screenshot of the structure array ''C.avr''. The screenshot shows information such as the spectrum id, taxonomic information, peak tables, and the respective peak detection parameters of mass spectrum #1 [C(1).avr(1,1)] that was used together with others to obtain an average spectrum.]]
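Since ''C(i).avr'' and ''C(i).dbs'' are only filled for average and database spectra, respectively, the two fields can be used to tell the spectrum types apart. The sketch below assumes that an unused field is stored as an empty value for which ''isempty'' returns true; real files may represent emptiness differently (for example as a structure with empty fields), so the test may need to be adapted. The entries are synthetic.

```matlab
% Three synthetic entries: experimental, average, and database spectrum.
src = struct('ids', {'spec-01'}, 'tax', {'Escherichia coli'});
C = struct( ...
    'nam', {'exp-01', 'avr-01', 'db-01'}, ...
    'avr', {[], src, []}, ...
    'dbs', {[], [], src});

% Assumption: an unused avr/dbs field is empty in the sense of isempty.
kind = cell(1, numel(C));
for i = 1:numel(C)
    if ~isempty(C(i).dbs)
        kind{i} = 'database';
    elseif ~isempty(C(i).avr)
        kind{i} = 'average';
    else
        kind{i} = 'experimental';
    end
end
```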


== Quality test results - '''''C.qt''''' ==

<span class="mw-headline" id="structure array qt"></span>

The structure array ''C.qt'' contains the results of a [[MALDI Quality Tests|Quality Test]] (QT). The fields of this structure are empty if no QT has been performed. Details of the structure of ''C.qt'' are given in the table below.

{| class="wikitable" width=1100
!width=100| Fields
!width=600| Description
!width=100| Type
|-
| noise
| QT data of the ''noise'' test, contains the fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| basln
| QT data of the ''baseline'' test, contains the fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| npiks
| QT data of the ''number of peaks'' test, contains the fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| respw
| QT data of the ''resolution power'' test, contains the fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| rnk
| overall rank achieved by the given spectrum in a QT involving a number of other spectra
| float32
|-
| res
| overall quality test score
| float32
|}

[[File:qt-format-spec-struc.png|250px|thumb|center|Screenshot of the structure array ''C.qt'' that contains the results of a quality test (QT).]]
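Quality test results can be used to order the spectra of a file. The sketch below is an illustration on synthetic data: it assumes that a missing QT shows up as an empty ''qt'' field and that a higher overall score ''res'' indicates better quality, neither of which is stated on this page.

```matlab
% Synthetic entries; the second one carries no quality test results.
C = struct( ...
    'nam', {'spec-01', 'spec-02', 'spec-03'}, ...
    'qt',  {struct('rnk', 2, 'res', 71.5), [], struct('rnk', 1, 'res', 88.0)});

% Keep only the entries for which a QT has been performed.
tested = arrayfun(@(c) ~isempty(c.qt), C);
T      = C(tested);
scores = arrayfun(@(c) c.qt.res, T);

% Assumption: a higher overall score means better spectrum quality.
[~, order] = sort(scores, 'descend');
bestFirst  = {T(order).nam};
```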