Data Format of Peak List Files

From MicrobeMS Wiki
Revision as of 17:25, 15 December 2024

Peak list files combine multiple peak lists in a single file. These files are stored in a Matlab™-specific data format and contain the peak lists together with the respective metadata. A peak list file can be loaded by entering the following command at the Matlab command prompt:

>> load('ecoli-peaklist-oct16.pkf','-mat')

This command will open ecoli-peaklist-oct16.pkf, an example peak list file consisting of 16 individual peak lists from spectra of five different strains of E. coli. The file ecoli-peaklist-oct16.pkf can be downloaded here. If loading was successful, you will have access to a new Matlab variable C (structure array). Details of the structure of C are described next.
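
The loading step can also be written so that it does not depend on the name of the stored variable. The following is a minimal sketch, assuming only that the file is a regular MAT-file as stated above; the loop simply reports whatever variables the file contains (the text calls the variable ''C'', the tables below call the structure array ''spec'').

```matlab
% Sketch: load a peak list file into a struct and list what it contains.
% The file name is the example from the text; adjust the path as needed.
fname = 'ecoli-peaklist-oct16.pkf';

if exist(fname, 'file') == 2
    S = load(fname, '-mat');      % load into a struct instead of the workspace
    vars = fieldnames(S);         % names of the variables stored in the file
    for k = 1:numel(vars)
        v = S.(vars{k});
        fprintf('%s: %s, size %s\n', vars{k}, class(v), mat2str(size(v)));
        if isstruct(v)
            disp(fieldnames(v));  % field names of the structure array
        end
    end
else
    fprintf('File %s was not found.\n', fname);
end
```

Loading into a struct keeps the workspace clean and makes it easy to check which variable names a given file actually uses.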
 


== Fields of the structure array - '''''spec''''' ==

{| class="wikitable" width=1100
!width=100| Fields
!width=600| Description
!width=100| Data type
!width=300|
|-
| nam
| spectra id
| char array
| rowspan="39" style="background: #ffffff;" valign="top" | [[File:Multifile-format-spec-struc.png|250px|thumb|center|Screenshot showing the content of the structure array ''spec'' that is stored in so-called spectrum multi files (*.muf). Fields of ''spec'' contain spectral data (original, i.e. unmodified, and pre-processed), spectrum metadata, peak lists, calibration information, results of quality tests, and information collected during the creation of average or database spectra. In this example, the content of one database spectrum is shown.]]
|-
| gen
| genus information
| char array
|-
| spe
| species info
| char array
|-
| str
| strain info
| char array
|-
| typ
| type
| char array
|-
| uid
| taxonomy identification number for species as used by the NCBI (see [https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi]), can be modified
| char array
|-
| uie
| unmodified taxonomy identification number for strains as used by the NCBI (see [https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi])
| char array
|-
| gti
| cultivation conditions: growth time
| char array
|-
| tem
| cultivation conditions: cultivation temperature
| char array
|-
| air
| cultivation conditions: cultivation under aerobic or anaerobic conditions
| char array
|-
| med
| cultivation conditions: cultivation medium
| char array
|-
| spo
| spore formers (Yes or No)
| char array
|-
| con
| sample concentration
| char array
|-
| trt
| sample treatment
| char array
|-
| ext
| extra information
| char array
|-
| las
| laser parameters (power, spot diameter, frequency, etc.)
| char array
|-
| cal
| calibration info
| char array
|-
| met
| measurement method
| char array
|-
| cus
| customer info
| char array
|-
| tim
| date and time of measurement
| char array
|-
| pth
| path to spectrum
| char array
|-
| cls
| class assignment (valid values are 0, 1, 2, 3 and 4)
| float32
|-
| lst
| formatted text containing the peak table info
| char array
|-
| seq
| sequence of pre-processing steps
| char array
|-
| smo
| the number of smoothing points (Savitzky-Golay smoothing)
| char array
|-
| bas
| number of intervals used for baseline correction
| float32
|-
| nrm
| normalization parameter (Yes:1, No:0)
| float32
|-
| clb
| calibration parameters (not used)
| float32
|-
| red
| data reduction factor (spectral binning)
| char array
|-
| cut
| cut in the spectral domain, m/z range
| char array
|-
| tmp
| temporary info (not always present)
| char array
|-
| mod
| original data modified by cut spectra or reduce resolution (Yes:1, No:0)
| float32
|-
| lms
| MALDI-TOF MS or LC-MS1 data (0: MALDI, 1: LC-MS1)
| float32
|-
| pik
| [[#peak table format|peak table]], an array of the dimension [4 x npeaks] or [6 x npeaks], where npeaks denotes the number of peaks
| float32
|-
| ccl
| [[#structure array ccl|calibration information]] (see below)
| struct array
|-
| avr
| [[#structure array avr|average spectrum]] (Yes:1, No:0)
| struct array
|-
| dbs
| [[#structure array dbs|database spectrum]] (Yes:1, No:0)
| struct array
|-
| prm
| parameters of peak detection
| char array
|-
| qt
| quality test parameter
| struct array
|}
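
Because every spectrum is one element of a structure array, the metadata fields listed above can be used for simple selections. The sketch below works on a small mock array; the values and the variable names ''isMaldi'', ''isEscherichia'' and ''selected'' are invented for illustration, only the field names ''nam'', ''gen'' and ''lms'' are taken from the table.

```matlab
% Mock data for illustration only; real values come from a loaded file.
spec = struct('nam', {'s1', 's2', 's3'}, ...
              'gen', {'Escherichia', 'Bacillus', 'Escherichia'}, ...
              'lms', {0, 0, 1});

isMaldi       = [spec.lms] == 0;                      % 0: MALDI, 1: LC-MS1
isEscherichia = strcmp({spec.gen}, 'Escherichia');    % match on the genus field
selected      = spec(isMaldi & isEscherichia);

disp({selected.nam});   % ids of the selected spectra
```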


== Peak table format - '''''spec.pik''''' ==

<span class="mw-headline" id="peak table format"></span>
{| class="wikitable" width=800
!width=100| Fields
!width=700| Description
|-
| spec.pik(1,:)
| m/z positions of the peaks in the peak table
|-
| spec.pik(2,:)
| absolute intensities of these peaks
|-
| spec.pik(3,:)
| weighting factors (the sum of these factors equals 100)
|-
| spec.pik(4,:)
| for single spectra (i.e. neither database nor average spectra): baseline-corrected absolute intensities of the peaks; for average or database spectra: the relative peak frequency
|-
| spec.pik(5,:)
| full width at half height (FWHH) of the given peak (requires a quality test, QT)
|-
| spec.pik(6,:)
| resolving power of the given peak (requires a quality test, QT)
|}
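
The row layout of the peak table can be illustrated with a short sketch. The matrix below is mock data with invented values, not output of MicrobeMS; it only demonstrates how the rows are addressed and how the 4-row and 6-row variants can be told apart.

```matlab
% Mock peak table with 6 rows and 3 peaks (values invented for illustration).
pik = [2000.5  4365.2  9060.1;    % row 1: m/z positions
        120     300      80;      % row 2: absolute intensities
         24      60      16;      % row 3: weighting factors (sum = 100)
        110     280      75;      % row 4: baseline-corrected intensities
          4       9      18;      % row 5: FWHH (present only after a QT)
        500     485     503];     % row 6: resolving power (present only after a QT)

mz      = pik(1, :);
inten   = pik(2, :);
weights = pik(3, :);

% The weighting factors are documented to sum to 100.
assert(abs(sum(weights) - 100) < 1e-3);

% Rows 5 and 6 exist only in the [6 x npeaks] variant.
hasQT = size(pik, 1) >= 6;
if hasQT
    fwhh  = pik(5, :);
    respw = pik(6, :);
end

% Report the peak with the largest weighting factor.
[~, idx] = max(weights);
fprintf('Strongest peak at m/z %.1f (weight %.0f)\n', mz(idx), weights(idx));
```

Checking the number of rows before reading rows 5 and 6 avoids an indexing error for peak tables created without a quality test.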
 


== Calibration information - '''''spec.ccl''''' ==

<span class="mw-headline" id="structure array ccl"></span>
{| class="wikitable" width=1100
!width=100| Fields
!width=600| Description
!width=100| Type
!width=300|
|-
| cl1
| calibration constant 1
| float32
| rowspan="15" style="background: #ffffff;" valign="top" | [[File:calibration-format-spec-struc.png|250px|thumb|center|Screenshot showing the content of the structure array ''spec.ccl'' containing the calibration info (calibration constants, delay time, number of spectrum data points, etc.).]]
|-
| cl2
| calibration constant 2
| float32
|-
| cl3
| calibration constant 3
| float32
|-
| del
| delay time [ns]
| float32
|-
| npt
| number of data points
| float32
|-
| res
| time resolution [ns]
| float32
|-
| ncl
| calibration info required to store the spectrum in a Bruker-specific data format
| char array
|-
| ncr
| calibration info required to store the spectrum in a Bruker-specific data format
| char array
|-
| bid
| hardware id of the spectrum ('Bruker ID')
| char array
|-
| mid
| MicrobeMS id of the spectrum
| char array
|-
| org
| manufacturer info
| char array
|-
| tfu
| 'ToF user'
| char array
|-
| spm
| not used
| char array
|-
| stp
| type of measurement (should be 'TOF')
| char array
|-
| acq
| further acquisition info
| char array
|}
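
As an illustration of how ''del'', ''res'' and ''npt'' could fit together, the sketch below builds a flight-time axis from mock values. This page does not state the relation, so the formula (data point k sampled at del + k·res) is an assumption, and the conversion to m/z by means of ''cl1'' to ''cl3'' is deliberately left out because it is not described here.

```matlab
% Mock calibration struct (values invented for illustration).
ccl = struct('del', 20000, 'res', 0.5, 'npt', 5);

% Assumption, not taken from this page: data point k (k = 0 .. npt-1)
% is sampled at the flight time del + k*res.
t = ccl.del + (0:ccl.npt - 1) * ccl.res;   % flight times in ns

fprintf('%d points from %.1f ns to %.1f ns\n', numel(t), t(1), t(end));
```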


== Database spectra - '''''spec.dbs''''' ==

<span class="mw-headline" id="structure array dbs"></span>
A [[Create database spectra|database spectrum]] is usually created from several (>3) individual mass spectra. As with regular experimental spectra, the spectral data and metadata of database spectra are stored in specific fields of the structure array ''spec''. In database spectra, the field ''spec(i).dbs'' stores relevant data from the experimental source spectra from which the given database spectrum was derived. This field is left empty in experimental and average spectra. The structure of ''spec.dbs'' is detailed in the table below.

{| class="wikitable" width=1100
!width=100| Fields
!width=600| Description
!width=100| Type
!width=300|
|-
| mem
| specifies whether the current spectrum is a database spectrum (1) or not (0)
| char array
| rowspan="5" style="background: #ffffff;" valign="top" | [[File:dbs-format-spec-struc.png|250px|thumb|center|Screenshot of the structure array ''spec.dbs''. It shows information such as the spectrum id, taxonomic information, peak table, and peak detection parameters of mass spectrum #1 [spec(1).dbs(1,1)], which was used together with others to obtain a database spectrum.]]
|-
| ids
| id of the individual mass spectrum that contributed to the given database spectrum
| char array
|-
| tax
| taxonomic information (i.e. genus, species, and strain)
| char array
|-
| pik
| peak table of the given source spectrum
| float32
|-
| prm
| parameters used for peak detection
| char array
|}
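
The source spectra of a database spectrum can be listed by looping over the elements of ''spec(i).dbs''. The following sketch uses mock data; the ids, taxonomy strings and peak tables are invented, and only the field names are taken from the table above.

```matlab
% Mock database spectrum with two source spectra (invented for illustration).
src = struct('mem', {'1', '1'}, ...
             'ids', {'id-001', 'id-002'}, ...
             'tax', {'Escherichia coli A', 'Escherichia coli B'}, ...
             'pik', {rand(4, 12), rand(4, 9)}, ...
             'prm', {'', ''});
spec = struct('nam', 'db-spectrum', 'dbs', src);

% One line per source spectrum: id, taxonomy, number of peaks.
nSources = numel(spec(1).dbs);
for k = 1:nSources
    s = spec(1).dbs(k);
    fprintf('%s (%s): %d peaks\n', s.ids, s.tax, size(s.pik, 2));
end
```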


== Average spectra - '''''spec.avr''''' ==

<span class="mw-headline" id="structure array avr"></span>
An [[Averaging Mass Spectra|average spectrum]] is usually created from several (>3) individual mass spectra. As with regular experimental spectra, the spectral data and metadata of average spectra are stored in specific fields of the structure array ''spec''. In average spectra, the field ''spec(i).avr'' stores relevant data from the experimental source spectra from which the given average spectrum was derived. This field is empty in experimental and database spectra. The structure of ''spec.avr'' is detailed in the table below.

{| class="wikitable" width=1100
!width=100| Fields
!width=600| Description
!width=100| Type
!width=300|
|-
| mem
| specifies whether the contributing spectrum is an average spectrum (1) or not (0)
| char array
| rowspan="5" style="background: #ffffff;" valign="top" | [[File:avr-format-spec-struc.png|250px|thumb|center|Screenshot of the structure array ''spec.avr''. It shows information such as the spectrum id, taxonomic information, peak table, and peak detection parameters of mass spectrum #1 [spec(1).avr(1,1)], which was used together with others to obtain an average spectrum.]]
|-
| ids
| id of the individual mass spectrum that contributed to the given average spectrum
| char array
|-
| tax
| taxonomic information (i.e. genus, species, and strain)
| char array
|-
| pik
| peak table of the given source spectrum
| float32
|-
| prm
| parameters used for peak detection
| char array
|}
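
Since ''avr'' is filled only in average spectra and ''dbs'' only in database spectra, the two fields can be used to tell the spectrum types apart. The sketch below assumes that "empty" means an empty array; if a file stores a structure with empty fields instead, the test would have to inspect a field such as ''ids''. All data are mock values.

```matlab
% Mock structure array: one experimental, one average, one database spectrum.
src  = struct('ids', {'id-001', 'id-002'});
spec = struct('nam', {'exp', 'avg', 'db'}, ...
              'avr', {[], src, []}, ...
              'dbs', {[], [], src});

kinds = cell(1, numel(spec));
for i = 1:numel(spec)
    if ~isempty(spec(i).dbs)
        kinds{i} = 'database';       % source spectra stored in dbs
    elseif ~isempty(spec(i).avr)
        kinds{i} = 'average';        % source spectra stored in avr
    else
        kinds{i} = 'experimental';   % neither field is filled
    end
end
disp(kinds);
```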


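== Quality test results - '''''spec.qt''''' ==

<span class="mw-headline" id="structure array qt"></span>
The structure array ''spec.qt'' contains the results of a [[MALDI Quality Tests|Quality Test]] (QT). Fields of this structure are empty if no QT has been performed. The structure of ''spec.qt'' is detailed in the table below.

{| class="wikitable" width=1100
!width=100| Fields
!width=600| Description
!width=100| Type
!width=300|
|-
| noise
| QT data of the ''noise'' test, contains fields ''abs'', ''rnk'', and ''obj''
| struct array
| rowspan="6" style="background: #ffffff;" valign="top" | [[File:qt-format-spec-struc.png|250px|thumb|center|Screenshot of the structure array ''spec.qt'' that contains the results of a quality test (QT).]]
|-
| basln
| QT data of the ''baseline'' test, contains fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| npiks
| QT data of the test ''number of peaks'', contains fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| respw
| QT data of the test ''resolution power'', contains fields ''abs'', ''rnk'', and ''obj''
| struct array
|-
| rnk
| overall rank that the given spectrum has achieved in a QT with a number of other spectra
| float32
|-
| res
| overall quality test score
| float32
|}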
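
A possible way to work with these results is to skip spectra without a QT and to order the remaining ones by their overall rank. The sketch below uses mock values; it sorts by ascending rank number, and whether a lower number means better quality is an assumption that should be checked against the quality test documentation.

```matlab
% Mock quality test results (invented for illustration).
qtA = struct('rnk', 2, 'res', 71.5);
qtB = struct('rnk', 1, 'res', 88.0);
qtC = struct('rnk', [], 'res', []);      % no QT performed: fields are empty
spec = struct('nam', {'a', 'b', 'c'}, 'qt', {qtA, qtB, qtC});

% Keep only spectra for which an overall rank is present.
tested = spec(arrayfun(@(s) ~isempty(s.qt.rnk), spec));

% Sort the tested spectra by ascending overall rank.
ranks = arrayfun(@(s) s.qt.rnk, tested);
[~, order] = sort(ranks);
ranked = tested(order);

for k = 1:numel(ranked)
    fprintf('%d. %s (score %.1f)\n', ranked(k).qt.rnk, ranked(k).nam, ranked(k).qt.res);
end
```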