Introduction
Two hyperspectal image data formats are supported, corresponding to different instrument types: Secondary Ion Mass Spectrometry (SIMS) and MALDI. SIMS data are supported in the format produced by Ion-TOF instruments, and MALDI the ImzML XML format.
A class is provided for each instrument type (nag::SIMSpecImage and nag::MALDISpecImage) to encapsulate an \(m\) by \(n\) image matrix along with pre-processing functions particular to the instrument types. The \(m\) rows of the image matrix represent pixels in the image and the \(n\) columns mass channels.
nag::SIMSpecImage and nag::MALDISpecImage are subclasses of nag::SpecImage. This base class may be used to represent any hyperspectral image matrix, and provides the analysis functionality common to the two instrument types. It also provides the functionality to sum contiguous mass channels specified by a list of integration limits in order to reduce the number of columns in the matrix prior to analysis. This functionality is called peak selection. Accessor functions are also provided by the base class to retrieve a copy of the matrix, mass channel m/z values and image dimensions.
See the nag::SpecImage, nag::SIMSpecImage and nag::MALDISpecImage pages for complete listings of the available functions.
In addition to the classes above, the class nag::SpecImageFactory is available, which can generate pointers to a new SpecImage from a configuration file.
Environment variables and setup
The following environment variables and setup steps need to be completed:
-
PATH environment variable should point to the folders containing the SpecImage and NAGNPL libraries.
-
PCA_PARAMS_PATH should initially be set to the 'bin' directory. Once this is done, run the auto-tuning program 'bin/pca_tune_dp.exe'. This should generate a file called 'pca_crossover.dat'. If necessary move this output file to another location, but if you do this then update PCA_PARAMS_PATH to point to the directory in which 'pca_crossover.dat' is now located. This step only needs to be done once for a given machine and if you have aready performed it for the MATLAB installation then there is no need to repeat it.
-
TMPDIR should set to a directory in which to dump temporary files during out-of-core memory-limited processing (i.e. if you specify memory = xxx in a config file). It is best if this directory resides on a local disk.
-
NO_PCA_SWITCH can be set (to anything), and if set disables the switching from sparse to dense storage when performing PCA.
-
IMZML_NO_VALIDATE can be set (to anything), and if set will instruct the library to not validate ImzML input XML file against the Schema. This has been requested since a lot of IMZML files in the wild have minor deviations from the Schema which do not impact on correct parsing. ** WARNING: setting this may hide a more serious error in the ImzML file which could cause a crash. Therefore the behaviour of the classes with this environment variable set may be undefined.
-
OMP_NUM_THREADS should be set to the number of physical cores available in your machine.
-
KMP_AFFINITYmay be set; suggested setting:
granularity=thread,scatter
. See the Intel compiler manual for more information about this.
Example outline
An example workflow is presented below. The example uses the nag::SpecImageFactory::NewSpecImage interface to generate a SpecImage pointer from a configuration file. The example uses polymorphism: the type of the pointer returned by the factory function is nag::SpecImage (the base class), and we use the nag::SpecImage::getSource function to determine whether or not the underlying type is nag::SIMSpecImage. If it is then we call some functions which are specific to that type. Throughout the rest of the example the base class type is used to apply peak selection, query the image and and perform analyses.
The example also shows another way to generate a nag::SpecImage by construction from a dense matrix (the PCA scores). Note that it is also possible (but not shown in the example) to construct a SpecImage from a sparse matrix and to construct nag::SIMSpecImage and nag::MALDISpecImage objects from data files directly (i.e. without using a configuration file).
The compilation instructions for the example are given in the comments at the top of the example code.
Example code
#include <algorithm>
#include <iostream>
#include <omp.h>
int main(int argc, const char **argv)
{
if (argc < 2)
{
std::cerr << "Usage: enter a config file name" << std::endl;
return 1;
}
const char *file = argv[1];
try {
std::cout << "Matrix is " << fact_img->getM() << " by " << fact_img->getN() << std::endl;
std::shared_ptr<nag::SIMSpecImage<FP_REAL_T> >
sims_img->advancedDeadTimeCorrection();
sims_img->scaleMatrix();
std::cout << "SF = " << sims_img->getSF() << std::endl;
std::cout << "K0 = " << sims_img->getK0() << std::endl;
}
std::cout << "Image has dimensions (" << fact_img->getXDim() <<
", " << fact_img->getYDim() <<
", " << fact_img->getZDim() << ")" << std::endl;
std::vector<FP_REAL_T> mzvals = fact_img->getMZVals();
std::cout << "Minimum m/z value is " << mzvals.front() << std::endl;
std::cout << "Maximum m/z value is " << mzvals.back() << std::endl;
int nscales = 6;
std::vector<uint32_t> scales = std::vector<uint32_t>(nscales);
uint32_t j = 2;
for (auto i = scales.begin(); i != scales.end(); i++) {
*i = j;
j *= 2;
}
int npix = std::min(size_t(100), fact_img->getM()/16);
std::vector<size_t> pixels = std::vector<size_t>(npix);
j = 0;
for (auto i = pixels.begin(); i != pixels.end(); i++) {
*i = j++;
}
double t1 = omp_get_wtime();
std::cout << "CWT time for " << nscales << " scales and " << npix << " pixels = " <<
omp_get_wtime() - t1 << " secs." << std::endl;
t1 = omp_get_wtime();
fact_img->peakSelect();
std::cout << "Peak selection time = " << omp_get_wtime() - t1 << std::endl;
mzvals = fact_img->getMZVals();
std::cout << "After peak selection: ";
std::cout << "Minimum m/z value is " << mzvals.front() << std::endl;
std::cout << "After peak selection: ";
std::cout << "Maximum m/z value is " << mzvals.back() << std::endl;
int npcs = 15;
t1 = omp_get_wtime();
std::cout << "PCA time for " << npcs << " components = " << omp_get_wtime() - t1 << " secs." << std::endl;
j=0;
{
FP_REAL_T contrib = *i/pca_res.
pca_tot;
std::cout << "PC " << ++j << " contribution = " << contrib <<
" (" << *i <<
"/" << pca_res.
pca_tot <<
")" << std::endl;
}
int k = 10;
t1 = omp_get_wtime();
std::cout << "Time to perform NMF with k = " << k << " = " << omp_get_wtime() - t1 << " secs." << std::endl;
k = 2;
t1 = omp_get_wtime();
FP_REAL_T perplexity = 100, theta = 0.5;
int max_iter = 5;
std::cout << "Time to perform t-SNE with k = " << k << " = " << omp_get_wtime() - t1 << " secs." << std::endl;
(
const FP_REAL_T *)&pca_res.
scores[0]));
k = 4;
t1 = omp_get_wtime();
std::cout << "Time to perform k-means with k = " << k << " = " << omp_get_wtime() - t1 << " secs." << std::endl;
j = 0;
for (
auto i = kmeans_res.
nic.begin(); i != kmeans_res.
nic.end(); ++i)
std::cout << "Cluster " << ++j << " contains " << *i << " pixels" << std::endl;
}
{
std::cerr << e.
message << std::endl;
return 1;
}
return 0;
}