Introduction

Two hyperspectal image data formats are supported, corresponding to different instrument types: Secondary Ion Mass Spectrometry (SIMS) and MALDI. SIMS data are supported in the format produced by Ion-TOF instruments, and MALDI the ImzML XML format.

A class is provided for each instrument type (nag::SIMSpecImage and nag::MALDISpecImage) to encapsulate an \(m\) by \(n\) image matrix along with pre-processing functions particular to the instrument types. The \(m\) rows of the image matrix represent pixels in the image and the \(n\) columns mass channels.

nag::SIMSpecImage and nag::MALDISpecImage are subclasses of nag::SpecImage. This base class may be used to represent any hyperspectral image matrix, and provides the analysis functionality common to the two instrument types. It also provides the functionality to sum contiguous mass channels specified by a list of integration limits in order to reduce the number of columns in the matrix prior to analysis. This functionality is called peak selection. Accessor functions are also provided by the base class to retrieve a copy of the matrix, mass channel m/z values and image dimensions.

See the nag::SpecImage, nag::SIMSpecImage and nag::MALDISpecImage pages for complete listings of the available functions.

In addition to the classes above, the class nag::SpecImageFactory is available, which can generate pointers to a new SpecImage from a configuration file.

Environment variables and setup

The following environment variables and setup steps need to be completed:

PATH environment variable should point to the folders containing the SpecImage and NAGNPL libraries.
PCA_PARAMS_PATH should initially be set to the 'bin' directory. Once this is done, run the auto-tuning program 'bin/pca_tune_dp.exe'. This should generate a file called 'pca_crossover.dat'. If necessary move this output file to another location, but if you do this then update PCA_PARAMS_PATH to point to the directory in which 'pca_crossover.dat' is now located. This step only needs to be done once for a given machine and if you have aready performed it for the MATLAB installation then there is no need to repeat it.
TMPDIR should set to a directory in which to dump temporary files during out-of-core memory-limited processing (i.e. if you specify memory = xxx in a config file). It is best if this directory resides on a local disk.
NO_PCA_SWITCH can be set (to anything), and if set disables the switching from sparse to dense storage when performing PCA.
IMZML_NO_VALIDATE can be set (to anything), and if set will instruct the library to not validate ImzML input XML file against the Schema. This has been requested since a lot of IMZML files in the wild have minor deviations from the Schema which do not impact on correct parsing. ** WARNING: setting this may hide a more serious error in the ImzML file which could cause a crash. Therefore the behaviour of the classes with this environment variable set may be undefined.
OMP_NUM_THREADS should be set to the number of physical cores available in your machine.
KMP_AFFINITYmay be set; suggested setting: granularity=thread,scatter. See the Intel compiler manual for more information about this.

Example outline

An example workflow is presented below. The example uses the nag::SpecImageFactory::NewSpecImage interface to generate a SpecImage pointer from a configuration file. The example uses polymorphism: the type of the pointer returned by the factory function is nag::SpecImage (the base class), and we use the nag::SpecImage::getSource function to determine whether or not the underlying type is nag::SIMSpecImage. If it is then we call some functions which are specific to that type. Throughout the rest of the example the base class type is used to apply peak selection, query the image and and perform analyses.

The example also shows another way to generate a nag::SpecImage by construction from a dense matrix (the PCA scores). Note that it is also possible (but not shown in the example) to construct a SpecImage from a sparse matrix and to construct nag::SIMSpecImage and nag::MALDISpecImage objects from data files directly (i.e. without using a configuration file).

The compilation instructions for the example are given in the comments at the top of the example code.

Example code

#include <algorithm>
#include <iostream>
#include <omp.h>
#include "SpecImage.hpp"
#include "SpecImageFactory.hpp"
#include "SIMSpecImage.hpp"
#include "MALDISpecImage.hpp"
// How to compile and link with the MSVC compiler 2013 and above on **Windows** for **double** precision:
// cl /openmp /DFP_REAL_T=double /EHsc /I <path to header files> /c example.cpp
// cl /Feexample.exe example.obj /openmp /DFP_REAL_T=double <path to libnagspec_dp.dll>\libnagspec_dp.lib <path to libnagnpl_dp.dll>\libnagnpl_dp.lib
// How to compile and link with the MSVC compiler 2013 and above on **Windows** for **single** precision:
// cl /openmp /DFP_REAL_T=float /EHsc /I <path to header files>. /c example.cpp
// cl /Feexample.exe example.obj /openmp /DFP_REAL_T=float <path to libnagspec_sp.dll>\libnagspec_sp.lib <path to libnagnpl_sp.dll>\libnagnpl_sp.lib
// When using the MSVC compiler, be sure to use the x64 tools (VS2013 x64 Cross Tools Command Prompt); linking will not work using the x86 tools.
// How to compile and link with the Intel compiler on **Windows** for **double** precision:
// icl /openmp /Qstd=c++11 -DFP_REAL_T=double /I <path to header files> /c example.cpp
// icl /o example.exe example.obj /openmp /Qstd=c++11 /DFP_REAL_T=double <path to libnagspec_dp.dll>\libnagspec_dp.lib <path to libnagnpl_dp.dll>\libnagnpl_dp.lib
// How to compile and link with the Intel compiler on **Windows** for **single** precision:
// icl /openmp /Qstd=c++11 -DFP_REAL_T=float /I <path to header files> /c example.cpp
// icl /o example.exe example.obj /openmp /Qstd=c++11 /DFP_REAL_T=float <path to libnagspec_sp.dll>\libnagspec_sp.lib <path to libnagnpl_sp.dll>\libnagnpl_sp.lib
// When using the Intel compiler, be sure to use the Intel 64 Visual Studio Command Prompt.
// How to run the example on **Windows** in a command prompt:
// example.exe <path to config file><name of config file>
int main(int argc, const char **argv)
{
        if (argc < 2)
        {
                std::cerr << "Usage: enter a config file name" << std::endl;
                return 1;
        }
        const char *file = argv[1];
        try {
                // Use the factory interface to return a base class pointer
                std::shared_ptr<nag::SpecImage<FP_REAL_T> > fact_img(nag::SpecImageFactory<FP_REAL_T>::NewSpecImage(file));
        
                std::cout << "Matrix is " << fact_img->getM() << " by " << fact_img->getN() << std::endl;
                // If the underlying type is SIMS perform advanced dead time correction and scaling.
                if (fact_img->getSource() == nag::source_t::SIMS) {
                        std::shared_ptr<nag::SIMSpecImage<FP_REAL_T> > 
                                sims_img(std::dynamic_pointer_cast<nag::SIMSpecImage<FP_REAL_T> >(fact_img));
                        sims_img->advancedDeadTimeCorrection();
        
                        sims_img->scaleMatrix();
                        std::cout << "SF = " << sims_img->getSF() << std::endl;
                        std::cout << "K0 = " << sims_img->getK0() << std::endl;
                }
                std::cout << "Image has dimensions (" << fact_img->getXDim() <<
                        ", " << fact_img->getYDim() <<
                        ", " << fact_img->getZDim() << ")" << std::endl;
                std::vector<FP_REAL_T> mzvals = fact_img->getMZVals();
                std::cout << "Minimum m/z value is " << mzvals.front() << std::endl;
                std::cout << "Maximum m/z value is " << mzvals.back() << std::endl;
                // Set some scales to use for CWT
                int nscales = 6;
                std::vector<uint32_t> scales = std::vector<uint32_t>(nscales);
                uint32_t j = 2;
                for (auto i = scales.begin(); i != scales.end(); i++) {
                         *i = j;
                        j *= 2;
                }
                // Select some pixels to use for CWT
                int npix = std::min(size_t(100), fact_img->getM()/16);
                std::vector<size_t> pixels = std::vector<size_t>(npix);
                j = 0;
                for (auto i = pixels.begin(); i != pixels.end(); i++) {
                         *i = j++;
                }
                // Perform CWT using the Mexican hat wavelet and periodic end extension
                //
                //   Note that if we are using limited memory processing this call with throw an
                //   error because we have not yet performed peak selection
                double t1 = omp_get_wtime();
                nag::cwt_results_t<FP_REAL_T> cwt_res = fact_img->cwt(scales, pixels, 
                                nag::wavelet_t::MexicanHat, nag::wavelet_extension_t::Periodic);
                std::cout << "CWT time for " << nscales << " scales and " << npix << " pixels = " << 
                        omp_get_wtime() - t1 << " secs." << std::endl;
                // Apply peak selection
                t1 = omp_get_wtime();
                fact_img->peakSelect();
                std::cout << "Peak selection time = " << omp_get_wtime() - t1 << std::endl;
                mzvals = fact_img->getMZVals();
                std::cout << "After peak selection: ";
                std::cout << "Minimum m/z value is " << mzvals.front() << std::endl;
                std::cout << "After peak selection: ";
                std::cout << "Maximum m/z value is " << mzvals.back() << std::endl;
                // Perform PCA
                int npcs = 15;
                t1 = omp_get_wtime();
                nag::pca_results_t<FP_REAL_T> pca_res = fact_img->pca(npcs, nag::pca_mat_t::Covariance);
                std::cout << "PCA time for " << npcs << " components  = " << omp_get_wtime() - t1 << " secs." << std::endl;
                // Print the variance contributions
                j=0;
                for (auto i = pca_res.eigenvalues.begin(); i != pca_res.eigenvalues.end(); ++i)
                {
                        FP_REAL_T contrib = *i/pca_res.pca_tot;
                        std::cout << "PC " << ++j << " contribution = " << contrib << 
                                " (" << *i << "/" << pca_res.pca_tot << ")" << std::endl;
                }
                // Perform NMF
                int k = 10;
                t1 = omp_get_wtime();
                nag::nmf_results_t<FP_REAL_T> nmf_res = fact_img->nmf(k);
                std::cout << "Time to perform NMF with k = " << k << " = " << omp_get_wtime() - t1 << " secs." << std::endl;
                // Perform t-SNE
                k = 2;
                t1 = omp_get_wtime();
                FP_REAL_T perplexity = 100, theta = 0.5;
                int max_iter = 5;
                nag::tsne_results_t<FP_REAL_T> tsne_res = fact_img->tsne(k, &pca_res, NULL, perplexity, theta, max_iter);
                std::cout << "Time to perform t-SNE with k = " << k << " = " << omp_get_wtime() - t1 << " secs." << std::endl;
                
                // Create a new image from the PCA scores
                std::shared_ptr<nag::SpecImage<FP_REAL_T> > scores_img(new nag::SpecImage<FP_REAL_T>(fact_img->getM(), size_t(npcs), 
                        (const FP_REAL_T *)&pca_res.scores[0]));
                k = 4;
                t1 = omp_get_wtime();
                nag::kmeans_results_t<FP_REAL_T> kmeans_res = scores_img->kmeans(k);
                std::cout << "Time to perform k-means with k = " << k << " = " << omp_get_wtime() - t1 << " secs." << std::endl;
                j = 0;
                for (auto i = kmeans_res.nic.begin(); i != kmeans_res.nic.end(); ++i)
                        std::cout << "Cluster " << ++j << " contains " << *i << " pixels" << std::endl;
        }
        catch (nag::Error e)
        {
                std::cerr << e.message << std::endl;
                return 1;
        }
        return 0;
}