Short Read
==========

.. _Short Read:

.. figure:: _static/subpipelines/shortread.jpeg
   :width: 600

The short read pipeline analyzes unassembled sequencing data to generate a number of useful scientific results, including taxonomic profiles and functional profiles. This pipeline runs on data that has been preprocessed using the preprocessing pipeline.

Modules
^^^^^^^

Kraken2 Taxonomic Profiling
---------------------------

This module generates taxonomic profiles for each sample using `Kraken2 `_. Taxonomic profiles record how many reads in a sample can be confidently assigned to each of a set of microbial taxa. These profiles can be fed into downstream techniques to infer the relative abundance of different species in a sample or set of samples. For most microbiome studies, this information underpins the rest of the results.

A number of different taxonomic profilers exist. Kraken2 is used because it is relatively resource efficient and `performs well on most accuracy benchmarks `_. In the CAP, Kraken2 is run against a large database containing all microbial genomes from RefSeq.

Kraken2 produces two files: a report file that summarizes the number of reads assigned to each taxon (along with some diagnostic metrics), and a read assignment file that records which clade each read was assigned to. An example of the output files from this module may be found on `Pangea `_.

.. autoclass:: cap2.pipeline.short_read.Kraken2

Functional Profiling
--------------------

This module estimates the abundance of microbial metabolic pathways using `HUMAnN `_. These profiles, called functional profiles, are generally used to infer which metabolites a microbiome can process and produce.

The module works by first aligning reads to `UniRef90 `_ using `Diamond `_, then processing the resulting alignments with HUMAnN. Diamond produces one output file: a BLAST tabular file in M8 format. An example of the output file from Diamond may be found on `Pangea `_.

.. autoclass:: cap2.pipeline.short_read.MicaUniref90

.. autoclass:: cap2.pipeline.short_read.Humann2

MASH Sketching
--------------

`MASH `_ generates small sketches of sequencing data that can be used to quickly identify similar samples in an unbiased way. A MASH sketch keeps a preset number of k-mers from a sample, namely those with the smallest hash values (a MinHash sketch); the overlap between the sketches of two samples estimates how similar their k-mer content is.

This module produces two output files: a small sketch of 10,000 k-mers and a large sketch of 10,000,000 k-mers. An example of the output files from this module may be found on `Pangea `_.

.. autoclass:: cap2.pipeline.short_read.Mash

Jellyfish K-mer Counting
------------------------

`Jellyfish `_ counts the number of times each k-mer occurs in a sample to produce a k-mer profile. These profiles are useful for comparing samples in an unbiased way and for searching for particular sequences.

By default the CAP counts canonical 31-mers and 15-mers. All k-mers, including singletons, are counted. K-mer counting occurs after :ref:`error correction`, which substantially reduces the number of singletons compared to raw data.

Jellyfish produces two output files, one for 15-mers and one for 31-mers; both are Jellyfish archives, a custom format read by Jellyfish. An example of the output files from this module may be found on `Pangea `_.

.. autoclass:: cap2.pipeline.short_read.Jellyfish
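
The Jellyfish module counts *canonical* k-mers: a k-mer and its reverse complement are stored under a single key, because a read may have been sequenced from either DNA strand. The following is a minimal Python sketch of what such a count computes; it is not Jellyfish's own implementation, which uses a multithreaded hash table and writes binary archives. The function names ``reverse_complement``, ``canonical`` and ``count_canonical_kmers`` are hypothetical, and skipping k-mers that contain characters other than A, C, G and T (such as ``N``) is an assumption of this sketch.

.. code-block:: python

   from collections import Counter

   # Translation table mapping each DNA base to its Watson-Crick complement.
   _COMPLEMENT = str.maketrans("ACGT", "TGCA")


   def reverse_complement(seq: str) -> str:
       """Return the reverse complement of an uppercase DNA string."""
       return seq.translate(_COMPLEMENT)[::-1]


   def canonical(kmer: str) -> str:
       """Return the lexicographically smaller of a k-mer and its reverse complement."""
       return min(kmer, reverse_complement(kmer))


   def count_canonical_kmers(reads, k: int) -> Counter:
       """Count canonical k-mers across an iterable of reads (hypothetical helper).

       Every k-mer is counted, including those seen only once (singletons).
       Assumption: k-mers containing characters other than A, C, G, T are skipped.
       """
       counts = Counter()
       valid = set("ACGT")
       for read in reads:
           read = read.upper()
           # Slide a window of length k across the read.
           for i in range(len(read) - k + 1):
               kmer = read[i:i + k]
               if set(kmer) <= valid:
                   counts[canonical(kmer)] += 1
       return counts

For example, ``count_canonical_kmers(["ACGTT", "AACGT"], 3)`` returns a counter in which ``ACG`` has a count of 4 (``ACG`` and its reverse complement ``CGT`` share one key) and ``AAC`` has a count of 2 (merged with ``GTT``). Choosing the lexicographically smaller strand as the key is one common convention; any rule that maps both strands of a k-mer to the same key gives the same counts.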