CAPalyzer

The CAPalyzer is a package to parse and process the results produced by the main CAP pipeline. It performs two key tasks:

  • builds data tables that summarize the results of modules for multiple samples
  • parses those data tables and provides processing tools

API

Table Builder

class cap2.capalyzer.table_builder.cap_table_builder.CAPFileSource

This abstract class provides an interface to get filepaths and other raw data.

metadata()

Return a DataFrame containing metadata for the samples.

module_files(module_name, field_name)

Return an iterable of 2-tuples (sample_name, local_path) for modules of the specified type.

sample_names()

Return a list of sample names (strings).
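
As an illustration of the interface described above, here is a minimal sketch of a file source backed by in-memory dicts. It does not import cap2: FileSourceInterface is a hypothetical stand-in that only mirrors the three documented method names, and InMemoryFileSource, the "taxa" module name, the "report" field name, and the paths are all assumptions made for the example.

```python
from abc import ABC, abstractmethod

import pandas as pd


class FileSourceInterface(ABC):
    """Hypothetical stand-in mirroring the three documented methods."""

    @abstractmethod
    def sample_names(self):
        """Return a list of sample names (strings)."""

    @abstractmethod
    def metadata(self):
        """Return a DataFrame containing metadata for the samples."""

    @abstractmethod
    def module_files(self, module_name, field_name):
        """Return an iterable of (sample_name, local_path) 2-tuples."""


class InMemoryFileSource(FileSourceInterface):
    """Hypothetical file source backed by plain Python dicts."""

    def __init__(self, paths, metadata_rows):
        # paths: {sample_name: {(module_name, field_name): local_path}}
        # metadata_rows: {sample_name: {column: value}}
        self._paths = paths
        self._metadata_rows = metadata_rows

    def sample_names(self):
        return sorted(self._paths)

    def metadata(self):
        # One row per sample, indexed by sample name.
        return pd.DataFrame.from_dict(self._metadata_rows, orient="index")

    def module_files(self, module_name, field_name):
        # Yield a pair only for samples that actually have the requested file.
        for sample_name in self.sample_names():
            local_path = self._paths[sample_name].get((module_name, field_name))
            if local_path is not None:
                yield sample_name, local_path


# Made-up sample data, for illustration only.
source = InMemoryFileSource(
    paths={
        "sample_a": {("taxa", "report"): "/tmp/sample_a.report.tsv"},
        "sample_b": {},
    },
    metadata_rows={"sample_a": {"site": "north"}, "sample_b": {"site": "south"}},
)
```

In this sketch, samples without a matching file are skipped by module_files rather than reported with an empty path; whether the real implementations behave this way is not stated here.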

class cap2.capalyzer.table_builder.CAPTableBuilder(name, file_source)

This class builds summary tables for a set of samples.

covid_fast_detect_read_counts()

Return a table of read counts by taxa from covid fast detect.

fast_taxa_read_counts()

Return a table of read counts by taxa from fast taxa.

metadata()

Return a metadata table for these samples.

sample_names()

Return a list of sample names (strings).

strain_pileup(organism, sparse=1)

Return a table of pileups for a strain.

taxa_read_counts()

Return a table of read counts by taxa.
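
To make the idea of a multi-sample summary table concrete, the sketch below stacks per-sample read counts into one samples-by-taxa DataFrame. It is not the CAPTableBuilder implementation: build_count_table and parse_toy_report are hypothetical helpers, and the two-column "taxon, tab, count" file layout is a toy format invented for the example, not the real CAP report format.

```python
import os
import tempfile

import pandas as pd


def build_count_table(sample_files, parse_file):
    """Stack per-sample count dicts into a samples x taxa table (hypothetical)."""
    rows = {}
    for sample_name, local_path in sample_files:
        rows[sample_name] = parse_file(local_path)
    table = pd.DataFrame.from_dict(rows, orient="index")
    # Taxa absent from a sample show up as NaN; treat them as zero reads.
    return table.fillna(0)


def parse_toy_report(local_path):
    """Parse a toy 'taxon<TAB>count' file (not the real CAP report format)."""
    counts = {}
    with open(local_path) as handle:
        for line in handle:
            taxon, count = line.rstrip("\n").split("\t")
            counts[taxon] = int(count)
    return counts


# Write two toy reports to a temporary directory and build the table.
toy_reports = {
    "sample_a": "E. coli\t5\n",
    "sample_b": "E. coli\t2\nS. aureus\t7\n",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_files = []
    for sample_name, text in toy_reports.items():
        local_path = os.path.join(tmp_dir, sample_name + ".tsv")
        with open(local_path, "w") as handle:
            handle.write(text)
        sample_files.append((sample_name, local_path))
    table = build_count_table(sample_files, parse_toy_report)
```

The (sample_name, local_path) pairs have the same shape as the output documented for module_files, which is why the helper accepts them directly.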

cap2.capalyzer.table_builder.parsers.parse_pileup(local_path, sparse=1)

Return a pandas dataframe with info from a pileup file.

sparse is an int >= 1. If sparse is > 1, values will be averaged, making the table smaller.
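
One plausible reading of the sparse parameter is that consecutive groups of sparse values are replaced by their mean. The sketch below illustrates that reading only; average_in_blocks is a hypothetical helper, and the actual parse_pileup may group positions or handle a trailing partial group differently.

```python
def average_in_blocks(values, sparse=1):
    """Average consecutive blocks of `sparse` values (hypothetical helper)."""
    if sparse < 1:
        raise ValueError("sparse must be an int >= 1")
    averaged = []
    for start in range(0, len(values), sparse):
        # The final block may be shorter than `sparse`; average what is there.
        block = values[start:start + sparse]
        averaged.append(sum(block) / len(block))
    return averaged


coverage = [10, 20, 30, 40, 50]
print(average_in_blocks(coverage, sparse=1))  # [10.0, 20.0, 30.0, 40.0, 50.0]
print(average_in_blocks(coverage, sparse=2))  # [15.0, 35.0, 50.0]
```

With sparse=2, five values shrink to three rows, which is the size reduction the parameter description refers to.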

cap2.capalyzer.table_builder.parsers.parse_taxa_report(local_path, **kwargs)