API

Running the Pipeline and High-Level API

cap2.api.run_db_stage(config_path='', cores=1, **kwargs)

Run the database stage of the pipeline.

cap2.api.run_modules(samples, modules, group_modules=[], config_path='', cores=1, workers=1, **kwargs)

Run a set of modules for a list of samples.

cap2.api.run_stage(samples, stage_name, config_path='', cores=1, workers=1, **kwargs)

Run a sub-pipeline on a list of samples. stage_name can be one of qc, pre, or reads.

class cap2.sample.Sample(sample_name, read1, read2=None, kind='short_read')

Thin data collector that represents a sample.

as_tuple()

Return a 3-tuple of strings: (sample_name, read_1_path, read_2_path).

paired

Return True iff this sample contains paired end data.

classmethod samples_from_manifest(manifest)

Return a list of samples from a manifest file handle.
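To make the documented Sample contract concrete, here is a minimal sketch using a hypothetical SampleSketch class, not the cap2 implementation. It assumes that a sample is paired exactly when read2 is given, and it assumes a manifest format of one whitespace-separated "sample_name read1 [read2]" line per sample; check the real manifest format before relying on it.

```python
import io
from typing import List, Optional, TextIO, Tuple


class SampleSketch:
    """Hypothetical stand-in that mirrors the documented Sample interface."""

    def __init__(self, sample_name: str, read1: str,
                 read2: Optional[str] = None, kind: str = 'short_read'):
        self.sample_name = sample_name
        self.read1 = read1
        self.read2 = read2
        self.kind = kind

    @property
    def paired(self) -> bool:
        # Assumption: a sample is paired-end exactly when a second read file is given.
        return self.read2 is not None

    def as_tuple(self) -> Tuple[str, str, Optional[str]]:
        return (self.sample_name, self.read1, self.read2)

    @classmethod
    def samples_from_manifest(cls, manifest: TextIO) -> List['SampleSketch']:
        # Assumed manifest format: one sample per line, whitespace-separated
        # "sample_name read1 [read2]"; blank lines and '#' comments are skipped.
        samples = []
        for line in manifest:
            fields = line.split()
            if not fields or fields[0].startswith('#'):
                continue
            if len(fields) < 2:
                raise ValueError(f'manifest line has no read file: {line!r}')
            read2 = fields[2] if len(fields) > 2 else None
            samples.append(cls(fields[0], fields[1], read2))
        return samples


# Any open text handle works; StringIO stands in for a real manifest file.
manifest = io.StringIO('s1 s1_R1.fastq.gz s1_R2.fastq.gz\ns2 s2.fastq.gz\n')
samples = SampleSketch.samples_from_manifest(manifest)
print([s.paired for s in samples])  # [True, False]
```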

Generic Modules and CAPTasks

class cap2.pipeline.utils.cap_task.BaseCapTask(*args, **kwargs)
MODULE_VERSION = None
check_versions (luigi.parameter.BoolParameter)
config_filename (luigi.parameter.Parameter)
cores (luigi.parameter.IntParameter)
classmethod dependencies()

Return a list of modules this module depends on.

Modules are either other BaseCapTask classes or strings.

get_run_metadata()

Return a dict with metadata about a completed run of this task.

get_run_metadata_filepath()

Return a local filepath with metadata about a completed run of this task.

is_type_of_cap_task(cap_task_type)

Return True iff self is of cap_task_type.

This method lets duck-typed CAP Tasks (e.g. PangeaCapTasks) spoof their type as another CAP Task.
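The point of is_type_of_cap_task is that a plain isinstance check can miss duck-typed tasks. The sketch below uses hypothetical classes (BaseTaskSketch, AlignReadsSketch, RemoteAlignReadsSketch), not cap2 code, and assumes the proxy records the type it impersonates in a spoofed_type attribute; it only illustrates the pattern.

```python
class BaseTaskSketch:
    """Hypothetical base class; not the real BaseCapTask."""

    def is_type_of_cap_task(self, cap_task_type):
        # Default behaviour: an ordinary isinstance check.
        return isinstance(self, cap_task_type)


class AlignReadsSketch(BaseTaskSketch):
    """Hypothetical concrete task."""


class RemoteAlignReadsSketch(BaseTaskSketch):
    """Hypothetical duck-typed proxy that stands in for AlignReadsSketch."""

    spoofed_type = AlignReadsSketch  # assumed attribute naming the impersonated type

    def is_type_of_cap_task(self, cap_task_type):
        # Report the spoofed type in addition to the real class hierarchy.
        return (issubclass(self.spoofed_type, cap_task_type)
                or super().is_type_of_cap_task(cap_task_type))


proxy = RemoteAlignReadsSketch()
print(isinstance(proxy, AlignReadsSketch))          # False
print(proxy.is_type_of_cap_task(AlignReadsSketch))  # True
```

Code that asks "is this task an X?" should therefore call is_type_of_cap_task rather than isinstance, so proxies are recognised.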

max_ram (luigi.parameter.IntParameter)
module_description = 'No description for this module.'
classmethod module_name()

Return a string giving the human readable name for this task.

run()

Run an instance of this task.

run_cmd(cmd)
classmethod short_version_hash()

Return a 12-character hash string giving the version of this task and all upstream tasks.

tool_version()

Return a string giving the version of any software this task uses.

classmethod version()

Return a string giving a human readable version for this module only.

classmethod version_hash()

Return a hash string giving the version of this task and all upstream tasks.
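A version hash that covers "this task and all upstream tasks" can be built recursively over dependencies(). The sketch below is an illustration under stated assumptions, not cap2's actual hashing scheme: it assumes SHA-256 over the class name, version() and each dependency's hash, that string dependencies contribute their literal text, and that the 12-character short hash is a prefix of the full one. TaskSketch, QCSketch, AlignSketch and the 'fastqc' dependency are hypothetical.

```python
import hashlib


class TaskSketch:
    """Hypothetical base exposing version(), dependencies() and the hash helpers."""

    MODULE_VERSION = '0.0.0'

    @classmethod
    def version(cls):
        return cls.MODULE_VERSION

    @classmethod
    def dependencies(cls):
        return []

    @classmethod
    def version_hash(cls):
        # Fold this module's name and version together with the hash of every
        # upstream dependency, so any upstream change propagates downstream.
        # Assumption: string dependencies contribute their literal text.
        parts = [cls.__name__, cls.version()]
        for dep in cls.dependencies():
            parts.append(dep if isinstance(dep, str) else dep.version_hash())
        return hashlib.sha256('|'.join(parts).encode('utf-8')).hexdigest()

    @classmethod
    def short_version_hash(cls):
        # Assumption: the 12-character form is a prefix of the full hash.
        return cls.version_hash()[:12]


class QCSketch(TaskSketch):
    MODULE_VERSION = '1.0'

    @classmethod
    def dependencies(cls):
        return ['fastqc']  # hypothetical string dependency


class AlignSketch(TaskSketch):
    MODULE_VERSION = '2.0'

    @classmethod
    def dependencies(cls):
        return [QCSketch]


before = AlignSketch.short_version_hash()
QCSketch.MODULE_VERSION = '1.1'  # bump only the upstream module
after = AlignSketch.short_version_hash()
print(len(before), before != after)  # 12 True
```

The usage shows the property that matters for caching results: bumping an upstream module changes the downstream module's hash even though the downstream version string is unchanged.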

classmethod version_tree(terminal=True)

Return a Newick-format tree of module versions.
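One way to render the dependency graph as a Newick tree is a recursive walk in which each module becomes a node labelled with its name and version. This is a sketch with hypothetical classes (NodeSketch, Fastqc, Trim) and an assumed "name-version" label style; it also assumes that terminal=True means "outermost call, close the tree with ';'", which the documentation does not state.

```python
class NodeSketch:
    """Hypothetical task base exposing module_name(), dependencies() and version_tree()."""

    MODULE_VERSION = '0.0.0'

    @classmethod
    def module_name(cls):
        return cls.__name__

    @classmethod
    def dependencies(cls):
        return []

    @classmethod
    def version_tree(cls, terminal=True):
        # Each module becomes a Newick node labelled "name-version" whose
        # children are its dependencies; string dependencies become bare leaves.
        label = f'{cls.module_name()}-{cls.MODULE_VERSION}'
        children = [dep if isinstance(dep, str) else dep.version_tree(terminal=False)
                    for dep in cls.dependencies()]
        tree = f'({",".join(children)}){label}' if children else label
        # Assumption: terminal=True marks the outermost call, which closes the tree with ';'.
        return tree + ';' if terminal else tree


class Fastqc(NodeSketch):
    MODULE_VERSION = '1.0'


class Trim(NodeSketch):
    MODULE_VERSION = '2.1'

    @classmethod
    def dependencies(cls):
        return [Fastqc, 'adapters']  # 'adapters' is a hypothetical string dependency


print(Trim.version_tree())  # (Fastqc-1.0,adapters)Trim-2.1;
```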

class cap2.pipeline.utils.cap_task.CapTask(*args, **kwargs)

Base class for CAP2 tasks.

Currently implements some basic shared logic.

data_type (luigi.parameter.Parameter)
classmethod from_cap_task(other, **kwargs)
classmethod from_sample(sample, config_path, cores=1)

Return an instance of this module from a Sample.

get_target(field_name, ext)

Return a luigi LocalTarget for this instance.

paired

Return True iff this instance is processing paired-end data.

pe1 (luigi.parameter.Parameter)
pe2 (luigi.parameter.Parameter)
sample_name (luigi.parameter.Parameter)
version_exists(version_str, version_hash)

Return True iff this version of this module already exists.
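The CapTask parameters sample_name, pe1 and pe2 line up with the fields of a Sample, which is what from_sample and paired rely on. The sketch below uses plain dataclasses (SampleStub, TaskStub are hypothetical) instead of luigi parameters, and it assumes that read1 maps to pe1, read2 maps to pe2, and that an empty or missing pe2 means single-end data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleStub:
    """Hypothetical stand-in for cap2.sample.Sample."""
    sample_name: str
    read1: str
    read2: Optional[str] = None


@dataclass
class TaskStub:
    """Hypothetical task carrying the documented sample_name, pe1, pe2,
    config_filename and cores parameters as plain fields."""
    sample_name: str
    pe1: str
    pe2: Optional[str]
    config_filename: str
    cores: int = 1

    @classmethod
    def from_sample(cls, sample, config_path, cores=1):
        # Assumption: read1 maps to pe1 and read2 maps to pe2.
        return cls(sample_name=sample.sample_name, pe1=sample.read1,
                   pe2=sample.read2, config_filename=config_path, cores=cores)

    @property
    def paired(self) -> bool:
        # Assumption: a missing or empty pe2 means single-end data.
        return bool(self.pe2)


single = TaskStub.from_sample(SampleStub('s1', 's1.fastq.gz'), 'config.yaml', cores=4)
pair = TaskStub.from_sample(SampleStub('s2', 's2_R1.fastq.gz', 's2_R2.fastq.gz'), 'config.yaml')
print(single.paired, pair.paired)  # False True
```

Using bool(self.pe2) rather than "is not None" is a deliberate choice in this sketch: string-valued parameters often default to an empty string rather than None.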

Configuration

class cap2.pipeline.config.PipelineConfig(filename)
DB_MODE_BUILD = 'build'
DB_MODE_DOWNLOAD = 'download'
allowed_versions(module)

Return a list of the allowed versions for the specified module.
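A rough sketch of how a file-backed config could answer allowed_versions. Everything about the file is assumed here: PipelineConfigSketch is hypothetical, the sketch uses JSON for a stdlib-only example (the real config format may differ), the "allowed_versions" key layout is invented for illustration, modules are looked up by name, and 'kraken2' and 'humann2' are placeholder module names.

```python
import json
import os
import tempfile
from typing import List


class PipelineConfigSketch:
    """Hypothetical stand-in for PipelineConfig; the JSON layout is assumed."""

    DB_MODE_BUILD = 'build'
    DB_MODE_DOWNLOAD = 'download'

    def __init__(self, filename: str):
        with open(filename) as handle:
            self.blob = json.load(handle)

    def allowed_versions(self, module: str) -> List[str]:
        # Assumed layout: {"allowed_versions": {"<module name>": ["<version>", ...]}}
        # A module missing from the file gets an empty list.
        return list(self.blob.get('allowed_versions', {}).get(module, []))


# Usage: write a throwaway config file, load it, then clean up.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump({'allowed_versions': {'kraken2': ['0.1.0', '0.2.0']}}, tmp)
config = PipelineConfigSketch(tmp.name)
os.remove(tmp.name)
print(config.allowed_versions('kraken2'))  # ['0.1.0', '0.2.0']
print(config.allowed_versions('humann2'))  # []
```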