Documentation

Command line interface

panct

panct: A collection of tools for working with pangenomes

panct [OPTIONS] COMMAND [ARGS]...

Options

-v, --version

Show the application’s version and exit.

Default: False

--install-completion: Install completion for the current shell.

--show-completion: Show completion for the current shell, to copy it or customize the installation.

complexity

Compute complexity scores

panct complexity [OPTIONS] GRAPH

Options

--region <region>

A region in which to compute complexity, or a BED file of regions

Default: ''

--metrics <metrics>

Comma-separated list of which complexity metrics to compute. Options: sequniq-normwalk,sequniq-normnode

Default: 'sequniq-normwalk'

-r, --reference <reference>

The ID of the reference sequence in the GFA file

Default: 'GRCh38'

-o, --out <output_file>

Name of output file

Default: PosixPath('/dev/stdout')

-v, --verbosity <verbosity>

The level of verbosity desired

Default: <Verbosity.info: 'INFO'>
Options: CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET

Arguments

GRAPH

Required argument

Path to the .gfa or .gbz file of a pangenome graph
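
The options above can be combined in a single invocation. The following is a minimal Python sketch that assembles such a command line using only the flags documented above. The helper name build_complexity_cmd and the file names are hypothetical, and the sketch only builds and prints the argument list; it does not run panct.

```python
import shlex


def build_complexity_cmd(graph, region=None, metrics=None, reference=None, out=None):
    """Assemble an argument list for `panct complexity` (hypothetical helper)."""
    cmd = ["panct", "complexity"]
    if region is not None:
        # Either a 'chrom:start-end' string or the path to a BED file
        cmd += ["--region", region]
    if metrics:
        # The CLI expects a single comma-separated value
        cmd += ["--metrics", ",".join(metrics)]
    if reference is not None:
        cmd += ["--reference", reference]
    if out is not None:
        cmd += ["--out", out]
    # GRAPH is the one required positional argument
    cmd.append(graph)
    return cmd


# Placeholder file names, for illustration only
cmd = build_complexity_cmd(
    "graph.gbz",
    region="chr1:1000-2000",
    metrics=["sequniq-normwalk", "sequniq-normnode"],
    out="scores.txt",
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run if panct is installed; any option left unset falls back to the defaults listed above.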

walks

Extract walks to a file

panct walks [OPTIONS] GRAPH

Options

-o, --out <output_file>: Name of output file

-v, --verbosity <verbosity>

The level of verbosity desired

Default: <Verbosity.info: 'INFO'>
Options: CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET

Arguments

GRAPH

Required argument

Path to the .gfa file of a pangenome graph

Module contents

panct.data.data module

class panct.data.data.Data(log=None)

Bases: ABC

Abstract class for accessing read-only data files

Attributes:

data: np.array: The contents of the data file, once loaded
log: Logger: A logging instance for recording debug statements.

static hook_compressed(filename, mode)

A utility to help open files regardless of their compression

Based on Python’s fileinput.hook_compressed and copied from https://stackoverflow.com/a/64106815/16815703

Return type:

Union[GzipFile, IO[Any]]

Parameters:

filename: Path | str: The path to the file
mode: str: Either ‘r’ for read or ‘w’ for write

Returns:

gzip.GzipFile | IO[Any]: The resolved file object
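
As an illustration of the idea, the sketch below opens a file through gzip when its name ends in .gz and through the built-in open otherwise. It is not panct's implementation: the function name open_maybe_compressed, the suffix-based detection, and the choice of text mode with UTF-8 are all assumptions made for this example.

```python
import gzip
import tempfile
from pathlib import Path


def open_maybe_compressed(filename, mode="r"):
    """Open a text file, transparently using gzip for names ending in .gz."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if Path(filename).suffix == ".gz":
        # 't' requests text mode; gzip.open defaults to binary otherwise
        return gzip.open(filename, mode + "t", encoding="utf-8")
    return open(filename, mode, encoding="utf-8")


# Round-trip the same content through a plain and a gzipped file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("regions.bed", "regions.bed.gz"):
        path = Path(tmp) / name
        with open_maybe_compressed(path, "w") as handle:
            handle.write("chr1\t0\t10\n")
        with open_maybe_compressed(path, "r") as handle:
            print(name, handle.read().split())
```

Callers can then treat both kinds of file identically, which is the convenience the utility above is described as providing.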

abstract classmethod read(fname)

Read the file contents and perform any recommended pre-processing

Return type:

Data

Parameters:

fname: Path | str: The name of a file to load data from

Returns:

An initialized instance of Data with the data from fname loaded

panct.data.regions module

Utilities for processing regions

class panct.data.regions.Region(chrom, start, end)

Bases: object

Store information about a genomic region

Attributes:

chrom: str: Chromosome
start: int: Start coordinate
end: int: End coordinate

classmethod read(region)

Extract chrom, start, end from coordinate string

Return type:

Region

Parameters:

region: str: Coordinate string in the form ‘chrom:start-end’

Returns:

region: Region: Region object

Raises:

ValueError: If the region string could not be parsed
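
To make the ‘chrom:start-end’ format concrete, here is a small self-contained sketch of such a parser. The class RegionSketch is a hypothetical stand-in written for this example, not panct's Region class, and details such as splitting on the last colon are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionSketch:
    """Hypothetical stand-in holding chrom, start, and end."""

    chrom: str
    start: int
    end: int

    @classmethod
    def read(cls, region):
        """Parse a 'chrom:start-end' string, raising ValueError on failure."""
        try:
            # Split on the last colon so contig names containing ':' survive
            chrom, coords = region.rsplit(":", 1)
            start, end = coords.split("-")
            if not chrom:
                raise ValueError("empty chromosome name")
            return cls(chrom, int(start), int(end))
        except ValueError as err:
            raise ValueError(f"Could not parse region string: {region!r}") from err


print(RegionSketch.read("chr1:100-200"))
```

Any missing separator or non-integer coordinate in this sketch surfaces as a single ValueError, mirroring the documented behavior.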

class panct.data.regions.Regions(data, log=None)

Bases: Data

Store a collection of Region objects

Attributes:

data: tuple[Region]: A tuple of Region objects
log: Logger: A logging instance for recording debug statements.

__iter__()

Return type: Iterator[Region]

classmethod read(fname, log=None)

Extract list of regions from BED file

Return type:

Regions

Parameters:

fname: Path | str: BED file of regions
log: Logger, optional: A Logger object to use for debugging statements

Returns:

Regions: A Regions object loaded with the regions from the BED file

Raises:

ValueError: If a region line could not be parsed to chrom, start, end from the first 3 columns
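
The sketch below shows one way to read chrom, start, and end from the first three tab-separated columns of a BED file. It is an illustration under stated assumptions rather than panct's code: the function name read_bed_regions is hypothetical, plain tuples stand in for Region objects, and skipping blank lines and lines starting with '#' is a choice made for this example.

```python
import io


def read_bed_regions(handle):
    """Return a tuple of (chrom, start, end) from an open BED file handle."""
    regions = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Assumption for this sketch: ignore blank lines and '#' comments
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        try:
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        except (IndexError, ValueError) as err:
            raise ValueError(
                f"Line {lineno}: could not parse chrom, start, end"
            ) from err
        regions.append((chrom, start, end))
    return tuple(regions)


# Columns beyond the third (here, a name) are ignored
bed = io.StringIO("chr1\t100\t200\tgeneA\nchr2\t5\t50\n")
print(read_bed_regions(bed))
```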

panct.complexity module

Compute complexity scores for regions of a pangenome graph

panct.complexity.compute_complexity(node_table, metric)

Compute complexity for a node table. Options:

sequniq-normwalk: sum_n len(n)*p_n*(1-p_n)/L, where L is the average walk length
sequniq-normnode: sum_n len(n)*p_n*(1-p_n)/L, where L is the average node length

Return type:

Optional[float]

Parameters:

node_table: graph_utils.NodeTable: Stores info on lengths/walks through each node
metric: str: Which metric to compute. See description above

Returns:

complexity: float: Complexity score

Raises:

ValueError: If invalid metric specified
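
The two formulas can be worked through on toy data. The sketch below is not panct's implementation and rests on several assumptions that the text above does not state: p_n is taken to be the fraction of walks passing through node n, each walk is assumed to visit a node at most once, the average walk length is computed as the total node length traversed divided by the number of walks, and None is returned when the score is undefined. The function name sequniq and the (length, count) input layout are hypothetical.

```python
def sequniq(nodes, num_walks, metric="sequniq-normwalk"):
    """Sketch of sum_n len(n)*p_n*(1-p_n)/L under the assumptions above.

    nodes: list of (node_length, walks_through_node) pairs (assumed layout)
    num_walks: total number of walks considered
    """
    if metric not in ("sequniq-normwalk", "sequniq-normnode"):
        raise ValueError(f"Invalid metric: {metric}")
    if not nodes or num_walks <= 0:
        return None
    numerator = 0.0
    for length, count in nodes:
        p = count / num_walks  # assumed meaning of p_n
        numerator += length * p * (1 - p)
    if metric == "sequniq-normwalk":
        # Average walk length: bases traversed by all walks / number of walks
        norm = sum(length * count for length, count in nodes) / num_walks
    else:
        # Average node length
        norm = sum(length for length, _ in nodes) / len(nodes)
    return numerator / norm if norm > 0 else None


# Two 10 bp nodes: one shared by all 4 walks, one carried by 2 of them
toy = [(10, 4), (10, 2)]
print(sequniq(toy, 4, "sequniq-normwalk"))
print(sequniq(toy, 4, "sequniq-normnode"))
```

In this toy case only the second node contributes to the numerator, since a node present in every walk has p_n*(1-p_n) = 0.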

panct.complexity.main(graph_file, output_file=PosixPath('/dev/stdout'), region_str=None, metrics='sequniq-normwalk', reference='GRCh38', log=None)

Compute complexity scores for regions of a pangenome graph

If a GFA file is given, compute complexity on the entire file.

If a GBZ file is given, a region (or a BED file listing regions) must be specified.

Parameters:

graph_file: Path: Path to GFA or GBZ file
output_file: str, optional: Path to output file
region_str: str | Path, optional: chrom:start-end of region to process or a BED file of regions
metrics: str, optional: Comma-separated list of metrics to compute
reference: str, optional: Sample ID of reference
log: logging.Logger, optional: Logger object

Returns:

retcode: int: Return code of the program

panct.walks module

Extract walks (W lines) from a GFA file into an indexed tab-separated format

panct.walks.extract_walks(graph, output=None, log=None)

Creates a .walk file mapping nodes in the graph to sample IDs representing haplotypes

Parameters:

graph: Path: The path to a pangenome graph in GFA format
output: Path, optional: The location to which to write output. If not specified, the path to the graph is used, but with a .walk.gz file ending instead.
log: Logger, optional: A logger to which to write messages about progress and any errors
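
To illustrate what mapping nodes to haplotypes involves, the sketch below reads W lines as laid out in GFA 1.1 (W, sample ID, haplotype index, sequence ID, start, end, walk) and collects, for each node, the haplotypes whose walks pass through it. It does not reproduce panct's .walk format: the function name nodes_to_haplotypes and the 'sample:hap' labels are hypothetical choices for this example.

```python
import re
from collections import defaultdict

# A walk is a string of oriented node IDs such as '>1>2<3'
NODE_RE = re.compile(r"[<>]([^<>]+)")


def nodes_to_haplotypes(gfa_lines):
    """Map each node ID to the set of 'sample:hap' labels that traverse it."""
    node_haps = defaultdict(set)
    for line in gfa_lines:
        if not line.startswith("W\t"):
            continue  # only walk lines are of interest here
        fields = line.rstrip("\n").split("\t")
        sample, hap_index, walk = fields[1], fields[2], fields[6]
        for node in NODE_RE.findall(walk):
            node_haps[node].add(f"{sample}:{hap_index}")
    return dict(node_haps)


# A tiny made-up graph: two haplotypes of one sample
lines = [
    "S\t1\tACGT\n",
    "S\t2\tTTGA\n",
    "W\tHG002\t1\tchr1\t0\t8\t>1>2\n",
    "W\tHG002\t2\tchr1\t0\t4\t>1\n",
]
for node, haps in sorted(nodes_to_haplotypes(lines).items()):
    print(node, sorted(haps))
```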