Documentation

Command line interface

panct

panct: A collection of tools for working with pangenomes

panct [OPTIONS] COMMAND [ARGS]...

Options

-v, --version

Show the application’s version and exit.

Default:

False

--install-completion

Install completion for the current shell.

--show-completion

Show completion for the current shell, to copy it or customize the installation.

complexity

Compute complexity scores

panct complexity [OPTIONS] GRAPH

Options

--region <region>

A region in which to compute complexity, or a BED file of regions

Default:

''

--metrics <metrics>

Comma-separated list of complexity metrics to compute. Options: sequniq-normwalk, sequniq-normnode

Default:

'sequniq-normwalk'

-r, --reference <reference>

The ID of the reference sequence in the GFA file

Default:

'GRCh38'

-o, --out <output_file>

Name of output file

Default:

PosixPath('/dev/stdout')

-v, --verbosity <verbosity>

The level of verbosity desired

Default:

<Verbosity.info: 'INFO'>

Options:

CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET

Arguments

GRAPH

Required argument

Path to the .gfa or .gbz file of a pangenome graph
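The options above can be combined into a single invocation. The following is a minimal sketch that only assembles and prints a command line from the documented flags; it does not run panct. The helper name `build_complexity_command` and the file names `graph.gbz` and `scores.tsv` are hypothetical, chosen for illustration.

```python
import shlex


def build_complexity_command(graph, region=None, metrics=None, reference=None, out=None):
    """Assemble a `panct complexity` argument list from the documented options.

    Hypothetical helper: it is not part of panct, it only builds the argv list.
    """
    cmd = ["panct", "complexity"]
    if region is not None:
        cmd += ["--region", region]
    if metrics is not None:
        # --metrics takes a comma-separated list
        cmd += ["--metrics", ",".join(metrics)]
    if reference is not None:
        cmd += ["--reference", reference]
    if out is not None:
        cmd += ["--out", out]
    # GRAPH is the single required positional argument
    cmd.append(graph)
    return cmd


cmd = build_complexity_command(
    "graph.gbz",  # hypothetical input file
    region="chr1:1000-2000",
    metrics=["sequniq-normwalk", "sequniq-normnode"],
    out="scores.tsv",  # hypothetical output file
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a system where panct is installed.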

walks

Extract walks to a file

panct walks [OPTIONS] GRAPH

Options

-o, --out <output_file>

Name of output file

-v, --verbosity <verbosity>

The level of verbosity desired

Default:

<Verbosity.info: 'INFO'>

Options:

CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET

Arguments

GRAPH

Required argument

Path to the .gfa file of a pangenome graph

Module contents

panct.data.data module

class panct.data.data.Data(log=None)

Bases: ABC

Abstract class for accessing read-only data files

Attributes:
data: np.array

The contents of the data file, once loaded

log: Logger

A logging instance for recording debug statements.

static hook_compressed(filename, mode)

A utility to help open files regardless of their compression

Based on Python’s fileinput.hook_compressed and copied from https://stackoverflow.com/a/64106815/16815703

Return type:

Union[GzipFile, IO[Any]]

Parameters:
filename: Path | str

The path to the file

mode: str

Either ‘r’ for read or ‘w’ for write

Returns:
gzip.GzipFile | IO[Any]

The resolved file object
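To illustrate the idea behind this kind of helper, here is a minimal sketch that picks an opener from the file extension, as Python's fileinput.hook_compressed does. The function name `open_maybe_gzip` is hypothetical, and the code is an assumption about the general approach rather than panct's actual implementation.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(filename, mode="r"):
    """Open a file in text mode, using gzip when the name ends in .gz.

    Hypothetical stand-in for a compression-aware opener; `mode` is
    'r' for read or 'w' for write.
    """
    if Path(filename).suffix == ".gz":
        # Append "t" so gzip yields str rather than bytes
        return gzip.open(filename, mode + "t")
    return open(filename, mode)
```

Callers can then read or write `.gz` and plain files through the same code path, for example `with open_maybe_gzip(path) as fh: ...`.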

abstract classmethod read(fname)

Read the file contents and perform any recommended pre-processing

Return type:

Data

Parameters:
fname: Path | str

The name of a file to load data from

Returns:
An initialized instance of Data with the data from fname loaded

panct.data.regions module

Utilities for processing regions

class panct.data.regions.Region(chrom, start, end)

Bases: object

Store information about a genomic region

Attributes:
chrom: str

Chromosome

start: int

Start coordinate

end: int

End coordinate

classmethod read(region)

Extract chrom, start, end from coordinate string

Return type:

Region

Parameters:
region: str

Coordinate string in the form ‘chrom:start-end’

Returns:
region: Region

Region object

Raises:
ValueError

If the region string could not be parsed
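As an illustration of the parsing described here, the sketch below splits a ‘chrom:start-end’ string into its three parts and raises ValueError on malformed input. `SimpleRegion` and `parse_region` are hypothetical stand-ins, not the panct classes, and the exact parsing rules panct applies are an assumption.

```python
from typing import NamedTuple


class SimpleRegion(NamedTuple):
    """Hypothetical stand-in for a region with chrom, start, and end."""

    chrom: str
    start: int
    end: int


def parse_region(region: str) -> SimpleRegion:
    """Parse a coordinate string of the form 'chrom:start-end'."""
    try:
        # Split on the last colon so the chromosome name is kept whole
        chrom, span = region.rsplit(":", 1)
        start, end = span.split("-")
        return SimpleRegion(chrom, int(start), int(end))
    except ValueError as err:
        raise ValueError(f"Could not parse region: {region!r}") from err


print(parse_region("chr1:100-200"))
```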

class panct.data.regions.Regions(data, log=None)

Bases: Data

Store a bunch of Regions

Attributes:
data: tuple[Region]

A bunch of Region objects

log: Logger

A logging instance for recording debug statements.

__iter__()
Return type:

Iterator[Region]

classmethod read(fname, log=None)

Extract list of regions from BED file

Return type:

Regions

Parameters:
fname: Path | str

BED file of regions

log: Logger, optional

A Logger object to use for debugging statements

Returns:
Regions

A Regions object loaded with a bunch of regions

Raises:
ValueError

If a region line could not be parsed to chrom, start, end from the first 3 columns
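The behaviour described above can be sketched as follows: take chrom, start, and end from the first three columns of each line, and raise ValueError when a line cannot be parsed. The function `read_bed_regions` is hypothetical, it returns plain tuples rather than Region objects, and skipping blank and `#` comment lines is an assumption not stated in this documentation.

```python
import io


def read_bed_regions(handle):
    """Collect (chrom, start, end) tuples from the first 3 columns of a BED file.

    Hypothetical sketch; blank lines and '#' comments are skipped (an assumption).
    """
    regions = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        try:
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        except (IndexError, ValueError) as err:
            raise ValueError(f"Could not parse BED line: {line!r}") from err
        regions.append((chrom, start, end))
    return tuple(regions)


bed_text = "chr1\t100\t200\tfeatureA\nchr2\t5\t50\n"
regions = read_bed_regions(io.StringIO(bed_text))
print(regions)
```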

panct.complexity module

Compute complexity scores for regions of a pangenome graph

panct.complexity.compute_complexity(node_table, metric)

Compute complexity for a node table. Options:

sequniq-normwalk: sum_n len(n)*p_n*(1-p_n)/L

where L is the average walk length

sequniq-normnode: sum_n len(n)*p_n*(1-p_n)/L

where L is the average node length

Return type:

Optional[float]

Parameters:
node_table: graph_utils.NodeTable

Stores info on lengths/walks through each node

metric: str

Which metric to compute. See description above

Returns:
complexity: float

Complexity score

Raises:
ValueError

If invalid metric specified
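A worked example may help make the formula sum_n len(n)*p_n*(1-p_n)/L concrete. The sketch below assumes that p_n is the fraction of walks passing through node n, which this documentation does not state explicitly. The function `sequniq` and the toy inputs are hypothetical; the sketch does not use graph_utils.NodeTable, and the normalising length L is passed in directly.

```python
def sequniq(node_lengths, node_walk_counts, total_walks, norm_length):
    """Evaluate sum_n len(n) * p_n * (1 - p_n) / L.

    Assumption: p_n = (walks through node n) / (total walks).
    norm_length is L: the average walk length for sequniq-normwalk,
    or the average node length for sequniq-normnode.
    """
    total = 0.0
    for length, count in zip(node_lengths, node_walk_counts):
        p = count / total_walks
        total += length * p * (1 - p)
    return total / norm_length


# Toy graph: three nodes traversed by 4, 1, and 3 of 4 walks
node_lengths = [10, 4, 6]
node_walk_counts = [4, 1, 3]
total_walks = 4

# sequniq-normnode: L is the average node length
avg_node_length = sum(node_lengths) / len(node_lengths)
normnode = sequniq(node_lengths, node_walk_counts, total_walks, avg_node_length)

# sequniq-normwalk: L is the average walk length, supplied here as a made-up value
avg_walk_length = 15.5
normwalk = sequniq(node_lengths, node_walk_counts, total_walks, avg_walk_length)

print(normnode, normwalk)
```

Under this reading, a node visited by every walk (p_n = 1) or by none (p_n = 0) contributes nothing, so the score reflects only sequence that varies between walks.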

panct.complexity.main(graph_file, output_file=PosixPath('/dev/stdout'), region_str=None, metrics='sequniq-normwalk', reference='GRCh38', log=None)

Compute complexity scores for regions of a pangenome graph

If a GFA file is given, compute complexity on the entire file.

If a GBZ file is given, a region (or a BED file of regions) must be specified.

Parameters:
graph_file: Path

Path to GFA or GBZ file

output_file: str, optional

Path to output file

region_str: str | Path, optional

chrom:start-end of region to process or a BED file of regions

metrics: str, optional

Comma-separated list of metrics to compute

reference: str, optional

Sample ID of reference

log: logging.Logger, optional

Logger object

Returns:
retcode: int

Return code of the program

panct.walks module

Extract walks (W lines) from a GFA file into an indexed tab-separated format

panct.walks.extract_walks(graph, output=None, log=None)

Creates a .walk file mapping nodes in the graph to sample IDs representing haplotypes

Parameters:
graph: Path

The path to a pangenome graph in GFA format

output: Path, optional

The location to which to write output. If not specified, the path to the graph is used, with a .walk.gz file ending instead.

log: Logger, optional

A logging module to which to write messages about progress and any errors
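The mapping from nodes to sample IDs can be illustrated with a short sketch that reads W lines as laid out in GFA 1.1 (record type, sample ID, haplotype index, sequence ID, start, end, walk). The function `nodes_to_samples` is hypothetical; the actual .walk file layout and indexing are not described here, so this only shows how the node-to-sample relationship could be collected in memory.

```python
import re
from collections import defaultdict


def nodes_to_samples(gfa_lines):
    """Map each node ID to the set of sample IDs whose walks visit it.

    Hypothetical sketch: reads only W lines and ignores orientation.
    """
    mapping = defaultdict(set)
    for line in gfa_lines:
        if not line.startswith("W\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        sample, walk = fields[1], fields[6]
        # A walk looks like ">1<3>7": an orientation character, then a node ID
        for node in re.findall(r"[<>]([^<>\s]+)", walk):
            mapping[node].add(sample)
    return mapping


example = [
    "S\t1\tACGT\n",
    "W\tHG001\t1\tchr1\t0\t7\t>1>2\n",
    "W\tHG002\t1\tchr1\t0\t4\t>1<3\n",
]
mapping = nodes_to_samples(example)
print({node: sorted(samples) for node, samples in mapping.items()})
```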