Documentation
Command line interface
panct
panct: A collection of tools for working with pangenomes
panct [OPTIONS] COMMAND [ARGS]...
Options
- -v, --version
Show the application’s version and exit.
- Default:
False
- --install-completion
Install completion for the current shell.
- --show-completion
Show completion for the current shell, to copy it or customize the installation.
complexity
Compute complexity scores
panct complexity [OPTIONS] GRAPH
Options
- --region <region>
A region in which to compute complexity, or a BED file of regions
- Default:
''
- --metrics <metrics>
Comma-separated list of which complexity metrics to compute. Options: sequniq-normwalk,sequniq-normnode
- Default:
'sequniq-normwalk'
- -r, --reference <reference>
The ID of the reference sequence in the GFA file
- Default:
'GRCh38'
- -o, --out <output_file>
Name of output file
- Default:
PosixPath('/dev/stdout')
- -v, --verbosity <verbosity>
The level of verbosity desired
- Default:
<Verbosity.info: 'INFO'>
- Options:
CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET
Arguments
- GRAPH
Required argument
Path to the .gfa or .gbz file of a pangenome graph
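The options above can be combined into a single invocation. The following is a minimal sketch that assembles such a command from Python; the helper build_complexity_command and the file names graph.gbz and complexity.out are hypothetical and are not part of panct. Only flags listed above are used, and the command is executed only if panct is on the PATH and the graph file exists.

```python
import shutil
import subprocess
from pathlib import Path


def build_complexity_command(graph, region=None, metrics=None, reference=None, out=None):
    """Assemble an argument list for `panct complexity` (hypothetical helper)."""
    cmd = ["panct", "complexity"]
    if region:
        # Either chrom:start-end or the path to a BED file
        cmd += ["--region", str(region)]
    if metrics:
        # The CLI expects a comma-separated list
        cmd += ["--metrics", ",".join(metrics)]
    if reference:
        cmd += ["--reference", reference]
    if out:
        cmd += ["--out", str(out)]
    cmd.append(str(graph))  # GRAPH is the single required positional argument
    return cmd


cmd = build_complexity_command(
    Path("graph.gbz"),  # hypothetical input file
    region="chr1:1000-2000",
    metrics=["sequniq-normwalk", "sequniq-normnode"],
    out="complexity.out",  # hypothetical output file
)
print(" ".join(cmd))

# Run the command only when panct is installed and the graph is present
if shutil.which("panct") and Path("graph.gbz").exists():
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before execution.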
walks
Extract walks to a file
panct walks [OPTIONS] GRAPH
Options
- -o, --out <output_file>
Name of output file
- -v, --verbosity <verbosity>
The level of verbosity desired
- Default:
<Verbosity.info: 'INFO'>
- Options:
CRITICAL | ERROR | WARNING | INFO | DEBUG | NOTSET
Arguments
- GRAPH
Required argument
Path to the .gfa file of a pangenome graph
Module contents
panct.data.data module
- class panct.data.data.Data(log=None)
Bases:
ABC
Abstract class for accessing read-only data files
- Attributes:
- data: np.array
The contents of the data file, once loaded
- log: Logger
A logging instance for recording debug statements.
- static hook_compressed(filename, mode)
A utility to help open files regardless of their compression
Based on Python's fileinput.hook_compressed and copied from https://stackoverflow.com/a/64106815/16815703
- Return type:
Union[GzipFile, IO[Any]]
- Parameters:
- filename: Path | str
The path to the file
- mode: str
Either ‘r’ for read or ‘w’ for write
- Returns:
- gzip.GzipFile | IO[Any]
The resolved file object
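To illustrate the idea behind such a helper, here is a minimal standalone sketch that picks an opener from the file extension. It is not the implementation of Data.hook_compressed: the function name open_maybe_gzip is hypothetical, it handles only .gz files, and it assumes text mode.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(filename, mode="r"):
    """Open a file in text mode, using gzip when the name ends in .gz.

    Hypothetical helper for illustration; mode is 'r' or 'w'.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if Path(filename).suffix == ".gz":
        # 't' requests text mode so callers read and write str, not bytes
        return gzip.open(filename, mode + "t")
    return open(filename, mode)
```

With a helper like this, calling code can read a BED file the same way whether or not it is compressed, for example `with open_maybe_gzip("regions.bed.gz") as f: ...` (the file name is hypothetical).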
panct.data.regions module
Utilities for processing regions
- class panct.data.regions.Region(chrom, start, end)
Bases:
object
Store information about a genomic region
- Attributes:
- chrom: str
Chromosome
- start: int
Start coordinate
- end: int
End coordinate
- class panct.data.regions.Regions(data, log=None)
Bases:
Data
Store a bunch of Regions
- Attributes:
- data: tuple[Region]
A bunch of Region objects
- log: Logger
A logging instance for recording debug statements.
- classmethod read(fname, log=None)
Extract list of regions from BED file
- Return type:
Regions
- Parameters:
- fname: Path | str
BED file of regions
- log: Logger, optional
A Logger object to use for debugging statements
- Returns:
- Regions
A Regions object loaded with a bunch of regions
- Raises:
- ValueError
If a line cannot be parsed into chrom, start, and end from its first three columns
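The parsing behavior described for Regions.read can be sketched without panct as follows. This is an illustration only: BedRegion and parse_bed_lines are hypothetical names, and skipping blank lines and lines starting with "#", "track", or "browser" is an assumption about header handling, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BedRegion:
    """Hypothetical stand-in for a region with chrom, start, and end."""
    chrom: str
    start: int
    end: int


def parse_bed_lines(lines):
    """Parse the first three tab-separated columns of each BED line."""
    regions = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Assumption: blank lines and common header lines are skipped
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        try:
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        except (IndexError, ValueError) as err:
            raise ValueError(f"Could not parse BED line {lineno}: {line!r}") from err
        regions.append(BedRegion(chrom, start, end))
    return tuple(regions)
```

Returning a tuple mirrors the tuple[Region] type documented for the data attribute.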
panct.complexity module
Compute complexity scores for regions of a pangenome graph
- panct.complexity.compute_complexity(node_table, metric)
Compute complexity for a node table. Options:
- sequniq-normwalk: sum_n len(n)*p_n*(1-p_n)/L
where L is the average walk length
- sequniq-normnode: sum_n len(n)*p_n*(1-p_n)/L
where L is the average node length
- Return type:
Optional[float]
- Parameters:
- node_table: graph_utils.NodeTable
Stores info on lengths/walks through each node
- metric: str
Which metric to compute. See description above
- Returns:
- complexity: float
Complexity score
- Raises:
- ValueError
If invalid metric specified
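The two metrics share one formula and differ only in the normalizer L. The sketch below works through that arithmetic on a toy graph. It does not use graph_utils.NodeTable: the function sequniq_score and its list-based inputs are hypothetical, p_n is assumed to be the fraction of walks that pass through node n, and returning None for a non-positive L is an assumption about when the score is undefined.

```python
from statistics import mean


def sequniq_score(node_lengths, node_freqs, norm_length):
    """Return sum_n len(n) * p_n * (1 - p_n) / L, or None if L is not positive."""
    if len(node_lengths) != len(node_freqs):
        raise ValueError("node_lengths and node_freqs must have the same length")
    if norm_length <= 0:
        return None  # assumption: an undefined score is reported as None
    total = sum(length * p * (1 - p) for length, p in zip(node_lengths, node_freqs))
    return total / norm_length


# Toy graph: three nodes and two walks.
# Walk A visits all three nodes; walk B skips the middle node.
node_lengths = [10, 5, 10]
node_freqs = [1.0, 0.5, 1.0]  # assumed meaning of p_n: fraction of walks through n
walk_lengths = [25, 20]

# sequniq-normwalk: L is the average walk length
normwalk = sequniq_score(node_lengths, node_freqs, mean(walk_lengths))
# sequniq-normnode: L is the average node length
normnode = sequniq_score(node_lengths, node_freqs, mean(node_lengths))
print(normwalk, normnode)
```

Nodes traversed by every walk (p_n = 1) or by none (p_n = 0) contribute nothing to the sum, so only the variable middle node affects the score here.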
- panct.complexity.main(graph_file, output_file=PosixPath('/dev/stdout'), region_str=None, metrics='sequniq-normwalk', reference='GRCh38', log=None)
Compute complexity scores for regions of a pangenome graph
If a GFA file is given, complexity is computed over the entire file.
If a GBZ file is given, a region (or a BED file of regions) must be specified.
- Parameters:
- graph_file: Path
Path to GFA or GBZ file
- output_file: str, optional
Path to output file
- region_str: str | Path, optional
chrom:start-end of region to process or a BED file of regions
- metrics: str, optional
Comma-separated list of metrics to compute
- reference: str, optional
Sample ID of reference
- log: logging.Logger, optional
Logger object
- Returns:
- retcode: int
Return code of the program
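The chrom:start-end form accepted by region_str can be split into its parts as sketched below. This is a standalone illustration rather than the parser panct uses; the name parse_region is hypothetical, and rejecting regions whose start exceeds their end is an assumption.

```python
import re

# The greedy chromosome group allows names that themselves contain ':'
_REGION_RE = re.compile(r"^(?P<chrom>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region_str):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    match = _REGION_RE.match(region_str.strip())
    if match is None:
        raise ValueError(f"Expected chrom:start-end, got {region_str!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        # Assumption: a region must not end before it starts
        raise ValueError(f"Region start {start} is greater than end {end}")
    return match["chrom"], start, end
```

A string that does not match this pattern could then be treated as the path to a BED file, which is the other form region_str accepts.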
panct.walks module
Extract walks (W lines) from a GFA file into an indexed tab-separated format
- panct.walks.extract_walks(graph, output=None, log=None)
Creates a .walk file mapping nodes in the graph to sample IDs representing haplotypes
- Parameters:
- graph: Path
The path to a pangenome graph in GFA format
- output: Path, optional
The location to which to write output. If not specified, the path to the graph is used, with a .walk.gz file ending instead.
- log: Logger, optional
A Logger object to which to write messages about progress and any errors
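To make the node-to-haplotype mapping concrete, the sketch below reads W lines and collects, for each node, the haplotypes whose walks visit it. It assumes the GFA 1.1 walk-line layout (W, sample ID, haplotype index, sequence ID, start, end, walk) and represents a haplotype as a (sample, haplotype index) pair. The function nodes_to_haplotypes is hypothetical, and the sketch does not reproduce the layout of the .walk file, which is not described here.

```python
import re
from collections import defaultdict

# A walk is a series of oriented steps such as '>1<2>3'
_STEP_RE = re.compile(r"[<>]([^<>\s]+)")


def nodes_to_haplotypes(gfa_lines):
    """Map each node ID to the set of (sample, hap_index) pairs that visit it."""
    mapping = defaultdict(set)
    for line in gfa_lines:
        if not line.startswith("W\t"):
            continue  # ignore headers, segments, links, and other record types
        fields = line.rstrip("\n").split("\t")
        sample, hap_index, walk = fields[1], int(fields[2]), fields[6]
        for node in _STEP_RE.findall(walk):
            mapping[node].add((sample, hap_index))
    return dict(mapping)


# Tiny in-memory graph with made-up sample names
example = [
    "H\tVN:Z:1.1",
    "S\t1\tACGT",
    "S\t2\tGG",
    "W\tsampleA\t1\tchr1\t0\t6\t>1>2",
    "W\tsampleB\t2\tchr1\t0\t4\t>1",
]
print(nodes_to_haplotypes(example))
```

Orientation markers are discarded because only node membership matters for this mapping.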