Walks

A .walk file stores the walks (W lines) of a pangenome graph.

Unlike walks stored in the GFA format, walks in the .walk format are organized so that they can be queried quickly by node ID.

Overview

A .walk file is tab-delimited and has three or more columns per line:

Name	Description
empty	The first column is always empty to satisfy `tabix`
node ID	The second column contains the ID of the node
sample IDs	All remaining columns contain the IDs of samples with a haplotype that passes through this node.

Each sample ID has a colon and an integer appended to it. The integer denotes the haplotype (chromosomal strand) of the sample that the walk belongs to. For example, samp1:1 refers to the walk for haplotype 1 of samp1.
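The layout described above can be read with a few lines of Python. This is only a minimal sketch, not an official parser: the names WalkRecord and parse_walk_line are hypothetical, and the code assumes that node IDs are integers (as in the examples on this page) and that every sample column ends in a colon followed by an integer.

```python
from typing import List, NamedTuple, Tuple


class WalkRecord(NamedTuple):
    """One line of a .walk file (hypothetical helper type)."""
    node_id: int
    haplotypes: List[Tuple[str, int]]  # (sample ID, haplotype integer)


def parse_walk_line(line: str) -> WalkRecord:
    """Split one tab-delimited .walk line into its node ID and haplotypes."""
    fields = line.rstrip("\n").split("\t")
    # The first column must be empty and at least one sample column must follow.
    if len(fields) < 3 or fields[0] != "":
        raise ValueError(f"not a valid .walk line: {line!r}")
    # Assumption: node IDs are integers, as in the examples on this page.
    node_id = int(fields[1])
    haplotypes = []
    for entry in fields[2:]:
        # Split on the LAST colon, in case a sample ID itself contains one.
        sample, sep, hap = entry.rpartition(":")
        if not sep:
            raise ValueError(f"missing ':' in sample column: {entry!r}")
        haplotypes.append((sample, int(hap)))
    return WalkRecord(node_id, haplotypes)
```

Calling parse_walk_line on a line whose columns are an empty field, 2, GRCh38:0 and samp1:1 would give node ID 2 with the haplotypes ("GRCh38", 0) and ("samp1", 1).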

Examples

You can find an example of a .walk file without any extra fields in tests/data/basic.walk:

        1       GRCh38:0        samp1:0 samp1:1 samp2:1
        2       GRCh38:0        samp1:0 samp1:1

And here’s the corresponding GFA file:

H       VN:Z:1.1        RS:Z:GRCh38
S       1       ACGTGCTG
S       2       AT
L       1       +       2       +       0M
W       GRCh38  0       chrTest 0       0       >1>2
W       samp1   0       chrTest 0       0       >1>2
W       samp1   1       chrTest 0       0       >1<2
W       samp2   1       chrTest 0       0       >1
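To make the correspondence between the two files concrete, here is a rough Python sketch that derives .walk lines from the W lines of a GFA file. It is an illustration under stated assumptions, not necessarily how the .walk file is actually produced: the function name gfa_walks_to_walk_lines is hypothetical, node IDs are assumed to be integers, nodes are written in ascending order, and samples are listed in the order their W lines appear. Node orientation (> or <) is ignored, because the .walk format only records which haplotypes pass through a node.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def gfa_walks_to_walk_lines(gfa_lines: Iterable[str]) -> List[str]:
    """Build .walk lines (without trailing newlines) from GFA W lines."""
    node_to_labels: Dict[str, List[str]] = defaultdict(list)
    for line in gfa_lines:
        # GFA is tab-delimited; split() also tolerates space-aligned copies.
        fields = line.split()
        if len(fields) < 7 or fields[0] != "W":
            continue  # skip H, S, L and any other record types
        sample, hap_index, walk = fields[1], fields[2], fields[6]
        label = f"{sample}:{hap_index}"
        # A walk such as ">1<2" is a series of oriented node IDs.
        for node in re.findall(r"[<>]([^<>]+)", walk):
            if label not in node_to_labels[node]:
                node_to_labels[node].append(label)
    # Assumption: integer node IDs, written in ascending order.
    ordered = sorted(node_to_labels.items(), key=lambda item: int(item[0]))
    # Each line starts with an empty first column, hence the leading tab.
    return ["\t".join(["", node, *labels]) for node, labels in ordered]
```

Applied to the GFA file above, this sketch yields two lines, one for node 1 and one for node 2, with the same columns as tests/data/basic.walk.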

Compressing and indexing

We encourage you to compress your .walk file with bgzip and index it with tabix whenever possible, if it isn’t already. This reduces both disk usage and the time required to parse the file, but it is entirely optional. You can use the bgzip and tabix commands:

bgzip file.walk
tabix -s 1 -b 2 -e 2 file.walk.gz
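Because bgzip output is also valid gzip, a compressed .walk file can still be read sequentially with Python's standard library. The sketch below is illustrative only: the function name read_walk is hypothetical, it assumes integer node IDs and that a .gz suffix means the file was compressed, and it does not use the tabix index at all.

```python
import gzip
from typing import Dict, List


def read_walk(path: str) -> Dict[int, List[str]]:
    """Map each node ID to the 'sample:haplotype' labels on its line."""
    # Assumption: a ".gz" suffix means the file was compressed with bgzip.
    opener = gzip.open if path.endswith(".gz") else open
    nodes: Dict[int, List[str]] = {}
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            # Column 1 is empty, column 2 is the node ID, the rest are labels.
            nodes[int(fields[1])] = fields[2:]
    return nodes
```

This reads the whole file into memory, which is fine for small graphs. Fetching only a range of node IDs from a large file would instead require a tabix-aware reader that uses the index created above.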