Walks

A .walk file stores the walks (W lines) of a pangenome graph.

Unlike walks stored in the GFA format, walks in the .walk format are organized by node, which is designed to support faster querying by node ID.

Overview

A .walk file is tab-delimited and has three or more columns per line:

Name        Description
empty       The first column is always empty to satisfy tabix.
node ID     The second column contains the ID of the node.
sample IDs  All remaining columns contain the IDs of samples with a haplotype that passes through this node. Each sample ID has a colon and an integer appended to it. The integer denotes the chromosomal strand of the sample that the walk belongs to (the value in the third column of the corresponding W line).
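As a minimal sketch of how this layout might be read, the following Python function splits one line into its node ID and its (sample ID, integer) pairs. The function name parse_walk_line is hypothetical and is not part of any library; the sketch also assumes that node IDs are integers, as in the examples on this page.

```python
from typing import List, Tuple


def parse_walk_line(line: str) -> Tuple[int, List[Tuple[str, int]]]:
    """Split one .walk line into its node ID and (sample ID, integer) pairs.

    Hypothetical helper for illustration; assumes integer node IDs.
    """
    # Strip only the newline: the line starts with a tab because the
    # first column is empty, so a plain strip() would shift the columns.
    fields = line.rstrip("\n").split("\t")
    node_id = int(fields[1])
    haplotypes = []
    for entry in fields[2:]:
        # Split on the last colon in case a sample ID contains one itself.
        sample, _, hap = entry.rpartition(":")
        haplotypes.append((sample, int(hap)))
    return node_id, haplotypes


print(parse_walk_line("\t1\tGRCh38:0\tsamp1:0\tsamp1:1\tsamp2:1"))
# (1, [('GRCh38', 0), ('samp1', 0), ('samp1', 1), ('samp2', 1)])
```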

Examples

You can find an example of a .walk file without any extra fields in tests/data/basic.walk:

        1       GRCh38:0        samp1:0 samp1:1 samp2:1
        2       GRCh38:0        samp1:0 samp1:1

And here’s the corresponding GFA file:

H       VN:Z:1.1        RS:Z:GRCh38
S       1       ACGTGCTG
S       2       AT
L       1       +       2       +       0M
W       GRCh38  0       chrTest 0       0       >1>2
W       samp1   0       chrTest 0       0       >1>2
W       samp1   1       chrTest 0       0       >1<2
W       samp2   1       chrTest 0       0       >1
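
To make the correspondence between the two files concrete, here is a rough Python sketch that rebuilds the .walk lines above from the W lines of the GFA. It is an illustration only and not necessarily how the conversion is actually implemented: the function name gfa_to_walk_lines is hypothetical, node IDs are assumed to be integers, and the order of sample IDs within a line is assumed to follow the order of the W lines. As in the example, the orientation of each step (> or <) is not recorded.

```python
import re
from collections import defaultdict


def gfa_to_walk_lines(gfa_lines):
    """Invert GFA W lines into one .walk line per node (hypothetical helper)."""
    nodes = defaultdict(list)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "W":
            continue  # only W lines describe walks
        # W line columns: sample ID, haplotype index, ..., walk (7th column)
        label = f"{fields[1]}:{fields[2]}"
        # Each step of the walk is an orientation (> or <) plus a node ID.
        for node in re.findall(r"[<>]([^<>]+)", fields[6]):
            if label not in nodes[int(node)]:
                nodes[int(node)].append(label)
    # The first column is left empty, followed by the node ID and sample IDs.
    return ["\t".join(["", str(n)] + nodes[n]) for n in sorted(nodes)]


gfa = [
    "S\t1\tACGTGCTG",
    "W\tGRCh38\t0\tchrTest\t0\t0\t>1>2",
    "W\tsamp1\t0\tchrTest\t0\t0\t>1>2",
    "W\tsamp1\t1\tchrTest\t0\t0\t>1<2",
    "W\tsamp2\t1\tchrTest\t0\t0\t>1",
]
for walk_line in gfa_to_walk_lines(gfa):
    print(walk_line)
```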

Compressing and indexing

We encourage you to compress your .walk file with bgzip and index it with tabix if it isn’t already. Doing so reduces both disk usage and the time required to parse the file, but it is entirely optional. You can use the bgzip and tabix commands:

bgzip file.walk
tabix -s 1 -b 2 -e 2 file.walk.gz
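
Because bgzip output is also valid gzip, a compressed .walk file can still be read with ordinary gzip tools. The Python sketch below looks up one node with a plain linear scan using only the standard library; it does not use the tabix index, so it only illustrates the file layout and does not demonstrate indexed querying. The function name samples_at_node is hypothetical, and node IDs are assumed to be integers.

```python
import gzip


def samples_at_node(path, node_id):
    """Return the sample ID columns for node_id in a compressed .walk file.

    Hypothetical helper: scans the file line by line, ignoring any index.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            if int(fields[1]) == node_id:
                return fields[2:]
    return []  # the node was not found
```

For example, samples_at_node("file.walk.gz", 2) would return the sample ID columns of the line for node 2.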