Data Input

PopART reads Nexus-format alignments. An example (including the optional trees and traits blocks) can be found here. Currently binary ("Standard") and nucleotide data types are supported.

Your data file must have an alignment, either in Taxa + Characters blocks, or a Data block. If you want the nodes to be coloured according to phylogeographic or phenotype information (or in some other sensible way), you need to include a Traits block as well. NEW: you can also include geotagged sequences in the GeoTags block. These will be used to cluster sequences into groups that can be used to colour networks. See the example for the syntax for Traits and GeoTags blocks. In both cases, coordinates may be given as negative numbers to indicate southern or western hemispheres, or hemisphere may be specified with a letter (e.g. 25.61S or 76W).
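
The two coordinate conventions described above can be illustrated with a short Python sketch. This is not PopART's own parser; the function name parse_coordinate is hypothetical, and it assumes a hemisphere letter (N, S, E, or W), when present, is the last character of the value.

```python
def parse_coordinate(text):
    """Convert a coordinate such as '25.61S', '76W' or '-25.61' to a signed float.

    Assumption: a hemisphere letter, if given, is the final character.
    Southern and western hemispheres map to negative numbers.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty coordinate")
    suffix = text[-1].upper()
    if suffix in ("N", "S", "E", "W"):
        value = float(text[:-1])
        if value < 0:
            # A sign and a hemisphere letter together would be ambiguous.
            raise ValueError("use either a sign or a hemisphere letter, not both")
        return -value if suffix in ("S", "W") else value
    # No hemisphere letter: the sign already carries the hemisphere.
    return float(text)


print(parse_coordinate("25.61S"))
print(parse_coordinate("76W"))
print(parse_coordinate("-25.61"))
```

Under this convention, 25.61S and -25.61 describe the same latitude, so either form can be used in a Traits or GeoTags block.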

You can also include a Trees block, if you want to be able to infer Ancestral Parsimony networks. This method is designed to use all of the most parsimonious trees. If you try it with some other set of trees (e.g., trees produced by another inference method, or trees within 1 or 2 steps of the MP tree), please let me know how it goes.

At the moment you can't use PopART to edit either the alignment or the traits data, but this feature will be added at some point.

NEW: PopART can now import alignments in Phylip formats and traits or geotags in tabular formats (tab, space, or comma-delimited).

The Networks

PopART includes several network inference methods. Minimum Spanning, Median Joining, and TCS are all popular methods that have been described elsewhere. Ancestral Parsimony, Integer Neighbor-Joining, and Tight Span Walker are new methods that I'll describe briefly here.

Ancestral Parsimony

This method shows different ways the haplotype sequences might be connected through ancestral sequences inferred by the Maximum Parsimony criterion. You'll find that these networks tend to have a lot of edges, because there's often a lot of uncertainty in the inference of ancestral sequences. You can reduce this somewhat by increasing the "Minimum Ancestral Frequency" value.

Integer Neighbor-Joining

This method begins with a Neighbor-Joining tree, but sets the branch lengths to integer values so that they represent the number of mutations between sequences. Then, for pairs of haplotypes whose distance on the tree is longer than the distance between their sequences, edges are added to shorten the distance. A new edge won't be added if it would make any distance on the network shorter than the corresponding distance between sequences. It also won't be added if it is longer than the improvement produced by its addition (i.e., the difference between the distance on the network and the sequence distance). You can increase the "Reticulation tolerance" to allow such edges to be added when they are only slightly too long.
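
To make the edge-length rule a bit more concrete, here is a rough Python sketch of one way to read it. It is not PopART's implementation: the function name accept_edge and its parameters are hypothetical, and it assumes the reticulation tolerance is an extra number of mutations by which an edge may exceed the improvement. The separate check that no network distance drops below a sequence distance needs the whole network and is left out.

```python
def accept_edge(network_dist, seq_dist, edge_length, tolerance=0):
    """Decide whether a candidate edge passes the edge-length rule.

    network_dist: current distance between two haplotypes on the network
    seq_dist:     number of differences between their sequences
    edge_length:  length of the candidate edge
    tolerance:    ASSUMPTION: extra mutations allowed beyond the improvement
    """
    if network_dist <= seq_dist:
        # The network distance is not too long, so there is nothing to shorten.
        return False
    improvement = network_dist - seq_dist
    # Reject edges that are longer than the improvement they buy,
    # unless the excess is within the tolerance.
    return edge_length <= improvement + tolerance


print(accept_edge(10, 4, 5))
print(accept_edge(6, 4, 3))
print(accept_edge(6, 4, 3, tolerance=1))
```

In the second call the edge (length 3) is longer than the improvement (2), so it is rejected; with a tolerance of 1 the same edge is accepted.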

Tight Span Walker

This method uses some properties of the tight span of the distance matrix for your sequences to build a haplotype network. We'll be describing this method in detail soon, but it's beyond the scope of this documentation. The current implementation will fail with some data sets, but when it works, the distances on the network should match your sequence distances exactly.

The Interface

Once you've loaded your data and inferred a network, the network should be displayed. If you've included a Traits block in your Nexus file, your network should be colour-coded according to these traits. Any sequences associated with multiple traits should appear as pie charts. You can click the "show barcharts" button on the toolbar to show trait frequencies as barcharts when you mouse over these nodes.

If several sequence labels share the same sequence, you can see what all these sequences are by clicking the "show identical taxa" button on the toolbar. You can also go to Statistics > Identical sequences to produce a list of all sequences that were found to be identical, and you can optionally log this information to a file. Note that PopART masks any columns in your alignment with gaps or ambiguous characters (?, N, Y, or R), so sequences that aren't truly identical may become identical once these columns are removed. We're working on an intelligent way to deal with ambiguous/gap data that will hopefully allow us to use your missing data rather than masking it.
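
The effect of masking can be seen in a small Python sketch. This only illustrates the idea and is not PopART's code: the names MASKED, mask_columns, and identical_groups are hypothetical, and it assumes gaps are written as "-" and that all sequences in the alignment have the same length.

```python
from collections import defaultdict

# Characters that cause a whole column to be dropped.
# ASSUMPTION: gaps are written as '-'; the rest are the characters listed above.
MASKED = set("-?NYR")


def mask_columns(alignment):
    """Return the alignment with every column containing a masked character removed."""
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    keep = [i for i, column in enumerate(zip(*seqs)) if not (set(column) & MASKED)]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


def identical_groups(alignment):
    """Group labels whose sequences are identical after masking."""
    groups = defaultdict(list)
    for name, seq in mask_columns(alignment).items():
        groups[seq].append(name)
    return [labels for labels in groups.values() if len(labels) > 1]


alignment = {
    "a": "ACGTA",
    "b": "ACNTA",  # the N masks the third column for every sequence
    "c": "ACTTA",  # differs from "a" only in that masked column
    "d": "GCGTA",
}
print(mask_columns(alignment))
print(identical_groups(alignment))
```

Here "a" and "c" differ at the third column, but because "b" has an N there, the column is dropped and all three end up in the same group.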

You can change the colour theme using the "change colour theme" tool on the toolbar. The built-in themes only include 10 colours, so if you have more than 10 traits you'll see some repeated colours. You can change individual colours by going to Edit > Set trait colour. You can also do this by clicking on the coloured circles in the legend. Various other colours and fonts can be changed via the Edit menu as well.
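
As an illustration of why colours repeat, here is a minimal Python sketch that hands out a 10-colour theme to a longer list of traits. It assumes colours are reused in order once the theme runs out, which the text above doesn't confirm; the function name assign_colours and the placeholder colour and trait names are hypothetical.

```python
def assign_colours(traits, theme):
    """Map each trait to a theme colour, wrapping around when traits outnumber colours.

    ASSUMPTION: colours are reused in order; the actual order in PopART may differ.
    """
    return {trait: theme[i % len(theme)] for i, trait in enumerate(traits)}


theme = [f"colour{i}" for i in range(10)]   # placeholder names for 10 theme colours
traits = [f"trait{i}" for i in range(12)]   # 12 traits, two more than the theme holds

colours = assign_colours(traits, theme)
print(colours["trait0"], colours["trait10"])
```

With 12 traits and 10 colours, the eleventh and twelfth traits share colours with the first and second, which is the situation where setting individual trait colours by hand is useful.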

Nodes and the legend can be moved around the network area. You can also rotate and zoom in and out of the network. If you want to make the network area bigger or smaller, you can zoom out and then move the grips on the dotted line indicating the edge of the network area.

Other Stuff

You can export your network as a graphic (.png, .svg, or .pdf). You can also export your network as a table so that you can import it into other programs that can read text files. This is useful if you'd like to open your network in Cytoscape, R, or potentially other programs as well.