Skip to content

Working with multiple references

ViroConstrictor is able to work with multiple references in a single analysis run. This can be beneficial when you're working with samples that contain multiple pathogens, multiple strains of the same pathogen, or when you're not sure which reference to use for a specific sample.

Multiple references are provided in a singular reference fasta file. This file should contain all the references you want to use in the analysis. The reference fasta file should be provided to ViroConstrictor with the --reference flag or through the samplesheet.

Run a full analysis with multiple references

When you want to run a full analysis with multiple references you can provide the reference fasta file with the --reference flag. The reference fasta file should contain all the references you want to use in the analysis.

By default, the full analysis will now be ran for each reference in the provided reference fasta file. This will result in multiple results for each provided reference.

Choose the best reference for each sample

If you have a sequencing protocol is effective for multiple strains of the same pathogen, and you're not sure which reference to use for a specific sample, then you can let ViroConstrictor choose the best reference for each sample out of a larger set of references.

To do this provide all potential references in a single reference fasta file through the --reference flag or through the samplesheet. Additionally, provide the --match-ref flag or set the Match-ref column in the samplesheet to True.

When the --match-ref flag is provided or the Match-ref column is set to True in the samplesheet, ViroConstrictor will try to choose the best reference for each sample out of the provided references.

Example of a reference fasta file

Below is an example of a reference fasta file depicting 4 subtypes of the measles virus. This file can be provided to ViroConstrictor with the --reference flag or through the samplesheet. With --match-ref enabled or the Match-ref column set to True in the samplesheet, ViroConstrictor will pick the best matching reference out of these 4 subtypes, this will be done individually for each sample.

>measles_subtype1
atgaaagtaaaactactggtcctg...
>measles_subtype2
atgagtcttctaaccgaggtcgaa...
>measles_subtype3
atgaacccaaatcaaaagataata...
>measles_subtype4
atgagtgacatcgaagccatggcg...

Choose the best reference for each sample with segmented viruses

If you have a sequencing protocol that is effective for multiple strains of the same virus, and that virus has a segmented genomic structure, then you can let ViroConstrictor choose the best fitting reference for each segment of the virus for each sample. This can be beneficial when you're working with segmented viruses like influenza.

This can be achieved by providing a single reference fasta file with all the reference segments, and variations of the segments if applicable, of the virus. To let ViroConstrictor choose the best fitting reference for each segment of the virus for each sample, provide both the --match-ref flag and the --segmented flag. Or set both the Match-ref and Segmented columns in the samplesheet to True.

Additionally, extra formatting of the reference fasta file is required. Each fasta-header should follow a format as shown below:

>{personal_identifier} {segment_name}|{segment_subtype}|{extra information}

Example of a reference fasta file for segmented viruses

Below is an example of a reference fasta file depicting 3 segments of 3 subtypes of the influenza virus.
This file can be provided to ViroConstrictor with the --reference flag or through the samplesheet. With --match-ref and --segmented enabled or the Match-ref and Segmented columns set to True in the samplesheet, ViroConstrictor will pick the best matching reference for each segment of the virus for each sample.

>A.HA_01 HA|H1|H1N1
atgaaagtaaaactactggtcc...
>A.HA_02 HA|H3|H3N2
atgaagactatcattgctttga...
>A.HA_03 HA|H5|H5N1
atgaagactatcattgctttga...

>A.MP_01 MP|MP|H1N1
atgagtcttctaaccgaggtcg...
>A.MP_02 MP|MP|H3N2
atgagccttcttaccgaggtcg...
>A.MP_03 MP|MP|H5N1
atgagtcttctaaccgaggtcg...

>A.NA_01 NA|N1|H1N1
atgaacccaaatcaaaagataa...
>A.NA_02 NA|N2|H3N2
atgaatccaaatcaaaagataa...
>A.NA_03 NA|N1|H5N1
atgaatccaaatcaaaagataa...

Please note that after choosing the best fitting reference for each segment of the virus for each sample, the fasta-header will be modified to ensure that the results folder is structured correctly. This requires that the {segment-name} will be the same for every segment-variant provided in the reference fasta file.