Title
Benchmarking bioinformatic tools for amplicon based High throughput sequencing of norovirus
ORCID
https://orcid.org/0000-0002-1883-0489
Department
Biological Sciences
Year of Study
4
Full-time or Part-time Study
Full-time
Level
Postgraduate
Presentation Type
Oral Presentation
Supervisor
Prof. Paul Cotter
Supervisor
Dr Sinead Keaveney
Supervisor
Dr Helen O'Shea
Abstract
To survey noroviruses in the environment using High Throughput Sequencing (HTS), it is essential that our methods, both wet-lab and computational, are fit for purpose. In this work, we evaluated pipelines and classifiers for the genotypic characterisation of the norovirus VP1 region using simulated sequencing data.
The denoising-based pipelines DADA2, Deblur and USEARCH-UNOISE3 were included, alongside the clustering-based pipelines VSEARCH and FROGS. The NoroNet and CaliciNet classifiers were compared to the QIIME2 feature-classifier with standard and custom databases. Pipeline outputs were compared to the expected sequences and composition using several measures: Bray-Curtis distance, weighted and unweighted UniFrac, and a confusion matrix.
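As an illustration of two of the measures named above, the following Python sketch computes a Bray-Curtis dissimilarity and presence/absence confusion counts for one pipeline output against the expected composition. The function names, genotype labels and read counts are invented for illustration and are not taken from the study; true negatives are omitted because they would require a fixed reference list of genotypes.

```python
def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two genotype count tables (dicts)."""
    genotypes = set(expected) | set(observed)
    # Sum of absolute count differences over every genotype seen in either table.
    diff = sum(abs(expected.get(g, 0) - observed.get(g, 0)) for g in genotypes)
    total = sum(expected.values()) + sum(observed.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return diff / total


def detection_confusion(expected, observed):
    """Presence/absence confusion counts: true positives, false positives, false negatives."""
    present = {g for g, n in expected.items() if n > 0}
    called = {g for g, n in observed.items() if n > 0}
    return {
        "tp": len(present & called),   # expected and reported
        "fp": len(called - present),   # reported but not expected
        "fn": len(present - called),   # expected but missed
    }


# Hypothetical read counts per genotype (assumed values, not study data).
expected = {"GII.4": 50, "GII.17": 30, "GI.3": 20}
observed = {"GII.4": 55, "GII.17": 25, "GI.3": 15, "GII.2": 5}

print(bray_curtis(expected, observed))
print(detection_confusion(expected, observed))
```

With these made-up counts the dissimilarity is 0.1, and the unexpected GII.2 call is counted as a single false positive.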
Contrary to the expected relative performance of clustering and denoising methods, clustering approaches produced data that more closely reflected the expected composition on all measures, both similarity/dissimilarity distances and phylogenetic measures. VSEARCH performed best in terms of similarity to the expected composition. However, FROGS produced sequences and compositions distinctly different from those of all other pipelines. The impact of reduced depth of coverage on performance was assessed for VSEARCH, and no differences were observed in composition, phylogenetic similarity or taxonomic assignment. Classification was more strongly affected by the database than by the classification method. The QIIME2 feature-classifier showed 99% agreement with the NoroNet typing tool to capsid designation level. Disagreement increased with the inclusion of capsid variant designation.
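The effect of designation level on classifier agreement can be sketched as follows. This is a minimal Python example, assuming each classifier's call is reduced to a (genotype, variant) pair per sequence; the `agreement` function, the sequence identifiers and the calls are hypothetical and do not reproduce the output format of NoroNet or QIIME2.

```python
def agreement(calls_a, calls_b, level):
    """Fraction of sequences given identical labels at 'genotype' or 'variant' level."""
    if level not in ("genotype", "variant"):
        raise ValueError("level must be 'genotype' or 'variant'")
    if calls_a.keys() != calls_b.keys() or not calls_a:
        raise ValueError("both classifiers must label the same non-empty set of sequences")

    def label(call):
        genotype, variant = call
        # At genotype level the variant part of the call is ignored.
        return genotype if level == "genotype" else (genotype, variant)

    matches = sum(label(calls_a[s]) == label(calls_b[s]) for s in calls_a)
    return matches / len(calls_a)


# Hypothetical calls from two classifiers (assumed values, not study data).
classifier_a = {
    "seq1": ("GII.4", "Sydney_2012"),
    "seq2": ("GII.4", "New_Orleans_2009"),
    "seq3": ("GII.17", None),
    "seq4": ("GI.3", None),
}
classifier_b = {
    "seq1": ("GII.4", "Sydney_2012"),
    "seq2": ("GII.4", "Sydney_2012"),
    "seq3": ("GII.17", None),
    "seq4": ("GI.3", None),
}

print(agreement(classifier_a, classifier_b, "genotype"))
print(agreement(classifier_a, classifier_b, "variant"))
```

In this toy case the two classifiers agree on every sequence at genotype level but on only three of four once the variant is included, which is the pattern the abstract describes.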
VSEARCH provides a robust option for analysing viral amplicons. Pipeline choice mattered: DADA2 produced false positives and FROGS led to sub-standard classification. The QIIME2 feature-classifier is a viable alternative to external classification; however, maintenance of the input database is essential.
Keywords:
norovirus, High Throughput Sequencing, in-silico, sensitivity
Start Date
14-6-2022 9:30 AM
End Date
14-6-2022 9:45 AM
Recommended Citation
Fitzpatrick, Amy Heather, "Benchmarking bioinformatic tools for amplicon based High throughput sequencing of norovirus" (2022). ORBioM (Open Research BioSciences Meeting). 2.
https://sword.cit.ie/orbiom/2022/schedule/2
Comments
Oral Session 1 - Technological Advancements in Food and Health Research