SWORD - South West Open Research Deposit - ORBioM (Open Research BioSciences Meeting): Using LRR standard deviation as a genotype quality control measure and its downstream effect on CNV calling.
 

ORCID

0009-0002-7599-5068

Department

Biological Sciences

Year of Study

1st

Full-time or Part-time Study

Full-time

Level

Postgraduate

Presentation Type

Oral Presentation

Supervisor

Dr Deirdre Purfield

Supervisor

Dr Noirin McHugh

Supervisor

Dr Bruno Andrade

Abstract

Background:

Log R Ratio (LRR) is a normalised signal intensity measure returned alongside routine genotype calls and is used to detect structural variants in the genome known as copy number variants (CNVs). Although LRR standard deviation (SD) is commonly applied as a quality control measure in CNV studies, limited research has explored its relationship with genotype quality or its potential impact on CNV calling.
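As an illustration of how LRR SD works as a per-sample QC metric, the sketch below takes the standard deviation of a sample's non-missing LRR values and compares it with the 0.3 threshold quoted in the abstract. It assumes LRR values are supplied as a plain list of floats, with None or NaN marking missing probes. Tools such as PennCNV report their own per-sample LRR SD, possibly over a filtered probe set, so this plain standard deviation is only an approximation. The function names are hypothetical.

```python
import math
import statistics

# Threshold routinely applied for CNV detection (value taken from the abstract).
LRR_SD_THRESHOLD = 0.3

def lrr_sd(lrr_values):
    """Sample standard deviation of one sample's LRR values.

    Hypothetical helper: missing probes (None or NaN) are skipped.
    """
    clean = [v for v in lrr_values if v is not None and not math.isnan(v)]
    if len(clean) < 2:
        raise ValueError("need at least two non-missing LRR values")
    return statistics.stdev(clean)

def passes_lrr_qc(lrr_values, threshold=LRR_SD_THRESHOLD):
    """True if the sample's LRR SD is strictly below the threshold."""
    return lrr_sd(lrr_values) < threshold
```

For example, a quiet sample such as `[0.01, -0.05, 0.1, 0.02]` has an LRR SD of about 0.06 and passes, whereas a noisy one such as `[0.5, -0.6, 0.7, -0.4]` has an LRR SD of about 0.65 and fails.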

Methods:

A total of 720,152 genotype samples were available from 716,234 cattle. Among these were 1,044 cattle with duplicate genotype samples, where one sample of each pair was considered the gold standard (LRR SD < 0.3, call rate ≥ 0.95). Genotype concordance was calculated for all duplicate pairs. PennCNV was used to call CNVs, and concordance between the CNV calls of each duplicate pair was measured using the Jaccard index.
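The duplicate comparison can be sketched as follows, under several assumptions that the abstract does not state. Genotypes are coded 0/1/2 (copies of the B allele) with None for a missing call. Concordance is taken over SNPs called in both samples. The CNV Jaccard index is computed on base-pair overlap: intersection length divided by union length, after merging each sample's overlapping calls. A count-based Jaccard over matched CNVs would be an equally valid choice. Only the gold-standard thresholds (LRR SD < 0.3, call rate ≥ 0.95) come from the abstract; all function names are hypothetical.

```python
from collections import defaultdict

# Gold-standard criteria quoted in the abstract.
GOLD_MAX_LRR_SD = 0.3
GOLD_MIN_CALL_RATE = 0.95

def call_rate(genotypes):
    """Fraction of SNPs with a non-missing call (None = missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def is_gold_standard(genotypes, lrr_sd):
    """Gold standard if LRR SD < 0.3 and call rate >= 0.95."""
    return lrr_sd < GOLD_MAX_LRR_SD and call_rate(genotypes) >= GOLD_MIN_CALL_RATE

def genotype_concordance(gt_a, gt_b):
    """Share of SNPs called in both samples whose 0/1/2 genotypes match."""
    if len(gt_a) != len(gt_b):
        raise ValueError("genotype vectors must cover the same SNPs")
    both = [(a, b) for a, b in zip(gt_a, gt_b) if a is not None and b is not None]
    if not both:
        return float("nan")
    return sum(a == b for a, b in both) / len(both)

def _merge(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def _by_chrom(cnvs):
    """Group (chrom, start, end) calls by chromosome and merge them."""
    grouped = defaultdict(list)
    for chrom, start, end in cnvs:
        grouped[chrom].append((start, end))
    return {chrom: _merge(ivs) for chrom, ivs in grouped.items()}

def cnv_jaccard(cnvs_a, cnvs_b):
    """Base-pair Jaccard index between two CNV call sets (NaN if both empty)."""
    a, b = _by_chrom(cnvs_a), _by_chrom(cnvs_b)
    inter = 0
    for chrom in a.keys() & b.keys():
        ra, rb, i, j = a[chrom], b[chrom], 0, 0
        # Two-pointer sweep over sorted, disjoint intervals.
        while i < len(ra) and j < len(rb):
            lo, hi = max(ra[i][0], rb[j][0]), min(ra[i][1], rb[j][1])
            if hi > lo:
                inter += hi - lo
            if ra[i][1] < rb[j][1]:
                i += 1
            else:
                j += 1
    covered = lambda d: sum(e - s for ivs in d.values() for s, e in ivs)
    union = covered(a) + covered(b) - inter
    return inter / union if union else float("nan")
```

Under these assumptions, `cnv_jaccard([("1", 100, 200)], [("1", 150, 250)])` gives 50 / 150, or about 0.33, which would count as concordant at the Jaccard index > 0.1 cut-off used in the Results.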

Results:

Across all 720,152 samples, the mean (median) LRR SD was 0.21 (0.17), ranging from 0.07 to 1.98. In total, 14.62% of samples had an LRR SD > 0.3, the threshold routinely applied for CNV detection using SNP array data. Genotype concordance between duplicates decreased as LRR SD increased, primarily due to misclassified homozygous calls. CNV concordance was highest for duplicate animals with a call rate > 0.9 and LRR SD < 0.3, but only 19.1% of these animals had a Jaccard index > 0.1.
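The kinds of summaries reported above can be sketched with the standard library. The sketch covers the distribution of LRR SD (mean, median, range, percentage above 0.3), mean duplicate concordance within LRR SD bins, and discordant calls tallied by the gold-standard genotype class to see whether homozygous (0 or 2) or heterozygous (1) calls drive the errors. The bin edges, the 0/1/2 genotype coding, and all function names are hypothetical choices for illustration. The abstract does not say how the authors binned samples or attributed misclassification.

```python
import statistics
from collections import Counter

def summarise_lrr_sd(sds, threshold=0.3):
    """Mean, median, range and % of samples with LRR SD above the threshold."""
    return {
        "mean": statistics.mean(sds),
        "median": statistics.median(sds),
        "min": min(sds),
        "max": max(sds),
        "pct_above": 100 * sum(sd > threshold for sd in sds) / len(sds),
    }

def mean_concordance_by_lrr_bin(records, edges=(0.0, 0.2, 0.3, 0.5, float("inf"))):
    """records: (lrr_sd_of_duplicate, genotype_concordance) pairs.

    Returns {(lo, hi): mean concordance} for each non-empty [lo, hi) bin.
    The default edges are illustrative only.
    """
    bins = {}
    for sd, conc in records:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= sd < hi:
                bins.setdefault((lo, hi), []).append(conc)
                break
    return {k: statistics.mean(v) for k, v in sorted(bins.items())}

def discordance_by_gold_class(gold, dup):
    """Count discordant calls keyed by the gold-standard genotype (0/1/2).

    SNPs missing in either sample are ignored.
    """
    counts = Counter()
    for g, d in zip(gold, dup):
        if g is not None and d is not None and g != d:
            counts[g] += 1
    return counts
```

A larger combined count under keys 0 and 2 than under key 1 would point to homozygous calls as the main source of discordance.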

Conclusion:

LRR SD is a useful additional quality control metric for genotype data. The low level of CNV concordance between duplicates highlights the need for caution when interpreting CNV results from medium-density SNP panels.

Start Date

16-6-2025 1:30 PM

End Date

16-6-2025 1:45 PM

Included in

Genomics Commons
