Biostatistics Faculty Publications

Using Mendelian Inheritance Errors as Quality Control Criteria in Whole Genome Sequencing Data Set

Valentina V. Pilipenko, Cincinnati Children's Hospital Medical Center
Hua He, Cincinnati Children's Hospital Medical Center
Brad G. Kurowski, University of Cincinnati
Eileen S. Alexander, University of Cincinnati
Xue Zhang, Cincinnati Children's Hospital Medical Center
Lili Ding, University of Cincinnati
Tesfaye B. Mersha, University of Cincinnati
Leah Kottyan, Cincinnati Children's Hospital Medical Center
David W. Fardo, University of Kentucky
Lisa J. Martin, University of Cincinnati

Abstract

Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family-based data can be used to guide the standardization of specific quality control metrics in non-family-based data. Given the low mutation rate, Mendelian inheritance errors are likely the result of erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used chromosome 3 whole genome sequencing family-based data from the Genetic Analysis Workshop 18 (GAW18). Mendelian inheritance errors were provided as part of the GAW18 data set. Additionally, for binary variants we calculated Mendelian inheritance errors using PLINK. Based on our analysis, nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed. Indeed, we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be lessened with the inclusion of a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and that quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
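
To illustrate what a Mendelian inheritance error is for a binary variant, the following is a minimal sketch in Python. It is not the GAW18 procedure or PLINK's implementation. It assumes an autosomal site (chromosome 3 qualifies), complete father-mother-child trios, and no missing genotypes. The function names and the genotype representation (a pair of allele strings) are hypothetical choices made for this example.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can arise from the parents.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    Assumption: autosomal site, no missing calls, no de novo mutation.
    """
    # The child receives exactly one allele from each parent, so enumerate
    # every father-allele/mother-allele pairing as an unordered genotype.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def count_mendelian_errors(trios):
    """Count trios whose genotypes violate Mendelian inheritance.

    `trios` is an iterable of (father, mother, child) genotype tuples.
    """
    return sum(
        not is_mendelian_consistent(father, mother, child)
        for father, mother, child in trios
    )


if __name__ == "__main__":
    example_trios = [
        (("A", "A"), ("A", "A"), ("A", "G")),  # G cannot come from either parent
        (("A", "G"), ("G", "G"), ("G", "G")),  # consistent
        (("A", "A"), ("G", "G"), ("A", "A")),  # child must be heterozygous
    ]
    print(count_mendelian_errors(example_trios))
```

Counting such inconsistencies per variant, and then examining where the counts cluster along the chromosome, is the kind of summary the abstract describes. A real analysis would also need to handle missing calls and extended pedigrees.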

Document Type

Article

Publication Date

6-17-2014

Notes/Citation Information

Published in BMC Proceedings, v. 8, supplement 1, article S21, p. 1-5.

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Digital Object Identifier (DOI)

http://dx.doi.org/10.1186/1753-6561-8-S1-S21

Funding Information

The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. This work was supported in part by NIH grants 8P20GM103436-12 (DWF, KN), K25 AG043546 (DWF), NS36695 (LD, LJM), AI070235 (HH, LJM, TMB), AI066738 (LJM), HL111459 (LJM, VP), T32-ES10957 (ESA), K12 HD001097-16 (BGK), K01HL103165 (TMB).

The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889.

Repository Citation

Pilipenko, Valentina V.; He, Hua; Kurowski, Brad G.; Alexander, Eileen S.; Zhang, Xue; Ding, Lili; Mersha, Tesfaye B.; Kottyan, Leah; Fardo, David W.; and Martin, Lisa J., "Using Mendelian Inheritance Errors as Quality Control Criteria in Whole Genome Sequencing Data Set" (2014). Biostatistics Faculty Publications. 17.
https://uknowledge.uky.edu/biostatistics_facpub/17

Included in

Biostatistics Commons