Abstract

Background: The National Sleep Research Resource (NSRR) is a large-scale, openly shared data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at the study-subject level. We introduce a cross-cohort query and exploration system, called X-search, that enables researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and to explore the feasibility or likelihood of reusing the data for research studies.

Methods: X-search has been designed as a general framework with two loosely coupled components: a semantically annotated data repository and a cross-cohort exploration engine. The semantically annotated data repository comprises a canonical data dictionary, data sources that each have their own data dictionary, and mappings between each individual data dictionary and the canonical data dictionary. The cross-cohort exploration engine consists of five modules: query builder, graphical exploration, case-control exploration, query translation, and query execution. The canonical data dictionary serves as unified metadata that drives the visual exploration interfaces and, through the mappings, supports query translation.
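The role of the mappings in query translation can be illustrated with a small sketch. It assumes a hypothetical mapping format in which each canonical data element points to one dataset-specific variable plus an optional value recoding; the element names, variable names, and study names below are invented for illustration and are not taken from the NSRR mapping files, whose actual format may differ. A canonical query is translated into a per-dataset predicate and executed to produce per-dataset cohort counts. Skipping a dataset that lacks a mapping for any queried element is a choice made for this sketch, not necessarily how X-search behaves.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]  # one de-identified subject record: variable name -> value


@dataclass(frozen=True)
class Criterion:
    """Inclusive range condition on a canonical data element (hypothetical element names)."""
    element: str
    low: float
    high: float


@dataclass
class Mapping:
    """Hypothetical mapping of one canonical element onto one dataset's variable."""
    variable: str
    recode: Dict[float, float] = field(default_factory=dict)  # source value -> canonical value


def translate(query: List[Criterion], mappings: Dict[str, Mapping]) -> Optional[Callable[[Row], bool]]:
    """Translate a canonical query into a row predicate for one dataset.

    Returns None when some queried element has no mapping in this dataset.
    """
    checks = []
    for criterion in query:
        mapping = mappings.get(criterion.element)
        if mapping is None:
            return None  # dataset cannot answer this query
        checks.append((mapping, criterion))

    def predicate(row: Row) -> bool:
        for mapping, criterion in checks:
            if mapping.variable not in row:
                return False  # missing value: subject is not counted
            raw = row[mapping.variable]
            value = mapping.recode.get(raw, raw)  # harmonize coding to canonical values
            if not (criterion.low <= value <= criterion.high):
                return False
        return True

    return predicate


def cohort_counts(query: List[Criterion],
                  datasets: Dict[str, List[Row]],
                  mappings: Dict[str, Dict[str, Mapping]]) -> Dict[str, int]:
    """Execute the translated query on each dataset and return per-dataset subject counts."""
    counts: Dict[str, int] = {}
    for name, rows in datasets.items():
        predicate = translate(query, mappings[name])
        if predicate is not None:
            counts[name] = sum(1 for row in rows if predicate(row))
    return counts


# Hypothetical canonical elements: "age" in years; "sex" coded 1 = male, 2 = female.
mappings = {
    "study_a": {"age": Mapping("age_v1"), "sex": Mapping("sex_code")},
    "study_b": {"age": Mapping("age_at_visit"), "sex": Mapping("is_male", recode={1: 1, 0: 2})},
    "study_c": {"age": Mapping("age")},  # no sex variable mapped
}
datasets = {
    "study_a": [{"age_v1": 45, "sex_code": 2}, {"age_v1": 70, "sex_code": 2}, {"age_v1": 50, "sex_code": 1}],
    "study_b": [{"age_at_visit": 41, "is_male": 0}, {"age_at_visit": 59, "is_male": 0},
                {"age_at_visit": 55, "is_male": 1}],
    "study_c": [{"age": 48}],
}
query = [Criterion("age", 40, 60), Criterion("sex", 2, 2)]  # women aged 40 to 60

print(cohort_counts(query, datasets, mappings))  # {'study_a': 1, 'study_b': 2}
```

Keeping the query in canonical terms and pushing all dataset-specific naming and coding into the mappings means a new dataset can be added by writing its mappings alone, which matches the loose coupling between the repository and the exploration engine described above.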

Results: X-search is publicly available at https://www.x-search.net/ with nine NSRR datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1,800 cross-cohort queries from users in 16 countries.

Conclusions: X-search provides a powerful cross-cohort exploration interface for querying and exploring heterogeneous datasets in the NSRR data repository, enabling researchers to evaluate the feasibility of potential research studies and to generate hypotheses using the NSRR data.

Document Type

Article

Publication Date

11-13-2018

Notes/Citation Information

Published in BMC Medical Informatics and Decision Making, v. 18, 99, p. 1-10.

© The Author(s) 2018

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Digital Object Identifier (DOI)

https://doi.org/10.1186/s12911-018-0682-y

Funding Information

This work was supported by the NIH National Heart, Lung, and Blood Institute through grant number R24HL114473, National Center for Advancing Translational Sciences through grant number UL1TR001998, National Institute on Drug Abuse through grant number T32DA016176, and National Science Foundation (NSF) through grant number 1626364. Publication of this article was supported by grant R24HL114473.

Related Content

The NSRR datasets are available at https://sleepdata.org/.

X-search is accessible at https://www.x-search.net/.

The mappings between individual data dictionaries and the canonical data dictionary can be found at https://github.com/nsrr/cross-dataset-mapping.

All the data used in this work were de-identified. Access to the individual datasets for this work was granted through the NSRR Data Access and Use Agreement, made by and between the Brigham and Women’s Hospital, Inc., through its Division of Sleep and Circadian Disorders (“BWH”), and the Data User (available at https://sleepdata.org/data/requests).
