Background: Health insurance claims data offer a unique opportunity to study disease distribution on a large scale. Analyzing these raw data accurately, however, poses challenges. One important challenge is the accurate classification of study outcomes. For example, claims data provide no clear way of classifying hospitalizations due to a specific event, because raw claims data are inherently disjointed and lack context.

Methods: In this paper, we propose a framework for classifying hospitalizations due to a specific event. We then tested this framework in a private health insurance claims database (Symphony) with approximately 4 million US adults who tested positive for COVID-19 between March and December 2020. Our claims-specific proportion of COVID-19-related hospitalizations was then compared, by age, with nationally reported rates from the Centers for Disease Control and Prevention (CDC).

Results: Across all ages (18+), 7.3% of Symphony patients met our definition of hospitalized due to COVID-19, which was similar to the CDC’s estimate of 7.5%. By CDC-defined age group, our estimates vs. the CDC’s estimates were 18–49: 2.7% vs. 3%, 50–64: 8.2% vs. 9.2%, and 65+: 14.6% vs. 28.1%.
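
The comparison above can be made explicit with a short calculation. The following is a minimal sketch, not part of the study's analysis: it takes only the percentages reported in the Results, and the names `estimates`, `compare`, and `gaps` are hypothetical, introduced here for illustration.

```python
# Percentages of COVID-19-positive adults classified as hospitalized,
# copied from the Results: (Symphony claims estimate, CDC estimate).
estimates = {
    "18+": (7.3, 7.5),
    "18-49": (2.7, 3.0),
    "50-64": (8.2, 9.2),
    "65+": (14.6, 28.1),
}


def compare(claims_pct, cdc_pct):
    """Return (CDC minus claims, in percentage points; claims/CDC ratio)."""
    gap = round(cdc_pct - claims_pct, 1)
    ratio = round(claims_pct / cdc_pct, 2)
    return gap, ratio


# Illustrative summary only; no new data are introduced.
gaps = {group: compare(*pair) for group, pair in estimates.items()}

for group, (gap, ratio) in gaps.items():
    print(f"{group}: gap = {gap} points, claims/CDC ratio = {ratio}")
```

On these figures, the gap is at most one percentage point for the overall and under-65 groups, while in the 65+ group the claims-based estimate is roughly half of the CDC's.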

Conclusions: The proposed methodology is a rigorous way to define event-specific hospitalizations in claims data. It can be extended to many different types of events and applied to a variety of claims databases.
