Epidemiologists often categorize a continuous risk predictor, even when the true risk model is not a categorical one. Nonetheless, such categorization is thought to be more robust and interpretable, and thus the goal is to fit the categorical model and interpret the categorical parameters. We address the question: with measurement error and categorization, how can we do what epidemiologists want, namely to estimate the parameters of the categorical model that would have been estimated if the true predictor were observed? We develop a general methodology for such an analysis, and illustrate it in linear and logistic regression. Simulation studies are presented and the methodology is applied to a nutrition data set. Discussion of alternative approaches is also included.
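To make the target of estimation concrete, the sketch below contrasts the categorical-model parameters computed from the true predictor with those obtained by naively categorizing an error-prone measurement. It is an illustration of the problem only, not the authors' correction method (which is implemented in the CCP package). Everything in it is an assumption for illustration: a linear risk model, normally distributed predictor and error, classical additive error W = X + U, quintile categories, and the hypothetical helper `category_means`.

```python
import numpy as np

def category_means(predictor, y, n_cat=5):
    """Fit y ~ category indicators (no intercept) by least squares.

    Categories come from the empirical quantiles of `predictor`, so each
    coefficient equals the mean of y within that category.
    """
    cuts = np.quantile(predictor, np.linspace(0, 1, n_cat + 1)[1:-1])
    cat = np.digitize(predictor, cuts)      # category labels 0..n_cat-1
    design = np.eye(n_cat)[cat]             # one-hot design matrix
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                  # true predictor (unobserved in practice)
w = x + rng.normal(scale=1.0, size=n)   # assumed classical error: W = X + U
y = 2.0 * x + rng.normal(size=n)        # true risk model is linear, not categorical

target = category_means(x, y)  # parameters we would get if X were observed
naive = category_means(w, y)   # parameters from categorizing W instead

print("target:", np.round(target, 2))
print("naive: ", np.round(naive, 2))
```

Under these assumptions the naive category effects are compressed toward the overall mean (here the spread shrinks by roughly a factor of 1/sqrt(2)), which is the gap between the naive fit and the quantity the abstract asks to recover.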

Notes/Citation Information

Published in Electronic Journal of Statistics, v. 12, no. 2, p. 4032-4056.

Creative Commons Attribution 4.0 International License

Funding Information

Blas’s research was supported by a post-doctoral fellowship from the Brazilian Agency CNPq (201192/2015-2). The research of Wang and Carroll was supported by a grant from the National Cancer Institute (U01-CA057030).

Related Content

The R package CCP for implementing the methods has been placed on GitHub at https://github.com/tianyingw/CCP. The Eating at America’s Table Study data in Section 5 can be obtained through a data transfer agreement with the National Cancer Institute; our R package can generate simulated data as in Section 4 as a check for reproducibility.