23.07.2025 12:15 Oezge Sahin (TU Delft, NL): Effects of covariate discretization on conditional quantiles in bivariate copulas
Clinical data often include a mix of continuous measurements and covariates that have been discretized, typically to protect privacy, meet reporting obligations, or simplify clinical interpretation. This combination, along with the nonlinear and tail-asymmetric dependence frequently observed in clinical data, affects the behavior of regression and variable-selection methods. Copula models, which separate marginal behavior from the dependence structure, provide a principled approach to studying these effects. In this talk, we analyze how discretizing a continuous covariate into equiprobable categories impacts conditional quantiles and likelihoods in bivariate copula models. For the Clayton and Frank families, we derive closed-form anchor points: for a given category, the anchor is the value of the continuous covariate at which the conditional quantile under the continuous model equals the conditional quantile under the discretized model. These anchors provide an exact measure of discretization bias, which is small near the center but can be substantial in the tails. Simulations across five copula families show that likelihood-based variable selection may over- or under-weight discretized covariates, depending on the dependence structure. We conclude with a simulation-based comparison of the polyserial correlation, the Pearson correlation, and Kendall's tau-b in the same settings. Our results have practical implications for copula-based modeling of mixed-type data.
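To make the notion of an anchor point concrete, the following is a minimal Python sketch for the Clayton family. It is not taken from the talk: it assumes the standard Clayton copula with parameter theta > 0, uniform margins, and K equiprobable categories, and all function names (`clayton_cdf`, `cond_quantile_continuous`, `cond_quantile_discretized`, `anchor_point`) are hypothetical. The discretized conditional quantile is found by bisection, and the anchor is then obtained by inverting the textbook Clayton conditional distribution; the closed form presented in the talk may be expressed differently.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (standard textbook form)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cond_quantile_continuous(alpha, u, theta):
    """alpha-quantile of V given U = u, from inverting dC/du."""
    a = alpha ** (-theta / (1.0 + theta)) - 1.0
    return (a * u ** -theta + 1.0) ** (-1.0 / theta)


def cond_cdf_discretized(v, k, K, theta):
    """P(V <= v | U in ((k-1)/K, k/K]) for category k = 1..K."""
    lo, hi = (k - 1) / K, k / K
    return (clayton_cdf(hi, v, theta) - clayton_cdf(lo, v, theta)) * K


def cond_quantile_discretized(alpha, k, K, theta, tol=1e-12):
    """alpha-quantile of V given the category, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_cdf_discretized(mid, k, K, theta) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anchor_point(alpha, k, K, theta):
    """Covariate value u* whose continuous conditional quantile equals
    the discretized conditional quantile of category k (assumed setup)."""
    v_d = cond_quantile_discretized(alpha, k, K, theta)
    a = alpha ** (-theta / (1.0 + theta)) - 1.0
    return (a / (v_d ** -theta - 1.0)) ** (1.0 / theta)


if __name__ == "__main__":
    theta, K, alpha = 2.0, 4, 0.5  # illustrative values only
    for k in range(1, K + 1):
        u_star = anchor_point(alpha, k, K, theta)
        midpoint = (k - 0.5) / K
        print(f"category {k}: anchor = {u_star:.4f}, midpoint = {midpoint:.4f}")
```

Comparing each anchor with the category midpoint gives one rough way to see where replacing a category by its midpoint would misstate the conditional quantile.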