14.05.2025 12:15 Luciana Dalla Valle (University of Torino, IT): Approximate Bayesian conditional copulas
According to Sklar’s theorem, any absolutely continuous multivariate distribution function can be uniquely decomposed into its marginal distributions and a copula, which captures the dependence structure among the vector components. In real data applications, interest often lies in specific functionals of the dependence, which summarize aspects of it in a few numerical values. A broad literature exists on such functionals; however, extensions that include covariates are still limited. This is mainly due to the lack of unbiased estimators of the conditional copula, especially when one does not have enough information to select the copula model. Several Bayesian methods to approximate the posterior distribution of functionals of the dependence varying according to covariates are presented and compared. The main advantage of the investigated methods is that they use nonparametric models, avoiding the selection of the copula, which is usually a delicate aspect of copula modelling. The methods are compared in simulation studies and in two realistic applications, from civil engineering and astrophysics.
14.05.2025 16:15 Rajen Shah (University of Cambridge, UK): Robustness in Semiparametric Statistics
Given that all models are wrong, it is important to understand the performance of methods when the settings for which they have been designed are not met, and to modify them where possible so they are robust to these sorts of departures from the ideal. We present two examples with this broad goal in mind.
We first look at a classical case of model misspecification in (linear) mixed-effects models for grouped data. Existing approaches estimate linear model parameters through weighted least squares, with optimal weights (given by the inverse covariance of the response, conditional on the covariates) typically estimated by maximizing a (restricted) likelihood from random effects modelling or by using generalized estimating equations. We introduce a new ‘sandwich loss’ whose population minimizer coincides with the weights of these approaches when the parametric forms for the conditional covariance are well-specified, but can yield arbitrarily large improvements when they are not.
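For background on the weighted-least-squares setup described above, here is a minimal numpy sketch of grouped data with a random intercept, where the true conditional covariance is plugged in as the (block) weight matrix; the talk’s sandwich loss, which estimates these weights, is not reproduced here, and all model choices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Grouped data: each group has its own random intercept, which induces
# within-group correlation in the responses.
n_groups, m = 200, 5           # number of groups, observations per group
sigma_u, sigma_e = 1.0, 0.5    # random-intercept and noise std devs
beta_true = np.array([1.0, -2.0])

X = rng.normal(size=(n_groups, m, 2))
u = rng.normal(scale=sigma_u, size=(n_groups, 1))
y = X @ beta_true + u + rng.normal(scale=sigma_e, size=(n_groups, m))

# Weighted least squares with block weights W = V^{-1}, where
# V = sigma_e^2 I + sigma_u^2 11' is the (here assumed known)
# covariance of a group's responses given its covariates.
V = sigma_e**2 * np.eye(m) + sigma_u**2 * np.ones((m, m))
W = np.linalg.inv(V)

# beta_hat = (sum_i X_i' W X_i)^{-1} sum_i X_i' W y_i
A = np.einsum('gja,jk,gkb->ab', X, W, X)
b = np.einsum('gja,jk,gk->a', X, W, y)
beta_hat = np.linalg.solve(A, b)
```

When the plugged-in V is misspecified, beta_hat remains consistent but loses efficiency, which is the regime the sandwich loss targets.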
The starting point of our second vignette is the recognition that semiparametric efficient estimation can be hard to achieve in practice: estimators that are in theory efficient may require unattainable levels of accuracy in the estimation of complex nuisance functions. As a consequence, estimators deployed on real datasets are often chosen in a somewhat ad hoc fashion and may suffer high variance. We study this gap between theory and practice in the context of a broad collection of semiparametric regression models that includes the generalized partially linear model. We advocate using estimators that are robust in the sense that they are root-n consistent uniformly over a sufficiently rich class of distributions, characterized by certain conditional expectations being estimable by user-chosen machine learning methods. We show that even asking for locally uniform estimation within such a class narrows the possible estimators down to those parametrized by certain weight functions, and we develop a new random-forest-based estimation scheme to estimate the optimal weights. We demonstrate the effectiveness of the resulting estimator in a variety of semiparametric settings on simulated and real-world data.
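The partially linear model mentioned above admits a classical root-n consistent estimator obtained by partialling out the nuisance conditional expectations (Robinson’s estimator). The sketch below implements that generic baseline with a simple nearest-neighbour smoother standing in for a user-chosen machine learning method; it is background for the talk, not the weighted estimation scheme it proposes, and the data-generating model is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Partially linear model: y = theta * x + g(z) + eps, with g nonlinear
# and x itself depending on z.
n = 4000
theta_true = 1.5
z = rng.uniform(-2, 2, size=n)
x = np.sin(z) + rng.normal(scale=1.0, size=n)
g = np.cos(2 * z)
y = theta_true * x + g + rng.normal(scale=0.5, size=n)

def knn_smooth(z, t, k=50):
    """Estimate E[t | z] by a k-nearest-neighbour moving average
    (a stand-in for any user-chosen regression method)."""
    order = np.argsort(z)
    t_sorted = t[order]
    sm = np.convolve(t_sorted, np.ones(k) / k, mode='same')
    out = np.empty_like(t)
    out[order] = sm  # map smoothed values back to original positions
    return out

# Robinson-type estimator: partial E[x|z] and E[y|z] out of x and y,
# then regress the y-residuals on the x-residuals.
x_res = x - knn_smooth(z, x)
y_res = y - knn_smooth(z, y)
theta_hat = (x_res @ y_res) / (x_res @ x_res)
```

The residual-on-residual step removes the nuisance function g up to smoothing error, so theta_hat converges at the parametric rate even though g is estimated nonparametrically.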
21.05.2025 12:15 Michael Muma (TU Darmstadt): The T-Rex Selector: Fast High-Dimensional Variable Selection with False Discovery Rate Control
Providing guarantees on the reproducibility of discoveries is essential when drawing inferences from high-dimensional data. Such data are common in numerous scientific domains: in biomedicine, for example, it is imperative to reliably detect the genes that are truly associated with the survival time of patients diagnosed with a certain type of cancer, while in finance one aims to determine a sparse portfolio that reliably tracks an index. This talk introduces the Terminating-Random Experiments (T-Rex) selector, a fast multivariate variable selection framework for high-dimensional data. The T-Rex selector provably controls a user-defined target false discovery rate (FDR) while maximizing the number of selected variables, and it scales to settings with millions of variables: its computational complexity is linear in the number of variables, making it more than two orders of magnitude faster than, e.g., existing model-X knockoff methods. An easy-to-use open-source R package, TRexSelector, is available on CRAN. The focus of this talk lies on high-dimensional linear regression models, but we also describe extensions to principal component analysis (PCA) and Gaussian graphical models (GGMs).
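As intuition for how appended random dummy variables can calibrate false discoveries, the toy sketch below ranks real and dummy variables together and stops once the dummy count suggests the estimated false-discovery proportion exceeds a target. This is a rough illustration of the dummy-variable idea only, with assumed toy data and a marginal-score ranking; it is not the T-Rex algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse linear model: only the first s of p variables are active.
n, p, s = 300, 100, 10
beta = np.zeros(p)
beta[:s] = 3.0
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Append p random dummies: null variables by construction, so the number
# of dummies that get "selected" estimates the number of false positives.
dummies = rng.normal(size=(n, p))
scores = np.abs(np.c_[X, dummies].T @ y) / n   # marginal association scores

target_fdr = 0.2
order = np.argsort(scores)[::-1]               # rank all variables by score
selected, n_dummies = [], 0
for j in order:
    if j >= p:
        n_dummies += 1                         # a dummy entered: noise signal
    else:
        selected.append(j)
    # stop when the dummy-based estimate of the FDP exceeds the target
    if selected and n_dummies / len(selected) > target_fdr:
        break

# realized false-discovery proportion among the real selections
fdp = len([j for j in selected if j >= s]) / max(len(selected), 1)
```

The actual T-Rex selector runs many early-terminated random experiments of this flavour and aggregates them with provable FDR control; see the TRexSelector package on CRAN.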
23.07.2025 12:15 Oezge Sahin (TU Delft, NL): t.b.a.
t.b.a.
23.07.2025 13:15 Ariane Hanebeck (Karlsruher Institut für Technologie): t.b.a.
t.b.a.