The main goal of project C4 is to develop highly efficient regression approaches. We aim to make modern statistical regression methods scale to very large, high-dimensional data sets and to environments in which computational resources are scarce.
We focus on algorithmic approaches that can be implemented efficiently in both streaming and distributed settings. In particular, we develop methods for aggregating data and for reducing the number of observations, e.g. via random linear projections and sampling, as well as methods for reducing the dimensionality of the underlying, possibly Bayesian, model classes.
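As a rough illustration of reducing the number of observations with a random linear projection, the following sketch applies a dense Gaussian projection to an ordinary least-squares problem and solves the smaller problem instead. It is a minimal example under stated assumptions, not the project's own method: the function name `sketched_least_squares`, the Gaussian sketching matrix, the synthetic data, and the sketch size of 500 rows are all chosen here for illustration.

```python
import numpy as np


def sketched_least_squares(X, y, sketch_rows, seed=0):
    """Solve least squares on a randomly projected (sketched) data set.

    Hypothetical helper for illustration: a dense Gaussian matrix S with
    `sketch_rows` rows compresses the n observations, and the regression
    is then fitted on (S X, S y) instead of (X, y).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Scaling by 1/sqrt(sketch_rows) preserves squared norms in expectation.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta


# Synthetic data (an assumption of this example): n = 5000, d = 5.
rng = np.random.default_rng(1)
n, d = 5000, 5
X = rng.standard_normal((n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Reference fit on all observations and fit on the 500-row sketch.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_sketch = sketched_least_squares(X, y, sketch_rows=500)

print("full:  ", np.round(beta_full, 3))
print("sketch:", np.round(beta_sketch, 3))
```

A dense Gaussian matrix is used only because it is the simplest projection to write down; building it costs memory proportional to the sketch size times the number of observations, so it does not by itself suit streaming or distributed settings.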
Sketching and sampling techniques for regression on large-scale data are important research areas with many interesting open questions. Although the basic models are well studied, research on complex, modern statistical methods has only just begun. We investigate novel data reduction techniques for, e.g., Bayesian generalized linear models, and pursue the ambitious goal of unifying their algorithmic treatment in order to provide blueprints for general statistical frameworks.
| Munteanu/etal/2022a |
Munteanu, Alexander and Omlor, Simon and Peters, Christian.
p-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets.
In
The 25th International Conference on Artificial Intelligence and Statistics (AISTATS),
2022.
|
| Munteanu/etal/2022b |
Munteanu, Alexander and Omlor, Simon and Song, Zhao and Woodruff, David P.
Bounding the Width of Neural Networks via Coupled Initialization - A Worst Case Analysis.
In
Proceedings of the 39th International Conference on Machine Learning (ICML),
2022.
|
| Madjar/etal/2021a |
Madjar, Katrin and Zucknick, Manuela and Ickstadt, Katja and Rahnenführer, Jörg.
Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression.
In
BMC Bioinform.,
Vol. 22,
No. 1,
p. 586,
2021.
|
| Munteanu/etal/2021a |
Munteanu, Alexander and Omlor, Simon and Woodruff, David P.
Oblivious Sketching for Logistic Regression.
In
Proceedings of the 38th International Conference on Machine Learning (ICML),
2021.
|
| Parry/etal/2021a |
Parry, Katharina and Geppert, Leo N. and Munteanu, Alexander and Ickstadt, Katja.
Cross-Leverage Scores for Selecting Subsets of Explanatory Variables.
In
arXiv e-prints,
Vol. abs/2109.08399,
2021.
|
| Geppert/etal/2020a |
Geppert, Leo N. and Ickstadt, Katja and Munteanu, Alexander and Sohler, Christian.
Streaming statistical models via Merge & Reduce.
In
International Journal of Data Science and Analytics,
Vol. 10,
No. 4,
pp. 331--347,
2020.
|
| Krivosija/Munteanu/2019a |
Krivošija, Amer and Munteanu, Alexander.
Probabilistic smallest enclosing ball in high dimensions via subgradient sampling.
In
Proceedings of the 35th International Symposium on Computational Geometry (SoCG),
pp. 47:1--47:14,
2019.
|
| Meintrup/etal/2019a |
Meintrup, Stefan and Munteanu, Alexander and Rohde, Dennis.
Random projections and sampling algorithms for clustering of high-dimensional polygonal curves.
In
Advances in Neural Information Processing Systems 32 (NeurIPS),
pp. 12807--12817,
2019.
|
| Munteanu/etal/2019a |
Munteanu, Alexander and Nayebi, Amin and Poloczek, Matthias.
A Framework for Bayesian Optimization in Embedded Subspaces.
In
Proceedings of the 36th International Conference on Machine Learning (ICML),
Vol. 97,
pp. 4752--4761,
Long Beach, California, USA,
PMLR,
2019.
|
| Tietz/etal/2019a |
Tietz, Tobias and Selinski, Silvia and Golka, Klaus and Hengstler, Jan G. and Gripp, Stephan and Ickstadt, Katja and Ruczinski, Ingo and Schwender, Holger.
Identification of interactions of binary variables associated with survival time using survivalFS.
In
Archives of Toxicology,
Vol. 93,
No. 3,
pp. 585--602,
2019.
|
| Wigmann/etal/2019a |
Wigmann, Claudia and Lange, Laura and Vautz, Wolfgang and Ickstadt, Katja.
Modelling and Classification of GC/IMS Breath Gas Measurements for Lozenges of Different Flavours.
In
Applications in Statistical Computing,
pp. 31--48,
Springer,
2019.
|
| Ickstadt/etal/2018a |
Ickstadt, Katja and Schäfer, Martin and Zucknick, Manuela.
Toward Integrative Bayesian Analysis in Molecular Biology.
In
Annual Review of Statistics and Its Application,
Vol. 5,
No. 1,
pp. 141--167,
2018.
|
| Molina/etal/2018a |
Molina, Alejandro and Munteanu, Alexander and Kersting, Kristian.
Core Dependency Networks.
In
Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI),
2018.
|
| Munteanu/etal/2018a |
Munteanu, Alexander and Schwiegelshohn, Chris and Sohler, Christian and Woodruff, David P.
On Coresets for Logistic Regression.
In
Advances in Neural Information Processing Systems 31 (NeurIPS),
2018.
|
| Munteanu/Schwiegelshohn/2018a |
Munteanu, Alexander and Schwiegelshohn, Chris.
Coresets - Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms.
In
KI - Künstliche Intelligenz,
Vol. 32,
No. 1,
pp. 37--53,
2018.
|
| Weihs/Ickstadt/2018a |
Weihs, Claus and Ickstadt, Katja.
Data Science: the impact of statistics.
In
International Journal of Data Science and Analytics,
Springer,
2018.
|
| Geppert/etal/2017a |
Geppert, Leo N. and Ickstadt, Katja and Munteanu, Alexander and Quedenfeld, Jens and Sohler, Christian.
Random projections for Bayesian regression.
In
Statistics and Computing,
Vol. 27,
No. 1,
pp. 79--101,
2017.
|
| Schlieker/etal/2017a |
Schlieker, Laura and Telaar, Anna and Lueking, Angelika and Schulz-Knappe, Peter and Theek, Carmen and Ickstadt, Katja.
Multivariate binary classification of imbalanced datasets - A case study on high-dimensional multiplex autoimmune assay data.
In
Biometrical Journal,
2017.
|
| Treppmann/etal/2017a |
Treppmann, Tabea and Ickstadt, Katja and Zucknick, Manuela.
Integration of multiple genomic data sources in a Bayesian Cox model for variable selection and prediction.
In
Computational and Mathematical Methods in Medicine,
Vol. 2017,
pp. 1--19,
2017.
|
| Huels/etal/2016a |
Hüls, Anke and Krämer, Ursula and Stolz, Sabine and Hennig, Frauke and Hoffmann, Barbara and Ickstadt, Katja and Vierkötter, Andrea and Schikowski, Tamara.
Applicability of the Global Lung Initiative 2012 Reference Values for Spirometry for Longitudinal Data of Elderly Women.
In
PLOS ONE,
Vol. 11,
No. 6,
p. e0157569,
2016.
|
| Koellmann/etal/2016a |
Köllmann, Claudia and Ickstadt, Katja and Fried, Roland.
Beyond unimodal regression: modelling multimodality with piecewise unimodal regression or deconvolution models.
arXiv:1606.01666 [stat.AP],
2016.
|
| Munteanu/Wornowizki/2015a |
Munteanu, Alexander and Wornowizki, Max.
Correcting statistical models via empirical distribution functions.
In
Computational Statistics,
Vol. 31,
No. 2,
pp. 465--495,
Springer,
2016.
|
| Koellmann/etal/2014a |
Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja.
Unimodal regression using Bernstein-Schoenberg-splines and penalties.
In
Biometrics,
Vol. 70,
No. 4,
pp. 783--793,
2014.
|
| Koellmann/etal/2014b |
Köllmann, Claudia and Ickstadt, Katja and Fried, Roland.
Beyond unimodal regression: modelling multimodality with piecewise unimodal, mixture or additive regression.
No. 8,
TU Dortmund,
2014.
|
| Schwiegelshohn/Sohler/2014a |
Chris Schwiegelshohn and Christian Sohler.
Logistic Regression for Datastreams.
No. 1,
TU Dortmund,
2014.
|
| Binder/etal/2012a |
Binder, Harald and Müller, Tina and Schwender, Holger and Golka, Klaus and Steffens, Michael and Hengstler, Jan G. and Ickstadt, Katja and Schumacher, Martin.
Cluster-localized sparse logistic regression for SNP data.
In
Statistical Applications in Genetics and Molecular Biology,
Vol. 11,
No. 4,
2012.
|
| Canzar/etal/2011c |
Canzar, Stefan and Marschall, Tobias and Rahmann, Sven and Schwiegelshohn, Chris.
Solving The Minimum String Cover Problem.
In
David A. Bader and Petra Mutzel (editors),
Proceedings of the SIAM Meeting on Algorithm Engineering and Experiments (ALENEX'12),
pp. 75--83,
2012.
|
| Koellmann/etal/2012a |
Köllmann, Claudia and Bornkamp, Björn and Ickstadt, Katja.
Unimodal regression using Bernstein-Schoenberg-splines and penalties.
No. 6,
TU Dortmund,
2012.
|
| Lohr/etal/2012a |
Lohr, M. and Köllmann, C. and Freis, E. and Hellwig, B. and Hengstler, J. G. and Ickstadt, K. and Rahnenführer, J.
Optimal strategies for sequential validation of significant features from high-dimensional genomic data.
In
Journal of Toxicology and Environmental Health, Part A,
Vol. 75,
No. 8-10,
pp. 447--460,
2012.
|
| Schwender/etal/2012a |
Schwender, Holger and Selinski, Silvia and Blaszkewicz, Meinolf and Marchan, Rosemarie and Ickstadt, Katja and Golka, Klaus and Hengstler, Jan G.
Distinct SNP combinations confer susceptibility to urinary bladder cancer in smokers and non-smokers.
In
PLOS ONE,
Vol. 7,
No. 12,
2012.
|
| Ickstadt/etal/2011b |
Ickstadt, Katja and Bornkamp, Björn and Grzegorczyk, Marco and Wieczorek, Jakob and Sheriff, M. Rahuman and Grecco, Hernán E. and Zamir, Eli.
Nonparametric Bayesian Networks (with discussion).
In
Bernardo, José M. and Bayarri, M. J. and Berger, James O. and Dawid, A. Philip and Heckerman, David and Smith, Adrian F. M. and West, M. (editors),
Bayesian Statistics,
Vol. 9,
pp. 283--316,
2011.
|
| Schwender/etal/2011a |
Schwender, Holger and Ruczinski, Ingo and Ickstadt, Katja.
Testing SNPs and sets of SNPs for importance in association studies.
In
Biostatistics,
Vol. 12,
No. 1,
pp. 18--32,
2011.
|
| Sohler/Woodruff/2011a |
Sohler, Christian and Woodruff, David P.
Subspace embeddings for the \(L_1\)-norm with applications.
In
Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC),
pp. 755--764,
ACM,
2011.
|
| Geppert/2018a |
Geppert, Leo Nikolaus.
Bayesian and Frequentist Regression Approaches for Very Large Data Sets.
TU Dortmund,
2018.
|
| Munteanu/2018a |
Munteanu, Alexander.
On large-scale probabilistic and statistical data analysis.
TU Dortmund,
2018.
|
| Koellmann/2016a |
Köllmann, Claudia.
Unimodal spline regression and its use in various applications with single or multiple modes.
TU Dortmund,
2016.
|
| Bornkamp/etal/2010a |
B. Bornkamp and K. Ickstadt and D. B. Dunson.
Stochastically ordered multiple regression.
In
Biostatistics,
Vol. 11,
No. 3,
pp. 419--431,
2010.
|
| Feldman/etal/2010a |
Dan Feldman and Morteza Monemizadeh and Christian Sohler and David Woodruff.
Coresets and Sketches for High Dimensional Subspace Approximation Problems.
In
Proceedings 21st Annual ACM-SIAM Symposium on Discrete Algorithms,
pp. 630--649,
2010.
|
| Bornkamp/etal/2009a |
B. Bornkamp and A. Fritsch and O. Kuss and K. Ickstadt.
Penalty specialists among goalkeepers: A nonparametric Bayesian analysis of 44 years of German Bundesliga.
In
B. Schipp and W. Krämer (editors),
Statistical Inference, Econometric Analysis and Matrix Algebra: Festschrift in Honour of Götz Trenkler,
pp. 63--76,
Physica Verlag,
2009.
|
| Bornkamp/Ickstadt/2009b |
Bornkamp, Björn and Ickstadt, Katja.
Bayesian nonparametric estimation of continuous monotone functions with applications to dose-response analysis.
In
Biometrics,
Vol. 65,
pp. 198--205,
2009.
|
| Frahling/etal/2008a |
Gereon Frahling and Piotr Indyk and Christian Sohler.
Sampling in Dynamic Data Streams and Applications.
In
International Journal of Computational Geometry and Applications (Special Issue with selected papers from the 21st ACM Symposium on Computational Geometry),
Vol. 18,
No. 1/2,
pp. 3--28,
2008.
|
| Schwender/Ickstadt/2008a |
Schwender, H. and Ickstadt, K.
Identification of SNP interactions using logic regression.
In
Biostatistics,
Vol. 9,
pp. 187--198,
2008.
|
| Feldman/etal/2007a |
Dan Feldman and Morteza Monemizadeh and Christian Sohler.
A PTAS for k-means clustering based on weak coresets.
In
Proceedings of the 23rd ACM Symposium on Computational Geometry,
pp. 11--18,
2007.
|
| Fritsch/2007a |
Fritsch, A. and Ickstadt, K.
Comparing logic regression based methods for identifying SNP interactions.
In
Hochreiter, S. and Wagner, R. (editors),
Bioinformatics in Research and Development,
Springer,
2007.
|
| Nunkesser/etal/2007a |
Nunkesser, R. and Bernholt, T. and Schwender, H. and Ickstadt, K. and Wegener, I.
Detecting high-order interactions of single nucleotide polymorphisms using genetic programming.
In
Bioinformatics,
Vol. 23,
pp. 3280--3288,
2007.
|
| Ickstadt/Wolpert/99a |
K. Ickstadt and R. L. Wolpert.
Spatial regression for marked point processes.
In
J. M. Bernardo and J. O. Berger and A. P. Dawid and A. F. M. Smith (editors),
Bayesian Statistics 6,
pp. 323--341,
Oxford,
Oxford University Press,
1999.
|
| Wolpert/Ickstadt/98a |
R. L. Wolpert and K. Ickstadt.
Poisson/Gamma random field models for spatial statistics.
In
Biometrika,
Vol. 85,
pp. 251--267,
1998.
|