• Advancement of computationally intensive methods for efficient modern general-purpose statistical analysis and inference
Client: Javna agencija za raziskovalno dejavnost RS (Slovenian Research Agency)
Project type: Research projects ARRS
Project duration: 2016 - 2019
  • Description

It is difficult to overstate the importance of statistical data analysis in today's world: the empirical sciences, health, finance, fraud detection, telecommunications, social networking, and marketing are just a few of the areas that rely heavily on data and their analysis. While applied statistics, especially modern Bayesian statistics, has progressed tremendously and become much more accessible, progress has recently been slowing down, because current state-of-the-art computation cannot handle the models and volumes of data we want to analyze today.

The issue of inefficient statistical computation has recently been highlighted as one of the top 5 open problems in statistics. Our primary objective is to contribute to solving this problem by researching an approach to more efficient general-purpose computation and implementing the findings in a tool that would allow us to analyze ever-growing volumes of data at a reasonable cost.

We plan to achieve this objective by automatically parallelizing the most expensive parts of general-purpose Markov chain Monte Carlo algorithms (in particular, Metropolis-Hastings and Hamiltonian Monte Carlo) and running them on graphics processing units. As a result of our project, we anticipate at least 100-fold speedups at a low cost (less than €1,000). Furthermore, we have attracted top researchers and experts from the University of Ljubljana, the Slovenian Academy of Sciences and Arts, and industry to participate in the project. Every data set and statistical inference problem we use to gain insight and to develop, evaluate, and validate our methodology will be part of a relevant practical problem faced by Slovenian researchers.
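To illustrate where the parallelizable cost sits in such algorithms, here is a minimal random-walk Metropolis-Hastings sketch in plain Python. It is not the project's implementation: the normal model with known standard deviation, the flat prior, and the names `log_likelihood` and `metropolis_hastings` are all assumptions chosen for illustration. The point is that each iteration is dominated by a sum of independent per-observation terms, which is the kind of work a GPU could evaluate in parallel; here it is evaluated serially.

```python
import math
import random


def log_likelihood(mu, data, sigma=1.0):
    # Hypothetical model: data ~ Normal(mu, sigma) with sigma known.
    # Each term depends on a single observation, so the terms are
    # independent and could be computed in parallel, then summed.
    return sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)


def metropolis_hastings(data, n_iter=5000, step=0.2, seed=1):
    # Random-walk Metropolis-Hastings for mu under an assumed flat prior.
    rng = random.Random(seed)
    mu = 0.0
    current_ll = log_likelihood(mu, data)
    draws = []
    for _ in range(n_iter):
        # Symmetric proposal, so the proposal densities cancel.
        proposal = mu + rng.gauss(0.0, step)
        # The expensive step: one full pass over the data per iteration.
        proposal_ll = log_likelihood(proposal, data)
        # 1.0 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
            mu, current_ll = proposal, proposal_ll
        draws.append(mu)
    return draws
```

Calling `metropolis_hastings(data)` on a list of numbers returns one draw of `mu` per iteration; after discarding an initial burn-in, the average of the draws should be close to the sample mean of the data under these assumptions. The iterations themselves remain sequential, which is why the sketch places the parallelizable work inside `log_likelihood`.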

There have been successful attempts at efficient statistical computation for very limited cases, but what we are aiming for, general-purpose inference that is automatically parallelized for highly efficient computation, is novel and has not yet been achieved. This makes the project highly relevant, both as a significant scientific achievement in the field of computation and because of the numerous practical benefits of low-cost, accessible, high-performance statistical inference.

Indications from related work suggest that the speedups we are aiming for are achievable. While this is a research project and several technical details and implementation issues remain to be resolved, we are confident of the project's feasibility: we have a set of well-defined and directly measurable requirements, we have laid out a clear plan for achieving them, and we have assembled a project team of experts from varied backgrounds with all the required knowledge and know-how. We have also attracted co-financing from industry to supplement our budget, and we will actively promote student participation.

The main contributions of the project will be the theoretical research that leads to efficient computation, the practical implementation of this research in a software tool for general-purpose statistical computation, and, as a by-product, empirical research achievements in other fields of science made possible by our methodological research. Efficient computation will cut time and costs, which will directly benefit industry and, given the ubiquity and growing volumes of data, everyday life. Last but not least, the collaboration between researchers, applied researchers, industry, and students will raise the general level of applied statistical knowledge, a field that is extremely underdeveloped in Slovenia.


ČEŠNOVAR, Rok, ŠTRUMBELJ, Erik. Parallel draws from the Polya-Gamma distribution for faster Bayesian multinomial and count model inference. In: GAMS, Matjaž (ed.), LUŠTREK, Mitja (ed.), PILTAVER, Rok (ed.). Slovenian Conference on Artificial Intelligence : proceedings of the 19th International Multiconference Information Society - IS 2016, 12 October 2016, Ljubljana, Slovenia : volume A. Ljubljana: Institut Jožef Stefan. 2016, pp. 9-12. [1537224387]

ČEŠNOVAR, Rok, ŠTRUMBELJ, Erik. Bayesian Lasso and multinomial logistic regression on GPU. PloS one, ISSN 1932-6203, Jun. 2017, vol. 12, no. 6, pp. 1-17. [1537467843]

ČEŠNOVAR, Rok, ŠTRUMBELJ, Erik. bayesCL : Bayesian Inference on a GPU using OpenCL. [S. l.]: The Comprehensive R Archive Network, 2017. https://cran.r-project.org/web/packages/bayesCL/index.html. [COBISS.SI-ID 1537481155]

ČEŠNOVAR, Rok, SLUGA, Davor, DEMŠAR, Jure, BRONDER, Steve, ŠTRUMBELJ, Erik. GPU optimized math routines in the Stan Math library : lecture at StanCon 2018 Helsinki, 29-31 August 2018. [1538085315]

CIGLARIČ, Tadej, ČEŠNOVAR, Rok, ŠTRUMBELJ, Erik. An OpenCL library for parallel random number generators. The journal of supercomputing, ISSN 0920-8542, 2019, pp. 1-16. [1538103747]

ŠTRUMBELJ, Erik, ČEŠNOVAR, Rok, SLUGA, Davor, JACKSON, Burton. GPU-based parallel computation of pharmacometric models in Stan software for Bayesian inference. In: The Ninth American Conference on Pharmacometrics : ACoP9, (Journal of pharmacokinetics and pharmacodynamics (Print), ISSN 1567-567X, vol. 45, iss. 1 (suppl.)). [S. l.]: Springer. cop. 2018, p. 39. [1538016707]

FAGANELI PUCER, Jana, ŠTRUMBELJ, Erik. Impact of changes in climate on air pollution in Slovenia between 2002 and 2017. Environmental pollution, ISSN 0269-7491. [Print ed.], 2018, vol. 242, part A, pp. 398-406. [1537827267]

FAGANELI PUCER, Jana, PIRŠ, Gregor, ŠTRUMBELJ, Erik. A Bayesian approach to forecasting daily air-pollutant levels. Knowledge and information systems, ISSN 0219-1377. [Print ed.], Dec. 2018, vol. 57, no. 3, pp. 635-654. [1537745603]

CIGLIČ, Rok, PERKO, Drago, HRVATIN, Mauro, ŠTRUMBELJ, Erik. Modeling and evaluating older landscape classifications with modern quantitative methods. In: From pattern and process to people and action. Ghent: IALE-Europe. 2017. [41978413]

BREG VALJAVEC, Mateja, CIGLIČ, Rok, OŠTIR, Krištof, RIBEIRO, Daniela. Modelling habitats in karst landscape by integrating remote sensing and topography data. Open geosciences, ISSN 2391-5447, 2018, vol. 10, issue 1, pp. 137-156. [43194413]

ZUPANC, Kaja, ŠTRUMBELJ, Erik. A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment. PloS one, ISSN 1932-6203, Apr. 2018, vol. 13, no. 4, pp. 1-16. [1537763779]

ROBNIK ŠIKONJA, Marko. Explanation of prediction models with ExplainPrediction. Informatica : an international journal of computing and informatics, ISSN 0350-5596, Mar. 2018, vol. 42, no. 1, pp. 13-22. [1537765315]

CIGLIČ, Rok, PERKO, Drago. A method for evaluating raster data layers according to landscape classification scale. Ecological informatics, ISSN 1574-9541, 2017, vol. 39, pp. 45-55. [41426477]

CIGLIČ, Rok. Landscape classification with quantitative methods. Evaluating raster data layers according to the scale of classification : lecture at Ss. Cyril and Methodius University, Faculty of Natural Sciences and Mathematics, Institute of Geography, Skopje (Macedonia), 22 May 2017. [41589293]

CIGLIČ, Rok. Evaluating landscape classifications with machine learning : the case of Slovenia : paper presented at the 4th International Scientific Conference Geobalcanica 2018 "Connect all geographers!", Ohrid (Macedonia), 15 May 2018. [43886893]