2017/1/13 (Fri.) 11:10-12:00 A Framework for making better predictions by directly estimating Variables’ Predictivity

Lecture: A Framework for making better predictions by directly estimating Variables’ Predictivity

Speaker: Professor Shaw-Hwa Lo (羅小華教授)

Introduction:Columbia University (美國哥倫比亞大學)

Time & Place: 2017/1/13 (Fri.) 11:10-12:00, 4F-427, Assembly Building I, NCTU (國立交通大學綜一館427)


In our last paper, we showed that significant variables are not necessarily predictive, and that good predictors may not appear statistically significant. This left us with an important question: if statistical significance is not a reliable guide, how can we find highly predictive variables? In this project, we provide a theoretical framework from which to design good measures of prediction in general. Importantly, we introduce a variable set's predictivity as a new parameter of interest to estimate, and propose the I-score as a candidate statistic for estimating it.

Current approaches to prediction generally either select variables for a model using a significance-based criterion, or evaluate variables and models simultaneously using cross-validation or independent test data. The I-score prediction framework allows us to define a novel measure of predictivity based on observed data, which in turn makes it possible to assess variable sets directly and to search for those with high predictivity. We offer simulations and an application to real data to demonstrate the statistic’s predictive performance on sample data. These show that the I-score can capture highly predictive variable sets, estimates a lower bound for the theoretical correct prediction rate, and correlates well with the out-of-sample correct prediction rate. We suggest that the I-score method can aid in finding variable sets with promising prediction rates; however, further research on sample-based measures of predictivity is needed.

There are many applications in which the I-score would be useful: for example, predicting diseases from high-dimensional data such as gene datasets; text prediction in the social sciences; and predictions concerning terrorism, civil war, elections, and financial markets. We hope this opens up a new field of work focused on designing new statistics that measure predictivity.

Big Data Research Center, NCTU
Institute of Statistics, NCTU
Institute of Statistics, NTHU