Question of the Week - Quantitative Methods
Which of the following methods can be used to overcome the data mining bias?
Test hundreds of variables for statistical significance
Use data sets created by previous research
Test the predictions on new data
Comments
If you test 100 variables as predictors at the 5% significance level (95% confidence), you would expect about 5 of them to appear significant purely by chance, even if none has any true relationship with the variable being predicted. Testing hundreds of variables for statistical significance is therefore the problem, not the solution. Another data mining issue (intergenerational data mining) can arise when a researcher reuses another researcher's data set. The solution to both problems is to test the conclusions on a new data set.
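The reasoning above can be illustrated with a small simulation. This is only a sketch under stated assumptions: every variable is pure random noise, significance is judged with a Fisher z approximation at a two-sided 5% level (critical value 1.96), and the function names (`correlation`, `is_significant`, `run_experiment`) and sample sizes are hypothetical choices made for illustration.

```python
import math
import random


def correlation(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y, z_crit=1.96):
    """Two-sided test of zero correlation at roughly the 5% level.

    Assumption: uses the Fisher z approximation, under which
    atanh(r) * sqrt(n - 3) is approximately standard normal when
    the true correlation is zero.
    """
    r = correlation(x, y)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return abs(z) > z_crit


def run_experiment(n_vars=100, n_obs=200, seed=0):
    """Test n_vars unrelated noise variables, then retest on new data."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

    # In-sample: the target and every candidate predictor are independent noise.
    target_in = noise()
    predictors_in = [noise() for _ in range(n_vars)]
    found = [j for j in range(n_vars)
             if is_significant(predictors_in[j], target_in)]

    # Out-of-sample: draw fresh data and retest only the apparent "predictors".
    target_out = noise()
    survivors = [j for j in found if is_significant(noise(), target_out)]
    return found, survivors


if __name__ == "__main__":
    found, survivors = run_experiment()
    print("apparent predictors in the original sample:", len(found))
    print("still significant on new data:", len(survivors))
```

Because no variable is truly related to the target, every "predictor" found in the first sample is a false positive, and on average about 5 of the 100 should appear. Retesting on new data exposes each of them to the same 5% hurdle again, so most are expected to drop out.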