The Methodology Center | Advancing methods, improving health

High-Dimensional Data Analysis and Variable Selection


As data collection technology and data storage devices become more powerful, scientists increasingly collect many variables and large numbers of observations in their studies.



High-Dimensional Data Analysis

Fields such as genetics rely on high-dimensional data, and, thanks to recent advances in technology, such data are becoming increasingly common across the health, behavioral, and social sciences. The goal of high-dimensional data analysis is often to find similarities and patterns within massive, complex data sets, which requires new and innovative methods of analysis.



High-Dimensional Variable Selection

Researchers often collect a large number of variables to reduce possible model bias (approximation error). Unfortunately, many of these variables tend to be highly correlated, and a complicated model may include many insignificant variables. Such models may have less predictive power and may be difficult to interpret. In these cases, a more parsimonious model is desirable. Approaches such as the Smoothly Clipped Absolute Deviation (SCAD) penalty have been proposed to select significant variables in various statistical models. The SCAD approach removes insignificant covariates by estimating their coefficients to be exactly zero and adjusts the coefficients of the remaining covariates accordingly.
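To make the idea of "estimating coefficients to be zero" concrete, here is a minimal Python sketch of the SCAD penalty and its thresholding rule. It is an illustration only, not the Center's PROC SCAD code. The function names `scad_penalty` and `scad_threshold` and the example estimates are hypothetical. The tuning constant a = 3.7 is the conventional default in the SCAD literature. The closed-form thresholding rule assumes the simplified case of an orthonormal design, in which each coefficient can be handled on its own.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient (hypothetical helper; assumes lam > 0, a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Near zero the penalty grows linearly, like the lasso (L1) penalty.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the penalization rate.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients receive a constant penalty, so they are not shrunk.
    return lam * lam * (a + 1) / 2


def scad_threshold(z, lam, a=3.7):
    """SCAD estimate given an unpenalized estimate z.

    Assumes an orthonormal design, so each coefficient is solved separately.
    """
    t = abs(z)
    if t <= 2 * lam:
        # Soft thresholding: estimates with |z| <= lam become exactly zero.
        return math.copysign(max(t - lam, 0.0), z)
    if t <= a * lam:
        # Intermediate estimates are shrunk less and less.
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    # Large estimates are left unchanged.
    return z


# Hypothetical unpenalized estimates for four covariates.
unpenalized = [0.3, -1.5, 2.5, 5.0]
selected = [scad_threshold(z, lam=1.0) for z in unpenalized]
print(selected)
```

With `lam=1.0`, the smallest estimate is set to zero (that covariate is dropped), the two middle estimates are shrunk toward zero, and the largest is returned unchanged. In practice the tuning parameter would be chosen from the data, for example by cross-validation or an information criterion.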



Center Research on High-Dimensional Data

The techniques for high-dimensional data analysis and variable selection developed at the Center can be applied broadly. The page below lists our publications according to the type of data involved. Applied examples appear at the bottom of the page. We have also developed PROC SCAD, a pair of SAS procedures that use the SCAD penalty for high-dimensional variable selection.



Lead researcher: Runze Li


Other researchers: 

Anne Buu

John Dziak





Our research on variable selection has been supported by National Science Foundation grants DMS 0102505, DMS 0322673, DMS 0348869, CCF 0430349, and DMS 0722351; National Institute on Drug Abuse grant P50 DA10075; and National Institutes of Health Roadmap grant R21 DA024260.
