As data collection technology and data storage devices become more powerful, scientists are collecting more variables and larger numbers of observations in their studies.
High-Dimensional Data Analysis
Fields such as genetics rely on high-dimensional data, and, thanks to recent advances in technology, high-dimensional data are becoming increasingly common in many fields in the health, behavioral, and social sciences. The goal of high-dimensional data analysis is often to find similarities and patterns within massive and complex data sets. This requires new and innovative methods for analysis.
High-Dimensional Variable Selection
Often, large numbers of variables are collected to reduce possible model bias (approximation error). Unfortunately, many of these variables are highly correlated, and a complicated model may include many insignificant variables. Such models may have less predictive power and may be difficult to interpret. In these cases, a more parsimonious model becomes desirable. Approaches such as the Smoothly Clipped Absolute Deviation (SCAD) penalty have been proposed to select significant variables for various statistical models. This approach deletes insignificant covariates by estimating their coefficients to be exactly zero, while the coefficients of the remaining covariates are estimated simultaneously and adjusted accordingly.
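To make concrete how the SCAD penalty sets some coefficients to zero, the following is a minimal Python sketch of the penalty function and of the thresholding rule it induces. It is illustrative only and is not the PROC SCAD implementation. The function names (scad_penalty, scad_threshold) are hypothetical, the value a = 3.7 is a commonly used default in the SCAD literature rather than something stated on this page, and the thresholding rule applies only to the simplified case of an orthonormal design, where each coefficient can be treated separately.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for a single coefficient (assumes lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the L1 (lasso) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


def scad_threshold(z, lam, a=3.7):
    """SCAD estimate for one coefficient given its unpenalized estimate z.

    Valid only under the simplifying assumption of an orthonormal design.
    """
    t = abs(z)
    sign = math.copysign(1.0, z)
    if t <= 2 * lam:
        # Soft thresholding: estimates with |z| <= lam become exactly zero.
        return sign * max(t - lam, 0.0)
    if t <= a * lam:
        # Shrinkage tapers off as |z| grows.
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    # Large estimates are left unchanged.
    return z


# Hypothetical unpenalized estimates, chosen only for illustration.
raw_estimates = [0.3, -0.8, 1.5, -2.5, 6.0]
selected = [scad_threshold(z, lam=1.0) for z in raw_estimates]
```

In this sketch, the first two entries of selected are zero (those covariates are dropped), the middle entries are shrunk toward zero, and the largest estimate is returned unchanged. Leaving large coefficients unshrunk is the feature that distinguishes SCAD from a plain L1 penalty.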
Center Research on High-Dimensional Data
Center-developed techniques for high-dimensional data analysis and variable selection can be applied broadly. The page below lists our publications according to the type of data involved. Applied examples appear at the bottom of the page. We also developed PROC SCAD, a pair of SAS procedures that use the SCAD penalty for high-dimensional variable selection.