Recommended Reading for High-Dimensional Data



Fan, J., & Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101-148.

Fan, J., & Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In M. Sanz-Sole, J. Soria, J. L. Varona, & J. Verdera (Eds.), Proceedings of the International Congress of Mathematicians (Vol. III, pp. 595-622). Zurich: European Mathematical Society.

Donoho, D. L. (2000). High-dimensional data analysis: The curses and blessings of dimensionality. Aide-memoire of a lecture at the American Mathematical Society conference Math Challenges of the 21st Century.

Shao, J. (1997). An asymptotic theory for linear model selection. Statistica Sinica, 7, 221-264.

Foster, D. P., & George, E. I. (1994). The risk inflation criterion for multiple regression. Annals of Statistics, 22, 1947-1975.

Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Annals of Statistics, 12, 758-765.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.

Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661-675.
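Several of the entries above (Akaike 1974; Schwarz 1978) define model-selection criteria that are simple to compute for a Gaussian linear model. A minimal sketch, ignoring additive constants; the function names `aic` and `bic` are mine, not from the papers:

```python
import math

# Classical model-selection criteria for a Gaussian linear model,
# up to additive constants (smaller is better). `rss` is the residual
# sum of squares, `n` the sample size, `k` the number of estimated
# parameters.

def aic(rss, n, k):
    """Akaike (1974) information criterion."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """Schwarz (1978) Bayesian information criterion: a heavier penalty
    than AIC once log(n) > 2, so it favors smaller models."""
    return n * math.log(rss / n) + math.log(n) * k
```

For a fixed fit, adding one parameter costs 2 under AIC but log(n) under BIC; that difference in penalty is what drives the consistency results studied in papers such as Shao (1997).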

Variable selection via penalized methods with a continuous penalty

Li, J., Das, K., Fu, G., Li, R., & Wu, R. (2011). The Bayesian LASSO for genome-wide association studies. Bioinformatics, 27, 516-523.

Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for LASSO penalized regression. Annals of Applied Statistics, 2, 224-244.

Zou, H. (2006). The adaptive LASSO and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, B, 67, 301-320.

Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, B, 58, 267-288.


Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.

Fu, W. J. (1998). Penalized regression: the bridge versus the LASSO. Journal of Computational and Graphical Statistics, 7, 397-416.

Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37, 373-384.

Frank, I. E., & Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics, 35, 109-148.
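For readers who want to experiment, the L1-penalized criterion of Tibshirani (1996) can be minimized by the cyclic coordinate descent strategy analyzed by Fu (1998) and Wu & Lange (2008). Below is a deliberately small pure-Python sketch for intuition, not an efficient implementation; the function names are mine:

```python
def soft_threshold(z, gamma):
    """Soft-thresholding: the exact solution of the one-variable LASSO."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for
        (1/2n) * ||y - X beta||^2 + lam * ||beta||_1.
    X is a list of rows; columns are assumed roughly standardized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes coordinate j.
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n)) / n
            nj = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(z, lam) / nj
    return beta
```

With an orthogonal design each update reduces to soft-thresholding the least-squares coefficient, which makes the simultaneous shrink-and-select behavior of the penalty easy to see.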


Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Annals of Statistics, 36, 1509-1566. PMCID: PMC2759727

Friedman, J., Hastie, T., Hofling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 1, 302-332.

Hunter, D. R., & Li, R. (2005). Variable selection using MM algorithms. Annals of Statistics, 33, 1617-1642.

Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression (with discussion). Annals of Statistics, 32, 407-499.


Zhang, Y., Li, R., & Tsai, C.-L. (2010). Regularization parameter selections via generalized information criterion. Journal of the American Statistical Association, 105, 312-323. PMCID: PMC2911045

Wang, H., Li, R., & Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553-568.


Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Annals of Statistics, 35, 2313-2404.

Fan, J., & Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Annals of Statistics, 32, 928-961.


Liu, J., Li, R., & Wu, R. (2014). Feature selection for varying coefficient models with ultrahigh-dimensional covariates. Journal of the American Statistical Association, 109, 266-274.

Huang, D., Li, R., & Wang, H. (2014). Feature screening for ultrahigh dimensional categorical data with applications. Journal of Business and Economic Statistics, 32, 237-244.

Wang, L., Kim, Y., & Li, R. (2013). Calibrating nonconvex penalized regression in ultrahigh dimension. Annals of Statistics, 41, 2505-2536.

Fan, J., & Lv, J. (2011). Non-concave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57, 5467-5484.

Fan, Y., & Li, R. (2012). Variable selection in linear mixed effects models. Annals of Statistics, 40, 2043-2068.

Li, R., Zhong, W., & Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.

Zhu, L., Li, L., Li, R., & Zhu, L.-X. (2011). Model-free feature screening for ultrahigh dimensional data. Journal of the American Statistical Association, 106, 1464-1475.

Fan, J., Samworth, R., & Wu, Y. (2009). Ultrahigh dimensional variable selection: beyond the linear model. Journal of Machine Learning Research, 10, 1829-1853.

Hall, P., & Miller, H. (2009). Using generalized correlation to effect variable selection in very high dimensional problems. Journal of Computational and Graphical Statistics, 18, 533-550.

Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, B, 70, 849-911.

Buu, A., Johnson, N. J., Li, R., & Tan, X. (2011). New variable selection methods for zero-inflated count data with applications to the substance abuse field. Statistics in Medicine, 30, 2326-2340. PMCID: PMC3133860

Wang, Y., Chen, H., Li, R., Duan, N., & Lewis-Fernandez, R. (2011). Prediction-based structured variable selection through receiver operating curve. Biometrics, 67, 896-905. PMCID: PMC3134557

