With the rapid evolution of high-throughput technologies, time-series (longitudinal) high-throughput experiments have become feasible and affordable. However, the development of statistical methods for gene expression profiles measured across time points has not kept pace with the growth of such data. Feature selection is of critical importance for longitudinal microarray data. In this study, we propose aggregating a gene’s expression values across time into a single value using the sign average method, thereby reducing the longitudinal feature selection problem to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign averages of genes across time) as predictors were then optimized by either the coordinate descent method or the threshold gradient descent regularization (TGDR) method. By applying the proposed methods to simulated data and a traumatic injury dataset, we demonstrate that the proposed methods, especially the combination of sign average and TGDR, outperform competing algorithms. We therefore recommend the proposed methods for studies that aim to perform feature selection on longitudinal gene expression data.
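The abstract does not spell out how the sign average is computed, so the following is only a minimal sketch under a stated assumption: each gene's value at each time point is multiplied by the sign of its association with the class label, and the signed values are then averaged over time to give one pseudogene value per sample. The association measure used here (the difference in class means), the function name `sign_average`, and the array layout are illustrative choices, not taken from the paper or its supplementary R code.

```python
import numpy as np

def sign_average(X, y):
    """Collapse longitudinal expression into one value per gene per sample.

    X : array of shape (n_samples, n_genes, n_timepoints)
    y : binary class labels (0/1) of length n_samples
    Returns an array of shape (n_samples, n_genes).

    Assumption (not from the paper): the sign for each gene/time point is the
    sign of the difference in class means at that time point.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Difference in class means, one value per gene and time point.
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    signs = np.sign(diff)  # shape (n_genes, n_timepoints)
    # Apply the signs (broadcast over samples) and average across time.
    return (X * signs).mean(axis=2)

# Simulated toy data: 20 samples, 5 genes, 3 time points.
rng = np.random.default_rng(0)
n, g, t = 20, 5, 3
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, g, t))
X[y == 1, 0, :] += 2.0  # gene 0 is shifted upward in class 1 at every time point

Z = sign_average(X, y)  # one pseudogene value per sample and gene
```

The resulting matrix `Z` has one column per gene and could be passed to any regularized logistic regression routine. If the signs are estimated from the data as above, they would normally be recomputed within each training fold so that test labels do not leak into the predictors.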
This study was supported by funding (No. 31401123) from the Natural Science Foundation of China.
Data were retrieved from the Gene Expression Omnibus repository (http://www.ncbi.nlm.nih.gov/geo/). The accession number is GSE36809.
Supplementary File 1: R code for the proposed method (the sign average and TGDR method). (Supplementary Materials)
Tian, Suyan and Wang, Chi, "Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time" (2019). Biostatistics Faculty Publications. 43.