Abstract

Background: Feature selection and gene set analysis are of increasing interest in the field of bioinformatics. While these two approaches have been developed for different purposes, we describe how some gene set analysis methods can be utilized to conduct feature selection.

Methods: We adopted a gene set analysis method, the significance analysis of microarray gene set reduction (SAMGSR) algorithm, to carry out feature selection for longitudinal gene expression data.

Results: Using a real-world application and simulated data, we demonstrate that the proposed SAMGSR extension outperforms other relevant methods. We also illustrate that a gene’s expression profile over time can be regarded as a gene set, so that a suitable gene set analysis method can be applied directly to select the genes associated with the phenotype of interest over time.

Conclusions: We believe this work will motivate more research to bridge feature selection and gene set analysis, with the development of novel algorithms capable of carrying out feature selection for longitudinal gene expression data.

Document Type

Article

Publication Date

12-7-2018

Notes/Citation Information

Published in BMC Medical Informatics and Decision Making, v. 18, suppl. 5, 115, p. 89-96.

© The Author(s). 2018

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Digital Object Identifier (DOI)

https://doi.org/10.1186/s12911-018-0685-8

Funding Information

Publication of this article was sponsored by a grant (No. 31401123) from the National Natural Science Foundation of China.

Related Content

Raw data were downloaded from the Gene Expression Omnibus repository (http://www.ncbi.nlm.nih.gov/geo/). The accession number is GSE36809.

12911_2018_685_MOESM1_ESM.txt (7 kB)
Additional file 1: The R codes of the longitudinal SAMGSR method.
