With recent advances in theoretical and computational methods for characterizing chemical, physicochemical and biological phenomena, large volumes of information (data matrices) are often available for analysis. This abundance is not always an advantage, however, because it usually produces a high-dimensional ("small sample, many features") space, which degrades the performance of regression and classification algorithms. Moreover, exhaustively examining the entire feature (variable) space in search of the subsets that best describe a given phenomenon is computationally expensive, and such exploration may select features that aggravate overfitting. It is therefore important to develop procedures that filter out noisy, redundant or highly correlated variables without harming learning performance. Dimensionality reduction usually improves the quality of models, especially their predictive power, and also permits greater computational efficiency. To this end, the IMMAN software (an acronym for Information theory-based CheMoMetrics ANalysis) is conceived as a free computational tool for supervised and unsupervised feature selection based on information-theoretic parameters.
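As a rough illustration of what supervised and unsupervised feature selection with information-theoretic parameters can look like, the sketch below ranks three toy variables by Shannon entropy (no class labels needed) and by information gain (class labels needed). It is not IMMAN's implementation: the function names, the equal-width binning into two bins, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """Reduction in label entropy once the discrete feature is known."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * shannon_entropy(subset)
    return shannon_entropy(labels) - conditional


def discretize(column, bins=2):
    """Equal-width binning of a numeric column (an assumed, simple scheme)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)  # a constant column falls into a single bin
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in column]


# Hypothetical toy data: three variables measured on six samples, two classes.
features = {
    "informative": [0.10, 0.20, 0.15, 0.90, 0.80, 0.95],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "noisy": [0.10, 0.90, 0.20, 0.80, 0.15, 0.85],
}
labels = [0, 0, 0, 1, 1, 1]

# Unsupervised score: entropy of each discretized variable (0 for constants).
unsupervised = {name: shannon_entropy(discretize(col))
                for name, col in features.items()}

# Supervised score: information gain of each variable with respect to the class.
supervised = {name: information_gain(discretize(col), labels)
              for name, col in features.items()}

ranking = sorted(supervised, key=supervised.get, reverse=True)
print("entropy:", unsupervised)
print("information gain:", supervised)
print("supervised ranking:", ranking)
```

In this toy setting the entropy score only flags the constant variable as uninformative, whereas the information gain also separates the variable that tracks the class from the one that does not. A practical filter would additionally need a cutoff and some handling of redundancy between the retained variables.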