Tag Archives: Feature Selection

IMMAN Software



IMMAN (Information theory-based CheMoMetrics ANalysis) is multi-platform software written in Java. It provides a user-friendly graphical interface for computing a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection.

The graphical interface is organized into sections according to the different information-theoretic concepts. Because IMMAN is written in Java, it needs a Java virtual machine to run and requires JDK version 1.7 or later.

A total of 17 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case.
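To make the two frameworks concrete, here is a minimal sketch of rank-based feature selection with two textbook information-theoretic scores. It is not IMMAN's implementation: the choice of Shannon entropy as the unsupervised score, mutual information with the class label as the supervised score, the equal-width binning, and all names and toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices (an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    joint = list(zip(xs, ys))
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)


def rank_features(scores):
    """Return feature names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three features over six samples, plus a class label.
features = {
    "f_constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "f_class_like": [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],
    "f_spread": [0.0, 0.9, 0.3, 0.6, 1.0, 0.4],
}
labels = [0, 0, 0, 1, 1, 1]

binned = {name: discretize(col) for name, col in features.items()}

# Unsupervised: score each feature by its own entropy (no labels needed).
entropy_scores = {name: shannon_entropy(col) for name, col in binned.items()}
# Supervised: score each feature by the information it shares with the label.
mi_scores = {name: mutual_information(col, labels) for name, col in binned.items()}

unsupervised_ranking = rank_features(entropy_scores)
supervised_ranking = rank_features(mi_scores)
print(unsupervised_ranking)
print(supervised_ranking)
```

In this toy case the two rankings disagree at the top: the most variable feature wins the unsupervised ranking, while the feature that tracks the class label wins the supervised one. The constant feature carries no information and comes last in both.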

IMMAN accepts tab-separated and comma-separated value files (.txt, .csv) as input. The application was designed to support real-time computations on multiple dataset files, so features or datasets from different sources can be compared.

IMMAN can perform single-parameter ranking or ensemble (multi-criteria) ranking, with the single-parameter rankings serving as the base ranking methods. Note that the ensemble ranking procedure could be performed using the scores (values for the analyzed features)
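As a rough illustration of score-based ensemble ranking, the sketch below combines the scores from several base criteria into one ranking. The excerpt does not say how IMMAN aggregates scores, so the rule used here (mean of min-max normalized scores) is an assumption, as are the function names, criterion names, and toy values.

```python
def min_max_normalize(scores):
    """Rescale one criterion's scores to [0, 1] so criteria are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def ensemble_ranking(base_scores):
    """Average normalized scores across criteria, then rank best-first.

    The averaging rule is an assumed aggregation, not IMMAN's documented one.
    """
    normalized = [min_max_normalize(s) for s in base_scores.values()]
    names = list(normalized[0])
    combined = {
        name: sum(s[name] for s in normalized) / len(normalized)
        for name in names
    }
    return sorted(combined, key=combined.get, reverse=True), combined


# Hypothetical scores from two base ranking criteria on different scales.
base_scores = {
    "criterion_a": {"f1": 0.2, "f2": 0.8, "f3": 0.5},
    "criterion_b": {"f1": 10.0, "f2": 20.0, "f3": 50.0},
}

ranking, combined = ensemble_ranking(base_scores)
print(ranking)
print(combined)
```

Normalizing before averaging keeps a criterion with a large numeric range from dominating the combined score. Here criterion_a alone would rank f2 first, but the ensemble places f3 first because it scores well under both criteria.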


Feature Selection & Variable Screening



With recent advances in theoretical and computational methods for characterizing chemical, physicochemical and biological phenomena, large volumes of information (data matrices) are often available for analysis. This is not entirely advantageous, however, because it usually produces a high-dimensional (i.e. “small sample, many features”) space, which degrades the performance of regression and classification algorithms. Moreover, exhaustively examining the entire feature (variable) space in search of the subsets that best describe a given phenomenon is computationally expensive, and such exploration may select features that aggravate overfitting. It is thus important to develop procedures that filter out noisy, redundant or highly correlated variables without affecting learning performance. Dimensionality reduction usually improves the quality of models (especially their predictive power) and permits greater computational efficiency. In this sense, the IMMAN (Information theory-based CheMoMetrics ANalysis) software is conceived as a free computational tool for supervised and unsupervised feature selection based on information-theoretic parameters.