HotspotEC: Protein binding hot spots prediction from sequence only
by a new ensemble learning method
To predict protein hot spot residues by a sequence-based ensemble classifier with neighbor environmental properties ...
Hot spots are core interfacial regions of binding proteins and have been used as targets in drug design. Locating hot spots experimentally is costly in both time and money. Recently, in silico methods have been widely used to predict hot spots from sequence or structure characteristics. Because the structure of a protein is not always solved, identifying hot spots from the amino acid sequence alone is more useful in real-life applications. This work proposes a new sequence-based model for hot spot prediction that combines an encoding schema of physicochemical features of amino acid residues with a classifier ensemble system. The encoding schema considers the local evolution information of the physicochemical features of amino acids. The model consists of 83 independent classifiers built with the IBk (instance-based k-nearest-neighbor) algorithm. The top-performing classifiers are then selected to form an ensemble that combines their predictions by majority voting. The ensemble classifier outperforms the state-of-the-art computational methods.
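The select-then-vote idea described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. It assumes, purely for illustration, that each member classifier is a k-nearest-neighbor model restricted to its own subset of features, that members are ranked by accuracy on a held-out validation set, and that the toy data stand in for encoded residues. All function names, the feature subsets, and the values of `k` and `top_n` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, feature_idx):
    """Label x by majority vote among its k nearest training points,
    measuring Euclidean distance only over the features in feature_idx."""
    dists = []
    for xi, yi in zip(train_X, train_y):
        d = math.sqrt(sum((xi[j] - x[j]) ** 2 for j in feature_idx))
        dists.append((d, yi))
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k, feature_idx):
    """Fraction of validation points that one member classifies correctly."""
    correct = sum(
        knn_predict(train_X, train_y, x, k, feature_idx) == y
        for x, y in zip(val_X, val_y)
    )
    return correct / len(val_y)


def select_members(train_X, train_y, val_X, val_y, feature_sets, k=3, top_n=3):
    """Rank candidate members by validation accuracy and keep the top_n.
    The sort is stable, so tied members keep their original order."""
    scored = [
        (accuracy(train_X, train_y, val_X, val_y, k, fs), fs)
        for fs in feature_sets
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [fs for _, fs in scored[:top_n]]


def ensemble_predict(train_X, train_y, x, members, k=3):
    """Combine the selected members' predictions by majority voting."""
    votes = Counter(knn_predict(train_X, train_y, x, k, fs) for fs in members)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): features 0 and 1 separate the classes,
# feature 2 is misleading on the validation set.
train_X = [[0.0, 0.1, 1.0], [0.2, 0.0, 2.0], [0.1, 0.2, 3.0],
           [1.0, 0.9, 7.0], [0.9, 1.0, 8.0], [1.1, 0.8, 9.0]]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [[0.1, 0.1, 8.2], [1.0, 1.0, 2.2], [0.0, 0.0, 8.2], [0.95, 0.9, 2.2]]
val_y = [0, 1, 0, 1]

# Each candidate member sees a different feature subset.
feature_sets = [(0,), (1,), (0, 1), (2,), (0, 1, 2)]

members = select_members(train_X, train_y, val_X, val_y, feature_sets)
prediction = ensemble_predict(train_X, train_y, [0.05, 0.15, 8.0], members)
```

An odd number of members avoids tied votes in a two-class problem such as hot spot versus non-hot spot. In this sketch the members that rely on the misleading feature score poorly on validation and are left out of the vote.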
A simple MATLAB implementation of our predictor is available here: HotspotEC.
Copyright © 2004-2016 by Peng Chen
All Rights Reserved