HotspotEC: Protein binding hot spots prediction from sequence only by a new ensemble learning method


Hot spots are the core areas of protein binding interfaces and have been used as targets in drug design. Locating hot spots experimentally is costly and time-consuming. Recently, in silico computational methods that characterize protein sequence or structure have been widely used for hot spot prediction. Because protein structures are not always solved, identifying hot spots from amino acid sequences alone is more useful in real-life applications. This work proposes a new sequence-based model for hot spot prediction that combines an encoding schema of physicochemical features of amino acid residues with a classifier ensemble system. The encoding schema considers the local evolution information of the physicochemical features of amino acids. The model consists of 83 independent classifiers based on the IBk (instance-based k-nearest neighbours) algorithm. The top-performing classifiers are then selected to form an ensemble by majority voting. The ensemble classifier outperforms state-of-the-art computational methods.
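The selection-and-voting step described above can be sketched in pure Python. This is a hedged illustration, not the authors' Matlab implementation: IBk (Weka's instance-based k-nearest-neighbour classifier) is replaced by a minimal Euclidean k-NN, each ensemble member is assumed to differ only in which feature columns it sees and in its k (the abstract does not say how the 83 classifiers differ), members are assumed to be ranked by accuracy on a held-out validation set, and the names `knn_predict`, `select_top`, and `ensemble_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training rows (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def member_predict(member, train_X, train_y, x):
    """Run one hypothetical ensemble member: k-NN on its own feature columns."""
    cols, k = member

    def project(v):
        return [v[c] for c in cols]

    return knn_predict([project(r) for r in train_X], train_y, project(x), k)


def accuracy(member, train_X, train_y, val_X, val_y):
    """Fraction of validation rows the member labels correctly."""
    hits = sum(member_predict(member, train_X, train_y, x) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def select_top(members, train_X, train_y, val_X, val_y, n_top):
    """Keep the n_top members with the highest validation accuracy."""
    ranked = sorted(members,
                    key=lambda m: accuracy(m, train_X, train_y, val_X, val_y),
                    reverse=True)
    return ranked[:n_top]


def ensemble_predict(selected, train_X, train_y, x):
    """Majority vote over the selected members (an odd count avoids binary ties)."""
    votes = Counter(member_predict(m, train_X, train_y, x) for m in selected)
    return votes.most_common(1)[0][0]
```

With 83 candidate members, `select_top(..., n_top=...)` would return the subset that votes; how many classifiers the original method keeps is not stated here.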

Illustration of the encoding schema for residues. It shows (a) the first-tier, (b) the second-tier, (c) the third-tier, and (d) the last-tier sequence order correlation modes along a protein sequence. Here, R1, R2, and RL represent the 1st, 2nd, and L-th residues in the protein sequence, respectively; each belongs to one of the 20 common amino acid types. Graph (a) reflects the correlation between all nearest neighboring residues, (b) between second-nearest neighboring residues, (c) between third-nearest neighboring residues, and so on, up to (d), the correlation between the first and the last residues of the sliding window. In this figure, the sixth residue is the central residue, and all tiers of sequence order correlation are used as encoding features to represent it.
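The tiered correlations in the figure can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a correlation factor in the spirit of Chou's pseudo amino acid composition (tier k is the mean squared difference of one physicochemical property between residues k positions apart in the window), uses the Kyte-Doolittle hydropathy scale as a stand-in property, and assumes an 11-residue window so that the sixth residue is central, as in the figure. The names `tier_correlations` and `encode_sequence` are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tier_correlations(window, scale=KYTE_DOOLITTLE):
    """Return [theta_1, ..., theta_{L-1}] for one window of L residues.

    theta_k averages (h(R_i) - h(R_{i+k}))**2 over all residue pairs that
    are k positions apart (an assumed correlation function).
    """
    values = [scale[aa] for aa in window]
    n = len(values)
    features = []
    for k in range(1, n):  # tier k; the last tier pairs the first and last residues
        diffs = [(values[i] - values[i + k]) ** 2 for i in range(n - k)]
        features.append(sum(diffs) / len(diffs))
    return features


def encode_sequence(sequence, half_window=5, scale=KYTE_DOOLITTLE):
    """Map each residue index with a full window to its tier-correlation features.

    Residues closer than half_window to either terminus are skipped here;
    how the original method handles the termini is not stated.
    """
    encoded = {}
    for center in range(half_window, len(sequence) - half_window):
        window = sequence[center - half_window:center + half_window + 1]
        encoded[center] = tier_correlations(window, scale)
    return encoded


if __name__ == "__main__":
    features = encode_sequence("MKTAYIAKQRQISFVKSHFSRQ")
    print(len(features), len(features[5]))
```

Using more than one physicochemical property would simply concatenate one such feature vector per property for each central residue.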


Software availability:

A simple Matlab implementation of our predictor is available here: HotspotEC.


Supplementary Materials:
S1, S2, S3, S4: Material.


ShanShan Hu, Peng Chen, Bing Wang, and Jinyan Li, Protein binding hot spots prediction from sequence only by a new ensemble learning method. Submitted.

Copyright © 2004-2016 by Peng Chen

All Rights Reserved