HotspotRP: Prediction of protein hot spots from whole sequences by a random projection ensemble system



Abstract:

  Hotspot residues play an important role in determining protein-protein interactions and typically perform specific functions in biological processes. They are commonly determined by alanine scanning mutagenesis experiments, which are costly and time-consuming. To address this issue, computational methods have been developed. Most of them are structure-based, i.e., they use information from solved protein structures. However, the number of solved protein structures is far smaller than the number of known sequences. Moreover, most hotspot predictors identify hotspots from the interfaces of protein complexes; few works identify hotspot residues from whole protein sequences. Therefore, determining hotspots from whole protein sequences using sequence information alone is an urgent need.

  To address hotspot prediction from whole protein sequences, we propose an ensemble system based on random projections that uses statistical physicochemical properties of amino acids. First, an encoding scheme is developed that combines the sequence profiles of residues with physicochemical properties from the AAindex1 dataset. Next, the random projection technique is used to project the encoding vectors into a reduced space. Several of the better-performing random projections are then selected by training an IBk classifier on the training dataset, and these are applied to the test dataset. The resulting random projection classifiers together form the ensemble. Experimental results show that, although the performance of our method is not yet good enough for real applications of hotspot prediction from whole sequences, it is very promising for the determination of hotspot residues.

Flowchart of the ensemble system for hotspot prediction. Here Rk denotes the k-th random projection.
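
The pipeline described above (project the encoded vectors, keep the better projections, combine their classifiers) can be illustrated with a minimal Python sketch. This is not the HotspotRP implementation, and several details are assumptions made only for illustration: the projection matrices Rk are drawn from a Gaussian distribution, each projection is scored by leave-one-out accuracy of a plain k-nearest-neighbour classifier (standing in for IBk) on the training set, the ensemble decision is a simple majority vote, and the feature vectors are toy numbers rather than sequence-profile and AAindex1 encodings. All function names are hypothetical.

```python
import random
from collections import Counter

def random_projection(dim_in, dim_out, rng):
    """Assumed form of Rk: a Gaussian matrix (dim_out x dim_in), scaled by 1/sqrt(dim_out)."""
    scale = dim_out ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]

def project(R, x):
    """Map an encoding vector x into the reduced space: R * x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def knn_predict(train_X, train_y, x, k=1):
    """Plain k-nearest-neighbour vote (squared Euclidean distance), a stand-in for IBk."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), label)
        for t, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k=1):
    """Leave-one-out accuracy on the training set (assumed selection criterion)."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def train_ensemble(X, y, dim_out, n_candidates=20, n_keep=5, k=1, seed=0):
    """Draw candidate projections and keep the n_keep best-scoring ones."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        R = random_projection(len(X[0]), dim_out, rng)
        PX = [project(R, x) for x in X]  # projected training set
        scored.append((loo_accuracy(PX, y, k), R, PX))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(R, PX) for _, R, PX in scored[:n_keep]]

def predict_ensemble(ensemble, y, x, k=1):
    """Majority vote over the k-NN classifiers of the selected projections."""
    votes = Counter(knn_predict(PX, y, project(R, x), k) for R, PX in ensemble)
    return votes.most_common(1)[0][0]

# Toy 6-dimensional vectors: label 1 = hotspot, 0 = non-hotspot (made-up data).
X = [
    [0.0, 0.2, 0.1, 0.0, 0.3, 0.1],
    [0.1, 0.0, 0.2, 0.3, 0.0, 0.2],
    [0.3, 0.1, 0.0, 0.1, 0.2, 0.0],
    [10.0, 10.2, 9.9, 10.1, 10.0, 9.8],
    [9.9, 10.0, 10.1, 9.8, 10.2, 10.0],
    [10.1, 9.8, 10.0, 10.0, 9.9, 10.2],
]
y = [0, 0, 0, 1, 1, 1]

ensemble = train_ensemble(X, y, dim_out=3, n_candidates=10, n_keep=3)
print(predict_ensemble(ensemble, y, [9.8, 10.1, 10.0, 9.9, 10.1, 10.0]))
```

Using an odd number of kept projections avoids tied votes in a two-class setting. Other selection criteria (for example, a held-out validation split) or other ways of generating the projection matrices would fit the same outline.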

 

Software available:

 A simple Matlab implementation of our predictor is available here: HotspotRP.

 

Supplementary Materials:
Datasets: ASEdb and BID.

Prediction results of ISIS: ISIS predictions.

 

Citation:
Jinjian Jiang, Nian Wang, Peng Chen, Chunhou Zheng and Bing Wang, Prediction of protein hot spots from whole sequences by a random projection ensemble system. Submitted.

Copyright © 2004-2016 by Peng Chen

All Rights Reserved