Hotspot residues play an important role in protein-protein
interactions and often perform specific functions in
biological processes. Hotspot residues are commonly
determined by alanine scanning mutagenesis experiments, which
are costly and time-consuming. To address this issue,
computational methods
have been developed. Most of them are structure-based, i.e.,
using information from solved protein structures. However,
the number of solved protein structures is far smaller than
the number of known sequences. Moreover, almost all existing
hotspot predictors identify hotspots from the interfaces of
protein complexes; few works identify hotspot residues from
whole protein sequences. Therefore, determining hotspots
from whole protein sequences using sequence information
alone is an urgent need.
To address the issue of hotspot prediction from whole
protein sequences, we proposed an ensemble system with
random projections that uses statistical physicochemical
properties of amino acids. First, an encoding scheme
involving sequence profiles of residues and physicochemical
properties from the AAindex1 dataset was developed. Then the
random projection technique was adopted to project the
encoding vectors into a reduced space. Next, several of the
better-performing random projections were obtained by
training an IBk classifier on the training dataset, and
these were then applied to the test dataset. The ensemble of
random projection classifiers was thus obtained. Experimental
results showed that although the performance of our method
is not yet good enough for real applications of hotspot
prediction from whole sequences, it is very promising for
the determination of hotspot residues.
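
The pipeline described above (encode, randomly project, train a
nearest-neighbour classifier per projection, keep the better
projections, combine them) can be illustrated with a minimal sketch.
It is not the implementation used in this work. The following are
assumptions made only for illustration: Gaussian projection matrices,
selection by accuracy on a held-out part of the training set, majority
voting, a hand-written k-nearest-neighbour routine standing in for
Weka's IBk, and synthetic vectors standing in for the real sequence
profile and AAindex1 encoding. All function names and parameter values
are hypothetical.

```python
import numpy as np


def knn_predict(train_X, train_y, test_X, k=3):
    """Plain k-nearest-neighbour vote for binary labels (stand-in for IBk)."""
    # Squared Euclidean distance between every test and training vector.
    d = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # Predict 1 when more than half of the k neighbours are labelled 1.
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def fit_rp_ensemble(X, y, n_candidates=30, n_keep=5, out_dim=8, k=3, seed=0):
    """Draw candidate projections and keep those scoring best on held-out data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    split = int(0.7 * len(X))
    fit_idx, val_idx = idx[:split], idx[split:]
    scored = []
    for _ in range(n_candidates):
        # Gaussian random projection (assumed; the text does not specify one).
        R = rng.normal(size=(X.shape[1], out_dim)) / np.sqrt(out_dim)
        pred = knn_predict(X[fit_idx] @ R, y[fit_idx], X[val_idx] @ R, k)
        scored.append((float((pred == y[val_idx]).mean()), R))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [R for _, R in scored[:n_keep]]


def predict_rp_ensemble(projections, X, y, test_X, k=3):
    """Majority vote over one k-NN classifier per retained projection."""
    votes = np.stack([knn_predict(X @ R, y, test_X @ R, k) for R in projections])
    return (votes.mean(axis=0) > 0.5).astype(int)


def make_toy_data(n=200, dim=40, seed=1):
    """Synthetic two-class vectors; NOT real residue encodings."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim)) + 3.0 * y[:, None]
    return X, y


X, y = make_toy_data()
train_X, train_y, test_X, test_y = X[:140], y[:140], X[140:], y[140:]
projections = fit_rp_ensemble(train_X, train_y)
pred = predict_rp_ensemble(projections, train_X, train_y, test_X)
print("toy accuracy:", float((pred == test_y).mean()))
```

Using an odd number of retained projections and an odd k avoids tied
votes in this sketch. On real data, hotspot residues are much rarer
than non-hotspot residues, so accuracy alone would likely be a poor
selection criterion.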