LigandDSES: A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction



Abstract:

Background:
Proteins selectively bind other molecules and perform specific functions through such interactions, protein-ligand binding being one example. Accurate prediction of the protein residues that physically bind ligands is important for drug design and protein docking studies. Most successful protein-ligand binding predictors rely on known structures. In practice, however, structural information is often unavailable because of the large gap between the number of known protein sequences and the number of experimentally solved structures.

Results:
This paper proposes a dynamic ensemble approach that identifies protein-ligand binding residues from sequence information only. To avoid the problems caused by the strong imbalance between ligand-binding and non-ligand-binding sites, we constructed several balanced data sets and trained a random forest classifier on each of them. For each query protein, we dynamically selected a subset of classifiers according to the similarity between the target protein and the proteins in the training data set, and combined the predictions of that subset to yield the final prediction. The ensemble of these classifiers forms a sequence-based predictor of protein-ligand binding sites.
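
The workflow described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the released implementation: the function names (make_balanced_subsets, select_classifiers, ensemble_predict), the top-k selection rule, the score averaging, and the 0.5 threshold are all hypothetical. The classifiers are plain callables standing in for trained random forests, and the similarity scores are assumed to be computed elsewhere.

```python
import random


def make_balanced_subsets(positives, negatives, seed=0):
    """Pair all binding-site samples with equally sized chunks of non-binding samples.

    Assumption: balancing is done by undersampling the majority class.
    Leftover negatives that do not fill a whole chunk are dropped.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    k = len(positives)
    subsets = []
    for start in range(0, len(shuffled) - k + 1, k):
        subsets.append(list(positives) + shuffled[start:start + k])
    return subsets


def select_classifiers(similarities, top_k):
    """Return the indices of the top_k classifiers, most similar first.

    similarities[i] is a hypothetical score relating the query protein to
    the training proteins behind classifier i.
    """
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)
    return ranked[:top_k]


def ensemble_predict(classifiers, selected, residue_features, threshold=0.5):
    """Average the scores of the selected classifiers for one residue.

    Each classifier is a callable returning a binding score in [0, 1]
    (a stand-in for a trained random forest).
    """
    scores = [classifiers[i](residue_features) for i in selected]
    mean_score = sum(scores) / len(scores)
    return mean_score >= threshold, mean_score


if __name__ == "__main__":
    positives = [("binding", i) for i in range(3)]
    negatives = [("non-binding", i) for i in range(10)]
    subsets = make_balanced_subsets(positives, negatives)
    print(len(subsets), "balanced subsets of size", len(subsets[0]))

    # Stand-in classifiers, one per balanced subset.
    classifiers = [lambda x: 0.2, lambda x: 0.8, lambda x: 0.6]
    chosen = select_classifiers([0.1, 0.9, 0.5], top_k=2)
    print(ensemble_predict(classifiers, chosen, residue_features=None))
```

Each balanced subset reuses every binding-site sample, so no positive example is discarded, while the per-query selection step lets different target proteins be scored by different classifier subsets.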

Conclusions:
Experimental results on two CASP datasets and the ccPDB dataset demonstrated that our proposed method compared favorably with state-of-the-art methods.

Predictions for three classifier techniques on PDB target T0635 (PDB ID 3n1u). Correctly predicted binding site residues are colored in red, the wrongly predicted binding sites in green, and the wrongly predicted non-binding sites in blue.

 

Software available:

A simple Java implementation of our predictor is available here: LigandDSES.

 

Datasets:

There are 27 targets for FN in CASP8: sequence and binding sites.

There are 30 targets for FN in CASP9: sequence and binding sites.

There are 300 non-metal targets and 163 "Fe" targets.

 

Citation:

Peng Chen, Shanshan Hu, Jun Zhang, Xin Gao, Jinyan Li, Jun-feng Xia, and Bing Wang, A sequence-based dynamic ensemble learning system for protein ligand-binding site prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016, 13(5):901-912.


Copyright © 2004-2015 by Peng Chen

All Rights Reserved