AntiCP3: Prediction of Anticancer Proteins Using Evolutionary Information from Protein Language Models
Gupta, A.; Chauhan, M.; Tomer, R.; Raghava, G. P. S.
Abstract
Several computational methods have been developed for predicting anticancer peptides, including AntiCP and AntiCP2 from our group. Although these tools are widely used by the scientific community, they are not suitable for predicting anticancer proteins. In this study, we present AntiCP3, the first dedicated method for the prediction of anticancer proteins. All models were trained using five-fold cross-validation and evaluated on an independent dataset not used during training. Our initial analysis revealed distinct compositional differences between anticancer peptides and proteins, justifying a separate prediction framework. We first implemented similarity-based approaches, which yielded moderate performance. We then developed machine learning and deep learning models using conventional protein features, achieving a maximum AUC of 0.72. Incorporating evolutionary information through PSSM profiles improved the AUC to 0.79. Embeddings from a fine-tuned protein language model, ESM-t33, improved performance further, giving a best AUC of 0.90. Finally, a hybrid approach combining BLAST with our machine learning model achieved an AUC of 0.91. To serve the scientific community, we have implemented AntiCP3 as both a web server and standalone software for the prediction of anticancer proteins (https://webs.iiitd.edu.in/raghava/anticp3/). The model is also available on Hugging Face (https://huggingface.co/raghavagps-group/anticp3).
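
To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the AntiCP3 implementation. The function `amino_acid_composition` computes the standard percent composition of the 20 amino acids, the kind of conventional feature on which compositional comparisons are typically based. The function `hybrid_score` illustrates one possible way to combine a BLAST result with a machine learning probability; the abstract does not state how the two are combined, so the additive adjustment, the `weight` parameter, and the `"positive"`/`"negative"` hit labels are all assumptions introduced here for illustration.

```python
from collections import Counter
from typing import Dict, Optional

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the percentage of each standard residue in a sequence.

    Non-standard characters (e.g. X, B, Z, gaps) are ignored.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def hybrid_score(
    ml_probability: float,
    blast_hit: Optional[str],
    weight: float = 0.5,
) -> float:
    """Hypothetical hybrid score (an assumption, not the published scheme).

    blast_hit is "positive" if the top BLAST hit is a known anticancer
    protein, "negative" if it is a known non-anticancer protein, and
    None if there is no significant hit. The ML probability is shifted
    up or down by `weight` accordingly and left unchanged otherwise.
    """
    if blast_hit == "positive":
        return ml_probability + weight
    if blast_hit == "negative":
        return ml_probability - weight
    return ml_probability
```

For example, `amino_acid_composition("AAKK")` assigns 50% each to alanine and lysine and 0% to every other residue. Under the assumed scheme, a sequence with an ML probability of 0.4 and a positive BLAST hit would receive a hybrid score of about 0.9, while the same sequence with no hit would keep its score of 0.4.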