Monday, May 13, 2024, 11:00

AFMB lab

Abstract

Protein-carbohydrate (PC) interactions govern a wide variety of biological processes and play a crucial role in the development of various diseases. Over the last decades, the release of an impressive amount of data on carbohydrate-binding proteins has led to the emergence of the first data-driven methods for predicting carbohydrate binding sites. Nevertheless, the performance of such models remains limited compared with similar bioinformatics problems, and their correct evaluation is hindered by the lack of reliable and non-redundant datasets. In the current study, we address this challenge and perform an exhaustive analysis of the diversity of PC interfaces and of its impact on the accuracy of prediction models.

We have gathered and annotated all the available information on PC interfaces found in the Protein Data Bank (PDB) in a user-friendly web server, DIONYSUS (https://www.dsimb.inserm.fr/DIONYSUS/). Using a customized algorithm, we identified more than 46k PC complexes, each interacting with one of 3k carbohydrate-containing ligands of the PDB (increasing the number of these structures by orders of magnitude compared with the 900 ligand names available in the Chemical Component Dictionary). We performed an exhaustive study of PC interface diversity at different levels: functional class of interaction, protein sequence identity, and local geometrical similarity between the interfaces. Furthermore, we identified representative structures of the different classes of PC interactions and used them to annotate PC complexes with missing functional information.

Finally, the resulting database allowed us to train several deep learning models that combine a protein language model encoding of the protein sequence with molecular graphs encoding the protein structure. An in-depth analysis of the performance of our models and a comparison with previously published methods demonstrate significant improvements in carbohydrate binding site identification and highlight the remaining challenges in the field.

Tatiana Galochkina has been an associate professor (maître de conférences, MCF) at Université Paris Cité, Faculté de Sciences, UFR Sciences du Vivant, since 2019 and is a member of the DSIMB team of the BIGR laboratory (Biologie intégrée du globule rouge, INSERM UMRS 1134 and Université Paris Cité). Her main research interests include the modeling of complex molecular systems and the development of predictive models for protein dynamics and interactions. She was awarded an ANR young researcher (jeune chercheur) project entitled “SugarPred: Deciphering protein-carbohydrate interactions”. Protein-carbohydrate interactions play a crucial role in various biological processes, yet they remain poorly studied because they are difficult to characterize experimentally. The main objective of her project is the accurate classification of these interactions and the development of machine learning-based tools to predict potential carbohydrate binding sites on protein surfaces. More information at https://sites.google.com/view/tatiana-galochkina

Published on May 12, 2024