Computational Biology

Bioinformatics of protein sequences, families and subfamilies

Fungal genomics and secretomics



The candidate's missions are part of ongoing efforts to create subfamilies and to guide the functional exploration of the sequence space in CAZy.

Following the deluge of sequences, we moved from phylogenetic analyses to sequence similarity network (SSN) analyses to divide large multifunctional families into more specific subfamilies with increased predictive power. After using SSNs to produce the GH16 subfamilies (Viborg et al. 2019; https://pubmed.ncbi.nlm.nih.gov/31501245), we designed a method to guide and accelerate subfamily creation (Hornung et al. 2022, submitted; https://www.biorxiv.org/content/10.1101/2022.04.19.488343v2).

The first mission of the candidate will be to apply and further develop the aforementioned techniques by analysing a defined set of families to be divided into subfamilies. We particularly expect needs in:

  • Methodological solutions to cope with extremely large and/or diverse families;
  • Visual representation of information to guide the decision (e.g. taxonomy, modularity);
  • Automated implementation of the designed subfamilies in the CAZy data(base).

As a result, we expect to discover some subfamilies with no characterized member so far, or with too few to make the subfamily annotation useful. The second mission of the candidate will thus be to guide the selection of candidate enzymes to be functionally characterized later through in-depth biochemical assays. To optimize the chance of success in wet-lab experiments, the candidate will take advantage of available quantitative and qualitative information, to be integrated and cross-linked with subfamily members. For example, secretomics data are regularly produced by our collaborators at BBF, who cultivate several fungal species of interest in various conditions, such as distinct nutrient sources (e.g. https://pubmed.ncbi.nlm.nih.gov/30976326) or O2 deprivation (subject of the OxyMist project). Additional -omics data, such as publicly available transcriptomics datasets, could also be considered. Ultimately, proteins of unknown function could be selected based on secretomics/transcriptomics profiles, grouped into putative families, and further investigated for the potential discovery of novel carbohydrate-active enzymes.

The initial contract is for two years, with a possible extension of one or two years. The extension, like the monthly gross salary (Master 2120-2750 € / PhD 2730-3070 €), depends on experience and academic degree.


The candidate should have a PhD or an MSc in Bioinformatics, with a strong background in sequence analysis, especially homology searches (BLASTP, HMMER) and comparative genomics. Good programming skills (e.g. Python, C++) are expected. Additional knowledge of glycobiology, fungal analysis, integration of -omics data, sequence similarity networks or databases would be an asset.






Filamentous fungi degrade plant cell wall polymers using secreted enzymes to acquire nutrients. They hence play pivotal roles in the carbon cycle via the degradation of dead organic matter, and constitute invaluable resources for the biotechnological production of chemicals from renewable biomass, as an alternative to fossil reserves. Depending on the type of glycans and environments, fungi are exposed to constraints affecting their growth and degradation capabilities. Dioxygen availability is an essential parameter which has been overlooked so far.

The candidate will join the OxyMist project, which aims to decipher the role of O2 in the degradation ability of microbial communities (https://ign.ku.dk/english/oxymist/; https://novonordiskfonden.dk/en/news/major-research-project-will-focus-on-fungal-feeding-habits/). This collaborative project involves teams from Copenhagen, Cambridge and Marseille. In Marseille, the OxyMist project is led by the INRAE laboratory BBF (Fungal Biotechnology and Biodiversity; https://www6.paca.inrae.fr/umrbcf_eng/), which has a long-standing collaboration with the Glycogenomics group in the AFMB laboratory (Architecture and Function of Biological Macromolecules; https://www.afmb.univ-mrs.fr/en/) for the bioinformatics aspects. Both laboratories are located on the Luminy campus, in the middle of the Calanques National Park.

The candidate will join the Glycogenomics group, which brings together bioinformaticians and biochemists to maintain and develop the CAZy database (Carbohydrate Active EnZymes database; www.cazy.org). For more than 20 years, CAZy has been the worldwide reference family classification of the enzymes involved in glycan assembly and breakdown, thanks to its strong reliance on human curation and its focus on functional information (Drula et al. 2022; https://pubmed.ncbi.nlm.nih.gov/34850161). In the era of high-throughput sequencing, CAZy has become an essential resource for functional interpretation and discovery in genomes and metagenomes.


A cover letter, a curriculum vitae, and the names of two referees


Applications should be sent by email to: nicolas.terrapon@univ-amu.fr, elodie.drula@inrae.fr and jean-guy.berrin@inrae.fr

Published on July 5, 2022