Bioinformatics & genomics researcher

I work at the intersection of computer science and genetics/genomics.

I am a postdoctoral researcher at the Kenny Lab at the Icahn School of Medicine at Mount Sinai.

I recently completed my PhD at the University of Liège, where I carried out my research at the Unit of Human Genetics of the GIGA research center. During my PhD, I was advised by Prof. Vincent Bours and Prof. Guy Jerusalem, and my work was funded by an FRS-FNRS fellowship.


I apply machine learning, algorithmics, and software engineering methods to data arising from questions in population genomics and cancer genomics.

As modern genomic and transcriptomic techniques yield vast amounts of data, machine learning methods are well suited to tackling problems rooted in these fields.

My research projects have led me to work with exome and transcriptome (RNA) sequencing data, as well as array comparative genomic hybridization (aCGH). I was particularly interested in integrating these different techniques in human cancer samples, and in how the resulting combined information could give a better understanding of each sample's specific characteristics.

I've had the opportunity to work on projects related to the clinical (non-invasive diagnosis), technical (CNV detection), and molecular (non-coding RNAs) aspects of cancer.

My activities have led me to work in both research and clinical settings, often building bridges between the two.


Peer-reviewed journal articles, as first author

In this paper, we explore the use of supervised learning methods to rank large ensembles of genes defined by their expression values measured with RNA-Seq in a typical two-class sample set.
First, we use one of the variable importance measures generated by the random forests classification algorithm as a metric to rank genes. Second, we define the EPS (extreme pseudo-samples) pipeline, making use of VAEs (Variational Autoencoders) and regressors to extract a ranking of genes while leveraging the feature space of both virtual and comparable samples.
We show that, on 12 cancer RNA-Seq data sets ranging from 323 to 1210 samples, using either a random forest-based gene selection method or the EPS pipeline outperforms differential expression analysis for 9 and 8 out of the 12 data sets, respectively, in terms of identifying subsets of genes associated with survival. These results demonstrate the potential of supervised learning-based gene selection methods in RNA-Seq studies.

Frontiers in Genetics. 2018

In this work, 22 ER+ breast cancers and their paired adjacent non-malignant tissues were analyzed by strand-specific RNA-Seq. To highlight ncNATs potentially playing a role in protein coding gene regulations that occur in breast cancer, three different data analysis methods were used: differential expression analysis of ncNATs between tumor and non-malignant tissues, differential correlation analysis of paired ncNAT/PCT between tumor and non-malignant tissues, and ncNAT/PCT read count ratio variation between tumor and non-malignant tissues. Each of these methods yielded lists of ncNAT/PCT pairs that were enriched in survival-associated genes. This work highlights ncNAT lists that display potential to affect the expression of protein-coding genes involved in breast cancer.

Scientific Reports. 2017

We designed a metric allowing the comparison of CNV profiles independently of the technique used, and assessed the validity of using a pool of unrelated healthy DNA instead of a matched healthy tissue as reference in exome-based CNV detection.
We compared the CNV profiles obtained with 3 different approaches (aCGH, exome with matched healthy reference, exome with pool of unrelated healthy references) on 3 multiple myeloma (MM) samples. We showed that the usual analyses performed to compare CNV profiles (deletion/amplification ratios and CNV size distribution) lack precision when confronted with low LRR values.
We showed that the metric-based distance constitutes a more accurate comparison of CNV profiles. Based on these analyses, we concluded that a reliable picture of CNV alterations in MM samples can be obtained from whole-exome sequencing in the absence of a matched healthy sample.

Genetic Epidemiology. 2016

Circulating microRNAs (miRNAs) are increasingly recognized as powerful biomarkers in several pathologies, including breast cancer. Here, we measured their plasma levels to evaluate them as an alternative to screening mammography for breast cancer diagnosis.
We measured the plasma miRNA profile of 378 women and built an 8-miRNA random forest-based diagnostic tool that performs better than screening mammography (AUC on independent cohort: 0.81). The accuracy of the diagnostic tool remains unchanged when age and tumor stage are taken into account, and the model is able to differentiate gynecologic cancers from breast cancers and to correctly classify breast cancers in remission.
This work led to the registration of a European patent.

Oncotarget. 2015

Peer-reviewed journal articles, as co-author

Breast cancer treatment can have long-term side effects, including congestive heart failure (CHF). In this study, we evaluated innovative circulating cardiac biomarkers during and after anthracycline-based neoadjuvant chemotherapy (NAC) in breast cancer patients. Levels of cardiac-specific troponin T (cTnT), N-terminal pro-B-type natriuretic peptide (NT-proBNP), soluble ST2 (sST2) and 10 circulating microRNAs (miRNAs) were measured. Under chemotherapy, we observed an elevation of cTnT and NT-proBNP levels, as well as the upregulation of sST2 and of 4 CHF-related miRNAs. We showed that circulating miRNAs and sST2 are potential biomarkers of chemotherapy-related cardiac dysfunction (CRCD).

BMC Cancer. 2018

The genomic profile of multiple myeloma (MM) has prognostic value (hyperdiploid patients have a better prognosis than nonhyperdiploid patients). However, many other parameters (mutations, epigenetic modifications, genomic heterogeneity) may influence the prognosis.
We performed aCGH on a cohort of 162 patients to evaluate the frequency of genomic gains and losses. We identified a high frequency of X chromosome alterations leading to partial Xq duplication, often associated with inactive X (Xi) deletion in female patients.
This partial X duplication could be a cytogenetic marker of aneuploidy, as it is correlated with a high number of chromosomal breakages. Patients with a high level of chromosomal breakage had reduced survival regardless of the region implicated. Cancer genes located in this altered region (IKBKG, IRAK1, both members of the NFKB pathway) were shown to have higher transcriptional levels.

Genes, Chromosomes and Cancer. 2016

The BRCA1 gene plays a key role in triple negative breast cancers (TNBCs), in which its expression can be lost by multiple mechanisms. Here, we explored the correlations between BRCA1-related molecular parameters, tumor characteristics and clinical follow-up of patients to find new prognostic factors.
BRCA1 protein and mRNA expression were quantified in situ in the TNBCs of 69 patients. miR-548c-5p was identified as a new independent prognostic factor in TNBC. A combination of the tumoral expression of miR-548c and three other known prognostic parameters (tumor size, lymph node invasion and CK 5/6 expression status) allowed for relapse prediction by logistic regression with an area under the curve (AUC) of 0.96.

BMC Cancer. 2015


Patent

Method for the diagnosis of breast cancer.
European Patent EP2942399. 2015

Conference posters

Capturing drug response during surgery for pharmacogenomic discoveries.  
5th Human Genetics in NYC Conference. 2018

Antisense long non-coding RNAs in breast cancer: A transcriptome-wide disruption.  
American Society of Human Genetics (ASHG) annual meeting. 2017

Transcriptome wide analysis of natural antisense transcripts shows their potential role in breast cancer.  
European Conference of Human Genetics (ESHG). 2017

A miRNA expression based diagnostic tool for breast cancer using random forests.  
Benelux Bioinformatics Conference. 2013

Exome sequencing of tumors: relevance in copy-number alteration (CNA) analysis and fixed tissue samples.  
Belgian Society of Human Genetics Annual Meeting. 2013
