Genomics, Oncology & Bioinformatics research scientist

I currently work as a Computational Biologist at Tempus.

My past and present research interests encompass computational biology, computer science, oncology, and genomics.

Previously, I was a Postdoctoral Fellow at the Center for Population Genomic Health, a member of the Charles Bronfman Institute of Personalized Medicine at the Icahn School of Medicine at Mount Sinai.

I earned my PhD in cancer genetics at the University of Liège, where I carried out my research at the Unit of Human Genetics of the GIGA research center. During my PhD, I was advised by Prof. Vincent Bours and Prof. Guy Jerusalem, and my work was funded by an FRS-FNRS fellowship.


I apply machine learning, algorithmics, and software engineering methods to data arising from questions in population genomics and cancer genomics.

Because modern genomic and transcriptomic techniques yield vast amounts of data, machine learning methods are well suited to problems rooted in these fields.

My past and present research projects include using circulating miRNAs as cancer biomarkers, studying the global disruption of antisense long non-coding RNAs in breast cancer samples, analyzing real-time pharmacogenetics phenotypes in ancestrally diverse populations, applying graph theory analysis to Identity-By-Descent networks, and developing novel machine learning-based gene prioritization methods.

My research projects have led me to work with exome and transcriptome (RNA) sequencing data, as well as genotyping and aCGH. I was particularly interested in integrating these techniques, and in how the combined information gives a better understanding of sample specificities.

During my thesis, I've had the opportunity to work on projects related to the clinical (non-invasive diagnosis), technical (CNV detection), and molecular (non-coding RNAs) aspects of cancer. Throughout my postdoc, I've applied statistical genetics methods to biobank-scale datasets.

My activities have led me to work in both research and clinical settings, often building bridges between the two.


See Google Scholar for a full publication list, including consortium and group authorships, and pre-prints.

Selected Peer-reviewed journal articles, as first author

We hypothesized that there is a genetic underpinning to the magnitude of the response to phenylephrine, an α1-adrenergic receptor agonist commonly used to treat hypotension during anesthesia and surgery. We quantified the response to phenylephrine as the Δ between the minimum blood pressure (BP) within 5 minutes before and the maximum BP within 5 minutes after bolus administration. We performed a GWAS adjusted for genetic ancestry, demographics, and relevant clinical covariates to investigate genetic factors underlying individual differences in systolic BP response to phenylephrine (ΔSBP), as well as mean arterial pressure (ΔMAP) and diastolic BP (ΔDBP), both for the entire study cohort and for each of 3 ancestry sub-cohorts: European American (EA), African American (AA), and Hispanic American (HA).
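The response definition above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `(time, pressure)` data layout, the function name `bp_response`, and the handling of window boundaries (a reading taken exactly at the bolus time is ignored) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical data layout: each reading is (time_in_minutes, pressure_in_mmHg).
Reading = Tuple[float, float]


def bp_response(readings: List[Reading], bolus_time: float,
                window: float = 5.0) -> Optional[float]:
    """Max BP within `window` minutes after the bolus minus min BP within
    `window` minutes before it. Returns None if either window is empty."""
    # Assumption: a reading taken exactly at the bolus time belongs to neither window.
    before = [bp for t, bp in readings if bolus_time - window <= t < bolus_time]
    after = [bp for t, bp in readings if bolus_time < t <= bolus_time + window]
    if not before or not after:
        return None
    return max(after) - min(before)


# Made-up systolic readings around a bolus given at t = 5 minutes.
systolic = [(0.0, 92.0), (2.0, 88.0), (4.0, 90.0),      # before the bolus
            (6.0, 101.0), (8.0, 115.0), (12.0, 108.0)]  # t = 12 falls outside the window
delta_sbp = bp_response(systolic, bolus_time=5.0)
print(delta_sbp)  # 115.0 - 88.0 = 27.0
```

The same function would apply unchanged to mean arterial or diastolic readings to obtain ΔMAP or ΔDBP.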

After deriving a clean phenotype for 4,000+ patients, we showed that:
1) On average, European Americans have a stronger response to phenylephrine than African Americans and Hispanics.
2) There are multiple significant associations, with 5 out of 7 in regions previously associated with routine blood pressure measurements.
3) Some variants present in all 3 populations only have an effect in 1 population, i.e. ancestry significantly modulates the variant's effect.
4) There are at least two groups of phenylephrine non-responders: one EA group and one AA group, each carrying a different rare (~1%) variant.

The Pharmacogenomics Journal. 2020

In this paper, we explore the use of supervised learning methods to rank large ensembles of genes defined by their expression values measured with RNA-Seq in a typical two-class sample set.
First, we use one of the variable importance measures generated by the random forests classification algorithm as a metric to rank genes. Second, we define the EPS (extreme pseudo-samples) pipeline, making use of VAEs (Variational Autoencoders) and regressors to extract a ranking of genes while leveraging the feature space of both virtual and comparable samples.
We show that, on 12 cancer RNA-Seq datasets ranging from 323 to 1210 samples, using either a random forests-based gene selection method or the EPS pipeline outperforms differential expression analysis for 9 and 8 out of the 12 datasets respectively, in terms of identifying subsets of genes associated with survival. These results demonstrate the potential of supervised learning-based gene selection methods in RNA-Seq studies.

Frontiers in Genetics. 2018

In this work, 22 ER+ breast cancers and their paired adjacent non-malignant tissues were analyzed by strand-specific RNA-Seq. To highlight ncNATs potentially playing a role in the regulation of protein-coding genes in breast cancer, three different data analysis methods were used: differential expression analysis of ncNATs between tumor and non-malignant tissues, differential correlation analysis of paired ncNAT/PCT between tumor and non-malignant tissues, and ncNAT/PCT read count ratio variation between tumor and non-malignant tissues. Each of these methods yielded lists of ncNAT/PCT pairs that were enriched in survival-associated genes. This work highlights lists of ncNATs that have the potential to affect the expression of protein-coding genes involved in breast cancer.

Scientific Reports. 2017

We designed a metric allowing the comparison of CNV profiles independently of the technique used, and assessed the validity of using a pool of unrelated healthy DNA instead of a matched healthy tissue as reference in exome‐based CNV detection.
We compared the CNV profiles obtained with 3 different approaches (aCGH, exome with matched healthy reference, exome with pool of unrelated healthy references) on 3 multiple myeloma (MM) samples. We showed that the usual analyses performed to compare CNV profiles (deletion/amplification ratios and CNV size distribution) lack precision when confronted with low LRR values.
We showed that the metric‐based distance constitutes a more accurate comparison of CNV profiles. Based on these analyses, we concluded that a reliable picture of CNV alterations in MM samples can be obtained from whole‐exome sequencing in the absence of a matched healthy sample.

Genetic Epidemiology. 2016

Circulating microRNAs (miRNAs) are increasingly recognized as powerful biomarkers in several pathologies, including breast cancer. Here, we measured their plasma levels to evaluate them as an alternative to screening mammography for breast cancer diagnosis.
We measured the plasma miRNA profile of 378 women and built an 8-miRNA random forest-based diagnostic tool which performs better than screening mammography (AUC on independent cohort: 0.81). The accuracy of the diagnostic tool remains unchanged across age and tumor stage, and the model is able to differentiate gynecologic cancers from breast cancers and to correctly classify breast cancers in remission.
This work led to the registration of a European patent.

Oncotarget. 2015

Selected Peer-reviewed journal articles, as co-author

Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations.
Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups.
This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.

Cell. 2021

The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models.
First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. We present an open-source implementation of the algorithm that is easy to set up, use, and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities.
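The last three steps described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the published package: the variational autoencoder is omitted and a small hard-coded list of 2-dimensional vectors stands in for its latent representations, the second model is trained on the pseudo-samples only, and all names and parameters (`fit_logistic`, `pseudo_samples`, `k`, `n_per`, `sigma`) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: Vector, b: float, x: Vector) -> float:
    """Predicted probability that sample x is a case."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(xs: List[Vector], ys: List[int],
                 lr: float = 0.1, epochs: int = 300) -> Tuple[Vector, float]:
    """Logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        errs = [score(w, b, x) - y for x, y in zip(xs, ys)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, xs)) / len(xs)
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / len(xs)
    return w, b


def pseudo_samples(xs: List[Vector], ys: List[int], w: Vector, b: float,
                   k: int = 2, n_per: int = 10, sigma: float = 0.1,
                   seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Draw Gaussian-perturbed copies of the k most extreme controls and cases."""
    rng = random.Random(seed)
    ranked = sorted(zip(xs, ys), key=lambda pair: score(w, b, pair[0]))
    controls = [x for x, y in ranked if y == 0][:k]   # lowest predicted probabilities
    cases = [x for x, y in ranked if y == 1][-k:]     # highest predicted probabilities
    new_x, new_y = [], []
    for label, extremes in ((0, controls), (1, cases)):
        for x in extremes:
            for _ in range(n_per):
                new_x.append([xi + rng.gauss(0.0, sigma) for xi in x])
                new_y.append(label)
    return new_x, new_y


# Stand-in for VAE latent vectors (assumption): 5 controls, then 5 cases.
latent = [[-2.0, 0.3], [-1.5, -0.4], [-1.0, 0.1], [-0.5, -0.2], [0.2, 0.5],
          [2.0, -0.3], [1.5, 0.4], [1.0, -0.1], [0.5, 0.2], [-0.2, -0.5]]
labels = [0] * 5 + [1] * 5

w1, b1 = fit_logistic(latent, labels)                        # classify cases vs controls
pseudo_x, pseudo_y = pseudo_samples(latent, labels, w1, b1)  # sample around the extremes
w2, b2 = fit_logistic(pseudo_x, pseudo_y)                    # new model on pseudo-samples
print(len(pseudo_x), w2)
```

In this toy setting the coefficients of the second model could be ranked by absolute value to order the input dimensions, which is the role feature selection plays in the description above.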

Bioinformatics. 2021

Whole exome sequencing was undertaken in two siblings with delayed psychomotor development, absent speech, severe intellectual disability and postnatal microcephaly, with brain malformations consisting of cerebellar atrophy in the elder affected sibling and hypoplastic corpus callosum in the younger sister. It revealed a homozygous intragenic deletion in VPS51, which encodes the vacuolar protein sorting-associated protein, one of the four subunits of the Golgi-associated retrograde protein (GARP) and endosome-associated recycling protein (EARP) complexes. These complexes promote the fusion of endosome-derived vesicles with the trans-Golgi network (GARP) and recycling endosomes (EARP).
This observation supports a pathogenic effect of VPS51 variants, which had previously been reported only once, in a single child with microcephaly. It confirms the key role of membrane trafficking in normal brain development and homeostasis.

European Journal of Medical Genetics. 2019

Breast cancer treatment can have long-term side effects, including congestive heart failure (CHF). In this study, we evaluated innovative circulating cardiac biomarkers during and after anthracycline-based neoadjuvant chemotherapy (NAC) in breast cancer patients. Levels of cardiac-specific troponin T (cTnT), N-terminal natriuretic peptides (NT-proBNP), soluble ST2 (sST2) and 10 circulating microRNAs (miRNAs) were measured. Under chemotherapy, we observed an elevation of cTnT and NT-proBNP levels, as well as the upregulation of sST2 and of 4 CHF-related miRNAs. We showed that circulating miRNAs and sST2 are potential biomarkers of chemotherapy-related cardiac dysfunction (CRCD).

BMC Cancer. 2018

The genomic profile of multiple myeloma (MM) has prognostic value (hyperdiploid patients have a better prognosis than nonhyperdiploid patients). However, many other parameters (mutations, epigenetic modifications, genomic heterogeneity) may influence the prognosis.
We performed aCGH on a cohort of 162 patients to evaluate the frequency of genomic gains and losses. We identified a high frequency of X chromosome alterations leading to partial Xq duplication, often associated with inactive X (Xi) deletion in female patients.
This partial X duplication could be a cytogenetic marker of aneuploidy, as it is correlated with a high number of chromosomal breakages. Patients with a high level of chromosomal breakage had reduced survival regardless of the region implicated. Cancer genes located in this altered region (IKBKG, IRAK1, both members of the NFKB pathway) were shown to have higher transcriptional levels.

Genes, Chromosomes and Cancer. 2016

The BRCA1 gene plays a key role in triple negative breast cancers (TNBCs), in which its expression can be lost by multiple mechanisms. Here, we explored the correlations between BRCA1-related molecular parameters, tumor characteristics and clinical follow-up of patients to find new prognostic factors.
BRCA1 protein and mRNA expression were quantified in situ in the TNBCs of 69 patients. miR-548c-5p was highlighted as a new independent prognostic factor in TNBC. Combining the tumoral expression of miR-548c with three other known prognostic parameters (tumor size, lymph node invasion and CK 5/6 expression status) allowed relapse prediction by logistic regression with an area under the curve (AUC) of 0.96.

BMC Cancer. 2015


Method for the diagnosis of breast cancer.
European Patent EP2942399. 2015

Conference posters

Towards cancer mega-cohorts: A novel homogenization algorithm applied to diverse breast cancer RNA-Seq datasets.
American Society of Clinical Oncology (ASCO) annual meeting. 2020

Rapid response to the Alpha-1 Adrenergic Agent Phenylephrine in the Perioperative Period is Impacted by Genomics and Ancestry.  
6th Human Genetics in NYC Conference. 2018

A new paradigm for pharmacogenomic discoveries: Capturing drug response during surgery.  
American Society of Human Genetics (ASHG) annual meeting. 2018

Capturing drug response during surgery for pharmacogenomic discoveries.  
5th Human Genetics in NYC Conference. 2018

Antisense long non-coding RNAs in breast cancer: A transcriptome-wide disruption.  
American Society of Human Genetics (ASHG) annual meeting. 2017

Transcriptome wide analysis of natural antisense transcripts shows their potential role in breast cancer.  
European Conference of Human Genetics (ESHG). 2017

A miRNA expression based diagnostic tool for breast cancer using random forests.  
Benelux Bioinformatics Conference. 2013

Exome sequencing of tumors: relevance in copy-number alteration (CNA) analysis and fixed tissue samples.  
Belgian Society of Human Genetics Annual Meeting. 2013

  ORCID     Google Scholar     Pubmed     Research Gate     Impact Story