goal and achievement: delivery of a pipeline automating the ingestion, processing and visualization of reinsurance data (see the PySpark sketch after this entry):
* Data ingestion from a relational database into a cloud environment
* Setting up a data processing pipeline in a Databricks environment (Bronze and Silver layers)
* Building data transformation pipelines in Palantir Foundry (Gold layer)
* Implementation and integration of business KPIs
* Building activity monitoring and dashboard tools with Foundry
environment: a squad of 4 developers; working with Scrum
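A minimal sketch of the kind of Bronze-to-Silver step such a Databricks pipeline contains, assuming Delta Lake storage; the paths, column names and cleaning rules are hypothetical stand-ins, not the project's actual logic.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("reinsurance_silver").getOrCreate()

    # Bronze layer: raw records ingested as-is from the relational source (hypothetical path).
    bronze = spark.read.format("delta").load("/mnt/bronze/reinsurance_contracts")

    # Silver layer: typed, deduplicated and lightly cleaned records.
    silver = (
        bronze
        .withColumn("effective_date", F.to_date("effective_date", "yyyy-MM-dd"))
        .withColumn("premium_amount", F.col("premium_amount").cast("double"))
        .dropDuplicates(["contract_id"])
        .filter(F.col("premium_amount").isNotNull())
    )

    silver.write.format("delta").mode("overwrite").save("/mnt/silver/reinsurance_contracts")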
Data Science Project Manager
Explain
June 2019
to October 2024
goal and achievement: develop, maintain and upgrade data science/LLM functionalities in two client-facing tools:
* Development and integration of NLP/LLM/AI features in an AWS environment
* Translation of customer needs into data science problems
* Abstraction and modeling of business problems
* Prototyping of features and models
Some data science / AI problems addressed (see the classification sketch after this entry):
* Automatic question answering over a large document database
* Timelines of publication summaries matching user-defined parameters (topic and location)
* Document deduplication
* Named entity recognition
* Document classification (machine learning model)
environment: two squads of 6 developers each; working with Scrum
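One item above, document classification, can be illustrated with a classic scikit-learn baseline; the documents and labels below are toy stand-ins, not client data, and the delivered models were not necessarily this pipeline.

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy documents and labels standing in for the client corpus.
    docs = ["quarterly earnings report published",
            "new product line announced at trade show",
            "regulatory filing submitted to authorities",
            "marketing campaign results reviewed"]
    labels = ["finance", "product", "legal", "marketing"]

    # TF-IDF features plus a linear classifier: a common baseline before
    # moving to transformer- or LLM-based models.
    clf = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    clf.fit(docs, labels)

    print(clf.predict(["annual earnings statement released"]))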
Data Science Senior Consultant
Capgemini Invent
May 2017
to June 2019
Airbus (one year)
* goal and achievement: delivery of a pipeline automating the ingestion, processing and visualization of aviation data (see the Foundry transform sketch below):
  - Setting up a data processing pipeline in Palantir's Foundry environment
  - Building data transformation pipelines in Palantir Foundry
  - Implementation and integration of business KPIs
* tools: Palantir Foundry, PySpark, Hive, Scala, Spark
* environment: multiple squads of 3 developers; working with Scrum
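A sketch of what a KPI transform looks like with Foundry's Python transforms API (transforms.api); the dataset paths, columns and KPI definition are hypothetical examples, not Airbus data.

    from transforms.api import transform_df, Input, Output
    from pyspark.sql import functions as F

    @transform_df(
        Output("/gold/flight_kpis"),        # hypothetical output dataset
        flights=Input("/silver/flights"),   # hypothetical input dataset
    )
    def compute_kpis(flights):
        # Example KPI: monthly flight count and average delay per aircraft type.
        return (
            flights
            .withColumn("month", F.trunc("departure_date", "month"))
            .groupBy("aircraft_type", "month")
            .agg(
                F.count("*").alias("flight_count"),
                F.avg("delay_minutes").alias("avg_delay_minutes"),
            )
        )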
BNP Paribas (6 months)
* goal and achievement: development of a disaster clustering model using Natural Language Processing (NLP) (see the clustering sketch below)
* Data science issues addressed: topic extraction, text classification, text translation
* tools: Python, Scikit-Learn, NLP
* environment: a squad of 2 data scientists
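A minimal illustration of clustering short disaster reports with TF-IDF and k-means in scikit-learn; the reports and number of clusters are toy examples, not the bank's data or the delivered model.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Toy incident descriptions standing in for the real corpus.
    reports = [
        "flood damages warehouse after heavy rain",
        "river overflow floods residential area",
        "fire destroys factory building overnight",
        "wildfire spreads near industrial site",
    ]

    # Vectorize the text and group it into broad disaster topics.
    X = TfidfVectorizer(stop_words="english").fit_transform(reports)
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    print(kmeans.labels_)  # e.g. flood-related vs fire-related reports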
Sodexo (6 months)
* goal and achievement: construction of a data processing pipeline in a distributed environment; construction of a restaurant attendance forecasting model to reduce waste (see the forecasting sketch below)
* tools: Dataiku, PySpark, Azure, time series, Machine learning (LSTM, Prophet, SARIMA)
* environment: 3 squads of 2 data scientists each; working with Scrum
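A sketch of a weekly-seasonal SARIMA forecast of daily covers, one of the model families named above; the synthetic series and the untuned model orders are illustrative only.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Synthetic daily attendance with a weekday/weekend pattern (stand-in data).
    idx = pd.date_range("2018-01-01", periods=120, freq="D")
    rng = np.random.default_rng(0)
    attendance = pd.Series(
        [230.0 if d.weekday() < 5 else 160.0 for d in idx], index=idx
    ) + rng.normal(0, 5, size=len(idx))

    # Weekly seasonality (period 7); orders are illustrative, not tuned.
    model = SARIMAX(attendance, order=(1, 0, 1), seasonal_order=(1, 1, 1, 7))
    fitted = model.fit(disp=False)

    # Forecast the next two weeks of covers to size orders and limit waste.
    print(fitted.forecast(steps=14))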
Computer Science Researcher
Laboratoire Bordelais de Recherche en Informatique
January 2014
to February 2017
optimize, in time and space, the computation of Skyline queries in relational databases (see the skyline sketch after this entry)
estimation of the size of the query result
approximate computation of the result
identification of relationships (in particular functional dependencies) between columns
pre-computation and data structures
Multidimensional data analysis and correlation detection
tools: Java, C++, Big Data
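A compact illustration of what a Skyline query returns: the tuples not dominated by any other tuple. This naive quadratic version in Python is for illustration only; the research above targeted optimized algorithms, result-size estimation and pre-computation inside the database.

    def dominates(a, b):
        # a dominates b if it is at least as good on every dimension and strictly
        # better on at least one (here, lower values are preferred).
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def skyline(points):
        # Naive nested-loop skyline: keep the points dominated by no other point.
        return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

    # Hotels as (price, distance to beach): cheaper and closer is better.
    hotels = [(50, 8), (80, 2), (60, 5), (90, 1), (55, 9)]
    print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5), (90, 1)]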
Part-time Lecturer
OpenClassrooms
June 2021
to January 2024
Student Mentoring
Student Project Evaluation
Data Scientist, Machine Learning Engineer, and Data Analyst Programs
Lecturer in Statistics (SPSS)
Université de Bordeaux
September 2015
to September 2016
Introduction to statistics with SPSS
Student assessment
Education
Statistical Engineer
Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI - Rennes)
September 2011
to November 2013
Data processing and analysis; statistical information systems
Skills
Data Science
Data processing
Data analysis
Decision support models
Classification, Clustering
Machine Learning
Data Mining
AI, LLMs
Tools
SAS (certification)
R, SPSS, Matlab, Spad
Scikit-learn, TensorFlow, PyTorch, Keras, MLOps
Tableau, Power BI
AWS (EC2, EBS, SageMaker, OpenSearch, ...)
GCP, Microsoft Azure, Dataiku, Palantir
Jupyter Notebook, JupyterLab, PyCharm
Elasticsearch
Computer Science
Python, Java, C++, C, VBA
HTML5, JavaScript, PHP, CSS
Databases, SQL, NoSQL, PostgreSQL, MySQL, Oracle