goal and achievement: delivery of a pipeline automating the ingestion, processing and visualization of reinsurance data (see the PySpark sketch after this entry):
* Data ingestion from a relational database into a cloud environment
* Setting up a data processing pipeline in a Databricks environment (Bronze and Silver layers)
* Building data transformation pipelines in Palantir Foundry (Gold layer)
* Implementation and integration of business KPIs
* Building activity monitoring and dashboard tools with Foundry
environment: a squad of 4 developers; working with Scrum
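A minimal sketch of the kind of Bronze-to-Silver step such a Databricks pipeline contains, assuming Delta Lake storage; the paths, column names and cleaning rules are hypothetical stand-ins, not the project's actual logic.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("reinsurance_silver").getOrCreate()

    # Bronze layer: raw records ingested as-is from the relational source (hypothetical path).
    bronze = spark.read.format("delta").load("/mnt/bronze/reinsurance_contracts")

    # Silver layer: typed, deduplicated and lightly cleaned records.
    silver = (
        bronze
        .withColumn("effective_date", F.to_date("effective_date", "yyyy-MM-dd"))
        .withColumn("premium_amount", F.col("premium_amount").cast("double"))
        .dropDuplicates(["contract_id"])
        .filter(F.col("premium_amount").isNotNull())
    )

    silver.write.format("delta").mode("overwrite").save("/mnt/silver/reinsurance_contracts")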
Data Science Project Manager
Explain
June 2019
to October 2024
goal and achievement: develop, maintain and upgrade data science/LLM functionalities in two client-facing tools:
* Development and integration of NLP/LLM/AI features in an AWS environment
* Translation of customer needs into data science problems
* Abstraction and modeling of business problems
* Prototyping of features and models
Some data science / AI problems addressed (see the classification sketch after this entry):
* Automatic question answering over a large document database
* Timelines of publication summaries matching user-defined parameters (topic and location)
* Document deduplication
* Named entity recognition
* Document classification (machine learning model)
environment: two squads of 6 developers each; working with Scrum
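One item above, document classification, can be illustrated with a classic scikit-learn baseline; the documents and labels below are toy stand-ins, not client data, and the delivered models were not necessarily this pipeline.

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy documents and labels standing in for the client corpus.
    docs = ["quarterly earnings report published",
            "new product line announced at trade show",
            "regulatory filing submitted to authorities",
            "marketing campaign results reviewed"]
    labels = ["finance", "product", "legal", "marketing"]

    # TF-IDF features plus a linear classifier: a common baseline before
    # moving to transformer- or LLM-based models.
    clf = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    clf.fit(docs, labels)

    print(clf.predict(["annual earnings statement released"]))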
Data Science Senior Consultant
Capgemini Invent
May 2017
to June 2019
Airbus (one year)
* goal and achievement: delivery of a pipeline automating the ingestion, processing and visualization of aviation data (see the Foundry transform sketch below):
  - Setting up a data processing pipeline in Palantir's Foundry environment
  - Building data transformation pipelines in Palantir Foundry
  - Implementation and integration of business KPIs
* tools: Palantir Foundry, PySpark, Hive, Scala, Spark
* environment: multiple squads of 3 developers; working with Scrum
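A sketch of what a KPI transform looks like with Foundry's Python transforms API (transforms.api); the dataset paths, columns and KPI definition are hypothetical examples, not Airbus data.

    from transforms.api import transform_df, Input, Output
    from pyspark.sql import functions as F

    @transform_df(
        Output("/gold/flight_kpis"),        # hypothetical output dataset
        flights=Input("/silver/flights"),   # hypothetical input dataset
    )
    def compute_kpis(flights):
        # Example KPI: monthly flight count and average delay per aircraft type.
        return (
            flights
            .withColumn("month", F.trunc("departure_date", "month"))
            .groupBy("aircraft_type", "month")
            .agg(
                F.count("*").alias("flight_count"),
                F.avg("delay_minutes").alias("avg_delay_minutes"),
            )
        )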
BNP Paribas (6 months)
* goal and achievement: development of a disaster clustering model using Natural Language Processing (NLP) (see the clustering sketch below)
* Data science issues addressed: topic extraction, text classification, text translation
* tools: Python, Scikit-Learn, NLP
* environment: a squad of 2 data scientists
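A minimal illustration of clustering short disaster reports with TF-IDF and k-means in scikit-learn; the reports and number of clusters are toy examples, not the bank's data or the delivered model.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Toy incident descriptions standing in for the real corpus.
    reports = [
        "flood damages warehouse after heavy rain",
        "river overflow floods residential area",
        "fire destroys factory building overnight",
        "wildfire spreads near industrial site",
    ]

    # Vectorize the text and group it into broad disaster topics.
    X = TfidfVectorizer(stop_words="english").fit_transform(reports)
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    print(kmeans.labels_)  # e.g. flood-related vs fire-related reports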
Sodexo (6 months)
* goal and achievement: construction of a data processing pipeline in a distributed environment; construction of a restaurant attendance forecasting model to reduce waste (see the forecasting sketch below)
* tools: Dataiku, PySpark, Azure, time series, Machine learning (LSTM, Prophet, SARIMA)
* environment: 3 squads of 2 data scientists each; working with Scrum
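A sketch of a weekly-seasonal SARIMA forecast of daily covers, one of the model families named above; the synthetic series and the untuned model orders are illustrative only.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Synthetic daily attendance with a weekday/weekend pattern (stand-in data).
    idx = pd.date_range("2018-01-01", periods=120, freq="D")
    rng = np.random.default_rng(0)
    attendance = pd.Series(
        [230.0 if d.weekday() < 5 else 160.0 for d in idx], index=idx
    ) + rng.normal(0, 5, size=len(idx))

    # Weekly seasonality (period 7); orders are illustrative, not tuned.
    model = SARIMAX(attendance, order=(1, 0, 1), seasonal_order=(1, 1, 1, 7))
    fitted = model.fit(disp=False)

    # Forecast the next two weeks of covers to size orders and limit waste.
    print(fitted.forecast(steps=14))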
Computer Science Researcher
Laboratoire Bordelais de Recherche en Informatique
January 2014
to February 2017
optimize, in time and space, the computation of Skyline queries in relational databases (see the skyline sketch after this entry)
estimation of the size of the query result
approximate computation of the result
identification of relationships (in particular functional dependencies) between columns
pre-computation and data structures
Multidimensional data analysis and correlation detection
tools: Java, C++, Big Data
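A compact illustration of what a Skyline query returns: the tuples not dominated by any other tuple. This naive quadratic version in Python is for illustration only; the research above targeted optimized algorithms, result-size estimation and pre-computation inside the database.

    def dominates(a, b):
        # a dominates b if it is at least as good on every dimension and strictly
        # better on at least one (here, lower values are preferred).
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def skyline(points):
        # Naive nested-loop skyline: keep the points dominated by no other point.
        return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

    # Hotels as (price, distance to beach): cheaper and closer is better.
    hotels = [(50, 8), (80, 2), (60, 5), (90, 1), (55, 9)]
    print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5), (90, 1)]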
Part-time Lecturer
OpenClassrooms
June 2021
to January 2024
Student Mentoring
Student Project Evaluation
Data Scientist, Machine Learning Engineer, and Data Analyst Programs
Lecturer in Statistics (SPSS)
Université de Bordeaux
September 2015
to September 2016
Introduction to statistics with SPSS
Student assessment
Education
Statistical Engineer
Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI - Rennes)
September 2011
to November 2013
Data processing and analysis; statistical information systems
Skills
Data Science
Data processing
Data analysis
Decision support models
Classification, Clustering
Machine Learning
Data Mining
AI, LLMs
Tools
SAS (certification)
R, SPSS, Matlab, Spad
Scikit-learn, TensorFlow, PyTorch, Keras, MLOps
Tableau, Power BI
AWS (EC2, EBS, SageMaker, OpenSearch, ...)
GCP, Microsoft Azure, Dataiku, Palantir
Jupyter Notebook, JupyterLab, PyCharm
Elasticsearch
Computer Science
Python, Java, C++, C, VBA
HTML5, JavaScript, PHP, CSS
Databases, SQL, NoSQL, PostgreSQL, MySQL, Oracle