Fallou Tall

Lead Data Engineer - Azure, Databricks and Google Cloud Certified

Professional Status
Employed
Open to opportunities
  • Profile and analyze relevant data
  • Develop data processing pipelines
  • Profile and optimize SQL queries
  • Put the data processing pipelines into production
  • Automate quality assurance testing
  • Monitor production status
  • Develop a PySpark library for quality assurance testing
  • Stack: Azure, AWS, Databricks, Python, Snowflake, Spark, SQL
  • Migrate Flume, Pig, Spark 1 and Sqoop workflows to Spark 2
  • Orchestrate workflows on the new cluster with Oozie
  • Migrate Oracle databases to Hive
  • Develop a Spark library to boost data engineers' productivity
  • Stack: Hadoop (Hive, HBase, HDFS, Oozie...), Scala, Spark, Oracle
  • Extract raw QoE and NPS data from HDFS, transform it, and save it to the Hive data warehouse via Spark
  • Explore and prepare QoE data for modeling with PySpark and Pandas
  • Predict customer QoE by machine learning with Scikit-learn
  • Deploy the QoE prediction model as a Flask API and store the results in SQL Server
  • Automate monthly retraining of the QoE prediction model (interval determined experimentally)
  • Extract churn data from Hive, transform it, and save it to the Hive data warehouse via Spark
  • Analyze the correlation between NPS and QoE, and between churn and QoE, with PySpark and Pandas, then visualize the results in a Tableau dashboard
  • Explore and prepare churn data for modeling with Spark
  • Develop customer churn prediction models (base, recharge and data) by machine learning with SparkML
  • Deploy churn prediction models in batch mode via Spark and store results in SQL Server
  • Automate monthly retraining of the model (interval determined experimentally) with Oozie
  • Report traffic alarms and alerts in real time for the dynamic management of Orange sites
  • Predict the outages of these sites by machine learning based on the alarms and alerts data
  • Extract fiber data from Hive, transform it, and save it to the Hive data warehouse via Spark
  • Develop algorithms with SparkML that recommend fiber to customers and recommend areas for fiber deployment to Orange
  • Deploy recommendation models in batch mode via Spark and store results in SQL Server
  • Orchestrate data processing pipelines with Oozie
  • Stack: Hadoop (HDFS, Hive, Oozie), Scala, Spark, SQL Server, Tableau, Python, Scikit-Learn, Flask
  • Set up a microservices architecture composed of Kubernetes, Kafka, Cassandra, Spark and Node.js
  • Extract CDR and probe data from Kafka, then transform and store it in Cassandra via Spark Streaming
  • Develop a model for locating the living and working places of Orange customers in Dakar from CDR data with Spark
  • Develop a model for determining the origin-destination matrix of Orange customers in Dakar from probe data with Spark
  • Validate algorithms with the urban transportation service data and demographic data
  • Extrapolate the results to the entire population of Dakar
  • Predict population movements in Dakar by machine learning with SparkML based on origin-destination data combined with probe data
  • Dockerize and deploy the Spark applications on Kubernetes
  • Stack: Cassandra, Kafka, Kubernetes, Scala, Spark
  • Design and implement the architecture of the application
  • Scrape HR data from the HR platform with Beautiful Soup
  • Store scraped data into Google Cloud Storage
  • Clean and prepare training data for modeling
  • Define the intents and entities then manually create some dialog flows
  • Automatically generate dialog flows with Rasa Interactive
  • Develop an Intent Classification Model with Rasa-NLU and TensorFlow
  • Develop an entity recognition model with Rasa-NLU and spaCy
  • Develop a chatbot response prediction model with Rasa-Core and TensorFlow
  • Dockerize the app, then connect it to the Facebook Messenger API by setting up a webhook
  • Deploy the chatbot on Google Cloud Platform via App Engine Flex
  • Stack: Beautiful Soup, GCP, Messenger API, Python, Rasa-Core, Rasa-NLU, spaCy, TensorFlow
  • Review the literature on job scheduling algorithms
  • Extract Slurm log history from MySQL with Pandas
  • Explore then prepare data for modeling with Pandas
  • Develop a clustering model of applications that run on the system by machine learning with Scikit-Learn
  • Develop a supercomputer user classification model by machine learning with Scikit-Learn
  • Deploy models as REST APIs
  • Develop an energy-efficient job scheduling algorithm in Python based on the prediction of the resource consumption of jobs and their owners
  • Stack: Anaconda, MySQL, Python, Scikit-Learn

Cooperative Master in Mathematical Sciences - Major Big Data

African Institute for Mathematical Sciences (AIMS - Senegal)

August 2016 to February 2018

Bachelor in Applied Mathematics

Cheikh Anta Diop University (UCAD - Senegal)

October 2009 to July 2013

Deep Learning Specialization

deeplearning.ai

April 2017 to December 2017
  • Apache Hadoop
    Advanced
  • Apache Spark
    Advanced
  • Cloudera/Hortonworks
    Good
  • Kafka
    Good
  • AWS
    Advanced
  • Azure
    Advanced
  • Databricks
    Advanced
  • GCP
    Advanced
  • Snowflake
    Advanced
  • Machine Learning
    Advanced
  • Deep Learning
    Good
  • Statistics
    Advanced
  • Data Structures & Algorithms
    Advanced
  • Problem Solving
    Good
  • English
    Advanced
  • French
    Expert
  • Wolof
    Expert
  • Python
    Advanced
  • R
    Good
  • Scala
    Advanced
  • SQL
    Advanced
  • Agile methodology
    Advanced
  • Atlassian
    Advanced
  • GitHub
    Advanced
Certifications

AWS Certified Data Analytics - Specialty (ongoing)

- April 2022

Databricks Developer Essentials

- January 2022

Databricks Certified Associate Developer for Apache Spark

- January 2022

Databricks Certified Associate Developer for Apache Spark

- October 2021

Azure Data Engineer Associate - MCID: 991749803

- October 2021

Google Cloud Professional Data Engineer

- December 2020