Saif Shabou

Building data products for a better planet

The World Resources Institute

OpenGeoScales

Biography

Data science | Data Product Management | Open Data | Geo-statistics | GIS | Machine Learning | Artificial Intelligence

Hello!

I am a Data & Product Specialist.

I combine data science skills with data architecture and engineering knowledge and data product management experience.

My goal is to help build robust and impactful data products.

My work focuses on geospatial topics: human mobility, environmental risks, spatio-temporal dynamics, smart buildings, urban planning…

I am a continuous learner and an eternal student. On this blog, I share some of my personal data projects and tutorials.

One of my favorite quotes is from Carl Friedrich Gauss: “It is not knowledge, but the act of learning which grants the greatest enjoyment. When I have clarified and exhausted a subject, then I turn away from it, in order to go into darkness again …”

My technical stack is:

  • Data Science: R, Python, SQL
  • Machine Learning: Keras, TensorFlow, MLflow
  • IDE: RStudio, PyCharm, Jupyter
  • Product Management: Slack, Azure Boards, Jira, Trello, GitHub Projects
  • DevOps: GitHub, Azure, Google Cloud Platform
  • Reporting: Shiny, Dash, Plotly, leaflet, Tableau, Power BI
  • Database: SQL Server, PostgreSQL, Cassandra, Hive, MongoDB, Bigtable
  • App Deployment: Heroku, Google App Engine
  • CI/CD: Gitlab, Azure DevOps
  • GIS: QGIS, sp, shapely, spatstat

Interests

  • Data Product management
  • Data science
  • Geo-Analytics
  • Climate & Environmental science

Education

  • PhD in Geo-science, 2016

    University of Grenoble

  • MEng in Environmental Psychology, 2013

    University of Nîmes

  • MEng in Architecture and Urban Planning, 2011

    School of Architecture and Urban Planning

Skills

R

Expert

Product Management

Senior

Python

Intermediate

Database (SQL, NoSQL)

Senior

Data Science

Senior

Music

Expert

Experience

Data & Analytics Associate

The World Resources Institute

Jul 2021 – Present The Hague

Use geospatial data and develop data products to help cities adapt to climate change, reduce exposure to climate hazards, effectively manage their natural resources and reduce their greenhouse gas emissions.

Responsibilities include:

  • Research, design, prototype, and deploy scalable data products and platforms that provide insights into climate risk impacts, climate actions, biodiversity monitoring, and urban development challenges.
  • Guide the development, maintenance, and updates of multiple data products and tools.

Founder - President

OpenGeoScales

Jan 2020 – Present Paris

Responsibilities include:

  • Build an open-source project for standardizing open climate data at multiple geospatial scales.
  • Build and manage a community of 30 volunteer data specialists
  • Data Product management (roadmap, issues, tasks…)
  • Project description

Product Owner Data Science

Freelance

Jan 2020 – Present Paris

Responsibilities include:

  • Supporting companies in the design, development, and deployment of data and AI products
  • Working with data architects, scientists, and engineers to design robust data pipelines: integration, storage, analytics, and visualization
  • Specifying and documenting data product features
  • Feeding the backlog with prioritized features and user stories, and managing the product roadmap

Product Owner - Chief Data Scientist

OpenfieldLive

Aug 2017 – Oct 2019 Paris

Responsibilities include:

  • Leading the data science team and responsible for data analytics products
  • Deployment of a Location Intelligence and Smart Building Analytics solution providing our clients with insights about building usage and occupancy, visitor behavior, and environmental quality
  • Design and development of a data science product based on big data technologies: real-time data collection and profiling, data catalog management, automated data cleaning, spatio-temporal modeling and aggregation, KPI geo-visualization…
  • Defining product roadmap and strategy, prioritizing features based on client needs, managing the backlog, defining user stories, and sharing the product vision
  • Technical and functional documentation and communication about our product features

Data Scientist

OpenfieldLive

Feb 2017 – Aug 2017 Paris

Responsibilities include:

  • Develop several data science POCs and produce statistical analyses and reports that help our clients extract reliable insights from their raw geospatial data
  • Anomaly and noise detection in Wi-Fi geolocation data
  • Trajectory interpolation: map matching, k-d tree, Dijkstra shortest-path implementation, graph theory
  • Trajectory aggregation and clustering: sequence mining, Dynamic Time Warping implementation

Data scientist

CNRS

Jun 2016 – Feb 2017 Grenoble

Responsibilities include:

  • Participation in the European research project “ANYWHERE” (EnhANcing emergencY management and response to extreme WeatHER and climate Events), funded within the EU’s Horizon 2020 research and innovation program: http://anywhere-h2020.eu/
  • Statistical analysis of population exposure to natural hazards

Publications:

Data scientist

University of Grenoble

May 2013 – May 2016 Grenoble

Responsibilities include:

  • Data management: collecting socio-environmental data and managing a geospatial database: national census data, national mobility and traffic data, social interviews, road and hydrographic networks, hydrological simulation outputs, weather forecast data, precipitation, weather alerting data…
  • Statistical analysis: descriptive statistics, data cleaning and anomaly detection, time series analysis, pattern recognition, spatio-temporal aggregation, text mining, sequence mining, graph analysis, geo-visualization, geo-statistics, probability propagation, inference systems
  • Development of a mobility simulation model in R: agent-based simulation, activity-based modeling, discrete-event simulation, regression trees, machine learning
  • Development of a model of decision-making under risk and uncertainty: risk modeling, decision theory, Bayesian networks, probabilistic models

Scientific papers:

Recent Posts

Data Science Product Owner

With the increasing number of Data Science and AI-based products, we need to think more about how work is organized within data teams. Inspired by Agile frameworks and DevOps approaches, new Data and Machine Learning development workflows are being implemented (DataOps, MLOps…).

Lost with Open Data

The open data movement has always fascinated me. Every time I visit open data portals (governments, regions, cities, public institutions…), I am captivated by the richness of the content and the effort made to gather, organize, and publish all this information.

Lost in Hypertext

I am a wanderer in the world of hypertext, a specialist in drifting in Wikipedia pages. I have spent a large part of my life reading and learning. I read so much that I forgot what I read.

Image classification with Keras

Image classification with Keras

Projects

Art Painting

Art painting work.

Carbon Disclosure Project - Kaggle Competition

A data analysis report providing a methodology for computing a climate adaptation index for cities by combining indicators of city readiness and vulnerability to climate change impacts.

French Open Data Report

An overview of available French open data providers, platforms, tools and analysis.

Natural Language Processing Tutorial

Natural Language Processing: word embeddings, tokenization, text classification, sentiment analysis, topic modeling.

Open Data Book

An overview of available open data providers, platforms, tools and analysis at different scales (worldwide, Europe, and France) and across themes (environment, economy, demography…).

Open Geo KPI

A Shiny web app providing geo-visualization of world development KPIs and statistics based on various open data providers (World Bank, Eurostat, United Nations…).

Time Series Analysis

An overview of time series analysis methods, including modeling approaches, forecasting algorithms and outlier detection processes.

Computer vision tutorial with Keras and R

Computer vision tutorial with Keras and R: deep learning, Keras, image classification.

Remote Sensing Images Analysis

Satellite imagery analysis: image processing, index generation, image segmentation, and image classification.

GeoScales

GeoScales is a tool that provides easy, unified access to geospatial open data available at different scales.

MobRisk

MobRISK: a model for assessing the exposure of road users to flash flood events.

Contact