
Learn about data science in real life and machine learning in production

Issue 16: A story of distributions

Jun 24, 2021 The normal distribution is well known, but it is not the only one. ...

Designing data intensive applications (reading)

Jun 01, 2021 I've just finished a book about designing data intensive applications. ...

Issue 14: Market yourself, ACID transactions in DeltaLake, OpenSource and Bayesian ABTests

May 21, 2021 Market yourself, ACID transactions in DeltaLake, OpenSource and Bayesian ABTests ...

Contributing to open source is not what I thought it was

May 21, 2021 Open source can be many things. ...

Overview of Bayesian AB Tests

May 18, 2021 A memo on some principles regarding Bayesian AB Tests. ...

Issue 13: Kubernetes and DevOps concepts

May 08, 2021 Kubernetes is a tool, and the concepts around it are interesting: zero downtime deployment, container orchestration, scaling, etc. They help you deploy safely and build solid ML systems, so they are worth understanding. This week, I would like to highlight some of them through different resources. ...

Kubernetes - Be sure of being ready for real zero downtime deployment

May 05, 2021 Readiness is an important concept in Kubernetes that keeps you from getting temporary errors when deploying. It is key to achieving real zero downtime deployment. ...

Issue 12: Bayesian AB Tests

Apr 20, 2021 Recently, for an R&D project, I had to implement Bayesian AB tests. As AB tests are key to developing safely and with confidence, I decided to present what I've learnt so far. I focused my research on the PyMC3 library. ...

Issue 11: Machine Learning Design Patterns

Mar 25, 2021 ML design patterns for your ML journey. ...

Issue 10: Challenges in Deploying Machine Learning

Mar 15, 2021 I'm glad to see more and more papers about deploying Machine Learning. The challenges are numerous and show up everywhere. Let's dive a bit into these challenges and the resources that discuss them. ...

Different ways to tackle the data labelling bottleneck in machine learning

Mar 09, 2021 Data are the food of machine learning training. There are more and more data every day, but most of the time these data are unlabelled. Labelling them manually is expensive and boring. There are different ways to tackle this problem. Active learning: active learning optimises labelling by selecting the data that must be labelled. The system requests manual labelling for the cases it identifies, and which cases those are depends on the strategy you choose. I will cover only two of these strategies. ...

Issue 9: Code standardization, container orchestration, lakehouse, cats: concepts needed to productionalize machine learning models

Mar 01, 2021 Code standardisation with Pylint, container orchestration with Kubernetes, lakehouse with Delta Lake, working with cats <3: these are some of the concepts that can be useful to productionalize machine learning. This is what I came across recently and found interesting. ...