As a data scientist, data engineer, or machine learning engineer, you sometimes have to deal with Kubernetes, this strange tool to orchestrate containers. That is why it is the main subject of this issue.
You may run into Kubernetes when serving a REST API for scoring purposes. You can also use Kubernetes as your resource manager for Spark.
I also included an article about the past, present and future of Spark, and a course about applied machine learning in production.
What a wonderful way to explain Kubernetes! You will understand the basic concepts through a comic strip. It may be a bit far-fetched, but it is funny and helpful.
It is aimed at beginners.
Solve real problems and enhance your skills with browser-based, hands-on labs, without any downloads or configuration.
A good and easy way to start. It is also aimed at beginners.
A video about Apache Spark on Kubernetes.
The big news is that community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements in the Spark 3.0 release.
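To make "Kubernetes as a scheduler for Spark" a bit more concrete, here is a minimal sketch of how a `spark-submit` command targeting a Kubernetes cluster could be assembled. The helper name `build_spark_submit_args`, the API server URL, the image name and the application path are all hypothetical placeholders; only the `k8s://` master prefix, `--deploy-mode cluster`, and the two `spark.*` configuration keys come from Spark itself.

```python
from typing import List


def build_spark_submit_args(
    api_server: str,
    image: str,
    app_path: str,
    executors: int = 2,
) -> List[str]:
    """Assemble spark-submit arguments for a cluster-mode run on Kubernetes.

    All argument values are illustrative assumptions, not a recommended setup.
    """
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to use Kubernetes instead of YARN.
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        # Number of executor pods to request.
        "--conf", f"spark.executor.instances={executors}",
        # Container image used for the driver and executor pods.
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,
    ]


if __name__ == "__main__":
    args = build_spark_submit_args(
        api_server="https://kubernetes.example.com:6443",  # hypothetical
        image="example.com/spark-py:latest",               # hypothetical
        app_path="local:///opt/spark/app/job.py",          # hypothetical
    )
    print(" ".join(args))
```

The list can be passed to `subprocess.run` on a machine where Spark is installed and that has access to the cluster. A real deployment usually needs more settings, such as a namespace and a service account.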
A thought-provoking article.
In summary:
Spark filled a void that isn’t really there anymore.
There was a clear need to do huge in-memory calculations over multiple machines. Single machines were limited in RAM and the only viable option to do cluster-scale compute was Hadoop, which was based on MapReduce.
But honestly, is that void still there? These days, we all shop our infrastructure together in the cloud.
Many data scientists prefer to stick with Python and its rich ecosystem of ML libraries.
This course is not finished yet, but it is pleasant to see more and more resources about applied machine learning in production.
Be a problem solver, not a model fitter.
Thank you for reading. Feel free to contact me on Twitter if you want to discuss machine learning in real life.