Reproducibility and Automation of Machine Learning Process
We’re always happy to engage with professional communities on topics we’re passionate about. This time, our Data Science and Machine Learning consulting expert Denis Dus spoke at PyCon Belarus’17, an annual international conference that brings together the Python community in Minsk. At the event, Denis covered the topic of Reproducibility and Automation of Machine Learning Process.
In his speech, he explained basic design concepts for automation of iterative processes in machine learning and shared his experience of building data pipelines within one of his projects.
Photo Credit: PyCon Belarus
Automating the machine learning process does not eliminate the data science expert. It frees the expert to focus on understanding the business problem, improving the model, and explaining results, which are the true value drivers for the business.
Data scientists typically spend up to 80% of their time on data engineering tasks such as data extraction, cleaning, transformation, normalization, and feature extraction, leaving only 20% for modeling. Denis recommends considering automation if you repeatedly need to extract, clean, and transform data, if you want to update models on a regular basis, or if you want to make data science experiments easier to reproduce.
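To make the idea concrete, here is a minimal sketch of what an automated, repeatable data preparation pipeline could look like in plain Python. It is not taken from Denis's talk or project: the step names (extract, clean, transform, normalize), the sample records, and the run_pipeline helper are all hypothetical and chosen only for illustration. The point is that once each step is a small function, the whole sequence can be re-run on demand and will produce the same features from the same input.

```python
from functools import reduce
from statistics import mean, pstdev


def extract():
    # Hypothetical raw records; a real pipeline would read from a database or file.
    return [
        {"age": "34", "income": "52000"},
        {"age": None, "income": "61000"},
        {"age": "45", "income": "58000"},
    ]


def clean(rows):
    # Drop records that contain missing values.
    return [r for r in rows if all(v is not None for v in r.values())]


def transform(rows):
    # Convert string fields to floats.
    return [{k: float(v) for k, v in r.items()} for r in rows]


def normalize(rows):
    # Z-score each column using the population standard deviation.
    columns = list(rows[0].keys())
    stats = {}
    for c in columns:
        values = [r[c] for r in rows]
        stats[c] = (mean(values), pstdev(values))
    return [
        {c: (r[c] - stats[c][0]) / stats[c][1] if stats[c][1] else 0.0 for c in columns}
        for r in rows
    ]


def run_pipeline(source, steps):
    # Feed the output of each step into the next one, in order.
    return reduce(lambda data, step: step(data), steps, source())


features = run_pipeline(extract, [clean, transform, normalize])
```

Because the steps are ordinary functions with no hidden state, the same call can be scheduled to refresh a model regularly or repeated later to reproduce an experiment.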
Check out the slides below for more details.