No-Code, Low-Code Machine Learning Platforms Still Require People

Code-free (horizontal) machine learning platforms are useful for extending data science across the enterprise. However, as many organizations are now discovering, data science can go wrong in many ways when it is applied to new problems. Zillow lost hundreds of millions of dollars buying homes based on a flawed data-driven model for estimating home values. Data-driven HR technology, especially when based on facial recognition software, has been shown to bias hiring decisions against protected groups.

While automation is a great tool to have in your arsenal, you need to think through the challenges before adopting a horizontal ML platform. To be robust and keep adding value over time, these platforms must be flexible, configurable, and easy to monitor. They need to let users weight data flexibly and under their own control. They also need data visualization tools for detecting outliers and sources of noise. Finally, they need automated monitors that track model parameters and input data and alert users when either one deviates. In short, algorithms have not yet advanced to the point where they outperform human intelligence.
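
The following is one way to picture user-controlled weighting combined with outlier detection. It is a minimal Python sketch that assumes a simple one-dimensional series. It flags outliers with a median/MAD "robust z-score" and then lets the user decide how much weight flagged points should carry. The function names, the 3.5 cutoff, and the 0.1 outlier weight are illustrative assumptions, not features of any particular platform.

```python
import statistics

def robust_z_scores(values):
    """Median/MAD-based z-scores, which are less distorted by the very outliers they are meant to find."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]  # no spread: nothing can be called an outlier
    # 1.4826 rescales the MAD to match the standard deviation for normally distributed data
    return [(v - med) / (1.4826 * mad) for v in values]

def user_weights(values, threshold=3.5, outlier_weight=0.1):
    """Weight 1.0 for typical points, a user-chosen weight for flagged outliers (both settings are hypothetical)."""
    return [outlier_weight if abs(z) > threshold else 1.0 for z in robust_z_scores(values)]

daily_sales = [102, 98, 105, 101, 99, 97, 480, 103]  # illustrative data with one suspicious spike
print(user_weights(daily_sales))
```

Setting `outlier_weight` to 0.0 drops flagged points entirely. A value between 0 and 1 keeps them but limits their influence. Which choice is right depends on whether a human decides the spike is an error or a real event.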

So don’t be fooled by the promise of artificial intelligence, machine learning, or low-code tools: you still need people. Let’s take a closer look at why.

Machines learn from people

Attempting to replace human data scientists, domain experts, and engineers with automation is a hit-or-miss proposition that could lead to disaster if applied to mission-critical decision-making systems. Why? Because humans understand data in ways that automated systems still struggle with.

Only humans can reliably distinguish between data errors and genuinely unusual data (e.g., trading halts, downtime, or GME in February). Humans can also connect unusual data patterns to real-world events (e.g., 9/11, COVID, financial crises, elections), and we understand the impact of calendar events such as holidays. Depending on the input data and the quantity being predicted, it may be difficult for machine learning algorithms to discover the semantics of the data. And there is no need to force algorithms to uncover these relationships when they are not hidden from the people who work with the data.
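
A domain expert's knowledge of real-world events can be handed to an automated pipeline in a simple form: an event calendar. The minimal sketch below uses one to triage detected anomalies. It assumes an upstream detector has already flagged some dates. The calendar entries, the flagged dates, and the `triage_anomalies` helper are hypothetical examples, not part of any real system.

```python
from datetime import date

# Hypothetical event calendar maintained by a domain expert. In practice it might
# list holidays, outages, market events, elections, and similar real-world context.
KNOWN_EVENTS = {
    date(2023, 12, 25): "Christmas Day",
    date(2024, 1, 1): "New Year's Day",
}

def triage_anomalies(anomalous_dates, known_events):
    """Split anomalies into those explained by a known event and those that
    may be data errors and still need a human to investigate."""
    explained = {d: known_events[d] for d in anomalous_dates if d in known_events}
    unexplained = [d for d in anomalous_dates if d not in known_events]
    return explained, unexplained

# Dates flagged by some upstream anomaly detector (illustrative values).
flagged = [date(2023, 12, 25), date(2024, 1, 17)]
explained, unexplained = triage_anomalies(flagged, KNOWN_EVENTS)
print(explained)    # the holiday is explained by the calendar
print(unexplained)  # 2024-01-17 still needs a human to look at it
```

The code only does bookkeeping. The value comes from a person deciding what belongs in the calendar.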

Aside from semantics, the hardest part of data science is distinguishing between results that are statistically good and results that are useful. It’s easy to use estimation statistics to convince yourself that you have good results, or that a new model performs better than the old one, when in reality neither model is useful for solving a real-world problem. And even with valid statistical methodology, interpreting modeling results still requires human judgment.
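
A statistic can look excellent while the model remains useless, and a naive baseline comparison is one way to catch this. The minimal sketch below assumes a one-step-ahead forecasting setup. It compares a correlation score with a skill score measured against a "tomorrow equals today" baseline. The data and the choice of baseline are illustrative assumptions. In a real project, a domain expert would pick the baseline that matters.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def skill_vs_naive(actual, predicted):
    """1 - MAE(model) / MAE(naive), where the naive forecast for step t is the
    observed value at t-1. Positive means the model beats the baseline."""
    target = actual[1:]
    naive = actual[:-1]
    model = predicted[1:]
    return 1 - mae(target, model) / mae(target, naive)

actual = [10, 11, 12, 13, 14, 15]
predicted = [a + 5 for a in actual]  # tracks the trend perfectly, but is biased by +5
print(pearson(actual, predicted))         # close to 1.0: looks excellent
print(skill_vs_naive(actual, predicted))  # -4.0: far worse than the naive baseline
```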

When developing a model, you face questions about which estimation statistics to measure, how to weight them, how to evaluate them over time, and which outcomes matter. Then there’s the whole problem of over-testing: if you test repeatedly on the same set of data, you eventually “learn” your test data, and your test results become overly optimistic. Finally, you have to combine all these statistics into a simulation methodology that reflects what is achievable in the real world. Keep in mind, too, that success on one problem does not carry over automatically. A machine learning platform that solves one modeling and forecasting problem will not necessarily succeed when the same process is repeated on a different problem in the same field or in a different sector.
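
Two common guards against over-testing are walk-forward validation and a final holdout that is scored only once. The minimal sketch below combines them for time-ordered data. It assumes observations are indexed chronologically. The 20% holdout fraction, the fold count, and the minimum training size are illustrative assumptions, not recommendations from the article.

```python
def split_final_holdout(n_samples, holdout_fraction=0.2):
    """Reserve the most recent observations as a final holdout that is scored
    once, at the end, instead of being reused during model development."""
    n_holdout = max(1, int(n_samples * holdout_fraction))
    cut = n_samples - n_holdout
    return list(range(cut)), list(range(cut, n_samples))

def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train, test) index lists for expanding-window validation.
    Each test fold lies strictly after its training window."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield list(range(train_end)), list(range(train_end, train_end + fold_size))

dev, holdout = split_final_holdout(20)
for train, test in walk_forward_splits(len(dev), n_folds=3, min_train=7):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
print(f"final holdout {holdout[0]}..{holdout[-1]} (scored once)")
```

Walk-forward folds show how the statistics behave over time. Keeping the holdout out of every development decision gives at least one estimate that has not been "learned". It does not remove the need for judgment about which statistics matter.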

There are many choices that must be made at each step of the data science research, development, and deployment process. You need experienced data scientists to design experiments, domain experts to understand boundary conditions and data nuances, and production engineers who understand how to deploy models in the real world.

Visualization is the jewel of data science

In addition to weighting and modeling data, data scientists also benefit from data visualization, a highly manual process that is more art than science. Plotting the raw data, the correlations between the data and the predicted quantities, and the time series of transactions generated by the model’s estimates can produce insights that feed back into the model-building process.

You might notice periodicity in the data, perhaps a weekday effect or anomalous behavior around holidays. You may discover extreme moves in transactions indicating that outliers are not being handled well by your learning algorithms. You may also notice different behavior across subsets of your data, which suggests that modeling those subsets separately could produce more accurate models. Again, unsupervised learning algorithms can be used to try to detect some of these hidden patterns. But a human may be better placed to spot these patterns and feed the resulting insights back into the model-building process.
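
A weekday effect is often the first thing a plot reveals, and the same check can be done numerically. The minimal sketch below groups a daily series by day of week and averages each group. It assumes a gap-free daily series with a known start date. The data is illustrative, and interpreting the profile is still left to a person.

```python
import statistics
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_profile(start, values):
    """Mean of a daily series grouped by day of week (start = date of values[0])."""
    buckets = defaultdict(list)
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        buckets[WEEKDAYS[day.weekday()]].append(value)
    # Keep Monday-to-Sunday order for readability
    return {name: statistics.mean(buckets[name]) for name in WEEKDAYS if name in buckets}

# Two weeks of illustrative data starting on Monday 2024-01-01, with quieter weekends.
daily = [100, 102, 101, 99, 103, 60, 58] * 2
profile = weekday_profile(date(2024, 1, 1), daily)
print(profile)
```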

Horizontal ML platforms need to be monitored

Another important role people play in deploying machine-learning-based AI systems is model monitoring. Depending on the type of model, what it predicts, and how its predictions are used in production, different aspects of the model must be monitored. That way, deviations in behavior are tracked and problems can be anticipated before they degrade real-world performance.
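
One simple production monitor tracks a rolling error and raises an alert when it drifts well above the level seen during validation. The minimal sketch below does this. It assumes actual outcomes arrive after each prediction. The `ErrorMonitor` class, the window length, and the 1.5x tolerance are hypothetical choices that a person would need to tune for the model and task.

```python
from collections import deque

class ErrorMonitor:
    """Track a rolling mean absolute error and flag when it exceeds
    tolerance * reference_mae (reference_mae would come from validation)."""

    def __init__(self, reference_mae, window=5, tolerance=1.5):
        self.reference_mae = reference_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # keeps only the most recent errors

    def update(self, actual, predicted):
        """Record one outcome; return True if the rolling error breaches the threshold."""
        self.errors.append(abs(actual - predicted))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough observations yet to judge
        rolling_mae = sum(self.errors) / len(self.errors)
        return rolling_mae > self.tolerance * self.reference_mae

monitor = ErrorMonitor(reference_mae=1.0, window=3, tolerance=1.5)
stream = [(10, 9), (10, 11), (10, 9), (10, 13)]  # illustrative (actual, predicted) pairs
print([monitor.update(a, p) for a, p in stream])
```

An alert from a monitor like this is a prompt for a human to investigate. It is not an automatic verdict that the model is broken.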

If models are retrained regularly on newer data, it is important to check that the new data entering the training process is consistent with the data used previously. If production tools are updated with new models trained on newer data, it is important to check that the new models are as similar to the old ones as expected. How much similarity to expect depends on the model and the task.
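
The two checks above can be sketched with the Population Stability Index (PSI) and a simple prediction-agreement rate. PSI compares new training data with old training data over shared bins. The agreement rate compares old and new models on the same inputs. This is a minimal sketch, not the article's method. The bin edges, the tolerance, and the data are illustrative assumptions. A commonly quoted rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift, but that is an industry convention, not a universal standard.

```python
import math
from bisect import bisect_right

def bin_proportions(sample, edges, floor=1e-6):
    """Share of the sample in each bin defined by sorted `edges` (len(edges) + 1 bins).
    A small floor keeps empty bins from producing log(0)."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect_right(edges, x)] += 1
    return [max(c / len(sample), floor) for c in counts]

def population_stability_index(reference, current, edges):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    ref = bin_proportions(reference, edges)
    cur = bin_proportions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

def prediction_agreement(old_preds, new_preds, tol):
    """Fraction of shared inputs on which old and new models agree within `tol`."""
    close = sum(1 for a, b in zip(old_preds, new_preds) if abs(a - b) <= tol)
    return close / len(old_preds)

last_month = [1, 2, 3, 4, 5, 6, 7, 8]
this_month = [7, 8, 7, 8, 7, 8, 5, 6]
edges = [2.5, 4.5, 6.5]  # hypothetical bin edges; often taken from reference quantiles
print(population_stability_index(last_month, last_month, edges))  # 0.0: identical data
print(population_stability_index(last_month, this_month, edges))  # large: distribution shifted

old = [1.0, 2.0, 3.0, 4.0]
new = [1.05, 2.0, 3.5, 4.02]
print(prediction_agreement(old, new, tol=0.1))  # 0.75
```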

There are clearly tremendous benefits to applying automation to a wide range of problems across many industries, but human intelligence remains central to these developments. You can automate human work to some extent, and in controlled environments you can replicate its power and performance with ML-based, no-code AI systems. But machines still rely heavily on humans, so never forget the power of people.
