
Common AI/ML Performance Bottlenecks

By Bryan Gilcrease & Paul Welch | Posted on November 11, 2020 | Posted in Featured, Security & Governance

Data is just one of the common bottlenecks in artificial intelligence (AI) and machine learning (ML) projects. An even bigger hurdle, and one that trips up most enterprises, is actually putting AI and ML models into production.

In fact, as we’ve talked about before, an estimated 90% of ML models never make it into production. One of the major reasons is a disconnect between the data scientists who build models and the IT teams tasked with deploying them.

If your organization is already hitting bottlenecks in its AI/ML projects, or you want to avoid them in the future, here are three areas to focus on:

Your horsepower

One of the top trends in AI/ML performance is the use of dedicated, high-performance hardware such as GPUs, paired with software libraries built specifically for ML and deep learning.

What makes GPUs interesting in the AI/ML space is that they were originally designed to accelerate rendering graphics on screens. As it turns out, much of the math behind rendering graphics, chiefly huge numbers of matrix and vector operations carried out in parallel, is the same math required to train ML models.
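To make that overlap concrete: rotating the vertices of a shape on screen and running one layer of a neural network both reduce to matrix multiplication, the kind of parallel arithmetic GPUs are built for. The minimal sketch below uses plain Python with no GPU involved, and the matrices are made-up illustrative values; GPU libraries run the same math across many cores at once.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Graphics: rotate three 2D vertices by 90 degrees. Each column is one vertex (x, y).
theta = math.pi / 2
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
vertices = [[1.0, 0.0, 1.0],   # x coordinates
            [0.0, 1.0, 1.0]]   # y coordinates
rotated = matmul(rotation, vertices)

# ML: forward pass of one dense layer, ReLU(inputs x weights), for a batch of 2 examples.
inputs = [[1.0, 2.0],
          [3.0, 4.0]]
weights = [[0.5, -1.0, 0.25],   # hypothetical weights mapping 2 features to 3 units
           [0.5, 0.5, 0.25]]
activations = [[max(0.0, v) for v in row] for row in matmul(inputs, weights)]

print(activations)  # [[1.5, 0.0, 0.75], [3.5, 0.0, 1.75]]
```

The same `matmul` function serves both cases, which is the point: hardware that is fast at one is fast at the other.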

By using GPUs, or other dedicated hardware such as FPGAs or TPUs, you can offload much of the heaviest processing that would otherwise run on the CPU. This can greatly speed up training, allowing you to build models on much larger datasets and iterate more quickly, both critical to making AI and ML effective.
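In frameworks such as PyTorch, offloading is largely a matter of choosing the device the tensors live on. The sketch below assumes PyTorch with CUDA (NVIDIA GPUs) as the accelerator; the helper names `torch_available`, `pick_device`, and `matmul_offloaded` are hypothetical, and the code falls back to plain Python on the CPU so it still runs where PyTorch is not installed.

```python
import importlib.util


def torch_available() -> bool:
    """True if PyTorch can be imported in this environment."""
    return importlib.util.find_spec("torch") is not None


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    if not torch_available():
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


def matmul_offloaded(a, b):
    """Multiply two matrices (lists of lists), on the GPU when one is available.

    Returns (result_as_nested_lists, device_used).
    """
    device = pick_device()
    if torch_available():
        import torch
        ta = torch.tensor(a, dtype=torch.float32, device=device)
        tb = torch.tensor(b, dtype=torch.float32, device=device)
        # The multiply runs on `device`; .cpu() copies the result back to host memory.
        return (ta @ tb).cpu().tolist(), device
    # Fallback: plain Python on the CPU, so the sketch still runs without PyTorch.
    cols = list(zip(*b))
    result = [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
    return result, device


if __name__ == "__main__":
    product, used = matmul_offloaded([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(used, product)
```

In a real training loop the same idea applies to the model and to each batch of data, which are typically moved to the chosen device with `.to(device)`.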

Other trends worth watching are the growing use of all-flash storage and high-speed, low-latency networking, which help feed larger datasets to that hardware quickly enough to keep it busy, and the rise of high-level, more user-friendly frameworks such as PyTorch and TensorFlow.

Your tools

No amount of processing power can make up for a lack of communication and partnership between data science and IT.

That’s where orchestration tools like Kubeflow come in.

Kubeflow is not a single product but a collection of tools that work together to make developing and deploying ML models on Kubernetes easier and more efficient. Its components include:

  • Jupyter notebooks for experimentation and sharing
  • Katib for tuning hyperparameters on Kubernetes
  • Kubeflow Pipelines for building and deploying portable, container-based ML workflows
  • Metadata for tracking information about workflows and the artifacts they produce
  • KFServing for serverless model inference, serving trained models on Kubernetes
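Katib automates hyperparameter tuning: it runs many training trials with different settings and reports the best ones. The sketch below illustrates only the idea behind one of the search strategies Katib offers, random search, in plain local Python. It is not Katib's API, and `objective` is a hypothetical stand-in for training a model and returning its validation loss; Katib's value lies in running such trials in parallel as Kubernetes workloads.

```python
import random


def objective(learning_rate: float, num_layers: int) -> float:
    """Hypothetical stand-in for "train a model, return its validation loss".

    Lowest near learning_rate=0.01 and num_layers=3.
    """
    return (learning_rate - 0.01) ** 2 * 1e4 + abs(num_layers - 3)


def random_search(trials: int, seed: int = 0):
    """Sample `trials` hyperparameter sets at random; return (best_loss, best_params)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_loss, best_params = float("inf"), None
    for _ in range(trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "num_layers": rng.randint(1, 6),
        }
        loss = objective(**params)  # with Katib, each evaluation would be its own trial
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params


if __name__ == "__main__":
    loss, params = random_search(trials=50)
    print(f"best loss {loss:.4f} with {params}")
```

Random search is a simple baseline; because each trial is independent, it parallelizes trivially, which is exactly what a Kubernetes-based tuner can exploit.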

Kubeflow is still an evolving project, but it already shows a lot of promise for speeding up innovation in the AI/ML space.

Your goals

This last area is probably the most critical to focus on: without a clear understanding of what you’re specifically trying to achieve with AI and ML, any effort you put toward those capabilities will eventually hit a dead end.

Are you hoping to use AI to improve customer service? Deploy ML models that flag potential fraud? Reduce the repetitive tasks your team members handle so they can focus on more productive work? All of the above?

Until you know what you’re working toward, and have broad agreement on it across your organization, your AI/ML projects will likely go nowhere.


To learn more about AI, ML, and how you can get your organization up and running with both, contact one of our experts.