Machine Learning Workloads: On-Premises vs. the Cloud

By Paul Welch | Posted on May 21, 2020 | Posted in AI/ML, Enterprise Infrastructure

Unlocking the very real benefits of machine learning (ML) comes at a cost, with high-performance GPUs and storage being at the top of the list of expenses.

While there’s no question experimenting with ML in the cloud is a great way to start, most organizations will eventually face the question of whether using a cloud provider or hosting ML workloads on-premises makes more sense.

The answer to this question often depends on some key business factors, including:

  • The resources you have available
  • Where your data currently resides
  • Your number of data science engineers and whether that number is expected to grow

Most of all, whether you go on-premises or with the cloud depends on the almighty dollar: how much you’re willing to invest both upfront and on an ongoing basis, your predicted revenue opportunities from utilizing ML, how much your investment will need to increase as you scale, and so on.

To help you better understand the differences in cost when it comes to ML workloads on-premises versus the cloud, let’s look at an estimated investment breakdown for each platform over three years.

Before we get into the numbers, a quick word on how we made our calculations. For both the on-premises and cloud platforms, we:

  • Compared GPU server instances only
  • Excluded operating expenses (power, space, cooling, administration, etc.)
  • Used a target configuration of 8 x NVIDIA V100 GPUs, plus high memory and local SSD storage
  • Compared the lowest-cost option for public clouds
  • Compared the public cloud server configuration closest to the hardware spec

With those boundaries in place, we estimate that over a three-year lifecycle, average GPU server instance costs would break down as follows:

$194,444

Physical Server Hardware
(On-Premises) 

vs.

$324,087

Public Clouds (AWS, Azure, GCP) 
(about 67% more expensive)
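To make the comparison easy to check, here is a minimal Python sketch that works the two totals through. It uses only the two three-year figures quoted above; the function and variable names are ours, for illustration. Worked through, the cloud total comes out roughly 67% above the on-premises total, which is the same as saying on-premises comes to about 60% of the cloud figure.

```python
# Three-year GPU server instance estimates quoted above.
ON_PREM_3YR = 194_444
CLOUD_3YR = 324_087


def cost_gap(on_prem: float, cloud: float) -> dict:
    """Return the absolute gap and both ways of stating it as a percentage."""
    gap = cloud - on_prem
    return {
        "gap_usd": gap,
        # How much more the cloud costs, relative to the on-premises total.
        "cloud_premium_pct": 100 * gap / on_prem,
        # How much on-premises saves, relative to the cloud total.
        "on_prem_savings_pct": 100 * gap / cloud,
    }


result = cost_gap(ON_PREM_3YR, CLOUD_3YR)
print(f"Gap: ${result['gap_usd']:,.0f}")
print(f"Cloud premium over on-premises: {result['cloud_premium_pct']:.1f}%")
print(f"On-premises savings vs. cloud: {result['on_prem_savings_pct']:.1f}%")
```

The two percentages differ only in which total serves as the baseline, so it helps to state the baseline whenever quoting either one.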

 

To sum up

While the above numbers are not precise—and the gap between on-premises and the cloud certainly narrows when you add in operational costs—they are at least in the ballpark for what you can expect to pay for GPU server instances over three years.
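To get a feel for how quickly operational costs can close that gap, the sketch below computes the annual on-premises operating cost at which the two three-year totals would be equal. It rests on a simplifying assumption of ours, not on anything measured: that power, space, cooling, and administration fall only on the on-premises side. The sample figure of $20,000 per year is purely hypothetical.

```python
# Three-year GPU server instance estimates quoted above.
ON_PREM_3YR = 194_444
CLOUD_3YR = 324_087
YEARS = 3


def break_even_annual_opex(on_prem_hw: float, cloud_total: float, years: int) -> float:
    """Annual on-premises operating cost at which both totals are equal.

    Assumption: operating costs are added to the on-premises side only.
    """
    return (cloud_total - on_prem_hw) / years


def on_prem_total(on_prem_hw: float, annual_opex: float, years: int) -> float:
    """Hardware estimate plus operating costs over the whole period."""
    return on_prem_hw + annual_opex * years


threshold = break_even_annual_opex(ON_PREM_3YR, CLOUD_3YR, YEARS)
print(f"Break-even on-premises operating cost: ${threshold:,.0f} per year")

# Hypothetical operating cost, for illustration only.
sample_opex = 20_000
total = on_prem_total(ON_PREM_3YR, sample_opex, YEARS)
print(f"On-premises total at ${sample_opex:,}/year: ${total:,.0f}")
print(f"Still cheaper than cloud: {total < CLOUD_3YR}")
```

Under these assumptions, on-premises stays cheaper as long as its operating costs come in below the break-even figure. A real analysis would also account for operating costs on the cloud side.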

For organizations looking for options, this calculation should at least give you an idea of the level of investment you’ll be looking at. 

Certainly, on-premises seems like the better bargain in our exercise, but as with anything that scales, the growth of your business will dictate the price tag you end up facing.

Contact one of our experts to develop a detailed cost analysis comparing running ML workloads on-premises and in the cloud.