The Keys to Designing Dynamic Infrastructure for Workload Performance

By Jason Zeng and Bryan Gilcrease | Posted on July 14, 2020 | Posted in AI/ML, Enterprise Infrastructure

Designing enterprise infrastructure that fits the requirements of multiple teams for workload performance can be a challenge.

IT departments strive to have in place the equipment and tools all departments need in order to work effectively and efficiently. As workloads become more demanding, balancing resources without creating friction, while still meeting every department’s needs, is one of the most difficult, yet essential, tasks.

Modern tools to support new workloads, like artificial intelligence (AI) and machine learning (ML), can make things even harder. They often require more compute, storage, and network resources, which can leave other areas of an organization short of capacity.

Capacity planning is one of the most important elements of proper workload design. Here are three areas to get you started on your journey to workload performance:

1. Visibility

When planning for capacity with performance in mind, you must first have visibility into your existing environment. Identifying your current assets across the entire application, tools, and hardware stack, as well as evaluating how those assets are accessed, is a key first step.

Download now: Putting Artificial Intelligence to Work: A Guide to Designing High-Performance Datacenter Infrastructure for AI Workloads

In general, you want to identify:

  • Applications that require specialized hardware, like GPUs, FPGAs, or InfiniBand
  • Mission-critical applications that must run 24/7 without interruption
  • Non-mission-critical applications that can yield to higher-priority systems
  • Data sources, storage, and access patterns
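
As a minimal sketch of what such an inventory might look like in practice, the Python snippet below records each application with the attributes listed above and groups them by category. The application names, fields, and the `summarize` helper are all hypothetical and only illustrate the idea; a real inventory would typically come from a CMDB or monitoring tool.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """One entry in a hypothetical application inventory."""
    name: str
    special_hardware: set = field(default_factory=set)  # e.g. {"GPU", "InfiniBand"}
    mission_critical: bool = False
    data_sources: list = field(default_factory=list)


def summarize(apps):
    """Group application names by the categories in the list above."""
    return {
        "needs_special_hardware": sorted(a.name for a in apps if a.special_hardware),
        "mission_critical": sorted(a.name for a in apps if a.mission_critical),
        "can_yield": sorted(a.name for a in apps if not a.mission_critical),
    }


# Illustrative entries only; none of these come from a real environment.
inventory = [
    App("fraud-model-training", {"GPU", "InfiniBand"}, False, ["transactions-db"]),
    App("order-processing", set(), True, ["orders-db"]),
    App("nightly-reports", set(), False, ["warehouse"]),
]

summary = summarize(inventory)
for category, names in summary.items():
    print(f"{category}: {', '.join(names)}")
```

Even a simple grouping like this makes it easier to see which applications compete for scarce hardware and which ones can give way when capacity is tight.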

2. Application assessment

The next step is to clearly articulate your new application dependencies and performance requirements. This includes:

  • I/O, bandwidth, storage subsystem types, and other performance characteristics, evaluated with every team that will use the new workload
  • A thorough understanding of what your organization is currently using and what you will need to meet future goals
  • Creating a dynamic design to help ensure that, as workloads grow and change, your infrastructure can adapt
  • Allowing workloads to run when and where they’re needed. Modern resource scheduling technologies such as Kubernetes can greatly aid in distributing resources and workloads effectively
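
To illustrate the idea behind priority-aware resource scheduling, here is a toy Python sketch. It is not Kubernetes or its API; it is a simplified best-fit placement, written under the assumption that each workload requests a number of GPUs and carries a numeric priority. The node names, workload names, and the `place` function are hypothetical.

```python
def place(workloads, nodes):
    """Place workloads on nodes, highest priority first.

    workloads: list of (name, gpus_requested, priority) tuples
    nodes:     dict of node name -> free GPUs
    Returns a dict of workload name -> node name, or None if it must wait.
    """
    free = dict(nodes)  # copy so the caller's dict is not modified
    placement = {}
    for name, gpus, _priority in sorted(workloads, key=lambda w: -w[2]):
        # Best fit: among nodes with enough free GPUs, pick the one with the fewest.
        candidates = [n for n, avail in free.items() if avail >= gpus]
        target = min(candidates, key=lambda n: free[n], default=None)
        if target is not None:
            free[target] -= gpus
        placement[name] = target
    return placement


# Illustrative capacity and requests only.
nodes = {"node-a": 4, "node-b": 2}
workloads = [
    ("batch-etl", 2, 1),    # low priority, can yield
    ("training", 4, 5),
    ("inference", 2, 10),   # highest priority
]

result = place(workloads, nodes)
print(result)
```

In this example the two higher-priority workloads consume all six GPUs, so the low-priority batch job is left unplaced and waits until capacity frees up. A production scheduler handles far more than this, but the same principle applies: priorities decide who runs when resources are scarce.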

3. Data assessment

The third step is to evaluate the type and size of the data your workloads and applications access. This means looking at:

  • Key metrics like response times, queries, SLAs, and access permissions
  • Subsystems in place to feed modern tools like GPUs. For example, high-end GPU servers like Dell EMC’s PowerEdge C4140 can require high-bandwidth networking, as well as all-flash storage systems such as Dell EMC’s Isilon platforms, to keep their GPUs fully utilized
  • Local storage capacity and traditional networking, such as in multi-node compute and hyperconverged infrastructure (HCI) platforms used for large-scale workloads
  • The types of data you are working with, which can play a big role in the design of your infrastructure
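
A rough, back-of-the-envelope way to reason about feeding GPUs is to estimate the sustained read throughput a job needs. The Python sketch below assumes throughput is simply the number of GPUs times the samples each GPU consumes per second times the size of each sample. Every number in it is a made-up placeholder, not a measurement of any particular server or storage system.

```python
def required_read_gbps(num_gpus, samples_per_sec_per_gpu, bytes_per_sample):
    """Estimate sustained read throughput, in gigabits per second,
    needed to keep all GPUs supplied with data."""
    bytes_per_sec = num_gpus * samples_per_sec_per_gpu * bytes_per_sample
    return bytes_per_sec * 8 / 1e9


# Hypothetical job: 4 GPUs, 1,000 samples/s per GPU, 150 KB per sample.
demand = required_read_gbps(
    num_gpus=4,
    samples_per_sec_per_gpu=1000,
    bytes_per_sample=150_000,
)
print(f"Estimated read demand: {demand:.1f} Gb/s")
```

Comparing an estimate like this against what your network links and storage systems can actually sustain is one way to spot whether the data path, rather than the GPUs, will be the bottleneck.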

While this list is not exhaustive, we’ve found that these tenets should be a part of every workload capacity planning exercise.

If your organization is ready to design an infrastructure for AI and cloud-native workloads, download our free eBook to learn more.

Putting Artificial Intelligence to Work: A Guide to Designing High-Performance Datacenter Infrastructure for AI Workloads
