Run:ai Partners with Redapt

Written by Redapt Marketing | Nov 15, 2022 10:30:17 PM

Our Partnership with Run:ai

We are excited to announce our new partnership with Run:ai, a company that aims to accelerate AI-driven innovation by providing a foundation for AI infrastructure. Run:ai enables IT leaders to develop a platform strategy for orchestrating AI workloads across their compute-intensive assets, dynamically allocating GPU clusters to different deep learning workloads whether they run on-premises or in the cloud.

Our partnership with Run:ai will allow us to:

  • Help customers manage and better utilize their GPUs.
  • Improve FinOps maturity by incorporating Run:ai recommendations into our existing cost-optimization assets.
  • Apply our deep and broad experience helping customers deliver AI/ML, pairing the appropriate MLOps tools with Run:ai software so workloads can be migrated seamlessly.
  • Deliver industry-specific software solutions using AI/ML practices.

Run:ai's Compute Management Platform

By centralizing and virtualizing GPU compute resources, Run:ai gives enterprises visibility and control over resource prioritization and allocation while simplifying workflows and removing infrastructure hassles for data scientists. The Run:ai platform is built on top of Kubernetes, so it integrates readily with leading open-source frameworks and common MLOps environments, including Kubeflow, MLflow, and other data science tools.

Run:ai enables organizations and their clients to retain control and gain real-time visibility into jobs, including their run time, queueing, and GPU utilization. Its virtual pool of resources lets teams view and allocate compute resources across multiple sites, whether on-premises or in the cloud.

Customer Challenges

Customers face many challenges in managing workloads effectively. Two of the most common are:

  • Large organizations have researchers who need to use multiple GPUs across multiple servers; they need a reliable tool to manage and orchestrate those workloads.
  • Kubernetes is the de facto standard for container orchestration, but it has significant limitations when it comes to GPUs: it cannot manage multiple queues, GPU prioritization is limited, fairness of data and resource sharing is overlooked, and there is little to no support for GPU fractions (as the sketch below illustrates).
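
To make the GPU-fraction gap concrete, here is a minimal sketch of how a GPU workload is requested on a stock Kubernetes cluster running the NVIDIA device plugin (the namespace and container image are illustrative). GPUs surface as the extended resource nvidia.com/gpu and can only be requested in whole units, so sharing one GPU across jobs requires an additional layer such as Run:ai.

```python
# Minimal sketch using the official Kubernetes Python client. It assumes the
# NVIDIA device plugin is installed and kubeconfig credentials are available;
# the namespace and container image are illustrative. The "nvidia.com/gpu"
# extended resource only accepts whole numbers (a request of "0.5" is
# rejected), which is why fractional GPU sharing needs an extra layer.
from kubernetes import client, config

config.load_kube_config()  # inside a cluster, use config.load_incluster_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-job", namespace="research"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:22.10-py3",
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    # Whole GPUs only: 1, 2, 4, ... fractional values are invalid.
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="research", body=pod)
```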

A Kubernetes-based Software Solution

Our solution: build a Kubernetes cluster on VMI-powered instances and install Run:ai for better management and utilization of the GPUs.
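
As a rough illustration of the result, the sketch below submits a training pod that hands its scheduling to Run:ai. The scheduler name, project label, and namespace shown are assumptions modeled on Run:ai's published examples rather than a verified API reference; the Run:ai documentation has the authoritative field names.

```python
# Illustrative sketch only. Once Run:ai is installed on the cluster, a workload
# opts in to Run:ai's scheduling by naming the scheduler in its pod spec. The
# scheduler name ("runai-scheduler"), the project label, and the namespace are
# assumptions modeled on Run:ai's published examples, not a verified API.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="distributed-train",
        namespace="runai-team-a",          # assumption: namespace mapped to a Run:ai project
        labels={"project": "team-a"},      # assumption: project label used for quota/fair-share
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",  # assumption: hand scheduling decisions to Run:ai
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:22.10-py3",  # illustrative image
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "2"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="runai-team-a", body=pod)
```

The point of the schedulerName field is that the default Kubernetes scheduler steps aside for this pod, letting Run:ai apply its queueing, fair-share, and priority policies instead.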

Where Run:ai Shines

  • Fair-share scheduling to allow users to easily and automatically share clusters of GPUs
  • Simplified multi-GPU distributed training
  • Visibility into workloads and resource utilization to improve user productivity
  • Control for cluster admin and ops teams, to align priorities to business goals
  • On-demand access to Multi-Instance GPU (MIG) instances for the A100 GPU
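
As a short sketch of the MIG item above: on an A100 node where the NVIDIA device plugin runs with its "mixed" MIG strategy, each MIG profile is advertised as its own extended resource (for example nvidia.com/mig-1g.5gb), so a container can request a slice of a GPU rather than the whole card. Run:ai adds on-demand provisioning and scheduling on top of this mechanism; the profile and image below are illustrative.

```python
# Assumption-laden sketch: requesting a single MIG slice of an A100 through
# Kubernetes, assuming the NVIDIA device plugin's "mixed" MIG strategy, which
# exposes each profile as its own extended resource. Image name is illustrative.
from kubernetes import client

inference_container = client.V1Container(
    name="inference",
    image="nvcr.io/nvidia/pytorch:22.10-py3",
    command=["python", "serve.py"],
    resources=client.V1ResourceRequirements(
        # One 1g.5gb MIG slice of an A100 rather than a full GPU.
        limits={"nvidia.com/mig-1g.5gb": "1"}
    ),
)
```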

Common Use Cases

  1. Organizations can build state-of-the-art GPU clusters from scratch and operate them using the Run:ai platform.
  2. Run:ai’s platform enables IT teams to eliminate “Shadow AI” within their enterprises by creating a centralized GPU cluster with dynamic allocations instead of siloed GPU clusters for different business units.  
  3. Businesses can modernize HPC clusters while transitioning to Kubernetes using Run:ai software, relying on its high-performing, HPC-grade scheduler as they make the move.

About Redapt

Redapt is a global integrator focused on data center infrastructure and cloud engineering services. Redapt's core value proposition is the ability to help customers accelerate technologies into production. Redapt is vendor-agnostic and partners with OEMs and ISVs to architect and integrate customized, best-of-breed technology solutions.

Connect with Redapt

Ready to get more out of AI? Schedule some time with our experts.