In this webinar, Redapt experts Paul Welch and Bryan Gilcrease discuss how the Redapt ML Accelerator can help your organization quickly get started on its ML (machine learning) journey.
Video transcription:
All right. We're ready to get started. Thanks for joining a Redapt webinar. If you didn't know, Redapt is a technology solutions provider focused on helping our customers navigate the ever changing and complex landscape of enterprise technology. We believe that highly successful organizations are powered by highly successful technology. And today we're going to be presenting our ML Accelerator, which is how we're helping organizations successfully adopt ML. And presenting are the creators of the ML Accelerator, Paul Welch and Bryan Gilcrease.
We have Bryan as our senior solutions architect and Paul is the senior VP of product engineering. Together, the ML Accelerator is their brainchild, and they're very experienced, not only with hardware that is required to power these sorts of workloads, but they also really know the ins and outs of the technology itself and how to help our customers navigate getting started. I'll let them introduce themselves or a little bit more about their experience, and then Paul will drive.
Thanks, David. My name is Paul Welch. I focus on product engineering, and for Redapt that means combining pieces from a wide variety of our partners, both hardware, software, and also the engineering expertise and services that we do to come up with higher value solution offerings for our customers.
Thanks, Paul. Bryan Gilcrease. I'm a senior solution architect here at Redapt, and I've been working with customers to build out big data analytics machine learning solutions. And this is the combination of our experience and conversations with what customers are going through and what we see in the industry. I'll let Paul kick it off. Thanks.
So what we're going to talk about today, as David mentioned, is our ML Accelerator. We're going to cover kind of an introduction to why enterprises are looking to adopt ML, as well as some of the challenges involved and really focusing on addressing those with our Accelerator program to help them do it quicker and with less risk and more certainty than they would without our program. Bryan, do you want to add anything to the agenda about the Dell partnership?
This is a co-webinar with Dell EMC as a sponsor. We built this out with Dell EMC hardware as a reference architecture to fit into their portfolio. This ties in nicely with some of the reference architectures that they've done around this, and we've picked flexible platforms that make this able to work for our different customer needs.
Awesome. So why are enterprises looking to adopt ML? I'm sure everyone's heard about AI and ML at this point. There's really been a perfect storm of developments over the past five to 10 years that's enabled it. ML, or machine learning, can solve problems using these developments that were not feasible to even consider running on a computer in the past. One of those is advancements in the ML algorithms themselves, the math and the software algorithms, things like deep neural networks and transformers that can be used to optimize business decisions. For example, when to order inventory items or how to prioritize your marketing budget. In addition, many of these algorithms (they're called models when they're built) could not have been trained before today's high performance infrastructure like Dell EMC servers and, in some cases, acceleration with specialized hardware like GPUs, and also the massive explosion of data that's available today from sources like the internet and social networking, IoT devices at the edge collecting data, and so forth.
These models can even be used to drive automated decisions. As an example, I had a real life example of this the other day, myself, I had a customer support issue with an order from a large e-commerce site, that I won't name, and my package arrived without the product in the box. So I had a hard time actually finding a real person to talk to get it resolved. So I tried their chatbot and within 10 to 15 minutes, I had the problem resolved with a replacement item being shipped without any human intervention. And that was all driven by machine learning on the backend.
And so chatbots were a great early win for many companies automating the customer service process, giving a better customer experience to customers. And reducing the human intervention that's required reduces the cost. But ML is not just for chatbots and not just for the retail industry or e-commerce or even high tech. It's being adopted for a wide variety of use cases across just about every industry, from high tech companies to education, manufacturing, retail, oil and gas. Just a few examples might be customers that are using ML techniques to do better fraud detection for credit card transactions or insurance claims, or customers in the financial and investment industry coming up with trading strategy recommendations or portfolio allocations. And in healthcare, especially in these COVID days, very exciting developments are happening with diagnosis and treatment recommendations and even developing new medicines and drugs to treat people.
So with all that potential, why isn't everybody already doing it? Well, the simple answer is, it's hard. And we're going to go through a few of the pitfalls, but there's a lot of complexity and challenges. A lot of this is brand new. And if you were going to start from zero, it could take you a very long time to avoid all of these pitfalls and get to the end solution.
As I mentioned, previously, finding the first problem to work on and building a business case is getting easier all the time. There's a lot of examples, all you have to do is search on the internet for some examples, or you might even just need to look at what your competitors are doing for ideas. So that's the easy part. The challenges have more to do with, first of all, getting started. There are hundreds of tools, hundreds of frameworks and libraries, a wide variety of orchestration engines. You need to know what hardware and how much you need. Integrating all those pieces so that they work together is definitely not trivial. Bryan and I can attest from our own experience. And for many customers going through this process on their own could take a year or more to actually get it all working.
So to avoid that, what many companies do is buy high-end, very expensive, either workstations or maybe a dedicated server for the data scientist. This is not the optimal solution. It's a quicker way to get data scientists into experimentation, but it's, first of all, over provisioned, so it's more expensive because each box is dedicated to one person, not shared. It's also usually not a managed solution by IT. So those boxes miss out on things like patch management and systems management. The worst problem about that approach is when you need to go to production, let's say the data scientists have come up with this revolutionary new ML model that's going to make a huge increase in revenue, but they need to run it in production. And the architecture of that high end workstation under their desk is almost certainly going to be different from what you need in the data center to scale and run reliably.
So a lot of those pitfalls have to do with the learning curve. And there's the fact that there's not a 100% pre-integrated product on the market today that you can just buy and do it. You need to find all of those individual components, the specialized hardware, and make them all work together. And that's one set of challenges, but then there's also an organizational divide or a disconnect in many organizations between the data scientists who are experimenting and building these models, which is really kind of similar to a software development process, and what the IT organization does to manage the data center and operate the software products that are being promoted to production by the software development teams. Very different processes, very different architectures as well. In fact, these all contribute to what many studies have shown, which is that more than 80% of those ML models never make it out of experimentation, never make it to production. So it's a huge risk to be making that kind of investment to experiment and try all of these different models and techniques for most of it never to be realized.
Another set of the challenges, I think, is in this diagram. I think the diagram kind of says it all. It's adapted from a Google white paper about technical debt in machine learning. And the key message here is, if you see a small box in the middle called ML code, that's really what I've been talking about with the data scientists' experimentation and development process to build a model. And the reality is there's a lot more that needs to be in place to make the whole process work end to end. When many companies start out, they think we just need to hire a team of data scientists and then magic will happen. But as you can see, you need things like data and data feeds. You need to collect the data, clean the data, have a pipeline to feed the data into your process. You need configuration management tools, you need modeling tools. And very importantly, you need tools and processes to be able to deploy those trained models into production and scale them.
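To make the diagram's point concrete, here is a minimal sketch of an end-to-end flow in which the model-fitting step is only one function among several. The function names, the sample rows, and the choice of a least-squares line as the "model" are all assumptions made for illustration; they are not part of the ML Accelerator.

```python
from statistics import mean

def collect():
    # Hypothetical raw feed of (feature, target) rows, some incomplete.
    return [(10, 2.0), (None, 2.5), (14, 3.0), (12, None), (18, 4.0)]

def clean(rows):
    # Data cleaning: drop rows with missing fields.
    return [r for r in rows if None not in r]

def train(rows):
    # The small "ML code" box: fit y = a*x + b by ordinary least squares.
    xs, ys = [r[0] for r in rows], [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return {"a": a, "b": my - a * mx}

def deploy(model):
    # Stand-in for serving: return a callable that scores new inputs.
    return lambda x: model["a"] * x + model["b"]

predict = deploy(train(clean(collect())))
print(predict(16))  # → 3.5
```

Even in this toy version, three of the four steps are plumbing around the model rather than the model itself.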
And I guess one thing that's not obvious in the picture is building a machine learning model and this end-to-end process is very analogous to developing software. And so there's a huge advantage to adopting best practices that have been developed and agreed over many, many years for developing software, and adopting those in your machine learning development process and also adopting processes from SRE and DevOps, for example, in how you operate the production machine learning environment. Those peripheral boxes are all things that Redapt is very good at and has been doing for many, many years, even though a lot of the ML code box techniques and algorithms are newer.
We identified a lot of these opportunities and challenges. We have a huge demand from customers driven by their own ROI opportunity of implementing ML. We looked at a lot of the reasons why they're not already doing it and many of the challenges they were facing. We came up with an offering, this ML Accelerator program to kind of address those. The result was, we've spent a significant amount of time coming up with a reference architecture for the hardware and software. It's all of those hundreds of different pieces that work together and are integrated and can be delivered in a ready to use rack from our facility to your data center.
And we combine that with our engineering expertise in advanced analytics, but then also in things like platform engineering and DevOps. We feel this is the best way to get you to production quickly and avoid a lot of those pitfalls. And it's architected in a way that follows how many or most of the very large scale research labs are building out their own AI data centers. And so we know it will scale. It's built on a foundation of cloud-native tools, like Kubernetes and containers that we have scaled many times very large for customers. So I'm going to handoff to Bryan at this point to drill into what the solution is all about and the architecture.
Yeah, thanks Paul. There really is a lot of engineering and work that goes into delivering a machine learning model into production. I think Paul hit on a lot of the challenges, and working through this, we wanted to come up with something that addresses those challenges for different organizations, kind of at different steps in their journey to deploying scalable machine learning in production. We wanted something that started off in a small package, kind of a minimum footprint, so that these organizations could get started and could start developing some of the skills that Paul talked about for the software development life cycle for machine learning. And then they could take that and grow and scale it out as the projects grow, whether they expand to more organizations or become a hub for machine learning, however large they want to grow.
So putting that together, we did that with Dell EMC hardware. We wanted... Enterprise grade hardware is something that has a proven track record and something that IT organizations are used to managing. The Dell EMC hardware with the iDRAC control and server management features, that is something that has been around for a long time and makes support and monitoring possible by the IT organization. We took that and we kind of built up, like I said, a minimum footprint to run our software solution on top of that. And we also wanted to make sure that we could support running training for large models or deep learning models with GPUs. So we have an infrastructure portion, and then we've got compute nodes that scale out and support GPUs to make this possible.
So a few of the key components of the software stack are Kubernetes and Kubeflow, which support Jupyter Notebooks, pipelines, things like that. And we know that a lot of these are new features, so we want to put all of this together and create a services flow that mimics the infrastructure. We have a group of data scientists and data engineers who are working with customers to build out these types of solutions. And we wanted to transfer that and to be able to do that on premises in a built-out Dell EMC reference architecture.
Like I mentioned earlier, we based this off of Kubernetes. And we wanted to go with a Kubernetes that is easy to use and support and deploy. We built this out with Rancher, which makes managing different Kubernetes clusters very easy, in its HA (highly available) configuration. So, as we start, even with the smallest piece, it's resilient and can support outages, which, if we've learned anything in IT, we know will happen. We want to make sure that we've got support for traditional machine learning and CPU-based training, as well as acceleration for some of the larger models, like Paul was talking about, using GPUs. This is something that not all... This is sort of unique in that a lot of times a company or a vendor will focus on one or the other. And we want to make sure that we have a flexible platform that ties in your existing workloads, as well as accelerating larger workloads with GPUs.
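As a rough sketch of what "CPU or GPU on the same platform" can look like at the Kubernetes level: a pod asks for a GPU by adding an extended-resource limit, and omits it for CPU-only training. The resource name `nvidia.com/gpu` is the one exposed by the NVIDIA device plugin, which this sketch assumes is installed on the cluster. The helper name `training_pod`, the pod names, and the image URL are hypothetical.

```python
import json

def training_pod(name, image, gpus=0):
    """Build a minimal Kubernetes Pod manifest for a training job.

    gpus=0 yields a CPU-only pod; gpus>0 adds an `nvidia.com/gpu` limit,
    which assumes the NVIDIA device plugin is running on the cluster.
    """
    container = {"name": "trainer", "image": image}
    if gpus > 0:
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }

# Hypothetical image and pod names, for illustration only.
cpu_pod = training_pod("train-cpu", "example.com/train:latest")
gpu_pod = training_pod("train-gpu", "example.com/train:latest", gpus=1)
print(json.dumps(gpu_pod, indent=2))
```

The same container image serves both cases here; only the resource request changes, which is one way a single cluster can host traditional and accelerated workloads side by side.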
We went with Kubeflow as a workflow manager. This is a project that came out of Google and it was being adopted across the industry very quickly. This is going to run in containers on top of Rancher. One of the things that Paul talked about early on was that putting together all these different pieces can be very difficult and can take a long time. And Paul and I, we went through this. We've had to struggle finding versions that worked with each other and making sure that there're no incompatibilities and that there's a repeatable way to deploy all of this and that whenever you're sitting down and you're trying to use it, you're not working through infrastructure issues, you're focused on creating models and delivering a business value.
One of the easy buttons in Kubeflow is that it integrates with Jupyter Notebooks for self service. So JupyterHub is integrated into Kubeflow, and it allows data scientists or analysts to spin up notebooks, start working through their development lifecycle, and start seeing results interactively. This runs on the cluster in a container, so once they're done with that, they can destroy the instance and the resources are freed. Back to the traditional workloads and not just a GPU-centric reference architecture, one of the things that Kubeflow has integrated is also a Spark operator.
But on top of that, we can use that Spark operator, or you can just run Spark natively on Kubernetes. And this allows you to do your data processing and everything separately from your machine learning, but all in the same solution. Right? And part of some of the problems that we're trying to solve is to make the development environment for data scientists, more supportable by IT organizations. And to do that, we included Prometheus and Grafana, and it's integrated with Rancher and you can monitor all of your cluster metrics and everything like that. It really makes things simple for IT organizations.
So looking through this, we could've picked any hardware vendor to deploy this with, but we have a long history at Redapt working with Dell EMC and that made Dell EMC a great fit for what we're doing here. We've worked hand in hand with their high value workload team on many projects in the past. We're a strategic OEM partner. We are able to work with you if you have software that you're deploying to customers, we can take that and we can brand that and we can deploy OEM solutions together with Dell OEM. And I think that some of these different pieces is what makes this solution special and have a little bit more value than just picking any hardware vendor to support the hardware.
So I think Paul and David mentioned a lot of these pieces, but in the machine learning space, I think they're especially valuable. As a company, Redapt has worked with a lot of cutting-edge technology companies, web tech companies, companies that have been doing this and doing large scale-out computing. And we've worked with a lot of these in the past. We take that knowledge and we take that understanding of what it takes to support a solution that can be scaled and that you can reliably support and manage as that grows. So taking that and pulling that down and incorporating it into a kind of bite-sized piece with this ML Accelerator was one of our primary goals. And Redapt is placed very well in the fact that we have that experience building out that scalable hardware and infrastructure. And then we also have services on top of that to support building out the workloads that run on top of that. And that's kind of where this comes together.
It's not just, here's your hardware with a software stack on top of it, go learn this. We can sit down and work with you. We can come up with areas in your business that might benefit from machine learning. We can take those areas, we can build POCs around those. We can start creating models. We can do that either from scratch or we can take some model that has been created by a company that shared it, something like BERT for natural language processing, and adapt that to your specific use case. And we can see that all the way through to production. I think that's what really makes Redapt special. So, thanks for watching. I think we're going to open this up to questions and answers. Paul or David, did you have any more comments?
Actually I did get a couple of kinds of questions sent in through the Q&A and one was, I'll just kind of paraphrase it here. They're interested in doing some ML, but before they engage are there some things that they should be thinking about in regards to data?
Yeah, I'll go ahead and handle that. That's often one of the first places to start when you're creating machine learning initiatives: looking at what kind of data you have available or what kind of data you need, and then taking that and starting to do simple analysis on it, finding maybe outliers or bad data, and starting the process that way with simple experimentation. I think any initiative you have will start with data. So, that's a great question. And we spend a lot of time helping customers with that.
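One simple way to start looking for outliers or bad data is the interquartile-range rule, sketched below. The rule itself is a common statistical convention, not something prescribed in the webinar, and the sample order counts and the helper name `find_outliers` are made up for illustration.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; 480 and -3 look like bad data.
orders = [52, 48, 50, 47, 55, 51, 49, 480, 53, -3, 50, 46]
print(find_outliers(orders))  # → [480, -3]
```

Flagged values are candidates for review rather than automatic deletion, since an extreme value can also be a genuine event worth modeling.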
Okay. Yeah. Yeah. In fact, we have an entire practice dedicated to that. There's one more question. And so just getting started, what does an initial discovery meeting entail and then what kind of information should be collected or is important to know for that meeting?
Yeah, that's a really good question. I'll give it my 2 cents and then, Bryan, you can chip in if you have anything to add. But I think as far as initial discovery, I think having some understanding of what's going on in your industry, what your competitors are doing and what's possible in terms of what new types of problems ML solves are good educational types of homework to do before you get started. There's a lot of content out there to read. Having said that, our Accelerator program includes a professional services, engineering-driven jumpstart to help you get started. So it's not absolutely required that you know everything about what you're doing before you jump, because we're there to help guide you. What was the second part of that question, David?
Yeah. What kind of thing, what kind of information is helpful to bring or to know kind of for that meeting?
Oh, right. What's helpful to know going into this. Well, I think some of the information that's useful is knowing what your IT operations processes and organization looks like and how... that will help us to kind of understand how to translate what the architecture is to your team so that they can be comfortable taking it over and fill any gaps in terms of operations and management that they need to fill in. Bryan, do you have any thoughts on the machine learning side of what they need to bring?
Yeah. I mean, in this day and age, it's very rare that we talk with customers that don't have some sort of machine learning initiative already, even if it's in a specific business unit or kind of just on a team here or there. So if you could think about that and talk to those groups that may or may not be doing that, it would be useful to kind of come to the table and say, here's what we're trying to do and these are the pitfalls that we've already found. And that can be a great place to start. Like I said, I think most companies have some of these teams doing this already.
Yeah. Okay. One more question kind of trickled in here. I think we'll cap off after this. But the question is, I've heard data scientists are pricey, right? And just from my own experience, I've heard $400,000 to $500,000 a year in compensation. I don't know if that's true or not. I think for some exceptional ones that would be true, but are there machine learning models or technologies that are applicable to smaller sized companies, right? That maybe we don't need to make the investment in a data scientist, but we can use kind of some models, or is there another way to do that?
Yeah. I have a couple thoughts on that and Bryan probably does as well. So first of all, in terms of comp, it always depends. Data scientist is a pretty broad title and sometimes means different things to different people. So that can mean anything from people really more focused on building pipelines at a software engineering level, all the way to someone with a PhD in math who's doing pure research. So it's kind of hard to tell, as far as the cost of what you need for head count, but one thing that's really encouraging is that there are so many prebuilt, or I'll say halfway built, models available from research groups at large companies. Google, Facebook, Microsoft, Amazon, and probably a hundred others all have research teams with a building full of those math PhDs who are building models completely from scratch as an R&D effort. And many of them are open sourcing that research and making it available so that other companies can take advantage of it.
And then NVIDIA, I should've mentioned NVIDIA as well, has taken some of those open source models and pre-packaged them into NVIDIA-optimized containers that we can use on our platform. And so you have a model like BERT, which I think Bryan mentioned earlier. That one was originally developed by Google for natural language processing and is what a lot of companies use as the underlying foundation for chatbots, understanding the text that's being put in and how to respond to that. So what you can do is start with that as a starting point. You can't just use it off the shelf, but you can incrementally add to it to customize it for your use case. And that's a much easier problem to solve than building it all from scratch. If that makes sense?
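The general pattern of starting from an existing model and adding a small task-specific piece can be sketched in a few lines. This is a toy under loud assumptions: the `encode` function below is a hand-written stand-in for a frozen, pretrained encoder (it is not BERT and nothing here loads a real model), and the labeled messages are invented. Only the small logistic-regression "head" on top is trained.

```python
import math

# Hypothetical stand-in for a frozen, pretrained encoder. A real system would
# load a model such as BERT; here two fixed word lists play that role.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"missing", "broken", "late"}

def encode(text):
    words = text.lower().split()
    return [sum(w in POSITIVE for w in words), sum(w in NEGATIVE for w in words)]

def train_head(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression head on top of the frozen encoder."""
    w, b = [0.0, 0.0], 0.0
    feats = [(encode(text), label) for text, label in examples]
    n = len(feats)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in feats:
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            gw[0] += (p - y) * x[0]
            gw[1] += (p - y) * x[1]
            gb += p - y
        # Batch gradient-descent step; the encoder itself is never updated.
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b

def predict(head, text):
    (w, b), x = head, encode(text)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Hypothetical labeled support messages: 1 = satisfied, 0 = needs follow-up.
examples = [("great fast delivery", 1), ("item missing from box", 0),
            ("love it great", 1), ("broken and late", 0)]
head = train_head(examples)
print(predict(head, "great service"), predict(head, "package arrived broken"))
```

The design point is that the trainable part has only three parameters, so it needs far less data and compute than building the encoder itself, which mirrors why adapting a published model is the easier problem.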
Yeah. That does make sense. Perfect. Well, thank you guys. Thanks Bryan, thanks Paul, for your time. And I really appreciate it. And if any of you have additional questions, just reach out to us at Redapt and we'll connect you with Bryan and Paul so you can dive deeper into machine learning.