We are thrilled to announce the general availability of the Cloudera AI Inference service, powered by NVIDIA NIM microservices, part of the NVIDIA AI Enterprise platform, to accelerate generative AI deployments for enterprises. This service supports a range of optimized AI models, enabling seamless and scalable AI inference.
Background
The generative AI landscape is evolving at a rapid pace, marked by explosive growth and widespread adoption across industries. In 2022, the release of ChatGPT attracted over 100 million users within just two months, demonstrating the technology’s accessibility and its impact across various user skill levels.
By 2023, the focus shifted towards experimentation. Enterprise developers began exploring proof of concepts (POCs) for generative AI applications, leveraging API services and open models such as Llama 2 and Mistral. These innovations pushed the boundaries of what generative AI could achieve.
Now, in 2024, generative AI is moving into the production phase for many companies. Businesses are now allocating dedicated budgets and building infrastructure to support AI applications in real-world environments. However, this transition presents significant challenges. Enterprises are increasingly concerned with safeguarding intellectual property (IP), maintaining brand integrity, and protecting client confidentiality while adhering to regulatory requirements.
A major risk is data exposure — AI systems must be designed to align with company ethics and meet strict regulatory standards without compromising functionality. Ensuring that AI systems prevent breaches of client confidentiality, personally identifiable information (PII), and data security is crucial for mitigating these risks.
Enterprises also face the challenge of maintaining control over AI development and deployment across disparate environments. They require solutions that offer robust security, ownership, and governance throughout the entire AI lifecycle, from POC to full production. Additionally, there is a need for enterprise-grade software that streamlines this transition while meeting stringent security requirements.
To safely leverage the full potential of generative AI, companies must address these challenges head-on. Typically, organizations approach generative AI POCs in one of two ways: by using third-party services, which are easy to implement but require sharing private data externally, or by developing self-hosted solutions using a mix of open-source and commercial tools.
At Cloudera, we focus on simplifying the development and deployment of generative AI models for production applications. Our approach provides accelerated, scalable, and efficient infrastructure along with enterprise-grade security and governance. This combination helps organizations confidently adopt generative AI while protecting their IP, brand reputation, and compliance with regulatory standards.
Cloudera AI Inference Service
The new Cloudera AI Inference service provides accelerated model serving, enabling enterprises to deploy and scale AI applications with enhanced speed and efficiency. By leveraging the NVIDIA NeMo platform and optimized versions of open-source models like Llama 3 and Mistral, businesses can harness the latest advancements in natural language processing, computer vision, and other AI domains.
Cloudera AI Inference: Scalable and Secure Model Serving
The Cloudera AI Inference service offers a powerful combination of performance, security, and scalability designed for modern AI applications. Powered by NVIDIA NIM, it delivers market-leading performance with substantial time and cost savings. Hardware and software optimizations enable up to 36 times faster inference with NVIDIA accelerated computing and nearly four times the throughput on CPUs, accelerating decision-making.
Integration with NVIDIA Triton Inference Server further enhances the service. It provides standardized, efficient deployment with support for open protocols, reducing deployment time and complexity.
In terms of security, the Cloudera AI Inference service delivers robust protection and control. Customers can deploy AI models within their virtual private cloud (VPC) while maintaining strict privacy and control over sensitive data in the cloud. All communications between the applications and model endpoints remain within the customer’s secured environment.
Comprehensive safeguards, including authentication and authorization, ensure that only users with configured access can interact with the model endpoint. The service also meets enterprise-grade security and compliance standards, recording all model interactions for governance and audit.
The Cloudera AI Inference service also offers exceptional scalability and flexibility. It supports hybrid environments, allowing seamless transitions between on-premises and cloud deployments for increased operational flexibility.
Seamless integration with CI/CD pipelines enhances MLOps workflows, while dynamic scaling and distributed serving optimize resource usage. These features reduce costs without compromising performance. High availability and disaster recovery capabilities help enable continuous operation and minimal downtime.
Feature Highlights:
- Hybrid and Multi-Cloud Support: Enables deployment across on-premises*, public cloud, and hybrid environments, offering flexibility to meet diverse enterprise infrastructure needs.
- Model Registry Integration: Seamlessly integrates with Cloudera AI Registry, a centralized repository for storing, versioning, and managing models, enabling consistency and easy access to different model versions.
- Detailed Data and Model Lineage Tracking*: Ensures comprehensive tracking and documentation of data transformations and model lifecycle events, enhancing reproducibility and auditability.
- Enterprise-Grade Security: Implements robust security measures, including authentication, authorization*, and data encryption, helping ensure that data and models are protected both in transit and at rest.
- Real-time Inference Capabilities: Provides real-time predictions with low latency and batch processing for large datasets, offering flexibility in serving AI models based on different needs.
- High Availability and Dynamic Scaling: Features high availability configurations and dynamic scaling capabilities to efficiently handle varying loads while delivering continuous service.
- Advanced Language Model: Support with pre-generated optimized engines for a diverse range of cutting-edge LLM architectures.
- Flexible Integration: Easily integrate with existing workflows and applications. Developers are provided open inference protocol APIs for traditional ML models and with an OpenAI compatible API for LLMs.
- Multiple AI Framework Support: Integrates seamlessly with popular machine learning frameworks such as TensorFlow, PyTorch, Scikit-learn, and Hugging Face Transformers, making it easy to deploy a wide variety of model types.
- Advanced Deployment Patterns: Supports sophisticated deployment strategies like canary and blue-green deployments*, as well as A/B testing*, enabling safe and gradual rollouts of new model versions.
- Open APIs: Provides standards-compliant, open APIs for deploying, managing, and monitoring online models and applications*, as well as for facilitating integration with CI/CD pipelines and other MLOps tools.
- Performance Monitoring and Logging: Provides comprehensive monitoring and logging capabilities, tracking performance metrics such as latency, throughput, resource utilization, and model health, supporting troubleshooting and optimization.
- Business Monitoring*: Supports continuous monitoring of key generative AI modeI metrics like sentiment, user feedback, and drift that are crucial for maintaining model quality and performance.
The Cloudera AI Inference service, powered by NVIDIA NIM microservices, delivers seamless, high-performance AI model inferencing across on-premises and cloud environments. Supporting open-source community models, NVIDIA AI Foundation models, and custom AI models, it offers the flexibility to meet diverse business needs. The service enables rapid deployment of generative AI applications at scale, with a strong focus on privacy and security, to help enterprises that want to unlock the full potential of their data with AI models in production environments.
* feature coming soon – please reach out to us if you have questions or would like to learn more.
The post Deploy and Scale AI Applications With Cloudera AI Inference Service appeared first on Cloudera Blog.