The 4 million developers on the Hugging Face platform, one of the world's largest AI communities, now have easy access to NVIDIA-accelerated inference on some of the most popular AI models.
New inference-as-a-service capabilities let developers rapidly deploy leading large language models, such as the Llama 3 family and Mistral AI models, optimized with NVIDIA NIM microservices running on NVIDIA DGX Cloud.
Announced today at SIGGRAPH, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Enterprise Hub users can tap serverless inference powered by NVIDIA NIM for greater flexibility, lower infrastructure costs and optimized performance.
The inference service complements Train on DGX Cloud, Hugging Face's existing AI training service.
As the number of open-source models available to developers grows, a central hub where they can quickly evaluate their options becomes increasingly useful. These training and inference tools give Hugging Face developers new ways to test, experiment with and deploy state-of-the-art models on NVIDIA-accelerated infrastructure. They are easily accessible through "Train" and "Deploy" drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.
Get started with inference-as-a-service powered by NVIDIA NIM today.
More Than a Token Gesture: NVIDIA NIM Delivers Significant Advantages
NVIDIA NIM is a collection of AI microservices, including NVIDIA AI foundation models and open-source community models, optimized for inference with industry-standard application programming interfaces (APIs).
NIM offers users higher efficiency in processing tokens, the units of data produced and consumed by language models. The optimized microservices also make better use of the underlying NVIDIA DGX Cloud infrastructure, which can speed up critical AI applications.
This means developers see faster, more reliable results from an AI model accessed as a NIM compared with other versions of the same model. For example, the 70-billion-parameter version of Llama 3 delivers up to 5x higher throughput when accessed as a NIM than with off-the-shelf deployment on systems powered by NVIDIA H100 Tensor Core GPUs.
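To make the industry-standard API point concrete, the minimal sketch below shows how a NIM-backed model might be queried through an OpenAI-compatible interface. The endpoint URL, model identifier and API key are placeholders for illustration, not the exact values used by the Hugging Face service.

```python
# Hypothetical sketch: querying a NIM-backed model via an OpenAI-compatible API.
# The endpoint URL, model name and API key below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-nim-endpoint.example.com/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",                               # assumed credential
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # example model name
    messages=[{"role": "user", "content": "Summarize the benefits of optimized inference."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```

Because the interface follows the familiar chat-completions convention, existing applications can typically point at an optimized endpoint by changing only the base URL and credentials.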
Near-Instant Access to DGX Cloud Provides Accessible AI Acceleration
Purpose-built for generative AI, the NVIDIA DGX Cloud platform gives developers easy access to reliable accelerated computing infrastructure, helping them bring production-ready applications to market faster.
The platform provides scalable GPU resources that support every stage of AI development, from prototype to production, without requiring developers to make long-term commitments to AI infrastructure.
Hugging Face inference-as-a-service on NVIDIA DGX Cloud, powered by NIM microservices, offers easy access to compute resources optimized for AI deployment, letting users experiment with the latest AI models in an enterprise-grade environment.
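For a sense of what serverless access looks like from the Hub side, here is a minimal sketch using the huggingface_hub InferenceClient. The model name and access token are assumptions, and the exact options for the DGX Cloud-backed service may differ.

```python
# Illustrative sketch of serverless inference via huggingface_hub.
# The model name and token are assumptions for the example.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # any chat model hosted on the Hub
    token="hf_xxx",                               # your Hugging Face access token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Name one benefit of serverless inference."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```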
More NVIDIA NIM News at SIGGRAPH
At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework, accelerating developers' ability to build highly accurate virtual worlds for the next wave of AI.
Visit ai.nvidia.com to explore more than 100 NVIDIA NIM microservices with applications across industries.