Hugging Face Offers Inference-as-a-Service powered by NVIDIA NIM for Developers

NVIDIA-accelerated inference on some of the most well-liked AI models is now easily accessible to one of the largest AI communities globally, which consists of 4 million developers on the Hugging Face platform.

Leading large language models, such as the Llama 3 family and Mistral AI models, can be quickly deployed by developers thanks to new inference-as-a-service features. These models are optimized by NVIDIA NIM microservices operating on NVIDIA DGX Cloud.

The service, which was unveiled today at SIGGRAPH, will assist developers in swiftly prototyping and implementing open-source AI models hosted on the Hugging Face Hub in production. With NVIDIA NIM, Enterprise Hub users may take advantage of serverless inference to maximize performance, reduce infrastructure costs, and boost flexibility.

Hugging Face’s existing AI training service, Train on DGX Cloud, is enhanced by the inference service.

A central location where developers can quickly evaluate possibilities can be helpful as the number of open-source models available to them grows. Hugging Face developers now have new methods to test, experiment, and implement state-of-the-art models on NVIDIA-accelerated infrastructure thanks to these training and inference tools. With just a few clicks, users can get started thanks to the “Train” and “Deploy” drop-down choices on Hugging Face model cards, which make them immediately accessible.

Start using NVIDIA NIM-powered inference-as-a-service at this point.

Beyond a Simple Gesture, NVIDIA NIM Offers Significant Advantages

Using industry-standard application programming interfaces or APIs, NVIDIA NIM is a collection of AI microservices optimized for inference. These microservices include NVIDIA AI foundation models and open-source community models.

Higher processing efficiency is provided by NIM to users when handling tokens, which are data units produced and used by language models. Critical AI apps may run faster as a result of the optimized microservices’ increased efficiency in the underlying NVIDIA DGX Cloud architecture.

This indicates that when compared to previous iterations of the model, developers witness quicker, more reliable outcomes from an AI model accessed as a NIM. When accessible as a NIM, the 70 billion parameter version of Llama 3, for instance, provides up to 5 times greater throughput than off-the-shelf deployment on GPU-powered computers with NVIDIA H100 Tensor Cores.

Accessible AI Acceleration is Provided by Near-Instant Access to DGX Cloud

Designed specifically for generative AI, the NVIDIA DGX Cloud platform gives developers quick access to dependable accelerated computing infrastructure, enabling them to launch production-ready apps more quickly.

The platform doesn’t require developers to commit to long-term AI infrastructure because it offers scalable GPU resources that assist AI development at every stage, from prototype to production.

Utilizing NVIDIA DGX Cloud’s Hugging Face inference-as-a-service, which is driven by NIM microservices, customers may experiment with the newest AI models in an enterprise-grade setting with simple access to compute resources tailored for AI deployment.

More about SIGGRAPH-exclusive NVIDIA NIM

NVIDIA also unveiled NIM microservices for the OpenUSD framework and generative AI models at SIGGRAPH to help developers create more realistic virtual environments faster in preparation for the next wave of AI.

Visit ai.nvidia.com to experience over 100 NVIDIA NIM microservices with applications spanning industries.

Greg Lyons Joins Subway as Global Chief Marketing Officer

Global Biz Outlook Reviews Google’s Edge Gallery: Run AI on Your Phone Without Internet

Google Launches Edge Gallery: Offline AI Experience Now Possible on Android Smartphones

India’s Top 10 Women Leaders in Global Capability Centres (GCCs) – 2025 Edition

Send Us A Message

more insights

Who we are

Special Edition

Exclusive Content

GlobalBizOutlook is the platform that provides you with best business practices delivered by individuals, companies, and industries around the globe. Learn more

Technology

IT & Consulting

IT & Consulting

Industry

Technology

IT & Consulting

IT & Consulting

Industry

Hugging Face Offers Inference-as-a-Service powered by NVIDIA NIM for Developers

Share:

More Posts

Greg Lyons Joins Subway as Global Chief Marketing Officer

Global Biz Outlook Reviews Google’s Edge Gallery: Run AI on Your Phone Without Internet

Google Launches Edge Gallery: Offline AI Experience Now Possible on Android Smartphones

India’s Top 10 Women Leaders in Global Capability Centres (GCCs) – 2025 Edition

Send Us A Message

more insights

Greg Lyons Joins Subway as Global Chief Marketing Officer

Global Biz Outlook Reviews Google’s Edge Gallery: Run AI on Your Phone Without Internet

Google Launches Edge Gallery: Offline AI Experience Now Possible on Android Smartphones

India’s Top 10 Women Leaders in Global Capability Centres (GCCs) – 2025 Edition

Who we are

Special Edition

Exclusive Content

Who we are

Special Edition

Exclusive Content