DeepSeek Unveils FlashMLA for Hopper GPUs


DeepSeek, an innovative Chinese AI lab backed by the quantitative hedge fund High-Flyer, has kicked off its much-anticipated “Open Source Week” with the release of FlashMLA, a cutting-edge MLA (multi-head latent attention) decoding kernel tailored for NVIDIA Hopper GPUs. Designed to efficiently process variable-length sequences, FlashMLA is already deployed in DeepSeek's production systems, marking a significant advancement in AI-driven computational performance.

Key Features of FlashMLA

FlashMLA is engineered to speed up decoding: it supports BF16 (bfloat16) precision and uses a paged KV cache with a block size of 64. Together, these features reduce memory fragmentation when serving many variable-length sequences and accelerate execution, making the kernel well suited to large-scale inference workloads.
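The paged KV cache works much like virtual memory: keys and values live in fixed-size physical blocks, and a per-sequence block table maps logical token positions to those blocks. The sketch below illustrates the idea in plain Python; the class and method names are illustrative only and are not FlashMLA's actual API.

```python
# Illustrative sketch of a paged KV cache (NOT FlashMLA's real API).
# Fixed-size physical blocks are handed out from a shared pool, so
# variable-length sequences never need contiguous pre-allocation.

BLOCK_SIZE = 64  # FlashMLA uses a block size of 64


class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # seq_id -> [block ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (block_id, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:        # current block is full (or first token)
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % BLOCK_SIZE


cache = PagedKVCache(num_blocks=8)
for _ in range(65):                         # 65 tokens span two 64-slot blocks
    block, offset = cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))           # blocks allocated to sequence 0
```

The attention kernel then reads each sequence's KV entries through its block table, which is what lets a single kernel handle ragged batch shapes efficiently.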

Unparalleled Performance Benchmarks

On H800 GPUs, FlashMLA achieves remarkable speeds, reaching up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations. Such performance sets a new standard for AI model execution, particularly in fields that demand rapid processing, such as cryptocurrency trading algorithms and machine learning inference.
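To put those two figures in context, a quick roofline-style calculation compares them against the GPU's peaks. The peak numbers below are assumptions on our part (the H800 SXM is generally reported at roughly 3.35 TB/s of HBM3 bandwidth and roughly 990 TFLOPS of dense BF16 compute); they are not from DeepSeek's announcement.

```python
# Back-of-the-envelope roofline check. The PEAK_* figures are assumed
# H800 specs (~3.35 TB/s HBM3, ~990 TFLOPS dense BF16), not values
# quoted by DeepSeek; the achieved_* figures are FlashMLA's numbers.

PEAK_BW = 3350e9        # bytes/s (assumed peak memory bandwidth)
PEAK_FLOPS = 990e12     # FLOP/s  (assumed dense BF16 peak)

achieved_bw = 3000e9    # FlashMLA, memory-bound configuration
achieved_flops = 580e12 # FlashMLA, compute-bound configuration

bw_util = achieved_bw / PEAK_BW
flops_util = achieved_flops / PEAK_FLOPS

# Arithmetic intensity (FLOPs per byte) where the two roofs meet:
crossover = PEAK_FLOPS / PEAK_BW

print(f"memory-bandwidth utilization: {bw_util:.0%}")
print(f"compute utilization:          {flops_util:.0%}")
print(f"roofline crossover:           {crossover:.0f} FLOPs/byte")
```

Under these assumed peaks, 3000 GB/s corresponds to roughly 90% of the card's memory bandwidth, which is why the memory-bound figure is the headline number for decoding workloads, where attention over the KV cache is dominated by memory traffic rather than arithmetic.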

Inspired by Leading AI Technologies

DeepSeek attributes the innovation behind FlashMLA to inspirations drawn from prominent projects like FlashAttention 2 & 3 and Cutlass. By leveraging these advancements, the company has successfully developed a kernel that optimizes memory bandwidth and computational throughput, ensuring superior efficiency in large-scale AI deployments.

FlashMLA Now Available on GitHub

In a recent post on X (formerly Twitter), DeepSeek expressed excitement over the release: “Honored to share FlashMLA – our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.”

The kernel is now available on GitHub, allowing developers and researchers to explore and integrate it into their own AI applications.

DeepSeek’s Growing Open-Source Contributions

DeepSeek has also announced that it will release five open-source repositories over the course of the week as part of its initiative to drive transparency in AI research. The lab, which is actively exploring Artificial General Intelligence (AGI), emphasized its commitment to sharing progress openly with the AI community.

Currently, DeepSeek maintains 14 open-source models and repositories on Hugging Face, including its recently launched DeepSeek-R1 and DeepSeek-V3 models. These models have been recognized for delivering state-of-the-art performance at a fraction of the cost compared to industry competitors.

The Future of AI with DeepSeek

With the introduction of FlashMLA and its ongoing open-source contributions, DeepSeek continues to push the boundaries of AI innovation. As the company advances toward its AGI goals, its growing suite of high-performance AI tools is set to impact industries far beyond machine learning, from high-frequency trading to real-time data analytics.

For more details, developers can explore FlashMLA on GitHub and stay updated on DeepSeek’s latest AI breakthroughs through its official channels.



GlobalBizOutlook is the platform that provides you with best business practices delivered by individuals, companies, and industries around the globe. Learn more
