Artificial Intelligence

Liquid AI’s New LFM2-24B-A2B Hybrid Architecture Blends Attention with Convolutions to Solve the Scaling Bottlenecks of Modern LLMs

By primereports · February 25, 2026 · 4 min read


The generative AI race has long been a game of ‘bigger is better.’ But as the industry runs into power-consumption and memory bottlenecks, the conversation is shifting from raw parameter counts to architectural efficiency. The Liquid AI team is leading this charge with the release of LFM2-24B-A2B, a 24-billion-parameter model that redefines what to expect from edge-capable AI.

https://www.liquid.ai/blog/lfm2-24b-a2b

The ‘A2B’ Architecture: A 1:3 Ratio for Efficiency

The ‘A2B’ in the model’s name stands for Attention-to-Base. In a traditional Transformer, every layer uses softmax attention, which scales quadratically (O(N²)) with sequence length. This leads to massive Key-Value (KV) caches that devour VRAM.
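To see why the KV cache dominates memory at long contexts, here is a back-of-the-envelope sizing. The head counts and head dimension below are illustrative assumptions, not published LFM2 hyperparameters; the point is that cache size grows linearly with the number of attention layers, so cutting 40 attention layers down to 10 shrinks it fourfold.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Bytes for K and V caches across all attention layers (fp16 by default)."""
    # factor of 2 = one K tensor plus one V tensor per layer
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical all-attention 40-layer model at a 32k context:
full = kv_cache_bytes(n_attn_layers=40, n_kv_heads=8, head_dim=128, seq_len=32_768)
# A hybrid that keeps only 10 attention layers:
hybrid = kv_cache_bytes(n_attn_layers=10, n_kv_heads=8, head_dim=128, seq_len=32_768)

print(full // 2**20, "MiB vs", hybrid // 2**20, "MiB")  # 5120 MiB vs 1280 MiB
```

The convolution layers carry only a small fixed-size state per layer, which is why they barely register in this budget.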

The Liquid AI team sidesteps this with a hybrid structure. The ‘Base’ layers are efficient gated short-convolution blocks, while the ‘Attention’ layers use Grouped Query Attention (GQA).

In the LFM2-24B-A2B configuration, the model uses a 1:3 ratio:

  • Total Layers: 40
  • Convolution Blocks: 30
  • Attention Blocks: 10

By interspersing a small number of GQA blocks with a majority of gated convolution layers, the model retains the high-resolution retrieval and reasoning of a Transformer while maintaining the fast prefill and low memory footprint of a linear-complexity model.
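The interleaving can be sketched as a simple layer schedule. The exact placement of the GQA blocks inside LFM2 is not published; the evenly spaced pattern below is an illustrative assumption that merely satisfies the 30/10 split.

```python
def layer_schedule(total_layers=40, attn_layers=10):
    """One GQA block at the end of each group of (total // attn) layers."""
    stride = total_layers // attn_layers  # 4 -> pattern: conv conv conv gqa
    return ["gqa" if i % stride == stride - 1 else "conv"
            for i in range(total_layers)]

sched = layer_schedule()
print(sched.count("conv"), sched.count("gqa"))  # 30 10
```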

Sparse MoE: 24B Intelligence on a 2B Budget

The defining feature of LFM2-24B-A2B is its sparse Mixture of Experts (MoE) design. While the model contains 24 billion parameters in total, it activates only 2.3 billion per token.
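A minimal top-k router shows how per-token sparsity works: each token's hidden state is scored against every expert, but only the top-k experts actually execute, so compute scales with k rather than with the total expert count. LFM2's router internals are not published; this is the generic mechanism.

```python
import math

def route(expert_scores, k=2):
    """Pick the top-k experts and softmax-normalize their gate weights."""
    top = sorted(range(len(expert_scores)), key=lambda i: -expert_scores[i])[:k]
    exps = [math.exp(expert_scores[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]  # (expert_id, weight)

picked = route([0.1, 2.0, -1.0, 1.5], k=2)
print([i for i, _ in picked])  # [1, 3] -> only these experts run for this token
```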

This is a game-changer for deployment. Because the active parameter path is so lean, the model can fit into 32GB of RAM. This means it can run locally on high-end consumer laptops, desktops with integrated GPUs (iGPUs), and dedicated NPUs without needing a data-center-grade A100. It effectively provides the knowledge density of a 24B model with the inference speed and energy efficiency of a 2B model.
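The 32GB figure can be sanity-checked with parameter-count arithmetic. The quantization levels below are assumptions (the cited post does not specify one); at full 16-bit precision the weights alone would overflow 32GB, while an 8-bit build fits comfortably.

```python
params = 24e9  # total parameters, including inactive experts

for bits in (16, 8, 4):
    gib = params * bits / 8 / 2**30  # weight footprint only, no KV cache
    print(f"{bits}-bit: {gib:.1f} GiB")
# 16-bit: 44.7 GiB  -> does not fit in 32 GB
#  8-bit: 22.4 GiB  -> fits, with headroom for cache and activations
#  4-bit: 11.2 GiB
```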

https://www.liquid.ai/blog/lfm2-24b-a2b

Benchmarks: Punching Up

The Liquid AI team reports that the LFM2 family follows predictable, log-linear scaling behavior. Despite its smaller active parameter count, the 24B-A2B model consistently outperforms larger rivals.

  • Logic and Reasoning: In tests like GSM8K and MATH-500, it rivals dense models twice its size.
  • Throughput: Benchmarked on a single NVIDIA H100 with vLLM, it reached 26.8K total tokens per second at 1,024 concurrent requests, significantly outpacing gpt-oss-20b and Qwen3-30B-A3B.
  • Long Context: The model features a 32k token context window, optimized for privacy-sensitive RAG (Retrieval-Augmented Generation) pipelines and local document analysis.
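Dividing the aggregate figure by the concurrency level gives a rough per-stream decode rate, assuming the 26.8K number is total decode throughput across all requests (the source does not break this down):

```python
total_tps = 26_800   # aggregate tokens/s reported on one H100
concurrent = 1_024   # simultaneous requests in the benchmark

per_stream = total_tps / concurrent
print(f"~{per_stream:.1f} tokens/s per request")  # ~26.2 tokens/s per request
```

About 26 tokens per second per user is well above comfortable reading speed, which is what makes the high-concurrency figure meaningful for serving.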

Technical Cheat Sheet

  • Total Parameters: 24 billion
  • Active Parameters: 2.3 billion
  • Architecture: Hybrid (gated convolution + GQA)
  • Layers: 40 (30 base / 10 attention)
  • Context Length: 32,768 tokens
  • Training Data: 17 trillion tokens
  • License: LFM Open License v1.0
  • Native Support: llama.cpp, vLLM, SGLang, MLX
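Given the native vLLM and llama.cpp support listed above, local deployment is a one-liner. The Hugging Face repo id and GGUF filename below are assumptions based on Liquid AI's naming convention, not verified paths:

```shell
# Serve an OpenAI-compatible endpoint with vLLM (repo id is hypothetical):
vllm serve LiquidAI/LFM2-24B-A2B --max-model-len 32768

# Or run a quantized GGUF build locally with llama.cpp (filename is hypothetical):
llama-cli -m LFM2-24B-A2B-Q4_K_M.gguf -c 32768 -p "Summarize this document:"
```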

Key Takeaways

  • Hybrid ‘A2B’ Architecture: The model uses a 1:3 ratio of Grouped Query Attention (GQA) to Gated Short Convolutions. By utilizing linear-complexity ‘Base’ layers for 30 out of 40 layers, the model achieves much faster prefill and decode speeds with a significantly reduced memory footprint compared to traditional all-attention Transformers.
  • Sparse MoE Efficiency: Despite having 24 billion total parameters, the model only activates 2.3 billion parameters per token. This ‘Sparse Mixture of Experts’ design allows it to deliver the reasoning depth of a large model while maintaining the inference latency and energy efficiency of a 2B-parameter model.
  • True Edge Capability: Optimized via hardware-in-the-loop architecture search, the model is designed to fit in 32GB of RAM. This makes it fully deployable on consumer-grade hardware, including laptops with integrated GPUs and NPUs, without requiring expensive data-center infrastructure.
  • State-of-the-Art Performance: LFM2-24B-A2B outperforms larger competitors like Qwen3-30B-A3B and gpt-oss-20b in throughput. Benchmarks show it hits approximately 26.8K tokens per second on a single H100, with near-linear scaling and high efficiency in long-context tasks up to its 32k token window.
