VibeThinker-3B: How This Compact AI Model Outperforms Larger Models in Math and Coding Reasoning

The VibeThinker-3B model, developed by researchers at Sina Weibo, achieves state-of-the-art results in math and coding reasoning tasks with only 3 billion parameters. This compact AI model operates at a fraction of the size of its competitors, which typically rely on larger parameter counts. Its open-source MIT license and GPU-friendly 6GB footprint make it particularly appealing for developers with limited resources looking for efficient AI solutions.
| Model | Params | AIME26 | HMMT25 | IMO-Ans |
|---|---|---|---|---|
| VibeThinker-3B | 3B | 94.3 | 89.3 | 76.4 |
| DeepSeek V3.2 | 671B | 94.2 | 90.2 | 78.3 |
| Kimi K2.5 | 1T | 93.3 | 95.4 | 81.8 |
Remarkably, the VibeThinker-3B model matches or surpasses larger AI models like DeepSeek (671B) and Kimi (1T) on key math benchmarks. Its 96.1% acceptance rate on unseen LeetCode coding problems further highlights its exceptional coding proficiency.
Test-Time Scaling With CLR
Claim-Level Reliability Assessment (CLR) is a parameter-free scaling technique introduced by VibeThinker-3B. This innovative method:
- Generates 32 solution trajectories for each problem.
- Extracts 5 key claims from each trajectory.
- Validates claims internally to calculate reliability scores.
- Selects the optimal answer through weighted clustering.
This technique boosts AIME26 performance to 97.1 and BruMO25 to 99.2, effectively narrowing the performance gap with larger models without increasing the parameter count.
Targeted Applications for Verifiable Tasks
The VibeThinker-3B model excels in areas where answers can be verified algorithmically:
- Math Education: Generates step-by-step solutions for AIME/HMMT problems with 94.3% accuracy.
- Coding Assistance: Achieves a 96.1% LeetCode acceptance rate for Python coding solutions.
- Edge Computing: Operates locally on consumer GPUs with BF16 precision.
- Cost-Efficient APIs: Reduces inference costs by 200x compared to models with over 600 billion parameters.
Deployment Made Simple
Starting with the VibeThinker-3B model requires standard ML stacks:
pip install vllm vllm serve "WeiboAI/VibeThinker-3B"For direct integration with the VibeThinker-3B model:
from transformers import AutoModelForCausalLM, AutoTokenizer tok = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-3B", trust_remote_code=True)A key configuration tip for using VibeThinker-3B is to set max_new_tokens=102400 to accommodate lengthy reasoning chains.