vllm serve "WeiboAI/VibeThinker-3B"

VibeThinker-3B: How This Compact AI Model Outperforms Larger Models in Math and Coding Reasoning

VibeThinker-3B: A 3B Dense Reasoning Model Built on Qwen2.5-Coder-3B With the Spectrum-to-Signal Post-Training Pipeline

The VibeThinker-3B model, developed by researchers at Sina Weibo, achieves state-of-the-art results in math and coding reasoning tasks with only 3 billion parameters. This compact AI model operates at a fraction of the size of its competitors, which typically rely on larger parameter counts. Its open-source MIT license and GPU-friendly 6GB footprint make it particularly appealing for developers with limited resources looking for efficient AI solutions.

Model	Params	AIME26	HMMT25	IMO-Ans
VibeThinker-3B	3B	94.3	89.3	76.4
DeepSeek V3.2	671B	94.2	90.2	78.3
Kimi K2.5	1T	93.3	95.4	81.8

Remarkably, the VibeThinker-3B model matches or surpasses larger AI models like DeepSeek (671B) and Kimi (1T) on key math benchmarks. Its 96.1% acceptance rate on unseen LeetCode coding problems further highlights its exceptional coding proficiency.

Test-Time Scaling With CLR

Claim-Level Reliability Assessment (CLR) is a parameter-free scaling technique introduced by VibeThinker-3B. This innovative method:

Generates 32 solution trajectories for each problem.
Extracts 5 key claims from each trajectory.
Validates claims internally to calculate reliability scores.
Selects the optimal answer through weighted clustering.

This technique boosts AIME26 performance to 97.1 and BruMO25 to 99.2, effectively narrowing the performance gap with larger models without increasing the parameter count.

Targeted Applications for Verifiable Tasks

The VibeThinker-3B model excels in areas where answers can be verified algorithmically:

Math Education: Generates step-by-step solutions for AIME/HMMT problems with 94.3% accuracy.
Coding Assistance: Achieves a 96.1% LeetCode acceptance rate for Python coding solutions.
Edge Computing: Operates locally on consumer GPUs with BF16 precision.
Cost-Efficient APIs: Reduces inference costs by 200x compared to models with over 600 billion parameters.

Deployment Made Simple

Starting with the VibeThinker-3B model requires standard ML stacks:

pip install vllm vllm serve "WeiboAI/VibeThinker-3B"

For direct integration with the VibeThinker-3B model:

from transformers import AutoModelForCausalLM, AutoTokenizer tok = AutoTokenizer.from_pretrained("WeiboAI/VibeThinker-3B", trust_remote_code=True)

A key configuration tip for using VibeThinker-3B is to set max_new_tokens=102400 to accommodate lengthy reasoning chains.

vllm serve "WeiboAI/VibeThinker-3B"

VibeThinker-3B: How This Compact AI Model Outperforms Larger Models in Math and Coding Reasoning

Test-Time Scaling With CLR

Targeted Applications for Verifiable Tasks

Deployment Made Simple

Dr. Elena Vasquez

Questions

Comments

Leave a Comment