ISSUE 003 April 19, 2026

AI Research Weekly – April 19, 2026

/ 01 8.2/10

When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence

Summary

This paper reports that neural networks pass through three phases of quantization degradation during training: rapid learning, a metastable plateau, and explosive INT4 collapse. Critically, the collapse begins when FP32 perplexity stops improving, not when the learning rate decays, so continued training after convergence actively destroys quantization robustness while providing no FP32 benefit. INT8 quantization remains unaffected throughout, and weight distributions become more uniform (not more outlier-heavy) during the collapse. The authors propose monitoring the derivative of validation perplexity, rather than the learning rate schedule, to predict quantization fragility, and they demonstrate that calibrated oscillatory schedules can partially mitigate the problem.

Key findings

INT4 quantization collapse follows a three-phase structure, with a metastable plateau lasting ~70,000 steps before explosive divergence
Divergence begins precisely when FP32 perplexity converges, not when the learning rate decays, which provides an actionable early-stopping signal
The INT8 quantization gap stays below 1% while the INT4 gap reaches 517%, which ties the mechanism to the 16-level resolution of the INT4 grid
Weight kurtosis decreases during the collapse phase, directly refuting outlier accumulation as the underlying mechanism
Oscillatory schedules help only when their amplitude is calibrated; naive SGDR restarts uniformly worsen quantization robustness
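
To make the 16-level versus 256-level contrast and the kurtosis finding concrete, here is a minimal sketch in plain Python. It is not the paper's code. It assumes symmetric per-tensor round-to-nearest quantization, synthetic Gaussian weights, and a relative definition of the perplexity gap; all function names are hypothetical, and the paper's actual quantizer and metric definitions may differ.

```python
import random
import statistics

def quantize(weights, bits):
    """Symmetric per-tensor quantization (an assumed scheme, not the paper's)."""
    qmax = 2 ** (bits - 1) - 1        # 7 for INT4, 127 for INT8
    qmin = -(2 ** (bits - 1))         # -8 for INT4, -128 for INT8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point, then clamp into the integer range.
    codes = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to floating-point values on the grid."""
    return [c * scale for c in codes]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def excess_kurtosis(values):
    """Fourth standardized moment minus 3: about 0 for a Gaussian, -1.2 for a uniform."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    return m4 / var ** 2 - 3.0

def relative_gap_percent(ppl_quant, ppl_fp32):
    """Assumed gap definition: relative perplexity increase, in percent."""
    return 100.0 * (ppl_quant - ppl_fp32) / ppl_fp32

random.seed(0)
# Synthetic stand-ins for a weight tensor; real model weights are not used here.
gaussian_w = [random.gauss(0.0, 0.02) for _ in range(10_000)]
uniform_w = [random.uniform(-0.05, 0.05) for _ in range(10_000)]

codes4, scale4 = quantize(gaussian_w, 4)
codes8, scale8 = quantize(gaussian_w, 8)
err4 = rms_error(gaussian_w, dequantize(codes4, scale4))
err8 = rms_error(gaussian_w, dequantize(codes8, scale8))

print("distinct INT4 codes used:", len(set(codes4)))
print("RMS error INT4 / INT8:", err4 / err8)
print("excess kurtosis, Gaussian:", excess_kurtosis(gaussian_w))
print("excess kurtosis, uniform:", excess_kurtosis(uniform_w))
```

The INT4 grid has at most 16 codes, so its rounding step is far coarser than INT8's for the same weight range. The kurtosis comparison shows why a falling kurtosis means a flatter, more uniform distribution rather than a heavier-tailed one.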

How to implement

Add perplexity-derivative monitoring to training pipelines to trigger early stopping before INT4 compatibility degrades, preserving quantization robustness for mobile and edge deployment
Modify existing LLM training loops to save checkpoints at the FP32 convergence point rather than at training completion, for better post-training quantization results
Use calibrated oscillatory learning rate schedules in production training runs to maintain quantization compatibility while avoiding the degradation caused by naive warm restarts
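
The first two steps above can be sketched as a small plateau detector. This is one possible reading of "perplexity derivative monitoring", not the authors' implementation. The class name, the least-squares slope estimate, and the `window`, `rel_tol`, and `patience` parameters are all assumptions chosen for illustration and would need tuning per run.

```python
from collections import deque

class PerplexityPlateauMonitor:
    """Flags FP32 convergence when validation perplexity stops improving.

    Hypothetical helper: fits a least-squares line to the last `window`
    evaluations and treats the run as converged once the relative
    improvement per evaluation stays below `rel_tol` for `patience`
    consecutive evaluations.
    """

    def __init__(self, window=5, rel_tol=1e-3, patience=3):
        self.window = window
        self.rel_tol = rel_tol
        self.patience = patience
        self.history = deque(maxlen=window)
        self.flat_evals = 0

    def _slope(self):
        # Least-squares slope of perplexity against evaluation index.
        n = len(self.history)
        x_mean = (n - 1) / 2
        y_mean = sum(self.history) / n
        num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(self.history))
        den = sum((i - x_mean) ** 2 for i in range(n))
        return num / den

    def update(self, val_ppl):
        """Record one validation perplexity; return True once converged."""
        self.history.append(val_ppl)
        if len(self.history) < self.window:
            return False
        mean_ppl = sum(self.history) / len(self.history)
        # Positive when perplexity is still falling.
        rel_improvement = -self._slope() / mean_ppl
        if rel_improvement < self.rel_tol:
            self.flat_evals += 1
        else:
            self.flat_evals = 0
        return self.flat_evals >= self.patience

def first_trigger(ppl_series, monitor):
    """Return the index of the first evaluation that trips the monitor, or None."""
    for i, ppl in enumerate(ppl_series):
        if monitor.update(ppl):
            return i
    return None

# Synthetic perplexity curve: fast improvement, then a flat stretch.
synthetic_ppl = [100, 80, 64, 51, 41, 33] + [30] * 8
stop_at = first_trigger(synthetic_ppl, PerplexityPlateauMonitor())
print("converged at evaluation index:", stop_at)
```

In a real training loop, the evaluation at which `update` first returns True is where one would save the checkpoint intended for INT4 post-training quantization, and optionally stop training. Requiring several consecutive flat evaluations is a design choice to avoid reacting to a single noisy validation point.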

In this issue

When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation
RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning
Context Over Content: Exposing Evaluation Faking in Automated Judges
Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution
VoxSafeBench: Not Just What Is Said, but Who, How, and Where
Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge
Reinforcement Learning via Value Gradient Flow