Blogs

Here are the blogs where I share my thoughts and notes on various topics.

LLM for Theorem Proving

CCF软件工程专业委员会首批发布的三个重要难题之一：自动程序定理证明（如何高度自动化地生成程序定理的形式化证明？）
One of the three key challenges initially released by the CCF Technical Committee of Software Engineering: Automated Program Theorem Proving (How can formal proofs of program theorems be generated in a highly automated manner?)

SeedProver: (Chen et al. 2025)
- Lemma-Style Proving: 分解任务, 独立验证, 构建知识库, 组合证明.
- Test-Time Scaling
  - Light: Proof Sketch → Lean → Error Feedback → Refine
  - Medium: Outer Loop for main theorem, Inner Loop for hard lemmas.
  - Heavy: Conjecture Proposer (Conjecture Pool) → Light Prover → Lemma Pool → Evaluate → Select High-Quality Lemmas, let Medium Prover prove main theorem again.
- IMO 2025: 4/6 problems solved.

In my opinion, some agent techniques can be helpful for theorem proving as well, so I may record some notes of agent papers here:

Live-SWE-Agent: (Xia et al. 2025)
- the first live software agent that can autonomously and continuously evolve itself on-the-fly during runtime
- starts with the most basic agent scaffold with only access to bash tools
- autonomously evolves its own scaffold implementation
- 75.4% in SWE-bench Verified; 45.8% in SWE-Bench Pro. (SOTA in 2025.11)
- Q: What are the tools in Coq/Lean that can help us evolve the proof strategies on-the-fly during proof construction?
  - the necessary helper lemmas
  - Reducing to only necessary hypotheses that imply the goal (Brendel, Sivaraman, and Millstein 2025)

Formal Methods in AI

Static Analysis for DL Architectures

DEBAR: Detecting numerical bugs in neural network architectures (Zhang et al. 2020)
- Affine Relations Abstract Domain.
- Tensor Partitioning.
RANUM: Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects (Li et al. 2023)
- Potential-Defect Detection via Static Analysis.
- Feasibility Confirmation via Two-Step Test Generation.
- Fix Suggestion via Abstract Optimization.

Static Analysis for DL Models

Use SMT solvers to verify properties of DL models.
Use integer programming to verify robustness of DL models.
Abstract interpretation for DL models. (e.g., \(\text{AI}^2\), DeepPoly, etc.)
- Zonotope abstract domain.

SOTA tools:

\(\alpha,\beta\)-CROWN

Dafny Blogs

Two quantifiers in a specification

Others

A brief note about Dataflow Analysis
The first Theorem Proving Competition: 第一届定理证明竞赛
OCaml 学习笔记: OCaml 学习笔记

References

Brendel, Ana, Aishwarya Sivaraman, and Todd Millstein. 2025. “Synthesizing Implication Lemmas for Interactive Theorem Proving.” Proc. ACM Program. Lang. 9 (OOPSLA2). https://doi.org/10.1145/3763131.

Chen, Luoxin, Jinming Gu, Liankai Huang, Wenhao Huang, Zhicheng Jiang, Allan Jie, Xiaoran Jin, et al. 2025. “Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving.” https://arxiv.org/abs/2507.23726.

Li, Linyi, Yuhao Zhang, Luyao Ren, Yingfei Xiong, and Tao Xie. 2023. “Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects.” In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 1827–39. https://doi.org/10.1109/ICSE48619.2023.00156.

Xia, Chunqiu Steven, Zhe Wang, Yan Yang, Yuxiang Wei, and Lingming Zhang. 2025. “Live-SWE-Agent: Can Software Engineering Agents Self-Evolve on the Fly?” https://arxiv.org/abs/2511.13646.

Zhang, Yuhao, Luyao Ren, Liqian Chen, Yingfei Xiong, Shing-Chi Cheung, and Tao Xie. 2020. “Detecting Numerical Bugs in Neural Network Architectures.” In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 826–37. ESEC/FSE 2020. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3368089.3409720.