Detecting numerical bugs in neural network architectures

This paper makes the first attempt to conduct static analysis for detecting numerical bugs at the architecture level.

Introduction

A neural architecture can contain numerical bugs that cause serious consequences. Numerical bugs in a neural architecture manifest themselves as numerical errors in the form of “NaN”, “INF”, or crashes during training or inference.

# Input:
# center: 2x100-shape tensor whose elements are in [-1, 1]
# offset: 2x100-shape tensor whose elements are in [0, 2]

# Create 100 rectangles.
bottomLeft = center - offset
topRight = center + offset
rectangle = tf.concat([bottomLeft, topRight], axis=1)

# Calculate the reciprocal of their areas.
bottom, left, top, right = tf.split(rectangle, num_or_size_splits=4, axis=1)
width = right - left
height = top - bottom
area = width * height        # elements lie in [0, 16]; zero is reachable
scale = tf.reciprocal(area)  # reciprocal of a zero area yields INF
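The bug is easy to trigger: any rectangle whose offset is zero has zero area, and taking its reciprocal produces INF. A minimal NumPy sketch mirroring the intended computation (the random inputs and the zeroed first column are illustrative assumptions, not part of the original snippet):

```python
import numpy as np

# Mirror the TensorFlow snippet with NumPy; shapes follow the
# comments above (2x100 tensors of x/y coordinates).
rng = np.random.default_rng(0)
center = rng.uniform(-1.0, 1.0, size=(2, 100))
offset = rng.uniform(0.0, 2.0, size=(2, 100))
offset[:, 0] = 0.0            # a legal input: one degenerate rectangle

bottom_left = center - offset
top_right = center + offset

width = top_right[0] - bottom_left[0]   # = 2 * offset_x, in [0, 4]
height = top_right[1] - bottom_left[1]  # = 2 * offset_y, in [0, 4]
area = width * height                   # in [0, 16]; zero is reachable

with np.errstate(divide="ignore"):
    scale = 1.0 / area                  # reciprocal of a zero area

print(scale[0])  # inf
```

Interval reasoning alone already flags this: area lies in [0, 16], which contains 0, so the reciprocal's input range includes a division by zero.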

The key insight is that affine relations are common in neural network architectures.

Basics of Abstract Interpretation

Concrete properties are described in the concrete domain \(\mathbb{C}\) with a partial order \(\subseteq\), and abstract properties are described in the abstract domain \(\mathbb{A}\) with a partial order \(\sqsubseteq\).

  • Abstraction function \(\alpha: \mathbb{C} \rightarrow \mathbb{A}\)
  • Concretization function \(\gamma: \mathbb{A} \rightarrow \mathbb{C}\)

Galois connection \(\langle\mathbb{C},\subseteq\rangle \overset{\gamma}{\underset{\alpha}{\leftrightarrows}} \langle\mathbb{A},\sqsubseteq\rangle\): \[ \forall c \in \mathbb{C}, a \in \mathbb{A}. \alpha(c) \sqsubseteq a \Leftrightarrow c \subseteq \gamma(a) \]

Abstract Domain of Intervals

\[ \mathbb{A}_I \triangleq \{([l_1,u_1],\dots,[l_n,u_n]) \mid l,u \in \mathbb{R}^n\} \] An element \(a \in \mathbb{A}_I\) represents the points \(x \in \mathbb{R}^n\) with \(l_i \leq x_i \leq u_i\) for all \(i\).

Also, we say \(a_1 \sqsubseteq a_2\) iff \(\forall i. [l_i^1,u_i^1] \subseteq [l_i^2,u_i^2]\).

And the abstraction function is defined as: \[ \alpha_I(c) = ([\inf\{x_i \mid x \in c\}, \sup\{x_i \mid x \in c\}])_{i=1}^n \] That is, the abstraction function maps a concrete set to the smallest box of intervals containing it.

The concretization function is defined as: \[ \gamma_I(a) = \{x \in \mathbb{R}^n | \forall i. l_i^a \leq x_i \leq u_i^a\} \]
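The pair \(\alpha_I, \gamma_I\) can be sketched directly in code. A minimal Python illustration (function names are mine; the concrete set is represented as a finite list of points):

```python
import numpy as np

def alpha_I(c):
    """Abstract a finite set of points c (k x n) to the
    smallest box of intervals containing it."""
    c = np.asarray(c, dtype=float)
    return list(zip(c.min(axis=0), c.max(axis=0)))

def gamma_I_contains(a, x):
    """Membership test for the concretization of a box a."""
    return all(l <= xi <= u for (l, u), xi in zip(a, x))

points = [(0.0, 1.0), (2.0, -1.0), (1.0, 0.5)]
box = alpha_I(points)          # [(0.0, 2.0), (-1.0, 1.0)]
# Soundness direction of the Galois connection: c is contained in gamma(alpha(c)).
assert all(gamma_I_contains(box, p) for p in points)
assert not gamma_I_contains(box, (3.0, 0.0))   # outside the box
```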

Abstract Domain of Affine Relations

\[ \mathbb{A}_E \triangleq \{(\mathbf{A},b) \mid \mathbf{A} \in \mathbb{R}^{m \times n}, b \in \mathbb{R}^m, m>0\} \]

An element \(a = (\mathbf{A},b) \in \mathbb{A}_E\) represents the points \(x \in \mathbb{R}^n\) with \(\mathbf{A}x = b\).

And the abstraction function is defined as:

\[ \alpha_E(c) = \begin{cases} \top & \text{if the smallest affine set containing } c \text{ is } \mathbb{R}^n\\ \bot & \text{if } c = \emptyset\\ (\mathbf{A},b) & \text{otherwise, where } \{x \mid \mathbf{A}x = b\} \text{ is the smallest affine set containing } c \text{ and } (\mathbf{A},b) \text{ is in reduced echelon form} \end{cases} \]

The concretization function is defined as:

\[ \gamma_E(a) = \{x \in \mathbb{R}^n \mid \mathbf{A}x = b\}, \] where \(a = (\mathbf{A},b)\).
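The analysis keeps \((\mathbf{A},b)\) in reduced echelon form; the sketch below is not that algorithm, but it illustrates what \(\alpha_E\) computes on a finite set of points by recovering the smallest affine set containing them via an SVD null space (function name and tolerance are assumptions):

```python
import numpy as np

def alpha_E(points, tol=1e-9):
    """Smallest affine set {x | A x = b} containing a finite set of
    points, via the null space of the centered point matrix."""
    P = np.asarray(points, dtype=float)
    x0 = P[0]
    D = P - x0                      # directions spanned by the points
    # Rows of A form an orthonormal basis of the orthogonal complement.
    _, s, Vt = np.linalg.svd(D)
    rank = int((s > tol).sum())
    A = Vt[rank:]                   # null-space rows: A @ (x - x0) = 0
    return A, A @ x0                # b = A x0

# Points on the line y = 2x + 1 satisfy exactly one affine relation.
pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
A, b = alpha_E(pts)
assert A.shape[0] == 1                           # one relation found
assert np.allclose(A @ np.array([3.0, 7.0]), b)  # (3, 7) is on the line too
```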

Abstract Domain for Neural Architectures

The abstract domain for Tensor partitioning and Interval abstraction with affine Equality relation \(\mathbb{A}_{\text{TIE}}\) is defined as: \[ \mathbb{A}_{\text{TIE}} \triangleq \{(\mathcal{P}, a^{\sharp I}, a^{\sharp E}) \mid a^{\sharp I} \in \mathbb{A}_I, a^{\sharp E}\in\mathbb{A}_E\}, \] where \(\mathcal{P} = \{A_1,A_2,\dots,A_n\}\) is a tensor partitioning: it splits the tensor elements into \(n\) groups, and the interval and affine abstractions each track one variable per group.

The concretization function \(\gamma_{\text{TIE}}\) of an element \(a^\sharp = (\mathcal{P}, a^{\sharp I}, a^{\sharp E})\) is defined as: \[ \gamma_{\text{TIE}}(a^\sharp) = \gamma_I(a^{\sharp I}) \cap \gamma_E(a^{\sharp E}) \]
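Since \(\gamma_{\text{TIE}}\) is an intersection, membership is just membership in both component concretizations. A small sketch (ignoring the tensor partitioning and treating each partition as a single variable; the function name is mine):

```python
import numpy as np

def gamma_TIE_contains(intervals, A, b, x, tol=1e-9):
    """x lies in the concretization iff it is inside the interval
    box AND satisfies every affine relation A x = b."""
    x = np.asarray(x, dtype=float)
    in_box = all(l <= xi <= u for (l, u), xi in zip(intervals, x))
    in_affine = np.allclose(A @ x, b, atol=tol)
    return in_box and in_affine

intervals = [(0.0, 4.0), (0.0, 4.0)]
A = np.array([[1.0, -1.0]])        # relation x1 - x2 = 0
b = np.array([0.0])
assert gamma_TIE_contains(intervals, A, b, (2.0, 2.0))
assert not gamma_TIE_contains(intervals, A, b, (2.0, 3.0))   # violates A x = b
assert not gamma_TIE_contains(intervals, A, b, (5.0, 5.0))   # outside the box
```

The affine component tightens the plain interval box: (2, 3) is inside the box but excluded by the relation.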

Notes for ReLU:

  • For an output \(b = \mathrm{ReLU}(a)\), create a new symbolic variable \(b\), together with auxiliary variables \(a^{\mathrm{ReLU}} = \max(a, 0)\) and \(a^{-\mathrm{ReLU}} = \max(-a, 0)\).
  • The output satisfies \(b - a^{\mathrm{ReLU}} = 0\).
  • Since \(\max(a,0) - \max(-a,0) = a\), we also record \(a^{\mathrm{ReLU}} - a^{-\mathrm{ReLU}} - a = 0\).
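The affine relation above holds pointwise, and the interval component of the abstraction can be propagated through ReLU directly. A small sketch (the transformer name is mine):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

# The relation a^ReLU - a^(-ReLU) - a = 0 holds pointwise,
# since max(a, 0) - max(-a, 0) = a.
a = np.linspace(-3.0, 3.0, 13)
assert np.allclose(relu(a) - relu(-a) - a, 0.0)

def relu_interval(l, u):
    """Interval transformer for b = ReLU(a) with a in [l, u]."""
    return max(l, 0.0), max(u, 0.0)

assert relu_interval(-2.0, 3.0) == (0.0, 3.0)
```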