Mamba Terminology and Glossary of Key Terms

The vocabulary surrounding Mamba-based sequence modeling is precise and draws from control theory, signal processing, and deep learning in ways that differ meaningfully from transformer-centric literature. This glossary covers the foundational and advanced terms encountered across Mamba architecture documentation, research publications, and implementation frameworks. Familiarity with these definitions is a prerequisite for navigating technical architecture breakdowns, benchmark comparisons, and deployment decisions in production environments.

Definition and scope

Mamba terminology spans three overlapping domains: the mathematical foundations inherited from state space model (SSM) theory, the architectural innovations introduced in the original Mamba paper by Albert Gu and Tri Dao (2023), and the engineering vocabulary associated with hardware-aware implementation. The glossary below organizes these terms by domain.

How it works

Understanding Mamba's operational vocabulary requires distinguishing between the recurrent and convolutional representations of the same underlying computation.

Computational mode terms:

  1. Recurrent mode: Sequential computation where the hidden state h(t) is updated one timestep at a time using h(t) = Āh(t−1) + B̄x(t). This mode is efficient at inference time, requiring O(1) memory per step, but cannot be parallelized across the sequence dimension during training.
  2. Convolutional mode: A mathematically equivalent parallel formulation using the SSM kernel — the sequence of output responses to unit impulses — expressed as a global convolution over the input sequence. This mode enables GPU-parallel training but requires storing the full input sequence.
  3. Hardware-Aware Algorithm: The implementation strategy in Mamba that fuses the selective scan into a single GPU kernel, keeping intermediate states in SRAM (on-chip memory) rather than HBM (high-bandwidth memory off-chip). This design reduces memory I/O by an order of magnitude for typical sequence lengths. See Mamba Hardware-Aware Algorithms for benchmarks.
  4. Selective Scan: The core computational primitive of Mamba — a parallel prefix-sum-style operation over the sequence that applies input-dependent transition matrices. Because Δ, B, and C vary per token, this cannot be reduced to a simple convolution, requiring the hardware-aware kernel approach.
  5. Step Size (Δ): A learned, input-dependent scalar that controls the discretization of the continuous-time system at each token position. Larger Δ causes the model to "focus" on the current input; smaller Δ causes it to carry forward prior state.
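The recurrent mode and the role of Δ can be sketched in a few lines of NumPy. This is an illustrative toy, not the reference CUDA kernel: the variable names, the diagonal A, and the simplified (Euler) discretization of B are assumptions made for brevity, while Ā uses the zero-order-hold form exp(ΔA) as in the equations above.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 6, 4                                  # sequence length, state dimension
A = -np.abs(rng.standard_normal(N))          # stable continuous-time diagonal A
x = rng.standard_normal(L)                   # one scalar input channel
delta = np.abs(rng.standard_normal(L)) * 0.1 # input-dependent step size Δ per token
B = rng.standard_normal((L, N))              # input-dependent B per token
C = rng.standard_normal((L, N))              # input-dependent C per token

h = np.zeros(N)                              # fixed-size hidden state
y = np.zeros(L)
for t in range(L):
    A_bar = np.exp(delta[t] * A)             # ZOH discretization: Ā = exp(ΔA)
    B_bar = delta[t] * B[t]                  # simplified Euler discretization of B
    h = A_bar * h + B_bar * x[t]             # h(t) = Ā h(t-1) + B̄ x(t), O(1) memory
    y[t] = C[t] @ h                          # y(t) = C h(t)
```

Note how a large delta[t] drives A_bar toward zero, so the update is dominated by the current input, while a small delta[t] keeps A_bar near one and carries the prior state forward.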

Common scenarios

Practitioners encounter distinct terminology subsets depending on application domain. The Mamba glossary reference covers domain-specific vocabulary extensions.

Decision boundaries

The distinction between terms that appear synonymous but carry precise technical differences is operationally significant.

S4 vs. Mamba: S4 uses time-invariant parameters — A, B, C are fixed across all timesteps for a given layer. Mamba makes B, C, and Δ functions of the input x(t), producing a time-varying system. This difference determines whether selective filtering of input content is possible.
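The parameterization difference can be made concrete. A hedged sketch, assuming linear projections from the input to B, C, and Δ (the weight names W_B, W_C, w_delta are illustrative, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)
L, D, N = 5, 3, 4                  # sequence length, model dim, state dim
x = rng.standard_normal((L, D))

# S4-style: one fixed B and C per layer, shared across all timesteps.
B_fixed = rng.standard_normal(N)
C_fixed = rng.standard_normal(N)

# Mamba-style: B, C, Δ are functions of the current input x(t).
W_B = rng.standard_normal((D, N))
W_C = rng.standard_normal((D, N))
w_delta = rng.standard_normal(D)

B_t = x @ W_B                              # shape (L, N): a different B per token
C_t = x @ W_C                              # shape (L, N): a different C per token
delta_t = np.log1p(np.exp(x @ w_delta))    # softplus keeps Δ positive
```

Because B_t, C_t, and delta_t vary per token, the system is time-varying and the fixed-kernel convolutional shortcut no longer applies.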

Recurrence vs. attention: Both mechanisms aggregate sequence history, but recurrence compresses it into a fixed-size state (O(1) memory per step at inference) whereas attention retains an explicit key-value cache that grows linearly with sequence length. Mamba's linear-time scaling is a direct consequence of the recurrent formulation.
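A back-of-envelope comparison of per-layer inference memory illustrates the scaling difference. The function names and float-counting convention here are hypothetical, chosen only to make the asymptotics visible:

```python
def kv_cache_floats(t: int, d_model: int) -> int:
    """Attention: keys and values are cached for every previous token,
    so memory grows linearly with position t."""
    return 2 * t * d_model

def ssm_state_floats(d_model: int, d_state: int) -> int:
    """Recurrence: only a fixed-size state is kept, independent of t."""
    return d_model * d_state

# At position 8192 with d_model=1024, the KV cache holds 2*8192*1024 floats,
# while the SSM state (d_state=16) stays at 1024*16 floats regardless of t.
```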

SSM vs. RNN: Structured state space models and recurrent neural networks are both recurrent, but SSMs are defined by linear state transitions with principled continuous-time derivations, whereas classical RNNs (LSTM, GRU) use nonlinear gating without continuous-time grounding. See Mamba vs. RNNs for the full comparison.
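The structural contrast is one update rule each. A minimal sketch, assuming a diagonal SSM and a GRU-style gate with scalar input; the weights are random stand-ins, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
h = rng.standard_normal(N)       # shared starting state
x = 0.5                          # scalar input

# SSM step: linear in the state, so it admits a parallel/convolutional form.
A_bar = np.exp(-0.1 * np.abs(rng.standard_normal(N)))
B_bar = 0.1 * rng.standard_normal(N)
h_ssm = A_bar * h + B_bar * x

# GRU-style step: nonlinear gating of the state, with no such parallel form.
W_z, W_h = rng.standard_normal(N), rng.standard_normal(N)
z = 1.0 / (1.0 + np.exp(-(W_z * h + x)))   # update gate (sigmoid)
h_tilde = np.tanh(W_h * h + x)             # candidate state
h_gru = (1.0 - z) * h + z * h_tilde        # gated interpolation
```

The SSM update composes as a product of linear maps across timesteps, which is what the continuous-time derivation and the convolutional mode rely on; the tanh/sigmoid nonlinearities in the GRU update block that factorization.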

The main Mamba reference index provides the entry point for navigating the full technical and applied coverage of Mamba as a model family.
