Key Mamba Research Papers and Publications

The Mamba architecture has generated a rapidly growing body of research since its introduction, spanning state space model theory, hardware-aware implementation, and domain-specific applications from genomics to vision. This page maps the primary publications, their technical scope, and the research lineage that connects them — structured as a reference for practitioners, researchers, and engineers evaluating the literature before implementation or further study.

Definition and scope

Mamba research refers to the corpus of academic publications, preprints, and technical reports that formally define, extend, evaluate, or critique the Mamba selective state space model and its derivatives. The field is anchored by a small number of foundational papers but has expanded rapidly through domain adaptation studies and architectural hybridization work.

The primary publication record lives on arXiv, the Cornell University-operated preprint server that hosts the overwhelming majority of deep learning architecture papers prior to or alongside peer-reviewed venue publication. Conference venues of record include NeurIPS, ICML, ICLR, and ACL for language-focused variants. Understanding the publication landscape requires distinguishing among four categories:

  1. Foundational architecture papers — define the core mechanism and training algorithm
  2. Domain adaptation papers — apply Mamba to specific modalities (vision, audio, genomics, time series)
  3. Comparative evaluation papers — benchmark Mamba against Transformers and RNNs across defined tasks
  4. Hybridization papers — propose combined Mamba-Transformer or Mamba-CNN architectures

The Mamba architecture overview and the full state space models explained reference provide the technical substrate for understanding what each paper is building upon or departing from.

How it works

The foundational paper is "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" by Albert Gu and Tri Dao (arXiv:2312.00752, December 2023). This paper introduced the selective state space mechanism, the hardware-aware parallel scan algorithm, and the S6 block. It reported benchmark results on language modeling tasks where Mamba matched or exceeded Transformer models of comparable parameter count while scaling at O(n) rather than O(n²) in sequence length. The paper is the mandatory entry point for any serious engagement with the literature.
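The linear-time claim follows from the recurrent form of the selective SSM: each timestep updates a fixed-size hidden state once, so total work grows linearly with sequence length, whereas attention computes all pairwise scores. A minimal scalar sketch (all names and the toy parameterization are illustrative, not from the paper's released code):

```python
import math

def selective_scan(xs, a=-1.0):
    """O(n) sequential sketch of a scalar selective SSM:
    h_t = exp(dt_t * a) * h_{t-1} + dt_t * x_t,  y_t = h_t.
    The step size dt_t is input-dependent -- the 'selective' part.
    One state update per input gives linear-time scaling, versus the
    O(n^2) pairwise score matrix of attention."""
    h = 0.0
    ys = []
    for x in xs:
        dt = math.log1p(math.exp(x))  # softplus: positive, input-dependent step
        h = math.exp(dt * a) * h + dt * x
        ys.append(h)
    return ys
```

The real implementation evaluates this recurrence with a hardware-aware parallel scan rather than a Python loop, but the per-step structure is the same.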

The follow-up "Mamba-2" paper, formally titled "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality" (Dao and Gu, arXiv:2405.21060, 2024), established a formal algebraic equivalence between structured state spaces and certain attention mechanisms, expanded the state dimension from 16 to values in the range of 64–256, and introduced the SSD (State Space Duality) algorithm. This paper is the definitional source for Mamba-2 improvements.
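The duality can be stated compactly. The following display is a sketch consistent with the standard SSM recurrence notation ($A_t$, $B_t$, $C_t$ are the input-dependent state space parameters); see the paper for the precise structured form:

```latex
h_t = A_t h_{t-1} + B_t x_t, \qquad y_t = C_t^{\top} h_t
\;\;\Longrightarrow\;\;
y = M x, \qquad
M_{ji} = C_j^{\top} A_j A_{j-1} \cdots A_{i+1} B_i \quad (j \ge i)
```

The matrix $M$ is lower-triangular and semiseparable, i.e. the SSM computes a masked attention-like map with a structured mask — the equivalence the SSD algorithm exploits for efficient training.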

Preceding Mamba directly are the S4 paper ("Efficiently Modeling Long Sequences with Structured State Spaces," ICLR 2022) and the H3 paper from the same research lineage, both available on arXiv. S4 introduced the HiPPO-based initialization that underpins Mamba's long-range memory properties. Practitioners studying selective state spaces or Mamba linear-time scaling will encounter S4 as a required background reference.
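The HiPPO-LegS state matrix at the heart of that initialization has a simple closed form. A sketch constructing it (signs and scaling follow the commonly cited form; treat as illustrative rather than the reference implementation):

```python
import math

def hippo_legs(n):
    """Sketch of the HiPPO-LegS state matrix used to initialize S4
    (and, via S4's lineage, Mamba's A parameter):
      A[i][k] = -sqrt(2i+1) * sqrt(2k+1)  if i > k
              = -(i + 1)                  if i == k
              =  0                        if i < k
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if i > k:
                A[i][k] = -math.sqrt(2 * i + 1) * math.sqrt(2 * k + 1)
            elif i == k:
                A[i][k] = -(i + 1)
    return A
```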

Common scenarios

Researchers encounter Mamba publications in distinct contexts depending on their application domain:

Language and sequence modeling: The Gu and Dao (2023) foundational paper and Mamba-2 (2024) are primary. Additional work includes papers benchmarking Mamba on the Long Range Arena (LRA) benchmark suite, a standardized evaluation framework first published in Tay et al. (ICLR 2021, arXiv:2011.04006) that is widely used for apples-to-apples Mamba vs Transformers comparisons.

Vision: VMamba (arXiv:2401.10166) and the related Vision Mamba (Vim) architectural work apply 2D selective scanning to image classification and dense prediction tasks. The VMamba paper reported ImageNet-1K top-1 accuracy competitive with DeiT and Swin Transformer at matched parameter budgets.
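The core trick in 2D selective scanning is turning a spatial grid into 1-D sequences a state space model can consume. A minimal sketch of the four-directional flattening (illustrative only; the published implementations are batched tensor operations):

```python
def cross_scan(grid):
    """Flatten an H x W grid into four 1-D sequences (row-major,
    column-major, and their reverses) so a 1-D selective SSM can
    observe each position from four scan directions."""
    rows = [v for row in grid for v in row]            # left-to-right, top-down
    cols = [grid[r][c] for c in range(len(grid[0]))
            for r in range(len(grid))]                 # top-down, left-to-right
    return [rows, rows[::-1], cols, cols[::-1]]
```

After scanning, the four output sequences are merged back into the 2D layout, giving every position context from all directions.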

Genomics and bioinformatics: The Caduceus paper (arXiv:2403.03234) adapted Mamba for bidirectional DNA sequence modeling, directly relevant to the Mamba genomics and bioinformatics application domain.

Audio: Work published under the MambaAudio and Samba labels extended the architecture to speech and music generation, documented in the Mamba audio processing reference.

Hybrid models: The Jamba architecture (AI21 Labs, arXiv:2403.19887) interleaves Mamba and Transformer layers at defined ratios, representing a major entry in Mamba hybrid models literature.
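The interleaving itself is a simple layer-stacking pattern. A sketch under the ratio the Jamba paper reports (one attention layer per seven Mamba layers; the exact ordering within a block here is illustrative):

```python
def jamba_layer_pattern(n_blocks, mamba_per_block=7, attn_per_block=1):
    """Sketch of Jamba-style interleaving: each block stacks a fixed
    ratio of Mamba to attention layers. The 1:7 attention-to-Mamba
    ratio follows the paper; the within-block ordering is illustrative."""
    pattern = []
    for _ in range(n_blocks):
        pattern += ["mamba"] * mamba_per_block + ["attention"] * attn_per_block
    return pattern
```

Varying the ratio trades the attention layers' in-context recall against the Mamba layers' memory footprint and throughput, which is the design axis the Jamba paper ablates.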

Decision boundaries

Choosing which papers to engage with first depends on the research or engineering objective.

The Mamba vs RNNs literature sits at an intersection between Mamba papers and prior recurrent model work, including RWKV (arXiv:2305.13048) and RetNet (arXiv:2307.08621), which are the primary comparison baselines in recurrent-class evaluations.

The full reference landscape for this sector is indexed at the Mamba resources and tools page and the /index of this reference network, which organizes all technical domains covered.

References