AI2 publishes Nature paper on retrieval-augmented model for scientific literature synthesis with verifiable citations
· model release · open-source · api · allenai.org ↗

Reliable Scientific Synthesis with Verifiable Citations

Keeping up with scientific publishing is challenging for researchers, and general-purpose AI systems have struggled with a critical requirement: reliable grounding in peer-reviewed literature. While language models can generate plausible summaries, they often cite irrelevant work or fabricate sources entirely—a problem known as hallucination that undermines trust in AI-assisted research.

OpenScholar: Building on Retrieval-Augmented Generation

AI2 and University of Washington researchers addressed this challenge by developing OpenScholar, an open-source model specifically designed for scientific literature synthesis. The system combines a model trained on scientific synthesis tasks with retrieval-augmented generation (RAG), allowing it to:

  • Search a corpus of 45 million open-access scientific papers
  • Incorporate relevant papers—including recent publications—into its responses
  • Provide verifiable citations for claims made in synthesized literature reviews
  • Leverage a full-text snippet index made available through the Semantic Scholar API
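The retrieve-then-cite loop above can be illustrated with a toy sketch. Everything here is hypothetical, a tiny in-memory corpus and a word-overlap scorer standing in for the 45-million-paper snippet index and the learned retriever that OpenScholar actually uses; only the overall shape (retrieve relevant snippets, then attach a verifiable citation to each claim) reflects the system described in the article.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    paper_id: str  # e.g. a Semantic Scholar paper ID (illustrative format)
    text: str

# Hypothetical mini-corpus standing in for the full-text snippet index.
CORPUS = [
    Snippet("S2:101", "Retrieval-augmented generation grounds model outputs in retrieved documents."),
    Snippet("S2:102", "Citation hallucination occurs when models fabricate nonexistent sources."),
    Snippet("S2:103", "Transformers use self-attention to process token sequences."),
]

def retrieve(query: str, corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Rank snippets by naive word overlap with the query (a stand-in for a
    learned dense retriever plus reranker) and return the top-k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda s: len(q_words & set(s.text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def synthesize_with_citations(query: str, corpus: list[Snippet]) -> str:
    """Build an answer in which every claim carries an inline citation to the
    snippet that supports it, so a reader can verify each statement."""
    hits = retrieve(query, corpus)
    return " ".join(f"{s.text} [{s.paper_id}]" for s in hits)

answer = synthesize_with_citations(
    "how does retrieval-augmented generation reduce hallucination", CORPUS
)
print(answer)
```

The key design point is that citations are attached at generation time from actually retrieved snippets, so a fabricated source simply cannot appear in the output.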

Supporting Resources and Benchmarking

To support evaluation of scientific synthesis systems, the researchers created ScholarQABench, the first large-scale multi-domain benchmark for evaluating citation quality and scientific answer generation. The computer science portion (ScholarQA-CS) evolved into ScholarQA-CS2, which is now part of AstaBench.
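One ingredient of evaluating citation quality is checking whether each cited source actually supports the claim it is attached to. The sketch below is a toy proxy for such a check, not ScholarQABench's actual scoring: it counts a citation as supported when the cited snippet shares at least half of the claim's content words, whereas a real benchmark would use a much stronger entailment-style judge.

```python
def citation_precision(claims: list[tuple[str, str]], corpus: dict[str, str]) -> float:
    """Fraction of (claim, cited_id) pairs where the cited snippet lexically
    supports the claim. A toy proxy for citation-quality scoring."""
    if not claims:
        return 0.0
    supported = 0
    for claim, cited_id in claims:
        snippet_words = set(corpus.get(cited_id, "").lower().split())
        # Content words only: skip short function words like "for" or "the".
        claim_words = {w for w in claim.lower().split() if len(w) > 3}
        if claim_words and len(claim_words & snippet_words) / len(claim_words) >= 0.5:
            supported += 1
    return supported / len(claims)

# Hypothetical example: one supported citation, one unsupported one.
corpus = {"S2:201": "grounded generation cites retrieved evidence for every claim"}
claims = [
    ("generation cites retrieved evidence", "S2:201"),  # supported
    ("quantum speedups for retrieval", "S2:201"),       # unsupported
]
print(citation_precision(claims, corpus))  # 0.5
```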

All research artifacts—including model checkpoints, the retrieval index, training data, and a public demo—are freely available. This research has already influenced downstream work, including the ScholarQA feature now in Asta and the expanded multi-step search capabilities in Deep Research Tulu (DR Tulu).

What This Means for Research Workflows

By showing that careful retrieval, ranking, and citation handling can substantially improve trustworthiness in scientific contexts, this work provides a foundation for AI systems that "show their work" rather than simply sounding convincing. Researchers can now inspect, validate, and extend these open-source tools for their own domains.