AI2 publishes Nature paper on retrieval-augmented model for scientific literature synthesis with verifiable citations
· model release · open-source · api · allenai.org ↗

Reliable Scientific Synthesis with Verifiable Citations

Keeping up with scientific publishing is challenging for researchers, and general-purpose AI systems have struggled with a critical requirement: reliable grounding in peer-reviewed literature. While language models can generate plausible summaries, they often cite irrelevant work or fabricate sources entirely—a problem known as hallucination that undermines trust in AI-assisted research.

OpenScholar: Building on Retrieval-Augmented Generation

AI2 and University of Washington researchers addressed this challenge by developing OpenScholar, an open-source model specifically designed for scientific literature synthesis. The system combines a model trained on scientific synthesis tasks with retrieval-augmented generation (RAG), allowing it to:

  • Search a corpus of 45 million open-access scientific papers
  • Incorporate relevant papers—including recent publications—into its responses
  • Provide verifiable citations for claims made in synthesized literature reviews
  • Leverage a full-text snippet index made available through the Semantic Scholar API
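The retrieve-then-cite loop above can be illustrated with a toy sketch. Everything here is hypothetical, a tiny in-memory corpus and a word-overlap scorer standing in for the 45-million-paper snippet index and the learned retriever that OpenScholar actually uses; only the overall shape (retrieve relevant snippets, then attach a verifiable citation to each claim) reflects the system described in the article.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    paper_id: str  # e.g. a Semantic Scholar paper ID (illustrative format)
    text: str

# Hypothetical mini-corpus standing in for the full-text snippet index.
CORPUS = [
    Snippet("S2:101", "Retrieval-augmented generation grounds model outputs in retrieved documents."),
    Snippet("S2:102", "Citation hallucination occurs when models fabricate nonexistent sources."),
    Snippet("S2:103", "Transformers use self-attention to process token sequences."),
]

def retrieve(query: str, corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Rank snippets by naive word overlap with the query (a stand-in for a
    learned dense retriever plus reranker) and return the top-k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda s: len(q_words & set(s.text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def synthesize_with_citations(query: str, corpus: list[Snippet]) -> str:
    """Build an answer in which every claim carries an inline citation to the
    snippet that supports it, so a reader can verify each statement."""
    hits = retrieve(query, corpus)
    return " ".join(f"{s.text} [{s.paper_id}]" for s in hits)

answer = synthesize_with_citations(
    "how does retrieval-augmented generation reduce hallucination", CORPUS
)
print(answer)
```

The key design point is that citations are attached at generation time from actually retrieved snippets, so a fabricated source simply cannot appear in the output.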

Supporting Resources and Benchmarking

To support evaluation of scientific synthesis systems, the researchers created ScholarQABench, the first large-scale multi-domain benchmark for evaluating citation quality and scientific answer generation. The computer science portion (ScholarQA-CS) evolved into ScholarQA-CS2, which is now part of AstaBench.
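One ingredient of evaluating citation quality is checking whether each cited source actually supports the claim it is attached to. The sketch below is a toy proxy for such a check, not ScholarQABench's actual scoring: it counts a citation as supported when the cited snippet shares at least half of the claim's content words, whereas a real benchmark would use a much stronger entailment-style judge.

```python
def citation_precision(claims: list[tuple[str, str]], corpus: dict[str, str]) -> float:
    """Fraction of (claim, cited_id) pairs where the cited snippet lexically
    supports the claim. A toy proxy for citation-quality scoring."""
    if not claims:
        return 0.0
    supported = 0
    for claim, cited_id in claims:
        snippet_words = set(corpus.get(cited_id, "").lower().split())
        # Content words only: skip short function words like "for" or "the".
        claim_words = {w for w in claim.lower().split() if len(w) > 3}
        if claim_words and len(claim_words & snippet_words) / len(claim_words) >= 0.5:
            supported += 1
    return supported / len(claims)

# Hypothetical example: one supported citation, one unsupported one.
corpus = {"S2:201": "grounded generation cites retrieved evidence for every claim"}
claims = [
    ("generation cites retrieved evidence", "S2:201"),  # supported
    ("quantum speedups for retrieval", "S2:201"),       # unsupported
]
print(citation_precision(claims, corpus))  # 0.5
```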

All research artifacts—including model checkpoints, the retrieval index, training data, and a public demo—are freely available. This research has already influenced downstream work, including the ScholarQA feature now in Asta and the expanded multi-step search capabilities in Deep Research Tulu (DR Tulu).

What This Means for Research Workflows

By showing that careful retrieval, ranking, and citation handling can substantially improve trustworthiness in scientific contexts, this work provides a foundation for AI systems that "show their work" rather than simply sounding convincing. Researchers can now inspect, validate, and extend these open-source tools for their own domains.