AutoDiscovery: Autonomous Dataset Exploration
AI2 has released AutoDiscovery, an AI-powered research tool that changes how scientists approach data exploration. Traditional AI research tools require researchers to formulate hypotheses first. AutoDiscovery works in reverse: given a dataset, it autonomously generates hypotheses, designs statistical experiments to test them, and surfaces findings that researchers might never have considered.
The system runs overnight to explore structured datasets comprehensively. It delivers a curated list of research directions, each with reproducible code, statistical results, and clear paths for follow-up investigation. AutoDiscovery grew out of a published research project with open-source code and was refined through collaboration with domain experts across multiple disciplines.
Real-World Applications
Oncology: Dr. Kelly Paulson's team at Providence Swedish Cancer Institute used AutoDiscovery to analyze large clinical and genomic datasets from breast cancer and melanoma research. The tool confirmed expected findings, such as the importance of immune activity in melanoma and the relevance of the PI3K pathway in breast cancer. It also surfaced unexpected associations that were not part of the team's initial hypotheses, including strong immune response patterns in melanoma and risk factors for lymph node spread in breast cancer. These associations are now undergoing validation.
Marine Ecology: Researchers at Scripps Institution of Oceanography applied AutoDiscovery to more than 20 years of rocky reef monitoring data from the Gulf of California. Beyond documenting known heatwave impacts, the system identified mechanistic relationships linking productivity across trophic levels. Uncovering these relationships manually would have required extensive iteration. The researchers emphasized that AutoDiscovery's transparency and interpretability are critical for building trust in AI-assisted science.
Social Science: Economist Sanchaita Hazra used AutoDiscovery to explore social and economic datasets. It found that authors with doctoral degrees edited AI-generated abstracts substantially more than authors with undergraduate or master's degrees. This significant finding lay outside her original research focus, which would have missed it. She noted that the tool condensed weeks of PhD-level regression analysis into hours.
Availability and Impact
AutoDiscovery is now available as an experimental feature in AstaLabs, and researchers can sign up for Asta Preview to gain early access. The tool marks a shift in research methodology. It does not replace expert intuition. Instead, it surfaces questions hidden in a dataset so that researchers can focus their effort where it matters most.