Automated Discovery and Integration at Scale
Harvey has introduced The Data Factory, an autonomous pipeline that dramatically reduces the manual work required to add new legal knowledge sources to its platform. Previously, integrating a new corpus, such as Brazilian federal case law, required weeks of manual effort: identifying repositories, building custom connectors, ingesting data, hand-labeling test cases, and recruiting domain experts for quality review. Now, this process is largely automated.
Architecture and Core Components
The system operates through three main stages:
- Intake Engine: Automatically discovers legal sources through jurisdiction mapping and compliance review, converting customer requests and coverage gaps into vetted, pipeline-ready data sources
- Evaluation Pipeline: Tests whether agents can effectively use new sources to solve real legal problems using synthetic scenario generation, production simulation, and multi-agent quality assessment
- Configuration Layer: Defines jurisdictions declaratively with domain lists, filter hierarchies, permissions, and agent-specific instructions
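To make the Configuration Layer concrete, here is a minimal sketch of what a declarative jurisdiction definition could look like. All names here (`JurisdictionConfig`, its fields, and the Brazilian example values) are illustrative assumptions, not Harvey's actual schema:

```python
from dataclasses import dataclass

@dataclass
class JurisdictionConfig:
    """Hypothetical declarative jurisdiction entry (illustrative only)."""
    name: str
    domains: list[str]                      # legal domains covered, e.g. "case_law"
    filter_hierarchy: dict[str, list[str]]  # filters keyed by facet, broad to narrow
    permissions: list[str]                  # customer groups allowed to query
    agent_instructions: str = ""            # guidance injected into agent prompts

    def is_pipeline_ready(self) -> bool:
        # A usable config needs at least one domain and one permission entry.
        return bool(self.domains) and bool(self.permissions)

# Example entry for Brazilian federal case law (values are invented).
brazil = JurisdictionConfig(
    name="br-federal",
    domains=["case_law"],
    filter_hierarchy={"court": ["STF", "STJ", "TRF"]},
    permissions=["all_customers"],
    agent_instructions="Cite rulings by court, case number, and decision date.",
)

print(brazil.is_pipeline_ready())  # prints True
```

The appeal of a declarative layout like this is that adding a jurisdiction becomes a data change rather than a code change, which is what allows an automated pipeline to generate new entries at scale.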
Impact and Scale
Since its launch in August 2025, The Data Factory has expanded Harvey's knowledge coverage from 6 to 60+ jurisdictions and integrated 400+ unique legal data sources. This automation lets the platform respond rapidly to customer needs: when a São Paulo customer requested analysis of a Brazilian federal ruling, a task that would previously have required weeks of preparation, the request was served within 72 hours through sources the automated pipeline had already indexed and validated.