Data Collection Policy Update
GitHub has announced updates to its Privacy Statement and Terms of Service that clarify how GitHub Copilot user data will be used for AI model training. Starting April 24, GitHub will collect and use interaction data (including prompts, code snippets, outputs, and associated context) from Copilot Free, Pro, and Pro+ users to develop and improve AI models. The policy applies only to individual accounts; accounts provided by an enterprise or organization remain governed by their existing Data Protection Agreements.
Opting Out and Data Control
Users retain control over their data and can disable its use for AI training in their GitHub settings. GitHub emphasizes the following points:
- Individual opt-out available: Users can opt out at any time, and the setting applies to their own account
- Previous preferences preserved: Users who previously opted out of data collection for product improvements keep that preference
- Enterprise protection: Copilot Business and Copilot Enterprise users are not affected, and their data will not be used for training
- Private repository safeguard: The update does not change access to private repository source code at rest; only interaction data (prompts and suggestions) generated during Copilot use may be collected
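The scope rules above can be read as a small decision procedure. The following is a minimal sketch of that reading, not GitHub's implementation: the names `Plan`, `Account`, `opted_out`, and `may_use_for_training` are hypothetical, and the sketch ignores the effective date and any rules not stated in this summary.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    FREE = "Copilot Free"
    PRO = "Copilot Pro"
    PRO_PLUS = "Copilot Pro+"
    BUSINESS = "Copilot Business"
    ENTERPRISE = "Copilot Enterprise"


# Individual plans that the summary lists as in scope for training.
INDIVIDUAL_PLANS = {Plan.FREE, Plan.PRO, Plan.PRO_PLUS}


@dataclass
class Account:
    plan: Plan
    # Hypothetical flag standing in for the opt-out toggle in GitHub settings.
    opted_out: bool = False


def may_use_for_training(account: Account) -> bool:
    """Return True if interaction data may be used for training
    under the rules as summarized in this section (an assumption)."""
    # Business and Enterprise accounts are excluded regardless of settings.
    if account.plan not in INDIVIDUAL_PLANS:
        return False
    # Individual accounts are in scope unless the user has opted out.
    return not account.opted_out
```

Under this reading, a Copilot Pro account with `opted_out=True` and any Copilot Enterprise account both evaluate to `False`, while a Copilot Free account with default settings evaluates to `True`.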
Privacy Safeguards and Terms Changes
GitHub introduced new contractual language to clarify AI-related terms:
- New Terms of Service provisions: A dedicated Section J now covers AI features, training, and data usage, alongside updates to Section E on private repositories and Section D on user-generated content
- Data minimization commitments: GitHub commits to using the minimum necessary personal data and to applying de-identification and aggregation techniques during training
- No third-party sharing: GitHub will not share inputs or outputs with third-party AI model providers for those providers' own independent training
- Affiliate data handling: Data shared with Microsoft and other GitHub affiliates for AI training is subject to the same opt-out preferences and enterprise protections
These changes reflect GitHub's push to use more developer data for model improvement while offering transparency and user control over that use.