# Data Governance (DG)

This compliance category covers requirements for the collection, management, and use of data in AI-based SaMD.

Robust algorithms typically require large, high-quality, and well-labelled training data sets. According to the [US FDA AI and ML Discussion Paper](https://www.fda.gov/media/109618/download), an organisation developing AI-based SaMD should maintain the following:

> **Data management plan addressing how data will be collected, added to existing data sets, and used:** This data management plan may include a quality assurance (QA) plan for determining which new data are appropriate for inclusion as part of an expanded training data set; an approach to the reference standard determination; a data augmentation strategy that allows for additional training and independent test data to be added; and an auditing and sequestration strategy to monitor, document test dataset independence, and control access to both the training and test datasets as additional data are being included and any revised algorithm is being retrained and tested.
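One way to support the sequestration and auditing part of such a plan is to screen every batch of new data against the sequestered test set before it is added to the training set. The following is a minimal sketch, not a prescribed method: the record identifiers, the `fingerprint` helper, and the `screen_new_records` function are hypothetical names introduced for illustration, and it assumes each record carries a stable identifier.

```python
import hashlib


def fingerprint(record_id: str) -> str:
    """Return a SHA-256 hex digest so raw identifiers need not be stored in an audit log."""
    return hashlib.sha256(record_id.encode("utf-8")).hexdigest()


def screen_new_records(new_ids, sequestered_fingerprints):
    """Split candidate records into accepted and rejected lists.

    A record is rejected when its fingerprint matches one in the sequestered test set.
    """
    accepted, rejected = [], []
    for record_id in new_ids:
        if fingerprint(record_id) in sequestered_fingerprints:
            rejected.append(record_id)
        else:
            accepted.append(record_id)
    return accepted, rejected


# Hypothetical identifiers, for illustration only.
test_ids = ["scan-001", "scan-002"]
sequestered = {fingerprint(r) for r in test_ids}

accepted, rejected = screen_new_records(["scan-002", "scan-003"], sequestered)
print(accepted)  # ['scan-003']
print(rejected)  # ['scan-002']
```

Logging the rejected list on every data update gives a simple record that test dataset independence was checked each time the training set was expanded.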

Similarly, the [US FDA Good Machine Learning Practice (GMLP)](https://www.fda.gov/media/153486/download) guiding principles state:

> **Principle 3. Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population:** Data collection protocols should ensure that the relevant characteristics of the intended patient population (for example, in terms of age, gender, sex, race, and ethnicity), use, and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study and training and test datasets, so that results can be reasonably generalized to the population of interest. This is important to manage any bias, promote appropriate and generalizable performance across the intended patient population, assess usability, and identify circumstances where the model may underperform.
>
> **Principle 4. Training Data Sets Are Independent of Test Sets:** Training and test datasets are selected and maintained to be appropriately independent of one another. All potential sources of dependence, including patient, data acquisition, and site factors, are considered and addressed to assure independence.
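To illustrate Principle 4, the sketch below assigns whole patients to either the training or the test set, so that no patient contributes records to both, and then reports any values of another factor (here, site) that the two sets share. It is an illustrative example under stated assumptions, not a method prescribed by the guidance: the `patient_id` and `site` field names, the sample records, and both helper functions are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Assign every record of a given patient to the same split (train or test)."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    patients = sorted(by_patient)           # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for p in patients if p not in test_patients for r in by_patient[p]]
    test = [r for p in patients if p in test_patients for r in by_patient[p]]
    return train, test


def shared_values(train, test, key):
    """Return the values of `key` that appear in both splits."""
    return {r[key] for r in train} & {r[key] for r in test}


# Hypothetical records, for illustration only.
records = [
    {"patient_id": "p1", "site": "A", "image": "img1"},
    {"patient_id": "p1", "site": "A", "image": "img2"},
    {"patient_id": "p2", "site": "A", "image": "img3"},
    {"patient_id": "p3", "site": "B", "image": "img4"},
    {"patient_id": "p4", "site": "B", "image": "img5"},
]

train, test = split_by_patient(records, test_fraction=0.25)
assert not shared_values(train, test, "patient_id")  # no patient-level leakage
print(shared_values(train, test, "site"))            # sites present in both splits
```

Overlap on a factor such as site is not necessarily a defect, but it is a source of dependence that should be considered and documented. Libraries such as scikit-learn offer comparable group-aware splitters (for example `GroupShuffleSplit`) where a dependency on them is acceptable.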

Data Governance (DG) also aligns with [IMDRF/SaMD N23](https://www.imdrf.org/sites/default/files/docs/imdrf/final/technical/imdrf-tech-151002-samd-qms.pdf), in particular the following sections:

> 7.1 -- Product Planning
>
> 7.3 -- Document Control and Records
>
> 7.4 -- Configuration Management and Control
>
> 8.2 -- Design
>
> 8.4 -- Verification and Validation

The following controls are part of this compliance category:

* [DG01 - Define Sets](/fda-ai-based-samd/data-governance-dg/dg01-define-sets.md)
* [DG02 - Dataset Governance Policies](/fda-ai-based-samd/data-governance-dg/dg02-dataset-governance-policies.md)
* [DG03 - Dataset Design Choices](/fda-ai-based-samd/data-governance-dg/dg03-dataset-design-choices.md)
* [DG04 - Dataset Source Information](/fda-ai-based-samd/data-governance-dg/dg04-dataset-source-information.md)
* [DG05 - Dataset Annotations Information](/fda-ai-based-samd/data-governance-dg/dg05-dataset-annotations-information.md)
* [DG06 - Dataset Labels Information](/fda-ai-based-samd/data-governance-dg/dg06-dataset-labels-information.md)
* [DG07 - Dataset Cleaning](/fda-ai-based-samd/data-governance-dg/dg07-dataset-cleaning.md)
* [DG08 - Dataset Enrichment](/fda-ai-based-samd/data-governance-dg/dg08-dataset-enrichment.md)
* [DG09 - Dataset Aggregation](/fda-ai-based-samd/data-governance-dg/dg09-dataset-aggregation.md)
* [DG10 - Dataset Description, Assumptions and Purpose](/fda-ai-based-samd/data-governance-dg/dg10-dataset-description-assumptions-and-purpose.md)
* [DG11 - Dataset Transformation Rationale](/fda-ai-based-samd/data-governance-dg/dg11-dataset-transformation-rationale.md)
* [DG12 - Dataset Bias Identification](/fda-ai-based-samd/data-governance-dg/dg12-dataset-bias-identification.md)
* [DG13 - Dataset Bias Mitigation](/fda-ai-based-samd/data-governance-dg/dg13-dataset-bias-mitigation.md)
* [DG14 - Dataset Bias Analysis Action and Assessment](/fda-ai-based-samd/data-governance-dg/dg14-dataset-bias-analysis-action-and-assessment.md)
* [DG15 - Dataset Gaps and Shortcomings](/fda-ai-based-samd/data-governance-dg/dg15-dataset-gaps-and-shortcomings.md)
* [DG16 - Dataset Bias Monitoring - Ongoing](/fda-ai-based-samd/data-governance-dg/dg16-dataset-bias-monitoring-ongoing.md)
* [DG17 - Dataset Bias Special/Protected Categories](/fda-ai-based-samd/data-governance-dg/dg17-dataset-bias-special-protected-categories.md)

