Training and Test Dataset (TTD)
AI systems based on ML require both training and test data in order to train and validate the system for its intended behaviour. The data used for training and testing should be appropriate for that intended behaviour in terms of variety and quality, and should be checked to ensure that they are up to date and relevant to the task at hand.
The amount of training and test data required depends on the complexity of the environment and of the intended functionality. For the AI system to have high predictive power, both the training data and the test data should cover a wide variety of feature values. Training and test data may not be available within the business and may instead have to be obtained from outside sources. In that scenario, ensuring the integrity of the data is also necessary.
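As one illustration of an integrity check on externally sourced data, the sketch below compares the SHA-256 digest of a received dataset against a digest published by the supplier. It is a minimal example under stated assumptions, not a prescribed control: the function names `sha256_of_bytes` and `verify_integrity` are hypothetical, and it assumes the supplier provides a trusted digest through a separate channel.

```python
import hashlib
import hmac


def sha256_of_bytes(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, expected_digest: str) -> bool:
    """Check that the data matches the digest supplied with it.

    `expected_digest` is assumed to come from the data supplier via a
    trusted channel (hypothetical setup for this sketch).
    """
    actual = sha256_of_bytes(data)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(actual, expected_digest.strip().lower())


# Example: a small CSV payload standing in for an external dataset.
payload = b"id,label\n1,cat\n2,dog\n"
published_digest = sha256_of_bytes(payload)

print(verify_integrity(payload, published_digest))           # unmodified data
print(verify_integrity(payload + b"3,bird\n", published_digest))  # altered data
```

A matching digest only shows that the data was not altered after the digest was produced; it says nothing about the quality, variety or relevance of the data, which still need to be assessed separately.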
Controls related to this risk category are listed below:
TTD 01 - Data Management Procedures
TTD 02 - Data Collection Assessment
TTD 03 - Dataset Governance Policies
TTD 04 - Dataset Annotations and Labels Information
TTD 05 - Dataset Cleaning Enrichment and Aggregation
TTD 06 - Dataset Description Assumptions and Purpose
TTD 07 - Dataset Transformation Rationale
TTD 08 - Dataset Bias Identification and Mitigation
TTD 09 - Dataset Bias Analysis Action and Assessment