MAP 2.3 - TEVV Documentation

The NIST AI RMF playbook companion states:

MAP 2.3

Scientific integrity and TEVV considerations are identified and documented, including those related to experimental design, data collection and selection (e.g., availability, representativeness, suitability), and construct validation.

About

Many AI system risks can be traced to insufficient testing and evaluation processes. For example, machine learning requires large-scale datasets, and the difficulty of finding the “right” data may lead AI actors to select datasets based more on accessibility and availability than on suitability. Such decisions may mean that the data used is not fully representative of the populations or phenomena being modeled, introducing downstream risks.

Other risks arise when selected datasets and/or attributes within datasets are not good proxies, measures, or predictors for operationalizing the phenomenon that the AI system intends to support or inform. Practices such as dataset reuse may also lead to data becoming disconnected from the social contexts and time periods of their creation. Datasets may also present security concerns or be polluted by bad actors in an attempt to alter system outcomes.

Collected data may differ significantly from what occurs in the real world. Large-scale datasets used in AI systems often do not include representation of people who have been historically excluded. This may have a disproportionately negative impact on Black, Indigenous, and people of color, women, LGBTQ+ individuals, people with disabilities, or people with limited access to computer network technologies.
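The representativeness concern above can be made checkable. The sketch below flags groups whose share of a dataset falls well below their share of a reference population; the group names, counts, and the 0.5 tolerance are hypothetical placeholders, not values from this guidance.

```python
# Sketch: flag groups that are underrepresented in a dataset relative to a
# reference population. Substitute real census or domain benchmarks in
# practice; everything here is illustrative.

def underrepresented(dataset_counts, reference_shares, tolerance=0.5):
    """Return groups whose share of the dataset is less than
    `tolerance` times their share of the reference population."""
    total = sum(dataset_counts.values())
    flagged = []
    for group, ref_share in reference_shares.items():
        data_share = dataset_counts.get(group, 0) / total
        if data_share < tolerance * ref_share:
            flagged.append(group)
    return flagged

# Hypothetical example: group "C" makes up 20% of the reference
# population but only 2% of the collected data.
counts = {"A": 500, "B": 480, "C": 20}
shares = {"A": 0.4, "B": 0.4, "C": 0.2}
print(underrepresented(counts, shares))  # → ['C']
```

A check like this does not establish suitability on its own, but it turns "is the data representative?" into a question with a documented, repeatable answer.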

Actions
  • Document assumptions made and techniques used during the selection, curation, preparation, and analysis of data, and when identifying constructs and proxy targets and developing indices – especially when seeking to measure concepts that are inherently unobservable (e.g., “hireability,” “criminality,” “lendability”).

  • Map adherence to policies that address data and construct validity, bias, privacy, and security for AI systems, and verify documentation, oversight, and processes.

  • Establish processes and practices that employ experimental design techniques for data collection, selection, and management practices.

  • Establish practices to ensure data used in AI systems is linked to the documented purpose of the AI system (e.g., by causal discovery methods).

  • Establish and document processes to ensure that test and training data lineage is well understood and traceable, and that metadata resources are available for mapping risks.

  • Document known limitations, associated risk mitigation efforts, and methods used for training data collection, selection, labeling, cleaning, and analysis (e.g., treatment of missing, spurious, or outlier data; biased estimators).

  • Establish and document practices to check for capabilities that are in excess of those that are planned for, such as emergent properties, and to revisit prior risk management steps in light of any new capabilities.

  • Establish processes to test and verify that design assumptions about the set of deployment contexts continue to be accurate and sufficiently complete.

  • Work with domain experts to:

    • Gain and maintain contextual awareness and knowledge about how human behavior is reflected in datasets, organizational factors and dynamics, and society.

    • Identify participatory approaches for responsible Human-AI configurations and oversight tasks, taking into account sources of cognitive bias.

    • Identify techniques to manage and mitigate sources of bias (systemic, computational, human-cognitive) in computational models and systems, and the assumptions and decisions in their development.

  • Follow standard statistical principles and document the extent to which the proposed technology does not meet standard validation criteria.

  • Investigate and document potential negative impacts due to supply chain issues that may conflict with organizational values and principles.
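Several of the actions above (documenting assumptions, linking data to the system's purpose, and keeping lineage traceable) can be supported by a lightweight record kept alongside each dataset. The sketch below is one possible shape, assuming Python tooling; the field names and example values are illustrative, not a NIST-prescribed schema.

```python
# Sketch: a minimal "data card" record tying a training dataset to its
# documented purpose and lineage, with a content hash so later audits can
# verify the exact bytes that were used.
import hashlib
from dataclasses import dataclass, field

@dataclass
class DataCard:
    name: str
    purpose: str                 # documented purpose of the AI system
    collection_period: str       # e.g., "2021-01 to 2022-06"
    known_limitations: list = field(default_factory=list)
    content_sha256: str = ""

    def fingerprint(self, raw_bytes: bytes) -> None:
        """Record a hash of the dataset bytes for traceability."""
        self.content_sha256 = hashlib.sha256(raw_bytes).hexdigest()

# Hypothetical dataset and values:
card = DataCard(
    name="loan_applications_v3",
    purpose="credit risk scoring",
    collection_period="2021-01 to 2022-06",
    known_limitations=["rural applicants underrepresented"],
)
card.fingerprint(b"...dataset bytes...")
print(card.content_sha256[:12])  # first 12 hex digits of the fingerprint
```

Because the hash changes whenever the bytes change, re-running `fingerprint` at audit time confirms whether the documented dataset is the one actually used.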
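The treatment of missing, spurious, or outlier data called for above can also be documented as executable steps. Below is a minimal sketch using a crude index-based interquartile-range (IQR) rule and median imputation; the 1.5 multiplier and the toy values are illustrative choices, not requirements from this guidance.

```python
# Sketch of one documented cleaning step: median-impute missing values,
# then drop points outside the IQR fences. The quartile estimate here is
# a simple index-based approximation, adequate for a sketch.
import statistics

def clean(values, k=1.5):
    """Return (cleaned, outliers) after imputation and IQR filtering."""
    observed = sorted(v for v in values if v is not None)
    q1 = observed[len(observed) // 4]
    q3 = observed[(3 * len(observed)) // 4]
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    med = statistics.median(observed)
    filled = [med if v is None else v for v in values]
    outliers = [v for v in filled if not (lo <= v <= hi)]
    cleaned = [v for v in filled if lo <= v <= hi]
    return cleaned, outliers

data = [10, 12, 11, None, 13, 200, 12]   # toy measurements
cleaned, outliers = clean(data)
print(outliers)  # → [200]
```

Recording which points were imputed or dropped, and why, is exactly the kind of limitation documentation the action above asks for.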

Transparency and Documentation

Organizations can document the following:

  • Are there any known errors, sources of noise, or redundancies in the data?

  • Over what time-frame was the data collected? Does the collection time-frame match the creation time-frame?

  • What is the variable selection and evaluation process?

  • How was the data collected? Who was involved in the data collection process? If the dataset relates to people (e.g., their attributes) or was generated by people, were they informed about the data collection? (e.g., datasets that collect writing, photos, interactions, transactions, etc.)

  • As time passes and conditions change, is the training data still representative of the operational environment?

  • Why was the dataset created? (e.g., were there specific tasks in mind, or a specific gap that needed to be filled?)

  • How does the entity ensure that the data collected are adequate, relevant, and not excessive in relation to the intended purpose?
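One way to answer the question above about whether training data remains representative of the operational environment is to compute a drift statistic periodically. The sketch below uses the population stability index (PSI); the bin shares and the 0.2 alert threshold are common rules of thumb, not values from this guidance.

```python
# Sketch: population stability index (PSI) over matched bins. Larger
# values indicate more shift between training-time and live data.
import math

def psi(train_shares, live_shares, eps=1e-6):
    """PSI = sum over bins of (t - l) * ln(t / l), with eps smoothing."""
    return sum(
        (t - l) * math.log((t + eps) / (l + eps))
        for t, l in zip(train_shares, live_shares)
    )

train = [0.25, 0.50, 0.25]   # hypothetical bin shares at training time
live  = [0.10, 0.40, 0.50]   # shares observed later in production
score = psi(train, live)
print(score > 0.2)  # → True: significant shift; revisit representativeness
```

Logging such a score on a schedule gives the documentation question a dated, quantitative answer rather than a one-time judgment.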
