What Are the Data Governance Requirements in ISO 42001?

Team CertBetter

Why Data Governance Sits at the Heart of ISO 42001

If you are building an AI management system under ISO 42001, data governance is not a side topic you can address with a quick policy document. It is woven into the core of what the standard requires. The reason is straightforward: AI systems are only as trustworthy as the data that trains and drives them. Poor data quality, uncontrolled data sources, and undocumented data lineage are among the leading causes of AI failures, biased outputs, and regulatory breaches.

ISO 42001, formally ISO/IEC 42001:2023, is the international standard for AI management systems, published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It provides a structured framework for organisations that develop, deploy, or use AI systems to manage risks, demonstrate accountability, and operate responsibly. Data governance sits within that framework as a foundational requirement rather than an optional add-on.

This article walks you through what ISO 42001 actually requires in relation to data governance, what it looks like in practice, and how to build controls that will hold up under audit. If you want a broader introduction to the standard first, our guide to understanding ISO/IEC 42001 for AI management systems is a good starting point.

What ISO 42001 Says About Data

ISO 42001 does not have a single clause titled “data governance.” Instead, data-related requirements are distributed across several clauses and the standard’s normative Annex A controls. Understanding where these requirements live helps you build a complete picture rather than missing critical elements.

Clause 6: Planning and Data Risk

Clause 6 requires you to identify AI-related risks and opportunities. Data risks must be included in this assessment. That means you need to consider risks such as training data that is unrepresentative of the real population, data that was collected without appropriate consent, data that has become stale or outdated, and data pipelines that introduce errors or biases.

These are not theoretical concerns. An AI system trained on historical hiring data that reflects past discrimination will perpetuate that discrimination unless the data risk is identified and treated. Clause 6 requires you to document these risks and plan how you will address them, which feeds directly into the controls you establish around data.

Clause 8: Operational Controls and Data Management

Clause 8 is where the practical data governance work happens. It requires organisations to establish, implement, and control the processes needed to meet AI system requirements. For data, this translates into documented processes for how data is sourced, validated, prepared, stored, and used within AI systems.

Specifically, you need to be able to demonstrate that your data handling processes are planned and controlled, that criteria for data acceptance and rejection are defined, and that records are kept to show the processes were followed. This is where many organisations fall short in early implementation. They have informal practices but no documented process, which means an auditor cannot verify that the right controls are actually in place.

Annex A Controls Related to Data

Annex A of ISO 42001 contains a set of AI-specific controls that organisations should consider implementing. Several of these controls relate directly to data governance. The key ones include:

  • A.7 Data for AI systems: This is the most directly relevant control area. It covers data for the development and enhancement of AI systems, data acquisition, data quality, data provenance, and data preparation.
  • A.4 Resources for AI systems: Requires that the resources an AI system depends on, including its data resources, are documented sufficiently to allow for review and accountability.
  • A.6 AI system life cycle: Includes controls for operation and monitoring, technical documentation, and event logging, which extend oversight of data inputs and outputs beyond development into live operation.
  • A.10 Third-party and customer relationships: Covers supplier responsibilities, which matters whenever data is purchased or licensed from an external provider.

Annex A is normative in that organisations are expected to consider these controls, but the standard uses a Statement of Applicability approach similar to ISO 27001. You document which controls apply to your context, implement the applicable ones, and justify any exclusions. Security and access controls for the data itself are usually handled through your risk treatment and, where you hold it, your ISO 27001 control set. For most organisations deploying AI systems, the data-related controls in A.7 will be applicable.
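To make the Statement of Applicability logic concrete, here is a minimal Python sketch of the check it implies: every excluded control needs a justification, and every applicable control needs a pointer to evidence. The `ControlDecision` structure, the control labels, and the file path are hypothetical names chosen for illustration. They are not taken from the standard.

```python
from dataclasses import dataclass


@dataclass
class ControlDecision:
    control_id: str          # illustrative label, not an official control number
    applicable: bool
    justification: str = ""  # expected when a control is excluded
    evidence: str = ""       # pointer to where implementation evidence lives


def soa_gaps(decisions):
    """Return the gaps an auditor would be likely to question."""
    gaps = []
    for d in decisions:
        if not d.applicable and not d.justification.strip():
            gaps.append(f"{d.control_id}: excluded without justification")
        if d.applicable and not d.evidence.strip():
            gaps.append(f"{d.control_id}: applicable but no evidence recorded")
    return gaps


# Hypothetical example entries
decisions = [
    ControlDecision("DATA-QUALITY", True, evidence="docs/quality-criteria.md"),
    ControlDecision("DATA-PROVENANCE", True),
    ControlDecision("THIRD-PARTY-DATA", False),
]

for gap in soa_gaps(decisions):
    print(gap)
```

A spreadsheet can hold the same information. What matters is that the exclusion and evidence checks can be run on demand rather than reconstructed the week before an audit.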

The Five Core Data Governance Requirements Under ISO 42001

When you map the standard’s requirements to practical obligations, five core data governance areas emerge. Each one needs a documented approach, implemented controls, and evidence that the controls are working.

1. Data Governance Policy

You need a documented policy that establishes your organisation’s approach to managing data in AI systems. This is not the same as a general data management policy or a privacy policy, though it can reference and build on those documents. The AI data governance policy needs to address the specific risks and responsibilities associated with data used to train, validate, test, and operate AI systems.

The policy should define roles and responsibilities, state the organisation’s commitment to data quality and ethical data use, and set out the principles that govern data selection and handling. It does not need to be lengthy. A clear, two-page policy that your team actually understands and follows is far more valuable than a thirty-page document that sits in a shared drive untouched.

2. Data Quality Requirements

ISO 42001 requires that you define and apply data quality criteria for the data used in your AI systems. Quality in this context covers accuracy, completeness, consistency, timeliness, and relevance to the intended use case. You need to document what “good enough” data looks like for each AI system and have a process for checking incoming data against those criteria before it is used.

In practice, this means building data validation steps into your AI development and deployment pipeline. If you are using a third-party dataset, you need to assess it against your quality criteria before incorporating it. If you are collecting data from operational systems, you need controls to detect and handle anomalies, missing values, and inconsistencies.

This requirement has direct implications for organisations that purchase or license datasets from external providers. You cannot simply assume that a commercially available dataset meets your quality requirements. You need to verify it, document your assessment, and keep records of that verification.
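As a rough illustration of an acceptance check, the sketch below tests one quality dimension, completeness, against a documented threshold. The function names, the sample records, and the 2% limit are assumptions made for the example. Your own criteria would come from your documented quality requirements and would normally cover accuracy, consistency, and timeliness as well.

```python
def missing_rate(records, field):
    """Fraction of records where `field` is absent, None, or an empty string."""
    if not records:
        return 1.0  # an empty dataset cannot satisfy a completeness criterion
    missing = sum(1 for r in records if r.get(field) in (None, ""))
    return missing / len(records)


def check_completeness(records, key_fields, max_missing=0.02):
    """Return (accepted, report); each key field must have a missing rate below the limit."""
    report = {f: missing_rate(records, f) for f in key_fields}
    accepted = all(rate < max_missing for rate in report.values())
    return accepted, report


# Hypothetical sample of an incoming dataset
records = [
    {"income": 52000, "postcode": "2000"},
    {"income": None, "postcode": "3000"},
    {"income": 61000, "postcode": "4000"},
    {"income": 48000, "postcode": ""},
]

accepted, report = check_completeness(records, ["income", "postcode"])
print("accepted:", accepted)
print("missing rates:", report)
```

Storing the returned report alongside the dataset version gives you the record of verification that an auditor will ask to see.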

3. Data Provenance and Lineage

Provenance refers to where your data came from. Lineage refers to how it was transformed, processed, and used on its way to your AI system. ISO 42001 requires that you can account for both. This is one of the areas where organisations with informal data practices find themselves most exposed during an audit.

Being able to demonstrate data provenance means you can answer questions like: Where did this training dataset originate? Was it collected with appropriate consent? Does it comply with the relevant data protection laws in the jurisdictions where your AI system operates? Was it obtained from a reliable source, and how do you know?

Data lineage documentation means you can trace what happened to the data after it was collected. Was it anonymised? Was it augmented with synthetic data? Were any records removed or modified, and if so, why and by whom? These questions matter because they determine whether your AI system’s outputs can be trusted and whether you can defend your data practices to a regulator or a client.
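One way to keep provenance and lineage answerable is to store them as a structured record next to the dataset. The following is a minimal sketch under that assumption. The field names, the sample data, and the `data.steward` user are hypothetical, and the SHA-256 fingerprint is simply one convenient way to tie a record to the exact bytes that were assessed.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class LineageStep:
    action: str        # what was done, e.g. "anonymised" or "rows removed"
    reason: str        # why it was done
    performed_by: str  # who did it
    timestamp: str     # when it was done (UTC, ISO 8601)


@dataclass
class ProvenanceRecord:
    dataset_name: str
    source: str                  # where the data originated
    licence_terms: str           # terms that govern its use
    ai_training_permitted: bool  # whether those terms allow AI training
    sha256: str                  # fingerprint of the current dataset contents
    lineage: list = field(default_factory=list)

    def add_step(self, action, reason, performed_by, new_content):
        """Record a transformation and refresh the content fingerprint."""
        self.lineage.append(LineageStep(
            action, reason, performed_by,
            datetime.now(timezone.utc).isoformat(),
        ))
        self.sha256 = hashlib.sha256(new_content).hexdigest()


# Hypothetical dataset and transformation
raw = b"id,age,outcome\n1,34,approved\n2,51,declined\n"
record = ProvenanceRecord(
    dataset_name="loan-history-sample",
    source="internal CRM export (hypothetical)",
    licence_terms="internal use",
    ai_training_permitted=True,
    sha256=hashlib.sha256(raw).hexdigest(),
)

cleaned = b"id,age,outcome\n1,34,approved\n"
record.add_step("rows removed", "duplicate applicant", "data.steward", cleaned)

print(json.dumps(asdict(record), indent=2))
```

Each lineage step answers the "what, why, and by whom" questions directly, and the fingerprint lets you show that the dataset on disk is the one the record describes.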

4. Sensitive Data Handling

ISO 42001 pays specific attention to sensitive data, which includes personal information, health data, financial data, data relating to vulnerable populations, and any data that could cause harm if mishandled. If your AI system uses or produces sensitive data, you need additional controls.

These controls typically include access restrictions, anonymisation or pseudonymisation where feasible, retention limits, and documented approval processes for using sensitive data in AI training or testing. You also need to consider the intersection with privacy law. In Australia, the Privacy Act 1988 and the Australian Privacy Principles impose specific obligations on how personal information is handled. Your ISO 42001 data governance controls need to be consistent with those legal requirements.

For organisations that are also certified to ISO 27701 for privacy information management, there is significant overlap here. Aligning your ISO 42001 sensitive data controls with your existing ISO 27701 framework is an efficient approach that avoids duplication and makes both systems easier to maintain.
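Pseudonymisation is one of the controls mentioned above that is easy to sketch. The example below replaces chosen fields with a keyed hash (HMAC-SHA256) from the Python standard library. The function names, the sample record, and the hard-coded key are assumptions for illustration only. In practice the key would live in a secrets manager, and because whoever holds the key can re-link the values, pseudonymised data generally still needs to be treated as sensitive.

```python
import hashlib
import hmac


def pseudonymise(value, secret_key):
    """Replace an identifier with a keyed hash so it cannot be read directly."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record, sensitive_fields, secret_key):
    """Return a copy of the record with the named fields pseudonymised."""
    return {
        key: pseudonymise(str(value), secret_key) if key in sensitive_fields else value
        for key, value in record.items()
    }


# Hypothetical key and record; never hard-code a real key
demo_key = b"replace-with-a-managed-secret"
applicant = {"email": "jo@example.com", "age": 41, "postcode": "2000"}

safe = pseudonymise_record(applicant, {"email"}, demo_key)
print(safe)
```

Because the same input and key always produce the same output, records can still be joined across datasets without exposing the original identifier.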

5. Data Monitoring During Operation

Data governance under ISO 42001 does not stop when your AI system goes live. The standard requires that you monitor data inputs and outputs during operation to detect issues such as data drift, where the statistical properties of incoming data change over time and cause the AI system’s performance to degrade.

This is a requirement that many organisations overlook during implementation because it requires ongoing operational effort rather than a one-time setup task. You need to define what you are monitoring, how often, what thresholds trigger a review, and what actions you take when issues are detected. These monitoring activities need to be documented and the results recorded.

Data drift is a real problem in production AI systems. A model trained on pre-pandemic consumer behaviour data will perform poorly if deployed without adjustment in a post-pandemic market. Monitoring for drift and having a documented response process is what separates organisations that manage their AI systems responsibly from those that simply deploy and hope for the best.
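A drift check does not need heavy tooling to get started. The sketch below uses the Population Stability Index (PSI), a widely used drift statistic, to compare a training-time baseline with recent production values for one numeric feature. ISO 42001 does not prescribe PSI or any particular metric, so treat the choice of statistic, the bin count, and the review threshold as assumptions. A PSI above roughly 0.25 is a common rule of thumb for a significant shift, not a requirement of the standard.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: baseline at training time, then a shifted sample
baseline = list(range(100))
recent = [x + 50 for x in baseline]

REVIEW_THRESHOLD = 0.25  # assumed trigger for a documented review

score = psi(baseline, recent)
if score > REVIEW_THRESHOLD:
    print(f"PSI {score:.2f} exceeds threshold: raise a drift review")
else:
    print(f"PSI {score:.2f} within threshold")
```

Logging each score with a timestamp and the action taken produces exactly the monitoring record the standard expects you to keep.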

Documenting Your Data Governance Framework

ISO 42001 follows the same harmonised structure (formerly called the High Level Structure) as other ISO management system standards, which means documentation requirements are consistent with what you would find in ISO 9001 or ISO 27001. You need documented information that demonstrates your data governance controls are defined, implemented, and maintained.

At a minimum, your documentation should include a data governance policy for AI systems, a register of the data assets used in each AI system within scope, data quality criteria for each AI system, data provenance records for training and validation datasets, a sensitive data register with associated controls, and monitoring records showing that operational data is being reviewed.

If you are already familiar with how to manage controlled documents under other ISO standards, the same principles apply here. Version control, review cycles, and access controls for your data governance documents are all part of demonstrating a functioning management system.

How Data Governance Connects to AI Risk Management

Data governance and AI risk management are not separate workstreams under ISO 42001. They are deeply connected. The risks you identify in Clause 6 around data directly inform the controls you establish, and the effectiveness of those controls determines whether your risk treatment is actually working.

Consider a practical example. A financial services company deploys an AI system to assess loan applications. The data governance risk assessment identifies that the training data contains historical loan decisions made by human officers who had implicit biases against certain demographic groups. The risk is that the AI system will replicate and potentially amplify those biases.

The data governance controls in response might include reprocessing the training data to remove or rebalance the biased historical decisions, establishing ongoing monitoring of approval rates across demographic groups, and setting a threshold that triggers a human review when the AI system’s decisions deviate from expected patterns. Each of these controls needs to be documented, implemented, and evidenced.
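The monitoring control in this example can be sketched in a few lines. The code below computes approval rates per group and flags a human review when the lowest rate falls below a set fraction of the highest. The group labels, the sample decisions, and the 0.8 ratio are assumptions for illustration. The ratio echoes the "four-fifths" rule of thumb used in some employment contexts and is not a figure from ISO 42001, so your own threshold should come from your risk assessment.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def needs_human_review(decisions, min_ratio=0.8):
    """Flag when the lowest group rate is below min_ratio of the highest rate."""
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return False, rates  # nothing approved for any group; the ratio is undefined
    return min(rates.values()) / highest < min_ratio, rates


# Hypothetical batch of model decisions
decisions = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 5 + [("group_b", False)] * 5
)

flagged, rates = needs_human_review(decisions)
print("rates:", rates)
print("human review required:", flagged)
```

A single ratio is a coarse signal. It is useful as a trigger for review, but the review itself still needs someone to look at why the rates differ.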

This kind of connected thinking between risk and control is what auditors are looking for. It is also what distinguishes ISO 42001 from the NIST AI Risk Management Framework: the NIST framework is voluntary guidance with no certification scheme, whereas ISO 42001 is certifiable, so you must demonstrate to an auditor that your controls are working, not just that you have thought about the risks.

Common Gaps Found During ISO 42001 Audits

Having worked through AI management system implementations, we see certain data governance gaps come up repeatedly during Stage 1 and Stage 2 audits. Being aware of these helps you address them before the auditor finds them.

No Documented Data Quality Criteria

Organisations often have informal standards for what constitutes acceptable data, but nothing written down. If an auditor asks how you determined that a particular dataset was suitable for training your AI system, “we reviewed it and it looked fine” is not an acceptable answer. You need documented criteria and evidence that the data was assessed against them.

Missing Provenance Records for External Datasets

If you downloaded a public dataset or licensed data from a third party, you need records showing where it came from, what terms govern its use, and whether it is permitted to be used for AI training under those terms. Many organisations have not thought carefully about whether their data licensing agreements actually permit AI training use, which creates both a compliance gap and a legal risk.

No Operational Monitoring Plan

The absence of any operational data monitoring is a common finding. Organisations implement controls during development but have no plan for monitoring data quality or drift once the system is in production. This is a straightforward gap to address with a monitoring schedule and defined responsibilities, but it needs to be done before certification.

Sensitive Data in Test Environments

Using real personal data in AI testing and development environments without the same controls that apply in production is a frequent issue. ISO 42001 requires that sensitive data handling controls apply wherever the data is used, not just in the production system.

Practical Steps to Build Your ISO 42001 Data Governance Framework

If you are starting from scratch or assessing your current state against the standard’s requirements, here is a practical sequence to follow.

  1. Inventory your AI systems and their data inputs: You cannot govern what you have not mapped. Start by documenting every AI system within your certification scope and identifying all data sources that feed into each one.
  2. Assess data risks for each system: For each data source, assess the risks around quality, provenance, sensitivity, and bias. Document your assessment and link it to your Clause 6 risk register.
  3. Define data quality criteria: For each AI system, document what acceptable data looks like. Be specific. Vague criteria like “data must be accurate” are not auditable. Specific criteria like “training data must have less than 2% missing values in key fields” are.
  4. Document provenance for existing datasets: Go back to your current datasets and document where they came from, how they were obtained, and what terms apply. Fill any gaps by contacting data providers or sourcing replacement datasets where provenance cannot be established.
  5. Implement sensitive data controls: Apply appropriate controls to any sensitive data used in your AI systems and document those controls in a register.
  6. Build an operational monitoring plan: Define what you will monitor, how often, who is responsible, and what actions follow when issues are detected. Implement the plan and keep records of monitoring activities.
  7. Write your data governance policy: Once you have worked through the above steps, you have the substance to write a meaningful policy rather than a generic one.
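The sequence above can be tracked with a simple inventory that reports what is still outstanding for each AI system. This is a minimal sketch, assuming one record per data source. The class names, fields, and sample system are hypothetical, and a real register would carry more detail such as owners and review dates.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    provenance_documented: bool = False
    quality_criteria_defined: bool = False
    contains_sensitive_data: bool = False
    sensitive_controls_documented: bool = False


@dataclass
class AISystem:
    name: str
    monitoring_plan: bool = False
    sources: list = field(default_factory=list)


def open_actions(system):
    """List the outstanding governance actions for one AI system."""
    actions = []
    for s in system.sources:
        if not s.quality_criteria_defined:
            actions.append(f"{s.name}: define data quality criteria")
        if not s.provenance_documented:
            actions.append(f"{s.name}: document provenance")
        if s.contains_sensitive_data and not s.sensitive_controls_documented:
            actions.append(f"{s.name}: document sensitive data controls")
    if not system.monitoring_plan:
        actions.append(f"{system.name}: build operational monitoring plan")
    return actions


# Hypothetical system within certification scope
system = AISystem("loan-scoring", sources=[
    DataSource("crm-export", provenance_documented=True,
               quality_criteria_defined=True, contains_sensitive_data=True),
    DataSource("public-census", quality_criteria_defined=True),
])

for action in open_actions(system):
    print(action)
```

An empty action list for every system in scope is a reasonable signal that you have the substance needed to write the policy in step 7.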

If you are preparing for your first ISO 42001 audit, our guide on how to prepare for an ISO 42001 Stage 1 audit covers what auditors will look for across all clauses, including data governance.

Getting Help With ISO 42001 Data Governance

ISO 42001 is still a relatively new standard, and finding consultants with genuine hands-on experience implementing its data governance requirements is not straightforward. The standard requires a combination of AI system knowledge, data management expertise, and ISO management system experience that not every consultant brings to the table.

If you are looking for qualified help, CertBetter connects businesses with verified ISO 42001 consultants and accredited certification bodies. You submit one form and receive up to three competing quotes from vetted providers, at no cost to your business. It is a practical way to find consultants who have actually implemented ISO 42001 data governance frameworks rather than those who are learning on your time and budget.

Frequently Asked Questions

Does ISO 42001 require a standalone AI data governance policy?

ISO 42001 requires that your organisation has documented processes and policies governing how data is managed within AI systems. This does not have to be a standalone document if your existing data governance framework adequately covers AI-specific requirements such as data provenance, quality criteria, and sensitive data handling in AI contexts. In practice, most organisations find it cleaner to have a dedicated AI data governance policy or a specific AI addendum to their existing data governance documentation, because the AI-specific requirements are distinct enough to warrant separate treatment.

How do ISO 42001 data governance requirements relate to the Australian Privacy Act 1988?

ISO 42001 data governance requirements and the Australian Privacy Act 1988 operate in parallel. The Privacy Act imposes legal obligations on how personal information is collected, used, stored, and disclosed, including in AI systems. ISO 42001 requires that your data governance controls address applicable legal requirements, which means your ISO 42001 framework must be consistent with your Privacy Act obligations. Where your AI system uses personal information, your data governance documentation should explicitly reference how your controls satisfy the relevant Australian Privacy Principles.

What is data drift, and why does ISO 42001 require monitoring for it?

Data drift occurs when the statistical properties of data inputs to an AI system change over time, causing the system’s performance to degrade or its outputs to become unreliable. ISO 42001 requires operational monitoring of AI systems, which includes monitoring for data drift because it represents an ongoing risk to AI system performance and the reliability of outputs. Without monitoring, an organisation may be unaware that its AI system is producing increasingly poor or biased results due to changes in the data it is receiving, which creates both operational and reputational risk.

Can you use publicly available datasets to train AI systems under ISO 42001?

Yes, but using publicly available datasets does not automatically satisfy ISO 42001 data governance requirements. You still need to document the provenance of the dataset, assess it against your data quality criteria, verify that its terms of use permit AI training applications, and evaluate it for potential biases or quality issues. Many publicly available datasets have known limitations, biases, or licensing restrictions that affect their suitability for AI training. Your data governance process needs to capture this assessment and the decision made, with records retained to demonstrate the assessment was conducted.

How does ISO 42001 data governance overlap with ISO 27001?

There is meaningful overlap between ISO 42001 data governance and ISO 27001 information security management requirements, particularly around access controls, data classification, and incident management for data breaches. Organisations that already hold ISO 27001 certification can map their existing information security controls to the relevant ISO 42001 requirements and avoid duplicating effort. However, ISO 42001 introduces AI-specific data requirements around quality, provenance, and bias that go beyond what ISO 27001 covers, so the two frameworks complement rather than replace each other.

What data governance evidence will an auditor ask for during an ISO 42001 certification audit?

During an ISO 42001 certification audit, you should expect an auditor to request your AI data governance policy, your data asset register for AI systems within scope, documented data quality criteria and records of data assessments against those criteria, provenance records for training and validation datasets, your sensitive data register and associated controls, and operational monitoring records showing that data inputs and outputs are being reviewed. The auditor is looking for evidence that your controls are implemented and working, not just that they are documented. Records of actual monitoring activities, data quality assessments, and any corrective actions taken are essential.

Dilawar Laghari

Hi! I am Dilawar Laghari, founder of CertBetter.

I created CertBetter to help anyone compare ISO certification providers for free.
