Regulatory frameworks, deployment domains, and model selection: three necessary conditions for privacy detection in regulated environments
By Juan F. Cobo · May 28, 2026
The insufficiency of regulatory compliance as a selection criterion
When a technical or compliance team prepares to deploy a system that processes sensitive information in a regulated environment, the first question is almost always: which normative framework applies? That question is correct. Identifying the applicable framework is the necessary starting point. The mistake is treating it as sufficient.
A normative framework defines, at the level of legal obligation, which categories of information require protection and under what conditions. HIPAA enumerates the identifier classes that must be removed before health information can be treated as de-identified. The GDPR defines personal data by its function: whether it enables the identification of a natural person in context. Argentina's Law 25,326, which predates the GDPR but draws on the same European data protection tradition, follows that functional logic and adds interpretive layers through guidance from the AAIP (Agencia de Acceso a la Información Pública, the authority that enforces the law). The EU AI Act does not define sensitive data categories but creates governance obligations for systems that process sensitive data in high-risk contexts. Each framework answers a different version of the same question, using a different logic.
What no framework answers is the operational question: of the categories the applicable regulation requires protecting, which ones actually appear in this system's data flows? And of those, which can the candidate model reliably detect under the conditions of this deployment?
These are three distinct questions. Each requires a different kind of knowledge. The normative framework provides the legal taxonomy. The deployment domain determines which elements of that taxonomy are operationally present. The model's properties — its entity vocabulary and its benchmarking methodology — determine whether the detection task can be performed with the sensitivity the regulatory context demands. Failing to address any one of them is not a partial compliance gap. It is a structural gap that the others cannot compensate for.
This article examines each condition in turn, describes the four normative frameworks that most commonly govern AI deployments in regulated sectors, and concludes with the criteria a rigorous model selection methodology must satisfy to close all three gaps.
Four frameworks, four logics
The four normative frameworks most commonly referenced in AI governance decisions for regulated sectors — HIPAA, the GDPR, Argentina's Law 25,326, and the EU AI Act — do not constitute a unified taxonomy of sensitive information. They define sensitivity using incompatible logics. That incompatibility is not incidental. It reflects the different legal traditions, institutional contexts, and enforcement objectives that produced each framework. For model selection purposes, the implication is direct: there is no single reading of "what must be protected" that a model can implement across regulatory contexts. Each deployment requires framework-specific analysis.
HIPAA: definition by enumeration
The Health Insurance Portability and Accountability Act's Safe Harbor method defines de-identification by enumeration: it lists the identifier categories that must be removed before health information can be treated as de-identified. The list comprises 18 categories: names, geographic subdivisions smaller than a state, all elements of dates (other than year) directly related to an individual, telephone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate or license numbers, vehicle identifiers and serial numbers, device identifiers and serial numbers, web URLs, IP addresses, biometric identifiers, full-face photographs and comparable images, and any other unique identifying number, characteristic, or code. Compliance is binary: all 18 categories must be addressed.
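Because the Safe Harbor list is closed, checking whether a pipeline addresses it reduces to set arithmetic. The Python sketch below encodes the 18 categories under short labels of our own choosing (hypothetical, not HHS terminology) and reports which ones a de-identification pipeline leaves unaddressed. It models only the enumeration; Safe Harbor also requires that the covered entity have no actual knowledge that the remaining information could identify the individual, which no set check can capture.

```python
# Hypothetical short labels for the 18 Safe Harbor identifier categories.
SAFE_HARBOR_CATEGORIES = frozenset({
    "names",
    "geographic_subdivisions_smaller_than_state",
    "dates_related_to_individual",
    "telephone_numbers",
    "fax_numbers",
    "email_addresses",
    "social_security_numbers",
    "medical_record_numbers",
    "health_plan_beneficiary_numbers",
    "account_numbers",
    "certificate_license_numbers",
    "vehicle_identifiers_serial_numbers",
    "device_identifiers_serial_numbers",
    "web_urls",
    "ip_addresses",
    "biometric_identifiers",
    "full_face_photographs",
    "other_unique_identifiers",
})
assert len(SAFE_HARBOR_CATEGORIES) == 18


def unaddressed_categories(handled: set) -> set:
    """Return the Safe Harbor categories a pipeline does not handle."""
    unknown = handled - SAFE_HARBOR_CATEGORIES
    if unknown:
        raise ValueError(f"Not Safe Harbor categories: {sorted(unknown)}")
    return set(SAFE_HARBOR_CATEGORIES - handled)


def meets_safe_harbor_enumeration(handled: set) -> bool:
    # Binary: passes only when every one of the 18 categories is handled.
    return not unaddressed_categories(handled)


# Usage: a pipeline that covers everything except two categories.
handled = set(SAFE_HARBOR_CATEGORIES) - {"fax_numbers", "biometric_identifiers"}
print(sorted(unaddressed_categories(handled)))  # ['biometric_identifiers', 'fax_numbers']
print(meets_safe_harbor_enumeration(handled))   # False
```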
The strength of this approach is clarity. The list is closed and specific. The weakness is that it was constructed from the clinical and hospital context: the types of identifying information that appear in medical records and administrative claims. Applied to other deployment types, the most operationally significant sensitive entities may not be the ones the list emphasizes. A system handling financial transactions or account management queries may encounter account numbers and Social Security numbers far more frequently than medical record numbers. HIPAA requires protecting all of them, but its taxonomic emphasis does not reflect their relative presence in non-clinical data flows.
GDPR: definition by function
The General Data Protection Regulation defines personal data as any information relating to an identified or identifiable natural person (Article 4(1)). Identifiability depends on context: whether a given piece of information, alone or in combination with other available data, is sufficient to single out an individual in a specific situation. Recital 26 clarifies that objective factors such as the cost of and time required for identification, and the technology available at the time of processing, are relevant to the determination.
This functional definition is more flexible than enumeration — it can accommodate new types of identifying information without legislative amendment. It is also more demanding: it transfers to the data controller the responsibility of determining what is identifying in each operational context, given the specific data flows, the population involved, and the technical environment. That contextual evaluation is precisely what a generic detection model does not perform. A model trained on general benchmarks has no knowledge of whether a date of birth combined with a postal code and a clinical site constitutes identifying information in a given deployment. The GDPR requires that determination to be made. It does not make it.
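A small example makes the point concrete. The sketch below uses a k-anonymity-style equivalence-class count, a technique swapped in here purely for illustration and not the GDPR's legal test, over hypothetical records with hypothetical field names. Every individual field value is shared by at least two records, so a detector that tags fields in isolation sees nothing unusual; one record is nonetheless unique on the combination.

```python
from collections import Counter

# Hypothetical quasi-identifier fields; deciding which fields matter is itself
# the contextual judgment the GDPR leaves to the controller.
QUASI_IDENTIFIERS = ("date_of_birth", "postal_code", "clinical_site")


def combination_key(record, fields=QUASI_IDENTIFIERS):
    return tuple(record[f] for f in fields)


def records_singled_out(records, k=2, fields=QUASI_IDENTIFIERS):
    """Return records whose combination of fields is shared by fewer than k records."""
    sizes = Counter(combination_key(r, fields) for r in records)
    return [r for r in records if sizes[combination_key(r, fields)] < k]


# Hypothetical records.
records = [
    {"date_of_birth": "1980-03-14", "postal_code": "1425", "clinical_site": "A"},
    {"date_of_birth": "1980-03-14", "postal_code": "1425", "clinical_site": "A"},
    {"date_of_birth": "1980-03-14", "postal_code": "1406", "clinical_site": "B"},
    {"date_of_birth": "1975-11-02", "postal_code": "1406", "clinical_site": "B"},
    {"date_of_birth": "1975-11-02", "postal_code": "1406", "clinical_site": "B"},
]

# Each value of each field appears in at least two records...
for name in QUASI_IDENTIFIERS:
    assert min(Counter(r[name] for r in records).values()) >= 2

# ...yet the combination singles out one record (the third one).
print(records_singled_out(records))
```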
Argentina's Law 25,326 / AAIP: functional definition with local interpretation layers
Argentina's Personal Data Protection Law follows the functional logic of the European data protection model that the GDPR also embodies: personal data is defined as information of any kind referring to identified or identifiable natural persons or legal entities. The law distinguishes between ordinary personal data and sensitive data, the latter defined as personal data revealing racial or ethnic origin, political opinions, religious, philosophical, or moral convictions, trade union membership, and information concerning health or sexual life.
The operative layer for AI deployments is not the text of the law itself but the AAIP's Resolution 4/2019 and subsequent guidance, which introduced specific criteria for evaluating privacy risks in automated data processing. Treating the law's text as the complete normative reference, without incorporating the AAIP's interpretive layer, is analogous to evaluating a model against general benchmark metrics while ignoring domain-specific performance gaps. Both approaches miss the layer where operational precision lives.
EU AI Act: governance obligations, not data taxonomy
The EU AI Act does not define what constitutes personal or sensitive data. That question is answered by the GDPR, which the AI Act presupposes and explicitly references. What the AI Act introduces, and what is functionally relevant for model selection in regulated AI deployments, is a governance layer: obligations concerning how AI systems in high-risk categories are documented, evaluated, and justified before and during deployment.
High-risk AI systems, including those used in critical infrastructure, employment, access to essential private and public services, and law enforcement, are subject to requirements covering risk management, data governance, technical documentation, record-keeping, transparency, human oversight, and accuracy, robustness, and cybersecurity. For systems that process sensitive personal data in these contexts, the selection of a privacy detection model is not an internal technical decision. It is a point of regulatory accountability. The methodology used to select the model, the criteria applied, and the evidence supporting the choice are all subject to audit. A selection process that cannot be documented and defended does not meet the AI Act's governance standard, regardless of the model's reported performance.
The deployment domain: where the relevance gap becomes visible
The normative framework defines, at the level of legal obligation, which categories of information require protection. It does not determine which of those categories are present in a specific system's operational data flows.
That determination depends on the deployment domain — the type of system, the interactions it handles, the population it serves, and the nature of the data it processes. Two deployments in the same regulated sector, governed by the same normative framework, can have entirely different sensitivity profiles depending on what the system does. A clinical documentation system and an administrative service channel for health plan beneficiaries may both be governed by health data regulations. The categories that actually appear in their respective data flows differ substantially. So does the consequence of a missed detection.
This gap between the abstract regulatory taxonomy and the operational sensitivity profile exists by construction. Normative frameworks are designed to be general — they define obligations that apply across a sector, not within a specific system. Deployment contexts are particular by nature — they reflect specific interaction types, data structures, and user populations. Bridging that gap requires analysis of the actual data flows in the operational context: which categories appear, at what frequency, and with what regulatory consequence if a detection fails.
The domain does not change what the normative framework requires. It determines which requirements are operationally critical — and therefore which model capabilities and which performance thresholds are non-negotiable for a given deployment.
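One way to make this analysis concrete is to annotate a representative sample of the system's actual traffic and compute, for each normatively required category, how often it appears and how much a miss would cost. The sketch below assumes such an annotated sample exists and that the compliance team has assigned a consequence weight to each category; both the weights and the frequency-times-weight score are hypothetical heuristics, not a regulatory formula. A category with zero observed frequency stays in the output: the domain changes which requirements are critical, not which ones exist.

```python
from collections import Counter


def sensitivity_profile(annotated_sample, consequence_weight):
    """Rank required categories by observed frequency times consequence of a miss.

    annotated_sample: list of sets, each holding the categories present in one
        sampled record from the deployment's real data flows.
    consequence_weight: dict mapping each normatively required category to a
        hypothetical weight (higher = worse if missed). Only these categories
        are scored; anything else observed in the sample is ignored here.
    """
    n = len(annotated_sample)
    # Each record is a set, so this counts records containing each category.
    counts = Counter(cat for cats in annotated_sample for cat in cats)
    profile = []
    for cat, weight in consequence_weight.items():
        frequency = counts[cat] / n if n else 0.0
        profile.append((cat, frequency, frequency * weight))
    # Highest operational criticality first.
    return sorted(profile, key=lambda row: row[2], reverse=True)


# Hypothetical sample from an account management channel.
sample = [
    {"account_numbers", "names"},
    {"account_numbers"},
    {"names", "medical_record_numbers"},
    {"account_numbers", "social_security_numbers"},
]
weights = {"account_numbers": 2, "names": 1, "medical_record_numbers": 3,
           "social_security_numbers": 5, "fax_numbers": 1}

for cat, freq, score in sensitivity_profile(sample, weights):
    print(f"{cat:<25} frequency={freq:.2f} score={score:.2f}")
```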
The model: taxonomy mismatch and benchmark asymmetry
Knowing the applicable normative framework and understanding the deployment domain establishes what must be detected and where. It does not establish whether a candidate model can perform that detection with the sensitivity the context demands. That requires examining two distinct properties of the model: its entity taxonomy and the methodology used to benchmark its performance.
Taxonomic mismatch
A model's entity taxonomy — the set of categories it is designed to detect — and the normative taxonomy defined by the applicable framework are two distinct vocabularies. They do not map cleanly onto each other.
A model may declare support for "identifier types" or "personal data categories" without its entity definitions corresponding to the specific classes a given regulation requires. HIPAA's "device identifiers and serial numbers" and a model's "device ID" entity may overlap substantially or may not — depending on how each was defined and what training data was used. The GDPR's contextual definition of personal data cannot be directly implemented as a model entity class; it requires an operational interpretation that the model's taxonomy may or may not reflect.
Coverage declarations — statements that a model supports detection of a given number of personal data categories — do not resolve this mapping problem. They describe the model's vocabulary, not its alignment with any specific normative framework. Evaluating taxonomic coverage requires a comparison between the domain-required sensitive classes and the model's actual entity definitions, not its declared category count.
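The comparison can be recorded explicitly. In the sketch below, a reviewer has read the model's entity definitions against each domain-required class and marked the correspondence as aligned, partial, or missing; the class names, model labels, and three-level scale are all hypothetical. The report places the declared category count next to the aligned fraction, because the two can diverge sharply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alignment:
    """Reviewer's judgment of how a model entity corresponds to a normative class."""
    model_entity: Optional[str]  # model label judged to correspond, if any
    status: str                  # "aligned", "partial", or "missing" (hypothetical scale)


def coverage_report(required_classes, alignments, declared_vocabulary):
    """Measure coverage against domain-required classes, not declared category count."""
    buckets = {"aligned": [], "partial": [], "missing": []}
    for cls in required_classes:
        a = alignments.get(cls, Alignment(None, "missing"))
        # A mapping only counts if the model actually declares the entity it points to.
        if a.status in ("aligned", "partial") and a.model_entity in declared_vocabulary:
            buckets[a.status].append(cls)
        else:
            buckets["missing"].append(cls)
    total = len(required_classes)
    return {
        "declared_category_count": len(declared_vocabulary),
        **buckets,
        "aligned_fraction": len(buckets["aligned"]) / total if total else 1.0,
    }


# Hypothetical model vocabulary: 40 declared labels.
vocabulary = {f"ENTITY_{i}" for i in range(38)} | {"DEVICE_ID", "PHONE"}
required = ["device_identifiers_serial_numbers", "telephone_numbers",
            "health_plan_beneficiary_numbers"]
alignments = {
    "device_identifiers_serial_numbers": Alignment("DEVICE_ID", "partial"),
    "telephone_numbers": Alignment("PHONE", "aligned"),
}

report = coverage_report(required, alignments, vocabulary)
print(report["declared_category_count"], report["missing"])
# 40 ['health_plan_beneficiary_numbers']
```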
Benchmark asymmetry
Standard aggregate metrics, F1 in particular, treat false positives and false negatives as equivalent costs within a symmetric optimization objective. In regulated privacy detection contexts, the costs are not equivalent.
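The symmetry is visible in the formulas. Writing TP, FP, and FN for true positives, false positives, and false negatives, precision P ignores false negatives, recall R ignores false positives, and F1 weighs the two error types identically, so exchanging one false negative for one false positive leaves it unchanged. The F-beta family is shown for comparison as one standard option, not as something any framework prescribes: it weighs false negatives beta-squared times as heavily as false positives (four times for beta = 2).

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\
F_1 &= \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} \\
F_\beta &= \frac{(1 + \beta^2)\,PR}{\beta^2 P + R}
         = \frac{(1 + \beta^2)\,TP}{(1 + \beta^2)\,TP + \beta^2\,FN + FP}
\end{aligned}
```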
A false positive flags information as sensitive when it is not. The consequence is unnecessary redaction and reduced data utility. A false negative fails to flag information that is sensitive. The consequence is that protected information remains exposed: a compliance failure carrying regulatory, financial, and reputational costs that downstream controls may be unable to remedy.
A model optimized for aggregate F1 may reach a high score while accepting more false negatives in exchange for fewer false positives, or while missing much of a rare class that contributes little to the aggregate. That tradeoff is acceptable in many NLP applications. In privacy detection for regulated environments, it is not: the regulatory cost of a false negative cannot be recovered through the same mechanism as the operational cost of a false positive. As Hartman et al. (2020) note in the context of clinical note de-identification, high sensitivity is a prerequisite rather than a tradeoff.
A benchmark methodology built on symmetric, aggregate metrics does not reveal this asymmetry. A high aggregate score does not indicate that the model's false negative rate on the most critical categories meets the threshold the deployment requires, and a high precision score says nothing about false negatives at all, since precision is computed without them. Either indicates only that the model performs well on the metric it was optimized for, which may or may not correspond to the performance criterion that matters in the operational context.
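The masking effect is easy to reproduce with hypothetical counts. In the sketch below, a rare but critical class (Social Security numbers, 30 gold instances) is detected with a recall of 0.4, while the micro-averaged F1 across all classes stays above 0.96. Switching the aggregate to F2 does not expose the failure either, which is why per-class recall has to be read directly. All counts are invented for illustration.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from raw counts; beta > 1 weights false negatives more than false positives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


def evaluate(counts: dict, beta: float = 1.0):
    """counts maps class -> (tp, fp, fn). Returns micro-averaged F-beta and per-class recall."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    per_class_recall = {cls: recall(c[0], c[2]) for cls, c in counts.items()}
    return f_beta(tp, fp, fn, beta), per_class_recall


# Hypothetical evaluation counts: (true positives, false positives, false negatives).
counts = {
    "names": (950, 30, 20),
    "dates": (900, 40, 30),
    "social_security_numbers": (12, 0, 18),
}

micro_f1, recalls = evaluate(counts)
micro_f2, _ = evaluate(counts, beta=2.0)
print(round(micro_f1, 3))                  # 0.964
print(round(micro_f2, 3))                  # 0.965
print(recalls["social_security_numbers"])  # 0.4
```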
What a rigorous selection methodology must address
The three conditions examined here — the applicable normative framework, the deployment domain, and the model's taxonomic and benchmarking properties — each constrain model selection in a way the others cannot substitute for. A methodology that addresses only one or two of them leaves a structural gap that reported performance metrics will not reveal.
A rigorous model selection methodology for privacy detection in regulated environments must satisfy five criteria.
First, it must incorporate the applicable normative framework explicitly: not as background context but as a functional component that determines which sensitive classes are evaluated and how a gap in coverage of any of them affects the compliance posture the deployment can defend.
Second, it must account for the deployment domain by mapping the normative taxonomy against the actual data flows in the operational context, identifying which categories are present, at what density, and with what regulatory consequence if missed.
Third, it must use a performance metric that reflects the higher cost of false negatives relative to false positives, one that does not permit strong overall performance to conceal failures on operationally critical categories.
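A per-class floor is one way to stop an aggregate from concealing a critical failure. The sketch below rejects a candidate if, for any critical class, the lower bound of a Wilson score interval on its recall falls below a floor. The floors and counts are hypothetical, and the Wilson bound is our own choice for handling small test sets, not something the frameworks prescribe. A class with no test instances fails by construction, since its recall cannot be demonstrated.

```python
import math


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion (here, recall)."""
    if total == 0:
        return 0.0
    p = successes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def recall_gate(per_class_counts: dict, recall_floors: dict, z: float = 1.96) -> dict:
    """Return {class: recall lower bound} for every class that falls below its floor.

    per_class_counts: {class: (tp, fn)} measured on domain-representative test data.
    recall_floors: {class: minimum acceptable recall}; hypothetical values that
        would come from the deployment's domain and regulatory analysis.
    """
    failures = {}
    for cls, floor in recall_floors.items():
        tp, fn = per_class_counts.get(cls, (0, 0))  # untested class -> bound of 0
        bound = wilson_lower_bound(tp, tp + fn, z)
        if bound < floor:
            failures[cls] = round(bound, 3)
    return failures


# Hypothetical results: strong on names, weak on a rare class, one class untested.
counts = {"names": (950, 20), "social_security_numbers": (12, 18)}
floors = {"names": 0.95, "social_security_numbers": 0.95, "medical_record_numbers": 0.95}

failures = recall_gate(counts, floors)
print(sorted(failures))  # ['medical_record_numbers', 'social_security_numbers']
```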
Fourth, it must evaluate taxonomic coverage against the domain-required classes, not against the model's declared entity vocabulary. Coverage is a function of alignment between the model's definitions and the normative requirements, not a function of declared category count.
Fifth, it must produce a result that is documentable and defensible before a regulatory review. For deployments subject to the EU AI Act's governance requirements, the selection process itself is an audit point. A methodology that yields a traceable, evidence-based rationale satisfies that requirement. A selection process that reduces to a comparison of vendor-reported benchmarks does not.
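What a traceable rationale contains will depend on the deployment and on the documentation the AI Act requires of the provider; the sketch below is only a hypothetical shape for one, with field names of our own choosing. It ties the decision to the three conditions discussed above (framework, domain, model) and derives the outcome from recorded evidence rather than from a vendor-reported score.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class SelectionRecord:
    """Hypothetical structure for a traceable model selection rationale."""
    candidate_model: str
    normative_framework: str
    deployment_domain: str
    required_classes: list
    unmapped_classes: list          # required classes with no aligned model entity
    metric: str
    recall_floors: dict
    recall_gate_failures: dict      # {class: recall lower bound} below the floor
    evaluation_data: str
    decision_date: str = field(default_factory=lambda: date.today().isoformat())

    @property
    def decision(self) -> str:
        # Reject when any required class is unmapped or fails its recall floor.
        if self.unmapped_classes or self.recall_gate_failures:
            return "rejected"
        return "accepted"

    def to_json(self) -> str:
        payload = asdict(self)
        payload["decision"] = self.decision  # properties are not included by asdict
        return json.dumps(payload, indent=2, sort_keys=True)


record = SelectionRecord(
    candidate_model="candidate-a",
    normative_framework="HIPAA Safe Harbor",
    deployment_domain="administrative service channel for health plan beneficiaries",
    required_classes=["names", "account_numbers", "health_plan_beneficiary_numbers"],
    unmapped_classes=["health_plan_beneficiary_numbers"],
    metric="per-class recall, Wilson lower bound (z = 1.96)",
    recall_floors={"names": 0.95, "account_numbers": 0.98},
    recall_gate_failures={},
    evaluation_data="annotated traffic sample (hypothetical)",
    decision_date="2026-05-28",
)
print(record.decision)  # rejected
```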
References
- U.S. Department of Health and Human Services. Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule. 2025.
- European Union. Regulation (EU) 2016/679 — General Data Protection Regulation. 2016. Articles 4, 9, 35.
- República Argentina. Ley 25.326 de Protección de Datos Personales [Personal Data Protection Law]. 2000.
- Agencia de Acceso a la Información Pública. Resolución 4/2019: Criterios orientadores e indicadores de mejores prácticas en la aplicación de la Ley 25.326 [Guiding criteria and best-practice indicators for the application of Law 25,326]. 2019.
- European Union. Regulation (EU) 2024/1689 — Artificial Intelligence Act. 2024. Articles 9–15.
- Hartman, T. et al. Customization scenarios for de-identification of clinical notes. BMC Medical Informatics and Decision Making, 20, 14. 2020.