
Personal Data Cleaning

The Personal Data Cleaning section is used to configure rules for detecting and processing personal and sensitive data in user inputs and model outputs. This module helps reduce data leakage risks, ensure compliance with internal policies and regulations, and improve the overall security posture of AI applications.

HiveTrace includes a set of predefined patterns covering common types of personal data. These rules are enabled by default and require no additional configuration.

Base Patterns

| Category | Description |
| --- | --- |
| Address | Mailing and physical addresses |
| Bank Card | Bank card numbers |
| Card Code | CVV / CVC and other security codes |
| Organization | Company and organization names |
| Job Title | Professional titles and roles |
| Names | First and last names |
| Passport | Passport details and document identifiers |
| Phone | Mobile and landline phone numbers |
| Email | Email addresses |
| IP Address | IPv4 and IPv6 addresses |
| Domain Name | Domain names and hosts |
| INN | Taxpayer Identification Number |
| KPP | Tax Registration Reason Code |
| OGRN | Primary State Registration Number |
| OGRNIP | Individual Entrepreneur Registration Number |
| SNILS | Personal insurance account number |
| Monetary Amount | Financial amounts |
| Date | Calendar dates |
| Period | Time ranges and intervals |
| Deadline | Due dates |
| Diagnosis | Medical diagnoses |
| Access Token | API keys and access tokens |
| Link | URLs and web links |

You can also define custom detection patterns using regular expressions. Scroll down and click “Add Pattern”, then provide a name, a description, and a regex pattern, and select the scope: input, output, or both.

Custom Patterns
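To make the form fields concrete, here is a minimal Python sketch of what a custom pattern amounts to. It is not HiveTrace's API: the `CustomPattern` class, its field names, and the `EMP-` employee ID format are all hypothetical and only mirror the name, description, regex, and scope fields described above. Testing a regex this way before pasting it into the form can catch over-broad or over-narrow matches early.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomPattern:
    """Hypothetical mirror of the "Add Pattern" form fields."""
    name: str
    description: str
    regex: str
    scope: str  # "input", "output", or "both"

    def applies_to(self, direction: str) -> bool:
        # A pattern scoped to "both" applies in either direction.
        return self.scope in (direction, "both")

    def find(self, text: str) -> list[str]:
        # Return every substring of `text` that the regex matches.
        return [m.group(0) for m in re.finditer(self.regex, text)]


# Made-up example: internal employee IDs such as "EMP-004217".
employee_id = CustomPattern(
    name="Employee ID",
    description="Internal employee identifiers",
    regex=r"\bEMP-\d{6}\b",
    scope="input",
)

# "EMP-12" has too few digits, so only the first ID matches.
matches = employee_id.find("Ticket opened by EMP-004217 for EMP-12.")
print(matches)
```

The word boundaries (`\b`) and the fixed digit count keep the pattern from matching fragments of longer strings, which is usually what you want for identifier-style data.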

The following processing modes are available for detected personal data:

Processing Modes

| Processing Type | Description |
| --- | --- |
| Masking | Replaces detected data with an anonymized placeholder (for example, XXXX) |
| Detection | Flags the presence of personal data without modifying the content |
| Removal | Completely removes detected data from the message |

Processing rules can be configured independently for input and output.
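The difference between the three modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the module's actual implementation: the `EMAIL` regex is a simplified stand-in for a real detector, and the `process` function, the `RULES` mapping, and the mode strings are hypothetical names chosen for the example.

```python
import re

# Simplified email regex used only for illustration (assumption).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def process(text: str, mode: str, placeholder: str = "XXXX") -> tuple[str, bool]:
    """Return (processed_text, detected) for one processing mode."""
    detected = EMAIL.search(text) is not None
    if mode == "masking":
        # Replace each match with the placeholder.
        return EMAIL.sub(placeholder, text), detected
    if mode == "removal":
        # Delete each match; surrounding whitespace is left as-is.
        return EMAIL.sub("", text), detected
    if mode == "detection":
        # Flag only; the content is not modified.
        return text, detected
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical per-direction configuration: mask inputs, only flag outputs.
RULES = {"input": "masking", "output": "detection"}


def clean(text: str, direction: str) -> tuple[str, bool]:
    return process(text, RULES[direction])


print(clean("Mail bob@example.com now", "input"))
print(clean("Mail bob@example.com now", "output"))
```

Note that removal in this sketch leaves a double space where the address used to be; whether the real module normalizes whitespace after removal is not stated in this document.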

At the bottom of the page, a validation tool lets you test how the data cleaning module processes messages, both with and without personal data, for input and output under the current settings. Use it to verify the configuration before deploying it to production.
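When preparing messages for the validation tool, it helps to write down the expected result for each one in advance, including messages that contain no personal data and should pass through unchanged. The sketch below shows one way to organize such a checklist in Python. The `PHONE` regex, the `MODES` settings, and the `clean` function are hypothetical stand-ins, not the behavior of the actual module.

```python
import re

# Deliberately narrow phone regex for illustration only (assumption).
PHONE = re.compile(r"\+\d{11}\b")

# Assumed settings: mask on input, detection only on output.
MODES = {"input": "masking", "output": "detection"}


def clean(text: str, direction: str) -> str:
    if MODES[direction] == "masking":
        return PHONE.sub("XXXX", text)
    return text  # detection leaves the content unchanged


# (direction, message, expected result), covering positive and negative cases.
CASES = [
    ("input", "Call me at +79001234567", "Call me at XXXX"),
    ("input", "What is the weather today?", "What is the weather today?"),
    ("output", "Your number is +79001234567", "Your number is +79001234567"),
    ("output", "No personal data here.", "No personal data here."),
]

failures = [
    (direction, message, clean(message, direction), expected)
    for direction, message, expected in CASES
    if clean(message, direction) != expected
]

for direction, message, got, expected in failures:
    print(f"[{direction}] {message!r}: got {got!r}, expected {expected!r}")
print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

The negative cases matter as much as the positive ones: a pattern that masks ordinary text is as much a misconfiguration as one that misses real data.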

Figures: Dataclean Testing 1, Dataclean Testing 2