Redactomatic is an open source command-line application that de-identifies (removes) personally identifiable information (PII) from contact center conversational data between agents and customers or between bots and customers. It works with transcribed voice calls and chat logs in CSV format.
Many companies have security, privacy, and regulatory standards that require them to mask or remove PII before data can be used and shared for data science purposes, internally or externally. However, few vendors and no open source PII redaction tools comply with data protection and privacy regulations and standards. That is why we created Redactomatic: out of necessity. Redactomatic has been tested with NICE Nexidia and Verint voice transcriptions as well as LivePerson chat logs. However, it will work with most conversation CSV input files from voice transcription, speech-to-text, live chat, chatbot, and IVR vendors. Many large financial institutions and telecoms in the US and around the world use Redactomatic to meet internal and regulatory requirements.
Redactomatic is a multi-pass redaction tool that reads CSV files as input and offers an optional anonymization function. When PII is detected, it is removed and replaced with an entity tag, like [ORDINAL]. If the --anonymization flag is added, Redactomatic will replace PII entity tags with randomized values. For example, [PERSON] might be replaced by the name John. This is useful when sharing datasets that need PII removed but also need realistic-looking values.
The first redaction pass uses the spaCy 3 named entity recognition library. You have the option of using the large spaCy model if you add the --large command-line parameter, which will increase the number of correctly recognized PII entities but will also take longer. After the spaCy NER pass, the subsequent passes use regular expressions. Multiple passes are needed because machine learning libraries like spaCy are not fully reliable and cannot catch all PII on their own, which is not acceptable for financial services and other regulated industries. While large companies use this tool for mission-critical applications, please test and validate the results before using it in production, and report any anomalies to the authors.
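To illustrate the idea of regex passes followed by optional anonymization, here is a minimal Python sketch. It is not Redactomatic's actual code: the patterns, the replacement pools, and the function names `redact` and `anonymize` are assumptions made up for this example, and the spaCy NER pass that would run first is left out so the sketch has no dependencies beyond the standard library.

```python
import random
import re

# Hypothetical regex rules for the passes that follow NER.
# These are illustrative patterns, not Redactomatic's own rules.
REGEX_PASSES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("ORDINAL", re.compile(r"\b\d+(?:st|nd|rd|th)\b", re.IGNORECASE)),
]

# Hypothetical replacement pools, used only when anonymizing.
FAKE_VALUES = {
    "PERSON": ["John", "Maria", "Wei"],
    "EMAIL": ["user@example.com"],
    "PHONE": ["555-010-0199"],
    "ORDINAL": ["first", "second", "third"],
}

# Matches entity tags such as [PERSON] or [ORDINAL].
TAG_PATTERN = re.compile(r"\[([A-Z]+)\]")


def redact(text):
    """Run each regex pass in order, replacing matches with an entity tag."""
    for label, pattern in REGEX_PASSES:
        text = pattern.sub(f"[{label}]", text)
    return text


def anonymize(text, rng):
    """Replace known entity tags with randomly chosen stand-in values."""
    def pick(match):
        pool = FAKE_VALUES.get(match.group(1))
        # Leave tags without a replacement pool untouched.
        return rng.choice(pool) if pool else match.group(0)
    return TAG_PATTERN.sub(pick, text)


if __name__ == "__main__":
    line = "Email me at jane.doe@mail.com or call 212-555-0147 about my 2nd order."
    redacted = redact(line)
    print(redacted)
    print(anonymize(redacted, random.Random(0)))
```

Keeping redaction and anonymization as separate steps mirrors the description above: tags are always produced first, and randomized values are substituted only when anonymization is requested.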
Copyright © 2021 Redactomatic - All Rights Reserved.