Explainable AI for Credit Risk Decisions
HEALTHCARE | DE-IDENTIFICATION
The Challenge: A healthcare organization needed to use case notes for analytics and model development, but the raw text contained extensive PHI/PII, including names, dates, locations, and medical identifiers. Manual redaction could not scale to the volume of reports, which created compliance risks under HIPAA and PHIPA and blocked downstream AI/ML initiatives.
The Solution: I built an automated NLP-driven de-identification pipeline combining regex pattern matching, Named Entity Recognition (NER) using transformer models, and context-aware scoring to accurately detect and mask sensitive entities. The pipeline used a two-stage hybrid approach: deterministic rules for structured PHI and ML models for contextual PHI. To ensure enterprise readiness, I integrated confidence-threshold tuning, audit logs, and customizable redaction policies for different data-sharing use cases.
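To make the two-stage idea concrete, here is a minimal sketch of how rule-based and model-based detections might be merged, thresholded, masked, and logged. It is an illustration under stated assumptions, not the production pipeline: the regex patterns, the `Span` type, the `toy_ner` stand-in (a fixed lexicon used in place of a transformer NER model), and the default threshold of 0.5 are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str
    score: float
    source: str  # "rule" or "model"


# Stage 1: deterministic rules for structured PHI (illustrative patterns only).
RULES: Dict[str, "re.Pattern[str]"] = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
}


def rule_spans(text: str) -> List[Span]:
    """Rule hits are treated as certain (score 1.0)."""
    return [Span(m.start(), m.end(), label, 1.0, "rule")
            for label, rx in RULES.items() for m in rx.finditer(text)]


def toy_ner(text: str) -> List[Span]:
    """Hypothetical stand-in for a transformer NER model: a fixed lexicon
    with made-up confidence scores, so the sketch runs without a model."""
    lexicon = [("Jane Doe", "NAME", 0.98), ("Toronto", "LOCATION", 0.91),
               ("Hope", "NAME", 0.35)]
    return [Span(m.start(), m.end(), label, score, "model")
            for surface, label, score in lexicon
            for m in re.finditer(re.escape(surface), text)]


def deidentify(text: str,
               ner: Callable[[str], List[Span]] = toy_ner,
               thresholds: Optional[Dict[str, float]] = None,
               default_threshold: float = 0.5) -> Tuple[str, List[dict]]:
    """Return the masked text and an audit log that stores no raw PHI."""
    thresholds = thresholds or {}
    # Stage 2: keep model spans that clear their per-label threshold.
    candidates = rule_spans(text) + [
        s for s in ner(text)
        if s.score >= thresholds.get(s.label, default_threshold)]
    # Resolve overlaps greedily: earliest start, rules first, then longest.
    candidates.sort(key=lambda s: (s.start, s.source != "rule",
                                   -(s.end - s.start)))
    kept: List[Span] = []
    for s in candidates:
        if not kept or s.start >= kept[-1].end:
            kept.append(s)
    # Mask right to left so earlier offsets remain valid.
    masked = text
    for s in reversed(kept):
        masked = masked[:s.start] + f"[{s.label}]" + masked[s.end:]
    audit = [{"label": s.label, "start": s.start, "end": s.end,
              "score": s.score, "source": s.source} for s in kept]
    return masked, audit


if __name__ == "__main__":
    note = ("Jane Doe (MRN: 12345678) seen in Toronto on 2023-04-05. "
            "Hope to follow up at 416-555-0199.")
    masked, audit = deidentify(note)
    print(masked)
    print(len(audit), "entities masked")
```

In this sketch a redaction policy is just the `thresholds` mapping: lowering the `NAME` threshold trades more false positives for higher recall, which is one plausible way to vary policy across data-sharing use cases. The audit log records offsets, labels, and scores but never the matched text, so the log does not itself contain PHI.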
The Impact: Delivered 97%+ entity-level recall across 12 PHI categories while reducing false positives. Enabled safe, compliant data access and cut de-identification time from weeks to minutes, unlocking faster model development cycles and secure cross-team data sharing.
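For readers unfamiliar with the metric, entity-level recall is the fraction of gold-standard PHI entities that the system detected. The sketch below shows one way it could be computed per category and overall. The exact-match criterion (same start, end, and label) and the `entity_recall` helper are assumptions for illustration; the matching rule used in the actual evaluation is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, PHI label)


def entity_recall(gold: Iterable[Entity],
                  predicted: Iterable[Entity]) -> Tuple[Dict[str, float], float]:
    """Per-label and micro-averaged recall under exact span+label matching."""
    pred = set(predicted)
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for ent in gold:
        label = ent[2]
        totals[label] += 1
        if ent in pred:
            hits[label] += 1
    per_label = {label: hits[label] / totals[label] for label in totals}
    n_gold = sum(totals.values())
    micro = sum(hits.values()) / n_gold if n_gold else 0.0
    return per_label, micro


if __name__ == "__main__":
    # Made-up annotations, for illustration only.
    gold = [(0, 8, "NAME"), (20, 28, "NAME"), (33, 40, "DATE")]
    predicted = [(0, 8, "NAME"), (33, 40, "DATE"), (50, 55, "DATE")]
    per_label, micro = entity_recall(gold, predicted)
    print(per_label, round(micro, 3))
```

The extra predicted span at `(50, 55)` does not affect recall; it is a false positive and would instead lower precision, which is why the two are usually reported together for de-identification systems.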