Detecting Healthcare Fraud with ML

MAR 05 25

This is Part 2 of "The Compliance-First Fintech Playbook" series. Part 1 covered healthcare KYC/KYB requirements. Part 3 examines full compliance program costs.

At 2:47 AM, our fraud detection system flagged a $3,200 lab equipment purchase from "MedSupply Express"—the fourth such transaction this week from different dental practices, all using cards issued minutes before the purchase. By 3:15 AM, we had blocked $47,000 in coordinated fraud across 12 practices.

Traditional banking fraud systems never would have caught this. The individual transactions were under typical velocity thresholds, the merchants appeared legitimate, and the purchasing patterns looked like normal supply orders.

Healthcare fraud has unique signatures that standard fintech platforms miss entirely. After analyzing fraud patterns across 777 practices, we learned that effective healthcare fraud detection requires understanding both the clinical workflow and the financial ecosystem that supports it.

Healthcare Fraud Patterns Standard Banking Systems Miss

Consumer banking fraud detection focuses on velocity, geography, and merchant category anomalies. Healthcare fraud exploits clinical purchasing patterns and practice operational flows that general-purpose systems don't understand.

After-Hours Lab Order Fraud: Legitimate dental labs close at 5-6 PM. In our dataset, 89% of orders placed at 10 PM with newly issued virtual cards were fraudulent. Standard fraud systems check only velocity and spending limits; they do not model healthcare business hours.

Supply Vendor Impersonation: Fraudsters create shell companies with names like "Dental Supply Solutions" or "Healthcare Equipment LLC." To consumers, these look legitimate. To dental practices, these names don't match known suppliers like Patterson, Schein, or Benco. Our ML model learned legitimate supplier patterns from 777 practices' transaction history.

Equipment Financing Exploitation: Fraudsters exploit the fact that dental equipment purchases range from $15,000 (digital X-ray) to $150,000 (cone beam CT scanner). They submit fraudulent applications with stolen practice information for mid-range equipment that doesn't trigger manual underwriting review.

Insurance Reimbursement Timing Attacks: Legitimate practices show predictable cash flow patterns—low balances followed by insurance reimbursement deposits. Fraudsters exploit this by timing large purchases during historically low-balance periods, knowing practices expect deposits within 2-3 days.

These patterns are invisible to traditional fraud detection because they require understanding healthcare operations.

The ML Architecture That Actually Works

Healthcare fraud detection requires feature engineering that captures clinical and operational context beyond standard transactional features.

Feature Categories That Matter

Temporal Features:

  • Time-of-day for specific transaction types
  • Day-of-week patterns for lab orders vs. supply purchases
  • Equipment purchase timing relative to practice business hours
  • Insurance deposit intervals and payment timing patterns
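To make the time-of-day feature concrete, here is a minimal sketch of a business-hours score. The function name, the 8 AM to 6 PM weekday window, and the 0.25-per-hour decay are assumptions chosen for illustration, not the production scoring logic.

```python
from datetime import datetime

def business_hours_score(ts: datetime, open_hour: int = 8, close_hour: int = 18) -> float:
    """Return 1.0 inside weekday business hours, decaying toward 0.0 outside.

    Hypothetical scoring: weekends score 0.0; on weekdays the score drops
    by 0.25 per hour of distance from the open window.
    """
    if ts.weekday() >= 5:  # Saturday or Sunday
        return 0.0
    hour = ts.hour + ts.minute / 60
    if open_hour <= hour < close_hour:
        return 1.0
    # Hours away from the nearest edge of the open window
    distance = open_hour - hour if hour < open_hour else hour - close_hour
    return max(0.0, 1.0 - 0.25 * distance)
```

A 10 PM weekday order scores 0.0 under these assumed parameters, while a 7 PM order scores 0.75. A real system would likely use per-practice hours rather than one fixed window.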

Practice Operational Features:

  • Provider NPI registration date vs. account opening date
  • Practice location type (strip mall vs. medical building vs. hospital)
  • Staff size indicators from payroll transaction patterns
  • Patient volume estimates from card transaction frequency
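The NPI-versus-account-opening feature reduces to a date difference. The sketch below assumes both dates are already available as `date` objects; the function names and the 90-day cutoff are hypothetical.

```python
from datetime import date

def npi_age_at_opening_days(npi_registered: date, account_opened: date) -> int:
    """Days between NPI registration and account opening (negative if the NPI is newer)."""
    return (account_opened - npi_registered).days

def is_recent_npi(npi_registered: date, account_opened: date, min_days: int = 90) -> bool:
    """Hypothetical rule: flag accounts opened within `min_days` of NPI registration."""
    return npi_age_at_opening_days(npi_registered, account_opened) < min_days
```

An established provider opening an account years after registering an NPI passes this check, while an NPI registered two weeks before account opening is flagged for a closer look.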

Vendor Ecosystem Features:

  • Supplier name matching against known healthcare vendors
  • Shipping address consistency with practice location
  • Equipment model numbers matching practice specialty
  • Lab order frequency matching practice patient volume
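Supplier name matching can be sketched with the standard library's `difflib`. The three supplier names come from this article; the function name and the substring shortcut are assumptions, and a production list would be far longer and keyed by specialty.

```python
from difflib import SequenceMatcher

# Supplier names taken from the article; illustrative, not exhaustive.
KNOWN_SUPPLIERS = ("Patterson", "Schein", "Benco")

def known_supplier_score(merchant_name, known=KNOWN_SUPPLIERS):
    """Return the best similarity (0.0 to 1.0) between a merchant name and known suppliers."""
    name = merchant_name.lower()
    best = 0.0
    for supplier in known:
        s = supplier.lower()
        if s in name:  # e.g. "Henry Schein Dental" contains "schein"
            return 1.0
        best = max(best, SequenceMatcher(None, name, s).ratio())
    return best
```

Name similarity alone is easy to game (a shell company can put a known supplier's name in its own), so a score like this is best treated as one input alongside shipping address and transaction history rather than a verdict.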

Financial Flow Features:

  • Insurance reimbursement patterns and timing
  • Payroll regularity and staff count implications
  • Equipment financing payment consistency
  • Supply purchasing seasonality (back-to-school, holiday patterns)

Model Architecture

Ensemble approach combining:

1. XGBoost for tabular features:

# Healthcare-specific feature engineering
def engineer_healthcare_features(transaction_data, practice_data):
    features = {}
    
    # Business hours scoring
    features['business_hours_score'] = calculate_business_hours_score(
        transaction_data.timestamp, 
        practice_data.specialty
    )
    
    # Supplier validation
    features['known_supplier_score'] = validate_supplier_name(
        transaction_data.merchant_name,
        practice_data.specialty,
        known_suppliers_db
    )
    
    # Cash flow timing
    features['cash_flow_timing_score'] = analyze_cash_flow_timing(
        transaction_data.amount,
        practice_data.recent_deposits,
        practice_data.typical_cycle
    )
    
    return features

# XGBoost model for fraud scoring
from xgboost import XGBClassifier

xgb_model = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic'
)

2. LSTM for sequence modeling:

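# Transaction sequence analysis
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.metrics import Precision, Recall

def build_sequence_model(n_features):
    # Input: 30 time steps, each with n_features values
    model = Sequential([
        LSTM(128, return_sequences=True, input_shape=(30, n_features)),
        Dropout(0.2),
        LSTM(64, return_sequences=False),
        Dropout(0.2),
        Dense(32, activation='relu'),
        Dense(1, activation='sigmoid')
    ])
    
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        # Precision and Recall are passed as metric objects, not strings
        metrics=['accuracy', Precision(), Recall()]
    )
    
    return model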

# Analyze 30-day transaction windows
sequence_features = create_sequence_features(
    transactions_30d, 
    practice_operational_data
)
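The sequence model expects a fixed-length window, so raw transactions have to be shaped into 30 rows first. The sketch below shows one way that could look, aggregating to one `[count, total_amount]` row per day. The function name and the two-column layout are assumptions; the article's `create_sequence_features` is not shown and may work differently.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_sequence(transactions, end_day: date, window_days: int = 30):
    """Aggregate (day, amount) pairs into `window_days` rows of [count, total_amount].

    Rows are ordered oldest to newest; days with no activity become zero rows,
    and transactions outside the window are ignored.
    """
    by_day = defaultdict(lambda: [0, 0.0])
    for day, amount in transactions:
        by_day[day][0] += 1
        by_day[day][1] += amount
    start = end_day - timedelta(days=window_days - 1)
    return [list(by_day.get(start + timedelta(days=i), [0, 0.0]))
            for i in range(window_days)]
```

Zero rows keep quiet days visible to the model, which matters here because gaps followed by bursts are part of the signal.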

3. Graph Neural Network for practice relationships:

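# Practice and supplier relationship modeling
import networkx as nx

def build_practice_graph(practices, suppliers, transactions):
    # Builds the graph the GNN consumes (the GNN itself is not shown here)
    # Nodes: practices, suppliers, equipment vendors
    # Edges: transaction relationships, geographic proximity
    
    practice_supplier_graph = nx.Graph()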
    
    # Add practice nodes with features
    for practice in practices:
        practice_supplier_graph.add_node(
            practice.id, 
            node_type='practice',
            specialty=practice.specialty,
            location=practice.location,
            risk_score=practice.risk_score
        )
    
    # Add supplier nodes
    for supplier in suppliers:
        practice_supplier_graph.add_node(
            supplier.id,
            node_type='supplier', 
            legitimacy_score=supplier.legitimacy_score
        )
    
    # Add transaction edges with weights
    for transaction in transactions:
        practice_supplier_graph.add_edge(
            transaction.practice_id,
            transaction.supplier_id,
            weight=transaction.frequency,
            amount_avg=transaction.amount_avg
        )
    
    return practice_supplier_graph
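The relationship view is what surfaces coordinated activity like the opening example, where one merchant was paid by several practices using cards issued minutes earlier. The sketch below captures that pattern with plain dictionaries rather than a graph library. The record keys and both thresholds are hypothetical.

```python
from collections import defaultdict

def suppliers_with_fresh_card_clusters(transactions, max_card_age_min=60, min_practices=3):
    """Find suppliers paid by several distinct practices using newly issued cards.

    `transactions` is an iterable of dicts with hypothetical keys
    'practice_id', 'supplier_id', and 'card_age_min' (minutes between
    card issuance and purchase).
    """
    practices_by_supplier = defaultdict(set)
    for t in transactions:
        if t["card_age_min"] <= max_card_age_min:
            practices_by_supplier[t["supplier_id"]].add(t["practice_id"])
    return {
        supplier: sorted(practices)
        for supplier, practices in practices_by_supplier.items()
        if len(practices) >= min_practices
    }
```

No single transaction here breaches a per-practice limit; the signal only appears when practices are grouped by the supplier they share.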

Real-Time Scoring Pipeline

Decision flow for transaction authorization:

import logging
import time

logger = logging.getLogger(__name__)

def real_time_fraud_scoring(transaction, practice_context):
    # Sub-second scoring requirement (total budget: 300ms)
    start_time = time.perf_counter()
    
    # Feature extraction (< 50ms)
    features = extract_features(transaction, practice_context)
    
    # ML model inference (< 100ms) 
    xgb_score = xgb_model.predict_proba([features.tabular])[0][1]
    
    # Sequence model for historical context (< 100ms)
    sequence_score = lstm_model.predict([features.sequence])[0][0]
    
    # Graph context (< 50ms)
    graph_score = graph_model.score_transaction(
        transaction, practice_context
    )
    
    # Ensemble scoring
    final_score = (
        0.5 * xgb_score + 
        0.3 * sequence_score + 
        0.2 * graph_score
    )
    
    # Check total processing time against the 300ms budget
    elapsed_ms = (time.perf_counter() - start_time) * 1000
    if elapsed_ms > 300:
        logger.warning("Fraud scoring took %.0fms (budget 300ms)", elapsed_ms)
    
    # Decision thresholds
    if final_score > 0.85:
        return "BLOCK", final_score
    elif final_score > 0.65:
        return "REVIEW", final_score  
    else:
        return "APPROVE", final_score
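The ensemble arithmetic is easy to check by hand. The sketch below isolates the weighting and thresholds from the pipeline (0.5/0.3/0.2 and 0.85/0.65, as stated in the scoring code); the function name is an assumption.

```python
def ensemble_decision(xgb_score, sequence_score, graph_score,
                      weights=(0.5, 0.3, 0.2), block_at=0.85, review_at=0.65):
    """Combine three model scores into a weighted score and a decision label."""
    w_xgb, w_seq, w_graph = weights
    final_score = w_xgb * xgb_score + w_seq * sequence_score + w_graph * graph_score
    if final_score > block_at:
        return "BLOCK", final_score
    if final_score > review_at:
        return "REVIEW", final_score
    return "APPROVE", final_score
```

For scores of 0.9, 0.8, and 0.7, the weighted sum is 0.45 + 0.24 + 0.14 = 0.83, which lands in the REVIEW band. Because the tabular model carries half the weight, a transaction needs strong agreement from at least two models to cross the BLOCK threshold.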

Device Fingerprinting for Healthcare Context

Standard device fingerprinting tracks browser attributes and network information. Healthcare fraud requires understanding device usage patterns specific to clinical workflows.

Healthcare-Specific Device Signals

Practice Management System Integration: Most dental practices run payments through their practice management system (Dentrix, Eaglesoft, Open Dental), which creates predictable device fingerprints. Transactions from generic web browsers or mobile apps during business hours often indicate fraud.

Network Infrastructure Patterns: Legitimate practices typically use business-class internet with static IP addresses and consistent network equipment. Residential IP addresses during business hours or frequent IP changes flag potential account takeovers.

Peripheral Device Detection: Dental practices use specialized equipment (card readers, signature pads, receipt printers) that create unique USB device signatures. Transactions without these peripherals during patient visits indicate potential fraud.

Implementation Example

// Healthcare-specific device fingerprinting
async function generateHealthcareDeviceFingerprint() {
    const fingerprint = {
        // Standard signals
        userAgent: navigator.userAgent,
        screenResolution: `${screen.width}x${screen.height}`,
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        
        // Healthcare-specific signals
        pmsIntegration: detectPMSIntegration(),
        // Await the count so a number, not a pending Promise, is hashed
        peripheralDevices: await detectUSBPeripherals(),
        networkClass: classifyNetworkInfrastructure(),
        businessHoursContext: calculateBusinessHoursContext()
    };
    
    return hashFingerprint(fingerprint);
}

function detectPMSIntegration() {
    // Check for common PMS system markers
    const pmsIndicators = [
        'DentrixConnector',
        'EaglesoftBridge', 
        'OpenDentalAPI'
    ];
    
    return pmsIndicators.some(indicator => 
        window[indicator] || document.querySelector(`[data-pms="${indicator}"]`)
    );
}

async function detectUSBPeripherals() {
    // WebUSB only lists devices the user has already granted access to
    if (!('usb' in navigator)) {
        return 0;
    }
    const devices = await navigator.usb.getDevices();
    return devices.filter(device => 
        isHealthcarePeripheral(device.vendorId, device.productId)
    ).length;
}

Velocity Checks That Understand Healthcare

Traditional velocity checks use simple thresholds: $X per day, Y transactions per hour. Healthcare practices have predictable but complex spending patterns that require contextual velocity modeling.

Practice Specialty Velocity Profiles

General Dentistry:

  • Daily supply spending: $200-800
  • Lab orders: 2-8 per day, $150-400 each
  • Equipment purchases: Monthly, $2,000-15,000
  • Insurance deposits: 2-3x weekly, $5,000-25,000

Oral Surgery:

  • Higher supply costs due to surgical materials
  • Equipment purchases include specialized surgical tools
  • Different lab relationships (pathology vs. prosthetics)
  • Payment patterns include hospital facility fees

Orthodontics:

  • Bulk supply purchases for appliances
  • Quarterly equipment maintenance
  • Different lab relationships focused on appliance fabrication
  • Consistent patient payment plans
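A specialty profile like the ones above maps naturally onto a small data structure. The sketch below encodes the General Dentistry lab-order ranges from this section and checks a day's orders against the upper bounds. The class and function names are hypothetical, and only upper bounds are enforced since a quiet day is not suspicious by itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VelocityProfile:
    daily_supply_usd: tuple    # (low, high) expected daily supply spend
    lab_orders_per_day: tuple  # (low, high) expected lab order count
    lab_order_usd: tuple       # (low, high) expected amount per lab order

# Ranges copied from the General Dentistry profile above.
GENERAL_DENTISTRY = VelocityProfile((200, 800), (2, 8), (150, 400))

def lab_order_anomalies(profile, order_amounts):
    """Return reasons a day's lab orders exceed the profile's upper bounds."""
    reasons = []
    if len(order_amounts) > profile.lab_orders_per_day[1]:
        reasons.append("order_count_above_range")
    if any(amount > profile.lab_order_usd[1] for amount in order_amounts):
        reasons.append("order_amount_above_range")
    return reasons
```

Under this profile, a $3,200 "lab" purchase stands out immediately even though it would pass a generic daily spending limit.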

Dynamic Velocity Modeling

from datetime import datetime

def calculate_dynamic_velocity_limits(practice, transaction_type):
    # Base limits by practice specialty
    base_limits = get_specialty_base_limits(practice.specialty)
    
    # Adjust for practice size
    size_multiplier = calculate_size_multiplier(
        practice.patient_volume,
        practice.staff_count
    )
    
    # Seasonal adjustments
    seasonal_multiplier = get_seasonal_multiplier(
        practice.specialty,
        datetime.now().month
    )
    
    # Recent pattern analysis
    pattern_multiplier = analyze_recent_patterns(
        practice.id,
        transaction_type,
        days_lookback=30
    )
    
    dynamic_limit = (
        base_limits[transaction_type] * 
        size_multiplier * 
        seasonal_multiplier * 
        pattern_multiplier
    )
    
    return {
        'daily_amount': dynamic_limit * 1.0,
        'weekly_amount': dynamic_limit * 5.5,
        'monthly_amount': dynamic_limit * 22,
        'transaction_count_hourly': calculate_transaction_limits(dynamic_limit)
    }

# Example velocity check implementation
def check_velocity_limits(transaction, practice):
    limits = calculate_dynamic_velocity_limits(practice, transaction.type)
    
    current_usage = get_current_period_usage(practice.id, transaction.type)
    
    violations = []
    
    if current_usage.daily_amount + transaction.amount > limits['daily_amount']:
        violations.append('daily_amount_exceeded')
    
    if current_usage.hourly_count >= limits['transaction_count_hourly']:
        violations.append('hourly_count_exceeded')
    
    return violations

SAR Filing Requirements for Healthcare Fintechs

Suspicious Activity Report (SAR) filing becomes complex in healthcare contexts because legitimate clinical activities can appear suspicious to traditional banking criteria.

Healthcare-Specific SAR Triggers

Structuring Detection: Normal patient payment patterns can resemble structuring. A practice that deposits many $200-300 cash payments for dental work looks, to a generic rule, like it is splitting deposits to stay under reporting thresholds, when the payments are simply patient co-pays.

Cross-Border Activity: Practices near borders (Texas-Mexico, Washington-Canada) have legitimate cross-border patient flows. Canadian patients paying U.S. dental practices or Mexican nationals receiving dental tourism services create complex reporting scenarios.

High-Risk Geography: Traditional SAR filing flags transactions from high-risk countries. Healthcare practices treating immigrant populations or providing charitable care in underserved areas trigger these flags despite legitimate clinical reasons.

Professional Service Exemptions: Healthcare providers may qualify for certain BSA exemptions, but eligibility is not automatic. Fintech platforms must carefully document the basis for each exemption they rely on and monitor for changes in practice operations.
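To illustrate the structuring point, one way to separate co-pay volume from genuinely suspicious deposits is to look at where amounts sit relative to the $10,000 cash reporting threshold rather than at how many deposits there are. The heuristic below is a hypothetical sketch; the 10% band and the minimum hit count are assumptions, not regulatory values.

```python
CTR_THRESHOLD_USD = 10_000  # Currency Transaction Report threshold for cash

def looks_like_structuring(cash_deposits, band=0.10, min_hits=3):
    """Hypothetical heuristic: several cash deposits just under the CTR threshold.

    Many small co-pay-sized deposits (e.g. $200-300) never land in the
    near-threshold band, so they are not flagged by this check.
    """
    floor = CTR_THRESHOLD_USD * (1 - band)
    near_threshold = [d for d in cash_deposits if floor <= d < CTR_THRESHOLD_USD]
    return len(near_threshold) >= min_hits
```

Forty $250 deposits pass untouched, while three deposits between $9,000 and $9,999 are flagged. A real program would also consider aggregation across days and accounts.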

SAR Filing Decision Framework

class HealthcareSARAnalysis:
    def __init__(self):
        self.clinical_exemptions = load_clinical_exemptions()
        self.geographic_contexts = load_geographic_contexts()
        self.professional_service_rules = load_professional_rules()
    
    def evaluate_suspicious_activity(self, activity_pattern, practice_context):
        # Initial suspicion scoring
        base_suspicion_score = self.calculate_base_suspicion(activity_pattern)
        
        # Healthcare context adjustments
        clinical_adjustment = self.apply_clinical_context(
            activity_pattern, 
            practice_context
        )
        
        geographic_adjustment = self.apply_geographic_context(
            activity_pattern,
            practice_context.location
        )
        
        professional_adjustment = self.apply_professional_service_context(
            activity_pattern,
            practice_context.credentials
        )
        
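        final_score = (
            base_suspicion_score + 
            clinical_adjustment + 
            geographic_adjustment + 
            professional_adjustment
        )
        
        # Keep each component so the decision can be explained in an audit
        reasoning = {
            'base_suspicion_score': base_suspicion_score,
            'clinical_adjustment': clinical_adjustment,
            'geographic_adjustment': geographic_adjustment,
            'professional_adjustment': professional_adjustment,
        }
        
        # SAR filing decision
        if final_score > SAR_FILING_THRESHOLD:
            return self.prepare_sar_filing(activity_pattern, practice_context)
        else:
            return self.document_no_action_decision(final_score, reasoning)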
    
    def apply_clinical_context(self, pattern, context):
        # Adjust for legitimate clinical activities
        adjustments = 0
        
        # Cash payment patterns common in healthcare
        if pattern.type == 'cash_deposits' and pattern.amount_range == (200, 400):
            if context.specialty in ['general_dentistry', 'family_medicine']:
                adjustments -= 0.3  # Reduce suspicion
        
        # Equipment purchase patterns
        if pattern.type == 'large_purchases':
            if self.validate_equipment_purchase(pattern, context):
                adjustments -= 0.4  # Legitimate equipment purchase
        
        # Cross-border healthcare services
        if pattern.type == 'cross_border_activity':
            if context.location in BORDER_REGIONS:
                adjustments -= 0.2  # Dental tourism context
        
        return adjustments

FinCEN Coordination for Healthcare Context

Healthcare fintechs must establish protocols with FinCEN for healthcare-specific reporting nuances:

Pre-Filing Consultation: For complex healthcare scenarios, consultation with FinCEN's Financial Institutions Hotline (866-556-3974) can clarify reporting requirements before filing.

Healthcare Industry Liaison: FinCEN maintains industry-specific guidance for healthcare BSA compliance. Healthcare fintechs should establish relationships with appropriate FinCEN analysts.

Documentation Standards: Healthcare SAR filings require additional documentation about clinical context, patient care rationale, and professional service delivery that standard SARs don't address.

Fraud Prevention Through Practice Education

The most effective healthcare fraud prevention combines technical detection with practice education about emerging threats.

Practice-Facing Fraud Alerts

def generate_practice_fraud_alert(practice, threat_type, context):
    alerts = {
        'supply_vendor_impersonation': {
            'title': 'New Supplier Verification Required',
            'message': f'''
            We've detected a payment attempt to "{context.merchant_name}" 
            which doesn't match your usual suppliers (Patterson, Schein, Benco). 
            
            Please verify:
            - Is this a legitimate new supplier?
            - Did you initiate contact with them?
            - Do they have proper healthcare industry credentials?
            
            Contact us at [phone] if you need to authorize this payment.
            ''',
            'urgency': 'high'
        },
        
        'after_hours_activity': {
            'title': 'Unusual After-Hours Transaction',
            'message': f'''
            We've flagged a ${context.amount:,.2f} transaction at {context.time} 
            which is outside your typical business hours.
            
            If this was authorized by your practice:
            - Reply "AUTHORIZE" to approve this payment
            - All future payments to this vendor will be approved
            
            If this was not authorized:
            - We've temporarily blocked the payment
            - Please call us immediately at [phone]
            ''',
            'urgency': 'immediate'
        }
    }
    
    # Build the generic alert only when no specific template matches
    if threat_type in alerts:
        return alerts[threat_type]
    return generate_generic_alert(context)

Fraud Pattern Briefings

Monthly briefings to practices about emerging fraud patterns help prevent account compromises:

Q1 2025 Brief:

  • Shell supply companies targeting dental practices
  • Equipment financing application fraud using stolen NPI numbers
  • Business email compromise targeting practice administrators
  • Patient payment fraud using stolen insurance information

The Technical Infrastructure for Real-Time Detection

Healthcare fraud detection requires sub-second decision-making with high accuracy and explainability for regulatory reporting.

Architecture Requirements

Latency targets:

  • Feature extraction: < 50ms
  • ML model inference: < 100ms
  • Rule engine evaluation: < 25ms
  • Decision logging: < 25ms
  • Total authorization decision: < 200ms
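The per-stage targets above sum to the 200ms total (50 + 100 + 25 + 25), so any stage that overruns eats directly into the others. The sketch below shows one way to track that with a small timer. The class name and stage keys are assumptions, not part of the production service.

```python
import time
from contextlib import contextmanager

# Per-stage budgets in milliseconds, taken from the targets above.
BUDGET_MS = {
    "feature_extraction": 50,
    "model_inference": 100,
    "rule_engine": 25,
    "decision_logging": 25,
}
TOTAL_BUDGET_MS = 200

class StageTimer:
    """Record elapsed time per stage and report stages at or over budget."""

    def __init__(self):
        self.elapsed_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.elapsed_ms[name] = (time.perf_counter() - start) * 1000

    def over_budget(self):
        return [name for name, ms in self.elapsed_ms.items() if ms >= BUDGET_MS[name]]
```

Wrapping each step in `with timer.stage("rule_engine"):` and logging `timer.over_budget()` per request makes it visible which stage caused a slow authorization.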

Accuracy targets:

  • False positive rate: < 2% (minimize practice disruption)
  • False negative rate: < 0.5% (minimize fraud losses)
  • Explainability: 100% (required for SAR filings)
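Both rate targets can be checked from a confusion matrix of reviewed transactions. In the sketch below, "positive" means fraud; the function names are assumptions, and the default limits are the 2% and 0.5% targets listed above.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate) from confusion-matrix counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # legitimate transactions flagged
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # fraud that was missed
    return fpr, fnr

def meets_targets(tp, fp, tn, fn, max_fpr=0.02, max_fnr=0.005):
    """True when both rates are strictly below their targets."""
    fpr, fnr = error_rates(tp, fp, tn, fn)
    return fpr < max_fpr and fnr < max_fnr
```

With 10,000 legitimate transactions of which 150 were flagged, and 1,000 fraud cases of which 4 were missed, the rates are 1.5% and 0.4%, and both targets are met. The two rates use different denominators, so each needs its own pool of labeled outcomes.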

Scalability requirements:

  • 1,000+ transactions per second across the platform
  • Real-time model updates without downtime
  • Geographic distribution for sub-50ms latency
  • Audit trail retention for 5+ years

Production Implementation

# Infrastructure architecture
fraud_detection_service:
  load_balancer:
    type: "Application Load Balancer"
    health_check_path: "/health"
    
  application:
    container_count: 12
    cpu_limit: "2000m"
    memory_limit: "4Gi"
    autoscaling:
      min_replicas: 6
      max_replicas: 50
      target_cpu: 70%
    
  ml_models:
    xgboost:
      inference_endpoint: "/models/xgb/predict"
      model_version: "v2.1.3"
      latency_sla: "100ms"
    
    lstm:
      inference_endpoint: "/models/lstm/predict" 
      model_version: "v1.8.2"
      latency_sla: "150ms"
    
    graph_neural_net:
      inference_endpoint: "/models/gnn/predict"
      model_version: "v1.2.1"
      latency_sla: "75ms"
  
  data_stores:
    feature_store:
      type: "Redis Cluster"
      node_count: 6
      memory_per_node: "16GB"
      
    transaction_history:
      type: "PostgreSQL"
      instance_type: "db.r5.2xlarge"
      read_replicas: 3
      
    model_artifacts:
      type: "S3"
      versioning: true
      lifecycle_policy: "retain_5_years"

From Detection to Prevention: The Feedback Loop

Effective fraud detection creates feedback loops that improve both technical accuracy and practice operational security.

Model improvement cycle:

  1. Detection: Real-time fraud flagging with confidence scores
  2. Investigation: Manual review with healthcare context analysis
  3. Labeling: Ground truth establishment for confirmed fraud/legitimate
  4. Retraining: Monthly model updates with new labeled data
  5. Deployment: A/B testing for model performance validation
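For the A/B testing step, each practice should see the same model variant on every request so that outcomes can be compared cleanly. One common approach is hash-based assignment, sketched below. The function name, salt, and 10% candidate share are assumptions for illustration.

```python
import hashlib

def assign_model_variant(practice_id: str, candidate_share: float = 0.10,
                         salt: str = "fraud-model-ab") -> str:
    """Deterministically route a practice to the 'candidate' or 'baseline' model."""
    digest = hashlib.sha256(f"{salt}:{practice_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "candidate" if bucket < candidate_share else "baseline"
```

Changing the salt for each experiment reshuffles the assignment, so the same practices are not always the ones exposed to new models.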

Practice security improvement:

  1. Pattern sharing: Aggregate fraud insights across practice network
  2. Education: Targeted security training based on practice vulnerabilities
  3. Process improvement: Operational recommendations to reduce fraud exposure
  4. Vendor verification: Maintained whitelist of legitimate healthcare suppliers

The result: Healthcare fintechs that understand fraud at this technical level can offer superior protection while maintaining operational efficiency for legitimate clinical activities.

Next: Part 3 examines the full cost structure of compliance programs and why most healthcare fintechs underestimate these requirements by 3-5x.


Data sources: Internal fraud detection analysis across 777 practices, FinCEN SAR filing guidance, healthcare fraud pattern analysis from Q3-Q4 2024 operational data