QA & Test Engineers
Six weeks of legal review.
Or one command.
You have four test environments and every one of them is waiting on data. Your test deadline is Friday. Legal will not approve production extracts. Your QA team is testing against 14 patients, all named Test Patient, all born January 1, 1980. And you already know: what works in dev will break in prod, because your test data looks nothing like production.
The Test Data Problem
You have three options today. None of them work.
Copy production data.
Six weeks of legal review. HIPAA risk assessment. De-identification vendor contract. Budget approval. By the time you get the data, your test deadline has passed.
Build test data by hand.
14 patients with identical demographics. No clinical variance. No edge cases. Your tests pass because your data is too simple to fail — not because your system is correct.
Use a generic data generator.
Patients with diabetes prescribed pediatric antibiotics. Lab results that are physiologically impossible. Data that looks synthetic because it is — and your system handles it differently than real data.
The common thread: your test environment does not reflect production. And the failures you find in production are the failures your test data was too simple to reveal. The 99-year-old patient. The neonate. The polypharmacy case. The patient with 47 active diagnoses. The edge cases that cause production failures are the edge cases your test environment never contains.
What Changes With Pidgeon
De-Identify Without Waiting
On-device. No data leaves your machine. No vendor contract. No legal review. Your existing production messages become safe test data in one command, with referential integrity preserved across related messages.
$ pidgeon deident --in ./production_samples --out ./safe_test_data --date-shift 30d
Processed 2,847 messages
✓ Names replaced (consistent cross-message)
✓ MRNs hashed (referential integrity preserved)
✓ Dates shifted +30 days (temporal coherence maintained)
✓ Addresses replaced
✓ SSNs removed
✓ 18 HIPAA Safe Harbor identifier categories handled
Zero PHI in output. Nothing to approve. Nothing to breach.
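For readers who want to see why consistent replacement and a fixed date shift keep related messages usable together, here is a minimal Python sketch. It is not Pidgeon's implementation. The key, function names, and sample records are hypothetical, and it covers only two of the 18 identifier categories.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Hypothetical local secret. A keyed hash gives every MRN a stable pseudonym
# while making it impractical to recover MRNs by hashing guesses.
SECRET_KEY = b"replace-with-a-local-secret"


def pseudonymize_mrn(mrn: str, key: bytes = SECRET_KEY) -> str:
    """Map an MRN to a stable pseudonym: the same input always yields the same output."""
    digest = hmac.new(key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12].upper()


def shift_hl7_timestamp(ts: str, days: int = 30) -> str:
    """Shift an HL7 v2 timestamp (YYYYMMDD or YYYYMMDDHHMMSS) by a fixed number of days."""
    has_time = len(ts) >= 14
    fmt = "%Y%m%d%H%M%S" if has_time else "%Y%m%d"
    width = 14 if has_time else 8
    shifted = datetime.strptime(ts[:width], fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


# Two hypothetical messages for the same patient: an admission and a lab result.
admit = {"mrn": "MRN00123", "event_time": "20240115083000"}
result = {"mrn": "MRN00123", "event_time": "20240116101500"}

safe = [
    {
        "mrn": pseudonymize_mrn(m["mrn"]),
        "event_time": shift_hl7_timestamp(m["event_time"]),
    }
    for m in (admit, result)
]
```

Because both messages pass through the same keyed hash and the same offset, they still point at one patient, and the interval between admission and result is unchanged.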
Generate What Production Looks Like
Flock generates patient populations grounded in real epidemiological data. CDC WONDER prevalence rates. Census demographic distributions. NHANES lab value ranges. Schema-aware output that respects your database's foreign key constraints.
Not 14 test patients — 14,000
Realistic age distributions, comorbidity correlations, and clinical variance — including the edge cases that break production systems.
Epidemiologically grounded
CDC WONDER prevalence rates, Census demographic distributions, and NHANES lab value ranges. Your synthetic data matches what real populations look like.
Schema-aware output
SQL INSERT (FK-ordered), CSV, HL7 message streams, or FHIR bundles. Shaped to your schema, not a generic template.
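"FK-ordered" means parent rows are inserted before the rows that reference them. One common way to get that order is a topological sort of the foreign key graph, sketched below in Python. The table names are hypothetical, and this is an illustration of the idea rather than Flock's actual algorithm.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables its foreign keys reference.
fk_dependencies = {
    "patients": set(),
    "providers": set(),
    "encounters": {"patients", "providers"},
    "lab_results": {"encounters"},
}


def insert_order(deps: dict) -> list:
    """Return table names so that every table comes after the tables it references."""
    return list(TopologicalSorter(deps).static_order())


order = insert_order(fk_dependencies)
```

Emitting INSERT statements in this order means no row references a parent that has not been loaded yet. A circular foreign key raises `graphlib.CycleError`, which a real loader would need to handle, for example by deferring constraints.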
The test environment that actually reflects production
No more 'works in dev, breaks in prod.' Your test environments contain the demographic variance, clinical complexity, and edge cases that production throws at you. When your tests pass, they mean something.
Edge cases on demand
The 99-year-old patient. The neonate. The polypharmacy case. The patient with 47 active diagnoses. Generate the exact edge cases that cause production failures.
Clinically coherent scenarios
ICD-10 diagnoses, LOINC lab codes, NDC drug codes, and CVX vaccine codes applied consistently. Your test data reflects real clinical relationships, not random combinations.
Referential integrity preserved
Cross-message consistency across entire patient records. The same patient MRN, the same visit dates, the same provider — cohesive across every message in a test run.
Population-scale output
10,000 patients with realistic Mississippi diabetes prevalence. Age distributions, comorbidity correlations, and lab values that match CDC data — generated in under a minute.
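As a rough illustration of prevalence-driven generation, the Python sketch below draws a population in which each patient has a condition with a given probability, then checks the observed rate. The prevalence value is an illustrative placeholder, not a CDC figure, and the uniform age draw is a deliberate simplification. Flock's actual model is not described here.

```python
import random

# Illustrative placeholder only. A real run would take this rate from a
# published source instead of a hard-coded constant.
PLACEHOLDER_PREVALENCE = 0.15


def generate_population(n: int, prevalence: float, seed: int = 0) -> list:
    """Generate n synthetic patients, each flagged with the condition at the given rate."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    return [
        {
            "patient_id": i + 1,
            "age": rng.randint(0, 99),  # simplification: uniform ages, no census weighting
            "has_diabetes": rng.random() < prevalence,
        }
        for i in range(n)
    ]


population = generate_population(10_000, PLACEHOLDER_PREVALENCE)
observed = sum(p["has_diabetes"] for p in population) / len(population)
```

Comparing `observed` with the target rate is the simplest form of the distribution check a population-scale generator needs. A realistic generator would also condition the rate on age and other risk factors.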
It is Tuesday morning. Your test environment needs 10,000 patients with realistic Mississippi diabetes prevalence for a population health module validation. Flock generates them in under a minute — age distributions, comorbidity correlations, and lab values that match CDC data. Your QA team starts testing before lunch. The compliance officer asks about PHI exposure. You tell her: “No PHI ever existed. There is nothing to scrub, nothing to approve, nothing to breach.”
The right Pidgeon product for each QA challenge
Post by Pidgeon
Free CLI
Generate clinically coherent HL7/FHIR test messages at volume. Validate against specs. De-identify production messages on-device in seconds. The free CLI that every QA engineer should have installed.
- On-device de-identification — 18 HIPAA Safe Harbor categories
- Temporal coherence across admission, order, and result messages
- Strict and compatibility validation modes
- HL7 v2.3.1–v2.8, FHIR R4, NCPDP SCRIPT
$ dotnet tool install pidgeon
$ pidgeon deident --in ./samples --out ./safe --date-shift 30d
Flock by Pidgeon
Coming Soon
Generate entire synthetic patient populations for database-level testing. Schema-aware, relationally consistent, epidemiologically grounded. The tool for QA teams who need their test environment to look like a real production database.
- CDC WONDER disease prevalence — grounded in real epidemiology
- SQL DDL schema reader with FK constraint graph
- FK-ordered SQL INSERT, CSV, HL7 streams, FHIR bundles
- Population analytics — validate distributions against CDC data
Flock population generation is in development. Join the waitlist to be first.
Join Flock Waitlist
De-Identification Is Free. Start Today.
pidgeon deident ships with the free CLI. No PHI ever touches the network — processing runs entirely on your machine. Enter your email to get the secure Desktop download.