On March 31, 2026, four researchers submitted a preprint to arXiv introducing Symphony for Medical Coding, an agentic AI system that claims state-of-the-art performance across five clinical coding datasets. Lead author Joakim Edin and co-authors Andreas Motzfeldt, Simon Flachs, and Lars Maaløe published the work as arXiv:2603.29709. The system targets two longstanding limitations of existing automated coding tools: inflexibility to new code sets and a lack of explainability.
- Symphony reasons over clinical text using live coding guidelines, enabling it to work across any classification system without retraining.
- The system provides span-level evidence for each code prediction, linking it directly to supporting text in the clinical document.
- The authors evaluated it on two public benchmarks and three real-world datasets covering inpatient, outpatient, emergency, and subspecialty settings in the US and UK.
- The authors describe Symphony as “a flexible, deployment-ready foundation for automated clinical coding.”
What Happened
Joakim Edin, Andreas Motzfeldt, Simon Flachs, and Lars Maaløe submitted a paper to arXiv on March 31, 2026, detailing Symphony for Medical Coding, a system designed to automate the conversion of free-text clinical records into standardized billing and research codes. The paper is available at arXiv:2603.29709. Medical coding underlies billing, clinical research, and quality reporting across hospital systems globally, yet the authors characterize the process as still largely manual, slow, and error-prone.
The paper frames Symphony as a response to two structural failures in existing automated coding systems. First, they are trained on fixed code sets and cannot adapt to new or different classification schemes without full retraining on newly labeled data. Second, they produce no explanations for their outputs. The authors argue these limitations have prevented automated coding from moving beyond controlled settings into routine clinical use.
Why It Matters
Classification systems used in medical coding, such as ICD-10-CM in the United States, contain tens of thousands of entries and are updated annually. Existing automated approaches train on a fixed snapshot of these codes, so each update requires retraining on newly labeled data, a resource-intensive process that delays deployment and reduces coverage of newly introduced codes.
The explainability gap adds a second barrier. Prior automated systems produce code predictions without indicating which parts of the clinical text led to each prediction. In billing, compliance, and clinical research settings, outputs that cannot be traced to source documentation are difficult to audit or challenge — a constraint that has limited deployment even where prediction accuracy is acceptable.
Technical Details
Symphony is designed to reason over the clinical narrative with direct access to coding guidelines — a method the authors describe as analogous to how expert human coders approach the task. Rather than predicting from a fixed, pre-trained code vocabulary, the system queries the coding guidelines at inference time, which allows it to handle codes not seen during training and to operate across different coding systems without retraining.
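As a rough illustration of why inference-time lookup removes the need for retraining, the toy sketch below matches a note against whatever code set it is handed. It is not Symphony's implementation, which the abstract does not detail: the `CodeEntry` type, the `lookup_candidates` function, the token-overlap scoring, and the `X-00n` codes are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CodeEntry:
    """One entry in a (hypothetical) coding guideline."""
    code: str
    description: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into punctuation-stripped word tokens."""
    tokens = (word.strip(".,;:()").lower() for word in text.split())
    return {token for token in tokens if token}


def lookup_candidates(note: str, code_set: List[CodeEntry],
                      min_overlap: int = 2) -> List[str]:
    """Return codes whose description shares at least `min_overlap` tokens
    with the note, best match first. The code set is consulted at call time,
    so nothing here is tied to a fixed vocabulary."""
    note_tokens = tokenize(note)
    scored = []
    for entry in code_set:
        overlap = len(note_tokens & tokenize(entry.description))
        if overlap >= min_overlap:
            scored.append((overlap, entry.code))
    # Highest overlap first; ties broken alphabetically by code.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored]


# Illustrative data only: these are placeholder codes, not real ICD entries.
NOTE = "Patient has type 2 diabetes mellitus and essential hypertension."

CODE_SET_V1 = [
    CodeEntry("X-001", "type 2 diabetes mellitus"),
    CodeEntry("X-002", "essential hypertension"),
    CodeEntry("X-003", "acute asthma exacerbation"),
]

# A later "release" of the code set adds one entry; no retraining step exists.
CODE_SET_V2 = CODE_SET_V1 + [
    CodeEntry("X-004", "diabetes mellitus with hypertension"),
]

print(lookup_candidates(NOTE, CODE_SET_V1))
print(lookup_candidates(NOTE, CODE_SET_V2))
```

Because the code set is an argument rather than something baked into model weights, adding `X-004` changes the output without any training step. The token-overlap heuristic is only a placeholder for whatever reasoning the real system performs over the guidelines.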
A key architectural feature is span-level explainability: Symphony links each code prediction to the specific text span in the clinical document that supports it. This allows coders, auditors, and compliance officers to trace every assigned code back to its source in the patient record.
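One simple way to represent span-level evidence is a code paired with character offsets into the source document. The sketch below is an assumption about how such a record might look, not the format used in the paper: `CodedSpan`, `find_span`, `evidence_text`, and the placeholder code `X-001` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodedSpan:
    """A predicted code tied to a character range in the source document."""
    code: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def find_span(document: str, code: str, phrase: str) -> Optional[CodedSpan]:
    """Locate the first exact occurrence of `phrase` and attach `code` to it.
    Returns None when the phrase is absent, i.e. the code has no evidence."""
    index = document.find(phrase)
    if index == -1:
        return None
    return CodedSpan(code=code, start=index, end=index + len(phrase))


def evidence_text(document: str, span: CodedSpan) -> str:
    """Recover the supporting text an auditor would be shown for a code."""
    return document[span.start:span.end]


# Illustrative document and placeholder code, not real clinical data.
DOC = "Assessment: type 2 diabetes mellitus, well controlled."

span = find_span(DOC, "X-001", "type 2 diabetes mellitus")
if span is not None:
    print(span.code, (span.start, span.end), repr(evidence_text(DOC, span)))
```

Storing offsets rather than copied text means a reviewer can check each code against the unmodified record, and a code with no span can be flagged as unsupported.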
The researchers evaluated Symphony on five datasets, comprising two public benchmarks and three real-world datasets, that span inpatient, outpatient, emergency, and subspecialty settings across the United States and the United Kingdom. The authors report state-of-the-art results across all five datasets. Detailed performance metrics, including accuracy breakdowns by code category and clinical setting, are not disclosed in the abstract and are left to the full paper.
Who’s Affected
Hospital revenue cycle and billing departments are the most direct stakeholders. Coding errors — whether undercoding or overcoding — directly affect reimbursement rates, compliance audit outcomes, and claim approval. A system with auditable, span-level outputs could reduce the manual review burden on human coders while providing a traceable record for payers and regulators.
Clinical AI developers building integrations with electronic health record platforms will find the coding-system-agnostic architecture relevant. Because Symphony does not require retraining for each classification scheme, it could be deployed across different institutional or national coding standards without maintaining separate model versions for each. Health informatics researchers may also benefit from the span-level linking feature for annotation quality checks and dataset construction.
What’s Next
The authors characterize Symphony as “a flexible, deployment-ready foundation for automated clinical coding,” positioning the system for real-world pilots rather than continued laboratory development. Whether prospective clinical trials or external industry validations are planned was not disclosed in the preprint.
The abstract does not address key limitations, such as performance on multilingual clinical records, non-English coding systems, and documentation from rare subspecialties; these will require independent evaluation before broad adoption. The preprint, submitted on March 31, 2026, had not undergone peer review at the time of publication. Institutional affiliations for the four authors were not available from the abstract at the time of publication.