methodology and early results from a novel national database
BACKGROUND: Systematized Nomenclature of Medicine (SNOMED) codes are computer-processable medical terms used to describe histopathological evaluations. SNOMED codes are not, however, readily usable for analysis. We developed an algorithm that converts prostate SNOMED codes into an analyzable format. We present the methodology and early results from a new national Danish prostate database containing clinical data from all males whose prostate tissue was evaluated from 1995 to 2011. MATERIALS AND METHODS: SNOMED codes were retrieved from the Danish Pathology Register. A total of 26,295 combinations of SNOMED codes were identified. A computer algorithm was developed to transcode SNOMED codes into an analyzable format comprising procedure (eg, biopsy or transurethral resection), diagnosis, and date of diagnosis. For validation, approximately 55,000 pathology reports were manually reviewed. Data on prostate-specific antigen, vital status, causes of death, and tumor-node-metastasis classification were integrated from national registries. RESULTS: Of the 161,525 specimens identified from 113,801 males, 83,379 (51.6%) were sets of prostate biopsies, 56,118 (34.7%) were transurethral/transvesical resections of the prostate (TUR-Ps), and the remaining 22,028 (13.6%) were derived from radical prostatectomies, bladder interventions, and other procedures. A total of 48,078 (42.2%) males had histopathologically verified prostate cancer; of these, 78.8% and 16.8% were diagnosed on prostate biopsies and TUR-Ps, respectively. FUTURE PERSPECTIVES: A validated algorithm was successfully developed to convert complex prostate SNOMED codes into clinically useful data. A unique database, including males with both normal and cancerous histopathological data, was created to form the most comprehensive national prostate database to date. The algorithm could potentially be used to convert other SNOMED data and is available upon request.
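The transcoding step described under MATERIALS AND METHODS can be illustrated with a minimal sketch. It is not the algorithm used to build the database, which is available only upon request. It assumes that each specimen arrives as a list of SNOMED codes with a date, that lookup tables map codes to a procedure and a diagnosis, and that the most severe diagnosis wins when several are present. The names `transcode`, `Specimen`, `PROCEDURE_CODES`, `DIAGNOSIS_CODES`, and `DIAGNOSIS_PRIORITY` are hypothetical, as are the code values, which are placeholders rather than the mapping used in the Danish Pathology Register.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical lookup tables. The code values are placeholders for
# illustration only, not the mapping used to build the database.
PROCEDURE_CODES = {
    "P30990": "biopsy",
    "P30620": "TUR-P",
    "P30630": "radical prostatectomy",
}
DIAGNOSIS_CODES = {
    "M81403": "prostate cancer",
    "M00100": "benign",
}
# Assumed rule: when several diagnoses are coded, the earliest entry
# in this list (the most severe) is reported.
DIAGNOSIS_PRIORITY = ["prostate cancer", "benign"]


@dataclass
class Specimen:
    procedure: Optional[str]
    diagnosis: Optional[str]
    diagnosis_date: date


def transcode(codes: List[str], received: date) -> Specimen:
    """Reduce one combination of SNOMED codes to an analyzable record."""
    # First code that maps to a known procedure, or None if there is none.
    procedure = next(
        (PROCEDURE_CODES[c] for c in codes if c in PROCEDURE_CODES), None
    )
    # Collect every diagnosis present, then keep the most severe one.
    found = {DIAGNOSIS_CODES[c] for c in codes if c in DIAGNOSIS_CODES}
    diagnosis = next((d for d in DIAGNOSIS_PRIORITY if d in found), None)
    return Specimen(procedure, diagnosis, received)


if __name__ == "__main__":
    record = transcode(["T77100", "P30990", "M00100", "M81403"], date(2005, 3, 14))
    print(record)
```

Codes that appear in neither table (here `T77100`) are ignored, and a specimen with no recognized procedure or diagnosis keeps `None` in that field, so unmapped combinations remain visible for the kind of manual review described above.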