Simulated Data

Methodological research typically requires some benchmark or ‘gold standard’ against which to measure performance. In this context, the desired gold standard would be a true causal relationship between a drug and a health outcome. Unfortunately, most observational data sources are poorly characterized, clinical observations may be insufficiently recorded or poorly validated, and the actual ‘truth’ may not be known with certainty. Even ‘known associations’ between drugs and outcomes may be difficult to ascertain, because they can be affected by issues including sample size, adequacy of data capture, and confounding.

Because of these issues and the desire for a common, accepted test set, OMOP designed and developed an automated procedure to construct simulated datasets to supplement the methods evaluation. The simulated datasets (OSIM, the Observational Medical Dataset Simulator) are modeled after real observational data sources. They are composed of hypothetical persons with fictional drug exposures and health outcome occurrences, but are representative of the types of relationships expected to be observed within real observational data sources. Because the simulated data represent hypothetical patients and fictional drug classes and outcome types, no clinical interpretations can be drawn from the data.

The simulated datasets will be used only to perform statistical evaluations of the analytical methods offered to identify drug-outcome associations. The known characteristics of the data enable each drug-outcome relationship to be classified as ‘true’ or ‘false’, and the methods will be executed to classify each drug-outcome pair as ‘positive’ or ‘negative’. The performance characteristics of the analytical methods (sensitivity, specificity, positive and negative predictive value) can then be measured empirically by comparing the methods' classifications against the known truth.
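
As an illustration of how these performance characteristics follow from the two classifications, the following minimal Python sketch counts the four combinations of ‘true’/‘false’ and ‘positive’/‘negative’ and derives the four measures from them. The function name, the drug and outcome labels, and the example values are hypothetical and chosen only for illustration. They are not taken from OSIM or from any OMOP tooling.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (drug, outcome); labels below are hypothetical


def performance(truth: Dict[Pair, bool], flagged: Dict[Pair, bool]) -> Dict[str, float]:
    """Compare a method's positive/negative calls against the simulated truth."""
    tp = fp = tn = fn = 0
    for pair, is_true in truth.items():
        is_positive = flagged[pair]
        if is_true and is_positive:
            tp += 1          # true relationship, flagged positive
        elif is_true:
            fn += 1          # true relationship, missed
        elif is_positive:
            fp += 1          # no relationship, flagged positive
        else:
            tn += 1          # no relationship, correctly negative

    def ratio(num: int, den: int) -> float:
        # Undefined measures (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical example: five simulated drug-outcome pairs, three of them 'true'.
truth = {
    ("drug_A", "outcome_1"): True,
    ("drug_A", "outcome_2"): True,
    ("drug_B", "outcome_1"): True,
    ("drug_B", "outcome_2"): False,
    ("drug_C", "outcome_1"): False,
}
# Hypothetical method output: two pairs flagged 'positive', the rest 'negative'.
flagged = {
    ("drug_A", "outcome_1"): True,
    ("drug_A", "outcome_2"): True,
    ("drug_B", "outcome_1"): False,
    ("drug_B", "outcome_2"): False,
    ("drug_C", "outcome_1"): False,
}

metrics = performance(truth, flagged)
```

In this made-up example the method finds two of the three true relationships and raises no false alarms, so sensitivity and negative predictive value are both 2/3, while specificity and positive predictive value are both 1.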

OSIM Publications
Murray RE, Ryan PB, & Reisinger SJ. (2011). Design and Validation of a Data Simulation Model for Longitudinal Healthcare Data. AMIA Annu Symp Proc., USA, 2011: 1176–1185.