OMOP Implementation

OMOP created a framework for observational research. This page lists the components of that framework and gives step-by-step instructions for implementing it in your own environment.

Please note that all tools are Open Source and research grade. If you need help with implementation, contact us and we will bring you into the community of peers and vendors.

Data Transformation and Cohorts

Common Data Model - This gives you the description and the DDL to create your own CDM database instance. It also contains ETL implementations for a number of popular databases, including code.

Vocabularies - This provides all the vocabularies used in the CDM. It also gives you the mapping tables you will need to convert from the source codes to the Standard Vocabularies (e.g. ICD-9 to SNOMED-CT or NDC to RxNorm).

Health Outcomes of Interest - This is a library of definitions of outcomes routinely studied for drugs.


Data Characterization and Quality Control

OSCAR - Tool for systematic counts and summary statistics of the data.

Data Quality and GROUCH - Tool for systematic testing for outliers in frequencies, data over time and boundaries.

NATHAN - This tool creates information about the natural history of disease.


Queries, Analyses and Methods

Standardized Vocabulary Queries - A collection of queries to find concepts through use of the Standard Vocabulary.

Standardized CDM Data Queries - A collection of queries you can use to interrogate the data.

Methods Library - A suite of analytical methods to detect association between intervention and outcome.


Drug Approvals

All drugs are approved by regulatory agencies after proof that they are safe and efficacious. In the US, the FDA approves each drug for marketing in the country. The DRUG_APPROVAL table contains the approval date for each ingredient in the Standard Vocabulary:

Column Name     Description
APPROVAL_DATE   Contains the approval date (see below)
INGREDIENT_ID   Contains the CONCEPT_ID of the ingredient, e.g. 778268 for "Imipramine".
APPROVED_BY     Contains the name of the regulatory agency. Currently, all records contain "FDA" as only FDA approvals are included.
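
As a rough illustration of how the DRUG_APPROVAL table might be queried, the sketch below builds an in-memory SQLite stand-in and looks up one ingredient. The column types, the ISO date format and the helper name `approval_date` are assumptions made for this example; the actual table in your CDM instance may store these values differently.

```python
import sqlite3

# In-memory stand-in for the vocabulary database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE DRUG_APPROVAL (
        INGREDIENT_ID INTEGER,  -- CONCEPT_ID of the ingredient
        APPROVAL_DATE TEXT,     -- ISO date string here; the real column type may differ
        APPROVED_BY   TEXT      -- regulatory agency, currently always 'FDA'
    )
    """
)
# Example rows taken from the Imipramine records described on this page.
conn.executemany(
    "INSERT INTO DRUG_APPROVAL VALUES (?, ?, ?)",
    [
        (778268, "1959-04-16", "FDA"),
        (19135884, "1973-03-11", "FDA"),
        (19012477, "1959-04-16", "FDA"),
    ],
)


def approval_date(connection, ingredient_id):
    """Return the approval date for an ingredient CONCEPT_ID, or None if unmapped."""
    row = connection.execute(
        "SELECT APPROVAL_DATE FROM DRUG_APPROVAL WHERE INGREDIENT_ID = ?",
        (ingredient_id,),
    ).fetchone()
    return row[0] if row else None


print(approval_date(conn, 778268))
```

Returning None for an unknown CONCEPT_ID mirrors the fact that not every ingredient has an approval date.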

The approval date was determined as follows:
Records with the regulatory action "Approval" were selected from the regulatory action database that the FDA publishes. These records were parsed for their product (active ingredient), and the earliest date on which an active ingredient received approval, whether in a standalone or a combination drug, was selected as its approval date. These active ingredients were then mapped to the RxNorm ingredient records (vocabulary_id=8, concept_level=2).
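
The selection rule described above can be sketched as follows. This is only an illustration: the record layout, the placeholder ingredient names and the function `earliest_approvals` are hypothetical and do not reflect the actual FDA file format or the code used to build the table.

```python
from datetime import date

# Hypothetical regulatory-action records: (active ingredients, action, action date).
# A combination drug lists more than one active ingredient.
records = [
    (("ingredient_a",), "Approval", date(1990, 5, 1)),
    (("ingredient_a", "ingredient_b"), "Approval", date(1985, 2, 10)),
    (("ingredient_b",), "Labeling Revision", date(1980, 1, 1)),
    (("ingredient_b",), "Approval", date(1995, 7, 20)),
]


def earliest_approvals(rows):
    """Map each active ingredient to its earliest 'Approval' date."""
    earliest = {}
    for ingredients, action, action_date in rows:
        if action != "Approval":
            continue  # only approvals count, other regulatory actions are ignored
        for ingredient in ingredients:
            if ingredient not in earliest or action_date < earliest[ingredient]:
                earliest[ingredient] = action_date
    return earliest


for name, approved in sorted(earliest_approvals(records).items()):
    print(name, approved.isoformat())
```

In this made-up data both ingredients take the date of the combination product, because it precedes either standalone approval.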

Note: The date is the first approval date of an active ingredient. It does not contain approval dates for individual products, whether innovator or generic.

Note: Not all ingredients could be mapped to an approval date. There are several reasons for a miss:
  1. The FDA reports regulatory actions quarterly, and RxNorm publishes releases quarterly as well. In some cases the FDA distribution is ahead; in other cases RxNorm has already incorporated an ingredient whose approval is not yet included in the FDA database.
  2. Many ingredients do not require a formal approval if they are naturally occurring compounds.
  3. Some ingredients have duplicate entries or entries with different salt formulations. For some of these, we failed to identify the right match. Please let us know if you find a missing approval date for an ingredient.

For some ingredients, different salt formulations received separate approvals. For example, Imipramine exists as salts of pamoic acid and hydrochloric acid. For each of these salts, the approval date of the salt was assigned to the RxNorm ingredient that names the salt, while the earlier of the two approval dates was assigned to the ingredient without a specific salt:

RxNorm Ingredient
Concept ID   Concept Name                   Approval Date
778268       Imipramine                     16-Apr-1959
19135884     Imipramine pamoate             11-Mar-1973
19012477     Imipramine Monohydrochloride   16-Apr-1959
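
The salt rule in the table above can be sketched in a few lines: each salt keeps its own approval date, and the base ingredient receives the earliest of them. The dictionary that links salts to their base ingredient and the function `base_approval_date` are assumptions for this example; the Standard Vocabulary expresses such links through its own relationship tables.

```python
from datetime import date

# Approval dates of the salt-specific RxNorm ingredients (values from the table above).
salt_approvals = {
    19135884: date(1973, 3, 11),  # Imipramine pamoate
    19012477: date(1959, 4, 16),  # Imipramine Monohydrochloride
}

# Hypothetical link from each salt concept to its base ingredient concept.
salt_to_base = {
    19135884: 778268,
    19012477: 778268,
}


def base_approval_date(base_id, approvals, links):
    """Return the earliest approval date among the salts of a base ingredient."""
    dates = [approvals[salt] for salt, base in links.items() if base == base_id]
    return min(dates) if dates else None


print(base_approval_date(778268, salt_approvals, salt_to_base))
```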

OSCAR - Observational Source Characteristics Analysis Report (OSCAR) Design Specification and Feasibility Assessment

In order to interpret the results of any analysis on a data source, the characteristics of the data source must be clearly understood. The Observational Source Characteristics Analysis Report (OSCAR) provides a systematic approach for summarizing all observational healthcare data within the OMOP common data model. The procedure creates structured output of descriptive statistics for all relevant tables within the model to facilitate rapid summary and interpretation of the potential merits of a particular data source for addressing active surveillance needs.

Observational Source Characteristics Analysis Report (OSCAR) and Source Code:
If you have implemented CDM v4.0, use OSCAR for CDM v4.0; otherwise, use OSCAR for CDM v2.0.

OSCAR has many uses, including:

OSCAR provides descriptive statistics that summarize the entire database and serve as a benchmark for all studies. The diagram below outlines how we envision OSCAR fitting into the workflow for validating the transformation from raw data to the OMOP common data model.

OSCAR has two prerequisites: the program must be applied to a data source that conforms to the OMOP common data model, including all necessary tables and fields, and SAS 9.1 must be available. OSCAR creates a summary result dataset in a structured format. This dataset contains descriptive statistics for the various data elements within the common data model, but does not contain any person-level data. Organizations within the OMOP Data Community are encouraged to share these aggregate summary results by loading them into the OMOP Research Lab, where comparative analyses across the different sources can be conducted.
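
To illustrate what "descriptive statistics without person-level data" means in practice, here is a minimal sketch in Python. It is not OSCAR itself, which is a SAS program, and it does not reproduce OSCAR's output layout; the record structure, field names and the function `summarize_field` are hypothetical.

```python
from statistics import mean


def summarize_field(rows, field):
    """Aggregate one numeric field into summary statistics only."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return {
        "variable": field,
        "n": len(values),
        "n_missing": len(rows) - len(values),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "mean": mean(values) if values else None,
    }


# Hypothetical person-level records; these never appear in the summary output.
persons = [
    {"person_id": 1, "year_of_birth": 1950},
    {"person_id": 2, "year_of_birth": 1960},
    {"person_id": 3, "year_of_birth": None},
]

summary = summarize_field(persons, "year_of_birth")
print(summary)
```

Because only counts and aggregates leave the function, a result of this kind can be shared across organizations without exposing individual records.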

Sample size and Minimal Detectable Relative Risk

This page provides a simple power estimator for drug-outcome research. For each drug ingredient and each Health Outcome of Interest, an estimated minimal detectable relative risk (MDRR) can be computed. This is the smallest relative risk that can be detected given the population size of the database and the frequency of the drug and the outcome in 10 age and 2 gender strata:

The MDRR decreases over time as drug exposure accumulates. For newly introduced drugs, it is important to know when enough sample size has accumulated to bring the MDRR low enough to detect typical drug-outcome risks. The following provides the MDRR over time for detecting the 35 HOIs for all drugs introduced to the market between 2003 and 2009.
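
This page does not state the formula behind the estimator, so the sketch below is an assumption rather than the OMOP method. It treats the number of outcomes among exposed persons as a Poisson count, applies a one-sided test with the square-root normal approximation, and solves for the relative risk detectable at a given power. The parameter `expected_events` (the number of outcomes expected among the exposed under no effect, for example summed across the age and gender strata) and the function name `mdrr` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mdrr(expected_events, alpha=0.05, power=0.80):
    """Approximate minimal detectable relative risk for a Poisson outcome count.

    Assumes a one-sided test and the approximation sqrt(count) ~ Normal(sqrt(mean), 1/4).
    """
    if expected_events <= 0:
        raise ValueError("expected_events must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the one-sided test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    return (1 + (z_alpha + z_beta) / (2 * sqrt(expected_events))) ** 2


# More accumulated exposure means more expected events and a lower MDRR.
for expected in (10, 100, 1000):
    print(expected, round(mdrr(expected), 2))
```

Under these assumptions the MDRR approaches 1 as the expected event count grows, which matches the decrease over time described above.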

A table with all drugs and their FDA approval date can be found here.