Changes from Vocabulary Version V4

OMOP Implementation Specification

Standard Vocabularies in Observational Data Analysis (October 2013)

2. Changes from Vocabulary Version V4


2.1. Fundamental Changes
In Version 4.0 of the Common Data Model, changes have been made to the Vocabulary tables that affect the content of the vocabulary as follows.


2.1.1. Introduction of a Record Lifecycle
The fields valid_start_date, valid_end_date and invalid_reason have been added to the CONCEPT, CONCEPT_RELATIONSHIP and SOURCE_TO_CONCEPT_MAP tables. All concepts now have a defined life cycle and can be launched or deprecated in accordance with the Source Vocabulary or OMOP considerations. Concepts that are deprecated (that is, have a valid_end_date that is not the default end date) also have a reason for invalidation captured in the invalid_reason field. Concepts that are replaced with a new concept are designated "Updated" (U), and concepts that are removed without replacement are designated "Deprecated" (D). The relationship between deprecated concepts and their replacements is handled through records in the CONCEPT_RELATIONSHIP table.

The default valid_start_date is 1-Jan-1980 and the default valid_end_date is 31-Dec-2099, standing for "always been valid" and "not yet deprecated", respectively. A concept for which no valid_start_date or valid_end_date is known also carries the respective default date. However, not all concepts have had their valid_start_date and valid_end_date updated to the correct values: Version 4.0 focused on those vocabularies that either re-use codes (so that the period of validity is crucial for understanding which concept a code represents) or for which life cycle information is updated in parallel with the existence of a concept in the real world and is therefore important for understanding the underlying data (for example, drug concepts). Table 3 shows the vocabularies that have active valid date information.
Table 3: Vocabularies with Active Lifecycle Capture

Vocabulary Vocabulary ID
SNOMED-CT 1
ICD-9-Procedure 3
HCPCS 5
LOINC 6
NDF-RT 7
RxNorm 8
MedDRA 15
FDB Indication 19
FDB ETC 20
WHO ATC 21
VA Product 28
SMQ 31
VA Class 32
Cohort 33
OMOP Drug Exposure Type 36
OMOP Condition Occurrence Type 37
OMOP Procedure Occurrence Type 38
OMOP Observation Type 39
DRG 40
MDC 41
APC 42
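The life cycle fields can be read as in the following minimal Python sketch. It assumes concept records are available as dictionaries; the concept_code key, the sample concept IDs and dates, and the treatment of valid_end_date as inclusive are illustrative assumptions and are not prescribed by this specification.

```python
from datetime import date

# Default dates as stated in this section.
DEFAULT_START = date(1980, 1, 1)
DEFAULT_END = date(2099, 12, 31)


def lifecycle_status(valid_end_date, invalid_reason):
    """Classify a record from its life cycle fields."""
    if valid_end_date == DEFAULT_END:
        return "active"
    if invalid_reason == "U":
        return "updated"      # replaced by a new concept
    if invalid_reason == "D":
        return "deprecated"   # removed without replacement
    return "unknown"


def concepts_valid_on(records, concept_code, on_date):
    """Return the concept_ids for a (possibly re-used) code that are valid on a date.

    Assumption: both ends of the validity period are inclusive.
    """
    return [
        r["concept_id"]
        for r in records
        if r["concept_code"] == concept_code
        and r["valid_start_date"] <= on_date <= r["valid_end_date"]
    ]


# Hypothetical example: the code "X1" was re-used for a different concept in 2006.
records = [
    {"concept_id": 1001, "concept_code": "X1",
     "valid_start_date": DEFAULT_START, "valid_end_date": date(2005, 12, 31),
     "invalid_reason": "D"},
    {"concept_id": 1002, "concept_code": "X1",
     "valid_start_date": date(2006, 1, 1), "valid_end_date": DEFAULT_END,
     "invalid_reason": None},
]
```

For a re-used code, the date of the source record decides which concept is meant: in the sample data, a record dated in 2003 resolves to concept 1001 and one dated in 2010 resolves to concept 1002.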


2.1.2. Handling of Unambiguous Mapping
In the Standard Vocabulary, most records in the SOURCE_TO_CONCEPT_MAP table are unique for each combination of source_code, source_vocabulary_id and target_vocabulary_id. However, in a few cases it is impossible to create a one-to-one or many-to-one mapping. For those cases, the is_primary field has been established to mark one of the mapping entries as the unique primary record. This is in contrast to previous versions of the Vocabulary, where such ambiguous cases were handled through surrogate or intermediate concepts, which in turn had hierarchical relationships to Standard Vocabulary concepts. Vocabularies 53 "Intermediate Condition Terminology", 54 "Intermediate Drug Terminology", 57 "Intermediate Generic Terminology" and 55 "Intermediate Procedure Terminology" are no longer used for this purpose, and all of their V3.0 concepts have been purged.
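The selection of a primary record can be sketched in Python as follows. This is an illustration only: the mapping rows are assumed to be dictionaries, is_primary is treated as a boolean (the stored encoding is not stated in this section), and the sample source code and concept IDs are hypothetical.

```python
def primary_target(mappings, source_code, source_vocabulary_id, target_vocabulary_id):
    """Return the single target_concept_id for a source code, or None if absent.

    If several rows match the combination, the row flagged is_primary wins.
    """
    candidates = [
        m for m in mappings
        if m["source_code"] == source_code
        and m["source_vocabulary_id"] == source_vocabulary_id
        and m["target_vocabulary_id"] == target_vocabulary_id
    ]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]["target_concept_id"]
    primaries = [m for m in candidates if m["is_primary"]]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary mapping for an ambiguous code")
    return primaries[0]["target_concept_id"]


# Hypothetical one-to-many mapping from ICD-9-CM (2) to SNOMED-CT (1).
mappings = [
    {"source_code": "999.9", "source_vocabulary_id": 2, "target_vocabulary_id": 1,
     "target_concept_id": 5001, "is_primary": True},
    {"source_code": "999.9", "source_vocabulary_id": 2, "target_vocabulary_id": 1,
     "target_concept_id": 5002, "is_primary": False},
]
```

Raising an error when an ambiguous code has no unique primary record is a design choice of this sketch; an ETL process might prefer to log such cases instead.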


2.1.3. Handling of Hierarchical Relationships between Concepts
Records in the CONCEPT_RELATIONSHIP table can define any direct relationship between concepts. Some of these relationships are hierarchical, i.e., organized into orders or ranks, with each concept subordinate to the one above it. For example, drug class concepts have hierarchical relationships to the drug ingredient concepts. Such relationships are marked in the is_hierarchical field in the RELATIONSHIP table.

The CONCEPT_ANCESTOR table also contains hierarchical relationships between concepts, but in contrast to the concept relationships, records in the CONCEPT_ANCESTOR table can represent relationships that span multiple levels of the hierarchy. Ancestors are chained together from individual direct relationships. However, the set of relationships used for constructing the CONCEPT_ANCESTOR table is not identical to the set of hierarchical relationships: some relationships are hierarchical but are not desired to form ancestry (e.g. relationships between drugs and contraindications), while other relationships, which depict equivalence between concepts, are needed to walk from one vocabulary to another (e.g. drug product equivalence between RxNorm-based concepts and VA Product-based concepts, see below). Relationships that are used to construct the CONCEPT_ANCESTOR table are flagged in the defines_ancestry field.
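The chaining of direct relationships into ancestry can be illustrated with the following Python sketch. It is not the actual build procedure; it assumes that, for a flagged relationship, concept_id_1 is the ancestor and concept_id_2 the descendant, and that only one direction of each relationship pair carries the flag. The relationship ID 999 and all concept IDs are hypothetical.

```python
from collections import defaultdict


def build_ancestor_pairs(concept_relationships, relationships):
    """Return the set of (ancestor, descendant) pairs reachable through
    relationships whose defines_ancestry flag is 1."""
    ancestry_ids = {
        r["relationship_id"] for r in relationships if r["defines_ancestry"] == 1
    }
    # Direct parent -> children edges, restricted to flagged relationships.
    children = defaultdict(set)
    for concept_id_1, concept_id_2, relationship_id in concept_relationships:
        if relationship_id in ancestry_ids:
            children[concept_id_1].add(concept_id_2)

    pairs = set()
    for root in list(children):
        stack = list(children[root])
        seen = set()
        while stack:  # depth-first walk over all levels below root
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.add((root, node))
            stack.extend(children.get(node, ()))
    return pairs


relationships = [
    {"relationship_id": 10, "defines_ancestry": 1},   # "Subsumes"
    {"relationship_id": 999, "defines_ancestry": 0},  # hypothetical, not used for ancestry
]
concept_relationships = [
    (1, 2, 10),   # 1 subsumes 2
    (2, 3, 10),   # 2 subsumes 3
    (1, 4, 999),  # ignored: relationship does not define ancestry
]
```

With this sample data, concept 1 becomes an ancestor of concept 3 even though no direct relationship record connects them, while concept 4 is left out because its relationship is not flagged.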


2.1.4. Changes in Table and Field Names
Table 4 captures the changes in table names, field names and field types between V3.0 and V4.0 of the Standard Vocabulary.
Table 4: Changes in Table and Field Names

Change affecting                              Nature of change
Table name                                    From VOCABULARY_REF to VOCABULARY
Table name                                    From RELATIONSHIP_TYPE to RELATIONSHIP
Field name and type in CONCEPT table          From concept_vocabulary_code to vocabulary_id, and from string to integer
Field name in CONCEPT_SYNONYM table           From description_name to concept_synonym_name
Field name in CONCEPT_RELATIONSHIP table      From relationship_type to relationship_id
Field name in SOURCE_TO_CONCEPT_MAP table     From source_vocabulary_code to source_vocabulary_id
Field name in SOURCE_TO_CONCEPT_MAP table     From target_vocabulary_code to target_vocabulary_id
Field name in VOCABULARY table                From vocabulary_code to vocabulary_id
Field name in RELATIONSHIP table              From relationship_type to relationship_id
Field name in RELATIONSHIP table              From relationship_description to relationship_name


2.1.5. Complete Code Lists
Vocabularies that are standard in the OMOP CDM are loaded into the CONCEPT table as a comprehensive list of concepts, i.e., every existing code in the Source Vocabulary has an equivalent record as a concept. For those vocabularies that are not standard, but are mapped to a standard, an attempt is made to list all source codes in the SOURCE_TO_CONCEPT_MAP table. Codes without a mapping into the standard are "mapped" to target_concept_id=0 and target_vocabulary_id=0. In V4.0, the following vocabularies are comprehensively listed (Table 5).
Table 5: Source Vocabularies Found in the SOURCE_TO_CONCEPT_MAP

Vocabulary Vocabulary ID
ICD-9-CM 2
ICD-9-Procedure 3
HCPCS 5
LOINC 6
NDC 9
Read 17
FDB Indication 19
Multilex 22
VA Product 28
FDB Genseqno 53
ICD-10-CM 34
ICD-10-PCS 35
FDA SPL 50
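Because unmapped source codes are still listed, with target_concept_id=0, a consumer of the SOURCE_TO_CONCEPT_MAP table may want to separate them from true mappings. The following Python sketch shows one way to do so; the row layout as dictionaries and the sample codes and concept ID are hypothetical.

```python
def split_by_mapping(rows):
    """Split mapping rows into (mapped, unmapped) lists.

    A row counts as unmapped when its target_concept_id is 0.
    """
    mapped = [r for r in rows if r["target_concept_id"] != 0]
    unmapped = [r for r in rows if r["target_concept_id"] == 0]
    return mapped, unmapped


# Hypothetical rows for a source vocabulary with one unmapped code.
rows = [
    {"source_code": "A1", "source_vocabulary_id": 17,
     "target_concept_id": 7001, "target_vocabulary_id": 1},
    {"source_code": "A2", "source_vocabulary_id": 17,
     "target_concept_id": 0, "target_vocabulary_id": 0},
]
mapped, unmapped = split_by_mapping(rows)
```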


2.2. Changes to the Vocabularies


2.2.1. Added Vocabularies
With Version 4.0, a number of new vocabularies including relationships to other vocabularies have been added.
Table 6: New Vocabularies Added to Version 4.0

Vocabulary                            Vocabulary ID   Related vocabularies          Related vocabulary IDs
Standard Vocabularies
VA Class                              32              NDF-RT, RxNorm                7, 8
Cohort                                33              SNOMED-CT, RxNorm, MedDRA     1, 8, 15
DRG                                   40              MDC                           41
MDC                                   41              DRG                           40
APC                                   42
Revenue Code                          43
Ethnicity                             44
NUCC                                  47
CMS Specialty                         48              NUCC                          47
LOINC Multidimensional Classification 49              LOINC                         6
Source Vocabularies with mappings to Standard Vocabularies
ICD-10-CM                             34              SNOMED-CT                     1
ICD-10-PCS                            35              SNOMED-CT                     1
NLM MeSH                              46              RxNorm                        8
FDA SPL                               50              RxNorm                        8


2.2.2. Added Type Concept Vocabularies
Type concepts are specialty concepts with the purpose of indicating where data were derived from within the source.
Table 7: Type Concepts for Drug Exposure, Condition Occurrence, Procedure Occurrence, Observation and Death

Vocabulary Vocabulary ID
Drug Exposure Type 36
Condition Occurrence Type 37
Procedure Occurrence Type 38
Observation Type 39
Death Type 45


2.2.3. Deprecated Vocabularies
Table 8 lists the vocabularies from earlier versions that are no longer part of Version 4.0.
Table 8: Vocabularies Deprecated

Vocabulary                      Vocabulary ID   Reason
Zip Code                        25              Zip Codes, Census Regions and States are no longer used as Concepts in the Person table. Instead, ZIP codes are now part of the Location table.
US Census Region                26              Same reason as Zip Code (25)
US State or Territory           27              Same reason as Zip Code (25)
Health Outcome of Interest      29              HOIs are now captured as Cohort concepts, vocabulary_id 33
Drug of Interest                30              DOIs are now captured as Cohort concepts, vocabulary_id 33
OMOP Intermediate Condition     53              Intermediate concepts are no longer used for handling ambiguous one-to-many mappings. Instead, all mappings point directly to target vocabularies and ambiguity is handled through the introduction of a primary map (see above).
OMOP Intermediate Drug          54              Same reason as OMOP Intermediate Condition (53)
OMOP Intermediate Generic       55              Same reason as OMOP Intermediate Condition (53)
OMOP Intermediate Procedure     57              Same reason as OMOP Intermediate Condition (53)


2.2.4. Changes in Individual Vocabularies
Significant effort has been spent improving the quality of existing vocabularies, and not every detail can be listed here. However, the following gives an overview of the more fundamental changes and corrections applied to individual vocabularies:

  • SNOMED-CT (vocabulary_id 1): Improved relationships within SNOMED-CT and abolished hierarchical "Subsumes" relationships between concepts of different concept classes.
  • ICD-9-CM V-codes (vocabulary_id 2): Added mapping records to SNOMED-CT "Procedure" codes, as many of these V-Codes depict situations where the patient was administered a procedure, rather than diagnosed with a condition.
  • HCPCS (vocabulary_id 5): Added mapping records to RxNorm-based drug products (Clinical and Branded Drugs) and drug ingredients.
  • LOINC (vocabulary_id 6): Added mapping records to SNOMED-CT "Procedure" codes, as many of these LOINC codes depict situations where the patient was exposed to a diagnostic procedure to obtain the test result. Also added the LOINC Multidimensional Classification (vocabulary_id 49) to provide some organization to the various LOINC codes.
  • NDF-RT (vocabulary_id 7): Relationships were added between Indications / Contraindications to SNOMED-CT, allowing a direct comparison between conditions in the Condition table and the Indications or Contraindications defined for each drug.
  • CDC Race and Ethnicity (vocabulary_id 13): Introduced a 2-layer Race code system and moved Hispanic from a Race to Ethnicity (vocabulary_id 44) according to CDC recommendations.
  • MedDRA (vocabulary_id 15): Changed from being a terminology (leaf-level vocabulary) to a hierarchical classification that can be used in conjunction with SNOMED-CT. The direct ICD-9-CM to MedDRA mapping is maintained, but for ETL only the ICD-9-CM to SNOMED-CT mapping is recommended.
  • SMQ (vocabulary_id 31): Revised concepts to include the notion of narrow and broad definitions, and introduced an internal hierarchy of SMQs connected through "Subsumes" relationships.
  • VA Class (vocabulary_id 32): Added as Standard Vocabulary and linked to RxNorm-based drug products (Clinical and Branded Drugs) and drug ingredients as an additional drug classification.


2.3. Changes to Relationships
As a fundamental change, each relationship between two concepts is now represented in the CONCEPT_RELATIONSHIP table as two records, reflecting both directions of that relationship. For example, if A is related to B, the relationship is represented as A->B and, using the reverse relationship, as B->A. All previously existing relationship records, which were unidirectional, are now duplicated, and the relationship_id field can be used to determine the direction of the relationship. As an example, relationship_id 10 ("Subsumes") now has a reverse direction, relationship_id 144 ("Is a"), and all records with relationship_id 10 now also exist as records with relationship_id 144, but with the concepts in concept_id_1 and concept_id_2 reversed.
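The duplication of records can be sketched in Python as follows. The pairing of relationship_id 10 with 144 is taken from the example above; the representation of records as tuples and the concept IDs are hypothetical.

```python
# Reverse pairs of relationship IDs; only the 10/144 pair is given in the text.
REVERSE_OF = {10: 144, 144: 10}


def with_reverse_records(records, reverse_of=REVERSE_OF):
    """Return the records plus, for each, its mirror image.

    Each record is a (concept_id_1, concept_id_2, relationship_id) tuple.
    """
    result = set(records)
    for concept_id_1, concept_id_2, relationship_id in records:
        result.add((concept_id_2, concept_id_1, reverse_of[relationship_id]))
    return result


# Hypothetical unidirectional record: concept 100 "Subsumes" concept 200.
bidirectional = with_reverse_records({(100, 200, 10)})
```

After the call, the set holds both the "Subsumes" record from 100 to 200 and the "Is a" record from 200 to 100.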

Individual relationships have been revised as follows:

  • Direct (shortcut) relationships between drug products (Clinical and Branded Drugs) and Ingredients are no longer available because there is no need for them: RxNorm maintains this relationship through Drug Components, and the CONCEPT_ANCESTOR table will continue to have the direct relationship.
  • Similarly, the relationship between RxNorm-based Ingredient and NDF-RT-based Chemical Structure is removed, as during the construction of the CONCEPT_ANCESTOR table the entire relationship network of NDF-RT and RxNorm is traversed.


2.4. Changes to Ancestry Relationships
Records in the CONCEPT_ANCESTOR table are now constructed automatically by traversing the entire network of concepts and those of their relationships for which the defines_ancestry field has the value 1. The traversal includes intermediate concepts in a chain even if they are not part of the Standard Vocabulary (which is identified by concept_level > 0). In addition, the CONCEPT_ANCESTOR table now contains records linking each concept to itself, as long as it has at least one other hierarchical ancestry relationship. This is a technical change that relieves queries aiming to collect a concept and all of its descendants of the need to add the query concept itself to the result.
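The effect of the self-referencing records on queries can be illustrated with a small Python sketch that uses an in-memory SQLite database. The column names ancestor_concept_id and descendant_concept_id, as well as the concept IDs, are assumptions made for this illustration.

```python
import sqlite3


def concept_and_descendants(conn, concept_id):
    """Return the concept and all of its descendants with a single lookup.

    No UNION with the query concept is needed, because the table links
    each concept to itself.
    """
    rows = conn.execute(
        "SELECT descendant_concept_id FROM concept_ancestor "
        "WHERE ancestor_concept_id = ? ORDER BY descendant_concept_id",
        (concept_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor ("
    "ancestor_concept_id INTEGER, descendant_concept_id INTEGER)"
)
# Hypothetical chain 1 -> 2 -> 3, including the self-referencing records.
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
)
```

Without the self-referencing records, the same lookup for concept 1 would return only concepts 2 and 3, and the caller would have to add concept 1 separately.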