Table name: stem_table
Reading from gp_clinical
The gp_clinical table contains clinical records from primary care linked data of consented UKB participants. The variables include date and clinical code (Read v2 or CTV3) for primary care events, such as consultations, diagnoses, history, symptoms, procedures, laboratory tests and administrative information. The records are retrieved from four different source systems (providers): EMIS/Vision Scotland, EMIS/Vision Wales, TPP England and Vision England (Note: EMIS England is missing). Coded data has been obtained for 45% of the UKB participants.
Notes on variables:
read_2 and read_3: clinical codes, either Read v2 or CTV3 (i.e. Read v3). The two fields are mutually exclusive. In Athena, mappings to standard OMOP concept_ids are only available for Read v2 codes, but CTV3 codes that overlap with Read v2 can also be mapped. Read v2 and CTV3 mappings to SNOMED are available from the NHS (PBCLReadSNOMEDmap20180401.txt). We removed CTV3 codes that were identical to a Read v2 code from the CTV3 mapping table and mapped them as Read v2 codes. Note that in the source data, Read v2 codes may appear with or without the two-character suffix (e.g. 123.. vs 123..00). In the former case, before retrieving the mapping to a standard concept_id, we extend the code with the missing characters: first by applying an extension vocabulary for specific curated codes, and otherwise by appending the default extension 00.

value1, value2, value3 fields: the meaning of these fields differs per data_provider and read_2/read_3 code combination, though, where provided, value3 generally seems to refer to units. Provider-specific mapping logic needs to be implemented; for some of the values this is available at https://github.com/spiros/ukb-biomarker-phenotypes. See also: data quality notes.

event_dt (date) field: to protect individuals, UKB alters dates that relate to the participant's date of birth as follows: 01/01/1901 (before birth), 02/02/1902 (on the date of birth), 03/03/1903 (after birth, in the same calendar year), 07/07/2037 (future).

In the subsequent mapping step from the stem table to the domain tables, we force all gp_clinical records into the Measurement domain. Some Read codes map to SNOMED concepts in the Condition domain (e.g. Blood pressure reading), and mapping these records from the stem table to the Condition domain would lose their numeric value. Therefore, we do not follow the concept domain, but always map to the Measurement table.
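The Read v2 code extension described in the notes above can be sketched in Python as follows. This is a minimal sketch, assuming that a full code is 7 characters (a 5-character code plus a 2-character suffix, as in 123..00). CURATED_EXTENSIONS and normalise_read2 are hypothetical names, and the curated extension vocabulary itself is not reproduced here.

```python
from typing import Dict

# Hypothetical stand-in for the curated extension vocabulary:
# truncated Read v2 code -> full 7-character code (left empty here).
CURATED_EXTENSIONS: Dict[str, str] = {}

# Default suffix appended when no curated extension exists.
DEFAULT_EXTENSION = "00"


def normalise_read2(code: str, curated: Dict[str, str] = CURATED_EXTENSIONS) -> str:
    """Return a full 7-character Read v2 code, ready for concept lookup."""
    code = code.strip()
    if len(code) == 7:
        # Already carries its two-character suffix, e.g. "123..00".
        return code
    if code in curated:
        # A curated extension takes precedence over the default.
        return curated[code]
    # Otherwise append the default extension, e.g. "123.." -> "123..00".
    return code + DEFAULT_EXTENSION
```

For example, normalise_read2("123..") returns "123..00", while a code that is already in full form is returned unchanged.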
In general, after mapping to the stem table, records are mapped to their respective domains based on the domain of the concept_id (see stem mapping specification).
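A minimal sketch of this routing rule, including the gp_clinical exception described in the notes on event_dt. It assumes the concept's domain_id has already been looked up (in practice from the OMOP CONCEPT table) and that each stem record carries the name of its source table. DOMAIN_TABLES and target_table are illustrative names, not part of the ETL, and the table list is only a subset.

```python
# Illustrative mapping from OMOP domain_id to CDM table name (subset only).
DOMAIN_TABLES = {
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
    "Observation": "observation",
    "Procedure": "procedure_occurrence",
    "Device": "device_exposure",
}


def target_table(source_table: str, concept_domain: str) -> str:
    """Choose the CDM table for one stem_table record."""
    if source_table == "gp_clinical":
        # gp_clinical is always forced into Measurement so that numeric
        # values (e.g. blood pressure readings) are kept.
        return DOMAIN_TABLES["Measurement"]
    # Other sources follow the domain of their standard concept_id.
    # An unknown domain raises KeyError rather than guessing a table.
    return DOMAIN_TABLES[concept_domain]
```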
If the date is in 2037 (the 07/07/2037 future placeholder), skip the record.
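The date rules for this table (skip records with an empty or 2037 date, and replace the on-birth and after-birth placeholders with 1 July of the year of birth) can be sketched as follows. This assumes event_dt has already been parsed into a datetime.date and that the year of birth (field 34) has been looked up. The 01/01/1901 placeholder is passed through unchanged, because no rule for it is given here. resolve_start_date is a hypothetical helper name.

```python
from datetime import date
from typing import Optional

# UK Biobank placeholder dates used in gp_clinical.event_dt.
ON_BIRTH = date(1902, 2, 2)
AFTER_BIRTH = date(1903, 3, 3)


def resolve_start_date(event_dt: Optional[date], year_of_birth: int) -> Optional[date]:
    """Return the stem start_date, or None if the record should be skipped."""
    if event_dt is None:
        # Empty date: ignore the record.
        return None
    if event_dt.year == 2037:
        # Future placeholder (07/07/2037): skip the record.
        return None
    if event_dt in (ON_BIRTH, AFTER_BIRTH):
        # Event on or after birth within the birth year: use yob-07-01.
        return date(year_of_birth, 7, 1)
    # All other dates (including 1901-01-01) are kept as-is.
    return event_dt
```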
Destination Field | Source field | Logic | Comment field |
---|---|---|---|
id | |||
domain_id | ‘Measurement’ | All records from gp_clinical will be inserted in the measurement table | |
person_id | eid | ||
start_date | event_dt | If the date is empty, ignore the record (only 0.1% of the records have an empty date). If 1902-02-02 or 1903-03-03, set the date to yob-07-01 (yob: year of birth, field 34 in baseline) | |
start_datetime | event_dt | ||
visit_occurrence_id | eid event_dt data_provider | Look up the visit occurrence by the unique eid + event_dt + data_provider combination | |
provider_id | |||
concept_id | read_2 read_3 | Map the read_2 code to a standard OMOP concept_id; if not available, map the read_3 code. | |
source_value | read_2 read_3 | Either field will be available | |
source_concept_id | read_2 read_3 | Either field will be available. Use the (non-standard) OMOP concept_id of the Read code | |
type_concept_id | 32817 EHR | ||
end_date | |||
end_datetime | |||
verbatim_end_date | |||
days_supply | |||
dose_unit_source_value | |||
lot_number | |||
modifier_concept_id | |||
modifier_source_value | |||
operator_concept_id | value3 | If value3 is prefixed with OPR | |
quantity | |||
range_high | |||
range_low | |||
refills | |||
route_concept_id | |||
route_source_value | |||
sig | |||
stop_reason | |||
unique_device_id | |||
unit_concept_id | value3 | Map to UCUM (standard OMOP unit concept) | |
unit_source_value | value3 | ||
value_as_concept_id | value1 value2 | Which field to use depends on the Read code and data_provider; requires specific mapping logic. | |
value_as_number | value1 value2 | Which field to use depends on the Read code and data_provider; requires specific mapping logic. | |
value_as_string | |||
value_source_value | |||
anatomic_site_concept_id | |||
disease_status_concept_id | |||
specimen_source_id | |||
anatomic_site_source_value | |||
disease_status_source_value | |||
condition_status_concept_id | |||
condition_status_source_value | |||
qualifier_concept_id | |||
qualifier_source_value | |||
data_source | data_provider | Map as "GP-" + number found in data_provider, e.g. GP-1, GP-2, GP-3 or GP-4 | |
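The concept, visit and data_source rows of the table above can be sketched as plain Python functions. This is a minimal sketch, assuming the Read-to-standard mappings have already been loaded into dictionaries (read2_map, read3_map), that read_2 codes have already been extended to their full form, and that read3_map already resolves CTV3 codes identical to Read v2 codes. All function names are hypothetical. Unmapped codes get concept_id 0, the OMOP convention for "no matching concept".

```python
from typing import Dict, Optional, Tuple


def map_code(read_2: Optional[str], read_3: Optional[str],
             read2_map: Dict[str, int],
             read3_map: Dict[str, int]) -> Tuple[str, int]:
    """Return (source_value, concept_id); concept_id 0 means no mapping."""
    if read_2:
        # read_2 takes precedence; the fields are mutually exclusive anyway.
        return read_2, read2_map.get(read_2, 0)
    if read_3:
        # read3_map is assumed to already map CTV3 codes that are identical
        # to a Read v2 code via their Read v2 mapping.
        return read_3, read3_map.get(read_3, 0)
    raise ValueError("record has neither read_2 nor read_3")


def visit_key(eid: int, event_dt: str, data_provider: int) -> Tuple[int, str, int]:
    """Composite key used to look up visit_occurrence_id."""
    return (eid, event_dt, data_provider)


def data_source(data_provider: int) -> str:
    """Format the data_source value, e.g. 3 -> 'GP-3'."""
    return f"GP-{data_provider}"
```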
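The value-related rows (value_as_number, operator_concept_id, unit_concept_id, unit_source_value) depend on provider-specific logic that is not specified here. The sketch below only illustrates one possible structure. It assumes a hand-curated rule table that says which of value1/value2 holds the result for a given data_provider and Read code, plus lookup dictionaries for OPR-prefixed operator codes and for unit strings. value_as_number, split_value3 and the dictionaries are hypothetical. value_as_concept_id is left out because no rule for it is given.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule table: (data_provider, read code) -> "value1" or "value2".
# Real rules are provider-specific and have to be curated.
ValueRules = Dict[Tuple[int, str], str]


def value_as_number(data_provider: int, code: str,
                    value1: Optional[str], value2: Optional[str],
                    rules: ValueRules) -> Optional[float]:
    """Pick value1 or value2 according to the rule table and parse it."""
    field = rules.get((data_provider, code))
    raw = {"value1": value1, "value2": value2}.get(field) if field else None
    if not raw:
        return None  # no rule for this combination, or empty value
    try:
        return float(raw)
    except ValueError:
        return None  # non-numeric text is not a value_as_number candidate


def split_value3(value3: Optional[str],
                 operator_map: Dict[str, int],
                 unit_map: Dict[str, int]) -> Tuple[Optional[int], Optional[int], Optional[str]]:
    """Return (operator_concept_id, unit_concept_id, unit_source_value)."""
    value3 = (value3 or "").strip()
    if not value3:
        return None, None, None
    if value3.startswith("OPR"):
        # Operator code: only operator_concept_id is populated (0 if unmapped).
        return operator_map.get(value3, 0), None, None
    # Otherwise treat value3 as a unit: keep the raw text as unit_source_value
    # and map it to a standard unit concept, 0 if unmapped.
    return None, unit_map.get(value3, 0), value3
```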