Data Dictionary
We describe the logical data model of our database using standard relational database terminology such as table, column, and primary key. The primary key is the collection of columns in a table that uniquely identify a row in that table. Columns tagged with a (PK) symbol are part of the primary key of the table.
Table: VARIANT_TASK
Master list of tasks (categories of variant effect data).
Columns:
CODE (PK) - Unique code for each task
NAME
SOURCE_TYPE
DESCRIPTION
Table: VARIANT_DATA_SOURCE
Master list of sources of data.
Columns:
CODE (PK) - Unique code for each source
NAME
DESCRIPTION
Table: VARIANT_EFFECT_SOURCE
Master list of sources of variant effect score data. Currently all score data originates from a VEP.
Columns:
CODE (PK) - Unique code for each source
NAME
SOURCE_TYPE - Currently, all values are VEP
DESCRIPTION
Table: VARIANT
Master list of variants.
Columns:
GENOME_ASSEMBLY (PK) - Currently, hg38 for all rows.
CHROMOSOME (PK)
POSITION (PK)
REFERENCE_NUCLEOTIDE (PK)
ALTERNATE_NUCLEOTIDE (PK)
PRIOR_GENOME_ASSEMBLY
PRIOR_CHROMOSOME
PRIOR_POSITION
PRIOR_PRIOR_GENOME_ASSEMBLY
PRIOR_PRIOR_CHROMOSOME
PRIOR_PRIOR_POSITION
REFERENCE_AMINO_ACID
ALTERNATE_AMINO_ACID
AMINO_ACID_POSITION
RS_DBSNP
GENE_SYMBOL
ENSEMBL_GENE_ID
ENSEMBL_TRANSCRIPT_ID
ENSEMBL_PROTEIN_ID
ALLELE_FREQUENCY_SOURCE
ALLELE_FREQUENCY
Table: VARIANT_EFFECT_LABEL
Variant labels organized by task.
Columns:
TASK_CODE (PK)
GENOME_ASSEMBLY (PK) - Currently, hg38 for all rows.
CHROMOSOME (PK)
POSITION (PK)
REFERENCE_NUCLEOTIDE (PK)
ALTERNATE_NUCLEOTIDE (PK)
LABEL_SOURCE
RAW_LABEL
BINARY_LABEL
Table: VARIANT_EFFECT_SCORE
Variant effect prediction scores organized by task.
Columns:
TASK_CODE (PK)
GENOME_ASSEMBLY (PK) - Currently, hg38 for all rows.
CHROMOSOME (PK)
POSITION (PK)
REFERENCE_NUCLEOTIDE (PK)
ALTERNATE_NUCLEOTIDE (PK)
SCORE_SOURCE
RAW_SCORE
RANK_SCORE - Normalized score, i.e. value between 0-1
Table: VARIANT_FILTER
Table of system supplied named filters that can be used by the user to restrict the set of variants to be retrieved or to be used in a benchmarking analysis. Filters are defined in terms of a list of genes and/or variants to be included or excluded.
Columns:
CODE (PK) - Unique code for each filter
NAME
INCLUDE_GENES - Y/N, include or exclude genes listed in VARIANT_FILTER_GENE table.
INCLUDE_VARIANTS - Y/N, include or exclude variants listed in VARIANT_FILTER_VARIANT table.
Table: VARIANT_FILTER_GENE
Lists the set of genes to be included or excluded by a filter in the VARIANT_FILTER table.
Columns:
FILTER_CODE (PK) - Refers to a CODE in the variant_filter table.
GENE_SYMBOL (PK)
Table: VARIANT_FILTER_VARIANT
Lists the set of variants to be included or excluded by a filter in the VARIANT_FILTER table.
Columns:
FILTER_CODE (PK) - Refers to a CODE in the variant_filter table.
GENOME_ASSEMBLY (PK)
CHROMOSOME (PK)
POSITION (PK)
REFERENCE_NUCLEOTIDE (PK)
ALTERNATE_NUCLEOTIDE (PK)