Data Dictionary =============== We describe the logical data model of our database using standard relational database terminology such as table, column, and primary key. The primary key is the collection of columns in a table that uniquely identify a row in that table. Columns tagged with a (PK) symbol are part of the primary key of the table. Table: VARIANT_TASK ------------------- Master list of tasks (categories of variant effect data). Columns: * CODE (PK) - Unique code for each task * NAME * SOURCE_TYPE * DESCRIPTION Table: VARIANT_DATA_SOURCE -------------------------- Master list of sources of data. Columns: * CODE (PK) - Unique code for each source * NAME * DESCRIPTION Table: VARIANT_EFFECT_SOURCE ---------------------------- Master list of sources of variant effect score data. Currently all score data originates from a VEP. Columns: * CODE (PK) - Unique code for each source * NAME * SOURCE_TYPE - Currently, all values are VEP * DESCRIPTION Table: VARIANT -------------- Master list of variants. Columns: * GENOME_ASSEMBLY (PK) - Currently, hg38 for all rows. * CHROMOSOME (PK) * POSITION (PK) * REFERENCE_NUCLEOTIDE (PK) * ALTERNATE_NUCLEOTIDE (PK) * PRIOR_GENOME_ASSEMBLY * PRIOR_CHROMOSOME * PRIOR_POSITION * PRIOR_PRIOR_GENOME_ASSEMBLY * PRIOR_PRIOR_CHROMOSOME * PRIOR_PRIOR_POSITION * REFERENCE_AMINO_ACID * ALTERNATE_AMINO_ACID * AMINO_ACID_POSITION * RS_DBSNP * GENE_SYMBOL * ENSEMBL_GENE_ID * ENSEMBL_TRANSCRIPT_ID * ENSEMBL_PROTEIN_ID * ALLELE_FREQUENCY_SOURCE * ALLELE_FREQUENCY Table: VARIANT_EFFECT_LABEL --------------------------- Variant labels organized by task. Columns: * TASK_CODE (PK) * GENOME_ASSEMBLY (PK) - Currently, hg38 for all rows. * CHROMOSOME (PK) * POSITION (PK) * REFERENCE_NUCLEOTIDE (PK) * ALTERNATE_NUCLEOTIDE (PK) * LABEL_SOURCE * RAW_LABEL * BINARY_LABEL Table: VARIANT_EFFECT_SCORE --------------------------- Variant effect prediction scores organized by task. Columns: * TASK_CODE (PK) * GENOME_ASSEMBLY (PK) - Currently, hg38 for all rows. * CHROMOSOME (PK) * POSITION (PK) * REFERENCE_NUCLEOTIDE (PK) * ALTERNATE_NUCLEOTIDE (PK) * SCORE_SOURCE * RAW_SCORE * RANK_SCORE - Normalized score, i.e. value between 0-1 Table: VARIANT_FILTER --------------------- Table of system supplied named filters that can be used by the user to restrict the set of variants to be retrieved or to be used in a benchmarking analysis. Filters are defined in terms of a list of genes and/or variants to be included or excluded. Columns: * CODE (PK) - Unique code for each filter * NAME * INCLUDE_GENES - Y/N, include or exclude genes listed in VARIANT_FILTER_GENE table. * INCLUDE_VARIANTS - Y/N, include or exclude variants listed in VARIANT_FILTER_VARIANT table. Table: VARIANT_FILTER_GENE -------------------------- Lists the set of genes to be included or excluded by a filter in the VARIANT_FILTER table. Columns: * FILTER_CODE (PK) - Refers to a CODE in the variant_filter table. * GENE_SYMBOL (PK) Table: VARIANT_FILTER_VARIANT ----------------------------- Lists the set of variants to be included or excluded by a filter in the VARIANT_FILTER table. Columns: * FILTER_CODE (PK) - Refers to a CODE in the variant_filter table. * GENOME_ASSEMBLY (PK) * CHROMOSOME (PK) * POSITION (PK) * REFERENCE_NUCLEOTIDE (PK) * ALTERNATE_NUCLEOTIDE (PK)