Database Query Guide

Run the following setup code before querying the database. Here, <config> is the directory in which the configuration file was stored during installation:

import pandas as pd
from aigct.container import VEBenchmarkContainer

container = VEBenchmarkContainer("<config>/aigct.yaml")

query_mgr = container.query_mgr
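If the config path is wrong, the failure may only surface once the container tries to load it. The sketch below checks the path up front. The helper name resolve_config_path is hypothetical and not part of aigct; it only relies on the guide's statement that the config file is <config>/aigct.yaml.

```python
from pathlib import Path


def resolve_config_path(config_dir: str, filename: str = "aigct.yaml") -> Path:
    """Build the path to the AIGCT config file and fail early if it is missing.

    Hypothetical helper, not part of aigct: it only joins the directory
    chosen at installation time with the config file name and checks
    that the resulting file exists.
    """
    path = Path(config_dir).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    return path
```

With this helper, the setup line would become container = VEBenchmarkContainer(str(resolve_config_path("<config>"))). The str() conversion is a precaution in case the constructor expects a plain string rather than a Path.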

query_mgr is an instance of the aigct.query.VEBenchmarkQueryMgr class, which provides the query methods. Most methods return their results as a pandas DataFrame. Here are some of the available methods:

# Fetch all tasks (categories of variants that we have data for)

tasks_df = query_mgr.get_tasks()

# Fetch all variants across all tasks

variants_df = query_mgr.get_all_variants()
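Because get_all_variants returns every variant across all tasks, you may want to narrow the result in memory. The sketch below does this with plain pandas. The column name GENE_SYMBOL, the helper name filter_by_genes, and the toy data are all hypothetical; check variants_df.columns for the names the query actually returns. For gene-level restrictions, the VEQueryCriteria approach shown later in this guide lets the query method apply the restriction for you instead of fetching every variant first.

```python
import pandas as pd


def filter_by_genes(df: pd.DataFrame, genes, gene_col: str = "GENE_SYMBOL") -> pd.DataFrame:
    """Keep only rows whose gene symbol is in `genes`.

    The default column name "GENE_SYMBOL" is an assumption, not a
    documented aigct column name.
    """
    mask = df[gene_col].isin(list(genes))
    return df.loc[mask].reset_index(drop=True)


# Toy stand-in for variants_df (hypothetical columns and values).
variants = pd.DataFrame({
    "VARIANT_ID": ["v1", "v2", "v3"],
    "GENE_SYMBOL": ["MTOR", "TP53", "PIK3CD"],
})
subset = filter_by_genes(variants, ["MTOR", "PIK3CD"])
```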

# Fetch all variant effect sources, i.e. VEPs for which we have scores,
# for the CANCER task

veps_df = query_mgr.get_variant_effect_sources("CANCER")

# Fetch statistics for a specific set of variant effect sources
# for the CANCER task

vep_stats_df = query_mgr.get_variant_effect_source_stats("CANCER",
    ["ALPHAM", "REVEL", "EVE"])

Below we illustrate how detailed selection criteria can be specified for a query method:

from aigct.model import VEQueryCriteria

# Fetch scores for all variant effect sources for the CANCER task.
# Limit to variants found in the MTOR, PLCH2, PIK3CD genes.
# VEQueryCriteria allows you to specify detailed selection criteria
# for the query.

selection_criteria = VEQueryCriteria(gene_symbols=["MTOR", "PLCH2", "PIK3CD"])

scores_df = query_mgr.get_variant_effect_scores("CANCER",
    qry=selection_criteria)
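Comparing VEPs side by side is often easier with one column per source. The sketch below reshapes a long-format scores DataFrame (one row per variant and source) into that wide layout using pandas pivot_table. The column names VARIANT_ID, SCORE_SOURCE and RANK_SCORE are assumptions for illustration only; pass the names that actually appear in scores_df.

```python
import pandas as pd


def scores_to_wide(scores: pd.DataFrame,
                   variant_col: str = "VARIANT_ID",
                   source_col: str = "SCORE_SOURCE",
                   score_col: str = "RANK_SCORE") -> pd.DataFrame:
    """Reshape long-format scores into one row per variant and one column
    per variant effect source.

    All three default column names are hypothetical. Duplicate
    (variant, source) pairs are averaged; missing pairs become NaN.
    """
    return scores.pivot_table(index=variant_col, columns=source_col,
                              values=score_col, aggfunc="mean")


# Toy stand-in for scores_df (hypothetical columns and values).
scores = pd.DataFrame({
    "VARIANT_ID": ["v1", "v1", "v2"],
    "SCORE_SOURCE": ["REVEL", "EVE", "REVEL"],
    "RANK_SCORE": [0.9, 0.7, 0.2],
})
wide = scores_to_wide(scores)
```

Averaging duplicates is one design choice among several; if each variant should have at most one score per source, pivot (which raises on duplicates) would surface unexpected repeats instead of hiding them.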

Full details of every available query method can be found in the API documentation for the aigct.query.VEBenchmarkQueryMgr class.