aigct.query

Classes and methods to query the variant repository. This module is an abstraction layer on top of the data access layer in the repository module. It uses the repository module to access the raw data and includes methods that optionally transform the data to make it more meaningful or presentation-ready for the caller.

Classes

VEBenchmarkQueryMgr

Methods to query the variant repository

Functions

cleanup_variant_query_params(params)

Module Contents

aigct.query.cleanup_variant_query_params(params: aigct.model.VEQueryCriteria)
class aigct.query.VEBenchmarkQueryMgr(variant_effect_label_repo: aigct.repository.VariantEffectLabelRepository, variant_repo: aigct.repository.VariantRepository, variant_task_repo: aigct.repository.VariantTaskRepository, variant_effect_source_repo: aigct.repository.VariantEffectSourceRepository, variant_effect_score_repo: aigct.repository.VariantEffectScoreRepository, variant_filter_repo: aigct.repository.VariantFilterRepository)

Methods to query the variant repository

_variant_effect_label_repo
_variant_repo
_variant_task_repo
_variant_effect_source_repo
_variant_effect_score_repo
_variant_filter_repo
get_tasks() -> pandas.DataFrame

Get all tasks

get_all_variants() -> pandas.DataFrame
get_variants(qry: aigct.model.VEQueryCriteria) -> pandas.DataFrame

Fetch variants based on query criteria.

Parameters

qry : VEQueryCriteria

See description of VEQueryCriteria in model package. Specifies criteria that would limit the set of variants to be retrieved. The filter_names attribute is ignored.

Returns

DataFrame

get_variant_effect_sources(task_code: str = None) -> pandas.DataFrame
static _compute_variant_counts(group) -> pandas.Series
get_variant_effect_source_stats(task_code: str, variant_effect_sources=None, include_variant_effect_sources: bool = True, qry: aigct.model.VEQueryCriteria = None) -> pandas.DataFrame

Get all variant effect sources for a task, along with the number of variants, positive labels, negative labels, and genes for each source.

Parameters

task_code : str

variant_effect_sources : list, optional

If specified, restricts the results based on the system-supplied VEPs in this list.

include_variant_effect_sources : bool, optional

If variant_effect_sources is specified, indicates whether to limit the results to the sources in variant_effect_sources or to the sources not in it.

qry : VEQueryCriteria, optional

See description of VEQueryCriteria in model package. Specifies criteria that would limit the set of variants to be retrieved.

Returns

DataFrame
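The interaction between variant_effect_sources and include_variant_effect_sources can be sketched in plain Python. This is only an illustration of the behavior described above, not the library's implementation: the helper select_sources and the source codes VEP_A, VEP_B, VEP_C are hypothetical, and the sketch assumes that True means "keep the listed sources" and False means "keep everything except the listed sources".

```python
from typing import Iterable, List, Optional


def select_sources(
    all_sources: Iterable[str],
    variant_effect_sources: Optional[Iterable[str]] = None,
    include_variant_effect_sources: bool = True,
) -> List[str]:
    """Hypothetical helper: apply the include/exclude rule to source codes."""
    all_sources = list(all_sources)
    if variant_effect_sources is None:
        # No list supplied: nothing to restrict, keep every source.
        return all_sources
    listed = set(variant_effect_sources)
    if include_variant_effect_sources:
        # Assumed meaning of True: keep only the listed sources.
        return [s for s in all_sources if s in listed]
    # Assumed meaning of False: keep only the sources that are not listed.
    return [s for s in all_sources if s not in listed]


# Made-up source codes, for illustration only.
sources = ["VEP_A", "VEP_B", "VEP_C"]
kept = select_sources(sources, ["VEP_B", "VEP_A"])
dropped = select_sources(
    sources, ["VEP_B", "VEP_A"], include_variant_effect_sources=False
)
print(kept)
print(dropped)
```

The same pair of parameters appears on get_variant_effect_scores, so the sketch applies there as well under the same assumptions.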

get_all_variant_effect_source_stats() -> pandas.DataFrame
get_all_task_variant_effect_label_stats() -> pandas.DataFrame

Returns one row per task with the number of variants, positive labels, negative labels, and genes.

Returns

DataFrame

get_variant_effect_scores(task_code: str, variant_effect_sources=None, include_variant_effect_sources: bool = True, qry: aigct.model.VEQueryCriteria = None) -> pandas.DataFrame

Fetches variant effect scores for variant effect sources.

Parameters

task_code : str

Task code

variant_effect_sources : list, optional

If specified, restricts the results based on the system-supplied VEPs in this list.

include_variant_effect_sources : bool, optional

If variant_effect_sources is specified, indicates whether to limit the results to the sources in variant_effect_sources or to the sources not in it.

qry : VEQueryCriteria, optional

See description of VEQueryCriteria in model package. Specifies criteria that would limit the set of variants to be retrieved.

Returns

DataFrame

get_variants_by_task(task_code: str, qry: aigct.model.VEQueryCriteria = None) -> pandas.DataFrame

Fetches variants by task. The optional parameters are filter criteria used to limit the set of variants returned.

Parameters

task_code : str

qry : VEQueryCriteria, optional

See description of VEQueryCriteria in model package. Specifies criteria that would limit the set of variants to be retrieved.

Returns

DataFrame

get_variant_distribution(task_code: str, by: str = 'gene', qry: aigct.model.VEQueryCriteria = None) -> pandas.DataFrame

Fetches the distribution of variants by gene or chromosome. For each gene or chromosome, lists the number of variants for which we have labels, along with the number of positive and negative labels.

Parameters

task_code : str

Task code

by : str

Values are gene or chromosome. Specifies the type of distribution to return.

qry : VEQueryCriteria, optional

See description of VEQueryCriteria in model package. Specifies criteria that would limit the set of variants to be retrieved.

Returns

DataFrame
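As a rough illustration of what a distribution "by gene or chromosome" means, the following sketch groups labelled variants by the chosen key and counts positive and negative labels. It is a plain-Python stand-in, not the library's code: the function variant_distribution, the field names gene, chromosome and label, the output key names, and the encoding of labels as 1 (positive) and 0 (negative) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def variant_distribution(
    rows: Iterable[Mapping[str, object]], by: str = "gene"
) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: count labelled variants per gene or chromosome."""
    if by not in ("gene", "chromosome"):
        raise ValueError("by must be 'gene' or 'chromosome'")
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"num_variants": 0, "num_positive": 0, "num_negative": 0}
    )
    for row in rows:
        bucket = counts[str(row[by])]
        bucket["num_variants"] += 1
        # Assumed label encoding: 1 is positive, anything else is negative.
        if row["label"] == 1:
            bucket["num_positive"] += 1
        else:
            bucket["num_negative"] += 1
    return dict(counts)


# Made-up labelled variants, for illustration only.
rows = [
    {"gene": "GENE1", "chromosome": "1", "label": 1},
    {"gene": "GENE1", "chromosome": "1", "label": 0},
    {"gene": "GENE2", "chromosome": "1", "label": 1},
    {"gene": "GENE3", "chromosome": "X", "label": 0},
]
by_gene = variant_distribution(rows, by="gene")
by_chrom = variant_distribution(rows, by="chromosome")
print(by_gene)
print(by_chrom)
```

Switching by from gene to chromosome changes only the grouping key; the counted quantities stay the same.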

get_variant_filter(task_code: str, filter_name: str) -> aigct.model.VariantFilter

Return a variant filter for a task by name.

Returns

VariantFilter

Object containing the list of genes and variant IDs included in the filter. See the description of VariantFilter in the model package.

get_all_variant_filters(task_code: str) -> dict[str, pandas.DataFrame]

Return basic descriptive information about all variant filters for a task.

Returns

dict[str, pd.DataFrame]

A dictionary of 3 data frames with the following keys:

filter_df - Data frame of filters containing CODE, NAME, DESCRIPTION, etc.
filter_gene_df - Data frame of genes associated with each filter
filter_variant_df - Data frame of variants associated with each filter
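To show how the three entries relate to each other, here is a small sketch that lists the genes belonging to each filter. It uses lists of dictionaries as a stand-in for the data frames so that it runs without any dependencies; with real data frames, pandas' DataFrame.to_dict("records") produces rows of the same shape. The helper genes_by_filter and the column names FILTER_CODE and GENE_SYMBOL are assumptions, since only CODE, NAME and DESCRIPTION are named above.

```python
from typing import Dict, List, Mapping, Sequence

Records = Sequence[Mapping[str, str]]


def genes_by_filter(filters: Mapping[str, Records]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each filter NAME to its gene symbols."""
    # Look up filter names by code from the filter table.
    names = {row["CODE"]: row["NAME"] for row in filters["filter_df"]}
    grouped: Dict[str, List[str]] = {name: [] for name in names.values()}
    for row in filters["filter_gene_df"]:
        # FILTER_CODE and GENE_SYMBOL are assumed column names.
        grouped[names[row["FILTER_CODE"]]].append(row["GENE_SYMBOL"])
    return grouped


# Made-up stand-in for the returned dictionary, for illustration only.
filters = {
    "filter_df": [
        {"CODE": "F1", "NAME": "Filter one", "DESCRIPTION": "First example"},
        {"CODE": "F2", "NAME": "Filter two", "DESCRIPTION": "Second example"},
    ],
    "filter_gene_df": [
        {"FILTER_CODE": "F1", "GENE_SYMBOL": "GENE1"},
        {"FILTER_CODE": "F1", "GENE_SYMBOL": "GENE2"},
        {"FILTER_CODE": "F2", "GENE_SYMBOL": "GENE3"},
    ],
    "filter_variant_df": [],
}
grouped = genes_by_filter(filters)
print(grouped)
```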