aigct.model

Classes that represent model objects.

Model objects are containers for data. They generally do not have behavior associated with them.

Classes

VariantId

Model object that represents a variant id.

VariantEffectSource

Model object that represents a variant effect source.

VariantFilter

Model object that represents a named variant filter query.

VEQueryCriteria

Model object that represents variant query criteria.

VEAnalysisResult

Represents the result of calling VEAnalyzer.compute_metrics.

TaskPkViolations

PkViolations

VEAnalysisCalibrationResult

Represents the result of calling VEAnalyzer.compute_calibration_metrics.

Module Contents

class aigct.model.VariantId

Model object that represents a variant id.

Attributes

genome_assembly : str

Genome assembly symbol, e.g. hg38

chromosome : str

Chromosome

position : int

Position

reference_nucleotide : str

Reference nucleotide

alternate_nucleotide : str

Alternate nucleotide

genome_assembly: str
chromosome: str
position: int
reference_nucleotide: str
alternate_nucleotide: str
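
As a rough illustration of how the five attributes together identify a variant, the sketch below uses a local stand-in dataclass that mirrors the documented fields. `VariantIdSketch` and the example values are hypothetical; how `aigct.model.VariantId` itself is constructed is not stated on this page.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class VariantIdSketch:
    """Hypothetical stand-in mirroring the documented VariantId fields."""
    genome_assembly: str
    chromosome: str
    position: int
    reference_nucleotide: str
    alternate_nucleotide: str


# Illustrative values only; not taken from any real dataset.
variant = VariantIdSketch("hg38", "1", 12345, "A", "G")

# The five fields together act as a composite key for the variant.
variant_key = astuple(variant)
```

A frozen dataclass is used here only so the stand-in is hashable and can serve as a dictionary key; that choice is an assumption of the sketch.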
class aigct.model.VariantEffectSource

Model object that represents a variant effect source.

Attributes

code : str

A unique code that identifies the variant effect source

name : str

A unique name of the source

source_type : str

Type of the source, e.g. VEP

description : str

Description

code: str
name: str
source_type: str
description: str
class aigct.model.VariantFilter

Model object that represents a named variant filter query. The filter criteria consist of a list of genes, a list of variant ids, or both.

Attributes

filter : Series

A series with the unique code, name, and description of the filter.

filter_genes : DataFrame

A dataframe of gene symbols associated with the filter. If None, then filter_variants must not be None.

filter_variants : DataFrame

A dataframe of variant ids associated with the filter. If None, then filter_genes must not be None.

filter: pandas.Series
filter_genes: pandas.DataFrame
filter_variants: pandas.DataFrame
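
The constraint that filter_genes and filter_variants cannot both be None can be sketched as a small check. `has_valid_criteria` is a hypothetical helper written for this illustration, not part of `aigct`; it accepts any objects, including DataFrames, since it only tests for None.

```python
from typing import Any, Optional


def has_valid_criteria(filter_genes: Optional[Any],
                       filter_variants: Optional[Any]) -> bool:
    """Hypothetical check: at least one of the two criteria must be present."""
    return filter_genes is not None or filter_variants is not None
```

For example, `has_valid_criteria(["BRCA1"], None)` passes, while `has_valid_criteria(None, None)` does not.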
class aigct.model.VEQueryCriteria

Model object that represents variant query criteria.

Attributes

gene_symbols : list or DataFrame, optional

List of gene symbols

include_genes : bool, optional

If gene_symbols is provided, indicates whether to limit variants to those associated with the gene_symbols or to exclude variants associated with the gene_symbols.

variant_ids : DataFrame, optional

List of variant ids. The dataframe is expected to have the following columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE. If the column names are different, specify a value for column_name_map that maps the column names to the expected names.

include_variant_ids : bool, optional

If variant_ids is provided, indicates whether to limit variants to the variant_ids provided or to fetch all variants except those in variant_ids.

column_name_map : Dict, optional

A dictionary that maps the column names in variant_ids to the expected column names.

allele_frequency_operator : str, optional

If allele_frequency is provided, this is one of “eq”, “gt”, “lt”, “ge”, “le”. That is, limit variants to those whose allele frequency is equal to, greater than, less than, etc. the value of allele_frequency.

allele_frequency : float, optional

Used in conjunction with allele_frequency_operator to limit variants to those meeting a certain allele frequency criterion.

filter_names : str | list[str], optional

The name(s) of a system filter that can be used to limit the variants returned. If more than one is given, the filters are combined using a logical OR.

gene_symbols: list[str] | pandas.DataFrame | pandas.Series = None
include_genes: bool = True
variant_ids: pandas.DataFrame = None
include_variant_ids: bool = True
column_name_map: Dict = None
allele_frequency_operator: str = '='
allele_frequency: float = None
filter_names: str | list[str] = None
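
The semantics described for allele_frequency_operator, include_genes, and column_name_map can be sketched in plain Python. The helper names below are hypothetical and exist only for this illustration; this is one reading of the attribute descriptions, not the library's implementation.

```python
import operator

# Hypothetical lookup from the documented operator codes to comparisons.
AF_OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def passes_allele_frequency(value: float, op_code: str, threshold: float) -> bool:
    """Return True if `value <op> threshold` holds for the given operator code."""
    return AF_OPERATORS[op_code](value, threshold)


def passes_gene_criteria(gene: str, gene_symbols: list,
                         include_genes: bool = True) -> bool:
    """Keep genes in gene_symbols when include_genes is True, else exclude them."""
    in_list = gene in gene_symbols
    return in_list if include_genes else not in_list


def rename_columns(row: dict, column_name_map: dict) -> dict:
    """Map user column names to the expected names; unmapped names pass through."""
    return {column_name_map.get(name, name): value for name, value in row.items()}
```

On a real dataframe, the renaming step corresponds to what `DataFrame.rename(columns=column_name_map)` does in pandas; the dict version above only shows the direction of the mapping (user name to expected name).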
class aigct.model.VEAnalysisResult

Represents the result of calling VEAnalyzer.compute_metrics.

Attributes

num_variants_included : int

The total number of unique variants included in the analysis across all veps.

num_user_variants : int

The number of user-supplied variants included in the analysis.

user_vep_name : str

Name of the user vep.

general_metrics : DataFrame

Has the following columns: SCORE_SOURCE (short unique vep identifier), NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME (name of vep).

roc_metrics : DataFrame, optional

ROC metrics with columns: SCORE_SOURCE, ROC_AUC, EXCEPTION, SOURCE_NAME. EXCEPTION stores an exception message in the event the ROC could not be computed for that vep.

pr_metrics : DataFrame, optional

Precision/recall metrics with columns: SCORE_SOURCE, PR_AUC, SOURCE_NAME

mwu_metrics : DataFrame, optional

Mann-Whitney U metrics with columns: SCORE_SOURCE, NEG_LOG10_MWU_PVAL, SOURCE_NAME

gene_general_metrics : DataFrame, optional

Gene-level general metrics with columns: SCORE_SOURCE, GENE_SYMBOL, NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME

gene_roc_metrics : DataFrame, optional

Gene-level ROC metrics with columns: SCORE_SOURCE, GENE_SYMBOL, ROC_AUC, EXCEPTION, SOURCE_NAME

gene_pr_metrics : DataFrame, optional

Gene-level precision/recall metrics with columns: SCORE_SOURCE, GENE_SYMBOL, PR_AUC, SOURCE_NAME

gene_mwu_metrics : DataFrame, optional

Gene-level Mann-Whitney U metrics with columns: SCORE_SOURCE, GENE_SYMBOL, NEG_LOG10_MWU_PVAL, SOURCE_NAME

roc_curve_coordinates : DataFrame, optional

ROC curve coordinates with columns: SCORE_SOURCE, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE, THRESHOLD

pr_curve_coordinates : DataFrame, optional

Precision/recall curve coordinates with columns: SCORE_SOURCE, PRECISION, RECALL, THRESHOLD

gene_roc_curve_coordinates : DataFrame, optional

Gene-level ROC curve coordinates with columns: SCORE_SOURCE, GENE_SYMBOL, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE, THRESHOLD

gene_pr_curve_coordinates : DataFrame, optional

Gene-level precision/recall curve coordinates with columns: SCORE_SOURCE, GENE_SYMBOL, PRECISION, RECALL, THRESHOLD

variants_included : DataFrame, optional

List of variants included for each vep, including the user vep. Columns: SCORE_SOURCE, GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE

gene_unique_variant_counts_df : DataFrame, optional

Count of unique variants per gene across all veps, with columns: GENE_SYMBOL, NUM_UNIQUE_VARIANTS

num_variants_included: int
num_user_variants: int
user_vep_name: str
general_metrics: pandas.DataFrame
roc_metrics: pandas.DataFrame
pr_metrics: pandas.DataFrame
mwu_metrics: pandas.DataFrame
gene_general_metrics: pandas.DataFrame
gene_roc_metrics: pandas.DataFrame
gene_pr_metrics: pandas.DataFrame
gene_mwu_metrics: pandas.DataFrame
roc_curve_coordinates: pandas.DataFrame
pr_curve_coordinates: pandas.DataFrame
gene_roc_curve_coordinates: pandas.DataFrame
gene_pr_curve_coordinates: pandas.DataFrame
variants_included: pandas.DataFrame
gene_unique_variant_counts_df: pandas.DataFrame
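
A minimal sketch of how the roc_metrics and mwu_metrics columns might be consumed. The rows are plain dicts standing in for dataframe records, and every value is made up for illustration. Two assumptions are baked in: a missing EXCEPTION is represented as None (a real dataframe may hold NaN instead, which would need a different test), and NEG_LOG10_MWU_PVAL holds the negative base-10 logarithm of the p-value, as its name suggests.

```python
# Made-up rows standing in for roc_metrics records; values are illustrative only.
roc_rows = [
    {"SCORE_SOURCE": "VEP_A", "ROC_AUC": 0.91, "EXCEPTION": None},
    {"SCORE_SOURCE": "VEP_B", "ROC_AUC": None, "EXCEPTION": "ROC not computable"},
    {"SCORE_SOURCE": "VEP_C", "ROC_AUC": 0.84, "EXCEPTION": None},
]


def rank_by_roc_auc(rows):
    """Sort sources by ROC_AUC, best first, skipping rows that carry an EXCEPTION."""
    usable = [row for row in rows if row["EXCEPTION"] is None]
    ordered = sorted(usable, key=lambda row: row["ROC_AUC"], reverse=True)
    return [row["SCORE_SOURCE"] for row in ordered]


def mwu_pvalue(neg_log10_pval: float) -> float:
    """Invert NEG_LOG10_MWU_PVAL, assuming it holds -log10(p)."""
    return 10.0 ** (-neg_log10_pval)


ranking = rank_by_roc_auc(roc_rows)
```

Under that assumption, a NEG_LOG10_MWU_PVAL of 3 corresponds to a p-value of about 0.001, so larger values indicate stronger separation between positive and negative labels.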
class aigct.model.TaskPkViolations
dups_found: bool
variant_effect_label_dups: pandas.DataFrame
variant_effect_score_dups: pandas.DataFrame
variant_filter_dups: pandas.DataFrame
variant_filter_gene_dups: pandas.DataFrame
variant_filter_variant_dups: pandas.DataFrame
class aigct.model.PkViolations
dups_found: bool
variant_dups: pandas.DataFrame
variant_effect_source_dups: pandas.DataFrame
task_violations: dict[str, TaskPkViolations]
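
These two classes carry no description here, so the following is only a guess at how the nested structure might be walked. It assumes, from the field names alone, that dups_found flags duplicate rows and that task_violations is keyed by a task identifier. The stand-in classes are hypothetical and keep only the fields the sketch needs.

```python
from dataclasses import dataclass, field


@dataclass
class TaskViolationsSketch:
    """Hypothetical stand-in keeping only the dups_found flag."""
    dups_found: bool = False


@dataclass
class PkViolationsSketch:
    """Hypothetical stand-in keeping only the flag and the per-task mapping."""
    dups_found: bool = False
    task_violations: dict = field(default_factory=dict)


def tasks_with_duplicates(violations: PkViolationsSketch) -> list:
    """Return the sorted task keys whose per-task entry reports duplicates."""
    return sorted(
        task
        for task, task_entry in violations.task_violations.items()
        if task_entry.dups_found
    )


# Illustrative report with made-up task keys.
report = PkViolationsSketch(
    dups_found=True,
    task_violations={
        "task_a": TaskViolationsSketch(dups_found=True),
        "task_b": TaskViolationsSketch(dups_found=False),
    },
)
flagged = tasks_with_duplicates(report)
```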
class aigct.model.VEAnalysisCalibrationResult

Represents the result of calling VEAnalyzer.compute_calibration_metrics.

Attributes

num_variants_included : int

The total number of unique variants included in the calibration analysis.

vep_name : str

Name of the variant effect predictor (VEP) used in the calibration. It can be a system vep or a user-supplied vep name.

pr_curve_coordinates_df : DataFrame

Precision/recall curve coordinates for variants with columns: SCORE_SOURCE, PRECISION, RECALL, THRESHOLD

f1_curve_coordinates_df : DataFrame

F1 score curve coordinates for variants with columns: F1_SCORE, THRESHOLD

score_pathogenic_fraction_df : DataFrame

Statistics about positive and negative variants in different score bins. The variants are grouped into equal-sized bins based on their score, and the mean score and fraction of positive (pathogenic) variants in each bin are computed. Columns: SCORE_RANGE, LEFT_BOUNDARY_EXCLUSIVE, RIGHT_BOUNDARY_INCLUSIVE, MEAN_SCORE, NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS

scores_and_labels_df : DataFrame

List of variants included in the calibration analysis with columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE, BINARY_LABEL, RANK_SCORE

num_variants_included: int
vep_name: str
pr_curve_coordinates_df: pandas.DataFrame
f1_curve_coordinates_df: pandas.DataFrame
score_pathogenic_fraction_df: pandas.DataFrame
scores_and_labels_df: pandas.DataFrame
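
A minimal sketch of two things one might derive from the calibration columns: the fraction of positive variants per score bin (NUM_POSITIVE_LABELS divided by NUM_VARIANTS) and the threshold with the highest F1 score. The rows are plain dicts standing in for dataframe records; all numbers and the SCORE_RANGE string format are made up for illustration.

```python
# Made-up rows standing in for score_pathogenic_fraction_df records.
bins = [
    {"SCORE_RANGE": "(0.0, 0.5]", "NUM_VARIANTS": 40, "NUM_POSITIVE_LABELS": 4},
    {"SCORE_RANGE": "(0.5, 1.0]", "NUM_VARIANTS": 40, "NUM_POSITIVE_LABELS": 30},
]

# Made-up rows standing in for f1_curve_coordinates_df records.
f1_rows = [
    {"F1_SCORE": 0.62, "THRESHOLD": 0.3},
    {"F1_SCORE": 0.81, "THRESHOLD": 0.5},
    {"F1_SCORE": 0.74, "THRESHOLD": 0.7},
]


def positive_fraction(row: dict) -> float:
    """Fraction of positive (pathogenic) variants in one score bin."""
    return row["NUM_POSITIVE_LABELS"] / row["NUM_VARIANTS"]


# Fraction of positives per bin, keyed by the bin's score range.
fractions = {row["SCORE_RANGE"]: positive_fraction(row) for row in bins}

# The row whose F1 score is highest gives a candidate decision threshold.
best_row = max(f1_rows, key=lambda row: row["F1_SCORE"])
best_threshold = best_row["THRESHOLD"]
```

In a well-calibrated predictor, the positive fraction would be expected to rise with the mean score of the bin, which is what comparing the per-bin fractions is meant to show.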