aigct.model

Classes that represent model objects.

Model objects are containers for data. They generally do not have behavior associated with them.

Classes

`VariantId`	Model object that represents a variant id.
`VariantEffectSource`	Model object that represents a variant effect source.
`VariantFilter`	Model object that represents a named variant filter query.
`VEQueryCriteria`	Model object that represents variant query criteria.
`VEAnalysisResult`	Represents the result of calling VEAnalyzer.compute_metrics.
`TaskPkViolations`
`PkViolations`
`VEAnalysisCalibrationResult`	Represents the result of calling VEAnalyzer.compute_calibration_metrics.

Module Contents

class aigct.model.VariantId

Model object that represents a variant id.

Attributes

genome_assembly (str): genome assembly symbol, e.g. hg38
chromosome (str): chromosome
position (int): position
reference_nucleotide (str): reference nucleotide
alternate_nucleotide (str): alternate nucleotide

genome_assembly: str

chromosome: str

position: int

reference_nucleotide: str

alternate_nucleotide: str
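
To illustrate how the five attributes together identify a variant, here is a minimal sketch built on a stand-in dataclass. `VariantIdSketch` and `to_key` are hypothetical names introduced for this example; they are not part of aigct, and the real class's constructor may differ. The field values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantIdSketch:
    """Hypothetical stand-in mirroring the documented VariantId attributes."""
    genome_assembly: str
    chromosome: str
    position: int
    reference_nucleotide: str
    alternate_nucleotide: str

    def to_key(self) -> str:
        # Illustrative helper (an assumption, not an aigct method):
        # join the five fields into one readable key.
        return (f"{self.genome_assembly}:{self.chromosome}:{self.position}:"
                f"{self.reference_nucleotide}>{self.alternate_nucleotide}")


# Arbitrary example values.
variant = VariantIdSketch("hg38", "1", 12345, "G", "A")
print(variant.to_key())
```

Because the stand-in is frozen, two instances with the same five values compare equal and can be used as dictionary keys, which is one way a variant id might be used for lookups.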

class aigct.model.VariantEffectSource

Model object that represents a variant effect source.

Attributes

code (str): A unique code that identifies the variant effect source
name (str): A unique name of the source
source_type (str): Type of the source, e.g. VEP
description (str): Description

code: str

name: str

source_type: str

description: str

class aigct.model.VariantFilter

Model object that represents a named variant filter query. The filter criteria consist of a list of genes, a list of variant ids, or both.

Attributes

filter (Series): A series with the unique code, name, and description of the filter.
filter_genes (DataFrame): A dataframe of gene symbols associated with the filter. If None, then filter_variants must not be None.
filter_variants (DataFrame): A dataframe of variant ids associated with the filter. If None, then filter_genes must not be None.

filter: pandas.Series

filter_genes: pandas.DataFrame

filter_variants: pandas.DataFrame
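
The rule that filter_genes and filter_variants cannot both be None can be expressed as a small check. This is only a sketch: `check_filter_criteria` is a hypothetical helper, not an aigct function, and plain lists stand in for the DataFrames.

```python
from typing import Optional


def check_filter_criteria(filter_genes: Optional[object],
                          filter_variants: Optional[object]) -> None:
    """Raise ValueError if neither genes nor variants are supplied."""
    if filter_genes is None and filter_variants is None:
        raise ValueError(
            "At least one of filter_genes or filter_variants must be provided")


# Genes only, variants only, or both are all acceptable.
check_filter_criteria(["BRCA1", "TP53"], None)
check_filter_criteria(None, [("hg38", "1", 12345, "G", "A")])

try:
    check_filter_criteria(None, None)
    both_missing_rejected = False
except ValueError:
    both_missing_rejected = True
```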

class aigct.model.VEQueryCriteria

Model object that represents variant query criteria.

Attributes

gene_symbols (list, DataFrame, or Series, optional): List of gene symbols.
include_genes (bool, optional): If gene_symbols is provided, indicates whether to limit variants to those associated with the gene_symbols or to exclude variants associated with them.
variant_ids (DataFrame, optional): List of variant ids. The dataframe is expected to have the following columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE. If the column names are different, specify a value for column_name_map that maps the column names to the expected names.
include_variant_ids (bool, optional): If variant_ids is provided, indicates whether to limit variants to the variant_ids provided or to fetch all variants except those in variant_ids.
column_name_map (Dict, optional): A dictionary that maps the column names in variant_ids to the expected column names.
allele_frequency_operator (str, optional): If allele_frequency is provided, this is one of “eq”, “gt”, “lt”, “ge”, “le”, i.e. limit variants to those whose allele frequency is equal to, greater than, etc. the given allele_frequency.
allele_frequency (float, optional): Used in conjunction with allele_frequency_operator to limit variants to those meeting a certain allele frequency criterion.
filter_names (str | list[str], optional): The name(s) of a system filter that can be used to limit the variants returned. If more than one is given, the filters are combined using a logical OR.
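
As an illustration of the column_name_map idea, the sketch below renames user column names to the expected ones and checks that nothing is missing. It uses plain dictionaries as stand-ins for DataFrame rows, and `rename_columns` and the user column names are hypothetical; for an actual pandas DataFrame, `DataFrame.rename(columns=column_name_map)` performs the same kind of renaming. How aigct applies the map internally is not shown here.

```python
EXPECTED_COLUMNS = [
    "GENOME_ASSEMBLY", "CHROMOSOME", "POSITION",
    "REFERENCE_NUCLEOTIDE", "ALTERNATE_NUCLEOTIDE",
]


def rename_columns(rows, column_name_map):
    """Return copies of the rows with keys renamed via column_name_map.

    Keys that do not appear in the map are kept unchanged.
    """
    return [
        {column_name_map.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


# Hypothetical user data whose column names differ from the expected ones.
user_rows = [
    {"assembly": "hg38", "chrom": "1", "pos": 12345, "ref": "G", "alt": "A"},
]
column_name_map = {
    "assembly": "GENOME_ASSEMBLY",
    "chrom": "CHROMOSOME",
    "pos": "POSITION",
    "ref": "REFERENCE_NUCLEOTIDE",
    "alt": "ALTERNATE_NUCLEOTIDE",
}

renamed = rename_columns(user_rows, column_name_map)
missing = [col for col in EXPECTED_COLUMNS if col not in renamed[0]]
print(missing)  # [] when every expected column is present
```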

gene_symbols: list[str] | pandas.DataFrame | pandas.Series = None

include_genes: bool = True

variant_ids: pandas.DataFrame = None

include_variant_ids: bool = True

column_name_map: Dict = None

allele_frequency_operator: str = '='

allele_frequency: float = None

filter_names: str | list[str] = None
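
One way to read the allele_frequency_operator and allele_frequency pair is as a comparison applied to each variant's allele frequency. The sketch below maps the five documented operator names onto Python's `operator` module. `passes_allele_frequency` is a hypothetical helper, and it handles only the documented names; how the library treats other values, such as the `'='` default in the signature, is not covered here.

```python
import operator

# The five operator names listed in the attribute description.
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def passes_allele_frequency(value, op_name, threshold):
    """Return True if `value <op_name> threshold` holds."""
    try:
        compare = COMPARATORS[op_name]
    except KeyError:
        raise ValueError(f"Unsupported operator name: {op_name!r}") from None
    return compare(value, threshold)


# Made-up frequencies; keep those less than or equal to 0.01.
frequencies = [0.001, 0.01, 0.05]
rare = [f for f in frequencies if passes_allele_frequency(f, "le", 0.01)]
print(rare)  # [0.001, 0.01]
```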

class aigct.model.VEAnalysisResult

Represents the result of calling VEAnalyzer.compute_metrics.

Attributes

num_variants_included (int): The total number of unique variants included in the analysis across all VEPs.
num_user_variants (int): The number of user-supplied variants included in the analysis.
user_vep_name (str): Name of the user VEP.
general_metrics (DataFrame): General metrics with columns: SCORE_SOURCE (short unique VEP identifier), NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME (name of the VEP).
roc_metrics (DataFrame, optional): ROC metrics with columns: SCORE_SOURCE, ROC_AUC, EXCEPTION, SOURCE_NAME. EXCEPTION stores an exception message if the ROC could not be computed for that VEP.
pr_metrics (DataFrame, optional): Precision/recall metrics with columns: SCORE_SOURCE, PR_AUC, SOURCE_NAME
mwu_metrics (DataFrame, optional): Mann-Whitney U metrics with columns: SCORE_SOURCE, NEG_LOG10_MWU_PVAL, SOURCE_NAME
gene_general_metrics (DataFrame, optional): Gene-level general metrics with columns: SCORE_SOURCE, GENE_SYMBOL, NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME
gene_roc_metrics (DataFrame, optional): Gene-level ROC metrics with columns: SCORE_SOURCE, GENE_SYMBOL, ROC_AUC, EXCEPTION, SOURCE_NAME
gene_pr_metrics (DataFrame, optional): Gene-level precision/recall metrics with columns: SCORE_SOURCE, GENE_SYMBOL, PR_AUC, SOURCE_NAME
gene_mwu_metrics (DataFrame, optional): Gene-level Mann-Whitney U metrics with columns: SCORE_SOURCE, GENE_SYMBOL, NEG_LOG10_MWU_PVAL, SOURCE_NAME
roc_curve_coordinates (DataFrame, optional): ROC curve coordinates with columns: SCORE_SOURCE, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE, THRESHOLD
pr_curve_coordinates (DataFrame, optional): Precision/recall curve coordinates with columns: SCORE_SOURCE, PRECISION, RECALL, THRESHOLD
gene_roc_curve_coordinates (DataFrame, optional): Gene-level ROC curve coordinates with columns: SCORE_SOURCE, GENE_SYMBOL, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE, THRESHOLD
gene_pr_curve_coordinates (DataFrame, optional): Gene-level precision/recall curve coordinates with columns: SCORE_SOURCE, GENE_SYMBOL, PRECISION, RECALL, THRESHOLD
variants_included (DataFrame, optional): List of variants included for each VEP, including the user VEP. Columns: SCORE_SOURCE, GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE
gene_unique_variant_counts_df (DataFrame, optional): Count of unique variants per gene across all VEPs, with columns: GENE_SYMBOL, NUM_UNIQUE_VARIANTS

num_variants_included: int

num_user_variants: int

user_vep_name: str

general_metrics: pandas.DataFrame

roc_metrics: pandas.DataFrame

pr_metrics: pandas.DataFrame

mwu_metrics: pandas.DataFrame

gene_general_metrics: pandas.DataFrame

gene_roc_metrics: pandas.DataFrame

gene_pr_metrics: pandas.DataFrame

gene_mwu_metrics: pandas.DataFrame

roc_curve_coordinates: pandas.DataFrame

pr_curve_coordinates: pandas.DataFrame

gene_roc_curve_coordinates: pandas.DataFrame

gene_pr_curve_coordinates: pandas.DataFrame

variants_included: pandas.DataFrame

gene_unique_variant_counts_df: pandas.DataFrame
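
A typical way to consume the metrics tables is to rank VEPs by ROC_AUC while skipping rows where EXCEPTION is set. The sketch below works on a list of dictionaries, which is what `DataFrame.to_dict("records")` would produce from roc_metrics. It assumes an empty EXCEPTION is None or an empty string, and that NEG_LOG10_MWU_PVAL is the negative base-10 logarithm of the p-value, as the column name suggests. The helper names and all values are made up for illustration.

```python
def rank_by_roc_auc(rows):
    """Sort rows by ROC_AUC, highest first, skipping rows with an EXCEPTION."""
    usable = [row for row in rows if not row.get("EXCEPTION")]
    return sorted(usable, key=lambda row: row["ROC_AUC"], reverse=True)


def mwu_p_value(neg_log10_p):
    """Recover a p-value from its negative log10 (assumed column meaning)."""
    return 10.0 ** (-neg_log10_p)


# Made-up rows shaped like roc_metrics records.
roc_rows = [
    {"SCORE_SOURCE": "VEP_A", "ROC_AUC": 0.81, "EXCEPTION": None,
     "SOURCE_NAME": "Predictor A"},
    {"SCORE_SOURCE": "VEP_B", "ROC_AUC": None,
     "EXCEPTION": "ROC could not be computed", "SOURCE_NAME": "Predictor B"},
    {"SCORE_SOURCE": "VEP_C", "ROC_AUC": 0.92, "EXCEPTION": None,
     "SOURCE_NAME": "Predictor C"},
]

ranked = rank_by_roc_auc(roc_rows)
for row in ranked:
    print(row["SOURCE_NAME"], row["ROC_AUC"])
```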

class aigct.model.TaskPkViolations

dups_found: bool

variant_effect_label_dups: pandas.DataFrame

variant_effect_score_dups: pandas.DataFrame

variant_filter_dups: pandas.DataFrame

variant_filter_gene_dups: pandas.DataFrame

variant_filter_variant_dups: pandas.DataFrame

class aigct.model.PkViolations

dups_found: bool

variant_dups: pandas.DataFrame

variant_effect_source_dups: pandas.DataFrame

task_violations: dict[str, TaskPkViolations]
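
These two classes are undocumented, so the following is a sketch under stated assumptions: that "Pk" stands for primary key, that a "dup" is a key value occurring more than once, and that task_violations is keyed by task name. The stand-in classes and helpers below are hypothetical and only mirror the dups_found and task_violations attributes; whether the top-level dups_found already accounts for task-level duplicates is not stated in the source.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


def find_duplicate_keys(keys):
    """Return the key values that occur more than once, in first-seen order."""
    counts = Counter(keys)
    return [key for key, count in counts.items() if count > 1]


@dataclass
class TaskViolationsSketch:
    """Hypothetical stand-in for TaskPkViolations (dups_found only)."""
    dups_found: bool = False


@dataclass
class ViolationsSketch:
    """Hypothetical stand-in for PkViolations."""
    dups_found: bool = False
    task_violations: Dict[str, TaskViolationsSketch] = field(default_factory=dict)


def any_dups(violations):
    """True if the top level or any task reports duplicates."""
    return violations.dups_found or any(
        task.dups_found for task in violations.task_violations.values())


# Made-up variant keys; the first one appears twice.
keys = [("hg38", "1", 12345, "G", "A"),
        ("hg38", "1", 12345, "G", "A"),
        ("hg38", "2", 67890, "C", "T")]
print(find_duplicate_keys(keys))

report = ViolationsSketch(
    dups_found=False,
    task_violations={"task_a": TaskViolationsSketch(dups_found=True)})
print(any_dups(report))
```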

class aigct.model.VEAnalysisCalibrationResult

Represents the result of calling VEAnalyzer.compute_calibration_metrics.

Attributes

num_variants_included (int): The total number of unique variants included in the calibration analysis.
vep_name (str): Name of the variant effect predictor (VEP) used in the calibration. It can be a system VEP or a user-supplied VEP name.
pr_curve_coordinates_df (DataFrame): Precision/recall curve coordinates for variants with columns: SCORE_SOURCE, PRECISION, RECALL, THRESHOLD
f1_curve_coordinates_df (DataFrame): F1 score curve coordinates for variants with columns: F1_SCORE, THRESHOLD
score_pathogenic_fraction_df (DataFrame): Statistics about positive and negative variants in different score bins. The variants are grouped into equal-sized bins based on their score, and the mean score and fraction of positive (pathogenic) variants in each bin are computed. Columns: SCORE_RANGE, LEFT_BOUNDARY_EXCLUSIVE, RIGHT_BOUNDARY_INCLUSIVE, MEAN_SCORE, NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS
scores_and_labels_df (DataFrame): List of variants included in the calibration analysis with columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE, BINARY_LABEL, RANK_SCORE

num_variants_included: int

vep_name: str

pr_curve_coordinates_df: pandas.DataFrame

f1_curve_coordinates_df: pandas.DataFrame

score_pathogenic_fraction_df: pandas.DataFrame

scores_and_labels_df: pandas.DataFrame
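
The binning described for score_pathogenic_fraction_df can be sketched in plain Python. This version assumes "equal sized" means bins holding roughly the same number of variants; the library may instead use equal-width score ranges, and its exact boundary handling is not reproduced here. `bin_scores` and the POSITIVE_FRACTION key are hypothetical names, and the input pairs are made up.

```python
def bin_scores(scores_and_labels, num_bins):
    """Group (score, label) pairs into equal-count bins ordered by score.

    Labels are 1 for positive (pathogenic) and 0 for negative.
    """
    ordered = sorted(scores_and_labels, key=lambda pair: pair[0])
    total = len(ordered)
    bins = []
    for i in range(num_bins):
        chunk = ordered[i * total // num_bins:(i + 1) * total // num_bins]
        if not chunk:
            continue  # fewer variants than bins
        num_positive = sum(label for _, label in chunk)
        bins.append({
            "MEAN_SCORE": sum(score for score, _ in chunk) / len(chunk),
            "NUM_VARIANTS": len(chunk),
            "NUM_POSITIVE_LABELS": num_positive,
            "NUM_NEGATIVE_LABELS": len(chunk) - num_positive,
            "POSITIVE_FRACTION": num_positive / len(chunk),
        })
    return bins


# Made-up (RANK_SCORE, BINARY_LABEL) pairs.
pairs = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 1),
         (0.6, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
for row in bin_scores(pairs, num_bins=2):
    print(row["NUM_VARIANTS"], row["POSITIVE_FRACTION"])
```

For a well-calibrated predictor, the positive fraction would be expected to rise with the mean score from one bin to the next.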