aigct.model
Classes that represent model objects.
Model objects are containers for data. They generally do not have behavior associated with them.
Classes
- VariantId: Model object that represents a variant id.
- VariantEffectSource: Model object that represents a variant effect source.
- VariantFilter: Model object that represents a named variant filter query.
- VEQueryCriteria: Model object that represents variant query criteria.
- VEAnalysisResult: Represents the result of calling VEAnalyzer.compute_metrics.
- VEAnalysisCalibrationResult: Represents the result of calling VEAnalyzer.compute_calibration_metrics.
Module Contents
- class aigct.model.VariantId
Model object that represents a variant id.
Attributes
- genome_assembly : str
Genome assembly symbol, e.g. hg38
- chromosome : str
Chromosome
- position : int
Position
- reference_nucleotide : str
Reference nucleotide
- alternate_nucleotide : str
Alternate nucleotide
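As a rough illustration of the shape of this container, the sketch below mirrors the five attributes in a frozen dataclass. `VariantIdSketch` is a hypothetical stand-in written for this example, not the library's class, and the field values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantIdSketch:
    """Hypothetical stand-in that mirrors the documented VariantId attributes."""

    genome_assembly: str       # e.g. "hg38"
    chromosome: str
    position: int
    reference_nucleotide: str
    alternate_nucleotide: str


# Illustrative values only; they do not refer to a real variant.
variant = VariantIdSketch(
    genome_assembly="hg38",
    chromosome="1",
    position=12345,
    reference_nucleotide="A",
    alternate_nucleotide="G",
)
```

Freezing the dataclass makes instances hashable, so equal variant ids collapse to one entry when placed in a set or used as dictionary keys.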
- class aigct.model.VariantEffectSource
Model object that represents a variant effect source.
Attributes
- code : str
A unique code that identifies the variant effect source
- name : str
A unique name of the source
- source_type : str
Type of the source, e.g. VEP
- description : str
Description
- class aigct.model.VariantFilter
Model object that represents a named variant filter query. The filter criteria consist of a list of genes, a list of variant ids, or both.
Attributes
- filter : Series
A series with the unique code, name, and description of the filter.
- filter_genes : DataFrame
A dataframe of gene symbols associated with the filter. If None, then filter_variants must not be None.
- filter_variants : DataFrame
A dataframe of variant ids associated with the filter. If None, then filter_genes must not be None.
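The rule that at least one of filter_genes and filter_variants must be present can be sketched as a constructor check. This is only an illustration: `VariantFilterSketch` is a hypothetical stand-in, plain Python lists and a dict replace the DataFrame and Series types, the dictionary keys are assumed, and the documentation does not say whether the library itself enforces the rule this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantFilterSketch:
    """Hypothetical stand-in; lists and a dict replace DataFrame and Series."""

    filter: dict                              # code, name, description
    filter_genes: Optional[list] = None       # gene symbols
    filter_variants: Optional[list] = None    # variant ids

    def __post_init__(self):
        # A filter with neither genes nor variants has no criteria at all.
        if self.filter_genes is None and self.filter_variants is None:
            raise ValueError(
                "at least one of filter_genes and filter_variants is required"
            )


# A gene-only filter is valid; the key names and values here are illustrative.
gene_only = VariantFilterSketch(
    filter={"CODE": "F1", "NAME": "example", "DESCRIPTION": "illustrative"},
    filter_genes=["BRCA1"],
)
```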
- class aigct.model.VEQueryCriteria
Model object that represents variant query criteria.
Attributes
- gene_symbols : list or DataFrame, optional
List of gene symbols
- include_genes : bool, optional
If gene_symbols is provided, indicates whether to limit variants to those associated with the gene_symbols or to exclude variants associated with them.
- variant_ids : DataFrame, optional
List of variant ids. The dataframe is expected to have the following columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE. If the column names are different, specify a value for column_name_map that maps them to the expected names.
- include_variant_ids : bool, optional
If variant_ids is provided, indicates whether to limit variants to the variant_ids provided or to fetch all variants except those in variant_ids.
- column_name_map : Dict, optional
A dictionary that maps the column names in variant_ids to the expected column names.
- allele_frequency_operator : str, optional
If allele_frequency is provided, this is one of “eq”, “gt”, “lt”, “ge”, “le”, i.e. limit variants to those whose allele frequency is equal to, greater than, etc. the allele_frequency value.
- allele_frequency : float, optional
Used in conjunction with allele_frequency_operator to limit variants to those meeting a certain allele frequency criterion.
- filter_names : str | list[str], optional
The name(s) of a system filter that can be used to limit the variants returned. If more than one is given, the filters are combined using a logical OR.
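To make the column_name_map and allele_frequency_operator semantics concrete, here is a minimal sketch that applies them to plain dictionaries rather than a DataFrame. The helper names and the `ALLELE_FREQUENCY` key are assumptions made for this example; the documentation does not describe how the library applies these criteria internally.

```python
import operator

# Operator codes listed for allele_frequency_operator, mapped to comparisons.
OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def rename_columns(rows, column_name_map):
    """Rename keys found in column_name_map; leave all other keys unchanged."""
    return [
        {column_name_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


def filter_by_allele_frequency(rows, allele_frequency_operator, allele_frequency):
    """Keep rows whose ALLELE_FREQUENCY (assumed key) satisfies the comparison."""
    compare = OPERATORS[allele_frequency_operator]
    return [row for row in rows if compare(row["ALLELE_FREQUENCY"], allele_frequency)]


# Illustrative input whose column names differ from the expected ones.
user_rows = [
    {"chr": "1", "pos": 100, "ALLELE_FREQUENCY": 0.001},
    {"chr": "2", "pos": 200, "ALLELE_FREQUENCY": 0.2},
]
renamed = rename_columns(user_rows, {"chr": "CHROMOSOME", "pos": "POSITION"})
rare = filter_by_allele_frequency(renamed, "lt", 0.01)
```

The map's keys are the caller's column names and its values are the expected names, matching the direction described for column_name_map.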
- class aigct.model.VEAnalysisResult
Represents the result of calling VEAnalyzer.compute_metrics.
Attributes
- num_variants_included : int
The total number of unique variants included in the analysis across all veps.
- num_user_variants : int
The number of user-supplied variants included in the analysis.
- user_vep_name : str
Name of the user vep
- general_metrics : DataFrame
Has the following columns: SCORE_SOURCE (short unique vep identifier), NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME (name of vep)
- roc_metrics : DataFrame, optional
ROC metrics with columns: SCORE_SOURCE, ROC_AUC, EXCEPTION, SOURCE_NAME. EXCEPTION stores an exception message if the ROC could not be computed for that vep.
- pr_metrics : DataFrame, optional
Precision/recall metrics with columns: SCORE_SOURCE, PR_AUC, SOURCE_NAME
- mwu_metrics : DataFrame, optional
Mann-Whitney U metrics with columns: SCORE_SOURCE, NEG_LOG10_MWU_PVAL, SOURCE_NAME
- gene_general_metrics : DataFrame, optional
Gene-level general metrics with columns: SCORE_SOURCE, GENE_SYMBOL, NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME
- gene_roc_metrics : DataFrame, optional
Gene-level ROC metrics with columns: SCORE_SOURCE, GENE_SYMBOL, ROC_AUC, EXCEPTION, SOURCE_NAME
- gene_pr_metrics : DataFrame, optional
Gene-level precision/recall metrics with columns: SCORE_SOURCE, GENE_SYMBOL, PR_AUC, SOURCE_NAME
- gene_mwu_metrics : DataFrame, optional
Gene-level Mann-Whitney U metrics with columns: SCORE_SOURCE, GENE_SYMBOL, NEG_LOG10_MWU_PVAL, SOURCE_NAME
- roc_curve_coordinates : DataFrame, optional
ROC curve coordinates with columns: SCORE_SOURCE, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE, THRESHOLD
- pr_curve_coordinates : DataFrame, optional
Precision/recall curve coordinates with columns: SCORE_SOURCE, PRECISION, RECALL, THRESHOLD
- gene_roc_curve_coordinates : DataFrame, optional
Gene-level ROC curve coordinates with columns: SCORE_SOURCE, GENE_SYMBOL, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE, THRESHOLD
- gene_pr_curve_coordinates : DataFrame, optional
Gene-level precision/recall curve coordinates with columns: SCORE_SOURCE, GENE_SYMBOL, PRECISION, RECALL, THRESHOLD
- variants_included : DataFrame, optional
List of variants included for each vep, including the user vep. Columns: SCORE_SOURCE, GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE
- gene_unique_variant_counts_df : DataFrame, optional
Count of unique variants per gene across all veps, with columns: GENE_SYMBOL, NUM_UNIQUE_VARIANTS
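To show what the count columns of general_metrics hold, the sketch below tallies variants and labels per SCORE_SOURCE from a list of plain dictionaries. It is not the library's implementation: the function name, the input layout, and the convention that a `BINARY_LABEL` of 1 means positive are assumptions made for the example.

```python
from collections import defaultdict


def general_metrics_sketch(rows):
    """Tally variants and positive/negative labels per SCORE_SOURCE."""
    counts = defaultdict(
        lambda: {
            "NUM_VARIANTS": 0,
            "NUM_POSITIVE_LABELS": 0,
            "NUM_NEGATIVE_LABELS": 0,
        }
    )
    for row in rows:
        entry = counts[row["SCORE_SOURCE"]]
        entry["NUM_VARIANTS"] += 1
        # Assumption: BINARY_LABEL is 1 for positive and 0 for negative.
        if row["BINARY_LABEL"] == 1:
            entry["NUM_POSITIVE_LABELS"] += 1
        else:
            entry["NUM_NEGATIVE_LABELS"] += 1
    return dict(counts)


# Illustrative rows; the vep identifiers are placeholders.
rows = [
    {"SCORE_SOURCE": "VEP_A", "BINARY_LABEL": 1},
    {"SCORE_SOURCE": "VEP_A", "BINARY_LABEL": 0},
    {"SCORE_SOURCE": "VEP_A", "BINARY_LABEL": 1},
    {"SCORE_SOURCE": "VEP_B", "BINARY_LABEL": 0},
]
metrics = general_metrics_sketch(rows)
```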
- class aigct.model.PkViolations
- task_violations: dict[str, TaskPkViolations]
- class aigct.model.VEAnalysisCalibrationResult
Represents the result of calling VEAnalyzer.compute_calibration_metrics.
Attributes
- num_variants_included : int
The total number of unique variants included in the calibration analysis.
- vep_name : str
Name of the variant effect predictor (VEP) used in the calibration. This can be a system vep or a user-supplied vep.
- pr_curve_coordinates_df : DataFrame
Precision/recall curve coordinates for variants with columns: SCORE_SOURCE, PRECISION, RECALL, THRESHOLD
- f1_curve_coordinates_df : DataFrame
F1 score curve coordinates for variants with columns: F1_SCORE, THRESHOLD
- score_pathogenic_fraction_df : DataFrame
Statistics about positive and negative variants in different score bins. The variants are grouped into equal-sized bins based on their score, and the mean score and the fraction of positive (pathogenic) variants in each bin are computed. Columns: SCORE_RANGE, LEFT_BOUNDARY_EXCLUSIVE, RIGHT_BOUNDARY_INCLUSIVE, MEAN_SCORE, NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS
- scores_and_labels_df : DataFrame
List of variants included in the calibration analysis with columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION, REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE, BINARY_LABEL, RANK_SCORE
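The binning behind score_pathogenic_fraction_df can be sketched as follows. This is a guess at the general idea, not the library's code: it assumes "equal sized" means equal-width intervals of the form (left, right], assigns the minimum score to the first bin, skips empty bins, and adds a `POSITIVE_FRACTION` key that is not among the documented columns.

```python
def score_pathogenic_fraction_sketch(scores_and_labels, num_bins):
    """Summarise (score, label) pairs in equal-width (left, right] bins."""
    scores = [score for score, _ in scores_and_labels]
    low, high = min(scores), max(scores)
    width = (high - low) / num_bins
    rows = []
    for i in range(num_bins):
        left = low + i * width
        # Use the exact maximum for the last bin to avoid rounding drift.
        right = high if i == num_bins - 1 else low + (i + 1) * width
        members = [
            (score, label)
            for score, label in scores_and_labels
            # The minimum score goes into the first bin even though the
            # left boundary is otherwise exclusive.
            if left < score <= right or (i == 0 and score == low)
        ]
        if not members:
            continue  # empty bins are skipped in this sketch
        positives = sum(1 for _, label in members if label == 1)
        rows.append({
            "LEFT_BOUNDARY_EXCLUSIVE": left,
            "RIGHT_BOUNDARY_INCLUSIVE": right,
            "MEAN_SCORE": sum(score for score, _ in members) / len(members),
            "NUM_VARIANTS": len(members),
            "NUM_POSITIVE_LABELS": positives,
            "NUM_NEGATIVE_LABELS": len(members) - positives,
            "POSITIVE_FRACTION": positives / len(members),  # extra, assumed key
        })
    return rows


# Illustrative (score, label) pairs; 1 marks a positive (pathogenic) label.
data = [(0.0, 0), (0.1, 0), (0.4, 1), (0.6, 0), (0.9, 1), (1.0, 1)]
bins = score_pathogenic_fraction_sketch(data, num_bins=2)
```

If the library instead uses bins with equal numbers of variants, the boundaries would come from score quantiles rather than a fixed width, but the per-bin statistics would be computed the same way.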