aigct.model
===========

.. py:module:: aigct.model

.. autoapi-nested-parse::

   Classes that represent model objects. Model objects are
   containers for data. They generally do not have behavior associated
   with them.


Classes
-------

.. autoapisummary::

   aigct.model.VariantId
   aigct.model.VariantEffectSource
   aigct.model.VariantFilter
   aigct.model.VEQueryCriteria
   aigct.model.VEAnalysisResult
   aigct.model.TaskPkViolations
   aigct.model.PkViolations
   aigct.model.VEAnalysisCalibrationResult


Module Contents
---------------

.. py:class:: VariantId

   Model object that represents a variant id.

   Attributes
   ----------
   genome_assembly : str
       Genome assembly symbol, e.g. hg38
   chromosome : str
       Chromosome
   position : int
       Position
   reference_nucleotide : str
       Reference nucleotide
   alternate_nucleotide : str
       Alternate nucleotide


   .. py:attribute:: genome_assembly
      :type: str


   .. py:attribute:: chromosome
      :type: str


   .. py:attribute:: position
      :type: int


   .. py:attribute:: reference_nucleotide
      :type: str


   .. py:attribute:: alternate_nucleotide
      :type: str


.. py:class:: VariantEffectSource

   Model object that represents a variant effect source.

   Attributes
   ----------
   code : str
       A unique code that identifies the variant effect source
   name : str
       A unique name of the source
   source_type : str
       Type of the source, e.g. VEP
   description : str
       Description


   .. py:attribute:: code
      :type: str


   .. py:attribute:: name
      :type: str


   .. py:attribute:: source_type
      :type: str


   .. py:attribute:: description
      :type: str


.. py:class:: VariantFilter

   Model object that represents a named variant filter query.
   The filter criteria consist of a list of genes, a list of
   variant ids, or both.

   Attributes
   ----------
   filter : Series
       A series with the unique code, name, and description of the
       filter.
   filter_genes : DataFrame
       A dataframe of gene symbols associated with the filter. If None,
       then filter_variants must not be None.
   filter_variants : DataFrame
       A dataframe of variant ids associated with the filter. If None,
       then filter_genes must not be None.


   .. py:attribute:: filter
      :type: pandas.Series


   .. py:attribute:: filter_genes
      :type: pandas.DataFrame


   .. py:attribute:: filter_variants
      :type: pandas.DataFrame


.. py:class:: VEQueryCriteria

   Model object that represents variant query criteria.

   Attributes
   ----------
   gene_symbols : list or DataFrame, optional
       List of gene symbols
   include_genes : bool, optional
       If gene_symbols is provided, indicates whether to limit variants
       to those associated with the gene_symbols or to exclude variants
       associated with the gene_symbols.
   variant_ids : DataFrame, optional
       List of variant ids. The dataframe is expected to have the
       following columns:
       GENOME_ASSEMBLY, CHROMOSOME, POSITION,
       REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE
       If the column names are different, specify a value for
       column_name_map mapping the column names to the expected names.
   include_variant_ids : bool, optional
       If variant_ids is provided, indicates whether to limit variants
       to the variant_ids provided or to fetch all variants except those
       in variant_ids.
   column_name_map : Dict, optional
       A dictionary that maps the column names in variant_ids to the
       expected column names.
   allele_frequency_operator : str, optional
       If allele_frequency is provided, this is one of "eq", "gt", "lt",
       "ge", "le", i.e. limit variants to those whose allele frequency
       is equal to, greater than, etc. the allele_frequency.
   allele_frequency : float, optional
       Used in conjunction with allele_frequency_operator to limit
       variants to those meeting a certain allele frequency criterion.
   filter_names : str | list[str], optional
       The name(s) of a system filter that can be used to limit the
       variants returned. If more than one is given, the filters are
       combined using a logical OR.


   .. py:attribute:: gene_symbols
      :type: list[str] | pandas.DataFrame | pandas.Series
      :value: None


   .. py:attribute:: include_genes
      :type: bool
      :value: True


   .. py:attribute:: variant_ids
      :type: pandas.DataFrame
      :value: None


   .. py:attribute:: include_variant_ids
      :type: bool
      :value: True


   .. py:attribute:: column_name_map
      :type: Dict
      :value: None


   .. py:attribute:: allele_frequency_operator
      :type: str
      :value: '='


   .. py:attribute:: allele_frequency
      :type: float
      :value: None


   .. py:attribute:: filter_names
      :type: str | list[str]
      :value: None


.. py:class:: VEAnalysisResult

   Represents the result of calling VEAnalyzer.compute_metrics.

   Attributes
   ----------
   num_variants_included : int
       The total number of unique variants included in the analysis
       across all veps.
   num_user_variants : int
       The number of user supplied variants included in the analysis
   user_vep_name : str
       Name of user vep
   general_metrics : DataFrame
       Has the following columns:
       SCORE_SOURCE - Short unique vep identifier,
       NUM_VARIANTS, NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS,
       SOURCE_NAME - Name of vep
   roc_metrics : DataFrame, optional
       ROC metrics with columns:
       SCORE_SOURCE, ROC_AUC, EXCEPTION, SOURCE_NAME
       EXCEPTION stores an exception message in the event the ROC
       could not be computed for that vep.
   pr_metrics : DataFrame, optional
       Precision/Recall metrics containing columns:
       SCORE_SOURCE, PR_AUC, SOURCE_NAME
   mwu_metrics : DataFrame, optional
       Mann-Whitney U metrics containing columns:
       SCORE_SOURCE, NEG_LOG10_MWU_PVAL, SOURCE_NAME
   gene_general_metrics : DataFrame, optional
       Gene-level general metrics with columns:
       SCORE_SOURCE, GENE_SYMBOL, NUM_VARIANTS,
       NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS, SOURCE_NAME
   gene_roc_metrics : DataFrame, optional
       Gene-level ROC metrics with columns:
       SCORE_SOURCE, GENE_SYMBOL, ROC_AUC, EXCEPTION, SOURCE_NAME
   gene_pr_metrics : DataFrame, optional
       Gene-level precision/recall metrics with columns:
       SCORE_SOURCE, GENE_SYMBOL, PR_AUC, SOURCE_NAME
   gene_mwu_metrics : DataFrame, optional
       Gene-level Mann-Whitney U metrics with columns:
       SCORE_SOURCE, GENE_SYMBOL, NEG_LOG10_MWU_PVAL, SOURCE_NAME
   roc_curve_coordinates : DataFrame, optional
       Columns: SCORE_SOURCE, FALSE_POSITIVE_RATE, TRUE_POSITIVE_RATE,
       THRESHOLD
   pr_curve_coordinates : DataFrame, optional
       Columns: SCORE_SOURCE,
       PRECISION, RECALL, THRESHOLD
   gene_roc_curve_coordinates : DataFrame, optional
       Gene-level ROC curve coordinates with columns:
       SCORE_SOURCE, GENE_SYMBOL, FALSE_POSITIVE_RATE,
       TRUE_POSITIVE_RATE, THRESHOLD
   gene_pr_curve_coordinates : DataFrame, optional
       Gene-level precision/recall curve coordinates with columns:
       SCORE_SOURCE, GENE_SYMBOL, PRECISION, RECALL, THRESHOLD
   variants_included : DataFrame, optional
       List of variants included for each vep, including the user vep.
       Columns: SCORE_SOURCE, GENOME_ASSEMBLY, CHROMOSOME, POSITION,
       REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE
   gene_unique_variant_counts_df : DataFrame, optional
       Count of unique variants per gene across all veps with columns:
       GENE_SYMBOL, NUM_UNIQUE_VARIANTS


   .. py:attribute:: num_variants_included
      :type: int


   .. py:attribute:: num_user_variants
      :type: int


   .. py:attribute:: user_vep_name
      :type: str


   .. py:attribute:: general_metrics
      :type: pandas.DataFrame


   .. py:attribute:: roc_metrics
      :type: pandas.DataFrame


   .. py:attribute:: pr_metrics
      :type: pandas.DataFrame


   .. py:attribute:: mwu_metrics
      :type: pandas.DataFrame


   .. py:attribute:: gene_general_metrics
      :type: pandas.DataFrame


   .. py:attribute:: gene_roc_metrics
      :type: pandas.DataFrame


   .. py:attribute:: gene_pr_metrics
      :type: pandas.DataFrame


   .. py:attribute:: gene_mwu_metrics
      :type: pandas.DataFrame


   .. py:attribute:: roc_curve_coordinates
      :type: pandas.DataFrame


   .. py:attribute:: pr_curve_coordinates
      :type: pandas.DataFrame


   .. py:attribute:: gene_roc_curve_coordinates
      :type: pandas.DataFrame


   .. py:attribute:: gene_pr_curve_coordinates
      :type: pandas.DataFrame


   .. py:attribute:: variants_included
      :type: pandas.DataFrame


   .. py:attribute:: gene_unique_variant_counts_df
      :type: pandas.DataFrame


.. py:class:: TaskPkViolations

   .. py:attribute:: dups_found
      :type: bool


   .. py:attribute:: variant_effect_label_dups
      :type: pandas.DataFrame


   .. py:attribute:: variant_effect_score_dups
      :type: pandas.DataFrame


   .. py:attribute:: variant_filter_dups
      :type: pandas.DataFrame


   .. py:attribute:: variant_filter_gene_dups
      :type: pandas.DataFrame


   .. py:attribute:: variant_filter_variant_dups
      :type: pandas.DataFrame


.. py:class:: PkViolations

   .. py:attribute:: dups_found
      :type: bool


   .. py:attribute:: variant_dups
      :type: pandas.DataFrame


   .. py:attribute:: variant_effect_source_dups
      :type: pandas.DataFrame


   .. py:attribute:: task_violations
      :type: dict[str, TaskPkViolations]


.. py:class:: VEAnalysisCalibrationResult

   Represents the result of calling VEAnalyzer.compute_calibration_metrics.

   Attributes
   ----------
   num_variants_included : int
       The total number of unique variants included in the calibration
       analysis.
   vep_name : str
       Name of the variant effect predictor (VEP) used in the
       calibration. It can be a system vep or a user supplied vep name.
   pr_curve_coordinates_df : DataFrame
       Precision-Recall curve coordinates for variants with columns:
       SCORE_SOURCE, PRECISION, RECALL, THRESHOLD
   f1_curve_coordinates_df : DataFrame
       F1 score curve coordinates for variants with columns:
       F1_SCORE, THRESHOLD
   score_pathogenic_fraction_df : DataFrame
       Statistics about positive and negative variants in different
       score bins. The variants are grouped into equal sized bins based
       on their score, and the mean score and fraction of positive
       (pathogenic) variants in each bin are computed.
       Columns: SCORE_RANGE, LEFT_BOUNDARY_EXCLUSIVE,
       RIGHT_BOUNDARY_INCLUSIVE, MEAN_SCORE, NUM_VARIANTS,
       NUM_POSITIVE_LABELS, NUM_NEGATIVE_LABELS
   scores_and_labels_df : DataFrame
       List of variants included in the calibration analysis with
       columns: GENOME_ASSEMBLY, CHROMOSOME, POSITION,
       REFERENCE_NUCLEOTIDE, ALTERNATE_NUCLEOTIDE, BINARY_LABEL,
       RANK_SCORE


   .. py:attribute:: num_variants_included
      :type: int


   .. py:attribute:: vep_name
      :type: str


   .. py:attribute:: pr_curve_coordinates_df
      :type: pandas.DataFrame


   .. py:attribute:: f1_curve_coordinates_df
      :type: pandas.DataFrame


   .. py:attribute:: score_pathogenic_fraction_df
      :type: pandas.DataFrame


   .. py:attribute:: scores_and_labels_df
      :type: pandas.DataFrame
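
The per-bin statistics described for ``score_pathogenic_fraction_df`` can be
illustrated with a minimal, standard-library sketch. It is not the library's
implementation. It assumes that "equal sized" means equal-width bins over the
score range 0 to 1, with each bin excluding its left boundary and including
its right boundary, as the column names suggest; the actual binning in
``aigct`` may differ. ``bin_scores`` is a hypothetical helper, and the scores
and labels are made up.

.. code-block:: python

   from math import ceil


   def bin_scores(scores, labels, num_bins=4, low=0.0, high=1.0):
       """Summarize scores in equal-width bins of the form (left, right].

       Hypothetical helper for illustration only; not part of aigct.
       """
       width = (high - low) / num_bins
       grouped = [[] for _ in range(num_bins)]
       for score, label in zip(scores, labels):
           # ceil(...) - 1 places a score that sits exactly on a boundary
           # into the bin whose right edge it is. Clamping keeps scores at
           # or below `low`, or above `high`, in the first or last bin.
           index = min(max(ceil((score - low) / width) - 1, 0), num_bins - 1)
           grouped[index].append((score, label))

       rows = []
       for i, members in enumerate(grouped):
           positives = sum(1 for _, label in members if label == 1)
           mean_score = (
               sum(score for score, _ in members) / len(members)
               if members else None
           )
           rows.append({
               "LEFT_BOUNDARY_EXCLUSIVE": low + i * width,
               "RIGHT_BOUNDARY_INCLUSIVE": low + (i + 1) * width,
               "MEAN_SCORE": mean_score,
               "NUM_VARIANTS": len(members),
               "NUM_POSITIVE_LABELS": positives,
               "NUM_NEGATIVE_LABELS": len(members) - positives,
           })
       return rows


   # Made-up scores and binary labels (1 = pathogenic), not real data.
   rows = bin_scores(
       [0.1, 0.2, 0.3, 0.6, 0.9, 0.95], [0, 0, 1, 1, 1, 0], num_bins=2
   )
   for row in rows:
       if row["NUM_VARIANTS"]:
           fraction = row["NUM_POSITIVE_LABELS"] / row["NUM_VARIANTS"]
           print(row["RIGHT_BOUNDARY_INCLUSIVE"], round(fraction, 3))

The fraction of positive variants in a bin is derived here as
``NUM_POSITIVE_LABELS / NUM_VARIANTS``, which is the quantity one would
compare against ``MEAN_SCORE`` when judging how well scores are calibrated.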