deirokay.statements.builtin.statistic_in_interval.StatisticInInterval
- class deirokay.statements.builtin.statistic_in_interval.StatisticInInterval(*args, **kwargs)
Bases: BaseStatement
Compare the actual value of a statistic for the scope against a list of comparison expressions.
The available options are:
- statistic: One (or a list) of the following: ‘min’, ‘max’, ‘mean’, ‘std’, ‘var’, ‘count’, ‘nunique’, ‘sum’, ‘median’, ‘mode’.
- One or more of the following comparators: <, <=, ==, !=, =~, !~, >=, >.
- atol: Absolute tolerance (for =~ and !~). Default is 0.0.
- rtol: Relative tolerance (for =~ and !~). Default is 1e-09.
- combination_logic: ‘and’ or ‘or’. Default is ‘and’.
Multiple comparison expressions can be used to represent multiple conditions. The combination_logic option can be set to express the logical relationship when grouping two or more comparisons.
You may provide, for instance, [‘min’, ‘max’] as the statistic to test whether all values in the scope are within a range of values.
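The option semantics above can be sketched in plain Python. This is an illustration only, not Deirokay's implementation: `evaluate` and `EXACT` are hypothetical names, and the sketch assumes that `=~` behaves like `math.isclose` (whose defaults happen to match the documented `atol` and `rtol` defaults), with `!~` as its negation.

```python
import math
import operator

# Hypothetical mapping from comparator symbols to exact comparisons.
EXACT = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def evaluate(value, comparisons, combination_logic="and", atol=0.0, rtol=1e-09):
    """Check one statistic value against a dict of {comparator: limit}."""
    outcomes = []
    for symbol, limit in comparisons.items():
        if symbol in EXACT:
            outcomes.append(EXACT[symbol](value, limit))
        elif symbol in ("=~", "!~"):
            # Assumption: approximate equality follows math.isclose semantics.
            close = math.isclose(value, limit, rel_tol=rtol, abs_tol=atol)
            outcomes.append(close if symbol == "=~" else not close)
        else:
            raise ValueError(f"Unknown comparator: {symbol}")
    if combination_logic == "and":
        return all(outcomes)
    if combination_logic == "or":
        return any(outcomes)
    raise ValueError(f"Unknown combination_logic: {combination_logic}")
```

Under these assumptions, `evaluate(0.45, {">": 0.4, "!~": 0.5, "<": 0.6})` passes, while `evaluate(0.15, {"<": 0.1, ">": 0.2}, combination_logic="or")` fails because neither branch holds. For a list of statistics such as [‘min’, ‘max’], the same check would be applied to each statistic in turn.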
Examples
To check if the mean of the ‘a’ column is between 0.4 and 0.6 and not equal (approx.) to 0.5, its standard deviation is less than 0.1 or greater than 0.2, and all the values are between 0 and 1:
{
  "scope": "a",
  "statements": [
    {
      "type": "statistic_in_interval",
      "statistic": "mean",
      ">": 0.4,
      "!~": 0.5,
      "<": 0.6
    },
    {
      "type": "statistic_in_interval",
      "statistic": "std",
      "<": 0.1,
      ">": 0.2,
      "combination_logic": "or"
    },
    {
      "type": "statistic_in_interval",
      "statistic": ["min", "max"],
      ">=": 0,
      "<=": 1
    }
  ]
}
Methods
attach_backend(backend): Generate a subclass that concretizes multibackend backend methods into their intended name.
get_backend(): Get current active backend for this class.
profile(df): Given a template data table, generate a statement dict from it.
register_backend_method(alias_for, func, backend): Proxy for register_backend_method to register an existing function as a backend-specific method.
report(df): Receive a DataFrame containing only columns on the scope of validation and return a report of related metrics that can be used later to declare this Statement as fulfilled or failed.
result(report): Receive the report previously generated and declare this statement as either fulfilled (True) or failed (False).
Attributes
expected_parameters: Parameters expected for this statement.
name: Statement name when referred to in Validation Documents (only valid for Deirokay built-in statements).
Backends supported by this resource.
- ALLOWED_STATISTICS = ['min', 'max', 'mean', 'std', 'var', 'count', 'nunique', 'sum', 'median', 'mode']
- __call__(df: DeirokayDataSource) → dict
Run statement instance.
- classmethod __init_subclass__() → None
Validate subclassed statement.
- classmethod __post_attach_backend__()
This classmethod can optionally be overridden to serve as a callback that runs when the attach_backend() method is called.
- classmethod attach_backend(backend: Backend) → Type[_AnyMultiBackendClass]
Generate a subclass that concretizes multibackend backend methods into their intended name. The methods marked with the given backend will compose the returned class.
- Parameters
cls (type) – Class to be subclassed with the given backend.
backend (Backend) – Backend to be selected.
- Returns
Subclass of the current class with methods filtered for the given backend.
- Return type
Type[MultiBackendMixin]
- expected_parameters: List[str] = ['statistic', '<', '<=', '>', '>=', '==', '!=', '=~', '!~', 'combination_logic', 'atol', 'rtol']
Parameters expected for this statement.
- Type
List[str]
- classmethod get_backend() → Backend
Get current active backend for this class.
- Returns
The current active backend.
- Return type
Backend
- Raises
InvalidBackend – Backend not set or not a valid execution class.
- name: str = 'statistic_in_interval'
Statement name when referred to in Validation Documents (only valid for Deirokay built-in statements).
- Type
str
- static profile(df: DeirokayDataSource) → Dict[str, Any]
Given a template data table, generate a statement dict from it.
- Parameters
df (DataFrame) – The DataFrame to be used as template.
- Returns
Statement dict.
- Return type
dict
- Raises
NotImplementedError – If this method is not implemented by the subclass or the profile generation for this statement was intentionally skipped.
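As an illustration of what profiling could produce, the sketch below derives a `statistic_in_interval` statement dict from template values. `profile_min_max` is a hypothetical helper that works on a plain list rather than a DataFrame, and the choice of bounding future data by the template's observed range is an assumption; the statement that Deirokay's own `profile` generates may differ.

```python
def profile_min_max(values):
    """Build a statement dict bounding future data by the template's range."""
    return {
        "type": "statistic_in_interval",
        "statistic": ["min", "max"],
        ">=": min(values),  # lower bound taken from the template data
        "<=": max(values),  # upper bound taken from the template data
    }
```

The returned dict has the same shape as the statements shown in the Examples section, so it could be placed in a validation document's "statements" list.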
- classmethod register_backend_method(alias_for: str, func: Callable[[...], Any], backend: Backend) → None
Proxy for register_backend_method to register an existing function as a backend-specific method.
- Parameters
alias_for (str) – The name of the method to be substituted with a backend-specific version.
func (AnyCallable) – Existing function to be registered as a method.
backend (Backend) – Backend for the method.
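The relationship between `register_backend_method` and `attach_backend` can be pictured with a small registry. This is a simplified sketch of the general pattern, not Deirokay's code: the `Backend` enum, the `MultiBackend` class, and the `_registry` attribute below are stand-ins defined only for this example.

```python
from enum import Enum


class Backend(Enum):
    """Stand-in enum for this sketch only."""

    PANDAS = "pandas"
    DASK = "dask"


class MultiBackend:
    # Hypothetical registry: {(method name, backend): function}.
    _registry = {}

    @classmethod
    def register_backend_method(cls, alias_for, func, backend):
        # Remember which function implements `alias_for` on `backend`.
        cls._registry[(alias_for, backend)] = func

    @classmethod
    def attach_backend(cls, backend):
        # Keep only the methods registered for the requested backend and
        # expose them under their intended names on a new subclass.
        methods = {
            alias: func
            for (alias, registered), func in cls._registry.items()
            if registered is backend
        }
        return type(f"{cls.__name__}_{backend.name}", (cls,), methods)
```

After registering one `report` function per backend, `MultiBackend.attach_backend(Backend.DASK)` would return a subclass whose `report` is the Dask-specific function, leaving the base class untouched.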
- report(df: DeirokayDataSource) → dict
Receive a DataFrame containing only the columns in the scope of validation and return a report of related metrics that can be used later to declare this Statement as fulfilled or failed.
- Parameters
df (DataFrame) – The scoped DataFrame columns to be analysed in this report by this statement.
- Returns
A dictionary of useful statistics about the target columns.
- Return type
dict
- result(report: dict) → bool
Receive the report previously generated and declare this statement as either fulfilled (True) or failed (False).
- Parameters
report (dict) – Report generated by report method. Should ideally contain all statistics necessary to evaluate the statement validity.
- Returns
Whether or not this statement passed.
- Return type
bool
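The split between `report` and `result` can be illustrated with a small stand-in class. This is a sketch under assumptions, not Deirokay's code: `MeanInInterval` is a hypothetical class, it works on a plain list instead of a DataFrame, and the report layout (`{"mean": ...}`) is invented for the example.

```python
import statistics


class MeanInInterval:
    """Hypothetical statement: the mean must lie strictly inside (low, high)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def report(self, values):
        # Phase 1: collect the metrics needed to judge the statement.
        return {"mean": statistics.fmean(values)}

    def result(self, report):
        # Phase 2: decide pass or fail from the report alone.
        return self.low < report["mean"] < self.high
```

Keeping the two phases separate means the report can be stored or inspected on its own, and the pass or fail decision can be reproduced later without touching the data again.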