Creating Custom Statements
Deirokay is designed to be broadly extensible. Even if the rules you need for your data are not covered by the builtin set of Deirokay statements, you can subclass the BaseStatement class and implement your own statements. If you believe your statement will be useful for other users, we encourage you to propose it as a Merge Request so that it can become a builtin statement in a future release.
A custom Statement should override at least two methods from BaseStatement: report and result. The report method computes statistics and measurements that may be used by the result method. The report is attached to the Validation Result Document in order to compose a detailed list of facts about your data. The result method receives the report generated by report and returns either True (to signal success) or False (to signal that the validation failed).
The code below shows an example of a custom statement:
from deirokay.statements import BaseStatement


class ThereAreValuesGreaterThanX(BaseStatement):
    # Give your statement class a name (only for completeness,
    # its name is only useful when proposing it in a Merge Request)
    name = 'there_are_values_greater_than_x'
    # Declare which parameters are valid for this statement
    expected_parameters = ['x']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # All the arguments necessary for the statement are collected
        # from `self.options`. If they were templated arguments, they
        # should have already been rendered and you may transparently
        # use their final value in `report` and `result` methods.
        self.x = self.options.get('x')

    def report(self, df) -> dict:
        """
        Report statistics and facts about the data.
        """
        bools = df > self.x
        report = {
            'values_greater_than_x': list(bools[bools.all(axis=1)].index)
        }
        return report

    def result(self, report: dict) -> bool:
        """
        Use metrics from the report to indicate either success
        (True) or failure (False)
        """
        return len(report.get('values_greater_than_x')) > 0
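To see what the report and result steps above compute, the following minimal sketch reproduces the same pandas logic outside of Deirokay. It assumes the statement receives a pandas DataFrame containing only the columns in scope; the sample data and the variable names (df, rows, passed) are made up for illustration.

```python
import pandas as pd

x = 2  # plays the role of the `x` parameter of the statement

# Hypothetical sample data with two columns in scope
df = pd.DataFrame({'a': [1, 3, 5], 'b': [3, 3, 1]})

# Same logic as `report`: element-wise comparison, then keep the
# index labels of rows where ALL columns are greater than x
bools = df > x
rows = list(bools[bools.all(axis=1)].index)
report = {'values_greater_than_x': rows}

# Same logic as `result`: succeed if at least one row qualified
passed = len(report['values_greater_than_x']) > 0
```

Note that with more than one column in scope, a row is reported only when every column exceeds x, because of the all(axis=1) call. In this sample only the row labeled 1 qualifies, so the statement would pass.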
The following Validation Document shows how to use your custom Statement for a validation process:
validation_document = {
    "name": "VENDAS",
    "description": "Validation using custom statement",
    "items": [
        {
            "scope": "NUM_TRANSACAO01",
            "statements": [
                {
                    "type": "custom",
                    "location": "/home/custom_statement.py::"
                                "ThereAreValuesGreaterThanX",
                    "x": 2
                }
            ]
        }
    ]
}
Besides the parameters required by your custom statement ("x": 2 in the example above) and the custom statement type, you should pass a location parameter that tells Deirokay where to find your statement class. There is no need for the module file to be in the current directory: Deirokay imports your class automatically and uses it during the validation process.
The location parameter must follow the pattern path_to_module::class_name.
Currently, you can pass either a local path or an S3 key:
/home/ubuntu/my_module.py::MyStatementClass
s3://my-bucket/my_statements/module_of_statements.py::Stmt (make sure you have boto3 installed)
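The path_to_module::class_name pattern can be illustrated with a small sketch. The helper below, load_class_from_location, is hypothetical and is not Deirokay's own loader; it handles only local paths and uses the standard library's importlib to import a module from a file and look up the class by name. S3 keys are out of scope here.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_class_from_location(location: str):
    """Hypothetical helper: resolve 'path_to_module::class_name'."""
    # Split on the LAST '::' so the path itself is left untouched
    module_path, sep, class_name = location.rpartition('::')
    if not sep or not module_path or not class_name:
        raise ValueError(
            f"Expected 'path_to_module::class_name', got {location!r}"
        )
    # Import the module directly from its file path
    spec = importlib.util.spec_from_file_location(
        'custom_statement_module', module_path
    )
    if spec is None or spec.loader is None:
        raise ImportError(f'Cannot load a module from {module_path!r}')
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


# Demonstration with a throwaway module file (illustrative content)
with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / 'my_module.py'
    module_file.write_text('class MyStatementClass:\n    pass\n')
    cls = load_class_from_location(f'{module_file}::MyStatementClass')
    loaded_name = cls.__name__
```

Splitting on the last occurrence of :: keeps the sketch tolerant of unusual characters in the path portion.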