deirokay.airflow.deirokay_operator.DeirokayOperator
- class deirokay.airflow.deirokay_operator.DeirokayOperator(*args, **kwargs)[source]
Bases:
BaseOperatorParse a DataFrame from a file using Deirokay options and validate against a Deirokay Validation Document. You may choose different severity levels to mark a task as “soft failure” (skipped state) or “normal failure” (failed state).
- Parameters
data (Optional[str]) – File to be parsed into Deirokay.
path_to_file (Optional[str]) – (Deprecated) File to be parsed into Deirokay. Use data instead.
options (Union[dict, str]) – A dict or a local/S3 path to a YAML/JSON options file.
against (Union[dict, str]) – A dict or a local/S3 path to a YAML/JSON validation document file.
backend (Optional[Backend]) – Backend to use when processing data.
template (Optional[dict]) – Map of templates to be passed to Deirokay validation.
save_to (Optional[str], optional) – Where validation logs will be saved to. If None, no log is saved. By default None.
soft_fail_level (Union[SeverityLevel, int]) – Minimum Deirokay severity level to trigger a “soft failure”. Any statement with lower severity level will only raise a warning. Set to None to never trigger. By default SeverityLevel.MINIMAL (1).
hard_fail_level (Union[SeverityLevel, int]) – Minimum Deirokay severity level to trigger a task failure. Set to None to never trigger. By default SeverityLevel.CRITICAL (5).
reader_kwargs (Optional[dict]) – Additional keyword arguments for Deirokay.data_reader method.
validator_kwargs (Optional[dict]) – Additional keyword arguments for Deirokay.validate method.
**kwargs (Optional[dict]) – Additional keyword arguments for BaseOperator.
Methods
Sets inlets to this operator
Adds only new items to item set
Defines the outlets of this operator
Clears the state of task instances associated with the task, following the parameters specified.
Performs dry run for the operator - just render template fields.
This is the main method to derive when creating an operator.
Get set of the direct relative ids to the current task, upstream or downstream.
Get list of the direct relatives to the current task, upstream or downstream.
For an operator, gets the URL that the external links specified in extra_links should point to.
Get a flat set of relatives' ids, either upstream or downstream.
Get a flat list of relatives, either upstream or downstream.
- return
list of inlets defined for this operator
- return
list of outlets defined for this operator
Stringified DAGs and operators contain exactly these fields.
Get a set of task instance related to this task for a specific date range.
Fetch a Jinja template environment from the DAG or instantiate empty environment if no DAG.
Returns True if the Operator has been assigned to a DAG.
Return if this operator can use smart service.
Override this method to cleanup subprocesses when a task instance gets killed.
This hook is triggered right after self.execute() is called.
This hook is triggered right before self.execute() is called.
Lock task for execution to disable custom action in __setattr__ and returns a copy of the task
Hook that is triggered after the templated fields get replaced by their content.
Render a templated string.
Template all attributes listed in template_fields.
Getting the content of files for template_field / template_ext
Run a set of task instances for a date range.
Set a task or a task list to be directly downstream from the current task.
Set a task or a task list to be directly upstream from the current task.
Resolves upstream dependencies of a task.
Update relationship information about another TaskMixin.
Pull XComs that optionally meet certain criteria.
Make an XCom available for tasks to pull.
Attributes
Returns the Operator's DAG if set, otherwise raises an error
Returns dag id if it has one or an adhoc + owner
Returns the set of dependencies for the operator.
list of tasks directly downstream
set of ids of tasks directly downstream
extra links for the task
Returns dictionary of all global extra links
Used to determine if an Operator is inherited from DummyOperator
Required by TaskMixin
Returns a logger.
Returns dictionary of all extra links for the operator
Returns reference to XCom pushed by current operator
Total priority weight for the task.
Required by TaskMixin
type of the task
list of tasks directly upstream
set of ids of tasks directly upstream
- __deepcopy__(memo)
Hack sorting double chained task lists by task_id to avoid hitting max_depth on deepcopy operations.
- __gt__(other)
Called for [Operator] > [Outlet], so that if other is an attr annotated object it is set as an outlet of this Operator.
- __lshift__(other: Union[TaskMixin, Sequence[TaskMixin]])
Implements Task << Task
- __lt__(other)
Called for [Inlet] > [Operator] or [Operator] < [Inlet], so that if other is an attr annotated object it is set as an inlet to this operator
- __or__(other)
Called for [This Operator] | [Operator], The inlets of other will be set to pickup the outlets from this operator. Other will be set as a downstream task of this operator.
- __rlshift__(other: Union[TaskMixin, Sequence[TaskMixin]])
Called for Task << [Task] because list don’t have __lshift__ operators.
- __rrshift__(other: Union[TaskMixin, Sequence[TaskMixin]])
Called for Task >> [Task] because list don’t have __rshift__ operators.
- __rshift__(other: Union[TaskMixin, Sequence[TaskMixin]])
Implements Task >> Task
- add_inlets(inlets: Iterable[Any])
Sets inlets to this operator
- add_only_new(item_set: Set[str], item: str) None
Adds only new items to item set
- add_outlets(outlets: Iterable[Any])
Defines the outlets of this operator
- clear(start_date: Optional[datetime] = None, end_date: Optional[datetime] = None, upstream: bool = False, downstream: bool = False, session: Session = None)
Clears the state of task instances associated with the task, following the parameters specified.
- property dag: Any
Returns the Operator’s DAG if set, otherwise raises an error
- property dag_id: str
Returns dag id if it has one or an adhoc + owner
- deps: Iterable[BaseTIDep] = frozenset({<TIDep(Trigger Rule)>, <TIDep(Not In Retry Period)>, <TIDep(Previous Dagrun State)>, <TIDep(Not Previously Skipped)>})
Returns the set of dependencies for the operator. These differ from execution context dependencies in that they are specific to tasks and can be extended/overridden by subclasses.
- property downstream_list: List[BaseOperator]
list of tasks directly downstream
- Type
@property
- property downstream_task_ids: Set[str]
set of ids of tasks directly downstream
- Type
@property
- dry_run() None
Performs dry run for the operator - just render template fields.
- execute(context: dict)[source]
This is the main method to derive when creating an operator. Context is the same dictionary used as when rendering jinja templates.
Refer to get_template_context for more context.
- extra_links
extra links for the task
- Type
@property
- get_direct_relative_ids(upstream: bool = False) Set[str]
Get set of the direct relative ids to the current task, upstream or downstream.
- get_direct_relatives(upstream: bool = False) List[BaseOperator]
Get list of the direct relatives to the current task, upstream or downstream.
- get_extra_links(dttm: datetime, link_name: str) Optional[Dict[str, Any]]
For an operator, gets the URL that the external links specified in extra_links should point to.
- Raises
ValueError – The error message of a ValueError will be passed on through to the fronted to show up as a tooltip on the disabled link
- Parameters
dttm – The datetime parsed execution date for the URL being searched for
link_name – The name of the link we’re looking for the URL for. Should be one of the options specified in extra_links
- Returns
A URL
- get_flat_relative_ids(upstream: bool = False, found_descendants: Optional[Set[str]] = None) Set[str]
Get a flat set of relatives’ ids, either upstream or downstream.
- get_flat_relatives(upstream: bool = False)
Get a flat list of relatives, either upstream or downstream.
- get_inlet_defs()
- Returns
list of inlets defined for this operator
- get_outlet_defs()
- Returns
list of outlets defined for this operator
- classmethod get_serialized_fields()
Stringified DAGs and operators contain exactly these fields.
- get_task_instances(start_date: Optional[datetime] = None, end_date: Optional[datetime] = None, session: Session = None) List[TaskInstance]
Get a set of task instance related to this task for a specific date range.
- get_template_env() Environment
Fetch a Jinja template environment from the DAG or instantiate empty environment if no DAG.
- global_operator_extra_link_dict
Returns dictionary of all global extra links
- has_dag()
Returns True if the Operator has been assigned to a DAG.
- property inherits_from_dummy_operator
Used to determine if an Operator is inherited from DummyOperator
- is_smart_sensor_compatible()
Return if this operator can use smart service. Default False.
- property leaves: List[BaseOperator]
Required by TaskMixin
- property log: Logger
Returns a logger.
- on_kill() None
Override this method to cleanup subprocesses when a task instance gets killed. Any use of the threading, subprocess or multiprocessing module within an operator needs to be cleaned up or it will leave ghost processes behind.
- operator_extra_link_dict
Returns dictionary of all extra links for the operator
- operator_extra_links: Iterable['BaseOperatorLink'] = ()
- property output
Returns reference to XCom pushed by current operator
- pool: str = ''
- post_execute(context: Any, result: Any = None)
This hook is triggered right after self.execute() is called. It is passed the execution context and any results returned by the operator.
- pre_execute(context: Any)
This hook is triggered right before self.execute() is called.
- prepare_for_execution() BaseOperator
Lock task for execution to disable custom action in __setattr__ and returns a copy of the task
- prepare_template() None
Hook that is triggered after the templated fields get replaced by their content. If you need your operator to alter the content of the file before the template is rendered, it should override this method to do so.
- property priority_weight_total: int
Total priority weight for the task. It might include all upstream or downstream tasks. depending on the weight rule.
WeightRule.ABSOLUTE - only own weight
WeightRule.DOWNSTREAM - adds priority weight of all downstream tasks
WeightRule.UPSTREAM - adds priority weight of all upstream tasks
- render_template(content: Any, context: Dict, jinja_env: Optional[Environment] = None, seen_oids: Optional[Set] = None) Any
Render a templated string. The content can be a collection holding multiple templated strings and will be templated recursively.
- Parameters
content (Any) – Content to template. Only strings can be templated (may be inside collection).
context (dict) – Dict with values to apply on templated content
jinja_env (jinja2.Environment) – Jinja environment. Can be provided to avoid re-creating Jinja environments during recursion.
seen_oids (set) – template fields already rendered (to avoid RecursionError on circular dependencies)
- Returns
Templated content
- render_template_fields(context: Dict, jinja_env: Optional[Environment] = None) None
Template all attributes listed in template_fields. Note this operation is irreversible.
- Parameters
context (dict) – Dict with values to apply on content
jinja_env (jinja2.Environment) – Jinja environment
- resolve_template_files() None
Getting the content of files for template_field / template_ext
- property roots: List[BaseOperator]
Required by TaskMixin
- run(start_date: Optional[datetime] = None, end_date: Optional[datetime] = None, ignore_first_depends_on_past: bool = True, ignore_ti_state: bool = False, mark_success: bool = False) None
Run a set of task instances for a date range.
- set_downstream(task_or_task_list: Union[TaskMixin, Sequence[TaskMixin]]) None
Set a task or a task list to be directly downstream from the current task. Required by TaskMixin.
- set_upstream(task_or_task_list: Union[TaskMixin, Sequence[TaskMixin]]) None
Set a task or a task list to be directly upstream from the current task. Required by TaskMixin.
- set_xcomargs_dependencies() None
Resolves upstream dependencies of a task. In this way passing an
XComArgas value for a template field will result in creating upstream relation between two tasks.Example:
with DAG(...): generate_content = GenerateContentOperator(task_id="generate_content") send_email = EmailOperator(..., html_content=generate_content.output) # This is equivalent to with DAG(...): generate_content = GenerateContentOperator(task_id="generate_content") send_email = EmailOperator( ..., html_content="{{ task_instance.xcom_pull('generate_content') }}" ) generate_content >> send_email
- shallow_copy_attrs: Tuple[str, ...] = ()
- supports_lineage = False
- property task_type: str
type of the task
- Type
@property
- template_ext: Iterable[str] = ()
- template_fields: Iterable[str] = ['data', 'options', 'against', 'template', 'save_to', 'reader_kwargs', 'validator_kwargs']
- template_fields_renderers: Dict[str, str] = {'against': 'json', 'options': 'json'}
- ui_color: str = '#59f75e'
- ui_fgcolor: str = '#000'
- update_relative(other: TaskMixin, upstream=True) None
Update relationship information about another TaskMixin. Default is no-op. Override if necessary.
- property upstream_list: List[BaseOperator]
list of tasks directly upstream
- Type
@property
- property upstream_task_ids: Set[str]
set of ids of tasks directly upstream
- Type
@property
- static xcom_pull(context: Any, task_ids: Optional[List[str]] = None, dag_id: Optional[str] = None, key: str = 'return_value', include_prior_dates: Optional[bool] = None) Any
Pull XComs that optionally meet certain criteria.
The default value for key limits the search to XComs that were returned by other tasks (as opposed to those that were pushed manually). To remove this filter, pass key=None (or any desired value).
If a single task_id string is provided, the result is the value of the most recent matching XCom from that task_id. If multiple task_ids are provided, a tuple of matching values is returned. None is returned whenever no matches are found.
- Parameters
context – Execution Context Dictionary
key (str) – A key for the XCom. If provided, only XComs with matching keys will be returned. The default key is ‘return_value’, also available as a constant XCOM_RETURN_KEY. This key is automatically given to XComs returned by tasks (as opposed to being pushed manually). To remove the filter, pass key=None.
task_ids (str or iterable of strings (representing task_ids)) – Only XComs from tasks with matching ids will be pulled. Can pass None to remove the filter.
dag_id (str) – If provided, only pulls XComs from this DAG. If None (default), the DAG of the calling task is used.
include_prior_dates (bool) – If False, only XComs from the current execution_date are returned. If True, XComs from previous dates are returned as well.
- Type
Any
- static xcom_push(context: Any, key: str, value: Any, execution_date: Optional[datetime] = None) None
Make an XCom available for tasks to pull.
- Parameters
context – Execution Context Dictionary
key (str) – A key for the XCom
value (any pickleable object) – A value for the XCom. The value is pickled and stored in the database.
execution_date (datetime) – if provided, the XCom will not be visible until this date. This can be used, for example, to send a message to a task on a future date without it being immediately visible.
- Type
Any