DW Handbook

This page shows the name of the construct and its description.

Attribute Value
Name Split Table
Description Split a single input dataset into multiple datasets
Function Produce two subsets of a dataset based on a predicate and its negation
Aim Allow a data engineer (user) to split a dataset into two different datasets using a specified condition
Context This operation is used when the original (unrefined) dataset must be split into two subsets of rows, each required by the analysis for which the dataset is to serve as input.
Rationale Splitting a dataset into subsets of rows lets each subset go through different subsequent operations or serve a different end-goal analysis. Each subset is also smaller than the original, which reduces processing time for the intermediate operations of the wrangling process and for the analysis the dataset is being prepared for.
Mechanism Divide the original dataset on a condition using two parallel Filter Rows operations, one applying the condition and one applying its negation, then perform different wrangling operations on each resulting subset.
Formalism split(R, pred) = { Ra, Rb | Ra = σ(R, pred) ∧ Rb = σ(R, ¬pred) }, where R, Ra and Rb are relations with n columns and pred is a function returning a Boolean.
Relational Algebra (RA) Similar to the RA Select (σ) operation
Type Composite
Class Router
Transformation_category 1:M
Inputs Input dataset, condition to split
Number of input datasets 1
Outputs Datasets
Number of output datasets M
Used in stage(s) Structuring2
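
The formalism above can be sketched in a few lines of Python. This is only an illustrative sketch, not the handbook's implementation: the function name split_table, the Row type alias, and the sample orders data are all hypothetical, and a dataset is assumed to be a list of dictionaries with identical keys.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumption: a dataset is modelled as a list of rows, each row a dict.
Row = Dict[str, Any]


def split_table(rows: List[Row], pred: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Return (Ra, Rb): the rows satisfying pred and the rows satisfying its negation."""
    ra = [row for row in rows if pred(row)]      # Filter Rows with pred
    rb = [row for row in rows if not pred(row)]  # Filter Rows with the negated pred
    return ra, rb


# Hypothetical sample data, used only to illustrate the call.
orders = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": 40},
    {"id": 3, "amount": 975},
]

large, small = split_table(orders, lambda row: row["amount"] >= 100)
```

Every input row lands in exactly one of the two outputs, so the subsets are disjoint and together cover the original dataset. Each subset keeps the same n columns as the input.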
