DW Handbook

This page shows the name of the construct, its description, and its other attributes.

Attribute Value
Name Filter Rows
Description Produce a new dataset by applying a condition to an input dataset.
Function Reduce a dataset's vertical dimension by removing rows that are not required.
Aim Allow a data engineer (user) to select and remove a subset of rows that are not required for the analysis for which the dataset is being prepared.
Context This operation is used when a subset of rows in the original/unrefined dataset is deemed irrelevant to the analysis for which the dataset is to serve as input.
Rationale Removing the irrelevant subset of rows makes the dataset smaller, which reduces the processing time of the subsequent intermediate operations of the wrangling process and of the analysis the dataset is being prepared for.
Mechanism Reduce the dataset's dimension by removing a subset of its rows. This can be done using the facilities of GUI-based tools or programming-language functions.
Formalism σ(R, pred) = {(a1,...,an) | (a1,...,an) ∈ R ∧ pred((a1,...,an))}, where R is a relation with n columns and pred is a function returning a Boolean (Raman, V. and Hellerstein, J. 2001).
Relational Algebra (RA) Similar to the RA Select (σ) operation
Type Atomic
Class Unary
Transformation_category 1:1
Inputs Input dataset, condition (number of input datasets: 1)
Outputs Filtered dataset (number of output datasets: 1)
Used in stage(s) Cleaning, Structuring (2)
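
The Formalism and Mechanism entries above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the handbook's own implementation: the function name filter_rows, the predicate is_2020, and the sample data are all hypothetical. The relation R is modelled as a list of tuples and pred as a function returning a Boolean.

```python
from typing import Callable, Iterable, List, Tuple

# One row (a1, ..., an) of the relation R, modelled as a plain tuple.
Row = Tuple[object, ...]


def filter_rows(relation: Iterable[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Sketch of sigma(R, pred): keep only the rows for which pred is true."""
    return [row for row in relation if pred(row)]


# Hypothetical input dataset with columns (city, year, sales).
sales = [
    ("Leeds", 2019, 120),
    ("York", 2020, 80),
    ("Leeds", 2020, 150),
]


def is_2020(row: Row) -> bool:
    """Hypothetical condition: the row belongs to the year 2020."""
    return row[1] == 2020


# One input dataset and one condition produce one filtered dataset.
filtered = filter_rows(sales, is_2020)
print(filtered)  # [('York', 2020, 80), ('Leeds', 2020, 150)]
```

Note that pred describes the rows to keep, so the rows deemed irrelevant are those for which it returns False. The input dataset is left unchanged and a new dataset is returned, in line with the Description entry.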
