DW Handbook

This page shows the name of the construct, its description, and its other attributes.

Attribute

Value

Name

Filter Rows

Description

Produce a new dataset by applying a condition to an input dataset.

Function

Reduce a dataset's vertical dimension by removing rows that are not required.

Aim

Allow a data engineer (user) to select and remove a subset of rows that are not required for the analysis for which the dataset is being prepared.

Context

This operation is used when a subset of the rows present in the original/unrefined dataset is deemed irrelevant to the analysis for which the dataset is to serve as input.

Rationale

Removing the irrelevant subset of rows before subsequent operations or the end-goal analysis makes the dataset smaller, which reduces the processing time of both the remaining intermediary operations of the wrangling process and the analysis the dataset is being prepared for.

Mechanism

Reduce the dataset's vertical dimension by removing a subset of its rows. This can be done using the facilities found in GUI-based tools or programming language functions.
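As one possible illustration of the programming-language route, the sketch below filters rows using only the Python standard library. The sample data, the column names (city, year, sales), the helper name filter_rows and the condition "year is 2020 or later" are all assumptions made up for this example; they are not part of the handbook.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
RAW_CSV = """city,year,sales
Leeds,2019,120
York,2020,95
Leeds,2020,140
Hull,2018,60
"""


def filter_rows(rows, condition):
    """Return only the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


# Read the input dataset as a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Assumed condition: keep only records from 2020 onwards.
filtered = filter_rows(rows, lambda row: int(row["year"]) >= 2020)

print(len(rows), "rows in,", len(filtered), "rows out")
for row in filtered:
    print(row["city"], row["year"], row["sales"])
```

The columns are left untouched; only the number of rows changes, which matches the one-input, one-output shape of the construct.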

Formalism

σ(R, pred) = {(a₁,...,aₙ) | (a₁,...,aₙ) ∈ R ∧ pred((a₁,...,aₙ))}, where R is a relation with n columns and pred is a function returning a Boolean. (Raman, V and Hellerstein, J 2001)
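The definition can be read almost directly as a set comprehension. The following is a minimal sketch, assuming a relation is modelled as a Python set of tuples; the function name select, the example relation R and its three columns (name, age, country) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Set, Tuple

Row = Tuple  # one tuple (a1, ..., an) of the relation


def select(relation: Set[Row], pred: Callable[[Row], bool]) -> Set[Row]:
    """sigma(R, pred): the tuples of R for which pred returns True."""
    return {row for row in relation if pred(row)}


# Hypothetical relation R with n = 3 columns: (name, age, country).
R = {
    ("Ana", 34, "PT"),
    ("Ben", 17, "UK"),
    ("Chloe", 52, "FR"),
}

# pred is any Boolean function of a tuple; here, "age is at least 18".
adults = select(R, lambda row: row[1] >= 18)
```

Because every output tuple is taken unchanged from R, the result is always a subset of R with the same n columns.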

Relational Algebra (RA)

Similar to the RA operation Select (σ)

Type

Atomic

Class

Unary

Transformation_category

1:1

Inputs

Inputs	Number of input datasets
Input dataset, condition to filter on	1

Outputs

Outputs	Number of output datasets
Filtered dataset	1

Used in stage(s)

Cleaning, Structuring
