DW Handbook

This page shows the name of the construct and its description.

Attribute

Value

Name

Remove Column

Description

Exclude a column from a relation.

Function

Reduce the horizontal dimension of a dataset by removing a column of attribute values.

Aim

Allow a data engineer (user) to remove from a dataset one attribute that is not required for the analysis for which the dataset is being prepared.

Context

This operation is used when one attribute (or column) present in the original/unrefined dataset is deemed irrelevant to the analysis for which the dataset is to serve as input. The classification of an attribute as relevant or irrelevant is dependent on the analysis for which the dataset is being prepared, and, possibly, dependencies between attributes in the dataset and data preparation operations.

Rationale

Removing a column that is irrelevant to subsequent operations or to the end-goal analysis reduces the size of the dataset, which can shorten the processing time of the remaining intermediate wrangling operations and of the analysis the dataset is being prepared for.

Mechanism

Reduce the dataset's dimension by removing a column of attribute values. This can be done with the facilities offered by GUI-based tools or with programming-language functions.
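As an illustration of the programming-language route, here is a minimal sketch in plain Python. It assumes the dataset is held as a list of dictionaries, one per row; the function name `remove_column` and the sample data are hypothetical and are not part of the handbook.

```python
def remove_column(rows, column):
    """Return new rows without `column`; the input rows are left unchanged."""
    result = []
    for row in rows:
        if column not in row:
            # Fail loudly rather than silently ignoring a misspelled column name.
            raise KeyError(f"column {column!r} not found")
        result.append({key: value for key, value in row.items() if key != column})
    return result


# Hypothetical sample dataset: "notes" is judged irrelevant to the analysis.
patients = [
    {"id": 1, "age": 34, "notes": "n/a"},
    {"id": 2, "age": 51, "notes": "follow-up"},
]

trimmed = remove_column(patients, "notes")
print(trimmed)  # [{'id': 1, 'age': 34}, {'id': 2, 'age': 51}]
```

The sketch returns a new list instead of modifying the input, so the unrefined dataset stays available if the column later turns out to be needed.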

Formalism

π(R, i) = {(a₁, …, a_{i-1}, a_{i+1}, …, a_n) | (a₁, …, a_n) ∈ R}, where R is a relation with n columns, i is a column index, and a_i represents the value of column i in a row (Raman, V. and Hellerstein, J., 2001).
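The definition can be read directly as code. The following sketch assumes the relation R is modelled as a Python set of equal-length tuples and that i is 1-based, as in the formula; the name `project_out` is hypothetical.

```python
def project_out(relation, i):
    """Drop the i-th column (1-based) from every tuple of the relation."""
    result = set()
    for row in relation:
        if not 1 <= i <= len(row):
            raise IndexError(f"column index {i} out of range for row {row!r}")
        # Keep a_1 .. a_{i-1} and a_{i+1} .. a_n, skipping a_i.
        result.add(row[:i - 1] + row[i:])
    return result


R = {(1, "a", True), (2, "b", False)}
reduced = project_out(R, 2)
print(reduced == {(1, True), (2, False)})  # True
```

Because the result is built as a set, rows that differ only in the removed column collapse into one tuple. Tools that keep datasets as ordered tables usually retain such duplicates, so this behaviour is a consequence of the set-based assumption rather than of the operation itself.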

Relational Algebra (RA)

Similar to the RA operation Project (π).

Type

Atomic

Class

Unary

Transformation_category

1:1

Inputs

Inputs	Number of input datasets
Input dataset, column to remove	1

Outputs

Outputs	Number of output datasets
dataset	1

Used in stage(s)

Cleaning, Structuring
