DW Handbook

This page shows the name of the construct and its description.

Attribute Value
Name Remove Column
Description Exclude a column from a relation.
Function Reduce the horizontal dimension of a dataset by removing a column of attribute values.
Aim Allow a data engineer (user) to remove from a dataset one attribute that is not required for the analysis for which the dataset is being prepared.
Context This operation is used when one attribute (or column) present in the original/unrefined dataset is deemed irrelevant to the analysis for which the dataset is to serve as input. Whether an attribute is relevant or irrelevant depends on the analysis for which the dataset is being prepared and, possibly, on dependencies between attributes in the dataset and data preparation operations.
Rationale Removing a column that is irrelevant to subsequent operations or to the end-goal analysis makes the dataset smaller, which affects the processing time of the subsequent intermediary operations of the wrangling process and of the analysis the dataset is being prepared for.
Mechanism Reduce the dataset dimension by removing a column of attribute values. This can be done using the facilities found in GUI-based tools or programming language functions.
Formalism π(R, i) = {(a_1, …, a_{i-1}, a_{i+1}, …, a_n) | (a_1, …, a_n) ∈ R}, where R is a relation with n columns, i is a column index, and a_i represents the value of column i in a row (Raman, V and Hellerstein, J 2001).
Relational Algebra (RA) Similar to RA operation Project (π)
Type Atomic
Class Unary
Transformation_category 1:1
Inputs
Inputs Number of input datasets
Input dataset, column to remove 1
Outputs
Outputs Number of output datasets
Dataset 1
Used in stage(s) Cleaning, Structuring2
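
The formalism in the table can be illustrated with a minimal Python sketch. It assumes a relation is represented as a list of equal-length tuples and that the column index i is 1-based, as in π(R, i). The function name `remove_column` and the sample data are hypothetical and serve only as illustration. Unlike the set-based definition, this sketch keeps the row order and does not remove duplicate rows.

```python
from typing import List, Sequence, Tuple


def remove_column(relation: Sequence[Tuple], i: int) -> List[Tuple]:
    """Return a copy of `relation` without its i-th column (1-based index)."""
    if not relation:
        return []
    n = len(relation[0])  # number of columns, taken from the first row
    if not 1 <= i <= n:
        raise IndexError(f"column index {i} is outside 1..{n}")
    # Keep a_1..a_{i-1} and a_{i+1}..a_n for every row.
    return [row[:i - 1] + row[i:] for row in relation]


# Hypothetical input dataset with three columns: id, name, age.
patients = [
    ("p01", "Alice", 34),
    ("p02", "Bob", 51),
]

# Remove the second column (name), which is assumed irrelevant to the analysis.
reduced = remove_column(patients, 2)
print(reduced)  # [('p01', 34), ('p02', 51)]
```

The operation takes one dataset and one column to remove and returns one dataset, matching the Inputs and Outputs rows. In a tabular library the same step is usually a single call, for example `DataFrame.drop(columns=[...])` in pandas.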
