Attribute
|
Value
|
Name
|
Remove Column
|
Description
|
Exclude a column from a relation.
|
Function
|
Reduce the dataset's horizontal dimension by removing a column of attribute values.
|
Aim
|
Allow a data engineer (user) to remove from a dataset one attribute that is not required for the analysis for which the dataset is being prepared.
|
Context
|
This operation is used when one attribute (or column) present in the original/unrefined dataset is deemed irrelevant to the analysis for which the dataset is to serve as input. Whether an attribute is relevant depends on the analysis for which the dataset is being prepared and, possibly, on dependencies between attributes in the dataset and on other data preparation operations.
|
Rationale
|
Removing a column that is irrelevant to subsequent operations or to the end-goal analysis makes the dataset smaller, which can reduce the processing time of the remaining intermediary operations in the wrangling process and of the analysis the dataset is being prepared for.
|
Mechanism
|
Reduce the dataset's horizontal dimension by removing a column of attribute values. This can be done with the facilities provided by GUI-based tools or with programming-language functions.
|
Formalism
|
π(R, i) = {(a_1, …, a_{i-1}, a_{i+1}, …, a_n) : (a_1, …, a_n) ∈ R}, where R is a relation with n columns, i is a column index, and a_i represents the value of column i in a row (Raman and Hellerstein, 2001).
|
Relational Algebra (RA)
|
Similar to the RA operation Project (π)
|
Type
|
Atomic
|
Class
|
Unary
|
Transformation_category
|
1:1
|
Inputs
|
Inputs | Number of input datasets |
Input dataset, column to remove | 1 |
|
Outputs
|
Outputs | Number of output datasets |
dataset | 1 |
|
Used in stage(s)
|
Cleaning, Structuring
|
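The formalism π(R, i) above can be illustrated with a minimal Python sketch. It assumes a relation is held in memory as a list of equal-length tuples and that i is 1-based, as in the formula. The function name `remove_column` and the sample data are hypothetical and chosen only for illustration. Rows are kept as a list, so duplicates that appear once the column is gone are not collapsed, in line with the 1:1 transformation category.

```python
def remove_column(relation, i):
    """Return a new relation without column i (1-based, as in the formalism).

    `relation` is assumed to be a list of equal-length tuples. The input is
    not modified; one output row is produced per input row.
    """
    if not relation:
        return []
    n = len(relation[0])
    if not 1 <= i <= n:
        raise IndexError(f"column index {i} is outside 1..{n}")
    # Keep a_1 .. a_{i-1} and a_{i+1} .. a_n of every row.
    return [row[:i - 1] + row[i:] for row in relation]


# Hypothetical sample relation with columns (name, age, city).
people = [("alice", 30, "NYC"), ("bob", 25, "LA")]

# Remove the second column (age), which is assumed irrelevant to the analysis.
trimmed = remove_column(people, 2)
print(trimmed)
```

Addressing the column by position mirrors the formula. Tools that label columns by name would typically resolve the name to an index first and then apply the same step.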