DW Handbook

This page shows the name of the construct, its description, and its other attributes.

Attribute Value
Name Merge Columns
Description None
Function None
Aim Allow the data engineer (user) to combine two columns of string attributes into a new column using a glue character
Context This operation is used when two attributes (or columns) in the original, unrefined dataset need to be merged into a single attribute (e.g. parts of a name) to aid the analysis for which the dataset serves as input.
Rationale A merged column is provided to fulfil analysis requirements or to ensure accurate results from the analysis the dataset is being prepared for.
Mechanism Create a new column whose values are taken from two existing columns and joined by a glue string, using enactors E12 and E15.
Formalism µ((a1,...,an), i, j, glue) = α(R, x), φ(R, indexOf(x), ai ⊕ glue ⊕ aj) = {(a1,...,an, ai ⊕ glue ⊕ aj) | (a1,...,an) ∈ R}, where R is a relation, i and j are the indices of the columns to be merged, glue is the character used to connect the values of the two columns, x is the name of the column to be created, and x ⊕ y concatenates x and y (Raman, V. and Hellerstein, J., 2001).
Relational Algebra (RA) Similar to the RA operation Attribute Extension (ε) / Generalized Projection
Type Composite
Class Unary
Transformation_category 1:1
Inputs
Inputs Number of input datasets
None 1
Outputs
Outputs Number of output datasets
None 1
Used in stage(s) Structuring1
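
The Mechanism and Formalism entries above can be illustrated with a minimal Python sketch. It is not the handbook's implementation of enactors E12 and E15: the function name merge_columns, its parameters, and the sample data are hypothetical, and column indices are assumed to be 0-based. As in the formalism, the original columns are kept and the merged value ai ⊕ glue ⊕ aj is appended as a new column x.

```python
from typing import List, Sequence, Tuple


def merge_columns(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    i: int,
    j: int,
    glue: str,
    new_name: str,
) -> Tuple[List[str], List[List[str]]]:
    """Append a column holding row[i] + glue + row[j] (hypothetical helper)."""
    # Step 1 (add column): extend the header with the new column name x.
    new_header = list(header) + [new_name]
    # Step 2 (fill): compute a_i + glue + a_j for every tuple in the relation.
    # The input rows are copied, so the original relation is left unchanged.
    new_rows = [list(row) + [row[i] + glue + row[j]] for row in rows]
    return new_header, new_rows


# Illustrative data only: merge the parts of a name with a space as glue.
header = ["first", "last", "city"]
rows = [["Ada", "Lovelace", "London"], ["Alan", "Turing", "Wilmslow"]]
merged_header, merged_rows = merge_columns(header, rows, 0, 1, " ", "full_name")
print(merged_header)
print(merged_rows)
```

Keeping the two source columns matches the output tuple (a1,...,an, ai ⊕ glue ⊕ aj) given in the formalism; dropping them afterwards would be a separate step.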
