Attribute
|
Value
|
Name
|
Union
|
Description
|
None
|
Function
|
None
|
Aim
|
Allow a data engineer (user) to combine two datasets vertically using a column-wise merge, appending the contents of each column in the second relation to the column at the same index in the base (first) relation, so that the required analysis and data preparation can be performed once on a single dataset instead of being repeated on multiple datasets.
|
Context
|
This operation is used when a user needs to combine multiple input datasets vertically so that the combined dataset can serve as the input to an analysis.
|
Rationale
|
Combining the datasets required for subsequent operations or for the end-goal analysis reduces repetition of the same operations and may increase the accuracy of the analysis for which the dataset is being prepared.
|
Mechanism
|
Merge the datasets vertically so that subsequent processing runs once on the combined dataset rather than separately on each input. This can be done with the facilities provided by GUI-based tools or with programming-language functions.
|
Formalism
|
A(R1,R2) = {(a1,...,an) | (a1,...,an) ∈ R1} ∪ {(b1,...,bm,NULL,...,NULL) | (b1,...,bm) ∈ R2}, where R1 is a relation with n columns, R2 is a relation with m columns, m <= n, and NULL fills the n - m positions for which R2 supplies no column
|
Relational Algebra (RA)
|
Similar to the RA Union operation (∪)
|
Type
|
Atomic
|
Class
|
N-Ary
|
Transformation_category
|
M:1
|
Inputs
|
Inputs | Number of input datasets |
Input datasets to merge | M |
|
Outputs
|
Outputs | Number of output datasets |
Combined dataset | 1 |
|
Used in stage(s)
|
Integration
|
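The Aim and Formalism entries above can be illustrated with a minimal Python sketch. It is not taken from any particular tool: the function name `union_by_position` and the sample relations are hypothetical, relations are modelled as lists of tuples, and two behaviours are assumptions rather than statements from the specification: positions that the second relation does not supply are filled with `None`, and duplicate rows are kept (a strict RA union would remove them).

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[object], ...]


def union_by_position(
    base: Sequence[Sequence[object]],
    other: Sequence[Sequence[object]],
) -> List[Row]:
    """Append the rows of `other` beneath `base`, aligning columns by index.

    Assumptions (not from the specification): short rows are padded with
    None, and duplicate rows are kept rather than removed.
    """
    # The base (first) relation fixes the number of output columns, n.
    width = len(base[0]) if base else 0
    combined: List[Row] = [tuple(row) for row in base]
    for row in other:
        # Enforce the m <= n constraint stated in the Formalism entry.
        if len(row) > width:
            raise ValueError("second relation has more columns than the base relation")
        # Pad the n - m positions the second relation does not supply.
        combined.append(tuple(row) + (None,) * (width - len(row)))
    return combined


# Hypothetical sample data: r1 has n = 3 columns, r2 has m = 2 columns.
r1 = [(1, "Ann", "UK"), (2, "Bob", "US")]
r2 = [(3, "Cy"), (4, "Di")]
result = union_by_position(r1, r2)
```

Here `result` holds the two rows of `r1` followed by the two rows of `r2`, each padded with `None` in the third column. Libraries and GUI tools typically offer an equivalent built-in operation; their handling of mismatched columns and duplicates may differ from the assumptions made here.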