Attribute
|
Value
|
Name
|
Aggregate
|
Description
|
None
|
Function
|
None
|
Aim
|
Allow a data engineer (user) to create a summarised form of specific numerical attributes of the dataset, using a set of statistical functions (sum, count, mean, min, max, median, or a combination of these) and grouped by specific attributes.
|
Context
|
This operation creates a new dataset by applying a statistical function to the values of a specified attribute for each distinct value of another specified (grouping) attribute. The resulting dataset serves as input to the target data analysis.
|
Rationale
|
Creating a summarised form of the dataset can increase the value obtained from the analysis for which the dataset is being prepared.
|
Mechanism
|
Aggregate the dataset by creating a statistical summary of it. This can be done using the facilities provided by GUI-based tools or the aggregation functions available in programming languages.
|
Formalism
|
$_{G}\,\Im_{\,\mathit{func}(a_j)}(R) = \{\, (G, \mathit{func}_{a_j}) \mid G \text{ and } j \text{ are attribute indices of } R \,\}$, where $R$ is a relation with $n$ columns; $G$ denotes the indices of the grouping attribute(s); $\mathit{func}$ is an aggregation function (such as sum, count, average, min, max); and $\mathit{func}_{a_j}$ is the result of applying $\mathit{func}$ to column $a_j$ within each group. (Elmasri and Navathe, 2015)
|
Relational Algebra (RA)
|
Similar to the RA aggregation operation (γ)
|
Type
|
Atomic
|
Class
|
Unary
|
Transformation_category
|
N:1
|
Inputs
|
Inputs | Number of input datasets |
Input dataset; index (or indices) of the grouping attribute(s); aggregation function(s), each applied to the index of an attribute to aggregate | 1 |
|
Outputs
|
Outputs | Number of output datasets |
Aggregated dataset | 1 |
|
Used in stage(s)
|
Structuring2
|
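The Mechanism row notes that aggregation is available through programming language functions. As a minimal sketch only, assuming rows are represented as Python tuples and attributes are addressed by position (mirroring the index-based Formalism), the operation could be written with the standard library as shown below. The names `aggregate` and `AGG_FUNCS`, and the sample `sales` data, are hypothetical and chosen for illustration.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical mapping from the function names used in this entry to Python callables.
AGG_FUNCS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": len,
    "mean": mean,
    "min": min,
    "max": max,
    "median": median,
}


def aggregate(
    relation: Sequence[Tuple],
    group_idx: Sequence[int],
    func_list: Sequence[Tuple[str, int]],
) -> List[Tuple]:
    """Group `relation` by the columns in `group_idx`, then apply each
    (func_name, column_index) pair in `func_list` within every group.

    Returns one tuple per distinct grouping key: the key values followed
    by one aggregated value per entry in `func_list`.
    """
    groups: Dict[Tuple, List[Tuple]] = {}
    for row in relation:
        key = tuple(row[i] for i in group_idx)
        # Dicts preserve insertion order, so groups appear in first-seen order.
        groups.setdefault(key, []).append(row)

    result = []
    for key, rows in groups.items():
        # Collect column j of every row in the group and apply the named function.
        values = [AGG_FUNCS[name]([r[j] for r in rows]) for name, j in func_list]
        result.append(key + tuple(values))
    return result


# Hypothetical sample relation: (region, order_id, amount)
sales = [
    ("north", "a", 10.0),
    ("south", "b", 4.0),
    ("north", "c", 30.0),
    ("south", "d", 6.0),
    ("south", "e", 20.0),
]

# Group on column 0 and summarise column 2 with sum, count and median.
print(aggregate(sales, [0], [("sum", 2), ("count", 2), ("median", 2)]))
# [('north', 40.0, 2, 20.0), ('south', 30.0, 3, 6.0)]
```

Each output tuple corresponds to one distinct grouping key, which is why the operation maps many input rows to one output row per group. An empty `group_idx` treats the whole (non-empty) relation as a single group. In pandas, a comparable result is usually obtained with `DataFrame.groupby(...).agg(...)`.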