Name Aggregate
Description None
Function None
Aim Allow a data engineer (user) to create a summarised form of specific numerical attributes of the dataset, using one or more statistical functions (sum, count, mean, min, max, median, or a combination of these), grouped by specific attributes.
Context This operation is used to create a new dataset that aggregates the values of a specified attribute, using a statistical function, across the distinct values of another specified attribute. The result serves as input to the target data analysis.
Rationale Creating a summarised form of the dataset can increase the value obtained from the analysis for which the datasets are being prepared.
Mechanism Aggregate a dataset by creating a statistical summary of it. This can be done using the facilities found in GUI-based tools and programming language functions.
Formalism <indices of grouping attribute(s)> (R) = {(<indices of grouping attribute(s)>, func_aj) | <indices of grouping attribute(s)> and j are index values of attributes ∈ R}, where: R is a relation with n columns; func is an aggregation function (such as sum, count, average, min, max); and func_aj represents the result of func applied to the column aj. (Elmasri and Navathe, 2015)
Relational Algebra (RA) Similar to RA operation Aggregation(γ)
Type Atomic
Class Unary
Transformation_category N:1
Inputs Input dataset, index of grouping attribute, function to aggregate by applied to index of aggregated attribute
Number of input datasets 1
Outputs Aggregated dataset
Number of output datasets 1
Used in stage(s) Structuring2
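
The Formalism and Inputs rows above can be illustrated with a minimal sketch in plain Python. The helper name `aggregate`, its parameters, and the sample relation `R` are assumptions made for illustration only and do not come from the source. The sketch takes the indices of the grouping attribute(s), the index of the aggregated attribute, and an aggregation function, and returns one tuple per distinct grouping value.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical helper: name and signature are illustrative assumptions.
def aggregate(rows, group_idx, agg_idx, func):
    """Group rows by the attributes at group_idx and apply func to column agg_idx."""
    groups = defaultdict(list)
    for row in rows:
        # The grouping key is the tuple of values at the grouping indices.
        key = tuple(row[i] for i in group_idx)
        groups[key].append(row[agg_idx])
    # One output tuple per distinct key: (grouping values..., func result).
    return [key + (func(values),) for key, values in groups.items()]

# Made-up sample relation R with columns (region, product, amount).
R = [
    ("north", "a", 10),
    ("north", "b", 30),
    ("south", "a", 5),
    ("south", "a", 15),
    ("south", "b", 20),
]

# Sum of amount per region (single grouping attribute).
totals = aggregate(R, group_idx=[0], agg_idx=2, func=sum)

# Mean of amount per (region, product) pair (two grouping attributes).
averages = aggregate(R, group_idx=[0, 1], agg_idx=2, func=mean)
```

Passing `len`, `min`, `max`, or `median` as `func` gives the other statistical functions named in the Aim row. Applying several functions at once would need a small extension, for example calling the helper once per function and joining the results on the grouping key.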
