DW Handbook

This page shows the name of the construct, its description

Attribute

Value

Name

Aggregate

Description

None

Function

None

Aim

Allow a data engineer (user) to create a summarised form of specific numerical attributes of the dataset using a set of statistical functions (sum, count, mean, min, max, median, or a combination of these), grouped by specific attributes.

Context

This operation creates a new dataset by aggregating the values of a specified attribute with a statistical function across the distinct values of another specified attribute. The result serves as input to the target data analysis.

Rationale

Creating a summarised form of the dataset can increase the value gained from the analysis the datasets are being prepared for.

Mechanism

Aggregate a dataset by creating a statistical summary of it. This can be done using the facilities found in GUI-based tools or programming language functions.
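As an illustration of the programming-language route, the following is a minimal sketch using only the Python standard library. The helper name `aggregate`, the `FUNCS` lookup table, the `<function>_<attribute>` output naming, and the sample `sales` data are all assumptions made for this example; they are not part of the handbook.

```python
from collections import defaultdict
from statistics import mean, median

# Statistical functions named in the Aim, keyed by a label (labels are assumed).
FUNCS = {
    "sum": sum,
    "count": len,
    "mean": mean,
    "min": min,
    "max": max,
    "median": median,
}

def aggregate(rows, group_by, target, funcs):
    """Summarise `target` for each distinct combination of `group_by` values."""
    # Collect the target values that belong to each group key.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in group_by)
        groups[key].append(row[target])

    # Build one output row per group, with one column per requested function.
    result = []
    for key, values in groups.items():
        out = dict(zip(group_by, key))
        for name in funcs:
            out[f"{name}_{target}"] = FUNCS[name](values)
        result.append(out)
    return result

# Hypothetical sample dataset.
sales = [
    {"region": "North", "amount": 10},
    {"region": "South", "amount": 5},
    {"region": "North", "amount": 30},
]

summary = aggregate(sales, ["region"], "amount", ["sum", "mean", "count"])
```

Here `summary` holds one row per distinct `region`, so three input rows collapse into two output rows, which matches the N:1 transformation category of this construct.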

Formalism

_{<indices of grouping attribute(s)>} ℑ (R) = {(<indices of grouping attribute(s)>, func_a_j) | <indices of grouping attribute(s)> and j are index values of attributes ∈ R}, where R is a relation with n columns, func is an aggregation function (such as sum, count, average, min, max), and func_a_j represents the result of func applied to the column a_j (Elmasri and Navathe, 2015).
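One way to read the formalism is as an operation on tuples addressed by column index. The sketch below assumes a relation is a list of equal-length tuples with zero-based indices; the function name `aggregate_relation`, its parameter names, and the sample relation `R` are hypothetical.

```python
def aggregate_relation(relation, group_idx, func, j):
    """Return the set of tuples (grouping values..., func applied to column j)."""
    groups = {}
    for t in relation:
        # The key is the projection of the tuple onto the grouping indices.
        key = tuple(t[i] for i in group_idx)
        groups.setdefault(key, []).append(t[j])
    # Each group contributes exactly one tuple: its key plus func_a_j.
    return {key + (func(values),) for key, values in groups.items()}

# Hypothetical relation R with n = 3 columns.
R = [
    ("North", "A", 10),
    ("South", "A", 5),
    ("North", "B", 30),
]

# Group by column 0 and apply max to column 2.
result = aggregate_relation(R, group_idx=[0], func=max, j=2)
```

With this input, `result` is the set `{("North", 30), ("South", 5)}`: one tuple per distinct value of the grouping column.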

Relational Algebra (RA)

Similar to the RA operation Aggregation (γ)

Type

Atomic

Class

Unary

Transformation_category

N:1

Inputs

Inputs	Number of input datasets
Input dataset; index of the grouping attribute(s); aggregation function; index of the attribute to aggregate	1

Outputs

Outputs	Number of output datasets
Aggregated dataset	1

Used in stage(s)

Structuring
