DW Handbook

The Data Wrangling (DW) Handbook is based on mining a set of data processing pipelines built with KNIME Analytics Platform, an open-source Workflow Management System (WfMS) for designing and orchestrating data science workflows using both built-in and community-developed operations. The mined pipelines were sourced from myExperiment, NodePit, and KNIME Hub (the latter accessed through NodePit).

Mining these pipelines required a dictionary of DW stages and a taxonomy of DW constructs, because the same operation can be implemented in different ways across tools and even within a single tool. The mining produced two sets of patterns: patterns of DW stages and patterns of DW operations. The patterns, stages, and constructs are conceptually represented in the image below, which is followed by links to the individual sections of the handbook.

List of Stages | Patterns of Stages | Constructs | Patterns of Constructs
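The mining step described above can be illustrated with a small sketch. It is not the method used to build the handbook: it assumes, for simplicity, that each pipeline is a linear sequence of node names (real KNIME workflows are directed graphs), and every node name, construct label, and stage label below is a hypothetical example rather than an entry from the handbook's actual dictionary or taxonomy. The point it shows is why a shared vocabulary is needed: "CSV Reader" and "Excel Reader" only contribute to the same pattern once both map to one construct. Patterns here are contiguous pairs ranked by support (the number of pipelines containing them), which is one simple choice among many.

```python
from collections import Counter

# HYPOTHETICAL dictionary: tool-specific operation names -> shared DW construct.
# Illustrative only; not the handbook's taxonomy.
OPERATION_TO_CONSTRUCT = {
    "CSV Reader": "Load",
    "Excel Reader": "Load",
    "Row Filter": "Filter rows",
    "Rule-based Row Filter": "Filter rows",
    "Missing Value": "Impute missing values",
    "Joiner": "Join",
    "CSV Writer": "Export",
}

# HYPOTHETICAL dictionary: DW construct -> DW stage. Illustrative only.
CONSTRUCT_TO_STAGE = {
    "Load": "Data acquisition",
    "Filter rows": "Data cleaning",
    "Impute missing values": "Data cleaning",
    "Join": "Data integration",
    "Export": "Data publishing",
}


def normalise(pipeline, mapping):
    """Map each operation to its shared label, dropping operations the mapping does not cover."""
    return [mapping[op] for op in pipeline if op in mapping]


def collapse_repeats(seq):
    """Merge consecutive duplicates, so two cleaning steps in a row count as one stage."""
    out = []
    for item in seq:
        if not out or out[-1] != item:
            out.append(item)
    return out


def frequent_patterns(sequences, length=2, min_support=2):
    """Return contiguous sub-sequences of `length` found in at least `min_support` pipelines."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each pattern is counted at most once per pipeline (support).
        counts.update({tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)})
    return {p: c for p, c in counts.items() if c >= min_support}


# HYPOTHETICAL example pipelines, each simplified to a linear list of node names.
pipelines = [
    ["CSV Reader", "Row Filter", "Missing Value", "CSV Writer"],
    ["Excel Reader", "Rule-based Row Filter", "Joiner", "CSV Writer"],
    ["CSV Reader", "Missing Value", "Joiner", "CSV Writer"],
]

construct_seqs = [normalise(p, OPERATION_TO_CONSTRUCT) for p in pipelines]
stage_seqs = [collapse_repeats(normalise(s, CONSTRUCT_TO_STAGE)) for s in construct_seqs]

construct_patterns = frequent_patterns(construct_seqs)
stage_patterns = frequent_patterns(stage_seqs)

if __name__ == "__main__":
    # Sort by descending support, then alphabetically, for a deterministic listing.
    for name, patterns in (("Constructs", construct_patterns), ("Stages", stage_patterns)):
        print(name)
        for pattern, support in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])):
            print(f"  {' -> '.join(pattern)}: {support}")
```

Without the normalisation step, the first two pipelines would share no pattern at all, since their reader and filter nodes have different names; after mapping, the pair "Load -> Filter rows" appears in both.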
