Expert lecture: Reproducibility of results in Big Data systems

"Big Data in Business" was the theme of a workshop held by the Competence Center for Scalable Data Services and Solutions (ScaDS) on November 13, 2015. Michael Schmeißer from mgm gave a talk on site about the challenges of producing reproducible results in Big Data scenarios and about approaches to solving them.

Data products generated by Big Data systems often draw on numerous input data sources. Because data flows into the system continuously and is stored using a variety of technologies, the global state of the input data is often difficult to describe. Yet reproducing a delivered data product, either exactly or with slight variations, requires precisely such an exact description of the input data. The lecture described how this can be achieved by keeping a complete history, guaranteeing time windows within which each storage system is consistent, and applying bitemporality throughout.
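To make bitemporality more concrete, here is a minimal sketch in Python. It assumes a simple in-memory, append-only store with integer timestamps. Each fact carries a valid time (when it was true in the real world) and a transaction time (when the system recorded it). The names `BitemporalStore`, `record` and `as_of` are hypothetical and do not come from the lecture. Querying with a fixed `known_at` returns the data exactly as the system knew it at that moment, which is the kind of exact description of the input that reproducing an earlier data product requires.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: integer timestamps and an in-memory list are
# simplifying assumptions, not the approach presented in the lecture.


@dataclass(frozen=True)
class Version:
    key: str
    value: float
    valid_from: int   # start of valid time (inclusive)
    valid_to: int     # end of valid time (exclusive)
    recorded_at: int  # transaction time: when the system learned this fact


class BitemporalStore:
    """Append-only store: versions are never updated or deleted, so the complete history is kept."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def record(self, key: str, value: float, valid_from: int,
               valid_to: int, recorded_at: int) -> None:
        self._versions.append(Version(key, value, valid_from, valid_to, recorded_at))

    def as_of(self, key: str, valid_at: int, known_at: int) -> Optional[float]:
        """Return the value of `key` valid at `valid_at`, as the system knew it at `known_at`."""
        candidates = [
            v for v in self._versions
            if v.key == key
            and v.recorded_at <= known_at
            and v.valid_from <= valid_at < v.valid_to
        ]
        if not candidates:
            return None
        # The most recently recorded version wins, so later corrections
        # supersede earlier entries without erasing them.
        return max(candidates, key=lambda v: v.recorded_at).value


store = BitemporalStore()
store.record("price", 10.0, valid_from=1, valid_to=100, recorded_at=5)
# A correction recorded at time 20 states the price was actually 12.0 all along.
store.record("price", 12.0, valid_from=1, valid_to=100, recorded_at=20)

print(store.as_of("price", valid_at=10, known_at=10))  # 10.0: what a report built at time 10 saw
print(store.as_of("price", valid_at=10, known_at=30))  # 12.0: the corrected view
```

Because nothing is overwritten, a report can be regenerated exactly by reusing its original `known_at`, or rerun on corrected data by choosing a later one. The sketch covers a single store only; it does not model the consistency time windows across several storage systems, and a production system would persist and index versions rather than scan a list.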