This innovation corresponds to the specification of an end-to-end platform for Big Data as a self-service, developed within the I-BiDaaS project. The platform specification is mainly based on the consolidated system requirements. These combine the functional requirements derived from the I-BiDaaS end-user requirements analysis with non-functional requirements that stem from good software engineering practices. The specification takes into account existing research on big data requirements and complies with existing Big Data Reference Architectures. The platform combines a number of open-source and proprietary technologies provided by the I-BiDaaS partners.
The platform specification provides end users with multiple functionalities. These include the fabrication of realistic synthetic data for experimentation and testing, batch and streaming analytics, and simple, intuitive, and effective visualization and interaction capabilities. The specification defines three modes of the platform, each offered to a different type of user:
- The Self-service mode allows a user who has the required domain knowledge and some (non-expert) knowledge of data analysis to construct Big Data pipelines in a user-friendly way, by selecting a predefined data analytics algorithm from an available list.
- The Expert mode allows experts (Big Data developers) to upload their own data analytics code based on the available reusable templates.
- The Co-develop mode corresponds to an end-to-end solution for a given industry project, developed with the expertise of the personnel who build and maintain the platform.
Targeted sectors: Financial, Manufacturing, Telecommunication, Data Economy, Circular Economy
Solutions currently available on the market typically take one of two approaches. Some (1) target expert developers and are non-trivial to configure and use. Others (2) target non-experts and are easy to use, but usually come at a high price.
The I-BiDaaS platform specification offers a high degree of flexibility that may lead to a more agile or more cost-effective development of Big Data applications in the targeted sectors.
The technological novelties are manifold. For example, the results produced by the batch processing module are fed back to the data fabrication tool. There they are used for training and to help build rules for future data generation. Furthermore, the platform specification supports a form of stream processing in which the parallelizable parts of the streaming analytics are offloaded to the GPU-accelerated streaming analytics module. Part of each query is carefully performed at the lowest level, especially filtering. As a result, only the required data is forwarded to the stream-processing engine for more sophisticated analysis, while the remainder is discarded as early as possible.
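The early-filtering idea can be illustrated with a minimal two-stage sketch. The names `Event`, `low_level_filter`, `stream_engine`, and `run_pipeline` are hypothetical and are not part of the I-BiDaaS platform. The first stage stands in for the GPU-accelerated module. Here it runs sequentially on the CPU, whereas in the platform this data-parallel step would be offloaded to the GPU. The second stage stands in for the stream-processing engine. That engine only ever sees the events that survived the filter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class Event:
    source: str
    value: float


def low_level_filter(
    events: Iterable[Event], predicate: Callable[[Event], bool]
) -> Iterator[Event]:
    """Hypothetical stand-in for the GPU-accelerated stage: apply a cheap,
    per-event predicate and drop non-matching events as early as possible.
    (Runs sequentially here; the predicate is independent per event, so it
    is the kind of work that could be parallelized.)"""
    for event in events:
        if predicate(event):
            yield event


def stream_engine(events: Iterable[Event]) -> Dict[str, float]:
    """Hypothetical stand-in for the stream-processing engine: a more
    expensive analysis (mean value per source) over forwarded events only."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for event in events:
        totals[event.source] = totals.get(event.source, 0.0) + event.value
        counts[event.source] = counts.get(event.source, 0) + 1
    return {source: totals[source] / counts[source] for source in totals}


def run_pipeline(
    events: Iterable[Event], predicate: Callable[[Event], bool]
) -> Dict[str, float]:
    """Chain the two stages; filtered-out events never reach the engine."""
    return stream_engine(low_level_filter(events, predicate))


if __name__ == "__main__":
    stream = [Event("a", 10.0), Event("b", 1.0), Event("a", 30.0), Event("b", 50.0)]
    print(run_pipeline(stream, lambda e: e.value >= 5.0))
```

With the example stream and a threshold of 5.0, the event `Event("b", 1.0)` is dropped in the first stage. The engine then computes averages only over the remaining three events, yielding `{'a': 20.0, 'b': 50.0}`. Using a generator means discarded events are never materialized for the second stage. This mirrors, in miniature, the goal of forwarding only the required data.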