Small Big Data

Granulated Data Summaries for Scalable Machine Learning

The core principles of SmallBigData:
  1. Most data is highly redundant and noisy, so useful analysis does not require all of it. The raw data stream can be compacted, compressed, or summarized by various methods, reducing its size while preserving the useful information.

  2. Data analysis, and machine learning in particular, can be conducted on compressed representations of the original data. This makes it possible to build models that draw on large quantities of raw data with the speed and accuracy typical of small data sets.
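As a rough illustration of the two principles, the sketch below summarizes rows into "granules" (a count plus mean values per bucket) and then fits a model on the granules alone. It is a minimal sketch under stated assumptions, not the SmallBigData method: the fixed-width bucketing, the weighted least-squares fit, and the names `granulate` and `fit_line_weighted` are all hypothetical choices made for this example.

```python
from collections import defaultdict


def granulate(rows, width):
    """Group (x, y) rows into buckets of the given width along x.

    Each granule keeps only a row count and the mean x and mean y of its rows.
    (Hypothetical summary format, chosen for illustration.)
    """
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # bucket -> [count, sum_x, sum_y]
    for x, y in rows:
        acc = sums[int(x // width)]
        acc[0] += 1
        acc[1] += x
        acc[2] += y
    return [(n, sx / n, sy / n) for n, sx, sy in sums.values()]


def fit_line_weighted(granules):
    """Weighted least-squares fit of y = slope * x + intercept.

    Every granule is weighted by the number of raw rows it represents.
    """
    total = sum(n for n, _, _ in granules)
    mean_x = sum(n * x for n, x, _ in granules) / total
    mean_y = sum(n * y for n, _, y in granules) / total
    sxx = sum(n * (x - mean_x) ** 2 for n, x, _ in granules)
    sxy = sum(n * (x - mean_x) * (y - mean_y) for n, x, y in granules)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data (an assumption for this example): 10,000 rows on y = 2x + 1.
rows = [(i / 100, 2 * (i / 100) + 1) for i in range(10_000)]

granules = granulate(rows, width=1.0)          # far fewer granules than rows
slope, intercept = fit_line_weighted(granules)  # model trained on granules only
```

On real, noisy data a fit on granules is an approximation of the fit on raw rows, and the bucket width controls the trade-off between summary size and fidelity.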

Based on the above principles, SmallBigData is a framework that:
Provides methods for generating compacted representations from raw data. Supported data types include, but are not limited to, tabular/structured data, images, time series, relational data, sound, and text.
Provides methods for machine learning using compacted representations.
Provides methods for data analysis using compacted representations.



Project information

Development of innovative methods for creating summaries of very large data sets in order to improve the effectiveness of machine learning algorithms for large-scale problems.

Application number: POIR.01.01.01-00-0570/19
Value of the project: 11 484 818.46 zł
Co-financing: 8 269 944.49 zł
Beneficiary: Small Big Data Sp. z o. o.
Project duration: 01.03.2020 – 31.10.2022
Project implemented under: Measure 1.1, Sub-measure 1.1.1 of the Smart Growth Operational Programme 2014-2020 (Program Operacyjny Inteligentny Rozwój), co-financed by the European Regional Development Fund

Project purpose
The goal of this project is to develop software that supports an innovative approach to compressing very large, rapidly growing data sets, together with new versions of known machine learning algorithms (hereinafter referred to as AI-ML algorithms or methods) that work directly on the compressed (summarized, granulated) data and run many times faster than their counterparts would on the original data. To make sure the solution meets market needs, the project plans cooperation with commercial organizations, in which the developed algorithms will be tested on data sets that reflect those organizations' real problems. This will make it possible both to optimize the algorithms for specific business and technological problems and to create general-purpose algorithms applicable to a wide range of problems.

The research and development work in the project will primarily involve the development of effective algorithms for summarizing data, as well as effective implementations of AI-ML methods that can work on such summaries.
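One simple way to picture "summaries that AI-ML methods can work on" for growing data is a summary that can be merged as new batches arrive. The sketch below is an assumption made for illustration, not the project's algorithm: it keeps the classic sufficient statistics for simple linear regression, and the class name `LineSummary` and helper `summarize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineSummary:
    """Sufficient statistics for least-squares regression of y on x."""
    n: int = 0
    sx: float = 0.0    # sum of x
    sy: float = 0.0    # sum of y
    sxx: float = 0.0   # sum of x * x
    sxy: float = 0.0   # sum of x * y

    def add(self, x, y):
        """Fold one raw row into the summary."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other):
        """Combine two summaries without revisiting any raw rows."""
        return LineSummary(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self):
        """Return (slope, intercept) computed from the summary alone."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def summarize(batch):
    """Build a summary from an iterable of (x, y) rows."""
    summary = LineSummary()
    for x, y in batch:
        summary.add(x, y)
    return summary


# Two batches arriving at different times (synthetic data on y = 3x - 2).
old_batch = [(x, 3 * x - 2) for x in range(0, 500)]
new_batch = [(x, 3 * x - 2) for x in range(500, 1000)]

combined = summarize(old_batch).merge(summarize(new_batch))
slope, intercept = combined.fit()
```

Five numbers stand in for any number of rows here, and refitting after new data arrives costs one merge. Richer models generally need richer, approximate summaries, which is the harder problem the project description refers to.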

The results of the project are aimed at companies that have invested heavily in data management and processing and are considering or implementing AI-ML initiatives. For these companies, a crucial capability of their IT infrastructure and tools is the ability to quickly compute and recompute predictive models and the like.
