OTL is a high-level data processing language. It was originally modeled on Splunk's SPL and remains backward compatible with most SPL features.
OTL handles a range of data processing tasks efficiently, particularly on machine-generated logs.
The main functions of the language are:
- full-text search and filtering on all message content;
- selection and search by specific fields;
- aggregation;
- union;
- data enrichment from external or internal sources;
- calculation of statistical indicators;
- mathematical data transformations;
- data preparation for specific types of visualization;
- training and applying machine learning models from a wide range of open source libraries.
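The first few functions in the list (full-text search, filtering by field, aggregation) compose naturally as a pipeline. The Python sketch below only illustrates that idea on a handful of made-up log records. It is not OTL syntax, and the record fields (`host`, `level`, `msg`) and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical log records; field names are assumptions for illustration only.
LOGS = [
    {"host": "web1", "level": "ERROR", "msg": "disk full on /var"},
    {"host": "web2", "level": "INFO", "msg": "request served"},
    {"host": "web1", "level": "ERROR", "msg": "timeout talking to db"},
    {"host": "db1", "level": "ERROR", "msg": "disk full on /data"},
]


def search(records, text):
    """Full-text filter: keep records whose message contains `text`."""
    return [r for r in records if text in r["msg"]]


def where(records, field, value):
    """Field filter: keep records whose `field` equals `value`."""
    return [r for r in records if r.get(field) == value]


def count_by(records, field):
    """Aggregation: count records per distinct value of `field`."""
    return dict(Counter(r[field] for r in records))


# Filter by field, then aggregate: number of ERROR records per host.
errors_per_host = count_by(where(LOGS, "level", "ERROR"), "host")

# Full-text search, then field selection: hosts reporting a full disk.
disk_hosts = [r["host"] for r in search(LOGS, "disk full")]
```

Each step takes a list of records and returns a new one, which is what lets the steps be chained in any order.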
A fundamental feature of OTL is that it efficiently parallelizes computations across a cluster of computing resources, in both near-real-time and batch processing modes.
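As a rough illustration of what parallelizing an aggregation means, the sketch below counts events per host in each partition independently and then merges the partial results. It is a generic split, aggregate, merge example in Python, not a description of how OTL is implemented: threads stand in for cluster nodes, and the data and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(partition):
    """Count events per host within one partition."""
    return Counter(event["host"] for event in partition)


def parallel_count(partitions, max_workers=2):
    """Aggregate each partition concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_count, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Hypothetical data, already split into partitions as a cluster might hold it.
PARTITIONS = [
    [{"host": "web1"}, {"host": "web2"}],
    [{"host": "web1"}, {"host": "db1"}, {"host": "web1"}],
]
host_counts = parallel_count(PARTITIONS)
```

The approach works because counting is associative: partial counts can be computed anywhere and combined in any order.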
SMaLL (Simple Machine Learning Language) is a language that extends the capabilities of OTL and lets users work with all the features of the WDC-Platform without knowledge of machine learning techniques or data engineering.
The main idea of SMaLL is to reduce most analytical problems to three standard, fixed command pipelines, called patterns:
- Model training pattern (GetData | Fit | Explain | Score | Show).
- Model application pattern (GetData | Apply | Score | Show).
- Directories pattern (GetData | Eval | Put).
These sequences are constant; they differ only in the profiles passed as operands. A profile is a selected domain for which a set of plugins has been created: specialized data preprocessors and machine learning algorithms specific to that area of knowledge.
This approach hides the routine details of data processing from end users and lets them concentrate on achieving the final result.
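The idea of fixed command sequences parameterized only by a profile can be sketched in Python as follows. The command names come from the three patterns listed above; the profile name `churn` and the rendered `command profile` form are assumptions made for illustration and are not actual SMaLL syntax.

```python
# The three fixed patterns, with command names taken from the text.
PATTERNS = {
    "train": ("GetData", "Fit", "Explain", "Score", "Show"),
    "apply": ("GetData", "Apply", "Score", "Show"),
    "directories": ("GetData", "Eval", "Put"),
}


def build_pipeline(pattern, profile):
    """Render a pattern as a pipe-separated string.

    The same `profile` is passed as the operand of every command; the
    "command profile" rendering is a hypothetical notation.
    """
    commands = PATTERNS[pattern]
    return " | ".join(f"{command} {profile}" for command in commands)


# "churn" is a made-up profile name used only for illustration.
training = build_pipeline("train", "churn")
applying = build_pipeline("apply", "churn")
```

Switching to another domain changes only the profile argument; the sequence of commands stays the same.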
A unique feature of SMaLL is the mandatory Explain command, which interprets the machine learning model and presents it in a human-understandable form. This makes it possible not only to understand and adjust the resulting model, but also to bring in expert knowledge that is not available in a particular training dataset.
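To make the idea of an interpretable model concrete, the sketch below renders the weights of a simple linear model as readable statements and then shows an expert adding a factor that the training data did not contain. The source does not describe what Explain actually outputs, so the model form, feature names, and weights here are all hypothetical.

```python
def explain(weights):
    """Render linear-model weights as readable statements, strongest first."""
    lines = []
    for name, w in sorted(weights.items(), key=lambda kv: -abs(kv[1])):
        direction = "raises" if w > 0 else "lowers"
        lines.append(f"{name} {direction} the score (weight {w:+.2f})")
    return lines


# Hypothetical weights, as if learned from a training dataset.
weights = {"error_rate": 1.5, "uptime_days": -0.4}
report = explain(weights)

# An expert adds a factor that was absent from the training data.
adjusted = dict(weights, maintenance_window=-2.0)
adjusted_report = explain(adjusted)
```

Because the model is exposed as named factors with weights, adjusting it is a matter of editing those factors rather than retraining.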