Log data, root cause diagnostics, multi-type data sources, high-end industrial machinery
Many companies in different industrial domains invest heavily in instrumenting and connecting their industrial equipment, and collect large amounts of data as a result. The advanced exploitation of that data by means of machine learning (ML) and artificial intelligence (AI) methods is currently a hot topic. State-of-the-art methods focus mainly on time-series and image data, as witnessed by recent evolutions of the popular deep learning paradigm (e.g. LSTMs, CNNs). However, equipment also generates log data, which typically contains status messages, events, errors, etc. Such data provides valuable and detailed insights into the status and internal behaviour of the equipment. Incorporating log data into the data analytics workflow can help to address industrial challenges related to suboptimal service and support:
- The service cost of complex high-end industrial machinery: While imminent failures can be identified based on sensor data analysis, diagnosing their root cause typically remains problematic. R&D engineers need to scrutinize log files and cross-reference them with sensor data, which is a manual, time-consuming and error-prone task that relies heavily on domain expertise.
- Optimizing the energy efficiency of industrial equipment: The financial and environmental impact of a system that is performing sub-optimally is significant. Engineers often rely on basic sensor-based analysis that only detects generic inefficiencies, which they need to complement manually with log data to understand the specific context, interpret what is going on and decide what needs to happen. This process cannot be scaled to tens of thousands of systems eligible for optimization without significant automation.
In addition, existing analytical tools and visualization solutions need to be further advanced in order to address certain challenges:
- Current AI and ML methods mostly focus on time-series or image analysis, and few methods, if any, are natively capable of dealing with multi-type data sources, for example a mix of time series and log data. However, log data complements other types of data in several ways:
1) sensor data is often not annotated, which prevents the application of supervised learning approaches, whereas knowledge extracted from log data can provide such annotations,
2) unsupervised anomaly detection approaches can identify anomalies in sensor data but cannot pinpoint their exact root cause, whereas analyzing the chain of events within a log file can point to the root cause of an issue,
3) sensor data is not easily interpretable by a domain expert, while log data offers information in natural language.
- Current AI and ML methods are not optimized to deal with the inherent heterogeneity of hardware and software systems in a real-world industrial setting. As such, they can often only detect generic and obvious deviations from normal operations. Log data provides detailed insights into the specific behaviour of a machine, allowing equipment-specific knowledge to be integrated into this generic analytical process. However, the lack of standardization prevents straightforward application of standard AI and ML methods, leaving such data underexploited.
- Current data visualization mechanisms are focused on numeric or categorical data only and do not adequately support visualizing a combination of semi-structured log information and multidimensional time series data. This hinders both the data science process itself, as data visualization is important to identify patterns, structures and relationships to exploit, and decision making by end users, as analytical results cannot be represented and explored in the most intuitive way.
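As a rough illustration of how log data can complement sensor data, the sketch below uses error events from a log to annotate sensor samples and to retrieve the chain of events leading up to an error. It is a minimal sketch under stated assumptions, not the TRACY approach: the log format (`<ISO timestamp> <LEVEL> <message>`), the function names, the five-minute window and all sample data are hypothetical and chosen only for the example.

```python
import re
from datetime import datetime, timedelta

# Assumed (hypothetical) log format: "<ISO timestamp> <LEVEL> <message>".
LOG_PATTERN = re.compile(r"^(\S+)\s+(INFO|WARNING|ERROR)\s+(.*)$")


def parse_log(lines):
    """Parse raw log lines into (timestamp, level, message) tuples."""
    events = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            ts, level, message = match.groups()
            events.append((datetime.fromisoformat(ts), level, message))
    return events


def label_sensor_samples(samples, events, window=timedelta(minutes=5)):
    """Label each (timestamp, value) sample with the message of the first
    ERROR event occurring within `window` after it, else "normal"."""
    errors = [(ts, msg) for ts, level, msg in events if level == "ERROR"]
    labelled = []
    for ts, value in samples:
        label = "normal"
        for err_ts, msg in errors:
            if timedelta(0) <= err_ts - ts <= window:
                label = msg
                break
        labelled.append((ts, value, label))
    return labelled


def events_before(events, when, window=timedelta(minutes=5)):
    """Return the log events in the `window` leading up to `when`."""
    return [e for e in events if timedelta(0) <= when - e[0] <= window]


# Made-up example data, for illustration only.
log_lines = [
    "2021-02-01T10:00:00 INFO Job started",
    "2021-02-01T10:03:00 WARNING Fuser temperature high",
    "2021-02-01T10:04:00 ERROR Fuser overheated",
]
samples = [
    (datetime(2021, 2, 1, 9, 50), 180.0),
    (datetime(2021, 2, 1, 10, 2), 205.0),
    (datetime(2021, 2, 1, 10, 6), 150.0),
]

events = parse_log(log_lines)
labelled = label_sensor_samples(samples, events)
chain = events_before(events, datetime(2021, 2, 1, 10, 4))
```

In this toy setting, only the sample taken shortly before the error receives the error message as its label, which could then serve as an annotation for supervised learning. The list returned by `events_before` gives a domain expert the readable sequence of messages that preceded the failure. Real log data would require far more robust parsing, given the lack of standardisation noted above.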
To address these challenges, Sirris, together with the industrial partners Xeikon, CMC, Datylon, I-Care and Yazzoom, initiated the TRACY project. In this project, the partners will investigate how to optimally use the log data generated by industrial assets and refine existing AI and machine learning techniques targeted at time series analysis. To this end, TRACY will research how to handle the complexities of log data, e.g. the heterogeneity of the industrial assets, the lack of standardisation amongst log data and the scalable interactive visualisation of the heterogeneous data. The research will be validated on complex industrial use cases such as optimising the performance of compressors and decreasing the service cost of electrophotographic machines.
February 2021 - January 2023
With the financial support of