Pattern-Based Anomaly Detection in Mixed-Type Time Series

Abstract: The present-day accessibility of technology enables easy logging of both sensor values and event logs over extended periods. In this context, detecting abnormal segments in time series data has become an important data mining task. Existing work on anomaly detection focuses either on continuous time series or discrete event logs and not on the combination. However, in many practical applications, the patterns extracted from the event log can reveal contextual and operational conditions of a device that must be taken into account when predicting anomalies in the continuous time series. This paper proposes an anomaly detection method that can handle mixed-type time series. The method leverages frequent pattern mining techniques to construct an embedding of mixed-type time series on which an isolation forest is trained. Experiments on several real-world univariate and multivariate time series, as well as a synthetic mixed-type time series, show that our anomaly detection algorithm outperforms state-of-the-art anomaly detection techniques such as MatrixProfile, Pav, Mifpod and Fpof.

By Len Feremans, Vincent Vercruyssen*, Boris Cule, Wannes Meert*, and Bart Goethals.

In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECML PKDD 2019), 2019 Springer.

* Department of Computer Science, KU Leuven, Belgium
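To illustrate the embedding step described in the abstract of the mixed-type anomaly detection paper, here is a minimal Python sketch under our own assumptions: continuous values are already discretised into symbols such as `temp=low`, event-log entries become items such as `ev=start`, the frequent itemsets are given, and each sliding window is embedded as one binary feature per pattern (1.0 if all of the pattern's items occur in the window). The function names, item names, and the binary feature are hypothetical illustrations, not the paper's exact embedding.

```python
from typing import FrozenSet, List, Sequence, Set


def window_itemsets(events: Sequence[Set[str]], width: int, step: int) -> List[Set[str]]:
    """Collect the items observed in each sliding window of `width` timestamps."""
    windows = []
    for start in range(0, len(events) - width + 1, step):
        items: Set[str] = set()
        for t in range(start, start + width):
            items |= events[t]
        windows.append(items)
    return windows


def embed(windows: List[Set[str]], patterns: List[FrozenSet[str]]) -> List[List[float]]:
    """Hypothetical embedding: 1.0 per pattern whose items all occur in the window."""
    return [[1.0 if p <= w else 0.0 for p in patterns] for w in windows]


# Hypothetical mixed-type input: discretised sensor values plus event-log codes.
events = [
    {"temp=low", "ev=start"},
    {"temp=low"},
    {"temp=high", "ev=alarm"},
    {"temp=high"},
    {"temp=low", "ev=start"},
]
# Hypothetical frequent itemsets, e.g. produced by an earlier mining step.
patterns = [frozenset({"temp=low", "ev=start"}), frozenset({"temp=high", "ev=alarm"})]

X = embed(window_itemsets(events, width=2, step=1), patterns)
print(X)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

With scikit-learn available, `IsolationForest(random_state=0).fit(X).score_samples(X)` would return one score per window, where lower scores indicate more anomalous windows.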

Efficiently mining cohesion-based patterns and rules in event sequences

Abstract: Discovering patterns in long event sequences is an important data mining task. Traditionally, research focused on frequency-based quality measures that allow algorithms to use the anti-monotonicity property to prune the search space and efficiently discover the most frequent patterns. In this work, we step away from such measures, and evaluate patterns using cohesion — a measure of how close to each other the items making up the pattern appear in the sequence on average. We tackle the fact that cohesion is not an anti-monotonic measure by developing an upper bound on cohesion in order to prune the search space. By doing so, we are able to efficiently unearth rare, but strongly cohesive, patterns that existing methods often fail to discover. Furthermore, having found the occurrences of cohesive itemsets in the input sequence, we use them to discover the representative sequential patterns and the dominant partially ordered episodes, without going through the computationally expensive candidate generation procedures typically associated with sequential pattern and episode mining. Experiments show that our method efficiently discovers important patterns that existing state-of-the-art methods fail to discover.

By Boris Cule, Len Feremans, and Bart Goethals.

In Data Mining and Knowledge Discovery Volume 33(4), pp. 1125-1182, 2019 Springer.
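The cohesion measure in the abstract above can be illustrated with a small Python sketch. It assumes one common formulation from the cohesive itemset literature: each event occupies one timestamp, W(X, t) is the length of the shortest window that contains timestamp t and all items of X, and C(X) is |X| divided by the average of W(X, t) over every timestamp t where an item of X occurs. This formulation and the function names are our assumptions; the brute-force search is for clarity only and does not reproduce the paper's upper bound or pruning.

```python
import math
from typing import Hashable, Sequence, Set


def minimal_window(seq: Sequence[Hashable], itemset: Set[Hashable], t: int) -> float:
    """Length of the shortest window [a, b] with a <= t <= b containing every item of `itemset`."""
    best = math.inf
    for a in range(t, -1, -1):
        seen = set(seq[a:t + 1]) & itemset
        b = t
        # Extend to the right until all items are covered or the sequence ends.
        while seen != itemset and b + 1 < len(seq):
            b += 1
            if seq[b] in itemset:
                seen.add(seq[b])
        if seen == itemset:
            best = min(best, b - a + 1)
    return best


def cohesion(seq: Sequence[Hashable], itemset) -> float:
    """|X| divided by the average minimal window over all occurrences of items of X (assumed definition)."""
    itemset = frozenset(itemset)
    positions = [t for t, item in enumerate(seq) if item in itemset]
    if not positions:
        return 0.0
    windows = [minimal_window(seq, itemset, t) for t in positions]
    if math.isinf(max(windows)):  # some item of X never occurs in the sequence
        return 0.0
    return len(itemset) / (sum(windows) / len(windows))


print(cohesion(list("abxxab"), {"a", "b"}))  # 1.0
print(cohesion(list("axxb"), {"a", "b"}))    # 0.5
```

In `abxxab` the two items always appear next to each other, giving the maximal value 1.0, while in `axxb` every minimal window has length 4 and cohesion drops to 0.5.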

Fleet-oriented pattern mining combined with time series signature extraction for understanding of wind farm response to storm conditions

Abstract: Offshore wind turbine installations are rapidly spreading around Europe and all over the world. These turbines are typically installed in large wind farms combining turbines of the same type. Farm owners aim for maximal performance of the farm in general and for predictable behaviour in particular. The latter is becoming increasingly important since offshore wind farms are more and more managed as conventional power plants, driven by electricity market supply and demand. In a zero-subsidy context, farm operators are exposed to fluctuations in electricity market prices. A deep understanding of farm behaviour is therefore essential for developing a good strategy to deal with these fluctuations.

This paper focuses on the automated extraction of the farm-wide response to storm conditions. The input data for the analysis are status logs and 1-second SCADA data. The status logs record the important turbine controller events; each entry typically consists of a number, a time of occurrence, and a time of deactivation, where the number is linked to a detailed description. The SCADA data consists of time series of the most important turbine sensors: power produced, RPM, wind speed, etc. The advantage of the 1-second data over the traditional 10-minute averages is that the dynamic event content is much better preserved. Data from several offshore wind farms is used to obtain a solid dataset: in total, 5 years of data from more than 50 turbines.

We present a novel farm-wide pattern mining approach that extracts events occurring for multiple turbines in the same time period. This allows us to identify those events that are predominantly driven by global wind excitations (e.g., gusts) or grid events (e.g., low voltage ride through). From the extracted events we isolate the storm conditions, for which the time series data is investigated further. Using event detection algorithms, we extract from the time series data the signatures of the stop events that each turbine performs. We show that extreme changes in wind speed and wind direction lead to an excessive misalignment of the turbines in the farm, followed by a stop of those turbines. The extracted patterns are compared to the time signatures to show their correlation and complementarity. In this way, the typical turbine response to this event is identified, which can serve as input for the farm owner and turbine manufacturer when designing novel controller approaches to deal with this problem.

By Pieter-Jan Daems*, Len Feremans, Timothy Verstraeten**, Boris Cule, Bart Goethals and Jan Helsen*.

In the Second World Congress on Condition Monitoring (WCCM), 2019.


* Acoustics and Vibrations Research Group, Vrije Universiteit Brussel

** Artificial Intelligence Laboratory, Vrije Universiteit Brussel
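The farm-wide idea of finding events that occur for multiple turbines in the same time period, described in the storm-conditions abstract above, can be sketched as follows. This is a simplification under stated assumptions, not the paper's mining approach: status-log events are hypothetical `(turbine_id, status_code, timestamp)` tuples, time is cut into fixed, non-overlapping buckets of `window` seconds, and a status code counts as farm-wide when at least `min_turbines` distinct turbines report it in the same bucket. All names and values are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical status-log entry: (turbine_id, status_code, timestamp in seconds).
Event = Tuple[str, int, int]


def farm_wide_events(events: List[Event], window: int,
                     min_turbines: int) -> Dict[Tuple[int, int], List[str]]:
    """Return (time bucket, status code) pairs reported by at least `min_turbines` distinct turbines."""
    turbines: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for turbine, code, ts in events:
        turbines[(ts // window, code)].add(turbine)
    return {
        key: sorted(ids)
        for key, ids in sorted(turbines.items())
        if len(ids) >= min_turbines
    }


# Made-up example: three turbines report code 120 within the first minute.
events = [
    ("T01", 120, 10), ("T02", 120, 25), ("T03", 120, 40),
    ("T01", 305, 50), ("T04", 120, 900),
]
print(farm_wide_events(events, window=60, min_turbines=3))
# {(0, 120): ['T01', 'T02', 'T03']}
```

Fixed buckets are the simplest choice but can split a burst of events that straddles a bucket boundary; overlapping or sliding windows avoid this at the cost of counting some events twice.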

A framework for pattern mining and anomaly detection in multi-dimensional time series and event logs

Abstract: In the present day, sensor data and textual logs are generated by many devices. Analyzing these time series data leads to the discovery of interesting patterns and anomalies. In recent years, many algorithms have been proposed to discover interesting patterns in time series data and to detect periods of anomalous behaviour. However, these algorithms are challenging to apply in real-world settings. We propose a framework based on generic transformations that allows combining state-of-the-art time series representation, pattern mining, and pattern-based anomaly detection algorithms. Using integration, our framework handles a mix of multi-dimensional continuous series and event logs. Finally, we present an open-source, lightweight tool that assists both pattern mining experts and domain experts in selecting algorithms, specifying parameters, and visually inspecting the results, while shielding them from the underlying technical complexity of implementing our framework.

By Len Feremans, Vincent Vercruyssen*, Wannes Meert*, Boris Cule and Bart Goethals.

In the International Workshop on New Frontiers in Mining Complex Patterns, held with ECML PKDD 2019.

* Department of Computer Science, KU Leuven, Belgium
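As an example of the kind of time series representation such a framework can combine with pattern mining, the sketch below implements Symbolic Aggregate approXimation (SAX): z-normalise the series, average it over equal-length segments (Piecewise Aggregate Approximation, PAA), and map each average to a symbol using equiprobable breakpoints of the standard normal distribution. SAX is our choice of illustration; the abstract does not say which representations the framework offers, and the function name and defaults are hypothetical.

```python
import bisect
from statistics import NormalDist, mean, pstdev
from typing import Sequence


def sax(series: Sequence[float], segments: int, alphabet: str = "abcd") -> str:
    """Symbolic Aggregate approXimation of `series` into `segments` symbols."""
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros.
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    size = len(z) // segments
    # PAA: mean of each equal-length segment.
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(segments)]
    # Breakpoints splitting N(0, 1) into len(alphabet) equiprobable regions.
    cuts = [NormalDist().inv_cdf(i / len(alphabet)) for i in range(1, len(alphabet))]
    return "".join(alphabet[bisect.bisect_right(cuts, v)] for v in paa)


print(sax([1, 1, 2, 2, 8, 8, 9, 9], segments=4))  # aadd
```

For `[1, 1, 2, 2, 8, 8, 9, 9]` with four segments and a four-letter alphabet, the two low segments map to `a` and the two high segments to `d`; the resulting symbols could then serve as items for a pattern miner.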

Mining Top-k Quantile-based Cohesive Sequential Patterns

Abstract: Finding patterns in long event sequences is an important data mining task. Two decades ago, research focused on finding all frequent patterns, where the anti-monotonic property of support was used to design efficient algorithms. Recent research focuses on producing a smaller output containing only the most interesting patterns. To achieve this goal, we introduce a new interestingness measure that computes the proportion of the occurrences of a pattern that are cohesive. This measure is robust to outliers and is applicable to sequential patterns. We implement an efficient algorithm based on constrained prefix-projected pattern growth and pruning based on an upper bound to uncover the set of top-k quantile-based cohesive sequential patterns. We run experiments to compare our method with existing state-of-the-art methods for sequential pattern mining and show that our algorithm is efficient and produces qualitatively interesting patterns on large event sequences.

By Len Feremans, Boris Cule and Bart Goethals.

In Proceedings of the 2018 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2018.
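The quantile-based cohesion measure described in the abstract above can be sketched in Python under the following assumptions, which are ours rather than the paper's exact definition: an occurrence is any position t of an item of the sequential pattern X, its minimal window is the shortest window containing t and an ordered occurrence of X, and the occurrence is cohesive when that window is at most `alpha * len(X)` long. The measure is then the fraction of cohesive occurrences. The parameter name `alpha` and the helper names are hypothetical, and the brute-force search ignores the paper's prefix-projected pattern growth and upper-bound pruning.

```python
import math
from typing import Hashable, Optional, Sequence


def greedy_end(seq: Sequence[Hashable], pattern: Sequence[Hashable], start: int) -> Optional[int]:
    """End index of the leftmost ordered match of a non-empty `pattern` in seq[start:], or None."""
    i = 0
    for pos in range(start, len(seq)):
        if seq[pos] == pattern[i]:
            i += 1
            if i == len(pattern):
                return pos
    return None


def ordered_window(seq: Sequence[Hashable], pattern: Sequence[Hashable], t: int) -> float:
    """Shortest window containing position t and an ordered occurrence of `pattern`."""
    best = math.inf
    for a in range(t, -1, -1):
        end = greedy_end(seq, pattern, a)
        if end is not None:
            # For a fixed start a, the greedy match yields the earliest possible end.
            best = min(best, max(t, end) - a + 1)
    return best


def quantile_cohesion(seq: Sequence[Hashable], pattern: Sequence[Hashable], alpha: float) -> float:
    """Fraction of occurrences of the pattern's items whose window is at most alpha * |X| (assumed definition)."""
    items = set(pattern)
    positions = [t for t, item in enumerate(seq) if item in items]
    if not positions:
        return 0.0
    max_len = alpha * len(pattern)
    cohesive = sum(1 for t in positions if ordered_window(seq, pattern, t) <= max_len)
    return cohesive / len(positions)


print(quantile_cohesion(list("abxxab"), ["a", "b"], alpha=1.5))  # 1.0
print(quantile_cohesion(list("abxxxxb"), ["a", "b"], alpha=1.5))
```

In `abxxxxb` the trailing `b` needs a window of length 7, so it is simply not counted and the measure becomes 2/3. How far away that `b` lies no longer matters, whereas an average window length would keep growing with its distance; this is one way to read the claim that a proportion-based measure is robust to outliers.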

Combining Instance and Feature Neighbors for Efficient Multi-label Classification

Abstract: Multi-label classification problems occur naturally in different domains. For example, in text categorization the goal is to predict a set of topics for a document, and in image scene classification the goal is to assign labels to different objects in an image. In this work we propose a combination of two variations of k-nearest neighbors (kNN), where the first neighborhood is computed instance (or row) based and the second neighborhood is feature (or column) based. Instance-based kNN is inspired by user-based collaborative filtering, while feature-based kNN is inspired by item-based collaborative filtering. We then take a linear combination of the instance and feature neighbor scores and apply a single threshold to predict the set of labels. Experiments on various multi-label datasets show that our algorithm outperforms other state-of-the-art methods such as ML-kNN, IBLR and Binary Relevance with SVM on different evaluation metrics. Finally, our algorithm uses an inverted index during neighborhood search and scales to extreme datasets that have millions of instances, features and labels.

By Len Feremans, Boris Cule, Celine Vens, and Bart Goethals.

In Proceedings of the International Conference on Data Science and Advanced Analytics (DSAA), 2017 (pp. 109-118)
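The combination of instance and feature neighborhoods described in the multi-label abstract above can be sketched as below. Everything concrete here is our assumption rather than the paper's method: rows and columns are compared with cosine similarity on sets of active indices, the instance score of a label is the similarity-weighted vote of the k most similar training rows, the feature score is the average similarity between the columns of the test instance's active features and the label column, and `lam` and `threshold` are hypothetical parameter names. The sketch uses brute-force search instead of the inverted index mentioned in the abstract.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Set, b: Set) -> float:
    """Cosine similarity between two binary vectors given as sets of active indices."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def predict(x: Set[int], X: List[Set[int]], Y: List[Set[str]], labels: Sequence[str],
            k: int = 2, lam: float = 0.5,
            threshold: float = 0.5) -> Tuple[Set[str], Dict[str, float]]:
    # Instance (row) neighborhood: the k training rows most similar to x.
    neigh = sorted(range(len(X)), key=lambda i: cosine(x, X[i]), reverse=True)[:k]
    sims = {i: cosine(x, X[i]) for i in neigh}
    total = sum(sims.values())
    # Feature (column) neighborhood: the columns of the features active in x.
    feat_cols = [{i for i, row in enumerate(X) if f in row} for f in x]
    scores: Dict[str, float] = {}
    for label in labels:
        inst = (sum(s for i, s in sims.items() if label in Y[i]) / total) if total else 0.0
        label_col = {i for i, ys in enumerate(Y) if label in ys}
        feat = (sum(cosine(col, label_col) for col in feat_cols) / len(feat_cols)) if feat_cols else 0.0
        # Linear combination of both scores, then a single global threshold.
        scores[label] = lam * inst + (1 - lam) * feat
    predicted = {label for label, s in scores.items() if s >= threshold}
    return predicted, scores


# Toy data: feature sets per instance and label sets per instance.
X = [{0, 1}, {0, 1, 2}, {3, 4}, {3}]
Y = [{"sports"}, {"sports"}, {"politics"}, {"politics"}]
predicted, scores = predict({0, 1}, X, Y, ["sports", "politics"])
print(predicted, scores)
```

Because both partial scores lie in [0, 1], a single threshold applies uniformly to every label; `lam` trades off how much the row view and the column view contribute.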

Efficient Discovery of Sets of Co-occurring Items in Event Sequences

Abstract: Discovering patterns in long event sequences is an important data mining task. Most existing work focuses on frequency-based quality measures that allow algorithms to use the anti-monotonicity property to prune the search space and efficiently discover the most frequent patterns. In this work, we step away from such measures, and evaluate patterns using cohesion—a measure of how close to each other the items making up the pattern appear in the sequence on average. We tackle the fact that cohesion is not an anti-monotonic measure by developing a novel pruning technique in order to reduce the search space. By doing so, we are able to efficiently unearth rare, but strongly cohesive, patterns that existing methods often fail to discover.

By Boris Cule, Len Feremans, and Bart Goethals.

In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Data (ECML PKDD 2016), 2016 Springer.

Pattern mining for learning typical turbine response during dynamic wind turbine events

Abstract: Maintenance costs are a main cost driver for offshore wind energy. Prediction of failures, and particularly understanding of failures, can help to bring these costs down significantly. Since a wind turbine is subjected to a large number of dynamic events, it is important to fully understand the turbine response to these events. Pattern mining has been used successfully in different applications, and we believe it has large potential for understanding turbine behavior based on turbine status logs. These logs record all turbine actions and can be used as input for pattern mining algorithms. This paper proposes a multi-level pattern mining approach in order to minimize the number of uninteresting patterns and facilitate understanding of the response. The paper mainly focuses on the extraction of patterns and association rules linked to certain alarms and on how they can be annotated for further use in the multi-level pattern mining approach. Several years of wind turbine data are used. The approach is illustrated by detecting the characteristic pattern of the turbine response to an Extremely High Wind Speed Alert.

By Len Feremans, Boris Cule, Christof Devriendt*, Bart Goethals and Jan Helsen*.

In ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference.

* Department of Mechanical Engineering, Vrije Universiteit Brussel
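The wind turbine pattern mining abstract above does not detail how rules linked to an alarm are scored. As a generic illustration only, not the paper's multi-level approach, the sketch below computes the confidence of a rule "event A is followed by the alarm within `window` seconds" as the fraction of A's occurrences for which this holds. The log format, the function name, and the status names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical status-log entry: (status_code, timestamp in seconds).
LogEntry = Tuple[str, int]


def rule_confidence(log: List[LogEntry], antecedent: str, alarm: str, window: int) -> float:
    """Fraction of `antecedent` occurrences followed by `alarm` within `window` seconds."""
    a_times = sorted(t for code, t in log if code == antecedent)
    alarm_times = sorted(t for code, t in log if code == alarm)
    if not a_times:
        return 0.0
    hits = 0
    for t in a_times:
        i = bisect_right(alarm_times, t)  # index of the first alarm strictly after t
        if i < len(alarm_times) and alarm_times[i] - t <= window:
            hits += 1
    return hits / len(a_times)


# Made-up log with hypothetical status names.
log = [
    ("pitch_fault", 0), ("extreme_wind_alert", 30),
    ("pitch_fault", 100), ("yaw_error", 110),
    ("extreme_wind_alert", 400),
    ("pitch_fault", 500), ("extreme_wind_alert", 520),
]
print(rule_confidence(log, "pitch_fault", "extreme_wind_alert", window=60))  # 2 of 3 pitch faults
```

Sorting the alarm timestamps once and using binary search keeps the cost at O((m + n) log n) for m antecedent and n alarm occurrences, which matters for multi-year status logs.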