Because of recent developments in knowledge engineering and hardware, large digital databases have become common. The value of these databases, however, is determined not only by their size but also by the possibilities for analyzing them. Data mining is the discovery of previously unknown dependencies in data. The goal is to find and reveal structure in the data rather than to examine individual data items. This kind of meta-data determines the value of the database. Data mining already has important industrial applications, and the advantages it offers a company can be substantial. Since data mining is concerned with the meta-level of the data, it has many similarities with artificial intelligence and knowledge engineering. For this reason, data mining is often referred to as `knowledge discovery in databases.' Data mining is also strongly linked with OLAP (online analytical processing). Research on data mining techniques started in the early nineties and has grown enormously since 1995.
Association rules are a type of rule commonly studied in data mining, and many algorithms for finding them are known. However, association rules are very elementary, so it would be interesting to study more general rules. A possible description language for such rules is first-order logic. Several complexity measures can be studied: complexity in the number of tuples, in the number of attributes, and in the length of the rules. An interesting problem is to classify these rules by their complexity. Afterwards, research can turn to identifying patterns that admit algorithms with acceptable running times. My graduate thesis studies the search for a certain type of more general rule.
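To make the elementary case concrete: an association rule X ⇒ Y over a set of transactions has a support (the fraction of transactions containing all items of X ∪ Y) and a confidence (support(X ∪ Y) / support(X)). The Python sketch below is one possible illustration, assuming each transaction is a set of items. It uses an Apriori-style level-wise search, a well-known technique chosen here for illustration and not necessarily one studied in the thesis. The function names and the small basket data set are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent: support(X ∪ Y) / support(X)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support}."""
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a frequent k-itemset must be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return result

def association_rules(transactions, min_support, min_confidence):
    """All rules X => Y (X, Y disjoint, non-empty) meeting both thresholds."""
    rules = []
    for itemset, supp in frequent_itemsets(transactions, min_support).items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # lhs is a subset of a frequent itemset, so its support is > 0.
                conf = supp / support(lhs, transactions)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Hypothetical market-basket data: each transaction is a set of items.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
)]

for lhs, rhs, supp, conf in association_rules(transactions, 0.6, 0.7):
    print(sorted(lhs), "=>", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

Each call to `support` scans every transaction, so the cost per candidate grows with the number of tuples, while the number of candidate itemsets can, in the worst case, grow exponentially in the number of items (attributes). These correspond to two of the complexity measures mentioned above.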
Another aspect of data mining is the following: how can a user of database and OLAP systems make maximal use of data mining tools? From this perspective, there is a need for expressive query languages that allow the user to query the database in a simple way. Another point of interest is the notion of `genericity'. Generic queries are queries that are independent of the chosen data structure. In data mining, however, many methods depend strongly on the physical shape of the data.
At the moment, there is a lot of interest in data mining. At the UIA (University of Antwerp), a data mining project funded by the FWO is running, and the proposed project is an extension of it. Understanding the theoretical foundations is important: in contrast with much current research, which is performed in an ad hoc way, this project aims to enlarge the theoretical knowledge of data mining.