Edinburgh Research Explorer

A Bayesian Network Model for Interesting Itemsets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML-PKDD 2016)
Place of Publication: Riva del Garda, Italy
Publisher: Springer, Cham
Pages: 410-425
Number of pages: 16
ISBN (Electronic): 978-3-319-46227-1
ISBN (Print): 978-3-319-46226-4
Publication status: Published - 4 Sep 2016
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2016 - Riva del Garda, Italy
Duration: 19 Sep 2016 - 23 Sep 2016
http://www.ecmlpkdd2016.org/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 9852
ISSN (Print): 0302-9743

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2016
Abbreviated title: ECML-PKDD 2016
Country: Italy
City: Riva del Garda
Period: 19/09/16 - 23/09/16

Abstract

Mining itemsets that are the most interesting under a statistical model of the underlying data is a commonly used and well-studied technique for exploratory data analysis, with the most recent interestingness models exhibiting state-of-the-art performance. Continuing this highly promising line of work, we propose what is, to the best of our knowledge, the first generative model over itemsets, in the form of a Bayesian network, together with an associated novel measure of interestingness. Our model efficiently infers interesting itemsets directly from the transaction database using structural EM, in which the E-step employs the greedy approximation to weighted set cover. Our approach is theoretically simple, straightforward to implement, and trivially parallelizable, and it retrieves itemsets whose quality is comparable to, if not better than, that of existing state-of-the-art algorithms, as we demonstrate on several real-world datasets.
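
The abstract names the greedy approximation to weighted set cover as the core of the E-step. The following is a minimal Python sketch of that generic greedy heuristic applied to a single transaction, not the authors' implementation. The function name `greedy_weighted_set_cover`, the candidate itemsets, and their weights are hypothetical; the abstract does not say how the weights are derived from the model, so arbitrary positive costs are used here.

```python
from typing import Dict, FrozenSet, Iterable, List


def greedy_weighted_set_cover(
    transaction: Iterable[str],
    candidates: Dict[FrozenSet[str], float],
) -> List[FrozenSet[str]]:
    """Cover `transaction` with candidate itemsets using the standard
    greedy heuristic: repeatedly pick the itemset with the lowest
    weight per newly covered item.

    `candidates` maps each itemset to a positive cost (an assumption
    of this sketch; the source does not define the weights).
    """
    items = frozenset(transaction)
    uncovered = set(items)
    # Only itemsets fully contained in the transaction may be used.
    usable = {s: w for s, w in candidates.items() if s and s <= items}

    chosen: List[FrozenSet[str]] = []
    while uncovered:
        best = None
        best_ratio = float("inf")
        for itemset, weight in usable.items():
            gain = len(itemset & uncovered)
            if gain == 0:
                continue  # covers nothing new
            ratio = weight / gain
            if ratio < best_ratio:
                best, best_ratio = itemset, ratio
        if best is None:
            raise ValueError("transaction cannot be covered by the candidates")
        chosen.append(best)
        uncovered -= best
    return chosen


# Hypothetical example data.
example_candidates = {
    frozenset({"a", "b"}): 1.0,
    frozenset({"c"}): 1.0,
    frozenset({"d"}): 1.0,
    frozenset({"c", "d"}): 1.5,
    frozenset({"b", "e"}): 0.1,  # ignored: "e" is not in the transaction
}
example_cover = greedy_weighted_set_cover({"a", "b", "c", "d"}, example_candidates)
```

Because each transaction is covered independently in this sketch, the loop over transactions could be distributed across workers, which is consistent with the abstract's remark that the approach is trivially parallelizable.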


ID: 26618373