Proceedings of FIMI'03 Workshop on Frequent Itemset Mining Implementations, The 3rd IEEE International Conference on Data Mining

Proceedings of FIMI'03 Workshop on Frequent Itemset Mining Implementations, The 3rd IEEE International Conference on Data Mining Mohammed J. Zaki Bart Goethals Since the introduction of association rule mining in 1993 by Agrawal Imielinski and Swami [3], the frequent itemset mining (FIM) tasks have received a great deal of attention. Within the last decade, a phenomenal number of algorithms have been developed for min- ing all [3{5, 10, 18, 19, 21, 23, 26, 28, 31, 33], closed [6, 12, 22, 24, 25, 27, 29, 30, 32] and maximal frequent item- sets [1, 2, 7, 11, 15{17, 20, 35]. Every new paper claims to run faster than previously existing algorithms, based on their experimental testing, which is oftentimes quite limited in scope, since many of the original algorithms are not available due to intellectual property and copy- right issues. Zheng, Kohavi and Mason [34] observed that the performance of several of these algorithms is not always as claimed by its authors, when tested on some different datasets. Also, from personal experi- ence, we noticed that even di®erent implementations of the same algorithm could behave Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 11/19/2003 cs-03-14

Proceedings of FIMI'03 Workshop on Frequent Itemset Mining Implementations, The 3rd IEEE International Conference on Data Mining

Mohammed J. Zaki

Bart Goethals

Since the introduction of association rule mining in 1993 by Agrawal Imielinski and Swami [3], the frequent itemset mining (FIM) tasks have received a great deal of attention. Within the last decade, a phenomenal number of algorithms have been developed for min- ing all [3{5, 10, 18, 19, 21, 23, 26, 28, 31, 33], closed [6, 12, 22, 24, 25, 27, 29, 30, 32] and maximal frequent item- sets [1, 2, 7, 11, 15{17, 20, 35]. Every new paper claims to run faster than previously existing algorithms, based on their experimental testing, which is oftentimes quite limited in scope, since many of the original algorithms are not available due to intellectual property and copy- right issues. Zheng, Kohavi and Mason [34] observed that the performance of several of these algorithms is not always as claimed by its authors, when tested on some different datasets. Also, from personal experi- ence, we noticed that even di®erent implementations of the same algorithm could behave

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY

11/19/2003

cs-03-14