Iterative sampling based frequent itemset mining for big data

Xian Wu, Wei Fan, Jing Peng, Kun Zhang, Yong Yu

Research output: Contribution to journalArticlepeer-review

26 Scopus citations

Abstract

Frequent pattern mining attracts extensive research interests over the past two decades: including mining frequent item sets from transactions, extracting frequent sequences from bio-arrays and detecting common subgraph from molecular structures. In the era of big data, the explosive data volume brings new challenges to frequent pattern mining: (1) Space complexity: both input data, intermediate results and the outputted patterns could be too large to fit into memory which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns which prove to be inapplicable for big data. To deal with these two challenges. we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than process the entire data set at once, ISbFIM samples computationally-manageable subsets and extracts frequent itemsets from these subsets. By repeating this process for a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine item sets, sequences or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.

Original languageEnglish
Pages (from-to)875-882
Number of pages8
JournalInternational Journal of Machine Learning and Cybernetics
Volume6
Issue number6
DOIs
StatePublished - 1 Dec 2015

Keywords

  • Big data
  • Frequent itemset mining
  • Iterative sampling
  • Map-reduce
  • Parallelization

Fingerprint

Dive into the research topics of 'Iterative sampling based frequent itemset mining for big data'. Together they form a unique fingerprint.

Cite this