Iterative sampling based frequent itemset mining for big data

Xian Wu, Wei Fan, Jing Peng, Kun Zhang, Yong Yu

Research output: Contribution to journalArticle

15 Citations (Scopus)

Abstract

Frequent pattern mining attracts extensive research interests over the past two decades: including mining frequent item sets from transactions, extracting frequent sequences from bio-arrays and detecting common subgraph from molecular structures. In the era of big data, the explosive data volume brings new challenges to frequent pattern mining: (1) Space complexity: both input data, intermediate results and the outputted patterns could be too large to fit into memory which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns which prove to be inapplicable for big data. To deal with these two challenges. we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than process the entire data set at once, ISbFIM samples computationally-manageable subsets and extracts frequent itemsets from these subsets. By repeating this process for a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine item sets, sequences or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.

Original languageEnglish
Pages (from-to)875-882
Number of pages8
JournalInternational Journal of Machine Learning and Cybernetics
Volume6
Issue number6
DOIs
StatePublished - 1 Dec 2015

Fingerprint

Sampling
Molecular structure
Explosions
Data structures
Scalability
Data storage equipment
Big data

Keywords

  • Big data
  • Frequent itemset mining
  • Iterative sampling
  • Map-reduce
  • Parallelization

Cite this

Wu, Xian ; Fan, Wei ; Peng, Jing ; Zhang, Kun ; Yu, Yong. / Iterative sampling based frequent itemset mining for big data. In: International Journal of Machine Learning and Cybernetics. 2015 ; Vol. 6, No. 6. pp. 875-882.
@article{770ffd904af240e49ee880a09f3d624c,
title = "Iterative sampling based frequent itemset mining for big data",
abstract = "Frequent pattern mining attracts extensive research interests over the past two decades: including mining frequent item sets from transactions, extracting frequent sequences from bio-arrays and detecting common subgraph from molecular structures. In the era of big data, the explosive data volume brings new challenges to frequent pattern mining: (1) Space complexity: both input data, intermediate results and the outputted patterns could be too large to fit into memory which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns which prove to be inapplicable for big data. To deal with these two challenges. we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than process the entire data set at once, ISbFIM samples computationally-manageable subsets and extracts frequent itemsets from these subsets. By repeating this process for a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine item sets, sequences or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.",
keywords = "Big data, Frequent itemset mining, Iterative sampling, Map-reduce, Parallelization",
author = "Xian Wu and Wei Fan and Jing Peng and Kun Zhang and Yong Yu",
year = "2015",
month = "12",
day = "1",
doi = "10.1007/s13042-015-0345-6",
language = "English",
volume = "6",
pages = "875--882",
journal = "International Journal of Machine Learning and Cybernetics",
issn = "1868-8071",
publisher = "Springer Science + Business Media",
number = "6",

}

Iterative sampling based frequent itemset mining for big data. / Wu, Xian; Fan, Wei; Peng, Jing; Zhang, Kun; Yu, Yong.

In: International Journal of Machine Learning and Cybernetics, Vol. 6, No. 6, 01.12.2015, p. 875-882.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Iterative sampling based frequent itemset mining for big data

AU - Wu, Xian

AU - Fan, Wei

AU - Peng, Jing

AU - Zhang, Kun

AU - Yu, Yong

PY - 2015/12/1

Y1 - 2015/12/1

N2 - Frequent pattern mining attracts extensive research interests over the past two decades: including mining frequent item sets from transactions, extracting frequent sequences from bio-arrays and detecting common subgraph from molecular structures. In the era of big data, the explosive data volume brings new challenges to frequent pattern mining: (1) Space complexity: both input data, intermediate results and the outputted patterns could be too large to fit into memory which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns which prove to be inapplicable for big data. To deal with these two challenges. we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than process the entire data set at once, ISbFIM samples computationally-manageable subsets and extracts frequent itemsets from these subsets. By repeating this process for a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine item sets, sequences or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.

AB - Frequent pattern mining attracts extensive research interests over the past two decades: including mining frequent item sets from transactions, extracting frequent sequences from bio-arrays and detecting common subgraph from molecular structures. In the era of big data, the explosive data volume brings new challenges to frequent pattern mining: (1) Space complexity: both input data, intermediate results and the outputted patterns could be too large to fit into memory which prevents many algorithms from executing; (2) Time complexity: many existing approaches rely on exhaustive search or complicated data structures to mine frequent patterns which prove to be inapplicable for big data. To deal with these two challenges. we propose ISbFIM, an Iterative Sampling based Frequent Itemset Mining method. Rather than process the entire data set at once, ISbFIM samples computationally-manageable subsets and extracts frequent itemsets from these subsets. By repeating this process for a sufficient number of times, we can guarantee both theoretically and empirically that the frequent itemsets can be enumerated without running into a combinatorial explosion. ISbFIM can be easily parallelized and applied to mine item sets, sequences or structures. We implement a Map-Reduce version of ISbFIM to demonstrate its scalability on big data.

KW - Big data

KW - Frequent itemset mining

KW - Iterative sampling

KW - Map-reduce

KW - Parallelization

UR - http://www.scopus.com/inward/record.url?scp=84945565851&partnerID=8YFLogxK

U2 - 10.1007/s13042-015-0345-6

DO - 10.1007/s13042-015-0345-6

M3 - Article

AN - SCOPUS:84945565851

VL - 6

SP - 875

EP - 882

JO - International Journal of Machine Learning and Cybernetics

JF - International Journal of Machine Learning and Cybernetics

SN - 1868-8071

IS - 6

ER -