COMPARISON OF ALGORITHMS FOR CHOOSING THE BEST ONE FOR THE FREQUENT ITEM SET MINING TECHNIQUES

Abstract

Manu Mohan. P, Prof. Rasheeda Z Khan

A massive amount of data is stored and transferred by a tremendous number of sources, such as sensor devices, mobile devices, social media networks, network operators, and internet applications; such data is called big data. Because it comes from many kinds of sources, big data is a mix of structured and unstructured data, including text files, audio files, video clips, images, and even graphs and charts. Managing big data is therefore an important stage in the development of every kind of business field. Conventional tools and software techniques cannot handle big data effectively, so efficient management techniques are essential, and their results provide better insight into the stored data. Many algorithms are used for big data analysis, and association rule mining and frequent itemset mining are common techniques among them. However, traditional methods require the entire dataset to be held in main memory, which is not possible at this scale. To overcome this drawback, the Hadoop MapReduce framework, which offers scalability and robustness for managing large datasets, is used. A new algorithm called ClustBigFIM, a modified version of the BigFIM algorithm that uses the Apriori algorithm together with the Eclat algorithm to find itemset extensions, has been implemented in the Hadoop MapReduce paradigm. The problem with Hadoop MapReduce is that it stores intermediate results on local disks, so these results must be read back from disk whenever a later step needs them; this access takes time and leads to high latency. Spark instead provides an in-memory computation model in which intermediate data can be kept in memory, so querying data is much faster than with disk-based frameworks such as MapReduce. This paper therefore points out the advantages of using the Spark framework to run the ClustBigFIM algorithm in order to speed up processing and improve efficiency.
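The hybrid strategy described above (Apriori for short itemsets, Eclat for extending them) can be illustrated with a small single-machine sketch. It assumes the transactions fit in memory, that items are mutually comparable (for example, all strings), and that the minimum support is an absolute count; the function name `hybrid_fim` and the prefix-length parameter `k` are hypothetical. It omits any ClustBigFIM-specific steps and all Hadoop or Spark distribution, so it is not the authors' implementation.

```python
from itertools import combinations


def hybrid_fim(transactions, min_support, k=2):
    """Return {itemset: support} for all frequent itemsets (sketch only).

    Hypothetical BigFIM-style hybrid: Apriori finds frequent itemsets up to
    length k, then Eclat extends each frequent k-itemset depth-first.
    min_support is an absolute count; itemsets are sorted tuples; items must
    be mutually comparable. Single machine, in memory, no distribution.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    transactions = [set(t) for t in transactions]

    # Vertical layout for Eclat: item -> set of ids of transactions containing it.
    tidlists = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists.setdefault(item, set()).add(tid)
    singles = sorted(i for i, tids in tidlists.items() if len(tids) >= min_support)

    # Phase 1 (Apriori): level-wise generate-and-count up to length k.
    level = {(i,): len(tidlists[i]) for i in singles}
    frequent = dict(level)
    for size in range(2, k + 1):
        candidates = set()
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:  # join itemsets sharing their first size-2 items
                cand = a + (b[-1],)  # still sorted because a < b
                # Prune: every (size-1)-subset must itself be frequent.
                if all(s in level for s in combinations(cand, size - 1)):
                    candidates.add(cand)
        counts = dict.fromkeys(candidates, 0)
        for transaction in transactions:  # one scan of the data per level
            for cand in candidates:
                if transaction.issuperset(cand):
                    counts[cand] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        if not level:
            break

    # Phase 2 (Eclat): extend prefixes by intersecting tid-lists, depth-first.
    def extend(prefix, atoms):
        # atoms: (item, tid-list of prefix + (item,)) for each frequent extension
        for idx, (item, tids) in enumerate(atoms):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            deeper = []
            for other, other_tids in atoms[idx + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    deeper.append((other, common))
            extend(itemset, deeper)

    for prefix in sorted(level):
        prefix_tids = set.intersection(*(tidlists[i] for i in prefix))
        atoms = []
        for item in singles:
            if item > prefix[-1]:
                tids = prefix_tids & tidlists[item]
                if len(tids) >= min_support:
                    atoms.append((item, tids))
        extend(prefix, atoms)
    return frequent
```

For example, with the transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c,d}` and `min_support=2`, the call returns all seven frequent itemsets, including `('a', 'b', 'c')` with support 2, while `d` (support 1) is excluded. Because every frequent itemset longer than `k` has a frequent `k`-item prefix, the result does not depend on `k`; only the split of work between the Apriori and Eclat phases changes, and `k=1` reduces to plain Eclat.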