Friday, April 26, 2013

Implementation of the C4.5 Algorithm Using the Hadoop MapReduce Paradigm


C4.5 is a decision tree algorithm commonly used in data mining for classification. Existing C4.5 implementations run serially. We implement the algorithm using the Hadoop MapReduce framework, which can run in parallel across multiple machines. In this project, we compare our results with those of Weka, whose C4.5 implementation is serial, on data sources of different sizes.
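At each node, C4.5 chooses the attribute with the highest gain ratio, and the class counts that choice needs can be gathered in a single MapReduce pass over the records at that node. The sketch below simulates such a pass in plain Python rather than on Hadoop: a hypothetical `map_phase` emits `((attribute, value, class), 1)` for each record, `reduce_phase` sums those counts, and `gain_ratio` / `best_split` pick the splitting attribute. It assumes categorical attributes only and omits C4.5's continuous-attribute thresholds, missing-value handling, the average-gain filter and pruning. The function names and the toy dataset are illustrative, not taken from our implementation.

```python
import math
from collections import defaultdict

# Hypothetical in-memory stand-in for one MapReduce round over the records at CurrentNode.

def map_phase(records, attributes):
    """Mapper: for every record, emit ((attribute, value, class), 1)."""
    for rec in records:
        label = rec["class"]
        for attr in attributes:
            yield (attr, rec[attr], label), 1

def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each (attribute, value, class) key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts

def entropy(class_counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c)

def gain_ratio(counts, attr):
    """C4.5 criterion: information gain divided by split information."""
    # Group the reduced counts by attribute value: value -> {class: count}
    by_value = defaultdict(lambda: defaultdict(int))
    for (a, value, label), n in counts.items():
        if a == attr:
            by_value[value][label] += n
    # Class distribution of all records at this node
    class_totals = defaultdict(int)
    for dist in by_value.values():
        for label, n in dist.items():
            class_totals[label] += n
    total = sum(class_totals.values())
    base = entropy(list(class_totals.values()))
    # Weighted entropy of the partitions induced by attr
    cond = sum(sum(d.values()) / total * entropy(list(d.values()))
               for d in by_value.values())
    # Split information = entropy of the partition sizes
    split_info = entropy([sum(d.values()) for d in by_value.values()])
    return (base - cond) / split_info if split_info > 0 else 0.0

def best_split(records, attributes):
    """Run the simulated map and reduce phases, then pick the best attribute."""
    counts = reduce_phase(map_phase(records, attributes))
    return max(attributes, key=lambda a: gain_ratio(counts, a))

# Toy dataset (illustrative only)
SAMPLE = [
    {"outlook": "sunny", "windy": "no", "class": "no"},
    {"outlook": "sunny", "windy": "yes", "class": "no"},
    {"outlook": "rain", "windy": "no", "class": "yes"},
    {"outlook": "rain", "windy": "yes", "class": "no"},
    {"outlook": "overcast", "windy": "no", "class": "yes"},
    {"outlook": "overcast", "windy": "yes", "class": "yes"},
]

print(best_split(SAMPLE, ["outlook", "windy"]))  # → outlook
```

In a real Hadoop job the mapper and reducer would run on different machines and the `(attribute, value, class)` key would typically be serialized, for example as a delimited string. Only the small table of counts has to reach the driver that computes the gain ratios, which is what makes this step parallelizable.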


Algorithm:

CurrentNode is assumed to be the node being considered for splitting.
Map(key, value)
{