Benchmarking of Distributed Computing Engines Spark and GraphLab for Big Data Analytics

Jian Wei, Kai Chen, Yi Zhou, Qu Zhou, Jianhua He

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    17 Citations (Scopus)

    Abstract

    In this paper we evaluate and compare two representative and popular distributed processing engines for large scale big data analytics, Spark and graph based engine GraphLab. We design a benchmark suite including representative algorithms and datasets to compare the performances of the computing engines, from performance aspects of running time, memory and CPU usage, network and I/O overhead. The benchmark suite is tested on both local computer cluster and virtual machines on cloud. By varying the number of computers and memory we examine the scalability of the computing engines with increasing computing resources (such as CPU and memory). We also run cross-evaluation of generic and graph based analytic algorithms over graph processing and generic platforms to identify the potential performance degradation if only one processing engine is available. It is observed that both computing engines show good scalability with increase of computing resources. While GraphLab largely outperforms Spark for graph algorithms, it has close running time performance as Spark for non-graph algorithms. Additionally the running time with Spark for graph algorithms over cloud virtual machines is observed to increase by almost 100% compared to over local computer clusters.

    Original language: English
    Title of host publication: Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016
    Publisher: IEEE
    Pages: 10-13
    Number of pages: 4
    ISBN (Electronic): 9781509022519
    Publication status: Published - 19 May 2016
    Event: 2nd IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2016 - Oxford, United Kingdom
    Duration: 29 Mar 2016 to 1 Apr 2016

    Publication series

    Name: Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016

    Conference

    Conference: 2nd IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2016
    Country/Territory: United Kingdom
    City: Oxford
    Period: 29/03/16 to 1/04/16

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems and Management
    • Computer Science Applications
    • Information Systems
