DATE 2019

A Parallel Graph Environment for Real-World Data Analytics Workflows

Vito Giovanni Castellana^1,a, Maurizio Drocco^1,b, John Feo^1,c, Jesun Firoz^2,k, Thejaka Kanewala^2,l, Andrew Lumsdaine^1,d, Joseph Manzano^1,e, Andrés Marquez^1,f, Marco Minutoli^1,g, Joshua Suetterlein^1,h, Antonino Tumeo^1,i and Marcin Zalewski^1,j
^1 High Performance Computing, Pacific Northwest National Laboratory, Richland, WA, USA
^a vitoGiovanni.castellana@pnnl.gov
^b maurizio.drocco@pnnl.gov
^c john.feo@pnnl.gov
^d andrew.lumsdaine@pnnl.gov
^e joseph.manzano@pnnl.gov
^f andres.marquez@pnnl.gov
^g marco.minutoli@pnnl.gov
^h joshua.suetterlein@pnnl.gov
^i antonino.tumeo@pnnl.gov
^j marcin.zalewski@pnnl.gov
^2 School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
^k jsfiroz@iu.edu
^l jthejkane@iu.edu

ABSTRACT

Economic competitiveness and national security depend increasingly on the insightful analysis of large data sets. The diversity of real-world data sources and analytic workflows imposes challenging hardware and software requirements on parallel graph platforms. The irregular nature of graph methods is poorly served by the deep memory hierarchies of conventional distributed systems, which calls for new processor and runtime system designs that tolerate memory and synchronization latencies. Moreover, the efficiency of relational table operations and matrix computations is not attainable when data is stored in common graph data structures. In this paper, we present HAGGLE, a high-performance, scalable data analytics platform. The platform's hybrid data model supports a variety of distributed, thread-safe data structures, parallel programming constructs, and both persistent and streaming data. An abstract runtime layer enables us to map the stack onto conventional distributed computer systems with accelerators. The runtime uses multithreading, active messages, and data aggregation to hide memory and synchronization latencies on large-scale systems.
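The abstract names active messages and data aggregation as latency-hiding techniques without detailing them. As a rough illustration of the general idea only, and not HAGGLE's actual runtime interface, the in-process Python sketch below buffers small per-destination messages and delivers each buffer as a single batch, where the receiver runs a named handler for every message. The classes `Rank` and `Aggregator`, the `inc_degree` handler, the cyclic `owner` mapping, and the `batch_size` parameter are all hypothetical; multithreading and real networking are not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical active message: a handler name plus its arguments.
# The receiving rank looks up the handler and runs it on arrival.
Message = Tuple[str, tuple]


class Rank:
    """A simulated process that owns a slice of the graph's vertex data."""

    def __init__(self, rank_id: int):
        self.rank_id = rank_id
        self.degree: Dict[int, int] = defaultdict(int)
        self.batches_received = 0
        self.handlers: Dict[str, Callable[..., None]] = {
            "inc_degree": self._inc_degree,
        }

    def _inc_degree(self, vertex: int) -> None:
        self.degree[vertex] += 1

    def deliver(self, batch: List[Message]) -> None:
        # One (simulated) network transfer carries many small messages.
        self.batches_received += 1
        for handler, args in batch:
            self.handlers[handler](*args)


class Aggregator:
    """Buffers messages per destination and sends them in batches."""

    def __init__(self, ranks: List[Rank], batch_size: int):
        self.ranks = ranks
        self.batch_size = batch_size
        self.buffers: Dict[int, List[Message]] = defaultdict(list)

    def send(self, dest: int, handler: str, *args) -> None:
        buf = self.buffers[dest]
        buf.append((handler, args))
        if len(buf) >= self.batch_size:
            self._flush_one(dest)

    def _flush_one(self, dest: int) -> None:
        batch = self.buffers[dest]
        if batch:
            self.buffers[dest] = []
            self.ranks[dest].deliver(batch)

    def flush(self) -> None:
        # Drain partially filled buffers, e.g. at a synchronization point.
        for dest in list(self.buffers):
            self._flush_one(dest)


def owner(vertex: int, num_ranks: int) -> int:
    # Hypothetical cyclic distribution of vertices across ranks.
    return vertex % num_ranks


def in_degrees(edges, num_ranks: int, batch_size: int) -> List[Rank]:
    # Each edge (u, v) sends an increment to the rank owning v.
    ranks = [Rank(r) for r in range(num_ranks)]
    agg = Aggregator(ranks, batch_size)
    for _, v in edges:
        agg.send(owner(v, num_ranks), "inc_degree", v)
    agg.flush()
    return ranks
```

With six edges, two ranks, and `batch_size=2`, the six increments arrive in four batches instead of six separate transfers; larger batches trade extra buffering delay for fewer, larger messages, which is the usual motivation for aggregation on latency-bound networks.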

Keywords: Graph Analytics, Attributed Graphs
