HPXA: A Highly Parallel XML Parser

Isaar Ahmad (a), Sanjog Patil (b) and Smruti R. Sarangi (c)
IIT Delhi, New Delhi, India
(a) isaar.ahmad@gmail.com
(b) jvl142701@ee.iitd.ac.in
(c) srsarangi@cse.iitd.ac.in

ABSTRACT


State-of-the-art XML parsing approaches read an XML file byte by byte and use complex finite state machines to process each byte. In this paper, we propose a new parser, HPXA, which reads and processes 16 bytes at a time. We designed most of its components ab initio to ensure that they can process multiple XML tokens and tags in parallel. We propose two basic elements: a sparse 1D array compactor, and a hardware unit called LTMAdder that makes its decisions by adding the rows of a lower triangular matrix. We demonstrate that HPXA processes 16 bytes in parallel with very few pipeline stalls on a suite of widely used XML benchmarks. Moreover, in a 28 nm technology node, HPXA processes XML data at 106 Gbps, which is roughly 6.5× faster than competing prior work.
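The abstract does not describe how the compactor or LTMAdder is built. As an illustration only, the sketch below models one plausible reading in software. It assumes that the compactor gathers the "interesting" entries of a 16-entry array (here, offsets of `<` bytes in one 16-byte block) into a dense array, and that each kept entry's output slot is found by adding up row i of a lower triangular matrix of ones against the flag vector, which yields an inclusive prefix count. All function names are hypothetical. In hardware, every row sum is independent, so all 16 could be evaluated in parallel; the loops here are purely sequential.

```python
# Hypothetical software model of a sparse-array compactor driven by
# lower-triangular-matrix row sums; NOT the authors' hardware design.
BLOCK = 16  # bytes handled per step, as stated in the abstract


def lower_triangular_ones(n):
    """n x n matrix with ones on and below the diagonal."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def ltm_prefix_counts(flags):
    """Entry i = dot product of row i of the lower triangular ones matrix
    with the flag vector, i.e. the number of set flags at positions <= i."""
    n = len(flags)
    lower = lower_triangular_ones(n)
    return [sum(lower[i][j] * flags[j] for j in range(n)) for i in range(n)]


def compact(values, flags):
    """Move flagged entries to the front, preserving their order.
    A flagged entry with prefix count c lands in output slot c - 1."""
    counts = ltm_prefix_counts(flags)
    out = [None] * (counts[-1] if counts else 0)
    for value, flag, count in zip(values, flags, counts):
        if flag:
            out[count - 1] = value
    return out


def tag_start_offsets(block: bytes):
    """Offsets of every '<' byte within one block of input."""
    flags = [1 if b == ord("<") else 0 for b in block]
    return compact(list(range(len(block))), flags)


if __name__ == "__main__":
    chunk = b"<a>hi</a><b/>xyz"  # exactly BLOCK bytes
    assert len(chunk) == BLOCK
    print(tag_start_offsets(chunk))  # [0, 5, 9]
```

The example block contains three tag openings, and the compactor turns the sparse flag vector into the dense list of their offsets. This is the kind of fixed-width, data-parallel step that lets several tags in one 16-byte block be handled together rather than one byte at a time.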

Keywords: XML parser, multibyte input, highly parallel parser.


