A Hardware Implementation of the MCAS Synchronization Primitive

Srishty Patela, Rajshekar Kalayappanb, Ishani Mahajanc and Smruti R. Sarangid
Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India.
asrishtypatel13@gmail.com
brajshekar.kalayappan@gmail.com
cishanimahajan7@gmail.com
dsrsarangi@cse.iitd.ac.in

ABSTRACT


Lock-based parallel programs are easy to write. However, they are inherently slow as the synchronization is blocking in nature. Non-blocking lock-free programs, which use atomic instructions such as compare-and-set (CAS), are significantly faster. However, lock-free programs are notoriously difficult to design and debug. This can be greatly eased if the primitives work on multiple memory locations instead of one. We propose MCAS, a hardware implementation of a multi-word compare-and-set primitive. Ease of programming aside, MCASbased programs are 13.8X and 4X faster on an average than lock-based and traditional lock-free programs respectively. The area overhead, in a 32-core 400mm2 chip, is a mere 0.046%.



Full Text (PDF)