Exploiting Architecture Advances for Sparse Solvers in Circuit Simulation

Zhiyuan Yan1,2,3,a, Biwei Xie1,2,3,b, Xingquan Li3,4,c and Yungang Bao1,2,3,d
1State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
2University of Chinese Academy of Sciences
3Peng Cheng Laboratory
4Minnan Normal University
ayanzhiyuan@ict.ac.cn
bxiebiwei@ict.ac.cn
cfzulxq@gmail.com
dbaoyg@ict.ac.cn

ABSTRACT


Sparse direct solvers provide vital functionality for a wide variety of scientific applications. The dominant part of a sparse direct solver, LU factorization, suffers heavily from the irregularity of sparse matrices. Meanwhile, the specific characteristics of sparse solvers in circuit simulation and the unique sparsity pattern of circuit matrices open up a larger design space but also pose great challenges.

In this paper, we propose a sparse solver named FLU and re-examine the performance of LU factorization from the perspectives of vectorization, parallelization, and data locality. To improve vectorization efficiency and data locality, FLU introduces a register-level supernode computation method that carefully manages data movement. With alternating multi-column computation, FLU further greatly reduces off-chip memory accesses. Furthermore, we implement a fine-grained, elimination-tree-based parallelization scheme to fully exploit task-level parallelism. Experimental results on Intel Xeon show that FLU achieves speedups of up to 19.51× (3.86× on average) over PARDISO and up to 2.56× (1.66× on average) over NICSLU.

Keywords: High Performance Computing, Circuit Simulation, Sparse LU Factorization.


