Reducing memory requirements for high-performance and numerically stable Gaussian elimination

David Peter Boland

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

2 Citations (Scopus)

Abstract

Gaussian elimination is a well-known technique to compute the solution to a system of linear equations and boosting its performance is highly desirable. While straightforward parallel techniques are limited either by I/O or on-chip memory bandwidth, block-based algorithms offer the potential to bridge this gap by interleaving I/O with computation. However, these algorithms require the amount of on-chip memory to be at least the square of the number of processing elements available. Using the latest generation Altera FPGAs with hardened floating-point units, this is no longer the case. It follows that the amount of on-chip memory limits performance, a problem that is only likely to increase unless on-chip memory dominates FPGA architecture. In addition to this limitation, existing FPGA implementations of block-based Gaussian elimination either sacrifice numerical stability or efficiency. The former limits the usefulness of these implementations to a small class of matrices, the latter limits its performance. This paper presents a high-performance and numerically stable method to perform Gaussian elimination on an FPGA. This modified algorithm makes use of a deep pipeline to store the matrix and ensures that the peak performance is once again limited by the number of floating-point units that can fit on the FPGA. When applied to large matrices, this technique can obtain a sustained performance of up to 256 GFLOPs on an Arria 10, beginning to tap into the full potential of these devices. This performance is comparable to the peak that could be achieved using a simple block-based algorithm, with the performance on a Stratix 10 predicted to be superior. This is in spite of the fact that the underlying algorithm for the implementation in this paper, Gaussian elimination with pairwise pivoting, is more complex and applicable to a wider range of practical problems.
Original languageEnglish
Title of host publicationFPGA 2016: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
EditorsJonathan Greene
Place of PublicationNew York NY USA
PublisherAssociation for Computing Machinery (ACM)
Pages244 - 253
Number of pages10
ISBN (Print)9781450338561
DOIs
Publication statusPublished - 2016
EventACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) 2016 - Monterey, United States of America
Duration: 21 Feb 201623 Feb 2016
https://dl.acm.org/doi/proceedings/10.1145/2847263 (Proceedings)

Conference

ConferenceACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) 2016
Abbreviated titleFPGA 2016
CountryUnited States of America
CityMonterey
Period21/02/1623/02/16
Internet address

Keywords

  • Gaussian elimination
  • Pairwise pivoting
  • FPGA
  • Linear algebra

Cite this