This project presents the design and implementation of an in-memory Relational Algebra Processor (RAP) on an FPGA, aimed at overcoming bottlenecks in traditional Database Management Systems (DBMS). Today's DBMSs are constrained by processor speed, software overhead, latency, and bandwidth, and can benefit from FPGA acceleration. The RAP implements the basic RA operators and aims to outperform SQLite. The presentation covers the microarchitecture for selection, projection, and binary operations, performance benchmarks, memory bandwidth considerations, and future improvements for data-intensive operations.
A Relational Algebra Processor
6.375 Final Project
Ming Liu, Shuotao Xu
Motivation
• Today's Database Management Systems (DBMS) are software running on a standard operating system on a general-purpose CPU
• DBMSs are frequently used in analytics and scientific computing, but are bottlenecked by:
  • Processor speed, software overhead, latency & bandwidth
• Proposal: an FPGA-based Relational Algebra Processor
(Diagram: Host PC (DBMS), FPGA Relational Algebra Processor, Physical Storage)
Background | Relational Algebra (RA)
• Many database queries are fundamentally decomposable into five basic RA operators: selection, projection, Cartesian product, union, and set difference
  • Although SQL is capable of much more
• Design a dedicated processor on the FPGA for each operator
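As an illustration (the table `emp`, its columns `name` and `age`, and the tables `A` and `B` are hypothetical, and set semantics is assumed), common SQL forms map directly onto these operators:

```latex
\begin{aligned}
\texttt{SELECT name FROM emp WHERE age < 40} &\;\equiv\; \pi_{\text{name}}\bigl(\sigma_{\text{age} < 40}(\text{emp})\bigr)\\
\texttt{SELECT * FROM A, B} &\;\equiv\; A \times B
\end{aligned}
```

The first query is a selection feeding a projection, which is exactly the kind of unary chain the RA Processor targets.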
Project Goal
• Design and implement an in-memory relational algebra processor on the FPGA
• Explore the types of queries that can benefit from FPGA acceleration
• Secondary: Outperform SQLite!
• Some assumptions:
  • 32-bit wide table entries
  • Tables fit in memory
  • Maximum number of columns is 32
  • Read-only
Microarchitecture | Top-Level RA Processor
(Diagram: Host PC (DBMS) with C++ host functions, PCIe, RA Processor, DRAM, Physical Storage)
Microarchitecture | Row Marshaller
• Exposes a simple interface for operators to access tables in DRAM
• Address translation, burst aggregation, truncation & alignment
• Multiplexes requests
• Table values sent/received as 32-bit bursts
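The address translation, alignment and truncation steps can be modeled in software. The sketch below is hypothetical: it assumes tables are stored row-major from a word-aligned base address and that DRAM is read in 64-byte lines. Neither the layout nor the line size is stated in the slides, and `translateRow`, `RowRequest` and `kLineBytes` are invented names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical DRAM line size; the real memory interface's burst size is not given.
constexpr uint64_t kLineBytes = 64;
constexpr uint64_t kWordBytes = 4;  // 32-bit table entries

struct RowRequest {
    uint64_t alignedAddr;  // first line-aligned DRAM address to read
    uint64_t numLines;     // number of DRAM lines covering the row
    uint64_t firstWord;    // index of the row's first 32-bit word inside the first line
    uint64_t numWords;     // words to forward before truncating the rest
};

// Translate (table base, column count, row index) into an aligned line request,
// assuming a row-major layout with no padding between rows.
RowRequest translateRow(uint64_t tableBase, uint64_t numCols, uint64_t row) {
    assert(numCols >= 1 && numCols <= 32);  // slide assumption: at most 32 columns
    const uint64_t rowBytes = numCols * kWordBytes;
    const uint64_t start = tableBase + row * rowBytes;
    const uint64_t end = start + rowBytes;  // one past the row's last byte
    const uint64_t alignedStart = start - start % kLineBytes;
    const uint64_t alignedEnd = (end + kLineBytes - 1) / kLineBytes * kLineBytes;
    return RowRequest{alignedStart,
                      (alignedEnd - alignedStart) / kLineBytes,
                      (start - alignedStart) / kWordBytes,
                      numCols};
}
```

For example, with 5 columns and base address 0, row 3 starts at byte 60 and straddles two 64-byte lines: the model fetches both lines, skips the first 15 words, and truncates after 5.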
Microarchitecture | Selection
• Filters rows based on predicates (e.g. age < 40)
• 16 predicate evaluators
  • Each is internally a comparator
• A tree of gates combines the predicate results to qualify a row
  • Maximum: 4 ORed groups of 4 ANDed predicates
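A software model makes the "4 ORs of 4 ANDs" limit concrete. This is a sketch under assumptions: entries are treated as signed 32-bit integers, the comparator set `Op` is hypothetical, and unused evaluators are simply disabled. A row qualifies if every enabled predicate in at least one AND group holds.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { LT, LE, EQ, NE, GE, GT };  // hypothetical comparator set

struct Predicate {
    bool enabled = false;  // disabled evaluators are ignored
    uint32_t column = 0;
    Op op = Op::EQ;
    int32_t constant = 0;
};

// 16 evaluators arranged as 4 AND groups of 4 predicates; the groups are ORed.
using PredicateTree = std::array<std::array<Predicate, 4>, 4>;

bool compare(int32_t lhs, Op op, int32_t rhs) {
    switch (op) {
        case Op::LT: return lhs < rhs;
        case Op::LE: return lhs <= rhs;
        case Op::EQ: return lhs == rhs;
        case Op::NE: return lhs != rhs;
        case Op::GE: return lhs >= rhs;
        case Op::GT: return lhs > rhs;
    }
    return false;
}

bool rowQualifies(const std::vector<int32_t>& row, const PredicateTree& tree) {
    for (const auto& group : tree) {
        bool anyEnabled = false;
        bool all = true;
        for (const auto& p : group) {
            if (!p.enabled) continue;
            assert(p.column < row.size());
            anyEnabled = true;
            all = all && compare(row[p.column], p.op, p.constant);
        }
        if (anyEnabled && all) return true;  // one satisfied AND group is enough
    }
    return false;  // an empty tree qualifies no rows in this model
}
```

A filter such as `age < 40 OR (salary >= 500 AND age = 50)` (hypothetical columns) uses one evaluator in group 0 and two in group 1; a filter needing more than four ORed groups would not fit in this tree.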
Microarchitecture | Projection
• Selects columns of a table
• Column mask is one-hot encoded
• No need to buffer the row; operates directly on data bursts
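Because tables have at most 32 columns, the column mask fits in one 32-bit word with one bit per column. The hypothetical `projectStream` below models projection without a row buffer: it keeps only a column counter and forwards or drops each 32-bit burst as it arrives, assuming row-major bursts of exactly `numCols` words per row.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Project a stream of 32-bit bursts onto the columns whose bit is set in mask.
// No row is ever buffered: each burst is forwarded or dropped on arrival.
std::vector<uint32_t> projectStream(const std::vector<uint32_t>& bursts,
                                    uint32_t numCols, uint32_t mask) {
    assert(numCols >= 1 && numCols <= 32);  // mask has one bit per column
    std::vector<uint32_t> out;
    uint32_t col = 0;  // column of the burst currently arriving
    for (uint32_t word : bursts) {
        if ((mask >> col) & 1u) out.push_back(word);
        col = (col + 1 == numCols) ? 0 : col + 1;  // wrap at the end of each row
    }
    return out;
}
```

With 3 columns and mask `0b101`, the bursts `1 2 3 4 5 6` (two rows) become `1 3 4 6`.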
Microarchitecture | Binary Operators
• Cartesian Product, Union, Difference and Deduplication
• Nested loop implementation
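The nested-loop strategy can be sketched in software. This is an illustrative model, not the hardware design: rows are vectors of 32-bit values, equality is whole-row comparison, and union is modeled as concatenation followed by deduplication (set semantics), which is an assumption about how the operators compose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::vector<uint32_t>;
using Table = std::vector<Row>;

// Cartesian product: every row of a concatenated with every row of b.
Table cartesianProduct(const Table& a, const Table& b) {
    Table out;
    for (const Row& ra : a)
        for (const Row& rb : b) {
            Row r = ra;
            r.insert(r.end(), rb.begin(), rb.end());
            out.push_back(r);
        }
    return out;
}

// Difference a - b: the inner loop scans all of b for each row of a.
Table difference(const Table& a, const Table& b) {
    assert(a.empty() || b.empty() || a.front().size() == b.front().size());
    Table out;
    for (const Row& ra : a) {
        bool found = false;
        for (const Row& rb : b)
            if (ra == rb) { found = true; break; }
        if (!found) out.push_back(ra);
    }
    return out;
}

// Deduplication: keep a row only if no earlier row equals it.
Table deduplicate(const Table& a) {
    Table out;
    for (std::size_t i = 0; i < a.size(); ++i) {
        bool seen = false;
        for (std::size_t j = 0; j < i; ++j)
            if (a[j] == a[i]) { seen = true; break; }
        if (!seen) out.push_back(a[i]);
    }
    return out;
}

// Union: concatenate, then deduplicate.
Table unionOf(const Table& a, const Table& b) {
    assert(a.empty() || b.empty() || a.front().size() == b.front().size());
    Table all = a;
    all.insert(all.end(), b.begin(), b.end());
    return deduplicate(all);
}
```

Every operator here compares rows of one input against rows of another, so the number of row comparisons grows with the product of the input sizes (or quadratically for deduplication).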
Microarchitecture | Inter-operator Bypassing
• Operators enabled concurrently; data passed between operators
• No intermediate storage
• Conditions:
  • A singly linked chain of unary operators
  • Each operator has a single target output
  • No structural hazards
• Software reorders and schedules the RA commands
• Data source/destination encoded in each command
Microarchitecture | Inter-operator Bypassing
• Multiple 32-bit wide output FIFOs to other operators
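Inter-operator bypassing can be modeled as a singly linked chain of unary stages that each row passes through in turn. The sketch is hypothetical (`Stage` and `runChain` are invented names): a stage returns `false` to drop a row, as selection does, or rewrites the row in place, as projection does, and no intermediate table is stored between stages. In hardware the stages run concurrently on different rows, connected by FIFOs; this sequential model captures only the dataflow.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Row = std::vector<uint32_t>;

// One unary operator: return false to drop the row; may modify the row in place.
using Stage = std::function<bool(Row&)>;

// Push each row through the chain; a row leaves one stage and immediately enters
// the next, so no intermediate table is ever materialized.
std::vector<Row> runChain(const std::vector<Row>& input, const std::vector<Stage>& chain) {
    std::vector<Row> output;
    for (Row row : input) {  // copy: this row is "in flight" through the chain
        bool alive = true;
        for (const Stage& stage : chain) {
            assert(stage && "every stage in the chain must be callable");
            if (!stage(row)) { alive = false; break; }
        }
        if (alive) output.push_back(row);
    }
    return output;
}
```

For example, a selection `col0 < 40` followed by a projection onto column 1 turns rows `(35, 7), (50, 8), (20, 9)` into `(7), (9)`.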
Implementation Evaluation
• Timing
  • Maximum frequency: 55.786 MHz
  • Critical path: Row Marshaller mux
• Area (device utilization)
  • Slice registers: 50%
  • LUTs: 85%
  • BRAMs/FIFOs: 47%
Performance Benchmark | Setup
• SQLite
  • Internal SQLite timer reports the query execution time
  • ThinkPad T430, Core i7-3520M @ 2.90 GHz, 1x8 GB DDR3-1600
• RA Processor
  • Performance counters: cycles from start to ack of an operator
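Converting the performance-counter readings into time, for comparison with SQLite's timer, only needs the clock frequency. The helper names below are hypothetical, and the clock is passed in explicitly because the slides report only the maximum achievable frequency (55.786 MHz), not necessarily the clock used during benchmarking. The throughput helper assumes 4-byte entries, per the 32-bit assumption.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed time for an operator, given the cycle count between start and ack.
double cyclesToSeconds(uint64_t cycles, double clockHz) {
    assert(clockHz > 0.0);
    return static_cast<double>(cycles) / clockHz;
}

// Effective throughput in bytes/s for a table of numRows x numCols 32-bit entries.
double throughputBytesPerSec(uint64_t numRows, uint64_t numCols,
                             uint64_t cycles, double clockHz) {
    const double bytes = static_cast<double>(numRows * numCols * 4);
    return bytes / cyclesToSeconds(cycles, clockHz);
}
```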
Performance Benchmark | Results
• Limitation: memory bandwidth, 200 MB/s (RA Processor) vs. 12.8 GB/s (SQLite host)
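The size of the gap can be checked with a rough peak-bandwidth calculation. The host figure follows from DDR3-1600 on a single 64-bit channel; the FPGA line assumes one 32-bit burst per cycle at the reported maximum clock, which is an assumption rather than a measured figure.

```latex
\begin{aligned}
B_{\text{host}} &= 1600\times10^{6}\,\tfrac{\text{transfers}}{\text{s}} \times 8\,\tfrac{\text{B}}{\text{transfer}} = 12.8\ \text{GB/s}\\
B_{\text{FPGA}} &\le 55.786\times10^{6}\,\tfrac{\text{cycles}}{\text{s}} \times 4\,\tfrac{\text{B}}{\text{cycle}} \approx 223\ \text{MB/s}\\
\frac{12.8\ \text{GB/s}}{200\ \text{MB/s}} &= 64
\end{aligned}
```

The host's peak DRAM bandwidth is therefore 64 times the RA Processor's measured 200 MB/s, and the measured figure is close to what a 32-bit-per-cycle datapath could deliver at this clock.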
Performance Benchmark | Results
• The select operator is the most competitive with SQLite
• What happens with more predicates?
Improvements
• Increasing data burst width
  • 32-bit to 256-bit: potential 8x speedup
  • Cost: increased area and a longer critical path
• Maximizing memory bandwidth
  • Additional row buffers to buffer data from DDR2 memory
• Larger, faster DRAM; higher clock speed
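The 8x figure is the ratio of burst widths. If the clock stayed at 55.786 MHz and one burst still moved per cycle (optimistic, since a wider datapath may lengthen the critical path), the per-cycle bandwidth bound would scale as follows:

```latex
\begin{aligned}
\frac{256\ \text{bit}}{32\ \text{bit}} &= 8\\
55.786\times10^{6}\,\tfrac{\text{cycles}}{\text{s}} \times 32\,\tfrac{\text{B}}{\text{cycle}} &\approx 1.79\ \text{GB/s}
\end{aligned}
```

Even this bound is roughly 7x below the host's 12.8 GB/s, which is why wider bursts are paired with faster DRAM and a higher clock.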
Conclusion & Future Work
• Complex filtering operations perform well on the FPGA
  • Better than SQLite given sufficient memory bandwidth
• Data-intensive operators do not perform well
• Future opportunities:
  • An accelerator alongside SQLite
  • Integration with an HDD/SSD controller