Efficient Data Access Management over Flash-Memory Storage Systems

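Flash-Memory Storage Systems: Data Access Management (Efficient Data Access Management over Flash-Memory Storage Systems) 吳晉賢 (Chin-Hsien Wu), Graduate Institute of Computer Science and Information Engineering, National Taiwan University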

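Vita • Name: 吳晉賢 (Chin-Hsien Wu) • Education • Ph.D.: Graduate Institute of Computer Science and Information Engineering, National Taiwan University (Sep. 2001 to Jun. 2006) • M.S.: Graduate Institute of Computer Science and Information Engineering, National Taiwan University (Sep. 1999 to Jun. 2001) • B.S.: Department of Computer Science and Information Engineering, National Chung Cheng University (Sep. 1995 to Jun. 1999) • Current Position • Second lieutenant (reserve information officer), Technology Center, National Security Bureau (discharge expected this July) • Publications/Patents (8 papers published in well-known domestic and international journals and international conferences) • International Conferences • ACM GIS • IEEE/ACM/IFIP CODES-ISSS • IEEE/ACM ICCAD • Journals • ACM Transactions on Embedded Computing Systems • ACM Transactions on Storage • Patent • Two domestic and international patent applications pending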

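Vita • Expertise • Embedded storage systems, embedded file systems, real-time systems, operating systems, computer architecture • Experience • Engineer at CyberLink Corp. • Developed the Linux version of PowerDVD • Mainly responsible for porting the video decoder • Microsoft VoIP Conference Project • Developed programs with the RTC Library and C# • Integrated VoIP devices across different platforms • Teaching assistant, Department of Computer Science and Information Engineering, National Taiwan University • Introduction to Computer Science, Algorithms, Database Systems, Real-Time Systems, Computer Architecture, Game Theory, Calculus • Instructor, NTU system training courses • C/C++ programming, web design • Reviewer for various IEEE/ACM journals and conferences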

Outline • Introduction • Flash Memory Characteristics • Efficient Data Access Management over Flash-Memory Storage Systems • Efficient B-Tree Index Structures over Flash Memory • Efficient Initialization and Crash Recovery for File Systems over Flash Memory • An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems • Conclusion

Introduction Reference: SAMSUNG’s web site

Introduction • Benefits of NAND Flash Memory • Shock resistant, non-volatile, power economical • (Figure label: NAND flash chips are inserted here)

Introduction • Longer file-system initialization time due to large-scale flash-memory storage systems. • Efficient initialization and crash recovery for log-based file systems over flash memory. • Frequent small updates could deteriorate flash-memory storage systems. • B-Tree index structures over flash memory. • Overheads due to fine-grained and coarse-grained address translation layers for large-scale flash-memory storage systems. • An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems.

Flash Memory Characteristics • Organization of a Typical NAND Flash Memory • 1 page = 512B user area + 16B spare area • 1 block = 32 pages • Read/write one page at a time; erase one block at a time • Erase a block time (2ms) > Write a page time (200us) > Read a page time (50us) • (Figure: Block 0, Block 1, Block 2, Block 3, …)

Flash Memory Characteristics • Example 2: Garbage Collection • This block is to be recycled (3 live pages and 5 dead pages): L D D L D D L D • Other blocks: L L D L L L F D | L F L L L L D F | F L L F L L F D • Legend: L = a live page, D = a dead page, F = a free page

Flash Memory Characteristics • Example 2: Garbage Collection • Live data are copied to somewhere else, so the block holds only dead pages: D D D D D D D D • Other blocks: L L D L L L L D | L F L L L L D L | L L L F L L F D • Legend: L = a live page, D = a dead page, F = a free page

Flash Memory Characteristics • Example 2: Garbage Collection • The block is then erased: F F F F F F F F • Overheads: • live data copying • block erasing • Other blocks: L L D L L L L D | L F L L L L D L | L L L F L L F D • Legend: L = a live page, D = a dead page, F = a free page
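The three steps of the example (pick a block, copy its live pages, erase it) can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, live pages are copied to the first free page found in any other block (the figure places them differently), and the cost model simply reuses the page read, page write, and block erase times quoted on the organization slide.

```python
# Timing figures quoted on the "Organization of a Typical NAND Flash Memory" slide.
READ_US, WRITE_US, ERASE_US = 50, 200, 2000


def pick_victim(blocks):
    """Greedy choice (an assumption here): the block with the most dead pages."""
    return max(range(len(blocks)), key=lambda i: blocks[i].count("D"))


def recycle(blocks, victim):
    """Copy live pages out of the victim block, erase it, and return the cost in us.

    Each block is a list of page states: "L" (live), "D" (dead), "F" (free).
    Raises StopIteration if no other block has a free page left.
    """
    copied = 0
    for state in blocks[victim]:
        if state != "L":
            continue
        # Hypothetical placement rule: first free page of the first block that has one.
        target = next(b for i, b in enumerate(blocks) if i != victim and "F" in b)
        target[target.index("F")] = "L"
        copied += 1
    blocks[victim][:] = ["F"] * len(blocks[victim])  # block erase
    return copied * (READ_US + WRITE_US) + ERASE_US


blocks = [list("LDDLDDLD"), list("LLDLLLFD"), list("LFLLLLDF"), list("FLLFLLFD")]
victim = pick_victim(blocks)
cost_us = recycle(blocks, victim)
print(victim, cost_us)  # three page copies plus one erase: 3 * 250 + 2000 = 2750 us
```

Under these assumptions the erase dominates the cost, which is why reducing the number of block erases is a recurring goal in the rest of the talk.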

Flash Memory Characteristics

Flash Memory Characteristics • Flash Translation Layer

Efficient B-Tree Index Structures over Flash Memory • To reduce the number of page writes • To reduce the number of block erases • To reduce the overhead of garbage collection


Efficient B-Tree Index Structures over Flash Memory • Index Units vs. B-Tree Nodes • (Figure: index units I1 to I6 carrying keys 20, 25, 85, 180, 185, and 250; Sector 1 and Sector 2; Flash Translation Layer (FTL))

Efficient B-Tree Index Structures over Flash Memory • The Commit Policy • Suppose that a sector can store up to 3 index units. • The technical issue is to minimize the number of sectors written to store the committed index units. • The problem of packing index units into sectors is NP-hard, by reduction from the Bin-Packing problem. • A FIRST-FIT approximation algorithm was adopted in our experiments. • (Figure: B-tree nodes D, F, H, I, J; index units I1 to I6 carrying keys 20, 25, 85, 180, 185, and 250; Sector 1 and Sector 2; Flash Translation Layer (FTL))
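A first-fit packing of index units into sectors might look like the sketch below. It is an illustration under stated assumptions rather than the algorithm used in the experiments: index units are grouped by the B-tree node they belong to, each group is treated as one item (split only if it exceeds a sector), and the assignment of keys to nodes in the example is hypothetical because the slide does not give it.

```python
from collections import defaultdict


def first_fit_pack(index_units, sector_capacity=3):
    """Pack (node, key) index units into sectors with a first-fit heuristic.

    Units of the same node are kept together where possible (an assumption),
    so that rebuilding a node touches few sectors.
    """
    groups = defaultdict(list)
    for node, key in index_units:
        groups[node].append((node, key))

    sectors = []
    for units in groups.values():
        # A group larger than one sector is split into sector-sized chunks.
        for start in range(0, len(units), sector_capacity):
            chunk = units[start:start + sector_capacity]
            for sector in sectors:
                if len(sector) + len(chunk) <= sector_capacity:
                    sector.extend(chunk)  # first sector with enough room
                    break
            else:
                sectors.append(list(chunk))  # no sector fits: open a new one
    return sectors


# Hypothetical node assignment for the six keys shown on the slide.
units = [("D", 20), ("D", 25), ("F", 85), ("H", 180), ("I", 185), ("J", 250)]
for number, sector in enumerate(first_fit_pack(units), start=1):
    print(f"Sector {number}: {sector}")
```

With a capacity of 3, the six units in this example fit into two sectors, matching the two sectors drawn on the slide.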

Efficient B-Tree Index Structures over Flash Memory • The Node Translation Table • (Figure: (a) the logical view of a B-tree with nodes A to I; (b) the node translation table, which keeps a list of sector addresses for each node, for example 23 and 100 for node C) • To construct node C, BFTL reads and parses the sectors whose addresses are 23 and 100. • To prevent the lists from becoming too long, they should be compacted when needed.
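The role of the node translation table can be sketched as a mapping from each node to the list of sectors that hold its index units. The code below is a simplified illustration: the function names and the keys stored in sectors 23 and 100 are hypothetical, index units are modeled as plain insertions, and compaction assumes that a whole node fits in one sector.

```python
sectors = {}     # sector address -> list of (node, key) index units
node_table = {}  # node -> list of sector addresses holding its index units


def commit(address, index_units):
    """Write one sector of index units and record it in the node translation table."""
    sectors[address] = list(index_units)
    for node in dict.fromkeys(n for n, _ in index_units):  # unique nodes, in order
        node_table.setdefault(node, []).append(address)


def construct_node(node):
    """Rebuild a node by reading and parsing every sector on its list."""
    keys = []
    for address in node_table.get(node, []):
        keys.extend(k for n, k in sectors[address] if n == node)
    return sorted(keys)


def compact(node, new_address):
    """Shorten a node's list by rewriting all of its index units into one sector."""
    sectors[new_address] = [(node, k) for k in construct_node(node)]
    node_table[node] = [new_address]


commit(23, [("A", 60), ("C", 34)])
commit(100, [("C", 53), ("D", 42)])
print(node_table["C"], construct_node("C"))  # list [23, 100]; keys read from both sectors
compact("C", 200)
print(node_table["C"], construct_node("C"))  # list [200]; same keys, one sector to read
```

A real implementation would also mark the stale copies in the old sectors as invalid; that bookkeeping is omitted here.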

Efficient B-Tree Index Structures over Flash Memory • Performance Evaluation • Experimental Setup • Flash-memory environments • A 4MB NAND flash memory. • The greedy block-recycling policy in garbage collection. • B-Tree environments • The fan-out of the B-Tree was 21. • Each B-Tree node fit in a sector (512 bytes). • Other parameters • The reservation buffer was configured to hold up to 60 records. • The bound on the lengths of the lists in the node translation table was set to 4.

Efficient B-Tree Index Structures over Flash Memory • (Figure: Average Response Time of Insertions after Inserting 30,000 Records) • Rs: controls the distribution of the values of the inserted keys (figure labels: 1, Sequential, 0, Random).

Efficient B-Tree Index Structures over Flash Memory • (Figures: Number of Pages Written, Number of Pages Read, and Number of Blocks Erased after Inserting 30,000 Records)

Efficient B-Tree Index Structures over Flash Memory • Summary • An efficient B-Tree implementation is proposed to resolve the conflict between page-based operations and intensive byte-wise updates. • Performance and reliability of flash memory could be greatly improved. • Result Publications • IEEE 9th International Conference on Real-Time and Embedded Computing Systems and Applications (IEEE RTCSA), 2003. • ACM Transactions on Embedded Computing Systems (ACM TECS), to appear.

Efficient Initialization and Crash Recovery for File Systems • YAFFS focuses on building a NAND-friendly file system. • Spare area: holds the file id and file offset of the page. • Spare area: setting the 5th byte to zero invalidates the page. • (Figure: a file stored as pages at offset 0, offset 1, offset 2, ..., offset n)

Efficient Initialization and Crash Recovery for File Systems • Motivation • To provide efficient initialization and fast crash recovery over flash-memory file systems (e.g., YAFFS). • (Figure label, spare area: ECC, file-id, file-offset, and other house-keeping information for the page)

Efficient Initialization and Crash Recovery for File Systems • (Figure components: Native Flash-Memory File Systems, File-System Meta-Data, Log-Record Manager (LRM), Logger, Flash Memory with Spare Areas and Check Regions) • Log records describe modifications to file systems. • The LRM buffers log records and reduces the number of invalid log records. • The Logger writes log records from the LRM to flash memory as check regions. • According to the most-recent check region, we can have efficient initialization and crash recovery.

Efficient Initialization and Crash Recovery for File Systems • A Log Record • A log record describes writes/updates to a contiguous segment of a file. • (file_id, start_offset, start_address, size, version) • If start_offset = 0, the content at start_address holds updates of the file attributes: • The access mode, the access time, uid, gid, and nlink. • (Figure: start_offset and size locate the segment in the file; start_address and size locate it in flash memory)

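Efficient Initialization and Crash Recovery for File Systems • An Example of a Delete and a Merge • (Figure labels: Delete occurs, Merge occurs) • A merge applies to log records $d_8$ and $d_{12}$ when $d_8.\mathit{file\_id} = d_{12}.\mathit{file\_id}$, $d_8.\mathit{start\_offset} + d_8.\mathit{size} = d_{12}.\mathit{start\_offset}$, and $d_8.\mathit{start\_address} + d_8.\mathit{size} = d_{12}.\mathit{start\_address}$.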
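The merge test in this example (same file, adjacent in the file, adjacent in flash) can be sketched over the five-field log record introduced earlier. This is an illustration only: the function names are hypothetical, the delete rule (a buffered record is dropped when a newer record overwrites its whole file range) and the version handling are assumptions, and offsets, addresses, and sizes are all counted in the same unit.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    file_id: int
    start_offset: int
    start_address: int
    size: int
    version: int


def can_merge(a, b):
    """b continues a: same file, contiguous in the file and in flash memory."""
    return (a.file_id == b.file_id
            and a.start_offset + a.size == b.start_offset
            and a.start_address + a.size == b.start_address)


def covers(new, old):
    """Assumed delete rule: new overwrites the whole file range described by old."""
    return (new.file_id == old.file_id
            and new.start_offset <= old.start_offset
            and old.start_offset + old.size <= new.start_offset + new.size)


def add_record(buffer, new):
    """Return the buffer after applying delete and merge for a new log record."""
    kept = [r for r in buffer if not covers(new, r)]  # delete invalidated records
    for i, r in enumerate(kept):
        if can_merge(r, new):  # merge into the record that new continues
            kept[i] = LogRecord(r.file_id, r.start_offset, r.start_address,
                                r.size + new.size, max(r.version, new.version))
            return kept
    kept.append(new)
    return kept


buffer = add_record([], LogRecord(1, 1, 100, 4, 1))
buffer = add_record(buffer, LogRecord(1, 5, 104, 2, 1))  # merged into one record
buffer = add_record(buffer, LogRecord(1, 1, 200, 6, 2))  # overwrites and deletes it
print(buffer)
```

Both operations keep the buffer short, which is what lets the LRM reduce the number of invalid log records written into check regions.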

Efficient Initialization and Crash Recovery for File Systems • An Example of the Check Region • (Figure: blocks holding log-segment directories and log segments of log records. The directory with Version = 1 lists the addresses of Log Segment 1, Log Segment 2, ...; the directory with Version = 2 lists the addresses of Log Segment 3, Log Segment 4, ...)

Efficient Initialization and Crash Recovery for File Systems Efficient Crash Recovery – Adopt Block Skipping

Efficient Initialization and Crash Recovery for File Systems • Performance Evaluation • Experimental Setup • Flash-Memory Environments • The file system (YAFFS) ran over a 1GB NAND flash memory. • The block size, the page size, and the size of the spare area of each page were 16KB, 512B, and 16B, respectively. • The access times were about 156us and 30us for reading the user area and the spare area of each page, respectively.

Efficient Initialization and Crash Recovery for File Systems • Performance Evaluation • Experimental Setup • Access Pattern • 400MB of data were written to 100 files. • The average size of each modification to a file was 10KB, and 80% of the 400MB were written to 20% of the 100 files (i.e., 80-20 locality). • Other parameters • Modifications and updates to the files were controlled by the parameter append ratio (AR). • The parameter buffer size (BS) controlled the maximum number of log records that could be held in the buffer of the LRM.

Efficient Initialization and Crash Recovery for File Systems • Performance Evaluation – Different Append Ratios • For comparison, the measured time was roughly 5,258ms if the LRM did not adopt the merge and delete operations.

Efficient Initialization and Crash Recovery for File Systems Performance Evaluation – Different Buffer Sizes

Efficient Initialization and Crash Recovery for File Systems • Performance Evaluation – Crash Recovery • It took roughly 13,096ms for the original YAFFS to mount a dirty file system for the same set of experiments.

Efficient Initialization and Crash Recovery for File Systems • Summary • We propose a method for efficient initialization and crash recovery for flash-memory file systems. • During initialization or crash recovery, the house-keeping data structure of a flash-memory file system is efficiently reconstructed based on check regions. • Result Publications • In Proceedings of the ACM Symposium on Applied Computing (ACM SAC), April 23-27, 2006. • ACM Transactions on Storage (ACM TOS), to appear.

Memory Space Requirements – FTL • FTL needs a large memory space for the address translation table. • FTL adopts a page-level address translation mechanism. • For example, a 256MB NAND flash with a page size of 512 bytes needs 524,288 (256*1024*1024/512) entries. • Assuming that an entry needs 4 bytes, the address translation information of FTL requires 2,048KB of memory space.

System Architecture – NFTL • A logical address is divided into • a virtual block address (VBA) • a block offset. • Example: write data to LBA = 1011, with 8 pages per block • VBA = 1011 / 8 = 126 • block offset = 1011 % 8 = 3 • The NFTL address translation table (in main memory) maps VBA = 126 to the pair (9, 23): primary block address = 9 and replacement block address = 23. • The data are written to the page at block offset 3 of the primary block; if that page has been used, they are written to the first free page of the replacement block. • (Figure: pages of the primary block and the replacement block marked Free or Used)
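The write path on this slide can be sketched as follows. The class and function names are hypothetical and the flash blocks are modeled as plain Python lists; the sketch only illustrates the address split and the rule of falling back to the first free page of the replacement block.

```python
PAGES_PER_BLOCK = 8  # as in the slide's example


class BlockPair:
    """One entry of the translation table: a primary and a replacement block."""

    def __init__(self, primary_address, replacement_address):
        self.primary_address = primary_address
        self.replacement_address = replacement_address
        self.primary = [None] * PAGES_PER_BLOCK  # page i serves block offset i
        self.replacement = []                    # (block offset, data), in write order


def split_lba(lba):
    """Divide a logical address into (virtual block address, block offset)."""
    return divmod(lba, PAGES_PER_BLOCK)


def write(table, lba, data):
    """Return (physical block, page) where the data were written."""
    vba, offset = split_lba(lba)
    pair = table[vba]
    if pair.primary[offset] is None:  # the page in the primary block is free
        pair.primary[offset] = data
        return pair.primary_address, offset
    if len(pair.replacement) == PAGES_PER_BLOCK:
        raise RuntimeError("replacement block is full; garbage collection is needed")
    pair.replacement.append((offset, data))  # first free page of the replacement block
    return pair.replacement_address, len(pair.replacement) - 1


table = {126: BlockPair(primary_address=9, replacement_address=23)}
print(split_lba(1011))            # (126, 3)
print(write(table, 1011, "v1"))   # (9, 3): page 3 of the primary block
print(write(table, 1011, "v2"))   # (23, 0): first free page of the replacement block
```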

Memory Space Requirements - NFTL • NFTL does not need a large memory space. • NFTL adopts a block-level address translation mechanism. • Assume that a block consists of 32 pages of 512 bytes each (16KB per block). • For a 256MB NAND flash, NFTL would need 128KB of memory space to store 16,384 (256*1024/16) entries, that is, 8 bytes per entry.
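The two table sizes can be checked with a few lines of arithmetic. The helper name is hypothetical, and the 8-byte NFTL entry is inferred from the slide's own figures (128KB for 16,384 entries) rather than stated in the source.

```python
def table_size(flash_bytes, unit_bytes, entry_bytes):
    """Return (number of entries, table size in KB) for one mapping granularity."""
    entries = flash_bytes // unit_bytes
    return entries, entries * entry_bytes // 1024


FLASH = 256 * 1024 * 1024  # 256MB NAND flash
PAGE = 512                 # bytes per page
BLOCK = 32 * PAGE          # 32 pages per block

print(table_size(FLASH, PAGE, 4))   # FTL, page-level:   (524288, 2048)
print(table_size(FLASH, BLOCK, 8))  # NFTL, block-level: (16384, 128)
```

The block-level table is 16 times smaller here, which is the memory saving that NFTL trades against slower address translation.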

Address Translation Time - NFTL • The address translation performance of read and write requests can deteriorate because of linear searches for physical addresses. • Assume that each block contains 8 pages. • Let LBAs A, B, C, D, and E be written 5, 5, 1, 1, and 1 times, respectively. Their data distribution could be as shown in the figure. • For example, it might be necessary to scan 9 spare areas for LBA E.
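The count of 9 spare areas can be reproduced with a small simulation. The sketch assumes one particular write order and one particular search order (the replacement block is scanned from its newest page backwards, then the primary block page is checked); a real NFTL may scan differently, and the function names are hypothetical.

```python
PAGES_PER_BLOCK = 8


def place_writes(write_sequence):
    """Replay writes to block offsets; return (primary used flags, replacement offsets)."""
    primary_used = [False] * PAGES_PER_BLOCK
    replacement = []  # block offsets recorded in spare areas, in write order
    for offset in write_sequence:
        if not primary_used[offset]:
            primary_used[offset] = True
        else:
            replacement.append(offset)
    return primary_used, replacement


def spare_areas_scanned(replacement, offset):
    """Number of spare areas read to locate the newest copy of a block offset."""
    scanned = 0
    for stored in reversed(replacement):  # newest page of the replacement block first
        scanned += 1
        if stored == offset:
            return scanned
    return scanned + 1  # not in the replacement block: check the primary block page


A, B, C, D, E = range(5)
writes = [A, B, C, D, E] + [A, B] * 4  # A and B written 5 times; C, D, E once
_, replacement = place_writes(writes)
print(spare_areas_scanned(replacement, E))  # 8 replacement pages + 1 primary page = 9
```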

Garbage Collection Overhead - NFTL 1. Copy the most-recent content to the new primary block. 2. Erase the old primary block and the replacement block. 3. The overhead is 2 block erases and 5 page writes.
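The overhead of 2 block erases and 5 page writes follows from merging the primary block and the replacement block of the earlier example (A and B written 5 times; C, D, and E once). A minimal sketch, with a hypothetical function name and blocks modeled as Python lists:

```python
def collect(primary, replacement):
    """Merge a primary/replacement pair into a new primary block.

    primary: list with the data (or None) stored at each block offset.
    replacement: list of (block offset, data) pairs in write order.
    Returns (new primary block, page writes, block erases).
    """
    new_primary = list(primary)
    for offset, data in replacement:  # later writes supersede earlier ones
        new_primary[offset] = data
    page_writes = sum(1 for data in new_primary if data is not None)
    block_erases = 2                  # old primary block and replacement block
    return new_primary, page_writes, block_erases


primary = ["A1", "B1", "C1", "D1", "E1", None, None, None]
replacement = [(0, "A2"), (1, "B2"), (0, "A3"), (1, "B3"),
               (0, "A4"), (1, "B4"), (0, "A5"), (1, "B5")]
print(collect(primary, replacement))
# (['A5', 'B5', 'C1', 'D1', 'E1', None, None, None], 5, 2)
```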

An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems • Motivation • An adaptive two-level management design of a flash translation layer, called AFTL. • Exploit the advantages of both the fine-grained address mechanism and the coarse-grained address mechanism.

AFTL – Coarse-to-Fine Switching 1. AFTL doesn’t erase the two blocks (the primary block and the replacement block) immediately. 2. AFTL moves the mapping information of the replacement block to the fine-grained hash table by adding fine-grained slots. 3. The RPBA field of the corresponding mapping information is nullified. • (Figure labels: FTL, NFTL)

AFTL – Fine-to-Coarse Switching • The number of the fine-grained slots is limited. • Some least recently used mapping information of fine-grained slots should be moved to the coarse-grained hash table. 1. Assume that the two fine-grained slots shown in the figure are to be replaced. 2. Data stored in the pages with the given (physical) addresses are copied to the primary or replacement block of the corresponding coarse-grained slot, as defined by NFTL. 3. If there does not exist any corresponding coarse-grained slot, a new one is created. • (Figure labels: FTL, NFTL)

AFTL – Fine-to-Coarse Switching • Coarse-to-fine switches would introduce fine-to-coarse switches and overhead in valid page copying. • This is because the number of the fine-grained slots is limited. • Stop any coarse-to-fine switch when some frequency bound on coarse-to-fine switches is reached. • We set a parameter in the experiments to control the frequency of switches.
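The interplay between the two kinds of switches can be sketched with an LRU-ordered fine-grained table. This is a simplified illustration, not AFTL itself: the class and method names are hypothetical, the frequency bound is modeled as a plain budget on the number of coarse-to-fine switches, and evicted mappings are only collected in a list where a real system would copy the pages back into a primary or replacement block.

```python
from collections import OrderedDict


class AFTLSketch:
    def __init__(self, max_fine_slots, max_switches):
        self.fine = OrderedDict()  # LBA -> physical page address, least recent first
        self.max_fine_slots = max_fine_slots
        self.max_switches = max_switches
        self.switches = 0
        self.moved_to_coarse = []  # mappings evicted by fine-to-coarse switches

    def lookup(self, lba):
        """Fine-grained lookup; None means the coarse-grained table must be searched."""
        if lba not in self.fine:
            return None
        self.fine.move_to_end(lba)  # mark as most recently used
        return self.fine[lba]

    def coarse_to_fine(self, replacement_mappings):
        """Move the mappings of a replacement block into fine-grained slots.

        Returns False when the switch budget is exhausted.
        """
        if self.switches >= self.max_switches:
            return False
        self.switches += 1
        for lba, address in replacement_mappings.items():
            self.fine[lba] = address
            self.fine.move_to_end(lba)
        while len(self.fine) > self.max_fine_slots:  # fine-to-coarse switch
            self.moved_to_coarse.append(self.fine.popitem(last=False))
        return True


aftl = AFTLSketch(max_fine_slots=3, max_switches=2)
aftl.coarse_to_fine({10: 100, 11: 101})
aftl.lookup(10)                          # LBA 11 becomes the least recently used
aftl.coarse_to_fine({20: 200, 21: 201})  # table overflows, LBA 11 is evicted
print(list(aftl.fine), aftl.moved_to_coarse, aftl.coarse_to_fine({30: 300}))
```

The last call returns False because the budget of two switches is used up, which mirrors the rule of stopping coarse-to-fine switches once the bound is reached.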

The Advantages of AFTL • Improve the address translation performance. • This is because the mapping information of replacement blocks is moved to the fine-grained hash table. • Reduce the garbage collection overhead. • The delayed recycling of any replacement block reduces the potential number of valid data copies and blocks erased. • (Figure labels: FTL, NFTL)

Performance Evaluation • Performance Setup • The experiment traces were collected over a 20GB disk.

Efficient Data Access Management over Flash-Memory Storage Systems