Donghui Zhang CCIS, Northeastern University ccs.neu/home/donghui/research/neustore/

NEUStore: A Simple Java Package for the Construction of Disk-based, Paginated, and Buffered Indices Donghui Zhang CCIS, Northeastern University http://www.ccs.neu.edu/home/donghui/research/neustore/

A DB index should be stored as a disk file. Let’s check what Java.io.RandomAccessFile provides…

java.io.RandomAccessFile • Views a file as a sequential list of bytes • Has operators seek, read, write // In the file “./mydata”, copy bytes 10-19 to 0-9. RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”); byte[] buf = new byte[10]; file.seek(10); file.read(buf); file.seek(0); file.write(buf);

java.io.RandomAccessFile • Views a file as a sequential list of bytes buf: 10 bytes in memory // In the file “./mydata”, copy bytes 10-19 to 0-9. RandomAccessFile file = new RandomAccessFile(“mydata”, “rw”); byte[] buf = new byte[10]; file.seek(10); file.read(buf); file.seek(0); file.write(buf);

There is a gap between what Java.io.RandomAccessFile provides and what a DB index implementation needs.

Consider implementing a B+-tree class BPlusTree { private RandomAccessFile file; private DBBuffer buffer; private int rootPageID; public BPlusTree(String filename, DBBuffer buffer); public void Insert(Key, Data); public void Delete(Key); public Data Search(Key); }

Consider implementing a B+-tree class BPlusTree { …… public BPlusTree(String filename, DBBuffer buffer) { file = new RandomAccessFile(filename, “rw”); this.buffer = buffer; rootPageID = …… } …… }

public void Insert(Key, Data) { // Find the leaf page where the record should be inserted, // pinning all pages along the path. int pageID = rootPageID; DBPage page = buffer.readPage(file, rootPageID); while ( page.nodeType != LEAF ) { buffer.pinPage(file, pageID); pageID = ((BPlusTreeIndexPage)page).Search(Key); page = buffer.readPage(file, pageID); } buffer.pinPage(file, pageID); // Insert the record into the leaf page ((BPlusTreeLeafPage)page).Insert(Key, Data); // Handle Overflow… }

public void Insert(Key, Data) { // Continued …………………… if (page.isOverflow()) { int newPageID = allocate(); // in Delete, freePage() DBPage newPage = new DBPage(newPageID); Move half of the records from page to newPage. Read the right sibling and fix the sibling pointers. buffer.writePage(file, pageID, page); buffer.unpin(file, pageID); buffer.writePage(file, newPageID, newPage); buffer.writePage for the right sibling page. Recursive split if needed. } Unpin all pages. }

1 2 3 4 5 A DB index… • Uses a random access disk file to store data. • Views a file as a sequential list of pages. • Needs to support allocate and freePage. • Needs to manage the empty pages in the middle. • Needs to utilize a buffer… 0 6 7

1 1 4 5 5 4 3 2 2 3 5 4 3 2 1 file1 0 6 7 file2 0 6 7 file3 0 6 7 A set of DB indices readPage writePage A DB buffer file1, page0 file2, page3

Summary of the gap • Java.io.RandomAccessFile provides: seek(), read(), write() • Implementing a DB index needs: allocate(), freePage(), buffer.readPage(), buffer.writePage()

NEUStore fills the gap! • class DBIndex { allocate() freePage() } • class DBBuffer{ readPage() writePage() }

Download and setup • See the PDF file! • Download the .tgz file and unzip. A neustoreroot directory will be created. • Use Eclipse to create a new project. • Use javadoc. • Smartly apply certain changes due to newer version of Java and Eclipse.

neustoreroot has 3 directories • doc: the document generated using Javadoc. • neustore: the storage package. It contains: • base: the base of the storage package. It contains definitions of core classes DBIndex, DBBuffer, DBPage, and so on. • heapfile: a generic implementation of the heapfile. Here the class HeapFile was derived from DBIndex. • test: sample test programs. It contains: • naiveheapfile: implementation of a naive heapfile, and the test program for it. • heapile: test program for using neustore.heapfile.

Packages and classes

class DBPage • Abstract base class for a memory-version page. • E.g. derive BPlusTreeIndexPage from it. • Constructor: DBPage(int nodeType, int pageSize); creates a DBPage. nodeType -1 and 0 are preserved. 1 and above can be user-defined. E.g. nodeType 1 means BPlusTreeIndexPage and nodeType 2 means BPlusTreeLeafPage.

class DBPage • Convert to/from disk-version page (byte[]): • abstract void read( byte[] page ); Reads the content of this DBPage from a byte array. • abstract void write( byte[] page ); Writes the content of this DBPage to a byte array. • Example usage of neustore.base.ByteArray: byte[] b = …… ByteArray ba = new ByteArray( b, ByteArray.READ ); int firstInt = ba.readInt( ); int secondInt = ba.readInt( );

class DBBuffer • Constructor: DBBuffer(int bufferSize, int pageSize); bufferSize: # pages pageSize: # bytes/page • Some useful functions: • pin() • unpin() • clearIOs() • getIOs()

DBBuffer.readPage(file, pageID) • What should the return type be? • DBPage -- problematic to implement: DBPage readPage(RandomAccessFile file, int pageID) { if (file, pageID) exists in the buffer return the corresponding DBPage object; else { choose a buffered page to throw away; read from the file pageSize bytes to a byte[]; convert to a DBPage and return; } } ?? but how??

DBBuffer.readPage(file, pageID) • A DBBuffer is general. It buffers pages from different indices. • So whenever a page is physically read from disk, it should be kept as a byte[], period. • What if store byte[] in DBBuffer? • Works for C++: cast (char*) to (MyDBPage*) • Extremely inefficient for Java!! Every time to access a page, needs to convert from byte[].

Solution: allow storing in both types! • When physically read, store as byte[]. Later when the application calls writePage(file, pageID, DBPage dbpage), store as DBPage. • return type of readPage should be: public class DBBufferReturnElement { public boolean parsed; public Object object; public int nodeType; } parsed means the object is a DBPage; and non-parsed means the object is a byte[].

DBBuffer’s abstract functions • add(): to add a new page into buffer; • find(): to find a buffered page; • flush(): to empty the buffer while writing dirty pages to disk. • A derived class should implement them. • neustore.base includes a derived class: LRUBuffer. But the existing implementation did not use hashing.

DBIndex • Constructor: DBIndex(DBBuffer buffer, String filename, boolean isCreate) Create or open a DBIndex. • int allocate() • void freePage(int pageID) • void close(): // should be called before terminating the application to write head page and flush buffer.

DBIndex • The head page (having pageID=0) uses the first OVERHEAD=16 bytes to store four integers: • -1 : in every disk page, the first four bytes is nodeType. Type -1 means head page. Type 0 means empty page. • pagesize • numOfPages: number of non-empty pages • free: pageID of the first empty page. All empty pages are linked together. • The rest (pageSize – OVERHEAD) bytes store user-specified meta data, e.g. rootPageID.

Meta data in DBIndex head page • To manage the meta data, a derived class needs to implement the three abstract functions: • initIndexHead(): called when the index is built; e.g. set rootPageID = allocate(); • readIndexHead(): called when the index is opened; e.g. read existing rootPageID; • writeIndexHead(): called right before the index is closed; e.g. write the updated rootPageID.

Case study 1: naïve heap file • Simplified requirement: • Insert = add an integer at the end of the file. • Search = tell whether an integer exists. • Three files in neustoreroot/test/naiveheapfile: • NaiveHeapFilePage.java: derive from DBPage • NaiveHeapFile.java: derive from DBIndex • TestNaiveHeapFile.java: contains main()

NaiveHeapFilePage.java: derive from DBPage • Needs to implement abstract functions: read() and write() • NaiveHeapFile.java: derive from DBIndex • TestNaiveHeapFile.java: contains main()

public class NaiveHeapFilePage extends DBPage { private Vector<Integer> records; public NaiveHeapFilePage( int _pageSize ) { super(1, _pageSize); records = new Vector<Integer>(); } public int numRecs() { return records.size(); } public void insert( int key ) { records.add( new Integer(key) ); } public boolean search( int key ) {……} protected void read( byte[] b ) {……} protected void write( byte[] b ) {……} }

public class NaiveHeapFilePage extends DBPage { …… public boolean search( int key ) { for ( Integer e: records ) { if (e.intValue() == key) return true; } return false; } protected void read( byte[] b ) {……} protected void write( byte[] b ) {……} }

public class NaiveHeapFilePage extends DBPage { …… protected void read( byte[] b ) { ByteArray ba = new ByteArray(b, ByteArray.READ); ba.readInt(); // nodeType, ignore int numRecs = ba.readInt(); records.removeAllElements(); for ( int i=0; i<numRecs; i++ ) { int key = ba.readInt(); records.add(new Integer(key)); } } protected void write( byte[] b ) {……} }

public class NaiveHeapFilePage extends DBPage { …… protected void write( byte[] b ) { ByteArray ba=new ByteArray(b, ByteArray.WRITE); int numRecs = records.size(); ba.writeInt( nodeType ); ba.writeInt( numRecs ); for ( Integer key: records ) { ba.writeInt( key.intValue() ); } } }

NaiveHeapFilePage.java: derive from DBPage • NaiveHeapFile.java: derive from DBIndex • Needs to implement abstract functions initIndexHead(), readIndexHead() and writeIndexHead(). • TestNaiveHeapFile.java: contains main()

public class NaiveHeapFile extends DBIndex { private int pageCapacity; // max # recs in a page private int lastPageID; // ID for last page private NaiveHeapFilePage lastPage; public NaiveHeapFile(DBBuffer buffer, String filename, boolean isCreate) {……} protected void initIndexHead() {……} protected void readIndexHead(byte[] head) {……} protected void writeIndexHead(byte[] head) {……} public void insert( int key ) {……} public boolean search( int key ) {……} protected NaiveHeapFilePage myReadPage( int pageID ) {……} }

public class NaiveHeapFile extends DBIndex { …… public NaiveHeapFile(DBBuffer _buffer, String filename, boolean isCreate) { super(_buffer, filename, isCreate); pageCapacity = (pageSize-8)/4; // nodeType & numRecs if(isCreate) lastPage = new NaiveHeapFilePage(pageSize); else lastPage = myReadPage(lastPageID); } …… }

public class NaiveHeapFile extends DBIndex { …… protected void initIndexHead() { lastPageID = allocate(); } protected void readIndexHead(byte[] head) { ByteArray ba = new ByteArray(head, ByteArray.READ); lastPageID = ba.readInt(); } protected void writeIndexHead(byte[] head) { ByteArray ba = new ByteArray(head, ByteArray.WRITE); ba.writeInt(lastPageID); } …… }

public class NaiveHeapFile extends DBIndex { …… public void insert( int key ) { if ( lastPage.numRecs() >= pageCapacity ) { lastPageID = allocate(); lastPage = new NaiveHeapFilePage(pageSize); } lastPage.insert( key ); buffer.writePage(file, lastPageID, lastPage); } …… }

public class NaiveHeapFile extends DBIndex { …… public boolean search( int key ) { for ( int currentPageID=1; currentPageID<=lastPageID; currentPageID++ ) { NaiveHeapFilePage currentPage = myReadPage(currentPageID); if ( currentPage.search(key) ) return true; } return false; } …… }

public class NaiveHeapFile extends DBIndex { …… protected NaiveHeapFilePage myReadPage( int pageID ) { NaiveHeapFilePage page; DBBufferReturnElement ret = buffer.readPage(file, pageID); if ( ret.parsed ) { page = (NaiveHeapFilePage)ret.object; } else { page = new NaiveHeapFilePage(pageSize); page.read((byte[])ret.object); } return page; } }

NaiveHeapFilePage.java: derive from DBPage • NaiveHeapFile.java: derive from DBIndex • TestNaiveHeapFile.java: contains main()

public class TestNaiveHeapFile { public static void main(String[]args) { System.out.println( "***** Creating index *****" ); LRUBuffer buffer = null; buffer = new LRUBuffer( 5, 20 ); NaiveHeapFile hp = new NaiveHeapFile(buffer, filename, DBIndex.CREATE); System.out.println( "***** Inserting keys *****"); hp.insert(10); System.out.println( "***** Searching keys *****" ); hp.search(10); System.out.println( “***** Closing index ******” ); hp.close(); } }

Limitations of the naïve heap file • A record has no data (only key). • No deletion is allowed. • Key type is fixed (as integer).

Naïve heap file  neustore.heapfile • A record has no data (only key). Solution: define a private class HeapFileRecord, inside class HeapFilePage, which contains a key and a data. 2. No deletion is allowed. Solution: allow deletion. To enable leaving space in the middle, maintain a double linked list of full pages and a separate double linked list of non-full pages. • Key type is fixed (as integer). ?? How to implement a generic heapfile, which can be used for any type of key (and data)?

Generic key (data) • Define two interfaces Key and Data. Use them as key and data types to HeapFile (and B+-tree you are about to implement). • Two important member functions of both Key and Data are: • public abstract void read(DataInputStream in): Reads the content of the Key object from an input stream; Example use: to convert a byte[] to a HeapFilePage, read the number of records in the page, and then repeatedly generate a Key object (and a Data object) and call its read function. • public abstract void write(DataOutputStream out): Writes the content of the object to an output stream. • neustore.base implements IntKey and StringData.

A challenge in generic key (data) • In the implementation of the (generic) heap file, we need to create a Key object and call its read() member function. However, Key is an interface and therefore Key key = new Key() does not work!! • Solution: require the constructor of HeapFile (and also HeapFilePage) to take as input a sample key and a sample data. • When a new Key object is needed: sampleKey.clone();

Summary

Donghui Zhang CCIS, Northeastern University ccs.neu/home/donghui/research/neustore/