Secondary Indexing in Phoenix

SF HBase User Group – September 26, 2013
James Taylor, Phoenix Lead, Software Engineer
Jesse Yates, HBase Committer, Software Engineer

Agenda
• About
• Indexes In Phoenix
• Immutable Indexes
• Mutable Indexes
• Demo!
• Roadmap
SF HUG – Sept 2013

Phoenix
• Open Source
  • https://github.com/forcedotcom/phoenix
• “SQL-skin” on HBase
  • Everyone knows SQL!
• JDBC Driver
  • Plug-and-play
• Faster than HBase
  • in some cases

Secondary Indexes
• Sort on ‘orthogonal’ axis
• Save full-table scan
• Expected database feature
• Hard in HBase because of ACID considerations


Indexes In Phoenix
• Creating an index
  • DDL statement
  • Creates another HBase table behind the scenes
• Deciding when an index is used
  • Transparent to the user
  • (but user can override through hint)
  • No stats yet
• Knowing which table was used
  • EXPLAIN <query>

Creating Indexes In Phoenix
• CREATE INDEX <index_name>
    ON <table_name>(<columns_to_index>…)
    INCLUDE (<columns_to_cover>…);
• Optionally add IMMUTABLE_ROWS=true property to CREATE TABLE statement

Creating Indexes In Phoenix
    CREATE TABLE baby_names(
        name VARCHAR PRIMARY KEY,
        occurrences BIGINT);
    CREATE INDEX baby_names_idx
        ON baby_names(occurrences DESC, name);
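
As a rough illustration of what the index table holds, the sketch below models the baby_names rows as a Python dict and derives index rows ordered like (occurrences DESC, name). It is not Phoenix code: the function name build_index_rows and the sample rows are made up for this example, and descending order is modeled by negating the value in the sort key, whereas Phoenix handles DESC in its own row key encoding.

```python
def build_index_rows(data_rows):
    """Return index rows ordered like (occurrences DESC, name)."""
    # data_rows maps the primary key (name) to the occurrences column.
    index_rows = [(occ, name) for name, occ in data_rows.items()]
    # Negating occurrences gives descending order; name breaks ties ascending.
    index_rows.sort(key=lambda row: (-row[0], row[1]))
    return index_rows


# Hypothetical sample data, not taken from the talk.
data = {"emma": 120, "liam": 300, "ava": 120, "noah": 95}
index = build_index_rows(data)
top2 = index[:2]  # the "ORDER BY occurrences DESC LIMIT 2" rows, with no sort at read time
```

Because the rows are already stored in the desired order, reading the top N is just reading the first N index rows.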

Deciding When To Use
• Transparent to the user
• Query optimizer does the following:
  • Compiles query against data and index tables
  • Chooses “best” one (not yet stats driven)
    • Can index even be used?
      • Index is active and the query uses only columns contained in the index (no join back to data table)
    • Can ORDER BY be removed?
    • Which plan forms the longest start/stop scan key?
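
The checks listed above can be sketched as a small selection function. This is only a sketch under stated assumptions, not the Phoenix optimizer: the Plan fields and choose_plan are hypothetical names, and the relative priority of "ORDER BY removed" over "longest scan key" is assumed from the order of the bullets.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    table: str
    active: bool            # index state; always True for the data table
    columns: frozenset      # columns readable from this table alone
    order_by_removed: bool  # True if the table's sort order satisfies ORDER BY
    scan_key_length: int    # number of leading key columns bound by the filter


def choose_plan(plans, needed_columns):
    """Pick a plan using the slide's checks (priority order is an assumption)."""
    # An index is usable only if it is active and covers every needed column.
    usable = [p for p in plans if p.active and needed_columns <= p.columns]
    # Prefer plans with no ORDER BY step, then the longest scan key.
    # On a tie, max() keeps the earliest plan, so list the data table first.
    return max(usable, key=lambda p: (p.order_by_removed, p.scan_key_length))


cols = frozenset({"name", "occurrences"})
plans = [
    Plan("BABY_NAMES", True, cols, order_by_removed=False, scan_key_length=0),
    Plan("BABY_NAMES_IDX", True, cols, order_by_removed=True, scan_key_length=0),
]
chosen = choose_plan(plans, cols)
```

The data table always covers every column, so the list of usable plans is never empty as long as it is included.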

Deciding When To Use
    SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;
is run against the index table as
    SELECT name, occurrences FROM baby_names_idx LIMIT 10;
ORDER BY is not necessary since rows in the index table are already ordered this way.

Deciding When To Use
    SELECT name, occurrences FROM baby_names WHERE occurrences > 100;
is run against the index table as
    SELECT name, occurrences FROM baby_names_idx WHERE occurrences > 100;
Uses the index, since we can form the start row for the scan based on the filter on occurrences.
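
To make the range-scan idea concrete, here is a minimal sketch that treats the index as a sorted list of keys and turns the filter into a bound on a contiguous key range. The key layout (negated occurrences, then name) and the function name are assumptions for illustration only; with this descending layout the filter happens to bound the end of the range, but the principle is the same: only matching rows are read.

```python
import bisect

# Index keys sorted by (-occurrences, name), mirroring (occurrences DESC, name).
# Sample rows are hypothetical.
index_keys = [(-300, "liam"), (-120, "ava"), (-120, "emma"), (-95, "noah")]


def scan_occurrences_greater_than(keys, threshold):
    """Return (name, occurrences) for rows with occurrences > threshold."""
    # occurrences > threshold  <=>  -occurrences < -threshold, so matching rows
    # form a prefix of the sorted keys ending at the first key >= (-threshold,).
    stop = bisect.bisect_left(keys, (-threshold,))
    return [(name, -neg_occ) for neg_occ, name in keys[:stop]]


matches = scan_occurrences_greater_than(index_keys, 100)
```

A binary search finds the range boundary, so no row outside the range is examined; the same query against the data table, which is keyed by name, would have to check every row.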

Deciding When To Use
• Override the optimizer by telling it not to use any indexes:
    SELECT /*+ NO_INDEX */ name FROM baby_names WHERE occurrences > 100;
• Tell the optimizer the priority in which it should consider using indexes:
    SELECT /*+ INDEX(baby_names baby_names_idx other_baby_names_idx) */ name, occurrences
    FROM baby_names WHERE occurrences > 100;

Knowing which table was used
    EXPLAIN SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;
    CLIENT PARALLEL 1-WAY FULL SCAN OVER BABY_NAMES_IDX
        SERVER FILTER BY PageFilter 10
    CLIENT 10 ROW LIMIT

Immutable Indexes
• Immutable Rows
• Much easier to implement
• Client-managed
• Bulk-loadable

Mutable Indexes
• Global Index
• Change row state
  • Common use-case
  • “expected” implementation
• Covered Columns/Join Index

1.5 years*

Internals
• Index Management
  • Build index updates
  • Ensures index is ‘cleaned up’
• Recovery Mechanism
  • Ensures index updates are “ACID”

“There is no magic” - Every programming hipster (chipster)

Mutable Indexing: Standard Write Path
[Diagram. Labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

Mutable Indexing
[Diagram. Labels: Indexer, Region Coprocessor Host, Builder, Codec, WAL Updater, WAL (“Durable!”), Index Table]

Index Management
    public interface IndexBuilder {
      public void setup(RegionCoprocessorEnvironment env);
      public Map<Mutation, String> getIndexUpdate(Put put);
      public Map<Mutation, String> getIndexUpdate(Delete delete);
    }
• Lives within a RegionCoprocessorObserver
• Access to the local HRegion
• Specifies the mutations to apply to the index tables

Why not write my own?
• Managing Cleanup
  • Efficient point-in-time correctness
• Performance tricks
  • Abstract access to HRegion
  • Minimal network hops
• Sorting correctness
  • Phoenix typing ensures correct index sorting

Example: Managing Cleanup
• Updates can arrive out of order
  • Client-managed timestamps

Example: Managing Cleanup
[Diagram. Label: Index Table]


Managing Cleanup
• History “roll up”
• Out-of-order Updates
• Point-in-time correctness
• Multiple Timestamps per Mutation
• Delete vs. DeleteColumn vs. DeleteFamily
Surprisingly hard!
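
To show why out-of-order updates complicate cleanup, the sketch below computes the index mutations for one incoming write against a row's existing version history. It is a deliberately simplified model under loud assumptions, not the Phoenix implementation: a single indexed column, plain puts only (no Delete, DeleteColumn or DeleteFamily), and a hypothetical function name and tuple layout.

```python
def index_updates_for_put(history, pk, ts, value):
    """Return index mutations for writing `value` at timestamp `ts`.

    history maps timestamp -> indexed value already stored for row `pk`.
    Each mutation is (op, index_key, timestamp), with index_key = (value, pk).
    """
    updates = [("put", (value, pk), ts)]
    older = [t for t in history if t < ts]
    newer = [t for t in history if t > ts]
    if older:
        # The value visible just before ts stops being current at ts.
        prev = history[max(older)]
        if prev != value:
            updates.append(("delete", (prev, pk), ts))
    if newer:
        # Out-of-order arrival: a later write already exists, so the new
        # index entry must stop being visible at that later timestamp.
        nxt = min(newer)
        if history[nxt] != value:
            updates.append(("delete", (value, pk), nxt))
    return updates


# A write at ts=20 arrives after writes at ts=10 and ts=30 were already applied.
late = index_updates_for_put({10: "a", 30: "c"}, "row1", 20, "b")
```

In the in-order case only the first delete is needed; the second delete exists purely so that a read at any timestamp sees exactly one index entry for the row.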

Phoenix Index Builder
    public interface IndexCodec {
      public void initialize(RegionCoprocessorEnvironment env);
      public Iterable<IndexUpdate> getIndexDeletes(TableState state);
      public Iterable<IndexUpdate> getIndexUpserts(TableState state);
    }
• Much simpler than full index management
• Hides cleanup considerations
• Abstracted access to local state

Phoenix Index Codec

Dude, where’s my data?
Ensuring Correctness

HBase ACID
• Does NOT give you:
  • Cross-row consistency
  • Cross-table consistency
• Does give you:
  • Durable data on success
  • Visibility on success without partial rows

Key Observation
“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.” - Lars Hofhansl

Idempotent Index Updates
• Doesn’t need full transactions
• Replay as many times as needed
• Can tolerate a little lag
  • As long as we get the order right
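
The idempotence claim can be checked with a toy model. In this sketch the index table is a plain dict keyed by (index key, timestamp), which mirrors how an HBase Put with an explicit timestamp overwrites the same cell; the function name and sample values are hypothetical.

```python
def apply_index_updates(index_table, updates):
    """Apply (index_key, timestamp, value) cells to a dict-backed index table."""
    for index_key, timestamp, value in updates:
        # Writing the same cell twice leaves the same state, as with an
        # HBase Put that carries an explicit timestamp.
        index_table[(index_key, timestamp)] = value
    return index_table


updates = [(("b", "row1"), 20, "covered-1"), (("c", "row2"), 25, "covered-2")]
once = apply_index_updates({}, updates)
# Simulate a recovery that replays the same updates two more times.
replayed = apply_index_updates(dict(once), updates + updates)
```

Because a replay cannot change the end state, recovery can simply re-send index updates after a failure without tracking which ones already succeeded.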

Failure Recovery
    <property>
      <name>hbase.regionserver.wal.codec</name>
      <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
    </property>
    <property>
      <name>hbase.regionserver.hlog.reader.impl</name>
      <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
    </property>
(o.a.h is shorthand for org.apache.hadoop.)
• Custom WALEditCodec
  • Encodes index updates
  • Supports compressed WAL
• Custom WAL Reader
  • Replay index updates from WAL

Failure Situations
• Any time before WAL, client replay
• Any time after WAL, HBase replay
• All-or-nothing

Failure #1: Before WAL
[Diagram. Labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
No problem! No data is stored in the WAL, client just retries entire update.

Failure #2: After WAL
[Diagram. Labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
WAL replayed via usual replay mechanisms.

“Magic”
• Server short-circuit
• Lazy load columns
• Skip-scan for cache
• Parallel Writing
• Custom MemStore in Indexer
• Caching HTables
• Pluggable Index Writing/Failure Policy
• Minimize byte[] copy (ImmutableBytesPtr)

Demo

Roadmap
• Next release of Phoenix
• Performance improvements
• Functional Indexes
• Other indexing approaches (Huawei, SEP)

Open Source!
• Main: https://github.com/forcedotcom/phoenix
• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
