Querying Sensor Networks

Querying Sensor Networks Sam Madden UC Berkeley October 2, 2002 @ UCLA

Introduction • Programming Sensor Networks Is Hard • Especially if you want to build a “real” application • Declarative Queries Are Easy • And, can be faster and more robust than most applications!

Overview • Overview of Declarative Systems • TinyDB • Features • Demo • Challenges + Research Issues • Language • Optimizations • The Next Step

Declarative Queries: SQL • SQL is the traditional declarative language used in databases
  SELECT {sel-list}
  FROM {tables}
  WHERE {pred}
  GROUP BY {attrs}
  HAVING {pred}

  SELECT dept.name, AVG(emp.salary)
  FROM emp, dept
  WHERE emp.dno = dept.dno
    AND (dept.name = 'Accounting' OR dept.name = 'Marketing')
  GROUP BY dept.name

Declarative Queries for Sensor Networks • Examples:
  1. SELECT nodeid, light
     FROM sensors
     WHERE light > 400
     SAMPLE PERIOD 1s
  2. Rooms w/ volume > 200:
     SELECT AVG(volume)
     FROM sensors
     WHERE light > 400
     GROUP BY roomNo
     HAVING AVG(volume) > 200
  3. [Coming soon!]
     ON EVENT bird_detect(loc) AS bd
     SELECT AVG(s.light), AVG(s.temp)
     FROM sensors AS s
     WHERE dist(bd.loc, s.loc) < 10m
     SAMPLE PERIOD 1s for 10
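
To make the semantics of example 2 concrete, here is a minimal Python sketch of what that query computes over one sample period. The reading layout (dicts with roomNo, light, and volume keys) and the function name are assumptions made for illustration; this is not how TinyDB evaluates the query in the network.

```python
from collections import defaultdict


def rooms_with_loud_volume(readings, light_min=400, volume_min=200):
    """Evaluate example query 2 over one batch of readings.

    readings: iterable of dicts with keys roomNo, light, volume
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(lambda: [0, 0])  # roomNo -> [sum of volume, count]
    for r in readings:
        if r["light"] > light_min:  # WHERE light > 400
            totals[r["roomNo"]][0] += r["volume"]
            totals[r["roomNo"]][1] += 1
    # GROUP BY roomNo, then HAVING AVG(volume) > 200
    return {
        room: s / c
        for room, (s, c) in totals.items()
        if s / c > volume_min
    }
```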

General Declarative Advantages • Data Independence • Not required to specify how or where, just what. • Of course, can specify specific addresses when needed • Transparent Optimization • System is free to explore different algorithms, locations, orders for operations

Data Independence In Sensor Networks • Vastly simplifies execution for large networks • Since locations are described by predicates • Operations are over groups • Enables tolerance to faults • Since system is free to choose where and when operations happen

Optimization In Sensor Networks • Optimization Goal : Power! • Where to process data • In network • Outside network • Hybrid • How to process data • Predicate & Join Ordering • Index Selection • How to route data • Semantically Driven Routing

Overview • Overview of Declarative Systems • TinyDB • Features • Demo • Challenges + Research Issues • Language • Optimizations • The Next Step

TinyDB • A distributed query processor for networks of Mica motes • Available today! • Goal: Eliminate the need to write C code for most TinyOS users • Features • Declarative queries • Temporal + spatial operations • Multihop routing • In-network storage

TinyDB @ 10000 Ft • (Almost) All Queries are Continuous and Periodic • Written in SQL • With Extensions For: • Sample rate • Offline delivery • Temporal Aggregation [Figure: a query entering a routing tree over nodes A through F, annotated with the node sets {A,B,C,D,E,F}, {B,D,E,F}, and {D,E,F}]

TinyDB Demo

Applications + Early Adopters • Some demo apps: • Network monitoring • Vehicle tracking • “Real” future deployments: • Environmental monitoring @ GDI (and James Reserve?) • Generic Sensor Kit • Parking Lot Monitor Demo!

TinyDB Architecture (Per node) • TupleRouter: • Fetches readings (for ready queries) • Builds tuples • Applies operators • Delivers results (up tree) • AggOperator: • Combines local & neighbor readings • SelOperator: • Filters readings • Schema: • “Catalog” of commands & attributes (more later) • TinyAlloc: • Reusable memory allocator! [Figure: component diagram showing SelOperator, AggOperator, TupleRouter, Network, Radio Stack, Schema, and TinyAlloc]

TinyAlloc • Handle Based Compacting Memory Allocator • For Catalog, Queries
User Program:
  Handle h;
  call MemAlloc.alloc(&h, 10);
  …
  (*h)[0] = “Sam”;
  call MemAlloc.lock(h);
  tweakString(*h);
  call MemAlloc.unlock(h);
  call MemAlloc.free(h);
[Figure: Free Bitmap, Master Pointer Table, and Heap shown in four snapshots; labels: User Program, Compaction]

Schema • Attribute & Command IF • At INIT(), components register attributes and commands they support • Commands implemented via wiring • Attributes fetched via accessor command • Catalog API allows local and remote queries over known attributes / commands. • Demo of adding an attribute, executing a command.

Overview • Overview of Declarative Systems • TinyDB • Features • Demo • Challenges + Research Issues • Language • Optimizations • Quality

3 Questions • Is this approach expressive enough? • Can this approach be efficient enough? • Are the answers this approach gives good enough?

Q1: Expressiveness • Simple data collection satisfies most users • How much of what people want to do is just simple aggregates? • Anecdotally, most of it • EE people want filters + simple statistics (unless they can have signal processing) • However, we’d like to satisfy everyone!

Query Language • New Features: • Joins • Event-based triggers • Via extensible catalog • In network & nested queries • Split-phase (offline) delivery • Via buffers

Sample Query 1 • Bird counter:
  CREATE BUFFER birds(uint16 cnt)
  SIZE 1

  ON EVENT bird-enter(…)
  SELECT b.cnt+1
  FROM birds AS b
  OUTPUT INTO b
  ONCE

Sample Query 2 • Birds that entered and left within time t of each other:
  ON EVENT bird-leave AND bird-enter WITHIN t
  SELECT bird-leave.time, bird-leave.nest
  WHERE bird-leave.nest = bird-enter.nest
  ONCE

Sample Query 3 • Delta compression:
  SELECT s.light
  FROM buf, sensors AS s
  WHERE |s.light - buf.light| > t
  OUTPUT INTO buf
  SAMPLE PERIOD 1s
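
The effect of the delta-compression query can be sketched in a few lines of Python: a reading is reported only when it differs from the last reported reading by more than t. The function name is hypothetical, and reporting the very first sample unconditionally is an assumption, since the slide does not say how the buffer is initialized.

```python
def delta_compress(samples, t):
    """Yield only samples that differ from the last reported one by more than t."""
    last = None  # plays the role of the one-row buffer `buf`
    for value in samples:
        # Assumption: with an empty buffer, the first sample is always reported.
        if last is None or abs(value - last) > t:
            last = value  # OUTPUT INTO buf
            yield value
```

For example, with t = 5 the stream 100, 101, 103, 110, 111, 90 is reduced to 100, 110, 90.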

Sample Query 4 • Offline Delivery + Event Chaining
  CREATE BUFFER equake_data(uint16 loc, uint16 xAccel, uint16 yAccel)
  SIZE 1000
  PARTITION BY NODE

  SELECT xAccel, yAccel
  FROM SENSORS
  WHERE xAccel > t OR yAccel > t
  SIGNAL shake_start(…)
  SAMPLE PERIOD 1s

  ON EVENT shake_start(…)
  SELECT loc, xAccel, yAccel
  FROM sensors
  OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
  SAMPLE PERIOD 10ms

Event Based Processing • Enables internal and chained actions • Language Semantics • Events are inter-node • Buffers can be global • Implementation plan • Events and buffers must be local • Since n-to-n communication not (well) supported • Next: operator expressiveness

Operator Expressiveness: Aggregate Framework • Standard SQL supports “the basic 5”: • MIN, MAX, SUM, AVERAGE, and COUNT • We support any function conforming to:
  Agg_n = {f_merge, f_init, f_evaluate}
  f_merge{<a1>, <a2>} → <a12>
  f_init{a0} → <a0>
  f_evaluate{<a1>} → aggregate value
  (Merge associative, commutative!)
• Partial Aggregate Example: Average
  AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_init{v} → <v, 1>
  AVG_evaluate{<S1, C1>} → S1/C1
From Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).
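
The three-function interface above maps directly onto code. The following is a minimal Python sketch, not TinyDB's actual implementation: the Aggregate tuple and the AVG and MAX names are introduced here for illustration only. It also shows why merge must be associative and commutative: partial states can then be combined in any order along the routing tree.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical container for the three functions an aggregate must supply.
Aggregate = namedtuple("Aggregate", ["init", "merge", "evaluate"])

# AVERAGE: the partial state is a <sum, count> pair.
AVG = Aggregate(
    init=lambda v: (v, 1),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    evaluate=lambda a: a[0] / a[1],
)

# MAX: the partial state is just the largest value seen so far.
MAX = Aggregate(
    init=lambda v: v,
    merge=lambda a, b: a if a >= b else b,
    evaluate=lambda a: a,
)


def aggregate(agg, readings):
    """Fold a non-empty list of readings through init, merge, and evaluate."""
    partials = [agg.init(v) for v in readings]
    return agg.evaluate(reduce(agg.merge, partials))
```

Because only the small partial state travels between nodes, a parent can merge its children's states with its own reading and forward a single record.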

Isobar Finding

Temporal Aggregates • TAG was about “spatial” aggregates • Inter-node, at the same time • Want to be able to aggregate across time as well • Two types: • Windowed: AGG(size,slide,attr) • Decaying: AGG(comb_func, attr) • Demo! [Figure: windowed aggregate over a stream of readings … R1 R2 R3 R4 R5 R6 … with size = 4, slide = 2]
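
A windowed aggregate with the size and slide parameters named above can be sketched as follows. The function name is hypothetical, and emitting a result only once a window is full is an assumption; the slide does not specify how partial windows are handled.

```python
def windowed(agg, size, slide, readings):
    """Apply agg to each window of `size` readings, advancing by `slide`.

    Assumption: only complete windows produce a result.
    """
    results = []
    for start in range(0, len(readings) - size + 1, slide):
        results.append(agg(readings[start:start + size]))
    return results
```

With size = 4 and slide = 2 over six readings R1 to R6, this yields two results: one over R1 to R4 and one over R3 to R6.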

Expressiveness Review • Internal & nested queries • With logging of results for offline delivery • Event based processing • Extensible aggregates • Spatial & temporal • On to Question 2: What about efficiency?

Q2: Efficiency • Metric: power consumption • Goal: reduce communication, which dominates cost • 800 instrs/bit! • Standard approach: in-network processing, sleeping whenever you can…

But that’s not good enough… • What else can we do to bring down costs? • Sleep Even More? • Events are key • Apply automatic optimization! • Semantically driven routing • …and topology construction • Operator placement + ordering • Adaptive data delivery

TAG • In-network processing • Reduces costs depending on type of aggregates • Exploitation of operator semantics Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).

Illustration: Pipelined Aggregation • SELECT COUNT(*) FROM sensors • Depth = d [Figure: routing tree of sensors 1 to 5]

Illustration: Pipelined Aggregation • SELECT COUNT(*) FROM sensors • Epoch 1 [Figure: routing tree of sensors 1 to 5 at epoch 1, with a table of partial counts indexed by sensor # and epoch #]
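
The pipelining idea can be simulated in a few lines of Python. In this sketch, every node transmits in every epoch its own count of 1 plus whatever partial counts its children sent in the previous epoch, so the root's answer becomes complete once the deepest node's contribution has climbed the tree. The tree shape used in the example is hypothetical, since the original figure is not recoverable, and the function name is introduced here for illustration.

```python
def pipelined_count(parent, root, epochs):
    """Simulate pipelined COUNT(*) and return the root's count per epoch.

    parent: dict mapping each non-root node to its parent in the routing tree.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    nodes = set(parent) | {root}

    prev = {n: 0 for n in nodes}  # partial counts sent in the previous epoch
    root_counts = []
    for _ in range(epochs):
        # Each node combines its own reading with last epoch's child partials.
        cur = {n: 1 + sum(prev[c] for c in children.get(n, [])) for n in nodes}
        root_counts.append(cur[root])
        prev = cur
    return root_counts
```

For a hypothetical tree where nodes 2 and 3 report to root 1, node 4 reports to 3, and node 5 reports to 4, the root sees counts 1, 3, 4, 5 over the first four epochs and stays at 5 afterwards.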

Simulation Results • 2500 Nodes • 50x50 Grid • Depth = ~10 • Neighbors = ~20 • Some aggregates require dramatically more state!

Taxonomy of Aggregates • TAG insight: classify aggregates according to various functional properties • Yields a general set of optimizations that can automatically be applied

Optimization: Channel Sharing • Insight: Shared channel enables optimizations • Suppress messages that won’t affect aggregate • E.g., in a MAX query, sensor with value v hears a neighbor with value ≥ v, so it doesn’t report • Applies to all exemplary, monotonic aggregates • Nodes can learn about query advertisements they missed • If a sensor shows up in a new environment, it can learn about queries by looking at its neighbors’ messages. • Root doesn’t have to explicitly rebroadcast query!
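
The MAX suppression rule can be sketched as below. This is a simplified model, under the assumptions that all sensors in the list share one broadcast neighborhood, hear every transmission, and take turns in list order; the function name is hypothetical.

```python
def reports_for_max(values_in_send_order):
    """Return the values actually transmitted under MAX suppression.

    Assumption: every sensor overhears every earlier transmission.
    """
    sent = []
    best_heard = None
    for v in values_in_send_order:
        # A sensor stays silent if it already heard a value >= its own.
        if best_heard is not None and best_heard >= v:
            continue
        sent.append(v)
        best_heard = v
    return sent
```

The maximum of the transmitted values always equals the maximum of all values, so suppression saves messages without changing the answer.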

Optimization: Hypothesis Testing • Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value. • E.g. Tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate. • Depends on monotonicity • How is hypothesis computed? • Blind guess • Statistically informed guess • Observation over first few levels of tree / rounds of aggregate
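
The MIN example above can be sketched in Python as follows. The function name and the dict-of-readings layout are assumptions for illustration. Returning None when no node participates signals that the hypothesis was wrong and the query would have to be rerun with a looser bound; that fallback is also an assumption, since the slide does not describe it.

```python
def min_with_hypothesis(readings, hypothesis):
    """Compute MIN when the root has announced that MIN < hypothesis.

    readings: dict of node id -> value.
    Returns (minimum, number of nodes that reported).
    """
    # Nodes with value >= hypothesis cannot affect the result and stay silent.
    participants = [v for v in readings.values() if v < hypothesis]
    if not participants:
        return None, 0  # hypothesis was wrong; caller must retry
    return min(participants), len(participants)
```

With readings 72, 41, 55, and 48 and a hypothesis of 50, only two of the four nodes transmit, and the result is still 41.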

Optimization: Use Multiple Parents • For duplicate insensitive aggregates • Or aggregates that can be expressed as a linear combination of parts • Send (part of) aggregate to all parents • Decreases variance • Dramatically, when there are lots of parents [Figure: node A sending its whole aggregate (1) to a single parent versus half (1/2) to each of parents B and C]
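
The variance claim can be checked numerically. The sketch below assumes each parent link drops a message independently with the same probability, which is a modeling assumption and not something the slide states; the function name is hypothetical. It enumerates every delivery outcome for a COUNT split evenly across the parents and computes the exact mean and variance of what arrives.

```python
from itertools import product


def delivered_stats(count, num_parents, loss):
    """Exact mean and variance of the count delivered upstream.

    The count is split evenly across num_parents links, each of which
    independently loses its message with probability `loss` (assumption).
    """
    share = count / num_parents
    mean = 0.0
    second_moment = 0.0
    for outcome in product([0, 1], repeat=num_parents):  # 1 = delivered
        prob = 1.0
        for ok in outcome:
            prob *= (1 - loss) if ok else loss
        total = share * sum(outcome)
        mean += prob * total
        second_moment += prob * total ** 2
    return mean, second_moment - mean ** 2
```

Under this model the expected delivered count is the same for any number of parents, while the variance shrinks in proportion to 1 / num_parents.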

TAG Summary • In-network query processing is a win for many aggregate functions • By exploiting general functional properties of operators, many optimizations are possible • Requires new aggregates to be tagged with their properties • Up next: non-aggregate query processing optimizations; a flavor of things to come!

Attribute Driven Topology Selection • Observation: internal queries are often over a local area* • Or some other subset of the network • E.g. regions with light value in [10,20] • Idea: build topology for those queries based on values of range-selected attributes • Requires range attributes, connectivity to be relatively static * Heidemann et al., Building Efficient Wireless Sensor Networks with Low-Level Naming. SOSP, 2001.

Attribute Driven Query Propagation • SELECT … WHERE a > 5 AND a < 12 • Precomputed intervals == “Query Dissemination Index” [Figure: node 4 with nodes 1, 2, and 3 annotated with intervals [1,10], [20,40], and [7,15]]
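
A minimal Python sketch of how such an index could prune query dissemination: the query is forwarded only to nodes whose precomputed interval overlaps the query's range. Pairing node 1 with [1,10], node 2 with [20,40], and node 3 with [7,15] is read off the ordering on the slide and should be treated as an assumption; the function name is hypothetical.

```python
def nodes_to_forward(intervals, low, high):
    """Return nodes whose closed interval [lo, hi] overlaps the open range (low, high).

    intervals: dict of node id -> (lo, hi) for attribute a.
    The query is `a > low AND a < high`.
    """
    return [
        node
        for node, (lo, hi) in sorted(intervals.items())
        if hi > low and lo < high
    ]
```

For the query WHERE a > 5 AND a < 12, only nodes 1 and 3 need to receive the query; node 2, whose values lie in [20,40], can be skipped.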

Attribute Driven Parent Selection • Even without intervals, expect that sending to parent with closest value will help
  [3,6] ∩ [1,10] = [3,6]
  [3,6] ∩ [7,15] = ø
  [3,6] ∩ [20,40] = ø
[Figure: node 4 with interval [3,6] choosing among candidate parents 1, 2, and 3 with intervals [1,10], [20,40], and [7,15]]
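
The interval test above can be sketched in Python as follows. Choosing the candidate with the widest overlap is one plausible selection rule assumed for this sketch; the slide shows only the intersections, not the exact rule, and the function names are hypothetical.

```python
def intersect(a, b):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def choose_parent(own, candidates):
    """Pick the candidate parent whose interval overlaps `own` the most.

    candidates: dict of node id -> (lo, hi). Returns None if nothing overlaps.
    """
    best, best_width = None, -1
    for node, interval in sorted(candidates.items()):
        overlap = intersect(own, interval)
        if overlap is None:
            continue
        width = overlap[1] - overlap[0]
        if width > best_width:
            best, best_width = node, width
    return best
```

With its own interval [3,6], node 4 would pick node 1, the only candidate whose interval intersects it.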

Hot off the press…

Operator Placement & Ordering • Observation: Nested queries, triggers, and joins can often be re-ordered • Ordering can dramatically affect the amount of work you do • Lots of standard database tricks here
