Democratizing Sensor Networks through Reliable and Efficient High-level Programming Abstractions

Section 1: Problem and Motivation:

Wireless sensor networks consist of a system of distributed sensors embedded in the physical world. They are finding increasing use in a wide range of scientific, engineering, sociological, military, industrial and commercial applications, because they allow us to observe, measure, understand and control physical phenomena at scales and resolutions not imagined before. The intended target of sensor networks is scientists and engineers drawn from a broad spectrum of technical disciplines. Today, in order to exploit the full potential of sensor networks, these users must also be proficient programmers, intimately familiar with many deep issues involving programming languages, distributed systems, operating systems, and computer networking [1,2]. 

Constructing practical and reliable wirelessly-networked systems is a significant challenge currently because programmers must cope explicitly with severe resource, bandwidth, and power constraints on low-cost, battery-powered sensor nodes. Also, programmers must simultaneously deal with the challenges of distributed systems, such as the need to maintain concurrency, fault-tolerance, consistency and synchronization among numerous, asynchronous loosely coupled nodes.

Our main goal is to enable non-expert programmers to make effective use of sensor networks by raising the level of abstraction over current practice significantly. The critical change is one of perspective: rather than writing programs from the point of view of an individual node in the network, programmers implement a central program that conceptually has access to the entire network. This change allows a programmer to focus attention on the higher-level algorithmics and global intent of an application, and the compiler automatically generates the node-level programs that implement the application on the network correctly, efficiently, and in a fault-tolerant manner. This style of programming sensor networks is known as macroprogramming [1,2,3].

We present Kairos, which is a language, compiler and runtime system that exposes a high-level C-like programming model with sequential semantics that most users are already familiar with, and generates equivalent node-level versions of the program automatically [2]. By freeing the programmer from manually decomposing sequential-style high-level programs into concurrent node-level programs, Kairos reduces cognitive burden and enables sensor networks to be programmed easily, correctly and efficiently by a large community of scientists and engineers.

Thus, the key contribution of this research is one of the first practical programming systems that allows non-experts to program sensor networks. In order to realize this goal, we have developed the Kairos compiler and runtime for state-of-the-art sensor networking hardware (Section 3), evaluated it extensively in several real deployments and on programs written by non-expert programmers (Section 4), and made the system available to the community at-large. This work also resulted in an ongoing project that aims to simplify programming even further by providing a graphical development environment similar to MATLAB/Simulink that reuses many of the ideas developed here.

Section 2: Background and Related Work:

A sensor node consists of a low-power processor, a few kilobytes of RAM and ROM, various type of sensor devices, and a low bit-rate, short-range radio. Each such node is called a mote, and is exclusively battery-powered. Wireless messaging consumes the bulk of energy, and programs must therefore be optimized for message efficiency.

Today, the dominant method of programming such resource- and power-constrained devices is a low-level language called nesC [4]. nesC is low-level in the sense that it provides language constructs to express only node-level operations, which are then compiled to use the services of an operating system called TinyOS. While nesC and TinyOS provide abstractions and libraries that simplify node-level sensor-network application programming, ensuring the global efficiency and reliability of sensor network applications is still the province of the programmer. But this process is hard, tedious and error-prone because the programmer must manually decompose a high-level algorithm into programs for each individual sensor node, must ensure that these programs efficiently communicate with one another, must implement any necessary data consistency and control-flow synchronization protocols among these node-level programs, and must explicitly manage resources at each node. In contrast, the Kairos language provides a centralized programming model, and the compiler and runtime compile the program into node-level versions, and implement the necessary concurrency, synchronization, and fault-tolerance in order to ensure a reliable and efficient execution. Under the covers, the language constructs are compiled to efficient event-driven nesC code, and linked to TinyOS libraries, thereby maintaining compatibility with existing software and hardware.

Section 3: Approach and Uniqueness:

Kairos is designed to provide a simple programming model that addresses the challenges and requirements of sensor network programming. Kairos' sequential semantics makes programs easy to understand and is natural when programming sensor networks in a centralized fashion.  Concurrency is introduced in a simple manner appropriate to the domain. This strong form of consistency and reliability is the default, and is important for many sensor network applications. Kairos also supports weaker forms of consistency that the programmer can call explicitly. To our knowledge, no other macroprogramming system guarantees even weak forms of consistency.

Kairos has three unique features: a) its language constructs, which enable programmers to express code succinctly; b) its compiler, which provides automatic program partitioning to convert centralized programs into node-level versions and efficient control flow migration in order to execute these node-level versions efficiently; and c) its runtime, which ensures data consistency while providing enough concurrency and fault-tolerance. We describe each feature in turn.

Language Constructs

Node naming: Kairos provides a set of language constructs that allow programmers to easily access nodes and node-local state in a high-level, centralized, and topology-independent manner. The node type provides an abstraction of a single network node, and the nodeset type provides an iterator abstraction for an unordered collection of nodes.

Node-local variables: Kairos extends standard C variable naming to address node-local state. This facility allows programmers to naturally express distributed computations and eliminates the need for programmers to manually implement inter-node data access and communication. Node-local variables are declared as ordinary C variables but include the attribute nodelocal. The attribute indicates that there is one version of the variable per node in the network. A node-local variable is addressed inside a Kairos program using a new expression var@e, where var is a nodelocal variable and e is an expression of type node.

Concurrency: By default, a Kairos program has a sequential execution semantics.  However, Kairos also provides a simple form of programmer-directed concurrency. The cfor loop is like an ordinary for loop but allows for concurrent execution of the loop's iterations. A cfor loop can iterate over any nodeset, and the loop body will be executed concurrently for each node in the set.  While concurrency is often essential to achieve good performance, it can cause subtle errors that are difficult to understand and debug. To help programmers obtain the benefits of concurrency while maintaining reliability, the Kairos compiler and runtime system ensure that the execution of a cfor is always serializable:  the effect of a cfor always corresponds to some sequential execution of the loop. 

Compiler

The Kairos compiler is built as an extension to the CIL infrastructure for C analysis and transformation [5].  Our compiler accepts a Kairos program as input and produces node-level nesC code that can be linked with standard TinyOS components and the Kairos runtime system.  The Kairos runtime system is a collection of TinyOS modules that orchestrates the execution of the compiler-generated nesC code across the nodes in the network. The compiler is responsible for program partitioning and control flow migration.

Partitioning: The Kairos compiler performs a dataflow analysis in order to partition a Kairos program into a set of stand-alone units called nodecuts. Each nodecut is then converted into a nesC task, to be executed by the Kairos runtime system on a single node in the network.

At one extreme, one could consider the entire Kairos program to be a single nodecut and execute it at one node, fetching node-local and central variables from other nodes as needed (moving the data to the computation).  The other extreme would be to consider each instruction in the Kairos program as its own nodecut, executing it on the node whose local variables are used in the computation (moving the computation to the data).  Both of these strategies lead to generated code that has high messaging overhead and high latency, in the first case due to the on-the-fly fetching of individual variables, and in the second case due to the per-instruction migration of the thread of control.

We adopt a compilation strategy for Kairos that lies in between these two extremes, involving both control flow migration and data movement. A nodecut can include any number of statements, but it must have the property that just before it is to be executed, the runtime system can determine the location of all the node-local variables needed for the nodecut's execution. We therefore define a nodecut as a subgraph of a program's control-flow graph (CFG) such that for every expression of the form var@e in the subgraph, the l-values in e have no reaching definitions within that subgraph.

Given this property of nodecuts, the runtime system can retrieve all the necessary node-local and central variables concurrently, before beginning execution of a nodecut, which improves the latency immensely over the first strategy above.  At the same time, because the runtime system has information about the required node-local variables, it can determine the best node (in terms of messaging costs) at which to execute the nodecut, thereby obtaining the benefits of the second strategy above without the latency and message costs of per-statement migration. 

Figure 1 below shows the algorithm for determining nodecuts of a program. The algorithm starts by assuming that all CFG nodes are in the same nodecut and does a forward traversal through the CFG, creating new nodecuts along the way. For each CFG node n containing an expression of the form var@e, we find all reaching definitions of the l-values in e and collect the subset R of such definitions that occur within n’s nodecut.  If R is nonempty, we induce a new nodecut by finding a CFG node d that dominates node n and post-dominates all of the nodes in R. Node d then becomes the entry node of the new nodecut.  Any such node d can be used, but our implementation uses simple heuristics that attempt to keep the bodies of conditionals and loops in the same nodecut whenever possible.  The implementation also uses heuristics to increase the potential for concurrency.  For example, the body of a cfor is always partitioned into nodecuts that do not contain any statements from outside the cfor, so that these nodecuts can be executed concurrently.

Find-Nodecut(P)

1 Compute the CFG G of the program P

2 for all nodes n є G

3 do nodecut(n) entry(G)

4 for all nodes n є G

5 do if n contains an expression of the form exp1@v

6     then NC ← {nє G|nodecut(n) = nodecut(n)}

7           RD ← {nє NC|ncontains a definition of v that reaches n}

8           SUB Urd є RD graph of all paths from rd to n

9           D ← {n є NC|ndominates n in NC and for all rd є RD npost-dominates rd in SUB }

10         pick some node d є D as the entry node of a new nodecut

11         nodecut(d) d

12         For all n є NC that are reachable from d in NC without traversing a back edge, nodecut(n) d

13 return set of nodecuts formed

Figure 1: Algorithm for determining nodecuts.

Control Flow Migration: The Kairos runtime system is responsible for executing each nodecut produced by the compiler across the sensor network. When execution of a nodecut C completes at some node n, that node's runtime system determines an appropriate node n’at which to run the subsequent nodecut C’ and migrates the thread of control to n’.  All of the Kairos program's central variables migrate along with the thread of control, thereby making them available to C’. Because of the special property of nodecuts, the runtime system knows exactly which node-local variables are required by C’, so these variables are also concurrently fetched to n’ before execution of C’ is begun.

Since the nodecuts along with the set of node-local variables accessed in each nodecut are statically supplied by the compiler, our migration approach thus exploits a novel combination of static and dynamic information in order to optimize energy efficiency. We note that this approach does not require every node to keep a fully consistent topological map, but only the relative distances of the nodes involved in the nodecut.

Runtime

Serializable execution of cfors: To execute a cfor loop, the Kairos runtime system forks a separate thread for each iteration of the loop.  Program execution following the cfor only continues once all the forked threads have joined.  Each forked thread is initially placed at the node representing the value of the variable the cfor iterates over, and any subsequent nodecuts in the thread are placed using the migration algorithm for nodecuts described above. A forked thread may itself execute a cfor statement, in which case that thread becomes the coordinator for the inner cfor, forking threads and awaiting their join.

To provide reliability in the face of concurrency, Kairos ensures serializability of cfor loops.   This allows programmers to correctly understand their Kairos programs in terms of a sequential execution semantics. The Kairos compiler and runtime ensure serializability by transparently locking variables accessed in each cfor body. The use of locking has the potential to cause deadlocks, so we also provide a novel distributed deadlock detection and recovery algorithm.

Fault-tolerance: Failures are an important issue in sensor networks. The runtime deals with transient failures such as message losses and corruptions through a lightweight reliable transport suited for resource-constrained nodes. The runtime handles node dynamics such as crashes and additions through a simple retry-based mechanism that extends the reliable routing and transport mechanisms used in the normal case. The basic idea is that node failures trigger an undo mechanism similar to that already used for deadlock recovery, which allows the initiator of the computation to retry safely, and ignore failed nodes.

Section 4: Results and Contributions:

We have implemented the Kairos compiler and runtime running on the popular TelosB Tmote Sky mote platform. We have evaluated our system extensively across metrics such as programming simplicity, reliability, efficiency and fault-tolerance compared to the status quo [2,3]. Here, we present the key highlights of our evaluation on two sample applications.

Pursuit-Evasion Games

We compare a Kairos implementation of a Pursuit-Evasion Game (PEG) against a hand-coded node-level nesC implementation of the same application written by others on a 40 node mote testbed (Figure 2). In a PEG, multiple robots (the pursuers) collectively determine the location of one or more evaders using the sensor network, and try to corral them.  

pioneerRTH 

Figure 2: The pursuer and evader robots (left) and the 40-node topology (right).

Figure 3 shows the relative performance of Kairos version of the PEG (called Kairos-PEG) relative to the hand-coded version (nesC-PEG). It shows that the Kairos program requires less than 10% of lines of code compared to the hand-crafted node-level version (nesC-PEG), and provides better tracking accuracy due to its consistency semantics. It uses 20% additional messages and incurs slightly more than 2x additional latency in order to provide this extra accuracy. Thus, Kairos allows the programmer to be succinct and express reliability guarantees, and implements these semantics with an efficiency rivaling hand-crafted versions.

Figure 3: Relative comparison of Kairos-generated code versus hand-coded native version (lower is better).

 

Fault-tolerance: Figure 4 shows structural vibration data gathered from sensors installed on Vincent Thomas Bridge in Los Angeles. We obtained this data from a real-world deployment by other members of our group. We ran experiments on the collected traces to see if Kairos could correctly isolate the faulty sensor and its data. Kairos’ failure-recovery mechanisms identified the faulty sensor and allowed the data collection experiment to proceed to completion, as shown in Figure 4.