Democratizing Sensor Networks through Reliable and Efficient High-level Programming Abstractions
Section 1: Problem and Motivation:
Wireless sensor
networks consist of a system of distributed sensors embedded in the physical
world. They are finding increasing use in a wide range of scientific,
engineering, sociological, military, industrial and commercial applications, because
they allow us to observe, measure, understand and
control physical phenomena at scales and resolutions not imagined before. The
intended target of sensor networks is scientists and engineers drawn from a
broad spectrum of technical disciplines. Today, in order to exploit the full
potential of sensor networks, these users must also be proficient programmers,
intimately familiar with many deep issues involving programming languages,
distributed systems, operating systems, and computer networking [1,2].
Constructing practical
and reliable wirelessly-networked systems is a significant challenge currently because
programmers must cope explicitly with severe resource, bandwidth, and power
constraints on low-cost, battery-powered sensor nodes. Also, programmers must simultaneously
deal with the challenges of distributed systems, such as the need to maintain concurrency,
fault-tolerance, consistency and synchronization among numerous, asynchronous
loosely coupled nodes.
Our main goal is
to enable non-expert programmers to make effective use of sensor networks by raising
the level of abstraction over current practice significantly. The critical
change is one of perspective: rather than writing programs from the point of
view of an individual node in the network, programmers implement a central program that conceptually has
access to the entire network. This change allows a programmer to focus attention
on the higher-level algorithmics and global intent of an application, and the
compiler automatically generates the node-level programs that implement the
application on the network correctly, efficiently, and in a fault-tolerant
manner. This style of programming sensor networks is known as macroprogramming
[1,2,3].
We present Kairos,
which is a language, compiler and runtime system that exposes a high-level
C-like programming model with sequential semantics that most users are already
familiar with, and generates equivalent node-level versions of the program
automatically [2]. By freeing the programmer from manually decomposing
sequential-style high-level programs into concurrent node-level programs,
Kairos reduces cognitive burden and enables sensor networks to be programmed easily,
correctly and efficiently by a large community of scientists and engineers.
Thus, the key
contribution of this research is one of the first practical programming systems
that allows non-experts to program sensor networks. In
order to realize this goal, we have developed the Kairos compiler and runtime
for state-of-the-art sensor networking hardware (Section 3), evaluated it
extensively in several real deployments and on programs written by non-expert
programmers (Section 4), and made the system available to the community
at-large. This work also resulted in an ongoing project that aims to simplify
programming even further by providing a graphical development environment similar
to MATLAB/Simulink that reuses many of the ideas developed here.
Section 2: Background
and Related Work:
A sensor node
consists of a low-power processor, a few kilobytes of RAM and ROM, various type
of sensor devices, and a low bit-rate, short-range radio. Each such node is
called a mote, and is exclusively battery-powered. Wireless messaging consumes
the bulk of energy, and programs must therefore be optimized for message
efficiency.
Today, the
dominant method of programming such resource- and power-constrained devices is
a low-level language called nesC [4]. nesC is
low-level in the sense that it provides language constructs to express only
node-level operations, which are then compiled to use the services of an
operating system called TinyOS. While nesC and TinyOS provide abstractions and
libraries that simplify node-level sensor-network application programming, ensuring
the global efficiency and
reliability of sensor network applications is still the province of the
programmer. But this process is hard, tedious and error-prone because the
programmer must manually decompose a high-level algorithm into programs for
each individual sensor node, must ensure that these programs efficiently
communicate with one another, must implement any necessary data consistency and
control-flow synchronization protocols among these node-level programs, and
must explicitly manage resources at each node. In contrast, the Kairos language
provides a centralized programming model, and the compiler and runtime compile
the program into node-level versions, and implement the necessary concurrency,
synchronization, and fault-tolerance in order to ensure a reliable and
efficient execution. Under the covers, the language constructs are compiled to
efficient event-driven nesC code, and linked to TinyOS libraries, thereby
maintaining compatibility with existing software and hardware.
Section 3: Approach
and Uniqueness:
Kairos is designed to provide a simple programming model that addresses
the challenges and requirements of sensor network programming. Kairos'
sequential semantics makes programs easy to understand and is natural when
programming sensor networks in a centralized fashion. Concurrency is introduced in a simple manner
appropriate to the domain. This strong form of consistency and reliability is the
default, and is important for many sensor network applications. Kairos also
supports weaker forms of consistency that the programmer can call explicitly. To
our knowledge, no other macroprogramming system guarantees even weak forms of
consistency.
Kairos has three unique features: a) its language constructs,
which enable programmers to express code succinctly; b) its compiler,
which provides automatic program partitioning to convert centralized programs
into node-level versions and efficient control flow migration in order to execute
these node-level versions efficiently; and c) its runtime, which ensures
data consistency while providing enough concurrency and fault-tolerance. We
describe each feature in turn.
Language
Constructs
Node naming:
Kairos provides a set of language constructs that allow programmers to easily
access nodes and node-local state in a high-level, centralized, and
topology-independent manner. The node type
provides an abstraction of a single network node, and the nodeset type provides an iterator abstraction for an unordered
collection of nodes.
Node-local variables: Kairos extends standard C variable naming to address node-local
state. This facility allows programmers to naturally express distributed computations
and eliminates the need for programmers to manually implement inter-node data
access and communication. Node-local variables are declared as ordinary C
variables but include the attribute nodelocal. The
attribute indicates that there is one version of the variable per node in the
network. A node-local variable is addressed inside a Kairos program using a new
expression var@e, where var is a nodelocal variable and e is an expression of type node.
Concurrency:
By default, a Kairos program has a sequential
execution semantics. However, Kairos
also provides a simple form of programmer-directed concurrency. The cfor loop is like an ordinary for loop but allows for concurrent execution of the
loop's iterations. A cfor loop can iterate over any nodeset, and the loop body will be executed concurrently for
each node in the set. While concurrency
is often essential to achieve good performance, it can cause subtle errors that
are difficult to understand and debug. To help programmers obtain the benefits
of concurrency while maintaining reliability, the Kairos compiler and runtime
system ensure that the execution of a cfor is always
serializable: the effect of a cfor always corresponds to some sequential execution of
the loop.
Compiler
The Kairos
compiler is built as an extension to the CIL infrastructure for C analysis and
transformation [5]. Our compiler accepts
a Kairos program as input and produces node-level nesC code that can be linked
with standard TinyOS components and the Kairos runtime system. The Kairos runtime system is a collection of
TinyOS modules that orchestrates the execution of the compiler-generated nesC
code across the nodes in the network. The compiler is responsible for program
partitioning and control flow migration.
Partitioning: The
Kairos compiler performs a dataflow analysis in order to partition a Kairos
program into a set of stand-alone units called nodecuts. Each nodecut is then converted into a nesC task, to be
executed by the Kairos runtime system on a single node in the network.
At one extreme,
one could consider the entire Kairos program to be a single nodecut and execute
it at one node, fetching node-local and central variables from other nodes as
needed (moving the data to the computation).
The other extreme would be to consider each instruction in the Kairos
program as its own nodecut, executing it on the node whose local variables are
used in the computation (moving the computation to the data). Both of these strategies lead to generated
code that has high messaging overhead and high latency, in the first case due
to the on-the-fly fetching of individual variables, and in the second case due
to the per-instruction migration of the thread of control.
We adopt a
compilation strategy for Kairos that lies in between these two extremes,
involving both control flow migration and data movement. A nodecut can include
any number of statements, but it must have the property that just before it is
to be executed, the runtime system can determine the location of all the
node-local variables needed for the nodecut's execution. We therefore define a nodecut
as a subgraph of a program's control-flow graph (CFG) such that for every
expression of the form var@e
in the subgraph, the l-values in e have no reaching definitions within that subgraph.
Given this
property of nodecuts, the runtime system can retrieve all the necessary
node-local and central variables concurrently, before beginning execution of a
nodecut, which improves the latency immensely over the first strategy
above. At the same time, because the
runtime system has information about the required node-local variables, it can
determine the best node (in terms of messaging costs) at which to execute the nodecut,
thereby obtaining the benefits of the second strategy above without the latency
and message costs of per-statement migration.
Figure 1 below
shows the algorithm for determining nodecuts of a program. The algorithm starts
by assuming that all CFG nodes are in the same nodecut and does a forward
traversal through the CFG, creating new nodecuts along the way. For each CFG
node n containing an expression of the form var@e, we find all reaching definitions of the l-values in e and collect the subset R of
such definitions that occur within n’s
nodecut. If R is nonempty, we induce a new nodecut by finding a CFG
node d that dominates node n and
post-dominates all of the nodes in R. Node d then becomes the entry node of the new nodecut. Any such node d can
be used, but our implementation uses simple heuristics that attempt to keep the
bodies of conditionals and loops in the same nodecut whenever possible. The implementation also uses heuristics to
increase the potential for concurrency.
For example, the body of a cfor is always
partitioned into nodecuts that do not contain any statements from outside the cfor, so that these nodecuts can be executed concurrently.
Find-Nodecut(P)
1 Compute the CFG G of the
program P
2 for all nodes n є G
3 do nodecut(n)
← entry(G)
4 for all nodes n є G
5 do if n contains an
expression of the form exp1@v
6 then NC ← {n′ є G|nodecut(n′) = nodecut(n)}
7 RD ← {n′
є NC|n′ contains a definition of v that reaches n}
8 SUB ← Urd
є RD graph of all paths from rd to n
9 D ← {n′ є NC|n′ dominates
n in NC and for all rd є RD n′ post-dominates rd in SUB }
10 pick some node d є D as the entry node of a
new nodecut
11 nodecut(d) ← d
12 For all n′ є NC that
are reachable from d in NC without traversing a back edge, nodecut(n′) ← d
13 return set of nodecuts
formed
Figure
1: Algorithm for determining nodecuts.
Control Flow Migration: The Kairos runtime system is responsible for executing
each nodecut produced by the compiler across the sensor network. When execution
of a nodecut C completes at some node n, that
node's runtime system determines an appropriate node n’at which
to run the subsequent nodecut C’ and migrates the thread
of control to n’. All of the Kairos
program's central variables migrate along with the thread of control, thereby
making them available to C’. Because of the special property of nodecuts, the
runtime system knows exactly which node-local variables are required by C’, so these variables are also concurrently fetched to n’ before execution of C’ is
begun.
Since the nodecuts
along with the set of node-local variables accessed in each nodecut are
statically supplied by the compiler, our migration approach thus exploits a
novel combination of static and dynamic information in order to optimize energy
efficiency. We note that this approach does not require every node to keep a
fully consistent topological map, but only the relative distances of the nodes
involved in the nodecut.
Runtime
Serializable
execution of cfors: To execute a cfor loop, the Kairos runtime
system forks a separate thread for each iteration of
the loop. Program execution following
the cfor only
continues once all the forked threads have joined. Each forked thread is initially placed at the
node representing the value of the variable the cfor iterates over, and any
subsequent nodecuts in the thread are placed using the migration algorithm for nodecuts
described above. A forked thread may itself execute a cfor statement, in which case
that thread becomes the coordinator for the inner cfor, forking threads and
awaiting their join.
To provide reliability in the face of concurrency, Kairos ensures
serializability of cfor loops. This allows programmers to correctly
understand their Kairos programs in terms of a sequential
execution semantics. The Kairos compiler and runtime ensure serializability by
transparently locking variables accessed in each cfor body. The use of locking
has the potential to cause deadlocks, so we also provide a novel distributed
deadlock detection and recovery algorithm.
Fault-tolerance: Failures are an important
issue in sensor networks. The runtime deals with transient failures such as
message losses and corruptions through a lightweight reliable transport suited
for resource-constrained nodes. The runtime handles node dynamics such as
crashes and additions through a simple retry-based mechanism that extends the
reliable routing and transport mechanisms used in the normal case. The basic
idea is that node failures trigger an undo mechanism similar to that already
used for deadlock recovery, which allows the initiator of the computation to
retry safely, and ignore failed nodes.
Section 4: Results and Contributions:
We have implemented the Kairos compiler and
runtime running on the popular TelosB Tmote Sky mote platform. We have evaluated our system
extensively across metrics such as programming simplicity, reliability,
efficiency and fault-tolerance compared to the status quo [2,3].
Here, we present the key highlights of our evaluation on two sample
applications.
Pursuit-Evasion Games
We compare a Kairos implementation of a
Pursuit-Evasion Game (PEG) against a hand-coded node-level nesC implementation
of the same application written by others on a 40 node mote testbed
(Figure 2). In a PEG, multiple robots (the pursuers) collectively determine the
location of one or more evaders using the sensor network, and try to corral
them.

Figure
2: The pursuer and evader robots (left) and the 40-node topology (right).
Figure
3 shows the relative performance of Kairos version of the PEG (called
Kairos-PEG) relative to the hand-coded version (nesC-PEG). It shows that the Kairos
program requires less than 10% of
lines of code compared to the hand-crafted node-level version (nesC-PEG), and
provides better tracking accuracy due to its consistency semantics. It uses 20%
additional messages and incurs slightly more than 2x additional latency in
order to provide this extra accuracy. Thus, Kairos allows the programmer to be
succinct and express reliability guarantees, and implements these semantics
with an efficiency rivaling hand-crafted versions.

Figure
3: Relative comparison of Kairos-generated code versus hand-coded native
version (lower is better).
Fault-tolerance: Figure 4 shows structural
vibration data gathered from sensors installed on Vincent Thomas Bridge in Los
Angeles. We obtained this data from a real-world deployment by other members of
our group. We ran experiments on the collected traces to see if Kairos could
correctly isolate the faulty sensor and its data. Kairos’ failure-recovery
mechanisms identified the faulty sensor and allowed the data collection experiment
to proceed to completion, as shown in Figure 4.