Delay Composition Algebra

A Reduction-based Approach to Schedulability Analysis in Distributed Real-time Systems


Sponsor: NSF



The theoretical foundations of this work were funded by two NSF grants. EHS 0208769 (A Paradigm for Scalable Open Real-Time Computing under Uncertainty) funded early efforts in 2002-2006 that resulted in Feasible Region Calculus. CSR 0720513 (An Extended Theory for Temporal Composition of Distributed Real-Time Computing Systems) funded follow-up work in 2007-2012 that resulted in Delay Composition Algebra. Applications of this theory were funded by MURI, ARL, ONR, NSF, and DTRA grants.


PART (I): Feasible Region Calculus (2002-2006)



1. Introduction

Timing properties of software (relating to its ability to meet timing constraints) have traditionally been analyzed using real-time scheduling theory. This theory is perhaps one of the most studied and best-understood aspects of real-time computing. However, the current state of the art is far from complete.

Most prior research addressed small systems such as single processors or multiprocessor machines. In contrast, future computing systems will be marked by massive distribution. They will see increased convergence of computation, communication, and sensing, and will augment a widely-distributed physical environment, interacting with it under constraints of real time and space. Despite the very large volume of real-time computing literature that presents different analysis techniques for computing the ability of systems to meet time constraints (called schedulability analysis in real-time computing terminology), there is a lack of a fundamental theory that allows systematic analysis of distributed real-time systems by appropriately composing properties of their components. Consider, by contrast, the fundamental laws of circuit theory used to analyze linear electric circuits. In that theory, a small number of fundamental rules (e.g., Kirchhoff's laws) allow a designer to analyze complex circuits of arbitrary interconnection topology, reducing them to their effective transfer functions and deducing their end-to-end characteristics, such as total impedance, current draw, and voltage drop. Feasible region calculus aims to develop a similar theory for distributed real-time computing, based on results that suggest rules for composition of temporal behavior of real-time system components.

Observe that the capacity of a distributed system is a complicated concept because the rate at which a distributed system can execute an application is a function of not only the speed and topology of the system but also the application flow graph. For example, an application with more internal parallelism will run faster on a given system than one that is primarily serialized. Furthermore, in a time-constrained system, tasks that finish late do not contribute useful work. Hence, it is important to quantify only the fraction of work that can be done on time. We call this fraction the real-time capacity of the system. A mathematical foundation is developed for composing real-time capacity expressions given application flow graphs, time constraints, and resource topology. This foundation utilizes a small set of composition rules to arrive at the corresponding application-aware capacity expressions. We also consider limits when system components become infinitesimal and the number of components needed for an application grows to infinity. In such cases, it will be more appropriate to consider resource density and liquid task flows rather than individual instances of resources and tasks. Capacity expressions are rewritten in terms of density and flow distributions. Such models will be suitable for future computing artifacts such as smart paint and amorphous computing systems, where individual computing components become insignificant and computations may involve very large numbers of components.
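The notion that late work contributes nothing can be made concrete with a toy sketch. This is not the paper's formal capacity definition, merely a hypothetical illustration: given a set of completed jobs, only the work units of jobs that finished by their deadlines count toward capacity.

```python
def realtime_capacity(completed_jobs):
    """Toy illustration (not the formal definition from the project):
    late work contributes nothing, so only on-time work is counted.
    completed_jobs: list of (work_units, finish_time, deadline)."""
    on_time = sum(w for w, f, d in completed_jobs if f <= d)
    total = sum(w for w, _f, _d in completed_jobs)
    return on_time / total if total else 0.0

# Hypothetical workload: the second job misses its deadline,
# so its 6 work units are wasted.
jobs = [(4, 8, 10), (6, 12, 10), (5, 9, 9)]
print(realtime_capacity(jobs))  # → 0.6
```

The same idea scales to the flow-based view described above: in the liquid-flow limit, the sum becomes an integral over flow distributions, but the principle (count only on-time work) is unchanged.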

2. Utilization-Based Feasible Regions

One of the big gaps in the theoretical underpinnings of real-time computing has traditionally lain in the inability to relate aggregate metrics (such as utilization) to per-task temporal performance metrics (such as response time of individual tasks) except in some restricted special cases (such as that of scheduling periodic tasks). The first major result of feasible region calculus has been the derivation of regions, defined in the state space of resource utilization, that imply satisfaction of all individual temporal performance constraints. Utilization bounds for schedulability existed previously only for periodic tasks. However, the periodicity requirement was a limit on applicability. Moreover, no framework existed for composing the bounds of individual components to compute those of aggregate systems. Composition is complicated by the fact that in a multistage system a task can still meet its end-to-end deadline despite overload on some resources as long as others are underloaded. Hence, looking at resources in isolation may be misleading. Instead, a surface exists in a space whose dimensions are the utilizations of individual resources such that all utilization vectors below that surface result in schedulable systems. The shape of that surface is a key research problem addressed by feasible region calculus.
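The idea of a feasibility surface in utilization space can be sketched with a small example. The surface below is NOT one of the bounds derived in this project; it is a hypothetical placeholder (the classic Liu-Layland uniprocessor bound applied to the sum of stage utilizations) that merely shows how a region test works: any utilization vector under the surface implies all deadlines are met.

```python
def liu_layland_bound(n):
    """Classic rate-monotonic utilization bound for n periodic tasks
    on a uniprocessor (Liu & Layland, 1973): n * (2^(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)

def inside_feasible_region(stage_utilizations, n_tasks):
    """Illustrative feasible-region test for a pipeline of resources.
    The surface used here (sum of stage utilizations under a single
    uniprocessor-style bound) is a hypothetical stand-in for the
    surfaces derived in the papers above."""
    return sum(stage_utilizations) <= liu_layland_bound(n_tasks)

# A two-stage pipeline with 3 tasks: the first stage may carry more
# load than an even split as long as the vector stays under the surface.
print(inside_feasible_region([0.4, 0.3], n_tasks=3))  # → True (0.7 <= ~0.78)
```

Note how the test is defined on the utilization *vector*, not on each resource in isolation, which is exactly why overload on one stage can be compensated by slack on another.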

Observe that most scheduling problems are NP-complete, which typically calls for heuristic solutions. We demonstrate that computationally tractable sufficient solutions to the general problem of temporal analysis of distributed resource constrained communicating tasks are also possible if the feasible region is appropriately defined. The following references constitute a brief tutorial on feasible region calculus.

The first basic result: a utilization bound for aperiodic tasks under fixed-priority scheduling. This result drops the periodicity requirement that prevented prior work from being applicable to a large category of systems with no regularity in task arrival patterns.

Extensions of the basic results to multiprocessors (these are derived mainly for the sake of completeness and intellectual curiosity).

Extensions that account for mutual exclusion (blocking): These allow modeling systems of non-independent tasks, in which semaphores or other synchronization constraints are used to regulate access to passive resources.

Extensions of feasible regions to resource pipelines: These basic results allow composing feasible regions of individual components to compute those of distributed systems.

An application of feasible region calculus to compute the real-time capacity of a sensor network:

3. Non-Utilization-Based Feasible Regions

The scheduling policy has a dramatic effect on the ability of a system to meet deadlines. Hence, feasible region calculus must allow composition under different scheduling policies. An interesting and important consideration is that often the scheduling policy is imposed on the application developer by legacy factors such as operating system support. Hence, the scheduling policy is often a constraint and not a design parameter. This is very different from the typical direction in real-time computing, which focuses on analyzing a handful of "good" policies with no general framework for incorporating arbitrary (often suboptimal) scheduling policies into the analysis as design constraints. A significant contribution of our work therefore lies in its ability to analyze arbitrary scheduling policies within the same general framework for quantifying real-time capacity. To the best of our knowledge, ours is the first such general framework in the real-time computing literature.

The incorporation of arbitrary scheduling policies into the capacity analysis has very important ramifications for the analytic engine used. Previous efforts have attempted to relate latency (or ability to meet deadlines) to system state (typically, resource utilization) in distributed systems. The implicit assumption has been that utilization is a good predictor of latency or the ability of a system to meet deadlines. Unfortunately, we have shown that, depending on the resource scheduling policy, utilization may or may not be a good predictor of the ability to meet latency constraints. Instead, we developed an automated method for choosing, for any scheduling policy, a family of load metrics (not necessarily utilization) to represent system state such that it becomes possible to predict (based on any metric in that family) whether or not deadlines are met. Specifically, if the load, as measured by the new metrics, is within a certain range, it is assured that all distributed tasks finish on time. As before, composition rules are defined to determine feasible regions defined on non-utilization-based load metrics.

·         Xue Liu and Tarek Abdelzaher, "Nonutilization Bounds and Feasible Regions for Arbitrary Fixed-Priority Policies," ACM Transactions on Embedded Computing Systems, Vol. 10, Issue 3, April 2011. (An earlier version appeared in the Proceedings of the Twelfth IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006).)


PART (II): Delay Composition Algebra (2007-2012)


The above line of research focused on extensions to the derivation of utilization bounds or utilization-like bounds for schedulability. These extensions addressed aperiodic tasks, new load metrics, and pipelines. The idea was to quantify feasible regions defined in the space of resource utilization such that end-to-end deadlines are met as long as the system operates within the feasible regions derived. Feasible regions, like utilization bounds, however, are high-level derived constructs. A more basic and fundamental question is to ask: how does delay compose in distributed systems? Answering this question led to the development of a new approach for schedulability analysis of distributed systems, based on reducing them to equivalent centralized uniprocessors. This approach, called delay composition algebra, is described below.


4. Sub-additive Delay Composition


Uniprocessor schedulability results are “easy” because delays on a single processor compose by addition. If task A is preempted by task B and task C, then it is preempted by the sum of B’s and C’s (worst case) computation times. This additive property is the foundation for schedulability analysis expressions for uniprocessors. The problem with distributed systems is that delays do not necessarily compose additively. For example, if task A is preempted by B and C on processors X and Y, where X and Y form a pipeline, then task A is not necessarily preempted by the sum of B’s and C’s (worst case) computation times on X and Y. This is due to the inherent concurrency in distributed systems. If A waits for B and C on processor X, then while A finally executes on X, B and C might execute on Y. Hence, A does not wait for all of B and C on Y. The delay composition is therefore sub-additive. The interesting question is, can we describe how delays compose in distributed systems? The following papers derived the basic delay composition theorems for resource pipelines and directed acyclic graphs:
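The gap between additive and sub-additive composition can be sketched numerically. The sub-additive rule below is a deliberately simplified, hypothetical form (each higher-priority job contributes roughly its largest single-stage time, since its other stage executions overlap with the delayed job's progress); the exact theorem statements are in the papers cited above.

```python
def additive_delay(higher_prio):
    """Uniprocessor-style composition: the delayed job waits for every
    higher-priority job on every stage, so stage times simply add.
    higher_prio: list of per-stage computation times, one list per job."""
    return sum(sum(stages) for stages in higher_prio)

def pipelined_delay(higher_prio):
    """Illustrative sub-additive composition for a pipeline: while the
    delayed job runs on one stage, higher-priority jobs proceed on other
    stages, so each contributes only its dominant stage time. This is a
    hypothetical simplification, not the published theorem."""
    return sum(max(stages) for stages in higher_prio)

# Jobs B and C on a two-stage pipeline (processors X then Y).
hp = [[3, 5], [4, 2]]
print(additive_delay(hp))   # → 14 (all stage times summed)
print(pipelined_delay(hp))  # → 9  (concurrency hides the rest)
```

The difference (14 vs. 9) is exactly the concurrency effect described above: A does not wait for all of B and C again on Y once it has waited for them on X.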



The above papers led to the development of a delay composition algebra. It enables a new approach to schedulability analysis in distributed systems: namely, a reduction-based approach. The algebra reduces distributed system workload to an equivalent uniprocessor workload that can be analyzed using uniprocessor schedulability analysis techniques to infer end-to-end delay and schedulability properties of each of the original distributed jobs.



5. The Basic Algebra


The algebraic operators in the delay composition algebra reduce a resource graph (representing a distributed system) into a smaller graph by merging nodes. Each node represents a resource in the system and is characterized by a load matrix that summarizes its workload. When nodes are merged in the original system, their workloads are merged as well into an equivalent workload for the resulting node. The main function of the algebraic operators of delay composition algebra is to perform workload reductions as nodes merge. The reductions stop when only one node remains, meaning that the distributed system workload is fully reduced to that of an equivalent uniprocessor. Analyzing the workload on the resulting uniprocessor using any of the standard uniprocessor schedulability analysis techniques determines the schedulability of the original distributed task set.


A key consideration in developing operators of the new algebra is that the operand set must be closed under composition. In other words, both the operands and the result of each operator must have the same structure. As mentioned above, the operands refer to the load matrices of the nodes being merged. The result is the load matrix of the resulting merged node. Since merged nodes that result from composition represent more than one physical node, the workload representation for a node should be the same whether this node represents a single resource or the result of merging an entire subsystem. A key contribution of this project has been the attainment of such a general representation (the load matrix of a node) and the derivation of simple algebraic operators that manipulate this representation to reduce distributed systems workload into a single-node workload that is equivalent from a schedulability perspective.
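A toy sketch can illustrate both the merge step and the closure property. The load representation here (a plain job-to-stage-time map) and the merge rule (maximization for jobs traversing both nodes, carry-over otherwise) are hypothetical simplifications of the published load matrices and operators; the point is only that the result of a merge has the same shape as its operands, so merges can be chained until one node remains.

```python
def pipe_merge(node_a, node_b):
    """Merge two pipeline stages into one equivalent node.
    Each node is a dict: job -> worst-case stage time. This is a
    simplified stand-in for the algebra's load-matrix operators,
    which combine entries via maximization and addition."""
    merged = {}
    for job in set(node_a) | set(node_b):
        if job in node_a and job in node_b:
            # A job traversing both stages overlaps its executions
            # across them, so the merged node keeps the dominant time.
            merged[job] = max(node_a[job], node_b[job])
        else:
            # A job visiting only one of the two stages carries over.
            merged[job] = node_a.get(job, node_b.get(job))
    return merged

x = {"A": 3, "B": 5}          # stage X serves jobs A and B
y = {"A": 4, "C": 2}          # stage Y serves jobs A and C
equivalent = pipe_merge(x, y)  # same structure as x and y: closure
print(equivalent)              # {'A': 4, 'B': 5, 'C': 2} (key order may vary)
```

Because `pipe_merge` maps two job-to-time dicts to another job-to-time dict, a three-stage pipeline reduces as `pipe_merge(pipe_merge(x, y), z)`, mirroring how the algebra collapses an entire resource graph to a single node.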


Observe that if the operand matrices include variables (e.g., task computation times that are yet unspecified), applying our algebraic operators to reduce the distributed system to a single node yields equivalent uniprocessor load expressions that are a function of those variables. Furthermore, applying schedulability tests (e.g., uniprocessor utilization bounds) to the equivalent load yields schedulability expressions that are a direct function of the original load variables of the distributed system.  This makes it possible to quantify the distributed system parameter space for which schedulability is attained.


In summary, existing techniques for analyzing delay and schedulability of jobs in distributed systems can be broadly classified into two categories: (i) decomposition-based, and (ii) extension-based. The decomposition-based techniques break the system into multiple subsystems, analyze each subsystem independently using current uniprocessor analysis techniques, then combine the results. The extension-based techniques explore ways to extend current uniprocessor analyses to accommodate distributed tasks and resources. In contrast, we developed a third category of techniques for analyzing distributed systems based on reduction (as opposed to decomposition or extension). Rather than breaking up the problem into subproblems, or extending uniprocessor analyses to more complex systems, we systematically reduce the distributed system schedulability problem to a single simple problem on a uniprocessor. The uniprocessor problem is then solved, thereby determining the schedulability of the original distributed system. The following paper describes the basic delay composition algebra:




6. Optimizations and Extensions


The original algebraic framework, published in 2008, was extended and various optimizations were developed. The first extension was to the case where the task flows in the system can form dependency cycles:




Reduction of distributed systems to equivalent uniprocessors led to a new 'uniprocessor' task model that does not naturally arise in uniprocessor systems and hence was not previously explored. New schedulability results had to be developed for this task model to produce a better schedulability test. Namely, in distributed systems of periodic tasks, a task that traverses a given set of processing stages may end up co-located with a different subset of other tasks on each stage of its distributed execution. When the entire task system is reduced to a single equivalent uniprocessor, an effect similar to mode changes is observed: the task set that competes with a given task, say task T, changes over time as task T moves from one stage to another in the original distributed system. The exact times at which the (mode) changes occur are not known in advance, since they depend on task latency on each stage, which is not yet computed. What is known, however, is the set of competing tasks at each stage. Hence, a new problem arises on the reduced equivalent uniprocessor: namely, given task T and the competing task sets in each mode, it is desired to find the worst-case possible time instants for mode changes, taking into account that such changes can occur only when task T moves from one stage to the next in the original distributed system. A polynomial-time algorithm was developed to solve the above problem. Note that this algorithm is orthogonal to the reduction developed in previous years. The reduction merely defines the parameters of the equivalent task set in each mode on the equivalent uniprocessor (given the original execution parameters of tasks in the distributed system). Those parameters are computed via a set of maximization and addition operators. Schedulability analysis of the uniprocessor with mode changes was published in RTSS 2009:
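The mode-change placement problem described above can be illustrated with a brute-force sketch. The per-mode interference rates below are a hypothetical simplification of the competing task sets at each stage, and the exhaustive grid search merely exhibits the problem being solved; the published result solves it with a polynomial-time algorithm.

```python
from itertools import combinations

def worst_case_interference(mode_loads, deadline, step=1):
    """Brute-force illustration of worst-case mode-change placement:
    mode_loads[i] is a (hypothetical) interference rate while task T is
    in mode i. Try every ordered placement of the mode boundaries on a
    grid over [0, deadline] and return the maximum total interference.
    The published polynomial-time algorithm replaces this search."""
    k = len(mode_loads)
    points = range(0, deadline + 1, step)
    best = 0.0
    for cuts in combinations(points, k - 1):  # increasing boundary times
        bounds = [0, *cuts, deadline]
        total = sum(rate * (bounds[i + 1] - bounds[i])
                    for i, rate in enumerate(mode_loads))
        best = max(best, total)
    return best

# Three modes (stages) with different competing-set loads: the worst
# case stretches the heaviest mode over as much of the window as it can.
print(worst_case_interference([0.5, 0.9, 0.3], deadline=10))  # → 9.0
```

The key feature mirrored here is that the adversary may only choose *when* the boundaries fall, not *which* tasks compete in each mode: the competing set per stage is fixed by the original distributed system.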




A comprehensive summary of the algebra was then invited for publication in the Journal of Real-Time Systems:




In an extension of the main algebra line of research, the PIs used the developed delay composition algebra to understand robustness of end-to-end timing behavior of distributed systems to changes in task execution parameters at individual stages. The algebra offers a simple, compact way to relate parameters of individual stages to properties of end-to-end delay. This, in turn, led to the idea of optimizing robustness. In other words, a research question posed was how to allocate finite resources in a distributed system in such a way that robustness is maximized while ensuring that the system remains schedulable. A metric to measure robustness was developed, and an algorithm to maximize robustness (subject to schedulability constraints) was derived and published in RTSS 2010:




A second advance was to develop a closed form real-time capacity result for (a class of) distributed real-time systems. Specifically, we developed a closed form real-time capacity expression for data fusion systems characterized by tree topologies where the workflows originate at the leaves and propagate towards the root. The closed form expression describes the conditions under which end-to-end deadlines of all workflows are met. The result was derived by reducing such a system generically to an equivalent uniprocessor using the reduction techniques developed earlier in the project, then using uniprocessor utilization bounds to derive an expression of the schedulability boundary. A paper on the topic appeared in Fusion 2011:


  • Forrest Iandola, Fatemeh Saremi, Tarek Abdelzaher, Praveen Jayachandran, Aylin Yener, "Real-time Capacity of Networked Data Fusion," in Proceedings of the 14th International Conference on Information Fusion (Fusion '11), Chicago, IL, July 2011.



Finally, the authors extended the above paper to consider the important case where synchronization primitives are used such that a module must wait until all of its inputs arrive before executing. This case was new in that all previous work on the algebra under this project assumed that any input arrival always triggered some work immediately. A good example of systems where such synchronization is used is data fusion systems where different data pipelines merge and fusion results can be computed only when all data inputs have been received. The previously derived delay composition theorem was augmented to account for additional latency imposed due to synchronization. A reduction was obtained from (a category of) systems that feature such synchronization to an equivalent uniprocessor. These results were published in Fusion 2012.




7. Example Applications Related to this Project


The framework proved applicable in many application domains (addressed, in part, under separate funding). The PI and his students demonstrated that their work on schedulability in pipelines can be integrated with recent network optimization techniques to produce wireless networks that dynamically maximize the total bandwidth of schedulable flows:




The algebra developed in this project further allowed latency analysis of new classes of networks, namely disruption-tolerant networks, a result that was published in RTSS 2010: