Compiler Design: Analysis and Transformation
Helmut Seidl, Reinhard Wilhelm, Sebastian Hack (auth.)While compilers for highlevel programming languages are large complex software systems, they have particular characteristics that differentiate them from other software systems. Their functionality is almost completely welldefined  ideally there exist complete precise descriptions of the source and target languages. Additional descriptions of the interfaces to the operating system, programming system and programming environment, and to other compilers and libraries are often available.
The book deals with the optimization phase of compilers. In this phase, programs are transformed in order to increase their efficiency. To preserve the semantics of the programs in these transformations, the compiler has to meet the associated applicability conditions. These are checked using static analysis of the programs. In this book the authors systematically describe the analysis and transformation of imperative and functional programs. In addition to a detailed description of important efficiencyimproving transformations, the book offers a concise introduction to the necessary concepts and methods, namely to operational semantics, lattices, and fixedpoint algorithms.
This book is intended for students of computer science. The book is supported throughout with examples, exercises and program fragments.
 Open in Browser
 Checking other formats...
 Convert to EPUB
 Convert to FB2
 Convert to MOBI
 Convert to TXT
 Convert to RTF
 Converted file can differ from the original. If possible, download the file in its original format.
 Please login to your account first

Need help? Please read our short guide how to send a book to Kindle.
Please note you need to add our email km@bookmail.org to approved email addresses. Read more.


Compiler Design Helmut Seidl Reinhard Wilhelm Sebastian Hack • Compiler Design Analysis and Transformation 123 Sebastian Hack Programming Group Universität des Saarlandes Saarbrücken, Germany Helmut Seidl Fakultät für Informatik Technische Universität München Garching, Germany Reinhard Wilhelm Compiler Research Group Universität des Saarlandes Saarbrücken, Germany ISBN 9783642175473 DOI 10.1007/9783642175480 ISBN 9783642175480 (eBook) Springer Heidelberg New York Dordrecht London Library of Congress Control Number: 2012940955 ACM Codes: D.1, D.3, D.2 Ó SpringerVerlag Berlin Heidelberg 2012 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acidfree paper Springer is part of Springer Science+Business Media (www.springer.com) Preface Compilers for programming languages should translate sourcelanguage programs correctly into targetlanguage programs, often programs of a machine language. But not only that; they should often generate targetmachine code that is as efficient as possible. This book deals with this problem, namely the methods to improve the efficiency of target programs by a compiler. The history of this particular subarea of compilation dates back to the early days of computer science. In the 1950s, a team at IBM led by John Backus implemented a first compiler for the programming language FORTRAN. The target machine was the IBM 704, which was, according to today’s standards, an incredibly small and incredibly slow machine. This motivated the team to think about a translation that would efficiently exploit the very modest machine resources. This was the birth of ‘‘optimizing compilers’’. FORTRAN is an imperative programming language designed for numerical computations. It offers arrays as data structures to store mathematical objects such as vectors and matrices, and it offers loops to formulate iterative algorithms on these objects. Arrays in FORTRAN, as well as in ALGOL 60, are very close to the mathematical objects that are to be stored in them. The descriptional comfort enjoyed by the numerical analyst was at odds with the requirement of runtime efficiency of generated target programs. Several sources for this clash were recognized, and methods to deal with them were discovered. Elements of a multidimensional array are selected through sequences of integervalued expressions, which may lead to complex and expensive computations. Some numerical computations use the same or similar index expressions at different places in the program. Translating them naively may lead to repeatedly computing the same values. Loops often step through arrays with a constant increment or decrement. This may allow us to improve the efficiency by computing the next address using the address used in the last step instead of computing the address anew. By now, it should be clear that arrays and loops represent many challenges if the compiler is to improve a program’s efficiency compared to a straightforward translation. v vi Preface Already the first FORTRAN compiler implemented several efficiency improving program transformations, called optimizing transformations. They should, however, be carefully applied. Otherwise, they would change the semantics of the program. Most such transformations have applicability conditions, which when satisfied guarantee the preservation of the semantics. These conditions, in general, depend on nonlocal properties of the program, which have to be determined by a static analysis of the program performed by the compiler. This led to the development of dataflow analysis. This name was probably chosen to express that it determines the flow of properties of program variables through programs. The underlying theory was developed in the 1970s when the semantics of programming languages had been put on a solid mathematical basis. Two doctoral dissertations had the greatest impact on this field; they were written by Gary A. Kildall (1972) and by Patrick Cousot (1978). Kildall clarified the latticetheoretic foundations of dataflow analysis. Cousot established the relation between the semantics of a programming language and static analyses of programs written in this language. He therefore called such a semanticsbased program analysis abstract interpretation. This relation to the language semantics allows for a correctness proof of static analyses and even for the design of analyses that are correct by construction. Static program analysis in this book always means sound static analysis. This means that the results of such a static analysis can be trusted. A property of a program determined by a static analysis holds for all executions of the program. The origins of dataflow analysis and abstract interpretation thus lie in the area of compilation. However, static analysis has emancipated itself from its origins and has become an important verification method. Static analyses are routinely used in industry to prove safety properties of programs such as the absence of runtime errors. Soundness of the analyses is mandatory here as well. If a sound static analysis determines that a certain runtime error will never occur at a program point, this holds for all executions of the program. However, it may be that a certain runtime error can never happen at a program point, but the analysis is unable to determine this fact. Such analyses thus are sound, but may be incomplete. This is in contrast with bugchasing static analysis, which may fail to detect some errors and may warn about errors that will never occur. These analyses may be unsound and incomplete. Static analyses are also used to prove partial correctness of programs and to check synchronization properties of concurrent programs. Finally, they are used to determine executiontime bounds for embedded realtime systems. Static analyses have become an indispensable tool for the development of reliable software. This book treats the compilation phase that attempts to improve the efficiency of programs by semanticspreserving transformations. It introduces the necessary theory of static program analysis and describes in a precise way both particular static analyses and program transformations. The basis for both is a simple programming language, for which an operational semantics is presented. The volume Wilhelm and Seidl: Compiler Design: Virtual Machines treats several programming paradigms. This volume, therefore, describes analyses and Preface vii transformations for imperative and functional programs. Functional languages are based on the kcalculus and are equipped with a highly developed theory of program transformation. Several colleagues and students contributed to the improvement of this book. We would particularly like to mention Jörg Herter and Iskren Chernev, who carefully read a draft of this translation and pointed out quite a number of problems. We wish the reader an enjoyable and profitable reading. München and Saarbrücken, November 2011 Helmut Seidl Reinhard Wilhelm Sebastian Hack General literature The list of monographs that give an overview of static program analysis and abstract interpretation is surprisingly short. The book by Matthew S. Hecht [Hec77], summarizing the classical knowledge about dataflow analysis is still worth reading. The anthology edited by Steven S. Muchnick and Neil D. Jones [MJ81], which was published only a few years later, contains many original and influential articles about the foundations of static program analysis and, in particular, the static analysis of recursive procedures and dynamically allocated data structures. A similar collection of articles about the static analysis of declarative programs was edited by Samson Abramsky and Chris Hankin [AH87]. A comprehensive and modern introduction is offered by Flemming Nielson, Hanne Riis Nielson and Chris Hankin [NNH99]. Several comprehensive treatments of compilation contain chapters about static analysis [AG04, CT04, ALSU07]. Steven S. Muchnick’s monograph ‘‘Advanced Compiler Design and Implementation’’ [Muc97] contains an extensive treatment.The Compiler Design Handbook, edited by Y.N. Srikant and Priti Shankar [SS03], offers a chapter about shape analysis and about techniques to analyze objectoriented programs. Ongoing attempts to prove compiler correctness [Ler09, TL09] have led to an increased interest in the correctness proofs of optimizing program transformations. Techniques for the systematic derivation of correct program transformations are described by Patrick and Radhia Cousot [CC02]. Automated correctness proofs of optimizing program transformations are described by Sorin Lerner [LMC03, LMRC05, KTL09]. ix Contents 1 Foundations and Intraprocedural Optimization. . 1.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . 1.2 Avoiding Redundant Computations . . . . . . . 1.3 Background: An Operational Semantics . . . . 1.4 Elimination of Redundant Computations . . . . 1.5 Background: Complete Lattices . . . . . . . . . . 1.6 Least Solution or MOP Solution?. . . . . . . . . 1.7 Removal of Assignments to Dead Variables . 1.8 Removal of Assignments Between Variables . 1.9 Constant Folding . . . . . . . . . . . . . . . . . . . . 1.10 Interval Analysis . . . . . . . . . . . . . . . . . . . . 1.11 Alias Analysis . . . . . . . . . . . . . . . . . . . . . . 1.12 FixedPoint Algorithms. . . . . . . . . . . . . . . . 1.13 Elimination of Partial Redundancies. . . . . . . 1.14 Application: Moving LoopInvariant Code . . 1.15 Removal of Partially Dead Assignments . . . . 1.16 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . 1.17 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 7 8 11 16 27 32 40 43 54 67 83 89 97 102 108 114 2 Interprocedural Optimization . . . . . . . . . . . . 2.1 Programs with Procedures . . . . . . . . . . . 2.2 Extended Operational Semantics . . . . . . 2.3 Inlining . . . . . . . . . . . . . . . . . . . . . . . . 2.4 TailCall Optimization . . . . . . . . . . . . . 2.5 Interprocedural Analysis . . . . . . . . . . . . 2.6 The Functional Approach . . . . . . . . . . . 2.7 Interprocedural Reachability . . . . . . . . . 2.8 DemandDriven Interprocedural Analysis 2.9 The CallString Approach . . . . . . . . . . . 2.10 Exercises. . . . . . . . . . . . . . . . . . . . . . . 2.11 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 115 117 121 123 124 125 131 132 135 137 139 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi xii Contents . . . . . . . . . . 141 142 143 146 147 149 155 159 166 170 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 3 Optimization of Functional Programs . . . . . . . . . . . . . . 3.1 A Simple Functional Programming Language . . . . . 3.2 Some Simple Optimizations . . . . . . . . . . . . . . . . . 3.3 Inlining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Specialization of Recursive Functions. . . . . . . . . . . 3.5 An Improved Value Analysis. . . . . . . . . . . . . . . . . 3.6 Elimination of Intermediate Data Structures . . . . . . 3.7 Improving the Evaluation Order: Strictness Analysis 3.8 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 1 Foundations and Intraprocedural Optimization 1.1 Introduction This section presents basic techniques to improve the quality of compilergenerated code. The quality metric need not be a priori fixed. It could be the execution time, the required space, or the consumed energy. This book, however, is primarily concerned with methods to improve the execution time of programs. We now give several examples of how to improve the execution time of programs. One strategy to improve the efficiency of programs is to avoid superfluous computations. A computation may be superfluous when it has already been performed, and when a repetition would provably always produce the same result. The compiler can avoid this recomputation of the same result if it takes care to store the result of the first computation. The recomputation can then be avoided by accessing this stored value. The execution time of a program can be also reduced if some of the computations can already be done at compile time. Constant folding replaces expressions whose value is already known at compile time by this value. This optimization supports the development of generic programs, often called program families. These are parametrized in a number of variables and thus can be instantiated to many different variants by supplying different combinations of parameter values. This is good and effective development practice, for instance, in the embeddedsystems industry. One generic powertrain control program may be instantiated to many different versions for different car engines. Constant folding eliminates the loss in efficiency that could result from such a programming style. Checks for runtime errors can be eliminated if it is clear that they would always fail, that is, if these errors would provably never happen. A good example is the check for index out of bounds. It checks the indices of arrays against their lower and upper bounds. These checks can be avoided if the indices provably always lie within these bounds. Another idea to improve the efficiency of programs is to move computations from more frequently executed program parts into less frequently executed parts. H. Seidl et al., Compiler Design, DOI: 10.1007/9783642175480_1, © SpringerVerlag Berlin Heidelberg 2012 1 2 1 Foundations and Intraprocedural Optimization An example of this kind of optimization is to move loopinvariant computations out of loops. Some operations are more costly in execution time than others. For example, multiplication is more expensive than addition. Multiplication can be defined, and this means also replaced by, repeated addition. An optimization, called reduction in operator strength would, under certain conditions, replace a multiplication occurring in a loop by an addition. Finally, procedure inlining, i.e., replacing a procedure call by an appropriately instantiated body of the procedure, eliminates the procedurecall overhead and often opens the way to new optimizations. The following example shows how big the impact of optimizations on the quality of generated code can be: Example 1.1.1 Consider a program that should sort an array a written in an imperative programming language. This program would use the following function swap: void swap ( int i, int j) { int t; if (a[i] > a[ j]) { t ← a[ j]; a[ j] ← a[i]; a[i] ← t; } } The inefficiencies of this implementation are apparent. The addresses of a[i] and a[ j] are computed three times. This leads to 6 address computations altogether. However, two should be sufficient. In addition, the values of a[i] and a[ j] are loaded twice, resulting in four memory accesses where two should be sufficient. These inefficiencies can be removed by an implementation as suggested by the array concept of the C programming language. The idea is to access array elements through pointers. Another idea is to store addresses that are used multiple times. void swap (int ∗ p, int ∗ q) { int t, ai, a j; ai ← ∗ p; a j ← ∗q; if (ai > a j) { t ← a j; ∗q ← ai; ∗ p ← t; } } Looking more closely at this new code reveals that the temporary variable t can be eliminated as well. This second version is apparently more efficient, while the original version was much more intuitive. Highlevel programming languages are designed to allow intu 1.1 Introduction 3 itive formulations of algorithms. It is then the duty of the compiler to generate efficient target programs. Optimizing program transformations ought to preserve the semantics of the program, as defined through the semantics of the programming language in which the program is written. Example 1.1.2 Consider the transformation: y ← f() + f(); ==⇒ y ← 2 ∗ f(); The idea behind the “optimization” is to save the evaluation of the second call of the function f. However, the program resulting from this transformation is only equivalent to the original program if the second call to f is guaranteed to produce the same result and if the call does not produce a side effect. This last condition is not immediately clear for functions written in an imperative language. Socalled program optimizations are not correct if they change the semantics of the program. Therefore, most optimizing transformations have an associated applicability condition. This is a sufficient condition for the preservation of the semantics of programs. Checking the satisfaction of these applicability conditions is the duty of static program analysis. Such analyses need to be automatic, that is, run without user intervention, as they will have to be performed by the compiler. A careful treatment of the issue of semantics preservation needs several proofs. First, a proof is needed that the applicability condition is, in fact, a sufficient condition for semantics preservation. A second proof is needed that the analysis that is to determine the applicability is correct, will never give wrong answers to the question posed by the applicability condition. Both proofs refer to an operational semantics as their basis. Several optimizations are effective across several classes of programming languages. However, each programming language and also each class of programming languages additionally require specific optimizations, designed to improve the efficiency of particular language constructs. One such example is the compiletime removal of dynamic method invocations in objectoriented programs. A static method call, which replaces a dynamic call, can be inlined and thus opens the door for further optimizations. This is very effective since methods in objectoriented programs are often rather small. In Fortran, on the other hand, inlining does not play a comparably large role. For Fortran, the parallelization or vectorization of nested loops has greater impact. The programming language, in particular its semantics, also has a strong influence on the efficiency and the effectiveness of program analyses. The programming language may enforce restrictions whose validation would otherwise require an enormous effort. A major problem in the analysis of imperative programs is the determination of dependencies between the statements in programs. Such dependencies restrict the compiler’s possibility to reorder statements to better exploit the resources 4 1 Foundations and Intraprocedural Optimization of the target machine. The unrestricted use of pointers, as in the C programming language, makes this analysis of dependencies difficult due to the aliasproblem created through pointers. The more restricted use of pointers in Java eases the corresponding analysis. Example 1.1.3 Let us look at the programming language Java. Inherently inefficient language constructs are the mandatory checks for indices out of array bounds, dynamic method invocation, and storage management for objects. The absence of pointer arithmetic and of pointers into the stack increases the analyzability of Java programs. On the other hand, dynamic loading of classes may ruin the precision of Java analyses due to the lack of information about their semantics and their implementation. Further tough challenges for an automatic static analysis are offered by language constructs such as exceptions, concurrency, and reflection, which still may be useful for the Java programmer. We have stressed in the preface that sound static program analysis has become a verification technology. It is therefore interesting to draw the connection to the problem of proving the correctness of Java programs. Any correctness proof needs a formally specified semantics of the programming language. Quite some effort went into the development of such a semantics for Java. Still, Java programs with a formal correctness proof are rather rare, not because of a principal impossibility, but due to the sheer size of the necessary effort. Java just has too many language constructs, each with its nontrivial semantics. For this reason, we will not use Java as our example language. Instead we use a small subset of an imperative programming language. This subset is, on the one hand, simple enough to limit the descriptional effort, and is, on the other hand, realistic enough to include essential problems of actual compilers. This programminglanguage fragment can be seen as an intermediate language into which source programs are translated. The int variables of the program can be seen as virtual registers. The compiler backend will, during register allocation, assign physical registers to them as far as such physical registers are available. Such variables can also be used to store addresses for indirect memory accesses. Arithemtic expressions represent computations of, in our fragment, int values. Finally, the fragment contains an abitrarily large array M, into which int values can be stored and from which they can be retrieved. This array can be imagined as the whole (virtual) memory allocated to a program by the operating system. The separation between variables and memory may, at first glance, look somewhat artificial. It is motivated by the wish to avoid the alias problem. Both a variable x and a memoryaccess expression M[·] denote containers for values. The identity of a memory cell denoted by M[e] is not directly visible because it depends on the value of the expression e. In general, it is even undecidable whether M[e1 ] and M[e2 ] denote the same memory cell. This is different for variables: A variable name x is the only name by which the container associated with x can be accessed. This is important for many program analyses: If the analysis is unable to derive the identity of the memory cell denoted by M[e] in a write access then no assumptions can be made about the 1.1 Introduction 5 contents of the rest of memory. The analysis looses much precision. The derivation of assumptions about the contents of containers associated with variables is easier since no indirect access to their containers is possible. Our language fragment has the following constructs: • variables : • arithmetic expressions : • assignments : • reading access to memory : • writing access to memory : • conditional statement : • unconditional jump : x e x ←e x ← M[e] M[e1 ] ← e2 if(e) s1 else s2 gotoL Note that we have not included explicit loop constructs. These can be realized by conditional and unconditional jumps to labeled program points. Also missing so far are functions and procedures. This chapter is therefore restricted to the analysis and optimization of single functions. Example 1.1.4 Let us again consider the function swap() of Example 1.1.1. How would a compiler translate the body of this function into our language fragment? The array a can be allocated into some section of the memory M. Accesses to array components need to be translated into explicit address calculations. The result of a schematic, nonoptimized translation could be: 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10 : 11 : 12 : 13 : // A0 = &a[0] A1 ← A0 + 1 ∗ i; // R1 = a[i] R1 ← M[A1 ]; A2 ← A0 + 1 ∗ j; // R2 = a[ j] R2 ← M[A2 ]; if (R1 > R2 ) { ← A0 + 1 ∗ j; A3 t ← M[A3 ]; ← A0 + 1 ∗ j; A4 ← A0 + 1 ∗ i; A5 ← M[A5 ]; R3 M[A4 ] ← R3 ; ← A0 + 1 ∗ i; A6 M[A6 ] ← t; } // We assume that variable A0 holds the start address of the array a. Note that this code makes explicit the inherent inefficiencies discussed in Example 1.1.1. Which optimizations are applicable to this code? Optimization 1: 1 ∗ R ==⇒ R The scaling factor generated by an automatic (and schematic) translation of array indexing can be dispensed with if this factor is 1 as is the case in the example. 6 1 Foundations and Intraprocedural Optimization Optimization 2: Reuse of values calculated for (sub)expressions A closer look at the example shows that the variables A1 , A5 , and A6 have the same values as is the case for the variables A2 , A3 , and A4 : A1 = A5 = A6 A2 = A3 = A4 In addition, the memory accesses M[A1 ] and M[A5 ] as well as the accesses M[A2 ] and M[A3 ] will deliver the same values: M[A1 ] = M[A5 ] M[A2 ] = M[A3 ] Therefore, the variables R1 and R3 , as well as the variables R2 and t also contain the same values: R2 = t R1 = R3 If a variable x already contains the value of an expression e whose value is required then x’s value can be used instead of reevaluating the expression e. The program can be greatly simplified by exploiting all this information: A1 ← A0 + i; R1 ← M[A1 ]; A2 ← A0 + j; R2 ← M[A2 ]; if (R1 > R2 ) { M[A2 ] ← R1 ; M[A1 ] ← R2 ; } The temporary variable t as well as the variables A3 , A4 , A5 , and R3 are now superfluous and can be eliminated from the program. The following table lists the achieved savings: + * load store > ← Before After 6 6 4 2 1 6 2 0 2 2 1 2 1.1 Introduction 7 The optimizations applied to the function swap “by hand” should, of course, be done in an automated way. The following sections will introduce the necessary analyses and transformations. 1.2 Avoiding Redundant Computations This chapter presents a number of techniques to save computations that the program would otherwise needlessly perform. We start with an optimization that avoids redundant computations, that is, multiple evaluations of the same expression guaranteed to produce the same result. This first example is also used to exemplify fundamentals of the approach. In particular, an operational semantics of our language fragment is introduced in a concise way, and the necessary latticetheoretic foundations are discussed. A frequently used trick to speed up algorithms is to trade time against space, more precisely, invest some additional space in order to speed up the program’s execution. The additional space is used to save some computed values. These values are then later retrieved instead of recomputed. This technique is often called memoization. Let us consider the profitability of such a transformation replacing a recomputation by an access to a stored value. Additional space is needed for the storage of this value. The recomputation does not disappear completely, but is replaced by an access to the stored value. This access is cheap if the value is kept in a register, but it can also be expensive if the value has to be kept in memory. In the latter case, recomputing the value may, in fact, be cheaper. To keep things simple, we will ignore such considerations of the costs and benefits, which are highly architecturedependent. Instead, we assume that accessing a stored value is always cheaper than recomputing it. The computations we consider here are evaluations of expressions. The first problem is to recognize potential recomputations. Example 1.2.1 Consider the following program fragment: A: B: z ← 1; y ← M[5]; x1 ← y + z ; ... x2 ← y + z ; It seems like at program point B, the expression y + z will be evaluated a second time yielding the same value. This is true under the following conditions: The occurrence of y + z at program point B is always evaluated after the one at program point A, and the values of the variables y and z have the same values before B that they had before A. 8 1 Foundations and Intraprocedural Optimization Our conclusion from the example is that for a systematic treatment of this optimization we need to be able to answer the following questions: • Will one evaluation of an expression always be executed before another one? • Does a variable always have the same value at a given program point that it had at another program point? To answer these types of questions, we need several things: an operational semantics, which defines what happens when a program is executed, and a method that identifies redundant computations in programs. Note that we are not so ambitious as to attempt to identify all redundant computations. This problem is undecidable. In practice, the method to be developed should at least find some redundant computations and should never classify as redundant a computation that, in fact, is not redundant. 1.3 Background: An Operational Semantics Smallstep operational semantics have been found to be quite adequate for correctness proofs of program analyses and transformations. Such a semantics formalizes what a step in a computation is. A computation is then a sequence of such steps. We start by choosing a suitable program representation, controlflow graphs. The vertices of these graphs correspond to program points; we will therefore refer to these vertices as program points. Program execution traverses these vertices. The edges of the graph correspond to steps of the computation. They are labeled with the corresponding program actions, that is, with conditions, assignments, loads and stores from and to memory, or with the empty statement, “;”. Program point start represents the entry point of the program, and stop the exit point. Possible edge labels are: test: assignment: load: store: empty statement: NonZero (e) or Zero (e) x ←e x ← M[e] M[e1 ] ← e2 ; A section of the controlflow graph for the body of the function swap is shown in Fig. 1.1. Sometimes, we omit an edge label ;. A conditional statement with condition e in a program has two corresponding edges in the controlflow graph. The one labeled with NonZero(e) is taken if the condition e is satisfied. This is the case when e evaluates to some value not equal to 0. The edge labeled with Zero is taken if the condition is not satisfied, i.e., when e evaluates to 0. Computations are performed when paths of the controlflow graph are traversed. They transform the program state. Program states can be represented as pairs s = (ρ, μ) 1.3 Background: An Operational Semantics 9 start A1 ← A 0 + 1 ∗ i R1 ← M [A1 ] A2 ← A 0 + 1 ∗ j R2 ← M [A2 ] Zero (R1 > R2 ) NonZero (R1 > R2 ) stop A3 ← A0 + 1 ∗ j Fig. 1.1 A section of the controlflow graph for swap() The function ρ maps each program variable to its actual value, and the function μ maps each memory address to the actual contents of the corresponding memory cell. For simplicity, we assume that the values of variables and memory cells are integers. The types of the functions ρ and μ, thus, are: ρ : Vars → int value of variables μ : N → int memory contents An edge k = (u, lab, v) with source vertex u, target vertex v and label lab defines a transformation [[k]] of the state before the execution of the action labeling the edge to a state after the execution of the action. We call this transformation the effect of the edge. The edge effect need not be a total function. It could also be a partial function. Program execution in a state s will not execute the action associated with an edge if the edge effect is undefined for s. There may be two reasons for an edge being undefined: The edge may be labeled with a condition that is not satsified in all states, or the action labeling the edge may cause a memory access outside of a legal range. The edge effect [[k]] of the edge k = (u, lab, v) only depends on its label lab: [[k]] = [[lab]] The edge effects [[lab]] are defined as follows: [[; ]] (ρ, μ) [[NonZero(e)]] (ρ, μ) [[Zero(e)]] (ρ, μ) [[x ← e]] (ρ, μ) = = = = (ρ, μ) (ρ, μ) (ρ, μ) ( ρ ⊕ {x → [[e]] ρ} , μ) [[x ← M[e]]] (ρ, μ) = ( ρ ⊕ {x → μ([[e]]ρ)} , μ) [[M[e1 ] ← e2 ]] (ρ, μ) = (ρ, μ ⊕ {[[e1 ]]ρ → [[e2 ]]ρ} ) if [[e]] ρ = 0 if [[e]] ρ = 0 10 1 Foundations and Intraprocedural Optimization An empty statement does not change the state. Conditions NonZero(e) and Zero(e), represent partial identities; the associated edge effects are only defined if the conditions are satisfied, that is if the expression e evaluated to a value not equal to or equal to 0, resp. They do, however, not change the state. Expressions e are evaluated by an auxiliary function [[e]], which takes a variable binding ρ of the program’s variables and calculates e’s value in the valuation ρ. As usual, this function is defined by induction over the structure of expressions. This is now shown for some examples: [[x + y]] {x → 7, y → −1} = 6 [[¬(x = 4)]] {x → 5} = ¬0 = 1 The operator ¬ denotes logical negation. An assignment x ← e modifies the ρcomponent of the state. The resulting ρ holds the value [[e]] ρ for variable x, that is, the value obtained by evaluating e in the old variable binding ρ. The memory M remains unchanged by this assignment. The formal definition of the change to ρ uses an operator ⊕. This operator modifies a given function such that it maps a given argument to a given new value: d if y ≡ x ρ ⊕ {x → d}(y) = ρ(y) otherwise A load action, x ← M[e], is similar to an assignment with the difference that the new value of variable x is determined by first calculating a memory address and then loading the value stored at this address from memory. The store operation, M[e1 ] ← e2 , has the most complex semantics. Values of variables do not change. The following sequence of steps is performed: The values of the expressions e1 , e2 are computed. e1 ’s value is the address of a memory cell at which the value of e2 is stored. We assume for both load and store operations that the address expressions deliver legal addresses, i.e., values > 0. Example 1.3.1 An assignment x ← x + 1 in a variable binding {x → 5} results in: [[x ← x + 1]] ({x → 5}, μ) = (ρ, μ) where: ρ = {x → 5} ⊕ {x → [[x + 1]] {x → 5}} = {x → 5} ⊕ {x → 6} = {x → 6} We have now established what happens when edges of the controlflow graph are traversed. A computation is (the traversal of) a path in the controlflow graph leading 1.3 Background: An Operational Semantics 11 from a starting point u to an endpoint v. Such a path is a sequence π = k1 . . . kn of edges ki = (u i , labi , u i+1 ) of the controlflow graph (i = 1, . . . , n − 1), where u 1 = u and u n = v. The state transformation [[π]] corresponding to π is obtained as the composition of the edge effects of the edges of π: [[π]] = [[kn ]] ◦ . . . ◦ [[k1 ]] Note that, again, the function [[π]] need not be defined for all states. A computation along π starting in state s is only possible if [[π]] is defined for s. 1.4 Elimination of Redundant Computations Let us return to our starting point, the attempt to find an analysis that determines for each program point whether an expression has to be newly evaluated or whether it has an already computed value. The method to do this is to identify expressions available in variables. An expression e is available in variable x at some program point if it has been evaluated before, the resulting value has been assigned to x, and neither x nor any of the variables in e have been modified in between. Consider an assignment x ← e such that x ∈ Vars(e), that is, x does not occur in e. Let π = k1 . . . kn be a path from the entry point of the program to a program point v. The expression e is available in x at v if the two following conditions hold: • The path π contains an edge ki , labeled with an assignment x ← e. • No edge ki+1 , . . . , kn is labeled with an assignment to one of the variables in Vars(e) ∪ {x}. For simplicity, we say in this case that the assignment x ← e is available at v Otherwise, we call e or x ← e, resp., not available in x at v. We assume that no assignment is available at the entry point of the program. So, none are available at the end of an empty path π = . Regard an edge k = (u, lab, v) and assume we knew the set A of assignments available at u, i.e., at the source of k. The action labeling this edge determines which assignments are added to or removed from the availability set A. We look for a function [[k]] such that the set of assignments available at v, i.e., at the target of k, is obtained by applying [[k]] to A. This function [[k]] should only depend on the label of k. It is called the abstract edge effect in contrast to the concrete edge effect of the operational semantics. We now define the abstract edge effects [[k]] = [[lab]] for different types of actions. Let Ass be the set of all assignments of the form x ← e in the program and with the constraint that x ∈ Vars(e). An assignment violating this constraint cannot be considered as available at the subsequent program point and therefore are excluded from the set Ass. Let us assume that A ⊆ Ass is available at the source u of the edge 12 1 Foundations and Intraprocedural Optimization k = (u, lab, v). The set of assignments available at the target of k is determined according to: [[; ]] A = A A = A [[NonZero(e)]] A = [[Zero(e)]] (A\Occ(x)) ∪ {x ← e} if x ∈ Vars(e) [[x ← e]] A = A\Occ(x) otherwise [[x ← M[e]]] A = A\Occ(x) [[M[e1 ] ← e2 ]] A = A where Occ(x) denotes the set of all assignments in which x occurs either on the left or in the expression on the right side. An empty statement and a condition do not change the set of available assignments. Executing an assignment to x means evaluating the expression on the right side and assigning the resulting value to x. Therefore, all assignments that contain an occurrence of x are removed from the availableset. Following this, the actual assignment is added to the availableset provided x does not occur in the right side. The abstract edge effect for loads from memory looks similar. Storing into memory does not change the value of any variable, hence, A remains unchanged. The abstract effects, which were just defined for each type of label, are composed to an abstract effect [[π]] for a path π = k1 . . . kn in the following way: [[π]] = [[kn ]] ◦ . . . ◦ [[k1 ]] The set of assignments available at the end of a path π from the entry point of the program to program point v is therefore obtained as: [[π]] ∅ = [[kn ]] (. . . ([[k1 ]] ∅) . . .) Applying such a function associated with a path π can be used to determine which assignments are available along the path. However, a program will typically have several paths leading to a program point v. Which of these paths will actually be taken at program execution may depend on program input and is therefore unknown at analysis time. We define an assignment x ← e to be definitely available at a program point v if it is available along all paths leading from the entry node of the program to v. Otherwise, x ← e is possibly not available at v. Thus, the set of assignments definitely available at a program point v is: A∗ [v] = {[[π]] ∅  π : start →∗ v} where start →∗ v denotes the set of all paths from the entry point start of the program to the program point v. The sets A[v] are called the mergeoverallpaths (MOP) solution of the analysis problem. We temporarily postpone the question of 1.4 Elimination of Redundant Computations 13 A1 ← A + 7 A1 ← A + 7 B1 ← M [A1 ] B1 ← M [A1 ] B2 ← B1 − 1 B2 ← B1 − 1 A2 ← A + 7 A2 ← A1 M [A2 ] ← B2 M [A2 ] ← B2 Fig. 1.2 Transformation RE applied to the code for a[7]−−; how to compute these sets. Instead, we discuss how the analysis information can be used for optimizing the program. Transformation RE: An assignment x ← e is replaced by an assignment x ← y, if y ← e is definitely available at program point u just before this assignment, i.e., y ← e is contained in the set A∗ [u]. This is formally described by the following graph rewrite rule: u y ← e ∈ A∗ [u] x←e u x←y Analogous rules describe the replacement of expressions by variable accesses in conditions, in loads from and in stores into memory. The transformation RE is called redundancy elimination. The transformation appears quite simple. It may, however, require quite some effort to compute the program properties necessary to ascertain the applicability of the transformation. Example 1.4.1 Regard the following program fragment: x ← y + 3; x ← 7; z ← y + 3; The assignment x ← y + 3 is not available before, but it is available after the first statement. The second assignment overwrites the value of x. So, the third assignment can not be simplified using rule RE. Example 1.4.2 Consider the C statement a[7]; — as implemented in our language fragment. Assume that the start address of the array a is contained in variable A. Figure 1.2 shows the original controlflow graph of the program fragment together with the application of transformation rule RE. The right side, A + 7, of the 14 1 Foundations and Intraprocedural Optimization assignment A2 ← A + 7 can be replaced by the variable A1 since the assignment A1 ← A + 7 is definitely available just before the assignment A2 ← A + 7. According to transformation RE, the evaluation of an expression is not always replaced by a variable lookup, when the evaluation is definitely repeated. Additionally, the result of the last evaluation still should be available in a variable, see Example 1.4.1. In order to increase applicability of the transformation, a compiler therefore could introduce a dedicated variable for each expression occurring in the program. To develop a corresponding transformation is the task of Exercise 5. To decide when the application of the transformation RE is profitable can be nontrivial. Storing values of subexpressions costs storage resources. Access to stored values will be fast if the compiler succeeds to keep the values in registers. However, registers are scarce. Spending one of them for temporary storage may cause more costs somewhere else. Storing the value in memory, on the other hand, will result in long access times in which case it may be cheaper to reevaluate the expression. Let us turn to the correctness proof of the described transformation. It can be split into two parts: 1. The proof of correctness of the abstract edge effects [[k]] with respect to the definition of availability; 2. The proof of correctness of the replacement of definitely available expressions by accesses to variables. We only treat the second part. Note that availability of expressions has been introduced by means of semantic terms, namely the evaluation of expressions and the assignment of their values to variables. In order to formulate the analysis, we then secretly switched to syntactic terms namely, labeled edges of the controlflow graph, paths in this graph, and occurrences of variables on the left and right side of assignments or in conditions. The proof thus has to connect syntax with semantics. Let π be a path leading from the entry point of the program to a program point u, and let s = (ρ, μ) be the state after the execution of the path π. Let y ← e be an assignment such that y ∈ Vars(e) holds and that y ← e is available at u. It can be shown by induction over the length of executed paths π that the value of y in state s is equal to the value of the expression e when evaluated in the valuation ρ, i.e., ρ(y) = [[e]] ρ. Assume that program point u has an outgoing edge k labeled with assignment x ← e, and that y ← e is contained in A∗ [u], i.e., definitely available. y ← e is in particular available at the end of path π. Therefore, ρ(y) = [[e]] ρ holds. Under this condition, the assignment x ← e can be replaced by x ← y. The proof guarantees the correctness of the analysis and the associated transformation. But what about the precision of the analysis? Does a compiler realizing this analysis miss some opportunity to remove redundant computations, and if so, why? There are, in fact, several reasons why this can happen. The first reason is caused by infeasible paths. We have seen in Sect. 1.3 that a path may be not executable in all states or even in not any states at all. In the latter case, such a path is called infeasible. The composition of the concrete edge effects of such a path is not defined anywhere. 1.4 Elimination of Redundant Computations 15 0 y←1 Zero(x > 1) 5 1 NonZero(x > 1) 2 y ←x∗y 3 x←x−1 4 A[0] ⊆ ∅ A[1] ⊆ (A[0]\Occ(y)) ∪ {y ← 1} A[1] ⊆ A[4] A[2] ⊆ A[1] A[3] ⊆ A[2]\Occ(y) A[4] ⊆ A[3]\Occ(x) A[5] ⊆ A[1] Fig. 1.3 The system of inequalities for the factorial function The abstract edge effects of our analysis, however, are total functions. They do not know about infeasibility. Such a path would be considered in forming the intersection in the definition of definite availability and may pollute the information if this path does not contain an assignment available on all other paths. A second reason is the following: Assume that the assignment x ← y + z is available at program point u, and that there exists an edge k = (u, y ← e, v) leaving u. Assume further that the value of e is always the one that y has at u. In this case, the transformation replacing y + z by x would still be correct although x ← y + z would no longer be recognized as available at v. An important question remains: How are the sets A∗ [u] computed? The main idea is to derive from the program a system of inequalities that characterizes these values: A[start] ⊆ ∅ A[v] ⊆ [[k]] (A[u]) for an edge k = (u, lab, v) The first inequality expresses the assumption that no assignments are available at the entry of the program. Further, each edge k leading from a node u to a node v generates an inequality of the second kind. [[k]] (A[u]) are the assignments that are propagated as available along the edge k = (u, lab, v), either since they were available at u and “survived” [[k]] or since they were made available by [[k]]. This set is at most available at v since other edges may target v along which these assignments might not be available. Example 1.4.3 Let us consider the program implementing the factorial function as in Fig. 1.3. We see that the system of inequalities can be produced from the controlflow graph and the abstract edge transformers in a straightforward way. The only assignment whose leftside variable does not also occur on the right side is y ← 1. The complete lattice for the analysis of available assignments therefore consists of only two elements, ∅ and {y ← 1}. Correspondingly, Occ(y) = {y ← 1} and Occ(x) = ∅ hold. 16 1 Foundations and Intraprocedural Optimization A[0] = A[1] = A[2] = A[3] = A[4] = A[5] = ∅ Fig. 1.4 A trivial solution of the system of inequalities of Example 1.4.3 Figure 1.4 shows a trivial solution of this system of inequalities. In this case, this is the only solution. In general, there could be several solutions. In the availableassignment analysis, we are interested in largest sets. The larger the sets, the more assignments have been shown to be available, and the more optimizations can be performed. In consequence, we consider an analysis more precise that identifies more assignments as available. In this case, the largest solution is the best solution. The question is, does a best solution always exist? If yes, can it be efficiently computed? We generalize the problem a bit to be able to systematically answer the question as to the existence of best solutions of systems of inequalities and as to their efficient computation. This general treatment will provide us universal algorithms for solving virtually all programanalysis problems in this book. The first observation is that the set of possible values for the unknowns A[v] forms a partial order with respect to the subset relation ⊆. The same holds for the superset relation ⊇. These partial orders have the additional property that each subset of X has a least upper bound and a greatest lower bound, namely the union and the intersection, respectively, of the sets in X . Such a partial order is called a complete lattice. A further observation is that the abstract edge transformers [[k]] are monotonic functions, that is, they preserve the ordering relation between values: [[k]] (B1 ) ⊇ [[k]] (B2 ) if B1 ⊇ B2 1.5 Background: Complete Lattices This section presents fundamental notions and theorems about complete lattices, solutions of systems of inequalities, and the essentials of methods to compute least solutions. The reader should not be confused about best solutions being least solutions, although in the availableassignments analysis the largest solution was claimed to be the best solution. The following treatment is in terms of partial orders, , where less is, by convention, always more precise. In the case of available assignments, we therefore take the liberty to set = ⊇. We start with definitions of partial orders and complete lattices. A set D together with a relation on D × D is called a partial order if for all a, b, c ∈ D it holds that: 1.5 Background: Complete Lattices 17 aa a b ∧ b a =⇒ a = b a b ∧ b c =⇒ a c reflexivity antisymmetry transitivity The sets we consider in this book consist of information at program points about potential or definite program behaviors. In our running example, such a piece of information at a program point is a set of available assignments. The ordering relation indicates precision. By convention, less means more precise. More precise in the context of program optimizations should mean enabling more optimizations. For the availableassignments analysis, more available assignments means potentially more enabled optimizations. So, the ordering relation is the superset relation ⊇. We give some examples of partial orders, representing lattices graphically as directed graphs. Vertices are the lattice elements. Edges are directed upwards and represent the relation. Vertices not connected by a sequence of edges are incomparable by . 1. The set 2{a,b,c} of all subsets of the set {a, b, c} together with the relation ⊆: a, b, c a, b a, c b, c a b c 2. The set of all integer numbers Z together with the relation ≤: 2 1 0 1 3. The set of all integer numbers Z⊥ = Z ∪ {⊥}, extended by an additional element ⊥ together with the order: 2 1 0 1 2 ⊥ An element d ∈ D is called an upper bound for a subset X ⊆ D if x d for all x ∈ X An element d is called a least upper bound of X if 1. d is an upper bound of X , and 2. d y holds for each upper bound y of X . 18 1 Foundations and Intraprocedural Optimization Not every subset of a partially ordered set has an upper bound, let alone a least upper bound. The set {0, 2, 4} has the upper bounds 4, 5, . . . in the partially ordered set Z of integer numbers, with the natural order ≤, while the set {0, 2, 4, . . .} of all even numbers has no upper bound. A partial order D is a complete lattice if each subset X ⊆ D possesses a least upper bound. This least upper bound is represented as X . Forming the least upper bound of a set of elements is an important operation in program analysis. Let us consider the situation that several edges of the controlflow graph have the same target node v. The abstract edge effects associated with these edges propagate different information towards v. The least upper bound operator then can be applied to combine the incoming information in a sound way to a value at v. Each element is an upper bound of the empty set of elements of D. The least upper bound ⊥ of the empty set, therefore, is less than or equal to any other element of the complete lattice. This least element is called the bottom element of the lattice. The set of all elements of a complete lattice also possesses an upper bound. Each complete lattice therefore also has a greatest element, , called the top element. Let us consider the partial orders of our examples. We have: 1. The set D = 2{a,b,c} of all subsets of the basic set {a, b, c} and, in general, of each base set together with the subset relation is a complete lattice. 2. The set Z of the integer numbers with the partial order ≤ is not a complete lattice. 3. The set Z together with the equality relation = is also not a clomplete lattice. A complete lattice, however, is obtained if an extra least element, ⊥, and an extra greatest element, , is added: 2 1 0 1 2 ⊥ This lattice Z ⊥ = Z ∪ {⊥, } contains only a minimum of pairs in the ordering relation. Such lattices are called flat. In analogy to upper and least upper bounds, one can define lower and greatest lower bounds for subsets of partially ordered sets. For a warmup, we prove the following theorem: Theorem 1.5.1 Each subset X of a complete lattice D has a greatest lower bound X. Proof Let U = {u ∈ D  ∀ x ∈ X : u x}the set of all lower bounds of the set X . The set U has a least upper bound g : = U since D is a complete lattice. We claim that g is the desired greatest lower bound of X . We first show that g is a lower bound of the set X . For this, we take an arbitrary element x ∈ X . It holds u x for each u ∈ U , since each u ∈ U is even a lower bound for the whole set X . Therefore, x is an upper bound of the set U , and therefore 1.5 Background: Complete Lattices 19 Fig. 1.5 The least upper bound and the greatest lower bound for a subset X greater than or equal to the least upper bound of U , i.e., g x. Since x was an arbitrary element, g is in deed a lower bound of X . Since g is an upper bound of U and therefore greater than or equal to each element in U , i.e., u g for all u ∈ U , g is the greatest lower bound of X , which completes the proof. Figure 1.5 shows a complete lattice, a subset, and its greatest lower and least upper bounds. That each of its subsets has a least upper bound makes a complete lattice out of a partially ordered set. Theorem 1.5.1 says that each subset also has a greatest lower bound. Back to our search for ways to determine solutions for systems of inequalities! Recall that the unknowns in the inequalities for the analysis of available assignments are the sets A[u] for all program points u. The complete lattice D of values for these unknowns is the powerset lattice 2Ass , where the partial order is the superset relation ⊇. All inequalities for the same unknown v can be combined into one inequality by applying the least upper bound operator to the right sides of the original inequalities. This leads to the form: A[start] ⊆ ∅ A[v] ⊆ {[[k]] (A[u])  k = (u, lab, v) edge} for v = start This reformulation does not change the set of solutions due to x d1 ∧ . . . ∧ x dk iff x {d1 , . . . , dk } 20 1 Foundations and Intraprocedural Optimization As a result, we obtain the generic form of a system of inequalities specifying a programanalysis problem: xi f i (x1 , . . . , xn ) i = 1, . . . , n The functions f i : Dn → D describe how the unknowns xi depend on other unknowns. One essential property of the functions f i that define the right sides of the inequalities is their monotonicity. This property guarantees that an increase of values on righthand sides, may have no impact or increase also the values on the lefthand sides. A function f : D1 → D2 between the two partial orders D1 , D2 is monotonic, if a b implies f (a) f (b). For simplicity, the two partial orders in D1 and in D2 have been represented by the same symbol, . Example 1.5.1 For a set U , let D1 = D2 = 2U be the powerset lattice with the partial order ⊆. Each function f defined through f x = (x ∩ a) ∪ b for a, b ⊆ U is monotonic. A function g defined through g x = a \ x for a = ∅, however, is not monotonic. The functions inc and dec defined as inc x = x + 1 and dec x = x − 1 are monotonic on D1 = D2 = Z together with the partial order “≤”. The function inv defined through inv x = −x is not monotonic. If the functions f 1 : D1 → D2 and f 2 : D2 → D3 are monotonic so is their composition f 2 ◦ f 1 : D1 → D3 . If D2 is a complete lattice then the set [D1 → D2 ] of monotonic functions f : D1 → D2 forms a complete lattice, where f g iff f x g x for all x ∈ D1 holds. In particular, for F ⊆ [D1 → D2 ] the function f defined by f x = {g x  g ∈ F} is again monotonic, and it is the least upper bound of the set F. Let us consider the case D1 = D2 = 2U . For functions f i x = ai ∩ x ∪ bi , where ai , bi ⊆ U , the operations “◦”, “” and “” can be described by operations on the sets ai , bi : ( f 2 ◦ f 1 ) x = a1 ∩ a2 ∩ x ∪ a2 ∩ b1 ∪ b2 composition ( f 1 f 2 ) x = (a1 ∪ a2 ) ∩ x ∪ b1 ∪ b2 union ( f 1 f 2 ) x = (a1 ∪ b1 ) ∩ (a2 ∪ b2 ) ∩ x ∪ b1 ∩ b2 intersection Functions of this form occur often in socalled bitvector frameworks. Our goal is to find a least solution in a complete lattice D for the system of inequalities xi f i (x1 , . . . , xn ), i = 1, . . . , n (∗) 1.5 Background: Complete Lattices 21 where the functions f i : Dn → D that define the right sides of the inequalities are monotonic. We exploit that Dn is a complete lattice if D is one. We combine the n functions f i to one function f : Dn → Dn to simplify the presentation of the underlying problem. This function f is defined through f (x1 , . . . , xn ) = (y1 , . . . , yn ), where yi = f i (x1 , . . . , xn ). It turns out that this constructions leads from monotonic component functions to a monotonic combined function. This transformation of the problem has reduced our problem to one of finding a least solution for a single inequality x f x, however in the slightly more complex complete lattice Dn . The search proceeds in the following way: It starts with an element d that is as small as possible, for instance, with d = ⊥ = (⊥, . . . , ⊥), the least element in Dn . In case d f d holds, a solution has been found. Otherwise, d is replaced by f d and tested for being a solution. If not, f is applied to f d and so on. Example 1.5.2 Consider the complete lattice D = 2{a,b,c} with the partial order = ⊆ and the system of inequalities: x1 ⊇ {a} ∪ x3 x2 ⊇ x3 ∩ {a, b} x3 ⊇ x1 ∪ {c} The iterative search for a least solution produces the results for the different iteration steps as they are listed in the following table: x1 x2 x3 0 ∅ ∅ ∅ 1 2 3 4 {a} {a, c} {a, c} ditto ∅ ∅ {a} {c} {a, c} {a, c} We observe that at least one value for the unknowns increases in each iteration until finally a solution is found. We convince ourselves of the fact that this is the case for any complete lattice given that right sides of equations are monotonic. More precisely, we show: Theorem 1.5.2 Let D be a complete lattice and f : D → D be a monotonic function. Then the following two claims hold: 1. The sequence ⊥, f ⊥, f 2 ⊥, . . . is an ascending chain, i.e., it holds that f i−1 ⊥ f i ⊥ for all i ≥ 1. 2. If d = f n−1 ⊥ = f n ⊥ then d is the least element d satisfying d f (d ). Proof The first claim is proved by induction: For i = 1, the first claim holds since f 1−1 ⊥ = f 0 ⊥ = ⊥ is the least element of the complete lattice and therefore less than or equal to f 1 ⊥ = f ⊥. Assume that the claim holds for i − 1 ≥ 1, i.e., f i−2 ⊥ f i−1 ⊥ holds. The monotonicity of the function f implies: f i−1 ⊥ = f ( f i−2 ⊥) f ( f i−1 ⊥) = f i ⊥ 22 1 Foundations and Intraprocedural Optimization We conclude that the claim also holds for i. Therefore, the claim holds for all i ≥ 1. Let us now regard the second claim. Assume that d = f n−1 ⊥ = f n ⊥ Then d is a solution of the inequality x f x. Let us further assume we have another solution d of the same inequality. Thus, d f d holds. It suffices to show that f i ⊥ d holds for all i ≥ 0. This is again shown by induction. It is the case for i = 0. Let i > 0 and f i−1 ⊥ d . The monotonicity of f implies f i ⊥ = f ( f i−1 ⊥) f d d since d is a solution. This proves the claim for all i. Theorem 1.5.2 supplies us with a method to determine not only a solution, but even the least solution of an inequality, assuming that the ascending chain f i ⊥ eventually stabilizes, i.e., becomes constant at some i. It is therefore sufficient for the termination of our search for a fixed point that all ascending chains in D eventually stabilize. This is always the case in finite lattices. The solution found by the iterative method is the least solution not only of the inequality x f x, but is also the least solution of the equality x = f x, i.e., it is the least fixed point of f . What happens if not all ascending chains of the complete lattice eventually stabilize? Then the iteration may not always terminate. Nontheless, a least solution is guaranteed to exist. Theorem 1.5.3 (Knaster–Tarski) Each monotonic function f : D → D on a complete lattice D has a least fixed point d0 , which is also the least solution of the inequality x f x. Proof A solution of the inequality x f x is also called a postfixed point of f . Let P = {d ∈ D  d f d} be the set of postfixed points of f . We claim that the greatest lower bound d0 of the set P is the least fixed point of f . We first prove that d0 is an element of P, i.e., is a postfixed point of f . It is clear that f d0 f d d for each postfixed point d ∈ P. Thus f d0 is a lower bound of P and is therefore less than or equal to the greatest lower bound, i.e., f d0 d0 . d0 is a lower bound of P, and it is an element of P. It is thus the least postfixed point of f . It remains to prove that d0 also is a fixed point of f and therefore the least fixed point of f . We know already that f d0 d0 holds. Let us consider the other direction: The monotonicity of f implies f ( f d0 ) f d0 . Therefore, f d0 is a postfixed point of f , i.e., f d0 ∈ P. Since d0 is a lower bound of P, the inequality d0 f d0 follows. Theorem 1.5.3 guarantees that each monotonic function f on a complete lattice has a least fixed point, which conicides with the least solution of the inequality x f x. 1.5 Background: Complete Lattices 23 Example 1.5.3 Let us consider the complete lattice of the natural numbers augmented by ∞, i.e., D = N ∪ {∞} together with the partial order ≤. The function inc defined by inc x = x + 1 is monotonic. We have: inci ⊥ = inci 0 = i i + 1 = inci+1 ⊥ Therefore, this function has a least fixed point, namely, ∞. This fixed point will not be reached after finitely many iteration steps. Theorem 1.5.3 can be applied to the complete lattice with the dual partial order (instead of ). Thus, we obtain that each monotonic function not only has a least, but also a greatest fixed point. Example 1.5.4 Let us consider again the powerset lattice D = 2U for a base set U and a function f with f x = x ∩ a ∪ b. This function is monotonic. It therefore has a least and a greatest fixed point. Fixedpoint iteration delivers for f : f fk⊥ fk 0 ∅ U 1 b a∪b 2 b a∪b With this newly acquired knowledge, we return to our application, which is to solve systems of inequalities xi f i (x1 , . . . , xn ), i = 1, . . . , n (∗) over a complete lattice D for monotonic functions f i : Dn → D. Now we know that such a system of inequalities always has a least solution, which coincides with the least solution of the associated system of equations xi = f i (x1 , . . . , xn ), i = 1, . . . , n In the instances of static program analysis considered in this book, we will frequently meet complete lattices where ascending chains eventually stabilize. In these cases, the iterative procedure of repeated evaluation of righthand sides according to Theorem 1.5.2, is able to compute the required solution. This naive fixedpoint iteration, however, is often quite inefficient. Example 1.5.5 Let us consider again the factorial program in Example 1.4.3. The fixedpoint iteration to compute the least solution of the system of inequalities for available assignments is shown in Fig. 1.6. The values for the unknowns stabilize only after four iterations. 24 1 Foundations and Intraprocedural Optimization 1 2 3 4 0 ∅ ∅ ∅ 1 {y ← 1} {y ← 1} ∅ 2 {y ← 1} {y ← 1} {y ← 1} ∅ ∅ ∅ 3 ∅ ∅ ∅ 4 {y ← 1} ∅ ∅ 5 {y ← 1} {y ← 1} {y ← 1} ∅ ∅ ∅ 5 ditto Fig. 1.6 Naive fixedpoint iteration for the program in Example 1.4.3 1 0 ∅ 1 {y ← 1} 2 {y ← 1} 3 ∅ 4 ∅ 5 ∅ 2 ∅ ∅ ∅ ∅ ∅ ∅ 3 ditto Fig. 1.7 Roundrobin iteration for the program in Example 1.4.3 How can naive fixedpoint iteration be improved? A significant improvement is already achieved by roundrobin iteration. In roundrobin iteration, the computation of a value in a new round does not use the values computed in the last round, but for each variable xi the last value which has been computed for xi . In the description of the algorithm, we must distinguish between the unknowns xi and their values. For that purpose we introduce an array D that is indexed with the unknowns. The array component D[xi ] always holds the value of the unknown xi . The array D is successively updated until it finally contains the resulting variable assignment. for (i ← 1; i ≤ n; i++) D[xi ] ← ⊥; do { finished ← true; for (i ← 1; i ≤ n; i++) { new ← f i (D[x1 ], . . . , D[xn ]); if (¬(D[xi ] new)) { finished ← false; D[xi ] ← D[xi ] new; } } } while (¬finished) Example 1.5.6 Let us consider again the system of inequalities for available assignments for the factorial program in Example 1.4.3. Figure 1.7 shows the corresponding roundrobin iteration. It appears that three iteration rounds suffice. 1.5 Background: Complete Lattices 25 Let us have a closer look at roundrobin iteration. The assignment D[xi ] ← D[xi ] new; in our implementation does not just overwrite the old value of xi , but replaces it by the least upper bound of the old and the new value. We say that the algorithm accumulates the solution for xi during the iteration. In the case of a monotonic function f i , the least upper bound of old and new values for xi is equal to the new value. For a nonmonotonic function f i , this need not be the case. The algorithm is robust enough to compute an ascending chain of values for each unknown xi even in the nonmonotonic case. Thus, it still returns some solution of the system of inequalities whenever it terminates. The run time of the algorithm depends on the number of times the dowhile loop is executed. Let h be the maximum of the lengths of all proper ascending chains, i.e., one with no repetitions ⊥ d1 d2 . . . dh in the complete lattice D. This number is called the height of the complete lattice D. Let n be the number of unknowns in the system of inequalities. Roundrobin iteration needs at most h · n rounds of the dowhile loop until the values of all unknowns for the least solution are determined and possibly one more round to detect termination. The bound h · n can be improved to n if the complete lattice is of the form 2U for some base set U , if all functions f i are constructed from constant sets and variables using only the operations ∪ and ∩. The reason for this is the following: whether an element u ∈ U is in the result set for the unknowns xi is independent of whether any other element u is contained in these sets. For which variables xi a given element u is in the result sets for xi can be determined in n iterations over the complete lattice 2{u} of height 1. Roundrobin iteration for all u ∈ U is performed in parallel by using the complete lattice 2U instead of the lattice 2{u} . These bounds concern the worst case. The least solution is often found in far fewer iterations if the variables are ordered appropriately. Will this new iteration strategy also find the least solution if naive fixedpoint iteration would have found the least solution? To answer this question at least in the monotonic case, we assume again that all functions f i are monotonic. Let yi(d) be the ith component of F d ⊥ and xi(d) be the value of D[xi ] after the dth execution of the dowhile loop of roundrobin iteration. For all i = 1, . . . , n and d ≥ 0 we prove the following claims: (d) (d) 1. yi xi z i for each solution (z 1 , . . . , z n ) of the system of inequalities; 2. if the roundrobin iteration terminates then the variables x1 , . . . , xn will, after termination, contain the least solution of the system of inequalities; (d) (d) 3. yi xi . Claim 1 is shown by induction. It implies that all approximations xi(d) lie below the value of the unknown xi in the least solution. Let us assume that the roundrobin (d) iteration terminates after round d. The values xi therefore satisfy the system of inequalities and thereby are a solution. Because of claim 1, they also form a least solution. This implies claim 2. 26 1 Foundations and Intraprocedural Optimization Favorable: Unfavorable: 5 0 y←1 y←1 Zero(x > 1) 5 1 NonZero(x > 1) Zero(x > 1) 2 4 NonZero(x > 1) 3 0 y ←x∗y y ←x∗y 2 3 x←x−1 x←x−1 1 4 Fig. 1.8 A favorable and an unfavorable order of unknowns 1 2 3 4 0 {y ← 1} {y ← 1} ∅ 1 {y ← 1} {y ← 1} {y ← 1} 2 ∅ ∅ ∅ 3 {y ← 1} {y ← 1} ∅ ∅ ∅ ∅ ∅ 4 {y ← 1} 5 ∅ ∅ ∅ ∅ ∅ ∅ ∅ 5 ditto Fig. 1.9 Roundrobin iteration for the unfavorable order of Fig. 1.8 Claim 1 also entails that after d rounds the roundrobin iteration computes values at least as large as the naive fixedpoint iteration. If the naive fixedpoint iteration terminates after round d, then the roundrobin iteration terminates after at most d rounds. We conclude that roundrobin iteration is never slower than naive fixedpoint iteration. Nevertheless, roundrobin iteration can be performed more or less cleverly. Its efficiency substantially depends on the order in which the variables are reevaluated. It is favorable to reevaluate a variable xi on which another variable x j depends before this variable. This strategy leads to termination with a least solution after one execution of the dowhile loop for an acyclic system of inequalities. Example 1.5.7 Let us consider again the system of inequalities for the determination of available assignments for the factorial program in Example 1.4.3. Figure 1.8 shows a favorable and an unfavorable order of unknowns. In the unfavorable case, iteration needs four rounds for this program, as shown in Fig. 1.9. 1.6 Least Solution or MOP Solution? 27 1.6 Least Solution or MOP Solution? Section 1.5 presented methods to determine least solutions of systems of inequalities. Let us now apply these techniques for solving program analysis problems such as availability of expressions in variables. Assume we are given a controlflow graph. The analysis problem consists in computing one information for each program point, i.e., each node v in the controlflow graph. A specifation of the analysis then consists of the following items: • a complete lattice D of possible results for the program points; • a start value d0 ∈ D for the entry point start of the program; together with • a function [[k]] : D → D for each edge k of the controlflow graph, which is monotonic. These functions are also called the abstract edge effects for the controlflow graph. Each such specification constitutes an instance of the monotonic analysis framework. For availability of expressions in variables we provided such a specification, and we will see more instances of this framework in the coming sections. Given an instance of the monotonic analysis framework, we can define for each program point v, the value I ∗ [v] = {[[π]] d0  π : start →∗ v} The mapping I ∗ is called the merge over all paths solution (in short: MOP solution) of the analysis problem. On the other hand, we can put up a system of inequalities which locally describes how information is propagated between nodes along the edges of the controlflow graph: I[start] d0 I[v] [[k]] (I[u]) for each edge k = (u, lab, v) According to the theorems of the last section, this system has a least solution. And if the complete lattice D has finite height, this least solution can be computed by means, e.g., of roundrobin iteration. The following theorem clarifies the relation between the least solution of the inequalities and the MOP solution of the analysis. Theorem 1.6.1 (Kam and Ullman 1975) Let I ∗ denote the MOP solution of an instance of the monotonic framework and I the least solution of the corresponding system of inequalities. Then for each program point v, I[v] I ∗ [v] holds. This means that for each path π from program entry to v, we have: I[v] [[π]] d0 . (∗) 28 1 Foundations and Intraprocedural Optimization Proof We prove the claim (∗) by induction over the length of π. For the empty path π, i.e., π = , we have: [[π]] d0 = [[]] d0 = d0 I[start] Otherwise π is of the form π = π k for an edge k = (u, lab, v). According to the induction hypothesis, the claim holds for the shorter path π , that is, [[π ]] d0 I[u]. It follows that: [[π]] d0 = [[k]] ([[π ]] d0 ) [[k]] (I[u]) since [[k]] is monotonic I[v] since I is a solution This proves the claim. Theorem 1.6.1 is somewhat disappointing. We would have hoped that the least solution was the same as the MOP solution. Instead, the theorem tells us that the least solution is only an upper bound of the MOP solution. This means that, in general, the least solution may be not as precise as the MOP solution and thus exhibit less opportunities for optimization as the MOP. Still, in many practical cases the two solutions agree. This is, in particular, the case if all functions [[k]] are distributive. A function f : D1 → D2 is called • distributive, if f ( X ) = { f x  x ∈ X } holds for all nonempty subsets X ⊆ D; • strict, if f ⊥ = ⊥; • totally distributive, if f is distributive and strict. Example 1.6.1 Let us consider the complete lattice D = N ∪ {∞} with the canonical order ≤. The function inc defined by inc x = x + 1 is distributive, but not strict. As another example, let us look at the function add : (N ∪ {∞})2 → (N ∪ {∞}) where add (x1 , x2 ) = x1 + x2 , and where the complete lattice (N ∪ {∞})2 is componentwise ordered. We have: add ⊥ = add (0, 0) = 0 + 0 = 0 Therefore, this function is strict. But it is not distributive, as the following counterexample shows: add ((1, 4) (4, 1)) = add (4, 4) = 8 = 5 = add (1, 4) add (4, 1) 1.6 Least Solution or MOP Solution? 29 Example 1.6.2 Let us again consider the powerset lattice D = 2U with the partial order ⊆. For all a, b ⊆ U the function f defined by f x = x ∩ a ∪ b is distributive since ( X ) ∩ a ∪ b = {x ∩ a  x ∈ X } ∪ b = {x ∩ a ∪ b  x ∈ X } = { f x  x ∈ X} for each nonempty subset X ⊆ D. The function f is, however, strict only if b = ∅ holds. Functions f of the form f x = (x ∪ a) ∩ b have similar properties on the powerset means lattice D = 2U with the reversed order ⊇. For this partial order, distributivity that for each nonempty subset X ⊆ 2U it holds that f ( X ) = { f x  x ∈ X }. There exists a precise characterization of all distributive functions if their domain is an atomic lattice. Let A be a complete lattice. An element a ∈ A is called atomic if a = ⊥ holds and the only elements a ∈ A with a a are the elements a = ⊥ and a = a. A complete lattice A is called atomic if each element d ∈ A is the least upper bound of all atomic elements a d in A. In the complete lattice N ∪ {∞} of Example 1.6.1, 1 is the only atomic element. Therefore, this lattice is not atomic. In the powerset lattice 2U , ordered by the subset relation ⊆, the atomic elements are the singleton sets {u}, u ∈ U . In the powerset lattice with the same base set, but the reversed order ⊇, the atomic elements are the sets (U \{u}), u ∈ U . The next theorem states that for atomic lattices distributive functions are uniquely determined by their values for the least element ⊥ and for the atomic elements. Theorem 1.6.2 Let A and D be complete lattices where A is atomic. Let A ⊆ A be the set of atomic elements of A. It holds that 1. Two distributive functions f, g : A → D are equal if and only if f (⊥) = g(⊥) and f (a) = g(a) for all a ∈ A. 2. Each pair (d, h) such that d ∈ D and h : A → D define a distributive function f d,h : A → D by: f d,h (x) = d {h(a)  a ∈ A, a x}, x ∈A Proof We only prove the first claim. If the functions f and g are equal they agree on ⊥ and the atomic elements of A. For the opposite direction, we regard an arbitrary element x ∈ A. For x = ⊥ holds f (x) = g(x) according to our assumption. For x = ⊥, the set A x = {a ∈ A  a x} is not empty. It follows that: f (x) = f ( Ax ) = { f (a)  a ∈ A, a x} = {g(a)  a ∈ A, a x} = g(x) which was to be proved. 30 1 Foundations and Intraprocedural Optimization Note that each distributive function f : D1 → D2 is also monotonic. a b holds if and only if a b = b holds. If a b holds we have: f b = f (a b) = f a f b Consequently, we have f a f b, what was to be shown. There is an important theorem for program analyses with distributive edge effects: Theorem 1.6.3 (Kildall 1972) Assume that every program point v is reachable from the program’s entry point. Assume further that all edge effects [[k]] : D → D are distributive. The least solution I of the system of inequalities agrees with the MOP solution I ∗ , i.e., I ∗ [v] = I[v] for all program points v. Proof Because of Theorem 1.6.1 it suffices to show that I[v] I ∗ [v] holds for all v. Since I is the least solution of the system of inequalities it suffices to show that under the given circumstances, I ∗ is a solution, that is, satisfies all inequalities. For the entry point start of the program, we have: I ∗ [start] = {[[π]] d0  π : start →∗ start} [[]] d0 d0 For each edge k = (u, lab, v) we check that {[[π]] d0  π : start →∗ v} {[[π k]] d0  π : start →∗ u} = {[[k]] ([[π ]] d0 )  π : start →∗ u} = [[k]] ( {[[π ]] d0  π : start →∗ u}) I ∗ [v] = = [[k]] (I ∗ [u]) The next to last equality holds since the set {π  π : start →∗ u} of all paths from the entry point of the program start to u is not empty, and since the abstract edge effect [[k]] is distributive. We conclude that I ∗ satisfies all inequalities. This proves the claim. The following example shows that in Theorem 1.6.3, the assumption that all program points are reachable is necessary. Example 1.6.3 Regard the controlflow graph of Fig. 1.10. As complete lattice we choose D = N ∪ {∞} with the canonical order ≤. As single edge effect we choose 1.6 Least Solution or MOP Solution? 31 7 0 inc 1 2 Fig. 1.10 A controlflow graph showing the consequences of unreachability the distributive function inc. For an arbitrary starting value at the entry point, we have I[2] = inc (I[1]) = inc 0 =1 On the other hand, we have: I ∗ [2] = ∅=0 since there is no path from the entry point 0 to program point 2. It follows that the MOP solution is different from the least solution. It is not critical to assume that all program points are reachable. Unreachable program points can be easily identified and then removed without changing the semantics of the program. Conclusion 1.6.1 We gather all the observations about monotonic analysis frameworks. • The MOP solution of a monotonic analysis framework is always less or equal to the least solution of the corresponding system of inequalities. • If all edge effects are distributive and every program point is reachable from the entry point of the program, the MOP solution coincides with teh least solution. • Roundrobin iteration can be used to determine the least solution if all ascending chains in the complete lattice have finite lengths. Let us apply these observations to the analysis of the availability of expressions in variables. In this analysis, the complete lattice D = 2Ass is a finite powerset lattice with the order ⊇. The value for the entry point of the program is d0 = ∅, and the abstract edge effects [[k]] are functions f of the form f x = (x ∪ a)\b = (x ∪ a) ∩ b̄ for b̄ = Ass\b. Example 1.6.2 shows that all such functions are distributive. Thus, roundrobin iteration for correpsonding systems of inequalities computes the MOP solution, provided that all program points are reachable from the entry point of the program. We conclude the section about the removal of redundant computations. The transformation we presented has several disadvantages: 32 1 Foundations and Intraprocedural Optimization The analysis of availability of expressions in variables may fail to discover a redundant computation because it requires an available expression, i.e., an expression whose reevaluation could be avoided, to be available in the same variable along all paths. It also misses to identify expressions as available which occur in conditions or index expressions, because their values are not available in variables. At the expense of introducing extra auxiliary variables, the compiler could transform the program beforehand in order to make this program analysis more effective. This transformation introduces unique temporary variables Te for selected expressions e and insert an assignments of e into Te at each occurrence of e (see Exercise 5). An assignment x ← e thus is decomposed into the sequence Te ← e; x ← Te which is the evaluation of the righthand side, followed by a variabletovariable assignment. Most of these variabletovariable assignments, though, turn out to be superfluous, and thus should better be removed. Transformations doing that are provided in Sect. 1.8. 1.7 Removal of Assignments to Dead Variables So far, we have only met a single optimizing transformation. It replaces the recomputation of an expression by accessing a previously computed value, provided that this value is guaranteed to be available in a variable. To present this transformation and, in particular, to prove properties like correctness and termination of the associated static program analysis, we introduced an operational semantics of our language, complete lattices as domains of analysis information, and abstractions of the operational semantics to statically analyze programs. This foundational background enables us now to introduce more optimizing transformations and their associated program analyses quite easily. Example 1.7.1 Let us regard the following example: 0: 1: 2: x ← y + 2; y ← 5; x ← y + 3; The value of program variable x at program points 0 and 1 is not of any importance. It is overwritten before being used. We therefore call variable x dead at this program point. The first assignment to x can be removed because the value of x before the second assignment is irrelevant for the semantics of the program. We also call this assignment dead. These notions are now made precise. 1.7 Removal of Assignments to Dead Variables Fig. 1.11 Example for liveness of variables 33 x←y+2 0 y←5 1 x←y+3 2 3 Let us assume that, after program execution, the values of the variables from some set X ⊆ Vars are still needed. This set X can be empty, in case all variables are only used within the program under analysis. However, the analysis to be presented can also be applied to individual procedure bodies. Returning from a procedure does not necessarily mean leaving the whole program. This means that accesses to globally visible variables may still happen. The set X should, in this case, be defined as the set of global variables. The following definitions use the terms definition and use, well known in the compiler literature. A definition of a variable x is a statement which may change the value of x. In our small example language, the only definitions are assignments and loads where the left sides are x. A use of a variable x is an occurrence where the value of x is read. The sets of variables used and defined at an edge in the controlflow graph can be derived from the statement labeling the edge. For a label lab, they are determined by: Lab Used Defined ; ∅ ∅ NonZero(e) Vars(e) ∅ Zero(e) Vars(e) ∅ x ←e Vars(e) {x} x ← M[e] Vars(e) {x} M[e1 ] ← e2 Vars(e1 ) ∪ Vars(e2 ) ∅ where Vars(e) denotes the set of program variables that occur in e. We call a variable x live (relative to X ) along path π to program exit, if x ∈ X and π contains no definition of x, or if there exists at least one use of x in π, and the first use of x does not follow a definition of x. π can, in this case, be decomposed into π = π1 k π2 such that the edge k contains a use of variable x and the prefix π1 contains no definition of x. We will in the future omit the restriction, “relative to X ” and tacitly assume a set X being given. A variable x that is not live along π is called dead along π. A variable x is called (possibly) live at a program point v if x is live along at least one path from v to the program exit stop. Otherwise, we call x dead at program point v. Whether a variable is possibly live or (definitely) dead at a program point depends on the possible continuations of program execution, this is the future. This is in contrast to the availability of assignments at a program point, which depends on the history of program execution before this program point is reached. Example 1.7.2 Let us regard the simple program of Fig. 1.11. In this example, we assume that all variables are dead at the end of the program. There is only one path from each program point to the program exit. So, the sets of live and dead variables 34 1 Foundations and Intraprocedural Optimization at each program point are easily determined. For the program points of the example, they are: Live Dead 0 {y} {x} 1 ∅ {x, y} 2 {y} {x} 3 ∅ {x, y} How can at each program point the set of live variables be computed? In principle, we proceed in the same way as we did for available assignments. The domain of possible values is L = 2Vars . Instead of providing a value for the entry point of the program, however, we now provide a value for the exit point, namely, the set X of variables which are live when exiting the program. Also, we provide for each edge an abstract edge effect. Since the liveness of variables at a program point does not depend on the history but on the future, the abstract effect of the edge k = (u, lab, v) is a function [[k]] that determines the set of variables possibly live at u, given a set of variables possibly live at v. Again, the abstract edge effect only depends on the label lab of the edge. this means that [[k]] = [[lab]] , where [[;]] L [[NonZero(e)]] L [[x ← e]] L [[x ← M[e]]] L [[M[e1 ] ← e2 ]] L = = = = = L [[Zero(e)]] L = L ∪ Vars(e) (L\{x}) ∪ Vars(e) (L\{x}) ∪ Vars(e) L ∪ Vars(e1 ) ∪ Vars(e2 ) The abstract effects [[k]] of edges k on a path π = k1 . . . kr can be composed to form the abstract effect of this path. We define: [[π]] = [[k1 ]] ◦ . . . ◦ [[kr ]] The sequence of the edges is maintained by this function composition (and not reverted as for the analysis of expressions available in variables). The reason is that the abstract effect [[π]] of the path π is to describe how a set of variables L live at the end of π is propagated through the path to compute the set of variables possibly live at the beginning of π. The set of program variables possibly live at a program point v is obtained as the union of the sets of variables that are live along at least one program path π from v to the program exit, that is, as the union of the sets [[π]] X . Correspondingly, we define: {[[π]] X  π : v →∗ stop} L∗ [v] = where v →∗ stop denotes the set of all paths from v to the program exit stop. As partial order on the set L we choose the subset relation ⊆. Intuitively, smaller sets of 1.7 Removal of Assignments to Dead Variables 35 live variables mean larger sets of dead variables, which means more opportunities for optimization. The function L∗ represents the MOP solution of our analysis problem. Program analyses are called forward analyses if the value at a program point depends on the paths reaching this program point from program entry. Program analyses are called backward analyses if the value at a program point depends on the paths leaving that program point and reaching program exit. Liveness of variables therefore is a backward analysis. Available assignments, in contrast, is a forward analysis. Transformation DE: Let us assume that we are given the MOP solution L∗ . At each program point v, this is the set L∗ [v]. Its complement contains only variables that are definitely dead at v, i.e., dead along all program paths starting at v. Assignments to these variables are superfluous and can be removed by the following transformation DE, for Dead variable assignment Elimination: x∈ L∗ [v] x←e v ; v Memory accesses whose results are not needed could also be removed in analogy to the removal of assignments. This could, however, change the semantics of the program if it were to remove an illegal access to memory, which—depending on the semantics of the programming language—may produce a side effect such as raising an exception. Useless accesses to memory are therefore not removed. Transformation DE is called deadcode elimination. Correctness of this transformation is again shown in two steps: 1. The abstract edge effects are shown to correctly implement the definition of liveness. 2. The transformation is shown to preserve the semantics of programs. We again consider the second step only. An important insight is that to show semantics preservation it is not necessary to show that the value of each variable at each program point remains invariant under the transformation. In fact, this is not the case here. For the applicability of the transformation, however, it suffices that the observable behavior of the original and the transformed program are the same. The only question is: what is potentially observable? Here, we demand that the program points traversed by program execution are the same, and that in each step the contents of memory coincide as well as the values of the variables in the set X at the end of program execution. Claim 2 then is that the value of a variable, dead at some program point v, does not influence the observable behavior of the program. To prove this, we consider the state s at v and the computation of a path π starting at v and reaching program exit. Remember, the state is a pair consisting of a concrete variable binding ρ and a memory. We show by induction over the length of program paths π: 36 1 Foundations and Intraprocedural Optimization (L) Let s be a state that differs from s only in the values of dead variables. The state transformation [[π]] is also defined for s , and the states [[π ]] s and [[π ]] s agree up to the values of dead variables for all prefixes π of π. The invariant (L) entails that two states at a program point v definitely lead to the same program behavior if they only disagree in the values of variable that are dead at v. To prove the correctness of the transformation it suffices to show that only the values of dead variables may be different during the execution of the original and the transformed program. The computation of the set L∗ [u] of variables possibly live at program point u works analogously to the way sets of definitely available assignments were computed. We set up an appropriate system of inequalities. Recall that, opposed to the availableassignments analysis where we fixed a start value at program entry start, we now fix a value X at program exit stop which consists of all variables which are live at program exit. Also, each edge k = (u, lab, v) has an associated inequality that delivers the set of possibly live variables at u, given the set of possible live variables at v. Remember that the inequalities in the availableassignment analysis were oriented the other way around. The resulting system of inequalities is: L[stop] ⊇ X L[u] ⊇ [[k]] (L[v]) for an edge k = (u, lab, v) Thus, the difference in the systems of inequalities for forward and backward analyses only consists in the exchange of start and stop and in the reverted orientation of the edges. The complete lattice in which the system of inequalities will be solved is finite. This means that all ascending chains will eventually stabilize. The abstract edge effects are monotonic. Roundrobin iteration can thus be used to determine the least solution L. In fact, the abstract edge effects are even distributive, which means that this least solution is the same as the MOP solution L∗ provided that the program exit stop is reachable from each program point (cf. Theorem 1.6.3). Example 1.7.3 We consider again the program for the factorial function assuming that it obtains its input and returns its output through memory cells, more precisely through the cells M[I ] and M[R]. No variables are assumed to be live at program exit. The controlflow graph and the system of inequalities derived from it are shown in Fig. 1.12. The system of inequalities closely corresponds to the controlflow graph. After all, the system of equations was extracted from the controlflow graph. However, it does not need to be explicitly constructed. Instead, fixedpoint iteration could traverse the controlflow graph, executing the abstract edge effects associated with the traversed edges. The fixedpoint iterator would be something like a driver. Another analysis can be conducted by executing it with the edges effects of this new analysis. Roundrobin iteration delivers the solution after only one round, given the right ordering of the unknowns: 1.7 Removal of Assignments to Dead Variables 37 0 x ← M [I] L[0] ⊇ (L[1]\{x}) ∪ {I} 1 y←1 Zero(x > 1) 6 2 NonZero(x > 1) 3 y ←x∗y M [R] ← y 7 L[1] L[2] L[3] L[4] L[5] L[6] ⊇ ⊇ ⊇ ⊇ ⊇ ⊇ L[2]\{y} (L[6] \ { x}) ∪ (L[3] ∪ {x}) (L[4]\{y}) ∪ {x, y} (L[5]\{x}) ∪ {x} L[2] L[7] \ { y, R} L[7] ⊇ ∅ 4 x←x−1 5 Fig. 1.12 The system of inequalities for possibly live variables for the factorial program 7 6 2 5 4 3 1 0 1 2 ∅ {y, R} {x, y, R} ditto {x, y, R} {x, y, R} {x, y, R} {x, R} {I, R} We notice that no assignment in the factorial program has a dead left side. Therefore, transformation DE does not modify this program. The removal of assignments to dead variables can make other variables dead. This is witnessed in Fig. 1.13. This example shows a weakness of the analysis for dead variables: It may classify variables as live due to later uses in assignments to dead variables. A removal of such an assignment and a subsequent reanalysis would discover new dead variables. This iterative application of transformation and analysis is rather inefficient. In the example of livevariable analysis the repeated analysis can be avoided by strengthening the analysis. Strengthening leads to possibly smaller sets of possibly live variables. The new analysis works with a more restricted condition for liveness. The new notion, true liveness, uses the notion, true use of a variable on a path starting at a program point. A use in an assignment to a dead variable is not considered a true use. This renders the definition of true liveness recursive: true liveness depends on true use, which depends on true liveness. Let us assume again that the values of variables in a set X are still used at program exit. We call a variable x truly live along a path π to program exit if x ∈ X and π contains no definition of x or if π contains a true use of x, which occurs before any 38 1 Foundations and Intraprocedural Optimization 1 {y, R} 2 {x, y, R} ; {y, R} 2 2 ; {y, R} 3 M [R] ← y 4 1 x←y+1 z ←2∗x 3 {y, R} 1 x←y+1 ; {y, R} 3 M [R] ← y ∅ 4 M [R] ← y ∅ 4 Fig. 1.13 Repeated application of transformation DE {y, R} 1 1 x←y+1 {y, R} 2 ; 2 z ←2∗x {y, R} 3 ; 3 M [R] ← y 4 ∅ M [R] ← y 4 Fig. 1.14 Truly live variables definition of x, i.e., π can be decomposed into π = π1 k π2 , such that π1 contains no definition of x, and k contains a true use of x relative to π2 . The true use of variables at edge k = (u, lab, v) is defined by: Lab y truly used ; f alse NonZero(e) y ∈ Vars(e) Zero(e) y ∈ Vars(e) x ←e y ∈ Vars(e) ∧ x is truly live at v x ← M[e] y ∈ Vars(e) ∧ x is truly live at v M[e1 ] ← e2 y ∈ Vars(e1 ) ∨ y ∈ Vars(e2 ) The additional condition that the assignment’s left side must be truly live makes up the only difference to normal liveness. Example 1.7.4 Consider the program in Fig. 1.14. Variable z is not live (nor truly live) at program point 2. Therefore, the variables on the right side of the corresponding assignment, i.e. x, are not truly used. Thus, x is not truly live at program point 1 since x is not truly used at the edge to program point 2. 1.7 Removal of Assignments to Dead Variables 39 The abstract edge effects for true liveness are as follows: [[;]] L [[NonZero(e)]] L [[x ← e]] L [[x ← M[e]]] L [[M[e1 ] ← e2 ]] L = = = = = L [[Zero(e)]] L = L ∪ Vars(e) (L\{x}) ∪ ((x ∈ L) ? Vars(e) : ∅) (L\{x}) ∪ ((x ∈ L) ? Vars(e) : ∅) L ∪ Vars(e1 ) ∪ Vars(e2 ) For an element x and sets a, b, c, the conditional expression (x ∈ a) ? b : c denotes the set: b if x ∈ a (x ∈ a) ? b : c = c if x ∈ a The abstract edge effects for true liveness are thus more complex than those for plain liveness. However, they are still distributive! This follows from the fact that the new conditional operator is distributive provided that c ⊆ b holds. To convince ourselves of this property, we consider an arbitrary powerset domain D = 2U together with the partial order ⊆ and the function: f y = (x ∈ y) ? b : c For an arbitrary nonempty set Y ⊆ 2U , we calculate: f( Y) = = = = (x ∈ Y ) ? b : c ( {x ∈ y  y ∈ Y }) ? b : c c ∪ {(x ∈ y) ? b : c  y ∈ Y } c ∪ { f y  y ∈ Y} Theorem 1.6.2 has a more general implication: Theorem 1.7.4 Let U be a finite set and f : 2U → 2U be a function. 1. f (x1 ∪ x2 ) = f (x1 ) ∪ f (x2 ) for all x1 , x2 ⊆ U holds if and only if f can be represented in the following form: f (x) = b0 ∪ ((u 1 ∈ x) ? b1 : ∅) ∪ · · · ∪ ((u r ∈ x) ? br : ∅) for appropriate u i ∈ U and bi ⊆ U . 2. f (x1 ∩ x2 ) = f (x1 ) ∩ f (x2 ) for all x1 , x2 ⊆ U holds if and only if f can be represented in the form: f (x) = b0 ∩ ((u 1 ∈ x) ? U : b1 ) ∩ · · · ∩ ((u r ∈ x) ? U : br ) for appropriate u i ∈ U and bi ⊆ U . 40 1 Foundations and Intraprocedural Optimization Liveness: True liveness: x←x−1 {x} x←x−1 ∅ ; ; ∅ ∅ Fig. 1.15 True liveness in loops 1 1 T ←x+1 2 T ←x+1 2 y←T 3 T ←x+1 2 y←T 3 M [R] ← y 4 1 ; 3 M [R] ← T 4 M [R] ← T 4 Fig. 1.16 A program with copying instructions Note that the functions of Theorem 1.7.1 are closed under composition, least upper bounds, and greatest lower bounds (Exercise 11). The least solution of systems of inequalities for true liveness agree with the MOP solutions due to the distributivity of the abstract edge effects. We must, however, require that program exit stop is reachable from each program point. It is interesting to note that the analysis of true liveness discovers more superfluous assignments than repeated analysis of plain liveness and deadcode elimination. Example 1.7.5 Figure 1.15 shows a loop in which a variable is modified that is only used in the loop. Plain liveness analysis cannot determine that this variable is dead, while trueliveness analysis is able to do so. 1.8 Removal of Assignments Between Variables Programs often contain assignments that simply copy values from one variable into another variable. These copy instructions may be the result of other optimizations or of conversions of one program representation into a different form. Example 1.8.1 Consider the program in Fig. 1.16. Storing a value in variable T is useless in the given case, since the value of the expression is used exactly once. Variable T can be used directly instead of variable y since T is guaranteed to contain 1.8 Removal of Assignments Between Variables ∅ 41 1 T ←x+1 {T } 2 y←T {y, T } 3 M [R] ← y {y, T } 4 Fig. 1.17 Variables in Example 1.8.1 having the same value as T the same value. This renders variable y dead at program point 2, such that the compiler can eliminate the assignment to y. The resulting program still con