Unit – 5
Code Generation
In the code generation phase, various design issues arise:
1. Input to the code generator
● The input to the code generator consists of the intermediate representation of the source program, together with the symbol table information. The front end produces this intermediate representation.
● There are several choices for the intermediate representation, such as three-address code, postfix notation, and syntax trees or DAGs.
● We assume that the front end generates a low-level intermediate representation, i.e. one in which the values of names can be directly manipulated by the machine instructions.
● The code generation phase assumes that its input is free of errors.
2. Target program
The output of the code generator is the target program. The output may take several forms: absolute machine language, relocatable machine language, or assembly language.
3. Memory management
● During code generation, the symbol-table entries have to be mapped to actual addresses of data objects in run-time memory, and labels in three-address statements have to be mapped to instruction addresses.
● Mapping names in the source program to addresses of data objects is done cooperatively by the front end and the code generator.
● Local variables are allocated on the stack in the activation record, while global variables are kept in the static area.
4. Instruction selection
● The uniformity and completeness of the target machine's instruction set are important factors.
● If the efficiency of the target program matters, then instruction speeds and machine idioms become important considerations.
● The quality of the generated code is determined by its speed and size.
Example:
The three-address code is:
t := w + x
y := t + z
Inefficient assembly code is:
MOV w, R0    // R0 ← w
ADD x, R0    // R0 ← x + R0
MOV R0, t    // t ← R0
MOV t, R0    // R0 ← t
ADD z, R0    // R0 ← z + R0
MOV R0, y    // y ← R0
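The third and fourth instructions are wasteful: the value just stored into t is immediately reloaded. Assuming t is not used after this block, a better sequence (a sketch) is:
MOV w, R0    // R0 ← w
ADD x, R0    // R0 ← x + R0
ADD z, R0    // R0 ← z + R0
MOV R0, y    // y ← R0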
5. Register allocation
Registers can be accessed faster than memory, and instructions involving register operands are shorter and faster than those involving memory operands.
When we use registers, two sub-problems emerge: register allocation (selecting the set of variables that will reside in registers at each point) and register assignment (picking the specific register in which each variable will reside).
Certain machines require even-odd register pairs for some operands and results.
Example
Consider a division instruction of the form:
D a,b
Where,
a - the dividend, occupying the even register of an even/odd register pair
b - the divisor
After division, the even register holds the remainder and the odd register holds the quotient.
6. Evaluation order
The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers than others to hold intermediate results.
Key takeaway:
- The input to the code generator consists of the intermediate representation of the source program and the symbol table information.
- The code generation phase assumes that its input is free of errors.
- Registers can be accessed faster than memory.
For designing a successful code generator, familiarity with the target computer and its instruction set is a prerequisite. Unfortunately, it is not possible to explain any target machine in enough detail in a general discussion of code generation to produce good code for a complete language on that machine.
1. A simple target machine model
For a simple computer that is representative of many register machines, we will use assembly code as the target language.
Our target computer models a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps. The underlying machine is byte-addressable with n general-purpose registers R0, R1, ..., Rn-1. A fully-fledged assembly language would have dozens of instructions; to avoid hiding the concepts in a plethora of details, we will use a very limited set of instructions and assume that all operands are integers.
Most instructions consist of an operator, followed by a target, followed by a list of source operands. A label may precede an instruction. We assume the following kinds of instructions are available:
● Load operations: The instruction LD dst, addr loads the value at location addr into location dst. This instruction denotes the assignment dst = addr. The most common form is LD r, x, which loads the value at location x into register r. An instruction of the form LD r1, r2 is a register-to-register copy in which the contents of register r2 are copied into register r1.
● Store operations: In the ST x, r command, the value in register r is stored at position x. The assignment x = r is denoted by this instruction.
● Computation operations of the form OP dst, src1, src2, where OP is an operator such as ADD or SUB, and dst, src1, and src2 are locations, not necessarily distinct. This instruction applies the operation to the values at src1 and src2 and stores the result in dst.
● Unconditional jumps: The instruction BR L causes control to branch to the machine instruction with label L. (BR stands for branch.)
● Conditional jumps of the form Bcond r, L, where r is a register, L is a label, and cond stands for one of the common tests on the value in register r. For instance, BLTZ r, L causes a jump to label L if the value in register r is less than zero, and allows control to pass to the next machine instruction otherwise.
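As a small illustration of this instruction set, the three-address statement x = y - z might be compiled as follows (a sketch; the register choices are arbitrary):
LD R1, y         // R1 = y
LD R2, z         // R2 = z
SUB R1, R1, R2   // R1 = R1 - R2
ST x, R1         // x = R1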
2. Program and Instruction Costs
We associate a cost with compiling and running a program. Depending on what aspect of a program we wish to optimize, some typical cost measures are the length of compilation time and the size, running time, and power consumption of the target program.
Calculating the actual cost of compiling and running a program is a complex problem. Finding an optimal target program for a given source program is undecidable in general, and many of the sub-problems involved are NP-hard.
Hence, in code generation we must often be satisfied with heuristic techniques that generate good, but not necessarily optimal, target programs.
For simplicity, we take the cost of an instruction to be one plus the costs associated with the addressing modes of the operands. This cost corresponds to the length of the instruction in words. Addressing modes involving registers have zero additional cost, while those involving a memory location or a constant have an additional cost of one, since such operands must be stored in the words following the instruction.
The source and destination of an instruction may be specified by combining registers and memory locations with addressing modes.
Mode | Form | Address | Example | Added Cost |
absolute | M | M | ADD temp, R1 | 1 |
register | R | R | ADD R0, R1 | 0 |
indexed | c(R) | c + contents(R) | ADD 100(R2), R1 | 1 |
indirect register | *R | contents(R) | ADD *R2, R1 | 0 |
indirect indexed | *c(R) | contents(c + contents(R)) | ADD *100(R2), R1 | 1 |
literal | #c | c | ADD #3, R1 | 1 |
An added cost of one means that the operand occupies one extra word of memory following the instruction.
Example:
Move register contents to memory, R0 → M:
MOV R0, M
cost = 1 + 1 = 2 (one word for the instruction, plus one word because the address of memory location M is in the word following the instruction)
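As a further worked example under this cost model, the statement a := b + c can be implemented in several ways whose costs differ (a sketch; which variant is usable depends on where the values currently reside):
MOV b, R0    // cost = 2
ADD c, R0    // cost = 2
MOV R0, a    // cost = 2, total cost = 6
If the values of b and c are already in registers R1 and R2, and b is not needed afterwards:
ADD R2, R1   // cost = 1
MOV R1, a    // cost = 2, total cost = 3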
Key takeaway:
- For a simple computer that is representative of many register machines, we will use assembly code as a target language.
- Our target computer models a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps.
Each executing program runs in its own logical address space, which is divided into four code and data areas:
➢ A statically determined code area that holds the executable target code. The size of the target code can be determined at compile time.
➢ A static data area for holding global constants and other data generated by the compiler. The size of this area can also be determined at compile time.
➢ A dynamically managed heap area for holding data objects that are allocated and freed during program execution. The size of the heap cannot be determined at compile time.
➢ A dynamically managed stack area for holding activation records as they are created and destroyed during procedure calls and returns. Like the heap, the size of the stack cannot be determined at compile time.
1. Static allocation
To demonstrate code generation for simplified procedure calls and returns, we will concentrate on the following three-address statements:
call callee
return
halt
action — a placeholder for other three-address statements
The code generator determines the size and layout of activation records from the name information stored in the symbol table. We first show how the return address can be stored in an activation record on a procedure call, and how control can be returned through it after the call.
Example :
// code for c
action1
call p
action2
halt
// code for p
action3
return
The instructions starting at address 100 implement the statements
action1; call p; action2; halt
Fig 1: target code for static allocation
2. Stack allocation
By using relative addresses for storage in activation records, static allocation can become stack allocation. In stack allocation, however, the position of a procedure's activation record is not known until run time. This position is usually stored in a register, so words in the activation record can be accessed as offsets from the value in this register. The indexed addressing mode of our target machine is convenient for this purpose.
The code for the first procedure initializes the stack by setting SP to the start of the stack area in memory:
LD SP, #stackStart    // initialize the stack
code for the first procedure
HALT // terminate execution
A procedure call sequence increments SP, saves the return address, and transfers control to the called procedure:
ADD SP, SP, #caller.recordSize    // increment stack pointer
ST *SP, #here + 16                // save return address
BR callee.codeArea                // transfer control to the callee
Fig 2: Code for the above example
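The matching return sequence is symmetric (a sketch following the same conventions): the callee branches through the saved return address, and the caller then restores the stack pointer:
BR *0(SP)                         // in the callee: return to the caller
SUB SP, SP, #caller.recordSize    // in the caller: restore the stack pointer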
3. Run-time addresses for names
The storage-allocation policy and the layout of a procedure's local data in an activation record determine how the storage for names is accessed. In a three-address statement, we have assumed that a name is simply a reference to its symbol-table entry. This approach has a significant advantage: it makes the compiler more portable, since the front end need not be changed even when the compiler is moved to a different machine where a different run-time organization is needed.
Key takeaway:
- A static data area holds global constants and other compiler-generated data.
- The size of the stack can not be calculated at compile time, much like the heap.
- The code area that contains the executable target code is statically determined.
- At compile time, the size of the target code can be calculated.
Fig 3: target code for stack allocation
5.4.1 Basic blocks
A basic block is a sequence of statements in which control flow enters at the beginning and leaves at the end without halting or branching in between (except possibly at the last instruction of the block).
Every basic block is formed from a maximal sequence of consecutive three-address statements.
Basic block construction:
Algorithm: Partition into basic blocks
Input: A sequence of three-address statements.
Output: A list of basic blocks, with each three-address statement in exactly one block.
Method: First identify the leaders in the code. The rules for finding leaders are as follows:
● The first statement is a leader.
● Any statement that is the target of a conditional or unconditional goto (such as if ... goto L or goto L) is a leader.
● Any statement that immediately follows a goto or conditional goto statement is a leader.
For each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program. This partitioning is sketched in code below.
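These rules translate directly into code. Below is a minimal Python sketch of the partitioning algorithm, assuming each instruction is an arbitrary object and that the helpers is_jump and jump_target (illustrative assumptions, not from the text) report whether an instruction transfers control and to which instruction index:

def partition_into_basic_blocks(instrs, is_jump, jump_target):
    leaders = {0}                          # rule 1: the first statement is a leader
    for i, ins in enumerate(instrs):
        if is_jump(ins):
            leaders.add(jump_target(ins))  # rule 2: the target of a goto is a leader
            if i + 1 < len(instrs):
                leaders.add(i + 1)         # rule 3: the statement after a goto is a leader
    bounds = sorted(leaders) + [len(instrs)]
    # each block runs from its leader up to, but not including, the next leader
    return [instrs[s:e] for s, e in zip(bounds, bounds[1:])]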
Consider the following source code, which computes the dot product of two vectors a and b of length 10:
begin
prod := 0;
i := 1;
do begin
prod := prod + a[i] * b[i];
i := i + 1;
end
while i <= 10
end
The three-address code for the above source program is given below:
(1) prod := 0
(2) i := 1
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 10 goto (3)
Basic block B1 contains statements (1) to (2).
Basic block B2 contains statements (3) to (12).
5.4.2 Flow graph
A flow graph is a directed graph. It holds the flow-of-control information for the collection of basic blocks.
A control flow graph shows how program control passes among the blocks. It is helpful in loop optimization.
The flow graph for the vector dot product is given as follows:
Fig 4: Flow graph
● Block B1 is the initial node. Block B2 immediately follows B1, so there is an edge from B1 to B2.
● The target of the jump in the last statement of B2 is the first statement of B2, so there is an edge from B2 to itself.
● B2 is a successor of B1, and B1 is a predecessor of B2.
Key takeaway:
- A basic block is a sequence of statements that control enters at the beginning and leaves at the end without halting or branching in between.
- A flow graph is a directed graph.
- It holds the flow-of-control information for the collection of basic blocks.
Optimization can be applied to basic blocks. During optimization, we need not change the set of expressions computed by the block.
There are two classes of basic-block optimizations, as follows:
1. Structure-preserving transformations
The main structure-preserving basic block transformations are:
A. Common sub-expression elimination
A common sub-expression need not be computed again and again. Instead, it can be computed once, stored, and reused wherever the expression is encountered again.
a := b + c
b := a - d
c := b + c
d := a - d
In the block above, the same expression a - d is computed by the second and fourth statements, so the block can be transformed as follows:
a := b + c
b := a - d
c := b + c
d := b
B. Dead code elimination
Suppose a statement x := y + z appears in a block and x is dead, that is, never used subsequently. Then this statement can be safely removed without changing the value of the basic block.
C. Renaming of temporary variables
A statement t := b + c, where t is a temporary variable, can be changed to u := b + c, where u is a new temporary variable. All instances of t can then be replaced with u without changing the value of the basic block.
D. Interchange of two independent adjacent statements
Suppose the following two adjacent statements are given to a block:
t1 := b + c
t2 := x + y
If the value of t1 does not affect the value of t2, these two statements can be interchanged without affecting the value of the block.
2. Algebraic Transformations
● In an algebraic transformation, the set of expressions is changed into an algebraically equivalent set. Expressions such as x := x + 0 or x := x * 1 can thus be eliminated from the basic block without altering the set of expressions it computes.
● Constant folding is a similar kind of optimization: constant expressions are evaluated at compile time and replaced by their values. Thus the expression 5 * 2.7 will be replaced by 13.5.
● Relational operators such as <=, >=, <, >, and = sometimes produce unexpected common sub-expressions.
● Associativity and commutativity can often be used to expose a common sub-expression without altering the value of the basic block. For example, if the source code contains the assignments
a := b + c
e := c + d + b
they will generate the following intermediate code:
a := b + c
t := c + d
e := t + b
Key takeaway :
- A common sub-expression can be computed once, stored, and reused wherever it is encountered again.
- Optimization can be applied to basic blocks.
- During optimization, we need not change the set of expressions computed by the block.
For three-address statements, the code generator produces the target code. It uses registers to hold the operands of the three-address statements.
Consider the three-address statement a := b + c. It may be translated into the following code sequence:
MOV b, R0
ADD c, R0
MOV R0, a
Register and Address Descriptors:
➢ A register descriptor keeps track of what is currently stored in each register. Initially, all registers are empty.
➢ An address descriptor keeps track of the location where the current value of a name can be found at run time.
A code-generation algorithm:
The algorithm takes a sequence of three-address statements as input. For each three-address statement of the form x := y op z, it performs the following actions (a sketch of the algorithm in code follows the list):
➢ Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
➢ Consult the address descriptor of y to determine y', one of the current locations of y; prefer a register if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
➢ Generate the instruction OP z', L, where z' is a current location of z. Prefer a register to a memory location if z is in both.
➢ Update the address descriptor of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that it contains x, and remove x from all other register descriptors.
➢ If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptors to indicate that those registers no longer contain y or z.
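A minimal Python sketch of this algorithm for statements x := y op z follows. The two-register machine, the trivial getreg policy, and the omission of end-of-block stores and liveness updates are simplifying assumptions, not part of the algorithm as stated:

def generate(stmts, emit):
    reg_desc = {"R0": None, "R1": None}   # register descriptor: register -> name held
    addr_desc = {}                        # address descriptor: name -> set of locations

    def getreg(y):
        for r in reg_desc:                # reuse a register that already holds y
            if reg_desc[r] == y:
                return r
        for r in reg_desc:                # otherwise take an empty register
            if reg_desc[r] is None:
                return r
        return "R0"                       # naive fallback; a real getreg would spill

    for x, y, op, z in stmts:             # each statement is x := y op z
        L = getreg(y)
        if reg_desc[L] != y:
            emit(f"MOV {y}, {L}")         # place a copy of y in L if not already there
        # z' is z's current location; prefer a register if z is in one
        zloc = next((r for r in reg_desc if reg_desc[r] == z), z)
        emit(f"{op} {zloc}, {L}")
        reg_desc[L] = x                   # L now holds x
        addr_desc[x] = {L}                # x currently lives only in L

For instance, generate([("t", "a", "SUB", "b"), ("u", "a", "SUB", "c")], print) emits the MOV/SUB pairs shown in the table below.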
Generating Code for Assignment Statements:
The assignment statement d := (a - b) + (a - c) + (a - c) can be translated into the following sequence of three-address code:
t := a - b
u := a - c
v := t + u
d := v + u
Code sequence for the example :
Statement | Code generated | Register descriptor (registers initially empty) | Address descriptor |
t := a - b | MOV a, R0  SUB b, R0 | R0 contains t | t in R0 |
u := a - c | MOV a, R1  SUB c, R1 | R0 contains t, R1 contains u | t in R0, u in R1 |
v := t + u | ADD R1, R0 | R0 contains v, R1 contains u | u in R1, v in R0 |
d := v + u | ADD R1, R0  MOV R0, d | R0 contains d | d in R0, then d in R0 and memory |
Key takeaway :
- For three-address statements, the code generator produces the target code.
- It uses registers to hold the operands of the three-address statements.
- An address descriptor keeps track of the location where the current value of a name can be found at run time.
❖ Machine-independent optimization attempts to improve the intermediate code in order to obtain a better target code. The part of the code that is transformed here involves no absolute memory locations or CPU registers.
❖ The intermediate code generation process introduces many inefficiencies, such as using variables instead of constants, extra copies of variables, and repeated evaluation of expressions. Code optimization removes such inefficiencies.
❖ Optimization can change the program structure beyond recognition, for example by unrolling loops, inlining functions, or removing some programmer-defined variables.
Code Optimization can be carried out in the following ways:
1) Compile Time Evaluation:
(a) a = 5 * (45.0 / 5.0) * r
Evaluate 5 * (45.0 / 5.0) as 45.0 at compile time, giving a = 45.0 * r.
(b) s = 5.7
t = s / 3.6
Evaluate s / 3.6 as 5.7 / 3.6 at compile time.
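Both cases are instances of evaluating constant sub-expressions at compile time. A minimal Python sketch of such folding over expression trees of the form (op, left, right) — an illustrative representation, not from the text — is:

def fold(e):
    # constant-fold an expression tree: ('op', left, right) | number | variable name
    if not isinstance(e, tuple):
        return e                              # a constant or a variable name
    op, l, r = e[0], fold(e[1]), fold(e[2])
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return {"+": l + r, "-": l - r,
                "*": l * r, "/": l / r}[op]   # evaluate at compile time
    return (op, l, r)

# fold(("*", ("*", 5, ("/", 45.0, 5.0)), "r")) returns ("*", 45.0, "r")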
2) Variable propagation:
The code before optimization is:
y = w * x
a = w
...
z = a * x + 4
The code after optimization is:
y = w * x
a = w
...
z = w * x + 4
Here, after propagating a = w, the expressions w * x and a * x are recognized as the common sub-expression w * x.
3) Dead code elimination :
The code before elimination:
c = a * b
x = b
...
d = a * b + 4
The code after elimination:
c = a * b
...
d = a * b + 4
Here, x = b is dead code, since x is never used later in the program, so the statement can be eliminated.
4) Code motion :
● It reduces how often an expression is evaluated.
● It moves loop-invariant statements out of the loop.
do
{
item = 10;
value = value + item;
} while (value < 100);
//This code can be further optimized as
item = 10;
do
{
value = value + item;
} while (value < 100);
5) Induction variable and strength reduction :
● Strength reduction is used to replace a high-strength (expensive) operator with a low-strength (cheap) one.
● An induction variable is a loop variable that changes by a constant amount on each iteration, as in assignments of the form i = i + constant.
Before reduction the code is:
i = 1;
while (i < 10)
{
y = i * 4;
i = i + 1;
}
After reduction the code is:
i = 1;
t = 4;
while (t < 40)
{
y = t;
t = t + 4;
}
Key takeaway :
- Machine-independent optimization attempts to improve the intermediate code in order to obtain a better target code.
- Code optimization removes such inefficiencies from the code.
Loop optimization is the most valuable machine-independent optimization, because a program spends the bulk of its running time in inner loops.
If we reduce the number of instructions in an inner loop, we can decrease the running time of a program even if we increase the amount of code outside that loop.
The following three techniques are critical for loop optimization:
1. Code motion
Code motion is used to decrease the amount of code in the loop. This transformation takes a statement or expression that can be moved outside the loop body without affecting the semantics of the program.
Example :
The computation of limit - 2 is loop-invariant in the following while statement:
while (i <= limit - 2) /*statement does not change limit*/
After code motion the result is as follows:
a= limit-2;
while(i<=a) /*statement does not change limit or a*/
2. Induction-variable elimination
Induction-variable elimination is used to remove redundant induction variables from an inner loop.
It can reduce the number of additions in a loop, and it improves both code space and run-time performance.
Fig 6: before and after induction-variable elimination
In this figure, we can replace the assignment t4 := 4 * j with t4 := t4 - 4. The only problem is that t4 has no value when we first enter block B2, so we place an initialization t4 := 4 * j on the entry to block B2.
3. Strength reduction
● Strength reduction is used to replace expensive operations with cheaper ones on the target machine.
● Adding a constant is cheaper than multiplying, so inside the loop we can replace a multiplication with an addition.
● Multiplying is cheaper than exponentiating, so inside the loop we can replace an exponentiation with a multiplication.
Example:
while (i < 10)
{
j = 3 * i + 1;
a[j] = a[j] - 2;
i = i + 2;
}
After strength reduction, the code would be:
s = 3 * i + 1;
while (i < 10)
{
j = s;
a[j] = a[j] - 2;
i = i + 2;
s = s + 6;
}
In the code above, it is cheaper to compute s = s + 6 than j = 3 * i + 1.
Key takeaway :
- Induction-variable elimination is used to remove redundant induction variables from an inner loop.
- Code motion is used to decrease the amount of code in the loop.
- Loop optimization is the most valuable machine-independent optimization.
A DAG for a basic block is a directed acyclic graph with labels on its nodes:
● DAGs are a kind of data structure. They are used to apply transformations to basic blocks.
● A DAG gives a good way of determining common sub-expressions.
● It gives a picture of how the value computed by a statement is used in subsequent statements.
Algorithm for construction of DAG
Input: A basic block.
Output: A DAG for the basic block, with the details given below:
● Every node carries a label. For leaves, the label is an identifier (a name or constant).
● Each node keeps a list of attached identifiers that hold the computed value.
Case (i) x:= y OP z
Case (ii) x:= OP y
Case (iii) x:= y
Method
Step 1:
If node(y) is undefined, create a leaf labeled y.
For case (i), if node(z) is undefined, create a leaf labeled z.
Step 2:
For case (i), determine whether there is a node labeled OP whose left child is node(y) and whose right child is node(z). If not, create such a node. Let n be this node.
For case (ii), determine whether there is a node labeled OP with the single child node(y). If not, create such a node. Let n be this node.
For case (iii), let n be node(y).
Output
Delete x from the list of attached identifiers for node(x), if any. Append x to the list of attached identifiers for the node n found in Step 2, and set node(x) to n. A sketch of the whole construction in code follows.
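A minimal Python sketch of this construction for case (i) statements x := y OP z is given below; the Node class, the statement tuples, and the use of a memo table for detecting existing nodes are illustrative assumptions:

class Node:
    def __init__(self, label, children=()):
        self.label = label        # an operator, or an identifier/constant for a leaf
        self.children = children
        self.ids = []             # attached identifiers currently holding this value

def build_dag(stmts):
    node = {}                     # identifier -> node holding its current value
    memo = {}                     # (OP, left, right) -> existing node (CSE detection)

    def leaf(y):
        if y not in node:
            node[y] = Node(y)     # Step 1: create a leaf for an undefined operand
        return node[y]

    for x, op, y, z in stmts:     # case (i): x := y OP z
        key = (op, id(leaf(y)), id(leaf(z)))
        n = memo.get(key)
        if n is None:             # Step 2: create node(OP) only if it does not exist
            n = Node(op, (node[y], node[z]))
            memo[key] = n
        if x in node and x in node[x].ids:
            node[x].ids.remove(x) # Output: detach x from the node it labeled before
        n.ids.append(x)           # attach x to n and set node(x) to n
        node[x] = n
    return node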
Key takeaway :
- DAGs are a kind of data structure.
- DAG is a directed acyclic graph.
- It is used on simple blocks to execute transformations.
Example :
Consider the following three-address statements:
S1 := 4 * i
S2 := a[S1]
S3 := 4 * i
S4 := b[S3]
S5 := S2 * S4
S6 := prod + S5
prod := S6
S7 := i + 1
i := S7
if i <= 20 goto (1)
Stages in DAG Construction:
Value numbering is a technique for determining when two computations in a program are equivalent, and eliminating one of them with a semantics-preserving optimization.
Global value numbering
Global value numbering (GVN) is a compiler optimization based on the static single assignment (SSA) form of the intermediate representation. It sometimes helps eliminate redundant code that common subexpression elimination (CSE) does not. At the same time, CSE may eliminate code that GVN does not, so both are often found in modern compilers.
Global value numbering works by assigning a value number to variables and expressions. The same value number is assigned to those variables and expressions that are provably equivalent. For instance, in the following code:
w := 3
x := 3
y := x + 4
z := w + 4
A successful GVN routine will assign w and x the same value number, and y and z the same value number. For instance, the mapping {w → 1, x → 1, y → 2, z → 2} would be an optimal value-number mapping for this block.
Using this information, you can safely turn the previous code fragment into:
w := 3
x := w
y := w + 4
z := y
Local value numbering
Local value numbering (LVN) is a compiler optimization that aims to find multiple instances of equivalent expressions (i.e. expressions that yield the same result) and replace them with the first occurrence. Unlike global value numbering, LVN works on a single basic block at a time. For example:
a ← 4        a is tagged as #1
b ← 5        b is tagged as #2
c ← a + b    c (#1 + #2) is tagged as #3
d ← 5        d is tagged as #2, the same as b
e ← a + d    e, being #1 + #2, is tagged as #3
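A minimal Python sketch of local value numbering over a block of statements, mirroring the tagging above, is shown below; the tuple representation (dst, op, a, b), with op set to None for plain copies and constants, is an illustrative assumption:

def local_value_number(block):
    vn = {}                  # value key -> value number
    var_vn = {}              # variable -> value number of its current value
    first = {}               # value number -> first variable computed with it
    out = []
    counter = 0

    def number(key):
        nonlocal counter
        if key not in vn:
            counter += 1     # a fresh tag, e.g. constant 5 becomes #2
            vn[key] = counter
        return vn[key]

    for dst, op, a, b in block:                 # dst := a op b, or dst := a
        if op is None:
            n = var_vn.get(a) or number(("const", a))
        else:
            n = number((op, var_vn[a], var_vn[b]))   # assumes a, b defined earlier
        if n in first:
            out.append((dst, None, first[n], None))  # reuse the first occurrence
        else:
            first[n] = dst
            out.append((dst, op, a, b))
        var_vn[dst] = n
    return out

Running this on the block above rewrites d ← 5 as the copy d ← b and e ← a + d as e ← c, exactly as the tags predict.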
Algebraic
Algebraic identities represent another important class of optimizations on basic blocks. For example, we may apply arithmetic identities, such as
x + 0 = 0 + x = x        x - 0 = x
x * 1 = 1 * x = x        x / 1 = x
to eliminate computations from a basic block.
Another class of algebraic optimizations involves local strength reduction, that is, replacing a more expensive operator with a cheaper one, as in:
Expensive | Cheaper |
x ^ 2 | x * x |
2 * x | x + x |
x / 2 | x * 0.5 |
Constant folding is a third category of similar optimizations.
Relational operators such as < and = sometimes produce unexpected common subexpressions. For example, the assignments
a = b + c;
e = c + d + b;
It could generate the following intermediate code:
a = b + c
t = c + d
e = t + b
If t is not needed outside this block, we can change this sequence to
a = b + c
e = a + d
using both the associativity and commutativity of +.
We can also remove instructions such as the following:
- x = x + 0
- x = x * 1
These instruction types are often generated by simple intermediate code generation algorithms, and peephole optimization can easily eliminate them.
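A peephole pass for these identities fits in a few lines of Python; the instruction representation (dst, op, a, b) is an illustrative assumption:

def eliminate_identities(block):
    # drop x = x + 0 and x = x * 1; rewrite y = x + 0 into the copy y = x
    out = []
    for dst, op, a, b in block:
        if op in ("+", "*") and isinstance(a, (int, float)):
            a, b = b, a                       # put the constant on the right
        if (op == "+" and b == 0) or (op == "*" and b == 1):
            if dst == a:
                continue                      # x = x + 0: remove entirely
            out.append((dst, None, a, None))  # y = x + 0 becomes a copy
        else:
            out.append((dst, op, a, b))
    return out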
Key takeaway :
- Value numbering is a method of deciding whether two computations in a program are identical, and replacing one of them with a semantics-preserving optimization.
- GVN sometimes helps remove redundant code that common subexpression elimination (CSE) does not.
Some optimizations can be performed by a compiler using only local information. Consider, for instance, the following code:
x = a + b;
x = 6 * 3
● The first assignment to x in this code is useless: the value assigned to x is never used in the program.
● The expression 6 * 3 can be evaluated at compile time, simplifying the second assignment statement to x = 18;
Some optimizations require more global knowledge. Consider, for instance, the following code:
a = 1;
b = 2;
c = 3;
if (....) x = a + 5;
else x = b + 4;
c = x + 1;
In this code, the first assignment to c (at line 3) is useless, and the expression x + 1 can be evaluated as 7.
It is less apparent, however, how a compiler can discover these facts by looking at only one or two consecutive statements. A more global analysis is needed, so that at each point in the program the compiler knows:
● which variables are guaranteed to have constant values, and
● which variables will be used before being redefined.
Data flow analysis is used to discover this kind of property. It is performed on the program's control flow graph (CFG).
A program's control flow graph is used to determine those parts of the program to which a particular value assigned to a variable might propagate.
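Such global facts are computed by iterative data flow analysis over the CFG. As one concrete instance, here is a minimal Python sketch of liveness analysis — the "used before being redefined" property above — iterated to a fixed point; the block representation (use/def sets and successor lists) is an illustrative assumption:

def liveness(blocks, succ, use, defs):
    # blocks: block ids; succ[b]: successors of b;
    # use[b]: names read before written in b; defs[b]: names written in b
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:                     # iterate the equations to a fixed point
        changed = False
        for b in blocks:
            out_b = set().union(*(live_in[s] for s in succ[b])) if succ[b] else set()
            in_b = use[b] | (out_b - defs[b])
            if in_b != live_in[b] or out_b != live_out[b]:
                live_in[b], live_out[b] = in_b, out_b
                changed = True
    return live_in, live_out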
Key takeaway :
- Data flow analysis, performed on the control flow graph, derives information about the definition and use of data in the program.
- Optimization can be performed with the help of this analysis.