Unit - 1
Introduction to Compiler
Q1) What do you mean by Compiler?
A1) Compiler:
● A compiler is a translator that converts a program written in a high-level language into machine language.
● A developer writes programs in a high-level language, while the processor understands only machine language.
● The compiler also reports errors in the programmer's code.
● The main objective of the compiler is to translate code written in one language into another without altering the program's meaning.
● When a program written in a high-level language (HLL) is run, it happens in two parts: first the program is compiled, and then the compiled output is executed.
Q2) Define the different phases of the compiler?
A2) Phases of Compiler -
Compilation proceeds as a sequence of phases. Each phase takes the source program in one representation and produces output in another representation, and each phase takes its input from the previous phase.
The various stages of the compiler take place:
1. Lexical Analysis - Lexical analysis is the first phase of compilation. The lexical analyzer reads the source program character by character and groups the characters into tokens.
2. Syntax Analysis - Syntax analysis is the second phase of compilation. It takes tokens as input and generates a parse tree as output. In this phase, the parser verifies whether the expression formed by the tokens is syntactically correct.
3. Semantic Analysis - Semantic analysis is the third phase of compilation. It checks whether the parse tree complies with the rules of the language. The semantic analyzer keeps track of identifiers, their types, and their expressions. The output of this phase is the annotated syntax tree.
4. Intermediate Code Generation - In this phase, the compiler translates the source code into an intermediate code, which lies between the high-level language and machine language. The intermediate code is designed so that it can easily be translated into target machine code.
5. Code Optimization - Code optimization is an optional phase. It improves the intermediate code so that the program runs faster and takes up less space. It removes redundant lines of code and rearranges statements to speed up execution.
6. Code Generation - The final stage of the compilation process is code generation. It takes as its input the optimized intermediate code and maps it to the language of the target computer. The code generator converts the intermediate code to the required computer's machine code.
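As a sketch, the phases above might transform the hypothetical statement `pos = init + rate * 60` as follows (identifier names and the code shapes are illustrative, not a fixed compiler output):

```
Source:             pos = init + rate * 60
Lexical analysis:   id1  =  id2  +  id3  *  60        (token stream)
Syntax analysis:    =(id1, +(id2, *(id3, 60)))        (parse tree)
Semantic analysis:  *(id3, inttofloat(60))            (types checked and annotated)
Intermediate code:  t1 = inttofloat(60)
                    t2 = id3 * t1
                    t3 = id2 + t2
                    id1 = t3
Optimized code:     t1 = id3 * 60.0
                    id1 = id2 + t1
Target code:        MUL R1, rate, #60.0
                    ADD R2, init, R1
                    MOV pos, R2
```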
Q3) What is Bootstrapping?
A3) Bootstrapping -
Bootstrapping is commonly used in compiler development.
To generate a self-hosting compiler, bootstrapping is used. Self-hosting is a type of compiler that is capable of compiling its source code.
To compile the compiler, the Bootstrap compiler is used, and then you can use this compiled compiler to compile all else and future versions of the compiler itself.
Three languages characterize a compiler: its source language S, its target language T, and its implementation language I.
A compiler SCIT, for source S and target T, implemented in I, is shown in a T-diagram.
To generate a new language L for machine A, follow these steps:
1. Create a compiler SCAA for a subset S of the desired language L, written in machine language A and producing code for machine A.
2. Build a compiler LCSA for the full language L that is written in the subset S.
3. Compile LCSA using the compiler SCAA to obtain LCAA. LCAA is a compiler for language L which runs on machine A and produces code for machine A.
The mechanism described by these T-diagrams is called bootstrapping.
Q4) Explain the Lexical - analyzer Generator, in detail?
A4) Lexical - analyzer Generator
Lex is a program that generates lexical analyzers; it is commonly used together with the YACC parser generator. A lexical analyzer is software that transforms an input stream into a sequence of tokens. Lex reads a specification and produces, as output, C source code implementing the lexical analyzer.
If the lexical analyzer finds an invalid token, it produces an error. The lexical analyzer works in close collaboration with the syntax analyzer: it reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer when required.
Tokens
A lexeme is a sequence of (alphanumeric) characters that forms a token. For each lexeme to be identified as a valid token, there are some predefined rules, described by the grammar in terms of a pattern. A pattern describes what a token can be, and patterns are specified by means of regular expressions.
Example: the variable declaration line in the C language
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol)
Specification of Tokens:
Language theory uses the following terms:
Alphabets: the set of symbols {0,1} is the binary alphabet, {0,1,2,3,4,5,6,7,8,9, A, B, C, D, E, F} is the hexadecimal alphabet, and {a-z, A-Z} is the alphabet of the English language.
Strings: Any finite sequence of alphabet symbols is called a string. The length of a string is the total number of symbol occurrences in it; for example, the string "india" has length 5, written |india| = 5.
A string without an alphabet, i.e., a zero-length string, is known as an empty string and is denoted by ε (epsilon).
Special symbols: The following symbols are used in a standard high-level language:
Arithmetic Symbols: Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation: Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment: =
Special Assignment: +=, /=, *=, -=
Comparison: ==, !=, <, <=, >, >=
Preprocessor: #
Location Specifier: &
Logical: &, &&, |, ||, !
Shift Operator: >>, >>>, <<, <<<
Q5) Summarize the lex compiler?
A5) lex compiler
● First, a lexical-analyzer specification is written in the Lex language in a file lex.l. The Lex compiler then processes lex.l and generates the C program lex.yy.c.
● Next, the C compiler compiles lex.yy.c and produces the object program a.out.
● a.out is the lexical analyzer, which transforms an input stream into a sequence of tokens.
Lex file format:
A Lex program is divided into three parts by %% delimiters. The structure of a Lex source file is as follows:
{definitions}
%%
{rules}
%%
{user subroutines}
The definitions section includes declarations of constants, variables, and regular definitions.
The rules section contains statements of the form
p1 {action1} p2 {action2} .... pn {actionn}
where each pi is a regular expression and each actioni describes what the lexical analyzer should do when a lexeme matches the pattern pi.
User subroutines are auxiliary procedures needed by the actions. A subroutine can be compiled separately and loaded with the lexical analyzer.
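The three sections can be sketched in a minimal, hypothetical Lex specification; the patterns and messages below are illustrative, while yytext and yylex() are standard Lex names:

```
%{  /* definitions: C declarations and includes */
#include <stdio.h>
%}
digit   [0-9]
%%
{digit}+    { printf("NUMBER: %s\n", yytext); }   /* rule: pattern {action} */
[a-zA-Z]+   { printf("WORD: %s\n", yytext); }
.|\n        { /* ignore everything else */ }
%%
/* user subroutines */
int main(void) { return yylex(); }
```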
Q6) What is YACC, explain it?
A6) YACC
● YACC stands for Yet Another Compiler Compiler.
● For a given grammar, YACC provides a method to generate a parser.
● YACC is a tool designed to compile LALR(1) grammars.
● It is used to generate the source code of a syntactic analyzer for a language defined by an LALR(1) grammar.
● The input to YACC is a grammar rule, and the output is a C program.
Some points about YACC are as follows:
Input: a CFG in a file file.y
Output: a parser y.tab.c (from yacc)
● The output file "file.output" contains the parsing tables.
● The file "file.tab.h" contains declarations.
● The parser is called yyparse().
● To get tokens, the parser expects to call a function named yylex().
The essential operational sequence is:
gram.y (the desired grammar in YACC format) → YACC → y.tab.c (the YACC-generated C source) → C compiler → a.out (an executable that parses the grammar defined in gram.y)
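Assuming Unix-style lex and yacc tools and hypothetical file names gram.y and scan.l, the operational sequence might look like the following command sketch:

```
yacc -d gram.y                  # produces y.tab.c (and y.tab.h with -d)
lex scan.l                      # produces lex.yy.c
cc y.tab.c lex.yy.c -o parser   # C compiler builds the executable
./parser < input.txt            # parse input according to gram.y
```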
Q7) Describe a Derivation?
A7) Derivation -
A derivation is a sequence of applications of production rules; it is how an input string is obtained from the grammar. During parsing, we have to make two decisions:
● We must determine which non-terminal is to be replaced.
● We must determine the production rule by which that non-terminal will be replaced.
There are two standard options for deciding which non-terminal to replace.
Leftmost derivation:
In the leftmost derivation, the leftmost non-terminal is replaced by a production rule at each step, and the input string is read from left to right.
Example -
Production rules:
S → S + S
S → S - S
S → a | b | c
Input:
a - b + c
The left-most derivation is:
S = S + S
S = S - S + S
S = a - S + S
S = a - b + S
S = a - b + c
Rightmost derivation:
In the rightmost derivation, the rightmost non-terminal is replaced by a production rule at each step, so the input string is consumed from right to left.
Example:
Input:
a - b + c
The right-most derivation is:
S = S - S
S = S - S + S
S = S - S + c
S = S - b + c
S = a - b + c
Q8) Explain Parse tree, with example?
A8) Parse tree -
● A parse tree is the graphical representation of symbols; a symbol may be a terminal or a non-terminal.
● In parsing, the string is derived from the start symbol. The start symbol is the root of the parse tree.
● The parse tree follows operator precedence: the deepest sub-tree is traversed first, so the operator in a sub-tree has higher precedence than the operator at its parent node.
A parse tree satisfies the following points:
● The terminals have to be all leaf nodes.
● Non-terminals have to be all interior nodes.
● In-order traversal yields the original input string.
Example -
Production rules:
Input
a * b + c
Steps 1 to 5 (figures omitted): the parse tree for a * b + c is built up step by step, with the sub-tree for a * b at the deepest level and + at the root.
Q9) What do you mean by BNF notation and Ambiguity, explain?
A9) BNF notation -
BNF stands for Backus-Naur Form. It is used to write a formal representation of grammar that is context-free. It is often used to define a programming language's syntax.
Basically, BNF notation is just a version of context-free grammar.
Productions at the BNF take the form of:
Left side → definition
where left side ∈ (Vn ∪ Vt)+ and definition ∈ (Vn ∪ Vt)*. In BNF, the left side contains exactly one non-terminal.
Several productions with the same left side can be combined; the vertical bar symbol "|" separates the alternatives.
For example, consider a grammar with the productions:
S → aSa
S → bSb
S → c
In BNF, this grammar can be expressed as:
S → aSa| bSb| c
Ambiguity -
A grammar is said to be ambiguous if there is more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for a given input string. If not, the grammar is called unambiguous.
Example:
An ambiguous grammar produces two distinct parse trees for the same string, such as aabb.
If a grammar contains ambiguity, it is not good for constructing a compiler.
No method can detect and remove ambiguity automatically, but you can remove ambiguity by rewriting the whole grammar without ambiguity.
Q10) Define finite state machines?
A10) Finite state machine -
➢ Finite state machines are used to recognize patterns.
➢ A finite automaton takes a string of symbols as input and changes its state accordingly. A transition happens when the desired symbol is found in the input.
➢ During a transition, the automaton can either move to the next state or remain in the same state.
➢ An FA has two outcomes for an input string: accept or reject. The string is accepted when it is processed successfully and the automaton ends in a final state.
A finite automaton is defined by the following:
Q: a finite set of states
∑: a finite set of input symbols
q0: the initial state
F: the set of final states
δ: the transition function
The transition function can be described as
δ: Q x ∑ →Q
The FA is described in two ways:
DFA
DFA stands for Deterministic Finite Automata. Deterministic refers to the uniqueness of the computation: in a DFA, an input character takes the machine to exactly one state. A DFA does not allow null moves, meaning it cannot change state without consuming an input character.
A DFA is defined by the 5-tuple {Q, ∑, q0, F, δ}
Q: a set of all states
∑: a finite set of input symbols
q0: initial state
F: final state
δ: Transition function
NDFA
NDFA stands for Non-Deterministic Finite Automata. In an NDFA, a single input can move the machine into any number of states. An NDFA permits the null (ε) move, meaning it can change state without reading a symbol.
Like a DFA, an NDFA is also defined by five tuples; however, its transition function is different.
The transition function of an NDFA can be described as:
δ: Q x ∑ → 2^Q