How does Grapnel Compiler work

The Grapnel is a graphical programming language. This means that a programmer can build up his program using graphical elements placing them on the screen and connecting them together. In this context every graphical element represents a small or even a bigger piece of the program and the connections describe in which order these parts of the program must be executed. The programmer uses a Graphical Editor (GRED) instead a text editor when (s)he makes programs. It is very easy to make programs in this way but unfortunately there is not an execution system that would be able to execute graphical symbols directly. We are developing a compiler program that can translate the graphical representation of the program into the standard C source code. The programs generated by the compiler can be compiled with any standard C compiler and can be executed in general way.

First step of the translation is made by the graphical editor. The editor makes a text file called GRP file that describes the whole system built by the programmer. The GRP file contains information not only about the program but the screen of the editor, location of the windows and the graphical symbols, etc. and information requested by the debugging, monitoring and animating tools. The compiler has to select information and pick up explanations about the program.

The syntax of the GRP file is very strictly and it is defined in BNF (Backus-Naur Form). We may consider that the GRP file is the source of a program that uses the GRP syntax and the Grapnel compiler has to translate it into the source of an other program that uses the syntax of the C programs. The syntax definition of the GRP file allows Grapnel editor to split information about elements of the program into several parts and to place them into the GRP file separately. This was the main problem during the implementation of the Grapnel compiler.

Lexical analysis

We used standard programming tools of the UNIX operating system to implement the Grapnel compiler. The first step of the implementation of any compiler is making a lexical analyser. The analyser routine reads characters of the source code sequentially and tries to reconcile the read character string with the syntactical units defined in the BNF syntax. We describe these syntactical units in a special form and use this description to make the lexical analyser routine with the tool called lex. The lex is a very effective programming facility that requires a description of the syntax rules and makes the lexical analyser routine in C source. Required description is very simply, for example let's see the following rule:

[-+]?[0-9]+[eE][-+]?[0-9]+	{
				  FloatParam= atof(yytext);
				  return(FLOATNUMBER);
				}

This rule is from the source of the lexical analyser of the Grapnel compiler and describes the syntax of the floating point numbers. The left side of this rule is an expression. The lex translates this expression into the C instructions and this code checks the input string and if it matches with the expression then executes actions described in the right hand side of the rule. This action is a compound C instruction. It converts the recognised input string that is in the yytext variable into the floating point number and places it in the FloatParam variable. After the conversion the analyser returns to the caller and informs it that the recognised token was a FLOATNUMBER. The caller of the analyser must make a decision that the token can appear in the source at the actual point of the processing or not.

If the analyser founds a matched expression then it avoids to check of the next rules so the order of the definition of the rules is very important. Last three rules are special in some respect.

[a-zA-Z_][0-9a-zA-Z_]*		return(LookReserved());
\n				LineNum++;
.				;

The first one of these three rules contain an expression that corresponds to the identifiers and reserved words of the GRP file. If the input may be an identifier the analyser calls a function made by us. This function picks up the identifier and looks for it in the database of the reserved words. If the word is in the database then analyser will return token attached to that reserved word otherwise it will return the token called ID that is a token of identifiers.

The expression in the second rule will match with the end of the line character. If analyser detects the end of the line then increments the line counter. The number of the actual line is very important information when we inform user or the Grapnel editor about the wrong lines of the source GRP file.

The last rule has got a special expression that will match with every string read from the input file and force the analyser to do nothing. It is very important because without this action the analyser copies unrecognised inputs --for example the empty lines-- into the standard output. This side effect is unexpected and can confuse the user of the compiler.

First phase of the compilation

As we explained earlier the GRP file may split information about elements of the program into several parts. Because of this characteristic of the GRP file the Grapnel compiler needs use two phases durging the compilation. In the first phase the compiler scans the GRP file and collects information about the elements of the program such as processes, ports, connections, etc. During this activity the compiler calls lexical analyser to get the next token from the source and checks if it is valid token or not. It is not so easy to write a C program that accomplishes this parsing operation. There is an other UNIX programing tool called yacc that helps us to realise this routine. Yacc uses a description of the syntax to implement the parser routine.

This description is very similar to the BNF. Yacc parser definition file contains rules like BNF does. Let's see an example:

HeaderPart	: HEADERPART
		  {ErrorCode= ceNeedBraceOpen;}
		  BRACEOPEN
		  HeaderSList
		  {ErrorCode= ceNeedBraceClose;}
		  BRACECLOSE
		  {ErrorCode= ceNeedAppPart;}
		;

HeaderSList	:
		| HeaderSList
		  HeaderSection
		;

Left side of the rules defines a non-terminal symbol that represents a rule of the syntax. Right side describes the rule. The description is a list of terminal and/or non-terminal symbols. Terminal symbols are tokens returned by the lexical analyser. Every non-terminal symbol used in the rules must appear in the left side of one or more other rule. Rules are closed with semicolon.

In the example shown above the HeaderPart is symbol of the part of the syntax that describes how the information about headers should be written in the GRP file. This part of the file must begin with HEADERPART token. This token belongs to reserved word "headerpart". The reserved word must be followed by a "{" character and a list of header sections. This part of the file must be closed with a "}" character (token BRACECLOSE). In this rule the HeaderSList is a non-terminal symbol and it is described in the next rule. Any part of the syntax can be represented by one or more rules. We can write these alternative forms in one rule separated with "|". Description of the list of the header sections means that the list can be empty or can be a list followed by a header section.

In the rules we may write C instructions which will be executed after the recognition of the previous token. Using this capability we implement actions that must be taken during the first phase of the compilation.

The most important work of the compiler is to build internal data structures that represent the program. For example if the parser finds information about a process then it makes a new data structure and places information into it. If the parser finds information about the same process during the scanning then it picks up the information and extends the existing data structure and places new information into the extended structure.

Second phase of the compilation

When all the requested information about processes, ports, and other program elements are collected from the GRP file, the second phase of the compilation can be done. Second phase is source code generation. The Grapnel compiler makes one source file for the application server program and one file for every process. The generated source files are unique executable C programs that can communicate with each other using Grapnel system calls. These function calls are inserted into the source files by the compiler.

The compiler first makes the code of the application server program. The compiler uses a template file to make the source file. This template file is inserted into the executable code of the compiler so it is very hard to change it. In the future versions of the compiler we are going to use external template. The compiler includes process creation functions into the template using information about processes found in process collection. The compiler also makes a header file for the application with same name and ".h" suffix. This header file contains constant definitions for example the number of the processes.

After the application server program the compiler generates source codes of the processes. Every process is implemented in a separate source file. Like the application template file, the process template file is inserted into to executable code of the compiler too, but we are going to use external template in the future. Every source code of the processes starts with include section. It depends on information located in the header section of the GRP file. Next part is the definition part of the global variables. This part and the beginning part of the main function --where local variables and the channels are defined-- are included to the template by the compiler. After variable definitions the compiler inserts C instructions into the template.

Several Grapnel system calls will be inserted before the code made by programmer. These system calls start the process (inform application server that the process is being run) and initialise the channels used by the process. To make the source code of these function calls the compiler gives information from the collection of the ports that is built during the first phase of the compilation.

The functionality of the process is defined by the programmer and it is represented by the program items. These items correspond to graphical elements of the program. Every item represents a small piece of the executable code and the connections between the items define the order of the execution. The items must be translated into the C code in the appropriate order. First and last items in the chain have special type. They used to sign the beginning and the end of the chain only. The program items have connections. These connections are references to an other item. The object of the process collects these references in a collection. The compiler looks for the first item and starts the code generation using that item. After the code generation the compiler looks for an other item connected to the actual one. Every item must be processed only once. There are several types of the program items and some of them must be processed recursively.

Back to Grapnel Compiler.