Inside the mkCheckSource Parser

RADE	C++ Source Checker	Inside the mkCheckSource Parser
Testing source code in the CAA V5 environment
Technical Article

Abstract

This article describes the architectural choices made for the internal parser of CSC (C++ Source Checker). After reading it, you will better understand the technology used and the limitations inherent to this technology.

Positioning the parser in the CSC architecture
Similarities with a compiler
The technology
Limitations
In Short

Positioning the Parser in the CSC architecture

CSC is a static checker: it works on C++ source code and analyzes it to reveal potential defects and logical errors. The tool uses a C++ parser to analyze the source, generate its syntax tree, and build the symbol table. The extension modules of CSC then combine these results with pre-coded rules to report potential bugs.
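To make this division of labor concrete, here is a minimal C++ sketch of how an extension module might consume the parser's output. The types SyntaxNode, Finding and Rule, the function RunRules and the rule UninitializedLocalRule are all hypothetical: they are not CSC's actual classes, and the rule is deliberately naive.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a node of the syntax tree built by the parser.
struct SyntaxNode {
    std::string kind;   // e.g. "Block", "LocalDeclaration"
    std::string text;   // source text covered by the node
    std::vector<std::unique_ptr<SyntaxNode>> children;
};

// One potential defect reported by a check.
struct Finding {
    std::string rule;
    std::string message;
};

// A check module only reads the parser's results; it never re-parses the source.
class Rule {
public:
    virtual ~Rule() = default;
    virtual void Check(const SyntaxNode& node, std::vector<Finding>& out) const = 0;
};

// Naive example rule: a local declaration whose text has no '=' has no initializer.
class UninitializedLocalRule : public Rule {
public:
    void Check(const SyntaxNode& node, std::vector<Finding>& out) const override {
        if (node.kind == "LocalDeclaration" && node.text.find('=') == std::string::npos)
            out.push_back({"UninitializedLocal", "declaration without initializer: " + node.text});
    }
};

// Depth-first walk of the tree, applying every rule to every node.
void RunRules(const SyntaxNode& node, const std::vector<const Rule*>& rules,
              std::vector<Finding>& out) {
    for (const Rule* rule : rules) rule->Check(node, out);
    for (const auto& child : node.children) RunRules(*child, rules, out);
}
```

A real check would typically also consult the symbol table; this sketch only walks the syntax tree.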


Similarities with a compiler

The following picture describes:

the different steps in generating a standard executable program;
more precisely, the steps involved in compiling the program.

The C++ analyzer included in CSC (the engine of the tool, loosely called "the parser") is a close relative of a compiler. It performs the front-end tasks of a compiler (lexical, syntactic and semantic analysis), plus its own internal preprocessing. The back-end tasks of a compiler (code generation) are not implemented in CSC; instead, CSC runs its checks on the data structures produced by the parser.


The technology

CSC is an entirely in-house tool: no external software (such as a third-party lexical analyzer) was used to build the analyzer. This gives complete control over the parser's code, so that specific architectural choices can be implemented. These choices are a trade-off between speed and reliability.

The next picture describes the steps performed when analyzing a C++ file.

Let's take a very simple source file as an example:

CATBase.cpp

#include "inc1.h"
#include "inc2.h"
#include "inc3.h"

HRESULT CATBase::Run()
{
	HRESULT hr;
}

These steps are:

Loader

This module loads the file from disk into memory. The loader runs in a separate thread, so the rest of the program does not have to wait for file access.
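The article does not say how the loader thread is implemented. As an assumption, the sketch below uses std::async so that a file is read on another thread while the caller keeps working; LoadFile and LoadFileAsync are hypothetical names.

```cpp
#include <fstream>
#include <future>
#include <sstream>
#include <string>

// Reads a whole file into memory; returns an empty string if it cannot be opened.
std::string LoadFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Starts reading the file on a separate thread and returns immediately.
std::future<std::string> LoadFileAsync(const std::string& path) {
    return std::async(std::launch::async, LoadFile, path);
}
```

A caller would typically start LoadFileAsync for the next file, keep working on the current one, and call get() on the future only when the new buffer is actually needed.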

Include analyzer

This module analyzes the "#include ..." directives in the file and recursively loads every included file, resolving paths according to the CAA file tree.

CATBase.cpp

CATBase.cpp
  -> inc1.h
     -> inc4.h
     -> inc5.h
        -> inc9.h
        -> inc10.h
     -> inc6.h
  -> inc2.h
     -> inc7.h
     -> inc8.h
  -> inc3.h

Files are loaded depth-first, each header before the file that includes it: the first file loaded is inc4.h, then inc9.h, then inc10.h, then inc5.h, ..., and the last one is inc3.h.

All these files are scanned, parsed and semantically analyzed before the file CATBase.cpp itself.
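The loading order above is a post-order depth-first traversal of the include tree. The sketch below reproduces it on the example; IncludeGraph, VisitIncludes and ProcessingOrder are hypothetical names, and the include tree is given directly as a map instead of being read from the #include directives. The seen set also implements the rule, described under Limitations, that a header is analyzed only once per run.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: file -> files it #includes, in source order.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Post-order walk: a file is appended only after everything it includes.
// `seen` is filled before recursing, so each file is handled once and
// include cycles cannot cause infinite recursion.
void VisitIncludes(const IncludeGraph& graph, const std::string& file,
                   std::set<std::string>& seen, std::vector<std::string>& order) {
    if (!seen.insert(file).second) return;  // already handled in this run
    const auto it = graph.find(file);
    if (it != graph.end())
        for (const std::string& included : it->second)
            VisitIncludes(graph, included, seen, order);
    order.push_back(file);
}

// Order in which files are processed, ending with the root source file.
std::vector<std::string> ProcessingOrder(const IncludeGraph& graph, const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    VisitIncludes(graph, root, seen, order);
    return order;
}
```

On the CATBase.cpp tree above, this yields inc4.h, inc9.h, inc10.h, inc5.h, inc6.h, inc1.h, inc7.h, inc8.h, inc2.h, inc3.h and finally CATBase.cpp.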

Scan and macro expansion

This is a linear pass: the module scans the current file and keeps only the useful characters. In this step, all preprocessor directives are removed, as are all line breaks and unneeded blanks.

Macro expansion is not systematic: only the macros declared in the setting files are processed. These macros are expanded as follows: first, the source code is scanned to find all their occurrences. Then, for each occurrence, the macro definition is retrieved and expanded, taking any arguments into account. Finally, the occurrence of the macro is replaced by the expanded code.
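A minimal sketch of this substitution, assuming the macro table has already been read from the setting file (MacroDef and ExpandMacros are hypothetical names). Unlike a real C preprocessor, it performs a single pass: the expanded text is not rescanned, and the # and ## operators are not supported.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// A macro as it could be declared in the setting file (hypothetical representation).
struct MacroDef {
    std::vector<std::string> params;  // parameter names of a function-like macro
    std::string body;                 // replacement text
    bool functionLike = false;
};

static bool IsIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

static std::string Trim(const std::string& s) {
    const size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Replaces each whole-identifier parameter name in the body with its argument.
static std::string Substitute(const MacroDef& m, const std::vector<std::string>& args) {
    std::string out;
    for (size_t i = 0; i < m.body.size();) {
        if (!IsIdentChar(m.body[i])) { out += m.body[i++]; continue; }
        size_t j = i;
        while (j < m.body.size() && IsIdentChar(m.body[j])) ++j;
        std::string word = m.body.substr(i, j - i);
        for (size_t p = 0; p < m.params.size() && p < args.size(); ++p)
            if (word == m.params[p]) { word = args[p]; break; }
        out += word;
        i = j;
    }
    return out;
}

// Single left-to-right pass: each occurrence of a declared macro is replaced by its
// expansion. A function-like macro name must be immediately followed by '('.
std::string ExpandMacros(const std::string& src, const std::map<std::string, MacroDef>& macros) {
    std::string out;
    size_t i = 0;
    while (i < src.size()) {
        if (!IsIdentChar(src[i])) { out += src[i++]; continue; }
        size_t j = i;
        while (j < src.size() && IsIdentChar(src[j])) ++j;
        const std::string word = src.substr(i, j - i);
        const auto it = macros.find(word);
        if (it == macros.end()) { out += word; i = j; continue; }  // not a declared macro
        const MacroDef& m = it->second;
        if (!m.functionLike) { out += m.body; i = j; continue; }   // object-like macro
        if (j >= src.size() || src[j] != '(') { out += word; i = j; continue; }
        // Collect the arguments, splitting only on commas outside nested parentheses.
        std::vector<std::string> args(1);
        int depth = 0;
        size_t k = j + 1;
        for (; k < src.size(); ++k) {
            const char c = src[k];
            if (c == ')' && depth == 0) break;
            if (c == ',' && depth == 0) { args.emplace_back(); continue; }
            if (c == '(') ++depth;
            if (c == ')') --depth;
            args.back() += c;
        }
        if (k >= src.size()) { out += word; i = j; continue; }  // unbalanced: leave as-is
        for (std::string& a : args) a = Trim(a);
        out += Substitute(m, args);
        i = k + 1;  // skip the closing ')'
    }
    return out;
}
```

For instance, with a hypothetical function-like macro MY_SUCCEEDED(x) declared with the body ((x)>=0), the buffer if(MY_SUCCEEDED(hr)) becomes if(((hr)>=0)).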

After this step, the program holds in memory a single line representing the whole source file:

HRESULT CATBase::Run(){HRESULT hr;}
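The following sketch reproduces this scan on the example file. Scan is a hypothetical function, and its rule for deciding which blanks are "unneeded" is our assumption: a blank is kept only where dropping it would glue two identifiers or two operator characters together (so that a + +b does not become a++b). Comments, string literals and backslash line continuations are not handled.

```cpp
#include <cctype>
#include <sstream>
#include <string>

static bool IsWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// A blank between a and b is needed only if removing it would merge two tokens.
static bool NeedsBlank(char a, char b) {
    static const std::string ops = "+-*/%&|<>=!:.";
    const bool aOp = ops.find(a) != std::string::npos;
    const bool bOp = ops.find(b) != std::string::npos;
    return (IsWordChar(a) && IsWordChar(b)) || (aOp && bOp);
}

// Linear pass: drops preprocessor lines, line breaks and unneeded blanks.
std::string Scan(const std::string& source) {
    std::istringstream lines(source);
    std::string line, out;
    bool pendingBlank = false;
    while (std::getline(lines, line)) {
        const size_t first = line.find_first_not_of(" \t\r");
        if (first != std::string::npos && line[first] == '#') continue;  // preprocessor directive
        for (char c : line) {
            if (std::isspace(static_cast<unsigned char>(c))) { pendingBlank = true; continue; }
            if (pendingBlank && !out.empty() && NeedsBlank(out.back(), c)) out += ' ';
            pendingBlank = false;
            out += c;
        }
        pendingBlank = true;  // a line break separates tokens like a blank does
    }
    return out;
}
```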

Parse

This step creates the parse tree of the source from the buffer and the lexical tokens.

The method used is predictive recursive-descent parsing driven by operator precedence. No backtracking and no error recovery are performed in this step.

Declaration  HRESULT CATBase::Run()
       TAPExpression [::]   HRESULT CATBase::Run()
         TAPExpression [dc]   HRESULT CATBase
           TAPExpression [id]   HRESULT
             TAPIdentifiant  HRESULT
           TAPExpression [id]   CATBase
             TAPIdentifiant  CATBase
         TAPExpression [mt]   Run()
           TAPExpression [id]   Run
             TAPIdentifiant  Run
     TAPBloc
       TAPInstruction  HRESULT hr
         TAPExpression [dc]   HRESULT hr
           TAPExpression [id]   HRESULT
             TAPIdentifiant  HRESULT
           TAPExpression [id]   hr
             TAPIdentifiant  hr
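Precedence climbing is one common way to write a predictive recursive-descent parser driven by operator precedence. The sketch below applies it to simple, blank-free expressions made of identifiers, parentheses and the operators =, +, -, * and /; it illustrates the technique under those assumptions and is not the CSC grammar. Expr, ExprParser and ToSExpr are hypothetical names. As described above, the next character alone decides every step (no backtracking), and the first error throws an exception (no error recovery).

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Expression tree node: a leaf holds an identifier, an inner node a binary operator.
struct Expr {
    std::string op;                  // empty for a leaf
    std::string name;                // identifier, for a leaf
    std::unique_ptr<Expr> lhs, rhs;  // operands, for an operator node
};

class ExprParser {
public:
    explicit ExprParser(std::string text) : s_(std::move(text)) {}

    // Parses the whole input; any error throws immediately (no recovery).
    std::unique_ptr<Expr> Parse() {
        std::unique_ptr<Expr> e = ParseBinary(0);
        if (pos_ != s_.size())
            throw std::runtime_error("unexpected character '" + std::string(1, s_[pos_]) + "'");
        return e;
    }

private:
    // Higher value binds tighter; -1 means "not a binary operator".
    static int Precedence(char c) {
        switch (c) {
            case '=': return 1;  // right-associative
            case '+': case '-': return 2;
            case '*': case '/': return 3;
            default: return -1;
        }
    }

    // primary := identifier | '(' expression ')'
    std::unique_ptr<Expr> ParsePrimary() {
        if (pos_ < s_.size() && s_[pos_] == '(') {
            ++pos_;
            std::unique_ptr<Expr> inner = ParseBinary(0);
            if (pos_ >= s_.size() || s_[pos_] != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        const size_t start = pos_;
        while (pos_ < s_.size() &&
               (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_'))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected identifier");
        auto leaf = std::make_unique<Expr>();
        leaf->name = s_.substr(start, pos_ - start);
        return leaf;
    }

    // Precedence climbing: keep consuming operators that bind at least as tightly as minPrec.
    std::unique_ptr<Expr> ParseBinary(int minPrec) {
        std::unique_ptr<Expr> lhs = ParsePrimary();
        while (pos_ < s_.size()) {
            const char op = s_[pos_];
            const int prec = Precedence(op);
            if (prec < 0 || prec < minPrec) break;
            ++pos_;
            auto node = std::make_unique<Expr>();
            node->op = std::string(1, op);
            node->lhs = std::move(lhs);
            node->rhs = ParseBinary(op == '=' ? prec : prec + 1);  // left-assoc except '='
            lhs = std::move(node);
        }
        return lhs;
    }

    std::string s_;
    size_t pos_ = 0;
};

// Prints the tree in prefix form, e.g. "(+ a (* b c))".
std::string ToSExpr(const Expr& e) {
    if (e.op.empty()) return e.name;
    return "(" + e.op + " " + ToSExpr(*e.lhs) + " " + ToSExpr(*e.rhs) + ")";
}
```

For example, a+b*c is parsed as (+ a (* b c)), while a+*b is rejected at the first unexpected character.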

Semantic analysis

This analysis builds the symbol table of the source. It is not a symbol table in the usual sense of the term, but rather a data structure holding the definitions found in the source (class definitions, method declarations, method implementations, data member declarations, variable declarations, ...), together with all the links between them.
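As an illustration of such links between definitions, the sketch below records classes with their method declarations and connects each method implementation (such as CATBase::Run) to its declaration in both directions. All type and member names are hypothetical, and methods are keyed by name only, so overloads are not distinguished.

```cpp
#include <map>
#include <string>
#include <vector>

struct MethodImpl;

// A method as declared in its class.
struct MethodDecl {
    std::string name;
    std::string returnType;
    const MethodImpl* impl = nullptr;  // link: declaration -> implementation
};

// A class definition with its members.
struct ClassDef {
    std::string name;
    std::map<std::string, MethodDecl> methods;  // keyed by method name
    std::vector<std::string> dataMembers;
};

// A method body found in a .cpp file, such as CATBase::Run.
struct MethodImpl {
    std::string className;
    std::string name;
    std::vector<std::string> localVariables;
    const MethodDecl* decl = nullptr;  // link: implementation -> declaration
};

struct DefinitionTable {
    std::map<std::string, ClassDef> classes;

    // Connects an implementation to the declaration found in its class.
    // Returns false if the class or the method was never declared.
    bool Link(MethodImpl& impl) {
        const auto c = classes.find(impl.className);
        if (c == classes.end()) return false;
        const auto m = c->second.methods.find(impl.name);
        if (m == c->second.methods.end()) return false;
        m->second.impl = &impl;
        impl.decl = &m->second;
        return true;
    }
};
```

The stored pointers stay valid because std::map does not relocate its elements; the MethodImpl objects must outlive the table.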


Limitations

preprocessing

For performance reasons, no real preprocessing is done.

Preprocessor directives (such as #ifdef, #ifndef, #else, #endif) are not interpreted: the whole code, including every conditional branch, is taken into account.

Included files are analyzed recursively, without being expanded into the including file: each included file is analyzed only once per run, no matter how many files include it. This is quite similar to the incremental compilation done by MSDev.

Nevertheless, some macros can be expanded in the source code if needed; these macros must be declared in the setting file. In addition, some checks need to take certain other macros into account; these macros are well defined, and their behavior is pre-coded in the tool.

templates

Templates are not supported (templates are forbidden in CAA).

operator overloading

Operator overloading is not supported.

multiple definitions

Multiple definitions are not supported: only the first definition is taken into account.
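The two limitations above (directives are not interpreted, and only the first definition is kept) can combine. In the fragment below, a compiler builds only the #else version of Compute(), whereas CSC, reading both branches, would presumably keep the first definition it meets, which is the version that is never compiled. USE_FAST_PATH and Compute are illustrative names.

```cpp
// Hypothetical configuration macro; it is deliberately left undefined here.
// #define USE_FAST_PATH

// A compiler keeps only the #else branch. CSC does not interpret #ifdef/#else/#endif,
// so it sees both definitions of Compute(); keeping only the first one, its checks
// would apply to the USE_FAST_PATH version instead of the compiled one.
#ifdef USE_FAST_PATH
int Compute() { return 1; }
#else
int Compute() { return 2; }
#endif
```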

inner classes

Inner classes are not supported (they are not seen by the semantic analyzer, and not kept in the data structure).

file tree

Only the CAA file tree is supported (the structure made of frameworks, modules, Interfaces directories, ...).
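Include resolution presumably relies on the fixed directory names of the CAA file tree, which is why other layouts are not supported. The sketch below shows how candidate locations for an included header might be built from that layout. The search order (the module's LocalInterfaces first, then the PublicInterfaces, ProtectedInterfaces and PrivateInterfaces directories of each framework in a given list) is an assumption for illustration only, as are the names CandidateHeaderPaths and ResolveInclude.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Builds the list of places where an included header is looked for.
// Assumed layout: <workspace>/<framework>/<module>/LocalInterfaces and
// <workspace>/<framework>/{Public,Protected,Private}Interfaces.
std::vector<fs::path> CandidateHeaderPaths(const fs::path& workspace,
                                           const std::string& currentFramework,
                                           const std::string& currentModule,
                                           const std::vector<std::string>& frameworks,
                                           const std::string& header) {
    std::vector<fs::path> candidates;
    // Headers local to the module being analyzed.
    candidates.push_back(workspace / currentFramework / currentModule / "LocalInterfaces" / header);
    // Interface directories of each framework, in the order given.
    for (const std::string& fw : frameworks)
        for (const char* dir : {"PublicInterfaces", "ProtectedInterfaces", "PrivateInterfaces"})
            candidates.push_back(workspace / fw / dir / header);
    return candidates;
}

// Returns the first candidate that exists on disk, or an empty path if none does.
fs::path ResolveInclude(const std::vector<fs::path>& candidates) {
    for (const fs::path& p : candidates)
        if (fs::exists(p)) return p;
    return {};
}
```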


In Short

The source analyzer is the engine of CSC; it is equivalent to the front end of a compiler. Because the architecture favors speed of execution, only limited preprocessing is done (handling of included files and expansion of some macros). For that reason, it is strongly recommended not to use macros (other than those provided by CAA) when coding a CAA application. If needed, however, a list of macros to be expanded by the tool can be declared in the setting file [1] (see section Global_OptionsLists - Macro2Expand).


References

[1]	Setting Files

History

Version: 1 [Apr 2001]	Document created