RADE
C++ Source Checker
Inside the mkCheckSource Parser
Testing source code in the CAA V5 environment
Technical Article
Abstract
This article describes the architectural choices made for the internal parser of CSC. Reading it will help you understand the technology used and the limitations inherent to this technology.
CSC is a static checker: a tool that works on C++ source code and analyzes it to reveal potential defects and logical errors. It uses a C++ parser to analyze the source, generate its syntax tree, and create the symbol table. The extension modules of CSC then use these results together with pre-coded rules to report potential bugs.
The following picture describes :
The C++ analyzer included in CSC (the engine of the tool, which we loosely call "the parser") is a close relative of a compiler. This parser performs the front-end tasks of a compiler, plus some internal preprocessing. The final tasks of a compiler are not implemented in CSC; instead, CSC uses the data structures generated by the parser to implement its checks.
CSC is a fully internal tool. No external software (such as a lexical analyzer) was used to build the analyzer. This gives us complete control over the parser code and lets us implement our own architectural choices. These choices are a compromise between speed and reliability.
The next picture describes the steps that occur when a C++ file is analyzed.
Let's take an example with a very simple source file:
CATBase.cpp
    #include "inc1.h"
    #include "inc2.h"
    #include "inc3.h"

    HRESULT CATBase::Run()
    {
      HRESULT hr;
    }

These steps are:
This module loads the file from disk into memory. The file loader runs in a separate thread, so the program does not have to wait for file access.
This module analyzes the #include directives in the file and recursively loads every included file, according to the CAA file tree.
    CATBase.cpp
      -> inc1.h
           -> inc4.h
           -> inc5.h
                -> inc9.h
                -> inc10.h
           -> inc6.h
      -> inc2.h
           -> inc7.h
           -> inc8.h
      -> inc3.h
The first file loaded will be inc4.h, then inc9.h, then inc10.h, then inc5.h, ... and the last one will be inc3.h.
All these files will be scanned, parsed and semantically analyzed before file CATBase.cpp.
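The loading order described above corresponds to a depth-first, post-order walk of the include graph. The following is a minimal sketch of that idea, not CSC's actual code. The names `IncludeGraph`, `loadOrder` and `exampleGraph` are hypothetical, and the nesting of the headers (inc1.h including inc4.h, inc5.h and inc6.h; inc5.h including inc9.h and inc10.h; inc2.h including inc7.h and inc8.h) is an assumption inferred from the loading order given in the text. Real file loading is replaced by a lookup in an in-memory map.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical include graph: file name -> files it includes, in source order.
using IncludeGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first, post-order walk: a file is appended to `order` only after
// every file it includes has been appended. `seen` guarantees that a file
// is handled once per run, however many files include it.
void collectLoadOrder(const IncludeGraph& graph,
                      const std::string& file,
                      std::set<std::string>& seen,
                      std::vector<std::string>& order)
{
    if (!seen.insert(file).second) {
        return;  // already reached through another include path
    }
    const auto it = graph.find(file);
    if (it != graph.end()) {
        for (const std::string& included : it->second) {
            collectLoadOrder(graph, included, seen, order);
        }
    }
    order.push_back(file);
}

// Convenience wrapper returning the complete order for one source file.
std::vector<std::string> loadOrder(const IncludeGraph& graph,
                                   const std::string& root)
{
    std::set<std::string> seen;
    std::vector<std::string> order;
    collectLoadOrder(graph, root, seen, order);
    return order;
}

// Include graph of the CATBase.cpp example (header nesting is assumed).
IncludeGraph exampleGraph()
{
    return {
        {"CATBase.cpp", {"inc1.h", "inc2.h", "inc3.h"}},
        {"inc1.h", {"inc4.h", "inc5.h", "inc6.h"}},
        {"inc5.h", {"inc9.h", "inc10.h"}},
        {"inc2.h", {"inc7.h", "inc8.h"}},
    };
}
```

Under these assumptions, `loadOrder(exampleGraph(), "CATBase.cpp")` yields inc4.h, inc9.h, inc10.h, inc5.h, inc6.h, inc1.h, inc7.h, inc8.h, inc2.h, inc3.h and finally CATBase.cpp. Marking a file as seen before descending into it also keeps the walk finite if two headers include each other.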
Scan and macro expansion
Linear analysis. This module scans the current file to keep only the useful characters. In this step, every preprocessor command is removed, as well as every line break and unneeded blank.
Macro expansion is not systematic: only the macros declared in the setting files are processed. These macros are expanded as follows. First, the source code is scanned to find all their occurrences. Then, for each occurrence, the macro definition is retrieved and the expansion is done, taking into account any arguments. Finally, the occurrence of the macro is replaced by the corresponding expanded code.
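The selective expansion described above can be sketched as follows. This is only an illustration under stated assumptions, not the CSC implementation: `MacroDef`, `MacroTable` and `expandMacros` are hypothetical names, the table stands in for whatever the setting file declares, and the expanded text is not rescanned for further macros. String literals and comments are not treated specially.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one macro declared in the setting file.
struct MacroDef {
    std::vector<std::string> params;  // formal parameter names, may be empty
    std::string body;                 // replacement text
};
using MacroTable = std::map<std::string, MacroDef>;

static bool isIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Remove leading and trailing blanks from one macro argument.
static std::string trim(const std::string& s)
{
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(" \t") - first + 1);
}

// Copy the macro body, replacing each parameter name by its argument.
static std::string substitute(const MacroDef& def,
                              const std::vector<std::string>& args)
{
    std::string out;
    std::size_t i = 0;
    while (i < def.body.size()) {
        if (!isIdentChar(def.body[i])) { out += def.body[i++]; continue; }
        std::size_t j = i;
        while (j < def.body.size() && isIdentChar(def.body[j])) ++j;
        std::string word = def.body.substr(i, j - i);
        for (std::size_t p = 0; p < def.params.size() && p < args.size(); ++p) {
            if (def.params[p] == word) { word = args[p]; break; }
        }
        out += word;
        i = j;
    }
    return out;
}

// Expand only the macros present in `macros`; everything else is copied.
std::string expandMacros(const std::string& src, const MacroTable& macros)
{
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (!isIdentChar(src[i])) { out += src[i++]; continue; }
        std::size_t j = i;
        while (j < src.size() && isIdentChar(src[j])) ++j;
        const std::string word = src.substr(i, j - i);
        i = j;
        const auto it = macros.find(word);
        if (it == macros.end()) { out += word; continue; }  // undeclared macro

        // Collect the arguments, splitting on commas at nesting depth 1.
        std::vector<std::string> args;
        if (!it->second.params.empty() && i < src.size() && src[i] == '(') {
            int depth = 1;
            std::string current;
            ++i;  // skip the opening parenthesis
            while (i < src.size() && depth > 0) {
                const char c = src[i++];
                if (c == '(') { ++depth; current += c; }
                else if (c == ')') { if (--depth > 0) current += c; }
                else if (c == ',' && depth == 1) {
                    args.push_back(trim(current));
                    current.clear();
                }
                else { current += c; }
            }
            args.push_back(trim(current));
        }
        out += substitute(it->second, args);
    }
    return out;
}
```

For instance, with a hypothetical macro `MY_MAX(a,b)` declared with the body `((a)>(b)?(a):(b))`, the text `int m=MY_MAX(x,y+1);` becomes `int m=((x)>(y+1)?(x):(y+1));`, while a call to any macro absent from the table is copied unchanged.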
After this step, the program holds in memory a single line representing the whole source file.
    HRESULT CATBase::Run(){HRESULT hr;}
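As an illustration of this scanning step, here is a minimal sketch that drops preprocessor lines and unneeded blanks and returns a single line. The function name `scanToSingleLine` is hypothetical, and the rule used to decide which blanks are needed (keep one blank only between two identifier characters) is an assumption, not CSC's documented behavior. Comments, string literals and directives continued with a backslash are not handled.

```cpp
#include <cctype>
#include <sstream>
#include <string>

static bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Reduce a source file to one line: preprocessor lines are dropped, line
// breaks disappear, and a blank survives only where it separates two
// identifier characters (assumed rule).
std::string scanToSingleLine(const std::string& source)
{
    // Pass 1: skip empty lines and lines whose first visible character is '#'.
    std::istringstream in(source);
    std::string line;
    std::string joined;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') {
            continue;
        }
        joined += line;
        joined += ' ';  // the line break becomes a candidate blank
    }

    // Pass 2: keep a single blank only between two word characters.
    std::string out;
    bool pendingBlank = false;
    for (const char c : joined) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            pendingBlank = true;
            continue;
        }
        if (pendingBlank && !out.empty() && isWordChar(out.back()) && isWordChar(c)) {
            out += ' ';
        }
        pendingBlank = false;
        out += c;
    }
    return out;
}
```

Applied to the CATBase.cpp example, this sketch returns `HRESULT CATBase::Run(){HRESULT hr;}`. The blank rule is deliberately crude: it would, for example, merge `a + +b` into `a++b`, which a real scanner has to avoid.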
Parse
This step creates the parse tree of the source from the buffer and the lexical tokens.
The method used is predictive recursive-descent parsing driven by operator precedence. Neither backtracking nor error recovery is done in this step.
    Declaration HRESULT CATBase::Run()
      TAPExpression [::] HRESULT CATBase::Run()
        TAPExpression [dc] HRESULT CATBase
          TAPExpression [id] HRESULT
            TAPIdentifiant HRESULT
          TAPExpression [id] CATBase
            TAPIdentifiant CATBase
        TAPExpression [mt] Run()
          TAPExpression [id] Run
            TAPIdentifiant Run
      TAPBloc
        TAPInstruction HRESULT hr
          TAPExpression [dc] HRESULT hr
            TAPExpression [id] HRESULT
              TAPIdentifiant HRESULT
            TAPExpression [id] hr
              TAPIdentifiant hr
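To make the parsing technique concrete, here is a small sketch of a predictive recursive-descent parser that uses operator precedence, with no backtracking and no error recovery (the first error throws). It is not the CSC grammar: it only handles identifiers, parentheses and the four arithmetic operators, and the names `Node`, `ExprParser` and `toString` are hypothetical. The input is assumed to contain no blanks, as it would after the scan step.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct Node {
    std::string label;            // operator symbol or identifier text
    std::unique_ptr<Node> left;   // null for a leaf
    std::unique_ptr<Node> right;  // null for a leaf
};

class ExprParser {
public:
    explicit ExprParser(std::string text) : text_(std::move(text)) {}

    // Parse the whole input; any leftover character is an error.
    std::unique_ptr<Node> parse()
    {
        auto tree = parseBinary(1);
        if (pos_ != text_.size()) throw std::runtime_error("unexpected character");
        return tree;
    }

private:
    static int precedence(char op)
    {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return 0;  // not a binary operator
        }
    }

    // primary := identifier | '(' expression ')'
    std::unique_ptr<Node> parsePrimary()
    {
        if (pos_ < text_.size() && text_[pos_] == '(') {
            ++pos_;
            auto inner = parseBinary(1);
            if (pos_ >= text_.size() || text_[pos_] != ')') {
                throw std::runtime_error("expected ')'");
            }
            ++pos_;
            return inner;
        }
        const std::size_t start = pos_;
        while (pos_ < text_.size() &&
               (std::isalnum(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '_')) {
            ++pos_;
        }
        if (start == pos_) throw std::runtime_error("expected identifier");
        auto leaf = std::make_unique<Node>();
        leaf->label = text_.substr(start, pos_ - start);
        return leaf;
    }

    // One character of lookahead decides what to do: no backtracking.
    std::unique_ptr<Node> parseBinary(int minPrec)
    {
        auto left = parsePrimary();
        while (pos_ < text_.size()) {
            const char op = text_[pos_];
            const int prec = precedence(op);
            if (prec < minPrec) break;  // also stops on ')' and unknown characters
            ++pos_;
            auto right = parseBinary(prec + 1);  // prec + 1: left-associative
            auto node = std::make_unique<Node>();
            node->label = std::string(1, op);
            node->left = std::move(left);
            node->right = std::move(right);
            left = std::move(node);
        }
        return left;
    }

    std::string text_;
    std::size_t pos_ = 0;
};

// Render the tree in prefix form, e.g. "(+ a (* b c))".
std::string toString(const Node& n)
{
    if (!n.left) return n.label;
    return "(" + n.label + " " + toString(*n.left) + " " + toString(*n.right) + ")";
}
```

With this sketch, `toString(*ExprParser("a+b*c").parse())` gives `(+ a (* b c))`, and an incomplete input such as `a+` throws a `std::runtime_error` instead of attempting to recover.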
The semantic analysis builds the symbol table of the source. This is not a symbol table in the usual sense of the term, but rather a data structure holding the definitions found in the source (such as class definitions, method declarations, method implementations, data member declarations, variable declarations, ...), with all the links between them.
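A data structure of this kind might look like the following sketch. It is purely illustrative: the types `MethodEntry`, `ClassEntry` and `DefinitionTable` are hypothetical and much simpler than whatever CSC builds internally. Methods are keyed by name only, so overloads are not distinguished, which is a simplification of this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// One method of a class. Declaration and implementation are recorded on the
// same entry, which is what links the two together.
struct MethodEntry {
    std::string returnType;
    bool declared = false;     // seen inside a class definition
    bool implemented = false;  // body seen in a source file
};

// One class with its data members and methods.
struct ClassEntry {
    std::vector<std::string> dataMembers;
    std::map<std::string, MethodEntry> methods;  // keyed by method name
};

class DefinitionTable {
public:
    void addDataMember(const std::string& cls, const std::string& member)
    {
        classes_[cls].dataMembers.push_back(member);
    }

    void addMethodDeclaration(const std::string& cls, const std::string& method,
                              const std::string& returnType)
    {
        MethodEntry& entry = classes_[cls].methods[method];
        if (entry.returnType.empty()) entry.returnType = returnType;
        entry.declared = true;
    }

    void addMethodImplementation(const std::string& cls, const std::string& method,
                                 const std::string& returnType)
    {
        MethodEntry& entry = classes_[cls].methods[method];
        if (entry.returnType.empty()) entry.returnType = returnType;
        entry.implemented = true;
    }

    // Returns nullptr when the class or the method is unknown.
    const MethodEntry* findMethod(const std::string& cls,
                                  const std::string& method) const
    {
        const auto c = classes_.find(cls);
        if (c == classes_.end()) return nullptr;
        const auto m = c->second.methods.find(method);
        return m == c->second.methods.end() ? nullptr : &m->second;
    }

private:
    std::map<std::string, ClassEntry> classes_;
};
```

In the example, analyzing the header that declares `CATBase::Run` would call `addMethodDeclaration("CATBase", "Run", "HRESULT")`, and analyzing CATBase.cpp would call `addMethodImplementation` with the same arguments. A check can then ask through `findMethod` whether a declared method has an implementation.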
For performance reasons, no real preprocessing is done.
Preprocessor directives (such as #ifdef, #ifndef, #else, #endif...) are not interpreted. The whole code is taken into account.
Included files are analyzed recursively, without being expanded into the including file. Each included file is analyzed only once per run, no matter how many files include it; this is quite similar to the incremental compilation done by MSDev.
Nevertheless, if needed, some macros in the source code can be expanded. These macros must be declared in the setting file. In addition, some checks need to take certain other macros into account. Those macros are well defined, and their behavior is pre-coded.
Templates are not supported (templates are forbidden in CAA).
Operator overloading is not supported.
Multiple definitions are not supported. Only the first definition will be taken into account.
Inner classes are not supported (they are not seen by the semantic analyzer, and not kept in the data structure).
Only the CAA file tree is supported (a structure with frameworks, modules, Interfaces directories...).
The source analyzer is the engine of CSC. It is equivalent to the front end of a compiler. The architectural choices, which favor execution speed, imply that little preprocessing is done (only included files and the expansion of some macros). For that reason, it is strongly recommended to avoid macros (except the macros provided in CAA) when coding a CAA application. Nevertheless, if needed, a list of macros to be expanded by the tool can be declared in the setting file [1] (see section Global_OptionsLists - Macro2Expand).
[1] Setting Files
Version: 1 [Apr 2001] | Document created
Copyright © 2000, Dassault Systèmes. All rights reserved.