Details of Pattern Specification and Installation

Downloading the project:

The jar files and the source for the pattern recognition tool can be downloaded from the following links.

Installing the Pattern Recognition Tool

Before running the pattern recognizer tool, certain extra packages must be installed on the system. As the tool is a Java-based project, a suitable version of Java depending on your platform must be installed.

Installing the C preprocessor (cpp)

The parsing process of Cetus requires the invocation of the C preprocessor (cpp) which should be installed on your system. For the Linux platform, the GCC package could be installed while for the Windows platform, the Cygwin project or the MinGW project can be used. Both the cpp and the Java commands must be accessible from the command line.

Running the tool

Browse to the folder containing the project from the command line and use the following commands to invoke the tool and the analysis package.

on Linux:

java -classpath cetus.jar:idaprt.jar dsptester.TestcaseRunner ”folder- path”

on Windows:

java -classpath cetus.jar;idaprt.jar dsptester.TestcaseRunner ”folder- path”

where folderpath is the path for the folder that contains the C source files. The tool then applies the pattern matching algorithm to any file found

in the source folder and analyzes each file by calculating certain measurements . At the end, the tool generates the Results.html file which includes the analysis results for all the files in the source folder. Apart from the analysis results, the tool creates two other folders which include the results from running the application. The cetus output folder contains the new files which have been annotated with various instances of patterns and the non detected folder includes the list of non matched statements in each file.

Including the header files

If any C file chosen for pattern recognition uses any custom library or header files, those files must be put in the same folder as the tool, otherwise the file would not pass the parsing step of the Cetus framework.

Importing the recognition tool in other Java projects: The tool can also be invoked from other java programs by importing the following files in those projects :

PatternRecognitionTool.execute(filepath);


Choosing the target architecture:

The current version of the tool can output the detected patterns to one of the following architectures:

Switching among different target architectures can be easily accomplished by opening the TargetArchitecture.xml file and modifying the id property of the defaultplatfrom tag. The value of the id property can be choosen among the name of various architectures defined by the architecture tag .

Pattern Structure

Extending the pattern specification file (PatternDefinition.xml) with a new pattern is accomplished through the specification of the XML element corresponding to the new pattern. The XML definition of patterns are based on the elements of the Cetus IR tree so that the recognition tool can easily map the nodes in the program IR to the elements in the XML definition file. We start by defining trivial patterns for the fundamental data types of Cetus such as integer, float, variable and array which are found at the leaf node level as the program Cetus IR is traversed in post-order. The tool then continues the recognition process until it reaches a root of a sub-tree, at this point, the tool tries to map between the whole sub-tree and one of patterns with the help of children patterns and the pattern hierarchy graph. The following sections describe the pattern definition task in detail.

Specifying an XML element for a pattern

For each new pattern an independent pattern XML tag is defined which includes specific properties regarding the name, the structure, the constraints and the output format of that pattern. The following figure describes the general structure of each pattern.


The definition of each property is as follows:

Examples of defining new patterns:

As this pattern basically represents a + operation between two variables, it would have two children;one for the left side of the operation and one for the right side. As we can see here in the examples, the left and the right side can be either an array, a variable or number which are represented by their own unique trivial patterns, as a result, all the occurring pattern instances of each child node are specified. For example consider the definition of the ADD2OP pattern:


As seen here, the definition shows that first child ( the left side of the add operation) and the second child can be one of the following patterns: Identifier, ArrayAcess, IntergerLiteral, FloatLiteral.

Defining a Level1/ Level2 pattern:

The Level1 and Level2 pattern represent a for-loop statement for which certain constraints have been defined. Before getting into the details of pattern definition, certain concepts have to be explained:

// the condition of the for- loop is missing

for( i=0 ; ;i++)

{

x[i]=10

}

// the variable initialization statement is missing

for(; i<10 ;i++)

{

x[i]=10;

}

// the step statement is missing

for (i=0 ; i<10)

{

x[i]=10;

}

Pattern Definition: The definition of these patterns is accomplished by setting the operator tag to the value ForLoop and the level tag to either Level1 or Level2. For patterns with the ForLoop operator tag, the program calls a certain routine to make sure that this for-loop is valid according to our own convention. The pattern definition continues with specifying the children pattern and certain constraints depending on the pattern.

For example, the definition of the VCOPY pattern is as follows:


Patterns in form of an if-statement: certain patterns are structured as if-statements, for example consider one definition of the MAXVAL pattern which is as follows:

if( a>b)

m=a;

else

m=b;

As we can see here the whole pattern is defined as an if-statement which can be specified by in the XML pattern tag by setting the operator property of the tree tag to the value “IfStatement”.

<children skipcondition="true">

Defining a help-pattern: In order to define a help-pattern, we simply set the helppattern property of the pattern tag to the value “true” in the following way:

<pattern name="" helppattern="true" >

Details of Constraints:

where the paramid property refers to the element for which this typecheck constraint is being defined and the type property refers to the expected type an can have one of the following values:

For example the typecheck constraints specified for the VCOPY pattern are as follows:


Both typecheck constraints specify that the first sub-child of the scopy child pattern ( the left side of the scopy pattern) and the second sub-child of the the scopy child pattern (the right side of scopy pattern) must be subscripted arrays on the loop-variable of this for-loop.

For example, consider the definition of the AADD pattern ( x=x+y):




As the left side of the assignment must be equal to first element on the right side, two rulecheck constraints have used. The AADD pattern has two children where the first one refers to left side of the assignment and second child refers to the right side which in itself is an instance of the ADD2OP ( x+y) pattern. The first rulecheck constraint used here is as follows:

<rulecheck type="eq">

<arg paramid="1" />

<arg paramid="2.2"/>

</rulecheck>

The first arg tag refers to the left side of the assignment and the second one refers to second child of the ADD2OP pattern (the left side of the add operation).

<indexstructure paramid="">

<indices>

<index >

<type></type>

<type></type>

</index>

</indices>

<indexrulecheck type="">

<arg paramid="" />

<arg paramid=""/>

</indexrulecheck>

<indexrulecheck type="">

<arg paramid="" />

<arg paramid=""/>

</indexrulecheck>

</indexstructure>

The inner elements of the indexstructure tag are as follows:

LV

Refers to the loop-variable of the current sub-tree for-loop.

LV_UB

Refers to the upper bound of loop-variable of the current sub-tree for-loop.

LV_LB

Refers to lower bound of the loop-variable of the current sub-tree for-loop.



<forloopcheck direction="ASC" stepincrement="1"/>

Defining a Horizontal pattern: Defining a horizontal pattern can be accomplished in the following steps:

Nomemoryaccess Constraint: of all the defined constraints, this specific constraint is used to check the occurrence of the read or write operation for certain variable in between two statements and is defined using the following constraint:

<nomemoryaccess paramid="" type="" startpoint="" endpoint=""/>

The definition of elements inside this tag is as follows:

An Example: the horizontal swap pattern tries to merge some assignment statements which have the following conditions:

x=y; \\statement 1

.....

y=z; \\ statement 2

..

z=x; \\ statement 3

As seen here, the value of variable y has been swapped with the value of variable z. This three assignments can be replaced with an instance of the swap pattern under the following conditions:

The definition of the structure and constraints of the swap pattern is as follows:


Extending the recognition tool to handle complex patterns:

While the specified XML based pattern definition syntax can be used to specify a large number of new patterns, there still exist certain complicated patterns which require unique constraints. Our tool allows manual specification of recognition rules with the help of an auxiliary class which is done in the following steps:

AuxiliaryRecognitionModule.setCustomUserConstraint(AuxiliaryClass);