Matching Strategies

Support for concrete syntax in pattern specifications is not uncommon among advanced pattern detection tools. However, they perform the actual matching in a strict, syntactic manner. In our approach, template terms are matched according to multiple matching strategies that vary in leniency. Otherwise, none of the following queries could ever have any solutions in real-life. Their template terms exemplify the prototypical implementation of a pattern's characteristics. At the top, those of a regular for-statement that might be transformed to the enhanced for-statement introduced in Java 5. In the middle, those of a closed scanner that is read from in Java. At the bottom, those of the prototypical implementation of a getter method.

if jtStatement(?s){
     for(?initList; ?iterator.hasNext(); ?updList)
       ?iterator.next();
   }

if jtMethodDeclaration(?m){
     public static void main(String[] args) {
       ?scanner := new java.util.Scanner(?argList);
       ?scanner.close();
       ?scanner.next();
     }
    }

if jtClassDeclaration(?c){
     class ?className {
       private ?fieldDeclarationType ?fieldName;
       ?modifierList ?returnType ?methodName(?parameterList) {
         return ?fieldName;
       }
     }
   }

We refer to these matching strategies as example-based interpretations. They realize the example-based semantics of a template term. The following example-based interpretations are predefined:

Under the syntactic interpretation, AST nodes match a template term if they exhibit the syntactic characteristics exemplified by its code excerpt. Moreover, matching nodes should not exhibit any other syntactic characteristics. None of the template terms in the above queries would have a real-life match under this interpretation.
The lexical interpretation is a less restrictive version of the syntactic interpretation. Matches have to exhibit the syntactic characteristics exemplified by the excerpt, but are allowed to exhibit additional ones. Class declarations, for instance, are allowed to have additional declarations besides the exemplified ones. Most class declarations with a getter method would therefore match the third template term above. Likewise, for-statements with multiple statements in their body are also potential matches for the first template term.
Under the control flow interpretation, matching method declarations have to exhibit the control flow characteristics exemplified by the template term. There should be a path through the control flow graph of the method (i.e., existentially qualified) on which all exemplified instructions are executed. Non-specified instructions are allowed on the execution path. The path also crosses method boundaries (i.e., it is inter-procedural). As a result, matches for an instruction in the template term need not reside in the method declaration that matches the term. This is why the second template term above can have real-life matches. While ?m will always be bound to main(String[]), the close() and next() instructions can reside in methods called directly or transitively from main(String[]).

In template terms, multiple occurrences of the same variable express a data flow characteristic. In the first template term above, for instance, the occurrences of ?iterator require the receiver of hasNext() and next() to be the same. The occurrences of ?scanner in the second template term require the receiver of close() and next() to be the same Scanner instance. In the third template term, the occurrences of ?fieldName link the value returned by the getter method to the field it is protecting. The bindings for multiple occurrences of the same variable have to unify —which brings us to the domain-specific unification procedure of SOUL.

The SOUL program query language

Features of SOUL

Matching Strategies