Features » The SOUL program query language

Motivating Example: Pattern Detection

SOUL can easily be adopted as an advanced pattern detection tool. In theory, the declarative nature of logic programming should result in descriptive pattern specifications. In practice, however, we have observed that logic program queries tend to become convoluted and operational in nature when they are used to detect pattern instances.

To illustrate this problem, consider a pattern that describes a coding convention in which each subclass of Component (except the one named Composite) is required to define a method acceptVisitor(ComponentVisitor) that prints a message before double dispatching to its parameter. The prototypical implementation of such a method is depicted below:

public class PrototypicalLeaf extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Prototypical.");
    v.visitPrototypicalLeaf(this);
  }
}

The following SOUL query can be used to detect methods that adhere to this coding convention (cf. the syntax of SOUL):

if ?type isTypeWithFullyQualifiedName: ['presentation.Component'],
   ?class extendsType: ?type,
   not(?class classDeclarationHasName: simpleName(['Composite'])),
   
   ?class definesMethod: ?m,
   ?m methodDeclarationHasName: simpleName(['acceptVisitor']),
   ?m methodDeclarationHasParameters: nodeList(<?p>),
   ?p singleVariableDeclarationHasName: simpleName(?id),
   
   ?m methodDeclarationHasBody: ?body,
   ?body equals: block(nodeList(<expressionStatement(?log),expressionStatement(?dd)>)),
   or(?so equals: qualifiedName(simpleName(['System']),simpleName(['out'])),
      ?so equals: fieldAccess(simpleName(['System']),simpleName(['out']))),
   ?log equals: methodInvocation(?so,?,simpleName(['println']),?s),
   ?dd equals: methodInvocation(simpleName(?id),?,?,nodeList(<thisExpression([nil])>))

Using the predicates from the CAVA library, this logic program query quantifies explicitly over the abstract syntax trees of the program under investigation. In each solution to the query, logic variable ?class is bound to a class declaration that complies with the coding convention. Positive about this query is that the conditions on lines 6—15 merely have to be negated to find class declaration AST nodes that violate the coding convention. Unfortunately, the query is convoluted, requires expert knowledge to understand and only recalls the prototypical implementation of the pattern:

The query is convoluted. Lines 1—5 identify a method ?m defined in a class ?class that extends the fully qualified type example.Component and is not named Composite. Lines 6—8 ensure that method ?m is named acceptVisitor and has a single parameter that can be referred to by the name ?id in its body ?body. Lines 10—15 ensure that this body consists of two statements wrapping the actual expressions ?log and ?dd that perform the logging and the double dispatching respectively. Line 15 specifies how this double dispatching has to be implemented: by means of a method invocation of which the receiver is named ?id (i.e., the name of the parameter of the method) and of which the single argument is an unqualified this-expression (i.e., this).
Expert knowledge is required to understand this query. First of all, one must know how the program representation is reified in order to quantify over it in a logic query. In this case, AST nodes seem to be reified as compound terms (e.g., simpleName(?id)) of which each argument is a reified child node. Second, one must know how the relations between reified program elements are reified (e.g., the two-argument predicate methodDeclarationHasName:/2 from the CAVA library). Finally, one has to be aware of the intricate details of the program representation itself. In this case, the details of the abstract grammar for the actual AST nodes (e.g., expressions within a block are wrapped in an expression statement) and the parser that produced them (e.g., System.out can be parsed as a field access as well as a qualified reference). Both stem from the implementation of the Eclipse IDE.
The query does not recall implementation variants of the pattern, such as the following:
```
public class SuperLogLeaf extends OnlyLoggingLeaf {
  public void acceptVisitor(ComponentVisitor v) {
    super.acceptVisitor(v);
    v.visitSuperLogLeaf(this);
  }
}

public class MustAliasLeaf extends Component {
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("Must alias.");
    Component temp = this;
    v.visitMustAliasLeaf(temp);
  }
}

public class MayAliasLeaf extends Component {
  public Object m(Object o) { 
    if(getInput() % 2 == 0) 
      return o;
    else 
      return new MayAliasLeaf();
    }
  public void acceptVisitor(ComponentVisitor v) {
    System.out.println("May alias.");
    v.visitMayAliasLeaf((MayAliasLeaf)m(this));
  }
} 
```
Each class represents an implementation variant of the pattern. In class SuperLogLeaf, for instance, the required logging is performed through a super invocation. It is, however, not included in the results for the logic program query. In class MustAliasLeaf, the double dispatching occurs in all executions of the program. In class MayAliasLeaf, in contrast, the required double dispatching only occurs when the user inputs an even number. While all of these classes adhere to the coding convention to some extent, none of them are detected by the above logic program query.

This is because the query expresses the control flow (i.e., the logging should occur before the double dispatching) and data flow characteristics (i.e., the receiver of the method invocation should be the method's parameter and the argument of the method invocation should be the current object) of a method that adheres to the coding convention in terms of the structural characteristics of its AST nodes. The method's body, for instance, is required to consist of exactly two statements (line 11 of the query) rather than an arbitrary block of statements that will successively perform the logging and the double dispatching at run-time. The argument to the second statement is, moreover, required to be a this-expression rather than an arbitrary expression that evaluates to the current object. Clearly, enumerating the structural characteristics of each implementation variant of a behavioral characteristic is infeasible. Alternatively, the control flow characteristics of a complying methods could be expressed by quantifying over the paths through its control flow graph. The data flow characteristics could be expressed by quantifying over the results of analyses that compute this information (e.g., an alias analysis) —which are much more intricate than those of AST nodes illustrated above.

The SOUL program query language

Features of SOUL

Motivating Example: Pattern Detection