Supporting Code Integration Activities Using Program Differencing

Printer-friendly version
Info
Context: 

Most major software projects use version control, like Subversion or Git. Especially Git, a distributed version control system, has seen growth in recent years (Skerett, 2014). Git encourages a workflow in which developers work on their own branches, their own versions of the code. Gitflow (Driessen, 2010), one of the more popular Git workflow approaches, even advocates using different branches for every feature or hotfix that needs to be implemented. Furthermore, a developer is not limited to merging in entire branches, but can also pick out those commits that seem interesting in their branch.

Sometimes, this leads to inconsistencies in the code, so called merge conflicts. Consider for example the situation in the following image where the last commit of the left branch needs to be ported to the right branch. While both branches have a common parent, they have had different changes to their source code since that point. These different changes may be inconsistent with one another.

You could place the errors that occur due to these inconsistent changes in two different categories. The first one are simple textual conflicts, occurring when the same line of source code has been changed in both branches of the code. When combining the changes, it may not be clear which version of that specific line gets priority, requiring developer intervention. The second category is more subtle and considers the changes that cause a certain change of behaviour d1 in the source branch (i.e., the original branch, in this case the left branch), but cause a different change of behaviour d2 in the target branch. This is problematic as the developer may not realise their changes have unintended side-effects until a much later point in development.

To this end, it would be useful to look not just at the difference (d1 for the source branch, d2 for the target branch) a change brings to a piece of code, but at the delta Δ(d1, d2) between the difference on each branch.

Goal & Research Activities: 

The goal of this thesis is to build tool support to help developers integrate changes from different sources. You are to look further into the problem described above from two different angles.

A first angle is from a purely syntactical level using ChangeNodes, a program developed at the Software Languages Lab by Reinout Stevens. ChangeNodes analyses two versions of a program and compares their abstract syntax trees (AST). It then describes what changed using four operations: insert, update, move, delete.

A second angle is on a behavioural level, by using a form of program analysis. An interesting starting point could be the work of Suzette Person (2008) who employs symbolic execution.

Symbolic execution (King, 1976; Cadar and Sen, 2013) is a method of statically analysing a program in order to find input that would cause the execution to follow a certain path within the program. This is done by assigning symbolic values to input. When this input is used to branch the program, for example in an if statement, constraints can be placed on the input depending on which path is followed. When reaching the end of a path, these constraints can be solved to obtain a concrete input that would guide the program down said path. Consider, for example, the following program.

  1. function sign(x) {
  2. if (x > 0) {
  3. return 1;
  4. } else if (x < 0) {
  5. return -1;
  6. } else {
  7. return 0;
  8. }
  9. }

For this program, three different paths can be distinguished depending on the value of `x`, which we assume to be numeric. Symbolic execution would use a symbolic variable X when examining this program, resulting in the following path constraints (i.e., the constraints related to a certain path). Assume RET is a special symbolic variable signifying the return value.

  1. X > 0 ∧ RET = 1
  2. X ≤ 0 ∧ X < 0 ∧ RET = -1
  3. X ≤ 0 ∧ X ≥ 0 ∧ RET = 0

Note that for the second path, X needed to first be deemed not larger than zero before reaching the second if statement.

References: 
  • Cadar, C. and Sen, K., "Symbolic Execution for Software Testing: Three Decades Later", Communications of the ACM, 2013.
  • Driessen, Vincent, "A successful Git branching model", http://nvie.com/posts/a-successful-git-branching-model/, 5 January 2010.
  • King, J. C., "Symbolic Execution and Program Testing", Communications of the ACM, 1976.
  • Person, S., Dwyer, M. B., Elbaum, S., and Pasareanu, C. S., "Differential Symbolic Execution", ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2008.
  • Skerett, Ian, "Eclipse Community Survey 2014 Results", https://ianskerrett.wordpress.com/2014/06/23/eclipse-community-survey-20..., 23 June 2014.
  • Stevens, R. and De Roover, C., "Querying the History of Software Projects using QwalKeko", IEEE International Conference on Software Maintenance and Evolution, 2014.