Commit Untangling Using Fine-Grained Source Code Changes
It is a best practice that a single commit corresponds to a single development activity, such as adding a feature, fixing a bug, or performing a refactoring. This has several benefits: such a commit is easier to understand, to revert in case it turns out to be unneeded, and to apply to a different branch or port of the system. In open-source development of large systems, contributed patches are often only accepted if they adhere to this best practice. Unfortunately, most commits do not adhere to it. One solution to this problem is an automated approach that splits a single commit into multiple ones, such that each new commit contains only a single development activity. Several approaches exist that do this by defining relationships between the modified lines in a commit. For example, if one line uses a particular variable and a different line introduces that variable, both lines belong to the same development activity.
In this proposal we are interested in determining whether using more fine-grained source code changes can increase the accuracy of such an approach. Fine-grained source code changes are the operations performed at the level of the Abstract Syntax Tree (AST). Such an operation can be the insertion of a node, an update of the value of a node, the removal of a subtree, or the move of a node to a different location. To retrieve such fine-grained source code changes, either a change logger or a change distiller can be used. The former records all operations a developer performs inside their IDE, while the latter is an algorithmic approach that takes two revisions of a file as input and outputs a list of changes. The advantage of a logger is that it produces an accurate overview of the actions performed by the developer, but it needs to be installed before the changes are made. The advantage of a distiller is that it can be used on any software project that is stored in a version control system, but its output can contain unexpected changes due to its algorithmic nature. We have developed such a change distilling algorithm that works for Java code stored inside Eclipse. On top of the output of this algorithm we have already added a change dependency system that stores syntactic dependencies across changes. As a result, we can split a change sequence into different subsequences that can be applied independently of each other. Such a dependency system can be a starting point for change untangling.
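The idea of fine-grained changes with syntactic dependencies can be illustrated with a minimal Python sketch. It does not reproduce the existing Java/Eclipse implementation: the `Change` fields, the node identifiers, and the dependency rule (a change depends on the earlier insertion of the node it touches, or of that node's parent) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"
    MOVE = "move"


@dataclass(frozen=True)
class Change:
    """One AST-level operation. All field names are hypothetical."""
    id: int
    kind: Kind
    node: str               # identifier of the affected AST node
    parent: Optional[str]   # node the affected node is attached to, if any


def syntactic_dependencies(changes):
    """Map each change id to the ids of earlier changes it depends on.

    Assumed rule: a change depends on the earlier insertion of the node
    it touches, or of that node's parent.
    """
    inserted_by = {}
    deps = {}
    for c in changes:
        deps[c.id] = {inserted_by[n] for n in (c.node, c.parent) if n in inserted_by}
        if c.kind is Kind.INSERT:
            inserted_by[c.node] = c.id
    return deps


def required_changes(change_id, deps):
    """Return change_id plus every change it transitively depends on."""
    seen, stack = set(), [change_id]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(deps[cur])
    return seen


# Hypothetical distilled change sequence for one file.
changes = [
    Change(1, Kind.INSERT, "method:foo", "class:A"),
    Change(2, Kind.INSERT, "stmt:return", "method:foo"),
    Change(3, Kind.UPDATE, "field:x", "class:A"),
    Change(4, Kind.UPDATE, "stmt:return", "method:foo"),
]
deps = syntactic_dependencies(changes)
```

In this sketch, `required_changes(4, deps)` collects changes 1, 2 and 4, while change 3 depends on nothing and can be applied on its own. This is the sense in which a change sequence splits into independently applicable subsequences.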
The goal of this thesis is to add semantic dependencies on top of distilled changes. An example of such a dependency is that a method invocation can only be inserted if the called method is present. Such a semantic dependency system models its dependencies as a graph, in which a node corresponds to a single change and two nodes are connected if there is a semantic dependency between the corresponding changes. This graph can then be queried to detect interesting change clusters. For example, if the graph consists of several connected components, then each component could correspond to a single development activity. Other properties could also be used, such as identifying bridges and inspecting what kinds of changes correspond to such a bridge. Finally, an empirical study can be performed to evaluate the use of fine-grained source code changes over line changes in the context of commit untangling. To this end, the data sets of existing studies can be used as a starting point.
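The graph queries mentioned above can be sketched as follows. This is a minimal illustration, not the proposed system: the change ids and dependency edges are made up, the graph is treated as undirected for clustering purposes, and bridges are found with the standard low-link (Tarjan) technique.

```python
def build_graph(nodes, edges):
    """Undirected adjacency sets; an edge means 'semantically dependent'."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def components(adj):
    """Connected components, each a candidate development activity."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        result.append(comp)
    return result


def bridges(adj):
    """Edges whose removal would split a component (low-link method).

    Recursive, so only suitable for small graphs as written.
    """
    disc, low, found = {}, {}, []
    counter = 0

    def visit(n, parent):
        nonlocal counter
        disc[n] = low[n] = counter
        counter += 1
        for m in adj[n]:
            if m == parent:
                continue
            if m in disc:
                low[n] = min(low[n], disc[m])
            else:
                visit(m, n)
                low[n] = min(low[n], low[m])
                if low[m] > disc[n]:
                    found.append(tuple(sorted((n, m))))

    for n in adj:
        if n not in disc:
            visit(n, None)
    return found


# Hypothetical example: six changes, e.g. change 4 inserts an invocation
# of a method whose declaration is inserted by change 3.
adj = build_graph(range(1, 7), [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)])
clusters = components(adj)
weak_links = bridges(adj)
```

Here the changes fall into two clusters, {1, 2, 3, 4} and {5, 6}. The edge between changes 3 and 4 is a bridge inside the first cluster, which marks it as a place where that cluster might be split further.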
- Martín Dias, Alberto Bacchelli, Georgios Gousios, Damien Cassou, and Stéphane Ducasse. Untangling fine-grained code changes. In Proceedings of the 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER15), 2015.
- Mike Barnett, Christian Bird, João Brunet, and Shuvendu K. Lahiri. Helping developers help themselves: Automatic decomposition of code review changesets. In Proceedings of the 37th International Conference on Software Engineering (ICSE15), 2015.
- Beat Fluri, Michael Würsch, Martin Pinzger, and Harald C. Gall. Change distilling: Tree differencing for fine-grained source code change extraction. Transactions on Software Engineering, 2007.
- Peter Ebraert, Jorge Vallejos, Pascal Costanza, Ellen Van Paesschen, and Theo D’Hondt. Change-oriented software engineering. In Proceedings of the 2007 International Conference on Dynamic languages (ICDL07), 2007.
- Yida Tao and Sunghun Kim. Partitioning composite code changes to facilitate code review. In Proceedings of the 12th Working Conference on Mining Software Repositories (MSR15), 2015.