Mining Continuously Integrated Software Repositories for Merge Conflicts
Most major software projects use a version control system, such as Subversion or Git. Git in particular, a distributed version control system, has grown in popularity in recent years (Skerett, 2014). Git encourages a workflow in which developers work on their own branches, that is, their own versions of the code. Gitflow (Driessen, 2010), one of the more popular Git workflows, even advocates creating a separate branch for every feature or hotfix to be implemented.
A central part of this workflow is merging branches: the changes made in one branch, the source branch, have to be transferred to another branch, the target branch. Usually both branches have accumulated different changes since they diverged, and some of these changes may be incompatible. In that case, merging can lead to merge conflicts, which either stop the merge or leave the project in a non-working state after the merge.
These conflicts can be divided into two categories: textual and higher-order (Brun et al., 2011). Textual conflicts are easy to spot, because the version control system reports them when trying to merge two branches. They occur when, for example, two developers change the same piece of code in their respective branches. The version control system cannot tell which version should take priority, so it reports a merge conflict that a developer has to resolve manually. Higher-order conflicts occur after a seemingly successful textual merge that nevertheless leaves the project in a broken state: for example, the project fails to build or its tests no longer pass.
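To make the textual case concrete, here is a deliberately simplified three-way merge. It is a sketch under strong assumptions, not how Git works internally: it assumes the common ancestor (`base`) and both branch versions have the same number of lines, whereas real version control systems first align the versions with a diff algorithm. The function name `merge_lines` and the marker lines are illustrative only, loosely modeled on Git's conflict markers.

```python
from typing import List, Tuple


def merge_lines(
    base: List[str], target: List[str], source: List[str]
) -> Tuple[List[str], bool]:
    """Simplified three-way merge of the source branch into the target branch.

    Assumes base, target and source have the same number of lines, i.e. lines
    were only modified, never inserted or deleted. Returns the merged lines and
    whether a textual conflict occurred.
    """
    if not (len(base) == len(target) == len(source)):
        raise ValueError("this sketch only handles versions of equal length")
    merged: List[str] = []
    conflict = False
    for b, t, s in zip(base, target, source):
        if t == s:    # both sides agree (unchanged, or changed identically)
            merged.append(t)
        elif t == b:  # only the source branch changed this line
            merged.append(s)
        elif s == b:  # only the target branch changed this line
            merged.append(t)
        else:         # both branches changed the same line differently
            conflict = True
            merged += ["<<<<<<< target", t, "=======", s, ">>>>>>> source"]
    return merged, conflict


# Different lines edited on each branch: merges cleanly.
print(merge_lines(["x = 1", "y = 2"], ["x = 1", "y = 3"], ["x = 5", "y = 2"]))
# -> (['x = 5', 'y = 3'], False)

# Same line edited differently on each branch: textual conflict (flag is True).
print(merge_lines(["x = 1"], ["x = 2"], ["x = 3"]))
```

A clean result from such a merge says nothing about whether the combined code still builds or passes its tests; that is precisely where higher-order conflicts arise.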
This proposal falls within the field of "mining software repositories", which analyzes the data available in software repositories to uncover information on which further research can be based.
Brun et al. (2011, 2013) previously studied how often these conflicts occur when merging, using a dataset of nine open source projects. For this thesis we propose a similar study on a larger dataset, obtained by mining GitHub, a site hosting many Git repositories. The idea is to collect projects that use Travis CI, a platform that builds and tests every commit of a project; such automated building and testing is a vital part of the process known as continuous delivery (Humble and Farley, 2010). Since a merge is itself a commit, the build and test results that Travis CI records would reveal potential higher-order conflicts caused by merging, enabling an empirical study similar to that of Brun et al.
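The core check behind this idea can be sketched as follows, assuming the CI results per commit have already been collected into a plain dictionary (how they are fetched from Travis CI is left open). `Commit`, `Outcome` and `candidate_higher_order_conflicts` are hypothetical names, not part of any Travis CI or GitHub API, and `Outcome` assumes build failures and test failures can be told apart. A merge commit's parents are the tips of the target and source branches; if every parent passed but the merge itself fails, the merge is flagged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Outcome(Enum):
    # Hypothetical CI outcomes; assumes the failing phase is known.
    PASSED = "passed"
    BUILD_FAILED = "build_failed"
    TESTS_FAILED = "tests_failed"


@dataclass(frozen=True)
class Commit:
    sha: str
    parents: Tuple[str, ...]  # a merge commit has two or more parents


def candidate_higher_order_conflicts(
    commits: List[Commit], outcomes: Dict[str, Outcome]
) -> List[Tuple[str, Outcome]]:
    """Flag merge commits that fail in CI although every parent passed."""
    candidates: List[Tuple[str, Outcome]] = []
    for commit in commits:
        if len(commit.parents) < 2:
            continue  # ordinary commit, not a merge
        if any(sha not in outcomes for sha in (commit.sha, *commit.parents)):
            continue  # no CI result for the merge or a parent: cannot judge
        parents_passed = all(outcomes[p] is Outcome.PASSED for p in commit.parents)
        merge_outcome = outcomes[commit.sha]
        if parents_passed and merge_outcome is not Outcome.PASSED:
            # Both inputs worked, but the merged result fails to build or test.
            candidates.append((commit.sha, merge_outcome))
    return candidates


# Illustrative history: b2 and c3 branch off a1 and are merged in d4.
history = [
    Commit("a1", ()),
    Commit("b2", ("a1",)),
    Commit("c3", ("a1",)),
    Commit("d4", ("b2", "c3")),
]
results = {
    "a1": Outcome.PASSED,
    "b2": Outcome.PASSED,
    "c3": Outcome.PASSED,
    "d4": Outcome.TESTS_FAILED,
}
print(candidate_higher_order_conflicts(history, results))  # flags d4 (test failure)
```

A flagged merge is only a candidate: a failure can also stem from flaky tests or changes in the build environment, so flagged merges would still need closer inspection.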
- Brun, Y., Holmes, R., Ernst, M. D., and Notkin, D., "Proactive Detection of Collaboration Conflicts", Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE), 2011.
- Brun, Y., Holmes, R., Ernst, M. D., and Notkin, D., "Early Detection of Collaboration Conflicts and Risks", IEEE Transactions on Software Engineering, 2013.
- Driessen, V., "A successful Git branching model", http://nvie.com/posts/a-successful-git-branching-model/, 5 January 2010.
- GitHub, https://github.com/.
- Humble, J. and Farley, D., "Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation", Addison-Wesley, 2010.
- Skerett, I., "Eclipse Community Survey 2014 Results", https://ianskerrett.wordpress.com/2014/06/23/eclipse-community-survey-20..., 23 June 2014.
- Travis CI, "Travis CI - Test and Deploy Your Code With Confidence", https://travis-ci.org/.