Department of Computer Science
Wayne State University, Detroit, USA
Monday 20 Sept. 2010
Abstract Software systems are designed and engineered to process data. However, software is data too. The size and variety of today’s software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. Software evolution is no longer just about writing code; it is becoming an information management problem.
Analyzing and managing software data are activities that software engineers are not trained to do. We have to look for solutions outside software engineering, adopt them, and make them our own. These solutions can come from data mining, information retrieval, machine learning, statistical analysis, etc. This is not the first time software engineers have looked at such solutions. It has been going on for about two decades, in one form or another. The results so far indicate that software engineering is facing a paradigm shift, where more and more software engineering tasks are reinterpreted as optimization, search, retrieval, or classification problems. Despite this experience, applications of data analysis, data integration, and data mining in software engineering are in their infancy compared with other research fields. New research is needed to adapt existing algorithms and tools to software engineering data and processes, and new ones will have to be created. This research has to be supported by integration with software development processes and with education as well. Moreover, for this type of research to succeed, it should be supported by new approaches to empirical work, where data and results are shared globally among researchers and practitioners.
The talk will focus on arguing for and mapping out (part of) this research agenda, while looking back at (some of) the existing work in the area.
Biography Dr. Andrian Marcus is an Associate Professor in the Department of Computer Science at Wayne State University in Detroit, USA. He received his PhD in Computer Science from Kent State University, USA in 2003.
His research interests include software evolution, program comprehension, and software visualization, focusing on the management of unstructured information during the evolution of large-scale software systems.
Dr. Marcus served on the Steering Committee of the IEEE International Conference on Software Maintenance (ICSM) during 2005-2008 and on the Steering Committee of the IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT) during 2007-2009. He was the Program Co-Chair of the 17th IEEE International Conference on Program Comprehension (ICPC 2009), is the Program Co-Chair of the 26th IEEE International Conference on Software Maintenance (ICSM 2010), and will be the General Chair of the 27th IEEE International Conference on Software Maintenance (ICSM 2011).
Dr. Marcus' publications earned a Best Dissertation Paper Award at the IEEE ICSM in 2004 and two Best Paper Awards at the IEEE ICPC in 2006 and 2007 respectively. He is also the recipient of a Fulbright Junior Research Fellowship in 1997 and the US NSF CAREER Award in 2009.
Massimiliano Di Penta
Department of Engineering
University of Sannio, Benevento, Italy
Tuesday 21 Sept. 2010
Abstract Over the years, there have been hundreds of studies aimed at characterizing the evolution of a software system. Many of these studies analyze the behavior of a variable over a given period of observation. How does the size of a software system evolve? What about its complexity? Does the number of defects increase over time or does it remain stable?
In some cases, studies also attempt to correlate variables and, possibly, to build predictors upon them. That is to say, one could estimate the likelihood that a fault occurs in a class based on some metrics the class exhibits or on the kinds of changes the class underwent. Similarly, change couplings can be inferred by observing how artifacts tend to co-change. Although in many cases we are able to obtain models with good prediction performance, we are not able to claim any cause-effect relationship between our independent and dependent variables. We could easily correlate the presence of some design constructs with the change-proneness of a software component; however, the same correlation could be found with the amount of good Belgian beer our developers drink. As a matter of fact, the component could undergo changes for other, external reasons.
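As a minimal sketch of how change coupling might be inferred from co-change, the snippet below counts how often pairs of files appear in the same commit. The commit history, file names, and function names are hypothetical and chosen only for illustration; real studies would mine this data from a versioning system.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each pair of files appears in the same commit."""
    pair_counts = Counter()
    for files in commits:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def coupling_confidence(commits, a, b):
    """Fraction of the commits touching `a` that also touch `b`."""
    touching_a = [set(files) for files in commits if a in files]
    if not touching_a:
        return 0.0
    return sum(b in files for files in touching_a) / len(touching_a)


# Hypothetical commit history: each entry lists the files changed together.
commits = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "README"],
    ["Parser.java"],
    ["Ui.java"],
]

counts = co_change_counts(commits)
print(counts[("Lexer.java", "Parser.java")])
print(coupling_confidence(commits, "Parser.java", "Lexer.java"))
```

A high count or confidence only indicates that two artifacts tend to change together; it says nothing about why, which is the limitation the abstract points out.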
Recent software evolution studies rely on fine-grained information mined by integrating several kinds of repositories, such as versioning systems, bug tracking systems, or mailing lists. Nowadays, many other precious sources of information, ranging from code search repositories, vulnerability databases, and informal communications to legal documents, are also being considered. This may help capture the rationale behind some events occurring in a software project and link it to the statistical relations we observe.
The road from solid empirical models to “principles of software evolution” will likely be long and difficult, so we should prepare ourselves to travel it and go as far as possible with limited damage. To do this, we need to carefully prepare our travelling equipment by paying attention to: (i) combining quantitative studies with qualitative studies, surveys, and informal interviews, (ii) relating social relations among developers to variables observed on the project, (iii) using proper statistical and machine learning techniques able to capture the temporal relation among different events, and (iv) making extensive use of natural language processing and text mining across the various sources of information available.
Biography Massimiliano Di Penta is an assistant professor at the University of Sannio, Department of Engineering, Italy. He received his laurea degree in Computer Engineering in 1999 and his PhD in Computer Engineering in 2003. His research interests include software maintenance and evolution, reverse engineering, empirical software engineering, search-based software engineering, and service-centric software engineering. He is the author of over 130 papers that have appeared in journals, conferences, and workshops. He serves and has served on the organizing and program committees of several conferences such as ICSE, ASE, ICSM, ICPC, CSMR, GECCO, MSR, SCAM, WCRE, and many others. He has been general chair of WSE 2008, general co-chair of WCRE 2008, and program co-chair of SSBSE 2008, WCRE 2006 and 2007, IWPSE 2007, WSE 2007, SCAM 2006, STEP 2005, and of other workshops. He is a steering committee member of ICPC, SCAM, CSMR, WCRE, IWPSE, and SSBSE. He is on the editorial board of the Empirical Software Engineering Journal published by Springer. He is a member of the IEEE, the IEEE Computer Society, and the ACM.