WS05: ECOOP 2002 Workshop on
Benchmarks for Empirical Studies in Object-Oriented Software Evolution

University of Malaga, Spain
Monday June 10, 2002



ORGANIZERS

Tom Mens (Vrije Universiteit Brussel, Belgium), Serge Demeyer (University of Antwerp, Belgium), Michael Godfrey (University of Waterloo, Canada), Kim Mens (Université Catholique de Louvain-la-Neuve, Belgium)

INTRODUCTION

Software evolution [1, 2] is the collection of all software development activities intended to generate a new software release from an earlier operational version. It encompasses both planned changes and unplanned phenomena.

The study of software evolution investigates when, how and why software changes over time. This is a very active research area, as witnessed by the annual international workshops on principles of software evolution (IWPSE), the conferences on software maintenance (ICSM and CSMR), and Wiley's international journal devoted to the topic. The evolution of object-oriented software also attracts wide interest, with researchers investigating a variety of techniques to deal with evolution in the presence of object-oriented constructs such as inheritance, polymorphism, frameworks, design patterns and refactorings. Unfortunately, there is still a lack of scientific validation of these evolution techniques.

A case study is an empirical investigation method that provides scientific evidence concerning the applicability of a given tool or technique to a concrete software system [3, 4]. During a case study, researchers monitor the effect of applying a certain technique to a given system (the subject system) and try to assess this effect both quantitatively and qualitatively. Case studies not only illustrate the applicability of the technique to a concrete system, but also allow us to compare results from different experiments and subsequently derive more general conclusions. However, in order to compare the results of different experiments, the subject systems must be selected carefully.

The goal of this one-day workshop is to identify and agree upon a number of subject systems that can be used as a "benchmark" for the scientific investigation of software evolution. The subject systems should be representative, i.e., each of them should involve typical kinds of evolution steps (e.g., extension, correction, adaptation), each in a different context (characterised by life-cycle, scale and team issues). An initial proposal for such a benchmark exists [5] and various researchers have reacted enthusiastically. During this workshop, we hope to iterate on and refine this proposal to reach consensus on such an object-oriented evolution benchmark, i.e., a standard selection of candidate software systems around which the software evolution community will build (incrementally, over time) a body of knowledge about their evolution.

We expect such a benchmark to be used to validate three kinds of techniques:
(a) retrospective study: verify whether a technique can reconstruct how and why a software system has evolved in the past;
(b) curative activity: verify whether a technique supports a given software evolution process (e.g., refactoring);
(c) predictive analysis: verify whether a technique may predict certain kinds of evolution based on the current state of the system (e.g., quality metrics such as evolvability, maintainability, extendibility, ...).

WORKSHOP PARTICIPATION

Solicited submissions. To ensure an active collaboration between the workshop participants, the call for participation is structured in a Q&A style. Instead of submitting a position paper, participants should provide partial answers to a number of tentative open questions. Participants may also propose new subject systems to be included in the benchmark, or suggest interesting experiments that could be set up. Additionally, participants are invited to pose new relevant questions (preferably with a motivation and a partial answer) that seem important to address. The most relevant questions will be incorporated in a tentative list that will be distributed before the workshop, so that all participants can review them and form an opinion in advance.

Submission format. To facilitate processing, submissions should be written in plain ASCII text (no pictures or special formatting) and should be no more than 1000 words in length. Submissions should be sent by e-mail to the submission address ecoopws5@plg.uwaterloo.ca, with the ASCII text of the submission inlined directly in the e-mail body. The e-mail body should also include the authors' names, addresses, and affiliations. If a submission is incomplete or unclear, the workshop organisers may ask the authors to revise it before the workshop.

During the workshop. Based on the answers gathered before the workshop, participants will discuss alternative views, further work out some partially answered questions, attempt to reach a consensus on one or more object-oriented software evolution benchmarks, and discuss some concrete experiments that could be set up.

Workshop report. Following the tradition of past ECOOP workshops, Springer-Verlag will publish the ECOOP 2002 Workshop Reader as an LNCS volume. This volume will include the report of this workshop, containing a synopsis of the workshop's discussions as well as any convergences of view reached during the workshop. The report will be written by the workshop organisers, in collaboration with the workshop participants.

OPEN QUESTIONS

A. Does it make sense to define a benchmark? What are the advantages and shortcomings of using a benchmark for studying software evolution? According to the workshop organisers, the benefit of a representative set of accessible object-oriented subject systems for studying evolution is that it becomes easier to replicate results and to compare different research techniques on the same subject systems (to find out how these techniques may complement or overlap each other). We feel that this makes a benchmark an ideal vehicle for exchanging information and experience concerning evolving software.

B. If a benchmark were available, would you use it to validate your own work? Why (not)? How?

C. Which characteristic attributes should be used to determine whether a given subject system makes a suitable candidate to be included in the benchmark? To define a benchmark, we need a clear idea of what to measure and what information about the subject systems is required to perform these measurements. Therefore, we need to define a list of subject system characteristics to serve as an instrument in selecting appropriate representatives. With such a list, we can assess whether the subject systems in the benchmark may serve as representatives for a wide range of software applications. However, care should be taken to ensure that the instrument is accurate. In particular, we should address the question of whether the list of characteristics is complete, because if it is not, the selected subject systems may not be representative. Equally important is the question of whether the list is minimal, because if it is not, we may have to select too many systems to cover all possibilities.
A first proposal for a list of characteristics was suggested in [5]. Our initial experience with the selection of subject systems based on this list suggests that the list of characteristics is not minimal but reasonably complete. We explicitly ask the workshop participants to propose improvements to this list.

D. How can we guarantee that potential subject systems are representative and replicable? In order for a subject system to make a meaningful candidate for inclusion in the benchmark, it should be representative and replicable.
Representative means that the subject system provides as much coverage as possible of all characteristics, so that it can be used as a representative for a wide range of evolving object-oriented software systems. This gives rise to a first subquestion: "Is a single benchmark sufficient, or do we need more than one benchmark?"
Replicable means that as much information as possible concerning the subject system should be freely available and accessible, so that any experiments performed on the system can be replicated. This includes source code, documentation, analysis and design, for all releases of the system. This gives rise to an interesting subquestion: "What kind of information is needed to replicate an experiment?" The answer to this question may depend on the particular research technique one envisions, since different techniques may use different information regarding the subject system. Therefore, a related subquestion is "What are the kinds of experiments we wish to perform using the subject systems, and which information do we require from the subject systems in order to be able to carry out the experiment?"

E. Which concrete subject systems make likely candidates for inclusion in the benchmark? A selection of subject systems meant to represent a wide range of evolving object-oriented software systems was suggested in [5]. However, the selected systems fall somewhat short: they are weak in the early life-cycle phases (little analysis or design documentation is available); their implementations are limited to Java, C++ and Smalltalk (no Ada or Eiffel, ...); and they cover few application domains (only networking, graphics, and software development). Therefore, we explicitly ask the workshop participants to point out other systems that can help us provide better coverage of all the characteristics.

F. Which subject systems are beyond the scope of the benchmark that we aim to define? Several classes of systems are not discussed in this proposal, including embedded and real-time systems, games, scientific computation packages, and even websites (which are clearly software systems, but quite different in flavour). Including all these classes of systems in the benchmark would be too ambitious. However, it might be feasible to define a separate benchmark for each class (e.g., a benchmark for studying the evolution of real-time systems).

REFERENCES

[1] D. Perry. Dimensions of Software Evolution. Invited keynote paper, Proc. Int. Conf. Software Maintenance, Victoria, British Columbia, September 1994.
[2] M. M. Lehman and J. F. Ramil. Software Evolution. Invited keynote paper, Proc. Int. Workshop on Principles of Software Evolution, Vienna, Austria, September 2001. (Revised and extended version of an article to appear in J. Marciniak (ed.), Encyclopedia of Software Engineering, 2nd ed., Wiley, 2002.)
[3] N. Fenton and S. L. Pfleeger. Software Metrics: A Rigorous and Practical Approach, 2nd ed. International Thomson Computer Press, 1997.
[4] M. V. Zelkowitz and D. R. Wallace. Experimental Models for Validating Technology. IEEE Computer, pp. 23-31, May 1998.
[5] S. Demeyer, T. Mens and M. Wermelinger. Towards a Software Evolution Benchmark. Proc. Int. Workshop on Principles of Software Evolution, Vienna, Austria, September 2001. ACM Press, 2002.

ABOUT THE ORGANIZERS

Tom Mens has been a postdoctoral fellow of the Fund for Scientific Research - Flanders (Belgium) since October 2000. He is associated as a computer science researcher with the Programming Technology Lab of the Vrije Universiteit Brussel, where he finished his PhD on "A Formal Foundation for Object-Oriented Evolution" in September 1999. In 1998 he was part of the ECOOP Organizing Team, and he co-organised the ECOOP 2001 workshop on Object-Oriented Architectural Evolution. His main research interest lies in the use of formal techniques for improving support for software evolution, and he has published several papers on this topic. In the EMOOSE programme (European Masters in Object-Oriented Software Engineering), jointly organised by the Vrije Universiteit Brussel (Belgium) and the Ecole des Mines de Nantes (France), he gives an advanced course on object-oriented software evolution. Finally, he is co-founder and coordinator of the Scientific Research Network on Foundations of Software Evolution.

Serge Demeyer is a professor at the Department of Mathematics and Computer Science of the University of Antwerp. His main research interest concerns software engineering (more precisely, reengineering in an object-oriented context), but for historical reasons he maintains a strong interest in hypermedia systems as well. He is an active member of the corresponding international research communities, serving on various conference organization and program committees. He is currently writing a book entitled "Object-Oriented Reengineering" and was the main editor of the ECOOP'98 Workshop Reader. He was co-organiser of three ECOOP workshops on Object-Oriented Architectural Evolution. He has written a considerable number of peer-reviewed articles, some of them in highly respected scientific journals. He completed his M.Sc. in 1987 and his Ph.D. in 1996, both at the Vrije Universiteit Brussel. After his Ph.D., he worked for three years at the University of Bern in Switzerland, where he served as a technical coordinator of a European research project.

Michael Godfrey is an assistant professor in the Department of Computer Science at the University of Waterloo in Waterloo, Ontario, Canada. He holds an NSERC Industrial Research Chair in Telecommunications Software Engineering, sponsored by Nortel Networks, NSERC, and the University of Waterloo. Prior to joining the University of Waterloo, he earned his PhD at the University of Toronto, and was subsequently a faculty member at Cornell University. Currently, he is a member of the Software Architecture Group (SWAG) and is also the director of the software engineering laboratory. His research interests include software evolution, patterns of software change, software architecture, program comprehension, software visualization, and software engineering education.

Kim Mens obtained the degrees of Licentiate in Mathematics, Licentiate in Computer Science and Doctor in Computer Science at the Vrije Universiteit Brussel. In October 2000, he obtained his PhD on "architectural conformance checking" while being assigned to an industrial research project with Getronics, funded by the Belgian government. After his PhD he became a post-doctoral assistant at the VUB, before starting as a computer science professor at the Université Catholique de Louvain-la-Neuve in September 2001. In addition to his current interest in "declarative meta-programming", he is one of the founding fathers of the "reuse contract" technique for automatically detecting conflicts in evolving software. He also has a strong interest in "aspect-oriented programming" and actively participated in the organisation of several workshops and conferences on that subject.


This workshop is an official activity of the Scientific Research Network on "Foundations of Software Evolution", and is partially financed by the Fund for Scientific Research - Flanders (Belgium).