Department of Computer Science, Vrije Universiteit Brussel
Instructor: Jennifer B. Sartor
When: | Tuesdays 1-3pm |
Where: | D.1.05, T.2.201, T.1.111 SOFT at VUB |
Instructor Email: | jsartor@soft.vub.ac.be |
Question and Answer Time: | By Appointment, F.10.739 |
As the computer industry has grown over the last 50 years, the focal point has often been performance. Machines have gotten faster and faster, smaller and smaller. We have much larger and more complex pieces of software now than we did 40 years ago. This course will be about de-mystifying how to evaluate and analyze the performance of computer programs, particularly those written in managed languages. We will look at the software stack, and discuss how each of the layers -- the application, the compiler, the (potential) runtime environment, the operating system, the processor, and the memory system -- all impact the end performance of a program. We will look at some tools that shed light on why a program gets the execution time it does - analyzing its memory behavior or in what methods the bottlenecks might be. These are profiling or performance monitoring tools that are relatively easy to use to analyze performance. We will talk about metrics and statistics - once you run a program or a suite of programs, how do you accurately present and summarize results about their performance? A part of this is understanding the managed runtime running underneath a Java application, and how the dynamic compiler or memory manager impacts performance. Finally, we will touch on how the multicore era has also influenced the industry, and talk about the opportunities and limitations programming in parallel bring to performance. We will include a discussion on heterogeneous hardware, as well, and how some computer architects have built simulators to help design future processors that are both well-performing and energy-efficient.
Here is a preliminary list of topics to be discussed:
Of the layers of the software/system stack, we will focus more on the runtime environment and memory system than other layers. We will touch on details about adaptive or dynamic compilers that are used in managed runtime environments. We will also discuss processors and how they execute instructions. Understanding how they work at a high level helps to understand how an application achieves the performance it does. Also, the memory system is clearly linked to the processor and has a huge impact on overall performance. So we will focus on how caches are laid out and used, and how a runtime environment lays out memory and reclaims dead memory. To gather information about the hardware's behavior, we will explore hardware performance counters, and use tools that can read them and summarize results from them.
Two important aspects of analyzing performance are 1) how to properly set up experiments to measure and evaluate what you want to show, and 2) how to present results if you do perform experiments. We will discuss proper experimental methodology (for Java applications), such as how to eliminate non-determinism and exploring the space-time tradeoff of garbage collection. I will also teach good practices on how to repeat experiments and take averages over a set of results, and present confidence intervals in graphs.
Finally, we will explore a bit of how parallel programming affects performance. We'll assume a shared-memory model. We will discuss speedup and Amdahl's Law. We'll talk a little about cache coherence and false sharing. We'll talk a bit about the OS's job scheduling tasks. We will also talk briefly about the hardware reordering instructions and memory consistency, if time allows. We will also touch on power as a prime concern, which led to the rise of heterogeneous machines.
Date | Material | Assignments |
---|---|---|
25 Sept 11:00-13:00 D.1.05 | Introduction and the evolution of processors (to superscalar and out-of-order) | |
2 Oct D.1.05 | How caches and the memory subsystem work | |
9 Oct D.1.05 | Performance analysis tools and a bit about methodology | Project 1: Matrix multiplication, due Wednesday 31 October 23:59. Use your own machine or ask for a login to Serenity, a server within SOFT. |
16 Oct D.1.05 1pm to 5pm |
Analytical modeling of an out-of-order processor, CPI stacks AND Simulators, Heterogeneous Machines (a bit about in-order processors), Power |
Project 2: Use a simulator (through VirtualBox), due Wednesday 21 November 23:59 |
23 Oct | NO CLASS (Instructor traveling) | |
30 Oct T.2.201 | Language virtual machine or runtime environment, dynamic compilers | |
6 Nov T.2.201 | Garbage collection | |
13 Nov T.2.201 | Experimental methodology for managed languages, and data presentation/analysis | Project 3: Using Jikes Research Virtual Machine to measure garbage collectors, due Wednesday 12 December at 23:59 |
20 Nov | NO CLASS (VUB closed) | |
27 Nov T.1.111 | Multi-threaded processors, Amdahl's Law, Cache Coherence, Races, Parallel Allocators | |
4 Dec T.2.201 | Cilk Plus Concurrency Platform | Project 4: Multi-threaded project - Parallelize matrix multiplication and a screensaver with Cilk Plus, due Monday 7 January at 23:59 |
11 Dec T.2.201 | Memory Consistency, Matrix-matrix multiplication | |
18 Dec T.2.201 | Compiler optimizations + In-order processors | |
?? Jan | ORAL EXAM |
I expect advanced programming skills. In particular, I assume you have basic knowledge of object-oriented programming in a mainstream OO language such as C++/Java/C#. We will focus on Java in this course. I will also use code examples in C, and will assume you will be able to understand the code. You will have to write your own basic C programs, or extend existing C code that I give you as a starting point.
I will teach you the basics of how modern processors work, and the memory subsystem, so only minimal computer architecture knowledge is expected. However, it will be useful to have some system-level background from a prior course. Prior statistics classes could be useful for the part of the class where we talk about how to present results, but are not required. It will give students an advantage to have already studied a bit about the internals of compilers, or the interpretation of computer programs. I will go into some details of dynamic and adaptive compilers in a runtime environment, but will assume you have a basic knowledge of compilers and routine optimizations that they perform.
The main study material used in this course are slides, which can be downloaded from Pointcarre. For some lectures, I also provide relevant pointers to the literature (mostly to articles, books or papers freely available online). Reading material will be listed below, which you are encouraged to read. Some papers could be required reading (which will be clearly specified).
I will be taking content from the following online course
taught at MIT called Performance
Engineering of Software Systems [Saman Amarasinghe,
and Charles Leiserson. 6.172 Performance Engineering of Software
Systems, Fall 2010. (Massachusetts Institute of Technology: MIT
OpenCourseWare), http://ocw.mit.edu (Accessed). License: Creative
Commons BY-NC-SA].
I will also borrow material from Professor Lieven Eeckhout at Ghent University, and from his two books:
I will also be using material from other professor's courses: Dr. Kathryn McKinley's Memory Management course, and Dr. Kathryn McKinley's Advanced Compilers course.
For the theoretical part of this course, students are expected to attend the lectures and are encouraged to read the reading material, if it is listed. Although this course is 6 credits, we won't have a fixed lab scheduled. If there are readings listed, they will count as your extra work for those weeks. The practical part of this course will be done on your own, and you will have to complete 4 projects throughout the semester. The projects, roughly, will be over the following topics:
The grade for this course will be divided as follows. Each of the 4 projects will be 15% of your grade, for a total of 60% during the semester. These projects will be completed individually. You are required to turn in all 4 projects to pass this class! You are also required to take the oral examination at the end to pass. Your final exam will then be worth 40% of your total grade. To pass the entire course, you are required to pass the project part of the course (60%) as well as the exam portion of the course (40%).
You are required to do your own work. Don't copy code or answers from projects. It's okay to talk about general concepts or algorithms, or even benchmark sets for the projects, but don't share pseudocode or code or data. You can talk about problems you are having on assignments, but do not show code or your project output data to classmates to get help. Either use tools or ask the instructor for more specific help. The best way to do this is to avoid talking to others about the program while you are at the computer. If you have questions on what constitutes cheating or questions about this policy, please talk to the instructor.