Date |
Subject |
Contents / Abstract / Focus |
Slides |
Reading |
Week 1
|
Introduction |
- Many opportunities for parallelism: hidden hardware parallelism,
  multicores, clusters, general-purpose computing on GPUs
- Concepts: parallel vs. concurrent vs. distributed
- The goal of parallel computing: make programs run faster
- The challenges of parallel programming: parallelizing the algorithm
  and taking into account the particularities of the hardware
- Motivating example: counting the number of occurrences of a given
  value in an array (see the sketch after this row)
|
intro.ppt |
Chapter 1 from [Lin, Snyder] |
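A minimal sketch of the motivating example using POSIX threads (thread count,
input data, and helper names are illustrative; the lecture may use a different
API or decomposition): the array is split into blocks, each thread counts
occurrences in its own block, and the partial counts are summed at the end.

    /* Count occurrences of `target` in `data` with a fixed number of threads. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NUM_THREADS 4

    static int data[N];
    static int target = 7;
    static long partial[NUM_THREADS];

    static void *count_block(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NUM_THREADS);
        long hi = (id == NUM_THREADS - 1) ? N : lo + N / NUM_THREADS;
        long c = 0;
        for (long i = lo; i < hi; i++)
            if (data[i] == target) c++;
        partial[id] = c;        /* each thread writes its own slot: no lock needed */
        return NULL;
    }

    int main(void) {
        pthread_t t[NUM_THREADS];
        for (long i = 0; i < N; i++) data[i] = i % 10;   /* toy input */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, count_block, (void *)i);
        long total = 0;
        for (long i = 0; i < NUM_THREADS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("occurrences of %d: %ld\n", target, total);
        return 0;
    }

The same decomposition idea (partition the data, compute local results, combine
them) carries over to the MPI and shared-memory versions discussed later on.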
Week 2
|
Parallel Programming Platforms |
- Hardware details needed in order to use the platform efficiently
- Cost metrics: useful in algorithm design and analysis
- Before attempting parallelization, try to optimize serial performance,
  taking into account the system's characteristics
- The tasks of serial and parallel optimization often involve similar
  characteristics and approaches
- Basic concepts (dichotomy of parallel platforms, interconnection
  networks)
|
pa2.ppt |
Chapter 2 from [Grama, Gupta, Karypis, Kumar] |
Week 3
|
Basic Communication Operations |
- Analytical cost model for communication through message passing
- Group communication operations: semantics and importance (see the
  sketch after this row)
|
pa3.ppt |
Chapter 4 from [Grama, Gupta, Karypis, Kumar] |
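As an illustration of a group communication operation, a minimal MPI sketch of
an all-to-one reduction (program and data are illustrative, not taken from the
slides). Under the textbook's cost model, sending a message of m words costs
roughly t_s + m*t_w, with t_s the startup time and t_w the per-word transfer
time; the collective operations are analyzed in those terms.

    /* All-to-one reduction: process 0 receives the sum of all contributions. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank + 1;             /* each process contributes one value */
        int sum = 0;
        MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum over %d processes = %d\n", size, sum);
        MPI_Finalize();
        return 0;
    }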
Week 4
|
Principles of Parallel Algorithm Design |
|
Slides of A.Gupta |
Chapter 3 from [Grama, Gupta, Karypis, Kumar] |
Week 5
|
Parallel Programming Using the Message Passing Paradigm |
|
Slides of A.Gupta |
Chapter 6 from [Grama, Gupta, Karypis, Kumar] |
Week 6.
|
MPI Tutorial |
|
|
|
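A minimal point-to-point example of the kind typically shown in an MPI
tutorial (the message content is illustrative, and at least two processes are
assumed): process 0 sends an integer to process 1.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d from process 0\n", msg);
        }
        MPI_Finalize();
        return 0;
    }

Typically compiled with mpicc and run with mpirun -np 2; the exact commands
depend on the MPI installation.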
Week 7
|
MPI Tutorial |
|
|
|
Week 8
|
Analytical Modelling of Parallel Systems |
Metrics: parallel runtime, speedup, efficiency, total overhead,
cost-optimality, scalability, isoefficiency function (see the definitions
after this row) |
Slides of A.Gupta |
Chapter 5 from [Grama, Gupta, Karypis, Kumar] |
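For reference, the standard definitions behind these metrics, with T_S the
serial runtime and T_P the parallel runtime on p processing elements (notation
may differ slightly from the slides):

    % speedup, efficiency and total overhead
    \begin{align*}
      S   &= \frac{T_S}{T_P}                   && \text{(speedup)}\\
      E   &= \frac{S}{p} = \frac{T_S}{p\,T_P}  && \text{(efficiency)}\\
      T_o &= p\,T_P - T_S                      && \text{(total overhead)}
    \end{align*}
    % Cost-optimality: p*T_P grows as Theta(T_S), i.e. E stays Theta(1);
    % the isoefficiency function gives the rate at which the problem size
    % must grow with p to keep the efficiency constant.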
Week 9
|
Parallel Programming Using Shared Address Space Platforms |
- Basics of shared address space programming = thread programming
- Thread APIs: operations for explicitly creating threads, assigning
  work functions to threads, and joining threads (see the sketch after
  this row)
- Synchronization: (usually) only low-level built-in mechanisms (mutex);
  higher-level synchronization mechanisms (e.g. barrier) can be defined
  on top of them
- Disadvantage: using a thread programming API directly mixes the
  parallel algorithm with explicit thread manipulation
- Shared memory programming performance pitfalls: false sharing
- Parallel languages or parallel language extensions offer higher-level
  constructs which hide low-level thread manipulation, e.g. parallel
  iteration spaces
|
Slides of A.Gupta |
Chapter 7 from [Grama, Gupta, Karypis, Kumar] |
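A minimal Pthreads sketch of the explicit style described above (thread count
and workload are illustrative): threads are created with a work function, a
shared counter is protected by a built-in mutex, and the threads are joined.
Higher-level constructs such as a parallel-for in OpenMP hide exactly this
kind of bookkeeping.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define ITERS 100000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);    /* low-level built-in synchronization */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, work, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld (expected %d)\n", counter, NUM_THREADS * ITERS);
        return 0;
    }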
Week 10
|
Dense Matrix Algorithms |
- Processing of data with a regular structure, suitable for data
  decomposition
- Data decomposition: input data, output data, intermediate data
- Partitioning and distribution: 1D and 2D block, cyclic, and
  block-cyclic (see the sketch after this row)
|
Slides of A.Gupta |
Chapter 8 from [Grama, Gupta, Karypis, Kumar] |
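A small sketch of the row-to-process mappings behind the 1D partitionings
listed above (function names and sizes are illustrative; 2D partitionings
apply the same idea along both matrix dimensions):

    #include <stdio.h>

    /* block: process k owns rows [k*n/p, (k+1)*n/p); n divisible by p assumed */
    static int owner_block(int i, int n, int p)        { return i / (n / p); }

    /* cyclic: row i goes to process i mod p */
    static int owner_cyclic(int i, int p)              { return i % p; }

    /* block-cyclic: blocks of b consecutive rows are dealt out cyclically */
    static int owner_block_cyclic(int i, int b, int p) { return (i / b) % p; }

    int main(void) {
        int n = 16, p = 4, b = 2;
        for (int i = 0; i < n; i++)
            printf("row %2d -> block %d, cyclic %d, block-cyclic %d\n",
                   i, owner_block(i, n, p), owner_cyclic(i, p),
                   owner_block_cyclic(i, b, p));
        return 0;
    }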
Week 11
|
Sorting Algorithms |
- A fundamental operation in many sorting algorithms is compare-exchange
- Parallel compare-exchange, and parallel compare-split when the sequence
  to be sorted is distributed (see the sketch after this row)
|
Slides of A.Gupta |
Chapter 9 from [Grama, Gupta, Karypis, Kumar] |
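A sketch of one parallel compare-exchange step with one key per process (the
partner choice, the keys, and the even-process-count assumption are
illustrative): partners swap keys, the lower-ranked process keeps the smaller
key and the higher-ranked one the larger. Compare-split is the analogous step
when each process holds a block of keys and the partners keep the smaller or
larger half of the merged block.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, key, other;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        key = 100 - rank;                        /* toy key, initially "unsorted" */
        int partner = rank ^ 1;                  /* pair neighbouring ranks; assumes
                                                    an even number of processes */
        MPI_Sendrecv(&key, 1, MPI_INT, partner, 0,
                     &other, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank < partner)
            key = (key < other) ? key : other;   /* keep the smaller key */
        else
            key = (key > other) ? key : other;   /* keep the larger key */
        printf("process %d now holds %d\n", rank, key);
        MPI_Finalize();
        return 0;
    }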
Week 12
|
Graph Algorithms |
|
Slides of A.Gupta |
Chapter 10 from [Grama, Gupta, Karypis, Kumar] |
Week 13
|
Search Algorithms for Discrete Optimization Problems |
Issues:
- Load balancing
- Termination detection
- Work splitting
|
Slides of A.Gupta |
Chapter 11 from [Grama, Gupta, Karypis, Kumar] |