Date |
Subject |
Contents / Abstract / Focus |
Slides |
Reading |
Week 1
|
Introduction |
- Many opportunities for parallelism: hidden hardware parallelism,
  multicores, clusters, general-purpose computing on GPUs
- Concepts: parallel vs. concurrent vs. distributed
- The goal of parallel computing: make programs run faster
- The challenges of parallel programming: parallelizing the algorithm
  and taking into account the particularities of the hardware
- Motivating example: counting the number of occurrences of a given
  value in an array (see the sketch after this row)
|
intro.ppt |
Chapter 1 from [Lin, Snyder] |
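A minimal sketch of the motivating example using POSIX threads (thread count,
input data, and helper names are illustrative; the lecture may use a different
API or decomposition): the array is split into blocks, each thread counts
occurrences in its own block, and the partial counts are summed at the end.

    /* Count occurrences of `target` in `data` with a fixed number of threads. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NUM_THREADS 4

    static int data[N];
    static int target = 7;
    static long partial[NUM_THREADS];

    static void *count_block(void *arg) {
        long id = (long)arg;
        long lo = id * (N / NUM_THREADS);
        long hi = (id == NUM_THREADS - 1) ? N : lo + N / NUM_THREADS;
        long c = 0;
        for (long i = lo; i < hi; i++)
            if (data[i] == target) c++;
        partial[id] = c;        /* each thread writes its own slot: no lock needed */
        return NULL;
    }

    int main(void) {
        pthread_t t[NUM_THREADS];
        for (long i = 0; i < N; i++) data[i] = i % 10;   /* toy input */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, count_block, (void *)i);
        long total = 0;
        for (long i = 0; i < NUM_THREADS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("occurrences of %d: %ld\n", target, total);
        return 0;
    }

The same decomposition idea (partition the data, compute local results, combine
them) carries over to the MPI and shared-memory versions discussed later on.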
Week 2
|
Parallel Programming Platforms |
- Hardware details needed in order to use the platform efficiently
- Cost metrics: useful in algorithm design and analysis
- Before attempting parallelization, try to optimize serial performance,
  taking into account the system's characteristics
- The tasks of serial and parallel optimization often involve similar
  characteristics and approaches
- Basic concepts (dichotomy of parallel platforms, interconnection
  networks)
|
pa2.ppt |
Chapter 2 from [Grama, Gupta, Karypis, Kumar] |
Week 3
|
Basic Communication Operations |
- Analytical cost model for communication through message passing
- Group communication operations: semantics and importance (see the
  sketch after this row)
|
pa3.ppt |
Chapter 4 from [Grama, Gupta, Karypis, Kumar] |
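As an illustration of a group communication operation, a minimal MPI sketch of
an all-to-one reduction (program and data are illustrative, not taken from the
slides). Under the textbook's cost model, sending a message of m words costs
roughly t_s + m*t_w, with t_s the startup time and t_w the per-word transfer
time; the collective operations are analyzed in those terms.

    /* All-to-one reduction: process 0 receives the sum of all contributions. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank + 1;             /* each process contributes one value */
        int sum = 0;
        MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum over %d processes = %d\n", size, sum);
        MPI_Finalize();
        return 0;
    }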
Week 4
|
Principles of Parallel Algorithm Design |
|
Slides of A.Gupta |
Chapter 3 from [Grama, Gupta, Karypis, Kumar] |
Week 5
|
Parallel Programming Using the Message Passing Paradigm |
|
Slides of A.Gupta |
Chapter 6 from [Grama, Gupta, Karypis, Kumar] |
Week 6.
|
MPI Tutorial |
|
|
|
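A minimal point-to-point example of the kind typically shown in an MPI
tutorial (the message content is illustrative, and at least two processes are
assumed): process 0 sends an integer to process 1.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d from process 0\n", msg);
        }
        MPI_Finalize();
        return 0;
    }

Typically compiled with mpicc and run with mpirun -np 2; the exact commands
depend on the MPI installation.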
Week 7
|
MPI Tutorial |
|
|
|
Week 8
|
Analytical Modelling of Parallel Systems |
Metrics: parallel runtime, speedup, efficiency, total overhead,
cost-optimality, scalability, isoefficiency function (see the definitions
after this row) |
Slides of A.Gupta |
Chapter 5 from [Grama, Gupta, Karypis, Kumar] |
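For reference, the standard definitions behind these metrics, with T_S the
serial runtime and T_P the parallel runtime on p processing elements (notation
may differ slightly from the slides):

    % speedup, efficiency and total overhead
    \begin{align*}
      S   &= \frac{T_S}{T_P}                   && \text{(speedup)}\\
      E   &= \frac{S}{p} = \frac{T_S}{p\,T_P}  && \text{(efficiency)}\\
      T_o &= p\,T_P - T_S                      && \text{(total overhead)}
    \end{align*}
    % Cost-optimality: p*T_P grows as Theta(T_S), i.e. E stays Theta(1);
    % the isoefficiency function gives the rate at which the problem size
    % must grow with p to keep the efficiency constant.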
Week 9
|
Parallel Programming Using Shared Address Space Platforms |
- Basics of shared address space programming = thread programming
- Thread APIs: operations for explicitly creating threads, assigning
  work functions to threads, and joining threads (see the sketch after
  this row)
- Synchronization: (usually) only low-level built-in mechanisms (mutex);
  higher-level synchronization mechanisms (e.g. barrier) can be defined
  on top of them
- Disadvantage: using a thread programming API directly mixes the
  parallel algorithm with explicit thread manipulation
- Shared memory programming performance pitfalls: false sharing
- Parallel languages or parallel language extensions offer higher-level
  constructs which hide low-level thread manipulation, e.g. parallel
  iteration spaces
|
Slides of A.Gupta |
Chapter 7 from [Grama, Gupta, Karypis, Kumar] |
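A minimal Pthreads sketch of the explicit style described above (thread count
and workload are illustrative): threads are created with a work function, a
shared counter is protected by a built-in mutex, and the threads are joined.
Higher-level constructs such as a parallel-for in OpenMP hide exactly this
kind of bookkeeping.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4
    #define ITERS 100000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);    /* low-level built-in synchronization */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&t[i], NULL, work, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(t[i], NULL);
        printf("counter = %ld (expected %d)\n", counter, NUM_THREADS * ITERS);
        return 0;
    }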
Week 10
|
Dense Matrix Algorithms |
- Processing of data with a regular structure, suitable for data
  decomposition
- Data decomposition: input data, output data, intermediate data
- Partitioning and distribution: 1D and 2D block, cyclic, and
  block-cyclic (see the sketch after this row)
|
Slides of A.Gupta |
Chapter 8 from [Grama, Gupta, Karypis, Kumar] |
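A small sketch of the row-to-process mappings behind the 1D partitionings
listed above (function names and sizes are illustrative; 2D partitionings
apply the same idea along both matrix dimensions):

    #include <stdio.h>

    /* block: process k owns rows [k*n/p, (k+1)*n/p); n divisible by p assumed */
    static int owner_block(int i, int n, int p)        { return i / (n / p); }

    /* cyclic: row i goes to process i mod p */
    static int owner_cyclic(int i, int p)              { return i % p; }

    /* block-cyclic: blocks of b consecutive rows are dealt out cyclically */
    static int owner_block_cyclic(int i, int b, int p) { return (i / b) % p; }

    int main(void) {
        int n = 16, p = 4, b = 2;
        for (int i = 0; i < n; i++)
            printf("row %2d -> block %d, cyclic %d, block-cyclic %d\n",
                   i, owner_block(i, n, p), owner_cyclic(i, p),
                   owner_block_cyclic(i, b, p));
        return 0;
    }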
Week 11
|
Sorting Algorithms |
- A fundamental operation in many sorting algorithms is compare-exchange
- Parallel compare-exchange, and parallel compare-split when the sequence
  to be sorted is distributed (see the sketch after this row)
|
Slides of A.Gupta |
Chapter 9 from [Grama, Gupta, Karypis, Kumar] |
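A sketch of one parallel compare-exchange step with one key per process (the
partner choice, the keys, and the even-process-count assumption are
illustrative): partners swap keys, the lower-ranked process keeps the smaller
key and the higher-ranked one the larger. Compare-split is the analogous step
when each process holds a block of keys and the partners keep the smaller or
larger half of the merged block.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, key, other;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        key = 100 - rank;                        /* toy key, initially "unsorted" */
        int partner = rank ^ 1;                  /* pair neighbouring ranks; assumes
                                                    an even number of processes */
        MPI_Sendrecv(&key, 1, MPI_INT, partner, 0,
                     &other, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank < partner)
            key = (key < other) ? key : other;   /* keep the smaller key */
        else
            key = (key > other) ? key : other;   /* keep the larger key */
        printf("process %d now holds %d\n", rank, key);
        MPI_Finalize();
        return 0;
    }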
Week 12
|
Graph Algorithms |
|
Slides of A.Gupta |
Chapter 10 from [Grama, Gupta, Karypis, Kumar] |
Week 13
|
Search Algorithms for Discrete Optimization Problems |
Issues:
- Load balancing
- Termination detection
- Work splitting
|
Slides of A.Gupta |
Chapter 11 from [Grama, Gupta, Karypis, Kumar] |