Lectures: content and schedule

Lecture hours: Thursday, 12-14, room P17-ASPC
Each entry below lists: date, subject, contents / abstract / focus, slides, and reading.
Week 1 Introduction
  • Many opportunities for parallelism: hidden hardware parallelism, multicore processors, clusters, general-purpose computing on GPUs
  • Concepts: parallel vs. concurrent vs. distributed computing
  • The goal of parallel computing: making programs run faster
  • The challenges of parallel programming: parallelizing the algorithm and taking into account the particularities of the hardware. Motivating example: counting the number of occurrences of a given value in an array (see the sketch after this week's entry)
Slides: intro.ppt. Reading: Chapter 1 from [Lin, Snyder]
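A minimal sketch of the motivating example (not from the course materials, and using POSIX threads, which the course only introduces later): the array is split into per-thread chunks, each thread counts occurrences in its own chunk, and the partial counts are summed after joining.

    /* count occurrences of `target` in `data` with one chunk per thread */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static int data[N];
    static int target = 7;

    struct chunk { int lo, hi, count; };

    static void *count_chunk(void *arg) {
        struct chunk *c = arg;
        c->count = 0;
        for (int i = c->lo; i < c->hi; i++)
            if (data[i] == target)
                c->count++;
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++) data[i] = i % 10;   /* sample values */

        pthread_t tid[NTHREADS];
        struct chunk chunks[NTHREADS];
        int per = N / NTHREADS;

        /* each thread scans its own block: no shared counter, no locking */
        for (int t = 0; t < NTHREADS; t++) {
            chunks[t].lo = t * per;
            chunks[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * per;
            pthread_create(&tid[t], NULL, count_chunk, &chunks[t]);
        }

        int total = 0;
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += chunks[t].count;
        }
        printf("occurrences of %d: %d\n", target, total);
        return 0;
    }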
Week 2 Parallel Programming Platforms
  • Some details needed to use the hardware efficiently
  • Cost metrics, useful in algorithm design and analysis
  • Before attempting parallelization, try to optimize serial performance, taking into account the system's characteristics (see the sketch after this week's entry)
  • The tasks of serial and parallel optimization often have similar characteristics or approaches
  • Basic concepts (dichotomy of parallel platforms, interconnection networks)
Slides: pa2.ppt. Reading: Chapter 2 from [Grama, Gupta, Karypis, Kumar]
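An illustrative sketch (not from the slides) of serial optimization driven by system characteristics: C stores 2D arrays row-major, so keeping the column index in the inner loop makes memory accesses sequential and cache-friendly.

    #include <stddef.h>

    #define N 1024
    double a[N][N], b[N][N];

    /* cache-unfriendly: each access jumps N doubles ahead in memory */
    void copy_column_order(void) {
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                a[i][j] = b[i][j];
    }

    /* cache-friendly: consecutive accesses touch consecutive addresses */
    void copy_row_order(void) {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                a[i][j] = b[i][j];
    }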
Week 3 Basic Communication Operations
  • Analytical cost model for communication through message passing
  • Group communication operations: semantics and importance (see the sketch after this week's entry)
Slides: pa3.ppt. Reading: Chapter 4 from [Grama, Gupta, Karypis, Kumar]
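A small assumed example (not taken from the slides) of two group communication operations covered this week, a one-to-all broadcast and an all-to-one reduction, expressed with MPI; compile with mpicc and run with mpirun.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, seed = 0, sum = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) seed = 100;                 /* root owns the initial value */
        MPI_Bcast(&seed, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* one-to-all broadcast */

        int local = seed + rank;                   /* each process contributes a value */
        MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);  /* all-to-one reduction */

        if (rank == 0) printf("sum over %d processes: %d\n", size, sum);
        MPI_Finalize();
        return 0;
    }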
Week 4 Principles of Parallel Algorithm Design
Slides: slides of A. Gupta. Reading: Chapter 3 from [Grama, Gupta, Karypis, Kumar]
Week 5 Parallel Programming Using the Message Passing Paradigm
Slides: slides of A. Gupta. Reading: Chapter 6 from [Grama, Gupta, Karypis, Kumar]
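A minimal sketch of the message passing paradigm (an assumed example, not from the course slides): two processes, one MPI_Send and one matching MPI_Recv; run with at least two processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to rank 1, tag 0 */
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", msg);
        }
        MPI_Finalize();
        return 0;
    }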
Week 6 MPI Tutorial
Week 7 MPI Tutorial
Week 8 Analytical Modelling of Parallel Systems
  • Metrics: parallel runtime, speedup, efficiency, total overhead, cost-optimality, scalability, isoefficiency function
Slides: slides of A. Gupta. Reading: Chapter 5 from [Grama, Gupta, Karypis, Kumar]
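For quick reference, the standard definitions behind these metrics (as in Chapter 5 of [Grama, Gupta, Karypis, Kumar]), for serial runtime T_S and parallel runtime T_P on p processors:

    Speedup         S   = T_S / T_P
    Efficiency      E   = S / p
    Total overhead  T_o = p * T_P - T_S
    Cost            p * T_P   (cost-optimal when it grows at the same rate as T_S)

The isoefficiency function gives the rate at which the problem size must grow with p so that the efficiency E stays constant.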
Week 9 Parallel Programming Shared Address Space Platforms
  • Basics of shared address space programming = thread programming
    • Thread APIs: operations for explicitly creating threads, assigning work functions to threads, and joining threads
    • Synchronization: (usually) only low-level built-in mechanisms (mutex); higher-level synchronization mechanisms (e.g., barrier) can be defined (see the sketch after this week's entry)
    • Disadvantage: using a thread programming API directly mixes the parallel algorithm with explicit thread manipulation
  • Shared memory programming performance pitfalls: false sharing
  • Parallel languages or parallel language extensions offer higher-level constructs that hide low-level thread manipulation
    • parallel iteration spaces
Slides: slides of A. Gupta. Reading: Chapter 7 from [Grama, Gupta, Karypis, Kumar]
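A sketch (assumed example, not from the slides) of explicit thread manipulation with the low-level built-in mutex: threads are created and joined by hand, and the mutex prevents a race on the shared counter; note how the thread management code is mixed in with the actual computation.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define ITERS 100000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);      /* low-level built-in synchronization */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (int t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, work, NULL);   /* explicit creation */
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);                  /* explicit join */
        printf("counter = %ld (expected %d)\n", counter, NTHREADS * ITERS);
        return 0;
    }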
Week 10 Dense Matrix Algorithms
  • Processing of data with a regular structure, suitable for data decomposition
  • Data decomposition: input data, output data, intermediate data
  • Partitioning and distribution: 1D and 2D block, cyclic, and block-cyclic (see the sketch after this week's entry)
Slides: slides of A. Gupta. Reading: Chapter 8 from [Grama, Gupta, Karypis, Kumar]
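An illustrative sketch (not from the slides) of the 1D mappings from global index to owning process for an array of n elements over p processes: block, cyclic, and block-cyclic with block size b.

    #include <stdio.h>

    /* block distribution with ceiling block size */
    int owner_block(int i, int n, int p)        { int blk = (n + p - 1) / p; return i / blk; }
    /* cyclic distribution */
    int owner_cyclic(int i, int p)              { return i % p; }
    /* block-cyclic distribution with block size b */
    int owner_block_cyclic(int i, int b, int p) { return (i / b) % p; }

    int main(void) {
        int n = 16, p = 4, b = 2;
        for (int i = 0; i < n; i++)
            printf("i=%2d  block:%d  cyclic:%d  block-cyclic(b=%d):%d\n",
                   i, owner_block(i, n, p), owner_cyclic(i, p), b, owner_block_cyclic(i, b, p));
        return 0;
    }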
Week 11 Sorting Algorithms
  • A fundamental operation in many sorting algorithms is compare-exchange
  • Parallel compare-exchange, and compare-split when the sequence to be sorted is distributed in blocks (see the sketch after this week's entry)
Slides: slides of A. Gupta. Reading: Chapter 9 from [Grama, Gupta, Karypis, Kumar]
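A sketch (assumed example) of the compare-exchange step: two partners exchange elements and each keeps either the smaller or the larger one, so the pair ends up ordered. With one element per process this is compare-exchange; when each process holds a block, the analogous compare-split step merges the two blocks and the low partner keeps the smaller half. In an MPI program the exchange itself would typically be an MPI_Sendrecv.

    #include <stdio.h>

    /* keep the minimum on the "low" side and the maximum on the "high" side */
    void compare_exchange(int *low, int *high) {
        if (*low > *high) { int t = *low; *low = *high; *high = t; }
    }

    int main(void) {
        int a = 9, b = 4;
        compare_exchange(&a, &b);
        printf("after compare-exchange: low=%d high=%d\n", a, b);   /* low=4 high=9 */
        return 0;
    }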
Week 12 Graph Algorithms
Slides: slides of A. Gupta. Reading: Chapter 10 from [Grama, Gupta, Karypis, Kumar]
Week 13 Search Algorithms for Discrete Optimization Problems
  Issues:
  • Load balancing
  • Termination detection
  • Work splitting
Slides: slides of A. Gupta. Reading: Chapter 11 from [Grama, Gupta, Karypis, Kumar]