During the PDS/HPDA Seminar of 6/1/2023 from 10:00 to 12:00, Jana Toljaga, Victor Laforet, Thomas Collignon, and Sahar Boussoukaya will each present a reading group talk.
Visio: https://webconf.imt.fr/frontend/fra-vcg-byn-fxd
Location: 4A312
# Reading group: DUNE: Safe User-level Access to Privileged CPU Features (OSDI’12)

Presented by Jana Toljaga on 6/1/2023 at 10:00. Attending this presentation is mandatory for master's students.
Paper: https://www.usenix.org/conference/osdi12/technical-sessions/presentation/belay
Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-19/
## Abstract
Dune is a system that provides applications with direct but safe access to hardware features such as ring protection, page tables, and tagged TLBs, while preserving the existing OS interfaces for processes. Dune uses the virtualization hardware in modern processors to provide a process, rather than a machine, abstraction. It consists of a small kernel module that initializes virtualization hardware and mediates interactions with the kernel, and a user-level library that helps applications manage privileged hardware features. We present the implementation of Dune for 64-bit x86 Linux. We use Dune to implement three user-level applications that can benefit from access to privileged hardware: a sandbox for untrusted code, a privilege separation facility, and a garbage collector. The use of Dune greatly simplifies the implementation of these applications and provides significant performance advantages.
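To make the process abstraction concrete, here is a minimal C sketch of the workflow the abstract describes: enter Dune mode, then handle a page fault entirely in user space. The API names (`dune_init`, `dune_enter`, `dune_register_pgflt_handler`) are assumptions modeled on the paper's description of libdune, stubbed here so the sketch compiles standalone; they are not verified signatures.

```c
/* Hedged sketch of a Dune-style workflow; a real build would link
 * against libdune instead of these stubs. */
#include <stdio.h>
#include <stdlib.h>

typedef void (*pgflt_handler_t)(unsigned long addr, unsigned long err);

/* --- stubs standing in for the assumed libdune API --- */
static int dune_init(void)  { return 0; }   /* set up kernel-module state */
static int dune_enter(void) { return 0; }   /* enter VMX non-root ring 0  */
static pgflt_handler_t pgflt_handler;
static void dune_register_pgflt_handler(pgflt_handler_t h) { pgflt_handler = h; }
/* ------------------------------------------------------ */

/* Under Dune, a page fault is delivered directly to the process, so a
 * garbage collector or sandbox can react without a kernel crossing. */
static void on_page_fault(unsigned long addr, unsigned long err)
{
    printf("user-level page fault at %#lx (error %#lx)\n", addr, err);
}

int main(void)
{
    if (dune_init() != 0 || dune_enter() != 0) {
        fprintf(stderr, "failed to enter Dune mode\n");
        return EXIT_FAILURE;
    }
    dune_register_pgflt_handler(on_page_fault);
    /* ... the application now runs with direct access to page tables,
     * ring protection, and the tagged TLB ... */
    return EXIT_SUCCESS;
}
```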
# Reading group: Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask (SOSP’13)

Presented by Victor Laforet on 6/1/2023 at 10:30. Attending this presentation is mandatory for master's students.
Paper: https://dl.acm.org/doi/pdf/10.1145/2517349.2522714
Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-20/
## Abstract
This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket (uniform and non-uniform) to multi-socket (directory- and broadcast-based) many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.
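As a concrete example of the kind of primitive the study measures, here is a minimal test-and-test-and-set spinlock in C11 (our illustration, not code from the paper). Its scalability is governed by how the contended cache line moves under the hardware coherence protocol, which is exactly the hardware dependence the abstract points to.

```c
/* Test-and-test-and-set spinlock: the inner relaxed load spins on a
 * locally cached copy of the flag, so the cache line only bounces
 * between cores when the lock is actually released. */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } spinlock_t;   /* init with {0} */

static void spin_lock(spinlock_t *l)
{
    for (;;) {
        /* The atomic exchange is the "test-and-set" step. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
        /* Spin on a plain load until the lock looks free, avoiding
         * write traffic on the contended cache line. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The test-and-test-and-set design is one point in the design space the paper compares against alternatives such as ticket and queue-based (MCS, CLH) locks.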
# Reading group: ITAP: Idle-Time-Aware Power Management for GPU Execution Units (ACM TACO’19)

Presented by Thomas Collignon on 6/1/2023 at 11:00. Attending this presentation is mandatory for master's students.
Paper: https://dl.acm.org/doi/pdf/10.1145/3291606
Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-21/
## Abstract
Graphics Processing Units (GPUs) are widely used as the accelerator of choice for applications with massively data-parallel tasks. However, recent studies show that GPUs suffer heavily from resource underutilization, which, combined with their large static power consumption, imposes a significant power overhead. One of the most power-hungry components of a GPU, the execution units, frequently experiences idleness when (1) an underutilized warp is issued to the execution units, leading to partial lane idleness, and (2) there is no active warp to be issued for execution due to warp stalls (e.g., waiting for memory access and synchronization). Although large in total, the idle time of execution units actually comes from short but frequent stalls, leaving little potential for common power-saving techniques such as power-gating. In this article, we propose ITAP, a novel idle-time-aware power management technique, which aims to effectively reduce the static energy consumption of GPU execution units. By taking advantage of different power management techniques (i.e., power-gating and different levels of voltage scaling), ITAP employs three static power reduction modes with different overheads and capabilities of static power reduction. ITAP estimates the idle period length of execution units using prediction and peek-ahead techniques in a synergistic way, and then applies the most appropriate static power reduction mode based on the estimated idle period length. We design ITAP to be power-aggressive or performance-aggressive, but not both at the same time. Our experimental results on several workloads show that the power-aggressive design of ITAP outperforms the state-of-the-art solution by an average of 27.6% in terms of static energy savings, with less than 2.1% performance overhead. Meanwhile, the performance-aggressive design of ITAP improves static energy savings by an average of 16.9% while keeping GPU performance almost unaffected (i.e., up to 0.4% performance overhead) compared to the state-of-the-art static energy saving mechanism.
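The mode-selection idea can be sketched as follows. This is our illustration, not ITAP's actual algorithm: the thresholds and mode set are hypothetical, standing in for the break-even points between each mode's wake-up overhead and the static power it saves.

```c
/* Illustrative idle-time-aware mode selection: longer predicted idle
 * periods justify modes with higher wake-up cost but deeper savings. */
#include <stdio.h>

typedef enum {
    MODE_NONE,       /* idle period too short to recoup any overhead      */
    MODE_VOLT_LOW,   /* shallow voltage scaling: cheap to wake, small win */
    MODE_VOLT_DEEP,  /* deeper voltage scaling: costlier wake, bigger win */
    MODE_POWER_GATE  /* power gating: highest savings, highest wake cost  */
} pm_mode_t;

/* Hypothetical break-even thresholds in cycles, chosen for illustration. */
static pm_mode_t pick_mode(unsigned predicted_idle_cycles)
{
    if (predicted_idle_cycles < 10)  return MODE_NONE;
    if (predicted_idle_cycles < 50)  return MODE_VOLT_LOW;
    if (predicted_idle_cycles < 500) return MODE_VOLT_DEEP;
    return MODE_POWER_GATE;
}

int main(void)
{
    const unsigned samples[] = { 5, 30, 200, 10000 };
    for (unsigned i = 0; i < 4; i++)
        printf("idle=%u cycles -> mode %d\n", samples[i], pick_mode(samples[i]));
    return 0;
}
```

A power-aggressive variant would bias these thresholds downward (entering deep modes sooner), while a performance-aggressive variant would bias them upward to avoid wake-up latency on mispredicted idle periods.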
# Reading group: From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers (USENIX ATC’19)

Presented by Sahar Boussoukaya on 6/1/2023 at 11:30. Attending this presentation is mandatory for master's students.
Paper: https://cs.stanford.edu/~matei/papers/2019/usenix_atc_gg.pdf
Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-24/
## Abstract
We present gg, a framework and a set of command-line tools that helps people execute everyday applications (e.g., software compilation, unit tests, video encoding, or object recognition) using thousands of parallel threads on a cloud-functions service to achieve near-interactive completion times. In the future, instead of running these tasks on a laptop, or keeping a warm cluster running in the cloud, users might push a button that spawns 10,000 parallel cloud functions to execute a large job in a few seconds from start. gg is designed to make this practical and easy. With gg, applications express a job as a composition of lightweight OS containers that are individually transient (lifetimes of 1-60 seconds) and functional (each container is hermetically sealed and deterministic). gg takes care of instantiating these containers on cloud functions, loading dependencies, minimizing data movement, moving data between containers, and dealing with failure and stragglers. We ported several latency-sensitive applications to run on gg and evaluated its performance. In the best case, a distributed compiler built on gg outperformed a conventional tool (icecc) by 2-5×, without requiring a warm cluster running continuously. In the worst case, gg was within 20% of the hand-tuned performance of an existing tool for video encoding (ExCamera).
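The core abstraction can be sketched as a graph of deterministic steps forced in dependency order. The names and structure below are ours, not gg's actual code; in gg each step would be dispatched to a transient container on a cloud-functions service rather than executed locally, and determinism is what makes results safely cacheable and stragglers safely re-executable.

```c
/* Illustrative sketch of a gg-style job: deterministic, hermetically
 * sealed steps ("thunks") forced once their dependencies are done. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_DEPS 4

typedef struct thunk {
    const char   *name;            /* e.g., "compile a.c"              */
    struct thunk *deps[MAX_DEPS];  /* inputs that must finish first    */
    int           ndeps;
    bool          done;
} thunk;

/* Force a thunk after its dependencies; in gg each invocation would
 * run in its own transient container on a cloud function. */
static void force(thunk *t)
{
    if (t->done) return;           /* determinism lets results be cached */
    for (int i = 0; i < t->ndeps; i++)
        force(t->deps[i]);
    printf("executing %s in a transient container\n", t->name);
    t->done = true;
}

int main(void)
{
    thunk a    = { .name = "compile a.c" };
    thunk b    = { .name = "compile b.c" };
    thunk link = { .name = "link app", .deps = { &a, &b }, .ndeps = 2 };
    force(&link);   /* independent thunks like a and b could run in parallel */
    return 0;
}
```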