[PDS/HPDA Seminar] 27/1/2023 from 10:00 to 12:00 at 4A312 – Ewa Turska (reading group), Timothée Zerbib (reading group), Dimitrije Panic (reading group) and Aleksandar Maksimovic (reading group)

During the PDS/HPDA Seminar of 27/1/2023 from 10:00 to 12:00, Ewa Turska, Timothée Zerbib, Dimitrije Panic and Aleksandar Maksimovic will each present a reading group talk.

Visio: https://webconf.imt.fr/frontend/fra-vcg-byn-fxd

Location: 4A312

# Reading group: Adaptive Random Forests for Evolving Data Stream Classification (ECML PKDD 2017)

Presented by Ewa Turska on 27/1/2023 at 10:00. Attending this presentation is mandatory for the master students.

Paper: https://core.ac.uk/download/pdf/85165409.pdf

Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-26/

## Abstract
Random Forests is currently one of the most used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no Random Forests algorithm that can be considered state-of-the-art in comparison to bagging and boosting based algorithms. In this work, we present the Adaptive Random Forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts of replicating Random Forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF which has no degradation in terms of classification performance in comparison to a serial implementation, since trees and adaptive operators are independent from one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.

# Reading group: OS scheduling with nest: keeping tasks close together on warm cores (EuroSys’22)

Presented by Timothée Zerbib on 27/1/2023 at 10:30. Attending this presentation is mandatory for the master students.

Paper: https://dl.acm.org/doi/pdf/10.1145/3492321.3519585

Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-27/

## Abstract
To best support highly parallel applications, Linux’s CFS scheduler tends to spread tasks across the machine on task creation and wakeup. It has been observed, however, that in a server environment, such a strategy leads to tasks being unnecessarily placed on long-idle cores that are running at lower frequencies, reducing performance, and to tasks being unnecessarily distributed across sockets, consuming more energy. In this paper, we propose to exploit the principle of core reuse, by constructing a nest of cores to be used in priority for task scheduling, thus obtaining higher frequencies and using fewer sockets. We implement the Nest scheduler in the Linux kernel. While performance and energy usage are comparable to CFS for highly parallel applications, for a range of applications using fewer tasks than cores, Nest improves performance 10%–2× and can reduce energy usage.
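The core-reuse principle can be illustrated with a deliberately simplified sketch: prefer an idle core that is already in the nest, and only expand the nest when none is available. The function `pick_core` and its arguments are hypothetical names for this example; the actual Nest scheduler lives in the Linux kernel and is more elaborate than this.

```python
def pick_core(nest, idle_cores, all_cores):
    """Pick a core for a task, preferring recently used ("warm") cores in the nest.

    nest:       list of core ids currently in the nest (mutated on expansion)
    idle_cores: set of core ids that are idle right now
    all_cores:  list of every core id on the machine
    """
    # First choice: an idle core that is already part of the nest.
    for core in nest:
        if core in idle_cores:
            return core
    # Fallback: any idle core; it joins the nest so later tasks can reuse it.
    for core in all_cores:
        if core in idle_cores:
            nest.append(core)
            return core
    # No idle core at all (queueing is out of scope for this sketch).
    return None


all_cores = [0, 1, 2, 3]
nest = [2, 3]
first = pick_core(nest, {0, 1, 3}, all_cores)   # core 3 is idle and in the nest
second = pick_core(nest, {0, 1}, all_cores)     # nest is busy, so it grows with core 0
print(first, second, nest)
```

A spreading policy would pick any idle core in both calls; the nest-first order is what keeps tasks on cores that are likely to be running at higher frequencies.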

# Reading group: ePrints: A Real-Time and Scalable System for Fair Apportionment and Tracking of Personal Energy Footprints in Commercial Buildings (BuildSys’17)

Presented by Dimitrije Panic on 27/1/2023 at 11:00. Attending this presentation is mandatory for the master students.

Paper: https://dl.acm.org/doi/pdf/10.1145/3137133.3137150

Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-29/

## Abstract
We propose a system that tracks each occupant’s personal share of energy use, or “energy footprint”, inside commercial building environments, and provides insights to occupants on the real-time energy impact of their actions. We propose a new space-centric policy for fair apportionment of energy in shared environments and demonstrate a method for automatically determining space-centric energy zones. We design and implement ePrints – a system for tracking personalized energy usage in real-time. ePrints supports different apportionment policies, with µs-level footprint computation time and graceful scaling with size of building, frequency of energy updates, and rate of occupant location changes. Finally, we present applications enabled by our system, such as mobile and wearable applications to provide users timely feedback on the energy impacts of their actions, as well as applications to provide energy saving suggestions and inform building-level policies.
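To make the notion of apportionment concrete, the sketch below splits the energy used in each zone equally among the occupants currently located there. This equal-split rule is an assumption made for illustration only; the abstract does not spell out the space-centric policy, and the function `apportion`, the zone names, and the energy values are all hypothetical.

```python
def apportion(zone_energy_wh, occupant_zone):
    """Split each zone's energy equally among the occupants currently in it.

    zone_energy_wh: dict mapping zone name -> energy used in the interval (Wh)
    occupant_zone:  dict mapping occupant name -> zone they are currently in
    Returns (footprints, unassigned), where unassigned is the energy of empty zones.
    """
    # Group occupants by the zone they are in.
    occupants_by_zone = {}
    for occupant, zone in occupant_zone.items():
        occupants_by_zone.setdefault(zone, []).append(occupant)

    footprints = {occupant: 0.0 for occupant in occupant_zone}
    unassigned = 0.0
    for zone, energy in zone_energy_wh.items():
        occupants = occupants_by_zone.get(zone, [])
        if not occupants:
            # Assumption: energy of an empty zone is tracked separately.
            unassigned += energy
            continue
        share = energy / len(occupants)
        for occupant in occupants:
            footprints[occupant] += share
    return footprints, unassigned


zone_energy_wh = {"office": 300.0, "kitchen": 50.0, "lab": 120.0}
occupant_zone = {"alice": "office", "bob": "office", "carol": "office", "dave": "kitchen"}
footprints, unassigned = apportion(zone_energy_wh, occupant_zone)
print(footprints, unassigned)
```

How to charge the energy of unoccupied zones is exactly the kind of choice an apportionment policy has to make; this sketch simply reports it as unassigned.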

# Reading group: Corey: An Operating System for Many Cores (OSDI’08)

Presented by Aleksandar Maksimovic on 27/1/2023 at 11:30. Attending this presentation is mandatory for the master students.

Paper: https://www.usenix.org/legacy/event/osdi08/tech/full_papers/boyd-wickizer/boyd_wickizer.pdf

Full post: https://www.inf.telecom-sudparis.eu/pds/seminars_cpt/reading-group-25/

## Abstract
Multiprocessor application performance can be limited by the operating system when the application uses the operating system frequently and the operating system services use data structures shared and modified by multiple processing cores. If the application does not need the sharing, then the operating system will become an unnecessary bottleneck to the application’s performance. This paper argues that applications should control sharing: the kernel should arrange each data structure so that only a single processor need update it, unless directed otherwise by the application. Guided by this design principle, this paper proposes three operating system abstractions (address ranges, kernel cores, and shares) that allow applications to control inter-core sharing and to take advantage of the likely abundance of cores by dedicating cores to specific operating system functions. Measurements of microbenchmarks on the Corey prototype operating system, which embodies the new abstractions, show how control over sharing can improve performance. Application benchmarks, using MapReduce and a Web server, show that the improvements can be significant for overall performance: MapReduce on Corey performs 25% faster than on Linux when using 16 cores. Hardware event counters confirm that these improvements are due to avoiding operations that are expensive on multicore machines.