Quand : le lundi 6 février 2023, à 10h00
Où : Amphi C06, à Télécom SudParis, 9 rue Charles Fourrier, Evry
Titre : Network Dataset Quality Assessment with Permutation Testing
ML models can only be as good as the datasets they are trained on. The problem of the lack of high-quality network datasets has been mentioned many times in papers. The quality of datasets is difficult to assess, but also to define. What does it mean that a dataset is of high quality? Generally, a dataset is said to be of high quality if it meets the requirements for its intended use. In the convention of this ambiguity, I would like to introduce the PerQoDA methodology, which evaluates the dataset in terms of the relationship between observations and labels in a classification problem. This is just one aspect of the problem of assessing the quality of datasets, but it highlights its problematic nature and complexity.
Katarzyna Wasielewska received the M.Sc. degree in computer science at the Faculty of Mathematics and Computer Science, Nicolaus Copernicus University (NCU), Torun, Poland, and the Ph.D. degree in telecommunications at the Faculty of Telecommunications, Information Technology and Electrical Engineering, UTP University of Science and Technology, Bydgoszcz, Poland. She has been awarded the Marie Sklodowska-Curie Actions Individual Fellowships (MSCA) program. She is currently a Postdoctoral Researcher at the Department of Signal Theory, Networking and Communications and researcher in the Information and Communication Technologies Research Centre (CITIC) at the University of Granada, Spain. Her research interests include cybersecurity, network security, machine learning, multivariate data analysis, and dataset quality problem. She has ten years of experience as an ISP Network Administrator.