{"id":7579,"date":"2026-02-23T16:00:57","date_gmt":"2026-02-23T15:00:57","guid":{"rendered":"https:\/\/samovar.telecom-sudparis.eu\/?p=7579"},"modified":"2026-02-23T16:00:57","modified_gmt":"2026-02-23T15:00:57","slug":"avis-de-soutenance-de-monsieur-guanlin-li","status":"publish","type":"post","link":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/2026\/02\/23\/avis-de-soutenance-de-monsieur-guanlin-li\/","title":{"rendered":"AVIS DE SOUTENANCE de Monsieur Guanlin LI"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">L&rsquo;Ecole doctorale : Ecole Doctorale de l&rsquo;Institut Polytechnique de Paris<br><br>et le Laboratoire de recherche SAMOVAR &#8211; Services r\u00e9partis, Architectures, Mod\u00e9lisation, Validation, Administration des R\u00e9seaux<\/h2>\n\n\n\n<p><br>pr\u00e9sentent<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">l\u2019AVIS DE SOUTENANCE de Monsieur Guanlin LI<\/h2>\n\n\n\n<p><br>Autoris\u00e9 \u00e0 pr\u00e9senter ses travaux en vue de l\u2019obtention du Doctorat de l&rsquo;Institut Polytechnique de Paris, pr\u00e9par\u00e9 \u00e0 T\u00e9l\u00e9com SudParis en :<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Informatique<\/h2>\n\n\n\n<h1 class=\"wp-block-heading\">\u00ab M\u00e9thodes frugales en donn\u00e9es pour l\u2019acquisition lexicale personnalis\u00e9e \u00bb<\/h1>\n\n\n\n<p><br>le JEUDI 5 MARS 2026 \u00e0 9h00<br><br>\u00e0<br><br>Online<\/p>\n\n\n\n<p><strong>Teams meeting link:<\/strong>&nbsp;<\/p>\n\n\n\n<p><a 
href=\"https:\/\/teams.microsoft.com\/l\/meetup-join\/19%3ameeting_OGRmNGEyZjQtMmJiZi00MzZiLWJiYTgtZjM2YTcwMzU0NDll%40thread.v2\/0?context=%7b%22Tid%22%3a%22d47b090e-3f5a-4ca0-84d0-9f89d269f175%22%2c%22Oid%22%3a%227990e1ec-3c91-4936-880d-061e6fd518f2%22%7d\">https:\/\/teams.microsoft.com\/l\/meetup-join\/19%3ameeting_OGRmNGEyZjQtMmJiZi00MzZiLWJiYTgtZjM2YTcwMzU0NDll%40thread.v2\/0?context=%7b%22Tid%22%3a%22d47b090e-3f5a-4ca0-84d0-9f89d269f175%22%2c%22Oid%22%3a%227990e1ec-3c91-4936-880d-061e6fd518f2%22%7d<\/a><\/p>\n\n\n\n<p><br><strong>Meeting ID:<\/strong>&nbsp;<a href=\"callto:346 577 915 845 60\" target=\"_blank\" rel=\"noreferrer noopener\">346 577 915 845 60<\/a><br><strong>Passcode:<\/strong>&nbsp;iP3LJ2u5<\/p>\n\n\n\n<p><br><br><strong><u>Membres du jury :<\/u><\/strong><\/p>\n\n\n\n<p><strong>M. Noel&nbsp;CRESPI<\/strong>, Full professor, T\u00e9l\u00e9com SudParis, FRANCE &#8211; Directeur de th\u00e8se<br><strong>M. Reyer&nbsp;ZWIGGELAAR<\/strong>, Full professor, Aberystwyth University, ROYAUME-UNI &#8211; Rapporteur<br><strong>Mme Claire&nbsp;GARDENT<\/strong>, Directrice de recherche, LORIA, CNRS and Universit\u00e9 de Lorraine, FRANCE &#8211; Examinatrice<br><strong>M. Fran\u00e7ois&nbsp;YVON<\/strong>, Directeur de recherche, ISIR, CNRS and Sorbonne Universit\u00e9, FRANCE &#8211; Rapporteur<br><strong>Mme Yuki&nbsp;ARASE<\/strong>, Full professor, Institute of Science Tokyo, JAPON &#8211; Examinatrice<br><strong>Mme Phi Le&nbsp;NGUYEN<\/strong>, Ma\u00eetresse de conf\u00e9rences, Hanoi University of Science and Technology, Acting Director, Institute for AI Innovation and Societal Impact (AI4LIFE), VIETNAM &#8211; Examinatrice<br><strong>M. 
Cheng-Zhong&nbsp;XU<\/strong>, Full professor, IEEE Fellow, University of Macau, CHINE &#8211; Examinateur<br><strong>Mme Praboda&nbsp;RAJAPAKSHA<\/strong>, Ma\u00eetresse de conf\u00e9rences, Aberystwyth University, ROYAUME-UNI &#8211; Co-encadrante de th\u00e8se<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">\u00ab M\u00e9thodes frugales en donn\u00e9es pour l\u2019acquisition lexicale personnalis\u00e9e \u00bb<\/h2>\n\n\n\n<h2 class=\"wp-block-heading\">pr\u00e9sent\u00e9 par Monsieur Guanlin LI<\/h2>\n\n\n\n<p><br><strong>R\u00e9sum\u00e9 :<\/strong><\/p>\n\n\n\n<p>Cette th\u00e8se \u00e9tudie des m\u00e9thodes computationnelles \u00e9conomes en donn\u00e9es pour l\u2019apprentissage personnalis\u00e9 d\u2019une langue seconde (SLA), avec un accent sur l\u2019acquisition du vocabulaire chez les apprenants d\u00e9butants. Bien que les grands mod\u00e8les de langage (LLM) pr\u00e9sentent de fortes capacit\u00e9s g\u00e9n\u00e9ralistes, leur application directe au SLA est limit\u00e9e par le manque d\u2019adaptation au domaine, la raret\u00e9 des donn\u00e9es annot\u00e9es d\u2019apprenants et le besoin de personnalisation fine. Cette th\u00e8se aborde ces d\u00e9fis \u00e0 travers trois axes compl\u00e9mentaires : (i) le transfert translingue robuste, (ii) l\u2019adaptation des entr\u00e9es textuelles au niveau de comp\u00e9tence de l\u2019apprenant et (iii) la pr\u00e9diction personnalis\u00e9e des connaissances lexicales. Premi\u00e8rement, le transfert translingue z\u00e9ro-shot est \u00e9tudi\u00e9 comme fondation des syst\u00e8mes SLA destin\u00e9s \u00e0 des apprenants aux langues premi\u00e8res vari\u00e9es. Nous proposons CoNLST, une m\u00e9thode en deux \u00e9tapes combinant apprentissage n\u00e9gatif contrastif et auto-apprentissage pour exploiter les \u00e9chantillons de faible confiance dans la langue cible. 
L\u2019utilisation de labels compl\u00e9mentaires am\u00e9liore la calibration et r\u00e9duit le biais de confirmation, tandis qu\u2019un auto-apprentissage \u00e0 seuils dynamiques, coupl\u00e9 \u00e0 l\u2019augmentation de donn\u00e9es, exploite efficacement les pseudo-labels fiables. Des exp\u00e9riences sur XNLI avec mBERT et XLM-R montrent des gains constants par rapport \u00e0 l\u2019auto-apprentissage standard et une compatibilit\u00e9 avec les approches par alignement, produisant des repr\u00e9sentations multilingues plus robustes pour des t\u00e2ches translingues en aval. Deuxi\u00e8mement, une m\u00e9thode de simplification contr\u00f4l\u00e9e de textes est propos\u00e9e afin d\u2019aligner les contenus linguistiques sur le niveau de comp\u00e9tence de l\u2019apprenant. Motiv\u00e9e par l\u2019id\u00e9e que l\u2019acquisition pr\u00e9coce d\u2019une langue seconde devrait s\u2019appuyer sur un input riche, fr\u00e9quent et adapt\u00e9 au niveau, similaire \u00e0 celui de l\u2019acquisition de la langue maternelle, cette approche vise \u00e0 g\u00e9n\u00e9rer un input propice \u00e0 une acquisition naturaliste. La m\u00e9thode repose sur l\u2019apprentissage par renforcement, sans corpus parall\u00e8le complexe\u2013simple, et formule la simplification comme un probl\u00e8me de recherche avec anticipation. Des r\u00e9compenses aux niveaux du token et de la phrase permettent d\u2019optimiser un LLM sur ses propres g\u00e9n\u00e9rations. L\u2019approche am\u00e9liore de plus de 20 % la couverture du vocabulaire cible sur CEFR-SP et TurkCorpus, sans d\u00e9gradation de la qualit\u00e9 de simplification. Troisi\u00e8mement, la th\u00e8se \u00e9tudie la pr\u00e9diction personnalis\u00e9e de la connaissance lexicale (VKP), qui vise \u00e0 estimer si un apprenant conna\u00eet un mot donn\u00e9 \u00e0 partir de tr\u00e8s peu de r\u00e9ponses annot\u00e9es. 
Pour pallier la raret\u00e9 des donn\u00e9es, un pipeline de simulation d\u2019apprenants fond\u00e9 sur les LLM est propos\u00e9 afin de g\u00e9n\u00e9rer des r\u00e9ponses synth\u00e9tiques pour diff\u00e9rentes cohortes de comp\u00e9tence. Sur cette base, un mod\u00e8le multi-t\u00e2ches exploitant des caract\u00e9ristiques s\u00e9mantiques et des r\u00e9gularit\u00e9s de cohortes est d\u00e9velopp\u00e9, surpassant nettement les approches fond\u00e9es sur la fr\u00e9quence et les m\u00e9thodes \u00e0 traits manuels, en particulier en r\u00e9gime de donn\u00e9es tr\u00e8s limit\u00e9es. Une analyse en apprentissage actif montre en outre que des strat\u00e9gies de requ\u00eate adapt\u00e9es sont particuli\u00e8rement efficaces dans ces contextes. Dans l\u2019ensemble, ces travaux proposent une perspective unifi\u00e9e sur des m\u00e9thodes de NLP personnalis\u00e9es et \u00e9conomes en donn\u00e9es pour le SLA. La th\u00e8se met en \u00e9vidence des avanc\u00e9es qui renforcent la robustesse multilingue, permettent la g\u00e9n\u00e9ration d\u2019input lexicalement adapt\u00e9 et soutiennent une mod\u00e9lisation fine des apprenants sous fortes contraintes de donn\u00e9es, tout en soulignant le potentiel des LLM comme base pour de futurs syst\u00e8mes d\u2019apprentissage des langues sensibles aux besoins individuels.<\/p>\n\n\n\n<p><br><strong>Abstract :<\/strong><\/p>\n\n\n\n<p>This thesis investigates data-efficient computational methods for personalized second language acquisition (SLA), with a focus on vocabulary learning for early-stage learners. Although large language models (LLMs) exhibit strong general capabilities, their direct application to SLA is constrained by limited domain adaptation, scarce labeled learner data, and the need for fine-grained personalization. 
To address these challenges, the thesis develops three complementary lines of work: (i) robust cross-lingual transfer, (ii) proficiency-aligned input adaptation, and (iii) personalized vocabulary knowledge prediction. First, zero-shot cross-lingual transfer is studied as a foundational requirement for SLA systems serving learners with diverse first-language backgrounds. A two-stage method, CoNLST, is proposed, combining contrastive negative learning with self-training to exploit low-confidence target-language samples. Contrastive learning improves calibration and reduces confirmation bias, while dynamically thresholded self-training with data augmentation leverages high-confidence pseudo-labels. Experiments on XNLI with mBERT and XLM-R demonstrate consistent gains over standard self-training and compatibility with alignment-based methods, yielding stronger multilingual representations for downstream tasks, including low-resource SLA applications. Second, a controlled text simplification method is introduced to align input with learner proficiency. Motivated by the view that early-stage SLA should approximate first-language acquisition through rich, high-frequency, level-appropriate input, the method aims to generate proficiency-aligned text without parallel simplification corpora. The approach formulates simplification as a lookahead search problem and applies reinforcement learning with token- and sentence-level rewards to optimize an LLM on its own generations. The method improves target-level vocabulary coverage by over 20% on CEFR-SP and TurkCorpus, while maintaining simplification quality, demonstrating data-efficient control of learner-appropriate input. Third, the thesis addresses personalized vocabulary knowledge prediction (VKP), which estimates whether an individual learner knows a word given very limited annotations. To mitigate data scarcity, an LLM-based learner simulation pipeline generates synthetic response patterns across proficiency levels. 
A multi-task model leveraging both semantic features and cohort-level patterns is then proposed, substantially outperforming frequency-based and feature-engineered baselines, particularly in low-data regimes. Active learning analyses further show that principled querying is especially effective when learner observations are extremely limited. Overall, the thesis presents a unified framework for data-efficient, personalized NLP methods for SLA, advancing multilingual robustness, proficiency-aware input generation, and fine-grained learner modeling under severe data constraints. The findings highlight the potential of LLM-based approaches for adaptive language learning systems and provide a principled, data-efficient foundation for future research.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>L&rsquo;Ecole doctorale : Ecole Doctorale de l&rsquo;Institut Polytechnique de Paris et le Laboratoire de recherche SAMOVAR &#8211; Services r\u00e9partis, Architectures, Mod\u00e9lisation, Validation, Administration des R\u00e9seaux pr\u00e9sentent l\u2019AVIS DE SOUTENANCE de Monsieur Guanlin LI Autoris\u00e9 \u00e0 pr\u00e9senter ses travaux en vue de l\u2019obtention du Doctorat de l&rsquo;Institut Polytechnique de Paris, pr\u00e9par\u00e9 \u00e0 T\u00e9l\u00e9com SudParis en : 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ocean_post_layout":"","ocean_both_sidebars_style":"","ocean_both_sidebars_content_width":0,"ocean_both_sidebars_sidebars_width":0,"ocean_sidebar":"","ocean_second_sidebar":"","ocean_disable_margins":"enable","ocean_add_body_class":"","ocean_shortcode_before_top_bar":"","ocean_shortcode_after_top_bar":"","ocean_shortcode_before_header":"","ocean_shortcode_after_header":"","ocean_has_shortcode":"","ocean_shortcode_after_title":"","ocean_shortcode_before_footer_widgets":"","ocean_shortcode_after_footer_widgets":"","ocean_shortcode_before_footer_bottom":"","ocean_shortcode_after_footer_bottom":"","ocean_display_top_bar":"default","ocean_display_header":"default","ocean_header_style":"","ocean_center_header_left_menu":"","ocean_custom_header_template":"","ocean_custom_logo":0,"ocean_custom_retina_logo":0,"ocean_custom_logo_max_width":0,"ocean_custom_logo_tablet_max_width":0,"ocean_custom_logo_mobile_max_width":0,"ocean_custom_logo_max_height":0,"ocean_custom_logo_tablet_max_height":0,"ocean_custom_logo_mobile_max_height":0,"ocean_header_custom_menu":"","ocean_menu_typo_font_family":"","ocean_menu_typo_font_subset":"","ocean_menu_typo_font_size":0,"ocean_menu_typo_font_size_tablet":0,"ocean_menu_typo_font_size_mobile":0,"ocean_menu_typo_font_size_unit":"px","ocean_menu_typo_font_weight":"","ocean_menu_typo_font_weight_tablet":"","ocean_menu_typo_font_weight_mobile":"","ocean_menu_typo_transform":"","ocean_menu_typo_transform_tablet":"","ocean_menu_typo_transform_mobile":"","ocean_menu_typo_line_height":0,"ocean_menu_typo_line_height_tablet":0,"ocean_menu_typo_line_height_mobile":0,"ocean_menu_typo_line_height_unit":"","ocean_menu_typo_spacing":0,"ocean_menu_typo_spacing_tablet":0,"ocean_menu_typo_spacing_mobile":0,"ocean_menu_typo_spacing_unit":"","ocean_menu_link_color":"","ocean_menu_link_color_hover
":"","ocean_menu_link_color_active":"","ocean_menu_link_background":"","ocean_menu_link_hover_background":"","ocean_menu_link_active_background":"","ocean_menu_social_links_bg":"","ocean_menu_social_hover_links_bg":"","ocean_menu_social_links_color":"","ocean_menu_social_hover_links_color":"","ocean_disable_title":"default","ocean_disable_heading":"default","ocean_post_title":"","ocean_post_subheading":"","ocean_post_title_style":"","ocean_post_title_background_color":"","ocean_post_title_background":0,"ocean_post_title_bg_image_position":"","ocean_post_title_bg_image_attachment":"","ocean_post_title_bg_image_repeat":"","ocean_post_title_bg_image_size":"","ocean_post_title_height":0,"ocean_post_title_bg_overlay":0.5,"ocean_post_title_bg_overlay_color":"","ocean_disable_breadcrumbs":"default","ocean_breadcrumbs_color":"","ocean_breadcrumbs_separator_color":"","ocean_breadcrumbs_links_color":"","ocean_breadcrumbs_links_hover_color":"","ocean_display_footer_widgets":"default","ocean_display_footer_bottom":"default","ocean_custom_footer_template":"","ocean_post_oembed":"","ocean_post_self_hosted_media":"","ocean_post_video_embed":"","ocean_link_format":"","ocean_link_format_target":"self","ocean_quote_format":"","ocean_quote_format_link":"post","ocean_gallery_link_images":"on","ocean_gallery_id":[],"footnotes":""},"categories":[286,402],"tags":[],"class_list":["post-7579","post","type-post","status-publish","format-standard","hentry","category-fractualites-ennews-fr","category-seminaires-ness-2013-fr","entry"],"_links":{"self":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts\/7579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{
"embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/comments?post=7579"}],"version-history":[{"count":1,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts\/7579\/revisions"}],"predecessor-version":[{"id":7580,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts\/7579\/revisions\/7580"}],"wp:attachment":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/media?parent=7579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/categories?post=7579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/tags?post=7579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}