{"id":5647,"date":"2023-03-16T15:42:08","date_gmt":"2023-03-16T14:42:08","guid":{"rendered":"https:\/\/samovar.telecom-sudparis.eu\/?p=5647"},"modified":"2023-03-16T15:42:10","modified_gmt":"2023-03-16T14:42:10","slug":"avis-de-soutenance-de-madame-abir-fathallah","status":"publish","type":"post","link":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/2023\/03\/16\/avis-de-soutenance-de-madame-abir-fathallah\/","title":{"rendered":"AVIS DE SOUTENANCE de Madame Abir FATHALLAH"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">L&rsquo;Ecole doctorale : Ecole Doctorale de l&rsquo;Institut Polytechnique de Paris<br>et le Laboratoire de recherche SAMOVAR &#8211; Services r\u00e9partis, Architectures, MOd\u00e9lisation, Validation, Administration des R\u00e9seaux<\/h2>\n\n\n\n<p>pr\u00e9sentent<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">l\u2019AVIS DE SOUTENANCE de Madame Abir FATHALLAH<\/h2>\n\n\n\n<p>Autoris\u00e9e \u00e0 pr\u00e9senter ses travaux en vue de l\u2019obtention du Doctorat de l&rsquo;Institut Polytechnique de Paris, pr\u00e9par\u00e9 \u00e0 T\u00e9l\u00e9com SudParis en :<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Signal, Images, Automatique et robotique<\/h2>\n\n\n\n<h1 class=\"wp-block-heading\">\u00ab Contributions \u00e0 l&rsquo;indexation de documents historiques arabes \u00e0 l&rsquo;aide d&rsquo;approches d&rsquo;apprentissage profond \u00bb<\/h1>\n\n\n\n<p>le LUNDI 20 MARS 2023 \u00e0 10h00<\/p>\n\n\n\n<p>Amphi 3<br>19 Place Marguerite Perey 91120 Palaiseau<\/p>\n\n\n\n<p><strong>Membres du jury :<\/strong><\/p>\n\n\n\n<p><strong>Mme Laurence\u00a0LIKFORMAN-SULEM<\/strong>, Ma\u00eetresse de Conf\u00e9rences, T\u00e9l\u00e9com Paris, FRANCE &#8211; Rapporteure<br><strong>M. 
Mohamed Adel\u00a0ALIMI<\/strong>, Professor, \u00c9cole nationale d&rsquo;ing\u00e9nieurs de Sfax, Tunisia &#8211; Rapporteur<br><strong>Ms Afef\u00a0KACEM<\/strong>, Associate Professor, \u00c9cole sup\u00e9rieure des Sciences et techniques de Tunis, Tunisia &#8211; Examiner<br><strong>Mr Mehdi\u00a0AMMI<\/strong>, Professor, Universit\u00e9 Paris 8, France &#8211; Examiner<br><strong>Ms Najoua\u00a0ESSOUKRI BEN AMARA<\/strong>, Professor, Ecole Nationale d&rsquo;Ing\u00e9nieurs de Sousse, Tunisia &#8211; Thesis supervisor<br><strong>Mr Moun\u00eem A.\u00a0EL YACOUBI<\/strong>, Professor, T\u00e9l\u00e9com SudParis, France &#8211; Thesis supervisor<\/p>\n\n\n\n<p><br><strong>Abstract: \u00ab\u00a0Contributions to the indexing of Arabic historical documents using deep learning approaches\u00a0\u00bb<\/strong><\/p>\n\n\n\n<p>With the enormous technological advances of recent years, the amount of digitized historical documents, both handwritten and printed, has increased considerably. Digital historical documents are not easily processed in their original form; they must be transformed into a readable form so that computer vision tools can understand them automatically. Word spotting is an important task for understanding and exploiting document contents by creating indexes. It is an information retrieval technique that aims to identify all occurrences of a query word in a set of documents (for example, a book). In the word spotting task, the input is a set of unindexed documents and the output is a list of words ranked by their similarity to the query word. This allows quick and easy online access to cultural heritage materials and provides further opportunities to investigate these resources. The present PhD thesis investigates the problem of word spotting in historical documents. The first contribution of this work is the development of an embedding space for word image representation, based on the combination of convolutional networks and triplet loss. 
Subsequently, similarity distances are employed to match query words with all words present in the historical documents. The second contribution of this thesis presents an improved method for constructing an embedding space for a word spotting model through the adoption of multiple enhancement strategies: preprocessing steps, transfer learning, online triplet mining, and semi-hard triplet selection techniques. The third contribution aims to enhance word spotting performance by developing a conditional generative adversarial network-based model that generates clean document images from highly degraded ones. This enhancement model addresses various types of degradation, such as watermarks and chemical degradation, with the goal of producing hyper-clean document images and recovering fine details. In the final contribution, we propose using a Vision Transformer architecture to generate word-image representations. The approach uses triplet loss as the optimization criterion and incorporates transfer learning from two distinct domains to improve the word-image representation. All these contributions are evaluated on several public databases that present the varied challenges of historical documents. 
The experimental results obtained on the word spotting task for historical documents compare favorably with many recent state-of-the-art methods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Doctoral School: Ecole Doctorale de l&rsquo;Institut Polytechnique de Paris and the research laboratory SAMOVAR &#8211; Services r\u00e9partis, Architectures, MOd\u00e9lisation, Validation, Administration des R\u00e9seaux present the DEFENSE ANNOUNCEMENT for Ms Abir FATHALLAH, authorized to present her work for the award of the Doctorate of the Institut Polytechnique de Paris, prepared at T\u00e9l\u00e9com SudParis in: Signal, [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ocean_post_layout":"","ocean_both_sidebars_style":"","ocean_both_sidebars_content_width":0,"ocean_both_sidebars_sidebars_width":0,"ocean_sidebar":"0","ocean_second_sidebar":"0","ocean_disable_margins":"enable","ocean_add_body_class":"","ocean_shortcode_before_top_bar":"","ocean_shortcode_after_top_bar":"","ocean_shortcode_before_header":"","ocean_shortcode_after_header":"","ocean_has_shortcode":"","ocean_shortcode_after_title":"","ocean_shortcode_before_footer_widgets":"","ocean_shortcode_after_footer_widgets":"","ocean_shortcode_before_footer_bottom":"","ocean_shortcode_after_footer_bottom":"","ocean_display_top_bar":"default","ocean_display_header":"default","ocean_header_style":"","ocean_center_header_left_menu":"0","ocean_custom_header_template":"0","ocean_custom_logo":0,"ocean_custom_retina_logo":0,"ocean_custom_logo_max_width":0,"ocean_custom_logo_tablet_max_width":0,"ocean_custom_logo_mobile_max_width":0,"ocean_custom_logo_max_height":0,"ocean_custom_logo_tablet_max_height":0,"ocean_custom_logo_mobile_max_height":0,"ocean_header_custom_menu":"0","ocean_menu_typo_font_family":"0","ocean_menu_typo_font_subs
et":"","ocean_menu_typo_font_size":0,"ocean_menu_typo_font_size_tablet":0,"ocean_menu_typo_font_size_mobile":0,"ocean_menu_typo_font_size_unit":"px","ocean_menu_typo_font_weight":"","ocean_menu_typo_font_weight_tablet":"","ocean_menu_typo_font_weight_mobile":"","ocean_menu_typo_transform":"","ocean_menu_typo_transform_tablet":"","ocean_menu_typo_transform_mobile":"","ocean_menu_typo_line_height":0,"ocean_menu_typo_line_height_tablet":0,"ocean_menu_typo_line_height_mobile":0,"ocean_menu_typo_line_height_unit":"","ocean_menu_typo_spacing":0,"ocean_menu_typo_spacing_tablet":0,"ocean_menu_typo_spacing_mobile":0,"ocean_menu_typo_spacing_unit":"","ocean_menu_link_color":"","ocean_menu_link_color_hover":"","ocean_menu_link_color_active":"","ocean_menu_link_background":"","ocean_menu_link_hover_background":"","ocean_menu_link_active_background":"","ocean_menu_social_links_bg":"","ocean_menu_social_hover_links_bg":"","ocean_menu_social_links_color":"","ocean_menu_social_hover_links_color":"","ocean_disable_title":"default","ocean_disable_heading":"default","ocean_post_title":"","ocean_post_subheading":"","ocean_post_title_style":"","ocean_post_title_background_color":"","ocean_post_title_background":0,"ocean_post_title_bg_image_position":"","ocean_post_title_bg_image_attachment":"","ocean_post_title_bg_image_repeat":"","ocean_post_title_bg_image_size":"","ocean_post_title_height":0,"ocean_post_title_bg_overlay":0.5,"ocean_post_title_bg_overlay_color":"","ocean_disable_breadcrumbs":"default","ocean_breadcrumbs_color":"","ocean_breadcrumbs_separator_color":"","ocean_breadcrumbs_links_color":"","ocean_breadcrumbs_links_hover_color":"","ocean_display_footer_widgets":"default","ocean_display_footer_bottom":"default","ocean_custom_footer_template":"0","ocean_post_oembed":"","ocean_post_self_hosted_media":"","ocean_post_video_embed":"","ocean_link_format":"","ocean_link_format_target":"self","ocean_quote_format":"","ocean_quote_format_link":"post","ocean_gallery_link_images":"off",
"ocean_gallery_id":[],"footnotes":""},"categories":[286,169],"tags":[],"class_list":["post-5647","post","type-post","status-publish","format-standard","hentry","category-fractualites-ennews-fr","category-seminaires-armedia","entry"],"_links":{"self":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts\/5647","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/comments?post=5647"}],"version-history":[{"count":1,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts\/5647\/revisions"}],"predecessor-version":[{"id":5648,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/posts\/5647\/revisions\/5648"}],"wp:attachment":[{"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/media?parent=5647"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/categories?post=5647"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/samovar.telecom-sudparis.eu\/index.php\/wp-json\/wp\/v2\/tags?post=5647"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}