Below is a list of my publications organized by date. You can also find my Google Scholar profile.
2025
-
ParaRev: Building a dataset for Scientific Paragraph Revision annotated with revision instruction.
Léane Jourdan, Nicolas Hernandez, Richard Dufour, Florian Boudin, Akiko Aizawa.
International Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WR-AI-CogS).
[data]
-
ACL-rlg: A Dataset for Reading List Generation.
Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, Richard Dufour.
International Conference on Computational Linguistics (COLING).
[data]
2024
-
Self-Compositional Data Augmentation for Scientific Keyphrase Generation.
Maël Houbre, Florian Boudin, Béatrice Daille, Akiko Aizawa.
Joint Conference on Digital Libraries (JCDL).
[arXiv]
-
Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts.
Florian Boudin, Akiko Aizawa.
Conference on Empirical Methods in Natural Language Processing (EMNLP) - Findings.
[paper, bib, arXiv, code, dataset]
-
Automatically Suggesting Diverse Example Sentences for L2 Japanese Learners Using Pre-Trained Language Models.
Enrico Benedetti, Akiko Aizawa, Florian Boudin.
Association for Computational Linguistics (ACL): Student Research Workshop.
[paper, bib, code]
-
CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions.
Léane Jourdan, Florian Boudin, Nicolas Hernandez, Richard Dufour.
Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING).
[paper, bib, arXiv, dataset]
-
A Survey of Pre-trained Language Models for Processing Scientific Text.
Xanh Ho, Anh Khoa Duong Nguyen, An Tuan Dao, Junfeng Jiang, Yuki Chida, Kaito Sugimoto, Huy Quoc To, Florian Boudin, Akiko Aizawa.
[github, arXiv]
2023
-
Text revision in Scientific Writing Assistance: A Review.
Léane Jourdan, Florian Boudin, Nicolas Hernandez, Richard Dufour.
International Workshop on Bibliometric-enhanced Information Retrieval (BIR).
[paper, arXiv]
-
CASIMIR: un Corpus d’Articles Scientifiques Intégrant les ModIfications et Révisions des auteurs.
Léane Jourdan, Florian Boudin, Nicolas Hernandez, Richard Dufour.
Atelier sur l’Analyse et Recherche de Textes Scientifiques (ARTS).
[paper, bib]
-
Classification de relation pour la génération de mots-clés absents.
Maël Houbre, Florian Boudin, Béatrice Daille.
Atelier sur l’Analyse et Recherche de Textes Scientifiques (ARTS).
[paper, bib]
-
Projet NaviTerm: navigation terminologique pour une montée en compétence rapide et personnalisée sur un domaine de recherche.
Florian Boudin, Richard Dufour, Béatrice Daille.
Atelier sur l’Analyse et Recherche de Textes Scientifiques (ARTS).
[paper, bib]
-
Analyse et indexation de textes scientifiques.
Florian Boudin.
Habilitation à Diriger les Recherches (HDR).
2022
-
A large-scale dataset for biomedical keyphrase generation.
Maël Houbre, Florian Boudin, Béatrice Daille.
International Workshop on Health Text Mining and Information Analysis (LOUHI).
[paper, bib, code, dataset]
-
Cross-lingual and Cross-domain Transfer Learning for Automatic Term Extraction from Low Resource Data.
Amir Hazem, Mérieme Bouhandi, Florian Boudin, Béatrice Daille.
Language Resources and Evaluation Conference (LREC).
[paper, bib]
-
From Fundamentals to Recent Advances: A Tutorial on Keyphrasification.
Rui Meng, Debanjan Mahata, Florian Boudin.
Half-day tutorial at the European Conference on Information Retrieval (ECIR).
[website]
-
Extraction and evaluation of formulaic expressions used in scholarly papers.
Kenichi Iwatsuki, Florian Boudin, Akiko Aizawa.
Expert Systems with Applications.
[paper]
2021
-
Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness.
Florian Boudin, Ygor Gallina.
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
[paper, bib, arXiv, code, dataset]
-
ACM-CR: A Manually Annotated Test Collection for Citation Recommendation.
Florian Boudin.
Joint Conference on Digital Libraries (JCDL).
[arXiv, dataset]
2020
-
Keyphrase Generation for Scientific Document Retrieval.
Florian Boudin, Ygor Gallina, Akiko Aizawa.
Association for Computational Linguistics (ACL).
[paper, bib, video, code]
-
Large-Scale Evaluation of Keyphrase Extraction Models.
Ygor Gallina, Florian Boudin, Béatrice Daille.
Joint Conference on Digital Libraries (JCDL).
[paper, arXiv, code, dataset]
-
The DELICES Project: Indexing Scientific Literature Through Semantic Expansion.
Florian Boudin, Béatrice Daille, Evelyne Jacquey, Jian-Yun Nie.
Joint Conference of the Information Retrieval Communities in Europe (CIRCLE).
[paper]
-
An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers.
Kenichi Iwatsuki, Florian Boudin, Akiko Aizawa.
Language Resources and Evaluation Conference (LREC).
[paper, bib, dataset]
-
TermEval 2020: TALN-LS2N System for Automatic Term Extraction.
Amir Hazem, Mérieme Bouhandi, Florian Boudin, Béatrice Daille.
6th International Workshop on Computational Terminology (CompuTerm).
[paper, bib]
2019
-
KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents.
Ygor Gallina, Florian Boudin, Béatrice Daille.
International Conference on Natural Language Generation (INLG).
[paper, bib, arXiv, dataset]
-
DeFT 2019: Auto-encodeurs, Gradient Boosting et combinaisons de modèles pour l’identification automatique de mots-clés.
Mérième Bouhandi, Florian Boudin, Ygor Gallina.
Défi Fouille de Textes (DEFT).
[paper, bib]
2018
- Unsupervised Keyphrase Extraction with Multipartite Graphs.
Florian Boudin.
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
[paper, bib, arXiv, code]
2017
-
Modélisation à base de graphe pour l’indexation en domaines de spécialité.
Adrien Bougouin, Florian Boudin, Béatrice Daille.
Recherche d’information, document et web sémantique.
[paper, bib]
-
Présentation et résultats du défi fouille de textes DEFT 2016.
Béatrice Daille, Sabine Barreaux, Adrien Bougouin, Florian Boudin, Damien Cram, Amir Hazem.
Recherche d’information, document et web sémantique.
[paper, bib]
2016
-
How Document Pre-processing affects Keyphrase Extraction Performance.
Florian Boudin, Hugo Mougard, Damien Cram.
Workshop on Noisy User-generated Text (WNUT).
[paper, bib, arXiv, code, dataset]
-
pke: an open source python-based keyphrase extraction toolkit.
Florian Boudin.
International Conference on Computational Linguistics (COLING), demonstration papers.
[paper, bib, code]
-
Keyphrase Annotation with Graph Co-Ranking.
Adrien Bougouin, Florian Boudin, Béatrice Daille.
International Conference on Computational Linguistics (COLING).
[paper, bib]
-
TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation.
Adrien Bougouin, Sabine Barreaux, Laurent Romary, Florian Boudin, Béatrice Daille.
Language Resources and Evaluation Conference (LREC).
[paper, bib, dataset]
-
Modélisation unifiée du document et de son domaine pour une indexation par termes-clés libre et contrôlée.
Adrien Bougouin, Florian Boudin, Béatrice Daille.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib]
-
Indexation d’articles scientifiques : Présentation et résultats du défi fouille de textes DEFT 2016.
Béatrice Daille, Sabine Barreaux, Florian Boudin, Adrien Bougouin, Damien Cram, Amir Hazem.
Défi Fouille de Textes (DEFT).
[paper]
-
TopicRank en domaines de spécialité : participation du LINA à DEFT 2016.
Adrien Bougouin, Florian Boudin, Béatrice Daille.
Défi Fouille de Textes (DEFT).
[paper]
2015
-
Concept-based Summarization using Integer Linear Programming: From Concept Pruning to Multiple Optimal Solutions.
Florian Boudin, Hugo Mougard, Benoit Favre.
Conference on Empirical Methods in Natural Language Processing (EMNLP).
[paper, bib]
-
Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming.
Florian Boudin.
Workshop on Novel Computational Approaches to Keyphrase Extraction.
[paper, bib]
-
LINA: Identifying Comparable Documents from Wikipedia.
Emmanuel Morin, Amir Hazem, Florian Boudin, Elizaveta Loginova-Clouet.
Eighth Workshop on Building and Using Comparable Corpora (BUCC).
[paper, bib]
2014
-
TopicRank : ordonnancement de sujets pour l’extraction automatique de termes-clés.
Adrien Bougouin, Florian Boudin.
Traitement Automatique des Langues.
[paper]
-
De quoi parle ce Tweet? Résumer Wikipédia pour contextualiser des microblogs.
Romain Deveaud, Florian Boudin.
The Information - Intelligence - Interaction (I3) Journal.
[paper]
-
Label Pre-annotation for Building Non-projective Dependency Treebanks for French.
Ophélie Lacroix, Denis Béchet, Florian Boudin.
Conference on Intelligent Text Processing and Computational Linguistics (CICLing).
[paper]
-
Influence des domaines de spécialité dans l’extraction de termes-clés.
Adrien Bougouin, Florian Boudin, Béatrice Daille.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib]
2013
-
TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction.
Adrien Bougouin, Florian Boudin, Béatrice Daille.
International Joint Conference on Natural Language Processing (IJCNLP).
[paper, bib]
-
A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction.
Florian Boudin.
International Joint Conference on Natural Language Processing (IJCNLP).
[paper, bib]
-
Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression.
Florian Boudin, Emmanuel Morin.
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
[paper, bib, dataset, code]
-
TALN Archives : une archive numérique francophone des articles de recherche en Traitement Automatique de la Langue.
Florian Boudin.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib, dataset]
-
Construction d’un large corpus écrit libre annoté morpho-syntaxiquement en français.
Nicolas Hernandez, Florian Boudin.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib]
-
Contextualisation automatique de Tweets à partir de Wikipédia.
Romain Deveaud, Florian Boudin.
Conférence en Recherche d’Information et Applications (CORIA).
[paper]
-
Effective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization.
Romain Deveaud, Florian Boudin.
INitiative for the Evaluation of XML Retrieval (INEX).
[paper]
2012
-
Using a Medical Thesaurus to Predict Query Difficulty.
Florian Boudin, Jian-Yun Nie, Martin Dawes.
European Conference on Information Retrieval (ECIR).
[paper, bib]
-
Détection et correction automatique d’erreurs d’annotation morpho-syntaxique du French TreeBank.
Florian Boudin, Nicolas Hernandez.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib]
-
LIA/LINA at the INEX 2012 Tweet Contextualization track.
Romain Deveaud, Florian Boudin.
INitiative for the Evaluation of XML Retrieval (INEX).
[paper]
-
Participation du LINA à DEFT 2012.
Florian Boudin, Amir Hazem, Nicolas Hernandez, Prajol Shrestha.
Défi Fouille de Textes (DEFT).
[paper, bib]
2011
-
A Graph-based Approach to Cross-language Multi-document Summarization.
Florian Boudin, Stéphane Huet, Juan-Manuel Torres-Moreno.
Conference on Intelligent Text Processing and Computational Linguistics (CICLing).
[paper]
-
Utilisation d’un score de qualité de traduction pour le résumé multi-document cross-lingue.
Stéphane Huet, Florian Boudin, Juan-Manuel Torres-Moreno.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib]
-
Correction de césures et enrichissement de requêtes pour la recherche de livres.
Romain Deveaud, Florian Boudin, Eric SanJuan, Patrice Bellot.
Conférence en Recherche d’Information et Applications (CORIA).
[paper]
-
LIA at INEX 2010 Book Track.
Romain Deveaud, Florian Boudin, Patrice Bellot.
INitiative for the Evaluation of XML Retrieval (INEX).
[paper, bib]
2010
-
Combining classifiers for robust PICO element detection.
Florian Boudin, Jian-Yun Nie, Joan Bartlett, Roland Grad, Pierre Pluye, Martin Dawes.
BMC Medical Informatics and Decision Making.
[paper, ris]
-
Positional Language Models for Clinical Information Retrieval.
Florian Boudin, Jian-Yun Nie, Martin Dawes.
Conference on Empirical Methods in Natural Language Processing (EMNLP).
[paper, bib, dataset]
-
Clinical Information Retrieval using Document and PICO Structure.
Florian Boudin, Jian-Yun Nie, Martin Dawes.
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
[paper, bib, dataset]
-
Improving Medical Information Retrieval with PICO Element Detection.
Florian Boudin, Lixin Shi, Jian-Yun Nie.
European Conference on Information Retrieval (ECIR).
[paper, bib]
-
Deriving a test collection for clinical information retrieval from systematic reviews.
Florian Boudin, Jian-Yun Nie, Martin Dawes.
Data and Text Mining in Biomedical Informatics (DTMBIO).
[paper, dataset]
2009
-
Résumé automatique multi-document et indépendance de la langue : une première évaluation en français.
Florian Boudin, Juan-Manuel Torres-Moreno.
Traitement Automatique des Langues Naturelles (TALN).
[paper, bib]
-
A Maximization-Minimization Approach for Update Text Summarization.
Florian Boudin, Juan-Manuel Torres-Moreno.
Current Issues in Linguistic Theory: Recent Advances in Natural Language Processing.
[paper]
2008
-
A Scalable MMR Approach to Sentence Scoring for Multi-Document Update Summarization.
Florian Boudin, Marc El-Bèze, Juan-Manuel Torres-Moreno.
International Conference on Computational Linguistics (COLING).
[paper, bib]
-
Mixing Statistical and Symbolic Approaches for Chemical Names Recognition.
Florian Boudin, Juan Torres-Moreno, Marc El-Bèze.
Conference on Intelligent Text Processing and Computational Linguistics (CICLing).
[paper]
-
An Efficient Statistical Approach for Automatic Organic Chemistry Summarization.
Florian Boudin, Juan-Manuel Torres-Moreno, Patricia Velázquez-Morales.
International Conference on Natural Language Processing (GoTAL).
[paper]
-
The LIA Update Summarization system at TAC-2008.
Florian Boudin, Marc El-Bèze, Juan-Manuel Torres-Moreno.
Text Analysis Conference (TAC).
[paper]
-
Exploration d’approches statistiques pour le résumé automatique de texte.
Florian Boudin.
Laboratoire Informatique d’Avignon – Université d’Avignon.
[PhD thesis]
2007
-
A Cosine Maximization Minimization approach for User Oriented Multi-Document Update Summarization.
Florian Boudin, Juan-Manuel Torres-Moreno.
Recent Advances in Natural Language Processing (RANLP).
[paper]
-
NEO-CORTEX: A Performant User-Oriented Multi-Document Summarization System.
Florian Boudin, Juan Torres Moreno.
Conference on Intelligent Text Processing and Computational Linguistics (CICLing).
[paper]
-
The LIA-Thales summarization system at DUC-2007.
Florian Boudin, Benoit Favre, Frederic Béchet, Marc El-Bèze, Laurent Gillard, Juan-Manuel Torres-Moreno.
Document Understanding Conference (DUC).
[paper]
2006
- The LIA-Thales summarization system at DUC-2006.
Benoit Favre, Frederic Béchet, Patrice Bellot, Florian Boudin, Marc El-Bèze, Laurent Gillard, Guy Lapalme, Juan-Manuel Torres-Moreno.
Document Understanding Conference (DUC).
[paper]