publications | Le Minh Khoi

2024

ACL

UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages

Khoi Le, Trinh Pham, and Luu Anh Tuan

Proceedings of ACL, 2024

Abs arXiv Code

In this paper, we introduce UniBridge (Cross-Lingual Transfer Learning with Optimized Embeddings and Vocabulary), a comprehensive approach developed to improve the effectiveness of Cross-Lingual Transfer Learning, particularly in languages with limited resources. Our approach tackles two essential elements of a language model: the initialization of embeddings and the optimal vocabulary size. Specifically, we propose a novel embedding initialization method that leverages both lexical and semantic alignment for a language. In addition, we present a method for systematically searching for the optimal vocabulary size, ensuring a balance between model complexity and linguistic coverage. Our experiments across multilingual datasets show that our approach greatly improves the F1-Score in several languages. UniBridge is a robust and adaptable solution for cross-lingual systems in various languages, highlighting the significance of initializing embeddings and choosing the right vocabulary size in cross-lingual environments.
AAAI

LAMPAT: Low-Rank Adaption for Multilingual Paraphrasing Using Adversarial Training

Khoi Le, Trinh Pham, Tho Quan, and Luu Anh Tuan

Proceedings of AAAI, 2024

Abs arXiv Code

Paraphrases are texts that convey the same meaning while using different words or sentence structures. It can be used as an automatic data augmentation tool for many Natural Language Processing tasks, especially when dealing with low-resource languages, where data shortage is a significant problem. To generate a paraphrase in multilingual settings, previous studies have leveraged the knowledge from the machine translation field, i.e., forming a paraphrase through zero-shot machine translation in the same language. Despite good performance on human evaluation, those methods still require parallel translation datasets, thus making them inapplicable to languages that do not have parallel corpora. To mitigate that problem, we proposed the first unsupervised multilingual paraphrasing model, LAMPAT (Low-rank Adaptation for Multilingual Paraphrasing using Adversarial Training), by which monolingual dataset is sufficient enough to generate a human-like and diverse sentence. Throughout the experiments, we found out that our method not only works well for English but can generalize on unseen languages as well.
ECCV

MAMA: A Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, and 5 more authors

Proceedings of ECCV, 2024

Abs arXiv Code Website

Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering the downstream performance across unpopular subjects. To address these problems, we propose a contrastive objective with a subtractive angular margin to regularize cross-modal representations in their effort to reach perfect similarity. Furthermore, to adapt to the non-uniform concept distribution, we propose a multi-layer perceptron (MLP)-parameterized weighting function that maps loss values to sample weights which enable dynamic adjustment of the model’s focus throughout the training. With the training guided by a small amount of unbiased meta-data and augmented by video-text data generated by large vision-language model, we improve video-language representations and achieve superior performances on commonly used video question answering and text-video retrieval datasets.
AAAI

READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling

Thong Nguyen, Xiaobao Wu, Xinshuai Dong , Khoi Le, and 4 more authors

Proceedings of AAAI, 2024

Abs arXiv Code Website

Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited training data, such full fine-tuning approach leads to costly model storage and unstable training. To overcome these shortcomings, we introduce lightweight adapters to the pre-trained model and only update them at fine-tuning time. However, existing adapters fail to capture intrinsic temporal relations among video frames or textual words. Moreover, they neglect the preservation of critical task-related information that flows from the raw video-language input into the adapter’s low-dimensional space. To address these issues, we first propose a novel REcurrent ADapter (READ) that employs recurrent computation to enable temporal modeling capability. Second, we propose Partial Video-Language Alignment (PVLA) objective via the use of partial optimal transport to maintain task-related information flowing into our READ modules. We validate our READ-PVLA framework through extensive experiments where READ-PVLA significantly outperforms all existing fine-tuning strategies on multiple low-resource temporal language grounding and video-language summarization benchmarks.

2023

IUI

A Vietnamese Spelling Correction System

Nguyen Hai Thien, Thinh Pham , Khoi Le, Manh Luong, and 8 more authors

Proceedings of Intelligent User Interfaces, 2023

Abs PDF

This paper presents a new Vietnamese spelling correction system that allows users to correct spelling errors in their text. Our system is an interactive writing assistant that integrates advanced technologies in natural language processing to (i) identify spelling errors and (ii) replace those errors with their corrected version. To the best of our knowledge, our system is the first Vietnamese spelling correction tool that interacts with the users via Web Interface, Microsoft Word and Chrome extensions to provide the best user experience. We also perform automatic and human evaluations to demonstrate the effectiveness of our system.