This page contains a list of my publications. You can also find them on my Google Scholar profile. Equal contribution is indicated by an asterisk (*) after the author's name.
2024
ViMedAQA
ViMedAQA: A Vietnamese Medical Abstractive Question-Answering Dataset and Findings of Large Language Model
Minh-Nam Tran, Phu-Vinh Nguyen, Long Nguyen, and Dien Dinh
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), Aug 2024
Question answering involves creating answers to questions. With the growth of large language models, the ability of question-answering systems has dramatically improved. However, there is a lack of Vietnamese abstractive question-answering datasets, especially in the medical domain. Therefore, this research aims to mitigate this gap by introducing ViMedAQA. This **Vi**etnamese **Med**ical **A**bstractive **Q**uestion-**A**nswering dataset covers four topics in the Vietnamese medical domain, including body parts, disease, drugs and medicine. Additionally, the empirical results on the proposed dataset examine the capability of the large language models in the Vietnamese medical domain, including reasoning, memorizing and awareness of essential information.
@inproceedings{tran-etal-2024-vimedaqa,
  title     = {{V}i{M}ed{AQA}: A {V}ietnamese Medical Abstractive Question-Answering Dataset and Findings of Large Language Model},
  author    = {Tran, Minh-Nam and Nguyen, Phu-Vinh and Nguyen, Long and Dinh, Dien},
  editor    = {Fu, Xiyan and Fleisig, Eve},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)},
  month     = aug,
  year      = {2024},
  address   = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.acl-srw.31},
  doi       = {10.18653/v1/2024.acl-srw.31},
  pages     = {356--364},
}
ViGLUE
ViGLUE: A Vietnamese General Language Understanding Benchmark and Analysis of Vietnamese Language Models
Minh-Nam Tran*, Phu-Vinh Nguyen*, Long Nguyen, and Dien Dinh
In Findings of the Association for Computational Linguistics: NAACL 2024, Jun 2024
As the number of language models has increased, various benchmarks have been suggested to assess the proficiency of the models in natural language understanding. However, there is a lack of such a benchmark in Vietnamese due to the difficulty in accessing natural language processing datasets or the scarcity of task-specific datasets. *ViGLUE*, the proposed dataset collection, is a Vietnamese General Language Understanding Evaluation benchmark developed using three methods: translating an existing benchmark, generating new corpora, and collecting available datasets. ViGLUE contains twelve tasks and encompasses over ten areas and subjects, enabling it to evaluate models comprehensively over a broad spectrum of aspects. Baseline models utilizing multilingual language models are also provided for all tasks in the proposed benchmarks. In addition, the study of the available Vietnamese large language models is conducted to explore the language models’ ability in the few-shot learning framework, leading to the exploration of the relationship between specific tasks and the number of shots.
@inproceedings{tran-etal-2024-viglue,
  title     = {{V}i{GLUE}: A {V}ietnamese General Language Understanding Benchmark and Analysis of {V}ietnamese Language Models},
  author    = {Tran, Minh-Nam and Nguyen, Phu-Vinh and Nguyen, Long and Dinh, Dien},
  editor    = {Duh, Kevin and Gomez, Helena and Bethard, Steven},
  booktitle = {Findings of the Association for Computational Linguistics: NAACL 2024},
  month     = jun,
  year      = {2024},
  address   = {Mexico City, Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.findings-naacl.261},
  doi       = {10.18653/v1/2024.findings-naacl.261},
  pages     = {4174--4189},
}
MultiView
Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach
Tuan-An To, Minh-Nam Tran, Trong-Bao Ho, Thien-Loc Ha, Quang-Tan Nguyen, Hoang-Chau Luong, Thanh-Duy Cao, and Minh-Triet Tran
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2024
The analysis of traffic patterns is crucial for enhancing safety and optimizing flow within urban cities. While urban cities possess extensive camera networks for monitoring, the raw video data often lacks the contextual detail necessary for understanding complex traffic incidents and the behaviors of road users. This paper proposes a novel methodology for generating comprehensive descriptions of traffic scenarios, combining a vision-language model (VLM) with rule-based refinements to capture pertinent pedestrian, vehicle, and environment factors. First, a captioning model generates a general description using processed video as input. Subsequently, this description is refined sequentially through three primary modules: pedestrian-aware, vehicle-aware, and context-aware, enhancing the final description. We evaluate our method on the Woven Traffic Safety dataset in Track 2 of the AI City Challenge 2024, obtaining competitive results with an S2 score of 22.6721. Code will be available at https://github.com/ToTuanAn/AICityChallenge2024_Track2
@inproceedings{To_2024_CVPR,
  author    = {To, Tuan-An and Tran, Minh-Nam and Ho, Trong-Bao and Ha, Thien-Loc and Nguyen, Quang-Tan and Luong, Hoang-Chau and Cao, Thanh-Duy and Tran, Minh-Triet},
  title     = {Multi-perspective Traffic Video Description Model with Fine-grained Refinement Approach},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = jun,
  year      = {2024},
  pages     = {7075--7084},
}