Association for Vietnamese Language and Speech Processing

A chapter of VAIP - Vietnam Association for Information Processing

VLSP 2023 Invited Talk: An overview of foundation models for Vietnamese language processing

Dat Quoc Nguyen (Ph.D.) is a Senior Research Scientist and the Head of the Natural Language Processing department at VinAI Research, Vietnam. He was an Honorary Fellow in the School of Computing and Information Systems at the University of Melbourne, Australia, where he had previously been a Research Fellow. Before that, he received his Ph.D. in Computer Science from Macquarie University, Australia. He has authored 70+ scientific papers covering core NLP problems, ML methods for NLP, and their applications to low-resource languages and specific domains, achieving an h-index of 32 with over 5000 citations (according to Google Scholar). He has released many ML/NLP toolkits and datasets that are widely used in both academia and industry, and has created large language models and other foundation models, including PhoGPT, PhoBERT, BARTpho, XPhoneBERT and BERTweet, with millions of downloads.

An overview of foundation models for Vietnamese language processing

Abstract. In this talk, I will provide a brief overview of foundation models for Vietnamese language processing, including encoder-only, decoder-only, and encoder-decoder architectures. I will then delve into the details of a 7.5B-parameter generative model series named PhoGPT for Vietnamese, which comprises the base pre-trained monolingual model PhoGPT-7B5 and its instruction-following variant, PhoGPT-7B5-Instruct.

Sponsors and Partners

VinBigData, VinIF, Aimesoft, bee, Dagoras, Zalo, VTCC, VCCorp, IOIT, HUS, USTH, UET, TLU, UIT, INT2, JAIST, Vietlex