Publications

2023

  1. arXiv
    Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
    Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Samuel Cahyawijaya, Holy Lovenia, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Long Phan, Yin Lin Tan, and Alham Fikri Aji
    2023

2022

  1. arXiv
    The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
    Genta Indra Winata, Alham Fikri Aji, Zheng-Xin Yong, and Thamar Solorio
    2022
  2. arXiv
    NusaCrowd: Open Source Initiative for Indonesian NLP Resources
    Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, and others
    arXiv preprint arXiv:2212.09648 2022
  3. arXiv
    BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
    Zheng-Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, and others
    arXiv preprint arXiv:2212.09535 2022
  4. arXiv
    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
    Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, and others
    arXiv preprint arXiv:2211.05100 2022
  5. AACL
    Cross-lingual Few-Shot Learning on Unseen Languages
    Genta Winata, Shijie Wu, Mayank Kulkarni, Thamar Solorio, and Daniel Preoţiuc-Pietro
    In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing 2022
  6. SumEval
    IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages
    Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata, Pascale Fung, and Ayu Purwarianti
    In Proceedings of the First Workshop on Scaling Up Multilingual Evaluation 2022
  7. arXiv
    Transfer Learning Application of Self-supervised Learning in ARPES
    Sandy Adhitia Ekahana, Genta Indra Winata, Gabriel Aeppli, Radovic Milan, and Ming Shi
    arXiv preprint arXiv:2208.10893 2022
  8. arXiv
    NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages
    Samuel Cahyawijaya, Alham Fikri Aji, Holy Lovenia, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Fajri Koto, David Moeljadi, Karissa Vincentio, Ade Romadhony, and others
    arXiv preprint arXiv:2207.10524 2022
  9. EMNLP Demo
    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
    Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, and others
    arXiv preprint arXiv:2206.11249 2022
  10. Accepted at TMLR
    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
    Aarohi Srivastava, et al. (443 authors)
    2022
  11. Accepted at EACL
    NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
    Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, and Sebastian Ruder
    2022
  12. DialDoc
    Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters
    Yan Xu, Etsuko Ishii, Zihan Liu, Genta Indra Winata, Dan Su, Andrea Madotto, and Pascale Fung
    Accepted at DialDoc 2022
  13. ACL
    One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
    Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, and others
    Accepted at ACL 2022
  14. LREC
    CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
    Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, and others
    Accepted at LREC 2022
  15. LREC
    ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
    Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J Barezi, and others
    Accepted at LREC 2022

2021

  1. arXiv
    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
    Kaustubh D Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Srivastava, Samson Tan, and others
    arXiv preprint arXiv:2112.02721 2021
  2. arXiv
    Few-Shot Bot: Prompt-Based Learning for Dialogue Systems
    Andrea Madotto, Zhaojiang Lin, Genta Indra Winata, and Pascale Fung
    arXiv preprint arXiv:2110.08118 2021
  3. ICAICTA
    A Comparative Study on Language Models for Task-Oriented Dialogue Systems
    Vinsen Marselino Andreas, Genta Indra Winata, and Ayu Purwarianti
    In 2021 8th International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA) 2021
  4. MRL
    Language Models are Few-shot Multilingual Learners
    Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason Yosinski, and Pascale Fung
    arXiv preprint arXiv:2109.07684 2021
  5. arXiv
    Greenformer: Factorization toolkit for efficient deep neural networks
    Samuel Cahyawijaya, Genta Indra Winata, Holy Lovenia, Bryan Wilie, Wenliang Dai, Etsuko Ishii, and Pascale Fung
    arXiv preprint arXiv:2109.06762 2021
  6. RepL4NLP
    Preserving Cross-Linguality of Pre-trained Models via Continual Learning
    Zihan Liu, Genta Indra Winata, Andrea Madotto, and Pascale Fung
    In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021) 2021
  7. DialDoc21
    CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System
    Etsuko Ishii, Yan Xu, Genta Indra Winata, Zhaojiang Lin, Andrea Madotto, Zihan Liu, Peng Xu, and Pascale Fung
    DialDoc21 2021
  8. NeurIPS
    BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling
    Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, and Pascale Fung
    arXiv preprint arXiv:2106.02787 2021
  9. Interspeech
    Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition
    Genta Indra Winata, Guangsen Wang, Caiming Xiong, and Steven Hoi
    INTERSPEECH 2021
  10. SIGDIAL
    ERICA: An Empathetic Android Companion for Covid-19 Quarantine
    Etsuko Ishii, Genta Indra Winata, Samuel Cahyawijaya, Divesh Lala, Tatsuya Kawahara, and Pascale Fung
    SIGDIAL 2021
  11. RepL4NLP
    Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning
    Zihan Liu, Genta Indra Winata, Andrea Madotto, and Pascale Fung
    RepL4NLP 2021
  12. RepL4NLP
    X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing
    Zihan Liu, Genta Indra Winata, Peng Xu, and Pascale Fung
    RepL4NLP 2021
  13. ACL-IJCNLP Findings
    Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation
    Zihan Liu, Genta Indra Winata, and Pascale Fung
    ACL-IJCNLP Findings 2021
  14. CALCS
    Are Multilingual Models Effective in Code-Switching?
    Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, and Pascale Fung
    In Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching 2021
  15. AAAI
    On the Importance of Word Order Information in Cross-lingual Sequence Labeling
    Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Zhaojiang Lin, and Pascale Fung
    In Proceedings of the AAAI Conference on Artificial Intelligence 2021
  16. EMNLP
    IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation
    Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Khodra, and others
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021
  17. arXiv
    Nora: The Well-Being Coach
    Genta Indra Winata, Holy Lovenia, Etsuko Ishii, Farhad Bin Siddique, Yongsheng Yang, and Pascale Fung
    arXiv preprint arXiv:2106.00410 2021

2020

  1. EMNLP
    Cross-lingual Spoken Language Understanding with Regularized Representation Alignment
    Zihan Liu, Genta Indra Winata, Peng Xu, Zhaojiang Lin, and Pascale Fung
    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
  2. EMNLP
    MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
    Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, and Pascale Fung
    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
  3. EMNLP-Findings
    Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
    Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, and Pascale Fung
    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings 2020
  4. AACL-IJCNLP
    IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding
    Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, and others
    In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing 2020
  5. arXiv
    Emograph: Capturing emotion correlations using graph networks
    Peng Xu, Zihan Liu, Genta Indra Winata, Zhaojiang Lin, and Pascale Fung
    arXiv preprint arXiv:2008.09378 2020
  6. ACL
    Meta-Transfer Learning for Code-Switched Speech Recognition
    Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, and Pascale Fung
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
  7. ACL
    Coach: A Coarse-to-Fine Approach for Cross-domain Slot Filling
    Zihan Liu, Genta Indra Winata, Peng Xu, and Pascale Fung
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
  8. arXiv
    Variational Transformers for Diverse Response Generation
    Zhaojiang Lin, Genta Indra Winata, Peng Xu, Zihan Liu, and Pascale Fung
    arXiv preprint arXiv:2003.12738 2020
  9. ConvAI
    XPersona: Evaluating Multilingual Personalized Chatbot
    Zhaojiang Lin, Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Yejin Bang, Etsuko Ishii, and Pascale Fung
    arXiv preprint arXiv:2003.07568 2020
  10. Interspeech
    Learning Fast Adaptation on Cross-Accented Speech Recognition
    Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, and Pascale Fung
    Proc. Interspeech 2020
  11. RepL4NLP
    Zero-Resource Cross-Domain Named Entity Recognition
    Zihan Liu, Genta Indra Winata, and Pascale Fung
    In Proceedings of the 5th Workshop on Representation Learning for NLP 2020
  12. AAAI
    Attention-informed mixed-language training for zero-shot cross-lingual task-oriented dialogue systems
    Zihan Liu, Genta Indra Winata, Zhaojiang Lin, Peng Xu, and Pascale Fung
    In Proceedings of the AAAI Conference on Artificial Intelligence 2020
  13. ICASSP
    Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer
    Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, and Pascale Fung
    In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
  14. AAAI
    CAiRE: An End-to-End Empathetic Chatbot
    Zhaojiang Lin, Peng Xu, Genta Indra Winata, Farhad Bin Siddique, Zihan Liu, Jamin Shin, and Pascale Fung
    In Proceedings of the AAAI Conference on Artificial Intelligence 2020

2019

  1. EMNLP-IJCNLP
    Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables
    Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, and Pascale Fung
    In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2019
  2. EMNLP-IJCNLP
    Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition
    Genta Indra Winata, Zhaojiang Lin, Jamin Shin, Zihan Liu, and Pascale Fung
    In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2019
  3. MRQA
    Generalizing Question Answering System with Pre-trained Language Model Fine-tuning
    Dan Su, Yan Xu, Genta Indra Winata, Peng Xu, Hyeondey Kim, Zihan Liu, and Pascale Fung
    In EMNLP 2019 MRQA Workshop 2019
  4. CoNLL
    Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences
    Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung
    In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
  5. PACLIC
    On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression
    Genta Indra Winata, Andrea Madotto, Jamin Shin, Elham J Barezi, and Pascale Fung
    In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation 2019
  6. WMT
    Incorporating Word and Subword Units in Unsupervised Machine Translation Using Language Model Rescoring
    Zihan Liu, Yan Xu, Genta Indra Winata, and Pascale Fung
    In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
  7. RepL4NLP
    Learning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition
    Genta Indra Winata, Zhaojiang Lin, and Pascale Fung
    In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019) 2019
  8. SemEval
    CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification
    Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Jamin Shin, Yan Xu, Peng Xu, and Pascale Fung
    In Proceedings of the 13th International Workshop on Semantic Evaluation 2019
  9. ICASSP
    Learning comment generation by leveraging user-generated data
    Zhaojiang Lin, Genta Indra Winata, and Pascale Fung
    In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
  10. FinNLP
    Learning to learn sales prediction with social media sentiment
    Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Zihan Liu, Yan Xu, Cong Gao, and Pascale Fung
    In Proceedings of the First Workshop on Financial Technology and Natural Language Processing 2019

2018

  1. arXiv
    Towards end-to-end automatic code-switching speech recognition
    Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung
    arXiv preprint arXiv:1810.12620 2018
  2. arXiv
    Learn to code-switch: Data augmentation using copy mechanism on language modeling
    Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung
    arXiv preprint arXiv:1810.10254 2018
  3. CALCS
    Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning
    Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, and Pascale Fung
    In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching 2018
  4. ICASSP
    End-to-End Dynamic Query Memory Network for Entity-Value Independent Task-oriented Dialog
    Chien-Sheng Wu, Andrea Madotto, Genta Indra Winata, and Pascale Fung
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
  5. ICASSP
    Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision
    Genta Indra Winata, Onno Pepijn Kampman, and Pascale Fung
    In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
  6. CALCS
    Bilingual Character Representation for Efficiently Addressing Out-of-Vocabulary Words in Code-Switching Named Entity Recognition
    Genta Indra Winata, Chien-Sheng Wu, Andrea Madotto, and Pascale Fung
    In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching 2018

2017

  1. DSTC6
    End-to-end recurrent entity network for entity-value independent goal-oriented dialog learning
    Chien-Sheng Wu, Andrea Madotto, Genta Winata, and Pascale Fung
    In Dialog System Technology Challenges Workshop, DSTC6 2017
  2. Interspeech
    Nora the Empathetic Psychologist
    Genta Indra Winata, Onno Kampman, Yang Yang, Anik Dey, and Pascale Fung
    Proc. Interspeech 2017

2015

  1. ICEEI
    Handling imbalanced dataset in multi-label text categorization using Bagging and Adaptive Boosting
    Genta Indra Winata and Masayu Leylia Khodra
    In 2015 International Conference on Electrical Engineering and Informatics (ICEEI) 2015