Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

Published in arXiv preprint arXiv:2004.14218, 2020

Recommended citation: https://arxiv.org/pdf/2004.14218.pdf

Recently, fine-tuning pre-trained cross-lingual models (e.g., multilingual BERT) on downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and weakens its cross-lingual ability, which can lead to sub-optimal performance. To alleviate this issue, we leverage the idea of continual learning to preserve the original cross-lingual ability of the pre-trained model when we fine-tune it on downstream cross-lingual tasks. An experiment on the cross-lingual sentence retrieval task shows that our fine-tuning approach better preserves the cross-lingual ability of the pre-trained model. In addition, our method achieves better performance than other fine-tuning baselines on zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
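
To make the sentence retrieval evaluation mentioned in the abstract more concrete, here is a minimal sketch of how retrieval accuracy is commonly computed from sentence embeddings. It is an illustration under stated assumptions, not the paper's evaluation code: the function names `cosine` and `retrieval_accuracy` are hypothetical, the use of cosine similarity with nearest-neighbour search is assumed, and the embeddings are taken as given (for example, pooled outputs of a multilingual encoder).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieval_accuracy(src_vecs, tgt_vecs):
    """Fraction of source sentences whose nearest target sentence is their translation.

    Assumption: src_vecs[i] and tgt_vecs[i] embed a pair of parallel sentences.
    """
    if not src_vecs or len(src_vecs) != len(tgt_vecs):
        raise ValueError("need two non-empty embedding lists of equal length")
    correct = 0
    for i, src in enumerate(src_vecs):
        # Index of the target embedding most similar to this source embedding.
        best = max(range(len(tgt_vecs)), key=lambda j: cosine(src, tgt_vecs[j]))
        if best == i:
            correct += 1
    return correct / len(src_vecs)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, made up purely for illustration.
    source = [[1.0, 0.0], [0.0, 1.0]]
    target = [[0.9, 0.1], [0.2, 0.8]]
    print(retrieval_accuracy(source, target))
```

Under this kind of setup, comparing the score of the pre-trained model with the score after fine-tuning gives one way to quantify how much cross-lingual alignment was kept or lost.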