Text sentiment classification based on bidirectional pre-training language model
MA Li; WANG Lulin
Abstract:
To address the problems that word embedding methods such as Word2vec and GloVe cannot learn the polysemy of the same word in different contexts, and that unidirectional language models have weak feature-fusion ability, a text sentiment classification method based on a bidirectional pre-trained language model is proposed. A bidirectional Transformer language model is first trained without supervision on a large general-purpose corpus to obtain a pre-trained language model. A softmax layer is then added on top of the pre-trained model's output, the model is fine-tuned on the task corpus for text sentiment classification, and the fine-tuned model is used to classify sentiment. Experimental results show that the method improves classification accuracy by 1.8% on SST-2 and 1.4% on Yelp14, effectively improving the accuracy of sentiment classification.
Keywords: text sentiment classification; polysemy; feature fusion; bidirectional language model; fine-tuning
Foundation: National Natural Science Foundation of China (61373116); Natural Science Basic Research Plan of Shaanxi Province (2016JM6085)
Authors: MA Li; WANG Lulin
DOI: 10.13682/j.issn.2095-6533.2020.05.014
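
The method in the abstract reduces to two steps: unsupervised pre-training of a bidirectional Transformer language model, then fine-tuning with a softmax classification layer on the sentiment task. The sketch below illustrates only the second step, under stated assumptions: it loads BERT [15] through the Hugging Face transformers library as the pre-trained bidirectional model (the paper does not specify its tooling), and the example sentences, learning rate, and model checkpoint are all hypothetical stand-ins rather than the authors' setup.

```python
# Minimal sketch (not the authors' code) of the fine-tuning step the
# abstract describes: take a pre-trained bidirectional Transformer
# (BERT assumed here, via Hugging Face transformers), put a softmax
# classification head on its output, and fine-tune on labeled text.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# BertForSequenceClassification adds a randomly initialized linear
# layer on the pooled [CLS] output; softmax over its logits gives the
# class probabilities referred to in the abstract.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # binary sentiment: negative / positive
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Hypothetical mini-batch; a real run would iterate over a task
# corpus such as SST-2 or Yelp reviews.
texts = ["a gorgeous, witty, seductive movie", "a dull, lifeless mess"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # cross-entropy loss over the head
outputs.loss.backward()
optimizer.step()

# At inference time, softmax turns logits into sentiment probabilities.
probs = torch.softmax(outputs.logits.detach(), dim=-1)
print(probs)
```

In this recipe the new softmax head and the pre-trained encoder weights are updated jointly, which is the standard fine-tuning scheme the abstract describes for adapting the general-purpose language model to the sentiment task.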
References:
- [1] WANG W,SUN Y X,QI Q J,et al.Text sentiment classification model based on BiGRU-attention neural network[J].Application Research of Computers,2019,36(12):3558-3564.DOI:10.19734/j.issn.1001-3695.2018.07.0413.
- [2] MA L,DING W,LI P,et al.Research on subjective and objective classification based on sentiment features[J].Journal of Xi'an University of Posts and Telecommunications,2017,22(4):101-104.DOI:10.13682/j.issn.2095-6533.2017.04.019.
- [3] WANG Y,HE Y M,ZOU H,et al.WordNG-Vec:A word vector model applied to CNN text classification[J].Journal of Chinese Computer Systems,2019,40(3):499-502.
- [4] HINTON G E.Learning distributed representations of concepts[C]//Proceedings of the 8th Annual Conference of the Cognitive Science Society,Hillsdale:Lawrence Erlbaum,1986:1-12.
- [5] YU K R,FU Y B,DONG Q W.Research progress of distributed word vectors based on neural network language models[J].Journal of East China Normal University (Natural Science),2017(5):52-65.DOI:10.3969/j.issn.1000-5641.2017.05.006.
- [6] LIANG H X,ZHANG L D,JIA R.Sentiment classification of e-commerce reviews based on a BLSTM network with attention mechanism[J].Journal of Xi'an University of Posts and Telecommunications,2019,24(5):74-80.DOI:10.13682/j.issn.2095-6533.2019.05.013.
- [7] BENGIO Y,DUCHARME R,VINCENT P,et al.A neural probabilistic language model[J].Journal of Machine Learning Research,2003,3:1137-1155.
- [8] MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Proceedings of 27th Annual Conference on Neural Information Processing Systems.Red Hook:Curran Associates,2013:3111-3119.
- [9] MIKOLOV T,CHEN K,CORRADO G,et al.Efficient estimation of word representations in vector space[EB/OL].[2020-01-15].http://arxiv.org/abs/1301.3781.
- [10] PENNINGTON J,SOCHER R,MANNING C D.GloVe:Global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Stroudsburg:ACL,2014:1532-1543.DOI:10.3115/v1/D14-1162.
- [11] WIETING J,BANSAL M,GIMPEL K,et al.Charagram:Embedding words and sentences via character n-grams[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.Stroudsburg:ACL,2016:1504-1515.DOI:10.18653/v1/D16-1157.
- [12] JOULIN A,GRAVE E,BOJANOWSKI P,et al.Bag of tricks for efficient text classification[C]//Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics.Valencia:ACL,2017:427-431.DOI:10.18653/v1/E17-2068.
- [13] PETERS M E,NEUMANN M,IYYER M,et al.Deep contextualized word representations[EB/OL].[2020-01-15].http://arxiv.org/abs/1802.05365.
- [14] RADFORD A,NARASIMHAN K,SALIMANS T,et al.Improving language understanding by generative pre-training[EB/OL].[2020-01-15].https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
- [15] DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[EB/OL].[2020-01-15].http://arxiv.org/abs/1810.04805.
- [16] GULRAJANI I,AHMED F,ARJOVSKY M,et al.Improved training of Wasserstein GANs[EB/OL].[2020-01-15].http://arxiv.org/abs/1704.00028.
- [17] SOCHER R,PERELYGIN A,WU J,et al.Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.Stroudsburg:ACL,2013:1631-1642.