A HYBRID MODEL USING THE PRETRAINED BERT AND DEEP NEURAL NETWORKS WITH RICH FEATURE FOR EXTRACTIVE TEXT SUMMARIZATION
Keywords: Extractive summarization, BERT multilingual, CNN, Encoder-Decoder, TF-IDF feature
Deep neural networks have been applied successfully to extractive text summarization when large training datasets are available. However, when the training dataset is not large enough, these models show limitations that degrade the quality of the generated summary. In this paper, we propose an extractive summarization system based on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained BERT multilingual model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to form the input of the summarization system. Redundant sentences are removed from the output summary by the Maximal Marginal Relevance method. Our system is evaluated on both English and Vietnamese, using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results than existing works on the same datasets, which confirms that our approach can be applied effectively to summarize both English and Vietnamese texts.
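The redundancy-removal step mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes the standard greedy form of Maximal Marginal Relevance, uses plain TF-IDF bag-of-words vectors for sentence similarity instead of BERT embeddings, and takes hypothetical relevance scores in place of the outputs of the CNN and Fully Connected network. The names `tfidf_vectors`, `cosine`, `mmr_select`, and the trade-off parameter `lam` are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build a sparse TF-IDF vector (dict: term -> weight) for each sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every sentence keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(scores, vectors, k, lam=0.7):
    """Greedily pick up to k sentence indices, balancing relevance and novelty.

    scores: one relevance score per sentence (hypothetical model outputs).
    lam:    1.0 means pure relevance; lower values penalize redundancy more.
    """
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * scores[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy input: sentences 0 and 1 are near-duplicates; the scores are made up.
sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "stocks fell sharply on monday",
]
scores = [0.9, 0.85, 0.5]
vectors = tfidf_vectors(sentences)
print(mmr_select(scores, vectors, k=2, lam=0.5))  # -> [0, 2]
```

With `lam=0.5` the second pick skips the near-duplicate sentence 1 in favor of sentence 2, even though sentence 1 has the higher relevance score. Setting `lam=1.0` disables the redundancy penalty and returns the two highest-scoring sentences.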
License
1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is the result of our own work, has not been published before in its current or a substantially similar form, is not under consideration for another publication, does not contain any unlawful statements, and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials, including tables, diagrams or photographs, not owned by me/us.