Google: AI can automate text summaries

Google's artificial intelligence achieves state-of-the-art text summarization performance


Automatic text summarization is one of the areas machine learning researchers are actively working on, and a recent paper from Google reflects this trend. This is good news for workers who read large volumes of text every day: surveys have shown that such workers spend roughly 2.6 hours a day just reading information.


Correspondingly, Google Brain and a team at Imperial College London built a system called PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence), which uses Google's Transformer architecture combined with pre-training objectives tailored to text summarization. It is said to have reached state-of-the-art performance on 12 benchmark tasks, spanning science, stories, email, patents and legislation. Not only that, it also performed surprisingly well in low-resource tests where training material was scarce.
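The pre-training objective named in the acronym, extracting gap sentences, can be sketched roughly as: pick the most "important" sentences in a document, mask them out, and train the model to regenerate them from the remaining text. The toy Python sketch below illustrates the idea under loose assumptions; a crude unigram-overlap score stands in for the paper's ROUGE-based importance selection, and `gap_sentence_example` and the `<mask_1>` placeholder handling are illustrative, not the paper's actual implementation:

```python
import re

MASK = "<mask_1>"

def gap_sentence_example(text, n_gaps=1):
    """Build one (source, target) pre-training pair in the spirit of
    PEGASUS's Gap Sentence Generation: mask the sentence(s) that
    overlap most with the rest of the document, and use them as the
    generation target. Unigram overlap is a crude proxy for the
    ROUGE-based selection described in the paper."""
    sents = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    def words(s):
        return set(re.findall(r'\w+', s.lower()))

    def overlap(i):
        # How many of this sentence's words also appear elsewhere?
        rest = set().union(*(words(s) for j, s in enumerate(sents) if j != i))
        return len(words(sents[i]) & rest)

    gaps = sorted(range(len(sents)), key=overlap, reverse=True)[:n_gaps]
    src = ' '.join(MASK if i in gaps else s for i, s in enumerate(sents))
    tgt = ' '.join(sents[i] for i in sorted(gaps))
    return src, tgt

doc = ("Pegasus reached state-of-the-art results on twelve summarization tasks. "
       "The summarization tasks covered news, science, stories and patents. "
       "It also performed well when little training data was available.")
src, tgt = gap_sentence_example(doc)
```

Because the masked sentence must be reconstructed from the surrounding document, the objective resembles the downstream summarization task itself, which is the intuition the paper's name encodes.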


As the researchers point out, the goal of text summarization is to distill an input document into an accurate and concise summary.


Abstractive summarization does not simply copy and paste fragments of the input text; instead, it generates new words and condenses the important information so that the output remains fluent.
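The contrast with the simpler extractive approach can be made concrete. The sketch below is a toy extractive baseline (the function name and frequency-based scoring are illustrative assumptions, not anything from the paper): it can only ever copy sentences verbatim, whereas an abstractive system like PEGASUS generates wording that need not appear in the input at all.

```python
import re
from collections import Counter

def extractive_summary(text, k=1):
    """Toy extractive baseline: copy the k sentences whose words are
    most frequent in the document, verbatim. An abstractive model
    would instead generate new phrasing not present in the input."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(s):
        toks = re.findall(r'\w+', s.lower())
        # Average document frequency of the sentence's words.
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    best = sorted(sentences, key=score, reverse=True)
    return ' '.join(best[:k])

doc = ("Pegasus reached state-of-the-art results on twelve summarization tasks. "
       "The summarization tasks covered news, science, stories and patents. "
       "It also performed well when little training data was available.")
summary = extractive_summary(doc, k=1)
```

Note the defining limitation: every sentence the extractive baseline emits is a substring of the original document.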


The Transformer is a neural architecture introduced by researchers at Google Brain, Google's artificial intelligence research unit.


It extracts features and learns to make predictions the same way all deep neural networks do: neurons arranged in interconnected layers pass along signals from the input data and adjust the weight of each connection.


But the Transformer architecture is unique in that every output element is connected to every input element, and the weights between those connections are calculated dynamically.
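The dynamic weighting described above is the attention mechanism at the heart of the Transformer. A minimal NumPy sketch of scaled dot-product attention (variable names and the random toy inputs are my own; this is a simplified single-head version, without the multi-head projections a full Transformer uses):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every output position attends to
    every input position, with connection weights computed dynamically
    from the data via a softmax over pairwise affinities."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

# 3 output positions attending over 4 input positions, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
```

Each row of `w` sums to 1, so every output is a data-dependent weighted average of the inputs, which is exactly the "dynamically calculated weights" the article refers to.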


In testing, the research team selected the best-performing PEGASUS model, which contains 568 million parameters. It was trained on two corpora: one consisting of 750GB of text extracted from 350 million web pages, and another covering 1.5 billion news articles, totaling 3.8TB. The researchers said that in the latter case they seeded the web crawlers with whitelisted domains, so the corpus covers content of uneven quality.


According to the researchers, PEGASUS's summaries are of excellent linguistic quality, with high fluency and coherence. Moreover, in low-resource settings, even with only 100 example articles, it produced summaries of quality comparable to models trained on full datasets of 20,000 to 200,000 articles.


Source: NetEase Smart, translated by Google Translate

Statement: this article is reprinted from authoritative news media for the purpose of conveying more information and academic exchange. It is not used for commercial purposes and does not imply agreement with its views or confirmation of its descriptions. The content of this article is for reference only. If it infringes the rights and interests of any third party, please contact us and we will deal with it as soon as possible.
