Japanese text extractor

12/31/2023

To get the required information from a huge number of texts quickly and effectively, people usually rely on keywords. Keywords are the core words that cover the main idea of a text, and with their help one can quickly grasp its topic. For example, the keywords provided in a paper give a rough picture of its main content; the keywords attached to the news of a certain period let one roughly understand and judge what happened in that period; and in information retrieval, keywords make it possible to quickly search for and find papers, web pages, and other documents. In short, keywords can greatly improve the efficiency with which people acquire and process textual information.

However, while some texts provide keywords (e.g., academic papers), most do not, which hinders the effective acquisition and processing of textual information. One way to extract keywords from texts is manual annotation; however, this is costly and inefficient, and it can hardly meet the processing needs of massive text collections. Another way is automatic keyword extraction, i.e., the computer automatically extracts the corresponding keywords from the text according to a certain method. Thanks to its low cost and high efficiency, automatic keyword extraction has become an important technology for helping people obtain the text information they need quickly. With the rapid development of Internet technology, the rise of big data, and the dramatic increase in the volume of text, society urgently needs automatic keyword extraction to improve the efficiency of access to text information. Therefore, automatic keyword extraction has become a major challenge that needs to be solved today.
With the development of information technology and the popularity of Internet applications, the amount of text in various fields is growing rapidly. In this paper, we conduct an in-depth study of Japanese keyword extraction from news reports. Word sets obtained by preprocessing external computer documents are trained into word vectors using the Skip-gram model in the deep learning tool Word2Vec, and the cosine distance between word vectors is calculated. The sliding window in TextRank is designed to connect internal document information and improve in-text semantic coherence. The main idea is to use not only the statistical and structural features of words but also the semantic features extracted through word-embedding techniques, i.e., multifeature fusion, to obtain the importance weight of each word and the attraction weights between words, and then to iteratively calculate the final weight of each word through the graph model algorithm to determine the extracted keywords. To verify the performance of the algorithm, extensive simulation experiments were conducted on three different types of datasets. The experimental results show that the proposed keyword extraction algorithm improves performance by a maximum of 6.45% and 20.36% compared with the existing word frequency statistics and graph model methods, respectively, and that MF-Rank achieves a maximum performance improvement of 1.76% compared with PW-TF.
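
To make the graph-based step more concrete, here is a minimal Python sketch of a weighted TextRank over a sliding window, with edge weights adjusted by the cosine similarity of word vectors. It is only an illustration under stated assumptions, not the MF-Rank algorithm itself: the text does not give the fusion formula, so the rule "each co-occurrence adds 1 plus the clipped cosine similarity" is hypothetical, as are the function names `cosine_similarity`, `build_graph`, and `rank_words`. The input is assumed to be already tokenized (Japanese text would first need a morphological analyzer), and the small hand-written vectors stand in for embeddings that would normally come from a trained Skip-gram model.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(tokens, vectors, window=3):
    """Link each word to the next `window - 1` tokens and accumulate edge weights.

    Assumption (not from the source): every co-occurrence adds
    1 + max(cosine similarity, 0) to the edge in both directions.
    """
    weights = defaultdict(float)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other == word:
                continue  # no self-loops
            u, v = vectors.get(word), vectors.get(other)
            # Words without a vector contribute co-occurrence only.
            sim = max(cosine_similarity(u, v), 0.0) if u and v else 0.0
            weights[(word, other)] += 1.0 + sim
            weights[(other, word)] += 1.0 + sim
    return weights


def rank_words(tokens, vectors, window=3, damping=0.85, iterations=50):
    """Weighted TextRank: iterate scores over the co-occurrence graph."""
    weights = build_graph(tokens, vectors, window)
    nodes = sorted(set(tokens))
    out_sum = defaultdict(float)   # total outgoing weight per node
    incoming = defaultdict(list)   # (source, weight) pairs per target node
    for (src, dst), w in weights.items():
        out_sum[src] += w
        incoming[dst].append((src, w))
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            flow = sum(w / out_sum[src] * scores[src] for src, w in incoming[n])
            updated[n] = (1.0 - damping) + damping * flow
        scores = updated
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy, pre-tokenized input with made-up 2-D vectors (illustrative only).
    tokens = ["経済", "政策", "政府", "経済", "成長", "政策", "発表"]
    vectors = {
        "経済": [0.9, 0.1],
        "政策": [0.8, 0.3],
        "政府": [0.6, 0.6],
        "成長": [0.9, 0.2],
        "発表": [0.1, 0.9],
    }
    for word, score in rank_words(tokens, vectors):
        print(word, round(score, 3))
```

The top-ranked words would be taken as candidate keywords. In a fuller version, statistical features such as term frequency or position could be folded into the initial node scores; that choice is left open here because the text does not specify it.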
