Natural Language Processing Characterization of Recurring Calls in Public Security Services

2020 International Conference on Computing, Networking and Communications (ICNC) (2020)

Abstract
Extracting knowledge from unstructured data silos, a legacy of old applications, is essential for improving the governance of today’s cities and fostering the creation of smart cities. Such data often consist of natural-language text. Nevertheless, inferring useful information from a linguistic-computational analysis of natural language data remains an open challenge. In this paper, we propose a clustering method for analyzing textual data that employs two unsupervised machine learning algorithms, k-means and hierarchical clustering. We assess different vector representation methods for text, different similarity metrics, and the number of clusters that best matches the data. We evaluate the methods using a real database from a public service that records security occurrences. The results show that the k-means algorithm using Euclidean distance extracts non-trivial knowledge, reaching up to 93% accuracy on a set of test samples while identifying the 12 most prevalent occurrence patterns.
Keywords
Natural Language Processing, K-means, Hierarchical Clustering, Machine Learning
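
The pipeline the abstract describes (vectorize text, then cluster with k-means under Euclidean distance) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and is not taken from the paper: the raw term-frequency representation, the toy call descriptions, the fixed choice of initial centroids, and the helper names `vectorize`, `euclidean`, and `kmeans`. The paper itself compares several vector representations and similarity metrics that the abstract does not name.

```python
import math
from collections import Counter


def vectorize(docs):
    """Build raw term-frequency vectors over a shared, sorted vocabulary.

    Assumption: whitespace tokenization and raw counts; the paper's actual
    representations are not specified in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[term]) for term in vocab])
    return vocab, vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, init_indices, max_iter=100):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance.

    init_indices picks the documents used as initial centroids, which keeps
    this sketch deterministic; real runs usually use random restarts.
    """
    centroids = [list(vectors[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical call descriptions, invented for this example.
docs = [
    "loud music noise complaint neighbor",
    "noise complaint loud party",
    "car theft parking lot",
    "stolen car theft reported",
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2, init_indices=[0, 2])
print(labels)
```

On this toy input the two noise complaints fall into one cluster and the two car thefts into the other. Choosing the number of clusters, which the abstract lists as part of the evaluation, would require repeating the run for several values of `k` and comparing a cluster-quality score.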