On the Importance of Text Classification Pipeline Components for Practical Applications: A Case Study

semanticscholar(2020)

Abstract
The worlds of academia and industry have different priorities for machine learning models. In academia, a model’s performance is often the main focus, whereas in industrial production environments, balancing the model’s performance, its resource requirements, and its ease of deployment is often deemed more important. In this paper, we consider a real-world text classification problem, compare the specifics of different parts of natural language processing pipelines, and investigate their contribution to the final model’s performance. We also take into consideration practical aspects of the model’s use and deployment, such as model size and preprocessing time. Our case study shows that, in this particular scenario, simpler models can perform on par with more complex ones. We find this result valuable, as simpler and smaller models are normally also easier to deploy in practice, e.g., in a serverless environment. To showcase the practical usefulness of our final model, we deploy it to AWS Lambda and show that its execution time in this environment scales linearly with the length of the input text.