A Twitter Corpus for Hindi-English Code Mixed POS Tagging.

Kushagra Singh, Indira Sen, Ponnurangam Kumaraguru

Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (2018)


Abstract

Code-mixing is a linguistic phenomenon in which multiple languages are used in the same occurrence, and it is increasingly common in multilingual societies. Code-mixed content on social media is also on the rise, prompting the need for tools that automatically understand such content. Automatic Part-of-Speech (POS) tagging is an essential step in any Natural Language Processing (NLP) pipeline, but there is a lack of annotated data to train such models. In this work, we present a language-tagged and POS-tagged dataset of code-mixed English-Hindi tweets related to five incidents in India that led to a lot of Twitter activity. Our dataset is unique in two dimensions: (i) it is larger than previous annotated datasets and (ii) it closely resembles typical real-world tweets. Additionally, we present a POS tagging model trained on this dataset as an example of how the dataset can be used. The model also shows the efficacy of our dataset in enabling the creation of code-mixed social media POS taggers.
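To make the idea of a language-tagged and POS-tagged corpus concrete, the sketch below stores each token as a (word, language, POS) triple and trains a most-frequent-tag baseline that uses the language tag as a back-off signal. This is not the model presented in the paper; the tag names (`en`, `hi`, `rest`, and the POS labels), the toy sentences, and the function names `train_baseline` and `tag` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each token is (word, language tag, POS tag).
# The tag inventories and the sentences are invented for illustration;
# they are not taken from the corpus described in the abstract.
TRAIN = [
    [("yeh", "hi", "DET"), ("movie", "en", "NOUN"), ("bahut", "hi", "ADV"),
     ("achhi", "hi", "ADJ"), ("hai", "hi", "VERB"), ("!", "rest", "X")],
    [("movie", "en", "NOUN"), ("dekhna", "hi", "VERB"), ("hai", "hi", "VERB")],
]


def train_baseline(sentences):
    """Count POS tags per (lowercased word, language) pair and per language."""
    by_word = defaultdict(Counter)
    by_lang = defaultdict(Counter)
    for sentence in sentences:
        for word, lang, pos in sentence:
            by_word[(word.lower(), lang)][pos] += 1
            by_lang[lang][pos] += 1
    return by_word, by_lang


def tag(tokens, model, default="X"):
    """Assign each (word, language) pair its most frequent training POS tag."""
    by_word, by_lang = model
    tagged = []
    for word, lang in tokens:
        key = (word.lower(), lang)
        if key in by_word:
            pos = by_word[key].most_common(1)[0][0]
        elif lang in by_lang:
            # Unseen word: back off to the most frequent tag for its language.
            pos = by_lang[lang].most_common(1)[0][0]
        else:
            # Unseen language tag: fall back to the default label.
            pos = default
        tagged.append((word, lang, pos))
    return tagged


model = train_baseline(TRAIN)
print(tag([("Movie", "en"), ("achhi", "hi"), ("thi", "hi")], model))
```

Keying the counts on the (word, language) pair rather than the word alone keeps romanized Hindi tokens from colliding with identically spelled English tokens, which is one plausible reason a code-mixed corpus would carry both kinds of annotation.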
