A Study of the Tibetan Linguistic Picture of the World Using Computer Ontology

REVUE D'ETUDES TIBETAINES (2020)

Abstract

The research presented in this article summarizes several research projects aimed at creating a full-scale natural language processing engine based on a consistent formal model of Tibetan vocabulary, grammar, and semantics, verified by and developed on the basis of a representative, hand-tested corpus of texts. The Basic Corpus of Classical Tibetan and the Corpus of Indigenous Tibetan Grammar Treatises comprise 34,000 and 48,000 tokens, respectively. Tibetan texts are represented both in the Tibetan Unicode script and in standard Wylie romanization. These corpora are developed, annotated, and tested manually by Tibetologists, and in this sense are unique. The ultimate goal of our project is to create a formal model (a grammar and a linguistic ontology) of the Tibetan language, including morphosyntax, syntax of phrases, hyperphrase unities, and semantics, that can produce a correct morpho-syntactic, syntactic, and semantic annotation of the corpora without any further manual corrections. This study is based on the technologies and tools of the AIIRE project. AIIRE is a free open-source natural language understanding
