Academia Sinica Balanced Corpus

1. The Sinica Corpus

The Academia Sinica Balanced Corpus (Sinica Corpus) is the first proportionally sampled Chinese corpus with part-of-speech tagging. The corpus (Sinica 1.0), then two million words in size, was compiled and opened to the research community through direct license in 1995 (Huang et al. 1995). After 10 years of further development, it was upgraded to Sinica 5.0, with ten million words, in 2005. Its online web service is available at http://asbc.iis.sinica.edu.tw. The corpus can also be a…

Cite this page
Keh-Jiann CHEN and Chu-Ren HUANG, “Academia Sinica Balanced Corpus”, in: Encyclopedia of Chinese Language and Linguistics, General Editor Rint Sybesma. Consulted online on 27 March 2017 <http://dx.doi.org/10.1163/2210-7363_ecll_COM_000191>
First published online: 2015


