Bulgarian-X Language Parallel Corpus Collocation service




The Collocation service is a web service for searching collocations and computing various statistics over the Bulgarian-X Language Parallel Corpus.

The Bulgarian-X Language Parallel Corpus includes parallel corpora for 33 languages: English, German, French, Slavic and Balkan languages, as well as other European and non-European languages (28 languages are available through the web interface).

At present, the Bulgarian-X Language Parallel Corpus contains 1.9 billion tokens, making it the biggest parallel corpus of Bulgarian. The languages are not equally represented. The largest parallel corpus is the Bulgarian-English one (280.8 and 283.1 million words for Bulgarian and English respectively). There are 5 other corpora with between 100 and 200 million tokens per language, 16 with 30-52 million tokens per language, a further 7 with 1-10 million tokens, and the rest are below 1 million. The smallest corpora are the Chinese, Japanese and Icelandic ones, with fewer than 50,000 tokens per language. Each parallel subcorpus within Bul-X-Cor (the Bulgarian-X Language Parallel Corpus) mirrors the structure of BulNC (the Bulgarian National Corpus).

The Collocation service uses NoSketchEngine, a free-of-charge corpus processing system that combines Manatee and Bonito.

The Collocation service is a RESTful web service that supports complex queries over HTTP. Example: http://dcl.bas.bg/collocations/?cmd=collocations&word=нет
user: bulnc
pass: bulnc
The query returns the collocations of a given word in the NoSketchEngine format.
The service also supports additional arguments: every argument accepted by NoSketchEngine (each has a default value) and an optional language identifier. The following example restricts the statistics to Bulgarian: http://dcl.bas.bg/collocations/?cmd=collocations&word=нет&lang=bg
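
The example queries above can also be built and sent from a script. The following Python sketch is illustrative only. The endpoint and the cmd, word and lang parameters come from the examples above. The function names are hypothetical. It assumes that the user/pass pair is checked with HTTP Basic authentication and that the server expects UTF-8 percent-encoding of the query word; neither is stated here.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the example queries above.
BASE_URL = "http://dcl.bas.bg/collocations/"


def build_collocations_url(word, lang=None, **extra):
    """Return the query URL for the collocations of `word`.

    `lang` is the optional language identifier. `extra` stands in for any
    further NoSketchEngine arguments; their names are not listed here.
    """
    params = {"cmd": "collocations", "word": word}
    if lang is not None:
        params["lang"] = lang
    params.update(extra)
    # urlencode percent-encodes non-ASCII values (e.g. Cyrillic) as UTF-8.
    return BASE_URL + "?" + urlencode(params)


def fetch_collocations(word, lang=None, user="bulnc", password="bulnc"):
    """Fetch the raw response body (NoSketchEngine format).

    ASSUMPTION: the service uses HTTP Basic authentication.
    """
    request = urllib.request.Request(build_collocations_url(word, lang))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the URL; no request is sent.
    print(build_collocations_url("нет", lang="bg"))
```

Calling fetch_collocations("нет", lang="bg") would send the second example query. The response is returned as raw bytes because its exact layout depends on the NoSketchEngine output format.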
