NLP for Low-Resource Settings

Natural language processing (NLP) is a field of artificial intelligence that aims to enable human-like communication with computers. Although it has had significant success, computers still struggle to comprehend many facets of language, such as pragmatics, that are difficult to characterize formally. Moreover, most of that success has been achieved for widely used languages such as English, or for other languages that have text corpora of hundreds of millions of words. These account for only about 20 of the approximately 7,000 languages in the world. The majority of human languages still need tools and resources to overcome this resource barrier so that NLP can deliver more widespread benefits. These are called low-resource languages: languages that lack large monolingual or parallel corpora and/or manually crafted linguistic resources sufficient for building statistical NLP applications.

Why Is It Important?

It might seem that a dozen or so languages are enough to get by in the world, so why bother with minor or extinct languages? However, building NLP applications for such languages can both strengthen ties across the world and help preserve its diversity.

Preservation. An obvious task for NLP is to process and document languages that do not have a …

