NYU Abu Dhabi researchers develop large-scale readability-levelled thesaurus for Arabic

ABU DHABI, 19th January, 2021 (WAM) -- Researchers from NYU Abu Dhabi (NYUAD) have developed an Online Readability Levelled Arabic Thesaurus.

The work was conducted by Associate Professor of Practice of Arabic Language, Muhamed Al Khalil, in collaboration with Professor of Computer Science, Nizar Habash, who also leads the Computational Approaches to Modelling Language (CAMeL) Laboratory.

For a user-entered Arabic word, the one-of-a-kind interface provides the possible roots, English glosses, related Arabic words and phrases, and a readability rating on a five-level scale. It also connects multiple existing Arabic resources and processing tools, enabling Arabic speakers and learners to benefit from recent advances in Arabic computational linguistics technologies.

The interface is one of the products of the NYUAD-funded project Simplification of Arabic Masterpieces for Extensive Reading (SAMER).

A collaboration between NYUAD’s Arabic Studies Programme and CAMeL Lab, SAMER seeks to create a standard for simplifying modern Arabic fiction for school-age learners and to use this standard to simplify several Arabic fiction masterpieces.

Commenting on the research paper, Al Khalil said, "Arabic is one of the six official languages of the United Nations and is the language of hundreds of millions of people in the Arab world and beyond. It is extraordinarily rich linguistically but with that comes higher complexity and a steeper learning curve. Add to this the fact that the standard form of Arabic used in education and media is not the daily form spoken by modern-day Arabs who speak a variety of its dialects."

Habash commented, "Arabic poses many difficulties for artificial intelligence, some of which are similar to those facing new learners: it has a very rich word structure, a highly ambiguous spelling system, and many dialects. The resources we developed have great potential for developing smart technologies that can assist natives and learners interested in writing and reading in Arabic."

CAMeL was established in September 2014 with a mission of research and education in artificial intelligence, focusing specifically on natural language processing, computational linguistics, and data science. The laboratory's main research areas are Arabic natural language processing, machine translation, text analytics, and dialogue systems.

The interface was presented as part of the International Conference on Computational Linguistics (COLING) 2020. The paper, entitled "A Large-Scale Levelled Readability Lexicon for Standard Arabic" and presented at the 12th Language Resources and Evaluation Conference in Marseille, France, provides further research background on the thesaurus.