بارق: المكنز العربي المتوازن لتقييم الانقرائية

BAREC: The Balanced Arabic Readability Evaluation Corpus

About نبذة

The overarching objective of the BAREC project is to develop a comprehensive reference resource to facilitate the study and evaluation of Arabic readability across the Arab world. The project is aligned with the recommendations set forth in the Arabic language curriculum research that the Abu Dhabi Arabic Language Center is currently conducting. BAREC will adopt an evidence-based approach and generate practical resources and tools to support and enhance the use of the Arabic language. To this end, we aim to compile a corpus of 10 million words that encompasses diverse genres, topics, and countries of origin, with a particular focus on readability levels. Portions of this corpus will be manually annotated to mark vocabulary and syntactic complexity. Furthermore, we will build a comprehensive lexicon annotated for readability levels. These annotations will serve as the basis for developing artificial intelligence (AI) tools that automatically annotate the remainder of the corpus. We will also design additional AI tools to help content creators assess the readability levels of their materials for specific target audiences.

Project start date: September 2023

إن الهدف الأساسي من مشروع «بارق» هو تطوير مورد مرجعي شامل من أجل تسهيل دراسة وتقييم إمكانية القراءة باللغة العربية في جميع أنحاء العالم العربي. ويأتي المقترح هذا متوافقًا مع التوصيات الواردة في أبحاث مناهج اللغة العربية التي يجريها مركز أبوظبي للغة العربية حاليًا. يعتمد مشروع «بارق» نهجًا قائمًا على الأدلة ويوفر موارد وأدوات عملية بغية دعم وتعزيز استخدام اللغة العربية. وتحقيقًا لهذه الغاية، نهدف إلى تجميع ذخيرة لغوية مكونة من 10 ملايين كلمة تشمل طيفًا واسعًا من الأنواع والموضوعات وبلدان المصدر، مع تركيز خاص على مستويات إمكانية القراءة. وستخضع أجزاء من هذه الذخيرة اللغوية لعملية إضافة توسيمات إلى المفردات والتراكيب المعقّدة. علاوة على ذلك، سنبني معجمًا شاملًا تضاف إليه تعليقات توضيحية لأغراض تخص مستويات إمكانية القراءة. وستشكّل هذه التوسيمات أساسًا لتطوير أدوات الذكاء الاصطناعي التي ستعمل على إضافة التوسيمات تلقائيًا إلى باقي الذخيرة اللغوية. كما سنصمم أدوات ذكاء اصطناعي إضافية لمساعدة مطوري المحتوى في تقييم مستويات انقرائية موادهم.

تاريخ بداية المشروع: سبتمبر ٢٠٢٣

Team فريق العمل

The BAREC project will be led by a team of researchers and experts from New York University Abu Dhabi (NYUAD), Zayed University, and the Abu Dhabi Arabic Language Center. The Principal Investigator is Prof. Nizar Habash, NYUAD Professor of Computer Science and Director of the Computational Approaches to Modeling Language (CAMeL) Lab, a leading research group on Arabic Artificial Intelligence. Prof. Hanada Taha, Director of the ZAI Centre at Zayed University, will work closely on the development of the project as the Co-Principal Investigator.

يقود مشروع «بارق» فريق من الباحثين والخبراء من كل من جامعة نيويورك أبوظبي وجامعة زايد ومركز أبوظبي للغة العربية. الباحث الرئيسي هو البروفيسور نزار حبش، أستاذ علوم الحاسوب في جامعة نيويورك أبوظبي ومدير مختبر الأساليب الحاسوبية لنمذجة اللغة (CAMeL Lab مختبر «كامل»)، وهو مختبر بحثي رائد في مجال الذكاء الاصطناعي للعربية. وستعمل البروفيسورة هنادا طه، مديرة مركز ZAI بجامعة زايد، بشكل وثيق في تطوير المشروع بصفتها الباحث الرئيسي المشارك.

Lectures محاضرات

RLRL2023-BAREC-5Sep2023.pdf

BAREC-2023-Intro-Arabic.pdf