CIRAL


Cross-Lingual Information Retrieval for African Languages

@FIRE 2023

Call for Submissions

TRACK OVERVIEW



As information on the web continually expands across different languages, cross-lingual information retrieval (CLIR) systems, which enable users to search in one language and retrieve documents in another, are becoming increasingly important. Research in CLIR for African languages is also growing, and these methods require African CLIR test collections to adequately evaluate systems and expand research. Collections that either include some African languages or focus solely on them have been curated; however, these collections are mostly created via translation or synthetic generation and can be prone to bias and translation artifacts.

The goal of the CIRAL track is to promote the research and evaluation of CLIR for African languages. With the intent of curating a human-annotated test collection through a community shared task, the track entails retrieval between English and four African languages: Hausa, Somali, Swahili and Yoruba. Given the low-resourced nature of African languages, the track also focuses on fostering CLIR research and evaluation in low-resource settings, and hence the development of retrieval systems well suited to such settings.

PARTICIPATION


Task

Track participants are tasked with developing retrieval systems that, given a query in English, return documents in a specified African language. Retrieval is done at the passage level: queries are formulated as natural-language questions, and passages relevant to a given query are those that answer the question. More details on the training and test sets are provided in the Dataset section.
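As an illustration only (not a track requirement), one common CLIR baseline is translate-then-retrieve: translate the English question into the document language, then run monolingual retrieval such as BM25. Below is a minimal sketch using the open-source Pyserini toolkit; the index path is a hypothetical placeholder, and the translation step is assumed to have happened elsewhere.

```python
# Sketch of a translate-then-retrieve baseline. The English query is assumed
# to have already been translated into the document language (e.g., Hausa);
# BM25 retrieval is then run over a Lucene index of the passage collection.
# "indexes/ciral-hausa" is a hypothetical index path, not a track artifact.
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("indexes/ciral-hausa")
translated_query = "..."  # the English question, translated into Hausa
hits = searcher.search(translated_query, k=1000)

for rank, hit in enumerate(hits, start=1):
    print(rank, hit.docid, hit.score)
```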

Submission

Each team is required to submit run files obtained from their retrieval systems in the standard TREC format. Teams may submit 2 to 3 runs per language, with a cap of 3; if a team submits more than 3 runs for any language, the top 3 are selected based on the team's own ranking. Run files can be submitted using this form.
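For reference, each line of a TREC-format run file has six whitespace-separated columns: query ID, the literal string Q0, document (here, passage) ID, rank, score, and a run tag. The sketch below writes results in this format; the query and passage IDs are hypothetical placeholders.

```python
# Minimal sketch: write ranked results as a TREC-format run file.
# Query IDs, passage IDs, and scores below are hypothetical placeholders.
ranked_results = {
    "q-001": [("passage-42", 12.7), ("passage-7", 11.3)],
    "q-002": [("passage-99", 9.8)],
}

with open("team-run-1.txt", "w") as f:
    for qid, passages in ranked_results.items():
        for rank, (pid, score) in enumerate(passages, start=1):
            # Columns: qid  Q0  docid  rank  score  run_tag
            f.write(f"{qid} Q0 {pid} {rank} {score} team-run-1\n")
```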

Evaluation

Evaluation is done by creating pools for each query and manually judging retrieved passages for binary relevance (pooling depth k = 50). Using the resulting judgments, submitted run files are evaluated with standard retrieval metrics such as MAP, which accounts for both precision and recall. We also evaluate early precision using nDCG@10 and P@10.
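As an illustration, these metrics can be computed locally with the open-source pytrec_eval library once judgments are available; the qrels and run below are toy placeholders, not track data, and the parameterized measure names assume a recent pytrec_eval version.

```python
# Sketch: scoring a run with pytrec_eval (a Python wrapper around trec_eval).
# Judgments and scores here are toy placeholders.
import pytrec_eval

qrels = {"q-001": {"passage-42": 1, "passage-7": 0}}    # binary relevance
run = {"q-001": {"passage-42": 12.7, "passage-7": 11.3}}  # system scores

evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {"map", "ndcg_cut.10", "P.10"}
)
metrics = evaluator.evaluate(run)
for qid, scores in metrics.items():
    print(qid, scores["map"], scores["ndcg_cut_10"], scores["P_10"])
```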

Working Notes

Each team is required to submit a working note detailing their retrieval system and approach to the task. The required format for working notes is the ACM SIG template, with a maximum of 5 pages. Submissions should be made to the track's email at ciralproject23@gmail.com.

DATASET



For each language, a static collection of passages extracted from news articles is provided. The training set comprises the static collection, approximately 10 queries per language, and binary relevance judgments for each query. The test set comprises approximately 30 queries per language. The statistics of the collection are documented in the dataset repo and summarized in the table below, which will be updated as the dataset is curated.

The datasets will be made available in this Google Drive according to the release date for each set. Participants can email ciralproject23@gmail.com to request access to the folder of interest.

Language        # Train Queries   # Test Queries   # Passages
Hausa (hau)            10                30            715,355
Somali (som)           10                30          1,015,567
Swahili (swa)          10                30            981,658
Yoruba (yor)            8                27             82,095

BULLETINS


Track Timeline


22nd May 2023   Track Website Opens; Registration for Track Begins
7th Jun 2023    Training Data Released
12th Jul 2023   Test Data Released
1st Aug 2023    Run Submission Deadline
15th Aug 2023   Declaration of Results
15th Sep 2023   Working Note Submission
5th Oct 2023    Review Notifications
15th Oct 2023   Final Version of Working Note

Announcements


Coming soon…

Organizers


Mofetoluwa Adeyemi

University of Waterloo

Akintunde Oladipo

University of Waterloo

Xinyu Crystina Zhang

University of Waterloo

Jimmy Lin

University of Waterloo