Stemming Arabic Conjunctions and Prepositions

Date

2005-1

Type

Article

Journal title

String Processing and Information Retrieval. SPIRE 2005

Issue

Vol. 0 No. 3772

Author(s)

Abdusalam Alfitory Ahmad Nwesri
S.M.M. Tahaghoghi
Falk Scholer

Pages

206 - 217

Abstract

Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English, most affixes in Arabic are difficult to discriminate from the core word. Removing incorrectly identified affixes sometimes results in a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions form an interesting class of these affixes. In this work, we present novel approaches for dealing with these affixes. Unlike previous approaches, our approaches focus on retaining valid Arabic core words, while maintaining high retrieval performance.
