Date
2005-1
Type
Article
Journal title
String Processing and Information Retrieval. SPIRE 2005
Issue
Vol. 0 No. 3772
Author(s)
Abdusalam Alfitory Ahmad Nwesri
S.M.M. Tahaghoghi
Falk Scholer
Pages
206 - 217
Abstract
Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equivalents; however, unlike English, most affixes in Arabic are difficult to discriminate from the core word. Removing incorrectly identified affixes sometimes results in a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions form an interesting class of these affixes. In this work, we present novel approaches for dealing with these affixes. Unlike previous approaches, our approaches focus on retaining valid Arabic core words, while maintaining high retrieval performance.