Abstract
Arabic is the fourth most widely spoken language in the world, and is characterised by a high rate of inflection. To cater for this, most Arabic information retrieval systems incorporate a stemming stage. Most existing Arabic stemmers are derived from English equiv- alents; however, unlike English, most affixes in Arabic are difficult to discriminate from the core word. Removing incorrectly identified affixes sometimes results in a valid but incorrect stem, and in most cases reduces retrieval precision. Conjunctions and prepositions form an interesting class of these affixes. In this work, we present novel approaches for deal- ing with these affixes. Unlike previous approaches, our approaches focus on retaining valid Arabic core words, while maintaining high retrieval performance.