Capturing Variants of Transliterated Arabic Names in English Text

Date

2009-9

Type

Conference paper

Conference title

The International Symposium on Arabic Transliteration Standard Challenges and Solutions

Author(s)

Abdusalam F. Ahmad Nwesri
Nabila Al-Mabrouk S. Shinbir

Pages

160 - 173

Abstract

Transliteration is the process of representing words of one language in another using corresponding equivalent phonemes. For example, “Mohammed”, “Mohammad”, and “Muhammed” are three valid transliterations of the Arabic proper noun “محمد”. Transliteration from Arabic to English usually results in several different versions of the same Arabic name, and some names have more than 40 different versions. Finding transliterated names is a problem in most languages. In English, this problem has been studied by researchers, and many techniques have been developed to find transliterated names that refer to the same foreign name. Techniques such as string matching and phonetic matching have been used to find similar names. However, some of these techniques were designed to find similar names of English origin, not transliterated names specifically. In this paper we review current techniques used to find variants of the same name and introduce a new technique that we developed specifically to find transliterated Arabic names in English text. We developed a data set of more than 25,000 transliterated Arabic names and tested the effectiveness of the current techniques and the new technique at finding 115 names within that list. Our results show that our technique outperforms all of the other techniques tested. We also present an online system that we developed to find transliterated Arabic names on the web using our technique.
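
As an illustration of the two families of existing techniques the abstract mentions, the sketch below applies a string-matching measure (Levenshtein edit distance) and a phonetic code (classic American Soundex) to the example variants of “محمد”. It is a minimal sketch of these generic baselines only, not the new technique introduced in the paper; the function names `edit_distance` and `soundex` are chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


# Consonant groups of classic American Soundex (an English-oriented code).
_SOUNDEX_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                   ("l", "4"), ("mn", "5"), ("r", "6"))
_SOUNDEX_CODES = {ch: digit for group, digit in _SOUNDEX_GROUPS for ch in group}


def soundex(name: str) -> str:
    """Four-character Soundex code: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is emitted again
    return (result + "000")[:4]


variants = ["Mohammed", "Mohammad", "Muhammed"]
print([soundex(v) for v in variants])                    # ['M530', 'M530', 'M530']
print(edit_distance("mohammed", "mohammad"))             # 1
print(soundex("Mahmoud"))                                # M530
```

All three variants share one Soundex code and lie within one edit of each other, so both baselines group them. The last line shows the kind of limitation the abstract points to: “Mahmoud”, a different name, receives the same code, because Soundex was designed around English spelling rather than transliterated Arabic.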
