A Semantic Based Gender Identifications through User Generated Contents
Keywords:
Gender identification, user profiling, semantic content, user-generated content
Abstract
User gender is crucial information for personalized services and applications in online social networks, affecting areas such as recommendation systems, advertising, and connection discovery. However, gender information is often hidden or left unspecified, which leads to inaccuracies or limitations in these applications. Meanwhile, the daily interactions of billions of users on online social networks such as Flickr produce vast amounts of user-generated content, including images, videos, and textual information. This paper addresses the challenge of identifying user gender from such content. Our approach is a semantic-based technique: a semi-automatic image tagging system labels images with partial automation, which can improve the efficiency and accuracy of tagging. We employ two classification algorithms for gender identification, Naive Bayes and Support Vector Machines (SVM), with the data represented as feature vectors. Experimental results on more than 149,700 Flickr users show an accuracy of over 84% for gender identification, suggesting that Naive Bayes and SVM classifiers operating on feature-vector representations are effective for classifying gender from user-generated content.
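As a rough illustration of the classification step, the following is a minimal sketch of a multinomial Naive Bayes classifier over bag-of-tags features. It is not the implementation used in this paper: the function names `train_naive_bayes` and `predict`, the Laplace smoothing parameter `alpha`, the placeholder tags `t1` to `t5`, and the four toy samples are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    samples: list of (tags, label) pairs, where tags is a list of strings.
    alpha:   Laplace smoothing constant (an assumed default).
    """
    label_counts = Counter()          # number of users per label
    tag_counts = defaultdict(Counter)  # tag frequencies per label
    vocab = set()                     # all tags seen in training
    for tags, label in samples:
        label_counts[label] += 1
        tag_counts[label].update(tags)
        vocab.update(tags)
    return {
        "label_counts": label_counts,
        "tag_counts": tag_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict(model, tags):
    """Return the label with the highest log-posterior for the given tags."""
    total_users = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_users in model["label_counts"].items():
        counts = model["tag_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_users / total_users)  # log prior
        for tag in tags:
            if tag in model["vocab"]:  # tags unseen in training are ignored
                score += math.log((counts[tag] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: placeholder tags, not drawn from the Flickr dataset.
toy_samples = [
    (["t1", "t2"], "female"),
    (["t1", "t3"], "female"),
    (["t4", "t5"], "male"),
    (["t4", "t2"], "male"),
]
model = train_naive_bayes(toy_samples)
```

Calling `predict(model, ["t1", "t3"])` scores each label by its log prior plus the smoothed log likelihood of each known tag. An SVM would consume the same tag counts arranged as fixed-length feature vectors.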