Abstract
With the pervasive use of social media sites, an extraordinary amount of data has been generated in different data types such as text and images. Combining image features with text information annotated by users reveals interesting properties for social user mining, and serves as a powerful way of discovering unknown information about users. However, little research has been reported on combining image and text data for social user mining. Progress in data mining techniques now makes it possible to integrate different data types for effective mining of social media data. In this study, we propose a novel approach to classifying the gender of a user by integrating multiple types of features. We utilize not only text information, i.e., tags and descriptions, but also images posted by a user, through a semantics-based data fusion technique. Unlike previous approaches that merged multiple types of features in a content-based manner, our approach is based on image semantics obtained through a semi-automatic image tagging system. For the classifier, we employ the Naive Bayes and SVM algorithms, where the integrated data are represented as feature vectors. We perform experiments on our data set, and the results show over 80% accuracy for gender classification, outperforming the content-based approach.