Analyzing Linguistic Errors in the Writings of Non-Native Swahili Learners Using Data Science Techniques

Badredden Mohamed Salem

Date

2026-5

Type

Conference paper

Author(s)

Badredden Mohamed Salem

Pages

667 - 672

Abstract

This research applies data science techniques to the analysis of linguistic errors in the writing of Libyan learners of Swahili. It aims to identify recurring patterns in writing performance and to provide data-driven insights for improving language teaching, placing the work at the intersection of applied linguistics and modern technology. A sample of Swahili texts was collected from students at the University of Tripoli who study Swahili as a foreign language. The errors in these texts were classified into grammatical, morphological, lexical, and orthographic categories and analyzed using Python, NLTK, and Pandas. Interactive dashboards were developed to monitor error frequencies and their distribution by students' educational level and language background. The results showed that the most common problems involved tense usage, complex sentence constructions, and indirect grammatical structures, and that error patterns correlated with the learners' native language, Arabic. The study recommends using this type of analysis to develop intelligent assessment tools and curricula grounded in real data, which would help improve the efficiency of teaching Swahili to Arabic speakers from Libya. It also highlights the importance of integrating language and technology and establishes a model that can be applied to other languages that lack adequate digital support.
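
The frequency analysis described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the annotation rows, the level labels, and the function names are hypothetical and invented for illustration, and only the four error categories come from the abstract. The sketch uses the Python standard library so that it runs without extra packages; a comparable cross-tabulation could be produced with `pandas.crosstab`.

```python
from collections import Counter, defaultdict

# The four error categories named in the abstract.
CATEGORIES = ("grammatical", "morphological", "lexical", "orthographic")

# Hypothetical annotations as (student_level, error_category) pairs.
# These rows are invented for illustration; they are not data from the study.
annotations = [
    ("beginner", "grammatical"),
    ("beginner", "morphological"),
    ("beginner", "grammatical"),
    ("intermediate", "lexical"),
    ("intermediate", "grammatical"),
    ("advanced", "orthographic"),
]


def error_counts_by_level(rows):
    """Return {level: Counter({category: count})} for the annotated errors."""
    table = defaultdict(Counter)
    for level, category in rows:
        if category not in CATEGORIES:
            raise ValueError(f"unknown error category: {category!r}")
        table[level][category] += 1
    return dict(table)


def category_shares(rows):
    """Return each category's share of all annotated errors (0.0 if no rows)."""
    counts = Counter(category for _, category in rows)
    total = len(rows)
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}


counts = error_counts_by_level(annotations)
shares = category_shares(annotations)

for level, counter in counts.items():
    print(level, dict(counter))
for category, share in shares.items():
    print(f"{category}: {share:.0%}")
```

A table of this shape, with counts per category and per level, is the kind of input a dashboard of error frequencies would need. How the study actually stored and aggregated its annotations is not stated in the abstract.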
