Language Detection Using Naive Bayes

A text-classification pipeline that uses TF-IDF features and a Multinomial Naïve Bayes classifier to predict the language of input sentences.

Category
Machine Learning
Completion Date
May 2024
Technologies Used
Python 3, pandas, NumPy, scikit-learn, seaborn, matplotlib
Project File
Downloading requires permission from Ameen Qahtan; contact him to request access.

Project Overview

The notebook loads the “Language Detection.csv” dataset into pandas (10,337 entries with “Text” and “Language” columns), then splits it into training and test sets. It vectorizes the text with TfidfVectorizer, fits a MultinomialNB model, and evaluates performance using the accuracy score, a confusion matrix, and a classification report.
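The steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebook's own code: the helper name `train_language_detector`, the 80/20 stratified split, `random_state=42`, the `make_pipeline` wrapper, and the default vectorizer and classifier settings are all assumptions. Only the file name, the column names, and the choice of `TfidfVectorizer` plus `MultinomialNB` come from the project description.

```python
# Minimal sketch of a TF-IDF + Multinomial Naive Bayes language detector.
# ASSUMPTIONS (not from the original notebook): helper name, 80/20 stratified
# split, random_state=42, and default TfidfVectorizer/MultinomialNB settings.
import os

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_language_detector(df, test_size=0.2, random_state=42):
    """Fit TF-IDF + MultinomialNB on a DataFrame with "Text" and "Language" columns.

    Returns the fitted pipeline and a dict of test-set metrics.
    """
    # Hold out a test set; stratify keeps per-language proportions similar.
    X_train, X_test, y_train, y_test = train_test_split(
        df["Text"],
        df["Language"],
        test_size=test_size,
        random_state=random_state,
        stratify=df["Language"],
    )

    # The vectorizer learns its vocabulary and IDF weights from training text
    # only; the pipeline reuses them unchanged when transforming test text.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Rows are true labels, columns are predictions, ordered as model.classes_.
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=model.classes_),
        "report": classification_report(y_test, y_pred, zero_division=0),
    }
    return model, metrics


if __name__ == "__main__":
    path = "Language Detection.csv"  # file name taken from the project description
    if os.path.exists(path):
        data = pd.read_csv(path)
        model, metrics = train_language_detector(data)
        print(f"accuracy: {metrics['accuracy']:.3f}")
        print(metrics["report"])
        print(model.predict(["Bonjour, comment allez-vous ?"]))
    else:
        print(f"{path} not found; pass a DataFrame to train_language_detector().")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF statistics fit on the training split only, so the test score is not inflated by leakage. The returned confusion matrix is a plain 2D array that can be passed to `seaborn.heatmap` for plotting. For language identification, character n-grams (for example `TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))`) are a common alternative to the default word tokens; which setting the notebook actually used is not stated.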
