Canada Customer Data Extraction

Loads and inspects an Excel dataset, then reads and parses a text file to extract specific customer names and monetary values using regular expressions.

Category
Machine Learning
Completion Date
April 2024
Technologies Used
Python 3, Jupyter Notebook, pandas, matplotlib, re (regular expressions), built-in file handling (open)
Project File
Downloading is permitted only with permission from Ameen Qahtan; contact him to request it.

Project Overview

The notebook first imports pandas and matplotlib, loads the clean_canada_data.xlsx Excel file into a DataFrame, and displays its first few rows for a quick overview. It then switches to text processing. Using Python’s built-in open function and the re module, it reads Iphone_Order.txt, extracts the full name of the second customer whose name starts with “S” and ends with “er”, finds all dollar amounts (e.g. “$1,499.99”), and demonstrates splitting the text at punctuation. Finally, it shows the first five records with proper column headers, illustrating basic data-wrangling and pattern-matching techniques.
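The text-processing steps described above can be sketched with the re module. This is a minimal sketch, not the notebook's actual code: the contents of Iphone_Order.txt are not shown, so SAMPLE_ORDERS is hypothetical sample text, and the helper names (load_text, nth_matching_name, find_prices, split_at_punctuation) and the exact patterns are assumptions. In particular, it assumes a "full name" is a capitalised first name starting with “S” followed by a surname ending in “er”.

```python
import re

# Hypothetical stand-in for the contents of Iphone_Order.txt; the real
# file's layout is not shown in the project description.
SAMPLE_ORDERS = (
    "Order 1: Sophia Turner bought an iPhone 15 for $799.00. "
    "Order 2: Liam Chen bought an iPhone 15 Pro for $999.00. "
    "Order 3: Samuel Baker bought an iPhone 15 Pro Max for $1,499.99. "
    "Order 4: Sara Walker bought AirPods for $249.00."
)

# Assumed name shape: first name starting with "S", whitespace,
# then a capitalised surname ending in "er".
NAME_PATTERN = re.compile(r"\bS[a-z]+\s+[A-Z][a-z]*er\b")

# Dollar amounts with optional comma-separated thousands groups and
# optional cents, e.g. "$999" or "$1,499.99".
PRICE_PATTERN = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

# Split on punctuation only when it is followed by whitespace or the end
# of the text, so the comma and dot inside "$1,499.99" stay intact.
SPLIT_PATTERN = re.compile(r"[.,;!?](?:\s+|$)")


def load_text(path):
    """Read a whole text file with the built-in open(), as the notebook does."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def nth_matching_name(text, n=2):
    """Return the n-th (1-based) name matching NAME_PATTERN, or None."""
    names = NAME_PATTERN.findall(text)
    return names[n - 1] if 1 <= n <= len(names) else None


def find_prices(text):
    """Return every dollar-amount string in order of appearance."""
    return PRICE_PATTERN.findall(text)


def split_at_punctuation(text):
    """Split text into fragments at punctuation, dropping empty pieces."""
    return [part for part in SPLIT_PATTERN.split(text) if part]


if __name__ == "__main__":
    text = SAMPLE_ORDERS  # or: load_text("Iphone_Order.txt")
    print(nth_matching_name(text))  # Samuel Baker
    print(find_prices(text))  # ['$799.00', '$999.00', '$1,499.99', '$249.00']
    for fragment in split_at_punctuation(text):
        print(fragment)
```

Requiring whitespace (or end of text) after the punctuation mark is one possible design choice; a bare character class such as [.,] would also split "$1,499.99" into pieces, which is usually not what a sentence-level split intends.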
