KCSE RESULTS WEB SCRAPING PROJECT

In this project, I designed and implemented a comprehensive web scraping solution using Python to extract and structure tabular data related to the Kenya Certificate of Secondary Education (KCSE) from the official webpage of Moi Kapsowar Girls High School. The motivation behind this initiative was to streamline the data collection process, which is often labor-intensive and prone to errors when done manually.

The primary objective of the web scraping solution was to automate the extraction of KCSE data, ensuring that the information gathered was both accurate and up-to-date. To achieve this, I employed various Python libraries, including Beautiful Soup and Pandas, which facilitated the parsing of HTML content and the organization of the extracted data into a structured format. This involved identifying and isolating relevant HTML elements, such as tables containing examination results analysis, and systematically pulling the required data points.

After extracting the data, I focused on the critical tasks of cleaning and structuring it to enhance its usability for subsequent analysis. This included removing any inconsistencies and formatting the data into a coherent and standardized structure. The end goal was to prepare the dataset for in-depth analysis, enabling stakeholders to derive meaningful insights from the KCSE results, track performance trends, and inform future academic strategies.

Source of Data

Tools & Libraries

  • Jupyter Notebook: Interactive environment used for writing, testing, and documenting the code.

  • Python: Primary programming language utilized.

  • Requests: Library used to send HTTP requests and retrieve the HTML content of the webpage.

  • BeautifulSoup: HTML parser employed to extract the specific KCSE result tables.

  • Pandas: Library used to manipulate, and organize the extracted data into DataFrames.

  • CSV: Format used to export the final data for storage and further analysis.

Files

Project Structure

1. Data Extraction.

  • The webpage was accessed using the requests library, and the HTML content was retrieved.

  • The HTML content was parsed using BeautifulSoup to locate the two target tables on the webpage.
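The extraction step can be sketched as below. The URL and the helper names are illustrative, not the actual ones used in the project; the real page address belongs to the school's website.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Retrieve the raw HTML of a webpage using the requests library."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def find_result_tables(html):
    """Parse the HTML and return every <table> element found on the page."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("table")
```

Splitting the fetch from the parse keeps the parsing logic testable against saved HTML, without re-downloading the page each time.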

2. Data Cleaning and Structuring.

  • The tables lacked headers, so appropriate column names were manually defined based on the data context.

  • The first rows of the tables, which were not relevant to the analysis, were excluded from the final dataset.

  • The data from each table was extracted, cleaned, and structured into pandas DataFrames.
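A minimal sketch of the cleaning step is shown below. The column names here are placeholders; in the project they were defined from the context of the actual KCSE tables.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Illustrative headers -- the real tables lacked headers, so names were defined manually.
COLUMNS = ["Year", "Mean Grade", "Mean Points"]


def table_to_dataframe(table, columns):
    """Convert a header-less BeautifulSoup <table> into a pandas DataFrame,
    excluding the first row, which was not relevant to the analysis."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    # Drop the first (irrelevant) row and attach the manually defined headers.
    return pd.DataFrame(rows[1:], columns=columns)
```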

3. Data Export.

  • The cleaned DataFrames were exported as CSV files for further analysis.

  • The CSV files were saved to a specified directory, ensuring the data was organized and easily accessible.
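The export step amounts to a call to pandas' `to_csv`; a small sketch, with the directory and filename as placeholders for the ones used in the project:

```python
import os
import pandas as pd


def export_results(df, directory, filename):
    """Save a cleaned DataFrame as a CSV file in the given directory."""
    os.makedirs(directory, exist_ok=True)  # create the target directory if needed
    path = os.path.join(directory, filename)
    df.to_csv(path, index=False)  # index=False keeps only the data columns in the file
    return path
```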

THANK YOU!

Thank you for taking the time out to view my project!

In case you would like to discuss this project further, feel free to email me at:

patriciavalentinedanga@gmail.com.