Iris Flower Classification
I carried out this work as a technical project while a member of the third cohort of the She Code Africa Data Science Mentorship program.
The dataset used is the Iris flower dataset, obtained from Kaggle. In this project, I explored the data and classified the species of Iris flower based on sepal length, sepal width, petal length and petal width. The work was carried out in three parts: Exploratory Data Analysis, Predictive Modelling and Data Web Application Building. Let’s go!
Exploratory Data Analysis
The data has 150 rows and 6 columns. The columns are Id, sepal length, sepal width, petal length, petal width and species. There are three species of Iris flower in the dataset: Iris setosa, Iris versicolor and Iris virginica.
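The shape and species counts can be checked with a few lines of pandas. This is a minimal sketch, not the notebook used for this project: it assumes scikit-learn's bundled copy of the Iris data as a stand-in for the Kaggle CSV, and the column names (`sepal_length`, `species`, and so on) are illustrative choices. With the Kaggle file, `pd.read_csv` on the downloaded CSV would replace the construction step.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle CSV.
iris = load_iris()
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

df = pd.DataFrame(iris.data, columns=feature_cols)
df.insert(0, "Id", range(1, len(df) + 1))      # mirror the Id column of the Kaggle file
df["species"] = iris.target_names[iris.target]  # map integer labels to species names

print(df.shape)                      # (rows, columns)
print(df["species"].value_counts())  # number of samples per species
```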
The table below gives a brief statistical description of the sepal length, sepal width, petal length and petal width (all measured in cm) of all species of iris flower in the dataset.
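A summary of this kind can be produced with pandas' `describe`. The sketch below again assumes scikit-learn's bundled copy of the data and illustrative column names, so the exact layout may differ from the table in this post.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle CSV.
iris = load_iris()
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame(iris.data, columns=feature_cols)

# count, mean, std, min, quartiles and max for each measurement (in cm)
summary = df.describe().round(2)
print(summary)
```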
How do the sepal length and width of an Iris flower determine its species?
The figures above show that Iris setosa has a sepal length not greater than 6.0 cm and a sepal width not less than 2.3 cm; Iris versicolor has a sepal length not less than 5.0 cm and a sepal width not greater than 3.4 cm; Iris virginica has a sepal length not less than 6.0 cm and a sepal width not greater than 3.8 cm. The sepal length and width of Iris setosa are clearly distinct from those of the other species.
How do the petal length and width of an Iris flower determine its species?
From the figures above, the petal length and width of Iris setosa are again clearly distinct from the others. Iris setosa has a petal width and petal length not greater than 0.6 cm and 2 cm respectively; Iris versicolor has a petal width and petal length not less than 0.9 cm and 3.5 cm respectively; and Iris virginica has a petal width and petal length not less than 1.3 cm and 4 cm respectively.
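Ranges read off figures can be cross-checked numerically by taking the minimum and maximum of each measurement per species. The sketch below assumes scikit-learn's bundled copy of the data and illustrative column names. Because it reports the full range, outliers included, its numbers may differ slightly from values read off a plot.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle CSV.
iris = load_iris()
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame(iris.data, columns=feature_cols)
df["species"] = iris.target_names[iris.target]

# Smallest and largest value of every measurement within each species
ranges = df.groupby("species")[feature_cols].agg(["min", "max"])
print(ranges)

# Setosa's longest petal is still shorter than versicolor's shortest,
# which is why setosa separates so cleanly from the other two species.
gap = (ranges.loc["versicolor", ("petal_length", "min")]
       - ranges.loc["setosa", ("petal_length", "max")])
print(f"Gap in petal length between setosa and versicolor: {gap:.1f} cm")
```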
How do the sepal length and petal length of an Iris flower determine its species?
How do the sepal width and petal width of an Iris flower determine its species?
The figures above show that Iris setosa can be easily distinguished from the other Iris species (and is therefore very easy to classify), but Iris versicolor and Iris virginica are quite similar in terms of petal length, petal width, sepal length and sepal width.
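A scatter plot of two measurements, coloured by species, is one way to see this separation. The sketch below is not necessarily how the figures in this post were produced: it assumes matplotlib and scikit-learn's bundled copy of the data, and the output file name `iris_petals.png` is a made-up example.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle CSV.
iris = load_iris()
X, y = iris.data, iris.target  # feature columns: sepal length, sepal width, petal length, petal width

fig, ax = plt.subplots(figsize=(6, 4))
for label, name in enumerate(iris.target_names):
    mask = y == label
    # One scatter call per species so each gets its own colour and legend entry
    ax.scatter(X[mask, 2], X[mask, 3], label=name, alpha=0.7)

ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
ax.legend(title="species")
fig.savefig("iris_petals.png", dpi=150)  # hypothetical output file name
```

Swapping the column indices (0 and 1 for the sepal measurements) gives the corresponding sepal view.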
Predictive Modelling
I trained a classification model using the decision tree classifier from Python’s scikit-learn library, which gave an accuracy of over 97%. I chose a decision tree because it is easy to interpret and the fitted tree can be visualized.
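A minimal sketch of this training step is shown below. The split ratio, the stratification and the `random_state` values are assumptions made for illustration, not the settings used in this project, so the accuracy it prints will not necessarily match the figure quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle CSV.
X, y = load_iris(return_X_y=True)

# Hold out part of the data so accuracy is measured on unseen flowers.
# test_size, stratify and random_state are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

On a dataset this small a single split gives a noisy estimate; `cross_val_score` from `sklearn.model_selection` is a common way to get a steadier one.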
The figure below shows the decision tree model plot.
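A plot of this kind can be drawn with scikit-learn's `plot_tree`, and `export_text` gives the same rules as plain text. In the sketch below, `max_depth=3`, the feature names and the output file name `iris_tree.png` are assumptions chosen to keep the example readable; the tree in the figure may have been trained with different settings.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

# Assumption: scikit-learn's bundled Iris data stands in for the Kaggle CSV.
iris = load_iris()
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
class_names = [str(name) for name in iris.target_names]

# A shallow tree (illustrative max_depth) keeps the plot legible
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Text version of the learned decision rules
rules = export_text(clf, feature_names=feature_names)
print(rules)

# Graphical version: one box per node, coloured by majority class
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(clf, feature_names=feature_names, class_names=class_names,
          filled=True, ax=ax)
fig.savefig("iris_tree.png", dpi=150)  # hypothetical output file name
```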
Data Web Application
I built a web application using Streamlit and deployed it on Heroku. The application predicts the species of an Iris flower from the selected petal length, petal width, sepal length and sepal width, and displays a picture of the predicted flower.
The application can be found here.
Here is the Streamlit script for the application.