The iris dataset is a classic example in machine learning and data analysis. It consists of 150 samples, 50 from each of three iris species (Setosa, Versicolour, and Virginica). Each sample is described by four features: sepal length, sepal width, petal length, and petal width. In this article, we will explore how to create a 3D scatter plot of the iris dataset and apply diffusion-based classification to assign the samples to their respective species.
Loading the Iris Dataset
The first step is to load the iris dataset, which can be done using the following code:
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
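This loads the iris dataset into a dictionary-like object. Its data attribute is a NumPy array in which each row represents a sample and each column represents a feature, and its target attribute holds the species label of each sample as an integer.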
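To make the layout of the loaded object concrete, the following short sketch prints the shapes and names it exposes. It uses only attributes that load_iris() provides (data, target, feature_names, target_names).

```python
from sklearn.datasets import load_iris

iris = load_iris()

# iris.data holds the measurements; iris.target holds integer class labels.
print(iris.data.shape)    # (150, 4)
print(iris.target.shape)  # (150,)

# Human-readable names for the four feature columns and the three classes.
print(iris.feature_names)
print(list(iris.target_names))
```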
Creating a 3D Scatter Plot
Next, we will create a 3D scatter plot. Because the dataset has four features and a plot has only three axes, we first use principal component analysis (PCA) to project the data onto its first three principal components. This can be done using the following code:
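import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Set up a figure with a single 3D axis and a fixed viewing angle.
fig = plt.figure(1, figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)

# Project the four features onto the first three principal components.
X_reduced = PCA(n_components=3).fit_transform(iris.data)

# Draw one point per sample, colored by its species label.
ax.scatter(
    X_reduced[:, 0],
    X_reduced[:, 1],
    X_reduced[:, 2],
    c=iris.target,
    s=40,
)

# Title and label the axes, and hide the tick labels.
ax.set_title("First three PCA dimensions")
ax.set_xlabel("1st Eigenvector")
ax.xaxis.set_ticklabels([])
ax.set_ylabel("2nd Eigenvector")
ax.yaxis.set_ticklabels([])
ax.set_zlabel("3rd Eigenvector")
ax.zaxis.set_ticklabels([])

plt.show()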
This code projects the four iris features onto their first three principal components and draws the result as a 3D scatter plot. The x-axis, y-axis, and z-axis represent the first, second, and third principal components, respectively. The color of each point represents the species of the iris (Setosa, Versicolour, or Virginica).
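One way to check how much information the three plotted components retain is to look at the explained variance ratio of the fitted PCA. The sketch below is an illustrative addition rather than part of the plotting code; it keeps the fitted PCA object in a variable (here called pca) so that its explained_variance_ratio_ attribute can be read.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Keep the fitted PCA object so its attributes can be inspected afterwards.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(iris.data)

# Fraction of the total variance captured by each of the three components,
# sorted from largest to smallest.
ratios = pca.explained_variance_ratio_
print(ratios)

# Fraction of the total variance kept by the 3D projection as a whole.
print(ratios.sum())
```

A sum close to 1 means the 3D plot loses little of the variation present in the original four features.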
Applying Diffusion-based Classification
To classify the samples into their respective species using diffusion-based classification, we can use the following code:
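import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Project onto three principal components, then standardize each component.
X_reduced = PCA(n_components=3).fit_transform(iris.data)
X_scaled = StandardScaler().fit_transform(X_reduced)

beta = 0.5   # update weight for pairs of samples with the same label
alpha = 0.1  # update weight for pairs of samples with different labels

# Update step: adjust sample i in place against every later sample j.
# Note that this step reads the true labels in iris.target.
for i in range(len(X_scaled)):
    for j in range(i + 1, len(X_scaled)):
        if iris.target[i] == iris.target[j]:
            X_scaled[i, :] += beta * (X_scaled[i, :] - X_scaled[j, :])
        else:
            X_scaled[i, :] += alpha * (X_scaled[i, :] - X_scaled[j, :])

# Classification step: majority vote among the 5 nearest samples.
# The sample itself, at distance 0, is normally one of them.
Y_pred = []
for i in range(len(X_scaled)):
    distances = np.linalg.norm(X_scaled[i, :] - X_scaled, axis=1)
    nearest_neighbors = np.argsort(distances)[:5]
    classes = iris.target[nearest_neighbors]
    Y_pred.append(np.argmax(np.bincount(classes)))

print(Y_pred)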
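This code applies diffusion-based classification to the three principal components plotted earlier, after standardizing them. The basic idea is to update each sample's feature vector in turn against the feature vectors of the samples that follow it in the array, with the weight of each update chosen by whether the two samples share a label. Because this update step reads the true labels, the predictions describe how well the transformed data separates the known classes rather than performance on unseen samples.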
The beta parameter controls the strength of the updates between samples with the same label, while the alpha parameter controls the strength of the updates between samples with different labels. The classification is then done by finding the most frequent class among the five nearest neighbors of each sample.
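The voting step described above can be isolated and tried on toy data. The sketch below is illustrative only: the helper name knn_majority_vote and the six toy points are assumptions introduced for this example, not part of the article's code. As in the article, each point counts itself among its own neighbors.

```python
import numpy as np


def knn_majority_vote(points, labels, k=5):
    """Predict a label for each point by majority vote among its k nearest points."""
    preds = []
    for p in points:
        # Euclidean distance from p to every point (including p itself).
        distances = np.linalg.norm(points - p, axis=1)
        nearest = np.argsort(distances)[:k]
        # bincount tallies the votes; argmax breaks ties toward the smaller label.
        preds.append(int(np.argmax(np.bincount(labels[nearest]))))
    return np.array(preds)


# Toy data (hypothetical): two well-separated clusters of three points each.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

print(knn_majority_vote(points, labels, k=3))  # [0 0 0 1 1 1]
```

With k=3, the three nearest points of every sample lie inside its own cluster, so each vote is unanimous.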
Results
The resulting classification is printed to the console and can be used to evaluate the performance of the diffusion-based classifier. The quality of the predictions can be measured by comparing them against the known labels with metrics such as accuracy, precision, recall, and F1-score.
In this article, we have demonstrated how to create a 3D scatter plot using the iris dataset and apply diffusion-based classification to classify the samples into their respective species. This is just one example of how diffusion-based classification can be used in machine learning applications.
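The metrics named in the Results section can be computed with scikit-learn, as in the following sketch. To keep the example self-contained, the predictions here come from a stand-in model (an assumption: a plain 5-nearest-neighbors classifier evaluated on a held-out split), not from the diffusion-based update loop. The same two metric calls would apply to any list of predictions and matching true labels.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out 30% of the samples so the scores reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Stand-in predictions (assumption): a plain 5-nearest-neighbors classifier.
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall accuracy, then per-class precision, recall, and F1-score.
acc = accuracy_score(y_test, y_pred)
print(acc)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```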