Image Recognition with a Convolutional Neural Network

Posted by xieju0605 on 2020-11-11

Introduction

This is an assignment from a machine learning and computer vision course taught abroad, and it is very well designed. It works through a single image-recognition task from shallow to deep: implementing KNN and PCA-KNN to reach roughly 30-40% accuracy, then a multilayer perceptron to reach about 50%, and finally a convolutional neural network to reach about 60%. Great fun.

Machine Learning and Computer Vision


This assignment contains TensorFlow programming exercises.

Problem 1: Install TensorFlow

Follow the directions on https://www.tensorflow.org/install/ to install Tensorflow on your computer.

Note: You will not need GPU support for this assignment, so don’t worry if you don’t have a GPU. Furthermore, installing with GPU support is often more difficult to configure, so it is suggested that you install the CPU-only version. However, if you have a GPU and would like to install GPU support, feel free to do so at your own risk.

Note: On Windows, TensorFlow is only supported in Python 3, so you will need to install Python 3 for this assignment.
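
For reference, a typical CPU-only install at the time this assignment was written (a sketch; the pinned 1.x version is an assumption, not part of the assignment):

pip install tensorflow==1.15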

Run the following cell to verify your installation.

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
b'Hello, TensorFlow!'
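
Note that this notebook uses the TensorFlow 1.x graph-and-session API throughout. If you only have TensorFlow 2.x installed, a common workaround (a sketch, not part of the original assignment) is the v1 compatibility module:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # restores 1.x graph/session semantics so tf.Session() works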

Problem 2: Downloading CIFAR10

Download the CIFAR10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html). You will need the python version: http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

Extract the data to ./data
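
If you would rather script the download and extraction, something like the following works (a sketch; cifar-10-batches-py is the folder name the archive extracts to):

import os, tarfile, urllib.request

url = 'http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
if not os.path.isdir('./data'):
    archive, _ = urllib.request.urlretrieve(url, 'cifar-10-python.tar.gz')
    with tarfile.open(archive, 'r:gz') as tar:
        tar.extractall('.')                      # extracts to ./cifar-10-batches-py
    os.rename('cifar-10-batches-py', 'data')     # match the ./data path used below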
Once extracted, run the following cell to view a few example images.

import numpy as np

# unpickles raw data files
def unpickle(file):
    import pickle
    import sys
    with open(file, 'rb') as fo:
        if sys.version_info[0] < 3:
            dict = pickle.load(fo)
        else:
            dict = pickle.load(fo, encoding='bytes')
    return dict

# loads data from a single file
def getBatch(file):
    dict = unpickle(file)
    data = dict[b'data'].reshape(-1,3,32,32).transpose(0,2,3,1)
    labels = np.asarray(dict[b'labels'], dtype=np.int64)
    return data,labels

# loads all training and testing data
def getData(path='./data'):
    classes = [s.decode('UTF-8') for s in unpickle(path+'/batches.meta')[b'label_names']]
    
    trainData, trainLabels = [], []
    for i in range(5):
        data, labels = getBatch(path+'/data_batch_%d'%(i+1))
        trainData.append(data)
        trainLabels.append(labels)
    trainData = np.concatenate(trainData)
    trainLabels = np.concatenate(trainLabels)
    
    testData, testLabels = getBatch(path+'/test_batch')
    return classes, trainData, trainLabels, testData, testLabels

# training and testing data that will be used in the following problems
classes, trainData, trainLabels, testData, testLabels = getData()

# display some example images
import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(14, 6))
for i in range(14):
    plt.subplot(2,7,i+1)
    plt.imshow(trainData[i])
    plt.title(classes[trainLabels[i]])
plt.show()

print ('train shape: ' + str(trainData.shape) + ', ' + str(trainLabels.shape))
print ('test shape : ' + str(testData.shape) + ', ' + str(testLabels.shape))

[Figure: example CIFAR-10 training images with their class labels]

train shape: (50000, 32, 32, 3), (50000,)
test shape : (10000, 32, 32, 3), (10000,)

Below are some helper functions that will be used in the following problems.

# a generator for batches of data
# yields data (batchsize, 32, 32, 3) and labels (batchsize,)
# if shuffle, batches are drawn in a random order
def DataBatch(data, label, batchsize, shuffle=True):
    n = data.shape[0]
    if shuffle:
        index = np.random.permutation(n)
    else:
        index = np.arange(n)
    for i in range(int(np.ceil(n/batchsize))):
        inds = index[i*batchsize : min(n,(i+1)*batchsize)]
        yield data[inds], label[inds]
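
# example usage (illustrative): peek at a few shuffled batches
# for step, (xb, yb) in enumerate(DataBatch(trainData, trainLabels, 50)):
#     print(step, xb.shape, yb.shape)   # e.g. 0 (50, 32, 32, 3) (50,)
#     if step == 2:
#         break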

# tests the accuracy of a classifier
def test(testData, testLabels, classifier):
    batchsize=50
    correct=0.
    for data,label in DataBatch(testData,testLabels,batchsize):
        prediction = classifier(data)
        #print (prediction)
        correct += np.sum(prediction==label)
    return correct/testData.shape[0]*100

# a sample classifier
# given an input it outputs a random class
class RandomClassifier():
    def __init__(self, classes=10):
        self.classes=classes
    def __call__(self, x):
        return np.random.randint(self.classes, size=x.shape[0])

randomClassifier = RandomClassifier()
print ('Random classifier accuracy: %f'%test(testData, testLabels, randomClassifier))
Random classifier accuracy: 9.530000

Problem 3: Confusion Matrix

Here you will implement a test script that computes the confusion matrix for a classifier.
The matrix should be nxn where n is the number of classes.
Entry M[i,j] should contain the number of times an image of class i was classified as class j.
M should be normalized such that each row sums to 1.

Hint: see the function test() above for reference.

def confusion(testData, testLabels, classifier):
    
    """your code here"""

    n = len(set(testLabels))
    prediction = classifier(testData)
    M = np.zeros((n,n))
    for i,j in zip(testLabels, prediction):
        M[i,j] += 1
    # normalize so each row sums to 1 (rather than hardcoding 1000 test images per class)
    M = M / M.sum(axis=1, keepdims=True)
    return M

def VisualizeConfussion(M):
    plt.figure(figsize=(14, 6))
    plt.imshow(M)#, vmin=0, vmax=1)
    plt.xticks(np.arange(len(classes)), classes, rotation='vertical')
    plt.yticks(np.arange(len(classes)), classes)
    plt.show()

M = confusion(testData, testLabels, randomClassifier)
VisualizeConfussion(M)

[Figure: confusion matrix of the random classifier]

Problem 4: K-Nearest Neighbors (KNN)

Here you will implement a simple KNN classifier. The distance metric is Euclidean distance in pixel space. k refers to the number of neighbors involved in voting on the class.

Hint: you may want to use: sklearn.neighbors.KNeighborsClassifier

from sklearn.neighbors import KNeighborsClassifier

class KNNClassifer():
    def __init__(self, k=3):
        """your code here"""
        # k is the number of neighbors involved in voting
        self.k = k

    def train(self, trainData, trainLabels):
        """your code here"""
        self.model = KNeighborsClassifier(n_neighbors=self.k)
        # flatten each (32, 32, 3) image into a single feature vector
        n = trainData.shape[0]
        self.model.fit(trainData.reshape(n, -1), trainLabels)

    def __call__(self, x):
        """your code here"""
        # takes a batch of images (batchsize, 32, 32, 3) and returns a batch of
        # predictions (batchsize,): int64 values in [0, 9], the predicted class
        n = x.shape[0]
        return self.model.predict(x.reshape(n, -1))

# test your classifier with only the first 100 training examples (use this while debugging)
# note you should get around 10-20% accuracy
knnClassiferX = KNNClassifer()
knnClassiferX.train(trainData[:100], trainLabels[:100])
print ('KNN classifier accuracy: %f'%test(testData, testLabels, knnClassiferX))
KNN classifier accuracy: 17.410000
# test your classifier with all the training examples (This may take a while)
# note you should get around 30% accuracy
knnClassifer = KNNClassifer()
knnClassifer.train(trainData, trainLabels)
print ('KNN classifier accuracy: %f'%test(testData, testLabels, knnClassifer))

# display confusion matrix for your KNN classifier with all the training examples
M = confusion(testData, testLabels, knnClassifer)
VisualizeConfussion(M)
KNN classifier accuracy: 33.980000

Problem 5: Principal Component Analysis (PCA) K-Nearest Neighbors (KNN)

Here you will implement a simple KNN classifier in PCA space.
You should implement PCA yourself using SVD (you may not use sklearn.decomposition.PCA
or any other package that directly implements PCA transformations).

Hint: Don’t forget to apply the same normalization at test time.

Note: you should get similar accuracy to above, but it should run faster.
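
The core recipe on toy data (a minimal sketch, separate from the classifier below): center the data, take the SVD, and project onto the leading directions.

# PCA via SVD on toy data (illustrative only)
X = np.random.randn(100, 20)                       # 100 samples, 20 features
Xc = X - X.mean(axis=0)                            # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:5].T                                  # project onto top-5 components -> (100, 5)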

class PCAKNNClassifer():
    def __init__(self, components=25, k=3):
        """your code here"""
        self.components = components
        self.k = k

    def train(self, trainData, trainLabels):
        """your code here"""
        self.model = KNeighborsClassifier(n_neighbors=self.k)
        n = trainData.shape[0]
        Mat = np.asarray(trainData.reshape(n, -1), dtype='float64')

        # PCA: center the data and keep the leading principal directions
        self.mean = np.mean(Mat, 0)   # per-feature mean, reused at test time
        Mat = Mat - self.mean

        # principal directions from the SVD of the covariance matrix
        cov_Mat = np.dot(Mat.T, Mat) / (n - 1)
        u, d, v = np.linalg.svd(cov_Mat)
        self.u = u

        # project onto the top `components` directions and fit KNN in that space
        T = np.dot(Mat, self.u[:, :self.components])
        self.model.fit(T, trainLabels)

    def __call__(self, x):
        """your code here"""
        n = x.shape[0]
        Mat = np.asarray(x.reshape(n, -1), dtype='float64')

        # apply the *training* normalization and projection at test time
        Mat = Mat - self.mean
        T = np.dot(Mat, self.u[:, :self.components])
        return self.model.predict(T)

# test your classifier with only the first 100 training examples (use this while debugging)
pcaknnClassiferX = PCAKNNClassifer()
pcaknnClassiferX.train(trainData[:100], trainLabels[:100])
print ('PCA-KNN classifier accuracy: %f'%test(testData, testLabels, pcaknnClassiferX))
PCA-KNN classifier accuracy: 16.530000
# test your classifier with all the training examples (This may take a few minutes)
pcaknnClassifer = PCAKNNClassifer()
pcaknnClassifer.train(trainData, trainLabels)
print ('PCA-KNN classifier accuracy: %f'%test(testData, testLabels, pcaknnClassifer))

# display the confusion matrix
M = confusion(testData, testLabels, pcaknnClassifer)
VisualizeConfussion(M)
PCA-KNN classifier accuracy: 39.800000

[Figure: confusion matrix of the PCA-KNN classifier]

Deep learning

Below is some helper code to train your deep networks

Hint: see https://www.tensorflow.org/get_started/mnist/pros or https://www.tensorflow.org/get_started/mnist/beginners for reference

# base class for your TensorFlow networks. It implements the training loop (train) and prediction (__call__) for you.
# You will need to implement the __init__ function to define the network structure in the following problems
class TFClassifier():
    def __init__(self):
        pass
    
    def train(self, trainData, trainLabels, epochs=1, batchsize=50):
        self.prediction = tf.argmax(self.y,1)
        self.cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.y_, logits=self.y))
        self.train_step = tf.train.AdamOptimizer(1e-4).minimize(self.cross_entropy)
        self.correct_prediction = tf.equal(self.prediction, self.y_)
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))
        
        self.sess.run(tf.global_variables_initializer())
        
        for epoch in range(epochs):
            for i, (data,label) in enumerate(DataBatch(trainData, trainLabels, batchsize, shuffle=True)):
                _, acc = self.sess.run([self.train_step, self.accuracy], feed_dict={self.x: data, self.y_: label})
                #if i%100==99:
                #    print ('%d/%d %d %f'%(epoch, epochs, i, acc))
                    
            # note: this evaluates on the global test set after every epoch
            print ('testing epoch:%d accuracy: %f'%(epoch+1, test(testData, testLabels, self)))
        
    def __call__(self, x):
        return self.sess.run(self.prediction, feed_dict={self.x: x})

# helper function to get weight variable
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial)

# helper function to get bias variable
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# example linear classifier
class LinearClassifer(TFClassifier):
    def __init__(self, classes=10):
        self.sess = tf.Session()

        self.x = tf.placeholder(tf.float32, shape=[None,32,32,3]) # input batch of images
        self.y_ = tf.placeholder(tf.int64, shape=[None]) # input labels

        # model variables
        self.W = weight_variable([32*32*3,classes])
        self.b = bias_variable([classes])

        # linear operation
        self.y = tf.matmul(tf.reshape(self.x,(-1,32*32*3)),self.W) + self.b
        
# test the example linear classifier (note you should get around 20-30% accuracy)
linearClassifer = LinearClassifer()
linearClassifer.train(trainData, trainLabels, epochs=20)

# display confusion matrix
M = confusion(testData, testLabels, linearClassifer)
VisualizeConfussion(M)
testing epoch:1 accuracy: 23.910000
testing epoch:2 accuracy: 27.150000
testing epoch:3 accuracy: 28.420000
testing epoch:4 accuracy: 26.790000
testing epoch:5 accuracy: 29.410000
testing epoch:6 accuracy: 28.210000
testing epoch:7 accuracy: 28.040000
testing epoch:8 accuracy: 29.030000
testing epoch:9 accuracy: 25.070000
testing epoch:10 accuracy: 25.520000
testing epoch:11 accuracy: 32.700000
testing epoch:12 accuracy: 26.960000
testing epoch:13 accuracy: 27.370000
testing epoch:14 accuracy: 29.700000
testing epoch:15 accuracy: 24.490000
testing epoch:16 accuracy: 27.470000
testing epoch:17 accuracy: 28.980000
testing epoch:18 accuracy: 29.040000
testing epoch:19 accuracy: 25.450000
testing epoch:20 accuracy: 28.610000

[Figure: confusion matrix of the linear classifier]

Problem 6: Multi Layer Perceptron (MLP)

Here you will implement an MLP. The MLP should consist of 3 linear layers (matrix multiplication and bias offset) that map to the following feature dimensions:

32x32x3 -> hidden

hidden -> hidden

hidden -> classes

The first two linear layers should be followed by a ReLU nonlinearity. The final layer should not have a nonlinearity applied, as we want the raw logits output (see the documentation for tf.nn.sparse_softmax_cross_entropy_with_logits used in the training).

The final output of the computation graph should be stored in self.y as that will be used in the training.

Hint: see the example linear classifier

Note: you should get around 50% accuracy

class MLPClassifer(TFClassifier):
    def __init__(self, classes=10, hidden=100):
        self.sess = tf.Session()

        self.x = tf.placeholder(tf.float32, shape=[None,32,32,3]) # input batch of images
        self.y_ = tf.placeholder(tf.int64, shape=[None]) # input labels

        """your code here"""
        #初始化輸入層權重,尺寸為[32*32*3,1000]
        self.W1 = weight_variable([32*32*3,hidden*10])
        self.b1 = bias_variable([hidden*10])
        
        #隱層第一層與第二層之間的權重,尺寸為[1000,100]
        self.W2 = weight_variable([hidden*10,hidden])
        self.b2 = bias_variable([hidden])
        
        #隱層第二層與輸出層之間的權重,尺寸為[100,10]
        self.W3 = weight_variable([hidden,classes])
        self.b3 = bias_variable([classes])
        
        #隱層第一層relu啟用
        self.hidden1 = tf.nn.relu(tf.matmul(tf.reshape(self.x,(-1,32*32*3)), self.W1) + self.b1)

        #隱層第二層relu啟用
        self.hidden2 = tf.nn.relu(tf.matmul(self.hidden1, self.W2) + self.b2)
        

        #輸出層線性變化
        self.y = tf.matmul(self.hidden2, self.W3) + self.b3

# test your MLP classifier (note you should get around 50% accuracy)
mlpClassifer = MLPClassifer()
mlpClassifer.train(trainData, trainLabels, epochs=20)

# display confusion matrix
M = confusion(testData, testLabels, mlpClassifer)
VisualizeConfussion(M)
testing epoch:1 accuracy: 37.790000
testing epoch:2 accuracy: 42.510000
testing epoch:3 accuracy: 45.190000
testing epoch:4 accuracy: 45.060000
testing epoch:5 accuracy: 47.180000
testing epoch:6 accuracy: 48.400000
testing epoch:7 accuracy: 46.450000
testing epoch:8 accuracy: 48.190000
testing epoch:9 accuracy: 49.500000
testing epoch:10 accuracy: 50.230000
testing epoch:11 accuracy: 50.970000
testing epoch:12 accuracy: 49.440000
testing epoch:13 accuracy: 49.480000
testing epoch:14 accuracy: 49.980000
testing epoch:15 accuracy: 50.620000
testing epoch:16 accuracy: 51.400000
testing epoch:17 accuracy: 52.450000
testing epoch:18 accuracy: 50.270000
testing epoch:19 accuracy: 50.670000
testing epoch:20 accuracy: 51.900000

[Figure: confusion matrix of the MLP classifier]

Problem 7: Convolutional Neural Network (CNN)

Here you will implement a CNN with the following architecture:

ReLU( Conv(kernel_size=4x4 stride=2, output_features=n) )

ReLU( Conv(kernel_size=4x4 stride=2, output_features=n*2) )

ReLU( Conv(kernel_size=4x4 stride=2, output_features=n*4) )

Linear(output_features=classes)

def conv2d(x, W, stride=2):
    return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding='SAME')
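
# quick shape sanity check (illustrative): a stride-2 'SAME' conv halves H and W
check = conv2d(tf.zeros([1, 32, 32, 3]), tf.zeros([4, 4, 3, 16]))
print(check.shape)  # (1, 16, 16, 16)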

class CNNClassifer(TFClassifier):
    def __init__(self, classes=10, n=16):
        self.sess = tf.Session()

        self.x = tf.placeholder(tf.float32, shape=[None,32,32,3]) # input batch of images
        self.y_ = tf.placeholder(tf.int64, shape=[None]) # input labels
        """your code here"""
        #初始化網路層權重
        conv1_weight = tf.Variable(tf.truncated_normal([4,4,3,n],stddev=0.05,dtype=tf.float32))
        conv1_bias = tf.Variable(tf.truncated_normal([n],stddev=0.05,dtype=tf.float32))
  
        conv2_weight = tf.Variable(tf.truncated_normal([4,4,n,n*2],stddev=0.05,dtype=tf.float32))
        conv2_bias = tf.Variable(tf.truncated_normal([n*2],stddev=0.05,dtype=tf.float32))
        
        conv3_weight = tf.Variable(tf.truncated_normal([4,4,n*2,n*4],stddev=0.05,dtype=tf.float32))
        conv3_bias = tf.Variable(tf.truncated_normal([n*4],stddev=0.05,dtype=tf.float32))
        
        #初始化cnn模型結構
        conv1 = conv2d(self.x,conv1_weight)
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1,conv1_bias))
        max_pool1 = tf.nn.max_pool(relu1,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')
        
        conv2 = conv2d(max_pool1,conv2_weight)
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2,conv2_bias))
        max_pool2 = tf.nn.max_pool(relu2,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME') 
        
        conv3 = conv2d(max_pool2,conv3_weight)
        relu3 = tf.nn.relu(tf.nn.bias_add(conv3,conv3_bias))
        
        #線性輸出
        self.W = weight_variable([n*4,classes])
        #self.W  = tf.Variable(tf.truncated_normal([n*4,classes],stddev=0.05,dtype=tf.float32))
        
        #self.b  = tf.Variable(tf.truncated_normal([classes],stddev=0.05,dtype=tf.float32))
        
        self.b = bias_variable([classes])

        # linear operation
        self.y = tf.matmul(tf.reshape(relu3,(-1,n*4)),self.W) + self.b


# test your CNN classifier (note you should get around 65% accuracy)
cnnClassifer = CNNClassifer()
cnnClassifer.train(trainData, trainLabels, epochs=20)

# display confusion matrix
M = confusion(testData, testLabels, cnnClassifer)
VisualizeConfussion(M)
testing epoch:1 accuracy: 40.500000
testing epoch:2 accuracy: 43.380000
testing epoch:3 accuracy: 46.930000
testing epoch:4 accuracy: 48.190000
testing epoch:5 accuracy: 50.180000
testing epoch:6 accuracy: 51.990000
testing epoch:7 accuracy: 53.040000
testing epoch:8 accuracy: 53.060000
testing epoch:9 accuracy: 54.430000
testing epoch:10 accuracy: 55.080000
testing epoch:11 accuracy: 55.130000
testing epoch:12 accuracy: 55.900000
testing epoch:13 accuracy: 56.700000
testing epoch:14 accuracy: 56.630000
testing epoch:15 accuracy: 56.710000
testing epoch:16 accuracy: 57.580000
testing epoch:17 accuracy: 57.630000
testing epoch:18 accuracy: 58.490000
testing epoch:19 accuracy: 59.120000
testing epoch:20 accuracy: 59.420000

[Figure: confusion matrix of the CNN classifier]
If you'd like the materials so you can run these experiments yourself, send me a private message.
