[computer vision] Bag of Visual Word (BOW)

Posted by 芒果和小貓 on 2022-04-02

Original article: https://www.cnblogs.com/WAoyu/p/16089835.html

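Bag of Visual Words (BoW, BoF, bag of words)

Introduction

BoW is a classic computer vision method that represents an image by a set of features (vectors). Its core idea is to take a set of fairly generic features and describe every image in terms of them. Different images respond differently to the same feature, so in the end each image becomes a frequency histogram (a vector) over this set of features. There is a fairly clear introduction here. BoW is often used for content-based image retrieval (CBIR).
The figure below (source: Brown Computer Vision 2021) illustrates BoW: start with a collection of images, extract features from them, select a set of representative, generic features, then compute each image's response to these features, which turns every image into a feature vector over this set.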
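As a minimal sketch of the last step in the figure, assuming the vocabulary of visual words is already available as a NumPy array of cluster centers, turning an image's descriptors into a histogram is a nearest-center lookup followed by counting. The function name `bow_histogram` and the toy 2-D data are illustrative assumptions, not part of the original code.

```python
import numpy as np


def bow_histogram(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Assign each descriptor to its nearest visual word and count the hits.

    descriptors: (N, D) local features from one image (hypothetical input).
    vocabulary:  (K, D) visual words, e.g. k-means cluster centers.
    Returns a length-K vector of word counts.
    """
    # Squared Euclidean distance from every descriptor to every word: shape (N, K)
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest word index for each descriptor
    return np.bincount(words, minlength=len(vocabulary))


# Toy example: 3 visual words in 2-D and 5 descriptors from one "image"
vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc = np.array([[0.5, 0.2], [9.0, 1.0], [10.5, -0.3], [0.1, 9.8], [-0.2, 0.4]])
print(bow_histogram(desc, vocab))  # [2 2 1]
```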


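Practice

This post does not go deep into the theory; it mainly walks through a hands-on implementation with OpenCV.

Dataset

This post uses a fairly old dataset, ZuBuD, built by ETH Zurich and freely available for download. It contains photos of buildings in the city of Zurich: the training set has 1005 images of 201 buildings, and the test set has 115 images for evaluating image retrieval, with ground-truth information specifying which images correspond to each other. Two randomly picked images are shown below.

Below is part of the ground truth. For example, the first row means that test image 001 should be matched to training building 100.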

TEST	TRAIN
001	100
002	102
003	104
004	105
005	107
006	109
...
...
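A small sketch of reading this table and lining it up with image file names. The two-column layout follows the excerpt above; the file-name patterns `qimgNNNN.JPG` and `objectNNNN.viewVV.png` are assumptions chosen to match the slices `[5:8]` and `[7:10]` used in `predict()` below, and the helper names are hypothetical.

```python
import io


def parse_ground_truth(f) -> dict:
    """Read a TEST/TRAIN table into {test_id: train_id}, skipping the header row."""
    pairs = {}
    for line in f.read().splitlines()[1:]:  # first line is the header
        fields = line.split()  # tolerate tabs or spaces
        if len(fields) == 2:  # skip blank or "..." lines
            test_id, train_id = fields
            pairs[test_id] = train_id
    return pairs


def query_id(filename: str) -> str:
    # Assumed query naming "qimgNNNN.JPG": characters 5..7 are the last 3 digits
    return filename[5:8]


def building_id(filename: str) -> str:
    # Assumed training naming "objectNNNN.viewVV.png": characters 7..9
    return filename[7:10]


sample = io.StringIO("TEST\tTRAIN\n001\t100\n002\t102\n...\n")
gt = parse_ground_truth(sample)
print(gt)  # {'001': '100', '002': '102'}
print(gt[query_id("qimg0001.JPG")] == building_id("object0100.view01.png"))  # True
```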

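Overall approach

Extract SIFT features from every image
Pool all training-set features and cluster them
Compute a histogram for every training image
Compute a histogram for every test image
For each test image, take the training image whose histogram is closest as the result
Compute the accuracy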
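The six steps can be sketched end to end without OpenCV or the dataset, using synthetic descriptors in place of SIFT and a plain NumPy k-means in place of `MiniBatchKMeans`. Everything here (the toy data generator, the farthest-point initialisation, the function names) is a hypothetical illustration of the flow, not the post's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 4 well-separated "true" patterns in 4-D.
# Building b produces descriptors that alternate between patterns b and b + 1.
PATTERNS = 10.0 * np.eye(4)


def fake_image(building: int, n: int) -> np.ndarray:
    """Synthetic stand-in for one image's SIFT descriptors (not real data)."""
    idx = np.array([building + (i % 2) for i in range(n)])
    return PATTERNS[idx] + rng.normal(scale=0.1, size=(n, 4))


def nearest(X: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Index of the closest center in C for every row of X."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(X: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    C = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[d.argmax()])
    C = np.array(C)
    for _ in range(iters):
        labels = nearest(X, C)
        # Move each center to the mean of its points (keep it if the cluster is empty)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Steps 1-2: "extract" training descriptors and cluster them into 4 visual words
train = {b: fake_image(b, 10) for b in range(3)}
vocab = kmeans(np.concatenate(list(train.values())), k=4)
# Steps 3-4: word histograms for training images and for one query per building
train_h = {b: np.bincount(nearest(d, vocab), minlength=4) for b, d in train.items()}
query_h = {b: np.bincount(nearest(fake_image(b, 6), vocab), minlength=4) for b in range(3)}
# Step 5: for each query, the training image with the highest cosine similarity
pred = {q: max(train_h, key=lambda b: cosine(h, train_h[b])) for q, h in query_h.items()}
# Step 6: accuracy over the queries
acc = float(np.mean([pred[q] == q for q in pred]))
print(pred, acc)  # {0: 0, 1: 1, 2: 2} 1.0
```

Because the query images use fewer descriptors than the training images (6 vs 10), the example also shows why cosine similarity is a reasonable choice here: it compares the shape of the histograms, not their total counts.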

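Code

With this plan in place, the code logic is fairly straightforward. The complete code is given below, with detailed explanations in the comments.

#1. Extract SIFT features from every image
#2. Pool all training-set features and cluster them
#3. Compute a histogram for every training image
#4. Compute a histogram for every test image
#5. For each test image, take the training image with the closest histogram as the result
#6. Compute the accuracy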

import cv2
import os
import matplotlib.pyplot as plt
import numpy as np
import time
from sklearn.cluster import MiniBatchKMeans

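DataPath = "../Dataset/ZuBuD" # dataset root directory
TrainPath = os.path.join(DataPath, "png-ZuBuD") # training-set root directory
TestPath = os.path.join(DataPath, "1000city", "qimage") # test-set root directory
trainList = os.listdir(TrainPath) # names of all training images

TrainSIFTPath = "../Dataset/ZuBuD/Train_SIFT" # where training SIFT features are saved (only used when saving to files)
TestSIFTPath = "../Dataset/ZuBuD/Test_SIFT" # where test SIFT features are saved (only used when saving to files)

TrainSIFT = [] # SIFT descriptors of the training set, kept as a list so numpy can concatenate them later
TestSIFT = [] # SIFT descriptors of the test set

Train_SIFT_dict = {} # same as above, but indexed by image name
Test_SIFT_dict = {}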


# Extract SIFT descriptors for every image in a directory
def genSIFT(dataDir, outdir, outlist, outdict):
    begin = time.time()
    sift = cv2.SIFT_create()
    imgList = os.listdir(dataDir)
    if not os.path.exists(outdir):
        os.mkdir(outdir)
    count = 0
    for name in imgList:
        ext = os.path.splitext(name)[-1]
        if ext != ".png" and ext != ".JPG" and ext != ".jpg":
            continue
        # Read the image, convert it to grayscale, extract descriptors
        path = os.path.join(dataDir, name)
        imgdata = cv2.imread(path)
        if imgdata is None:  # unreadable file
            continue
        gray = cv2.cvtColor(imgdata, cv2.COLOR_BGR2GRAY)
        _, des = sift.detectAndCompute(gray, None)
        if des is None:  # no keypoints: skip, otherwise concatenate/predict would fail
            continue
        outlist.append(des)
        outdict[name] = des
        #np.save(os.path.join(outdir, name), des)
        print(len(imgList), count)  # progress: total files, images processed so far
        count = count + 1
    end = time.time()

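# Clustering, i.e. building the vocabulary of visual words. MiniBatchKMeans is used
# here because it is faster than KMeans without losing much accuracy
def cluster(featureList, n):
    # Pool the SIFT descriptors of all training images and cluster them
    begin = time.time()
    X = np.concatenate(featureList)
    kmeans = MiniBatchKMeans(n_clusters=n, random_state=0, verbose=1).fit(X)
    end = time.time()
    return kmeans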

# Cosine similarity rescaled to [0, 1] as 0.5 + 0.5 * cos, used as the similarity score
def get_cos_similar(v1, v2):
    num = float(np.dot(v1, v2))
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return 0.5 + 0.5 * (num / denom) if denom != 0 else 0

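# Read the ground-truth file and build {test_id: train_id} pairs
def getGroundTruth(dataPath):
    gtpair = {}
    with open(os.path.join(dataPath, "zubud_groundtruth.txt")) as f:
        gt = f.readlines()
    for i, line in enumerate(gt):
        if i == 0:  # skip the header line
            continue
        fields = line.split()  # splits on tabs/spaces and drops the newline, even on the last line
        if len(fields) != 2:  # skip blank or malformed lines
            continue
        test, train = fields
        gtpair[test] = train
    return gtpair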
    

# Turn each image into a vector: count how often each visual word (cluster) occurs
def getFeatureHistogram(dataDict, kmeans):
    outDict = {}
    for k in dataDict.keys():
        feat = dataDict[k]
        # minlength pads the histogram with zeros up to n_clusters
        his = np.bincount(kmeans.predict(feat), minlength=kmeans.n_clusters)
        outDict[k] = his
    return outDict


# Evaluation. This uses a naive approach: compare each test image with every
# training image and take the one with the highest cosine similarity as the result.
def predict(testHisDict, trainHisDict, gtpair):
    predictions = {}

    for testk in testHisDict.keys():
        testhis = testHisDict[testk]
        score = 0.0
        index = ""
        for traink in trainHisDict.keys():
            trainhis = trainHisDict[traink]
            s = get_cos_similar(testhis, trainhis)
            if s > score:
                score = s
                index = traink
        predictions[testk] = index

    suc = 0
    for k in predictions.keys():
        tk = k[5:8]  # test ID taken from the test file name
        pk = predictions[k][7:10]  # building ID of the retrieved training image
        if gtpair.get(tk) == pk:
            suc = suc + 1
    return suc / len(predictions)

# Chain the steps above and vary the number of clusters to observe the accuracy
def pipeline(n_list):
    result = []

    # 1. Extract SIFT features from the training and test sets
    t0 = time.time()
    genSIFT(TrainPath, TrainSIFTPath, TrainSIFT, Train_SIFT_dict)
    genSIFT(TestPath, TestSIFTPath, TestSIFT, Test_SIFT_dict)
    t1 = time.time()
    # 2. Read the ground truth
    gtpair = getGroundTruth(DataPath)

    # 3. Cluster the training SIFT descriptors to obtain the visual words
    for n in n_list:
        t3 = time.time()
        clu = cluster(TrainSIFT, n)
        t4 = time.time()
        # 4. Compute each image's histogram over the visual words
        train_his = getFeatureHistogram(Train_SIFT_dict, clu)
        test_his = getFeatureHistogram(Test_SIFT_dict, clu)
        t5 = time.time()
        # 5. Retrieve by cosine similarity and compute the accuracy
        acc = predict(test_his, train_his, gtpair)
        t6 = time.time()
        info = {"sift": t1 - t0, "clu": t4 - t3, "calvw": t5 - t4, "predict": t6 - t5, "acc": acc}
        result.append(info)
        print(info)
    return result

result = pipeline([50, 100, 300, 600, 1000, 2000])
print(result)
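The nested loop in `predict()` compares every test histogram with every training histogram in Python. Below is a sketch of the same cosine-similarity search as a single matrix product, assuming the histograms are stored in dicts as returned by `getFeatureHistogram()`; `retrieve` is a hypothetical helper, not part of the original code. Because `0.5 + 0.5 * cos` increases monotonically with `cos`, ranking by plain cosine similarity picks the same best match, except when a test histogram is entirely zero, where the loop returns an empty name.

```python
import numpy as np


def retrieve(test_his: dict, train_his: dict) -> dict:
    """Map each test name to the training name with the highest cosine similarity."""
    test_names = list(test_his)
    train_names = list(train_his)
    Q = np.array([test_his[k] for k in test_names], dtype=float)
    T = np.array([train_his[k] for k in train_names], dtype=float)
    # L2-normalise each row; the floor avoids dividing by zero for empty histograms
    Q /= np.maximum(np.linalg.norm(Q, axis=1, keepdims=True), 1e-12)
    T /= np.maximum(np.linalg.norm(T, axis=1, keepdims=True), 1e-12)
    sim = Q @ T.T  # (num_test, num_train) matrix of cosine similarities
    best = sim.argmax(axis=1)  # first maximum per row, like the strict ">" in the loop
    return {name: train_names[j] for name, j in zip(test_names, best)}


# Toy usage with hypothetical file names and 3-word histograms
train = {"object0100.view01.png": np.array([4, 0, 1]),
         "object0102.view01.png": np.array([0, 3, 3])}
test = {"qimg0001.JPG": np.array([2, 0, 1]),
        "qimg0002.JPG": np.array([0, 1, 1])}
matches = retrieve(test, train)
print(matches)
# {'qimg0001.JPG': 'object0100.view01.png', 'qimg0002.JPG': 'object0102.view01.png'}
```

With 115 test and 1005 training images, `sim` is only a 115 by 1005 matrix, so memory is not a concern at this scale.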

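Results

Six cluster counts were tested in total. Accuracy rises steadily as the number of clusters grows, but too many clusters make it drop again. This is because, in the experiments, each image yielded only about 1000 to 1500 keypoints on average, so 2000 clusters is too many: every image then has more bins than descriptors, and its histogram becomes very sparse. The line chart below plots the accuracy; since nothing between 1000 and 2000 clusters was tested, accuracy may still improve somewhere in that range. With 600 clusters the accuracy is 75.65%, and with 1000 clusters it is 78.26%.

Timing, measured on a 2020 Mac Pro:

Extracting SIFT features from all images: about 55 s.
Clustering: about 191 s for 600 clusters, about 251 s for 1000 clusters.
Computing the frequency histograms: about 6 s for 600 clusters, 9 s for 1000.
Prediction: about 1.5 s in every case.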
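The reported percentages can be sanity-checked with a little arithmetic, assuming all 115 query images were scored: they correspond to 87 and 90 correct retrievals. The same snippet illustrates the sparsity argument: an image with n descriptors can fill at most n of the K bins, so with 1500 descriptors and 2000 visual words at least 500 bins stay empty. The helper names are hypothetical.

```python
NUM_QUERIES = 115  # size of the ZuBuD query set, as stated in the dataset section


def correct_count(accuracy_percent: float, total: int = NUM_QUERIES) -> int:
    """Number of correct retrievals implied by a rounded accuracy percentage."""
    return round(accuracy_percent / 100 * total)


def min_empty_bins(n_descriptors: int, k: int) -> int:
    """Lower bound on empty histogram bins: n descriptors can hit at most n words."""
    return max(k - n_descriptors, 0)


for clusters, acc in [(600, 75.65), (1000, 78.26)]:
    hits = correct_count(acc)
    print(clusters, hits, f"{hits / NUM_QUERIES:.2%}")
# 600 87 75.65%
# 1000 90 78.26%

print(min_empty_bins(1500, 2000))  # 500
```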
