[CAPTCHA Recognition Series] No model training today: solving CAPTCHAs in seconds with CV

Disclaimer

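Everything in this article is for learning and discussion only and is not intended for any other purpose. No complete code is provided, and captured traffic, sensitive URLs, data interfaces, and the like have all been desensitized. Commercial and illegal use is strictly prohibited; the author bears no responsibility for any consequences arising from such use!

Reproduction of this article without permission is prohibited, as is any redistribution after modification. The author is not responsible for any incident caused by unauthorized use of the techniques explained here. In case of infringement, please contact the author through the official account 【K哥爬蟲】 and the content will be removed immediately!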

Preface

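While catching up on QQ group messages recently, I came across a discussion among followers about opencv, and some helpful group members had already sketched out a general direction for a solution. Many members of the Knowledge Planet (知識星球) community have also been sharing what they know about CAPTCHA recognition, and the atmosphere for learning and exchange is great. This article summarizes and answers those questions and existing threads, and also extends the range of CAPTCHA types covered in this series.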

Analysis Targets

Address 1: aHR0cHM6Ly96bmZtLmJhaXdhbmcuY29tLw==
Address 2: aHR0cHM6Ly93d3cuZGluZ3hpYW5nLWluYy5jb20vYnVzaW5lc3MvY2FwdGNoYQ==
Tentative: the rotation CAPTCHA of a certain site (某東)

First Glimpse

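Let's start with site 1, the one a follower asked about. The CAPTCHA images returned by the site's query interface look like this:

The figure above shows one image with a gap and one complete image. Among slider CAPTCHAs this type is fairly easy to handle, and absdiff alone is enough. In image processing and computer vision, computing the difference between images is a basic task: it helps identify changes in a scene, detect moving objects, or perform image registration. In the OpenCV library, the absdiff function provides an efficient way to compute the absolute difference between two images: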

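Property	Description
Function name	`cv2.absdiff()`
Parameters	`src1` and `src2`: the two images to compare; they must have the same size and number of channels.
Return value	A new image in which each pixel is the absolute difference between `src1` and `src2` at the corresponding position.
Image types	Supports grayscale and color images (BGR channel order when loaded with `cv2.imread`). For color images the absolute difference is computed separately for each channel.
Common uses	1. Background subtraction: detect moving objects in video; 2. Image comparison: check whether two images are similar; 3. Change detection: find what changed between images.
Application scenarios	1. Motion detection: detect foreground objects from the difference between the background and the current frame; 2. Surveillance: detect background and foreground changes; 3. Image registration and comparison.
Performance	Computes the per-pixel difference between two images quickly; well suited to image comparison, background subtraction, and similar tasks.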
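
To make the table concrete, here is a minimal NumPy-only sketch of what an absolute difference computes and how it can localize a gap. The tiny synthetic arrays, the helper names `abs_diff` and `leftmost_changed_column`, and the threshold of 30 are illustrative assumptions; they are not taken from the target site or from the author's code.

```python
import numpy as np


def abs_diff(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-pixel |a - b| for uint8 images (what cv2.absdiff yields for 8-bit input)."""
    # Widen to int16 first: plain uint8 subtraction would wrap around (10 - 200 -> 66).
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)


def leftmost_changed_column(diff: np.ndarray, threshold: int = 30) -> int:
    """Return the x index of the first column that contains a difference above threshold."""
    changed_columns = np.where((diff > threshold).any(axis=0))[0]
    if changed_columns.size == 0:
        raise ValueError("no difference found between the two images")
    return int(changed_columns[0])


# Synthetic 6x10 "complete" image and a copy with a darker 3x3 "gap" starting at x = 5.
full = np.full((6, 10), 200, dtype=np.uint8)
with_gap = full.copy()
with_gap[2:5, 5:8] = 10

diff = abs_diff(full, with_gap)
gap_x = leftmost_changed_column(diff)
print(gap_x)
```

On real CAPTCHA images the difference is noisier, which is why an edge and contour step with size filtering is still useful after the subtraction.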

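So we first convert both images to grayscale and then compute the absolute difference. The result looks like this:

Next, we extract edges with Canny edge detection and finally run contour detection, keeping the contour that meets our size constraints. The complete code is as follows: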

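import cv2
import numpy as np


def detect_slider_gap_with_canny(original_path, with_hole_path):
    """
    Detect the gap of a slider CAPTCHA with Canny edge detection and
    return the x coordinate the slider needs to move to.

    Args:
        original_path (str): Path to the original image (no gap).
        with_hole_path (str): Path to the image with the gap.

    Returns:
        int: The x coordinate the slider needs to move to.
    """
    # Read both images in grayscale mode
    original = cv2.imread(original_path, cv2.IMREAD_GRAYSCALE)
    with_hole = cv2.imread(with_hole_path, cv2.IMREAD_GRAYSCALE)

    # cv2.imread returns None instead of raising when a file cannot be read
    if original is None or with_hole is None:
        raise ValueError("Failed to read an input image; check the paths!")

    # Make sure the two images have the same size
    if original.shape != with_hole.shape:
        raise ValueError("The two images differ in size; check the input images!")

    # Compute the absolute difference
    diff = cv2.absdiff(original, with_hole)
    cv2.imshow("diff", diff)
    cv2.waitKey(0)

    # Extract edges with Canny edge detection
    edges = cv2.Canny(diff, threshold1=50, threshold2=150)

    cv2.imshow("edges", edges)
    cv2.waitKey(0)

    # Find contours in the edge map
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Walk the contours and return the x coordinate of the first plausible gap
    for contour in contours:
        # Bounding rectangle of the contour
        x, y, w, h = cv2.boundingRect(contour)

        # Assume the gap's width and height fall within a known range
        if 20 < w < 100 and 20 < h < 100:  # Tune the range for the specific CAPTCHA
            return x

    # No gap detected
    raise ValueError("No gap detected; check the input images!")


# Example usage
if __name__ == "__main__":
    original_path = "1.png"  # Path to the original image
    with_hole_path = "2.png"  # Path to the image with the gap

    try:
        slider_x = detect_slider_gap_with_canny(original_path, with_hole_path)
        print(f"x value the slider needs to move to: {slider_x}")
    except Exception as e:
        print(f"Error: {e}")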

Getting Better

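With the followers' question answered, let's look at a slightly more complex case shared recently by Knowledge Planet members: the odd-one-out click CAPTCHA. Given an image, the task is to pick the pattern or character whose type differs from the rest:

This type of CAPTCHA is shown in the figure below:

The main idea is PCA feature reduction plus cosine similarity. The key code is as follows: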

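import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity


def find_anomalous_image(images):
    # Extract features for every image. extract_features is the author's helper
    # (not shown in the article); it must return a 1-D feature vector per image.
    features = [extract_features(img) for img in images]

    # Set PCA's n_components to the smaller of the sample count and feature count
    n_components = min(len(features), len(features[0]))
    pca = PCA(n_components=n_components)
    reduced_features = pca.fit_transform(features)

    # Compute cosine similarity
    similarity_matrix = cosine_similarity(reduced_features)
    average_similarity = np.mean(similarity_matrix, axis=1)

    # Index of the image with the lowest average similarity
    anomalous_index = np.argmin(average_similarity)
    return anomalous_index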
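
The "lowest average similarity" rule can be checked on toy data. The sketch below is NumPy-only and uses four made-up 3-D vectors in place of real image features; `cosine_similarity_matrix` and `least_similar_index` are hypothetical helper names, and the PCA step is left out so that only the selection rule is shown.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T


def least_similar_index(vectors: np.ndarray) -> int:
    """Index of the row whose mean similarity to all rows is lowest."""
    similarity = cosine_similarity_matrix(vectors)
    return int(np.argmin(similarity.mean(axis=1)))


# Four made-up feature vectors: three point roughly the same way, one does not.
toy_features = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.9],  # the odd one out
])
odd_index = least_similar_index(toy_features)
print(odd_index)
```

Each row's mean includes its similarity with itself (always 1), which shifts every mean by the same amount and so does not change which row is lowest.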

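Following the same idea, other methods work too. Since the patterns differ, we can score them with template matching: again loop over every pattern, compute the average template-matching score between it and each of the others, and finally use np.argmin to get the index of the character with the lowest average similarity, which is the answer: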

import cv2
import ddddocr
import numpy as np


# Initialize the ddddocr detector
det = ddddocr.DdddOcr(det=True, show_ad=False)

# Common size every crop is resized to before matching (an assumed value; tune as needed).
# cv2.matchTemplate requires the template to be no larger than the image,
# so crops of different sizes cannot be compared directly.
MATCH_SIZE = (64, 64)

def cv_show(img):
    cv2.imshow("img", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

def preprocess_image(image):
    """
    Preprocess an image: grayscale, binarize, and resize to MATCH_SIZE.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
    return cv2.resize(binary, MATCH_SIZE, interpolation=cv2.INTER_NEAREST)

def extract_cropped_regions(image_path):
    """
    Crop the detected character regions out of the CAPTCHA image.

    :param image_path: path to the CAPTCHA image
    :return: list of (cropped character image, bounding box) tuples
    """
    original_image = cv2.imread(image_path)
    with open(image_path, 'rb') as f:
        image_data = f.read()
    poses = det.detection(image_data)  # Detect character positions

    cropped_images = []
    for bbox in poses:
        x_min, y_min, x_max, y_max = bbox
        cropped = original_image[y_min:y_max, x_min:x_max]
        cropped_images.append((cropped, bbox))

    return cropped_images

def match_cropped_regions(cropped_images):
    """
    Compare the cropped character images with template matching and find the odd one.

    :param cropped_images: cropped character images with their bounding boxes
    :return: index of the anomalous character
    """
    num_images = len(cropped_images)
    similarity_scores = np.zeros((num_images, num_images))

    # Preprocess every crop once; all results share the same size
    processed = [preprocess_image(img) for img, _ in cropped_images]

    for i, img1_processed in enumerate(processed):
        for j, img2_processed in enumerate(processed):
            if i != j:
                # Template-matching score (a 1x1 result because the sizes are equal)
                result = cv2.matchTemplate(img1_processed, img2_processed, cv2.TM_CCOEFF_NORMED)
                similarity_scores[i, j] = np.max(result)

    # Average similarity of each character to the others
    print(similarity_scores)
    average_similarity = np.mean(similarity_scores, axis=1)
    print(average_similarity)

    # Index of the character with the lowest average similarity
    anomalous_index = np.argmin(average_similarity)
    return anomalous_index

def main():
    # Path to the input CAPTCHA image
    captcha_image_path = "1.png"  # Replace with your CAPTCHA path

    # Crop the character regions
    cropped_images = extract_cropped_regions(captcha_image_path)

    # Find the anomalous character
    anomalous_index = match_cropped_regions(cropped_images)

    # Visualize the result
    original_image = cv2.imread(captcha_image_path)
    for i, (_, bbox) in enumerate(cropped_images):
        x_min, y_min, x_max, y_max = bbox
        color = (0, 255, 0) if i != anomalous_index else (0, 0, 255)  # Red box marks the anomalous character
        cv2.rectangle(original_image, (x_min, y_min), (x_max, y_max), color, 2)

    # Save the result
    output_path = "output.png"
    cv2.imwrite(output_path, original_image)
    print(f"Result saved to: {output_path}")

    # Show the result
    cv2.imshow("Matched Result", original_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
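
A note on what the score means: cv2.matchTemplate needs the template to be no larger than the image it slides over, and when both patches have identical dimensions, TM_CCOEFF_NORMED collapses to a single value equal to the Pearson correlation of their pixels. The NumPy-only sketch below reproduces that number on made-up 4x4 glyphs; `normalized_correlation` is a hypothetical helper, and returning 0.0 for a flat patch is a choice made here for illustration.

```python
import numpy as np


def normalized_correlation(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Pearson correlation between two equally sized grayscale patches."""
    if patch_a.shape != patch_b.shape:
        raise ValueError("patches must have the same shape; resize them first")
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    denominator = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denominator == 0:
        return 0.0  # a flat patch has no structure to correlate with
    return float((a * b).sum() / denominator)


# Made-up binary glyphs: a vertical bar, an identical copy, and a horizontal bar.
vertical = np.zeros((4, 4), dtype=np.uint8)
vertical[:, 1] = 255
horizontal = np.zeros((4, 4), dtype=np.uint8)
horizontal[1, :] = 255

same_score = normalized_correlation(vertical, vertical.copy())
cross_score = normalized_correlation(vertical, horizontal)
print(same_score, cross_score)
```

Identical glyphs score close to 1 and structurally different ones score much lower, which is what makes the row-wise average a usable odd-one-out signal.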

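By the same token, a Siamese network built on ResNet50 (SiameseResNet50) can also solve this type of CAPTCHA without any training. ResNet50 has already been trained on large image datasets such as ImageNet, and the pretrained model performs very well on image classification, feature extraction, and other computer vision tasks: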

import cv2
import ddddocr

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.models as models

import numpy as np

# Use ddddocr for detection
det = ddddocr.DdddOcr(det=True, show_ad=False)

# Define the Siamese network
class SiameseResNet50(nn.Module):
    def __init__(self):
        super(SiameseResNet50, self).__init__()
        # Load the pretrained ResNet50 model
        # (newer torchvision releases prefer the weights= argument over pretrained=True)
        resnet50 = models.resnet50(pretrained=True)
        # Drop the final fully connected layer
        self.resnet50 = nn.Sequential(*list(resnet50.children())[:-1])
        # Note: this head is randomly initialized and never trained here,
        # so it acts as a random projection of the ResNet50 features.
        self.fc = nn.Sequential(
            nn.Linear(resnet50.fc.in_features, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
        )

    def forward_one(self, x):
        # Extract the feature vector of one image
        x = self.resnet50(x)
        x = x.view(x.size(0), -1)  # Flatten
        x = self.fc(x)
        return x

    def forward(self, x1, x2):
        output1 = self.forward_one(x1)
        output2 = self.forward_one(x2)
        return output1, output2


# Extract image features
def extract_features(image, model, transform):
    image = cv2.resize(image, (224, 224))  # Resize to ResNet50's input size
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert to RGB
    image = transform(image).unsqueeze(0)  # Apply the transform and add a batch dimension
    model.eval()
    with torch.no_grad():
        feature = model.forward_one(image).numpy()
    return feature


# Find the anomalous image
def find_anomalous_image(images, model, transform):
    # Extract features for every image
    features = [extract_features(img, model, transform) for img in images]
    features = np.vstack(features)

    # Pairwise Euclidean distance between feature vectors
    distance_matrix = np.linalg.norm(features[:, np.newaxis] - features[np.newaxis, :], axis=2)
    average_distance = np.mean(distance_matrix, axis=1)
    print(average_distance)

    # The image with the largest average distance is the odd one out
    anomalous_index = np.argmax(average_distance)
    return anomalous_index


def main():
    # Load the pretrained model
    model = SiameseResNet50()
    # Define the image transform (ImageNet statistics, which the pretrained weights expect)
    transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    ])

    # Read the input image
    image_path = "1.png"  # Replace with your image path
    original_image = cv2.imread(image_path)
    with open(image_path, 'rb') as f:
        image_data = f.read()
    poses = det.detection(image_data)

    # Crop the detected regions
    cropped_images = []
    for bbox in poses:
        x_min, y_min, x_max, y_max = bbox
        cropped = original_image[y_min:y_max, x_min:x_max]
        cropped_images.append(cropped)

    # Find the most anomalous image
    anomalous_index = find_anomalous_image(cropped_images, model, transform)

    # Print the coordinates of the anomalous image
    print(f"Coordinates of the image that differs from the rest: {poses[anomalous_index]}")

    # Draw a box around every region
    for i, bbox in enumerate(poses):
        x_min, y_min, x_max, y_max = bbox
        color = (0, 255, 0) if i == anomalous_index else (0, 0, 255)  # Green marks the anomaly, red the others
        cv2.rectangle(original_image, (x_min, y_min), (x_max, y_max), color, 2)

    # Save the result image
    output_path = "output.png"
    cv2.imwrite(output_path, original_image)
    print(f"Result saved to: {output_path}")


if __name__ == "__main__":
    main()

Coming Into Its Own

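After the walkthroughs above we have handled two types of CAPTCHA, which are arguably the easier ones. For some types, though, extracting only the dominant features through dimensionality reduction and then picking the answer falls far short. Take the upgraded version of the odd-one-out click CAPTCHA above: the font-style CAPTCHA. To pick the right answer from glyphs whose dominant features are almost identical, we have to stack several kinds of features and account for the typeface, strokes, and size of the text. This type of CAPTCHA is shown in the figure below:

As you can see, feature extraction for this kind of CAPTCHA cannot be solved by plain template matching or a direct similarity score. The recipe stays the same, though: we still convert the image to grayscale first, then create a feature container, features, to hold the collected features.

Extracting contour area and perimeter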

_, binary_image = cv2.threshold(inverted_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Contour area and perimeter
if contours:
    largest_contour = max(contours, key=cv2.contourArea)
    contour_area = cv2.contourArea(largest_contour)
    contour_perimeter = cv2.arcLength(largest_contour, True)
    features.extend([contour_area, contour_perimeter])

    # Aspect ratio of the contour's bounding box
    _, _, width, height = cv2.boundingRect(largest_contour)
    aspect_ratio = width / height
    features.append(aspect_ratio)
else:
    features.extend([0, 0, 0])  # No contour found: pad with zeros

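Stroke width extraction

The most important feature for this type of CAPTCHA is the stroke; we mainly extract the mean and standard deviation of the stroke width. The code approximates this with the difference between the dilated and eroded image (a morphological gradient), so the values are a proxy for stroke width rather than a direct measurement: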

kernel = np.ones((3, 3), np.uint8)
dilated_image = cv2.dilate(binary_image, kernel, iterations=1)
eroded_image = cv2.erode(binary_image, kernel, iterations=1)
stroke_width_map = dilated_image - eroded_image
stroke_width_mean = np.mean(stroke_width_map)
stroke_width_std = np.std(stroke_width_map)
features.extend([stroke_width_mean, stroke_width_std])

Skeleton extraction

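from skimage.morphology import skeletonize  # assuming scikit-image's skeletonize

skeleton_image = skeletonize(binary_image > 0)  # Convert to a boolean mask, then skeletonize
skeleton_length = np.sum(skeleton_image)
features.append(skeleton_length)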

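Grayscale statistics and local gradients

Gradient statistics mainly capture edge strength; fonts of different styles sometimes differ noticeably in their edge characteristics: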

# Grayscale statistics
mean_intensity = np.mean(gray_image)  # Mean gray level
std_intensity = np.std(gray_image)  # Standard deviation of the gray level
features.extend([mean_intensity, std_intensity])

# Local gradient analysis (edge strength)
gradient_x = cv2.Sobel(inverted_image, cv2.CV_64F, 1, 0, ksize=3)
gradient_y = cv2.Sobel(inverted_image, cv2.CV_64F, 0, 1, ksize=3)
gradient_magnitude = np.sqrt(gradient_x ** 2 + gradient_y ** 2)
gradient_mean = np.mean(gradient_magnitude)
gradient_std = np.std(gradient_magnitude)
features.extend([gradient_mean, gradient_std])

Local features

Local features mainly capture the shape and structure of the glyph:

# Local features (split the image into a grid)
grid_size = 8  # Number of cells per side
cell_height, cell_width = resized_image.shape[0] // grid_size, resized_image.shape[1] // grid_size
local_grid_features = []

# Walk every grid cell and record the mean of each small region
for i in range(grid_size):
    for j in range(grid_size):
        cell = resized_image[i * cell_height:(i + 1) * cell_height, j * cell_width:(j + 1) * cell_width]
        local_grid_features.append(np.mean(cell))  # Mean of the grid cell

features.extend(local_grid_features)

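Finally, merge the features, compute the similarity matrix, and return to the similarity calculation used earlier: take the mean per image and find the index with the lowest average similarity: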

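# all_features: one feature row per cropped glyph (shape: n_images x n_features),
# built by stacking the per-image features lists collected above
similarity_matrix = cosine_similarity(all_features)

# Average similarity of each image to the others
average_similarity = np.mean(similarity_matrix, axis=1)
print("Average similarity:", average_similarity)

# Index of the image with the lowest average similarity
anomalous_index = np.argmin(average_similarity)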
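
One thing worth watching with hand-built features is scale: a contour area in the hundreds and an aspect ratio near 1 sit in the same vector, so raw cosine similarity is dominated by the largest-magnitude columns. A possible refinement, which is not part of the original article, is to z-score each column before comparing. The sketch below uses made-up rows of [contour_area, aspect_ratio, stroke_mean]; `standardize` and `lowest_mean_cosine` are hypothetical helpers, and whether this helps on real CAPTCHAs needs to be verified empirically.

```python
import numpy as np


def standardize(feature_matrix: np.ndarray) -> np.ndarray:
    """Z-score each column so every feature contributes on a comparable scale."""
    mean = feature_matrix.mean(axis=0)
    std = feature_matrix.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information; avoid dividing by zero
    return (feature_matrix - mean) / std


def lowest_mean_cosine(feature_matrix: np.ndarray) -> int:
    """Index of the row with the lowest average cosine similarity to all rows."""
    norms = np.linalg.norm(feature_matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero rows
    unit = feature_matrix / norms
    return int(np.argmin((unit @ unit.T).mean(axis=1)))


# Made-up feature rows; only the last one has an unusual stroke value.
raw = np.array([
    [500.0, 1.00, 20.0],
    [510.0, 1.02, 21.0],
    [490.0, 0.98, 19.0],
    [505.0, 0.99, 20.0],
    [495.0, 1.01, 20.0],
    [500.0, 1.00, 60.0],
])
odd_index = lowest_mean_cosine(standardize(raw))
print(odd_index)
```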

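The final result looks like this:

Smooth Sailing

At this point we have handled three types of CAPTCHA. Finally, let's use cv to tackle rotation CAPTCHAs. This approach is sufficient for small and medium-sized sites, and it is also one way to deal with AI-generated CAPTCHAs: for AI-generated rotation images, a trained model usually does not generalize well enough to adapt, and if a single model could solve things once and for all, the frequent updates made by risk-control teams would be pointless:

Perceptual hash (pHash) + Hamming distance

Take the CAPTCHA of a certain site (某度) as an example. When the image library is limited, this is undoubtedly a workable solution. The idea is to extract the dominant low-frequency features of the image, compute their mean, and generate a binary hash. The code is as follows: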

import cv2
import numpy as np


def phash(image):
    resized = cv2.resize(image, (32, 32), interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    dct = cv2.dct(np.float32(gray))
    dct_low = dct[:8, :8]  # Keep the top-left low-frequency block
    mean_val = np.mean(dct_low)
    hash_str = ''.join(['1' if x > mean_val else '0' for x in dct_low.flatten()])
    return hash_str

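We read the upright (correctly oriented) images, collect their hash values in a list, and store the list in a serialized file, in the following format: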

['1010100001000000101000000010000010000000100000001000000000000000', '1011101000110010111000001010010010000001000000001000000000100010', '1010101100000000111010001000000010000000000000000000000000000000', '1110100000001100101000001000000010000000100000000000000000010000', '1010001010100010101000000000001000000010000010001000001000001000', '1110001100110000101110001100000010000000000000101000000000000000']

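Comparing hash values then calls for the Hamming distance.

How Hamming distance works

Hamming distance measures how similar two equal-length strings are: it is the number of positions at which the corresponding characters differ. Here it is used to compare two hash values. The smaller the distance, the more similar the images; the larger, the more different. Note that `scipy.spatial.distance.hamming` returns the fraction of differing positions rather than the raw count: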

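from scipy.spatial.distance import hamming

hash1 = "1100101011110000"
hash2 = "1100101011010000"
# scipy's hamming() returns the fraction of differing positions (here 1/16 = 0.0625)
ratio = hamming(list(hash1), list(hash2))
distance = int(round(ratio * len(hash1)))  # Number of differing bits
print(f"Hamming distance: {distance} (fraction: {ratio})")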
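
For reference, the count-based definition can also be written without SciPy. This is a minimal pure-Python sketch; `hamming_count` and `hamming_ratio` are hypothetical helper names, and the two hashes are the ones from the example above.

```python
def hamming_count(hash_a: str, hash_b: str) -> int:
    """Number of positions at which two equal-length hash strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


def hamming_ratio(hash_a: str, hash_b: str) -> float:
    """Fraction of differing positions (the quantity SciPy's hamming() reports)."""
    return hamming_count(hash_a, hash_b) / len(hash_a)


hash1 = "1100101011110000"
hash2 = "1100101011010000"
count = hamming_count(hash1, hash2)
ratio = hamming_ratio(hash1, hash2)
print(count, ratio)
```

Either form ranks candidates identically as long as every hash has the same length, so the choice only matters when a fixed distance threshold is applied.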

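Finally, rotate the image through a full 360 degrees, computing the hash at each angle, and pick the angle whose hash is most similar; that completes the recognition of the rotation CAPTCHA: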

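import cv2
import numpy as np
from scipy.spatial.distance import hamming

# phash() is the function defined earlier in this section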

def rotate_image(image, angle):
    h, w = image.shape[:2]
    center = (w // 2, h // 2)
    matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))

def find_best_angle(target_image, reference_image, step=1):
    target_hash = phash(target_image)
    best_angle, min_distance = 0, float('inf')
    for angle in range(0, 360, step):
        rotated = rotate_image(reference_image, angle)
        rotated_hash = phash(rotated)
        distance = hamming(list(target_hash), list(rotated_hash))
        if distance < min_distance:
            best_angle, min_distance = angle, distance
    return best_angle

For some angles the clockwise and counter-clockwise conventions differ, in which case use 360 minus the computed angle.
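
As a small sketch of that conversion (`flip_direction` is a hypothetical helper name), taking the result modulo 360 keeps the angle in the range 0 to 359 even when the input is 0:

```python
def flip_direction(angle: float) -> float:
    """Convert an angle between clockwise and counter-clockwise conventions."""
    return (360 - angle) % 360


flipped = flip_direction(90)
print(flipped)
```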

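For more detailed code, see the GitHub repository a helpful community member has already put together; the approach is much the same:

Related link: https://github.com/decodecaptcha/Rotate-Captcha-Angle-Prediction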

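SIFT feature matching + affine transform

How SIFT works

SIFT is a classic feature extraction algorithm that extracts keypoints and their local descriptors from an image. It is rotation-invariant and scale-invariant, which makes it suitable for matching image features in rotation CAPTCHAs.

In the labeling stage, first prepare a set of known-correct images (that is, upright images) whose rotation angle is known. Then compute the histogram features of these images and save them to a pkl file for use in the later prediction stage: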

import cv2
import pickle
import numpy as np

def calculate_histogram(image):
    # Compute the grayscale histogram of the image
    grayscale_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([grayscale_image], [0], None, [256], [0, 256])
    # Normalize the histogram
    cv2.normalize(hist, hist, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
    return hist

def save_histograms(images, output_file):
    histograms = {'baidu': {}, 'allimg': {}}

    for idx, img_path in enumerate(images):
        img = cv2.imread(img_path)  # Read the image
        hist = calculate_histogram(img)  # Compute its grayscale histogram
        histograms['baidu'][f"image_{idx}"] = hist  # Store the histogram under the 'baidu' key
        histograms['allimg'][f"image_{idx}"] = img  # Store the image data under the 'allimg' key

    # Save the data to a pkl file
    with open(output_file, 'wb') as file:
        pickle.dump(histograms, file)

    print(f"saved to {output_file}")

# Example: save the histogram data and the image collection
image_paths = ['5_288.jpeg', '6_268.jpeg']  # Example image paths
output_file = 'baidu.pkl'  # Output file path
save_histograms(image_paths, output_file)

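Next, read the image that has not been straightened, compute its histogram, and find the best-matching image among those stored in the pkl file; cv2.compareHist() can be used to compute the histogram similarity between images. Once a similar image is found, use SIFT feature matching and an affine transform to compute the rotation angle. (The code below substitutes ORB for SIFT, as its comments note.)

The flow is as follows: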

import cv2
import time
import pickle
import numpy as np


# Load the histogram model from a file
def load_histograms_from_file(file_path):
    with open(file_path, 'rb') as file:
        histograms_data = pickle.load(file)
    return histograms_data

# Compute the rotation angle between two images by feature matching
def compute_rotation_angle_by_features(reference_img, query_img):
    # Convert the images to grayscale
    query_gray = cv2.cvtColor(query_img, cv2.COLOR_BGR2GRAY)
    reference_gray = cv2.cvtColor(reference_img, cv2.COLOR_BGR2GRAY)

    # Use the ORB feature extractor in place of SIFT (ORB is more efficient)
    orb = cv2.ORB_create()

    # Extract keypoints and descriptors
    kp1, des1 = orb.detectAndCompute(query_gray, None)
    kp2, des2 = orb.detectAndCompute(reference_gray, None)

    # detectAndCompute returns None descriptors when no keypoints are found
    if des1 is None or des2 is None:
        print("No descriptors found in one of the images.")
        return None

    # Match the descriptors with a brute-force matcher
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(des1, des2)
    matches = sorted(matches, key=lambda x: x.distance)

    # A partial affine transform needs at least two point pairs
    if len(matches) < 2:
        print("Not enough matches to estimate a transform.")
        return None

    # Collect the matched points
    src_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Estimate the transform; RANSAC rejects wrong matches
    matrix, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts, method=cv2.RANSAC, ransacReprojThreshold=5.0)

    if matrix is not None:
        # Extract the rotation angle (same sign convention as cv2.getRotationMatrix2D)
        angle = np.arctan2(matrix[0, 1], matrix[0, 0]) * (180 / np.pi)
    else:
        angle = None
        print("Failed to estimate the transform matrix.")

    return angle

# Rotate an image by the given angle
def rotate_image(img, angle):
    height, width = img.shape[:2]
    # Compute the rotation matrix
    rotation_matrix = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1)

    # Apply the rotation
    rotated_img = cv2.warpAffine(img, rotation_matrix, (width, height), flags=cv2.INTER_CUBIC)
    return rotated_img

# Estimate the rotation angle of an image, using histogram matching to pick the reference
def estimate_image_angle(histograms, image_collection, query_img):
    start_time = time.time()

    # Compute the grayscale histogram of the query image
    query_gray = cv2.cvtColor(query_img, cv2.COLOR_BGR2GRAY)
    query_hist = cv2.calcHist([query_gray], [0], None, [256], [0, 256])
    cv2.normalize(query_hist, query_hist, 0, 1, cv2.NORM_MINMAX)

    best_match_score = -1
    best_match_key = None

    # Find the stored image most similar to the query image
    for key, hist in histograms.items():
        similarity = cv2.compareHist(query_hist, hist, cv2.HISTCMP_CORREL)
        if similarity > best_match_score:
            best_match_score = similarity
            best_match_key = key

    print(f"Best matching image: {best_match_key}")

    # Use best_match_key to fetch the image from the image_collection dict
    best_match_image = image_collection[best_match_key]

    # Compute the rotation angle between the best match and the query image
    rotation_angle = compute_rotation_angle_by_features(best_match_image, query_img)
    end_time = time.time()

    if rotation_angle is None:
        print(f"Elapsed: {end_time - start_time:.2f} s, rotation angle could not be estimated.")
    else:
        print(f"Elapsed: {end_time - start_time:.2f} s, rotation angle: {rotation_angle:.2f} degrees.")

    # Show the best matching image
    cv2.imshow("Best Match Image", best_match_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

    return rotation_angle


# Load the histogram model and the image collection
histograms_data = load_histograms_from_file('baidu.pkl')
histogram_model = histograms_data['baidu']
image_dataset = histograms_data['allimg']
query_image_path = '5.jpeg'  # Example query image path

query_img = cv2.imread(query_image_path)

# Estimate the rotation angle of the query image
estimated_angle = estimate_image_angle(histogram_model, image_dataset, query_img)
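
The angle extraction step relies on the layout of the matrix. cv2.getRotationMatrix2D documents its linear part as [[a, b], [-b, a]] with a = scale * cos(angle) and b = scale * sin(angle), so arctan2 of the (0, 1) and (0, 0) entries recovers the angle in OpenCV's convention regardless of scale. The NumPy-only sketch below checks this round trip; `rotation_matrix_2d` and `angle_from_matrix` are hypothetical helpers, and the translation column is omitted because it does not affect the angle.

```python
import numpy as np


def rotation_matrix_2d(angle_deg: float, scale: float = 1.0) -> np.ndarray:
    """2x2 linear part laid out like cv2.getRotationMatrix2D (translation omitted)."""
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    return np.array([[a, b], [-b, a]])


def angle_from_matrix(matrix: np.ndarray) -> float:
    """Recover the rotation angle in degrees from a similarity-transform matrix."""
    return float(np.degrees(np.arctan2(matrix[0, 1], matrix[0, 0])))


recovered = angle_from_matrix(rotation_matrix_2d(30.0))
recovered_scaled = angle_from_matrix(rotation_matrix_2d(-45.0, scale=1.7))
print(recovered, recovered_scaled)
```

Whether the recovered value must be negated or subtracted from 360 before it is submitted depends on which image was treated as the source and on the direction the target site expects.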

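The final result is as follows:

A more refined approach is to build a mask and cut out the central image, which improves accuracy. You can design a labeling tool around this idea and complete a round of countermeasures quickly; continuing to iterate as the defender updates is a good strategy. The labeled dataset for this part contains about 5,000 samples; I will upload it to Knowledge Planet (知識星球) later, for learning and exchange only.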
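
The masking idea can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the rotating picture is taken to be a circle centered in the image, and `circular_mask` and `apply_mask` are hypothetical helper names, not the author's labeling tool.

```python
import numpy as np


def circular_mask(height: int, width: int, radius=None) -> np.ndarray:
    """Boolean mask that is True inside a circle centered in the image."""
    center_y, center_x = (height - 1) / 2, (width - 1) / 2
    if radius is None:
        radius = min(height, width) / 2
    yy, xx = np.ogrid[:height, :width]
    return (yy - center_y) ** 2 + (xx - center_x) ** 2 <= radius ** 2


def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out every pixel outside the mask (works for gray and color images)."""
    masked = image.copy()
    masked[~mask] = 0
    return masked


# Made-up 5x5 white color image standing in for a CAPTCHA picture.
image = np.full((5, 5, 3), 255, dtype=np.uint8)
mask = circular_mask(5, 5)
masked = apply_mask(image, mask)
print(mask.astype(int))
```

Blanking the corners before hashing or feature matching keeps the static background from contributing to the comparison.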