Building a CAPTCHA Recognition Service with Python and Deep Learning (Part 1)

Published by 簡單而真實 on 2019-10-17

Note: not for commercial use.

Python environment: Python 3.5

Section 1: Preparation

1. Introduction

The project is built on Python + CNN + TensorFlow. Training uses the CPU version of TensorFlow; as long as your machine has at least 8 GB of RAM, you can follow this article, swap in your own training samples, tweak a few model parameters, and train a model that meets your needs.

2. Common character CAPTCHA formats

[Image: examples of common character CAPTCHA styles]
The CAPTCHA images shown above do not represent any actual website; any resemblance is coincidental. This project may only be used for learning and exchange, not for illegal purposes. Common verification rules are to type the characters shown in the image, or to type only the characters of a specified color among colored characters.

3. Obtaining training samples

The model's accuracy is directly tied to the quality and quantity of training samples. In my experience training several CAPTCHA recognition models, for CAPTCHAs made of upper/lowercase letters plus digits, roughly ten thousand labeled samples are enough to reach about 80% accuracy. If you exploit the characteristics of a specific CAPTCHA and apply some simple image preprocessing before training and recognition, even fewer samples are needed: after such processing, the fifth and sixth CAPTCHA types in the image above reach about 90% accuracy with only 2,000 samples.
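
As an illustration of the kind of simple image preprocessing meant here, a minimal sketch using Pillow (the threshold of 180 and the file paths are assumptions to tune and replace for your own samples):

from PIL import Image

def preprocess(src, dst, threshold=180):
    '''Grayscale and binarize a CAPTCHA image to strip light-colored noise.'''
    img = Image.open(src).convert("L")                        # convert to grayscale
    img = img.point(lambda p: 255 if p > threshold else 0)    # binarize with an assumed threshold
    img.save(dst)

preprocess("./yzm_pic/0.png", "./yzm_pic/0_clean.png")   # hypothetical paths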

3.1 Manual labeling services

Manual labeling is the most common approach. Several companies offer paid CAPTCHA-solving services, and their APIs can be used to label samples in bulk. The drawbacks are also obvious: the service costs money, and a fraction of the labels will be wrong, which hurts the final model. Incorrect labels can be filtered out with some extra logic, for example by verifying that a labeled answer is actually correct; I leave the details to the reader.

3.2 Simulated generation

Analyze the characteristics of the CAPTCHA and generate similar, or even identical, images programmatically. This requires more effort up front, but yields an unlimited supply of training samples.

3.3 Generating CAPTCHAs with Python

Real-world CAPTCHAs (collected from the web)

[Image: real-world CAPTCHA samples]

Programmatically generated CAPTCHAs

[Image: CAPTCHAs generated by the code below]
Points to watch when generating CAPTCHAs in code:

Font file: find the font actually used by the real images
Font size: determine it by visual comparison with the real images
Interference: analyze the noise in the real images and simulate it

Python code used for generation:

Other CAPTCHA styles can be generated by adapting the code below.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/10/12 10:01
# @Author  : shm
# @Site    : 
# @File    : create_yzm.py
# @Software: PyCharm
import random
from PIL import Image,ImageDraw,ImageFont
def getRandomColor():
    '''
    Generate a random RGB color
    :return:
    '''
    r = random.randint(0,255)
    g = random.randint(0,255)
    b = random.randint(0,255)
    return (r,g,b)

def getRandomChar():
    '''
    Pick a random character (0, l and other confusable characters are left out)
    :return:
    '''
    charlist = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
    random_char = random.choice(charlist)
    return random_char

def genImg(width,height,font_size,chr_num):
    '''
    Generate one width*height CAPTCHA image
    :param width: image width
    :param height: image height
    :param font_size: font size
    :param chr_num: number of characters
    :return:
    '''
    #bg_color = getRandomColor()
    bg_color = (255,255,255)   # white background
    # create the background image
    img = Image.new(mode="RGB",size=(width,height),color=bg_color)
    # get a drawing handle used to render the characters
    draw = ImageDraw.Draw(img)
    # set the font; the font file must be available on the machine
    font = ImageFont.truetype(font="Action Jackson",size=font_size)
    #font = ImageFont.truetype(font="華文彩雲", size=font_size)
    for i in range(chr_num):
        # draw chr_num random characters (random colors are commented out, blue is used)
        random_txt = getRandomChar()
        #txt_color = getRandomColor()
        txt_color = (0,0,255)   # blue text
        # while txt_color == bg_color:
        #     txt_color = getRandomColor()
        draw.text((36+16*i,5),text=random_txt,fill=txt_color,font=font)
    # draw interference lines
    drawLine(draw,width,height)
    # draw noise points
    drawPoint(draw,width,height)
    return img
def drawLine(draw,width,height):
    '''
    Draw random short horizontal interference lines
    :param draw:
    :param width:
    :param height:
    :return:
    '''
    for i in range(10):
        x1 = random.randint(0, width)
        #x2 = random.randint(0,width-x1)
        x2 = x1+random.randint(0,25)
        y1 = random.randint(0, height)
        y2 = y1
        #y2 = random.randint(0, height)
        #draw.line((x1, y1, x2, y2), fill=getRandomColor())
        draw.line((x1, y1, x2, y2), fill=(0,0,255))
def drawPoint(draw,width,height):
    '''
    Add noise points
    :param draw:
    :param width:
    :param height:
    :return:
    '''
    for i in range(5):
        x = random.randint(0, 40)
        y = random.randint(0, height)
        #draw.point((x, y), fill=getRandomColor())
        draw.point((x, y), fill=(0,0,255))
def drawOther(draw):
    '''
    Add custom noise
    :return:
    '''
    pass
def genyzm():
    '''
    Generate CAPTCHA images and save them to disk
    :return:
    '''
    # image width
    width = 106
    # image height
    height = 30
    # font size
    font_size = 20
    # number of characters
    chr_num = 4
    # output directory for the generated CAPTCHAs (must exist)
    path = "./yzm_pic/"
    for i in range(10):
        img = genImg(width,height,font_size,chr_num)
        # note: for training, name each file with the text it contains
        # so the label can later be read back from the file name
        dir  = path + str(i)+".png"
        with open(dir,"wb") as fp:
            img.save(fp,format="png")

if __name__=="__main__":
    try:
        genyzm()
    except Exception as e:
        print(e)


Note: real CAPTCHAs often avoid easily confused characters such as 1, 0, O, z and 2, so you can leave those characters out when generating samples, which also reduces the number of label classes the model has to learn.
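
For example, the label alphabet can be built by filtering out the confusable characters (exactly which characters to drop is an assumption you should confirm against the real CAPTCHA):

# Hypothetical reduced alphabet: drop characters that are easy to confuse
CONFUSABLE = set("01oOlIzZ2")
FULL = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
charlist = "".join(c for c in FULL if c not in CONFUSABLE)
print(len(charlist))   # number of label classes the model has to learn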

I am not publishing the generation code for the second format (Chinese characters + digits + letters), because that CAPTCHA is still in use by a live website and I do not want to interfere with its normal operation. The overall approach is the same as the code above; the only differences are that the background is not a single color and the character set includes Chinese characters.

Section 2: Model training

2.1 Recognition approach

The following two CAPTCHAs are used as examples.

[Image: the two example CAPTCHAs, figure 1 and figure 2]
First check whether the CAPTCHA can be split into individual characters: single-character recognition needs far fewer training samples and easily reaches high accuracy. For the two CAPTCHAs above, splitting into single characters means that 2,000 randomly generated CAPTCHAs yield 4 x 2,000 = 8,000 training samples, which is enough for over 90% accuracy.

2.1.1 Analyzing the CAPTCHA

The CAPTCHA in figure 1 is 106x30 and the characters sit on the right-hand side. Opening it in Windows Paint shows that the characters fall within the 36-100 pixel range horizontally, so the first step is to crop that region, leaving a 64x30 image as shown below:

[Image: figure 1 after cropping the main region]

Code to crop the main region:

import os
from PIL import Image

def screen_shot(src,dstpath):
    '''
    Preprocess the image: crop out the main region containing the characters
    :param src: path of the source image
    :param dstpath: output directory for the cropped image
    :return:
    '''
    try:
        img = Image.open(src)
        s = os.path.split(src)
        fn = s[1].split(".")
        basename = fn[0]
        ext = fn[-1]
        box = (36, 0, 100, 30)
        dstdir = dstpath + basename + "." + ext
        img.crop(box).save(dstdir)
    except Exception as e:
        print("screenshot:",e)

The CAPTCHA in figure 2 is 100x38 with evenly spaced characters, so no extra processing is needed.

2.1.2 Splitting the image

Split the cropped figure 1 image evenly, so that each CAPTCHA becomes four small images; the result is shown below:

[Image: figure 1 split into single characters]
Splitting figure 2 evenly into four separate images gives:
[Image: figure 2 split into single characters]
Inspection shows each character is cleanly separated: the pieces from figure 1 are 16x30 and those from figure 2 are 25x38.

Image splitting code:

import os
import time
from PIL import Image

def split_image(src,rownum,colnum,dstpath):
    '''
    Split an image into rownum x colnum pieces; each piece is saved under a
    directory named after its character label
    :param src: source image path (the file name starts with the CAPTCHA text)
    :param rownum: number of rows
    :param colnum: number of columns
    :param dstpath: output directory
    :return:
    '''
    try:
        img = Image.open(src)
        w,h = img.size
        if rownum <= h and colnum<=w:
            s = os.path.split(src)
            fn = s[1].split(".")
            basename = fn[0]
            ext = fn[-1]
            rowheight = h // rownum
            colwidth = w // colnum
            num = 0
            for r in range(rownum):
                for c in range(colnum):
                    # the c-th character of the file name is the label of this piece
                    name = str(basename[c:c+1])
                    t = str(int(time.time()*100000))
                    box = (c*colwidth,r*rowheight,(c+1)*colwidth,(r+1)*rowheight)
                    img.crop(box).save(dstpath+name+"/"+name+"#"+t+"."+ext)
                    num = num + 1
            print("Splitting finished, %s sub-images generated" % num)
        else:
            print("Invalid row/column split parameters")
    except Exception as e:
        print("e:",e)

2.2 The deep learning model

The model is based on AlexNet; see other articles for a detailed introduction to AlexNet itself. The model code is given below.

This version outputs a single character:

# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains a models definition for AlexNet.

This work was first described in:
  ImageNet Classification with Deep Convolutional Neural Networks
  Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton

and later refined in:
  One weird trick for parallelizing convolutional neural networks
  Alex Krizhevsky, 2014

Here we provide the implementation proposed in "One weird trick" and not
"ImageNet Classification", as per the paper, the LRN layers have been removed.

Usage:
  with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    outputs, end_points = alexnet.alexnet_v2(inputs)

@@alexnet_v2
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def alexnet_v2_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      biases_initializer=tf.constant_initializer(0.1),
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope([slim.conv2d], padding='SAME'):
      with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
        return arg_sc
def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2.

  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224. To use in fully
        convolutional mode, set spatial_squeeze to false.
        The LRN layers have been removed and change the initializers from
        random_normal_initializer to xavier_initializer.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.

  Returns:
    the last op containing the log predictions and end_points dict.
  """
  with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.name + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=[end_points_collection]):
      net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                        scope='conv1')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = slim.conv2d(net, 192, [5, 5], scope='conv2')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = slim.conv2d(net, 384, [3, 3], scope='conv3')
      net = slim.conv2d(net, 384, [3, 3], scope='conv4')
      net = slim.conv2d(net, 256, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with slim.arg_scope([slim.conv2d],
                          weights_initializer=trunc_normal(0.005),
                          biases_initializer=tf.constant_initializer(0.1)):
        net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
                          scope='fc6')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout6')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout7')
        net0 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_0')

      # Convert end_points_collection into a end_point dict.
      end_points = slim.utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net0 = tf.squeeze(net0, [1, 2], name='fc8_0/squeezed')
        end_points[sc.name + '/fc8_0'] = net0
      return net0, end_points
alexnet_v2.default_image_size = 224

The model file needs small adjustments depending on how many characters the CAPTCHA contains.

Model that outputs four characters:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)


def alexnet_v2_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      biases_initializer=tf.constant_initializer(0.1),
                      weights_regularizer=slim.l2_regularizer(weight_decay)):
    with slim.arg_scope([slim.conv2d], padding='SAME'):
      with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc:
        return arg_sc


def alexnet_v2(inputs,
               num_classes=1000,
               is_training=True,
               dropout_keep_prob=0.5,
               spatial_squeeze=True,
               scope='alexnet_v2'):
  """AlexNet version 2.

  Described in: http://arxiv.org/pdf/1404.5997v2.pdf
  Parameters from:
  github.com/akrizhevsky/cuda-convnet2/blob/master/layers/
  layers-imagenet-1gpu.cfg

  Note: All the fully_connected layers have been transformed to conv2d layers.
        To use in classification mode, resize input to 224x224. To use in fully
        convolutional mode, set spatial_squeeze to false.
        The LRN layers have been removed and change the initializers from
        random_normal_initializer to xavier_initializer.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    is_training: whether or not the model is being trained.
    dropout_keep_prob: the probability that activations are kept in the dropout
      layers during training.
    spatial_squeeze: whether or not should squeeze the spatial dimensions of the
      outputs. Useful to remove unnecessary dimensions for classification.
    scope: Optional scope for the variables.

  Returns:
    the last op containing the log predictions and end_points dict.
  """
  with tf.variable_scope(scope, 'alexnet_v2', [inputs]) as sc:
    end_points_collection = sc.name + '_end_points'
    # Collect outputs for conv2d, fully_connected and max_pool2d.
    with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                        outputs_collections=[end_points_collection]):
      net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
                        scope='conv1')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
      net = slim.conv2d(net, 192, [5, 5], scope='conv2')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
      net = slim.conv2d(net, 384, [3, 3], scope='conv3')
      net = slim.conv2d(net, 384, [3, 3], scope='conv4')
      net = slim.conv2d(net, 256, [3, 3], scope='conv5')
      net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

      # Use conv2d instead of fully_connected layers.
      with slim.arg_scope([slim.conv2d],
                          weights_initializer=trunc_normal(0.005),
                          biases_initializer=tf.constant_initializer(0.1)):
        net = slim.conv2d(net, 4096, [5, 5], padding='VALID',
                          scope='fc6')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout6')
        net = slim.conv2d(net, 4096, [1, 1], scope='fc7')
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='dropout7')
        net0 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_0')
        net1 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_1')
        net2 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_2')
        net3 = slim.conv2d(net, num_classes, [1, 1],
                          activation_fn=None,
                          normalizer_fn=None,
                          biases_initializer=tf.zeros_initializer(),
                          scope='fc8_3')

      # Convert end_points_collection into a end_point dict.
      end_points = slim.utils.convert_collection_to_dict(end_points_collection)
      if spatial_squeeze:
        net0 = tf.squeeze(net0, [1, 2], name='fc8_0/squeezed')
        end_points[sc.name + '/fc8_0'] = net0
        net1 = tf.squeeze(net1, [1, 2], name='fc8_1/squeezed')
        end_points[sc.name + '/fc8_1'] = net1
        net2 = tf.squeeze(net2, [1, 2], name='fc8_2/squeezed')
        end_points[sc.name + '/fc8_2'] = net2
        net3 = tf.squeeze(net3, [1, 2], name='fc8_3/squeezed')
        end_points[sc.name + '/fc8_3'] = net3
      return net0,net1,net2,net3,end_points
alexnet_v2.default_image_size = 224

What differs:

[Image: code diff with the extra fc8_1..fc8_3 output heads highlighted in red]
For five or six character CAPTCHAs, simply extend the highlighted part in the same way, as sketched below.
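
As a rough illustration (my own sketch, not code from the project), the hand-written fc8_0..fc8_3 heads could be generalized to any number of characters. It assumes it is called inside alexnet_v2, where net, slim, tf, sc and end_points are already defined as in the model code above:

# Minimal sketch: build one fc8_i classification head per character position.
def build_heads(net, num_classes, num_chars, sc, end_points, spatial_squeeze=True):
    heads = []
    for i in range(num_chars):
        h = slim.conv2d(net, num_classes, [1, 1],
                        activation_fn=None,
                        normalizer_fn=None,
                        biases_initializer=tf.zeros_initializer(),
                        scope='fc8_%d' % i)
        if spatial_squeeze:
            h = tf.squeeze(h, [1, 2], name='fc8_%d/squeezed' % i)
            end_points[sc.name + '/fc8_%d' % i] = h
        heads.append(h)
    return heads   # e.g. 5 heads for a five-character CAPTCHA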

2.3 Generating TFRecord training data

See the TensorFlow documentation for details on the TFRecord format.

TFRecord generation code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tensorflow as tf
import os
import random
import math
import sys
from PIL import Image
import numpy as np

_NUM_TEST = 500
_RANDOM_SEED = 0
MAX_CAPTCHA = 1
# directory of the split single-character images; each file name starts with the character it contains
DATASET_DIR = "./split_img/yzm"
# output directory for the generated tfrecord files
TFRECORD_DIR = './TFrecord/'
def _dataset_exists(dataset_dir):
    for split_name in ['train', 'test']:
        output_filename = os.path.join(dataset_dir, split_name + '.tfrecords')
        if not tf.gfile.Exists(output_filename):
            return False
    return True

def _get_filenames_and_classes(dataset_dir):
    photo_filenames = []
    for filename in os.listdir(dataset_dir):
        path = os.path.join(dataset_dir, filename)
        photo_filenames.append(path)
    return photo_filenames

def int64_feature(values):
    if not isinstance(values, (tuple, list)):
        values = [values]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def bytes_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values]))

def image_to_tfexample(image_data, label0):
    # Abstract base class for protocol messages.
    return tf.train.Example(features=tf.train.Features(feature={
        'image': bytes_feature(image_data),
        'label0': int64_feature(label0)
    }))


def char2pos(c):
    if c == '_':
        k = 62
        return k
    k = ord(c) - 48
    if k > 9:
        k = ord(c) - 55
        if k > 35:
            k = ord(c) - 61
            if k > 61:
                raise ValueError('No Map')
    return k

def char2pos1(c):
    if c == '_':
        k = 36
        return k
    k = ord(c) - 48
    if k > 9:
        k = ord(c) - 55
        if k > 35:
            k = ord(c) - (61 + 26)
            if k > 36:
                raise ValueError('No Map')
    return k
def _convert_dataset(split_name, filenames, dataset_dir):
    assert split_name in ['train', 'test']

    with tf.Session() as sess:
        output_filename = os.path.join(TFRECORD_DIR, split_name + '.tfrecords')
        with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
            for i, filename in enumerate(filenames):
                try:
                    sys.stdout.write('\r>> Converting image %d/%d' % (i + 1, len(filenames)))
                    sys.stdout.flush()
                    image_data = Image.open(filename)
                    image_data = image_data.resize((224, 224))
                    image_data = np.array(image_data.convert('L'))
                    image_data = image_data.tobytes()

                    labels = filename.split('\\')[-1][0:1]
                    print(labels)
                    num_labels = []
                    num_labels.append(int(char2pos1(labels)))
                    example = image_to_tfexample(image_data, num_labels[0])
                    tfrecord_writer.write(example.SerializeToString())
                    # for j in range(4):                  # used for four-character labels
                    #     num_labels.append(int(char2pos1(labels[j])))
                    # example = image_to_tfexample(image_data, num_labels[0], num_labels[1], num_labels[2], num_labels[3])
                    # tfrecord_writer.write(example.SerializeToString())

                except IOError as e:
                    print('Could not read:', filename)
                    print('Error:', e)
                    print('Skip it\n')
    sys.stdout.write('\n')
    sys.stdout.flush()

if _dataset_exists(TFRECORD_DIR):
    print('tfrecord files already exist')
else:
    photo_filenames = _get_filenames_and_classes(DATASET_DIR)
    random.seed(_RANDOM_SEED)
    random.shuffle(photo_filenames)
    training_filenames = photo_filenames[_NUM_TEST:]
    testing_filenames = photo_filenames[:_NUM_TEST]
    _convert_dataset('train', training_filenames, DATASET_DIR)
    _convert_dataset('test', testing_filenames, DATASET_DIR)
    print('Done')
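
To sanity-check the generated file before training, the records can be read back with the TF 1.x record iterator. This snippet is mine, not part of the original project:

import tensorflow as tf

# Iterate over the first few records and print each label and image size.
for i, record in enumerate(tf.python_io.tf_record_iterator("./TFrecord/train.tfrecords")):
    example = tf.train.Example.FromString(record)
    label0 = example.features.feature['label0'].int64_list.value[0]
    img_bytes = example.features.feature['image'].bytes_list.value[0]
    print("record %d: label0=%d, image bytes=%d" % (i, label0, len(img_bytes)))
    if i >= 4:
        break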

2.4 Model training

Training code, using figure 1 as the example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/4/30 10:59
# @Author  : shm
# @Site    : 
# @File    : MyTensorflowTrain.py
# @Software: PyCharm

import os
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np

# number of distinct character classes
CHAR_SET_LEN = 36
# original image height
IMAGE_HEIGHT = 30
# original image width
IMAGE_WIDTH = 16
# batch size
BATCH_SIZE = 100
# path of the tfrecord training file
TFRECORD_FILE = "./TFrecord/train.tfrecords"
# placeholder
x = tf.placeholder(tf.float32, [None, 224, 224])
y0 = tf.placeholder(tf.float32, [None])
# learning rate
lr = tf.Variable(0.003, dtype=tf.float32)
# read data from the tfrecord file
def read_and_decode(filename):
    # build a file name queue
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    # return the file name and the serialized example
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'image': tf.FixedLenFeature([], tf.string),
                                           'label0': tf.FixedLenFeature([], tf.int64)
                                       })
    # decode the image bytes
    image = tf.decode_raw(features['image'], tf.uint8)
    # tf.train.shuffle_batch needs a fixed shape
    image = tf.reshape(image, [224, 224])
    # normalize to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    # label
    label0 = tf.cast(features['label0'], tf.int32)
    return image, label0

# image data and labels
image, label0 = read_and_decode(TFRECORD_FILE)
# shuffle_batch shuffles the samples
image_batch, label_batch0 = tf.train.shuffle_batch(
    [image, label0], batch_size=BATCH_SIZE,
    capacity=50000, min_after_dequeue=10000, num_threads=1)

# network definition
train_network_fn = nets_factory.get_network_fn(
    'alexnet_v2',
    num_classes=CHAR_SET_LEN,
    weight_decay=0.0005,
    is_training=True)

with tf.Session() as sess:
    # inputs: a tensor of size [batch_size, height, width, channels]
    X = tf.reshape(x, [BATCH_SIZE, 224, 224, 1])
    # feed the batch through the network
    logits0,end_points = train_network_fn(X)

    # convert labels to one-hot form
    one_hot_labels0 = tf.one_hot(indices=tf.cast(y0, tf.int32), depth=CHAR_SET_LEN)

    # loss
    loss0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits0, labels=one_hot_labels0))
    # total loss
    total_loss = (loss0)
    # optimize total_loss
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss)

    # accuracy
    correct_prediction0 = tf.equal(tf.argmax(one_hot_labels0, 1), tf.argmax(logits0, 1))
    accuracy0 = tf.reduce_mean(tf.cast(correct_prediction0, tf.float32))

    # saver for the model
    saver = tf.train.Saver()
    # initialize variables
    sess.run(tf.global_variables_initializer())

    # coordinator to manage the input threads
    coord = tf.train.Coordinator()
    # start the queue runners; the file name queue starts filling here
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    for i in range(60001):
        # fetch one batch of data and labels
        b_image, b_label0 = sess.run([image_batch, label_batch0])
        # one optimization step
        sess.run(optimizer, feed_dict={x: b_image, y0: b_label0})
        # report loss and accuracy every 20 iterations
        if i % 20 == 0:
            # lower the learning rate every 2000 iterations
            if i % 2000 == 0:
                sess.run(tf.assign(lr, lr / 3))
            acc0, loss_ = sess.run([accuracy0, total_loss],feed_dict={x: b_image,y0: b_label0})
            learning_rate = sess.run(lr)
            print("Iter:%d  Loss:%.3f  Accuracy:%.2f  Learning_rate:%.4f" % (i, loss_, acc0, learning_rate))
            # save the model once training accuracy is good enough
            if acc0 > 0.99:
                saver.save(sess, "./models/crack_captcha_model", global_step=i)
                break
            if i == 60000:
                saver.save(sess, "./models/crack_captcha_model", global_step=i)
                break

    # ask the other threads to stop
    coord.request_stop()
    # this call returns only after all other threads have stopped
    coord.join(threads)

2.5 Model accuracy test code:

#coding=utf-8
import os
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np

CHAR_SET_LEN = 36
IMAGE_HEIGHT = 30
IMAGE_WIDTH = 16

BATCH_SIZE = 1
TFRECORD_FILE = "./TFrecord/test.tfrecords"
# placeholder
x = tf.placeholder(tf.float32, [None, 224, 224])

def read_and_decode(filename):
    # build a file name queue
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    # return the file name and the serialized example
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                       features={
                                           'image' : tf.FixedLenFeature([], tf.string),
                                           'label0': tf.FixedLenFeature([], tf.int64),
                                       })
    # decode the raw image bytes
    image = tf.decode_raw(features['image'], tf.uint8)
    # keep an unnormalized copy for visual inspection
    image_raw = tf.reshape(image, [224, 224])
    # normalized copy fed to the network
    image = tf.reshape(image, [224, 224])
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    # label
    label0 = tf.cast(features['label0'], tf.int32)

    return image, image_raw, label0

image, image_raw, label0 = read_and_decode(TFRECORD_FILE)

# shuffled batches of size 1
image_batch, image_raw_batch, label_batch0 = tf.train.shuffle_batch([image, image_raw, label0], batch_size = BATCH_SIZE,capacity = 50000, min_after_dequeue=10000, num_threads=1)

train_network_fn = nets_factory.get_network_fn('alexnet_v2',num_classes=CHAR_SET_LEN,weight_decay=0.0005, is_training=False)

with tf.Session() as sess:
    # inputs: a tensor of size [batch_size, height, width, channels]
    X = tf.reshape(x, [BATCH_SIZE, 224, 224, 1])
    # forward pass
    logits0,end_points = train_network_fn(X)

    # predicted class index
    predict0 = tf.reshape(logits0, [-1, CHAR_SET_LEN])
    predict0 = tf.argmax(predict0, 1)
    # initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    # restore the trained model
    saver = tf.train.Saver()
    saver.restore(sess, './models/crack_captcha_model-1080')
    # coordinator to manage the input threads
    coord = tf.train.Coordinator()
    # start the queue runners
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    count = 0
    for i in range(500):
        # fetch one test sample
        try:
            b_image, b_image_raw, b_label0 = sess.run([image_batch,image_raw_batch, label_batch0])
        except Exception as e:
            print(e)
        # the raw image can be reconstructed for visual inspection if needed
        img=Image.fromarray(b_image_raw[0],'L')
        print('label:',b_label0)
        # run the prediction
        label0 = sess.run(predict0, feed_dict={x: b_image})
        print('predict:',label0)
        if b_label0[0] == label0[0]:
            count = count + 1
    print(count)   # number of correctly recognized samples out of 500
    # ask the threads to stop
    coord.request_stop()
    # wait for all threads to finish
    coord.join(threads)

Section 3: API development

The service loads two model files directly. A single unified endpoint selects which model to use according to the module parameter, so different CAPTCHA types can be recognized through the same interface.

3.1 A Flask-based CAPTCHA recognition service

API service code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/10/14 10:25
# @Author  : shm
# @Site    : 
# @File    : YZM_Service.py
# @Software: PyCharm

from flask import Flask, request, render_template
import tensorflow as tf
from PIL import Image
from nets import nets_factory
import numpy as np
import base64
from io import BytesIO

def num2char(num):
    '''
    Map a class index back to a character code:
    0-9 -> digits, 10-35 -> lowercase letters, 36 -> '_'
    :param num:
    :return:
    '''
    if num < 10:
        return (num + ord('0'))
    elif num < 36:
        return (num - 10 + ord('a'))
    elif num == 36:
        return (ord('_'))
    else:
        raise ValueError('Error')

def splitimage(img, rownum, colnum):
    '''
    Split the image into rownum x colnum pieces
    :param img:
    :param rownum:
    :param colnum:
    :return:
    '''
    w, h = img.size
    if rownum <= h and colnum <= w:
        rowheight = h // rownum
        colwidth = w // colnum
        r = 0
        imlist = []
        for c in range(colnum):
            box = (c * colwidth, r * rowheight, (c + 1) * colwidth, (r + 1) * rowheight)
            imlist.append(img.crop(box))
        return imlist
def ImageReshap(img):
    '''
    Resize to 224x224 and convert to grayscale
    :param img:
    :return:
    '''
    image_data = img.resize((224, 224))
    image_data = np.array(image_data.convert('L'))
    return image_data

class LoadModel_v1:
    def __init__(self,model_path,char_set_len=36):
        '''
        :param model_path: path to the model checkpoint
        :param char_set_len: number of character classes
        '''
        self.char_set_len = char_set_len
        g = tf.Graph()
        with g.as_default():
            self.sess = tf.Session(graph=g)
            self.graph = self.build_graph()
            BATCH_SIZE = 1
            self.x = tf.placeholder(tf.float32, [None, 224, 224])
            self.img = tf.placeholder(tf.float32, None)
            image_data1 = tf.cast(self.img, tf.float32) / 255.0
            image_data2 = tf.subtract(image_data1, 0.5)
            image_data3 = tf.multiply(image_data2, 2.0)
            self.image_batch = tf.reshape(image_data3, [1, 224, 224])
            X = tf.reshape(self.x, [BATCH_SIZE, 224, 224, 1])
            self.logits0, self.end_points = self.graph(X)
            self.sess.run(tf.global_variables_initializer())
            saver = tf.train.Saver()
        saver.restore(self.sess,model_path)

    def build_graph(self):
        '''
        Get the network definition used by this model
        :return:
        '''
        train_network_fn = nets_factory.get_network_fn('alexnet_v2',num_classes=self.char_set_len,weight_decay=0.0005,is_training=False)
        return train_network_fn
    def recognize(self,image):
        '''
        Recognize a single character image
        :param image:
        :return:
        '''
        try:
            inputdata = self.sess.run(self.image_batch, feed_dict={self.img: image})
            predict0 = tf.reshape(self.logits0, [-1, self.char_set_len])
            predict0 = tf.argmax(predict0, 1)
            label = self.sess.run(predict0, feed_dict={self.x: inputdata})
            # label is a length-1 array; convert it to a plain int before mapping to a character
            text = chr(num2char(int(label[0])))
            return text
        except Exception as e:
            print("recognize",e)
            return ""

    def screen_shot(self,img):
        '''
        Preprocess: crop out the main character region of the image
        :param img:
        :return:
        '''
        try:
            box = (36, 0, 100, 30)
            return img.crop(box)
        except Exception as e:
            print("screenshot:", e)
            return None
    def img_to_text(self,imgdata):
        '''
        Convert raw image bytes to the recognized text
        :return: the recognized string
        '''
        yzmstr = ""
        with BytesIO() as iofile:
            iofile.write(imgdata)
            with Image.open(iofile) as img:
                img = self.screen_shot(img)
                imglist = splitimage(img, 1, 4)
            text = []
            for im in imglist:
                imgreshap = ImageReshap(im)
                yzmstr = self.recognize(imgreshap)
                text.append(yzmstr)
            yzmstr = "".join(text)
        return yzmstr
class LoadModel_v2(LoadModel_v1):
    def __init__(self,model_path):
        super(LoadModel_v2, self).__init__(model_path)
    def img_to_text(self,imgdata):
        yzmstr = ""
        with BytesIO() as iofile:
            iofile.write(imgdata)
            with Image.open(iofile) as img:
                imglist = splitimage(img, 1, 4)
            text = []
            for im in imglist:
                imgreshap = ImageReshap(im)
                yzmstr = self.recognize(imgreshap)
                text.append(yzmstr)
            print(yzmstr)
            yzmstr = "".join(text)
        return yzmstr

app = Flask(__name__)
@app.route('/')
def index():
    return render_template('index.html')
@app.route('/Recognition',methods=['POST'])
def recognition():
    try:
        imgdata = request.form.get('imgdata')
        module = request.form.get("module","")
        if module == "v1":
            decodeData = base64.b64decode(imgdata)
            yzmstr = loadModel_model1.img_to_text(decodeData)
            return yzmstr
        elif module == "v2":
            decodeData = base64.b64decode(imgdata)
            yzmstr = loadModel_model2.img_to_text(decodeData)
            return yzmstr
        else:
            return "unknown channel"
    except Exception as e:
        return repr(e)
if __name__ == "__main__":
    # initialize model 1 (first CAPTCHA type)
    loadModel_model1 = LoadModel_v1("./models/crack_captcha_model-1080")
    # initialize model 2 (second CAPTCHA type)
    loadModel_model2 = LoadModel_v2("./models/crack_captcha.model-2140")
    app.run(host='0.0.0.0', port=2002, debug=True)

3.2 Client code for calling the API

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2019/5/6 18:46
# @Author  : shm
# Site     : 
# @File    : test.py
# @Software: PyCharm
import base64
import requests
import os

# recognition API endpoint
url = "http://127.0.0.1:2002/Recognition"

# directory with test CAPTCHAs; each file name starts with the true text
path = "./image/pic"

# model version to use (routes to a different model on the server)
model = "v1"   
#model = "v2"

imglist = os.listdir(path)
count = 0
nums = len(imglist)
for file in imglist:
    try:
        dir  = path + "\\" + file
        with open(dir,"rb") as fp:
            database64 = base64.b64encode(fp.read())
        form = {
            'module':model,
            'imgdata': database64
        }
        r = requests.post(url, data=form)
        res = r.text
        yuan = file[0:4]   # expected text taken from the file name
        if yuan.lower() == res:
            count = count + 1
            print("Success")
        else:
            print(file[0:4],"==",res)
    except Exception as e:
        print(e)
print("model %s ----- total: %s ----- correctly recognized: %s" % (model,nums,count))

Summary

This article covered how to generate simulated CAPTCHA training samples and how to split CAPTCHAs into single characters for recognition. Follow-up articles will build on this to cover training a whole-image model without splitting, recognizing variable-length CAPTCHAs, and a general-purpose deep learning CAPTCHA recognition model. The project code will be pushed to a Git repository later and the link will be added here. If anything is unclear or you have questions, feel free to reach out via QQ: 1071830794; I welcome learning and growing together.

Do not use this for commercial purposes.

Thanks for reading.
