A Roundup of Image-Reading Libraries: cv2, PIL, skimage with numpy, and PyTorch (ToPILImage)

Posted by qq_37172182 on 2020-10-05


1 Image Reading and Attributes

1.1 Converting between PIL and numpy
import numpy as np
from PIL import Image

#read an image with 3 channels, 500x889 pixels (width x height)
img_pil = Image.open('./test.png')
#show the image
img_pil.show()

#get image info (format, mode, size)
print(img_pil)

#get the pixel value at (x=0, y=0) in PIL format
print(img_pil.getpixel((0,0)))

#convert PIL to numpy
img_np = np.array(img_pil)
print(img_np.shape)

#get the pixel value in numpy format (row 0, column 0)
print(img_np[0,0])

#convert numpy to PIL
img_pil = Image.fromarray(img_np)
print(img_pil)

"""
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x889 at 0x193331AD240>
(219, 210, 193)
(889, 500, 3)
[219 210 193]
<PIL.Image.Image image mode=RGB size=500x889 at 0x1933330ADA0>
"""

Note:

  1. PIL reads the three channels in RGB order. The width and height it reports match the original image (img_pil.size = (width, height)).
  2. Converting between PIL and numpy involves a subtle difference. numpy.array() does not change the pixels, but the resulting array is indexed rows first, so its shape is (height, width, channels): (889, 500, 3) for the 500x889 image above. Image.fromarray() converts it back to a PIL image with the original size.
  3. PIL reads the pixel at a position with img_pil.getpixel((x, y)), where x is the column and y the row. numpy stores the image as a matrix and accesses it by index directly, in row-column order: img_np[y, x].
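The following is a minimal sketch of the size/shape and (x, y) versus [row, column] differences described in notes 2 and 3. It assumes only numpy and Pillow. Instead of reading ./test.png, it builds a small synthetic image in memory; the image dimensions are arbitrary.

```python
import numpy as np
from PIL import Image

# Synthetic RGB image, width=4 and height=3, so no file is needed
height, width = 3, 4
arr = np.zeros((height, width, 3), dtype=np.uint8)
arr[1, 2] = (255, 0, 0)  # row y=1, column x=2 -> red

img_pil = Image.fromarray(arr)
print(img_pil.size)   # (4, 3): (width, height)
print(arr.shape)      # (3, 4, 3): (height, width, channels)

# PIL indexes by (x, y) = (column, row); numpy indexes by [row, column]
x, y = 2, 1
print(img_pil.getpixel((x, y)))  # (255, 0, 0)
print(arr[y, x].tolist())        # [255, 0, 0]

# Round trip numpy -> PIL -> numpy keeps shape and values
print(np.array_equal(np.array(img_pil), arr))  # True
```

Swapping the indices to arr[x, y] would read row 2, column 1 instead, which is a different pixel (here black).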
1.2 Converting between cv2 and numpy
import numpy as np
import cv2

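#read an image with 3 channels, 500x889 pixels (width x height)
img_cv = cv2.imread('./test.png')
#cv2.imread returns None instead of raising an error if the file cannot be read
assert img_cv is not None, 'failed to read ./test.png'
#show the image
cv2.imshow('img', img_cv)

#get image info: (height, width, channels)
print(img_cv.shape)

#get the pixel value in cv2 format (B, G, R)
print(img_cv[0,0])

#convert cv2 to numpy (cv2.imread already returns a numpy.ndarray, so this only copies it)
img_np = np.array(img_cv)
print(img_np.shape)

#get the pixel value in numpy format
print(img_np[0,0])

#show the numpy array with cv2 directly (no conversion needed)
cv2.imshow('img_np', img_np)

cv2.waitKey(0)
cv2.destroyAllWindows()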

"""
(889, 500, 3)
[193 210 219]
(889, 500, 3)
[193 210 219]
"""

Note:

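  1. cv2 reads the three channels in BGR order: the first pixel prints as [193 210 219], versus PIL's (219, 210, 193). img_cv.shape is reported as (height, width, channels) rather than (width, height).
  2. cv2 images are already numpy arrays, so elements are accessed exactly as in numpy, by index: img_cv[row, col].
  3. cv2 can display a numpy array (uint8) directly with cv2.imshow.
  4. To keep the cv2 window from closing immediately, cv2.waitKey() is usually added so the program waits for a key press before exiting.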
1.3 Converting between skimage and numpy
import numpy as np
from skimage import io
import matplotlib.pyplot as plt


#read an image with 3 channels, 500x889 pixels (width x height)
img_sk = io.imread('./test.png')

#get image info: (height, width, channels)
print(img_sk.shape)
io.imshow(img_sk)

#get the pixel value in skimage format (R, G, B)
print(img_sk[0,0])

#convert skimage to numpy (io.imread already returns a numpy.ndarray, so this only copies it)
img_np = np.array(img_sk)
print(img_np.shape)

#get the pixel value in numpy format
print(img_np[0,0])

#display the numpy array with skimage directly; a new figure keeps the first image from being drawn over
plt.figure()
io.imshow(img_np)

plt.show()

"""
(889, 500, 3)
[219 210 193]
(889, 500, 3)
[219 210 193]
"""

Note:

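  1. skimage is similar to cv2: io.imread returns a numpy array of shape (height, width, channels), so conversion to numpy is straightforward. Unlike cv2, however, the channel order is RGB, as the pixel value [219 210 193] shows (matching PIL).
  2. skimage cannot display images on its own. io.imshow relies on matplotlib.pyplot, and plt.show() is needed to open the window. For this reason skimage is usually paired with pyplot to visualize intermediate results, which also makes plotting images and charts convenient.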
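The following is a sketch of the skimage + pyplot pairing from note 2: an original image and a processed version are shown side by side. Here the processing step is skimage.color.rgb2gray, chosen only as an example. The sketch assumes a synthetic gradient image instead of ./test.png. It uses matplotlib's non-interactive Agg backend and saves the figure to memory instead of calling plt.show(), so it runs without a display.

```python
from io import BytesIO

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window needed
import matplotlib.pyplot as plt
import numpy as np
from skimage import color

# Synthetic RGB gradient (height=64, width=96) standing in for io.imread output
h, w = 64, 96
rgb = np.zeros((h, w, 3), dtype=np.uint8)
rgb[..., 0] = np.linspace(0, 255, w, dtype=np.uint8)          # red grows left to right
rgb[..., 1] = np.linspace(0, 255, h, dtype=np.uint8)[:, None]  # green grows top to bottom

gray = color.rgb2gray(rgb)  # float64 array of shape (h, w), values in [0, 1]

# Side-by-side comparison of the original and the processed image
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].imshow(rgb)
axes[0].set_title('original')
axes[1].imshow(gray, cmap='gray')
axes[1].set_title('rgb2gray')
for ax in axes:
    ax.axis('off')

buf = BytesIO()
fig.savefig(buf, format='png')  # with an interactive backend, call plt.show() instead
plt.close(fig)
print(gray.shape)  # (64, 96)
```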

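In summary: PIL preserves the original input information as far as possible and is quick and convenient to use, and it is often combined with the imageio library for image preprocessing. cv2 loads images directly as numpy arrays (in BGR order), which makes further processing easy. skimage works together with matplotlib, which makes comparing images more convenient.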

2 Reading Images in PyTorch

torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                            shuffle=False, num_workers=8, drop_last=False)
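To see what such a DataLoader yields, the following minimal sketch uses a hypothetical in-memory TensorDataset in place of the custom dataset. num_workers=0 keeps loading in the main process, so the sketch runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 "images" of shape (3, 4, 4) with integer labels 0..9
images = torch.rand(10, 3, 4, 4)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

# Same keyword arguments as the call above, but loading in the main process
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0, drop_last=False)

for batch_images, batch_labels in loader:
    print(batch_images.shape, batch_labels.tolist())
# torch.Size([4, 3, 4, 4]) [0, 1, 2, 3]
# torch.Size([4, 3, 4, 4]) [4, 5, 6, 7]
# torch.Size([2, 3, 4, 4]) [8, 9]   <- last partial batch kept because drop_last=False

# With drop_last=True the incomplete final batch is discarded
print(len(DataLoader(dataset, batch_size=4, drop_last=True)))  # 2
```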

Calling PyTorch's DataLoader requires a dataset. Here the dataset is a custom class that returns each image together with its label and applies data augmentation to the image. At the augmentation stage, the image is a PIL object. The Stanford Cars dataset is used as an example (code source: sourcecode):

import os

import imageio.v2 as imageio  # imageio >= 2.16; on older versions use `import imageio`
import numpy as np
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class STANFORD_CAR(Dataset):
    def __init__(self, input_size, root, is_train=True, data_len=None):
        self.input_size = input_size
        self.root = root
        self.is_train = is_train
        train_img_path = os.path.join(self.root, 'cars_train')
        test_img_path = os.path.join(self.root, 'cars_test')
        train_img_label = []
        test_img_label = []
        # each line of train.txt / test.txt: "<image file name> <class id starting at 1>"
        with open(os.path.join(self.root, 'train.txt')) as train_label_file:
            for line in train_label_file:
                fields = line.strip().split(' ')
                # store [image path, 0-based label]
                train_img_label.append([os.path.join(train_img_path, fields[0]),
                                        int(fields[1]) - 1])
        with open(os.path.join(self.root, 'test.txt')) as test_label_file:
            for line in test_label_file:
                fields = line.strip().split(' ')
                test_img_label.append([os.path.join(test_img_path, fields[0]),
                                       int(fields[1]) - 1])
        self.train_img_label = train_img_label[:data_len]
        self.test_img_label = test_img_label[:data_len]

    def __getitem__(self, index):
        if self.is_train:
            img_path, target = self.train_img_label[index]
        else:
            img_path, target = self.test_img_label[index]

        # imageio returns a numpy array, (H, W) or (H, W, C) in RGB order
        img = imageio.imread(img_path)
        if len(img.shape) == 2:
            # grayscale image: replicate it into 3 channels
            img = np.stack([img] * 3, 2)
        # numpy -> PIL; convert('RGB') also drops an alpha channel if one is present
        img = Image.fromarray(img).convert('RGB')

        img = transforms.Resize((self.input_size, self.input_size),
                                transforms.InterpolationMode.BILINEAR)(img)
        if self.is_train:
            # data augmentation on the PIL image (training only)
            # img = transforms.RandomResizedCrop(size=self.input_size,
            #                                    scale=(0.4, 0.75), ratio=(0.5, 1.5))(img)
            # img = transforms.RandomCrop(self.input_size)(img)
            img = transforms.RandomHorizontalFlip()(img)
            img = transforms.ColorJitter(brightness=0.2, contrast=0.2)(img)
        # else:
        #     img = transforms.CenterCrop(self.input_size)(img)

        # PIL -> float tensor of shape (C, H, W) in [0, 1], then ImageNet normalization
        img = transforms.ToTensor()(img)
        img = transforms.Normalize([0.485, 0.456, 0.406],
                                   [0.229, 0.224, 0.225])(img)
        return img, target

    def __len__(self):
        if self.is_train:
            return len(self.train_img_label)
        else:
            return len(self.test_img_label)

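This code combines imageio (to read each image as a numpy array), numpy (to expand grayscale images to three channels) and PIL (the input format for torchvision's augmentation transforms).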
