[Deep Learning, Papers 03-2] Pitfalls When Building an SSD Model in PyTorch

Posted by 最菜程式設計師Sxx on 2022-05-02

Paper: https://arxiv.org/abs/1512.02325

Source code: http://github.com/amdegroot/ssd.pytorch

Environment 1: torch 1.9.0+CPU

Environment 2: torch 1.8.1+cu102, torchvision 0.9.1+cu102

 

1. StopIteration. With batch_size set to 32, training fails at iteration 60 and stops; with batch_size changed to 8, it fails at iteration 240.

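Cause and fix: 32 × 60 and 8 × 240 are both 1,920 samples, which suggests the batch iterator runs out after one full pass over the dataset, at which point next() raises StopIteration. The fix is in train.py, line 165:

# Before
images, targets = next(batch_iterator)

# After
try:
    images, targets = next(batch_iterator)
except StopIteration:
    # The iterator is exhausted: start a new pass over the data.
    batch_iterator = iter(data_loader)
    images, targets = next(batch_iterator)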
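The restart pattern can be tried without PyTorch. The following is only a minimal sketch: the list `data_loader` stands in for a real DataLoader, and the helper `next_batch` is a hypothetical name that does not exist in the project.

```python
def next_batch(batch_iterator, data_loader):
    """Return (batch, iterator), restarting the iterator once it is exhausted."""
    try:
        return next(batch_iterator), batch_iterator
    except StopIteration:
        # One full pass is done: build a fresh iterator and continue.
        batch_iterator = iter(data_loader)
        return next(batch_iterator), batch_iterator


# Stand-in for a DataLoader that yields 3 batches per pass (an assumption).
data_loader = [[0, 1], [2, 3], [4]]
batch_iterator = iter(data_loader)

seen = []
for iteration in range(7):  # more iterations than one pass provides
    batch, batch_iterator = next_batch(batch_iterator, data_loader)
    seen.append(batch)

print(seen)
```

Catching StopIteration specifically, rather than using a bare except, keeps unrelated errors from the data pipeline visible.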

 

2. UserWarning: volatile was removed and now has no effect. Use 'with torch.no_grad():' instead.

Cause and fix: a PyTorch version issue. In ssd.py, line 34:

# Before
self.priors = Variable(self.priorbox.forward(), volatile=True)

# After
with torch.no_grad():
    # Variable is a thin wrapper that returns a Tensor since PyTorch 0.4
    self.priors = torch.autograd.Variable(self.priorbox.forward())

 

3. UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.

Cause and fix: nn.init.xavier_uniform is the name from older versions; change it to nn.init.xavier_uniform_.

 

4. VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. 

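Cause and fix: a version issue. The warning comes from augmentations.py line 238, mode = random.choice(self.sample_options). I changed it to mode = np.array(self.sample_options, dtype=object), but that did not help at all (it also stores the whole array in mode instead of one sampled option). Since it is only a warning, I left it alone.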
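One way to sidestep the warning is to pick an option by index, so the ragged sequence is never converted to an ndarray. This is only a sketch under assumptions: the option values below are illustrative rather than copied from the project, and `py_random` is a hypothetical alias for the standard-library random module.

```python
import random as py_random  # standard library; the alias is hypothetical

# Illustrative ragged options: a bare None mixed with (min, max) tuples.
sample_options = (None, (0.1, None), (0.3, None), (0.7, None), (None, None))

# Choose an index instead of calling a choice function that would first
# turn the whole tuple into an array.
index = py_random.randrange(len(sample_options))
mode = sample_options[index]

print(index, mode)
```

Unlike assigning the whole array to mode, this still yields exactly one option per call, which is what the original line intended.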

 

5. AssertionError: Must define a window to update

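Cause and fix: the error is raised when updating the visdom window (train.py line 153).

# Failing code (line 153)
update_vis_plot(epoch, loc_loss, conf_loss, epoch_plot, None, 'append', epoch_size)

Moving epoch += 1 from line 158 to before the failing call fixes the problem.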

 

 

6. KeyError: "filename 'storages' not found". Raised when running the evaluation script eval.py and the test script test.py.

Cause and fix: the .pth model file being loaded is corrupted.

 

7. UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.

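Cause and fix: a version issue. In newer versions the size_average and reduce arguments of the loss functions are deprecated; set reduction instead. _reduction.py is the PyTorch file that raises the warning; the change itself goes in the project code that calls the loss (line 90):

# Before (line 90)
loss_l = F.smooth_l1_loss(loc_p, loc_t, size_average=False)

# After
loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')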
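To see why size_average=False becomes reduction='sum', it helps to spell out how the two legacy flags combine. The helper below is a pure-Python sketch of that mapping as I understand it; `legacy_to_reduction` is a hypothetical name, not a PyTorch function.

```python
def legacy_to_reduction(size_average=None, reduce=None):
    """Map the legacy size_average/reduce flags to a reduction string."""
    # Both flags behaved as True when left unset.
    size_average = True if size_average is None else size_average
    reduce = True if reduce is None else reduce

    if not reduce:
        return "none"  # keep the per-element losses
    return "mean" if size_average else "sum"


print(legacy_to_reduction())                    # mean
print(legacy_to_reduction(size_average=False))  # sum
print(legacy_to_reduction(reduce=False))        # none
```

Any other loss call in the project that still passes size_average=False can be updated the same way.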

 

8. RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

Cause and fix: eval.py line 425. When running on a CPU, the load has to be mapped to the CPU explicitly.

# Before
net.load_state_dict(torch.load(args.trained_model))

# After
net.load_state_dict(torch.load(args.trained_model, map_location='cpu'))

  

9. RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.

Appears in eval.py and train.py ★★★★★★

(Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

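Cause: PyTorch 1.3 and later require the forward method of a custom autograd function to be a static method, so the code fails on versions 1.3 and above.

Official recommendation: add @staticmethod before forward and backward in the custom autograd.Function.

Fixes:

Option 1: downgrade PyTorch to a version before 1.3.

Option 2: following the official recommendation, I added @staticmethod before forward in ssd.py, which produced a different error.

Next, I changed eval.py line 385 from detections = net(x).data to detections = net.apply(x).data, which raised yet another error at run time.

After that, I called forward (or apply) explicitly at ssd.py line 100: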

output=self.detect.forward(loc.view(loc.size(0), -1, 4), 
                           self.softmax(conf.view(conf.size(0), -1, self.num_classes)), 
                           self.priors.type(type(x.data)))

This still raised the same error as above, so I gave up on this route.

Then I found this in the project's issues:

   It has a class named 'Detect' which is inheriting torch.autograd.Function but it implements the forward method in an old deprecated way, so you need to restructure it i.e. you need to define the forward method with @staticmethod decorator and use .apply to call it from your SSD class.

   Also, as you are going to use decorator, you need to ensure that the forward method doesn't use any Detect class constructor variables.

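In other words, add @staticmethod before the definition of forward and call it through .apply. Being a static method means the Function can no longer use the methods and attributes of its own instance, so __init__() has to go and its work must be done some other way.

Final solution (Option 3):

Change detection.py as follows, folding __init__() into forward:

def forward(self, num_classes, bkg_label, top_k, conf_thresh,
            nms_thresh, loc_data, conf_data, prior_data):
    # the former __init__ setup goes here, followed by the original forward body

Then change the call sites in ssd.py to: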

# Before (line 46)
# if phase == 'test':
#    self.softmax = nn.Softmax(dim=-1)
#    self.detect = Detect(num_classes, 0, 200, 0.01, 0.45)

# After
if phase == 'test':
    self.softmax = nn.Softmax(dim=-1)
    self.detect = Detect()

# Before (line 99)
# if self.phase == "test":
#     output = self.detect(
#        loc.view(loc.size(0), -1, 4),                   # loc preds
#        self.softmax(conf.view(conf.size(0), -1,
#                     self.num_classes)),                # conf preds
#        self.priors.type(type(x.data))                  # default boxes
#     )

# After
# The literal 2 is num_classes in this setup; the other values are the
# arguments that used to be passed to Detect(...).
if self.phase == "test":
    output = self.detect.apply(2, 0, 200, 0.01, 0.45,
                               loc.view(loc.size(0), -1, 4),    # loc preds
                               self.softmax(conf.view(-1, 2)),  # conf preds
                               self.priors.type(type(x.data))   # default boxes
                               )

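Note: with Option 3, do not add @staticmethod to the forward method in ssd.py, or you will get the same error as in Option 2; that forward belongs to an nn.Module and needs self. For the forward method in detection.py, adding @staticmethod or leaving it out made no difference.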
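The pattern behind Option 3 (settings passed as arguments to a static forward, invoked through .apply) can be shown on a toy function. This is a minimal sketch, not the project's Detect class: `ScoreFilter` and its thresholding logic are hypothetical stand-ins.

```python
import torch
from torch.autograd import Function


class ScoreFilter(Function):
    """Hypothetical stand-in for Detect: zero out scores below a threshold."""

    @staticmethod
    def forward(ctx, conf_thresh, scores):
        # There is no self: the setting that __init__ used to store
        # arrives as a plain argument instead.
        keep = (scores > conf_thresh).to(scores.dtype)
        return scores * keep


scores = torch.tensor([0.005, 0.2, 0.9])

# Call through .apply on the class; calling an instance would take the
# legacy path that newer PyTorch versions reject.
kept = ScoreFilter.apply(0.01, scores)
print(kept)
```

Because forward no longer reads instance attributes, every value that used to live on self has to appear in the .apply call, which is why the call in ssd.py grows by five arguments.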

 

10. cv2.error: OpenCV(4.5.5) :-1: error: (-5:Bad argument) in function 'rectangle'

Cause and fix: the OpenCV version is too new and incompatible; installing 4.1.2.30 instead solved the problem.

 

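Summary: when you hit an error, don't rush to ask for help. Read the error message carefully and first work out for yourself why it was raised; if you know the code reasonably well, you can usually find the cause. Only when you really cannot solve it should you turn to Baidu or Google. The Issues page of the source repository is also well worth checking.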

 

References:

1. https://blog.csdn.net/qq_39506912/article/details/116926504 (the main reference for this post)

2. http://github.com/amdegroot/ssd.pytorch/issues/234
