Diagnosing and Fixing a Data Inconsistency Caused by Multithreaded Access to std::map

Posted by 鍾超 on 2011-09-14

Our service core dumped last night. This morning I inspected the stack in the core file from the production environment:


        gdb ./appname --core=core.1234 
        (gdb) bt
        

The backtrace looks like this:


#0  0x00007f5634262734 in std::_Rb_tree_rotate_right () from /usr/lib/libstdc++.so.6
#1  0x00007f56342628c1 in std::_Rb_tree_insert_and_rebalance () from /usr/lib/libstdc++.so.6
#2  0x00000000004b556c in std::_Rb_tree<unsigned int, std::pair<unsigned int const, unsigned int>, std::_Select1st<std::pair<unsigned int const, unsigned int> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::_M_insert_ (this=0x7fff3d253090, __x=0x0, __p=0x12e48f0, __v=@0x7fff3d251350)
    at /usr/include/c++/4.3/bits/stl_tree.h:854
#3  0x00000000004b63d2 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, unsigned int>, std::_Select1st<std::pair<unsigned int const, unsigned int> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::_M_insert_unique_ (this=0x7fff3d253090, __position={_M_node = 0x7f56182ee260}, __v=@0x7fff3d251350)
    at /usr/include/c++/4.3/bits/stl_tree.h:1201
#4  0x00000000004b65da in std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::insert (
    this=0x7fff3d253090, __position={_M_node = 0x7f56182ee260}, __x=@0x7fff3d251350) at /usr/include/c++/4.3/bits/stl_map.h:496
#5  0x00000000004b6680 in std::map<unsigned int, unsigned int, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::operator[] (
    this=0x7fff3d253090, __k=@0x7fff3d251508) at /usr/include/c++/4.3/bits/stl_map.h:419


The source code for frame #5 is:

 
mapped_type& operator[](const key_type& __k)
{
   // concept requirements
   __glibcxx_function_requires(_DefaultConstructibleConcept<mapped_type>)

   iterator __i = lower_bound(__k);
   // __i->first is greater than or equivalent to __k.
   if (__i == end() || key_comp()(__k, (*__i).first))
   {
      __i = insert(__i, value_type(__k, mapped_type())); // the statement on the crashing path
   }
   return (*__i).second;
}

The source code for frame #4 is:

iterator insert(iterator __position, const value_type& __x)
{
   return _M_t._M_insert_unique_(__position, __x); // the statement on the crashing path
}

The source code for frame #3 is:

typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::iterator
_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique_(const_iterator __position, const _Val& __v)
{
  // end(): the hint is end(), meaning lower_bound found no key in the tree that is >= the new key
  if (__position._M_node == _M_end())
  {
	  if (size() > 0
		  && _M_impl._M_key_compare(_S_key(_M_rightmost()),
									_KeyOfValue()(__v)))
		return _M_insert_(0, _M_rightmost(), __v);
	  else
		return _M_insert_unique(__v).first;
  }
  else if (_M_impl._M_key_compare(_KeyOfValue()(__v), _S_key(__position._M_node)))
  {
	  // First, try before...
	  const_iterator __before = __position;
	  if (__position._M_node == _M_leftmost()) // begin()
	  {
		 return _M_insert_(_M_leftmost(), _M_leftmost(), __v);
	  }
	  else if (_M_impl._M_key_compare(_S_key((--__before)._M_node), _KeyOfValue()(__v)))
	  {
		  if (_S_right(__before._M_node) == 0)
		  {
			 return _M_insert_(0, __before._M_node, __v); // the statement on the crashing path

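The analysis so far shows that the red-black tree's _M_insert_unique_ did not take the if (__position._M_node == _M_end()) branch but the else if (...) branch. In other words, the hint that lower_bound returned inside operator[] pointed at a node the tree still held, with a key greater than the one being inserted, so from this thread's point of view the entry was still in the map. The production log, however, shows that the key had already been dropped, in which case the insert should have gone through the first if branch.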
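The two branches correspond to the two things lower_bound can return when operator[] misses. The C++ sketch below illustrates that mapping on a small single-threaded map; the names Map, hint_is_end, hint_is_greater_node, and demo_hint_cases are hypothetical helpers introduced only for this illustration and are not part of libstdc++ or of the original program.

```cpp
#include <cassert>
#include <map>

typedef std::map<unsigned int, unsigned int> Map;

// True when lower_bound(key) == end(), i.e. no stored key is >= key.
// With this hint, the hinted insert takes the first `if` branch.
bool hint_is_end(const Map& m, unsigned int key) {
    return m.lower_bound(key) == m.end();
}

// True when key is absent and lower_bound(key) points at an existing node
// whose key is greater. With this hint, the hinted insert takes the
// `else if` branch seen in the backtrace.
bool hint_is_greater_node(const Map& m, unsigned int key) {
    Map::const_iterator it = m.lower_bound(key);
    return it != m.end() && key < it->first;
}

// Hypothetical self-check covering both cases.
void demo_hint_cases() {
    Map m;
    m[10] = 1;
    m[30] = 3;

    assert(hint_is_end(m, 40));           // nothing >= 40: first branch
    assert(hint_is_greater_node(m, 20));  // hint is the node with key 30
    assert(!hint_is_greater_node(m, 10)); // key present: operator[] does not insert

    // operator[] on a missing key is a write: it inserts a default value.
    unsigned int before = static_cast<unsigned int>(m.size());
    (void)m[20];
    assert(m.size() == before + 1);
}
```

Note that a lookup written as m[key] modifies the tree whenever the key is missing, so even code that looks like a read can trigger a rebalance.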


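That leaves only one explanation: several threads were operating on the map at the same time without synchronization, which left its internal state inconsistent. The fix is to add the following line to every code section that touches the map: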

boost::recursive_mutex::scoped_lock lock(m_mutex);

where m_mutex is declared as:

boost::recursive_mutex m_mutex;
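As a rough illustration of the same idea, the sketch below wraps a map so that every operation takes the lock first. It makes several assumptions: it uses std::recursive_mutex and std::lock_guard from C++11 as stand-ins for boost::recursive_mutex and its scoped_lock, and the class GuardedMap with its set, drop, get, and size methods is hypothetical, not the original program's code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical wrapper: every access to m_map goes through m_mutex.
class GuardedMap {
public:
    // Insert or overwrite; operator[] may rebalance the tree, so lock first.
    void set(unsigned int key, unsigned int value) {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        m_map[key] = value;
    }

    // Remove a key; returns true if it was present.
    bool drop(unsigned int key) {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        return m_map.erase(key) > 0;
    }

    // Read without inserting; find() never modifies the tree,
    // but it still needs the lock while writers may be active.
    bool get(unsigned int key, unsigned int& value) const {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        std::map<unsigned int, unsigned int>::const_iterator it = m_map.find(key);
        if (it == m_map.end())
            return false;
        value = it->second;
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::recursive_mutex> lock(m_mutex);
        return m_map.size();
    }

private:
    mutable std::recursive_mutex m_mutex;  // stand-in for boost::recursive_mutex
    std::map<unsigned int, unsigned int> m_map;
};

// Hypothetical single-threaded self-check of the wrapper's behaviour.
void guarded_map_demo() {
    GuardedMap gm;
    gm.set(1, 100);
    unsigned int v = 0;
    assert(gm.get(1, v) && v == 100);
    assert(gm.drop(1));
    assert(!gm.get(1, v));
    assert(gm.size() == 0);
}
```

Keeping the map private behind such a wrapper makes it harder to forget the lock at a call site. Readers need the lock as well as writers, because a concurrent insert or erase can rotate nodes while a lookup is walking the tree.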


