重學c#系列——字典(十一)

敖毛毛發表於2021-10-17

原文網址 : https://www.cnblogs.com/aoximin/p/15375645.html

前言

重學c#系列繼續更新，簡單看一下字典的原始碼。

看原始碼主要是解釋一下江湖中的兩個傳言:

字典foreach 順序是字典新增的順序
字典刪除元素後，字典順序將會改變

正文

那麼就從例項化開始看起，這裡我們假定key 是string 情況下開始看。

一般我們之間例項化:

Dictionary<string, string> keys = new Dictionary<string, string>();

那麼看下內部的例項化是怎麼樣的。

public Dictionary() : this(0, null) { }

看下這個this 是什麼:

public Dictionary(int capacity, IEqualityComparer<TKey>? comparer)
{
	if (capacity < 0)
	{
		ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
	}

	if (capacity > 0)
	{
		Initialize(capacity);
	}

	if (comparer != null && comparer != EqualityComparer<TKey>.Default) // first check for null to avoid forcing default comparer instantiation unnecessarily
	{
		_comparer = comparer;
	}

	// Special-case EqualityComparer<string>.Default, StringComparer.Ordinal, and StringComparer.OrdinalIgnoreCase.
	// We use a non-randomized comparer for improved perf, falling back to a randomized comparer if the
	// hash buckets become unbalanced.

	if (typeof(TKey) == typeof(string))
	{
		if (_comparer is null)
		{
			_comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer;
		}
		else if (ReferenceEquals(_comparer, StringComparer.Ordinal))
		{
			_comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinal;
		}
		else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase))
		{
			_comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase;
		}
	}
}

首先有個引數，capacity 這個表示容量，然後有另外一個引數comparer，從名字上看來是用來比較排序的。

那麼這時候可以大膽猜測一下字典的遍歷迴圈是否和比較器（comparer）有關呢？不過這裡預設是空的，暫時就不關心了。

繼續往下看:

if (typeof(TKey) == typeof(string))
{
	if (_comparer is null)
	{
		_comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer;
	}
	else if (ReferenceEquals(_comparer, StringComparer.Ordinal))
	{
		_comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinal;
	}
	else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase))
	{
		_comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase;
	}
}

一般來說我們的key一般都是string，那麼預設就是NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer。

如果遍歷和_comparer 有關的話，那麼NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer 就很重要，但是現在還不能確定，先不看。

先看一下新增元素的情況。

public void Add(TKey key, TValue value)
{
	bool modified = TryInsert(key, value, InsertionBehavior.ThrowOnExisting);
	Debug.Assert(modified); // If there was an existing key and the Add failed, an exception will already have been thrown.
}

TryInsert 比較長，那麼就一部分一部分看。

if (key == null)
{
	ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}

if (_buckets == null)
{
	Initialize(0);
}
Debug.Assert(_buckets != null);
Entry[]? entries = _entries;
Debug.Assert(entries != null, "expected entries to be non-null");

_buckets 一開始是是null，那麼先來看一下Initialize。

private int Initialize(int capacity)
{
	int size = HashHelpers.GetPrime(capacity);
	int[] buckets = new int[size];
	Entry[] entries = new Entry[size];

	// Assign member variables after both arrays allocated to guard against corruption from OOM if second fails
	_freeList = -1;
#if TARGET_64BIT
	_fastModMultiplier = HashHelpers.GetFastModMultiplier((uint)size);
#endif
	_buckets = buckets;
	_entries = entries;

	return size;
}

那麼看一下初始化的時候dic 的size是多少，通過 HashHelpers.GetPrime(capacity)。

我們知道如果我們沒有指定capacity，那麼capacity預設是0；

public static int GetPrime(int min)
{
	if (min < 0)
		throw new ArgumentException(SR.Arg_HTCapacityOverflow);

	foreach (int prime in s_primes)
	{
		if (prime >= min)
			return prime;
	}

	// Outside of our predefined table. Compute the hard way.
	for (int i = (min | 1); i < int.MaxValue; i += 2)
	{
		if (IsPrime(i) && ((i - 1) % HashPrime != 0))
			return i;
	}
	return min;
}

這個s_primes 我貼一下。

private static readonly int[] s_primes =
{
	3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
	1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
	17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
	187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
	1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369
};

把0帶入進去，那麼結果應該是3，看來一開始是3個儲存桶，那麼我們可以思考一下這個儲存桶數量是否和效能有關，如果動態指定後效能是否更好，後面會在細節篇介紹。

TryInsert 這個方法繼續往下看:

IEqualityComparer<TKey>? comparer = _comparer;
uint hashCode = (uint)((comparer == null) ? key.GetHashCode() : comparer.GetHashCode(key));

uint collisionCount = 0;
ref int bucket = ref GetBucket(hashCode);
int i = bucket - 1; // Value in _buckets is 1-based

在TryInsert 出現了_comparer，每個comparer 有自己的hashcode 方式，那麼這個hashcode 會決定儲存到哪個儲存桶裡面嗎。

這裡我們假設是在string情況下, 如果我們在不填寫comparer的情況下，預設是OrdinalComparer。

private sealed class OrdinalComparer : NonRandomizedStringEqualityComparer
{
  internal OrdinalComparer(IEqualityComparer<string> wrappedComparer)
	: base(wrappedComparer)
  {
  }

  public override bool Equals(string x, string y)
  {
	return string.Equals(x, y);
  }

  public override int GetHashCode(string obj)
  {
	return obj.GetNonRandomizedHashCode();
  }
}

通過下面這個comparer.GetHashCode，獲取了hashcode。

介紹一下GetNonRandomizedHashCode 這個哈，這個就是說每個string 每次生成的都是確定的hashcode,而且其碰撞的可能性非常低,如果想看的話可以看一下原始碼，這裡非重點，或許會在後續的介紹hash中說明一下。

uint hashCode = (uint)((comparer == null) ? key.GetHashCode() : comparer.GetHashCode(key));

然後通過hashcode 獲取在bucket:

ref int bucket = ref GetBucket(hashCode);
int i = bucket - 1; // Value in _buckets is 1-based

檢視:GetBucket

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private ref int GetBucket(uint hashCode)
{
	int[] buckets = _buckets!;
#if TARGET_64BIT
	return ref buckets[HashHelpers.FastMod(hashCode, (uint)buckets.Length, _fastModMultiplier)];
#else
	return ref buckets[hashCode % (uint)buckets.Length];
#endif
}

通過上面這兩段程式碼得知，如果是第一次新增元素，那麼bucket 是0，i 為-1；

繼續在tryinsert 往下看。

while (true)
{
	// Should be a while loop https://github.com/dotnet/runtime/issues/9422
	// Test uint in if rather than loop condition to drop range check for following array access
	if ((uint)i >= (uint)entries.Length)
	{
		break;
	}

	if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
	{
		if (behavior == InsertionBehavior.OverwriteExisting)
		{
			entries[i].value = value;
			return true;
		}

		if (behavior == InsertionBehavior.ThrowOnExisting)
		{
			ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException(key);
		}

		return false;
	}

	i = entries[i].next;

	collisionCount++;
	if (collisionCount > (uint)entries.Length)
	{
		// The chain of entries forms a loop; which means a concurrent update has happened.
		// Break out of the loop and throw, rather than looping forever.
		ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
	}
}

這裡i 為-1 然後entries.Length 是3。

所以(uint)i >= (uint)entries.Length 是true。

第一次我們新增的時候就直接break了。

值得一提的是另外一件事情，那就是當我們反編譯的時候，看到的是這樣的。

那麼這裡就有人問了，i = -1的時候，那麼這個時候不是會出現異常嗎？

實際上並不會，要知道當我們執行的時候比較的是二進位制程式碼，故而建議編譯後的除錯的時候把16進位制開啟。

if (_freeCount > 0)
{
	index = _freeList;
	Debug.Assert((StartOfFreeList - entries[_freeList].next) >= -1, "shouldn't overflow because `next` cannot underflow");
	_freeList = StartOfFreeList - entries[_freeList].next;
	_freeCount--;
}
else
{
	int count = _count;
	if (count == entries.Length)
	{
		Resize();
		bucket = ref GetBucket(hashCode);
	}
	index = count;
	_count = count + 1;
	entries = _entries;
}

ref Entry entry = ref entries![index];
entry.hashCode = hashCode;
entry.next = bucket - 1; // Value in _buckets is 1-based
entry.key = key;
entry.value = value; // Value in _buckets is 1-based
bucket = index + 1;
_version++;

// Value types never rehash
if (!typeof(TKey).IsValueType && collisionCount > HashHelpers.HashCollisionThreshold && comparer is NonRandomizedStringEqualityComparer)
{
	// If we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing
	// i.e. EqualityComparer<string>.Default.
	Resize(entries.Length, true);
}

這裡_freeCount 是0，從這個命名(空閒數量)來看結合別人說的那個remove 之後順序會變，猜測一下，是不是刪除的時候保留了一個空位，然後如果有空位然後就往裡面新增呢？

這個先不管只有知道如果一直新增那麼其一直是0。

那麼就按照_freeCount 是0的節奏走，那麼就是判斷是count == entries.Length 判斷是否滿了，如果滿了肯定就是擴容一下，調整一下之類的了。

然後總量加1，同時設定了左邊是最後一位，那麼猜測一下，如果一直這樣加的話，如果遍歷的時候沒有使用到comparer，那麼就是按照新增的順序了。

那麼看一下字典的迭代步驟:

public bool MoveNext()
{
	if (_version != _dictionary._version)
	{
		ThrowHelper.ThrowInvalidOperationException_InvalidOperation_EnumFailedVersion();
	}

	// Use unsigned comparison since we set index to dictionary.count+1 when the enumeration ends.
	// dictionary.count+1 could be negative if dictionary.count is int.MaxValue
	while ((uint)_index < (uint)_dictionary._count)
	{
		ref Entry entry = ref _dictionary._entries![_index++];

		if (entry.next >= -1)
		{
			_current = new KeyValuePair<TKey, TValue>(entry.key, entry.value);
			return true;
		}
	}

	_index = _dictionary._count + 1;
	_current = default;
	return false;
}

_index 一開始預設是0，那麼我們儲存的值是一個陣列，那麼如果按照上面這樣邏輯的遍歷，那麼是按照新增順序來的。

這裡有一個遍歷出來的條件，entry.next >= -1，那麼是否是如果刪除的話那麼就做一個標記呢？因為如果移動陣列，那麼的確是一件艱難的事情（效能）。

那麼看remove了。

public bool Remove(TKey key)
{
	// The overload Remove(TKey key, out TValue value) is a copy of this method with one additional
	// statement to copy the value for entry being removed into the output parameter.
	// Code has been intentionally duplicated for performance reasons.

	if (key == null)
	{
		ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
	}

	if (_buckets != null)
	{
		Debug.Assert(_entries != null, "entries should be non-null");
		uint collisionCount = 0;
		uint hashCode = (uint)(_comparer?.GetHashCode(key) ?? key.GetHashCode());
		ref int bucket = ref GetBucket(hashCode);
		Entry[]? entries = _entries;
		int last = -1;
		int i = bucket - 1; // Value in buckets is 1-based
		while (i >= 0)
		{
			ref Entry entry = ref entries[i];

			if (entry.hashCode == hashCode && (_comparer?.Equals(entry.key, key) ?? EqualityComparer<TKey>.Default.Equals(entry.key, key)))
			{
				if (last < 0)
				{
					bucket = entry.next + 1; // Value in buckets is 1-based
				}
				else
				{
					entries[last].next = entry.next;
				}

				Debug.Assert((StartOfFreeList - _freeList) < 0, "shouldn't underflow because max hashtable length is MaxPrimeArrayLength = 0x7FEFFFFD(2146435069) _freelist underflow threshold 2147483646");
				entry.next = StartOfFreeList - _freeList;

				if (RuntimeHelpers.IsReferenceOrContainsReferences<TKey>())
				{
					entry.key = default!;
				}

				if (RuntimeHelpers.IsReferenceOrContainsReferences<TValue>())
				{
					entry.value = default!;
				}

				_freeList = i;
				_freeCount++;
				return true;
			}

			last = i;
			i = entry.next;

			collisionCount++;
			if (collisionCount > (uint)entries.Length)
			{
				// The chain of entries forms a loop; which means a concurrent update has happened.
				// Break out of the loop and throw, rather than looping forever.
				ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
			}
		}
	}
	return false;
}

這裡就直接說明一下這個思路哈。

首先給出一張概念圖:

這張是什麼意思呢？字典實際上預設情況下實際上是3個資料桶。

但是資料本身是存在陣列中的，然後通過資料key的hashcode 來決定資料放在哪個資料桶中。

且同一個資料桶中是鏈狀結構的，那麼刪除步驟就是下面這個概念圖。

就是進行連結串列刪除一樣，同樣把這個資料的key和value 清空，然後把next進行弄為比-1還小的數StartOfFreeList - _freeList，StartOfFreeList 預設是-3，_freeList 最小是-1，這樣就在陣列中標記刪除了。

那麼我們再來看新增的時候，如果有刪除的位置的時候的程式碼。

if (_freeCount > 0)
{
	index = _freeList;
	Debug.Assert((StartOfFreeList - entries[_freeList].next) >= -1, "shouldn't overflow because `next` cannot underflow");
	_freeList = StartOfFreeList - entries[_freeList].next;
	_freeCount--;
}

如果有刪除的位置那麼會頂替刪除的位置。

那麼刪除的模型是怎麼樣的？看StartOfFreeList - entries[_freeList].next。

這裡刪除邏輯其實是這樣的。

刪除的我們指定只是做了標記，但是這個標記可不是這麼簡單的標記，而是形成了鏈狀結構，每次如果新增的時候就替補了最後一個刪除為，然後最後一個刪除位又得到了更新。

這裡給出一個例子。

static void Main(string[] args)
{
	Console.WriteLine("Hello World!");
	Dictionary<string, string> keys = new Dictionary<string, string>();
	keys.Add("aaa","aaa");

	keys.Add("cccc", "cccc");

	keys.Add("ddddd", "ddddd");

	keys.Remove("aaa");

	keys.Remove("cccc");

	keys.Add("xxxx", "bbbbb");

	keys.Add("zzzz", "bbbbb");

	foreach (var a in keys)
	{
		Console.WriteLine(a.Key);
	}

	Console.ReadLine();
}

結

那麼得出結論。

1. 字典foreach 順序是字典新增的順序

2. 字典刪除元素後，字典順序將會改變

如果字典一直只新增，那麼會foreach 會是原來的順序。

如果字典進行了刪除，那麼這個最後一個刪除位將會成為下一個新增位。

整理查詢我就不介紹了，也是通過定位在哪個儲存桶，然後進行鏈狀查詢。

在該系列，在後面應該也會介紹一下list原始碼。

重學c#系列——datetime 和 datetimeoffset[二十一]
2021-12-25
C#
重學c#系列——list(十二)
2021-10-25
C#
重學c#系列——異常(六)
2020-07-31
C#
重學c#系列——c#執行原理(二)
2020-07-12
C#
重學c#系列—— IO流[三十四]
2023-02-04
C#
重學c#系列——linq(2) [二十八]
2022-12-15
C#
重學c#系列——linq(3) [二十九]
2022-12-17
C#
重學c#系列——對c#粗淺的認識(一)
2020-07-09
C#
重學c#系列——非託管例項(五)
2020-07-26
C#
重學c#系列——c# 託管和非託管資源(三)
2020-07-14
C#
重學c#系列—— 反射深入一點點[三十三]
2022-12-23
C#反射
重學c#系列——string.empty 和 "" 還有null[二十]
2021-12-24
C#Null
重學c#系列——逆變和協變[二十四]
2022-11-23
C#
重學c#————struct
2021-02-15
C#Struct
重學c#系列——異常續[異常注意事項](七)
2020-08-03
C#
重學c#系列——委託和匿名函式[二十五]
2022-11-24
C#函式
重學c#系列——訂閱釋出與事件[二十六]
2022-11-25
C#事件
字典 dictionary c# 0104
2021-01-04
C#
前端重學系列
2024-12-03
前端
重學c#系列——盛派自定義異常原始碼分析(八)
2020-08-24
C#原始碼
重學c#系列——c# 託管和非託管資源與程式碼相關(四)
2020-07-24
C#
Python零基礎學習筆記（二十一）——dict字典
2019-01-20
Python筆記
快速學習nodejs系列：十一、nodejs垃圾回收
2019-02-16
NodeJS
C#各種加密方法，字典排序
2024-11-04
C#加密排序
重學 JS 系列：聊聊繼承
2019-04-01
JS繼承
重學OC第二十一篇：@synchronized分析
2020-11-07
synchronized
深入理解javascript系列(十一):this
2018-06-14
JavaScript
初學Python——字典
2018-04-08
Python
字典樹學習
2018-09-02
重學 JS 系列：聊聊 new 操作符
2019-03-04
JS
C#初學者教程系列22：檔案操作
2020-12-11
C#
【Redis 系列】redis 學習十六，redis 字典(map) 及其核心編碼結構
2022-06-26
Redis
[Docker 系列]docker 學習十一，docker 總結和麵試題整理
2021-12-17
Docker
Python工具箱系列（五十一）
2024-03-29
Python
C#資料去重
2024-04-27
C#
【Redis 系列】redis 學習十一，redis 的哨兵模式詳解和實戰
2022-04-10
Redis模式
C#資料結構與演算法系列（二十一）：希爾排序演算法（ShellSort）
2020-07-04
C#資料結構演算法排序
密碼學系列——訊息摘要（c#程式碼實操）
2020-08-10
密碼學C#

重學c#系列——字典(十一)

前言

正文

結

相關文章