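1. Background knowledge
In mathematics, the original key is called the preimage, and the storage location that the mapping function h(key) maps it to is called the image. In IT, that storage location is called the hash address, and the mapping process is called hashing.
① Different keys may map to the same hash address once the hash function h(x) has been applied. This is a hash collision, and how often it happens depends on the hash function you define.
② The hash addresses produced by the hash function need somewhere to live; this run of contiguous, adjacent address space is called the hash table.
(1) With open hashing, elements that collide are stored outside the array space. You can read "open" as having to "open up" extra space for the colliding elements. It is also known as [separate chaining].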
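To make the collision in ① concrete, here is a minimal sketch. It assumes a toy hash function h(key) = key mod 7; the modulus 7 and the name `CollisionDemo` are chosen purely for illustration and do not come from any library.

```csharp
using System;

public static class CollisionDemo
{
    // Hypothetical toy hash function: maps a non-negative key to one of 7 addresses.
    public static int H(int key) => key % 7;

    public static void Main()
    {
        // Two different keys land on the same hash address: a collision.
        Console.WriteLine(H(10)); // 3
        Console.WriteLine(H(17)); // 3
    }
}
```

A different hash function would make a different set of keys collide, which is why the choice of function matters.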
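The idea of open hashing can be sketched in a few lines. This is only an illustration under stated assumptions: `ChainedTable` is a hypothetical class, keys are non-negative integers, and the hash function is the toy key mod size. Each array cell points to a chain that lives outside the array and collects every key that hashes to that address.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of separate chaining; not part of any library.
public class ChainedTable
{
    // One chain per hash address; chains are allocated only when first needed.
    private readonly List<int>[] chains;

    public ChainedTable(int size)
    {
        chains = new List<int>[size];
    }

    // Toy hash function for non-negative keys.
    private int H(int key) => key % chains.Length;

    // Returns true if the key was added, false if it was already present.
    public bool Add(int key)
    {
        int address = H(key);
        if (chains[address] == null) chains[address] = new List<int>();
        if (chains[address].Contains(key)) return false;
        chains[address].Add(key); // colliding keys simply extend the chain
        return true;
    }

    public bool Contains(int key)
    {
        var chain = chains[H(key)];
        return chain != null && chain.Contains(key);
    }

    // Number of keys stored in the chain at the given address.
    public int ChainLength(int address) => chains[address]?.Count ?? 0;
}

public static class ChainedTableDemo
{
    public static void Main()
    {
        var table = new ChainedTable(7);
        table.Add(10);
        table.Add(17); // collides with 10 at address 3
        Console.WriteLine(table.ChainLength(3)); // 2
        Console.WriteLine(table.Contains(17));   // True
    }
}
```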
2. Reading the diagrams
① The hash function
② Building the hash table + collision chains
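Load factor: the load factor is the ratio of the number of records n already stored in the hash table to the size m of the hash address space, i.e. α = n / m. The smaller α is, the less likely a collision becomes; the larger α is, the more likely. (α is at most 1 when every address holds a single record; with separate chaining, where colliding records live outside the array, it can exceed 1.)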
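As a quick worked example of α = n / m, the sketch below computes the load factor for a table with 7 addresses. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;

public static class LoadFactorDemo
{
    // alpha = n / m, where n is the number of stored records
    // and m is the size of the hash address space.
    public static double LoadFactor(int n, int m) => (double)n / m;

    public static void Main()
    {
        Console.WriteLine(LoadFactor(3, 7)); // roughly 0.43: collisions are fairly unlikely
        Console.WriteLine(LoadFactor(7, 7)); // 1: as many records as addresses
    }
}
```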
Do not test for equality of hash codes to determine whether two objects are equal. (Unequal objects can have identical hash codes.) To test for equality, call the ReferenceEquals or Equals method. (Important enough to read three times.)
When you only need to test for [logical equality], it makes no difference whether GetHashCode is overridden:
using System;

namespace Test
{
    public class Person
    {
        public string Name { get; set; }
        public int Age { get; set; }

        // Logical equality: two Person objects are equal when their names match.
        public override bool Equals(object other1)
        {
            return Name == (other1 as Person)?.Name;
        }

        public override int GetHashCode()
        {
            return Name.GetHashCode();
        }
    }

    public class Program
    {
        static void Main(string[] args)
        {
            var p1 = new Person { Name = "HJ", Age = 22 };
            var p2 = new Person { Name = "HJ", Age = 21 };
            var referenceEqual = (p1 == p2);     // "==" is not overloaded, so this compares references
            var logicEqual = (p1.Equals(p2));    // uses the overridden Equals
            Console.WriteLine($"“==” operator: reference equality (both variables point to one instance), always: {referenceEqual}");
            Console.WriteLine($"“Equals” method: logical equality, {logicEqual}");
            Console.Read();
        }
    }
}
----------------------------------------------------------------
output:
“==” operator: reference equality (both variables point to one instance), always: False
“Equals” method: logical equality, True
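But if you need the hash code to look up or insert an element quickly, you must override GetHashCode.
Let's look at some hard evidence:
When you use the LINQ Union() method to compute the union A∪B of two collections A and B, the natural move is to pass an equality comparer (IEqualityComparer) that defines when elements of A and B are logically equal: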
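A small experiment shows why. The sketch below uses two hypothetical classes, `NoHash` and `WithHash`, that share the same Equals logic; only the second overrides GetHashCode. HashSet is used here as a stand-in for any hash-based lookup.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: overrides Equals but deliberately NOT GetHashCode
// (the compiler warns about this, which is the point of the demo).
public class NoHash
{
    public string Name { get; set; }
    public override bool Equals(object obj) => Name == (obj as NoHash)?.Name;
}

// Hypothetical class: overrides both, so equal objects share a hash code.
public class WithHash
{
    public string Name { get; set; }
    public override bool Equals(object obj) => Name == (obj as WithHash)?.Name;
    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

public static class LookupDemo
{
    public static void Main()
    {
        var a = new HashSet<NoHash> { new NoHash { Name = "HJ" } };
        // The default hash code is per instance, so the lookup almost certainly
        // probes the wrong bucket and never reaches Equals.
        Console.WriteLine(a.Contains(new NoHash { Name = "HJ" }));

        var b = new HashSet<WithHash> { new WithHash { Name = "HJ" } };
        Console.WriteLine(b.Contains(new WithHash { Name = "HJ" })); // True
    }
}
```

The first lookup is expected to print False even though Equals would return true, because the hash table never gets as far as calling Equals.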
public static IEnumerable<TSource> Union<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer);

public static IEnumerable<TSource> Union<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    if (first == null) throw Error.ArgumentNull("first");
    if (second == null) throw Error.ArgumentNull("second");
    return UnionIterator<TSource>(first, second, comparer);
}

static IEnumerable<TSource> UnionIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    // Set is the hash table that Union builds internally
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource element in first)
        if (set.Add(element)) yield return element;
    foreach (TSource element in second)
        if (set.Add(element)) yield return element;
}
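Looking at how the Union source computes the union of A and B, you can see that it builds a hash table, Set, internally so it can look up and insert the union's elements quickly.
So we need to write a suitable hash function for the elements. Pay attention to the internal int InternalGetHashCode(TElement value) function in the code below: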
internal class Set<TElement>
{
    int[] buckets;   // contiguous address space holding the heads of the collision chains, a.k.a. hash buckets
    Slot[] slots;    // the linked lists used to resolve collisions
    int count;
    int freeList;
    IEqualityComparer<TElement> comparer;

    public Set() : this(null) { }

    public Set(IEqualityComparer<TElement> comparer)
    {
        if (comparer == null) comparer = EqualityComparer<TElement>.Default;
        this.comparer = comparer;
        buckets = new int[7];   // the bucket array and the slot array both start with length 7
        slots = new Slot[7];
        freeList = -1;
    }

    // If value is not in set, add it and return true; otherwise return false
    public bool Add(TElement value)
    {
        return !Find(value, true);
    }

    // Check whether value is in set
    public bool Contains(TElement value)
    {
        return Find(value, false);
    }

    // If value is in set, remove it and return true; otherwise return false
    public bool Remove(TElement value)
    {
        int hashCode = InternalGetHashCode(value);
        int bucket = hashCode % buckets.Length;
        int last = -1;
        for (int i = buckets[bucket] - 1; i >= 0; last = i, i = slots[i].next)
        {
            if (slots[i].hashCode == hashCode && comparer.Equals(slots[i].value, value))
            {
                if (last < 0)
                {
                    buckets[bucket] = slots[i].next + 1;
                }
                else
                {
                    slots[last].next = slots[i].next;
                }
                slots[i].hashCode = -1;
                slots[i].value = default(TElement);
                slots[i].next = freeList;
                freeList = i;
                return true;
            }
        }
        return false;
    }

    bool Find(TElement value, bool add)
    {
        int hashCode = InternalGetHashCode(value);
        for (int i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = slots[i].next)
        {
            if (slots[i].hashCode == hashCode && comparer.Equals(slots[i].value, value)) return true;
        }
        if (add)
        {
            int index;
            if (freeList >= 0)
            {
                index = freeList;
                freeList = slots[index].next;
            }
            else
            {
                if (count == slots.Length) Resize();
                index = count;
                count++;
            }
            int bucket = hashCode % buckets.Length;
            slots[index].hashCode = hashCode;
            slots[index].value = value;
            slots[index].next = buckets[bucket] - 1;
            buckets[bucket] = index + 1;
        }
        return false;
    }

    void Resize()
    {
        int newSize = checked(count * 2 + 1);   // try to grow the table
        int[] newBuckets = new int[newSize];
        Slot[] newSlots = new Slot[newSize];
        Array.Copy(slots, 0, newSlots, 0, count);
        for (int i = 0; i < count; i++)
        {
            int bucket = newSlots[i].hashCode % newSize;
            newSlots[i].next = newBuckets[bucket] - 1;
            newBuckets[bucket] = i + 1;
        }
        buckets = newBuckets;
        slots = newSlots;
    }

    internal int InternalGetHashCode(TElement value)
    {
        //Microsoft DevDivBugs 171937. work around comparer implementations that throw when passed null
        return (value == null) ? 0 : comparer.GetHashCode(value) & 0x7FFFFFFF;
    }

    internal struct Slot
    {
        internal int hashCode;
        internal TElement value;
        internal int next;
    }
}
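Because InternalGetHashCode delegates to comparer.GetHashCode, the comparer's hash function decides which bucket each element lands in. The sketch below makes that visible with two hypothetical comparers that share the same Equals but hash differently; the `Person`, `NameComparer` and `BrokenNameComparer` types are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical comparer: equal names are equal, and equal names hash alike.
public class NameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y) => x?.Name == y?.Name;
    public int GetHashCode(Person p) => p?.Name?.GetHashCode() ?? 0;
}

// Hypothetical comparer: same Equals, but the hash code is per instance,
// so logically equal persons usually end up in different buckets.
public class BrokenNameComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y) => x?.Name == y?.Name;
    public int GetHashCode(Person p) => RuntimeHelpers.GetHashCode(p);
}

public static class UnionDemo
{
    public static void Main()
    {
        var a = new[] { new Person { Name = "HJ", Age = 22 } };
        var b = new[] { new Person { Name = "HJ", Age = 21 } };

        // Consistent hash: the second "HJ" is recognised as a duplicate.
        Console.WriteLine(a.Union(b, new NameComparer()).Count()); // 1

        // Inconsistent hash: Equals is almost never consulted, so the
        // duplicate almost certainly slips through.
        Console.WriteLine(a.Union(b, new BrokenNameComparer()).Count());
    }
}
```

With the broken comparer the second count is expected to be 2, even though Equals says the two persons are the same.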
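Hence the best practice: whenever you override Equals so that it returns true for two objects, you should also override GetHashCode so that it returns the same hash code for both.
In everyday scenarios, experience will guide you in writing the hash function. In the Person class above, for example, two equal Name strings always produce the same hash code.
That's all. Fellow developers who have read the whole article should now be able to answer the five questions raised in the introduction.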
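When equality depends on more than one field, one common convention is to fold the field hash codes together with small prime multipliers. The sketch below is only an illustration: `Employee` is a hypothetical class, and the constants 17 and 31 are a widely used habit rather than a requirement.

```csharp
using System;

// Hypothetical class whose logical equality covers both Name and Age.
public class Employee
{
    public string Name { get; set; }
    public int Age { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as Employee;
        return other != null && Name == other.Name && Age == other.Age;
    }

    // Uses exactly the fields that Equals compares, so equal objects
    // are guaranteed to produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked // integer overflow is fine here; it simply wraps around
        {
            int hash = 17;
            hash = hash * 31 + (Name?.GetHashCode() ?? 0);
            hash = hash * 31 + Age.GetHashCode();
            return hash;
        }
    }
}

public static class EmployeeDemo
{
    public static void Main()
    {
        var e1 = new Employee { Name = "HJ", Age = 22 };
        var e2 = new Employee { Name = "HJ", Age = 22 };
        Console.WriteLine(e1.Equals(e2));                         // True
        Console.WriteLine(e1.GetHashCode() == e2.GetHashCode());  // True
    }
}
```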