C# Dictionary 的实际工作原理_c#dictionary.add

C# Dictionary 的实际工作原理_c#dictionary.add

编码文章call10242025-02-09 12:02:0110A+A-

Dictionary是 C# 中非常流行的数据结构,也是面试问题的热门选择。我已经使用了 10 亿次,我非常确定我了解它们的工作原理。但是,当我更深入地研究它们并检查实际代码时,我发现它们比我想象的还要好(也许您也是如此)。在本文中,我们将一起进行深入研究,甚至编写我们自己的词典教育副本。所以和我一起开始吧!Dictionary

为了确保我们的副本与实际代码匹配,我们将首先探索原始代码中的内容,然后删除所有不必要的内容,并在需要的地方添加日志 ()。只需复制两个主要方法就足够了:Add 和 GetValueOrDefault 来重新创建 的所有基本要素,因此这就是我们要执行的操作。但首先,让我们看看 a 中的字段 :Console.WriteLineDictionaryDictionary

我将使用参考源中的 .NET Framework 4.8 中的代码。尽管现代 .NET 代码稍微复杂一些,但基本要素仍然相同,因此 Framework 版本应该可以。

private struct Entry { 
public int hashCode;
public int next;
public TKey key;
public TValue value;
}

private int[] buckets;
private Entry[] entries;
private int count;
private int version;
private int freeList;
private int freeCount;
private IEqualityComparer<TKey> comparer;

和 properties 是优化技术的一部分,对于 a 的运行不是必需的 - 我们将把它们传递给我们的 replica。 只是一个 changes 计数器,我们也会传递它。为简单起见,我们还将使用而不是允许外部配置。添加打印当前状态(所有属性的值)的方法,我们将得到:freeListfreeCountDictionaryversionEqualityComparer.Default

public class EducationalDictionary<TKey, TValue>
{
private struct Entry {
public int hashCode;
public int next;
public TKey key;
public TValue? value;

string ValueString => value == ? "" : value!.ToString()!;
public override string ToString()
{
return $"{key} - {ValueString} + (next = {next})";
}
}

private int[] buckets;
private Entry[] entries;
private int count;
private readonly EqualityComparer<TKey> comparer = EqualityComparer.Default;
public void PrintFullState(string preface)
{
Console.WriteLine();
Console.Write(this.ToString(preface));
}
public string ToString(string preface)
{
StringBuilder result = new();

result.AppendLine($"{preface}, state:");
result.AppendLine();
result.AppendLine($"buckets: [{String.Join(", ", buckets!)}]");
result.AppendLine($"entries:");
for (int i = 0; i < entries.Length; i++)
{
result.AppendLine($" [{i}] = {entries[i]}");
}
result.AppendLine($"count: {count}");
return result.ToString();
}
}

源代码中的方法实质上是对另一个方法的调用:Add

public void Add(TKey key, TValue value) { 
Insert(key, value, true);
}

该方法可能看起来很可怕,但别担心,我们会一点一点地剖析它:Insert

private void Insert(TKey key, TValue value, bool add) {
if( key == ) {
ThrowHelper.ThrowArgumentException(ExceptionArgument.key);
}

if (buckets == ) Initialize(0);
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int targetBucket = hashCode % buckets.Length;

#if FEATURE_RANDOMIZED_STRING_HASHING
int collisionCount = 0;
#endif

for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
if (add) {
ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
}
entries[i].value = value;
version++;
return;
}

#if FEATURE_RANDOMIZED_STRING_HASHING
collisionCount++;
#endif
}
int index;
if (freeCount > 0) {
index = freeList;
freeList = entries[index].next;
freeCount--;
}
else {
if (count == entries.Length)
{
Resize();
targetBucket = hashCode % buckets.Length;
}
index = count;
count++;
}

entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = value;
buckets[targetBucket] = index;
version++;

#if FEATURE_RANDOMIZED_STRING_HASHING

#if FEATURE_CORECLR
// In case we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing
// in this case will be EqualityComparer.Default.
// Note, randomized string hashing is turned on by default on coreclr so EqualityComparer.Default will
// be using randomized string hashing

if (collisionCount > HashHelpers.HashCollisionThreshold && comparer == NonRandomizedStringEqualityComparer.Default)
{
comparer = (IEqualityComparer) EqualityComparer<string>.Default;
Resize(entries.Length, true);
}
#else
if(collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer))
{
comparer = (IEqualityComparer) HashHelpers.GetRandomizedEqualityComparer(comparer);
Resize(entries.Length, true);
}
#endif // FEATURE_CORECLR

#endif
}

它还调用其他方法:

private void Initialize(int capacity) {
int size = HashHelpers.GetPrime(capacity);
buckets = new int[size];
for (int i = 0; i < buckets.Length; i++) buckets[i] = -1;
entries = new Entry[size];
freeList = -1;
}
private void Resize(int newSize, bool forceNewHashCodes) {
Contract.Assert(newSize >= entries.Length);
int[] newBuckets = new int[newSize];
for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
Entry[] newEntries = new Entry[newSize];
Array.Copy(entries, 0, newEntries, 0, count);
if(forceNewHashCodes) {
for (int i = 0; i < count; i++) {
if(newEntries[i].hashCode != -1) {
newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
}
}
}
for (int i = 0; i < count; i++) {
if (newEntries[i].hashCode >= 0) {
int bucket = newEntries[i].hashCode % newSize;
newEntries[i].next = newBuckets[bucket];
newBuckets[bucket] = i;
}
}
buckets = newBuckets;
entries = newEntries;
}

最后,它们调用 上的方法。我们将复制这些,因为如果您不处理边缘情况,它们将非常微不足道。我们的版本将如下所示:HashHelpers

public static class HashHelpers
{
public static readonly int[] primes = {
3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};

public static int GetPrime(int min)
{
foreach (var prime in primes)
{
if (prime >= min) return prime;
}

return min;
}

public static int ExpandPrime(int oldSize)
{
return GetPrime(2 * oldSize);
}
}

因为我们将在调整大小完成后删除 logic 和 print state 。老实说,代码并不是很重要,它本质上是重新排列和针对新大小。这是我们的结果:Resizeif (forceNewHashCodes)entriesbuckets

private void Resize(int newSize) {
var newBuckets = new int[newSize];
for (var i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
var newEntries = new Entry[newSize];
Array.Copy(entries, 0, newEntries, 0, count);

for (var i = 0; i < count; i++) {
if (newEntries[i].hashCode >= 0) {
var bucket = newEntries[i].hashCode % newSize;
newEntries[i].next = newBuckets[bucket];
newBuckets[bucket] = i;
}
}
buckets = newBuckets;
entries = newEntries;

PrintFullState("\u2194\ufe0f Resize");
}

我们将初始化逻辑从 straight 移动到构造函数(因为它将更容易处理可空性):if (buckets == ) Initialize(0);

以及初始化完成后的打印状态

public EducationalDictionary()
{
var size = HashHelpers.GetPrime(0);
buckets = new int[size];
for (var i = 0; i < buckets.Length; i++) buckets[i] = -1;
entries = new Entry[size];

PrintFullState("\ud83d\ude80 Initialized");
}

以下是我们将执行的更改列表:Add

  • 删除参数验证

  • 删除#
    FEATURE_RANDOMIZED_STRING_HASHING

  • 删除 logic under,因为我们还是删除了参数if (freeCount > 0)

  • 删除 loop,检查已设置的 key:for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {

  • 添加后的打印状态

public void Add(TKey key, TValue value) {
var hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
var targetBucket = hashCode % buckets!.Length;

if (count == entries!.Length)
{
Resize(HashHelpers.ExpandPrime(count));
targetBucket = hashCode % buckets.Length;
}

var index = count;
count++;

entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = value;
buckets[targetBucket] = index;

Console.WriteLine();
PrintFullState($"\ud83d\udce5 Add: {key} - {value}. hashCode = {hashCode}, targetBucket = {targetBucket}");
}

现在看起来简单多了,对吧?现在让我们完成我们的类,实现 .GetValueOrDefault

获取价值

这一次的实现相当简单,甚至在原始代码中也是如此:GetValueOrDefault

internal TValue GetValueOrDefault(TKey key) {
int i = FindEntry(key);
if (i >= 0) {
return entries[i].value;
}
return default(TValue);
}

在搜索本身(在方法中):FindEntry

private int FindEntry(TKey key) {
if( key == ) {
ThrowHelper.ThrowArgumentException(ExceptionArgument.key);
}

if (buckets != ) {
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
}
}
return -1;
}

这里没有太多需要清理的地方。但是我们将添加大量日志,因为这是我们最重要的逻辑

private int FindEntry(TKey key)
{
var hashCode = comparer.GetHashCode(key!) & 0x7FFFFFFF;
var initialBucketIndex = hashCode % buckets.Length;

Console.WriteLine();
Console.WriteLine($"\ud83d\udd0e Search. Key = {key}. Initial Bucket Index {initialBucketIndex}");

for (var i = buckets[initialBucketIndex]; i >= 0; i = entries[i].next)
{
Console.WriteLine();
Console.WriteLine($"Comparing key from entries[{i}] ({entries[i]}) to {key}");

if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
{
Console.WriteLine($"Key is equal returning {i}");

return i;
}
else {
Console.WriteLine($"Key is not equal, moving to the next linked index ({entries[i].next})");
}
}

Console.WriteLine();
Console.WriteLine("Search exit condition met (i >= 0). Returning -1 (as not found)");
return -1;
}

这样就完成了我们的 .现在让我们使用它并查看我们准备的详细日志!EducationalDictionary

执行测试

我们将添加 4

点击这里复制本文地址 以上内容由文彬编程网整理呈现,请务必在转载分享时注明本文地址!如对内容有疑问,请联系我们,谢谢!
qrcode

文彬编程网 © All Rights Reserved.  蜀ICP备2024111239号-4