
[C++][STL] Summary of the underlying implementation of unordered_set

news  Source: original  2025/4/28 11:08:28


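I drew a diagram to summarize this (ps: it is for my own reference, and I do not guarantee it is fully correct).
ps: I wrote this because I wanted to settle three questions:
1 Does unordered_set in C++11 contain a red-black tree? --> No
2 How is its growth designed? --> rehash
3 What is the internal growth factor? --> 1.0 (the maximum load factor)
These notes answer roughly these questions; they do not cover the whole internal structure of unordered_set.
The diagram summarizes the call logic, which is written out below.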

[Figure: call-flow diagram of unordered_set]

1 Inheritance relationship

template <class _Kty, class _Hasher = hash<_Kty>,
          class _Keyeq = equal_to<_Kty>, class _Alloc = allocator<_Kty>>
class unordered_set
    : public _Hash<_Uset_traits<_Kty, _Uhash_compare<_Kty, _Hasher, _Keyeq>, _Alloc, false>>

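Parameter	Meaning	Example
`_Kty`	Type of the stored key	`int`
`_Hasher`	Hash function	`std::hash<int>`
`_Keyeq`	Key equality comparator	`std::equal_to<int>`
`_Alloc`	Allocator	`std::allocator<int>`
`false`	Whether duplicate keys are allowed (`false` means they are not)	`unordered_set` requires unique keys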

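unordered_set:

Publicly inherits from _Hash (_Hash is the actual hash table implementation and contains the core logic such as collision handling and growth).
_Hash<> is a template whose argument is the _Uset_traits<> template (_Uset_traits configures the behavior of the hash table).
_Uset_traits<> is a template whose arguments are _Kty, _Uhash_compare<_Kty, _Hasher, _Keyeq>, _Alloc, false.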

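_Hash<T>
- _Hash is a class template that takes one template parameter T.
- Here T is _Uset_traits<_Kty, _Uhash_compare<_Kty, _Hasher, _Keyeq>, _Alloc, false>.
_Uset_traits<T>
- _Uset_traits is a template that packages the type information for unordered_set; T here stands for its argument list.
- Here the arguments are _Kty, _Uhash_compare<_Kty, _Hasher, _Keyeq>, _Alloc, false.
  - _Kty: the key type.
  - _Uhash_compare<_Kty, _Hasher, _Keyeq>: wraps the hash function _Hasher and the key comparator _Keyeq.
  - _Alloc: the memory allocator.
  - false: whether duplicate keys are allowed (false means they are not).
_Uhash_compare<T>
- _Uhash_compare is a template responsible for hashing and comparison; T here stands for its argument list.
- Here the arguments are _Kty, _Hasher, _Keyeq.
- _Uhash_compare is the key piece in constructing the hash table and is covered in the next section.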

_Uhash_compare<>

1 It holds an object named _Mypair:

 _Compressed_pair<_Hasher, _Compressed_pair<_Keyeq, float>> _Mypair;

Parameter	Role
_Kty	Key type
_Hasher	Hash function type
_Keyeq	Key comparator type

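A pair (that is, _Compressed_pair) stores the hash function, the key comparator, and the maximum load factor (the growth factor).
Structure of _Mypair: the outer pair holds _Hasher first and an inner pair second; the inner pair holds _Keyeq first and the float second.

The _Uhash_compare constructor initializes the maximum load factor, _Mypair._Myval2._Myval2, to 0.0f.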

2 It defines the initial value of bucket_size

template <class _Kty, class _Hasher, class _Keyeq>
class _Uhash_compare
    : public _Uhash_choose_transparency<_Kty, _Hasher, _Keyeq> { // traits class for unordered containers
public:
    enum { // parameters for hash table
        bucket_size = 1 // 0 < bucket_size
    };

Growth

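ps: a default-constructed unordered_set already reports max_load_factor() == 1.0, which is the default the C++ standard requires; it is not set only after the first growth. As far as I can tell from the headers, the _Hash constructor overwrites the 0.0f mentioned above with bucket_size (1).
Quantities involved:
Actual load factor: computed as element count / bucket count (internally the buckets live in _Vec, which should be a vector, though I did not step in to check) (roughly in the 0.5-1.0 range)
Maximum load factor: _Mypair._Myval2._Myval2 --> not modified unless max_load_factor(float) is called explicitly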

rehash growth code

 void rehash(size_type _Buckets) { // rebuild table with at least _Buckets buckets
        // don't violate a.bucket_count() >= a.size() / a.max_load_factor() invariant:
        _Buckets = (_STD max) (_Min_load_factor_buckets(_List.size()), _Buckets);
        if (_Buckets <= _Maxidx) { // we already have enough buckets; nothing to do
            return;
        }
        _Forced_rehash(_Buckets);
    }

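_Min_load_factor_buckets(_List.size()) --> uses the current maximum load factor to compute the minimum number of buckets needed to hold the stored elements
_Buckets = (_STD max) (_Min_load_factor_buckets(_List.size()), _Buckets); --> _Buckets becomes the larger of the _Buckets value passed in and the value of _Min_load_factor_buckets(_List.size())
if (_Buckets <= _Maxidx) --> if _Buckets <= _Maxidx, return immediately; otherwise grow the table (_Forced_rehash(_Buckets);)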

_Forced_rehash

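_Forced_rehash() is the core operation when unordered_set grows (rehashes).

How it works (overall idea)

Compute the new bucket count (at least the _Buckets value passed in from rehash), then use bit operations to round it up to a power of two (via the ceiling of log2(x)).
If the bucket count exceeds the maximum allowed, throw an exception.
Reallocate the bucket storage.
Redistribute all elements from the old buckets into the new buckets (following the new hash distribution).
Update the hash table metadata (mask, maximum bucket index, and so on).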

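ps: bit operations speed up bucket lookup: the bucket index is computed with a bitwise AND (&) instead of a modulo (%).

A bitwise AND is generally cheaper than a modulo, which needs a division.
hash % bucket_count → hash & (bucket_count - 1)
When the number of buckets (bucket_count) is a power of two, hash % bucket_count and
hash & (bucket_count - 1) give the same result.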

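Where the key operations are

Step	Key operation	Explanation
Adjust bucket count	`_Ceiling_of_log_2()`	Round up to a power of two
Allocate memory	`_Assign_grow()`	Allocate the new buckets
Update index mask	`_Mask = _Buckets - 1`	Set the bit mask for the new bucket index
Re-insert	`_Mylist::_Scary_val::_Unchecked_splice()`	Splice the elements into their buckets
Update bucket pointer	`_Bucket_lo = _Inserted`	Update the start position of the bucket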

Test code:

#include <iostream>
#include <unordered_set>
#include <string>
using namespace std;

int main()
{
    // Create an empty unordered_set container
    unordered_set<int> uset;

    cout << "uset bucket count: " << uset.bucket_count() << endl;
    cout << "uset current load factor: " << uset.load_factor() << endl;
    cout << "uset max load factor: " << uset.max_load_factor() << endl;

    cout << "-------reserve buckets suited to storing 9 elements------" << endl;
    uset.reserve(9);
    cout << "uset bucket count: " << uset.bucket_count() << endl;
    cout << "uset current load factor: " << uset.load_factor() << endl;
    cout << "uset max load factor: " << uset.max_load_factor() << endl;

    cout << "-------insert 4 elements into uset------" << endl;
    uset.insert(1);
    uset.insert(2);
    uset.insert(7);
    uset.insert(8);
    cout << "uset bucket count: " << uset.bucket_count() << endl;
    cout << "uset current load factor: " << uset.load_factor() << endl;
    cout << "uset max load factor: " << uset.max_load_factor() << endl;
    cout << "------------------------------" << endl;
    // Call bucket() to get the index of the bucket that holds the element
    cout << "element 1 is in bucket: " << uset.bucket(1) << endl;

    // Use hash_function() to compute the bucket index by hand
    auto fn1 = uset.hash_function();
    cout << "computed bucket for element 1: " << fn1(1) % uset.bucket_count() << endl;

    // Call bucket() to get the index of the bucket that holds the element
    cout << "element 2 is in bucket: " << uset.bucket(2) << endl;

    // Use hash_function() to compute the bucket index by hand
    auto fn2 = uset.hash_function();
    cout << "computed bucket for element 2: " << fn2(2) % uset.bucket_count() << endl;

    // Call bucket() to get the index of the bucket that holds the element
    cout << "element 7 is in bucket: " << uset.bucket(7) << endl;

    // Use hash_function() to compute the bucket index by hand
    auto fn = uset.hash_function();
    cout << "computed bucket for element 7: " << fn(7) % uset.bucket_count() << endl;

    return 0;
}