Exploring Go's map

This content is licensed under CC 4.0 BY-SA.

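This article takes a detailed look at maps in Go: definition, initialization, checking whether a key exists, assignment, iteration, and the underlying implementation. It pays particular attention to the pitfalls of assigning to map values, to the map growth strategy, and to how hash collisions are resolved. An analysis of hmap, the structure underlying a Go map, shows how the map works.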

Defining a map

map[KeyType]ValueType

KeyType is the type of the keys and ValueType is the type of the values.

Initializing a map

    m1 := make(map[int]int)
    fmt.Println(m1) // map[]
    m2 := make(map[int]int,1)

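When initializing a map with make, the capacity argument is optional. If given, it is only a hint for the initial size, not a limit.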
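As a small illustration of the point above, the sketch below creates a map with a capacity hint and then inserts more entries than the hint. The helper name fill is hypothetical and exists only for this example.

```go
package main

import "fmt"

// fill (hypothetical helper) creates a map with the given capacity hint
// and inserts n keys into it.
func fill(hint, n int) map[int]int {
	m := make(map[int]int, hint)
	for i := 0; i < n; i++ {
		m[i] = i * i
	}
	return m
}

func main() {
	// The hint of 1 does not limit the map; it grows as needed.
	m := fill(1, 100)
	fmt.Println(len(m)) // 100
}
```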

Checking whether a key exists in a map

package main

import "fmt"
  
func main() {
    m1 := make(map[int]int)
    m1[1] = 1
    m1[2] = 2
    v, ok := m1[1] // two values are returned; the bool reports whether key 1 is present in the map
    if ok {
        fmt.Println(v)
    } else {
        fmt.Println("without this key")
    }
}
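The two-value form matters most when a stored value equals the zero value of its type, because a plain m[k] cannot tell that case apart from a missing key. A minimal sketch, where the helper name lookup is an assumption made for this example:

```go
package main

import "fmt"

// lookup (hypothetical helper) describes whether k is present in m.
func lookup(m map[string]int, k string) string {
	if v, ok := m[k]; ok {
		return fmt.Sprintf("%s=%d", k, v)
	}
	return k + " is missing"
}

func main() {
	m := map[string]int{"a": 0} // "a" is present with the zero value

	fmt.Println(m["a"], m["b"]) // 0 0 (indistinguishable)
	fmt.Println(lookup(m, "a")) // a=0
	fmt.Println(lookup(m, "b")) // b is missing
}
```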

Assigning to map values

package main

import (
    "fmt"
)

type S struct {
    s string
}

func main() {
    intmap := make(map[int]int, 0)
    intmap[1] = 1
    fmt.Println(intmap[1])  //1
    intmap[1] = 2
    fmt.Println(intmap[1])  //2

    structMap := make(map[int]S, 0)
    tmps := S{s: "tmp1"}
    structMap[1] = tmps
    fmt.Println(structMap[1])   // {tmp1}
    structMap[1].s = "tmp2"     // compile error: cannot assign to struct field structMap[1].s in map
    fmt.Println(structMap[1])
}

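Why it fails: the value type of map[int]S is the struct S, so structMap[1] yields a copy of the stored value rather than a reference to it. Map elements are not addressable, so a field of that temporary copy cannot be assigned to, and the compiler rejects structMap[1].s = "tmp2".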

Fix 1:

    tmps1 := structMap[1]
    tmps1.s = "tmp2"
    structMap[1] = tmps1
    fmt.Println(structMap[1])

This first copies the value into tmps1, modifies the copy, and then copies it back with structMap[1] = tmps1. The whole process copies the struct twice, which can be costly for large structs.

Fix 2:

    structMap := make(map[int]*S, 0)
    tmps := S{s: "tmp1"}
    structMap[1] = &tmps
    fmt.Println(structMap[1]) // &{tmp1}
    structMap[1].s = "tmp2"
    fmt.Println(structMap[1]) // &{tmp2}

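Here the map's value type is changed from S to *S. Each modification now goes through the pointer to the S it points to. The pointer stored in the map still cannot be modified in place, but the S it points to can be changed freely, and no struct copy is needed: only a pointer is assigned.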
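The two approaches can be put side by side in one runnable program. This is only a sketch; the function names renameByCopy and renameByPointer are hypothetical and chosen for this example.

```go
package main

import "fmt"

type S struct {
	s string
}

// renameByCopy updates a struct stored by value: copy out, modify, store back.
// Note that for a missing key it stores a new entry built from the zero value.
func renameByCopy(m map[int]S, k int, name string) {
	v := m[k]
	v.s = name
	m[k] = v
}

// renameByPointer updates the struct behind a stored pointer in place.
func renameByPointer(m map[int]*S, k int, name string) {
	if p, ok := m[k]; ok {
		p.s = name
	}
}

func main() {
	byValue := map[int]S{1: {s: "tmp1"}}
	renameByCopy(byValue, 1, "tmp2")
	fmt.Println(byValue[1].s) // tmp2

	byPointer := map[int]*S{1: {s: "tmp1"}}
	renameByPointer(byPointer, 1, "tmp2")
	fmt.Println(byPointer[1].s) // tmp2
}
```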

Assigning to a map inside a loop

package main

import "fmt"

type S struct {
    s string
    i int
}

func main() {
    Slist := []S{
        {s: "1", i: 1},
        {s: "2", i: 2},
        {s: "3", i: 3},
    }
    sMap := make(map[string]*S, 0)

    for _, s := range Slist {
        sMap[s.s] = &s
    }

    for k, v := range sMap {
        fmt.Println(k, "==>", v)
    }
    // Output before Go 1.22 (iteration order may vary):
    // 1 ==> &{3 3}
    // 2 ==> &{3 3}
    // 3 ==> &{3 3}
}

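All three keys in the map point to the last struct in the slice. In Go versions before 1.22, the loop variable s is a single variable that is reused on every iteration and receives a copy of each element, so sMap[s.s] = &s stores the same address every time; after the loop, that address holds a copy of the last struct. Since Go 1.22, each iteration gets its own s, so the three pointers are distinct, but they still refer to copies rather than to the elements of Slist.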

Fix:

package main

import "fmt"

type S struct {
    s string
    i int
}

func main() {
    Slist := []S{
        {s: "1", i: 1},
        {s: "2", i: 2},
        {s: "3", i: 3},
    }
    sMap := make(map[string]*S, 0)

    // for _, s := range Slist {
    //     sMap[s.s] = &s
    // }

    for i := 0; i < len(Slist); i++ {
        sMap[Slist[i].s] = &Slist[i]
    }

    for k, v := range sMap {
        fmt.Println(k, "==>", v)
    }
    
    // Output (iteration order may vary):
    // 1 ==> &{1 1}
    // 2 ==> &{2 2}
    // 3 ==> &{3 3}
}
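Another common way to avoid the shared loop variable is to copy it inside the loop body. Unlike indexing the slice, this stores pointers to per-iteration copies, not to the slice elements themselves. A minimal sketch, with the hypothetical helper name indexByName:

```go
package main

import "fmt"

type S struct {
	s string
	i int
}

// indexByName (hypothetical helper) builds a map from S.s to a pointer to a
// private copy of each element.
func indexByName(list []S) map[string]*S {
	m := make(map[string]*S, len(list))
	for _, s := range list {
		s := s // fresh variable per iteration; required before Go 1.22
		m[s.s] = &s
	}
	return m
}

func main() {
	m := indexByName([]S{
		{s: "1", i: 1},
		{s: "2", i: 2},
		{s: "3", i: 3},
	})
	fmt.Println(m["1"].i, m["2"].i, m["3"].i) // 1 2 3
}
```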

Under the hood of map

The following notes are collected and organized from several explanations.

A map is implemented as a hash table, and collisions are resolved by chaining.

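The underlying structure of a Go map is defined in go/src/runtime/map.go as the struct hmap (this is the bucket-based implementation used before the Swiss-table redesign in Go 1.24). The struct looks like this: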

// A header for a Go map.
type hmap struct {
        // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
        // Make sure this stays in sync with the compiler's definition.
        count     int // # live cells == size of map.  Must be first (used by len() builtin)
        flags     uint8
        B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
        noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
        hash0     uint32 // hash seed

        buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
        oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
        nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

        extra *mapextra // optional fields
}

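As the comments explain, B is the base-2 logarithm of the number of buckets, so the map has 2^B buckets. The buckets are where the keys and values are stored.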

Creation:

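Under the hood, creation calls the makemap function, whose main job is to initialize the fields of the hmap struct, for example computing B and setting the hash seed hash0.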

The function returns *hmap, a pointer. This is where makemap and makeslice differ:

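When a map or a slice is passed to a function, changes made to the map inside the function are visible to the caller, whereas changes to the slice itself (for example, growing it with append) are not. The reason is that one is a pointer (*hmap) and the other is a struct (the slice header). All Go function arguments are passed by value, so the parameter is a copy. A copied *hmap still points to the same map, so operations inside the function affect the caller's map. A copied slice header is a new value: reassigning it or changing its length does not affect the caller's slice, although writes to existing elements are still visible because both headers share the same backing array.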
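The difference can be observed directly. The sketch below uses three hypothetical helpers (addToMap, appendToSlice, setFirst) to show which changes reach the caller.

```go
package main

import "fmt"

// addToMap inserts into the caller's map through the copied map pointer.
func addToMap(m map[string]int) {
	m["x"] = 1
}

// appendToSlice only changes the local copy of the slice header.
func appendToSlice(s []int) {
	s = append(s, 1)
	_ = s // the caller's length is unchanged
}

// setFirst writes through the shared backing array.
func setFirst(s []int) {
	s[0] = 42
}

func main() {
	m := map[string]int{}
	addToMap(m)
	fmt.Println(len(m)) // 1

	s := []int{0}
	appendToSlice(s)
	fmt.Println(len(s)) // 1

	setFirst(s)
	fmt.Println(s[0]) // 42
}
```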

Access:

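To locate a key in the buckets, the key is hashed to a 64-bit value (on a 64-bit machine). Only the lowest B bits decide which bucket the key falls into. The top 8 bits of the hash are then compared against the tophash values (HOB hash) stored in that bucket, which holds at most 8 keys, to find the slot. When two different keys fall into the same bucket, a hash collision has occurred; collisions are resolved by chaining.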

For example, suppose the hash function produces the following result for some key:

10010111 | 000011110110110010001111001010100010010110010101010 | 00110

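Assume B = 5, so there are 2^5 = 32 buckets. First compute the hash of the key being looked up. The low 5 bits, 00110, select bucket 6. The high 8 bits, 10010111, are 151 in decimal, so the lookup searches bucket 6 for a slot whose tophash value (HOB hash) is 151. It finds slot 2, and the lookup is complete. If the key is not found in the bucket and the overflow pointer is not nil, the search continues through the overflow buckets until the key is found or every key slot, including those in all overflow buckets, has been examined.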
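The arithmetic in this example can be checked with a few lines of Go. This is a sketch of the bit manipulation only, not the runtime's code; the names bucketAndTop and exampleHash are hypothetical, and the real runtime additionally reserves a few small tophash values for bucket state, which this sketch ignores.

```go
package main

import (
	"fmt"
	"strconv"
)

// bucketAndTop splits a 64-bit hash as the text describes: the low B bits
// pick the bucket, the high 8 bits form the tophash.
func bucketAndTop(hash uint64, B uint8) (bucket uint64, top uint8) {
	mask := uint64(1)<<B - 1
	return hash & mask, uint8(hash >> 56)
}

// exampleHash parses the 64-bit hash used in the worked example.
func exampleHash() uint64 {
	bits := "10010111" + // high 8 bits
		"000011110110110010001111001010100010010110010101010" + // middle 51 bits
		"00110" // low 5 bits
	h, err := strconv.ParseUint(bits, 2, 64)
	if err != nil {
		panic(err)
	}
	return h
}

func main() {
	bucket, top := bucketAndTop(exampleHash(), 5)
	fmt.Println(bucket, top) // 6 151
}
```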

The lookup functions are declared as:

// returns a pointer to the value for key; if the key is not found,
// returns a pointer to the zero value of the value type (never nil)
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer

// returns a pointer to the value and whether the key exists
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)

// returns both key and value. if not found, returns nil, nil
func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer)

Insertion:

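Compute the hash of the key.
Use the low hmap.B bits of the hash (the hash modulo 2^B) to determine the bucket.
Check whether the key already exists; if it does, update its value in place.
If the key is not found, insert it.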
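The four steps above can be sketched with a toy chained hash table. This is a simplified illustration, not the runtime's layout: the types toyMap and entry are hypothetical, each bucket is a plain slice instead of 8 slots plus overflow buckets, and FNV-1a stands in for the runtime's hash function.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type entry struct {
	key   string
	value int
}

// toyMap is a toy table with 2^B buckets; each bucket is a chain of entries.
type toyMap struct {
	B       uint8
	buckets [][]entry
	count   int
}

func newToyMap(B uint8) *toyMap {
	return &toyMap{B: B, buckets: make([][]entry, 1<<B)}
}

// hashKey is step 1: compute the hash of the key (FNV-1a is an assumption).
func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// index is step 2: keep only the low B bits of the hash.
func (m *toyMap) index(key string) uint64 {
	return hashKey(key) & (uint64(1)<<m.B - 1)
}

func (m *toyMap) put(key string, value int) {
	idx := m.index(key)
	bucket := m.buckets[idx]
	for i := range bucket {
		if bucket[i].key == key { // step 3: key exists, update in place
			bucket[i].value = value
			return
		}
	}
	m.buckets[idx] = append(bucket, entry{key, value}) // step 4: insert
	m.count++
}

func (m *toyMap) get(key string) (int, bool) {
	for _, e := range m.buckets[m.index(key)] {
		if e.key == key {
			return e.value, true
		}
	}
	return 0, false
}

func main() {
	m := newToyMap(2)
	m.put("a", 1)
	m.put("a", 2) // updates the existing entry
	m.put("b", 3)
	fmt.Println(m.count)    // 2
	fmt.Println(m.get("a")) // 2 true
}
```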

Growth:

The Go source defines the load factor as follows:

loadFactor := count / (2^B)

count is the number of elements in the map, and 2^B is the number of buckets.

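As for when growth is triggered: whenever a new key is inserted into the map, a check is performed, and growth is triggered if either of the following 2 conditions holds: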

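The load factor exceeds the threshold, which the source defines as 6.5.
There are too many overflow buckets: when B < 15 (the bucket count 2^B is below 2^15), the number of overflow buckets reaches 2^B; when B >= 15 (the bucket count 2^B is at least 2^15), the number of overflow buckets reaches 2^15.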
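The two conditions can be written as small predicates. The sketch below is modeled on the checks described above rather than copied from the runtime; the constant names are assumptions, and the extra requirement that the map hold more than 8 elements (one full bucket) before the load factor check applies is taken from my reading of the runtime source, not from the text.

```go
package main

import "fmt"

const (
	slotsPerBucket = 8  // a bucket holds at most 8 key/value pairs
	loadFactorNum  = 13 // 13/2 = 6.5, kept as a fraction to avoid floats
	loadFactorDen  = 2
)

// overLoadFactor reports whether count elements in 2^B buckets exceed 6.5 per bucket.
func overLoadFactor(count int, B uint8) bool {
	buckets := 1 << B
	return count > slotsPerBucket && count*loadFactorDen > loadFactorNum*buckets
}

// tooManyOverflowBuckets reports whether the overflow bucket count has
// reached 2^min(B, 15).
func tooManyOverflowBuckets(noverflow int, B uint8) bool {
	if B > 15 {
		B = 15
	}
	return noverflow >= 1<<B
}

// needsGrow combines both conditions: either one triggers growth.
func needsGrow(count, noverflow int, B uint8) bool {
	return overLoadFactor(count, B) || tooManyOverflowBuckets(noverflow, B)
}

func main() {
	// With B = 1 there are 2 buckets, so the threshold is 13 elements.
	fmt.Println(overLoadFactor(13, 1), overLoadFactor(14, 1)) // false true
	fmt.Println(tooManyOverflowBuckets(3, 2))                  // false
	fmt.Println(tooManyOverflowBuckets(4, 2))                  // true
}
```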

Evacuation process: