木鸟杂记

大规模数据系统

'Golang Notes (III): A Model for Understanding Slice'

Overview

In Golang, a slice is very similar to arrays in other languages, yet differs in many ways. As a result, beginners often develop misunderstandings and unwittingly fall into various pitfalls when using slices. This short article first starts from the official Go blog, laying out the slice-related syntax provided by the official documentation. Next, it presents a model for understanding slices using diagrams. Finally, it summarizes and analyzes some special usage scenarios, aiming to provide a clearer profile of slices from multiple perspectives.

If you do not wish to read the lengthy narrative, you can jump directly to the summary at the end.

go-slice-view-derive.pnggo-slice-view-derive.png

Author: 木鸟杂记 https://www.qtmuniao.com/2021/01/09/go-slice/, please indicate the source when reposting

Basic Syntax

This section is mainly based on the official Go blog. In Go, slices and arrays are closely related; slices are built on top of arrays but are more flexible. Therefore, in Go, the underlying arrays of slices are rarely used directly. However, to understand slices, we must start with arrays.

Arrays

Arrays in Go are defined by type + length. Unlike C and C++, arrays of different lengths in Go are different types, and the variable name is not a pointer to the first element of the array.

1
2
3
4
5
6
7
8
9
10
11
12
// Several ways to initialize arrays
var a [4]int // variable a is of type [4]int, each element is automatically initialized to the zero-value of int
b := [5]int{1,2,3,4} // variable b is of type [5]int, a different type from [4]int, and b[4] is automatically initialized to the zero-value of int
c := [...]int{1,2,3,4,5} // variable c is inferred as type [5]int, the same type as b

func echo(x [4]int) {
fmt.Println(x)
}

echo(a) // when calling echo, all elements of a are copied because Go function calls are pass-by-value
echo(b) // error
echo(([4]int)c) // error

To summarize, Go arrays have the following characteristics:

  1. The length is part of the type, so variables of type [4]int and [5]int cannot be assigned to each other, nor can they be cast to each other.
  2. An array variable is not a pointer, so passing it as an argument causes a full copy. Of course, you can avoid this copy by using the corresponding pointer type as the parameter type.

It can be seen that due to the shackles of length, the usefulness of Go arrays is greatly limited. Go cannot, like C/C++, convert arrays of arbitrary lengths into pointers to the corresponding type and then perform subscript operations. Of course, Go does not need to do this, because it has a higher-level abstraction—the slice.

Slices

Slices are very common in Go code, but slices are based on arrays underneath:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
type slice struct {
array unsafe.Pointer // pointer to the underlying array; yes, golang also has pointers
len int // length of the slice
cap int // length of the underlying array
}

// Several ways to initialize slices
s0 := make([]byte, 5) // using the make function, where len = cap = 5, each element is initialized to the zero-value of byte
s1 := []byte{0, 0, 0, 0, 0} // literal initialization, where len = cap = 5
var s2 []byte // automatically initialized to the "zero-value" of slice: nil

// make method specifying both len/cap, must satisfy len <= cap
s3 := make([]byte, 0, 5) // slice length len = 0, underlying array cap = 5
s4 := make([]byte, 5, 5) // equivalent to make([]byte, 5)

Compared to arrays, slices have the following advantages:

  1. Flexible operations; as the name suggests, they support powerful slicing operations.
  2. Free from the restriction of length; when passed as parameters, slices of different lengths can all be passed in the form of []T.
  3. When assigning or passing a slice, the entire underlying array is not copied; only the slice struct itself described above is copied.
  4. With the help of some built-in functions, such as append/copy, they can be easily extended and moved as a whole.

Slice Operations. Using slice operations, you can quickly extract, extend, assign, and move slices.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// Slicing operation, left-closed and right-open; if starting from the beginning or ending at the end, the corresponding index can be omitted
// The newly obtained slice shares the underlying array with the original slice, thus avoiding element copying
b := []byte{'g', 'o', 'l', 'a', 'n', 'g'}
b1 := b[1:4] // b1 == []byte{'o', 'l', 'a'}
b2 := b[:2] // b2 == []byte{'g', 'o'}
b3 := b[2:] // b3 == []byte{'l', 'a', 'n', 'g'}
b4 := b[:] // b4 == b

// Extension operation, requires the append function
// May cause reallocation of the underlying array, which will be analyzed in detail later
// Equivalent to b = append(b, []byte{',', 'h', 'i'}...)
b = append(b, ',', 'h', 'i') // b is now {'g', 'o', 'l', 'a', 'n', 'g', ',', 'h', 'i'}

// Assignment operation, requires the copy function
copy(b[:2], []byte{'e', 'r'}) // b is now {'e', 'r', 'l', 'a', 'n', 'g', ',', 'h', 'i'}

// Move operation, requires copy
copy(b[2:], b[6:]) // move length takes min(len(dst), len(src))
b = b[:5] // b is now {'e', 'r', ',', 'h', 'i'}

Parameter Passing. Slices of different lengths and capacities can all be passed in the form of []T.

1
2
3
4
5
6
7
8
9
b := []int{1,2,3,4}
c := []int{1,2,3,4,5}

func echo(x []int) {
fmt.Println(x)
}

echo(b) // when passing parameters, a new slice struct sharing the underlying array with the same len and cap is created
echo(c)

Related Functions. The main built-in functions related to slices are:

  1. make for creation
  2. append for extension
  3. copy for moving

Let us discuss their characteristics separately.

The signature of the make function when creating slices (it can also be used to create many other built-in structures) is func make([]T, len, cap) []T. This function first creates an array of length cap, then creates a new slice struct pointing to that array, and initializes len and cap according to the parameters.

append modifies the underlying array of the slice, but does not change the original slice; instead, it returns a new slice struct with a new length. Why not modify the original slice in place? Because functions in Go are pass-by-value; of course, this also reflects a certain functional-style preference in Go. Therefore, append(s, 'a', b'') will not modify the slice s itself; to achieve the purpose of modifying the variable s, you need to reassign s: s = append(s, 'a', b'').

Note that when appending, if the capacity (cap) of the underlying array is insufficient, a new array large enough to hold all elements will be created, similar to the underlying mechanism of vector in C++, and the values of the original array will be copied over before appending. If the underlying array of the original slice is no longer referenced by other slice variables, it will be reclaimed during GC.

The copy function is more like syntactic sugar, encapsulating batch assignment to slices into a function. Note that the copy length takes the smaller of the two slices. Moreover, you do not need to worry about overlapping when moving sub-slices of the same slice. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
package main

import (
"fmt"
)

// Intuitively assumed implementation of the copy function
// But this implementation would cause overlapping when copying sub-slices of the same slice
// Therefore, the actual implementation of copy should use extra space or copy from back to front
func myCopy(dst, src []int) {
l := len(dst)
if len(src) < l {
l = len(src)
}

for i := 0; i < l; i++ {
dst[i] = src[i]
}
}

func main() {
a := []int{0,1,3,4,5,6}

copy(a[3:], a[2:]) // a = [0 1 3 3 4 5]
// myCopy(a[3:], a[2:]) // a = [0 1 3 3 3 3]
fmt.Println(a)
}

A common usage scenario for copy is when you need to insert an element into the middle of a slice; use copy to move the fragment after the insertion point backward as a whole.

A Model for Understanding Slices

When first using slices, one often feels that the rules are complex and difficult to remember entirely; so I often wondered if there is a suitable model to capture the essence of slices.

One day, an immature idea suddenly popped up: a slice is a linear read-write view that hides the underlying array. This view avoids the common pointer arithmetic operations in C/C++, because users can derive slices without calculating offsets.

A slice uses only three variables, ptr/cap/len, to depict a window view, where ptr and ptr+cap are the start and end boundaries of the window, and len is the currently visible length of the window. A new view can be extracted by subscripting, and Go will automatically calculate the new ptr/len/cap. All views derived through slice expressions point to the same underlying array.

go slice viewgo slice view

Slice derivation automatically shares the underlying array to avoid array copying and improve efficiency; when appending elements, if the capacity of the underlying array is insufficient, append will automatically create a new array and return a slice view pointing to the new array, while the original slice view still points to the original array.

Slice Usage

This section summarizes some interesting points about using slices.

Zero-value and Empty-value. In Go, all types have a zero-value, which is used as the default value during initialization. The zero-value of a slice is nil.

1
2
3
4
5
6
7
func add(a []int) []int { // nil can be passed as an argument to the []int slice type
return append(a, 0, 1, 2)
}

func main() {
fmt.Println(add(nil)) // [0 1 2]
}

You can create an empty slice using make, with the same len/cap as the zero-value, but there are also the following small differences. If both are acceptable, nil is recommended.

1
2
3
4
5
6
7
8
9
10
11
12
func main() {
a := make([]int, 0)
var b []int

fmt.Println(a, len(a), cap(a)) // [] 0 0
fmt.Printf("%#v\n", a) // []int{}
fmt.Println(a==nil) // false

fmt.Println(b, len(b), cap(b)) // [] 0 0
fmt.Printf("%#v\n", b) // []int(nil)
fmt.Println(b==nil) // true
}

append Semantics. append first appends elements to the underlying array, then constructs a new slice and returns it. That is to say, even if we do not use the return value, the corresponding values will still be appended to the underlying array.

1
2
3
4
5
6
7
func main() {
a := make([]int, 0, 5)
_ = append(a, 0, 1, 2)
fmt.Println(a) // []
fmt.Println(a[:5]) // [0 1 2 0 0]; by using a slice expression to expand the window length, the appended values can be seen
fmt.Println(a[:6]) // panic; length out of bounds
}

Generating a Slice from an Array. You can generate a slice s of the desired length from an array a using slice syntax. At this point: the underlying array of s is a. In other words, using slice syntax on an array will not cause the array to be copied.

1
2
3
4
5
6
7
8
func main() {
a := [7]int{1,2,3}
s := a[:4]
fmt.Println(s) // [1 2 3 0]

a[3] = 4 // modifying a, the corresponding value in s also changes, indicating that the underlying array of s is a
fmt.Println(s) // [1 2 3 4]
}

Modifying the Right Boundary of a Slice View. In the view model proposed above, when performing a slice operation, the left boundary of the newly generated slice changes with the start parameter, but the right boundary remains unchanged, which is the end of the underlying array. If we want to modify its right boundary, we can use the three-parameter slice (Full slice Expression), adding a limited-capacity parameter.

A usage scenario for this feature is that if we want the new slice not to affect the original array when appending, we can modify its right boundary so that when appending, if the capacity is insufficient, a new underlying array is forced to be created.

go-full-slice-view-derive.pnggo-full-slice-view-derive.png

Summary

The core purpose of this article is to propose an easy-to-remember and easy-to-understand model for slices, to break down the ever-changing complexity of slice usage. To summarize, when understanding slices, we can approach it from two levels:

  1. Underlying data (underlying array)
  2. Upper view (slice)

The view has three key variables: array pointer (ptr), valid length (len), and view capacity (cap).

Through slice expressions, slices can be generated from arrays and from slices. This operation does not cause copying of array data. Through the append operation, whether array copying is performed depends on the cap of this view, and a view pointing to the new array is returned.

References

  1. 酷壳 coolshell: Go编程模式:切片,接口,时间和性能
  2. The Go Blog: Go slices: usage and internals


我是青藤木鸟,一个喜欢摄影、专注大规模数据系统的程序员,欢迎关注我的公众号:“木鸟杂记”,有更多的分布式系统、存储和数据库相关的文章,欢迎关注。 关注公众号后,回复“资料”可以获取我总结一份分布式数据库学习资料。 回复“优惠券”可以获取我的大规模数据系统付费专栏《系统日知录》的八折优惠券。

我们还有相关的分布式系统和数据库的群,可以添加我的微信号:qtmuniao,我拉你入群。加我时记得备注:“分布式系统群”。 另外,如果你不想加群,还有一个分布式系统和数据库的论坛(点这里),欢迎来玩耍。

wx-distributed-system-s.jpg