In Go, a string is an immutable sequence of bytes that conventionally holds UTF-8 encoded text. The built-in type is named `string`.
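Immutability can be illustrated with a minimal sketch. The helper `upperFirst` below is a hypothetical name introduced only for this illustration; it assumes ASCII input for the first byte and works on a copy, because the bytes of a string cannot be changed in place.

```go
package main

import "fmt"

// upperFirst returns a new string whose first byte is upper-cased
// when that byte is an ASCII lowercase letter. The input is never modified.
func upperFirst(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // the conversion copies the string's bytes
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // converting back creates a new string
}

func main() {
	original := "hello"
	changed := upperFirst(original)
	fmt.Println(original, changed)
	// An assignment such as original[0] = 'H' is rejected by the compiler,
	// because string bytes cannot be assigned to.
}
```

Any "modification" of a string therefore produces a new value, and the original stays as it was.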
Each string is a sequence of zero or more bytes (the empty string `""` has length 0). In Go, `byte` is an alias for `uint8`, an 8-bit unsigned integer. Since Go strings use UTF-8 encoding, each Unicode code point (or character) is represented by one to four bytes.
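The one-to-four-byte range can be checked with the standard `unicode/utf8` package. This is a small sketch: the wrapper `encodedLen` and the four sample characters are chosen only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen reports how many bytes UTF-8 needs to encode the code point r.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// One sample code point for each possible encoded length.
	samples := []rune{
		'A',          // U+0041, ASCII letter
		'\u00E9',     // U+00E9, Latin small letter e with acute
		'\u4E16',     // U+4E16, the character 世
		'\U0001F600', // U+1F600, an emoji
	}
	for _, r := range samples {
		fmt.Printf("%q (U+%04X): %d byte(s)\n", r, r, encodedLen(r))
	}
}
```

The four samples are reported as taking 1, 2, 3, and 4 bytes respectively.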
This design lets Go strings hold text in any language while staying compact for ASCII, where every character takes a single byte. Because UTF-8 is a variable-length byte encoding, strings handle characters of different widths and can be stored or sent over a network as plain bytes.
For example, consider the following Go program:
```go
package main

import (
	"fmt"
)

func main() {
	greeting := "Hello, 世界"
	fmt.Println(greeting)
	fmt.Printf("Type of greeting: %T\n", greeting)
	fmt.Printf("Length of greeting: %d bytes\n", len(greeting))
	fmt.Printf("Number of runes in greeting: %d\n", len([]rune(greeting)))
}
```
In this example, the string `greeting` contains the English word "Hello", a comma and a space, and the Chinese word "世界". We can see:
- `greeting` is of type `string`.
- `len(greeting)` returns the length in bytes: each of the two characters in "世界" occupies 3 bytes in UTF-8 (6 bytes in total), "Hello" takes 5 bytes, and the comma and space take 2 bytes, for a total of 13 bytes.
- Converting to a `rune` slice and taking `len([]rune(greeting))` gives the number of Unicode code points, which is 9.
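The difference between bytes and runes also shows up when indexing and iterating. The sketch below is not part of the original example: `runeOffsets` is a hypothetical helper that records where each rune starts, and `utf8.RuneCountInString` is used as an alternative way to count runes without building a `[]rune` slice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the starting byte index of every rune in s.
// A range loop over a string decodes one UTF-8 code point per iteration.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	greeting := "Hello, 世界"

	fmt.Println(len(greeting))                    // 13: length in bytes
	fmt.Println(utf8.RuneCountInString(greeting)) // 9: number of code points
	fmt.Println(runeOffsets(greeting))            // [0 1 2 3 4 5 6 7 10]
	fmt.Printf("%x\n", greeting[7])               // e4: indexing yields one byte, not a character
}
```

The jump from offset 7 to 10 reflects the three bytes used by "世", and `greeting[7]` is only the first of those bytes.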
Therefore, the string type in Go is well-suited for handling internationalized text while maintaining good performance and flexibility.