OLD | NEW |
(Empty) | |
| 1 <!--{ |
| 2 "Title": "Gobs of data", |
| 3 "Template": true |
| 4 }--> |
| 5 |
| 6 <p> |
| 7 To transmit a data structure across a network or to store it in a file, it must |
| 8 be encoded and then decoded again. There are many encodings available, of |
| 9 course: <a href="http://www.json.org/">JSON</a>, |
| 10 <a href="http://www.w3.org/XML/">XML</a>, Google's |
| 11 <a href="http://code.google.com/p/protobuf">protocol buffers</a>, and more. |
| 12 And now there's another, provided by Go's <a href="/pkg/encoding/gob/">gob</a> |
| 13 package. |
| 14 </p> |
| 15 |
| 16 <p> |
| 17 Why define a new encoding? It's a lot of work and redundant at that. Why not |
| 18 just use one of the existing formats? Well, for one thing, we do! Go has |
| 19 <a href="/pkg/">packages</a> supporting all the encodings just mentioned (the |
| 20 <a href="http://code.google.com/p/goprotobuf">protocol buffer package</a> is in |
| 21 a separate repository but it's one of the most frequently downloaded). And for |
| 22 many purposes, including communicating with tools and systems written in other |
| 23 languages, they're the right choice. |
| 24 </p> |
| 25 |
| 26 <p> |
| 27 But for a Go-specific environment, such as communicating between two servers |
| 28 written in Go, there's an opportunity to build something much easier to use and |
| 29 possibly more efficient. |
| 30 </p> |
| 31 |
| 32 <p> |
| 33 Gobs work with the language in a way that an externally-defined, |
| 34 language-independent encoding cannot. At the same time, there are lessons to be |
| 35 learned from the existing systems. |
| 36 </p> |
| 37 |
| 38 <p> |
| 39 <b>Goals</b> |
| 40 </p> |
| 41 |
| 42 <p> |
| 43 The gob package was designed with a number of goals in mind. |
| 44 </p> |
| 45 |
| 46 <p> |
| 47 First, and most obvious, it had to be very easy to use. Because Go has |
| 48 reflection, there is no need for a separate interface definition language or |
| 49 "protocol compiler". The data structure itself is all the package should need |
| 50 to figure out how to encode and decode it. On the other hand, this approach |
| 51 means that gobs will never work as well with other languages, but that's OK: |
| 52 gobs are unashamedly Go-centric. |
| 53 </p> |
| 54 |
| 55 <p> |
| 56 Efficiency is also important. Textual representations, exemplified by XML and |
| 57 JSON, are too slow to put at the center of an efficient communications network. |
| 58 A binary encoding is necessary. |
| 59 </p> |
| 60 |
| 61 <p> |
| 62 Gob streams must be self-describing. Each gob stream, read from the beginning, |
| 63 contains sufficient information that the entire stream can be parsed by an |
| 64 agent that knows nothing a priori about its contents. This property means that |
| 65 you will always be able to decode a gob stream stored in a file, even long |
| 66 after you've forgotten what data it represents. |
| 67 </p> |
| 68 |
| 69 <p> |
| 70 There were also some things to learn from our experiences with Google protocol |
| 71 buffers. |
| 72 </p> |
| 73 |
| 74 <p> |
| 75 <b>Protocol buffer misfeatures</b> |
| 76 </p> |
| 77 |
| 78 <p> |
| 79 Protocol buffers had a major effect on the design of gobs, but have three |
| 80 features that were deliberately avoided. (Leaving aside the property that |
| 81 protocol buffers aren't self-describing: if you don't know the data definition |
| 82 used to encode a protocol buffer, you might not be able to parse it.) |
| 83 </p> |
| 84 |
| 85 <p> |
| 86 First, protocol buffers only work on the data type we call a struct in Go. You |
| 87 can't encode an integer or array at the top level, only a struct with fields |
| 88 inside it. That seems a pointless restriction, at least in Go. If all you want |
| 89 to send is an array of integers, why should you have to put it into a |
| 90 struct first? |
| 91 </p> |
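<p>
As a minimal sketch of the point above, the following program sends a bare
slice of integers through gob with no wrapping struct. The helper name
<code>roundTrip</code> is made up for illustration; only
<code>gob.NewEncoder</code>, <code>gob.NewDecoder</code>, <code>Encode</code>
and <code>Decode</code> come from the package.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// roundTrip (a hypothetical helper) encodes a bare slice of ints,
// with no enclosing struct, and decodes it back from the same buffer.
func roundTrip(in []int) ([]int, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out []int
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip([]int{1, 2, 3})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out) // prints [1 2 3]
}
```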
| 92 |
| 93 <p> |
| 94 Next, a protocol buffer definition may specify that fields <code>T.x</code> and |
| 95 <code>T.y</code> are required to be present whenever a value of type |
| 96 <code>T</code> is encoded or decoded. Although such required fields may seem |
| 97 like a good idea, they are costly to implement because the codec must maintain a |
| 98 separate data structure while encoding and decoding, to be able to report when |
| 99 required fields are missing. They're also a maintenance problem. Over time, one |
| 100 may want to modify the data definition to remove a required field, but that may |
| 101 cause existing clients of the data to crash. It's better not to have them in the |
| 102 encoding at all. (Protocol buffers also have optional fields. But if we don't |
| 103 have required fields, all fields are optional and that's that. There will be |
| 104 more to say about optional fields a little later.) |
| 105 </p> |
| 106 |
| 107 <p> |
| 108 The third protocol buffer misfeature is default values. If a protocol buffer |
| 109 omits the value for a "defaulted" field, then the decoded structure behaves as |
| 110 if the field were set to that value. This idea works nicely when you have |
| 111 getter and setter methods to control access to the field, but is harder to |
| 112 handle cleanly when the container is just a plain idiomatic struct. Default |
| 113 values are also tricky to implement: where does one define them, and what |
| 114 types do they have: is text UTF-8 or uninterpreted bytes, and how many bits |
| 115 are in a float? Despite their apparent simplicity, there were a number of |
| 116 complications in their design and implementation for protocol buffers. We |
| 117 decided to leave them out of gobs and fall back to Go's trivial but effective |
| 118 defaulting rule: unless you set something otherwise, it has the "zero value" |
| 119 for that type - and it doesn't need to be transmitted. |
| 120 </p> |
| 121 |
| 122 <p> |
| 123 So gobs end up looking like a sort of generalized, simplified protocol buffer. |
| 124 How do they work? |
| 125 </p> |
| 126 |
| 127 <p> |
| 128 <b>Values</b> |
| 129 </p> |
| 130 |
| 131 <p> |
| 132 The encoded gob data isn't about <code>int8</code>s and <code>uint16</code>s. |
| 133 Instead, somewhat analogous to constants in Go, its integer values are abstract, |
| 134 sizeless numbers, either signed or unsigned. When you encode an |
| 135 <code>int8</code>, its value is transmitted as an unsized, variable-length |
| 136 integer. When you encode an <code>int64</code>, its value is also transmitted as |
| 137 an unsized, variable-length integer. (Signed and unsigned are treated |
| 138 distinctly, but the same unsized-ness applies to unsigned values too.) If both |
| 139 have the value 7, the bits sent on the wire will be identical. When the receiver |
| 140 decodes that value, it puts it into the receiver's variable, which may be of |
| 141 arbitrary integer type. Thus an encoder may send a 7 that came from an |
| 142 <code>int8</code>, but the receiver may store it in an <code>int64</code>. This |
| 143 is fine: the value is an integer and as long as it fits, everything works. (If |
| 144 it doesn't fit, an error results.) This decoupling from the size of the variable |
| 145 gives some flexibility to the encoding: we can expand the type of the integer |
| 146 variable as the software evolves, but still be able to decode old data. |
| 147 </p> |
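<p>
The sizeless treatment of integers can be checked with a short experiment. This
is only a sketch: the helper names <code>encode</code>, <code>sameWire</code>,
<code>decodeWide</code> and <code>decodeNarrow</code> are invented here, and
each value is written to its own fresh stream so the byte slices can be
compared directly.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// encode (hypothetical helper) returns the gob encoding of v
// written to a fresh stream.
func encode(v interface{}) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		log.Fatal(err)
	}
	return buf.Bytes()
}

// sameWire reports whether a 7 held in an int8 and a 7 held in an
// int64 produce identical bytes.
func sameWire() bool {
	return bytes.Equal(encode(int8(7)), encode(int64(7)))
}

// decodeWide sends an int8 and receives it into an int64.
func decodeWide() (int64, error) {
	var wide int64
	err := gob.NewDecoder(bytes.NewReader(encode(int8(7)))).Decode(&wide)
	return wide, err
}

// decodeNarrow sends 1000 as an int64 and tries to receive it into
// an int8, where it cannot fit.
func decodeNarrow() error {
	var narrow int8
	return gob.NewDecoder(bytes.NewReader(encode(int64(1000)))).Decode(&narrow)
}

func main() {
	fmt.Println("identical bytes:", sameWire())
	wide, err := decodeWide()
	fmt.Println("int8 into int64:", wide, err)
	fmt.Println("1000 into int8 fails:", decodeNarrow() != nil)
}
```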
| 148 |
| 149 <p> |
| 150 This flexibility also applies to pointers. Before transmission, all pointers are |
| 151 flattened. Values of type <code>int8</code>, <code>*int8</code>, |
| 152 <code>**int8</code>, <code>****int8</code>, etc. are all transmitted as an |
| 153 integer value, which may then be stored in <code>int</code> of any size, or |
| 154 <code>*int</code>, or <code>******int</code>, etc. Again, this allows for |
| 155 flexibility. |
| 156 </p> |
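<p>
Pointer flattening can be sketched the same way. The function name
<code>flatten</code> is an assumption made for this example; it encodes a
<code>**int8</code> and decodes the value into a <code>*int64</code>, relying
on the decoder to allocate the storage the pointer refers to.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// flatten (hypothetical helper) sends a value reached through two
// levels of pointer and receives it through one level of a wider type.
func flatten() (int64, error) {
	v := int8(7)
	p := &v
	pp := &p // type **int8

	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(pp); err != nil {
		return 0, err
	}

	var dst *int64 // nil before decoding; the decoder allocates the int64
	if err := gob.NewDecoder(&buf).Decode(&dst); err != nil {
		return 0, err
	}
	return *dst, nil
}

func main() {
	n, err := flatten()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(n)
}
```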
| 157 |
| 158 <p> |
| 159 Flexibility also happens because, when decoding a struct, only those fields |
| 160 that are sent by the encoder are stored in the destination. Given the value |
| 161 </p> |
| 162 |
| 163 {{code "/doc/progs/gobs1.go" `/type T/` `/STOP/`}} |
| 164 |
| 165 <p> |
| 166 the encoding of <code>t</code> sends only the 7 and 8. Because it's zero, the |
| 167 value of <code>Y</code> isn't even sent; there's no need to send a zero value. |
| 168 </p> |
| 169 |
| 170 <p> |
| 171 The receiver could instead decode the value into this structure: |
| 172 </p> |
| 173 |
| 174 {{code "/doc/progs/gobs1.go" `/type U/` `/STOP/`}} |
| 175 |
| 176 <p> |
| 177 and acquire a value of <code>u</code> with only <code>X</code> set (to the |
| 178 address of an <code>int8</code> variable set to 7); the <code>Z</code> field is |
| 179 ignored - where would you put it? When decoding structs, fields are matched by |
| 180 name and compatible type, and only fields that exist in both are affected. This |
| 181 simple approach finesses the "optional field" problem: as the type |
| 182 <code>T</code> evolves by adding fields, out of date receivers will still |
| 183 function with the part of the type they recognize. Thus gobs provide the |
| 184 important result of optional fields - extensibility - without any additional |
| 185 mechanism or notation. |
| 186 </p> |
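<p>
A minimal sketch of this field matching follows. The field lists of
<code>T</code> and <code>U</code> are assumptions reconstructed from the
description above (three integer fields on the sender, two
<code>*int8</code> fields on the receiver), and <code>decodeIntoU</code> is a
name invented for the example.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// T is the sender's type (assumed shape).
type T struct{ X, Y, Z int }

// U is the receiver's type (assumed shape): no Z field, and pointers
// to a narrower integer type.
type U struct{ X, Y *int8 }

// decodeIntoU encodes a T and decodes the stream into a U.
func decodeIntoU() (U, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(T{X: 7, Y: 0, Z: 8}); err != nil {
		return U{}, err
	}
	var u U
	err := gob.NewDecoder(&buf).Decode(&u)
	return u, err
}

func main() {
	u, err := decodeIntoU()
	if err != nil {
		log.Fatal(err)
	}
	// X was sent and matched by name; Y was zero and never sent, so it
	// stays nil; Z has no counterpart in U and is dropped.
	fmt.Println(*u.X, u.Y == nil)
}
```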
| 187 |
| 188 <p> |
| 189 From integers we can build all the other types: bytes, strings, arrays, slices, |
| 190 maps, even floats. Floating-point values are represented by their IEEE 754 |
| 191 floating-point bit pattern, stored as an integer, which works fine as long as |
| 192 you know their type, which we always do. By the way, that integer is sent in |
| 193 byte-reversed order because common values of floating-point numbers, such as |
| 194 small integers, have a lot of zeros at the low end that we can avoid |
| 195 transmitting. |
| 196 </p> |
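<p>
To see why the byte reversal helps, here is a small sketch of the
transformation using only the standard <code>math</code> and
<code>math/bits</code> packages. The function name <code>floatBits</code> is
chosen for illustration and is not claimed to be the package's own code.
</p>

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// floatBits returns the IEEE 754 bit pattern of f with its bytes
// reversed, so the trailing zero bytes of common values end up at
// the high end of the integer and need not be transmitted.
func floatBits(f float64) uint64 {
	return bits.ReverseBytes64(math.Float64bits(f))
}

func main() {
	fmt.Printf("%#x\n", math.Float64bits(17.0)) // prints 0x4031000000000000
	fmt.Printf("%#x\n", floatBits(17.0))        // prints 0x3140
}
```

<p>
After reversal, 17.0 is an integer that fits in two bytes instead of eight.
</p>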
| 197 |
| 198 <p> |
| 199 One nice feature of gobs that Go makes possible is that they allow you to define |
| 200 your own encoding by having your type satisfy the |
| 201 <a href="/pkg/encoding/gob/#GobEncoder">GobEncoder</a> and |
| 202 <a href="/pkg/encoding/gob/#GobDecoder">GobDecoder</a> interfaces, in a manner |
| 203 analogous to the <a href="/pkg/encoding/json/">JSON</a> package's |
| 204 <a href="/pkg/encoding/json/#Marshaler">Marshaler</a> and |
| 205 <a href="/pkg/encoding/json/#Unmarshaler">Unmarshaler</a> and also to the |
| 206 <a href="/pkg/fmt/#Stringer">Stringer</a> interface from |
| 207 <a href="/pkg/fmt/">package fmt</a>. This facility makes it possible to |
| 208 represent special features, enforce constraints, or hide secrets when you |
| 209 transmit data. See the <a href="/pkg/encoding/gob/">documentation</a> for |
| 210 details. |
| 211 </p> |
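<p>
As a hedged illustration, the type below has only unexported fields, which gob
would not transmit on its own, and supplies its own wire format through
<code>GobEncode</code> and <code>GobDecode</code>. The type
<code>Point</code>, its text format and the helper <code>roundTrip</code> are
all invented for this sketch.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Point is a hypothetical type whose fields are unexported.
type Point struct{ x, y int }

// GobEncode satisfies gob.GobEncoder with a simple "x,y" text form.
func (p Point) GobEncode() ([]byte, error) {
	return []byte(fmt.Sprintf("%d,%d", p.x, p.y)), nil
}

// GobDecode satisfies gob.GobDecoder by parsing the same text form.
func (p *Point) GobDecode(data []byte) error {
	_, err := fmt.Sscanf(string(data), "%d,%d", &p.x, &p.y)
	return err
}

// roundTrip encodes a Point and decodes it into a new value.
func roundTrip(in Point) (Point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Point{}, err
	}
	var out Point
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	p, err := roundTrip(Point{3, 4})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(p.x, p.y)
}
```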
| 212 |
| 213 <p> |
| 214 <b>Types on the wire</b> |
| 215 </p> |
| 216 |
| 217 <p> |
| 218 The first time you send a given type, the gob package includes in the data |
| 219 stream a description of that type. In fact, what happens is that the encoder is |
| 220 used to encode, in the standard gob encoding format, an internal struct that |
| 221 describes the type and gives it a unique number. (Basic types, plus the layout |
| 222 of the type description structure, are predefined by the software for |
| 223 bootstrapping.) After the type is described, it can be referenced by its type |
| 224 number. |
| 225 </p> |
| 226 |
| 227 <p> |
| 228 Thus when we send our first type <code>T</code>, the gob encoder sends a |
| 229 description of <code>T</code> and tags it with a type number, say 127. All |
| 230 values, including the first, are then prefixed by that number, so a stream of |
| 231 <code>T</code> values looks like: |
| 232 </p> |
| 233 |
| 234 <pre> |
| 235 ("define type id" 127, definition of type T)(127, T value)(127, T value), ... |
| 236 </pre> |
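<p>
One way to observe that the type description is sent only once is to compare
the number of bytes written by the first and second <code>Encode</code> calls
on the same encoder. This is a sketch under assumptions: the shape of
<code>T</code> and the helper name <code>messageSizes</code> are chosen for
illustration.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// T is an assumed three-field struct.
type T struct{ X, Y, Z int }

// messageSizes encodes the same value twice on one stream and
// returns how many bytes each call added to the buffer.
func messageSizes() (first, second int, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err = enc.Encode(T{X: 7, Z: 8}); err != nil {
		return
	}
	first = buf.Len() // type description plus value
	if err = enc.Encode(T{X: 7, Z: 8}); err != nil {
		return
	}
	second = buf.Len() - first // value only
	return
}

func main() {
	first, second, err := messageSizes()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(first > second) // prints true
}
```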
| 237 |
| 238 <p> |
| 239 These type numbers make it possible to describe recursive types and send values |
| 240 of those types. Thus gobs can encode types such as trees: |
| 241 </p> |
| 242 |
| 243 {{code "/doc/progs/gobs1.go" `/type Node/` `/STOP/`}} |
| 244 |
| 245 <p> |
| 246 (It's an exercise for the reader to discover how the zero-defaulting rule makes |
| 247 this work, even though gobs don't represent pointers.) |
| 248 </p> |
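<p>
A short sketch of sending a tree follows. The field layout of
<code>Node</code> is an assumption about the type referenced above, and
<code>roundTrip</code> and <code>sum</code> are helper names made up for the
example.
</p>

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Node is an assumed binary tree node that refers to its own type.
type Node struct {
	Value       int
	Left, Right *Node
}

// roundTrip encodes a tree and decodes it into a fresh Node.
func roundTrip(root *Node) (*Node, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(root); err != nil {
		return nil, err
	}
	var out Node
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// sum adds up every Value in the tree; a nil subtree contributes 0.
func sum(n *Node) int {
	if n == nil {
		return 0
	}
	return n.Value + sum(n.Left) + sum(n.Right)
}

func main() {
	tree := &Node{Value: 1, Left: &Node{Value: 2}, Right: &Node{Value: 3}}
	out, err := roundTrip(tree)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(sum(out)) // prints 6
}
```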
| 249 |
| 250 <p> |
| 251 With the type information, a gob stream is fully self-describing except for the |
| 252 set of bootstrap types, which is a well-defined starting point. |
| 253 </p> |
| 254 |
| 255 <p> |
| 256 <b>Compiling a machine</b> |
| 257 </p> |
| 258 |
| 259 <p> |
| 260 The first time you encode a value of a given type, the gob package builds a |
| 261 little interpreted machine specific to that data type. It uses reflection on |
| 262 the type to construct that machine, but once the machine is built it does not |
| 263 depend on reflection. The machine uses package unsafe and some trickery to |
| 264 convert the data into the encoded bytes at high speed. It could use reflection |
| 265 and avoid unsafe, but would be significantly slower. (A similar high-speed |
| 266 approach is taken by the protocol buffer support for Go, whose design was |
| 267 influenced by the implementation of gobs.) Subsequent values of the same type |
| 268 use the already-compiled machine, so they can be encoded right away. |
| 269 </p> |
| 270 |
| 271 <p> |
| 272 Decoding is similar but harder. When you decode a value, the gob package holds |
| 273 a byte slice representing a value of a given encoder-defined type to decode, |
| 274 plus a Go value into which to decode it. The gob package builds a machine for |
| 275 that pair: the gob type sent on the wire crossed with the Go type provided for |
| 276 decoding. Once that decoding machine is built, though, it's again a |
| 277 reflectionless engine that uses unsafe methods to get maximum speed. |
| 278 </p> |
| 279 |
| 280 <p> |
| 281 <b>Use</b> |
| 282 </p> |
| 283 |
| 284 <p> |
| 285 There's a lot going on under the hood, but the result is an efficient, |
| 286 easy-to-use encoding system for transmitting data. Here's a complete example |
| 287 showing differing encoded and decoded types. Note how easy it is to send and |
| 288 receive values; all you need to do is present values and variables to the |
| 289 <a href="/pkg/encoding/gob/">gob package</a> and it does all the work. |
| 290 </p> |
| 291 |
| 292 {{code "/doc/progs/gobs2.go" `/package main/` `$`}} |
| 293 |
| 294 <p> |
| 295 You can compile and run this example code in the |
| 296 <a href="http://play.golang.org/p/_-OJV-rwMq">Go Playground</a>. |
| 297 </p> |
| 298 |
| 299 <p> |
| 300 The <a href="/pkg/net/rpc/">rpc package</a> builds on gobs to turn this |
| 301 encode/decode automation into transport for method calls across the network. |
| 302 That's a subject for another article. |
| 303 </p> |
| 304 |
| 305 <p> |
| 306 <b>Details</b> |
| 307 </p> |
| 308 |
| 309 <p> |
| 310 The <a href="/pkg/encoding/gob/">gob package documentation</a>, especially the |
| 311 file <a href="/src/pkg/encoding/gob/doc.go">doc.go</a>, expands on many of the |
| 312 details described here and includes a full worked example showing how the |
| 313 encoding represents data. If you are interested in the innards of the gob |
| 314 implementation, that's a good place to start. |
| 315 </p> |