iter.json: A Powerful and Efficient Way to Iterate and Manipulate JSON in Go

2024-12-12

· go

Have you ever needed to modify unstructured JSON data in Go? Maybe you’ve had to delete password and all blacklisted fields, rename keys from camelCase to snake_case, or convert all number ids to strings because JavaScript does not like int64? If your solution has been to unmarshal everything into a map[string]any using encoding/json and then marshal it back… well, let’s face it, that’s far from efficient!
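To make the baseline concrete, here is a minimal sketch of the map round trip using only encoding/json. The helper name roundTrip is made up for illustration. Besides the extra allocations, note two side effects: map keys come back sorted, and numbers pass through float64, so an integer above 2^53 does not survive.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip decodes JSON into a generic map and encodes it again,
// i.e. the "unmarshal everything, then marshal it back" approach.
func roundTrip(in []byte) ([]byte, error) {
	var v map[string]any
	if err := json.Unmarshal(in, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v)
}

func main() {
	in := []byte(`{"name":"Alice","id":9007199254740993}`)
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	// Keys are re-sorted and the id is rounded to the nearest float64.
	fmt.Println(string(out)) // {"id":9007199254740992,"name":"Alice"}
}
```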

What if you could loop through the JSON data, grab the path of each item, and decide exactly what to do with it on the fly?

Yes! I have good news! With the new iterator feature in Go 1.23, there’s a better way to iterate and manipulate JSON.
Meet ezpkg.io/iter.json — your powerful and efficient companion for working with JSON in Go.

Hello World!

ezpkg.io/iter.json is a Go package that provides a simple and efficient way to iterate over JSON data. It allows you to traverse JSON objects, arrays, and values, and perform various operations on them without fully parsing the data.

At the core, it provides a Parse() function and a Builder type. The Parse() function returns an iterator that yields each item in the JSON data, while the Builder type allows you to build new JSON data dynamically.

Let’s look at some examples of how to use iter.json to iterate, build, format, filter, and edit JSON data in Go.

1. Iterating JSON

Given that we have an alice.json file:

{
  "name": "Alice",
  "age": 24,
  "scores": [9, 10, 8],
  "address": {
    "city": "The Sun",
    "zip": 10101
  }
}

First, let’s use for range Parse() to iterate over the JSON file, then print the path, key, token, and level of each item. See examples/01.iter.

package main

import (
    "fmt"

    "ezpkg.io/errorz"
    iterjson "ezpkg.io/iter.json"
)

func main() {
    data := `{"name": "Alice", "age": 24, "scores": [9, 10, 8], "address": {"city": "The Sun", "zip": 10101}}`

    // 🎄Example: iterate over json
    fmt.Printf("| %12v | %10v | %10v |%v|\n", "PATH", "KEY", "TOKEN", "LVL")
    fmt.Println("| ------------ | ---------- | ---------- | - |")
    for item, err := range iterjson.Parse([]byte(data)) {
        errorz.MustZ(err)

        fmt.Printf("| %12v | %10v | %10v | %v |\n", 
            item.GetPathString(), item.Key, item.Token, item.Level)
    }
}

The code will output:

|         PATH |        KEY |      TOKEN |LVL|
| ------------ | ---------- | ---------- | - |
|              |            |          { | 0 |
|         name |     "name" |    "Alice" | 1 |
|          age |      "age" |         24 | 1 |
|       scores |   "scores" |          [ | 1 |
|     scores.0 |            |          9 | 2 |
|     scores.1 |            |         10 | 2 |
|     scores.2 |            |          8 | 2 |
|       scores |            |          ] | 1 |
|      address |  "address" |          { | 1 |
| address.city |     "city" |  "The Sun" | 2 |
|  address.zip |      "zip" |      10101 | 2 |
|      address |            |          } | 1 |
|              |            |          } | 0 |

2. Building JSON

Use Builder to build JSON data. It accepts optional arguments for indentation. See examples/02.builder.

Create a new Builder with NewBuilder(prefix, indent string).
Builder.AddRaw(key RawToken, token RawToken) adds a raw token to the JSON data.
Builder.Add(key any, token any) adds a key-value pair to the JSON data. It accepts various types, including string, int, struct, []byte, etc.
Builder.Bytes() returns the JSON data as a byte slice.

b := iterjson.NewBuilder("", "    ")
// open an object
b.Add("", iterjson.TokenObjectOpen)

// add a few fields
b.Add("name", "Alice")
b.Add("age", 22)
b.Add("email", "alice@example.com")
b.Add("phone", "(+84) 123-456-789")

// open an array
b.Add("languages", iterjson.TokenArrayOpen)
b.Add("", "English")
b.Add("", "Vietnamese")
b.Add("", iterjson.TokenArrayClose)
// close the array

// accept any type that can marshal to json
b.Add("address", Address{
    HouseNumber: 42,
    Street:      "Ly Thuong Kiet",
    City:        "Ha Noi",
    Country:     "Vietnam",
})

// accept []byte as raw json
b.Add("pets", []byte(`[{"type":"cat","name":"Kitty","age":2},{"type":"dog","name":"Yummy","age":3}]`))

// close the object
b.Add("", iterjson.TokenObjectClose)

out := errorz.Must(b.Bytes())
fmt.Printf("\n--- build json ---\n%s\n", out)

Which will output the JSON with indentation:

{
    "name": "Alice",
    "age": 22,
    "email": "alice@example.com",
    "phone": "(+84) 123-456-789",
    "languages": [
        "English",
        "Vietnamese"
    ],
    "address": {"house_number":42,"street":"Ly Thuong Kiet","city":"Ha Noi","country":"Vietnam"},
    "pets": [
        {
            "type": "cat",
            "name": "Kitty",
            "age": 2
        },
        {
            "type": "dog",
            "name": "Yummy",
            "age": 3
        }
    ]
}

3. Formatting JSON

You can reconstruct or reformat JSON data by sending its keys and values to a Builder. See examples/03.reformat.

{
    // 🐝Example: minify json
    b := iterjson.NewBuilder("", "")
    for item, err := range iterjson.Parse(data) {
        errorz.MustZ(err)
        b.AddRaw(item.Key, item.Token)
    }
    out := errorz.Must(b.Bytes())
    fmt.Printf("\n--- minify ---\n%s\n----------\n", out)
}
{
    // 🦋Example: format json
    b := iterjson.NewBuilder("👉   ", "\t")
    for item, err := range iterjson.Parse(data) {
        errorz.MustZ(err)
        b.AddRaw(item.Key, item.Token)
    }
    out := errorz.Must(b.Bytes())
    fmt.Printf("\n--- reformat ---\n%s\n----------\n", out)
}

The first example minifies the JSON while the second example formats it with prefix “👉” on each line.

--- minify ---
{"name":"Alice","age":24,"scores":[9,10,8],"address":{"city":"The Sun","zip":10101}}
----------

--- reformat ---
👉   {
👉       "name": "Alice",
👉       "age": 24,
👉       "scores": [
👉           9,
👉           10,
👉           8
👉       ],
👉       "address": {
👉           "city": "The Sun",
👉           "zip": 10101
👉       }
👉   }
----------

4. Adding line numbers

In this example, we add line numbers to the JSON output, by adding a b.WriteNewline() before the fmt.Fprintf() call. See examples/04.line_number.

// 🐞Example: print with line number
i := 0
b := iterjson.NewBuilder("", "    ")
for item, err := range iterjson.Parse(data) {
    i++
    errorz.MustZ(err)
    b.WriteNewline(item.Token.Type())

    // 👉 add line number
    fmt.Fprintf(b, "%3d    ", i)
    b.Add(item.Key, item.Token)
}
out := errorz.Must(b.Bytes())
fmt.Printf("\n--- line number ---\n%s\n----------\n", out)

This will output:

  1    {
  2        "name": "Alice",
  3        "age": 24,
  4        "scores": [
  5            9,
  6            10,
  7            8
  8        ],
  9        "address": {
 10            "city": "The Sun",
 11            "zip": 10101
 12        }
 13    }

5. Adding comments

By putting a fmt.Fprintf(comment) between b.WriteComma() and b.WriteNewline(), you can add a comment to the end of each line. See examples/05.comment.

i, newlineIdx, maxIdx := 0, 0, 30
b := iterjson.NewBuilder("", "    ")
for item, err := range iterjson.Parse(data) {
    errorz.MustZ(err)
    b.WriteComma(item.Token.Type())

    // 👉 add comment
    if i > 0 {
        length := b.Len() - newlineIdx
        fmt.Fprint(b, strings.Repeat(" ", maxIdx-length))
        fmt.Fprintf(b, "// %2d", i)
    }
    i++

    b.WriteNewline(item.Token.Type())
    newlineIdx = b.Len() // save the newline index

    b.Add(item.Key, item.Token)
}
length := b.Len() - newlineIdx
fmt.Fprint(b, strings.Repeat(" ", maxIdx-length))
fmt.Fprintf(b, "// %2d", i)

out := errorz.Must(b.Bytes())
fmt.Printf("\n--- comment ---\n%s\n----------\n", out)

This will output:

{                             //  1
    "name": "Alice",          //  2
    "age": 24,                //  3
    "scores": [               //  4
        9,                    //  5
        10,                   //  6
        8                     //  7
    ],                        //  8
    "address": {              //  9
        "city": "The Sun",    // 10
        "zip": 10101          // 11
    }                         // 12
}                             // 13

6. Filtering JSON and extracting values

Use item.GetPathString() or item.GetRawPath() to get the path of the current item, then filter the JSON data by that path. See examples/06.filter_print.

Example with item.GetPathString() and strings.Contains():

fmt.Printf("\n--- filter: GetPathString() ---\n")
i := 0
for item, err := range iterjson.Parse(data) {
    i++
    errorz.MustZ(err)

    path := item.GetPathString()
    switch {
    case path == "name",
        strings.Contains(path, "address"):
        // continue
    default:
        continue
    }

    // 👉 print with line number
    fmt.Printf("%2d %20s . %s\n", i, item.Token, item.GetPath())
}

Example with item.GetRawPath() and path.Match():

fmt.Printf("\n--- filter: GetRawPath() ---\n")
i := 0
for item, err := range iterjson.Parse(data) {
    i++
    errorz.MustZ(err)

    path := item.GetRawPath()
    switch {
    case path.Match("name"),
        path.Contains("address"):
        // continue
    default:
        continue
    }

    // 👉 print with line number
    fmt.Printf("%2d %20s . %s\n", i, item.Token, item.GetPath())
}

Both examples will output:

 2              "Alice" . name
 9                    { . address
10            "The Sun" . address.city
11                10101 . address.zip
12                    } . address
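The filtering predicate itself can be tried without the package. The sketch below applies the same rule (the exact path "name", or any path containing "address") to the list of paths from the iteration table in section 1. The helper names keep and filterPaths are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// keep reports whether a dotted path passes the filter used above:
// the exact path "name", or any path that contains "address".
func keep(path string) bool {
	return path == "name" || strings.Contains(path, "address")
}

// filterPaths returns "position:path" for every path that passes keep,
// joined by spaces. Positions are 1-based, like the line numbers above.
func filterPaths(paths []string) string {
	var out []string
	for i, p := range paths {
		if keep(p) {
			out = append(out, fmt.Sprintf("%d:%s", i+1, p))
		}
	}
	return strings.Join(out, " ")
}

func main() {
	// Paths in the order they appear in the iteration table from section 1.
	paths := []string{
		"", "name", "age", "scores", "scores.0", "scores.1", "scores.2",
		"scores", "address", "address.city", "address.zip", "address", "",
	}
	fmt.Println(filterPaths(paths))
	// 2:name 9:address 10:address.city 11:address.zip 12:address
}
```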

7. Filtering JSON and returning a new JSON

By combining the Builder with the option SetSkipEmptyStructures(true) and the filtering logic, you can filter the JSON data and return a new JSON. See examples/07.filter_json.

// 🦁Example: filter and output json
b := iterjson.NewBuilder("", "    ")
b.SetSkipEmptyStructures(true) // 👉 skip empty [] or {}
for item, err := range iterjson.Parse(data) {
    errorz.MustZ(err)
    if item.Token.IsOpen() || item.Token.IsClose() {
        b.Add(item.Key, item.Token)
        continue
    }

    path := item.GetPathString()
    switch {
    case path == "name",
        strings.Contains(path, "address"):
        // continue
    default:
        continue
    }

    b.Add(item.Key, item.Token)
}
out := errorz.Must(b.Bytes())
fmt.Printf("\n--- filter: output json ---\n%s\n----------\n", out)

This example will return a new JSON with only the filtered fields:

{
    "name": "Alice",
    "address": {
        "city": "The Sun",
        "zip": 10101
    }
}

8. Editing values

This is an example of editing values in JSON data. Assume that we are using number ids for our API. The ids are too big and JavaScript can’t handle them, so we need to convert them to strings. See examples/08.number_id and order.json.

Iterate over the JSON data, find all _id fields and convert the number ids to strings:

b := iterjson.NewBuilder("", "    ")
for item, err := range iterjson.Parse(data) {
    errorz.MustZ(err)
    key, _ := item.GetRawPath().Last().ObjectKey()
    if strings.HasSuffix(key, "_id") {
        id, err0 := item.Token.GetInt()
        if err0 == nil {
            b.Add(item.Key, strconv.Itoa(id))
            continue
        }
    }
    b.Add(item.Key, item.Token)
}
out := errorz.Must(b.Bytes())
fmt.Printf("\n--- convert number id ---\n%s\n----------\n", out)

This will add quotes to the number ids:

{
    "order_id": "12345678901234",
    "number": 12,
    "customer_id": "12345678905678",
    "items": [
        {
            "item_id": "12345678901042",
            "quantity": 1,
            "price": 123.45
        },
        {
            "item_id": "12345678901098",
            "quantity": 2,
            "price": 234.56
        }
    ]
}
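For context on when an id is actually "too big": JavaScript numbers are float64 values, so integers are exact only up to 2^53 - 1. The sketch below checks that boundary with the standard library. The names maxSafeInteger, isSafeForJS and quoteID are made up for this example and are not part of iter.json.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxSafeInteger mirrors JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1).
const maxSafeInteger = 1<<53 - 1

// isSafeForJS reports whether id is within the range that a float64
// (and therefore a JavaScript number) can represent exactly.
func isSafeForJS(id int64) bool {
	return id >= -maxSafeInteger && id <= maxSafeInteger
}

// quoteID renders the id as a JSON string, quotes included.
func quoteID(id int64) string {
	return strconv.Quote(strconv.FormatInt(id, 10))
}

func main() {
	for _, id := range []int64{12345678901234, 9007199254740993} {
		// The third column shows what is left after a float64 round trip.
		fmt.Println(id, isSafeForJS(id), int64(float64(id)), quoteID(id))
	}
}
```

Quoting every _id field regardless of size, as the example above does, keeps the API consistent and avoids having to decide per value.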

How it parses the JSON data

Thanks to the power of iterators in Go 1.23, ezpkg.io/iter.json is able to process JSON data with minimal lines of code and low overhead.

The core parser logic is contained in 2 files: scanner.go and parser.go. Here’s a brief overview of how it works:

NextToken() pulls the next RawToken from the input.
Parse() is a state machine with a stack. It pulls the next token from the input then processes it based on the current state.
RawToken is a tagged union with a TokenType and optional raw []byte.

// RawToken represents a raw token from the scanner.
type RawToken struct {
    typ TokenType
    raw []byte
}
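To show how such a tag can drive behavior, here is a standalone sketch. It assumes that structural tokens reuse their own byte as the tag, which matches the TokenType(in[0]) conversion in the scanner below. The values chosen for TokenString and TokenNumber, and the kind() helper, are assumptions made for illustration and may differ from the package.

```go
package main

import "fmt"

// TokenType is the tag of the union.
type TokenType byte

const (
	TokenArrayOpen   TokenType = '['
	TokenArrayClose  TokenType = ']'
	TokenObjectOpen  TokenType = '{'
	TokenObjectClose TokenType = '}'
	TokenComma       TokenType = ','
	TokenColon       TokenType = ':'
	TokenString      TokenType = 's' // assumed value, for illustration only
	TokenNumber      TokenType = '0' // assumed value, for illustration only
)

// RawToken pairs the tag with the raw bytes taken from the input.
type RawToken struct {
	typ TokenType
	raw []byte
}

func (t RawToken) IsOpen() bool  { return t.typ == TokenArrayOpen || t.typ == TokenObjectOpen }
func (t RawToken) IsClose() bool { return t.typ == TokenArrayClose || t.typ == TokenObjectClose }
func (t RawToken) IsValue() bool { return t.typ == TokenString || t.typ == TokenNumber }

// kind names the variant selected by the tag. The zero tag means "no token".
func kind(t RawToken) string {
	switch {
	case t.typ == 0:
		return "none"
	case t.IsOpen():
		return "open"
	case t.IsClose():
		return "close"
	case t.IsValue():
		return "value"
	default:
		return "separator"
	}
}

func main() {
	tokens := []RawToken{
		{},
		{typ: TokenObjectOpen, raw: []byte("{")},
		{typ: TokenString, raw: []byte(`"Alice"`)},
		{typ: TokenComma, raw: []byte(",")},
		{typ: TokenObjectClose, raw: []byte("}")},
	}
	for _, t := range tokens {
		fmt.Printf("%-9s %q\n", kind(t), t.raw)
	}
}
```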

NextToken() pulls the next token from the input

Here’s the core logic of the NextToken() function (scanner.go):

func NextToken(in []byte) (token RawToken, remain []byte, err error) {
    in = skipSpace(in)
    if len(in) == 0 {
        return RawToken{}, nil, nil
    }
    switch in[0] {
    case '{', '}', '[', ']', ',', ':':
        typ := TokenType(in[0])
        return RawToken{typ: typ, raw: in[:1]}, in[1:], nil
    case 'n':
        return nextTokenConst(in, rNull)
    case 'f':
        return nextTokenConst(in, rFalse)
    case 't':
        return nextTokenConst(in, rTrue)
    case '"':
        return nextTokenString(in)
    default:
        return nextTokenNumber(in)
    }
}
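The helpers called above (skipSpace, nextTokenString, and so on) are not shown in this post. As a rough idea of what such a scanner does, here is a simplified, standalone sketch. It is not the package's code: it returns plain byte slices instead of RawToken, does not validate escapes, numbers, or literals, and the names scanString, isDelim, nextToken, and tokenize are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// skipSpace drops leading JSON whitespace (space, tab, CR, LF).
func skipSpace(in []byte) []byte {
	for len(in) > 0 {
		switch in[0] {
		case ' ', '\t', '\r', '\n':
			in = in[1:]
		default:
			return in
		}
	}
	return in
}

// isDelim reports whether c ends a number or literal.
func isDelim(c byte) bool {
	switch c {
	case '{', '}', '[', ']', ',', ':', '"', ' ', '\t', '\r', '\n':
		return true
	}
	return false
}

// scanString returns the length of the string literal at the start of in,
// including both quotes. It skips escaped bytes but does not validate them.
func scanString(in []byte) (int, error) {
	for i := 1; i < len(in); i++ {
		switch in[i] {
		case '\\':
			i++ // skip the escaped byte
		case '"':
			return i + 1, nil
		}
	}
	return 0, errors.New("unterminated string")
}

// nextToken returns the first token and the remaining input.
// A nil token with a nil error means the input is exhausted.
func nextToken(in []byte) (tok, remain []byte, err error) {
	in = skipSpace(in)
	if len(in) == 0 {
		return nil, nil, nil
	}
	switch in[0] {
	case '{', '}', '[', ']', ',', ':':
		return in[:1], in[1:], nil
	case '"':
		n, serr := scanString(in)
		if serr != nil {
			return nil, nil, serr
		}
		return in[:n], in[n:], nil
	default:
		// Numbers and literals: read until a delimiter or whitespace.
		n := 1
		for n < len(in) && !isDelim(in[n]) {
			n++
		}
		return in[:n], in[n:], nil
	}
}

// tokenize collects all tokens by calling nextToken in a loop.
func tokenize(in []byte) ([]string, error) {
	var out []string
	for {
		tok, rest, err := nextToken(in)
		if err != nil {
			return nil, err
		}
		if tok == nil {
			return out, nil
		}
		out = append(out, string(tok))
		in = rest
	}
}

func main() {
	toks, err := tokenize([]byte(`{"a": [1, true], "b": "x\"y"}`))
	fmt.Println(toks, err)
}
```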

Scan() all tokens in a single loop

The Scan() function is essentially a single loop to pull the next token from the input each time.

func Scan(in []byte) iter.Seq2[RawToken, error] {
    return func(yield func(token RawToken, err error) bool) {
        remain := in
        for {
            token, rm, err := NextToken(remain)
            remain = rm
            if err != nil {
                yield(RawToken{}, err)
                return
            }
            if !yield(token, nil) {
                return
            }
            if len(remain) == 0 {
                return
            }
        }
    }
}

Parse() is a state machine with a stack

Here’s the core logic of the parser (parser.go).

It uses a stack to keep track of the current state (path, level) of the JSON data.
It pulls the next token from the input and processes it based on the current state:
- If it’s [ or {, it pushes the current state to the stack.
- If it’s ] or }, it pops the state from the stack.
- Otherwise, it parses “value” or “key: value” depending on the current state.

Here’s how it initializes the stack:

path := make([]PathItem, 1, 16)
last := &path[0]

With the implementation of PathItem:

type PathItem struct {
    Index int      // array index or object index
    Key   RawToken // object key
    Token RawToken // [ or { or } or ]
}

And the push(), pop(), advance() helper functions:

advance := func() {
    var err error
    tok = next
    next, remain, err = NextToken(remain)
    must(err)
}
push := func() {
    path = append(path, PathItem{Token: tok})
    last = &path[len(path)-1]
}
pop := func() {
    path = path[:len(path)-1]
    last = &path[len(path)-1]
}
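To see how a stack like this turns into paths such as scores.1 or address.city, here is a standalone sketch. It uses a simplified pathItem with string keys instead of RawToken, and the String() rendering is an assumption about how the dotted path could be derived, not the package's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pathItem is a simplified stand-in for PathItem: one entry per open [ or {.
type pathItem struct {
	index int    // position of the current element inside the container
	key   string // current object key, empty inside arrays
	open  byte   // '[' or '{'
}

// pathStack holds the entries in order; entry 0 is the root container.
type pathStack []pathItem

func (p *pathStack) push(open byte) { *p = append(*p, pathItem{open: open}) }
func (p *pathStack) pop()           { *p = (*p)[:len(*p)-1] }
func (p pathStack) last() *pathItem { return &p[len(p)-1] }

// String joins object keys and array indexes with dots.
func (p pathStack) String() string {
	parts := make([]string, 0, len(p))
	for _, it := range p {
		if it.open == '[' {
			parts = append(parts, strconv.Itoa(it.index))
		} else {
			parts = append(parts, it.key)
		}
	}
	return strings.Join(parts, ".")
}

func main() {
	var p pathStack

	p.push('{') // enter the root object
	p.last().key = "scores"
	fmt.Println(p.String()) // scores

	p.push('[') // enter the "scores" array
	p.last().index = 1
	fmt.Println(p.String()) // scores.1

	p.pop() // leave the array
	p.last().key = "address"
	p.push('{') // enter the "address" object
	p.last().key = "city"
	fmt.Println(p.String()) // address.city
}
```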

The core state machine code is as follows. Honestly, using goto in this case is quite fun:

func Parse(in []byte) iter.Seq2[Item, error] {
    return func(yield func(Item, error) bool) {
        defer func() {/* ... */}()

        var tok, next RawToken
        remain := in
        path := make([]PathItem, 1, 16)
        last := &path[0]
        advance := func() {/* ... */}
        push := func() {/* ... */}
        pop := func() {/* ... */}
        yieldValue := func(key RawToken) bool {/* ... */}
        advance()
        advance()
    value:
        switch {
        case tok.typ == TokenArrayOpen:
            if !yieldValue(last.Key) { return }
            push()
            advance()
            if tok.typ == TokenArrayClose {
                goto close
            }
            goto value
        case tok.typ == TokenObjectOpen:
            if !yieldValue(last.Key) { return }
            push()
            advance()
            if tok.typ == TokenObjectClose {
                goto close
            }
            goto key_value
        case tok.IsValue():
            if !yieldValue(last.Key) { return }
            switch {
            case last.Token.typ == 0 && next.typ == 0:
                return // ✅ done
            case last.Token.typ == 0 && next.typ != 0:
                panic(/* ... */)
            default:
                advance()
                goto close
            }
        default:
            panic(/* ... */)
        }
    key_value:
        switch {
        case tok.typ == TokenString:
            last.Key = tok
            advance()
            if tok.typ == TokenColon {
                advance()
                goto value
            } else { panic(/* ... */) }
        default:
            panic(/* ... */)
        }
    close:
        switch {
        case tok.typ == TokenArrayClose:
            if last.Token.typ != TokenArrayOpen { panic(/* ... */) }
            pop()
            if !yieldValue(RawToken{}) { return }
            advance()
            if len(path) > 1 {
                goto close
            }
            goto end
        case tok.typ == TokenObjectClose:
            if last.Token.typ != TokenObjectOpen { panic(/* ... */) }
            pop()
            if !yieldValue(RawToken{}) { return }
            advance()
            if len(path) > 1 {
                goto close
            }
            goto end
        case tok.typ == TokenComma:
            last.Index++
            last.Key = RawToken{}
            advance()
            switch {
            case last.Token.typ == TokenArrayOpen:
                goto value
            case last.Token.typ == TokenObjectOpen:
                goto key_value
            default:
                panic(/* ... */)
            }
        default:
            panic(/* ... */)
        }
    end:
        if tok.typ != 0 { panic(/* ... */) }
    }
}

How it builds the JSON data dynamically

Implementing a Builder that produces valid JSON with essentially a single method, Add(key any, value any), is a fun challenge too!

Reconstructing JSON from RawToken

First, let’s look at how we can construct a JSON object from RawToken without using the Builder. Here’s the simplest implementation, Reconstruct(), which produces minified JSON:

It iterates over the Parse() result to retrieve keys and tokens.
Tokens can be [, {, ], }, or values. Note that , and : are not returned by Parse().
It writes the key and token to a buffer, adding a comma if necessary.
To correctly add commas between tokens, it needs to keep track of the last token type and call ShouldAddComma().

func Reconstruct(in []byte) ([]byte, error) {
    b := bytes.Buffer{}
    b.Grow(len(in))

    var lastTokenType TokenType
    for item, err := range Parse(in) {
        if err != nil {
            return nil, err
        }
        if ShouldAddComma(lastTokenType, item.Token.Type()) {
            b.WriteByte(',')
        }
        if item.Key.IsValue() {
            b.Write(item.Key.Raw())
            b.WriteByte(':')
        }
        b.Write(item.Token.Raw())
        lastTokenType = item.Token.Type()
    }
    return b.Bytes(), nil
}

And the ShouldAddComma() function:

Skip the comma if the last token is [, {, ,, or :, or if the next token is ] or }.
Otherwise, add the comma.

func ShouldAddComma(lastToken, nextToken TokenType) bool {
    switch lastToken {
    case 0, TokenArrayOpen, TokenObjectOpen, TokenComma, TokenColon:
        return false
    }
    switch nextToken {
    case TokenArrayClose, TokenObjectClose:
        return false
    default:
        return true
    }
}
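The comma rule can be exercised on its own. In the sketch below, the token type is approximated by the first byte of each token, and items are plain key/token string pairs. Both are simplifications for illustration, and the names item, shouldAddComma, and minify are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// item is one step of the iteration. key is empty for array elements,
// for the root value, and for close tokens. token must not be empty.
type item struct{ key, token string }

// shouldAddComma applies the rule above, using the first byte of a token
// as its type. The zero byte stands for "no previous token".
func shouldAddComma(last, next byte) bool {
	switch last {
	case 0, '[', '{', ',', ':':
		return false
	}
	switch next {
	case ']', '}':
		return false
	}
	return true
}

// minify writes keys and tokens back to back, inserting commas as needed.
func minify(items []item) string {
	var b strings.Builder
	var last byte
	for _, it := range items {
		next := it.token[0]
		if shouldAddComma(last, next) {
			b.WriteByte(',')
		}
		if it.key != "" {
			b.WriteString(it.key)
			b.WriteByte(':')
		}
		b.WriteString(it.token)
		last = next
	}
	return b.String()
}

func main() {
	items := []item{
		{"", "{"},
		{`"name"`, `"Alice"`},
		{`"scores"`, "["},
		{"", "9"},
		{"", "10"},
		{"", "]"},
		{`"address"`, "{"},
		{"", "}"},
		{"", "}"},
	}
	fmt.Println(minify(items))
	// {"name":"Alice","scores":[9,10],"address":{}}
}
```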

How to support indentation

To support indentation, we need to keep track of the current level and add the appropriate number of spaces before each line. Here’s how we can modify the Reconstruct() function to support indentation:

Add prefix and indent arguments to the function.
Add the prefix before each line.
Add the indent for each level of indentation.
Use the Level from the Parse() result to determine the indentation level.

Alternatively, we can keep track of the level ourselves by incrementing and decrementing a counter for each [,{ and ],}. We can also use a stack to keep track of the level, and the current path too.

Here’s the modified function with indentation support as Reformat():

func Reformat(in []byte, prefix, indent string) ([]byte, error) {
    b := bytes.Buffer{}
    b.Grow(len(in))

    var lastToken TokenType
    for item, err := range Parse(in) {
        if err != nil {
            return nil, err
        }
        if ShouldAddComma(lastToken, item.Token.Type()) {
            b.WriteByte(',')
        }
        if lastToken != 0 {
            b.WriteByte('\n')
        }
        b.WriteString(prefix)
        for range item.Level {
            b.WriteString(indent)
        }
        if item.Key.IsValue() {
            b.Write(item.Key.Raw())
            b.WriteString(": ")
        }
        b.Write(item.Token.Raw())
        lastToken = item.Token.Type()
    }
    return b.Bytes(), nil
}
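The alternative mentioned above, tracking the level with a counter instead of reading item.Level, could look like the following sketch. Tokens are plain strings here and the function name levels is made up for the example. The result for the alice.json tokens matches the LVL column of the table in section 1.

```go
package main

import (
	"fmt"
	"strings"
)

// levels returns the indentation level of each token, tracked with a counter.
// Keys are left out because they do not affect the level.
func levels(tokens []string) []int {
	out := make([]int, 0, len(tokens))
	level := 0
	for _, tok := range tokens {
		switch tok {
		case "]", "}":
			level-- // a close token sits at the level of its open token
			out = append(out, level)
		case "[", "{":
			out = append(out, level)
			level++ // everything inside is one level deeper
		default:
			out = append(out, level)
		}
	}
	return out
}

func main() {
	tokens := []string{
		"{", `"Alice"`, "24", "[", "9", "10", "8", "]",
		"{", `"The Sun"`, "10101", "}", "}",
	}
	for i, lvl := range levels(tokens) {
		fmt.Println(strings.Repeat("    ", lvl) + tokens[i])
	}
}
```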

Early implementation of Builder

So you get the idea of how Reconstruct()/Reformat() functions work. Now, let’s look at how the Builder is implemented.

It starts with an AddRaw() method that adds raw tokens to the JSON data. The early implementation, shown below, is basically the same as Reformat() but with a few differences:

It keeps track of the last token type, the current level, and the stack of [ and {.
To keep track of the level, it needs to switch on open or close tokens to update the stack and level, instead of just a single Key.IsValue() check like in Reformat().
It writes the key and token to the buffer, adding a comma if necessary.

type Builder struct {
    bytes.Buffer
    indent string
    prefix string

    lastTok TokenType
    level   int
    stack   []TokenType // array or object
    err     error
}

func (b *Builder) AddRaw(key, token RawToken) {
    switch {
    case token.IsOpen():
        if ShouldAddComma(b.lastTok, token.Type()) {
            b.WriteByte(',')
        }
        b.writeIndent()
        b.writeKey(key)
        b.WriteByte(byte(token.Type()))
        b.lastTok = token.Type()
        b.stack = append(b.stack, token.Type())
        b.level++

    case token.IsClose():
        if key.Type() != 0 {
            b.addErrorf("unexpected key(%s) before close token(%s)", key, token.Type())
            return
        }
        if b.level <= 0 {
            b.addErrorf("unexpected close token(%s)", token.Type())
            return
        }
        b.level--
        b.stack = b.stack[:len(b.stack)-1]
        b.writeIndent()
        b.WriteByte(byte(token.Type()))
        b.lastTok = token.Type()

    case token.IsValue():
        if ShouldAddComma(b.lastTok, token.Type()) {
            b.WriteByte(',')
        }
        b.writeIndent()
        b.writeKey(key)
        b.Write(token.Raw())
        b.lastTok = token.Type()
    }
}

Twisting Builder code to support more use cases

As more use cases are added, the Builder code evolves and becomes more complex over time.

WriteNewline() is made public to control the position of the prefix.

In the example Adding line numbers, notice that we have a b.WriteNewline(item.Token.Type()) before the fmt.Fprintf() call.

This is because we need to control the position of the line number:

so the line number can be added after the comma and newline,
but before the key and token.

for item, err := range iterjson.Parse(data) {
    i++
    errorz.MustZ(err)
    b.WriteNewline(item.Token.Type())

    // 👉 add line number
    fmt.Fprintf(b, "%3d    ", i)
    b.Add(item.Key, item.Token)
}

WriteNewline() is optional.

If you don’t include it, the next b.Add() will automatically call it.
If you do, it will be remembered and b.Add() will just skip the call.

It’s also smart. If the Builder has neither a prefix nor indentation configured, it will not add any newline. It won’t add a leading newline or double newlines either.

This way, the API becomes more flexible: easy to use while still allowing for more advanced use cases.

func (b *Builder) WriteNewline(next TokenType) {
    b.WriteComma(next)
    if b.prefix == "" && b.indent == "" {
        return
    }
    if b.lastNewline {
        return
    }
    if b.lastTok != 0 {
        b.writeByte('\n')
        b.lastNewline = true
    }
}

And WriteComma() for comments.

The same goes for WriteComma(). It’s used in the Adding comments example to control the position of the comment.

Like WriteNewline(), WriteComma() is optional and smart. It will add a comma only when necessary, to always produce valid JSON.
So if you call Add("", TokenObjectOpen) then WriteComma(), it won’t actually add any comma, otherwise the JSON will become invalid.

for item, err := range iterjson.Parse(data) {
    errorz.MustZ(err)
    // 👉 add a comma before the comment
    b.WriteComma(item.Token.Type()) 

    // 👉 then add the comment
    if i > 0 {
        // extra logic to control spaces and align comments
        length := b.Len() - newlineIdx
        fmt.Fprint(b, strings.Repeat(" ", maxIdx-length))
        fmt.Fprintf(b, "// %2d", i)
    }
    i++

    // 👉 and a newline after the comment
    b.WriteNewline(item.Token.Type())
    newlineIdx = b.Len() // save the newline index

    // 👉 finally, add the key and token
    b.Add(item.Key, item.Token)
}

SetSkipEmptyStructures(true) to ignore empty structures.

In the Filtering JSON and returning a new JSON example, we use SetSkipEmptyStructures(true) to ignore empty structures.

Without this option, the Builder will add empty {} or [] to the output JSON. Try removing it and the output will become:

{
    "name": "Alice",
    "scores": [
    ],
    "address": {
        "city": "The Sun",
        "zip": 10101
    }
}

Notice the empty [] for the scores field. But how does it work?

To make it work, the Builder must not write "scores": [ immediately when it receives the [ token, because it doesn’t know yet whether there will be any items inside.
Instead, it writes to an alternative buffer and saves a snapshot of the current state.
When the next token is added, if it is the matching ] with nothing in between, the alternative buffer is cleared, the previous snapshot is restored, and the empty "scores" field is skipped.
Otherwise, it switches back to the main buffer and includes the content of the alternative buffer.
This way, the Builder can skip empty structures with minimal overhead.

func (b *Builder) SetSkipEmptyStructures(skip bool) {
    b.skipEmptyStructures = skip
    if skip {
        b.useAltBuf = true
        if b.altBuf == nil {
            b.altBuf = make([]byte, 0, 64)
        }
    } else {
        b.switchBuf()
    }
}

And here’s the updated implementation of the add() method with switchAltBuf() and restore(snapshot):

func (b *Builder) add(key any, tokType TokenType, raw []byte, value any) {
    switch {
    case tokType.IsOpen():
        snapshot := b.snapshot()
        b.switchAltBuf()
        b.WriteNewline(tokType)
        b.WriteIndent()
        b.writeKey(key)
        b.writeByte(byte(tokType))
        b.push(tokType, snapshot)
        b.setLastToken(tokType)

    case tokType.IsClose():
        if isValidKey(key) {
            b.addErrorf("unexpected key(%s) before close token(%s)", key, tokType)
            return
        }
        snapshot, ok := b.pop()
        if !ok {
            b.addErrorf("unexpected close token(%s)", tokType)
            return
        }
        if b.skipEmptyStructures && b.lastTok.IsOpen() {
            b.restore(snapshot)
        } else {
            b.WriteNewline(tokType)
            b.WriteIndent()
            b.writeByte(byte(tokType))
            b.setLastToken(tokType)
        }
    // ... (remaining cases elided)
    }
}
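The snapshot-and-restore idea can be illustrated with a much smaller builder. The sketch below is a simplified variant, not the package's code: instead of an alternative buffer, it remembers the buffer length before each open token and truncates back to it when the structure turns out to be empty. It only produces minified output, and the names frame and miniBuilder are hypothetical.

```go
package main

import "fmt"

// frame is the snapshot taken right before an open token is written.
type frame struct {
	bufLen  int  // buffer length before the open token
	lastTok byte // last token type before the open token
}

type miniBuilder struct {
	buf     []byte
	lastTok byte // first byte of the last written token, 0 at the start
	stack   []frame
}

func needComma(last, next byte) bool {
	switch last {
	case 0, '[', '{':
		return false
	}
	return next != ']' && next != '}'
}

// write appends an optional comma, an optional key, and the token.
func (b *miniBuilder) write(key, token string) {
	if needComma(b.lastTok, token[0]) {
		b.buf = append(b.buf, ',')
	}
	if key != "" {
		b.buf = append(b.buf, key...)
		b.buf = append(b.buf, ':')
	}
	b.buf = append(b.buf, token...)
	b.lastTok = token[0]
}

// add appends one item. key is empty for array elements and close tokens.
func (b *miniBuilder) add(key, token string) {
	switch token[0] {
	case '[', '{':
		snap := frame{bufLen: len(b.buf), lastTok: b.lastTok}
		b.write(key, token)
		b.stack = append(b.stack, snap)
	case ']', '}':
		if len(b.stack) == 0 {
			return // unbalanced close token, ignored in this sketch
		}
		snap := b.stack[len(b.stack)-1]
		b.stack = b.stack[:len(b.stack)-1]
		if b.lastTok == '[' || b.lastTok == '{' {
			// Nothing was written since the open token: roll back.
			b.buf = b.buf[:snap.bufLen]
			b.lastTok = snap.lastTok
			return
		}
		b.write("", token)
	default:
		b.write(key, token)
	}
}

func (b *miniBuilder) String() string { return string(b.buf) }

func main() {
	var b miniBuilder
	b.add("", "{")
	b.add(`"name"`, `"Alice"`)
	b.add(`"scores"`, "[") // all scores are filtered out...
	b.add("", "]")         // ...so the whole field is dropped here
	b.add(`"address"`, "{")
	b.add(`"city"`, `"The Sun"`)
	b.add("", "}")
	b.add("", "}")
	fmt.Println(b.String()) // {"name":"Alice","address":{"city":"The Sun"}}
}
```

Note that this sketch also drops an empty root structure, which may or may not be what a real builder should do.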

All tests are passing

The package starts with good testing in mind:

It includes tests for most core functions and many edge cases: scanner_test.go, parser_test.go, reconstruct_test.go.
Test data comes from various sources: serde-rs, jsonchecker, fastjson, rapidjson.

It’s the first release, so there’s still room for improvement, such as fuzzing and benchmarks. But the tests are a good starting point for ensuring that the package’s core logic works as expected.

Missing features and future work

This is just the beginning. There are many more features and improvements that can be added to the package:

Query complex values: like arrays of objects, nested objects, etc.
Support reader/writer: to handle large JSON data.
Support JSONL: to handle line-delimited JSON.
Support ProtoBuf JSON: to handle JSON data from ProtoBuf.
Easy to use API: to handle common use cases like filtering, transforming, etc.
More examples: to show how to use the package in real-world scenarios.
Optimize, benchmark, and fuzz: to ensure the package is efficient and reliable.
And many more…

If you have any ideas or suggestions, feel free to open an issue or pull request. I’d love to hear your feedback and help support your use cases!

Conclusion

The ezpkg.io/iter.json package empowers Go developers to handle JSON data with precision and efficiency. Whether you need to iterate through complex JSON structures, build new JSON objects dynamically, format or minify data, filter specific fields, or even transform values, iter.json offers a flexible and powerful solution.

I’m excited to share this package with the community as a tool for effective JSON manipulation without the need for fully parsing the data. While it’s still in early development and there’s room for more features, it already works well for many common use cases.

If you have specific requirements or ideas for improvement, feel free to reach out — I’d love to hear your feedback and help support your use cases! 🥳

Author

I'm Oliver Nguyen, a software maker working mostly in Go and JavaScript. I enjoy learning and seeing a better version of myself each day. I occasionally spin off new open source projects and share knowledge and thoughts along the way.