Errors, Errors Everywhere: How We Centralized and Structured Error Handling

2024-12-06

· go

Handling errors in Go is simple and flexible – yet no structure!

It’s supposed to be simple, right? Just return an error, wrapped with a message, and move on. Well, that simplicity quickly turns chaotic as our codebase grows with more packages, more developers, and more “quick fixes” that stay there forever. Over time, the logs are full of “failed to do this” and “unexpected that”, and nobody knows if it’s the user’s fault, the server’s fault, buggy code, or just a misalignment of the stars!

Errors are created with inconsistent messages. Each package has its own set of styles, constants, or custom error types. Error codes are added arbitrarily. There is no easy way to tell which errors a function may return without digging into its implementation!

So, I took the challenge of creating a new error framework. We decided to go with a structured, centralized system using namespace codes to make errors meaningful, traceable, and – most importantly – give us peace of mind!

This is the story of how we started with a simple error handling approach, got thoroughly frustrated as the problems grew, and eventually built our own error framework. It covers the design decisions, how the framework is implemented, the lessons learned, and why it transformed our approach to managing errors. I hope it gives you some ideas too!

Go errors are just values

Go has a straightforward way to handle errors: errors are just values. An error is any value that implements the error interface, which has a single method Error() string. Instead of throwing an exception and disrupting the current execution flow, Go functions return an error value alongside other results. The caller can then decide how to handle it: check its value to make a decision, wrap it with new messages and context, or simply return the error, leaving the handling logic to parent callers.

We can make any type an error by adding the Error() string method on it. This flexibility allows each package to define its own error-handling strategy, and choose whatever works best for them. This also integrates well with Go’s philosophy of composability, making it easy to wrap, extend, or customize errors as required.
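To make this concrete, here is a minimal sketch of a custom error type. NotFoundError, findUser, and missingResource are hypothetical names invented for this illustration; only errors.As and fmt.Errorf come from the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// NotFoundError is a hypothetical error type. Any type with an
// Error() string method satisfies the built-in error interface.
type NotFoundError struct {
	Resource string
	ID       string
}

func (e *NotFoundError) Error() string {
	return fmt.Sprintf("%s %q not found", e.Resource, e.ID)
}

// findUser always fails so that the example stays self-contained.
func findUser(id string) error {
	return &NotFoundError{Resource: "user", ID: id}
}

// missingResource reports which resource was missing, or "" when no
// *NotFoundError is present anywhere in the wrap chain.
func missingResource(err error) string {
	var nf *NotFoundError
	if errors.As(err, &nf) {
		return nf.Resource
	}
	return ""
}

func main() {
	// The caller adds context with %w and can still recover the type.
	err := fmt.Errorf("load profile: %w", findUser("42"))
	fmt.Println(err)
	fmt.Println(missingResource(err))
}
```

Because the custom type carries fields, callers can branch on structured data instead of parsing message strings.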

Every package needs to deal with errors

The common practice is to return an error value that implements the error interface and lets the caller decide what to do next. Here’s a typical example:

func loadCredentials() (Credentials, error) {
    data, err := os.ReadFile("cred.json")
    if errors.Is(err, os.ErrNotExist) {
        return nil, fmt.Errorf("file not found: %w", err)
    }
    if err != nil {
        return nil, fmt.Errorf("failed to read file: %w", err)
    }
    cred, err := verifyCredentials(data)
    if err != nil {
        return nil, fmt.Errorf("invalid credentials: %w", err)
    }
    return cred, nil
}

Go provides a handful of utilities for working with errors:

Creating errors: errors.New() and fmt.Errorf() for generating simple errors.
Wrapping errors: Wrap errors with additional context using fmt.Errorf() and the %w verb.
Combining errors: errors.Join() merges multiple errors into a single one.
Checking and handling errors: errors.Is() matches an error with a specific value, errors.As() matches an error to a specific type, and errors.Unwrap() retrieves the underlying error.
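The utilities above can be seen working together in the short sketch below. ErrInvalid, loadConfig, validate, and the small helper functions are hypothetical names for this example; the file failure is simulated with fs.PathError so that the program does not touch the disk. errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// ErrInvalid is a hypothetical sentinel error for this example.
var ErrInvalid = errors.New("invalid config")

// loadConfig simulates a failed file read and wraps the cause with %w.
func loadConfig(path string) error {
	cause := &fs.PathError{Op: "open", Path: path, Err: fs.ErrNotExist}
	return fmt.Errorf("load config: %w", cause)
}

// validate collects every problem and merges them with errors.Join,
// which returns nil when there is nothing to report.
func validate(name string, port int) error {
	var errs []error
	if name == "" {
		errs = append(errs, fmt.Errorf("%w: empty name", ErrInvalid))
	}
	if port <= 0 {
		errs = append(errs, fmt.Errorf("%w: bad port %d", ErrInvalid, port))
	}
	return errors.Join(errs...)
}

// isMissingFile matches a specific value anywhere in the chain.
func isMissingFile(err error) bool { return errors.Is(err, fs.ErrNotExist) }

// isInvalid also works on joined errors, which form a tree.
func isInvalid(err error) bool { return errors.Is(err, ErrInvalid) }

// failedPath matches a specific type and reads one of its fields.
func failedPath(err error) string {
	var pe *fs.PathError
	if errors.As(err, &pe) {
		return pe.Path
	}
	return ""
}

func main() {
	err := loadConfig("cred.json")
	fmt.Println(isMissingFile(err), failedPath(err))
	fmt.Println(errors.Unwrap(err) != nil) // the wrapped *fs.PathError
	fmt.Println(validate("", 0))
}
```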

In practice, we usually see these patterns:

Using standard packages: Returning simple errors with errors.New() or fmt.Errorf().
Exporting constants or variables: For instance, go-redis and gorm.io define reusable error variables.
Custom error types: Libraries like lib/pq or grpc/status.Error create specialized error types, often with associated codes for additional context.
Error interfaces with implementations: The aws-sdk-go uses an interface-based approach to define error types with various implementations.
Or multiple interfaces: Like Docker’s errdefs, which defines multiple interfaces to classify and manage errors.

We started with a common approach

In the early days, like many Go developers, we followed Go’s common practices and kept error handling minimal yet functional. It worked well enough for a couple of years.

Include stacktrace using pkg/errors, a popular package at that time.
Export constants or variables for package-specific errors.
Use errors.Is() to check for specific errors.
Wrap errors with new messages and context.
For API errors, we define error types and codes with Protobuf enum.

Including stacktrace with pkg/errors

We used pkg/errors, a popular error-handling package at the time, to include stacktrace in our errors. This was particularly helpful for debugging, as it allowed us to trace the origin of errors across different parts of the application.

To create, wrap, and propagate errors with stacktrace, we implemented functions like Newf(), NewValuef(), and Wrapf(). Here’s an example of our early implementation:

type xError struct {
    msg   string
    stack *stack // captured by callers(), as in pkg/errors
}

func Newf(msg string, args ...any) error {
    return &xError{  
        msg:   fmt.Sprintf(msg, args...),  
        stack: callers(),  // 👈 stacktrace
    }
}
func NewValuef(msg string, args ...any) error {
    return fmt.Errorf(msg, args...)  // 👈 no stacktrace
}
func Wrapf(err error, msg string, args ...any) error {
    if err == nil { return nil }
    stack := getStack(err)
    if stack == nil { stack = callers() }
    return &xError{
        msg:   fmt.Sprintf(msg, args...),
        stack: stack,
    }
}
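The callers() helper is not shown above. As a rough sketch of how such a helper can be built on the standard runtime package, consider the following. The names stack, callers, tracedError, newTraced, loadUser, and tracedIn are assumptions made for this illustration and are not our original code.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// stack holds raw program counters, similar in spirit to pkg/errors.
type stack []uintptr

// callers captures the call stack, skipping `skip` frames above the
// function that calls it.
func callers(skip int) stack {
	pcs := make([]uintptr, 32)
	// +2 skips runtime.Callers and callers itself.
	n := runtime.Callers(skip+2, pcs)
	return pcs[:n]
}

// topFunction returns the name of the innermost recorded frame.
func (s stack) topFunction() string {
	if len(s) == 0 {
		return ""
	}
	frame, _ := runtime.CallersFrames(s).Next()
	return frame.Function
}

type tracedError struct {
	msg   string
	stack stack
}

func (e *tracedError) Error() string { return e.msg }

// newTraced records the stack starting at its own caller.
func newTraced(msg string) *tracedError {
	return &tracedError{msg: msg, stack: callers(1)}
}

func loadUser() *tracedError { return newTraced("user not found") }

// tracedIn reports whether the error was created inside function fn.
func tracedIn(err *tracedError, fn string) bool {
	return strings.HasSuffix(err.stack.topFunction(), "."+fn)
}

func main() {
	err := loadUser()
	fmt.Println(err, "at", err.stack.topFunction())
}
```

Storing program counters and resolving them lazily keeps error creation cheap; the frames are only symbolized when someone actually prints the trace.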

Exporting error variables

Each package in our codebase defined its own error variables, often with inconsistent styles.

package database

var ErrNotFound = errors.NewValue("record not found")
var ErrMultipleFound = errors.NewValue("multiple records found")    
var ErrTimeout = errors.NewValue("request timeout")

package profile

var ErrUserNotFound = errors.NewValue("user not found")
var ErrBusinessNotFound = errors.NewValue("business not found")     
var ErrContextCancel = errors.NewValue("context canceled")

Checking errors with errors.Is() and wrapping with additional context

res, err := repo.QueryUser(ctx, req)
switch {
case err == nil:
    // continue
case errors.Is(err, database.ErrNotFound):
    return nil, errors.Wrapf(ErrUserNotFound, "user not found (id=%v)", req.UserID)
default:
    return nil, errors.Wrapf(err, "failed to query user (id=%v)", req.UserID)
}

This helped propagate errors with more detail but often resulted in verbosity, duplication, and less clarity in logs:

internal server error: failed to query user: user not found (id=52a0a433-3922-48bd-a7ac-35dd8972dfe5): record not found: not found

Defining external errors with Protobuf

For external-facing APIs, we adopted a Protobuf-based error model inspired by Meta’s Graph API:

message Error {
    string message = 1;
    ErrorType type = 2;
    ErrorCode code = 3;

    string user_title   = 4;
    string user_message = 5;
    string trace_id     = 6;
    
    map<string, string> details = 7;
}
enum ErrorType {
    ERROR_TYPE_UNSPECIFIED = 1;
    ERROR_TYPE_AUTHENTICATION = 2;
    ERROR_TYPE_INVALID_REQUEST = 3;
    ERROR_TYPE_RATE_LIMIT = 4;
    ERROR_TYPE_BUSINESS_LIMIT = 5;
    ERROR_TYPE_WEBHOOK_DELIVERY = 6;
}
enum ErrorCode {
    ERROR_CODE_UNSPECIFIED = 1 [(error_type = UNSPECIFIED)];
    ERROR_CODE_UNAUTHENTICATED = 2 [(error_type = AUTHENTICATION)];
    ERROR_CODE_CAMPAIGN_NOT_FOUND = 3 [(error_type = NOT_FOUND)];
    ERROR_CODE_META_CHOSE_NOT_TO_DELIVER = 4 /* ... */;
    ERROR_CODE_MESSAGE_WABA_TEMPLATE_CAN_ONLY_EDIT_ONCE_IN_24_HOURS = 5;    
}

This approach helped structure errors, but over time, error types and codes were added without a clear plan, leading to inconsistencies and duplication.

And problems grew over time

Errors were declared everywhere

Each package defined its own error constants with no centralized system.
Constants and messages were scattered across the codebase, making it unclear which errors a function might return – ugh, is it gorm.ErrRecordNotFound or user.ErrNotFound or both?

Random error wrapping led to inconsistent and arbitrary logs

Many functions wrapped errors with arbitrary, inconsistent messages without declaring their own error types.
Logs were verbose, redundant, and difficult to search or monitor.
Error messages were generic and often didn’t explain what went wrong or how it happened. They were also brittle and prone to unnoticed changes.

unexpected gorm error: failed to find business channel: error received when invoking API: unexpected: context canceled

No standardization led to improper error handling

Each package handled errors differently, making it hard to know if a function returned, wrapped, or transformed errors.
Context was often lost as errors propagated.
Upper layers received vague 500 Internal Server Errors without clear root causes.

No categorization made monitoring impossible

Errors weren’t classified by severity or behavior: a context.Canceled error may be normal behavior when the user closes the browser tab, but it matters when the request is canceled because a query is randomly slow.
Important issues were buried under noisy logs, making them hard to identify.
Without categorization, it was impossible to monitor error frequency, severity, or impact effectively.

It’s time to centralize error handling

Back to the drawing board

To address the growing challenges, we decided to build a better error strategy around the core idea of centralized and structured error codes.

Errors are declared everywhere → Centralize error declaration in a single place for better organization and traceability.
Inconsistent and arbitrary logs → Structured error codes with clear and consistent formatting.
Improper error handling → Standardize error creation and checking on the new Error type with a comprehensive set of helpers.
No categorization → Categorize error codes with tags for effective monitoring through logs and metrics.

Design decisions

All error codes are defined at a centralized place with namespace structure.

Use namespaces to create clear, meaningful, and extendable error codes. Example:

PRFL.USR.NOT_FOUND for “User not found.”
FLD.NOT_FOUND for “Flow document not found.”
Both can share an underlying base code DEPS.PG.NOT_FOUND, meaning “Record not found in PostgreSQL.”

Each layer of service or library must only return its own namespace codes.

Each layer of service, repository, or library declares its own set of error codes.
When a layer receives an error from a dependency, it must wrap it with its own namespace code before returning it.
For example: When receiving an error gorm.ErrRecordNotFound from a dependency, the “database” package must wrap it as DEPS.PG.NOT_FOUND. Later, the “profile/user” service must wrap it again as PRFL.USR.NOT_FOUND.

All errors must implement the Error interface.

This creates a clear boundary between errors from third-party libraries (error) and our internal Errors.
This also helps during migration by separating migrated packages from not-yet-migrated ones.

An error can wrap one or multiple errors. Together, they form a tree.

[FLD.INVALID_ARGUMENT] invalid argument
 → [TPL.INVALID_PARAMS] invalid input params
  1. [TPL.PARAM.EMPTY] name can not be empty
  2. [TPL.PARAM.MALFORM] invalid format for param[2]
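A tree like the one above can be modeled with the multi-error Unwrap() []error convention from the standard library. The sketch below is only an illustration: codedError, render, and sampleTree are hypothetical names, and the output format is simplified compared to the framework's.

```go
package main

import (
	"fmt"
	"strings"
)

// codedError is a hypothetical error carrying a namespace code and
// any number of sub-errors.
type codedError struct {
	code     string
	msg      string
	children []error
}

func (e *codedError) Error() string   { return "[" + e.code + "] " + e.msg }
func (e *codedError) Unwrap() []error { return e.children }

// render prints err and its sub-errors, one per line, indented by depth.
func render(err error, depth int) string {
	var sb strings.Builder
	sb.WriteString(strings.Repeat("  ", depth) + err.Error() + "\n")
	if parent, ok := err.(interface{ Unwrap() []error }); ok {
		for _, child := range parent.Unwrap() {
			sb.WriteString(render(child, depth+1))
		}
	}
	return sb.String()
}

// sampleTree rebuilds the example tree from the text.
func sampleTree() error {
	return &codedError{
		code: "FLD.INVALID_ARGUMENT", msg: "invalid argument",
		children: []error{&codedError{
			code: "TPL.INVALID_PARAMS", msg: "invalid input params",
			children: []error{
				&codedError{code: "TPL.PARAM.EMPTY", msg: "name can not be empty"},
				&codedError{code: "TPL.PARAM.MALFORM", msg: "invalid format for param[2]"},
			},
		}},
	}
}

func main() {
	fmt.Print(render(sampleTree(), 0))
}
```

Implementing Unwrap() []error also means errors.Is and errors.As traverse every branch of the tree (Go 1.20 or later).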

Always require context.Context, and allow attaching context to the error.

Too many times we saw logs with standalone errors that had no context and no trace_id, leaving us with no idea where they came from.
Additional key/value pairs can be attached to errors and used in logs or monitoring.

When errors are sent across service boundary, only the top-level error code is exposed.

The callers do not need to see the internal implementation details of that service.

For external errors, keep using the current Protobuf ErrorCode and ErrorType.

This ensures backward compatibility, so our clients don’t need to rewrite their code.

Automap namespace error codes to Protobuf codes, HTTP status codes, and tags.

Engineers define the mapping in the centralized place, and the framework will map each error code to the corresponding Protobuf ErrorCode, ErrorType, gRPC status, HTTP status, and tags for logging/metrics.
This ensures consistency and reduces duplication.
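The idea of a central mapping can be sketched as a lookup table with a safe default. Everything here is an assumption for illustration (codeInfo, registry, lookup, and the "INTERNAL" fallback value); the real framework generates its table, as described later.

```go
package main

import (
	"fmt"
	"net/http"
)

// codeInfo is a hypothetical record of everything derived from a code.
type codeInfo struct {
	httpStatus int
	apiCode    string
	tags       []string
}

// registry is the single place where mappings are declared.
var registry = map[string]codeInfo{
	"PRFL.USR.NOT_FOUND": {
		httpStatus: http.StatusNotFound,
		apiCode:    "USER_NOT_FOUND",
		tags:       []string{"profile", "usr"},
	},
	"PRFL.AUTH.UNAUTHENTICATED": {
		httpStatus: http.StatusUnauthorized,
		apiCode:    "UNAUTHENTICATED",
		tags:       []string{"profile", "auth"},
	},
}

// lookup falls back to a generic internal error for unmapped codes, so
// internal details never leak by accident.
func lookup(code string) codeInfo {
	if info, ok := registry[code]; ok {
		return info
	}
	return codeInfo{httpStatus: http.StatusInternalServerError, apiCode: "INTERNAL"}
}

func main() {
	for _, code := range []string{"PRFL.USR.NOT_FOUND", "PRFL.USR.REPO.NOT_FOUND"} {
		info := lookup(code)
		fmt.Println(code, info.httpStatus, info.apiCode)
	}
}
```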

The namespace error framework

Core packages and types

There are a few core packages that form the foundation of our new error-handling framework.

connectly.ai/go/pkgs/

errors: The main package that defines the Error type and codes.
errors/api: For sending errors to the front-end or external API.
errors/E: Helper package intended to be used with dot import.
testing: Testing utilities for working with namespace errors.

Error and Code

The Error interface is an extension of the standard error interface, with additional methods to return a Code. A Code is implemented as a uint16.

package errors // import "connectly.ai/go/pkgs/errors"

type Error interface {
    error
    Code() Code
}
type Code struct {
    code uint16
}
type CodeI interface {
    CodeDesc() CodeDesc
}
type GroupI interface { /* ... */ }
type CodeDesc struct { /* ... */ }

Package errors/E exports all error codes and common types

package E // import "connectly.ai/go/pkgs/errors/E"

import (
    "context"

    "connectly.ai/go/pkgs/errors"
)

type Error = errors.Error

var (
    DEPS = errors.DEPS
    PRFL = errors.PRFL
)

func MapError(ctx context.Context, err error) errors.Mapper { /* ... */ }
func IsErrorCode(err error, codes ...errors.CodeI) bool { /* ... */ }
func IsErrorGroup(err error, groups ...errors.GroupI) bool { /* ... */ }

Example usage

Example error codes:

// dependencies → postgres
DEPS.PG.NOT_FOUND
DEPS.PG.UNEXPECTED

// sdk → hash
SDK.HASH.UNEXPECTED

// profile → user
PRFL.USR.NOT_FOUND
PRFL.USR.UNKNOWN

// profile → user → repository
PRFL.USR.REPO.NOT_FOUND
PRFL.USR.REPO.UNKNOWN

// profile → auth
PRFL.AUTH.UNAUTHENTICATED
PRFL.AUTH.UNKNOWN
PRFL.AUTH.UNEXPECTED

Package database:

package database // import "connectly.ai/go/pkgs/database"

import (
    "context"

    "gorm.io/gorm"

    . "connectly.ai/go/pkgs/errors/E"
)

// DB wraps *gorm.DB so that every error leaving this package is an 'Error'.
type DB struct{ gorm *gorm.DB }

func (d *DB) Exec(ctx context.Context, sql string, params ...any) *DB {
    tx := d.gorm.WithContext(ctx).Exec(sql, params...)
    return wrapTx(tx)
}
func (x *DB) Error(msgArgs ...any) Error {
    return wrapError(x.gorm.Error)  // 👈 convert gorm error to 'Error'
}
func (x *DB) SingleRowError(msgArgs ...any) Error {
    if err := x.Error(); err != nil { return err }
    switch {
    case x.gorm.RowsAffected == 1: return nil
    case x.gorm.RowsAffected == 0:
        return DEPS.PG.NOT_FOUND.CallerSkip(1).
            New(x.Context(), formatMsgArgs(msgArgs))
    default:
        return DEPS.PG.UNEXPECTED.CallerSkip(1).
            New(x.Context(), formatMsgArgs(msgArgs))
    }
}

Package pb/services/profile:

package profile // import "connectly.ai/pb/services/profile"

// these types are generated from services/profile.proto
type QueryUserRequest struct {
    BusinessId string
    UserId     string
}
type LoginRequest struct {
    Username string
    Password string
}

Package service/profile:

package profile

import uuid "github.com/google/uuid"
import . "connectly.ai/go/pkgs/errors/E"
import api "connectly.ai/go/pkgs/errors/api"
import l "connectly.ai/go/pkgs/logging/l"
import profilepb "connectly.ai/pb/services/profile"

// repository requests
type QueryUserByUsernameRequest struct {
    Username string
}

// repository layer → query user
func (r *UserRepository) QueryUserByUsername(
    ctx context.Context, req *QueryUserByUsernameRequest,
) (*User, Error) {
    if req.Username == "" {
        return nil, PRFL.USR.REPO.INVALID_ARGUMENT.New(ctx, "empty request")
    }

    var user User
    sqlQuery := `SELECT * FROM "user" WHERE username = ? LIMIT 1`
    tx := r.db.Exec(ctx, sqlQuery, req.Username).Scan(&user)     
    
    err := tx.SingleRowError()
    switch {
    case err == nil:
        return &user, nil
        
    case IsErrorCode(err, DEPS.PG.NOT_FOUND):
        return nil, PRFL.USR.REPO.NOT_FOUND.
            With(l.String("username", req.Username)).
            Wrap(ctx, err, "user not found")

    default:
        return nil, PRFL.USR.REPO.UNKNOWN.
            Wrap(ctx, err, "failed to query user")
    }
}

// user service layer → query user
func (u *UserService) QueryUser(
    ctx context.Context, req *profilepb.QueryUserRequest,
) (*profilepb.QueryUserResponse, Error) {
    // ...
    rr := &QueryUserByUsernameRequest{Username: req.Username}
    user, err := u.repo.QueryUserByUsername(ctx, rr)
    if err != nil {
        return nil, MapError(ctx, err).
            Map(PRFL.USR.REPO.NOT_FOUND, PRFL.USR.NOT_FOUND,
                "the user %q cannot be found", req.Username,
                api.UserTitle("User Not Found"),
                api.UserMsg("The requested user id %q can not be found", req.UserId)).
            KeepGroup(PRFL.USR).    
            Default(PRFL.USR.UNKNOWN, "failed to query user")
    }
    // ...
    return resp, nil
}

// auth service layer → login user
func (a *AuthService) Login(
    ctx context.Context, req *profilepb.LoginRequest,
) (*profilepb.LoginResponse, Error) {

    vl := PRFL.AUTH.INVALID_ARGUMENT.WithMsg("invalid request")
    vl.Vl(req.Username != "", "no username", api.Detail("username is required"))
    vl.Vl(req.Password != "", "no password", api.Detail("password is required"))
    if err := vl.ToError(ctx); err != nil {
        return nil, err
    }

    hashpwd, err := hash.Hash(req.Password)
    if err != nil {
        return nil, PRFL.AUTH.UNEXPECTED.Wrap(ctx, err, "failed to calc hash")
    }

    usrReq := profilepb.QueryUserByUsernameRequest{/*...*/}
    usrRes, err := a.userServiceClient.QueryUserByUsername(ctx, usrReq)
    if err != nil {
        return nil, MapError(ctx, err).
            Map(PRFL.USR.NOT_FOUND, PRFL.AUTH.UNAUTHENTICATED, "unauthenticated").
            Default(PRFL.AUTH.UNKNOWN, "failed to query by username")
    }
    // ...
}

Well, there are a lot of new functions and concepts in the above code. Let’s go through them step by step.

Creating and wrapping errors

First, import package errors/E using dot import

This allows you to use common types directly, like Error instead of errors.Error, and to access codes as PRFL.USR.NOT_FOUND instead of errors.PRFL.USR.NOT_FOUND.

import . "connectly.ai/go/pkgs/errors/E"

Create new errors using CODE.New()

Suppose you receive an invalid request. You can create a new error with:

err := PRFL.USR.INVALID_ARGUMENT.New(ctx, "invalid request")

PRFL.USR.INVALID_ARGUMENT is a Code.
A Code exposes methods like New() or Wrap() for creating a new error.
The New() function receives context.Context as the first argument, followed by message and optional arguments.

Print it with fmt.Print(err):

[PRFL.USR.INVALID_ARGUMENT] invalid request

or with fmt.Printf("%+v") to see more details:

[PRFL.USR.INVALID_ARGUMENT] invalid request

connectly.ai/go/services/profile.(*UserService).QueryUser
    /usr/i/src/go/services/profile/user.go:1234
connectly.ai/go/services/profile.(*UserRepository).QueryUser
    /usr/i/src/go/services/profile/repo/user.go:2341

Wrap an error within a new error using CODE.Wrap()

dbErr := DEPS.PG.NOT_FOUND.Wrap(ctx, gorm.ErrRecordNotFound, "not found")
usrErr := PRFL.USR.NOT_FOUND.Wrap(ctx, dbErr, "user not found")

will produce this output with fmt.Print(usrErr):

[PRFL.USR.NOT_FOUND] user not found → [DEPS.PG.NOT_FOUND] not found → record not found

or with fmt.Printf("%+v", usrErr)

[PRFL.USR.NOT_FOUND] user not found
  → [DEPS.PG.NOT_FOUND] not found
      → record not found

connectly.ai/go/services/profile.(*UserService).QueryUser
    /usr/i/src/go/services/profile/user.go:1234

The stacktrace will come from the innermost Error. If you are writing a helper function, you can use CallerSkip(skip) to skip frames:

func mapUserError(ctx context.Context, err error) Error {
    switch {
    case IsErrorCode(err, DEPS.PG.NOT_FOUND):
        return PRFL.USR.NOT_FOUND.CallerSkip(1).Wrap(ctx, err, "...")
    default:
        return PRFL.USR.UNKNOWN.CallerSkip(1).Wrap(ctx, err, "...")
    }
}

Adding context to errors

Add context to an error using With()

You can add additional key/value pairs to errors by .With(l.String(...)).
logging/l is a helper package to export sugar functions for logging.
l.String("flag", flag) returns a Tag{String: flag} and l.UUID("user_id", userID) returns a Tag{Stringer: userID}.

import l "connectly.ai/go/pkgs/logging/l"

usrErr := PRFL.USR.NOT_FOUND.
    With(l.UUID("user_id", req.UserID), l.String("flag", flag)).
    Wrap(ctx, dbErr, "user not found")

The tags can be output with fmt.Printf("%+v", usrErr):

[PRFL.USR.NOT_FOUND] user not found
{"user_id": "81febc07-5c06-4e01-8f9d-995bdc2e0a9a", "flag": "ABRW"}
  → [DEPS.PG.NOT_FOUND] not found
    {"a number": 42}
      → record not found

Add context to errors directly inside New(), Wrap(), or MapError():

By leveraging the l.String() function and its family, New() and similar functions can detect tags among the formatting arguments. There is no need to introduce separate functions.

err := INF.HEALTH.NOT_READY.New(ctx,
    "service %q is not ready (retried %v times)",
    req.ServiceName,
    l.String("flag", flag),
    countRetries,
    l.Number("count", countRetries),
)

will output:

[INF.HEALTH.NOT_READY] service "magic" is not ready (retried 2 times)    
{"flag": "ABRW", "count": 2}
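One way such detection could work is a type switch over the variadic arguments that pulls tags out before formatting. This is a sketch under assumptions, not the framework's code: Tag, String, Number, splitArgs, and format are hypothetical stand-ins for the helpers in logging/l and errors.

```go
package main

import "fmt"

// Tag is a hypothetical key/value pair attached to an error.
type Tag struct {
	Key   string
	Value any
}

func String(key, value string) Tag     { return Tag{Key: key, Value: value} }
func Number(key string, value int) Tag { return Tag{Key: key, Value: value} }

// splitArgs separates Tag values from ordinary formatting arguments,
// keeping the relative order of each group.
func splitArgs(args []any) (fmtArgs []any, tags []Tag) {
	for _, a := range args {
		if t, ok := a.(Tag); ok {
			tags = append(tags, t)
			continue
		}
		fmtArgs = append(fmtArgs, a)
	}
	return fmtArgs, tags
}

// format builds the message from the non-tag arguments only.
func format(msg string, args ...any) (string, []Tag) {
	fmtArgs, tags := splitArgs(args)
	return fmt.Sprintf(msg, fmtArgs...), tags
}

func main() {
	msg, tags := format(
		"service %q is not ready (retried %v times)",
		"magic", String("flag", "ABRW"), 2, Number("count", 2),
	)
	fmt.Println(msg)
	fmt.Println(tags)
}
```

The trade-off is that tags and format arguments share one parameter list, so a misplaced argument is only caught at runtime rather than by the compiler.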

Different types: `Error0`, `VlError`, `ApiError`

Currently, there are 3 types that implement the Error interface. You can add more types if necessary. Each one can have a different structure, with custom methods for specific needs.

Error is an extension of Go’s standard error interface

type Error interface {
    error
    Code() Code
    Message() string
    Fields() []tags.Field
    StackTrace() stacktrace.StackTrace

    _base() *base // a private method
}

It contains a private method to ensure that we don’t accidentally implement new Error types outside of the errors package. We may (or may not) lift that restriction in the future once we have more experience with usage patterns.

Why don’t we just use the standard error interface and use type assertion?

Because we want to separate third-party errors from our internal errors. All layers and packages in our internal code must always return Error. This way we know exactly when we have to convert third-party errors, and when we only need to deal with our internal error codes.

It also creates a boundary between migrated packages and not-yet-migrated packages. Back to reality, we cannot just declare a new type, wave a magic wand, whisper a ~~spell~~ prompt, and have millions of lines of code magically converted and working seamlessly with no bugs! No, that future is not here yet. It may come someday, but for now, we still have to migrate our packages one by one.

Error0 is the default Error type

Most error codes will produce an Error0 value. It contains a base and an optional sub-error. You can use NewX() to return a concrete *Error0 struct instead of an Error interface, but you need to be careful.

type Error0 struct {
    base  
    err error
}

var errA Error   = DEPS.PG.NOT_FOUND.New(ctx, "not found")
var errB *Error0 = DEPS.PG.NOT_FOUND.NewX(ctx, "not found")

base is the common structure shared by all Error implementations. It provides common functionality: Code(), Message(), StackTrace(), Fields(), and more.

type base struct {
    code  Code
    msg   string
    kv    []tags.Field
    stack stacktrace.StackTrace
}

VlError is for validation errors

It can contain multiple sub-errors, and provides convenient methods for working with validation helpers.

type VlError struct {
    base
    errs []error
}

You can create a VlError just like any other Error:

err := PRFL.USR.INVALID_ARGUMENT.New(ctx, "invalid request")

Or make a VlBuilder, add errors to it, then convert it to a VlError:

userID, err0 := parseUUID(req.UserId)
err1 := validatePassword(req.Password)

vl := PRFL.USR.INVALID_ARGUMENT.WithMsg("invalid request")
vl.Add(err0, err1)
vlErr := vl.ToError(ctx)

And include key/value pairs as usual:

vl := PRFL.USR.INVALID_ARGUMENT.
    With(l.Bool("testingenv", true)).
    WithMsg("invalid request")
userID, err0 := parseUUID(req.UserId)
err1 := validatePassword(req.Password)

vl.Add(err0, err1)
vlErr := vl.ToError(ctx, l.String("user_id", req.UserId))

Using fmt.Printf("%+v", vlErr) will output:

[PRFL.USR.INVALID_ARGUMENT] invalid request
{"testingenv": true, "user_id": "A1234567890"}
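The builder pattern behind VlBuilder can be approximated with the standard library alone. In the sketch below, validator, newValidator, check, add, toError, and validateLogin are hypothetical names; the real VlBuilder also carries a code, tags, and a context, which are omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// validator is a hypothetical builder that accumulates sub-errors.
type validator struct {
	msg  string
	errs []error
}

func newValidator(msg string) *validator { return &validator{msg: msg} }

// check records detail as a sub-error when ok is false.
func (v *validator) check(ok bool, detail string) {
	if !ok {
		v.errs = append(v.errs, errors.New(detail))
	}
}

// add appends existing errors, ignoring nil values.
func (v *validator) add(errs ...error) {
	for _, err := range errs {
		if err != nil {
			v.errs = append(v.errs, err)
		}
	}
}

// toError returns nil when nothing failed, so callers can use the
// usual `if err != nil` check.
func (v *validator) toError() error {
	if len(v.errs) == 0 {
		return nil
	}
	return fmt.Errorf("%s: %w", v.msg, errors.Join(v.errs...))
}

func validateLogin(username, password string) error {
	vl := newValidator("invalid request")
	vl.check(username != "", "no username")
	vl.check(password != "", "no password")
	return vl.toError()
}

func main() {
	fmt.Println(validateLogin("alice", "s3cret"))
	fmt.Println(validateLogin("", ""))
}
```

Collecting all failures before returning lets the client fix every problem in one round trip instead of discovering them one at a time.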

ApiError is an adapter for migrating API errors

Previously, we used a separate api.Error struct for returning API errors to the front-end and external clients. It includes the ErrorType and ErrorCode mentioned before.

package api
import errorpb "connectly.ai/pb/models/error"

// Deprecated
type Error struct {
    pbType errorpb.ErrorType
    pbCode errorpb.ErrorCode

    cause    error
    msg      string
    usrMsg   string
    usrTitle string
    // ...
}

This type is now deprecated. Instead, we declare all the mappings (ErrorType, ErrorCode, gRPC code, HTTP code) in a centralized place, and convert them at the corresponding boundaries. I will discuss code declaration in the next section.

To do the migration to the new namespace error framework, we added a temporary namespace ZZZ.API_TODO. Every ErrorCode becomes a ZZZ.API_TODO code.

ZZZ.API_TODO.UNEXPECTED
ZZZ.API_TODO.INVALID_REQUEST
ZZZ.API_TODO.USERNAME_
ZZZ.API_TODO.META_CHOSE_NOT_TO_DELIVER
ZZZ.API_TODO.MESSAGE_WABA_TEMPLATE_CAN_ONLY_EDIT_ONCE_IN_24_HOURS

And ApiError is created as an adapter. All functions that previously return *api.Error were changed to return Error (implemented by *ApiError) instead.

package api
import . "connectly.ai/go/pkgs/errors/E"

// previous
func FailPreconditionf(err error, msg string, args ...any) *Error {
    return &Error{
        pbType: ERROR_TYPE_FAILED_PRECONDITION,
        pbCode: ERROR_CODE_MESSAGE_WABA_TEMPLATE_CAN_ONLY_EDIT_ONCE_IN_24_HOURS,    
        cause: err,
        msg: fmt.Sprintf(msg, args...),
    }
} 

// current: this is deprecated, and serves as an adapter
func FailPreconditionf(err error, msg string, args ...any) Error {
    ctx := context.TODO()
    return ZZZ.API_TODO.MESSAGE_WABA_TEMPLATE_CAN_ONLY_EDIT_ONCE_IN_24_HOURS.
        CallerSkip(1).   // correct the stacktrace by 1 frame
        Wrap(ctx, err, msg, args...)
}

When all the migration is done, the previous usage:

wabaErr := verifyWabaTemplateStatus(tpl)
apiErr := api.FailPreconditionf(wabaErr, "template cannot be edited").
    WithErrorCode(ERROR_CODE_MESSAGE_WABA_TEMPLATE_CAN_ONLY_EDIT_ONCE_IN_24_HOURS).    
    WithUserMsg("According to WhatsApp, the message template can be only edited once in 24 hours. Consider creating a new message template instead.").
    ErrorOrNil()

should become:

CPG.TPL.EDIT_ONCE_IN_24_HOURS.Wrap(
    ctx, wabaErr, "template cannot be edited",
    api.UserMsg("According to WhatsApp, the message template can be only edited once in 24 hours. Consider creating a new message template instead."))

Notice that the ErrorCode is implicitly derived from the internal namespace code. No need to explicitly assign it every time. But how to declare the relationship between codes? It will be explained in the next section.

Declaring new error codes

At this point, you already know how to create new errors from existing codes. It’s time to explain codes and how to add a new one.

A Code is implemented as a uint16 value, which has a corresponding string representation.

type Code struct { code uint16 }

fmt.Printf("%q", DEPS.PG.NOT_FOUND)
// "DEPS.PG.NOT_FOUND"

To store those strings, there is an array of all available CodeDesc:

const MaxCode = 321 // 👈 this value is generated
var   allCodes [MaxCode]CodeDesc

type CodeDesc struct {
    c    int      // 42
    code string   // DEPS.PG.NOT_FOUND
    tags []string // collected from the "tag:" comments
    api  APICodeDesc
}
type APICodeDesc struct {
    ErrorType   errorpb.ErrorType
    ErrorCode   errorpb.ErrorCode
    HTTPCode    int
    DefMessage  string
    UserMessage string
    UserTitle   string
}
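As a simplified illustration of how a numeric code can resolve to its name through such a table, consider the sketch below. The table contents, the lowercase names maxCode and codeDesc, and the UNKNOWN_CODE fallback are assumptions for this example; the real table is generated.

```go
package main

import "fmt"

// Code is a small numeric handle; the readable name lives in a table.
type Code struct{ code uint16 }

// codeDesc is a trimmed-down stand-in for CodeDesc.
type codeDesc struct{ name string }

const maxCode = 3 // generated in the real framework

// allCodes is indexed by the numeric code; index 0 is left unused.
var allCodes = [maxCode]codeDesc{
	1: {name: "DEPS.PG.NOT_FOUND"},
	2: {name: "DEPS.PG.UNEXPECTED"},
}

// String implements fmt.Stringer, so %v, %s, and %q print the name.
func (c Code) String() string {
	if int(c.code) >= len(allCodes) || allCodes[c.code].name == "" {
		return fmt.Sprintf("UNKNOWN_CODE(%d)", c.code)
	}
	return allCodes[c.code].name
}

func main() {
	fmt.Printf("%q\n", Code{code: 1})
	fmt.Println(Code{code: 9})
}
```

Keeping the error value down to a uint16 makes codes cheap to copy and compare, while the array lookup stays O(1).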

Here’s how codes are declared:

var DEPS deps  // dependencies
var PRFL prfl  // profile
var FLD  fld   // flow document

type deps struct {
    PG pg      // postgres
    RD rd      // redis
}
// tag:postgres
type pg struct {
    NOT_FOUND   Code0 // record not found
    CONFLICT    Code0 // record already exist
    MALFORM_SQL Code0
}
// tag:profile
type prfl struct {
    REPO prfl_repo
    USR  usr
    AUTH auth
}
// tag:profile
type prfl_repo struct {
    NOT_FOUND        Code0  // internal error code
    INVALID_ARGUMENT VlCode // internal error code
}
// tag:usr
type usr struct {
    NOT_FOUND        Code0  `api-code:"USER_NOT_FOUND"`
    INVALID_ARGUMENT VlCode `api-code:"INVALID_ARGUMENT"`
    DISABLED_ACCOUNT Code0  `api-code:"DISABLED_ACCOUNT"`
}
// tag:auth
type auth struct {
    UNAUTHENTICATED   Code0 `api-code:"UNAUTHENTICATED"`
    PERMISSION_DENIED Code0 `api-code:"PERMISSION_DENIED"`
}
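To show how nested struct declarations can be turned into dotted code names, here is a sketch that walks the types with reflection and reads the api-code struct tag. This is an assumption about one possible approach, not a description of our generator: reflection cannot see comments such as "// tag:postgres", so a real generator would more likely parse the source. The names code, entry, collect, and prflEntries are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// code stands in for Code0 and VlCode in this sketch.
type code struct{ id uint16 }

type usr struct {
	NOT_FOUND        code `api-code:"USER_NOT_FOUND"`
	INVALID_ARGUMENT code `api-code:"INVALID_ARGUMENT"`
}
type repo struct {
	NOT_FOUND code // internal error code, no api-code tag
}
type prfl struct {
	USR  usr
	REPO repo
}

// entry is one discovered code with its optional external API code.
type entry struct {
	name    string
	apiCode string
}

var codeType = reflect.TypeOf(code{})

// collect walks struct fields depth-first and joins names with dots.
func collect(prefix string, t reflect.Type) []entry {
	var out []entry
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := prefix + "." + f.Name
		if f.Type == codeType {
			out = append(out, entry{name: name, apiCode: f.Tag.Get("api-code")})
			continue
		}
		if f.Type.Kind() == reflect.Struct {
			out = append(out, collect(name, f.Type)...)
		}
	}
	return out
}

// prflEntries lists every code declared under the PRFL namespace.
func prflEntries() []entry { return collect("PRFL", reflect.TypeOf(prfl{})) }

func main() {
	for i, e := range prflEntries() {
		fmt.Printf("%d %s api-code=%q\n", i+1, e.name, e.apiCode)
	}
}
```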

After declaring new codes, you need to run the generation script:

run gen-errors

The generated code will look like this:

// Code generated by error-codes. DO NOT EDIT.

func init() {
    // ...
    PRFL.AUTH.UNAUTHENTICATED = Code0{Code{code: 143}}  
    PRFL.AUTH.PERMISSION_DENIED = Code0{Code{code: 144}}
    
    // ...
    allCodes[143] = CodeDesc{
        c: 143, code: "PRFL.AUTH.UNAUTHENTICATED",
        tags: []string{"auth", "profile"},
        api: APICodeDesc{
            ErrorType:   ERROR_TYPE_UNAUTHENTICATED,
            ErrorCode:   ERROR_CODE_UNAUTHENTICATED,
            HTTPCode:    401,
            DefMessage:  "Unauthenticated error",
            UserMessage: "You are not authenticated.",
            UserTitle:   "Unauthenticated error",
        },
    }
}

Each Error type has a corresponding Code type

Ever wonder how PRFL.USR.NOT_FOUND.New() creates an *Error0 while PRFL.USR.INVALID_ARGUMENT.New() creates a *VlError? It’s because they use different code types.

Each Code type returns a different Error type, and each can have its own extra methods:

type Code0  struct { Code }
type VlCode struct { Code }

func (c Code0) New(/*...*/) Error {
    return &Error0{/*...*/}
}
func (c VlCode) New(/*...*/) Error {
    return &VlError{/*...*/}
}

// extra methods on VlCode to create VlBuilder
func (c VlCode) WithMsg(msg string, args ...any) *VlBuilder {/*...*/}    

type VlBuilder struct {
    code VlCode
    msg  string
    args []any
}
func (b *VlBuilder) ToError(/*...*/) Error {
    return &VlError{base: base{code: b.code.Code /*...*/}}
}

Use api-code to mark the codes available for external API

Namespace error codes are meant for internal use.
To make a code available in the external HTTP API, you need to mark it with api-code. The value is the corresponding errorpb.ErrorCode.
If an error code is not marked with api-code, it is an internal code and will be shown as a generic Internal Server Error.
Notice that PRFL.USR.NOT_FOUND is an external code, while PRFL.USR.REPO.NOT_FOUND is an internal code.

Declare mapping between ErrorCode, ErrorType, and gRPC/HTTP codes in protobuf using enum option:

// error/type.proto
ERROR_TYPE_PERMISSION_DENIED = 707 [(error_type_detail_option) = {  
    type: "PermissionDeniedError",  
    grpc_code: PERMISSION_DENIED,  
    http_code: 403,  // Forbidden  
    message: "permission denied",  
    user_title: "Permission denied",  
    user_message: "The caller does not have permission to execute the specified operation.",  
}];

// error/code.proto
ERROR_CODE_DISABLED_ACCOUNT = 70020 [(error_code_detail_option) = {
    error_type: ERROR_TYPE_DISABLED_ACCOUNT,
    grpc_code: PERMISSION_DENIED,
    http_code: 403,  // Forbidden
    message: "account is disabled",  
    user_title: "Account is disabled",  
    user_message: "Your account is disabled. Please contact support for more information.",  
}];

UNEXPECTED and UNKNOWN codes

Each layer usually has two generic codes, UNEXPECTED and UNKNOWN. They serve slightly different purposes:

The UNEXPECTED code is used for errors that should never happen.
The UNKNOWN code is used for errors that are not explicitly handled.

Mapping errors to new code

When receiving an error returned from a function, you need to handle it: convert third-party errors to internal namespace errors and map error codes from inner layers to outer layers.

Convert third-party errors to internal namespace errors

How you handle errors depends on what the third-party package returns and what your application needs. For example, when handling database or external API errors:

switch {
case errors.Is(err, sql.ErrNoRows):
    // map a database "no rows" error to an internal "not found" error   
    return nil, PRFL.USR.NOT_FOUND.Wrap(ctx, err, "user not found")

case errors.Is(err, context.DeadlineExceeded):
    // map a context deadline exceeded error to a timeout error
    return nil, PRFL.USR.TIMEOUT.Wrap(ctx, err, "query timeout")

default:
    // wrap any other error as unknown
    return nil, PRFL.USR.UNKNOWN.Wrap(ctx, err, "unexpected error")
}

Using helpers for internal namespace errors

IsErrorCode(err, CODES...): Returns true if the error contains any of the specified codes.
IsErrorGroup(err, GROUP): Returns true if the error belongs to the given group.

Typical usage pattern:

user, err := queryUser(ctx, userReq)
switch {
case err == nil:
    // continue

case IsErrorCode(err, PRFL.USR.REPO.NOT_FOUND):
    // check for a specific error code, convert it to an external code,
    // and return it as HTTP 404 Not Found
    return nil, PRFL.USR.NOT_FOUND.Wrap(ctx, err, "user not found")

case IsErrorGroup(err, PRFL.USR):
    // errors that belong to the PRFL.USR group are returned as is
    return nil, err

default:
    return nil, PRFL.USR.UNKNOWN.Wrap(ctx, err, "failed to query user")
}

MapError() makes mapping code easier to write:

Since mapping error codes is a common pattern, there is a MapError() helper to make writing code faster. The above code can be rewritten as:

user, err := queryUser(ctx, userReq)
if err != nil {
    return nil, MapError(ctx, err).
        Map(PRFL.USR.REPO.NOT_FOUND, PRFL.USR.NOT_FOUND, "user not found").
        KeepGroup(PRFL.USR).
        Default(PRFL.USR.UNKNOWN, "failed to query user")
}

You can format arguments and add key/value pairs as usual:

return nil, MapError(ctx, err).
    Map(PRFL.USR.REPO.NOT_FOUND, PRFL.USR.NOT_FOUND,
        "user %v not found", username,
        l.String("flag", flag)).
    KeepGroup(PRFL.USR).
    Default(PRFL.USR.UNKNOWN, "failed to query user",
        l.Any("retries", retryCount))

Testing with namespace `Error`s

Testing is critical for any serious code base. The framework provides specialized helpers like ΩxError() to make writing and asserting error conditions in tests easier and more expressive.

// 👉 return true if the error contains the message
ΩxError(err).Contains("not found")

// 👉 return true if the error does not contain the message
ΩxError(err).NOT().Contains("not found")

There are many more methods, and you can chain them too:

ΩxError(err).
    MatchCode(DEPS.PG.NOT_FOUND).  // match any code in top or wrapped errors
    TopErrorMatchCode(PRFL.TPL.NOT_FOUND). // only match code from the top error
    MatchAPICode(API_CODE.WABA_TEMPLATE_NOT_FOUND). // match errorpb.ErrorCode
    MatchExact("exact message to match")

Why use methods instead of Ω(err).To(testing.MatchCode())?

Because methods are more discoverable. When you’re faced with dozens of functions like testing.MatchValues(), it’s hard to know which ones will work with Errors and which will not. With methods, you can simply type a dot ., and your IDE will list all available methods specifically designed for asserting Errors.
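The method-based style can be sketched without any test library. In the example below, ExpectError, ErrorAssertion, and the report callback are hypothetical names; a real helper such as ΩxError() would hook into the assertion library's failure handler instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// codedError is a hypothetical minimal namespace error.
type codedError struct {
	code, msg string
	cause     error
}

func (e *codedError) Error() string { return e.code + ": " + e.msg }
func (e *codedError) Unwrap() error { return e.cause }

// ErrorAssertion is a chainable asserter. Failures go to report, which in a
// real test would be t.Errorf or the assertion library's fail hook.
type ErrorAssertion struct {
	err    error
	negate bool
	report func(msg string)
}

func ExpectError(err error, report func(string)) *ErrorAssertion {
	return &ErrorAssertion{err: err, report: report}
}

// NOT inverts the next assertion only.
func (a *ErrorAssertion) NOT() *ErrorAssertion { a.negate = true; return a }

// check reports a failure when the outcome does not match the expectation,
// then resets the negation so it does not leak into later assertions.
func (a *ErrorAssertion) check(ok bool, desc string) *ErrorAssertion {
	if ok == a.negate {
		a.report(fmt.Sprintf("assertion failed: %s (negated=%v, err=%v)", desc, a.negate, a.err))
	}
	a.negate = false
	return a
}

// Contains asserts that the error message contains sub.
func (a *ErrorAssertion) Contains(sub string) *ErrorAssertion {
	ok := a.err != nil && strings.Contains(a.err.Error(), sub)
	return a.check(ok, "contains "+sub)
}

// MatchCode asserts that the top error or any wrapped error carries code.
func (a *ErrorAssertion) MatchCode(code string) *ErrorAssertion {
	ok := false
	for err := a.err; err != nil; err = errors.Unwrap(err) {
		var ce *codedError
		if errors.As(err, &ce) && ce.code == code {
			ok = true
			break
		}
	}
	return a.check(ok, "match code "+code)
}

func main() {
	inner := &codedError{code: "DEPS.PG.NOT_FOUND", msg: "no rows"}
	err := &codedError{code: "PRFL.TPL.NOT_FOUND", msg: "template not found", cause: inner}

	failures := 0
	report := func(msg string) { failures++; fmt.Println(msg) }

	ExpectError(err, report).
		MatchCode("DEPS.PG.NOT_FOUND").
		Contains("not found").
		NOT().Contains("timeout")

	fmt.Println("failures:", failures)
}
```

Every assertion returns the same *ErrorAssertion, so typing a dot after any call lists exactly the checks that apply to errors.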

Migration

The framework is just half of the story. Writing the code? That’s the easy part. The real challenge starts when you have to bring it into a massive, living codebase where dozens of engineers are pushing changes daily, customers expect everything to work perfectly, and the system just can’t stop running.

Migration comes with responsibility. It’s about carefully splitting ~~hair~~ tiny bits of code, making tiny changes at a time, breaking a ton of tests in the process. Then manually inspecting and fixing them one by one, merging into the main branch, deploying to production, watching the logs and alerts. Repeating it over and over…

Here are some tips for migration that we learned along the way:

Start with search and replace: Begin by replacing old patterns with the new framework. Fix any compilation issues that arise from this process.

For example, replace every error return type in this package with Error.

type ProfileController interface {
    LoginUser(req *LoginRequest) (*LoginResponse, error)
    QueryUser(req *QueryUserRequest) (*QueryUserResponse, error)
}

The new code will look like this:

import . "connectly.ai/go/pkgs/errors"

type ProfileController interface {
    LoginUser(req *LoginRequest) (*LoginResponse, Error)
    QueryUser(req *QueryUserRequest) (*QueryUserResponse, Error)
}

Migrate one package at a time: Start with the lowest-level packages and work your way up. This way, you can ensure that the lower-level packages are fully migrated before moving on to the higher-level ones.

Add missing unit tests: If parts of the codebase lack tests, add them. If you are not confident in your changes, add more tests. They are helpful to make sure that your changes don’t break existing functionality.

If your package depends on calling higher-level packages: Consider marking the related functions as DEPRECATED, then add new functions that return the new Error type.

Assume that you are migrating the database package, which has the Transaction() method:

package database

func (db *DB) Transaction(ctx context.Context,
    fn func(tx *gorm.DB) error) error {
    return db.gorm.Transaction(func(tx *gorm.DB) error {
        return fn(tx)
    })
}

And it is used in the user service package:

err = s.DB(ctx).Transaction(func(tx *database.DB) error {
    user, usrErr := s.repo.CreateUser(ctx, tx, user)
    if usrErr != nil {
        return usrErr
    }
    return nil
})

Since you are migrating the database package first, the user package and dozens of other packages are left as is. The s.repo.CreateUser() call still returns the old error type while the Transaction() method needs to return the new Error type. You can mark the Transaction() method as DEPRECATED and add a new TransactionV2() method:

package database

// DEPRECATED: use TransactionV2 instead
func (db *DB) Transaction_DEPRECATED(ctx context.Context,
    fn func(tx *gorm.DB) error) error {
    return db.gorm.Transaction(func(tx *gorm.DB) error {
        return fn(tx)
    })
}

func (db *DB) TransactionV2(ctx context.Context,
    fn func(tx *gorm.DB) error) Error {
    err := db.gorm.Transaction(func(tx *gorm.DB) error {
        return fn(tx)
    })
    return adaptToErrorV2(err)
}

Add new error codes as you go: When you encounter an error that doesn’t fit into the existing ones, add a new code. This will help you build a comprehensive set of error codes over time. Codes from other packages are always available as references.

Conclusion

Error handling in Go can feel simple at first—just return an error and move on. But as our codebase grew, that simplicity turned into a tangled mess of vague logs, inconsistent handling, and endless debugging sessions.

By stepping back and rethinking how we handle errors, we’ve built a system that works for us, not against us. Centralized and structured namespace codes give us clarity, while tools for mapping, wrapping, and testing errors make our lives easier. Instead of swimming through a sea of logs, we now have meaningful, traceable errors that tell us what’s wrong and where to look.

This framework isn’t just about making our code cleaner; it’s about saving time, reducing frustration, and helping us prepare for the unknown. It’s just the beginning of a journey — we are still discovering more patterns — but the result is a system that can somehow bring peace of mind to error handling. Hopefully, it can spark some ideas for your projects too! 😊


Author

I'm Oliver Nguyen. A software maker working mostly in Go and JavaScript. I enjoy learning and seeing a better version of myself each day. Occasionally spin off new open source projects. Share knowledge and thoughts during my journey.