ERBSLAND REGULAR EXPRESSION

A secure, dependency-free regular expression library for modern C++

Language C++
License Apache 2.0

Erbsland Regular Expression is a modern C++ regular expression library built for secure and predictable behavior.

It emphasizes strict input validation, robust Unicode handling, and straightforward integration into real-world projects, without introducing external runtime dependencies.

Quick Example

Compile a pattern, scan text, and process the first match:

#include <erbsland/all_re.hpp>
#include <iostream>
#include <string>

using namespace erbsland::re;

int main() {
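    try {
        // The raw string literal keeps the backslash in \d+ unescaped.
        const auto re = RegEx::compile(R"(\d+)");
        const auto text = std::string{"abc 12345 xyz"};

        // The null check covers the case where the pattern does not occur.
        if (const auto match = re->findFirst(text); match != nullptr) {
            std::cout << "Found a number: " << match->content(0) << "\n";
        }
    } catch (const Error &error) {
        // Report the library error and exit with a failure code.
        std::cerr << error.what() << "\n";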
        return 1;
    }
    return 0;
}

Key Features

Erbsland Regular Expression is designed to provide secure and predictable matching for modern C++ projects:

  • No external runtime dependencies.
  • Strict UTF-8 validation for both patterns and input text.
  • Unicode-aware matching with character classes and simple case folding.
  • Support for greedy, lazy, and possessive quantifiers, as well as atomic groups.
  • Configurable limits and timeouts to guard against problematic patterns or input.
  • Flexible APIs for first-match, all-match, and replacement workflows.
  • View-based matching APIs to avoid unnecessary allocations.
  • Diagnostic tools, including disassembler and assembler support.
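
To illustrate what "strict UTF-8 validation" in the list above typically involves, here is a minimal standalone sketch. It is not the library's implementation; the function name isValidUtf8 is hypothetical and only the C++ standard library is used. It rejects overlong encodings, surrogate code points, values above U+10FFFF, stray continuation bytes, and truncated sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Illustrative sketch only (hypothetical helper, not part of erbsland-re).
// Returns true if `text` is well-formed UTF-8 under strict rules.
bool isValidUtf8(std::string_view text) {
    std::size_t i = 0;
    while (i < text.size()) {
        const auto lead = static_cast<std::uint8_t>(text[i]);
        if (lead < 0x80) { // single-byte ASCII
            ++i;
            continue;
        }
        std::size_t length = 0;      // total bytes in this sequence
        std::uint32_t codePoint = 0; // decoded value
        std::uint32_t minimum = 0;   // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0) {
            length = 2; codePoint = lead & 0x1F; minimum = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3; codePoint = lead & 0x0F; minimum = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4; codePoint = lead & 0x07; minimum = 0x10000;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (text.size() - i < length) {
            return false; // sequence is truncated
        }
        for (std::size_t k = 1; k < length; ++k) {
            const auto next = static_cast<std::uint8_t>(text[i + k]);
            if ((next & 0xC0) != 0x80) {
                return false; // expected a continuation byte
            }
            codePoint = (codePoint << 6) | (next & 0x3F);
        }
        if (codePoint < minimum) {
            return false; // overlong encoding
        }
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
            return false; // UTF-16 surrogate range is not valid in UTF-8
        }
        if (codePoint > 0x10FFFF) {
            return false; // beyond the Unicode range
        }
        i += length;
    }
    return true;
}
```

A caller would run such a check on both the pattern and the input before matching, which is the kind of guarantee the feature list describes.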

Ready to Get Started?

Follow the step-by-step guide to set up a small C++ project, integrate the library, and run your first regular expression workflow.

CMake Integration

You can integrate the library as a Git submodule and link it as a static library.

git submodule add https://github.com/erbsland-dev/erbsland-cpp-re.git erbsland-re

Minimal CMake setup:

cmake_minimum_required(VERSION 3.25)
project(ExampleProject)

add_subdirectory(erbsland-re)
add_executable(example src/main.cpp)

target_compile_features(example PRIVATE cxx_std_20)
target_link_libraries(example PRIVATE erbsland-re)

If you prefer a standalone installation, the project also supports building and installing liberbsland-re.a directly with CMake.

Requirements

The code and build requirements are intentionally minimal:

  • A C++20-compatible compiler and standard library
  • CMake 3.23 or newer
  • Python 3.12 or newer (only for development and testing)

No third-party runtime libraries are required.

Design Goals

The engine is built around a clear set of priorities:

  • Security and robustness before raw throughput.
  • Explicit and readable modern C++ implementation.
  • Predictable behavior when working with untrusted patterns and input.
  • Dependency-free integration into existing C++ projects.

Internally, the matching engine is based on the ideas of the Thompson NFA algorithm.
This approach avoids catastrophic backtracking and helps ensure consistent and predictable runtime behavior, even for complex patterns.

We aim for deterministic behavior and controlled memory usage, so the library can be used with confidence in security-sensitive environments.
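
The idea behind a Thompson-style simulation can be sketched in a few lines. The following is a simplified illustration, not the library's engine: the types Kind and State and the functions buildExampleNfa, addState, and matchesWholeInput are hypothetical names. The NFA is built by hand for the pattern (a*)*b, a classic case that can trigger catastrophic backtracking in backtracking engines. The simulation tracks a set of active states per input character, so the work is bounded by the number of states times the input length.

```cpp
#include <algorithm>
#include <string>
#include <string_view>
#include <vector>

// Illustrative sketch only; all names here are hypothetical.
enum class Kind { Char, Split, Match };

struct State {
    Kind kind;
    char symbol; // used by Kind::Char
    int next;    // successor (Char) or first branch (Split)
    int alt;     // second branch (Split)
};

// Hand-built NFA for the pattern (a*)*b; state 0 is the start state.
std::vector<State> buildExampleNfa() {
    return {
        {Kind::Split, '\0', 1, 3},  // 0: enter the inner loop or continue to 'b'
        {Kind::Split, '\0', 2, 0},  // 1: consume an 'a' or return to the outer loop
        {Kind::Char, 'a', 1, -1},   // 2: 'a', then back to the inner loop
        {Kind::Char, 'b', 4, -1},   // 3: 'b', then accept
        {Kind::Match, '\0', -1, -1} // 4: accepting state
    };
}

// Adds a state to the active list and follows Split states eagerly.
// The onList marks stop the empty-transition cycle 0 -> 1 -> 0.
void addState(const std::vector<State>& nfa, int index,
              std::vector<int>& list, std::vector<bool>& onList) {
    if (onList[index]) {
        return;
    }
    onList[index] = true;
    const State& state = nfa[index];
    if (state.kind == Kind::Split) {
        addState(nfa, state.next, list, onList);
        addState(nfa, state.alt, list, onList);
        return;
    }
    list.push_back(index);
}

// Returns true if the NFA accepts the complete input.
bool matchesWholeInput(const std::vector<State>& nfa, int start, std::string_view input) {
    std::vector<int> current;
    std::vector<int> upcoming;
    std::vector<bool> onList(nfa.size(), false);
    addState(nfa, start, current, onList);
    for (const char c : input) {
        upcoming.clear();
        std::fill(onList.begin(), onList.end(), false);
        for (const int index : current) {
            const State& state = nfa[index];
            if (state.kind == Kind::Char && state.symbol == c) {
                addState(nfa, state.next, upcoming, onList);
            }
        }
        current.swap(upcoming);
    }
    return std::any_of(current.begin(), current.end(),
                       [&](int index) { return nfa[index].kind == Kind::Match; });
}
```

With this sketch, matchesWholeInput(buildExampleNfa(), 0, std::string(30, 'a')) returns false after 30 steps over at most two active states, instead of exploring an exponential number of paths.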

Explore API and Pattern Syntax

Explore the full API reference and pattern syntax documentation for detailed behavior, supported constructs, and diagnostic tooling.

Performance Comparison

Direct comparisons between regular expression engines can be misleading. The benchmarks below are provided only as a rough indication of performance characteristics.

All results were measured on a 2021 MacBook Pro (Apple M1 Max) with ample memory.

Important Notes

  • Erbsland Regular Expression:
    • Always reads and validates UTF-8 input
    • Performs Unicode-aware comparisons in all modes
    • Does not heavily optimize the compiled program representation, and includes only limited speed optimizations
  • For this benchmark, neither PCRE2 nor std::regex was configured to enforce strict UTF-8 validation.
  • “ASCII mode” in Erbsland RE only affects character class handling; input is still processed as Unicode.

Benchmark Results

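These are the results for version 1.0.0 of the library. The % column is normalized to erbsland-re in Unicode mode (100); lower values are better, and ► marks a bar that exceeds the column width.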

Benchmarking file: shakespeare.html (6.98 MB)
┌───────────────────┬─────────────┬─────────┬─────┬───────────────────────┐
│ Pattern           │ Library     │ Mode    │ %   │ Bar       <= better   │
├───────────────────┼─────────────┼─────────┼─────┼───────────────────────┤
│ Words             │ erbsland-re │ Unicode │ 100 │ ██████████            │
│                   │ pcre2       │ Unicode │  54 │ █████▌                │
│                   │ erbsland-re │ Ascii   │ 100 │ ██████████            │
│                   │ std::regex  │ Ascii   │ 168 │ █████████████████     │
│                   │ pcre2       │ Ascii   │  33 │ ███▌                  │
├───────────────────┼─────────────┼─────────┼─────┼───────────────────────┤
│ Capitalized       │ erbsland-re │ Unicode │ 100 │ ██████████            │
│                   │ pcre2       │ Unicode │  14 │ █▌                    │
│                   │ erbsland-re │ Ascii   │ 105 │ ██████████▌           │
│                   │ std::regex  │ Ascii   │ 239 │ ████████████████████► │
│                   │ pcre2       │ Ascii   │  13 │ █▌                    │
├───────────────────┼─────────────┼─────────┼─────┼───────────────────────┤
│ URI               │ erbsland-re │ Unicode │ 100 │ ██████████            │
│                   │ pcre2       │ Unicode │   6 │ ▌                     │
│                   │ erbsland-re │ Ascii   │ 100 │ ██████████            │
│                   │ std::regex  │ Ascii   │ 280 │ ████████████████████► │
│                   │ pcre2       │ Ascii   │   6 │ ▌                     │
├───────────────────┼─────────────┼─────────┼─────┼───────────────────────┤
│ HTML Tags         │ erbsland-re │ Unicode │ 100 │ ██████████            │
│                   │ pcre2       │ Unicode │   9 │ █                     │
│                   │ erbsland-re │ Ascii   │  97 │ █████████▌            │
│                   │ std::regex  │ Ascii   │ 213 │ ████████████████████► │
│                   │ pcre2       │ Ascii   │   9 │ █                     │
└───────────────────┴─────────────┴─────────┴─────┴───────────────────────┘

Pattern Legend

  • Words: \w+
  • Capitalized: \b[A-Z][a-z]*\b
  • URI: https?://[a-zA-Z0-9\.]+
  • HTML Tags: <[a-z1-6]+[^>]*>
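
To get a feel for what each benchmark pattern matches, the patterns can be tried on a short sample. The sketch below uses std::regex from the standard library (one of the compared engines), not the Erbsland API; the helper name countMatches and the sample text are assumptions made for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Illustrative sketch only (hypothetical helper).
// Counts non-overlapping matches of `pattern` in `text` using std::regex
// with its default ECMAScript grammar.
std::size_t countMatches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    const auto begin = std::sregex_iterator(text.begin(), text.end(), re);
    const auto end = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(begin, end));
}
```

For the sample text "<p>Visit https://example.com now</p>", the Words pattern finds seven matches (p, Visit, https, example, com, now, p), while Capitalized, URI, and HTML Tags each find one. The closing tag </p> is not counted because the HTML Tags pattern requires a letter or digit directly after the opening bracket.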

Sources and License

The complete source code for Erbsland Regular Expression is available on GitHub at https://github.com/erbsland-dev/erbsland-cpp-re.

The documentation is published at re.erbsland.dev.
The project is released under the Apache License 2.0.