
The Regex Engine

rg supports two regex engines:

| Engine | Flag | Supports | Limitation |
|---|---|---|---|
| Rust regex | (default) | Most ERE + Unicode | No lookarounds or backreferences |
| PCRE2 | -P / --pcre2 | Full PCRE2 + lookarounds | Requires rg built with the pcre2 feature |
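Because PCRE2 support depends on how rg was built, it can be worth checking before relying on -P. A minimal sketch using rg's --pcre2-version flag, which prints PCRE2 details when the support is compiled in and exits with an error otherwise:

```shell
# --pcre2-version prints PCRE2 version info and exits successfully when PCRE2
# support is compiled in; otherwise rg reports an error and exits non-zero.
if rg --pcre2-version > /dev/null 2>&1; then
  echo "PCRE2 available"
else
  echo "PCRE2 not available"
fi
```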

The Default Rust regex Engine

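The default engine is fast and predictable thanks to two properties:

  1. Linear-time matching: it never backtracks exponentially. For a given pattern, search time grows linearly with the size of the input, so a pathological pattern or input cannot trigger catastrophic backtracking, the cause of "ReDoS" (Regular Expression Denial of Service).
  2. Literal extraction: it pulls literal substrings out of your pattern (for example, ERROR in ^ERROR) and searches for them with SIMD-accelerated routines, so it can skip non-matching regions of a file without running the full regex over them.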
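To see the linear-time guarantee in action, the sketch below feeds the classic catastrophic-backtracking pattern ^(a+)+$ a single line that almost matches. The 30,000-character input is invented for illustration; with the default engine rg should return promptly and report no match, whereas a backtracking engine may take far longer or hit its match limit on input like this.

```shell
# Hypothetical input: 30,000 "a"s followed by a "b", all on one line.
line="$(head -c 30000 < /dev/zero | tr '\0' 'a')b"

# ^(a+)+$ nests quantifiers, so a naive backtracking engine would try
# exponentially many ways to split the "a"s. The default engine does not,
# and finishes in linear time. The trailing "b" means nothing matches,
# so rg exits with status 1 and the echo runs.
printf '%s\n' "$line" | rg '^(a+)+$' || echo "no match"
```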

Supported Features (Default Engine)

```shell
# Character classes
rg '[0-9]+'          # digits
rg '[[:alpha:]]+'    # POSIX class (ASCII letters only)
rg '\w+'             # Unicode word chars (letters, marks, digits, _ and other connectors)

# Anchors
rg '^ERROR'          # line starts with ERROR
rg '\.log$'          # line ends with .log

# Quantifiers
rg 'fo{2,4}bar'      # "f", then 2–4 "o"s, then "bar" (foobar, fooobar, foooobar)

# Unicode
rg '\p{L}+'          # any Unicode letter
rg '\p{Cyrillic}'    # Cyrillic characters
```
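Since the commands above search files, a handy way to check what a pattern actually matches is to pipe sample text into rg on stdin. A small sketch with invented sample strings; -o prints each match on its own line:

```shell
# Quantifier: "f", then 2-4 "o"s, then "bar". Only foobar and fooobar match.
printf 'fobar\nfoobar\nfooobar\nfooooobar\n' | rg 'fo{2,4}bar'

# POSIX classes are ASCII-only, so [[:alpha:]]+ splits the word at "ü".
printf 'Zürich\n' | rg -o '[[:alpha:]]+'   # matches "Z" and "rich"

# The Unicode-aware \w keeps the whole word together.
printf 'Zürich\n' | rg -o '\w+'            # matches "Zürich"

# Script property: only the Cyrillic line matches.
printf 'hello\nпривет\n' | rg '\p{Cyrillic}'
```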

PCRE2 Mode (-P)

Use -P (or --pcre2) when you need lookaheads, lookbehinds, backreferences, or atomic groups:

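```shell
# Find lines where "error" is NOT preceded by "no " (negative lookbehind).
# Single quotes stop interactive bash from treating "!" as history expansion.
rg -P '(?<!no )error' app.log

# Find IPv4-style addresses (1-3 digits per octet; values over 255 are not rejected).
# \b and (?:...) also work in the default engine, so -P is optional here.
rg -P '\b(?:\d{1,3}\.){3}\d{1,3}\b' access.log

# Extract only the value after "user_id=", using lookbehind
rg -P -o '(?<=user_id=)\d+' events.log
```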
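When PCRE2 is not available, the lookbehind extraction can often be rewritten for the default engine: match the prefix, capture the number, and use rg's -r/--replace flag so that -o prints only the captured group. The log line below is invented for illustration.

```shell
# Hypothetical log line. The regex matches "user_id=" plus digits, and
# -r '$1' replaces each match with capture group 1, so -o outputs just the ID.
# Single quotes keep the shell from expanding $1 itself.
printf 'ts=1700000000 user_id=42 action=login\n' | rg -o -r '$1' 'user_id=(\d+)'
```

For the sample line, the match is replaced by 42, the same value the -P lookbehind version extracts.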

Unicode Awareness

By default rg is fully Unicode-aware: \w matches Unicode word characters, and the dot (.) matches any single Unicode codepoint, including multi-byte UTF-8 sequences, except a newline.
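A quick sketch of both behaviours, using invented one-word inputs: \w+ keeps an accented word intact, and ^.$ matches a line holding a single character that takes two bytes in UTF-8.

```shell
# \w is Unicode-aware, so the "ï" stays inside the word.
printf 'naïve\n' | rg -o '\w+'    # matches "naïve"

# "é" is one codepoint but two UTF-8 bytes; "." matches it as one character,
# so -c reports 1 matching line.
printf 'é\n' | rg -c '^.$'
```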

```shell
# Disable Unicode mode: \w, \d, \s and \b become ASCII-only, which can be
# faster when such classes are used heavily (e.g. on ASCII-only logs)
rg --no-unicode 'pattern' large_ascii.log
```
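If only part of a pattern needs ASCII semantics, the default engine also accepts an inline flag group, (?-u:...), that turns Unicode mode off just inside that group, whereas --no-unicode applies to the whole pattern. A small comparison on an invented input; in the default Unicode mode, \w+ would match all of "café".

```shell
# Whole pattern in ASCII mode: \w no longer matches "é".
printf 'café\n' | rg --no-unicode -o '\w+'    # matches "caf"

# Same effect, scoped to one group with the inline flag (?-u:...).
printf 'café\n' | rg -o '(?-u:\w)+'           # matches "caf"
```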