Complete Practical Reference for Regular Expressions (Multilingual & Multiline Compatible)

Overview (What)

This is a comprehensive technical reference for practical and portable usage of regular expressions. It includes syntax fundamentals, multiline pattern handling, replacement techniques, tool compatibility, and maintainable pattern design.

Purpose (Why)

Regular expressions are the foundation of text processing tasks such as validation, parsing, transformation, and extraction. Used in virtually every programming language and platform, mastering them enables efficient automation, data refinement, and analysis.


Part 1: Basic Syntax

Pattern Meaning Example Matches
. Any character except newline a.c abc, a-c
* Zero or more repetitions go*gle ggle, google
+ One or more repetitions go+gle gogle, google
? Zero or one occurrence colou?r color, colour
{n} Exactly n times a{3} aaa
{n,m} Between n and m times a{2,4} aa, aaa, aaaa
^ Start of line ^abc Line starts with abc
$ End of line abc$ Line ends with abc
[] Any one character in brackets [abc] a, b, c
[^] Any character not listed [^0-9] Non-digit characters
\d Digit \d+ 123, 007
\D Non-digit \D+ abc, XYZ
\w Word character (alphanum + _) \w+ abc_123
\W Non-word character \W+ #@!, whitespace
\s Whitespace character \s+ space, tab, newline
\S Non-whitespace character \S+ abc, 123
` ` OR condition `foo
() Grouping (abc)+ abcabc
\b Word boundary \bregex\b Exact word match

Part 2: Practical Patterns (Multilingual Compatible)

Use Case Pattern Notes
Digits only ^\d+$ Entire input is numeric
Decimal number ^\d+\.\d+$ Allows one decimal point
Alphanumeric ^[a-zA-Z0-9]+$ Useful for user IDs
Email (simple) ^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$ General format
URL (http/https) https?://[\w.-]+\.[a-zA-Z]{2,}(/[\w./?=&%-]*)? Handles path/query
ISO Date Format \d{4}-\d{2}-\d{2} Matches YYYY-MM-DD
International Phone \+\d{1,3}[\s-]?\d{1,14} e.g., +81 90...
Username ^[a-zA-Z0-9._-]{3,20}$ For SNS or login handles
Password Strength ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$ Must include upper, lower, digit

Part 3: Multiline Matching

🔹 Why use [\s\S]*?

In many tools and languages, . does not match newline characters.
Recommended workaround:

[\s\S]*?

Matches all characters including newlines, non-greedily.

💡 Note: Some environments (e.g., VSCode) accept [\s\S\n]*? — verbose but sometimes safer.

🔹 Common Patterns

Use Case Pattern Description
Arbitrary text block [\s\S]*? Safe way to match multiline text
HTML <div> block <div[\s\S]*?</div> Captures entire div block
HTML comments <!--[\s\S]*?--> Matches multi-line comments
Markdown code block "[\s\S]*?" Matches fenced code sections

Part 4: Replacement Techniques

🔹 Group Substitution Syntax

Popular reference syntaxes:

  • ${1}, ${2} (VSCode, .NET, Python)
  • $1, $2 (JavaScript, sed)

🔹 Practical Examples

Task Pattern Replacement Result
Reverse word order (\w+)\s+(\w+) ${2} ${1}
Wrap text in span >([^<]+)< ><span>${1}</span><
CSV to JSON ^([^,]+),([^,]+),([^,]+)$ { "a": "${1}", "b": "${2}", "c": "${3}" }
Date format change (\d{4})/(\d{2})/(\d{2}) ${1}-${2}-${3}
CamelCase to snake_case ([a-z])([A-Z]) ${1}_${2}

Part 5: Tool Compatibility & Flags

Tool Multiline Support Non-Greedy Replace Syntax Notes
VSCode ✅ (Search Panel) ${1} Use [\s\S]*? or [\s\S\n]*?
JavaScript ✅ (s flag) $1, $<name> Named groups supported
Python re ✅ (DOTALL) \g<1>, \1 Raw strings (r"") recommended
sed ❌ (line-based) \1, \2 Limited to basic patterns
grep/egrep ❌ (no replacement) Search only, no substitution capability

Part 6: Best Practices & Anti-Patterns

✅ Recommended Guidelines

  • Always use [\s\S]*? for multiline (or [\s\S\n]*? if needed)
  • Avoid greedy patterns like .* when not required
  • Use non-capturing groups (?:...) when capturing is unnecessary
  • Improve readability with named groups ((?<year>\d{4}))
  • For complex regex, use inline comments or re.VERBOSE

❌ Anti-Patterns

Pattern Problem
.* Greedy match may consume unintended text
[^a-z] Fails for accented/multilingual characters
Nest parsing Not suitable for HTML/XML tree structures
.+ Can match empty whitespace-only lines

Conclusion

This guide is a comprehensive reference for writing safe, efficient, and portable regular expressions across environments.
Adopt [\s\S]*? as the standard for multiline handling, respect tool-specific syntaxes, and strive for readable, maintainable pattern design.

Well-crafted regex is a powerful and reusable asset in any developer’s toolkit.

Copied title and URL