Overview (What)
This is a comprehensive technical reference for practical and portable usage of regular expressions. It includes syntax fundamentals, multiline pattern handling, replacement techniques, tool compatibility, and maintainable pattern design.
Purpose (Why)
Regular expressions are the foundation of text processing tasks such as validation, parsing, transformation, and extraction. Used in virtually every programming language and platform, mastering them enables efficient automation, data refinement, and analysis.
Part 1: Basic Syntax
Pattern | Meaning | Example | Matches |
---|---|---|---|
. |
Any character except newline | a.c |
abc , a-c |
* |
Zero or more repetitions | go*gle |
ggle , google |
+ |
One or more repetitions | go+gle |
gogle , google |
? |
Zero or one occurrence | colou?r |
color , colour |
{n} |
Exactly n times | a{3} |
aaa |
{n,m} |
Between n and m times | a{2,4} |
aa , aaa , aaaa |
^ |
Start of line | ^abc |
Line starts with abc |
$ |
End of line | abc$ |
Line ends with abc |
[] |
Any one character in brackets | [abc] |
a , b , c |
[^] |
Any character not listed | [^0-9] |
Non-digit characters |
\d |
Digit | \d+ |
123 , 007 |
\D |
Non-digit | \D+ |
abc , XYZ |
\w |
Word character (alphanum + _ ) |
\w+ |
abc_123 |
\W |
Non-word character | \W+ |
#@! , whitespace |
\s |
Whitespace character | \s+ |
space, tab, newline |
\S |
Non-whitespace character | \S+ |
abc , 123 |
` | ` | OR condition | `foo |
() |
Grouping | (abc)+ |
abcabc |
\b |
Word boundary | \bregex\b |
Exact word match |
Part 2: Practical Patterns (Multilingual Compatible)
Use Case | Pattern | Notes |
---|---|---|
Digits only | ^\d+$ |
Entire input is numeric |
Decimal number | ^\d+\.\d+$ |
Allows one decimal point |
Alphanumeric | ^[a-zA-Z0-9]+$ |
Useful for user IDs |
Email (simple) | ^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$ |
General format |
URL (http/https) | https?://[\w.-]+\.[a-zA-Z]{2,}(/[\w./?=&%-]*)? |
Handles path/query |
ISO Date Format | \d{4}-\d{2}-\d{2} |
Matches YYYY-MM-DD |
International Phone | \+\d{1,3}[\s-]?\d{1,14} |
e.g., +81 90... |
Username | ^[a-zA-Z0-9._-]{3,20}$ |
For SNS or login handles |
Password Strength | ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$ |
Must include upper, lower, digit |
Part 3: Multiline Matching
🔹 Why use [\s\S]*?
In many tools and languages, .
does not match newline characters.
Recommended workaround:
[\s\S]*?
Matches all characters including newlines, non-greedily.
💡 Note: Some environments (e.g., VSCode) accept
[\s\S\n]*?
— verbose but sometimes safer.
🔹 Common Patterns
Use Case | Pattern | Description |
---|---|---|
Arbitrary text block | [\s\S]*? |
Safe way to match multiline text |
HTML <div> block |
<div[\s\S]*?</div> |
Captures entire div block |
HTML comments | <!--[\s\S]*?--> |
Matches multi-line comments |
Markdown code block | "[\s\S]*? " |
Matches fenced code sections |
Part 4: Replacement Techniques
🔹 Group Substitution Syntax
Popular reference syntaxes:
${1}
,${2}
(VSCode, .NET, Python)$1
,$2
(JavaScript, sed)
🔹 Practical Examples
Task | Pattern | Replacement Result |
---|---|---|
Reverse word order | (\w+)\s+(\w+) |
${2} ${1} |
Wrap text in span | >([^<]+)< |
><span>${1}</span>< |
CSV to JSON | ^([^,]+),([^,]+),([^,]+)$ |
{ "a": "${1}", "b": "${2}", "c": "${3}" } |
Date format change | (\d{4})/(\d{2})/(\d{2}) |
${1}-${2}-${3} |
CamelCase to snake_case | ([a-z])([A-Z]) |
${1}_${2} |
Part 5: Tool Compatibility & Flags
Tool | Multiline Support | Non-Greedy | Replace Syntax | Notes |
---|---|---|---|---|
VSCode | ✅ (Search Panel) | ✅ | ${1} |
Use [\s\S]*? or [\s\S\n]*? |
JavaScript | ✅ (s flag) |
✅ | $1 , $<name> |
Named groups supported |
Python re |
✅ (DOTALL ) |
✅ | \g<1> , \1 |
Raw strings (r"" ) recommended |
sed | ❌ (line-based) | ❌ | \1 , \2 |
Limited to basic patterns |
grep/egrep | ❌ | ❌ | ❌ (no replacement) | Search only, no substitution capability |
Part 6: Best Practices & Anti-Patterns
✅ Recommended Guidelines
- Always use
[\s\S]*?
for multiline (or[\s\S\n]*?
if needed) - Avoid greedy patterns like
.*
when not required - Use non-capturing groups
(?:...)
when capturing is unnecessary - Improve readability with named groups (
(?<year>\d{4})
) - For complex regex, use inline comments or
re.VERBOSE
❌ Anti-Patterns
Pattern | Problem |
---|---|
.* |
Greedy match may consume unintended text |
[^a-z] |
Fails for accented/multilingual characters |
Nest parsing | Not suitable for HTML/XML tree structures |
.+ |
Can match empty whitespace-only lines |
Conclusion
This guide is a comprehensive reference for writing safe, efficient, and portable regular expressions across environments.
Adopt [\s\S]*?
as the standard for multiline handling, respect tool-specific syntaxes, and strive for readable, maintainable pattern design.
Well-crafted regex is a powerful and reusable asset in any developer’s toolkit.