Complete Practical Reference for Regular Expressions (Multilingual & Multiline Compatible)

Overview (What)
Purpose (Why)
Part 1: Basic Syntax
Part 2: Practical Patterns (Multilingual Compatible)
Part 3: Multiline Matching
1. 🔹 Why use [\s\S]*?
2. 🔹 Common Patterns
Part 4: Replacement Techniques
1. 🔹 Group Substitution Syntax
2. 🔹 Practical Examples
Part 5: Tool Compatibility & Flags
Part 6: Best Practices & Anti-Patterns
1. ✅ Recommended Guidelines
2. ❌ Anti-Patterns
Conclusion

Overview (What)

This is a comprehensive technical reference for practical and portable usage of regular expressions. It includes syntax fundamentals, multiline pattern handling, replacement techniques, tool compatibility, and maintainable pattern design.

Purpose (Why)

Regular expressions are the foundation of text processing tasks such as validation, parsing, transformation, and extraction. Used in virtually every programming language and platform, mastering them enables efficient automation, data refinement, and analysis.

Part 1: Basic Syntax

Pattern	Meaning	Example	Matches
`.`	Any character except newline	`a.c`	`abc`, `a-c`
`*`	Zero or more repetitions	`go*gle`	`ggle`, `google`
`+`	One or more repetitions	`go+gle`	`gogle`, `google`
`?`	Zero or one occurrence	`colou?r`	`color`, `colour`
`{n}`	Exactly n times	`a{3}`	`aaa`
`{n,m}`	Between n and m times	`a{2,4}`	`aa`, `aaa`, `aaaa`
`^`	Start of line	`^abc`	Line starts with `abc`
`$`	End of line	`abc$`	Line ends with `abc`
`[]`	Any one character in brackets	`[abc]`	`a`, `b`, `c`
`[^]`	Any character not listed	`[^0-9]`	Non-digit characters
`\d`	Digit	`\d+`	`123`, `007`
`\D`	Non-digit	`\D+`	`abc`, `XYZ`
`\w`	Word character (alphanum + `_`)	`\w+`	`abc_123`
`\W`	Non-word character	`\W+`	`#@!`, whitespace
`\s`	Whitespace character	`\s+`	space, tab, newline
`\S`	Non-whitespace character	`\S+`	`abc`, `123`
`	`	OR condition	`foo
`()`	Grouping	`(abc)+`	`abcabc`
`\b`	Word boundary	`\bregex\b`	Exact word match

Part 2: Practical Patterns (Multilingual Compatible)

Use Case	Pattern	Notes
Digits only	`^\d+$`	Entire input is numeric
Decimal number	`^\d+\.\d+$`	Allows one decimal point
Alphanumeric	`^[a-zA-Z0-9]+$`	Useful for user IDs
Email (simple)	`^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$`	General format
URL (http/https)	`https?://[\w.-]+\.[a-zA-Z]{2,}(/[\w./?=&%-]*)?`	Handles path/query
ISO Date Format	`\d{4}-\d{2}-\d{2}`	Matches `YYYY-MM-DD`
International Phone	`\+\d{1,3}[\s-]?\d{1,14}`	e.g., `+81 90...`
Username	`^[a-zA-Z0-9._-]{3,20}$`	For SNS or login handles
Password Strength	`^(?=.[a-z])(?=.[A-Z])(?=.*\d).{8,}$`	Must include upper, lower, digit

Part 3: Multiline Matching

🔹 Why use [\s\S]*?

In many tools and languages, . does not match newline characters.
Recommended workaround:

[\s\S]*?

Matches all characters including newlines, non-greedily.

💡 Note: Some environments (e.g., VSCode) accept [\s\S\n]*? — verbose but sometimes safer.

🔹 Common Patterns

Use Case	Pattern	Description
Arbitrary text block	`[\s\S]*?`	Safe way to match multiline text
HTML `<div>` block	`<div[\s\S]*?</div>`	Captures entire div block
HTML comments	`<!--[\s\S]*?-->`	Matches multi-line comments
Markdown code block	"`[\s\S]*?`"	Matches fenced code sections

Part 4: Replacement Techniques

🔹 Group Substitution Syntax

Popular reference syntaxes:

${1}, ${2} (VSCode, .NET, Python)
$1, $2 (JavaScript, sed)

🔹 Practical Examples

Task	Pattern	Replacement Result
Reverse word order	`(\w+)\s+(\w+)`	`${2} ${1}`
Wrap text in span	`>([^<]+)<`	`><span>${1}</span><`
CSV to JSON	`^([^,]+),([^,]+),([^,]+)$`	`{ "a": "${1}", "b": "${2}", "c": "${3}" }`
Date format change	`(\d{4})/(\d{2})/(\d{2})`	`${1}-${2}-${3}`
CamelCase to snake_case	`([a-z])([A-Z])`	`${1}_${2}`

Part 5: Tool Compatibility & Flags

Tool	Multiline Support	Non-Greedy	Replace Syntax	Notes
VSCode	✅ (Search Panel)	✅	`${1}`	Use `[\s\S]?` or `[\s\S\n]?`
JavaScript	✅ (`s` flag)	✅	`$1`, `$<name>`	Named groups supported
Python `re`	✅ (`DOTALL`)	✅	`\g<1>`, `\1`	Raw strings (`r""`) recommended
sed	❌ (line-based)	❌	`\1`, `\2`	Limited to basic patterns
grep/egrep	❌	❌	❌ (no replacement)	Search only, no substitution capability

Part 6: Best Practices & Anti-Patterns

✅ Recommended Guidelines

Always use [\s\S]*? for multiline (or [\s\S\n]*? if needed)
Avoid greedy patterns like .* when not required
Use non-capturing groups (?:...) when capturing is unnecessary
Improve readability with named groups ((?<year>\d{4}))
For complex regex, use inline comments or re.VERBOSE

❌ Anti-Patterns

Pattern	Problem
`.*`	Greedy match may consume unintended text
`[^a-z]`	Fails for accented/multilingual characters
Nest parsing	Not suitable for HTML/XML tree structures
`.+`	Can match empty whitespace-only lines

Conclusion

This guide is a comprehensive reference for writing safe, efficient, and portable regular expressions across environments.
Adopt [\s\S]*? as the standard for multiline handling, respect tool-specific syntaxes, and strive for readable, maintainable pattern design.

Well-crafted regex is a powerful and reusable asset in any developer’s toolkit.