Unicode Category Classes

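Unicode assigns every character to a "category". A regex can match any character in a category with "\p{Code}", or any character outside it with "\P{Code}", where Code is one of the one- or two-letter codes listed below. One-letter codes select a whole group of categories (for example, "L" for all letters); two-letter codes select a single category (for example, "Lu" for uppercase letters).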
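
As a minimal sketch of the two forms, assuming the engine is Java's java.util.regex (the class and method names below are illustrative only, not part of any library). Note that inside a Java string literal the backslash must be doubled:

```java
import java.util.regex.Pattern;

public class CategoryDemo {
    // \p{Lu} matches one character in the uppercase-letter category.
    static final Pattern UPPER = Pattern.compile("\\p{Lu}");
    // \P{Lu} (capital P) matches one character NOT in that category.
    static final Pattern NOT_UPPER = Pattern.compile("\\P{Lu}");

    // Removes every uppercase letter from the input.
    static String stripUpper(String s) {
        return UPPER.matcher(s).replaceAll("");
    }

    // Removes everything except uppercase letters.
    static String keepUpper(String s) {
        return NOT_UPPER.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripUpper("Hello World")); // ello orld
        System.out.println(keepUpper("Hello World"));  // HW
    }
}
```

Because the match is based on the Unicode category rather than on an ASCII range, non-ASCII uppercase letters such as "É" are handled the same way as "H" and "W".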

For details of exactly which characters each code matches, consult the documentation for java.lang.Character or the Unicode General Category definitions.

Code Description
C all other characters (control, format, unassigned, private use, surrogate)
Cc control
Cf format
Cn unassigned
Co private use
Cs surrogate
L all letters
L1 Latin-1 (U+0000 to U+00FF; Java-specific, not a Unicode category)
LD letter or digit (Java-specific, not a Unicode category)
Ll lowercase letter
Lm modifier letter
Lo other letter
Lt titlecase letter
Lu uppercase letter
M all marks
Mc combining spacing mark
Me enclosing mark
Mn non-spacing mark
N all numbers
Nd decimal digit number
Nl letter number
No other number
P all punctuation
Pc connector punctuation
Pd dash punctuation
Pe end punctuation
Po other punctuation
Ps start punctuation
S all symbols
Sc currency symbol
Sk modifier symbol
Sm math symbol
So other symbol
Z all separators
Zl line separator
Zp paragraph separator
Zs space separator
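
A few of the codes above in use, as a rough sketch assuming Java's java.util.regex and java.text.Normalizer (the class and method names are hypothetical, chosen for illustration). It shows a one-letter code, a two-letter code, and the common trick of removing accents by decomposing the text and then dropping non-spacing marks:

```java
import java.text.Normalizer;

public class CategoryExamples {
    // One-letter code: \p{L} covers Ll, Lm, Lo, Lt and Lu together.
    static boolean isAllLetters(String s) {
        return s.matches("\\p{L}+");
    }

    // Two-letter code: \p{Nd} matches decimal digits from any script,
    // not only ASCII 0-9.
    static boolean isAllDigits(String s) {
        return s.matches("\\p{Nd}+");
    }

    // Decompose accented characters (NFD) so each accent becomes a
    // separate non-spacing mark, then remove those marks with \p{Mn}.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("Stra\u00DFe"));  // true
        System.out.println(isAllDigits("\u0663\u0664"));  // true (Arabic-Indic digits)
        System.out.println(stripAccents("caf\u00E9"));    // cafe
    }
}
```

The non-ASCII characters are written as \u escapes so the example does not depend on the source file's encoding.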