
JavaScript Regular Expressions


10.1.2 Character Classes

Individual literal characters can be combined into character classes by placing them within square brackets. A character class matches any one character that is contained within it. Thus, the regular expression /[abc]/ matches any one of the letters a, b, or c. Negated character classes can also be defined -- these match any character except those contained within the brackets. A negated character class is specified by placing a caret (^) as the first character inside the left bracket. The regexp /[^abc]/ matches any one character other than a, b, or c. Character classes can use a hyphen to indicate a range of characters. To match any one lowercase character from the Latin alphabet, use /[a-z]/, and to match any letter or digit from the Latin alphabet, use /[a-zA-Z0-9]/.
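As a quick check of the patterns above, the following sketch runs each one through RegExp.prototype.test(). The variable names and sample strings are made up for illustration; test() returns true if the pattern matches anywhere in the string.

```javascript
// Illustrative names; the patterns themselves come from the text above.
const anyOfAbc = /[abc]/;           // one of a, b, or c
const noneOfAbc = /[^abc]/;         // any one character other than a, b, or c
const lowercase = /[a-z]/;          // one lowercase Latin letter
const alphanumeric = /[a-zA-Z0-9]/; // one Latin letter or digit

console.log(anyOfAbc.test("xbz"));     // true: "b" is in the class
console.log(anyOfAbc.test("xyz"));     // false: no a, b, or c anywhere
console.log(noneOfAbc.test("abc"));    // false: every character is excluded
console.log(noneOfAbc.test("abcd"));   // true: "d" is outside the class
console.log(lowercase.test("ABC"));    // false: uppercase only
console.log(alphanumeric.test("_-7")); // true: "7" is a digit
```

Note that a negated class still has to match one character, so /[^abc]/ fails on "abc" rather than matching "nothing".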

Because certain character classes are commonly used, the JavaScript regular expression syntax includes special characters and escape sequences to represent these common classes. For example, \s matches the space character, the tab character, and any other Unicode whitespace character, and \S matches any character that is not Unicode whitespace. Table 10-2 lists these characters and summarizes character class syntax. (Note that several of these character class escape sequences match only ASCII characters and have not been extended to work with Unicode characters. You can explicitly define your own Unicode character classes; for example, /[\u0400-\u04FF]/ matches any one Cyrillic character.)

Table 10-2. Regular expression character classes

Character   Matches

[...]       Any one character between the brackets.
[^...]      Any one character not between the brackets.
.           Any character except newline or another Unicode line terminator.
\w          Any ASCII word character. Equivalent to [a-zA-Z0-9_].
\W          Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_].
\s          Any Unicode whitespace character.
\S          Any character that is not Unicode whitespace. Note that \w and \S are not the same thing.
\d          Any ASCII digit. Equivalent to [0-9].
\D          Any character other than an ASCII digit. Equivalent to [^0-9].
[\b]        A literal backspace (special case).

The special character class escapes can also be used within square brackets. \s matches any whitespace character and \d matches any digit, so /[\s\d]/ matches any one whitespace character or digit. There is one special case: as we'll see later, the \b escape has a special meaning, but when used within a character class it represents the backspace character. Thus, to represent a backspace character literally in a regular expression, use the character class with one element: /[\b]/.
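The behavior of these escapes can be sketched as follows. The variable names and sample strings are assumptions chosen for illustration, and the Cyrillic class is written with both endpoints as \u escapes.

```javascript
// Illustrative names; each pattern is one discussed in this section.
const spaceOrDigit = /[\s\d]/;      // one whitespace character or one digit
const backspace = /[\b]/;           // a literal backspace (U+0008)
const cyrillic = /[\u0400-\u04FF]/; // one character from the Cyrillic block

console.log(spaceOrDigit.test("abc")); // false: no whitespace, no digit
console.log(spaceOrDigit.test("a c")); // true: contains a space
console.log(spaceOrDigit.test("a1c")); // true: contains a digit

// In a string literal, "\b" is the backspace character.
console.log(backspace.test("a\bc"));   // true
console.log(backspace.test("a b"));    // false: a space is not a backspace

console.log(cyrillic.test("Привет"));  // true
console.log(cyrillic.test("hello"));   // false

// \d and \w are ASCII-only, while \s covers Unicode whitespace.
console.log(/\d/.test("\u0663"));      // false: Arabic-Indic digit three
console.log(/\w/.test("\u00E9"));      // false: e with acute accent
console.log(/\s/.test("\u00A0"));      // true: no-break space
```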

10.1.3 Repetition

With the regular expression syntax we have learned so far, we can describe a two-digit number as /\d\d/ and a four-digit number as /\d\d\d\d/. But we don't have any way to describe, for example, a number that can have any number of digits or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax that specifies how many times an element of a regular expression may be repeated.

The characters that specify repetition always follow the pattern to which they are being applied. Because certain types of repetition are quite commonly used, there are special characters to represent these cases. For example, + matches one or more occurrences of the previous pattern. Table 10-3 summarizes the repetition syntax. The following lines show some examples:

/\d{2,4}/     // Match between two and four digits

/\w{3}\d?/    // Match exactly three word characters and an optional digit

/\s+java\s+/  // Match "java" with one or more spaces before and after

/[^"]*/       // Match zero or more non-quote characters
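To see what these four patterns actually capture, the sketch below applies each one with String.prototype.match() and prints the matched text. The sample strings and variable names are made up for illustration.

```javascript
// match() without the g flag returns an array whose element 0 is the matched text.
const digits = "12345".match(/\d{2,4}/)[0];
console.log(digits);       // "1234": repetition is greedy, but stops at four

const withDigit = "abc1 rest".match(/\w{3}\d?/)[0];
const withoutDigit = "abcd".match(/\w{3}\d?/)[0];
console.log(withDigit);    // "abc1": the optional digit is present
console.log(withoutDigit); // "abc": "d" is not a digit, so \d? matches nothing

const spaced = "I love java programming".match(/\s+java\s+/)[0];
console.log(spaced);       // " java ": includes the surrounding spaces
console.log(/\s+java\s+/.test("javascript")); // false: no surrounding spaces

const beforeQuote = 'say "hi"'.match(/[^"]*/)[0];
console.log(beforeQuote);  // "say ": everything up to the first quote
```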

Table 10-3. Regular expression repetition characters

Character   Meaning

{n,m}       Match the previous item at least n times but no more than m times.
{n,}        Match the previous item n or more times.
{n}         Match exactly n occurrences of the previous item.
?           Match zero or one occurrences of the previous item. That is, the previous item is optional. Equivalent to {0,1}.
+           Match one or more occurrences of the previous item. Equivalent to {1,}.
*           Match zero or more occurrences of the previous item. Equivalent to {0,}.

Be careful when using the * and ? repetition characters. Since these characters may match zero instances of whatever precedes them, they are allowed to match nothing. For example, the regular expression /a*/ actually matches the string "bbbb", because the string contains zero occurrences of the letter a!
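A short sketch makes the pitfall concrete. The variable names are illustrative; the point is that /a*/ succeeds on "bbbb" with an empty match at position 0, while /a+/ requires at least one a and fails.

```javascript
const zeroOrMoreA = /a*/;
const emptyMatch = "bbbb".match(zeroOrMoreA);

console.log(zeroOrMoreA.test("bbbb")); // true: zero occurrences is enough
console.log(emptyMatch[0].length);     // 0: the matched text is the empty string
console.log(emptyMatch.index);         // 0: the match is found at the very start

// With + instead of *, at least one "a" is required.
const oneOrMoreA = /a+/;
console.log(oneOrMoreA.test("bbbb"));  // false
console.log("bbbb".match(oneOrMoreA)); // null
```

If a match on empty input is not wanted, + (or an explicit {1,}) is usually the safer choice.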
