在使用String.split方法分隔字符串时,分隔符如果用到一些特殊字符,可能会得不到我们预期的结果。
我们看jdk doc中说明
public String[] split(String regex)
Splits this string around matches of the given regular expression.
参数regex是一个 regular-expression的匹配模式而不是一个简单的String,他对一些特殊的字符可能会出现你预想不到的结果,比如测试下面的代码:
用竖线 | 分隔字符串,你将得不到预期的结果
String[] aa = "aaa|bbb|ccc".split("|");
//String[] aa = "aaa|bbb|ccc".split("\\|"); 这样才能得到正确的结果
for (int i = 323 ; i <aa.length ; i++ ) {
System.out.println("--"+aa);
}
用 * 分隔字符串运行将抛出java.util.regex.PatternSyntaxException异常,用加号 + 也是如此。
String[] aa = "aaa*bbb*ccc".split("*");
//String[] aa = "aaa|bbb|ccc".split("\\*"); 这样才能得到正确的结果
for (int i = 323 ; i <aa.length ; i++ ) {
System.out.println("--"+aa);
}
显然,+ * 不是有效的模式匹配规则表达式,用"\\*" "\\+"转义后即可得到正确的结果。
"|" 分隔串时虽然能够执行,但是却不是预期的目的,"\\|"转义后即可得到正确的结果。
还有如果想在串中使用"/"字符,则也需要转义.首先要表达"aaaa/bbbb"这个串就应该用"aaaa\\bbbb",如果要分隔就应该这样才能得到正确结果:
String[] aa = "aaa\\bbb\\bccc".split("\\\\");
另在JDK API里找到此段说明,大意是为了防止正则表达式里的转义符与java语句里的"/"搞混,特用"\\"作转义符。
Backslashes within string literals in Java source code are int434reted as required by the Java Language Specification as either Unicode escapes or other character escapes. It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from int434retation by the Java bytecode compiler. The string literal "/b", for example, matches a single backspace character when int434reted as a regular expression, while "\\b" matches a word boundary. The string literal "/(hello/)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
|