Return Type
Name(Signature)
Description
int
ascii(string str)
Returns the numeric (ASCII) value of the first character of str.
string
base64(binary bin)
Converts the argument from binary to a base 64 string (as of Hive 0.12.0).
string
concat(string|binary A, string|binary B...)
Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters, in order. For example, concat('foo', 'bar') results in 'foobar'. This function can take any number of input strings.
array<struct<string,double>>
context_ngrams(array<array<string>>, array<string>, int K, int pf)
Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See StatisticsAndDataMining for more information.
string
concat_ws(string SEP, string A, string B...)
Like concat() above, but with a custom separator SEP.
string
concat_ws(string SEP, array<string>)
Like concat_ws() above, but taking an array of strings whose elements are joined with SEP (as of Hive 0.9.0).
string
decode(binary bin, string charset)
Decodes the first argument into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result is also null (as of Hive 0.12.0).
binary
encode(string src, string charset)
Encodes the first argument into a binary value using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result is also null (as of Hive 0.12.0).
int
find_in_set(string str, string strList)
Returns the position of the first occurrence of str in strList, where strList is a comma-delimited string. Positions start at 1. Returns null if either argument is null. Returns 0 if str is not found or if str contains any commas. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3.
string
format_number(number x, int d)
Formats the number x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string. If d is 0, the result has no decimal point or fractional part. (As of Hive 0.10.0; a bug with float types was fixed in Hive 0.14.0, and decimal type support was added in Hive 0.14.0.)
string
get_json_object(string json_string, string path)
Extracts a JSON object from a JSON string based on the specified JSON path, and returns the JSON string of the extracted object. Returns null if the input JSON string is invalid. NOTE: The JSON path can only contain the characters [0-9a-z_], i.e., no upper-case or special characters. Also, keys cannot start with numbers. This is due to restrictions on Hive column names.
boolean
in_file(string str, string filename)
Returns true if the string str appears as an entire line in the file filename.
int
instr(string str, string substr)
Returns the position of the first occurrence of substr in str. Returns null if either argument is null, and returns 0 if substr cannot be found in str. Positions are one-based, not zero-based: the first character in str has index 1.
int
length(string A)
Returns the length of the string A.
int
locate(string substr, string str[, int pos])
Returns the position of the first occurrence of substr in str after position pos. The pos argument is optional.
string
lower(string A) lcase(string A)
Returns the string resulting from converting all characters of A to lower case. For example, lower('fOoBaR') results in 'foobar'.
string
lpad(string str, int len, string pad)
Returns str, left-padded with pad to a length of len. If str is longer than len, the result is truncated to len characters.
string
ltrim(string A)
Returns the string resulting from trimming spaces from the beginning (left-hand side) of A. For example, ltrim(' foobar ') results in 'foobar '.
array<struct<string,double>>
ngrams(array<array<string>>, int N, int K, int pf)
Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See StatisticsAndDataMining for more information.
string
parse_url(string urlString, string partToExtract [, string keyToExtract])
Returns the specified part of the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. The value of a particular key in QUERY can also be extracted by providing the key as the third argument; for example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'.
string
printf(String format, Obj... args)
Returns the input formatted according to printf-style format strings (as of Hive 0.9.0).
string
regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Some care is necessary when using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc. The index parameter is the Java regex Matcher group() method index. See docs/api/java/util/regex/Matcher.html for more information on the index or the Java regex group() method.
string
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
Returns the string resulting from replacing all substrings in INITIAL_STRING that match the Java regular expression syntax defined in PATTERN with instances of REPLACEMENT. If REPLACEMENT is empty, the matching parts are removed. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb'. Some care is necessary when using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
string
repeat(string str, int n)
Returns str repeated n times.
string
reverse(string A)
Returns the reversed string.
string
rpad(string str, int len, string pad)
Returns str, right-padded with pad to a length of len. If str is longer than len, the result is truncated to len characters.
string
rtrim(string A)
Returns the string resulting from trimming spaces from the end (right-hand side) of A. For example, rtrim(' foobar ') results in ' foobar'.
array<array<string>>
sentences(string str, string lang, string locale)
Tokenizes a string of natural language text into words and sentences, where each sentence is broken at the appropriate sentence boundary and returned as an array of words. The lang and locale arguments are optional. For example, sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ).
string
space(int n)
Returns a string of n spaces.
array<string>
split(string str, string pat)
Splits str around pat (pat is a regular expression) and returns the pieces as an array of strings.
map<string,string>
str_to_map(text[, delimiter1, delimiter2])
Splits text into key-value pairs using two delimiters. delimiter1 separates text into key-value pairs, and delimiter2 splits each pair into its key and value. The default delimiters are ',' for delimiter1 and ':' for delimiter2.
string
substr(string|binary A, int start) substring(string|binary A, int start)
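Returns the substring (or slice of the byte array) of A starting at position start and running to the end of A. For example, substr('foobar', 4) results in 'bar'.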
string
substr(string|binary A, int start, int len) substring(string|binary A, int start, int len)
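Returns the substring (or slice of the byte array) of A starting at position start with length len. For example, substr('foobar', 4, 1) results in 'b'.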
string
substring_index(string A, string delim, int count)
Returns the substring of A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'.
string
translate(string|char|varchar input, string|char|varchar from, string|char|varchar to)
Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string. This is similar to the translate function in PostgreSQL. If any of the parameters to this UDF are NULL, the result is NULL as well (available as of Hive 0.10.0, for string types). For example, translate("MOBIN", "BIN", "M") returns "MOM": B maps to M, while I and N have no counterpart in the to string and are removed.
string
trim(string A)
Returns the string resulting from trimming spaces from both ends of A. For example, trim(' foobar ') results in 'foobar'.
binary
unbase64(string str)
Converts the argument from a base 64 string to binary (as of Hive 0.12.0).
string
upper(string A) ucase(string A)
Returns the string resulting from converting all characters of A to upper case. For example, upper('fOoBaR') results in 'FOOBAR'.
string
initcap(string A)
Returns the string with the first letter of each word in uppercase and all other letters in lowercase. Words are delimited by whitespace (as of Hive 1.1.0).
int
levenshtein(string A, string B)
Returns the Levenshtein distance between two strings (as of Hive 1.2.0). For example, levenshtein('kitten', 'sitting') results in 3.
string
soundex(string A)
Returns the soundex code of the string (as of Hive 1.2.0). For example, soundex('Miller') results in M460.
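
Several entries in the table above have edge cases that are easy to misread: one-based positions, the special return value 0, truncation when padding, and character removal in translate. The Python sketch below mimics the documented behavior of find_in_set, instr, substring_index, lpad, and translate so those rules can be checked outside Hive. These are illustrative re-implementations written from the descriptions in this table, not Hive's actual code; NULL is modeled as None, and any behavior the table does not state is marked as an assumption in the comments.

```python
# Illustrative re-implementations only; these are NOT Hive's source code.
# NULL is modeled as Python None.


def find_in_set(s, str_list):
    """1-based index of s in a comma-delimited list, 0 if absent."""
    if s is None or str_list is None:
        return None
    if "," in s:  # documented special case: a comma in s yields 0
        return 0
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0


def instr(s, substr):
    """1-based position of the first occurrence of substr, 0 if absent."""
    if s is None or substr is None:
        return None
    return s.find(substr) + 1  # str.find gives -1 when missing, so this is 0


def substring_index(s, delim, count):
    """Everything before the count-th delim, counted from the left
    (count > 0) or from the right (count < 0)."""
    if count == 0 or delim == "":
        return ""  # assumption: the table does not cover these cases
    parts = s.split(delim)  # case-sensitive, like the documented match
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def lpad(s, length, pad):
    """Left-pad s with pad up to length characters; truncate if s is longer.
    Assumes pad is non-empty and length is non-negative."""
    if len(s) >= length:
        return s[:length]
    fill = (pad * length)[: length - len(s)]
    return fill + s


def translate(text, from_chars, to_chars):
    """Map each character in from_chars to the one at the same position in
    to_chars; characters with no counterpart are dropped."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        # assumption: the first occurrence of a repeated character wins
        mapping.setdefault(ch, to_chars[i] if i < len(to_chars) else None)
    out = []
    for ch in text:
        repl = mapping.get(ch, ch)
        if repl is not None:
            out.append(repl)
    return "".join(out)


print(find_in_set("ab", "abc,b,ab,c,def"))       # 3
print(instr("foobar", "bar"))                    # 4
print(substring_index("www.apache.org", ".", 2)) # www.apache
print(lpad("hello", 3, "x"))                     # hel
print(translate("MOBIN", "BIN", "M"))            # MOM
```

The printed values match the examples given in the table where one exists (find_in_set, substring_index, translate). The instr and lpad calls are extra cases that follow from the one-based indexing and truncation rules described above.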