关系运算 | | | [size=7.5000pt]A <> B
| [size=7.5000pt]All primitive types
| [size=7.5000pt]NULL if A or B is NULL, TRUE if expression A is NOT equal to expression B, otherwise FALSE.
[size=7.5000pt]不等于
| [size=7.5000pt]A != B
| [size=7.5000pt]All primitive types
| [size=7.5000pt]Synonym for the <> operator.
| [size=7.5000pt]A [NOT] BETWEEN B AND C
| [size=7.5000pt]All primitive types
| [size=7.5000pt]NULL if A, B or C is NULL, TRUE if A is greater than or equal to B AND A less than or equal to C, otherwise FALSE. This can be inverted by using the NOT keyword. (As of version [size=7.5000pt] 0.9.0.)
[size=12.0000pt]等价于 a>=b and a<=c
| [size=7.5000pt]A IS NULL
| [size=7.5000pt]All types
| [size=7.5000pt]TRUE if expression A evaluates to NULL, otherwise FALSE.
| [size=7.5000pt]A IS NOT NULL
| [size=7.5000pt]All types
| [size=7.5000pt]FALSE if expression A evaluates to NULL, otherwise TRUE.
| [size=7.5000pt]A [NOT] LIKE B
| [size=7.5000pt]strings
| [size=7.5000pt]NULL if A or B is NULL, TRUE if string A matches the SQL simple regular expression B, otherwise FALSE. The comparison is done character by character. The _ character in B matches any character in A (similar to . in posix regular expressions) while the % character in B matches an arbitrary number of characters in A (similar to .* in posix regular expressions). For example, 'foobar' like 'foo' evaluates to FALSE whereas 'foobar' like 'foo_ _ _' evaluates to TRUE and so does 'foobar' like 'foo%'.
| [size=7.5000pt]A RLIKE B
| [size=7.5000pt]strings
| [size=7.5000pt]NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B, otherwise FALSE. For example, 'foobar' RLIKE 'foo' evaluates to TRUE and so does 'foobar' RLIKE '^f.*r$'.
[size=9.0000pt]JAVA的LIKE操作[size=9.0000pt] 正则匹配[size=9.0000pt]: RLIKE
[size=9.0000pt]where '123456' rlike '^\\d+$'
| [size=7.5000pt]A REGEXP B
| [size=7.5000pt]strings
| [size=7.5000pt]Same as RLIKE.
[size=9.0000pt]REGEXP操作: REGEXP[size=9.0000pt] [size=9.0000pt]功能与RLIKE相同
[size=9.0000pt]where ‘key' REGEXP '^f.*r$';
| 逻辑操作 | | | [size=7.5000pt]A AND B
| [size=7.5000pt]boolean
| [size=7.5000pt]TRUE if both A and B are TRUE, otherwise FALSE. NULL if A or B is NULL.
| [size=7.5000pt]A OR B
| [size=7.5000pt]boolean
| [size=7.5000pt]TRUE if either A or B or both are TRUE, FALSE OR NULL is NULL, otherwise FALSE.
| [size=7.5000pt]NOT A
| [size=7.5000pt]boolean
| [size=7.5000pt]TRUE if A is FALSE or NULL if A is NULL. Otherwise FALSE.
| [size=7.5000pt]! A
| [size=7.5000pt]boolean
| [size=7.5000pt]Same as NOT A.
| [size=7.5000pt]A IN (val1, val2, ...)
| [size=7.5000pt]boolean
| [size=7.5000pt]TRUE if A is equal to any of the values. As of Hive 0.13 [size=7.5000pt] subqueries are supported in IN statements.
| [size=7.5000pt]A NOT IN (val1, val2, ...)
| [size=7.5000pt]boolean
| [size=7.5000pt]TRUE if A is not equal to any of the values. As of Hive 0.13 [size=7.5000pt] subqueries are supported in NOT IN statements.
| [size=7.5000pt][NOT] EXISTS (subquery)
| [size=7.5000pt]
| [size=7.5000pt]TRUE if the the subquery returns at least one row. Supported as of [size=7.5000pt] Hive 0.13.
[size=7.5000pt]e[size=7.5000pt]xists的子查询(当exists后的子查询,至少可以查出一条数据时,exists条件成立)
[size=7.5000pt]select * from classes where exists (SELECT cno FROM student WHERE birthday > [size=7.5000pt] [size=7.5000pt]'1991-01-01');
| 字符串操作 | | | [size=7.5000pt]A || B
| [size=7.5000pt]strings
| [size=7.5000pt]Concatenates the operands - shorthand for [size=7.5000pt]concat(A,B)[size=7.5000pt] . Supported as of [size=7.5000pt] Hive 2.2.0.
[size=12.0000pt]字符串拼接
| 数学函数 | | | [size=7.5000pt]DOUBLE
| [size=7.5000pt]round(DOUBLE a)
| [size=7.5000pt]Returns the rounded [size=7.5000pt]BIGINT[size=7.5000pt] value of [size=7.5000pt]a[size=7.5000pt].
| [size=7.5000pt]DOUBLE
| [size=7.5000pt]round(DOUBLE a, INT d)
| [size=7.5000pt]Returns [size=7.5000pt]a[size=7.5000pt] rounded to [size=7.5000pt]d[size=7.5000pt] decimal places.
| [size=7.5000pt]BIGINT
| [size=7.5000pt]floor(DOUBLE a)
| [size=7.5000pt]Returns the maximum [size=7.5000pt]BIGINT[size=7.5000pt] value that is equal to or less than [size=7.5000pt]a[size=7.5000pt].
[size=7.5000pt]向下取整
| [size=7.5000pt]BIGINT
| [size=7.5000pt]ceil(DOUBLE a), ceiling(DOUBLE a)
| [size=7.5000pt]Returns the minimum BIGINT value that is equal to or greater than [size=7.5000pt]a[size=7.5000pt].
[size=7.5000pt]向上取整
| [size=7.5000pt]DOUBLE
| [size=7.5000pt]rand(), rand(INT seed)
| [size=7.5000pt]Returns a random number (that changes from row to row) that is distributed uniformly from 0 to 1. Specifying the seed will make sure the generated random number sequence is deterministic.
| [size=7.5000pt]DOUBLE
| [size=7.5000pt]pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p)
| [size=7.5000pt]Returns [size=7.5000pt]a[size=7.5000pt]p[size=7.5000pt].
| [size=7.5000pt]DOUBLE
| [size=7.5000pt]abs(DOUBLE a)
| [size=7.5000pt]Returns the absolute value.
| [size=7.5000pt]INT or DOUBLE
| [size=7.5000pt]negative(INT a), negative(DOUBLE a)
| [size=7.5000pt]Returns [size=7.5000pt]-a[size=7.5000pt].
| [size=7.5000pt]T
| [size=7.5000pt]greatest(T v1, T v2, ...)
| [size=7.5000pt]Returns the greatest value of the list of values (as of Hive [size=7.5000pt] 1.1.0). Fixed to return NULL when one or more arguments are NULL, and strict type restriction relaxed, consistent with ">" operator (as of Hive [size=7.5000pt] 2.0.0).
[size=10.5000pt]返回列表最大值
| [size=7.5000pt]T
| [size=7.5000pt]least(T v1, T v2, ...)
| [size=7.5000pt]Returns the least value of the list of values (as of Hive [size=7.5000pt] 1.1.0). Fixed to return NULL when one or more arguments are NULL, and strict type restriction relaxed, consistent with "<" operator (as of Hive [size=7.5000pt] 2.0.0).
[size=10.5000pt]返回列表最小值
| 集合函数 | | | [size=7.5000pt]int
| [size=7.5000pt]size(Map<K.V>)
| [size=7.5000pt]Returns the number of elements in the map type.
| [size=7.5000pt]int
| [size=7.5000pt]size(Array<T>)
| [size=7.5000pt]Returns the number of elements in the array type.
| [size=7.5000pt]array<K>
| [size=7.5000pt]map_keys(Map<K.V>)
| [size=7.5000pt]Returns an unordered array containing the keys of the input map.
| [size=7.5000pt]array<V>
| [size=7.5000pt]map_values(Map<K.V>)
| [size=7.5000pt]Returns an unordered array containing the values of the input map.
| [size=7.5000pt]boolean
| [size=7.5000pt]array_contains(Array<T>, value)
| [size=7.5000pt]Returns TRUE if the array contains value.
| [size=7.5000pt]array<t>
| [size=7.5000pt]sort_array(Array<T>)
| [size=7.5000pt]Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version [size=7.5000pt] 0.9.0).
| 类型转换函数 | | | Expected "=" to follow "type"
| [size=7.5000pt]cast(expr as <type>)
| [size=7.5000pt]Converts the results of the expression expr to <type>. For example, cast('1' as BIGINT) will convert the string '1' to its integral representation. A null is returned if the conversion does not succeed. If cast(expr as boolean) Hive returns true for a non-empty string.
[size=7.5000pt]类型转换 SELECT CAST('2000-01-01' AS DATE) from dual
| 日期函数 | | | pre 2.1.0:[size=7.5000pt] string
2.1.0 on:[size=7.5000pt] date
| [size=7.5000pt]to_date(string timestamp)
| [size=7.5000pt]Returns the date part of a timestamp string (pre-Hive 2.1.0): to_date("1970-01-01 00:00:00") = "1970-01-01". As of Hive 2.1.0, returns a date object.
[size=7.5000pt]Prior to Hive 2.1.0 ([size=7.5000pt] HIVE-13248) the return type was a String because no Date type existed when the method was created.
[size=12.0000pt]日期字符串->日期类型
| [size=7.5000pt]int
| [size=7.5000pt]year(string date)
| [size=7.5000pt]Returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970.
[size=7.5000pt]获取年份
| [size=7.5000pt]int
| [size=7.5000pt]quarter(date/timestamp/string)
| [size=7.5000pt]Returns the quarter of the year for a date, timestamp, or string in the range 1 to 4 (as of Hive [size=7.5000pt] 1.3.0). Example: quarter('2015-04-08') = 2.
| [size=7.5000pt]int
| [size=7.5000pt]month(string date)
| [size=7.5000pt]Returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11.
| [size=7.5000pt]int
| [size=7.5000pt]day(string date) dayofmonth(date)
| [size=7.5000pt]Returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1.
| [size=7.5000pt]int
| [size=7.5000pt]hour(string date)
| [size=7.5000pt]Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.
| [size=7.5000pt]int
| [size=7.5000pt]minute(string date)
| [size=7.5000pt]Returns the minute of the timestamp.
| [size=7.5000pt]int
| [size=7.5000pt]second(string date)
| [size=7.5000pt]Returns the second of the timestamp.
| [size=7.5000pt]int
| [size=7.5000pt]weekofyear(string date)
| [size=7.5000pt]Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44.
| [size=7.5000pt]int
| [size=7.5000pt]extract(field FROM source)
| [size=7.5000pt]Retrieve fields such as days or hours from source (as of Hive [size=7.5000pt] 2.2.0). Source must be a date, timestamp, interval or a string that can be converted into either a date or timestamp. Supported fields include: day, dayofweek, hour, minute, month, quarter, second, week and year.
[size=7.5000pt]Examples:
[size=7.5000pt]select extract(month from "2016-10-20") results in 10.
[size=7.5000pt]select extract(hour from "2016-10-20 05:06:07") results in 5.
[size=7.5000pt]select extract(dayofweek from "2016-10-20 05:06:07") results in 5.
[size=7.5000pt]select extract(month from interval '1-3' year to month) results in 3.
[size=7.5000pt]select extract(minute from interval '3 12:20:30' day to second) results in 20.
[size=7.5000pt]截取指定的年/月/日/。。。。
| [size=7.5000pt]int
| [size=7.5000pt]datediff(string enddate, string startdate)
| [size=7.5000pt]Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2.
[size=7.5000pt]日期相差天数
| pre 2.1.0:[size=7.5000pt] string
2.1.0 on:[size=7.5000pt] date
| [size=7.5000pt]date_add(date/timestamp/string startdate, tinyint/smallint/int days)
| [size=7.5000pt]Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'.
[size=7.5000pt]Prior to Hive 2.1.0 ([size=7.5000pt] HIVE-13248) the return type was a String because no Date type existed when the method was created.
[size=12.0000pt]给日期增加天数 负数减少天数
| pre 2.1.0:[size=7.5000pt] string
2.1.0 on:[size=7.5000pt] date
| [size=7.5000pt]date_sub(date/timestamp/string startdate, tinyint/smallint/int days)
| [size=7.5000pt]Subtracts a number of days to startdate: date_sub('2008-12-31', 1) = '2008-12-30'.
[size=7.5000pt]Prior to Hive 2.1.0 ([size=7.5000pt] HIVE-13248) the return type was a String because no Date type existed when the method was created.
| [size=7.5000pt]date
| [size=7.5000pt]current_date
| [size=7.5000pt]Returns the current date at the start of query evaluation (as of Hive [size=7.5000pt] 1.2.0). All calls of current_date within the same query return the same value.
[size=12.0000pt]当前日期
| [size=7.5000pt]timestamp
| [size=7.5000pt]current_timestamp
| [size=7.5000pt]Returns the current timestamp at the start of query evaluation (as of Hive [size=7.5000pt] 1.2.0). All calls of current_timestamp within the same query return the same value.
| [size=7.5000pt]string
| [size=7.5000pt]add_months(string start_date, int num_months, [size=7.5000pt][output_date_format][size=7.5000pt])
| [size=7.5000pt]Returns the date that is num_months after start_date (as of Hive [size=7.5000pt] 1.1.0). start_date is a string, date or timestamp. num_months is an integer. If start_date is the last day of the month or if the resulting month has fewer days than the day component of start_date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as start_date. The default output format is 'yyyy-MM-dd'.
[size=7.5000pt]Before Hive 4.0.0, the time part of the date is ignored.
[size=7.5000pt]As of Hive [size=7.5000pt] 4.0.0, add_months supports an optional argument output_date_format, which accepts a String that represents a valid date format for the output. This allows to retain the time format in the output.
[size=7.5000pt]For example :
[size=7.5000pt]add_months('2009-08-31', 1) returns '2009-09-30'.[size=7.5000pt]
[size=7.5000pt]add_months('2017-12-31 14:15:16', 2, 'YYYY-MM-dd HH:mm:ss') returns '2018-02-28 14:15:16'.
[size=7.5000pt]增加月
| [size=7.5000pt]string
| [size=7.5000pt]last_day(string date)
| [size=7.5000pt]Returns the last day of the month which the date belongs to (as of Hive [size=7.5000pt] 1.1.0). date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored.
[size=10.5000pt]当月最后一天
| [size=7.5000pt]string
| [size=7.5000pt]next_day(string start_date, string day_of_week)
| [size=7.5000pt]Returns the first date which is later than start_date and named as day_of_week (as of Hive [size=7.5000pt] 1.2.0). start_date is a string/date/timestamp. day_of_week is 2 letters, 3 letters or full name of the day of the week (e.g. Mo, tue, FRIDAY). The time part of start_date is ignored. Example: next_day('2015-01-14', 'TU') = 2015-01-20.
[size=10.5000pt]返回给定日期第二天起,一周内指定周几的日期
| [size=7.5000pt]string
| [size=7.5000pt]trunc(string date, string format)
| [size=7.5000pt]Returns date truncated to the unit specified by the format (as of Hive [size=7.5000pt] 1.2.0). Supported formats: MONTH/MON/MM, YEAR/YYYY/YY. Example: trunc('2015-03-17', 'MM') = 2015-03-01.
[size=10.5000pt]获取当年( yyyy)/月(MM)第一天
| [size=7.5000pt]double
| [size=7.5000pt]months_between(date1, date2)
| [size=7.5000pt]Returns number of months between dates date1 and date2 (as of Hive [size=7.5000pt] 1.2.0). If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise the UDF calculates the fractional portion of the result based on a 31-day month and considers the difference in time components date1 and date2. date1 and date2 type can be date, timestamp or string in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result is rounded to 8 decimal places. Example: months_between('1997-02-28 10:30:00', '1996-10-30') = 3.94959677
[size=10.5000pt]两日期相差多少月
| [size=7.5000pt]string
| [size=7.5000pt]date_format(date/timestamp/string ts, string fmt)
| [size=7.5000pt]Converts a date/timestamp/string to a value of string in the format specified by the date format fmt (as of Hive [size=7.5000pt] 1.2.0). Supported formats are Java SimpleDateFormat formats – [size=7.5000pt] https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html. The second argument fmt should be constant. Example: date_format('2015-04-08', 'y') = '2015'.
[size=7.5000pt]date_format can be used to implement other UDFs, e.g.:
[size=10.0000pt]· [size=7.5000pt]dayname(date) is date_format(date, 'EEEE')
[size=10.0000pt]· [size=7.5000pt]dayofyear(date) is date_format(date, 'D')
[size=7.5000pt]日期格式化
|
表生成函数 Examples
explode (array)
[size=10.5000pt]
[size=7.5000pt]select explode(array('A','B','C'));
[size=7.5000pt]select explode(array('A','B','C')) as col;
[size=7.5000pt]select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
[size=7.5000pt]
col | [size=7.5000pt]A
| [size=7.5000pt]B
| [size=7.5000pt]C
| [size=7.5000pt]
explode (map)
[size=10.5000pt]
[size=7.5000pt]select explode(map('A',10,'B',20,'C',30));
[size=7.5000pt]select explode(map('A',10,'B',20,'C',30)) as (key,value);
[size=7.5000pt]select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;
[size=7.5000pt]
| | [size=7.5000pt]A
| [size=7.5000pt]10
| [size=7.5000pt]B
| [size=7.5000pt]20
| [size=7.5000pt]C
| [size=7.5000pt]30
| [size=7.5000pt]
posexplode (array)
[size=10.5000pt]
[size=7.5000pt]select posexplode(array('A','B','C'));
[size=7.5000pt]select posexplode(array('A','B','C')) as (pos,val);
[size=7.5000pt]select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;
[size=7.5000pt]
| | [size=7.5000pt]0
| [size=7.5000pt]A
| [size=7.5000pt]1
| [size=7.5000pt]B
| [size=7.5000pt]2
| [size=7.5000pt]C
| [size=7.5000pt]
inline (array of structs)
[size=10.5000pt]
[size=7.5000pt]select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
[size=7.5000pt]select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
[size=7.5000pt]select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;
| | | [size=7.5000pt]A
| [size=7.5000pt]10
| [size=7.5000pt]2015-01-01
| [size=7.5000pt]B
| [size=7.5000pt]20
| [size=7.5000pt]2016-02-02
| [size=10.5000pt]
stack (values)
[size=10.5000pt]
[size=7.5000pt]select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
[size=7.5000pt]select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
[size=7.5000pt]select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;
[size=7.5000pt]
| | | [size=7.5000pt]A
| [size=7.5000pt]10
| [size=7.5000pt]2015-01-01
| [size=7.5000pt]B
| [size=7.5000pt]20
| [size=7.5000pt]2016-01-01
| [size=10.5000pt]
explode
[size=7.5000pt]explode()[size=7.5000pt] takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
[size=7.5000pt]As an example of using [size=7.5000pt]explode()[size=7.5000pt] in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows:
| [size=7.5000pt][100,200,300]
| [size=7.5000pt][400,500,600]
| [size=7.5000pt]Then running the query:
[size=7.5000pt]SELECT explode(myCol) AS myNewCol FROM myTable;
[size=7.5000pt]will produce:
| [size=7.5000pt]100
| [size=7.5000pt]200
| [size=7.5000pt]300
| [size=7.5000pt]400
| [size=7.5000pt]500
| [size=7.5000pt]600
| [size=7.5000pt]The usage with Maps is similar:
[size=7.5000pt]SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;
[size=10.5000pt]
posexplode
[size=7.5000pt]Version
[size=7.5000pt]Available as of Hive 0.13.0. See HIVE-4943[size=7.5000pt].
[size=7.5000pt]posexplode()[size=7.5000pt] is similar to [size=7.5000pt]explode[size=7.5000pt] but instead of just returning the elements of the array it returns the element as well as its position in the original array.
[size=7.5000pt]As an example of using [size=7.5000pt]posexplode()[size=7.5000pt] in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows:
| [size=7.5000pt][100,200,300]
| [size=7.5000pt][400,500,600]
| [size=7.5000pt]Then running the query:
[size=7.5000pt]SELECT posexplode(myCol) AS pos, myNewCol FROM myTable;
[size=7.5000pt]will produce:
| | [size=7.5000pt]1
| [size=7.5000pt]100
| [size=7.5000pt]2
| [size=7.5000pt]200
| [size=7.5000pt]3
| [size=7.5000pt]300
| [size=7.5000pt]1
| [size=7.5000pt]400
| [size=7.5000pt]2
| [size=7.5000pt]500
| [size=7.5000pt]3
| [size=7.5000pt]600
| [size=10.5000pt]
json_tuple
[size=7.5000pt]For example,
[size=7.5000pt]select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appenvets, '$.eventname') from log a;
[size=7.5000pt]should be changed to:
select a.timestamp, b.*
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;
[size=10.5000pt]
parse_url_tuple
[size=10.5000pt]
SELECT b.*
FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id LIMIT 1;
[size=7.5000pt]
将一个字段,拆成多个字段
create table t_ods_tmp_referurl as
SELECT a.*,b.*
FROM ods_weblog_origin a
LATERAL VIEW parse_url_tuple(regexp_replace(http_referer, "\"", ""),
'HOST', 'PATH','QUERY', 'QUERY:id') b as host, path, query, query_id;
注: lateral view 用于和 split, explode 等 UDTF 一起使用,它能够将一列数据拆成多行数据。
UDTF(User-Defined Table-Generating Functions) 用来解决输入一行输出多行(Onto-many maping) 的需求。 Explode 也是拆列函数,比如 Explode (ARRAY) , array 中的每个元素生成一行。
select *
from dual
LATERAL VIEW parse_url_tuple('http://www.baidu.com/user/log?id=1&name=wfh',
'HOST', 'PATH','QUERY', 'QUERY:id') t as host, path, query, query_id;
+----------------+------------+----------------+--------------+--+
| t.host | t.path | t.query | t.query_id |
+----------------+------------+----------------+--------------+--+
| www.baidu.com | /user/log | id=1&name=wfh | 1 |
+----------------+------------+----------------+--------------+--+
[size=7.5000pt]
排序函数[size=7.5000pt]将结果集按照指定字段分组,在组内按照指定字段排序,然后该函数为每组生成一个行号
[size=7.5000pt]row_number() over() 生成组内连续且唯一的数字
[size=7.5000pt]select name, row_number() over (partition by deptno order by sal desc) rank from emp
[size=7.5000pt]rank() over() 生成组内不连续 也不唯一的数字
[size=7.5000pt]dense_ranke() over() 生成组内连续不唯一的数字
[size=7.5000pt]1.row_number() over(partition by xx order by xx)
[size=7.5000pt]2.row_number() over(distribute by xx sort by xx)
行转列列转行表1:cityInfo
表2:cityInfoSet
[size=7.5000pt]
[size=10.5000pt]行转列
表1=>表2 可以使用 hive 的内置函数 concat_ws() 和 collect_set()进行转换:
select cityname,concat_ws(',',collect_set(regionname)) as address_set from cityInfo group by cityname;
列出该字段所有的值
collect_set去除重复元素;collect_list不去除重复元素;需要进行group by
[size=10.5000pt]列转行
表2=>表1 可以使用 hive 的内置函数 explode()进行转化
select cityname, region from cityInfoSet lateral view explode(split(address_set, ',')) aa as region;
[size=10.5000pt]
|
|