黑马程序员技术交流社区

标题: 【上海校区】Hive内置函数和自定义函数UDF [打印本页]

作者: 不二晨 时间: 2019-3-15 16:16
标题: 【上海校区】Hive内置函数和自定义函数UDF
Hive内置函数

Hive为我们提供了很多内置函数

查看方式:

官方地址
启动hive后通过命令查看函数
SHOW FUNCTIONS;
DESCRIBE FUNCTION <function_name>;
DESCRIBE FUNCTION EXTENDED <function_name>;
1
2
3
hive> show functions;
...
Time taken: 1.933 seconds, Fetched: 220 row(s)
hive> DESCRIBE FUNCTION case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Time taken: 0.009 seconds, Fetched: 1 row(s)
hive>  DESCRIBE FUNCTION EXTENDED case;
OK
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
Example:
SELECT
CASE deptno
WHEN 1 THEN Engineering
WHEN 2 THEN Finance
ELSE admin
END,
CASE zone
WHEN 7 THEN Americas
ELSE Asia-Pac
END
FROM emp_details
Time taken: 0.005 seconds, Fetched: 13 row(s)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
自定义函数

临时函数

自定义UDF需要继承org.apache.hadoop.hive.ql.UDF

需要实现evaluate函数，evaluate函数支持重载。(注意：在一个库中创建的UDF函数，不能在另一个库中使用 )

package com.cnyimi.udf;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(name = "myadd",
value = "myadd(int a , int b) ==> return a + b ",
extended = "Example:\n"
+ " myadd(1,1) ==> 2 \n"
+ " myadd(1,2,3) ==> 6;")
public class AddUDF extends UDF {

public int evaluate(int a ,int b) {
return a + b ;
}

public int evaluate(int a ,int b , int c) {
return a + b + c;
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
将程序打成jar包放到目标机器上去

cmd>cd {classes所在目录}
cmd>jar cvf HiveDemo.jar -C x/x/x/x/classes/ .
1
2
或使用maven插件打包

进入hive客户端，添加jar包

hive> add jar /root/test/HiveDemo.jar;
或 (使用hdfs)
hive> add jar hdfs://nameservice1/user/hive/udf/HiveDemo.jar;
1
2
3
创建临时函数

hive> CREATE TEMPORARY FUNCTION add_example AS 'com.cnyimi.udf.AddUDF';
1
查询HQL语句

SELECT add_example(8, 9) FROM scores;
SELECT add_example(scores.math, scores.art) FROM scores;
SELECT add_example(6, 7, 8, 6.8) FROM scores;
1
2
3
销毁临时函数

hive> DROP TEMPORARY FUNCTION add_example;
1
注意：UDF只能实现一进一出的操作，如果需要实现多进一出，则需要实现UDAF；
一进多出，则需要实现UDTF。

永久函数

自定义UDF需要继承org.apache.hadoop.hive.ql.UDF。（注意该类的包名必须是org.apache.hadoop.hive.ql.udf）
需要实现evaluate函数，evaluate函数支持重载。
修改FunctionRegistry这个类，注册定义的udf
把udf函数编译成class放到hive-exec-0.12.0-cdh5.0.0.jar中org\apache\hadoop\hive\ql\udf 路径下面
将新的FunctionRegistry编译后的class文件替换hive-exec-0.12.0-cdh5.0.0.jar中的org.apache.hadoop.hive.ql.exec
或

将jar包上传到hdfs上
[root@node00 ~]# hdfs dfs -put /root/test/substring_index_UDF.jar /user/hive/udf
[root@node00 ~]#  hdfs dfs -ls /user/hive/udf
Found 1 items
-rw-r--r-- 3 yarn hive    24859 2018-12-21 20:48 /user/hive/udf/substring_index_UDF.jar
1
2
3
4
创建永久函数
hive> CREATE FUNCTION substring_index AS 'com.hnb.data.hive.SubStringIndexUDF' using jar 'hdfs://nameservice1/user/hive/udf/substring_index_UDF.jar';
1
注意：执行这条语句创建永久函数，show functiuons 会加上默认的数据库名在函数名前。（default.say_hello1）

在MySQL中查询创建的自定义函数
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from funcs;
ERROR 1146 (42S02): Table 'hive.funcs' doesn't exist
mysql> SELECT * FROM FUNCS;
+---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+
| FUNC_ID | CLASS_NAME                         | CREATE_TIME | DB_ID | FUNC_NAME    | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
+---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+
|    1 | com.hnb.data.hive.SubStringIndexUDF |  1545016176 |    1 | substring_index |       1 | NULL    | USER    |
+---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM FUNC_RU;
+---------+---------------+-----------------------------------------------------------+-------------+
| FUNC_ID | RESOURCE_TYPE | RESOURCE_URI                                           | INTEGER_IDX |
+---------+---------------+-----------------------------------------------------------+-------------+
|    1 |          1 | hdfs://nameservice1/user/hive/udf/substring_index_UDF.jar |          0 |
+---------+---------------+-----------------------------------------------------------+-------------+
1 row in set (0.00 sec)

mysql>
---------------------
【转载，仅作分享，侵删】
作者：storm_fury
来源：CSDN
原文：https://blog.csdn.net/weixin_43215250/article/details/84837350
版权声明：本文为博主原创文章，转载请附上博文链接！

作者: 不二晨 时间: 2019-3-20 17:10

加油，感谢分享

欢迎光临黑马程序员技术交流社区 (http://bbs.itheima.com/)

黑马程序员IT技术论坛 X3.2