Sqoop installation environment: CentOS 7.6 + Hadoop 3.2.0 + Hive 3 + sqoop-1.4.7.bin__hadoop-2.6.0. [Sqoop deployment] Quick start (installation and usage): Sqoop architecture and applications
Sqoop architecture
Where Sqoop sits in the overall workflow
The flow below describes a typical Hadoop development pipeline and Sqoop's place in it (diagram omitted). In practice, the raw data set is first cleaned with MapReduce, the cleaned data is stored in HBase, the data warehouse Hive is then used to run statistics and analysis over the HBase data, the analysis results are written into Hive tables, Sqoop exports those results into MySQL, and finally a web front end presents them to the user.
What Sqoop is
As the name (SQL-to-Hadoop) suggests, Sqoop is a tool for moving data between relational databases and Hadoop in both directions:
- it can import data from a relational database (e.g. MySQL, Oracle) into Hadoop (e.g. HDFS, Hive, HBase);
- it can export data from Hadoop (e.g. HDFS, Hive, HBase) into a relational database (e.g. MySQL, Oracle).
As the second diagram shows (diagram omitted), once Sqoop receives a shell command or a Java API call, its task translator converts it into the corresponding MapReduce job, which then moves the data between the relational database and Hadoop, completing the copy.
Notes on Sqoop
- "Export" here means moving data out of HDFS, Hive, or HBase into databases such as MySQL or Oracle.
- Version 1.4.x is Sqoop 1 and 1.99.x is Sqoop 2; Sqoop 1 and Sqoop 2 are incompatible with each other.
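As a preview of the two directions, the commands have the following shape; a minimal sketch using the node1 MySQL instance and the credentials that the examples later in this post rely on:
# RDBMS -> Hadoop: pull a MySQL table into an HDFS directory
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /sqoopout/ -m 1
# Hadoop -> RDBMS: push an HDFS directory back into a MySQL table
sqoop export --connect jdbc:mysql://node1:3306/sqoop --username root --password root123456 --table user --export-dir /input/user.txt --input-fields-terminated-by '\t'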
Download and install Sqoop
Note: Sqoop is not particularly sensitive to the Hadoop version; both Hadoop 2.x and Hadoop 3.x work.
wget http://mirror.bit.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.tar.gz
tar -zxvf sqoop-1.4.7.tar.gz
mv sqoop-1.4.7 sqoop
Edit the configuration
cd sqoop/conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
# add the following lines
export HADOOP_COMMON_HOME=/server/hadoop
export HADOOP_MAPRED_HOME=/server/hadoop
export HIVE_HOME=/server/hive
export HCAT_HOME=/server/hive/hcatalog
# (optional) silence the HCatalog / Accumulo startup warnings by commenting out the checks below
vi /opt/apps/sqoop/bin/configure-sqoop
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
# Add HCatalog to dependency list
#if [ -e "${HCAT_HOME}/bin/hcat" ]; then
# TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
# if [ -z "${HIVE_CONF_DIR}" ]; then
# TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
# fi
# SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
#fi
# Add Accumulo to dependency list
#if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
# for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
# done
# for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
# SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
# done
#fi
Add the JDBC driver
Download the MySQL driver and place the jar under Sqoop's lib/ directory.
# Sqoop ships without a MySQL driver, so install the connector
wget -P /opt/apps/sqoop/lib/ http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.40/mysql-connector-java-5.1.40.jar
# it is better to keep one MySQL driver version across the Hadoop cluster, so the driver already installed for Hive is reused here
ln -s /server/hive/lib/mysql-connector-java-8.0.17.jar /server/sqoop/lib/mysql-connector-java.jar
Configure environment variables (optional)
vi /etc/profile
export SQOOP_HOME=/server/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile
Usage
[root@node1 bin]# sqoop help
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/server/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-10-13 02:59:12,286 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
As the output shows, the command format is: sqoop COMMAND [ARGS]. The common parameters are introduced through the examples below.
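Before running the examples, it is worth checking that Sqoop starts and can reach MySQL through the JDBC driver; a quick sanity check, assuming the node1 MySQL instance and root credentials used throughout this post:
# confirm sqoop itself runs
sqoop version
# confirm the MySQL connector and the connection both work
sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root --password root123456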
#(Example 1) [Import the consumer table from MySQL's hive database into HDFS via Sqoop]
# create the table
mysql>create table consumer(
id int unsigned NOT NULL AUTO_INCREMENT,
name varchar(32),
sex varchar(32),
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
# insert data
insert into consumer values(null,'xiaoming迷宫','man');
insert into consumer values(null,'leyan','man');
insert into consumer values(null,'theShy','man');
insert into consumer values(null,'theshygirlfriend','woman');
# run the Sqoop import
sqoop import --connect jdbc:mysql://node1:3306/hive --table consumer --username root --password root123456 --target-dir /sqoopout/ --fields-terminated-by '\t' -m 1;
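A quick way to confirm the rows landed is to inspect the target directory on HDFS; a small check, assuming the single-mapper import produced the usual part-m-00000 file name:
hdfs dfs -ls /sqoopout/
hdfs dfs -cat /sqoopout/part-m-00000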
# user.txt: data for the next example (exporting from HDFS to MySQL); fields are tab-separated
#   uname  varchar  user name
#   vnum   int      number of videos
#   fnum   int      number of friends
mysql> create database sqoop;
mysql>use sqoop;
mysql>create table user(
uname varchar(32),
vnum varchar(32),
fnum varchar(100)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

hdfs dfs -put user.txt hdfs://node1:9000/input/

sqoop export \
--connect 'jdbc:mysql://node1:3306/sqoop' \
--username 'root' \
--password 'root123456' \
--table 'user' \
--export-dir '/input/user.txt' \
--columns 'uname,vnum,fnum' \
--mapreduce-job-name 'hdfs to mysql' \
--input-fields-terminated-by '\t' \
--input-lines-terminated-by '\n';
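For reference, the user.txt pushed to HDFS above must separate its fields with real tab characters (this comes up again in the troubleshooting section at the end); a small sketch for building such a file and then checking that the export reached MySQL, with the two sample rows being purely illustrative:
# printf expands \t into literal tabs and \n into newlines
printf 'zhangsan\t12\t3\nlisi\t5\t8\n' > user.txt
hdfs dfs -put -f user.txt hdfs://node1:9000/input/
# after sqoop export finishes, verify the rows in MySQL
mysql -h node1 -uroot -proot123456 -e 'select * from sqoop.user;'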
Extract MySQL data into Hive
#---(Case 1)---
# Requirement: import the MySQL table consumer into the Hive table hive.mysql_to_hive_consumer; the Hive table does not exist yet.
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.mysql_to_hive_consumer --hive-import -m 1 --delete-target-dir ;
# data in MySQL
mysql> select * from consumer;
+----+------------------+-------+
| id | name | sex |
+----+------------------+-------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
+----+------------------+-------+
# import result
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
+----------------------------+------------------------------+-----------------------------+
#---(Case 2)---
# Requirement: building on Case 1, run an overwrite import and a plain append (into) import.
# into: same command as in Case 1
# overwrite:
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.mysql_to_hive_consumer --hive-import -m 1 --hive-overwrite;
# modify the MySQL data
mysql> update consumer set sex='woman,man' where id=23;
mysql> update consumer set sex='man,workman' where id=22;
mysql> update consumer set sex='man,workman,iconman' where id=21;
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
+----+------------------+---------------------+
# the Hive table before the overwrite
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
+----------------------------+------------------------------+-----------------------------+
# the Hive table after the overwrite import
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
+----------------------------+------------------------------+-----------------------------+
#---(Case 3)---
# Requirement: building on Case 2, add rows to the MySQL consumer table and import only the new rows into the Hive table.
# add rows to the MySQL consumer table
insert into consumer values(null,'世纪大道东','man');
insert into consumer values(null,'皇后大道','woman');
insert into consumer values(null,'sky','man');
insert into consumer values(null,'OhYesYes','woman');
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----+------------------+---------------------+
# Hive data before the incremental import
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
+----------------------------+------------------------------+-----------------------------+
# run the incremental import (new rows selected with --where)
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --where "id>23" --hive-table hive.mysql_to_hive_consumer --hive-import -m 1;
# result
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----------------------------+------------------------------+-----------------------------+
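As an aside on Case 3: sqoop eval (listed in the help output earlier) is a convenient way to preview which rows a --where condition will pick up before launching the import, since it just runs the SQL and prints the result without starting a MapReduce job; a small sketch on the same connection:
sqoop eval --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --query 'select * from consumer where id > 23'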
#---(Case 4)---
# Requirement: import the consumer table into a Hive table (hivetest2) whose fields are delimited by ','.
# data in the consumer table
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----+------------------+---------------------+
# create the hivetest2 table
jdbc:hive2://node1:10000>create table hivetest2(id int,name string,sex string) row format delimited fields terminated by ',';
# run the import
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2 --hive-import -m 1 --fields-terminated-by ",";
# result (note: sex values that themselves contain a ',' are cut off at the first comma, because ',' is also the field delimiter)
jdbc:hive2://node1:10000> select * from hivetest2;
+---------------+-------------------+----------------+
| hivetest2.id | hivetest2.name | hivetest2.sex |
+---------------+-------------------+----------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man |
| 22 | theShy | man |
| 23 | theshygirlfriend | woman |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+---------------+-------------------+----------------+
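To see the ',' delimiter that --fields-terminated-by actually wrote, the table's files in the Hive warehouse can be inspected directly; a small check, assuming the same warehouse location shown by the show create table output later in this post (the part file name may vary):
hdfs dfs -cat /user/hive/warehouse/hive.db/hivetest2/part-m-00000 | head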
#---(Case 5)---
# Requirement: building on Case 4, import the MySQL data into HDFS.
# data in the consumer table
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----+------------------+---------------------+
# run the import
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /sqooptxt/consumer_mysql.txt -m 1;
#---(Case 6)---
# Requirement: building on Case 5, import the MySQL data into HDFS incrementally.
# add new rows to the consumer table
insert into consumer values(null,'手机定位','woman');
insert into consumer values(null,'憨憨','woman');
insert into consumer values(null,'杰尼龟','man');
insert into consumer values(null,'杰尼龟头大','man');
mysql> select * from consumer;
+----+------------------+---------------------+
| id | name | sex |
+----+------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
| 28 | 手机定位 | woman |
| 29 | 憨憨 | woman |
| 30 | 杰尼龟 | man |
| 31 | 杰尼龟头大 | man |
+----+------------------+---------------------+
# run the incremental import
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --target-dir /input/consumer_mysql.txt -m 1 --check-column id --incremental append --last-value 20;
# result (screenshot omitted)
Extract Hive data into MySQL
# prepare the data in the MySQL consumer table
mysql> truncate table consumer;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into consumer values(24,'我不是世纪大道','woman');
mysql> insert into consumer values(1000,'杰尼龟,头大1000号','man');
mysql> select * from consumer;
+------+-------------------------+-------+
| id | name | sex |
+------+-------------------------+-------+
| 24 | 我不是世纪大道 | woman |
| 1000 | 杰尼龟,头大1000号 | man |
+------+-------------------------+--------+
# data in the Hive table
jdbc:hive2://node1:10000> select * from mysql_to_hive_consumer;
+----------------------------+------------------------------+-----------------------------+
| mysql_to_hive_consumer.id | mysql_to_hive_consumer.name | mysql_to_hive_consumer.sex |
+----------------------------+------------------------------+-----------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
+----------------------------+------------------------------+-----------------------------+
# Hive table definition
jdbc:hive2://node1:10000> show create table mysql_to_hive_consumer;
+-------------------------------------------------------------------------------+
| createtab_stmt |
+-------------------------------------------------------------------------------+
| CREATE TABLE `mysql_to_hive_consumer`( |
| `id` bigint, |
| `name` string, |
| `sex` string) |
| COMMENT 'Imported by sqoop on 2019/10/13 19:34:45' |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
|   'field.delim'='\u0001',                                                      |
| 'line.delim'='\n', |
|   'serialization.format'='\u0001')                                             |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transient_lastDdlTime'='1570979547') |
+-------------------------------------------------------------------------------+
# run the export
# the Hive table is stored with Hive's default field delimiter '\001' (see the table definition above), while sqoop export assumes ',' by default, so the input delimiter must be passed explicitly
sqoop export --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --export-dir hdfs://node1:9000/user/hive/warehouse/hive.db/mysql_to_hive_consumer --input-fields-terminated-by "\\01" --update-mode allowinsert --update-key id;
# result
mysql> select * from consumer;
+------+-------------------------+---------------------+
| id | name | sex |
+------+-------------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
| 1000 | 杰尼龟,头大1000号 | man |
+------+-------------------------+---------------------+
Scheduled incremental import from MySQL to Hive with Sqoop
# current tables in the hive database
jdbc:hive2://node1:10000> use hive;
jdbc:hive2://node1:10000> show tables;
+-------------------------+
| tab_name |
+-------------------------+
| hivetest2 |
| mysql_to_hive_consumer |
+-------------------------+
# all MySQL data before the first (full) extract
mysql> SELECT d.id,d.name,d.sex FROM hive.consumer d;
+------+-------------------------+---------------------+
| id | name | sex |
+------+-------------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
| 1000 | 杰尼龟,头大1000号 | man |
+------+-------------------------+---------------------+
# (1) First full extract, creating the Hive table
sqoop import \
--connect jdbc:mysql://node1:3306/hive \
--username root \
--password root123456 \
--direct \
--delete-target-dir \
--target-dir /user/hive/warehouse/hive.db/clocking_update_consumer \
--hive-import \
-m 2 \
--hive-database hive \
--hive-table clocking_update_consumer \
--create-hive-table \
--query 'select id,name,sex from consumer where $CONDITIONS' \
--split-by id;
# result
+------------------------------+--------------------------------+-------------------------------+
| clocking_update_consumer.id | clocking_update_consumer.name | clocking_update_consumer.sex |
+------------------------------+--------------------------------+-------------------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky | man |
| 27 | OhYesYes | woman |
| 1000 | 杰尼龟,头大1000号 | man |
+------------------------------+--------------------------------+-------------------------------+
# (2) Create the incremental-extract job
# add and modify rows in the MySQL table
mysql>insert into consumer values(999,'卡咪龟','man');
mysql>insert into consumer values(998,'木木枭','woman');
mysql>insert into consumer values(1001,'木木枭02','woman');
mysql> update consumer set name='sky_changed被改了' where id=26;
mysql> select * from consumer;
+-------+-------------------------+---------------------+
| id | name | sex |
+-------+-------------------------+---------------------+
| 20 | xiaoming迷宫 | man |
| 21 | leyan | man,workman,iconman |
| 22 | theShy | man,workman |
| 23 | theshygirlfriend | woman,man |
| 24 | 世纪大道东 | man |
| 25 | 皇后大道 | woman |
| 26 | sky_changed被改了 | man |
| 27 | OhYesYes | woman |
| 998 | 木木枭 | woman |
| 999 | 卡咪龟 | man |
| 1000 | 杰尼龟,头大1000号 | man |
| 1001 | 木木枭02 | woman |
| 10001 | 木木枭2 | man |
+-------+-------------------------+---------------------+
# create the incremental job
sqoop job --create clocking_update_consumer_job -- import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --direct --target-dir /user/hive/warehouse/hive.db/clocking_update_consumer --delete-target-dir --hive-import --hive-database hive --hive-table clocking_update_consumer --query 'select id,name,sex from consumer where $CONDITIONS' --split-by 'id' --incremental append --check-column id --last-value 1000;
# Notes:
An incremental extract needs `--incremental append`, the source column to increment on (`--check-column id`),
and the current maximum value of that key already present in the Hive table (`--last-value 1000`).
The point of creating a sqoop job is that after every run Sqoop records the new last-value itself,
so the next run picks it up automatically and it no longer has to be updated by hand.
# How do you find the last-value a sqoop job used or will use?
(1) Query the table and take the maximum value of the check column.
(2) Create a job in Sqoop with that column set as the incremental column and keep running it; the job then advances incrementally on its own.
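The "scheduled" part is then just a matter of running the saved job from cron; a minimal sketch, assuming the clocking_update_consumer_job created above and the PATH exported in /etc/profile (the wrapper path, log file, and schedule are illustrative):
#!/bin/bash
# /server/sqoop/run_clocking_update.sh: wrapper so cron gets the same environment as a login shell
source /etc/profile
sqoop job --exec clocking_update_consumer_job >> /tmp/clocking_update_consumer_job.log 2>&1

# crontab -e entry: run the incremental job every day at 01:00
0 1 * * * /bin/bash /server/sqoop/run_clocking_update.sh
Note that, depending on sqoop.metastore.client.record.password, a saved job may prompt for the database password when executed, which would hang a cron run; storing the password in the job metastore or switching to --password-file avoids the prompt.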
Troubleshooting
(1) # error caused by /server/hadoop/etc/hadoop/mapred-site.xml
[2019-10-13 08:08:46.417]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1570906081976_0002 Then click on links to logs of each attempt.
. Failing the application.
2019-10-13 08:08:13,627 INFO mapreduce.Job: Counters: 0
2019-10-13 08:08:13,634 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2019-10-13 08:08:13,636 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 4.5729 seconds (0 bytes/sec)
2019-10-13 08:08:13,642 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2019-10-13 08:08:13,643 INFO mapreduce.ImportJobBase: Retrieved 0 records.
2019-10-13 08:08:13,643 ERROR tool.ImportTool: Import failed: Import job failed!
# Solution: add the following to /server/hadoop/etc/hadoop/mapred-site.xml
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/server/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/server/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/server/hadoop</value>
</property>
(2) # Sqoop reports: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
Solution: copying hive-exec-*.jar from Hive's lib directory into Sqoop's lib directory resolves this.
(3) # fields are not split on export
Cause: the data format of the user.txt file is wrong
mysql> select * from user;
+-------------------------------------+------+------+
| uname | vnum | fnum |
+-------------------------------------+------+------+
| fnum int 朋友数 | NULL | NULL |
| vnum int 视频数 | NULL | NULL |
| 字段 属性 备注 | NULL | NULL |
| uname varchar 用户名 | NULL | NULL |
+-------------------------------------+------+------+
Solution: every field in user.txt must be followed by a tab character, including the last descriptive column (for example, after '用户名' as well)
字段 属性 备注
uname varchar 用户名
vnum int 视频数
fnum int 朋友数
(4) # MySQL -> Hive: the Hive table is created, but it contains no data
Cause: `--hive-overwrite` was used; it works best when the target table already exists. Remove the option and the data comes through.
(5)#ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.json.JSONObject.<init>(JSONObject.java:144)
Cause: Sqoop did not find HCAT_HOME.
In the sqoop/conf directory run: cp sqoop-env-template.sh sqoop-env.sh
Then edit sqoop-env.sh and add:
export HCAT_HOME=/server/hive/hcatalog
(6)#2019-10-16 17:44:47,283 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.json.JSONObject.<init>(JSONObject.java:144)
at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
at ...
Solution:
# !!! Unresolved !!! Tried adding the java-json.jar package (it did not take effect!)
Moving on.
Sqoop jobs
A job remembers the parameters used to define it, so it can be re-executed later simply by invoking it by its handle. If a saved job is configured to perform an incremental import, state about the most recently imported rows is updated in the saved job, allowing it to keep importing only the newest rows.
# save frequently used jobs so they can be re-run quickly later
--create <job-id>   create a new job
--delete <job-id>   delete a job
--exec <job-id>     execute a job
--show <job-id>     show the job's parameters
--list              list all jobs
Create a job
sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db ... --table mytable
Create a non-incremental job
# Import the consumer table into the comma-delimited Hive table hivetest2.
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-table hive.hivetest2 --hive-import -m 1 --fields-terminated-by ",";
# turn the statement above into a job
sqoop job --create hivetest2job -- import --connect jdbc:mysql://node1:3306/hive --username root --password root123456 --table consumer --hive-import --hive-table hive.hivetest2 --fields-terminated-by "," -m 1;
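Once created, the job is executed by name (a quick sketch; by default the saved job may ask for the MySQL password again when it runs):
sqoop job --exec hivetest2job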
-------------------------------------------------------------------
sqoop import --connect 'jdbc:mysql://172.17.1.103:3306/test?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull' --username root --password 123456 --table incr_test --hive-database test --hive-table test_incr_test_20171130 --hive-import -m 1 --incremental append --check-column time_update --last-value "2017-12-01";
CREATE TABLE `consumer` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`age` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`sex` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
List all jobs
sqoop job --list
Show a job
sqoop job --show myjob
Result:
Job: clocking_update_consumer_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
split.limit = null
hbase.create.table = false
mainframe.input.dataset.type = p
db.require.password = false
skip.dist.cache = false
hdfs.append.dir = false
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/054b4b106ff93dfa99cc8994f47e1524
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false
Execute a job
sqoop job --exec myjob
Override parameters
sqoop job --exec myjob -- --username someuser -P
Reflection: a problem that cannot be solved right now can be skipped; it only means the current ability is not yet enough to solve it.
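Related to the parameter override above: to keep the password off the command line entirely (it can show up in the process list), Sqoop's --password-file option can read it from a file instead; a sketch, with the HDFS path chosen here purely as an example:
# store the password once (echo -n avoids a trailing newline, which would break authentication)
echo -n 'root123456' > mysql.pwd
hdfs dfs -put mysql.pwd /user/root/mysql.pwd
hdfs dfs -chmod 400 /user/root/mysql.pwd
# then use --password-file instead of --password in imports, exports, and saved jobs
sqoop import --connect jdbc:mysql://node1:3306/hive --username root --password-file /user/root/mysql.pwd --table consumer --target-dir /sqoopout/ --delete-target-dir -m 1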
2019-10-21 18:14:27
Looking back at that sentence: after later learning DataX, I found it really was a data-type problem, solved the issue with creating the sqoop job, and came to love DataX.