1. Download and Install

- Download the release tarball:

```bash
wget http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
```

- Upload sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz to the /tools directory.
- Extract and rename:

```bash
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/pkg/
cd /opt/pkg/
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
```

- Configure environment variables (SQOOP_HOME must point to the renamed directory):

```bash
#sqoop
export SQOOP_HOME=/opt/pkg/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
```
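A quick sanity check once the exports take effect; this sketch assumes they were appended to /etc/profile (adjust for ~/.bashrc or whichever profile file you actually used):

```bash
# Assumes the exports above were added to /etc/profile
source /etc/profile

# The variable should resolve and the sqoop launcher should be on PATH
echo $SQOOP_HOME
which sqoop
```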
2. Configuration

- Configure Sqoop's environment

Set every environment variable that Sqoop may need.

Edit sqoop-env.sh:

```bash
$ mv conf/sqoop-env-template.sh conf/sqoop-env.sh
$ vi conf/sqoop-env.sh

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/pkg/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/pkg/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/opt/pkg/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/pkg/hive

#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/opt/pkg/zookeeper/conf
```
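Optionally, a small check that the paths configured above actually exist on this node; the directory names are the install locations assumed throughout this guide:

```bash
# Verify the component homes referenced in sqoop-env.sh
for d in /opt/pkg/hadoop /opt/pkg/hbase /opt/pkg/hive /opt/pkg/zookeeper/conf; do
  [ -d "$d" ] && echo "OK: $d" || echo "MISSING: $d"
done
```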
- Edit sqoop-site.xml

The full configuration is shown below:
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>true</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:file:/tmp/sqoop-meta/meta.db;shutdown=true</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
      If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

</configuration>
```
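These properties mainly matter if you want to share saved jobs through the Sqoop metastore. A minimal sketch of how the server port configured above would be used; the metastore host (hadoop100), the MySQL connection details, and the job/table names are assumptions, not part of the original steps:

```bash
# Start the shared metastore (listens on sqoop.metastore.server.port, 16000 above)
sqoop metastore &

# Create a saved import job in that metastore (connection details are placeholders)
sqoop job --meta-connect jdbc:hsqldb:hsql://hadoop100:16000/sqoop \
  --create my_import_job -- import \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root -P \
  --table t_user --target-dir /user/hadoop/t_user -m 1

# List the jobs stored in the shared metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://hadoop100:16000/sqoop --list
```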
- Edit configure-sqoop

```bash
vi bin/configure-sqoop
```
Comment out the checks for HCAT_HOME and ACCUMULO_HOME.
```bash
[hadoop@hadoop100 sqoop-1.4.7]$ vi bin/configure-sqoop

 82 #if [ -z "${HCAT_HOME}" ]; then
 83 #  if [ -d "/usr/lib/hive-hcatalog" ]; then
 84 #    HCAT_HOME=/usr/lib/hive-hcatalog
 85 #  elif [ -d "/usr/lib/hcatalog" ]; then
 86 #    HCAT_HOME=/usr/lib/hcatalog
 87 #  else
 88 #    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
 89 #    if [ ! -d ${HCAT_HOME} ]; then
 90 #      HCAT_HOME=${SQOOP_HOME}/../hcatalog
 91 #    fi
 92 #  fi
 93 #fi
 94 #if [ -z "${ACCUMULO_HOME}" ]; then
 95 #  if [ -d "/usr/lib/accumulo" ]; then
 96 #    ACCUMULO_HOME=/usr/lib/accumulo
 97 #  else
 98 #    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
 99 #  fi
100 #fi

134 ## Moved to be a runtime check in sqoop.
135 #if [ ! -d "${HCAT_HOME}" ]; then
136 #  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
137 #  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
138 #fi
139
140 #if [ ! -d "${ACCUMULO_HOME}" ]; then
141 #  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
142 #  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
143 #fi
```
The purpose of this is to suppress the warnings shown below at runtime; strictly speaking, you can also leave the lines uncommented:
```
Warning: /opt/module/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
```
- Add the required JAR files

Upload the MySQL JDBC driver (use a 5.x driver, not a newer major version) into the lib directory of the Sqoop installation:

```bash
$ cp mysql-connector-java-5.1.44-bin.jar /opt/pkg/sqoop-1.4.7/lib/
```
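Once the driver is in place, a quick way to confirm Sqoop can reach MySQL is to list databases. The host, port, and credentials below are assumptions; adjust them to your environment:

```bash
# Prints the databases visible to this user if the JDBC driver loads correctly
sqoop list-databases \
  --connect jdbc:mysql://hadoop100:3306 \
  --username root -P
```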
Copy or symlink $HIVE_HOME/lib/hive-common-3.1.2.jar into $SQOOP_HOME/lib:

```bash
$ ln -s /opt/pkg/hive/lib/hive-common-3.1.2.jar /opt/pkg/sqoop-1.4.7/lib/
```
If you need to parse JSON, download java-json.jar and place it in the lib directory under the Sqoop installation:

```bash
$ cp java-json.jar /opt/pkg/sqoop-1.4.7/lib/
```
3. Verification

Check the Sqoop version:
```
[hadoop@hadoop100 zookeeper-3.4.11]$ sqoop-version
21/01/11 00:48:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
```
4. Troubleshooting
- Error when importing into a Hive table

```
Could not load org.apache.hadoop.hive.conf.HiveConf
```
Solution:

Copy lib/hive-common-x.x.x.jar out of the Hive installation directory and place it in the lib folder of the Sqoop installation.

Note: x.x.x is not the literal JAR name; it stands for the version number. For example, my Hive version is 2.3.3, so the JAR is named hive-common-2.3.3.jar.
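For context, a typical Hive import that exercises this code path might look like the sketch below; the connection string, table, and Hive table names are placeholders, not from the original article:

```bash
# A representative Hive import; it fails with "Could not load ... HiveConf"
# when hive-common is missing from $SQOOP_HOME/lib
sqoop import \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root -P \
  --table t_user \
  --hive-import \
  --hive-table t_user \
  -m 1
```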
- Error during a query import

```
INFO mapreduce.ImportJobBase: Beginning query import.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
```
The error indicates that the java-json JAR is missing. Download it from:

http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm

Copy the JAR into $SQOOP_HOME/lib, then rerun the Sqoop job.
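For reference, a free-form query import of the kind that triggers this error looks roughly like the sketch below; the connection string, query, and directories are placeholders:

```bash
# A free-form query import; note the mandatory $CONDITIONS token
# and --split-by (or -m 1) required when using --query
sqoop import \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root -P \
  --query 'SELECT id, name FROM t_user WHERE $CONDITIONS' \
  --target-dir /user/hadoop/t_user_query \
  --split-by id
```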