09 – SQOOP安装配置

1、下载和安装

  1. 下载 http://archive.apache.org/dist/sqoop/1.4.7/
wget http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  1. 上传sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz到/tools目录下

  2. 解压安装, 改名

tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/pkg/
cd /opt/pkg/
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7
  1. 配置环境变量
#sqoop
export SQOOP_HOME=/training/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin

2、配置

  1. 配置sqoop的环境和配置

​ 把Sqoop可能使用的环境变量都配置上

​ 修改sqoop-env.sh:

$ mv conf/sqoop-env-template.sh conf/sqoop-env.sh
$ vi conf/sqoop-env.sh 

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/pkg/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/pkg/hadoop

#set the path to where bin/hbase is available
export HBASE_HOME=/opt/pkg/hbase

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/pkg/hive

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/opt/pkg/zookeeper/conf
  1. 修改sqoop-site.xml

​ 具体配置如下文件所示:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>true</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:file:/tmp/sqoop-meta/meta.db;shutdown=true</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

</configuration>
  1. 修改configure-sqoop
vi bin/configure-sqoop

​ 将出现HCAT_HOME和ACCUMULO_HOME的判断逻辑注释掉.

[hadoop@hadoop100 sqoop-1.4.7]$ vi bin/configure-sqoop

 82 #if [ -z "${HCAT_HOME}" ]; then
 83 #  if [ -d "/usr/lib/hive-hcatalog" ]; then
 84 #    HCAT_HOME=/usr/lib/hive-hcatalog
 85 #  elif [ -d "/usr/lib/hcatalog" ]; then
 86 #    HCAT_HOME=/usr/lib/hcatalog
 87 #  else
 88 #    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
 89 #    if [ ! -d ${HCAT_HOME} ]; then
 90 #       HCAT_HOME=${SQOOP_HOME}/../hcatalog
 91 #    fi
 92 #  fi
 93 #fi
 94 #if [ -z "${ACCUMULO_HOME}" ]; then
 95 #  if [ -d "/usr/lib/accumulo" ]; then
 96 #    ACCUMULO_HOME=/usr/lib/accumulo
 97 #  else
 98 #    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
 99 #  fi
100 #fi

134 ## Moved to be a runtime check in sqoop.
135 #if [ ! -d "${HCAT_HOME}" ]; then
136 #  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
137 #  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
138 #fi
139 
140 #if [ ! -d "${ACCUMULO_HOME}" ]; then
141 #  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
142 #  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
143 #fi

这样做的目的是避免运行时出现下面的警告信息,其实也可以不用注释掉:

Warning: /opt/module/sqoop/bin/…/…/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/module/sqoop/bin/…/…/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  1. 添加所需的Jar包

​ 将MySQL的驱动(使用5.x版本,不要使用高版本的)上传到sqoop安装目录下的lib目录下

$ cp mysql-connector-java-5.1.44-bin.jar /opt/pkg/sqoop-1.4.7/lib/

​ 将$HIVE_HOME/lib/hive-common-3.1.2.jar拷贝或者软链接到$SQOOP_HOME/lib

$ ln -s /opt/pkg/hive/lib/hive-common-3.1.2.jar /opt/pkg/hbase/lib

​ 如果需要解析json,可下载java-json.har,放到sqoop目录下的lib里

$ cp java-json.jar /opt/pkg/sqoop-1.4.7/lib/

3、验证

sqoop version

[hadoop@hadoop100 zookeeper-3.4.11]$ sqoop-version 
21/01/11 00:48:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Sqoop 1.4.7
git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
Compiled by maugli on Thu Dec 21 15:59:58 STD 2017

4.异常处理

  1. sqoop导入hive表报错解决
Could not load org.apache.hadoop.hive.conf.HiveConf

​ 解决方法:

​ 只需把hive安装文件夹里的/lib/hive-common-x.x.x.jar复制出来放在sqoop安装文件夹的lib文件夹里就行的了

​ 注:上面的x.x.x意思不是这个jar包真的叫hive-common-x.x.x.jar,是表示版本号的,比如说我的hive版本是2.3.3的,所以我这个jar包名是hive-common-2.3.3.jar的。

  1. 查询导入报错
INFO mapreduce.ImportJobBase: Beginning query import.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject

根据错误定位导缺少java-json jar包

然后去下载这个jar

http://www.java2s.com/Code/Jar/j/Downloadjavajsonjar.htm

把jar包拷贝到SQOOP_HOME/lib下面,然后重新运行SQOOP

Views: 36