Official manual
1. Installing and configuring Hive
(1) Upload the Hive installation package apache-hive-3.1.2-bin.tar.gz to the /opt/download directory of the Linux virtual machine (or download it directly, as below) and extract it.
```bash
$ wget https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
$ tar -zxvf apache-hive-3.1.2-bin.tar.gz
```
(2) Move the extracted directory to /opt/pkg and rename it to hive.
```bash
$ mv apache-hive-3.1.2-bin /opt/pkg/hive
```
(3) Edit the /etc/profile.d/hadoop.env.sh file and add the environment variables.
```bash
$ sudo vim /etc/profile.d/hadoop.env.sh
```
Add the following lines.
```bash
# HIVE_HOME 3.1.2
export HIVE_HOME=/opt/pkg/hive
export PATH=$PATH:$HIVE_HOME/bin
```
Run the following command to make the environment variables take effect.
```bash
$ source /etc/profile.d/hadoop.env.sh
```
(4) Go to the /opt/pkg/hive/lib directory and run the following commands to resolve the logging jar conflict.
```bash
$ cd /opt/pkg/hive/lib
$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
```
(5) Fix the problem of Hive shipping a guava version lower than Hadoop's.
```bash
$ cd /opt/pkg/hive/lib
$ mv guava-19.0.jar guava-19.0.jar.bak
$ ln -s /opt/pkg/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
```
Rename (or delete) the lower-version guava jar under lib, then copy or symlink the guava jar from the Hadoop installation. If this is not done, initializing the MySQL metastore schema will fail.
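To verify the result (a quick sanity check, not one of the original steps):

```bash
$ ls -l /opt/pkg/hive/lib | grep guava
# expect guava-19.0.jar.bak plus a symlink pointing at guava-27.0-jre.jar from the Hadoop installation
```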
2. Copying the MySQL JDBC driver
(1) Copy the mysql-connector-java-5.1.27.jar driver to Hive's lib directory.
```bash
$ cp mysql-connector-java-5.1.27.jar /opt/pkg/hive/lib/
```
3. Configuration
(1) Create a hive-site.xml file in the /opt/pkg/hive/conf directory.
```bash
$ touch hive-site.xml
$ vim hive-site.xml
```
(2) Use the following configuration.
```xml
[hadoop@hadoop100 ~]$ cat /opt/pkg/hive/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- MySQL JDBC URL where Hive stores its metadata -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop100:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <!-- Fully qualified driver class Hive uses to connect to MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <!-- Username Hive uses to connect to MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
    </property>
    <!-- Password Hive uses to connect to MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive1234</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- Thrift URI(s) of the Hive metastore; more than one can be configured -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop100:9083</value>
    </property>
    <!-- Thrift port of HiveServer2 -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop100</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
```
4. Initializing the metastore database
(1) Log in to MySQL.
```bash
[hadoop@hadoop100 ~]$ mysql -hhadoop100 -P3306 -uhive -phive1234
```
(2) Create the Hive metastore database (named metastore, matching the JDBC URL in hive-site.xml).
```sql
mysql> create database metastore;
mysql> quit;
```
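The commands above assume the MySQL account hive/hive1234 referenced in hive-site.xml already exists. If it does not, a minimal sketch of creating it as the MySQL root user (the '%' host pattern is an assumption; tighten it for a real cluster):

```bash
$ mysql -uroot -p -e "
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive1234';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;"
```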
(3) Initialize the Hive metastore schema.
```bash
$ schematool -initSchema -dbType mysql -verbose
```
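If the initialization succeeds, the schema can be verified afterwards (an optional check, not part of the original steps):

```bash
$ schematool -dbType mysql -info
# prints the metastore connection URL, the user, and the installed schema version
```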
5. Starting Hive
(1) For Hive 2.x and later, the Metastore and HiveServer2 services must be started first; otherwise the client fails with an error such as:
```
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
```
(2) Create a Hive service start script in /opt/pkg/hive/bin that starts the Metastore and HiveServer2 services.
```bash
[hadoop@hadoop100 bin]$ vi /opt/pkg/hive/bin/hive-services.sh

#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
mkdir -p $HIVE_LOG_DIR

# Check whether a service is running properly; arg 1 = process name, arg 2 = listening port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running properly"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
```
(3) Make the script executable.
```bash
$ sudo chmod +x hive-services.sh
```
(4) Start the Hive background services.
```bash
$ hive-services.sh start
```
(5) Check the status of the Hive background services.
You may need to retry a few times: it takes roughly a minute after launching before the processes actually show up.
```bash
$ hive-services.sh status
Metastore service is running normally
HiveServer2 service is running normally
```
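Another way to confirm both services are up (not part of the original steps) is to check the listening ports the script itself uses:

```bash
$ netstat -nltp | grep -E '9083|10000'
# 9083 = metastore thrift port, 10000 = HiveServer2 thrift port
```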
(6) Start the Hive client.
```
[hadoop@hadoop100 bin]$ hive
which: no hbase in (/opt/rh/llvm-toolset-7.0/root/usr/bin:/opt/pkg/hive/bin:/opt/pkg/flume/bin:/opt/pkg/kafka/bin:/opt/pkg/zookeeper/bin:/opt/pkg/hadoop/bin:/opt/pkg/hadoop/sbin:/opt/pkg/maven/bin:/sbin:/opt/pkg/java/bin:/opt/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin)
Hive Session ID = 2b898c67-6677-4e98-8e23-0d70d358d206

Logging initialized using configuration in jar:file:/opt/pkg/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 8aa334ad-253c-4974-bdb2-971b22bd3afb
hive (default)>
```
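As a quick smoke test of the client and metastore (an optional step):

```bash
$ hive -e "show databases;"
# should list at least the default database
```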
6. Logging
By default, the hive and beeline tools write their logs under /tmp/<username>. To control where the logs go (and to see them while commands run), you can create conf/hive-log4j2.properties and conf/beeline-log4j2.properties from the templates shipped with Hive (see the sketch after the list below).
- The log location can be changed:
  - property.hive.log.dir = /opt/pkg/hive/logs/
  - property.hive.log.file = hive.log
- The default log level is INFO, which makes the Hive client print a lot of unnecessary output.
  - Changing the level, however, did not take effect in my tests.
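A minimal sketch of the log configuration, assuming the *.template files that ship under conf/ in Hive 3.1.2:

```bash
$ cd /opt/pkg/hive/conf
$ cp hive-log4j2.properties.template hive-log4j2.properties
$ cp beeline-log4j2.properties.template beeline-log4j2.properties
# then edit hive-log4j2.properties, e.g.:
#   property.hive.log.dir = /opt/pkg/hive/logs/
#   property.hive.log.file = hive.log
```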
The HiveServer2 log written by the start script above can be followed with:
```bash
$ tail -F /opt/pkg/hive/logs/hiveServer2.log
```
7. Beeline usage examples
Going forward, Beeline replaces HiveCLI as the Hive client tool, and HiveCLI will be deprecated in later releases. Beeline, introduced in Hive 0.11, is a JDBC client based on the SQLLine CLI.
Connecting to HiveServer2 with beeline
```
beeline -u "jdbc:hive2://localhost:10000" -nhadoop -p123123
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive
0: jdbc:hive2://localhost:10000>
```
Running script files, similar to the HiveCLI script-execution feature
```bash
nohup beeline -u jdbc:hive2://127.0.0.1:10000 -n hadoop -p 123123 --color=true --silent=false \
  --hivevar p_date=${partitionDate} --hivevar f_date=${fileLocDate} \
  -f hdfs_add_partition_dmp_clearlog.hql >> $logdir/load_${curDate}.log
```
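For reference, the script passed via -f picks up those values through ${hivevar:...} substitution. A hypothetical sketch of what hdfs_add_partition_dmp_clearlog.hql might contain (table name, partition column, and location are made up for illustration):

```bash
$ cat > hdfs_add_partition_dmp_clearlog.hql <<'EOF'
ALTER TABLE dmp_clearlog ADD IF NOT EXISTS
  PARTITION (p_date = '${hivevar:p_date}')
  LOCATION '/data/dmp_clearlog/${hivevar:f_date}';
EOF
```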
Commands supported by beeline
Type the !help command at the beeline prompt.
```
!help
```
output:
```
!addlocaldriverjar  Add driver jar file in the beeline client side.
!addlocaldrivername Add driver name that needs to be supported in the beeline client side.
!all                Execute the specified SQL against all the current connections
!autocommit         Set autocommit mode on or off
!batch              Start or execute a batch of statements
!brief              Set verbose mode off
!call               Execute a callable statement
!close              Close the current connection to the database
!closeall           Close all current open connections
!columns            List all the columns for the specified table
!commit             Commit the current transaction (if autocommit is off)
!connect            Open a new connection to the database.
!dbinfo             Give metadata information about the database
!describe           Describe a table
!dropall            Drop all tables in the current database
!exportedkeys       List all the exported keys for the specified table
!go                 Select the current connection
!help               Print a summary of command usage
!history            Display the command history
!importedkeys       List all the imported keys for the specified table
!indexes            List all the indexes for the specified table
!isolation          Set the transaction isolation for this connection
!list               List the current connections
!manual             Display the BeeLine manual
!metadata           Obtain metadata information
!nativesql          Show the native SQL for the specified statement
!nullemptystring    Set to true to get historic behavior of printing null as empty string. Default is false.
!outputformat       Set the output format for displaying results (table,vertical,csv2,dsv,tsv2,xmlattrs,xmlelements, and deprecated formats(csv, tsv))
!primarykeys        List all the primary keys for the specified table
!procedures         List all the procedures
!properties         Connect to the database specified in the properties file(s)
!quit               Exits the program
!reconnect          Reconnect to the database
!record             Record all output to the specified file
!rehash             Fetch table and column names for command completion
!rollback           Roll back the current transaction (if autocommit is off)
!run                Run a script from the specified file
!save               Save the current variabes and aliases
!scan               Scan for installed JDBC drivers
!script             Start saving a script to a file
!set                Set a beeline variable
!sh                 Execute a shell command
!sql                Execute a SQL command
!tables             List all the tables in the database
!typeinfo           Display the type map for the current connection
!verbose            Set verbose mode on
```
A few commonly used commands
- !connect url: connect to a different HiveServer2 instance
- !exit: exit the shell
- !help: show the full command list
- !verbose: show extra detail for queries
8. Troubleshooting
HiveServer2 generally takes a while to start, up to about five minutes. If it still has not come up after that, check the logs to find the cause of the failure.
Error 1 when starting hiveserver2
When connecting to hiveserver2, Hive reports: Could not open client transport with JDBC Uri.
Error message:
```
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node1:10000/hive_metadata;user=hadoop: java.net.ConnectException: Connection refused (state=08S01,code=0)
Beeline version 2.3.3 by Apache Hive
```
Cause: HiveServer2 enforces access control, so the user it runs as must be configured as a proxy user in Hadoop's configuration files.
Solution
Stop DFS and YARN.
```bash
$ stop-dfs.sh
$ stop-yarn.sh
```
Add the following to Hadoop's core-site.xml (here hadoop is the user and group allowed to act as a proxy user):
```xml
<property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
```
Restart DFS.
```bash
$ start-dfs.sh
```
Restart YARN.
```bash
$ start-yarn.sh
```
Reconnect to Hive.
```bash
beeline -u jdbc:hive2://hadoop100:10000 -n hadoop -p
```
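As an aside (not in the original write-up), if the cluster is already running, the proxy-user settings can usually be reloaded without a full HDFS/YARN restart:

```bash
$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
$ yarn rmadmin -refreshSuperUserGroupsConfiguration
```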
Error 2 when starting hiveserver2
If the following message appears in the log:
```
Safe mode is ON. The reported blocks 3 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 5. The number of live datanodes 2 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
```
This means the proportion of missing or corrupt blocks exceeds the threshold configured in HDFS (0.999, which effectively means not a single block may be lost); with a threshold of around 0.99 this case would not have triggered safe mode. When DataNodes lose more blocks than the configured percentage allows (for example after a power failure or running out of memory), the system automatically stays in safe mode.
Fix
Leave safe mode manually:
```bash
$ hdfs dfsadmin -safemode leave
```
Run a health check and delete the corrupt blocks:
```bash
$ hdfs fsck / -delete
```
If the following appears afterwards, the repair is complete:
```
The filesystem under path '/' is HEALTHY
```
If it does not get repaired, run the hdfs fsck command once more.
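To check whether the NameNode is currently in safe mode at any point (a small addition, not in the original steps):

```bash
$ hdfs dfsadmin -safemode get
# prints "Safe mode is ON" or "Safe mode is OFF"
```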
Importing HBase tables does not work from beeline. Switching to bin/hive makes the import work; this may be a bug.