08 – Hive 3.1.2 Installation

Official documentation

GettingStarted

Manual

1. Installing and Configuring Hive

(1) Upload the Hive tarball apache-hive-3.1.2-bin.tar.gz to the /opt/download directory of the Linux VM and extract it

$ wget https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
$ tar -zxvf apache-hive-3.1.2-bin.tar.gz

(2) Move the extracted directory to /opt/pkg and rename it to hive

$ mv apache-hive-3.1.2-bin /opt/pkg/hive

(3) Edit the /etc/profile.d/hadoop.env.sh file and add the environment variables.

$ sudo vim /etc/profile.d/hadoop.env.sh 

Add the following.

# HIVE_HOME 3.1.2
export HIVE_HOME=/opt/pkg/hive
export PATH=$PATH:$HIVE_HOME/bin

Run the following command to make the variables take effect.

$ source /etc/profile.d/hadoop.env.sh
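
A quick sanity check that the variables took effect (paths assume the layout above):

$ echo $HIVE_HOME
/opt/pkg/hive
$ which hive
/opt/pkg/hive/bin/hive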

(4) Change into /opt/pkg/hive/lib and run the following commands to resolve the logging jar conflict.

$ cd /opt/pkg/hive/lib
$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak

(5) Fix the guava jar being older than the one Hadoop ships

$ cd /opt/pkg/hive/lib
$ mv guava-19.0.jar guava-19.0.jar.bak
$ ln -s /opt/pkg/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./

Delete or rename the lower-version guava jar under lib.

Copy or symlink the guava jar over from the Hadoop directory.

If you skip this step, initializing the MySQL metastore will fail.
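
A quick way to confirm the swap worked (the listing assumes the versions used in this guide):

$ ls -l /opt/pkg/hive/lib | grep guava
... guava-19.0.jar.bak
... guava-27.0-jre.jar -> /opt/pkg/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar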

2. Copy the JDBC Driver

(1) Copy the mysql-connector-java-5.1.27.jar driver to Hive's lib directory.

$ cp mysql-connector-java-5.1.27.jar /opt/pkg/hive/lib/
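
You can verify the driver is in place before moving on:

$ ls /opt/pkg/hive/lib | grep mysql-connector
mysql-connector-java-5.1.27.jar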

3. Configuration

(1) Create a hive-site.xml file in the /opt/pkg/hive/conf directory

$ touch hive-site.xml 
$ vim hive-site.xml 

(2) Configure it as follows


[hadoop@hadoop100 ~]$ cat /opt/pkg/hive/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <!-- JDBC connection URL for the metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop100:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <!-- JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <!-- Metastore database username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>

  <!-- Metastore database password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive1234</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop100:9083</value>
  </property>

  <!-- HiveServer2 thrift port and bind host -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>

  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop100</value>
  </property>

  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>

</configuration>

See the official documentation for details on these properties.

4. Initialize the Metastore Database

(1) Log in to MySQL.

[hadoop@hadoop100 ~]$ mysql -hhadoop100 -P3306 -uhive -phive1234

(2) Create the Hive metastore database (metastore, matching the JDBC URL above).

mysql> create database metastore;
mysql> quit;

(3) Initialize the Hive metastore schema.

$ schematool -initSchema -dbType mysql -verbose 
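
If initialization succeeded, the metastore database now holds Hive's schema tables. A quick check (VERSION is part of the standard metastore schema; Hive 3.1.2 should report schema version 3.1.0):

$ mysql -hhadoop100 -P3306 -uhive -phive1234 -e 'select SCHEMA_VERSION from metastore.VERSION;'
+----------------+
| SCHEMA_VERSION |
+----------------+
| 3.1.0          |
+----------------+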

5. Start Hive

(1) On Hive 2.x and later, the Metastore and HiveServer2 services must be started first; otherwise Hive fails with the error below.

FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient 

(2) Create a Hive service management script under /opt/pkg/hive/bin that starts the Metastore and HiveServer2 services.

[hadoop@hadoop100 bin]$ vi /opt/pkg/hive/bin/hive-services.sh

#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs

mkdir -p $HIVE_LOG_DIR

# Check whether a process is running: $1 = process name, $2 = port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastroe服务已启动"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2服务已启动"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore服务未启动"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2服务未启动"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service running normally" || echo "Metastore service not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service running normally" || echo "HiveServer2 service not running properly"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac

(3) Make the script executable.

$ sudo chmod +x hive-services.sh 

(4) Start the Hive background services.

$ hive-services.sh start 

(5) Check that the Hive background services are running.

You may need to retry a few times: it takes roughly a minute from launching the services until the processes appear. (Or poll in a loop, as sketched below.)

$ hive-services.sh status 
Metastore service running normally
HiveServer2 service running normally
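
If you would rather not retry by hand, a small polling loop does the same thing (a sketch; tune the sleep interval to taste):

$ until hive-services.sh status | grep -q 'HiveServer2 service running normally'; do sleep 10; done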

(6) Start the Hive client.

[hadoop@hadoop100 bin]$ hive
which: no hbase in (/opt/rh/llvm-toolset-7.0/root/usr/bin:/opt/pkg/hive/bin:/opt/pkg/flume/bin:/opt/pkg/kafka/bin:/opt/pkg/zookeeper/bin:/opt/pkg/hadoop/bin:/opt/pkg/hadoop/sbin:/opt/pkg/maven/bin:/sbin:/opt/pkg/java/bin:/opt/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin)
Hive Session ID = 2b898c67-6677-4e98-8e23-0d70d358d206

Logging initialized using configuration in jar:file:/opt/pkg/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 8aa334ad-253c-4974-bdb2-971b22bd3afb
hive (default)>

6. Logs

hive and beeline write their logs under /tmp/<username> by default.

If you want to watch the logs in real time while running commands, you can also create conf/hive-log4j2.properties and conf/beeline-log4j2.properties (templates are provided); see the sketch after this list.

  • The log location can be changed:

    • property.hive.log.dir = /opt/pkg/hive/logs/
      property.hive.log.file = hive.log
  • The default log level is info, which makes the hive client print a lot of unnecessary output

    • However, changing the level had no effect in my testing
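
For example, to create both files from the bundled templates and point the hive log at the directory above (a sketch; the .template files ship in Hive's conf directory):

$ cd /opt/pkg/hive/conf
$ cp hive-log4j2.properties.template hive-log4j2.properties
$ cp beeline-log4j2.properties.template beeline-log4j2.properties
$ sed -i 's|^property.hive.log.dir.*|property.hive.log.dir = /opt/pkg/hive/logs/|' hive-log4j2.properties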

The HiveServer2 server log can be followed with

$ tail -F /opt/pkg/hive/logs/hiveServer2.log

7. Beeline Usage

Beeline will replace the Hive CLI as the Hive client going forward, and later releases will deprecate the Hive CLI. Introduced in Hive 0.11, Beeline is a JDBC client based on the SQLLine CLI.

Connecting to hiveserver2 with beeline

beeline -u "jdbc:hive2://localhost:10000"  -nhadoop -p123123
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive
0: jdbc:hive2://localhost:10000>
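
For a non-interactive smoke test you can also pass a single statement with -e:

$ beeline -u "jdbc:hive2://localhost:10000" -n hadoop -p 123123 -e "show databases;"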

Running a script file, similar to the hive CLI (a hypothetical sketch of the .hql file follows the command)

nohup beeline -u jdbc:hive2://127.0.0.1:10000  -n hadoop -p 123123  --color=true --silent=false  \
--hivevar p_date=${partitionDate} --hivevar f_date=${fileLocDate}  \
-f hdfs_add_partition_dmp_clearlog.hql  >> $logdir/load_${curDate}.log
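
The .hql file itself is not shown in the original; a hypothetical sketch of what such a script could contain, consuming the two hivevars passed above (the table name and HDFS layout are invented for illustration):

$ cat > hdfs_add_partition_dmp_clearlog.hql <<'EOF'
-- hypothetical table/partition layout
ALTER TABLE dmp_clearlog ADD IF NOT EXISTS
  PARTITION (dt='${hivevar:p_date}')
  LOCATION '/data/clearlog/${hivevar:f_date}';
EOF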

Operations supported by beeline

Type !help at the beeline prompt

!help

output:

!addlocaldriverjar  Add driver jar file in the beeline client side.
!addlocaldrivername Add driver name that needs to be supported in the beeline
                    client side.
!all                Execute the specified SQL against all the current connections
!autocommit         Set autocommit mode on or off
!batch              Start or execute a batch of statements
!brief              Set verbose mode off
!call               Execute a callable statement
!close              Close the current connection to the database
!closeall           Close all current open connections
!columns            List all the columns for the specified table
!commit             Commit the current transaction (if autocommit is off)
!connect            Open a new connection to the database.
!dbinfo             Give metadata information about the database
!describe           Describe a table
!dropall            Drop all tables in the current database
!exportedkeys       List all the exported keys for the specified table
!go                 Select the current connection
!help               Print a summary of command usage
!history            Display the command history
!importedkeys       List all the imported keys for the specified table
!indexes            List all the indexes for the specified table
!isolation          Set the transaction isolation for this connection
!list               List the current connections
!manual             Display the BeeLine manual
!metadata           Obtain metadata information
!nativesql          Show the native SQL for the specified statement
!nullemptystring    Set to true to get historic behavior of printing null as
                    empty string. Default is false.
!outputformat       Set the output format for displaying results
                    (table,vertical,csv2,dsv,tsv2,xmlattrs,xmlelements, and
                    deprecated formats(csv, tsv))
!primarykeys        List all the primary keys for the specified table
!procedures         List all the procedures
!properties         Connect to the database specified in the properties file(s)
!quit               Exits the program
!reconnect          Reconnect to the database
!record             Record all output to the specified file
!rehash             Fetch table and column names for command completion
!rollback           Roll back the current transaction (if autocommit is off)
!run                Run a script from the specified file
!save               Save the current variables and aliases
!scan               Scan for installed JDBC drivers
!script             Start saving a script to a file
!set                Set a beeline variable
!sh                 Execute a shell command
!sql                Execute a SQL command
!tables             List all the tables in the database
!typeinfo           Display the type map for the current connection
!verbose            Set verbose mode on

A few frequently used commands

  • !connect url – connect to a different HiveServer2 server
  • !exit – quit the shell
  • !help – list all commands
  • !verbose – show extra detail for queries

8. Troubleshooting

hiveserver2 usually takes a while to start, up to about five minutes. If it still has not come up after a long wait, check the logs for the cause of the failure.

hiveserver2 startup error 1

Starting hiveserver2 fails with "Could not open client transport with JDBC Uri"; the fix follows.

Error message:

Error: Could not open client transport with JDBC Uri: jdbc:hive2://node1:10000/hive_metadata;user=hadoop: java.net.ConnectException: 拒绝连接 (Connection refused) (state=08S01,code=0)
Beeline version 2.3.3 by Apache Hive

Cause: hiveserver2 enforces access control, so a proxy user must be configured in Hadoop's configuration files

Solution

Stop dfs and yarn

$ stop-dfs.sh
$ stop-yarn.sh

Add the following to Hadoop's core-site.xml (here hadoop is the user and group allowed to act as a proxy user)

<property>
   <name>hadoop.proxyuser.hadoop.hosts</name>
   <value>*</value>
</property>
<property>
   <name>hadoop.proxyuser.hadoop.groups</name>
   <value>*</value>
</property>

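As an aside: if only these proxy-user entries changed, recent Hadoop versions can usually reload them without a full restart (fall back to the restart below if this has no effect):

$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
$ yarn rmadmin -refreshSuperUserGroupsConfiguration
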
Restart dfs

$ start-dfs.sh

Restart yarn

$ start-yarn.sh

Reconnect to Hive

beeline -u jdbc:hive2://hadoop100:10000 -n hadoop -p

hiveserver2 startup error 2

If the log contains the following exception:

Safe mode is ON. The reported blocks 3 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 5. The number of live datanodes 2 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.

This means the fraction of blocks reported is below the threshold configured in HDFS (dfs.namenode.safemode.threshold-pct); at 0.9990, effectively no block may be missing. Lowering the threshold (e.g. to 0.99) makes safe mode less likely to trigger.
When DataNodes or blocks go missing beyond the configured percentage, for example after a power failure or running out of memory, the system enters safe mode automatically.
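
You can confirm the NameNode's state first:

$ hdfs dfsadmin -safemode get
Safe mode is ON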

Fix

Force the NameNode out of safe mode:

$ hdfs dfsadmin -safemode leave

Run a filesystem check and delete the corrupt blocks:

$ hdfs fsck / -delete

If the run ends with the line below, the repair is complete

The filesystem under path '/' is HEALTHY

If it is not repaired, run the hdfs fsck command again.

Cannot load HBase tables from beeline

Switching to bin/hive makes the load work; possibly a bug.
