Official manual
1. Installing and configuring Hive
(1) Upload the Hive installation package apache-hive-3.1.2-bin.tar.gz to the /opt/download directory of the Linux virtual machine (or download it directly, as below) and extract it.
```bash
$ wget https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
$ tar -zxvf apache-hive-3.1.2-bin.tar.gz
```
(2) Move the extracted directory to /opt/pkg and rename it to hive.
```bash
$ mv apache-hive-3.1.2-bin /opt/pkg/hive
```
(3) Edit the /etc/profile.d/hadoop.env.sh file and add the environment variables.
```bash
$ sudo vim /etc/profile.d/hadoop.env.sh
```
Add the following lines.
```bash
# HIVE_HOME 3.1.2
export HIVE_HOME=/opt/pkg/hive
export PATH=$PATH:$HIVE_HOME/bin
```
Run the following command to make the environment variables take effect.
```bash
$ source /etc/profile.d/hadoop.env.sh
```
(4) Go to the /opt/pkg/hive/lib directory and run the following commands to resolve the logging jar conflict.
```bash
$ cd /opt/pkg/hive/lib
$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
```
(5) Fix the problem of Hive shipping a guava version lower than Hadoop's.
```bash
$ cd /opt/pkg/hive/lib
$ mv guava-19.0.jar guava-19.0.jar.bak
$ ln -s /opt/pkg/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
```
Rename (or delete) the lower-version guava jar under lib, then copy or symlink the guava jar from the Hadoop installation. If this is not done, initializing the MySQL metastore schema will fail.
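To verify the result (a quick sanity check, not one of the original steps):

```bash
$ ls -l /opt/pkg/hive/lib | grep guava
# expect guava-19.0.jar.bak plus a symlink pointing at guava-27.0-jre.jar from the Hadoop installation
```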
2. Copying the MySQL JDBC driver
(1) Copy the mysql-connector-java-5.1.27.jar driver to Hive's lib directory.
```bash
$ cp mysql-connector-java-5.1.27.jar /opt/pkg/hive/lib/
```
3. Configuration
(1) Create a hive-site.xml file in the /opt/pkg/hive/conf directory.
```bash
$ touch hive-site.xml
$ vim hive-site.xml
```
(2) Use the following configuration.
```xml
[hadoop@hadoop100 ~]$ cat /opt/pkg/hive/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- MySQL JDBC URL where Hive stores its metadata -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop100:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <!-- Fully qualified driver class Hive uses to connect to MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <!-- Username Hive uses to connect to MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
    </property>
    <!-- Password Hive uses to connect to MySQL -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive1234</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- Thrift URI(s) of the Hive metastore; more than one can be configured -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop100:9083</value>
    </property>
    <!-- Thrift port of HiveServer2 -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop100</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
```
4. Initializing the metastore database
(1) Log in to MySQL.
```bash
[hadoop@hadoop100 ~]$ mysql -hhadoop100 -P3306 -uhive -phive1234
```
(2) Create the Hive metastore database (named metastore, matching the JDBC URL in hive-site.xml).
```sql
mysql> create database metastore;
mysql> quit;
```
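The commands above assume the MySQL account hive/hive1234 referenced in hive-site.xml already exists. If it does not, a minimal sketch of creating it as the MySQL root user (the '%' host pattern is an assumption; tighten it for a real cluster):

```bash
$ mysql -uroot -p -e "
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive1234';
GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'%';
FLUSH PRIVILEGES;"
```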
(3) Initialize the Hive metastore schema.
```bash
$ schematool -initSchema -dbType mysql -verbose
```
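If the initialization succeeds, the schema can be verified afterwards (an optional check, not part of the original steps):

```bash
$ schematool -dbType mysql -info
# prints the metastore connection URL, the user, and the installed schema version
```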
5. Starting Hive
(1) For Hive 2.x and later, the Metastore and HiveServer2 services must be started first; otherwise the client fails with an error such as:
```
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
```
(2) Create a Hive service start script in /opt/pkg/hive/bin that starts the Metastore and HiveServer2 services.
```bash
[hadoop@hadoop100 bin]$ vi /opt/pkg/hive/bin/hive-services.sh

#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
mkdir -p $HIVE_LOG_DIR

# Check whether a service is running properly; arg 1 = process name, arg 2 = listening port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running properly"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
```
(3) Make the script executable.
```bash
$ sudo chmod +x hive-services.sh
```
(4) Start the Hive background services.
```bash
$ hive-services.sh start
```
(5) Check the status of the Hive background services.
You may need to retry a few times: it takes roughly a minute after launching before the processes actually show up.
```bash
$ hive-services.sh status
Metastore service is running normally
HiveServer2 service is running normally
```
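Another way to confirm both services are up (not part of the original steps) is to check the listening ports the script itself uses:

```bash
$ netstat -nltp | grep -E '9083|10000'
# 9083 = metastore thrift port, 10000 = HiveServer2 thrift port
```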
(6) Start the Hive client.
```
[hadoop@hadoop100 bin]$ hive
which: no hbase in (/opt/rh/llvm-toolset-7.0/root/usr/bin:/opt/pkg/hive/bin:/opt/pkg/flume/bin:/opt/pkg/kafka/bin:/opt/pkg/zookeeper/bin:/opt/pkg/hadoop/bin:/opt/pkg/hadoop/sbin:/opt/pkg/maven/bin:/sbin:/opt/pkg/java/bin:/opt/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin)
Hive Session ID = 2b898c67-6677-4e98-8e23-0d70d358d206

Logging initialized using configuration in jar:file:/opt/pkg/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 8aa334ad-253c-4974-bdb2-971b22bd3afb
hive (default)>
```
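As a quick smoke test of the client and metastore (an optional step):

```bash
$ hive -e "show databases;"
# should list at least the default database
```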
6. Logging
By default, the hive and beeline tools write their logs under /tmp/<username>. To control where the logs go (and to see them while commands run), you can create conf/hive-log4j2.properties and conf/beeline-log4j2.properties from the templates shipped with Hive (see the sketch after the list below).
- The log location can be changed:
  - property.hive.log.dir = /opt/pkg/hive/logs/
  - property.hive.log.file = hive.log
- The default log level is INFO, which makes the Hive client print a lot of unnecessary output.
  - Changing the level, however, did not take effect in my tests.
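A minimal sketch of the log configuration, assuming the *.template files that ship under conf/ in Hive 3.1.2:

```bash
$ cd /opt/pkg/hive/conf
$ cp hive-log4j2.properties.template hive-log4j2.properties
$ cp beeline-log4j2.properties.template beeline-log4j2.properties
# then edit hive-log4j2.properties, e.g.:
#   property.hive.log.dir = /opt/pkg/hive/logs/
#   property.hive.log.file = hive.log
```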
The HiveServer2 log written by the start script above can be followed with:
```bash
$ tail -F /opt/pkg/hive/logs/hiveServer2.log
```
7. Beeline usage examples
Going forward, Beeline replaces HiveCLI as the Hive client tool, and HiveCLI will be deprecated in later releases. Beeline, introduced in Hive 0.11, is a JDBC client based on the SQLLine CLI.
Connecting to HiveServer2 with beeline
```
beeline -u "jdbc:hive2://localhost:10000" -nhadoop -p123123
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive
0: jdbc:hive2://localhost:10000>
```
Running script files, similar to the HiveCLI script-execution feature
```bash
nohup beeline -u jdbc:hive2://127.0.0.1:10000 -n hadoop -p 123123 --color=true --silent=false \
  --hivevar p_date=${partitionDate} --hivevar f_date=${fileLocDate} \
  -f hdfs_add_partition_dmp_clearlog.hql >> $logdir/load_${curDate}.log
```
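For reference, the script passed via -f picks up those values through ${hivevar:...} substitution. A hypothetical sketch of what hdfs_add_partition_dmp_clearlog.hql might contain (table name, partition column, and location are made up for illustration):

```bash
$ cat > hdfs_add_partition_dmp_clearlog.hql <<'EOF'
ALTER TABLE dmp_clearlog ADD IF NOT EXISTS
  PARTITION (p_date = '${hivevar:p_date}')
  LOCATION '/data/dmp_clearlog/${hivevar:f_date}';
EOF
```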
Commands supported by beeline
Type the !help command at the beeline prompt.
```
!help
```
output:
```
!addlocaldriverjar  Add driver jar file in the beeline client side.
!addlocaldrivername Add driver name that needs to be supported in the beeline client side.
!all                Execute the specified SQL against all the current connections
!autocommit         Set autocommit mode on or off
!batch              Start or execute a batch of statements
!brief              Set verbose mode off
!call               Execute a callable statement
!close              Close the current connection to the database
!closeall           Close all current open connections
!columns            List all the columns for the specified table
!commit             Commit the current transaction (if autocommit is off)
!connect            Open a new connection to the database.
!dbinfo             Give metadata information about the database
!describe           Describe a table
!dropall            Drop all tables in the current database
!exportedkeys       List all the exported keys for the specified table
!go                 Select the current connection
!help               Print a summary of command usage
!history            Display the command history
!importedkeys       List all the imported keys for the specified table
!indexes            List all the indexes for the specified table
!isolation          Set the transaction isolation for this connection
!list               List the current connections
!manual             Display the BeeLine manual
!metadata           Obtain metadata information
!nativesql          Show the native SQL for the specified statement
!nullemptystring    Set to true to get historic behavior of printing null as empty string. Default is false.
!outputformat       Set the output format for displaying results (table,vertical,csv2,dsv,tsv2,xmlattrs,xmlelements, and deprecated formats(csv, tsv))
!primarykeys        List all the primary keys for the specified table
!procedures         List all the procedures
!properties         Connect to the database specified in the properties file(s)
!quit               Exits the program
!reconnect          Reconnect to the database
!record             Record all output to the specified file
!rehash             Fetch table and column names for command completion
!rollback           Roll back the current transaction (if autocommit is off)
!run                Run a script from the specified file
!save               Save the current variabes and aliases
!scan               Scan for installed JDBC drivers
!script             Start saving a script to a file
!set                Set a beeline variable
!sh                 Execute a shell command
!sql                Execute a SQL command
!tables             List all the tables in the database
!typeinfo           Display the type map for the current connection
!verbose            Set verbose mode on
```
A few commonly used commands
- !connect url: connect to a different HiveServer2 instance
- !exit: exit the shell
- !help: show the full command list
- !verbose: show extra detail for queries
8. Troubleshooting
HiveServer2 generally takes a while to start, up to about five minutes. If it still has not come up after that, check the logs to find the cause of the failure.
Error 1 when starting hiveserver2
When connecting to hiveserver2, Hive reports: Could not open client transport with JDBC Uri.
Error message:
```
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node1:10000/hive_metadata;user=hadoop: java.net.ConnectException: Connection refused (state=08S01,code=0)
Beeline version 2.3.3 by Apache Hive
```
Cause: HiveServer2 enforces access control, so the user it runs as must be configured as a proxy user in Hadoop's configuration files.
Solution
Stop DFS and YARN.
```bash
$ stop-dfs.sh
$ stop-yarn.sh
```
Add the following to Hadoop's core-site.xml (here hadoop is the user and group allowed to act as a proxy user):
```xml
<property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
```
Restart DFS.
```bash
$ start-dfs.sh
```
Restart YARN.
```bash
$ start-yarn.sh
```
Reconnect to Hive.
```bash
beeline -u jdbc:hive2://hadoop100:10000 -n hadoop -p
```
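As an aside (not in the original write-up), if the cluster is already running, the proxy-user settings can usually be reloaded without a full HDFS/YARN restart:

```bash
$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
$ yarn rmadmin -refreshSuperUserGroupsConfiguration
```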
Error 2 when starting hiveserver2
If the following message appears in the log:
```
Safe mode is ON. The reported blocks 3 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 5. The number of live datanodes 2 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
```
This means the proportion of missing or corrupt blocks exceeds the threshold configured in HDFS (0.999, which effectively means not a single block may be lost); with a threshold of around 0.99 this case would not have triggered safe mode. When DataNodes lose more blocks than the configured percentage allows (for example after a power failure or running out of memory), the system automatically stays in safe mode.
Fix
Leave safe mode manually:
```bash
$ hdfs dfsadmin -safemode leave
```
Run a health check and delete the corrupt blocks:
```bash
$ hdfs fsck / -delete
```
If the following appears afterwards, the repair is complete:
```
The filesystem under path '/' is HEALTHY
```
If it does not get repaired, run the hdfs fsck command once more.
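To check whether the NameNode is currently in safe mode at any point (a small addition, not in the original steps):

```bash
$ hdfs dfsadmin -safemode get
# prints "Safe mode is ON" or "Safe mode is OFF"
```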
Importing HBase tables does not work from beeline. Switching to bin/hive makes the import work; this may be a bug.