Hive: dynamically loading data into specified partitions, and other Hive usage tips
Changing the field delimiter in Hive:
- alter table tableName set SERDEPROPERTIES("field.delim"=" ");
Creating partitions from the data and dynamically loading the data into them
- insert into table device_status_log partition (date)
  select `vin`, `obd_id`, `function_id`, `message_id`, `message_content`,
         `longitude`, `latitude`, `speed`, `engine_speed`, `gps_stat`, `client_time`,
         `create_time`, `analytical_result`,
         regexp_replace(to_date(create_time), "-", "") as date
  from pre_device_status_log;
Loading data to table obd_message.device_status_log partition (date=null)
Time taken for load dynamic partitions : 4073
Loading partition {date=20161020}
Loading partition {date=20161017}
Loading partition {date=20161024}
Loading partition {date=20161021}
Loading partition {date=20161023}
Loading partition {date=20161026}
Loading partition {date=20161015}
Loading partition {date=20161018}
Loading partition {date=20161016}
Loading partition {date=20161019}
Loading partition {date=20161025}
Loading partition {date=20161022}
Time taken for adding to write entity : 6
Partition obd_message.device_status_log{date=20161015} stats: [numFiles=1, numRows=188, totalSize=79565, rawDataSize=79377]
Partition obd_message.device_status_log{date=20161016} stats: [numFiles=1, numRows=648, totalSize=299298, rawDataSize=298650]
Partition obd_message.device_status_log{date=20161017} stats: [numFiles=1, numRows=912, totalSize=414597, rawDataSize=413685]
Partition obd_message.device_status_log{date=20161018} stats: [numFiles=1, numRows=895, totalSize=410935, rawDataSize=410040]
Partition obd_message.device_status_log{date=20161019} stats: [numFiles=1, numRows=1412, totalSize=613903, rawDataSize=612491]
Partition obd_message.device_status_log{date=20161020} stats: [numFiles=1, numRows=475, totalSize=204375, rawDataSize=203900]
Partition obd_message.device_status_log{date=20161021} stats: [numFiles=1, numRows=346, totalSize=142079, rawDataSize=141733]
Partition obd_message.device_status_log{date=20161022} stats: [numFiles=1, numRows=561, totalSize=220711, rawDataSize=220150]
Partition obd_message.device_status_log{date=20161023} stats: [numFiles=1, numRows=856, totalSize=352452, rawDataSize=351596]
Partition obd_message.device_status_log{date=20161024} stats: [numFiles=1, numRows=1997, totalSize=783701, rawDataSize=781704]
Partition obd_message.device_status_log{date=20161025} stats: [numFiles=1, numRows=1384, totalSize=556970, rawDataSize=555586]
Partition obd_message.device_status_log{date=20161026} stats: [numFiles=1, numRows=326, totalSize=133275, rawDataSize=132949]
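The INSERT above names no static partition value (the log shows `partition (date=null)`), so it relies on dynamic partitioning being enabled. The source does not show the session settings that were used. As a minimal sketch, assuming a Hive version whose default partition mode is strict, the following writes the two relevant settings to an init file. The file name `dynamicPartition.hql` is hypothetical; the same lines could instead be added to hiveInit.sh.

```shell
# Hypothetical file name; these two lines could equally be appended to hiveInit.sh.
cat > dynamicPartition.hql <<'EOF'
-- Let Hive create partitions from the values of the trailing SELECT column.
set hive.exec.dynamic.partition=true;
-- nonstrict mode permits an INSERT in which every partition column is dynamic.
set hive.exec.dynamic.partition.mode=nonstrict;
EOF

# To start the CLI with these settings applied (needs a working Hive installation):
# hive -i dynamicPartition.hql
```

With these settings in place, the value of the last SELECT column (`date` in the statement above) decides which partition each row is written to.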
Viewing partitions in Hive
- show partitions device_status_log ;
- regexp_replace( to_date(create_time) ,"-","") as date
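To make the partition value concrete, here is a small shell sketch of the same string transformation outside Hive. It only imitates what `to_date` followed by `regexp_replace` does to a timestamp string; the sample value of `create_time` is made up for illustration.

```shell
create_time="2016-10-20 13:45:00"   # hypothetical sample value

# Keep only the date part, as to_date(create_time) does.
day=$(printf '%s' "$create_time" | cut -d' ' -f1)

# Remove the hyphens, as regexp_replace(..., "-", "") does.
partition_value=$(printf '%s' "$day" | tr -d '-')

echo "$partition_value"
```

The result has the same shape as the partition names reported by `show partitions`, for example `date=20161020`.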
Hive time functions: adding minutes or seconds (the example shifts client_time by 8*3600 seconds, i.e. 8 hours)
- from_unixtime(unix_timestamp(client_time) + 8*3600 ) as client_time
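The expression above converts `client_time` to epoch seconds, adds an offset in seconds, and formats the result back into a timestamp string. The same arithmetic can be sketched in shell, assuming GNU `date` (BSD/macOS `date` uses different flags) and a made-up sample value. UTC is forced here only to keep the sketch deterministic; Hive itself uses its own time zone setting.

```shell
client_time="2016-10-20 20:30:00"   # hypothetical sample value

# Like unix_timestamp(client_time): parse the string into epoch seconds.
epoch=$(date -u -d "$client_time" +%s)

# Like from_unixtime(... + 8*3600): add 8 hours and format back.
shifted=$(date -u -d "@$((epoch + 8*3600))" "+%Y-%m-%d %H:%M:%S")

echo "$shifted"
```

To add minutes or seconds instead, change the offset, for example `30*60` for thirty minutes or `45` for forty-five seconds.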
-
date date( date_add( date_sub( datediff( datetime
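Some tips: create hiveInit.sh with the following content (the goal is to have jobs run locally whenever possible, which shortens waiting time and makes debugging easier):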
set mapred.job.tracker=local;
set mapred.reduce.tasks=1;
set hive.exec.mode.local.auto.input.files.max=1000;
set hive.exec.mode.local.auto.inputbytes.max=50000000;
set hive.exec.mode.local.auto.tasks.max=10;
set hive.exec.mode.local.auto=true;
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
show databases;
use obd_message;
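Then create hiveStart.sh with the following content:
hive -i hiveInit.sh
Finally, make hiveStart.sh executable. Running ./hiveStart.sh from the current directory then starts the Hive client with the configuration above.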
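The last two steps (writing hiveStart.sh and making it executable) can be sketched as follows. The `#!/bin/bash` line is an assumption added for illustration; the original script contains only the `hive -i hiveInit.sh` line.

```shell
# Write the launcher script.
cat > hiveStart.sh <<'EOF'
#!/bin/bash
# Start the Hive CLI and run hiveInit.sh before the first prompt.
hive -i hiveInit.sh
EOF

# Grant execute permission so the script can be run as ./hiveStart.sh
chmod +x hiveStart.sh
```

Run `./hiveStart.sh` from the directory that also contains hiveInit.sh, since the script refers to it by relative path.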