Fixing garbled Hive table comments and garbled show create table output
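Running desc table1 shows the Chinese column comments without garbling. Running desc formatted table1; also shows the Chinese column comments correctly, but the table comment, which is also Chinese, is displayed in a different encoding. That is the problem solved here.
The encoding of the metastore database is as follows:
The metastore database is PostgreSQL, and changing its encoding is not as convenient as it is with MySQL (I have not found an easy way to do it yet), so I fixed the problem by modifying the Hive source code directly.
The related bug in the official issue tracker: https://issues.apache.org/jira/browse/HIVE-5682
Class to modify: ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
Replace the following method with: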
private static void displayAllParameters(Map<String, String> params, StringBuilder tableInfo) {
  List<String> keys = new ArrayList<String>(params.keySet());
  String value = null;
  Collections.sort(keys);
  for (String key : keys) {
    tableInfo.append(FIELD_DELIM); // Ensures all params are indented.
    value = params.get(key);
    // Print a table comment that contains multi-byte (e.g. Chinese) characters as-is:
    // its encoded byte length differs from its character count. Escaping it would
    // turn every such character into a Unicode escape sequence.
    if ("comment".equals(key) && null != value && value.getBytes().length != value.length()) {
      formatOutput(key, value, tableInfo);
    } else {
      formatOutput(key, StringEscapeUtils.escapeJava(value), tableInfo);
    }
  }
}
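To make the cause concrete: StringEscapeUtils.escapeJava (from commons-lang) rewrites each non-ASCII character as a backslash-u escape, which is why the Chinese table comment appears as a run of escape sequences rather than readable text. The sketch below is only an illustration. escapeNonAscii is a hypothetical stand-in for the non-ASCII part of escapeJava so that the example runs without commons-lang, and hasMultiByteChars is a byte-length comparison of the kind the replacement method relies on. Both names, the class name, and the explicit use of UTF-8 are assumptions and not part of Hive.

```java
import java.nio.charset.StandardCharsets;

public class CommentEscapeDemo {

    // Hypothetical stand-in for the non-ASCII handling of escapeJava:
    // every char above 0x7F is replaced by a backslash, 'u' and four hex digits.
    // (The real method also escapes quotes, backslashes and control characters.)
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c > 0x7F) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // True when the UTF-8 encoding needs more bytes than the string has chars,
    // which happens exactly when the string holds at least one non-ASCII char.
    public static boolean hasMultiByteChars(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length != s.length();
    }

    public static void main(String[] args) {
        // Three Chinese characters, written as escapes to keep this source ASCII-only.
        String comment = "\u4E0B\u8F7D\u91CF";
        System.out.println(escapeNonAscii(comment));       // escaped form, unreadable as text
        System.out.println(hasMultiByteChars(comment));    // multi-byte text: skip escaping
        System.out.println(hasMultiByteChars("download")); // plain ASCII: escaping is harmless
    }
}
```

Note that the method in the post calls getBytes() with no argument, so it uses the JVM default charset. The check only behaves as in this sketch when that default is a multi-byte encoding such as UTF-8.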
####Recompile hive-exec-0.13.1-cdh5.3.1.jar, then put it in /opt/cloudera/parcels/CDH/jars to replace the existing jar
####Fixing garbled Chinese in show create table output
For reference, see https://issues.apache.org/jira/browse/HIVE-2905
https://issues.apache.org/jira/secure/attachment/12791019/HIVE-11837.1.patch
Source file to modify: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
@@ -2048,7 +2048,7 @@ private int showCreateTable(Hive db, DataOutputStream outStream, String tableNam
       if (tbl.isView()) {
         String createTab_stmt = "CREATE VIEW `" + tableName + "` AS " + tbl.getViewExpandedText();
-        outStream.writeBytes(createTab_stmt.toString());
+        outStream.write(createTab_stmt.toString().getBytes("UTF-8"));
         return 0;
       }
@@ -2196,7 +2196,7 @@ else if (sortCol.getOrder() == BaseSemanticAnalyzer.HIVE_COLUMN_ORDER_DESC) {
       }
       createTab_stmt.add(TBL_PROPERTIES, tbl_properties);
-      outStream.writeBytes(createTab_stmt.render());
+      outStream.write(createTab_stmt.render().getBytes("UTF-8"));
     } catch (IOException e) {
       LOG.info("show create table: " + stringifyException(e));
       return 1;
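The reason this one-line change matters is that DataOutputStream.writeBytes writes each character by discarding its high eight bits, so any character outside Latin-1 is corrupted, while write(byte[]) sends the already encoded UTF-8 bytes through untouched. The following is a minimal standalone sketch of that difference. The class and method names are illustrative and do not come from Hive, and it uses StandardCharsets.UTF_8 where the patch uses the equivalent getBytes("UTF-8").

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteBytesDemo {

    // Mirrors the unpatched call: writeBytes keeps only the low 8 bits of each char.
    public static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Mirrors the patched call: encode to UTF-8 first, then write the raw bytes.
    public static byte[] viaUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decodes bytes the way a UTF-8 terminal or client would.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three Chinese characters, written as escapes to keep this source ASCII-only.
        String comment = "\u4E0B\u8F7D\u91CF";
        System.out.println(viaWriteBytes(comment).length);                  // one byte per char
        System.out.println(viaUtf8(comment).length);                        // three bytes per char
        System.out.println(decode(viaUtf8(comment)).equals(comment));       // round trip succeeds
        System.out.println(decode(viaWriteBytes(comment)).equals(comment)); // text is lost
    }
}
```

For pure ASCII statements the two calls produce identical bytes, which is why the problem only shows up once a table or column comment contains Chinese text.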
#####Recompile hive-exec-0.13.1-cdh5.3.1.jar, then put it in /opt/cloudera/parcels/CDH/jars to replace the existing jar
The Hive source code for the matching version can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/
cd /Users/yzygenuine/Downloads/hive-0.13.1-cdh5.3.1
##Run the following command to compile and package the project
mvn clean package -Phadoop-2 -Pdist -DskipTests -Dtar
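As before, once the code is modified, compile and package a new jar and use it to replace the one in production.
#####Result after replacing the jar: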
hive> show create table dwd_audio_download_redis;
OK
CREATE TABLE `dwd_audio_download_redis`(
  `audio_id` bigint COMMENT "节目ID",
  `download_cnt` bigint COMMENT "下载量")
COMMENT "节目从上传到分区时间的下载量"
PARTITIONED BY (
  `day` bigint COMMENT "节目某天统计数据",
  `hour` bigint COMMENT "节目某时统计数据")
ROW FORMAT SERDE
  "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
STORED AS INPUTFORMAT
  "org.apache.hadoop.mapred.TextInputFormat"
OUTPUTFORMAT
  "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  "hdfs://master:8020/user/hive/warehouse/hive_db.db/dwd_audio_download_redis"
TBLPROPERTIES (
  "transient_lastDdlTime"="1457938879")
Time taken: 0.55 seconds, Fetched: 17 row(s)
#########The garbled Chinese table comment in desc formatted output is fixed as well
hive> desc formatted dwd_audio_download_redis;
OK
# col_name              data_type               comment

audio_id                bigint                  节目ID
download_cnt            bigint                  下载量

# Partition Information
# col_name              data_type               comment

day                     bigint                  节目某天统计数据
hour                    bigint                  节目某时统计数据

# Detailed Table Information
Database:               hive_db
Owner:                  datamining
CreateTime:             Mon Mar 14 15:01:19 CST 2016
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://master:8020/user/hive/warehouse/hive_db.db/dwd_audio_download_redis
Table Type:             MANAGED_TABLE
Table Parameters:
        comment                 节目从上传到分区时间的下载量
        transient_lastDdlTime   1457938879

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.514 seconds, Fetched: 34 row(s)
Please credit the original article when reposting:
http://blog.csdn.net/duck_genuine/article/details/50896532