
[Exception] DataNode write failure when Spark writes to HBase: dfs.client.block.write.replace-datanode-on-failure.policy

Created: 2017-12-25

Problem description:

When Spark Streaming writes to HBase over a long period, the following exception appears:

2017-12-24 23:20:34  [ SparkListenerBus:540107357 ] - [ ERROR ]  Listener EventLoggingListener threw an exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[ip:50010,DS-d9caacf5-a95a-45ab-8231-95decdbe4889,DISK], DatanodeInfoWithStorage[ip:50010,DS-7e2e14d9-3d8b-412d-bf38-3d2930a83d2f,DISK]], original=[DatanodeInfoWithStorage[ip:50010,DS-d9caacf5-a95a-45ab-8231-95decdbe4889,DISK], DatanodeInfoWithStorage[ip:50010,DS-7e2e14d9-3d8b-412d-bf38-3d2930a83d2f,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via "dfs.client.block.write.replace-datanode-on-failure.policy" in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1191)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1265)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)

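According to the stack trace, the write failed because of the DataNode replacement policy: a DataNode in the write pipeline went bad and the HDFS client could not find a healthy DataNode to replace it (the log shows the exception was raised inside Spark's EventLoggingListener).

Tracing this into the DFSClient source shows that the problem occurs while the client writes a data block through the pipeline. Two related parameters are involved: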
dfs.client.block.write.replace-datanode-on-failure.enable=true

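If a DataNode or network failure occurs in the write pipeline, DFSClient tries to remove the failed DataNode from the pipeline and then continues writing with the remaining DataNodes. As a result, the number of DataNodes in the pipeline decreases. This feature adds new DataNodes to the pipeline to compensate. It is a site-wide property that enables or disables the feature. When the cluster is very small, for example 3 nodes or fewer, the cluster administrator may want to set the policy to NEVER in the default configuration file, or disable the feature altogether. Otherwise, users may see an unusually high rate of pipeline failures, because no new DataNode can be found for replacement.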


dfs.client.block.write.replace-datanode-on-failure.policy=DEFAULT

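This property takes effect only when dfs.client.block.write.replace-datanode-on-failure.enable is set to true:

ALWAYS: always add a new DataNode when an existing DataNode is removed from the pipeline.

NEVER: never add a new DataNode.

DEFAULT: let r be the replication factor and n the number of DataNodes still in the pipeline. Add a new DataNode only if r >= 3 and either (1) floor(r/2) >= n, or (2) r > n and the block has been hflushed/appended.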
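
The three policies above can be read as one decision function. The following is a minimal sketch of that reading, not the actual Hadoop source: the class name, the method name shouldAddDatanode, and the enum are hypothetical, and the guard that skips replacement when the pipeline is empty or already has r nodes is an assumption of this sketch. For the case in the log (presumably r = 3 with two DataNodes left and an hflushed block), DEFAULT asks for a replacement, which a 3-node cluster cannot provide.

```java
public class ReplaceDatanodePolicySketch {

    /** Hypothetical enum mirroring the three documented policy values. */
    public enum Policy { ALWAYS, NEVER, DEFAULT }

    /**
     * Decides whether the client should try to add a replacement DataNode.
     *
     * @param enabled            value of ...replace-datanode-on-failure.enable
     * @param policy             value of ...replace-datanode-on-failure.policy
     * @param r                  replication factor of the file
     * @param n                  DataNodes still alive in the pipeline
     * @param hflushedOrAppended whether the block was hflushed or appended
     */
    public static boolean shouldAddDatanode(boolean enabled, Policy policy,
                                            int r, int n,
                                            boolean hflushedOrAppended) {
        if (!enabled) {
            return false; // feature switched off: policy is ignored
        }
        // Assumption of this sketch: nothing to add if the pipeline is
        // empty or already holds as many nodes as the replication factor.
        if (n <= 0 || n >= r) {
            return false;
        }
        switch (policy) {
            case ALWAYS:
                return true;
            case NEVER:
                return false;
            default: // DEFAULT
                if (r < 3) {
                    return false;
                }
                // (1) floor(r/2) >= n, or (2) r > n (guaranteed by the
                // guard above) and the block is hflushed/appended.
                return r / 2 >= n || hflushedOrAppended;
        }
    }

    public static void main(String[] args) {
        // r = 3, two nodes left, block hflushed: DEFAULT wants a replacement.
        System.out.println(shouldAddDatanode(true, Policy.DEFAULT, 3, 2, true));
        // Same situation with NEVER: keep writing to the remaining nodes.
        System.out.println(shouldAddDatanode(true, Policy.NEVER, 3, 2, true));
    }
}
```

Under NEVER the client keeps writing to the remaining DataNodes instead of failing, at the cost of the block temporarily having fewer replicas in the pipeline.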

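Workaround: set the policy to NEVER on the client side. Here conf is presumably the Hadoop Configuration object used by the job:

conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "true");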

