Running mahout to convert a serialized training set into vectors fails with Error: Java heap space (MapReduce memory tuning)

Created: 2015-10-23
mahout seq2sparse -i ./email-seq -o ./email-vectors  -lnorm -nv  -wt tfidf

15/10/28 09:36:58 INFO FileInputFormat: Total input paths to process : 1
15/10/28 09:36:58 INFO JobSubmitter: number of splits:1
15/10/28 09:36:58 INFO JobSubmitter: Submitting tokens for job: job_1445824701143_0007
15/10/28 09:36:59 INFO YarnClientImpl: Submitted application application_1445824701143_0007
15/10/28 09:36:59 INFO Job: The url to track the job: http://hcg1:8088/proxy/application_1445824701143_0007/
15/10/28 09:36:59 INFO Job: Running job: job_1445824701143_0007
15/10/28 09:37:10 INFO Job: Job job_1445824701143_0007 running in uber mode : false
15/10/28 09:37:10 INFO Job:  map 0% reduce 0%
15/10/28 09:37:38 INFO Job:  map 100% reduce 0%
15/10/28 09:37:58 INFO Job:  map 100% reduce 100%
15/10/28 09:38:21 INFO Job: Task Id : attempt_1445824701143_0007_r_000000_0, Status : FAILED
Error: Java heap space
15/10/28 09:38:22 INFO Job:  map 100% reduce 0%
15/10/28 09:38:41 INFO Job:  map 100% reduce 100%
15/10/28 09:39:03 INFO Job: Task Id : attempt_1445824701143_0007_r_000000_1, Status : FAILED
Error: Java heap space
15/10/28 09:39:04 INFO Job:  map 100% reduce 0%
Solution: increase the JVM heap size that MapReduce gives to its tasks. I first raised mapreduce.map.java.opts:
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024m</value>
</property>
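The job still failed, this time with the error below. Note that the failing attempts are reduce tasks (the _r_ in the attempt ID), which mapreduce.map.java.opts does not affect.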
Error: GC overhead limit exceeded
15/10/28 09:22:36 INFO FileInputFormat: Total input paths to process : 1
15/10/28 09:22:36 INFO JobSubmitter: number of splits:1
15/10/28 09:22:36 INFO JobSubmitter: Submitting tokens for job: job_1445824701143_0004
15/10/28 09:22:37 INFO YarnClientImpl: Submitted application application_1445824701143_0004
15/10/28 09:22:37 INFO Job: The url to track the job: http://hcg1:8088/proxy/application_1445824701143_0004/
15/10/28 09:22:37 INFO Job: Running job: job_1445824701143_0004
15/10/28 09:23:08 INFO Job: Job job_1445824701143_0004 running in uber mode : false
15/10/28 09:23:08 INFO Job:  map 0% reduce 0%
15/10/28 09:23:26 INFO Job:  map 100% reduce 0%
15/10/28 09:23:53 INFO Job: Task Id : attempt_1445824701143_0004_r_000000_0, Status : FAILED
Error: GC overhead limit exceeded
15/10/28 09:24:22 INFO Job:  map 100% reduce 100%
15/10/28 09:24:22 INFO Job: Task Id : attempt_1445824701143_0004_r_000000_1, Status : FAILED
Error: GC overhead limit exceeded
15/10/28 09:24:23 INFO Job:  map 100% reduce 0%
15/10/28 09:24:50 INFO Job:  map 100% reduce 100%
15/10/28 09:24:50 INFO Job: Task Id : attempt_1445824701143_0004_r_000000_2, Status : FAILED
Error: GC overhead limit exceeded
15/10/28 09:24:51 INFO Job:  map 100% reduce 0%
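After several rounds of trial and error, this is the configuration my data volume needed:
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3000m</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3000m</value>
</property>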
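On YARN, each task JVM runs inside a container whose size is configured separately from the heap, through mapreduce.map.memory.mb and mapreduce.reduce.memory.mb. If the container is smaller than the -Xmx value, the NodeManager may kill the task for exceeding its memory limit. The fragment below is only a sketch of matching container sizes for a 3000 MB heap. The 4096 MB figure is an assumption chosen to leave headroom for non-heap memory, not a value from my cluster, so adjust it to what your nodes can actually grant.

```xml
<!-- Sketch for mapred-site.xml; 4096 is an assumed, illustrative value -->
<property>
    <!-- Container size for each map task, in MB; should exceed the map -Xmx -->
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <!-- Container size for each reduce task, in MB; should exceed the reduce -Xmx -->
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
</property>
```

A container request larger than yarn.scheduler.maximum-allocation-mb will not be granted, so that limit may need raising as well.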
Also increase HADOOP_HEAPSIZE in hadoop-env.sh.
References: http://blogs.msdn.com/b/shanyu/archive/2014/07/31/hadoop-yarn-memory-settings-in-hdinsigh.aspx
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html