Hi Clay, all,
Thank you for your response and help.
I have already solved it.
I found that there are several possible causes for this issue (the
OutOfMemoryError: unable to create new native thread).
Running out of mmap ranges may be one of them, but it is not the problem
I hit (I run SLS in Docker).
The key to my issue is this setting in sls-runner.xml:
<property>
<name>yarn.sls.runner.pool.size</name>
<value>100000</value>
</property>
The value I set is too large.
In TaskRunner.java, line 155:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L155
executor = new ThreadPoolExecutor(threadPoolSize, threadPoolSize, 0,
    TimeUnit.MILLISECONDS, queue);
The ThreadPoolExecutor constructor is:
ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
    TimeUnit unit, BlockingQueue<Runnable> workQueue)
Both corePoolSize and maximumPoolSize are set to the same value,
threadPoolSize. So if a user sets yarn.sls.runner.pool.size too large,
the corePoolSize causes the OutOfMemoryError.
And in practice, for convenience and as a safety margin, users tend to
set a large value.
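The eager thread creation is easy to reproduce in isolation. A minimal sketch with a small pool size (a plain ThreadPoolExecutor, not the actual TaskRunner code; the class name is mine):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PrestartDemo {
    public static void main(String[] args) {
        int poolSize = 8; // imagine 100000 here: one native thread each
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize, 0, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        // TaskRunner.start() calls this (see the stack trace below), so
        // every core thread is created up front, before any task runs:
        int started = executor.prestartAllCoreThreads();
        System.out.println("threads started eagerly: " + started);
        executor.shutdown();
    }
}
```

With poolSize set to 100000, the prestart loop tries to create 100000 native threads at once, which is where the JVM runs out of native resources.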
Here is my proposed solution:
1. Define a maximum pool size (e.g. max_PoolSize) in SLSConfiguration.java
and pass it as the second constructor argument:
executor = new ThreadPoolExecutor(threadPoolSize, max_PoolSize, 0,
    TimeUnit.MILLISECONDS, queue);
2. We could also validate the configured threadPoolSize: if it is larger
than max_PoolSize, throw an error or log a warning to remind users.
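A rough sketch of how the two options could combine; MAX_POOL_SIZE, newExecutor, and the message text are illustrative names of mine, not actual SLSConfiguration or TaskRunner identifiers:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CappedPoolSketch {
    // would live in SLSConfiguration as a configurable default
    static final int MAX_POOL_SIZE = 500;

    static ThreadPoolExecutor newExecutor(int threadPoolSize) {
        int coreSize = threadPoolSize;
        if (coreSize > MAX_POOL_SIZE) {
            // option 2: warn (or throw) instead of silently prestarting
            // an unbounded number of native threads
            System.err.println("yarn.sls.runner.pool.size=" + coreSize
                    + " exceeds cap " + MAX_POOL_SIZE + "; clamping.");
            coreSize = MAX_POOL_SIZE;
        }
        return new ThreadPoolExecutor(coreSize, MAX_POOL_SIZE, 0,
                TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor e = newExecutor(100000);
        System.out.println("corePoolSize=" + e.getCorePoolSize());
        e.shutdown();
    }
}
```

One caveat: since TaskRunner.start() calls prestartAllCoreThreads() (visible in the stack trace below), the cap has to bound corePoolSize as well; changing only the maximumPoolSize argument would not stop the eager creation of threadPoolSize threads.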
I can submit a patch for my solution. What do you think?
------------------------------------------------------------------
Hi Sichen,
I would expect you are running out of mmap ranges on most stock Linux
kernels. (Each thread takes a mmap slot.) You can increase your
vm.max_map_count[1] to see if that helps.
-Clay
[1]: A discussion on effecting the change:
https://www.systutorials.com/241561/maximum-number-of-mmaped-ranges-and-how-to-set-it-on-linux/
On Tue, 24 Jul 2018, 赵思晨(思霖) wrote:
> Hi,
> I am running 200+ jobs, each containing 100 tasks, when I use slsrun.sh
> to start SLS.
> It failed with this error:
>
> 2018-07-24 04:47:27,957 INFO capacity.CapacityScheduler: Added node 11.178.150.104:1604 clusterResource: <memory:821760000, vCores:15408000, disk: 6099000000M, resource2: 8025G>
> Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:717)
>         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>         at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1617)
>         at org.apache.hadoop.yarn.sls.scheduler.TaskRunner.start(TaskRunner.java:157)
>         at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:247)
>         at org.apache.hadoop.yarn.sls.SLSRunner.run(SLSRunner.java:950)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:957)
>
> I set Xmx and Xms in hadoop-env.sh (-Xmx20480m, -Xms20480m), but it
> still doesn't work.
>
> Can anyone help me?
>
> Thanks in advance,
>
> Sichen