On 2014-05-13, 2:42 PM, Rik Cabanier wrote:



On Tue, May 13, 2014 at 10:43 AM, Ehsan Akhgari <ehsan.akhg...@gmail.com> wrote:

    On 2014-05-13, 9:25 AM, Rik Cabanier wrote:

                 Web applications can already do this today. There's
                 nothing stopping them from figuring out the CPUs and
                 trying to use them all. Worse, I think they will likely
                 optimize for popular platforms which either overtax or
                 underutilize non-popular ones.


             Can you please provide some examples of actual web
             applications that do this, and what they're exactly trying
             to do with the number once they estimate one?  (Eli's
             timing attack demos don't count. ;-)


        Eli's listed some examples:
        http://wiki.whatwg.org/wiki/NavigatorCores#Example_use_cases


    That is a list of use cases which could use better ways of
    supporting a worker pool that actually scales to how many cores you
    have available at any given point in time.  That is *not* what
    navigator.hardwareConcurrency gives you, so I don't find those
    examples very convincing.


That is not the point of this attribute. It's just a hint for the author
so he can tune his application accordingly.
Maybe the application is tuned to use fewer cores, or maybe more. It all
depends...

The problem is that the API doesn't really make it obvious that you're not supposed to take the value that the getter returns and just spawn N workers. IOW, the API encourages the wrong behavior by design.
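To make that concrete, here is a minimal sketch (all names hypothetical) of the kind of policy I'd rather see the platform nudge authors toward: the getter's value feeds a capped heuristic instead of directly becoming N `new Worker()` calls.

```javascript
// Hypothetical helper: treat the reported core count as an upper
// bound on a worker pool, not as the number of workers to spawn
// unconditionally.
function pickPoolSize(reportedCores, pendingTasks) {
  const cores = reportedCores || 2;   // the attribute may be missing
  const capped = Math.min(cores, 8);  // avoid overcommitting big machines
  // never spawn more workers than there is work for
  return Math.max(1, Math.min(capped, pendingTasks));
}

// e.g. pickPoolSize(navigator.hardwareConcurrency, queue.length)
```

The clamp values here are arbitrary; the point is only that the value should feed a policy rather than be used directly.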

    (Note that I would be very eager to discuss a proposal that actually
    tries to solve that problem.)


You should do that! People have brought this up in the past but no
progress has been made in the last 2 years.
However, if this simple attribute is able to stir people's emotions, can
you imagine what would happen if you propose something complex? :-)

Sorry, but I have a long list of things on my todo list, and honestly this one is nowhere near the top, because I'm not aware of people asking for this feature very often. I'm sure some people would like it, but there are many problems we are trying to solve here, and this one doesn't look very high priority.

        I don't have any other cases where this is done.


    That really makes me question the "positive feedback from web
    developers" cited in the original post on this thread.  Can you
    please point us to places where that feedback is documented?


That was from the email to blink-dev where Adam Barth stated this.
I'll ask him where this came from.

Thanks!

I looked at other interpreted languages and they all seem to give you
access to the CPU count. Then I searched on GitHub to see the popularity:
Python:

    multiprocessing.cpu_count()

    11,295 results

    
https://github.com/search?q=multiprocessing.cpu_count%28%29+extension%3Apy&type=Code&ref=advsearch&l=

Perl:

    use Sys::Info;
    use Sys::Info::Constants qw( :device_cpu );
    my $info = Sys::Info->new;
    my $cpu = $info->device( CPU => %options );

    7 results
    
https://github.com/search?q=device_cpu+extension%3Apl&type=Code&ref=searchresults

Java:

    Runtime.getRuntime().availableProcessors()

    23,967 results

    
https://github.com/search?q=availableProcessors%28%29+extension%3Ajava&type=Code&ref=searchresults

Ruby:

    Facter.processorcount

    115 results

    
https://github.com/search?q=processorcount+extension%3Arb&type=Code&ref=searchresults

C#:

    Environment.ProcessorCount

    5,315 results
    
https://github.com/search?q=Environment.ProcessorCount&type=Code&ref=searchresults

I also searched for JavaScript files that contain "cpu" and "core":

    21,487 results

    
https://github.com/search?q=core+cpu+extension%3Ajs&type=Code&ref=searchresults

The results are mixed. Some projects seem to hard-code the CPU core
count while others are not about workers at all.
A search for "worker" and "cpu" gets more consistent results:

    2,812 results

    
https://github.com/search?q=worker+cpu+extension%3Ajs&type=Code&ref=searchresults

Node.js also exposes it:

    require('os').cpus()

    4,851 results

    
https://github.com/search?q=require%28%27os%27%29.cpus%28%29+extension%3Ajs&type=Code&ref=searchresults

I don't view platform parity as a checklist of features, so I really have no interest in "checking this checkbox" just so that the Web platform can be listed alongside these languages. Honestly, a list of GitHub hits without more information on what this value is actually used for is not that helpful. We're not taking a popularity vote here. ;-)

                 Everyone is in agreement that that is a hard problem to
                 fix and that there is no clear answer.
                 Whatever solution is picked (maybe like Grand Central
                 or Intel TBB), most solutions will still want to know
                 how many cores are available.
                 Looking at the native platform (and Adobe's
                 applications), many query the operating system for this
                 information to balance the workload. I don't see why
                 this would be different for the web platform.


             I don't think that the value exposed by the native
             platforms is particularly useful.  Really if the use case
             is to try to adapt the number of workers to a number that
             will allow you to run them all concurrently, that is not
             the same number as reported traditionally by the native
             platforms.


        Why not? How is the web platform different?


    Here's why I find the native platform parity argument unconvincing
    here.  This is not the only primitive that native platforms expose
    to make it possible for you to write apps that scale to the number
    of available cores.  For example, OS X provides GCD.  Windows
    provides at least two threadpool APIs.  Not sure if Linux directly
    addresses this problem right now.


I'm not familiar with the success of those frameworks. Asking around at
Adobe, so far I haven't found anyone who has used them.
Tuning the application depending on the number of CPUs is done quite often.

But do you have arguments on the specific problems I brought up which make this a bad idea? "Others do this" is just not going to convince me here.

    Another very important distinction between the Web platform and
    native platforms which is relevant here is the amount of abstraction
    that each platform provides on top of hardware.  Native platforms
    provide a much lower level of abstraction, and as a result, on such
    platforms at the very least you can control how many threads your
    own application spawns and keeps active.  We don't even have this
    level of control on the Web platform (applications are typically
    even unaware that you have multiple copies running in different tabs
    for example.)


I'm unsure how tabs are different from different processes.
As an author, I would certainly want my web workers to run in parallel.
Why else would I use workers to do number crunching?
Again, this is a problem that already exists and we're not trying to
solve it here.

What _is_ the problem that you're trying to solve here then? I thought that this API is supposed to give you a number of workers that the application should start so that it can keep all of the cores busy?
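To clarify what I mean by an API that actually scales, here is a rough sketch (hypothetical names, with plain async functions standing in for real Workers) of the pull-based alternative: tasks are drained by a bounded number of lanes, so any core-count estimate is only a soft ceiling, not a contract that N things run concurrently.

```javascript
// Hypothetical pull-based task queue: the core count caps the number
// of lanes in flight, but makes no promise of real parallelism.
class TaskQueue {
  constructor(maxLanes) {
    this.maxLanes = Math.max(1, maxLanes);
    this.tasks = [];
  }
  push(task) {
    // task: an async function returning a result
    this.tasks.push(task);
  }
  async drain() {
    // Run all queued tasks with at most maxLanes in flight at once.
    const results = [];
    const lane = async () => {
      while (this.tasks.length > 0) {
        const task = this.tasks.shift();
        results.push(await task());
      }
    };
    await Promise.all(Array.from({ length: this.maxLanes }, lane));
    return results;
  }
}
```

With a design like this, getting the lane count slightly wrong degrades gracefully, which is exactly what a fixed "spawn N workers" pattern does not do.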

    Also, please note that there are use cases on native platforms which
    don't really exist on the Web.  For example, on a desktop OS you
    might want to write a "system info" application which actually wants
    to list information about the hardware installed on the system.


             If you try Eli's test case in Firefox under different
             workloads (for example, while building Firefox, doing a
             disk intensive operation, etc.), the utter inaccuracy of
             the results is proof of the ineffectiveness of this number
             in my opinion.


        As Eli mentioned, you can run the algorithm for longer and get a
        more accurate result.


    I tried
    <http://wg.oftn.org/projects/customized-core-estimator/demo/> which
    is supposed to give you a more accurate estimate.  Have you tried
    that page when the system is under load in Firefox?

So did you try this?  :-)

        Again, if the native platform didn't support this, doing this
        in C++ would result in the same.


    Yes, exactly.  Which is why I don't really buy the argument that we
    should do this because native platforms do this.


I don't follow. Yes, the algorithm is imprecise and it would be just as
imprecise in C++.
There is no difference in behavior between the web platform and native.

My point is, I think you should have some evidence indicating why this is a good idea. So far I think the only argument has been the fact that this is exposed by other platforms.

             Also, I worry that this API is too focused on the
             past/present.  For example, I don't think anyone
             sufficiently addressed Boris' concern on the whatwg thread
             about AMP vs SMP systems.


        Can you provide a link to that?


    http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2014-May/296737.html


        Are there systems that expose this to the user? (AFAIK slow
        cores are substituted with fast ones on the fly.)


    I'm not sure about the details of how these cores are controlled,
    whether the control happens in hardware or in the OS, etc.  This is
    one aspect of this problem which needs more research before we can
    decide to implement and ship this, IMO.


Does Firefox behave differently on such systems? (Is it even supported
on these systems?)
If so, how are workers scheduled? In the end, even if the cores are
heterogeneous, knowing how many there are will keep them ALL busy
(which means more work is getting done).

I don't know the answer to any of these questions. I was hoping that you would do the research here. :-)

             This proposal also assumes that the UA itself is mostly
             content with using a single core, which is true for the
             current browser engines, but we're working on changing that
             assumption in Servo.  It also doesn't take into account the
             possibility of several of these web applications running at
             the same time.


        How is this different from the native platform?


    On the first point, I hope the difference is obvious.  Native apps
    don't typically run in a VM which provides highly sophisticated
    functionality for them.


See my long list of interpreted languages earlier in this email.
There are lots of VMs that support this and a lot of people are using it.

    And also they give you direct control over how many threads your
    "application" (which typically maps to an OS level process) spawns
    and when, what their priorities and affinities are, etc.  I think
    with that in mind, implementing this API as is in Gecko will be
    lying to the user (because we run some threads with higher priority
    than worker threads, for example our chrome workers, the
    MediaStreamGraph thread, etc.) and it would actually be harmful in
    Servo where the UA tries to get its hands on as many cores as it can
    to do things such as running script, layout, etc.


Why would that be? Are you burning more CPU resources in Servo to do the
same thing? If so, that sounds like a problem.
If not, the use case to scale your workload to more CPU cores is even
stronger, as similar tasks will finish faster.
For instance, if we have a system with 8 idle cores and we divide up a
64 second task:

    UA overhead = 2s + 8 * 8s -> 10s

    UA overhead over 2 threads = 2 * 1s + 8 * 8s -> 9s

What Boris said.

    On the second point, please see the paragraph above where I discuss
    that.
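For what it's worth, the UA-overhead arithmetic above can be written as a tiny wall-clock model (the 2s/1s overhead figures are the example's assumed numbers, not measurements):

```javascript
// Toy model: total wall-clock time = UA overhead (possibly spread over
// its own threads) plus the per-core share of the divided task.
function wallClock(workSeconds, cores, uaOverheadSeconds, uaThreads) {
  const perCore = workSeconds / cores;          // 64s / 8 cores = 8s
  const uaWall = uaOverheadSeconds / uaThreads; // overhead: serial vs split
  return uaWall + perCore;
}
```

With the numbers from the example, serializing the 2s of UA overhead gives 10s total, while splitting it over 2 threads gives 9s.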


             Until these issues are addressed, I do not think we should
             implement or ship this feature.


        FWIW these issues were already discussed in the WebKit bug.


    The issues that I bring up here are the ones that I think either
    have not been brought up before or have not been sufficiently
    addressed, so I'd appreciate it if you could try to address them
    sufficiently.  It could be that I'm wrong/misinformed and I would
    appreciate it if you would call me out on those points.


        I find it odd that we don't want to give authors access to such
        a basic
        feature. Not everything needs to be solved by a complex framework.


    You're asserting that navigator.hardwareConcurrency gives you a
    basic way of solving the use case of scaling computation over a
    number of worker threads.  I am rejecting that assertion here.  I am
    not arguing that we should not try to fix this problem, I'm just not
    convinced that the current API brings us any closer to solving it.


I'm not asserting anything. I want to give authors a hint so that they
can make a semi-informed decision about balancing their workload.
Even if there's a more general solution later on to solve that
particular problem, it will sometimes still be valuable to know the
layout of the system so you can best divide up the work.

I disagree. Let me try to rephrase my issue with this API. The number of available cores is not a constant equal to the number of logical cores exposed to us by the OS. It varies depending on everything else going on in the system, including things that the UA has control over and things that it does not. I hope the reason for my opposition is clear now.

Cheers,
Ehsan

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
