On 2014-05-13, 9:01 PM, Rik Cabanier wrote:



On Tue, May 13, 2014 at 3:16 PM, Ehsan Akhgari <ehsan.akhg...@gmail.com
<mailto:ehsan.akhg...@gmail.com>> wrote:


        ...


        That is not the point of this attribute. It's just a hint for
        the author
        so he can tune his application accordingly.
        Maybe the application is tuned to use fewer cores, or maybe
        more. It all
        depends...


    The problem is that the API doesn't really make it obvious that
    you're not supposed to take the value that the getter returns and
    just spawn N workers.  IOW, the API encourages the wrong behavior by
    design.


That is simply untrue.

I'm assuming that the goal of this API is to allow authors to spawn as many workers as possible so that they can exhaust all of the cores in the interest of finishing their computation faster. I have provided reasons why any higher-priority thread on the system that is busy doing work will make this number an over-approximation; I have given you two examples of higher-priority threads that we're currently shipping in Firefox (Chrome Workers and the MediaStreamGraph thread); and I have provided you with experimental evidence that running Eli's test case, which tries to exhaust as many cores as it can, fails to predict the number of cores in these situations. If you don't find any of this convincing, I'd respectfully suggest that we agree to disagree on this point.

For the sake of argument, let's say you are right. How are things worse
than before?

I don't think we should necessarily settle for a solution that is merely no worse than the status quo; I'm more interested in us implementing a good solution here (and yes, I'm aware that there is no concrete proposal out there that is better at this point.)

             (Note that I would be very eager to discuss a proposal that
        actually
             tries to solve that problem.)


        You should do that! People have brought this up in the past but no
        progress has been made in the last 2 years.
        However, if this simple attribute is able to stir people's
        emotions, can
        you imagine what would happen if you propose something complex? :-)


    Sorry, but I have a long list of things on my todo list, and
    honestly this one is not nearly close to the top of the list,
    because I'm not aware of people asking for this feature very often.
      I'm sure there are some people who would like it, but there are
    many problems that we are trying to solve here, and this one doesn't
    look very high priority.


That's fine but we're coming right back to the start: there is no way
for informed authors to make a decision today.

Yes, absolutely.

The "let's build something complex that solves everything" proposal
won't be done in a long time. Meanwhile apps can make responsive UI's
and fluid games.

That, I think, is one fundamental issue we're disagreeing on. I think that apps can build responsive UIs and fluid games on the Web today without this.

                 I don't have any other cases where this is done.


             That really makes me question the "positive feedback from web
             developers" cited in the original post on this thread.  Can you
             please point us to places where that feedback is documented?

        ...
        Python:

             multiprocessing.cpu_count()

             11,295 results

        
        https://github.com/search?q=multiprocessing.cpu_count%28%29+extension%3Apy&type=Code&ref=advsearch&l=

        ...
        Java:

             Runtime.getRuntime().availableProcessors()

             23,967 results

        
        https://github.com/search?q=availableProcessors%28%29+extension%3Ajava&type=Code&ref=searchresults

        ...

        node.js is also exposing it:

             require('os').cpus()

             4,851 results

        
        https://github.com/search?q=require%28%27os%27%29.cpus%28%29+extension%3Ajs&type=Code&ref=searchresults


    I don't view platform parity as a checklist of features, so I really
    have no interest in "checking this checkbox" just so that the Web
    platform can be listed in these kinds of lists.  Honestly a list of
    github hits without more information on what this value is actually
    used for etc. is not really that helpful.  We're not taking a vote
    of popularity here.  ;-)


Wait, you stated:

    Native apps don't typically run in a VM which provides highly
    sophisticated functionality for them.

and

    That really makes me question the "positive feedback from
    web developers" cited in the original post on this thread.

There were 24,000 hits for Java, which runs on the web and in a VM, but
now you say that it's not a vote of popularity?

We may have a different terminology here, but to me, "positive feedback from web developers" should indicate a large amount of demand from the web developer community for us to solve this problem at this point, and also a strong positive signal from them on this specific solution, with the flaws that I have described above in mind. That simply doesn't map to searching for API names in non-Web technologies on GitHub. :-)

Also, FTR, I strongly disagree that we should implement all popular Java APIs just because there is a way to run Java code on the web. ;-)

        ...
                 Why not? How is the web platform different?


             Here's why I find the native platform parity argument
        unconvincing
             here.  This is not the only primitive that native platforms
        expose
             to make it possible for you to write apps that scale to the
        number
             of available cores.  For example, OS X provides GCD.  Windows
             provides at least two threadpool APIs.  Not sure if Linux
        directly
             addresses this problem right now.


        I'm not familiar with the success of those frameworks. Asking
        around at
        Adobe, so far I haven't found anyone that has used them.
        Tuning the application depending on the number of CPU's is done
        quite often.


    But do you have arguments on the specific problems I brought up
    which make this a bad idea?


Can you restate the actual problem? I reread your message but didn't
find anything that indicates this is a bad idea.

See above where I re-described why this is not a good technical solution to achieve the goal of the API.

Also, as I've mentioned several times, this API basically ignores the fact that there are AMP systems shipping *today*, and does not take into account the fact that future Web engines may try to use as many cores as they can at a higher priority (Servo being one example.)

      "Others do this" is just not going to convince me here.

What would convince you? The fact that every other framework provides
this and people use it is not a strong indication?
It's not possible for me to find exact JavaScript examples that use this
feature, since it doesn't exist.

I'm obviously not asking you to create evidence of usage of an API which no engine has shipped yet. You originally cited strong positive feedback from web developers on this, and given that I have not seen that myself, I would like to know more about where those requests are coming from. In the absence of that, what would convince me would be good answers to all of the points that I've brought up several times in this thread (which I have summarized above.)

Please note that _if_ this were the single most requested feature that actually blocked people from building apps for the Web, I might have been inclined to go with a bad solution rather than no solution at all. And if you provide evidence of that, I'm willing to reconsider my position.

        ...

        I'm unsure how tabs are different from different processes.
        As an author, I would certainly want my web workers to run in
        parallel.
        Why else would I use workers to do number crunching?
        Again, this is a problem that already exists and we're not trying to
        solve it here.


    What _is_ the problem that you're trying to solve here then?  I
    thought that this API is supposed to give you a number of workers
    that the application should start so that it can keep all of the
    cores busy?


Make it possible for authors to make a semi-informed decision on how to
divide the work among workers.

That can already be done today using timing attacks, at the cost of wasting some CPU time. The question is whether we should do that right now.

In a good number of cases the pool will be smaller than the number of
cores (i.e., a game), or it might be bigger (see the WebKit bug that goes
over this).

Which part of the WebKit bug are you mentioning exactly? The only mention of "games" on the bug is https://bugs.webkit.org/show_bug.cgi?id=132588#c10 which seems to argue against your position. (It's not very easy to follow the discussion in that bug...)

             Also, please note that there are use cases on native
        platforms which
             don't really exist on the Web.  For example, on a desktop
        OS you
             might want to write a "system info" application which
        actually wants
             to list information about the hardware installed on the system.


I don't think that's all that important.

Well, you seem to imply that the reason why those platforms expose the number of cores is to support the use case under the discussion, and I'm challenging that assumption.

                      If you try Eli's test case in Firefox under different
                 workloads (for
                      example, while building Firefox, doing a disk
        intensive
                 operation,
                      etc.), the utter inaccuracy of the results is
         proof of the
                      ineffectiveness of this number in my opinion.


                 As Eli mentioned, you can run the algorithm for longer
        and get a
                 more
                 accurate result.


             I tried

        <http://wg.oftn.org/projects/customized-core-estimator/demo/>
        which
             is supposed to give you a more accurate estimate.  Have you
        tried
             that page when the system is under load in Firefox?


    So did you try this?  :-)


I did. As expected, it drops off as the load increases. I don't see what
this proves, except that the polyfill is unreliable, as posited.

It's an argument that the information, if exposed from the UA, will be *just* as unreliable.

              > Again, if the native platform didn't support this,

                 doing this in C++ would result in the same.


             Yes, exactly.  Which is why I don't really buy the argument
        that we
             should do this because native platforms do this.


        I don't follow. Yes, the algorithm is imprecise and it would be
        just as
        imprecise in C++.
        There is no difference in behavior between the web platform and
        native.


    My point is, I think you should have some evidence indicating why
    this is a good idea.  So far I think the only argument has been the
    fact that this is exposed by other platforms.


And used successfully on other platforms.
Note that it is exposed on PNaCl in Chrome as well.

So? PNaCl is a Chrome-specific technology, so it's not any more relevant to this discussion than Python, Perl, Java, etc. are.

        Does Firefox behave differently on such systems? (Is it even
        supported on
        these systems?)
        If so, how are workers scheduled? In the end, even if the cores are
        heterogeneous, knowing the number of them will keep them ALL
        busy (which
        means more work is getting done)


    I don't know the answer to any of these questions.  I was hoping
    that you would do the research here.  :-)


I did a little bit of research. As usual, Wikipedia is the easiest to
read: http://en.wikipedia.org/wiki/Big.LITTLE There are many other
papers [1] for more information.

In "In-kernel switcher" mode, the little CPUs are taken offline when
the big ones spool up. So, in this case the number of cores is half the
physical CPUs.
In "Heterogeneous multi-processing" mode, the big CPUs will help out when
the system load increases. In this case, the number of cores is equal to
the number of CPUs.

So which number is the one that the OS exposes to us in each case? And is that number constant no matter how many actual hardware cores are active at any given point in time?
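To illustrate why I'm asking, here is a hypothetical model based purely on the two modes described above; the function, its name, and the core counts are all made up for illustration:

```javascript
// Hypothetical model of the two big.LITTLE modes described above.
// Core counts are illustrative; real hardware and kernels may differ.
function reportedCores(bigCores, littleCores, mode) {
  if (mode === 'IKS') {
    // In-kernel switcher: big/LITTLE cores are paired, and only one half
    // of each pair is online at a time, so the OS sees half the total.
    return Math.max(bigCores, littleCores);
  }
  // Heterogeneous multi-processing: every core is visible at once.
  return bigCores + littleCores;
}

// The same 4+4 chip reports two different numbers depending on the mode.
console.log(reportedCores(4, 4, 'IKS')); // 4
console.log(reportedCores(4, 4, 'HMP')); // 8
```

The same silicon can therefore advertise two different core counts, which is exactly the ambiguity in question.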

                      This proposal also assumes that the UA itself is
        mostly
                 content
                      with using a single core, which is true for the
        current browser
                      engines, but we're working on changing that
        assumption in
                 Servo.  It
                      also doesn't take the possibility of several ones
        of these web
                      application running at the same time.


                 How is this different from the native platform?


             On the first point, I hope the difference is obvious.
          Native apps
             don't typically run in a VM which provides highly sophisticated
             functionality for them.


        ...

        Why would that be? Are you burning more CPU resources in servo
        to do the
        same thing? If so, that sounds like a problem.
        If not, the use case to scale your workload to more CPU cores is
        even
        better as similar tasks will end faster.
        For instance, if we have a system with 8 idle cores and we
        divide up a
        64 second task


    What Boris said.


He didn't refute that knowing the number of cores would still help.

I'm trying to do that here.  :-)

             UA overhead = 2s + 8 * 8s -> 10s

             UA overhead over 2 threads = 2 * 1s + 8 * 8s -> 9s

             On the second point, please see the paragraph above where I
        discuss
             that.
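For clarity, here is how I read the quoted overhead arithmetic (my interpretation, treating the figures as wall-clock time: the UA's fixed overhead, possibly spread across threads, followed by the 64-second task split across 8 workers running in parallel):

```javascript
// My reading of the quoted figures as wall-clock seconds; the numbers
// (2s UA overhead, 64s task, 8 cores) come from the example above.
function wallClockSeconds(overhead, overheadThreads, taskSeconds, cores) {
  const parallelWork = taskSeconds / cores;      // 64 / 8 = 8s in parallel
  const uaOverhead = overhead / overheadThreads; // UA overhead, parallelized
  return uaOverhead + parallelWork;
}

console.log(wallClockSeconds(2, 1, 64, 8)); // 10: overhead on one thread
console.log(wallClockSeconds(2, 2, 64, 8)); // 9: overhead on two threads
```

Under that reading, the 10s and 9s totals in the quote both follow.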


                      Until these issues are addressed, I do not think
        we should
                 implement
                      or ship this feature.


                 FWIW these issues were already discussed in the WebKit bug.


             The issues that I bring up here are the ones that I think
        have not
             either been brought up before or have not been sufficiently
             addressed, so I'd appreciate if you could try to address them
             sufficiently.  It could be that I'm wrong/misinformed and I
        would
             appreciate if you would call me out on those points.


                 I find it odd that we don't want to give authors access
        to such
                 a basic
                 feature. Not everything needs to be solved by a complex
        framework.


             You're asserting that navigator.hardwareConcurrency gives you a
             basic way of solving the use case of scaling computation over a
             number of worker threads.  I am rejecting that assertion
        here.  I am
             not arguing that we should not try to fix this problem, I'm
        just not
             convinced that the current API brings us any closer to
        solving it.


        I'm not asserting anything. I want to give authors an hint that
        they can
        make a semi-informed decision to balance their workload.
        Even if there's a more general solution later on to solve that
        particular problem, it will sometimes still be valuable to know the
        layout of the system so you can best divide up the work.


    I disagree.  Let me try to rephrase the issue with this.  The number
    of available cores is not a constant number equal to the number of
    logical cores exposed to us by the OS.  This number varies depending
    on everything else which is going on in the system, including the
    things that the UA has control over and the things that it does not.
      I hope the reason for my opposition is clear so far.


No, you failed to show why this does not apply to the web platform and
JavaScript in particular.

That is not a fair summary of everything I have said here so far. Please see the first paragraph of my response here where I summarize why I think this doesn't help the use case that it's trying to solve. You're of course welcome to disagree, but that doesn't mean that I've necessarily failed to show my side of the argument.

Your arguments apply equally to PNaCl, Java, native applications, and all
the other examples listed above

Yes they do!

yet they all provide this functionality
and people are using it to build successful applications.

1. PNaCl/Java/native platforms doing something doesn't make it right.
2. There is a reason why people have built more sophisticated solutions to this problem (GCD, Windows thread pools, etc.), so let's not close our eyes to those solutions and pretend that the number of cores is the only way to address this use case on native platforms.

Cheers,
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
