On 11/22/2011 07:27 PM, Marek Olšák wrote:
On Tue, Nov 22, 2011 at 11:11 PM, Ian Romanick <[email protected]> wrote:
All of this discussion is largely moot.  The failure that you're so angry
about was caused by a bug in the check, not by the check itself. That bug
has already been fixed (commit 151867b).

The exact same check was previously performed in st_glsl_to_tgsi (or
ir_to_mesa), and the exact same set of shaders would have been rejected.
The check is now done in the linker instead.

Actually, the bug only caught my attention and then I realized what is
actually happening in the linker. I probably wouldn't have noticed
otherwise, because I no longer do any 3D on my laptop with r500. I have
to admit, I didn't know the checks were so... well, "not ready for a
release" to say the least, and that is meant regardless of the bug.

Let's analyze the situation a bit, with an open mind.

The checks can be enabled for OpenGL ES 2.0 with no problem; we are
unlikely to get a failure there.

They can also be enabled for D3D10-level and later hardware, because its
limits are high enough that shaders are unlikely to fail there. The
problem is with D3D9-level hardware (and probably the vmware driver as
well).

Let me paraphrase this a little bit in a way that I think concisely captures the intention:

    "We need to work really hard to make things work on older hardware."

I don't think anyone disagrees with that. However, the solutions you have so far proposed to this problem have said:

    "We need to let anything through whether it will work or not."

Those are very different things. We can have the first without the second. I will fight very, very hard to not allow the second in any project with which I'm associated.

We also have to consider that a lot of applications are now developed on
D3D10-level or later hardware, and even though the intended hardware
requirements for such an app may be low, programming mistakes can raise
the actual requirements quite a lot. The app developer has no way to know
about it, because it just works on his machine. For example, some
compositing managers had such mistakes, and there has been a lot of
whining about that on Phoronix.
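(As an aside, the limits involved are queryable at run time, so an app can
at least log what the machine it actually runs on advertises. The sketch
below is illustrative only, not code from this thread; it assumes a GL 2.0
context, and the particular enums are just examples.)

/* Illustrative only: log a few of the GL 2.0 resource limits that a
 * shader can exceed on lower-end (D3D9-level) hardware.  Which limits
 * matter for a given app depends on its shaders. */
#include <GL/gl.h>
#include <stdio.h>

static void log_shader_limits(void)
{
   static const struct { GLenum pname; const char *name; } limits[] = {
      { GL_MAX_VERTEX_UNIFORM_COMPONENTS,   "max vertex uniform components" },
      { GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, "max fragment uniform components" },
      { GL_MAX_VARYING_FLOATS,              "max varying floats" },
      { GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS,  "max vertex texture image units" },
   };

   for (unsigned i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
      GLint value = 0;
      glGetIntegerv(limits[i].pname, &value);
      printf("%s: %d\n", limits[i].name, value);
   }
}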

We should also take into account that hardly any app has a fallback if a
shader program fails to link. VDrift has one, but that's rather the
exception to the rule (VDrift is an interesting example, though: it falls
back to fixed function just because Mesa is too strict about obeying the
specs, nothing more). Most apps just abort, crash, or completely ignore
that linking failed and render garbage or nothing. Wine, our biggest user
of Mesa, can't afford to fail: D3D shaders must compile successfully or
it's game over.
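(For reference, the fallback being described only requires checking the
link status after glLinkProgram. A minimal sketch, assuming a GL 2.0
context with prototypes exposed and a caller that has a fixed-function
path to fall back to; the function name is hypothetical.)

/* Minimal sketch of an application-side fallback: query GL_LINK_STATUS
 * and drop back to fixed function if linking failed. */
#define GL_GLEXT_PROTOTYPES 1
#include <GL/gl.h>
#include <stdio.h>

static GLboolean try_shader_path(GLuint prog)
{
   GLint ok = GL_FALSE;

   glLinkProgram(prog);
   glGetProgramiv(prog, GL_LINK_STATUS, &ok);
   if (!ok) {
      char log[1024];
      glGetProgramInfoLog(prog, sizeof(log), NULL, log);
      fprintf(stderr, "link failed, falling back to fixed function:\n%s\n", log);
      return GL_FALSE;   /* caller renders with the fixed-function path */
   }

   glUseProgram(prog);
   return GL_TRUE;
}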

Here's the deal about Wine and compositing (my spell checker always wants to make that word "composting") managers. All of the closed-source driver makers have developer outreach programs that work closely with tier-1 developers to make sure their apps work and run well. This is how they avoid a lot of these sorts of problems. It's unreasonable to expect any developer to test their product on every piece of hardware. We (the Mesa community) can't even manage that with our drivers. What we can do is try to prevent app developers from shooting themselves in the foot.

We've had a lot more communication with the Wine developers over the last year or so, and things have gotten a lot better there. We can and should be more proactive, but I'm not sure what form that should take. Pretty much everything we do with app developers is reactive. Right? We only interact with them when they come to us because something doesn't work or they got a bug report from a user.

Although the possibility of a linker failure is a nice feature in
theory, the reality is nobody wants it, because it's the primary cause
of apps aborting themselves or just rendering nothing (and, of course,
everybody blames Mesa, or worse: Linux).

By this same logic, malloc should never return NULL because most apps can't handle it. Instead it should mmap /dev/null and return a pointer to that. That analogy isn't as far off as it may seem: in both cases the underlying infrastructure has lied to the application that an operation succeeded, and it has given it a resource that it can't possibly use.

Surely nobody would suggest glibc do such a thing, much less implement it. We shouldn't either. Both cases may make some deployed application run. However, what happens to the poor schmuck writing an application that accidentally tries to allocate 5TB instead of 5MB? He spends hours trying to figure out why all his reads of malloc'ed memory give zeros.

My day job is writing OpenGL drivers. My evening job is teaching people how to write OpenGL applications. I have seen people try to debug OpenGL code, and it's already a miserable process. There are so many things that can lead to a mysterious black screen. Adding another by lying to the developer doesn't do anyone any favors. The code looks fine, the driver says it's fine, and it may even work fine on a different piece of hardware. That developer will blame Mesa, Linux, or OpenGL and probably ragequit.

There is quite a large possibility that if those linker checks were
disabled, more apps would work, especially those where the limits are
exceeded by only a little and the difference is eliminated by the
driver. Sure, some apps would still be broken or render garbage, but
it's either this or nothing, don't you think?

No, I don't think that at all. I think we can have more shaders run within hardware limits without letting things through that cannot run. There have been proposals on IRC, in at least one of the bug reports, and in this e-mail thread about how we could achieve that. It just requires some work. Good engineering is a real hassle that way. :)