Your message dated Wed, 19 Nov 2014 21:19:10 +0000
with message-id <e1xrcec-0004o0...@franck.debian.org>
and subject line Bug#768090: fixed in beignet 0.9.3~really.0.8+dfsg-1
has caused the Debian Bug report #768090,
regarding beignet: pow(n), erf(c), tgamma give wrong results
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
768090: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768090
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: beignet
Version: 0.9.3~dfsg-1
Severity: serious
Justification: a math library should NOT silently give wrong answers
Control: found -1 0.8-1.1
Control: tags -1 upstream patch

In current beignet (0.8, 0.9.3 and upstream-HEAD):
-pow/pown ignore the sign of their first argument (e.g. pow(-2,3) gives 8 instead of -8) -erf/erfc diverge (instead of converging to 1 or 0) for arguments above about 2 -tgamma is actually lgamma, a related but very different function (and the test suite doesn't notice because it checks against glibc's gammaf, which is also lgamma, instead of tgammaf)
#!/usr/bin/env python3
#Depends: python3-pyopencl python3-scipy
import pyopencl
import pyopencl.array
import numpy as np
import scipy.special as spfn
import time
ctx=pyopencl.create_some_context()
cq=pyopencl.CommandQueue(ctx)
a1=np.array(-0.95*np.random.rand(1e6)-0.01,dtype=np.float32)
a2=-1000*a1
a3=20*(a1+0.5)
ai=np.array(a3,dtype=np.int32)
aCL1=pyopencl.array.to_device(cq,a1)
aCL2=pyopencl.array.to_device(cq,a2)
aCL3=pyopencl.array.to_device(cq,a3)
aCLi=pyopencl.array.to_device(cq,ai)
bCL=aCL1+1
c=np.array(range(20),dtype=np.float32)/3-2
ci=np.array(c,dtype=np.int32)
cCL=pyopencl.array.to_device(cq,c)
cCLi=pyopencl.array.to_device(cq,ci)
dCL=cCL+1
def approx_erfc(x):
    p  =  0.3275911
    a1 =  0.254829592
    a2 = -0.284496736
    a3 =  1.421413741
    a4 = -1.453152027
    a5 =  1.061405429
    d = np.exp(-x*x)
    t = 1/(1+p*np.abs(x))
    r = (a1*t+a2*t*t+a3*t*t*t+a4*t*t*t*t+a5*t*t*t*t*t)*d
    return r*np.sign(x)+1-np.sign(x)
erfc_err=np.abs(spfn.erfc(a3)-approx_erfc(a3))
print("x:",c,"\n",ci)
print("erfc:",np.max(erfc_err),np.mean(erfc_err))
f_to_test=[("cos",np.cos),("sin",np.sin),("tan",np.tan),("cosh",np.cosh),("cospi",lambda x:np.cos(np.pi*x)),("tanh",np.tanh),("exp",np.exp),("log",np.log),("sqrt",np.sqrt),("acos",np.arccos),("acosh",np.arccosh),("pow(a[i],c[i])",lambda x,y:x**y),("pown(a[i],d[i])",lambda x,y:x**y),("tgamma",spfn.gamma),("erf",spfn.erf),("erfc",spfn.erfc)]
for f in f_to_test:
    for aCL in aCL1,aCL2,aCL3:
        fCL=pyopencl.elementwise.ElementwiseKernel(ctx,"float *a,float *b,float *c,int *d","b[i]="+f[0]+("" if f[0] in ("pow(a[i],c[i])","pown(a[i],d[i])","powr(a[i],c[i])") else "(a[i])"))
        t0=time.time()
        fCL(aCL,bCL,aCL3,aCLi).wait()
        t=time.time()-t0
        b=bCL.get()
        if f[0] in ("pow(a[i],c[i])","powr(a[i],c[i])"):
            b0=f[1](aCL.get(),a3)
        elif f[0] in ("pown(a[i],d[i])",):
            b0=f[1](aCL.get(),ai)
        else:
            b0=f[1](aCL.get())
        abserr=np.abs(b-b0)
        relerr=np.abs(b/b0-1)
        print(f[0],"abs avg err:",np.nanmean(abserr)," max err:",np.max(abserr),"rel avg err:",np.nanmean(relerr)," max err:",np.max(relerr)," time:",t)
    if f[0] in ("erf","erfc","tgamma","pown(a[i],d[i])","pow(a[i],c[i])"):
        fCL(cCL,dCL,cCL,cCLi).wait()
        if f[0] in ("pow(a[i],c[i])","powr(a[i],c[i])"):
            d0=f[1](c,c)
        elif f[0] in ("pown(a[i],d[i])",):
            d0=f[1](c,ci)
        else:
            d0=f[1](c)
        print(dCL.get(),"\n",d0)

Description: <short summary of the patch>
 TODO: Put a short summary on the line above and replace this paragraph
 with a longer explanation of this change. Complete the meta-information
 with other relevant fields (see below for details). To make it easier, the
 information below has been extracted from the changelog. Adjust it or drop
 it.
 .
 beignet (0.9.3~dfsg-2) UNRELEASED; urgency=medium
 .
   * Fix tgamma,pow,pown,erf,erfc
   * Enable debug output in tests
Author: Rebecca N. Palmer <rnpalmer@rnpalmer-laptop>

---
The information above should follow the Patch Tagging Guidelines, please
checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here
are templates for supplementary fields that you might want to add:

Origin: <vendor|upstream|other>, <url of original patch>
Bug: <url in upstream bugtracker>
Bug-Debian: https://bugs.debian.org/<bugnumber>
Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber>
Forwarded: <no|not-needed|url proving that it has been forwarded>
Reviewed-By: <name and email of someone who approved the patch>
Last-Update: <YYYY-MM-DD>

--- beignet-0.9.3~dfsg.orig/utests/builtin_acos_asin.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_acos_asin.cpp
@@ -2,12 +2,13 @@
 #include <cmath>
 #include <algorithm>
 
-#define udebug 0
+#define udebug 1
 #define printf_c(...) \
 {\
   printf("\033[1m\033[40;31m");\
   printf( __VA_ARGS__ );\
   printf("\033[0m");\
+  status = 1;\
 }
 
 const float input_data[] = {-30, -1, -0.92, -0.5, -0.09, 0, 0.09, 0.5, 0.92, 1, 30};
@@ -29,6 +30,7 @@ static void builtin_acos_asin(void)
 {
   // Setup kernel and buffers
   int k, i, index_cur;
+  int status = 0;
   float gpu_data[max_function * count_input] = {0}, cpu_data[max_function * count_input] = {0};
 
   OCL_CREATE_KERNEL("builtin_acos_asin");
@@ -82,6 +84,7 @@ static void builtin_acos_asin(void)
 #endif
     }
   }
+  OCL_ASSERT(status == 0);
 }
 
 MAKE_UTEST_FROM_FUNCTION(builtin_acos_asin)
--- beignet-0.9.3~dfsg.orig/utests/builtin_pow.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_pow.cpp
@@ -2,12 +2,13 @@
 #include <cmath>
 #include <algorithm>
 
-#define udebug 0
+#define udebug 1
 #define printf_c(...) \
 {\
   printf("\033[1m\033[40;31m");\
   printf( __VA_ARGS__ );\
   printf("\033[0m");\
+  status = 1;\
 }
 const float ori_data[] = {-20.5, -1, -0.9, -0.01, 0, 0.01, 0.9, 1.0, 20.5};
 const int count_input_ori = sizeof(ori_data) / sizeof(ori_data[0]);
@@ -27,6 +28,7 @@ static void builtin_pow(void)
 {
   // Setup kernel and buffers
   int k, i, index_cur;
+  int status = 0;
   float gpu_data[max_function * count_input] = {0}, cpu_data[max_function * count_input] = {0};
 
   for(i=0; i<count_input_ori;i++)
@@ -87,6 +89,7 @@ static void builtin_pow(void)
 #endif
     }
   }
+  OCL_ASSERT(status == 0);
 }
 
 MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(builtin_pow)
--- beignet-0.9.3~dfsg.orig/utests/utest_generator.py
+++ beignet-0.9.3~dfsg/utests/utest_generator.py
@@ -73,6 +73,7 @@ def udebug(ulpSize,returnType):
       }
 #endif
   }
+  OCL_ASSERTM(status == 0, log);
 }\n'''%(returnType,\
         ulpUnit(ulpSize),ulpNum(ulpSize),\
         ulpNum(ulpSize), ulpNum(ulpSize),\
@@ -139,7 +140,7 @@ which can print more values and informat
 #include <algorithm>
 #include <string.h>
 
-#define udebug 0
+#define udebug 1
 #define FLT_MAX 0x1.fffffep127f
 #define FLT_MIN 0x1.0p-126f
 #define INT_ULP 0
@@ -149,6 +150,7 @@ which can print more values and informat
   printf("\\033[1m\\033[40;31m");\\
   printf( __VA_ARGS__ );\\
   printf("\\033[0m");\\
+  status = 1;\\
 }
 '''
     #########Execute class itself
@@ -248,6 +250,7 @@ which can print more values and informat
 static void %s_%s(void)
 {
   int index;
+  int status = 0;
   %s gpu_data[count_input] = {0}, cpu_data[count_input] = {0}, diff=0.0;
   char log[1024] = {0};
 
Description: <short summary of the patch>
 TODO: Put a short summary on the line above and replace this paragraph
 with a longer explanation of this change. Complete the meta-information
 with other relevant fields (see below for details). To make it easier, the
 information below has been extracted from the changelog. Adjust it or drop
 it.
 .
 beignet (0.9.3~dfsg-2) UNRELEASED; urgency=medium
 .
   * Fix tgamma,pow,pown,erf,erfc
   * Enable debug output in tests
Author: Rebecca N. Palmer <rnpalmer@rnpalmer-laptop>

---
The information above should follow the Patch Tagging Guidelines, please
checkout http://dep.debian.net/deps/dep3/ to learn about the format. Here
are templates for supplementary fields that you might want to add:

Origin: <vendor|upstream|other>, <url of original patch>
Bug: <url in upstream bugtracker>
Bug-Debian: https://bugs.debian.org/<bugnumber>
Bug-Ubuntu: https://launchpad.net/bugs/<bugnumber>
Forwarded: <no|not-needed|url proving that it has been forwarded>
Reviewed-By: <name and email of someone who approved the patch>
Last-Update: <YYYY-MM-DD>

--- beignet-0.9.3~dfsg.orig/backend/src/builtin_vector_proto.def
+++ beignet-0.9.3~dfsg/backend/src/builtin_vector_proto.def
@@ -94,8 +94,7 @@ floatn pown (floatn x, intn y)
 float pown (float x, int y)
 doublen pown (doublen x, intn y)
 double pown (double x, int y)
-#XXX we define powr as pow
-#gentype powr (gentype x, gentype y)
+gentype powr (gentype x, gentype y)
 gentype remainder (gentype x, gentype y)
 floatn remquo (floatn x, floatn y, __global intn *quo)
 floatn remquo (floatn x, floatn y, __local intn *quo)
--- beignet-0.9.3~dfsg.orig/backend/src/ocl_stdlib.tmpl.h
+++ beignet-0.9.3~dfsg/backend/src/ocl_stdlib.tmpl.h
@@ -1565,189 +1565,6 @@ INLINE_OVERLOADABLE float native_log2(fl
 INLINE_OVERLOADABLE float native_log(float x) {
   return native_log2(x) * 0.6931472002f;
 }
-INLINE_OVERLOADABLE float tgamma(float x) {
-/*
- * ====================================================
- * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
- *
- * Developed at SunPro, a Sun Microsystems, Inc. business.
- * Permission to use, copy, modify, and distribute this
- * software is freely granted, provided that this notice
- * is preserved.
- * ====================================================
- */
-  float pi = 3.1415927410e+00,
-    a0 = 7.7215664089e-02,
-    a1 = 3.2246702909e-01,
-    a2 = 6.7352302372e-02,
-    a3 = 2.0580807701e-02,
-    a4 = 7.3855509982e-03,
-    a5 = 2.8905137442e-03,
-    a6 = 1.1927076848e-03,
-    a7 = 5.1006977446e-04,
-    a8 = 2.2086278477e-04,
-    a9 = 1.0801156895e-04,
-    a10 = 2.5214456400e-05,
-    a11 = 4.4864096708e-05,
-    tc = 1.4616321325e+00,
-    tf = -1.2148628384e-01,
-    tt = 6.6971006518e-09,
-    t0 = 4.8383611441e-01,
-    t1 = -1.4758771658e-01,
-    t2 = 6.4624942839e-02,
-    t3 = -3.2788541168e-02,
-    t4 = 1.7970675603e-02,
-    t5 = -1.0314224288e-02,
-    t6 = 6.1005386524e-03,
-    t7 = -3.6845202558e-03,
-    t8 = 2.2596477065e-03,
-    t9 = -1.4034647029e-03,
-    t10 = 8.8108185446e-04,
-    t11 = -5.3859531181e-04,
-    t12 = 3.1563205994e-04,
-    t13 = -3.1275415677e-04,
-    t14 = 3.3552918467e-04,
-    u0 = -7.7215664089e-02,
-    u1 = 6.3282704353e-01,
-    u2 = 1.4549225569e+00,
-    u3 = 9.7771751881e-01,
-    u4 = 2.2896373272e-01,
-    u5 = 1.3381091878e-02,
-    v1 = 2.4559779167e+00,
-    v2 = 2.1284897327e+00,
-    v3 = 7.6928514242e-01,
-    v4 = 1.0422264785e-01,
-    v5 = 3.2170924824e-03,
-    s0 = -7.7215664089e-02,
-    s1 = 2.1498242021e-01,
-    s2 = 3.2577878237e-01,
-    s3 = 1.4635047317e-01,
-    s4 = 2.6642270386e-02,
-    s5 = 1.8402845599e-03,
-    s6 = 3.1947532989e-05,
-    r1 = 1.3920053244e+00,
-    r2 = 7.2193557024e-01,
-    r3 = 1.7193385959e-01,
-    r4 = 1.8645919859e-02,
-    r5 = 7.7794247773e-04,
-    r6 = 7.3266842264e-06,
-    w0 = 4.1893854737e-01,
-    w1 = 8.3333335817e-02,
-    w2 = -2.7777778450e-03,
-    w3 = 7.9365057172e-04,
-    w4 = -5.9518753551e-04,
-    w5 = 8.3633989561e-04,
-    w6 = -1.6309292987e-03;
-  float t, y, z, nadj, p, p1, p2, p3, q, r, w;
-  int i, hx, ix;
-  nadj = 0;
-  hx = *(int *) (&x);
-  ix = hx & 0x7fffffff;
-  if (ix >= 0x7f800000)
-    return x * x;
-  if (ix == 0)
-    return INFINITY;
-  if (ix < 0x1c800000) {
-    if (hx < 0) {
-      return - native_log(-x);
-    } else
-      return - native_log(x);
-  }
-  if (hx < 0) {
-    if (ix >= 0x4b000000)
-      return INFINITY;
-    t = __gen_ocl_internal_sinpi(x);
-    if (__gen_ocl_fabs(t) < 1e-8f)
-      return INFINITY;
-    nadj = native_log(M_PI_F / __gen_ocl_fabs(t * x));
-    x = -x;
-  }
-
-  if (ix == 0x3f800000 || ix == 0x40000000)
-    r = 0;
-  else if (ix < 0x40000000) {
-    if (ix <= 0x3f666666) {
-      r = - native_log(x);
-      if (ix >= 0x3f3b4a20) {
-        y = 1 - x;
-        i = 0;
-      } else if (ix >= 0x3e6d3308) {
-        y = x - (tc - 1);
-        i = 1;
-      } else {
-        y = x;
-        i = 2;
-      }
-    } else {
-      r = 0;
-      if (ix >= 0x3fdda618) {
-        y = 2 - x;
-        i = 0;
-      } else if (ix >= 0x3F9da620) {
-        y = x - tc;
-        i = 1;
-      } else {
-        y = x - 1;
-        i = 2;
-      }
-    }
-    switch (i) {
-    case 0:
-      z = y * y;
-      p1 = a0 + z * (a2 + z * (a4 + z * (a6 + z * (a8 + z * a10))));
-      p2 = z * (a1 + z * (a3 + z * (a5 + z * (a7 + z * (a9 + z * a11)))));
-      p = y * p1 + p2;
-      r += (p - .5f * y);
-      break;
-    case 1:
-      z = y * y;
-      w = z * y;
-      p1 = t0 + w * (t3 + w * (t6 + w * (t9 + w * t12)));
-      p2 = t1 + w * (t4 + w * (t7 + w * (t10 + w * t13)));
-      p3 = t2 + w * (t5 + w * (t8 + w * (t11 + w * t14)));
-      p = z * p1 - (tt - w * (p2 + y * p3));
-      r += (tf + p);
-      break;
-    case 2:
-      p1 = y * (u0 + y * (u1 + y * (u2 + y * (u3 + y * (u4 + y * u5)))));
-      p2 = 1 + y * (v1 + y * (v2 + y * (v3 + y * (v4 + y * v5))));
-      r += (-.5f * y + p1 / p2);
-    }
-  } else if (ix < 0x41000000) {
-    i = x;
-    t = 0;
-    y = x - i;
-    p = y*(s0+y*(s1+y*(s2+y*(s3+y*(s4+y*(s5+y*s6))))));
-    q = 1 + y * (r1 + y * (r2 + y * (r3 + y * (r4 + y * (r5 + y * r6)))));
-    r = .5f * y + p / q;
-    z = 1;
-    switch (i) {
-    case 7:
-      z *= (y + 6.f);
-    case 6:
-      z *= (y + 5.f);
-    case 5:
-      z *= (y + 4.f);
-    case 4:
-      z *= (y + 3.f);
-    case 3:
-      z *= (y + 2.f);
-      r += native_log(z);
-      break;
-    }
-  } else if (ix < 0x5c800000) {
-    t = native_log(x);
-    z = 1 / x;
-    y = z * z;
-    w = w0 + z * (w1 + y * (w2 + y * (w3 + y * (w4 + y * (w5 + y * w6)))));
-    r = (x - .5f) * (t - 1) + w;
-  } else
-    r = x * (native_log(x) - 1);
-  if (hx < 0)
-    r = nadj - r;
-  return r;
-}
-
 INLINE_OVERLOADABLE float lgamma(float x) {
 /*
  * ====================================================
@@ -2501,13 +2318,6 @@ INLINE_OVERLOADABLE float __gen_ocl_inte
 INLINE_OVERLOADABLE float __gen_ocl_internal_atanpi(float x) {
   return __gen_ocl_internal_atan(x) / M_PI_F;
 }
-INLINE_OVERLOADABLE float __gen_ocl_internal_erf(float x) {
-  return M_2_SQRTPI_F * (x - __gen_ocl_pow(x, 3) / 3 + __gen_ocl_pow(x, 5) / 10 - __gen_ocl_pow(x, 7) / 42 + __gen_ocl_pow(x, 9) / 216);
-}
-INLINE_OVERLOADABLE float __gen_ocl_internal_erfc(float x) {
-  return 1 - __gen_ocl_internal_erf(x);
-}
-
 // XXX work-around PTX profile
 #define sqrt native_sqrt
 INLINE_OVERLOADABLE float rsqrt(float x) { return native_rsqrt(x); }
@@ -2726,6 +2536,295 @@ INLINE_OVERLOADABLE float __gen_ocl_inte
     return y*twom100;
   }
 }
+INLINE_OVERLOADABLE float tgamma(float x) {
+  float y;
+  int s;
+  y=lgamma_r(x,&s);
+  return __gen_ocl_internal_exp(y)*s;
+}
+
+/* erf,erfc from glibc s_erff.c -- float version of s_erf.c.
+ * Conversion to float by Ian Lance Taylor, Cygnus Support, i...@cygnus.com.
+ */
+
+/*
+ * ====================================================
+ * Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
+ *
+ * Developed at SunPro, a Sun Microsystems, Inc. business.
+ * Permission to use, copy, modify, and distribute this
+ * software is freely granted, provided that this notice
+ * is preserved.
+ * ====================================================
+ */
+
+INLINE_OVERLOADABLE float __gen_ocl_internal_erf(float x) {
+/*...*/
+const float
+tiny = 1.0e-30,
+half_val=  5.0000000000e-01, /* 0x3F000000 */
+one =  1.0000000000e+00, /* 0x3F800000 */
+two =  2.0000000000e+00, /* 0x40000000 */
+	/* c = (subfloat)0.84506291151 */
+erx =  8.4506291151e-01, /* 0x3f58560b */
+/*
+ * Coefficients for approximation to  erf on [0,0.84375]
+ */
+efx =  1.2837916613e-01, /* 0x3e0375d4 */
+efx8=  1.0270333290e+00, /* 0x3f8375d4 */
+pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
+pp1  = -3.2504209876e-01, /* 0xbea66beb */
+pp2  = -2.8481749818e-02, /* 0xbce9528f */
+pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
+pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
+qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
+qq2  =  6.5022252500e-02, /* 0x3d852a63 */
+qq3  =  5.0813062117e-03, /* 0x3ba68116 */
+qq4  =  1.3249473704e-04, /* 0x390aee49 */
+qq5  = -3.9602282413e-06, /* 0xb684e21a */
+/*
+ * Coefficients for approximation to  erf  in [0.84375,1.25]
+ */
+pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
+pa1  =  4.1485610604e-01, /* 0x3ed46805 */
+pa2  = -3.7220788002e-01, /* 0xbebe9208 */
+pa3  =  3.1834661961e-01, /* 0x3ea2fe54 */
+pa4  = -1.1089469492e-01, /* 0xbde31cc2 */
+pa5  =  3.5478305072e-02, /* 0x3d1151b3 */
+pa6  = -2.1663755178e-03, /* 0xbb0df9c0 */
+qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
+qa2  =  5.4039794207e-01, /* 0x3f0a5785 */
+qa3  =  7.1828655899e-02, /* 0x3d931ae7 */
+qa4  =  1.2617121637e-01, /* 0x3e013307 */
+qa5  =  1.3637083583e-02, /* 0x3c5f6e13 */
+qa6  =  1.1984500103e-02, /* 0x3c445aa3 */
+ /*
+ * Coefficients for approximation to  erfc in [1.25,1/0.35]
+ */ra0  = -9.8649440333e-03, /* 0xbc21a093 */
+ra1  = -6.9385856390e-01, /* 0xbf31a0b7 */
+ra2  = -1.0558626175e+01, /* 0xc128f022 */
+ra3  = -6.2375331879e+01, /* 0xc2798057 */
+ra4  = -1.6239666748e+02, /* 0xc322658c */
+ra5  = -1.8460508728e+02, /* 0xc3389ae7 */
+ra6  = -8.1287437439e+01, /* 0xc2a2932b */
+ra7  = -9.8143291473e+00, /* 0xc11d077e */
+sa1  =  1.9651271820e+01, /* 0x419d35ce */
+sa2  =  1.3765776062e+02, /* 0x4309a863 */
+sa3  =  4.3456588745e+02, /* 0x43d9486f */
+sa4  =  6.4538726807e+02, /* 0x442158c9 */
+sa5  =  4.2900814819e+02, /* 0x43d6810b */
+sa6  =  1.0863500214e+02, /* 0x42d9451f */
+sa7  =  6.5702495575e+00, /* 0x40d23f7c */
+sa8  = -6.0424413532e-02, /* 0xbd777f97 */
+/*
+ * Coefficients for approximation to  erfc in [1/.35,28]
+ */
+rb0  = -9.8649431020e-03, /* 0xbc21a092 */
+rb1  = -7.9928326607e-01, /* 0xbf4c9dd4 */
+rb2  = -1.7757955551e+01, /* 0xc18e104b */
+rb3  = -1.6063638306e+02, /* 0xc320a2ea */
+rb4  = -6.3756646729e+02, /* 0xc41f6441 */
+rb5  = -1.0250950928e+03, /* 0xc480230b */
+rb6  = -4.8351919556e+02, /* 0xc3f1c275 */
+sb1  =  3.0338060379e+01, /* 0x41f2b459 */
+sb2  =  3.2579251099e+02, /* 0x43a2e571 */
+sb3  =  1.5367296143e+03, /* 0x44c01759 */
+sb4  =  3.1998581543e+03, /* 0x4547fdbb */
+sb5  =  2.5530502930e+03, /* 0x451f90ce */
+sb6  =  4.7452853394e+02, /* 0x43ed43a7 */
+sb7  = -2.2440952301e+01; /* 0xc1b38712 */
+
+	int hx,ix,i;
+	float R,S,P,Q,s,y,z,r;
+	GEN_OCL_GET_FLOAT_WORD(hx,x);
+	ix = hx&0x7fffffff;
+	if(ix>=0x7f800000) {		/* erf(nan)=nan */
+	    i = ((unsigned int)hx>>31)<<1;
+	    return (float)(1-i)+one/x;	/* erf(+-inf)=+-1 */
+	}
+
+	if(ix < 0x3f580000) {		/* |x|<0.84375 */
+	    if(ix < 0x31800000) { 	/* |x|<2**-28 */
+	        if (ix < 0x04000000)
+		    /*avoid underflow */
+		    return (float)0.125*((float)8.0*x+efx8*x);
+		return x + efx*x;
+	    }
+	    z = x*x;
+	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
+	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    y = r/s;
+	    return x + x*y;
+	}
+	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
+	    s = __gen_ocl_internal_fabs(x)-one;
+	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    if(hx>=0) return erx + P/Q; else return -erx - P/Q;
+	}
+	if (ix >= 0x40c00000) {		/* inf>|x|>=6 */
+	    if(hx>=0) return one-tiny; else return tiny-one;
+	}
+	x = __gen_ocl_internal_fabs(x);
+ 	s = one/(x*x);
+	if(ix< 0x4036DB6E) {	/* |x| < 1/0.35 */
+	    R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
+				ra5+s*(ra6+s*ra7))))));
+	    S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
+				sa5+s*(sa6+s*(sa7+s*sa8)))))));
+	} else {	/* |x| >= 1/0.35 */
+	    R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
+				rb5+s*rb6)))));
+	    S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
+				sb5+s*(sb6+s*sb7))))));
+	}
+	GEN_OCL_GET_FLOAT_WORD(ix,x);
+	GEN_OCL_SET_FLOAT_WORD(z,ix&0xfffff000);
+	r  =  __gen_ocl_internal_exp(-z*z-(float)0.5625)*__gen_ocl_internal_exp((z-x)*(z+x)+R/S);
+	if(hx>=0) return one-r/x; else return  r/x-one;
+}
+INLINE_OVERLOADABLE float __gen_ocl_internal_erfc(float x) {
+/*...*/
+const float
+tiny = 1.0e-30,
+half_val=  5.0000000000e-01, /* 0x3F000000 */
+one =  1.0000000000e+00, /* 0x3F800000 */
+two =  2.0000000000e+00, /* 0x40000000 */
+	/* c = (subfloat)0.84506291151 */
+erx =  8.4506291151e-01, /* 0x3f58560b */
+/*
+ * Coefficients for approximation to  erf on [0,0.84375]
+ */
+efx =  1.2837916613e-01, /* 0x3e0375d4 */
+efx8=  1.0270333290e+00, /* 0x3f8375d4 */
+pp0  =  1.2837916613e-01, /* 0x3e0375d4 */
+pp1  = -3.2504209876e-01, /* 0xbea66beb */
+pp2  = -2.8481749818e-02, /* 0xbce9528f */
+pp3  = -5.7702702470e-03, /* 0xbbbd1489 */
+pp4  = -2.3763017452e-05, /* 0xb7c756b1 */
+qq1  =  3.9791721106e-01, /* 0x3ecbbbce */
+qq2  =  6.5022252500e-02, /* 0x3d852a63 */
+qq3  =  5.0813062117e-03, /* 0x3ba68116 */
+qq4  =  1.3249473704e-04, /* 0x390aee49 */
+qq5  = -3.9602282413e-06, /* 0xb684e21a */
+/*
+ * Coefficients for approximation to  erf  in [0.84375,1.25]
+ */
+pa0  = -2.3621185683e-03, /* 0xbb1acdc6 */
+pa1  =  4.1485610604e-01, /* 0x3ed46805 */
+pa2  = -3.7220788002e-01, /* 0xbebe9208 */
+pa3  =  3.1834661961e-01, /* 0x3ea2fe54 */
+pa4  = -1.1089469492e-01, /* 0xbde31cc2 */
+pa5  =  3.5478305072e-02, /* 0x3d1151b3 */
+pa6  = -2.1663755178e-03, /* 0xbb0df9c0 */
+qa1  =  1.0642088205e-01, /* 0x3dd9f331 */
+qa2  =  5.4039794207e-01, /* 0x3f0a5785 */
+qa3  =  7.1828655899e-02, /* 0x3d931ae7 */
+qa4  =  1.2617121637e-01, /* 0x3e013307 */
+qa5  =  1.3637083583e-02, /* 0x3c5f6e13 */
+qa6  =  1.1984500103e-02, /* 0x3c445aa3 */
+ /*
+ * Coefficients for approximation to  erfc in [1.25,1/0.35]
+ */ra0  = -9.8649440333e-03, /* 0xbc21a093 */
+ra1  = -6.9385856390e-01, /* 0xbf31a0b7 */
+ra2  = -1.0558626175e+01, /* 0xc128f022 */
+ra3  = -6.2375331879e+01, /* 0xc2798057 */
+ra4  = -1.6239666748e+02, /* 0xc322658c */
+ra5  = -1.8460508728e+02, /* 0xc3389ae7 */
+ra6  = -8.1287437439e+01, /* 0xc2a2932b */
+ra7  = -9.8143291473e+00, /* 0xc11d077e */
+sa1  =  1.9651271820e+01, /* 0x419d35ce */
+sa2  =  1.3765776062e+02, /* 0x4309a863 */
+sa3  =  4.3456588745e+02, /* 0x43d9486f */
+sa4  =  6.4538726807e+02, /* 0x442158c9 */
+sa5  =  4.2900814819e+02, /* 0x43d6810b */
+sa6  =  1.0863500214e+02, /* 0x42d9451f */
+sa7  =  6.5702495575e+00, /* 0x40d23f7c */
+sa8  = -6.0424413532e-02, /* 0xbd777f97 */
+/*
+ * Coefficients for approximation to  erfc in [1/.35,28]
+ */
+rb0  = -9.8649431020e-03, /* 0xbc21a092 */
+rb1  = -7.9928326607e-01, /* 0xbf4c9dd4 */
+rb2  = -1.7757955551e+01, /* 0xc18e104b */
+rb3  = -1.6063638306e+02, /* 0xc320a2ea */
+rb4  = -6.3756646729e+02, /* 0xc41f6441 */
+rb5  = -1.0250950928e+03, /* 0xc480230b */
+rb6  = -4.8351919556e+02, /* 0xc3f1c275 */
+sb1  =  3.0338060379e+01, /* 0x41f2b459 */
+sb2  =  3.2579251099e+02, /* 0x43a2e571 */
+sb3  =  1.5367296143e+03, /* 0x44c01759 */
+sb4  =  3.1998581543e+03, /* 0x4547fdbb */
+sb5  =  2.5530502930e+03, /* 0x451f90ce */
+sb6  =  4.7452853394e+02, /* 0x43ed43a7 */
+sb7  = -2.2440952301e+01; /* 0xc1b38712 */
+	int hx,ix;
+	float R,S,P,Q,s,y,z,r;
+	GEN_OCL_GET_FLOAT_WORD(hx,x);
+	ix = hx&0x7fffffff;
+	if(ix>=0x7f800000) {			/* erfc(nan)=nan */
+						/* erfc(+-inf)=0,2 */
+	    return (float)(((unsigned int)hx>>31)<<1)+one/x;
+	}
+
+	if(ix < 0x3f580000) {		/* |x|<0.84375 */
+	    if(ix < 0x23800000)  	/* |x|<2**-56 */
+		return one-x;
+	    z = x*x;
+	    r = pp0+z*(pp1+z*(pp2+z*(pp3+z*pp4)));
+	    s = one+z*(qq1+z*(qq2+z*(qq3+z*(qq4+z*qq5))));
+	    y = r/s;
+	    if(hx < 0x3e800000) {  	/* x<1/4 */
+		return one-(x+x*y);
+	    } else {
+		r = x*y;
+		r += (x-half_val);
+	        return half_val - r ;
+	    }
+	}
+	if(ix < 0x3fa00000) {		/* 0.84375 <= |x| < 1.25 */
+	    s = __gen_ocl_internal_fabs(x)-one;
+	    P = pa0+s*(pa1+s*(pa2+s*(pa3+s*(pa4+s*(pa5+s*pa6)))));
+	    Q = one+s*(qa1+s*(qa2+s*(qa3+s*(qa4+s*(qa5+s*qa6)))));
+	    if(hx>=0) {
+	        z  = one-erx; return z - P/Q;
+	    } else {
+		z = erx+P/Q; return one+z;
+	    }
+	}
+	if (ix < 0x41e00000) {		/* |x|<28 */
+	    x = __gen_ocl_internal_fabs(x);
+ 	    s = one/(x*x);
+	    if(ix< 0x4036DB6D) {	/* |x| < 1/.35 ~ 2.857143*/
+	        R=ra0+s*(ra1+s*(ra2+s*(ra3+s*(ra4+s*(
+				ra5+s*(ra6+s*ra7))))));
+	        S=one+s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(
+				sa5+s*(sa6+s*(sa7+s*sa8)))))));
+	    } else {			/* |x| >= 1/.35 ~ 2.857143 */
+		if(hx<0&&ix>=0x40c00000) return two-tiny;/* x < -6 */
+	        R=rb0+s*(rb1+s*(rb2+s*(rb3+s*(rb4+s*(
+				rb5+s*rb6)))));
+	        S=one+s*(sb1+s*(sb2+s*(sb3+s*(sb4+s*(
+				sb5+s*(sb6+s*sb7))))));
+	    }
+	    GEN_OCL_GET_FLOAT_WORD(ix,x);
+	    GEN_OCL_SET_FLOAT_WORD(z,ix&0xffffe000);
+	    r  =  __gen_ocl_internal_exp(-z*z-(float)0.5625)*
+			__gen_ocl_internal_exp((z-x)*(z+x)+R/S);
+	    if(hx>0) {
+		float ret = r/x;
+		return ret;
+	    } else
+		return two-r/x;
+	} else {
+	    if(hx>0) {
+		return tiny*tiny;
+	    } else
+		return two-tiny;
+	}
+}
+
+
 INLINE_OVERLOADABLE float __gen_ocl_internal_fmod (float x, float y) {
   //return x-y*__gen_ocl_rndz(x/y);
   float one = 1.0;
@@ -3190,7 +3289,6 @@ INLINE_OVERLOADABLE float __gen_ocl_inte
 #define atan2pi __gen_ocl_internal_atan2pi
 #define atanpi __gen_ocl_internal_atanpi
 #define atanh __gen_ocl_internal_atanh
-#define pow powr
 #define cbrt __gen_ocl_internal_cbrt
 #define rint __gen_ocl_internal_rint
 #define copysign __gen_ocl_internal_copysign
@@ -3729,10 +3827,23 @@ INLINE_OVERLOADABLE float remquo(float x
 #undef BODY
 INLINE_OVERLOADABLE float native_divide(float x, float y) { return x/y; }
 INLINE_OVERLOADABLE float pown(float x, int n) {
-  if (x == 0 && n == 0)
-    return 1;
+  if (x == 0.f && n == 0)
+    return 1.f;
+  if (x < 0.f && (n&1) )
+    return -powr(-x, n);
   return powr(x, n);
 }
+INLINE_OVERLOADABLE float pow(float x, float y) {
+  int n;
+  if (x == 0.f && y == 0.f)
+    return 1.f;
+  if (x >= 0.f)
+    return powr(x, y);
+  n = y;
+  if ((float)n == y)//is exact integer
+    return pown(x, n);
+  return NAN;
+}
 
 INLINE_OVERLOADABLE float internal_rootn(float x, int n, const bool isFastpath)
 {
--- beignet-0.9.3~dfsg.orig/utests/builtin_pow.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_pow.cpp
@@ -16,12 +16,12 @@ const int count_input = count_input_ori
 
 float input_data1[count_input];
 float input_data2[count_input];
-const int max_function = 1;
+const int max_function = 2; // builtin_pow.cl has 2 outputs: pow(src1,src2) and src1
 
 static void cpu_compiler_math(const float *src1, const float *src2, float *dst)
 {
   dst[0] = powf(src1[0], src2[0]);
-//  dst[1] = src1[0];
+  dst[1] = src1[0];
 }
 
 static void builtin_pow(void)
@@ -92,4 +92,4 @@ static void builtin_pow(void)
   OCL_ASSERT(status == 0);
 }
 
-MAKE_UTEST_FROM_FUNCTION_WITH_ISSUE(builtin_pow)
+MAKE_UTEST_FROM_FUNCTION(builtin_pow)
--- beignet-0.9.3~dfsg.orig/utests/builtin_tgamma.cpp
+++ beignet-0.9.3~dfsg/utests/builtin_tgamma.cpp
@@ -27,7 +27,7 @@ void builtin_tgamma(void)
     OCL_MAP_BUFFER(1);
     float *dst = (float*)buf_data[1];
     for (int i = 0; i < n; ++i) {
-      float cpu = gammaf(src[i]);
+      float cpu = tgammaf(src[i]);
       if (isinf(cpu)) {
         OCL_ASSERT(isinf(dst[i]));
       } else if (fabsf(cpu - dst[i]) >= 1e-3) {
--- beignet-0.9.3~dfsg.orig/utests/utest_math_gen.py
+++ beignet-0.9.3~dfsg/utests/utest_math_gen.py
@@ -216,17 +216,17 @@ static float atanpi(float x){
   cospi_cpu_func=reduce1+cospi
   cospiUtests = func('cospi','cospi',[cospi_input_type],cospi_output_type,[cospi_input_values],'2 * FLT_ULP',cospi_cpu_func)
   
-#  ##### gentype erf(gentype)
-#  erf_input_values = base_input_values
-#  erf_input_type = ['float','float2','float4','float8','float16']
-#  erf_output_type = ['float','float2','float4','float8','float16']
-#  erfUtests = func('erf','erf',[erf_input_type],erf_output_type,[erf_input_values],'16 * FLT_ULP')
+  ##### gentype erf(gentype)
+  erf_input_values = base_input_values
+  erf_input_type = ['float','float2','float4','float8','float16']
+  erf_output_type = ['float','float2','float4','float8','float16']
+  erfUtests = func('erf','erf',[erf_input_type],erf_output_type,[erf_input_values],'16 * FLT_ULP')
 
-#  ##### gentype erfc(gentype)
-#  erfc_input_values = base_input_values
-#  erfc_input_type = ['float','float2','float4','float8','float16']
-#  erfc_output_type = ['float','float2','float4','float8','float16']
-#  erfcUtests = func('erfc','erfc',[erfc_input_type],erfc_output_type,[erfc_input_values],'16 * FLT_ULP')
+  ##### gentype erfc(gentype)
+  erfc_input_values = base_input_values
+  erfc_input_type = ['float','float2','float4','float8','float16']
+  erfc_output_type = ['float','float2','float4','float8','float16']
+  erfcUtests = func('erfc','erfc',[erfc_input_type],erfc_output_type,[erfc_input_values],'16 * FLT_ULP')
   
   ##### gentype exp(gentype x)
   exp_input_values = base_input_values

--- End Message ---
--- Begin Message ---
Source: beignet
Source-Version: 0.9.3~really.0.8+dfsg-1

We believe that the bug you reported is fixed in the latest version of
beignet, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 768...@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Andreas Beckmann <a...@debian.org> (supplier of updated beignet package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmas...@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Wed, 19 Nov 2014 19:52:26 +0100
Source: beignet
Binary: beignet-dev beignet
Architecture: source
Version: 0.9.3~really.0.8+dfsg-1
Distribution: unstable
Urgency: medium
Maintainer: Debian OpenCL Maintainers <pkg-opencl-de...@lists.alioth.debian.org>
Changed-By: Andreas Beckmann <a...@debian.org>
Description:
 beignet    - OpenCL library for Intel Ivy Bridge GPUs
 beignet-dev - Intel OpenCL library
Closes: 764930 767387 768090
Changes:
 beignet (0.9.3~really.0.8+dfsg-1) unstable; urgency=medium
 .
   [ Andreas Beckmann ]
   * Set Maintainer to "Debian OpenCL Maintainers" with Simon's permission.
   * Add Simon Richter, Rebecca N. Palmer and myself to Uploaders.
   * Repack upstream tarball to remove non-distributable Len(n)a images and
     CC-BY-NC-SA licensed parts from the test suite.  (Closes: #767387)
   * 0001-use-env-to-set-environment-variables-for-GBE_BIN_GEN.patch: New, fix
     FTBFS of "~dfsg" versions in pbuilder etc.
     - Backport patch to 0.8.
   * Import debian/ packaging history into GIT. Add Vcs-* URLs.
   * Update Homepage.
 .
   [ Rebecca N. Palmer ]
   * Build-Depends: clang-3.4, lib{clang,llvm}-3.4-dev.  (Closes: #764930)
   * patches/versioned-llvm-tools: Fix one more unversioned use of clang.
   * Skip-deleted-tests.patch: New. Skip tests removed in the dfsg version.
   * Replace broken pow(n), rootn, erf(c), tgamma.  (Closes: #768090)
   * Document in the description what hardware this version supports.
Checksums-Sha1:
 9f1f617f2f3a9fc1792747760c5d2c946aceedc9 2538 
beignet_0.9.3~really.0.8+dfsg-1.dsc
 60c577b192c415af1d74f04972c00b0b4e5862a5 407316 
beignet_0.9.3~really.0.8+dfsg.orig.tar.xz
 1225e34970fadd4db2c77e140ee053378094975f 58224 
beignet_0.9.3~really.0.8+dfsg-1.debian.tar.xz
Checksums-Sha256:
 c33e7bef033d85524530b0bc6f10cbba96d06b0ffcfea8c202fa005daccf492d 2538 
beignet_0.9.3~really.0.8+dfsg-1.dsc
 6929ba7fe07de39d7b815b71e9b930b6c8a579631bf1181424cad74765e92954 407316 
beignet_0.9.3~really.0.8+dfsg.orig.tar.xz
 cd417759b55844e52c171f8456d3fa9357154acc8b0c47b42bfa5ac2a90a0a37 58224 
beignet_0.9.3~really.0.8+dfsg-1.debian.tar.xz
Files:
 853f2ded8b820da9bf3e6b609c3a487c 2538 libs extra 
beignet_0.9.3~really.0.8+dfsg-1.dsc
 225adbd24dc8f6e1ad56602c0a8d0286 407316 libs extra 
beignet_0.9.3~really.0.8+dfsg.orig.tar.xz
 6bf756a94c3ed8d8b659e5f9afa6644d 58224 libs extra 
beignet_0.9.3~really.0.8+dfsg-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJUbPzjAAoJEF+zP5NZ6e0Ify0P/R0MeRkkSUGOGwNsgdbPqhUP
gRGeww4Rs/KccYogHsFogXFS/AcCKW78/cvGHoBcRq3jMnyZTGR5N4HEcTcZD0Nn
vihIb6dmjNWDuKT5UZ2CDvgJnA45sJikbNzIflSlBiNYO3iDjF+XfVcu6AexOj2X
m9n44hFVmgipb0i9dntzIXW4QRO77pGoK3zCYU8uaHzHbSebc/4b+8G3oe7RgHnm
ExF2VYWUqfBWq9TowJHXMGk6s4s6Hc87Rne+PuZwNlv/SncgBs95eN8iEmIw2bsW
ItRqKHx3QO0xuLcupIrt7LniVPUH8OBYuAYWxLjHe8R/oFaqHQjGnTKE1isDkvAV
c66EoPhCRjv9T6m09ljMBLUWE3+mhaWVv8wcO0miIDK3knTCn4A/Ez5LIYoEQnhV
Sg4msajwJsZ1DsdQzC+3x57kh1XJeH+qtjFuyDCYzdodtzd9tpt1f89bcSA2whQy
FEnWuNVDgavKH7fnouifPLPjBr7CVyZgwTYSYSqO9raNd1TfYZg+6XtPu6v5YoSY
5pdE9kLUV5oTXNmZ6T66LOgGgrDJJeWf72UVSURVbJoax637vux+Sdr7t0pxPCRG
W8R176Csv4iff/qufsT6THOOFSd7ExmEPo1wXpV4/ItNvmKZOs5I2wmgXt7gdBE1
ELEWg/CvQ2MBWiCtzKvl
=PEve
-----END PGP SIGNATURE-----

--- End Message ---

Reply via email to