spark git commit: [SPARK-7299][SQL] Set precision and scale for Decimal according to JDBC metadata instead of returned BigDecimal

rxin Mon, 18 May 2015 01:11:31 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.4 0b6bc8a23 -> 0e7cd8ff8



[SPARK-7299][SQL] Set precision and scale for Decimal according to JDBC 
metadata instead of returned BigDecimal

JIRA: https://issues.apache.org/jira/browse/SPARK-7299

When connecting to an Oracle database through JDBC, the precision and scale of 
the `BigDecimal` object returned by `ResultSet.getBigDecimal` do not always 
match the table schema reported by `ResultSetMetaData.getPrecision` and 
`ResultSetMetaData.getScale`.

For example, if you insert a value like `19999` into a `NUMBER(12, 2)` column, 
the driver returns a `BigDecimal` object with scale 0, while the DataFrame 
schema correctly reports the type as `DecimalType(12, 2)`. As a result, after 
you save the DataFrame to a Parquet file and read it back, you get the wrong 
result `199.99`.
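
The arithmetic behind the wrong value can be sketched with plain 
`java.math.BigDecimal`, without Spark or a database. This is only an 
illustration under an assumption: that the unscaled digits of the scale-0 value 
end up being read back with the schema's scale of 2, which is consistent with 
the `199.99` result described above but is not confirmed by the commit. The 
object and value names are hypothetical.

```scala
import java.math.{BigDecimal => JBigDecimal}

object ScaleMismatchSketch {
  // What the driver is reported to return for 19999 in a NUMBER(12, 2) column:
  // unscaled value 19999 with scale 0.
  val fromDriver = new JBigDecimal("19999")

  // Assumption: the unscaled digits are later interpreted with the schema's
  // scale (2). That reproduces the wrong result described above.
  val misread = new JBigDecimal(fromDriver.unscaledValue, 2)

  // Rescaling to the declared scale first keeps the numeric value intact
  // (unscaled value becomes 1999900 with scale 2).
  val rescaled = fromDriver.setScale(2)

  def main(args: Array[String]): Unit = {
    println(s"driver:   $fromDriver (scale ${fromDriver.scale})")
    println(s"misread:  $misread")
    println(s"rescaled: $rescaled (unscaled ${rescaled.unscaledValue})")
  }
}
```

Increasing the scale with `setScale` never needs rounding, so the rescaled 
value compares equal to the original while carrying the scale the schema 
declares.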

Because the problem has only been reported for JDBC connections to Oracle 
databases, it may be difficult to add a test case for it. According to the 
user's test reported on JIRA, however, this change fixes the problem.

Author: Liang-Chi Hsieh <[email protected]>

Closes #5833 from viirya/jdbc_decimal_precision and squashes the following 
commits:

69bc2b5 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into jdbc_decimal_precision
928f864 [Liang-Chi Hsieh] Add comments.
5f9da94 [Liang-Chi Hsieh] Set up Decimal's precision and scale according to table schema instead of returned BigDecimal.

(cherry picked from commit e32c0f69f38ad729e25c2d5f90eb73b4453f8279)
Signed-off-by: Reynold Xin <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0e7cd8ff
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0e7cd8ff
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0e7cd8ff

Branch: refs/heads/branch-1.4
Commit: 0e7cd8ff82080376e4da11fc86f7763d0493961a
Parents: 0b6bc8a
Author: Liang-Chi Hsieh <[email protected]>
Authored: Mon May 18 01:10:55 2015 -0700
Committer: Reynold Xin <[email protected]>
Committed: Mon May 18 01:11:10 2015 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/jdbc/JDBCRDD.scala     | 23 ++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/0e7cd8ff/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
index 95935ba..4189dfc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
@@ -300,7 +300,7 @@ private[sql] class JDBCRDD(
   abstract class JDBCConversion
   case object BooleanConversion extends JDBCConversion
   case object DateConversion extends JDBCConversion
-  case object DecimalConversion extends JDBCConversion
+  case class  DecimalConversion(precisionInfo: Option[(Int, Int)]) extends JDBCConversion
   case object DoubleConversion extends JDBCConversion
   case object FloatConversion extends JDBCConversion
   case object IntegerConversion extends JDBCConversion
@@ -317,8 +317,8 @@ private[sql] class JDBCRDD(
     schema.fields.map(sf => sf.dataType match {
       case BooleanType           => BooleanConversion
       case DateType              => DateConversion
-      case DecimalType.Unlimited => DecimalConversion
-      case DecimalType.Fixed(d)  => DecimalConversion
+      case DecimalType.Unlimited => DecimalConversion(None)
+      case DecimalType.Fixed(d)  => DecimalConversion(Some(d))
       case DoubleType            => DoubleConversion
       case FloatType             => FloatConversion
       case IntegerType           => IntegerConversion
@@ -375,7 +375,22 @@ private[sql] class JDBCRDD(
               } else {
                 mutableRow.update(i, null)
               }
-            case DecimalConversion    =>
+            // When connecting with Oracle DB through JDBC, the precision and scale of BigDecimal
+            // object returned by ResultSet.getBigDecimal is not correctly matched to the table
+            // schema reported by ResultSetMetaData.getPrecision and ResultSetMetaData.getScale.
+            // If inserting values like 19999 into a column with NUMBER(12, 2) type, you get through
+            // a BigDecimal object with scale as 0. But the dataframe schema has correct type as
+            // DecimalType(12, 2). Thus, after saving the dataframe into parquet file and then
+            // retrieve it, you will get wrong result 199.99.
+            // So it is needed to set precision and scale for Decimal based on JDBC metadata.
+            case DecimalConversion(Some((p, s))) =>
+              val decimalVal = rs.getBigDecimal(pos)
+              if (decimalVal == null) {
+                mutableRow.update(i, null)
+              } else {
+                mutableRow.update(i, Decimal(decimalVal, p, s))
+              }
+            case DecimalConversion(None) =>
               val decimalVal = rs.getBigDecimal(pos)
               if (decimalVal == null) {
                 mutableRow.update(i, null)
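
The dispatch introduced by this diff can be modelled in a few lines of 
standalone Scala. The sketch below is not the Spark code: `DecimalConversion` 
is a local stand-in for the case class in the diff, `convert` is a hypothetical 
helper, and `java.math.BigDecimal.setScale` is used in place of Spark's 
`Decimal(decimalVal, p, s)`. The precision check is likewise an assumption 
about reasonable behaviour rather than something the commit states.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalConversionSketch {
  // Local stand-in for the case class in the diff: Some((precision, scale))
  // for fixed decimals, None for unlimited ones.
  final case class DecimalConversion(precisionInfo: Option[(Int, Int)])

  // Hypothetical helper mirroring the three branches of the patched match.
  def convert(conv: DecimalConversion, raw: JBigDecimal): JBigDecimal =
    (conv, raw) match {
      // SQL NULL stays null in both the fixed and the unlimited case.
      case (_, null) => null
      // Fixed decimal: force the scale declared by the JDBC metadata.
      // setScale without a rounding mode throws if digits would be lost.
      case (DecimalConversion(Some((p, s))), v) =>
        val scaled = v.setScale(s)
        // Assumed sanity check, not taken from the commit.
        require(scaled.precision <= p, s"value $scaled exceeds precision $p")
        scaled
      // Unlimited decimal: keep whatever the driver returned.
      case (DecimalConversion(None), v) => v
    }
}
```

Calling `convert(DecimalConversion(Some((12, 2))), new JBigDecimal("19999"))` 
yields a value with scale 2, while `DecimalConversion(None)` passes the 
driver's value through unchanged.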


