Repository: zeppelin Updated Branches: refs/heads/master e65f73066 -> 0c3b446a5
ZEPPELIN-3412 Enable query prefix syntax in bigquery interpreter ### What is this PR for? - interpreter Enable query prefix syntax in bigquery interpreter. Bigquery has a convenient query prefix syntax #standardSQL and #legacySQL. This enable to switch between standardSQL and legacySQL without changing UseLegacySql option. Currently (v0.8.0 or later), bigquery interpreter definitely runs with UseLegacySql option. As described in the [docs](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql?hl=en#sql-prefix), when this option is set to true or false, the query prefix cannot be used. - documentation Removed some contents from README.md. They are duplicated with interpreter documentation. ### What type of PR is it? Improvement ### What is the Jira issue? https://issues.apache.org/jira/browse/ZEPPELIN-3412 ### How should this be tested? - Unit test should pass ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? Yes Author: iijima_satoshi <iijima_sato...@cyberagent.co.jp> Closes #2929 from iijima-satoshi/bigquery-sql-prefix and squashes the following commits: e5b1acd [iijima_satoshi] Address comments f88cc42 [iijima_satoshi] Address comments 809f561 [iijima_satoshi] Address comments 4e2a6b2 [iijima_satoshi] Enable query prefix syntax in bigquery interpreter Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/0c3b446a Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/0c3b446a Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/0c3b446a Branch: refs/heads/master Commit: 0c3b446a5304df3f3c0d665d2df926cca2dcddb7 Parents: e65f730 Author: iijima_satoshi <iijima_sato...@cyberagent.co.jp> Authored: Fri Apr 27 16:47:40 2018 +0900 Committer: Felix Cheung <felixche...@apache.org> Committed: Fri Apr 27 23:14:54 2018 -0700 ---------------------------------------------------------------------- bigquery/README.md | 85 -------------------- .../zeppelin/bigquery/BigQueryInterpreter.java | 19 ++++- .../src/main/resources/interpreter-setting.json | 13 +-- .../bigquery/BigQueryInterpreterTest.java | 11 ++- docs/interpreter/bigquery.md | 5 ++ 5 files changed, 35 insertions(+), 98 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/README.md ---------------------------------------------------------------------- diff --git a/bigquery/README.md b/bigquery/README.md index fc09763..0dff5fe 100644 --- a/bigquery/README.md +++ b/bigquery/README.md @@ -1,10 +1,6 @@ # Overview BigQuery interpreter for Apache Zeppelin -# Pre requisities -You can follow the instructions at [Apache Zeppelin on Dataproc](https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/blob/master/apache-zeppelin/README.MD) to bring up Zeppelin on Google dataproc. -You could also install and bring up Zeppelin on Google compute Engine. - # Unit Tests BigQuery Unit tests are excluded as these tests depend on the BigQuery external service. This is because BigQuery does not have a local mock at this point. @@ -14,34 +10,6 @@ If you like to run these tests manually, please follow the following steps: * Copy the project ID that you created and add it to the property "projectId" in `resources/constants.json` * Run the command mvn <options> -Dbigquery.text.exclude='' test -pl bigquery -am - -# Interpreter Configuration - -Configure the following properties during Interpreter creation. - -<table class="table-configuration"> - <tr> - <th>Name</th> - <th>Default Value</th> - <th>Description</th> - </tr> - <tr> - <td>zeppelin.bigquery.project_id</td> - <td> </td> - <td>Google Project Id</td> - </tr> - <tr> - <td>zeppelin.bigquery.wait_time</td> - <td>5000</td> - <td>Query Timeout in Milliseconds</td> - </tr> - <tr> - <td>zeppelin.bigquery.max_no_of_rows</td> - <td>100000</td> - <td>Max result set size</td> - </tr> -</table> - # Connection The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables. @@ -51,59 +19,6 @@ The Interpreter opens a connection with the BigQuery Service using the supplied We have used the curated veneer version of the Java APIs versus [Idiomatic Java client] (https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter. This is mainly for usability reasons. -# Enabling the BigQuery Interpreter - -In a notebook, to enable the **BigQuery** interpreter, click the **Gear** icon and select **bigquery**. - -# Using the BigQuery Interpreter - -In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter and then input SQL statements against your datasets stored in BigQuery. -You can use [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) to build your own SQL. - -For Example, SQL to query for top 10 departure delays across airports using the flights public dataset - -```bash -%bigquery.sql -SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays -FROM [bigquery-samples:airline_ontime_data.flights] -group by departure_airport -order by 2 desc -limit 10 -``` - -Another Example, SQL to query for most commonly used java packages from the github data hosted in BigQuery - -```bash -%bigquery.sql -SELECT - package, - COUNT(*) count -FROM ( - SELECT - REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, - id - FROM ( - SELECT - SPLIT(content, '\n') line, - id - FROM - [bigquery-public-data:github_repos.sample_contents] - WHERE - content CONTAINS 'import' - AND sample_path LIKE '%.java' - HAVING - LEFT(line, 6)='import' ) - GROUP BY - package, - id ) -GROUP BY - 1 -ORDER BY - count DESC -LIMIT - 40 -``` - # Sample Screenshot  http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java ---------------------------------------------------------------------- diff --git a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java index b56f63c..0973fda 100644 --- a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java +++ b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java @@ -89,7 +89,7 @@ public class BigQueryInterpreter extends Interpreter { static final String PROJECT_ID = "zeppelin.bigquery.project_id"; static final String WAIT_TIME = "zeppelin.bigquery.wait_time"; static final String MAX_ROWS = "zeppelin.bigquery.max_no_of_rows"; - static final String LEGACY_SQL = "zeppelin.bigquery.use_legacy_sql"; + static final String SQL_DIALECT = "zeppelin.bigquery.sql_dialect"; private static String jobId = null; private static String projectId = null; @@ -226,8 +226,19 @@ public class BigQueryInterpreter extends Interpreter { String projId = getProperty(PROJECT_ID); long wTime = Long.parseLong(getProperty(WAIT_TIME)); long maxRows = Long.parseLong(getProperty(MAX_ROWS)); - String legacySql = getProperty(LEGACY_SQL); - boolean useLegacySql = legacySql == null ? true : Boolean.parseBoolean(legacySql); + String sqlDialect = getProperty(SQL_DIALECT, "").toLowerCase(); + Boolean useLegacySql; + switch (sqlDialect) { + case "standardsql": + useLegacySql = false; + break; + case "legacysql": + useLegacySql = true; + break; + default: + // Enable query prefix like '#standardSQL' if specified + useLegacySql = null; + } Iterator<GetQueryResultsResponse> pages; try { pages = run(sql, projId, wTime, maxRows, useLegacySql); @@ -247,7 +258,7 @@ public class BigQueryInterpreter extends Interpreter { //Function to run the SQL on bigQuery service public static Iterator<GetQueryResultsResponse> run(final String queryString, - final String projId, final long wTime, final long maxRows, boolean useLegacySql) + final String projId, final long wTime, final long maxRows, Boolean useLegacySql) throws IOException { try { logger.info("Use legacy sql: {}", useLegacySql); http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/main/resources/interpreter-setting.json ---------------------------------------------------------------------- diff --git a/bigquery/src/main/resources/interpreter-setting.json b/bigquery/src/main/resources/interpreter-setting.json index 8e92ee4..8023bed 100644 --- a/bigquery/src/main/resources/interpreter-setting.json +++ b/bigquery/src/main/resources/interpreter-setting.json @@ -22,14 +22,15 @@ "envName": null, "propertyName": "zeppelin.bigquery.max_no_of_rows", "defaultValue": "100000", - "description": "Maximum number of rows to fetch from BigQuery" + "description": "Maximum number of rows to fetch from BigQuery", + "type": "number" }, - "zeppelin.bigquery.use_legacy_sql": { + "zeppelin.bigquery.sql_dialect": { "envName": null, - "propertyName": "zeppelin.bigquery.use_legacy_sql", - "defaultValue": "true", - "description": "set true to use legacy sql", - "type": "checkbox" + "propertyName": "zeppelin.bigquery.sql_dialect", + "defaultValue": "", + "description": "BigQuery SQL dialect (standardSQL or legacySQL). If empty, query prefix like '#standardSQL' can be used.", + "type": "string" } }, "editor": { http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java ---------------------------------------------------------------------- diff --git a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java index 04676ab..9dcd9f8 100644 --- a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java +++ b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java @@ -74,6 +74,7 @@ public class BigQueryInterpreterTest { p.setProperty("zeppelin.bigquery.project_id", constants.getProjectId()); p.setProperty("zeppelin.bigquery.wait_time", "5000"); p.setProperty("zeppelin.bigquery.max_no_of_rows", "100"); + p.setProperty("zeppelin.bigquery.sql_dialect", ""); intpGroup = new InterpreterGroup(); @@ -85,7 +86,6 @@ public class BigQueryInterpreterTest { @Test public void sqlSuccess() { InterpreterResult ret = bqInterpreter.interpret(constants.getOne(), context); - assertEquals(InterpreterResult.Code.SUCCESS, ret.code()); assertEquals(ret.message().get(0).getType(), InterpreterResult.Type.TABLE); } @@ -93,14 +93,19 @@ public class BigQueryInterpreterTest { @Test public void badSqlSyntaxFails() { InterpreterResult ret = bqInterpreter.interpret(constants.getWrong(), context); - assertEquals(InterpreterResult.Code.ERROR, ret.code()); } @Test + public void testWithQueryPrefix() { + InterpreterResult ret = bqInterpreter.interpret( + "#standardSQL\n WITH t AS (select 1) SELECT * FROM t", context); + assertEquals(InterpreterResult.Code.SUCCESS, ret.code()); + } + + @Test public void testInterpreterOutputData() { InterpreterResult ret = bqInterpreter.interpret("SELECT 1 AS col1, 2 AS col2", context); - String[] lines = ret.message().get(0).getData().split("\\n"); assertEquals(2, lines.length); assertEquals("col1\tcol2", lines[0]); http://git-wip-us.apache.org/repos/asf/zeppelin/blob/0c3b446a/docs/interpreter/bigquery.md ---------------------------------------------------------------------- diff --git a/docs/interpreter/bigquery.md b/docs/interpreter/bigquery.md index 1b90f99..cdac762 100644 --- a/docs/interpreter/bigquery.md +++ b/docs/interpreter/bigquery.md @@ -48,6 +48,11 @@ limitations under the License. <td>100000</td> <td>Max result set size</td> </tr> + <tr> + <td>zeppelin.bigquery.sql_dialect</td> + <td></td> + <td>BigQuery SQL dialect (standardSQL or legacySQL). If empty, [query prefix](https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql#sql-prefix) like '#standardSQL' can be used.</td> + </tr> </table>