This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-connect-go.git
The following commit(s) were added to refs/heads/master by this push:
new f7ad518 [MINOR] Make readme easier to follow
f7ad518 is described below
commit f7ad5188552c4f0c78c2dc1ad6f24c1977583d5c
Author: Matthew Powers <[email protected]>
AuthorDate: Thu Apr 11 09:05:36 2024 +0900
[MINOR] Make readme easier to follow
### What changes were proposed in this pull request?
Update the README to make it easier to follow.
### Why are the changes needed?
I tried to get spark-connect-go running locally and it was a little
confusing. This new layout should make the setup steps a lot clearer.
### Does this PR introduce _any_ user-facing change?
Just updates the README.
### How was this patch tested?
N/A.
Closes #18 from MrPowers/update-readme.
Authored-by: Matthew Powers <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
README.md | 55 ++++++++++++++++++++++---------------------------------
1 file changed, 22 insertions(+), 33 deletions(-)
diff --git a/README.md b/README.md
index 8b15743..7832edb 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,6 @@ This project houses the **experimental** client for [Spark
Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) for
[Apache Spark](https://spark.apache.org/) written in [Golang](https://go.dev/).
-
## Current State of the Project
Currently, the Spark Connect client for Golang is highly experimental and should
@@ -13,33 +12,42 @@ project reserves the right to withdraw and abandon the development of this proje
if it is not sustainable.
## Getting started
+
+This section explains how to run Spark Connect Go locally.
+
+Step 1: Install Golang: https://go.dev/doc/install.
+
+Step 2: Ensure you have the `buf CLI` installed, [more info here](https://buf.build/docs/installation/)
+
+Step 3: Run the following commands to set up the Spark Connect client.
+
```
git clone https://github.com/apache/spark-connect-go.git
git submodule update --init --recursive
make gen && make test
```
-> Ensure you have installed `buf CLI`; [more info](https://buf.build/docs/installation/)
-## How to write Spark Connect Go Application in your own project
+Step 4: Set up the Spark Driver on localhost.
-See [Quick Start Guide](quick-start.md)
+1. [Download Spark distribution](https://spark.apache.org/downloads.html) (3.4.0+), unzip the package.
-## Spark Connect Go Application Example
+2. Start the Spark Connect server with the following command (make sure to use a package version that matches your Spark distribution):
-A very simple example in Go looks like following:
+```
+sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
+```
+
+Step 5: Run the example Go application.
```
-func main() {
- remote := "localhost:15002"
- spark, _ := sql.SparkSession.Builder.Remote(remote).Build()
- defer spark.Stop()
-
- df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
- df.Show(100, false)
-}
+go run cmd/spark-connect-example-spark-session/main.go
```
+## How to write Spark Connect Go Application in your own project
+
+See [Quick Start Guide](quick-start.md)
+
## High Level Design
The following [diagram](https://textik.com/#ac299c8f32c4c342) shows the main code in the current prototype:
@@ -66,7 +74,6 @@ Following [diagram](https://textik.com/#ac299c8f32c4c342) shows main code in cur
| SparkConnectServiceClient |--------------+| Spark Driver |
| | | |
+---------------------------+ +----------------+
-
```
`SparkConnectServiceClient` is a gRPC client which talks to the Spark Driver.
`sparkSessionImpl` generates `dataFrameImpl`
@@ -75,24 +82,6 @@ instances. `dataFrameImpl` uses the GRPC client in `sparkSessionImpl` to communi
We will mimic the logic of the Spark Connect Scala implementation and adopt common Go practices, e.g. returning an `error` object for error handling.
-## How to Run Spark Connect Go Application
-
-1. Install Golang: https://go.dev/doc/install.
-
-2. Download Spark distribution (3.4.0+), unzip the folder.
-
-3. Start Spark Connect server by running command:
-
-```
-sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
-```
-
-4. In this repo, run Go application:
-
-```
-go run cmd/spark-connect-example-spark-session/main.go
-```
-
## Contributing
Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]