Copilot commented on code in PR #2802:
URL: https://github.com/apache/sedona/pull/2802#discussion_r3005849482
##########
docs/api/sql/Raster-Functions.md:
##########
@@ -131,6 +131,7 @@ These functions perform operations on raster objects.
| [RS_Union](Raster-Operators/RS_Union.md) | Raster | Returns a combined multi-band raster from 2 or more input Rasters. The order of bands in the resultant raster will be in the order of the input rasters. For example if `RS_Union` is called on two 2... | v1.6.0 |
| [RS_Value](Raster-Operators/RS_Value.md) | Double | Returns the value at the given point in the raster. If no band number is specified it defaults to 1. | v1.4.0 |
| [RS_Values](Raster-Operators/RS_Values.md) | `Array<Double>` | Returns the values at the given points or grid coordinates in the raster. If no band number is specified it defaults to 1. | v1.4.0 |
+| [RS_AsRaster](Raster-Output/RS_AsRaster.md) | Raster | Converts a vector geometry into a raster dataset by assigning a specified value to all pixels covered by the geometry. | v1.5.0 |
Review Comment:
`RS_AsRaster` is listed under “Raster Operators” but links to a page in
`Raster-Output/`. This categorization/path mismatch makes the function harder
to discover/maintain; either move the row to the “Raster Output” table (with
the other `RS_As*` functions) or relocate the doc page to the operators
directory and update links consistently.
```suggestion
| [RS_AsRaster](Raster-Operators/RS_AsRaster.md) | Raster | Converts a vector geometry into a raster dataset by assigning a specified value to all pixels covered by the geometry. | v1.5.0 |
```
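A quick way to catch this class of mismatch across the whole page is a small lint over the function tables. The sketch below is hypothetical (`misplaced_rows` is not an existing script in the Sedona repo) and assumes each table sits under a heading whose text, with spaces replaced by hyphens, names the directory its rows should link into, as `Raster Operators` / `Raster-Operators` and `Raster Output` / `Raster-Output` do in this PR:

```python
import re

# Matches a table row whose first cell links into a sibling directory,
# e.g. "| [RS_Value](Raster-Operators/RS_Value.md) | ..." captures
# ("RS_Value", "Raster-Operators").
ROW_LINK = re.compile(r"^\|\s*\[(\w+)\]\(([\w-]+)/[\w-]+\.md\)")
# Matches any markdown heading line and captures its text.
HEADING = re.compile(r"^#+\s+(.+?)\s*$")


def misplaced_rows(markdown: str) -> list[tuple[str, str, str]]:
    """Return (function, section_dir, linked_dir) for rows linking outside their section's directory."""
    section_dir = None
    problems = []
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Assumed naming convention: "Raster Operators" -> "Raster-Operators"
            section_dir = heading.group(1).replace(" ", "-")
            continue
        row = ROW_LINK.match(line)
        # Only rows that appear under a heading are checked
        if row and section_dir and row.group(2) != section_dir:
            problems.append((row.group(1), section_dir, row.group(2)))
    return problems
```

Run against the page text (for example `misplaced_rows(open("docs/api/sql/Raster-Functions.md").read())`), the new row in this diff would be reported as `("RS_AsRaster", "Raster-Operators", "Raster-Output")` if the page's headings follow the assumed convention.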
##########
docs/usecases/ApacheSedonaRaster.ipynb:
##########
@@ -449,7 +449,7 @@
"metadata": {},
"source": [
"### Convert a geometry to raster (Rasterize a geometry)\n",
- "A geometry can be converted to a raster using [RS_AsRaster](https://sedona.apache.org/1.5.0/api/sql/Raster-writer/#rs_asraster)"
+ "A geometry can be converted to a raster using [RS_AsRaster](https://sedona.apache.org/latest/api/sql/Raster-Output/RS_AsRaster/)"
Review Comment:
This notebook now links `RS_AsRaster` to `/latest/` while most other
references still point to the `1.5.0` docs. Consider using a relative link into
this repo’s docs site (or updating the other links) to avoid mixing versions
and sending readers to inconsistent documentation.
```suggestion
"A geometry can be converted to a raster using [RS_AsRaster](https://sedona.apache.org/1.5.0/api/sql/Raster-Output/RS_AsRaster/)"
```
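To spot mixed versions before merging, the notebook's markdown cells can be scanned for links into the hosted docs. A minimal sketch, assuming the docs URLs follow the `https://sedona.apache.org/<version>/...` layout seen in this diff; `doc_versions` is a hypothetical helper, not an existing script in the repo:

```python
import json
import re
from collections import Counter

# Captures the version path segment of a hosted-docs link, e.g. "1.5.0" or
# "latest" (assumes the https://sedona.apache.org/<version>/... layout).
DOCS_LINK = re.compile(r"https://sedona\.apache\.org/([^/\s)]+)/")


def doc_versions(notebook_json: str) -> Counter:
    """Count the docs versions linked from the markdown cells of a notebook."""
    nb = json.loads(notebook_json)
    versions = Counter()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # code cells are ignored
        # nbformat stores a cell's source as a list of strings or a single string
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        versions.update(DOCS_LINK.findall(text))
    return versions
```

Calling `doc_versions(open("docs/usecases/ApacheSedonaRaster.ipynb").read())` and getting more than one key back (for example both `latest` and `1.5.0`) would indicate exactly the mix described above.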
##########
docs/tutorial/raster.md:
##########
@@ -503,47 +565,93 @@ Please refer to [Raster visualizer docs](../api/sql/Raster-Functions.md#raster-o
## Save to permanent storage
-Sedona has APIs that can save an entire raster column to files in a specified location. Before saving, the raster type column needs to be converted to a binary format. Sedona provides several functions to convert a raster column into a binary column suitable for file storage. Once in binary format, the raster data can then be written to files on disk using the Sedona file storage APIs.
-
-```sparksql
-rasterDf.write.format("raster").option("rasterField", "raster").option("fileExtension", ".tiff").mode(SaveMode.Overwrite).save(dirPath)
-```
+Saving raster data is a two-step process: (1) convert the Raster column to binary format using an `RS_AsXXX` function, and (2) write the binary DataFrame to files using Sedona's `raster` data source writer.
-Sedona has a few writer functions that create the binary DataFrame necessary for saving the raster images.
+### Step 1: Convert to binary format
-### As Arc Grid
+Choose one of the following output format functions:
-Use [RS_AsArcGrid](../api/sql/Raster-writer.md#rs_asarcgrid) to get the binary Dataframe of the raster in Arc Grid format.
+| Function | Format | Description |
+| :--- | :--- | :--- |
+| [RS_AsGeoTiff](../api/sql/Raster-Output/RS_AsGeoTiff.md) | GeoTiff | General-purpose raster format with optional compression |
+| [RS_AsCOG](../api/sql/Raster-Output/RS_AsCOG.md) | Cloud Optimized GeoTiff | Ideal for cloud storage with efficient range-read access |
+| [RS_AsArcGrid](../api/sql/Raster-Output/RS_AsArcGrid.md) | Arc Grid | ASCII-based format, single band only |
+| [RS_AsPNG](../api/sql/Raster-Output/RS_AsPNG.md) | PNG | Image format, unsigned integer pixel types only |
```sql
-SELECT RS_AsArcGrid(raster)
+SELECT RS_AsGeoTiff(rast) AS raster_binary FROM rasterDf
```
-### As GeoTiff
+### Step 2: Write to files
-Use [RS_AsGeoTiff](../api/sql/Raster-writer.md#rs_asgeotiff) to get the binary Dataframe of the raster in GeoTiff format.
+Use Sedona's built-in `raster` data source to write the binary DataFrame:
-```sql
-SELECT RS_AsGeoTiff(raster)
-```
+=== "Scala"
+ ```scala
+ df.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+ .write.format("raster").mode("overwrite").save("my_raster_file")
+ ```
Review Comment:
In the “Write to files” example, `df` is not defined anywhere in the
tutorial (the DataFrame in prior sections is `rasterDf`). Update the snippet to
use the correct DataFrame variable (or show how `df` is created) so readers can
run it as-is.
##########
docs/api/sql/Raster-Output/RS_AsGeoTiff.md:
##########
@@ -0,0 +1,65 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsGeoTiff
+
+Introduction: Returns a binary DataFrame from a Raster DataFrame. Each raster object in the resulting DataFrame is a GeoTiff image in binary format.
+
+Possible values for `compressionType`: `None`, `PackBits`, `Deflate`, `Huffman`, `LZW` and `JPEG`
+
+Possible values for `imageQuality`: any decimal number between 0 and 1. 0 means the lowest quality and 1 means the highest quality.
+
+Format:
+
+`RS_AsGeoTiff(raster: Raster)`
+
+`RS_AsGeoTiff(raster: Raster, compressionType: String, imageQuality: Double)`
+
+Return type: `Binary`
+
+Since: `v1.4.1`
+
+SQL Example
+
+```sql
+SELECT RS_AsGeoTiff(raster) FROM my_raster_table
+```
+
+SQL Example
+
+```sql
+SELECT RS_AsGeoTiff(raster, 'LZW', '0.75') FROM my_raster_table
Review Comment:
In the second SQL example, `imageQuality` is documented as a `Double` but
the example passes `'0.75'` as a quoted string. Use a numeric literal (e.g.,
`0.75`) to match the signature and avoid relying on implicit casts.
```suggestion
SELECT RS_AsGeoTiff(raster, 'LZW', 0.75) FROM my_raster_table
```
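The same pitfall shows up when SQL strings are assembled programmatically, for example from a Python notebook. A hypothetical helper that checks the arguments against the values listed on this page and always emits `imageQuality` as a numeric literal might look like this; `as_geotiff_sql` is illustrative only and not part of Sedona's API:

```python
# compressionType values listed on the RS_AsGeoTiff page in this PR.
COMPRESSION_TYPES = {"None", "PackBits", "Deflate", "Huffman", "LZW", "JPEG"}


def as_geotiff_sql(column: str, compression: str, quality: float) -> str:
    """Build an RS_AsGeoTiff call with imageQuality emitted as a numeric literal."""
    if compression not in COMPRESSION_TYPES:
        raise ValueError(f"unsupported compressionType: {compression!r}")
    # Coerce first, so a caller passing "0.75" still yields an unquoted 0.75
    quality = float(quality)
    if not 0.0 <= quality <= 1.0:
        raise ValueError("imageQuality must be between 0 and 1")
    # repr() of a float is a plain numeric literal such as 0.75, never quoted
    return f"RS_AsGeoTiff({column}, '{compression}', {quality!r})"
```

For instance, `as_geotiff_sql("raster", "LZW", 0.75)` yields the suggested `RS_AsGeoTiff(raster, 'LZW', 0.75)`, so the generated query no longer depends on an implicit string-to-double cast.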
##########
docs/api/sql/Raster-Output/RS_AsRaster.md:
##########
@@ -0,0 +1,116 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+ -->
+
+# RS_AsRaster
+
+Introduction: `RS_AsRaster` converts a vector geometry into a raster dataset by assigning a specified value to all pixels covered by the geometry. Unlike `RS_Clip`, which extracts a subset of an existing raster while preserving its original values, `RS_AsRaster` generates a new raster where the geometry is rasterized onto a raster grid. The function supports all geometry types and takes the following parameters:
+
+* `geom`: The geometry to be rasterized.
+* `raster`: The reference raster to be used for overlaying the `geom` on.
+* `pixelType`: Defines data type of the output raster. This can be one of the following, D (double), F (float), I (integer), S (short), US (unsigned short) or B (byte).
+* `allTouched` (Since: `v1.7.1`): Decides the pixel selection criteria. If set to `true`, the function selects all pixels touched by the geometry, else, selects only pixels who's centroids intersect the geometry. Defaults to `false`.
+* `Value`: The value to be used for assigning pixels covered by the geometry. Defaults to using `1.0` for cell `value` if not provided.
Review Comment:
The parameter list has a couple of grammar/wording issues: use “whose” (not
“who's”) and keep parameter names consistently cased (e.g., `value`, not
`Value`). This improves clarity and avoids implying a different argument name.
```suggestion
* `allTouched` (Since: `v1.7.1`): Decides the pixel selection criteria. If set to `true`, the function selects all pixels touched by the geometry, else, selects only pixels whose centroids intersect the geometry. Defaults to `false`.
* `value`: The value to be used for assigning pixels covered by the geometry. Defaults to using `1.0` for cell `value` if not provided.
```
##########
docs/tutorial/raster.md:
##########
@@ -503,47 +565,93 @@ Please refer to [Raster visualizer docs](../api/sql/Raster-Functions.md#raster-o
## Save to permanent storage
-Sedona has APIs that can save an entire raster column to files in a specified location. Before saving, the raster type column needs to be converted to a binary format. Sedona provides several functions to convert a raster column into a binary column suitable for file storage. Once in binary format, the raster data can then be written to files on disk using the Sedona file storage APIs.
-
-```sparksql
-rasterDf.write.format("raster").option("rasterField", "raster").option("fileExtension", ".tiff").mode(SaveMode.Overwrite).save(dirPath)
-```
+Saving raster data is a two-step process: (1) convert the Raster column to binary format using an `RS_AsXXX` function, and (2) write the binary DataFrame to files using Sedona's `raster` data source writer.
-Sedona has a few writer functions that create the binary DataFrame necessary for saving the raster images.
+### Step 1: Convert to binary format
-### As Arc Grid
+Choose one of the following output format functions:
-Use [RS_AsArcGrid](../api/sql/Raster-writer.md#rs_asarcgrid) to get the binary Dataframe of the raster in Arc Grid format.
+| Function | Format | Description |
+| :--- | :--- | :--- |
+| [RS_AsGeoTiff](../api/sql/Raster-Output/RS_AsGeoTiff.md) | GeoTiff | General-purpose raster format with optional compression |
+| [RS_AsCOG](../api/sql/Raster-Output/RS_AsCOG.md) | Cloud Optimized GeoTiff | Ideal for cloud storage with efficient range-read access |
+| [RS_AsArcGrid](../api/sql/Raster-Output/RS_AsArcGrid.md) | Arc Grid | ASCII-based format, single band only |
+| [RS_AsPNG](../api/sql/Raster-Output/RS_AsPNG.md) | PNG | Image format, unsigned integer pixel types only |
```sql
-SELECT RS_AsArcGrid(raster)
+SELECT RS_AsGeoTiff(rast) AS raster_binary FROM rasterDf
```
-### As GeoTiff
+### Step 2: Write to files
-Use [RS_AsGeoTiff](../api/sql/Raster-writer.md#rs_asgeotiff) to get the binary Dataframe of the raster in GeoTiff format.
+Use Sedona's built-in `raster` data source to write the binary DataFrame:
-```sql
-SELECT RS_AsGeoTiff(raster)
-```
+=== "Scala"
+ ```scala
+ df.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+ .write.format("raster").mode("overwrite").save("my_raster_file")
+ ```
-### As Cloud Optimized GeoTiff
+=== "Python"
+ ```python
+ df.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)")).write.format("raster").mode(
+ "overwrite"
+ ).save("my_raster_file")
+ ```
-Use [RS_AsCOG](../api/sql/Raster-writer.md#rs_ascog) to get the binary Dataframe of the raster in [Cloud Optimized GeoTiff](https://www.cogeo.org/) (COG) format. COG is ideal for cloud-hosted raster data because it supports efficient range-read access over HTTP.
+The writer data source options are:
-```sql
-SELECT RS_AsCOG(raster)
-```
+| Option | Default | Description |
+| :--- | :--- | :--- |
+| `rasterField` | The `binary` type column | The name of the binary column to write. Required if the DataFrame has multiple binary columns. |
+| `fileExtension` | `.tiff` | File extension for output files (e.g., `.png`, `.asc`). |
+| `pathField` | None | Column name containing the output file paths. If not set, each file gets a random UUID name. |
+| `useDirectCommitter` | `true` | If `true`, files are written directly to the target location. If `false`, files are written to a temp location first. Writing with `false` is slower, especially on object stores like S3. |
-### As PNG
+Example with all options:
-Use [RS_AsPNG](../api/sql/Raster-writer.md#rs_aspng) to get the binary Dataframe of the raster in PNG format.
+=== "Scala"
+ ```scala
+ df.withColumn("raster_binary", expr("RS_AsGeoTiff(rast)"))
+ .write.format("raster")
+ .option("rasterField", "raster_binary")
+ .option("pathField", "path")
+ .option("fileExtension", ".tiff")
+ .mode("overwrite")
+ .save("my_raster_file")
Review Comment:
This “Example with all options” block also uses an undefined `df` variable.
Please make it consistent with the earlier `rasterDf` (or define `df`) to avoid
copy/paste errors.
##########
docs/tutorial/raster.md:
##########
@@ -213,31 +266,48 @@ The output will look like this:
```
| path| modificationTime|length| content|
+--------------------+--------------------+------+--------------------+
-|file:/Download/ra...|2023-09-06 16:24:...|209199|[4D 4D 00 2A 00 0...|
-|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
|file:/Download/ra...|2023-09-06 16:24:...|174803|[49 49 2A 00 08 0...|
-|file:/Download/ra...|2023-09-06 16:24:...| 6619|[49 49 2A 00 08 0...|
```
-The content column in the raster table is still in the raw form, binary form.
+For multiple raster data files, you can load them recursively:
+
+=== "Python"
+ ```python
+ rawDf = (
+ sedona.read.format("binaryFile")
+ .option("recursiveFileLookup", "true")
+ .option("pathGlobFilter", "*.asc*")
+ .load(path_to_raster_data_folder)
+ )
+ rawDf.createOrReplaceTempView("rawdf")
+ rawDf.show()
+ ```
-## Create a Raster type column
+### Step 2: Create a Raster type column
-All raster operations in SedonaSQL require Raster type objects. Therefore, this should be the next step after loading the data.
+All raster operations in SedonaSQL require Raster type objects. Use one of the following constructors:
-### From Geotiff
+#### From GeoTiff
```sql
SELECT RS_FromGeoTiff(content) AS rast, modificationTime, length, path FROM rawdf
```
-To verify this, use the following code to print the schema of the DataFrame:
+#### From Arc Grid
```sql
-rasterDf.printSchema()
+SELECT RS_FromArcInfoAsciiGrid(content) AS rast, modificationTime, length, path FROM rawdf
```
-The output will be like this:
+#### From NetCDF
+
+See [RS_FromNetCDF](../api/sql/Raster-Constructors/RS_FromNetCDF.md) for details on loading NetCDF files.
+
+To verify the raster column was created successfully:
+
+```sql
Review Comment:
`rasterDf.printSchema()` is a DataFrame API call, but it’s currently in a
```sql fenced block. This renders incorrectly and is confusing; switch the
fence to the appropriate language (e.g., scala/python) or replace it with a
SQL-only verification query.
```suggestion
```scala
```