Copilot commented on code in PR #905:
URL: https://github.com/apache/incubator-graphar/pull/905#discussion_r2911445813


##########
README.md:
##########
@@ -386,7 +386,7 @@ The merge-based decoding method yields the largest gain, 
where “binary (RLE) +
     </tbody>
 </table>
 <p><strong>Notes: <a href="https://github.com/apache/pinot"; 
target="_blank">Pinot (P)</a>, <a href="https://github.com/neo4j/neo4j"; 
target="_blank">Neo4j (N)</a>, <a 
href="https://arrow.apache.org/docs/cpp/streaming_execution.html"; 
target="_blank">Acero (A)</a>, and GraphAr (G).
-“OM” denotes failed execution due to out-of-memory errors. 
+"OM" denotes failed execution due to out-of-memory errors. 
 While both Pinot and Neo4j are widely-used, they
 are not natively designed for data lakes and require an Extract-Transform-Load 
(ETL) process for integration. The three representative queries includes 
neighbor retrieval and label filtering, reference to <a 
href="https://github.com/ldbc/ldbc_snb_bi"; target="_blank">LDBC SNB Business 
Intelligence</a> and <a 
href="https://github.com/ldbc/ldbc_snb_interactive_v1_impls"; 
target="_blank">LDBC SNB Interactive v1 </a> workload implementations. 
</strong></p>

Review Comment:
   Minor formatting/grammar in the Notes paragraph: the line with "\"OM\" 
denotes..." has trailing whitespace, and "The three representative queries 
includes" should be "include". Cleaning these up will improve the rendered 
README and avoid stray whitespace.



##########
README.md:
##########
@@ -257,21 +257,21 @@ Here we show statistics of datasets with hundreds of 
millions of vertices from [
 width="700" alt="storage consumption" />
 
 Two baseline approaches are
-considered: 1) “plain”, which employs plain encoding for the
-source and destination columns, and 2) “plain + offset”, which
-extends the “plain” method by sorting edges and adding an
-offset column to mark each vertex’s starting edge position.
+considered: 1) "plain", which employs plain encoding for the
+source and destination columns, and 2) "plain + offset", which
+extends the "plain" method by sorting edges and adding an
+offset column to mark each vertex's starting edge position.
 The result
 is a notable storage advantage: on average, GraphAr requires
-only 27.3% of the storage needed by the baseline “plain +
-offset”, which is due to delta encoding.
+only 27.3% of the storage needed by the baseline "plain +
+offset", which is due to delta encoding.
 
 ### I/O speed
 <img src="docs/images/benchmark_IO_time.png" class="align-center"
 width="700" alt="I/O time" />
 
 In (a) indicate that GraphAr significantly
-outperforms the baseline (CSV), achieving an average speedup of 4.9×. In 
Figure (b), the immutable (“Imm”) and mutable (“Mut”) variants are two native 
in-memory storage of GraphScope. It demonstrates that although the querying 
time with GraphAr exceeds that of the in-memory storages, attributable to 
intrinsic I/O overhead, it significantly surpasses the process of loading and 
then
+outperforms the baseline (CSV), achieving an average speedup of 4.9×. In 
Figure (b), the immutable ("Imm") and mutable ("Mut") variants are two native 
in-memory storage of GraphScope. It demonstrates that although the querying 
time with GraphAr exceeds that of the in-memory storages, attributable to 
intrinsic I/O overhead, it significantly surpasses the process of loading and 
then

Review Comment:
   The sentence starting with "In (a) indicate" is grammatically 
incorrect/unclear. It likely should refer to the figure (e.g., "Figure (a) 
indicates ..."), and later in the same paragraph "two native in-memory storage" 
should be plural/clear (e.g., "storage systems"). Updating the wording would 
improve readability.



##########
README.md:
##########
@@ -257,21 +257,21 @@ Here we show statistics of datasets with hundreds of 
millions of vertices from [
 width="700" alt="storage consumption" />
 
 Two baseline approaches are
-considered: 1) “plain”, which employs plain encoding for the
-source and destination columns, and 2) “plain + offset”, which
-extends the “plain” method by sorting edges and adding an
-offset column to mark each vertex’s starting edge position.
+considered: 1) "plain", which employs plain encoding for the
+source and destination columns, and 2) "plain + offset", which
+extends the "plain" method by sorting edges and adding an
+offset column to mark each vertex's starting edge position.
 The result
 is a notable storage advantage: on average, GraphAr requires
-only 27.3% of the storage needed by the baseline “plain +
-offset”, which is due to delta encoding.
+only 27.3% of the storage needed by the baseline "plain +
+offset", which is due to delta encoding.

Review Comment:
   The quoted baseline name is split across a hard line break ("plain +\n 
offset"), which makes the Markdown source harder to read/search and can render 
awkwardly in some contexts. Consider keeping the full quoted phrase on a single 
line (or remove the quotes and wrap the phrase differently) so the term appears 
as one contiguous string.



##########
README.md:
##########
@@ -280,23 +280,23 @@ executing the query, by 2.4× and 2.5×, respectively. This 
indicates that Graph
 width="700" alt="Neighbor retrival" />
 
 We query vertices with the largest
-degree in selected graphs, maintaining edges in CSR-like or CSC-like formats 
depending on the degree type. GraphAr significantly outperforms the baselines, 
achieving an average speedup of 4452× over the “plain” method, 3.05× over 
“plain + offset”, and 1.23× over “delta + offset”. -->
+degree in selected graphs, maintaining edges in CSR-like or CSC-like formats 
depending on the degree type. GraphAr significantly outperforms the baselines, 
achieving an average speedup of 4452× over the "plain" method, 3.05× over 
"plain + offset", and 1.23× over "delta + offset". -->
 ### Label Filtering
 <img src="docs/images/benchmark_label_simple_filter.png" class="align-center"
 width="700" alt="Simple condition filtering" />
 
 **Performance of simple condition filtering.**
 For each graph, we perform experiments where we consider
 each label individually as the target label for filtering.
-GraphAr consistently outperforms the baselines. On average, it achieves a 
speedup of 14.8× over the “string” method, 8.9× over the “binary (plain)” 
method, and 7.4× over the  “binary (RLE)” method.
+GraphAr consistently outperforms the baselines. On average, it achieves a 
speedup of 14.8× over the "string" method, 8.9× over the "binary (plain)" 
method, and 7.4× over the  "binary (RLE)" method.

Review Comment:
   There is an extra double space before the quoted term ("over the  \"binary 
(RLE)\" method"). Please remove the duplicate whitespace to keep formatting 
consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to