yashmayya commented on code in PR #15245:
URL: https://github.com/apache/pinot/pull/15245#discussion_r2040698744


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/SerializedDataBlock.java:
##########
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.blocks;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import org.apache.pinot.common.datablock.DataBlock;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.util.DataBlockExtractUtils;
+
+/// A block that contains data in serialized format.
+///
+/// This class is a subclass of [MseBlock.Data] and is used to store data in 
serialized format.
+/// This is the most efficient way to store data, but it is also the hardest 
to work with.
+/// As the day this comment was written, this class is only used when we need 
to shuffle data through the network.
+/// In all other cases, we use [RowHeapDataBlock].
+public class SerializedDataBlock implements MseBlock.Data {
+  private final DataBlock _dataBlock;
+
+  /// Creates a new block with the given data block.
+  /// @param dataBlock The data block to store in this block. It cannot be a 
metadata block.
+  /// @throws IllegalArgumentException If the data block is a metadata block.
+  public SerializedDataBlock(DataBlock dataBlock) {
+    Preconditions.checkArgument(dataBlock.getDataBlockType() != 
DataBlock.Type.METADATA,
+        "SerializedDataBlock cannot be used to decorate metadata block");
+    _dataBlock = dataBlock;
+  }
+
+  /// Returns the data block stored in this block.
+  /// It is guaranteed that the returned data block is not a metadata block.
+  public DataBlock getDataBlock() {
+    return _dataBlock;
+  }
+
+  @Override
+  public int getNumRows() {
+    return _dataBlock.getNumberOfRows();
+  }
+
+  @Override
+  public DataSchema getDataSchema() {
+    return _dataBlock.getDataSchema();
+  }
+
+  @Override
+  public RowHeapDataBlock asRowHeap() {
+    List<Object[]> rows = DataBlockExtractUtils.extractRows(_dataBlock);
+    return new RowHeapDataBlock(rows, _dataBlock.getDataSchema(), null);

Review Comment:
   I don't think it's being used in `HashExchange` either right? The only place 
it seems to be used is in `DataBlockBuilder` to serialize intermediate result 
values when converting a row heap data block to a serialized data block.



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/MseBlock.java:
##########
@@ -0,0 +1,178 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.blocks;
+
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.common.Block;
+
+
+/// Blocks used by 
[MultiStageOperator][org.apache.pinot.query.runtime.operator.MultiStageOperator]
 to share
+/// information.
+///
+/// Blocks always go from upstream (the children of the operator) to 
downstream (the parent of the operator) and can be
+/// classified in the following categories:
+/// - [Data] blocks: contain data that can be processed by the operator.
+/// - [Eos] blocks: signal the end of a stream. These blocks can be either 
[successful][SuccessMseBlock] or
+/// [error][ErrorMseBlock].
+///
+/// ## The MseBlock API
+/// A MseBlock itself is not very useful, as they have almost no methods.
+/// Instead, they are used as a common sub-interface for [data][Data] and 
[end-of-stream][Eos] blocks,
+/// which are then subclassed to provide the actual functionality.
+/// This pattern follows the principles of Java 17 sealed interfaces and the 
intention is implement them as such once
+/// Pinot source code is migrated to Java 17 or newer, specially in Java 21 
where pattern matching can also be used,
+/// removing the need for the [Visitor] pattern.
+///
+/// Meanwhile, the API force callers to do some castings, but it is a 
trade-off to have a more robust and maintainable
+/// codebase given that we can rely on Java typesystem to verify some 
important properties at compile time instead of
+/// adding runtime checks.
+///
+/// The alternative to this pattern would be to have a single class with all 
methods, adding runtime checks to verify
+/// whether it is acceptable to call a method or not. This is the approach 
that was used in the removed
+/// TransferableBlock class, which was used for all possible block type 
combinations. As a result each method
+/// had to include a runtime check to verify if it was legal to call it given 
some conditions imposed by its attributes.
+/// This approach was error-prone and hard to maintain, as it was easy to 
forget to add a check in a new method or to
+/// know which methods could be called at a given point in time.
+///
+/// ## MseBlock vs DataBlock
+/// MseBlock are conceptually close to 
[DataBlocks][org.apache.pinot.common.datablock.DataBlock].
+/// MseBlocks are sent from one operator to another while DataBlocks are a way 
to codify data. It is important to notice
+/// that MseBlocks don't store stats, while DataBlocks do.
+///
+/// When a MseBlock needs to be sent to another server, it will serialize it 
into a DataBlock. Then, when a DataBlock
+/// is received by another server, it will deserialize it into a MseBlock 
(plus stats if needed). This is done by
+/// [GrpcSendingMailbox][org.apache.pinot.query.mailbox.GrpcSendingMailbox] and
+/// [ReceivingMailbox][org.apache.pinot.query.mailbox.ReceivingMailbox].
+public interface MseBlock extends Block {
+  /// Whether the block is a [Data] block or otherwise an [Eos] block.
+  boolean isData();
+
+  /// Whether the block is an [Eos] block or otherwise a [Data] block.
+  default boolean isEos() {
+    return !isData();
+  }
+
+  /// Whether the block is an [Eos] that signals the end of a stream with no 
errors.
+  boolean isSuccess();
+
+  /// Whether the block is an [Eos] that signals the end of a stream with one 
or more errors.
+  boolean isError();
+
+  <R, A> R accept(Visitor<R, A> visitor, A arg);
+
+  /// A block that contains data.
+  /// These blocks can store data as [rows on heap][RowHeapDataBlock] or as
+  /// [DataBlocks][SerializedDataBlock].
+  interface Data extends MseBlock {
+    /// Returns the number of rows in the block.
+    int getNumRows();
+
+    /// Returns the schema of the data in the block.
+    DataSchema getDataSchema();
+
+    /// Returns the data in the block as a [RowHeapDataBlock].
+    /// This is a no-op if the block is already a [RowHeapDataBlock]} but is 
an CPU and memory intensive operation
+    /// if the block is a [SerializedDataBlock].
+    RowHeapDataBlock asRowHeap();
+    /// Returns the data in the block as a [SerializedDataBlock].
+    /// This is a no-op if the block is already a [SerializedDataBlock] but is 
a CPU and memory intensive operation
+    /// if the block is a [RowHeapDataBlock].
+    /// @throws java.io.UncheckedIOException if the block cannot be serialized.
+    SerializedDataBlock asSerialized();
+
+    /// Returns whether the block is a [RowHeapDataBlock].
+    boolean isRowHeap();
+    /// Returns whether the block is a [SerializedDataBlock].
+    default boolean isSerialized() {
+      return !isRowHeap();
+    }
+
+    @Override
+    default boolean isData() {
+      return true;
+    }
+
+    @Override
+    default boolean isSuccess() {
+      return false;
+    }
+
+    @Override
+    default boolean isError() {
+      return false;
+    }
+
+    <R, A> R accept(Visitor<R, A> visitor, A arg);
+
+    @Override
+    default <R, A> R accept(MseBlock.Visitor<R, A> visitor, A arg) {
+      return accept((MseBlock.Data.Visitor<R, A>) visitor, arg);
+    }

Review Comment:
   Ah, I see. I don't love it because I feel like it introduces a fair bit of 
cognitive overhead but I like your proposed future alternative of using sealed 
classes and pattern matching based switch statements.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to