Hi Kamil,
Thank you for the detailed feedback. I’ve updated the RFC to reflect
several of your points, specifically regarding the separation of concerns.
To keep things clean, everything not directly visible to the user (metadata
caching, extended metadata, auth plugins) will be moved to separate PRs.
> Extended Metadata:
> It looks like you have some formatting issues there. I don't
> understand what this feature is, how to use it, nor how it will be
> exposed. It needs more explanation.
>
> In the table, you mention that it doesn't require any user-facing
> changes, but it does. The returned structure is extended with two new
> fields from what I can see. Additionally, the PoC changes existing
> field values too.
>
Extended metadata will go to a separate PR - we only need to map
MYSQL_TYPE_LONGTYPE + json to MYSQL_TYPE_JSON. and MYSQL_TYPE_BLOB + vector
to MYSQL_TYPE_VECTOR.
>
> ----------------------------
>
> Prepared Statement Metadata Caching:
> In theory I am all for adding transparent improvements and they don't
> usually require an RFC, but in this case, I don't understand how the
> caching will be done. Are you saying the user will need to cache the
> metadata somewhere? Is the cache shared across requests? How will the
> cache be invalidated/cleaned?
>
Right this will also be implemented via PR
>
> From the implementation, it seems like the cache is only per prepared
> statement, i.e. only the first execute() call will fetch the metadata.
> If that's all it is then it can be made into a simple PR without RFC.
> But in the table you say "Avoids resending metadata for repeated
> prepares" which confuses me.
>
The caching mechanism is designed to be transparent within the mysqlnd
layer. Historically, MySQL and older MariaDB versions sent metadata during
the PREPARE phase and again for every subsequent EXECUTE.
MariaDB has optimized this: the server now only sends metadata during
EXECUTE if the table structure or definition has changed. By implementing
caching in mysqlnd, we can skip the redundant allocation and parsing of
metadata packets. This is managed internally by the driver and does not
require the user to manually clean or invalidate the cache.
> > Performance Improvements: Existing PHP functions that prepare and
> execute statements, such as mysql_execute_query() (introduced in PHP 8.2),
> could be updated to use execute_direct when connected to a MariaDB server.
> This would allow one-time statements to bypass the extra round-trip and
> metadata fetch, providing immediate performance gains.
>
> Does this refer to caching? How does caching improve the performance
> of a one-time executed prepared statement? If that's not what it
> refers to, then what did you mean?
>
Sorry I forgot to remove this before I published it. MariaDB supports
direct execution by sending prepare and execute together - this doesn't
seem to work with php_streams.
>
> ----------------------------
>
> Progress Indication:
> I'm quite worried about this one. PHP doesn't have native asynchronous
> support yet. How do you intend for this to work when a built-in PHP
> function can be suspended and possibly interrupted completely while
> the MySQL is processing the request?
>
To clarify: this is not asynchronous in the Fiber sense. It is a
synchronous "keep-alive" mechanism. During long-running operations (like
ALTER TABLE, LOAD DATA, or complex INSERT...SELECT statements), the server
sends "progress" packets.
Currently, mysqlnd ignores these, which can lead to connection timeouts or
a lack of feedback for the user. By supporting these packets, the driver
stays active, preventing timeouts and allowing the application to provide
real-time feedback to the user or logs, ensuring the client knows the
server is still alive and working.
>
> ----------------------------
>
> PDO Considerations:
> Why is PDO out of scope? PDO is the main database extension in PHP. If
> any new feature is added, then it should probably be exposed via PDO
> first. What do you mean by "PDO_MySQL currently does not support array
> binding or bulk execution semantics"?
>
PDO is currently out of scope only to ensure a manageable and focused RFC.
Implementing these features in mysqlnd provides the necessary binary
foundation. A future "PDO Bulk RFC" can then leverage this work across all
drivers (PostgreSQL, SQLite, etc.) without needing to reinvent the
protocol-specific logic for the MySQL/MariaDB ecosystem. We are building
the engine first; the unified PDO interface can follow.
>
> ----------------------------
>
> Server capabilities:
> I don't think it's more robust to detect MariaDB server by the lack of
> CLIENT_LONG_PASSWORD than it is by the server_version. What if MySQL
> decides to remove that flag in the future? What if derived products
> don't follow the same convention?
>
> Why is it necessary to expose this information to the PHP user via
> server_capabilities and ext_server_capabilities? I can't find anywhere
> that you explain what information is presented in these properties or
> how to use it.
>
There is a fundamental difference between what a server might do (Version)
and what it can do (Capabilities). A version string is a static label.
Capability flags, however, reflect the server’s current build-time features
and runtime configuration.
Exposing ext_server_capabilities allows developers and frameworks to
implement reliable feature detection. Without these flags, a developer
cannot safely use functions without risking a fatal error or needing to
wrap every call in a try-catch block.
----------------------------
>
> Memory Consumption:
> Why is using generators going to use less memory? How does the new
> function use generators that it's able to provide performance
> improvements without reading all of the data from the generator
> beforehand? This needs to be clearly explained so that it can be
> explained in the PHP manual later. What is the difference between
> these two:
>
> ```
> $stmt->executemany("sd", $generator);
> $stmt->executemany("sd", iterator_to_array($generator));
> ```
>
> The performance improvement is not derived from the Generator itself, but
from the reduction of network round-trips.
- standard foreach + execute(): 1,000 rows = 1,000 "Send" packets + 1,000
"Response" packets.
- execute_many(): The driver pulls rows from the iterable and serializes
them into a large internal network buffer. 1,000 rows = 1 large Send packet
+ 1 Response packet.
Regarding iterator_to_array($generator): This forces PHP to allocate memory
for every single row simultaneously. If you are importing 1 million rows,
PHP must allocate memory for 1 million arrays before the first byte is ever
sent. By using the Generator directly, we maintain O(1) memory usage,
fetching and serializing only one row at a time.
> ----------------------------
>
> mysqli_stmt::executemany implementation:
> You only mention the OO-style method in the RFC. Will the procedural
> style counterpart also be implemented?
>
yes - OO and procedural functions use the same code.
>
> The name doesn't follow mysqli naming standard. It should be called
> mysqli_stmt::execute_many. However, I wonder what made you land on
> this name instead of mysqli_stmt::bulk_execute?
>
> There should be no $types parameter IMHO. The data should either
> always be treated as text, or it should be bound using the PHP type,
> i.e. autodetected. I believe it was a mistake that bind_param() has a
> $types parameter. All it serves is to cast the data silently into the
> specified type. It has been a source of many silent bugs and a lot of
> confusion. Additionally, giving users more type choices in
> executemany() will not only create an inconsistency with bind_param(),
> but also confuse PHP users who don't know if a variable is 8, or 16,
> or 32, or 64-bit.
>
You are absolutely right—the name should follow the underscore convention,
so it will be execute_many. Regarding the parameter order and the $types
question, the proposed signature could be:
public mysqli_stmt::execute_many(iterable $data, ?iterable $control = null,
?string $types = null): bool
public mysqli_stmt_execute_many(mysqli_stmt $stmt, iterable $data,
?iterable $control = null, ?string $types = null): bool
By placing $control and $types at the end as optional parameters, we can
leverage PHP 8 named parameters to resolve the conflict between ease-of-use
and technical optimization.
1. The "All-Strings" Path (Implicit Autodetection)
To address your concern about autodetection: providing $types is not
mandatory. If the user simply calls $stmt->execute_many($data), the driver
will implicitly treat all scalar values as strings. Since MariaDB and MySQL
handle implicit casting efficiently, this provides the seamless "it just
works" experience you requested for the majority of use cases.
2. The "Strict-Typed" Path (Memory & Bandwidth Optimization)
While autodetection is convenient, specifying $types remains critical when
resource optimization is a factor:
- Binary Efficiency: By specifying types like i (LONG) or 1 (TINYINT), we
significantly reduce the network payload compared to sending everything as
strings (up to an 80% reduction for certain numeric columns).
- O(1) Memory Safety for Streams: Because the MariaDB bulk protocol
requires type declarations in the packet header before the data rows are
sent, the driver cannot "autodetect" binary widths from a Generator without
reading the entire stream into memory first. Providing the $types string
acts as a contract that allows for true, constant-memory streaming.
3. The "Control" Parameter and Indicators
I agree that the Indicator Enum should be moved to mysqli
(mysqli_indicator). The $control array serves three critical architectural
purposes:
- Non-Destructive Transformation: It allows developers to apply logic (like
DEFAULT or NULL overrides) without modifying the original $data array or
stream. This is vital if the data source is a read-only Generator or is
being reused elsewhere in the application.
- Security (Sensitive Data Overwrite): The $control array acts as a secure
template. A developer can pass a raw data array but use the control array
to force a status column to "active" or a last_modified_by column to a
specific ID, ensuring sensitive fields are overwritten regardless of the
input data's content.
-Flexibility: It supports both a Global Mode (one row that acts as a mask
for the entire batch) and a Streaming Mode (an iterable that advances in
parallel with the data).
This design ensures that execute_many remains simple for beginners while
providing the granular control required by high-performance implementations.
> Please be explicit what the failure conditions may be.
>
Like for all other functions you have to retrieve error messages via
mysqli_stmt::error/errno.
> How does get_result() work with executemany()? For example, if I
> execute SELECT ?,? with [[1,2], [3,4]] will I get that same structure
> in the result set? Is the sequence guaranteed? Is there a way to
> distinguish which row came from which input data row? What if one
> SELECT produces two rows and the other zero? Will it still be a single
> mysqli_result?
>
As mentioned in the beginning of the RFC, execute_many is used for UPSERT
(insert, replace,delete and update) operations. However it might contain a
result set, e.g. by having a RETURNING clause.
>
> How does it work with store_result() and also with unbuffered result
> sets? I assume it's all treated the same as a normal execute(),
> correct?
>
Yep
>
> How does this affect insert_id? MariaDB documentation says that only
> "the first auto-generated ID" is returned, which is contrary to what I
> expect: the last. Does the value propagate to PHP? If so, why?
>
This behavior matches the standard MySQL C API and protocol where
mysql_insert_id() returns the first ID of a multi-row insert, not the last:
MySQL [test]> truncate table t1;
Query OK, 0 rows affected (0.075 sec)
MySQL [test]> insert into t1 (name) values ('foo'),('bar'),('PHP');
Query OK, 3 rows affected (0.017 sec)
Records: 3 Duplicates: 0 Warnings: 0
MySQL [test]> select last_insert_id()\G;
*************************** 1. row ***************************
last_insert_id(): 1
>
> From what I can see, it is possible to request that MariaDB reply with
> individual result sets. Wouldn't that be better? How would that impact
> the performance?
>
If you require the IDs of every row in a bulk insert, the MariaDB RETURNING
clause should be used:
INSERT INTO t1 (name) VALUES (?) RETURNING id
> How about atomicity? If I make 3 INSERTS and the middle one fails,
> will the other two execute?
>
For transactional engines like InnoDB, atomicity is guaranteed. If a
protocol error occurs or a constraint is hit mid-batch, the server handles
the rollback, ensuring the database remains in a consistent state.
>
> Indicator enum should contain only two values: IGNORE and DEFAULT.
> Null is already a PHP data type and does not require an indicator.
> None serves no purpose (what would even happen if someone used it).
> Adding an Indicator enum will create an inconsistency with the
> existing feature set which do not support it.
> An enum should be defined by the extension in which it will be used,
> i.e. mysqli. The mysqlnd extension shuld not expose any user-facing
> APIs.
>
See my comments about new execute_many parameters (control). Even None is
used to indicate to use the original value from $data block.
NULL and mysqli_indicator::Null are indeed the same - however it is more
consistant with other drivers (e.g. ODBC and OCI) to have a Null indicator.
>
> When connected to the MySQL server, how will executemany() behave? I
> assume there will be an emulated support for it in mysqlnd, but how
> will you ensure consistency between these two implementations? Making
> the function no-op with MySQL should be out of the question.
>
When connected to a server that does not support the native bulk protocol
(such as standard MySQL), execute_many() will transition to a fallback
emulation layer. This ensures that the function remains a reliable,
portable API for developers regardless of the backend. However, there are
inherent limitations to this emulation:
- Limited Indicator Support: Since the standard MySQL COM_STMT_EXECUTE
protocol does not understand MariaDB-specific indicators, the fallback will
only support mysqli_indicator::Null (translated to a standard SQL NULL) and
mysqli_indicator::None. Indicators like DEFAULT and IGNORE are technically
impossible to implement in the fallback without complex SQL string
manipulation/rewriting, which would introduce unacceptable CPU overhead.
- Transaction Safety & Atomicity: In native MariaDB bulk mode, the entire
batch is sent as a single unit. In the fallback emulation, rows are
executed one by one. For non-transactional engines, a failure on row 500
would leave the first 499 rows committed. To maintain consistency, we
should recommend that users wrap execute_many() in an explicit transaction
when portability across MySQL and MariaDB is required.
- Performance Delta: While the fallback provides a consistent API, it does
not offer the network round-trip savings of the native protocol. Each row
in the fallback requires a separate request/response cycle, effectively
acting as a "C-speed" foreach loop.
/Georg
--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab