Re: [PHP-DEV] [RFC][Discussion] Add MariaDB-specific features to mysqlnd and mysqli

Georg Richter Wed, 15 Apr 2026 06:41:38 -0700

Hi Kamil,


> Regarding the progress indicator: it looks to me like it will be very
> difficult to implement properly in PHP, but I might be completely
> misunderstanding the design. If you think a PoC is possible, could you
> please prepare one and maybe then we can come back to this discussion.
>
> ----------------------------
>
We have callback function support in several extension - the closest
matching one is probably curl:

curl_setopt($resource, CURLOPT_PROGRESSFUNCTION, 'progressCallback');

Even if no progress callback function was specified, we will always read
the progress information from server to avoid possible read timeouts for
long running operations.


> This is exactly what exceptions and errors are for. The PHP user
> SHOULD NOT be concerned with what capabilities the server offers and
> which are compatible with PHP. Even if it could be potentially useful
> to PHP developers, we should not expose this information as part of
> mysqli API. Either mysqli supports the feature or it doesn't; there
> should be no maybe.
>
> ----------------------------
>
I disagree — relying on try/catch is a waste of resources, especially in
high-load environments, when the information is already available locally.
It forces the application to parse error messages or perform unnecessary
network round-trips just to 'discover' server limitations.

More importantly, this approach is dangerous for data integrity. For
example, if you attempt to insert 1,000 rows via execute_many() on a MyISAM
table and the last row contains a feature (like an indicator variable) that
the server does not support, 999 rows will be committed before the error is
triggered. You cannot 'catch' your way out of a partially written batch in
a non-transactional engine.
Capability checking is a standard and necessary pattern in PHP. ext/gd uses
a bitmask for supported formats so developers can choose the correct logic
path before processing begins. Furthermore, ext/mysqli already exposes
several capability flags. While some were historically used for internal
purposes or inherited from libmysql (like SSL_VERIFY_SERVER_CERT), the
precedent for exposing capabilities to the user is already firmly
established in the current API.
>
>
> If mysqlnd is going to serialize it all into a large internal buffer
> then it still needs to read all the data from the iterator before
> making the query? It's still going to use the same amount of memory.
> How is it going to maintain constant memory usage?
>

Exactly. Because mysqlnd sits on top of PHP streams — which do not expose
direct access to their internal network buffers (at least not at the time
when Andrey wrote mysqlnd) —mysqlnd must allocate its own memory buffer for
each command before transferring it via the Stream API.

The memory usage is determined by the size of the protocol packet being
sent. To maintain efficiency, mysqlnd manages this buffer dynamically: it
reallocates the buffer only when necessary using an exponential growth
strategy to minimize overhead.

Regardless of whether the input is an array or a stream, the buffer must be
large enough to hold the serialized command. The 'constant memory' aspect
refers to the fact that we aren't duplicating the entire dataset multiple
times in different formats; rather, we are streaming the serialized data
into a managed internal buffer that mysqlnd already relies on for network
communication.


>
> A generator might be a neat trick for users to prepare data on the go,
> but it doesn't reduce the memory usage if the consumer needs to use
> all of the data in one go. As I understand, execute_many will send it
> all in one batch, so the memory footprint stays the same.
>
> ----------------------------
>
No, the peak memory usage is significantly lower. Consider the difference
between file() and fread():
- file()/Array: You must store the entire dataset in PHP memory as zvals,
then duplicate it into the mysqlnd network buffer. You are essentially
double-buffering.
- Generator/Stream: PHP memory remains constant because it only holds one
row at a time. mysqlnd pulls that row, serializes it directly into the
network buffer, and moves on.
By using a stream, you eliminate the massive overhead of the intermediate
PHP array, even if the final network packet requires a large buffer.

>
> Re execute_many parameters:
>
> Despite your convincing arguments for better network utilization by
> providing the types, I still think we should not offer the possibility
> of specifying the types. I don't know what other PHP developers on
> this mailing list think about it, but for me the type feature goes
> against the nature of PHP. Making the parameter optional is very good
> choice and eases my concerns slightly, so if I am outnumbered in my
> opinion, I won't be upset.
>
> The number of mysqli users grows increasingly smaller. Out of this,
> the number of people who will use execute_many and who will need to
> optimize for TINYINT is unbelievably tiny. Any string easily
> overshadows the numerical data. Thus, this feature won't see much
> legitimate use.
>

I agree that 99% of users likely won't specify types. However, there will
always be cases—such as limited memory or restricted CPU—where this
optional parameter is essential. It reduces the footprint and eliminates
the overhead of type conversions on both the client and the server.

Even if the primary use case is 'tiny,' a low-level driver like mysqlnd
should provide the tools for maximum efficiency, especially when the
implementation cost for the engine is minimal but the potential performance
gain for the user is high.


> > Because the MariaDB bulk protocol requires type declarations in the
> packet header before the data rows are sent, the driver cannot "autodetect"
> binary widths from a Generator without reading the entire stream into
> memory first. Providing the $types string acts as a contract that allows
> for true, constant-memory streaming.
>
> Isn't that what it does anyway? You need to read all the data
> (serialize) before you make the EXECUTE command, correct? I don't
> understand why you can't prepare the type specification automatically
> while serializing the data.
>

That works for arrays, but not for streams or generators. Unlike an array,
a stream is a one-way 'pull' mechanism—we don't know the type of the second
or hundredth row until we have already consumed the last.

To determine types automatically, we would have to buffer the entire stream
into memory first to inspect it, which completely defeats the purpose of
using a stream to save memory. Providing the types upfront allows mysqlnd
to serialize the stream directly to the wire in a single pass.



>
> Regarding the control parameter:
>
> Why not make it a callback? Provide the $row as an argument and let
> the user modify it inside the callback, substituting values for
> mysqli_indicator, and returning the row to be inserted. It would offer
> a lot more flexibility to the user and would make the implementation
> simpler. This way, you don't need to implement Null or None anymore.
>
> ----------------------------
>
A callback would actually be a major step backward for performance.
Invoking a PHP user-land closure 100,000 times in a single execute_many()
call introduces a massive overhead due to context switching between the
C-engine and the PHP VM. This would effectively negate the performance
gains we are trying to achieve.

My planned implementation already solves the flexibility problem via
Generators. Since execute_many() accepts an iterable, a user can already
use a generator to perform row-by-row logic. This is far more efficient
than a callback.

Furthermore, a callback does not eliminate the need for
mysqli_indicator::None. Even inside a function, the driver still needs a
clear 'metadata signal' to know whether to pull a value from the data
source or to treat it as an override.

I was asking you to list in the RFC all the possible client errors
> that are added as part of this implementation. For example, "Row %lu
> is not an array". This should be part of the RFC, in my opinion, as we
> may want to discuss the error conditions and messages too.
>
> ----------------------------
>
I haven't introduced any new error codes; all data validation errors use
CR_INVALID_PARAMETER_NO with UNKNOWN_SQLSTATE. I didn't see a requirement
in the RFC guidelines to list every specific error message, but if it is
mandatory for the process, I can add them to the document of course.



>
> Result sets:
>
> That's not what I meant. I was asking whether it could be implemented
> with MARIADB_CLIENT_BULK_UNIT_RESULTS instead. When users execute a
> SELECT with 2 data rows, I would like to see it return 2 mysqli_result
> objects. Same with INSERT statements, it should return a separate
> result for each insertion. If there are arguments against that, they
> should be explained in the RFC.
>

Expecting 1,000 separate result objects for 1,000 inserted rows would cause
a massive performance collapse. Each result set would require its own
network packet and redundant metadata headers, completely defeating the
purpose of a bulk execution API.
Regarding MARIADB_CLIENT_BULK_UNIT_RESULTS: this is a very recnt feature
(introduced in MariaDB 11.5) that allows the server to return a single
result package containing multiple status rows. I did not include it in the
current RFC because it is not yet widely available in LTS releases.
Furthermore, for the MySQL fallback (where this capability is absent),
mysqlnd would have to 'artificially' construct these result sets in memory,
which adds significant overhead. The goal of execute_many is maximum
throughput, which is best achieved by providing a summary of the bulk
operation rather than individual results for every row.

>
> Anyway, the RFC should clearly explain how result fetching works with
> all 3 methods (unbuffered, stored, and get_result) and what are the
> possible gotchas.
>
> ----------------------------
>
The RFC notes that execute_many() can return a result set (e.g., when using
a RETURNING clause or similar). The PHP documentation already clearly
defines how result sets are retrieved from prepared statements: via
bind_result() with store_result()/use_result(), or via get_result().

Since execute_many() follows the existing mysqli_stmt behavior and does not
change how results are buffered or fetched, adding a redundant explanation
of standard mysqli mechanisms would only clutter the RFC.


>
> > For transactional engines like InnoDB, atomicity is guaranteed. If a
> protocol error occurs or a constraint is hit mid-batch, the server handles
> the rollback, ensuring the database remains in a consistent state.
>
> > Transaction Safety & Atomicity: In native MariaDB bulk mode, the entire
> batch is sent as a single unit. In the fallback emulation, rows are
> executed one by one. For non-transactional engines, a failure on row 500
> would leave the first 499 rows committed. To maintain consistency, we
> should recommend that users wrap execute_many() in an explicit transaction
> when portability across MySQL and MariaDB is required.
>
> I am confused. Aren't both of these statements stating the same? Why
> can't you wrap the fallback in an automatic transaction to make it
> work exactly the same as the native MariaDB solution?
>
> If execute_many implies an automatic transaction but only with
> transactional engines, it should be clearly stated in the RFC so that
> it can be later documented in PHP manual too.
>
> ----------------------------
>
> I am against wrapping the fallback in an automatic transaction because a
low-level driver should not modify the session's transactional state behind
the scenes.

If mysqli were to automatically inject START TRANSACTION and COMMIT, it
could unexpectedly commit a user's existing work or interfere with their
manual transaction logic. Furthermore, an 'automatic' transaction would be
a false promise on engines like MyISAM, where BEGIN and COMMIT are simply
ignored, still resulting in partial inserts.

I've updated the RFC to clarify this.


> > Limited Indicator Support: Since the standard MySQL COM_STMT_EXECUTE
> protocol does not understand MariaDB-specific indicators, the fallback will
> only support mysqli_indicator::Null (translated to a standard SQL NULL) and
> mysqli_indicator::None. Indicators like DEFAULT and IGNORE are technically
> impossible to implement in the fallback without complex SQL string
> manipulation/rewriting, which would introduce unacceptable CPU overhead.
>
> And for this reason, I think that maybe we shouldn't even implement
> the control parameter at all. It sounds like a neat feature, but it
> costs performance in a function that is all about improving
> performance, and it is DB-version specific. The new execute_many
> function doesn't need the control parameter to function properly, and
> in my opinion, it would be better to keep it as simple as possible.
> But I am curious to see what others think.
>
Could you please explain why you expect a performance loss? I believe the
opposite is the case.

Consider the following common scenario: A system collects records from
various external APIs. The 'IDs' from these sources are not unique, so we
need the database to generate its own AUTO_INCREMENT keys. Additionally,
for privacy compliance (GDPR/CCPA), we must mask phone numbers during the
import.

The 'expensive' way would be to modify the existing data source. In the
worst case—for instance, if you need to log the original data after the
import—you would have to create a full copy of the data first. You then
have to iterate over the entire dataset to:

- Set every id to null to trigger AUTO_INCREMENT.
- Overwrite every phone number with a masked string (e.g., +XX XXX-XXXXXXX).

By specifying a control parameter, the source data remains completely
untouched. The driver handles both the nullification and the masking at the
C-level.

Example:

/* Raw external data: [External_ID, Name, Phone] */
$external_data = [
    [101, 'John Doe', '555-1234'],
    [102, 'Jane Doe', '555-5678'],
     ...
    [100000, "Rasmus Lerdorf', '431-1233939']
];

/* Control parameter:
   - Column 0 (ID): Force NULL to trigger AUTO_INCREMENT
   - Column 1 (Name): Use mysqli_indicator::None (Keep original data)
   - Column 2 (Phone): Scalar override for privacy masking */
$control = [
    mysqli_indicator::Null,
    mysqli_indicator::None,
    "+XX XXX-XXXXXXX"
];

$stmt->execute_many($external_data, control: $control);

/Georg
-- 
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab

Re: [PHP-DEV] [RFC][Discussion] Add MariaDB-specific features to mysqlnd and mysqli

Reply via email to