On Thu, Jun 05, 2025 at 10:45:17PM +0200, Patrice Dumas wrote:
> I do not have any good idea on what more could be done to understand
> what is going on.

I based my investigations on commit 08c59dc0cbc36 (2025-06-05) as
you said that your later commits did not fix the problem.

Summary: I think I have found the explanation, but have not confirmed it.  It
means that the approach of using SV instead of HV is flawed.

I started with your explanation and reduced the test case to the following 
input:

$ cat sv-hv-fail.texi
\input texinfo.tex

@node Top
@top top

@node chapter
@chapter chap

@itemize @minus
@c comment in itemize
@cindex also a cindex in itemize
@item e--mph item
@end itemize

@bye

Then, I replicated the failure with the following:

$ TEXINFO_XS_STRUCTURE=0 TEXINFO_OUTPUT_FORMAT=plaintexinfo perl ./texi2any.pl 
-c TREE_TRANSFORMATIONS=move_index_entries_after_items sv-hv-fail.texi

In the output from this, the "@c comment in itemize" line is missing.

I tried checking ref counts using Devel::Refcount, but like you, I did
not find any problems.

I added print statements to debug the code and confirm the problem (see
below).

The trace output for one run included:

REPARENT FROM (before_item)[C2] at ../perl/Texinfo/ManipulateTree.pm line 1293.
REPARENT FROM |Texinfo::TreeElement=HASH(0x55cb44f4dbd0)|
 CURRENT(before_item)[C2] at ../perl/Texinfo/ManipulateTree.pm line 1295.
 CURRENT |Texinfo::TreeElement=HASH(0x55cb44f4dbd0)|
  Ref count to contents array: 3 at ../perl/Texinfo/ManipulateTree.pm line 1298.
  Ref count to contents 1: 4 at ../perl/Texinfo/ManipulateTree.pm line 1299.
  contents array: |ARRAY(0x55cb44f4e698)|
  contents: 
Texinfo::TreeElement=HASH(0x55cb44e83ac8):Texinfo::TreeElement=HASH(0x55cb44f4dbd0):Texinfo::TreeElement=HASH(0x55cb44f4e0f8):Texinfo::TreeElement=HASH(0x55cb44f4e350)
 at ../perl/Texinfo/ManipulateTree.pm line 1301.
  =REPARENT @cindex(index_entry_command)[C1]
  =REPARENT |Texinfo::TreeElement=HASH(0x55cb44f4dd38)|
  Ref count to contents array: 3 at ../perl/Texinfo/ManipulateTree.pm line 1313.
  Ref count to contents 1: 4 at ../perl/Texinfo/ManipulateTree.pm line 1314.
  contents array: |ARRAY(0x55cb44f4e698)|
  contents: 
Texinfo::TreeElement=HASH(0x55cb44e83ac8):Texinfo::TreeElement=HASH(0x55cb44f4e0f8):Texinfo::TreeElement=HASH(0x55cb44f4e0f8):Texinfo::TreeElement=HASH(0x55cb44f4e350)
 at ../perl/Texinfo/ManipulateTree.pm line 1316.
 DEPARENTED |(before_item)[C2]
 DEPARENTED |Texinfo::TreeElement=HASH(0x55cb44f4dbd0)|
 CURRENT@item[C2] at ../perl/Texinfo/ManipulateTree.pm line 1319.
 CURRENT |Texinfo::TreeElement=HASH(0x55cb44f4e0f8)|

You can see the HASH(0x55cb44f4dbd0) disappears from the second element (index 
1)
of the array and is replaced by HASH(0x55cb44f4e0f8), confirming your analysis.

I had a breakthrough when I realised that the referenced parent element,
the $previous_ending_container->{'contents'}->[$i]->{'parent'}, was the 
before_item
element that was being overwritten.  What if the assignment didn't just change 
the
value for 'parent' in the hash 
(%{$previous_ending_container->{'contents'}->[$i]}),
but modified the object that was referenced?

When I added the line to delete this key from the hash first, so there was
no chance to modify the previously referenced object, the output changed, and
looked correct.  I added:

+          delete $previous_ending_container->{'contents'}->[$i]->{'parent'};

Then the trace output looks like:

REPARENT FROM (before_item)[C2] at ../perl/Texinfo/ManipulateTree.pm line 1293.
REPARENT FROM |Texinfo::TreeElement=HASH(0x55d2f3eb0468)|
 CURRENT(before_item)[C2] at ../perl/Texinfo/ManipulateTree.pm line 1295.
 CURRENT |Texinfo::TreeElement=HASH(0x55d2f3eb0468)|
  Ref count to contents array: 3 at ../perl/Texinfo/ManipulateTree.pm line 1298.
  Ref count to contents 1: 4 at ../perl/Texinfo/ManipulateTree.pm line 1299.
  contents array: |ARRAY(0x55d2f3eb0f48)|
  contents: 
Texinfo::TreeElement=HASH(0x55d2f3e904c0):Texinfo::TreeElement=HASH(0x55d2f3eb0468):Texinfo::TreeElement=HASH(0x55d2f3eb0990):Texinfo::TreeElement=HASH(0x55d2f3eb0be8)
 at ../perl/Texinfo/ManipulateTree.pm line 1301.
  =REPARENT @cindex(index_entry_command)[C1]
  =REPARENT |Texinfo::TreeElement=HASH(0x55d2f3eb05d0)|
  Ref count to contents array: 3 at ../perl/Texinfo/ManipulateTree.pm line 1313.
  Ref count to contents 1: 4 at ../perl/Texinfo/ManipulateTree.pm line 1314.
  contents array: |ARRAY(0x55d2f3eb0f48)|
  contents: 
Texinfo::TreeElement=HASH(0x55d2f3e904c0):Texinfo::TreeElement=HASH(0x55d2f3eb0468):Texinfo::TreeElement=HASH(0x55d2f3eb0990):Texinfo::TreeElement=HASH(0x55d2f3eb0be8)
 at ../perl/Texinfo/ManipulateTree.pm line 1316.
 DEPARENTED |(before_item)[C2]
 DEPARENTED |Texinfo::TreeElement=HASH(0x55d2f3eb0468)|
 CURRENT(before_item)[C2] at ../perl/Texinfo/ManipulateTree.pm line 1319.
 CURRENT |Texinfo::TreeElement=HASH(0x55d2f3eb0468)|

As you can see, HASH(0x55d2f3eb0468) is now referenced consistently throughout.

Blessed hashes are not supposed to behave like this, though.  They don't make
any assignments to an lvalue referencing the blessed hash "copy-by-value" in
that the blessed hash itself should be overwritten.

Indeed, I could not get the losing behaviour with test Perl programs I
tried with blessed references.

Then I suspected something was strange with the way the data structures
were constructed.  I suspect it would be hard to replicate the loss without
using XS.

"man guts" states re hashes:

       Perl keeps the actual data in a linked list of structures with a typedef 
of
       HE.  These contain the actual key and value pointers (plus extra
       administrative overhead).  The key is a string pointer; the value is an
       "SV*". 

Perhaps a hash shouldn't share its SV * values with any other data structures,
and when a hash is assigned to, these SV values can be assigned to, clobbering
anything else that shares the SV value?

In element_to_perl_hash, commit 359b73825d2 changed:

-      sv = newRV_inc ((SV *) e->parent->hv);
-      hv_store (e->hv, "parent", strlen ("parent"), sv, HSH_parent);
+      sv = SvREFCNT_inc ((SV *) e->parent->sv);
+      hv_store (element_hv, "parent", strlen ("parent"), sv, HSH_parent);

So rather than creating a new reference and storing it in the hash, the same
SV object is stored in the hash that is used elsewhere (including in element
contents arrays).

When you delete the hash entry for "parent" with 'delete', and then recreate
it, you get a new SV that isn't shared with any other data structure.

To explain the problem another way, I believe that if you had XS code like:

HV *hv = newHV ();
SV *sv = newSViv(7);
hv_store (hv, "foo", strlen ("foo"), sv, 0);
hv_store (hv, "bar", strlen ("bar"), sv, 0);

and then, in Perl code, did

$hv->{'foo'} = 8;

then $hv->{'bar'} would also change from 7 to 8.  (I haven't tested this.)


Changes for testing:

diff --git a/tta/perl/Texinfo/Common.pm b/tta/perl/Texinfo/Common.pm
index 52fc87124a..c20fa2718c 100644
--- a/tta/perl/Texinfo/Common.pm
+++ b/tta/perl/Texinfo/Common.pm
@@ -58,6 +58,9 @@ collect_commands_in_tree
 collect_commands_list_in_tree
 valid_customization_option
 valid_tree_transformation
+debug_print_element
+debug_print_element_details
+debug_print_tree
 );
 
 # This is where the Texinfo modules get access to __( without explicit
diff --git a/tta/perl/Texinfo/ManipulateTree.pm 
b/tta/perl/Texinfo/ManipulateTree.pm
index 6d0d74a3a4..e0eccbd7ba 100644
--- a/tta/perl/Texinfo/ManipulateTree.pm
+++ b/tta/perl/Texinfo/ManipulateTree.pm
@@ -51,7 +51,10 @@ use Texinfo::XSLoader;
 
 use Texinfo::TreeElement;
 
-use Texinfo::Common;
+use Texinfo::Common qw(debug_print_element
+debug_print_element_details
+debug_print_tree
+);
 
 require Exporter;
 our @ISA = qw(Exporter);
@@ -1224,6 +1227,8 @@ sub protect_first_parenthesis($)
   }
 }
 
+use Devel::Refcount qw ( refcount );
+
 sub move_index_entries_after_items($)
 {
   # enumerate or itemize
@@ -1231,6 +1236,14 @@ sub move_index_entries_after_items($)
 
   return unless ($current->{'contents'});
 
+  warn "Ref count to contents array: ", refcount($current->{'contents'});
+  my $array_anchor = $current->{'contents'};
+  warn "Ref count to contents array: ", refcount($current->{'contents'});
+
+
+  warn "=====MOVE AFTER ITEMS\n";
+  warn "CURRENT ".debug_print_tree($current)."\n";
+
   my $previous;
   foreach my $item (@{$current->{'contents'}}) {
     #print STDERR "Before proceeding: $previous $item->{'cmdname'} 
(@{$previous->{'contents'}})\n" if ($previous and $previous->{'contents'});
@@ -1246,6 +1259,9 @@ sub move_index_entries_after_items($)
       } else {
         $previous_ending_container = $previous;
       }
+      warn debug_print_element($previous_ending_container)."\n";
+
+
 
       my $contents_nr = scalar(@{$previous_ending_container->{'contents'}});
 
@@ -1274,11 +1290,35 @@ sub move_index_entries_after_items($)
           $item_container = $item;
         }
 
+        warn "REPARENT FROM ".debug_print_element($previous_ending_container);
+        warn "REPARENT FROM |".$previous_ending_container."|\n";
+        warn " CURRENT".debug_print_element($current->{'contents'}->[1]);
+        warn " CURRENT |".$current->{'contents'}->[1]."|\n";
+
+  warn "  Ref count to contents array: ", refcount($current->{'contents'});
+  warn "  Ref count to contents 1: ", refcount($current->{'contents'}->[1]);
+  warn "  contents array: |", $current->{'contents'} . "|\n";
+  warn "  contents: ", join(':', @{$current->{'contents'}});
+
         for (my $i = $last_entry_idx; $i < $contents_nr; $i++) {
+          warn "  =REPARENT 
".debug_print_element($previous_ending_container->{'contents'}->[$i])."\n";
+          warn "  =REPARENT 
|".$previous_ending_container->{'contents'}->[$i]."|\n";
+          #$previous_ending_container->{'contents'}->[$i]->{'parent'}
+          #    = $item_container;
+          #delete $previous_ending_container->{'contents'}->[$i]->{'parent'};
           $previous_ending_container->{'contents'}->[$i]->{'parent'}
                = $item_container;
         }
 
+  warn "  Ref count to contents array: ", refcount($current->{'contents'});
+  warn "  Ref count to contents 1: ", refcount($current->{'contents'}->[1]);
+  warn "  contents array: |", $current->{'contents'} . "|\n";
+  warn "  contents: ", join(':', @{$current->{'contents'}});
+        warn " DEPARENTED 
|".debug_print_element($previous_ending_container)."\n";
+        warn " DEPARENTED |".$previous_ending_container."|\n";
+        warn " CURRENT".debug_print_element($current->{'contents'}->[1]);
+        warn " CURRENT |".$current->{'contents'}->[1]."|\n";
+
         my $insertion_idx = 0;
         if ($item_container->{'contents'}
             and $item_container->{'contents'}->[0]
@@ -1303,6 +1343,8 @@ sub move_index_entries_after_items($)
     }
     $previous = $item;
   }
+  warn "CURRENT ".debug_print_tree($current)."\n";
+  warn "=====MOVE AFTER ITEMS DONE\n";
 }
 
 sub _move_index_entries_after_items($$)



Reply via email to