Package: python3-lxml
Version: 4.6.3+dfsg-0.1
Severity: important
X-Debbugs-Cc: micha...@gmail.com

Dear Maintainer,

I ran into a bug that causes lxml to truncate output when using "tostring"
with encoding set to "utf8", while it works correctly when encoding is set
to "utf-8".

See the attached "bug.py" file for an example to reproduce. The output under
"Bad" has truncated text in the last subfield.

I've previously reported this bug upstream in
https://bugs.launchpad.net/lxml/+bug/1944751 but further testing makes me
think that it is Debian specific: when running the attached "bug.py" example
in a new virtualenv in which I ran "pip install lxml", and hence using the
upstream binary wheel, the bug doesn't arise.

Best,
Micha

-- System Information:
Debian Release: 11.0
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-8-amd64 (SMP w/8 CPU threads)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages python3-lxml depends on:
ii  libc6       2.31-13
ii  libxml2     2.9.10+dfsg-6.7
ii  libxslt1.1  1.1.34-4
ii  python3     3.9.2-3

Versions of packages python3-lxml recommends:
ii  python3-bs4       4.9.3-1
ii  python3-html5lib  1.1-3

Versions of packages python3-lxml suggests:
pn  python-lxml-doc   <none>
pn  python3-lxml-dbg  <none>

-- no debconf information

from lxml.builder import E
from lxml.etree import tostring

RECORD = E.record
CONTROLFIELD = E.controlfield
DATAFIELD = E.datafield
SUBFIELD = E.subfield

INPUT_DATA = {
    "520": [
        {
            "9": "APS",
            "a": 'The first measurement of the dependence of <math display="inline"><mrow><mi>γ</mi><mi>γ</mi><mo stretchy="false">→</mo><msup><mrow><mi>μ</mi></mrow><mrow><mo>+</mo></mrow></msup><msup><mrow><mi>μ</mi></mrow><mrow><mo>−</mo></mrow></msup></mrow></math> production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at <math display="inline"><mrow><msqrt><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>N</mi><mi>N</mi></mrow></msub></mrow></msqrt><mo>=</mo><mn>5.02</mn><mtext>\u2009</mtext><mtext>\u2009</mtext><mi>TeV</mi></mrow></math>, with an integrated luminosity of approximately <math display="inline"><mrow><mn>1.5</mn><mtext>\u2009</mtext><mtext>\u2009</mtext><msup><mrow><mi>nb</mi></mrow><mrow><mo>-</mo><mn>1</mn></mrow></msup></mrow></math>, are collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region <math display="inline"><mrow><mn>8</mn><mo><</mo><msub><mrow><mi>m</mi></mrow><mrow><mi>μ</mi><mi>μ</mi></mrow></msub><mo><</mo><mn>60</mn><mtext>\u2009</mtext><mtext>\u2009</mtext><mi>GeV</mi></mrow></math> are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range <math display="inline"><mrow><mrow><mo stretchy="false">|</mo><mi>η</mi><mo stretchy="false">|</mo></mrow><mo>></mo><mn>8.3</mn></mrow></math>. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.',
        },
        {
            "9": "arXiv",
            "a": "The first measurement of the dependence of $\\gamma\\gamma$$\\to$$\\mu^{+}\\mu^{-}$ production on the multiplicity of neutrons emitted very close to the beam direction in ultraperipheral heavy ion collisions is reported. Data for lead-lead interactions at $\\sqrt{s_\\mathrm{NN}} =$ 5.02 TeV, with an integrated luminosity of approximately 1.5 nb$^{-1}$, were collected using the CMS detector at the LHC. The azimuthal correlations between the two muons in the invariant mass region 8 $\\lt$$m_{\\mu\\mu}$$\\lt$ 60 GeV are extracted for events including 0, 1, or at least 2 neutrons detected in the forward pseudorapidity range $|\\eta|$$\\gt$ 8.3. The back-to-back correlation structure from leading-order photon-photon scattering is found to be significantly broader for events with a larger number of emitted neutrons from each nucleus, corresponding to interactions with a smaller impact parameter. This observation provides a data-driven demonstration that the average transverse momentum of photons emitted from relativistic heavy ions has an impact parameter dependence. These results provide new constraints on models of photon-induced interactions in ultraperipheral collisions. They also provide a baseline to search for possible final-state effects on lepton pairs caused by traversing a quark-gluon plasma produced in hadronic heavy ion collisions.",
        },
    ]
}

record = RECORD()
for tag, values in sorted(INPUT_DATA.items()):
    for value in values:
        datafield = DATAFIELD({"tag": tag, "ind1": " ", "ind2": " "})
        for code, el in sorted(value.items()):
            datafield.append(
                SUBFIELD(el, {"code": code})
            )
        record.append(datafield)

utf8_bad = tostring(record, encoding="utf8")
utf8_good = tostring(record, encoding="utf-8")

print("Bad:", utf8_bad, "\n", "Good:", utf8_good, sep="\n")
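
Since the "utf-8" spelling serialises correctly here, one possible stopgap is to normalise the encoding name before handing it to "tostring". The following is only a minimal sketch: "canonical_encoding" is a hypothetical helper name, not part of lxml, and it relies solely on the standard library's "codecs.lookup", which maps aliases such as "utf8" to Python's canonical name "utf-8". It does not address the underlying cause of the truncation.

```python
import codecs


def canonical_encoding(name: str) -> str:
    """Return Python's canonical name for an encoding alias.

    Hypothetical helper for illustration; raises LookupError if the
    name is not a known codec.
    """
    return codecs.lookup(name).name


# Show how a few common spellings are normalised.
for alias in ("utf8", "UTF-8", "utf-8"):
    print(alias, "->", canonical_encoding(alias))
```

In the reproducer above this would be used as tostring(record, encoding=canonical_encoding("utf8")), so that lxml always receives the "utf-8" spelling that works correctly in the report.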