This is an automated email from the ASF dual-hosted git repository.
tallison pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git
from 4fbd3ec9a6 TIKA-4327: update junrar
new 155a620714 ebcdic gate
new 9ce41f5c70 win-1252 hack
new 41784517b5 sparse-Latin vCard IBM424 false positive test
new d370f54247 bump limit to something realistic
new d28c9b1f71 gate UTF-16 model output on 2-byte column-diversity asymmetry
new 86a4f3e02b Merge remote-tracking branch 'origin/main' into 4x-reg-test-charset-detection-tweaks
The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
.../ml/chardetect/MojibusterEncodingDetector.java | 145 ++++++++++++++++++++-
.../ml/chardetect/StructuralEncodingRules.java | 131 +++++++++++++++++++
.../chardetect/SparseLatinVcardRegressionTest.java | 116 +++++++++++++++++
3 files changed, 388 insertions(+), 4 deletions(-)
create mode 100644 tika-encoding-detectors/tika-encoding-detector-mojibuster/src/test/java/org/apache/tika/ml/chardetect/SparseLatinVcardRegressionTest.java