This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git


    from 4fbd3ec9a6 TIKA-4327: update junrar
     new 155a620714 ebcdic gate
     new 9ce41f5c70 win-1252 hack
     new 41784517b5 sparse-Latin vCard IBM424 false positive test
     new d370f54247 bump limit to something realistic
     new d28c9b1f71 gate UTF-16 model output on 2-byte column-diversity asymmetry
     new 86a4f3e02b Merge remote-tracking branch 'origin/main' into 4x-reg-test-charset-detection-tweaks

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../ml/chardetect/MojibusterEncodingDetector.java  | 145 ++++++++++++++++++++-
 .../ml/chardetect/StructuralEncodingRules.java     | 131 +++++++++++++++++++
 .../chardetect/SparseLatinVcardRegressionTest.java | 116 +++++++++++++++++
 3 files changed, 388 insertions(+), 4 deletions(-)
 create mode 100644 tika-encoding-detectors/tika-encoding-detector-mojibuster/src/test/java/org/apache/tika/ml/chardetect/SparseLatinVcardRegressionTest.java
