Created attachment 112107
Remove combining characters from normalized text

This patch changes normalization so that combining characters are removed from the normalized text. This makes searching through TextPage::findText insensitive to these characters.
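As a rough illustration of the normalization described above (not the patch's actual C++ code), the same idea can be sketched in Python. This assumes that "combining characters" means code points with a non-zero canonical combining class, and it uses Python's unicodedata module as a stand-in for poppler's own Unicode tables. The names normalize_for_search and find_insensitive are hypothetical.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Compatibility-decompose text, then keep only base characters.

    Assumption: a "combining character" is any code point whose canonical
    combining class is non-zero. poppler's tables may classify differently.
    """
    # NFKD splits precomposed characters, e.g. U+00FC -> 'u' + U+0308,
    # and expands compatibility forms such as the "fi" ligature.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks; no recomposition step follows.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def find_insensitive(haystack: str, needle: str) -> bool:
    """Hypothetical search helper: compare both sides after normalization."""
    return normalize_for_search(needle) in normalize_for_search(haystack)


if __name__ == "__main__":
    # Precomposed and decomposed spellings normalize to the same string.
    print(normalize_for_search("\u00fcber") == normalize_for_search("u\u0308ber"))
```

Because both the page text and the search term go through the same function, it should not matter whether the PDF stores ü as one precomposed code point or as u followed by U+0308. A side effect of this approach is that a search for "uber" would also match "über".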
The patch also renames unicodeNormalizeNFKC to unicodeNormalizeSearch to make it clear that it no longer performs a regular NFKC normalization. It renames decomp_compat to decomp_compat_base because the function now strips combining characters, leaving only base characters, in addition to performing compatibility decomposition. It removes UnicodeCompTables.h and some compose functions, which are no longer needed since we're not recomposing the characters.

I'm not sure if UnicodeTypeTable.h and UnicodeCompTables.h are considered part of the public interface. They're included in the xpdf headers. Albert, is it OK to change these files in this way?

https://bugs.launchpad.net/bugs/116453

Title:
  evince can not find ü in attached PDF

Status in Poppler:
  Confirmed
Status in poppler package in Ubuntu:
  Triaged

Bug description:
  Binary package hint: evince

  1) lsb_release -rd
  Description: Ubuntu Vivid Vervet (development branch)
  Release: 15.04

  2) apt-cache policy evince
  evince:
    Installed: 3.14.1-0ubuntu1
    Candidate: 3.14.1-0ubuntu1
    Version table:
   *** 3.14.1-0ubuntu1 0
          500 http://us.archive.ubuntu.com/ubuntu/ vivid/main amd64 Packages
          100 /var/lib/dpkg/status

  3) What is expected to happen: searching the attached document for "über" finds it:
  https://bugs.launchpad.net/ubuntu/+source/poppler/+bug/116453/+attachment/102979/+files/example.pdf

  4) What happens instead: the search does not return any matches.

  WORKAROUND: Use the built-in PDF viewer and its search in chromium-browser or chrome (doesn't work in Firefox).
  apt-cache policy chromium-browser
  chromium-browser:
    Installed: 39.0.2171.65-0ubuntu0.14.04.1.1064
    Candidate: 39.0.2171.65-0ubuntu0.14.04.1.1064
    Version table:
   *** 39.0.2171.65-0ubuntu0.14.04.1.1064 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/universe amd64 Packages
          500 http://security.ubuntu.com/ubuntu/ trusty-security/universe amd64 Packages
          100 /var/lib/dpkg/status
       34.0.1847.116-0ubuntu2 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

  apt-cache policy google-chrome-stable:i386
  google-chrome-stable:i386:
    Installed: 39.0.2171.95-1
    Candidate: 39.0.2171.95-1
    Version table:
   *** 39.0.2171.95-1 0
          500 http://dl.google.com/linux/chrome/deb/ stable/main i386 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  Architecture: i386
  Date: Wed May 23 18:22:27 2007
  DistroRelease: Ubuntu 7.04
  ExecutablePath: /usr/bin/evince
  Package: evince 0.8.1-0ubuntu1
  PackageArchitecture: i386
  ProcEnviron:
   LANGUAGE=en_US:en
   PATH=~/local/bin:~/local/lib:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11:/usr/games
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: evince
  Uname: Linux copper 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux