On Wed, Aug 01, 2018 at 01:41:24PM +0200, Laura Arjona Reina wrote:
> On Tue, 31 Jul 2018 21:40:18 +0800 Jonathan Wiltshire <j...@debian.org>
> wrote:
> > A number of search languages end up with no results for contextually
> > common search terms, for example "debian" or "buster".
> >
> > To reproduce:
> >  - use the search box for the term "buster" in English. There are a
> >    number of results including release information, news items and
> >    errata.
> >  - set the language to Vietnamese, Chinese or similar and search again
> >  - there are no results.
>
> I can reproduce that. However, searching in Vietnamese for "Debian" or
> "Buster" shows results.
>
> E.g. the search for "Buster" in Vietnamese produces this link as first
> match:
>
> https://www.debian.org/releases/index.vi.html
> 100% relevant, matching: buster
>
> Interestingly, it says "matching: buster" (smallcaps, but I searched for
> Buster)
>
> If I search for "buster" (with quotes), I also get the results.

The matching isn't case-sensitive (but capitalising a word in the query
suppresses stemming, as does putting it in quotes).

> The relevant code in the Debian website about this bug is in the file
> webwml/english/search.xml.in, that I think it just sends the search term
> to the search engine (which is in search.debian.org):
>
> <: my $ext = lc('$(CUR_ISO_LANG)'); $ext =~ s/-/_/;
> print
> 'template="https://search.debian.org/cgi-bin/omega?P={searchTerms}&HITSPERPAGE={count?}&DB='.$ext.'[CN:-cn:][TW:-tw:][HK:-hk:]"/>';
> :>

It looks to me like the problem is that there's no explicit stemmer
mapping for zh-cn, so the English stemmer gets used at query time, but
that stemmer wasn't used at index time. Those mappings are in:

/srv/search.debian.org/xapian/templates/inc/stemmer

(At least on wolkenstein - the host key for search.d.o doesn't seem to
match for me so I didn't look there yet - probably I need to update my
debian hosts list. The setup is that search.debian.org is
cgi-grnet-01.debian.org, but the indexing actually happens on
wolkenstein.debian.org and the databases are replicated to the front-end
machine.)

I'm not sure how the stemmer mapping file is generated, but I'll look
into it today if I can. I think we should be able to just specify a
default of "none", but I suspect this file is generated, so I need to
fix the script, not just the current output.

> I couldn't find a canonical repository or pseudopackage related to
> search.debian.org. For what I've search, it is a "a slightly patched
> xapian-omega instance". I've logged in the machine and the code there
> has two remote repositories. I'm CC'ing Raphael Geissert (shown as
> contact for comments in the search result pages) and Olly Betts (shown
> as the author of the last commits in the repo that is currently deployed
> in search.debian.org). I hope they can help or tell us how to proceed.

Thanks for looping me in.
I think that "slightly patched" is out of date and we've been using the
standard xapian-omega package for some time now.

Cheers,
    Olly
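
To illustrate the suspected mismatch, here is a toy Python model. It is
only a sketch, not Omega's or Xapian's code: LANG_STEMMERS,
toy_english_stem, index_document and parse_query are hypothetical names,
the real mapping file's format isn't shown in this thread, and the "Z"
prefix on stemmed terms is my understanding of the Xapian convention.
It shows why a lowercase, unquoted "buster" misses in a database that
was indexed without stemming, while "Buster" or a quoted "buster" hits,
and why a default of "none" would avoid the miss.

```python
# Toy model only: not Omega's or Xapian's actual code.

# Hypothetical language -> stemmer table. Note there is no entry for
# "zh-cn", so lookups fall back to the default.
LANG_STEMMERS = {"en": "english"}


def toy_english_stem(word):
    """Crude stand-in for a real stemmer: strip a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def index_document(text, stemmer):
    """Return the set of terms indexed for one document."""
    terms = set()
    for word in text.lower().split():
        terms.add(word)  # unstemmed form is always indexed
        if stemmer == "english":
            # Stemmed forms get a "Z" prefix so they stay distinct.
            terms.add("Z" + toy_english_stem(word))
    return terms


def parse_query(raw, lang, default="english"):
    """Turn a one-word query into the term that will be looked up."""
    stemmer = LANG_STEMMERS.get(lang, default)
    quoted = len(raw) >= 2 and raw.startswith('"') and raw.endswith('"')
    word = raw.strip('"')
    capitalised = word[:1].isupper()
    # Quoting or capitalising suppresses stemming; matching itself is
    # case-insensitive, so the term is lowercased either way.
    if stemmer == "none" or quoted or capitalised:
        return word.lower()
    return "Z" + toy_english_stem(word.lower())


# A database indexed without any stemmer, as suspected for zh-cn.
index = index_document("Debian buster release notes", stemmer="none")

for raw in ("buster", "Buster", '"buster"'):
    term = parse_query(raw, "zh-cn")
    print(raw, "->", term, "hit" if term in index else "miss")

# With a default of "none", the plain lowercase query matches too.
fixed = parse_query("buster", "zh-cn", default="none")
print("buster ->", fixed, "hit" if fixed in index else "miss")
```

In this model the first query is the only one that misses, which matches
what Jonathan and Laura reported.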