Thanks Erick!
After re-checking the configuration files, it turns out someone had
modified /solr/conf/*stopwords.txt* on the production server, so we now
know which problem we're dealing with. It seems to be related to:
- http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html#a493488
- http://stackoverflow.com/questions/3635096/dismax-feat-stopwords-synonyms-etc
I then tried to work around the issue by changing
<str name="mm">2<-35%</str> to <str name="mm">1</str> in *solrconfig.xml*,
as suggested at http://drupal.org/node/1102646#comment-4249774. That does
return results for the offending queries, but it adds far too much
*noise*...
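To make the trade-off concrete, here is a minimal Python sketch of how an mm spec turns the number of optional query clauses into the number that must match. It is based on my reading of the dismax mm documentation and covers plain integers, percentages rounded down, and space-separated "N<spec" conditions. `min_should_match` is a hypothetical helper, not a Solr API, and it does not model every edge case of Solr's own parser.

```python
import re


def min_should_match(clauses: int, spec: str) -> int:
    """Sketch of dismax 'mm' semantics: how many optional clauses must match.

    Supports plain integers ("2", "-1"), percentages ("75%", "-35%") and
    space-separated conditional specs ("2<-35%", "3<-1 5<80%").
    """
    # Normalise "2 < -35%" to "2<-35%" so that splitting on whitespace works.
    spec = re.sub(r"\s*<\s*", "<", spec.strip())
    if "<" in spec:
        result = clauses  # at or below the first bound, every clause is required
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clauses <= int(bound):
                return result
            result = min_should_match(clauses, sub_spec)
        return result
    if spec.endswith("%"):
        pct = int(spec[:-1])
        amount = clauses * abs(pct) // 100  # percentage of clauses, rounded down
        # A negative percentage says how many clauses may be missing.
        result = clauses - amount if pct < 0 else amount
    else:
        n = int(spec)
        result = clauses + n if n < 0 else n  # negative = how many may be missing
    # Never require more clauses than exist, nor fewer than zero.
    return max(0, min(clauses, result))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "terms:", min_should_match(n, "2<-35%"), "vs", min_should_match(n, "1"))
```

With 2<-35%, a four-term query needs three matching clauses, while mm=1 accepts any single match, which would explain the noise. My understanding of the linked threads is that when one of the qf fields doesn't strip a stopword, that stopword still produces a clause counting toward this total, so a strict mm can exclude documents that only match in the stopword-filtered fields. Treat that as an assumption to confirm with debugQuery.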
So I tried to make sure all my field types were using our
StopFilterFactory (even <fieldType name="string" class="solr.StrField"
sortMissingLast="true" omitNorms="true">), with no luck.
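Since the dismax issue seems to hinge on whether the searched fields (the qf list) all strip the same stopwords, here is a small Python sketch that reads a schema.xml string and lists the field types where at least one analyzer (index or query) has no StopFilterFactory. `types_without_stopfilter` is a hypothetical helper. It only checks that the filter is present, not which words file it points to. Types with no analyzer at all, such as solr.StrField, are always reported: as far as I know StrField indexes the whole value as a single untokenized term, so a stop filter has nothing to act on there.

```python
import xml.etree.ElementTree as ET


def has_stop_filter(analyzer: ET.Element) -> bool:
    """True if any <filter> inside this analyzer is a StopFilterFactory."""
    return any(
        f.get("class", "").endswith("StopFilterFactory")
        for f in analyzer.iter("filter")
    )


def types_without_stopfilter(schema_xml: str) -> list[str]:
    """Return names of field types where some analyzer lacks a stop filter.

    Types with no <analyzer> at all (e.g. solr.StrField) are included too.
    """
    root = ET.fromstring(schema_xml)
    missing = []
    for elem in root.iter():
        # Older schema.xml files may spell the element <fieldtype>.
        if elem.tag not in ("fieldType", "fieldtype"):
            continue
        analyzers = elem.findall("analyzer")
        if not analyzers or not all(has_stop_filter(a) for a in analyzers):
            missing.append(elem.get("name"))
    return missing
```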
I'll keep looking for clues; meanwhile, if there's a known workaround for
this issue, I'd be really grateful to hear about it :)
Cheers!
Paul
On 15/05/2011 16:48, Erick Erickson wrote:
What happens if you copy the index from one machine to the other (probably
from prod to test)? If your results stay the same, that would eliminate
index differences as the culprit.
What do you get by appending &debugQuery=on to the queries that differ?
Is the parsed query any different? I'm wondering whether you somehow
have a difference in the configuration, perhaps in dismax. Anyway, if the
parsed queries are identical, that eliminates that possibility.
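A rough Python sketch of that check: build the same request against both servers with debugQuery=on and wt=json, then compare the debug.parsedquery values. The host names are placeholders, and any extra parameters (for example qt, if the dismax handler is selected that way) are passed through unchanged. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def debug_url(select_url: str, q: str, **extra: str) -> str:
    """Build a select URL asking for debug output in JSON."""
    params = {"q": q, "debugQuery": "on", "wt": "json", **extra}
    return select_url + "?" + urlencode(params)


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parsed_query(response: dict) -> str:
    """Extract the parsed query from a debugQuery=on response ('' if absent)."""
    return response.get("debug", {}).get("parsedquery", "")


def same_parsed_query(dev: dict, prod: dict) -> bool:
    """True if both servers parsed the query identically."""
    return parsed_query(dev) == parsed_query(prod)
```

For example, `parsed_query(fetch_json(debug_url("http://dev.example:8983/solr/select", "your query")))`, then the same call against the production URL.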
Next, what about synonym files? Stopwords? Are you absolutely sure they're
identical?
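To be absolutely sure, a quick Python sketch (hypothetical helpers, standard library only) can checksum every file in each server's conf directory and list the ones that differ. This catches silent edits to stopwords.txt or synonyms.txt that are easy to miss by eye. Run `conf_digests` on each machine, bring the two dicts together (for instance as JSON), then call `differing_files`.

```python
import hashlib
from pathlib import Path


def conf_digests(conf_dir: str) -> dict[str, str]:
    """Map each file name in a Solr conf directory to its SHA-256 digest."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(conf_dir).iterdir())
        if p.is_file()
    }


def differing_files(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Names whose content differs, or that exist on only one side."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```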
If you're using dismax, is it possible that the mm (minimum should match)
is different?
Perhaps this is all stuff you've done already, but this would at least narrow
down where the problem might lie...
Best
Erick
On Wed, May 11, 2011 at 12:10 PM, Paul Michalet <p...@pix-l.fr> wrote:
Thanks for the hint :)
We ruled that out after testing special characters; besides, if it were an
application bug, it wouldn't work as consistently as it currently does for
the majority of queries.
The only difference we noticed was in the HTTP headers of the SOLR response:
occasionally, the "Content-Length" header is present, but I've been told it is
probably not what's causing our bug:
=> dev:
headers = Array
(
[0] => HTTP/1.1 200 OK
[1] => Last-Modified: Fri, 29 Apr 2011 13:36:21 GMT
[2] => ETag: "MTFjZjU2MTgxNDgwMDAwMFNvbHI="
[3] => Content-Type: text/plain; charset=utf-8
[4] => Server: Jetty(6.1.3)
)
=> production:
headers = Array
(
[0] => HTTP/1.1 200 OK
[1] => Last-Modified: Fri, 06 May 2011 14:18:36 GMT
[2] => ETag: "OGI3ZWYyZDUxNDgwMDAwMFNvbHI="
[3] => Content-Type: text/plain; charset=utf-8
[4] => Content-Length: 2558
[5] => Server: Jetty(6.1.3)
)
Paul Michalet
On 11/05/2011 17:47, Paul Libbrecht wrote:
Could it be something in the transmission of the query?
Or is it also identical?
paul
On 11 May 2011 at 17:19, Paul Michalet wrote:
Hello everyone
We have successfully installed SOLR on 2 servers (development and
production), using the same configuration files and paths.
Both SOLR instances have indexed the same contents and most queries give
identical results, but there are a few exceptions where the production
instance returns 0 results (the development instance returns perfectly
valid results for the same query).
We checked the logs in both environments without finding anything
suspicious (the queries are strictly identical, and the index is built in
the exact same way), and we've run out of ideas about where to look to
debug these cases.
Our development server runs Debian and the production server runs CentOS;
the SOLR version installed in both environments is 1.4.0.
The weird thing is that the few queries failing in the production
instance contain very common terms (without quotes) which, when queried
individually, return valid results...
Any pointers would be greatly appreciated;
thanks in advance!
Paul