Thanks Doug, removing "query" definitely helped. I just switched to
Ivan's new patch (which definitely helped a lot - no SEVERE errors now
- thanks Ivan!) but I'm still struggling with faceting myself.
Basically, I can tell that faceting is happening after the collapse,
because the facet counts are definitely lower than they would be
otherwise. For example, in one search I get 196 results with no
collapsing and 120 results with collapsing, but the facet count is
119??? In other searches the difference is more drastic: in another
search, I get 61 results without collapsing, 61 with collapsing, but
the facet count is 39.
Looking at it for a while now, I think I can guess what the problem
might be...
The incorrect counts seem to only happen when the term in question
does not occur evenly across all duplicates of a document. That is,
multiple document records may exist for the same image (it's an image
search engine), but each document will have different terms in
different fields depending on the audience it's targeting. So, when
you collapse, the counts are lower than they should be: the facets are
counted only from the one document kept for each group, which may not
have the term, but when you actually execute a search with that
facet's term included in the query, *all* the documents after
collapsing will be ones that have that term.
Here's an illustration:
Collapse field is "link_id", facet field is "keyword":
Doc 1:
id: 123456,
link_id: 2,
keyword: Black, Printed, Dress
Doc 2:
id: 123457,
link_id: 2,
keyword: Black, Shoes, Patent
Doc 3:
id: 123458,
link_id: 2,
keyword: Red, Hat, Felt
Doc 4:
id: 123459,
link_id: 1,
keyword: Felt, Hat, Black
So, when you collapse, only two of these documents are in the result
set (123456, 123459), and only the keywords Black, Printed, Dress,
Felt, and Hat are counted. The facet count for Black is 2, the facet
count for Felt is 1. If you choose Black and add it to your query,
you get 2 results (great). However, if you add *Felt* to your query,
you get 2 results (because a different document for link_id 2 is
chosen in that query than is in the more general query from which the
facets are produced).
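To make the mismatch concrete, here is a minimal Python sketch of the illustration above. It is not Solr code: the collapse(), facet_counts(), and search() helpers are made up for this example, and it assumes collapsing simply keeps the first matching document for each link_id.

```python
from collections import Counter

# The four example documents from the illustration above.
DOCS = [
    {"id": 123456, "link_id": 2, "keyword": ["Black", "Printed", "Dress"]},
    {"id": 123457, "link_id": 2, "keyword": ["Black", "Shoes", "Patent"]},
    {"id": 123458, "link_id": 2, "keyword": ["Red", "Hat", "Felt"]},
    {"id": 123459, "link_id": 1, "keyword": ["Felt", "Hat", "Black"]},
]


def collapse(docs, field="link_id"):
    """Keep only the first document seen for each value of the collapse field.

    Assumption: "first one wins" stands in for whatever the patch really does.
    """
    kept = {}
    for doc in docs:
        kept.setdefault(doc[field], doc)
    return list(kept.values())


def facet_counts(docs, field="keyword"):
    """Count how many of the given documents carry each term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))
    return counts


def search(docs, term, field="keyword"):
    """Return the documents that contain the term (a stand-in for a query)."""
    return [doc for doc in docs if term in doc[field]]


collapsed = collapse(DOCS)
facets = facet_counts(collapsed)  # facets computed after the collapse

for term in ("Black", "Felt"):
    drill_down = collapse(search(DOCS, term))
    print(term, "facet count:", facets[term], "results:", len(drill_down))
```

Under these assumptions, Black shows a facet count of 2 with 2 results, while Felt shows a facet count of 1 with 2 results, which is the same mismatch described above.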
I think what needs to happen here is that all the terms for all the
documents that are collapsed together need to be included (just once)
with the document that gets counted for faceting. In this example,
when the document for link_id 2 is counted, it would need to appear to
the facet counter to have keywords Black, Printed, Dress, Shoes,
Patent, Red, Hat, and Felt, as opposed to just Black, Printed, and
Dress.
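The counting I have in mind could be sketched like this in Python. Again, this is only an illustration of the idea and not how the patch or Solr's facet component works; group_terms() and collapsed_facet_counts() are hypothetical names for this example.

```python
from collections import Counter, defaultdict

# The four example documents from the illustration above.
DOCS = [
    {"id": 123456, "link_id": 2, "keyword": ["Black", "Printed", "Dress"]},
    {"id": 123457, "link_id": 2, "keyword": ["Black", "Shoes", "Patent"]},
    {"id": 123458, "link_id": 2, "keyword": ["Red", "Hat", "Felt"]},
    {"id": 123459, "link_id": 1, "keyword": ["Felt", "Hat", "Black"]},
]


def group_terms(docs, collapse_field="link_id", facet_field="keyword"):
    """Union the facet terms of every document sharing a collapse value."""
    groups = defaultdict(set)
    for doc in docs:
        groups[doc[collapse_field]].update(doc[facet_field])
    return groups


def collapsed_facet_counts(docs, collapse_field="link_id", facet_field="keyword"):
    """Count each term once per collapse group instead of once per kept doc."""
    counts = Counter()
    for terms in group_terms(docs, collapse_field, facet_field).values():
        counts.update(terms)
    return counts


counts = collapsed_facet_counts(DOCS)
for term, count in sorted(counts.items()):
    print(term, count)
```

With the terms merged per group, Felt and Hat are each counted for both link_id 1 and link_id 2, so their counts become 2 and match the number of results you get after adding the term to the query.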
Unfortunately, not knowing Java at all really, I have absolutely no
idea how this change would be implemented... I mean I can tweak here
or there, but I think this is above my pay grade. I've looked at the
code affected by the patch and the code for faceting but I can't make
heads or tails of it.
I think I'll go post this over on the JIRA...
Any ideas?
--
Steve
On Dec 10, 2008, at 8:52 AM, Doug Steigerwald wrote:
The first output is from the query component. You might just need
to make the collapse component first and remove the query component
completely.
We perform geographic searching with localsolr first (if we need
to), and then try to collapse those results (if collapse=true). If
we don't have any results yet, that's the only time we use the
standard query component. I'm making sure we call
builder.setNeedDocSet(false), and then I modified the query component
to only execute when builder.isNeedDocSet() returns true.
In the field collapsing patch that I'm using, I've got code to
remove a previous 'response' from the builder.rsp so we don't have
duplicates.
Now, if I could get field collapsing to work properly with a docSet/
docList from localsolr and also have faceting work, I'd be golden.
Doug
On Dec 9, 2008, at 9:37 PM, Stephen Weiss wrote:
Hi Tracy,
Well, I managed to get it working (I think) but the weird thing is,
in the XML output it gives both recordsets (the filtered and
unfiltered - filtered second). In the JSON (the one I actually use
anyway, at least) I only get the filtered results (as expected).
In my core's solrconfig.xml, I added:
<searchComponent name="collapse"
class="org.apache.solr.handler.component.CollapseComponent" />
(I'm not sure if it's supposed to go anywhere in particular but for
me it's right before StandardRequestHandler)
and then within StandardRequestHandler:
<requestHandler name="standard" class="solr.StandardRequestHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
    <str>collapse</str>
  </arr>
</requestHandler>
Which is basically all the default values plus collapse. Not sure
if this was needed for prior versions, I don't see it in any patch
files (I just got a vague idea from looking at a comment from
someone else who said it wasn't working for them). It would kinda
be nice if someone working on the code might throw us a bone and
say explicitly what the right options to put in the config file are
(if there are even supposed to be any - for all I know, this is
just a bandaid over a larger problem). I know it's not done yet
though... just a pointer for this patch might be handy, it's really
a useful feature if it works (I was kinda shocked this wasn't part
of the standard distribution since it's something I had to do so
often with mysql, kinda lucky I guess that it only came up now).
Another issue I'm having now is the faceting doesn't seem to change
- even if I set the collapse.facet option to "after"... I should
really try "before" and see what happens.
Of course, I just realized the integrity of my collapse field is
not so great so I have to go back and redo the data :-)
Best of luck.
--
Steve
On Dec 9, 2008, at 7:49 PM, Tracy Flynn (SOLR) wrote:
Steve,
I need this too. As my previous posting said, I adapted the 1.2
field collapsing back at the beginning of the year, so I'm
somewhat familiar.
I'll try and get a look this weekend. It's the earliest I'm
likely to get spare cycles. I'll post any results.
Tracy
On Dec 9, 2008, at 4:18 PM, Stephen Weiss wrote:
Hi,
I'm trying to use field collapsing with our SOLR but I just can't
seem to get it to do anything.
I've downloaded a dist copy of solr 1.3 and applied Ivan de
Prado's patch - reading through the source code, the patch
definitely was applied successfully (all the changes are in the
right places, I've checked every single one).
I've run ant clean, ant compile, and ant dist to produce the war
file in the dist/ folder, and then put the war file in place and
restarted jetty. According to the logs, jetty is definitely
loading the right war file. If I expand the war file and grep
through the files, it would appear the collapsing code is there.
However, when I add any sort of collapse parameters (I've tried
any combination of collapse=true collapse.field=link_id
collapse.threshold=1 collapse.type=normal
collapse.info.doc=true), the result set is no different from
normal query, and there is no collapse data returned in the XML.
I'm not a Java developer, this is my first time using ant, period,
and I'm just following basic directions I found on Google.
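For reference, a request with the collapse parameters listed above could be put together roughly like this in Python. The base URL and the q value are assumptions for illustration (a local Solr on Jetty's usual port and a match-all query); only the collapse.* parameters come from what I actually tried.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance on the default example port and path.
BASE_URL = "http://localhost:8983/solr/select"

params = {
    "q": "*:*",  # placeholder query, not the real search
    "collapse": "true",
    "collapse.field": "link_id",
    "collapse.threshold": "1",
    "collapse.type": "normal",
    "collapse.info.doc": "true",
}

# urlencode percent-encodes the values and joins them with "&".
url = BASE_URL + "?" + urlencode(params)
print(url)
```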
Here is the output of the compilation process:
I really need this patch to work for a project... Can someone
please tell me what I'm missing to get this to work? I can't
really find any documentation beyond adding the collapse options
to the query string, so it's hard to tell - is there an option in
solrconfig.xml or in the core configuration that needs to be
set? Am I going about this entirely the wrong way?
Thanks for any advice, I appreciate it.
[ sorry if you get this twice, I accidentally sent first from the
wrong e-mail address and I don't think it went through ]
--
Steve