In our case, we had specific match information that we needed to return, so I can't really contribute this to the code base, but we did get this working. Basically, we have a custom request handler. After it receives the search results, it sends them to our matcher algorithm, which goes through each document in the doc list. Based on the field type we are looking at, we send our input data through the correct analyzer to produce a TokenStream. Then, for each document, we also send each value in the field (for multivalued fields) through that field's analyzer to produce another TokenStream. Each TokenStream is loaded into a multi-valued HashMap with the starting position as the key, and we then step through each position to find matches. We also use some other hash lists to make this more efficient, so that the same data is only analyzed once.
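
For readers who want something concrete, here is a minimal sketch of the position-map idea described above. It is not the actual handler code and it does not use the Lucene API: `analyze` is a hypothetical stand-in for a field's analyzer (lowercase plus whitespace split), and `position_map` and `matching_values` are illustrative names. The matching rule (a value matches if any of its terms equals an input term) is also an assumption, since the real matcher was application-specific.

```python
from collections import defaultdict


def analyze(text):
    """Hypothetical stand-in for a field analyzer: returns (position, term) pairs."""
    return list(enumerate(text.lower().split()))


def position_map(tokens):
    """Build a multi-valued map: position -> list of terms at that position."""
    by_pos = defaultdict(list)
    for pos, term in tokens:
        by_pos[pos].append(term)
    return by_pos


def matching_values(input_text, field_values, cache=None):
    """Return (value_index, matched_positions) for each field value that matches.

    The cache ensures each distinct string is analyzed only once.
    """
    cache = {} if cache is None else cache

    def analyzed(text):
        if text not in cache:
            cache[text] = position_map(analyze(text))
        return cache[text]

    # Collect every term produced by analyzing the input data.
    input_terms = {t for terms in analyzed(input_text).values() for t in terms}

    results = []
    for index, value in enumerate(field_values):
        by_pos = analyzed(value)
        # Step through each position of the value and record the ones that match.
        hits = [pos for pos in sorted(by_pos)
                if any(term in input_terms for term in by_pos[pos])]
        if hits:
            results.append((index, hits))
    return results
```

Called with the values of one multivalued field, this reports which values matched and at which token positions, e.g. `matching_values("wings", ["Example Supplement Title", "Annual Wings Report"])` returns `[(1, [1])]`. A real implementation would get positions and terms from the field's own analyzer rather than from a whitespace split.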
In our case, we were just looking for a score of how similar the index and input data were, as well as some other information that was specific to our application. So it is not necessarily how Solr/Lucene determined a match, but it provided what we needed for our case. In fact, we did not want an exact account of how the search results were produced. We then return the results as a NamedList, similar to how the highlighter or debug component works. One warning: this is a very doable problem, but it is definitely not trivial to implement, depending on your specific requirements.

________________________________
From: Jon Baer <jonb...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Sat, May 15, 2010 8:56:57 AM
Subject: Re: How to tell which field matched?

Sorry, my response wasn't a suggestion to actually use debugQuery=on in production; I was more wondering if it (the component) gave you the insight data you were looking for. On a side note, I'm also interested in this type of component, because there are a number of projects I have worked on recently where it seems people outside of tuning the index want to know "why did my query match these results?" in some sort of ~plain English explanation~. I have the feeling what you want is possible; it's just not finding its way into the result set yet (guess), or it needs a plugin.

- Jon

On May 15, 2010, at 11:16 AM, Tim Garton wrote:

> Additionally, I don't think this gets us what we want with multiValued
> fields. It tells if a multiValued field matched, but not which value
> out of the multiple values matched. I am beginning to suspect that
> this information can't be returned and we may have to restructure our
> schema.
>
> -Tim
>
> On Sat, May 15, 2010 at 7:12 AM, Sascha Szott <sz...@zib.de> wrote:
>> Hi,
>>
>> I'm not sure if debugQuery=on is a feasible solution in a productive
>> environment, as generating such extra information requires a reasonable
>> amount of computation.
>>
>> -Sascha
>>
>> Jon Baer wrote:
>>>
>>> Does the standard debug component (?debugQuery=on) give you what you need?
>>>
>>>
>>> http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22
>>>
>>> - Jon
>>>
>>> On May 14, 2010, at 4:03 PM, Tim Garton wrote:
>>>
>>>> All,
>>>> I've searched around for help with something we are trying to do
>>>> and haven't come across much. We are running solr 1.4. Here is a
>>>> summary of the issue we are facing:
>>>>
>>>> A simplified example of our schema is something like this:
>>>>
>>>> <field name="id" type="string" indexed="true" stored="true"
>>>> required="true" />
>>>> <field name="title" type="text" indexed="true" stored="true"
>>>> required="true" />
>>>> <field name="date_posted" type="tdate" indexed="true" stored="true" />
>>>> <field name="supplement_title" type="text" indexed="true"
>>>> stored="true" multiValued="true" />
>>>> <field name="supplement_pdf_url" type="text" indexed="true"
>>>> stored="true" multiValued="true" />
>>>> <field name="supplement_pdf_text" type="text" indexed="true"
>>>> stored="true" multiValued="true" />
>>>>
>>>> When someone does a search we search across the title,
>>>> supplement_title, and supplement_pdf_text fields. When we get our
>>>> results, we would like to be able to tell which field the search
>>>> matched and if it's a multiValued field, which of the multiple values
>>>> matched. This is so that we can display results similar to:
>>>>
>>>> Example Title
>>>>     Example Supplement Title
>>>>     Example Supplement Title 2 (your search matched this document)
>>>>     Example Supplement Title 3
>>>>
>>>> Example Title 2
>>>>     Example Supplement Title 4
>>>>     Example Supplement Title 5
>>>>     Example Supplement Title 6 (your search matched this document)
>>>>
>>>> etc.
>>>>
>>>> How would you recommend doing this? Is there some way to get solr to
>>>> tell us which field matched, including multiValued fields? As a
>>>> workaround we have been using highlighting to tell which field
>>>> matched, but it doesn't get us what we want for multiValued fields and
>>>> there is a significant cost to enabling the highlighting. Should we
>>>> design our schema in some other fashion to achieve these results?
>>>> Thanks.
>>>>
>>>> -Tim
>>>
>>
>>