RE: Google like searching

2006-07-06 Thread Andre Basse
Hi Hoss,

Thank you very much. Works great!

Another question, probably more index related.
When I do a search for "ageing", my query will also return documents
with the word "age" only. (not ageing) 
I could imagine that age == ageing but not ageing == age.

Please, how can I change that? 


Thanks,

Andre






Simple faceted browsing

2006-07-06 Thread Nick Snels

Hi,

is anybody currently working on the following item from the tasklist?

* Simple faceted browsing (grouping) support in the standard query handler
   * group by field (provide counts for each distinct value in that field)
   * group by (query1, query2, query3, query4, query5)

I intended to do it myself, but my knowledge of Java proved too
small. I had a look at DisMaxRequestHandler and StandardRequestHandler,
and even at Erik Hatcher's code, but couldn't figure out the flow. My
current solution is to send, in my case, 11 queries
(?indent=on&q=%2B#{query}+%2Bprovince:#{province}), each with a different
value for the province parameter. On my development machine it is fast
enough. It would be much easier if I could add a "group by" parameter and
get an XML file back with the counts, which should also be a bit faster
than sending 11 requests to get the count for each. It would be great to
hear from someone who has implemented this 'simple' grouping in Solr and
could maybe give me some pointers. Thanks.
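
For reference, the workaround described above could be sketched roughly as follows. This is only an illustration: the class name, the search term and the province values are hypothetical, and only the query-string pattern is taken from the message.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProvinceQueries {
    // Builds one query string per province, following the pattern
    // ?indent=on&q=%2B#{query}+%2Bprovince:#{province}
    static List<String> build(String query, List<String> provinces) {
        List<String> urls = new ArrayList<>();
        for (String province : provinces) {
            // Both clauses are required, hence the leading '+' on each.
            String q = "+" + query + " +province:" + province;
            try {
                urls.add("?indent=on&q=" + URLEncoder.encode(q, "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is guaranteed to be supported by every JVM.
                throw new IllegalStateException(e);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical search term and province values.
        for (String url : build("huis", Arrays.asList("Antwerpen", "Limburg"))) {
            System.out.println(url);
        }
    }
}
```

Note that URLEncoder also escapes the colon (as %3A), which decodes to the same query on the server side. Each entry still costs one round trip, which is what a server-side "group by" would avoid.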

Kind regards,

Nick Snels


Re: Simple faceted browsing

2006-07-06 Thread Erik Hatcher

Nick,

I wish I could help more directly, but, alas, my time is compressed  
and I can't shepherd facets into Solr's core just yet.  Since you're  
focused on a single field currently, here's how I recommend you  
proceed (unless someone goes the full distance on this and makes it  
more easily available):


  - Set up an environment where you can add some custom code to a  
Solr WAR file and deploy it.   At first just subclass  
StandardRequestHandler as a custom class and add in something simple  
like rsp.add("test", "test") and ensure you're getting this custom  
value back in the returned XML.


  - Hard-code those 11 queries (for now) as TermQuery objects in an
array or something.


  - Then loop over all those TermQuery objects and do this:

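SolrIndexSearcher searcher = req.getSearcher();
// intersectionSize() takes a DocSet, so resolve the original query once
DocSet originalDocSet = searcher.getDocSet(originalQuery);

for (TermQuery termQuery : termQueries) {
  // all documents that carry this field value
  DocSet valueDocSet = searcher.getDocSet(termQuery);
  // how many of them also match the original query
  int count = valueDocSet.intersectionSize(originalDocSet);
  rsp.add(termQuery.toString(), count);
}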

This is all off the top of my head with some glancing at the custom
faceted request handler I created for Collex, so maybe I've
overlooked something?  But overall it's pretty straightforward to get
counts per field value.  The question is, where do those field values
come from?   This is why I suggested you hard-code those 11 queries
for now, and then when that is working you can ramp up and get those
field values dynamically from the index (which is what my code does,
but I'm still fiddling to find the best way to cache those values or
read them from the index dynamically myself).  The above provides the
counts.  To group actual documents by a field, you could use
intersection() rather than just intersectionSize().
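
The counting idea can also be tried outside Solr. The following is a minimal sketch that uses java.util.BitSet as a stand-in for Solr's DocSet; the class name, document ids and facet labels are made up for illustration and nothing here is Solr API.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCountSketch {
    // For each facet value, counts the documents that match both the base
    // query and that value, i.e. the size of the set intersection.
    static Map<String, Integer> counts(BitSet baseDocs, Map<String, BitSet> docsByValue) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : docsByValue.entrySet()) {
            // Clone first so the per-value set is not modified by and().
            BitSet both = (BitSet) e.getValue().clone();
            both.and(baseDocs);
            result.put(e.getKey(), both.cardinality());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ids of documents matching the user's query.
        BitSet base = new BitSet();
        base.set(0); base.set(2); base.set(3); base.set(5);

        // Hypothetical ids of documents carrying each field value.
        BitSet a = new BitSet();
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet();
        b.set(3); b.set(4);

        Map<String, BitSet> byProvince = new LinkedHashMap<>();
        byProvince.put("province:a", a);
        byProvince.put("province:b", b);

        System.out.println(counts(base, byProvince)); // {province:a=2, province:b=1}
    }
}
```

Keeping the intersected set instead of only its cardinality gives the grouped documents themselves rather than just the counts.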


Erik



On Jul 6, 2006, at 7:39 AM, Nick Snels wrote:

is anybody currently working on the following item from the tasklist?

* Simple faceted browsing (grouping) support in the standard query handler
   * group by field (provide counts for each distinct value in that field)
   * group by (query1, query2, query3, query4, query5)

I intended to do it myself, but my knowledge of Java proved too
small. I had a look at DisMaxRequestHandler and StandardRequestHandler,
and even at Erik Hatcher's code, but couldn't figure out the flow. My
current solution is to send, in my case, 11 queries
(?indent=on&q=%2B#{query}+%2Bprovince:#{province}), each with a different
value for the province parameter. On my development machine it is fast
enough. It would be much easier if I could add a "group by" parameter and
get an XML file back with the counts, which should also be a bit faster
than sending 11 requests to get the count for each. It would be great to
hear from someone who has implemented this 'simple' grouping in Solr and
could maybe give me some pointers. Thanks.

Kind regards,

Nick Snels




Re: Google like searching

2006-07-06 Thread Yonik Seeley

On 7/6/06, Andre Basse <[EMAIL PROTECTED]> wrote:

Another question, probably more index related.
When I do a search for "ageing", my query will also return documents
with the word "age" only. (not ageing)
I could imagine that age == ageing but not ageing == age.

Please, how can I change that?


That's stemming at work... all the forms of age will be reduced to a
common root.  One form doesn't get preferential treatment over
another.

One thing you can do is use copyField to add a non-stemmed version of
your field and query across both, and maybe boost the query on the
non-stemmed version. This will help exact matches score higher.
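
A toy illustration of why this helps, written as plain Java rather than Solr configuration. The stem() method below is a deliberately crude stand-in for a real stemmer, and the scoring rule and boost value are assumptions made only to show the effect of querying a stemmed and a non-stemmed copy together.

```java
import java.util.Arrays;
import java.util.List;

class StemmedVsExact {
    // Crude stand-in for a stemmer: strips a trailing "ing", then a trailing "e".
    // Enough to reduce both "ageing" and "age" to the same root.
    static String stem(String word) {
        String w = word.toLowerCase();
        if (w.length() > 4 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);
        }
        if (w.length() > 2 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Scores a document: 1.0 if the stemmed copy matches, plus exactBoost
    // if the non-stemmed copy also contains the literal query term.
    static double score(String queryTerm, List<String> docTokens, double exactBoost) {
        String stemmedQuery = stem(queryTerm);
        boolean stemHit = false;
        boolean exactHit = false;
        for (String token : docTokens) {
            if (stem(token).equals(stemmedQuery)) {
                stemHit = true;
            }
            if (token.equalsIgnoreCase(queryTerm)) {
                exactHit = true;
            }
        }
        return (stemHit ? 1.0 : 0.0) + (exactHit ? exactBoost : 0.0);
    }

    public static void main(String[] args) {
        List<String> docWithAge = Arrays.asList("the", "age", "of", "steam");
        List<String> docWithAgeing = Arrays.asList("an", "ageing", "population");

        // Both documents match through the stemmed copy, but only the second
        // one earns the extra boost from the non-stemmed copy.
        System.out.println(score("ageing", docWithAge, 2.0));    // 1.0
        System.out.println(score("ageing", docWithAgeing, 2.0)); // 3.0
    }
}
```

Documents containing only "age" are still returned; they simply rank below the exact matches.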


-Yonik


RE: base64 support & containers

2006-07-06 Thread Chris Hostetter

: No - no advanced use of XML has been implemented.
: One of the fields in the add request would contain the original binary
: document encoded in base64, then this would preferably be decoded to
: binary and placed into a lucene binary field, which would need to be
: defined in Solr.

Ah! ... I think I'm understanding now: your goal is to be able to send
binary data to Solr in some way as a field value when adding/updating a
doc -- preferably by base64 encoding it -- and then get the data back in
the same way when fetching the doc as a result of a query, but instead of
just storing the base64 encoded data, you'd like Solr to utilize the
"binary" storage mechanism that Lucene supports, presumably because it
should take up less space than storing the base64 encoded value.

does that capture your goal fairly?

there's no way to do this with Solr out of the box ... but i think it
should be possible to write your own subclass of FieldType which does the
base64 decoding/encoding in the createField and write methods.  (no
existing subclasses override createField; they leverage it by implementing
toInternal, but that assumes you want to use the String constructor
of Field -- it doesn't mean you can't override it and use the byte[]
constructor instead)
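
as a rough sketch of the two conversions such a FieldType would need, here is a standalone example. it is not a Solr FieldType: the class and method names are hypothetical, and it assumes a JDK that ships java.util.Base64 (Java 8 or later).

```java
import java.util.Arrays;
import java.util.Base64;

class Base64FieldSketch {
    // External (XML) value -> raw bytes, the direction needed when a doc is
    // added and the value is to be stored in binary form.
    static byte[] toStoredBytes(String base64Value) {
        return Base64.getDecoder().decode(base64Value);
    }

    // Raw stored bytes -> external value, the direction needed when the doc
    // is written back out in a response.
    static String toExternal(byte[] stored) {
        return Base64.getEncoder().encodeToString(stored);
    }

    public static void main(String[] args) {
        // Hypothetical binary payload of 300 bytes.
        byte[] original = new byte[300];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }

        String external = toExternal(original);
        byte[] stored = toStoredBytes(external);

        System.out.println(external.length());               // 400 characters as base64
        System.out.println(stored.length);                   // 300 bytes in binary form
        System.out.println(Arrays.equals(original, stored)); // true
    }
}
```

the size difference is the point: base64 uses four characters for every three bytes, so storing the decoded bytes takes roughly a quarter less space.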

once you have your new FieldType, you can use it in your schema just like
any other built-in field type class...

  
  


...that *should* work, but by all means if you run into snags feel
free to send followup questions to the solr-dev list.



-Hoss