Top 10 results with faceted queries

2007-02-21 Thread Andreas Hochsteger

Hi,

Is it possible to do a faceted search and additionally return the
first 10 results for each facet (or category) in one query?

I have the requirement to produce a page like this:

Category 1:
- Link 1
- Link 2
- ...
- Link 10

Category 2:
- Link 1
- Link 2
- ...
- Link 10

...

Category n:
- Link 1
- Link 2
- ...
- Link 10

The naive approach would be to perform a separate Solr query for each
category, take the top 10 from each, and aggregate the results.
This works, but it's really slow, since there may be up to 40
categories on one page.

Any help is highly appreciated!

Thanks,
Andreas
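
A minimal Python sketch of the per-category approach described above, for
illustration only. It builds one select request per category using Solr's
`q`, `fq`, and `rows` parameters and runs the requests concurrently rather
than one after another. That may hide some latency, but it does not reduce
the number of queries. The base URL, the `category` field name, and the
`fetch` callable are assumptions, not something taken from the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def category_query_urls(base_url, query, categories, field="category", rows=10):
    """Return {category: select URL}, each URL limited to `rows` hits."""
    urls = {}
    for cat in categories:
        params = [
            ("q", query),
            ("fq", f'{field}:"{cat}"'),  # filter query restricting hits to one category
            ("rows", rows),
        ]
        urls[cat] = f"{base_url}/select?{urlencode(params)}"
    return urls


def top_hits_per_category(urls, fetch, max_workers=8):
    """Call fetch(url) for every category concurrently; return {category: result}.

    `fetch` is a hypothetical callable supplied by the caller, for example a
    function that performs the HTTP GET and parses the response.
    """
    cats = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, (urls[c] for c in cats)))
    return dict(zip(cats, results))
```

With 40 categories this still issues 40 requests, so it is a way to shorten
the wall-clock time of the naive approach, not a replacement for it.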


Re: Top 10 results with faceted queries

2007-02-21 Thread Antonio Eggberg


Andreas Hochsteger <[EMAIL PROTECTED]> wrote:
> The naive approach would be to perform a separate Solr query for each
> category, take the top 10 from each, and aggregate the results.
> This works, but it's really slow, since there may be up to 40
> categories on one page.

I have the same question. As far as I can tell from reading the list,
you have to run "pre-computed queries" in advance (I am guessing some sort
of cron job that saves the results in a text file) because of load
constraints. There is some discussion on the list, but I don't know the
exact procedure.

If you do find out, I would appreciate an answer.

Regards





Re[4]: solr performance

2007-02-21 Thread Jack L
Thanks to all who replied.

> my number 1000 was per minute, not second!

I can't read! :-p

> couple of times today at around 158 documents / sec.

This is not bad at all. How about search performance?
How many concurrent queries have people been having?
What does the response time look like?

> Thanks to the others that clarified.  I run my indexers in
> parallel... but a single instance of Solr (which in turn handles  
> requests in parallel as well).

Do you feel that multi-threaded posting is helpful?
I suppose when Solr does indexing, it's bound more
by the Solr indexer than by the poster?

Jack







__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


Starting an index...

2007-02-21 Thread Jack L

I have played with the "example" directory for a while.
Everything seems to work well. Now I'd like to start my own
index and I have a few questions.

1. I suppose I can start by copying the whole example
directory and naming it myindex. I understand that I need
to modify solr/conf/schema.xml to suit my data. Besides
that, is there anything else that I must/should change?
I'll take a look at stopwords.txt, etc. to see if any
changes are required. How about solr.war? Anything else I
need to customize? (I'm not a heavy Java developer.)

2. For each index, do I need to copy this directory and start
a solr instance? Is it possible to run one solr instance
for multiple indices?

3. Solr comes with Jetty and it seems to work pretty well.
Is there any reason that I should switch to Tomcat for
production servers?

-- 
Thanks,
Jack



internal field max length?

2007-02-21 Thread Brian Whitman
I am sending Solr stored fields of sizes in the 10-50K range. My
maxFieldLength is 5, and the field in question is a
solr.TextField. I am finding that fields that have more than a few K
of text come back "clipped": if I try to index the field with 40K of
text, the search result will show only the *last* 5-10K or so; the
beginning is missing.


Is there somewhere else I should look for a field trim other than  
maxFieldLength?







Re: Re[4]: solr performance

2007-02-21 Thread Mike Klaas

On 2/21/07, Jack L <[EMAIL PROTECTED]> wrote:

> > Thanks to the others that clarified.  I run my indexers in
> > parallel... but a single instance of Solr (which in turn handles
> > requests in parallel as well).
>
> Do you feel that multi-threaded posting is helpful?
> I suppose when Solr does indexing, it's bound more
> by the Solr indexer than by the poster?


It certainly is bound more by Solr than by the poster, but I've found
multithreading beneficial as it removes whatever latency factors might
exist: HTTP connections, XML parsing, I/O, the poster, etc.  For us,
concurrent analysis was less of a gain, but then again our analysis is
relatively light.

-Mike
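
As a rough illustration of the multi-threaded posting discussed above, the
following Python sketch splits the documents into batches and hands them to
a small thread pool. The `post_batch` callable is hypothetical (for example,
a function that sends one HTTP POST to Solr's update handler), and the batch
size and worker count are arbitrary assumptions rather than recommendations
from the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Split a list of documents into consecutive batches of at most `size`."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def post_in_parallel(docs, post_batch, batch_size=100, workers=4):
    """Call post_batch(batch) from several threads; return the number of docs sent."""

    def send(batch):
        post_batch(batch)  # hypothetical: one request per batch
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send, batches(docs, batch_size)))
```

Overlapping the requests this way targets the latency factors Mike lists;
whether it helps in practice depends on where the bottleneck actually is.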


Re: internal field max length?

2007-02-21 Thread Yonik Seeley

On 2/21/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

> I am sending Solr stored fields of sizes in the 10-50K range. My
> maxFieldLength is 5, and the field in question is a
> solr.TextField. I am finding that fields that have more than a few K
> of text come back "clipped:" if I try to index the field with 40K of
> text, the search result will show only the *last* 5-10K or so, the
> beginning is missing.
>
> Is there somewhere else I should look for a field trim other than
> maxFieldLength?


Ouch... sounds serious (assuming you aren't talking about highlighting).
Could you open a JIRA issue and describe or attach a test that can reproduce it?
I'll try to reproduce this myself in the meantime.

-Yonik


Re: internal field max length?

2007-02-21 Thread Brian Whitman
> Ouch... sounds serious (assuming you aren't talking about highlighting).
> Could you open a JIRA issue and describe or attach a test that can
> reproduce it?
> I'll try to reproduce this myself in the meantime.



Not highlighting, no. I'll try to make a test case. I am using the  
SOLR-20 client to post the data, so there's still a chance that's the  
culprit. I will try with straight HTTP.


-Brian



Re: internal field max length?

2007-02-21 Thread Yonik Seeley

On 2/21/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

> > Ouch... sounds serious (assuming you aren't talking about
> > highlighting).
> > Could you open a JIRA issue and describe or attach a test that can
> > reproduce it?
> > I'll try to reproduce this myself in the meantime.


So far so good for me.
I started with example/exampledocs/solr.xml and added an additional
field value for "features" of size 500K.
It starts with "this is the first line", then repeats the ASL over and
over, then ends with "this is the last line".

I posted via post.sh (curl), and then retrieved by searching for the
id "solr", and observed the complete field returned.



> Not highlighting, no. I'll try to make a test case. I am using the
> SOLR-20 client to post the data, so there's still a chance that's the
> culprit. I will try with straight HTTP.


please do... that might be it.

-Yonik
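
A test document along the lines Yonik describes could be generated with a
sketch like the one below. It is an assumption-laden illustration, not the
file he actually used: the filler text stands in for the license text, and
the helper names are hypothetical. The `<add><doc><field name="...">`
layout follows Solr's XML update format, and the markers at both ends make
it easy to see whether a returned field was clipped.

```python
from xml.sax.saxutils import escape

FIRST = "this is the first line\n"
LAST = "this is the last line"


def build_large_field_doc(doc_id, filler, target_size=500_000):
    """Return (field_value, add_xml) with a 'features' value of at least target_size chars."""
    repeats = max(1, (target_size - len(FIRST) - len(LAST)) // len(filler) + 1)
    value = FIRST + filler * repeats + LAST
    xml = (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="features">{escape(value)}</field>'  # escape &, <, > in the text
        "</doc></add>"
    )
    return value, xml


def looks_complete(returned_value):
    """True if both the first-line and last-line markers survived the round trip."""
    return returned_value.startswith(FIRST.strip()) and returned_value.endswith(LAST)
```

Posting the generated XML and comparing the stored field against `value`
would show whether the beginning or the end of the text goes missing.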


Re: internal field max length?

2007-02-21 Thread Brian Whitman

On Feb 21, 2007, at 5:10 PM, Yonik Seeley wrote:


> So far so good for me.
> I started with example/exampledocs/solr.xml and added an additional
> field value for "features" of size 500K
> It starts with "this is the first line", then repeats the ASL over and
> over, then ends with "this is the last line".
>
> I posted via post.sh (curl), and then retrieved by searching for the
> id "solr", and observed the complete field returned.



I just did the same thing as you.. with the same results. It must be  
SOLR-20 or some brain dead thing I'm doing (I suspect the latter, but  
we'll see.)


-Brian






Re: Starting an index...

2007-02-21 Thread Chris Hostetter

: 1. I suppose I can start from copying the whole example
: directory and name it myindex. I understand that I need
: to modify the solr/conf/schema.xml to suit my data. Besides
: that, is there anything else that I must/should change?
: I'll take a look at the stopwords.txt, etc. to see if any
: changes are required. How about solr.war? Anything else I
: need to customize? (I'm not a heavy java developer.)

the only things you should need to worry about customizing are in the
solr/conf dir ... you should give a critical eye to all of those files
(there's some zany protwords.txt and synonyms.txt that only make
sense for the example data)

you shouldn't need to customize anything else, except the configuration
for your servlet container to get it to run solr at the URL you want, and
to get it to log things the way you want.

: 2. For each index, do I need to copy this directory and start
: a solr instance? Is it possible to run one solr instance
: for multiple indices?

no, each instance manages a single schema and a single data index -- but
that schema can allow for various different types of documents that don't
need to have anything in common.

: 3. solr comes with jetty and it seems to work pretty well.
: Is there any reason that I should switch to tomcat for
: production servers?

it is entirely personal preference ... the use of Jetty shouldn't be
considered an endorsement, it's just a free, pure java servlet container
that was the easiest to bundle into a self contained demo.


-Hoss



Re: Starting an index...

2007-02-21 Thread Erik Hatcher


On Feb 21, 2007, at 4:37 PM, Jack L wrote:

> 2. For each index, do I need to copy this directory and start
> a solr instance? Is it possible to run one solr instance
> for multiple indices?


Going a bit further than what Hoss mentioned... you can share a common
configuration among multiple Solr instances without copying the
directory by using system property substitutions recently added to Solr:


	


These substitutions work in both schema.xml and solrconfig.xml.

Erik
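
As a rough illustration of how a property substitution with a default might
behave, the Python sketch below resolves placeholders of the form
`${name:default}` in a line of configuration text. It is not Solr's
implementation. The `dataDir` element and the `solr.data.dir` property name
in the usage note are assumptions chosen for the example.

```python
import re

# Matches ${name} or ${name:default}
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute(text, properties):
    """Replace each placeholder with its property value, or its default if unset."""

    def resolve(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")

    return _PLACEHOLDER.sub(resolve, text)
```

For example, `substitute("<dataDir>${solr.data.dir:./solr/data}</dataDir>", {})`
falls back to `./solr/data`, while passing `{"solr.data.dir": "/var/idx"}`
points the same shared configuration at a different index directory.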


Re: internal field max length?

2007-02-21 Thread Ryan McKinley

Looks like it was actually an error with SOLR-133 not handling CDATA
properly.  I fixed it and updated the patch.

At least SOLR-20 isn't to blame!


On 2/21/07, Brian Whitman <[EMAIL PROTECTED]> wrote:

> On Feb 21, 2007, at 5:10 PM, Yonik Seeley wrote:
> >
> > So far so good for me.
> > I started with example/exampledocs/solr.xml and added an additional
> > field value for "features" of size 500K
> > It starts with "this is the first line", then repeats the ASL over and
> > over, then ends with "this is the last line".
> >
> > I posted via post.sh (curl), and then retrieved by searching for the
> > id "solr", and observed the complete field returned.
>
> I just did the same thing as you.. with the same results. It must be
> SOLR-20 or some brain dead thing I'm doing (I suspect the latter, but
> we'll see.)
>
> -Brian







Re: Re[4]: solr performance

2007-02-21 Thread Erik Hatcher


On Feb 21, 2007, at 4:25 PM, Jack L wrote:

> > couple of times today at around 158 documents / sec.
>
> This is not bad at all. How about search performance?
> How many concurrent queries have people been having?
> What does the response time look like?


I'm the only user :)   What I've done is a proof-of-concept for our  
library.  We have 3.7M records that I've indexed and faceted.  Search  
performance (in my unrealistic single user scenario) is blazing (50ms  
or so) for purely full-text queries.  For queries that return facets,  
the response times are actually quite good too (~900ms, or less  
depending on the request) - provided the filter cache is warmed and  
large enough.  This is running on my laptop (MacBook Pro, 2GB RAM,  
1.83GHz) - I'm sure on a beefier box it'll only get better.



> > Thanks to the others that clarified.  I run my indexers in
> > parallel... but a single instance of Solr (which in turn handles
> > requests in parallel as well).
>
> Do you feel that multi-threaded posting is helpful?


It depends.  If the data processing can be parallelized and your  
hardware supports it, it can certainly make a big difference... it  
did in my case.  Both CPUs were cooking during my parallel indexing  
runs.


Erik





Re[2]: Starting an index...

2007-02-21 Thread Jack L
Thanks Chris and Erik for the replies. Very helpful.

> no, each instance manages a single schema and a single data index -- but
> that schema can allow for various different types of documents that don't
> need to have anything in common.

Does this mean that as long as I have the schema for all doc
types (which essentially means a larger schema file) set up,
then I can just throw any doc types at it, provided
that there is no conflict among the field names? And
the fields are flat among different doc types?

Is there a way to specify the doc types other than having
it as one of the fields so that I can query against it to get
a specific type?

-- 
Best regards,
Jack





Re: Re[2]: Starting an index...

2007-02-21 Thread Erik Hatcher


On Feb 21, 2007, at 9:29 PM, Jack L wrote:


> Thanks Chris and Erik for the replies. Very helpful.
>
> > no, each instance manages a single schema and a single data index -- but
> > that schema can allow for various different types of documents that don't
> > need to have anything in common.
>
> Does this mean that as long as I have the schema for all doc
> types (which essentially means a larger schema file) set up,
> then I can just throw any doc types at it, provided
> that there is no conflict among the field names?


Wouldn't even matter if there were field name "conflicts".  A field  
by any other name is just a field.  All document types could have a  
"title" field, for example.



> And the fields are flat among different doc types?


I don't understand what you mean by flat here.  By definition, a  
document in Solr/Lucene is "flat" in that it has fields, but no  
hierarchy beyond that.



> Is there a way to specify the doc types other than having
> it as one of the fields so that I can query against it to get
> a specific type?


No, there isn't another way.  Solr doesn't impose any semantics on  
the _types of documents_ you index... it's up to the client to do  
that.  But adding a simple "type" field to every document facilitates  
some amazing stuff :)


Erik
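
A small Python sketch of the "type" field idea, under stated assumptions:
the sample documents, the field names other than `title`, and the helper
name are all made up for illustration. It shows two flat documents of
different kinds sharing one schema, and how a client might restrict a
search to one kind using Solr's `fq` filter-query parameter.

```python
from urllib.parse import urlencode

# Two flat documents of different kinds in the same index. Both have
# "title"; the remaining fields differ by type (hypothetical field names).
DOCS = [
    {"id": "b1", "type": "book", "title": "Lucene in Action", "author": "Hatcher"},
    {"id": "d1", "type": "dvd", "title": "Search Basics", "runtime_min": 90},
]


def type_filter_params(query, doc_type, type_field="type"):
    """Return the query string for a search restricted to one document type."""
    return urlencode([("q", query), ("fq", f"{type_field}:{doc_type}")])
```

For instance, `type_filter_params("title:lucene", "book")` yields
`q=title%3Alucene&fq=type%3Abook`, which can be appended to the select URL.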