Re: Result based sorting for KWIC?

2008-03-17 Thread Chris Hostetter

: If you go to http://tkb.mydns.jp:8899/exist/rest/db/new/tkb.xq you will see
: what I currently have.  Just click search to search for the example, or maybe
: delete the last character so that you get more results (this is not released

I don't read Japanese, so I'm really not sure what I'm looking at.  I see 
the search term highlighted, but I don't know enough about the language to 
really understand what is special about the order of the results...

: yet, so don't be surprised if it breaks...). You will see the search term
: highlighted in the middle, context is available from the blue arrow to the
: right.  The display would be much more useful for the users, if this could be
: sorted on the characters following the hit (ignoring punctuation).  Another
: option would be to sort on the characters previous to the hit.  But in this
: case, the sorting has to be reversed, so that if I have:
:  ABCDFGHI
: the sort-key would be constructed as DCBA for this case.

That still doesn't really answer a fairly fundamental question I've been 
trying to understand: *why* would having the results in that order be much 
more useful for the users? 

What are you going to do if the input term appears more than once in a single document?

: would be very slow, so I am looking for other ways.   Erik also said that down
: the road there might be a sort function that could be called, which is what I
: would need here. 

Solr can sort your results on any indexed, single-valued field, but for 
something like this you'd need to write your own plugin to do the sorting.  
Note that your plugin would basically need to do the same thing you 
currently do on the client; the only real performance gain would be 
in reducing the amount of data sent over the wire.
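
For reference, the two sort keys described in the quoted message (characters following the hit, and reversed characters preceding the hit, both ignoring punctuation) could be sketched roughly as below. This is only an illustration of the client-side logic, not Solr code; the class and method names are made up, and it assumes "ignoring punctuation" means keeping only letters and digits, which also drops whitespace.

```java
class KwicSortKeys {
    // Keep only letters and digits; punctuation and whitespace are dropped.
    // (Assumption: this is what "ignoring punctuation" should mean here.)
    static String stripPunctuation(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(Character::isLetterOrDigit)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Key for sorting on the characters that follow the hit.
    static String rightKey(String rightContext) {
        return stripPunctuation(rightContext);
    }

    // Key for sorting on the characters that precede the hit:
    // the context is reversed so the character nearest the hit comes first.
    static String leftKey(String leftContext) {
        return new StringBuilder(stripPunctuation(leftContext)).reverse().toString();
    }
}
```

With a left context of ABCD, leftKey gives DCBA, matching the example in the quoted message. Sorting the hits is then an ordinary string sort on whichever key is chosen.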


-Hoss



Does solr support runtime index?

2008-03-17 Thread Bhavin Pandya
Hi,

I am familiar with Lucene but new to Solr.
I want to switch one of my products to Solr for:
1. run-time indexing (a record becomes searchable immediately after it is indexed)
2. faceted search
3. master/slave architecture

But I am not sure whether Solr supports run-time indexing in a master/slave 
architecture.

If I have configured one master and four slave servers, can I make one of the 
slave servers the master server for a second application?

Any thoughts, pointers in this direction please.

- Bhavin pandya


Re: Does solr support runtime index?

2008-03-17 Thread Grant Ingersoll


On Mar 17, 2008, at 3:20 AM, Bhavin Pandya wrote:

> Hi,
>
> I am familiar with Lucene but new to Solr.
> I want to switch one of my products to Solr for:
> 1. run-time indexing (a record becomes searchable immediately after it is indexed)

Immediately is a bit of a stretch, but say, within 1 minute or so,  
that is doable.

> 2. faceted search
> 3. master/slave architecture
>
> But I am not sure whether Solr supports run-time indexing in a master/slave  
> architecture.
>
> If I have configured one master and four slave servers, can I make  
> one of the slave servers the master server for a second application?

Do you mean with a separate index?  I suppose it is possible, but I  
wouldn't think it is recommended.  Typically, the worker nodes are  
there b/c you have such a high query volume that you need the  
support.  Making one of them a master means taking away, presumably,  
from query-time support.

> Any thoughts, pointers in this direction please.
>
> - Bhavin pandya


--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







Re: Does solr support runtime index?

2008-03-17 Thread Bhavin Pandya

Hi,

>> 1. run-time indexing (a record becomes searchable immediately after it is indexed)

> Immediately is a bit of a stretch, but say, within 1 minute or so,  that 
> is doable.

I think that is done with a cron job, which syncs a snapshot of the master 
index to all slaves.

Suppose we have four slave machines.
Isn't it expensive to sync the snapshot every minute?

>> If I have configured one master and four slave servers, can I make  one 
>> of the slave servers the master server for a second application?

> Do you mean with a separate index?  I suppose it is possible, but I 
> wouldn't think it is recommended.  Typically, the worker nodes are  there 
> b/c you have such a high query volume that you need the  support.  Making 
> one of them a master means taking away, presumably,  from query-time 
> support.

That means it is recommended to use one master server for two applications, 
but I should not share the slaves with the indexer.


- Bhavin pandya


sorting on aggregate averages

2008-03-17 Thread Umar Shah
Hi,
I have a problem returning a list of results sorted on an
average of ranks computed over aggregates.
The query would be something like:
q=product:p1+product:p2+product:p3; sort score desc
To explain: suppose I have documents with fields Product, Manufacturer, Rank,
and I want to return the top manufacturers across products p1, p2, p3 with the
highest average rank on these products.

One way is to store the search results and then group them, compute
the average, and sort the result. Can it be done from Lucene/Solr itself? If
so, how?
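
The client-side approach mentioned above (collect the results, group by manufacturer, average the rank, sort) might look roughly like this. It is a sketch only: the Doc class and its field names are hypothetical stand-ins for the Product, Manufacturer and Rank fields, and it assumes that a numerically higher average rank should sort first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ManufacturerRanking {
    // Hypothetical holder for one search result.
    static final class Doc {
        final String product;
        final String manufacturer;
        final double rank;

        Doc(String product, String manufacturer, double rank) {
            this.product = product;
            this.manufacturer = manufacturer;
            this.rank = rank;
        }
    }

    // Average the rank per manufacturer over the requested products,
    // then order manufacturers by that average, highest first.
    static List<Map.Entry<String, Double>> topManufacturers(List<Doc> docs, Set<String> products) {
        Map<String, Double> averages = docs.stream()
            .filter(d -> products.contains(d.product))
            .collect(Collectors.groupingBy(d -> d.manufacturer,
                                           Collectors.averagingDouble(d -> d.rank)));
        List<Map.Entry<String, Double>> ordered = new ArrayList<>(averages.entrySet());
        ordered.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ordered;
    }
}
```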


umar


Smart way of indexing for Better performance

2008-03-17 Thread Yerraguntla

Hi,
  I have the following use case. I could implement a solution, but
performance is affected. I need some smart ways of doing this.
Use case:
Incoming data has two fields which have values like 'WAL MART STORES INC' 
and 'wal-mart-stores-inc'.   
Users can search the data with 'walmart', 'wal mart' or 'wal-mart', 
and also partially on any part of the name from the start of a word, like 'wal',
'walm', 'wal m', etc.  I could get a solution by using two indexes: one
as a text field for the first field (wal mart), and one for the sub-word 
field (wal-mart-stores) with the WordDelimiterFilterFactory filter.  

Is there a smarter way of doing this, or are there other techniques to boost
performance? I need to use them for a high-traffic application where the
response requirement is around 50 milliseconds.
I have some control over modifying the incoming data. 

Can someone suggest better ways of implementing this? I can provide more
information on the tokens and filters I am using.

Thanks
Ravi
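
Since the incoming data can be modified, one possible direction (an assumption, not something confirmed in this thread) is to precompute a single normalized key per name and normalize the user input the same way, so that 'walmart', 'wal mart', 'wal-mart' and 'wal m' all become prefixes of one stored string. The sketch below shows only that normalization; the class and method names are hypothetical, and it covers prefixes from the start of the whole name only, not from the start of later words such as 'stores'.

```java
import java.util.Locale;

class NameKey {
    // Lower-case the value and keep only letters and digits, so that
    // "WAL MART STORES INC" and "wal-mart-stores-inc" give the same key.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        s.toLowerCase(Locale.ROOT)
         .codePoints()
         .filter(Character::isLetterOrDigit)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // True if the normalized user input is a non-empty prefix of the normalized name.
    static boolean prefixMatches(String storedName, String userInput) {
        String query = normalize(userInput);
        return !query.isEmpty() && normalize(storedName).startsWith(query);
    }
}
```

If a key like this were stored in an untokenized field, a single prefix query against that one field might replace the two-index setup, but whether that meets the 50 millisecond target would need measuring.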



Re: sorting on a multivalued field

2008-03-17 Thread Chris Hostetter

: It appears that adding sort functions would be done in Lucene, and not
: in solr.  I'm not sure I want to go down that path, so I'm wondering
: if there's a way to accomplish this with solr.  From recent
: discussions, it sounds like I might be able to do this with some boost
: magic.  Unfortunately, I haven't found any examples of boosting that
: seem close to what I want to do.

I can't think of any way to accomplish anything like this without writing 
some custom Java code in Solr ... either some custom ValueSources to use 
in FunctionQuery, or a custom Sort object.


-Hoss



ResponseBuilder public flags

2008-03-17 Thread Tricia Williams

Hi,

   I'm working on a custom SearchComponent to display context stored in 
payloads.  I noticed that both the FacetComponent and the 
HighlightComponent are tightly coupled with the ResponseBuilder through 
the frequent use of doFacet and doHighlight.  If I am building a 
component with similar functionality to highlighting/faceting that will 
need to check a similar flag how can I do this as a plugin (ie without 
making any modification to the ResponseBuilder)?


   How are people feeling about the stability of this API?  Is this the 
right way to approach this?


Thanks,
Tricia



sort by uniq fields

2008-03-17 Thread Jae Joo
I have 30 million documents indexed and tried to sort by "sequenceid", which is
unique across the documents.
Sorting on it is "very slow" compared with sorting by pub_date.
sequenceid is not defined as the "unique key" in schema.xml; the
"unique key" defined in schema.xml is item_id.

Does anyone know why?

Thanks,

Jae


Re: sort by uniq fields

2008-03-17 Thread Yonik Seeley
The first sort on a particular field is normally slow.
Subsequent sorts on the same field should be just as fast as the pub_date sort.

-Yonik



Date Range Query + Fields

2008-03-17 Thread Nathan Woodhull
Hi,

I'm working on an application where the documents in the solr index
might only be relevant to users within a date range. We are storing a
start_date and an end_date in the index for each document that defines
the range for which the document is relevant. These date ranges in the
document might be one day long or an entire season.

We want to allow users to retrieve a list of all documents that will
be relevant in the next 30 days, or combine that restriction with a
search term. If our documents had a single relevant date instead of a
range this would be easy. It's not clear if it is possible for Solr to
index and query with a date range at the same time.

The only thing I have been able to come up with is storing the date
range as a single multivalued  field containing a record for each of
the days within the range. This seems inelegant and doesn't really
work well for long date ranges where you would have to store hundreds
of values in the multivalued field. (I'm not even sure if it works; I
figured I would ask the list if there was a better solution before
trying it.)

Thanks in advance. Please let me know if I can provide any other
information that might help.

-Nathan


Re: Date Range Query + Fields

2008-03-17 Thread Nathan Woodhull
Never mind, this is actually easy:

StartDate: [NOW TO NOW+30DAY] AND EndDate: [NOW TO NOW+30DAY]

-Nathan



Re: Date Range Query + Fields

2008-03-17 Thread Nathan Woodhull
Actually, it doesn't. This does not take care of documents that extend
beyond the bounds of the current 30 day window... which are relevant
even though both the start and end are not within the range.

For instance: A document with a start_date of 1/1/08 and an end_date
of 3/1/08 should still match for a search of the range 2/1/08 to
2/2/08.
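
Put differently, a document is relevant when its range overlaps the search window: it must start on or before the window ends and end on or after the window starts. In Solr range syntax that would presumably be something like start_date:[* TO NOW+30DAYS] AND end_date:[NOW TO *] (field names taken from the earlier message; untested). A small Java check of the overlap condition, reading the dates above as month/day/year (an assumption):

```java
import java.time.LocalDate;

class DateRangeOverlap {
    // Two inclusive date ranges overlap when each one starts
    // no later than the other one ends.
    static boolean overlaps(LocalDate docStart, LocalDate docEnd,
                            LocalDate windowStart, LocalDate windowEnd) {
        return !docStart.isAfter(windowEnd) && !docEnd.isBefore(windowStart);
    }
}
```

For the example above, overlaps(2008-01-01, 2008-03-01, 2008-02-01, 2008-02-02) is true even though neither end of the document's range falls inside the window.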

-Nathan



Re: Performance of Filter Query

2008-03-17 Thread Ryan McKinley


> where 'distribution' of queried single-value field is extremely low, such as
> fq=country:USA
>
> Standard query  is 1 times faster than less intelligent
>
> Does anyone experience similar stuff? 
>
> It's probably specific to [* TO *] which was stupid in this case...



try:
 q=+california +country:USA
vs
 q=california&fq=country:USA
and then try it twice.

I'm no expert on this, but I think the advantage is that fq does not 
affect the score and can be cached -- it just includes a set of docs.


[* TO *] iterates over all docs in the index, so that is not a fair 
comparison...
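
To illustrate why "try it twice" matters, here is a purely conceptual sketch of a cached filter. It is not how Solr implements its filter cache; the class, the computeFilter callback and the counter are all invented for the example. The point is only that the set of docs for a given fq string is computed once, then reused and intersected with each query's matches without touching scores.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> computeFilter; // the expensive part
    int computations = 0;                                 // how often it actually ran

    FilterCacheSketch(Function<String, BitSet> computeFilter) {
        this.computeFilter = computeFilter;
    }

    // Doc set for a filter string; computed on first use, cached afterwards.
    BitSet filter(String fq) {
        return cache.computeIfAbsent(fq, key -> {
            computations++;
            return computeFilter.apply(key);
        });
    }

    // Restrict the query's matching docs to those allowed by the filter.
    BitSet search(BitSet queryMatches, String fq) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filter(fq));
        return result;
    }
}
```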


ryan



Re: ResponseBuilder public flags

2008-03-17 Thread Ryan McKinley

Tricia Williams wrote:

> Hi,
>
>    I'm working on a custom SearchComponent to display context stored in 
> payloads.  I noticed that both the FacetComponent and the 
> HighlightComponent are tightly coupled with the ResponseBuilder through 
> the frequent use of doFacet and doHighlight.  If I am building a 
> component with similar functionality to highlighting/faceting that will 
> need to check a similar flag how can I do this as a plugin (ie without 
> making any modification to the ResponseBuilder)?
>
>    How are people feeling about the stability of this API?  Is this the 
> right way to approach this?




An early version of the ResponseBuilder allowed you to subclass the 
ResponseBuilder for just this reason.  We went back and forth on this 
and in the end tossed it because it felt too complicated.


For custom components, the safest (and most API-stable) way for a 
component to communicate with itself is via the SolrQueryRequest context.  
That is, you can get/put values into the request:


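  // store your own flag holder in the request context
  // ("bean" and its type MyBean are placeholders for your own class)
  Map context = rb.req.getContext();
  context.put( "mykey", bean );
  bean.doStuff = true;
  ...
  // later in the same request: read it back (the map holds Objects, so cast)
  bean = (MyBean) context.get( "mykey" );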

As for API stability...  I doubt it will have a complete overhaul before 
1.3, but I am confident that there will be tweaks.  Please give feedback 
on what would be better too so we can fix it before things get locked down.


ryan












Re: Performance of Filter Query

2008-03-17 Thread Yonik Seeley
On Mon, Mar 17, 2008 at 5:59 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>  [* TO *] iterates over all docs in the index, so that is not a fair
>  comparison...

It's equivalent to iterating over all docs in the index, but it's
worse since it iterates over all terms in the field and then all docs
for each term (which should be just 1 for the id field).

Iterating over all docs of a single term (even if it's every doc in
the index) is much faster.

-Yonik


Re: Result based sorting for KWIC?

2008-03-17 Thread Christian Wittern

Chris Hostetter wrote:


> That still doesn't really answer a fairly fundamental question I've been 
> trying to understand: *why* would having the results in that order be much 
> more useful for the users? 
  
Well, there are several reasons. One is that it allows users to easily 
spot related entries, for example quotations of a text appearing within 
another document.  Another is that it makes it easy to detect 
linguistic patterns.


Of course, this is not the only sort order to be offered, but it is the one 
I am currently struggling with, and I am trying to evaluate whether Solr 
would be of help here.

> What are you going to do if the input term appears more than once in a single document?
  
The KWIC representation is generated for every hit, so if there are 5 
matches in a doc, you get five hits.
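
As a rough illustration of "one hit per match", the sketch below produces a KWIC line (left context, keyword, right context) for every occurrence of a term in a text. It is not the code behind the application described here; the class name, the plain substring search and the fixed context width are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class KwicHits {
    // Returns one {left, keyword, right} triple per non-overlapping occurrence
    // of term in text, with at most width characters of context on each side.
    static List<String[]> hits(String text, String term, int width) {
        List<String[]> out = new ArrayList<>();
        if (term.isEmpty()) {
            return out;
        }
        int from = 0;
        while (true) {
            int at = text.indexOf(term, from);
            if (at < 0) {
                break;
            }
            int end = at + term.length();
            String left = text.substring(Math.max(0, at - width), at);
            String right = text.substring(end, Math.min(text.length(), end + width));
            out.add(new String[] { left, term, right });
            from = end; // continue searching after this occurrence
        }
        return out;
    }
}
```

A document containing the term five times yields five entries, each of which can then be sorted on its left or right context.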


> Solr can sort your results on any indexed, single-valued field, but for 
> something like this you'd need to write your own plugin to do the sorting.  
> Note that your plugin would basically need to do the same thing you 
> currently do on the client; the only real performance gain would be 
> in reducing the amount of data sent over the wire.
  
Indeed.  Except that Solr might be able to use mature, efficient and 
well-debugged code to do that, which I can't say about my client code.  
Well, not knowing anything about the internals used in Solr (or Lucene 
for that matter), I just assumed that this in some sense parallels the 
way a ranking value is calculated for a search term and then the results 
are sorted by relevance.


But I think I have enough information now to decide how to proceed.

Christian

--

Christian Wittern 
Institute for Research in Humanities, Kyoto University

47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN



DataImportHandler and MultiCore

2008-03-17 Thread Jon Baer

Hi,

Sorry, I could have sworn I read a snippet about this somewhere and I am  
having trouble tracking it back down.  I'm interested in (possibly)  
using DataImportHandler to run MultiCore (n+ indexes) ... what I'd like  
to do is have 2 indexes (/news and /video), run them in a single  
instance, and load them w/ deltas from a few different tables.


Any links would be much appreciated.

Thanks!

- Jon


Re: DataImportHandler and MultiCore

2008-03-17 Thread Shalin Shekhar Mangar
Hi Jon,

For general information related to using multiple cores, see
http://wiki.apache.org/solr/MultiCore

Apart from that, the configuration of DataImportHandler does not change in
any way when using it with multiple cores. The only thing that changes
is the URL, which now includes the core name, e.g.
http://localhost:8983/solr/news/dataimport and
http://localhost:8983/solr/video/dataimport in your case.

It would be nice if you could share any feedback/suggestions you have on
DataImportHandler.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Date Range Query + Fields

2008-03-17 Thread Shalin Shekhar Mangar
Hi Nathan,

We had a similar problem, but with a numeric field, and we solved
it by keeping both the start and end of the range in one multivalued field. Then
your first query will get you the desired results.


-- 
Regards,
Shalin Shekhar Mangar.


Re: Does solr support runtime index?

2008-03-17 Thread Otis Gospodnetic
Bhavin - one of the nice things about Solr's index replication is that 
*typically* only changed/new index files are sent from master to slave, and 
this is typically cheap.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
