Re: Result based sorting for KWIC?
: If you go to http://tkb.mydns.jp:8899/exist/rest/db/new/tkb.xq you will see
: what I currently have. Just click search to search for the example, or maybe
: delete the last character so that you get more results (this is not released
: yet, so don't be surprised if it breaks...). You will see the search term
: highlighted in the middle, context is available from the blue arrow to the
: right.

I don't read Japanese, so I'm really not sure what I'm looking at. I see the search term highlighted, but I don't know enough about the language to really understand what is special about the order of the results...

: The display would be much more useful for the users, if this could be
: sorted on the characters following the hit (ignoring punctuation). Another
: option would be to sort on the characters previous to the hit. But in this
: case, the sorting has to be reversed, so that if I have:
: ABCDFGHI
: the sort-key would be constructed as DCBA for this case.

That still doesn't really answer a fairly fundamental question I've been trying to understand: *why* would having the results in that order be much more useful for the users? What are you going to do if the term appears more than once in a single document?

: would be very slow, so I am looking for other ways. Erik also said that down
: the road there might be a sort function that could be called, which is what I
: would need here.

Solr can sort your results on any indexed, single-valued field, but for something like this you'd need to write your own plugin to do the sorting. Note that your plugin would basically need to do the same thing you currently do on the client; the only real performance gain would be in reducing the amount of data sent over the wire.

-Hoss
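For readers following the thread, the sort keys described above can be sketched on the client side roughly as follows. This is a minimal illustration, not code from the thread: the function names, the `width` parameter, and the sample string (with a hypothetical hit "E" between "ABCD" and "FGHI") are all assumptions made for the example.

```python
import unicodedata


def strip_punct(text):
    """Drop punctuation and whitespace so they do not influence the sort."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def kwic_hits(text, term):
    """Yield (left_context, right_context) for every occurrence of term."""
    start = text.find(term)
    while start != -1:
        yield text[:start], text[start + len(term):]
        start = text.find(term, start + 1)


def right_key(hit, width=10):
    """Sort key built from the characters following the hit."""
    _left, right = hit
    return strip_punct(right)[:width]


def left_key(hit, width=10):
    """Sort key built from the characters preceding the hit, reversed
    so that the character closest to the hit comes first."""
    left, _right = hit
    return strip_punct(left)[::-1][:width]


# Hypothetical sample: the hit "E" sits between "ABCD" and "FGHI".
hits = list(kwic_hits("ABCDEFGHI", "E"))
by_following = sorted(hits, key=right_key)
by_preceding = sorted(hits, key=left_key)
```

Every occurrence of the term produces its own entry, so a document with five matches contributes five rows to the sorted display.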
Does solr support runtime index?
Hi,

I am familiar with Lucene but new to Solr. I want to switch one of my products to Solr for:

1. run-time indexing (a record becomes searchable immediately after it is indexed)
2. faceted search
3. master/slave architecture

But I am not sure: does Solr support run-time indexing in a master/slave architecture?

If I have configured one master and four slave servers, can I make one of the slave servers the master for a second application?

Any thoughts or pointers in this direction, please.

- Bhavin pandya
Re: Does solr support runtime index?
On Mar 17, 2008, at 3:20 AM, Bhavin Pandya wrote:

> Hi,
>
> I am aware of lucene but newbie in solr...
> I want to switch one of my products to solr for,
> 1. run time index ( the record which is indexed becomes searchable immediately)

Immediately is a bit of a stretch, but say, within 1 minute or so, that is doable.

> 2. faceted search
> 3. master slave architecture
>
> But I have a doubt: does solr support run time index in master slave architecture ???
>
> If I have configured one master and four slave servers, can I make one of the slave servers a master server for a second application ?

Do you mean with a separate index? I suppose it is possible, but I wouldn't think it is recommended. Typically, the worker nodes are there because you have such a high query volume that you need the support. Making one of them a master means taking away, presumably, from query-time support.

> Any thoughts, pointers in this direction please.
>
> - Bhavin pandya

--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Does solr support runtime index?
Hi,

>> 1. run time index ( the record which is indexed becomes searchable immediately)
> Immediately is a bit of a stretch, but say, within 1 minute or so, that is doable.

I think it's a cron job, so it will sync the snapshot of the master index to all slaves. Suppose we have four slave machines: isn't it expensive to sync the snapshot every minute?

>> If I have configured one master and four slave servers, can I make one of the slave servers a master server for a second application ?
> Do you mean with a separate index? I suppose it is possible, but I wouldn't think it is recommended. Typically, the worker nodes are there b/c you have such a high query volume that you need the support. Making one of them a master, means taking away, presumably, from query time support.

That means it is recommended to use one master server for the two applications, but I should not share the slaves with the indexer.

- Bhavin pandya
sorting on aggregate averages
Hi,

I have a problem returning a list of results sorted on an average of ranks computed from aggregates. The query would be something like

?q=product:p1+product:p2+product:p3; sort score desc

To explain: suppose I have documents with the fields Product, Manufacturer and Rank, and I want to return the top manufacturers across products p1, p2 and p3, i.e. those with the highest average rank on these products.

One way is to store the search results, then group them, compute the average and sort the result. Can it be done from Lucene/Solr itself? If so, how?

umar
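The client-side approach mentioned in the question (collect the results, group, average, sort) might look roughly like the sketch below. It assumes each search result has already been turned into a dict with lower-case keys `product`, `manufacturer` and `rank`; those key names and the function name are illustrative assumptions, and the thread does not say whether Solr can do this server-side.

```python
from collections import defaultdict


def top_manufacturers(docs, products):
    """Return (manufacturer, average_rank) pairs, highest average first.

    docs is an iterable of dicts with the assumed keys
    'product', 'manufacturer' and 'rank'.
    """
    totals = defaultdict(lambda: [0.0, 0])  # manufacturer -> [sum, count]
    for doc in docs:
        if doc["product"] in products:
            entry = totals[doc["manufacturer"]]
            entry[0] += doc["rank"]
            entry[1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, only to show the call shape.
sample = [
    {"product": "p1", "manufacturer": "m1", "rank": 4},
    {"product": "p2", "manufacturer": "m1", "rank": 2},
    {"product": "p3", "manufacturer": "m2", "rank": 5},
    {"product": "p9", "manufacturer": "m2", "rank": 1},  # not in the query, ignored
]
ranking = top_manufacturers(sample, {"p1", "p2", "p3"})
```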
Smart way of indexing for Better performance
Hi,

I have the following use case. I could implement a solution, but performance suffers, so I am looking for smarter ways of doing this.

Use case: incoming data has two fields with values like 'WAL MART STORES INC' and 'wal-mart-stores-inc'. Users can search the data as 'walmart', 'wal mart' or 'wal-mart', and also partially on any part of the name from the start of a word, like 'wal', 'walm', 'wal m' etc.

I could get a solution by using two indexes: one as a text field for the first field (the 'wal mart' column) and one for the sub-word form 'wal-mart-stores' (with the WordDelimiterFilterFactory filter).

Is there a smarter way of doing this, or are there other techniques to boost the performance? I need to use this in a high-traffic application where the response-time requirement is around 50 milliseconds. I have some control over modifying the incoming data.

Can someone suggest better ways of implementing this? I can provide more information on the tokenizers and filters I am using.

Thanks
Ravi
Re: sorting on a multivalued field
: It appears that adding sort functions would be done in Lucene, and not
: in solr. I'm not sure I want to go down that path, so I'm wondering
: if there's a way to accomplish this with solr. From recent
: discussions, it sounds like I might be able to do this with some boost
: magic. Unfortunately, I haven't found any examples of boosting that
: seem close to what I want to do.

I can't think of any way to accomplish anything like this without writing some custom Java code in Solr: either some custom ValueSources to use in a FunctionQuery, or a custom Sort object.

-Hoss
ResponseBuilder public flags
Hi,

I'm working on a custom SearchComponent to display context stored in payloads. I noticed that both the FacetComponent and the HighlightComponent are tightly coupled with the ResponseBuilder through the frequent use of doFacet and doHighlight. If I am building a component with similar functionality to highlighting/faceting that will need to check a similar flag, how can I do this as a plugin (i.e. without making any modification to the ResponseBuilder)?

How are people feeling about the stability of this API? Is this the right way to approach this?

Thanks,
Tricia
sort by uniq fields
I have 30 million documents indexed and tried to sort by "sequenceid", which is unique across the documents. I am finding it very slow compared to sorting by pub_date. sequenceid is not defined as the unique key in schema.xml; the unique key defined there is item_id.

Does anyone know why?

Thanks,

Jae
Re: sort by uniq fields
The first sort on a particular field is normally slow. Subsequent sorts on the same field should be just as fast as sorts on any other field.

-Yonik

On Mon, Mar 17, 2008 at 4:42 PM, Jae Joo <[EMAIL PROTECTED]> wrote:
> I have 30 millions document indexed and tried sort by "sequenceid" which is
> unique over the document.
> I am experiencing "very slow" than sort by pub_date.
> sequenceid is not defined as "unique key" in the schema.xml and there is the
> "unique key" defined in schema.xml - item_id.
>
> Anyone knows why?
>
> Thanks,
>
> Jae
Date Range Query + Fields
Hi,

I'm working on an application where the documents in the Solr index might only be relevant to users within a date range. We are storing a start_date and an end_date in the index for each document; together they define the range for which the document is relevant. These date ranges might be one day long or an entire season.

We want to allow users to retrieve a list of all documents that will be relevant in the next 30 days, or combine that restriction with a search term. If our documents had a single relevant date instead of a range this would be easy. It's not clear whether it is possible for Solr to index and query with a date range at the same time.

The only thing I have been able to come up with is storing the date range as a single multivalued field containing a value for each of the days within the range. This seems inelegant and doesn't really work well for long date ranges, where you would have to store hundreds of values in the multivalued field. (I'm not even sure if it works; I figured I would ask the list if there was a better solution before trying it.)

Thanks in advance. Please let me know if I can provide any other information that might help.

-Nathan
Re: Date Range Query + Fields
Never mind, this is actually easy:

StartDate: [NOW TO NOW+30DAY] AND EndDate: [NOW TO NOW+30DAY]

-Nathan
Re: Date Range Query + Fields
> Nevermind, this is actually easy:
>
> StartDate: [NOW TO NOW+30DAY] AND EndDate: [NOW TO NOW+30DAY]

Actually, it doesn't work. This does not take care of documents that extend beyond the bounds of the current 30-day window, which are relevant even though neither the start nor the end falls within the range.

For instance: a document with a start_date of 1/1/08 and an end_date of 3/1/08 should still match a search for the range 2/1/08 to 2/2/08.

-Nathan
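The case described here is the general interval-overlap test: a document's range overlaps the search window when it starts no later than the window ends and ends no earlier than the window starts. The sketch below illustrates that test and one way the corresponding query string could be built. It is not taken from the thread; the function names are assumptions, the field names `start_date` and `end_date` follow the original question, and the open-ended `[* TO ...]` range form is standard Solr range syntax.

```python
from datetime import date


def overlaps(doc_start, doc_end, window_start, window_end):
    """True when [doc_start, doc_end] and [window_start, window_end] intersect."""
    return doc_start <= window_end and doc_end >= window_start


def overlap_query(window_start="NOW", window_end="NOW+30DAY"):
    """Build a query string expressing the same overlap test as two range clauses."""
    return f"start_date:[* TO {window_end}] AND end_date:[{window_start} TO *]"


# The example from the message above: 1/1/08 to 3/1/08 against 2/1/08 to 2/2/08.
spans_window = overlaps(
    date(2008, 1, 1), date(2008, 3, 1), date(2008, 2, 1), date(2008, 2, 2)
)
query = overlap_query()
```

With this formulation a document whose range completely surrounds the window still matches, because neither clause requires a boundary to fall inside the window.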
Re: Performance of Filter Query
> where 'distribution' of queried single-value field is extremely low, such as
> fq=country:USA
> Standard query is 1 times faster than less intelligent
> Does anyone experience similar stuff?
> It's probably specific to [* TO *] which was stupid in this case...

try:

q=+california +country:USA

vs

q=california&fq=country:USA

and then try it twice.

I'm no expert on this, but I think the advantage is that fq does not affect the score and can be cached; it just includes a set of docs.

[* TO *] iterates over all docs in the index, so that is not a fair comparison...

ryan
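For anyone wanting to reproduce the comparison, the two requests could be built like this. The base URL is an assumption (a default local install with the standard `/select` handler); only the `q` and `fq` parameters come from the message above.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust to your own host, port and handler path.
BASE = "http://localhost:8983/solr/select"


def scored_query_url(term, field, value):
    """Both clauses in q: the restriction takes part in scoring."""
    return BASE + "?" + urlencode({"q": f"+{term} +{field}:{value}"})


def filtered_query_url(term, field, value):
    """Restriction moved to fq: it does not affect the score and can be cached."""
    return BASE + "?" + urlencode({"q": term, "fq": f"{field}:{value}"})


scored = scored_query_url("california", "country", "USA")
filtered = filtered_query_url("california", "country", "USA")
```

Requesting each URL twice, as suggested, would separate the cost of the first (uncached) filter evaluation from the cached one.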
Re: ResponseBuilder public flags
Tricia Williams wrote:
> Hi,
>
> I'm working on a custom SearchComponent to display context stored in payloads. I noticed that both the FacetComponent and the HighlightComponent are tightly coupled with the ResponseBuilder through the frequent use of doFacet and doHighlight. If I am building a component with similar functionality to highlighting/faceting that will need to check a similar flag how can I do this as a plugin (ie without making any modification to the ResponseBuilder)?
>
> How are people feeling about the stability of this API? Is this the right way to approach this?
>
> Thanks,
> Tricia

An early version of the ResponseBuilder allowed you to subclass the ResponseBuilder for just this reason. We went back and forth on this and in the end tossed it because it felt too complicated.

For custom components, the safest (and most API-stable) way for a component to communicate with itself is via the SolrQueryRequest context. That is, you can get/put values into the request:

    Map<Object,Object> context = rb.req.getContext();
    MyBean bean = new MyBean();   // MyBean is a placeholder for your own state holder
    context.put( "mykey", bean );
    bean.doStuff = true;
    ...
    // later, read the same object back from the request context
    bean = (MyBean) context.get( "mykey" );

As for API stability... I doubt it will have a complete overhaul before 1.3, but I am confident that there will be tweaks. Please give feedback on what would be better too, so we can fix it before things get locked down.

ryan
Re: Performance of Filter Query
On Mon, Mar 17, 2008 at 5:59 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> [* TO *] iterates over all docs in the index, so that is not a fair
> comparison...

It's equivalent to iterating over all docs in the index, but it's worse since it iterates over all terms in the field and then all docs for each term (which should be just 1 for the id field). Iterating over all docs of a single term (even if it's every doc in the index) is much faster.

-Yonik
Re: Result based sorting for KWIC?
Chris Hostetter wrote:
> That still doesn't really answer a fairly fundamental question I've been trying to understand: *why* would having the results in that order be much more useful for the users?

Well, there are several reasons. One is that it allows users to easily spot related entries, for example quotations of a text appearing within another document. Another is that it makes it easy to detect linguistic patterns. Of course, this is not the only sort order to be offered, but it is the one I am currently struggling with, and I am trying to evaluate whether Solr would be of help here.

> What are you going to do if the term appears more than once in a single document?

The KWIC representation is generated for every hit, so if there are 5 matches in a doc, you get five hits.

> Solr can sort your results on any indexed, single-valued field, but for something like this you'd need to write your own plugin to do the sorting. Note that your plugin would basically need to do the same thing you currently do on the client; the only real performance gain would be in reducing the amount of data sent over the wire.

Indeed. Except that Solr might be able to use mature, efficient and well-debugged code to do that, which I can't say about my client code.

Well, not knowing anything about the internals used in Solr (or Lucene for that matter), I just assumed that this in some sense parallels the way a ranking value is calculated for a search term and the results are then sorted by relevance. But I think I have enough information now to decide how to proceed.

Christian

--
Christian Wittern
Institute for Research in Humanities, Kyoto University
47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN
DataImportHandler and MultiCore
Hi,

Sorry, I could have sworn I read a snippet about this somewhere and am having trouble tracking it back down. I'm interested in (possibly) using DataImportHandler to run MultiCore (n+ indexes). What I'd like to do is have 2 indexes (/news and /video), run them in a single instance and load them with deltas from a few different tables.

Any links would be much appreciated.

Thanks!

- Jon
Re: DataImportHandler and MultiCore
Hi Jon,

For general information related to using multiple cores, see http://wiki.apache.org/solr/MultiCore

Apart from that, the configuration of DataImportHandler does not change in any way when using it with multiple cores. The only thing that changes is the URL, which now contains the core name, e.g. http://localhost:8983/solr/news/dataimport and http://localhost:8983/solr/video/dataimport in your case.

It would be nice if you could share any feedback or suggestions you have on DataImportHandler.

--
Regards,
Shalin Shekhar Mangar.
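As a small illustration of the per-core URLs mentioned above, a helper along these lines could build the delta-import request for each core. The base URL and core names follow the examples in the message; the helper name is made up for this sketch, and `delta-import` is one of DataImportHandler's documented commands (alongside `full-import` and `status`).

```python
from urllib.parse import urlencode


def dataimport_url(core, command="delta-import", base="http://localhost:8983/solr"):
    """Return the DataImportHandler URL for one core and one command."""
    return f"{base}/{core}/dataimport?" + urlencode({"command": command})


# One request per core; each core keeps its own DataImportHandler configuration.
urls = [dataimport_url(core) for core in ("news", "video")]
```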
Re: Date Range Query + Fields
Hi Nathan,

We had a similar problem, but with a numeric field, and we solved it by keeping both the start and the end of the range in one multivalued field. Then your first query will get you the desired results:

> StartDate: [NOW TO NOW+30DAY] AND EndDate: [NOW TO NOW+30DAY]

--
Regards,
Shalin Shekhar Mangar.
Re: Does solr support runtime index?
Bhavin - one of the nice things about Solr's index replication is that *typically* only changed/new index files are sent from master to slave, and this is typically cheap.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Bhavin Pandya <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, March 17, 2008 8:12:53 AM
Subject: Re: Does solr support runtime index?

> I think its cron job... so it will sync the snapshot of master index to all
> slaves suppose we have four slave machines.
> Isnt it expensive to sync the snapshot every min???