breadcrumb in Solr
Hi, I am looking for a way to have "breadcrumbs". Is there any way to get that kind of information from Solr search results? Thanks, Jae Joo
Function Queries
Hello all, I am struggling with FunctionQueries in Solr. Thanks in advance for taking the time to read this and answer my questions. There doesn't seem to be a "how to" page anywhere. I have been to these sites:

http://wiki.apache.org/solr/FunctionQuery
http://wiki.apache.org/solr/DisMaxRequestHandler#head-14b9ca618089829d139e6f3d6f52ff63e22a80d1

I have also searched the forums, but found no answer to what I am asking below. What I have come to realize is that if certain parameters are present or missing, dismax either errors out or the function query does not work. One example: if mm is blank in solrconfig.xml and not commented out, it throws a NumberFormatException. Another example: without something in qf, a query using dt=dismax in the query request string returns no results.

So what I am really looking for here is the proper way to set up the whole solrconfig.xml for the dismax request handler; it seems I am somehow missing something. The way I understand it right now is that all the fields that will be searched on, and on which a function query will be used, need to be in the qf parameter. For the function query itself, I have just a field called importancerank, which is a float type field. I do not use ord(), rord(), linear(), etc., because I just want to take the value of that field and add it to the score. I also have 0.01 in tie, and echoParams set to explicit. These are the only parameters I have set up; I have the rest commented out, such as pf, ps, q.alt, and mm. Also, what is fl? I could not find any documentation on that.

What currently happens is that when I put the dt=dismax parameter in my query request string, I get exactly the same results as if I hadn't, meaning it didn't appear to affect the ranking at all. What other parameters do I have to fill out in the request handler to make this work?
What might I have done wrong in my thinking of how things work? Thanks again for reading through this and replying to my questions. Another thing that would be helpful is to see a complete solrconfig.xml setup for the dismax request handler. I have only read about bits of it, and I think seeing a full one that actually works would be very helpful. Thanks again. Mike -- View this message in context: http://www.nabble.com/Function-Queries-tf4280039.html#a12182596 Sent from the Solr - User mailing list archive at Nabble.com.
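Since the question above asks to see a full working dismax configuration: below is a minimal sketch of what such a handler might look like in solrconfig.xml. The field names (title, body, importancerank) are hypothetical and must be adapted to your own schema; the example solrconfig.xml shipped with the Solr distribution remains the authoritative reference.

```xml
<!-- Sketch of a dismax handler; field names here are examples only -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <!-- qf is required: the fields (with optional boosts) that q is matched against -->
    <str name="qf">title^2.0 body^1.0</str>
    <!-- bf (boost function): adds the value of importancerank into the score -->
    <str name="bf">importancerank</str>
    <!-- fl (field list): which stored fields to return in results -->
    <str name="fl">id,title,score</str>
  </lst>
</requestHandler>
```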
Payloads for multiValued fields?
When searching a multiValued field, is it possible to know which of the multiple fields the match was in? For example if I have an index of documents, each of which has multiple image captions stored in separate fields, I'd like to be able to link from the search results to the caption in the original document. One possibility could be attaching metadata to a field, similar to payloads for terms. At the moment all I can think of is adding metadata inside the stored field and stripping that out when it's indexed and displayed, but that's not ideal. alf.
Re: Payloads for multiValued fields?
On 16 Aug 2007, at 17:20, Alf Eaton wrote: When searching a multiValued field, is it possible to know which of the multiple fields the match was in? For example if I have an index of documents, each of which has multiple image captions stored in separate fields, I'd like to be able to link from the search results to the caption in the original document. One possibility could be attaching metadata to a field, similar to payloads for terms. At the moment all I can think of is adding metadata inside the stored field and stripping that out when it's indexed and displayed, but that's not ideal. Actually on reflection all this would need would be for the Highlighter to add a field to the response, saying which item of the multiValued field the match was in. Is that possible? alf.
Re: Payloads for multiValued fields?
On 8/16/07, Alf Eaton <[EMAIL PROTECTED]> wrote:
> On 16 Aug 2007, at 17:20, Alf Eaton wrote:
> > When searching a multiValued field, is it possible to know which of
> > the multiple fields the match was in?
> >
> > For example if I have an index of documents, each of which has
> > multiple image captions stored in separate fields, I'd like to be
> > able to link from the search results to the caption in the original
> > document.
> >
> > One possibility could be attaching metadata to a field, similar to
> > payloads for terms. At the moment all I can think of is adding
> > metadata inside the stored field and stripping that out when it's
> > indexed and displayed, but that's not ideal.
>
> Actually on reflection all this would need would be for the
> Highlighter to add a field to the response, saying which item of the
> multiValued field the match was in. Is that possible?

Could you perhaps index the captions as

#1 this is the first caption
#2 this is the second caption

and then just look for #n in the highlighted results? For display, you could also strip out the #n in the captions. -Yonik
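A rough sketch of the prefix-and-strip idea suggested above (the `#n` marker convention and function names are just illustrative, not a Solr API):

```python
import re

def number_captions(captions):
    """Prepend a '#n ' marker to each caption before indexing."""
    return ["#%d %s" % (i + 1, c) for i, c in enumerate(captions)]

def matched_caption_index(highlighted_snippet):
    """Recover which caption a highlighted snippet came from, via its marker."""
    m = re.match(r"#(\d+)\s", highlighted_snippet)
    return int(m.group(1)) if m else None

def strip_marker(caption):
    """Remove the '#n ' marker again for display."""
    return re.sub(r"^#\d+\s", "", caption)
```

As Alf notes later in the thread, a bare `#1` would not survive typical tokenization, so it shouldn't affect matching.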
Re: breadcrumb in Solr
What do you mean by "breadcrumbs"?

| Matthew Runo
| Zappos Development
| [EMAIL PROTECTED]
| 702-943-7833

On Aug 16, 2007, at 7:03 AM, Jae Joo wrote:
> Hi, I am looking for a way to have "breadcrumbs". Is there any way to get
> that kind of information from Solr search results? Thanks, Jae Joo
Re: Payloads for multiValued fields?
On 16 Aug 2007, at 17:34, Yonik Seeley wrote:
> On 8/16/07, Alf Eaton <[EMAIL PROTECTED]> wrote:
> > [...]
> > Actually on reflection all this would need would be for the
> > Highlighter to add a field to the response, saying which item of the
> > multiValued field the match was in. Is that possible?
>
> Could you perhaps index the captions as
>
> #1 this is the first caption
> #2 this is the second caption
>
> and then just look for #n in the highlighted results? For display, you
> could also strip out the #n in the captions.

I think that would probably work, yes - '#1' wouldn't be indexed so wouldn't affect the search results. Thanks, alf.
String collapsing
Does Solr have a processing tool that collapses, say, "E L V I S" to "Elvis", or "D.N.A." to "DNA"?
Re: String collapsing
On 8/16/07, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Does Solr have a processing tool that collapses, say, "E L V I S" to
> "Elvis", or "D.N.A." to "DNA"?

WordDelimiterFilter can be configured to collapse things like D.N.A to DNA, but not if space-separated like D N A. -Yonik
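For the D.N.A. case, a schema.xml fieldType along these lines is a plausible sketch (the fieldType name is made up; the filter options are the standard WordDelimiterFilterFactory attributes):

```xml
<fieldType name="text_collapse" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateWords="1" joins the split parts of D.N.A back into DNA.
         Space-separated "D N A" arrives as separate tokens, so it is NOT joined. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```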
Replacing existing documents in the index
Hi- We recrawl the same places and update blindly without checking if a document is already in the index. We have a use case where we would like to delete documents (porn) and have them stay deleted. To implement this use case now, we would need to check the existence of the document and check for a 'deleted' flag. Or, we would maintain a separate database of deleted documents that we check against. A more efficient way to do this would be to have a 'do not delete' flag in the document. Delete failures are currently ignored and they would continue to be ignored. Is this a worthwhile addition to 1.3 or 1.4? Thanks for your time, Lance
Re: Replacing existing documents in the index
It sounds like it might be more efficient to implement this at the crawler level to short-circuit crawling whole sites. Barring that, a separate database sounds more flexible. Non-deletable docs don't sound like something that should be a general feature. However, one would probably be able to implement custom logic to do this using an update-processor plugin (should be in the next version of Solr). -Yonik

On 8/16/07, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Hi-
>
> We recrawl the same places and update blindly without checking if a document
> is already in the index. We have a use case where we would like to delete
> documents (porn) and have them stay deleted. To implement this use case now,
> we would need to check the existence of the document and check for a
> 'deleted' flag. Or, we would maintain a separate database of deleted
> documents that we check against.
>
> A more efficient way to do this would be to have a 'do not delete' flag in
> the document. Delete failures are currently ignored and they would continue
> to be ignored.
>
> Is this a worthwhile addition to 1.3 or 1.4?
>
> Thanks for your time,
>
> Lance
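The "separate database of deleted documents" approach mentioned in the thread can be as simple as a blocklist consulted before posting updates. A sketch (the in-memory set is a stand-in for whatever persistent store you actually use; the function name is invented):

```python
def filter_updates(docs, banned_ids):
    """Drop any crawled document whose id is on the permanent block list,
    so a blind re-crawl can never resurrect a deleted document."""
    return [d for d in docs if d["id"] not in banned_ids]
```

Run over every update batch before it is sent to Solr, this keeps deleted documents deleted without any change to Solr itself.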
Re: Function Queries
Hi Yakn,

On 17/08/07, Yakn <[EMAIL PROTECTED]> wrote:
> One example is that if you have mm being blank in the solrConfig.xml
> and not commented out, then it will throw a NumberFormatException.

The required format of the mm field is described in more detail here:
http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

The parser is pretty fussy about how this field is formatted. When a value is not specified, the default value is "100%", which means "match documents that contain every term in the query". Perhaps it might be a good idea to add a simple sanity check to mm, testing for an empty string?

> Another example is that without something in qf, then the query, using
> dt=dismax in the query request string, does not return any results.

qf is a required parameter and is needed (in combination with the q param) to construct a Lucene query; it won't work without it (as you've discovered). The default values in solrconfig.xml serve as an example, and you'll most probably need to change them to match your schema (the value in solrconfig.xml is only used if qf is not set in your request query string).

> So, what I am really looking for here is the proper way to do the whole
> solrConfig.xml, for the dismax request handler. It seems that I am somehow
> missing something.

I think the whole point of the values set in the solrconfig.xml included with the distribution is to serve as a guide for you to try with the examples provided. In general, most of these default values (those that don't refer to specific fields) can be left unmodified and dismax requests will still work fine; however, you can change and tweak these parameters to suit your particular requirements if necessary.

> The way that I understand it right now is this, for all
> the fields that will be searched on and a function query will be used,
> they need to be in the qf parameter.
Only fields that you want to match terms in "q" need to be listed in "qf"; it is not necessary to list fields used in a function query there.

> For the function query itself, I have just a
> field called importancerank which is a float type field. I do not use
> ord() or rord() or linear() etc... because I just want to take that value
> of that field and add it to the score.

I haven't tried this myself, but it should be as simple as adding the following to your query string: bf=importancerank

> I also have a 0.01 in tie. I have echoParams
> set to explicit. These are the only parameters that I have set up. I have
> the rest commented out such as pf, ps, q.alt, and mm. Also, what is fl? I
> could not find any documentation on that.

A lot of parameters (including fl) are common and used by both the Standard and Dismax request handlers, so you should take a look at:
http://wiki.apache.org/solr/CommonQueryParameters

> What happens currently for me is that when I put the dt=dismax parameter in
> my query request string, I get exactly the same results as if I didn't,
> meaning it didn't appear to sort it at all. What other parameters do I have
> to fill out in the request handler to make this work? What might I have done
> wrong in my thinking of how things work?

You'll have to provide more information about your query (e.g. query string parameters, field definitions from schema.xml, the relevant request handler section of solrconfig.xml) in order to see what's going on.

> Another thing that would be helpful is to see a whole solrConfig schema for
> the dismax request handler. I have only read about bits of it and I think
> that to get a view of a full one that actually works would be very helpful.
> Thanks again.

This is the solrconfig.xml that I mentioned earlier; it is provided with the Solr distribution (in /example/solr/conf/):
http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml

Hope this helps,
Piete
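Putting the reply's advice together, a request along these lines would exercise qf and bf from the query string. This is a sketch: the host, port, and field names are made up, and note that in Solr the handler is selected with qt= (the thread writes dt=, which Solr would silently ignore, consistent with "same results as without it").

```python
from urllib.parse import urlencode

params = {
    "qt": "dismax",          # select the dismax handler (qt=, not dt=)
    "q": "elvis",
    "qf": "title^2.0 body",  # fields that q is matched against (hypothetical names)
    "bf": "importancerank",  # boost function: field value added into the score
    "fl": "id,title,score",  # fl = field list: stored fields to return
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```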
Re: solr + carrot2
Any updates on this? It certainly would be quite interesting to see how well carrot2 clustering can be integrated with Solr; I suppose it's a fairly similar concept to simple faceting (maybe another candidate for a SOLR-281 component?). One concern I have is that the additional processing required at query time would make the whole operation significantly slower (which is something I'd like to avoid). I've been wondering if it might be possible to calculate (and store) clustering information at index time; however, since carrot2 seems to use the query term & result set to create clustering info, this doesn't appear to be a practical approach.

In a similar vein, I'm also looking at methods of term extraction and automatic keyword generation from indexed documents. I've been experimenting with MoreLikeThis and values returned by the "mlt.interestingTerms" parameter, which has potential but needs a bit of refinement before it can be truly useful. Has anybody else discovered clever or useful methods of term extraction using Solr?

Piete

On 02/08/07, Burkamp, Christian <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> In my opinion the results from carrot2 clustering could be used in the
> same way that facet results are used. That's the way I'm planning to use them.
> The user of the search application can narrow the search by selecting one
> of the facets presented in the search result presentation. These facets
> could come from metadata (classic facets) or from dynamically computed
> categories which are results from carrot2.
>
> From this point of view it would be most convenient to have the
> integration for carrot2 directly in the StandardRequestHandler. This leaves
> open questions like "how should filters for categories from carrot2 be
> formulated".
>
> Is anybody already using carrot2 with solr?
>
> -- Christian
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED]] On Behalf Of
> Stanislaw Osinski
> Sent: Wednesday, 1.
August 2007 14:01
> To: solr-user@lucene.apache.org
> Subject: Re: solr + carrot2
>
> > > Has anyone looked into using carrot2 clustering with solr?
> > >
> > > I know this is integrated with nutch:
> > > http://lucene.apache.org/nutch/apidocs/org/apache/nutch/clustering/carrot2/Clusterer.html
> > >
> > > It looks like carrot has support to read results from a solr index:
> > > http://demo.carrot2.org/head/api/org/carrot2/input/solr/package-summary.html
> > >
> > > But I'm hoping for something that returns clustered results from solr.
> > >
> > > Carrot also has something to read lucene indexes:
> > > http://demo.carrot2.org/head/api/org/carrot2/input/lucene/package-summary.html
> > >
> > > Any pointers or experience before I (may) delve into this?
> >
> > First of all, apologies for a delayed response. I'm one of the Carrot2
> > developers and indeed we did some Solr integration, but from Carrot2's
> > perspective, which I guess will not be directly useful in this case. If you
> > have any ideas for integration, questions or requests for changes/patches,
> > feel free to post on the Carrot2 mailing list or file an issue for us.
> >
> > Thanks,
> > Staszek
synchronizing slave indexes in distributing collections
Hi there, we want to use Solr's Collection Distribution. Here's a question regarding recovery from failures of the scripts. To my understanding:

* If snappuller fails on a slave, we could possibly implement something like: the master examines the status messages from all slaves and notifies all slaves to execute snapinstaller only if all statuses are success.
* However, if snapinstaller then fails on a slave, there is really no simple operation to roll back so that all slaves still keep the same old index. Besides, there is usually some hardware, network, or simply Solr problem causing snapinstaller to fail, and that problem may prevent any rollback operation from executing, even if such an operation existed.

It seems possible to implement a 2-phase-commit-like protocol to provide automatic recovery and keep all slave indexes consistent at all times. However, for one, I don't see a rollback operation for snapinstaller; for another, this would definitely complicate the system. So it looks like all we can do is monitor the logs and alert people to fix the issue and rerun the scripts, etc., whenever failures occur. Is that the correct understanding? Thanks, -Hui
Re: how to retrieve all the documents in an index?
: Any of you know whether the new "*:*" query performs better than the
: get-around solutions like using a range query? I would guess so, but I
: haven't looked into the Lucene implementation.

It's faster -- it has almost no work to do relative to the range query version. -Hoss
Re: synchronizing slave indexes in distributing collections
: So looks like all we can do is monitor the logs and alert people to
: fix the issue and rerun the scripts, etc. whenever failures occur. Is that
: the correct understanding?

I have *never* seen snappuller or snapinstaller fail (except during an initial rollout of Solr when I forgot to set up the necessary ssh keys). I suppose we could add an option to snapinstaller to support explicitly installing a snapshot by name ... then if you detect that slave Z didn't load the latest snapshot, you could always tell the other slaves to snapinstall whatever older version slave Z is still using -- but frankly that seems a little silly -- not to mention that if you couldn't load the snapshot into Z, odds are Z isn't responding to queries either.

A better course of action might just be to have an automated system which monitors the distribution status info on the master, and takes any slaves that don't update properly out of your load balancer's rotation (and notifies people to look into it). -Hoss
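The monitoring approach Hoss describes might be sketched like this: compare each slave's installed snapshot version against the master's and flag laggards for removal from rotation. The function name and the version-map shape are invented for illustration; actually fetching the distribution status from the master is left out.

```python
def stale_slaves(master_version, slave_versions):
    """Return the slaves whose installed snapshot lags the master's,
    so they can be pulled from the load balancer rotation and reported."""
    return sorted(name for name, v in slave_versions.items() if v < master_version)
```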
Re: solr + carrot2
Pieter Berkel wrote:
> In a similar vein, I'm also looking at methods of term extraction and
> automatic keyword generation from indexed documents. I've been experimenting
> with MoreLikeThis and values returned by the "mlt.interestingTerms"
> parameter, which has potential but needs a bit of refinement before it can
> be truly useful. Has anybody else discovered clever or useful methods of
> term extraction using solr?

I've been using mlt.interestingTerms and it works well. For example: fetching the mlt.interestingTerms for each of the top 20 search results, combining the scores, and using the top words to suggest additional search terms. alf.
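The score-combining step described above might look like the sketch below. The per-result term lists would come from separate MoreLikeThis requests (with mlt.interestingTerms=details, which returns terms with scores); the function name and sample data here are fabricated for illustration.

```python
from collections import defaultdict

def suggest_terms(per_result_terms, top_n=5):
    """Sum interestingTerms scores across the top search results and
    return the highest-scoring terms as suggested additional search terms."""
    totals = defaultdict(float)
    for terms in per_result_terms:
        for term, score in terms:
            totals[term] += score
    return [t for t, _ in sorted(totals.items(), key=lambda kv: -kv[1])[:top_n]]
```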