Re: return matched terms / fuzzy or wildcard searches
My Solr server: http://www.captionsearch.de/solr.html

Every time you make a new search you get the last response file here:
http://www.captionsearch.de/response.xml

2007/3/24, Chris Hostetter <[EMAIL PROTECTED]>:

: > Perhaps our use of ConstantScorePrefixQuery by default?
:
: Ah, that would probably explain it! I had stumbled on this before
: too and went to fix it and saw the rewrite in there and was
: perplexed, but then got distracted by something shiny.

yeah, that makes sense ... a true wildcard query works fine...

http://localhost:8983/solr/select/?q=id:V???B*&fl=id&hl=true&hl.fl=id

To answer your question, Krystian: it's supposed to work for you. For fuzzy queries (like dna~0.7) and wildcard queries (like d?a) it should currently be working fine ... please send us an example Solr URL that doesn't work if that's not what you are observing.

Only a simple prefix query (like dn*) doesn't work ... and that seems to be because of the way we optimize a PrefixQuery into a ConstantScorePrefixQuery. A workaround is to always include a "?" in your query when you want highlighting -- so instead of dn* search for dn?*

-Hoss
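A minimal illustration of the workaround Hoss describes, using a hypothetical field named "text" (the field name and terms are made up for the example). A bare prefix query such as

  http://localhost:8983/solr/select/?q=text:dn*&hl=true&hl.fl=text

loses highlighting because it is optimized into a ConstantScorePrefixQuery, while adding a "?"

  http://localhost:8983/solr/select/?q=text:dn?*&hl=true&hl.fl=text

keeps it a regular wildcard query, which highlights as expected.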
schema field type doesn't work
Hi everybody,

I added the following field type in schema.xml:

I want to index two types of strings, for example:

  12345678
  1234-5678

No matter which of the above strings is stored, I'd like to match it by using either 12345678 or 1234-5678. Everything is working fine, except for the case when 12345678 is stored and I try to match it using 1234-5678.

I must be doing something wrong, maybe in the schema. Does anyone have any suggestions? Any help would be greatly appreciated.
Re: schema field type doesn't work
On 3/24/07, Dimitar Ouzounov <[EMAIL PROTECTED]> wrote:
> ...I must be doing something wrong, maybe in the schema. Does anyone
> have any suggestions?...

The best way to debug such problems is with the analyzer admin tool:
http://localhost:8983/solr/admin/analysis.jsp

You can try various combinations of analyzers and see what Solr actually indexes for various values.

HTH,
-Bertrand
Re: schema field type doesn't work
On 3/24/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> On 3/24/07, Dimitar Ouzounov <[EMAIL PROTECTED]> wrote:
> > ...I must be doing something wrong, maybe in the schema. Does anyone
> > have any suggestions?...
>
> The best way to debug such problems is with the analyzer admin tool:
> http://localhost:8983/solr/admin/analysis.jsp

Yep... trying the analysis page, one can see that parts of the numbers (not just the catenation) are also still being generated, messing up the query.

So if 123-456 is indexed, and you also want to be able to match parts of that number (like 123), then you need a query analyzer and an index analyzer for the field type, and turn off generation of parts for the query analyzer only.

If you don't want to match parts, then a single analyzer for both query and indexing will do, but explicitly turn off part generation:

-Yonik
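As a rough sketch of the first option Yonik describes (separate analyzers, with part generation turned off only at query time), a field type along these lines could go in schema.xml. The type name and the exact filter settings are assumptions for the example, not the poster's actual configuration, which was stripped from the archive:

  <fieldtype name="numbermatch" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- index both the parts (1234, 5678) and the catenated form (12345678) -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateNumberParts="1" catenateNumbers="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- at query time produce only the catenated form, so 1234-5678 becomes 12345678 -->
      <filter class="solr.WordDelimiterFilterFactory"
              generateNumberParts="0" catenateNumbers="1"/>
    </analyzer>
  </fieldtype>

With a setup like this, indexing 1234-5678 produces the tokens 1234, 5678 and 12345678, while a query for either 1234-5678 or 12345678 reduces to the single token 12345678, so both forms match. The analysis.jsp page Bertrand mentioned is a good way to verify the actual token streams.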
Re: schema field type doesn't work
Thanks a lot! The analyzer admin tool is indeed useful.
Re: sorting question
True, but let me ask the question in a different way. The problem is that when I run the query and order by date, the most recent results are not relevant enough (in general I find I need to do work on top of what Solr provides in order to get good relevancy). So I guess I'm looking more for a threshold, to retrieve only results above a certain score, and I need this threshold to be adaptive. I.e., it's not about the number of results to retrieve, since I want as many as possible so I have a better chance of getting the most recent ones, but more about getting all the results that are relevant enough. When I display results sorted by score this is not a problem, because all those results hide on page number X (X is big). I can think of several hacks (e.g. calculating the distribution of results myself) to do this, but was wondering if there is a proper solution.

Thx

On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: Is there a way (in 1 query) to retrieve the best scoring X results and
: then sort them by another field (date for example)?

not at the moment.

keep in mind, this is the type of thing that can be done easily on the client side -- pull back the top X results sorted by score, then sort by date.

-Hoss
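A rough sketch of the client-side approach Hoss suggests, assuming a field named "date" and a page size of 100 (both made up for the example): ask Solr for the top results by relevance (the default sort), returning the date field,

  http://localhost:8983/solr/select/?q=your+query&fl=id,date,score&rows=100

and then re-order those 100 documents by date in the client before displaying them. The adaptive score threshold the poster asks about would likewise have to be applied client-side, e.g. by inspecting the returned score values and dropping documents below the cutoff.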
Re: Backup and distributed index/backup management
Reposting :)

Hi:

I am a novice to Solr in terms of backup/operations. We have a single instance of the master (Solr) working well; I tried the backup scripts etc. and could get things working fine.

My question is: even with backup, Solr will still have a single index, right? We will have a huge amount of data in the index - it is ever increasing.

I want to archive older data - say every 2 weeks - and start a new index, but I want the older indices to be searchable. I can potentially take a snapshot at the master at a 2-week interval, back up, and restart the master with a fresh index.

On the slaves, where the actual searches happen, how do I deal with things - won't there be multiple indices there then? Does Solr handle this - how? Or how do I solve this problem? Open to other suggestions too.

Best Regards
-al
Re: Backup and distributed index/backup management
: My question is: even with backup, Solr will still have a single index,
: right? We will have a huge amount of data in the index - it is ever increasing.

if you have older docs you want to retire out of your index, you'll need to do that manually (delete by query can come in handy)

: I want to archive older data - say every 2 weeks - and start a new index,
: but I want the older indices to be searchable.
:
: I can potentially take a snapshot at the master at a 2-week interval, back
: up, and restart the master with a fresh index.

you don't really need to restart the master ... you could pull snapshots from your master to a slave, and then when you decide that slave is "full" of old docs you stop pulling snapshots, delete the old docs from your master, and start replicating to a new slave.

: Does Solr handle this - how? Or how do I solve this problem? Open to other
: suggestions too.

what you're describing is fairly outside of what I would consider "normal" Solr usage ... it seems very special purpose.

-Hoss
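A minimal sketch of the "delete by query" step Hoss mentions, assuming a date field named "timestamp" and a two-week cutoff (both the field name and the date-math expression are assumptions for the example). This XML message would be POSTed to the master's /update handler, followed by a <commit/>:

  <delete>
    <query>timestamp:[* TO NOW-14DAYS]</query>
  </delete>

After the commit (and optionally an <optimize/>), subsequent snapshots of the shrunken index are what get replicated to the new slave.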
Re: Using cocoon to update index
: Is anyone using cocoon to index data? I'm trying to do this via cincludes
: but I have had no luck. If you are using cocoon, and are POSTing data to
: solr via a pipeline, would you share an example of how you have things
: working.

you may want to take a look at the Forrest plugin Thorsten wrote, or the Cocoon/Solr/Subversion presentation Bertrand gave at the Cocoon 2006 GetTogether...

http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/
http://wiki.apache.org/solr/SolrResources
http://wiki.apache.org/cocoon-data/attachments/GT2006Notes/attachments/13-SubversionSolr.pdf

-Hoss
Re: return matched terms / fuzzy or wildcard searches
On 3/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Only a simple prefix query (like dn*) doesn't work ... and that seems to
> be because of the way we optimize a PrefixQuery into a
> ConstantScorePrefixQuery. A workaround is to always include a "?" in your
> query when you want highlighting -- so instead of dn* search for dn?*

Note that you need a recent nightly build for that to work -- it wasn't there for the last release.

-Mike
RE: Using cocoon to update index
I've blogged a method of doing this using Cocoon's webdav transformer:
http://www.wallandbinkley.com/quaedam/?p=104

Peter

From: Winona Salesky [mailto:[EMAIL PROTECTED]]
Sent: Fri 3/23/2007 12:14 PM
To: solr-user@lucene.apache.org
Subject: Using cocoon to update index

Hi,

Is anyone using cocoon to index data? I'm trying to do this via cincludes but I have had no luck. If you are using cocoon, and are POSTing data to solr via a pipeline, would you share an example of how you have things working.

Thanks for the help,
-Winona

-
Winona Salesky
The University of Vermont Libraries
[EMAIL PROTECTED]