Question about solrPlugins
I use a custom Analyzer, and I have now configured a synonym filter, but it has no effect. My schema.xml contains: I think the query procedure is: first run myCustomerAnlyzer, then apply the synonym filter to the analyzed words; if an analyzed word is found in synonyms.txt, something should happen to it. Am I right? I am sure the analyzed word is in synonyms.txt (I used http://localhost:8080/solr1/admin/analysis.jsp?highlight=on to verify this). -- regards jl
Re: How to use Similarity
Does no one know? 2007/4/19, James liu <[EMAIL PROTECTED]>: I uncommented Similarity in schema.xml and started Tomcat. I used the admin GUI to test and found it has no effect. Maybe something is wrong; does anyone know? -- regards jl
Re: Facet.query
On Apr 19, 2007, at 10:41 PM, Ge, Yao ((Y.)) wrote: When multiple facet queries are specified, are they booleaned as OR or AND? Neither, if you're referring to &facet.query=... facet.query's are all appended to the response, like this (in Ruby response format): { 'responseHeader'=>{ 'status'=>0, 'QTime'=>105, 'params'=>{ 'wt'=>'ruby', 'rows'=>'0', 'facet.query'=>['ant', 'lucene'], 'facet'=>'on', 'indent'=>'on', 'q'=>'erik hatcher'}}, 'response'=>{'numFound'=>3,'start'=>0,'docs'=>[] }, 'facet_counts'=>{ 'facet_queries'=>{ 'ant'=>1, 'lucene'=>1}, 'facet_fields'=>{}}} The query was this: ?q=erik%20hatcher&facet=on&facet.query=ant&facet.query=lucene&wt=ruby&indent=on&rows=0 on our library metadata which, pleasantly, has copies of both the Ant book (yes, I'm looking into that JUnit issue, Ryan and Yonik :) and the Lucene book. If you mean the filter queries, &fq=... then those are logically ANDed when multiple are present. Erik
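To illustrate the request shape Erik describes, here is a small Python sketch that builds the same URL with repeated facet.query parameters (the host and port are placeholders, not from the thread):

```python
from urllib.parse import urlencode

# Build the multi-facet.query request shown above. Passing a list
# with doseq=True turns each value into its own facet.query parameter.
params = {
    "q": "erik hatcher",
    "facet": "on",
    "facet.query": ["ant", "lucene"],  # repeated, not ANDed/ORed together
    "wt": "ruby",
    "indent": "on",
    "rows": 0,
}
query_string = urlencode(params, doseq=True)
url = "http://localhost:8983/solr/select?" + query_string
```

Each facet.query then comes back as its own entry under facet_queries in the response, exactly as in Erik's Ruby output.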
Avoiding caching of special filter queries
Hi, I'm using filter queries to implement document level security with Solr. The caching mechanism for filters separate from queries comes in handy, and the system performs well once all the filters for the users of the system are stored in the cache. However, I'm storing full document content in the index for the purpose of highlighting. In addition to the standard snippet highlighting, I would like to offer a feature that displays the highlighted full document content. I can add a filter query to select just the needed document by ID, but this filter would go into the filter cache as well, possibly throwing out some of the other useful filters. Is there a way to get the single document with highlighting info but without polluting the filter cache? -- Christian
AW: Avoiding caching of special filter queries
Hi Erik, No, what I need to do is &q="my funny query"&fq=user:erik&fq=id:"doc Id"&hl=on ... This is because the StandardRequestHandler needs the original query to do proper highlighting. The user gets his paginated result page with his next 10 hits. He can then select one document for highlighting. Then I just repeat the last request with an additional filter query to select this one document and add the highlighting parameters. -- Christian -----Original Message----- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Friday, 20 April 2007 15:43 To: solr-user@lucene.apache.org Subject: Re: Avoiding caching of special filter queries On Apr 20, 2007, at 7:11 AM, Burkamp, Christian wrote: > I'm using filter queries to implement document level security with > Solr. > The caching mechanism for filters separate from queries comes in handy > and the system performs well once all the filters for the users of the > system are stored in the cache. > However, I'm storing full document content in the index for the > purpose > of highlighting. In addition to the standard snippet highlighting I > would like to offer a feature that displays the highlighted full > document content. I can add a filter query to select just the needed > document by ID but this filter would go into the filter cache as well, > possibly throwing out some of the other useful filters. > Is there a way to get the single document with highlighting info but > without polluting the filter cache? Correct me if I'm wrong, but here's my understanding... &q=id:"doc id"&fq=user:erik is what you'd want to do. q=id:"doc" won't go into the filter cache, but rather the query cache, and the document itself into the document cache. So you won't risk bumping things out of the filter cache by using queries. Erik
Re: AW: Leading wildcards
thanks, this worked like a charm !! we built a custom "QueryParser" and we integrated the *foo** in it, so basically we can now search leading, trailing and both ... the only crappy thing is the max Boolean clauses, but I'm going to look into that after the weekend. For the next release of Solr: do not make this the default, too many risks, but do make an option in the config to enable it; it's a very nice feature. thanks everybody for the help and have a nice weekend, maarten "Burkamp, Christian" <[EMAIL PROTECTED]> 19/04/2007 12:37 Please respond to solr-user@lucene.apache.org To cc Subject AW: Leading wildcards Hi there, Solr does not support leading wildcards, because it uses Lucene's standard QueryParser class without changing the defaults. You can easily change this by inserting the line parser.setAllowLeadingWildcards(true); in QueryParsing.java at line 92 (this is after creating a QueryParser instance in QueryParsing.parseQuery(...)), which obviously means that you have to change Solr's source code. It would be nice to have an option in the schema to switch leading wildcards on or off per field. Leading wildcards really make no sense on richly populated fields, because queries tend to result in TooManyClauses exceptions most of the time. This works for leading wildcards. Unfortunately it does not enable searches with leading AND trailing wildcards. (E.g. searching for "*lega*" does not find results even if the term "elegance" is in the index. If you put a second asterisk at the end, the term "elegance" is found; search for "*lega**" to get hits.) Can anybody explain this, though it seems to be more of a Lucene QueryParser issue? -- Christian -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, 19 April 2007 08:35 To: solr-user@lucene.apache.org Subject: Leading wildcards hi, we have been trying to get the leading wildcards to work.
we have been looking around the Solr website, the Lucene website, wikis and the mailing lists etc ... but we found a lot of contradictory information. so we have a few questions: - is the latest version of Lucene capable of handling leading wildcards? - is the latest version of Solr capable of handling leading wildcards? - do we need to make adjustments to the Solr source code? - if we need to adjust the Solr source, what do we need to change? thanks in advance ! Maarten
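Beyond patching QueryParsing.java, a common workaround (not mentioned in this thread, so treat it as an assumption about your schema) is to index each term reversed into a parallel field, and rewrite a leading-wildcard term into a trailing-wildcard one against that field. A minimal sketch, where the field names "text" and "text_rev" are hypothetical:

```python
# Workaround sketch: "elegance" would also be indexed as "ecnagele" in a
# reversed field, so the leading-wildcard term "*lega" becomes the cheap
# trailing-wildcard term "agel*" against that field.
def rewrite_leading_wildcard(term, field="text", rev_field="text_rev"):
    if term.startswith("*") and not term.endswith("*"):
        # reverse the remainder and append a trailing wildcard
        return f"{rev_field}:{term[1:][::-1]}*"
    # trailing-only wildcards (and double-ended ones, which still need
    # leading-wildcard support) pass through unchanged
    return f"{field}:{term}"
```

This keeps leading-wildcard searches from enumerating the whole term dictionary, at the cost of indexing each term twice.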
Re: AW: Leading wildcards
Maarten: Would you mind sharing your custom query parser? On 4/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: thanks, this worked like a charm !! we built a custom "QueryParser" and we integrated the *foo** in it, so basically we can now search leading, trailing and both ... the only crappy thing is the max Boolean clauses, but I'm going to look into that after the weekend. For the next release of Solr: do not make this the default, too many risks, but do make an option in the config to enable it; it's a very nice feature. thanks everybody for the help and have a nice weekend, maarten [...] -- Michael Kimsal http://webdevradio.com
Re: Avoiding caching of special filter queries
On Apr 20, 2007, at 7:11 AM, Burkamp, Christian wrote: I'm using filter queries to implement document level security with Solr. The caching mechanism for filters separate from queries comes in handy and the system performs well once all the filters for the users of the system are stored in the cache. However, I'm storing full document content in the index for the purpose of highlighting. In addition to the standard snippet highlighting I would like to offer a feature that displays the highlighted full document content. I can add a filter query to select just the needed document by ID but this filter would go into the filter cache as well, possibly throwing out some of the other useful filters. Is there a way to get the single document with highlighting info but without polluting the filter cache? Correct me if I'm wrong, but here's my understanding... &q=id:"doc id"&fq=user:erik is what you'd want to do. q=id:"doc" won't go into the filter cache, but rather the query cache, and the document itself into the document cache. So you won't risk bumping things out of the filter cache by using queries. Erik
Re: Multiple Solr Cores
Updated (forgot the patch for Servlet). http://www.nabble.com/file/7996/solr-trunk-src.patch solr-trunk-src.patch The change should still be compatible with the trunk it is based upon. Henrib wrote: > > Following up on a previous thread in the Solr-User list, here is a patch > that allows managing multiple cores in the same VM (thus multiple > config/schemas/indexes). > The SolrCore.core singleton has been changed to a Map; > the current singleton behavior is keyed as 'null'. (Which is used by > SolrInfoRegistry). > All static references to either a Config or a SolrCore have been removed; > this implies that some classes now do refer to either a SolrCore or a > SolrConfig (some ctors have been modified accordingly). > > I haven't tried to modify anything above the 'jar' (script, admin & > servlet are unaware of the multi-core part). > > The 2 patches files are the src/ & the test/ patches. > http://www.nabble.com/file/7971/solr-test.patch solr-test.patch > http://www.nabble.com/file/7972/solr-src.patch solr-src.patch > > This being my first attempt at a contribution, I will humbly welcome any > comment. > Regards, > Henri > -- View this message in context: http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10106126 Sent from the Solr - User mailing list archive at Nabble.com.
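The heart of the change Henri describes, turning the SolrCore singleton into a map keyed by core name with the old singleton behavior under the null key, can be sketched in Python (class and method names here are illustrative, not the patch's actual API):

```python
class SolrCore:
    """Sketch of a keyed core registry; the former singleton lives under None."""

    _cores = {}  # was a single static SolrCore.core reference

    def __init__(self, name=None, config=None, schema=None):
        # each core carries its own config/schema instead of reading statics
        self.name, self.config, self.schema = name, config, schema
        SolrCore._cores[name] = self  # register under its key

    @classmethod
    def get_core(cls, name=None):
        # name=None preserves the old single-core behavior
        return cls._cores[name]
```

The consequence Henri notes follows directly: code that used to reach for a static Config or SolrCore must now be handed the specific core (or its config) it should operate on.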
sorting by matched field, then title alpha
Hi. I have some result ordering requirements which I have not been able to solve by searching the docs and forums so far. Perhaps what I am trying does not belong in Solr. Can anyone offer any hint or suggestion before I disappear into a vortex of terror? I need to conditionally order results of phrase searches like this: * First show all docs with phrase in field A - regardless of other occurrences of phrase in doc - ordered alphabetically by field X * Next show all docs with phrase in field B - ... (as above) * Then same again for fields C, D and E * Then show remaining matching docs as per default scoring Perhaps I will have to do this clientside (i.e. do multiple searches and concatenate results), but I'm hoping there is some way I can do this in a single search. Thanks, Simon
Re: Solr performance warnings
Hey Erik, thanks for the fast reply. Yes, this could be possible. I currently have Solr running to index a forum with 100k users. It could definitely be possible that two commits overlap. But I need to commit all changes because the new posts must be available in the search as soon as they are posted. Do you think there is a way to optimize this? Cheers, Michael Erik Hatcher wrote: On Apr 19, 2007, at 7:47 PM, Michael Thessel wrote: in my logs I get from time to time this message: INFO: PERFORMANCE WARNING: Overlapping onDeckSearchers=2 What does this mean? What can I do to avoid this? I think you have issued multiple commits (or optimizes) that hadn't fully finished. Is that the case? Erik
Re: [acts_as_solr] Few question on usage
Hi Erik, Please find my comments under ">>>" to your queries. > : 1. What other alternatives are available for Ruby integration > with Solr > : other than the acts_as_solr plugin. acts_as_solr is purely for ActiveRecord (database O/R mapping) integration with Solr, such that when you create/update/delete records they get taken care of in Solr also. For pure Ruby access to Solr without a database, use solr-ruby. The 0.01 gem is available as "gem install solr-ruby", but if you can I'd recommend you tinker with the trunk codebase too. >>> Well, say I am considering using Solr with a Rails application. What's the ideal approach? > : 2. acts_as_solr plugin - does it support the highlighting feature This depends on which acts_as_solr you've grabbed. As Hoss mentioned, there are various flavors of it floating around. I've promised to speak about acts_as_solr at RailsConf next month, so I'll be working to get that under control even if that means resurrecting my initial hack and making it part of solr-ruby and hoping that the other implementations floating out there would like to collaborate on a definitive version built into the Solr codebase. >>> Since there are many flavors floating around, which is the most sought after and supported? And I agree that a definitive version will help the RoR community accept Solr with a much greater level of confidence. And since RoR applications are addressing web 2.0, the need to search and collaborate on information is much higher. So I personally believe addressing this will definitely go a long way. > : 3. performance benchmarks for the acts_as_solr plugin, if any are available What kind of numbers are you after? acts_as_solr searches Solr, and then will fetch the records from the database to bring back model objects, so you have to account for the database access in the picture as well as Solr. >>> Well, to be specific, I am keen to know about creation and update of indexes when you run into large numbers of documents.
Since the database is used to populate the models, it will definitely be the cumulative effect of retrieval of documents from Solr with Lucene, network issues (since it's a web service), and locally on the database (depends on configuration). -TIA Erik Hatcher wrote: > > Sorry, I missed the original mail. Hoss has got it right. > > Personally I'd love to see acts_as_solr definitively come into the > solr-ruby fold. > > Regarding your questions: > >> : 1. What other alternatives are available for Ruby integration >> with Solr >> : other than the acts_as_solr plugin. > > acts_as_solr is purely for ActiveRecord (database O/R mapping) > integration with Solr, such that when you create/update/delete > records they get taken care of in Solr also. > > For pure Ruby access to Solr without a database, use solr-ruby. The > 0.01 gem is available as "gem install solr-ruby", but if you can I'd > recommend you tinker with the trunk codebase too. > >> : 2. acts_as_solr plugin - does it support the highlighting feature > > This depends on which acts_as_solr you've grabbed. As Hoss > mentioned, there are various flavors of it floating around. I've > promised to speak about acts_as_solr at RailsConf next month, so I'll > be working to get that under control even if that means resurrecting > my initial hack and making it part of solr-ruby and hoping that the > other implementations floating out there would like to collaborate on > a definitive version built into the Solr codebase. > >> : 3. performance benchmarks for the acts_as_solr plugin, if any are available > > What kind of numbers are you after? acts_as_solr searches Solr, and > then will fetch the records from the database to bring back model > objects, so you have to account for the database access in the > picture as well as Solr.
> > Erik > > > > On Apr 19, 2007, at 5:30 PM, Chris Hostetter wrote: > >> >> I don't really know a lot about Ruby, but as I understand it there >> are more >> than a few versions of something called "acts_as_solr" floating >> around >> ... the first written by Erik as a proof of concept, and then >> picked up and >> polished a bit by someone else (whose name escapes me) >> >> all of the "serious" ruby/solr development I know about is >> happening as >> part of the "Flare" sub-sub project... >> >> http://wiki.apache.org/solr/Flare >> http://wiki.apache.org/solr/SolRuby >> >> ...most of the people working on it seem to hang out on the >> [EMAIL PROTECTED] mailing list. as I understand it the "solr-ruby" >> package >> is a low level ruby<->solr API, with Flare being a higher level >> reusable Rails app type thingamombob. (can you tell I don't know a >> lot >> about Ruby or Rails? ... I'm winging it) >> >> >> : Date: Tue, 17 Apr 2007 10:52:00 -0700 >> : From: amit rohatgi <[EMAIL PROTECTED]> >> : Reply-To: solr-user@lucene.apache.org >> : To: solr-user@lucene.apache.org >> : Subject: [acts_as_sol
Re: AW: Avoiding caching of special filter queries
On Apr 20, 2007, at 10:02 AM, Burkamp, Christian wrote: No, what I need to do is &q="my funny query"&fq=user:erik&fq=id:"doc Id"&hl=on ... No you don't. What you need is: &q="my funny query" AND id:"doc Id"&fq=user:erik&hl=on This is because the StandardRequestHandler needs the original query to do proper highlighting. The user gets his paginated result page with his next 10 hits. He can then select one document for highlighting. Then I just repeat the last request with an additional filter query to select this one document and add the highlighting parameters. I think the above will suit this use case just fine. No? Erik -- Christian -----Original Message----- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Friday, 20 April 2007 15:43 To: solr-user@lucene.apache.org Subject: Re: Avoiding caching of special filter queries On Apr 20, 2007, at 7:11 AM, Burkamp, Christian wrote: I'm using filter queries to implement document level security with Solr. The caching mechanism for filters separate from queries comes in handy and the system performs well once all the filters for the users of the system are stored in the cache. However, I'm storing full document content in the index for the purpose of highlighting. In addition to the standard snippet highlighting I would like to offer a feature that displays the highlighted full document content. I can add a filter query to select just the needed document by ID but this filter would go into the filter cache as well, possibly throwing out some of the other useful filters. Is there a way to get the single document with highlighting info but without polluting the filter cache? Correct me if I'm wrong, but here's my understanding... &q=id:"doc id"&fq=user:erik is what you'd want to do. q=id:"doc" won't go into the filter cache, but rather the query cache, and the document itself into the document cache. So you won't risk bumping things out of the filter cache by using queries. Erik
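Erik's corrected request can be assembled like this (a Python sketch; the user and document id values are placeholders from the thread):

```python
from urllib.parse import urlencode

user_query = '"my funny query"'
doc_id = "doc Id"
params = {
    # the single-document restriction rides along in q, so it lands in the
    # query cache instead of evicting entries from the filter cache,
    "q": f'{user_query} AND id:"{doc_id}"',
    # while the reusable per-user security filter stays in fq
    "fq": "user:erik",
    "hl": "on",
}
qs = urlencode(params)
```

Because q still contains the original phrase, the StandardRequestHandler has what it needs to highlight the full document correctly.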
Re: Avoiding caching of special filter queries
On 4/20/07, Burkamp, Christian <[EMAIL PROTECTED]> wrote: Hi Erik, No, what I need to do is &q="my funny query"&fq=user:erik&fq=id:"doc Id"&hl=on ... This is because the StandardRequestHandler needs the original query to do proper highlighting. The user gets his paginated result page with his next 10 hits. He can then select one document for highlighting. Then I just repeat the last request with an additional filter query to select this one document and add the highlighting parameters. Erik posted the way to do this that works with OOB Solr. If you want to do it with no additional querying (not even for the docid filter), you can use an approach like this (from a previous email): - turn on lazy field loading. For best effect, compress the main text field. - create a new request handler that is similar to dismax, but uses the query for highlighting only. A separate parameter allows the specification of document keys to highlight - highlighting requires the internal lucene document id, not the document key, and it can be slow to execute queries to get the ids. I created a custom cache that maps doc keys -> doc ids, populate it during the main query, and grab ids from the cache during the highlighting step. -Mike
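Mike's custom cache mapping document keys to internal Lucene doc ids could look roughly like this (a minimal sketch only; Solr's actual custom-cache registration API is not shown):

```python
class DocKeyIdCache:
    """Maps unique document keys to internal Lucene doc ids.

    Populated while hits stream by during the main query, so the later
    highlighting step can skip the id-lookup query entirely. Internal doc
    ids change when the index is reopened, so a real implementation must
    be scoped to one searcher/reader instance.
    """

    def __init__(self):
        self._map = {}

    def record(self, key, internal_id):
        self._map[key] = internal_id

    def lookup(self, key):
        # None means a miss: fall back to a term lookup on the key field
        return self._map.get(key)
```

Combined with lazy field loading and a compressed body field, this keeps the "highlight one full document" step from touching the filter cache at all.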
Re: Solr performance warnings
On 4/20/07, Michael Thessel <[EMAIL PROTECTED]> wrote: Hey Erik, thanks for the fast reply. Yes, this could be possible. I currently have Solr running to index a forum with 100k users. It could definitely be possible that two commits overlap. But I need to commit all changes because the new posts must be available in the search as soon as they are posted. Do you think there is a way to optimize this? "As soon as" is a rather vague requirement. If you can specify the minimum acceptable delay, then you can use Solr's autocommit functionality to trigger commits. -Mike
Re: sorting by matched field, then title alpha
On 4/20/07, Simon Kahl <[EMAIL PROTECTED]> wrote: I need to conditionally order results of phrase searches like this: * First show all docs with phrase in field A - regardless of other occurrences of phrase in doc - ordered alphabetically by field X * Next show all docs with phrase in field B - ... (as above) * Then same again for fields C, D and E * Then show remaining matching docs as per default scoring Perhaps I will have to do this clientside (ie. do multiple searches and concatenate results), but I'm hoping there is some way I can do this in a single search. You can approximate it by doing something like: A:"phrase"^1000 B:"phrase"^100 C:"phrase"^30 D:"phrase"^10 E:"phrase"^1 HTH, -Mike
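Mike's boosting trick can be generated programmatically. A sketch (the field names and boost values are illustrative, and note this only approximates strict tiering, since Lucene scoring still mixes in factors like length normalization):

```python
def field_priority_query(phrase, field_boosts=None):
    """Build a boosted disjunction approximating 'all A matches before B'."""
    # Descending boosts per priority tier; tune per index (illustrative values).
    field_boosts = field_boosts or [
        ("A", 1000), ("B", 100), ("C", 30), ("D", 10), ("E", 1),
    ]
    return " ".join(f'{field}:"{phrase}"^{boost}' for field, boost in field_boosts)
```

The within-tier "alphabetical by field X" requirement cannot be expressed this way, which is why a clientside concatenation of several sorted searches may still be needed for an exact ordering.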
Re: Solr performance warnings
Mike Klaas wrote: On 4/20/07, Michael Thessel <[EMAIL PROTECTED]> wrote: Hey Erik, thanks for the fast reply. Yes, this could be possible. I currently have Solr running to index a forum with 100k users. It could definitely be possible that two commits overlap. But I need to commit all changes because the new posts must be available in the search as soon as they are posted. Do you think there is a way to optimize this? "As soon as" is a rather vague requirement. If you can specify the minimum acceptable delay, then you can use Solr's autocommit functionality to trigger commits. -Mike I didn't know about the timed commits. That's perfect for me. Thanks, Michael
Re: Solr performance warnings
Michael Thessel wrote: Mike Klaas wrote: On 4/20/07, Michael Thessel <[EMAIL PROTECTED]> wrote: Hey Erik, thanks for the fast reply. Yes, this could be possible. I currently have Solr running to index a forum with 100k users. It could definitely be possible that two commits overlap. But I need to commit all changes because the new posts must be available in the search as soon as they are posted. Do you think there is a way to optimize this? "As soon as" is a rather vague requirement. If you can specify the minimum acceptable delay, then you can use Solr's autocommit functionality to trigger commits. -Mike I didn't know about the timed commits. That's perfect for me. Thanks, Michael The timed commits don't work for me. The webinterface says 0 commits since the server was restarted, and nothing in the logs as well. I use: apache-solr-1.1.0-incubating My updateHandler section from solrconfig.xml: 1 I also tried 10 in case it's seconds and not ms. Cheers, Michael
Re: Solr performance warnings
On 4/20/07, Michael Thessel <[EMAIL PROTECTED]> wrote: The timed commits don't work for me. The webinterface says 0 commits since the server was restarted. And nothing in the logs as well. I use: apache-solr-1.1.0-incubating Sorry about that--I see that that change was added a few weeks after 1.1 was cut. I suggest using a nightly build from Feb 2 or later, or waiting until 1.2 is released. cheers, -Mike
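For reference, once on a post-1.1 build, an autocommit section in solrconfig.xml looks roughly like this (a sketch based on later Solr releases; which elements are available depends on the exact build, so treat the names and values as assumptions to verify against your version's example config):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- sketch: commit automatically once either threshold is reached;
       maxTime is in milliseconds, not seconds -->
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With thresholds like these, new forum posts become searchable within a bounded delay without issuing explicit overlapping commits from the client.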
Re: Question about solrPlugins
On 4/20/07, James liu <[EMAIL PROTECTED]> wrote: I use a custom Analyzer, and I have now configured a synonym filter, but it has no effect. My schema.xml contains: I think the query procedure is: first run myCustomerAnlyzer, then apply the synonym filter to the analyzed words; if an analyzed word is found in synonyms.txt, something should happen to it. You can't chain analyzers. You can specify an analyzer via a tokenizer followed by several token filters, though. Follow the samples in the example schema.xml -Yonik
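Yonik's point, one tokenizer followed by token filters rather than chained analyzers, looks like this in a schema (a sketch modeled on the example schema.xml; the field type name is made up and the synonym filter attributes are the commonly documented ones):

```xml
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <!-- exactly one tokenizer per analyzer... -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- ...then any number of token filters, applied in order -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If you need custom analysis plus synonyms, the custom logic has to be expressed as a tokenizer or filter factory in this chain, not as a separate wrapping Analyzer.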
Re: How to use Similarity
On 4/19/07, James liu <[EMAIL PROTECTED]> wrote: I uncommented Similarity in schema.xml and started Tomcat. I used the admin GUI to test and found it has no effect. If no similarity is specified, Lucene's DefaultSimilarity is used. The Similarity you uncommented in the example schema.xml specifies Lucene's DefaultSimilarity, hence there should be no difference. -Yonik
Re: Snapshooting or replicating recently indexed data
Hi Yonik, Thanks for your quick response. My question is this: can we take incremental backups/replication in Solr? Regards, Doss. M. MOHANDOSS Software Engineer Ext: 507 (A BharatMatrimony Enterprise) ----- Original Message ----- From: "Yonik Seeley" <[EMAIL PROTECTED]> To: Sent: Thursday, April 19, 2007 7:42 PM Subject: Re: Snapshooting or replicating recently indexed data On 4/19/07, Doss <[EMAIL PROTECTED]> wrote: It seems the snapshooter takes an exact copy of the indexed data, that is, all the contents inside the index directory. How can we take only the recently added ones? ... cp -lr ${data_dir}/index ${temp} mv ${temp} ${name} ... I don't quite understand your question, but since hard links are used, it's more like pointing to the index files instead of copying them. Rsync is used as a transport to only move the files that were changed from the master to slaves. -Yonik
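Yonik's point about hard links can be demonstrated directly: a Python sketch mimicking, per file, what snapshooter's `cp -lr` does (paths and the file name are illustrative, created in temp directories):

```python
import os
import tempfile

# Simulate snapshooter: "copy" an index file via a hard link.
index_dir = tempfile.mkdtemp(prefix="index_")
snap_dir = tempfile.mkdtemp(prefix="snapshot_")

seg = os.path.join(index_dir, "segment_1")
with open(seg, "w") as f:
    f.write("postings data")

linked = os.path.join(snap_dir, "segment_1")
os.link(seg, linked)  # new directory entry, same inode: no bytes copied
```

Since Lucene index files are write-once, the snapshot stays consistent while costing almost no disk space, and rsync then ships only files that are new since the last snapshot, which is effectively the incremental behavior being asked about.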