Ranking Question.
Hi, Maybe a trivial/stupid question, but: I have a fairly simple schema with a title, tags, and description. I have my own ranking/scoring system that takes into account the similarity of each tag to a term in the query, but now that I want to also include the title and description (the description is somewhere between short and moderate length), I am not sure how to handle this. For example, would parsing the description and title before indexing in Solr and adding them as tags make sense? It sounds like that would replicate the stop-word, stemming, etc. mechanisms built into Lucene. My goal in the end is to change as little as possible in the retrieval process, but then be able to rank based on the keywords extracted from the entire document. Any ideas / directions? Thanks, Shai
Re: Solr on Tomcat 6.0.10?
I'm using 6.0.9 with no issues (fingers crossed). Walter Underwood wrote: Is anyone running Solr on Tomcat 6.0.10? Any issues? I searched the archives and didn't see anything. wunder -- Galo Navarro, Developer [EMAIL PROTECTED] t. +44 (0)20 7780 7080 Last.fm | http://www.last.fm Karen House 1-11 Baches Street London N1 6DL http://www.last.fm/user/galeote
Re: Solr on Tomcat 6.0.10?
Today I use Tomcat 6.0.10, but have had no time to test search; tomorrow I will test it. Which Java version do you use? 2007/3/8, Walter Underwood <[EMAIL PROTECTED]>: Is anyone running Solr on Tomcat 6.0.10? Any issues? I searched the archives and didn't see anything. wunder -- Walter Underwood Search Guru, Netflix -- regards jl
Re: [2] SQL Update
I could create a list of field name + type, but doing so I might as well create it and add it to the fields in schema.xml. Does Solr reread the schema file when I post an add action, or only on startup (or some other point)? In general, I wonder if adding the suffix for dynamic fields is not posing some usability tradeoff. I think, for a user (not a programmer), it's not intuitive to think of id as an integer and therefore enter id_i when searching; what do you think? Chris Hostetter wrote: > and i suppose you could make a customized ResponseWriter that when writing out documents stripped off any suffixes it could tell came from dynamicFields, so the response docs contained the unsuffixed names ... but when parsing the query string your clients send, and they ask for "user:42", how would the request handler know that it should rewrite that to user_string:42 and not user_int:42? > -Hoss -- View this message in context: http://www.nabble.com/SQL-Update-tf3358303.html#a9372391 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re[2]: Solr and Multiple Index Partitions
On Mar 7, 2007, at 9:20 PM, Jack L wrote: Selecting by type will do the job. But I suppose it sacrifices performance, because having multiple document types in the same index will render a larger index. Is it bad? How many documents are we talking about here? My hunch is you'll be fine :) Erik
Re: Solr and Multiple Index Partitions
: I use a custom Analyzer which extends Lucene's StandardAnalyzer. When I : configured Solr to use this one, it throws an exception : RuntimeException("Can't set positionIncrementGap on custom analyzer " + : analyzer.getClass()). : : Do I need to extend a specific Analyzer for it to work with Solr? You can use any Analyzer you want, but you can't configure a positionIncrementGap in the schema.xml unless your Analyzer extends SolrAnalyzer (the concept of a position increment gap is an inherent property that Lucene Analyzers can specify, but configuring it explicitly is a Solr concept). -Hoss
Re: [2] SQL Update
: I could create a list of field name + type, but doing so I might as well : create it and add it to fields in schema.xml. That was my original point: if you want to be able to refer to a field as "username" and have it be a string, just define it explicitly. : Does Solr reread the schema file when I post an add action or only on startup : (or some other point)? Solr only reads the schema.xml file once, but the IndexSchema object it builds from the schema.xml is used pervasively during document adds and searches to understand how to use each field. : In general, I wonder if adding the suffix for dynamic fields is not posing : some usability tradeoff. : I think, for a user (not a programmer), it's not intuitive to think of id as : an integer and therefore enter id_i when searching, : what do you think? There's definitely a tradeoff ... dynamic fields make it easy for you to add arbitrary fields where the type information is inferred by naming convention -- but then you have to use those names. If you want cleaner names you have to create more explicit fields in advance. -Hoss
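The dynamic-field convention Hoss describes can be made concrete with a couple of schema.xml declarations; a minimal sketch, where the glob patterns and type names follow the example schema shipped with Solr, and "username" is just an illustrative field name:

```xml
<!-- Dynamic fields: any field ending in _i is indexed as an integer,
     any field ending in _s as a string. -->
<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string"  indexed="true" stored="true"/>

<!-- Hoss's suggestion: if you want to query plain "username" rather
     than "username_s", declare the field explicitly instead. -->
<field name="username" type="string" indexed="true" stored="true"/>
```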
Re: Solr and Multiple Index Partitions
Whoops .. forgot the documentation link... http://wiki.apache.org/solr/SolrPlugins#head-9939da9abe85a79eb30a026e85cc4aec0beac10c : you can use any Analyzer you want, but you can't configure a : positionIncrementGap in the schema.xml unless your Analyzer extends : SolrAnalyzer (the concept of a position increment gap is an inherent : property that Lucene Analyzers can specify, but configuring it explicitly : is a Solr concept) -Hoss
solr vs custom JMS for replication?
Hi everyone, I have an open source app under development called "authsum", which is an sso/identity/authorization server that supports user registration, OpenID, and SSO. It's a "search engine for authorizations" because the authorizations are stored in a Lucene index accessible via XFire. There will be a dotnet and ruby client. http://www.authsum.org/overview/index.html I am using JMS to keep my Lucene indexes in sync. My applications (admin, registration, and login applications) publish messages (i.e. userid, group addition, etc.) onto a JMS topic. There are other running applications that subscribe to the topic and process the index changes. I am trying to "cut down" on the engineering and was wondering if Solr would be a better fit for my needs. As I see it, my custom JMS solution means that there are potentially many IndexWriters out there (and more processing), since the same processing work needs to be performed on all indexes. This could also be a problem since there is more of a possibility that indexes could get out of sync with one another. For these reasons, I am thinking that Solr would be better for me than JMS. The drawbacks: 1) I would need to write my application to post XML documents to Solr vs. the Lucene programming that I do now. 2) Do I have direct access to the Lucene index to do queries? Or do I need to rewrite my app for that also?
Re: Solr and Multiple Index Partitions
Thanks Chris for a wonderful explanation. I completely get it now. Thanks for the handy URL too. Venkatesh On 3/8/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I use a custom Analyzer which extends Lucene's StandardAnalyzer. When I : configured Solr to use this one, it throws an exception : RuntimeException("Can't set positionIncrementGap on custom analyzer " + : analyzer.getClass()). : : Do I need to extend a specific Analyzer for it to work with Solr? you can use any Analyzer you want, but you can't configure a positionIncrementGap in the schema.xml unless your Analyzer extends SolrAnalyzer (the concept of a position increment gap is an inherent property that Lucene Analyzers can specify, but configuring it explicitly is a Solr concept) -Hoss
HA and load balancing Question
Hello there, Howdy. I'd like to know if I can configure Multiple Solr instances working with a single read-only index partition for failover/HA and load balancing purposes. Or is there any other way that Solr has built-in features to handle the same. Any ideas/thoughts are greatly appreciated. -- Thanks, Venkatesh "Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away." - Antoine de Saint-Exupéry
Re: HA and load balancing Question
On 3/8/07, Venkatesh Seetharam <[EMAIL PROTECTED]> wrote: Howdy. I'd like to know if I can configure Multiple Solr instances working with a single read-only index partition for failover/HA and load balancing purposes. Or is there any other way that Solr has built-in features to handle the same. On the front-end, HTTP is easily load-balanced via software or hardware load balancers. To distribute a single index to multiple Solr searchers, see http://wiki.apache.org/solr/CollectionDistribution You don't have to do it that way though... if you have another mechanism to get the index to the searchers, that could work too. -Yonik
Re: Solr on Tomcat 6.0.10?
Java 1.5.0_05 on Intel and PowerPC (IBM) plus any DST changes. --wunder On 3/8/07 4:08 AM, "James liu" <[EMAIL PROTECTED]> wrote: > today i use tomcat 6.0.10,,,but no time to search. > > tomorrow i will test it. > > which java version you use? > > 2007/3/8, Walter Underwood <[EMAIL PROTECTED]>: >> >> Is anyone running Solr on Tomcat 6.0.10? Any issues? >> I searched the archives and didn't see anything. >> >> wunder >> -- >> Walter Underwood >> Search Guru, Netflix
Re: HA and load balancing Question
Thanks for the reply, Yonik. I'm not using HTTP; I'm using a wrapper around Solr for searching, and RPC to talk to multiple servers. Can I point 2 Solr instances to the same index partition, having the same path in SolrConfig? Is this safe, or do I need to make 2 copies of the same index partition and point the Solr instances to these copies? Since my index partition lives on a shared NetApp mount, I'd like to use the same index partition for multiple Solr instances. Thanks for any help, Venkatesh On 3/8/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 3/8/07, Venkatesh Seetharam <[EMAIL PROTECTED]> wrote: > Howdy. I'd like to know if I can configure Multiple Solr instances working > with a single read-only index partition for failover/HA and load balancing > purposes. Or is there any other way that Solr has built-in features to > handle the same. On the front-end, HTTP is easily load-balanced via software or hardware loadbalancers. To distribute a single index to multiple solr searchers, see http://wiki.apache.org/solr/CollectionDistribution You don't have to do it that way though... if you have another mechanism to get the index to the searchers, that could work too. -Yonik
Re: HA and load balancing Question
On 3/8/07, Venkatesh Seetharam <[EMAIL PROTECTED]> wrote: Can I point 2 Solr instances to the same index partition, having the same path in SolrConfig? Yes, that should work fine. -Yonik
Re: solr vs custom JMS for replication?
If your primary motivation for wanting to use Solr is the replication of index data vs JMS messages and separate IndexWriters on each node, you may just want to use the Solr replication scripts -- they should work with any Lucene index regardless of whether or not you are using Solr; you just need to make sure you use them with the same semantics (ie: run snapshooter after closing the IndexWriter on your master, reopen the IndexReaders on your slaves after running snapinstaller, etc...) : Hi everyone, : I have an open source app under development called "authsum" which is a sso/identity/authorization server that supports user registration, openid, sso. It's a "search engine for authorizations" because the authorizations are stored in a lucene index accessible via xfire. There will be a dotnet and ruby client. http://www.authsum.org/overview/index.html : : I am using JMS to keep my lucene indexes in sync. : : My applications (admin application, registration, login applications) publish messages (i.e. userid, group addition, etc) onto a JMS topic. : There are other running applications that subscribe to the topic and process the index changes. : : I am trying to "cut down" on the engineering and was wondering if solr would be a better fit for my needs. : : As I see it, my custom JMS solution means that there are potentially many IndexWriters out there (and more processing) since the same processing work needs to be performed on all indexes. This could also be a problem since there is more of a possibility that indexes could get out of sync with one another. For these reasons, I am thinking that solr would be better for me than JMS. : : The drawbacks: : 1) I would need to write my application to post xml documents to lucene vs. my lucene programming that I do now. : 2) Do I have direct access to the lucene index to do queries? Or do I need to rewrite my app for that also? -Hoss
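The ordering Hoss describes maps onto the standard collection distribution scripts roughly as follows; this is only a sketch -- script arguments are omitted here (see the CollectionDistribution wiki page for the exact options), and the sequencing is the important part:

```shell
# On the master, after the IndexWriter that made the changes is closed:
snapshooter     # take a hard-link snapshot of the current index

# On each slave:
snappull        # copy the latest snapshot over from the master
snapinstaller   # install it; this triggers a commit so the slave's
                # IndexReader is reopened on the new index
```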
Re: HA and load balancing Question
: > Can I point 2 Solr instances to the same index partition, having the same : > path in SolrConfig? : : Yes, that should work fine. You might run into some weirdness if you send updates/deletes to both instances .. basically you'll want to configure all but one instance as a "slave", and anytime you do a commit on the master you'll want to trigger a commit on all of the slaves so that they reopen the index. (Just like using the snapshot scripts, except you don't need to snapshoot, snappull, or snapinstall.) -Hoss
Re: [2] synonym filter fix
On 3/7/07, nick19701 <[EMAIL PROTECTED]> wrote: Thanks Mike for your confirmation. It turns out to be Tomcat's problem. Even though the new build was within Tomcat's reach, it didn't use it. After I deleted the folders under tomcat/webapps, the new build was picked up immediately and everything works perfectly. Great! Thanks for your bug report. -Mike
Re: [2] SQL Update
On 3/8/07, Debra <[EMAIL PROTECTED]> wrote: I could create a list of field name + type, but doing so I might as well create it and add it to fields in schema.xml. Alternative solution: write a SQL schema <-> Solr schema mapper. It should be relatively simple, as long as you are confining yourself to flat tables. Or, it could provide the mapping on the fly going into and out of Solr. In general, I wonder if adding the suffix for dynamic fields is not posing some usability tradeoff. I think, for a user (not a programmer), it's not intuitive to think of id as an integer and therefore enter id_i when searching; what do you think? In my experience, it is very common for SQL schemata to include suffixes indicating the datatype of the field. As we've discussed, Solr needs some way of distinguishing that a field is a given type, so it is infeasible to simply drop the suffix. If you think it should go, there has to be some kind of alternative mechanism for recognizing dynamic field types. cheers, -Mike
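The SQL-to-Solr mapper Mike suggests could be sketched along these lines; a hypothetical illustration only -- the class name, the type table, and the *_i/*_s/*_f suffixes are assumptions that follow the dynamic-field naming convention discussed in this thread, not anything Solr ships with:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mapper from flat SQL column types to Solr dynamic-field
// names, so rows pulled from a table can be posted to Solr without
// hand-editing schema.xml for every column.
public class SqlToSolrMapper {
    private static final Map<String, String> SUFFIX = new HashMap<String, String>();
    static {
        SUFFIX.put("INTEGER", "_i");
        SUFFIX.put("VARCHAR", "_s");
        SUFFIX.put("FLOAT",   "_f");
    }

    // Map a column name plus its SQL type to the suffixed dynamic field.
    public static String solrField(String column, String sqlType) {
        String suffix = SUFFIX.get(sqlType.toUpperCase());
        if (suffix == null) {
            throw new IllegalArgumentException("no mapping for SQL type: " + sqlType);
        }
        return column + suffix;
    }
}
```

Going the other way (Solr back to SQL) would strip the suffix off for display, which is essentially the customized-ResponseWriter idea from earlier in the thread.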
Re: HA and load balancing Question
Thanks Yonik and Chris for your confirmation. Chris, these are read-only index partitions. I perform updates/deletions on a master index, which will be snapshotted at some fixed intervals. I'll look into the Collection Distribution feature of Solr. Sounds very powerful. I'm stuck with Solr requiring an index directory under the dataDir configured in SolrConfig. Why does it not take a complete path_to_index configured under dataDir, but appends "index"? Is there any way I can work around this? org.apache.solr.core.SolrCore: this.index_path = dataDir + "/" + "index"; Thanks, Venkatesh On 3/8/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : > Can I point 2 Solr instances to the same index partition, having the same : > path in SolrConfig? : : Yes, that should work fine. you might run into some weirdness if you send updates/deletes to both instances .. basically you'll want to configure all but one instance as a "slave" and anytime you do a commit on the master you'll want to trigger a commit on all of the slaves so that they reopen the index. (just like using the snapshot scripts, except you don't need to snapshoot, snappull, or snapinstall) -Hoss
Re: HA and load balancing Question
: I'm stuck with Solr requiring an index directory under dataDir configured : in SolrConfig. Why does it not take a complete path_to_index configured : under dataDir but append "index"? Is there any way I can work around this? I think at one time we were assuming there might be other types of data you'd want Solr to store besides the index... the assumption is that you tell Solr where you want it to keep all of its data, and after that you shouldn't care what lives in that directory. If I remember correctly, the dataDir is also where the Solr snapshots and temp dirs get put ... if Solr let you configure the index dir directly, we'd need you to also configure those locations separately -- except that they have to be on the same physical disk for hardlinks to work, so it's really just a lot simpler if you tell Solr the dataDir and let it take care of everything else. (Except in your case, where you *want* to take care of it ... Solr wasn't really designed for that case.) -Hoss
Re: HA and load balancing Question
Thanks Hoss for the clarification. I think I can make a copy of the index for searching and rename 'em. I think I can work around this one, but it's good to know the bigger picture. Venkatesh On 3/8/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I'm stuck with Solr requiring an index directory under dataDir configured : in SolrConfig. Why does it not take a complete path_to_index configured : under dataDir but append "index"? Is there any way I can work around this? i think at one time we were assuming there might be other types of data you'd want Solr to store besides the index... the assumption is that you tell Solr where you want it to keep all of its data, and after that you shouldn't care what lives in that directory. if i remember correctly, the dataDir is also where the solr snapshots and temp dirs get put ... if Solr let you configure the index dir directly, we'd need you to also configure those locations separately -- except that they have to be on the same physical disk for hardlinks to work, so it's really just a lot simpler if you tell Solr the dataDir and let it take care of everything else. (except in your case where you *want* to take care of it ... Solr wasn't really designed for that case) -Hoss
stack trace response
Hello, We have a front application server that uses Solr to search our content. If the front application calls Solr with /select but a missing q parameter, Solr returns a stack trace in its response body, while we expect an XML response with an error message (& the stack trace in the XML). Is this a feature? If so, is the front server responsible for checking required params before sending requests to Solr, correct? I've found a similar issue in JIRA: an empty query string in the admin interface throws an exception http://issues.apache.org/jira/browse/SOLR-48 but the solution was that the front checked the q parameter before sending. We are hoping Solr returns a readable response produced by a ResponseWriter even if the front server sends a wrong request to Solr, but if it is a feature, we will validate params at the front. We'd just like to confirm that. Thank you, Koji
Re: stack trace response
: If the front application calls Solr with /select but missing q parameter, : Solr returns a stack trace with its response body, while we expect : XML response with an error message (& the stack trace in the XML). : Is it a feature? : : If so, is the front server responsible about checking required params : before requesting to Solr, correct? Generally speaking, clients should always try to ensure that they only send well-formed requests -- the built-in RequestHandlers can't function without a "q" param, so I would say yes, they should check that they have a value for that param before sending the request to Solr. If you want there to be a default value for the "q" param, you can configure it in the solrconfig.xml. At a broader level, you are correct -- the error reporting is not very clean. We are hoping to eventually fix that so the error reporting is more consistent... http://issues.apache.org/jira/browse/SOLR-141 ...in the meantime, I believe query errors are reported using the HTTP status code (if not 200, then an error) and update errors are reported in the XML response body. -Hoss
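Hoss's note about configuring a default "q" looks roughly like this in solrconfig.xml; a sketch only -- the "*:*" match-all default and the rows value are hypothetical choices, not something the thread prescribes:

```xml
<requestHandler name="standard" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <!-- used whenever the client omits the q parameter -->
    <str name="q">*:*</str>
    <str name="rows">10</str>
  </lst>
</requestHandler>
```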
RE: Hierarchical Facets
: Dir1
: Dir1/Subdir1
: Dir1/Subdir1/SubSubDir1

or something like...

Dir1
  Subdir1
    SubSubDir1

...but this is why hierarchical facets are hard. (It just occurred to me that this is a different hierarchical facets thread than the one I thought it was .. you may want to check the archives for some other recent discussion on this.) -Hoss
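One common way to index the path-style variant quoted above is to store every ancestor prefix of a document's path as a separate facet value; a minimal sketch in Java, where the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Expand "Dir1/Subdir1/SubSubDir1" into all of its ancestor prefixes,
// so each document can be tagged with every level of its hierarchy
// at index time and faceted on at any depth.
public class FacetPaths {
    public static List<String> prefixes(String path) {
        List<String> out = new ArrayList<String>();
        String[] parts = path.split("/");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append('/');
            sb.append(parts[i]);
            out.add(sb.toString());
        }
        return out;
    }
}
```

Faceting at depth N then means filtering to values with N-1 separators, which is part of why hierarchical facets take extra work.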
Re: Ranking Question.
You need to elaborate a little more on what you are currently doing, and what you want to be doing... You mention "my own ranking/scoring system" ... is this something you've implemented in code already? Is it a custom Similarity class or Query class, or something basic that you've done with a custom request handler? How do you want matches on the title/description to affect things? Should they contribute to the score (ie: influence ordering) or just affect whether or not a document is included in the result set? When you say "change as little as possible in the retrieval process", are you referring to some existing process you've implemented, or the default logic of the StandardRequestHandler? : I have a fairly simple schema with a title, tags and description. : I have my own ranking/scoring system that takes into account the : similarity of each tag to a term in the query but now that i want to : include also the title and description (the description is somewhere : between short to a moderate length) i am not sure how to handle this. : For example, would parsing the description and title before indexing : in SOLR and adding them as tags makes sense ? it sounds like that : would replicate a mechanism of stop words, stemming etc... built into : lucene. : My goal at the end is change as little as possible in the retrieval : process but then be able to rank based the keywords extracted from the : entire document. -Hoss
Re: stack trace response
I agree, the display is a bit weird. But if you check the response headers, the response code is 400 "Bad Request". In Firefox or IE, you would need to inspect the headers to see what is going on. The issue is that /select uses a servlet that writes out a stack trace directly for every error it hits. It uses:

try { response.setStatus(rc); } catch (Exception e) {}
PrintWriter writer = response.getWriter();
writer.write(msg);

Down the line, when we have http://issues.apache.org/jira/browse/SOLR-141, this will be the best option. I'm lobbying to let the SolrDispatchFilter handle /select. The SolrDispatchFilter passes the error code and message on to the servlet container, so it is formatted in the most standard way. It also only includes a stack trace for 500 errors, not 400, 403, etc. It uses:

res.sendError( code, ex.getMessage() );

ryan