LNS - or - "now i know we've succeeded"
First it was "I18N", then "LAMP", and recently "RoR" ... and now ... Solr has hit the big time by becoming integrated into an obfuscated and non-intuitive acronym that pertains to a stack/suite of software... ! ! !"LNS"! ! ! http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html Our final inductee into the new Enterprise Search "Tier 1" (the "half" in "4 and 1/2") is a set of open source software based on Lucene, including Nutch and Solr. Technically it doesn't qualify for the classical "enterprise" status, as we define it. By rights, they should stay on our "Tier 1.5" list. But, and this is a big one, they are being considered, or at least discussed again and again by clients. It happened often enough in 2007 that we have to include Lucene/Nutch/Solr (which we'll refer to as "LNS") Superfulous acronym jokes asside, It is pretty cool to see a reputable Search company (is ideaeng.com a reputable search consulting company? i don't really pay much attention to this sort of thing but i'm assuming they must be since they have a website and they don't need to run banner ads on it to earn money) ranking "Lucene/Nutch/Solr" as a "Tier 1" enterprise search solution ... even if they do only count "LNS" as "half" a product. -Hoss (needs sleep now)
Wildcard on last char
Hello,

I have encountered a problem with wildcards. When I search for field:testword I get 50 results. That's fine, but when I search for field:testwor* I get just 3 hits! I only get back words with no whitespace after the prefix, like "testwordtest", but I can't find any of the plain "testword" hits I found before. Why can't I replace a single character at the end of a word with the wildcard?

Thanks a million,
Patric
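Patric's question goes unanswered in this archive, but a common cause of this symptom is that Lucene/Solr wildcard queries are matched against the raw indexed terms and are not run through the field's analyzer: if an index-time stemmer rewrote a term, a prefix built from the surface form can miss it. A toy simulation of that effect (the crude stemmer and the example terms are illustrative stand-ins, not Solr's actual analysis chain):

```python
def crude_stem(word):
    # Crude stand-in for an index-time stemmer: strip a trailing "s".
    return word[:-1] if word.endswith("s") else word

# Terms as they end up in the index, after index-time analysis.
index = {crude_stem(w) for w in ["testwords", "testwordtest"]}

def prefix_search(index, prefix):
    # Wildcard/prefix queries match the raw indexed terms, unanalyzed.
    return sorted(t for t in index if t.startswith(prefix))

# The stemmed term "testword" is found by a prefix of the stemmed form,
# but a prefix taken from the original surface form finds nothing.
print(prefix_search(index, "testword"))   # ['testword', 'testwordtest']
print(prefix_search(index, "testwords"))  # []
```

The practical takeaway is that the terms a wildcard query runs against may not be the words that were in the original documents.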
Re: Delete by multiple id problem
On Fri, 11 Jan 2008 00:43:19 -0200 Leonardo Santagada <[EMAIL PROTECTED]> wrote:

> No, actually my problem is that the Solr index is mirroring data on a database (a Zope app, to be more accurate), so it would be better if I could send the whole transaction together so I don't have to keep it in separate files... which I have to do so that I can send nothing if the transaction is aborted (I can't abort a Solr add, right?).
>
> Maybe I should explain more, but I think this is pretty common for anyone trying to keep database transactions and a Solr index in sync, as Solr doesn't support two-phase commit or anything like that.

Hola Leonardo,

I haven't had to do this, but I am starting to design something along these lines.

If you execute your adds and deletes from a stored proc, inside a transaction, you can simply have an extra table with Solr doc ids and the action to perform (add / delete). E.g., exec(delete_from_my_db('xyz')) ->

begin transaction
{do all your DB work here}
{add the ID to delete to tblSolrWork}
end transaction

Hence, if the transaction fails, those records will never actually exist. You can then have a process that every x seconds/minutes/hours (depending on your needs) scans this tblSolrWork table and performs whatever adds or deletes are needed. Of course, for the adds you'll also need to get the information to add from somewhere, but I imagine you already do that.

Whether and how you could do this in Zope, I have no idea, but if you solve it, it would be great if you could share it here.

You could also make use of triggers (on insert / update and on delete triggers), but I suppose that is a bit more DB-dependent than plain SP work - though it may be simpler to implement than changing all your code to call the SP instead of direct SQL commands...

good luck,
B

_
{Beto|Norberto|Numard} Meijome

"He can compress the most words into the smallest idea of any man I know." Abraham Lincoln

I speak for myself, not my employer. Contents may be hot.
Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
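Norberto's begin/end-transaction outline can be made concrete. Below is a minimal runnable sketch of the work-table pattern, using Python's sqlite3 in place of a stored procedure; the table and column names (documents, tblSolrWork, doc_id, action) are illustrative assumptions, not from any real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE tblSolrWork (doc_id TEXT, action TEXT)")
conn.execute("INSERT INTO documents VALUES ('xyz', 'some text')")

def delete_from_my_db(conn, doc_id):
    """Delete a row and enqueue the matching Solr delete in one transaction."""
    with conn:  # opens a transaction; rolls back on any exception
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
        conn.execute(
            "INSERT INTO tblSolrWork (doc_id, action) VALUES (?, 'delete')",
            (doc_id,))

delete_from_my_db(conn, "xyz")

# A periodic worker would now drain tblSolrWork and issue the Solr deletes.
pending = conn.execute("SELECT doc_id, action FROM tblSolrWork").fetchall()
print(pending)  # [('xyz', 'delete')]
```

Because the row deletion and the work-queue insert share one transaction, an aborted transaction leaves no orphaned queue entry, which is the property Norberto is after.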
Transactions and Solr (Was: Re: Delete by multiple id problem)
On 12/01/2008, at 11:24, Norberto Meijome wrote:

> On Fri, 11 Jan 2008 00:43:19 -0200 Leonardo Santagada <[EMAIL PROTECTED]> wrote:
>
>> No, actually my problem is that the Solr index is mirroring data on a database (a Zope app, to be more accurate), so it would be better if I could send the whole transaction together so I don't have to keep it in separate files... which I have to do so that I can send nothing if the transaction is aborted (I can't abort a Solr add, right?).
>>
>> Maybe I should explain more, but I think this is pretty common for anyone trying to keep database transactions and a Solr index in sync, as Solr doesn't support two-phase commit or anything like that.
>
> Hola Leonardo,
> I haven't had to do this, but I am starting to design something along these lines.
>
> If you execute your adds and deletes from a stored proc, inside a transaction, you can simply have an extra table with Solr doc ids and the action to perform (add / delete). E.g., exec(delete_from_my_db('xyz')) ->
>
> begin transaction
> {do all your DB work here}
> {add the ID to delete to tblSolrWork}
> end transaction
>
> Hence, if the transaction fails, those records will never actually exist.

Not that simple: for example, another add with the same unique key should remove the key from the pending delete, and then you have to store the whole data twice so you know what to send to Solr. You also have to save a serial number for each transaction, so you apply the adds in the right order and do the deletes in order too. And having one table that manages all of this in the same relational database could mean a big drop in performance, as everything you do on your DB would lock, write, and read from a single table (or a couple of tables), which also makes your life a living hell :).

What I am doing in Zope is firing events when documents are added, updated or removed, and then joining the transaction with my transaction manager, which orders the adds to Solr and saves an XML file to be sent to Solr.

The problems with this are the ones mentioned; it would be simpler if the same file could send all types of commands to Solr (add and delete are the ones I am using).

> Whether and how you could do this in Zope, I have no idea, but if you solve it, it would be great if you could share it here.
>
> You could also make use of triggers (on insert / update and on delete triggers), but I suppose that is a bit more DB-dependent than plain SP work - though it may be simpler to implement than changing all your code to call the SP instead of direct SQL commands...

Probably, but it would still hit performance really hard on a relational database that has a lot more than documents in it, I think.

Does anyone have more experience doing this kind of stuff and wants to share?

--
Leonardo Santagada
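The ordering and superseding concerns Leonardo raises can be sketched with a serial-numbered work table: an AUTOINCREMENT sequence preserves operation order, and a later add for the same unique key removes a pending delete. A hypothetical sketch (schema, table, and column names are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE solr_work (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_id TEXT, action TEXT, payload TEXT)""")

def enqueue(conn, doc_id, action, payload=None):
    """Queue a Solr operation; the seq column records global order."""
    with conn:
        if action == "add":
            # An add supersedes any pending delete for the same unique key.
            conn.execute(
                "DELETE FROM solr_work WHERE doc_id = ? AND action = 'delete'",
                (doc_id,))
        conn.execute(
            "INSERT INTO solr_work (doc_id, action, payload) VALUES (?, ?, ?)",
            (doc_id, action, payload))

enqueue(conn, "a1", "delete")
enqueue(conn, "a1", "add", "<doc>...</doc>")  # supersedes the pending delete
enqueue(conn, "b2", "delete")

# Draining in seq order replays the operations against Solr correctly.
ops = conn.execute(
    "SELECT doc_id, action FROM solr_work ORDER BY seq").fetchall()
print(ops)  # [('a1', 'add'), ('b2', 'delete')]
```

This does not remove the contention problem Leonardo points out (every write still funnels through one table); it only shows the bookkeeping he describes.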
Restrict values in a multivalued field
Hi,

In my schema I have a multivalued field, and the values of that field are stored and indexed. I wanted to know if it's possible to restrict the number of values returned from that field on a search, and how? Because, let's say I have thousands of values in that multivalued field; returning all of them would be a lot of load on the system. So I want to restrict it to send me only, say, 50 values out of the thousands.

Regards,
Rishabh
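As the reply in this thread notes, the trimming has to happen in the application. A minimal client-side sketch, assuming the Solr response has already been parsed into a Python dict; the field name "tags" and the limit are illustrative:

```python
def truncate_multivalued(doc, field, limit=50):
    """Return at most `limit` values of a multivalued field from a parsed doc."""
    values = doc.get(field, [])
    return values[:limit]

# A document with a thousand values in its multivalued field.
doc = {"id": "1", "tags": ["t%d" % i for i in range(1000)]}

trimmed = truncate_multivalued(doc, "tags")
print(len(trimmed))  # 50
```

Note this only reduces what the application handles after the response arrives; Solr still reads and transmits the full stored field, so the real load savings would require support on the server side.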
Re: Restrict values in a multivalued field
I don't have the answer to this one, other than "process it yourself in your app". But should anyone decide to work on this, I have another, similar suggestion/request: return N *unique* values from a multivalued field, sorted by their count. The use case for this is a tagging system like simpy.com, where multiple people can tag an entity with the same tags; while you would want to store/index multiple copies for TF purposes, you really don't want to display multiple copies of the same tag. Simpy currently does this manually; for example, that happens here: http://www.simpy.com/links/search/solr ... and I'm not convinced this belongs in Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Rishabh Joshi <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, January 12, 2008 10:55:32 AM
Subject: Restrict values in a multivalued field

Hi,

In my schema I have a multivalued field, and the values of that field are stored and indexed. I wanted to know if it's possible to restrict the number of values returned from that field on a search, and how? Because, let's say I have thousands of values in that multivalued field; returning all of them would be a lot of load on the system. So I want to restrict it to send me only, say, 50 values out of the thousands.

Regards,
Rishabh
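Otis's suggestion (return N unique values sorted by their count) can be approximated client-side with a frequency count over the stored values. A sketch with illustrative tag data:

```python
from collections import Counter

def top_unique_values(values, n):
    """Collapse duplicates and return the n most frequent unique values."""
    return [value for value, count in Counter(values).most_common(n)]

# Duplicated tags, as stored multiple times for TF purposes.
tags = ["solr", "lucene", "solr", "search", "solr", "lucene"]
print(top_unique_values(tags, 2))  # ['solr', 'lucene']
```

This matches the tagging use case: the index keeps every copy for scoring, while the display layer shows each tag once, ranked by how many people applied it.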
Re: Transactions and Solr (Was: Re: Delete by multiple id problem)
Do you have to have your data in Solr as soon as it's added to the DB? Probably not. What if somebody manually changes the DB? Then your index will be out of sync with the DB. We see similar situations pretty frequently, and our solution is a standalone DB-indexing application that knows how to do incremental indexing and detect deleted rows/documents as well as updates. So I'd suggest you think about that approach.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Leonardo Santagada <[EMAIL PROTECTED]>
To: Norberto Meijome <[EMAIL PROTECTED]>
Cc: solr-user@lucene.apache.org
Sent: Saturday, January 12, 2008 10:16:39 AM
Subject: Transactions and Solr (Was: Re: Delete by multiple id problem)

On 12/01/2008, at 11:24, Norberto Meijome wrote:

> On Fri, 11 Jan 2008 00:43:19 -0200
> Leonardo Santagada <[EMAIL PROTECTED]> wrote:
>
>> No, actually my problem is that the Solr index is mirroring data on a database (a Zope app, to be more accurate), so it would be better if I could send the whole transaction together so I don't have to keep it in separate files... which I have to do so that I can send nothing if the transaction is aborted (I can't abort a Solr add, right?).
>>
>> Maybe I should explain more, but I think this is pretty common for anyone trying to keep database transactions and a Solr index in sync, as Solr doesn't support two-phase commit or anything like that.
>
> Hola Leonardo,
> I haven't had to do this, but I am starting to design something along these lines.
>
> If you execute your adds and deletes from a stored proc, inside a transaction, you can simply have an extra table with Solr doc ids and the action to perform (add / delete). E.g., exec(delete_from_my_db('xyz')) ->
>
> begin transaction
> {do all your DB work here}
> {add the ID to delete to tblSolrWork}
> end transaction
>
> Hence, if the transaction fails, those records will never actually exist.
Not that simple: for example, another add with the same unique key should remove the key from the pending delete, and then you have to store the whole data twice so you know what to send to Solr. You also have to save a serial number for each transaction, so you apply the adds in the right order and do the deletes in order too. And having one table that manages all of this in the same relational database could mean a big drop in performance, as everything you do on your DB would lock, write, and read from a single table (or a couple of tables), which also makes your life a living hell :).

What I am doing in Zope is firing events when documents are added, updated or removed, and then joining the transaction with my transaction manager, which orders the adds to Solr and saves an XML file to be sent to Solr.

The problems with this are the ones mentioned; it would be simpler if the same file could send all types of commands to Solr (add and delete are the ones I am using).

> Whether and how you could do this in Zope, I have no idea, but if you solve it, it would be great if you could share it here.
>
> You could also make use of triggers (on insert / update and on delete triggers), but I suppose that is a bit more DB-dependent than plain SP work - though it may be simpler to implement than changing all your code to call the SP instead of direct SQL commands...

Probably, but it would still hit performance really hard on a relational database that has a lot more than documents in it, I think.

Does anyone have more experience doing this kind of stuff and wants to share?

--
Leonardo Santagada
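The standalone incremental indexer Otis describes can be sketched as a batch job that picks up rows changed since its last run, using a modification timestamp for adds/updates and a tombstone flag for deletes. The schema and column names here are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE documents (
    id TEXT PRIMARY KEY, body TEXT,
    updated_at INTEGER, deleted INTEGER DEFAULT 0)""")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [("a", "old", 100, 0),   # unchanged since last run
     ("b", "new", 200, 0),   # updated: needs re-indexing
     ("c", "", 300, 1)])     # tombstoned: needs a Solr delete

def incremental_batch(conn, last_run):
    """Return (adds, deletes) for everything changed since last_run."""
    rows = conn.execute(
        "SELECT id, body, deleted FROM documents WHERE updated_at > ?",
        (last_run,)).fetchall()
    adds = [(doc_id, body) for doc_id, body, deleted in rows if not deleted]
    deletes = [doc_id for doc_id, body, deleted in rows if deleted]
    return adds, deletes

adds, deletes = incremental_batch(conn, last_run=150)
print(adds, deletes)  # [('b', 'new')] ['c']
```

One appeal of this design for the thread's problem is that it sidesteps two-phase commit entirely: the DB remains the source of truth, and the indexer converges Solr toward it on each run, even after manual DB edits.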