LNS - or - "now i know we've succeeded"

2008-01-12 Thread Chris Hostetter


First it was "I18N", then "LAMP", and recently "RoR" ... and now ... Solr 
has hit the big time by becoming integrated into an obfuscated and 
non-intuitive acronym that pertains to a stack/suite of software...


! ! !"LNS"! ! !

http://www.ideaeng.com/pub/entsrch/2008/number_01/article01.html


Our final inductee into the new Enterprise Search "Tier 1" (the "half"
in "4 and 1/2") is a set of open source software based on Lucene, 
including Nutch and Solr. Technically it doesn't qualify for the 
classical "enterprise" status, as we define it. By rights, they should 
stay on our "Tier 1.5" list. But, and this is a big one, they are 
being considered, or at least discussed again and again by clients. It 
happened often enough in 2007 that we have to include Lucene/Nutch/Solr 
(which we'll refer to as "LNS").


Superfluous acronym jokes aside, it is pretty cool to see a reputable 
search company (is ideaeng.com a reputable search consulting company? i 
don't really pay much attention to this sort of thing, but i'm assuming 
they must be since they have a website and they don't need to run banner 
ads on it to earn money) ranking "Lucene/Nutch/Solr" as a "Tier 1" 
enterprise search solution ... even if they do only count "LNS" as "half" 
a product.




-Hoss  (needs sleep now)



Wildcard on last char

2008-01-12 Thread Patric Wilms

Hello,

I have encountered a problem concerning wildcards. When I search for 
field:testword I get 50 results. That's fine, but when I search for 
field:testwor* I get just 3 hits! I only get back words with no 
whitespace after that character, like "testwordtest", but I can't find 
any single "testword" that I searched for before. Why can't I replace a 
single character at the end of a word with the wildcard?


Thanks a million,

Patric








Re: Delete by multiple id problem

2008-01-12 Thread Norberto Meijome
On Fri, 11 Jan 2008 00:43:19 -0200
Leonardo Santagada <[EMAIL PROTECTED]> wrote:

> No, actually my problem is that the solr index is mirroring data on a  
> database (a Zope app to be more accurate), so it would be better if I  
> could send the whole transaction together so I don't have to keep it  
> on separate files... which I have to do so that I can avoid sending  
> anything if the transaction is aborted (I can't abort a solr add, right?).
> 
> Maybe I should explain more, but I think this is pretty common to  
> anyone trying to keep database transactions and a solr index in sync,  
> as solr doesn't support two-phase commit or anything like that.

Hola Leonardo,
I haven't had to do this, but I am starting to design something along these 
lines.

if you execute your 'adds' and 'deletes' from a stored proc, inside a 
transaction, you can simply have an extra table with Solr doc ids and the 
action to perform (add / delete). 
eg, 
exec(delete_from_my_db('xyz')) ->
begin transaction
 {do all your DB work here}
 {add the ID to delete to tblSolrWork}
end transaction
Hence, if the transaction fails, those records will never actually exist.

You can then have a process that, every x seconds/minutes/hours (depending on 
your needs), scans this tblSolrWork table and performs whatever adds or deletes 
are needed. Of course, for the adds, you'll also need to get the information to 
add from somewhere, but I imagine you already do that.
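
Just to make it concrete, here is a rough sketch of that scanning process in
Python. Everything here is illustrative: the tblSolrWork columns, the
fetch_document_xml() helper and the Solr URL are whatever fits your setup;
only the <add>/<delete>/<commit> XML messages are the standard Solr update
format.

import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # adjust to taste

def post_to_solr(xml):
    # POST one XML update message to Solr's /update handler
    req = urllib.request.Request(
        SOLR_UPDATE_URL, data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    urllib.request.urlopen(req).read()

def fetch_document_xml(conn, doc_id):
    # hypothetical helper: build the <doc> body for an add from your own
    # source tables (XML escaping omitted for brevity)
    row = conn.execute("SELECT body FROM documents WHERE id = ?",
                       (doc_id,)).fetchone()
    return ('<field name="id">%s</field>'
            '<field name="body">%s</field>' % (doc_id, row[0]))

def process_queue(conn):
    # conn is any DB-API connection; replay pending work in insertion
    # order, then commit to Solr once at the end
    rows = conn.execute(
        "SELECT id, doc_id, action FROM tblSolrWork ORDER BY id").fetchall()
    for row_id, doc_id, action in rows:
        if action == "delete":
            post_to_solr("<delete><id>%s</id></delete>" % doc_id)
        else:
            post_to_solr("<add><doc>%s</doc></add>"
                         % fetch_document_xml(conn, doc_id))
        conn.execute("DELETE FROM tblSolrWork WHERE id = ?", (row_id,))
    if rows:
        post_to_solr("<commit/>")
        conn.commit()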

Whether and how you could do this in Zope, I have no idea, but if you solve it, 
it would be great if you could share it here.

You could also make use of triggers (on-insert/update and on-delete 
triggers), but I suppose that is a bit more DB-dependent than plain SP work - 
though it may be simpler to implement than changing all your code to call the 
SP instead of direct SQL cmds...

good luck,
B
_
{Beto|Norberto|Numard} Meijome

"He can compress the most words into the smallest idea of any man I know."
  Abraham Lincoln

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Transactions and Solr Was: Re: Delete by multiple id problem

2008-01-12 Thread Leonardo Santagada


On 12/01/2008, at 11:24, Norberto Meijome wrote:


> On Fri, 11 Jan 2008 00:43:19 -0200
> Leonardo Santagada <[EMAIL PROTECTED]> wrote:
>
>> No, actually my problem is that the solr index is mirroring data on a
>> database (a Zope app to be more accurate), so it would be better if I
>> could send the whole transaction together so I don't have to keep it
>> on separate files... which I have to do so that I can avoid sending
>> anything if the transaction is aborted (I can't abort a solr add,
>> right?).
>>
>> Maybe I should explain more, but I think this is pretty common to
>> anyone trying to keep database transactions and a solr index in sync,
>> as solr doesn't support two-phase commit or anything like that.
>
> Hola Leonardo,
> I haven't had to do this, but I am starting to design something
> along these lines.
>
> if you execute your 'adds' and 'deletes' from a stored proc, inside a
> transaction, you can simply have an extra table with Solr doc ids
> and the action to perform (add / delete).
>
> eg,
> exec(delete_from_my_db('xyz')) ->
> begin transaction
> {do all your DB work here}
> {add the ID to delete to tblSolrWork}
> end transaction
> Hence, if the transaction fails, those records will never actually
> exist.


Not that simple. For example, another add with the same unique key  
should remove the key from the delete list, and then you have to store  
the whole data twice so you know what to send to solr. Also you have to  
save a serial number for the transaction so you add documents in the  
right order and do the deletes in order too. And having one table that  
manages this in the same relational database could mean a big drop in  
performance, as everything you do on your db would lock, write and read  
from a single table or a couple of tables, and this makes your life a  
living hell too :).


What I am doing on Zope is firing some events when new documents are  
added, updated or removed, and then I join the transaction with my  
transaction manager, which orders the adds to solr and saves an xml  
file to be sent to solr. The problems with this are the ones  
mentioned; it would be simpler if the same file could send all types  
of commands to solr (add and delete are the ones I am using).
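
For anyone curious, the shape of what I mean is roughly this: a data manager
that joins the Zope transaction, sketched against the `transaction` package's
data-manager interface. send_to_solr() here is just a stand-in for the actual
HTTP post to Solr's /update handler.

import transaction

class SolrDataManager(object):
    """Queues Solr update XML and only ships it once the surrounding
    DB transaction actually commits."""

    def __init__(self):
        self.pending = []  # queued Solr update messages, in order

    def queue(self, xml):
        self.pending.append(xml)
        transaction.get().join(self)

    # hooks required by the two-phase-commit data-manager interface
    def abort(self, txn):
        self.pending = []

    def tpc_begin(self, txn):
        pass

    def commit(self, txn):
        pass

    def tpc_vote(self, txn):
        pass

    def tpc_finish(self, txn):
        # the DB side has committed; now it is safe to talk to Solr
        for xml in self.pending:
            send_to_solr(xml)  # stand-in for posting to /update
        self.pending = []

    def tpc_abort(self, txn):
        self.pending = []

    def sortKey(self):
        return "solr-data-manager"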


> Whether and how you could do this in Zope, I have no idea, but if
> you solve it, it would be great if you could share it here.
>
> You could also make use of triggers (on-insert/update and
> on-delete triggers), but I suppose that is a bit more DB-dependent
> than plain SP work - though it may be simpler to implement than
> changing all your code to call the SP instead of direct SQL cmds...


Probably, but it would still hit performance really hard on a relational  
database that has a lot more than documents in it, I think.


Does anyone have more experience doing this kind of stuff and wants  
to share?


--
Leonardo Santagada





Restrict values in a multivalued field

2008-01-12 Thread Rishabh Joshi
Hi,

In my schema I have a multivalued field, and the values of that field are
"stored" and "indexed" in the index. I wanted to know if it's possible to
restrict the number of values returned from that field on a search, and how?
Because, let's say I have thousands of values in that multivalued field;
returning all of them would be a lot of load on the system. So I want to
restrict it to send me only, say, 50 values out of the thousands.

Regards,
Rishabh


Re: Restrict values in a multivalued field

2008-01-12 Thread Otis Gospodnetic
I don't have an answer to this one other than "process it yourself in your 
app".  But should anyone decide to work on this, I have another, similar 
suggestion/request: return N *unique* values from a multivalued field, sorted 
by their count.

The use case for this is a tagging system like simpy.com where multiple people 
can tag an entity with the same tags, and while you would want to store/index 
multiple copies for TF purposes, you really don't want to display multiple 
copies of the same tag.

Simpy currently does this manually; for example, that happens here: 
http://www.simpy.com/links/search/solr ... and I'm not convinced this belongs 
in Solr.
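
For the do-it-in-your-app route, it really is only a few lines; e.g. in
Python, assuming you have already pulled the multivalued field out of the
Solr response:

from collections import Counter

def top_unique_values(values, n=50):
    # distinct values from a multivalued field, most frequent first
    return [v for v, _count in Counter(values).most_common(n)]

# e.g.: top_unique_values(doc["tags"], 10)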

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Rishabh Joshi <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, January 12, 2008 10:55:32 AM
Subject: Restrict values in a multivalued field

Hi,

In my schema I have a multivalued field, and the values of that field are
"stored" and "indexed" in the index. I wanted to know if it's possible to
restrict the number of values returned from that field on a search, and how?
Because, let's say I have thousands of values in that multivalued field;
returning all of them would be a lot of load on the system. So I want to
restrict it to send me only, say, 50 values out of the thousands.

Regards,
Rishabh





Re: Transactions and Solr Was: Re: Delete by multiple id problem

2008-01-12 Thread Otis Gospodnetic
Do you have to have your data in Solr as soon as it's added to the DB?  Probably 
not.  And what if somebody manually changes the DB?  Then Solr will be out of 
sync with it.  We see similar situations pretty frequently, and our solution is 
a standalone DB indexing application that knows how to do incremental indexing 
and detect deleted rows/documents as well as updates.  So I'd suggest you think 
about that approach.
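
The core of such an indexer is small. A sketch in Python, with invented
table/column names (the real thing also persists the high-water mark and the
set of known ids between runs):

def incremental_pass(conn, last_run_ts, known_ids):
    # rows added or changed since the last run, by a modified timestamp
    changed = conn.execute(
        "SELECT id, body FROM documents WHERE modified > ?",
        (last_run_ts,)).fetchall()
    # deletes: ids indexed before that no longer exist in the DB
    current_ids = {r[0] for r in conn.execute("SELECT id FROM documents")}
    deleted = known_ids - current_ids
    # feed 'changed' to Solr as <add>s and 'deleted' as <delete>s
    return changed, deleted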

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Leonardo Santagada <[EMAIL PROTECTED]>
To: Norberto Meijome <[EMAIL PROTECTED]>
Cc: solr-user@lucene.apache.org
Sent: Saturday, January 12, 2008 10:16:39 AM
Subject: Transactions and Solr Was: Re: Delete by multiple id problem


On 12/01/2008, at 11:24, Norberto Meijome wrote:

> On Fri, 11 Jan 2008 00:43:19 -0200
> Leonardo Santagada <[EMAIL PROTECTED]> wrote:
>
>> No, actually my problem is that the solr index is mirroring data on a
>> database (a Zope app to be more accurate), so it would be better if I
>> could send the whole transaction together so I don't have to keep it
>> on separate files... which I have to do so that I can avoid sending
>> anything if the transaction is aborted (I can't abort a solr add,
>> right?).
>>
>> Maybe I should explain more, but I think this is pretty common to
>> anyone trying to keep database transactions and a solr index in sync,
>> as solr doesn't support two-phase commit or anything like that.
>
> Hola Leonardo,
> I haven't had to do this, but I am starting to design something
> along these lines.
>
> if you execute your 'adds' and 'deletes' from a stored proc, inside a
> transaction, you can simply have an extra table with Solr doc ids
> and the action to perform (add / delete).
> eg,
> exec(delete_from_my_db('xyz')) ->
> begin transaction
> {do all your DB work here}
> {add the ID to delete to tblSolrWork}
> end transaction
> Hence, if the transaction fails, those records will never actually
> exist.

Not that simple. For example, another add with the same unique key  
should remove the key from the delete list, and then you have to store  
the whole data twice so you know what to send to solr. Also you have to  
save a serial number for the transaction so you add documents in the  
right order and do the deletes in order too. And having one table that  
manages this in the same relational database could mean a big drop in  
performance, as everything you do on your db would lock, write and read  
from a single table or a couple of tables, and this makes your life a  
living hell too :).

What I am doing on Zope is firing some events when new documents are  
added, updated or removed, and then I join the transaction with my  
transaction manager, which orders the adds to solr and saves an xml  
file to be sent to solr. The problems with this are the ones  
mentioned; it would be simpler if the same file could send all types  
of commands to solr (add and delete are the ones I am using).

> Whether and how you could do this in Zope, I have no idea, but if
> you solve it, it would be great if you could share it here.
>
> You could also make use of triggers (on-insert/update and
> on-delete triggers), but I suppose that is a bit more DB-dependent
> than plain SP work - though it may be simpler to implement than
> changing all your code to call the SP instead of direct SQL cmds...

Probably, but it would still hit performance really hard on a relational  
database that has a lot more than documents in it, I think.

Does anyone have more experience doing this kind of stuff and wants  
to share?

--
Leonardo Santagada