Re: Solr DIH and $deleteDocById

2012-03-11 Thread Peter Boudreau
Thanks for the info, James. I failed to mention in my original message that we're on Solr 3.5 and we are combining the deletes with our add/updates in the same DIH. In searching through the archives of this mailing list, I actually found a thread which described my problem exactly and led me t

Does the lucene support the substring search?

2012-03-11 Thread neosky
Thank you! I now use awk to preprocess it, and it seems quite efficient. I think other scripting languages would also work. Returning to the original post, I would like to know whether Lucene supports substring search. As you can see, one field of my document is long string filed

Solr 4.0

2012-03-11 Thread Robert Yu
What's the status of Solr 4.0? Has anyone started using it? I heard it supports real-time index updates, and I'm interested in this feature. Thanks, Robert Yu Platform Service - Backend Morningstar Shenzhen Ltd. Morningstar. Illuminating investing worldwide. +86

RE: How to limit the number of open searchers?

2012-03-11 Thread Michael Ryan
> I'm curious, why can't you do a master/slave setup? It's just not all that useful for this particular application. Indexing new docs and merging segments - which as I understand is the main strength of having a write-only master - is a relatively small part of our app. What really is expensiv

Re: 3 Way Solr Join . . ?

2012-03-11 Thread Angelyna Bola
Bill, So sorry - my example is rapidly showing its shortcomings. The data I am actually working with is complex and obscure so I was trying to think of an example that was easy to relate to, but still has all the relevant characteristics. Let me try a better example: Let's suppose a Company is

Strange behavior with search on empty string and NOT

2012-03-11 Thread Lan
I am curious why Solr results are inconsistent for the queries below, which search for an empty string on a TextField. q=name:"" returns 0 results. q=name:"" AND NOT name:"FOOBAR" returns all results in the Solr index. Should it not return 0 results too? Here is the debugQuery. 0 1 on on 0 name:

Re: 3 Way Solr Join . . ?

2012-03-11 Thread Bill Bell
You can do concatenation joins and then put the result into Solr. You can denormalize the results. Everyone is telling you the same thing. Select customer_name, (select group_concat(city) from address where nameid=customers.nameid) as state_bar from customers. DIH handler has a way to split on comma to add
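A minimal sketch of the denormalization described above, written in Python rather than SQL. Child rows are grouped by the parent key and joined with commas (which is what `group_concat` does), then split back into a multi-valued field before indexing. The table and column names (`customers`, `address`, `nameid`, `city`, `customer_name`) come from the query in the message; the sample rows, the `cities` field name, and the helper function names are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the customers and address tables.
customers = [
    {"nameid": 1, "customer_name": "Alice"},
    {"nameid": 2, "customer_name": "Bob"},
]
address = [
    {"nameid": 1, "city": "Denver"},
    {"nameid": 1, "city": "Boulder"},
    {"nameid": 2, "city": "Austin"},
]


def group_concat_cities(address_rows):
    """Emulate group_concat(city), grouped by nameid."""
    grouped = defaultdict(list)
    for row in address_rows:
        grouped[row["nameid"]].append(row["city"])
    return {nameid: ",".join(cities) for nameid, cities in grouped.items()}


def denormalize(customer_rows, address_rows):
    """Build one flat document per customer with a multi-valued field."""
    concatenated = group_concat_cities(address_rows)
    docs = []
    for customer in customer_rows:
        joined = concatenated.get(customer["nameid"], "")
        docs.append({
            "customer_name": customer["customer_name"],
            # Splitting on the comma mirrors what the indexing side would do.
            "cities": joined.split(",") if joined else [],
        })
    return docs


docs = denormalize(customers, address)
```

Note that this approach assumes the child values never contain a comma themselves; if they can, a different separator would be needed on both the concatenation and the split side.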

Re: Faster Solr Indexing

2012-03-11 Thread Mikhail Khludnev
Dmitry, If you start to speak about logging, don't forget to say that JDK logging is not performant at all, yet it is the default for 3.x. Logback is much faster. Peyman, 1. Shingles have performance implications; that is, they can be costly. Why are term positions and phrase queries not enough f

Re: 3 Way Solr Join . . ?

2012-03-11 Thread Bill Bell
Sure, we do this a lot for smaller indexes. Create a string field, not a text field. Store it. Then it will come out when you do a simple select query. On Mar 11, 2012, at 11:09 AM, Angelyna Bola wrote: > William, > > :: You can also use external fiel

Re: Vector based queries

2012-03-11 Thread Bill Bell
It is way too slow. On Mar 11, 2012, at 12:07 PM, Pat Ferrel wrote: > I found a description here: > http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ > > If it is the same four years later, it looks like lucene is doing an index > look

Re: Vector based queries

2012-03-11 Thread Pat Ferrel
I found a description here: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ If it is the same four years later, it looks like lucene is doing an index lookup for each important term in the example doc boosting each term based on the term weights. My guess would be that this

Re: embedded server / servlet container

2012-03-11 Thread Arjun Dhar
Hi I was looking for info on the embedded server too. So there is no pure API version as a dependency that I can control and run via the webapp code? Solr is so popular, I'd assume also it has a JMX enabled API. I should not have the need for JSPs, servlets etc if I want to index, query and integr

Re: Vector based queries

2012-03-11 Thread Pat Ferrel
MoreLikeThis looks exactly like what I need. I would probably create a new "like" method to take a mahout vector and build a search? I build the vector by starting from a doc and reweighting certain terms. The prototype just reweights words but I may experiment with dirichlet clusters and rewei

Re: Custom Sharding on solrcloud

2012-03-11 Thread Mark Miller
Hmm...let me think. At a minimum we intend to make the hashing mechanism pluggable...need to think if there is something else you could try now... On Mar 8, 2012, at 4:28 AM, Phil Hoy wrote: > Hi, > > If I remove the DistributedUpdateProcessorFactory I will have to manage a > master slave

Re: 3 Way Solr Join . . ?

2012-03-11 Thread Angelyna Bola
William, :: You can also use external fields, or store formatted info into a String field in json or xml format. Thank you for the idea . . . I have tried to load xml formatted data into Solr (not to be confused with the Solr XML load format), but not had any luck. Could you please point me to a

Re: 3 Way Solr Join . . ?

2012-03-11 Thread Angelyna Bola
Walter, :: Fields can be multi-valued. Put multiple phone numbers in a field and match all of them. Thank you for the suggestion, unfortunately I oversimplified my example =( Let me try again: I should have said that I need to match on 2 fields (as a set) from within a given child table

Re: Knowing which fields matched a search

2012-03-11 Thread Paul Libbrecht
Russel, there's been a thread on that in the Lucene world... it's not really perfect yet. The debugQuery suggestion gives only, in my experience, the explain monster, which is good for developers (only). paul On 11 March 2012 at 08:40, William Bell wrote: > debugQuery tells you. > > On F

Re: Vector based queries

2012-03-11 Thread Paul Libbrecht
Maybe that's exactly it but... given a document with n tokens A and m tokens B, a query A^n B^m would find what you're looking for, or not? paul PS I've always viewed queries as linear forms on the vector space and I'd like to see this really mathematically written one day... On 11 March 2012 at 07:
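The A^n B^m idea above can be sketched as a small Python helper that turns a token list into a boosted query string, using the `term^boost` syntax of the Lucene query parser. This is only an illustration of the suggestion, not how MoreLikeThis is implemented. The function name, the `text` field name, and the `max_terms` cut-off are assumptions, and the sketch assumes tokens are plain alphanumeric words that need no query-syntax escaping.

```python
from collections import Counter


def boosted_query(tokens, field="text", max_terms=25):
    """Build a query where each term is boosted by its count in the document.

    `field` and `max_terms` are hypothetical defaults chosen for illustration.
    """
    counts = Counter(tokens)
    # Keep only the most frequent terms so the query stays small.
    clauses = [
        f"{field}:{term}^{count}"
        for term, count in counts.most_common(max_terms)
    ]
    return " ".join(clauses)


# A document with two occurrences of "solr" and one of "lucene".
query = boosted_query(["solr", "lucene", "solr"])
```

Raw counts are the simplest possible weights; a reweighted vector like the one discussed in this thread could be passed in instead by replacing the `Counter` with any mapping from term to weight.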

Re: Faster Solr Indexing

2012-03-11 Thread Dmitry Kan
One approach we have taken was decreasing the Solr logging level for the posting session, described here (implemented for 1.4, but should be easy to port to 3.x): http://dmitrykan.blogspot.com/2011/01/solr-speed-up-batch-posting.html On 3/11/12, Yandong Yao wrote: > I have similar issues by usin

Re: Faster Solr Indexing

2012-03-11 Thread Yandong Yao
I have similar issues using DIH, and org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand) consumes most of the time when indexing 10K rows (each row is about 70K) - DIH nextRow takes about 10 seconds in total - If the index uses a whitespace tokenizer and lower case filter, th