Thanks for the info, James.
I failed to mention in my original message that we're on Solr 3.5 and we are
combining the deletes with our add/updates in the same DIH.
In searching through the archives of this mailing list, I actually found a
thread which described my problem exactly and led me t
Thank you! I now use awk to preprocess it, and it seems quite efficient. I
think the other scripting languages would also be helpful.
Returning to the post, I would like to know whether Lucene supports
substring search or not.
As you can see, one field of my document is a long string field.
What's the status of Solr 4.0? Has anyone started to use it? I heard it
supports real-time index updates; I'm interested in this feature.
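For what it's worth, an awk preprocessing step of the kind mentioned might look like this; the tab-separated layout, field names, and 200-character cap are all assumptions, not details from the thread:

```shell
# Sketch: cap an over-long string field with awk before posting to Solr.
# Column 1 is a doc id, column 2 the long string (layout is an assumption).
printf 'id1\t%s\n' "$(printf 'x%.0s' $(seq 1 300))" > raw.tsv
awk -F'\t' '{ print $1 "\t" substr($2, 1, 200) }' raw.tsv > trimmed.tsv
awk -F'\t' '{ print length($2) }' trimmed.tsv
```

Any scripting language with cheap per-line string handling (sed, perl, python) would do the same job.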
Thanks,
Robert Yu
Platform Service - Backend
Morningstar Shenzhen Ltd.
Morningstar. Illuminating investing worldwide.
+86
> I'm curious, why can't you do a master/slave setup?
It's just not all that useful for this particular application. Indexing new
docs and merging segments - which as I understand is the main strength of
having a write-only master - is a relatively small part of our app. What really
is expensiv
Bill,
So sorry - my example is rapidly showing its shortcomings. The data I
am actually working with is complex and obscure, so I was trying to
think of an example that was easy to relate to but still has all the
relevant characteristics.
Let me try a better example:
Let's suppose a Company is
I am curious why Solr results are inconsistent for the query below for an
empty-string search on a TextField.
q=name:"" returns 0 results
q=name:"" AND NOT name:"FOOBAR" returns all results in the Solr index.
Shouldn't it return 0 results too?
Here is the debugQuery.
[debugQuery output not preserved by the archive; only the parsed query
fragment "name:" survives]
You can do a concatenation join and then put it into Solr. You can
denormalize the results. Everyone is telling you the same thing.
SELECT customer_name,
       (SELECT group_concat(city) FROM address
         WHERE address.nameid = customers.nameid) AS state_bar
  FROM customers
The DIH handler has a way to split on comma to add
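As a sketch of that split-on-comma approach, DIH's RegexTransformer can turn the group_concat result back into a multi-valued field; the entity, table, and field names below are assumptions:

```xml
<entity name="customer" transformer="RegexTransformer"
        query="SELECT customer_name,
                      (SELECT group_concat(city) FROM address
                        WHERE address.nameid = customers.nameid) AS cities
                 FROM customers">
  <field column="customer_name" name="name"/>
  <!-- splitBy re-splits the comma-joined cities into multiple values -->
  <field column="cities" name="city" splitBy=","/>
</entity>
```

The target field ("city" here) would need multiValued="true" in the schema.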
Dmitry,
If you start to talk about logging, don't forget to say that JDK logging is
really not performant, but it is the default for 3.x. Logback is much
faster.
Peyman,
1. Shingles have performance implications; that is, they can cost a lot. Why
are term positions and phrase queries not enough f
Sure, we do this a lot for smaller indexes.
Create a string field, not text. Store it. Then it will come out when you do
a simple select query.
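A stored string field of the kind described would be declared roughly like this in schema.xml (the field name is an assumption):

```xml
<!-- "string" (not a text type): no analysis, stored verbatim,
     returned as-is in the response of a plain select query -->
<field name="payload" type="string" indexed="true" stored="true"/>
```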
Sent from my Mobile device
720-256-8076
On Mar 11, 2012, at 11:09 AM, Angelyna Bola wrote:
> William,
>
> :: You can also use external fiel
It is way too slow
On Mar 11, 2012, at 12:07 PM, Pat Ferrel wrote:
> I found a description here:
> http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
>
> If it is the same four years later, it looks like lucene is doing an index
> look
I found a description here:
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If it is the same four years later, it looks like Lucene is doing an
index lookup for each important term in the example doc, boosting each
term based on the term weights. My guess would be that this
Hi, I was looking for info on the embedded server too.
So there is no pure-API version, as a dependency, that I can control and run
via the webapp code?
Solr is so popular, I'd assume it also has a JMX-enabled API. I should not
need JSPs, servlets, etc. if I want to index, query and integr
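There is such a pure-API path: SolrJ's EmbeddedSolrServer runs Solr in-process behind the regular SolrServer interface, with no servlet container. A minimal sketch for the Solr 3.x API (the solr home layout and core name "core1" are assumptions):

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // CoreContainer reads solr.solr.home (solr.xml plus conf/) at init
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        SolrServer server = new EmbeddedSolrServer(container, "core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();

        container.shutdown();
    }
}
```

The same SolrServer calls work unchanged against CommonsHttpSolrServer later, so switching to a standalone Solr is a one-line change.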
MoreLikeThis looks exactly like what I need. I would probably create a new
"like" method to take a Mahout vector and build a search? I build the vector
by starting from a doc and reweighting certain terms. The prototype just
reweights words, but I may experiment with Dirichlet clusters and rewei
Hmm... let me think. At a minimum we intend to make the hashing mechanism
pluggable... need to think if there is something else you could try now...
On Mar 8, 2012, at 4:28 AM, Phil Hoy wrote:
> Hi,
>
> If I remove the DistributedUpdateProcessorFactory I will have to manage a
> master slave
William,
:: You can also use external fields, or store formatted info into a
String field in JSON or XML format.
Thank you for the idea . . .
I have tried to load xml formatted data into Solr (not to be confused
with the Solr XML load format), but not had any luck. Could you please
point me to a
Walter,
:: Fields can be multi-valued. Put multiple phone numbers in a field
and match all of them.
Thank you for the suggestion, unfortunately I oversimplified my example =(
Let me try again:
I should have said that I need to match on 2 fields (as a set) from
within a given child table
Russel,
there's been a thread on that in the Lucene world... it's not really perfect
yet.
In my experience, the debugQuery suggestion gives only the explain monster,
which is good for developers (only).
paul
On 11 March 2012 at 08:40, William Bell wrote:
> debugQuery tells you.
>
> On F
Maybe that's exactly it, but... given a document with n tokens A and m
tokens B, wouldn't a query A^n B^m find what you're looking for?
paul
PS I've always viewed queries as linear forms on the vector space and I'd like
to see this really mathematically written one day...
On 11 March 2012 at 07:
one approach we have taken was decreasing the solr logging level for
the posting session, described here (implemented for 1.4, but should
be easy to port to 3.x):
http://dmitrykan.blogspot.com/2011/01/solr-speed-up-batch-posting.html
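The gist of that approach, for the JDK logging that 3.x defaults to, is a logging.properties along these lines, applied only for the posting session (the exact levels and file location are assumptions; see the blog post for the original recipe):

```
# logging.properties for the batch-posting window only:
# suppress per-document INFO logging from the update path
.level = WARNING
org.apache.solr.level = WARNING
org.apache.solr.update.processor.level = WARNING
```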
On 3/11/12, Yandong Yao wrote:
> I have similar issues by usin
I have similar issues using DIH,
and org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand)
consumes most of the time when indexing 10K rows (each row is about 70K):
- DIH nextRow takes about 10 seconds in total
- If the index uses a whitespace tokenizer and a lower-case filter, th