Re: exact match country

2013-05-26 Thread David Smiley (@MITRE.org)
Hi Bill. So it seems you want an exact match to be first even if it is outside the spatial region, right? Your suggested implementation suggests this. And apparently you want to sort by distance, notwithstanding the exact match being first. Although you don't have to do this as two queries, I t
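The ordering David describes (exact name match first, even if it lies outside the spatial region, then everything else by distance) can be stated as a two-part sort key. This is a minimal client-side sketch of that ordering only, not how Solr itself would compute it; the `Place` type and the `distance_km` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    distance_km: float  # hypothetical: distance from the user's point


def rank(places, query):
    """Exact (case-insensitive) name matches first, then ascending distance.

    False sorts before True, so `name != query` puts exact matches on top.
    """
    q = query.casefold()
    return sorted(places, key=lambda p: (p.name.casefold() != q, p.distance_km))


places = [Place("Denver City", 2.0), Place("Denver", 1500.0), Place("Dover", 10.0)]
print([p.name for p in rank(places, "denver")])
```

The far-away exact match "Denver" ranks first; the remaining places follow by distance.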

Benchmarking Solr

2013-05-26 Thread Benson Margulies
I'd like to run a repeatable test of having Solr ingest a corpus of docs on disk, to measure the speed of some alternative things plugged in. Anyone have some advice to share? One approach would be a quick SolrJ program that pushed the entire stack as one giant collection with a commit at the end.
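A repeatable ingest benchmark mostly needs a fixed corpus, a fixed batch size, and a timer around "send everything, then commit once". The sketch below assumes hypothetical `send_batch` and `commit` callables that you would back with SolrJ or plain HTTP; it is not SolrMeter or any Solr tool. Setting `batch_size` to the corpus size reproduces the "one giant collection with a commit at the end" approach.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def read_corpus(directory: Path, pattern: str = "*.txt") -> Iterator[dict]:
    """Yield one document per file, in a stable order (field names are hypothetical)."""
    for path in sorted(directory.glob(pattern)):
        yield {"id": path.name, "text": path.read_text(encoding="utf-8")}


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def time_ingest(docs: Iterable[dict],
                send_batch: Callable[[List[dict]], None],
                commit: Callable[[], None],
                batch_size: int = 1000) -> dict:
    """Time sending every batch plus a single final commit."""
    count = 0
    start = time.perf_counter()
    for batch in batched(docs, batch_size):
        send_batch(batch)
        count += len(batch)
    commit()
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return {"docs": count, "seconds": elapsed, "docs_per_sec": rate}
```

Running the same corpus against each alternative plug-in, several times each, gives comparable `docs_per_sec` figures.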

Re: index multiple files into one index entity

2013-05-26 Thread Erick Erickson
I'm still not quite getting the issue. Separate requests (i.e. any addition of a SolrInputDocument) are treated as a separate document. There's no notion of "append the contents of one doc to another based on ID", unless you're doing atomic updates. And Tika takes some care to index separate files
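To make the difference concrete: a plain add with an existing ID replaces the whole document, while a JSON atomic update with the `add` modifier appends values to a multivalued field. This sketch only builds the request bodies that would be POSTed to the `/update` handler; the field name `content` and uniqueKey `id` are assumptions, and in Solr 4.x atomic updates also rely on the update log and on the document's fields being stored.

```python
import json


def full_replace(doc_id: str, field: str, values: list) -> str:
    """Plain add: any existing document with this id is replaced entirely."""
    return json.dumps([{"id": doc_id, field: values}])


def atomic_append(doc_id: str, field: str, values: list) -> str:
    """Atomic update: append `values` to a multivalued field of an existing doc."""
    return json.dumps([{"id": doc_id, field: {"add": values}}])


print(full_replace("doc1", "content", ["text of file 2"]))
print(atomic_append("doc1", "content", ["text of file 2"]))
```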

Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery

2013-05-26 Thread Erick Erickson
Unfortunately I don't quite know the internals of this code well. I vaguely remember a problem with ensuring that deletes were handled correctly, so this may be a manifestation of that fix. As I remember, optimistic locking is mixed up in this too. But all that means is that I really can't answer y

Re: Benchmarking Solr

2013-05-26 Thread Upayavira
SolrMeter? Upayavira On Sun, May 26, 2013, at 03:38 PM, Benson Margulies wrote: > I'd like to run a repeatable test of having Solr ingest a corpus of > docs on disk, to measure the speed of some alternative things plugged > in. > > Anyone have some advice to share? One approach would be a quick

Re: fq & facet on double and non-indexed field

2013-05-26 Thread Erick Erickson
bq: What is the difference between q and fq other than cache? Very little from a functional standpoint, i.e. q=abc AND def and q=abc&fq=def return the same set of results. The differences are: 1> the fq clause can be cached efficiently 2> the terms in the fq clause don't contribute to the score of
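The two request forms Erick compares differ only in where the second clause goes. This sketch builds both query strings with the standard library; `abc` and `def` are the placeholder terms from the message. Moving `def` into `fq` keeps the same matching documents, but that clause becomes cacheable in the filter cache and stops contributing to scores.

```python
from urllib.parse import urlencode


def combined_query(a: str, b: str) -> str:
    # Both clauses in q: both participate in relevance scoring.
    return urlencode({"q": f"{a} AND {b}"})


def query_with_filter(a: str, b: str) -> str:
    # Second clause moved to fq: same result set, cached, not scored.
    return urlencode({"q": a, "fq": b})


print(combined_query("abc", "def"))     # q=abc+AND+def
print(query_with_filter("abc", "def"))  # q=abc&fq=def
```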

Re: Core admin action "CREATE" fails to persist some settings in solr.xml with Solr 4.3

2013-05-26 Thread Erick Erickson
I'm beginning to hate solr.xml. That stuff should definitely be persisted; please raise a JIRA and assign it to me. Thanks, Erick On Thu, May 23, 2013 at 5:10 PM, André Widhani wrote: > When I create a core with Core admin handler using these request parameters: > > action=CREATE > &name=cor
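For reference, a CoreAdmin CREATE call is an ordinary GET against `/admin/cores` with `action=CREATE` plus the core's parameters. The parameters André actually used are cut off in the preview, so the core name, `instanceDir`, and `dataDir` values below are hypothetical placeholders; the sketch only assembles the URL.

```python
from urllib.parse import urlencode


def core_create_url(base: str, name: str, instance_dir: str, **extra: str) -> str:
    """Build a CoreAdmin CREATE URL; `extra` carries optional params such as dataDir."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    params.update(extra)
    return f"{base.rstrip('/')}/admin/cores?{urlencode(params)}"


# Hypothetical values, for illustration only.
print(core_create_url("http://localhost:8983/solr", "example_core", "example_core",
                      dataDir="/var/solr/data/example_core"))
```

After such a call, the question in this thread is whether solr.xml then records every one of those settings.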

Re: Note on The Book

2013-05-26 Thread Erick Erickson
Jack: Kudos for carrying on! Having a contract canceled after putting a lot of work into it must be a bummer... Personally I'm not buying many paper books any more, so the e-book version is preferable for me, so take this with a grain of salt.. but make the paper version spiral bound, _please_. I

Re: Distributed query: strange behavior.

2013-05-26 Thread Erick Erickson
Valery: I share your puzzlement. _If_ you are letting Solr do the document routing, and not doing any of the custom routing, then the same unique key should be going to the same shard and replacing the previous doc with that key. But, if you're using custom routing, if you've been experimenting w
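The property Erick relies on is that Solr's own routing is a deterministic function of the unique key, so a re-sent document always lands on the shard that already holds it and overwrites it. The sketch below illustrates only that property with CRC32 modulo the shard count; this is a deliberately simplified stand-in, since SolrCloud's compositeId router hashes with MurmurHash3 and maps the hash onto per-shard ranges.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Simplified key-based routing: the same id always maps to the same shard.

    CRC32 modulo the shard count is an illustrative substitute, not Solr's router.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Every update of "doc-42" goes to one shard, so it replaces the previous copy.
print({shard_for("doc-42", 4) for _ in range(10)})
```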

Re: Note on The Book

2013-05-26 Thread Jack Krupansky
Thanks, Erick. I could do the experiment of publishing both spiral and perfect bound and see which "wins". Spiral does have the one downside of not standing out on a shelf. But, for now, I'll focus on getting the (rough draft) e-book available ASAP. -- Jack Krupansky -Original Message

Re: Tika: How can I import automatically all metadata without specifiying them explicitly

2013-05-26 Thread Erick Erickson
In addition to Alexandre's comment: bq: ...I’d like to import in my index all metadata Be a little careful here, this isn't actually very useful in my experience. Sure it's nice to have all that data in the index, but... how do you search it meaningfully? Consider that some doc may have an "aut

configuring shard handler at a more 'global' level?

2013-05-26 Thread Shawn Heisey
SOLR-3221 added the ability to configure the shard handler in Solr. In particular, increasing maxConnectionsPerHost is important for scalability, and many people might want to enable fairnessPolicy. http://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searche
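As we understand the wiki page Shawn links, the shard handler is configured with a `shardHandlerFactory` element nested inside a search request handler in solrconfig.xml, which is why it currently has to be repeated per handler. This sketch just generates such a fragment with ElementTree; the values `100` and `true` are arbitrary examples, and the exact element types should be checked against the wiki before use.

```python
import xml.etree.ElementTree as ET


def shard_handler_factory(max_connections_per_host: int, fairness_policy: bool) -> str:
    """Render a shardHandlerFactory fragment (example values are illustrative)."""
    factory = ET.Element("shardHandlerFactory", {"class": "HttpShardHandlerFactory"})
    ET.SubElement(factory, "int", {"name": "maxConnectionsPerHost"}).text = str(
        max_connections_per_host)
    ET.SubElement(factory, "bool", {"name": "fairnessPolicy"}).text = (
        "true" if fairness_policy else "false")
    return ET.tostring(factory, encoding="unicode")


print(shard_handler_factory(100, True))
```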

Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

2013-05-26 Thread Dotan Cohen
On Fri, May 24, 2013 at 4:04 PM, Jack Krupansky wrote: > The primary purpose of this filter is in conjunction with the > KeywordRepeatFilterFactory and a stemmer, to remove the tokens that did not > produce a stem from the original token, so the keyword duplicate is no > longer needed. The goal is

Re: Why would one not use RemoveDuplicatesTokenFilterFactory?

2013-05-26 Thread Jack Krupansky
The only comment I was trying to make here is the relationship between the RemoveDuplicatesTokenFilterFactory and the KeywordRepeatFilterFactory. No, stemmed terms are not considered the same text as the original word. By definition, they are a new value for the term text. -- Jack Krupansky
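The interplay Jack describes can be simulated on a toy token stream: KeywordRepeat emits each token twice at the same position (one copy protected from stemming), the stemmer rewrites only the unprotected copy, and RemoveDuplicates drops a token whose text already occurred at that position. The `toy_stem` function is a hypothetical stand-in for a real stemmer such as Porter, and this models the filters' behavior rather than Lucene's actual classes.

```python
from typing import List, Tuple

Token = Tuple[str, int, bool]  # (text, position, is_keyword)


def keyword_repeat(terms: List[str]) -> List[Token]:
    """Emit each term twice at one position: keyword-flagged, then unflagged."""
    out: List[Token] = []
    for pos, term in enumerate(terms):
        out.append((term, pos, True))
        out.append((term, pos, False))
    return out


def toy_stem(term: str) -> str:
    """Hypothetical stemmer: strips 'ing' or 's' from longer words."""
    for suffix in ("ing", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def stem(tokens: List[Token]) -> List[Token]:
    """Stem only tokens that are not flagged as keywords."""
    return [(t if kw else toy_stem(t), pos, kw) for t, pos, kw in tokens]


def remove_duplicates(tokens: List[Token]) -> List[Tuple[str, int]]:
    """Drop a token whose text was already seen at the same position."""
    seen = set()
    out = []
    for text, pos, _ in tokens:
        if (text, pos) not in seen:
            seen.add((text, pos))
            out.append((text, pos))
    return out


print(remove_duplicates(stem(keyword_repeat(["jumping", "cats", "the"]))))
```

"the" produces no stem, so its unchanged duplicate is removed, while "jumping"/"jump" survive side by side because, as Jack says, the stem is a different term text.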

"the collection time out" error in every operation in collections API

2013-05-26 Thread adfel70
Hi, I use Solr 4.3.0. I created 3 collections with the Collections API and reloaded one of them a few times. The cluster has been running for 2 weeks now. Today I tried creating a new collection using the Collections API and I get an error: "reloadcollection the collection time out: 60s". I then tried reloading a co

split document or not

2013-05-26 Thread Oleksiy Druzhynin
I have a document divided into paragraphs. What is the best way to add it to Solr? As a single str field: paragraph1 paragraph2 paragraph3 Or as a multivalued field: paragraph1 paragraph2 paragraph3
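The two options in the question correspond to these document shapes in Solr's JSON update format. The field names `text` and `paragraph` are hypothetical, and the second shape assumes a field declared `multiValued="true"` in the schema; with a positionIncrementGap on that field type, phrase queries will generally not match across paragraph boundaries, which is one search-driven reason to prefer it.

```python
import json

paragraphs = ["paragraph1", "paragraph2", "paragraph3"]

# Option 1: one single-valued field holding the whole text.
single_field_doc = {"id": "doc1", "text": "\n\n".join(paragraphs)}

# Option 2: one multivalued field with one value per paragraph.
multivalued_doc = {"id": "doc1", "paragraph": paragraphs}

print(json.dumps([single_field_doc]))
print(json.dumps([multivalued_doc]))
```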

Re: split document or not

2013-05-26 Thread Alexandre Rafalovitch
That depends on what you are trying to search. Start your schema design from your _search_ requirements, not your document requirements. See the presentation by Gilt on how they went through different iterations on their document schema design: http://www.slideshare.net/trenaman/lucene-revolution-

Re: split document or not

2013-05-26 Thread Upayavira
On Sun, May 26, 2013, at 10:41 PM, Oleksiy Druzhynin wrote: > I have document divider by paragraphs. How better to add it to Solr? > As single str field: > > > paragraph1 > paragraph2 > paragraph3 > > > Or multivalued fields: > paragraph1 > paragraph2 > paragraph3 Depends what

Re: exact match country

2013-05-26 Thread William Bell
Thanks David ! On Sun, May 26, 2013 at 8:02 AM, David Smiley (@MITRE.org) < dsmi...@mitre.org> wrote: > Hi Bill. > > So it seems you want an exact match to be first even if it is outside the > spatial region, right? Your suggested implementation suggests this. And > apparently you want to

RE: Fuzzy search in solr

2013-05-26 Thread Sagar Chaturvedi
Thank you, Jack, for the response. >> Fuzzy search is the syntax for a term, not a handler. For example: alpha~1 >> will match terms that have an editing distance of 0 or 1 from "alpha". So the search query string will be like - /term?q= alpha~1 >> But, are you sure you really mean "fuzzy search"

Re: Fuzzy search in solr

2013-05-26 Thread Jack Krupansky
Fuzzy query is invoked just like any other query: .../select?q=alpha~1 -- Jack Krupansky -Original Message- From: Sagar Chaturvedi Sent: Sunday, May 26, 2013 11:27 PM To: solr-user@lucene.apache.org Subject: RE: Fuzzy search in solr Thank you jack for the response. Fuzzy search is t
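The "editing distance of 0 or 1" behind `q=alpha~1` is an edit distance between the query term and each indexed term. This sketch uses classic Levenshtein distance over a hypothetical term list; Lucene's FuzzyQuery can additionally count a transposition of adjacent characters as a single edit and supports at most 2 edits, so treat this as an approximation of which terms match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


def fuzzy_matches(query: str, max_edits: int, terms):
    """Terms within `max_edits` of `query`, roughly what query~max_edits selects."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


print(fuzzy_matches("alpha", 1, ["alpha", "alphas", "aloha", "delta"]))
```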

Re: Distributed query: strange behavior.

2013-05-26 Thread Luis Cappa Banda
Hi, Erick! That's it! I'm using a custom implementation of a SolrServer with distributed behavior that routes queries and updates using an in-house Round Robin method. But the thing is that I'm doing this myself because I've noticed that duplicate documents appear using LBHttpSolrServer implemen
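This is why round-robin update routing produces duplicates: each shard only replaces documents it already holds, so when successive updates of the same ID land on different shards, each shard keeps its own copy and a distributed query sees both. A minimal simulation, with each shard modeled as a set of IDs (a hypothetical simplification of a Solr core):

```python
from collections import defaultdict
from itertools import cycle


def index_round_robin(doc_ids, num_shards):
    """Send each update to the next shard in turn, ignoring the document id."""
    shards = defaultdict(set)  # within a shard, re-adding an id replaces it
    targets = cycle(range(num_shards))
    for doc_id in doc_ids:
        shards[next(targets)].add(doc_id)
    return shards


def copies_per_id(shards):
    """Count how many shards hold each id; more than 1 means a visible duplicate."""
    counts = defaultdict(int)
    for ids in shards.values():
        for doc_id in ids:
            counts[doc_id] += 1
    return dict(counts)


print(copies_per_id(index_round_robin(["a", "b", "c", "a"], 2)))
```

Routing on the unique key instead, as Erick describes, sends both updates of "a" to the same shard, where the second simply overwrites the first.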