Re: core.properties setup help

2014-05-21 Thread Aman Tandon
Thanks, Erick, for replying. Actually, I need to configure this for SolrCloud, so I am confused here. Both mcat and cat have different schemas in my case. With Regards Aman Tandon On Tue, May 20, 2014 at 6:05 AM, Erick Erickson wrote: > You can actually just remove those entries from solr.xml (and a

Re: Custom filter not working with solr 4.7.1

2014-05-21 Thread Kamal Kishore Aggarwal
Thanks, Shawn, for the quick reply. I am trying to change the code (removing the errors from the code shown in the image), will test the filter after that, and will update here. Thanks Kamal Kishore On Mon, May 19, 2014 at 10:17 PM, Shawn Heisey wrote: > On 5/19/2014 1:10 AM, Kamal Kishore Aggarwal wrote

NumberFormatException in solr.SpatialRecursivePrefixTreeFieldType in solr upgrade

2014-05-21 Thread Kamal Kishore Aggarwal
I am using the following field type with Solr 4.2 and it's working fine. But when I upgrade Solr to 4.7.1, it reports the following errors while posting docs: Caused by: com.spatial4j.core.exception.InvalidShapeException: java.lang.NumberFormatException: For input string: "78.42968,30.7333

Re: Stemming for Chinese and Japanese

2014-05-21 Thread Alexandre Rafalovitch
You may want to review the series of articles about CJK support for Solr: http://discovery-grindstone.blogspot.ca/ . It probably answers more questions than you dare to ask. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating yo

SolrCloud : node recovery fails with "No registered leader was found"

2014-05-21 Thread zzT
The SolrCloud configuration contains a single shard and 2 Solr servers, so one acts as the leader and one as a replica. Through a series of events(*) I've ended up with one Solr server in "Active" status and acting as the leader of the shard, while the other one is in "Recovery failed" status which canno

Re: date range queries efficiency

2014-05-21 Thread Dmitry Kan
Thanks, Erick! On Tue, May 20, 2014 at 3:55 AM, Erick Erickson wrote: > This might be useful: > http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/ > > Best, > Erick > > On Mon, May 19, 2014 at 12:09 AM, Dmitry Kan wrote: > > Thanks, Jack, Alex and Shawn. > > > > This makes proper

grouping of multivalued fields

2014-05-21 Thread Thomas Scheffler
Hi, I have a special case of grouping multivalued fields and I wonder if this is possible with SOLR. I have a field "foo" that is generally multivalued. But for a restricted set of documents this field has one value or is not present. So normally grouping should work. Sadly SOLR is failing

Re: grouping of multivalued fields

2014-05-21 Thread Joel Bernstein
You may want to investigate the group.func option. This would allow you to plug in your own logic to return the group by key. I don't think there is an existing function that does exactly what you need so you may have to write a custom function. Joel Bernstein Search Engineer at Heliosearch On W

Re: solr-user Digest of: get.100322

2014-05-21 Thread Jack Krupansky
Just to re-emphasize the point - when provisioning Solr, you need to ASSURE that the system has enough system memory so that the Solr index on that system fits entirely in the OS file system cache. No ifs, ands, or buts. If you fail to follow that RULE, all bets are off for performance and don't

Re: grouping of multivalued fields

2014-05-21 Thread Thomas Scheffler
On 21.05.2014 15:07, Joel Bernstein wrote: You may want to investigate the group.func option. This would allow you to plug in your own logic to return the group by key. I don't think there is an existing function that does exactly what you need so you may have to write a custom function. I th

Distributed Search in Solr with different queries per shard

2014-05-21 Thread Avner Levy
I have 2 cores. One with active data and one with historical data (for documents which were removed from the active one). I want to run Distributed Search on both and get the unified result (as supported by Solr Distributed Search, I'm not using Solr Cloud). My problem is that the query for each

Date truncation and time zone when searching

2014-05-21 Thread jimi.hullegard
Hi, What is the preferred way to do searches with date truncation with respect to a specific time zone? The dates are stored correctly, i.e. I can see the UTC date in the index, and if I add 1 or 2 hours (depending on daylight saving time or not) I get the time in our time zone (CET/CEST). But when

RE: Date truncation and time zone when searching

2014-05-21 Thread Michael Ryan
Well for CEST, which is 2 hours ahead, I would think you could just do... datefield:[* TO NOW/MONTH-2HOURS] That would give you everything up to 2014-04-30 22:00:00 GMT, which is 2014-05-01 00:00:00 CEST. Always always always store the correct value. -Michael -Original Message- From:
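The arithmetic behind the manual offset can be checked with a short Python sketch. It assumes a fixed +2 hour offset for CEST and a made-up "now" of 2014-05-21 12:00 UTC; the helper name month_start_utc is hypothetical, and datefield is the field name used in the message above.

```python
from datetime import datetime, timedelta, timezone


def month_start_utc(now_utc, offset_hours):
    """Return the start of the current local month, expressed in UTC.

    offset_hours is the fixed local offset from UTC (assumed: 2 for CEST).
    """
    local_tz = timezone(timedelta(hours=offset_hours))
    local_now = now_utc.astimezone(local_tz)
    # Truncate to the first instant of the month in local time.
    local_month_start = local_now.replace(
        day=1, hour=0, minute=0, second=0, microsecond=0
    )
    return local_month_start.astimezone(timezone.utc)


# Hypothetical "now"; any moment in May 2014 gives the same bound.
now = datetime(2014, 5, 21, 12, 0, tzinfo=timezone.utc)
bound = month_start_utc(now, 2)

# Same range shape as the message above, with the bound spelled out.
solr_range = "datefield:[* TO %s]" % bound.strftime("%Y-%m-%dT%H:%M:%SZ")
print(solr_range)
```

The printed bound matches the 2014-04-30 22:00:00 GMT value quoted above. A fixed offset ignores the CET/CEST switch, which is the limitation raised in the follow-up.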

Re: Date truncation and time zone when searching

2014-05-21 Thread jimi.hullegard
OK. It feels a bit strange that one would have to do this manual calculation in every place that performs searches like this. It would be much more logical if Solr supported specifying the time zone in the query (with a default setting being possible to configure in solrconfig), and that Solr itself di

Re: solr-user Digest of: get.100322

2014-05-21 Thread Shawn Heisey
On 5/21/2014 7:28 AM, Jack Krupansky wrote: > Just to re-emphasize the point - when provisioning Solr, you need to > ASSURE that the system has enough system memory so that the Solr index > on that system fits entirely in the OS file system cache. No ifs, > ands, or buts. If you fail to follow that

Re: solr problem after indexing, shutdown and startup

2014-05-21 Thread Michael Della Bitta
Two possibly unrelated things: 1. Don't commit until the end. 2. Consider not optimizing at all. You might want to look at your autocommit settings in your solrconfig.xml. You probably want soft commits set at something north of 10 seconds, and hard commits set to openSearcher=false with a maxTi

Re: Indexing getting failed after some millions of documents

2014-05-21 Thread Erick Erickson
How much memory have you allocated to the JVMs? Also, what does the Solr log show on the machine that isn't coming up? Sounds like the node went down and perhaps went into recovery. And how are you indexing? Best, Erick On Tue, May 20, 2014 at 11:54 PM, Tim Burner wrote: > Hi Everyone, > > I

Re: core.properties setup help

2014-05-21 Thread Erick Erickson
then don't worry about cores. Use the collections API to create your collections. Note: you use the ZkCli script to push configuration files up to ZK and give them a name, so you can have your "mcat" and "cat" configurations. Then when you create the collection, you tell it which set of configurat

Re: SolrCloud : node recovery fails with "No registered leader was found"

2014-05-21 Thread Erick Erickson
What version of Solr? On Wed, May 21, 2014 at 2:23 AM, zzT wrote: > SolrCloud configuration contains a single shard and 2 Solr servers, therefore > one acts as a leader and one as a replica. > > Through a series of events(*) I've ended up with one Solr server being in > "Active" status and the le

Re: Distributed Search in Solr with different queries per shard

2014-05-21 Thread Erick Erickson
I suppose you could, but I _really_ question whether it's a wise investment in time. Personally I'd treat them as two different collections and have the app layer fire off two queries and do the aggregation (this is a variant of "federated search" I think). This removes your issue with having the c
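As a rough sketch of the app-layer aggregation described above, the Python below merges two already-fetched result lists by score. Everything here is an assumption for illustration: the function name merge_by_score, the document shape (dicts with "id" and "score"), and the idea that scores from two separate collections are comparable at all, which they often are not.

```python
import heapq
from itertools import islice
from operator import itemgetter


def merge_by_score(active_docs, historical_docs, rows=10):
    """Merge two result lists, each already sorted by score descending.

    Returns the top `rows` documents across both lists.
    """
    merged = heapq.merge(
        active_docs, historical_docs, key=itemgetter("score"), reverse=True
    )
    return list(islice(merged, rows))


# Hypothetical results from two separate queries, one per core.
active = [{"id": "a1", "score": 3.0}, {"id": "a2", "score": 1.0}]
historical = [{"id": "h1", "score": 2.5}, {"id": "h2", "score": 0.5}]

top = merge_by_score(active, historical, rows=3)
print([doc["id"] for doc in top])
```

Facets, paging and sorting on other fields would each need their own merge logic, which is where most of the effort in this approach goes.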

Using fq as OR

2014-05-21 Thread johnmunir
Hi, Currently, I'm building my search as follows: q=(search string ...) AND (type:type_a OR type:type_b OR type:type_c OR ...) Which means anything I search for will be AND'ed to be in either fields that have "type_a", "type_b", "type_c", etc. (I have defaultOperator set to "AND") Now

boosting multivalued fields

2014-05-21 Thread vit
Is it possible to boost values of the same field? For example, in a query like this: category_id:(2271578^0.5 22718986^0.4 475101^0.2) -- View this message in context: http://lucene.472066.n3.nabble.com/boosting-multivalued-fields-tp4137409.html Sent from the Solr - User mailing list archive at

Re: Date truncation and time zone when searching

2014-05-21 Thread Erick Erickson
Try the TZ parameter on the query, as blah&TZ=GMT-4 There's a good discussion of why PDT is ambiguous here: https://issues.apache.org/jira/browse/SOLR-2690. You can define whatever default parameters you want in your handler in the section, the TZ parameter included. Best Erick On Wed, May 21
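To illustrate the TZ parameter, here is a small Python sketch that builds a query string in which NOW/MONTH would be rounded in a named time zone rather than UTC. The zone ID Europe/Stockholm, the field name datefield, and the host and core in the URL are assumptions for illustration, not taken from the thread.

```python
from urllib.parse import urlencode

# TZ affects how date math rounding (such as /MONTH) is evaluated.
params = {
    "q": "datefield:[* TO NOW/MONTH]",
    "TZ": "Europe/Stockholm",  # assumed zone ID for CET/CEST
}

query_string = urlencode(params)

# Hypothetical host and core name.
url = "http://localhost:8983/solr/collection1/select?" + query_string
print(url)
```

A named zone follows daylight saving changes, so the query does not need a hand-written -1HOUR or -2HOURS adjustment.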

Re: solr problem after indexing, shutdown and startup

2014-05-21 Thread Erick Erickson
Sending 32K docs at a time is a bit of overkill, I usually stay around 1,000. I agree with Michael, it's rarely a Good Thing to do the commit in line, except (perhaps) at the very end. Just let the autocommit settings take care of it for you. Your log shows numbers of "overlapping ondeck searcher

Re: SolrJ Unicode problem

2014-05-21 Thread Sivathanupillai, Jothi
Great! This solution worked for me. Jothi Sivathanupillai, IT Programmer Analyst Principal I

Re: Using fq as OR

2014-05-21 Thread Shawn Heisey
On 5/21/2014 9:26 AM, johnmu...@aol.com wrote: > Currently, I'm building my search as follows: > > > q=(search string ...) AND (type:type_a OR type:type_b OR type:type_c OR > ...) > > > Which means anything I search for will be AND'ed to be in either fields that > have "type_a", "type_b", "ty

Re: Using fq as OR

2014-05-21 Thread Jack Krupansky
How would you characterize the differences that you see when you try "q=search string ...&fq=type:(type_a OR type_b OR type_c OR ...)"? That does look like the right way to do it. Is the count of documents different? Are some documents missing or added? Or is it just the ordering of documents
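A minimal Python sketch of building that fq value, so the field name is applied to every alternative and not only to the first one. The helper name build_field_filter is hypothetical, and the values are assumed to be simple tokens that need no escaping.

```python
def build_field_filter(field, values):
    """Return field:(v1 OR v2 ...) so the field applies to every value."""
    if not values:
        raise ValueError("at least one value is required")
    return "%s:(%s)" % (field, " OR ".join(values))


params = {
    "q": "search string",
    "fq": build_field_filter("type", ["type_a", "type_b", "type_c"]),
}
print(params["fq"])
```

Without the parentheses, only the first value is tied to the type field and the remaining values are searched against the default field.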

Re: Distributed Search in Solr with different queries per shard

2014-05-21 Thread Jack Krupansky
Unfortunately the same query will be sent to all cores if you use the shards parameter to query multiple cores. Is there some characteristic of the first core that is distinct from the second core so that you could OR the differences between the two? -- Jack Krupansky -Original Message--

Re: core.properties setup help

2014-05-21 Thread Aman Tandon
Thanks, Erick, for the suggestion. Yes, you are right, I am in a hurry to implement it. I will invest some time to properly understand the workings of SolrCloud :) With Regards Aman Tandon On Wed, May 21, 2014 at 8:38 PM, Erick Erickson wrote: > then don't worry about cores. Use the collections AP

Re: Using fq as OR

2014-05-21 Thread johnmunir
Answering Jack's question first: the result is different, by a few counts, but I found my problem: I was using the wrong syntax in my code vs. what I posted here. I was using q=(search string ...) AND (type:type_a OR type_b OR type_c OR ...) (see how I left out "type:" from "type_b" and "t

Re: Date truncation and time zone when searching

2014-05-21 Thread Chris Hostetter
: Try the TZ parameter on the query, as blah&TZ=GMT-4 Docs... https://cwiki.apache.org/confluence/display/solr/Working+with+Dates : There's a good discussion of why PDT is ambiguous here: : https://issues.apache.org/jira/browse/SOLR-2690. -Hoss http://www.lucidworks.com/

multiple queries in single request

2014-05-21 Thread Pavel Belenkovich
Hi, I have a list of 1000 values for some field which is a sort of id (essentially unique between documents) (let's say firstname_lastname). I need to get the document for each id (to know which document is for which id, not just a list of responses). Is there some support for multiple queries in sin

SOLR 4.8 Collections API Create Collection Fails on Tomcat

2014-05-21 Thread christopher palm
Hello, I am trying to use the collections API to add in a secondary collection similar to what is described in this article http://heliosearch.org/solrcloud-assigning-nodes-machines/ It works fine on a jetty setup where bootstrap config dir is specified. When I try to run this on a tomcat solr that

Re: SOLR 4.8 Collections API Create Collection Fails on Tomcat

2014-05-21 Thread Joel Bernstein
Chris, I wasn't able to reproduce the error with a stock install of the HelioSearch Distribution for Solr (HDS). HDS runs under Tomcat. Looking at the stack trace this looks like the problem: Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/ebs-d

Re: Date truncation and time zone when searching

2014-05-21 Thread jimi.hullegard
Thanks! I totally forgot to add the word "math" (as in 'solr date math time zone') when searching for a solution on this, so I never stumbled upon that JIRA issue. Will give it a try. /Jimi > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent:

[ANNOUNCE] Luke 4.8.0 released

2014-05-21 Thread Dmitry Kan
Hello, Luke 4.8.0 has been released. Download it here: https://github.com/DmitryKey/luke/releases/tag/4.8.0 The release was tested against the solr-4.8.0 index. This is a trivial update (maven, no code changes), but a required one: the index version changed since the last release. This version has als

Re: Using fq as OR

2014-05-21 Thread Jack Krupansky
The whole point of a filter query is to hide data but without impacting the scoring for the non-hidden data. A second goal is performance since the filter query can be cached. So, the immediate question for you is whether you really want a true filter query, or if you actually do what the filt

Re: SOLR 4.8 Collections API Create Collection Fails on Tomcat

2014-05-21 Thread cpalm
Joel, Yes we use that to force the data dir to be someplace other than solr.home. Removing the data dir field just puts the data in solr home, which isn't desirable. solr home is specified in conf/Catalina/localhost/solr.xml Is there any way to use the collections api with a data.dir set? Th

Re: Using fq as OR

2014-05-21 Thread johnmunir
Hi Jack, I'm going after speed per: https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefq%28FilterQuery%29Parameter If using "fq" means ranking will now be different, I need to understand why. Even more, I'm now wondering, which ranking is correct th

Re: Using fq as OR

2014-05-21 Thread Walter Underwood
I think of it this way: fq: select, do not score q: select, score bq/boost: do not select, score This looks better in a 2x3 table. wunder On May 21, 2014, at 2:05 PM, "Jack Krupansky" wrote: > The whole point of a filter query is to hide data but without impacting the > scoring for the non-h

Re: Using fq as OR

2014-05-21 Thread Jack Krupansky
As I indicated in my original response, the fq query terms do not participate in any way in the scoring of documents - they merely filter (eliminate or keep) documents. If you actually do want the fq terms to participate in the scoring of documents, either keep them on the original q query, or

Re: multiple queries in single request

2014-05-21 Thread Jack Krupansky
Nothing special for this use case. This seems to be a use case that I would call "bulk data retrieval - based on ID". I would suggest "batching" your requests - limit each request query to, say, 50 or 100 IDs. -- Jack Krupansky -Original Message- From: Pavel Belenkovich Sent: Wed
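The batching suggested above could look something like this Python sketch. The field name id, the generated sample values, and the helper names are all hypothetical; each value is quoted so that characters such as spaces do not break the query.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def batched_id_queries(field, ids, batch_size=100):
    """Yield one field:(... OR ...) query per batch of IDs."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        terms = " OR ".join(quote_term(v) for v in batch)
        yield "%s:(%s)" % (field, terms)


# Hypothetical list of 1000 unique IDs.
ids = ["user_%04d" % n for n in range(1000)]
queries = list(batched_id_queries("id", ids, batch_size=100))
print(len(queries))
```

Each request would also need rows set to at least the batch size, and the caller can match documents back to IDs through the id field returned with each document.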

Re: boosting multivalued fields

2014-05-21 Thread Jack Krupansky
Yes. -- Jack Krupansky -Original Message- From: vit Sent: Wednesday, May 21, 2014 11:20 AM To: solr-user@lucene.apache.org Subject: boosting multivalued fields Is it possible to boost values of the same field? For example, in a query like this: category_id:(2271578^0.5 22718986^0.4 4

Re: Using fq as OR

2014-05-21 Thread johnmunir
Interesting!! I did not know that using "fq" means the result will NOT be scored. When you say "add a boosting query using the bq parameter" can you give me an example? I read on "bq" but could not figure out how to convert: q=(searchstring ...) AND (type:type_a OR type:type_b OR type:ty

Re: SOLR 4.8 Collections API Create Collection Fails on Tomcat

2014-05-21 Thread cpalm
What I did to finally get this working was remove the Catalina/localhost/solr.xml which was allowing the binary to be run from a different location, I made sure the solr.solr.home=/ebs-data/solr/conf was set to the file mount I wanted and I then had to remove the Dsolr.data.dir=/ebs-data/solr/data

Re: Using fq as OR

2014-05-21 Thread Jack Krupansky
The results will be scored, but only based on terms in q, not terms in fq. -- Jack Krupansky -Original Message- From: johnmu...@aol.com Sent: Wednesday, May 21, 2014 6:41 PM To: solr-user@lucene.apache.org Subject: Re: Using fq as OR Interesting!! I did not know that using "fq" means

commit on a spec shard with SolrCloud

2014-05-21 Thread YouPeng Yang
Hi, I am doing DIH to one of the shards in my SolrCloud collection. I notice that every time I do a commit in the shard, all the other shards do a commit too. I have checked the source code of DistributedUpdateProcessor.processCommit; it says that processCommit extends to all the shards in the collection. W

Re: commit on a spec shard with SolrCloud

2014-05-21 Thread Alexandre Rafalovitch
You can probably do a custom update request processor chain and skip the distributed component. No idea of the consequences though. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, May 22, 2

Applying boosting for keyword search

2014-05-21 Thread manju16832003
Hi, I have a scenario whereby I apply boosting in the following two cases - Usual search, by user selection - Keyword search. I have a field *keyword* that is a copy/combination of many fields. When the user does the usual query, my boosting works fine. This is how I do boosting: /select?q=featured:t

Re: Using fq as OR

2014-05-21 Thread manju16832003
The *fq* is used for searching more deterministic results, something like WHERE type={}, whereas *q* is something like WHERE type like '%%'. Use *fq* if you are sure of what you're going to search for; use *q* if you're not sure what you're trying to search for. If you are using fq and if you do not get any matchin

Re: Applying boosting for keyword search

2014-05-21 Thread Jack Krupansky
Just add the boost to the keyword: q=toyota^100. Or, use the dismax or edismax query parsers and then the boost can be specified for the field: qf=keyword^100. -- Jack Krupansky -Original Message- From: manju16832003 Sent: Thursday, May 22, 2014 12:04 AM To: solr-user@lucene.apache.
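The two options above can be written out as request parameters. This Python sketch only builds the query strings; the search term toyota and the field keyword come from the thread, while setting defType=edismax explicitly is an assumption about how the parser would be selected.

```python
from urllib.parse import urlencode

# Option 1: boost the term itself in the main query.
term_boost = {"q": "toyota^100"}

# Option 2: let edismax apply the boost to a field via qf.
field_boost = {"defType": "edismax", "q": "toyota", "qf": "keyword^100"}

for params in (term_boost, field_boost):
    # urlencode percent-encodes the caret as %5E.
    print(urlencode(params))
```

Note that qf takes field names with optional boosts, not field:value pairs.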

pdfs

2014-05-21 Thread Brian McDowell
Has anyone had issues with indexing pdf files? Some pdfs are bringing down Solr completely so that it actually needs to be manually restarted. We are using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the problem because the release notes associated with the new tika version and also

Re: Applying boosting for keyword search

2014-05-21 Thread manju16832003
Hi Jack, Thanks for your help. I do not want to boost the *keyword* field. I apply full text search on the keyword field and boost based on another field, *featured*. Also, the qf field allows us to boost the field without values. I would like to boost with a value. Ex: qf=featured:true^100 - I don't think this

Re: pdfs

2014-05-21 Thread Alexandre Rafalovitch
Run Tika in a client instead? (Or as a standalone server listening over a TCP socket.) Ship only extractions to Solr. This is more efficient as well. I suspect there would always be PDFs that cause strange behaviour, even if just based on memory requirements (e.g. embedded images). If that becomes a

Re: pdfs

2014-05-21 Thread Jack Krupansky
Yeah, PDF extraction has always been at least somewhat problematic. It has improved over the years, but still not likely to be perfect. That said, I'm not aware of any specific PDF extraction issue that would bring down Solr - as opposed to causing a 500 status with an exception in PDF extract

RE: Distributed Search in Solr with different queries per shard

2014-05-21 Thread Avner Levy
Yes, there is. But since the real query is very long and complex per core, I don't want each core to work very hard on irrelevant query parts of other cores. Perhaps I can write some query plugin which will strip the unnecessary parts on each core? Thanks, Avner -Original Message- Fro

Re: Applying boosting for keyword search

2014-05-21 Thread Jack Krupansky
Your original message had "q=toyota featured:true^100" and also using bq - both are valid. If either is not working for you, please be specific about what exactly is not behaving as you expected - what the symptom is. Sometimes you have to experiment with the boost factor. -- Jack Krupansky -

RE: Distributed Search in Solr with different queries per shard

2014-05-21 Thread Avner Levy
I believe unifying multiple query results including facets, paging, sorts and other extra features on my own in the application is complex as well. Is there some Solr code I can use in the application level to unify multiple results? (this can be actually an interesting direction) The queries wer