bug in search with sloppy queries

2015-06-14 Thread Dmitry Kan
Hi guys, we observe a strange bug in Solr 4.10.2, whereby a sloppy query hits words it should not. The query the "e commerce" parses to SpanNearQuery(spanNear([Contents:the, spanNear([Contents:eä, Contents:commerceä], 0, true)], 300, false))

Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-14 Thread ariya bala
Hi, I encounter this peculiar case with Solr 4.10.2 where the parsed query doesn't seem to be logical: PHRASE23("reduce workforce") ==> SpanNearQuery(spanNear([spanNear([Contents:reduceä, Contents:workforceä], 1, true)], 23, true)). The question is why the Phrase("quoted string") gets convert

Re: What's wrong

2015-06-14 Thread Test Test
Re, thanks for your reply. I mock my parser like this:

@Override
public Query parse() {
    SpanQuery[] clauses = new SpanQuery[2];
    clauses[0] = new SpanTermQuery(new Term("details", "london"));
    clauses[1] = new SpanTermQuery(new Term("details", "city"));
    return new SpanNearQue

Limitation on Collections Number

2015-06-14 Thread Arnon Yogev
We're running some tests on Solr and would like a deeper understanding of its limitations. Specifically, we have tens of millions of documents (say 50M) and are comparing several "#collections x #docs_per_collection" configurations. For example, we could have a single collection with 50M
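For concreteness, the trade-off being tested can be enumerated like this (a minimal sketch; the specific collection counts are illustrative, only the 50M total comes from the post):

```python
# Candidate "#collections x #docs_per_collection" configurations for ~50M docs.
total_docs = 50_000_000
configs = [(c, total_docs // c) for c in (1, 10, 100, 1000, 10000)]
for c, per in configs:
    print(f"{c} collections x {per:,} docs per collection")
```

Every configuration holds the same corpus; what changes is how many cores/collections each Solr node must manage.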

Integrating Solr 5.2.0 with nutch 1.10

2015-06-14 Thread kunal chakma
Hi, I am very new to the Nutch and Solr platforms. I have been trying hard to integrate Solr 5.2.0 with Nutch 1.10 but am not able to do so. I have followed all the steps mentioned on the Nutch 1.x tutorial page, but when I execute the following command, bin/nutch solrindex http://localhost:8983/so

Re: Limitation on Collections Number

2015-06-14 Thread Jack Krupansky
As a general rule, there are only two ways that Solr scales to large numbers: large number of documents and moderate number of nodes (shards and replicas). All other parameters should be kept relatively small, like dozens or low hundreds. Even shards and replicas should probably be kept down to that s

Re: What's wrong

2015-06-14 Thread Jack Krupansky
Why don't you take a step back and tell us what you are really trying to do? Try using a normal Solr query parser first, to verify that the data is analyzed as expected. Did you try the surround query parser? It supports span queries. Your span query appears to require that the two terms a

Re: Limitation on Collections Number

2015-06-14 Thread Shai Erera
Thanks, Jack, for your response. But I think Arnon's question was different. If you need to index 10,000 different collections of documents in Solr (say a collection denotes someone's Dropbox files), then you have two options: index all collections in one Solr collection and add a field like collect

Re: bug in search with sloppy queries

2015-06-14 Thread Erick Erickson
My guess is that you have WordDelimiterFilterFactory in your analysis chain with parameters that break up E-Tail to both "e" and "tail" _and_ put them in the same position. This assumes that the result fragment you pasted is incomplete and "commerce" is in it, from E-Tail commerce or some such. T
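Erick's diagnosis can be sketched without Lucene: a toy position-based matcher (illustrative Python, not Solr code) shows how WordDelimiterFilter-style sub-tokens sharing a position let the phrase "e commerce" match text containing "E-Tail commerce":

```python
def index_positions(tokens):
    """tokens: list of (term, position_increment). Returns term -> [positions]."""
    positions = {}
    pos = -1
    for term, incr in tokens:
        pos += incr
        positions.setdefault(term, []).append(pos)
    return positions

def phrase_matches(positions, terms):
    """True if the terms occur at consecutive positions (i.e. slop 0)."""
    return any(
        all(p + i in positions.get(t, []) for i, t in enumerate(terms))
        for p in positions.get(terms[0], [])
    )

# "E-Tail commerce" analyzed WDF-style: "e" and "tail" are emitted at the
# same position (the second sub-token has position increment 0).
tokens = [("e", 1), ("tail", 0), ("commerce", 1)]
pos = index_positions(tokens)
print(phrase_matches(pos, ["e", "commerce"]))  # True: "e commerce" hits "E-Tail commerce"
```

Because "e" sits at the same position as "tail", "commerce" is adjacent to it, and the exact phrase matches even though the source text never contained "e commerce" as separate words.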

Re: Integrating Solr 5.2.0 with nutch 1.10

2015-06-14 Thread Erick Erickson
No clue, you'd probably have better luck on the Nutch user's list unless there are _Solr_ errors. Does your Solr log show any errors? Best, Erick On Sun, Jun 14, 2015 at 6:49 AM, kunal chakma wrote: > Hi, > I am very new to the nutch and solr plateform. I have been trying a > lot to integr

Re: Limitation on Collections Number

2015-06-14 Thread Erick Erickson
To my knowledge there's nothing built into Solr to limit the number of collections. There's nothing explicitly in place to handle many hundreds of collections either, so you're really in uncharted, certainly untested waters. Anecdotally we've heard of the problem you're describing. You say you sta

Re: Limitation on Collections Number

2015-06-14 Thread Jack Krupansky
My answer remains the same - a large number of collections (cores) in a single Solr instance is not one of the ways in which Solr is designed to scale. To repeat, there are only two ways to scale Solr, number of documents and number of nodes. -- Jack Krupansky On Sun, Jun 14, 2015 at 11:00 AM,

Re: Limitation on Collections Number

2015-06-14 Thread Shalin Shekhar Mangar
Yes, there are some known problems when scaling to a large number of collections, say 1,000 or above. See https://issues.apache.org/jira/browse/SOLR-7191 On Sun, Jun 14, 2015 at 8:30 PM, Shai Erera wrote: > Thanks Jack for your response. But I think Arnon's question was different. > > If you need

Re: Limitation on Collections Number

2015-06-14 Thread Shai Erera
> > My answer remains the same - a large number of collections (cores) in a > single Solr instance is not one of the ways in which Solr is designed to > scale. To repeat, there are only two ways to scale Solr, number of > documents and number of nodes. > Jack, I understand that, but I still feel y

Re: file index format

2015-06-14 Thread Frank Ralf
Hi, I face the same problem when trying to index DITA XML files. These are XML files but have the file extension .dita, which Solr ignores. According to java -jar post.jar -h, only the following file extensions are supported: -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls
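The skipping behavior boils down to a simple extension allow-list, which can be illustrated like this (a sketch; the extension set is a subset of the defaults quoted above, not the full list):

```python
# Illustrative only: how post.jar's default extension allow-list drops .dita files.
SUPPORTED = {"xml", "json", "csv", "pdf", "doc", "docx", "ppt", "pptx"}

files = ["guide.dita", "notes.xml", "report.pdf"]
indexed = [f for f in files if f.rsplit(".", 1)[-1] in SUPPORTED]
print(indexed)  # the .dita file is skipped even though its content is XML
```

The workaround before SOLR-7546 was to pass the extra extension explicitly via -Dfiletypes so .dita joins the allow-list.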

Solrj Tika/Cell not using defaultField

2015-06-14 Thread Charlie Hubbard
I'm having trouble getting Solr to pay attention to the defaultField value when I send a document to Solr Cell or Tika. Here is the POST I'm sending using SolrJ: POST /solr/collection1/update/extract?extractOnly=true&defaultField=text&wt=javabin&version=2 HTTP/1.1 When I get the response back, the
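One possible explanation, offered tentatively: with extractOnly=true nothing is actually indexed, and defaultField only controls where unmapped extracted content lands at index time, so it may simply have no visible effect in an extract-only response. The request above can be reconstructed like this (illustrative Python, not SolrJ):

```python
from urllib.parse import urlencode

# Rebuild the Solr Cell extract request from the post; the parameter names
# (extractOnly, defaultField, wt) are taken from the message itself.
params = {
    "extractOnly": "true",   # return extracted content instead of indexing it
    "defaultField": "text",  # catch-all field for unmapped content at index time
    "wt": "javabin",
    "version": "2",
}
url = "/solr/collection1/update/extract?" + urlencode(params)
print(url)
```

A quick experiment would be to drop extractOnly=true and check whether the extracted body then shows up in the "text" field of the indexed document.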

Re: file index format

2015-06-14 Thread Frank Ralf
Looks like this has been solved recently in the current dev branch: "SimplePostTool (and thus bin/post) cannot index files with unknown extensions" https://issues.apache.org/jira/browse/SOLR-7546 -- View this message in context: http://lucene.472066.n3.nabble.com/file-index-format-tp4199892p42

Re: Limitation on Collections Number

2015-06-14 Thread Erick Erickson
re: hybrid approach. Hmmm, _assuming_ that no single user has a really huge number of documents, you might be able to use a single collection (or a much smaller group of collections) by using custom routing. That allows you to send all the docs for a particular user to a particular shard. There are
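The routing idea Erick describes can be sketched with SolrCloud's compositeId convention: prefixing the document id with "user!" makes Solr hash the prefix, so all of a user's docs co-locate on one shard. A minimal sketch (the helper name is illustrative, not a Solr API):

```python
# compositeId routing sketch: "<route-prefix>!<doc-id>".
# Solr hashes the prefix before the "!", so ids sharing a prefix
# land on the same shard.
def routed_id(user, doc_id):
    return f"{user}!{doc_id}"

ids = [routed_id("user42", d) for d in ("a", "b")]
print(ids)  # both ids share the "user42!" prefix, hence the same shard
```

Queries for one user can then add a _route_ parameter with the same prefix so only that shard is searched, rather than fanning out to the whole collection.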

Re: file index format

2015-06-14 Thread Frank Ralf
This issue has also already been discussed in the Tika issue queue: "Add method get file extension from MimeTypes" https://issues.apache.org/jira/browse/TIKA-538 And http://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml does support DITA X

Re: Division with Stats Component when Grouping in Solr

2015-06-14 Thread kingofhypocrites
I think I have this just about working with the analytics component. It seems to fill in all the gaps that the stats component and the JSON facet don't support. It solved the following problems for me: - I am able to perform math on stats to form other stats, then I can sort on those as needed. - When

Re: Division with Stats Component when Grouping in Solr

2015-06-14 Thread Erick Erickson
Why it isn't in core Solr... Because it doesn't (and probably can't) support distributed mode. The Streaming aggregation stuff, and the (in trunk Real Soon Now) Parallel SQL support are where the effort is going to support this kind of stuff. https://issues.apache.org/jira/browse/SOLR-7560 https:

Please help test the new Angular JS Admin UI

2015-06-14 Thread Erick Erickson
And anyone who, you know, really likes working with UI code, please help make it better! As of Solr 5.2, there is a new version of the Admin UI available, and several improvements are already in 5.2.1 (release imminent). The old admin UI is still the default; the new one is available at /admin/i

Re: Issues with using Paoding to index Chinese characters

2015-06-14 Thread Zheng Lin Edwin Yeo
But I think Solr 3.6 is too far back to fall back to as I'm already using Solr 5.1. Regards, Edwin On 14 June 2015 at 14:49, Upayavira wrote: > When in 2012? I'd give it a go with Solr 3.6 if you don't want to modify > the library. > > Upayavira > > On Sun, Jun 14, 2015, at 04:14 AM, Zheng Lin

invalid index version and generation

2015-06-14 Thread Summer Shire
Hi all, every time I optimize my index with maxSegment=2, after some time the replication fails to get the filelist for a given generation. It looks like the index version and generation count get messed up. (If maxSegment=1 this never happens.) I am able to successfully reproduce this by optimizin

RE: Solr Exact match boost Reduce the results

2015-06-14 Thread JACK
Hi chillra, I have changed the index and query field configuration, but my problem is still not solved. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Exact-match-boost-Reduce-the-results-tp4211352p4211788.html Sent from the Solr - U