Re: Range facets in sharded search

2015-04-16 Thread Tomás Fernández Löbbe
Should be fixed in 5.2. See https://issues.apache.org/jira/browse/SOLR-7412 On Thu, Apr 16, 2015 at 3:18 PM, Tomás Fernández Löbbe < tomasflo...@gmail.com> wrote: > This looks like a bug. The logic to merge range facets from shards seems > to only be merging counts, not the first level elements.

SolrJ Exceptions

2015-04-16 Thread Bryan Bende
I'm trying to identify the difference between an exception when Solr is in a bad state/down vs. when it is up but an invalid request was made (maybe some bad data sent in). The JavaDoc for SolrRequest process() says: @throws SolrServerException if there is an error on the Solr server @throws IOE

Re: 5.1 'unique' facet function / calcDistinct

2015-04-16 Thread Yonik Seeley
Thanks for the feedback Levan! Could you open a JIRA issue for unique() on numeric/date fields? We don't yet have explicit numeric support for unique() and I think some changes in Lucene 5 broke treating these fields as strings (i.e. the ability to retrieve ords). -Yonik On Thu, Apr 16, 2015 at

SolrCloud 4.8.0 upgrade

2015-04-16 Thread Vincenzo D'Amore
Hi All, I have a SolrCloud cluster with 3 servers. I would like to use stats.facet, but this feature is available only if I upgrade to 4.10. May I simply redeploy the new SolrCloud version in Tomcat, or should I reload all the documents? Are there other drawbacks? Best regards, Vincenzo

Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
Hi Hoss, Maybe I'm missing something, but I tried this and got 1 hit: http://localhost:8983/solr/db/select?q=title:(Apache%20Solr%20Notes)&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&q.op=AND Then I tried this and got 0 hits: http://localhost:8983/solr/db/select?q={!field%20f=title%20v=$qq}&qq=Ap

SolrCloud Core Reload

2015-04-16 Thread Vincenzo D'Amore
Hi all, I have a SolrCloud cluster with 3 servers and there are many cores. Using the SolrCloud Admin UI Core page, if I execute core "optimize" (or "reload"), will all the cores in the cluster be optimized or reloaded, or only the selected core? Best regards, Vincenzo

Re: Range facets in sharded search

2015-04-16 Thread Tomás Fernández Löbbe
This looks like a bug. The logic to merge range facets from shards seems to only be merging counts, not the first level elements. Could you create a Jira? On Thu, Apr 16, 2015 at 2:38 PM, Will Miller wrote: > I am seeing some odd behavior with range facets across multiple > shards. When que

Range facets in sharded search

2015-04-16 Thread Will Miller
I am seeing some odd behavior with range facets across multiple shards. When querying each node directly with distrib=false the facet returned matches what is expected. When doing the same query against the collection and it spans the two shards, the facet after and between buckets are wron

Re: Differentiating user search term in Solr

2015-04-16 Thread Chris Hostetter
: The summary of your email is: client's must escape search string to prevent : Solr from failing. : : It would be a nice addition to Solr to provide a new query parameter that : tells it to treat the query text as literal text. Doing so, means you : remove the burden placed on clients to unders

Re: Spurious _version_ conflict?

2015-04-16 Thread Chris Hostetter
: I notice that the expected value in the error message matches both what : I pass in and the index contents. But the actual value in the error : message is different only in the last (low order) two digits. : Consistently. what does your client code look like? Are you sure you aren't being

Spurious _version_ conflict?

2015-04-16 Thread Reitzel, Charles
Hi All, I have been getting intermittent 409 conflict responses to updates. I check and double-check that the _version_ I am passing in matches the current value in the index. I notice that the expected value in the error message matches both what I pass in and the index contents. But the ac

Re: 1:M connectivity

2015-04-16 Thread Oded Sofer
Right, we are using that. The issue is the firewall setting needed for the cloud. We do not want to open all nodes to all other nodes. However, we found that add-index to a specific node tries to access all other nodes, though we set it to index locally on that node only. On Apr 16, 2015 7:1

Re: Indexing PDF and MS Office files

2015-04-16 Thread Walter Underwood
Turning PDF back into a structured document is like trying to turn hamburger back into a cow. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 16, 2015, at 4:55 AM, Allison, Timothy B. wrote: > +1 > > :) > >> PS: one more thing - please, tell

Re: Solr 5.x deployment in production

2015-04-16 Thread Steven White
Thanks Karl. In my case, I have to deploy Solr on Windows, AIX, and Linux (all server edition). We are a WebSphere shop, moving away from it means I have to deal with politics and culture. For Windows, I cannot use NSSM so I have to figure a solution managing Solr (at least start-up and shutdown

Re: Solr 5.x deployment in production

2015-04-16 Thread Karl Kildén
I asked a very similar question recently. You should switch to using the package as is and forget that it contains a .war. The war is now an internal component. Also switch to the new script for startup etc. I have seen several disappointed users that disagree with this decision but I assume the p

Solr 5.x deployment in production

2015-04-16 Thread Steven White
Hi folks, With Solr 5.0, the WAR file is deprecated and I see Jetty is included with Solr. What if I have my own Web server into which I need to deploy Solr, how do I go about doing this correctly without messing things up and making sure Solr works? Or is this not recommended and Jetty is the w

Re: generate uuid/ id for table which do not have any primary key

2015-04-16 Thread Vishal Swaroop
Just wondering if there is a way to generate uuid/ id in data-config without using combination of fields in query... data-config.xml On Thu, Apr 16, 2015 at 3:18 PM, Vishal Swaroop wrote: > Thanks Kaushik & Erick.. > > Though I can populate uuid by using combination of fields but need t

Re: generate uuid/ id for table which do not have any primary key

2015-04-16 Thread Vishal Swaroop
Thanks Kaushik & Erick.. Though I can populate uuid by using a combination of fields, I need to change the type to "string", else it throws "Invalid UUID String". a) I will have ~80 million records and am wondering if performance might be an issue b) So, during update I can still use combination of fiel

Re: generate uuid/ id for table which do not have any primary key

2015-04-16 Thread Erick Erickson
This seems relevant: http://stackoverflow.com/questions/16914324/solr-4-missing-required-field-uuid Best, Erick On Thu, Apr 16, 2015 at 11:38 AM, Kaushik wrote: > You seem to have defined the field, but not populating it in the query. Use > a combination of fields to come up with a unique id th

Re: generate uuid/ id for table which do not have any primary key

2015-04-16 Thread Kaushik
You seem to have defined the field, but not populating it in the query. Use a combination of fields to come up with a unique id that can be assigned to uuid. Does that make sense? Kaushik On Thu, Apr 16, 2015 at 2:25 PM, Vishal Swaroop wrote: > How to generate uuid/ id (maybe in data-config.xml

generate uuid/ id for table which do not have any primary key

2015-04-16 Thread Vishal Swaroop
How to generate uuid/ id (maybe in data-config.xml...) for table which do not have any primary key. Scenario : Using DIH I need to import data from database but table does not have any primary key I do have uuid defined in schema.xml and is uuid data-config.xml Error : Document is missi

Re: How can I temporarily detach node from SolrCloud?

2015-04-16 Thread Erick Erickson
bq: it down will either reduce your result set or cause queries to return an error Setting shards.tolerant=true will reduce your result set. If you don't set that and all replicas of a shard are down, you'll get an error. And indexing won't work if all the replicas for a shard are down. Best
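A minimal sketch of a request using the parameter Erick mentions, in Python with only the standard library. The host, port, and the collection name `collection1` are assumptions for illustration; only `shards.tolerant` comes from the message above.

```python
from urllib.parse import urlencode


def tolerant_select_url(base_url, query, tolerant=True):
    """Build a /select URL that asks Solr for partial results
    rather than an error when a shard has no live replica."""
    params = {
        "q": query,
        "wt": "json",
        "shards.tolerant": "true" if tolerant else "false",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical node and collection name.
url = tolerant_select_url("http://localhost:8983/solr/collection1", "*:*")
print(url)
```

With `tolerant=False` the same helper produces the default behaviour, where a shard with no live replica makes the whole query fail.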

Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
Thanks for trying Shawn. Looks like I have to escape the string on my client side (this isn't a clean design and can lead to errors if not all reserved tokens are escaped). I hope folks from @dev are reading this and consider adding a parameter to tell Solr the text is raw-text. Steve On Th
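For a client that talks to Solr over HTTP and cannot call SolrJ, client-side escaping could be sketched roughly as below. The character set is modelled on what SolrJ's `escapeQueryChars` handles and should be treated as an assumption to verify against your Solr version; the function name and the `title` field are only for illustration.

```python
import re
from urllib.parse import quote

# Characters the Lucene/Solr query parser treats as syntax, plus whitespace.
# Assumption: this mirrors the set escaped by SolrJ's escapeQueryChars.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&;/\s])')


def escape_query_chars(text):
    """Backslash-escape every query-syntax character in user input."""
    return _SPECIAL.sub(r"\\\1", text)


user_text = "C++ (notes)"
q = "title:" + escape_query_chars(user_text)
# The escaped query still has to be URL-encoded before it goes into the request.
print("q=" + quote(q, safe=""))
```

Escaping happens first and URL-encoding second; the two steps solve different problems and neither replaces the other.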

Re: Differentiating user search term in Solr

2015-04-16 Thread Shawn Heisey
On 4/16/2015 10:18 AM, Shawn Heisey wrote: > On 4/16/2015 10:10 AM, Steven White wrote: >> I don't follow what the "f" parameter is. Do you have a link where I can >> read more about it? I found this >> https://wiki.apache.org/solr/HighlightingParameters and >> https://wiki.apache.org/solr/Simple

Re: check If I am Still Leader

2015-04-16 Thread Erick Erickson
bq: I don't use replication so why does it has to check who is the leader Because the doc must be routed to the correct shard, and the shard leader is the machine that coordinates the indexing for that shard. I really question whether this is a fruitful course for you to take. What specific prob

Re: Differentiating user search term in Solr

2015-04-16 Thread Shawn Heisey
On 4/16/2015 10:10 AM, Steven White wrote: > I don't follow what the "f" parameter is. Do you have a link where I can > read more about it? I found this > https://wiki.apache.org/solr/HighlightingParameters and > https://wiki.apache.org/solr/SimpleFacetParameters but I'm not sure this is > what y

Re: 1:M connectivity

2015-04-16 Thread Erick Erickson
You say "the SolrCloud API". Not entirely sure what that is, do you mean the post.jar tool? Because to get much more scalable throughput, you probably want to use SolrJ and the CloudSolrServer class. That class takes a connection to Zookeeper and "does the right thing". Best, Erick On Thu, Apr 1

Re: Merge indexes in MapReduce

2015-04-16 Thread Erick Erickson
You're stating two things that are somewhat antithetical: 1: We have real-time search and 2: want to merge (and optimize) its indexes into one Needing to merge indexes implies (to me at least) that you're not really doing NRT processing as docs in the batch you're merging into your collection aren

Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
I don't follow what the "f" parameter is. Do you have a link where I can read more about it? I found this https://wiki.apache.org/solr/HighlightingParameters and https://wiki.apache.org/solr/SimpleFacetParameters but I'm not sure this is what you mean (I'm not doing highlighting for faceting). T

RE: Indexing PDF and MS Office files

2015-04-16 Thread Davis, Daniel (NIH/NLM) [C]
Indeed. Another solution is to purchase ABBYY or Nuance as a server, and have them do that work. You will even get OCR. Both offer a Linux SDK. -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Thursday, April 16, 2015 7:56 AM To: solr-user@lucene.apache

RE: Indexing PDF and MS Office files

2015-04-16 Thread Davis, Daniel (NIH/NLM) [C]
If you use pdftotext with a simple fork/exec per document, you will get about 5 MB/s throughput on a single AMD x86_64. Much of that is because of the fork/exec. I suggest that you use HTML output and UTF-8 encoding for the PDF, because that way you can get title/keywords and such as http m
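The per-document fork/exec approach described above might look roughly like this in Python. The `-htmlmeta` and `-enc` flags are the ones I know from poppler's `pdftotext`; whether your build supports them is an assumption to check with `pdftotext -h`. The function names are made up for this sketch.

```python
import subprocess


def pdftotext_command(pdf_path, binary="pdftotext"):
    """Command line for HTML output with metadata, UTF-8 encoded, written to stdout."""
    return [binary, "-htmlmeta", "-enc", "UTF-8", pdf_path, "-"]


def extract_html(pdf_path):
    """Run one pdftotext process for one document and return its output as text."""
    result = subprocess.run(
        pdftotext_command(pdf_path), check=True, capture_output=True
    )
    return result.stdout.decode("utf-8", errors="replace")


print(pdftotext_command("example.pdf"))
```

Each call starts a new process, which is the fork/exec cost mentioned in the message.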

Re: SolrCloud - Collection Browsing

2015-04-16 Thread Erick Erickson
Check that your config has a valid path to the velocity contrib. You should see something like (from Solr 4.10). and you should also see the indicated file on each of your Solr nodes. What's the full stack BTW? I'm expecting something like a class not found error somewhere down in the stack. B

Re: Differentiating user search term in Solr

2015-04-16 Thread Shawn Heisey
On 4/16/2015 9:37 AM, Steven White wrote: > What is "term" in the "defType=term", do you mean the raw word "term" or > something else? Because I tried that too in two different ways: Oops. I forgot that the term query parser (that's what "term" means -- the name of the query parser) requires tha

Conditional Filter Queries

2015-04-16 Thread Tao, Jing
Hi, I want to filter my search results by different date fields based on content type. In other words: if contentType is A, filter out results that are older than 1 year; if contentType is B, filter out results that are older than 2 years; otherwise, date does not matter. Is that possible with
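One way to express the rule described above is a single filter query with one clause per content type plus a catch-all. The sketch below only builds the `fq` string; the date field names `dateA` and `dateB` are hypothetical, and `contentType` is taken from the message.

```python
def conditional_date_filter(rules, type_field="contentType"):
    """Build an fq string from {content_type: (date_field, max_age)} rules.

    Documents whose content type has no rule pass through unfiltered.
    """
    clauses = [
        f"({type_field}:{ctype} AND {date_field}:[NOW-{max_age} TO *])"
        for ctype, (date_field, max_age) in rules.items()
    ]
    # A purely negative clause needs *:* to subtract from.
    other_types = " OR ".join(rules)
    clauses.append(f"(*:* -{type_field}:({other_types}))")
    return " OR ".join(clauses)


# Hypothetical date fields, one per content type.
fq = conditional_date_filter({"A": ("dateA", "1YEAR"), "B": ("dateB", "2YEARS")})
print(fq)
```

The ages use Solr date math, so `NOW-1YEAR` keeps only documents from the last year for that content type.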

Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
What is "term" in the "defType=term", do you mean the raw word "term" or something else? Because I tried that too in two different ways: Using correct Solr syntax: http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text}%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=term Th

Batch collecting in PostFilter

2015-04-16 Thread ha.pham
Hi all, I am implementing a PostFilter following this article https://lucidworks.com/blog/custom-security-filtering-in-solr/ We have a requirement to call the external system only once for all the documents (max 200) so below is my change: -don't call super.collect(docId) in the collect method

Re: check If I am Still Leader

2015-04-16 Thread Shawn Heisey
On 4/16/2015 7:42 AM, Adir Ben Ami wrote: > I have not mentioned before that the index are always routed to specific > machine. > Is there a way to avoid connectivity from the node to all other nodes? That capability has been added in Solr 5.1.0. https://issues.apache.org/jira/browse/SOLR-6832

Re: check If I am Still Leader

2015-04-16 Thread Shawn Heisey
On 4/16/2015 7:08 AM, Adir Ben Ami wrote: > I am using Solr 4.10.0 with tomcat and embedded Zookeeper. > I use SolrCloud in my system. > > Each Shard machine try to reach/connect with other cluster machines in order > to index the document ,it just checks if it is still the leader. > I don't use

Re: How can I temporarily detach node from SolrCloud?

2015-04-16 Thread Shawn Heisey
On 4/16/2015 8:27 AM, Oded Sofer wrote: > How can I detach node from SolrCloud (temporarily for maintenance and such > and attach it back after some time). We are using SolrCloud 4.10.0; One > Collection, and Shard per node. > The add-index is routed to specific machine base on our customize rou

Re: Differentiating user search term in Solr

2015-04-16 Thread Shawn Heisey
On 4/16/2015 7:49 AM, Steven White wrote: > defType didn't work: > > > http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=lucene > > Gave me error: > > org.apache.solr.search.SyntaxError: Expected identifier at pos 27 > str=

How can I temporarily detach node from SolrCloud?

2015-04-16 Thread Oded Sofer
How can I detach a node from SolrCloud (temporarily, for maintenance and such) and attach it back after some time? We are using SolrCloud 4.10.0; one collection, and one shard per node. The add-index is routed to a specific machine based on our customized routing logic (kind of hard-coded)

1:M connectivity

2015-04-16 Thread Oded Sofer
Given that indexing is always routed to a specific machine, is there a way to avoid connectivity from the node to all other nodes? We are using Solr 4.10; the Add/Update Index uses the SolrCloud API and documents are always added to the node that gets the API request for add-index (i.e., we are sending the add index t

Re: Indexing PDF and MS Office files

2015-04-16 Thread Charlie Hull
On 16/04/2015 12:53, Siegfried Goeschl wrote: Hi Vijay, I know this road too well :-) For PDF you can fall back to other tools for text extraction * ps2ascii.ps * XPDF's pdftotext CLI utility (more comfortable than Ghostscript) * some other tools exist as well (pdflib) Here's some file e

Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
defType didn't work: http://localhost:8983/solr/db/select?q={!q.op=AND%20df=text%20solr%20sys&fl=id%2Cscore%2Ctitle&wt=xml&indent=true&defType=lucene Gave me error: org.apache.solr.search.SyntaxError: Expected identifier at pos 27 str='{!q.op=AND df=text solr sys' Is my use of defType corr
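In the URL quoted above the `{!...` block is never closed with `}`, which may explain an error at position 27, the end of that string. A small sketch that always emits a closed local-params prefix, using only the standard library; the helper name is made up for this example.

```python
from urllib.parse import quote


def with_local_params(local_params, query_text):
    """Prefix query_text with a closed {!key=value ...} local-params block."""
    body = " ".join(f"{key}={value}" for key, value in local_params.items())
    return "{!" + body + "}" + query_text


q = with_local_params({"q.op": "AND", "df": "text"}, "solr sys")
print(q)
print("q=" + quote(q, safe=""))
```

Building the prefix in one place makes it harder to drop the closing brace when the URL is assembled by hand.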

custom search component on solrcloud

2015-04-16 Thread Robust Links
Hi, apologies for sending this again. I am trying to port my non-SolrCloud custom search handler to a SolrCloud one. I have read the WritingDistibutedSearchComponents wiki page and looked at the Terms and QueryComponent code, but the con

RE: check If I am Still Leader

2015-04-16 Thread Adir Ben Ami
I have not mentioned before that indexing is always routed to a specific machine. Is there a way to avoid connectivity from the node to all other nodes? > From: adi...@hotmail.com > To: solr-user@lucene.apache.org > Subject: check If I am Still Leader > Date: Thu, 16 Apr 2015 16:08:15 +

Re: Indexing PDF and MS Office files

2015-04-16 Thread Vijaya Narayana Reddy Bhoomi Reddy
For MS Word documents, one common pattern I noticed across all failed documents is that all of them contain embedded images (like scanned signature images embedded into the documents). These documents are much like some letterheads where someone scanned the signature image and then embedded it into the do

Re: Differentiating user search term in Solr

2015-04-16 Thread Shawn Heisey
On 4/16/2015 7:09 AM, Steven White wrote: > I cannot use escapeQueryChars method because my app interacts with Solr via > REST. > > The summary of your email is: client's must escape search string to prevent > Solr from failing. > > It would be a nice addition to Solr to provide a new query param

Re: Differentiating user search term in Solr

2015-04-16 Thread Steven White
Thanks Shawn. I cannot use the escapeQueryChars method because my app interacts with Solr via REST. The summary of your email is: clients must escape the search string to prevent Solr from failing. It would be a nice addition to Solr to provide a new query parameter that tells it to treat the query tex

check If I am Still Leader

2015-04-16 Thread Adir Ben Ami
Hi, I am using Solr 4.10.0 with Tomcat and embedded ZooKeeper. I use SolrCloud in my system. Each shard machine tries to reach/connect with the other cluster machines in order to index the document; it just checks if it is still the leader. I don't use replication, so why does it have to check who is

Re: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-16 Thread elisabeth benoit
For the record, what I finally did is place those words I want spellcheck to ignore in spellcheck.collateParam.fq and the words I'd like to be checked in spellcheck.q. collationQuery uses spellcheck.collateParam.fq so all did_you_mean queries return results containing words in spellcheck.collatePa

Re: Indexing PDF and MS Office files

2015-04-16 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Tim. I shall raise a Jira with the stack trace information. Thanks & Regards Vijay On 16 April 2015 at 12:54, Allison, Timothy B. wrote: > This sounds like a Tika issue, let's move discussion to that list. > > If you are still having problems after you upgrade to Tika 1.8, please at >

RE: Indexing PDF and MS Office files

2015-04-16 Thread Allison, Timothy B.
+1 :) >PS: one more thing - please, tell your management that you will never >ever successfully all real-world PDFs and cater for that fact in your >requirements :-)

RE: Indexing PDF and MS Office files

2015-04-16 Thread Allison, Timothy B.
This sounds like a Tika issue, let's move discussion to that list. If you are still having problems after you upgrade to Tika 1.8, please at least submit the stack traces (if you can) to the Tika jira. We may be able to find a document that triggers that stack trace in govdocs1 or the slice of

Re: Indexing PDF and MS Office files

2015-04-16 Thread Siegfried Goeschl
Hi Vijay, I know this road too well :-) For PDF you can fall back to other tools for text extraction * ps2ascii.ps * XPDF's pdftotext CLI utility (more comfortable than Ghostscript) * some other tools exist as well (pdflib) If you start command line tools from your JVM please have a look a

5.1 'unique' facet function / calcDistinct

2015-04-16 Thread levanDev
Hello, We are looking at a couple of options for using Solr to dynamically calculate unique values per field. In testing out Solr 5.1, I've been using the unique() facet function: http://yonik.com/solr-facet-functions/ Overall, loving the JSON Facet API, especially the sub-faceting thus far. H

Re: Indexing PDF and MS Office files

2015-04-16 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Allison. I tried with the mentioned changes. But still no luck. I am using the code from lucidworks site provided by Erick and now included the changes mentioned by you. But still the issue persists with a small percentage of documents (both PDF and MS Office documents) failing. Unfortunate

RE: Indexing PDF and MS Office files

2015-04-16 Thread Allison, Timothy B.
I entirely agree with Erick -- it is best to isolate Tika in its own jvm if you can -- bad things can happen if you don't [1] [2]. Erick's blog on SolrJ is fantastic. If you want to have Tika parse embedded documents/attachments, make sure to set the parser in the ParseContext before parsing:

Merge indexes in MapReduce

2015-04-16 Thread Norgorn
Is there a ready-to-use tool to merge existing indexes in map-reduce? We have real-time search and want to merge (and optimize) its indexes into one, so we don't need to build index in Map-Reduce, but only merge it. -- View this message in context: http://lucene.472066.n3.nabble.com/Merge-index

Re: Indexing PDF and MS Office files

2015-04-16 Thread Vijaya Narayana Reddy Bhoomi Reddy
Erick, I tried indexing both ways - SolrJ / Tika's AutoParser as well as SolrCell's ExtractRequestHandler. The majority of the PDF and Word documents are getting parsed properly and indexed into Solr. However, a minority of them keep failing with either a PDFParser or OfficeParser error. Not sure if thi

No servers hosting shard.

2015-04-16 Thread Modassar Ather
Hi, I have a setup of 5 node SolrCloud (Lucene/Solr version 5.1.0) without replicas. When I am executing complex and large queries with wild-cards after some time I am getting following exceptions. The index size on each of the node is around 170GB and the memory is set to -Xms20g -Xmx24g on each

Escaping in update XML messages

2015-04-16 Thread Jens Brandt
Hi, I am trying to delete some documents from my index by posting XML-messages to the solr. The unique key for the documents in my index is their url. The XML messages look like this: url:"http://example.com/path/file"; For simple urls everything works fine, but if the url contains an '&' like
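Since the '&' has to be XML-escaped inside the message body, one option is to let a library do the escaping when the message is built. A sketch assuming a standard `<delete><query>...</query></delete>` update message and a unique key field named `url` as described above; the helper name and example URL are made up.

```python
from xml.sax.saxutils import escape


def delete_by_url_message(url, field="url"):
    """Build a delete-by-query XML message for one URL-valued key."""
    # Protect backslashes and quotes inside the quoted phrase first.
    phrase = url.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{phrase}"'
    # escape() turns &, < and > into XML entities.
    return "<delete><query>" + escape(query) + "</query></delete>"


message = delete_by_url_message("http://example.com/path/file?a=1&b=2")
print(message)
```

The XML parser on the Solr side turns `&amp;` back into `&` before the query is parsed, so the query still sees the original URL.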

SolrCloud - Collection Browsing

2015-04-16 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, I have set up a SolrCloud on 3 machines - machine1, machine2 and machine3. The DirectoryFactory used is HDFS where the collection index data is stored in HDFS within a Hadoop cluster. SolrCloud has been set up successfully and everything looks fine so far. I have uploaded the default configurat

facets on external field

2015-04-16 Thread jainam vora
Hi, I am using an external field for the price field since it changes frequently. Is it possible to generate facets using an external field, and if so, how? I understand that faceting requires indexing and external fields are not actually indexed. Is there any solution for this problem? -- Thanks & Regards, Jainam Vora

Re: Information regarding "This conf directory is not valid" SolrException.

2015-04-16 Thread Shai Erera
I opened SOLR-7408 to track that. Shai On Mon, Apr 13, 2015 at 3:31 PM, Bar Weiner wrote: > After some additional debugging, I think that this issue is caused by a > possible race condition introduced to ZkController in Solr-5.0.0. > > My concerns are around unregister(...) function in ZkContro