Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-21 Thread Ali Nazemian
Dear Yonik, Hi, Thank you very much for your response. Best regards. On Tue, Jul 21, 2015 at 5:42 PM, Yonik Seeley wrote: > On Tue, Jul 21, 2015 at 3:09 AM, Ali Nazemian > wrote: > > Dear Erick, > > I found another thing, I did check the number of unique terms for this > > field using schema browser,

Optimizing Solr indexing over WAN

2015-07-21 Thread Ali Nazemian
Dear all, Hi, I know that there are lots of tips about how to make Solr indexing faster. Probably some of the most important client-side ones are batch indexing and multi-threaded indexing. There are other important factors on the server side which I don't want to m

issue with query boost using qf and edismax

2015-07-21 Thread sandeep bonkra
Hi, I am implementing search using Solr 5.0 and am facing a very strange problem. I have 4 fields (name, address, city and state) in the document, apart from a unique ID. My requirement is that it should give me those results first where there is a match in name, then address, then state, cit

Running SolrJ from Solr's REST API

2015-07-21 Thread Zheng Lin Edwin Yeo
Hi, I would like to check: as I've created a SolrJ program and exported it as a runnable JAR, how do I integrate it with Solr so that I can call this JAR directly from Solr's REST API? Currently I can only run it from the command prompt using the command java -jar solrj.jar I'm using Solr 5.2.1

Re: WordDelimiterFilter Leading & Trailing Special Character

2015-07-21 Thread Jack Krupansky
You can also use the types attribute to change the type of specific characters, such as to treat the "!" or "&" as an . -- Jack Krupansky On Tue, Jul 21, 2015 at 7:43 PM, Sathiya N Sundararajan wrote: > Upayavira, > > thanks for the helpful suggestion, that works. I was looking for an option >

Re: WordDelimiterFilter Leading & Trailing Special Character

2015-07-21 Thread Sathiya N Sundararajan
Upayavira, thanks for the helpful suggestion, that works. I was looking for an option to turn off/circumvent that particular WordDelimiterFilter's behavior completely. Since our indexes are hundreds of terabytes, every time we find a term that needs to be added, it will be a cumbersome process to

Re: IntelliJ setup

2015-07-21 Thread Andrew Musselman
Bingo, thanks! On Tue, Jul 21, 2015 at 4:12 PM, Konstantin Gribov wrote: > Try "invalidate caches and restart" in IDEA, remove .idea directory in > lucene-solr dir. After that run "ant idea" and re-open project. > > Also, you have to, at least, close project, run "ant idea" and re-open it > if s

Re: IntelliJ setup

2015-07-21 Thread Konstantin Gribov
Try "invalidate caches and restart" in IDEA, and remove the .idea directory in the lucene-solr dir. After that run "ant idea" and re-open the project. Also, you have to, at least, close the project, run "ant idea" and re-open it if switching between branches that have diverged too much (e.g., 4.10 and 5_x). Tue, Jul 21, 2015 at 2

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
Which can only happen if I post it to a web service, and won't happen if I do it through config? On Tue, Jul 21, 2015 at 2:19 PM, Upayavira wrote: > yes, unless it has been added consciously as a separate field. > > On Tue, Jul 21, 2015, at 09:40 PM, Andrew Musselman wrote: > > Thanks, so by the

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Upayavira
yes, unless it has been added consciously as a separate field. On Tue, Jul 21, 2015, at 09:40 PM, Andrew Musselman wrote: > Thanks, so by the time we would get to an Analyzer the file path is > forgotten? > > https://cwiki.apache.org/confluence/display/solr/Analyzers > > On Tue, Jul 21, 2015 at

Re: Tips for faster indexing

2015-07-21 Thread Fadi Mohsen
In Java: UUID.randomUUID(); That is what I'm using. Regards > On 21 Jul 2015, at 22:38, Vineeth Dasaraju wrote: > > Hi Upayavira, > > I guess that is the problem. I am currently using a function for generating > an ID. It takes the current date and time to milliseconds and generates the > id.
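The suggestion above can be sketched as follows. This is a minimal illustration, not code from the thread: the class name `DocIdSketch` and the method `timestampId` are hypothetical, and the `yyyyMMddHHmmssSSS` pattern is an assumption standing in for the date-and-time-to-milliseconds scheme described in the quoted message. It shows that two documents created within the same millisecond receive the same timestamp-based ID, whereas `UUID.randomUUID()` produces IDs that are distinct in practice.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class DocIdSketch {

    // Hypothetical timestamp-based ID: formats an instant down to the millisecond.
    // Two calls with the same instant return the same string.
    static String timestampId(long epochMillis) {
        SimpleDateFormat ft = new SimpleDateFormat("yyyyMMddHHmmssSSS");
        return ft.format(new Date(epochMillis));
    }

    // Random (version 4) UUID rendered as a string, as suggested in the reply.
    static String uuidId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Same millisecond, same ID: a second document would overwrite the first.
        System.out.println("timestamp IDs collide: "
                + timestampId(now).equals(timestampId(now)));

        // Count how many distinct UUID-based IDs 1000 calls produce.
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            ids.add(uuidId());
        }
        System.out.println("distinct UUID IDs: " + ids.size());
    }
}
```

When documents are built in a tight loop or from several threads, many of them can share one millisecond, which would be consistent with the lower-than-expected document counts reported elsewhere in this thread.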

Re: Issue with using createNodeSet in Solr Cloud

2015-07-21 Thread Savvas Andreas Moysidis
Ah, nice tip, thanks! This could also make scripts more portable. Cheers, Savvas On 21 July 2015 at 08:40, Upayavira wrote: > Note, when you start up the instances, you can pass in a hostname to use > instead of the IP address. If you are using bin/solr (which you should > be!!) then you ca

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
Thanks, so by the time we would get to an Analyzer the file path is forgotten? https://cwiki.apache.org/confluence/display/solr/Analyzers On Tue, Jul 21, 2015 at 1:27 PM, Upayavira wrote: > Solr generally does not interact with the file system in that way (with > the exception of the DIH). > >

Re: Tips for faster indexing

2015-07-21 Thread Vineeth Dasaraju
Hi Upayavira, I guess that is the problem. I am currently using a function for generating an ID. It takes the current date and time to milliseconds and generates the id. This is the function. public static String generateID(){ Date dNow = new Date(); SimpleDateFormat ft = new Simp

Re: Tips for faster indexing

2015-07-21 Thread Upayavira
Are you making sure that every document has a unique ID? Index into an empty Solr, then look at your maxdocs vs numdocs. If they are different (maxdocs is higher) then some of your documents have been deleted, meaning some were overwritten. That might be a place to look. Upayavira On Tue, Jul 21
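The check described above relies on a new document replacing an existing one that has the same unique ID. The toy model below involves no Solr at all; the class `OverwriteModel` and its methods are hypothetical names used only to illustrate the counting. Every add bumps a maxdocs-style counter, while the live count only grows for IDs not seen before, so a gap between the two points at overwritten documents. In a real index the gap can shrink again once segments are merged, which is why the advice is to start from an empty Solr.

```java
import java.util.HashMap;
import java.util.Map;

class OverwriteModel {
    // Live documents keyed by unique ID (what a numdocs-style count reflects).
    private final Map<String, String> live = new HashMap<>();
    // Total slots ever written, including superseded ones (maxdocs-style count).
    private int maxDocs = 0;

    void add(String id, String body) {
        maxDocs++;           // every add occupies a new slot
        live.put(id, body);  // an existing ID replaces the previous live document
    }

    int maxDocs() { return maxDocs; }

    int numDocs() { return live.size(); }

    // Documents that were overwritten (marked deleted) by a later add.
    int deletedDocs() { return maxDocs - live.size(); }

    public static void main(String[] args) {
        OverwriteModel index = new OverwriteModel();
        index.add("a", "first");
        index.add("b", "second");
        index.add("a", "third"); // same ID as the first add
        System.out.println("maxDocs=" + index.maxDocs()
                + " numDocs=" + index.numDocs()
                + " deleted=" + index.deletedDocs());
    }
}
```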

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Upayavira
Solr generally does not interact with the file system in that way (with the exception of the DIH). It is the job of the code that pushes a file to Solr to process the filename and send that along with the request. See here for more info: https://cwiki.apache.org/confluence/display/solr/Uploading+

Re: Tips for faster indexing

2015-07-21 Thread solr . user . 1507
I can confirm this behavior when sending JSON docs in batches: it never happens when sending them one by one, but happens sporadically when sending batches, as if Solr/Jetty drops a couple of documents out of the batch. Regards > On 21 Jul 2015, at 21:38, Vineeth Dasaraju wrote: > > Hi, > > Thank You Erick

Re: Tips for faster indexing

2015-07-21 Thread Vineeth Dasaraju
Hi, Thank you, Erick, for your input. I tried creating batches of 1000 objects and indexing them into Solr. The performance is way better than before, but I find that the number of indexed documents shown in the dashboard is lower than the number of documents that I had actually indexed through sol

Re: Performance of facet contain search in 5.2.1

2015-07-21 Thread Erick Erickson
"contains" has to basically examine each and every term to see if it matches. Say my facet.contains=bbb. A matching term could be aaabbbxyz or zzzbbbxyz So there's no way to _know_ when you've found them all without examining every last one. So I'd try to redefine the problem to not require that.
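The difference between the two lookups can be sketched with a sorted set of terms. This is an illustrative model only, not how Solr implements faceting; the class `TermLookupSketch` and its methods are hypothetical. A prefix lookup can seek to the first candidate in sorted order and stop at the first term that no longer matches, while a "contains" lookup has to visit every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class TermLookupSketch {

    // Prefix lookup: seek to the first term >= prefix, then stop as soon as a
    // term no longer starts with it. Matching terms are contiguous in sorted order.
    static List<String> byPrefix(TreeSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    // Contains lookup: matches can sit anywhere in the ordering, so every
    // term has to be examined.
    static List<String> byContains(TreeSet<String> terms, String needle) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (term.contains(needle)) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        terms.add("aaabbbxyz");
        terms.add("bbbfoo");
        terms.add("ccc");
        terms.add("zzzbbbxyz");
        System.out.println(byPrefix(terms, "bbb"));
        System.out.println(byContains(terms, "bbb"));
    }
}
```

With the example terms from the message, `aaabbbxyz` and `zzzbbbxyz` sit at opposite ends of the ordering, so no amount of seeking avoids the full scan for `bbb`.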

IntelliJ setup

2015-07-21 Thread Andrew Musselman
I followed the instructions here https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ, including `ant idea`, but I'm still not getting the links in solr classes and methods; do I need to add libraries, or am I missing something else? Thanks!

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
I'm not sure, it's a remote team but will get more info. For now, assuming that a certain directory is specified, like "/user/andrew/", and a regex is applied to capture anything two directories below matching "*/*/*.pdf". Would there be a way to capture the wild-carded values and index them as f
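One way to capture the wildcarded values is a regular expression applied on the client side, before the document is sent to Solr. The sketch below is an assumption-laden illustration: the class `PathFieldsSketch`, the method `extract`, and the exact pattern are hypothetical, and the base directory `/user/andrew/` with exactly two directory levels is taken from the example in this thread. The captured values would still have to be added as fields by whatever code pushes the file.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathFieldsSketch {

    // Two directory levels below /user/andrew/, followed by a .pdf file name.
    // Each [^/]+ matches one path segment; the two groups capture the directories.
    private static final Pattern TWO_LEVELS =
            Pattern.compile("^/user/andrew/([^/]+)/([^/]+)/[^/]+\\.pdf$");

    // Returns the two captured directory names, or empty if the path does not match.
    static Optional<String[]> extract(String path) {
        Matcher m = TWO_LEVELS.matcher(path);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        Optional<String[]> parts = extract("/user/andrew/1234/1234/file.pdf");
        parts.ifPresent(p -> System.out.println(p[0] + " " + p[1]));
    }
}
```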

Re: Parsing and indexing parts of the input file paths

2015-07-21 Thread Upayavira
Keeping to the user list (the right place for this question). More information is needed here - how are you getting these documents into Solr? Are you posting them to /update/extract? Or using DIH, or? Upayavira On Tue, Jul 21, 2015, at 06:31 PM, Andrew Musselman wrote: > Dear user and dev lists

Parsing and indexing parts of the input file paths

2015-07-21 Thread Andrew Musselman
Dear user and dev lists, We are loading files from a directory and would like to index a portion of each file path as a field as well as the text inside the file. E.g., on HDFS we have this file path: /user/andrew/1234/1234/file.pdf And we would like the "1234" token parsed from the file path a

Re: solr blocking and client timeout issue

2015-07-21 Thread Jeremy Ashcraft
I did find a dark corner of our application where a dev had left some experimental code that snuck past QA, because it was rarely used. A client discovered it and was using it heavily over the past week. It was generating multiple consecutive update/commit requests. It's been disabled and the

upgrade clusterstate.json from 4.10.4 to split state.json in 5.2.1

2015-07-21 Thread Yago Riveiro
Hi, How can I upgrade the clusterstate.json to be split by collection? I read this issue https://issues.apache.org/jira/browse/SOLR-5473. In theory there is a param “stateFormat” that, when set to 2, says to use the /collections/collection/state.json format. Where can I configure this? —/Y

Re: SOLR nrt read writes

2015-07-21 Thread Alessandro Benedetti
> > Could this be due to caching? I have tried to disable all in my solrconfig. If you mean Solr caches: no. Solr caches live for the life of the searcher. So new searcher, new caches (possibly warmed with updated results). If you mean your application caching or browser caching, you should veri

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Alessandro Benedetti
Hi Mese, let me try to answer your 2 questions: 1. What happens if a shard(both leader and replica) goes down. If the > document on the "dead shard" is updated, will it forward the document to > the > new shard. If so, when the "dead shard" comes up again, will this not be > considered for t

Re: Query Performance

2015-07-21 Thread Nagasharath
I tried using SolrMeter but for some reason it does not detect my url and throws solr server exception Sent from my iPhone > On 21-Jul-2015, at 10:58 am, Alessandro Benedetti > wrote: > > SolrMeter mate, > > http://code.google.com/p/solrmeter/ > > Take a look, it will help you a lot ! > >

Re: Query Performance

2015-07-21 Thread Alessandro Benedetti
SolrMeter mate, http://code.google.com/p/solrmeter/ Take a look, it will help you a lot ! Cheers 2015-07-21 16:49 GMT+01:00 Nagasharath : > Any recommended tool to test the query performance would be of great help. > > Thanks > -- -- Benedetti Alessandro Visiting c

Migrating junit tests from Solr 4.5.1 to Solr 5.2.1

2015-07-21 Thread Rich Hume
I am migrating from Solr 4.5.1 to Solr 5.2.1 on a Windows platform. I am using multi-core, but not Solr cloud. I am having issues with my suite of junit tests. My tests currently use code I found in SOLR-4502. I was wondering whether anyone could point me at best-practice examples of multi-c

Query Performance

2015-07-21 Thread Nagasharath
Any recommended tool to test the query performance would be of great help. Thanks

Re: Use REST API URL to update field

2015-07-21 Thread Zheng Lin Edwin Yeo
Ok. Thanks for your advice. Regards, Edwin On 21 July 2015 at 15:37, Upayavira wrote: > curl is just a command line HTTP client. You can use HTTP POST to send > the JSON that you are mentioning below via any means that works for you > - the file does not need to exist on disk - it just needs to

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
Hey Shawn, when I use the -m 2g option in my script I get the error 'cannot open [path]/server/logs/solr.log for reading: No such file or directory'. I do not see how this would affect that. -- View this message in context: http://lucene.472066.n3.nabble.com/Data-Import-Handler-Stays-Idle-tp4

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
Okay. I'm going to run the index again with specifications that you recommended. This could take a few hours but I will post the entire trace on that error when it pops up again and I will let you guys know the results of increasing the heap size. -- View this message in context: http://lucene

Re: Data Import Handler Stays Idle

2015-07-21 Thread Shawn Heisey
On 7/21/2015 8:17 AM, Paden wrote: > There are some zip files inside the directory and have been addressed to in > the database. I'm thinking those are the one's it's jumping right over. They > are not the issue. At least I'm 95% sure. And Shawn if you're still watching > I'm sorry I'm using solr-5

Re: Data Import Handler Stays Idle

2015-07-21 Thread Paden
There are some zip files inside the directory that are referenced in the database. I'm thinking those are the ones it's jumping right over. They are not the issue. At least I'm 95% sure. And Shawn, if you're still watching, I'm sorry, I'm using solr-5.1.0. -- View this message in context:

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
Also, the function used to generate hashes is org.apache.solr.common.util.Hash.murmurhash3_x86_32(), which produces a 32-bit value. The ranges of hash values assigned to each shard are stored in ZooKeeper. Since you are using only a single hash component, all 32 bits will be used by th

RE: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Reitzel, Charles
When are you generating the UUID exactly? If you set the unique ID field on an "update", and it contains a new UUID, you have effectively created a new document. Just a thought. -Original Message- From: mesenthil1 [mailto:senthilkumar.arumu...@viacomcontractor.com] Sent: Tuesday, Ju

Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-21 Thread Yonik Seeley
On Tue, Jul 21, 2015 at 3:09 AM, Ali Nazemian wrote: > Dear Erick, > I found another thing, I did check the number of unique terms for this > field using schema browser, It reported 1683404 number of terms! Does it > exceed the maximum number of unique terms for "fcs" facet method? The real limit

RE: Programmatically find out if node is overseer

2015-07-21 Thread Markus Jelsma
Hello - this approach not only solves the problem but also allows me to run different processing threads on other nodes. Thanks! Markus -Original message- > From:Chris Hostetter > Sent: Saturday 18th July 2015 1:00 > To: solr-user > Subject: Re: Programmatically find out if node is ov

Re: Performance of facet contain search in 5.2.1

2015-07-21 Thread Alessandro Benedetti
Hi Dave, generally, given terms in a dictionary, it's much more efficient to run prefix queries than "contains" queries. As for docValues, if I remember correctly, when they are loaded in memory they are skip lists, so you can use two operators on them: - next(), which simply gives you the next

Performance of facet contain search in 5.2.1

2015-07-21 Thread Lo Dave
I found that a facet contains search takes much longer than a facet prefix search. Does anyone have an idea how to make the contains search faster? org.apache.solr.core.SolrCore; [concordance] webapp=/solr path=/select params={q=sentence:"duty+of+care"&facet.field=autocomplete&indent=true&facet.prefix=duty+

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread mesenthil1
Unable to delete by passing distrib=false as well. Also it is difficult to identify those duplicate documents among the 130 million. Is there a way we can see the generated hash key and mapping them to the specific shard? -- View this message in context: http://lucene.472066.n3.nabble.com/Sol

Re: solr blocking and client timeout issue

2015-07-21 Thread Daniel Collins
We have a similar situation: production runs Java 7u10 (yes, we know it's old!), and has custom GC options (G1 works well for us), and a 40GB heap. We are a heavy user of NRT (sub-second soft-commits!), so that may be the common factor here. Every time we have tried a later Java 7 or Java 8, the he

Re: WordDelimiterFilter Leading & Trailing Special Character

2015-07-21 Thread Upayavira
Looking at the javadoc for the WordDelimiterFilterFactory, it suggests this config: Note the protected="x" attribute. I suspect if you put Yahoo! into a file referenced by that attribute, it may survive analysis. I'd be curious to hear whether it works. Upayavira On

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-21 Thread Upayavira
I suspect you can delete a document from the wrong shard by using update?distrib=false. I also suspect there are people here who would like to help you debug this, because it has been reported before, but we haven't yet been able to see whether it occurred due to human or software error. Upayavir

Re: SOLR nrt read writes

2015-07-21 Thread Upayavira
Bhawna, I think you need to reconcile yourself to the fact that what you want to achieve is not going to be possible. Solr (and Lucene underneath it) is HEAVILY optimised for high read/low write situations, and that leads to some latency in content reaching the index. If you wanted to change this

Re: Issue with using createNodeSet in Solr Cloud

2015-07-21 Thread Upayavira
Note, when you start up the instances, you can pass in a hostname to use instead of the IP address. If you are using bin/solr (which you should be!!) then you can use bin/solr -h my-host-name and that'll be used in place of the IP. Upayavira On Tue, Jul 21, 2015, at 05:45 AM, Erick Erickson wrote

Re: Use REST API URL to update field

2015-07-21 Thread Upayavira
curl is just a command line HTTP client. You can use HTTP POST to send the JSON that you are mentioning below via any means that works for you - the file does not need to exist on disk - it just needs to be added to the body of the POST request. I'd say review how to do HTTP POST requests from yo

Re: Installing Banana on Solr 5.2.1

2015-07-21 Thread Upayavira
On Tue, Jul 21, 2015, at 02:00 AM, Shawn Heisey wrote: > On 7/20/2015 5:45 PM, Vineeth Dasaraju wrote: > > I am trying to install Banana on top of solr but haven't been able to do > > so. All the procedures that I get are for an earlier version of solr. Since > > the directory structure has change

Re: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field content

2015-07-21 Thread Ali Nazemian
Dear Erick, I found another thing: I checked the number of unique terms for this field using the schema browser, and it reported 1683404 terms! Does that exceed the maximum number of unique terms for the "fcs" facet method? I read somewhere that the limit should be more than 16m; is that true? Best regards. O