Re: High Cpu sys usage

2016-03-19 Thread Otis Gospodnetić
Hi, I looked at those metrics outputs, but nothing jumps out at me as problematic. How full are your JVM heap memory pools? If you are using SPM to monitor your Solr/Tomcat/Jetty/... look for a chart that looks like this: https://apps.sematext.com/spm-reports/s/zB3JcdZyRn If some of these lines

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Shawn Heisey
On 3/18/2016 6:34 AM, jimi.hulleg...@svensktnaringsliv.se wrote: > I'm not sure I follow your logic now. If one can express the popularity as a > value between 0.0 and 1.0, why can't one use that, together with a weight > (indicating how much the popularity should influence the score, in general)

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
And defType is edismax. On 18 March 2016 at 10:40, Anil wrote: > Hi Michael, > > I could not post the query. I know it's difficult to find the root > cause without the query. Sorry about that. > > The query includes expand/collapse, a query filter (fq), and 2 to 3 terms with > AND. > > please share yo

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread John Smith
Hi, For once I might be of some help: I've had a similar configuration (large set of products from various sources). It's very difficult to find the right balance between all parameters and requires a lot of tweaking, most often in the dark unfortunately. What I've found is that omitNorms=true is

Re: High Cpu sys usage

2016-03-19 Thread Patrick Plaatje
Hi, From the sar output you supplied, it looks like you might have a memory issue on your hosts. The memory usage just before your crash seems to be *very* close to 100%. Even the slightest increase (Solr itself, or possibly by a system service) could cause the system crash. What are the s

Re: Document Cache

2016-03-19 Thread Erick Erickson
First, I want to make sure when you say "TTL", you're talking about documents being evicted from the documentCache and not the "Time To Live" option whereby documents are removed completely from the index. The time varies with the number of new documents fetched. This is an LRU cache whose size is
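To make the eviction point concrete: in an LRU cache an entry leaves because newer entries push it out, not because a timer expires. The toy Python model below is a sketch, not Solr's LRUCache/FastLRUCache code; the capacity and doc IDs are hypothetical. It also ignores searcher reopening after a commit, which in Solr starts a fresh documentCache anyway.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Toy LRU cache: eviction is driven by new insertions, not by elapsed time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._entries)


cache = ToyLRUCache(capacity=3)  # hypothetical size; Solr's is set in solrconfig.xml
for doc_id in ["doc1", "doc2", "doc3"]:
    cache.put(doc_id, {"id": doc_id})
cache.get("doc1")                   # doc1 becomes most recently used
cache.put("doc4", {"id": "doc4"})  # a newly fetched doc pushes out doc2
print(cache.keys())  # ['doc3', 'doc1', 'doc4']
```

So how long a document stays cached depends on how many other documents are fetched in the meantime, which matches the answer above.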

Re: Would it be better to make my Schema changes within the renamed "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" instead of the way that I am doing it now via curl -

2016-03-19 Thread Shawn Heisey
On 3/18/2016 7:31 AM, John Mitchell wrote: > My question would it be better to make my Schema changes within the renamed > "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" > instead of the way that I am doing it now via curl -X POST -H > 'Content-type:application/json

Re: from zookeper embedded to standalone

2016-03-19 Thread Erick Erickson
Looking forward to finding out if it works as I haven't had to do this myself ;). As Upayavira mentions though, you might have to do some fancy dancing with the ZK quorum. I'm assuming that once all the data is moved around, shutting down _all_ the Zookeepers (and Solrs!) and reconfiguring the co

RE: how to update billions of docs

2016-03-19 Thread Ken Krugler
As others noted, currently updating a field means deleting and inserting the entire document. Depending on how you use the field, you might be able to create another core/container with that one field (plus the key field), and use join support. Note that https://issues.apache.org/jira/browse/LU

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi Shawn, Actually, there are three Solr instances (the top three PIDs are the three instances), and their data file sizes are 851G, 592G and 49G respectively, with more data being added over time. I think a SolrCloud service as large as mine may be rare, and it is now one o

how to update billions of docs

2016-03-19 Thread Mohsin Beg Beg
Hi, I have a requirement to replace a value of a field in 100B's of docs in 100's of cores. The field is multiValued=false docValues=true type=StrField stored=true indexed=true. Atomic update performance is on the order of 5K docs per sec per core in Solr 5.3 (other fields are quite big). An
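For reference, an atomic "set" update in Solr's JSON update format sends only the id and the changed field, as in the sketch below. The field name status_s and the IDs are hypothetical, and the body would be POSTed to the collection's /update handler. As others point out in this thread, Solr still rewrites the whole document internally, so this shrinks the request, not the indexing work.

```python
import json


def atomic_set_batch(doc_ids, field, new_value):
    """Build a JSON atomic-update body that sets `field` to `new_value` on each doc.

    Assumes the uniqueKey field is "id"; `field` is a hypothetical field name.
    """
    return json.dumps([{"id": doc_id, field: {"set": new_value}} for doc_id in doc_ids])


body = atomic_set_batch(["a1", "a2"], "status_s", "archived")
print(body)
# [{"id": "a1", "status_s": {"set": "archived"}}, {"id": "a2", "status_s": {"set": "archived"}}]
```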

Re: Solr5 Optimize

2016-03-19 Thread Rallavagu
Erick, Thanks for the response. Comments in line... On 3/16/16 9:56 AM, Erick Erickson wrote: In general, don't bother with optimize unless the index is quite static, i.e. there are very few adds/updates or those updates are done in batches and rarely (i.e. once a day or less frequently). As fa

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
That's another great example of a mode that Bulk Field Update (my mythical feature) needs - switch a list of fields from stored to docvalues. And maybe even the opposite since there are scenarios in which docValues is worse than stored and you would only find that out after indexing... billions of

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
Try adding the following to your schema just under the tag:   This seems to solve the problem for me. Well I at least haven't yet found any cases where I see the score discrepancy. Thanks, -Rick > From: r...@ricksullivan.net > To: solr-user@lucene.apac

Re: SolrCloud App Unit Testing

2016-03-19 Thread Steve Davids
Naveen, The Solr codebase generally uses the base “SolrTestCaseJ4” class and sometimes mixes in the cloud cluster. I personally write a generic abstract base test class to fit my needs and have an abstract `getSolrServer` method with an EmbeddedSolrServer implementation along with a separate im

RE: Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
On Thursday, March 17, 2016 11:21 PM, u...@odoko.co.uk wrote: > > If you use additive boosting, when you add a boost to a search with one term, > (e.g. between 0 and 1) > you get a different effect compared to when you add the same boost to a > search with four terms (e.g. between 0 and 4). Wo

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi, To Patrick: Never mind. Thank you for your suggestion all the same. To Otis: We do not use SPM. We monitor the JVM with jstat only, because the system ran well before, so we did not need other tools. But SPM is really awesome. Still looking for help. Best Regards 2016-03-18 6:01 GMT

Explain score is different from score

2016-03-19 Thread G, Rajesh
Mismatch between the score displayed in debug and the score field. Please refer to the attached XML. When I search for title_ws:(Microsoft Ofice 365), if the results were displayed in explain-score order, we would have the expected result “Microsoft Office 365” then “Lync - Microsoft Office 365” Lync -

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
Yes it seems to be something similar, but the normalization isn't applied to all retrieved documents, which messes with the document rankings. Some documents have the exact values from the 'explain' response, while others are normalized. -Rick > Date: F

Re: stop words as blacklist

2016-03-19 Thread Ahmet Arslan
Hi John, Do you want to skip that document in the indexing process? Or, you want to index that document, but you don't want to retrieve it if it is queried with stop words? There is a KeepWordFilterFactory to detect if a document contains a black-list word. To skip a certain document that mee
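A rough Python sketch of the skip decision, assuming whole-word, case-insensitive matching against a hypothetical blacklist and a hypothetical body field. In Solr itself this logic would sit in a custom UpdateRequestProcessor (Java), which skips a document by not passing the add command on to the next processor in the chain.

```python
import re


def contains_blacklisted(text, blacklist):
    """True if any blacklisted word appears as a whole token (case-insensitive)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return any(word.lower() in tokens for word in blacklist)


def filter_documents(docs, field, blacklist):
    """Keep only documents whose `field` contains no blacklisted word."""
    return [d for d in docs if not contains_blacklisted(d.get(field, ""), blacklist)]


docs = [
    {"id": "1", "body": "A normal product description"},
    {"id": "2", "body": "Contains a FORBIDDEN term"},
]
kept = filter_documents(docs, "body", {"forbidden"})
print([d["id"] for d in kept])  # ['1']
```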

Re: No live SolrServers available to handle this request

2016-03-19 Thread michael solomon
What query are you trying? On Thu, Mar 17, 2016 at 12:22 PM, Anil wrote: > Hi, > > We are using SolrCloud with ZooKeeper and each collection has 5 shards and > 2 replicas. > We are seeing "org.apache.solr.client.solrj.SolrServerException: No live > SolrServers available to handle this request". i d

RE: Ping handler in SolrCloud mode

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
Shawn Heisey wrote: > On 3/16/2016 10:11 AM, Tom Evans wrote: > > This worked, I would still be interested in a lighter-weight approach > > that doesn't involve joins to see if a given collection has a shard on > > this server. I suspect that might require a custom ping handler plugin > > howeve

RE: Indexing both meta-data and full content of HTML

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
So, I think I've solved my problem, it basically comes from having only done Data Import Handler with any depth. I'll simply use extract request processing handler with some literal fields. -Original Message- From: Davis, Daniel (NIH/NLM) [C] Sent: Wednesday, March 16, 2016 11:47 AM To:

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Upayavira
Yes. Boosting adjusts an existing score. That original score can vary, e.g. depending upon how many search terms there are. If you use additive boosting, when you add a boost to a search with one term, (e.g. between 0 and 1) you get a different effect compared to when you add the same boost to a s
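A quick worked example of that argument, with hypothetical numbers: treat a one-term query as scoring around 1.0 and a four-term query around 4.0, then compare an additive boost of 0.5 with a multiplicative boost of 1.5.

```python
def additive_share(base_score, boost):
    """Fraction of the final score that comes from an additive boost."""
    return boost / (base_score + boost)


def multiplicative_ratio(base_score, factor):
    """final / base for a multiplicative boost; independent of base_score."""
    return (base_score * factor) / base_score


one_term, four_terms = 1.0, 4.0  # hypothetical typical scores
print(round(additive_share(one_term, 0.5), 3))    # 0.333
print(round(additive_share(four_terms, 0.5), 3))  # 0.111
print(multiplicative_ratio(one_term, 1.5), multiplicative_ratio(four_terms, 1.5))  # 1.5 1.5
```

The same +0.5 is a third of a one-term score but only a ninth of a four-term score, while a factor of 1.5 means the same thing in both cases.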

Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
Hi all, I have a cloud setup with 8 nodes and 3 collections: products, items and skus. All collections have just one shard; products has 6 replicas, items has 2 replicas, skus has 8 replicas. No node has both products and items; all nodes have skus. Some of our queries join from sku to either produ

Re: Solr5 Optimize

2016-03-19 Thread Erick Erickson
In general, don't bother with optimize unless the index is quite static, i.e. there are very few adds/updates or those updates are done in batches and rarely (i.e. once a day or less frequently). As far as space, this will require that you have at _least_ as much free space on your disks as your i

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
On Wed, Mar 16, 2016 at 2:14 PM, Tom Evans wrote: > Hi all > > [ .. ] > > The option I'm trying now is to make two ping handlers for skus that > join to one of items/products, which should fail on the servers which > do not support it, but I am concerned that this is a little > heavyweight for a st

Re: DIH issue with SolrEntityProcessor 5.4.1

2016-03-19 Thread William Bell
I will try to see if I can create a use case and fix it. On Wed, Mar 16, 2016 at 10:00 AM, William Bell wrote: > We are running this inside of another entity in DIH. There appears to be > an issue. We get 2 calls to the survey core if hits > 0. If hits = 0 we get > 1 call. Has anyone else seen t

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Any thoughts on this? Hoping for just a quick 1) Yes - once ZooKeeper loses a Quorum you need to restart Solr and your SolrJ Client 2) No - that's not expected behavior - Solr and SolrJ should recover - please file a JIRA issue Cheers! Frank Kelly Principal Software Engineer Predictive Analytics

Re: Solr 5.5.0 ClassNotFoundException solr.MockTokenizerFactory after DIH setup

2016-03-19 Thread Victor D'agostino
Hi, It is a new server on CentOS release 6.7 with java-1.6.0-openjdk.x86_64 and java-1.7.0-openjdk-devel.x86_64 installed. I can parse the logs to extract the jar files which are loaded, but which ones am I supposed to look for? They are all located in /data/solr-5.5.0/ : [root@LXLYOSOL31 logs]

Re: No live SolrServers available to handle this request

2016-03-19 Thread Shawn Heisey
On 3/17/2016 11:29 PM, Anil wrote: > Thanks Shawn. we are using 4.10.3. > > I don't see any issues with replicas of all shards at the time of > exception. health of all shards is good in CDH. I do not know what CDH is. I'm guessing it's third-party software. As far as I'm aware, Solr doesn't hav

Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Hi, I currently have an index of ~50m docs representing shopping products: name, description, brand, category, etc. Our "qf" is currently setup as: name^5 brand^2 category^3 merchant^2 description^1 mm: 100% ps: 5 I'm getting complaints from the business concerning relevancy, and was hopin

Re: Document Cache

2016-03-19 Thread Rallavagu
Thanks for the recommendations, Shawn. Those are the lines I am thinking along as well. I am also reviewing the application. Going by the note on cache invalidation every two minutes due to soft commits, I wonder how it could go OOM in just two minutes, or is it likely that a thread is holding the se

RE: Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
On Friday, March 18, 2016 5:11 PM, wun...@wunderwood.org wrote: > > I used a popularity score based on the DVD being in people's queues and the > streaming views. > The Peter Jackson films were DVD only. They were in about 100 subscriber > queues. > The first Twilight film was in 1.25 million

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Thanks for the added input. I'll certainly look into the machine learning aspect, will be good to put some basic knowledge I have into practice. I'd been led to believe the tie parameter didn't actually do a lot. :-/ On 03/18/2016 12:07 PM, Nick Vasilyev wrote: I work with a similar catalo

Document Cache

2016-03-19 Thread Rallavagu
Solr 5.4, embedded Jetty. Is it right to assume that every document returned in response to a query is cached in the "Document Cache"? Essentially, if I request any entry like /select?q=id: will it be cached in the "Document Cache"? If yes, what is the TTL? Thanks in advance

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
It would be nice to have a wiki/doc for "Bulk Field Update" that listed all of these techniques and tricks. And, of course, it would be so much better to have an explicit Lucene feature for this. It could work in the background like merge and process one segment at a time as efficiently as possibl

Re: Stopping Solr JVM on OOM

2016-03-19 Thread Binoy Dalal
Hi Shawn, Your thoughts on this? On Mon, Mar 14, 2016 at 2:11 PM Binoy Dalal wrote: > I set the heap to 16 mb and tried to index about 350k records using a DIH. > This did throw an OOM for that particular thread in the console, but the > oom script wasn't called and solr was running properly. >

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Yonik Seeley
On Wed, Mar 16, 2016 at 11:10 PM, Erick Erickson wrote: > Personally I prefer to hand-edit the files. Me too, I hand edit managed-schema all the time. IMO, the warning is a bit overkill. -Yonik

indexing Free-form text description

2016-03-19 Thread Vis Sw
Hi, I am trying to understand the best way to index and search a "free text field", e.g. notes or description. Please suggest the best field type, tokenizer, filters, etc. for querying a free-form text description field. Any example would be great. Regards

Re: Document Cache

2016-03-19 Thread Rallavagu
On 3/18/16 8:56 AM, Emir Arnautovic wrote: Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat

Re: how to update billions of docs

2016-03-19 Thread sudsport s
I think there are no in-place updates in Solr; an update behaves like an insert plus marking the old version as deleted, so the behavior should be the same as indexing billions of docs. On Wed, Mar 16, 2016 at 3:52 PM, Mohsin Beg Beg wrote: > Hi, > > I have a requirement to replace a value of a field in

Re: Document Cache

2016-03-19 Thread Rallavagu
On 3/18/16 9:27 AM, Emir Arnautovic wrote: Running a single query that returns all docs and all fields will actually load as many documents as queryResultWindowSize. What you need to do is run multiple queries that will return different documents. In case your id is numeric, you can run somethi

publish solr on glassfish server

2016-03-19 Thread Adel Mohamed Khalifa
Hello All, What is the requirement for installing solr on glassfish server, and how can I do it? Regards, Adel Khalifa | Developer | Saudisoft-Egypt | Tel: +2 023 303 2037 - ext 112 | M +2 01149247744 | Fax +2 023 303 2036 | Follow us on

RE: publish solr on glassfish server

2016-03-19 Thread Adel Mohamed Khalifa
I built my web page for searching and created a servlet for it, but it is not working. I am using this Ajax call to invoke the servlet: $.ajax({ url: contextPath + '/GetResults', data: { qu: $("#query").val() }, dataType: 'js

Re: Solr5 Optimize

2016-03-19 Thread Rallavagu
Thanks Erick. This helps. On 3/16/16 10:11 AM, Erick Erickson wrote: First of all, "optimize-like" does _not_ happen "every time a commit happens". What _does_ happen is the current state of the index is examined and if certain conditions are met _then_ segment merges happen. Think of these as "

Solr5 Optimize

2016-03-19 Thread Rallavagu
All, Solr 5.4 with embedded Jetty (4G heap). I'm trying to understand the behavior of the "optimize" operation if it is not run explicitly. What is the frequency at which this operation is run, what are the storage requirements and how do we schedule it? Any comments/pointers would greatly help. Thanks in ad

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Shawn Heisey
On 3/16/2016 1:14 AM, Alexandre Rafalovitch wrote: > So, I am looking at the Solr 5.5 examples with their all-in by-default > managed schemas. And I am scratching my head on the workflow users are > expected to follow. > > One example is straight from documentation: > "With the above configuration,

Would it be better to make my Schema changes within the renamed "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf/schema.xml" instead of the way that I am doing it now via curl -X PO

2016-03-19 Thread John Mitchell
I noticed that within "/solr-5.3.0/server/solr/configsets/data_driven_schema_configs/conf" it has a file called "managed-schema" and within this file it says "This is the Solr schema file. This file should be named "schema.xml" and should be in the conf directory". Currently I have not renamed thi

Re: stop words as blacklist

2016-03-19 Thread Binoy Dalal
Like Ahmet says, a custom update request processor is the best way to go, and it's pretty simple too. I have a ready-to-use example here: https://github.com/lttazz99/SolrPluginsExamples On Fri, Mar 18, 2016 at 9:21 PM Ahmet Arslan wrote: > Hi John, > > Do you want to skip that document in the in

Re: High Cpu sys usage

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:59 AM, Patrick Plaatje wrote: > From the sar output you supplied, it looks like you might have a memory issue > on your hosts. The memory usage just before your crash seems to be *very* > close to 100%. Even the slightest increase (Solr itself, or possibly by a > system service) c

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
Thanks, would be a great idea but unfortunately we don't have that sort of granularity of features. Can definitely use the category of clicked products though, sounds like a good enough start. On 03/18/2016 04:36 PM, Alessandro Benedetti wrote: Actually if you are able to collect past ( o

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Nick Vasilyev
I work with a similar catalog; except our data is especially bad. We've found that several things helped: - Item level grouping (group same item sold by multiple vendors). Rank items with more vendors a bit higher. - Include a boost function for other attributes, such as an original image of the

Re: indexing pdf files using post tool

2016-03-19 Thread Binoy Dalal
Take a look at the CloneFieldUpdateProcessorFactory here: http://www.solr-start.com/info/update-request-processors/ On Wed, 16 Mar 2016, 18:25 Binoy Dalal, wrote: > Like Francisco said, use a custom update processor to map the fields the > way you want and add it to your update chain. > > On Wed

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Running a single query that returns all docs and all fields will actually load as many documents as queryResultWindowSize. What you need to do is run multiple queries that will return different documents. In case your id is numeric, you can run something like id:[1 TO 100] and then id:[100 TO 20
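A small helper to generate such batches, assuming a numeric uniqueKey named id (hypothetical; adapt to your schema). Unlike the example ranges in the message, these are inclusive and non-overlapping. Each request would presumably also need rows at least as large as the batch to actually fetch the documents.

```python
def id_range_queries(min_id, max_id, batch_size):
    """Build non-overlapping inclusive range queries covering [min_id, max_id]."""
    queries = []
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        queries.append(f"id:[{start} TO {end}]")
        start = end + 1
    return queries


print(id_range_queries(1, 250, 100))
# ['id:[1 TO 100]', 'id:[101 TO 200]', 'id:[201 TO 250]']
```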

How is _rest_managed.json used?

2016-03-19 Thread Alexandre Rafalovitch
Hello, What is _rest_managed.json actually for? I can see the mechanics in the Ref Guide and even found where it is managed by source code. But I cannot figure out how it actually fits into a workflow. It seems to be a registry of REST managed components (e.g. synonyms) for when they are NOT decl

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
Thanks Shawn. We are using 4.10.3. I don't see any issues with the replicas of any shard at the time of the exception. The health of all shards is good in CDH. Regards, Anil On 18 March 2016 at 10:52, Shawn Heisey wrote: > On 3/17/2016 4:22 AM, Anil wrote: > > We are using solrcloud with zookeeper and

Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by q.op is the default operator. If there is no AND/OR in-between the terms the default will be AND as per my setting of q.op=AND. But what if the query has AND/OR explicitly put in-between the query terms? I just think that if (A OR B) is the query then the result should be based

Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread GW
I think the easiest way to write apps for Solr is with some kind of programming language and the REST API. Don't bother with the PHP or Perl modules. They are deprecated and beyond useless. Just use the HTTP call that you see in Solr Admin. Mind the URL encoding when putting together your server ca
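A minimal Python sketch of building such a URL with the standard library's urlencode, which takes care of the encoding. The host, collection name and query are hypothetical.

```python
from urllib.parse import urlencode


def select_url(base, collection, params):
    """Build a Solr /select URL with URL-encoded query parameters."""
    return f"{base}/{collection}/select?{urlencode(params)}"


url = select_url(
    "http://localhost:8983/solr",  # hypothetical host and port
    "products",                    # hypothetical collection name
    {"q": 'name:"office 365" AND brand:microsoft', "rows": 10, "wt": "json"},
)
print(url)
# http://localhost:8983/solr/products/select?q=name%3A%22office+365%22+AND+brand%3Amicrosoft&rows=10&wt=json
```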

Why is multiplicative boost prefered over additive?

2016-03-19 Thread jimi.hullegard
Hi, After reading a bit on various sites, and especially the blog post "Comparing boost methods in Solr", it seems that the preferred boosting type is the multiplicative one, over the additive one. But I can't really get my head around *why* that is so, since in most boosting problems I can thi

Re: indexing Free-form text description

2016-03-19 Thread Erick Erickson
This question is way too general to answer in any detail, so I'd just start with the text_general fieldType in any of the stock schema.xml files. It would be well for you to get familiar with the admin/analysis page, as you'll have a zillion questions about what each change you make to that fieldT

Re: using solr AnalyticsQuery API vs facet API

2016-03-19 Thread sudsport s
Thanks, Joel, for responding, but I am still not sure when to use the Solr analytics API vs. the JSON facet API (what is the difference between ValueSource and PostFilter?). I know that ValueSource is useful to implement functions. On Wed, Mar 16, 2016 at 9:49 AM, sudsport s wrote: > Hi , > > I am planning t

Re: Query behavior.

2016-03-19 Thread Alessandro Benedetti
I think what he tried to explain was : " Input query : *fl:(java OR book)* Instead of having the query parser parsing : *+((fl:java fl:book)~2) *( which seems what is happening right now) He want the query parser to parse : +((fl:java fl:book)) ( without the mm expressed) More than the outer le
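A toy model of the difference described above, assuming a query made only of SHOULD clauses: with ~2 (minimum-should-match 2) both terms must match, while without it any one term suffices. The document and terms are hypothetical.

```python
def should_clauses_match(doc_terms, should_terms, min_should_match):
    """Toy model of a BooleanQuery containing only SHOULD clauses.

    A pure-SHOULD query still requires at least one clause to match.
    """
    hits = sum(1 for term in should_terms if term in doc_terms)
    return hits >= max(1, min_should_match)


doc = {"java", "tutorial"}  # hypothetical document: contains "java" but not "book"
query = ["java", "book"]
print(should_clauses_match(doc, query, 2))  # False: (fl:java fl:book)~2 needs both
print(should_clauses_match(doc, query, 0))  # True: (fl:java fl:book) needs any one
```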

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
Thanks Shawn for your reply. Yes, I'm looking to see if we can implement a combination of tokenizers and filters. However, from what I tried before, we can only use one tokenizer for each fieldType. So is it true that I can only stick to one tokenizer, and the rest of the implementation has to be

Re: No live SolrServers available to handle this request

2016-03-19 Thread Anil
Hi Michael, I could not post the query. I know it's difficult to find the root cause without the query. Sorry about that. The query includes expand/collapse, a query filter (fq), and 2 to 3 terms with AND. Please share your thoughts. Thanks. Regards, Anil On 17 March 2016 at 19:46, michael solomon

Re: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Kelly, Frank
Thanks for taking a look. I’m not sure https://issues.apache.org/jira/browse/SOLR-8326 is a match, as we aren’t using PKIAuthPlugin -Frank Frank Kelly Principal Software Engineer Predictive Analytics Team (SCBE/HAC/CDA) HERE 5 Wayside Rd, Burlington, MA 01803, USA 42° 29' 7" N 71° 11' 32” W

RE: Explain score is different from score

2016-03-19 Thread Rick Sullivan
I'm not. I only have query boosts. > Date: Fri, 18 Mar 2016 16:42:36 + > From: iori...@yahoo.com.INVALID > To: solr-user@lucene.apache.org > Subject: Re: Explain score is different from score > > Hi Rick, > > This could be a bug I think. Do you guys use

Re: Ping handler in SolrCloud mode

2016-03-19 Thread Tom Evans
On Wed, Mar 16, 2016 at 4:10 PM, Shawn Heisey wrote: > On 3/16/2016 8:14 AM, Tom Evans wrote: >> The problem occurs when we attempt to query a node to see if products >> or items is active on that node. The balancer (haproxy) requests the >> ping handler for the appropriate collection, however all

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
You still haven't explained what exactly you are trying to accomplish with that outer level AND/+/MUST. Please be specific - why you insist on "+((fl:java fl:book))" rather than "fl:java fl:book". -- Jack Krupansky On Fri, Mar 18, 2016 at 12:12 AM, Modassar Ather wrote: > What I understand by

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rick, This could be a bug I think. Do you guys use index time boosts? Ahmet On Friday, March 18, 2016 6:15 PM, Rick Sullivan wrote: Yes it seems to be something similar, but the normalization isn't applied to all retrieved documents, which messes with the document rankings. Some document

RE: RETRY: SolrCloud does not recover after ZooKeeper ensemble loses (and then regains) a quorum

2016-03-19 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I am wondering whether this might be the bug in SOLR-8326, which is fixed in Solr 5.4. That's my guess as a user who ran into the bug myself. -Original Message- From: Kelly, Frank [mailto:frank.ke...@here.com] Sent: Wednesday, March 16, 2016 3:09 PM To: solr-user@lucene.apache.org Subjec

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
I found that WordDelimiterFilterFactory has a parameter called splitOnNumerics, which does the same thing HMMChineseTokenizer did. - *splitOnNumerics="1"* causes alphabet => number transitions to generate a new part [Solr 1.3]: - "j2se" => "j" "2" "se"
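A simplified Python stand-in for the splitOnNumerics="1" behaviour on ASCII tokens, splitting at letter/digit transitions. It is not the filter's implementation and ignores its other options (case-change splitting, delimiters, catenation, preserveOriginal).

```python
import re


def split_on_numerics(token):
    """Split an ASCII token at letter/digit transitions (simplified; drops other characters)."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


print(split_on_numerics("j2se"))       # ['j', '2', 'se']
print(split_on_numerics("abc123def"))  # ['abc', '123', 'def']
```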

Re: indexing Free-form text description

2016-03-19 Thread Alexandre Rafalovitch
Well, Solr ships with nearly 10 examples. So, if you go through them, you will know quite a lot. This article (mine) may help you to navigate them: http://blog.outerthoughts.com/2015/11/oh-solr-home-where-art-thou/ More specifically, as Erick said, your question is too generic. One step forward w

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Walter Underwood
Think about using popularity as a boost. If one movie has a million rentals and one has a hundred rentals, there is no additive formula that balances that with text relevance. Even with log(popularity), it doesn’t work. With multiplicative boost, we only care about the difference between the one
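One way to see the scaling problem, with made-up scores and log10(popularity) as the boost (assumes popularity > 1): multiply every text score by a constant, as happens when a query has more terms, and check whether the ranking changes. The additive version flips; the multiplicative one cannot, because scaling both products by the same constant preserves their order.

```python
import math

# Hypothetical docs: name -> (text relevance score, rental count)
DOCS = {"niche_match": (2.0, 100), "popular_weak_match": (0.8, 1_000_000)}


def additive(text_score, popularity):
    return text_score + math.log10(popularity)


def multiplicative(text_score, popularity):
    return text_score * math.log10(popularity)


def ranking(score_fn, scale):
    """Rank DOCS after multiplying every text score by `scale`."""
    scored = {name: score_fn(text * scale, pop) for name, (text, pop) in DOCS.items()}
    return sorted(scored, key=scored.get, reverse=True)


for fn in (additive, multiplicative):
    print(fn.__name__, ranking(fn, 1), ranking(fn, 4))
# additive ['popular_weak_match', 'niche_match'] ['niche_match', 'popular_weak_match']
# multiplicative ['popular_weak_match', 'niche_match'] ['popular_weak_match', 'niche_match']
```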

Error starting solr 5.5 - Cannot open solr.log:No such file or directory

2016-03-19 Thread Shamik Bandopadhyay
Hi, I'm trying to upgrade from Solr 5.0 to 5.5. I'm getting the following error: tail: cannot open `/mnt/ebs2/solrhome/logs/solr.log' for reading: No such file or directory I'm running on CentOS 6.7. The same startup script has been working fine for 5.0 till now. I'm executing as user "solr".

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi Shawn, Here is my top screenshot: https://www.dropbox.com/s/jaw10mkmipz943y/topscreen.jpg?dl=0 It was captured when my system was normal. I have also reduced the memory size from 64GB down to 48GB. We have two hardware clusters, each comprising 3 machines, and on one c

[nested] how to specify a path for multiple nesting?

2016-03-19 Thread Alisa Z .
Hi all, I have a deeply multi-level data structure (up to 6-7 levels deep) where due to the nature of the data some nested documents can have same type names at various levels. How to form a proper query on a nested field that would contain "a path"  that defines that field? I'll clarify wit

Re: No live SolrServers available to handle this request

2016-03-19 Thread Shawn Heisey
On 3/18/2016 9:55 PM, Anil wrote: > Thanks for your response. > CDH is a Cloudera (third party) distribution. Is there any way to get a > copy of the notifications when the cluster state changes? In the logs? > > I can assume that the exception is the result of no replicas being available > only. Agree? Yes, I

Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Hi, I have an index of 60m docs split across 2 shards (each with a replica). When load testing queries (picking random keywords I know exist), and randomly requesting facets too, 95% of my responses are under 0.5s. However, during some random manual tests, sometimes I see searches taking bet

Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread Shawn Heisey
On 3/19/2016 7:11 AM, GW wrote: > I think the easiest way to write apps for Solr is with some kind of > programming language and the REST API. Don't bother with the PHP or Perl > modules. They are deprecated and beyond useless. just use the HTTP call > that you see in Solr Admin. Mind the URL encod

Re: Document Cache

2016-03-19 Thread Rallavagu
comments in line... On 3/17/16 2:16 PM, Erick Erickson wrote: First, I want to make sure when you say "TTL", you're talking about documents being evicted from the documentCache and not the "Time To Live" option whereby documents are removed completely from the index. Maybe TTL was not the rig

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Shawn Heisey
On 3/19/2016 11:12 AM, Robert Brown wrote: > I have an index of 60m docs split across 2 shards (each with a replica). > > When load testing queries (picking random keywords I know exist), and > randomly requesting facets too, 95% of my responses are under 0.5s. > > However, during some random manua

Re: Explain score is different from score

2016-03-19 Thread Ahmet Arslan
Hi Rajesh, I suspect it is due to the queryNorm(q). But it is weird that relative order is different in your example. "queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied b
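For reference, in Lucene's classic TF-IDF similarity queryNorm is 1 / sqrt(sumOfSquaredWeights), computed once per query. The sketch below uses hypothetical term weights. Because the same factor multiplies every document's score, it should never reorder results, which is why a ranking difference between explain and score looks suspicious.

```python
import math


def query_norm(term_weights):
    """Classic TF-IDF queryNorm: 1 / sqrt(sum of squared query-term weights)."""
    sum_sq = sum(w * w for w in term_weights)
    # Zero guard added for this sketch only.
    return 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 1.0


weights = [3.0, 4.0]  # hypothetical per-term weights (e.g. idf * boost)
norm = query_norm(weights)
print(norm)  # 0.2

raw = {"docA": 10.0, "docB": 7.5, "docC": 9.0}
normalized = {doc: score * norm for doc, score in raw.items()}
# The same factor scales every document, so the order is unchanged.
print(sorted(raw, key=raw.get, reverse=True) == sorted(normalized, key=normalized.get, reverse=True))  # True
```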

Re: Solr Wiki - Request to add to contributors group

2016-03-19 Thread Shawn Heisey
On 3/16/2016 8:57 AM, Alessandro Benedetti wrote: > Shawn, thank you very much ! > So, I didn't have an account in the old wiki, can you add me as contributor > ? > Just created. > I will then proceed adding the classification documentation. > > AlessandroBenedetti The username that I added before

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
Now you've confused me... Did you actually intend that q.op=AND was going to perform some function in a query with only two terms and an OR operator? I mean, why not just drop the q.op=AND? -- Jack Krupansky On Wed, Mar 16, 2016 at 1:31 AM, Modassar Ather wrote: > Jack as suggested I have crea

Re: indexing pdf files using post tool

2016-03-19 Thread Francisco Andrés Fernández
Vidya, I don't know if I'm understanding it very well, but I think that the best way is to parse your text using a routine outside Solr. You might need to map the different parts of your document using your domain knowledge and use such a routine to produce an XML document for example, with correspon

DIH issue with SolrEntityProcessor 5.4.1

2016-03-19 Thread William Bell
We are running this inside of another entity in DIH. There appears to be an issue. We get 2 calls to the survey core if hits > 0. If hits = 0 we get 1 call. Has anyone else seen this? Shall I fix it? Any ideas where this bug may be? http://localhost:8983/solr/survey"; qt="dihsurvey" query="provide

Explain style json? Without using wt=json...

2016-03-19 Thread jimi.hullegard
Hi, We are using Solrj to query our solr server, and it works great. However, it uses the binary format wt=javabin, and now when I'm trying to get better debug output, I notice a problem with this. The thing is, I want to include the explain data for each search result, by adding "[explain]" as
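A hedged sketch of one option to check. The [explain] doc transformer accepts a style parameter, and style=nl should return the explanation as a structured NamedList instead of a plain-text string. A NamedList survives the javabin format that SolrJ uses, so wt=json would not be needed. The Python below only builds the query string so the parameter shape is visible. In SolrJ the same fl value would be set on the query. Whether style=nl is available in your Solr version is an assumption to verify.

```python
from urllib.parse import urlencode

def explain_query_params(q, rows=10, style="nl"):
    """Build /select parameters that attach a per-document explanation via the
    [explain] doc transformer; style=nl requests a structured (NamedList) form."""
    return urlencode({
        "q": q,
        "rows": rows,
        "fl": f"id,score,[explain style={style}]",
    })

qs = explain_query_params("title:solr")
```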

Re: Boosts for relevancy (shopping products)

2016-03-19 Thread Robert Brown
That does sound rather useful! We currently have it set to 0.1. On 03/18/2016 04:13 PM, Nick Vasilyev wrote: Tie does quite a bit; without it, only the highest-weighted field that has the term will be included in the relevance score. Tie lets you include the other fields that match as well. On Mar
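A minimal Python sketch of what Nick describes, following Lucene's DisjunctionMaxQuery combination. The best-matching field's score counts fully, and every other matching field adds tie times its score. The per-field scores are hypothetical.

```python
def dismax_score(field_scores, tie=0.0):
    """DisjunctionMax combination: the best field's score plus
    tie * the sum of the other matching fields' scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# Hypothetical per-field scores for one term, e.g. title, brand, description.
scores = [2.0, 1.0, 0.5]
pure_max = dismax_score(scores, tie=0.0)   # only the best field counts
with_tie = dismax_score(scores, tie=0.1)   # other fields contribute 10%
full_sum = dismax_score(scores, tie=1.0)   # behaves like a plain sum
```

With tie set to 0.1, as in Robert's setup, documents that match in several fields get a modest edge over documents that match in only their best field, without the extra fields dominating.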

Re: High Cpu sys usage

2016-03-19 Thread YouPeng Yang
Hi, It happened again, and the worse thing is that my system crashed. We could not even connect to it with ssh. I used the sar command to capture statistics about it. Here are my details: [1] CPU (by using sar -u): we had to restart our system, as shown by the red LINUX RESTART in the

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
I was just wanting to see the Jira clarified (without creating noise on the Jira), but if others feel they understand the relevance of the outer AND/+ to the stated problem, fine. I don't think I have anything else to add to the discussion at this stage. Now we sit and wait for some senior committe

Re: Query behavior.

2016-03-19 Thread Modassar Ather
What I understand by "+((fl:java fl:book))" is that any of the terms should be present in the complete query. Please correct me if I am wrong. What I want to achieve is (A OR B), where either term or both terms will cause a match. Thanks, Modassar On Thu, Mar 17, 2016 at 10:32 AM, Jack Krupan

Re: Regarding google maps polyline to use IsWithin(POLYGON(())) in solr

2016-03-19 Thread David Smiley
JTS doesn't have any vertex limit on the geometries, so I don't know why your query isn't working. On Wed, Mar 16, 2016 at 1:58 AM Pradeep Chandra < pradeepchandra@gmail.com> wrote: > Hi Sir, > > Let me give some clarification on IsWithin(POLYGON(())) query...It is not > giving any result for
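When a polygon query silently returns nothing, two things are worth checking. WKT lists each point as x y, which means longitude before latitude, the reverse of Google Maps' (lat, lng). A polyline's points also do not form a closed ring on their own. Below is a minimal Python sketch of the conversion. The input is assumed to be a plain list of (lat, lng) pairs, and the resulting string would go inside the IsWithin(...) predicate from the subject line.

```python
def latlng_to_wkt_polygon(points):
    """Convert Google Maps style (lat, lng) pairs into a WKT POLYGON string.
    WKT puts x (longitude) before y (latitude), and the outer ring must be
    closed, i.e. end with the same point it starts with."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least 3 points")
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    coords = ", ".join(f"{lng} {lat}" for lat, lng in ring)
    return f"POLYGON(({coords}))"

# Hypothetical path vertices as (lat, lng), as a map polyline would give them.
wkt = latlng_to_wkt_polygon([(12.9, 77.5), (13.0, 77.6), (12.95, 77.7)])
```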

Re: Query behavior.

2016-03-19 Thread Jack Krupansky
That's what I thought you had meant before, but the Jira ticket indicates that you are looking for some extra level of AND/MUST outside of the OR, which is different from what you just indicated. In the ticket you say: "How can I achieve following? "+((fl:java fl:book))"", which has an extra AND ou
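To make the difference Jack is asking about concrete, here is a deliberately simplified Python model of Lucene-style boolean matching. It covers matching only, not scoring, and leaves out MUST_NOT and minimum-should-match. Every MUST clause has to match. If there are no MUST clauses, at least one SHOULD clause has to match. This is an illustration, not Lucene's code.

```python
def matches(clauses, doc_terms):
    """Simplified boolean matching. A clause is (occur, q), where occur is
    "MUST" or "SHOULD" and q is a term string or a nested list of clauses."""
    must = [q for occur, q in clauses if occur == "MUST"]
    should = [q for occur, q in clauses if occur == "SHOULD"]

    def clause_matches(q):
        # A nested list is a sub-query; a string is a single term.
        return matches(q, doc_terms) if isinstance(q, list) else q in doc_terms

    if not all(clause_matches(q) for q in must):
        return False
    if must:
        return True  # with required clauses present, SHOULD only affects scoring
    return any(clause_matches(q) for q in should)

inner = [("SHOULD", "fl:java"), ("SHOULD", "fl:book")]  # (fl:java fl:book)
outer = [("MUST", inner)]                               # +((fl:java fl:book))
```

In this model, wrapping a single OR group in an outer required clause does not change which documents match. Either term, or both, is enough, which is the (A OR B) behavior Modassar describes.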

RE: Making managed schema unmutable correctly?

2016-03-19 Thread Davis, Daniel (NIH/NLM) [C]
Alexandre, I just made this transition, both to SolrCloud and to managed schema. In QA and Production, you update solrconfig.xml to say the schema is not mutable: true managed-schema My workflow in development is as follows: - Start with gettingstarted configuration and downcon

Re: Why is multiplicative boost prefered over additive?

2016-03-19 Thread Jan Høydahl
You can also use functions to “compress” the source number, so that the effect of a certain boost becomes bigger or smaller compared to the other boost you have. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 17. mar. 2016 kl. 23.21 skrev Upayavira : > > Yes. Boost
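A minimal Python sketch of the "compress the source number" idea. A raw popularity count is passed through log1p before it becomes a multiplicative boost, so a product with 10,000 sales is not weighted 1,000 times more than one with 10. Adding 1.0 keeps the factor at or above 1, so zero popularity leaves the relevance score unchanged, and weight controls how much popularity matters overall. The exact formula is an assumption for illustration. In Solr, a comparable expression would be built from function queries such as log, sum, and product.

```python
import math

def compressed_boost(popularity, weight=1.0):
    """Multiplicative boost factor from a raw popularity count, compressed
    with log1p so very large counts don't dominate the relevance score."""
    return 1.0 + weight * math.log1p(popularity)

def final_score(relevance, popularity, weight=1.0):
    """Relevance score scaled by the compressed popularity boost."""
    return relevance * compressed_boost(popularity, weight)

# Raw counts differ by 1000x; the compressed boosts differ by only about 3x.
ratio = compressed_boost(10_000) / compressed_boost(10)
```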

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Alexandre Rafalovitch
Daniel, Thank you for the very concrete example. That is helpful. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 17 March 2016 at 08:17, Davis, Daniel (NIH/NLM) [C] wrote: > Alexandre, > > I just made this transition, both to

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Erick Erickson
Be _very_ cautious when you're looking at these timings. Random spikes are often due to opening a new searcher (assuming you're indexing as you query) and are eminently tunable by autowarming. Obviously you can't fire the same query again and again, but if you collect a set of "bad" queries and, sa

Re: Shard splitting for immediate performance boost?

2016-03-19 Thread Robert Brown
Thanks Erick, I have another index with the same infrastructure setup, but only 10m docs, and never see these slow-downs, that's why my first instinct was to look at creating more shards. I'll definitely make a point of investigating further tho with all the things you and Shawn mentioned, t

Re: Making managed schema unmutable correctly?

2016-03-19 Thread Shawn Heisey
On 3/16/2016 7:51 PM, Jay Potharaju wrote: > Does using schema API mean that no upconfig to zookeeper and no reloading > of all the nodes in my solrcloud? In which scenario should I not use schema > API, if any? The documentation says that a reload occurs automatically after the schema modificatio
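For comparison with the upconfig-and-reload workflow, here is a minimal Python sketch of a Schema API call that adds one field to a mutable managed schema. It only builds the HTTP request and does not send it. The base URL, the collection name "products", and the field definition are hypothetical. Per the documentation Shawn refers to, Solr reloads after such a modification on its own.

```python
import json
from urllib.request import Request

def add_field_request(base_url, collection, name, field_type, stored=True):
    """Build (but don't send) a Schema API POST that adds a single field."""
    body = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return Request(
        f"{base_url}/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = add_field_request("http://localhost:8983/solr", "products", "brand", "string")
# urllib.request.urlopen(req) would send it to a running Solr node.
```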
