Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-20 Thread danny teichthal
Hi Li, It would help if you could supply some more info from your logs. We also had a similar issue. There were some bugs related to SolrCloud that were fixed in Solr 4.10.4 and later in Solr 5.x. I would suggest you compare your logs with the defects listed in the 4.10.4 release notes to see if they are the s

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Yes, it definitely seems to be the main problem for us. I did some simple tests of the encoding and decoding calculations in DefaultSimilarity, and my findings are: * For input between 1.0 and 0.5, a difference of 0.01 in the input causes the output to change by a value of 0 or 0.125 depending
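The coarseness reported in this message can be reproduced outside Solr. Below is a minimal Python sketch that mimics what Lucene's SmallFloat.floatToByte315 and byte315ToFloat are understood to do (the one-byte norm encoding used by DefaultSimilarity at the time). The Python function names are made up for illustration, and the bit manipulation is a re-implementation rather than code copied from Lucene, so treat the details as an assumption.

```python
import math
import struct


def float_to_byte315(f):
    """Encode a float into one byte (hypothetical port of SmallFloat.floatToByte315)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 bits as signed int
    small = bits >> 21            # drop the low 21 mantissa bits
    fzero = (63 - 15) << 3        # offset of the smallest representable exponent
    if small <= fzero:
        return 0 if bits <= 0 else 1   # zero/negative -> 0, tiny positive -> 1
    if small >= fzero + 0x100:
        return 255                     # overflow clamps to the largest byte
    return small - fzero


def byte315_to_float(b):
    """Decode the byte back to a float (hypothetical port of SmallFloat.byte315ToFloat)."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def roundtrip(f):
    """Value that survives after encoding to a byte and decoding again."""
    return byte315_to_float(float_to_byte315(f))


if __name__ == "__main__":
    # Inputs 0.01 apart can decode to the same value or to values 0.125 apart.
    for value in (0.70, 0.71, 0.74, 0.75):
        print(value, "->", roundtrip(value))
    # Raw length norms 1/sqrt(n) for short fields, before and after encoding.
    for n in range(1, 6):
        print(n, "terms:", 1 / math.sqrt(n), "->", roundtrip(1 / math.sqrt(n)))
```

Under these assumptions, only a handful of distinct values exist between 0.5 and 1.0, which is why short fields that differ by one term can end up with either identical or clearly different fieldNorm values.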

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Yes, we do edismax per field boosting, with explicit boosting of the title field. So it sure makes length normalization less relevant. But not *completely* irrelevant, which is why I still want to have it as part of the scoring, just with much less impact than it currently has. /Jimi __

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Yes, the example was contrived. Partly because our documents are mostly in Swedish text, but mostly because I thought that the example should be simple enough so it focused on the thing discussed (even though I simplified it to such a degree that I left out the current main problem with the fiel

pivoting with json facet api

2016-04-20 Thread Yangrui Guo
Hi, I am trying to facet results on my nested documents. The Solr documentation does not say much about how to pivot with the JSON Facet API on nested documents. Could someone show me some examples? Thanks very much. Yangrui

Re: Traversal of documents through network

2016-04-20 Thread vidya
OK, I understand that. So you would say documents traverse the network. If I specify some 100 docs to be displayed on my first page, will it affect performance? While the docs are transferred, will there be high-volume traffic that affects the performance of the application? And what's the time sol

Re: Overall large size in Solr across collections

2016-04-20 Thread Zheng Lin Edwin Yeo
Hi Shawn, Yes, I'm using the Extracting Request Handler. The 0.7GB/hr is the indexing rate, measured by the size of the original documents that get ingested into Solr. This means that for every hour, only 0.7GB of my documents gets ingested into Solr. It will require 10 hours just to index documents

complete cluster shutdown

2016-04-20 Thread Zap Org
I have 5 ZooKeeper and 2 Solr machines, and after a month or two the whole cluster shuts down and I don't know why. The logs I get in ZooKeeper are attached below; otherwise I don't get any errors. All this runs on Linux VMs. 2016-03-11 16:50:18,159 [myid:5] - WARN [SyncThread:5:FileTxnLog@334] - fsync-ing

Re: Overall large size in Solr across collections

2016-04-20 Thread Shawn Heisey
On 4/20/2016 8:10 PM, Zheng Lin Edwin Yeo wrote: > I'm currently running 4 threads concurrently to run the indexing, Which > means I run the script in command prompt in 4 different command windows. > The ID has been configured in such a way that it will not overwrite each > other during the indexin

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
Or should this be higher rated about NY, since it's shorter: * New York Another thought on length norms: with the advent of multi-field dismax with per-field boosting, people tend to explicitly boost the title field so that the traditional length normalization is less relevant. -- Jack Krupansky

Re: Overall large size in Solr across collections

2016-04-20 Thread Zheng Lin Edwin Yeo
Hi Shawn, I'm currently running 4 threads concurrently to run the indexing, Which means I run the script in command prompt in 4 different command windows. The ID has been configured in such a way that it will not overwrite each other during the indexing. Is that considered multi-threading? The ra

Re: Storing different collection on different hard disk

2016-04-20 Thread Zheng Lin Edwin Yeo
Thanks for your reply. I have managed to solve the problem. The reason is that we have to use this "/" instead of this "\", even in Windows, and to include the data folder as well. This is the working one: dataDir=D:/collection1/data Regards, Edwin On 20 April 2016 at 21:39, Bram Van Dam wrot

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Walter Underwood
Sure, here are some real world examples from my time at Netflix. Is this movie twice as much about “new york”? * New York, New York Which one of these is the best match for “blade runner”: * Blade Runner: The Final Cut * Blade Runner: Theatrical & Director’s Cut * Blade Runner: Workprint http:

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
Maybe it's a cultural difference, but I can't imagine why on a query for "John", any of those titles would be treated as anything other than equals - namely, that they are all about John. Maybe the issue is that this seems like a contrived example, and I'm asking for a realistic example. Or, maybe

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Ahmet Arslan
Hi Jimi, The fieldNorm encode/decode step causes some precision loss. This may be a problem when dealing with very short documents. You can find many discussions on this topic. ahmet On Thursday, April 21, 2016 3:10 AM, "jimi.hulleg...@svensktnaringsliv.se" wrote: Ok sure, I can try and give som

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Ok sure, I can try and give some examples :) Let's say that we have the following documents: Id: 1 Title: John Doe Id: 2 Title: John Doe Jr. Id: 3 Title: John Lennon: The Life Id: 4 Title: John Thompson's Modern Course for the Piano: First Grade Book Id: 5 Title: I Rode With Stonewall: Being C

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
I'm not sure I fully follow what distinction you're trying to focus on. I mean, traditionally length normalization has simply tried to distinguish a title field (rarely more than a dozen words) from a full body of text, or maybe an abstract, not things like exactly how many words were in a title. O

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
I am talking about the title field. And for the title field, a sweetspot interval of 1 to 50 makes very little sense. I want to have a fieldNorm value that differentiates between for example 2, 3, 4 and 5 terms in the title, but only very little. The 20% number I got by simply calculating the d

Re: set session variable in mysql importHandler

2016-04-20 Thread Alexandre Rafalovitch
The driver documentation talks about "sessionVariables" that might be possible to pass through the connection URL: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html Alternatively, there might be a way to configure driver via JNDI and set some variable
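As a rough illustration of the connection-URL approach mentioned in this message, the Python sketch below builds a JDBC URL that carries the Connector/J `sessionVariables` property. The helper name, host, port, and database are hypothetical, and whether the DataImportHandler passes such a URL through to the driver unchanged is an assumption that has not been verified here.

```python
def mysql_jdbc_url(host, port, database, session_variables):
    """Build a MySQL JDBC URL with a sessionVariables property.

    session_variables is a dict of variable name -> value; the pairs are
    joined with commas, which is one of the separators Connector/J documents
    for this property.
    """
    url = f"jdbc:mysql://{host}:{port}/{database}"
    pairs = ",".join(f"{name}={value}" for name, value in session_variables.items())
    if pairs:
        url += f"?sessionVariables={pairs}"
    return url


if __name__ == "__main__":
    # Hypothetical host and database; the variable comes from the original question.
    print(mysql_jdbc_url("localhost", 3306, "mydb", {"group_concat_max_len": 100}))
```

The resulting string would go into the `url` attribute of the data source definition in place of the plain connection URL.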

Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-20 Thread Li Ding
Hi All, We are using SolrCloud 4.6.1. We have observed the following behaviors recently. A Solr node in a SolrCloud cluster is up but some of the cores on the node are marked as down in ZooKeeper. If the cores are parts of a multi-sharded collection with one replica, the queries to that collectio

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Ahmet Arslan
Hi Jimi, Please define a meaningful document-length range like min=1 max=50. By the way, you need to reindex every time you change something. Regarding the 20% score change, I am not sure how you calculated that number and I assume it is correct. What really matters is the relative order of documents

Re: Questions about tie parameter for dismax/edismax

2016-04-20 Thread Ahmet Arslan
Hi Jimi, Contribution to the documentation is very important. It would be great if you can prepare a good text explaining things with common sense and easy to understand. Please include your documentation proposal as a comment on the Confluence wiki [1]. [1] https://cwiki.apache.org/confluence

Remedial Map-Reduce logic

2016-04-20 Thread Davis, Daniel (NIH/NLM) [C]
Well, it's been a long time since I took any data structures and algorithms course (2000, basically), and after the recent Solr 6 feature chat, I was very curious whether there was real computational goodness behind the move towards a JDBC interface based on Streaming Expressions. This led me

RE: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Hang on... It didn't work out as I wanted. But the problem seems to be in the encoding of the fieldNorm value. The encoded value is so coarse that two values that were quite close to each other originally can become quite far apart after encoding and de

RE: Questions about tie parameter for dismax/edismax

2016-04-20 Thread jimi.hullegard
Thanks Ahmet! The second I read that part about the "albino elephant" query I remembered that I had read that before, but just forgotten about it. That explanation is really good, and really should be part of the regular documentation if you ask me. :) /Jimi -Original Message- From: Ah

RE: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Hi Ahmet, SweetSpotSimilarity seems quite nice. Some simple testing by throwing some different values at the class gives quite good results. Setting ln_min=1, ln_max=2, steepness=0.1 and discountOverlaps=true should give me more or less what I want. At least for the title field. I'm not sure wh
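To see what the settings in this message would do, here is a small Python sketch of the length-norm formula that SweetSpotSimilarity is understood to apply: 1 / sqrt(steepness * (|n - min| + |n - max| - (max - min)) + 1). The function name is hypothetical, the formula is reproduced from the class documentation as recalled, and neither discountOverlaps nor the later one-byte norm encoding is modeled.

```python
import math


def sweet_spot_length_norm(num_terms, ln_min=1, ln_max=2, steepness=0.1):
    """Length norm that stays at 1.0 inside [ln_min, ln_max] and decays outside it."""
    distance = abs(num_terms - ln_min) + abs(num_terms - ln_max) - (ln_max - ln_min)
    return 1.0 / math.sqrt(steepness * distance + 1.0)


if __name__ == "__main__":
    # Parameters taken from the message: ln_min=1, ln_max=2, steepness=0.1.
    for n in range(1, 6):
        print(n, "terms:", round(sweet_spot_length_norm(n), 4))
```

With these parameters the norm stays at 1.0 for one and two terms and drops only gradually after that, which matches the stated goal of a small but non-zero length effect on the title field.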

Re: Indexing 700 docs per second

2016-04-20 Thread Mark Robinson
Thank you all for your very valuable suggestions. I will try out the options shared once our set up is ready and probably get back on my experience once it is done. Thanks! Mark. On Wed, Apr 20, 2016 at 9:54 AM, Bram Van Dam wrote: > > I have a requirement to index (mainly updation) 700 docs pe

Re: how to restrict phrase to appear in same child document

2016-04-20 Thread Yangrui Guo
Hi, thanks for answering. My problem is that users do not indicate in the query which field a color belongs to. For example, in "which black driver has a white mercedes", it is difficult to distinguish which color belongs to which field, because there can be thousands of car brands and professions.

Block Join faceting on intermediate levels with JSON Facet API (might be related to block join rollups & SOLR-8998)

2016-04-20 Thread Alisa Z .
Hi all, I have been stretching some SOLR's capabilities for nested documents handling and I've come up with the following issue... Let's say I have the following structure: { "blog-posts":{  //level 1     "leaf-fields":[     "date",     "author"],     "title":{

Re: Questions about tie parameter for dismax/edismax

2016-04-20 Thread Ahmet Arslan
Hi Jimi, Field based scoring, where you query multiple fields (title,body,keywords etc) with multiple query terms, is an unsolved problem. (E)dismax is a heuristic approach to attack the problem. Please see the javadoc of DisjunctionMaxQuery : https://lucene.apache.org/core/6_0_0/core/org/apac

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Ahmet Arslan
Hi Jimi, SweetSpotSimilarity allows you to define a document length range, so that all documents in that range will get the same fieldNorm value. In your case, you can say that documents from 1 word up to 100 words incur no document length penalty. If a document is longer than 100 words, apply some penalty. By

Re: how to restrict phrase to appear in same child document

2016-04-20 Thread Alisa Z .
Yangrui, First, have you indexed your documents with proper nested document structure [https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-NestedChildDocuments]? From the piece of data you showed, it seems that you just put it righ

Re: Traversal of documents through network

2016-04-20 Thread Alisa Z .
Vidya, No, not all of those 500 result docs will be brought to your client (browser, etc.). Only as many documents as fit into the 1st "search result page" will be brought. There is a notion of "pagination" in Solr (as well as in most search engines). The counts of occurrence might be appro
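The pagination described here is controlled by Solr's `start` and `rows` request parameters. The Python sketch below shows how a client might compute them for a given result page; the helper names, the base URL, and the collection name are hypothetical.

```python
from urllib.parse import urlencode


def page_params(query, page, rows_per_page):
    """Return Solr request parameters for a zero-based result page."""
    return {
        "q": query,
        "start": page * rows_per_page,  # offset of the first document to return
        "rows": rows_per_page,          # number of documents sent back to the client
    }


def select_url(base_url, query, page, rows_per_page):
    """Build a full /select URL; base_url is an assumed collection endpoint."""
    return f"{base_url}/select?{urlencode(page_params(query, page, rows_per_page))}"


if __name__ == "__main__":
    # Hypothetical endpoint; only 100 documents per request cross the network.
    base = "http://localhost:8983/solr/mycollection"
    print(select_url(base, "keyword", 0, 100))
    print(select_url(base, "keyword", 1, 100))
```

Even if 500 documents match, each request transfers only `rows` documents, plus the total hit count in the response header.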

RE: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
OK. Well, still, the fact that the score increases almost 20% because of just one extra term in the field, is not really reasonable if you ask me. But you seem to say that this is expected, reasonable and wanted behavior for most use cases? I'm not sure that I feel comfortable replacing the defa

Questions about tie parameter for dismax/edismax

2016-04-20 Thread jimi.hullegard
Hi, I have been looking a bit at the tie parameter, and I think I understand how it works, but I still have a few questions about it. 1. It is not documented anywhere (as far as I have seen) what the default value is. Some testing indicates that the default value is 0, and it makes perfect sen
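For readers following the tie discussion, this is a minimal Python sketch of how a disjunction-max score combines per-field scores, following the description in the DisjunctionMaxQuery documentation: the maximum field score plus tie times the sum of the remaining field scores. The function name and the sample scores are made up for illustration.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores: best field plus tie * (sum of the other fields)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]  # hypothetical scores for title, body, keywords
    for tie in (0.0, 0.1, 1.0):
        print("tie =", tie, "->", dismax_score(scores, tie))
```

With tie=0 only the best-matching field counts (a pure maximum), and with tie=1 the result is the plain sum of all field scores; values in between let the other fields act as tie breakers.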

Re: Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread Jack Krupansky
FWIW, length for normalization is measured in terms (tokens), not characters. With TFIDF similarity (the default before 6.0), the normalization is based on the inverse square root of the number of terms in the field: return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms))); That code is i
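The formula quoted in this message is easy to evaluate by hand. The Python sketch below mirrors it (boost times the inverse square root of the term count) to show how quickly the raw norm falls for short fields; the function name is hypothetical, and the values are the norms before any one-byte encoding is applied.

```python
import math


def length_norm(num_terms, boost=1.0):
    """Raw length norm: boost * 1 / sqrt(number of terms in the field)."""
    return boost * (1.0 / math.sqrt(num_terms))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "terms:", round(length_norm(n), 4))
    # Relative difference between a 2-term and a 3-term field.
    print("2 vs 3 terms:", round(length_norm(2) / length_norm(3), 4))
```

Going from two terms to three lowers the raw norm from about 0.707 to about 0.577, so a single extra term in a very short field already changes this factor by roughly a fifth.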

Re: Live Podcast on Solr 6 with Yonik and Erik Hatcher (Today, 2pm ET)

2016-04-20 Thread Doug Turnbull
Thanks to those that watched live. If you missed it, here's the audio recording if you'd like to listen in http://opensourceconnections.com/blog/2016/04/19/solr-6-release/ Best -Doug On Tue, Apr 19, 2016 at 12:32 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Doh! Thanks Yonik

Is it possible to configure a minimum field length for the fieldNorm value?

2016-04-20 Thread jimi.hullegard
Hi, In general I think that the fieldNorm factor in the score calculation is quite good. But when the text is short I think that the effect is too big. I.e., with two documents that have a short text in the same field, just a few characters extra in one of the documents lowers the fieldNorm factor too

Traversal of documents through network

2016-04-20 Thread vidya
Hi, When I query a word in Solr, let's say 500 documents containing that keyword are found. Will all those documents traverse the network? Or how does it happen? Please help me with this. -- View this message in context: http://lucene.472066.n3.nabble.com/Traversal-of-documents-th

Re: Indexing 700 docs per second

2016-04-20 Thread Bram Van Dam
> I have a requirement to index (mainly updation) 700 docs per second. > Suppose I have a 128GB RAM, 32 CPU machine, with each doc size around 260 > bytes (6 fields out of which only 2 will undergo updation at the above > rate). This collection has around 122Million docs and that count is pretty > m

Re: Overall large size in Solr across collections

2016-04-20 Thread Shawn Heisey
On 4/19/2016 10:12 PM, Zheng Lin Edwin Yeo wrote: > Thanks for the information Shawn. > > I believe it could be due to the types of file that is being indexed. > Currently, I'm indexing the EML files which are in HTML format, and they > are more rich in content (with in line images and full text),

Re: set session variable in mysql importHandler

2016-04-20 Thread Shawn Heisey
On 4/20/2016 6:01 AM, Zaccheo Bagnati wrote: > I configured an ImportHandler on a MySQL table using jdbc driver. I'm > wondering if is possible to set a session variable in the mysql connection > before executing queries. e. g. "SET SESSION group_concat_max_len = > 100;" Normally the MySQL JDB

Re: Storing different collection on different hard disk

2016-04-20 Thread Bram Van Dam
Have you considered simply mounting different disks under different paths? It looks like you're using Windows, so I'm not sure if that's possible, but it seems like a relatively basic task, so who knows. You could mount Disk 1 as /path/to/collection1 and Disk 2 as /path/to/collection2. That way yo

Solr documents into application cache

2016-04-20 Thread Anil
Hi, I would like to load Solr documents (based on certain criteria) into an application cache (Hazelcast). Is there a better way to do it than firing paginated queries? Thanks. Regards, Anil

set session variable in mysql importHandler

2016-04-20 Thread Zaccheo Bagnati
Hi all, I configured an ImportHandler on a MySQL table using the JDBC driver. I'm wondering if it is possible to set a session variable in the MySQL connection before executing queries, e.g. "SET SESSION group_concat_max_len = 100;" Thanks Bye Zaccheo

Re: Facet heatmaps: cluster coordinates based on average position of docs

2016-04-20 Thread Anton K.
Thanks for your answer, David, and have a good vacation. It seems a more detailed heatmap is not a good solution in my case because I need to display a cluster icon with the number of items inside the cluster. So if I get a very large number of cells on the map, some of the cells will overlap. I also think about