Re: [ANNOUNCE] YCSB 0.11.0 released

2016-09-22 Thread Alexandre Rafalovitch
Sorry, what is YCSB? The email does not say, and neither does the link. The Solr connection is not mentioned either, except to say that this release has not changed it, so, whatever YCSB is, this specific update is not really relevant to announce to the Solr community. Perhaps you meant to send that to the

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
If you can break these up into tokens somehow, that's clearly best. But from the patterns you show, it's not likely. WordDelimiterFilterFactory won't quite work since it wouldn't be able to separate ASEF into the token SEF. You'll have a _lot_ fewer terms if you don't use EdgeNGram. Try just using bi

Re: Heap memory usage is -1 in UI

2016-09-22 Thread Shawn Heisey
On 9/22/2016 4:59 PM, Yago Riveiro wrote: > The Heap Memory Usage in the UI is always -1. Is there some way to > get the amount of heap that a core consumes? In all the versions that I have looked at, up to 6.0, this number is either entirely too small or -1. Looking into the code, this info co

Re: Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread Shawn Heisey
On 9/22/2016 11:33 AM, jimtronic wrote: > Boxes 1,2, and 3 have replicas of collections dogs and cats. Box 4 has > only a replica of dogs. All of these boxes have a healthcheck file on > them that works with the PingRequestHandler to say whether the box is > up or not. If I hit Box4/cats/admin/ping

Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread Shawn Heisey
On 9/22/2016 3:27 PM, vsolakhian wrote: > This is not the cause of the problem though. The disk cache is > important for queries and overall performance during optimization, but > once it is done, everything should go back to "normal" (whatever that > normal is). In our case it is the SOFT COMMIT (

Re: How to retrieve parent documents without a nested structure (block-join)

2016-09-22 Thread Alexandre Rafalovitch
Why not a traditional join? https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 23 September 2016 at 00:16, Shamik Bandopadhyay wrote: > Hi
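To make the join suggestion concrete, here is a minimal in-memory Java sketch of what the join query parser computes, assuming hypothetical field names `id` on parent documents and `parent_id` on children. In Solr the request would look roughly like `q={!join from=parent_id to=id}text:solr`: the inner query selects children, their `parent_id` values are collected, and documents whose `id` matches one of those values are returned. This only mimics the semantics; it is not how Solr implements the join.

```java
import java.util.*;
import java.util.function.Predicate;

final class JoinSketch {
    // Minimal document model: field name -> single string value (hypothetical fields).
    static Map<String, String> doc(String... kv) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) d.put(kv[i], kv[i + 1]);
        return d;
    }

    // In-memory version of {!join from=fromField to=toField}innerQuery:
    // 1) run the inner query, 2) collect the fromField values of its matches,
    // 3) return every document whose toField value is one of those values.
    static List<Map<String, String>> join(List<Map<String, String>> docs,
                                          Predicate<Map<String, String>> innerQuery,
                                          String fromField, String toField) {
        Set<String> keys = new HashSet<>();
        for (Map<String, String> d : docs) {
            if (innerQuery.test(d) && d.containsKey(fromField)) keys.add(d.get(fromField));
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (keys.contains(d.get(toField))) result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
            doc("id", "p1", "type", "parent"),
            doc("id", "p2", "type", "parent"),
            doc("id", "c1", "type", "child", "parent_id", "p1", "text", "solr"),
            doc("id", "c2", "type", "child", "parent_id", "p2", "text", "lucene"));
        // Roughly q={!join from=parent_id to=id}text:solr -> only parent p1
        System.out.println(join(docs, d -> "solr".equals(d.get("text")), "parent_id", "id"));
    }
}
```

Unlike block-join, this does not need children and parents indexed together in one block, which is why it fits documents that arrive out of order.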

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Alexandre Rafalovitch
Not fully clear still, but perhaps you need several fields, at least one of which just contains your SEF and OFF values serving effectively as binary switches (FQ matches). And then maybe you strip the leading IDs that you are not matching on. Remember your Solr data shape does not need to match y

[ANNOUNCE] YCSB 0.11.0 released

2016-09-22 Thread Govind Kamat
On behalf of the development community, I'm pleased to announce the release of YCSB version 0.11.0. Highlights: * Support for ArangoDB. This is a new binding. * Update to Apache Geode (incubating) to improve memory footprint. * "couchbase" client deprecated in favor of "couchbase2". * Capability

Re: Heap memory usage is -1 in UI

2016-09-22 Thread Alexandre Rafalovitch
What version of Solr and which operating system is that on? Regards, Alex On 23 Sep 2016 5:59 AM, "Yago Riveiro" wrote: > The Heap Memory Usage in the UI is always -1. > > Is there some way to get the amount of heap that a core consumes? > > > > - > Best regards > -- > View this messa

SolrCloud query logs, change from 4.10 to 5.5

2016-09-22 Thread Elaine Cario
We're in the process of upgrading from SolrCloud 4.10 to 5.5, and we noticed a change in how distributed queries get logged. In Solr 4.10 we noted that the original node receiving the query logged the query with a full hit count and elapsed time for the entire query, using the original request han

Heap memory usage is -1 in UI

2016-09-22 Thread Yago Riveiro
The Heap Memory Usage in the UI is always -1. Is there some way to get the amount of heap that a core consumes? - Best regards

Re: Stream expressions: Break up multivalue field into usable tuples

2016-09-22 Thread Joel Bernstein
You could use the facet() expression which works with multi-value fields. This emits aggregated tuples useful for recommendations. For example: facet(baskets, q="item:taco", buckets="item", bucketSorts="count(*) desc", bucketSizeLimit="100", count(*))
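As a rough picture of the tuples such a facet() call emits, here is an in-memory Java sketch that mimics the parameters one by one over the sample baskets from Doug's message. Doug's field is `items_bought` while Joel's expression uses `item`; the sketch does not care about the field name. It only imitates the aggregation and does not call Solr's streaming API.

```java
import java.util.*;
import java.util.stream.Collectors;

final class FacetSketch {
    // In-memory imitation of
    //   facet(baskets, q="item:taco", buckets="item", bucketSorts="count(*) desc", bucketSizeLimit="100", count(*))
    // Each basket is the list of values of one document's multivalued item field.
    static List<Map.Entry<String, Long>> facet(List<List<String>> baskets, String item, int bucketSizeLimit) {
        // q: keep only baskets whose multivalued field contains the queried item.
        // buckets: every value of the multivalued field becomes its own bucket.
        // A TreeMap gives a deterministic (alphabetical) order before sorting by count.
        Map<String, Long> counts = baskets.stream()
            .filter(b -> b.contains(item))
            .flatMap(List::stream)
            .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
        // bucketSorts="count(*) desc" plus bucketSizeLimit; the sort is stable, so ties stay alphabetical.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(bucketSizeLimit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> baskets = Arrays.asList(
            Arrays.asList("eggs", "tacos", "nachos"),
            Arrays.asList("eggs", "tacos", "chicken", "bubble gum"));
        for (Map.Entry<String, Long> bucket : facet(baskets, "tacos", 100)) {
            System.out.println(bucket.getKey() + " count(*)=" + bucket.getValue());
        }
    }
}
```

Each printed line corresponds to one aggregated tuple (bucket value plus count), which is the shape that makes facet() usable for "bought together" recommendations.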

Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread vsolakhian
Thanks again, Shawn. You are completely right about the use of disk cache and the special note regarding the optimize operation in Solr wiki. This is not the cause of the problem though. The disk cache is important for queries and overall performance during optimization, but once it is done, ever

Stream expressions: Break up multivalue field into usable tuples

2016-09-22 Thread Doug Turnbull
I have documents like the following in my search index: { "shopper_id": 1234, "basket_id": 2512, "items_bought": ["eggs", "tacos", "nachos"] } { "shopper_id": 1236, "basket_id": 2515, "items_bought": ["eggs", "tacos", "chicken", "bubble gum"] } I would like to use some of the stream expr

Re: SolrJ App Engine Client

2016-09-22 Thread Susheel Kumar
As per this doc, sockets are allowed for paid apps. Not sure if this would make it unrestricted. https://cloud.google.com/appengine/docs/java/sockets/ On Thu, Sep 22, 2016 at 3:38 PM, Jay Parashar wrote: > I sent a similar message earlier but do not see it. Apologies if it's > duplicated. > > I a

Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread Shawn Heisey
On 9/22/2016 1:01 PM, vsolakhian wrote: > Our index is in HDFS, but we did not change any configuration after we > deleted 35% of records and optimized. > > The relatively slow commit (soft commit and warming up took 1.5 minutes) is > OK for our use case (adding hundreds of thousands and even milli

RE: SolrJ App Engine Client

2016-09-22 Thread Jay Parashar
I am on Java 7. As GAE states, SocketChannel is not on Google's whitelist. Stack Overflow (the second link you sent) suggests reimplementing the class. I will see if I come up with anything. Thanks John. -Original Message- From: John Bickerstaff [mailto:j...@johnbickerstaff.com] Sen

Re: SolrJ App Engine Client

2016-09-22 Thread Jay Parashar
No, it does not. Instead of SocketChannel, the error is now: Caused by: java.lang.NoClassDefFoundError: java.net.ProxySelector is a restricted class. And it happens during an actual query (solrClient.query(query);) -Original Message- From: Mikhail Khludnev [mailto:m...@apache.org] Sent: T

Re: Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread jimtronic
It seems like all the parameters in the PingHandler get processed by the remote server. So, things like shards=localhost or distrib=false take effect too late.

Re: SolrJ App Engine Client

2016-09-22 Thread Mikhail Khludnev
Does it work with plain HttpSolrClient? On Thu, Sep 22, 2016 at 10:50 PM, John Bickerstaff wrote: > Two possibilities from a quick search on the error message - both point to > GAE NOT fully supporting Java 8 > > http://stackoverflow.com/questions/29528580/how-to-deal-with-app-engine- > devserve

Re: SolrJ App Engine Client

2016-09-22 Thread John Bickerstaff
Two possibilities from a quick search on the error message - both point to GAE NOT fully supporting Java 8 http://stackoverflow.com/questions/29528580/how-to-deal-with-app-engine-devserver-exception-due-to-formatstyle-restricted-cl http://stackoverflow.com/questions/29543131/beancreationexception-

SolrJ App Engine Client

2016-09-22 Thread Jay Parashar
I sent a similar message earlier but do not see it. Apologies if it's duplicated. I am unable to connect to Solr Cloud zkhost (using CloudSolrClient) from a SolrJ client running on Google App Engine. The error message is "java.nio.channels.SocketChannel is a restricted class. Please see the Googl

RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
Not sure what to do with this one. The triggering document has a run of ~50 starts and then ~50+ starts. So, y, Tika limits nested elements to 100. Tika's DefaultHtmlMapper only passes through a few handfuls of elements (SAFE_ELEMENTS), not including or . Solr's MostlyPassThroughHtmlMappe
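To see why a long run of unclosed start tags trips a nesting limit of 100, here is a conceptual Java sketch of a depth counter. It is not Tika's actual implementation; the regex-based tag scanning and the tag names in the example (`font`, `b`) are hypothetical stand-ins, since the real tags were stripped from the archived message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestingDepthSketch {
    // Matches start, end and self-closing tags: group 1 is "/" for end tags,
    // group 3 is "/" for self-closing tags. Comments and scripts are not handled,
    // and void elements written without "/>" (e.g. <br>) are counted as start tags.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    // Deepest element nesting reached while scanning the markup left to right.
    static int maxDepth(String html) {
        Matcher m = TAG.matcher(html);
        int depth = 0;
        int max = 0;
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                depth = Math.max(0, depth - 1);   // end tag closes one level
            } else if (m.group(3).isEmpty()) {
                max = Math.max(max, ++depth);     // start tag opens one level
            }                                     // self-closing tag: depth unchanged
        }
        return max;
    }

    static boolean exceedsLimit(String html, int maxNesting) {
        return maxDepth(html) > maxNesting;
    }

    public static void main(String[] args) {
        StringBuilder unclosed = new StringBuilder();
        for (int i = 0; i < 50; i++) unclosed.append("<font>");  // hypothetical run of never-closed tags
        for (int i = 0; i < 51; i++) unclosed.append("<b>");
        System.out.println(maxDepth(unclosed.toString()) + " " + exceedsLimit(unclosed.toString(), 100));
        System.out.println(exceedsLimit("<div><p>ok</p></div>", 100));
    }
}
```

The point of the sketch: ~50 plus ~50+ start tags that are never closed push the depth past 100 even though the page is not actually deeply structured, which matches the "mismatch between opening and closing tags" explanation in this thread.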

Re: Very Slow Commits After Solr Index Optimization

2016-09-22 Thread vsolakhian
Hi Shawn, Thank you for response. Everything you said is correct in general. Our index is in HDFS, but we did not change any configuration after we deleted 35% of records and optimized. The relatively slow commit (soft commit and warming up took 1.5 minutes) is OK for our use case (adding hundre

Merging two separate Solr indexes

2016-09-22 Thread Lakshmi
Hi Everyone, we are redesigning our site and doing this in phases. We have Solr as our search engine. Our new site's data set is different from the old one and is indexed into the new core. Now we need to search across both the new and old cores to show the results. 1. How do we search across two dif

Re: Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread Erick Erickson
Don't know if it works with ping, but try &distrib=false perhaps? But wouldn't you still have, at best, "no such collection?" or something? You may have to read state(s) from Zookeeper and ping each one directly only if it has a replica for a particular collection. Best, Erick On Thu, Sep 22, 2
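A minimal Java sketch of Erick's last suggestion: only ping a node for a collection it actually hosts, so nothing has to be forwarded. The `replicaNodes` map (collection name to node base URLs) is a hypothetical stand-in for whatever you read from the cluster state in ZooKeeper, and the box URLs are made up to match the scenario in this thread.

```java
import java.util.*;

final class PingPlanSketch {
    // replicaNodes: hypothetical map of collection -> base URLs of nodes hosting a replica,
    // e.g. assembled from the cluster state stored in ZooKeeper.
    static List<String> pingUrls(Map<String, Set<String>> replicaNodes, String collection) {
        List<String> urls = new ArrayList<>();
        for (String baseUrl : replicaNodes.getOrDefault(collection, Collections.<String>emptySet())) {
            // distrib=false is kept as suggested; the main point is that a node is
            // never asked about a collection it does not host.
            urls.add(baseUrl + "/" + collection + "/admin/ping?distrib=false");
        }
        Collections.sort(urls);  // deterministic order
        return urls;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> replicaNodes = new HashMap<>();
        replicaNodes.put("dogs", new HashSet<>(Arrays.asList(
            "http://box1:8983/solr", "http://box2:8983/solr", "http://box3:8983/solr", "http://box4:8983/solr")));
        replicaNodes.put("cats", new HashSet<>(Arrays.asList(
            "http://box1:8983/solr", "http://box2:8983/solr", "http://box3:8983/solr")));
        for (String url : pingUrls(replicaNodes, "cats")) System.out.println(url);  // box4 is skipped
    }
}
```

With this plan, Box 4 is only ever health-checked through dogs, so its cats ping never gets forwarded to another box.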

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread slee
Alex, You do have a point with EdgeNGramFilterFactory. As requested, I've attached a sample screenshot for your review. Erick, Here's my use-case. Assume I have the following term stored in global_Value as such: - executionvenuetype#*

Re: Best way to generate multivalue fields from streaming API

2016-09-22 Thread Gus Heck
Hi Mike, Bit late on this, but just saw it... Using streaming to ingest has occurred to me too but I think it's not really right for that except in fairly trivial cases. The very first big problem you will have in the example you give is that you won't be able to mark things as already ingested,

Solr Cloud prevent Ping Request From Forwarding Request

2016-09-22 Thread jimtronic
Here's the scenario: Boxes 1,2, and 3 have replicas of collections dogs and cats. Box 4 has only a replica of dogs. All of these boxes have a healthcheck file on them that works with the PingRequestHandler to say whether the box is up or not. If I hit Box4/cats/admin/ping, Solr forwards the ping

How to retrieve parent documents without a nested structure (block-join)

2016-09-22 Thread Shamik Bandopadhyay
Hi, I have a set of documents indexed which has a pseudo parent-child relationship. Each child document has a reference to the parent document through an ID. As the documents are not available to the crawler in order, I'm not able to index them in a nested structure to support block-join. Here's

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Erick Erickson
So far a Tika JIRA seems like the right thing. Tim is "a well known entity" in Solr though so I'm sure he'll move it over to Solr if appropriate. Erick On Thu, Sep 22, 2016 at 9:43 AM, Rodrigo Rosenfeld Rosas wrote: > Here it is. Not sure if it's clear enough though: > > https://issues.apache.or

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
Here it is. Not sure if it's clear enough though: https://issues.apache.org/jira/browse/TIKA-2091 Or should I have created the ticket in the Solr project instead? On 22-09-2016 13:32, Rodrigo Rosenfeld Rosas wrote: This is one of the documents: https://www.sec.gov/Archives/edgar/data/14720

RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
Tika might be overkill for you (no one can hear us, right?). One thing that Tika buys you is fairly smart encoding detection for html pages. Looks like Nokogiri does do some kind of encoding detection, but it may only read the meta-headers. I haven't used Nokogiri, but if you're happy with

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
This is one of the documents: https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e611133_f6ef-eutelsat.htm I'll try to create a ticket for this on Jira if I find its location, but feel free to open it yourself if you prefer, just let me know. On 22-09-2016 12:33, Allison, Timot

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
Great, thanks for the URL, I'll check that. I was wondering if maybe Tika would be overkill for my specific case. We don't index PDF, DOC or anything like that, just plain HTML. I mean, if everything Tika does is to extract text from HTML, maybe I could get the same result using N

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
I totally missed EdgeNGram. Good catch Alex! Yeah, that's a killer. My shot in the dark here is that your analysis chain isn't the best choice to support your use-case and you're shooting yourself in the foot. So let's back up and talk about your use-case and maybe re-define your analysis chain fo

RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
> I'll try to get a sample HTML yielding to this problem and attach it to Jira. Great! Tika 1.14 is around the corner...if this is an easy fix ... :) Thank you.

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Alexandre Rafalovitch
Well, I am guessing this is the line that's causing the problem: Run your real sample for that field against your indexing definition in Admin UI and see how many tokens you end up with. You may have 50 tokens, but if each of them generates up to 47 representations.. Regards, Alex.
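The analyzer line Alex quotes was lost from the archive, so the gram sizes below are hypothetical. This minimal Java sketch only shows how front edge n-grams multiply the number of index terms per value of a multivalued field, which is the blow-up being described.

```java
import java.util.*;

final class EdgeNGramCountSketch {
    // Front edge n-grams of one token, lengths minGram..maxGram (capped at the token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Total terms produced for all values of a multivalued field.
    static int termCount(List<String> values, int minGram, int maxGram) {
        int total = 0;
        for (String v : values) total += edgeNGrams(v, minGram, maxGram).size();
        return total;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("executionvenuetype", "off");
        System.out.println(edgeNGrams("sef", 1, 10));   // every prefix of the token
        System.out.println(termCount(values, 1, 50));  // one term per prefix length, per value
    }
}
```

With 50 values per document and grams up to ~47 characters, the same arithmetic gives thousands of terms per document, which is why checking the real token count in the Admin UI analysis screen is the first step.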

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread Erick Erickson
Wait: Are you really doing leading wildcard queries? If so, that's likely the root of the problem. Unless you add ReversedWildcardFilterFactory to your analysis chain, Lucene has to enumerate your entire set of terms to find likely candidates, which takes a lot of resources. What happens if you use
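A conceptual Java sketch of why reversing tokens helps a query like `*sef`: on a normal sorted term dictionary every term must be checked, while on a dictionary of reversed terms the same query becomes the prefix `fes*`, answered by seeking to the prefix and reading forward. This illustrates the idea behind the reversed-wildcard filter only; it ignores how the real filter marks and stores reversed tokens, and the terms are made up.

```java
import java.util.*;

final class ReversedWildcardSketch {
    // Leading wildcard (*suffix) on a normal term dictionary: every term is checked.
    static List<String> suffixScan(Collection<String> terms, String suffix) {
        List<String> out = new ArrayList<>();
        for (String t : terms) if (t.endsWith(suffix)) out.add(t);
        return out;
    }

    // Index-time idea: store each term reversed in a sorted dictionary.
    static NavigableSet<String> reversedDictionary(Collection<String> terms) {
        NavigableSet<String> dict = new TreeSet<>();
        for (String t : terms) dict.add(reverse(t));
        return dict;
    }

    // *suffix becomes the prefix query reverse(suffix)*: seek to the prefix and read
    // forward only while terms still start with it, instead of visiting every term.
    static List<String> suffixViaReversed(NavigableSet<String> reversedDict, String suffix) {
        String prefix = reverse(suffix);
        List<String> out = new ArrayList<>();
        for (String r : reversedDict.tailSet(prefix, true)) {
            if (!r.startsWith(prefix)) break;  // sorted order: first non-match ends the range
            out.add(reverse(r));
        }
        return out;
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("asef", "mas", "off", "sefx", "xsef");
        System.out.println(suffixScan(terms, "sef"));
        System.out.println(suffixViaReversed(reversedDictionary(terms), "sef"));
    }
}
```

Both methods return the same matches; the difference is that the second touches only the terms in the matching range, which is what keeps `*mas AND *sef` from enumerating the whole dictionary.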

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Erick Erickson
Tika was upgraded from 1.7 to 1.13 in Solr 6.2 so this is likely a change in Tika. You could _try_ downgrading Tika, but that's chancy and I have no guarantee that it'll work. Or use a SolrJ client to use an older version of Tika and transmit it to Solr, here's an example: https://lucidworks.com

Re: Performance Issue when querying Multivalued fields [SOLR 6.1.0]

2016-09-22 Thread slee
Here's what I have defined in my schema: This is what I send in the query (2 values): q=global_Value:*mas+AND+global_Value:*sef&df=text&rows=5&version=2.2&echoParams=explicit&fl=global_Value In addition, memory is taking wa

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
Hi, thanks. I was talking to @elyograg over freenode#solr and he (or she, I can't tell from the nickname) recommended that I create a Java app integrating SolrJ and Tika to perform the indexing. Is this the only way to achieve that with Solr? Since I'm not usually a Java developer, I'd prefer anothe
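One non-SolrJ route is to extract the text on the client with any HTML library and post plain JSON to Solr's /update handler, which accepts a JSON array of documents. The Java sketch below only assumes that; the field names `id` and `content` and the base URL are hypothetical and must match your schema. The same request can be sent from Ruby or any other language with an HTTP client.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class JsonUpdateSketch {
    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // A one-document JSON array; "id" and "content" are hypothetical field names.
    static String updateBody(String id, String extractedText) {
        return "[{\"id\":" + jsonString(id) + ",\"content\":" + jsonString(extractedText) + "}]";
    }

    // POST the body to <baseUrl>/update, e.g. baseUrl = http://localhost:8983/solr/mycollection (hypothetical).
    static int post(String baseUrl, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + "/update?commit=true").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        System.out.println(updateBody("doc1", "Text already extracted from the HTML page"));
    }
}
```

Doing extraction on the client also takes Tika's limits out of the picture, at the cost of handling encoding detection yourself.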

Solr on GCE

2016-09-22 Thread Jay Parashar
Hi, Is it possible to have a SolrJ client running on Google App Engine talk to a Solr instance hosted on Compute Engine? The Solr version is 6.2.0. There is also a similar question on Stack Overflow but no answers http://stackoverflow.com/questions/37390072/httpsolrclient-on-google-app-engine

Re: Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
I forgot to mention that this problem only started after I upgraded to a recent version of Solr and tried to reindex all documents. Some documents that had previously succeeded now failed with this error. On 22-09-2016 11:58, Rodrigo Rosenfeld Rosas wrote: Hi, thanks. I was talking to @elyo

Re: Removing SOLR fields from schema

2016-09-22 Thread Erick Erickson
Not only will optimize not help, even re-indexing all the docs to the current collection will leave the meta-data in the index about the removed fields. For 50 fields that likely won't matter. As Shawn says, though, re-indexing from scratch (and I'd use a new collection) is best if at all possible

RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
Y, looks like Nick (gagravarr) has answered on SO -- can't do it in Tika currently. -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Thursday, September 22, 2016 10:42 AM To: solr-user@lucene.apache.org Cc: 'u...@tika.apache.org' Subject: RE: Disabling Zip

RE: Disabling Zip bomb detection in Tika

2016-09-22 Thread Allison, Timothy B.
I don't think that's configurable at the moment. Tika-colleagues, any recommendations? If you're able to share the file on Tika's jira, we'd be happy to take a look. You shouldn't be getting the zip bomb unless there is a mismatch between opening and closing tags (which could point to a bug

Re: migration to solr 5.5.2 highlight on ngrams not working

2016-09-22 Thread elisabeth benoit
And as was said in the previous post, we can clearly see in the analysis output that the end values for the edge n-grams are good for Solr 4.10.1 and not good for Solr 5.5.2:
solr 5.5.2
text  raw_bytes   start  end  positionLength  type  position
p     [70]        0      5    1               word  1
pa    [70 61]     0      5    1               word  1
par   [70 61 72]  0      5    1               word  1
par
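A small Java sketch of why the end offsets in that table change the highlight. A highlighter wraps the stored text between a matching term's start and end offsets; if every gram keeps the whole token's offsets (end 5 for every row above), a match on a short gram still highlights the full word. The "per-gram offsets" mode is presumably what the 4.10.1 output showed; both modes are simplified models, not Lucene's filter code.

```java
import java.util.*;

final class EdgeGramOffsetSketch {
    static final class Gram {
        final String text;
        final int start;
        final int end;
        Gram(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }
    }

    // Edge n-grams of a token starting at tokenStart in the stored value.
    // perGramOffsets=true: each gram's end offset covers only the gram.
    // perGramOffsets=false: every gram keeps the whole token's end offset.
    static List<Gram> grams(String token, int tokenStart, int minGram, int maxGram, boolean perGramOffsets) {
        List<Gram> out = new ArrayList<>();
        int tokenEnd = tokenStart + token.length();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            int end = perGramOffsets ? tokenStart + len : tokenEnd;
            out.add(new Gram(token.substring(0, len), tokenStart, end));
        }
        return out;
    }

    // Wrap the [start, end) offsets of the matched gram in the stored value.
    static String highlight(String stored, Gram g) {
        return stored.substring(0, g.start) + "<em>" + stored.substring(g.start, g.end) + "</em>" + stored.substring(g.end);
    }

    static Gram find(List<Gram> grams, String text) {
        for (Gram g : grams) if (g.text.equals(text)) return g;
        throw new IllegalArgumentException("no gram " + text);
    }

    public static void main(String[] args) {
        System.out.println(highlight("solr", find(grams("solr", 0, 1, 20, true), "sol")));   // per-gram offsets
        System.out.println(highlight("solr", find(grams("solr", 0, 1, 20, false), "sol")));  // whole-token offsets
    }
}
```

For the request "sol", the first call wraps only "sol" and the second wraps all of "solr", which is the behaviour difference reported between the two Solr versions.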

Disabling Zip bomb detection in Tika

2016-09-22 Thread Rodrigo Rosenfeld Rosas
Hi, this is my first message in this list. Is it possible to disable Zip bomb detection in the Tika handler? I've also described the problem here: http://stackoverflow.com/questions/39628519/how-to-disable-or-increase-limit-zip-bomb-detection-in-tika-with-solr-config?noredirect=1#comment6657534

Re: Removing SOLR fields from schema

2016-09-22 Thread Shawn Heisey
On 9/22/2016 7:17 AM, David Santamauro wrote: > Will an optimize remove those fields and corresponding data? I am about 99 percent sure that an optimize will have no effect on fields removed from the Solr schema. The schema doesn't exist at the Lucene level. When you do an optimize, the entire op

Re: Removing SOLR fields from schema

2016-09-22 Thread David Santamauro
On 09/22/2016 08:55 AM, Shawn Heisey wrote: On 9/21/2016 11:46 PM, Selvam wrote: We use SOLR 5.x in cloud mode and have huge set of fields. We now want to remove some 50 fields from Index/schema itself so that indexing & querying will be faster. Is there a way to do that without losing existin

Re: slow updates/searches

2016-09-22 Thread Shawn Heisey
On 9/22/2016 5:46 AM, Muhammad Zahid Iqbal wrote: > Did you find any solution to the slow searches? As far as I know, the Jetty > container's default configuration is a bit slow for a large production > environment. This might be true for the default configuration that comes with a completely stock jetty downlo

migration to solr 5.5.2 highlight on ngrams not working

2016-09-22 Thread elisabeth benoit
Hello After migrating from Solr 4.10.1 to Solr 5.5.2, we don't have the same behaviour with highlighting on edge n-gram fields. We're using it for an autocomplete component. With Solr 4.10.1, if the request is sol, highlighting on solr is sol<\em>r; with Solr 5.5.2, we have solr<\em>. Same problem as

Re: Tutorial not working for me

2016-09-22 Thread Pritchett, James
> > > > From your perspective as a new user, did you find it > annoying/frustrating/confusing that the README.txt in the films example > required/instructed you to first create a handful of fields using a curl > command to hit the Schema API before you could index any of the documents? > > https://g

Re: Removing SOLR fields from schema

2016-09-22 Thread Shawn Heisey
On 9/21/2016 11:46 PM, Selvam wrote: > We use SOLR 5.x in cloud mode and have huge set of fields. We now want > to remove some 50 fields from Index/schema itself so that indexing & > querying will be faster. Is there a way to do that without losing > existing data on other fields? We don't want to

Re: slow updates/searches

2016-09-22 Thread Muhammad Zahid Iqbal
Rallavagu, Did you find any solution to the slow searches? As far as I know, the Jetty container's default configuration is a bit slow for a large production environment. On Tue, Sep 20, 2016 at 8:05 AM, Erick Erickson wrote: > If both queries _and_ updates are slow, it's hard to see how upping > the number o

Re: How to set NOT clause on Date range query in Solr

2016-09-22 Thread Muhammad Zahid Iqbal
Indent your question properly so that someone can understand it. I am out! On Tue, Sep 20, 2016 at 12:23 PM, Sandeep Khanzode < sandeep_khanz...@yahoo.com.invalid> wrote: > Have been trying to understand this for a while ...How can I specify NOT > clause in the following query?{!field f=schedule >

Re: Solr Special Character Search

2016-09-22 Thread Muhammad Zahid Iqbal
Hi, To handle special characters, you either need to create your own custom filter factory or replace the already specified filter factory with a different one, if you are using StandardFilterFactory. On Tue, Sep 20, 2016 at 5:16 PM, Alexandre Rafalovitch wrote: > What's your field definition

Re: SolrCloud setup

2016-09-22 Thread Customer
It would be great if someone could share a link on how to set up SolrCloud on 3 different machines with ZooKeeper. I've been reading the documentation and there is nothing useful for a beginner. Best would be if the Solr documentation team could add a similar example somewhere in the documentation, that would be ver

Re: Hackday next month

2016-09-22 Thread Anshum Gupta
Sure, seems like Tuesday works best :) I'll try and make it too. On Thu, Sep 22, 2016 at 10:02 AM Charlie Hull wrote: > On 21/09/2016 19:28, Trey Grainger wrote: > > I know a bunch of folks who would likely attend the hackday (including > > committers) will have some other meetings on Wednesd

Re: Hackday next month

2016-09-22 Thread Charlie Hull
On 21/09/2016 19:28, Trey Grainger wrote: I know a bunch of folks who would likely attend the hackday (including committers) will have some other meetings on Wednesday before the conference, so I think that Tuesday is actually a pretty good time to have this. Wednesday is also Yom Kippur - w