Index not fitting in memory (file-cache)

2016-03-24 Thread Robert Brown

Hi,

If my index data directory size is 70G, and I don't have 70G (plus heap, 
etc) in the system, this will occasionally affect search speed right?  
When Solr has to resort to reading from disk?


Before I go out and throw more RAM into the system, in the above 
example, what would you recommend?


Thanks,
Rob




Re: SolrJ Indexing

2016-03-24 Thread fabigol
Hi Shawn
thank for your response.
As you can see in my XML file, I have many entities which are linked to each
other. I know how to do that with DIH, but with SolrJ I don't know. Must I use
the annotations such as @Field?

Moreover, I created a new Solr project with the same XML files - copied the conf
directory - and oddly the indexing is much faster, not just a little but about
100 times as fast.
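
From what I can see, the bean binding with @Field would look roughly like this
(class and field names here are hypothetical, not from my real project):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.beans.Field;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;

  public class Entry {
    @Field("id")
    String id;

    @Field("name")
    String name;

    public static void main(String[] args) throws Exception {
      SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
      Entry e = new Entry();
      e.id = "1";
      e.name = "some name";
      client.addBean(e);  // SolrJ maps the annotated fields onto Solr fields
      client.commit();
      client.close();
    }
  }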







Re: Issue With Manual Lock

2016-03-24 Thread Reth RM
Hi Salman,

The index lock error is generally reported when more than one core or Solr
instance tries to share the same index directory. Please check whether more
than one of those cores points to the same data directory. You can see the
directory path on the "Overview" tab of the admin page.




On Wed, Mar 23, 2016 at 1:59 PM, Salman Ansari 
wrote:

> Hi,
>
> I am facing an issue which I believe has something to do with recent
> changes in Solr. I already have a collection spread on 2 shards (each with
> 2 replicas).  What happened is that my Solr and Zookeeper ensemble went
> down and I restarted the servers. I have performed the following steps
>
> 1) I restarted the machine and performed Windows update
> 2) I started Zookeeper ensemble
> 3) Then I started Solr instances
>
> My issues are (for collections which existed before starting Solr servers)
>
> 1) From time to time, I see some replicas are down on Solr dashboard
> 2) When I try to index some documents, I faced the following exception
>
> SolrNet.Exceptions.SolrConnectionException was unhandled by user code
>
>   HResult=-2146232832
>
>   Message=
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">500</int>
>     <int name="QTime">1021</int>
>   </lst>
>   <lst name="error">
>     <str name="msg">{msg=SolrCore '[myCollection]_shard1_replica1' is not available
> due to init failure: Index locked for write for core
> '[myCollection]_shard1_replica1'. Solr now longer supports forceful
> unlocking via 'unlockOnStartup'. Please verify locks
> manually!,trace=org.apache.solr.common.SolrException: SolrCore
> '[myCollection]_shard1_replica1' is not available due to init failure:
> Index locked for write for core '[myCollection]_shard1_replica1'. Solr now
> longer supports forceful unlocking via 'unlockOnStartup'. Please verify
> locks manually!}</str>
>   </lst>
> </response>
>
> I have tried several steps including
>
> 1) I have removed write.lock file manually from the folders while Solr is
> up and I have tried reloading the core while the Solr is up as well but
> nothing changed (still some replicas are down)
> 2) I have restarted Solr instances but now all replicas are down :)
>
> Any idea how to handle this issue?
>
> Appreciate your comments/feedback.
>
> Regards,
> Salman
>


Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Charles Sanders
Ah yes. Thank you. Made the correction and I do not get the SyntaxError. 
However, it does not apply the child filter. The query should return only 
TestParent4. But it is returning TestParent2, TestParent3 and TestParent4. All 
of these meet the parent portion of the query (+blue). But only TestParent4 
should meet the child portion of the query. 

q=+blue +{!parent 
which="documentKind:TestParent"v=$childq}&childq=portal_product:("red hat") 


{
  "responseHeader":{
"status":0,
"QTime":15,
"params":{
  "indent":"true",
  "q":" blue  {!parent which=\"documentKind:TestParent\"v=$childq}",
  "childq":"portal_product:(\"red hat\")",
  "wt":"json"}},
  "response":{"numFound":2733,"start":0,"maxScore":3.0138793,"docs":[
  {
"documentKind":"TestParent",
"uri":"https://ping/pong/testparent4";,
"view_uri":"https://ping/pong/testparent4";,
"id":"TestParent4",
"allTitle":"blue",
"sortTitle":"blue",
"_version_":1529622873461751808,
"_root_":["https://ping/pong/testparent4";],
"timestamp":"2016-03-23T19:40:48.211Z",
"language":"en"},
  {
"documentKind":"TestParent",
"uri":"https://ping/pong/testparent3";,
"view_uri":"https://ping/pong/testparent3";,
"id":"TestParent3",
"allTitle":"blue",
"sortTitle":"blue",
"_version_":1529622308758487040,
"_root_":["https://ping/pong/testparent3";],
"timestamp":"2016-03-23T19:31:49.668Z",
"language":"en"},
  {
"documentKind":"TestParent",
"uri":"https://ping/pong/testparent2";,
"view_uri":"https://ping/pong/testparent2";,
"id":"TestParent2",
"allTitle":"blue",
"sortTitle":"blue",
"_version_":1529622293809987584,
"_root_":["https://ping/pong/testparent2";],
"timestamp":"2016-03-23T19:31:35.408Z",
"language":"en"} 

- Original Message -

From: "Mikhail Khludnev"  
To: "solr-user"  
Sent: Wednesday, March 23, 2016 5:34:31 PM 
Subject: Re: SyntaxError - Block Join Parent Query 

On Thu, Mar 24, 2016 at 12:16 AM, Charles Sanders  
wrote: 

> Thanks for the quick reply. But I'm not sure I understand. Did I do 
> something wrong?? 
> 
Absolutely:
'portal_product"red hat")'
You omitted the colon and the opening parenthesis after the field name, didn't you?


> 
> 
> /select?q=+blue%20+{!parent%20which=%22documentKind:TestParent%22%20v=$childq}&childq=portal_product%22red%20hat%22)
>  
> 
> <response>
>   <lst name="responseHeader">
>     <int name="status">400</int>
>     <int name="QTime">2</int>
>     <lst name="params">
>       <str name="q">blue {!parent which="documentKind:TestParent" v=$childq}</str>
>       <str name="childq">portal_product"red hat")</str>
>     </lst>
>   </lst>
>   <lst name="error">
>     <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'portal_product"red
> hat")': Encountered " ")" ") "" at line 1, column 23. Was expecting one of:
>  ...  ...  ... "+" ... "-" ...  ... "(" ...
> "*" ... "^" ...  ...  ...  ...  ...
>  ...  ... "[" ... "{" ...  ...  ...
>     </str>
>     <int name="code">400</int>
>   </lst>
> </response>
>  
> 
> - Original Message - 
> 
> From: "Mikhail Khludnev"  
> To: "solr-user"  
> Sent: Wednesday, March 23, 2016 5:02:29 PM 
> Subject: Re: SyntaxError - Block Join Parent Query 
> 
> On Wed, Mar 23, 2016 at 11:09 PM, Charles Sanders  
> wrote: 
> 
> > I'm getting a SyntaxError which I do not understand when I execute a 
> block 
> > join parent query. I'm running Solr5.2.1, with 2 shards. The problem 
> > appears to be in that portion of the query that filters the child 
> document. 
> > Any insight as to where I made the error is greatly appreciated. 
> > 
> > This query produces an error: 
> > q=+blue +{!parent which="documentKind:TestParent"}portal_product:("red 
> > hat") 
> > -- should return TestParent4 
> > 
> q=+blue +{!parent which="documentKind:TestParent" 
> v=$childq}&childq=portal_product:("red hat") 
> 
> 
> > However, this query works: 
> > q=+blue +{!parent which="documentKind:TestParent"}portal_product:rhel 
> > -- should return TestParent2 
> > 
> > Sample data and schema information below: 
> > { 
> > "documentKind": "TestParent", 
> > "uri": "https://ping/pong/testparent1";, 
> > "view_uri": "https://ping/pong/testparent1";, 
> > "id": "TestParent1", 
> > "allTitle": "gold", 
> > "allText": "gold", 
> > "contents": "gold", 
> > "_childDocuments_": [ 
> > { 
> > "documentKind": "TestChild", 
> > "uri": "testchild1", 
> > "id": "testchild1", 
> > "portal_product_version": "6", 
> > "portal_product": "rhel" 
> > } 
> > ] 
> > } 
> > 
> > { 
> > "documentKind": "TestParent", 
> > "uri": "https://ping/pong/testparent2";, 
> > "view_uri": "https://ping/pong/testparent2";, 
> > "id": "TestParent2", 
> > "allTitle": "blue", 
> > "allText": "blue", 
> > "contents": "blue", 
> > "_childDocuments_": [ 
> > { 
> > "documentKind": "TestChild", 
> > "uri": "testchild2", 
> > "id": "testchild2", 
> > "portal_product_version": "6", 
> > "portal_product": "rhel" 
> > } 
> > ] 
> > } 
> > 
> > { 
> > "documentKind": "TestParent", 
> > "uri": "https://ping/pong/testparent3";, 
> > "view_uri": "htt

Re: SolrJ Indexing

2016-03-24 Thread Shawn Heisey
On 3/24/2016 4:06 AM, fabigol wrote:
> I know how to do that with DIH, but with SolrJ I don't know. Must I use
> the annotations such as @Field?
>
> Moreover, I created a new Solr project with the same XML files - copied the
> conf directory - and oddly the indexing is much faster, not just a little
> but about 100 times as fast.

I can't figure out exactly what you're saying here.  I'll respond as
best I can.

I have no idea how to use the annotations with SolrJ.  I just construct
SolrInputDocument objects, put them in a List, and send them to Solr
using the add() method.
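
A minimal sketch of that pattern (server URL and field names are placeholders):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexSketch {
    public static void main(String[] args) throws Exception {
      SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");
      List<SolrInputDocument> docs = new ArrayList<>();
      for (int i = 0; i < 1000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        doc.addField("title", "title " + i);
        docs.add(doc);
      }
      client.add(docs);  // one request for the whole batch
      client.commit();
      client.close();
    }
  }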

There could be any number of reasons that this other indexing you're
talking about is faster, and you haven't given enough information to
know what might be happening.  Does it still go 100 times as fast if you
let the index grow to the same size as the other one?

Thanks,
Shawn



Re: Index not fitting in memory (file-cache)

2016-03-24 Thread Shawn Heisey
On 3/24/2016 4:02 AM, Robert Brown wrote:
> If my index data directory size is 70G, and I don't have 70G (plus
> heap, etc) in the system, this will occasionally affect search speed
> right?  When Solr has to resort to reading from disk?
>
> Before I go out and throw more RAM into the system, in the above
> example, what would you recommend?

Having enough memory available to cache all your index data offers the
best possible performance.

You may be able to achieve acceptable performance when you don't have
that much memory, but I would try to make sure there's at least enough
memory available to cache *half* the index data.  Depending on the
nature of your queries and your index, this might not be enough, but
chances are good that it would work well.

I have a dev server where there's only enough memory available to cache
about a tenth of the index -- it's got full copies of all three of my
large indexes on ONE machine, while production runs two copies of these
same indexes on ten machines.  Performance of any single query is not
very good on the dev server, but if I absolutely had to use that server
for production with one of my indexes, it would be slow, but I could
do it.  I don't think it would have enough performance to handle running
all three indexes for production, though.

Thanks,
Shawn



Re: SolrCloud: published host/port

2016-03-24 Thread Shawn Heisey
On 3/24/2016 12:26 AM, Hendrik Haddorp wrote:
> is it possible to instruct Solr to publish a different host/port into
> ZooKeeper than it is actually running on? This is required if the Solr
> node is not directly reachable on its port from outside due to a NAT
> setup or when running Solr as a Docker container with a mapped port.
>
> For what it's worth, ElasticSearch supports this, as documented here [1]:
> - transport.publish_port
> - transport.publish_host

Although it is possible to publish different information than Jetty is
actually using, if you do so, SolrCloud may not work correctly.

SolrCloud uses the information published to zookeeper when it routes
requests from one node to another, so if inter-node communication must
use the real port, publishing only the port that is valid beyond the
firewall will break those internal requests.

Thanks,
Shawn



Re: Index not fitting in memory (file-cache)

2016-03-24 Thread Robert Brown

Thanks Shawn,

One of my indexes is 70G on disk but only has 25G RAM, usually it's fast 
as hell, less than 0.5s for a full API wrapped call, but we do 
occasionally see searches taking 2.5 seconds.


I'm currently shuffling VMs around to increase the RAM, good to hear 
this may solve those random slowdowns - or at least rule it out.




On 03/24/2016 01:44 PM, Shawn Heisey wrote:

On 3/24/2016 4:02 AM, Robert Brown wrote:

If my index data directory size is 70G, and I don't have 70G (plus
heap, etc) in the system, this will occasionally affect search speed
right?  When Solr has to resort to reading from disk?

Before I go out and throw more RAM into the system, in the above
example, what would you recommend?

Having enough memory available to cache all your index data offers the
best possible performance.

You may be able to achieve acceptable performance when you don't have
that much memory, but I would try to make sure there's at least enough
memory available to cache *half* the index data.  Depending on the
nature of your queries and your index, this might not be enough, but
chances are good that it would work well.

I have a dev server where there's only enough memory available to cache
about a tenth of the index -- it's got full copies of all three of my
large indexes on ONE machine, while production runs two copies of these
same indexes on ten machines.  Performance of any single query is not
very good on the dev server, but if I absolutely had to use that server
for production with one of my indexes, it would be slow, but I could
do it.  I don't think it would have enough performance to handle running
all three indexes for production, though.

Thanks,
Shawn





Performance potential for updating (reindexing) documents

2016-03-24 Thread tedsolr
With a properly tuned solr cloud infrastructure and less than 1B total docs
spread out over 50 collections where the largest collection is 100M docs,
what is a reasonable target goal for entirely reindexing a single
collection?

I understand there are a lot of variables, so I'm hypothetically wiping them
away by assuming "a properly tuned infrastructure". So the hardware, RAM,
etc. is configured correctly (not so in my case).

The scenario is to add 3 fields to all the existing docs in one collection.
The fields are the same but the values vary based on the docs. So a search
is performed and finds 100 matches - all 100 docs will get the same updates.
Then another search is performed that matches 15000 docs, and these are
updated. This continues 10-20,000 times until essentially all the docs have
been updated.

The docs all have 100 - 200 fields, mostly text and mostly small in size.
What's the best possible throughput I can expect? 1000 docs/sec? 5000
docs/sec?

Using SolrJ for querying and indexing against a v5.2.1 cloud.





RE: Indexing multiple pdf's and partial update of pdf

2016-03-24 Thread Jay Parashar


Thanks Reth,



Yes, I am using Apache Tika and went by the instructions given in

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika



Here I see we can index a pdf "solr-word.pdf" to a document with unique key =
"doc1" as below:



curl 'http://localhost:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true' \
  -F "myfile=@example/exampledocs/solr-word.pdf"



My requirement is to index another separate pdf to this document with key = 
doc1. Basically I need the contents of both pdfs to be searchable and related 
to the id=doc1.



What comes to my mind is to perform an 'extractOnly' as below on both pdf's and 
then index the concatenation of the contents. Is there another less invasive 
way?



curl "http://localhost:8983/solr/techproducts/update/extract?&extractOnly=true"; 
--data-binary @example/exampledocs/sample.html -H 'Content-type:text/html'



Thanks

Jay



-Original Message-
From: Reth RM [mailto:reth.ik...@gmail.com]
Sent: Thursday, March 24, 2016 12:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing multiple pdf's and partial update of pdf



Are you using apache tika parser to parse pdf files?



1) Solr supports parent-child block join, with which you can index more than one
file's data within a document object (if that is what you are looking for):
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers



2) If the unique key of the document that exists in the index is equal to that
of the new document that you are reindexing, it will be overwritten. If you'd
like to do partial updates via curl, here are some examples:

http://yonik.com/solr/atomic-updates/
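
For instance, a rough sketch of such a partial update via curl (the field name
"content" is hypothetical, and atomic updates require the fields to be stored):

  curl 'http://localhost:8983/solr/techproducts/update?commit=true' \
    -H 'Content-type:application/json' \
    -d '[{"id":"doc1","content":{"add":"text extracted from the second pdf"}}]'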











On Thu, Mar 24, 2016 at 3:43 AM, Jay Parashar <bparas...@slb.com> wrote:

> Hi,
>
> I have a couple of questions regarding indexing files (say pdf).
>
> 1) Is there any way to index more than one file to one document with
> a unique id?
>
> One way I think is to do an "extractOnly" of all the documents and then
> index that extract separately. Is there an easier way?
>
> 2) If my Solr document has existing fields populated and then I index
> a pdf, it seems it overwrites the document with the end result being
> just the contents of the pdf. I know we can do partial updates using
> SolrJ but is it possible to do partial updates of pdf using curl?
>
> Thanks
> Jay

Re: Solr 5.5.0: JVM args warning in console logfile.

2016-03-24 Thread Bram Van Dam
> When I made the change outlined in the patch on SOLR-8145 to my bin/solr
> script, the warning disappeared.  That was not the intended effect of
> the patch, but I'm glad to have the mystery solved.
> 
> Thank you for mentioning the problem so we could track it down.

You're welcome. And thanks for fixing it ;-). We're rather particular
about what appears in our logs.

 - Bram



Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Bram Van Dam
On 23/03/16 15:50, Yonik Seeley wrote:
> Kind of a unique situation for a dot-oh release, but from the Solr
> perspective, 6.0 should have *fewer* bugs than 5.5 (for those features
> in 5.5 at least)... we've been squashing a bunch of docValue related
> issues.

I've been led to understand that 6.X (at least the Lucene part?) won't
be backwards compatible with 4.X data. 5.5 at least works fine with data
files from 4.7, for instance. With that in mind, at least from my
selfish perspective, applying fixes to 5.X would be much appreciated ;-)

 - Bram




Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Jack Krupansky
Does anybody know if we have doc on the recommended process for upgrading
data after upgrading Solr? Sure the upgraded version will work fine with
that old data, but unless the data is upgraded, the user can't then upgrade
to the next major release after that. This is a case in point - the user is
on 4.x and upgrades to 5.x with that 4.x data, but will want to upgrade to
6.x shortly, but that will require the 4.x data to be rewritten
(force-merged?) to 5.x first.

-- Jack Krupansky

On Thu, Mar 24, 2016 at 11:38 AM, Bram Van Dam  wrote:

> On 23/03/16 15:50, Yonik Seeley wrote:
> > Kind of a unique situation for a dot-oh release, but from the Solr
> > perspective, 6.0 should have *fewer* bugs than 5.5 (for those features
> > in 5.5 at least)... we've been squashing a bunch of docValue related
> > issues.
>
> I've been led to understand that 6.X (at least the Lucene part?) won't
> be backwards compatible with 4.X data. 5.5 at least works fine with data
> files from 4.7, for instance. With that in mind, at least from my
> selfish perspective, applying fixes to 5.X would be much appreciated ;-)
>
>  - Bram
>
>
>


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 11:45 AM, Yonik Seeley  wrote:
>> I've been led to understand that 6.X (at least the Lucene part?) won't
>> be backwards compatible with 4.X data. 5.5 at least works fine with data
>> files from 4.7, for instance.

It really doesn't seem like much changed at the lucene index-format
level from 5 to 6...
it makes one wonder how much work would be involved in allowing Lucene
6 to directly read a newer 4.x index... maybe it's just down to
version strings in the index and not much else?

-Yonik


Is dataimporter.functions.escapeSql() functional ?

2016-03-24 Thread Joachim DORNBUSCH
Hi,

I wonder if the function ${dataimporter.functions.escapeSql()} is available in 
Solr 5.3.1.

Whenever I use it in my data import handlers, Solr replaces
'${dataimporter.functions.escapeSql(field)}' with '' (an empty string).

How can I escape strings when building SQL queries in the DIH config file?

I have 2 entities: ref_entity0 and ref_entity1.

If ref_entity0's name contains a single quote ('), the following exception
occurs:
ref_entity1:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: select ...

Below is the shape of the code.
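
A minimal sketch with made-up table and column names:

  <document>
    <entity name="ref_entity0" query="select id, name from ref_table0">
      <!-- escapeSql() applied to the parent entity's value -->
      <entity name="ref_entity1"
              query="select * from ref_table1 where parent_name =
                     '${dataimporter.functions.escapeSql(ref_entity0.name)}'"/>
    </entity>
  </document>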

Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Mikhail Khludnev
I suggest adding debugQuery=true and the fl=*,[child ...] doc transformer, and
coming back with the response.

On Thu, Mar 24, 2016 at 3:23 PM, Charles Sanders 
wrote:

> Ah yes. Thank you. Made the correction and I do not get the SyntaxError.
> However, it does not apply the child filter. The query should return only
> TestParent4. But it is returning TestParent2, TestParent3 and TestParent4.
> All of these meet the parent portion of the query (+blue). But only
> TestParent4 should meet the child portion of the query.
>
> q=+blue +{!parent
> which="documentKind:TestParent"v=$childq}&childq=portal_product:("red hat")
>
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":15,
> "params":{
>   "indent":"true",
>   "q":" blue  {!parent which=\"documentKind:TestParent\"v=$childq}",
>   "childq":"portal_product:(\"red hat\")",
>   "wt":"json"}},
>   "response":{"numFound":2733,"start":0,"maxScore":3.0138793,"docs":[
>   {
> "documentKind":"TestParent",
> "uri":"https://ping/pong/testparent4";,
> "view_uri":"https://ping/pong/testparent4";,
> "id":"TestParent4",
> "allTitle":"blue",
> "sortTitle":"blue",
> "_version_":1529622873461751808,
> "_root_":["https://ping/pong/testparent4";],
> "timestamp":"2016-03-23T19:40:48.211Z",
> "language":"en"},
>   {
> "documentKind":"TestParent",
> "uri":"https://ping/pong/testparent3";,
> "view_uri":"https://ping/pong/testparent3";,
> "id":"TestParent3",
> "allTitle":"blue",
> "sortTitle":"blue",
> "_version_":1529622308758487040,
> "_root_":["https://ping/pong/testparent3";],
> "timestamp":"2016-03-23T19:31:49.668Z",
> "language":"en"},
>   {
> "documentKind":"TestParent",
> "uri":"https://ping/pong/testparent2";,
> "view_uri":"https://ping/pong/testparent2";,
> "id":"TestParent2",
> "allTitle":"blue",
> "sortTitle":"blue",
> "_version_":1529622293809987584,
> "_root_":["https://ping/pong/testparent2";],
> "timestamp":"2016-03-23T19:31:35.408Z",
> "language":"en"}
>
> - Original Message -
>
> From: "Mikhail Khludnev" 
> To: "solr-user" 
> Sent: Wednesday, March 23, 2016 5:34:31 PM
> Subject: Re: SyntaxError - Block Join Parent Query
>
> On Thu, Mar 24, 2016 at 12:16 AM, Charles Sanders 
> wrote:
>
> > Thanks for the quick reply. But I'm not sure I understand. Did I do
> > something wrong??
> >
> Absolutely:
> 'portal_product"red hat")'
> You omitted the colon and the opening parenthesis after the field name, didn't you?
>
>
> >
> >
> >
> /select?q=+blue%20+{!parent%20which=%22documentKind:TestParent%22%20v=$childq}&childq=portal_product%22red%20hat%22)
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">400</int>
> >     <int name="QTime">2</int>
> >     <lst name="params">
> >       <str name="q">blue {!parent which="documentKind:TestParent" v=$childq}</str>
> >       <str name="childq">portal_product"red hat")</str>
> >     </lst>
> >   </lst>
> >   <lst name="error">
> >     <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'portal_product"red
> > hat")': Encountered " ")" ") "" at line 1, column 23. Was expecting one of:
> >  ...  ...  ... "+" ... "-" ...  ... "(" ...
> > "*" ... "^" ...  ...  ...  ...  ...
> >  ...  ... "[" ... "{" ...  ...  ...
> >     </str>
> >     <int name="code">400</int>
> >   </lst>
> > </response>
> >
> > - Original Message -
> >
> > From: "Mikhail Khludnev" 
> > To: "solr-user" 
> > Sent: Wednesday, March 23, 2016 5:02:29 PM
> > Subject: Re: SyntaxError - Block Join Parent Query
> >
> > On Wed, Mar 23, 2016 at 11:09 PM, Charles Sanders 
> > wrote:
> >
> > > I'm getting a SyntaxError which I do not understand when I execute a
> > block
> > > join parent query. I'm running Solr5.2.1, with 2 shards. The problem
> > > appears to be in that portion of the query that filters the child
> > document.
> > > Any insight as to where I made the error is greatly appreciated.
> > >
> > > This query produces an error:
> > > q=+blue +{!parent which="documentKind:TestParent"}portal_product:("red
> > > hat")
> > > -- should return TestParent4
> > >
> > q=+blue +{!parent which="documentKind:TestParent"
> > v=$childq}&childq=portal_product:("red hat")
> >
> >
> > > However, this query works:
> > > q=+blue +{!parent which="documentKind:TestParent"}portal_product:rhel
> > > -- should return TestParent2
> > >
> > > Sample data and schema information below:
> > > {
> > > "documentKind": "TestParent",
> > > "uri": "https://ping/pong/testparent1";,
> > > "view_uri": "https://ping/pong/testparent1";,
> > > "id": "TestParent1",
> > > "allTitle": "gold",
> > > "allText": "gold",
> > > "contents": "gold",
> > > "_childDocuments_": [
> > > {
> > > "documentKind": "TestChild",
> > > "uri": "testchild1",
> > > "id": "testchild1",
> > > "portal_product_version": "6",
> > > "portal_product": "rhel"
> > > }
> > > ]
> > > }
> > >
> > > {
> > > "documentKind": "TestParent",
> > > "uri": "https://ping/pong/testparent2";,
> > > "view_uri": "https://ping/pong/testparent2";,
> > > "id": "TestParent2",
> > > "allTitle": "blue",
> > > "a

Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 11:38 AM, Bram Van Dam  wrote:
> On 23/03/16 15:50, Yonik Seeley wrote:
>> Kind of a unique situation for a dot-oh release, but from the Solr
>> perspective, 6.0 should have *fewer* bugs than 5.5 (for those features
>> in 5.5 at least)... we've been squashing a bunch of docValue related
>> issues.
>
> I've been led to understand that 6.X (at least the Lucene part?) won't
> be backwards compatible with 4.X data. 5.5 at least works fine with data
> files from 4.7, for instance. With that in mind, at least from my
> selfish perspective, applying fixes to 5.X would be much appreciated ;-)

I hear ya...
In the event that someone volunteers to make a 5.6 or a 5.5.1 release,
we should have a big back-porting party :-)

-Yonik


Re: Index not fitting in memory (file-cache)

2016-03-24 Thread Toke Eskildsen
Robert Brown  wrote:
> Before I go out and throw more RAM into the system, in the above
> example, what would you recommend?

That you try to determine what causes the slow response times.

Replay logged queries (thousands of queries, not just a few) and see if the 
pauses are random or tied to specific queries. Turn on GC-logs to see if the 
pauses are caused by garbage collection.

If the long response times are tied to specific queries, then try turning off 
queryResultCache and documentCache and replay a handful of the slow queries a 
few times. This ensures that the data needed are cached by the file system. If 
they continue being slow, then more RAM might not help you.
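
For reference, a minimal way to take those caches out of play in solrconfig.xml
while testing (only the sizes change relative to the stock config):

  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>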

- Toke Eskildsen


Solr 5 and JDK 8 - awful performance

2016-03-24 Thread Dragos Vizireanu
Hi,

I have a big problem with the performance of running Solr 5 with JDK 8.

Details:
- tried with both Solr 5.4.0 and Solr 5.5.0  (even with Solr 4)
- default Solr 5 configuration
- created a new core, for which I am using data import handler to get data
from MySQL


When I am trying to index data using the import handler, the import is very
fast with JDK 7 and awfully slow with JDK 8! Here are the results copied
from Solr gui:

- *JDK 7  - Duration 17 seconds*

Indexing completed. Added/Updated: 5,997 documents. Deleted 0
documents. *(Duration:
17s)*

Requests: 6 (0/s), Fetched: 235,593 (13,858/s), Skipped: 0, Processed:
5,997 (353/s)



- *JDK 8 - Duration 7 minutes*

Indexing completed. Added/Updated: 5,997 documents. Deleted 0
documents. *(Duration:
7m 06s)*
Requests: 6 (0/s), Fetched: 47,160 (111/s), Skipped: 0, Processed: 5,997


As you can see, there is a problem with the performance being awful with
JDK 8. While calling DIH to index, I got no errors in the log, but the
indexing just didn't do anything for some minutes and then started again.
This is very strange I could not find the reason why this is happening.


I also found these ideas on Solr Wiki:

https://wiki.apache.org/solr/SolrPerformanceProblems -> GC pause problems

https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

The thing is, it says "If you are using the bin/solr or bin\solr
script to start Solr, you already have GC tuning and won't need to worry
about the recommendations here." I did check the scripts in the /bin
directory, and GC tuning is indeed set there ("set GC_TUNE ...").


I would appreciate if you can help with this issue.


Best regards,

Dragos


Re: Solr 5 and JDK 8 - awful performance

2016-03-24 Thread Yonik Seeley
Wow... that's pretty strange.

> indexing just didn't do anything for some minutes and then started again.

I wonder if it's anything to do with DNS lookups or something like that?

-Yonik


On Thu, Mar 24, 2016 at 11:54 AM, Dragos Vizireanu  wrote:
> Hi,
>
> I have a big problem with the performance of running Solr 5 with JDK 8.
>
> Details:
> - tried with both Solr 5.4.0 and Solr 5.5.0  (even with Solr 4)
> - default Solr 5 configuration
> - created a new core, for which I am using data import handler to get data
> from MySQL
>
>
> When I am trying to index data using the import handler, the import is very
> fast with JDK 7 and awfully slow with JDK 8! Here are the results copied
> from Solr gui:
>
> - *JDK 7  - Duration 17 seconds*
>
> Indexing completed. Added/Updated: 5,997 documents. Deleted 0
> documents. *(Duration:
> 17s)*
>
> Requests: 6 (0/s), Fetched: 235,593 (13,858/s), Skipped: 0, Processed:
> 5,997 (353/s)
>
>
>
> - *JDK 8 - Duration 7 minutes*
>
> Indexing completed. Added/Updated: 5,997 documents. Deleted 0
> documents. *(Duration:
> 7m 06s)*
> Requests: 6 (0/s), Fetched: 47,160 (111/s), Skipped: 0, Processed: 5,997
>
>
> As you can see, there is a problem with the performance being awful with
> JDK 8. While calling DIH to index, I got no errors in the log, but the
> indexing just didn't do anything for some minutes and then started again.
> This is very strange I could not find the reason why this is happening.
>
>
> I also found these ideas on Solr Wiki:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems -> GC pause problems
>
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
>
> The thing is, it says "If you are using the bin/solr or bin\solr
> script to start Solr, you already have GC tuning and won't need to worry
> about the recommendations here." I did check the scripts in the /bin
> directory, and GC tuning is indeed set there ("set GC_TUNE ...").
>
>
> I would appreciate if you can help with this issue.
>
>
> Best regards,
>
> Dragos


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Erick Erickson
There's always the IndexUpgrader, one could run the 5x version against
a 4x index and have a 5x-compatible index that would then be readable
by 6x OOB.
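
For reference, invoking it looks roughly like this (jar versions and the index
path are placeholders; the backward-codecs jar is needed to read 4.x segments):

  java -cp lucene-core-5.5.0.jar:lucene-backward-codecs-5.5.0.jar \
    org.apache.lucene.index.IndexUpgrader -delete-prior-commits \
    /var/solr/data/mycollection/index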

A bit convoluted to be sure.

Erick

On Thu, Mar 24, 2016 at 8:49 AM, Yonik Seeley  wrote:
> On Thu, Mar 24, 2016 at 11:45 AM, Yonik Seeley  wrote:
>>> I've been led to understand that 6.X (at least the Lucene part?) won't
>>> be backwards compatible with 4.X data. 5.5 at least works fine with data
>>> files from 4.7, for instance.
>
> It really doesn't seem like much changed at the lucene index-format
> level from 5 to 6...
> it makes one wonder how much work would be involved in allowing Lucene
> 6 to directly read a newer 4.x index... maybe it's just down to
> version strings in the index and not much else?
>
> -Yonik


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 12:16 PM, Erick Erickson
 wrote:
> There's always the IndexUpgrader, one could run the 5x version against
> a 4x index and have a 5x-compatible index that would then be readable
> by 6x OOB.

This may be the last time that will work.  See the thread on the
changes to all the numeric types... it doesn't seem like the
IndexUpgrader will be able to handle that transition?

Not to mention the fact that Solr 6 is using deprecated Lucene 6
numeric types; if those are removed in Lucene 7, then what?

-Yonik


Re: SolrCloud: published host/port

2016-03-24 Thread Tomás Fernández Löbbe
I believe this can be done by setting the "host" and "hostPort" elements in
solr.xml. In the default solr.xml they are configured in a way to support
also setting them via System properties:

<str name="host">${host:}</str>
<int name="hostPort">${jetty.port:8983}</int>
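
One hedged way to decouple the published values from the real ones, assuming you
swap in your own property names (solr.publish.host and solr.publish.port are
made up here):

  <str name="host">${solr.publish.host:}</str>
  <int name="hostPort">${solr.publish.port:8983}</int>

  bin/solr start -c -z zk1:2181 -p 8983 \
    -Dsolr.publish.host=docker-host.example.com -Dsolr.publish.port=18983

Whether inter-node requests still route correctly in that setup is a separate
question (see Shawn's caveat elsewhere on this thread).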

Tomás

On Wed, Mar 23, 2016 at 11:26 PM, Hendrik Haddorp 
wrote:

> Hi,
>
> is it possible to instruct Solr to publish a different host/port into
> ZooKeeper than it is actually running on? This is required if the Solr
> node is not directly reachable on its port from outside due to a NAT
> setup or when running Solr as a Docker container with a mapped port.
>
> For what it's worth, ElasticSearch supports this, as documented here [1]:
> - transport.publish_port
> - transport.publish_host
>
> regards,
> Hendrik
>
> [1]
>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-transport.html
>


Re: Performance potential for updating (reindexing) documents

2016-03-24 Thread Erick Erickson
Impossible to say if for no other reason than you haven't told us
how many physical machines this is spread over ;).

For the process you've outlined to work, all the fields are stored,
right? So why not use Atomic Updates? You still have to query
the docs.

About querying. If I'm reading this right, you'll form some query
like q=whatever_identifies_docs_that_should_get_values_X_Y_Z
then process each one of those. So, really, all you need here is
the ID of every doc that satisfies that clause. You should
consider the /export handler (Streaming Aggregation). It's
designed to return large result sets with minimal memory.

So the process I'm thinking of is this (and it assumes all your
fields are stored so Atomic updates work).

Use the CloudSolrStream for each query. As the stream
comes back, you get the IDs you need and use them
to do an atomic update that adds the relevant fields.

Note that when _adding_ fields, you can change the schema
to include the new fields on an existing collection. All that
means is that any new docs added can have these fields.

Now, if all the fields are _not_ stored at least once, you can't
use atomic updates and you'll have to re-index from the system
of record.
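
A sketch of the atomic-update side of that (zk hosts, collection, field names,
and values are all placeholders, and it assumes every field is stored):

  import java.util.Collections;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class AtomicUpdateSketch {
    public static void main(String[] args) throws Exception {
      CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
      client.setDefaultCollection("mycollection");

      // for each id streamed back by /export, send an atomic update that
      // sets the new fields without re-sending the rest of the document
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "some-matched-id");
      doc.addField("new_field_1", Collections.singletonMap("set", "X"));
      doc.addField("new_field_2", Collections.singletonMap("set", "Y"));
      doc.addField("new_field_3", Collections.singletonMap("set", "Z"));
      client.add(doc);  // in practice, batch many of these per add() call

      client.commit();
      client.close();
    }
  }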

Best,
Erick

On Thu, Mar 24, 2016 at 7:18 AM, tedsolr  wrote:
> With a properly tuned solr cloud infrastructure and less than 1B total docs
> spread out over 50 collections where the largest collection is 100M docs,
> what is a reasonable target goal for entirely reindexing a single
> collection?
>
> I understand there are a lot of variables, so I'm hypothetically wiping them
> away by assuming "a properly tuned infrastructure". So the hardware, RAM,
> etc. is configured correctly (not so in my case).
>
> The scenario is to add 3 fields to all the existing docs in one collection.
> The fields are the same but the values vary based on the docs. So a search
> is performed and finds 100 matches - all 100 docs will get the same updates.
> Then another search is performed that matches 15000 docs, and these are
> updated. This continues 10-20,000 times until essentially all the docs have
> been updated.
>
> The docs all have 100 - 200 fields, mostly text and mostly small in size.
> What's the best possible throughput I can expect? 1000 docs/sec? 5000
> docs/sec?
>
> Using SolrJ for querying and indexing against a v5.2.1 cloud.
>


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Tomás Fernández Löbbe
> Not to mention the fact that Solr 6 is using deprecated Lucene 6
> numeric types; if those are removed in Lucene 7, then what?

I believe this is going to be an issue. We have SOLR-8396
<https://issues.apache.org/jira/browse/SOLR-8396> open, but it doesn't look
like it's going to make it to 6.0 (I tried to look at it but I didn't have
time the past weeks). We'll have to support it until Solr 8 I guess.

Tomás


Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yago Riveiro
I did the IndexUpgrader path to upgrade my 4.x index to 5.x (15 terabytes of
data and growing). It wasn't an easy task to do it without downtime;
IndexUpgrader doesn't work while the replica is loaded.

With 12T of data, re-indexing is a no-no operation (the time spent on the
re-index could run to several months).

Optimizing one replica at a time doesn't work (all replicas are optimized at
the same time), killing CPU and IO, and as a result the cluster.

Conclusion: if I need to do this again to upgrade to a newer version of Solr,
I'm literally in trouble ...

--
Yago Riveiro

> On Mar 24 2016, at 4:32 pm, Tomás Fernández Löbbe wrote:
>
>> Not to mention the fact that Solr 6 is using deprecated Lucene 6
>> numeric types; if those are removed in Lucene 7, then what?
>
> I believe this is going to be an issue. We have SOLR-8396
> <https://issues.apache.org/jira/browse/SOLR-8396> open, but it doesn't look
> like it's going to make it to 6.0 (I tried to look at it but I didn't have
> time the past weeks). We'll have to support it until Solr 8 I guess.
>
> Tomás



RE: Reload or Reload and Solr Restart

2016-03-24 Thread Matt Kuiper
Based on what I have read, it looks like only a collection reload is needed for 
the scenario below and for that matter for applying any modifications to the 
solrconfig.xml.
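
For reference, the reload in question is just the Collections API RELOAD call
(host and collection name are placeholders):

  curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'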

Matt

From: Matt Kuiper
Sent: Wednesday, March 23, 2016 10:26 AM
To: solr-user@lucene.apache.org
Subject: Reload or Reload and Solr Restart

Hi,

I am preparing for a Production install.  In this release we will be moving 
from an AutoSuggest feature based on the Suggestor component to one based on an 
Ngram approach.  We will perform a re-index of the source data.

After updating the Solr config for each collection a collection reload (via the 
Solr collection api) will be executed. My question is whether this reload will 
clear the memory used by the Suggestor component or if a Solr restart on each 
Solr node will be necessary to clear the in-memory structure that was 
previously used by the Suggestor component.

Thanks,
Matt



Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Yonik Seeley
On Thu, Mar 24, 2016 at 12:32 PM, Tomás Fernández Löbbe
 wrote:
>>
>>
>> Not to mention the fact that Solr 6 is using deprecated Lucene 6
>> numeric types; if those are removed in Lucene 7, then what?
>>
> I believe this is going to be an issue. We have SOLR-8396
> <https://issues.apache.org/jira/browse/SOLR-8396> open, but it doesn't look
> like it's going to make it to 6.0 (I tried to look at it but I didn't have
> time the past weeks). We'll have to support it until Solr 8 I guess.

Even if it did make it for 6.0, it seems like someone couldn't upgrade
from 6->7 (future) without reindexing?
I don't think the IndexUpgrader tool is going to migrate from
(Trie)NumericField->PointField, right?

-Yonik


RE: Solr 5.5 Issue with CJK and mm being ignored when searching with white space.

2016-03-24 Thread Tiffany Goguen
Hi Shawn,

Thank you for the reply.  

I removed defaultOperator parameter from the schema.  I have the following in 
the request handler:

   <str name="defType">edismax</str>
   <str name="mm">100</str>

I reindexed content.

I am still seeing the same incorrect behavior.

mm=100 does not seem to be sticking: with
クイック リファレンス (space between ク and リ) I am still incorrectly getting 1 result.

Tiffany


-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, March 23, 2016 6:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.5 Issue with CJK and mm being ignored when searching with 
white space.

On 3/23/2016 8:21 AM, Tiffany Goguen wrote:
> Is this a new bug?
>
> I am using edismax and I have set mm=100% via <solrQueryParser defaultOperator="AND"/>

This config was deprecated in 4.x.  I actually thought support had been removed 
from 5.x already, but perhaps not.  The equivalent in modern configs is the 
q.op parameter ... but when using edismax, you should not use either q.op OR 
the defaultOperator parameter.  You should explicitly set mm, to 100% in this 
case.

The following bug applies to 5.5 and probably explains at least some of what 
you are seeing:

https://issues.apache.org/jira/browse/SOLR-8812

Thanks,
Shawn



Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Charles Sanders
I tried this on another machine with a clean index. I was able to get the query 
to work. Thank you. 

Couple of related questions. 
1) I was able to get this to work on a single shard machine. But I am not able 
to get this query to work on Solr with two shards (SolrCloud). Any reason why
this does not work with SolrCloud? 
2) The query pattern you supplied does not appear in the documentation. Do you 
know of any reason why the information in the documentation does not work and 
does not mention your pattern? 
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
 

Thanks again. 


- Original Message -

From: "Mikhail Khludnev"  
To: "solr-user"  
Sent: Thursday, March 24, 2016 11:31:11 AM 
Subject: Re: SyntaxError - Block Join Parent Query 

I suggest to add debugQuery=true and fl=*,[child ...] doc transformer. And 
come back with response. 

On Thu, Mar 24, 2016 at 3:23 PM, Charles Sanders  
wrote: 

> Ah yes. Thank you. Made the correction and I do not get the SyntaxError. 
> However, it does not apply the child filter. The query should return only 
> TestParent4. But it is returning TestParent2, TestParent3 and TestParent4. 
> All of these meet the parent portion of the query (+blue). But only 
> TestParent4 should meet the child portion of the query. 
> 
> q=+blue +{!parent 
> which="documentKind:TestParent"v=$childq}&childq=portal_product:("red hat") 
> 
> 
> { 
> "responseHeader":{ 
> "status":0, 
> "QTime":15, 
> "params":{ 
> "indent":"true", 
> "q":" blue {!parent which=\"documentKind:TestParent\"v=$childq}", 
> "childq":"portal_product:(\"red hat\")", 
> "wt":"json"}}, 
> "response":{"numFound":2733,"start":0,"maxScore":3.0138793,"docs":[ 
> { 
> "documentKind":"TestParent", 
> "uri":"https://ping/pong/testparent4";, 
> "view_uri":"https://ping/pong/testparent4";, 
> "id":"TestParent4", 
> "allTitle":"blue", 
> "sortTitle":"blue", 
> "_version_":1529622873461751808, 
> "_root_":["https://ping/pong/testparent4";], 
> "timestamp":"2016-03-23T19:40:48.211Z", 
> "language":"en"}, 
> { 
> "documentKind":"TestParent", 
> "uri":"https://ping/pong/testparent3";, 
> "view_uri":"https://ping/pong/testparent3";, 
> "id":"TestParent3", 
> "allTitle":"blue", 
> "sortTitle":"blue", 
> "_version_":1529622308758487040, 
> "_root_":["https://ping/pong/testparent3";], 
> "timestamp":"2016-03-23T19:31:49.668Z", 
> "language":"en"}, 
> { 
> "documentKind":"TestParent", 
> "uri":"https://ping/pong/testparent2";, 
> "view_uri":"https://ping/pong/testparent2";, 
> "id":"TestParent2", 
> "allTitle":"blue", 
> "sortTitle":"blue", 
> "_version_":1529622293809987584, 
> "_root_":["https://ping/pong/testparent2";], 
> "timestamp":"2016-03-23T19:31:35.408Z", 
> "language":"en"} 
> 
> - Original Message - 
> 
> From: "Mikhail Khludnev"  
> To: "solr-user"  
> Sent: Wednesday, March 23, 2016 5:34:31 PM 
> Subject: Re: SyntaxError - Block Join Parent Query 
> 
> On Thu, Mar 24, 2016 at 12:16 AM, Charles Sanders  
> wrote: 
> 
> > Thanks for the quick reply. But I'm not sure I understand. Did I do 
> > something wrong?? 
> > 
> Absolutely:
> 'portal_product"red hat")'
> You omitted the colon and the opening parenthesis after the field name, didn't you?
> 
> 
> > 
> > 
> > 
> /select?q=+blue%20+{!parent%20which=%22documentKind:TestParent%22%20v=$childq}&childq=portal_product%22red%20hat%22)
>  
> > 
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">400</int>
> >     <int name="QTime">2</int>
> >     <lst name="params">
> >       <str name="q">blue {!parent which="documentKind:TestParent" v=$childq}</str>
> >       <str name="childq">portal_product"red hat")</str>
> >     </lst>
> >   </lst>
> >   <lst name="error">
> >     <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'portal_product"red
> > hat")': Encountered " ")" ") "" at line 1, column 23. Was expecting one of:
> >  ...  ...  ... "+" ... "-" ...  ... "(" ...
> > "*" ... "^" ...  ...  ...  ...  ...
> >  ...  ... "[" ... "{" ...  ...  ...
> >     </str>
> >     <int name="code">400</int>
> >   </lst>
> > </response>
> > 
> > - Original Message - 
> > 
> > From: "Mikhail Khludnev"  
> > To: "solr-user"  
> > Sent: Wednesday, March 23, 2016 5:02:29 PM 
> > Subject: Re: SyntaxError - Block Join Parent Query 
> > 
> > On Wed, Mar 23, 2016 at 11:09 PM, Charles Sanders  
> > wrote: 
> > 
> > > I'm getting a SyntaxError which I do not understand when I execute a 
> > block 
> > > join parent query. I'm running Solr5.2.1, with 2 shards. The problem 
> > > appears to be in that portion of the query that filters the child 
> > document. 
> > > Any insight as to where I made the error is greatly appreciated. 
> > > 
> > > This query produces an error: 
> > > q=+blue +{!parent which="documentKind:TestParent"}portal_product:("red 
> > > hat") 
> > > -- should return TestParent4 
> > > 
> > q=+blue +{!parent which="documentKind:TestParent" 
> > v=$childq}&childq=portal_product:("red hat") 
> > 
> > 
> > > However, this query works: 
> > > q=+blue +{!parent which="documentKind:TestParent"}portal_product:rhel 
> > > -- should return TestParent2 
> > > 
> > > Sample data and schema information below: 
> > > { 
> > > "documen

Re: Performance potential for updating (reindexing) documents

2016-03-24 Thread tedsolr
Hi Erick,

My post was scant on details. The numbers I gave for collection sizes are
projections for the future. I am in the midst of an upgrade that will be
completed within a few weeks. My concern is that I may not be able to
produce the throughput necessary to index an entire collection quickly
enough (3 to 4 hours) for a large customer (100M docs).

Currently:
- single Solr instance on one host that is sharing memory and cpu with other
applications
- 4GB dedicated to Solr
~ 20M docs
~ 10GB index size
- using HttpSolrClient for all queries and updates

Very soon:
- two VMs dedicated to Solr (2 nodes)
- up to 16GB available memory
- running in cloud mode, and can now scale horizontally
- all collections are single sharded with 2 replicas

All fields are stored. The scenario I gave is using atomic updates. The
updates are done in large batches of 5000-1 docs. The use case I have is
different than most Solr setups perhaps. Indexing throughput is more
important than qps. We have very few concurrent users that do massive
amounts of doc updates. I am seeing lousy (production) performance currently
(not a surprise - long GC pauses), and have just begun the process of tuning
in a test environment.

After some more weeks of testing and tweaking I hope to get to 5000
updates/sec, but even that may not be enough. So my main concern is that
this business model (of updating entire collections about once a day) cannot
be supported by Solr.





[nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2016-03-24 Thread Alisa Z .
 Hi all, 

I apologize for duplicating my previous message: 
Solr 5.3:  anything similar to ChildDocTransformerFactory  that does not 
flatten the hierarchical structure?    

However, it is still an open and interesting question:  

Following the example from  https://dzone.com/articles/using-solr-49-new , 
let's say we are given multiple-level nested structure: 


<add>
  <doc>
    <field name="id">1</field>
    <field name="name">I am the parent</field>
    <field name="cat">PARENT</field>
    <doc>
      <field name="id">1.1</field>
      <field name="name">I am the 1st child</field>
      <field name="cat">CHILD</field>
    </doc>
    <doc>
      <field name="id">1.2</field>
      <field name="name">I am the 2nd child</field>
      <field name="cat">CHILD</field>
      <doc>
        <field name="id">1.2.1</field>
        <field name="name">I am a grandchildren</field>
        <field name="cat">GRANDCHILD</field>
      </doc>
    </doc>
  </doc>
</add>

Querying 
q={!parent which="cat:PARENT"}name:(I am +child)&fl=id,name,[child 
parentFilter=cat:PARENT]

will return a flattened structure, where cat:CHILD and cat:GRANDCHILD documents
end up on the same level:

<doc>
  <str name="id">1</str>
  <str name="name">I am the parent</str>
  <str name="cat">PARENT</str>
  <doc>
    <str name="id">1.1</str>
    <str name="name">I am the 1st child</str>
    <str name="cat">CHILD</str>
  </doc>
  <doc>
    <str name="id">1.2</str>
    <str name="name">I am the 2nd child</str>
    <str name="cat">CHILD</str>
  </doc>
  <doc>
    <str name="id">1.2.1</str>
    <str name="name">I am a grandchildren</str>
    <str name="cat">GRANDCHILD</str>
  </doc>
</doc>
Indeed, the Javadocs for ChildDocTransformerFactory say: "This
transformer returns all descendants of each parent document in a flat list 
nested inside the parent document". 

Yet is there any way to preserve the hierarchy in the response? I really need 
to find the way to preserve the structure in the response.  

Thank you in advance! 

-- 
Alisa Zhila
--


Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread ram
Hello,
  We have a setup where we have a 5 server cluster of which 3 are cloud
boxes and 2 are physical boxes. We have external zookeeper setup for the
same.The physical boxes have more capacity and in the past,we have seen
whenever the one of the boxes is leader in solrcloud,the performance seems
to be really good. However the leader election changes from time to time and
most of the time the cloud boxes seem to process most of the traffic
  Currently our solrcloud looks something like this
  Physical Box 1
X ->shard 1  Cloud 1
 - Cloud 2(Leader)
 - Physical Box 2
 -- Cloud 3
   
   Physical Box 1
 ->shard 1  Cloud 1
 - Cloud 2(Leader)
 - Physical Box 2
 -- Cloud 3

 Physical Box 1
 ->shard 1  Cloud 1
 - Cloud 2(Leader)
 - Physical Box 2
 -- Cloud 3


We are looking for a way to assign leadership to one of the physical box
always and if possible distribute the traffic only between Physical Box 1
and Physical Box 2. 

Is it possible to manually assign leadership? Would appreciate your inputs






Re: Next Solr Release - 5.5.1 or 6.0 ?

2016-03-24 Thread Jack Krupansky
Thanks, Erick, I had forgotten about that. I did find one short reference
to it in the doc: "Be sure to run the Lucene IndexUpgrader included with
Solr 4.10 if you might still have old 3x formatted segments in your index.
Alternatively: fully optimize your index with Solr 4.10 to make sure it
consists only of one up-to-date index segment."

See:
https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5

Note to doc guys and committers: That section needs to be replaced with
"Major Changes from Solr 5 to Solr 6".

Also, that IU reference doesn't link to any doc, even the Lucene Javadoc:
https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/index/IndexUpgrader.html

Feels like there should be some Solr doc as well. For example, can Solr be
running, or does it (each node if SolrCloud) need to be shut down first.
And note that it's needed for each collection. Presumably the collections
can be upgraded in parallel since they are distinct directories. It would
be nice to have a SolrIndexUpgrader to run in one shot and discover and
upgrade all Solr collections.

-- Jack Krupansky

On Thu, Mar 24, 2016 at 12:16 PM, Erick Erickson 
wrote:

> There's always the IndexUpgrader, one could run the 5x version against
> a 4x index and have a 5x-compatible index that would then be readable
> by 6x OOB.
>
> A bit convoluted to be sure.
>
> Erick
>
> On Thu, Mar 24, 2016 at 8:49 AM, Yonik Seeley  wrote:
> > On Thu, Mar 24, 2016 at 11:45 AM, Yonik Seeley 
> wrote:
> >>> I've been led to understand that 6.X (at least the Lucene part?) won't
> >>> be backwards compatible with 4.X data. 5.5 at least works fine with
> data
> >>> files from 4.7, for instance.
> >
> > It really doesn't seem like much changed at the lucene index-format
> > level from 5 to 6...
> > it makes one wonder how much work would be involved in allowing Lucene
> > 6 to directly read a newer 4.x index... maybe it's just down to
> > version strings in the index and not much else?
> >
> > -Yonik
>


Re: Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread Erick Erickson
First of all, for a cluster this size the additional work a leader
does is so small I suspect you'd have a hard time measuring any
performance difference. Personally I wouldn't worry about it. If you
insist, you can look at the collections API call REBALANCELEADERS (you
have to assigned the preferredLeader property). HOWEVER, that
functionality was put in for situations in which 100s of leaders could
be on a single box. At your scale it's highly unlikely to make any
difference so I'd imagine you'd get a lot greater return for effort by
concentrating your efforts somewhere else...
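
If you do go that route, the sequence is roughly this (collection, shard, and
replica names are placeholders):

  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICAPROP&collection=mycoll&shard=shard1&replica=core_node1&property=preferredLeader&property.value=true'
  curl 'http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=mycoll'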

But step back and think about "manually assign leadership". There's no
way to absolutely fix leadership, and you wouldn't want to. Without a
leader, you cannot index documents. So fixing leadership on a
particular node would mean you'd take all the HA/DR away from
SolrCloud, A Very Bad Thing.


bq:  ...if possible distribute the traffic only between Physical Box 1
and Physical Box 2...

I have no idea what this means. Solr will randomly distribute queries
across all nodes in the collection. If you don't want to use some box,
don't put Solr instances on it.

Best,
Erick

On Thu, Mar 24, 2016 at 11:53 AM, ram  wrote:
> Hello,
>   We have a setup where we have a 5 server cluster of which 3 are cloud
> boxes and 2 are physical boxes. We have external zookeeper setup for the
> same.The physical boxes have more capacity and in the past,we have seen
> whenever the one of the boxes is leader in solrcloud,the performance seems
> to be really good. However the leader election changes from time to time and
> most of the time the cloud boxes seem to process most of the traffic
>   Currently our solrcloud looks something like this
>   Physical Box 1
> X ->shard 1  Cloud 1
>  - Cloud 2(Leader)
>  - Physical Box 2
>  -- Cloud 3
>
>    Physical Box 1
>  ->shard 1  Cloud 1
>  - Cloud 2(Leader)
>  - Physical Box 2
>  -- Cloud 3
>
>  Physical Box 1
>  ->shard 1  Cloud 1
>  - Cloud 2(Leader)
>  - Physical Box 2
>  -- Cloud 3
>
>
> We are looking for a way to assign leadership to one of the physical box
> always and if possible distribute the traffic only between Physical Box 1
> and Physical Box 2.
>
> Is it possible to manually assign leadership? Would appreciate your inputs
>
>
>
>


Re: Performance potential for updating (reindexing) documents

2016-03-24 Thread Erick Erickson
Well, for comparison I routinely get 20K docs/second on my Mac Pro
indexing Wikipedia docs. I _think_ I have 4 shards when I do this, all
in the same JVM. I'd be surprised if you can't get your 5K docs/sec,
but you may indeed need more shards.

All that said, 4G  for the JVM is kind of constrained, you already
mentioned GC. There are two pitfalls here:
1> allocating too little memory and spending lots of cycles doing very
small GCs. At 4G this is likelier than:
2> having very large heaps and seeing "stop the world" GC pauses.

So I think you're on the right track looking at memory, at least
that's what I'd be looking at first.

Note: your indexing scaling (assuming you're sending complete docs not
atomic updates) will scale better if you
1> use CloudSolrClient from Java since it routes docs to the right
leader first and avoids an extra hop.
2> batch updates. Sending one doc at a time makes things very slow, see:

https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
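
A sketch of both points together (zk hosts, collection, and field names are
placeholders):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexSketch {
    public static void main(String[] args) throws Exception {
      CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
      client.setDefaultCollection("mycollection");

      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        doc.addField("title", "title " + i);
        batch.add(doc);
        if (batch.size() == 1000) {   // roughly 1000 docs per request
          client.add(batch);          // routed to the correct leaders
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        client.add(batch);
      }
      client.commit();
      client.close();
    }
  }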

Best,
Erick

On Thu, Mar 24, 2016 at 10:57 AM, tedsolr  wrote:
> Hi Erick,
>
> My post was scant on details. The numbers I gave for collection sizes are
> projections for the future. I am in the midst of an upgrade that will be
> completed within a few weeks. My concern is that I may not be able to
> produce the throughput necessary to index an entire collection quickly
> enough (3 to 4 hours) for a large customer (100M docs).
>
> Currently:
> - single Solr instance on one host that is sharing memory and cpu with other
> applications
> - 4GB dedicated to Solr
> ~ 20M docs
> ~ 10GB index size
> - using HttpSolrClient for all queries and updates
>
> Very soon:
> - two VMs dedicated to Solr (2 nodes)
> - up to 16GB available memory
> - running in cloud mode, and can now scale horizontally
> - all collections are single sharded with 2 replicas
>
> All fields are stored. The scenario I gave is using atomic updates. The
> updates are done in large batches of 5000-1 docs. The use case I have is
> different than most Solr setups perhaps. Indexing throughput is more
> important than qps. We have very few concurrent users that do massive
> amounts of doc updates. I am seeing lousy (production) performance currently
> (not a surprise - long GC pauses), and have just begun the process of tuning
> in a test environment.
>
> After some more weeks of testing and tweaking I hope to get to 5000
> updates/sec, but even that may not be enough. So my main concern is that
> this business model (of updating entire collections about once a day) cannot
> be supported by Solr.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-potential-for-updating-reindexing-documents-tp4265861p4265922.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: [nesting] Any way to return the whole hierarchical structure when doing Block Join queries?

2016-03-24 Thread Mikhail Khludnev
I think you can already kick the tires and contribute a test case to
https://issues.apache.org/jira/browse/SOLR-8208; that's already reachable
there, I believe, but I am still working on the core design.

On Thu, Mar 24, 2016 at 10:02 PM, Alisa Z.  wrote:

>  Hi all,
>
> I apologize for duplicating my previous message:
> Solr 5.3:  anything similar to ChildDocTransformerFactory  that does not
> flatten the hierarchical structure?
>
> However, it is still an open and interesting question:
>
> Following the example from  https://dzone.com/articles/using-solr-49-new
> , let's say we are given multiple-level nested structure:
>
> <doc>
>   <field name="id">1</field>
>   <field name="name">I am the parent</field>
>   <field name="cat">PARENT</field>
>   <doc>
>     <field name="id">1.1</field>
>     <field name="name">I am the 1st child</field>
>     <field name="cat">CHILD</field>
>   </doc>
>   <doc>
>     <field name="id">1.2</field>
>     <field name="name">I am the 2nd child</field>
>     <field name="cat">CHILD</field>
>     <doc>
>       <field name="id">1.2.1</field>
>       <field name="name">I am a grandchildren</field>
>       <field name="cat">GRANDCHILD</field>
>     </doc>
>   </doc>
> </doc>
>
>
> Querying
> q={!parent which="cat:PARENT"}name:(I am +child)&fl=id,name,[child
> parentFilter=cat:PARENT]
>
> will return a flattened structure, where cat:CHILD and cat:GRANDCHILD
> documents end up on the same level:
> <doc>
>   <field name="id">1</field>
>   <field name="name">I am the parent</field>
>   <field name="cat">PARENT</field>
>   <doc>
>     <field name="id">1.1</field>
>     <field name="name">I am the 1st child</field>
>     <field name="cat">CHILD</field>
>   </doc>
>   <doc>
>     <field name="id">1.2</field>
>     <field name="name">I am the 2nd child</field>
>     <field name="cat">CHILD</field>
>   </doc>
>   <doc>
>     <field name="id">1.2.1</field>
>     <field name="name">I am a grandchildren</field>
>     <field name="cat">GRANDCHILD</field>
>   </doc>
> </doc>
>  Indeed, the Javadocs for ChildDocTransformerFactory say: "This
> transformer returns all descendants of each parent document in a flat list
> nested inside the parent document".
>
> Yet is there any way to preserve the hierarchy in the response? I really
> need to find a way to preserve the structure in the response.
>
> Thank you in advance!
>
> --
> Alisa Zhila
> --
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: SyntaxError - Block Join Parent Query

2016-03-24 Thread Mikhail Khludnev
On Thu, Mar 24, 2016 at 8:31 PM, Charles Sanders 
wrote:

> I tried this on another machine with a clean index. I was able to get the
> query to work. Thank you.
>
> Couple of related questions.
> 1) I was able to get this to work on a single shard machine. But I am not
> able to get this query to work on Solr with two shards (SorlCloud). Any
> reason why this does not work with SolrCloud?
>

I can hardly imagine why. Show me your debugQuery=true output.


> 2) The query pattern you supplied does not appear in the documentation. Do
> you know of any reason why the information in the documentation does not
> work and does not mention your pattern?
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers

Nobody has written it there yet, but anybody can; it has been written up in
some other places:
http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html?q=block+join
https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries
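
For what it's worth, when {!parent} is one clause among others (as with the
+blue clause later in this thread), the child query has to be passed by
reference (v=$childq) or wrapped in the _query_ magic field; a hedged sketch
of the latter, reusing this thread's field names:

  q=+blue +_query_:"{!parent which='documentKind:TestParent'}portal_product:(\"red hat\")"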


>
>
> Thanks again.
>
>
> - Original Message -
>
> From: "Mikhail Khludnev" 
> To: "solr-user" 
> Sent: Thursday, March 24, 2016 11:31:11 AM
> Subject: Re: SyntaxError - Block Join Parent Query
>
> I suggest to add debugQuery=true and fl=*,[child ...] doc transformer. And
> come back with response.
>
> On Thu, Mar 24, 2016 at 3:23 PM, Charles Sanders 
> wrote:
>
> > Ah yes. Thank you. Made the correction and I do not get the SyntaxError.
> > However, it does not apply the child filter. The query should return only
> > TestParent4. But it is returning TestParent2, TestParent3 and
> TestParent4.
> > All of these meet the parent portion of the query (+blue). But only
> > TestParent4 should meet the child portion of the query.
> >
> > q=+blue +{!parent
> > which="documentKind:TestParent"v=$childq}&childq=portal_product:("red
> hat")
> >
> >
> > {
> > "responseHeader":{
> > "status":0,
> > "QTime":15,
> > "params":{
> > "indent":"true",
> > "q":" blue {!parent which=\"documentKind:TestParent\"v=$childq}",
> > "childq":"portal_product:(\"red hat\")",
> > "wt":"json"}},
> > "response":{"numFound":2733,"start":0,"maxScore":3.0138793,"docs":[
> > {
> > "documentKind":"TestParent",
> > "uri":"https://ping/pong/testparent4";,
> > "view_uri":"https://ping/pong/testparent4";,
> > "id":"TestParent4",
> > "allTitle":"blue",
> > "sortTitle":"blue",
> > "_version_":1529622873461751808,
> > "_root_":["https://ping/pong/testparent4";],
> > "timestamp":"2016-03-23T19:40:48.211Z",
> > "language":"en"},
> > {
> > "documentKind":"TestParent",
> > "uri":"https://ping/pong/testparent3";,
> > "view_uri":"https://ping/pong/testparent3";,
> > "id":"TestParent3",
> > "allTitle":"blue",
> > "sortTitle":"blue",
> > "_version_":1529622308758487040,
> > "_root_":["https://ping/pong/testparent3";],
> > "timestamp":"2016-03-23T19:31:49.668Z",
> > "language":"en"},
> > {
> > "documentKind":"TestParent",
> > "uri":"https://ping/pong/testparent2";,
> > "view_uri":"https://ping/pong/testparent2";,
> > "id":"TestParent2",
> > "allTitle":"blue",
> > "sortTitle":"blue",
> > "_version_":1529622293809987584,
> > "_root_":["https://ping/pong/testparent2";],
> > "timestamp":"2016-03-23T19:31:35.408Z",
> > "language":"en"}
> >
> > - Original Message -
> >
> > From: "Mikhail Khludnev" 
> > To: "solr-user" 
> > Sent: Wednesday, March 23, 2016 5:34:31 PM
> > Subject: Re: SyntaxError - Block Join Parent Query
> >
> > On Thu, Mar 24, 2016 at 12:16 AM, Charles Sanders 
> > wrote:
> >
> > > Thanks for the quick reply. But I'm not sure I understand. Did I do
> > > something wrong??
> > >
> > Absolutely:
> > 'portal_product"red hat")'
> > You omitted a colon and an opening bracket after the field name, didn't you?
> >
> >
> > >
> > >
> > >
> >
> /select?q=+blue%20+{!parent%20which=%22documentKind:TestParent%22%20v=$childq}&childq=portal_product%22red%20hat%22)
> > >
> > > <response>
> > >   <lst name="responseHeader">
> > >     <int name="status">400</int>
> > >     <int name="QTime">2</int>
> > >     <lst name="params">
> > >       <str name="q">blue {!parent which="documentKind:TestParent" v=$childq}</str>
> > >       <str name="childq">portal_product"red hat")</str>
> > >     </lst>
> > >   </lst>
> > >   <lst name="error">
> > >     <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse
> > > 'portal_product"red hat")': Encountered " ")" ") "" at line 1, column 23.
> > > Was expecting one of: <AND> ... <OR> ... <NOT> ... "+" ... "-" ...
> > > <BAREOPER> ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ...
> > > <FUZZY_SLOP> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ...
> > > "[" ... "{" ... <LPARAMS> ... <NUMBER> ...</str>
> > >     <int name="code">400</int>
> > >   </lst>
> > > </response>
> > >
> > > - Original Message -
> > >
> > > From: "Mikhail Khludnev" 
> > > To: "solr-user" 
> > > Sent: Wednesday, March 23, 2016 5:02:29 PM
> > > Subject: Re: SyntaxError - Block Join Parent Query
> > >
> > > On Wed, Mar 23, 2016 at 11:09 PM, Charles Sanders  >
> > > wrote:
> > >
> > > > I'm getting a SyntaxError which I do not understand when I execute a
> > > block
> > > > join parent query. I'm running Solr5.2.1, with 2 shards. The problem
> > > > appears to be in that portion of the query that filters the child
> > > document.
> > > > Any insight as to where I made the error is greatly appreciated.

Re: Solr 5 and JDK 8 - awful performance

2016-03-24 Thread Mikhail Khludnev
Dragos, I wonder if you have a ScriptTransformer in your config?
Just a clue: the Solr Admin UI has a Threads tab; sometimes it's possible to
diagnose a severe performance problem just by observing deep stacks there.
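
For reference, a ScriptTransformer sits in data-config.xml roughly like the
sketch below (the entity, SQL, and function names are made up for
illustration). Since JDK 8 replaced the embedded JavaScript engine (Rhino)
with Nashorn, a per-row script is one plausible place for this kind of
slowdown:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/db" user="u" password="p"/>
    <!-- script functions run once per fetched row -->
    <script><![CDATA[
      function enrichRow(row) {
        row.put('name_upper', String(row.get('name')).toUpperCase());
        return row;
      }
    ]]></script>
    <document>
      <entity name="item" query="SELECT id, name FROM item"
              transformer="script:enrichRow"/>
    </document>
  </dataConfig>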

On Thu, Mar 24, 2016 at 6:54 PM, Dragos Vizireanu 
wrote:

> Hi,
>
> I have a big problem with the performance of running Solr 5 with JDK 8.
>
> Details:
> - tried with both Solr 5.4.0 and Solr 5.5.0  (even with Solr 4)
> - default Solr 5 configuration
> - created a new core, for which I am using data import handler to get data
> from MySQL
>
>
> When I am trying to index data using the import handler, the import is very
> fast with JDK 7 and awfully slow with JDK 8! Here are the results copied
> from Solr gui:
>
> - *JDK 7  - Duration 17 seconds*
>
> Indexing completed. Added/Updated: 5,997 documents. Deleted 0
> documents. *(Duration:
> 17s)*
>
> Requests: 6 (0/s), Fetched: 235,593 (13,858/s), Skipped: 0, Processed:
> 5,997 (353/s)
>
>
>
> - *JDK 8 - Duration 7 minutes*
>
> Indexing completed. Added/Updated: 5,997 documents. Deleted 0
> documents. *(Duration:
> 7m 06s)*
> Requests: 6 (0/s), Fetched: 47,160 (111/s), Skipped: 0, Processed: 5,997
>
>
> As you can see, there is a problem with the performance being awful with
> JDK 8. While calling DIH to index, I got no errors in the log, but the
> indexing just didn't do anything for some minutes and then started again.
> This is very strange; I could not find the reason why this is happening.
>
>
> I also found these ideas on Solr Wiki:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems -> GC pause problems
>
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
>
> The thing is that it says "If you are using the bin/solr or bin\solr
> script to start Solr, you already have GC tuning and won't need to worry
> about the recommendations here." I did check the scripts in the /bin
> directory, and GC tuning ("set GC_TUNE ...") is indeed set there.
>
>
> I would appreciate if you can help with this issue.
>
>
> Best regards,
>
> Dragos
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: No live SolrServers available to handle this request

2016-03-24 Thread Elaine Cario
Anil,

I've seen situations where, if there was a problem with a specific query
and every shard responded with the same error, the actual exception got
hidden by a "No live SolrServers..." exception. We originally saw this with
wildcard queries: every shard reported a "too many expansions..." type
error, but the exception in the response was the "No live SolrServers..."
error.

You mention that you are using collapse/expand, and that you have shards -
that could possibly cause some issue, as I think collapse and expand only
work correctly if the data for any particular collapse value resides on one
shard.
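
If co-location is the issue, one hedged option (the collection and field
names below are placeholders) is the compositeId router's shard-key prefix
in the uniqueKey, which sends every document sharing the prefix to the same
shard:

  # all docs with the "groupA!" prefix land on one shard
  curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
    -H 'Content-Type: application/json' -d '
  [{"id":"groupA!doc1","group_s":"groupA"},
   {"id":"groupA!doc2","group_s":"groupA"}]'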

On Sat, Mar 19, 2016 at 1:04 PM, Shawn Heisey  wrote:

> On 3/18/2016 9:55 PM, Anil wrote:
> > Thanks for your response.
> > CDH is a Cloudera (third party) distribution. is there any to get the
> > notifications copy of it when cluster state changed ? in logs ?
> >
> > I can assume that the exception is result of no availability of replicas
> > only. Agree?
>
> Yes, I think that Solr believes there are no replicas for at least one
> shard.  As for why it believes that, I cannot say.
>
> If Solr logged every single thing that happened where zookeeper (or even
> just the clusterstate) is involved, you'd be drowning in logs.  Much
> more than already happens.  The logfile is already very verbose.
>
> Chances are that at least one of your Solr nodes *did* log something
> related to a problem with that collection before you got the error
> you're asking about.
>
> The "No live SolrServers" error is one that people are seeing quite
> frequently.  There may be some instances where Solr isn't behaving
> correctly, but I think when this happens, it usually indicates there's a
> real problem of some kind.
>
> To troubleshoot, we'll need to see any errors or warnings you find in
> your Solr logfiles from the time before you get an error on a request.
> You'll need to check the logfile on all Solr nodes.
>
> It might be a good idea to also involve Cloudera support, see what they
> think.
>
> Thanks,
> Shawn
>
>


Use default field, if more specific field does not exist

2016-03-24 Thread Georg Sorst
Hi list,

we use Solr to search ecommerce products.

Items have a default price which can be overwritten per user. So when
searching we have to return the user price if it is set, otherwise the
default price. Same goes for building facets and when filtering by price.

What's the best way to achieve this in Solr? We know the user ID when
sending the request to Solr so we could do something like this:

* Add the default price in the field "price" to the items
* Add all the user prices in a field like "price_<userid>"

Now for displaying the correct price this is fine: just check whether there
is a "price_<userid>" field on the result item, and otherwise display the
value of the "price" field.

The tricky part is faceting and filtering. Which field do we use?
"price_<userid>"? What happens for users that don't have a user price set
for an item? In this case the "price_<userid>" field does not exist, so
faceting and filtering will not work.

We thought about adding a "price_<userid>" field for every item and every
user, filling in the item's default price if the user does not have an
overridden price for that item. This would potentially make our index
unnecessarily large. Consider 10,000 items and 10,000 users (quite
realistic): that's 100,000,000 "price_<userid>" fields, even though maybe
only a few users have overridden prices.

What I've been (unsuccessfully) looking for is some sort of field fallback
where I can tell Solr something like "use price_<userid> for the results,
facets and filter queries, and if that does not exist for an item, use
price instead". At first sight field aliases seemed like that, but it turns
out they just rename the field in the result items.
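
Not from this thread, but one hedged approximation of such a fallback for
sorting and filtering is the def() function query, which returns its second
argument when the first is missing ("price_1234" below is a hypothetical
per-user field name):

  # sort by the user price, falling back to the default price
  &sort=def(price_1234,price) asc

  # filter on the effective price with a function range query
  &fq={!frange l=10 u=50}def(price_1234,price)

As far as I know, range faceting cannot run over a function directly, so the
facet side would still need another approach (e.g. one frange facet.query
per bucket).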

So, is there something like this or is there a better solution anyway?

Thanks,
Georg
-- 
*Georg M. Sorst I CTO*
FINDOLOGIC GmbH



Jakob-Haringer-Str. 5a | 5020 Salzburg I T.: +43 662 456708
E.: g.so...@findologic.com
www.findologic.com


Re: Indexing multiple pdf's and partial update of pdf

2016-03-24 Thread Alexandre Rafalovitch
An approach that comes to mind is to use DataImportHandler with the PDF
parsing in an inner entity definition while the indexed entity sits at
the parent level. The main issue is how to ensure that Tika output from one
PDF does not map to the same fields as output from the second one. Maybe
give them different prefixes. Then you might be able to do it either with
an UpdateRequestProcessor or with copyFields from those two prefixes into
a common field, ignoring the source prefixes.

Disclaimer: not tested.
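
Separately, on the partial-update half of the question: atomic updates do
work over plain curl as long as all fields are stored; a hedged sketch,
where "extracted_text_txt" is a hypothetical stored multiValued field
holding the second PDF's extractOnly output:

  curl 'http://localhost:8983/solr/techproducts/update?commit=true' \
    -H 'Content-Type: application/json' -d '
  [{"id":"doc1",
    "extracted_text_txt":{"add":"text pulled from the second pdf"}}]'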

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 25 March 2016 at 01:39, Jay Parashar  wrote:
>
>
> Thanks Reth,
>
>
>
> Yes I am using Apache Tika and went by the instructions given in
>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>
>
>
> Here I see we can index a pdf "solr-word.pdf" to a document with unique key
> = "doc1" as below:
>
>
>
> curl 
> 'http://localhost:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true'
>  -F 
> "myfile=@example/exampledocs/solr-word.pdf"
>
>
>
> My requirement is to index another separate pdf to this document with key = 
> doc1. Basically I need the contents of both pdfs to be searchable and related 
> to the id=doc1.
>
>
>
> What comes to my mind is to perform an 'extractOnly' as below on both pdf's 
> and then index the concatenation of the contents. Is there another less 
> invasive way?
>
>
>
> curl 
> "http://localhost:8983/solr/techproducts/update/extract?&extractOnly=true"; 
> --data-binary @example/exampledocs/sample.html -H 'Content-type:text/html'
>
>
>
> Thanks
>
> Jay
>
>
>
> -Original Message-
> From: Reth RM [mailto:reth.ik...@gmail.com]
> Sent: Thursday, March 24, 2016 12:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing multiple pdf's and partial update of pdf
>
>
>
> Are you using apache tika parser to parse pdf files?
>
>
>
> 1) Solr supports parent-child block join, using which you can index more
> than one file's data within a document object (if that is what you are looking for):
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
>
>
>
> 2) If the unique key of the document that exists in the index is equal to
> that of the new document you are reindexing, it will be overwritten. If
> you'd like to do partial updates via curl, here are some examples:
>
> http://yonik.com/solr/atomic-updates/
>
>
>
>
>
>
>
>
>
>
>
> On Thu, Mar 24, 2016 at 3:43 AM, Jay Parashar <bparas...@slb.com> wrote:
>
>
>
>> Hi,
>
>>
>
>> I have couple of questions regarding indexing files (say pdf).
>
>>
>
>> 1)  Is there any way to index more than one file to one document with
>
>> a unique id?
>
>>
>
>> One way I think is to do a “extractOnly” of all the documents and then
>
>> index that extract separately. Is there an easier way?
>
>>
>
>> 2)  If my Solr document has existing fields populated and then I index
>
>> a pdf, it seems it overwrites the document with the end result being
>
>> just the contents of the pdf. I know we can do partial updates using
>
>> SolrJ but is it possible to do partial updates of pdf using curl?
>
>>
>
>>
>
>> Thanks
>
>> Jay
>
>>


Re: how to update billions of docs

2016-03-24 Thread Mohsin Beg Beg

An update on how I ended up implementing the requirement, in case it helps
others. There is a lot of other code I did not include, but the general
logic is below.

While performance is still not great, it is 10x faster than atomic updates
(because RealTimeGetComponent.getInputDocument() is not needed).


1. Wrote an update handler
   /myupdater?q=*:* & sort=fieldx desc & fl=fieldx, fieldy & 
stream.file=exampledocs/oldvalueToNewValue.properties & update.chain=myprocessor


2. In the handler read the map from content stream and invoke the export 
handler for the query params
   SolrRequestHandler handler = core.getRequestHandler("/export");
   core.execute(handler, req, rsp);
   numFound = (Integer) req.getContext().get("totalHits");
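
Filling in steps 1 and 2, a hedged self-contained sketch (class, handler,
and response-key names are placeholders, not the poster's actual code):

  import java.io.InputStream;
  import java.util.Properties;
  import org.apache.solr.common.util.ContentStream;
  import org.apache.solr.handler.RequestHandlerBase;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrRequestHandler;
  import org.apache.solr.response.SolrQueryResponse;

  public class BulkRewriteHandler extends RequestHandlerBase {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
        throws Exception {
      // 1. read the oldValue=newValue pairs sent via stream.file=...
      Properties valueMap = new Properties();
      if (req.getContentStreams() != null) {
        for (ContentStream cs : req.getContentStreams()) {
          try (InputStream in = cs.getStream()) {
            valueMap.load(in);
          }
        }
      }

      // 2. run the q/sort/fl params through the /export handler
      SolrRequestHandler export = req.getCore().getRequestHandler("/export");
      req.getCore().execute(export, req, rsp);
      Integer numFound = (Integer) req.getContext().get("totalHits");
      rsp.add("rewriteCandidates", numFound == null ? 0 : numFound);

      // 3. iterate the index leaves and rewrite values per valueMap (below)
    }

    @Override
    public String getDescription() {
      return "bulk field-value rewriter (sketch)";
    }
  }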


3. Iterate over the /export handler response, similar to the
SortingResponseWriter.write() method

   List<LeafReaderContext> leaves = req.getSearcher().getTopReaderContext().leaves();
   for (int i = 0; i < leaves.size(); i++) { ... }

On ..., Ishan wrote:

> Hi Mohsin,
> There's some work in progress for in-place updates to docValued fields,
> https://issues.apache.org/jira/browse/SOLR-5944. Can you try the latest
> patch there (or ping me if you need a git branch)?
> It would be nice to know how fast the updates go for your usecase with that
> patch. Please note that for that patch, both the version field and the
> updated field needs to have stored=false, indexed=false, docValues=true.
> Regards,
> Ishan
>
>
> On Thu, Mar 17, 2016 at 10:55 PM, Jack Krupansky  >
> wrote:
>
> > It would be nice to have a wiki/doc for "Bulk Field Update" that listed
> all
> > of these techniques and tricks.
> >
> > And, of course, it would be so much better to have an explicit Lucene
> > feature for this. It could work in the background like merge and process
> > one segment at a time as efficiently as possible.
> >
> > Have several modes:
> >
> > 1. Set a field of all documents to explicit value.
> > 2. Set a field of query documents to an explicit value.
> > 3. Increment by n.
> > 4. Add new field to all document, or maybe by query.
> > 5. Delete existing field for all documents.
> > 6. Delete field value for all documents or a specified query.
> >
> >
> > -- Jack Krupansky
> >
> > On Thu, Mar 17, 2016 at 12:31 PM, Ken Krugler <
> kkrugler_li...@transpac.com
> > >
> > wrote:
> >
> > > As others noted, currently updating a field means deleting and
> inserting
> > > the entire document.
> > >
> > > Depending on how you use the field, you might be able to create another
> > > core/container with that one field (plus the key field), and use join
> > > support.
> > >
> > > Note that https://issues.apache.org/jira/browse/LUCENE-6352 is an
> > > improvement, which looks like it's in the 5.x code line, though I don't
> > see
> > > a fix version.
> > >
> > > -- Ken
> > >
> > > > From: Mohsin Beg Beg
> > > > Sent: March 16, 2016 3:52:47pm PDT
> > > > To: solr-user@lucene.apache.org
> > > > Subject: how to update billions of docs
> > > >
> > > > Hi,
> > > >
> > > > I have a requirement to replace a value of a field in 100B's of docs
> in
> > > 100's of cores.
> > > > The field is multiValued=false docValues=true type=StrField
> stored=true
> > > indexed=true.
> > > >
> > > > Atomic Updates performance is on the order of 5K docs per sec per
> core
> > > in solr 5.3 (other fields are quite big).
> > > >
> > > > Any suggestions ?
> > > >
> > > > -Mohsin
> > >
> > >
> > > --
> > > Ken Krugler
> > > +1 530-210-6378
> > > http://www.scaleunlimited.com
> > > custom big data solutions & training
> > > Hadoop, Cascading, Cassandra & Solr
> > >
> > >
> > >
> > >
> > >
> > >
> >
>