Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Charlie Hull

On 24/04/2017 15:58, Otis Gospodnetić wrote:

> Hi,
>
> I'm really really surprised here.  Back in 2013 we did a poll to see how
> people were running Master-Slave (4.x back then) and SolrCloud was a bit
> more popular than Master-Slave:
> https://sematext.com/blog/2013/02/25/poll-solr-cloud-or-not/
>
> Here is a fresh new poll with pretty much the same question - How do you
> run your Solr?  -
> and guess what?  SolrCloud is *not* at all a lot more prevalent than
> Master-Slave.
>
> We definitely see a lot more SolrCloud used by Sematext Solr
> consulting/support customers, so I'm a bit surprised by the results of this
> poll so far.


I'm not particularly surprised. We regularly see clients either with 
single nodes or elderly versions of Solr (or even Lucene). Zookeeper is 
still seen as a bit of a black art. Once you move from 'how do I run a 
search engine' to 'how do I manage a cluster of servers with scaling for 
performance/resilience/failover' you're looking at a completely new set 
of skills and challenges, which I think puts many people off.


Charlie


> Is anyone else surprised by this?  See
> https://twitter.com/sematext/status/854927627748036608
>
> Thanks,
> Otis
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/






--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Huge cfs files

2017-04-25 Thread Avi Steiner
Hi

We have a customer running Solr 5.3.1.
The index contains fewer than 3.5 million docs, and the index folder is about
240GB.
I found that the largest files are .cfs files (compound files) that were
created recently, although only a few documents were added.
The useCompoundFile parameter is commented out in solrconfig.xml.
As far as I understand, the default in Solr is false and in Lucene is true,
which means this feature should be disabled.
I would like to understand why those files were created and why they are so huge.

Regards,

Avi





termfreq usage/syntax

2017-04-25 Thread Saman Rasheed
Hi Solr team, I'm starting to have fun with Solr, and I'm on a big project that
requires me to index some books and then do certain term lookups on them.

I'm using Windows 10 and I've successfully managed to index a book containing
more than 118,000 words, which is normal I guess.

In the Solr admin UI, if I look up a term, let's say 'house', I type the
following in the 'fl' field:

termfreq(content,house)

and I get the following response:

{
  "responseHeader":{
    "status":0,
    "QTime":166,
    "params":{
      "q":"*:*",
      "indent":"on",
      "fl":"termfreq(content,house)",
      "wt":"json",
      "_":"1493115416033"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "termfreq(content,house)":200}]
  }
}

This is what I expect. However, I haven't been successful in doing an
approximate search on the word 'house', i.e. when 'house' is part of or in the
middle of a word, e.g. 'rehousing' or 'housing'.

What I'm looking for is syntax similar to 'termfreq(content,*house*)', which
doesn't work.

I've had a look at the online wiki reference, in the section below:

termfreq

Returns the number of times the term appears in the field for that document.

termfreq(text,'memory')

from: https://cwiki.apache.org/confluence/display/solr/Function+Queries

I've substituted the words 'text' and 'memory' with the ones in my example
above, but still no luck with approximate searches.

I'm not sure what I'm doing wrong here. Can you please help?


regards,

Sam.


Re: Issues with ingesting to Solr using Flume

2017-04-25 Thread Shawn Heisey
On 4/20/2017 9:02 AM, Anantharaman, Srinatha (Contractor) wrote:
> Hi all,
>
> I am trying to ingest data to Solr 6.3 using flume 1.5 on Hortonworks 2.5 
> platform Facing below issue while sinking the data
>
> 19 Apr 2017 19:54:26,943 ERROR [lifecycleSupervisor-1-3] 
> (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:253)  - 
> Unable to start SinkRunner: { 
> policy:org.apache.flume.sink.DefaultSinkProcessor@130344d7 counterGroup:{ 
> name:null counters:{} } } - Exception follows.
> org.kitesdk.morphline.api.MorphlineCompilationException: No command builder 
> registered for name: detectMimeType near: {
> # /etc/flume/conf/morphline.conf: 48
> "detectMimeType" : {
> # /etc/flume/conf/morphline.conf: 50
> "includeDefaultMimeTypes" : true
> }
> }

I know nothing at all about Flume, but reading that message, Solr is not
mentioned anywhere.  My recommendation is to ask for help on this
problem using a Flume resource.  If Solr is doing something wrong, they
should be able to help you find evidence showing that.  At that point,
you can come back to this thread with that evidence.

Are there any ERROR or WARN messages in the Solr logs?

Thanks,
Shawn



Re: Huge cfs files

2017-04-25 Thread Shawn Heisey
On 4/25/2017 3:33 AM, Avi Steiner wrote:
> We have a customer with Solr 5.3.1.
> The index contains less than 3.5 million docs, and index folder size is about 
> 240GB.

If 3.5 million documents creates a 240GB index, then this is a very
atypical index.  The documents must be HUGE, or else you are using
copyField a LOT to create different ways to search the same data.  My
largest index shards have nearly 40 million documents in them and are
only 55GB in size.  The entire distributed index is almost 400 million
docs and about 550GB in size.

> I found that the most huge files are .cfs files (compound files) that were 
> created lately although only few documents were added.
> The useCompoundFile parameter is commented in SolrConfig.xml.
> As far as I understand the default of Solr is false, and of Lucene is true, 
> which means this feature should be disabled.
> I would like to understand why those files created and why they are so huge.

The cfs files are not an indication of a problem.  Solr (Lucene really)
has decided that some threshold has been crossed for those segments, and
that it should consolidate the files instead of keeping them separate.

The reason they are so big is because your index is big.  No other
reason.  The total disk space consumed would be pretty much identical
even if the files were separate instead of combined into a .cfs file. 
Think of the .cfs file as a little bit like a .tar file, or a .zip file
without compression.  Because segment files are never changed after
creation, there's very little difference between accessing part of a
large file and accessing an individual file.

Thanks,
Shawn
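
To see how the space in an index directory is distributed, one option is to
total the file sizes by extension. The following is a minimal Python sketch,
not anything Solr ships: the function name size_by_extension is made up for
illustration, and the directory you pass in is assumed to be the core's
data/index folder.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for the files directly inside index_dir."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            # "_0.cfs" -> ".cfs"; names without a dot (e.g. "segments_5") -> "".
            ext = os.path.splitext(entry.name)[1]
            totals[ext] += entry.stat().st_size
    return dict(totals)
```

Calling size_by_extension on the index folder and sorting the result by size
shows whether the .cfs files simply account for the bulk of a large index, as
described above, or whether something else is taking the space.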



Re: termfreq usage/syntax

2017-04-25 Thread Shawn Heisey
On 4/25/2017 4:46 AM, Saman Rasheed wrote:
> what i'm looking for is syntax similair to: 'termfreq(content,*house*)'
> which doesnt work.

I doubt this function knows how to deal with wildcards.  It sounds like
it can only do exact terms.

One option you have is to use the /terms handler with the terms.regex
parameter to get a list of all terms that match the regex, with their
counts, and add up the numbers on the client side.

Here's some documentation on the Terms Component and the /terms handler
that uses it:

https://cwiki.apache.org/confluence/display/solr/The+Terms+Component

Thanks,
Shawn
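
The client-side summing could look roughly like the Python sketch below. It
rests on several assumptions: a request along the lines of
/terms?terms.fl=content&terms.regex=.*house.*&terms.limit=-1&wt=json, a
response in the default flat JSON layout where the field maps to a list that
alternates term and count, and a helper name (sum_term_counts) invented for
illustration. The sample data is hand-written, not real output. As far as I
know the counts reported by the Terms Component are document frequencies, so
they may not equal what termfreq() returns for a single document.

```python
import json


def sum_term_counts(terms_response, field):
    """Add up the counts in a parsed /terms response for one field.

    Assumes the flat layout: ["house", 3, "housing", 2, ...].
    """
    flat = terms_response["terms"][field]
    # Even positions hold the terms, odd positions hold their counts.
    return sum(flat[1::2])


# Hand-written sample in the assumed layout; the numbers are made up.
sample = json.loads(
    '{"terms": {"content": ["house", 3, "housing", 2, "rehousing", 1]}}'
)
total = sum_term_counts(sample, "content")
print(total)  # 6
```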



JSON Response for Spellcheck Collate

2017-04-25 Thread Zoran
Hi Guys,

I’m using Solr 6.5.0, which is fantastic, and I’ve come across an issue with
the collations in the spellcheck response.

The way the JSON is structured, collations is an object with each collation
named ‘collation’, where it should be an array with multiple ‘collation’ object
elements. I’m using the AJAX Solr JavaScript library, and because of the above
each successive collation object overwrites the previous one, leaving only the
last and least likely collation.

I was going to dig deeper into AJAX Solr, but I’m pretty sure the issue is
with the Solr JSON response.

Below is the snippet of the response relevant to this issue; notice that
collations contains multiple collation objects with the same name:

"collations":{
  "collation":{
    "collationQuery":"haemorrhagic broken",
    "hits":36,
    "misspellingsAndCorrections":[
      "hemorrhagic","haemorrhagic",
      "braken","broken"]},
  "collation":{
    "collationQuery":"haemorrhagic brake",
    "hits":22,
    "misspellingsAndCorrections":[
      "hemorrhagic","haemorrhagic",
      "braken","brake"]},
  "collation":{
    "collationQuery":"hemorrhage broken",
    "hits":16,
    "misspellingsAndCorrections":[
      "hemorrhagic","hemorrhage",
      "braken","broken"]},
  "collation":{
    "collationQuery":"haemorrhagic braces",
    "hits":23,
    "misspellingsAndCorrections":[
      "hemorrhagic","haemorrhagic",
      "braken","braces"]},
  "collation":{
    "collationQuery":"hemorrhage brake",
    "hits":2,
    "misspellingsAndCorrections":[
      "hemorrhagic","hemorrhage",
      "braken","brake"]}
 }
 }

I’d appreciate any pointers with this.

Z.
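
The overwriting described above is what most JSON parsers do with repeated
keys, and it can be reproduced outside the browser. The sketch below uses
Python rather than JavaScript, with a trimmed-down copy of the response; the
helper name keep_duplicates is invented for illustration. It shows the default
behaviour (only the last collation survives) and one way to keep all of them
on the client by parsing objects as lists of pairs. Solr also has a json.nl
request parameter that changes how name/value lists are written; whether it
affects this part of the spellcheck response in 6.5.0 is something to test
rather than assume.

```python
import json

# Trimmed-down copy of the response shape from the message above.
raw = """
{"collations": {
  "collation": {"collationQuery": "haemorrhagic broken", "hits": 36},
  "collation": {"collationQuery": "haemorrhagic brake", "hits": 22},
  "collation": {"collationQuery": "hemorrhage broken", "hits": 16}
}}
"""

# Default parsing: each repeated key overwrites the previous one.
plain = json.loads(raw)
last_only = plain["collations"]["collation"]["collationQuery"]


def keep_duplicates(pairs):
    """Keep objects with repeated keys as a list of (key, value) pairs."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        return pairs  # repeated keys: return the pairs so nothing is lost
    return dict(pairs)


parsed = json.loads(raw, object_pairs_hook=keep_duplicates)
queries = [value["collationQuery"] for _, value in parsed["collations"]]
print(last_only)
print(queries)
```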

 



DIH Issues

2017-04-25 Thread AJ Lemke
Hey all,

We are using 6.3.0 and we have issues with DIH throwing errors.  We are seeing 
an intermittent issue where on a full index a single error will be thrown.  The 
error is always "missing required field: fieldname".
Our SQL database always has data in the field that comes up with the error.  
Most of the errors are coming on fields that SQL has marked as required.

Would anyone have any hints or ideas about where to look to remedy this situation?

As always if you need more information let me know.

Thanks
AJ


Re: DIH Issues

2017-04-25 Thread Alexandre Rafalovitch
Maybe the content gets simplified away between the database and the
Solr schema. For example if your field contains just spaces and you
have UpdateRequestProcessors to do trim and removal of empty fields?

Schemaless mode will remove empty fields, but will not trim for example.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced




RE: DIH Issues

2017-04-25 Thread AJ Lemke
Thanks for the thought Alex!
The fields that have this happen most often are numeric and boolean fields. 
These fields have real data (id numbers, true/false, etc.)

AJ



SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Victor Solakhian
We have code that uses *SolrIndexSearcher#getDocList()* method to get
document IDs for the query.

First a Solr query string is generated from UI, then the following code
creates a Lucene Query

org.apache.lucene.search.Query query = parser.parse(solrQueryString);

where parser is org.apache.lucene.queryparser.classic.QueryParser and then
the following is used to get the document IDs:

DocList docList = indexSearcher.getDocList(query, filterList, sort,
start, length, 0);

The code worked perfectly in Solr 4.5. Now, in Solr 5.5.4, it works only if
the query does not contain a date range query. For example, solrQueryString:

"(+c_class:(Industry.government)) AND +valid_date:[2015-10-21 TO
2017-04-21] AND -class:(TitleCodeMiddle.Board) AND +company:[* TO *] AND
+(has_email:(true) OR has_phone:(true) OR c_has_phone:(true)) AND
+c_class:(Industry.government)"

was parsed to the Lucene query:

"+(+c_class:industry.government) +valid_date:[2015-10-21 TO 2017-04-21]
-class:titlecodemiddle.board +company:[* TO *] +(has_email:T has_phone:T
c_has_phone:T) +c_class:industry.government"

which contains "+valid_date:[2015-10-21 TO 2017-04-21]", returns ZERO
results, although the same query (actually the Solr equivalent) returns
3326 records when used in Solr Admin.

Here is the definition of the "valid_date" field:

   

   

For a similar query without the range query:

+(+c_class:industry.government) -class:titlecodemiddle.board +company:[* TO
*] +(has_email:T has_phone:T c_has_phone:T) +c_class:industry.government

our code returns 5629 results (same as Solr Admin).

I tried to use different formats for date in the Solr query (according to
what I was able to find on the web for Lucene date format):

   - "+valid_date:[2015-10-21 TO 2017-04-21]"
   - "+valid_date:[20151021 TO 20170421]"
   - "+valid_date:[2015-10-21T04:00:00.000 TO 2017-04-21T04:00:00.000]"
   - "+valid_date:[2015-10-21T04:00:00.000Z TO 2017-04-21T04\:00\:00]"
   - "+valid_date:[2015-10-21T00:00:00Z TO 2017-04-21T00:00:00Z]"

Just out of curiosity, I even generated "+valid_date:[XXX TO XXX]" just to
see that SolrIndexSearcher#getDocList() method does not check for correct
syntax and returns ZERO results.

Does anybody know what is happening and what is the proper date format for
Lucene range query in v. 5.5.4?

Thanks,

Victor


Re: Graph traversel

2017-04-25 Thread mganeshs
Dear Solr experts,

Can anyone here explain why graph traversal is not working as expected in
Solr 6.5?

It is not traversing all the child nodes. It traverses only a few nodes and
does not return all the mid-level and leaf nodes.

As I explained below, for this query

http://localhost:8983/solr/graph/query?q=*:*&fq={!graph%20from=parent_id%20to=id}id:1

(which is meant to return all nodes reachable from node 1)

I get the result

"docs":[
  {
    "id":"1"},
  {
    "id":"11"},
  {
    "id":"12"},
  {
    "id":"13"},
  {
    "id":"122"}]

whereas I expect the result to be 1, 11, 12, 13, 121, 122, 131.

What's going wrong?

Following is the data I uploaded:

[{
"id": "1",
"name": "Root document one"
},
{
"id": "2",
"name": "Root document two"
},
{
"id": "3",
"name": "Root document three"
},
{
"id": "11",
"parent_id": "1",
"name": "First level document 1, child one"
},
{
"id": "12",
"parent_id": "1",
"name": "First level document 1, child two"
},
{
"id": "13",
"parent_id": "1",
"name": "First level document 1, child three"
},
{
"id": "21",
"parent_id": "2",
"name": "First level document 2, child one"
},
{
"id": "22",
"parent_id": "2",
"name": "First level document 2, child two"
},
{
"id": "121",
"parent_id": "12",
"name": "Second level document 12, child one"
},
{
"id": "122",
"parent_id": "12",
"name": "Second level document 12, child two"
},
{
"id": "131",
"parent_id": "13",
"name": "Second level document 13, child three"
}]
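
For reference, the expected result can be checked outside Solr with a plain
breadth-first walk over the same id/parent_id pairs. This Python sketch only
illustrates the expectation stated above; it makes no claim about how Solr's
graph query parser interprets from and to, and the function name descendants
is invented for illustration.

```python
from collections import defaultdict, deque

# (id, parent_id) pairs copied from the uploaded documents; None marks a root.
DOCS = [
    ("1", None), ("2", None), ("3", None),
    ("11", "1"), ("12", "1"), ("13", "1"),
    ("21", "2"), ("22", "2"),
    ("121", "12"), ("122", "12"), ("131", "13"),
]


def descendants(docs, root_id):
    """Return root_id plus every document reachable through parent_id links."""
    children = defaultdict(list)
    for doc_id, parent_id in docs:
        if parent_id is not None:
            children[parent_id].append(doc_id)

    seen = [root_id]
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(descendants(DOCS, "1"))
# ['1', '11', '12', '13', '121', '122', '131']
```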









Re: DIH Issues

2017-04-25 Thread Erick Erickson
You say your SQL database always has the values, but does the output
from the SQL query you actually use have them? I've been fooled before
when the query I formed "somehow" didn't return a value for all the
fields I expected.

You could also crank the logging level up enough to see the docs that
are indexed, although that would probably only confirm that the fields
weren't in the docs which you know already, not tell you why they are
missing. Pull the SQL out and run it independently perhaps?

I sound a bit like a broken record, but this is why I like SolrJ, I
can actually debug that:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick
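
Running the query on its own and looking for empty values could be sketched
as follows. This is a Python illustration under loud assumptions: it uses an
in-memory SQLite database as a stand-in for the real one, the tables, columns
and the helper name rows_missing_field are all invented, and nothing here is
known about the actual DIH query. It shows one way a query's output can
contain NULLs even though every stored column is NOT NULL (a LEFT JOIN with
no matching row).

```python
import sqlite3


def rows_missing_field(conn, query, field):
    """Run query and return the rows where field is NULL or blank."""
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    missing = []
    for row in conn.execute(query):
        value = row[field]
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(dict(row))
    return missing


# In-memory stand-in for the real database; schema and data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, active INTEGER NOT NULL);
    CREATE TABLE price (item_id INTEGER NOT NULL, amount REAL NOT NULL);
    INSERT INTO item VALUES (1, 1), (2, 0);
    INSERT INTO price VALUES (1, 9.5);
    """
)
query = (
    "SELECT i.id, i.active, p.amount "
    "FROM item i LEFT JOIN price p ON p.item_id = i.id "
    "ORDER BY i.id"
)
bad = rows_missing_field(conn, query, "amount")
print(bad)  # [{'id': 2, 'active': 0, 'amount': None}]
```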



Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Erick Erickson
Maybe the other thing in play here is that use-cases that "just work"
in the master/slave environment are less likely to employ consultants
so we get something of a skewed sense of who uses what ;)



Re: DIH Issues

2017-04-25 Thread Alexandre Rafalovitch
I wonder if it is possible to write a component/URP/something that
will intercept exceptions like these and dump out the full record.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced




Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Rick Leir
Victor,
When you do a query in Solr Admin, the generated query is shown at the top of
the page. Can you compare that with the query that getDocList generates? Or did
I misunderstand your question?
Cheers -- Rick

On April 25, 2017 11:34:17 AM EDT, Victor Solakhian  
wrote:
>We have code that uses *SolrIndexSearcher#getDocList()* method to get
>document IDs for the query.
>
>First a Solr query string is generated from UI, then the following code
>creates a Lucene Query
>
>  org.apache.lucene.search.Query query = parser.parse(solrQueryString);
>
>where parser is org.apache.lucene.queryparser.classic.QueryParser and
>then
>the following is used to get the document IDs:
>
>DocList docList = indexSearcher.getDocList(query, filterList, sort,
>start, length, 0);
>
>The code worked perfectly in Solr 4.5. Now, in Solr 5.5.4, it works
>only if
>the query does not contain a date range query. For example,
>solrQueryString:
>
>"(+c_class:(Industry.government)) AND +valid_date:[2015-10-21 TO
>2017-04-21] AND -class:(TitleCodeMiddle.Board) AND +company:[* TO *]
>AND
>+(has_email:(true) OR has_phone:(true) OR c_has_phone:(true)) AND
>+c_class:(Industry.government)"
>
>was parsed to the Lucene query:
>
>"+(+c_class:industry.government) +valid_date:[2015-10-21 TO 2017-04-21]
>-class:titlecodemiddle.board +company:[* TO *] +(has_email:T
>has_phone:T
>c_has_phone:T) +c_class:industry.government"
>
>which contains "+valid_date:[2015-10-21 TO 2017-04-21]", returns ZERO
>results, although the same query (actually the Solr equivalent) returns
>3326 records when used in Solr Admin.
>
>Here is the definition of the "valid_date" field:
>
> 
>
>   sortMissingLast="true" precisionStep="6" positionIncrementGap="0"
>omitNorms="true"/>
>
>For a similar query without the range query:
>
>+(+c_class:industry.government) -class:titlecodemiddle.board
>+company:[* TO
>*] +(has_email:T has_phone:T c_has_phone:T)
>+c_class:industry.government
>
>our code returns 5629 results (same as Solr Admin).
>
>I tried to use different formats for date in the Solr query (according
>to
>what I was able to find on the web for Lucene date format):
>
>   - "+valid_date:[2015-10-21 TO 2017-04-21]"
>   - "+valid_date:[20151021 TO 20170421]"
>   - "+valid_date:[2015-10-21T04:00:00.000 TO 2017-04-21T04:00:00.000]"
>   - "+valid_date:[2015-10-21T04:00:00.000Z TO 2017-04-21T04\:00\:00]"
>   - "+valid_date:[2015-10-21T00:00:00Z TO 2017-04-21T00:00:00Z]"
>
>Just out of curiosity, I even generated "+valid_date:[XXX TO XXX]" just
>to
>see that SolrIndexSearcher#getDocList() method does not check for
>correct
>syntax and returns ZERO results.
>
>Does anybody know what is happening and what is the proper date format
>for
>Lucene range query in v. 5.5.4?
>
>Thanks,
>
>Victor

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: DIH Issues

2017-04-25 Thread Sales

> On Apr 25, 2017, at 10:28 AM, AJ Lemke  wrote:
> 
> Thanks for the thought Alex!
> The fields that have this happen most often are numeric and boolean fields. 
> These fields have real data (id numbers, true/false, etc.)
> 
> AJ
> 

We had an identical problem a few months ago, and there was no question that
the field was populated in all MySQL records. We figured out how to use another
field in the schema to do the same query, so we ended up deleting the
troublesome field. We never did discover why; all our ideas failed. In our case
the same data populated two different fields: one worked and one did not, and
we never found a good reason for that. I’d love to know if you figure it out,
as it could be the reason why ours did the same thing. Ours is a much older
version, though. We figured it’s some sort of rare bug. We played around for
several weeks. Hope you can find it.

Steve

Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Sales

> On Apr 25, 2017, at 11:23 AM, Erick Erickson  wrote:
> 
> Maybe the other thing in play here is that use-cases that "just work"
> in the master/slave environment are less likely to employ consultants
> so we get something of a skewed sense of who uses what ;)
> 

So, there’s a new poll you just started there! That exactly describes us. 

Steve



Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Victor Solakhian
Rick,

Solr Admin does not generate a query. I use it just to confirm that the
query generated by our code returns results.

The getDocList() method also does not generate a query. It returns a list of
document IDs for the query created by the QueryParser.parse(query,...)
method.

Thanks,

Victor



Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Otis Gospodnetić
This is interesting - that ZK is seen as adding so much complexity that it
turns people off!

If you think about it, Elasticsearch users have no choice -- except their
"ZK" is built-in, hidden, so one doesn't have to think about it, at least
not initially.

I think I saw mentions (maybe on user or dev MLs or JIRA) about
potentially, in the future, there only being SolrCloud mode (and dropping
SolrCloud name in favour of Solr).  If the above comment from Charlie about
complexity is really true for Solr users, and if that's the reason why we
see so few people running SolrCloud today, perhaps that's a good signal for
Solr development/priorities in terms of ZK
hiding/automating/embedding/something...

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


>


Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Rick Leir
All,
I read somewhere that you should run your own ZK externally, and turn off 
SolrCloud. Comments please!
Rick

On April 25, 2017 1:33:31 PM EDT, "Otis Gospodnetić" 
 wrote:
>This is interesting - that ZK is seen as adding so much complexity that
>it
>turns people off!
>
>If you think about it, Elasticsearch users have no choice -- except
>their
>"ZK" is built-in, hidden, so one doesn't have to think about it, at
>least
>not initially.
>
>I think I saw mentions (maybe on user or dev MLs or JIRA) about
>potentially, in the future, there only being SolrCloud mode (and
>dropping
>SolrCloud name in favour of Solr).  If the above comment from Charlie
>about
>complexity is really true for Solr users, and if that's the reason why
>we
>see so few people running SolrCloud today, perhaps that's a good signal
>for
>Solr development/priorities in terms of ZK
>hiding/automating/embedding/something...
>
>Otis
>--
>Monitoring - Log Management - Alerting - Anomaly Detection
>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>On Tue, Apr 25, 2017 at 4:50 AM, Charlie Hull 
>wrote:
>
>> On 24/04/2017 15:58, Otis Gospodnetić wrote:
>>
>>> Hi,
>>>
>>> I'm really really surprised here.  Back in 2013 we did a poll to see
>how
>>> people were running Master-Slave (4.x back then) and SolrCloud was a
>bit
>>> more popular than Master-Slave:
>>> https://sematext.com/blog/2013/02/25/poll-solr-cloud-or-not/
>>>
>>> Here is a fresh new poll with pretty much the same question - How do
>you
>>> run your Solr?
> -
>>> and guess what?  SolrCloud is *not* at all a lot more prevalent than
>>> Master-Slave.
>>>
>>> We definitely see a lot more SolrCloud used by Sematext Solr
>>> consulting/support customers, so I'm a bit surprised by the results
>of
>>> this
>>> poll so far.
>>>
>>
>> I'm not particularly surprised. We regularly see clients either with
>> single nodes or elderly versions of Solr (or even Lucene). Zookeeper
>is
>> still seen as a bit of a black art. Once you move from 'how do I run
>a
>> search engine' to 'how do I manage a cluster of servers with scaling
>for
>> performance/resilience/failover' you're looking at a completely new
>set
>> of skills and challenges, which I think puts many people off.
>>
>> Charlie
>>
>>>
>>> Is anyone else surprised by this?  See https://twitter.com/sematext/
>>> status/854927627748036608
>>>
>>> Thanks,
>>> Otis
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training -
>http://sematext.com/
>>>
>>>
>>> ---
>>> This email has been checked for viruses by AVG.
>>> http://www.avg.com
>>>
>>>
>>
>> --
>> Charlie Hull
>> Flax - Open Source Enterprise Search
>>
>> tel/fax: +44 (0)8700 118334
>> mobile:  +44 (0)7767 825828
>> web: www.flax.co.uk
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread David Hastings
I can definitely attest to this.  The really nice thing about the standard
Solr/Jetty configuration is that it's all there, Lucene+Solr+Jetty, and you
just turn it on and run. After only minor tweaks to JVM and memory
settings, it's effectively production ready with a reliable master-slave
configuration.  The servers I run do about 30,000+ searches a day, and 95% are
sub-second on massive indexes.  SolrCloud with ZK does work out of
the box, but it is explicitly stated everywhere that it's not supposed to be in
production until you configure and maintain your own external ZK ensemble.
If it was simplified somewhat, I think a lot more people would migrate
over to SolrCloud, but for now I can say we are not going in that direction.

On Tue, Apr 25, 2017 at 1:33 PM, Otis Gospodnetić <
otis.gospodne...@gmail.com> wrote:

> This is interesting - that ZK is seen as adding so much complexity that it
> turns people off!
>
> If you think about it, Elasticsearch users have no choice -- except their
> "ZK" is built-in, hidden, so one doesn't have to think about it, at least
> not initially.
>
> I think I saw mentions (maybe on user or dev MLs or JIRA) about
> potentially, in the future, there only being SolrCloud mode (and dropping
> SolrCloud name in favour of Solr).  If the above comment from Charlie about
> complexity is really true for Solr users, and if that's the reason why we
> see so few people running SolrCloud today, perhaps that's a good signal for
> Solr development/priorities in terms of ZK
> hiding/automating/embedding/something...
>
> Otis
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On Tue, Apr 25, 2017 at 4:50 AM, Charlie Hull  wrote:
>
> > On 24/04/2017 15:58, Otis Gospodnetić wrote:
> >
> >> Hi,
> >>
> >> I'm really really surprised here.  Back in 2013 we did a poll to see how
> >> people were running Master-Slave (4.x back then) and SolrCloud was a bit
> >> more popular than Master-Slave:
> >> https://sematext.com/blog/2013/02/25/poll-solr-cloud-or-not/
> >>
> >> Here is a fresh new poll with pretty much the same question - How do you
> >> run your Solr? 
> -
> >> and guess what?  SolrCloud is *not* at all a lot more prevalent than
> >> Master-Slave.
> >>
> >> We definitely see a lot more SolrCloud used by Sematext Solr
> >> consulting/support customers, so I'm a bit surprised by the results of
> >> this
> >> poll so far.
> >>
> >
> > I'm not particularly surprised. We regularly see clients either with
> > single nodes or elderly versions of Solr (or even Lucene). Zookeeper is
> > still seen as a bit of a black art. Once you move from 'how do I run a
> > search engine' to 'how do I manage a cluster of servers with scaling for
> > performance/resilience/failover' you're looking at a completely new set
> > of skills and challenges, which I think puts many people off.
> >
> > Charlie
> >
> >>
> >> Is anyone else surprised by this?  See https://twitter.com/sematext/
> >> status/854927627748036608
> >>
> >> Thanks,
> >> Otis
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >> ---
> >> This email has been checked for viruses by AVG.
> >> http://www.avg.com
> >>
> >>
> >
> > --
> > Charlie Hull
> > Flax - Open Source Enterprise Search
> >
> > tel/fax: +44 (0)8700 118334
> > mobile:  +44 (0)7767 825828
> > web: www.flax.co.uk
> >
>


Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Rick Leir
Victor,
In SolrAdmin, do a query, then look at the top bar on the screen. Sorry, I cannot 
do a screenshot here. The actual query that SolrAdmin generated is in that top 
bar. It is difficult to cut and paste the query, but possible. Or you can click 
on it and jump to a results page. SolrAdmin has changed over time; I know the 
5.5.2 version best.

getDocList must cause a query to get sent? You could see the query in the logs 
or even use Wireshark.
Cheers -- Rick

On April 25, 2017 1:11:25 PM EDT, Victor Solakhian  wrote:
>Rick,
>
>Solr Admin does not generate a query. I use it just to confirm that the
>query generated by our code returns results.
>
>getDocList() method also does not generate a query, It returns a list
>of
>document IDs for the query created by the QueryParser.parse(query,...).
>method.
>
>Thanks,
>
>Victor
>
>On Tue, Apr 25, 2017 at 12:44 PM, Rick Leir  wrote:
>
>> Victor,
>> When you do a query in SolrAdmin, the generated query is shown in at
>the
>> top of the page. Can you compare that with the query that getDocList
>> generates? Or did I misunderstand your question.
>> Cheers -- Rick
>>
>> On April 25, 2017 11:34:17 AM EDT, Victor Solakhian
>
>> wrote:
>> >We have code that uses *SolrIndexSearcher#getDocList()* method to
>get
>> >document IDs for the query.
>> >
>> >First a Solr query string is generated from UI, then the following
>code
>> >creates a Lucene Query
>> >
>> >  org.apache.lucene.search.Query query =
>parser.parse(solrQueryString);
>> >
>> >where parser is org.apache.lucene.queryparser.classic.QueryParser
>and
>> >then
>> >the following is used to get the document IDs:
>> >
>> >DocList docList = indexSearcher.getDocList(query, filterList,
>sort,
>> >start, length, 0);
>> >
>> >The code worked perfectly in Solr 4.5. Now, in Solr 5.5.4, it works
>> >only if
>> >the query does not contain a date range query. For example,
>> >solrQueryString:
>> >
>> >"(+c_class:(Industry.government)) AND +valid_date:[2015-10-21 TO
>> >2017-04-21] AND -class:(TitleCodeMiddle.Board) AND +company:[* TO *]
>> >AND
>> >+(has_email:(true) OR has_phone:(true) OR c_has_phone:(true)) AND
>> >+c_class:(Industry.government)"
>> >
>> >was parsed to the Lucene query:
>> >
>> >"+(+c_class:industry.government) +valid_date:[2015-10-21 TO
>2017-04-21]
>> >-class:titlecodemiddle.board +company:[* TO *] +(has_email:T
>> >has_phone:T
>> >c_has_phone:T) +c_class:industry.government"
>> >
>> >which contains "+valid_date:[2015-10-21 TO 2017-04-21]", returns
>ZERO
>> >results, although the same query (actually the Solr equivalent)
>returns
>> >3326 records when used in Solr Admin.
>> >
>> >Here is the definition of the "valid_date" field:
>> >
>> > />
>> >
>> >   > >sortMissingLast="true" precisionStep="6" positionIncrementGap="0"
>> >omitNorms="true"/>
>> >
>> >For a similar query without the range query:
>> >
>> >+(+c_class:industry.government) -class:titlecodemiddle.board
>> >+company:[* TO
>> >*] +(has_email:T has_phone:T c_has_phone:T)
>> >+c_class:industry.government
>> >
>> >our code returns 5629 results (same as Solr Admin).
>> >
>> >I tried to use different formats for date in the Solr query
>(according
>> >to
>> >what I was able to find on the web for Lucene date format):
>> >
>> >   - "+valid_date:[2015-10-21 TO 2017-04-21]"
>> >   - "+valid_date:[20151021 TO 20170421]"
>> >   - "+valid_date:[2015-10-21T04:00:00.000 TO
>2017-04-21T04:00:00.000]"
>> >   - "+valid_date:[2015-10-21T04:00:00.000Z TO
>2017-04-21T04\:00\:00]"
>> >   - "+valid_date:[2015-10-21T00:00:00Z TO 2017-04-21T00:00:00Z]"
>> >
>> >Just out of curiosity, I even generated "+valid_date:[XXX TO XXX]"
>just
>> >to
>> >see that SolrIndexSearcher#getDocList() method does not check for
>> >correct
>> >syntax and returns ZERO results.
>> >
>> >Does anybody know what is happening and what is the proper date
>format
>> >for
>> >Lucene range query in v. 5.5.4?
>> >
>> >Thanks,
>> >
>> >Victor
>>
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Troubleshooting solr errors

2017-04-25 Thread Daniel Miller
The problem isn't a particular email message - I get a cascade of those 
errors (every time a new message is received) once the server "breaks".  
The fix is to restart the server.  I did find a Java heap error in the 
log - so I've increased the memory allocation (now to -Xms512m 
-Xmx2048m).  I had thought that a heap failure would result in "simple" 
termination - and that systemd would restart it appropriately - but 
obviously I'm missing something.


I was hoping to be able to help find whatever the bug might be - if 
indeed there is one - or if the problem is simply not enough memory 
available to Solr.


As for the Dovecot specifics, if you'll check the Dovecot user mailing 
list archives for "Solr 6.4.1 config" you should find my post, including 
my config.  If you need it I'll be happy to re-post that message here.  
Searching is performed by the IMAP clients via Dovecot - so no manual 
Solr queries are performed.  I simply use the search function of my mail 
clients that support server-side searches (Thunderbird for 
Windoze/Linux, AquaMail for Android).


--
Daniel

On 4/24/2017 5:43 PM, Rick Leir wrote:

Daniel,
Would it be too much trouble to get some text out of that particular email 
message, and try it in the Solr Admin Analysis tool?

By the way, I also have my email in Dovecot. Would you be able to describe how 
you index it and how you query to find an email? Perhaps with scripts in a 
github project?
Thanks -- Rick

On April 24, 2017 5:55:29 PM EDT, Daniel Miller  wrote:

I'm running Solr 6.4.2 to index my mail server (Dovecot). Searching is
great - but periodically I have Solr errors. Previously, when an error
would occur Solr would terminate.  I now have it running as a systemd
service so it would auto-restart - but it seems like that doesn't solve
it.

Some of the log lines include:

2017-04-24 18:18:31.101 ERROR (qtp594427726-30) [   x:dovecot]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
Exception writing document id
17697/7db132200dd2df4d2f7b3bc41c5f/dmil...@amfes.com to the index;
possible analysis error.

2017-04-24 18:18:31.125 ERROR (qtp594427726-32) [   x:dovecot]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error
opening new searcher

I don't know what else to provide to try to troubleshoot this.

--
Daniel




Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': char=",position=312 BEFORE='ssions"

2017-04-25 Thread bay chae
https://stackoverflow.com/questions/43618000/solr-standalone-basicauth-org-noggit-jsonparserparseexception
 


Hi I am following guides on security.json in 
https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
 
.

But when solr starts up I am getting:

Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': 
char=",position=312 BEFORE='ssions":[{"name":"security-edit", "role":"admin"}] 
"' AFTER='user-role":{"solr":"admin"} }} '
at org.noggit.JSONParser.err(JSONParser.java:356)
at org.noggit.JSONParser.nextEvent(JSONParser.java:958)
at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:124)
at org.noggit.ObjectBuilder.getVal(ObjectBuilder.java:57)
at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:128)
at org.apache.solr.common.util.Utils.fromJSON(Utils.java:127)
at 
org.apache.solr.handler.admin.SecurityConfHandler$SecurityConfig.setData(SecurityConfHandler.java:311)
at 
org.apache.solr.handler.admin.SecurityConfHandlerLocal.getSecurityConfig(SecurityConfHandlerLocal.java:58)
... 46 more

Any help for a poor noob? This is for solr standalone.

==

2017-04-25 17:45:03.530 INFO  (main) [   ] o.e.j.s.Server jetty-9.3.14.v20161028
2017-04-25 17:45:03.870 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___  
_   Welcome to Apache Solr™ version 6.5.0
2017-04-25 17:45:03.870 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| 
___| |_ _   Starting in standalone mode on port 8984
2017-04-25 17:45:03.871 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ 
\ | '_|  Install dir: /usr/local/solr-6.5.0
2017-04-25 17:45:03.885 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
|___/\___/_|_|Start time: 2017-04-25T17:45:03.872Z
2017-04-25 17:45:03.885 INFO  (main) [   ] o.a.s.s.StartupLoggingUtils Property 
solr.log.muteconsole given. Muting ConsoleAppender named CONSOLE
2017-04-25 17:45:03.900 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using 
system property solr.solr.home: /usr/local/solr-6.5.0/server/solr
2017-04-25 17:45:03.908 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
container configuration from /usr/local/solr-6.5.0/server/solr/solr.xml
2017-04-25 17:45:04.181 INFO  (main) [   ] o.a.s.u.UpdateShardHandler Creating 
UpdateShardHandler HTTP client with params: 
socketTimeout=60&connTimeout=6&retry=true
2017-04-25 17:45:04.193 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not 
start Solr. Check solr/home property and the logs
2017-04-25 17:45:04.217 ERROR (main) [   ] o.a.s.c.SolrCore 
null:org.apache.solr.common.SolrException: Failed opening existing 
security.json file: /usr/local/solr-6.5.0/server/solr/security.json



Re: Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': char=",position=312 BEFORE='ssions"

2017-04-25 Thread Shawn Heisey
On 4/25/2017 12:10 PM, bay chae wrote:
> https://stackoverflow.com/questions/43618000/solr-standalone-basicauth-org-noggit-jsonparserparseexception
>  
> 
>
> Hi I am following guides on security.json in 
> https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
>  
> .
>
> But when solr starts up I am getting:
>
> Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': 
> char=",position=312 BEFORE='ssions":[{"name":"security-edit", 
> "role":"admin"}] "' AFTER='user-role":{"solr":"admin"} }} 

Looks like the JSON on that documentation page is incorrect, and has
been wrong for a very long time.  It doesn't validate when run through a
JSON validator.  If I add a comma at the end of line 10 (just before
"user-role"), then it validates.  I do not know whether this is the
correct fix, but I think it probably is.

Before I update the documentation, I would like somebody who's familiar
with this file to tell me whether I've got the right fix.

Thanks,
Shawn



Re: Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': char=",position=312 BEFORE='ssions"

2017-04-25 Thread bay chae
doh

Thanks for the tip

It worked perfectly!!

> On 25 Apr 2017, at 19:28, Shawn Heisey  wrote:
> 
> 



Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Chris Hostetter

Different FieldTypes encode different values into terms in different ways.  At query 
time the FieldTypes need to be consulted to know how to build the 
resulting query object.

Solr's query parsers are "schema aware" and delegate to the appropriate 
FieldType to handle any index term encoding needed -- but the lower level 
Lucene QueryParser that you are using does not...

: org.apache.lucene.search.Query query = parser.parse(solrQueryString);
: 
: where parser is org.apache.lucene.queryparser.classic.QueryParser and then
: the following is used to get the document IDs:

...for anything except trivial StrField instances, that's not going to 
work -- not for any Trie based fields, or any TextFields (unless you go 
get the Analyzer from the schema) or any non trivial FieldTypes.

: The code worked perfectly in Solr 4.5. Now, in Solr 5.5.4, it works only if
: the query does not contain a date range query. For example, solrQueryString:

I'm not sure how that would have worked in Solr 4.5 ... unless 
perhaps your definition of a "date" field was different in the schemas 
you used in 4.5, and did not involve a Trie based date field (the very 
old legacy date format fields used a simple String based encoding that 
might have worked).

The correct way for a plugin to do the sort of thing you are trying to do 
would be to use an instance of SolrQueryParser -- see for example the code 
in LuceneQParser and how it uses SolrQueryParser ... you'll most likely 
just want to use LuceneQParser directly in your plugin to simplify things.



-Hoss
http://www.lucidworks.com/


Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Chris Hostetter
: The correct way for a plugin to do the sort of thing you are trying to do 
: would be to use an instance of SolrQueryParser -- see for example the code 
: in LuceneQParser and how it uses SolrQueryParser ... you'll most likely 
: just want to use LuceneQParser directly in your plugin to simplify things.

...or depending on how low level you want to deal with things, consider 
using IndexSchema.getField(...).getType().getRangeQuery(null, ...) to 
build your range Query object directly from the low/high end points 
provided as input instead of needing to build a string just to parse it 
again.
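
On the date-format half of the original question: Solr date fields expect full
ISO-8601 UTC instants such as 2015-10-21T00:00:00Z, which is the last of the
five variants tried above. Below is a minimal sketch, using only java.time, of
turning day-only input into such a range clause. The class and method names are
hypothetical, and the resulting string is only useful once it is handed to a
schema-aware Solr parser; it will not make the classic Lucene QueryParser
handle a Trie date field.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Solr or Lucene.
class SolrDateRange {

    // Converts a day such as "2015-10-21" to midnight UTC in ISO-8601 form.
    static String toSolrInstant(String day) {
        Instant instant = LocalDate.parse(day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
        return instant.toString();
    }

    // Builds an inclusive range clause for the given field.
    static String rangeClause(String field, String fromDay, String toDay) {
        return field + ":[" + toSolrInstant(fromDay)
                + " TO " + toSolrInstant(toDay) + "]";
    }

    public static void main(String[] args) {
        System.out.println(rangeClause("valid_date", "2015-10-21", "2017-04-21"));
    }
}
```

Midnight UTC is an assumption here; if the days are meant in a local time
zone, the offset has to be applied before formatting.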


-Hoss
http://www.lucidworks.com/


Re: Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': char=",position=312 BEFORE='ssions"

2017-04-25 Thread Fuad Efendi
Yes, absolutely correct, a comma is missing at the end of line 10.

All key-value pairs inside the same block should be comma separated, except
the last one.



From: Shawn Heisey  
Reply: solr-user@lucene.apache.org 

Date: April 25, 2017 at 2:29:03 PM
To: solr-user@lucene.apache.org 

Subject:  Re: Caused by: org.noggit.JSONParser$ParseException: Expected ','
or '}': char=",position=312 BEFORE='ssions"

On 4/25/2017 12:10 PM, bay chae wrote:
>
https://stackoverflow.com/questions/43618000/solr-standalone-basicauth-org-noggit-jsonparserparseexception
<
https://stackoverflow.com/questions/43618000/solr-standalone-basicauth-org-noggit-jsonparserparseexception>

>
> Hi I am following guides on security.json in
https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin
<
https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin>.

>
> But when solr starts up I am getting:
>
> Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}':
char=",position=312 BEFORE='ssions":[{"name":"security-edit",
"role":"admin"}] "' AFTER='user-role":{"solr":"admin"} }}

Looks like the JSON on that documentation page is incorrect, and has
been wrong for a very long time. It doesn't validate when run through a
JSON validator. If I add a comma at the end of line 10 (just before
"user-role"), then it validates. I do not know whether this is the
correct fix, but I think it probably is.

Before I update the documentation, I would like somebody who's familiar
with this file to tell me whether I've got the right fix.

Thanks,
Shawn


Atomic Updates

2017-04-25 Thread Chris Ulicny
Hello all,

Suppose I have the following fields in a document and populate all 4 fields
for every document.

id: uniqueKey, indexed and stored
integer_field: indexed and stored
text_field: indexed and stored
othertext_field: indexed but not stored

No default values, multivalues, docvalues, copyfields, or any other
properties set.

If I make an atomic update to a document like the following:
{"id":"1", "integer_field":{set: "1000"}}

what should we expect to happen with the othertext_field?

In a few tests, it seemed like the original indexed values of the record
were preserved.

I know to use atomic updates all fields should be stored since the document
is read and reindexed internally, but was curious if there was any
consistency or expected results for the state of othertext_field after an
atomic update.
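
The "read and reindexed internally" step described above can be pictured with
a toy model. This is only a sketch under that stated assumption, not Solr's
actual code: the class and method names are hypothetical, and a plain map
stands in for a document. Under this model a field that is indexed but not
stored cannot be read back, so it would be missing from the rebuilt document;
whether that matches what the tests showed is exactly the open question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of an atomic "set" update; hypothetical, not Solr code.
class AtomicUpdateModel {

    // Rebuilds the document from stored fields only, then applies the new value.
    static Map<String, Object> atomicSet(Map<String, Object> originalDoc,
                                         Set<String> storedFields,
                                         String field,
                                         Object newValue) {
        Map<String, Object> rebuilt = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : originalDoc.entrySet()) {
            // Only stored fields can be read back before reindexing.
            if (storedFields.contains(entry.getKey())) {
                rebuilt.put(entry.getKey(), entry.getValue());
            }
        }
        rebuilt.put(field, newValue);
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("integer_field", 5);
        doc.put("text_field", "some text");
        doc.put("othertext_field", "other text");

        Set<String> stored = new HashSet<>(
                Arrays.asList("id", "integer_field", "text_field"));

        Map<String, Object> updated = atomicSet(doc, stored, "integer_field", 1000);
        System.out.println(updated.keySet());
    }
}
```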

Thanks,
Chris


SolrServerException: Invalid use of BasicClientConnManager: connection still allocated.

2017-04-25 Thread Putul S
Hello,

I am using a single-instance CloudSolrClient with my own HttpClient. The problem
with using this HttpClient is that, whenever I add more than one
document, LBHttpSolrClient complains about a connection not being released. Everything
works fine if I do not use my own HttpClient.



// also set timeout and authentication parameters
HttpClient httpClient = new DefaultHttpClient();

// singleton for the app; have tried multi-instance too
server = new CloudSolrClient("localhost:2198", httpClient);

// adding documents in a loop results in the error
server.add(document);


Documents get updated, but I end up seeing many ZooKeeper connections
established, depending on the number of documents added. How do I release the
connection when my HttpClient is wrapped within CloudSolrClient? I tried
shutting down the httpClient connection manager; it did not work.


ERRORS:

org.apache.solr.common.cloud.ConnectionManager] (zkCallback-2-thread-1)
Watcher org.apache.solr.common.cloud.ConnectionManager@29e14ddb
name:ZooKeeperConnection Watcher:localhost.harvard.edu:2181 got event
WatchedEvent

….

org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalStateException: Invalid use of BasicClientConnManager:
connection still allocated.

Make sure to release the connection before allocating another one.

org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:410)

org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:325)


Thank you in advance


Putul


Re: Atomic Updates

2017-04-25 Thread Erick Erickson
How is "otherText" getting values in the first place? If
it's the destination of a copyField directive, it'll be repopulated
if the source of the copyField is stored=true.

Best,
Erick

On Tue, Apr 25, 2017 at 12:40 PM, Chris Ulicny  wrote:
> Hello all,
>
> Suppose I have the following fields in a document and populate all 4 fields
> for every document.
>
> id: uniqueKey, indexed and stored
> integer_field: indexed and stored
> text_field: indexed and stored
> othertext_field: indexed but not stored
>
> No default values, multivalues, docvalues, copyfields, or any other
> properties set.
>
> If I make an atomic update to a document like the following:
> {"id":"1", "integer_field":{set: "1000"}}
>
> what should we expect to happen with the othertext_field?
>
> In a few tests, it seemed like the original indexed values of the record
> were preserved.
>
> I know to use atomic updates all fields should be stored since the document
> is read and reindexed internally, but was curious if there was any
> consistency or expected results for the state of othertext_field after an
> atomic update.
>
> Thanks,
> Chris


Re: Troubleshooting solr errors

2017-04-25 Thread Erick Erickson
Solr likes memory. A lot. Even 2G is quite small compared with recent
installations I have seen.

There is an "oom killer script" that can be specified to kill Solr if
it gets an OOM; at least then you have something to warn you.

After an OOM, Java is in an indeterminate state so all bets are off.

Best,
Erick

On Tue, Apr 25, 2017 at 11:05 AM, Daniel Miller  wrote:
> The problem isn't a particular email message - I get a cascade of those
> errors (every time a new message is received) once the server "breaks".  The
> fix is to restart the server.  I did find a Java heap error in the log - so
> I've increased the memory allocation (now to -Xms512m -Xmx2048m).  I had
> thought that a heap failure would result in "simple" termination - and that
> systemd would restart it appropriately - but obviously I'm missing
> something.
>
> I was hoping to be able to help find whatever the bug might be - if indeed
> there is one - or if the problem is simply not enough memory available to
> Solr.
>
> As for the Dovecot specifics, if you'll check the Dovecot user mailing list
> archives for "Solr 6.4.1 config" you should find my post, including my
> config.  If you need it I'll be happy to re-post that message here.
> Searching is performed by the IMAP clients via Dovecot - so no manual Solr
> queries are performed.  I simply use the search function of my mail clients
> that support server-side searches (Thunderbird for Windoze/Linux, AquaMail
> for Android).
>
> --
> Daniel
>
>
> On 4/24/2017 5:43 PM, Rick Leir wrote:
>>
>> Daniel,
>> Would it be too much trouble to get some text out of that particular email
>> message, and try it in the Solr Admin Analysis tool?
>>
>> By the way, I also have my email in Dovecot. Would you be able to describe
>> how you index it and how you query to find an email? Perhaps with scripts in
>> a github project?
>> Thanks -- Rick
>>
>> On April 24, 2017 5:55:29 PM EDT, Daniel Miller  wrote:
>>>
>>> I'm running Solr 6.4.2 to index my mail server (Dovecot). Searching is
>>> great - but periodically I have Solr errors. Previously, when an error
>>> would occur Solr would terminate.  I now have it running as a systemd
>>> service so it would auto-restart - but it seems like that doesn't solve
>>> it.
>>>
>>> Some of the log lines include:
>>>
>>> 2017-04-24 18:18:31.101 ERROR (qtp594427726-30) [   x:dovecot]
>>> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
>>> Exception writing document id
>>> 17697/7db132200dd2df4d2f7b3bc41c5f/dmil...@amfes.com to the index;
>>> possible analysis error.
>>>
>>> 2017-04-24 18:18:31.125 ERROR (qtp594427726-32) [   x:dovecot]
>>> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error
>>> opening new searcher
>>>
>>> I don't know what else to provide to try to troubleshoot this.
>>>
>>> --
>>> Daniel
>
>


Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Erick Erickson
bq: I read somewhere that you should run your own ZK externally, and
turn off SolrCloud

This is a bit confused. "Turn off SolrCloud" has nothing to do with
running ZK internally or externally. SolrCloud requires ZK; whether
internal or external is irrelevant to the term SolrCloud.

On to running an external ZK ensemble. Mostly, that's administratively
by far the safest. If you're running the embedded ZK, then the ZK
instances are tied to your Solr instances. Now if, for any reason, your
Solr nodes hosting ZK go down, you lose ZK quorum, can't index,
etc.

Now consider a cluster with, say, 100 Solr nodes. Not talking replicas
in a collection here, I'm talking 100 physical machines. BTW, this is
not even close to the largest ones I'm aware of. Which three (for
example) are running ZK? If I want to upgrade Solr I better make
really sure not to upgrade two of the Solr instances running ZK at once
if I want my cluster to keep going.
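
The arithmetic behind "not two at once" is ZooKeeper's majority quorum: an
ensemble stays available only while more than half of its members are up.
Here is a small sketch of that calculation; the class and method names are
made up for illustration and are not a ZooKeeper API.

```java
// Hypothetical helper illustrating majority-quorum arithmetic.
class ZkQuorum {

    // Smallest number of live members that still forms a majority.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many members can be down while a majority remains.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    // True if the ensemble still has a majority with nodesDown members offline.
    static boolean hasQuorum(int ensembleSize, int nodesDown) {
        return ensembleSize - nodesDown >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int ensemble = 3;
        System.out.println("quorum size: " + quorumSize(ensemble));
        System.out.println("tolerable failures: " + tolerableFailures(ensemble));
        System.out.println("one node down, quorum kept: " + hasQuorum(ensemble, 1));
        System.out.println("two nodes down, quorum kept: " + hasQuorum(ensemble, 2));
    }
}
```

With three ZK nodes the quorum is two, so restarting one embedded-ZK Solr
node at a time is survivable and restarting two together is not.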

And, ZK is sensitive to system resources. So putting ZK on a Solr node
then hosing, say, updates to my Solr cluster can cause ZK to be
starved for resources.

This is one of those deals where _functionally_, it's OK to run
embedded ZK, but administratively it's suspect.

Best,
Erick

On Tue, Apr 25, 2017 at 10:49 AM, Rick Leir  wrote:
> All,
> I read somewhere that you should run your own ZK externally, and turn off 
> SolrCloud. Comments please!
> Rick
>
> On April 25, 2017 1:33:31 PM EDT, "Otis Gospodnetić" 
>  wrote:
>>This is interesting - that ZK is seen as adding so much complexity that
>>it
>>turns people off!
>>
>>If you think about it, Elasticsearch users have no choice -- except
>>their
>>"ZK" is built-in, hidden, so one doesn't have to think about it, at
>>least
>>not initially.
>>
>>I think I saw mentions (maybe on user or dev MLs or JIRA) about
>>potentially, in the future, there only being SolrCloud mode (and
>>dropping
>>SolrCloud name in favour of Solr).  If the above comment from Charlie
>>about
>>complexity is really true for Solr users, and if that's the reason why
>>we
>>see so few people running SolrCloud today, perhaps that's a good signal
>>for
>>Solr development/priorities in terms of ZK
>>hiding/automating/embedding/something...
>>
>>Otis
>>--
>>Monitoring - Log Management - Alerting - Anomaly Detection
>>Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>On Tue, Apr 25, 2017 at 4:50 AM, Charlie Hull 
>>wrote:
>>
>>> On 24/04/2017 15:58, Otis Gospodnetić wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm really really surprised here.  Back in 2013 we did a poll to see
>>how
>>>> people were running Master-Slave (4.x back then) and SolrCloud was a
>>bit
>>>> more popular than Master-Slave:
>>>> https://sematext.com/blog/2013/02/25/poll-solr-cloud-or-not/
>>>>
>>>> Here is a fresh new poll with pretty much the same question - How do
>>you
>>>> run your Solr?
>> -
>>>> and guess what?  SolrCloud is *not* at all a lot more prevalent than
>>>> Master-Slave.
>>>>
>>>> We definitely see a lot more SolrCloud used by Sematext Solr
>>>> consulting/support customers, so I'm a bit surprised by the results
>>of
>>>> this
>>>> poll so far.
>>>>
>>>
>>> I'm not particularly surprised. We regularly see clients either with
>>> single nodes or elderly versions of Solr (or even Lucene). Zookeeper
>>is
>>> still seen as a bit of a black art. Once you move from 'how do I run
>>a
>>> search engine' to 'how do I manage a cluster of servers with scaling
>>for
>>> performance/resilience/failover' you're looking at a completely new
>>set
>>> of skills and challenges, which I think puts many people off.
>>>
>>> Charlie
>>>

 Is anyone else surprised by this?  See https://twitter.com/sematext/
 status/854927627748036608

 Thanks,
 Otis
 --
 Monitoring - Log Management - Alerting - Anomaly Detection
 Solr & Elasticsearch Consulting Support Training -
>>http://sematext.com/


 ---
 This email has been checked for viruses by AVG.
 http://www.avg.com


>>>
>>> --
>>> Charlie Hull
>>> Flax - Open Source Enterprise Search
>>>
>>> tel/fax: +44 (0)8700 118334
>>> mobile:  +44 (0)7767 825828
>>> web: www.flax.co.uk
>>>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Atomic Updates

2017-04-25 Thread Chris Ulicny
All fields are being explicitly populated on the initial document load
without copyFields, and the atomic updates come after.

This situation actually came up while we were planning to remove copyField
properties from one of the fields in a schema for a new collection.

On Tue, Apr 25, 2017 at 3:54 PM Erick Erickson 
wrote:

> How is "otherText" getting values in the first place? If
> it's the destination of a copyField directive, it'll be repopulated
> if the source of the copyField is stored=true.
>
> Best,
> Erick
>
> On Tue, Apr 25, 2017 at 12:40 PM, Chris Ulicny  wrote:
> > Hello all,
> >
> > Suppose I have the following fields in a document and populate all 4
> fields
> > for every document.
> >
> > id: uniqueKey, indexed and stored
> > integer_field: indexed and stored
> > text_field: indexed and stored
> > othertext_field: indexed but not stored
> >
> > No default values, multivalues, docvalues, copyfields, or any other
> > properties set.
> >
> > If I make an atomic update to a document like the following:
> > {"id":"1", "integer_field":{"set": "1000"}}
> >
> > what should we expect to happen with the othertext_field?
> >
> > In a few tests, it seemed like the original indexed values of the record
> > were preserved.
> >
> > I know to use atomic updates all fields should be stored since the
> document
> > is read and reindexed internally, but was curious if there was any
> > consistency or expected results for the state of othertext_field after an
> > atomic update.
> >
> > Thanks,
> > Chris
>
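
A minimal sketch of the mechanism the question refers to ("the document is
read and reindexed internally"), written as a toy Python model rather than
Solr's actual code. The function name apply_atomic_update and the sample
values are assumptions made for illustration, and only the "set" and "inc"
operations are modelled. In this model a field that is indexed but not stored
cannot be read back, so it is missing from the rebuilt document. The model
does not explain the tests mentioned above, in which the indexed values
appeared to be preserved.

```python
def apply_atomic_update(stored_doc, update):
    """Rebuild a document from its stored fields plus an atomic update.

    stored_doc: field -> value for the fields that can be read back
    update:     field -> {"set": v} or {"inc": n}, or a plain value
    """
    new_doc = dict(stored_doc)  # only stored fields are available here
    for field, value in update.items():
        if field == "id":
            continue  # the uniqueKey only identifies the document
        if isinstance(value, dict):
            for op, operand in value.items():
                if op == "set":
                    new_doc[field] = operand
                elif op == "inc":
                    new_doc[field] = new_doc.get(field, 0) + operand
                else:
                    raise ValueError("operation not modelled: " + op)
        else:
            new_doc[field] = value
    return new_doc


if __name__ == "__main__":
    # othertext_field is indexed but not stored, so it is not in stored_doc.
    stored_doc = {"id": "1", "integer_field": 5, "text_field": "hello"}
    update = {"id": "1", "integer_field": {"set": 1000}}
    print(apply_atomic_update(stored_doc, update))
```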


Re: Version conflict during data import from another Solr instance into clean Solr

2017-04-25 Thread deansg
Hi, I ran into the same problem. Chris' first solution worked for us. The
second solution does not work on its own, however, because the conflict error
arises before the update processors' code is even reached. Creating an
alias for the _version_ field in the dataconfig file, together with an
update processor that removes the temporary field (and possibly other
unwanted fields), seemed to work great for us.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Version-conflict-during-data-import-from-another-Solr-instance-into-clean-Solr-tp4046937p4331876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Pointless query parsing before distributed processing

2017-04-25 Thread Mikhail Khludnev
Hello,
Before distributed requests are submitted, QueryComponent.prepare() is
invoked and parses the query, but then that parsed query is just thrown
away (it probably appears in debug output).
This is negligible in most cases, until a heavily wildcarded
{!complexphrase} query is submitted; that can spend a lot of time on term
expansion.
How can we bypass it?

-- 
Sincerely yours
Mikhail Khludnev


Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Otis Gospodnetić
Hi Erick,

Could one run *only* embedded ZK on some SolrCloud nodes, sans any data?
It would be the equivalent of dedicated Elasticsearch master nodes, which is the
current ES best practice/recommendation.  I've never heard of anyone being
scared of running 3 dedicated master ES nodes, so if SolrCloud offered the
same, perhaps even completely hiding ZK from users, that would present the
same level of complexity (err, simplicity) ES users love about ES.  Don't
want to talk about SolrCloud vs. ES here at all, just trying to share
observations since we work a lot with both Elasticsearch and Solr(Cloud) at
Sematext.

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Tue, Apr 25, 2017 at 4:03 PM, Erick Erickson 
wrote:

> bq: I read somewhere that you should run your own ZK externally, and
> turn off SolrCloud
>
> this is a bit confused. "turn off SolrCloud" has nothing to do with
> running ZK internally or externally. SolrCloud requires ZK, whether
> internal or external is irrelevant to the term SolrCloud.
>
> On to running an external ZK ensemble. Mostly, that's administratively
> by far the safest. If you're running the embedded ZK, then the ZK
> instances are tied to your Solr instance. Now if, for any reason, your
> Solr nodes hosting ZK go down, you lose ZK quorum, can't index,
> etc.
>
> Now consider a cluster with, say, 100 Solr nodes. Not talking replicas
> in a collection here, I'm talking 100 physical machines. BTW, this is
> not even close to the largest ones I'm aware of. Which three (for
> example) are running ZK? If I want to upgrade Solr I'd better make
> really sure not to upgrade two of the Solr instances running ZK at once
> if I want my cluster to keep going.
>
> And, ZK is sensitive to system resources. So putting ZK on a Solr node
> then hosing, say, updates to my Solr cluster can cause ZK to be
> starved for resources.
>
> This is one of those deals where _functionally_, it's OK to run
> embedded ZK, but administratively it's suspect.
>
> Best,
> Erick
>

Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-25 Thread Victor Solakhian
Hi Chris,

I think you are leading me in the right direction.

> I'm not sure how that would have worked in Solr 4.5, ... unless
> perhaps your definition of a "date" field was different in the schemas
> you used in 4.5, and did not involve a Trie based date field  (the very
> old legacy date format fields used a simple String based encoding that
> might have worked)


You are right. In Solr 4.5 we had:




I will need some time to digest all information you provided. I will let
you know.

Thank you very much.

Victor



On Tue, Apr 25, 2017 at 2:45 PM, Chris Hostetter 
wrote:

> : The correct way for a plugin to do the sort of thing you are trying to do
> : would be to use an instance of SolrQueryParser -- see for example the code
> : in LuceneQParser and how it uses SolrQueryParser ... you'll most likely
> : just want to use LuceneQParser directly in your plugin to simplify things.
>
> ...or depending on how low level you want to deal with things, consider
> using IndexSchema.getField(...).getFieldType().getRangeQuery(null, ...) to
> build your range Query object directly from the low/high end points
> provided as input instead of needing to build a string just to parse it
> again.
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Walter Underwood
1. I never saw the poll.

2. It looks better than the previous poll, which was poorly worded. I couldn’t 
answer “yes” or “no”, really.

Here is what we have in production.

Solr 3: Using every threat I can think of to get the remaining clients off of 
it. It has been shut down in test for months.

Solr 4 master/slave: Main cluster for smallish (under 1M docs) collections with 
daily updates, plus one that needs to move to…

Solr 6 cloud: Hosts one small collection with strong freshness requirements and 
one large collection with very difficult queries. The second is mid-transition 
from the Solr 4 cluster.

There is no reason to go to Solr Cloud for a moderate-size collection with
daily updates. None. The loose coupling makes scaling out trivial: just spin up
an exact duplicate of an existing slave. No ADDREPLICA commands or trying to
understand how core names are mapped to node names and then to host names
(drives me nuts). Same thing for scaling back: take it out of the load balancer
and shoot it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 25, 2017, at 9:23 AM, Erick Erickson  wrote:
> 
> Maybe the other thing in play here is that use-cases that "just work"
> in the master/slave environment are less likely to employ consultants
> so we get something of a skewed sense of who uses what ;)
> 



Is there a way to specify word position in solr search query on text fields?

2017-04-25 Thread Sundeep T
Hello,

We have a text field in our schema that is indexed using the
StandardTokenizerFactory. We have set omitPositions=false, so that
positional information of individual tokens is also included in the index
data.

The question is whether there is a way to construct a query in which we can
specify the position information as well.

For example, suppose I have two text strings, "foo bar" and "bar foo".

Now, I want to find only the strings which start with "foo". Is there a way to
do that? Basically, I am asking whether something like position=0 for the word
"foo" can be specified as a parameter in the query.

Thanks
Sundeep
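
A minimal sketch of what "position=0 for the word foo" means, using a toy
positional index in Python. This is not Solr or Lucene code: tokenize() is a
lowercase whitespace split standing in for StandardTokenizerFactory, and all
function names and document ids are invented for illustration. At the Lucene
level, the SpanFirstQuery class expresses a similar "match near the start of
the field" constraint; how to reach it from a particular Solr query parser is
not covered here.

```python
from collections import defaultdict


def tokenize(text):
    """Stand-in tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def build_positional_index(docs):
    """Map term -> doc id -> list of token positions (0-based)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def docs_with_term_at(index, term, position):
    """Doc ids in which `term` occurs at exactly `position`."""
    postings = index.get(term.lower(), {})
    return sorted(doc_id for doc_id, positions in postings.items()
                  if position in positions)


if __name__ == "__main__":
    docs = {"doc1": "foo bar", "doc2": "bar foo"}
    index = build_positional_index(docs)
    print(docs_with_term_at(index, "foo", 0))
```

Both strings contain "foo", so a plain term query matches both; only the
position constraint separates the one that starts with it.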