Re: empty result set for a sort query

2016-12-13 Thread moscovig
Thanks for the help Yonik. 


Cheers
Gilad





questions about SOLR vs ES

2016-12-13 Thread Bernd Fehling
Hi list,

I don't want to write up yet another SOLR vs ES comparison. Every user
should make up their own mind by installing and testing both. This is more
a set of questions to the developers about which direction they think the
future of SOLR will take.

After installing the most recent version of ES I was shocked by the
direction the development has taken. Between ES 2.x and 5.x the most
useful site plugins (head, kopf and bigdesk) were kicked out (by dropping
support for site plugins altogether) and replaced by the licensed X-Pack.
Sure, X-Pack has a basic free license, but you have to register, and the
license only lasts one year. That also gives them the ability to withdraw
or change the licensing at any time.

I'm absolutely happy with the SOLR Admin UI, and it is free of any
licensing; thanks to all the developers.

Question: will it always stay truly free, or could it ever move to
licensed software?

The next shock came after setting up a development system in Eclipse.
Gradle!!!
After setting everything up and compiling, the Project window in Eclipse
was full of Gradle clutter.
I'm absolutely happy with ant, maven, ivy and everything as it is right
now.

Question: will it stay this way, or are there any intentions to change to
something else?


Thanks,
Bernd


Re: Clob transformer not working in DIH

2016-12-13 Thread Kamal Kishore Aggarwal
Any help would be appreciated.

On 12-Dec-2016 1:20 PM, "Kamal Kishore Aggarwal" 
wrote:

> Any help guys ...
>
> On 09-Dec-2016 1:05 PM, "Kamal Kishore Aggarwal" 
> wrote:
>
>> Hi,
>>
>> I am using Solr 5.4.1 with the DataImportHandler to index data from SQL
>> Server.
>>
>> I am using the ClobTransformer to convert CLOB values to strings.
>> Indexing works fine, but the CLOB transformation does not: the expected
>> string value never appears for the CLOB column, and there is no error or
>> exception in the log.
>>
>> Here is the configuration:
>>
>> [The configuration block was mangled by the mailing list archive. The
>> surviving fragments show a <dataSource> with
>> url="jdbc:sqlserver://localhost;databaseName=Dictionary;", user="sa",
>> password="", batchSize="5", and an <entity> declared with
>> transformer="ClobTransformer"; the <field> definitions were lost.]
>>
>> I tried RegexTransformer and it worked, but ClobTransformer still does
>> not. Please assist.
>>
>> Regards
>> Kamal
>>
>


Solr has a CPU% spike when indexing a batch of data

2016-12-13 Thread forest_soup
Hi, 

I posted this issue to a JIRA. Could anyone help comment? Thanks!

https://issues.apache.org/jira/browse/SOLR-9741

The details:

When we run a batch of index and search operations against SolrCloud
v5.3.2, we usually see a CPU spike lasting about 10 minutes.
We have 5 physical servers, with 2 Solr instances running on each server
on different ports (8983 and 8984); all the 8983 instances form one
SolrCloud cluster, and all the 8984 instances form another.

You can see the chart in the attached file screenshot-1.png.

The thread dumps are in the attached file threads.zip.

During the spike, the thread dumps show that most of the threads have the
call stack below:
"qtp634210724-4759" #4759 prio=5 os_prio=0 tid=0x7fb32803e000 nid=0x64e7
runnable [0x7fb3ef1ef000]
java.lang.Thread.State: RUNNABLE
at
java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444)
at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419)
at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298)
at java.lang.ThreadLocal.get(ThreadLocal.java:163)
at
org.apache.solr.search.SolrQueryTimeoutImpl.get(SolrQueryTimeoutImpl.java:49)
at
org.apache.solr.search.SolrQueryTimeoutImpl.shouldExit(SolrQueryTimeoutImpl.java:57)
at
org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.checkAndThrow(ExitableDirectoryReader.java:165)
at
org.apache.lucene.index.ExitableDirectoryReader$ExitableTermsEnum.(ExitableDirectoryReader.java:157)
at
org.apache.lucene.index.ExitableDirectoryReader$ExitableTerms.iterator(ExitableDirectoryReader.java:141)
at org.apache.lucene.index.TermContext.build(TermContext.java:93)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192)
at
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
at
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
at
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
at org.apache.lucene.search.BooleanWeight.(BooleanWeight.java:56)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:203)
at
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:855)
at
org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:838)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:486)
at org.apache.solr.search.Grouping.searchWithTimeLimiter(Grouping.java:456)
at org.apache.solr.search.Grouping.execute(Grouping.java:370)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:496)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)





Re: questions about SOLR vs ES

2016-12-13 Thread Erick Erickson
bq: Question: will it always stay truly free, or could it ever move to licensed software?

Solr/Lucene can't be charged for and still be an Apache Open Source
project. And there's no way to remove them from Apache and try to lock
them down. Various vendors could (and do) _add_ functionality on top
of Solr and charge for those additions or package Solr up as part of
their offering, but that doesn't affect the availability of Solr as
open source at all.

bq: Question: will it stay this way, or are there any intentions to
change to something else?

It Depends (tm). There's no "official" Apache build system. If a
compelling case can be made for moving to anything else (and there'll be
vigorous discussion in which you're free to participate), _and_ people are
willing to put in the work, then the build system might change. You'll
note that Maven is not officially supported, although some volunteers keep
a Maven build configuration available.

Best,
Erick



Solr MapReduce Indexer Tool is failing for empty core name.

2016-12-13 Thread Manan Sheth
Hi All,


While working on a migration project from Solr 4 to Solr 6, I need to
reindex my data using the Solr MapReduce Indexer Tool in offline mode with
Avro data.

When executing the MapReduce Indexer Tool shipped with Solr 6.2.1, it
throws an error: cannot create core with empty name value. The Solr
instances themselves are running fine, with new indexes being added and
modified correctly. Below is the command that was fired:


hadoop --config /etc/hadoop/conf jar /home/impadmin/solr-6.2.1/dist/solr-map-reduce-*.jar \
    -D 'mapred.child.java.opts=-Xmx500m' \
    -libjars `echo /home/impadmin/solr6lib/*.jar | sed 's/ /,/g'` \
    --morphline-file /home/impadmin/app_quotes_morphline_actual.conf \
    --zk-host 172.26.45.71:9984 \
    --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output5 \
    --collection app.quotes \
    --log4j src/test/resources/log4j.properties \
    --verbose \
    "hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/5d63e0f8-afc1-483e-bd3f-d508c885d794-00"


Below is the complete error trace:


Failed to initialize record writer for org.apache.solr.hadoop.MapReduceIndexerTool/MorphlineMapper, attempt_1479795440861_0343_r_00_0
    at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:128)
    at org.apache.solr.hadoop.SolrOutputFormat.getRecordWriter(SolrOutputFormat.java:163)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.solr.common.SolrException: Cannot create core with empty name value
    at org.apache.solr.core.CoreDescriptor.checkPropertyIsNotEmpty(CoreDescriptor.java:280)
    at org.apache.solr.core.CoreDescriptor.<init>(CoreDescriptor.java:191)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:754)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
    at org.apache.solr.hadoop.SolrRecordWriter.createEmbeddedSolrServer(SolrRecordWriter.java:163)
    at org.apache.solr.hadoop.SolrRecordWriter.<init>(SolrRecordWriter.java:121)
    ... 9 more

Additional points to note:

  *   The solrconfig and schema files are copied as-is from Solr 4.
  *   Once the collection is deployed, users can perform all operations on
the collection without any issue.
  *   The indexing process works fine with the same tool on Solr 4.

Please help.


Thanks,

Manan Sheth










'Minimum Should Match' on subquery level

2016-12-13 Thread Rahul Lodha
Hi Myron,

Can you give me an example of this?

http://grokbase.com/t/lucene/solr-user/105jjpxa2x/minimum-should-match-on-subquery-level
 


Regards,
Rahul
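
For reference, the workaround usually suggested for per-subquery mm is
nested queries, where each {!edismax} clause carries its own mm as a local
param. A minimal SolrJ sketch (field names and query terms are
hypothetical):

import org.apache.solr.client.solrj.SolrQuery;

public class PerClauseMm {
  public static SolrQuery build() {
    // Two nested edismax subqueries, each with its own minimum-should-match.
    SolrQuery q = new SolrQuery();
    q.setQuery("_query_:\"{!edismax qf='title' mm=2 v='red running shoes'}\""
        + " OR _query_:\"{!edismax qf='description' mm='75%' v='lightweight trail'}\"");
    return q;
  }
}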

Rollback w/ Atomic Update

2016-12-13 Thread Todd Long
We've noticed that partial updates are not rolled back when subsequent
commits involve the same document id. Our only success in mitigating this
issue has been to issue an empty commit immediately after the rollback.
I've included an example below showing the unexpected results of the
partial updates. We are currently using SolrJ 4.8.1 with the default
deletion policy and auto commits disabled in the configuration. Any help
in better understanding this scenario would be greatly appreciated.

/update?commit=true (initial add)

[
  {
"id": "12345",
"createdBy_t": "John Someone"
  }
]

/update

[
  {
"id": "12345",
"favColors_txt": { "set": ["blue", "green"] }
  }
]

/update?rollback=true
-
[]

/update?commit=true

[
  {
"id": "12345",
"cityBorn_t": { "add": "Charleston" }
  }
]

/select?q=id:12345
--
[
  {
"id": "12345",
"createdBy_t": "John Someone",
"favColors_txt": ["blue", "green"],
"cityBorn_t": "Charleston"
  }
]
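
For reference, the same sequence as a minimal SolrJ 4.x sketch (core URL
hypothetical; rollback() issues the /update?rollback=true call shown
above):

import java.util.Arrays;
import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RollbackRepro {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycore");

    SolrInputDocument initial = new SolrInputDocument();
    initial.addField("id", "12345");
    initial.addField("createdBy_t", "John Someone");
    server.add(initial);
    server.commit();                     // initial add, committed

    SolrInputDocument partial = new SolrInputDocument();
    partial.addField("id", "12345");
    partial.addField("favColors_txt",    // atomic "set" update
        Collections.singletonMap("set", Arrays.asList("blue", "green")));
    server.add(partial);                 // left uncommitted

    server.rollback();                   // expected to discard the partial update

    SolrInputDocument another = new SolrInputDocument();
    another.addField("id", "12345");
    another.addField("cityBorn_t",       // atomic "add" update
        Collections.singletonMap("add", "Charleston"));
    server.add(another);
    server.commit();       // favColors_txt unexpectedly survives the rollback

    server.shutdown();
  }
}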





Renaming a collection

2016-12-13 Thread Zheng Lin Edwin Yeo
Hi,

Is it possible to rename a collection without using a collection API? I
could not find any rename action in the Collections API.

I read from here
http://stackoverflow.com/questions/34889307/how-can-i-rename-a-core-created-in-solr

that we can use action=RENAME

http://localhost:8983/solr/admin/cores?action=RENAME&core=oldname&other=newname

However, this action=RENAME is not in the latest Solr guide, and it didn't
work either when I tried it.

I'm using Solr 6.2.1.

Regards,
Edwin


Re: Renaming a collection

2016-12-13 Thread Erick Erickson
Why not just use a collection alias?

Best,
Erick



Re: Rollback w/ Atomic Update

2016-12-13 Thread Yonik Seeley
On Tue, Dec 13, 2016 at 10:36 AM, Todd Long  wrote:
> We've noticed that partial updates are not rolling back with subsequent
> commits based on the same document id. Our only success in mitigating this
> issue has been to issue an empty commit immediately following the rollback.

"rollback" is a lucene-level operation that isn't really supported at
the solr level:
https://issues.apache.org/jira/browse/SOLR-4733

-Yonik



Deep dive on the topic() streaming expression

2016-12-13 Thread Joel Bernstein
I plan on using this thread to address questions that were posted to
SOLR-4587. Below are the questions asked:


1) You mentioned that "The issue here is that it's possible that an out of
order version number could persist across commits."

Is the above possible even if I am using optimistic concurrency (
http://yonik.com/solr/optimistic-concurrency/) to write documents on Solr?

2) Query subscription is going to be a critical part of my project, and
our subscribers can't afford to lose alerts. What can I do to make sure
that there is no loss of alerts? As long as I get an error message
whenever there is a failure, I will make sure that my system
retries/replays indexing that specific document.

3) Do you happen to have any stats about the possibility of data loss in
Solr? How often does it happen? Are there any best practices that we can
follow to avoid it?

4) In general, are streaming expressions robust enough to be used in
production?

5) Is there any more deep-dive documentation about topic()? I would love
to know its stats for query volumes as big as ours (9-10 million). Or, I
would love to know how it works internally.





Joel Bernstein
http://joelsolr.blogspot.com/


Re: Renaming a collection

2016-12-13 Thread Shawn Heisey
On 12/13/2016 8:38 AM, Zheng Lin Edwin Yeo wrote:
> Is it possible to rename a collection without using a collection API?
> I could not find any rename action in the Collections API. 

No.  There is no rename functionality.  The workaround is to use the
alias feature, which Erick mentioned.  Renaming a collection could be
done with a significant amount of manual editing, both on the relevant
nodes and in zookeeper.  If you want to try this, I strongly recommend
that you shut down the relevant Solr instances.  As you can imagine, it
would be highly disruptive.

> I read from here
> http://stackoverflow.com/questions/34889307/how-can-i-rename-a-core-created-in-solr
> that we can use action=RENAME 

That's on a *core*, not a *collection*.  Using the core rename
functionality while running SolrCloud will cause issues.  It might be as
simple as losing one shard replica, or it could completely break the
collection.

The CoreAdmin API still has the rename command in 6.3, but the
Collections API has never had it.  It is theoretically possible to
implement, but it would never execute in an atomic way, which is the
main reason it has never been done.

By atomic, I mean in a manner that results in the old collection
handling requests up to the change, then the new collection handling the
very next request.  The rename would probably take a few seconds, during
which both the old and the new collections might be nonfunctional.  It
would be nearly as disruptive as doing it manually.  This is problematic
for anything trying to query or update the collection.

Collection aliases provide nearly equivalent functionality in SolrCloud
to what's available in standalone mode with core renames.  If you always
use an alias, you have the option of changing an existing alias to point
at a new real collection, or you can make a new alias and then delete
the old one.
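
For example, a minimal SolrJ sketch of repointing an alias (alias and
collection names are hypothetical; the same operation is available over
HTTP as /admin/collections?action=CREATEALIAS&name=...&collections=...):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class RepointAlias {
  public static void main(String[] args) throws Exception {
    // Builder classes assume a recent 6.x SolrJ; older versions use constructors.
    try (CloudSolrClient client =
             new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
      // Point the "products" alias at the current collection...
      CollectionAdminRequest.createAlias("products", "products_v1").process(client);
      // ...and later repoint it at a replacement; clients keep using "products".
      CollectionAdminRequest.createAlias("products", "products_v2").process(client);
    }
  }
}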

Thanks,
Shawn



Re: Deep dive on the topic() streaming expression

2016-12-13 Thread Joel Bernstein
First a brief description of how the topic expression works.

The topic expression allows you to subscribe to a query. Below is how it
works internally.

The topic expression maps to the TopicStream in the Java Streaming API, so
I encourage people who are interested to review that code.

Under the covers the TopicStream persists checkpoints to a SolrCloud
collection that describe where the topic left off. The TopicStream uses
these checkpoints as a filter on the query to return only documents with
versions higher than the last checkpoints it sent. After each call to the
topic the checkpoints are updated.

What is a checkpoint?

A checkpoint is the highest version number read from each shard in the
collection. The topic stream sorts by _version_ asc. As it cycles through
the documents from the shards it tracks the highest version number for each
shard and persists it.

Why use version numbers?

Version numbers are monotonic longs. Each new document receives a version
number that is higher than the last document's on that shard. So by
sorting on _version_ asc you can cycle through all the documents in a
shard in batches.

Can a topic miss documents?

Currently the answer is theoretically yes. But in practice I believe it
would be very rare. To miss documents the following must occur:

1) Documents must be indexed with out of order version numbers. On the
leader I believe this is no longer possible. So only the replicas have this
issue currently.

2) The out of order version numbers must cross commit boundaries. This
means that a commit must occur while an out of order document is outside
the index.

3) The topic must pull the out of order committed document before the next
commit occurs. Once the out of order document is committed the sort by
version number will fix up the out of order documents.


Since #1 can be eliminated by only querying the leaders, that is one
possible option for dealing with the issue. But this will cut down on
scalability.

But, in my testing getting #1, #2 and #3 to actually occur is very hard.
This is particularly true if commit windows are short because that leaves a
very short window for #2 and #3 to line up. For example a one second
softCommit would allow only a one second window for #2 and #3 to occur at
the same time and this would have to coincide with #1.

I've spent days attempting to make the TopicStream lose data with different
types of stress tests and I've never been able to make it happen.
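
To make the mechanics above concrete, here is a minimal SolrJ sketch of
polling a topic (collection names, topic id, and query are hypothetical;
the SolrStream constructor signature varies slightly across 6.x releases):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;

public class TopicPoll {
  public static void main(String[] args) throws Exception {
    Map<String, String> props = new HashMap<>();
    props.put("qt", "/stream");
    // Checkpoints persist to the "checkpoints" collection; each call returns
    // only documents whose _version_ is above the last recorded checkpoint.
    props.put("expr",
        "topic(checkpoints, products, id=\"blueTopic\", q=\"color:blue\", fl=\"id\")");

    SolrStream stream = new SolrStream("http://localhost:8983/solr/products", props);
    try {
      stream.open();
      Tuple tuple = stream.read();
      while (!tuple.EOF) {
        System.out.println(tuple.getString("id")); // a newly matched document
        tuple = stream.read();
      }
    } finally {
      stream.close();
    }
  }
}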

Joel Bernstein
http://joelsolr.blogspot.com/



Re: Solr has a CPU% spike when indexing a batch of data

2016-12-13 Thread Shawn Heisey
On 12/13/2016 6:09 AM, forest_soup wrote:
> I posted this issue to a JIRA. Could anyone help comment? Thanks!
>
> https://issues.apache.org/jira/browse/SOLR-9741

Please use the mailing list *before* opening an issue in Jira.  If at
all possible, we want to be sure that problems are caused by a real bug
in the software before an issue is created.

> When we run a batch of index and search operations against SolrCloud
> v5.3.2, we usually see a CPU spike lasting about 10 minutes.
> We have 5 physical servers, with 2 Solr instances running on each server
> on different ports (8983 and 8984); all the 8983 instances form one
> SolrCloud cluster, and all the 8984 instances form another.
>
> You can see the chart in the attached file screenshot-1.png.
>
> The thread dumps are in the attached file threads.zip.
>
> During the spike, the thread dumps show that most of the threads have
> the call stack below: [stack trace snipped]

That stacktrace indicates the thread is doing a query.  If most of the
threads have that stacktrace, it means Solr is handling a lot of
simultaneous queries.  That can cause a CPU spike.  I checked one of the
thread dumps

Indexing tends to use a lot of resources.  If you are doing all your
indexing to the same HTTP endpoint (in a way that doesn't send the
request to the correct shard leader), that will also make Solr work harder.

You appear to be running Solr with SSL.  This is going to increase CPU
requirements.  I wouldn't expect the increase to be very high, but if
CPU is already a problem, that will make it worse.

Your iowait CPU percentage appears to be nearly nonexistent, so I might
be barking up the wrong tree with some of the following questions, but
I'll go ahead and ask them anyway:

* What is the total physical memory in the machine?
* What is the max heap on each of the two Solr processes?
* What is the total index size in each Solr process?
* What is the total tlog size in each Solr process?
* What are your commit characteristics like -- both manual and automatic.
* Do you have WARN or ERROR messages in your logfile?

Thanks,
Shawn



Re: Solr has a CPU% spike when indexing a batch of data

2016-12-13 Thread Shawn Heisey
On 12/13/2016 10:25 AM, Shawn Heisey wrote:
> That stacktrace indicates the thread is doing a query. If most of the
> threads have that stacktrace, it means Solr is handling a lot of
> simultaneous queries. That can cause a CPU spike. I checked one of the
> thread dumps

I didn't complete that sentence.

I checked one of the thread dumps and saw a number of ongoing queries,
plus some delegated requests to other hosts, and some ongoing indexing
requests.  I am unsure what conclusions can be made from what I saw.

I thought of some more questions:

* How many collections are in each cloud?
* How many servers are in each cloud?

Thanks,
Shawn



Re: Rollback w/ Atomic Update

2016-12-13 Thread Todd Long
Yonik Seeley wrote
> "rollback" is a lucene-level operation that isn't really supported at
> the solr level:
> https://issues.apache.org/jira/browse/SOLR-4733

I find it odd that this unsupported operation has been around since Solr
1.4. In this case, it seems like there is some underlying issue specific to
partial updates.





Re: CREATEALIAS to non-existing collections

2016-12-13 Thread Chris Hostetter

: > We currently support requests to CREATEALIAS to collections that don't
: > exist. Requests to this alias later result in 404s. If the target
: > collection is later created, requests to the alias will begin to work.
: > I'm wondering if someone is relying on this behavior, or if we should
: > validate the existence of the target collections when creating the
: > alias (and thus, fail fast in cases of typos or unexpected cluster state)

+1 ... failing fast on ALIAS creation by default seems like the best
default behavior; we could always support a new
"allowNonExistentCollections=true" param (defaulting to 'false') for
backcompat or advanced users who want to "pre-create" aliases before the
collections exist.


: I’d prefer it if the alias was required to be removed, or pointed 
: elsewhere, before the collection could be deleted.

also +1 ...

Having action=DELETE fail on a collection if it's part of any aliases
seems like much better (default) behavior than having it proactively
delete all aliases the collection is part of.

And similar to the above, a (new) advanced "removeAnyAliases=true" param
(defaulting to false) could support the more aggressive deletion if
folks think it's an important value-add for users.


In general I agree with the overall premise that safety valves during
collection delete (based on pre-existing aliases) should go part and
parcel with any safety valves we add on alias creation.




-Hoss
http://www.lucidworks.com/

Solr 6 Default Core URL

2016-12-13 Thread Max Bridgewater
I have one Solr core on my solr 6 instance and I can query it with:

http://localhost:8983/solr/mycore/search?q=*:*

Is there a way to configure solr 6 so that I can simply query it with this
simple URL?

http://localhost:8983/search?q=*:*


Thanks.
Max,


Re: Traverse over response docs in SearchComponent impl.

2016-12-13 Thread Chris Hostetter

FWIW: Perhaps an XY problem?  Can you explain in more depth what it is you
plan on doing in this search component?

: I can see that Solr calls the component's process() method, but from 
: within that method, rb.getResponseDocs(); is always null. No matter what 
: i try, i do not seem to be able to get a hold of that list of response 
: docs.

IIRC getResponseDocs() is only non-null when aggregating distributed/cloud
results from multiple shards (where we already have a fully populated
SolrDocumentList due to aggregating the remote responses), but in a
single-node Solr request only a "DocList" is used, and the stored field
values are read lazily from the IndexReader by the ResponseWriter.

So if you're not writing a distributed component, check
ResponseBuilder.getResults()?

Even if you are writing a component for a distributed Solr setup, which
method you implement (and where you do the work) depends a lot on
when/where you expect your code to run...

IIRC:
* prepare() runs on every node for every request (the original aggregation
request and every sub-request to each shard).
* distributedProcess() runs on the aggregation node, and is called
repeatedly for each "stage" requested by any component (so at a minimum
once, usually twice to fetch stored fields, maybe more if there are
multiple facet refinement phases, etc...).
* modifyRequest() & handleResponses() are called on the aggregation node
before/after every sub-request to every shard.
* process() is called on each shard for each sub-request.
* finishStage() is called on the aggregation node at the end of each stage
(after all the responses from all shards for that sub-request).


...so something like HighlightComponent does its main work in the
process() method, because it only needs the data for each doc -- the
other (aggregated) docs don't affect the results -- and then later
finishStage() combines the results.

If, on the other hand, you want to look at all of the *final* documents
being returned to the user -- not on a per-shard basis but on an aggregate
basis -- you'd want to put that logic in something like finishStage() and
check for the stage that does a GET_FIELDS. But if you want your component
to *also* work in non-cloud mode, you'd need the same logic in your
process() method (looking at the DocList instead of the SolrDocumentList,
with a conditional check for distrib=false so you don't waste a bunch of
work on per-shard queries when it is in fact being used in cloud mode).
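
Putting that together, a minimal sketch against the Solr 6.x component
APIs (class name hypothetical; the distrib=false check and error handling
are omitted) covering both paths:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

public class DocInspectorComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // runs on every node for every request; nothing to do here
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // single-node (or per-shard) path: only a DocList is available here
    DocList docList = rb.getResults().docList;
    SolrIndexSearcher searcher = rb.req.getSearcher();
    DocIterator it = docList.iterator();
    while (it.hasNext()) {
      Document doc = searcher.doc(it.nextDoc()); // stored fields load on demand
      // ... inspect doc ...
    }
  }

  @Override
  public void finishStage(ResponseBuilder rb) {
    // cloud path: the aggregated SolrDocumentList is populated after GET_FIELDS
    if (rb.stage == ResponseBuilder.STAGE_GET_FIELDS && rb.getResponseDocs() != null) {
      for (SolrDocument doc : rb.getResponseDocs()) {
        // ... inspect the final aggregated docs ...
      }
    }
  }

  @Override
  public String getDescription() {
    return "inspects response documents (sketch)";
  }
}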


None of this is very straightforward, but you are admittedly getting into
very advanced expert territory here.



-Hoss
http://www.lucidworks.com/


Re: Solr 6 Default Core URL

2016-12-13 Thread Chris Hostetter

No, Solr stopped supporting the concept of a default core back in Solr 5.

The only tangible benefit to having a default was being able to change the
default to point at a different core without impacting existing users.

You can easily do the same thing by creating a core/collection alias that
you use in your clients, and which you can later change.




-Hoss
http://www.lucidworks.com/


Re: "on deck" searcher vs warming searcher

2016-12-13 Thread Chris Hostetter


(disclaimer: I'm writing this all from memory; maybe there was some code
change at some point that I'm not aware of and I'm completely wrong)


: I've always understood the "on deck" searcher(s) to be the same as the
: warming searcher(s). So you have the "active" searcher and then the
: warming or on-deck searchers.

IIRC there is a subtle but significant distinction between "ondeck"
searchers and warming searchers -- namely: there is only ever one (ondeck)
searcher being warmed at a time, but there can be multiple ondeck
searchers *WAITING* to be warmed.

maxWarmingSearchers is, in hindsight, not the greatest name for that
config option -- because there's never more than one searcher being warmed
at a time.  It essentially means "max number of searchers I'm willing to
let queue up to be warmed".


Or to elaborate and put it all in context:

* There is only ever (at most) one "current" searcher:
** when a request comes in, it grabs a reference to the current searcher
** when a "newSearcher" event finishes, the "current" searcher can be
replaced, but the (previous) searcher stays open as long as there are
requests in progress that hold references to it
** for this reason: there can be multiple "active" searchers in use

* There can be any number of searchers ondeck:
** an ondeck searcher is created when a "newSearcher" event starts
** ondeck searchers can be "warmed up" based on the caches of the
"current" searcher, during which time they are considered
a "warming searcher"
** "warming" happens in a single-threaded executor -- so if there are
multiple ondeck searchers, only one of them at a time is ever a "warming"
searcher
** multiple ondeck searchers can be a sign of a potential performance
problem (hence the log warning) because it means Solr knows you are
(currently) actively warming a searcher pointed at some commit point which
Solr already knows is stale, and plans to open another searcher
immediately after it -- i.e.: commits are happening more frequently than
searchers can be warmed.

If you imagine a scenario where:
* search requests that take over 10 seconds happen continuously all day
* caches are sized such that warming a newSearcher takes 20 seconds
* you set maxWarmingSearchers=15
* openSearcher=true commits happen once per second

Then, if I'm not mistaken, you would be in a situation where there are
10 "active" searchers (the newest of which is "current") and 15 "ondeck"
searchers (the oldest of which would currently be the "warming" searcher).


-Hoss
http://www.lucidworks.com/


Re: Solr 6 Default Core URL

2016-12-13 Thread billnbell
Yes, with Nginx in front of it.

Bill Bell
Sent from mobile




Re: Renaming a collection

2016-12-13 Thread Zheng Lin Edwin Yeo
Hi Erick and Shawn,

Thanks for the response and information.

Regards,
Edwin



Solr - Amazon like search

2016-12-13 Thread vasanth vijayaraj
Hello,

We are building an e-commerce mobile app, and I have implemented Solr
search and autocomplete. We like the Amazon search experience and are
trying to implement something similar. I attached a screenshot of what has
been implemented so far.

The search/suggest should sort the list of products based on popularity,
document hits and more. How do we achieve this? Please help us out here.
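
One common approach is a function-query boost on a popularity field that
is updated as part of indexing. A minimal SolrJ sketch (field, core, and
parameter values are hypothetical; the Builder assumes a recent 6.x
SolrJ):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PopularityBoost {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/products").build()) {
      SolrQuery q = new SolrQuery("running shoes");
      q.set("defType", "edismax");
      q.set("qf", "name^3 description");
      q.set("boost", "log(sum(popularity,1))"); // multiplicative popularity boost
      QueryResponse rsp = client.query(q);
      System.out.println(rsp.getResults().getNumFound() + " products found");
    }
  }
}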

Thanks

Re: Deep dive on the topic() streaming expression

2016-12-13 Thread Hemant Purswani
Thanks for the explanation, Joel.

I tried the topic function today and it's working great. But I think it
lacks something that I will need for my use case. Currently, it returns
the response in the following format; as you can see, it does not return
the topicId.

{
  "result-set": {
    "docs": [
      { "id": "123456", "_version_": 1553668767834177500 },
      { "EOF": true, "RESPONSE_TIME": 81, "sleepMillis": 0 }
    ]
  }
}

The way I was thinking of using it was to publish a message to Kafka with
the topicId and documentId every time a document matches a query. Then, on
my Kafka consumer side, I would get the subscriber emails from the DB
using the topicId and email them subsequently.

Is it possible to implement such functionality using the topic function?

Thanks,

Hemant