Re: Cores and and ranking (search quality)
SOLR-1632 will certainly help. But trying to predict whether your core A or core B will appear first doesn't really seem like a good use of time. If you actually have a setup like you describe, add &debug=all to your query on both cores and you'll see all the gory detail of how the scores are calculated, providing a definitive answer in _your_ situation. Best, Erick On Mon, Mar 9, 2015 at 5:44 AM, wrote: > (reposing this to see if anyone can help) > > > Help me understand this better (regarding ranking). > > If I have two docs that are 100% identical with the exception of uid (which > is stored but not indexed). In a single core setup, if I search "xyz" such > that those 2 docs end up ranking as #1 and #2. When I switch over to two > core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to > core-B (which has 100,000 records). > > Now, are you saying in 2 core setup if I search on "xyz" (just like in singe > core setup) this time I will not see doc-A and doc-B as #1 and #2 in ranking? > That is, are you saying doc-A may now be somewhere at the top / bottom far > away from doc-B? If so, which will be #1: the doc off core-A (that has 10 > records) or doc-B off core-B (that has 100,000 records)? > > If I got all this right, are you saying SOLR-1632 will fix this issue such > that the end result will now be as if I had 1 core? > > - MJ > > > -Original Message- > From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] > Sent: Thursday, March 5, 2015 9:06 AM > To: solr-user@lucene.apache.org > Subject: Re: Cores and and ranking (search quality) > > On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote: >> My question is this: if I put my data in multiple cores and use >> distributed search will the ranking be different if I had all my data >> in a single core? > > Yes, it will be different. The practical impact depends on how homogeneous > your data are across the shards and how large your shards are. If you have > small and dissimilar shards, your ranking will suffer a lot. > > Work is being done to remedy this: > https://issues.apache.org/jira/browse/SOLR-1632 > >> Also, will facet and more-like-this quality / result be the same? > > It is not formally guaranteed, but for most practical purposes, faceting on > multi-shards will give you the same results as single-shards. > > I don't know about more-like-this. My guess is that it will be affected in > the same way that standard searches are. > >> Also, reading the distributed search wiki >> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr >> does the search and result merging (all I have to do is issue a >> search), is this correct? > > Yes. From a user-perspective, searches are no different. > > - Toke Eskildsen, State and University Library, Denmark >
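For reference, a minimal sketch of the debug query Erick describes, assuming a local Solr, a core named core-A and the search term "xyz" (host, port and core name are illustrative only):

http://localhost:8983/solr/core-A/select?q=xyz&fl=id,score&debug=all&wt=json

Running the same URL against core-B and comparing the "explain" section of the two responses shows, per document, how term frequency, inverse document frequency and field norms combine into the final score, which is exactly the part that can differ between cores of very different sizes.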
Re: Solr TCP layer
IMO each mega of memory saved has more impact that 0.001 less in latency … an OOM is killer, a lag of 2 second … is not catastrophic. — /Yago Riveiro On Tue, Mar 10, 2015 at 4:03 PM, Erick Erickson wrote: > Just to pile on: > I admire your bravery! I'll add to the other comments only by saying > that _before_ you start down this path, you really need to articulate > the benefit/cost analysis. "to gain a little more communications > efficiency" will be a pretty hard sell due to the reasons Shawn > outlined. This is hugely risky and would require a lot of work for > as-yet-unarticulated benefits. > There are lots and lots of other things to work on of significantly > greater impact IMO. How would you like to work on something to help > manage Solr's memory usage for instance ;)? > Best, > Erick > On Mon, Mar 9, 2015 at 9:24 AM, Reitzel, Charles > wrote: >> A couple thoughts: >> 0. Interesting topic. >> 1. But perhaps better suited to the dev list. >> 2. Given the existing architecture, shouldn't we be looking to transport >> projects, e.g. Jetty, Apache HttpComponents, for support of new socket or >> even HTTP layer protocols? >> 3. To the extent such support exists, then integration work is still needed >> at the solr level. Shalin, is this your intention? >> >> Also, for those of us not tracking protocol standards in detail, can you >> describe the benefits to Solr users of http/2? >> >> Do you expect HTTP/2 to be transparent at the application layer? >> >> -Original Message- >> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] >> Sent: Monday, March 09, 2015 6:23 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr TCP layer >> >> Hi Saumitra, >> >> I've been thinking of adding http/2 support for inter node communication >> initially and client server communication next in Solr. There's a patch for >> SPDY support but now that spdy is deprecated and http/2 is the new standard >> we need to wait for Jetty 9.3 to release. That will take care of many >> bottlenecks in solrcloud communication. The current trunk is already using >> jetty 9.2.x which has support for the draft http/2 spec. >> >> A brand new async TCP layer based on netty can be considered but that's a >> huge amount of work considering our need to still support simple http, SSL >> etc. Frankly for me that effort is better spent optimizing the routing layer. >> On 09-Mar-2015 1:37 am, "Saumitra Srivastav" >> wrote: >> >>> Dear Solr Contributors, >>> >>> I want to start working on adding a TCP layer for client to node and >>> inter-node communication. >>> >>> I am not up to date on recent changes happening to Solr. So before I >>> start looking into code, I would like to know if there is already some >>> work done in this direction, which I can reuse. Are there any know >>> challenges/complexities? >>> >>> I would appreciate any help to kick start this effort. Also, what >>> would be the best way to discuss and get feedback on design from >>> contributors? Open a JIRA?? >>> >>> Regards, >>> Saumitra >>> >>> >>> >>> >>> >>> -- >>> View this message in context: >>> http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> * >> This e-mail may contain confidential or privileged information. >> If you are not the intended recipient, please notify the sender immediately >> and then delete it. >> >> TIAA-CREF >> *
Re: Combine multiple SOLR Query Results
You simply cannot compare scores from two separate queries, comparing them is meaningless. This appears to be an XY problem, you're asking _how_ to do something without telling us _what_ the end goal here is. >From your description, I really have no idea what you're trying to do. You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Mar 9, 2015 at 7:56 AM, Reitzel, Charles wrote: > Hi AnilJayanti, > > You shouldn't need 2 separate solr queries. Just make sure both 'track > name' and 'artist name' fields are queried. Solr will rank and sort the > results for you. > > e.q. q=foo&qf=trackName,artistName > > This is preferable for a number of reasons. I will be faster and simpler. > But, also, highlight results should be better. > > hth, > Charlie > > -Original Message- > From: aniljayanti [mailto:aniljaya...@yahoo.co.in] > Sent: Monday, March 09, 2015 6:20 AM > To: solr-user@lucene.apache.org > Subject: Combine multiple SOLR Query Results > > Hi, > > I am trying to work on combine multiple SOLR query results into single > result. Below is my case. > > 1. Look up search term against ‘track name’, log results > 2. Look up search term against ‘artist name’, log results of tracks by > those > artists > 3. Combine results > 4. results by score descending order. > > Using "text_general" fieldType for both track name and artist name. > copy fields are trackname and artistname > > Plase suggest me how to write solr Query to combine two solr results into > single result. > > Thanks in advance. > > AnilJayanti > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Combine-multiple-SOLR-Query-Results-tp4191816.html > Sent from the Solr - User mailing list archive at Nabble.com. > > * > This e-mail may contain confidential or privileged information. > If you are not the intended recipient, please notify the sender immediately > and then delete it. > > TIAA-CREF > *
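For reference, a sketch of the single-query approach described above, assuming the edismax query parser (the qf parameter is interpreted by the dismax/edismax parsers rather than the default lucene parser, and field names in qf are separated by spaces); the field names are taken from the thread and the rest is illustrative:

/select?q=foo&defType=edismax&qf=trackName+artistName&fl=id,trackName,artistName,score

Both fields are searched in a single pass and Solr returns one ranked list, so there is no need to merge two result sets by score.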
Re: Solrcloud Index corruption
Ahhh, ok. When you reloaded the cores, did you do it core-by-core? Yes, but maybe we reloaded the wrong core or something like that. We also noticed that the startTime doesn't update in the admin-ui while switching between cores (you have to reload the page). We still use 4.8.1, so maybe it is fixed in a later version. We will see after our next upgrade, if not we will add an issue for it. Martin Erick Erickson schreef op 10.03.2015 18:21: Ahhh, ok. When you reloaded the cores, did you do it core-by-core? I can see how something could get dropped in that case. However, if you used the Collections API and two cores mysteriously failed to reload that would be a bug. Assuming the replicas in question were up and running at the time you reloaded. Thanks for letting us know what's going on. Erick On Tue, Mar 10, 2015 at 4:34 AM, Martin de Vries wrote: Hi, this _sounds_ like you somehow don't have indexed="true" set for the field in question. We investigated a lot more. The CheckIndex tool didn't find any error. We now think the following happened: - We changed the schema two months ago: we changed a field to indexed="true". We reloaded the cores, but two of them doesn't seem to be reloaded (maybe we forgot). - We reindexed all content. The new field worked fine. - We think the leader changed to a server that didn't reload the core - After that we field stopped working for new indexed documents Thanks for your help. Martin Erick Erickson schreef op 06.03.2015 17:02: bq: You say in our case some docs didn't made it to the node, but that's not really true: the docs can be found on the corrupted nodes when I search on ID. The docs are also complete. The problem is that the docs do not appear when I filter on certain fields this _sounds_ like you somehow don't have indexed="true" set for the field in question. But it also sounds like you're saying that search on that field works on some nodes but not on others, I'm assuming you're adding "&distrib=false" to verify this. It shouldn't be possible to have different schema.xml files on the different nodes, but you might try checking through the admin UI. Network burps shouldn't be related here. If the content is stored, then the info made it to Solr intact, so this issue shouldn't be related to that. Sounds like it may just be the bugs Mark is referencing, sorry I don't have the JIRA numbers right off. Best, Erick On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey wrote: On 3/5/2015 3:13 PM, Martin de Vries wrote: I understand there is not a "master" in SolrCloud. In our case we use haproxy as a load balancer for every request. So when indexing every document will be sent to a different solr server, immediately after each other. Maybe SolrCloud is not able to handle that correctly? SolrCloud can handle that correctly, but currently sending index updates to a core that is not the leader of the shard will incur a significant performance hit, compared to always sending updates to the correct core. A small performance penalty would be understandable, because the request must be redirected, but what actually happens is a much larger penalty than anyone expected. We have an issue in Jira to investigate that performance issue and make it work as efficiently as possible. Indexing batches of documents is recommended, not sending one document per update request. General performance problems with Solr itself can lead to extremely odd and unpredictable behavior from SolrCloud. 
Most often these kinds of performance problems are related in some way to memory, either the java heap or available memory in the system. http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
Re: Chaining components in request handler
Would like to do it during querying. Thanks, Ashish On Tue, Mar 10, 2015 at 11:07 PM, Alexandre Rafalovitch wrote: > Is that during indexing or during query phase? > > Indexing has UpdateRequestProcessors (e.g. > http://www.solr-start.com/info/update-request-processors/ ) > Query has Components (e.g. Faceting, MoreLIkeThis, etc) > > Or something different? > > Regards, >Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 10 March 2015 at 13:34, Ashish Mukherjee > wrote: > > Hello, > > > > I would like to create a request handler which chains components in a > > particular sequence to return the result, similar to a Unix pipe. > > > > eg. Component 1 -> result1 -> Component 2 -> result2 > > > > result2 is final result returned. > > > > Component 1 may be a standard component, Component 2 may be out of the > box. > > > > Is there any tutorial which describes how to wire together components > like > > this in a single handler? > > > > Regards, > > Ashish >
Re: default heap size for solr 5.0? (-Xmx param)
Actually the reason I did not use the solr script was that I didn't really get how to make a window service out of it from nssm.exe. I tried doing a .bat that called solr with start -p 8983 but seems it just loops my command rather then run it. Thanks for the help / Karl On 11 March 2015 at 23:08, Erick Erickson wrote: > Well, the new way will be the only way eventually, so either you learn > the old way then switch or learn it now ;)... > > But if you insist you could start with a heap size of 4G like this: > > java -Xmx4G -Xms4G -jar start.jar > > Best, > Erick > > > On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén > wrote: > > Thanks! > > > > I am using the old way and I see no reason to switch really? > > > > cheers > > > > On 11 March 2015 at 20:18, Shawn Heisey wrote: > > > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: > >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max > heap > >> > size? I can't find it anywhere. > >> > > >> > Also, where whould you activate jmx? Would like to be able to use > >> visualvm > >> > in the future I imagine. > >> > > >> > I have a custom nssm thing going that installs it as a window service > >> that > >> > simply calls java -jar start.jar > >> > >> The default heap size is 512m. This is hardcoded in the bin/solr > >> script. You can override that with the -m parameter. > >> > >> If you are not using the bin/solr script and are instead doing the old > >> "java -jar start.jar" startup, the default heap size is determined by > >> the version of Java you are running. > >> > >> Thanks, > >> Shawn > >> > >> >
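For what it's worth, a minimal sketch of starting Solr 5 on Windows through the bundled script with an explicit heap, assuming port 8983 and an illustrative 4g heap:

bin\solr.cmd start -f -p 8983 -m 4g

The -f flag keeps Solr running in the foreground, which service wrappers such as nssm generally expect; without it the script launches Solr in the background and returns.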
Update solr schema.xml in real time for Solr 4.10.1
Hi, I understand that in Solr 5.0, they provide a REST API to do real-time update of the schema using Curl. However, I could not do that for my earlier version of Solr 4.10.1. Would like to check: is this function available for the earlier version of Solr, and is the curl syntax the same as Solr 5.0? Regards, Edwin
Re: Invalid Date String:'1992-07-10T17'
On 3/10/2015 1:39 PM, Ryan, Michael F. (LNG-DAY) wrote: > You'll need to wrap the date in quotes, since it contains a colon: > > String a = "speechDate:\"1992-07-10T17:33:18Z\""; You could also escape the colons with a backslash. Here's another way to do it that doesn't require quotes or manual escaping: String d = "1992-07-10T17:33:18Z"; String a = "speechDate:" + ClientUtils.escapeQueryChars(d); If you wanted to go to the trouble of using StringBuilder instead of string concatenation for performance reasons, you could certainly do that. This is the class you need to import in order to use escapeQueryChars: http://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html Thanks, Shawn
Does Solr runs on MapR file system?
I tried to run Solr over HDFS following https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS when I was testing the map-reduce way of index generation. However, when I run Solr on MapRFS, Solr gives an error that it could not recognize the maprfs:// scheme in the URI. Has anyone met similar issues? Thanks. -- Regards, Shenghua (Daniel) Wan
Re: Update solr schema.xml in real time for Solr 4.10.1
Hi Zheng, you wrote "I understand that in Solr 5.0, they provide a REST API to do real-time update of the schema using Curl". Would you please help me with how to do this? I need to update both schema.xml and solrconfig.xml in Solr 5.0 in SolrCloud. Your help is appreciated. Thanks again. On Thu, Mar 12, 2015 at 1:30 PM, Zheng Lin Edwin Yeo wrote: > Hi, > > I understand that in Solr 5.0, they provide a REST API to do real-time > update of the schema using Curl. However, I could not do that for my > earlier version of Solr 4.10.1. > > Would like to check, is this function available for the earlier version of > Solr, and is the curl syntax the same as Solr 5.0? > > Regards, > Edwin >
Missing doc fields
Hello, when I display one of my core's schema, lots of fields appear:

"fields":[{
    "name":"_root_",
    "type":"string",
    "indexed":true,
    "stored":false},
  {
    "name":"_version_",
    "type":"long",
    "indexed":true,
    "stored":true},
  {
    "name":"id",
    "type":"string",
    "multiValued":false,
    "indexed":true,
    "required":true,
    "stored":true},
  {
    "name":"ymd",
    "type":"tdate",
    "indexed":true,
    "stored":true}],

Yet, when I display $results in the richtext_doc.vm Velocity template, documents only contain three fields (id, _version_, score):

SolrDocument{id=3, _version_=1495262517955395584, score=1.0},

How can I increase the number of doc fields? Many thanks. Philippe
Re: increase connections on tomcat
On 3/11/2015 8:56 AM, SolrUser1543 wrote: > Client application which queries solr needs to increase a number of > simultaneously connections in order to improve performance ( in additional > to get solr results, it needs to get an internal resources like images. ) > But this increment has improved client performance, but caused degradation > in solr . > > what I think is that I need to increase a number of connection in order to > allow to more requests run between solr shards. > > How can I prove that I need? > How can I increase it on tomcat? ( on each shard ) Hopefully this isn't an XY problem. http://people.apache.org/~hossman/#xyproblem To accomplish what you have requested, you will want to increase the maxThreads parameter in the tomcat config. It defaults to 200; we have included a setting of 10000 in the example jetty server. For most installations, a value of 10000 means there is effectively no limit on the number of threads allowed. Solr will behave unpredictably if it is prevented from starting threads, and it is very easy to exceed 200 threads, especially if the container is serving requests for other things besides Solr. To configure more connections to other machines for distributed search, you need to configure the shard handler in your solrconfig.xml file. In particular you need to be worried about maxConnectionsPerHost and maxConnections. https://cwiki.apache.org/confluence/display/solr/Distributed+Requests#DistributedRequests-ConfiguringtheShardHandlerFactory Thanks, Shawn
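For reference, a hedged sketch of the shard handler configuration mentioned above, nested inside the /select request handler in solrconfig.xml; the numbers are illustrative only and should be sized to your own shard fan-out:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- illustrative values; tune per deployment -->
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="maxConnectionsPerHost">100</int>
    <int name="maxConnections">10000</int>
  </shardHandlerFactory>
</requestHandler>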
DocumentAnalysisRequestHandler
Hello, my solr logs say: INFO - 2015-03-12 08:49:34.900; org.apache.solr.core.RequestHandlers; created /analysis/document: solr.DocumentAnalysisRequestHandler WARN - 2015-03-12 08:49:34.919; org.apache.solr.core.SolrResourceLoader; Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. Please consult documentation how to replace it accordingly. Is /analysis/document deprecated in SOLR 5? What is the modern equivalent of Luke? Many thanks. Philippe
Re: test cases for solr cloud
Anyone please suggest With Regards Aman Tandon On Sat, Mar 7, 2015 at 9:55 PM, Aman Tandon wrote: > Hi, > > Please suggest me what should be the tests which i should run to check the > availability, query time, etc in my solr cloud setup. > > With Regards > Aman Tandon >
sort by given order
Hi, i want to sort my documents by a given order. The order is defined by a list of ids. My current solution is: list of ids: 15, 5, 1, 10, 3 query: q=*:*&fq=(id:((15) OR (5) OR (1) OR (10) OR (3)))&sort=query($idqsort) desc,id asc&idqsort=id:((15^5) OR (5^4) OR (1^3) OR (10^2) OR (3^1))&start=0&rows=5 Do you know an other solution to sort by a list of ids? Thanks! Johannes
Re: increase connections on tomcat
On 3/11/2015 10:53 AM, SolrUser1543 wrote: does it apply to solr 4.10 ? or only to solr 5 ? The information I provided is not version-specific. It would apply to either version you listed and at least some of the previous 4.x versions. Thanks, Shawn
Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Hi, I've a field which is being used for result grouping. Here's the field definition. This started once I did a rolling update from 4.7 to 5.0. I started getting the error on any group-by query --> "SolrDispatchFilter null:java.lang.IllegalStateException: unexpected docvalues type NONE for field 'ADSKDedup' (expected=SORTED). Use UninvertingReader or index with docvalues." Does this mean that I need to re-index documents to get over this error? Regards, Shamik
Re: increase connections on tomcat
I investigated my Tomcat 7 configuration. I found that we work in BIO mode. I am considering switching to NIO mode. What are the recommendations in this case? -- View this message in context: http://lucene.472066.n3.nabble.com/increase-connections-on-tomcat-tp4192405p4192602.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Hi. Erick.. Would please help me distinguish between Uploading a Configuration Directory and Linking a Collection to a Configuration Set ? On Thu, Mar 12, 2015 at 2:01 AM, Nitin Solanki wrote: > Thanks a lot Erick.. It will be helpful. > > On Wed, Mar 11, 2015 at 9:27 PM, Erick Erickson > wrote: > >> The configs are in Zookeeper. So you have to switch your thinking, >> it's rather confusing at first. >> >> When you create a collection, you specify a "config set", these are >> usually in >> >> ./server/solr/configsets/data_driven_schema, >> ./server/solr/configsets/techproducts and the like. >> >> The entire conf directory under one of these is copied to Zookeeper >> (which you can see >> from the admin screen cloud>>tree, then in the right hand side you'll >> be able to find the config sets >> you uploaded. >> >> But, you cannot edit them there directly. You edit them on disk, then >> push them to Zookeeper, >> then reload the collection (or restart everything). See the reference >> guide here: >> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities >> >> Best, >> Erick >> >> On Wed, Mar 11, 2015 at 6:01 AM, Nitin Solanki >> wrote: >> > Hi, alexandre.. >> > >> > Thanks for responding... >> > When I created new collection(wikingram) using solrCloud. It gets create >> > into example/cloud/node*(node1, node2) like that. >> > I have used *schema.xml and solrconfig.xml of >> sample_techproducts_configs* >> > configuration. >> > >> > Now, The problem is that. >> > If I change the configuration of *solrconfig.xml of * >> > *sample_techproducts_configs*. Its configuration doesn't reflect on >> > *wikingram* collection. >> > How to reflect the changes of configuration in the collection? >> > >> > On Wed, Mar 11, 2015 at 5:42 PM, Alexandre Rafalovitch < >> arafa...@gmail.com> >> > wrote: >> > >> >> Which example are you using? Or how are you creating your collection? >> >> >> >> If you are using your example, it creates a new directory under >> >> "example". If you are creating a new collection with "-c", it creates >> >> a new directory under the "server/solr". The actual files are a bit >> >> deeper than usual to allow for a log folder next to the collection >> >> folder. So, for example: >> >> "example/schemaless/solr/gettingstarted/conf/solrconfig.xml" >> >> >> >> If it's a dynamic schema configuration, you don't actually have >> >> schema.xml, but managed-schema, as you should be mostly using REST >> >> calls to configure it. >> >> >> >> If you want to see the configuration files before the collection >> >> actually created, they are under "server/solr/configsets", though they >> >> are not configsets in Solr sense, as they do get copied when you >> >> create your collections (sharing them causes issues). >> >> >> >> Regards, >> >>Alex. >> >> >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >> >> http://www.solr-start.com/ >> >> >> >> >> >> On 11 March 2015 at 07:50, Nitin Solanki wrote: >> >> > Hello, >> >> >I have switched from solr 4.10.2 to solr 5.0.0. In >> solr >> >> > 4-10.2, schema.xml and solrconfig.xml were in example/solr/conf/ >> folder. >> >> > Where is schema.xml and solrconfig.xml in solr 5.0.0 ? and also want >> to >> >> > know how to configure in solrcloud ? >> >> >> > >
Re: default heap size for solr 5.0? (-Xmx param)
Well, the new way will be the only way eventually, so either you learn the old way then switch or learn it now ;)... But if you insist you could start with a heap size of 4G like this: java -Xmx4G -Xms4G -jar start.jar Best, Erick On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén wrote: > Thanks! > > I am using the old way and I see no reason to switch really? > > cheers > > On 11 March 2015 at 20:18, Shawn Heisey wrote: > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max heap >> > size? I can't find it anywhere. >> > >> > Also, where whould you activate jmx? Would like to be able to use >> visualvm >> > in the future I imagine. >> > >> > I have a custom nssm thing going that installs it as a window service >> that >> > simply calls java -jar start.jar >> >> The default heap size is 512m. This is hardcoded in the bin/solr >> script. You can override that with the -m parameter. >> >> If you are not using the bin/solr script and are instead doing the old >> "java -jar start.jar" startup, the default heap size is determined by >> the version of Java you are running. >> >> Thanks, >> Shawn >> >>
java.nio.channels.CancelledKeyException
Hi, I am indexing documents on Solr 4.10.2. While indexing, I am getting this error in the log:

java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)

What does this mean? Will it skip the documents currently being indexed? Or anything else? Please help...
Re: Jetty version
Hi, I am not sure, but when I look into the server/lib directory I can see version 8.1 on all the lib files present in that folder, so I am guessing it is version 8.1. I confirmed it by downloading the new Jetty server, which was jetty-9.2, and I found the version on the Jetty libraries the same way. With Regards Aman Tandon On Thu, Mar 12, 2015 at 12:19 PM, Philippe de Rochambeau wrote: > Hello, > > which jetty version does solr 5 integrate? > > Cheers, > > Philippe >
Re: default heap size for solr 5.0? (-Xmx param)
You could also check the default memory by starting solr with the -V parameter for verbose output. It will show your output like this. If your are startinf solr with script present in bin directory using this command *./solr -c -V* Using Solr root directory: /data/solr/aman/solr_cloud/solr-5.0.0 > Using Java: java > java version "1.7.0_75" > OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1) > OpenJDK Server VM (build 24.75-b04, mixed mode) > Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr.log > Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr_gc.log > Starting Solr using the following settings: > JAVA= java > SOLR_SERVER_DIR = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server > SOLR_HOME = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/solr > SOLR_HOST = > SOLR_PORT = 4567 > STOP_PORT = 3567 > *SOLR_JAVA_MEM = -Xms512m -Xmx512m* > GC_TUNE = -XX:NewRatio=3 -XX:SurvivorRatio=4 > -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 > -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark > -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 > -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled > -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 > GC_LOG_OPTS = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > -Xloggc:/data/solr/aman/solr_cloud/solr-5.0.0/server/logs/solr_gc.log > SOLR_TIMEZONE = UTC > CLOUD_MODE_OPTS = -DzkClientTimeout=15000 -DzkHost=192.168.6.217:2181, > 192.168.5.81:2181,192.168.5.236:2181 > With Regards Aman Tandon On Thu, Mar 12, 2015 at 1:19 PM, Karl Kildén wrote: > Actually the reason I did not use the solr script was that I didn't really > get how to make a window service out of it from nssm.exe. I tried doing a > .bat that called solr with start -p 8983 but seems it just loops my command > rather then run it. > > Thanks for the help / Karl > > On 11 March 2015 at 23:08, Erick Erickson wrote: > > > Well, the new way will be the only way eventually, so either you learn > > the old way then switch or learn it now ;)... > > > > But if you insist you could start with a heap size of 4G like this: > > > > java -Xmx4G -Xms4G -jar start.jar > > > > Best, > > Erick > > > > > > On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén > > wrote: > > > Thanks! > > > > > > I am using the old way and I see no reason to switch really? > > > > > > cheers > > > > > > On 11 March 2015 at 20:18, Shawn Heisey wrote: > > > > > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: > > >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max > > heap > > >> > size? I can't find it anywhere. > > >> > > > >> > Also, where whould you activate jmx? Would like to be able to use > > >> visualvm > > >> > in the future I imagine. > > >> > > > >> > I have a custom nssm thing going that installs it as a window > service > > >> that > > >> > simply calls java -jar start.jar > > >> > > >> The default heap size is 512m. This is hardcoded in the bin/solr > > >> script. You can override that with the -m parameter. > > >> > > >> If you are not using the bin/solr script and are instead doing the old > > >> "java -jar start.jar" startup, the default heap size is determined by > > >> the version of Java you are running. > > >> > > >> Thanks, > > >> Shawn > > >> > > >> > > >
Re: default heap size for solr 5.0? (-Xmx param)
Just a small correction > If your are startinf solr with script present in bin directory using this > command > *./solr -c -V* *./solr start -c -V* With Regards Aman Tandon On Thu, Mar 12, 2015 at 4:05 PM, Aman Tandon wrote: > You could also check the default memory by starting solr with the -V > parameter for verbose output. It will show your output like this. > > If your are startinf solr with script present in bin directory using this > command > *./solr -c -V* > > Using Solr root directory: /data/solr/aman/solr_cloud/solr-5.0.0 >> Using Java: java >> java version "1.7.0_75" >> OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1) >> OpenJDK Server VM (build 24.75-b04, mixed mode) >> Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr.log >> Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr_gc.log >> Starting Solr using the following settings: >> JAVA= java >> SOLR_SERVER_DIR = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server >> SOLR_HOME = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/solr >> SOLR_HOST = >> SOLR_PORT = 4567 >> STOP_PORT = 3567 >> *SOLR_JAVA_MEM = -Xms512m -Xmx512m* >> GC_TUNE = -XX:NewRatio=3 -XX:SurvivorRatio=4 >> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 >> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 >> -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark >> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 >> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled >> -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 >> GC_LOG_OPTS = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps >> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime >> -Xloggc:/data/solr/aman/solr_cloud/solr-5.0.0/server/logs/solr_gc.log >> SOLR_TIMEZONE = UTC >> CLOUD_MODE_OPTS = -DzkClientTimeout=15000 -DzkHost=192.168.6.217:2181 >> ,192.168.5.81:2181,192.168.5.236:2181 >> > > > > With Regards > Aman Tandon > > On Thu, Mar 12, 2015 at 1:19 PM, Karl Kildén > wrote: > >> Actually the reason I did not use the solr script was that I didn't really >> get how to make a window service out of it from nssm.exe. I tried doing a >> .bat that called solr with start -p 8983 but seems it just loops my >> command >> rather then run it. >> >> Thanks for the help / Karl >> >> On 11 March 2015 at 23:08, Erick Erickson >> wrote: >> >> > Well, the new way will be the only way eventually, so either you learn >> > the old way then switch or learn it now ;)... >> > >> > But if you insist you could start with a heap size of 4G like this: >> > >> > java -Xmx4G -Xms4G -jar start.jar >> > >> > Best, >> > Erick >> > >> > >> > On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén >> > wrote: >> > > Thanks! >> > > >> > > I am using the old way and I see no reason to switch really? >> > > >> > > cheers >> > > >> > > On 11 March 2015 at 20:18, Shawn Heisey wrote: >> > > >> > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: >> > >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max >> > heap >> > >> > size? I can't find it anywhere. >> > >> > >> > >> > Also, where whould you activate jmx? Would like to be able to use >> > >> visualvm >> > >> > in the future I imagine. >> > >> > >> > >> > I have a custom nssm thing going that installs it as a window >> service >> > >> that >> > >> > simply calls java -jar start.jar >> > >> >> > >> The default heap size is 512m. 
This is hardcoded in the bin/solr >> > >> script. You can override that with the -m parameter. >> > >> >> > >> If you are not using the bin/solr script and are instead doing the >> old >> > >> "java -jar start.jar" startup, the default heap size is determined by >> > >> the version of Java you are running. >> > >> >> > >> Thanks, >> > >> Shawn >> > >> >> > >> >> > >> > >
Re: Jetty version
Yes, Solr 5.0 uses Jetty 8. FYI, the upcoming release 5.1 will move to Jetty 9. Also, just in case it matters -- as noted in the 5.0 release notes, the use of Jetty is now an implementation detail and we might move away from it in the future -- so you shouldn't be depending on Solr using Jetty or a particular version of Jetty. On 12 Mar 2015 10:33, "Aman Tandon" wrote: > Hi, > > I am not sure but when i am looking into the server/lib directory then i am > able to see the version 8.1 with all those lib files present in that > folder. So i am guessing its version 8.1. > > I confirmed it by downloading the new jetty server which was jetty-9.2 and > i found the same version on jetty libraries. > > With Regards > Aman Tandon > > On Thu, Mar 12, 2015 at 12:19 PM, Philippe de Rochambeau > wrote: > > > Hello, > > > > which jetty version does solr 5 integrate? > > > > Cheers, > > > > Philippe > > >
Should I Use Solr
Hi, I am using Oracle 11gR2 and we have a schema where a few tables have more than 100 million rows (some of them are Varchar2 100 bytes). We frequently have to do LIKE-based searches on those tables. Sometimes we also need to join the tables. Inserts / updates are also happening very frequently on such tables (1000 inserts / updates per second) by other applications. So my question is: for my user interface, should I use Apache Solr to let users search on these tables instead of SQL queries? I have tried SQL and it is really slow (considering the amount of data I have in my database). My requirements are: results should come back fast, they should be accurate, and they should reflect the latest data. Can you suggest if I should go with Apache Solr, or another solution for my problem? Regards, Pratik Thaker The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.
[Poll]: User need for Solr security
Hi, Securing various Solr APIs has once again surfaced as a discussion in the developer list. See e.g. SOLR-7236 Would be useful to get some feedback from Solr users about needs "in the field". Please reply to this email and let us know what security aspect(s) would be most important for your company to see supported in a future version of Solr. Examples: Local user management, AD/LDAP integration, SSL, authenticated login to Admin UI, authorization for Admin APIs, e.g. admin user vs read-only user etc -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: Missing doc fields
that would explain it! Luke tool (http://github.com/DmitryKey/luke) is also useful for such cases or generally, when in need to check the field contents: On Wed, Mar 11, 2015 at 12:50 PM, wrote: > Hello, > > I found the reason: the query to store ymds in SOLR was invalid ("json" > and "literal" are concatenated below). > > curl -Ss -X POST ' > http://myserver:8990/solr/archives0/update/extract?extractFormat=text&wt=jsonliteral.ymd=1944-12-31T00:00:00A&literal.id=159168 > > > Philippe > > > > - Mail original - > De: phi...@free.fr > À: solr-user@lucene.apache.org > Envoyé: Mercredi 11 Mars 2015 11:44:15 > Objet: Re: Missing doc fields > > I meant 'fl'. > > -- > > http://myserver:8990/solr/archives0/select?q=*:*&rows=3&wt=json&fl=* > > -- > > > {"responseHeader":{"status":0,"QTime":3,"params":{"q":"*:*","fl":"*","rows":"3","wt":"json"}},"response":{"numFound":160238,"start":0,"docs":[{"id":"10","_version_":1495262519674011648},{"id":"1","_version_":1495262517261238272},{"id":"2","_version_":1495262517637677056}]}} > > > -- schema.xml > > > > > > > > > > required="true" multiValued="false" /> > > > > > > > --- > > > > > > - Mail original - > De: "Dmitry Kan" > À: solr-user@lucene.apache.org > Envoyé: Mercredi 11 Mars 2015 11:38:26 > Objet: Re: Missing doc fields > > What is the ft parameter that you are sending? > > > In order to see all stored fields use the parameter fl=* > > Or list the field names you need: fl=id,ymd > > On Wed, Mar 11, 2015 at 12:35 PM, wrote: > > > When I run the following query, > > > > > http://myserver:8990/solr/archives0/select?q=*:*&rows=3&wt=json&ft=id,ymd > > > > The response is > > > > > > > {"responseHeader":{"status":0,"QTime":1,"params":{"q":"*:*","rows":"3","wt":"json","ft":"id,ymd"}},"response":{"numFound":160238,"start":0,"docs":[{"id":"10","_version_":1495262519674011648},{"id":"1","_version_":1495262517261238272},{"id":"2","_version_":1495262517637677056}]}} > > > > > > the ymd field does not appear in the list of document fields, although it > > is defined in my schema.xml. > > > > Is there a way to tell SOLR to return that field in responses? > > > > > > Philippe > > > > > > > > - Mail original - > > De: phi...@free.fr > > À: solr-user@lucene.apache.org > > Envoyé: Mercredi 11 Mars 2015 11:06:29 > > Objet: Missing doc fields > > > > > > > > Hello, > > > > when I display one of my core's schema, lots of fields appear: > > > > "fields":[{ > > "name":"_root_", > > "type":"string", > > "indexed":true, > > "stored":false}, > > { > > "name":"_version_", > > "type":"long", > > "indexed":true, > > "stored":true}, > > { > > "name":"id", > > "type":"string", > > "multiValued":false, > > "indexed":true, > > "required":true, > > "stored":true}, > > { > > "name":"ymd", > > "type":"tdate", > > "indexed":true, > > "stored":true}], > > > > > > > > Yet, when I display $results in the richtext_doc.vm Velocity template, > > documents only contain three fields (id, _version_, score): > > > > SolrDocument{id=3, _version_=1495262517955395584, score=1.0}, > > > > > > How can I increase the number of doc fields? > > > > Many thanks. > > > > Philipppe > > > > > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: www.semanticanalyzer.info > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
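For reference, a hedged sketch of the corrected extract call from the quoted thread, with the missing '&' restored between wt=json and literal.ymd (host, core and literal values are as quoted or illustrative, and the rest of the original command is omitted):

curl -Ss -X POST 'http://myserver:8990/solr/archives0/update/extract?extractFormat=text&wt=json&literal.ymd=1944-12-31T00:00:00Z&literal.id=159168' ...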
Re: DocumentAnalysisRequestHandler
>> What is the modern equivalent of Luke? It is same Luke, but polished: http://github.com/DmitryKey/luke On Thu, Mar 12, 2015 at 11:03 AM, wrote: > Hello, > > my solr logs say: > > INFO - 2015-03-12 08:49:34.900; org.apache.solr.core.RequestHandlers; > created /analysis/document: solr.DocumentAnalysisRequestHandler > WARN - 2015-03-12 08:49:34.919; org.apache.solr.core.SolrResourceLoader; > Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. > Please consult documentation how to replace it accordingly. > > > Is /analysis/document deprecated in SOLR 5? > >class="solr.DocumentAnalysisRequestHandler" > startup="lazy" /> > > > What is the modern equivalent of Luke? > > Many thanks. > > Philippe > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: [Poll]: User need for Solr security
Hi, Things you have mentioned would be useful for our use-case. On top we've seen these two requests for securing Solr: 1. Encrypting the index (with a customer private key for instance). There are certainly other ways to go about this, like using virtual private clouds, but having the feature in solr could allow multitenant Solr installations. 2. ACLs: giving access rights to parts of the index / document sets depending on the user access rights. On Thu, Mar 12, 2015 at 1:32 PM, Jan Høydahl wrote: > Hi, > > Securing various Solr APIs has once again surfaced as a discussion in the > developer list. See e.g. SOLR-7236 > Would be useful to get some feedback from Solr users about needs "in the > field". > > Please reply to this email and let us know what security aspect(s) would > be most important for your company to see supported in a future version of > Solr. > Examples: Local user management, AD/LDAP integration, SSL, authenticated > login to Admin UI, authorization for Admin APIs, e.g. admin user vs > read-only user etc > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: how to change configurations in solrcloud setup
On 3/11/2015 10:45 PM, Aman Tandon wrote: >> You may need to manually remove the 127.0.1.1 entries from zookeeper >> after you fix the IP address problem. > > > How to do that? The zkcli script included with Solr should have everything you need -- getfile, putfile, and clear ... but that would be a rather frustrating way to handle it. You won't be able to accomplish your goal by only deleting znodes, you'll have to edit some json structures and replace them in zookeeper. The main thing you'll need to edit is the clusterstate.json ... this is a single "file" in Solr 4.x, in 5.0 it has changed to a clusterstate for every collection. There are not very many GUI clients for zookeeper. The only one that I've really found is the one that is a plugin for eclipse. I happen to use eclipse, so this is fairly convenient for me: http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper Thanks, Shawn
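For reference, a minimal sketch of that zkcli round-trip, assuming the cloud-scripts directory shipped with Solr (server/scripts/cloud-scripts in 5.0, example/scripts/cloud-scripts in 4.x), ZooKeeper at localhost:2181 and illustrative local paths:

# pull the current cluster state down for editing
zkcli.sh -zkhost localhost:2181 -cmd getfile /clusterstate.json /tmp/clusterstate.json
# edit /tmp/clusterstate.json locally
# on versions where putfile will not overwrite an existing znode, clear it first
zkcli.sh -zkhost localhost:2181 -cmd clear /clusterstate.json
# write the edited file back to ZooKeeper
zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json /tmp/clusterstate.json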
Re: Creating a directory resource in solr-jetty
On 3/11/2015 7:38 AM, phi...@free.fr wrote: > does anyone if it is possible to create a directory resource in the > solr-jetty configuration files? > > In Tomcat 8, you can do the following: > > > > className="org.apache.catalina.webresources.DirResourceSet" > base="/mnt/archive_pdf/PDF/IHT" > webAppMount="/arcpdf0" > /> This is a question that you'd need to ask in a Jetty support venue. I don't know the answer, and from the lack of response, I would guess that nobody else who has seen your question knows the answer either. This container config has nothing to do with Solr at all ... most people here are only familiar with those pieces of container config that affect Solr. http://eclipse.org/jetty/mailinglists.php I hate to turn you away without giving you an answer ... if I knew, I would ignore the fact that this is off topic, and give you the answer. Thanks, Shawn
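For context, the Tomcat 8 configuration being referenced normally sits inside a <Resources> element of the context configuration; a rough sketch, using the attribute values from the question, with the wrapper elements assumed:

<Context>
  <Resources>
    <PostResources className="org.apache.catalina.webresources.DirResourceSet"
                   base="/mnt/archive_pdf/PDF/IHT"
                   webAppMount="/arcpdf0" />
  </Resources>
</Context>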
Re: Update solr schema.xml in real time for Solr 4.10.1
On 3/12/2015 2:00 AM, Zheng Lin Edwin Yeo wrote: > I understand that in Solr 5.0, they provide a REST API to do real-time > update of the schema using Curl. However, I could not do that for my > eariler version of Solr 4.10.1. > > Would like to check, is this function available for the earlier version of > Solr, and is the curl syntax the same as Solr 5.0? Providing a way to simply edit the config files directly is a potential security issue. We briefly had a way to edit those configs right in the admin UI, but Redhat reported this capability as a security problem, so we removed it. I don't remember whether there is a way to re-enable this functionality. The Schema REST API is available in 4.10. It was also present in 4.9. Currently you can only *add* to the schema, you cannot edit what's already there. Thanks, Shawn
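For reference, a hedged sketch of adding a field through the Schema REST API on 4.10, assuming the core uses the managed schema (ManagedIndexSchemaFactory with mutable="true" in solrconfig.xml); the field, core name and host are illustrative:

curl -X POST -H 'Content-type:application/json' --data-binary '[{"name":"newfield","type":"string","indexed":true,"stored":true}]' 'http://localhost:8983/solr/collection1/schema/fields'

As noted above, fields can only be added this way, not modified or removed.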
Re: Should I Use Solr
On 3/12/2015 5:03 AM, Pratik Thaker wrote: > I am using Oracle 11g2 and we are having a schema where few tables are having > more than 100 million rows (some of them are Varchar2 100 bytes). And we have > to frequently do the LIKE based search on those tables. Sometimes we need to > join the tables also. Insert / Updates are also happening very frequently for > such tables (1000 insert / updates per second) by other applications. > > So my question is, for my User Interface, should I use Apache Solr to let > user search on these tables instead of SQL queries? I have tried SQL and it > is really slow (considering amount of data I am having in my database). > > My requirements are, > > Result should come faster and it should be accurate. > It should have the latest data. > Can you suggest if I should go with Apache Solr, or another solution for my > problem ? Solr will do what you want. I have essentially the same situation, except the database is MySQL. We have just over 100 million total documents. Our add/update rate is much lower than yours. For a fully redundant setup, I am running two copies of the index on four Solr servers that each have 64GB of RAM. It's a distributed index that's not running SolrCloud, two servers are required to house one complete copy of the index. The total index size on each pair of servers is about 150GB. Thanks, Shawn
Re: Creating a directory resource in solr-jetty
Hi Shawn, here is the Jetty Mailing List's reply concerning my question. Unfortunately, this solution won't work with SOLR Jetty, because its version is < 9. Philippe -- Just ensure you don't have a /WEB-INF/ directory, and you can use this on Jetty 9.2.9+ http://www.eclipse.org/jetty/configure_9_0.dtd";> /example /mnt/iiiparnex01_pdf/PDF/III/ - Mail original - De: "Shawn Heisey" À: solr-user@lucene.apache.org Envoyé: Jeudi 12 Mars 2015 13:59:49 Objet: Re: Creating a directory resource in solr-jetty On 3/11/2015 7:38 AM, phi...@free.fr wrote: > does anyone if it is possible to create a directory resource in the > solr-jetty configuration files? > > In Tomcat 8, you can do the following: > > > > className="org.apache.catalina.webresources.DirResourceSet" > base="/mnt/archive_pdf/PDF/IHT" > webAppMount="/arcpdf0" > /> This is a question that you'd need to ask in a Jetty support venue. I don't know the answer, and from the lack of response, I would guess that nobody else who has seen your question knows the answer either. This container config has nothing to do with Solr at all ... most people here are only familiar with those pieces of container config that affect Solr. http://eclipse.org/jetty/mailinglists.php I hate to turn you away without giving you an answer ... if I knew, I would ignore the fact that this is off topic, and give you the answer. Thanks, Shawn
Re: sort by given order
Not unless you can somehow codify that sort order at index time, but I'm assuming the sort order changes dynamically. You can also sort by function, but that's not really useful. Or, if these are relatively short lists, you can sort at the app layer. Best, Erick On Thu, Mar 12, 2015 at 2:16 AM, Johannes Siegert wrote: > Hi, > > i want to sort my documents by a given order. The order is defined by a list > of ids. > > My current solution is: > > list of ids: 15, 5, 1, 10, 3 > > query: q=*:*&fq=(id:((15) OR (5) OR (1) OR (10) OR > (3)))&sort=query($idqsort) desc,id asc&idqsort=id:((15^5) OR (5^4) OR (1^3) > OR (10^2) OR (3^1))&start=0&rows=5 > > Do you know an other solution to sort by a list of ids? > > Thanks! > > Johannes
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
By and large, I really never use linking. But it's about associating a config set you've _already_ uploaded with a collection. So uploading is pushing the configset from your local machine up to Zookeeper, and linking is using that uploaded, named configuration with an arbitrary collection. But usually you just make this association when creating the collection. It's simple to test all this out, just upconfig a couple of config sets, play with the linking and reload the collections. From there the admin UI will show you what actually happened. Best, Erick On Thu, Mar 12, 2015 at 2:39 AM, Nitin Solanki wrote: > Hi. Erick.. >Would please help me distinguish between > Uploading a Configuration Directory and Linking a Collection to a > Configuration Set ? > > On Thu, Mar 12, 2015 at 2:01 AM, Nitin Solanki wrote: > >> Thanks a lot Erick.. It will be helpful. >> >> On Wed, Mar 11, 2015 at 9:27 PM, Erick Erickson >> wrote: >> >>> The configs are in Zookeeper. So you have to switch your thinking, >>> it's rather confusing at first. >>> >>> When you create a collection, you specify a "config set", these are >>> usually in >>> >>> ./server/solr/configsets/data_driven_schema, >>> ./server/solr/configsets/techproducts and the like. >>> >>> The entire conf directory under one of these is copied to Zookeeper >>> (which you can see >>> from the admin screen cloud>>tree, then in the right hand side you'll >>> be able to find the config sets >>> you uploaded. >>> >>> But, you cannot edit them there directly. You edit them on disk, then >>> push them to Zookeeper, >>> then reload the collection (or restart everything). See the reference >>> guide here: >>> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities >>> >>> Best, >>> Erick >>> >>> On Wed, Mar 11, 2015 at 6:01 AM, Nitin Solanki >>> wrote: >>> > Hi, alexandre.. >>> > >>> > Thanks for responding... >>> > When I created new collection(wikingram) using solrCloud. It gets create >>> > into example/cloud/node*(node1, node2) like that. >>> > I have used *schema.xml and solrconfig.xml of >>> sample_techproducts_configs* >>> > configuration. >>> > >>> > Now, The problem is that. >>> > If I change the configuration of *solrconfig.xml of * >>> > *sample_techproducts_configs*. Its configuration doesn't reflect on >>> > *wikingram* collection. >>> > How to reflect the changes of configuration in the collection? >>> > >>> > On Wed, Mar 11, 2015 at 5:42 PM, Alexandre Rafalovitch < >>> arafa...@gmail.com> >>> > wrote: >>> > >>> >> Which example are you using? Or how are you creating your collection? >>> >> >>> >> If you are using your example, it creates a new directory under >>> >> "example". If you are creating a new collection with "-c", it creates >>> >> a new directory under the "server/solr". The actual files are a bit >>> >> deeper than usual to allow for a log folder next to the collection >>> >> folder. So, for example: >>> >> "example/schemaless/solr/gettingstarted/conf/solrconfig.xml" >>> >> >>> >> If it's a dynamic schema configuration, you don't actually have >>> >> schema.xml, but managed-schema, as you should be mostly using REST >>> >> calls to configure it. >>> >> >>> >> If you want to see the configuration files before the collection >>> >> actually created, they are under "server/solr/configsets", though they >>> >> are not configsets in Solr sense, as they do get copied when you >>> >> create your collections (sharing them causes issues). >>> >> >>> >> Regards, >>> >>Alex. 
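For reference, a minimal sketch of the two operations with the zkcli script shipped with Solr, assuming ZooKeeper at localhost:2181 and illustrative config and collection names:

# upload a local conf directory to ZooKeeper under the name "myconf"
zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /path/to/conf -confname myconf
# associate an existing collection with that uploaded config set
zkcli.sh -zkhost localhost:2181 -cmd linkconfig -collection mycollection -confname myconf

After linking, reload the collection so the change takes effect.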
>>> >> >>> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >>> >> http://www.solr-start.com/ >>> >> >>> >> >>> >> On 11 March 2015 at 07:50, Nitin Solanki wrote: >>> >> > Hello, >>> >> >I have switched from solr 4.10.2 to solr 5.0.0. In >>> solr >>> >> > 4-10.2, schema.xml and solrconfig.xml were in example/solr/conf/ >>> folder. >>> >> > Where is schema.xml and solrconfig.xml in solr 5.0.0 ? and also want >>> to >>> >> > know how to configure in solrcloud ? >>> >> >>> >> >>
Re: Creating a directory resource in solr-jetty
On 3/12/2015 8:17 AM, phi...@free.fr wrote: > here is the Jetty Mailing List's reply concerning my question. > > Unfortunately, this solution won't work with SOLR Jetty, because its version > is < 9. The trunk branch of the Solr source code (version 6.0 development) is already running Jetty 9.2.9. I have seen two committers say that 5.1 will be upgraded to Jetty 9 as well, though this needs to happen very soon, or it won't make the cutoff and may get pushed back to 5.2. Thanks, Shawn
Re: [Poll]: User need for Solr security
About <1>. Gotta be careful here about what would be promised. You really _can't_ encrypt the _indexed_ terms in a meaningful way and still search. And, as you well know, you can reconstruct documents from the indexed terms. It's lossy, but still coherent enough to give security folks fits. For instance, to do a wildcard search I need to have the "run" in "run" match "running", "runner" "runs" etc. Any but trivial encryption will break that, and the trivial encryption is easy to break. So putting all this over an encrypting filesystem is an approach that's often used. FWIW On Thu, Mar 12, 2015 at 5:22 AM, Dmitry Kan wrote: > Hi, > > Things you have mentioned would be useful for our use-case. > > On top we've seen these two requests for securing Solr: > > 1. Encrypting the index (with a customer private key for instance). There > are certainly other ways to go about this, like using virtual private > clouds, but having the feature in solr could allow multitenant Solr > installations. > > 2. ACLs: giving access rights to parts of the index / document sets > depending on the user access rights. > > > > On Thu, Mar 12, 2015 at 1:32 PM, Jan Høydahl wrote: > >> Hi, >> >> Securing various Solr APIs has once again surfaced as a discussion in the >> developer list. See e.g. SOLR-7236 >> Would be useful to get some feedback from Solr users about needs "in the >> field". >> >> Please reply to this email and let us know what security aspect(s) would >> be most important for your company to see supported in a future version of >> Solr. >> Examples: Local user management, AD/LDAP integration, SSL, authenticated >> login to Admin UI, authorization for Admin APIs, e.g. admin user vs >> read-only user etc >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >> > > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: www.semanticanalyzer.info
Re: Update solr schema.xml in real time for Solr 4.10.1
Actually I ran across a neat IntelliJ plugin that you could install and directly edit ZK files. And I'm pretty sure there are stand-alone programs that do this, but they are all outside Solr. I'm not sure what "real time update of the schema" is for, would you (Zheng) explain further? Collections _must_ be reloaded for schema changes to take effect so I'm not quite sure what you're referring to. Nitin: The usual process is to have the master config be local, change the local version then upload it to ZK with the upconfig option in zkCli, then reload your collection. Best, Erick On Thu, Mar 12, 2015 at 6:04 AM, Shawn Heisey wrote: > On 3/12/2015 2:00 AM, Zheng Lin Edwin Yeo wrote: >> I understand that in Solr 5.0, they provide a REST API to do real-time >> update of the schema using Curl. However, I could not do that for my >> eariler version of Solr 4.10.1. >> >> Would like to check, is this function available for the earlier version of >> Solr, and is the curl syntax the same as Solr 5.0? > > Providing a way to simply edit the config files directly is a potential > security issue. We briefly had a way to edit those configs right in the > admin UI, but Redhat reported this capability as a security problem, so > we removed it. I don't remember whether there is a way to re-enable > this functionality. > > The Schema REST API is available in 4.10. It was also present in 4.9. > Currently you can only *add* to the schema, you cannot edit what's > already there. > > Thanks, > Shawn >
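For reference, once the edited config has been pushed back up with upconfig (as sketched earlier in this digest), the reload step would look something like this, with an illustrative host and collection name:

curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'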
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
On 3/12/2015 9:18 AM, Erick Erickson wrote: > By and large, I really never use linking. But it's about associating a > config set > you've _already_ uploaded with a collection. > > So uploading is pushing the configset from your local machine up to Zookeeper, > and linking is using that uploaded, named configuration with an > arbitrary collection. > > But usually you just make this association when creating the collection. The primary use case that I see for linkconfig is in testing upgrades to configurations. So let's say you have a production collection that uses a config that you name fooV1 for foo version 1. You can build a test collection that uses a config named fooV2, work out all the bugs, and then when you're ready to deploy it, you can use linkconfig to link your production collection to fooV2, reload the collection, and you're using the new config. I haven't discussed here how to handle the situation where a reindex is required. One thing you CAN do is run linkconfig for a collection that doesn't exist yet, and then you don't need to include collection.configName when you create the collection, because the link is already present in zookeeper. I personally don't like doing things this way, but I'm pretty sure it works. Thanks, Shawn
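A sketch of that upgrade flow using zkcli and the Collections API (hosts, paths and names below are placeholders):

  # upload the new config under its own name
  zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir ./fooV2/conf -confname fooV2

  # point the existing production collection at the new config
  zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection foo -confname fooV2

  # reload so the change takes effect
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=foo"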
Re: DocumentAnalysisRequestHandler
Yes, the admin handlers are deprecated because they are now implicit - no need to specify them in solrconfig. Yeah, the doc is very unclear on that point, but in CHANGES.TXT: "*AdminHandlers is deprecated , /admin/* are implicitly defined, /get ,/replication and handlers are also implicitly registered (refer to SOLR-6792)*". IOW, remove the AdminHandlers XML element from your solrconfig. As far as the document analysis request handler, that should still be fine. Are you encountering some problem? The first log line you gave is just an INFO - information only, not a problem. -- Jack Krupansky On Thu, Mar 12, 2015 at 5:03 AM, wrote: > Hello, > > my solr logs say: > > INFO - 2015-03-12 08:49:34.900; org.apache.solr.core.RequestHandlers; > created /analysis/document: solr.DocumentAnalysisRequestHandler > WARN - 2015-03-12 08:49:34.919; org.apache.solr.core.SolrResourceLoader; > Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. > Please consult documentation how to replace it accordingly. > > > Is /analysis/document deprecated in SOLR 5? > > <requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy" /> > > > What is the modern equivalent of Luke? > > Many thanks. > > Philippe >
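For anyone hitting the same warning: the declaration to delete is the old AdminHandlers registration, which in a typical 4.x solrconfig.xml looks something like the line below (the name attribute may differ in your config); the /analysis/document handler itself can stay.

  <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />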
Re: How to configure Solr PostingsFormat block size
Hi Hoss, I created a wrapper class, compiled a jar and included an org.apache.lucene.codecs.Codec file in META-INF/services in the jar file with an entry for the wrapper class :HTPostingsFormatWrapper. I created a collection1/lib directory and put the jar there. (see below) I'm getting the dread "ClassCastException Class.asSubclass(Unknown Source" error (See below). This is looking like a complex classloader issues. Should I put the file somewhere else and/or declare a lib directory in solrconfig.xml? Any suggestions on how to troubleshoot this?. Tom error: by: java.lang.ClassCastException: class org.apache.lucene.codecs.HTPostingsFormatWrapper at java.lang.Class.asSubclass(Unknown Source) at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:141) --- Contents of the jar file: C:\d\solr\lucene_solr_4_10_2\solr\example\solr\collection1\lib>jar -tvf HTPostingsFormatWrapper.jar 25 Thu Mar 12 10:37:04 EDT 2015 META-INF/MANIFEST.MF 1253 Thu Mar 12 10:37:04 EDT 2015 org/apache/lucene/codecs/HTPostingsFormatWrapper.class 1276 Thu Mar 12 10:49:06 EDT 2015 META-INF/services/org.apache.lucene.codecs.Codec Contents of META-INF/services/org.apache.lucene.codecs.Codec in the jar file: org.apache.lucene.codecs.lucene49.Lucene49Codec org.apache.lucene.codecs.lucene410.Lucene410Codec # tbw adds custom wrapper here per Hoss e-mail org.apache.lucene.codecs.HTPostingsFormatWrapper - log file excerpt with stack trace: 12821 [main] INFO org.apache.solr.core.CoresLocator – Looking for core definitions underneath C:\d\solr\lucene_solr_4_10_2\solr\example\solr 12838 [main] INFO org.apache.solr.core.CoresLocator – Found core collection1 in C:\d\solr\lucene_solr_4_10_2\solr\example\solr\collection1\ 12839 [main] INFO org.apache.solr.core.CoresLocator – Found 1 core definitions 12841 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: 'C:\d\solr\lucene_solr_4_10_2\solr\example\solr\collection1\' 12842 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/C:/d/solr/lucene_solr_4_10_2/solr/example/solr/collection1/lib/HTPostingsFormatWrapper.jar' to classloader 12870 [coreLoadExecutor-5-thread-1] ERROR org.apache.solr.core.CoreContainer – Error creating core [collection1]: class org.apache.lucene.codecs.HTPostingsFormatWrapper java.lang.ClassCastException: class org.apache.lucene.codecs.HTPostingsFormatWrapper at java.lang.Class.asSubclass(Unknown Source) at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:141) at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:65) at org.apache.lucene.codecs.Codec.reloadCodecs(Codec.java:119) at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:206) at org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:142) at org.apache.solr.core.ConfigSetService$Default.createCoreResourceLoader(ConfigSetService.java:144) at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:58) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:489) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) On Wed, Jan 14, 2015 at 6:05 PM, Chris Hostetter wrote: > > : As a foolish dev 
(not malicious I hope!), I did mess around with > something > : like this once; I was writing my own Codec. I found I had to create a > file > : called META-INF/services/org.apache.lucene.codecs.Codec in my solr > plugin jar > : that contained the fully-qualified class name of my codec: I guess this > : registers it with the SPI framework so it can be found by name? I'm not > > Yep, that's how SPI works - the important bits are mentioned/linked in the > PostingsFormat (and other SPI related classes in lucene) javadocs... > > > https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/PostingsFormat.html > > > https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html?is-external=true > > > > > > -Hoss > http://www.lucidworks.com/ >
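One detail worth double-checking in a setup like the one above: the SPI services file name has to match the abstract class being extended. A PostingsFormat implementation is discovered through META-INF/services/org.apache.lucene.codecs.PostingsFormat, while META-INF/services/org.apache.lucene.codecs.Codec is only read for Codec subclasses; listing a PostingsFormat class in the Codec file makes Class.asSubclass(Codec.class) fail with exactly this kind of ClassCastException during the SPI reload. Whether that is the cause here is only a guess; the class name below is reused from the mail above purely for illustration.

  META-INF/services/org.apache.lucene.codecs.PostingsFormat:
      org.apache.lucene.codecs.HTPostingsFormatWrapper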
error message This IndexSchema is not mutable with a classicSchemaIndexFactory
Hi guys, I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) with status Resolved, but the resolution is not identified in the issue. I am facing the exact same problem.. and not able to identified the solution. In the last comment of the issue, is said that this kind of questions should be done in the solr-user mailing list.. So anyone. I'll appreciate any kind of help. Thanks is advanced! Best regards, Pedro Figueiredo
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Well, I think I've narrowed down the issue. The error is happening when I'm trying to do a rolling update from Solr 4.7 (which is our current version) to 5.0 . I'm able to re-produce this couple of times. If I do a fresh index on a 5.0, it works. Not sure if there's any other way to mitigate it. I'll appreciate if someone can share their experience on the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192706.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
On 3/11/2015 4:45 PM, shamik wrote: > <field name="DocumentType" ... multiValued="false" required="false" omitNorms="true" docValues="true" /> 3/11/2015, 2:14:30 PM ERROR SolrDispatchFilter > null:java.lang.IllegalStateException: unexpected docvalues type NONE > for field 'DocumentType' (expected=SORTED). Use UninvertingReader or > index with docvalues. null:java.lang.IllegalStateException: unexpected > docvalues type NONE for field 'DocumentType' (expected=SORTED). Use > UninvertingReader or index with docvalues. I admit right up front that I know very little about what might be happening here ... but I did have one idea. It could be completely wrong. Is it possible that you have an index that fits the following description? The field originally did not have docValues. You enabled docValues on the field in the schema, but there are index segments still in the index directory from *before* you changed the schema. If that sounds at all possible, then if you did not fully reindex, there would be segments with valid documents that do not have docValues. You should fully reindex and then optimize before upgrading. If you did fully reindex, but did not optimize, then there might be segments with *deleted* documents that do not have docValues ... and maybe 4.7 was fine with that but 5.0 isn't. Whenever I upgrade Solr, I always reindex from scratch, and often I will completely delete all the data directories. It takes longer, but then I know the index is 100% correct for the version and config I'm running. I'll reiterate that this whole idea could be 100% wrong. Thanks, Shawn
Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
The answer meant it was most likely something user has done not quite understanding Solr's behavior. Not a bug. I'd ignore that case and just explain what your issue actually is. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 14:43, Pedro Figueiredo wrote: > Hi guys, > > > > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) > with status Resolved, but the resolution is not identified in the issue. > > I am facing the exact same problem.. and not able to identified the > solution. > > > > In the last comment of the issue, is said that this kind of questions should > be done in the solr-user mailing list.. > > So anyone. I'll appreciate any kind of help. > > > > Thanks is advanced! > > > > Best regards, > > Pedro Figueiredo >
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Do you have any really old segments in that index? Could be worth trying to optimize them down to one in latest format first. Like Shawn, this is just a "one more idea" proposal. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 14:47, shamik wrote: > Well, I think I've narrowed down the issue. The error is happening when I'm > trying to do a rolling update from Solr 4.7 (which is our current version) > to 5.0 . I'm able to re-produce this couple of times. If I do a fresh index > on a 5.0, it works. Not sure if there's any other way to mitigate it. > > I'll appreciate if someone can share their experience on the same. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192706.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: [Poll]: User need for Solr security
If you cannot trust your root users you probably have bigger problems than with search... I think it has been suggested to encrypt on codec or directory level as well. Yep, here is the JIRA https://issues.apache.org/jira/browse/LUCENE-2228 :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 12. mar. 2015 kl. 16.22 skrev Erick Erickson : > > About <1>. Gotta be careful here about what would be promised. You > really _can't_ encrypt the _indexed_ terms in a meaningful way and > still search. And, as you well know, you can reconstruct documents > from the indexed terms. It's lossy, but still coherent enough to give > security folks fits. > > For instance, to do a wildcard search I need to have the "run" in > "run" match "running", "runner" "runs" etc. Any but trivial encryption > will break that, and the trivial encryption is easy to break. > > So putting all this over an encrypting filesystem is an approach > that's often used. > > FWIW > > > On Thu, Mar 12, 2015 at 5:22 AM, Dmitry Kan wrote: >> Hi, >> >> Things you have mentioned would be useful for our use-case. >> >> On top we've seen these two requests for securing Solr: >> >> 1. Encrypting the index (with a customer private key for instance). There >> are certainly other ways to go about this, like using virtual private >> clouds, but having the feature in solr could allow multitenant Solr >> installations. >> >> 2. ACLs: giving access rights to parts of the index / document sets >> depending on the user access rights. >> >> >> >> On Thu, Mar 12, 2015 at 1:32 PM, Jan Høydahl wrote: >> >>> Hi, >>> >>> Securing various Solr APIs has once again surfaced as a discussion in the >>> developer list. See e.g. SOLR-7236 >>> Would be useful to get some feedback from Solr users about needs "in the >>> field". >>> >>> Please reply to this email and let us know what security aspect(s) would >>> be most important for your company to see supported in a future version of >>> Solr. >>> Examples: Local user management, AD/LDAP integration, SSL, authenticated >>> login to Admin UI, authorization for Admin APIs, e.g. admin user vs >>> read-only user etc >>> >>> -- >>> Jan Høydahl, search solution architect >>> Cominvent AS - www.cominvent.com >>> >>> >> >> >> -- >> Dmitry Kan >> Luke Toolbox: http://github.com/DmitryKey/luke >> Blog: http://dmitrykan.blogspot.com >> Twitter: http://twitter.com/dmitrykan >> SemanticAnalyzer: www.semanticanalyzer.info
RE: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
Hello Alex, I'm trying to add a new document, using solrj and the error "This IndexSchema is not mutable" is raised when inserting the document in the solr index. My index in solr, is configured with classicSchemaIndexFactory. If I change it to AutoManaged the insert is done without any problems. I believe that there is no mutable configuration (true or false) for ClassicSchema as for AutoManaged. The document does not have any new field, all fields are specified in the schema.xml file. Any thoughts!? Thanks! Pedro Figueiredo De: Alexandre Rafalovitch [arafa...@gmail.com] Enviado: quinta-feira, 12 de Março de 2015 19:04 Para: solr-user Assunto: Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory The answer meant it was most likely something user has done not quite understanding Solr's behavior. Not a bug. I'd ignore that case and just explain what your issue actually is. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 14:43, Pedro Figueiredo wrote: > Hi guys, > > > > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) > with status Resolved, but the resolution is not identified in the issue. > > I am facing the exact same problem.. and not able to identified the > solution. > > > > In the last comment of the issue, is said that this kind of questions should > be done in the solr-user mailing list.. > > So anyone. I'll appreciate any kind of help. > > > > Thanks is advanced! > > > > Best regards, > > Pedro Figueiredo >
SSD endurance
For those who have not yet taken the leap to SSD goodness because they are afraid of flash wear, the burnout test from The Tech Report seems worth a read. The short story is that they wrote data to the drives until they wore out. All tested drives survived considerably longer than guaranteed, but 4/6 failed catastrophically when they did die. http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead I am disappointed about the catastrophic failures. One of the promises of SSDs was graceful end of life by switching to read-only mode. Some of them did give warnings before the end, but I wonder how those are communicated in a server environment? Regarding Lucene/Solr, the write pattern when updating an index is benign to SSDs: Updates are relatively bulky, rather than the evil constantly-flip-random-single-bits-and-flush pattern of databases. With segments being immutable, the bird's eye view is that Lucene creates and deletes large files, which makes it possible for the SSD's wear-leveler to select the least-used flash sectors for new writes: The write pattern over time is not too far from the one that The Tech Report tested with. - Toke Eskildsen Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes.
RE: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
what does your schema.xml look like? what does your solrconfig.xml look like? what does the document you are indexing look like? what is the full error with stack trace from your server logs? details matter. https://wiki.apache.org/solr/UsingMailingLists : Date: Thu, 12 Mar 2015 20:27:05 + : From: Pedro Figueiredo : Reply-To: solr-user@lucene.apache.org : To: "solr-user@lucene.apache.org" : Subject: RE: error message This IndexSchema is not mutable with a : classicSchemaIndexFactory : : Hello Alex, : : I'm trying to add a new document, using solrj and the error "This IndexSchema is not mutable" is raised when inserting the document in the solr index. : My index in solr, is configured with classicSchemaIndexFactory. : If I change it to AutoManaged the insert is done without any problems. : : I believe that there is no mutable configuration (true or false) for ClassicSchema as for AutoManaged. : : The document does not have any new field, all fields are specified in the schema.xml file. : : Any thoughts!? : : Thanks! : Pedro Figueiredo : : De: Alexandre Rafalovitch [arafa...@gmail.com] : Enviado: quinta-feira, 12 de Março de 2015 19:04 : Para: solr-user : Assunto: Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory : : The answer meant it was most likely something user has done not quite : understanding Solr's behavior. Not a bug. I'd ignore that case and : just explain what your issue actually is. : : Regards, :Alex. : : Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: : http://www.solr-start.com/ : : : On 12 March 2015 at 14:43, Pedro Figueiredo : wrote: : > Hi guys, : > : > : > : > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) : > with status Resolved, but the resolution is not identified in the issue. : > : > I am facing the exact same problem.. and not able to identified the : > solution. : > : > : > : > In the last comment of the issue, is said that this kind of questions should : > be done in the solr-user mailing list.. : > : > So anyone. I'll appreciate any kind of help. : > : > : > : > Thanks is advanced! : > : > : > : > Best regards, : > : > Pedro Figueiredo : > : -Hoss http://www.lucidworks.com/
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Wow, "optimize" worked like a charm. This really addressed the docvalues issue. A follow-up question, is it recommended to run optimize in a Production Solr index ? Also, in a Sorl cloud mode, do we need to run optimize on each instance / each shard / any instance ? Appreciate your help Alex. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Manual optimize is no longer needed for modern Solr. It does great optimization automatically. The only reason I recommended it here is to make sure that all segments are brought up to the latest version and the deleted documents are purged. That's something that also would happen automatically eventually, but "eventually" was not an option for you. I am glad this helped. I am not 100% sure if you have to do it on each shard in SolrCloud mode, but I suspect so. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 17:24, shamik wrote: > Wow, "optimize" worked like a charm. This really addressed the docvalues > issue. A follow-up question, is it recommended to run optimize in a > Production Solr index ? Also, in a Sorl cloud mode, do we need to run > optimize on each instance / each shard / any instance ? > > Appreciate your help Alex. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192732.html > Sent from the Solr - User mailing list archive at Nabble.com.
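For reference, the forced merge can be kicked off through the update handler (host and collection/core name are placeholders):

  curl "http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1"

To be safe in SolrCloud you can issue the same request against each core individually, which also makes it easy to confirm every shard ends up on the new segment format.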
Best way to dump out entire solr content?
Hi All, I am having a solr cloud cluster of 20 nodes with each node having close to 20 Million records and total index size is around 400GB ( 20GB per node X 20 nodes ). I am trying to know the best way to dump out the entire solr data in say CSV format. I use successive queries by incrementing the start param with 2000 and keeping the rows as 2000 and hitting each individual servers using distrib=false so that I don't overload the top level server and causing any timeouts between top level and lower level servers. I am getting response from solr very quickly when the start param is in lower millions < 2 millions. As the start param grows towards 16 million, solr takes almost 2 to 3 minutes to return back those 2000 records for a single query. I assume this is because of skipping all the lower level index positions to get to that start index of > 16 millions and then provide the results. Is there any better way to do this? I saw cursor feature in solr pagination Wiki but it is mentioned that it is for sort on a unique field. Would it make sense for my use this to sort on my solr key field(Solr unique key field) with rows as 2000 and keep on using the nextCursorMark to dump out all the documents in csv format? Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to dump out entire solr content?
Well, it's cursor or nothing. Well, or some sort of custom code to manually read Lucene indexes (good luck with deleted items, etc). I think your understanding is correct. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 18:10, vsriram30 wrote: > Hi All, > > I am having a solr cloud cluster of 20 nodes with each node having close to > 20 Million records and total index size is around 400GB ( 20GB per node X 20 > nodes ). I am trying to know the best way to dump out the entire solr data > in say CSV format. > > I use successive queries by incrementing the start param with 2000 and > keeping the rows as 2000 and hitting each individual servers using > distrib=false so that I don't overload the top level server and causing any > timeouts between top level and lower level servers. I am getting response > from solr very quickly when the start param is in lower millions < 2 > millions. As the start param grows towards 16 million, solr takes almost 2 > to 3 minutes to return back those 2000 records for a single query. I assume > this is because of skipping all the lower level index positions to get to > that start index of > 16 millions and then provide the results. > > Is there any better way to do this? I saw cursor feature in solr pagination > Wiki but it is mentioned that it is for sort on a unique field. Would it > make sense for my use this to sort on my solr key field(Solr unique key > field) with rows as 2000 and keep on using the nextCursorMark to dump out > all the documents in csv format? > > Thanks, > Sriram > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734.html > Sent from the Solr - User mailing list archive at Nabble.com.
RE: SSD endurance
Thanks for sharing Toke! Reliability should not be a problem for a Solr cloud environment. A corrupted index cannot be loaded due to exceptions so the core should not enter an active state. However, what would happen if parts of the data become corrupted but can still be processed by the codec? I don't even know if the data has a CRC check to guard against such madness? Markus -Original message- > From:Toke Eskildsen > Sent: Thursday 12th March 2015 21:33 > To: solr-user > Subject: SSD endurance > > For those who have not yet taken the leap to SSD goodness because they are > afraid of flash wear, the burnout test from The Tech Report seems worth a > read. The short story is that they wrote data to the drives until they wore > out. All tested drives survived considerably longer than guaranteed, but 4/6 > failed catastrophically when they did die. > > http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead > > I am disappointed about the catastrophic failures. One of the promises of > SSDs was graceful end of life by switching to read-only mode. Some of them > did give warnings before the end, but I wonder how those are communicated in > a server environment? > > > Regarding Lucene/Solr, the write pattern when updating an index is benign to > SSDs: Updates are relatively bulky, rather than the evil > constantly-flip-random-single-bits-and-flush pattern of databases. With > segments being immutable, the bird's eye view is that Lucene creates and > deletes large files, which makes it possible for the SSD's wear-leveler to > select the least-used flash sectors for new writes: The write pattern over > time is not too far from the one that The Tech Report tested with. > > - Toke Eskildsen > Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. >
Re: [Poll]: User need for Solr security
Hi, I’m currently working with indexes that need document level security. Based on the user logged in, query results would omit documents that this user doesn’t have access to, with LDAP integration and such. I think that would be nice to have on a future Solr release. Henrique. > On Mar 12, 2015, at 7:32 AM, Jan Høydahl wrote: > > Hi, > > Securing various Solr APIs has once again surfaced as a discussion in the > developer list. See e.g. SOLR-7236 > Would be useful to get some feedback from Solr users about needs "in the > field". > > Please reply to this email and let us know what security aspect(s) would be > most important for your company to see supported in a future version of Solr. > Examples: Local user management, AD/LDAP integration, SSL, authenticated > login to Admin UI, authorization for Admin APIs, e.g. admin user vs read-only > user etc > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com >
Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
On 3/12/2015 12:43 PM, Pedro Figueiredo wrote: > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) > with status Resolved, but the resolution is not identified in the issue. > > I am facing the exact same problem.. and not able to identified the > solution. I believe the problem is that you are using ClassicSchemaIndexFactory, but you did not remove AddSchemaFieldsUpdateProcessorFactory from the updateRequestProcessorChain config. That update processor requires the managed schema factory. Chances are that you started with the data-driven example config set and then realized you did not need/want the managed schema, so you switched to the classic factory. If you do not want the managed schema, you should probably start with the techproducts example rather than the data-driven example. I think we need to add some info to the schemaFactory comment in the data-driven example config so that people know they need to also modify the update processor chain when they want to disable the Schema API. Thanks, Shawn
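Concretely, in the stock data_driven example the relevant part of solrconfig.xml looks roughly like the trimmed fragment below; when switching to ClassicIndexSchemaFactory, the AddSchemaFieldsUpdateProcessorFactory processor (or the whole chain, plus the update.chain default that points at it, if there is one) needs to be removed:

  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    ...
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">...</processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>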
Re: SSD endurance
Lucene 5 has added a lot of various CRCs to catch index corruption situations. I don't know if it is 'perfect', but there was certainly a lot of work. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 18:39, Markus Jelsma wrote: > Thanks for sharing Toke! > > Reliability should not be a problem for a Solr cloud environment. A corrupted > index cannot be loaded due to exceptions so the core should not enter an > active state. However, what would happen if parts of the data become > corrupted but can still be processed by the codec? I don't even know if the > data has a CRC check to guard against such madness? > > Markus > > -Original message- >> From:Toke Eskildsen >> Sent: Thursday 12th March 2015 21:33 >> To: solr-user >> Subject: SSD endurance >> >> For those who have not yet taken the leap to SSD goodness because they are >> afraid of flash wear, the burnout test from The Tech Report seems worth a >> read. The short story is that they wrote data to the drives until they wore >> out. All tested drives survived considerably longer than guaranteed, but 4/6 >> failed catastrophically when they did die. >> >> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead >> >> I am disappointed about the catastrophic failures. One of the promises of >> SSDs was graceful end of life by switching to read-only mode. Some of them >> did give warnings before the end, but I wonder how those are communicated in >> a server environment? >> >> >> Regarding Lucene/Solr, the write pattern when updating an index is benign to >> SSDs: Updates are relatively bulky, rather than the evil >> constantly-flip-random-single-bits-and-flush pattern of databases. With >> segments being immutable, the bird's eye view is that Lucene creates and >> deletes large files, which makes it possible for the SSD's wear-leveler to >> select the least-used flash sectors for new writes: The write pattern over >> time is not too far from the one that The Tech Report tested with. >> >> - Toke Eskildsen >> Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. >>
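If you want to verify an index's checksums by hand, the stock CheckIndex tool can be pointed at an index directory that is not currently being written to; the jar version and path below are placeholders:

  java -cp lucene-core-5.0.0.jar org.apache.lucene.index.CheckIndex /var/solr/data/collection1/data/index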
RE: [Poll]: User need for Solr security
Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps but it would adhere to such a client's guidelines. I think many approaches of security in Solr/Lucene would find advocates, be it index encryption or authentication/authorization or transport security, which is now possible. I understand the reluctance of the PMC, and i agree with it, but some users would definitately benefit and it would certainly make Solr/Lucene the search platform to use for some enterprises. Markus -Original message- > From:Henrique O. Santos > Sent: Thursday 12th March 2015 23:43 > To: solr-user@lucene.apache.org > Subject: Re: [Poll]: User need for Solr security > > Hi, > > I’m currently working with indexes that need document level security. Based > on the user logged in, query results would omit documents that this user > doesn’t have access to, with LDAP integration and such. > > I think that would be nice to have on a future Solr release. > > Henrique. > > > On Mar 12, 2015, at 7:32 AM, Jan Høydahl wrote: > > > > Hi, > > > > Securing various Solr APIs has once again surfaced as a discussion in the > > developer list. See e.g. SOLR-7236 > > Would be useful to get some feedback from Solr users about needs "in the > > field". > > > > Please reply to this email and let us know what security aspect(s) would be > > most important for your company to see supported in a future version of > > Solr. > > Examples: Local user management, AD/LDAP integration, SSL, authenticated > > login to Admin UI, authorization for Admin APIs, e.g. admin user vs > > read-only user etc > > > > -- > > Jan Høydahl, search solution architect > > Cominvent AS - www.cominvent.com > > > >
RE: SSD endurance
Hello Alexandre - if you, and others, allow me to be a bit lazy right now; are there unit tests that input corrupted segments, where not the structure but the data is affected, to the codec? Thanks, Markus -Original message- > From:Alexandre Rafalovitch > Sent: Thursday 12th March 2015 23:52 > To: solr-user > Subject: Re: SSD endurance > > Lucene 5 has added a lot of various CRCs to catch index corruption > situations. I don't know if it is 'perfect', but there was certainly a > lot of work. > > Regards, > Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 12 March 2015 at 18:39, Markus Jelsma wrote: > > Thanks for sharing Toke! > > > > Reliability should not be a problem for a Solr cloud environment. A > > corrupted index cannot be loaded due to exceptions so the core should not > > enter an active state. However, what would happen if parts of the data > > become corrupted but can still be processed by the codec? I don't even know > > if the data has a CRC check to guard against such madness? > > > > Markus > > > > -Original message- > >> From:Toke Eskildsen > >> Sent: Thursday 12th March 2015 21:33 > >> To: solr-user > >> Subject: SSD endurance > >> > >> For those who have not yet taken the leap to SSD goodness because they are > >> afraid of flash wear, the burnout test from The Tech Report seems worth a > >> read. The short story is that they wrote data to the drives until they > >> wore out. All tested drives survived considerably longer than guaranteed, > >> but 4/6 failed catastrophically when they did die. > >> > >> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead > >> > >> I am disappointed about the catastrophic failures. One of the promises of > >> SSDs was graceful end of life by switching to read-only mode. Some of them > >> did give warnings before the end, but I wonder how those are communicated > >> in a server environment? > >> > >> > >> Regarding Lucene/Solr, the write pattern when updating an index is benign > >> to SSDs: Updates are relatively bulky, rather than the evil > >> constantly-flip-random-single-bits-and-flush pattern of databases. With > >> segments being immutable, the bird's eye view is that Lucene creates and > >> deletes large files, which makes it possible for the SSD's wear-leveler to > >> select the least-used flash sectors for new writes: The write pattern over > >> time is not too far from the one that The Tech Report tested with. > >> > >> - Toke Eskildsen > >> Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. > >> >
Re: backport Heliosearch features to Solr
Are there any results of off-heap cache vs JRE 8 with G1GC? On 10 March 2015 at 11:13, Alexandre Rafalovitch wrote: > Ask and you shall receive: > SOLR-7210 Off-Heap filter cache > SOLR-7211 Off-Heap field cache > SOLR-7212 Parameter substitution > SOLR-7214 JSON Facet API > SOLR-7216 JSON Request API > > Regards, >Alex. > P.s. Oh, the power of GMail filters :-) > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 9 March 2015 at 18:59, Markus Jelsma > wrote: > > Ok, so what's next? Do you intend to open issues and send the links over > here so interested persons can follow them? Clearly some would like to see > features to merge. Let's see what the PMC thinks about it :) > > > > Cheers, > > M. > > > > -Original message- > >> From:Yonik Seeley > >> Sent: Monday 9th March 2015 19:53 > >> To: solr-user@lucene.apache.org > >> Subject: Re: backport Heliosearch features to Solr > >> > >> Thanks everyone for voting! > >> > >> Result charts (note that these auto-generated charts don't show blanks > >> as equivalent to "0") > >> > https://docs.google.com/forms/d/1gaMpNpHVdquA3q75yiFhqZhAWdWB-K6N8Jh3dBbWAU8/viewanalytics > >> > >> Raw results spreadsheet (correlations can be interesting), and > >> percentages at the bottom. > >> > https://docs.google.com/spreadsheets/d/1uZ2qgOaKx1ZxJ_NKwj2zIAYFQ9fp8OrEPI5hqadcPeY/ > >> > >> -Yonik > >> > >> > >> On Sun, Mar 1, 2015 at 4:50 PM, Yonik Seeley wrote: > >> > As many of you know, I've been doing some work in the experimental > >> > "heliosearch" fork of Solr over the past year. I think it's time to > >> > bring some more of those changes back. > >> > > >> > So here's a poll: Which Heliosearch features do you think should be > >> > brought back to Apache Solr? > >> > > >> > http://bit.ly/1E7wi1Q > >> > (link to google form) > >> > > >> > -Yonik > >> > -- Damien Kamerman
Re: Best way to dump out entire solr content?
Thanks Alex for quick response. I wanted to avoid reading the lucene index to prevent complications of merging deleted info. Also I would like to do this on very frequent basis as well like once in two or three days. I am wondering if the issues that I faced while scraping the index towards higher order of millions will get resolved with Cursor. Do you think using cursor to scrap solr with sort on unique key field is better than not using it and does it not do the same skip operations and take more time as without using cursor? Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734p4192745.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to dump out entire solr content?
Without cursor, you are rerunning a full search every time. So, slow down is entirely expected. With cursor, you do not. It does an internal skip based on cursor value. I think the sort is there to ensure the value is stable. Basically, you need to use the cursor. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 19:05, vsriram30 wrote: > Thanks Alex for quick response. I wanted to avoid reading the lucene index to > prevent complications of merging deleted info. Also I would like to do this > on very frequent basis as well like once in two or three days. > > I am wondering if the issues that I faced while scraping the index towards > higher order of millions will get resolved with Cursor. Do you think using > cursor to scrap solr with sort on unique key field is better than not using > it and does it not do the same skip operations and take more time as without > using cursor? > > Thanks, > Sriram > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734p4192745.html > Sent from the Solr - User mailing list archive at Nabble.com.
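A minimal sketch of the cursor loop (host, collection and field names are placeholders; the sort must include the uniqueKey field, and the first request uses cursorMark=*):

  curl "http://localhost:8983/solr/collection1/select?q=*:*&fl=id,field_a,field_b&sort=id+asc&rows=2000&wt=json&cursorMark=*"

Take nextCursorMark from each response, send it back as cursorMark on the next request, and stop when the value returned equals the one you sent. Since the cursor token comes back in the response body, it may be easiest to keep wt=json for the bookkeeping and write the CSV rows yourself.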
Re: SSD endurance
Well, I don't know this issue to such level of granularity. Perhaps others do. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 18:57, Markus Jelsma wrote: > Hello Alexandre - if you, and others, allow me to be a bit lazy right now; > are there unit tests that input corrupted segments, where not the structure > but the data is affected, to the codec? > > Thanks, > Markus > > > > -Original message- >> From:Alexandre Rafalovitch >> Sent: Thursday 12th March 2015 23:52 >> To: solr-user >> Subject: Re: SSD endurance >> >> Lucene 5 has added a lot of various CRCs to catch index corruption >> situations. I don't know if it is 'perfect', but there was certainly a >> lot of work. >> >> Regards, >> Alex. >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >> http://www.solr-start.com/ >> >> >> On 12 March 2015 at 18:39, Markus Jelsma wrote: >> > Thanks for sharing Toke! >> > >> > Reliability should not be a problem for a Solr cloud environment. A >> > corrupted index cannot be loaded due to exceptions so the core should not >> > enter an active state. However, what would happen if parts of the data >> > become corrupted but can still be processed by the codec? I don't even >> > know if the data has a CRC check to guard against such madness? >> > >> > Markus >> > >> > -Original message- >> >> From:Toke Eskildsen >> >> Sent: Thursday 12th March 2015 21:33 >> >> To: solr-user >> >> Subject: SSD endurance >> >> >> >> For those who have not yet taken the leap to SSD goodness because they >> >> are afraid of flash wear, the burnout test from The Tech Report seems >> >> worth a read. The short story is that they wrote data to the drives until >> >> they wore out. All tested drives survived considerably longer than >> >> guaranteed, but 4/6 failed catastrophically when they did die. >> >> >> >> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead >> >> >> >> I am disappointed about the catastrophic failures. One of the promises of >> >> SSDs was graceful end of life by switching to read-only mode. Some of >> >> them did give warnings before the end, but I wonder how those are >> >> communicated in a server environment? >> >> >> >> >> >> Regarding Lucene/Solr, the write pattern when updating an index is benign >> >> to SSDs: Updates are relatively bulky, rather than the evil >> >> constantly-flip-random-single-bits-and-flush pattern of databases. With >> >> segments being immutable, the bird's eye view is that Lucene creates and >> >> deletes large files, which makes it possible for the SSD's wear-leveler >> >> to select the least-used flash sectors for new writes: The write pattern >> >> over time is not too far from the one that The Tech Report tested with. >> >> >> >> - Toke Eskildsen >> >> Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. >> >> >>
RE: backport Heliosearch features to Solr
Hello - i would assume off-heap would out perform any heap based data structure. G1 is only useful if you deal with very large heaps, and it eats CPU at the same time. As much as G1 is better than CMS in same cases, you would still have less wasted CPU time and resp. less STW events. Anyway. if someone has a setup at hand to provide details, please do :) -Original message- > From:Damien Kamerman > Sent: Friday 13th March 2015 0:02 > To: solr-user@lucene.apache.org > Subject: Re: backport Heliosearch features to Solr > > Are there any results of off-heap cache vs JRE 8 with G1GC? > > On 10 March 2015 at 11:13, Alexandre Rafalovitch wrote: > > > Ask and you shall receive: > > SOLR-7210 Off-Heap filter cache > > SOLR-7211 Off-Heap field cache > > SOLR-7212 Parameter substitution > > SOLR-7214 JSON Facet API > > SOLR-7216 JSON Request API > > > > Regards, > >Alex. > > P.s. Oh, the power of GMail filters :-) > > > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > > http://www.solr-start.com/ > > > > > > On 9 March 2015 at 18:59, Markus Jelsma > > wrote: > > > Ok, so what's next? Do you intend to open issues and send the links over > > here so interested persons can follow them? Clearly some would like to see > > features to merge. Let's see what the PMC thinks about it :) > > > > > > Cheers, > > > M. > > > > > > -Original message- > > >> From:Yonik Seeley > > >> Sent: Monday 9th March 2015 19:53 > > >> To: solr-user@lucene.apache.org > > >> Subject: Re: backport Heliosearch features to Solr > > >> > > >> Thanks everyone for voting! > > >> > > >> Result charts (note that these auto-generated charts don't show blanks > > >> as equivalent to "0") > > >> > > https://docs.google.com/forms/d/1gaMpNpHVdquA3q75yiFhqZhAWdWB-K6N8Jh3dBbWAU8/viewanalytics > > >> > > >> Raw results spreadsheet (correlations can be interesting), and > > >> percentages at the bottom. > > >> > > https://docs.google.com/spreadsheets/d/1uZ2qgOaKx1ZxJ_NKwj2zIAYFQ9fp8OrEPI5hqadcPeY/ > > >> > > >> -Yonik > > >> > > >> > > >> On Sun, Mar 1, 2015 at 4:50 PM, Yonik Seeley wrote: > > >> > As many of you know, I've been doing some work in the experimental > > >> > "heliosearch" fork of Solr over the past year. I think it's time to > > >> > bring some more of those changes back. > > >> > > > >> > So here's a poll: Which Heliosearch features do you think should be > > >> > brought back to Apache Solr? > > >> > > > >> > http://bit.ly/1E7wi1Q > > >> > (link to google form) > > >> > > > >> > -Yonik > > >> > > > > > > -- > Damien Kamerman >
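If anyone does want to run that comparison, the Solr 5 start scripts let you switch collectors via GC_TUNE in solr.in.sh; the flags below are just a starting point for testing, not a recommendation:

  GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"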
Re: Best way to dump out entire solr content?
Thanks Alex for the explanation. Actually, since I am scraping all the contents from Solr, I am doing a generic query of *:*, so I think it should not take so much time, right? But as you say, the internal skips using the cursor are probably more efficient than skipping by increasing the start, so I will use the cursors. Kindly correct me if my understanding is not right. Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734p4192750.html Sent from the Solr - User mailing list archive at Nabble.com.
how to store _text field
Hi folks, I googled and tried without success so I ask you: how can I modify the setting of a field to store it ? It is interesting to note that I did not add _text field so I guess it is a default one. Maybe it is normal that it is not showed on the result but actually this is my real problem. It could be grand also to copy it in a new field but I do not know how to do it with the last Solr (5) and the new kind of schema. I know that I have to use curl but I do not know how to use it to copy a field. Thank you in advance! Cheers, Mirko
Re: how to store _text field
Wait, step back. This is confusing. What's your real problem you are trying to solve? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 19:50, Mirko Torrisi wrote: > Hi folks, > > I googled and tried without success so I ask you: how can I modify the > setting of a field to store it ? > > It is interesting to note that I did not add _text field so I guess it is a > default one. Maybe it is normal that it is not showed on the result but > actually this is my real problem. It could be grand also to copy it in a new > field but I do not know how to do it with the last Solr (5) and the new kind > of schema. I know that I have to use curl but I do not know how to use it to > copy a field. > > Thank you in advance! > Cheers, > > Mirko
Re: [Poll]: User need for Solr security
I would love to see record level (or even field level) restricted access in Solr / Lucene. This should be group level, LDAP like or some rule base (which can be dynamic). If the solution means having a second core, so be it. The following is the closest I found: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security but I cannot use Manifold CF (Connector Framework). Does anyone know how Manifold does it? - MJ -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, March 12, 2015 6:51 PM To: solr-user@lucene.apache.org Subject: RE: [Poll]: User need for Solr security Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps but it would adhere to such a client's guidelines. I think many approaches of security in Solr/Lucene would find advocates, be it index encryption or authentication/authorization or transport security, which is now possible. I understand the reluctance of the PMC, and i agree with it, but some users would definitately benefit and it would certainly make Solr/Lucene the search platform to use for some enterprises. Markus -Original message- > From:Henrique O. Santos > Sent: Thursday 12th March 2015 23:43 > To: solr-user@lucene.apache.org > Subject: Re: [Poll]: User need for Solr security > > Hi, > > I’m currently working with indexes that need document level security. Based > on the user logged in, query results would omit documents that this user > doesn’t have access to, with LDAP integration and such. > > I think that would be nice to have on a future Solr release. > > Henrique. > > > On Mar 12, 2015, at 7:32 AM, Jan Høydahl wrote: > > > > Hi, > > > > Securing various Solr APIs has once again surfaced as a discussion > > in the developer list. See e.g. SOLR-7236 Would be useful to get some > > feedback from Solr users about needs "in the field". > > > > Please reply to this email and let us know what security aspect(s) would be > > most important for your company to see supported in a future version of > > Solr. > > Examples: Local user management, AD/LDAP integration, SSL, > > authenticated login to Admin UI, authorization for Admin APIs, e.g. > > admin user vs read-only user etc > > > > -- > > Jan Høydahl, search solution architect Cominvent AS - > > www.cominvent.com > > > >
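One common pattern for document-level filtering, independent of ManifoldCF, is to index the allowed groups into a multi-valued field and have the application (or a custom SearchComponent) append a filter query built from the logged-in user's groups, e.g. resolved via LDAP. Field and group names below are made up:

  fq={!terms f=acl_groups}engineering,sales

As far as I know the ManifoldCF Solr plugin follows the same basic idea, with allow/deny token fields populated at crawl time.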
Whole RAM consumed while Indexing.
Hello, I have written a python script to do 2 documents indexing each time on Solr. I have 28 GB RAM with 8 CPU. When I started indexing, at that time 15 GB RAM was freed. While indexing, all RAM is consumed but **not** a single document is indexed. Why so? And it through *HTTPError: HTTP Error 503: Service Unavailable* in python script. I think it is due to heavy load on Zookeeper by which all nodes went down. I am not sure about that. Any help please.. Or anything else is happening.. And how to overcome this issue. Please assist me towards right path. Thanks.. Warm Regards, Nitin Solanki
Re: Whole RAM consumed while Indexing.
What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 23:19, Nitin Solanki wrote: > Hello, > I have written a python script to do 2 documents indexing > each time on Solr. I have 28 GB RAM with 8 CPU. > When I started indexing, at that time 15 GB RAM was freed. While indexing, > all RAM is consumed but **not** a single document is indexed. Why so? > And it through *HTTPError: HTTP Error 503: Service Unavailable* in python > script. > I think it is due to heavy load on Zookeeper by which all nodes went down. > I am not sure about that. Any help please.. > Or anything else is happening.. > And how to overcome this issue. > Please assist me towards right path. > Thanks.. > > Warm Regards, > Nitin Solanki
Re: Whole RAM consumed while Indexing.
Hi Alexandre, *Hard Commit* is:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

*Soft Commit* is:

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
  </autoSoftCommit>

And I am committing 2 documents each time. Is this a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch wrote: > What's your commit strategy? Explicit commits? Soft commits/hard > commits (in solrconfig.xml)? > > Regards, >Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 12 March 2015 at 23:19, Nitin Solanki wrote: > > Hello, > > I have written a python script to do 2 documents indexing > > each time on Solr. I have 28 GB RAM with 8 CPU. > > When I started indexing, at that time 15 GB RAM was freed. While > indexing, > > all RAM is consumed but **not** a single document is indexed. Why so? > > And it through *HTTPError: HTTP Error 503: Service Unavailable* in python > > script. > > I think it is due to heavy load on Zookeeper by which all nodes went > down. > > I am not sure about that. Any help please.. > > Or anything else is happening.. > > And how to overcome this issue. > > Please assist me towards right path. > > Thanks.. > > > > Warm Regards, > > Nitin Solanki >
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Thanks Shawn and Erick for explanation... On Thu, Mar 12, 2015 at 9:02 PM, Shawn Heisey wrote: > On 3/12/2015 9:18 AM, Erick Erickson wrote: > > By and large, I really never use linking. But it's about associating a > > config set > > you've _already_ uploaded with a collection. > > > > So uploading is pushing the configset from your local machine up to > Zookeeper, > > and linking is using that uploaded, named configuration with an > > arbitrary collection. > > > > But usually you just make this association when creating the collection. > > The primary use case that I see for linkconfig is in testing upgrades to > configurations. So let's say you have a production collection that uses > a config that you name fooV1 for foo version 1. You can build a test > collection that uses a config named fooV2, work out all the bugs, and > then when you're ready to deploy it, you can use linkconfig to link your > production collection to fooV2, reload the collection, and you're using > the new config. I haven't discussed here how to handle the situation > where a reindex is required. > > One thing you CAN do is run linkconfig for a collection that doesn't > exist yet, and then you don't need to include collection.configName when > you create the collection, because the link is already present in > zookeeper. I personally don't like doing things this way, but I'm > pretty sure it works. > > Thanks, > Shawn > >
Parsing error on space
Hi, I want to retrieve the parent documents which contain "Test Street" in the street field, or whose children contain "Test Street" in the childStreet field. So I've used the following syntax: q=street:"Test Street" OR {!parent which="type:parent"}childStreet:"Test Street" The parent query after the OR condition is not executed; I'm getting records based on the first clause alone. So I tried using a filter query instead: q="*:*"&fq=street:"Test Street" OR {!parent which="type:parent"}childStreet:"Test Street" This query retrieves records based on both conditions, but when the query string contains multiple words like "Test Street" I get an EOF exception; it fails to parse because of the space. Any approach to overcome this? Thanks in advance, Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/Parsing-error-on-space-tp4192796.html Sent from the Solr - User mailing list archive at Nabble.com.
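One way to keep the whitespace inside the nested clause from breaking the parse is to wrap the block-join in a nested query and pass the child clause through a dereferenced parameter, roughly like this (field names reused from above, URL-encoding omitted):

  q=street:"Test Street" OR _query_:"{!parent which='type:parent' v=$childq}"&childq=childStreet:"Test Street"

The v=$childq form keeps the quoted phrase out of the local-params string, so the space no longer needs escaping; the same trick should also work inside an fq.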