Re: Cores and and ranking (search quality)
SOLR-1632 will certainly help. But trying to predict whether your core A or core B will appear first doesn't really seem like a good use of time. If you actually have a setup like you describe, add &debug=all to your query on both cores and you'll see all the gory detail of how the scores are calculated, providing a definitive answer in _your_ situation. Best, Erick On Mon, Mar 9, 2015 at 5:44 AM, wrote: > (reposing this to see if anyone can help) > > > Help me understand this better (regarding ranking). > > If I have two docs that are 100% identical with the exception of uid (which > is stored but not indexed). In a single core setup, if I search "xyz" such > that those 2 docs end up ranking as #1 and #2. When I switch over to two > core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to > core-B (which has 100,000 records). > > Now, are you saying in 2 core setup if I search on "xyz" (just like in singe > core setup) this time I will not see doc-A and doc-B as #1 and #2 in ranking? > That is, are you saying doc-A may now be somewhere at the top / bottom far > away from doc-B? If so, which will be #1: the doc off core-A (that has 10 > records) or doc-B off core-B (that has 100,000 records)? > > If I got all this right, are you saying SOLR-1632 will fix this issue such > that the end result will now be as if I had 1 core? > > - MJ > > > -Original Message- > From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] > Sent: Thursday, March 5, 2015 9:06 AM > To: solr-user@lucene.apache.org > Subject: Re: Cores and and ranking (search quality) > > On Thu, 2015-03-05 at 14:34 +0100, johnmu...@aol.com wrote: >> My question is this: if I put my data in multiple cores and use >> distributed search will the ranking be different if I had all my data >> in a single core? > > Yes, it will be different. The practical impact depends on how homogeneous > your data are across the shards and how large your shards are. If you have > small and dissimilar shards, your ranking will suffer a lot. > > Work is being done to remedy this: > https://issues.apache.org/jira/browse/SOLR-1632 > >> Also, will facet and more-like-this quality / result be the same? > > It is not formally guaranteed, but for most practical purposes, faceting on > multi-shards will give you the same results as single-shards. > > I don't know about more-like-this. My guess is that it will be affected in > the same way that standard searches are. > >> Also, reading the distributed search wiki >> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr >> does the search and result merging (all I have to do is issue a >> search), is this correct? > > Yes. From a user-perspective, searches are no different. > > - Toke Eskildsen, State and University Library, Denmark >
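For reference, a minimal sketch of the debug query Erick describes, assuming a local Solr, a core named core-A and the search term "xyz" (host, port and core name are illustrative only):

http://localhost:8983/solr/core-A/select?q=xyz&fl=id,score&debug=all&wt=json

Running the same URL against core-B and comparing the "explain" section of the two responses shows, per document, how term frequency, inverse document frequency and field norms combine into the final score, which is exactly the part that can differ between cores of very different sizes.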
Re: Solr TCP layer
IMO each mega of memory saved has more impact that 0.001 less in latency … an OOM is killer, a lag of 2 second … is not catastrophic. — /Yago Riveiro On Tue, Mar 10, 2015 at 4:03 PM, Erick Erickson wrote: > Just to pile on: > I admire your bravery! I'll add to the other comments only by saying > that _before_ you start down this path, you really need to articulate > the benefit/cost analysis. "to gain a little more communications > efficiency" will be a pretty hard sell due to the reasons Shawn > outlined. This is hugely risky and would require a lot of work for > as-yet-unarticulated benefits. > There are lots and lots of other things to work on of significantly > greater impact IMO. How would you like to work on something to help > manage Solr's memory usage for instance ;)? > Best, > Erick > On Mon, Mar 9, 2015 at 9:24 AM, Reitzel, Charles > wrote: >> A couple thoughts: >> 0. Interesting topic. >> 1. But perhaps better suited to the dev list. >> 2. Given the existing architecture, shouldn't we be looking to transport >> projects, e.g. Jetty, Apache HttpComponents, for support of new socket or >> even HTTP layer protocols? >> 3. To the extent such support exists, then integration work is still needed >> at the solr level. Shalin, is this your intention? >> >> Also, for those of us not tracking protocol standards in detail, can you >> describe the benefits to Solr users of http/2? >> >> Do you expect HTTP/2 to be transparent at the application layer? >> >> -Original Message- >> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] >> Sent: Monday, March 09, 2015 6:23 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr TCP layer >> >> Hi Saumitra, >> >> I've been thinking of adding http/2 support for inter node communication >> initially and client server communication next in Solr. There's a patch for >> SPDY support but now that spdy is deprecated and http/2 is the new standard >> we need to wait for Jetty 9.3 to release. That will take care of many >> bottlenecks in solrcloud communication. The current trunk is already using >> jetty 9.2.x which has support for the draft http/2 spec. >> >> A brand new async TCP layer based on netty can be considered but that's a >> huge amount of work considering our need to still support simple http, SSL >> etc. Frankly for me that effort is better spent optimizing the routing layer. >> On 09-Mar-2015 1:37 am, "Saumitra Srivastav" >> wrote: >> >>> Dear Solr Contributors, >>> >>> I want to start working on adding a TCP layer for client to node and >>> inter-node communication. >>> >>> I am not up to date on recent changes happening to Solr. So before I >>> start looking into code, I would like to know if there is already some >>> work done in this direction, which I can reuse. Are there any know >>> challenges/complexities? >>> >>> I would appreciate any help to kick start this effort. Also, what >>> would be the best way to discuss and get feedback on design from >>> contributors? Open a JIRA?? >>> >>> Regards, >>> Saumitra >>> >>> >>> >>> >>> >>> -- >>> View this message in context: >>> http://lucene.472066.n3.nabble.com/Solr-TCP-layer-tp4191715.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >> >> * >> This e-mail may contain confidential or privileged information. >> If you are not the intended recipient, please notify the sender immediately >> and then delete it. >> >> TIAA-CREF >> *
Re: Combine multiple SOLR Query Results
You simply cannot compare scores from two separate queries, comparing them is meaningless. This appears to be an XY problem, you're asking _how_ to do something without telling us _what_ the end goal here is. >From your description, I really have no idea what you're trying to do. You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Mon, Mar 9, 2015 at 7:56 AM, Reitzel, Charles wrote: > Hi AnilJayanti, > > You shouldn't need 2 separate solr queries. Just make sure both 'track > name' and 'artist name' fields are queried. Solr will rank and sort the > results for you. > > e.q. q=foo&qf=trackName,artistName > > This is preferable for a number of reasons. I will be faster and simpler. > But, also, highlight results should be better. > > hth, > Charlie > > -Original Message- > From: aniljayanti [mailto:aniljaya...@yahoo.co.in] > Sent: Monday, March 09, 2015 6:20 AM > To: solr-user@lucene.apache.org > Subject: Combine multiple SOLR Query Results > > Hi, > > I am trying to work on combine multiple SOLR query results into single > result. Below is my case. > > 1. Look up search term against ‘track name’, log results > 2. Look up search term against ‘artist name’, log results of tracks by > those > artists > 3. Combine results > 4. results by score descending order. > > Using "text_general" fieldType for both track name and artist name. > copy fields are trackname and artistname > > Plase suggest me how to write solr Query to combine two solr results into > single result. > > Thanks in advance. > > AnilJayanti > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Combine-multiple-SOLR-Query-Results-tp4191816.html > Sent from the Solr - User mailing list archive at Nabble.com. > > * > This e-mail may contain confidential or privileged information. > If you are not the intended recipient, please notify the sender immediately > and then delete it. > > TIAA-CREF > *
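For reference, a sketch of the single-query approach described above, assuming the edismax query parser (the qf parameter is interpreted by the dismax/edismax parsers rather than the default lucene parser, and field names in qf are separated by spaces); the field names are taken from the thread and the rest is illustrative:

/select?q=foo&defType=edismax&qf=trackName+artistName&fl=id,trackName,artistName,score

Both fields are searched in a single pass and Solr returns one ranked list, so there is no need to merge two result sets by score.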
Re: Solrcloud Index corruption
Ahhh, ok. When you reloaded the cores, did you do it core-by-core? Yes, but maybe we reloaded the wrong core or something like that. We also noticed that the startTime doesn't update in the admin-ui while switching between cores (you have to reload the page). We still use 4.8.1, so maybe it is fixed in a later version. We will see after our next upgrade, if not we will add an issue for it. Martin Erick Erickson schreef op 10.03.2015 18:21: Ahhh, ok. When you reloaded the cores, did you do it core-by-core? I can see how something could get dropped in that case. However, if you used the Collections API and two cores mysteriously failed to reload that would be a bug. Assuming the replicas in question were up and running at the time you reloaded. Thanks for letting us know what's going on. Erick On Tue, Mar 10, 2015 at 4:34 AM, Martin de Vries wrote: Hi, this _sounds_ like you somehow don't have indexed="true" set for the field in question. We investigated a lot more. The CheckIndex tool didn't find any error. We now think the following happened: - We changed the schema two months ago: we changed a field to indexed="true". We reloaded the cores, but two of them doesn't seem to be reloaded (maybe we forgot). - We reindexed all content. The new field worked fine. - We think the leader changed to a server that didn't reload the core - After that we field stopped working for new indexed documents Thanks for your help. Martin Erick Erickson schreef op 06.03.2015 17:02: bq: You say in our case some docs didn't made it to the node, but that's not really true: the docs can be found on the corrupted nodes when I search on ID. The docs are also complete. The problem is that the docs do not appear when I filter on certain fields this _sounds_ like you somehow don't have indexed="true" set for the field in question. But it also sounds like you're saying that search on that field works on some nodes but not on others, I'm assuming you're adding "&distrib=false" to verify this. It shouldn't be possible to have different schema.xml files on the different nodes, but you might try checking through the admin UI. Network burps shouldn't be related here. If the content is stored, then the info made it to Solr intact, so this issue shouldn't be related to that. Sounds like it may just be the bugs Mark is referencing, sorry I don't have the JIRA numbers right off. Best, Erick On Thu, Mar 5, 2015 at 4:46 PM, Shawn Heisey wrote: On 3/5/2015 3:13 PM, Martin de Vries wrote: I understand there is not a "master" in SolrCloud. In our case we use haproxy as a load balancer for every request. So when indexing every document will be sent to a different solr server, immediately after each other. Maybe SolrCloud is not able to handle that correctly? SolrCloud can handle that correctly, but currently sending index updates to a core that is not the leader of the shard will incur a significant performance hit, compared to always sending updates to the correct core. A small performance penalty would be understandable, because the request must be redirected, but what actually happens is a much larger penalty than anyone expected. We have an issue in Jira to investigate that performance issue and make it work as efficiently as possible. Indexing batches of documents is recommended, not sending one document per update request. General performance problems with Solr itself can lead to extremely odd and unpredictable behavior from SolrCloud. 
Most often these kinds of performance problems are related in some way to memory, either the java heap or available memory in the system. http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
Re: Chaining components in request handler
Would like to do it during querying. Thanks, Ashish On Tue, Mar 10, 2015 at 11:07 PM, Alexandre Rafalovitch wrote: > Is that during indexing or during query phase? > > Indexing has UpdateRequestProcessors (e.g. > http://www.solr-start.com/info/update-request-processors/ ) > Query has Components (e.g. Faceting, MoreLIkeThis, etc) > > Or something different? > > Regards, >Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 10 March 2015 at 13:34, Ashish Mukherjee > wrote: > > Hello, > > > > I would like to create a request handler which chains components in a > > particular sequence to return the result, similar to a Unix pipe. > > > > eg. Component 1 -> result1 -> Component 2 -> result2 > > > > result2 is final result returned. > > > > Component 1 may be a standard component, Component 2 may be out of the > box. > > > > Is there any tutorial which describes how to wire together components > like > > this in a single handler? > > > > Regards, > > Ashish >
Re: default heap size for solr 5.0? (-Xmx param)
Actually the reason I did not use the solr script was that I didn't really get how to make a window service out of it from nssm.exe. I tried doing a .bat that called solr with start -p 8983 but seems it just loops my command rather then run it. Thanks for the help / Karl On 11 March 2015 at 23:08, Erick Erickson wrote: > Well, the new way will be the only way eventually, so either you learn > the old way then switch or learn it now ;)... > > But if you insist you could start with a heap size of 4G like this: > > java -Xmx4G -Xms4G -jar start.jar > > Best, > Erick > > > On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén > wrote: > > Thanks! > > > > I am using the old way and I see no reason to switch really? > > > > cheers > > > > On 11 March 2015 at 20:18, Shawn Heisey wrote: > > > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: > >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max > heap > >> > size? I can't find it anywhere. > >> > > >> > Also, where whould you activate jmx? Would like to be able to use > >> visualvm > >> > in the future I imagine. > >> > > >> > I have a custom nssm thing going that installs it as a window service > >> that > >> > simply calls java -jar start.jar > >> > >> The default heap size is 512m. This is hardcoded in the bin/solr > >> script. You can override that with the -m parameter. > >> > >> If you are not using the bin/solr script and are instead doing the old > >> "java -jar start.jar" startup, the default heap size is determined by > >> the version of Java you are running. > >> > >> Thanks, > >> Shawn > >> > >> >
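For what it's worth, a minimal sketch of starting Solr 5 on Windows through the bundled script with an explicit heap, assuming port 8983 and an illustrative 4g heap:

bin\solr.cmd start -f -p 8983 -m 4g

The -f flag keeps Solr running in the foreground, which service wrappers such as nssm generally expect; without it the script launches Solr in the background and returns.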
Update solr schema.xml in real time for Solr 4.10.1
Hi, I understand that in Solr 5.0, they provide a REST API to do real-time update of the schema using Curl. However, I could not do that for my earlier version of Solr 4.10.1. Would like to check: is this function available for the earlier version of Solr, and is the curl syntax the same as Solr 5.0? Regards, Edwin
Re: Invalid Date String:'1992-07-10T17'
On 3/10/2015 1:39 PM, Ryan, Michael F. (LNG-DAY) wrote: > You'll need to wrap the date in quotes, since it contains a colon: > > String a = "speechDate:\"1992-07-10T17:33:18Z\""; You could also escape the colons with a backslash. Here's another way to do it that doesn't require quotes or manual escaping: String d = "1992-07-10T17:33:18Z"; String a = "speechDate:" + ClientUtils.escapeQueryChars(d); If you wanted to go to the trouble of using StringBuilder instead of string concatenation for performance reasons, you could certainly do that. This is the class you need to import in order to use escapeQueryChars: http://lucene.apache.org/solr/4_10_2/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html Thanks, Shawn
Does Solr runs on MapR file system?
I tried to run Solr over HDFS following https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS when I was testing the map-reduce way of index generation. However, when I run Solr on MapRFS, Solr gives an error that it could not recognize the maprfs:// scheme in the URI. Has anyone met similar issues? Thanks. -- Regards, Shenghua (Daniel) Wan
Re: Update solr schema.xml in real time for Solr 4.10.1
Hi Zheng, you wrote "I understand that in Solr 5.0, they provide a REST API to do real-time update of the schema using Curl". Would you please help me with how to do this? I need to update both schema.xml and solrconfig.xml in Solr 5.0 in SolrCloud. Your help is appreciated. Thanks again. On Thu, Mar 12, 2015 at 1:30 PM, Zheng Lin Edwin Yeo wrote: > Hi, > > I understand that in Solr 5.0, they provide a REST API to do real-time > update of the schema using Curl. However, I could not do that for my > earlier version of Solr 4.10.1. > > Would like to check, is this function available for the earlier version of > Solr, and is the curl syntax the same as Solr 5.0? > > Regards, > Edwin >
Missing doc fields
Hello, when I display one of my core's schema, lots of fields appear:

"fields":[{
    "name":"_root_",
    "type":"string",
    "indexed":true,
    "stored":false},
  {
    "name":"_version_",
    "type":"long",
    "indexed":true,
    "stored":true},
  {
    "name":"id",
    "type":"string",
    "multiValued":false,
    "indexed":true,
    "required":true,
    "stored":true},
  {
    "name":"ymd",
    "type":"tdate",
    "indexed":true,
    "stored":true}],

Yet, when I display $results in the richtext_doc.vm Velocity template, documents only contain three fields (id, _version_, score):

SolrDocument{id=3, _version_=1495262517955395584, score=1.0},

How can I increase the number of doc fields? Many thanks. Philippe
Re: increase connections on tomcat
On 3/11/2015 8:56 AM, SolrUser1543 wrote: > Client application which queries solr needs to increase a number of > simultaneously connections in order to improve performance ( in additional > to get solr results, it needs to get an internal resources like images. ) > But this increment has improved client performance, but caused degradation > in solr . > > what I think is that I need to increase a number of connection in order to > allow to more requests run between solr shards. > > How can I prove that I need? > How can I increase it on tomcat? ( on each shard ) Hopefully this isn't an XY problem. http://people.apache.org/~hossman/#xyproblem To accomplish what you have requested, you will want to increase the maxThreads parameter in the tomcat config. It defaults to 200; we have included a setting of 10000 in the example jetty server. For most installations, a value of 10000 means there is effectively no limit on the number of threads allowed. Solr will behave unpredictably if it is prevented from starting threads, and it is very easy to exceed 200 threads, especially if the container is serving requests for other things besides Solr. To configure more connections to other machines for distributed search, you need to configure the shard handler in your solrconfig.xml file. In particular you need to be worried about maxConnectionsPerHost and maxConnections. https://cwiki.apache.org/confluence/display/solr/Distributed+Requests#DistributedRequests-ConfiguringtheShardHandlerFactory Thanks, Shawn
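For reference, a hedged sketch of the shard handler configuration mentioned above, nested inside the /select request handler in solrconfig.xml; the numbers are illustrative only and should be sized to your own shard fan-out:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- illustrative values; tune per deployment -->
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="maxConnectionsPerHost">100</int>
    <int name="maxConnections">10000</int>
  </shardHandlerFactory>
</requestHandler>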
DocumentAnalysisRequestHandler
Hello, my solr logs say: INFO - 2015-03-12 08:49:34.900; org.apache.solr.core.RequestHandlers; created /analysis/document: solr.DocumentAnalysisRequestHandler WARN - 2015-03-12 08:49:34.919; org.apache.solr.core.SolrResourceLoader; Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. Please consult documentation how to replace it accordingly. Is /analysis/document deprecated in SOLR 5? What is the modern equivalent of Luke? Many thanks. Philippe
Re: test cases for solr cloud
Anyone please suggest With Regards Aman Tandon On Sat, Mar 7, 2015 at 9:55 PM, Aman Tandon wrote: > Hi, > > Please suggest me what should be the tests which i should run to check the > availability, query time, etc in my solr cloud setup. > > With Regards > Aman Tandon >
sort by given order
Hi, i want to sort my documents by a given order. The order is defined by a list of ids. My current solution is: list of ids: 15, 5, 1, 10, 3 query: q=*:*&fq=(id:((15) OR (5) OR (1) OR (10) OR (3)))&sort=query($idqsort) desc,id asc&idqsort=id:((15^5) OR (5^4) OR (1^3) OR (10^2) OR (3^1))&start=0&rows=5 Do you know an other solution to sort by a list of ids? Thanks! Johannes
Re: increase connections on tomcat
On 3/11/2015 10:53 AM, SolrUser1543 wrote: does it apply to solr 4.10 ? or only to solr 5 ? The information I provided is not version-specific. It would apply to either version you listed and at least some of the previous 4.x versions. Thanks, Shawn
Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Hi, I've a field which is being used for result grouping. Here's the field definition. This started once I did a rolling update from 4.7 to 5.0. I started getting the error on any group-by query --> "SolrDispatchFilter null:java.lang.IllegalStateException: unexpected docvalues type NONE for field 'ADSKDedup' (expected=SORTED). Use UninvertingReader or index with docvalues." Does this mean that I need to re-index documents to get over this error? Regards, Shamik
Re: increase connections on tomcat
I investigated my Tomcat 7 configuration. I found that we work in BIO mode. I am considering switching to NIO mode. What are the recommendations in this case? -- View this message in context: http://lucene.472066.n3.nabble.com/increase-connections-on-tomcat-tp4192405p4192602.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Hi. Erick.. Would please help me distinguish between Uploading a Configuration Directory and Linking a Collection to a Configuration Set ? On Thu, Mar 12, 2015 at 2:01 AM, Nitin Solanki wrote: > Thanks a lot Erick.. It will be helpful. > > On Wed, Mar 11, 2015 at 9:27 PM, Erick Erickson > wrote: > >> The configs are in Zookeeper. So you have to switch your thinking, >> it's rather confusing at first. >> >> When you create a collection, you specify a "config set", these are >> usually in >> >> ./server/solr/configsets/data_driven_schema, >> ./server/solr/configsets/techproducts and the like. >> >> The entire conf directory under one of these is copied to Zookeeper >> (which you can see >> from the admin screen cloud>>tree, then in the right hand side you'll >> be able to find the config sets >> you uploaded. >> >> But, you cannot edit them there directly. You edit them on disk, then >> push them to Zookeeper, >> then reload the collection (or restart everything). See the reference >> guide here: >> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities >> >> Best, >> Erick >> >> On Wed, Mar 11, 2015 at 6:01 AM, Nitin Solanki >> wrote: >> > Hi, alexandre.. >> > >> > Thanks for responding... >> > When I created new collection(wikingram) using solrCloud. It gets create >> > into example/cloud/node*(node1, node2) like that. >> > I have used *schema.xml and solrconfig.xml of >> sample_techproducts_configs* >> > configuration. >> > >> > Now, The problem is that. >> > If I change the configuration of *solrconfig.xml of * >> > *sample_techproducts_configs*. Its configuration doesn't reflect on >> > *wikingram* collection. >> > How to reflect the changes of configuration in the collection? >> > >> > On Wed, Mar 11, 2015 at 5:42 PM, Alexandre Rafalovitch < >> arafa...@gmail.com> >> > wrote: >> > >> >> Which example are you using? Or how are you creating your collection? >> >> >> >> If you are using your example, it creates a new directory under >> >> "example". If you are creating a new collection with "-c", it creates >> >> a new directory under the "server/solr". The actual files are a bit >> >> deeper than usual to allow for a log folder next to the collection >> >> folder. So, for example: >> >> "example/schemaless/solr/gettingstarted/conf/solrconfig.xml" >> >> >> >> If it's a dynamic schema configuration, you don't actually have >> >> schema.xml, but managed-schema, as you should be mostly using REST >> >> calls to configure it. >> >> >> >> If you want to see the configuration files before the collection >> >> actually created, they are under "server/solr/configsets", though they >> >> are not configsets in Solr sense, as they do get copied when you >> >> create your collections (sharing them causes issues). >> >> >> >> Regards, >> >>Alex. >> >> >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >> >> http://www.solr-start.com/ >> >> >> >> >> >> On 11 March 2015 at 07:50, Nitin Solanki wrote: >> >> > Hello, >> >> >I have switched from solr 4.10.2 to solr 5.0.0. In >> solr >> >> > 4-10.2, schema.xml and solrconfig.xml were in example/solr/conf/ >> folder. >> >> > Where is schema.xml and solrconfig.xml in solr 5.0.0 ? and also want >> to >> >> > know how to configure in solrcloud ? >> >> >> > >
Re: default heap size for solr 5.0? (-Xmx param)
Well, the new way will be the only way eventually, so either you learn the old way then switch or learn it now ;)... But if you insist you could start with a heap size of 4G like this: java -Xmx4G -Xms4G -jar start.jar Best, Erick On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén wrote: > Thanks! > > I am using the old way and I see no reason to switch really? > > cheers > > On 11 March 2015 at 20:18, Shawn Heisey wrote: > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max heap >> > size? I can't find it anywhere. >> > >> > Also, where whould you activate jmx? Would like to be able to use >> visualvm >> > in the future I imagine. >> > >> > I have a custom nssm thing going that installs it as a window service >> that >> > simply calls java -jar start.jar >> >> The default heap size is 512m. This is hardcoded in the bin/solr >> script. You can override that with the -m parameter. >> >> If you are not using the bin/solr script and are instead doing the old >> "java -jar start.jar" startup, the default heap size is determined by >> the version of Java you are running. >> >> Thanks, >> Shawn >> >>
java.nio.channels.CancelledKeyException
Hi, I am indexing documents on Solr 4.10.2. While indexing, I am getting this error in the log:

java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:169)

What does this mean? Will it skip the documents currently being indexed? Or anything else? Please help...
Re: Jetty version
Hi, I am not sure, but when I look into the server/lib directory I can see version 8.1 on all the lib files present in that folder, so I am guessing it is version 8.1. I confirmed it by downloading the new Jetty server, which was jetty-9.2, and I found the version on the Jetty libraries the same way. With Regards Aman Tandon On Thu, Mar 12, 2015 at 12:19 PM, Philippe de Rochambeau wrote: > Hello, > > which jetty version does solr 5 integrate? > > Cheers, > > Philippe >
Re: default heap size for solr 5.0? (-Xmx param)
You could also check the default memory by starting solr with the -V parameter for verbose output. It will show your output like this. If your are startinf solr with script present in bin directory using this command *./solr -c -V* Using Solr root directory: /data/solr/aman/solr_cloud/solr-5.0.0 > Using Java: java > java version "1.7.0_75" > OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1) > OpenJDK Server VM (build 24.75-b04, mixed mode) > Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr.log > Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr_gc.log > Starting Solr using the following settings: > JAVA= java > SOLR_SERVER_DIR = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server > SOLR_HOME = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/solr > SOLR_HOST = > SOLR_PORT = 4567 > STOP_PORT = 3567 > *SOLR_JAVA_MEM = -Xms512m -Xmx512m* > GC_TUNE = -XX:NewRatio=3 -XX:SurvivorRatio=4 > -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 > -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark > -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly > -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 > -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled > -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 > GC_LOG_OPTS = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails > -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > -Xloggc:/data/solr/aman/solr_cloud/solr-5.0.0/server/logs/solr_gc.log > SOLR_TIMEZONE = UTC > CLOUD_MODE_OPTS = -DzkClientTimeout=15000 -DzkHost=192.168.6.217:2181, > 192.168.5.81:2181,192.168.5.236:2181 > With Regards Aman Tandon On Thu, Mar 12, 2015 at 1:19 PM, Karl Kildén wrote: > Actually the reason I did not use the solr script was that I didn't really > get how to make a window service out of it from nssm.exe. I tried doing a > .bat that called solr with start -p 8983 but seems it just loops my command > rather then run it. > > Thanks for the help / Karl > > On 11 March 2015 at 23:08, Erick Erickson wrote: > > > Well, the new way will be the only way eventually, so either you learn > > the old way then switch or learn it now ;)... > > > > But if you insist you could start with a heap size of 4G like this: > > > > java -Xmx4G -Xms4G -jar start.jar > > > > Best, > > Erick > > > > > > On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén > > wrote: > > > Thanks! > > > > > > I am using the old way and I see no reason to switch really? > > > > > > cheers > > > > > > On 11 March 2015 at 20:18, Shawn Heisey wrote: > > > > > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: > > >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max > > heap > > >> > size? I can't find it anywhere. > > >> > > > >> > Also, where whould you activate jmx? Would like to be able to use > > >> visualvm > > >> > in the future I imagine. > > >> > > > >> > I have a custom nssm thing going that installs it as a window > service > > >> that > > >> > simply calls java -jar start.jar > > >> > > >> The default heap size is 512m. This is hardcoded in the bin/solr > > >> script. You can override that with the -m parameter. > > >> > > >> If you are not using the bin/solr script and are instead doing the old > > >> "java -jar start.jar" startup, the default heap size is determined by > > >> the version of Java you are running. > > >> > > >> Thanks, > > >> Shawn > > >> > > >> > > >
Re: default heap size for solr 5.0? (-Xmx param)
Just a small correction > If your are startinf solr with script present in bin directory using this > command > *./solr -c -V* *./solr start -c -V* With Regards Aman Tandon On Thu, Mar 12, 2015 at 4:05 PM, Aman Tandon wrote: > You could also check the default memory by starting solr with the -V > parameter for verbose output. It will show your output like this. > > If your are startinf solr with script present in bin directory using this > command > *./solr -c -V* > > Using Solr root directory: /data/solr/aman/solr_cloud/solr-5.0.0 >> Using Java: java >> java version "1.7.0_75" >> OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1) >> OpenJDK Server VM (build 24.75-b04, mixed mode) >> Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr.log >> Backing up /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/logs/solr_gc.log >> Starting Solr using the following settings: >> JAVA= java >> SOLR_SERVER_DIR = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server >> SOLR_HOME = /xyz/bbc/qwe/solr_cloud/solr-5.0.0/server/solr >> SOLR_HOST = >> SOLR_PORT = 4567 >> STOP_PORT = 3567 >> *SOLR_JAVA_MEM = -Xms512m -Xmx512m* >> GC_TUNE = -XX:NewRatio=3 -XX:SurvivorRatio=4 >> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 >> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 >> -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark >> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly >> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 >> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled >> -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80 >> GC_LOG_OPTS = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps >> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime >> -Xloggc:/data/solr/aman/solr_cloud/solr-5.0.0/server/logs/solr_gc.log >> SOLR_TIMEZONE = UTC >> CLOUD_MODE_OPTS = -DzkClientTimeout=15000 -DzkHost=192.168.6.217:2181 >> ,192.168.5.81:2181,192.168.5.236:2181 >> > > > > With Regards > Aman Tandon > > On Thu, Mar 12, 2015 at 1:19 PM, Karl Kildén > wrote: > >> Actually the reason I did not use the solr script was that I didn't really >> get how to make a window service out of it from nssm.exe. I tried doing a >> .bat that called solr with start -p 8983 but seems it just loops my >> command >> rather then run it. >> >> Thanks for the help / Karl >> >> On 11 March 2015 at 23:08, Erick Erickson >> wrote: >> >> > Well, the new way will be the only way eventually, so either you learn >> > the old way then switch or learn it now ;)... >> > >> > But if you insist you could start with a heap size of 4G like this: >> > >> > java -Xmx4G -Xms4G -jar start.jar >> > >> > Best, >> > Erick >> > >> > >> > On Wed, Mar 11, 2015 at 1:09 PM, Karl Kildén >> > wrote: >> > > Thanks! >> > > >> > > I am using the old way and I see no reason to switch really? >> > > >> > > cheers >> > > >> > > On 11 March 2015 at 20:18, Shawn Heisey wrote: >> > > >> > >> On 3/11/2015 12:25 PM, Karl Kildén wrote: >> > >> > I am a solr beginner. Anyone knows how solr 5.0 determines the max >> > heap >> > >> > size? I can't find it anywhere. >> > >> > >> > >> > Also, where whould you activate jmx? Would like to be able to use >> > >> visualvm >> > >> > in the future I imagine. >> > >> > >> > >> > I have a custom nssm thing going that installs it as a window >> service >> > >> that >> > >> > simply calls java -jar start.jar >> > >> >> > >> The default heap size is 512m. 
This is hardcoded in the bin/solr >> > >> script. You can override that with the -m parameter. >> > >> >> > >> If you are not using the bin/solr script and are instead doing the >> old >> > >> "java -jar start.jar" startup, the default heap size is determined by >> > >> the version of Java you are running. >> > >> >> > >> Thanks, >> > >> Shawn >> > >> >> > >> >> > >> > >
Re: Jetty version
Yes, Solr 5.0 uses Jetty 8. FYI, the upcoming release 5.1 will move to Jetty 9. Also, just in case it matters -- as noted in the 5.0 release notes, the use of Jetty is now an implementation detail and we might move away from it in the future -- so you shouldn't be depending on Solr using Jetty or a particular version of Jetty. On 12 Mar 2015 10:33, "Aman Tandon" wrote: > Hi, > > I am not sure but when i am looking into the server/lib directory then i am > able to see the version 8.1 with all those lib files present in that > folder. So i am guessing its version 8.1. > > I confirmed it by downloading the new jetty server which was jetty-9.2 and > i found the same version on jetty libraries. > > With Regards > Aman Tandon > > On Thu, Mar 12, 2015 at 12:19 PM, Philippe de Rochambeau > wrote: > > > Hello, > > > > which jetty version does solr 5 integrate? > > > > Cheers, > > > > Philippe > > >
Should I Use Solr
Hi, I am using Oracle 11gR2 and we have a schema where a few tables have more than 100 million rows (some of them are Varchar2 100 bytes). We frequently have to do LIKE-based searches on those tables. Sometimes we also need to join the tables. Inserts / updates are also happening very frequently on such tables (1000 inserts / updates per second) by other applications. So my question is: for my user interface, should I use Apache Solr to let users search on these tables instead of SQL queries? I have tried SQL and it is really slow (considering the amount of data I have in my database). My requirements are: results should come back fast, they should be accurate, and they should reflect the latest data. Can you suggest if I should go with Apache Solr, or another solution for my problem? Regards, Pratik Thaker The information in this email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful.
[Poll]: User need for Solr security
Hi, Securing various Solr APIs has once again surfaced as a discussion in the developer list. See e.g. SOLR-7236 Would be useful to get some feedback from Solr users about needs "in the field". Please reply to this email and let us know what security aspect(s) would be most important for your company to see supported in a future version of Solr. Examples: Local user management, AD/LDAP integration, SSL, authenticated login to Admin UI, authorization for Admin APIs, e.g. admin user vs read-only user etc -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: Missing doc fields
that would explain it! Luke tool (http://github.com/DmitryKey/luke) is also useful for such cases or generally, when in need to check the field contents: On Wed, Mar 11, 2015 at 12:50 PM, wrote: > Hello, > > I found the reason: the query to store ymds in SOLR was invalid ("json" > and "literal" are concatenated below). > > curl -Ss -X POST ' > http://myserver:8990/solr/archives0/update/extract?extractFormat=text&wt=jsonliteral.ymd=1944-12-31T00:00:00A&literal.id=159168 > > > Philippe > > > > - Mail original - > De: phi...@free.fr > À: solr-user@lucene.apache.org > Envoyé: Mercredi 11 Mars 2015 11:44:15 > Objet: Re: Missing doc fields > > I meant 'fl'. > > -- > > http://myserver:8990/solr/archives0/select?q=*:*&rows=3&wt=json&fl=* > > -- > > > {"responseHeader":{"status":0,"QTime":3,"params":{"q":"*:*","fl":"*","rows":"3","wt":"json"}},"response":{"numFound":160238,"start":0,"docs":[{"id":"10","_version_":1495262519674011648},{"id":"1","_version_":1495262517261238272},{"id":"2","_version_":1495262517637677056}]}} > > > -- schema.xml > > > > > > > > > > required="true" multiValued="false" /> > > > > > > > --- > > > > > > - Mail original - > De: "Dmitry Kan" > À: solr-user@lucene.apache.org > Envoyé: Mercredi 11 Mars 2015 11:38:26 > Objet: Re: Missing doc fields > > What is the ft parameter that you are sending? > > > In order to see all stored fields use the parameter fl=* > > Or list the field names you need: fl=id,ymd > > On Wed, Mar 11, 2015 at 12:35 PM, wrote: > > > When I run the following query, > > > > > http://myserver:8990/solr/archives0/select?q=*:*&rows=3&wt=json&ft=id,ymd > > > > The response is > > > > > > > {"responseHeader":{"status":0,"QTime":1,"params":{"q":"*:*","rows":"3","wt":"json","ft":"id,ymd"}},"response":{"numFound":160238,"start":0,"docs":[{"id":"10","_version_":1495262519674011648},{"id":"1","_version_":1495262517261238272},{"id":"2","_version_":1495262517637677056}]}} > > > > > > the ymd field does not appear in the list of document fields, although it > > is defined in my schema.xml. > > > > Is there a way to tell SOLR to return that field in responses? > > > > > > Philippe > > > > > > > > - Mail original - > > De: phi...@free.fr > > À: solr-user@lucene.apache.org > > Envoyé: Mercredi 11 Mars 2015 11:06:29 > > Objet: Missing doc fields > > > > > > > > Hello, > > > > when I display one of my core's schema, lots of fields appear: > > > > "fields":[{ > > "name":"_root_", > > "type":"string", > > "indexed":true, > > "stored":false}, > > { > > "name":"_version_", > > "type":"long", > > "indexed":true, > > "stored":true}, > > { > > "name":"id", > > "type":"string", > > "multiValued":false, > > "indexed":true, > > "required":true, > > "stored":true}, > > { > > "name":"ymd", > > "type":"tdate", > > "indexed":true, > > "stored":true}], > > > > > > > > Yet, when I display $results in the richtext_doc.vm Velocity template, > > documents only contain three fields (id, _version_, score): > > > > SolrDocument{id=3, _version_=1495262517955395584, score=1.0}, > > > > > > How can I increase the number of doc fields? > > > > Many thanks. > > > > Philipppe > > > > > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: www.semanticanalyzer.info > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
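For reference, a hedged sketch of the corrected extract call from the quoted thread, with the missing '&' restored between wt=json and literal.ymd (host, core and literal values are as quoted or illustrative, and the rest of the original command is omitted):

curl -Ss -X POST 'http://myserver:8990/solr/archives0/update/extract?extractFormat=text&wt=json&literal.ymd=1944-12-31T00:00:00Z&literal.id=159168' ...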
Re: DocumentAnalysisRequestHandler
>> What is the modern equivalent of Luke? It is same Luke, but polished: http://github.com/DmitryKey/luke On Thu, Mar 12, 2015 at 11:03 AM, wrote: > Hello, > > my solr logs say: > > INFO - 2015-03-12 08:49:34.900; org.apache.solr.core.RequestHandlers; > created /analysis/document: solr.DocumentAnalysisRequestHandler > WARN - 2015-03-12 08:49:34.919; org.apache.solr.core.SolrResourceLoader; > Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. > Please consult documentation how to replace it accordingly. > > > Is /analysis/document deprecated in SOLR 5? > >class="solr.DocumentAnalysisRequestHandler" > startup="lazy" /> > > > What is the modern equivalent of Luke? > > Many thanks. > > Philippe > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: [Poll]: User need for Solr security
Hi, Things you have mentioned would be useful for our use-case. On top we've seen these two requests for securing Solr: 1. Encrypting the index (with a customer private key for instance). There are certainly other ways to go about this, like using virtual private clouds, but having the feature in solr could allow multitenant Solr installations. 2. ACLs: giving access rights to parts of the index / document sets depending on the user access rights. On Thu, Mar 12, 2015 at 1:32 PM, Jan Høydahl wrote: > Hi, > > Securing various Solr APIs has once again surfaced as a discussion in the > developer list. See e.g. SOLR-7236 > Would be useful to get some feedback from Solr users about needs "in the > field". > > Please reply to this email and let us know what security aspect(s) would > be most important for your company to see supported in a future version of > Solr. > Examples: Local user management, AD/LDAP integration, SSL, authenticated > login to Admin UI, authorization for Admin APIs, e.g. admin user vs > read-only user etc > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Re: how to change configurations in solrcloud setup
On 3/11/2015 10:45 PM, Aman Tandon wrote: >> You may need to manually remove the 127.0.1.1 entries from zookeeper >> after you fix the IP address problem. > > > How to do that? The zkcli script included with Solr should have everything you need -- getfile, putfile, and clear ... but that would be a rather frustrating way to handle it. You won't be able to accomplish your goal by only deleting znodes, you'll have to edit some json structures and replace them in zookeeper. The main thing you'll need to edit is the clusterstate.json ... this is a single "file" in Solr 4.x, in 5.0 it has changed to a clusterstate for every collection. There are not very many GUI clients for zookeeper. The only one that I've really found is the one that is a plugin for eclipse. I happen to use eclipse, so this is fairly convenient for me: http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper Thanks, Shawn
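For reference, a minimal sketch of that zkcli round-trip, assuming the cloud-scripts directory shipped with Solr (server/scripts/cloud-scripts in 5.0, example/scripts/cloud-scripts in 4.x), ZooKeeper at localhost:2181 and illustrative local paths:

# pull the current cluster state down for editing
zkcli.sh -zkhost localhost:2181 -cmd getfile /clusterstate.json /tmp/clusterstate.json
# edit /tmp/clusterstate.json locally
# on versions where putfile will not overwrite an existing znode, clear it first
zkcli.sh -zkhost localhost:2181 -cmd clear /clusterstate.json
# write the edited file back to ZooKeeper
zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json /tmp/clusterstate.json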
Re: Creating a directory resource in solr-jetty
On 3/11/2015 7:38 AM, phi...@free.fr wrote: > does anyone if it is possible to create a directory resource in the > solr-jetty configuration files? > > In Tomcat 8, you can do the following: > > > > className="org.apache.catalina.webresources.DirResourceSet" > base="/mnt/archive_pdf/PDF/IHT" > webAppMount="/arcpdf0" > /> This is a question that you'd need to ask in a Jetty support venue. I don't know the answer, and from the lack of response, I would guess that nobody else who has seen your question knows the answer either. This container config has nothing to do with Solr at all ... most people here are only familiar with those pieces of container config that affect Solr. http://eclipse.org/jetty/mailinglists.php I hate to turn you away without giving you an answer ... if I knew, I would ignore the fact that this is off topic, and give you the answer. Thanks, Shawn
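For context, the Tomcat 8 configuration being referenced normally sits inside a <Resources> element of the context configuration; a rough sketch, using the attribute values from the question, with the wrapper elements assumed:

<Context>
  <Resources>
    <PostResources className="org.apache.catalina.webresources.DirResourceSet"
                   base="/mnt/archive_pdf/PDF/IHT"
                   webAppMount="/arcpdf0" />
  </Resources>
</Context>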
Re: Update solr schema.xml in real time for Solr 4.10.1
On 3/12/2015 2:00 AM, Zheng Lin Edwin Yeo wrote: > I understand that in Solr 5.0, they provide a REST API to do real-time > update of the schema using Curl. However, I could not do that for my > eariler version of Solr 4.10.1. > > Would like to check, is this function available for the earlier version of > Solr, and is the curl syntax the same as Solr 5.0? Providing a way to simply edit the config files directly is a potential security issue. We briefly had a way to edit those configs right in the admin UI, but Redhat reported this capability as a security problem, so we removed it. I don't remember whether there is a way to re-enable this functionality. The Schema REST API is available in 4.10. It was also present in 4.9. Currently you can only *add* to the schema, you cannot edit what's already there. Thanks, Shawn
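For reference, a hedged sketch of adding a field through the Schema REST API on 4.10, assuming the core uses the managed schema (ManagedIndexSchemaFactory with mutable="true" in solrconfig.xml); the field, core name and host are illustrative:

curl -X POST -H 'Content-type:application/json' --data-binary '[{"name":"newfield","type":"string","indexed":true,"stored":true}]' 'http://localhost:8983/solr/collection1/schema/fields'

As noted above, fields can only be added this way, not modified or removed.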
Re: Should I Use Solr
On 3/12/2015 5:03 AM, Pratik Thaker wrote: > I am using Oracle 11g2 and we are having a schema where few tables are having > more than 100 million rows (some of them are Varchar2 100 bytes). And we have > to frequently do the LIKE based search on those tables. Sometimes we need to > join the tables also. Insert / Updates are also happening very frequently for > such tables (1000 insert / updates per second) by other applications. > > So my question is, for my User Interface, should I use Apache Solr to let > user search on these tables instead of SQL queries? I have tried SQL and it > is really slow (considering amount of data I am having in my database). > > My requirements are, > > Result should come faster and it should be accurate. > It should have the latest data. > Can you suggest if I should go with Apache Solr, or another solution for my > problem ? Solr will do what you want. I have essentially the same situation, except the database is MySQL. We have just over 100 million total documents. Our add/update rate is much lower than yours. For a fully redundant setup, I am running two copies of the index on four Solr servers that each have 64GB of RAM. It's a distributed index that's not running SolrCloud, two servers are required to house one complete copy of the index. The total index size on each pair of servers is about 150GB. Thanks, Shawn
Re: Creating a directory resource in solr-jetty
Hi Shawn, here is the Jetty Mailing List's reply concerning my question. Unfortunately, this solution won't work with SOLR Jetty, because its version is < 9. Philippe -- Just ensure you don't have a /WEB-INF/ directory, and you can use this on Jetty 9.2.9+ http://www.eclipse.org/jetty/configure_9_0.dtd";> /example /mnt/iiiparnex01_pdf/PDF/III/ - Mail original - De: "Shawn Heisey" À: solr-user@lucene.apache.org Envoyé: Jeudi 12 Mars 2015 13:59:49 Objet: Re: Creating a directory resource in solr-jetty On 3/11/2015 7:38 AM, phi...@free.fr wrote: > does anyone if it is possible to create a directory resource in the > solr-jetty configuration files? > > In Tomcat 8, you can do the following: > > > > className="org.apache.catalina.webresources.DirResourceSet" > base="/mnt/archive_pdf/PDF/IHT" > webAppMount="/arcpdf0" > /> This is a question that you'd need to ask in a Jetty support venue. I don't know the answer, and from the lack of response, I would guess that nobody else who has seen your question knows the answer either. This container config has nothing to do with Solr at all ... most people here are only familiar with those pieces of container config that affect Solr. http://eclipse.org/jetty/mailinglists.php I hate to turn you away without giving you an answer ... if I knew, I would ignore the fact that this is off topic, and give you the answer. Thanks, Shawn
Re: sort by given order
Not unless you can somehow codify that sort order at index time, but I'm assuming the sort order changes dynamically. You can also sort by function, but that's not really useful. Or, if these are relatively short lists, you can sort at the app layer. Best, Erick On Thu, Mar 12, 2015 at 2:16 AM, Johannes Siegert wrote: > Hi, > > i want to sort my documents by a given order. The order is defined by a list > of ids. > > My current solution is: > > list of ids: 15, 5, 1, 10, 3 > > query: q=*:*&fq=(id:((15) OR (5) OR (1) OR (10) OR > (3)))&sort=query($idqsort) desc,id asc&idqsort=id:((15^5) OR (5^4) OR (1^3) > OR (10^2) OR (3^1))&start=0&rows=5 > > Do you know an other solution to sort by a list of ids? > > Thanks! > > Johannes
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
By and large, I really never use linking. But it's about associating a config set you've _already_ uploaded with a collection. So uploading is pushing the configset from your local machine up to Zookeeper, and linking is using that uploaded, named configuration with an arbitrary collection. But usually you just make this association when creating the collection. It's simple to test all this out, just upconfig a couple of config sets, play with the linking and reload the collections. From there the admin UI will show you what actually happened. Best, Erick On Thu, Mar 12, 2015 at 2:39 AM, Nitin Solanki wrote: > Hi. Erick.. >Would please help me distinguish between > Uploading a Configuration Directory and Linking a Collection to a > Configuration Set ? > > On Thu, Mar 12, 2015 at 2:01 AM, Nitin Solanki wrote: > >> Thanks a lot Erick.. It will be helpful. >> >> On Wed, Mar 11, 2015 at 9:27 PM, Erick Erickson >> wrote: >> >>> The configs are in Zookeeper. So you have to switch your thinking, >>> it's rather confusing at first. >>> >>> When you create a collection, you specify a "config set", these are >>> usually in >>> >>> ./server/solr/configsets/data_driven_schema, >>> ./server/solr/configsets/techproducts and the like. >>> >>> The entire conf directory under one of these is copied to Zookeeper >>> (which you can see >>> from the admin screen cloud>>tree, then in the right hand side you'll >>> be able to find the config sets >>> you uploaded. >>> >>> But, you cannot edit them there directly. You edit them on disk, then >>> push them to Zookeeper, >>> then reload the collection (or restart everything). See the reference >>> guide here: >>> https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities >>> >>> Best, >>> Erick >>> >>> On Wed, Mar 11, 2015 at 6:01 AM, Nitin Solanki >>> wrote: >>> > Hi, alexandre.. >>> > >>> > Thanks for responding... >>> > When I created new collection(wikingram) using solrCloud. It gets create >>> > into example/cloud/node*(node1, node2) like that. >>> > I have used *schema.xml and solrconfig.xml of >>> sample_techproducts_configs* >>> > configuration. >>> > >>> > Now, The problem is that. >>> > If I change the configuration of *solrconfig.xml of * >>> > *sample_techproducts_configs*. Its configuration doesn't reflect on >>> > *wikingram* collection. >>> > How to reflect the changes of configuration in the collection? >>> > >>> > On Wed, Mar 11, 2015 at 5:42 PM, Alexandre Rafalovitch < >>> arafa...@gmail.com> >>> > wrote: >>> > >>> >> Which example are you using? Or how are you creating your collection? >>> >> >>> >> If you are using your example, it creates a new directory under >>> >> "example". If you are creating a new collection with "-c", it creates >>> >> a new directory under the "server/solr". The actual files are a bit >>> >> deeper than usual to allow for a log folder next to the collection >>> >> folder. So, for example: >>> >> "example/schemaless/solr/gettingstarted/conf/solrconfig.xml" >>> >> >>> >> If it's a dynamic schema configuration, you don't actually have >>> >> schema.xml, but managed-schema, as you should be mostly using REST >>> >> calls to configure it. >>> >> >>> >> If you want to see the configuration files before the collection >>> >> actually created, they are under "server/solr/configsets", though they >>> >> are not configsets in Solr sense, as they do get copied when you >>> >> create your collections (sharing them causes issues). >>> >> >>> >> Regards, >>> >>Alex. 
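For reference, a minimal sketch of the two operations with the zkcli script shipped with Solr, assuming ZooKeeper at localhost:2181 and illustrative config and collection names:

# upload a local conf directory to ZooKeeper under the name "myconf"
zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir /path/to/conf -confname myconf
# associate an existing collection with that uploaded config set
zkcli.sh -zkhost localhost:2181 -cmd linkconfig -collection mycollection -confname myconf

After linking, reload the collection so the change takes effect.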
>>> >> >>> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >>> >> http://www.solr-start.com/ >>> >> >>> >> >>> >> On 11 March 2015 at 07:50, Nitin Solanki wrote: >>> >> > Hello, >>> >> >I have switched from solr 4.10.2 to solr 5.0.0. In >>> solr >>> >> > 4-10.2, schema.xml and solrconfig.xml were in example/solr/conf/ >>> folder. >>> >> > Where is schema.xml and solrconfig.xml in solr 5.0.0 ? and also want >>> to >>> >> > know how to configure in solrcloud ? >>> >> >>> >> >>
Re: Creating a directory resource in solr-jetty
On 3/12/2015 8:17 AM, phi...@free.fr wrote: > here is the Jetty Mailing List's reply concerning my question. > > Unfortunately, this solution won't work with SOLR Jetty, because its version > is < 9. The trunk branch of the Solr source code (version 6.0 development) is already running Jetty 9.2.9. I have seen two committers say that 5.1 will be upgraded to Jetty 9 as well, though this needs to happen very soon, or it won't make the cutoff and may get pushed back to 5.2. Thanks, Shawn
Re: [Poll]: User need for Solr security
About <1>. Gotta be careful here about what would be promised. You really _can't_ encrypt the _indexed_ terms in a meaningful way and still search. And, as you well know, you can reconstruct documents from the indexed terms. It's lossy, but still coherent enough to give security folks fits. For instance, to do a wildcard search I need to have the "run" in "run" match "running", "runner" "runs" etc. Any but trivial encryption will break that, and the trivial encryption is easy to break. So putting all this over an encrypting filesystem is an approach that's often used. FWIW On Thu, Mar 12, 2015 at 5:22 AM, Dmitry Kan wrote: > Hi, > > Things you have mentioned would be useful for our use-case. > > On top we've seen these two requests for securing Solr: > > 1. Encrypting the index (with a customer private key for instance). There > are certainly other ways to go about this, like using virtual private > clouds, but having the feature in solr could allow multitenant Solr > installations. > > 2. ACLs: giving access rights to parts of the index / document sets > depending on the user access rights. > > > > On Thu, Mar 12, 2015 at 1:32 PM, Jan Høydahl wrote: > >> Hi, >> >> Securing various Solr APIs has once again surfaced as a discussion in the >> developer list. See e.g. SOLR-7236 >> Would be useful to get some feedback from Solr users about needs "in the >> field". >> >> Please reply to this email and let us know what security aspect(s) would >> be most important for your company to see supported in a future version of >> Solr. >> Examples: Local user management, AD/LDAP integration, SSL, authenticated >> login to Admin UI, authorization for Admin APIs, e.g. admin user vs >> read-only user etc >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >> > > > -- > Dmitry Kan > Luke Toolbox: http://github.com/DmitryKey/luke > Blog: http://dmitrykan.blogspot.com > Twitter: http://twitter.com/dmitrykan > SemanticAnalyzer: www.semanticanalyzer.info
Re: Update solr schema.xml in real time for Solr 4.10.1
Actually I ran across a neat IntelliJ plugin that you could install and directly edit ZK files. And I'm pretty sure there are stand-alone programs that do this, but they are all outside Solr. I'm not sure what "real time update of the schema" is for, would you (Zheng) explain further? Collections _must_ be reloaded for schema changes to take effect so I'm not quite sure what you're referring to. Nitin: The usual process is to have the master config be local, change the local version then upload it to ZK with the upconfig option in zkCli, then reload your collection. Best, Erick On Thu, Mar 12, 2015 at 6:04 AM, Shawn Heisey wrote: > On 3/12/2015 2:00 AM, Zheng Lin Edwin Yeo wrote: >> I understand that in Solr 5.0, they provide a REST API to do real-time >> update of the schema using Curl. However, I could not do that for my >> eariler version of Solr 4.10.1. >> >> Would like to check, is this function available for the earlier version of >> Solr, and is the curl syntax the same as Solr 5.0? > > Providing a way to simply edit the config files directly is a potential > security issue. We briefly had a way to edit those configs right in the > admin UI, but Redhat reported this capability as a security problem, so > we removed it. I don't remember whether there is a way to re-enable > this functionality. > > The Schema REST API is available in 4.10. It was also present in 4.9. > Currently you can only *add* to the schema, you cannot edit what's > already there. > > Thanks, > Shawn >
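For reference, once the edited config has been pushed back up with upconfig (as sketched earlier in this digest), the reload step would look something like this, with an illustrative host and collection name:

curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'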
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
On 3/12/2015 9:18 AM, Erick Erickson wrote: > By and large, I really never use linking. But it's about associating a > config set > you've _already_ uploaded with a collection. > > So uploading is pushing the configset from your local machine up to Zookeeper, > and linking is using that uploaded, named configuration with an > arbitrary collection. > > But usually you just make this association when creating the collection. The primary use case that I see for linkconfig is in testing upgrades to configurations. So let's say you have a production collection that uses a config that you name fooV1 for foo version 1. You can build a test collection that uses a config named fooV2, work out all the bugs, and then when you're ready to deploy it, you can use linkconfig to link your production collection to fooV2, reload the collection, and you're using the new config. I haven't discussed here how to handle the situation where a reindex is required. One thing you CAN do is run linkconfig for a collection that doesn't exist yet, and then you don't need to include collection.configName when you create the collection, because the link is already present in zookeeper. I personally don't like doing things this way, but I'm pretty sure it works. Thanks, Shawn
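A sketch of that upgrade flow using zkcli and the Collections API (hosts, paths and names below are placeholders):

  # upload the new config under its own name
  zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir ./fooV2/conf -confname fooV2

  # point the existing production collection at the new config
  zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection foo -confname fooV2

  # reload so the change takes effect
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=foo"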
Re: DocumentAnalysisRequestHandler
Yes, the admin handlers are deprecated because they are now implicit - no need to specify them in solrconfig. Yeah, the doc is very unclear on that point, but in CHANGES.TXT: "*AdminHandlers is deprecated , /admin/* are implicitly defined, /get ,/replication and handlers are also implicitly registered (refer to SOLR-6792)*". IOW, remove the AdminHandlers XML element from your solrconfig. As far as the document analysis request handler, that should still be fine. Are you encountering some problem? The first log line you gave is just an INFO - information only, not a problem. -- Jack Krupansky On Thu, Mar 12, 2015 at 5:03 AM, wrote: > Hello, > > my solr logs say: > > INFO - 2015-03-12 08:49:34.900; org.apache.solr.core.RequestHandlers; > created /analysis/document: solr.DocumentAnalysisRequestHandler > WARN - 2015-03-12 08:49:34.919; org.apache.solr.core.SolrResourceLoader; > Solr loaded a deprecated plugin/analysis class [solr.admin.AdminHandlers]. > Please consult documentation how to replace it accordingly. > > > Is /analysis/document deprecated in SOLR 5? > > <requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy" /> > > > What is the modern equivalent of Luke? > > Many thanks. > > Philippe >
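For anyone hitting the same warning: the declaration to delete is the old AdminHandlers registration, which in a typical 4.x solrconfig.xml looks something like the line below (the name attribute may differ in your config); the /analysis/document handler itself can stay.

  <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />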
Re: How to configure Solr PostingsFormat block size
Hi Hoss, I created a wrapper class, compiled a jar and included an org.apache.lucene.codecs.Codec file in META-INF/services in the jar file with an entry for the wrapper class :HTPostingsFormatWrapper. I created a collection1/lib directory and put the jar there. (see below) I'm getting the dread "ClassCastException Class.asSubclass(Unknown Source" error (See below). This is looking like a complex classloader issues. Should I put the file somewhere else and/or declare a lib directory in solrconfig.xml? Any suggestions on how to troubleshoot this?. Tom error: by: java.lang.ClassCastException: class org.apache.lucene.codecs.HTPostingsFormatWrapper at java.lang.Class.asSubclass(Unknown Source) at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:141) --- Contents of the jar file: C:\d\solr\lucene_solr_4_10_2\solr\example\solr\collection1\lib>jar -tvf HTPostingsFormatWrapper.jar 25 Thu Mar 12 10:37:04 EDT 2015 META-INF/MANIFEST.MF 1253 Thu Mar 12 10:37:04 EDT 2015 org/apache/lucene/codecs/HTPostingsFormatWrapper.class 1276 Thu Mar 12 10:49:06 EDT 2015 META-INF/services/org.apache.lucene.codecs.Codec Contents of META-INF/services/org.apache.lucene.codecs.Codec in the jar file: org.apache.lucene.codecs.lucene49.Lucene49Codec org.apache.lucene.codecs.lucene410.Lucene410Codec # tbw adds custom wrapper here per Hoss e-mail org.apache.lucene.codecs.HTPostingsFormatWrapper - log file excerpt with stack trace: 12821 [main] INFO org.apache.solr.core.CoresLocator – Looking for core definitions underneath C:\d\solr\lucene_solr_4_10_2\solr\example\solr 12838 [main] INFO org.apache.solr.core.CoresLocator – Found core collection1 in C:\d\solr\lucene_solr_4_10_2\solr\example\solr\collection1\ 12839 [main] INFO org.apache.solr.core.CoresLocator – Found 1 core definitions 12841 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: 'C:\d\solr\lucene_solr_4_10_2\solr\example\solr\collection1\' 12842 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/C:/d/solr/lucene_solr_4_10_2/solr/example/solr/collection1/lib/HTPostingsFormatWrapper.jar' to classloader 12870 [coreLoadExecutor-5-thread-1] ERROR org.apache.solr.core.CoreContainer – Error creating core [collection1]: class org.apache.lucene.codecs.HTPostingsFormatWrapper java.lang.ClassCastException: class org.apache.lucene.codecs.HTPostingsFormatWrapper at java.lang.Class.asSubclass(Unknown Source) at org.apache.lucene.util.SPIClassIterator.next(SPIClassIterator.java:141) at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:65) at org.apache.lucene.codecs.Codec.reloadCodecs(Codec.java:119) at org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:206) at org.apache.solr.core.SolrResourceLoader.(SolrResourceLoader.java:142) at org.apache.solr.core.ConfigSetService$Default.createCoreResourceLoader(ConfigSetService.java:144) at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:58) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:489) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) On Wed, Jan 14, 2015 at 6:05 PM, Chris Hostetter wrote: > > : As a foolish dev 
(not malicious I hope!), I did mess around with > something > : like this once; I was writing my own Codec. I found I had to create a > file > : called META-INF/services/org.apache.lucene.codecs.Codec in my solr > plugin jar > : that contained the fully-qualified class name of my codec: I guess this > : registers it with the SPI framework so it can be found by name? I'm not > > Yep, that's how SPI works - the important bits are mentioned/linked in the > PostingsFormat (and other SPI related classes in lucene) javadocs... > > > https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/PostingsFormat.html > > > https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html?is-external=true > > > > > > -Hoss > http://www.lucidworks.com/ >
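One detail worth double-checking in a setup like the one above: the SPI services file name has to match the abstract class being extended. A PostingsFormat implementation is discovered through META-INF/services/org.apache.lucene.codecs.PostingsFormat, while META-INF/services/org.apache.lucene.codecs.Codec is only read for Codec subclasses; listing a PostingsFormat class in the Codec file makes Class.asSubclass(Codec.class) fail with exactly this kind of ClassCastException during the SPI reload. Whether that is the cause here is only a guess; the class name below is reused from the mail above purely for illustration.

  META-INF/services/org.apache.lucene.codecs.PostingsFormat:
      org.apache.lucene.codecs.HTPostingsFormatWrapper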
error message This IndexSchema is not mutable with a classicSchemaIndexFactory
Hi guys, I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) with status Resolved, but the resolution is not identified in the issue. I am facing the exact same problem.. and not able to identified the solution. In the last comment of the issue, is said that this kind of questions should be done in the solr-user mailing list.. So anyone. I'll appreciate any kind of help. Thanks is advanced! Best regards, Pedro Figueiredo
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Well, I think I've narrowed down the issue. The error is happening when I'm trying to do a rolling update from Solr 4.7 (which is our current version) to 5.0 . I'm able to re-produce this couple of times. If I do a fresh index on a 5.0, it works. Not sure if there's any other way to mitigate it. I'll appreciate if someone can share their experience on the same. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192706.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
On 3/11/2015 4:45 PM, shamik wrote: > <field name="DocumentType" ... multiValued="false" required="false" omitNorms="true" docValues="true" /> 3/11/2015, 2:14:30 PM ERROR SolrDispatchFilter > null:java.lang.IllegalStateException: unexpected docvalues type NONE > for field 'DocumentType' (expected=SORTED). Use UninvertingReader or > index with docvalues. null:java.lang.IllegalStateException: unexpected > docvalues type NONE for field 'DocumentType' (expected=SORTED). Use > UninvertingReader or index with docvalues. I admit right up front that I know very little about what might be happening here ... but I did have one idea. It could be completely wrong. Is it possible that you have an index that fits the following description? The field originally did not have docValues. You enabled docValues on the field in the schema, but there are index segments still in the index directory from *before* you changed the schema. If that sounds at all possible, then if you did not fully reindex, there would be segments with valid documents that do not have docValues. You should fully reindex and then optimize before upgrading. If you did fully reindex, but did not optimize, then there might be segments with *deleted* documents that do not have docValues ... and maybe 4.7 was fine with that but 5.0 isn't. Whenever I upgrade Solr, I always reindex from scratch, and often I will completely delete all the data directories. It takes longer, but then I know the index is 100% correct for the version and config I'm running. I'll reiterate that this whole idea could be 100% wrong. Thanks, Shawn
Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
The answer meant it was most likely something user has done not quite understanding Solr's behavior. Not a bug. I'd ignore that case and just explain what your issue actually is. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 14:43, Pedro Figueiredo wrote: > Hi guys, > > > > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) > with status Resolved, but the resolution is not identified in the issue. > > I am facing the exact same problem.. and not able to identified the > solution. > > > > In the last comment of the issue, is said that this kind of questions should > be done in the solr-user mailing list.. > > So anyone. I'll appreciate any kind of help. > > > > Thanks is advanced! > > > > Best regards, > > Pedro Figueiredo >
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Do you have any really old segments in that index? Could be worth trying to optimize them down to one in latest format first. Like Shawn, this is just a "one more idea" proposal. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 14:47, shamik wrote: > Well, I think I've narrowed down the issue. The error is happening when I'm > trying to do a rolling update from Solr 4.7 (which is our current version) > to 5.0 . I'm able to re-produce this couple of times. If I do a fresh index > on a 5.0, it works. Not sure if there's any other way to mitigate it. > > I'll appreciate if someone can share their experience on the same. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192706.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: [Poll]: User need for Solr security
If you cannot trust your root users you probably have bigger problems than with search... I think it has been suggested to encrypt on codec or directory level as well. Yep, here is the JIRA https://issues.apache.org/jira/browse/LUCENE-2228 :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 12. mar. 2015 kl. 16.22 skrev Erick Erickson : > > About <1>. Gotta be careful here about what would be promised. You > really _can't_ encrypt the _indexed_ terms in a meaningful way and > still search. And, as you well know, you can reconstruct documents > from the indexed terms. It's lossy, but still coherent enough to give > security folks fits. > > For instance, to do a wildcard search I need to have the "run" in > "run" match "running", "runner" "runs" etc. Any but trivial encryption > will break that, and the trivial encryption is easy to break. > > So putting all this over an encrypting filesystem is an approach > that's often used. > > FWIW > > > On Thu, Mar 12, 2015 at 5:22 AM, Dmitry Kan wrote: >> Hi, >> >> Things you have mentioned would be useful for our use-case. >> >> On top we've seen these two requests for securing Solr: >> >> 1. Encrypting the index (with a customer private key for instance). There >> are certainly other ways to go about this, like using virtual private >> clouds, but having the feature in solr could allow multitenant Solr >> installations. >> >> 2. ACLs: giving access rights to parts of the index / document sets >> depending on the user access rights. >> >> >> >> On Thu, Mar 12, 2015 at 1:32 PM, Jan Høydahl wrote: >> >>> Hi, >>> >>> Securing various Solr APIs has once again surfaced as a discussion in the >>> developer list. See e.g. SOLR-7236 >>> Would be useful to get some feedback from Solr users about needs "in the >>> field". >>> >>> Please reply to this email and let us know what security aspect(s) would >>> be most important for your company to see supported in a future version of >>> Solr. >>> Examples: Local user management, AD/LDAP integration, SSL, authenticated >>> login to Admin UI, authorization for Admin APIs, e.g. admin user vs >>> read-only user etc >>> >>> -- >>> Jan Høydahl, search solution architect >>> Cominvent AS - www.cominvent.com >>> >>> >> >> >> -- >> Dmitry Kan >> Luke Toolbox: http://github.com/DmitryKey/luke >> Blog: http://dmitrykan.blogspot.com >> Twitter: http://twitter.com/dmitrykan >> SemanticAnalyzer: www.semanticanalyzer.info
RE: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
Hello Alex, I'm trying to add a new document, using solrj and the error "This IndexSchema is not mutable" is raised when inserting the document in the solr index. My index in solr, is configured with classicSchemaIndexFactory. If I change it to AutoManaged the insert is done without any problems. I believe that there is no mutable configuration (true or false) for ClassicSchema as for AutoManaged. The document does not have any new field, all fields are specified in the schema.xml file. Any thoughts!? Thanks! Pedro Figueiredo De: Alexandre Rafalovitch [arafa...@gmail.com] Enviado: quinta-feira, 12 de Março de 2015 19:04 Para: solr-user Assunto: Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory The answer meant it was most likely something user has done not quite understanding Solr's behavior. Not a bug. I'd ignore that case and just explain what your issue actually is. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 14:43, Pedro Figueiredo wrote: > Hi guys, > > > > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) > with status Resolved, but the resolution is not identified in the issue. > > I am facing the exact same problem.. and not able to identified the > solution. > > > > In the last comment of the issue, is said that this kind of questions should > be done in the solr-user mailing list.. > > So anyone. I'll appreciate any kind of help. > > > > Thanks is advanced! > > > > Best regards, > > Pedro Figueiredo >
SSD endurance
For those who have not yet taken the leap to SSD goodness because they are afraid of flash wear, the burnout test from The Tech Report seems worth a read. The short story is that they wrote data to the drives until they wore out. All tested drives survived considerably longer than guaranteed, but 4/6 failed catastrophically when they did die. http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead I am disappointed about the catastrophic failures. One of the promises of SSDs was graceful end of life by switching to read-only mode. Some of them did give warnings before the end, but I wonder how those are communicated in a server environment? Regarding Lucene/Solr, the write pattern when updating an index is benign to SSDs: Updates are relatively bulky, rather than the evil constantly-flip-random-single-bits-and-flush pattern of databases. With segments being immutable, the bird's eye view is that Lucene creates and deletes large files, which makes it possible for the SSD's wear-leveler to select the least-used flash sectors for new writes: The write pattern over time is not too far from the one that The Tech Report tested with. - Toke Eskildsen Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes.
RE: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
what does your schema.xml look like? what does your solrconfig.xml look like? what does the document you are indexing look like? what is the full error with stack trace from your server logs? details matter. https://wiki.apache.org/solr/UsingMailingLists : Date: Thu, 12 Mar 2015 20:27:05 + : From: Pedro Figueiredo : Reply-To: solr-user@lucene.apache.org : To: "solr-user@lucene.apache.org" : Subject: RE: error message This IndexSchema is not mutable with a : classicSchemaIndexFactory : : Hello Alex, : : I'm trying to add a new document, using solrj and the error "This IndexSchema is not mutable" is raised when inserting the document in the solr index. : My index in solr, is configured with classicSchemaIndexFactory. : If I change it to AutoManaged the insert is done without any problems. : : I believe that there is no mutable configuration (true or false) for ClassicSchema as for AutoManaged. : : The document does not have any new field, all fields are specified in the schema.xml file. : : Any thoughts!? : : Thanks! : Pedro Figueiredo : : De: Alexandre Rafalovitch [arafa...@gmail.com] : Enviado: quinta-feira, 12 de Março de 2015 19:04 : Para: solr-user : Assunto: Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory : : The answer meant it was most likely something user has done not quite : understanding Solr's behavior. Not a bug. I'd ignore that case and : just explain what your issue actually is. : : Regards, :Alex. : : Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: : http://www.solr-start.com/ : : : On 12 March 2015 at 14:43, Pedro Figueiredo : wrote: : > Hi guys, : > : > : > : > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) : > with status Resolved, but the resolution is not identified in the issue. : > : > I am facing the exact same problem.. and not able to identified the : > solution. : > : > : > : > In the last comment of the issue, is said that this kind of questions should : > be done in the solr-user mailing list.. : > : > So anyone. I'll appreciate any kind of help. : > : > : > : > Thanks is advanced! : > : > : > : > Best regards, : > : > Pedro Figueiredo : > : -Hoss http://www.lucidworks.com/
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Wow, "optimize" worked like a charm. This really addressed the docvalues issue. A follow-up question, is it recommended to run optimize in a Production Solr index ? Also, in a Sorl cloud mode, do we need to run optimize on each instance / each shard / any instance ? Appreciate your help Alex. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 5.0 --> "IllegalStateException: unexpected docvalues type NONE" on result grouping
Manual optimize is no longer needed for modern Solr. It does great optimization automatically. The only reason I recommended it here is to make sure that all segments are brought up to the latest version and the deleted documents are purged. That's something that also would happen automatically eventually, but "eventually" was not an option for you. I am glad this helped. I am not 100% sure if you have to do it on each shard in SolrCloud mode, but I suspect so. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 17:24, shamik wrote: > Wow, "optimize" worked like a charm. This really addressed the docvalues > issue. A follow-up question, is it recommended to run optimize in a > Production Solr index ? Also, in a Sorl cloud mode, do we need to run > optimize on each instance / each shard / any instance ? > > Appreciate your help Alex. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-5-0-migration-IllegalStateException-unexpected-docvalues-type-NONE-on-fields-using-docvalues-tp4192477p4192732.html > Sent from the Solr - User mailing list archive at Nabble.com.
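For reference, the forced merge can be kicked off through the update handler (host and collection/core name are placeholders):

  curl "http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1"

To be safe in SolrCloud you can issue the same request against each core individually, which also makes it easy to confirm every shard ends up on the new segment format.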
Best way to dump out entire solr content?
Hi All, I am having a solr cloud cluster of 20 nodes with each node having close to 20 Million records and total index size is around 400GB ( 20GB per node X 20 nodes ). I am trying to know the best way to dump out the entire solr data in say CSV format. I use successive queries by incrementing the start param with 2000 and keeping the rows as 2000 and hitting each individual servers using distrib=false so that I don't overload the top level server and causing any timeouts between top level and lower level servers. I am getting response from solr very quickly when the start param is in lower millions < 2 millions. As the start param grows towards 16 million, solr takes almost 2 to 3 minutes to return back those 2000 records for a single query. I assume this is because of skipping all the lower level index positions to get to that start index of > 16 millions and then provide the results. Is there any better way to do this? I saw cursor feature in solr pagination Wiki but it is mentioned that it is for sort on a unique field. Would it make sense for my use this to sort on my solr key field(Solr unique key field) with rows as 2000 and keep on using the nextCursorMark to dump out all the documents in csv format? Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to dump out entire solr content?
Well, it's cursor or nothing. Well, or some sort of custom code to manually read Lucene indexes (good luck with deleted items, etc). I think your understanding is correct. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 18:10, vsriram30 wrote: > Hi All, > > I am having a solr cloud cluster of 20 nodes with each node having close to > 20 Million records and total index size is around 400GB ( 20GB per node X 20 > nodes ). I am trying to know the best way to dump out the entire solr data > in say CSV format. > > I use successive queries by incrementing the start param with 2000 and > keeping the rows as 2000 and hitting each individual servers using > distrib=false so that I don't overload the top level server and causing any > timeouts between top level and lower level servers. I am getting response > from solr very quickly when the start param is in lower millions < 2 > millions. As the start param grows towards 16 million, solr takes almost 2 > to 3 minutes to return back those 2000 records for a single query. I assume > this is because of skipping all the lower level index positions to get to > that start index of > 16 millions and then provide the results. > > Is there any better way to do this? I saw cursor feature in solr pagination > Wiki but it is mentioned that it is for sort on a unique field. Would it > make sense for my use this to sort on my solr key field(Solr unique key > field) with rows as 2000 and keep on using the nextCursorMark to dump out > all the documents in csv format? > > Thanks, > Sriram > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734.html > Sent from the Solr - User mailing list archive at Nabble.com.
RE: SSD endurance
Thanks for sharing Toke! Reliability should not be a problem for a Solr cloud environment. A corrupted index cannot be loaded due to exceptions so the core should not enter an active state. However, what would happen if parts of the data become corrupted but can still be processed by the codec? I don't even know if the data has a CRC check to guard against such madness? Markus -Original message- > From:Toke Eskildsen > Sent: Thursday 12th March 2015 21:33 > To: solr-user > Subject: SSD endurance > > For those who have not yet taken the leap to SSD goodness because they are > afraid of flash wear, the burnout test from The Tech Report seems worth a > read. The short story is that they wrote data to the drives until they wore > out. All tested drives survived considerably longer than guaranteed, but 4/6 > failed catastrophically when they did die. > > http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead > > I am disappointed about the catastrophic failures. One of the promises of > SSDs was graceful end of life by switching to read-only mode. Some of them > did give warnings before the end, but I wonder how those are communicated in > a server environment? > > > Regarding Lucene/Solr, the write pattern when updating an index is benign to > SSDs: Updates are relatively bulky, rather than the evil > constantly-flip-random-single-bits-and-flush pattern of databases. With > segments being immutable, the bird's eye view is that Lucene creates and > deletes large files, which makes it possible for the SSD's wear-leveler to > select the least-used flash sectors for new writes: The write pattern over > time is not too far from the one that The Tech Report tested with. > > - Toke Eskildsen > Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. >
Re: [Poll]: User need for Solr security
Hi, I’m currently working with indexes that need document level security. Based on the user logged in, query results would omit documents that this user doesn’t have access to, with LDAP integration and such. I think that would be nice to have on a future Solr release. Henrique. > On Mar 12, 2015, at 7:32 AM, Jan Høydahl wrote: > > Hi, > > Securing various Solr APIs has once again surfaced as a discussion in the > developer list. See e.g. SOLR-7236 > Would be useful to get some feedback from Solr users about needs "in the > field". > > Please reply to this email and let us know what security aspect(s) would be > most important for your company to see supported in a future version of Solr. > Examples: Local user management, AD/LDAP integration, SSL, authenticated > login to Admin UI, authorization for Admin APIs, e.g. admin user vs read-only > user etc > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com >
Re: error message This IndexSchema is not mutable with a classicSchemaIndexFactory
On 3/12/2015 12:43 PM, Pedro Figueiredo wrote: > I saw an issue in Jira (https://issues.apache.org/jira/browse/SOLR-7234) > with status Resolved, but the resolution is not identified in the issue. > > I am facing the exact same problem.. and not able to identified the > solution. I believe the problem is that you are using ClassicSchemaIndexFactory, but you did not remove AddSchemaFieldsUpdateProcessorFactory from the updateRequestProcessorChain config. That update processor requires the managed schema factory. Chances are that you started with the data-driven example config set and then realized you did not need/want the managed schema, so you switched to the classic factory. If you do not want the managed schema, you should probably start with the techproducts example rather than the data-driven example. I think we need to add some info to the schemaFactory comment in the data-driven example config so that people know they need to also modify the update processor chain when they want to disable the Schema API. Thanks, Shawn
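Concretely, in the stock data_driven example the relevant part of solrconfig.xml looks roughly like the trimmed fragment below; when switching to ClassicIndexSchemaFactory, the AddSchemaFieldsUpdateProcessorFactory processor (or the whole chain, plus the update.chain default that points at it, if there is one) needs to be removed:

  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
    ...
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">...</processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>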
Re: SSD endurance
Lucene 5 has added a lot of various CRCs to catch index corruption situations. I don't know if it is 'perfect', but there was certainly a lot of work. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 18:39, Markus Jelsma wrote: > Thanks for sharing Toke! > > Reliability should not be a problem for a Solr cloud environment. A corrupted > index cannot be loaded due to exceptions so the core should not enter an > active state. However, what would happen if parts of the data become > corrupted but can still be processed by the codec? I don't even know if the > data has a CRC check to guard against such madness? > > Markus > > -Original message- >> From:Toke Eskildsen >> Sent: Thursday 12th March 2015 21:33 >> To: solr-user >> Subject: SSD endurance >> >> For those who have not yet taken the leap to SSD goodness because they are >> afraid of flash wear, the burnout test from The Tech Report seems worth a >> read. The short story is that they wrote data to the drives until they wore >> out. All tested drives survived considerably longer than guaranteed, but 4/6 >> failed catastrophically when they did die. >> >> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead >> >> I am disappointed about the catastrophic failures. One of the promises of >> SSDs was graceful end of life by switching to read-only mode. Some of them >> did give warnings before the end, but I wonder how those are communicated in >> a server environment? >> >> >> Regarding Lucene/Solr, the write pattern when updating an index is benign to >> SSDs: Updates are relatively bulky, rather than the evil >> constantly-flip-random-single-bits-and-flush pattern of databases. With >> segments being immutable, the bird's eye view is that Lucene creates and >> deletes large files, which makes it possible for the SSD's wear-leveler to >> select the least-used flash sectors for new writes: The write pattern over >> time is not too far from the one that The Tech Report tested with. >> >> - Toke Eskildsen >> Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. >>
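If you want to verify an index's checksums by hand, the stock CheckIndex tool can be pointed at an index directory that is not currently being written to; the jar version and path below are placeholders:

  java -cp lucene-core-5.0.0.jar org.apache.lucene.index.CheckIndex /var/solr/data/collection1/data/index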
RE: [Poll]: User need for Solr security
Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps but it would adhere to such a client's guidelines. I think many approaches of security in Solr/Lucene would find advocates, be it index encryption or authentication/authorization or transport security, which is now possible. I understand the reluctance of the PMC, and i agree with it, but some users would definitately benefit and it would certainly make Solr/Lucene the search platform to use for some enterprises. Markus -Original message- > From:Henrique O. Santos > Sent: Thursday 12th March 2015 23:43 > To: solr-user@lucene.apache.org > Subject: Re: [Poll]: User need for Solr security > > Hi, > > I’m currently working with indexes that need document level security. Based > on the user logged in, query results would omit documents that this user > doesn’t have access to, with LDAP integration and such. > > I think that would be nice to have on a future Solr release. > > Henrique. > > > On Mar 12, 2015, at 7:32 AM, Jan Høydahl wrote: > > > > Hi, > > > > Securing various Solr APIs has once again surfaced as a discussion in the > > developer list. See e.g. SOLR-7236 > > Would be useful to get some feedback from Solr users about needs "in the > > field". > > > > Please reply to this email and let us know what security aspect(s) would be > > most important for your company to see supported in a future version of > > Solr. > > Examples: Local user management, AD/LDAP integration, SSL, authenticated > > login to Admin UI, authorization for Admin APIs, e.g. admin user vs > > read-only user etc > > > > -- > > Jan Høydahl, search solution architect > > Cominvent AS - www.cominvent.com > > > >
RE: SSD endurance
Hello Alexandre - if you, and others, allow me to be a bit lazy right now; are there unit tests that input corrupted segments, where not the structure but the data is affected, to the codec? Thanks, Markus -Original message- > From:Alexandre Rafalovitch > Sent: Thursday 12th March 2015 23:52 > To: solr-user > Subject: Re: SSD endurance > > Lucene 5 has added a lot of various CRCs to catch index corruption > situations. I don't know if it is 'perfect', but there was certainly a > lot of work. > > Regards, > Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 12 March 2015 at 18:39, Markus Jelsma wrote: > > Thanks for sharing Toke! > > > > Reliability should not be a problem for a Solr cloud environment. A > > corrupted index cannot be loaded due to exceptions so the core should not > > enter an active state. However, what would happen if parts of the data > > become corrupted but can still be processed by the codec? I don't even know > > if the data has a CRC check to guard against such madness? > > > > Markus > > > > -Original message- > >> From:Toke Eskildsen > >> Sent: Thursday 12th March 2015 21:33 > >> To: solr-user > >> Subject: SSD endurance > >> > >> For those who have not yet taken the leap to SSD goodness because they are > >> afraid of flash wear, the burnout test from The Tech Report seems worth a > >> read. The short story is that they wrote data to the drives until they > >> wore out. All tested drives survived considerably longer than guaranteed, > >> but 4/6 failed catastrophically when they did die. > >> > >> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead > >> > >> I am disappointed about the catastrophic failures. One of the promises of > >> SSDs was graceful end of life by switching to read-only mode. Some of them > >> did give warnings before the end, but I wonder how those are communicated > >> in a server environment? > >> > >> > >> Regarding Lucene/Solr, the write pattern when updating an index is benign > >> to SSDs: Updates are relatively bulky, rather than the evil > >> constantly-flip-random-single-bits-and-flush pattern of databases. With > >> segments being immutable, the bird's eye view is that Lucene creates and > >> deletes large files, which makes it possible for the SSD's wear-leveler to > >> select the least-used flash sectors for new writes: The write pattern over > >> time is not too far from the one that The Tech Report tested with. > >> > >> - Toke Eskildsen > >> Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. > >> >
Re: backport Heliosearch features to Solr
Are there any results of off-heap cache vs JRE 8 with G1GC? On 10 March 2015 at 11:13, Alexandre Rafalovitch wrote: > Ask and you shall receive: > SOLR-7210 Off-Heap filter cache > SOLR-7211 Off-Heap field cache > SOLR-7212 Parameter substitution > SOLR-7214 JSON Facet API > SOLR-7216 JSON Request API > > Regards, >Alex. > P.s. Oh, the power of GMail filters :-) > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 9 March 2015 at 18:59, Markus Jelsma > wrote: > > Ok, so what's next? Do you intend to open issues and send the links over > here so interested persons can follow them? Clearly some would like to see > features to merge. Let's see what the PMC thinks about it :) > > > > Cheers, > > M. > > > > -Original message- > >> From:Yonik Seeley > >> Sent: Monday 9th March 2015 19:53 > >> To: solr-user@lucene.apache.org > >> Subject: Re: backport Heliosearch features to Solr > >> > >> Thanks everyone for voting! > >> > >> Result charts (note that these auto-generated charts don't show blanks > >> as equivalent to "0") > >> > https://docs.google.com/forms/d/1gaMpNpHVdquA3q75yiFhqZhAWdWB-K6N8Jh3dBbWAU8/viewanalytics > >> > >> Raw results spreadsheet (correlations can be interesting), and > >> percentages at the bottom. > >> > https://docs.google.com/spreadsheets/d/1uZ2qgOaKx1ZxJ_NKwj2zIAYFQ9fp8OrEPI5hqadcPeY/ > >> > >> -Yonik > >> > >> > >> On Sun, Mar 1, 2015 at 4:50 PM, Yonik Seeley wrote: > >> > As many of you know, I've been doing some work in the experimental > >> > "heliosearch" fork of Solr over the past year. I think it's time to > >> > bring some more of those changes back. > >> > > >> > So here's a poll: Which Heliosearch features do you think should be > >> > brought back to Apache Solr? > >> > > >> > http://bit.ly/1E7wi1Q > >> > (link to google form) > >> > > >> > -Yonik > >> > -- Damien Kamerman
Re: Best way to dump out entire solr content?
Thanks Alex for quick response. I wanted to avoid reading the lucene index to prevent complications of merging deleted info. Also I would like to do this on very frequent basis as well like once in two or three days. I am wondering if the issues that I faced while scraping the index towards higher order of millions will get resolved with Cursor. Do you think using cursor to scrap solr with sort on unique key field is better than not using it and does it not do the same skip operations and take more time as without using cursor? Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734p4192745.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to dump out entire solr content?
Without cursor, you are rerunning a full search every time. So, slow down is entirely expected. With cursor, you do not. It does an internal skip based on cursor value. I think the sort is there to ensure the value is stable. Basically, you need to use the cursor. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 19:05, vsriram30 wrote: > Thanks Alex for quick response. I wanted to avoid reading the lucene index to > prevent complications of merging deleted info. Also I would like to do this > on very frequent basis as well like once in two or three days. > > I am wondering if the issues that I faced while scraping the index towards > higher order of millions will get resolved with Cursor. Do you think using > cursor to scrap solr with sort on unique key field is better than not using > it and does it not do the same skip operations and take more time as without > using cursor? > > Thanks, > Sriram > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734p4192745.html > Sent from the Solr - User mailing list archive at Nabble.com.
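A minimal sketch of the cursor loop (host, collection and field names are placeholders; the sort must include the uniqueKey field, and the first request uses cursorMark=*):

  curl "http://localhost:8983/solr/collection1/select?q=*:*&fl=id,field_a,field_b&sort=id+asc&rows=2000&wt=json&cursorMark=*"

Take nextCursorMark from each response, send it back as cursorMark on the next request, and stop when the value returned equals the one you sent. Since the cursor token comes back in the response body, it may be easiest to keep wt=json for the bookkeeping and write the CSV rows yourself.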
Re: SSD endurance
Well, I don't know this issue to such level of granularity. Perhaps others do. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 18:57, Markus Jelsma wrote: > Hello Alexandre - if you, and others, allow me to be a bit lazy right now; > are there unit tests that input corrupted segments, where not the structure > but the data is affected, to the codec? > > Thanks, > Markus > > > > -Original message- >> From:Alexandre Rafalovitch >> Sent: Thursday 12th March 2015 23:52 >> To: solr-user >> Subject: Re: SSD endurance >> >> Lucene 5 has added a lot of various CRCs to catch index corruption >> situations. I don't know if it is 'perfect', but there was certainly a >> lot of work. >> >> Regards, >> Alex. >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: >> http://www.solr-start.com/ >> >> >> On 12 March 2015 at 18:39, Markus Jelsma wrote: >> > Thanks for sharing Toke! >> > >> > Reliability should not be a problem for a Solr cloud environment. A >> > corrupted index cannot be loaded due to exceptions so the core should not >> > enter an active state. However, what would happen if parts of the data >> > become corrupted but can still be processed by the codec? I don't even >> > know if the data has a CRC check to guard against such madness? >> > >> > Markus >> > >> > -Original message- >> >> From:Toke Eskildsen >> >> Sent: Thursday 12th March 2015 21:33 >> >> To: solr-user >> >> Subject: SSD endurance >> >> >> >> For those who have not yet taken the leap to SSD goodness because they >> >> are afraid of flash wear, the burnout test from The Tech Report seems >> >> worth a read. The short story is that they wrote data to the drives until >> >> they wore out. All tested drives survived considerably longer than >> >> guaranteed, but 4/6 failed catastrophically when they did die. >> >> >> >> http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead >> >> >> >> I am disappointed about the catastrophic failures. One of the promises of >> >> SSDs was graceful end of life by switching to read-only mode. Some of >> >> them did give warnings before the end, but I wonder how those are >> >> communicated in a server environment? >> >> >> >> >> >> Regarding Lucene/Solr, the write pattern when updating an index is benign >> >> to SSDs: Updates are relatively bulky, rather than the evil >> >> constantly-flip-random-single-bits-and-flush pattern of databases. With >> >> segments being immutable, the bird's eye view is that Lucene creates and >> >> deletes large files, which makes it possible for the SSD's wear-leveler >> >> to select the least-used flash sectors for new writes: The write pattern >> >> over time is not too far from the one that The Tech Report tested with. >> >> >> >> - Toke Eskildsen >> >> Whose trusty old 160GB Intel X25-M reports an accumulated 36TB of writes. >> >> >>
RE: backport Heliosearch features to Solr
Hello - i would assume off-heap would out perform any heap based data structure. G1 is only useful if you deal with very large heaps, and it eats CPU at the same time. As much as G1 is better than CMS in same cases, you would still have less wasted CPU time and resp. less STW events. Anyway. if someone has a setup at hand to provide details, please do :) -Original message- > From:Damien Kamerman > Sent: Friday 13th March 2015 0:02 > To: solr-user@lucene.apache.org > Subject: Re: backport Heliosearch features to Solr > > Are there any results of off-heap cache vs JRE 8 with G1GC? > > On 10 March 2015 at 11:13, Alexandre Rafalovitch wrote: > > > Ask and you shall receive: > > SOLR-7210 Off-Heap filter cache > > SOLR-7211 Off-Heap field cache > > SOLR-7212 Parameter substitution > > SOLR-7214 JSON Facet API > > SOLR-7216 JSON Request API > > > > Regards, > >Alex. > > P.s. Oh, the power of GMail filters :-) > > > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > > http://www.solr-start.com/ > > > > > > On 9 March 2015 at 18:59, Markus Jelsma > > wrote: > > > Ok, so what's next? Do you intend to open issues and send the links over > > here so interested persons can follow them? Clearly some would like to see > > features to merge. Let's see what the PMC thinks about it :) > > > > > > Cheers, > > > M. > > > > > > -Original message- > > >> From:Yonik Seeley > > >> Sent: Monday 9th March 2015 19:53 > > >> To: solr-user@lucene.apache.org > > >> Subject: Re: backport Heliosearch features to Solr > > >> > > >> Thanks everyone for voting! > > >> > > >> Result charts (note that these auto-generated charts don't show blanks > > >> as equivalent to "0") > > >> > > https://docs.google.com/forms/d/1gaMpNpHVdquA3q75yiFhqZhAWdWB-K6N8Jh3dBbWAU8/viewanalytics > > >> > > >> Raw results spreadsheet (correlations can be interesting), and > > >> percentages at the bottom. > > >> > > https://docs.google.com/spreadsheets/d/1uZ2qgOaKx1ZxJ_NKwj2zIAYFQ9fp8OrEPI5hqadcPeY/ > > >> > > >> -Yonik > > >> > > >> > > >> On Sun, Mar 1, 2015 at 4:50 PM, Yonik Seeley wrote: > > >> > As many of you know, I've been doing some work in the experimental > > >> > "heliosearch" fork of Solr over the past year. I think it's time to > > >> > bring some more of those changes back. > > >> > > > >> > So here's a poll: Which Heliosearch features do you think should be > > >> > brought back to Apache Solr? > > >> > > > >> > http://bit.ly/1E7wi1Q > > >> > (link to google form) > > >> > > > >> > -Yonik > > >> > > > > > > -- > Damien Kamerman >
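If anyone does want to run that comparison, the Solr 5 start scripts let you switch collectors via GC_TUNE in solr.in.sh; the flags below are just a starting point for testing, not a recommendation:

  GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"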
Re: Best way to dump out entire solr content?
Thanks Alex for the explanation. Actually, since I am scraping all the contents from Solr, I am doing a generic query of *:*, so I think it should not take so much time, right? But as you say, the internal skips using the cursor are probably more efficient than skipping by increasing the start, so I will use the cursors. Kindly correct me if my understanding is not right. Thanks, Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734p4192750.html Sent from the Solr - User mailing list archive at Nabble.com.
how to store _text field
Hi folks, I googled and tried without success so I ask you: how can I modify the setting of a field to store it ? It is interesting to note that I did not add _text field so I guess it is a default one. Maybe it is normal that it is not showed on the result but actually this is my real problem. It could be grand also to copy it in a new field but I do not know how to do it with the last Solr (5) and the new kind of schema. I know that I have to use curl but I do not know how to use it to copy a field. Thank you in advance! Cheers, Mirko
Re: how to store _text field
Wait, step back. This is confusing. What's your real problem you are trying to solve? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 19:50, Mirko Torrisi wrote: > Hi folks, > > I googled and tried without success so I ask you: how can I modify the > setting of a field to store it ? > > It is interesting to note that I did not add _text field so I guess it is a > default one. Maybe it is normal that it is not showed on the result but > actually this is my real problem. It could be grand also to copy it in a new > field but I do not know how to do it with the last Solr (5) and the new kind > of schema. I know that I have to use curl but I do not know how to use it to > copy a field. > > Thank you in advance! > Cheers, > > Mirko
Re: [Poll]: User need for Solr security
I would love to see record level (or even field level) restricted access in Solr / Lucene. This should be group level, LDAP like or some rule base (which can be dynamic). If the solution means having a second core, so be it. The following is the closest I found: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security but I cannot use Manifold CF (Connector Framework). Does anyone know how Manifold does it? - MJ -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, March 12, 2015 6:51 PM To: solr-user@lucene.apache.org Subject: RE: [Poll]: User need for Solr security Jan - we don't really need any security for our products, nor for most clients. However, one client does deal with very sensitive data so we proposed to encrypt the transfer of data and the data on disk through a Lucene Directory. It won't fill all gaps but it would adhere to such a client's guidelines. I think many approaches of security in Solr/Lucene would find advocates, be it index encryption or authentication/authorization or transport security, which is now possible. I understand the reluctance of the PMC, and i agree with it, but some users would definitately benefit and it would certainly make Solr/Lucene the search platform to use for some enterprises. Markus -Original message- > From:Henrique O. Santos > Sent: Thursday 12th March 2015 23:43 > To: solr-user@lucene.apache.org > Subject: Re: [Poll]: User need for Solr security > > Hi, > > I’m currently working with indexes that need document level security. Based > on the user logged in, query results would omit documents that this user > doesn’t have access to, with LDAP integration and such. > > I think that would be nice to have on a future Solr release. > > Henrique. > > > On Mar 12, 2015, at 7:32 AM, Jan Høydahl wrote: > > > > Hi, > > > > Securing various Solr APIs has once again surfaced as a discussion > > in the developer list. See e.g. SOLR-7236 Would be useful to get some > > feedback from Solr users about needs "in the field". > > > > Please reply to this email and let us know what security aspect(s) would be > > most important for your company to see supported in a future version of > > Solr. > > Examples: Local user management, AD/LDAP integration, SSL, > > authenticated login to Admin UI, authorization for Admin APIs, e.g. > > admin user vs read-only user etc > > > > -- > > Jan Høydahl, search solution architect Cominvent AS - > > www.cominvent.com > > > >
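One common pattern for document-level filtering, independent of ManifoldCF, is to index the allowed groups into a multi-valued field and have the application (or a custom SearchComponent) append a filter query built from the logged-in user's groups, e.g. resolved via LDAP. Field and group names below are made up:

  fq={!terms f=acl_groups}engineering,sales

As far as I know the ManifoldCF Solr plugin follows the same basic idea, with allow/deny token fields populated at crawl time.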
Whole RAM consumed while Indexing.
Hello, I have written a python script to do 2 documents indexing each time on Solr. I have 28 GB RAM with 8 CPU. When I started indexing, at that time 15 GB RAM was freed. While indexing, all RAM is consumed but **not** a single document is indexed. Why so? And it through *HTTPError: HTTP Error 503: Service Unavailable* in python script. I think it is due to heavy load on Zookeeper by which all nodes went down. I am not sure about that. Any help please.. Or anything else is happening.. And how to overcome this issue. Please assist me towards right path. Thanks.. Warm Regards, Nitin Solanki
Re: Whole RAM consumed while Indexing.
What's your commit strategy? Explicit commits? Soft commits/hard commits (in solrconfig.xml)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 12 March 2015 at 23:19, Nitin Solanki wrote: > Hello, > I have written a python script to do 2 documents indexing > each time on Solr. I have 28 GB RAM with 8 CPU. > When I started indexing, at that time 15 GB RAM was freed. While indexing, > all RAM is consumed but **not** a single document is indexed. Why so? > And it through *HTTPError: HTTP Error 503: Service Unavailable* in python > script. > I think it is due to heavy load on Zookeeper by which all nodes went down. > I am not sure about that. Any help please.. > Or anything else is happening.. > And how to overcome this issue. > Please assist me towards right path. > Thanks.. > > Warm Regards, > Nitin Solanki
Re: Whole RAM consumed while Indexing.
Hi Alexandre, *Hard Commit* is:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

*Soft Commit* is:

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
  </autoSoftCommit>

And I am committing 2 documents each time. Is this a good config for committing? Or am I doing something wrong? On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch wrote: > What's your commit strategy? Explicit commits? Soft commits/hard > commits (in solrconfig.xml)? > > Regards, >Alex. > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > http://www.solr-start.com/ > > > On 12 March 2015 at 23:19, Nitin Solanki wrote: > > Hello, > > I have written a python script to do 2 documents indexing > > each time on Solr. I have 28 GB RAM with 8 CPU. > > When I started indexing, at that time 15 GB RAM was freed. While > indexing, > > all RAM is consumed but **not** a single document is indexed. Why so? > > And it through *HTTPError: HTTP Error 503: Service Unavailable* in python > > script. > > I think it is due to heavy load on Zookeeper by which all nodes went > down. > > I am not sure about that. Any help please.. > > Or anything else is happening.. > > And how to overcome this issue. > > Please assist me towards right path. > > Thanks.. > > > > Warm Regards, > > Nitin Solanki >
Re: Where is schema.xml and solrconfig.xml in solr 5.0.0
Thanks Shawn and Erick for explanation... On Thu, Mar 12, 2015 at 9:02 PM, Shawn Heisey wrote: > On 3/12/2015 9:18 AM, Erick Erickson wrote: > > By and large, I really never use linking. But it's about associating a > > config set > > you've _already_ uploaded with a collection. > > > > So uploading is pushing the configset from your local machine up to > Zookeeper, > > and linking is using that uploaded, named configuration with an > > arbitrary collection. > > > > But usually you just make this association when creating the collection. > > The primary use case that I see for linkconfig is in testing upgrades to > configurations. So let's say you have a production collection that uses > a config that you name fooV1 for foo version 1. You can build a test > collection that uses a config named fooV2, work out all the bugs, and > then when you're ready to deploy it, you can use linkconfig to link your > production collection to fooV2, reload the collection, and you're using > the new config. I haven't discussed here how to handle the situation > where a reindex is required. > > One thing you CAN do is run linkconfig for a collection that doesn't > exist yet, and then you don't need to include collection.configName when > you create the collection, because the link is already present in > zookeeper. I personally don't like doing things this way, but I'm > pretty sure it works. > > Thanks, > Shawn > >
Parsing error on space
Hi, I want to retrieve the parent documents which contain "Test Street" in the street field, or whose children contain "Test Street" in the childStreet field. So I've used the following syntax: q=street:"Test Street" OR {!parent which="type:parent"}childStreet:"Test Street" The parent query after the OR condition is not executed; I'm getting records based on the first clause alone. So I tried using a filter query instead: q="*:*"&fq=street:"Test Street" OR {!parent which="type:parent"}childStreet:"Test Street" This query retrieves records based on both conditions, but when the query string contains multiple words like "Test Street" I get an EOF exception; it fails to parse because of the space. Any approach to overcome this? Thanks in advance, Rajesh -- View this message in context: http://lucene.472066.n3.nabble.com/Parsing-error-on-space-tp4192796.html Sent from the Solr - User mailing list archive at Nabble.com.
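One way to keep the whitespace inside the nested clause from breaking the parse is to wrap the block-join in a nested query and pass the child clause through a dereferenced parameter, roughly like this (field names reused from above, URL-encoding omitted):

  q=street:"Test Street" OR _query_:"{!parent which='type:parent' v=$childq}"&childq=childStreet:"Test Street"

The v=$childq form keeps the quoted phrase out of the local-params string, so the space no longer needs escaping; the same trick should also work inside an fq.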