Taking Solr to production with Docker
Hi Solr community, I can find many blog posts on how to deploy Solr with Docker, but I am wondering whether Solr on Docker is really ready for production. Has anybody ever run Solr in production with Docker? Thank you for your feedback, Aurélien
Re: Passivate core in Solr Cloud
Thank you Erick,

OK, I will probably run some tests. It looks like a good candidate for a future blog post...

Regards,
Aurelien

On 27.07.2014 20:20, Erick Erickson wrote:

"Does not play nice" really means it was designed to run in non-distributed mode. No work has been done to verify that it works in cloud mode, and I fully expect some "interesting" problems in that mode, if/when we get to it.

About replication: I haven't heard of any problems, but I also haven't heard of it working in that environment. I expect that it'll only try to replicate when it's loaded, so that might be interesting.

Best,
Erick

On Thu, Jul 24, 2014 at 6:49 AM, Aurélien MAZOYER <aurelien.mazo...@francelabs.com> wrote:

Thank you Erick and Alex for your answers. The "lots of cores" feature seems to meet my requirements, but it is a problem if it does not work with Solr Cloud. Is there an open issue for this problem? If I understand correctly, the only solution for me is to run multiple standalone Solr instances using transient cores and to distribute my tenants' cores across them manually (I assume the LRU mechanism will be less effective, as it will be applied per Solr instance). When you say "does NOT play nice with distributed mode", does that also include the standard replication mechanism?

Thanks,
Regards,
Aurelien

On 23/07/2014 17:21, Erick Erickson wrote:

Do note that the lots of cores stuff does NOT play nice in distributed mode (yet).

Best,
Erick

On Wed, Jul 23, 2014 at 6:00 AM, Alexandre Rafalovitch wrote:

Solr has some support for a large number of cores, including transient cores: http://wiki.apache.org/solr/LotsOfCores

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Wed, Jul 23, 2014 at 7:55 PM, Aurélien MAZOYER wrote:

Hello,

We want to set up a Solr Cloud cluster in order to handle a high volume of documents with a multi-tenant architecture. The problem is that application-level isolation per tenant (using a shared index with a "customer" field) is not enough to meet our requirements. As a result, we need one collection per customer. There are more than a thousand customers, and it seems unreasonable to create thousands of collections in Solr Cloud... But since we know that there is less than 1 query/customer/day, we are looking for a way to passivate collections when they are not in use.

Is this a good idea? If yes, are there best practices for implementing it? What side effects can we expect? Do we need to put some application-level logic on top of the Solr Cloud cluster to choose which collection to unload (and maybe there is something smarter (and quicker?) than simply loading/unloading the core when it is not in use)?

Thank you for your answer(s),
Aurelien
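To make the LRU point in this thread concrete, here is a minimal sketch in Python of how a per-instance cache of loaded cores behaves. It is a toy model, not Solr code: the class name, the tenant names, and the `max_loaded` parameter are all made up for illustration. As far as I know, the real feature described on the LotsOfCores wiki page is driven by `transient=true` and `loadOnStartup=false` on each core plus a `transientCacheSize` setting per Solr instance, but please check the wiki for your version.

```python
from collections import OrderedDict


class TransientCoreCache:
    """Toy LRU model of one Solr instance's loaded transient cores (hypothetical)."""

    def __init__(self, max_loaded):
        self.max_loaded = max_loaded
        self.loaded = OrderedDict()  # core name -> placeholder for an open core
        self.evicted = []            # names of unloaded cores, oldest first

    def query(self, core_name):
        """Touch a core: reuse it if loaded, otherwise load it (evicting if full)."""
        if core_name in self.loaded:
            self.loaded.move_to_end(core_name)  # now the most recently used
            return "hit"
        if len(self.loaded) >= self.max_loaded:
            oldest, _ = self.loaded.popitem(last=False)  # least recently used
            self.evicted.append(oldest)
        self.loaded[core_name] = object()
        return "loaded"


cache = TransientCoreCache(max_loaded=2)
results = [cache.query(name) for name in ["tenant_a", "tenant_b", "tenant_a", "tenant_c"]]
print(results)        # ['loaded', 'loaded', 'hit', 'loaded']
print(cache.evicted)  # ['tenant_b']
```

Because each instance keeps its own cache, spreading tenants over several standalone instances means each instance only sees the access pattern of its own tenants, which is the reduced effectiveness mentioned above.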
Re: Character encoding problems
Hi,

If you use Solr 4.8.1, you no longer have to add URIEncoding="UTF-8" to the Tomcat configuration file: https://wiki.apache.org/solr/SolrTomcat

Regards,
Aurélien MAZOYER

On 29.07.2014 14:22, Gulliver Smith wrote:

I have Solr 4.8.1 under Tomcat 7 on Debian Linux. The connector in Tomcat's server.xml has been changed to include character encoding UTF-8:

I am posting to the server from PHP 5.5 curl. The extract POST was intercepted, and I confirmed that everything is being encoded in UTF-8. However, the responses to query commands, whether XML or JSON, return field values such as title_fr in something that looks like latin1 or iso-8859-1 when displayed in a browser or editor. E.g.:

"title_fr":[" appelé au téléphone"]

The highlights in the query response do display character codes correctly. E.g.:

"text_fr":[" \n \n \n \n \n \n \n \n \n \n \nappelé au téléphone\nappelé au téléphone\n

PHP's utf8_decode doesn't make sense of the title_fr. Is there something to configure to fix this and get proper UTF-8 results for everything?

Thanks
Gulliver
Re: Re: Multipart documents with different update cycles
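The thread does not establish the cause, but the symptom (accented characters that look like latin1) is what you typically see when UTF-8 bytes are decoded as ISO-8859-1 somewhere between the index and the display. The following Python sketch only illustrates that mechanism on the sample value from the message; it assumes nothing about Solr, Tomcat, or PHP.

```python
# Sample value taken from the message above.
original = "appelé au téléphone"

# Correct UTF-8 bytes, as the client says it sends them.
utf8_bytes = original.encode("utf-8")

# What a consumer sees if it wrongly decodes those bytes as ISO-8859-1.
misread = utf8_bytes.decode("latin-1")

# The damage is reversible as long as the bytes were only misread once.
repaired = misread.encode("latin-1").decode("utf-8")

print(misread)   # appelÃ© au tÃ©lÃ©phone
print(repaired)  # appelé au téléphone
```

If the stored value itself looks like the `misread` string when fetched with a client known to handle UTF-8, the mis-decoding probably happened at indexing time; if only the browser or editor shows it, the response charset or the viewer is the more likely suspect.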
Yes, that is the point: I have to handle complex queries that perform full-text search on both the user metadata and the main part of the documents :-(...

Aurélien

Do you search the frequently changing user metadata? If not, maybe the external file field is helpful. https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Fri, Jul 25, 2014 at 12:04 AM, Aurélien MAZOYER wrote:

Hello,

I have to index a dataset containing multipart documents. The "main" part and the "user metadata" part have different update cycles: we want to update the "user metadata" part frequently without having to refetch the main part from the datasource or store every field in order to use atomic updates. As there is no true field-level update in Solr yet, I am afraid that I have to build an index for each part and perform a query-time join, with all the well-known performance limitations.

I have also heard of the sidecar index. Is it a solution that can meet my requirements? Is it stable enough to be usable in production? Does the community plan to make it part of the trunk code?

Thanks,
Aurelien
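For readers unfamiliar with the atomic updates mentioned in the original question, here is a small Python sketch of what such an update body looks like in JSON. The `set` modifier is part of Solr 4's atomic update syntax; the unique key `id` and the field names `user_tags` and `user_rating` are assumptions made up for this example. Note that this is exactly the approach the question wants to avoid, because Solr rebuilds the document from its stored fields, so every field has to be stored.

```python
import json


def atomic_update(doc_id, field_changes):
    """Build a JSON body that replaces only the given fields of one document.

    Assumes the unique key field is called "id" (hypothetical schema).
    """
    doc = {"id": doc_id}
    for field, value in field_changes.items():
        doc[field] = {"set": value}  # "set" replaces the field's current value
    return json.dumps([doc])


payload = atomic_update("doc-42", {"user_tags": ["urgent", "reviewed"], "user_rating": 4})
print(payload)
```

The resulting body would be posted to the core's update handler with a JSON content type.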
Re: Solr on AWS ubuntu12.04 instance
Hi Pusakar,

Did you try to ping your Solr from localhost in your SSH console: curl http://localhost:8983/solr/collection1/admin/ping (use 8984 instead of 8983 if you changed the Jetty port)?

Aurélien

On 29.07.2014 15:15, pushkar sawant wrote:

Hi Team,

I have set up Solr 4.9.0 on an Ubuntu 12.04 instance on AWS, with Java 7. When I start Solr with "java -jar start.jar", it starts with the attached output. It says:

5460 [main] INFO org.eclipse.jetty.server.AbstractConnector – Started SocketConnector@0.0.0.0:8984

When I try to open it in a browser, the web interface does not open; the error is attached. Please let me know if anyone has come across the same issue and resolved it.

Thanks
Pusakar
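The same local check can be scripted. This is a minimal sketch in Python, assuming the default core name `collection1` used in the curl command above; the function names are made up for the example. Running it on the instance itself separates "Solr is not answering" from "the port is not reachable from outside", which on AWS often depends on the security group rules (an assumption, since the attached error is not shown in the archive).

```python
import urllib.error
import urllib.request


def ping_url(port=8983, core="collection1", host="localhost"):
    """Build the ping handler URL for one core."""
    return f"http://{host}:{port}/solr/{core}/admin/ping"


def is_reachable(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Example use on the server, with the Jetty port from the log line above:
#   is_reachable(ping_url(port=8984))
```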
Re: Solr on AWS ubuntu12.04 instance
Oops, didn't see Andrew's answer: sorry for my redundant answer :-) Aurélien
Re: Searching and highlighting tens of fields
Hello,

Do you use the classic highlighter or the fast vector highlighter?

Aurélien

On 30.07.2014 09:36, Manuel Le Normand wrote:

Hello,

I need to expose the search and highlighting capabilities over a few tens of fields. The edismax qf param makes this possible, but the performance of searching tens of words over tens of fields is problematic. I made a copyField (indexed, not stored) for these fields, which gives much better search performance but does not allow highlighting the original fields, which are stored. Is there any way of searching this copyField and highlighting other fields with any of the highlight components?

BTW, I need to keep the field structure, so storing the copyField is not an alternative.
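One thing that may be worth testing for this case is to query the copyField while listing the stored fields in `hl.fl` and leaving `hl.requireFieldMatch` at `false`, so that query terms are highlighted in fields other than the one that was searched. The sketch below, in Python, only builds the request parameters. `hl`, `hl.fl`, and `hl.requireFieldMatch` are standard Solr highlighting parameters, but the field names are hypothetical, and whether the result is acceptable depends on the highlighter in use and on how similar the analysis chains of the fields are.

```python
from urllib.parse import urlencode


def highlight_params(user_query, search_field, highlight_fields):
    """Build a query string that searches one field and highlights others."""
    return urlencode({
        "q": f"{search_field}:({user_query})",
        "hl": "true",
        "hl.fl": ",".join(highlight_fields),   # stored fields to highlight
        "hl.requireFieldMatch": "false",       # do not tie highlights to the queried field
    })


query_string = highlight_params("phone call", "all_text", ["title_fr", "text_fr"])
print(query_string)
```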
Re: Multiple shards in the same Solr instance server
Hi Nabil,

Does this blog post answer your question? http://solr.pl/en/2013/01/07/solr-4-1-solrcloud-multiple-shards-on-the-same-solr-node/

Regards,
Aurélien

On 10.10.2014 11:48, nabil Kouici wrote:

Hi All,

In Solr, is it possible to create multiple shards of the same index in the same Solr instance (or server)?

Regards,
Nabil.
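As a rough illustration of the question, the Collections API in SolrCloud accepts a `maxShardsPerNode` parameter on `CREATE`, which is what allows several shards of one collection to land on the same node. The Python sketch below only builds such a request URL; the host, port, and collection name are placeholders, and the exact behaviour should be checked against the documentation for your Solr version.

```python
from urllib.parse import urlencode


def create_collection_url(name, num_shards, base="http://localhost:8983/solr"):
    """Build a Collections API URL that puts all shards on a single node."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": 1,
        "maxShardsPerNode": num_shards,  # allow every shard on the same node
    })
    return f"{base}/admin/collections?{params}"


print(create_collection_url("my_index", 2))
```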
Nested documents in Solr
Hi,

I have a question regarding nested document queries. For example, let's say that I have many books, one of which is the following:

Book_Title: Nested documents for dummies
Chapter1_Title: Introduction
Chapter1_Content: Nested documents are fun.
Chapter2_Title: Which technology should I use?
Chapter2_Content: Lucene of course!

First, I want to find books that contain an introduction and that are about Lucene. So I decide to flatten my data and use 3 multivalued fields (Book_Title, Chapter_Title and Chapter_Content). I index my document and get what I want when I run the following query: “chapter_title:Introduction AND chapter_content:Lucene”

But now I want to find books that contain “fun” in a chapter named “introduction”. My model is no longer valid (Chapter2_Content is no longer linked to Chapter2_Title). That is why I change my data model and use nested documents: I now have a parent with a single-valued field Book_Title and several children with single-valued fields Chapter_Title and Chapter_Content. Now, when I run the query “chapter_title:Introduction AND chapter_content:fun”, I also get what I want…

But what do I have to do if I want to run both kinds of query against a single data model? Maybe the only way is to use nested documents and to index the data both in child documents and in flattened form in the parent document. Then we would be able to run the two different queries. Do you have any other (better) idea?

Thank you,
Regards,
Aurélien
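The difference between the two models can be reproduced without Solr. The Python sketch below is a plain in-memory illustration using the book from the message; simple substring matching stands in for real text analysis, and the function names are made up for the example.

```python
book = {
    "book_title": "Nested documents for dummies",
    "chapters": [
        {"chapter_title": "Introduction",
         "chapter_content": "Nested documents are fun."},
        {"chapter_title": "Which technology should I use?",
         "chapter_content": "Lucene of course!"},
    ],
}


def contains(text, term):
    """Case-insensitive substring test, a stand-in for analyzed term matching."""
    return term.lower() in text.lower()


def flattened_match(book, title_term, content_term):
    """Multivalued-field semantics: each term may match in a different chapter."""
    titles = [c["chapter_title"] for c in book["chapters"]]
    contents = [c["chapter_content"] for c in book["chapters"]]
    return (any(contains(t, title_term) for t in titles)
            and any(contains(c, content_term) for c in contents))


def nested_match(book, title_term, content_term):
    """Child-document semantics: both terms must match in the same chapter."""
    return any(contains(c["chapter_title"], title_term)
               and contains(c["chapter_content"], content_term)
               for c in book["chapters"])


print(flattened_match(book, "introduction", "lucene"))  # True
print(nested_match(book, "introduction", "lucene"))     # False
print(nested_match(book, "introduction", "fun"))        # True
```

The first query type needs the flattened semantics and the second needs the nested semantics, which is why a single model has to support both.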
Re: Nested documents in Solr
Hi Ramzi,

Thank you, but I am not sure I understand your answer. In your example, I suppose that the indexed docs are flattened. If I want an AND query instead of an OR query (say, for example, 'chapter_title:Lucene AND chapter_content:fun'), how can I be sure that the terms "Lucene" and "fun" will be matched in the same chapter of the book, since in this case chapter_content and chapter_title are multivalued fields?

Regards,
Aurélien

On 21.10.2014 19:59, Ramzi Alqrainy wrote:

If I understand your question correctly, you can use several query syntaxes. Unless you explicitly specify an alternative query parser such as DisMax or eDisMax, you're using the standard Lucene query parser by default. In your case, I think it can be solved with this query:

chapter_title:Introduction ( chapter_title:Lucene OR chapter_content:fun )

Here are some query examples demonstrating the query syntax.

*Keyword matching*

Search for word "foo" in the title field.
    title:foo

Search for phrase "foo bar" in the title field.
    title:"foo bar"

Search for phrase "foo bar" in the title field AND the phrase "quick fox" in the body field.
    title:"foo bar" AND body:"quick fox"

Search for either the phrase "foo bar" in the title field AND the phrase "quick fox" in the body field, or the word "fox" in the title field.
    (title:"foo bar" AND body:"quick fox") OR title:fox

Search for word "foo" and not "bar" in the title field.
    title:foo -title:bar

*Wildcard matching*

Search for any word that starts with "foo" in the title field.
    title:foo*

Search for any word that starts with "foo" and ends with "bar" in the title field.
    title:foo*bar

Note that Lucene doesn't support using a * symbol as the first character of a search.

-- View this message in context: http://lucene.472066.n3.nabble.com/Nested-documents-in-Solr-tp4165099p4165232.html Sent from the Solr - User mailing list archive at Nabble.com.

Hi,

I have a question regarding nested document queries. For example, let's say that I have the following book:

Book_Title: Nested document for dummies
Chapter1_Title: Introduction
Chapter1_Content: Nested documents are fun.
Chapter2_Title: Which technology should I use?
Chapter2_Content: Lucene of course!

First, I want to find books that contain an introduction and that are about Lucene. So I decide to flatten my data and use 3 multivalued fields (Book_Title, Chapter_Title and Chapter_Content). I index my document and get what I want when I use the following query: “chapter_title:Introduction AND chapter_content:Lucene”

Now I want to find books that contain “fun” in a chapter called “introduction”. My model is no longer valid (Chapter2_Content is no longer linked to Chapter2_Title). That is why I change my data model and use nested documents: I now have a parent with a single-valued field Book_Title and several children with single-valued fields Chapter_Title and Chapter_Content. Now, when I run the query “chapter_title:Introduction AND chapter_content:fun”, I also get what I want…

But what do I have to do if I want to run both kinds of query against a single data model?

Thank you,
Regards,
Aurélien MAZOYER
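One option that may cover both query types with the nested model alone is Solr's block join parent query parser (`{!parent which=...}`), available in recent Solr 4.x releases as far as I know. The Python sketch below only builds the query strings. The `doc_type:book` parent filter is a hypothetical discriminator field that would have to exist in the schema and match parent documents only, and the approach requires that each book and its chapters are indexed together as one block.

```python
def same_chapter_query(parent_filter, child_clauses):
    """Parents having at least one child that satisfies ALL clauses."""
    child_query = " ".join(f"+{field}:{term}" for field, term in child_clauses)
    return f'{{!parent which="{parent_filter}"}}{child_query}'


def any_chapter_params(parent_filter, child_clauses):
    """Parents where each clause is satisfied by SOME child, possibly different ones.

    The first clause becomes the main query; the others become filter queries.
    """
    queries = [same_chapter_query(parent_filter, [clause]) for clause in child_clauses]
    return {"q": queries[0], "fq": queries[1:]}


PARENTS = "doc_type:book"  # hypothetical field marking parent documents

# "fun" in a chapter whose title is "introduction" (same chapter)
print(same_chapter_query(PARENTS, [("chapter_title", "introduction"),
                                   ("chapter_content", "fun")]))

# an introduction chapter AND a chapter about Lucene (any chapters)
print(any_chapter_params(PARENTS, [("chapter_title", "introduction"),
                                   ("chapter_content", "lucene")]))
```

If this works for your data, the flattened copy in the parent document would no longer be needed, at the cost of one extra block join per condition.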