Local Solr and Webserver-Solr act differently ("and" treated like "or")
Hello Solr experts,

I am currently having a strange issue with my Solr queries. I am running a small PHP/MySQL website that uses Solr for faster text searches in name lists, movie titles, etc. Recently I noticed that the results on my local development environment differ from those on my webserver. Both use the exact same MySQL database with identical Solr queries for the data import.

This is a sample query:

http://localhost:8080/solr/select/?q=title%3A%28into+AND+the+AND+wild*%29&version=2.2&start=0&rows=1000&indent=on&fl=titleid

It is autogenerated by a PHP script and 100% identical on my local machine and on my webserver. My local Solr gives me the expected results: all entries that contain the words "into" AND "the" AND "wild*". But my webserver acts as if I were searching for "into" OR "the" OR "wild*", even though the query is the same (as shown above). That is why I get useless (far too many) results on the webserver side.

I don't know what the issue could be. I have tried to check the config files, but I don't really know what to look for, so searching through such a big file without guidance is overwhelming.

What could be the problem, where can I check for it, and how can I solve it? If additional information is needed, please let me know.

Thank you!

(Please excuse my poor English; it is not my native language.)
Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")
My local Solr gives me: http://pastebin.com/Q6d9dFmZ

and my webserver this: http://pastebin.com/q87WEjVA

I copied only the first few hundred lines (of more than 8000) because the webserver output was too big even for pastebin.

On 16.10.2013 12:27, Erik Hatcher wrote:
> What does the debug output from debugQuery=true say between the two?
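For reference, the debug output Erik asks about can be requested by appending &debugQuery=true to the same select URL; using the sample query from the original post:

  http://localhost:8080/solr/select/?q=title%3A%28into+AND+the+AND+wild*%29&fl=titleid&debugQuery=true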
Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")
Thank you,

I found the file with the stopwords and noticed that my local file is empty (comments only) while the one on my webserver has a big list of English stopwords. That seems to be the problem.

I think stopwords are a good idea for general searches, but they are not useful in my particular case. Is there a way to (de)activate stopwords per query? For example, I would like to ignore stopwords when searching in titles, but use them when users do a full-text search on whole articles, etc.

Thanks again,
Stavros

On 17.10.2013 09:13, Upayavira wrote:
> Stopwords are small words such as "and", "the" or "is" that we might
> choose to exclude from our documents and queries because they are such
> common terms. Once you have stripped stop words from your above query,
> all that is left is the word "wild", or so is being suggested.
>
> Somewhere in your config, close to solrconfig.xml, you will find a file
> called something like stopwords.txt. Compare these files between your
> two systems.
>
> Upayavira
>
> On Thu, Oct 17, 2013, at 07:18 AM, Stavros Delsiavas wrote:
>> Unfortunately, I don't really know what stopwords are. I would like it
>> to not ignore any words of my query.
>> How/where can I change this stopwords behaviour?
>>
>> Am 16.10.2013 23:45, schrieb Jack Krupansky:
>>> So, the stopwords.txt file is different between the two systems - the
>>> first has stop words but the second does not. Did you expect stop
>>> words to be removed, or not?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Stavros Delsiavas
>>> Sent: Wednesday, October 16, 2013 5:02 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Local Solr and Webserver-Solr act differently ("and"
>>> treated like "or")
>>>
>>> Okay, I understand.
>>>
>>> Here is the rawquerystring. It was at about line 3000. The webserver
>>> debug output:
>>>
>>> rawquerystring: title:(into AND the AND wild*)
>>> querystring: title:(into AND the AND wild*)
>>> parsedquery: +title:wild*
>>> parsedquery_toString: +title:wild*
>>>
>>> At this place the debug output DOES differ from the one on my local
>>> system, but I don't understand why. This is the local debug output:
>>>
>>> rawquerystring: title:(into AND the AND wild*)
>>> querystring: title:(into AND the AND wild*)
>>> parsedquery: +title:into +title:the +title:wild*
>>> parsedquery_toString: +title:into +title:the +title:wild*
>>>
>>> Why is that? Any ideas?
>>>
>>> Am 16.10.2013 21:03, schrieb Shawn Heisey:
>>>> What's really needed here is the first part of the debug section,
>>>> which has rawquerystring, querystring, parsedquery, and
>>>> parsedquery_toString. The info from your local Solr has this part,
>>>> but what you pasted from the webserver one didn't include those
>>>> parts, because it's further down than the first few hundred lines.
>>>>
>>>> Thanks,
>>>> Shawn
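For orientation, stop word handling is wired into schema.xml through a StopFilterFactory inside a field type's analyzer; the field type name below is only an illustration of what to look for when comparing the two systems:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- removes the words listed in stopwords.txt at index time -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- the same list is usually applied to queries as well -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>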
Re: Local Solr and Webserver-Solr act differently ("and" treated like "or")
Okay, I emptied the stopword file. I don't know where the word list came from; I have never seen or touched that file before. Anyway...

Now my queries do work with a single stopword like "in" or "to", but they still do not work when I use more than one stopword within one query. Instead of too many results I now get NO results at all.

What could be the problem?

On 17.10.2013 15:02, Jack Krupansky wrote:
> The default Solr stopwords.txt file is empty, so SOMEBODY created that
> non-empty stop words file.
>
> The StopFilterFactory token filter in the field type analyzer controls
> stop word processing. You can remove that step entirely, or different
> field types can reference different stop word files, or some field type
> analyzers can use the stop filter and some would not have it. This does
> mean that you would have to use different field types for fields that
> want different stop word processing.
>
> -- Jack Krupansky
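Following Jack's suggestion, a separate field type without the stop filter could be used for title-like fields; the type and field names below are only illustrative. Note that changes to index-time analysis (including emptying stopwords.txt) only affect documents indexed afterwards, so existing data needs a full reindex, which may be why queries containing stopwords currently match nothing.

  <!-- hypothetical field type with no stop word removal -->
  <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- the title field uses the stopword-free type; article text can keep a stop-filtered type -->
  <field name="title" type="text_title" indexed="true" stored="true"/>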
How to work with remote Solr safely?
Hello Solr friends,

I have a question about working with Solr installed on a remote server. I have a PHP project with a very big MySQL database of about 10 GB, and I am also using Solr, with about 10,000,000 entries indexed, for fast search and access to the MySQL data. I have a local copy myself so I can continue to work on the PHP project, but I want to make it available to more developers too.

How can I make Solr accessible ONLY to those selected developers? For MySQL it is no problem to add an additional MySQL user with limited access, but for Solr it seems difficult to me. I have had my administrator restrict the Java port 8080 to localhost only, so no one outside can access Solr or the Solr admin interface. How can I allow access to other developers without making the whole Solr interface (port 8080) available to the public?

Thanks,
Stavros
Re: How to work with remote Solr safely?
Thanks for your fast reply.

First of all, HTTP basic authentication unfortunately is not secure. Also, this would give every developer full admin privileges. Anyway, can you tell me where I can do those configurations? Are there any alternative or more secure ways to restrict Solr access?

In general, external developers need search-query access only. They should not be able to write/change the documents or access the Solr admin pages.

Thank you

Am 22.11.2013 15:34, schrieb michael.boom:
> Use HTTP basic authentication, setup in your servlet container
> (jetty/tomcat).
>
> That should work fine if you are *not* using SolrCloud.
>
> -----
> Thanks,
> Michael
Re: How to work with remote Solr safely?
Thanks for the suggestions. I will have a look at them and try them out.

Am 22.11.2013 16:01, schrieb Hoggarth, Gil:
> You could also use one of the proxy scripts, such as
> http://code.google.com/p/solr-php-client/, which is coincidentally
> linked (eventually) from Michael's suggested SolrSecurity URL.
>
> -----Original Message-----
> From: michael.boom [mailto:my_sky...@yahoo.com]
> Sent: 22 November 2013 14:53
> To: solr-user@lucene.apache.org
> Subject: Re: How to work with remote Solr safely?
>
> http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication
>
> Maybe you could achieve write/read access limitation by setting up path
> based authentication: the update handler "/solr/core/update" should be
> protected by authentication, with credentials only known to you. But
> then of course, your indexing client will need to authenticate in order
> to add docs to Solr.
> Your select handler "/solr/core/select" could then be open or protected
> by HTTP auth with credentials open to developers.
>
> That's the first idea that comes to mind - haven't tested it.
> If you do, feedback and let us know how it went.
>
> -----
> Thanks,
> Michael
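As a rough sketch of the path-based idea at the servlet-container level (not taken from this thread; the role name, realm, and URL patterns are made up and would need to match the real handler paths), a web.xml fragment might look like this:

  <!-- sketch only: require a login for update and admin paths, leave select open -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr write and admin paths</web-resource-name>
      <url-pattern>/core1/update/*</url-pattern>   <!-- hypothetical core name -->
      <url-pattern>/admin/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>solr-admin</role-name>            <!-- hypothetical role -->
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr</realm-name>
  </login-config>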
How to use Solr for two different projects on one server
Dear Solr experts,

I am using Solr successfully for my current web application on my server. Now I would like to use it in a second web application that is hosted on the same server. Is it possible to create two independent instances/databases in Solr? I know that I could create another set of fields with alternative field names, but I would prefer to keep my field naming independent across projects.

I would also like to have one state of my development version and one state of my production version on the server, so that I can run tests on the development state without interfering with the production version.

What is the best practice to achieve this, or how can this be done in general? I have searched Google but could not get any useful results because I don't even know what terms to search for with Solr. A minimal example would be most helpful.

Thanks a lot!

Stavros
Re: How to use Solr for two different projects on one server
Thanks for the fast responses. This looks like exactly what I was looking for!

Am 23.01.2014 09:46, schrieb Furkan KAMACI:
> Hi;
>
> Firstly you should read here and learn the terminology of Solr:
> http://wiki.apache.org/solr/SolrTerminology
>
> Thanks;
> Furkan KAMACI
>
> 2014/1/23 Alexandre Rafalovitch
>
>> If you are not worried about them stepping on each other's toes
>> (performance, disk space, etc), just create multiple collections.
>> There are examples of that in the standard distribution (e.g. the
>> badly named example/multicore).
>>
>> Regards,
>>    Alex.
Re: How to use Solr for two different projects on one server
I didn't know that the "core"-term is associated with this use case. I expected it to be some technical feature that allows to run more solr-instances for better multithread-cpu-usage. For example to activate two solr-cores when two cpu-cores are available on the server. So in general, I have the feeling that the term "core" is somewhat confusing for solr-beginners like me. Am 23.01.2014 09:54, schrieb Alexandre Rafalovitch: > Which is why it is curious that you did not find it. Looking back at > it now, do you have a suggestion of what could be improved to insure > people find this easier in the future? > > Regards, >Alex. > Personal website: http://www.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all > at once. Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > On Thu, Jan 23, 2014 at 3:49 PM, Stavros Delisavas > wrote: >> Thanks for the fast responses. Looks like exactly what I was looking for! >> >> >> >> >> Am 23.01.2014 09:46, schrieb Furkan KAMACI: >>> Hi; >>> >>> Firstly you should read here and learn the terminology of Solr: >>> http://wiki.apache.org/solr/SolrTerminology >>> >>> Thanks; >>> Furkan KAMACI >>> >>> >>> 2014/1/23 Alexandre Rafalovitch >>> >>>> If you are not worried about them stepping on each other's toes >>>> (performance, disk space, etc), just create multiple collections. >>>> There are examples of that in standard distribution (e.g. badly named >>>> example/multicore). >>>> >>>> Regards, >>>> Alex. >>>> Personal website: http://www.outerthoughts.com/ >>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch >>>> - Time is the quality of nature that keeps events from happening all >>>> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD >>>> book) >>>> >>>> >>>> On Thu, Jan 23, 2014 at 3:36 PM, Stavros Delisavas >>>> wrote: >>>>> Dear Solr-Experts, >>>>> >>>>> I am using Solr for my current web-application on my server successfully. >>>>> Now I would like to use it in my second web-application that is hosted >>>>> on the same server. Is it possible in any way to create two independent >>>>> instances/databases in Solr? I know that I could create another set of >>>>> fields with alternated field names, but I would prefer to be independent >>>>> on my field naming for all my projects. >>>>> >>>>> Also I would like to be able to have one state of my development version >>>>> and one state of my production version on my server so that I can do >>>>> tests on my development-state without interference on my >>>> production-version. >>>>> What is the best-practice to achieve this or how can this be done in >>>>> general? >>>>> >>>>> I have searched google but could not get any usefull results because I >>>>> don't even know what terms to search for with solr. >>>>> A minimal-example would be most helpfull. >>>>> >>>>> Thanks a lot! >>>>> >>>>> Stavros
Re: How to use Solr for two different projects on one server
So far, I have successfully managed to create a core from my existing configuration by opening this URL in my browser:

http://localhost:8080/solr/admin/cores?action=CREATE&name=glPrototypeCore&instanceDir=/etc/solr

The new status from http://localhost:8080/solr/admin/cores?action=STATUS is:

status: 0, QTime: 4

Core "" (the default core):
  instanceDir:  /usr/share/solr/./
  dataDir:      /var/lib/solr/data/
  startTime:    2014-01-23T08:42:39.087Z
  uptime:       3056197
  numDocs:      4401029
  maxDoc:       4401029
  version:      1370010628806
  segmentCount: 12
  current:      true
  hasDeletions: false
  directory:    org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@77c58801
  lastModified: 2013-10-29T14:17:22Z

Core "glPrototypeCore":
  instanceDir:  /etc/solr/
  dataDir:      /var/lib/solr/data/
  startTime:    2014-01-23T09:29:30.019Z
  uptime:       245267
  numDocs:      4401029
  maxDoc:       4401029
  version:      1370010628806
  segmentCount: 12
  current:      true
  hasDeletions: false
  directory:    org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/lib/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@5ad83862
  lastModified: 2013-10-29T14:17:22Z

From my understanding, I now have an unnamed core and a core named "glPrototypeCore" which uses the same configuration.

I copied the files data-config.xml and schema.xml into a new directory "/etc/solr/glinstance" and tried to create another core, but this always throws error 400. I even tried adding the schema and config parameters with full paths, but that made no difference. Also, I don't understand what the "dataDir" parameter is for; I could not find any data directories in /etc/solr/, but the creation of the first core worked anyway.

Can someone help? Is there a better place for my new instance directory, and what files do I really need?
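Both cores in the status above point at the same dataDir (/var/lib/solr/data/), which is presumably why they report identical index statistics. For what it is worth, the CoreAdmin CREATE call also accepts dataDir, config and schema parameters, so each core can get its own index directory; the core name and paths below are only placeholders:

  http://localhost:8080/solr/admin/cores?action=CREATE&name=glinstance&instanceDir=/etc/solr/glinstance&dataDir=/var/lib/solr/glinstance/data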
Re: How to use Solr for two different projects on one server
Thanks a lot, those are great examples. I managed to get my cores working.

What I noticed so far is that the first (auto-created) core symlinks files to /etc/solr/... or to /var/lib/solr/... I am now not sure where my self-made collections should live. Shall I create folders in /usr/share/solr/ and symlink to my files in /etc/solr, or can I keep real copies inside my collection folders? Is /usr/share/solr/ a good place for my collection folders at all?

Am 23.01.2014 12:16, schrieb Alexandre Rafalovitch:
> You need config-dir level schema.xml and solrconfig.xml. For multiple
> collections, you also need a top-level solr.xml. And unless the config
> files have a lot of references to other files, you need nothing else.
>
> For examples, check the example directory in the distribution. Or have
> a look at the examples from my book:
> https://github.com/arafalov/solr-indexing-book/tree/master/published .
> This shows the solr.xml that points at a lot of collections. The first
> nearly minimal collection is collection1, but you can then explore
> others for various degrees of complexity.
>
> Regards,
>    Alex.
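To picture what Alexandre describes, a minimal multi-core layout might look roughly like this; the directory names are placeholders and the actual Solr home depends on how Solr was installed:

  <solr home>/
    solr.xml                  (lists the cores/collections)
    core1/
      conf/
        solrconfig.xml
        schema.xml
        stopwords.txt
        data-config.xml       (only if the core uses the DataImportHandler)
      data/                   (index files, created by Solr)
    core2/
      conf/...
      data/...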
Re: How to use Solr for two different projects on one server
I installed Solr via apt-get and followed the online tutorials I found to adjust the existing schema.xml and to create data-config.xml the way I needed them. Was this the wrong approach? I don't know what a Bitnami stack is.

Am 23.01.2014 12:50, schrieb Alexandre Rafalovitch:
> You are not doing this on a download distribution, are you? You are
> using a Bitnami stack or something. That's why you are not seeing the
> examples folder, etc.
>
> I recommend stepping back: use the downloaded distribution and do your
> learning and setup using that. Then, go and see where your production
> stack put the various bits of Solr. Otherwise, you are doing two (15?)
> things at once.
>
> Regards,
>    Alex.
> P.s. If you like the examples, the book actually explains what they
> do. You could be a quarter of the way to mastery in less than 24 hours...
How to query multiple words correctly
Hello Solr community,

I am seeing some strange behaviour that I don't understand; I hope you can help. I am trying to query/search for two words, for example:

(*foo* AND *bar*)

What I want is all entries that contain the string foo AND contain the string bar. What I get is all entries that contain foo OR contain bar. But I want entries that contain BOTH words, like: "foobar 123", "bla foo bla bar", "blafoobla bar", etc.

What do I have to change in my query to get the desired result?

Thank you!
Re: How to query multiple words correctly
Thank you, problem solved!

On 13.07.2013 12:16, Otis Gospodnetic wrote:
> Hi,
>
> Does the same happen if you use +*foo* +*bar* syntax?
>
> If such queries turn out to be too slow, consider indexing ngrams.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
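One practical note when applying Otis's +term syntax in a raw URL: a literal + has to be percent-encoded as %2B, because a bare + in a query string decodes to a space. An illustrative encoded query (the field name is assumed):

  q=title%3A%28%2B*foo*+%2B*bar*%29        decodes to  q=title:(+*foo* +*bar*)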
data-import problem
Hello Solr friends,

I have a problem with my current Solr configuration. I want to import two tables into Solr. I got it to work for the first table, but the second table does not get imported (no error message, 0 rows skipped).

I have two tables called name and title, and I want to load their fields id, name and id, title (two id columns that have nothing to do with each other).

This is in my data-config.xml:

[...]

and this is in my schema.xml:

<uniqueKey>id</uniqueKey>

I chose that unique key only because Solr asked for it. In my Solr Admin schema browser I can see three fields, id, name and title, but titleid is missing and title itself is empty with no entries.

I don't know how to get it to index two separate lists. I hope someone can help, thank you!

PS: I am sorry if this mail reached you twice. I sent it the first time when I was not yet registered and don't know if it was received, so I am sending it again after registering to the mailing list.
Re: data-import problem
Thanks so far. This change makes Solr go over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry.

I checked my schema.xml; in there, "id" is not set as a required field. Removing the uniqueKey property also leads to no improvement.

Any further ideas?

Am 05.06.2013 18:01, schrieb sodoo:
> Maybe the problem is the two document declarations in data-config.xml.
> You could try changing that.
Re: data-import problem
Thanks for the hints.

I am not sure how to solve this issue. I previously made a typo; there are definitely two different tables. Here is my real configuration: http://pastebin.com/JUDzaMk0

For testing purposes I added "LIMIT 10" to the SQL statements because my tables are very large and tests would take too long (about 5 GB, 6.5 million rows). I included my whole data-config and what I have changed from the default schema.xml.

I don't know how to solve the "all ids have to be unique" problem. I cannot believe that Solr offers no way at all to handle multiple data sources with their own individual ids. Maybe it is possible to have Solr create its own ids while importing the data?

Actually, there is no direct relation between my "name" table and my "title" table. All I want is to be able to do fast text search in those two tables in order to find the corresponding ids of the entries.

Let me know if you need more information. Thank you!

Am 05.06.2013 20:54, schrieb Gora Mohanty:
> There are several things wrong with your problem statement. You say
> that you have two tables, but both SELECTs seem to use the same table.
> I am going to assume that you really have two different tables.
>
> Unless you have changed the default schema.xml, "id" should be defined
> as the uniqueKey for the document. You probably do not want to remove
> that, and even if you just remove the uniqueKey property, the field
> "id" remains defined as a required field.
>
> The issue is with your SELECT for the second entity: it renames "id"
> to titleid, and hence the required field "id" in schema.xml is missing.
> You will also need to ensure that the ids are unique in the two tables,
> else entries from the second entity will overwrite matching ids from
> the first.
>
> Also, do you have field definitions within the entities? Please share
> the complete schema.xml and the DIH configuration file with us, rather
> than snippets. Use pastebin.com if they are large.
>
> Regards,
> Gora
Re: data-import problem
I tried to deactivate the uniqueKey, but that made Solr not work at all. I got error 500 for everything (no admin page, etc.), so I had to reactivate it.

This is my current configuration, as you recommended. Unfortunately, still no improvement; the second table doesn't get recorded. I included the error message from the log file: http://pastebin.com/0vut38qL

Has no one ever successfully imported two tables into Solr before?

Am 06.06.2013 00:01, schrieb bbarani:
> A Solr index does not need a unique key, but almost all indexes use one.
> http://wiki.apache.org/solr/UniqueKey
>
> Try the below query, passing id as id instead of titleid. A proper
> dataimport config will look like this:
Re: Heap space problem with mlt query
I recently had the same issue, and it could be fixed very easily: add the property batchSize="-1" to your <dataSource> tag. Tell me if that helped.

Am 06.06.2013 11:30, schrieb Varsha Rani:
> Hi,
>
> As per the suggestions, I changed my config file: reduced the document
> cache size from 31067 to 16384 and autowarmCount from 2046 to 1024. My
> machine's RAM size is 16 GB; 1 GB of RAM is used once the index of
> 85 GB is started.
>
> I am running 20-25 mlt queries per second. With each mlt query the RAM
> used increases continuously. When the RAM used reaches 6 GB, the Java
> heap space problem occurs. With every 5 consecutive mlt queries the RAM
> used increases by 1 GB.
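For context, batchSize belongs on the JDBC data source definition in data-config.xml; with MySQL, a value of -1 makes the driver stream rows instead of buffering the whole result set in memory. The connection details below are placeholders:

  <dataConfig>
    <!-- batchSize="-1" streams rows from MySQL instead of loading them all at once -->
    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"
                user="solr_user" password="secret"
                batchSize="-1"/>
    <!-- document/entity definitions follow here -->
  </dataConfig>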
Re: data-import problem
It is surprising to me that all tables have to have a relationship in order to be used in Solr. What if I have two independent projects running on the same webserver? Would I really not be able to use Solr for both of them? That would be very disappointing...

Anyway, luckily there is an indirect relationship between the two tables, but it is an N-to-N relationship with a third table in between. The full join in MySQL would be something like this:

SELECT (cast.id??), title.id, title.title, name.id, name.name
FROM name, title, cast
WHERE title.id = cast.movie_id
  AND cast.person_id = name.id

But this will definitely lead to multiple entries of name.name and title.title because they are connected with an N-to-N relationship, so the resulting table would not have unique keys either: neither title.id nor name.id. There is another id available, cast.id, which could be used as a unique id, but it is a completely useless and irrelevant id with no connection/relation to anything else at all. So there is no real reason to include it, unless Solr really needs a unique id.

I am still a noob with Solr. Can you please help me adapt the given join to the XML syntax for my data-config.xml? That would be great!

Am 06.06.2013 17:58, schrieb bbarani:
> The below error clearly says that you have declared a unique id but
> that unique id is missing for some documents:
>
> org.apache.solr.common.SolrException: [doc=null] missing required field: nameid
>
> This is mainly because you are just trying to import 2 tables into a
> document without any relationship between the data of the 2 tables.
> Table 1 has the nameid (unique key), but table 2 has to be joined with
> table 1 to form a relationship between the 2 tables. You can't just
> dump the values, since table 2 might have more values than table 1 (but
> table 1 has the unique id).
>
> I am not sure of your table structure; I am assuming that there is a
> key (e.g. nameid in the title table) that can be used to join the name
> and title tables. Try something like this:
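bbarani's suggested snippet is not shown above. Purely as an illustration of how such a join can be expressed with nested DIH entities (table and column names taken from the SQL above, everything else assumed), it might look roughly like this; note that nested entities run one sub-query per parent row, so for millions of rows a single joined query or CachedSqlEntityProcessor is usually preferable:

  <entity name="title" query="SELECT id, title FROM title">
    <field column="id" name="titleid"/>
    <field column="title" name="title"/>
    <entity name="cast" query="SELECT person_id FROM cast WHERE movie_id='${title.id}'">
      <entity name="name" query="SELECT id, name FROM name WHERE id='${cast.person_id}'">
        <field column="name" name="name"/>
      </entity>
    </entity>
  </entity>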
Re: data-import problem
Unfortunately, my two tables do not share a unique key; they both have integer keys starting at 1. Is there any way to overcome this problem? Removing the uniqueKey property from my schema.xml leads to Solr not working at all (I have tried that already).

The link you provided shows what I had already tried before, which led to my current problem: when I set up my data-config as shown in that thread, my second table does not get recorded because of the missing field (name.id/nameid, the unique key) in my "title" table...

Am 06.06.2013 18:32, schrieb bbarani:
> You don't really need to have a relationship, but the unique id should
> be unique in a document. I had mentioned the relationship due to the
> fact that the unique key was present only in one table but not the
> other.
>
> Check out this link for more information on importing data from
> several unrelated tables:
> http://lucene.472066.n3.nabble.com/Create-index-on-few-unrelated-table-in-Solr-td4068054.html
Re: data-import problem
Perfect! This finally worked! Shawn, thank you a lot!

How do I set up multiple cores?

Again, thank you so much! I was looking for a solution for days!

Am 06.06.2013 19:23, schrieb Shawn Heisey:
> Change the id field to a StrField in your schema, and then use
> something like this:
>
> If these documents have no connection to each other at all, set up
> multiple cores so they are entirely separate indexes.
>
> Thanks,
> Shawn
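The snippet Shawn refers to is not preserved above. One common way to realize his description (a string id plus a per-table prefix) is DIH's TemplateTransformer; the sketch below reuses the table names from this thread but is otherwise only an assumption of what such a config could look like:

  <!-- ids become e.g. "name-123" and "title-123", so the two tables cannot collide -->
  <entity name="name" transformer="TemplateTransformer"
          query="SELECT id, name FROM name">
    <field column="id" template="name-${name.id}"/>
    <field column="name" name="name"/>
  </entity>
  <entity name="title" transformer="TemplateTransformer"
          query="SELECT id, title FROM title">
    <field column="id" template="title-${title.id}"/>
    <field column="title" name="title"/>
  </entity>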
Re: data-import problem
Think about movies and the cast of a movie. There are movies (title), which have their unique ids, and there are many people (name), like the producer, actors, etc., who have their unique ids. But there are people who have acted in more than one movie. That's why I have a third table which connects those two tables via name.id and title.id.

Anyway, I think my problem is satisfactorily solved for me. Do you think I did something wrong?

Am 06.06.2013 19:45, schrieb bbarani:
> Not sure if I understand your situation. I am not sure how you would
> relate the data between 2 tables if there is no relationship. You are
> trying to just dump random values from 2 tables into a document?
>
> Consider:
>
> Table 1:         Table 2:
> Name   id        Title      TitleId
> peter  1         CEO        111
> john   2         developer  222
> mike   3         Officer    333
>                  Cleaner    444
>                  IT         555
>
> Your document will look something like "1 peter CEO", but Peter is a
> cleaner and not a CEO.
Re: data-import problem
That's okay; for now, I guess it is fine. I was finally able to import all 6.6 million entries successfully. I am happy.

Am 06.06.2013 19:44, schrieb Shawn Heisey:
> Cores are defined in solr.xml - the default example core is named
> collection1. I am struggling to find documentation for multicore that
> is suitable for a novice. There is some information on this wiki page,
> but it is geared towards the use of the CoreAdmin API, not multiple
> cores themselves.
>
> http://wiki.apache.org/solr/CoreAdmin
>
> To access a specific core with query URLs, you don't use URLs like
> /solr/select that you might have seen in documentation; you use
> /solr/corename/select or /solr/corename/update instead.
>
> Thanks,
> Shawn
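To flesh out Shawn's pointer, a legacy-style solr.xml (as used in the Solr 3.x/4.x era) defining two cores might look roughly like this; the core names and paths are placeholders:

  <solr persistent="true">
    <cores adminPath="/admin/cores" defaultCoreName="collection1">
      <!-- each core has its own instanceDir (with conf/schema.xml and conf/solrconfig.xml) and dataDir -->
      <core name="collection1" instanceDir="collection1" dataDir="/var/lib/solr/collection1/data"/>
      <core name="project2" instanceDir="project2" dataDir="/var/lib/solr/project2/data"/>
    </cores>
  </solr>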