Tomcat has a HTTP Post bug?
As I read that Tomcat needs to be configured to support international characters in an HTTP GET, I decided to use an HTTP POST instead. Our Solr integration worked perfectly in test cases using Jetty; however, it turned out that an HTTP POST combined with a query containing international characters is not supported by Tomcat (I get 0 results back). So I now use HTTP GET for queries, with additional configuration in conf/server.xml. This, however, has the disadvantage that queries longer than 1000 characters do not always work. Is there perhaps another Tomcat configuration, required but not described in the Solr wiki, for using international characters in a POST query?
Re: Solr Tutorial Issue
Yousef Ourabi wrote: Use any text editor to open /etc/hosts. You'll probably have to either log in as root or use sudo since you probably won't have permissions. This is quickly drifting out of solr-land, so you might want to engage a more general Linux community such as linuxquestions.org. -Yousef

- Original Message - From: "Kirk Beers" <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Saturday, December 15, 2007 1:42:21 PM (GMT-0800) America/Los_Angeles Subject: Re: Solr Tutorial Issue

How would I add that?

Yousef Ourabi wrote: Try adding just 'kirk' to the end of the 2nd line so it looks like this: 127.0.1.1 kirk.nald.ca kirk You can also confirm that just 'kirk' is the hostname by running the 'hostname' command.

- Original Message - From: "Kirk Beers" <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Saturday, December 15, 2007 8:37:07 AM (GMT-0800) America/Los_Angeles Subject: Re: Solr Tutorial Issue

Here is the result:

[EMAIL PROTECTED]:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 kirk.nald.ca
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
64.5.219.172 instructors.gonssal.ca
[EMAIL PROTECTED]:~$

Yousef Ourabi wrote: Can you run "cat /etc/hosts" and paste the output in an email.

- Original Message - From: "Kirk Beers" <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Friday, December 14, 2007 12:46:13 PM (GMT-0800) America/Los_Angeles Subject: Re: Solr Tutorial Issue

Hi Hoss, I get an error that reads java.net.UnknownHostException kirk : kirk I will point out that I am new to Linux as well! Thanks Kirk

Chris Hostetter wrote: This thread in general is really confusing to me ... if you are following along with the tutorial then Tomcat should never enter the equation ... "java -jar start.jar" will use a copy of Jetty that is included in the Solr release to spin up a self-contained webserver totally independent of any application server you may already have installed. (That's the whole reason for the start.jar: you don't need to worry about whether you have a servlet container installed correctly.) When you run "java -jar start.jar" do you get logging output in your console? Is there anything in that logging output that looks like a stack trace? What gets added to the end of that logging output when you then hit the URL http://localhost:8983/solr/admin/ in your browser?

: I am running Ubuntu, Java 1.6 JDK and Tomcat 5.5. I cannot seem to get the
: tutorial to run. The instructions seem simple and clear.
:
: start.jar ran fine but when I used http://localhost:8983/solr/admin/
: nothing appeared. I also individually copied the
: apache-solr-nightly/dist/apache-solr-nightly.war and the
: apache-solr-nightly/example/webapps/solr.war to my tomcat webapps and still
: nothing!!

-Hoss

Thank-you Yousef, I am up and running :-)
Re: retrieve lucene "doc id"
On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote: > I have converted to using the Solr search interface and I am trying to > retrieve documents from a list of search results (where previously I had > used the doc id directly from the lucene query results) and the solr id > I have got currently indexed is unfortunately configured not to be unique! Ouch... I'd try to make a unique Id then! Or barring that, just try to make the query match exactly the docs you want back (don't do the 2-phase thing). > I do realise that lucene internal ids are transient, but for read-only > requests (that are not cached) it should be ok. But if you use 2 separate requests, an index can change versions between when you get the list of ids and when you request the documents for those ids. That's why this isn't safe as a general feature. Actually, it may be possible in the future... I had planned something like that for more internal use (distributed search). The first request returns the index version, and then subsequent requests specify the index version to ensure that the internal lucene ids remain unchanged. -Yonik
RE: MultiCore problem
> If you started with the example config, make small changes till it > stops working as expected. The problem was relying on assumptions about consistency instead of looking at what the real URL was, so I was using solr/select?core=core1 instead of solr/@core1/select, simply because the multicore admin works that way. Silly me. Best Regards, Martin Owens
Re: Tomcat has a HTTP Post bug?
On Dec 17, 2007 4:22 AM, Jörg Kiegeland <[EMAIL PROTECTED]> wrote: > As I read, that Tomcat would need to be configured to support > international characters in a HTTP Get, I determined to use a HTTP Post > instead. > Testing our Solr integration worked with testcases using Jetty > perfectly, however it turned out that HTTP Post in combination with a > query containing international characters is not supported by Tomcat (I > get 0 results back). GET can need extra server configuration because there is no way to specify a charset in the request (it's actually supposed to be UTF-8 percent encoded per the standards, but sometimes not configured that way for historical back compatibility I guess). When you use POST, you can and should specify the charset. If you are doing this, it should work. -Yonik
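For illustration, a minimal sketch of "specify the charset on the POST" using only java.net classes (the URL and the accented query value are placeholders; SolrJ users don't write this by hand, as discussed later in the thread):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class PostQuery {
    public static void main(String[] args) throws Exception {
        // Percent-encode the parameter values as UTF-8 and say so in the Content-Type,
        // so the servlet container has no excuse to fall back to its platform default.
        String body = "q=" + URLEncoder.encode("features:héllo", "UTF-8") + "&wt=xml";
        byte[] bytes = body.getBytes("US-ASCII"); // pure ASCII after percent-encoding

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8983/solr/select").openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        out.write(bytes);
        out.close();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}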
Re: Tomcat has a HTTP Post bug?
Did you set the Content-Type HTTP request header to application/x-www-form-urlencoded? Koji

Jörg Kiegeland wrote: As I read that Tomcat needs to be configured to support international characters in an HTTP GET, I decided to use an HTTP POST instead. Our Solr integration worked perfectly in test cases using Jetty; however, it turned out that an HTTP POST combined with a query containing international characters is not supported by Tomcat (I get 0 results back). So I now use HTTP GET for queries, with additional configuration in conf/server.xml. This, however, has the disadvantage that queries longer than 1000 characters do not always work. Is there perhaps another Tomcat configuration, required but not described in the Solr wiki, for using international characters in a POST query?
Multiple Solr Webapps
Hello, I've got this dumb problem. I've tried to browse the mailing list archive, but there are way too many messages (btw, is there a way to "fullsearch" the archives?)... I'm trying to deploy several Solr instances on my Linux server, following the Solr wiki instructions: I've created TWO context fragment files (solr1.xml, solr2.xml), each one pointing to a different Solr directory and the same solr.war, to have it working fine. I would prefer to have only one instance of solr.war, as specified in the Solr wiki ( http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac ). Does someone know if this is a known problem, and if there is a solution to it? Thank you very much, Pierre-Yves Landron
Re: Tomcat has a HTTP Post bug?
When you use POST, you can and should specify the charset. If you are doing this, it should work. Where can I do this? Do you have an example? I have a QueryRequest instance, a SolrQuery and a SolrServer instance, and I set the query via solrQuery.setQuery(query), where "query" is a String containing Japanese characters. Do I have to escape them somehow (Java strings are UTF-16 and not UTF-8 encoded, I guess)? On which instance shall I set the charset, and how?
Re: retrieve lucene "doc id"
Yonik Seeley wrote: On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote: I have converted to using the Solr search interface and I am trying to retrieve documents from a list of search results (where previously I had used the doc id directly from the lucene query results) and the solr id I have got currently indexed is unfortunately configured not to be unique! Ouch... I'd try to make a unique Id then! Or barring that, just try to make the query match exactly the docs you want back (don't do the 2-phase thing). In 1.3-dev, you can use UUIDField to have Solr generate a UUID for each doc. ryan
Re: Multiple Solr Webapps
Pierre-Yves LANDRON wrote: Hello, I've got this dumb problem. I've tried to browse the mailing list archive, but there are way too many messages (btw, is there a way to "fullsearch" the archives?)...

try: http://www.nabble.com/Solr-f14479.html

I'm trying to deploy several Solr instances on my Linux server, following the Solr wiki instructions: I've created TWO context fragment files (solr1.xml, solr2.xml), each one pointing to a different Solr directory and the same solr.war, to have it working fine. I would prefer to have only one instance of solr.war, as specified in the Solr wiki ( http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac ).

With Resin, I have one .war that is exploded for multiple web-apps. This works fine -- I have not tried it with Tomcat. ryan
Re: Tomcat has a HTTP Post bug?
On Dec 17, 2007 11:04 AM, Jörg Kiegeland <[EMAIL PROTECTED]> wrote: > > When you use POST, you can and should specify the charset. If you are > > doing this, it should work. > > > > Where can I do this? Have you any example? I have a QueryRequest > instance, a SolrQuery and a SolrServer instance > and set the query by solrQuery.setQuery(query) where "query" is a String > containing Japanese characters. Ah, sorry, I hadn't realized you were using SolrJ. It looks like SolrJ uses percent encoded UTF8 in the POST body for parameters, just as it does in the URL. Does anyone know if this double-encoding (percent encoding of UTF-8 bytes) is a standard for application/x-www-form-urlencoded? Is there any reason we shouldn't just use UTF8 directly and declare that in the Content-Type? $ nc -l -p 8983 POST /solr/select HTTP/1.1 User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0 Host: localhost:8983 Content-Length: 42 Content-Type: application/x-www-form-urlencoded q=features%3Ah%C3%A9llo&wt=xml&version=2.2 -Yonik
Re: Tomcat has a HTTP Post bug?
On Dec 17, 2007 11:04 AM, Jörg Kiegeland <[EMAIL PROTECTED]> wrote: > When you use POST, you can and should specify the charset. If you are > doing this, it should work. > Where can I do this? Have you any example? I have a QueryRequest instance, a SolrQuery and a SolrServer instance and set the query by solrQuery.setQuery(query) where "query" is a String containing Japanese characters.

Ah, sorry, I hadn't realized you were using SolrJ. It looks like SolrJ uses percent-encoded UTF-8 in the POST body for parameters, just as it does in the URL. Does anyone know if this double-encoding (percent encoding of UTF-8 bytes) is a standard for application/x-www-form-urlencoded?

I don't believe it is. I had to code up some support for handling form data sent with a PUT request, and the logic that I copied from (I believe) the Resin web server was:

1. Make sure the contentType was either unspecified or application/x-www-form-urlencoded.
2. Read the body as a byte array, then use the request charset to convert it to a String. If the request charset was unspecified, assume us-ascii. Typically the charset isn't specified, e.g. if you use the curl tool to POST data then no charset is sent with the Content-Type header.
3. Convert all key/value pairs using URLDecoder.decode(string, "UTF-8").

Is there any reason we shouldn't just use UTF-8 directly and declare that in the Content-Type?

Since the key/value pairs should be URL-encoded, I believe it's standard to assume us-ascii as the charset for the Content-Type. But UTF-8 would work as well, as us-ascii can be viewed as a subset of UTF-8. -- Ken

$ nc -l -p 8983
POST /solr/select HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8983
Content-Length: 42
Content-Type: application/x-www-form-urlencoded

q=features%3Ah%C3%A9llo&wt=xml&version=2.2

-Yonik

-- Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"
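A rough sketch of steps 2 and 3 above in plain Java (step 1, checking the Content-Type, is left to the caller; the class and method names are made up for illustration):

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

public class FormBodyParser {
    /** Decode an application/x-www-form-urlencoded body along the lines described above. */
    public static Map<String, String> parse(InputStream body, String requestCharset)
            throws Exception {
        // Step 2: read the raw bytes and convert with the request charset, defaulting to us-ascii.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = body.read(chunk)) != -1; ) {
            buf.write(chunk, 0, n);
        }
        String charset = (requestCharset != null) ? requestCharset : "US-ASCII";
        String raw = new String(buf.toByteArray(), charset);

        // Step 3: percent-decode each key/value pair as UTF-8.
        Map<String, String> params = new HashMap<String, String>();
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            String key = (eq < 0) ? pair : pair.substring(0, eq);
            String val = (eq < 0) ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, "UTF-8"), URLDecoder.decode(val, "UTF-8"));
        }
        return params;
    }
}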
Re: Tomcat has a HTTP Post bug?
It was perhaps not clear in my first mail, but with Jetty, HTTP POST works perfectly; it does not work with Tomcat, however. So the question is: does Jetty have a bug (it works but shouldn't), or does Tomcat have a bug (or does it conform to some standard under which it is not allowed to work)? And does anyone have a workaround I can use now so that SolrJ & HTTP POST work with Tomcat?
Need help setting up with Eclipse and Tomcat - looking for $consultant
Hi list, I was once a Java developer but have spent the past years working on the Drupal project (PHP). This means my Java skills have weakened a bit. In particular, I'm not sure how to go about getting the source code for solr into Eclipse and running it in a way that I can use the step through debugger. I also have a couple pretty normal configuration questions where I would really benefit from having someone to talk to for a little bit. I'm looking for someone who has a good grasp of these things who can offer a couple hours of paid telephone support this week. My boss graciously gave me a little budget to support this, so if you're interested in bringing a fellow OS developer up to speed, send me a mail with your hourly rate and some available times. I'm in Europe so my schedule jives best with N. America's morning/early afternoon. Thanks! Robert Douglass [EMAIL PROTECTED]
Recompiled Solr1.3~2007-12-13 Dies
Hello, I've just been rolling my highlighter changes into the 2007-12-13 build of Solr, but even though the whole thing compiles I'm getting the following odd error when I run a search:

SEVERE: java.lang.NoClassDefFoundError: org/apache/solr/search/ScorePriorityQueue
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:892)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:808)
at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:693)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:104)
at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:155)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:117)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:874)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:283)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:234)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

I had a search and I see this happened before with the PHP writer, but I'm using the standard writer. Thoughts? As always, thanks so much for your help. Best Regards, Martin Owens
Re: Tomcat has a HTTP Post bug?
It looks like SolrJ uses percent encoded UTF8 in the POST body for parameters, just as it does in the URL. Does anyone know if this double-encoding (percent encoding of UTF-8 bytes) is a standard for application/x-www-form-urlencoded? I don't believe it is. It is the way it is because it worked and then I moved on ;) char-set stuff has always felt a bit like voodoo to me. I think we should do whatever is most standard and likely to work on most servers with limited fuss. ryan
Re: Recompiled Solr1.3~2007-12-13 Dies
Try 'ant clean' first. On 17-Dec-07, at 10:34 AM, Owens, Martin wrote: Hello, I've just been rolling in my highlighter changes to the 2007-12-13 build of Solr, but even though the whole thing compiles I'm getting the following odd error when I run a search: SEVERE: java.lang.NoClassDefFoundError: org/apache/solr/search/ ScorePriorityQueue at org.apache.solr.search.SolrIndexSearcher.getDocListNC (SolrIndexSearcher.java:892) at org.apache.solr.search.SolrIndexSearcher.getDocListC (SolrIndexSearcher.java:808) at org.apache.solr.search.SolrIndexSearcher.getDocList (SolrIndexSearcher.java:693) at org.apache.solr.handler.component.QueryComponent.process (QueryComponent.java:104) at org.apache.solr.handler.SearchHandler.handleRequestBody (SearchHandler.java:155) at org.apache.solr.handler.RequestHandlerBase.handleRequest (RequestHandlerBase.java:117) at org.apache.solr.core.SolrCore.execute(SolrCore.java:874) at org.apache.solr.servlet.SolrDispatchFilter.execute (SolrDispatchFilter.java:283) at org.apache.solr.servlet.SolrDispatchFilter.doFilter (SolrDispatchFilter.java:234) at org.mortbay.jetty.servlet.ServletHandler $CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle (ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle (SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle (SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle (ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle (WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle (ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle (HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle (HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest (HttpConnection.java:502) at org.mortbay.jetty.HttpConnection $RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable (HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle (HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run (SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run (BoundedThreadPool.java:442) I had a search and I see this happened before with the PHP writer, but I'm using the standard writer. Thought? As always, thanks so much for your help. Best Regards, Martin Owens
Re: Tomcat has a HTTP Post bug?
On Dec 17, 2007 1:33 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote: > >> It looks like SolrJ uses percent encoded UTF8 in the POST body for > >> parameters, just as it does in the URL. > >> Does anyone know if this double-encoding (percent encoding of UTF-8 > >> bytes) is a standard for application/x-www-form-urlencoded? > > > > I don't believe it is. > > > > It is the way it is because it worked and then I moved on ;) char-set > stuff has always felt a bit like voodoo to me. > > I think we should do whatever is most standard and likely to work on > most servers with limited fuss.

It looks to me like HttpClient is doing the encoding... I quickly tried to change it by setting the header to include the charset, but the body turns out the same:

$ nc -l -p 8983
POST /solr/select HTTP/1.1
Content-type: application/x-www-form-urlencoded; charset=UTF-8
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8983
Content-Length: 42

q=features%3Ah%C3%A9llo&wt=xml&version=2.2

This charset declaration makes me feel uncomfortable though, as the body is *not* straight UTF-8, but uses the double-encoded URI standard from http://www.ietf.org/rfc/rfc2396.txt Unfortunately, I haven't been able to find any standard that relates the application/x-www-form-urlencoded mime type to unicode. In the absence of any special way to do it, it seems like it should just obey the normal charset rules for POST bodies... of course we need to work with what people have actually implemented. -Yonik
Multi-index searches
Hi, I am interested in using solr and I ran the tutorial but I was wondering if it supports multi-index searching ? Kirk
RE: retrieve lucene "doc id"
We are using MD5 to generate our IDs. MD5s are 128 bits, creating an effectively unique and highly randomized number for the content. Nobody has ever reported two different data sets that create the same MD5. We use the standard (some RFC) text representation of 32 hex characters. This has the advantage that F* pulls 1/16 of the total index, with a completely randomized distribution, F** 1/256, etc. This is very handy for data analysis and document extraction. MD5 creates 128 bits, but if your index is small enough that you are willing to risk it, you could pick 64 bits and park them in a Java long.

-Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, December 17, 2007 8:15 AM To: solr-user@lucene.apache.org Subject: Re: retrieve lucene "doc id"

Yonik Seeley wrote: > On Dec 17, 2007 1:40 AM, Ben Incani <[EMAIL PROTECTED]> wrote: >> I have converted to using the Solr search interface and I am trying >> to retrieve documents from a list of search results (where previously >> I had used the doc id directly from the lucene query results) and the >> solr id I have got currently indexed is unfortunately configured not to be unique! > > Ouch... I'd try to make a unique Id then! > Or barring that, just try to make the query match exactly the docs you > want back (don't do the 2-phase thing). > In 1.3-dev, you can use UUIDField to have Solr generate a UUID for each doc. ryan
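As a sketch of what that looks like in client code before indexing, using only the JDK (the input string here is a made-up document key; real usage would hash whatever uniquely describes the document):

import java.math.BigInteger;
import java.security.MessageDigest;

public class Md5Id {
    /** 32-hex-character MD5 of the document content, usable as a Solr uniqueKey value. */
    public static String md5Hex(String content) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(content.getBytes("UTF-8"));
        // BigInteger drops leading zeros, so pad back out to 32 hex characters.
        String hex = new BigInteger(1, digest).toString(16);
        while (hex.length() < 32) {
            hex = "0" + hex;
        }
        return hex;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("http://example.com/some/document"));
    }
}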
Re: Multi-index searches
Kirk Beers wrote: Hi, I am interested in using Solr and I ran the tutorial, but I was wondering if it supports multi-index searching? Kirk Allow me to clear that up! I would like to have the documents of 2 indices returned at once. Does Solr support that? Or will it only return the documents of one index at a time? Kirk
Re: Multi-index searches
Kirk Beers wrote: Kirk Beers wrote: Hi, I am interested in using Solr and I ran the tutorial, but I was wondering if it supports multi-index searching? Kirk Allow me to clear that up! I would like to have the documents of 2 indices returned at once. Does Solr support that? Or will it only return the documents of one index at a time? one index at a time... ryan
Re: Tomcat has a HTTP Post bug?
It was perhaps not clear in my first mail, but with Jetty, HTTP POST works perfectly; it does not work with Tomcat, however. So the question is: does Jetty have a bug (it works but shouldn't), or does Tomcat have a bug (or does it conform to some standard under which it is not allowed to work)? And does anyone have a workaround I can use now so that SolrJ & HTTP POST work with Tomcat?

I would first try setting the charset in the POST explicitly to UTF-8. This shouldn't matter, but it appears that Resin & Tomcat have both had issues with not properly handling form data without it. It shouldn't matter either, but there's also the issue of configuring Tomcat to handle UTF-8 encoded URLs - http://wiki.apache.org/solr/SolrTomcat Also see https://issues.apache.org/jira/browse/SOLR-214, where a patch was applied on May 7th of this year to work around what appears to be a bug with Tomcat. I assume you have a version of Solr with this patch. -- Ken -- Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"
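For reference, the UTF-8 URL handling mentioned above is, per the SolrTomcat wiki page, an attribute on the HTTP connector in conf/server.xml -- roughly like the following, keeping whatever other attributes the connector already has (the port is just an example):

<!-- conf/server.xml: have Tomcat decode percent-encoded GET URLs as UTF-8 -->
<Connector port="8080" URIEncoding="UTF-8" />

Note this only affects how Tomcat decodes GET query strings; the charset of a POST body is still taken from the request's Content-Type header.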
Re: Facets - What's a better term for non technical people?
I don't think you have to give the user a label other than the name of the facet field. The beauty of facets is that they are pretty intuitive. Manufacturer Microsoft (140) Logitech Inc. (128) Belkin (127) Rosewill (124) APEVIA (Aspire) (119) STARTECH (97) That said, I've seen them called: Parametric Tag Names Facet (200) Parameter (122) Tag (100) Advanced Selection (20) Select (15) Navigate (13) Filter (2) Bucket (1) Enumeration (1) Category (1) Topic (1) Regards, George On Dec 11, 2007, at 11:16 PM, Otis Gospodnetic wrote: Isn't that GROUP BY ColumnX, count(1) type of thing? I'd think "group by" would be a good label. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: "Norskog, Lance" <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 9:38:37 PM Subject: RE: Facets - What's a better term for non technical people? In SQL terms they are: 'select unique'. Except on only one field. -Original Message- From: Charles Hornberger [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 11, 2007 9:49 AM To: solr-user@lucene.apache.org Subject: Re: Facets - What's a better term for non technical people? FAST calls them "navigators" (which I think is a terrible term - YMMV of course :-)) I tend to think that "filters" -- or perhaps "dynamic filters" -- captures the essential function. On Dec 11, 2007 2:38 AM, "DAVIGNON Andre - CETE NP/DIODé/PANDOC" <[EMAIL PROTECTED]> wrote: Hi, So, has anyone got a good example of the language they might use over, say, a set of radio buttons and fields on a web form, to indicate that selecting one or more of these would return facets. 'Show grouping by' or 'List the sets that the results fall into' or something similar. Here's what i found some time : http://www.searchtools.com/info/faceted-metadata.html It has been quite useful to me. André Davignon
RE: Replication hooks - changing the index while the slave is running ...
It works via two Unix file system tricks. 1) Files are not directly bound to filenames; instead there is a layer of indirection called an 'inode', so multiple file and directory names can point to the same physical file. The "." and ".." directory entries are implemented this way. 2) Physical files remain bound to all open file descriptors even after there are no file names for the files. So, file data exists until all file names are gone AND all open file descriptors are gone. Lance

-Original Message- From: Tracy Flynn [mailto:[EMAIL PROTECTED] Sent: Saturday, December 15, 2007 7:36 AM To: solr-user@lucene.apache.org Subject: Re: Replication hooks - changing the index while the slave is running ...

That helps. Thanks for the prompt reply.

On Dec 15, 2007, at 10:15 AM, Yonik Seeley wrote: > On Dec 14, 2007 7:36 PM, Tracy Flynn > <[EMAIL PROTECTED]> wrote: >> 1) The existing index(es) being used by the Solr slave instance are >> physically deleted >> 2) The new index snapshots are renamed/moved from their temporary >> installation location to the default index location >> 3) The slave is sent a 'commit' to force a new IndexReader to start >> to read the new index. >> >> What happens to search requests against the existing/old index during >> step 1) and between steps 1 and 2? > > Search requests will still work on the old searcher/index. > >> Where do they get information if >> they need to go to disk for results that are not cached? Do they a) >> hang b) produce no results c) error in some other way? > > A lucene IndexReader keeps all the files open that aren't loaded into > memory... and external deletion has no effect on the ability to keep > reading these open files (they aren't really deleted yet). > > -Yonik
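A toy illustration of the second trick, using only the JDK (this is not Solr code; on a Unix filesystem the delete succeeds while the stream stays readable, whereas on Windows the delete would typically fail instead):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class DeleteWhileOpen {
    public static void main(String[] args) throws Exception {
        File f = new File("segment.tmp");                  // stand-in for an index file
        FileOutputStream out = new FileOutputStream(f);
        out.write("old index data".getBytes("UTF-8"));
        out.close();

        FileInputStream in = new FileInputStream(f);       // the "IndexReader" holds it open
        System.out.println("deleted: " + f.delete());      // unlink the name

        byte[] buf = new byte[64];
        int n = in.read(buf);                              // still readable via the open descriptor
        System.out.println(new String(buf, 0, n, "UTF-8"));
        in.close();                                        // only now are the blocks reclaimed
    }
}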
Issues with postOptimize
I've set up solrconfig.xml to create a snapshot of an index after doing an optimize, but the snapshot cannot be created because of permission issues. I've set permissions on the bin, data and log directories to read/write/execute for all users. Even with these settings I cannot seem to be able to run snapshooter on the postOptimize event. Any ideas? Could it be a Java permissions issue? Thanks. Sunny

Config settings (the postOptimize RunExecutableListener exe, dir and wait values): snapshooter /search/replication_test/0/index/solr/bin true

Error:

Dec 17, 2007 7:45:19 AM org.apache.solr.core.RunExecutableListener exec
FINE: About to exec snapshooter
Dec 17, 2007 7:45:19 AM org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: Cannot run program "snapshooter" (in directory "/search/replication_test/0/index/solr/bin"): java.io.IOException: error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
at java.lang.Runtime.exec(Runtime.java:593)
at org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:70)
at org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:97)
at org.apache.solr.update.UpdateHandler.callPostOptimizeCallbacks(UpdateHandler.java:105)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:516)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:214)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 23 more
Re: Issues with postOptimize
Make sure that the user running Solr has permission to execute snapshooter. Also, try ./snapshooter instead of snapshooter. Good luck. On Dec 18, 2007 10:57 AM, Sunny Bassan <[EMAIL PROTECTED]> wrote: > I've set up solrconfig.xml to create a snap shot of an index after doing > a optimize, but the snap shot cannot be created because of permission > issues. I've set permissions to the bin, data and log directories to > read/write/execute for all users. Even with these settings I cannot seem > to be able to run snapshooter on the postOptimize event. Any ideas? > Could it be a java permissions issue? Thanks. > > Sunny > > Config settings: > > > snapshooter > /search/replication_test/0/index/solr/bin > true > > > Error: > > Dec 17, 2007 7:45:19 AM org.apache.solr.core.RunExecutableListener exec > FINE: About to exec snapshooter > Dec 17, 2007 7:45:19 AM org.apache.solr.core.SolrException log > SEVERE: java.io.IOException: Cannot run program "snapshooter" (in > directory "/search/replication_test/0/index/solr/bin"): > java.io.IOException: error=13, Permission denied > at java.lang.ProcessBuilder.start(ProcessBuilder.java:459) > at java.lang.Runtime.exec(Runtime.java:593) > at > org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.ja > va:70) > at > org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListe > ner.java:97) > at > org.apache.solr.update.UpdateHandler.callPostOptimizeCallbacks(UpdateHan > dler.java:105) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2. > java:516) > at > org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestH > andler.java:214) > at > org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpd > ateRequestHandler.java:84) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerB > ase.java:77) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:658) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.ja > va:191) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.j > ava:159) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applica > tionFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilt > erChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValv > e.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValv > e.java:175) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java > :128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java > :102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. > java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:2 > 63) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:84 > 4) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process( > Http11Protocol.java:584) > at > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.IOException: java.io.IOException: error=13, > Permission denied > at java.lang.UNIXProcess.(UNIXProcess.java:148) > at java.lang.ProcessImpl.start(ProcessImpl.java:65) > at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) > ... 23 more > > > > -- Regards, Cuong Hoang
RE: Issues with postOptimize
Also, the script itself has to be execute mode. Lance -Original Message- From: climbingrose [mailto:[EMAIL PROTECTED] Sent: Monday, December 17, 2007 4:38 PM To: solr-user@lucene.apache.org Subject: Re: Issues with postOptimize Make sure that the user running Solr has permission to execute snapshooter. Also, try ./snapshooter instead of snapshooter. Good luck. On Dec 18, 2007 10:57 AM, Sunny Bassan <[EMAIL PROTECTED]> wrote: > I've set up solrconfig.xml to create a snap shot of an index after > doing a optimize, but the snap shot cannot be created because of > permission issues. I've set permissions to the bin, data and log > directories to read/write/execute for all users. Even with these > settings I cannot seem to be able to run snapshooter on the postOptimize event. Any ideas? > Could it be a java permissions issue? Thanks. > > Sunny > > Config settings: > > > snapshooter > /search/replication_test/0/index/solr/bin > true > > > Error: > > Dec 17, 2007 7:45:19 AM org.apache.solr.core.RunExecutableListener > exec > FINE: About to exec snapshooter > Dec 17, 2007 7:45:19 AM org.apache.solr.core.SolrException log > SEVERE: java.io.IOException: Cannot run program "snapshooter" (in > directory "/search/replication_test/0/index/solr/bin"): > java.io.IOException: error=13, Permission denied at > java.lang.ProcessBuilder.start(ProcessBuilder.java:459) > at java.lang.Runtime.exec(Runtime.java:593) > at > org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener. > ja > va:70) > at > org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableLis > te > ner.java:97) > at > org.apache.solr.update.UpdateHandler.callPostOptimizeCallbacks(UpdateH > an > dler.java:105) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2. > java:516) > at > org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateReques > tH > andler.java:214) > at > org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlU > pd > ateRequestHandler.java:84) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle > rB > ase.java:77) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:658) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter. > ja > va:191) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter > .j > ava:159) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Appli > ca > tionFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFi > lt > erChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperVa > lv > e.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextVa > lv > e.java:175) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.ja > va > :128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.ja > va > :102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve. 
> java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java > :2 > 63) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java: > 84 > 4) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.proces > s( > Http11Protocol.java:584) > at > org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447 > ) at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.IOException: java.io.IOException: error=13, > Permission denied at > java.lang.UNIXProcess.(UNIXProcess.java:148) > at java.lang.ProcessImpl.start(ProcessImpl.java:65) > at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) > ... 23 more > > > > -- Regards, Cuong Hoang
Re: does solr handle hierarchical facets?
On Dec 13, 2007, at 1:56 AM, Chris Hostetter wrote:

ie, if this is your hierarchy...

Products/
Products/Computers/
Products/Computers/Laptops
Products/Computers/Desktops
Products/Cases
Products/Cases/Laptops
Products/Cases/CellPhones

Then this trick won't work (because Laptops appears twice) but if you have numeric IDs that correspond with each of those categories (so that the two instances of Laptops are unique...

1/
1/2/
1/2/3
1/2/4
1/5/
1/5/6
1/5/7

Why not just use the whole path as the unique identifying token for a given node on the hierarchy? That way, you don't need to map nodes to unique numbers, just use a prefix query. taxonomy:Products/Computers/Laptops* or taxonomy:Products/Cases/Laptops* Sorry - that may be bogus query syntax, but you get the idea. Products/Computers/Laptops* and Products/Cases/Laptops* are two unique identifiers. You just need to make sure they are tokenized properly - which is beyond my current off-the-cuff expertise. At least that is the way I've been doing it with IDOL lately. I dearly hope I can do the same in Solr when the time comes. I have a whole mess of Java code which parses out arbitrary path-separated values into real tree structures. I think it would be a useful addition to Solr, or maybe Solrj. It's been knocking around my hard drives for the better part of a decade. If I get enough interest, I'll clean it up and figure out how to offer it up as a part of the code base. I'm pretty naive when it comes to FLOSS, so any authoritative non-condescending hints on how to go about this would be greatly appreciated. Regards, George
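To make the prefix-query idea concrete, a small SolrJ sketch (the "taxonomy" field name is hypothetical and assumed to be an untokenized string field, so each full path is stored as a single indexed term):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CategoryDrillDown {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        // Restrict results to everything under Products/Cases/Laptops; the trailing *
        // makes this a prefix query over the full-path terms in the taxonomy field.
        query.addFilterQuery("taxonomy:Products/Cases/Laptops*");

        QueryResponse rsp = server.query(query);
        System.out.println("matches: " + rsp.getResults().getNumFound());
    }
}

Slashes don't need escaping with the query parser of this era, but any spaces inside category names would need to be escaped (or avoided) for the path to stay a single query term.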
Re: does solr handle hierarchical facets?
This approach works (I do a similar thing using solr), but you have to be careful as BooleanQuery.TooManyClauses exception can be thrown depending where you use the wild card. It should be fine in the case you described however. Anyway, there is a pretty interesting discussion about this here: http://www.usit.uio.no/it/vortex/arbeidsomrader/metadata/lucene/ limitations.html Brendan On Dec 17, 2007, at 10:39 PM, George Everitt wrote: On Dec 13, 2007, at 1:56 AM, Chris Hostetter wrote: ie, if this is your hierarchy... Products/ Products/Computers/ Products/Computers/Laptops Products/Computers/Desktops Products/Cases Products/Cases/Laptops Products/Cases/CellPhones Then this trick won't work (because Laptops appears twice) but if you have numeric IDs that corrispond with each of those categories (so that the two instances of Laptops are unique... 1/ 1/2/ 1/2/3 1/2/4 1/5/ 1/5/6 1/5/7 Why not just use the whole path as the unique identifying token for a given node on the hierarchy? That way, you don't need to map nodes to unique numbers, just use a prefix query. taxonomy:Products/Computers/Laptops* or taxonomy:Products/Cases/ Laptops* Sorry - that may be bogus query syntax, but you get the idea. Products/Computers/Laptops* and Products/Cases/Laptops* are two unique identifiers. You just need to make sure they are tokenized properly - which is beyond my current off-the-cuff expertise. At least that is the way I've been doing it with IDOL lately. I dearly hope I can do the same in Solr when the time comes. I have a whole mess of Java code which parses out arbitrary path separated values into real tree structures. I think it would be a useful addition to Solr, or maybe Solrj. It's been knocking around my hard drives for the better part of a decade. If I get enough interest, I'll clean it up and figure out how to offer it up as a part of the code base. I'm pretty naive when it comes to FLOSS, so any authoritative non-condescending hints on how to go about this would be greatly appreciated. Regards, George
RE: Solr replication
Hi, I understand that the Rsync is a Unix/Linux daemon thread which needs to be enable/run to achieve Solr Collection Distribution. Do we have any similar support for the Solr Collection Distribution in the Windows environment or Do we need to write equivalent commands (in the form of batch files) which will do the same steps as the shell scripts placed under solr/bin folder. Thanks in advance. Regards, Dilip. -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 18, 2007 4:00 AM To: [EMAIL PROTECTED] Subject: Re: Solr replication Rsync is a Unix/Linux command. I dont' know if that's available on Windows. All the distribution scripts were developed and tested under Unix/Linux. They may or may not work on Windows. I don't know much about Windows so if you are running on Windows that I am the wrong person to be asking help. You may want to use the mailing list to see if anyone is doing collection distribution on Windows. Solr is accessed through HTTP so you just need to use HTTP (for example, IE) on a Windows system to access a Solr server. Bill On Dec 17, 2007 8:53 AM, Dilip.TS < [EMAIL PROTECTED]> wrote: Hi Bill, I have a basic question (as im not an expert in unix). I understand that the rsync is a deamon thread (similar to services in Windows). Im not clear about what are the things/steps required to set up this rysncd deamon thread? (Dont mind asking this question againg since im not very much clear about this) Does it mean that the SOLR servers(both master and slave) should be made running on a unix/linux machine only? How does a client (using Windows environment) able to access the SOLR Server running on Unix/Platform? Any links/references would be of great help. Thanks in advance. Regards Dilip -Original Message- From: Bill Au [mailto: [EMAIL PROTECTED] Sent: Saturday, December 15, 2007 1:08 AM To: solr-user@lucene.apache.org; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: Solr replication On Dec 14, 2007 7:00 AM, Dilip.TS <[EMAIL PROTECTED]> wrote: > Hi, > I have the following requirement for SOLR Collection Distribution using > Embedded Solr with the Jetty server: > > I have different data folders for multiple instances of SOLR within the > Same > application. > Im using the same SOLR_HOME with a single bin and conf folder. > > My query is: > 1)Is is possible to have the same SOLR_HOME for multiple solr instances > and > still be able to > achieve Solr Distribution? > (As i understand that we need to have differnet rsync port for different > solr instances) Yes, solr distribution will work for multiple solr instances even if they all use the same SOLR_HOME. All the distribution scripts have a command line argument for specifying the data directory. > > 2)Can i get some more information about how to start this rsyncd daemon > and > which is the best way of doing it i.e. to start during system reboot or > doing it manually? Please note that the rsyncd -CollectionDistributionScripts#head-1e6cdce516ecf1eb31bffceaccf2abeb72bd ce81 So it is best to configure the master server to run the rsyncd-start script at system boot time. If the rsync daemon has for some reasons been disabled, it will not be started automatically at system reboot even if it is configured to do so. If rsyncd is started manually, then one will have to remember to start it every time the master server is rebooted. > > 3)Let me know if my understanding is correct. We require 1 Master Server > and > a minimum of 1 slave server. 
> The master server and the slave server cannot be running on the same > machine. Am i right? > > In the case of the SOLR Distribution, if the SOLR server acts as the > Master server > then how about this slave server ? Is it the Application server which > calls the Master SOLR Server > acts as slave server? Both the master and slave are SOLR servers. Typically they are on different machines. It doesn't make sense (at least not to me) to have both of them on the same machine. > > 4)I observe the file scripts.conf for master server: >solr_port=8983 >rsyncd_port=18983 > >+Enable and start rsync: > rsyncd-enable; rsyncd-start >+Run snapshooter: > snapshooter > >Just to confirm is it mandatory that the solr master server should have > the solr_port as 8983 only? It does not to be 8983. That's just an example. > > > 5) How do we enable and start rsync? The link to > SolrCollectionDistributionScripts mentions about > installing rsyncd daemon either during system boot time or by manually. >
Re: retrieve lucene "doc id"
On Mon, 17 Dec 2007 14:43:55 -0500 "Norskog, Lance" <[EMAIL PROTECTED]> wrote: > We are using MD5 to generate our IDs. MD5s are 128 bits creating a very > unique and very randomized number for the content. Nobody has ever > reported two different data sets that create the same MD5. yup, we use 2 Md5 concatenated . the first part is the MD5 of a group name,the 2nd part is related to the item in the group (the same item can be in different groups, so this 2nd part can also be repeated ) - of course, only 1 item can exist in each group, so it is always unique. > > We use the standard (some RFC) text representation of 32 hex characters. > This has the advantage that F* pulls 1/16 of the total index, with a > completely randomized distribution, F** 1/256, etc. This is very handy > for data analysis and document extraction. yup, and in our case, the first half of the docId could be used to get all items in a group. But your example is a good one - I haven't used it for that yet, but it's a simple and practical use of the doc id :) cheers, B _ {Beto|Norberto|Numard} Meijome "I was born not knowing and have had only a little time to change that here and there." Richard Feynman I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Meaning of "max clauses 1024" error
Hello, I'm curious as to the meaning of a certain exception I am receiving. If I try a query such as "*", I get an exception which basically says there is a maximum of 1024 clauses for a BooleanQuery. However, if I enter "*:*" it matches all documents and returns them. Can someone explain to me what happens when the query is "*" versus a query for "*:*"? Thanks, David