Re: Delete from Solr index...
escher2k wrote:
> Thanks Ryan. I need to use a query since I am deleting a range of documents. From your
> comment, I wasn't sure whether one still needs to do an explicit commit when using delete
> by query. Does delete by query not need an explicit commit?

Delete by query causes a commit *before* it executes... I think you still need one after. From the javadoc:
http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/update/DirectUpdateHandler2.java

  "deleteByQuery causes a commit to happen (close current index writer, open new index
  reader) before it can be processed. If deleteByQuery functionality is needed, it's best
  if they can be batched and executed together so they may share the same index reader."

I don't quite know what "batched" means since it only reads one command...

Thanks.

ryan mckinley wrote:
> escher2k wrote:
>> I am trying to remove documents from my index using "delete by query". However, when I
>> did this, the deleted items seem to remain. This is the format of the XML file I am using:
>>
>>   <delete>
>>     <query>load_id:20070424150841</query>
>>     <query>load_id:20070425145301</query>
>>     <query>load_id:20070426145301</query>
>>     <query>load_id:20070427145302</query>
>>     <query>load_id:20070428145301</query>
>>     <query>load_id:20070429145301</query>
>>   </delete>
>>
>> When I do the deletes individually, it seems to work (i.e. create each of the above in a
>> separate file). Does this mean that each delete query request has to be executed separately?
>
> Correct, <delete> (unlike <add>) only accepts one command. Just to note, if "load_id" is
> your unique key, you could also use:
>
>   <delete><id>20070424150841</id></delete>
>
> This will give you better performance and does not commit the changes until you
> explicitly send <commit/>.
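For reference, a minimal sketch of the two delete forms discussed in this thread, each posted to Solr's update handler as its own request and followed by the explicit commit that makes the deletes visible (the id, field name, and values are taken from the example above):

  <delete><id>20070424150841</id></delete>

  <delete><query>load_id:(20070424150841 OR 20070425145301)</query></delete>

  <commit/>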
Re: Delete from Solr index...
If you want to do this as a single delete-by-query, you could OR all the clauses together:

  <delete><query>load_id:(20070424150841 OR 20070425145301 ...)</query></delete>

	Erik

On May 1, 2007, at 2:14 AM, Ryan McKinley wrote:
> escher2k wrote:
>> I am trying to remove documents from my index using "delete by query". However, when I
>> did this, the deleted items seem to remain. This is the format of the XML file I am using:
>>
>>   <delete>
>>     <query>load_id:20070424150841</query>
>>     <query>load_id:20070425145301</query>
>>     <query>load_id:20070426145301</query>
>>     <query>load_id:20070427145302</query>
>>     <query>load_id:20070428145301</query>
>>     <query>load_id:20070429145301</query>
>>   </delete>
>>
>> When I do the deletes individually, it seems to work (i.e. create each of the above in a
>> separate file). Does this mean that each delete query request has to be executed separately?
>
> Correct, <delete> (unlike <add>) only accepts one command. Just to note, if "load_id" is
> your unique key, you could also use:
>
>   <delete><id>20070424150841</id></delete>
>
> This will give you better performance and does not commit the changes until you
> explicitly send <commit/>.
RE: Faceted count syntax (exclude zeros)...
There is a bug related to "facet.mincount" in the incubating version.
http://www.mail-archive.com/solr-user@lucene.apache.org/msg03269.html

-Yao

-----Original Message-----
From: escher2k [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 01, 2007 2:00 AM
To: solr-user@lucene.apache.org
Subject: Faceted count syntax (exclude zeros)...

I am trying to execute a faceted count on a field called "load_id" and want to exclude 0s. The URL below doesn't seem to be excluding zeros.

http://localhost:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=load_id&facet=true&facet.limit=-1&facet.field=load_id&facet.mincount=1&rows=0

Result (relevant part of the XML, facet counts only): 0 0 80 81 77 62 31061

Thanks.
Re: i wanna find one crawl that can crawl with defined urls and defined data
2007/4/30, Graeme Merrall <[EMAIL PROTECTED]>:
> > i wanna crawl http://www.amazone.com/ and just wanna product title,
> > product information, writer, publisher.
> > and other data i wanna ignore.
>
> How about
> http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

i read it before this mail. for example, i wanna crawl http://www.amazone.com/ and just wanna product title, product information, writer, publisher. and other data i wanna ignore.

> or if you're prepared to wait or help out there's
> http://svn.apache.org/repos/asf/labs/droids/README.TXT

--
regards jl
NullPointerException (not schema related)
Hello,

I'm evaluating Solr for potential use in an application I'm working on, and it sounds like a really great fit. I'm having trouble getting the Collection Distribution part set up, though.

Initially, I had problems setting up the postCommit listener. I first used this xml to configure the listener:

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">/usr/local/Production/solr/solr/bin/</str>
    <bool name="wait">true</bool>
  </listener>

This is what came in the solrconfig.xml file with just a minor tweak to the directory. However, when I committed data to the index, I was getting "No such file or directory" errors from the Runtime.exec call. I verified all of the permissions, etc., with the user I was trying to use. In the end, I wrote up a little test program to see if it was a problem with the Runtime.exec call, and I think it is. I'm running this on CentOS 4.4, and Runtime.exec seems to have a hard time directly executing bash scripts. For example, if I called Runtime.exec with a command of "test_program" (which is a bash script), it failed. If I called Runtime.exec with a command of "/bin/bash test_program", it worked.

So, with this knowledge in hand, I modified the solrconfig.xml file again to this:

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">/bin/bash</str>
    <str name="dir">/usr/local/Production/solr/solr/bin/</str>
    <bool name="wait">true</bool>
    <str name="args">snapshooter</str>
  </listener>

When I commit data now, however, I get a NullPointerException. I'm including the stack trace here:

SEVERE: java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
        at java.lang.Thread.run(Thread.java:619)

I know this has something to do with my config change (the problem goes away if I turn off the postCommit listener), but I don't know what! BTW, I'm using solr-1.1.0-incubating.

Thanks in advance for any help!

Charlie
Re: Faceted count syntax (exclude zeros)...
: to exclude 0s. The URL below
: doesn't seem to be excluding zeros.
: http://localhost:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=load_id&facet=true&facet.limit=-1&facet.field=load_id&facet.mincount=1&rows=0

Which version of Solr are you using? facet.mincount was added after Solr 1.1, but you can use "facet.zeros=false" to get the desired results.

-Hoss
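For example, on Solr 1.1 the same request would look roughly like this, a sketch that keeps every other parameter from the URL above and swaps facet.mincount=1 for facet.zeros=false:

  http://localhost:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=load_id&facet=true&facet.limit=-1&facet.field=load_id&facet.zeros=false&rows=0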
Re: Specifying no-ops...
: I want to capture information about the user who is executing a particular
: search. Is there a way to specify in Solr that certain fields should just be
: treated as pass through and not processed? This way I can use arbitrary
: params to do better logging.

Fields are different from query params ... it sounds like you are asking about query params (which will be in the URL and recorded in your appserver logs). Any param Solr doesn't know about is already ignored...

http://localhost:8983/solr/select/?q=ipod&some_random_param_that_is_ignored=hoss+is+being_ignored

-Hoss
RE: NullPointerException (not schema related)
Nevermind this... looks like my problem was tagging the "args" as an <str> node instead of an <arr> node.

Thanks anyway!

Charlie

-----Original Message-----
From: Charlie Jackson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 01, 2007 12:02 PM
To: solr-user@lucene.apache.org
Subject: NullPointerException (not schema related)

Hello,

I'm evaluating Solr for potential use in an application I'm working on, and it sounds like a really great fit. I'm having trouble getting the Collection Distribution part set up, though.

Initially, I had problems setting up the postCommit listener. I first used this xml to configure the listener:

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">/usr/local/Production/solr/solr/bin/</str>
    <bool name="wait">true</bool>
  </listener>

This is what came in the solrconfig.xml file with just a minor tweak to the directory. However, when I committed data to the index, I was getting "No such file or directory" errors from the Runtime.exec call. I verified all of the permissions, etc., with the user I was trying to use. In the end, I wrote up a little test program to see if it was a problem with the Runtime.exec call, and I think it is. I'm running this on CentOS 4.4, and Runtime.exec seems to have a hard time directly executing bash scripts. For example, if I called Runtime.exec with a command of "test_program" (which is a bash script), it failed. If I called Runtime.exec with a command of "/bin/bash test_program", it worked.

So, with this knowledge in hand, I modified the solrconfig.xml file again to this:

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">/bin/bash</str>
    <str name="dir">/usr/local/Production/solr/solr/bin/</str>
    <bool name="wait">true</bool>
    <str name="args">snapshooter</str>
  </listener>

When I commit data now, however, I get a NullPointerException. I'm including the stack trace here:

SEVERE: java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
        at java.lang.Thread.run(Thread.java:619)

I know this has something to do with my config change (the problem goes away if I turn off the postCommit listener), but I don't know what! BTW, I'm using solr-1.1.0-incubating.

Thanks in advance for any help!

Charlie
RE: NullPointerException (not schema related)
: <listener event="postCommit" class="solr.RunExecutableListener">
:   <str name="exe">snapshooter</str>
:   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
:   <bool name="wait">true</bool>
: </listener>
:
: the directory. However, when I committed data to the index, I was
: getting "No such file or directory" errors from the Runtime.exec call. I
: verified all of the permissions, etc, with the user I was trying to use.
: In the end, I wrote up a little test program to see if it was a problem
: with the Runtime.exec call and I think it is. I'm running this on CentOS
: 4.4 and Runtime.exec seems to have a hard time directly executing bash
: scripts. For example, if I called Runtime.exec with a command of
: "test_program" (which is a bash script), it failed. If I called
: Runtime.exec with a command of "/bin/bash test_program" it worked.

This initial problem you were having may be a result of path issues. "dir" doesn't need to be the directory where your script lives; it's the directory where you want your script to run (the "cwd" of the process). It's possible that the error you were getting was because "." isn't in the PATH that was being used. You should try something like this...

  <str name="exe">/usr/local/Production/solr/solr/bin/snapshooter</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>

...or maybe even...

  <str name="exe">./snapshooter</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>

-Hoss
RE: Unicode characters
Thanks a lot for the time you spent understanding my problem and checking for a solution in Neko! It helps a lot.

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Friday, April 27, 2007 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Unicode characters

: - fetch a web page
: - decode entities and unicode characters (such as &#149;) using the Neko library
: - get a unicode String in Java
: - send it to SOLR through XML created by SAX, with the right encoding (UTF-8) specified everywhere (writer, header, etc...)
: - it apparently arrives clean on the SOLR side (verified in our logs).
: - In the query output from SOLR (XML message), the character is not encoded as an entity (not &#149;) but the character itself is used (character 149 = 95 hexadecimal).

Just because someone uses an html entity to display a character in a web page doesn't mean it needs to be "escaped" in XML ... I think that in theory we could use numeric entities to escape *every* character, but that would make the XML responses a lot bigger ... so in general Solr only escapes the characters that need to be escaped to have a valid UTF-8 XML response.

You may also be having some additional problems since 149 (hex 95) is not a printable UTF-8 character, it's a control character (MESSAGE_WAITING) ... it sounds like you're dealing with HTML where people were using the numeric value from the "Windows-1252" charset. You may want to modify your parsing code to do some mappings between "control" characters that you know aren't meant to be control characters before you ever send them to Solr.

A quick search for "Neko windows-1252" indicates that enough people have had problems with this that it is a built-in feature...

http://people.apache.org/~andyc/neko/doc/html/settings.html

  "http://cyberneko.org/html/features/scanner/fix-mswindows-refs
  Specifies whether to fix character entity references for Microsoft Windows
  characters as described at http://www.cs.tut.fi/~jkorpela/www/windows-chars.html"

(I've run into this a number of times over the years when dealing with content created by Windows users, as you can see from my one and only thread on "JavaJunkies" ... http://www.javajunkies.org/index.pl?node_id=3436)

-Hoss
Re: Specifying no-ops...
When we use Solr in a JavaScript / Ajax.Request context, we often want to "tag" requests with the user id, item number, or something else that will not normally appear in the Solr results, because in an asynchronous request handler you won't know who or what the query is about.

To do this, we make sure all of our request handlers in solrconfig.xml have "echoParams = explicit" set. Then you can do

  select?q=dogs&userid=XR30010&itemid=TR30120

and Solr will not complain about those extra params and will also echo them back in the response XML/JSON, which your client can parse.

On May 1, 2007, at 2:22 AM, escher2k wrote:
> I want to capture information about the user who is executing a particular search.
> Is there a way to specify in Solr that certain fields should just be treated as pass
> through and not processed? This way I can use arbitrary params to do better logging.
>
> Thanks.

--
http://variogr.am/
[EMAIL PROTECTED]
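A minimal sketch of the solrconfig.xml piece this relies on, assuming the stock standard request handler (only the echoParams default matters here); with echoParams set to explicit, the extra request params come back inside the responseHeader of each response:

  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>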
Re: NullPointerException (not schema related)
On 5/1/07, Charlie Jackson <[EMAIL PROTECTED]> wrote:
> This is what came in the solrconfig.xml file with just a minor tweak to the directory.
> However, when I committed data to the index, I was getting "No such file or directory"
> errors from the Runtime.exec call. I verified all of the permissions, etc., with the
> user I was trying to use. In the end, I wrote up a little test program to see if it was
> a problem with the Runtime.exec call, and I think it is. I'm running this on CentOS 4.4,
> and Runtime.exec seems to have a hard time directly executing bash scripts. For example,
> if I called Runtime.exec with a command of "test_program" (which is a bash script), it
> failed. If I called Runtime.exec with a command of "/bin/bash test_program", it worked.

Yes, Runtime.exec does not invoke a shell automatically, so shebang lines, shell built-ins, I/O redirection, etc. cannot be used directly.

-Mike
RE: NullPointerException (not schema related)
I went with the first approach, which got me up and running. Your other example config (using ./snapshooter) made me realize how foolish my original problem was!

Anyway, I've got the whole thing up and running and it looks pretty awesome!

One quick question, though. As stated in the wiki, one of the benefits of distributing the indexes is to load balance the queries. Is there a built-in Solr mechanism for performing this query load balancing? I suspect there is not, and I haven't seen anything about it in the wiki, but I wanted to check because I know I'm going to be asked.

Thanks,
Charlie

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 01, 2007 3:20 PM
To: solr-user@lucene.apache.org
Subject: RE: NullPointerException (not schema related)

: <listener event="postCommit" class="solr.RunExecutableListener">
:   <str name="exe">snapshooter</str>
:   <str name="dir">/usr/local/Production/solr/solr/bin/</str>
:   <bool name="wait">true</bool>
: </listener>
:
: the directory. However, when I committed data to the index, I was
: getting "No such file or directory" errors from the Runtime.exec call. I
: verified all of the permissions, etc, with the user I was trying to use.
: In the end, I wrote up a little test program to see if it was a problem
: with the Runtime.exec call and I think it is. I'm running this on CentOS
: 4.4 and Runtime.exec seems to have a hard time directly executing bash
: scripts. For example, if I called Runtime.exec with a command of
: "test_program" (which is a bash script), it failed. If I called
: Runtime.exec with a command of "/bin/bash test_program" it worked.

This initial problem you were having may be a result of path issues. "dir" doesn't need to be the directory where your script lives; it's the directory where you want your script to run (the "cwd" of the process). It's possible that the error you were getting was because "." isn't in the PATH that was being used. You should try something like this...

  <str name="exe">/usr/local/Production/solr/solr/bin/snapshooter</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>

...or maybe even...

  <str name="exe">./snapshooter</str>
  <str name="dir">/usr/local/Production/solr/solr/bin/</str>
  <bool name="wait">true</bool>

-Hoss
Ranking ApacheCon proposals
I have no idea if they did this for the impending ApacheCon EU, but I just noticed that for ApacheCon US they have a "Would you attend this session?" ranking for people to give feedback on the abstracts that have been submitted before the schedule is made.

I would never dream of shilling my own session proposals, but I will happily encourage people who are interested in seeing Solr/Lucene well represented in the ApacheCon sessions to go to the ApacheCon website, create an account, and click the "Rate the session proposals" link after you log in...

http://apachecon.com/html/login.html

If you are *not* interested in seeing Solr/Lucene well represented in the ApacheCon sessions, please disregard this email. :)

-Hoss
Wondering about results from PhraseQuery
Hi Everyone,

Pardon me if this question has been asked on the mailing list before. I tried looking for it, but I could not find any answers.

I am querying against my indexes with a phrase query, and although I can see my terms' occurrences in the debug results, the overall score is "0".

To give the scenario: a user runs a search for a title made of pretty common terms such as "how do I update" (all of those words appear thousands of times in the indexes) combined with a term like "prison" (the last term appears no more than 1 or 2 times across the indexes). Now I have the problem: if I run a phrase query on this, I get zero results, and if I run a term query with booleans across all terms, I have too many results to be meaningful. So how should I arrange the query so that I can get relevant results?

Here are the debug results for my search:

  subject_t:"how do I prison"
  subject_t:"how do I prison"
  PhraseQuery(subject_t:"how do i prison")
  subject_t:"how do i prison"
  standard

  0.0 = fieldWeight(subject_t:"how do i prison" in 9268), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.5 = fieldNorm(field=subject_t, doc=9268)

  0.0 = fieldWeight(subject_t:"how do i prison" in 10424), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.5 = fieldNorm(field=subject_t, doc=10424)

  0.0 = fieldWeight(subject_t:"how do i prison" in 12163), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.625 = fieldNorm(field=subject_t, doc=12163)

  0.0 = fieldWeight(subject_t:"how do i prison" in 9289), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.625 = fieldNorm(field=subject_t, doc=9289)

  0.0 = fieldWeight(subject_t:"how do i prison" in 14700), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.4375 = fieldNorm(field=subject_t, doc=14700)

  0.0 = fieldWeight(subject_t:"how do i prison" in 11920), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.625 = fieldNorm(field=subject_t, doc=11920)

  0.0 = fieldWeight(subject_t:"how do i prison" in 1278), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.375 = fieldNorm(field=subject_t, doc=1278)

  0.0 = fieldWeight(subject_t:"how do i prison" in 3868), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.3125 = fieldNorm(field=subject_t, doc=3868)

  0.0 = fieldWeight(subject_t:"how do i prison" in 3893), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.5 = fieldNorm(field=subject_t, doc=3893)

  0.0 = fieldWeight(subject_t:"how do i prison" in 19024), product of:
    0.0 = tf(phraseFreq=0.0)
    18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
    0.5 = fieldNorm(field=subject_t, doc=19024)

Thanks.
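For what it's worth, the explain output above shows why the score is 0: the exact phrase never occurs, so phraseFreq is 0 for every matching document. Two query shapes that are often tried in this situation, shown here only as illustrative standard-handler query syntax (whether they rank well depends on the analyzer and the data):

  subject_t:"how do i prison"~10           (phrase query with a slop of 10, so the terms need not be adjacent)
  +subject_t:prison subject_t:(how do i)   (require the rare term; the common terms only influence ranking)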
Re: Ranking ApacheCon proposals
Cool. I noticed a Ruby-Flare-Solr presentation too. Who is giving that?

ERIC

Chris Hostetter wrote:
> I have no idea if they did this for the impending ApacheCon EU, but I just noticed that
> for ApacheCon US they have a "Would you attend this session?" ranking for people to give
> feedback on the abstracts that have been submitted before the schedule is made.
>
> I would never dream of shilling my own session proposals, but I will happily encourage
> people who are interested in seeing Solr/Lucene well represented in the ApacheCon
> sessions to go to the ApacheCon website, create an account, and click the "Rate the
> session proposals" link after you log in...
>
> http://apachecon.com/html/login.html
>
> If you are *not* interested in seeing Solr/Lucene well represented in the ApacheCon
> sessions, please disregard this email. :)
>
> -Hoss
Re: Ranking ApacheCon proposals
On May 1, 2007, at 7:42 PM, ericp wrote:
> Cool. I noticed a Ruby-Flare-Solr presentation too. Who is giving that?

I proposed that one.

	Erik
RE: EmbeddedSolr class from Wiki
Thank you Hoss, this is exactly what I need!

Currently I perform reindexing once a month, and it takes a few days... Very slow... Over 2 million documents (not too many; 300MB in files), database & SOLR on the same box, and SOLR uses about 60-80% CPU. I will implement real-time updates, via direct Java calls (as soon as data gets changed).

About Compass: I noticed some messages. I tried to use it (before SOLR) because of the advertised "transactional" Lucene updates; that is not true, and performance was really bad.

-----Original Message-----
From: Chris Hostetter

postCommit and postOptimize hooks can be any subclass of SolrEventListener, so you can trigger arbitrary Java code if you want to write your own (use JMS, or make an HTTP call, whatever). The RunExecutableListener that ships with Solr would be the easiest thing to do ... just have it execute the "commit" command line script on your slave (which will make it reopen the index you just modified).
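A sketch of that last suggestion, reusing the RunExecutableListener configuration shape from the earlier thread in this digest; the paths are placeholders for wherever the slave's collection distribution scripts actually live:

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">/path/to/slave/solr/bin/commit</str>
    <str name="dir">/path/to/slave/solr/bin/</str>
    <bool name="wait">true</bool>
  </listener>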