add/update document as distinct operations? Is it possible?
Hi,

I have a distributed messaging solution where I need to distinguish between adding a document and merely updating it.

Scenario:
1. A message is sent for a document to be updated.
2. Meanwhile another message is sent for the document to be deleted, and it is executed before 1.

As a result, when 1 arrives, instead of the update being ignored (the document no longer exists), the document is added again.

From what I see in the manual I cannot distinguish between those operations. Any pointers?

Cheers
Re: add/update document as distinct operations? Is it possible?
Also in this regard: is it possible to update a document without giving all required fields, just the uniqueKey and some other data?

Julian Davchev wrote:
> Hi
> I have distributed messaging solution where I need to distinct between
> adding a document and just trying to update it.
>
> Scenario:
> 1. message sent for document to be updated
> 2. meanwhile another message is sent for document to be deleted and is
> executed before 1
> As a result when 1 comes instead of ignoring the update as document is
> no more...it will add it again.
>
> From what I see in manual I cannot distinct those operations which
> would. Any pointers?
>
> Cheers
Re: Solr crashing while extracting from very simple text file
Yes, please report this to the Tika project.

Erik

On Mar 31, 2010, at 9:31 PM, Ross wrote:

Does anyone have any thoughts or suggestions on this? I guess it's really a Tika problem. Should I try to report it to the Tika project? I wonder if someone could try it to see if it's a general problem or just me.

I can reproduce it by firing up the nano editor, creating a file with XXBLE on one line and nothing else. Try indexing that and Solr / Tika crashes. I can avoid it by editing the file slightly but I haven't really been able to discover a consistent pattern. It works if I change the word to lower case. Also a three line file like this works

a
a
XXBLE

but not

x
x
XXBLE

It's a bit unfortunate because a similar word (a person's name, ??BLE) with the same problem appears frequently in upper case near the top of my files.

Cheers
Ross

On Sun, Mar 21, 2010 at 12:58 PM, Ross wrote:

Hi all

I'm trying to import some text files. I'm mostly following Avi Rappoport's tutorial. Some of my files cause Solr to crash while indexing. I've narrowed it down to a very simple example. I have a file named test.txt with one line. That line is the word XXBLE and nothing else. This is the command I'm using:

curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true" -F "myfi...@test.txt"

The result is pasted below. Other files work just fine. The problem seems to be related to the letters B and E. If I change them to something else or make them lower case then it works. In my real files, the XX is something else but the result is the same. It's a common word in the files. I guess for this "quick and dirty" job I'm doing I could do a bulk replace in the files to make it lower case. Is there any workaround for this?

Thanks
Ross

Apache Tomcat/6.0.20 - Error report

HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba

org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
  at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.txtpar...@19ccba
  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
  at org.apache.tika.parser.AutoDetectPar
Is this a bug of the RessourceLoader?
Hello community,

I was hunting a ghost bug for the last few days. As some of you might have recognized, I have written some postings because of unexpected dismax-handler behaviour and some other problems. However, there was no error in my code nor in my schema.xml.

It seems the ResourceLoader has a little bug: the first line of a file you load with the getLines() method of ResourceLoader [1] has to be commented out with "#". If it isn't, the first line seems to be ignored or something like that. Please let me know whether you can reproduce this bug on your own.

The responsible code was copied from StopFilterFactory and looks like this:

// copied from StopFilterFactory; some vars are renamed
if (wordsFile != null) {
  try {
    List files = StrUtils.splitFileNames(wordsFile);
    if (words == null && files.size() > 0) {
      // default stopwords list has 35 or so words, but maybe don't make it that big to start
      words = new CharArraySet(files.size() * 10, true);
    }
    for (String file : files) {
      List wlist = loader.getLines(file.trim());
      // TODO: once StopFilter.makeStopSet(List) method is available, switch to using that
      // so we can avoid a toArray() call
      words.addAll(StopFilter.makeStopSet((String[]) wlist.toArray(new String[0]), true));
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

If you can reproduce this error, I think it should be noted in the javadocs, because bypassing this unexpected behaviour seems to be easy: just comment out the first line with a "#" character.

Hope this helps
- Mitch

[1] http://lucene.apache.org/solr/api/org/apache/solr/common/ResourceLoader.html

--
View this message in context: http://n3.nabble.com/Is-this-a-bug-of-the-RessourceLoader-tp690523p690523.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is this a bug of the RessourceLoader?
On Thu, Apr 1, 2010 at 7:06 AM, MitchK wrote:
>
> It seems like that the ressource-loader has got a little bug. The first line
> of a file you want to load with the getLine()-method of RessourceLoader [1]
> has to be outcommented by "#". If not, the first line seems to be ignored or
> something like that.
>

Some applications (such as Windows Notepad) insert a UTF-8 Byte Order Mark (BOM) as the first character of the file. So perhaps the first word in your stopwords list contains a UTF-8 BOM, and that's why you are seeing this behavior.

If you look at the file with "more" and the first character appears to be "", then you can confirm that's the problem.

--
Robert Muir
rcm...@gmail.com
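For illustration, a minimal Java sketch of reading such a file while stripping a leading UTF-8 BOM (the class name and file handling are made up for this example and are not part of Solr; a UTF-8 BOM decodes to the single character U+FEFF):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class BomSafeLineReader {
  // Reads lines from a UTF-8 file, removing a leading byte order mark if present.
  public static List<String> readLines(String path) throws IOException {
    List<String> lines = new ArrayList<String>();
    BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(path), "UTF-8"));
    try {
      String line;
      boolean first = true;
      while ((line = in.readLine()) != null) {
        if (first && line.length() > 0 && line.charAt(0) == '\uFEFF') {
          line = line.substring(1); // strip the BOM decoded as U+FEFF
        }
        first = false;
        lines.add(line);
      }
    } finally {
      in.close();
    }
    return lines;
  }
}

Saving the stopwords file without a BOM (most editors, including Notepad++, have an "encode in UTF-8 without BOM" option) avoids the issue entirely.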
Re: Is this a bug of the RessourceLoader?
I used Notepad++ to create the file and yes, you might be right. I will test whether that was the problem.

If yes, do you know whether scripting languages like PHP or JavaScript also set a BOM when they create a UTF-8-encoded file/text?

Making a note about this behaviour somewhere in the FAQ would probably be a good idea, since it depends in part on what software was used to create the file, wouldn't it?

- Mitch

--
View this message in context: http://n3.nabble.com/Is-this-a-bug-of-the-RessourceLoader-tp690523p690669.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Query time only Ranges
Hi Chris,

Actually I need time up to seconds granularity, so did you mean I should index the field after converting it into seconds?

Ankit

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, March 31, 2010 10:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Query time only Ranges

: I am working on use case - wherein i need to Query to just time ranges
: without date component.
:
: search for docs between 4pm - 6pm

If you only need to store the hour of the day, and query on the hour of the day, then I would just use a numeric integer field containing the hour of the day.

If you want minute or second (or even millisecond) granularity, but you still only care about the time of day (and not the *date*), then I would still use an integer field, and just index the numeric value in whatever granularity you need.

-Hoss
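To make the seconds-granularity variant concrete, a small sketch (the integer field name time_of_day is made up for the example): store the time of day as seconds since midnight and query ranges on it the same way.

import java.util.Calendar;

public class TimeOfDay {
  // Seconds since midnight for a given Calendar, suitable for an integer Solr field.
  public static int secondsSinceMidnight(Calendar c) {
    return c.get(Calendar.HOUR_OF_DAY) * 3600
         + c.get(Calendar.MINUTE) * 60
         + c.get(Calendar.SECOND);
  }

  public static void main(String[] args) {
    // 4pm-6pm as a range query on the hypothetical time_of_day field:
    int from = 16 * 3600; // 57600
    int to = 18 * 3600;   // 64800
    System.out.println("time_of_day:[" + from + " TO " + to + "]");
  }
}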
Re: SOLR-1316 How To Implement this autosuggest component ???
Hello. I don't really understand much of this conversation :D but I think you can help me.

I have an idea for my suggestions: does it make sense to group my suggestions with patch 236? I tested it, and it didn't work completely well =( My problem is that I have too many product names, with names too long for our app, so it's necessary to group single terms and multiple terms into one suggestion. The field collapsing works well, but I get some strange results.

Has anybody tried something like this? SOLR-1316 is not the right component for this, I think!?

Thx

--
View this message in context: http://n3.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp506492p690933.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: selecting documents older than 4 hours
I did something similar. The only difference with my setup is that I have two columns: one that stores the date the document was first created, and a second that stores the date it was last updated, both as Unix timestamps. That makes the query to find documents older than 4 hours very easy.

To find documents that were last updated more than four hours ago you would do something like this:

q=last_update_date:[* TO 1270119278]

The current timestamp now is 1270133678; 4 hours ago was 1270119278. The column type in the schema is tint.

On Wed, Mar 31, 2010 at 11:18 PM, herceg_novi wrote:
>
> Hello, I'd like to select documents older than 4 hours in my Solr 1.4
> installation.
>
> The query
>
> q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]
>
> does not return a correct recordset. I would expect to get all documents
> with last_update_date in the specified range. Instead solr returns all
> documents that exist in the index which is not what I would expect.
> Last_update_date is SolrDate field.
>
> This does not work either
> q=last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-4HOURS]
>
> This works, but I manually had to calculate the 4 hour difference and insert
> solr date formated timestamp into my query (I prefer not to do that)
> q=last_update_date:[NOW/DAY-7DAYS TO 2010-03-31T19:40:34Z]
>
> Any ideas if I can get this to work as expected?
> q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]
>
> Thanks!
> --
> View this message in context:
> http://n3.nabble.com/selecting-documents-older-than-4-hours-tp689975p689975.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
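A small sketch of computing that cutoff at query time instead of by hand, assuming the field holds Unix timestamps in seconds as described above (field name taken from the mail):

public class OlderThanQuery {
  public static void main(String[] args) {
    long nowSecs = System.currentTimeMillis() / 1000L;
    long cutoff = nowSecs - 4L * 60 * 60; // four hours ago, in epoch seconds
    // Matches documents last updated more than four hours ago.
    System.out.println("last_update_date:[* TO " + cutoff + "]");
  }
}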
Re: add/update document as distinct operations? Is it possible?
One of the most requested features in Lucene/SOLR is to be able to update only selected fields rather than the whole document. But that's not how it works at present: an update is really a delete and an add. So for your second message, you can't do a partial update; you must "update" the whole document.

I'm a little confused by what you *want* in your first e-mail. But the way SOLR currently works, if the SOLR server first received the delete and then the update, the index would have the document in it, whereas the opposite order would delete the document.

This really doesn't sound like a SOLR issue, though, since SOLR can't magically divine the desired outcome. Somewhere you have to coordinate the requests or your index will not be what you expect. That is, you have to define what rules index modifications follow and enforce them. Perhaps you can consider a queueing mechanism of some sort (that you'd have to implement yourself...).

HTH
Erick

On Thu, Apr 1, 2010 at 1:03 AM, Julian Davchev wrote:
> Hi
> I have distributed messaging solution where I need to distinct between
> adding a document and just trying to update it.
>
> Scenario:
> 1. message sent for document to be updated
> 2. meanwhile another message is sent for document to be deleted and is
> executed before 1
> As a result when 1 comes instead of ignoring the update as document is
> no more...it will add it again.
>
> From what I see in manual I cannot distinct those operations which
> would. Any pointers?
>
> Cheers
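Not from the thread, but to make "coordinate the requests" concrete: a rough SolrJ sketch that applies an update message only if the document still exists. The uniqueKey field name "id", the server URL, and the class name are assumptions; CommonsHttpSolrServer is the SolrJ client of that era. Note this only narrows the race window between the existence check and the add; truly correct behaviour still needs the message consumer to serialize operations per document.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UpdateIfPresent {
  // Applies an "update" message only if the document still exists in the index.
  public static void applyUpdate(SolrServer solr, String id, SolrInputDocument doc)
      throws Exception {
    // Assumes "id" is the uniqueKey and contains no characters needing escaping.
    SolrQuery q = new SolrQuery("id:" + id);
    q.setRows(0); // we only need the count
    long found = solr.query(q).getResults().getNumFound();
    if (found == 0) {
      return; // document was deleted in the meantime; ignore the update
    }
    solr.add(doc); // in Solr an update is a full re-add of the document
    solr.commit();
  }
}

It would be called from whatever consumes the "update" messages, e.g. with a server created as new CommonsHttpSolrServer("http://localhost:8983/solr") and the rebuilt full document.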
Re: Read Time Out Exception while trying to upload a huge SOLR input xml
Don't do that, for many reasons. By trying to batch so many docs together, you're just *asking* for trouble. Quite apart from whether it'll work once, having *any* HTTP-based protocol work reliably with 13G is fragile...

For instance, I don't want to have to know whether the XML parsing in SOLR parses the entire document into memory before processing or not. But I sure don't want my application to change behavior if SOLR changes its mind and wants to process the other way. My perfectly working application (assuming an event-driven parser) could suddenly start requiring over 13G of memory... Oh my aching head!

Your specific error might even be dependent upon GCing, which will cause it to break differently, sometimes, maybe...

So do break things up and transmit multiple documents. It'll save you a world of hurt.

HTH
Erick

On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher wrote:
> Hi,
>
> For the first time I tried uploading a huge input SOLR xml having about 1.2
> million *docs* (13GB in size). After some time I get the following
> exception:-
>
> The server encountered an internal error ([was class
> java.net.SocketTimeoutException] Read timed out
> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read
> timed out
>   at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>   at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
>   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
>   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>   at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketTimeoutException: Read timed out
> ...
>
> Was the file I tried to upload too big and should I try reducing its
> size..?
>
> Thanks and Rgds,
> Mark.
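As a rough sketch of the "break things up" approach, here is a SolrJ example that sends documents in modest batches instead of one 13GB POST. The URL, batch size, loop, and field names are made-up examples, and CommonsHttpSolrServer is the SolrJ client from that era; it is an illustration, not the poster's actual indexing code.

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    int batchSize = 1000;
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(batchSize);

    for (int i = 0; i < 1200000; i++) {     // stand-in for reading your source data
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("text", "...");          // whatever fields your schema defines
      batch.add(doc);

      if (batch.size() == batchSize) {
        solr.add(batch);                    // each HTTP request stays small
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      solr.add(batch);
    }
    solr.commit();                          // one commit at the end (or rely on autoCommit)
  }
}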
Does Lucidimagination search uses Multi facet query filter or uses session?
Hi,

I am trying to create search functionality the same as that of the Lucid Imagination search. As of now I have formed the facet query as below:

http://localhost:8080/solr/db/select?q=*:*&fq={!tag=3DotHierarchyFacet}3DotHierarchyFacet:ABC&facet=on&facet.field={!ex=3DotHierarchyFacet}3DotHierarchyFacet&facet.field=ApplicationStatusFacet&facet.mincount=1

Since I have multiple facets I have planned to form the query based on the user selection. Something like below... if the user selects (multiple facets) application status as 'P' I would form the query as below:

http://localhost:8080/solr/db/select?q=*:*&fq={!tag=3DotHierarchyFacet}3DotHierarchyFacet:NTS&fq={!tag=ApplicationStatusFacet}ApplicationStatusFacet:P&facet=on&facet.field={!ex=3DotHierarchyFacet}3DotHierarchyFacet&&facet.field={!ex=ApplicationStatusFacet}&facet.mincount=1

Can someone let me know if I am forming the correct query to perform multi-select facets? I just want to know if I am doing anything wrong in the query.

We are also trying to achieve this using sessions, but if we can solve this by query I would prefer using the query rather than session variables.

Thanks,
Barani

--
View this message in context: http://n3.nabble.com/Does-Lucidimagination-search-uses-Multi-facet-query-filter-or-uses-session-tp691167p691167.html
Sent from the Solr - User mailing list archive at Nabble.com.
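For comparison, a sketch of the documented tag/exclusion (multi-select faceting) pattern assembled with SolrJ, using the field names from the mail; note that each excluded facet.field still needs the field name after the {!ex=...} local params. This is only an illustration of the general pattern, not a confirmed fix for the query above.

import org.apache.solr.client.solrj.SolrQuery;

public class MultiSelectFacets {
  public static void main(String[] args) {
    SolrQuery q = new SolrQuery("*:*");
    // Tag each filter so it can be excluded when computing its own facet counts.
    q.addFilterQuery("{!tag=hier}3DotHierarchyFacet:NTS");
    q.addFilterQuery("{!tag=status}ApplicationStatusFacet:P");
    q.setFacet(true);
    // Each facet.field keeps the field name after the {!ex=...} local params.
    q.addFacetField("{!ex=hier}3DotHierarchyFacet");
    q.addFacetField("{!ex=status}ApplicationStatusFacet");
    q.setFacetMinCount(1);
    System.out.println(q); // prints the assembled parameter string
  }
}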
Re: selecting documents older than 4 hours
: q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]
:
: does not return a correct recordset. I would expect to get all documents
: with last_update_date in the specified range. Instead solr returns all
: documents that exist in the index which is not what I would expect.
: Last_update_date is SolrDate field.

That query should work fine and do exactly what you describe. There is no field type named "SolrDate" in Solr... can you please paste in the exact schema.xml entries for your last_update_date field, as well as for the fieldType associated with that field?

Also: what does debugQuery=true show when you execute that query? (In particular I'd like to see what the parsedquery information looks like.)

-Hoss
Re: Solr crashing while extracting from very simple text file
: Yes, please report this to the Tika project.

Except that when I run "tika-app-0.6.jar" on a text file like the one Ross describes, I don't get the error he describes, which means it may be something off in how Solr is using Tika.

Ross: I can't reproduce this error on the trunk using the example Solr configs and the text file below. Can you verify exactly which version of Solr you are using (and which version of Tika you are using inside Solr) and the exact byte contents of your simplest problematic text file?

hoss...@brunner:~/tmp$ cat tmp.txt
x
x
XXBLE
hoss...@brunner:~/tmp$ hexdump -C tmp.txt
00000000  78 0a 78 0a 58 58 42 4c 45 0a  |x.x.XXBLE.|
0000000a
hoss...@brunner:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfi...@tmp.txt"
<lst name="responseHeader"><int name="status">0</int><int name="QTime">66</int></lst>

-Hoss
Re: Solr crashing while extracting from very simple text file
Hi Chris, thanks for looking at this.

I'm using Solr 1.4.0, including the Tika that's in the tgz file, which means Tika 0.4. I've now discovered that only two letters are required. A single line with XE will crash it.

This fails:

r...@gamma:/home/ross# hexdump -C test.txt
00000000  58 45 0a  |XE.|
00000003
r...@gamma:/home/ross#

This works:

r...@gamma:/home/ross# hexdump -C test.txt
00000000  58 46 0a  |XF.|
00000003
r...@gamma:/home/ross#

XA, XB, XC, XD, XF all work okay. There's just something special about XE.

The command I use is:

curl "http://localhost:8080/solr-example/update/extract?literal.id=doc1&fmap.content=body&commit=true" -F "myfi...@test.txt"

I filed a bug at https://issues.apache.org/jira/browse/TIKA-397 but I guess 0.4 is an old version so I wouldn't expect it to get much attention. It looks like I should upgrade Tika to 0.6. I don't really know how to do that, or if Solr 1.4 works with Tika 0.6. The Tika pages talk about using Maven to build it. Sorry, I'm no Linux expert.

Ross

On Thu, Apr 1, 2010 at 1:07 PM, Chris Hostetter wrote:
>
> : Yes, please report this to the Tika project.
>
> except that when i run "tika-app-0.6.jar" on a text file like the one Ross
> describes, i don't get the error he describes, which means it may be
> something off in how Solr is using Tika.
>
> Ross: I can't reproduce this error on the trunk using the example solr
> configs and the text file below. can you verify exactly which version of
> Solr you are using (and which version of tika you are using inside solr)
> and the exact byte contents of your simplest problematic text file?
>
> hoss...@brunner:~/tmp$ cat tmp.txt
> x
> x
> XXBLE
> hoss...@brunner:~/tmp$ hexdump -C tmp.txt
> 00000000  78 0a 78 0a 58 58 42 4c 45 0a  |x.x.XXBLE.|
> 0000000a
> hoss...@brunner:~/tmp$ curl
> "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F
> "myfi...@tmp.txt"
>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">66</int></lst>
>
> -Hoss
Re: add/update document as distinct operations? Is it possible?
: Subject: add/update document as distinct operations? Is it possible?
: References:
: In-Reply-To:

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking

-Hoss
How to view SOLR logs
Hi,

We have an application which uses SolrSharp to get the details from SOLR. Currently, since we are in the testing stage, we would like to know what query is being passed to SOLR from our application without debugging the application each time.

Is there a way to view the queries passed to SOLR at a specified time? We are running SOLR on Jetty and using SolrSharp for accessing the SOLR data.

Thanks,
Barani

--
View this message in context: http://n3.nabble.com/How-to-view-SOLR-logs-tp691642p691642.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Read Time Out Exception while trying to upload a huge SOLR input xml
The error might be that your HTTP client doesn't handle really large files (32-bit overflow in the Content-Length header?) or something in your network is killing your long-lived socket? Solr can definitely accept a 13GB XML document.

I've uploaded large files into Solr successfully, including recently a 12GB XML input file with ~4 million documents. My Solr instance had 2GB of memory and it took about 2 hours. Solr streamed the XML in nicely. I had to jump through a couple of hoops, but in my case it was easier than writing a tool to split up my 12GB XML file...

1. I tried to use curl to do the upload, but it didn't handle files that large. For my quick and dirty testing, netcat (nc) did the trick--it doesn't buffer the file in memory and it doesn't overflow the Content-Length header. Plus I could pipe the data through pv to get a progress bar and estimated time of completion. Not recommended for production!

FILE=documents.xml
SIZE=$(stat --format %s $FILE)
(echo "POST /solr/update HTTP/1.1
Host: localhost:8983
Content-Type: text/xml
Content-Length: $SIZE
" ; cat $FILE ) | pv -s $SIZE | nc localhost 8983

2. Indexing seemed to use less memory if I configured Solr to auto commit periodically in solrconfig.xml. This is what I used:

25000
30

Shawn

On Thu, Apr 1, 2010 at 10:10 AM, Erick Erickson wrote:
> Don't do that. For many reasons. By trying to batch so many docs
> together, you're just *asking* for trouble. Quite apart from whether it'll
> work once, having *any* HTTP-based protocol work reliably with 13G is
> fragile...
>
> For instance, I don't want to have my know whether the XML parsing in
> SOLR parses the entire document into memory before processing or
> not. But I sure don't want my application to change behavior if SOLR
> changes it's mind and wants to process the other way. My perfectly
> working application (assuming an event-driven parser) could
> suddenly start requiring over 13G of memory... Oh my aching head!
>
> Your specific error might even be dependent upon GCing, which will
> cause it to break differently, sometimes, maybe..
>
> So do break things up and transmit multiple documents. It'll save you
> a world of hurt.
>
> HTH
> Erick
>
> On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher wrote:
>
>> Hi,
>>
>> For the first time I tried uploading a huge input SOLR xml having about 1.2
>> million *docs* (13GB in size). After some time I get the following
>> exception:-
>>
>> The server encountered an internal error ([was class
>> java.net.SocketTimeoutException] Read timed out
>> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read
>> timed out
>>   at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>   at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
>>   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
>>   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>   at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.net.SocketTimeoutException: Read timed out
>> ...
>>
>> Was the file I tried to upload too big and should I try reducing its
>> size..?
>>
>> Thanks and R
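If the problem really is on the client side (for example an overflowing Content-Length header, as Shawn suggests), a rough Java sketch of streaming the file with chunked transfer encoding, so that no Content-Length header is sent at all. The URL and file name are placeholders, and this is only an illustration of the idea, not something tested against a 13GB upload.

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StreamUpload {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8983/solr/update");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    conn.setChunkedStreamingMode(8192); // stream the body; no Content-Length needed
    conn.setReadTimeout(0);             // 0 = no read timeout while Solr processes

    InputStream in = new FileInputStream("documents.xml");
    OutputStream out = conn.getOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);   // copy the file to the socket in small chunks
    }
    out.close();
    in.close();

    System.out.println("HTTP " + conn.getResponseCode());
  }
}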
Re: How to view SOLR logs
The default jetty.xml sets up a request logger that logs to "logs/yyyy_mm_dd.request.log" relative to the directory Jetty is started from. Look for NCSARequestLog in your jetty.xml.

If SolrSharp uses GETs (not POSTs) you can look at the URLs in the log and pull out the "q" and "fq" parameters, which will contain the queries.

Shawn

On Thu, Apr 1, 2010 at 2:56 PM, bbarani wrote:
>
> Hi,
>
> We have a application which uses SOLR sharp to get the details from SOLR.
> Currently since we are in testing stage we would like to know what query is
> being passed to SOLR from our application without debuggging the application
> each time.
>
> Is there a way to view the queries passed to SOLR on a specified time. We
> are running SOLR on jetty and using SOLR Sharp for accessing the SOLR data.
>
> Thanks,
> Barani
> --
> View this message in context:
> http://n3.nabble.com/How-to-view-SOLR-logs-tp691642p691642.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search accross more than one field (dismax) ignored
: select/?q=video&qt=dismax&qf=titleMain^2.0+titleShort^5.3&debugQuery=on
: ...
: +(titleMain:video^2.0)~0.01 (titleMain:video^2.0)~0.01
: ...
: My solrconfig for the dismax handler:

...what about schema.xml? ... what do the field (and corresponding fieldType) for titleShort look like?

: Even when I do not query against the dismax-requestHandler, a search accross
: more than one field seems to fail.

Please be explicit: provide an example and define "fail" (error page? no results? incorrect results? ... what are we talking about in this case).

-Hoss
Re: How to view SOLR logs
Hi,

I can see all GET requests properly in SOLR, but I couldn't find any POST requests issued from SolrSharp.

If I issue a search directly in SOLR (not from the application) I can see the logs as below:

127.0.0.1 - - [02/04/2010:03:33:23 +] "GET /solr/db/select?q=test

But when the search happens through the application the logs are as follows:

171.165.243.16 - - [01/04/2010:22:07:39 +] "POST /solr/db/select/ HTTP/1.1" 200 1806

Not sure why the entire query string is not logged...

Thanks,
Barani

--
View this message in context: http://n3.nabble.com/How-to-view-SOLR-logs-tp691642p692216.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Read Time Out Exception while trying to upload a huge SOLR input xml
Hi Eric, Shawn,

Thank you for your replies. Luckily, just on the second attempt my 13GB SOLR XML (more than a million docs) went into SOLR fine without any problem, and I uploaded another two sets of 1.2 million+ docs without any hassle. I will try smaller XMLs next time, as well as the auto commit suggestion.

Best Rgds,
Mark.

On Thu, Apr 1, 2010 at 6:18 PM, Shawn Smith wrote:
> The error might be that your http client doesn't handle really large
> files (32-bit overflow in the Content-Length header?) or something in
> your network is killing your long-lived socket? Solr can definitely
> accept a 13GB xml document.
>
> I've uploaded large files into Solr successfully, including recently a
> 12GB XML input file with ~4 million documents. My Solr instance had
> 2GB of memory and it took about 2 hours. Solr streamed the XML in
> nicely. I had to jump through a couple of hoops, but in my case it
> was easier than writing a tool to split up my 12GB XML file...
>
> 1. I tried to use curl to do the upload, but it didn't handle files
> that large. For my quick and dirty testing, netcat (nc) did the
> trick--it doesn't buffer the file in memory and it doesn't overflow
> the Content-Length header. Plus I could pipe the data through pv to
> get a progress bar and estimated time of completion. Not recommended
> for production!
>
> FILE=documents.xml
> SIZE=$(stat --format %s $FILE)
> (echo "POST /solr/update HTTP/1.1
> Host: localhost:8983
> Content-Type: text/xml
> Content-Length: $SIZE
> " ; cat $FILE ) | pv -s $SIZE | nc localhost 8983
>
> 2. Indexing seemed to use less memory if I configured Solr to auto
> commit periodically in solrconfig.xml. This is what I used:
>
> 25000
> 30
>
> Shawn
>
> On Thu, Apr 1, 2010 at 10:10 AM, Erick Erickson wrote:
> > Don't do that. For many reasons. By trying to batch so many docs
> > together, you're just *asking* for trouble. Quite apart from whether it'll
> > work once, having *any* HTTP-based protocol work reliably with 13G is
> > fragile...
> >
> > For instance, I don't want to have my know whether the XML parsing in
> > SOLR parses the entire document into memory before processing or
> > not. But I sure don't want my application to change behavior if SOLR
> > changes it's mind and wants to process the other way. My perfectly
> > working application (assuming an event-driven parser) could
> > suddenly start requiring over 13G of memory... Oh my aching head!
> >
> > Your specific error might even be dependent upon GCing, which will
> > cause it to break differently, sometimes, maybe..
> >
> > So do break things up and transmit multiple documents. It'll save you
> > a world of hurt.
> >
> > HTH
> > Erick
> >
> > On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher wrote:
> >
> >> Hi,
> >>
> >> For the first time I tried uploading a huge input SOLR xml having about 1.2
> >> million *docs* (13GB in size). After some time I get the following
> >> exception:-
> >>
> >> The server encountered an internal error ([was class
> >> java.net.SocketTimeoutException] Read timed out
> >> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read
> >> timed out
> >>   at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >>   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >>   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >>   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >>   at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
> >>   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
> >>   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> >>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> >>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> >>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> >>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >>   at
MoreLikeThis function queries
Are function queries possible using the MLT request handler? How about using the _val_ hack?

Thanks for your help

--
View this message in context: http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p692377.html
Sent from the Solr - User mailing list archive at Nabble.com.