add/update document as distinct operations? Is it possible?

2010-04-01 Thread Julian Davchev
Hi
I have a distributed messaging solution where I need to distinguish between
adding a document and just trying to update it.

Scenario:
1. message sent for document to be updated
2. meanwhile another message is sent for document to be deleted and is
executed before 1
As a result, when 1 arrives, instead of ignoring the update because the
document no longer exists, it will add it again.

From what I see in the manual I cannot distinguish those operations. Any
pointers?

Cheers


Re: add/update document as distinct operations? Is it possible?

2010-04-01 Thread Julian Davchev
Also relevant here: is it possible to update a document without giving all
required fields, but just the uniqueKey and some other data?

Julian Davchev wrote:
> Hi
> I have a distributed messaging solution where I need to distinguish between
> adding a document and just trying to update it.
>
> Scenario:
> 1. message sent for document to be updated
> 2. meanwhile another message is sent for document to be deleted and is
> executed before 1
> As a result, when 1 arrives, instead of ignoring the update because the
> document no longer exists, it will add it again.
>
> From what I see in the manual I cannot distinguish those operations.
> Any pointers?
>
> Cheers
>   



Re: Solr crashing while extracting from very simple text file

2010-04-01 Thread Erik Hatcher

Yes, please report this to the Tika project.

Erik

On Mar 31, 2010, at 9:31 PM, Ross wrote:


Does anyone have any thoughts or suggestions on this?  I guess it's
really a Tika problem. Should I try to report it to the Tika project?

I wonder if someone could try it to see if it's a general problem or
just me. I can reproduce it by firing up the nano editor, creating a
file with XXBLE on one line and nothing else. Try indexing that and
Solr / Tika crashes. I can avoid it by editing the file slightly but I
haven't really been able to discover a consistent pattern. It works if
I change the word to lower case. Also, a three-line file like this
works:

a
a
XXBLE

but not this:

x
x
XXBLE

It's a bit unfortunate because a similar word (a person's name ??BLE )
with the same problem appears frequently in upper case near the top of
my files.

Cheers
Ross


On Sun, Mar 21, 2010 at 12:58 PM, Ross  wrote:

Hi all

I'm trying to import some text files. I'm mostly following Avi
Rappoport's tutorial.  Some of my files cause Solr to crash while
indexing. I've narrowed it down to a very simple example.

I have a file named test.txt with one line. That line is the word
XXBLE and nothing else.

This is the command I'm using.

curl "http://localhost:8080/solr-example/update/extract?literal.id=1&commit=true 
"

-F "myfi...@test.txt"

The result is pasted below. Other files work just fine. The problem
seems to be related to the letters B and E. If I change them to
something else or make them lower case then it works. In my real
files, the XX is something else but the result is the same. It's a
common word in the files. I guess for this "quick and dirty" job I'm
doing I could do a bulk replace in the files to make it lower case.

Is there any workaround for this?

Thanks
Ross

Apache Tomcat/6.0.20 - Error report
HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba

org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException
from org.apache.tika.parser.txt.TXTParser@19ccba
   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.txt.TXTParser@19ccba
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
   at org.apache.tika.parser.AutoDetectPar

Is this a bug of the RessourceLoader?

2010-04-01 Thread MitchK

Hello community,

I have been hunting a ghost bug for the last few days. As some of you might
have noticed, I have written several postings because of unexpected
dismax-handler behaviour and some other problems. However, there was no
error in my code nor in my schema.xml.

It seems that the ResourceLoader has a little bug. The first line of a file
you want to load with the getLines() method of ResourceLoader [1] has to be
commented out with "#". If not, the first line seems to be ignored or
something like that.

Please let me know whether you can reproduce this bug on your own. The
responsible code was copied from the StopFilterFactory and looks like this:

// copied from StopFilterFactory; some vars are renamed
if (wordsFile != null) {
  try {
    List<String> files = StrUtils.splitFileNames(wordsFile);
    if (words == null && files.size() > 0) {
      // default stopwords list has 35 or so words, but maybe don't make it that big to start
      words = new CharArraySet(files.size() * 10, true);
    }
    for (String file : files) {
      List<String> wlist = loader.getLines(file.trim());
      // TODO: once StopFilter.makeStopSet(List) method is available, switch
      // to using that so we can avoid a toArray() call
      words.addAll(StopFilter.makeStopSet((String[]) wlist.toArray(new String[0]), true));
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}



If you can reproduce this error, I think one should note it in the javadocs,
because bypassing this unexpected behaviour seems to be easy: just comment
out the first line with a "#" character.


Hope this helps
- Mitch

[1]
http://lucene.apache.org/solr/api/org/apache/solr/common/ResourceLoader.html
-- 
View this message in context: 
http://n3.nabble.com/Is-this-a-bug-of-the-RessourceLoader-tp690523p690523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is this a bug of the RessourceLoader?

2010-04-01 Thread Robert Muir
On Thu, Apr 1, 2010 at 7:06 AM, MitchK  wrote:

>
> It seems that the ResourceLoader has a little bug. The first line
> of a file you want to load with the getLines() method of ResourceLoader [1]
> has to be commented out with "#". If not, the first line seems to be
> ignored or something like that.
>
>
Some applications (such as Windows Notepad) insert a UTF-8 Byte Order Mark
(BOM) as the first character of the file. So perhaps the first word in your
stopwords list contains a UTF-8 BOM, and that's why you are seeing this
behavior.

If you look at the file with "more" and the first character appears to be
"ï»¿" (the UTF-8 BOM bytes displayed as Latin-1), then you can confirm that's
the problem.
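
For illustration, a minimal sketch of a workaround on the application side
(this is not Solr code; the class name is invented): Java's UTF-8 decoder
does not strip the BOM, so it shows up as the character U+FEFF at the start
of the first line and can simply be dropped.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Illustrative sketch: read the first line of a UTF-8 file and strip a
// leading BOM (U+FEFF) so the first stopword is usable.
public class BomSafeReader {
    public static String firstLine(String path) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"));
        try {
            String line = in.readLine();
            if (line != null && line.length() > 0 && line.charAt(0) == '\uFEFF') {
                line = line.substring(1); // drop the BOM
            }
            return line;
        } finally {
            in.close();
        }
    }
}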
-- 
Robert Muir
rcm...@gmail.com


Re: Is this a bug of the RessourceLoader?

2010-04-01 Thread MitchK

I used Notepad++ to create the file, and yes, you might be right. I will test
whether that was the problem.
If yes, do you know whether scripting languages like PHP or JavaScript also
set a BOM when they create a UTF-8-encoded file/text?

Probably making a note about this behaviour somewhere in the FAQ would be a
good idea, since it depends in part on what software one used to create the
file, wouldn't it?

- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Is-this-a-bug-of-the-RessourceLoader-tp690523p690669.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Query time only Ranges

2010-04-01 Thread Ankit Bhatnagar
Hi Chris,
Actually I need time up to seconds granularity, so did you mean I should
index the field after converting it into seconds?

Ankit

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, March 31, 2010 10:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Query time only Ranges


: I am working on use case - wherein i need to Query to just time ranges
: without date component.
: 
: search for docs with between 4pm - 6pm 

if you only need to store the hour of the day, and query on the hour of 
the day, then i would just use a numeric integer field containing the hour 
of the day.

if you want minute or second (or even millisecond) granularity, but you
still only care about the time of day (and not the *date*) then i would
still use an integer field, and just index the numeric value in whatever
granularity you need.
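
for example, a minimal sketch of that approach (the field name
"time_of_day" and the class are made up for illustration, not from this
thread): index seconds since midnight in an integer field and query the
range.

// Illustrative: store time-of-day as seconds since midnight in an int field.
public class TimeOfDayQuery {
    static int secondsSinceMidnight(int hour, int minute, int second) {
        return hour * 3600 + minute * 60 + second;
    }

    public static void main(String[] args) {
        int start = secondsSinceMidnight(16, 0, 0); // 4pm -> 57600
        int end = secondsSinceMidnight(18, 0, 0);   // 6pm -> 64800
        // range query over the integer field, independent of the date:
        System.out.println("q=time_of_day:[" + start + " TO " + end + "]");
    }
}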



-Hoss



Re: SOLR-1316 How To Implement this autosuggest component ???

2010-04-01 Thread stockii

hello.

I don't really understand much of this conversation :D

but I think you can help me. I have an idea for my suggestions:
does it make sense to group my suggestions with the SOLR-236 patch?

I tested it, and it didn't work completely well =(

My problem is that I have too many product names, with names too long for our
app, so it is necessary to group single terms and multiple terms into one
suggestion.
The field collapsing works well, but I get some strange results.

Has anybody tried something like this?

I think SOLR-1316 is not the right component for this!?

thx

 
-- 
View this message in context: 
http://n3.nabble.com/SOLR-1316-How-To-Implement-this-patch-autoComplete-tp506492p690933.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: selecting documents older than 4 hours

2010-04-01 Thread Israel Ekpo
I did something similar.

The only difference with my setup is that I have two fields: one that
stores the date the document was first created, and a second that stores
the date it was last updated, as Unix timestamps.

So my query to find documents that are older than 4 hours is very easy.

To find documents that were last updated more than four hours ago you would
do something like this:

q=last_update_date:[* TO 1270119278]

The current timestamp now is 1270133678; 4 hours ago it was 1270119278.

The field type in the schema is tint.
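
A rough sketch of computing that cutoff at query time (illustrative only;
it assumes the same last_update_date field and timestamps in seconds):

// Illustrative: compute "4 hours ago" as a Unix timestamp and build the query.
public class OlderThanFourHours {
    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L; // e.g. 1270133678
        long cutoff = now - 4L * 3600L;                // e.g. 1270119278
        System.out.println("q=last_update_date:[* TO " + cutoff + "]");
    }
}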



On Wed, Mar 31, 2010 at 11:18 PM, herceg_novi  wrote:

>
> Hello, I'd like to select documents older than 4 hours in my Solr 1.4
> installation.
>
> The query
>
> q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]
>
> does not return a correct recordset. I would expect to get all documents
> with last_update_date in the specified range. Instead solr returns all
> documents that exist in the index which is not what I would expect.
> Last_update_date is SolrDate field.
>
> This does not work either
> q=last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-4HOURS]
>
> This works, but I manually had to calculate the 4 hour difference and
> insert
> solr date formated timestamp into my query (I prefer not to do that)
> q=last_update_date:[NOW/DAY-7DAYS TO 2010-03-31T19:40:34Z]
>
> Any ideas if I can get this to work as expected?
> q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]
>
> Thanks!
> --
> View this message in context:
> http://n3.nabble.com/selecting-documents-older-than-4-hours-tp689975p689975.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: add/update document as distinct operations? Is it possible?

2010-04-01 Thread Erick Erickson
One of the most requested features in Lucene/SOLR is to be able
to update only selected fields rather than the whole document. But
that's not how it works at present. An update is really a delete and
an add.

So for your second message, you can't do a partial update, you must
"update" the whole document.

I'm a little confused by what you *want* in your first e-mail. But the
current way SOLR works, if the SOLR server first received the delete
and then the update, the index would have the document in it. But the
opposite order would delete the document.

But this really doesn't sound like a SOLR issue, since SOLR can't
magically divine the desired outcome. Somewhere you have
to coordinate the requests or your index will not be what you expect.
That is, you have to define what rules index modifications follow and
enforce them. Perhaps you can consider a queueing mechanism of
some sort (that you'd have to implement yourself...)
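
One possible sketch of such coordination (entirely illustrative; the class
and method names are invented, not a Solr API): attach a monotonically
increasing version to each message and drop anything older than what has
already been applied for that document ID.

import java.util.HashMap;
import java.util.Map;

// Illustrative guard in front of the indexer: remembers the newest message
// version applied per document ID, so a late "update" cannot re-add a
// document that a newer "delete" already removed.
public class IndexCoordinator {
    private final Map<String, Long> lastApplied = new HashMap<String, Long>();

    /** Returns true if this message is newer than anything applied so far. */
    public synchronized boolean shouldApply(String docId, long version) {
        Long seen = lastApplied.get(docId);
        if (seen != null && seen >= version) {
            return false; // stale message, e.g. the out-of-order update above
        }
        lastApplied.put(docId, version);
        return true;
    }
}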

HTH
Erick


On Thu, Apr 1, 2010 at 1:03 AM, Julian Davchev  wrote:

> Hi
> I have a distributed messaging solution where I need to distinguish between
> adding a document and just trying to update it.
>
> Scenario:
> 1. message sent for document to be updated
> 2. meanwhile another message is sent for document to be deleted and is
> executed before 1
> As a result, when 1 arrives, instead of ignoring the update because the
> document no longer exists, it will add it again.
>
> From what I see in the manual I cannot distinguish those operations.
> Any pointers?
>
> Cheers
>


Re: Read Time Out Exception while trying to upload a huge SOLR input xml

2010-04-01 Thread Erick Erickson
Don't do that. For many reasons. By trying to batch so many docs
together, you're just *asking* for trouble. Quite apart from whether it'll
work once, having *any* HTTP-based protocol work reliably with 13G is
fragile...

For instance, I don't want to have my app know whether the XML parsing in
SOLR parses the entire document into memory before processing or
not. But I sure don't want my application to change behavior if SOLR
changes its mind and wants to process the other way. My perfectly
working application (assuming an event-driven parser) could
suddenly start requiring over 13G of memory... Oh my aching head!

Your specific error might even be dependent upon GCing, which will
cause it to break differently, sometimes, maybe..

So do break things up and transmit multiple documents. It'll save you
a world of hurt.
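
As an illustration only, a minimal SolrJ sketch of that batching (the URL,
field name, document count, and batch size are assumptions, not anything
from this thread):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Illustrative: send documents in batches of 1000 instead of one giant POST.
public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1200000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i)); // plus your other fields
            batch.add(doc);
            if (batch.size() == 1000) {
                server.add(batch);   // one modest HTTP request per batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();
    }
}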

HTH
Erick

On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher
wrote:

> Hi,
>
> For the first time I tried uploading a huge input SOLR xml having about 1.2
> million *docs* (13GB in size). After some time I get the following
> exception:-
>
>  The server encountered an internal error ([was class
> java.net.SocketTimeoutException] Read timed out
> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read timed out
>  at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>  at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>  at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
>  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
>  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>  at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketTimeoutException: Read timed out
> ...
>
> Was the file I tried to upload too big and should I try reducing its
> size..?
>
> Thanks and Rgds,
> Mark.
>


Does Lucidimagination search uses Multi facet query filter or uses session?

2010-04-01 Thread bbarani

Hi,

I am trying to create search functionality like that of the
Lucidimagination search.

As of now I have formed the facet query as below:

http://localhost:8080/solr/db/select?q=*:*&fq={!tag=3DotHierarchyFacet}3DotHierarchyFacet:ABC&facet=on&facet.field={!ex=3DotHierarchyFacet}3DotHierarchyFacet&facet.field=ApplicationStatusFacet&facet.mincount=1

Since I have multiple facets, I have planned to form the query based on
the user's selection. Something like below... if the user selects (multiple
facets) application status as 'P', I would form the query as below:

http://localhost:8080/solr/db/select?q=*:*&fq={!tag=3DotHierarchyFacet}3DotHierarchyFacet:NTS&fq={!tag=ApplicationStatusFacet}ApplicationStatusFacet:P&facet=on&facet.field={!ex=3DotHierarchyFacet}3DotHierarchyFacet&&facet.field={!ex=ApplicationStatusFacet}&facet.mincount=1

Can someone let me know whether I am forming the correct query to perform
multiselect facets? I just want to know if I am doing anything wrong in the
query..

We are also trying to achieve this using sessions, but if we are able to
solve this by query I would prefer the query approach over session
variables..

Thanks,
Barani
-- 
View this message in context: 
http://n3.nabble.com/Does-Lucidimagination-search-uses-Multi-facet-query-filter-or-uses-session-tp691167p691167.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: selecting documents older than 4 hours

2010-04-01 Thread Chris Hostetter

: q=last_update_date:[NOW-7DAYS TO NOW-4HOURS]
: 
: does not return a correct recordset. I would expect to get all documents
: with last_update_date in the specified range. Instead solr returns all
: documents that exist in the index which is not what I would expect.
: Last_update_date is SolrDate field. 

that query should work fine, and do exactly what you describe.

there is no field type named "SolrDate" in solr ... can you please paste in
the exact schema.xml entries for your last_update_date field, as well as
for the fieldType associated with that field?

also: what does debugQuery=true show when you execute that query? (in
particular i'd like to see what the parsedquery information looks
like)



-Hoss



Re: Solr crashing while extracting from very simple text file

2010-04-01 Thread Chris Hostetter

: Yes, please report this to the Tika project.

except that when i run "tika-app-0.6.jar" on a text file like the one Ross 
describes, i don't get the error he describes, which means it may be 
something off in how Solr is using Tika.

Ross: I can't reproduce this error on the trunk using the example solr
configs and the text file below.  can you verify exactly which version of
Solr you are using (and which version of tika you are using inside solr)
and the exact byte contents of your simplest problematic text file?

hoss...@brunner:~/tmp$ cat tmp.txt
x
x
XXBLE
hoss...@brunner:~/tmp$ hexdump -C tmp.txt
00000000  78 0a 78 0a 58 58 42 4c  45 0a                    |x.x.XXBLE.|
0000000a
hoss...@brunner:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@tmp.txt"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">66</int></lst>
</response>


-Hoss



Re: Solr crashing while extracting from very simple text file

2010-04-01 Thread Ross
Hi Chris, thanks for looking at this.

I'm using Solr 1.4.0 including the Tika that's in the tgz file which
means Tika 0.4.

I've now discovered that only two letters are required. A single line
with XE will crash it.

This fails:

r...@gamma:/home/ross# hexdump -C test.txt
00000000  58 45 0a                                          |XE.|
00000003
r...@gamma:/home/ross#

This works

r...@gamma:/home/ross# hexdump -C test.txt
00000000  58 46 0a                                          |XF.|
00000003
r...@gamma:/home/ross#

XA, XB, XC, XD, XF all work okay. There's just something special about XE.

The command I use is:

curl "http://localhost:8080/solr-example/update/extract?literal.id=doc1&fmap.content=body&commit=true" -F "myfile=@test.txt"

I filed a bug at https://issues.apache.org/jira/browse/TIKA-397 but I
guess 0.4 is an old version so I wouldn't expect it to get much
attention.

It looks like I should upgrade Tika to 0.6. I don't really know how to
do that or if Solr 1.4 works with Tika 0.6. The Tika pages talk about
using Maven to build it. Sorry, I'm no Linux expert.

Ross


On Thu, Apr 1, 2010 at 1:07 PM, Chris Hostetter
 wrote:
>
> : Yes, please report this to the Tika project.
>
> except that when i run "tika-app-0.6.jar" on a text file like the one Ross
> describes, i don't get the error he describes, which means it may be
> something off in how Solr is using Tika.
>
> Ross: I can't reproduce this error on the trunk using the example solr
> configs and the text file below.  can you verify exactly which version of
> Solr you are using (and which version of tika you are using inside solr)
> and the exact byte contents of your simplest problematic text file?
>
> hoss...@brunner:~/tmp$ cat tmp.txt
> x
> x
> XXBLE
> hoss...@brunner:~/tmp$ hexdump -C tmp.txt
> 00000000  78 0a 78 0a 58 58 42 4c  45 0a                    |x.x.XXBLE.|
> 0000000a
> hoss...@brunner:~/tmp$ curl "http://localhost:8983/solr/update/extract?literal.id=1&commit=true" -F "myfile=@tmp.txt"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">66</int></lst>
> </response>
>
>
> -Hoss
>
>


Re: add/update document as distinct operations? Is it possible?

2010-04-01 Thread Chris Hostetter

: Subject: add/update document as distinct operations? Is it possible?
: References:
: 
: In-Reply-To:
: 

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss



How to view SOLR logs

2010-04-01 Thread bbarani

Hi,

We have an application which uses SOLRsharp to get the details from SOLR.
Currently, since we are in the testing stage, we would like to know what
query is being passed to SOLR from our application without debugging the
application each time.

Is there a way to view the queries passed to SOLR at a specified time? We
are running SOLR on Jetty and using SOLRsharp for accessing the SOLR data.

Thanks,
Barani
-- 
View this message in context: 
http://n3.nabble.com/How-to-view-SOLR-logs-tp691642p691642.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Read Time Out Exception while trying to upload a huge SOLR input xml

2010-04-01 Thread Shawn Smith
The error might be that your http client doesn't handle really large
files (32-bit overflow in the Content-Length header?) or something in
your network is killing your long-lived socket?  Solr can definitely
accept a 13GB xml document.

I've uploaded large files into Solr successfully, including recently a
12GB XML input file with ~4 million documents.  My Solr instance had
2GB of memory and it took about 2 hours.  Solr streamed the XML in
nicely.  I had to jump through a couple of hoops, but in my case it
was easier than writing a tool to split up my 12GB XML file...

1. I tried to use curl to do the upload, but it didn't handle files
that large.  For my quick and dirty testing, netcat (nc) did the
trick--it doesn't buffer the file in memory and it doesn't overflow
the Content-Length header.  Plus I could pipe the data through pv to
get a progress bar and estimated time of completion.  Not recommended
for production!

  FILE=documents.xml
  SIZE=$(stat --format %s $FILE)
  (echo "POST /solr/update HTTP/1.1
  Host: localhost:8983
  Content-Type: text/xml
  Content-Length: $SIZE
  " ; cat $FILE ) | pv -s $SIZE | nc localhost 8983

2. Indexing seemed to use less memory if I configured Solr to auto
commit periodically in solrconfig.xml.  This is what I used:



<autoCommit>
  <maxDocs>25000</maxDocs>
  <maxTime>30</maxTime>
</autoCommit>

Shawn

On Thu, Apr 1, 2010 at 10:10 AM, Erick Erickson  wrote:
> Don't do that. For many reasons. By trying to batch so many docs
> together, you're just *asking* for trouble. Quite apart from whether it'll
> work once, having *any* HTTP-based protocol work reliably with 13G is
> fragile...
>
> For instance, I don't want to have my app know whether the XML parsing in
> SOLR parses the entire document into memory before processing or
> not. But I sure don't want my application to change behavior if SOLR
> changes its mind and wants to process the other way. My perfectly
> working application (assuming an event-driven parser) could
> suddenly start requiring over 13G of memory... Oh my aching head!
>
> Your specific error might even be dependent upon GCing, which will
> cause it to break differently, sometimes, maybe..
>
> So do break things up and transmit multiple documents. It'll save you
> a world of hurt.
>
> HTH
> Erick
>
> On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher
> wrote:
>
>> Hi,
>>
>> For the first time I tried uploading a huge input SOLR xml having about 1.2
>> million *docs* (13GB in size). After some time I get the following
>> exception:-
>>
>>  The server encountered an internal error ([was class
>> java.net.SocketTimeoutException] Read timed out
>> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read timed out
>>  at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>  at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>  at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
>>  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
>>  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>>  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>  at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.net.SocketTimeoutException: Read timed out
>> ...
>>
>> Was the file I tried to upload too big and should I try reducing its
>> size..?
>>
>> Thanks and R

Re: How to view SOLR logs

2010-04-01 Thread Shawn Smith
The default jetty.xml sets up a request logger that logs to
"logs/yyyy_mm_dd.request.log" relative to the directory jetty is
started from.  Look for NCSARequestLog in your jetty.xml.  If SOLR
Sharp uses GETs (not POSTs) you can look at the urls in the log and
pull out the "q" and "fq" parameters, which will contain the queries.

Shawn

On Thu, Apr 1, 2010 at 2:56 PM, bbarani  wrote:
>
> Hi,
>
> We have an application which uses SOLRsharp to get the details from SOLR.
> Currently, since we are in the testing stage, we would like to know what
> query is being passed to SOLR from our application without debugging the
> application each time.
>
> Is there a way to view the queries passed to SOLR at a specified time? We
> are running SOLR on Jetty and using SOLRsharp for accessing the SOLR data.
>
> Thanks,
> Barani
> --
> View this message in context: 
> http://n3.nabble.com/How-to-view-SOLR-logs-tp691642p691642.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Search accross more than one field (dismax) ignored

2010-04-01 Thread Chris Hostetter

: select/?q=video&qt=dismax&qf=titleMain^2.0+titleShort^5.3&debugQuery=on
...
: 
: +(titleMain:video^2.0)~0.01 (titleMain:video^2.0)~0.01
: 
...
: My solrconfig for the dismax handler:

...what about schema.xml? ... what do the field (and corresponding
fieldtype) for titleShort look like?

: Even when I do not query against the dismax-requestHandler, a search across
: more than one field seems to fail.

please be explicit: provide an example and define "fail" (error page? no
results? incorrect results? ... what are we talking about in this case)


-Hoss



Re: How to view SOLR logs

2010-04-01 Thread bbarani

Hi,

I could see all GET requests properly in SOLR but couldn't find any POST
requests issued from SOLRsharp.

If I issue a search directly in SOLR (not from the application) I see
log entries like this:

127.0.0.1 -  -  [02/04/2010:03:33:23 +0000] "GET /solr/db/select?q=test

But when the search happens through the application the logs are as follows:

171.165.243.16 -  -  [01/04/2010:22:07:39 +0000] "POST /solr/db/select/
HTTP/1.1" 200 1806

Not sure why the entire query string is not logged...

Thanks,
Barani
-- 
View this message in context: 
http://n3.nabble.com/How-to-view-SOLR-logs-tp691642p692216.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Read Time Out Exception while trying to upload a huge SOLR input xml

2010-04-01 Thread Mark Fletcher
Hi Erick, Shawn,

Thank you for your replies.

Luckily, just on the second attempt, my 13GB SOLR XML (more than a million
docs) went into SOLR without any problem, and I uploaded another two
sets of 1.2 million+ docs without any hassle.

I will try smaller XML files next time, as well as the auto commit
suggestion.

Best Rgds,
Mark.

On Thu, Apr 1, 2010 at 6:18 PM, Shawn Smith  wrote:

> The error might be that your http client doesn't handle really large
> files (32-bit overflow in the Content-Length header?) or something in
> your network is killing your long-lived socket?  Solr can definitely
> accept a 13GB xml document.
>
> I've uploaded large files into Solr successfully, including recently a
> 12GB XML input file with ~4 million documents.  My Solr instance had
> 2GB of memory and it took about 2 hours.  Solr streamed the XML in
> nicely.  I had to jump through a couple of hoops, but in my case it
> was easier than writing a tool to split up my 12GB XML file...
>
> 1. I tried to use curl to do the upload, but it didn't handle files
> that large.  For my quick and dirty testing, netcat (nc) did the
> trick--it doesn't buffer the file in memory and it doesn't overflow
> the Content-Length header.  Plus I could pipe the data through pv to
> get a progress bar and estimated time of completion.  Not recommended
> for production!
>
>  FILE=documents.xml
>  SIZE=$(stat --format %s $FILE)
>  (echo "POST /solr/update HTTP/1.1
>  Host: localhost:8983
>  Content-Type: text/xml
>  Content-Length: $SIZE
>  " ; cat $FILE ) | pv -s $SIZE | nc localhost 8983
>
> 2. Indexing seemed to use less memory if I configured Solr to auto
> commit periodically in solrconfig.xml.  This is what I used:
>
>
>
> <autoCommit>
>   <maxDocs>25000</maxDocs>
>   <maxTime>30</maxTime>
> </autoCommit>
>
> Shawn
>
> On Thu, Apr 1, 2010 at 10:10 AM, Erick Erickson 
> wrote:
> > Don't do that. For many reasons. By trying to batch so many docs
> > together, you're just *asking* for trouble. Quite apart from whether it'll
> > work once, having *any* HTTP-based protocol work reliably with 13G is
> > fragile...
> >
> > For instance, I don't want to have my app know whether the XML parsing in
> > SOLR parses the entire document into memory before processing or
> > not. But I sure don't want my application to change behavior if SOLR
> > changes its mind and wants to process the other way. My perfectly
> > working application (assuming an event-driven parser) could
> > suddenly start requiring over 13G of memory... Oh my aching head!
> >
> > Your specific error might even be dependent upon GCing, which will
> > cause it to break differently, sometimes, maybe..
> >
> > So do break things up and transmit multiple documents. It'll save you
> > a world of hurt.
> >
> > HTH
> > Erick
> >
> > On Thu, Apr 1, 2010 at 4:34 AM, Mark Fletcher
> > wrote:
> >
> >> Hi,
> >>
> >> For the first time I tried uploading a huge input SOLR xml having about 1.2
> >> million *docs* (13GB in size). After some time I get the following
> >> exception:-
> >>
> >>  The server encountered an internal error ([was class
> >> java.net.SocketTimeoutException] Read timed out
> >> java.lang.RuntimeException: [was class java.net.SocketTimeoutException] Read timed out
> >>  at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >>  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >>  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >>  at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >>  at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:279)
> >>  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:138)
> >>  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> >>  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> >>  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >>  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >>  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> >>  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> >>  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >>  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >>  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >>  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >>  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> >>  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >>  at

MoreLikeThis function queries

2010-04-01 Thread Blargy

Are function queries possible using the MLT request handler? How about using
the _val_ hack? Thanks for your help
-- 
View this message in context: 
http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p692377.html
Sent from the Solr - User mailing list archive at Nabble.com.