Hi,

This is very strange behavior, and the fact that it is caused by one specific field, again, leads me to believe it's still a data issue. Did you try using SolrJ to query the data as well? If the same thing happens when using the binary protocol, then it's probably not a data issue. On the other hand, if it works fine, then at least you can inspect the data to see where things go wrong. Sorry for insisting on that, but I cannot think of anything else that could cause this problem.

If anyone else has a better idea, I'm actually very curious to hear it.

Uri

Rupert Fiasco wrote:
The text file at:

http://brockwine.com/solr.txt

Represents one of these truncated responses (this one in XML). It
starts out great, then look at the bottom, boom, game over. :)

I found this document by first running our bigger search, which
breaks, and then zeroing in on a specific broken document using the
rows/start parameters. But there is an unknown number of these
"broken" documents - a lot, I presume.
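A small script could automate that zeroing-in. This is just a sketch - the base URL, query, and helper names below are illustrative, not taken from our actual setup - that fetches one document at a time with rows=1 and flags every offset whose response fails to parse:

```python
import urllib.request
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the string parses as one complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def find_broken_docs(base_url, query, total):
    """Yield each 'start' offset whose single-row XML response is truncated."""
    for start in range(total):
        url = "%s/select?q=%s&rows=1&start=%d&wt=xml" % (base_url, query, start)
        body = urllib.request.urlopen(url).read().decode("utf-8")
        if not is_well_formed(body):
            yield start
```

Walking the whole result set this way would at least tell us how many documents are affected, not just the one behind solr.txt.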

-Rupert

On Tue, Aug 25, 2009 at 9:40 AM, Avlesh Singh<avl...@gmail.com> wrote:
Can you copy-paste the source data indexed in this field which causes the
error?

Cheers
Avlesh

On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco <rufia...@gmail.com> wrote:

Using wt=json also yields an invalid document. So after more
investigation it appears that I can always "break" the response by
pulling back a specific field via the "fl" parameter. If I leave that
field off, the response is valid; if I include it, Solr yields an
invalid document - a truncated document. This happens in every response
format (xml, json, ruby).

I am using the SolrJ client to add documents to my index. The field
is a normal "text" field type and the text itself is the first 1000
characters of an article.

> It can very well be an issue with the data itself. For example, if the
> data contains un-escaped characters which invalidate the response.
When I look at the document using wt=xml, all XML entities are
escaped. When I look at it under wt=ruby, all single quotes are
escaped, and the same goes for json, so it appears that all escaping is
taking place. The core problem seems to be that the document is just
truncated - it just plain hits end-of-file. Jetty's log says it's
sending back an HTTP 200, so as far as it is concerned all is well.

Any ideas on how I can dig deeper?
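One more thing I can think of checking, sketched below (the helper names are mine, nothing standard): compare the Content-Length the server declares against the bytes actually received. If they agree, the body was already truncated when Solr serialized it, which points back at the data; if they disagree, something downstream is cutting the connection mid-response.

```python
import urllib.request

def lengths_agree(declared, received):
    """True when the declared Content-Length matches the bytes received;
    chunked responses carry no Content-Length, so None always agrees."""
    return declared is None or declared == received

def check_response(url):
    """Fetch one response; report declared size, received size, agreement."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        header = resp.headers.get("Content-Length")
        declared = int(header) if header is not None else None
        return declared, len(body), lengths_agree(declared, len(body))
```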

Thanks
-Rupert


On Mon, Aug 24, 2009 at 4:31 PM, Uri Boness<ubon...@gmail.com> wrote:
It can very well be an issue with the data itself. For example, if the
data contains un-escaped characters which invalidate the response. I
don't know much about ruby, but what do you get with wt=json?
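As a quick way to test the un-escaped-characters theory, here is a minimal sketch (plain Python, nothing Solr-specific; the helper name is made up): a JSON response with a raw control character inside a string value, or with a truncated tail, will fail to decode even though the quoting looks properly escaped:

```python
import json

def json_decodes(text):
    """True if the text is one complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

If the wt=json output fails this check, the offending byte is usually easy to spot near the point where the parser gives up.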

Rupert Fiasco wrote:
I am seeing our responses getting truncated if and only if I search on
our main text field.

E.g. I just do something basic like

title_t:arthritis

Then I get a valid document back. But if I add in our larger text field:

title_t:arthritis OR text_t:arthritis

then the resultant document is NOT valid XML (using wt=xml) or Ruby
(using wt=ruby). If I run these through curl on the command line the
output is truncated, and if I run the search through the web-based
admin panel I get an XML parse error.

This appears to have started only recently, and the only thing we
have changed is our indexer, from a PHP one to a Java one, but
functionally they are identical.

Any thoughts? Thanks in advance.

- Rupert


