Re: Indexing HTML

Erik Hatcher Mon, 27 Aug 2007 08:42:38 -0700


On Aug 27, 2007, at 10:00 AM, Michael Kimsal wrote:

What's odd about this is that the error seems to indicate that I did.

Actually, the error message looks like you escaped too much. You should _not_ escape <field>, only the contents of it.
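
To illustrate the difference, here is a minimal sketch in Python. Both messages are hypothetical (I have not seen the exact data that was posted), and it uses Python's ElementTree rather than the XmlPull parser Solr uses, so it only shows how an XML parser sees the two forms, not Solr's exact error:

```python
import xml.etree.ElementTree as ET

# Hypothetical over-escaped message: the <field> tags themselves were escaped,
# so the parser sees plain text inside <doc>, not a field element.
over_escaped = (
    '<add><doc>'
    '&lt;field name="line"&gt;&lt;a href="foobar"&gt;linktext&lt;/a&gt;&lt;/field&gt;'
    '</doc></add>'
)

# Only the field contents are escaped; <field> stays real markup.
correct = (
    '<add><doc>'
    '<field name="line">&lt;a href="foobar"&gt;linktext&lt;/a&gt;</field>'
    '</doc></add>'
)

def field_values(xml_text):
    """Return {name: text} for every real <field> element in the message."""
    return {f.get("name"): f.text for f in ET.fromstring(xml_text).iter("field")}

print(field_values(over_escaped))  # no <field> elements are found
print(field_values(correct))       # the original HTML comes back as the field text
```

In the second form the parser hands back the original HTML line as the field's text, which is what should end up in the index.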


        Erik


The full text (minus the stack trace) was

org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG or TEXT
to read text (position: START_TAG seen ...<field name="line"><a
href="foobar"&gt;... @4:37)

Or is that just a byproduct of how SOLR reports the errors back - always
escaping them?

Thanks guys - I'll have another crack at this tonight.


On 8/27/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:


Michael,

I think the issue is that you're not escaping the <field> values.
Send something like this to Solr instead:

  <field name="line">&lt;a href="foobar"&gt;&lt;b&gt;&lt;i&gt;linktext&lt;/i&gt;&lt;/b&gt;&lt;/a&gt;</field>
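
If you are generating the XML from a script, something along these lines should produce it. This is only a sketch in Python: the `solr_add_xml` helper is a made-up name, the field names come from your sample, and it assumes you build the message as a string before POSTing it:

```python
from xml.sax.saxutils import escape

def solr_add_xml(doc_id, line):
    """Build a Solr <add> message, escaping only the field contents."""
    return (
        "<add>\n"
        "<doc>\n"
        f'<field name="id">{escape(str(doc_id))}</field>\n'
        f'<field name="line">{escape(line)}</field>\n'
        "</doc>\n"
        "</add>"
    )

# The raw HTML line to keep intact in the index.
html_line = '<a href="foobar"><b><i>linktext</i></b></a>'
payload = solr_add_xml(4, html_line)
print(payload)
```

`escape` only replaces `&`, `<` and `>` in the value, so the <field> tags stay as real XML and the HTML travels as text inside them.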

        Erik


On Aug 27, 2007, at 9:29 AM, Michael Kimsal wrote:

Hello

I'm trying to index individual lines of an HTML file, and I'm hitting this
error:

TEXT must be immediately followed by END_TAG and not START_TAG

I've got something that looks like

<add>
<doc>
<field name="id">4</field>

<field name="line"><a href="foobar"><b><i>linktext</i></b></a></field>

</doc>
</add>

Actually, that sample code above, as its own data file POSTed to SOLR,
throws

parser must be on START_TAG or TEXT to read text (position: START_TAG seen
...&lt;field name="line"&gt;&lt;a href="foobar"&gt;... @4:37

as an error.

Any clues as to how I can do this?  I'd like to keep the original copy of
each line intact in the index.

Thanks!

--
Michael Kimsal
http://webdevradio.com



--
Michael Kimsal
http://webdevradio.com
