Re: problems indexing web content

2011-03-28 Thread Markus Jelsma
> I have about 1000 documents per xml file. I am not really doing anything
> with the data other than putting the xml tags around it. So essentially
> the data is okay with the exception of a few documents that are causing
> the errors.
>
> Let's say document # 47 in the xml file has a problem, i

Re: problems indexing web content

2011-03-28 Thread Charles Wardell
I have about 1000 documents per xml file. I am not really doing anything with the data other than putting the xml tags around it. So essentially the data is okay with the exception of a few documents that are causing the errors. Let's say document # 47 in the xml file has a problem, is the whole

Re: problems indexing web content

2011-03-28 Thread Markus Jelsma
Also, don't forget to encode entities or wrap them in CDATA.

> Jan,
>
> thank you for such a quick reply. I have a feed coming in that I convert to
> an Here is the type for text including index
> and query with the changes suggested.
>
>
> positionIncrementGap="100">
>
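The advice to encode entities or wrap text in CDATA can be sketched in Python as follows. This is a minimal illustration, not the poster's actual feed converter: the helper names (escape_text, wrap_cdata, make_field) and the field name "text" are assumptions made for the example. It uses only the standard library's xml.sax.saxutils.

```python
from xml.sax.saxutils import escape, quoteattr


def escape_text(text: str) -> str:
    """Replace &, < and > with their XML entities."""
    return escape(text)


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section.

    A literal ']]>' would end the section early, so it is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def make_field(name: str, value: str, use_cdata: bool = False) -> str:
    """Build one <field> element (hypothetical helper for this example)."""
    body = wrap_cdata(value) if use_cdata else escape_text(value)
    return f"<field name={quoteattr(name)}>{body}</field>"


if __name__ == "__main__":
    raw = "AT&T <b>news</b>"
    print(make_field("text", raw))
    print(make_field("text", raw, use_cdata=True))
```

Either form lets raw web content containing &, < or > pass through an XML parser unchanged. Neither addresses characters that are not legal in XML at all, which would need separate handling.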

Re: problems indexing web content

2011-03-28 Thread Markus Jelsma
The order of elements in the analyzer doesn't really matter: char filters are always executed first, regardless of their position in the analyzer. Multiple filters of the same type, however, are affected by order. Also, your error is not caused by a faulty analyzer; there is something wrong in your XML. Anyway, accordi
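One way to confirm that the problem lies in the XML rather than in the analyzer is to check the file for well-formedness before posting it. The sketch below assumes Python's standard xml.etree.ElementTree parser; the function name find_xml_error is hypothetical and is not part of Solr or post.jar.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def find_xml_error(xml_text: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None


if __name__ == "__main__":
    good = "<add><doc><field name='text'>AT&amp;T</field></doc></add>"
    bad = "<add><doc><field name='text'>AT&T</field></doc></add>"
    print(find_xml_error(good))
    print(find_xml_error(bad))
```

The reported position points at the first place the parser gave up, which is usually enough to identify the offending document in a file holding many of them.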

Re: problems indexing web content

2011-03-28 Thread Charles Wardell
Jan, thank you for such a quick reply. I have a feed coming in that I convert to an Here is the type for text including index and query with the changes suggested.

Re: problems indexing web content

2011-03-28 Thread Jan Høydahl
Hi, I assume you are trying to post HTML files with post.jar and use HTMLStripCharFilter to sanitize the HTML. But you refer to "my file" as if you have multiple docs in one file? XML or HTML? Multiple files? To which UpdateRequestHandler are you posting? /update/xml or /update/extract? For us to und

problems indexing web content

2011-03-28 Thread Charles Wardell
Hi Everyone, I set up a server and began to index my data. I have two questions I am hoping someone can help me with. Many of my files seem to index without any problems. For others, I get a host of different errors. I am indexing primarily web-based content and have identified my text field as foll