Hi!

Thorsten Scherler wrote:

On Mon, 2007-01-15 at 12:23 +0000, Luis Neves wrote:
Hello.
What I currently do to index XML documents is use a Filter to strip the markup. This works, but it makes it impossible to know where in the document the match is located. What would it take to make it possible to specify a filter query that accepts XPath expressions? Something like:

fq=xmlField:/book/content/text()

This way, only the "/book/content/" element would be searched.

Does this make sense? Is it possible?

AFAIK, the short answer is no.

The field is ALWAYS plain text. There is no xmlField type.

...but why don't you just add your text to multiple fields when indexing?

Instead of just stripping the markup, apply the XPath above to your document
and create a separate field for each part, like:
<field name="content"><xsl:value-of select="/book/content/text()"/></field>
<field name="more"><xsl:value-of select="/book/more/text()"/></field>

Makes sense?
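For illustration, here is a minimal Python sketch of the same per-XPath field extraction done outside XSLT. It assumes the `<book>` layout from the example and builds a Solr XML update message (`<add><doc><field .../></doc></add>`). The function name `to_solr_add` and the `FIELD_PATHS` mapping are hypothetical. Note that ElementTree only supports a subset of XPath, and its paths are relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Solr field name to an ElementTree path.
# Paths are relative to the root <book> element because ElementTree
# rejects absolute paths such as /book/content when searching an Element.
FIELD_PATHS = {
    "content": "./content",
    "more": "./more",
}


def to_solr_add(xml_text, field_paths=FIELD_PATHS):
    """Build a Solr <add><doc> message with one field per mapped path."""
    root = ET.fromstring(xml_text)  # root is the <book> element
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, path in field_paths.items():
        for elem in root.findall(path):
            field = ET.SubElement(doc, "field", name=name)
            # itertext() collects ALL descendant text with the markup stripped,
            # which is broader than XPath text() (direct text children only).
            field.text = "".join(elem.itertext()).strip()
    return ET.tostring(add, encoding="unicode")
```

For example, `to_solr_add("<book><content>Solr <b>rocks</b></content><more>extra</more></book>")` yields `<add><doc><field name="content">Solr rocks</field><field name="more">extra</field></doc></add>`, so a query can then target the `content` field directly.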

Yes, but I have documents with different schemas in the same "xml field". Also, that way I would have to know the schema of the documents being indexed (which I don't).

The schema I use is something like:
<field name="DocumentType" type="string" indexed="true" stored="true"/>
<field name="Document" type="text" indexed="true" stored="true"/>

Where each distinct DocumentType has its own schema.
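When the document schemas are not known in advance, one possible workaround is to derive field names mechanically from each element's path instead of hand-writing XPath expressions. The sketch below assumes, hypothetically, that the index declares a catch-all dynamic field (for example one matching `xml_*` with type `text`). The `xml_` prefix, the underscore-joined naming, and the function `path_fields` are all illustrative assumptions. XML namespaces are not handled.

```python
import xml.etree.ElementTree as ET


def path_fields(xml_text, prefix="xml_"):
    """Return (field_name, text) pairs for every element with direct text.

    Field names come from the element path, e.g. /book/content becomes
    "xml_book_content" (a hypothetical naming convention). Namespaced tags
    ("{uri}local") are not handled by this sketch.
    """
    root = ET.fromstring(xml_text)
    fields = []

    def walk(elem, parts):
        parts = parts + [elem.tag]
        # Direct text nodes only (leading text plus the tails of child
        # elements), roughly what XPath text() selects.
        pieces = [elem.text] + [child.tail for child in elem]
        text = " ".join(p.strip() for p in pieces if p and p.strip())
        if text:
            fields.append((prefix + "_".join(parts), text))
        for child in elem:
            walk(child, parts)

    walk(root, [])
    return fields
```

With this convention, a search restricted to `/book/content` would become an ordinary query on the `xml_book_content` field. All DocumentTypes would still live in one index, so cross-type searches would keep working.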

I could revise this approach to use a Solr instance for each DocumentType, but then I would have to find a way to "merge" results from the different instances, because I also need to search across different DocumentTypes... I guess I'm SOL :-(


--
Luis Neves
