Re: Tips on recursive xml-parsing in dataConfig

Geert-Jan Brits Tue, 08 Jun 2010 02:09:04 -0700

My bad: it looks like XPathEntityProcessor doesn't support relative XPaths.


However, I took a quick look at the Slashdot example (which is actually
pretty good) at http://wiki.apache.org/solr/DataImportHandler.
From that I infer that you use only 1 entity per XML document, and within
that entity use multiple field declarations with xpath attributes to extract
the values you want.
So even though your XML document is nested (like most XML is), your
field declarations are not.
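
As a rough sketch of that flat style (not taken from your pastebin, and
untested): a single entity whose fields all carry absolute xpaths. Only the
path /DOK/TEKST/KAP comes from this thread; the file path, the column names,
and the TITTEL and PARA element positions are hypothetical placeholders, so
adjust them to your real structure.

```xml
<dataConfig>
  <!-- FileDataSource reads the XML from local disk; the url below is a placeholder -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- One entity per XML file; forEach="/DOK" emits one Solr document per DOK root -->
    <entity name="dok"
            processor="XPathEntityProcessor"
            url="/path/to/document.xml"
            forEach="/DOK">
      <!-- Every xpath is absolute, even for deeply nested values -->
      <field column="title"        xpath="/DOK/TITTEL" />
      <field column="chapter_text" xpath="/DOK/TEKST/KAP/PARA" />
    </entity>
  </document>
</dataConfig>
```

As far as I know, a field whose xpath matches several nodes (like the PARA
one here) also has to be declared multiValued in schema.xml.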

I think your best bet is to read the Slashdot example and go from there.

For now, I'm not entirely sure what you want a Solr document to be in your
example, i.e.:
- 1 Solr document per XML document (as supplied)
- or 1 Solr document per CHAP, per PARA, or per SUB?
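
To make the second option concrete: judging from the Slashdot example, the
forEach attribute is what decides which element becomes one Solr document.
A hedged sketch for one Solr document per KAP follows; the file path, the
column name, and the assumption that PARA sits directly under KAP are
placeholders of mine.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- forEach points at the repeating element: one Solr document per KAP -->
    <entity name="kap"
            processor="XPathEntityProcessor"
            url="/path/to/document.xml"
            forEach="/DOK/TEKST/KAP">
      <!-- Field xpaths stay absolute, starting from the document root -->
      <field column="para" xpath="/DOK/TEKST/KAP/PARA" />
    </entity>
  </document>
</dataConfig>
```

Moving forEach one level deeper (to the PARA or SUB element) would give one
Solr document per PARA or per SUB in the same way.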

Once you know that, perhaps coming up with a decent pointer is easier.

HTH,
Geert-Jan



2010/6/8 Tor Henning Ueland <tor.henn...@gmail.com>

> I have tried both changing the datasource per child node to use the
> parent node's name and making the XPaths relative. Both attempts
> cause either exceptions saying that the XPath must start with /, or
> NullPointerExceptions ( nsfgrantsdir document : null).
>
> Best regards
>
> On Mon, Jun 7, 2010 at 4:12 PM, Geert-Jan Brits <gbr...@gmail.com> wrote:
> > I'm guessing (I'm not familiar with the xml dataimport handler, but I am
> > pretty familiar with Xpath)
> > that your problem lies in having absolute xpath-queries, instead of
> > xpath queries relative to your parent node.
> >
> > e.g: /DOK/TEKST/KAP is absolute ( the prefixed '/' tells it to be). Try
> > 'KAP' instead.
> > The same for all xpaths deeper in the tree.
> >
> > Geert-Jan
> >
> > 2010/6/7 Tor Henning Ueland <tor.henn...@gmail.com>
> >
> >> Hi,
> >>
> >> I am doing some testing of dataimport to Solr from XML documents with
> >> many levels of nested children. Parsing the children some levels
> >> down using XPath works fine, but it is very slow (~1 minute per
> >> document, on a quad Xeon server). When I do the same using the format
> >> Solr wants, the parsing time is 0.02 seconds per document.
> >>
> >> I have published a quick example here:
> >> http://pastebin.com/adhcEvRx
> >>
> >> My question is:
> >>
> >> I hope that I have done something wrong in the child parsing (as you
> >> can see, it goes down quite a few levels). Can anybody point me in the
> >> right direction so I can speed up the process? I have been looking
> >> around for some examples, but nobody gives examples of such deep data
> >> indexing.
> >>
> >> PS: I know there are some bugs in the Xpath naming etc, but it is just
> >> a rough example :)
> >>
> >> --
> >> Best regards
> >> Tor Henning Ueland
> >>
> >
>
>
>
> --
> Best regards
> Tor Henning Ueland
>
