Solr is not an XML engine (or a MARC engine). It uses XML as an input format for fielded data. It does not index or search arbitrary XML. You need to convert your XML into Solr's format.
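To make that conversion concrete, here is a minimal sketch in Python rather than XSLT. It assumes the field names from the schema quoted below (`title`, `author`, `originalRecord`); the helper names `subfield` and `marc_to_solr_add` are made up for illustration. One possible way to keep the full record is to pass it in as plain text so that the serializer escapes the markup, which leaves Solr with an ordinary string in a stored field rather than nested XML.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None (hypothetical helper)."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    if node is None or not node.text:
        return None
    # Drop surrounding whitespace and a trailing ISBD slash such as "Title /".
    return node.text.strip().rstrip("/").rstrip()


def marc_to_solr_add(marcxml):
    """Build a Solr <add><doc> message from one MARCXML record string."""
    record = ET.fromstring(marcxml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    values = {
        "title": subfield(record, "245", "a"),
        "author": subfield(record, "100", "a"),
        # Assumption: the whole record is stored as escaped text, not as child elements.
        "originalRecord": marcxml,
    }
    for name, value in values.items():
        if value is not None:
            field = ET.SubElement(doc, "field", name=name)
            field.text = value  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<datafield ind1="1" ind2=" " tag="100">'
        '<subfield code="a">van Wetten, J. W.</subfield></datafield>'
        '<datafield ind1="1" ind2="3" tag="245">'
        '<subfield code="a">De positie van vrouwen in de asielprocedure /</subfield>'
        '</datafield></record>'
    )
    print(marc_to_solr_add(sample))
```

The presentation layer would then have to parse the stored `originalRecord` string back into XML itself; Solr only hands the text back as stored.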
I would recommend expressing MARC in a Solr schema, then working on the input XML. The input XML depends on the schema.

If you need an XML engine, I'd recommend MarkLogic (commercial), a very good product.

wunder

On 10/5/07 12:44 AM, "PAUWELS Benoit" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I wish to index well formed xml documents as they are.
>
> I have a database filled with MARCXML records. An example of these looks like
> this:
>
> <record
>     ns0:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>     xmlns="http://www.loc.gov/MARC21/slim"
>     xmlns:ns0="http://www.w3.org/2001/XMLSchema-instance">
>   <leader>00000nam 22 a 4500</leader>
>   <controlfield tag="001">000500000</controlfield>
>   <controlfield tag="005">20050826220257.0</controlfield>
>   <controlfield tag="008">000710s1998 xx r 000 0 dut d</controlfield>
>   <datafield ind1=" " ind2=" " tag="040">
>     <subfield code="a">Univ</subfield>
>   </datafield>
>   <datafield ind1="1" ind2=" " tag="100">
>     <subfield code="a">van Wetten, J. W.</subfield>
>   </datafield>
>   <datafield ind1="1" ind2="3" tag="245">
>     <subfield code="a">De positie van vrouwen in de asielprocedure /</subfield>
>     <subfield code="c">J.W. van Wetten, N. Dijkhof, F. Heide.</subfield>
>   </datafield>
> </record>
>
> The idea is to create Lucene indexes on specific MARC fields and store the
> complete MARC record in Lucene 'as is'. In the presentation layer of my
> application I would then have this complete MARC record at hand, and as such
> have full flexibility on which MARC fields to display. So I want to create the
> following record through XSLT and feed this to SOLR.
>
> <doc>
>   <field name="title">De positie van vrouwen in de asielprocedure</field>
>   <field name="author">van Wetten, J. W.</field>
>   ...
>   <field name="originalRecord">
>     <record
>         ns0:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>         xmlns="http://www.loc.gov/MARC21/slim"
>         xmlns:ns0="http://www.w3.org/2001/XMLSchema-instance">
>       <leader>00000nam 22 a 4500</leader>
>       <controlfield tag="001">000500000</controlfield>
>       <controlfield tag="005">20050826220257.0</controlfield>
>       <controlfield tag="008">000710s1998 xx r 000 0 dut d</controlfield>
>       <datafield ind1=" " ind2=" " tag="040">
>         <subfield code="a">UGent</subfield>
>       </datafield>
>       <datafield ind1="1" ind2=" " tag="100">
>         <subfield code="a">van Wetten, J. W.</subfield>
>       </datafield>
>       <datafield ind1="1" ind2="3" tag="245">
>         <subfield code="a">De positie van vrouwen in de asielprocedure /</subfield>
>         <subfield code="c">J.W. van Wetten, N. Dijkhof, F. Heide.</subfield>
>       </datafield>
>     </record>
>   </field>
> </doc>
>
> I have the following in my schema.xml:
>
> <field name="author" type="text" indexed="true" stored="true" termVectors="true"/>
> <field name="title" type="text" indexed="true" stored="true" termVectors="true"/>
> <field name="originalRecord" type="text" indexed="false" stored="true"/>
>
> SOLR has of course a problem with the XML in the 'originalRecord' field.
> Is there a solution to this? Has anyone done this before?
>
> Thanks a lot.
> Benoit.
>
> =============================
> PAUWELS Benoit
> Université Libre de Bruxelles - Libraries
> Head of Automation
> Av. F.D. Roosevelt 50, CP 180
> 1050 BRUSSELS
> Belgium
> Tel: + 32 2 650 23 91
> Fax: + 32 2 650 23 91
> =============================