Re: Indexing an XML file in Apache Solr

Michael Sokolov Mon, 19 Aug 2013 06:35:20 -0700

Abhiroop, I'm cc-ing the lux mailing list since this thread might not be of interest to all of solr-user; I'd suggest following up on that list.

But to answer your actual question: see the documentation here http://luxdb.org/REST-API.html#LuxUpdateProcessor where it explains what to do. Basically you just insert documents with two fields: lux_xml (the full text of the document, serialized as XML) and lux_uri (a pathname uniquely identifying the document). You can add other fields if you want, but those are the special names (can be aliased if needed) that trigger Lux's update processor.
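
To make the two-field insert concrete, here is a minimal sketch in Python that builds a standard Solr XML <add> message carrying lux_uri and lux_xml. It is an illustration only, not taken from the Lux documentation: the helper names build_lux_add_message and post_to_solr, the example pathname, and the shortened entry are all made up for this example, and the update URL you post to depends on your own Solr install (with Lux's update processor configured on it).

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_lux_add_message(uri, xml_text):
    """Build a Solr XML <add> message holding one document for Lux.

    Hypothetical helper for illustration; not part of Lux or Solr.
    """
    # Fail early if the payload is not well-formed XML.
    ET.fromstring(xml_text)

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # lux_uri: a pathname that uniquely identifies the document.
    ET.SubElement(doc, "field", name="lux_uri").text = uri
    # lux_xml: the whole document as serialized XML.
    # ElementTree escapes the markup when the message is written out.
    ET.SubElement(doc, "field", name="lux_xml").text = xml_text
    return ET.tostring(add, encoding="unicode")


def post_to_solr(update_url, message):
    """POST the message to a Solr update handler.

    update_url is an assumption: it must point at the update handler of
    your own Solr core.
    """
    request = urllib.request.Request(
        update_url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Shortened stand-in for the real entry; the pathname is made up.
entry = '<entry id="REACT_142474"><name>example</name></entry>'
message = build_lux_add_message("/reactome/REACT_142474.xml", entry)
print(message)
```

Nothing is sent until you call post_to_solr(your_update_url, message) yourself, and the documents will normally not show up in searches until Solr has committed them. Any extra fields you want would go into the same <doc> as further <field> elements.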


-Mike

PS I think we need a better "getting started" tutorial; lots of folks are confused about how to insert docs and get started. Putting it on the TODO list ...


On 08/19/2013 03:24 AM, Abhiroop wrote:

Funnily enough, just today I was looking at Lux for searching through my XML
file. What I have inferred is that I need to format my XML to fit the
format Solr expects. Do I have to code that manually, or is there some kind
of parser that takes the XML and converts it to the Solr format? I couldn't
find any code examples in Lux.


On Sun, Aug 18, 2013 at 11:20 PM, Michael Sokolov-3 [via Lucene] <
ml-node+s472066n4085344...@n3.nabble.com> wrote:

You might be interested in trying Lux, which is a Solr extension that
indexes XML documents using the element and attribute names and the
contents of those nodes in your document.  It also allows you to define
XPath indexes (like DIH, I think, but with the full XPath 2.0 syntax),
and to query your document collection using XQuery 1.0 (in combination
with standard lucene searches at the document level).  See
http://luxdb.org/

-Mike Sokolov

On 8/16/2013 8:55 AM, Abhiroop wrote:

I am very new to Solr. I am looking to index an XML file and search its
contents. Its structure looks something like this:

<entry id="REACT_142474" acc="REACT_142474.5">
<name>((1,6)-alpha-glucosyl)poly((1,4)-alpha-glucosyl)glycogenin =&gt;
poly{(1,4)-alpha-      glucosyl} glycogenin + alpha-D-glucose</name>
<description>This event has been computationally inferred from an event that
has been demonstrated in another species.The inference is based on the
homology mapping in Ensembl Compara. Briefly, reactions for which all
involved PhysicalEntities (in input, output and catalyst) have a mapped
orthologue/paralogue (for complexes at least 75% of components must have
mapping) are inferred to the other species. High level events are also
inferred for these events to allow for easier navigation.More details and
caveats of the event inference in Reactome. For details on the Ensembl
Compara system see also: Gene orthology/paralogy prediction
method.</description>
<dates>
<date type="creation" value="06-JUN-2013"/>
<date type="last_modification" value="06-JUN-2013"/>
</dates>
<cross_references>
<ref dbname="ChEBI" dbkey="17925"/>
<ref dbname="UniProt" dbkey="Q06625"/>
<ref dbname="ChEBI" dbkey="18291"/>
<ref dbname="UniProt" dbkey="P47011"/>
<ref dbname="UniProt" dbkey="P36143"/>
<ref dbname="GO" dbkey="GO:0004135"/>
<ref dbname="taxonomy" dbkey="4932"/>
</cross_references>
<additional_fields>
<field name="organism">Saccharomyces cerevisiae</field>
</additional_fields>
</entry>

Is it essential to use the DIH to import this data into Solr? Isn't there
any simpler way to accomplish the task? Can it be done through SolrJ as I am
fine with outputting the result through the console too. It would be really
helpful if someone could point me to some useful examples or resources on
this apart from the official documentation.



--
View this message in context:

http://lucene.472066.n3.nabble.com/Indexing-an-XML-file-in-Apache-Solr-tp4085053.html

Sent from the Solr - User mailing list archive at Nabble.com.



