Abhiroop, I'm cc-ing the lux mailing list since this thread might not be of interest to all of solr-user; I'd suggest following up on that list.

But to answer your actual question: the documentation at http://luxdb.org/REST-API.html#LuxUpdateProcessor explains what to do. Basically, you insert documents with two fields: lux_xml (the full text of the document, serialized as XML) and lux_uri (a pathname uniquely identifying the document). You can add other fields if you want, but those two are the special names (they can be aliased if needed) that trigger Lux's update processor.
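
As a rough sketch of what such an insert could look like (this is not taken from the Lux docs): the snippet below builds a standard Solr XML update message carrying the two fields, and shows how it might be posted over HTTP. The helper names, the sample URI, and the update URL are assumptions for illustration only; your Solr core path and update chain configuration may well differ.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_lux_add_message(uri, xml_text):
    """Build a Solr XML <add> message with one document (hypothetical helper)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # lux_uri: a pathname that uniquely identifies the document
    ET.SubElement(doc, "field", name="lux_uri").text = uri
    # lux_xml: the whole document as text; ElementTree escapes the markup
    ET.SubElement(doc, "field", name="lux_xml").text = xml_text
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(body, update_url="http://localhost:8983/solr/update?commit=true"):
    """POST the message to Solr. The default update_url is an assumption."""
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Sample document and URI are made up for illustration.
entry = '<entry id="REACT_142474"><name>example</name></entry>'
message = build_lux_add_message("/reactome/REACT_142474.xml", entry)
print(message.decode("utf-8"))
# post_to_solr(message)  # only once a Solr instance with Lux is running
```

Any additional fields would go in as further field elements in the same doc.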

-Mike

PS I think we need a better "getting started" tutorial; lots of folks are confused about how to insert docs and get started. Putting it on the TODO list ...

On 08/19/2013 03:24 AM, Abhiroop wrote:
Funnily enough, just today I was looking at Lux for searching through my XML
file. What I have inferred is that I need to convert my XML to fit Solr's
format. Do I have to code that manually, or is there some kind of
parser that takes the XML as input and produces the Solr version? I couldn't
find any code examples in Lux.


On Sun, Aug 18, 2013 at 11:20 PM, Michael Sokolov-3 [via Lucene] <
ml-node+s472066n4085344...@n3.nabble.com> wrote:

You might be interested in trying Lux, which is a Solr extension that
indexes XML documents using the element and attribute names and the
contents of those nodes in your document.  It also allows you to define
XPath indexes (like DIH, I think, but with the full XPath 2.0 syntax),
and to query your document collection using XQuery 1.0 (in combination
with standard lucene searches at the document level).  See
http://luxdb.org/
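
To make concrete what "element and attribute names" means for a document like the one quoted below, here is a small sketch that just collects those names from a trimmed copy of the sample. It only illustrates the kind of vocabulary involved; it is not how Lux itself builds its index, and the function name is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample entry from the original question.
SAMPLE = """<entry id="REACT_142474" acc="REACT_142474.5">
  <dates>
    <date type="creation" value="06-JUN-2013"/>
  </dates>
  <cross_references>
    <ref dbname="UniProt" dbkey="Q06625"/>
  </cross_references>
  <additional_fields>
    <field name="organism">Saccharomyces cerevisiae</field>
  </additional_fields>
</entry>"""


def collect_names(xml_text):
    """Return the sets of element names and attribute names in a document."""
    root = ET.fromstring(xml_text)
    elements, attributes = set(), set()
    for node in root.iter():  # root.iter() visits the root and all descendants
        elements.add(node.tag)
        attributes.update(node.attrib)  # adds the attribute names (dict keys)
    return elements, attributes


elements, attributes = collect_names(SAMPLE)
print(sorted(elements))
print(sorted(attributes))
```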

-Mike Sokolov

On 8/16/2013 8:55 AM, Abhiroop wrote:

I am very new to Solr. I am looking to index an xml file and search its
contents. Its structure resembles something like this

<entry id="REACT_142474" acc="REACT_142474.5">
<name>((1,6)-alpha-glucosyl)poly((1,4)-alpha-glucosyl)glycogenin =&gt;
poly{(1,4)-alpha-glucosyl} glycogenin + alpha-D-glucose</name>
<description>This event has been computationally inferred from an event
that has been demonstrated in another species. The inference is based on
the homology mapping in Ensembl Compara. Briefly, reactions for which all
involved PhysicalEntities (in input, output and catalyst) have a mapped
orthologue/paralogue (for complexes at least 75% of components must have
a mapping) are inferred to the other species. High level events are also
inferred for these events to allow for easier navigation. More details
and caveats of the event inference in Reactome. For details on the
Ensembl Compara system see also: Gene orthology/paralogy prediction
method.</description>
<dates>
<date type="creation" value="06-JUN-2013"/>
<date type="last_modification" value="06-JUN-2013"/>
</dates>
<cross_references>
<ref dbname="ChEBI" dbkey="17925"/>
<ref dbname="UniProt" dbkey="Q06625"/>
<ref dbname="ChEBI" dbkey="18291"/>
<ref dbname="UniProt" dbkey="P47011"/>
<ref dbname="UniProt" dbkey="P36143"/>
<ref dbname="GO" dbkey="GO:0004135"/>
<ref dbname="taxonomy" dbkey="4932"/>
</cross_references>
<additional_fields>
<field name="organism">Saccharomyces cerevisiae</field>
</additional_fields>
</entry>

Is it essential to use the DIH to import this data into Solr? Isn't there
any simpler way to accomplish the task? Can it be done through SolrJ? I am
fine with outputting the result through the console too. It would be really
helpful if someone could point me to some useful examples or resources on
this apart from the official documentation.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Indexing-an-XML-file-in-Apache-Solr-tp4085053.html
Sent from the Solr - User mailing list archive at Nabble.com.

