Re: indexing XML stored on HDFS

2017-12-08 Thread Matthew Roth
Thanks Rick, While long-term storage of the documents in HDFS is not necessary, you do raise the point that easy access to these documents during the development phase will be useful. Cassandra, with spark-solr I am under the impression that I must be running SolrCloud. At this time I need some of the features

Re: indexing XML stored on HDFS

2017-12-08 Thread Cassandra Targett
Matthew, The hadoop-solr project you mention would give you the ability to index files in HDFS. It's a Job Jar, so you submit it to Hadoop with the params you need and it processes the files and sends them to Solr. It might not be the fastest thing in the world since it uses MapReduce but we (I wo

Re: indexing XML stored on HDFS

2017-12-07 Thread Rick Leir
Matthew, Oops, I should have mentioned re-indexing. With Solr, you want to be able to re-index quickly so you can try out different analysis chains. XSLT may not be fast enough for this if you have millions of docs. So I would be inclined to save the docs to a normal filesystem, perhaps in JSONL
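Rick's idea of keeping the converted documents on a normal filesystem as JSONL, so that a re-index does not have to repeat the XSLT step, could be sketched roughly as follows. The file name and the document fields are made up for illustration; nothing here comes from Matthew's actual pipeline.

```python
import json
import os
import tempfile


def write_jsonl(docs, path):
    """Write one JSON object per line so the file can be re-read cheaply."""
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield documents back one at a time, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


# Hypothetical documents that were already converted from XML once.
docs = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]
path = os.path.join(tempfile.mkdtemp(), "docs.jsonl")
write_jsonl(docs, path)
roundtrip = list(read_jsonl(path))
```

Because each line is an independent document, a re-index can stream the file and stop or resume at any line.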

Re: indexing XML stored on HDFS

2017-12-07 Thread Rick Leir
Matthew, Do you have some sort of script calling xslt? Sorry, I do not know Scala and I did not have time to look into your spark utils. The script or Scala could then shell out to curl, or if it is python it could use the request library to send a doc to Solr. Extra points for batching the doc
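The batching Rick mentions might look something like the sketch below. It uses only the Python standard library rather than the requests library, and the URL, collection name, and batch size are assumptions, not values from the thread. Sending a JSON array of documents to Solr's update handler is a standard way to add several documents in one request.

```python
import json
import urllib.request


def batches(docs, size):
    """Split a list of documents into consecutive lists of at most `size`."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_request(solr_update_url, batch):
    """Build (but do not send) a POST request carrying one batch as JSON."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_all(solr_update_url, docs, size=500):
    """Send every batch in turn; this needs a reachable Solr instance."""
    for batch in batches(docs, size):
        with urllib.request.urlopen(build_request(solr_update_url, batch)) as resp:
            resp.read()


# Assumed URL; adjust the host and collection name to your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"
docs = [{"id": str(i)} for i in range(5)]
sizes = [len(b) for b in batches(docs, 2)]
request = build_request(SOLR_UPDATE_URL, docs[:2])
```

Only `send_all` touches the network, so the batching and request building can be checked without a running Solr.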

Re: indexing XML stored on HDFS

2017-12-07 Thread Matthew Roth
Yes, the post tool would also be an acceptable option and one I am familiar with. However, I also am not seeing exactly how I would query HDFS. The hadoop-solr [0] tool by Lucidworks looks the most promising. I have a meeting to attend t

Re: indexing XML stored on HDFS

2017-12-06 Thread Erick Erickson
Perhaps the bin/post tool? See: https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/ On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth wrote: > Hi All, > > Is there a DIH for HDFS? I see this old feature request [0 > ] that never seems to have

Re: Indexing xml documents using solrj 6.0 + solr 6.0

2016-05-09 Thread Abdel Belkasri
did you look at this: https://cwiki.apache.org/confluence/display/solr/Using+SolrJ Regards, --Abdel. On Mon, May 9, 2016 at 1:32 PM, Mat San wrote: > Hello, > > Could I ask please for urgent help since I'm new to solrj and solr. I've > read all documentation but I did not find a full complete e

Re: [Indexing XML files in Solr with DataImportHandler]

2013-10-16 Thread Gora Mohanty
On 16 October 2013 13:06, kujta1 wrote: > it is not indexing, it is saying there are no files indexed If you expect answers on the mailing list it might be best to provide details here. From a quick glance at Stackoverflow, it looks like you need a FileListEntityProcessor. Searching Google turns

Re: [Indexing XML files in Solr with DataImportHandler]

2013-10-16 Thread kujta1
it is not indexing, it is saying there are no files indexed

Re: [Indexing XML files in Solr with DataImportHandler]

2013-10-15 Thread Shalin Shekhar Mangar
What is not working? Are you seeing any exceptions in the logs? On Tue, Oct 15, 2013 at 3:53 PM, kujta1 wrote: > hello i have problems with indexing xml file format. my solrconfigdaa-config > and solr files are here > > http://stackoverflow.com/questions/19337979/indexing-xml-files-in-solr-with-

Re: indexing xml attributes?

2011-05-17 Thread bryan rasmussen
Ah never mind, I had to restart my instance in order for my changes to the dataimporter to register. thanks, Bryan Rasmussen On Tue, May 17, 2011 at 12:19 PM, bryan rasmussen wrote: > Hi, > > As I understand it the DIH XPathEntityProcessor will not allow me to > index attributes - like so /> >

Re: indexing xml document with literals

2010-07-07 Thread Chris Hostetter
: Does anyone know how to read in data from one or more of the example xml docs : and ALSO store the filename and path from which it came? Solr has no knowledge that your "xml docs" are actually files ... the XML syntax ("...") is just a serialization mechanism for streaming data to solr about

Re: indexing XML with solr example webapp - out of java heap space

2009-12-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
the post.jar does not stream. use "curl" if you are using *nix. --Noble On Wed, Dec 9, 2009 at 12:28 AM, Feroze Daud wrote: > Hi! > > > > I downloaded SOLR and am trying to index an XML file. This XML file is > huge (500M). > > > > When I try to index it using the "post.jar" tool in example\examp

Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
And here is my code :) If you need some explanation feel free to ask :) You can test it on the first test file I gave you when I opened the thread. At the moment it works on only one file; I have to change it a bit to make it work on a directory with lots of XML files. See you later guys :-) $

Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
I'm sorry, I have almost finished my script to convert my XML into Solr's XML. I'll give it to you later; I think it can help people like me in the future :) I just need to format my output text and everything will be fine :) Cheers for your help guys ;) On Tue, Jul 7, 2009 at 7:06 PM, Jay Hill wr

Re: Indexing XML

2009-07-07 Thread Jay Hill
Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to indexing different types of data sources, including relational databases and XML files. In particular have a look at the XPathEntityProcessor ( http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d9

Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
Yep, that makes sense. But I was afraid it was the only solution. Since I finished writing my email I have started to create a PHP script to produce the same file in a form compatible with Solr. Thanks for your quick answer ;) On Tue, Jul 7, 2009 at 4:40 PM, Matt Mitchell wrote: > Saeli, > > Solr expects a

Re: Indexing XML

2009-07-07 Thread Matt Mitchell
Saeli, Solr expects a certain XML structure when adding documents. You'll need to come up with a mapping, that translates the original structure to one that solr understands. You can then search solr and get those solr documents back. If you want to keep the original XML, you can store it in a fie
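The kind of mapping Matt describes, from an arbitrary source structure to the add/doc/field structure Solr expects, could be sketched as below. The source record, the element paths, and the Solr field names are all hypothetical; a real mapping depends on your own XML and schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record and mapping from source paths to Solr field names.
SOURCE = "<record><id>42</id><meta><title>Sense &amp; Sensibility</title></meta></record>"
MAPPING = {"id": "id", "meta/title": "title"}


def to_solr_add(source_xml, mapping):
    """Translate one source record into Solr's <add><doc><field/> structure."""
    record = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for path, field_name in mapping.items():
        # A path may match several nodes; each becomes its own field element.
        for node in record.findall(path):
            field = ET.SubElement(doc, "field", name=field_name)
            field.text = node.text or ""
    return ET.tostring(add, encoding="unicode")


solr_xml = to_solr_add(SOURCE, MAPPING)
```

Building the output with an XML library rather than string concatenation means the field values are escaped automatically on serialization.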

Re: Indexing xml data

2008-07-09 Thread Alexander Ramos Jardim
Oh thanks. I don't want to search on that. I will have a name field that contains the unique identifier of the document. 2008/7/9 Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]>: > On Wed, Jul 9, 2008 at 8:46 PM, Noble Paul നോബിള്‍ नोब्ळ् > <[EMAIL PROTECTED]> wrote: > > yep. you cant search. It i

Re: Indexing xml data

2008-07-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jul 9, 2008 at 8:46 PM, Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]> wrote: > yep. you cant search. It is better to extract the data out and index > it if you want to search > > On Wed, Jul 9, 2008 at 8:37 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote: >> On Wed, 9 Jul 2008 19:51:45 +0530

Re: Indexing xml data

2008-07-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
yep. you cant search. It is better to extract the data out and index it if you want to search On Wed, Jul 9, 2008 at 8:37 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote: > On Wed, 9 Jul 2008 19:51:45 +0530 > "Noble Paul _ __" <[EMAIL PROTECTED]> > wrote: > >> Y

Re: Indexing xml data

2008-07-09 Thread Norberto Meijome
On Wed, 9 Jul 2008 19:51:45 +0530 "Noble Paul _ __" <[EMAIL PROTECTED]> wrote: > You can put it into a 'string' field directly if we refer to the default string field , you won't be able to search for the contents of the XML (unless you search for the whole t

Re: Indexing xml data

2008-07-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
You can put it into a 'string' field directly On Wed, Jul 9, 2008 at 7:41 PM, Alexander Ramos Jardim <[EMAIL PROTECTED]> wrote: > I need to put big xml files on a string field in one of my projects. Does > Solr accept it automatically or should I put a on my xml before > putting on the index? >

Re: Indexing XML

2007-10-05 Thread Wayne Graham
Benoit, Are you familiar with the Vufind project (http://www.vufind.org)? You can look at the PHP code in the import folder to see how the indexing works (there's an XSL transformation that then updates the index). I've also written some initial code to use embedded Solr to do this indexing di

Re: Indexing XML

2007-10-05 Thread Walter Underwood
Solr is not an XML engine (or a MARC engine). It uses XML as an input format for fielded data. It does not index or search arbitrary XML. You need to convert your XML into Solr's format. I would recommend expressing MARC in a Solr schema, then working on the input XML. The input XML depends on the

Re: Indexing XML

2007-10-05 Thread Alan Rykhus
Hello Benoit, An additional thing to check out is the work being done on fac-back-opac. They have a parser that will parse native MARC records. I would assume that if you can extract your records in MARC XML you can extract them in native MARC. I've used the parser and it works well. al On Fri

Re: Indexing XML

2007-10-05 Thread Pieter Berkel
> SOLR has of course a problem with the XML in the 'originalRecord' field. > Is there a solution to this? Has anyone done this before? I would suggest changing the field type of "originalRecord" to "string" rather than "text", and if you're still having trouble with the XML data simply encapsulat

Re: Indexing xml documents with custom field type

2007-06-19 Thread Chris Hostetter
: I wish to index well formed xml documents as they are without escaping : all the tags with &lt;s and &gt;s. I searched this mailing list's archive : and found someone who suggested that you can make a new field type : having a file something like: in the thread in question... http://www.nabble.c

Re: Indexing XML files

2006-12-07 Thread Chris Hostetter
: I looked at the XSD and there is one thing I don't understand: : : If the desired way is to conform to the XSD (and hence the types used in XSD), : then how would it possible to use user-defined fieldtypes as plugins? Wouldn't : they violate the same principle? The XSD is intended to match th

Re: Indexing XML files

2006-12-07 Thread mirko
Thank you all for the quick responses. They were very helpful. My XML is well-formed, so I ended up implementing my own FieldType: public class XMLField extends TextField { public void write(XMLWriter xmlWriter, String name, Fieldable f) throws IOException { xmlWriter.writePrim("xml", name

Re: Indexing XML files

2006-12-06 Thread Yonik Seeley
On 12/6/06, Graham O'Regan <[EMAIL PROTECTED]> wrote: couldn't you use a cdata section? That's just another form of escaping. Mirko actually wants the XML field value to be part of the XML of Solr's response, not encapsulated by it. -Yonik
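Yonik's point that CDATA is just another form of escaping can be checked with any XML parser: both spellings come back as the same text value, and in neither case does the markup become part of the surrounding element tree. A small sketch, with a made-up field name and value:

```python
import xml.etree.ElementTree as ET

# The same hypothetical value written two ways: entity-escaped and in CDATA.
escaped = '<field name="raw">&lt;b&gt;bold&lt;/b&gt;</field>'
cdata = '<field name="raw"><![CDATA[<b>bold</b>]]></field>'

escaped_elem = ET.fromstring(escaped)
cdata_elem = ET.fromstring(cdata)

# Both parse to plain text; neither element has a <b> child.
escaped_value = escaped_elem.text
cdata_value = cdata_elem.text
child_counts = (len(escaped_elem), len(cdata_elem))
```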

Re: Indexing XML files

2006-12-06 Thread Graham O'Regan
couldn't you use a cdata section? Chris Hostetter wrote: Since XML is the transport for sending data to Solr, you need to make sure all field values are XML escaped. If you wanted to index a plain text "title" and that title contained an ampersand character Sense & Sensability ...y

Re: Indexing XML files

2006-12-05 Thread Chris Hostetter
: At some point, it would be simpler to write a custom response handler : and generate the output in your desired XML format. I think Walter's got the right idea ... as a general rule, we want to make the XmlResponseWriter "bullet proof" so that no matter what data you put into your index, it is g

Re: Indexing XML files

2006-12-05 Thread Walter Underwood
At some point, it would be simpler to write a custom response handler and generate the output in your desired XML format. wunder On 12/5/06 1:52 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Hi, > > the idea is to apply XSLT transformation on the result. But it seems that > I would have

Re: Indexing XML files

2006-12-05 Thread mirko
Hi, the idea is to apply XSLT transformation on the result. But it seems that I would have to apply two transformations in a row, one which unescapes the escaped node and a second which performs the actual transformation... mirko Quoting Yonik Seeley <[EMAIL PROTECTED]>: > On 12/5/06, [EMAIL

Re: Indexing XML files

2006-12-05 Thread mirko
You are right, it is escaped. But my question is: (how) can I make it unescaped? mirko Quoting Yonik Seeley <[EMAIL PROTECTED]>: ... > > I bet it is escaped, but your browser has helpfully displayed it as > unescaped. > Try doing CTRL-U in firefox to see the real source for the reply. > > > -Y

Re: Indexing XML files

2006-12-05 Thread Yonik Seeley
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): 0 0 As You Like

Re: Indexing XML files

2006-12-05 Thread mirko
Hi, Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): 0 0 As You Like It (Promptbook of McVicars 1860)Shakespeare, William,

Re: Indexing XML files

2006-12-05 Thread Mike Klaas
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? I don't think solr will support such functionality. The xml that solr uses to return data is completely orthogonal to the xml embedded in the data, and mix

Re: Indexing XML files

2006-12-05 Thread Yonik Seeley
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? For what purpose? If you use an XML parser, the values it gives back to you will be unescaped. -Yonik
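To illustrate Yonik's remark, here is a sketch of a client that parses a response, receives the stored value already unescaped, and then parses that value as XML in its own right. The response fragment and the field name are invented for the example and are not the actual output from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an XML response whose field value is escaped XML.
response = (
    '<doc><str name="xml">'
    "&lt;play&gt;&lt;title&gt;As You Like It&lt;/title&gt;&lt;/play&gt;"
    "</str></doc>"
)

# The parser hands the value back unescaped, as an ordinary string.
stored = ET.fromstring(response).find("str").text

# That string is itself well-formed XML and can be parsed a second time.
inner_title = ET.fromstring(stored).find("title").text
```

The second parse happens on the client side, so the response format does not need to change.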

Re: Indexing XML files

2006-12-05 Thread Chris Hostetter
Since XML is the transport for sending data to Solr, you need to make sure all field values are XML escaped. If you wanted to index a plain text "title" and that title contained an ampersand character Sense & Sensability ...you would need to XML escape that as... Sense &amp; Sen
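The escaping Chris describes does not have to be done by hand. As a minimal sketch, Python's standard library can escape both the field value and the attribute; the helper name `solr_field` is made up for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def solr_field(name, value):
    """Return one <field> element with the name and value XML-escaped."""
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape handles &, < and > in the element text.
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))


field_xml = solr_field("title", "Sense & Sensability")
markup_xml = solr_field("body", "<b>bold</b>")
```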