Re: indexing XML stored on HDFS

2017-12-08 Thread Matthew Roth
Thanks Rick. While long-term storage of the documents in HDFS is not necessary, you do raise a good point: easy access to these documents during the development phase will be useful. Cassandra, with spark-solr I am under the impression that I must be running SolrCloud. At this time I need some of the features

Re: indexing XML stored on HDFS

2017-12-08 Thread Cassandra Targett
Matthew, The hadoop-solr project you mention would give you the ability to index files in HDFS. It's a Job Jar, so you submit it to Hadoop with the params you need and it processes the files and sends them to Solr. It might not be the fastest thing in the world since it uses MapReduce but we (I wo

Re: indexing XML stored on HDFS

2017-12-07 Thread Rick Leir
Matthew, Oops, I should have mentioned re-indexing. With Solr, you want to be able to re-index quickly so you can try out different analysis chains. XSLT may not be fast enough for this if you have millions of docs. So I would be inclined to save the docs to a normal filesystem, perhaps in JSONL
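A minimal Python sketch of Rick's suggestion. It assumes each source document is a flat XML record whose child element names can serve directly as Solr field names. The record layout, field names and output path are hypothetical. Converting once to JSON Lines means later reindexing runs can read the `.jsonl` file instead of repeating the XML/XSLT step.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_record(xml_text):
    """Flatten one XML document into a dict: child tag -> text.

    Assumes (hypothetically) a flat record such as
    <record><id>1</id><title>...</title></record>; repeated tags become lists.
    """
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        value = (child.text or "").strip()
        if child.tag in record:
            existing = record[child.tag]
            if not isinstance(existing, list):
                record[child.tag] = [existing]
            record[child.tag].append(value)
        else:
            record[child.tag] = value
    return record


def write_jsonl(xml_docs, out_path):
    """Write one JSON object per line, so a reindex can skip the XML/XSLT step."""
    with open(out_path, "w", encoding="utf-8") as out:
        for xml_text in xml_docs:
            out.write(json.dumps(xml_to_record(xml_text), ensure_ascii=False) + "\n")
```

Repeated child elements become JSON lists, which map naturally onto multivalued Solr fields.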

Re: indexing XML stored on HDFS

2017-12-07 Thread Rick Leir
Matthew, Do you have some sort of script calling xslt? Sorry, I do not know Scala and I did not have time to look into your spark utils. The script or Scala could then shell out to curl, or if it is Python it could use the requests library to send a doc to Solr. Extra points for batching the doc
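The batching Rick mentions can be sketched in Python with only the standard library, using `urllib.request` in place of the requests library. It assumes a Solr version that accepts a JSON array of documents on its `/update` handler. The core URL is a hypothetical placeholder, and the request is only built here, not sent.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(batch, update_url):
    """Build (but do not send) a POST carrying a JSON array of documents."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical core URL; sending would be urllib.request.urlopen(req).
UPDATE_URL = "http://localhost:8983/solr/mycore/update"
```

Sending one request per batch rather than one per document keeps HTTP overhead down. The batch size is an arbitrary choice to tune.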

Re: indexing XML stored on HDFS

2017-12-07 Thread Matthew Roth
Yes, the post tool would also be an acceptable option, and one I am familiar with. However, I am also not seeing exactly how I would query HDFS. The hadoop-solr [0] tool by Lucidworks looks the most promising. I have a meeting to attend t

Re: indexing XML stored on HDFS

2017-12-06 Thread Erick Erickson
Perhaps the bin/post tool? See: https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/ On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth wrote: > Hi All, > > Is there a DIH for HDFS? I see this old feature request [0 > ] that never seems to have

indexing XML stored on HDFS

2017-12-06 Thread Matthew Roth
Hi All, Is there a DIH for HDFS? I see this old feature request [0] that never seems to have gone anywhere. Google searches and searches on this list don't get me too far. Essentially my workflow is that I have many thousands of XML documents store

Re: What's the best practices for indexing XML Content with dynamic XML Elements (SOLR 6.1) ?

2016-08-16 Thread Stan Lee
Sorry for not being specific. I believe this SOLR plugin (LUX) may fit my scenario (query without knowing the tag in advance). http://luxdb.org/README.html On Tue, Aug 16, 2016 at 12:18 PM, Erick Erickson wrote: > You haven't really described the scenario you want > to implement. I get that you

Re: What's the best practices for indexing XML Content with dynamic XML Elements (SOLR 6.1) ?

2016-08-16 Thread Erick Erickson
You haven't really described the scenario you want to implement. I get that you have raw XML of an unknown structure. What do you want to _do_ with that? 1> if all you want to do is index the data (i.e. strip the tags) try HtmlStripCharFilterFactory. 2> If you want to intelligently take content of

What's the best practices for indexing XML Content with dynamic XML Elements (SOLR 6.1) ?

2016-08-16 Thread Stan Lee
We currently have a Microsoft SQL table with an XML datatype. We use DIH to import the XML content as is, that is, without using the XPathEntityProcessor. If the elements of the XML content are known, XPathEntity makes sense. Could someone kindly suggest the right way of handling such a scenario, without imp

Re: Indexing xml documents using solrj 6.0 + solr 6.0

2016-05-09 Thread Abdel Belkasri
did you look at this: https://cwiki.apache.org/confluence/display/solr/Using+SolrJ Regards, --Abdel. On Mon, May 9, 2016 at 1:32 PM, Mat San wrote: > Hello, > > Could I ask please for urgent help since I'm new to solrj and solr. I've > read all documentation but I did not find a full complete e

Indexing xml documents using solrj 6.0 + solr 6.0

2016-05-09 Thread Mat San
Hello, Could I ask please for urgent help since I'm new to solrj and solr. I've read all documentation but I did not find a full complete example in java how to index arbitrary xml documents and rich documents. (These documents are placed in a folder). Can somebody provide some examples please (J

Getting error while indexing XML files on Hadoop

2015-01-13 Thread celebis
electronics connector car power adapter for iPod, white 2 11.50 1 false 37.7752,-122.4232 2006-02-14T23:55:59Z

Re: Problem with indexing xml using DataImportHandler and XPath

2014-03-05 Thread Erick Erickson
NP, Been there, done that, got the t-shirt :)... On Wed, Mar 5, 2014 at 9:51 PM, Farhan Ali wrote: > Sorry figured out my problem. It was stupid mistake on my part. Once again > sorry for that > > Thanks > Farhan > > > On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali wrote: > >> Hi, >> I am a newbie

Re: Problem with indexing xml using DataImportHandler and XPath

2014-03-05 Thread Farhan Ali
Sorry, I figured out my problem. It was a stupid mistake on my part. Once again, sorry for that. Thanks Farhan On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali wrote: > Hi, > I am a newbie to Solr and I am trying to index some xml documents using > DIH and XPath but I am unable to do it. I get a response m

Problem with indexing xml using DataImportHandler and XPath

2014-03-05 Thread Farhan Ali
Hi, I am a newbie to Solr and I am trying to index some XML documents using DIH and XPath, but I am unable to do it. I get a response message of successful indexing, but no document is added to the index. I do not know what I'm doing wrong. This is my data config XML file

Re: [Indexing XML files in Solr with DataImportHandler]

2013-10-16 Thread Gora Mohanty
On 16 October 2013 13:06, kujta1 wrote: > it is not indexing, it is saying there are no files indexed If you expect answers on the mailing list it might be best to provide details here. From a quick glance at Stackoverflow, it looks like you need a FileListEntityProcessor. Searching Google turns

Re: [Indexing XML files in Solr with DataImportHandler]

2013-10-16 Thread kujta1
it is not indexing, it is saying there are no files indexed

Re: [Indexing XML files in Solr with DataImportHandler]

2013-10-15 Thread Shalin Shekhar Mangar
What is not working? Are you seeing any exceptions in the logs? On Tue, Oct 15, 2013 at 3:53 PM, kujta1 wrote: > hello i have problems wih indexing xml file format. my solrconfigdaa-config > and solr files are here > > http://stackoverflow.com/questions/19337979/indexing-xml-files-

[Indexing XML files in Solr with DataImportHandler]

2013-10-15 Thread kujta1
Hello, I have problems with indexing XML files. My solrconfig, data-config and Solr files are here: http://stackoverflow.com/questions/19337979/indexing-xml-files-in-solr-with-dataimporthandler Can somebody help me understand why this is not working? Thank you.

Re: DataImportHandler - Indexing xml content

2013-04-26 Thread Alexandre Rafalovitch
Have you looked at: http://wiki.apache.org/solr/DataImportHandler#FieldReaderDataSource ? Regards, Alex. On Fri, Apr 26, 2013 at 12:29 PM, Peri Subrahmanya wrote: > I have a column in my database that is of type long text and holds xml > content. I was wondering when I define the entity recor

DataImportHandler - Indexing xml content

2013-04-26 Thread Peri Subrahmanya
I have a column in my database that is of type long text and holds XML content. I was wondering whether, when I define the entity record, there is a way to provide a custom extractor that will take in the XML and return rows with appropriate fields to be indexed. Thank you, Peri Subrahmanya On 4/26/13
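Outside DIH, the "custom extractor" Peri asks about can be approximated client-side. This Python sketch assumes the column holds a repeating element and turns one column value into several rows. The `items`/`item`/`sku` names in the example are hypothetical. Each row dict could then be sent to Solr as a document. This is not FieldReaderDataSource itself, only an illustration of the same idea.

```python
import xml.etree.ElementTree as ET


def rows_from_xml(xml_text, record_path, field_paths):
    """Split one XML column value into row dicts, one per matching element.

    record_path: ElementTree path selecting the repeating element.
    field_paths: output field name -> child path (its text content is used).
    Missing or empty children are simply left out of the row.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.findall(record_path):
        row = {}
        for field, path in field_paths.items():
            elem = rec.find(path)
            if elem is not None and elem.text and elem.text.strip():
                row[field] = elem.text.strip()
        rows.append(row)
    return rows
```

For example, `rows_from_xml(column_value, "item", {"id": "sku", "name": "name"})` yields one dict per `<item>`.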

Re: Solr-4.0.0 DIH not indexing xml attributes

2012-10-20 Thread Billy Newman
; Is it possible to post the whole DIH script? > > - Original Message - > | From: "Billy Newman" > | To: solr-user@lucene.apache.org > | Sent: Friday, October 19, 2012 9:06:08 AM > | Subject: Solr-4.0.0 DIH not indexing xml attributes > | > | Hello

Re: Solr-4.0.0 DIH not indexing xml attributes

2012-10-19 Thread Lance Norskog
| From: "Billy Newman" | To: solr-user@lucene.apache.org | Sent: Friday, October 19, 2012 9:06:08 AM | Subject: Solr-4.0.0 DIH not indexing xml attributes | | Hello all, | | I am having problems indexing xml attributes using the DIH. | | I have the following xml: | | | | | | | H

Solr-4.0.0 DIH not indexing xml attributes

2012-10-19 Thread Billy Newman
Hello all, I am having problems indexing xml attributes using the DIH. I have the following xml: However nothing is getting inserted into my index. I am pretty sure this should work so I have no idea what is wrong. Can anyone else confirm that this is a problem? Or is it just me

Re: indexing xml attributes?

2011-05-17 Thread bryan rasmussen
Ah never mind, I had to restart my instance in order for my changes to the dataimporter to register. thanks, Bryan Rasmussen On Tue, May 17, 2011 at 12:19 PM, bryan rasmussen wrote: > Hi, > > As I understand it the DIH XPathEntityProcessor will not allow me to > index attributes - like so /> >

indexing xml attributes?

2011-05-17 Thread bryan rasmussen
Hi, As I understand it the DIH XPathEntityProcessor will not allow me to index attributes - like so So if I want to index attributes I should pre-process the documents into the format that Solr indexes normally and place the value of the ID into a field? Thanks, Bryan Rasmussen
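Bryan's pre-processing approach can be sketched in Python. In this sketch, each element's attributes become `<field>` entries in Solr's `<add><doc>` format. The element name, the extra `text` field and the input layout are hypothetical. Building the output with ElementTree rather than string concatenation means `&`, `<` and `>` in values are escaped automatically.

```python
import xml.etree.ElementTree as ET


def attributes_to_solr_add(xml_text, tag):
    """Turn every <tag attr="..."> element into a Solr <doc>.

    Each attribute becomes a <field name="attr">; the element's text, if any,
    goes into a hypothetical "text" field. ElementTree escapes &, < and >.
    """
    source = ET.fromstring(xml_text)
    add = ET.Element("add")
    for elem in source.iter(tag):
        doc = ET.SubElement(add, "doc")
        for name, value in elem.attrib.items():
            ET.SubElement(doc, "field", name=name).text = value
        if elem.text and elem.text.strip():
            ET.SubElement(doc, "field", name="text").text = elem.text.strip()
    return ET.tostring(add, encoding="unicode")
```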

Re: indexing xml document with literals

2010-07-07 Thread Chris Hostetter
: Does anyone know how to read in data from one or more of the example xml docs : and ALSO store the filename and path from which it came? Solr has no knowledge that your "xml docs" are actually files ... the XML syntax ("...") is just a serialization mechanism for streaming data to solr about
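Since Solr only sees the fields that are sent, the file name has to travel as an ordinary field. The following is a minimal Python sketch. It assumes the input is already in Solr's `<add><doc>` format, like the files in `exampledocs`, and that a field named `filepath` has been defined in schema.xml. The `filepath` name is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def add_path_field(add_xml_path, field_name="filepath"):
    """Return the file's Solr <add> XML with a path field appended to each <doc>.

    field_name is hypothetical; it must exist in schema.xml (for example a
    stored string field) before Solr will accept it.
    """
    tree = ET.parse(add_xml_path)
    abs_path = os.path.abspath(add_xml_path)
    for doc in tree.getroot().iter("doc"):
        ET.SubElement(doc, "field", name=field_name).text = abs_path
    return ET.tostring(tree.getroot(), encoding="unicode")
```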

indexing xml document with literals

2010-06-25 Thread Kyle Langan
Does anyone know how to read in data from one or more of the example xml docs and ALSO store the filename and path from which it came? ie: exampledocs/vidcard.xml contains: EN7800GTX/2DHTV/256M ASUS Extreme N7800GTX/2DHTV (256 MB) 100-435805 ATI Radeon X1900 XTX 512 MB PCIE Video Card

Re: full-text indexing XML files

2009-12-11 Thread Lance Norskog
ks...@gmail.com] > Sent: Thursday, December 10, 2009 7:43 PM > To: solr-user@lucene.apache.org > Subject: Re: full-text indexing XML files > > Or CDATA (much easier to work with). > > On Wed, Dec 9, 2009 at 10:37 PM, Shalin Shekhar Mangar > wrote: >> On Thu, Dec 10, 2

Re: full-text indexing XML files

2009-12-11 Thread Walter Underwood
d by XML syntax)? > > -Original Message- > From: Walter Underwood [mailto:wun...@wunderwood.org] > Sent: Thursday, December 10, 2009 8:00 PM > To: solr-user@lucene.apache.org > Subject: Re: full-text indexing XML files > > What kind of searches do you want to do? Do you w

RE: full-text indexing XML files

2009-12-11 Thread Feroze Daud
Underwood [mailto:wun...@wunderwood.org] Sent: Thursday, December 10, 2009 8:00 PM To: solr-user@lucene.apache.org Subject: Re: full-text indexing XML files What kind of searches do you want to do? Do you want to do searches that match the XML tags? wunder On Dec 10, 2009, at 7:43 PM, Lance

RE: full-text indexing XML files

2009-12-11 Thread Feroze Daud
CDATA didn’t work either. It still complained about the input doc not being in the correct format. -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Thursday, December 10, 2009 7:43 PM To: solr-user@lucene.apache.org Subject: Re: full-text indexing XML files Or CDATA

Re: full-text indexing XML files

2009-12-10 Thread Walter Underwood
What kind of searches do you want to do? Do you want to do searches that match the XML tags? wunder On Dec 10, 2009, at 7:43 PM, Lance Norskog wrote: > Or CDATA (much easier to work with). > > On Wed, Dec 9, 2009 at 10:37 PM, Shalin Shekhar Mangar > wrote: >> On Thu, Dec 10, 2009 at 5:13 AM,

Re: full-text indexing XML files

2009-12-10 Thread Lance Norskog
Or CDATA (much easier to work with). On Wed, Dec 9, 2009 at 10:37 PM, Shalin Shekhar Mangar wrote: > On Thu, Dec 10, 2009 at 5:13 AM, Feroze Daud wrote: > >> Hi! >> >> >> >> I am trying to full text index an XML file. For various reasons, I >> cannot use Tika or other technology to parse the XML
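If CDATA is used, one detail matters: a CDATA section ends at the first `]]>`, so content containing that sequence has to be split across two sections. The small Python helper below assumes the value is plain text that will be placed inside a Solr `<field>` element. CDATA is still a form of escaping, so the value Solr receives is the unescaped text.

```python
def as_cdata(text):
    """Wrap text in a CDATA section, splitting any "]]>" it contains.

    A CDATA section ends at the first "]]>", so that sequence is split
    across two adjacent sections; an XML parser joins them back together.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```

Usage: `'<field name="xml">' + as_cdata(raw_xml) + '</field>'`.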

Re: full-text indexing XML files

2009-12-09 Thread Shalin Shekhar Mangar
On Thu, Dec 10, 2009 at 5:13 AM, Feroze Daud wrote: > Hi! > > > > I am trying to full text index an XML file. For various reasons, I > cannot use Tika or other technology to parse the XML file. The > requirement is to full-text index the XML file, including Tags and > everything. > > > > So, I cr

full-text indexing XML files

2009-12-09 Thread Feroze Daud
Hi! I am trying to full-text index an XML file. For various reasons, I cannot use Tika or other technology to parse the XML file. The requirement is to full-text index the XML file, including tags and everything. So, I created an input index spec like this: 1001 NASA Advanced Researc

Re: indexing XML with solr example webapp - out of java heap space

2009-12-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
the post.jar does not stream. use "curl" if you are using *nix. --Noble On Wed, Dec 9, 2009 at 12:28 AM, Feroze Daud wrote: > Hi! > > > > I downloaded SOLR and am trying to index an XML file. This XML file is > huge (500M). > > > > When I try to index it using the "post.jar" tool in example\examp
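Besides switching to curl, another option, not suggested in the thread, is to split the 500 MB file into several smaller `<add>` requests. That way neither the client nor any single request has to hold everything at once. The Python sketch below uses `xml.etree.ElementTree.iterparse` and assumes the file is in Solr's `<add><doc>...</doc></add>` format. The batch size is an arbitrary choice.

```python
import xml.etree.ElementTree as ET


def iter_add_batches(path, batch_size=1000):
    """Stream a large Solr <add> file and yield smaller <add> payloads.

    Only the current batch of <doc> elements is held in memory: each
    top-level <doc> is serialized and then removed from the parsed tree.
    """
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # start event of the outer <add> element
    depth = 0  # nesting depth below <add>
    batch = []
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0 and elem.tag == "doc":
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.remove(elem)  # free the finished document
            if len(batch) == batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        yield "<add>" + "".join(batch) + "</add>"
```

Each yielded payload is a complete, small update request that could then be posted on its own.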

indexing XML with solr example webapp - out of java heap space

2009-12-08 Thread Feroze Daud
Hi! I downloaded SOLR and am trying to index an XML file. This XML file is huge (500M). When I try to index it using the "post.jar" tool in example\exampledocs, I get an "out of java heap space" error in the SimplePostTool application. Any ideas how to fix this? Passing in "-Xms1024M" doe

Re: Error when indexing XML files

2009-10-15 Thread Fergus McMenemie
>Hi, > >Please find the schema file attached. Please let me know what I am doing wrong. > >Regards >Chaitali > >--- On Wed, 10/14/09, Fergus McMenemie wrote: > > >From: Fergus McMenemie >Subject: Re: Error when indexing XML files >To: solr-user@lucene.

Re: Error when indexing XML files

2009-10-15 Thread Fergus McMenemie
Hi, Please find the schema file attached. Please let me know what I am doing wrong. Regards Chaitali --- On Wed, 10/14/09, Fergus McMenemie wrote: From: Fergus McMenemie Subject: Re: Error when indexing XML files To: solr-user@lucene.apache.org Date: Wednesday, October 14, 2009, 2:25 AM

Re: Error when indexing XML files

2009-10-14 Thread Chaitali Gupta
Hi, Please find the schema file attached. Please let me know what I am doing wrong. Regards Chaitali --- On Wed, 10/14/09, Fergus McMenemie wrote: From: Fergus McMenemie Subject: Re: Error when indexing XML files To: solr-user@lucene.apache.org Date: Wednesday, October 14, 2009, 2:25 AM

Re: Error when indexing XML files

2009-10-13 Thread Fergus McMenemie
>Hi, > >I am trying to index XML files using SolrJ. The original XML file contains >nested elements. For example, the following is the snippet of the XML file. > > > SOMETHING > SOME_OTHER_THING > > >I have added the elements "name" and "facility" in Schema.xml file to make >these e

Error when indexing XML files

2009-10-13 Thread Chaitali Gupta
Hi, I am trying to index XML files using SolrJ. The original XML file contains nested elements. For example, the following is a snippet of the XML file. SOMETHING SOME_OTHER_THING I have added the elements "name" and "facility" to the Schema.xml file to make these elements inde

Re: Question on modifying solr behavior on indexing xml files..

2009-10-02 Thread Shalin Shekhar Mangar
On Thu, Oct 1, 2009 at 3:10 PM, Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: > 1. In my playing around with > sending in an XML document within a an XML CDATA tag, > with termVectors="true" > > I noticed the following behavior: > peter > collapses to the term > personpeterperson > inste

Question on modifying solr behavior on indexing xml files..

2009-10-01 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
1. In my playing around with sending in an XML document within an XML CDATA section, with termVectors="true", I noticed the following behavior: <person>peter</person> collapses to the term personpeterperson instead of person and peter separately. I realize I could try a search and replace of characters
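One way to do the search-and-replace Peter mentions is sketched below in Python, applied before the text is sent to Solr. Every tag is replaced by spaces, optionally keeping the element name as its own word, so adjacent words no longer fuse. This is a rough regular expression, not an XML parser. Comments, processing instructions and entities are not handled specially.

```python
import re

# Captures the element name of a start, end or empty tag: <a ...>, </a>, <br/>.
TAG_RE = re.compile(r"<\s*/?\s*([^\s/>]+)[^>]*>")


def tags_to_words(xml_text, keep_tag_names=True):
    """Replace each tag with spaces so neighbouring words are not glued together.

    With keep_tag_names=True the element name survives as a separate word,
    so "<person>peter</person>" becomes "person peter person".
    """
    replacement = r" \1 " if keep_tag_names else " "
    spaced = TAG_RE.sub(replacement, xml_text)
    return " ".join(spaced.split())  # collapse runs of whitespace
```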

Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
And here is my code :) If you need some explanation, feel free to ask :) You can test it on the first test file I gave you when I opened the thread. At the moment it only works on one file; I have to change it a bit to make it work on a directory with lots of XML files. See you later guys :-) $

Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
I'm sorry, I have almost finished my script to convert my XML into Solr's XML format. I'll give it to you later; I think it can help some people like me in the future :) I just need to format my output text and everything will be fine :) Cheers for your help guys ;) On Tue, Jul 7, 2009 at 7:06 PM, Jay Hill wr

Re: Indexing XML

2009-07-07 Thread Jay Hill
Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to index different types of datasources, including relational databases and XML files. In particular, have a look at the XPathEntityProcessor ( http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d9

Re: Indexing XML

2009-07-07 Thread Saeli Mathieu
Yep, that makes sense. But I was afraid it was the only solution. Since I finished writing my email, I have started to create a PHP script to create the same file but compatible with Solr. thx for your quick answer ;) On Tue, Jul 7, 2009 at 4:40 PM, Matt Mitchell wrote: > Saeli, > > Solr expects a

Re: Indexing XML

2009-07-07 Thread Matt Mitchell
Saeli, Solr expects a certain XML structure when adding documents. You'll need to come up with a mapping that translates the original structure to one that Solr understands. You can then search Solr and get those Solr documents back. If you want to keep the original XML, you can store it in a fie

Indexing XML

2009-07-07 Thread Saeli Mathieu
Hello. I'm a new user of Solr; I have already used Lucene to index files and search. But my program was too slow, which is why I was looking for another solution, and I thought I had found it. I say "thought" because I don't know if it's possible to use Solr with this kind of XML file. http://ltsc.ieee

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-18 Thread jayakeerthi s
Hi Noble, Thanks for the reply. As advised, I have changed the db-data-config.xml as below. But the indexing still completed with Added/Updated: 0 documents. Deleted 0 documents. I got the error below when baseDir is removed: INFO:

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi, you may not need that enclosing entity if you only wish to index one file. baseDir is not required if you give an absolute path in fileName. There is no need to mention forEach or fields if you set useSolrAddSchema="true". On Sat, May 16, 2009 at 1:23 AM, jayakeerthi s wrote: > Hi All, > > I am try

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-16 Thread Fergus McMenemie
Hmmm, I thought that if you were using the XPathEntityProcessor you had to specify an xpath for each of the fields you want to populate. Unless you are using XPathEntityProcessor's useSolrAddSchema mode? Fergus. >If that is your complete input file then it looks like you are missing

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread jayakeerthi s
Many thanks for the reply. The complete input XML file is below; I forgot to include it earlier. F8V7067-APL-KIT Belkin Mobile Power Cord for iPod w/ Dock Belkin electronics connector car power adapter, white 4 19.95 1 false IW-02 iPod & iPod Mini USB 2.0 Cable Belk

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread Jay Hill
If that is your complete input file then it looks like you are missing the wrapping element: F8V7067-APL-KIT > field> > Belkin Mobile Power Cord for iPod w/ Dock > Belkin > electronics > connector > car power adapter, white > 4 > 19.95 > 1 > false > Is it possible you just forgot
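A quick pre-flight check can catch a missing wrapper before posting. This Python sketch assumes the file is meant to be an add request in Solr's XML update format: an `<add>` root, `<doc>` children, and `<field name="...">` inside each doc. Other request types such as `<delete>` are out of scope.

```python
import xml.etree.ElementTree as ET


def check_add_wrapper(xml_text):
    """Return a list of problems with a Solr <add> request; empty means it looks fine.

    Bare <field> elements with no wrapper are not even well-formed XML
    (several root elements), so they surface as a parse error.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed: %s" % exc]
    problems = []
    if root.tag != "add":
        problems.append("root element is <%s>, expected <add>" % root.tag)
        return problems
    docs = root.findall("doc")
    if not docs:
        problems.append("<add> contains no <doc> elements")
    for i, doc in enumerate(docs):
        for field in doc.findall("field"):
            if not field.get("name"):
                problems.append("doc %d has a <field> without a name attribute" % i)
    return problems
```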

query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread jayakeerthi s
Hi All, I am trying to index the fields from the XML files; here is the configuration that I am using. db-data-config.xml Schema.xml has the field "manu". The input XML file used to import the field is F8V7067-APL-KIT Belkin Mo

Re: Indexing xml data

2008-07-09 Thread Alexander Ramos Jardim
Oh thanks. I don't want to search on that. I will have a name field that contains the unique identifier of the document. 2008/7/9 Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]>: > On Wed, Jul 9, 2008 at 8:46 PM, Noble Paul നോബിള്‍ नोब्ळ् > <[EMAIL PROTECTED]> wrote: > > yep. you cant search. It i

Re: Indexing xml data

2008-07-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Wed, Jul 9, 2008 at 8:46 PM, Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]> wrote: > yep. you cant search. It is better to extract the data out and index > it if you want to search > > On Wed, Jul 9, 2008 at 8:37 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote: >> On Wed, 9 Jul 2008 19:51:45 +0530

Re: Indexing xml data

2008-07-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
yep. you cant search. It is better to extract the data out and index it if you want to search On Wed, Jul 9, 2008 at 8:37 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote: > On Wed, 9 Jul 2008 19:51:45 +0530 > "Noble Paul നോബിള്‍ नोब्ळ्" <[EMAIL PROTECTED]> > wrote: > >> Y

Re: Indexing xml data

2008-07-09 Thread Norberto Meijome
On Wed, 9 Jul 2008 19:51:45 +0530 "Noble Paul നോബിള്‍ नोब्ळ्" <[EMAIL PROTECTED]> wrote: > You can put it into a 'string' field directly if we refer to the default string field, you won't be able to search for the contents of the XML (unless you search for the whole t

Re: Indexing xml data

2008-07-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
You can put it into a 'string' field directly On Wed, Jul 9, 2008 at 7:41 PM, Alexander Ramos Jardim <[EMAIL PROTECTED]> wrote: > I need to put big xml files on a string field in one of my projects. Does > Solr accept it automatically or should I put a on my xml before > putting on the index? >

Indexing xml data

2008-07-09 Thread Alexander Ramos Jardim
I need to put big xml files on a string field in one of my projects. Does Solr accept it automatically or should I put a on my xml before putting on the index? -- Alexander Ramos Jardim

Re: Indexing XML

2007-10-05 Thread Wayne Graham
Benoit, Are you familiar with the Vufind project (http://www.vufind.org)? If you look at the PHP code in the import folder, you can see how the indexing works (there's an XSL transformation that then updates the index). I've also written some initial code to use embedded Solr to do this indexing di

Re: Indexing XML

2007-10-05 Thread Walter Underwood
Solr is not an XML engine (or a MARC engine). It uses XML as an input format for fielded data. It does not index or search arbitrary XML. You need to convert your XML into Solr's format. I would recommend expressing MARC in a Solr schema, then working on the input XML. The input XML depends on the
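Walter's suggestion of expressing MARC in a Solr schema could start from a mapping like the Python sketch below. It reads MARCXML in the `http://www.loc.gov/MARC21/slim` namespace, as in Benoit's records. The choice of tags, subfield codes and Solr field names in `FIELD_MAP` is hypothetical and would follow whatever schema is designed.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical mapping from Solr field name to (MARC tag, subfield code).
# A subfield code of None means a control field (001-009), which has no subfields.
FIELD_MAP = {
    "id": ("001", None),
    "title": ("245", "a"),
    "author": ("100", "a"),
    "subject": ("650", "a"),
}


def marc_record_to_doc(record, field_map=FIELD_MAP):
    """Map one MARCXML <record> element to a dict of Solr field -> list of values."""
    doc = {}
    for solr_field, (tag, code) in field_map.items():
        if code is None:
            path = "marc:controlfield[@tag='%s']" % tag
        else:
            path = "marc:datafield[@tag='%s']/marc:subfield[@code='%s']" % (tag, code)
        values = [e.text.strip() for e in record.findall(path, MARC_NS) if e.text]
        if values:
            doc[solr_field] = values
    return doc
```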

Re: Indexing XML

2007-10-05 Thread Alan Rykhus
Hello Benoit, An additional thing to check out is the work being done on fac-back-opac. They have a parser that will parse native MARC records. I would assume that if you can extract your records in MARC XML you can extract them in native MARC. I've used the parser and it works well. al On Fri

Re: Indexing XML

2007-10-05 Thread Pieter Berkel
> SOLR has of course a problem with the XML in the 'originalRecord' field. > Is there a solution to this? Has anyone done this before? I would suggest changing the field type of "originalRecord" to "string" rather than "text", and if you're still having trouble with the XML data simply encapsulat

Indexing XML

2007-10-05 Thread PAUWELS Benoit
Hi, I wish to index well formed xml documents as they are. I have a database filled with MARCXML records. An example of these looks like this: http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"; xmlns="http://www.loc.gov/MARC21

Re: Indexing xml documents with custom field type

2007-06-19 Thread Chris Hostetter
http://www.nabble.com/Indexing-XML-files-tf2763600.html ...the suggestion to add a new XMLFieldType was so the user could get the xml values from his field "raw" in the body of an XmlResponseWriter response for the purposes of XSLT styling ... but that only affected the display of results returned t

Indexing xml documents with custom field type

2007-06-19 Thread James Gregory
I wish to index well-formed XML documents as they are, without escaping all the tags with &lt;s and &gt;s. I searched this mailing list's archive and found someone who suggested that you can make a new field type with a file something like: import org.apache.solr.schema.TextField; import org.ap

Re: Indexing XML files

2006-12-07 Thread Chris Hostetter
: I looked at the XSD and there is one thing I don't understand: : : If the desired way is to conform to the XSD (and hence the types used in XSD), : then how would it possible to use user-defined fieldtypes as plugins? Wouldn't : they violate the same principle? The XSD is intended to match th

Re: Indexing XML files

2006-12-07 Thread mirko
Thank you all for the quick responses. They were very helpful. My XML is well-formed, so I ended up implementing my own FieldType: public class XMLField extends TextField { public void write(XMLWriter xmlWriter, String name, Fieldable f) throws IOException { xmlWriter.writePrim("xml", name

Re: Indexing XML files

2006-12-06 Thread Yonik Seeley
On 12/6/06, Graham O'Regan <[EMAIL PROTECTED]> wrote: couldn't you use a cdata section? That's just another form of escaping. Mirko actually wants the XML field value to be part of the XML of Solr's response, not encapsulated by it. -Yonik

Re: Indexing XML files

2006-12-06 Thread Graham O'Regan
Couldn't you use a CDATA section? Chris Hostetter wrote: Since XML is the transport for sending data to Solr, you need to make sure all field values are XML escaped. If you wanted to index a plain text "title" and that title contained an ampersand character Sense & Sensibility ...y

Re: Indexing XML files

2006-12-05 Thread Chris Hostetter
: At some point, it would be simpler to write a custom response handler : and generate the output in your desired XML format. I think Walter's got the right idea ... as a general rule, we want to make the XmlResponseWriter "bullet proof" so that no matter what data you put into your index, it is g

Re: Indexing XML files

2006-12-05 Thread Walter Underwood
At some point, it would be simpler to write a custom response handler and generate the output in your desired XML format. wunder On 12/5/06 1:52 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Hi, > > the idea is to apply XSLT transformation on the result. But it seems that > I would have

Re: Indexing XML files

2006-12-05 Thread mirko
Hi, the idea is to apply XSLT transformation on the result. But it seems that I would have to apply two transformations in a row, one which unescapes the escaped node and a second which performs the actual transformation... mirko Quoting Yonik Seeley <[EMAIL PROTECTED]>: > On 12/5/06, [EMAIL

Re: Indexing XML files

2006-12-05 Thread mirko
You are right, it is escaped. But my question is: (how) can I make it unescaped? mirko Quoting Yonik Seeley <[EMAIL PROTECTED]>: ... > > I bet it is escaped, but your browser has helpfully displayed it as > unescaped. > Try doing CTRL-U in firefox to see the real source for the reply. > > > -Y

Re: Indexing XML files

2006-12-05 Thread Yonik Seeley
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): 0 0 As You Like

Re: Indexing XML files

2006-12-05 Thread mirko
Hi, Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): 0 0 As You Like It (Promptbook of McVicars 1860)Shakespeare, William,

Re: Indexing XML files

2006-12-05 Thread Mike Klaas
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? I don't think solr will support such functionality. The xml that solr uses to return data is completely orthogonal to the xml embedded in the data, and mix

Re: Indexing XML files

2006-12-05 Thread Yonik Seeley
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? For what purpose? If you use an XML parser, the values it gives back to you will be unescaped. -Yonik
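Yonik's point can be seen with a few lines of Python. The markup travels escaped inside Solr's XML response, and any XML parser hands it back as ordinary text, which can itself be parsed again. The response below is a hand-made, cut-down example rather than captured Solr output.

```python
import xml.etree.ElementTree as ET

# Hand-made, cut-down Solr-style response; the stored "record" field holds XML,
# which arrives escaped so that the response stays well-formed.
RESPONSE = (
    "<response><result numFound='1' start='0'><doc>"
    "<str name='title'>As You Like It</str>"
    "<str name='record'>&lt;record&gt;&lt;act n=&quot;1&quot;/&gt;&lt;/record&gt;</str>"
    "</doc></result></response>"
)


def stored_field(response_xml, name):
    """Return the first value of a named field from a Solr XML response."""
    root = ET.fromstring(response_xml)
    for elem in root.iter():
        if elem.get("name") == name:
            return elem.text
    return None


record = stored_field(RESPONSE, "record")
print(record)  # <record><act n="1"/></record>
inner = ET.fromstring(record)  # the value is ordinary XML again
```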

Re: Indexing XML files

2006-12-05 Thread Chris Hostetter
Since XML is the transport for sending data to Solr, you need to make sure all field values are XML escaped. If you wanted to index a plain text "title" and that title contained an ampersand character Sense & Sensibility ...you would need to XML escape that as... Sense &amp; Sen
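Chris's escaping rule can be sketched in Python with the standard library's `xml.sax.saxutils` helpers. The field names here are only examples.

```python
from xml.sax.saxutils import escape, quoteattr


def solr_field(name, value):
    """One <field> for Solr's XML update format, with the value XML-escaped."""
    # escape() rewrites &, < and >; quoteattr() quotes and escapes the attribute.
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))


def solr_add(doc):
    """Wrap a dict of field name -> value in a single <add><doc> request body."""
    return "<add><doc>" + "".join(solr_field(k, v) for k, v in doc.items()) + "</doc></add>"


print(solr_field("title", "Sense & Sensibility"))
# <field name="title">Sense &amp; Sensibility</field>
```

The same escaping applies to an XML fragment stored as a field value: its tags become `&lt;` and `&gt;` on the wire, and Solr stores the original text.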

Indexing XML files

2006-12-05 Thread mirko
Hi, I am trying to index an xml file as a field in lucene, see example below: As You Like it Shakespeare, William here goes the xml... I can index the title and author fields because they are strings, but the record field is an xml itself and I bump into some problems as I cannot dir

Re: Solr is indexing XML only?

2006-04-27 Thread Yonik Seeley
On 4/27/06, David Trattnig <[EMAIL PROTECTED]> wrote: > thank you so much! Could you also explain me how to use these two > Tokenizers? Here's the HTMLStrip tokenizer description: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-031d5d370010955fdcc529d208395cd556f4a73e Read throug

Re: Solr is indexing XML only?

2006-04-27 Thread David Trattnig
Hi Chris, thank you so much! Could you also explain to me how to use these two Tokenizers? But if there is a Tokenizer which throws away HTML markup, it should also be possible to extend it and exclude additional content easily? TIA, david : will need to process that data you want to index (ie excl

Re: Solr is indexing XML only?

2006-04-26 Thread Chris Hostetter
: will need to process that data you want to index (ie exclude certain : files and remove HTML tags) and put them into Solr's input format. minor clarification: Solr does ship with two Tokenizers that do a pretty good job of throwing away HTML markup, so you don't have to parse it yourself -- but

Re: Solr is indexing XML only?

2006-04-26 Thread Bill Au
With Solr you can index anything Lucene can index since Solr uses Lucene under the cover. The input to Solr is in XML format. You will need to process that data you want to index (ie exclude certain files and remove HTML tags) and put them into Solr's input format. Bill On 4/26/06, David Tratt

Re: Solr is indexing XML only?

2006-04-26 Thread Erik Hatcher
David, Solr doesn't index XML files, but rather XML is used as the wrapper of the text that does get indexed. The document structure is defined in schema.xml, and the field text to be indexed is sent wrapped in an XML request. Regarding your scenario, you would need to write code that pa

Solr is indexing XML only?

2006-04-26 Thread David Trattnig
Hello! I'd like to set up/develop a search server. I thought I would use Lucene, then I read about Solr. So I have done the Solr tutorial. At first I was really happy about the features Solr adds to Lucene's functionality, but now I noticed that Solr can only index XML files. Or am I completely wrong? What

Re: Parsing/indexing XML data

2006-04-14 Thread Ken Krugler
Hi Yonik, Thanks for the fast response. > I've got some fields that will contain embedded XML. Two questions relating to that: 1. It appears as though I'll need to XML-escape the field data, as otherwise Solr complains about finding a start tag (one of the embedded tags) before it finds the

Re: Parsing/indexing XML data

2006-04-14 Thread Yonik Seeley
On 4/14/06, Ken Krugler <[EMAIL PROTECTED]> wrote: > Hi all, > > I've got some fields that will contain embedded XML. Two questions > relating to that: > > 1. It appears as though I'll need to XML-escape the field data, as > otherwise Solr complains about finding a start tag (one of the embedded > tag

Parsing/indexing XML data

2006-04-13 Thread Ken Krugler
Hi all, I've got some fields that will contain embedded XML. Two questions relating to that: 1. It appears as though I'll need to XML-escape the field data, as otherwise Solr complains about finding a start tag (one of the embedded tags) before it finds the end tag for a field. Is this an exp