date:20071005

Indexing XML

2007-10-05 Thread PAUWELS Benoit

Hi, I wish to index well formed xml documents as they are. I have a database filled with MARCXML records. An example of these looks like this: http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"; xmlns="http://www.loc.gov/MARC21

Re: Indexing XML

2007-10-05 Thread Pieter Berkel

> SOLR has of course a problem with the XML in the 'originalRecord' field. > Is there a solution to this? Has anyone done this before? I would suggest changing the field type of "originalRecord" to "string" rather than "text", and if you're still having trouble with the XML data simply encapsulat
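A minimal sketch of the encapsulation idea, assuming it means escaping the record's markup so that it travels as character data inside the field. The field name originalRecord comes from the thread; the id field, the sample record, and the helper name build_add_doc are hypothetical.

```python
from xml.sax.saxutils import escape


def build_add_doc(doc_id, original_record):
    """Return a Solr <add> message with the raw XML record stored as escaped text."""
    # escape() turns &, < and > into entity references, so the record is
    # treated as character data rather than as nested markup.
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="originalRecord">{escape(original_record)}</field>'
        "</doc></add>"
    )


# Hypothetical, heavily shortened MARCXML record.
record = '<record xmlns="http://www.loc.gov/MARC21/slim"><leader>00000nam</leader></record>'
message = build_add_doc("rec-1", record)
print(message)
```

An XML parser reading this message sees the original record as the text content of the field, so the stored value comes back unchanged.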

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Adrian Sutton

One last one, when you send HTML to Solr, do you too replace special chars and tags with named entities? I did this and HTMLStripper doesn't seem to recognise the tags :-S While if I try and input HTML as is, the indexer throws exceptions (as having tags within XML tags is obviously not valid.

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Ravish Bhagdev

Thanks Adrian, I'm very new to Solr myself so struggling a bit in initial stages... One last one, when you send HTML to Solr, do you too replace special chars and tags with named entities? I did this and HTMLStripper doesn't seem to recognise the tags :-S While if I try and input HTML as i

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Adrian Sutton

On 05/10/2007, at 4:07 PM, Ravish Bhagdev wrote: (Query esp. Adrian): If you are indexing XHTML, do you replace tags with entities before giving it to solr, if so, when you get back snippets do you get tags or entities or do you convert again to tags for presentation? What's the best way out?

Re: Indexing XML

2007-10-05 Thread Alan Rykhus

Hello Benoit, An additional thing to check out is the work being done on fac-back-opac. They have a parser that will parse native MARC records. I would assume that if you can extract your records in MARC XML you can extract them in native MARC. I've used the parser and it works well. al On Fri

Re: Indexing XML

2007-10-05 Thread Wayne Graham

Benoit, Are you familiar with the Vufind project (http://www.vufind.org)? If you look at the PHP code in the import folder, you can see how the indexing works (there's an XSL transformation that then updates the index). I've also written some initial code to use embedded Solr to do this indexing di

Re: Indexing XML

2007-10-05 Thread Walter Underwood

Solr is not an XML engine (or a MARC engine). It uses XML as an input format for fielded data. It does not index or search arbitrary XML. You need to convert your XML into Solr's format. I would recommend expressing MARC in a Solr schema, then working on the input XML. The input XML depends on the
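A rough sketch of the conversion step described above, assuming a schema with hypothetical fields id and title, filled from MARC control field 001 and data field 245 subfield a. The mapping and the function name marc_to_solr_add are illustrative only, not something the thread specifies.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def marc_to_solr_add(marcxml):
    """Map one MARCXML record onto a flat Solr <add><doc> message."""
    record = ET.fromstring(marcxml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def add_field(name, value):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value

    # Control field 001 holds the record identifier.
    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    if control is not None and control.text:
        add_field("id", control.text.strip())

    # Data field 245, subfield a, holds the title proper.
    path = "marc:datafield[@tag='245']/marc:subfield[@code='a']"
    for sub in record.findall(path, MARC_NS):
        if sub.text:
            add_field("title", sub.text.strip())

    return ET.tostring(add, encoding="unicode")


# Hypothetical sample record.
sample = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<controlfield tag="001">12345</controlfield>'
    '<datafield tag="245" ind1="1" ind2="0">'
    '<subfield code="a">Indexing XML</subfield>'
    "</datafield></record>"
)
solr_xml = marc_to_solr_add(sample)
print(solr_xml)
```

Which MARC tags map to which Solr fields is a schema design decision; the two shown here are only placeholders for that mapping.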

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Walter Underwood

That is one seriously manly regex, but I'd recommend using the Tag Soup parser instead: http://ccil.org/~cowan/XML/tagsoup/ wunder On 10/4/07 10:11 PM, "J.J. Larrea" <[EMAIL PROTECTED]> wrote: > It uses a PatternTokenizerFactory with a RegEx that swallows runs of HTML- or > XML-like tags: >

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Steven Rowe

Adrian Sutton wrote: > We didn't do anything at all to the HTML, the editor returns valid XHTML > (using numeric entities, never named entities which aren't valid in XML > and don't tend to work in XHTML) [...] Named entity references are valid in XML. They just need to be declared before they ar

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread J.J. Larrea

At 9:32 PM +1000 10/5/07, Adrian Sutton wrote: >From what people are suggesting though you'd be better off converting to plain >text before indexing it with Solr. Something like JTidy (http://jtidy.sf.net) >can parse most HTML that's around and you can iterate over the DOM to extract >the text f
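A small sketch of the "convert to plain text before indexing" approach. It uses Python's standard html.parser in place of JTidy, so it illustrates the idea rather than the JTidy API; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the character data of an HTML document."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the chunks and collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


plain = html_to_text("<p>Visit <b>Paris</b> &amp; Rome</p>")
print(plain)
```

Joining chunks with a space keeps words in adjacent elements apart, at the cost of splitting a word that has markup in the middle of it.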

Merging Fields

2007-10-05 Thread Jae Joo

Is there any way to merge fields during indexing time. I have field1 and field2 and would like to combine these fields and make field3. In the document, there are field1 and field2, and I may build field3 using CopyField. Thanks, Jae

RE: Merging Fields

2007-10-05 Thread Keene, David

Jae, The easiest way to do this is with CopyField. These entries in your schema will accomplish that: Field 3 will have the tokens from both field 1 and 2 in it. If you want to merge those 2 fields for display, I would just concat them at display time. Dave -Original Message

Re: how to make sure a particular query is ALWAYS cached

2007-10-05 Thread Chris Hostetter

: Although I haven't tried yet, I can't imagine that this request returns in : sub-zero seconds, which is what I want (having a index of about 1M docs with : 6000 fields/ doc and about 10 complex facetqueries / request). i wouldn't necessarily assume that :) If you have a request handler whi

strange sorting problem

2007-10-05 Thread Kevin Lewandowski

I'm having a problem with sorting on a certain field. In my schema.xml it's defined as a string (not analyzed, indexed/stored verbatim). But when I look at my results (sorted on that field ascending) I get things like the following: Yr City's A Sucker Movement b/w Yr City's A Sucker X, Y & Sometim

Re: strange sorting problem

2007-10-05 Thread Chris Hostetter

can you post... * the fieldtype declaration from your schema.xml * the field declaration from your schema * the full URL that generated that ordering * the full XML output from that URL (you can set the "fl" param to just be the field you are sorting on and score if the XML response is real

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Ravish Bhagdev

Thanks all for very valuable contributions, I understand these aspects of Solr much better now but... >But a different use-case might be for the highlighting to encompass the markup rather than >just the text, e.g. > Paris >which would have to be accomplished some other way. Yes, exactly. And

query syntax for complement set

2007-10-05 Thread Doug Daniels

Hi, I'm trying to find a way to express a certain query and wondering if anyone could help. The query is against a schema that stores the user_ids who have worked on each document in a multi-value integer field called 'user_ids'. I'd like to query solr for all documents that anyone other th

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Mike Klaas

On 5-Oct-07, at 11:59 AM, Ravish Bhagdev wrote: But a different use-case might be for the highlighting to encompass the markup rather than >just the text, e.g. Parisspan> which would have to be accomplished some other way. Yes, exactly. And I think nutch handles this somehow as I remember

RE: Merging Fields

2007-10-05 Thread Lance Norskog

A gotcha here is that copyField creates multiple values. Each field copied in becomes a separate field. If you wanted a single-valued field this will not work. Lance Norskog -Original Message- From: Keene, David [mailto:[EMAIL PROTECTED] Sent: Friday, October 05, 2007 10:50 AM To: solr-user@luce
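If a single-valued merged field is what is wanted, one option is to build it on the client before the document is sent to Solr. A minimal sketch, using the field names from the thread; the helper merge_fields and the sample values are hypothetical.

```python
def merge_fields(doc, sources=("field1", "field2"), target="field3", sep=" "):
    """Return a copy of doc with the source fields joined into one value."""
    merged = dict(doc)
    # Skip sources that are missing or empty so no stray separators appear.
    parts = [str(doc[name]) for name in sources if doc.get(name)]
    merged[target] = sep.join(parts)
    return merged


doc = merge_fields({"field1": "Solr", "field2": "Lucene"})
print(doc["field3"])
```

The original fields are left in place, so field1 and field2 can still be indexed and searched on their own.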

Re: strange sorting problem

2007-10-05 Thread Kevin Lewandowski

Sorry, user error. In the example I posted the field type was actually not string. But I was getting confused on another field because I didn't realize that string was case sensitive. Too many fields to think about! :) On 10/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > can you post... > >
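The case sensitivity mentioned above can be illustrated outside Solr. The sketch assumes the string type orders values by raw character value, so every uppercase letter sorts before any lowercase one; the titles are made up.

```python
titles = ["apple", "Banana", "cherry", "Zebra"]

# Plain sort compares code points: 'B' and 'Z' come before 'a' and 'c'.
case_sensitive = sorted(titles)

# Sorting on a lowercased key gives the order most users expect.
case_insensitive = sorted(titles, key=str.lower)

print(case_sensitive)
print(case_insensitive)
```

Within Solr, the usual way to get the second ordering is to sort on a separate field whose values are lowercased at index time.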

Best way to change weighting based on the presence of a field

2007-10-05 Thread Kyle Banerjee

Howdy all, We are attempting to provide access to about 8 million records of highly variable quality and length. In a nutshell, we are trying to find a way to deprioritize "suspect" records without discriminating against useful records that happen to be short. We do not wish to eliminate suspect r

Re: Best way to change weighting based on the presence of a field

2007-10-05 Thread Mike Klaas

On 5-Oct-07, at 2:06 PM, Kyle Banerjee wrote: Howdy all, We are attempting to provide access to about 8 million records of highly variable quality and length. In a nutshell, we are trying to find a way to deprioritize "suspect" records without discriminating against useful records that happen t

Re: Best way to change weighting based on the presence of a field

2007-10-05 Thread Kyle Banerjee

> If you know at index time that the document is shady, the easiest way > to de-emphasize it globally is to set the document boost to some > value other than one. > > ... I considered that, but assumed we'd get the values wrong at first and have to do a lot of tinkering before we got it right. Is

Re: Best way to change weighting based on the presence of a field

2007-10-05 Thread Mike Klaas

On 5-Oct-07, at 3:01 PM, Kyle Banerjee wrote: If you know at index time that the document is shady, the easiest way to de-emphasize it globally is to set the document boost to some value other than one. ... I considered that, but assumed we'd get the values wrong at first and have to do a lot

Re: Best way to change weighting based on the presence of a field

2007-10-05 Thread Yonik Seeley

On 10/5/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > The other option is to use a function query on the value stored in a > field (which could represent a range of 'badness'). This can be used > directly in the dismax handler using the bf (boost function) query > parameter. In the near future, you
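A sketch of what a dismax request with a boost function might look like. The bf parameter is the one named above; the host, the numeric field quality, the function linear(quality,1,0), and the query text are all assumptions made for illustration.

```python
from urllib.parse import urlencode

params = {
    "qt": "dismax",                 # select the dismax request handler
    "q": "history of printing",     # hypothetical user query
    "bf": "linear(quality,1,0)",    # boost by a hypothetical 'quality' field
}

# Hypothetical local Solr instance.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Because the boost is applied at query time, the function can be tuned without reindexing, which addresses the concern about getting index-time boost values wrong at first.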

Re: unable to figure out nutch type highlighting in solr....

2007-10-05 Thread Adrian Sutton

Named entity references are valid in XML. They just need to be declared before they are used[1], unless they are one of the builtin named entities &lt; &gt; &apos; &quot; or &amp; -- these are always valid when parsing with an XML parser. Correct, it was an offhand comment and I skipped over all the details
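The point about declared and builtin entities can be checked with any XML parser. A small sketch using Python's standard xml.etree.ElementTree; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Builtin named entity: always valid.
builtin_ok = parses("<p>Fish &amp; chips</p>")

# Numeric character reference: valid without any declaration.
numeric_ok = parses("<p>non-breaking&#160;space</p>")

# HTML named entity with no declaration: rejected as undefined.
named_ok = parses("<p>non-breaking&nbsp;space</p>")

print(builtin_ok, numeric_ok, named_ok)
```

This is why an editor that emits numeric entities produces markup that can be posted to Solr as is, while HTML-style named entities need to be declared or converted first.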

Re: question about bi-gram analysis on query

2007-10-05 Thread Otis Gospodnetic

Dave, Have you tried using &debugQuery=true ? :) Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: "Keene, David" <[EMAIL PROTECTED]> To: Teruhiko Kurosaka <[EMAIL PROTECTED]> Cc: so

28 matches
