Re: Solr: extracting/indexing HTML via cURL

2012-05-02 Thread Lance Norskog
processing chain, but >> that may be too much effort compared to the HTML strip filter. >> >> -- Jack Krupansky >> >> -----Original Message----- From: okayndc >> Sent: Monday, April 30, 2012 10:07 AM >> To: solr-user@lucene.apache.org >> Subject: Solr: e

Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc
CopyField to a text field that has the HTMLStripCharFilter to strip the HTML tags and index only the text (indexed, but not stored.) >> >> -- Jack Krupansky >> >> -----Original Message----- From: okayndc >> Sent: Monday, April 30, 2012 5:06 PM

Re: extracting/indexing HTML via cURL

2012-05-01 Thread Jack Krupansky
will not see "". -- Jack Krupansky -----Original Message----- From: okayndc Sent: Tuesday, May 01, 2012 10:08 AM To: solr-user@lucene.apache.org Subject: Re: extracting/indexing HTML via cURL Thank you Jack. So, it's not doable/possible to search and highlight keywords with

Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc
-- Jack Krupansky > > -----Original Message----- From: okayndc > Sent: Monday, April 30, 2012 5:06 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr: extracting/indexing HTML via cURL > > Great, thank you for the input. My understanding of HTMLStripCharFilter is > that it stri

Re: extracting/indexing HTML via cURL

2012-04-30 Thread Jack Krupansky
Sent: Monday, April 30, 2012 5:06 PM To: solr-user@lucene.apache.org Subject: Re: Solr: extracting/indexing HTML via cURL Great, thank you for the input. My understanding of HTMLStripCharFilter is that it strips HTML tags, which is not what I want ~ is this correct? I want to keep the HTML tags i

Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
iginal Message- From: okayndc > Sent: Monday, April 30, 2012 10:07 AM > To: solr-user@lucene.apache.org > Subject: Solr: extracting/indexing HTML via cURL > > > Hello, > > Over the weekend I experimented with extracting HTML content via cURL and > just > wondering why the e

Re: Solr: extracting/indexing HTML via cURL

2012-04-30 Thread Jack Krupansky
nday, April 30, 2012 10:07 AM To: solr-user@lucene.apache.org Subject: Solr: extracting/indexing HTML via cURL Hello, Over the weekend I experimented with extracting HTML content via cURL and am just wondering why the extraction/indexing process does not include the HTML tags. It seems as though

Solr: extracting/indexing HTML via cURL

2012-04-30 Thread okayndc
Hello, Over the weekend I experimented with extracting HTML content via cURL and am just wondering why the extraction/indexing process does not include the HTML tags. It seems as though the HTML tags are either being ignored or stripped somewhere in the pipeline. If this is the case, is it possible to in
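
Note on the approach discussed in this thread: the replies suggest keeping the raw HTML in one field and copying it to a second field that is analyzed with HTMLStripCharFilter, so that only the text is indexed. The Python sketch below is not Solr code and does not reproduce Solr's filter. It only illustrates the idea of holding two views of the same content. The field names `body_html` and `body_text` and the helper functions are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text nodes only, dropping every tag it encounters."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; entities are already decoded.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment with whitespace collapsed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered trailing text
    return " ".join("".join(stripper.parts).split())


def build_document(doc_id, raw_html):
    """Build a dict holding both views of the content.

    body_html (hypothetical name): the untouched markup, as a stored field would keep it.
    body_text (hypothetical name): the tag-free text, as a stripped, indexed field would see it.
    """
    return {
        "id": doc_id,
        "body_html": raw_html,
        "body_text": strip_tags(raw_html),
    }


if __name__ == "__main__":
    doc = build_document("1", "<p>Hello <b>Solr</b> world</p>")
    print(doc["body_html"])
    print(doc["body_text"])
```

In this sketch a search would run against the tag-free text while the original markup remains available for display, which mirrors the stored-versus-indexed split described in the replies.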