
Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc

ll be stripped of the tags during analysis and be searchable just like a normal text field. Then, search will not see "". -- Jack Krupansky -----Original Message----- From: okayndc Sent: Tuesday, May 01, 2012 10:08 AM To: solr-user@lucene

Re: extracting/indexing HTML via cURL

2012-05-01 Thread Jack Krupansky

will not see "". -- Jack Krupansky -----Original Message----- From: okayndc Sent: Tuesday, May 01, 2012 10:08 AM To: solr-user@lucene.apache.org Subject: Re: extracting/indexing HTML via cURL Thank you, Jack. So, it's not doable/possible to search and highlight keywords with

Re: extracting/indexing HTML via cURL

2012-05-01 Thread okayndc

Thank you, Jack. So, it's not doable/possible to search and highlight keywords within a field that contains the raw formatted HTML, and strip out the HTML tags during analysis, so that a user would get back nothing if they did a search for (ex. )? On Mon, Apr 30, 2012 at 5:17 PM, Jack Krupansky

Re: extracting/indexing HTML via cURL

2012-04-30 Thread Jack Krupansky

I was thinking that you wanted to index the actual text from the HTML page, but have the stored field value still have the raw HTML with tags. If you just want to store only the raw HTML, a simple string field is sufficient, but then you can't easily do a text search on it. Or, you can have tw
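
For anyone following the thread, here is a rough sketch of the idea described above: keep the raw HTML as the stored value, and derive a tag-free text value to search against. This is plain Python using only the standard library, not Solr's own analysis chain (in Solr the stripping would normally happen at analysis time, for example with an HTML-stripping char filter such as HTMLStripCharFilterFactory). The field names `content_raw` and `content_text` and the helper `make_document` are made up for illustration and do not come from the thread.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)

    def text(self):
        # Join the text runs with spaces and collapse repeated whitespace.
        # Note: this also puts a space where a tag splits a word.
        return " ".join(" ".join(self.parts).split())


def make_document(doc_id, raw_html):
    """Build a dict with a raw copy and a tag-free copy of the same content.

    'content_raw' and 'content_text' are hypothetical field names.
    """
    stripper = TagStripper()
    stripper.feed(raw_html)
    stripper.close()
    return {
        "id": doc_id,
        "content_raw": raw_html,          # kept verbatim, tags included
        "content_text": stripper.text(),  # what a text search would match on
    }


doc = make_document("1", "<p>Hello <b>Solr</b> world</p>")
print(doc["content_text"])
print(doc["content_raw"])
```

With a split like this, a search for a tag name finds nothing in the text copy, while the raw copy is still available to return to the user.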