Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-18 Thread Divyanand Tiwari
Thank you for replying, sir!

I have two questions related to this:

1) Which request handler should I use in this case? 'ExtractingRequestHandler'
strips the HTML markup by default, and the default 'UpdateRequestHandler'
does not accept HTML content.

2) How can I extract and index the META information in the HTML document
separately?

Awaiting your reply.
Thank you!


Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-19 Thread Divyanand Tiwari
Thank you for your help, Jack. I just wanted to know whether there is a
ready-made solution for this, because I really don't know how to extract
meta information.

Awaiting your reply.
Thank you
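
As a starting point for processing the meta information yourself, here is a
minimal sketch that uses only Python's standard-library html.parser. It is
not a Solr or Tika feature; the class and function names (MetaExtractor,
extract_meta) are made up for illustration. It collects each <meta> tag's
"content" value, keyed by its "name", "property", or "http-equiv" attribute.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects <meta> name/content pairs from an HTML page (illustrative)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = (attributes.get("name")
               or attributes.get("property")
               or attributes.get("http-equiv"))
        value = attributes.get("content")
        if key and value is not None:
            self.meta[key.lower()] = value


def extract_meta(html):
    """Return a dict of meta keys (lower-cased) to their content values."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta


sample = ('<html><head><meta name="Description" content="A &amp; B">'
          '<meta name="keywords" content="solr, tika"/></head></html>')
print(extract_meta(sample))
```

Each key could then be sent to Solr as its own field (for example a
hypothetical "meta_description" field) alongside the page content, assuming
the schema defines such fields.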


On Tue, Feb 19, 2013 at 12:48 PM, Jack Krupansky wrote:

> Use the standard update handler and pass the entire HTML page as literal
> text in a Solr XML document for the field that has the HTML strip filter,
> but be sure to escape the HTML (angle brackets, ampersands, etc.) syntax.
>
> You'll have to process meta information yourself.
>
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Divyanand Tiwari
> Sent: Monday, February 18, 2013 10:52 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How can i instruct the Solr/ Solr Cell to output the original
> HTML document which was fed to it.?
>
>
> Thank you for replying, sir!
>
> I have two questions related to this:
>
> 1) Which request handler should I use in this case? 'ExtractingRequestHandler'
> strips the HTML markup by default, and the default 'UpdateRequestHandler'
> does not accept HTML content.
>
> 2) How can I extract and index the META information in the HTML document
> separately?
>
> Awaiting your reply.
> Thank you!
>
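
Jack's suggestion above (send the whole page as escaped literal text in a
Solr XML document) can be sketched roughly as follows. This is only an
illustration under stated assumptions: the field name "content_html" and the
helper build_add_xml are hypothetical, and the schema is assumed to define
that field with an analyzer that strips HTML (for example one using
solr.HTMLStripCharFilterFactory). The script only builds the XML; sending it
to the update handler is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(fields):
    """Build a Solr <add> message from a dict of field name -> text.

    escape() turns &, < and > into entities, so the raw HTML travels
    through the XML update message as plain literal text.
    """
    parts = ["<add><doc>"]
    for name, value in fields.items():
        parts.append(f"<field name={quoteattr(name)}>{escape(value)}</field>")
    parts.append("</doc></add>")
    return "".join(parts)


page = "<html><body><p>Tom & Jerry</p></body></html>"

# "content_html" is a hypothetical field name; use whatever your schema defines.
message = build_add_xml({"id": "doc1", "content_html": page})
print(message)
```

If the field is also stored="true", the stored value returned in search
results is the original HTML as it was sent, while the strip filter affects
only what gets indexed. Note that characters not allowed in XML (such as most
control characters) are not handled by this sketch.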



-- 
Regards,
Divyanand Tiwari


Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-03-05 Thread Divyanand Tiwari
Hi Chris, thank you for replying. My "content" field in the schema is
stored="true" and indexed="false" because I copy the "content" field
into the "text" field, which is indexed="true" by default.

My problem is this: I can search the HTML documents I have fed to Solr,
but because the result returned by Tika/ExtractingRequestHandler is a
stripped-down version of the HTML document, I cannot present the document
in its original format on my site. :(

Based on Jack's reply, I got the idea of writing my own request handler,
and I am working on it.
I'll post an update if I come up with a solution. Any help is most
welcome!

Thank you all for your support!


On Fri, Feb 22, 2013 at 6:42 AM, Chris Hostetter wrote:

>
> : Hi everyone, I am new to Solr and cannot find a way to get back
> : the original HTML document with the hits highlighted in it. What
> : configuration do I need, and where, to instruct SolrCell/Tika not to
> : strip the tags of the HTML document in the content field?
>
> I _think_ what you want is simply to ensure that you have a "content"
> field in your schema which is stored="true" (and indexed="true" if you
> want to search on it directly) ... and then ExtractingRequestHandler will
> put the entire XHTML it generates from the documents you index into that
> field.
>
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> If that isn't what you had in mind, then you need to provide us with more
> details about what you've tried, what results you get, and how exactly
> those results differ from what you want to get.
>
>
> -Hoss
>



-- 
Regards,
Divyanand Tiwari