2011/8/11 Ahmet Arslan
> > Is there a way to strip the html tags completely and not
> > index them? If not,
> > how do I retrieve the results without html tags?
>
> How do you push documents to solr? You need to strip html tags before the
> analysis chain. For example, if you are using Data Import
Right, this is expected behavior; it trips a lot of people up.
When you specify ' indexed="true" ' in your field definitions, the
contents of the input stream are put into the inverted index *after*
all the transformations you specify via tokenizers, filters, charFilters,
etc. are applied. In
You can use it as in this example. Check the docs for your specific Solr
version, because something changed in the htmlstrip syntax between 1.4 and 3.x.
2011/8/11 Merlin Morgenstern
> I am sorry, but I do not really understand the difference between the
> indexed data and the returned result set.
>
> I l
> Is there a way to strip the html tags completely and not
> index them? If not,
> how do I retrieve the results without html tags?
How do you push documents to solr? You need to strip html tags before the
analysis chain. For example, if you are using Data Import Handler, you can use
HTMLStripTra
I am sorry, but I do not really understand the difference between the
indexed data and the returned result set.
I look at the "returned" dataset via this command:
solr/select/?q=id:533563&terms=true
which gives me html tags like these:
I also tried to turn on TermsComponent, but it did not change anything:
s
OK, what does "not working" mean? You never answered Markus' question:
"Are you looking at the returned result set or what you've actually indexed?
Analyzers are not run on the stored data, only on indexed data."
If "not working" means that your returned results contain the markup, then
you're co
Unfortunately I still can't get it running. The code I am using is the
following:
Hmm that looks like it's working fine. I stand corrected.
On 07/25/2011 12:24 PM, Markus Jelsma wrote:
I've seen that issue too and read comments on the list, yet I've never had
trouble with the order; I don't know what's going on. Check this analyzer, I've
moved the charFilter to the bottom:
The analysis chain still does its job as I expect for the input:
bla bla
Index Analyzer
org.apa
Hmm - I'm not sure about that; see
https://issues.apache.org/jira/browse/SOLR-2119
On 07/25/2011 12:01 PM, Markus Jelsma wrote:
charFilters are executed first regardless of their position in the analyzer.
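For reference, here is a minimal sketch of a field type with the charFilter declared ahead of the tokenizer, which is the conventional order. The type name text_html and the choice of tokenizer and filter are assumptions for illustration, not taken from the poster's schema. Check the factory names against the docs for your Solr version.

```xml
<!-- Hypothetical field type; the name "text_html" is an assumption. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- charFilters work on the raw character stream, before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- the tokenizer then sees text with the markup already removed -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- token filters run last, on the tokens the tokenizer produced -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Writing the elements in this order keeps the config reading the same way the data flows, whatever the parser does with other orderings.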
On Monday 25 July 2011 17:53:59 Mike Sokolov wrote:
I think you need to list the cha
charFilters are executed first regardless of their position in the analyzer.
On Monday 25 July 2011 17:53:59 Mike Sokolov wrote:
> I think you need to list the charfilter earlier in the analysis chain;
> before the tokenizer. Probably Solr should tell you this...
>
> -Mike
>
> On 07/25/2011 09:
I think you need to list the charfilter earlier in the analysis chain;
before the tokenizer. Probably Solr should tell you this...
-Mike
On 07/25/2011 09:03 AM, Merlin Morgenstern wrote:
sounds logical. I just changed it to the following, restarted and reindexed
with commit:
Are you looking at the returned result set or what you've actually indexed?
Analyzers are not run on the stored data, only on indexed data.
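To make the indexed/stored distinction concrete, here is a rough sketch of two field definitions. The field names body and body_search and the type text_html are hypothetical, not from the original schema.

```xml
<!-- Hypothetical field names and type, for illustration only. -->

<!-- indexed="true": the analyzer output (markup stripped by the charFilter)
     goes into the inverted index and is what queries match against.
     stored="true": the original input is kept verbatim, markup included,
     and that verbatim copy is what a select query returns. -->
<field name="body" type="text_html" indexed="true" stored="true"/>

<!-- Searchable only: nothing is stored, so no markup can come back
     in the result set for this field. -->
<field name="body_search" type="text_html" indexed="true" stored="false"/>
```

If the returned value itself has to be free of markup, the tags need to be removed before the document reaches Solr, since no analyzer touches the stored copy.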
On Monday 25 July 2011 15:03:18 Merlin Morgenstern wrote:
> sounds logical. I just changed it to the following, restarted and reindexed
> with commit:
>
>
You have three analyzer elements; I wonder what that would do. You need to add
the char filter to the index-time analyzer.
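As a sketch of what that could look like, assuming a field type with one index-time and one query-time analyzer (the type name text_html and the specific tokenizer and filter are assumptions, not the poster's actual config):

```xml
<!-- Hypothetical field type with exactly two analyzers. -->
<fieldType name="text_html" class="solr.TextField">
  <!-- Applied to documents when they are indexed -->
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to the user's query string; queries normally carry no
       markup, so the charFilter is left out here -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the index-time analyzer, existing documents have to be reindexed before the change shows up in the index.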
On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> Hi there,
>
> I am trying to strip html tags from the data before adding the documents to
> the index. To do that I