i changed the following line (xpath):
On 22. Aug 2013, at 10:06 PM, Alexandre Rafalovitch wrote:
> Ah. That's because Tika processor does not support path extraction. You
> need to nest one more level.
>
> Regards,
> Alex
> On 22 Aug 2013 13:34, "Andreas Owen" wrote:
>
>> i can do it like th
ok but i'm not doing any path extraction, at least i don't think so.
htmlMapper="identity" isn't preserving html
it's reading the content of the pages, but it's only putting it into "text_test"
and not into "text". so the copyField isn't working.
data-config.xml:
Ah. That's because Tika processor does not support path extraction. You
need to nest one more level.
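
A rough sketch of what "one more level" could look like, assuming the Tika
entity hands its XHTML output to a nested XPathEntityProcessor through a
FieldReaderDataSource. The names "bin", "fld", "tika" and "content-div" are
placeholders, BinURLDataSource is only a guess at the right stream source, and
the fragment is untested. Note that XPathEntityProcessor only understands a
limited XPath subset, so the expression may need adjusting:

```xml
<!-- inside <dataConfig>: placeholder data source names -->
<dataSource name="bin" type="BinURLDataSource"/>
<dataSource name="fld" type="FieldReaderDataSource"/>

<!-- inside the existing outer "rec" entity -->
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.path}${rec.file}" dataSource="bin"
        format="xml" htmlMapper="identity">
  <!-- Tika puts the extracted document into the column "text" -->
  <field column="text" name="text"/>

  <!-- extra level: re-parse tika.text as XML and pick out one element -->
  <entity name="content-div" processor="XPathEntityProcessor"
          dataSource="fld" dataField="tika.text"
          forEach="/html" rootEntity="true">
    <!-- flatten="true" collects all text below the matched node -->
    <field column="text_test" xpath="//div[@id='content']" flatten="true"/>
  </entity>
</entity>
```

The idea is that the Tika entity only produces markup, and the path matching
happens in the inner entity, which reads that markup from the parent's field.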
Regards,
Alex
On 22 Aug 2013 13:34, "Andreas Owen" wrote:
> i can do it like this but then the content isn't copied to text. it's just
> in text_test
>
> url="${rec.path}${rec.file}" dataS
i can do it like this but then the content isn't copied to text. it's just in
text_test
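
For reference, a minimal sketch of the schema.xml side, assuming both fields
use a "text_general" type as in the example schema (the types and attributes
here are assumptions, not taken from the actual schema):

```xml
<!-- schema.xml sketch: field types and attributes are assumed -->
<field name="text_test" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true"
       multiValued="true"/>

<!-- copies the raw incoming value of text_test into text at index time -->
<copyField source="text_test" dest="text"/>
```

In the stock example schema the "text" field is declared with stored="false",
in which case it is searchable but never shows up in query results, which can
look like the copy did not happen.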
On 22. Aug 2013, at 6:12 PM, Andreas Owen wrote:
> i put it in the tika-entity as attribute, but it doesn't change anything. my
> bigger concern is why text_test isn't populated at all
i put it in the tika-entity as attribute, but it doesn't change anything. my
bigger concern is why text_test isn't populated at all
On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
> Can you try SOLR-4530 switch:
> https://issues.apache.org/jira/browse/SOLR-4530
>
> Specifically, setti
Can you try SOLR-4530 switch:
https://issues.apache.org/jira/browse/SOLR-4530
Specifically, setting htmlMapper="identity" on the entity definition. This
will tell Tika to send full HTML rather than a seriously stripped one.
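
A minimal sketch of where the attribute would go, assuming a Tika entity named
"tika" and a binary data source named "bin" (both placeholder names); the url
expression is the one quoted elsewhere in this thread. As far as I know the
markup is dropped anyway with the default format="text", so format="xml" is
assumed here:

```xml
<!-- sketch only: "tika" and "bin" are placeholder names -->
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.path}${rec.file}" dataSource="bin"
        format="xml" htmlMapper="identity">
  <!-- TikaEntityProcessor exposes the extracted body as column "text" -->
  <field column="text" name="text_test"/>
</entity>
```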
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
i'm trying to index an html page and only use the div with the id="content".
unfortunately nothing is working within the tika-entity, only the standard text
(content) is populated.
do i have to use copyField for text_test to get the data?
or is there a problem with the entity-h