Re: dataimporter tika fields empty

2013-08-23 Thread Andreas Owen
i changed the following line (xpath):

On 22. Aug 2013, at 10:06 PM, Alexandre Rafalovitch wrote:
> Ah. That's because Tika processor does not support path extraction. You need to nest one more level.
>
> Regards,
> Alex
>
> On 22 Aug 2013 13:34, "Andreas Owen" wrote:
>> i can do it like th

Re: dataimporter tika fields empty

2013-08-23 Thread Andreas Owen
ok, but i'm not doing any path extraction, at least i don't think so. htmlMapper="identity" isn't preserving the html. it's reading the content of the pages, but it's not putting it into "text_test" and "text"; it's only in "text_test". the copyField isn't working. data-config.xml:

Re: dataimporter tika fields empty

2013-08-22 Thread Alexandre Rafalovitch
Ah. That's because Tika processor does not support path extraction. You need to nest one more level.

Regards,
Alex

On 22 Aug 2013 13:34, "Andreas Owen" wrote:
> i can do it like this but then the content isn't copied to text. it's just in text_test
>
> url="${rec.path}${rec.file}" dataS
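
For readers of the archive, here is an untested sketch of what "nest one more level" could look like. It is not the config from this thread, which is cut off above. The data source names ("bin", "fld"), the entity names, the literal url and the xpath are all assumptions for illustration. The idea is that the Tika entity only produces the markup in its "text" column, and a nested XPathEntityProcessor reads that column through a FieldReaderDataSource and does the path extraction.

```xml
<dataConfig>
  <!-- Assumption: local files read as binary; the name "bin" is illustrative -->
  <dataSource name="bin" type="BinFileDataSource" />
  <!-- Lets a nested entity read a column produced by its parent entity -->
  <dataSource name="fld" type="FieldReaderDataSource" />
  <document>
    <!-- Outer level: Tika parses the page and emits XHTML in the "text" column -->
    <entity name="tika" processor="TikaEntityProcessor"
            dataSource="bin" url="/path/to/page.html"
            format="xml" htmlMapper="identity">
      <field column="text" />
      <!-- Inner level: path extraction runs on the parent's "text" column -->
      <entity name="detail" processor="XPathEntityProcessor"
              dataSource="fld" dataField="tika.text"
              forEach="/html">
        <!-- Hypothetical xpath; DIH supports only a subset of XPath syntax -->
        <field column="text_test" xpath="//div[@id='content']" flatten="true" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

format="xml" is chosen here on the assumption that the inner entity needs well-formed markup to parse. Whether the xpath matches depends on the XHTML that Tika actually emits for the page, so it is worth checking the raw "text" column first.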

Re: dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
i can do it like this but then the content isn't copied to text. it's just in text_test

On 22. Aug 2013, at 6:12 PM, Andreas Owen wrote:
> i put it in the tika-entity as attribute, but it doesn't change anything. my bigger concern is why text_test isn't populated at all

Re: dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
i put it in the tika-entity as attribute, but it doesn't change anything. my bigger concern is why text_test isn't populated at all

On 22. Aug 2013, at 5:27 PM, Alexandre Rafalovitch wrote:
> Can you try SOLR-4530 switch:
> https://issues.apache.org/jira/browse/SOLR-4530
>
> Specifically, setti

Re: dataimporter tika fields empty

2013-08-22 Thread Alexandre Rafalovitch
Can you try SOLR-4530 switch:
https://issues.apache.org/jira/browse/SOLR-4530

Specifically, setting htmlMapper="identity" on the entity definition. This will tell Tika to send full HTML rather than a seriously stripped one.

Regards,
Alex.
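
As a rough illustration of that suggestion, a Tika entity with the switch set might look like the fragment below. Only htmlMapper="identity" comes from the message; the data source name "bin", the literal url and the target field name are assumptions.

```xml
<dataConfig>
  <!-- Assumption: a binary file data source; the name "bin" is illustrative -->
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <!-- htmlMapper="identity" (SOLR-4530) asks Tika to pass the full HTML through
         instead of its default, heavily reduced element set -->
    <entity name="tika" processor="TikaEntityProcessor"
            dataSource="bin" url="/path/to/page.html"
            format="html" htmlMapper="identity">
      <!-- The extracted body arrives in the "text" column; here it is mapped to text_test -->
      <field column="text" name="text_test" />
    </entity>
  </document>
</dataConfig>
```

The mapper only changes which elements survive when format is set to a markup output such as "html" or "xml". With plain text output the tags are dropped either way.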

dataimporter tika fields empty

2013-08-22 Thread Andreas Owen
i'm trying to index an html page and only use the div with the id="content". unfortunately nothing is working within the tika-entity, only the standard text (content) is populated. do i have to use copyField for test_text to get the data? or is there a problem with the entity-h
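
On the copyField question, a minimal schema.xml sketch is shown below. The field type "text_general" and the stored/multiValued settings are assumptions, and the field names follow the "text_test" and "text" spelling used in the replies.

```xml
<!-- schema.xml fragment; field types and attributes are assumptions -->
<field name="text_test" type="text_general" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true" />

<!-- Copies whatever value arrives in text_test into text at index time -->
<copyField source="text_test" dest="text" />
```

A copyField only has something to copy if the source field receives a value during the import. If the destination is not stored, as in this sketch, it can be searched but will not show up in the returned documents, which can look like the copy is not working.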