Hi again,
I have just tried version 1.5-dev from Solr trunk.
After applying the patch you provided and adding icu4j-3_8_1 to the classpath,
the results are much better than before.
Words and text are no longer reversed and are displayed correctly, except for
some text parts of some PDF files that
On Tue, Mar 9, 2010 at 9:44 AM, Abdelhamid ABID wrote:
> I put ICU4J 4.2 in Solr's lib directory and nothing changed; I'm now trying
> ICU4J 3.8.
>
Hello, what version of Solr are you using? I think you will need to
use the trunk version.
I created a patch for this issue that you can apply to trunk
I tried a couple of times to download this patch, but the download fails with
a file-size mismatch or a similar error.
Is there another link?
On 3/9/10, Dominique Bejean wrote:
>
> Hi,
>
> The problem comes from PDFBox (
> http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. However
> Tik
I put ICU4J 4.2 in Solr's lib directory and nothing changed; I'm now trying
ICU4J 3.8.
On 3/9/10, Robert Muir wrote:
>
> I think the problem is that Solr does not include the ICU4J jar, so it
> won't work with Arabic PDF files.
>
> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your
On Tue, Mar 9, 2010 at 10:10 AM, Abdelhamid ABID wrote:
> Version 3.8 doesn't change anything either!
>
The patch (https://issues.apache.org/jira/browse/SOLR-1813) can only
work on Solr trunk. It will not work with Solr 1.4.
Solr 1.4 uses pdfbox-0.7.3.jar, which does not support Arabic.
Solr trunk
Version 3.8 doesn't change anything either!
On 3/9/10, Robert Muir wrote:
>
> I think the problem is that Solr does not include the ICU4J jar, so it
> won't work with Arabic PDF files.
>
> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your
> classpath.
>
>
> On Mon, Mar 8, 2010 at 6
I don't know pdftotext. Can it be plugged into Solr, or do we need to
hard-code the extraction step before the text reaches Solr?
On 3/9/10, Dominique Bejean wrote:
>
> Hi,
>
> The problem comes from PDFBox (
> http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. However
> Tika doesn't yet
This depends on which version of Solr you are using: the trunk version
has a version of Tika that supports this. See SOLR-1813.
On Tue, Mar 9, 2010 at 3:59 AM, Dominique Bejean wrote:
> Hi,
>
> The problem comes from PDFBox
> (http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now. Howev
I'm using Solr 1.4.
On 3/9/10, Robert Muir wrote:
>
> On Tue, Mar 9, 2010 at 9:44 AM, Abdelhamid ABID
> wrote:
> > I put ICU4J 4.2 in Solr's lib directory and nothing changed; I'm now trying
> > ICU4J 3.8.
> >
>
>
> Hello, what version of Solr are you using? I think you will need to
> use
Sorry for the link to the wrong JIRA issue; I was looking at another issue.
It's here: https://issues.apache.org/jira/browse/SOLR-1813
Again, I think you will need to apply it to trunk, as that's the only
place I have tested it.
--
Robert Muir
rcm...@gmail.com
Hi,
The problem comes from PDFBox
(http://brutus.apache.org/jira/browse/PDFBOX-377) and is fixed now.
However, Tika doesn't use this version of PDFBox yet.
So for PDF text extraction, I use pdftotext instead of Tika.
Dominique
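On the question of whether pdftotext can be plugged into Solr: one possible approach (a sketch, not Dominique's actual setup) is to run pdftotext as a separate step outside Solr and index the resulting text as an ordinary field. The Java class below assumes pdftotext from xpdf or Poppler is installed and on the PATH; the class and method names (PdfToText, buildCommand, extract) are hypothetical. The `-enc UTF-8` option selects the output encoding, and `-` as the output file sends the text to stdout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical helper: extracts PDF text with the external pdftotext tool
 * before the text is sent to Solr. Assumes pdftotext is on the PATH.
 */
final class PdfToText {

    /** Builds the command line: UTF-8 output, written to stdout ("-"). */
    static List<String> buildCommand(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    /** Runs pdftotext and returns the extracted text, failing on a non-zero exit. */
    static String extract(Path pdf) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(pdf))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show pdftotext errors
                .start();
        byte[] out;
        try (InputStream in = p.getInputStream()) {
            out = in.readAllBytes(); // read stdout fully before waiting
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with status " + exit);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // The returned text could then be indexed in Solr as a plain field value.
        System.out.println(extract(Path.of(args[0])));
    }
}
```

Running the extraction as its own step keeps it independent of the Tika/PDFBox versions bundled with Solr, at the cost of maintaining an external binary and an extra indexing step.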
On 09/03/10 06:00, Robert Muir wrote:
It is an optional dependency of PDFBox. If ICU is available, then PDFBox
is capable of processing Arabic PDF files.
The problem is that Arabic "text" in PDF files is really glyphs
(encoded in visual order) and needs to be 'unshaped' with code
that isn't in the JDK.
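To illustrate what "unshaping" glyphs stored in visual order involves, here is a simplified sketch that uses only the JDK. It is not how PDFBox or ICU4J do it: it assumes the extracted string is a single right-to-left run of Arabic presentation-form glyphs with no digits or Latin text, a case where reversing the run and applying Unicode NFKC normalization is enough. ICU4J's ArabicShaping and Bidi classes handle the general case. The class name ArabicGlyphs is hypothetical.

```java
import java.text.Normalizer;

/**
 * Simplified illustration (hypothetical class): converts a visually ordered
 * run of Arabic presentation-form glyphs into logically ordered base letters.
 * Assumes a single right-to-left run with no digits or Latin text.
 */
final class ArabicGlyphs {

    static String toLogical(String visualGlyphs) {
        // Step 1: visual order stores the RTL run left to right, so reversing
        // a pure Arabic run restores logical (reading) order.
        String logicalGlyphs = new StringBuilder(visualGlyphs).reverse().toString();
        // Step 2: NFKC maps presentation-form glyphs to base letters,
        // e.g. U+FEE1 (MEEM ISOLATED FORM) -> U+0645 (MEEM).
        return Normalizer.normalize(logicalGlyphs, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // "salam" as glyphs in visual order:
        // MEEM ISOLATED, LAM-ALEF LIGATURE FINAL, SEEN INITIAL
        String visual = "\uFEE1\uFEFC\uFEB3";
        System.out.println(toLogical(visual).equals("\u0633\u0644\u0627\u0645")); // true
    }
}
```

The order of the two steps matters: normalizing first would expand the lam-alef ligature into lam followed by alef, and the later reversal would then swap those two letters.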
If the size of the default IC
Is this a mistake in the Tika library collection in the Solr trunk?
On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir wrote:
> I think the problem is that Solr does not include the ICU4J jar, so it
> won't work with Arabic PDF files.
>
> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in y
I think the problem is that Solr does not include the ICU4J jar, so it
won't work with Arabic PDF files.
Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your classpath.
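A quick way to confirm that the ICU4J jar is actually visible is to try loading its classes. The sketch below is not PDFBox's own detection code; it assumes that com.ibm.icu.text.ArabicShaping and com.ibm.icu.text.Bidi (believed present in ICU4J 3.8 and later) are representative, and it only checks the classpath the JVM is launched with, which may differ from the class loader Solr uses for its lib directory. The class name IcuCheck is hypothetical.

```java
import java.util.List;

/** Hypothetical check: reports whether selected ICU4J classes can be loaded. */
final class IcuCheck {

    // Assumed representative ICU4J classes for Arabic shaping and bidi reordering.
    static final List<String> ICU_CLASSES = List.of(
            "com.ibm.icu.text.ArabicShaping",
            "com.ibm.icu.text.Bidi");

    /** Returns true if the class can be loaded (without running its static initializers). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, IcuCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : ICU_CLASSES) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

For example, `java -cp icu4j-3_8_1.jar:. IcuCheck` (using the jar name mentioned earlier in this thread, with a Unix path separator) should report both classes as found if the jar is intact.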
On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid ABID wrote:
> Hi,
> Posting Arabic PDF files to Solr using a web fo