Hello,
I have a file with the input string "91{40}9490949090", and I wanted to return
this file when I search for the query string "+91?40?9*". The problem is that
the input string is being indexed as 3 terms: "91", "40", "9490949090". Is
there a way to consider "{" and "}" as part of the
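One common way to keep punctuation like "{" and "}" inside an indexed term is to avoid the standard tokenizer for that field. A minimal schema.xml sketch, assuming a whitespace-only tokenizer is acceptable (the field and type names here are hypothetical, not from the thread):

```xml
<!-- Sketch only: tokenize on whitespace, so "{" and "}" survive as part
     of the term. Names "text_ws" and "phone" are invented for illustration. -->
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="phone" type="text_ws" indexed="true" stored="true"/>
```

With whitespace tokenization the whole value is indexed as the single term 91{40}9490949090, so a wildcard query such as 91?40?9* can match it. Note that "{" and "}" are range-query characters in the Lucene query syntax, so they may need backslash-escaping when they appear literally in a query.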
Hello,
It is observed that TIKA does not extract the "Content-Language" for documents
encoded in UTF-8. For natively encoded documents, it works fine. Any ideas on
how we can resolve this?
Thanks,
Sandhya
Yes, I did. But I don't find a SolrJ example there. The example in
the doc uses curl.
- Sent from iPhone
On 07-May-2010, at 8:12 PM, "Chris Hostetter"
wrote:
> : Sorry. That is what I meant. But, I put it wrongly. I have not been
> : able to find examples of using solrj, for this.
>
> did
Sorry, that is what I meant, but I put it wrongly. I have not been
able to find examples of using SolrJ for this.
- Sent from iPhone
On 07-May-2010, at 1:23 AM, "Chris Hostetter"
wrote:
>
> : Subject: Example of using "stream.file" to post a binary file to
> solr
>...
> : Can somebody
Hello,
Can somebody please point me to an example of how we can leverage
*stream.file* for streaming documents using the UpdateRequest API (SolrJ)?
Thanks,
Sandhya
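For what it's worth, a small SolrJ sketch of the stream.file approach. The server URL, file path, and id are made up for illustration; stream.file makes the Solr *server* read the file from its own disk, which typically requires enableRemoteStreaming="true" in solrconfig.xml:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

public class StreamFileExample {

    // Build a params-only extract request: no content is uploaded from the
    // client; the server opens the file named by stream.file itself.
    static ContentStreamUpdateRequest buildRequest(String serverLocalPath, String id) {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.setParam("stream.file", serverLocalPath); // path as seen by the *server*
        req.setParam("literal.id", id);               // unique key for the new doc
        return req;
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Hypothetical server-side path:
        NamedList<Object> result = server.request(buildRequest("/data/docs/report.pdf", "doc1"));
        System.out.println(result);
    }
}
```

This is a sketch, not a tested recipe; the key point is that stream.file is just a request parameter, so any SolrJ request class that lets you set parameters can carry it.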
On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote:
>
> > Hello,
> >
> >
> >
> > But I see that the libraries are being loaded:
> >
> >
> >
> > INFO: Adding specified lib dirs to ClassLoader
> >
> > May 4, 2010 12:49:59 PM org.apac
hemas-3.6.jar
> > poi-scratchpad-3.6.jar
> > tagsoup-1.2.jar
> > tika-core-0.7.jar
> > tika-parsers-0.7.jar
> > xml-apis-1.0.b2.jar
> > xmlbeans-2.3.0.jar
> >
> > Thanks,
> > Sandhya
> >
> >
> >
> > -Original Message
.jar
metadata-extractor-2.4.0-beta-1.jar
pdfbox-1.1.0.jar
poi-3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar
tagsoup-1.2.jar
tika-core-0.7.jar
tika-parsers-0.7.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar
Thanks,
Sandhya
-Original Message-
From: Sandhya Agarwal
Thanks,
Praveen
On Tue, May 4, 2010 at 5:28 PM, Sandhya Agarwal wrote:
> Both the files work for me, Praveen.
>
> Thanks,
> Sandhya
>
> From: Praveen Agrawal [mailto:pkal...@gmail.com]
> Sent: Tuesday, May 04, 2010 5:22 PM
> To: solr-user@lucene.apache.org
ote:
Yes Sandhya,
I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this is
what you were asking.
Thanks.
On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal
mailto:sagar...@opentext.com>> wrote:
Praveen,
Along with the tika core and parser jars, did you run "mvn
dependency:copy-dependencies" to generate all the dependencies too?
-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell
Yes Sandhya,
I copied the new poi/jempbox/pdfbox/fontbox etc. jars too. I believe this is
what you were asking.
Thanks.
On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal wrote:
> Praveen,
>
> Along with the tika core and parser
Praveen,
Along with the tika core and parser jars, did you run "mvn
dependency:copy-dependencies" to generate all the dependencies too?
Thanks,
Sandhya
-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Tuesday, May 04, 2010 4:52 PM
To: solr-user@lucene.apache.o
On Behalf Of Grant Ingersoll
Sent: Tuesday, May 04, 2010 4:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell
Yes, it is loading the libraries, but they are in a different classloader,
one that the new way Tika loads apparently doesn't have access to.
-Grant
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Tuesday, May 04, 2010 1:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Problem with pdf, upgrading Cell
Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved
the issue and the content
Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved
the issue and the content extraction works fine now.
Thanks,
Sandhya
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Tuesday, May 04, 2010 12:58 PM
To: solr-user@lucene.apache.org
Hello,
But I see that the libraries are being loaded:
INFO: Adding specified lib dirs to ClassLoader
May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to
classloader
May 4, 2010
Hello,
Please let me know if anybody has figured out a way around this issue.
Thanks,
Sandhya
-Original Message-
From: Praveen Agrawal [mailto:pkal...@gmail.com]
Sent: Friday, April 30, 2010 11:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Problem with pdf, upgrading Cell
Grant,
You
@lucene.apache.org
Subject: Re: Indexing metadata in solr using ContentStreamUpdateRequest
What does your schema look like?
On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote:
> Hello,
>
> I am using ContentStreamUpdateRequest, to index binary documents. At the time
> of indexing the content,
I observed the same issue with the tika 0.7 jars. It now fails to extract
content from documents of any type. It works with tika 0.5, though.
Thanks,
Sandhya
-Original Message-
From: pk [mailto:pkal...@gmail.com]
Sent: Friday, April 30, 2010 3:17 PM
To: solr-user@lucene.apache.org
Subject:
Hello,
I am using ContentStreamUpdateRequest to index binary documents. At the time
of indexing the content, I want to be able to index some additional metadata as
well. I believe this metadata must be provided prefixed with *literal*. For
instance, I have a field named “field1”, defined in
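The literal.* mechanism described above can be sketched in SolrJ roughly as follows. The field names and values are hypothetical, and literal.field1 only works if field1 exists in schema.xml or matches a dynamicField pattern:

```java
import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class LiteralParamsExample {

    // Sketch: attach caller-supplied metadata as literal.* parameters so Solr
    // Cell adds them to the document it builds from the extracted content.
    // "field1" must exist in schema.xml (or match a dynamicField pattern).
    static ContentStreamUpdateRequest buildRequest(File content, String id,
                                                   String field1Value) throws IOException {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(content);                         // the binary document itself
        req.setParam("literal.id", id);               // unique key
        req.setParam("literal.field1", field1Value);  // extra metadata field
        return req;
    }
}
```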
Hello,
I see that solr 1.4 is bundled with tika 0.4, which does not do proper content
extraction of zip files. So, I replaced the tika jars with the latest tika 0.7
jars. I still see an issue: the individual files in the zip file are not
being indexed. Is there any configuration I must do to get this working?
Hello,
What are the advantages of using the “dismax” query handler vs. the “standard”
query handler? As I understand it, “dismax” queries are parsed differently and
provide more flexibility w.r.t. score boosting etc. Are there any more reasons?
Thanks,
Sandhya
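One concrete difference shows up in the request parameters dismax accepts. A hypothetical example (field names and boosts are invented): the raw user query is searched across several fields, each with its own boost, without requiring Lucene syntax from the user:

```
q=good string
defType=dismax
qf=field1^2.0 text^0.5    (query fields, with per-field boosts)
pf=field1^5.0             (phrase boost when terms appear together)
mm=2                      (minimum number of clauses that must match)
```

The standard handler would instead parse q as full Lucene syntax against the default field, so user-entered characters like ":" can cause parse errors; dismax largely treats such input as plain text.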
Thanks Erick. Using parentheses works.
With parentheses, the query q=field1:(this is a good string) is parsed as
follows:
+field1:this +field1:good +field1:string
Is that OK to do?
Thanks,
Sandhya
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tues
Also, one of the fields here, *field3*, is a dynamic field. All the other
fields except this one are copied into "text" with copyField.
Thanks,
Sandhya
-Original Message-----
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Monday, April 19, 2010 2:55 PM
To:
Thank You Mitch.
I have the query mentioned below (my defaultOperator is set to "AND"):
(field1 : This is a good string AND field2 : This is a good string AND field3 :
This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR
field4 : HTMLDocument) AND field5 : doc)
This i
Hello,
I am confused about the proper usage of the Boolean operators AND, OR and NOT.
Could somebody please provide an easy-to-understand explanation?
Thanks,
Sandhya
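For the archives, a brief illustration of how the Lucene query parser translates the operators into required (+) and prohibited (-) clauses; the field name is hypothetical:

```
field1:a AND field1:b  ->  +field1:a +field1:b    (both must match)
field1:a OR field1:b   ->   field1:a  field1:b    (at least one must match)
field1:a NOT field1:b  ->  +field1:a -field1:b    (a must match, b must not)
```

Note that a purely negative query such as NOT field1:b matches nothing by itself, and that setting defaultOperator to AND makes the implicit operator between bare terms AND rather than OR.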
Hello,
Is it a problem if I use *copyField* for some fields and not for others? In my
query, I have both kinds of fields: the ones mentioned in copyField and the
ones that are not copied to a common destination. Will this cause an anomaly
in my search results? I am seeing some weird behavior.
Thanks,
Sandhya
If you are using Solr 1.4 or later, take a look at Solr's
trie range support.
http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
Ankit
-Original Message-----
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Wednesday, April 14, 2010 7:56 AM
To:
sor (I think)
FLEP (FileListEntityProcessor) walks the directory and supplies a separate
record per file.
BFDS (BinFileDataSource) pulls the file and supplies it to TikaEntityProcessor.
BinFileDataSource is not documented, but you need it for binary data
streams like PDF & Word. For text files, use FileDataSource.
On 4/14/10, Sandh
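The wiring described above might look roughly like this in a DataImportHandler data-config.xml. This is an untested, illustrative sketch: the entity names, baseDir, fileName pattern, and field columns are all invented:

```xml
<dataConfig>
  <dataSource type="FileDataSource" name="xml"/>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- FileListEntityProcessor: one record per file in the polling dir -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data/poll" fileName=".*\.pdf" rootEntity="false"
            dataSource="null">
      <!-- TikaEntityProcessor reads each binary file via BinFileDataSource -->
      <entity name="content" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The key design point from the message above: BinFileDataSource for binary streams like PDF and Word, FileDataSource for plain text.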
Hello,
We want to design a solution where we have one polling directory (the data
source directory) containing the XML files of all data that must be indexed.
These XML files contain a reference to the content file, so we need another
data source to be created for the content files. Could s
ay, April 14, 2010 5:09 PM
To: solr-user@lucene.apache.org
Subject: Re: solr numeric range queries
On Apr 14, 2010, at 6:09 AM, Sandhya Agarwal wrote:
> Hello,
>
> > As I understand, we have to use the syntax {* TO value} or [* TO
> > value] for queries less than, or less than or equal to, a value
>
Hello,
As I understand, we have to use the syntax {* TO value} or [* TO value]
for queries less than, or less than or equal to, some value, where the
field is a numeric field.
There is no direct < or <= syntax supported. Is that correct?
Thanks,
Sandhya
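The bracket type controls inclusivity in the classic range syntax. A few hypothetical examples on a numeric field named price:

```
price:[* TO 100]   matches price <= 100   (square brackets: inclusive bound)
price:{* TO 100}   matches price <  100   (curly braces: exclusive bound)
price:[10 TO 100]  matches 10 <= price <= 100
```

So there is indeed no standalone < or <= operator; the range form is the way to express those comparisons.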
arini
wrote:
> Some problem with extraction (Tika, etc.)? My suggestion is: try to
> extract the document manually... I had a lot of problems with Tika and pdf
> extraction...
>
> Cheers,
> Andrea
>
> Il 13/04/2010 13:05, Sandhya Agarwal ha scritto:
>>
>> Hello,
Hello,
I have the following piece of code:

ContentStreamUpdateRequest contentUpdateRequest =
    new ContentStreamUpdateRequest("/update/extract");
contentUpdateRequest.addFile(new File(contentFileName));
contentUpdateRequest.setParam("extractOnly", "true");
NamedList result = solrServerSession.request(contentUpdateRequest);