Regarding WordDelimiterFactory

2010-09-09 Thread Sandhya Agarwal
Hello, I have a file with the input string "91{40}9490949090", and I want this file returned when I search for the query string "+91?40?9*". The problem is that the input string is getting indexed as three terms: "91", "40", "9490949090". Is there a way to consider "{" and "}" as part of the
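
The mismatch described above can be illustrated with a small sketch. It is not Solr or Lucene code: it only approximates the usual wildcard semantics ('?' for exactly one character, '*' for zero or more) with java.util.regex, and it uses a plain split on non-alphanumeric characters as a rough stand-in for whatever the analyzer does. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: approximates Lucene-style wildcards with java.util.regex.
// None of this is Solr's actual WordDelimiter or wildcard implementation.
final class WildcardSketch {

    // Convert a wildcard expression into a regular expression.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');            // exactly one character
            } else if (c == '*') {
                regex.append(".*");           // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True only if the whole term matches the wildcard expression.
    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }

    // Rough stand-in for delimiter splitting: break on non-alphanumeric runs.
    static List<String> splitOnDelimiters(String input) {
        return Arrays.asList(input.split("[^A-Za-z0-9]+"));
    }
}
```

Under these assumptions, the pattern 91?40?9* matches the unsplit string "91{40}9490949090" but none of the three separate terms, which is consistent with the behaviour reported in the message.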

Tika language extraction

2010-06-10 Thread Sandhya Agarwal
Hello, I have observed that Tika does not extract the "Content-Language" for documents encoded in UTF-8. For natively encoded documents, it works fine. Any idea how we can resolve this? Thanks, Sandhya

Re: Example of using "stream.file" to post a binary file to solr

2010-05-07 Thread Sandhya Agarwal
Yes, I did. But, I don't find a solrj example there. The example in the doc uses curl. - Sent from iPhone On 07-May-2010, at 8:12 PM, "Chris Hostetter" wrote: > : Sorry. That is what I meant. But, I put it wrongly. I have not been > : able to find examples of using solrj, for this. > > did

Re: Example of using "stream.file" to post a binary file to solr

2010-05-06 Thread Sandhya Agarwal
Sorry. That is what I meant. But, I put it wrongly. I have not been able to find examples of using solrj, for this. - Sent from iPhone On 07-May-2010, at 1:23 AM, "Chris Hostetter" wrote: > > : Subject: Example of using "stream.file" to post a binary file to > solr >... > : Can somebo

Example of using "stream.file" to post a binary file to solr

2010-05-06 Thread Sandhya Agarwal
Hello, Can somebody please point me to an example of how we can leverage *stream.file* for streaming documents using the UpdateRequest API (SolrJ)? Thanks, Sandhya
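
The thread does not include a SolrJ example, and the sketch below is not one either. It only shows, with the standard library, how a request carrying the stream.file parameter could be assembled. The base URL, the file path, the commit parameter and the class name are assumptions for illustration. Remote streaming normally has to be enabled in solrconfig.xml for Solr to honour stream.file.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a URL for the extracting handler that asks Solr
// to read a file from the server's own file system via stream.file.
final class StreamFileUrlSketch {

    // Append URL-encoded query parameters to base + handler path.
    // URLEncoder.encode(String, Charset) requires Java 10 or later.
    static String buildUrl(String solrBaseUrl, String handlerPath, Map<String, String> params) {
        StringBuilder url = new StringBuilder(solrBaseUrl).append(handlerPath);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator)
               .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    // Assumed defaults: a local Solr instance and the /update/extract handler.
    static String extractUrlFor(String serverSideFilePath) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("stream.file", serverSideFilePath); // path as seen by the Solr server
        params.put("commit", "true");
        return buildUrl("http://localhost:8983/solr", "/update/extract", params);
    }
}
```

The path passed in must be readable by the Solr process itself, since the server (not the client) opens the file.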

RE: Problem with pdf, upgrading Cell

2010-05-06 Thread Sandhya Agarwal
On May 4, 2010, at 3:28 AM, Sandhya Agarwal wrote: > > > Hello, > > > > > > > > But I see that the libraries are being loaded : > > > > > > > > INFO: Adding specified lib dirs to ClassLoader > > > > May 4, 2010 12:49:59 PM org.apac

RE: Problem with pdf, upgrading Cell

2010-05-05 Thread Sandhya Agarwal
hemas-3.6.jar > > poi-scratchpad-3.6.jar > > tagsoup-1.2.jar > > tika-core-0.7.jar > > tika-parsers-0.7.jar > > xml-apis-1.0.b2.jar > > xmlbeans-2.3.0.jar > > > > Thanks, > > Sandhya > > > > > > > > -Original Message

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
.jar metadata-extractor-2.4.0-beta-1.jar pdfbox-1.1.0.jar poi-3.6.jar poi-ooxml-3.6.jar poi-ooxml-schemas-3.6.jar poi-scratchpad-3.6.jar tagsoup-1.2.jar tika-core-0.7.jar tika-parsers-0.7.jar xml-apis-1.0.b2.jar xmlbeans-2.3.0.jar Thanks, Sandhya -----Original Message----- From: Sandhya Agarwal

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Thanks, Praveen On Tue, May 4, 2010 at 5:28 PM, Sandhya Agarwal wrote: > Both the files work for me, Praveen. > > Thanks, > Sandhya > > From: Praveen Agrawal [mailto:pkal...@gmail.com] > Sent: Tuesday, May 04, 2010 5:22 PM > To: solr-user@lucene.apache.org

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
ote: Yes Sandhya, I copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is what you were asking. Thanks. On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal <sagar...@opentext.com> wrote: Praveen, Along with the tika core and parser jars, did you run "mvn dependency:copy-

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Yes Sandhya, i copied new poi/jempbox/pdfbox/fontbox etc jars too. I believe this is what you were asking. Thanks. On Tue, May 4, 2010 at 5:01 PM, Sandhya Agarwal wrote: > Praveen, > > Along with the tika core and parser

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Praveen, Along with the tika core and parser jars, did you also run "mvn dependency:copy-dependencies" to generate all the dependencies? Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Tuesday, May 04, 2010 4:52 PM To: solr-user@lucene.apache.o

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
On Behalf Of Grant Ingersoll Sent: Tuesday, May 04, 2010 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Yes, it is loading the libraries, but they are in a different classloader that apparently the new way Tika loads doesn't have access to. -Gra

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
-----Original Message----- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Tuesday, May 04, 2010 1:10 PM To: solr-user@lucene.apache.org Subject: RE: Problem with pdf, upgrading Cell Yes, Grant. You are right. Copying the tika libraries to solr webapp, solved the issue and the content

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Yes, Grant. You are right. Copying the Tika libraries to the Solr webapp solved the issue, and the content extraction works fine now. Thanks, Sandhya -----Original Message----- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Tuesday, May 04, 2010 12:58 PM To: solr-user@lucene.apache.org

RE: Problem with pdf, upgrading Cell

2010-05-04 Thread Sandhya Agarwal
Hello, But I see that the libraries are being loaded : INFO: Adding specified lib dirs to ClassLoader May 4, 2010 12:49:59 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/C:/apache-solr-1.4.0/contrib/extraction/lib/asm-3.1.jar' to classloader May 4, 2010

RE: Problem with pdf, upgrading Cell

2010-05-03 Thread Sandhya Agarwal
Hello, Please let me know if anybody has found a way around this issue. Thanks, Sandhya -----Original Message----- From: Praveen Agrawal [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 11:14 PM To: solr-user@lucene.apache.org Subject: Re: Problem with pdf, upgrading Cell Grant, You

RE: Indexing metadata in solr using ContentStreamUpdateRequest

2010-04-30 Thread Sandhya Agarwal
@lucene.apache.org Subject: Re: Indexing metadata in solr using ContentStreamUpdateRequest What does your schema look like? On Apr 30, 2010, at 3:47 AM, Sandhya Agarwal wrote: > Hello, > > I am using ContentStreamUpdateRequest, to index binary documents. At the time > of indexing the content,

RE: Problem with pdf, upgrading Cell

2010-04-30 Thread Sandhya Agarwal
I observed the same issue too, with the Tika 0.7 jars. It now fails to extract content from documents of any type. It works with Tika 0.5 though. Thanks, Sandhya -----Original Message----- From: pk [mailto:pkal...@gmail.com] Sent: Friday, April 30, 2010 3:17 PM To: solr-user@lucene.apache.org Subject:

Indexing metadata in solr using ContentStreamUpdateRequest

2010-04-30 Thread Sandhya Agarwal
Hello, I am using ContentStreamUpdateRequest to index binary documents. At the time of indexing the content, I want to be able to index some additional metadata as well. I believe this metadata must be provided with the *literal* prefix. For instance, I have a field named “field1”, defined in
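
As a rough sketch of the prefixing described above: the extracting handler takes extra field values as request parameters named literal.<fieldname>. The helper below only builds those parameter names from a metadata map. The class name, method name and sample value are hypothetical, and how the parameters are then attached to the request (for example with setParam on the request object) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns field-name/value pairs into "literal." request
// parameters, preserving the order in which the metadata was supplied.
final class LiteralParamsSketch {

    static Map<String, String> toLiteralParams(Map<String, String> metadata) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            // e.g. field1 -> literal.field1
            params.put("literal." + e.getKey(), e.getValue());
        }
        return params;
    }
}
```

Each target field (here field1) still has to exist in the schema, which is presumably why the reply in this thread asks what the schema looks like.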

Indexing zip files

2010-04-27 Thread Sandhya Agarwal
Hello, I see that solr 1.4 is bundled with tika 0.4, which does not do proper content extraction of zip files. So, I replaced tika jars with the latest tika 0.7 jars. I still see an issue and the individual files in the zip file are not being indexed. Any configuration I must do to get this wor

dismax vs the standard query handlers

2010-04-20 Thread Sandhya Agarwal
Hello, What are the advantages of using the “dismax” query handler vs the “standard” query handler? As I understand it, “dismax” queries are parsed differently and provide more flexibility w.r.t. score boosting etc. Are there any other reasons? Thanks, Sandhya

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thanks Erick. Using parentheses works. With parentheses, the query q=field1: (this is a good string) is parsed as follows: +field1:this +field1:good +field1:string. Is that OK to do? Thanks, Sandhya -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tues
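
A small sketch of the grouping discussed above. In the standard Lucene query syntax a field prefix applies only to the term (or parenthesised group) that immediately follows it, so without parentheses the remaining words go to the default field. The helper names below are hypothetical and simply build query strings. They do not call Solr.

```java
import java.util.List;

// Hypothetical query-string helpers illustrating field grouping.
final class FieldQuerySketch {

    // field:(t1 t2 t3) -- every term in the group is searched in 'field';
    // how the terms are combined depends on the configured default operator.
    static String grouped(String field, List<String> terms) {
        return field + ":(" + String.join(" ", terms) + ")";
    }

    // field:t1 AND field:t2 AND field:t3 -- the same intent spelled out,
    // independent of the default operator.
    static String explicitAnd(String field, List<String> terms) {
        return field + ":" + String.join(" AND " + field + ":", terms);
    }
}
```

With defaultOperator set to AND, as mentioned elsewhere in this thread, the two forms should be equivalent for the terms that survive analysis.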

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Also, one of the fields here, *field3*, is a dynamic field. All the other fields except this one are copied into "text" with copyField. Thanks, Sandhya -----Original Message----- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Monday, April 19, 2010 2:55 PM To:

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thank You Mitch. I have a query mentioned below : (my defaultOperator is set to "AND") (field1 : This is a good string AND field2 : This is a good string AND field3 : This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR field4 : HTMLDocument) AND field5 : doc) This i

Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Hello, I am confused about the proper usage of the Boolean operators AND, OR and NOT. Could somebody please provide an easy-to-understand explanation? Thanks, Sandhya

Query regarding "copyField"

2010-04-18 Thread Sandhya Agarwal
Hello, Is it a problem if I use *copyField* for some fields and not for others? In my query, I have both kinds of fields: the ones mentioned in copyField and ones that are not copied to a common destination. Will this cause an anomaly in my search results? I am seeing some weird behavior. Thanks, Sandh

RE: solr numeric range queries

2010-04-14 Thread Sandhya Agarwal
ng Solr1.4 or later, take a look at solr trie range support. http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ Ankit -----Original Message----- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Wednesday, April 14, 2010 7:56 AM To:

RE: DIH

2010-04-14 Thread Sandhya Agarwal
sor (I think) FLEP walks the directory and supplies a separate record per file. BFDS pulls the file and supplies it to TikaEntityProcessor. BinFileDataSource is not documented, but you need it for binary data streams like PDF & Word. For text files, use FileDataSource. On 4/14/10, Sandh

DIH

2010-04-14 Thread Sandhya Agarwal
Hello, We want to design a solution where we have one polling directory (data source directory) containing the xml files, of all data that must be indexed. These XML files contain a reference to the content file. So, we need another datasource that must be created for the content files. Could s

RE: solr numeric range queries

2010-04-14 Thread Sandhya Agarwal
ay, April 14, 2010 5:09 PM To: solr-user@lucene.apache.org Subject: Re: solr numeric range queries On Apr 14, 2010, at 6:09 AM, Sandhya Agarwal wrote: > Hello, > > As I understand, we have to use the syntax { * TO } or [ * > TO ], for queries less than or less than or equal >

solr numeric range queries

2010-04-14 Thread Sandhya Agarwal
Hello, As I understand it, for a numeric field we have to use the range syntax { * TO } or [ * TO ] (with the upper bound after TO) for "less than" or "less than or equal to" queries. There is no direct < or <= syntax supported. Is that correct? Thanks, Sandhya
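
The two range forms mentioned above can be sketched as query strings: curly braces give an exclusive bound and square brackets an inclusive one. The class name, method names, field name and bound used here are hypothetical, and the helpers only build strings.

```java
// Hypothetical helpers that spell out open-ended upper-bound range queries.
final class RangeQuerySketch {

    // field:{* TO upper}  -- values strictly less than 'upper' (exclusive bound)
    static String lessThan(String field, String upper) {
        return field + ":{* TO " + upper + "}";
    }

    // field:[* TO upper]  -- values less than or equal to 'upper' (inclusive bound)
    static String lessThanOrEqual(String field, String upper) {
        return field + ":[* TO " + upper + "]";
    }
}
```

For example, lessThan("price", "100") yields price:{* TO 100}. Whether the comparison is truly numeric depends on the field type in the schema, which is what the trie range suggestion in the reply is about.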

RE: Internal Server Error

2010-04-13 Thread Sandhya Agarwal
arini wrote: > Some problem with extraction (Tika, etc...)? My suggestion is : try to > extract manually the document...I had a lot of problem with Tika and pdf > extraction... > > Cheers, > Andrea > > Il 13/04/2010 13:05, Sandhya Agarwal ha scritto: >> >> Hello,

Internal Server Error

2010-04-13 Thread Sandhya Agarwal
Hello, I have the following piece of code : ContentStreamUpdateRequest contentUpdateRequest = new ContentStreamUpdateRequest("/update/extract"); contentUpdateRequest.addFile(new File(contentFileName)); contentUpdateRequest.setParam("extractOnly","true"); NamedList result = solrServerSession.req