Re: Pass Double Quotes using SolrJ

2009-04-06 Thread Shalin Shekhar Mangar
On Mon, Apr 6, 2009 at 10:56 AM, dabboo  wrote:

>
> I want to pass double quotes to my solr from the front end, so that it can
> return the specific results of that particular phrase which is there in
> double quotes.
>
> If I use HttpClient, it doesn't allow me to send the query in this format,
> as it throws an invalid query exception.
>
> I want to know if I can do this with the SolrJ client. If yes, can somebody
> please let me know how SolrJ is doing this and parsing this type of
> query.
>
>
Amit, look at
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters
for the list of characters that need to be escaped. Also look at the
ClientUtils.escapeQueryChars() method in SolrJ. I'm curious to know why you
are trying to roll your own Solr client when SolrJ exists?
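
For example, a minimal SolrJ sketch (the URL and query are illustrative, not
from this thread):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PhraseQueryExample {
        public static void main(String[] args) throws Exception {
            // SolrJ URL-encodes the parameters for you, quotes included,
            // so a quoted phrase can be passed through as-is
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("\"Glorious Revolution\"");
            QueryResponse rsp = server.query(query);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }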

-- 
Regards,
Shalin Shekhar Mangar.


How could I limit a specific field size?

2009-04-06 Thread Veselin K
Hello,

I'm trying to tune my Solr installation, specifically the search
results.

At present, my search queries return some standard fields like filename,
filepath and text of the matching file.

However the text field contains the full contents of the file, which is
not very efficient in my case.

I'd like to copy the "text" field to a field called "preview" and
then limit the "preview" field to just a few lines of text (or number of
terms). 

Then I could configure retrieving the "preview" field instead of "text"
upon search.

Is there a way to specify such size limits per field or something similar?


Thank you much.

Regards,
Veselin K


Boost fields at indexing or query time

2009-04-06 Thread Marc Sturlese

Hey there,
Don't know if I should ask this in here or in the Lucene Users forum...
I have a doubt about field boosting (I am using dismax). I use document
boosting at index time to give more importance to some documents. At this
point, I don't care about the matching, I just want to tell Solr/Lucene
that those documents are more important than the others.
On the other hand, I give a field boost at query time to some fields because I
want to give more importance to a match in that field than in the others.
Up to here everything is clear... but I am missing the concept of field boost
at index time... what is it used for?
-- 
View this message in context: 
http://www.nabble.com/Boost-fileds-at-indexing-ot-query-time-tp22904463p22904463.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH API for specifying either a specific or all configurations to import

2009-04-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is a debug mode
http://wiki.apache.org/solr/DataImportHandler#head-0b0ff832aa29f5ba39c22b99603996e8a2f2d801
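
For example (illustrative host and port; the exact parameters are described on
that wiki page):

    http://localhost:8983/solr/dataimport?command=full-import&debug=on&verbose=true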

On Mon, Apr 6, 2009 at 2:35 PM, Wesley Small  wrote:
> Good Morning,
>
> Is there any way to specify or debug a specific DIH configuration via the
> API/http request?
>
> I have the following:
>
> 
> dih_pc_default_feed.xml
> 
> 
> dih_pc_cms_article_feed.xml
> 
> 
> dih_pc_local_event_feed.xml
> 
>
> For example, is there any way to specify that only the "pc_local_event" config
> be processed (imported)?


>
> Another question: if command=full-import, this should effectively mean that
> all DIH configurations are executed in sequential order.  Is that correct?  I
> am not seeing that behaviour at present.
>
> Thanks,
> Wesley
>
>



-- 
--Noble Paul


Re: How could I avoid reindexing same files?

2009-04-06 Thread Veselin K
Hello Paul,
I'm indexing with "curl http://localhost... -F myfile=@file.pdf"

Regards,
Veselin K


On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul നോബിള്‍ नोब्ळ् wrote:
> how are you indexing?
> 
> On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev
>  wrote:
> > Hello,
> > apologies for the basic question.
> >
> > How can I avoid double indexing files?
> >
> > In case all my files are in one folder which is scanned frequently, is
> > there a Solr feature of checking and skipping a file if it has already been 
> > indexed
> > and not changed since?
> >
> >
> > Thank you.
> >
> > Regards,
> > Veselin K
> >
> >
> 
> 
> 
> -- 
> --Noble Paul


Re: DIH API for specifying either a specific or all configurations to import

2009-04-06 Thread Fergus McMenemie
>Good Morning,
>
>Is there any way to specify or debug a specific DIH configuration via the
>API/http request?
>
>I have the following:
>
>
>dih_pc_default_feed.xml
>
>
>dih_pc_cms_article_feed.xml
>
>
>dih_pc_local_event_feed.xml
>
>
>For example, is there any way to specify that only the "pc_local_event" config
>be processed (imported)?
>
>Another question: if command=full-import, this should effectively mean that
>all DIH configurations are executed in sequential order.  Is that correct?  I
>am not seeing that behaviour at present.
>

Wesley,

I do not think the above is valid syntactically.

I am still coming up to speed on DIH; however, I have taken to storing all
my DIH import configurations in a single file. Each of your different
configurations would be within its own top-level entity tag, each of which
MUST be named. It is also a good idea to explicitly name each of your
datasource descriptions, and then have the entities reference their datasource
by name. I can then invoke only that entity from the URL as follows:-

http://localhost:8080/apache-solr-1.4-dev/dataimport?command=full-import&entity=jc

See the docs at:-

http://wiki.apache.org/solr/DataImportHandler#head-1582242c1bfc1f3e89f4025bf2055791848acefb
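
A minimal sketch of such a single-file data-config.xml (entity names, queries
and the JDBC details are illustrative, not from an actual setup):

    <dataConfig>
      <dataSource name="ds1" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/db" user="user" password="pass"/>
      <document>
        <entity name="jc" dataSource="ds1"
                query="select id, title from jc_table"/>
        <entity name="pc_local_event" dataSource="ds1"
                query="select id, title from event_table"/>
      </document>
    </dataConfig>

Each top-level entity can then be invoked on its own via &entity=<name>, as in
the URL above.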

Fergus.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


maxBufferedDocs

2009-04-06 Thread Gargate, Siddharth
I see two entries for the maxBufferedDocs property in solrconfig.xml: one in
the indexDefaults tag and the other in the mainIndex tag, commented as
Deprecated. So is this property required, and does it get used? What if I
remove the indexDefaults tag altogether?
 
Thanks,
Siddharth


What is QTime a measure of?

2009-04-06 Thread Andrew McCombe
Hi

Just started using Solr/Lucene and am getting to grips with it.  Great
product!

What is QTime a measure of?  Is it milliseconds, or seconds?  I tried a
Google search but couldn't find anything definitive.

Thanks In Advance

Andrew McCombe


Re: How could I limit a specific field size?

2009-04-06 Thread Shalin Shekhar Mangar
On Mon, Apr 6, 2009 at 1:52 PM, Veselin K wrote:

>
> I'd like to copy the "text" field to a field called "preview" and
> then limit the "preview" field to just a few lines of text (or number of
> terms).
>
> Then I could configure retrieving the "preview" field instead of "text"
> upon search.
>
> Is there a way to specify such size limits per field or something similar?
>
>
Yes, there is a maxLength attribute for a copyField which you can use:
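
For example (values illustrative; note Koji's correction later in this digest:
the attribute is maxChars, not maxLength):

    <copyField source="text" dest="preview" maxChars="200"/>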



-- 
Regards,
Shalin Shekhar Mangar.


How could I avoid reindexing same files?

2009-04-06 Thread Veselin Kantsev
Hello,
apologies for the basic question.

How can I avoid double indexing files?

In case all my files are in one folder which is scanned frequently, is
there a Solr feature of checking and skipping a file if it has already been 
indexed
and not changed since?


Thank you.

Regards,
Veselin K



Re: How could I avoid reindexing same files?

2009-04-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
how are you indexing?

On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev
 wrote:
> Hello,
> apologies for the basic question.
>
> How can I avoid double indexing files?
>
> In case all my files are in one folder which is scanned frequently, is
> there a Solr feature of checking and skipping a file if it has already been 
> indexed
> and not changed since?
>
>
> Thank you.
>
> Regards,
> Veselin K
>
>



-- 
--Noble Paul


DIH API for specifying either a specific or all configurations to import

2009-04-06 Thread Wesley Small
Good Morning,

Is there any way to specify or debug a specific DIH configuration via the
API/http request?

I have the following:


dih_pc_default_feed.xml


dih_pc_cms_article_feed.xml


dih_pc_local_event_feed.xml


For example, is there any way to specify that only the "pc_local_event" config
be processed (imported)?

Another question: if command=full-import, this should effectively mean that
all DIH configurations are executed in sequential order.  Is that correct?  I
am not seeing that behaviour at present.

Thanks,
Wesley



Re: maxBufferedDocs

2009-04-06 Thread Marc Sturlese

maxBufferedDocs is deprecated; better to use ramBufferSizeMB. In case you have
both specified, the more restrictive one will be used.
You can remove the config in indexDefaults if you have your index
configuration in mainIndex.


Gargate, Siddharth wrote:
> 
> I see two entries for the maxBufferedDocs property in solrconfig.xml: one in
> the indexDefaults tag and the other in the mainIndex tag, commented as
> Deprecated. So is this property required, and does it get used? What if I
> remove the indexDefaults tag altogether?
>  
> Thanks,
> Siddharth
> 
> 

-- 
View this message in context: 
http://www.nabble.com/maxBufferedDocs-tp22905364p22905494.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH API for specifying either a specific or all configurations to import

2009-04-06 Thread Shalin Shekhar Mangar
On Mon, Apr 6, 2009 at 2:35 PM, Wesley Small wrote:

>
> Is there any way to specify or debug a specific DIH configuration via the
> API/http request?
>
> I have the following:
>
> 
> dih_pc_default_feed.xml
> 
> 
> dih_pc_cms_article_feed.xml
> 
> 
> dih_pc_local_event_feed.xml
> 
>

That is not a valid configuration. There can be only a single config (the
one specified under "defaults") per core.


>
> For example, is there any way to specify that only the "pc_local_event" config
> be processed (imported)?


Perhaps what you intend to do can be achieved through multiple root
entities in the same data-config.xml?


> Another question: if command=full-import, this should effectively mean that
> all DIH configurations are executed in sequential order.  Is that correct?
> I am not seeing that behaviour at present.


All root entities are executed sequentially. What behavior are you seeing?

-- 
Regards,
Shalin Shekhar Mangar.


Re: What is QTime a measure of?

2009-04-06 Thread Shalin Shekhar Mangar
On Mon, Apr 6, 2009 at 4:38 PM, Andrew McCombe  wrote:

>
> Just started using Solr/Lucene and am getting to grips with it.  Great
> product!


Welcome to Solr!


> What is QTime a measure of?  Is it milliseconds, or seconds?  I tried a
> Google search but couldn't find anything definitive.
>

QTime is the elapsed time (in milliseconds) between the arrival of the
request (when the SolrQueryRequest object is created) and the completion of
the request handler. In other words, it will tell you how long it took to
execute your query including things like query parsing, the actual search,
faceting etc.
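
For example, the header of a response that took 15 ms might look like this
(values illustrative):

    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">15</int>
    </lst>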
-- 
Regards,
Shalin Shekhar Mangar.


Re: How could I limit a specific field size?

2009-04-06 Thread Veselin Kantsev
Thank you very much Shalin.


Regards,
Veselin K

On Mon, Apr 06, 2009 at 02:19:05PM +0530, Shalin Shekhar Mangar wrote:
> On Mon, Apr 6, 2009 at 1:52 PM, Veselin K wrote:
> 
> >
> > I'd like to copy the "text" field to a field called "preview" and
> > then limit the "preview" field to just a few lines of text (or number of
> > terms).
> >
> > Then I could configure retrieving the "preview" field instead of "text"
> > upon search.
> >
> > Is there a way to specify such size limits per field or something similar?
> >
> >
> Yes, there is a maxLength attribute for a copyField which you can use:
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.


Re: Multiple Core schemas with single solr.solr.home

2009-04-06 Thread Walter Ferrara
The only issue you may have will be related to software that writes files in
solr-home; the only one I can think of is DIH's dataimport.properties. So if
you use DIH, you may want to make the dataimport.properties location
configurable dynamically, like an entry in data-config.xml; otherwise each
import on a core will change the file for all cores. Another (easier?
safer?) option would be to use symbolic links, i.e. make a dir per core and
add in each one a symbolic link for the xml files, so that they all read the
same.
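
As a sketch, the setup Shalin describes below (same instanceDir, different
dataDir per core) might look like this in solr.xml; the paths are illustrative,
and per-core dataDir is a trunk feature:

    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="core0" instanceDir="shared/" dataDir="/var/data/solr/core0"/>
        <core name="core1" instanceDir="shared/" dataDir="/var/data/solr/core1"/>
      </cores>
    </solr>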


On Sat, Apr 4, 2009 at 6:28 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Sat, Apr 4, 2009 at 9:51 PM, Rakesh Sinha wrote:
>
> > I am planning to configure a solr server with multiple cores, each with a
> > different schema, under a single solr.solr.home. Are
> > there any examples of this in the wiki? (The ones that I see have
> > a single schema.xml for a given solr.solr.home under the schema
> > directory.)
> >
> > Thanks for helping pointing to the same.
> >
>
> It should be possible, though I don't think there are any examples. You can
> specify the same instanceDir for different cores but a different dataDir
> (specifying dataDir in solr.xml is a trunk feature).
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


How to copy the Dynamic fields into one field

2009-04-06 Thread Radha C.
Hi,
 
Can I have a dynamic field in copyField, as in the sketch below?

Can anyone please tell me how to make the dynamic fields available in
one field, "all"?
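
A sketch of the kind of declaration being asked about (the field and type names
are illustrative); copyField does accept a dynamic-field glob as its source:

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
    <field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
    <copyField source="*_s" dest="all"/>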
 


Re: Boost fields at indexing or query time

2009-04-06 Thread Erick Erickson
From Hossman:

"Index time field boosts are a way to express things like 'this document's
title is worth twice as much as the title of most documents'. Query time
boosts are a way to express 'I care about matches on this clause of my query
twice as much as I do about matches to other clauses of my query'."

HTH
Erick

On Mon, Apr 6, 2009 at 4:38 AM, Marc Sturlese wrote:

>
> Hey there,
> Don't know if I should ask this in here or in the Lucene Users forum...
> I have a doubt about field boosting (I am using dismax). I use document
> boosting at index time to give more importance to some documents. At this
> point, I don't care about the matching, I just want to tell Solr/Lucene
> that those documents are more important than the others.
> On the other hand, I give a field boost at query time to some fields because I
> want to give more importance to a match in that field than in the others.
> Up to here everything is clear... but I am missing the concept of field boost
> at index time... what is it used for?
> --
> View this message in context:
> http://www.nabble.com/Boost-fileds-at-indexing-ot-query-time-tp22904463p22904463.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Custom sort based on arbitrary order

2009-04-06 Thread pris54

Hi, 

Apologies if this question has been answered already; I'm so new to Solr
(literally a few hours using it) that I still find some of the answers a bit
obscure.

I got Apache Solr working for a Drupal install, and I must implement ASAP a
custom order that is fairly simple: there is a list of venues and some of
them are more relevant than others (there is no logic, it's arbitrary, it's
not an alphabetical order). It'd be something like this:

Orange venue = 1
Red venue = 2
Blue venue = 3

So results where the venue is "orange" should go first, then "red" and finally
"blue".
Could you advise on the easiest way to get this example working?

Thanks a lot,
Paula
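
One common approach (a sketch, not from the thread): index an integer rank per
document and sort on it. Assuming a sortable-int field in schema.xml:

    <field name="venue_rank" type="sint" indexed="true" stored="false"/>

Documents from the Orange venue would be indexed with venue_rank=1, Red with 2,
Blue with 3, and queries would add:

    &sort=venue_rank asc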

-- 
View this message in context: 
http://www.nabble.com/Custom-sort-based-on-arbitrary-order-tp22908037p22908037.html
Sent from the Solr - User mailing list archive at Nabble.com.



Too many open files and background merge exceptions

2009-04-06 Thread Jarek Zgoda
I'm indexing a set of 50 small documents. I'm adding documents in  
batches of 1000. At the beginning I had a setup that optimized the  
index each 1 documents, but quickly I had to optimize after adding  
each batch of documents. Unfortunately, I'm still getting the "Too  
many open files" IO error on optimize. I went from mergeFactor of 25  
down to 10, but I'm still unable to optimize the index.


I have configuration:
false
256
2
2147483647
1

The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is  
1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not  
change the limit of file descriptors (currently: 1024). What more can  
I do?


--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
jarek.zg...@redefine.pl



Re: Too many open files and background merge exceptions

2009-04-06 Thread Jacob Singh
try ulimit -n5 or something

On Mon, Apr 6, 2009 at 6:28 PM, Jarek Zgoda  wrote:
> I'm indexing a set of 50 small documents. I'm adding documents in
> batches of 1000. At the beginning I had a setup that optimized the index
> each 1 documents, but quickly I had to optimize after adding each batch
> of documents. Unfortunately, I'm still getting the "Too many open files" IO
> error on optimize. I went from mergeFactor of 25 down to 10, but I'm still
> unable to optimize the index.
>
> I have configuration:
>    false
>    256
>    2
>    2147483647
>    1
>
> The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is
> 1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not
> change the limit of file descriptors (currently: 1024). What more can I do?
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> jarek.zg...@redefine.pl
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Too many open files and background merge exceptions

2009-04-06 Thread Walter Ferrara
You may try to put true in that useCompoundFile entry; this way indexing
should use far fewer file descriptors, but it will slow down indexing; see
http://issues.apache.org/jira/browse/LUCENE-888.
Try to see whether the lack of descriptors is related only to Solr. How are
you indexing: using solrj, or by posting xmls? Are the files being
opened/parsed on the same machine as Solr?
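
That is, in solrconfig.xml (sketch):

    <useCompoundFile>true</useCompoundFile>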

On Mon, Apr 6, 2009 at 2:58 PM, Jarek Zgoda  wrote:

> I'm indexing a set of 50 small documents. I'm adding documents in
> batches of 1000. At the beginning I had a setup that optimized the index
> each 1 documents, but quickly I had to optimize after adding each batch
> of documents. Unfortunately, I'm still getting the "Too many open files" IO
> error on optimize. I went from mergeFactor of 25 down to 10, but I'm still
> unable to optimize the index.
>
> I have configuration:
>false
>256
>2
>2147483647
>1
>
> The machine (2 core AMD64, 4GB RAM) is running Debian Linux, Java is
> 1.6.0_11 64-Bit, Solr is nightly build (2009-04-02). And no, I can not
> change the limit of file descriptors (currently: 1024). What more can I do?
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> jarek.zg...@redefine.pl
>
>


Re: Using ExtractingRequestHandler to index a large PDF ~solved

2009-04-06 Thread Fergus McMenemie
Hmmm,

Not sure how this all hangs together, but raising the upload size limit in my
solrconfig.xml sorted the problem. A sketch of the relevant setting follows.


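Presumably the change was to multipartUploadLimitInKB on the requestParsers
element, since the rejected request size (4585774 bytes) exceeded the
configured maximum (2097152 bytes = 2048 KB). A sketch with illustrative
values:

    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
to
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="20480" />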
Also, in my initial report of the issue I was misled by the log messages. The
mention of "oceania.pdf" refers to a previous successful Tika extract. There is
no mention in the logs of the filename that was rejected, nor any information
that would help me identify it!

Regards Fergus.

>Sorry if this is a FAQ; I suspect it could be. But how do I work around the 
>following:-
>
>INFO: [] webapp=/apache-solr-1.4-dev path=/update/extract 
>params={ext.def.fl=text&ext.literal.id=factbook/reference_maps/pdf/oceania.pdf}
> status=0 QTime=318 
>Apr 2, 2009 11:17:46 AM org.apache.solr.common.SolrException log
>SEVERE: 
>org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the 
>request was rejected because its size (4585774) exceeds the configured maximum 
>(2097152)
>   at 
> org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
>   at 
> org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
>   at 
> org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
>   at 
> org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
>   at 
> org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
>   at 
> org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
>   at 
> org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>   at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>   at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>   at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>   at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>
>Although the PDF is big, it contains very little text; it is a map.
>
>   "java -jar solr/lib/tika-0.3.jar -g" appears to have no trouble with it.
>
>Fergus...
>-- 
>
>===
>Fergus McMenemie   Email:fer...@twig.me.uk
>Techmore Ltd   Phone:(UK) 07721 376021
>
>Unix/Mac/Intranets Analyst Programmer
>===

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: How could I limit a specific field size?

2009-04-06 Thread Koji Sekiguchi

Shalin Shekhar Mangar wrote:

On Mon, Apr 6, 2009 at 1:52 PM, Veselin K wrote:

  

I'd like to copy the "text" field to a field called "preview" and
then limit the "preview" field to just a few lines of text (or number of
terms).

Then I could configure retrieving the "preview" field instead of "text"
upon search.

Is there a way to specify such size limits per field or something similar?




Yes, there is a maxLength attribute for a copyField which you can use:



  

Correction. Use maxChars, not maxLength.

Koji




Stemming and ISO Latin Accent filters together

2009-04-06 Thread Stéphane Tellier

Hi,

We're trying to apply the French Stemmer filter together with the ISO Latin
Accent filter for our index, but unfortunately we're seeing some bad
behavior for some searches. After many tries, I've found out that the
French Stemmer (or Snowball with language = "french") seems to be too
sensitive to accents: for example, we have a couple of documents with the
word "publiée". Normally, searching for "publiée", "publié", "publiee"
or "publie" should be equivalent and return the same results. But
in this case, "publie" and "publiee" do not work at all. I've tried the
same words after deactivating stemming and re-indexing, and
indeed, the results were good.
I've also tried changing the order of the filters in the schema, but
unfortunately that brings other kinds of problems.
I know this should be more a question for the Lucene community, but I'm
just curious whether someone using Solr and working with such a language has
encountered the same behavior and has somehow found a trick to fix the problem
by, for example, using another filter or using the protwords list feature of
Snowball.

Thanks.
-- 
View this message in context: 
http://www.nabble.com/Stemming-and-ISO-Latin-Accent-filters-together-tp22910690p22910690.html
Sent from the Solr - User mailing list archive at Nabble.com.



Pass Quoted Query to Solr

2009-04-06 Thread dabboo

Hi,

I am sending a query to the Solr search engine from my application using
HttpClient. I want to search for a specific title among those available.

For example, if a user wants to search for the book titled "Complete
Java Reference", I am sending this query to Solr with double quotes around
the search string. I have encoded the search criteria, but it is still giving
me an "Invalid Query Exception".

Please suggest if there is any way to pass this query to Solr. I am not
sure, but can we do this using the SolrJ client instead of HttpClient?

If yes, then how is SolrJ handling this kind of request?

Thanks,
Amit Garg


-- 
View this message in context: 
http://www.nabble.com/Pass-Quoted-Query-to-Solr-tp22911184p22911184.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExtractingRequestHandler Question

2009-04-06 Thread Venu Mittal
Hi Jacob,

Thanks for the reply. I am still trying to nail down this problem with the best
possible solution.
Yeah, I had thought about these 2 approaches, but both of them are going to
make my indexing slower.  Plus the fact that I will have at least 5 rich text
files associated with each document is not helping much either.

Anyways, I will explore and see if I can come up with anything better (maybe a
separate index for rich text docs).

Thanks,
Venu




From: Jacob Singh 
To: solr-user@lucene.apache.org
Sent: Saturday, April 4, 2009 9:59:13 PM
Subject: Re: ExtractingRequestHandler Question

Hi TIA,

I have the same desired requirement.  If you look up in the archives,
you might find a similar thread between myself and the always super
helpful Erik Hatcher.  Basically, it can't be done (right now).

You can however use the "ExtractOnly" request handler, and just get
the extracted text back from Solr, then use XPath to get out the
attributes and add them to the XML you are sending.

Not ideal, because the file has to be transferred twice.

The only other option is to send the file as per the instructions via
POST with its attributes as POST fields.

Keep in mind that Solr documents are immutable, which means they
cannot change.  When you update a document with the same primary key,
it will simply delete the existing one and add the new one.

hth,
Jacob

On Sat, Apr 4, 2009 at 5:59 AM, Venu Mittal  wrote:
> Hi,
>
> I am using ExtractingRequestHandler to index  rich text documents.
> The way I am doing it is: I get some data related to the document from the
> database and then post an XML (containing only this data) to Solr. Then I
> make another call to Solr, which sends the actual document to be indexed.
> But while doing so I am losing all the other data that is related to the
> document.
>
> Is this the right way to do handle it or am I missing out on something.
>
> TIA
>
>
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com



  

Re: Pass Double Quotes using SolrJ

2009-04-06 Thread dabboo

My application is using HttpClient. I will have to replace this with the
SolrJ client.

But does the SolrJ client support passing a query with double quotes in it,

like 
?q="Glorious Revolution"&qt=dismaxrequest

Thanks,
Amit Garg

Shalin Shekhar Mangar wrote:
> 
> On Mon, Apr 6, 2009 at 10:56 AM, dabboo  wrote:
> 
>>
>> I want to pass double quotes to my solr from the front end, so that it
>> can
>> return the specific results of that particular phrase which is there in
>> double quotes.
>>
>> If I use HttpClient, it doesn't allow me to send the query in this format,
>> as it throws an invalid query exception.
>>
>> I want to know if I can do this with the SolrJ client. If yes, can somebody
>> please let me know how SolrJ is doing this and parsing this type of
>> query.
>>
>>
> Amit, look at
> http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters
> for the list of characters that need to be escaped. Also look at the
> ClientUtils.escapeQueryChars() method in SolrJ. I'm curious to know why
> you are trying to roll your own Solr client when SolrJ exists?
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Pass-Double-Quotes-using-SolrJ-tp22902404p22912443.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Wildcard searches

2009-04-06 Thread Vauthrin, Laurent
So I've started making a QParserPlugin to handle phrase wild card
searches but I think I need a little bit of guidance.  In my plugin I've
subclassed the SolrQueryParser and overridden the getFieldQuery(...)
method so that I can handle queries that contain spaces and wildcards.
I naively tried to construct a WildcardQuery object from the query text
but that didn't seem to work.  What sort of Query object(s) should I be
using here?  (Note: the field I'm working with is an untokenized field).

Thanks,
Laurent
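
A sketch of the kind of override in question, assuming Lucene 2.4-era
QueryParser APIs (untested; the class name below is illustrative). One thing
worth noting: wildcard queries bypass analysis, so the query text must match
the untokenized indexed terms exactly, including case, which is a common
reason a WildcardQuery built this way appears not to work:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;
    import org.apache.solr.schema.IndexSchema;
    import org.apache.solr.search.SolrQueryParser;

    public class WildcardPhraseQueryParser extends SolrQueryParser {
        public WildcardPhraseQueryParser(IndexSchema schema, String defaultField) {
            super(schema, defaultField);
        }

        @Override
        protected Query getFieldQuery(String field, String queryText)
                throws ParseException {
            // quoted text containing a wildcard: skip analysis and build a
            // WildcardQuery against the untokenized field directly
            if (queryText.indexOf('*') >= 0 || queryText.indexOf('?') >= 0) {
                return new WildcardQuery(new Term(field, queryText));
            }
            return super.getFieldQuery(field, queryText);
        }
    }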

-Original Message-
From:
solr-user-return-20352-laurent.vauthrin=disney@lucene.apache.org
[mailto:solr-user-return-20352-laurent.vauthrin=disney@lucene.apache
.org] On Behalf Of Otis Gospodnetic
Sent: Wednesday, April 01, 2009 9:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Wildcard searches


Hi,

Another option for 1) is to use n-grams with token begin/end symbols.
Then you won't need to use wildcards at all, but you'll have a larger
index.

2) may be added to Lucene in the near future, actually; I saw a related
JIRA issue.  But in the meantime, yes, you could implement it via a
custom QParserPlugin.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: "Vauthrin, Laurent" 
> To: solr-user@lucene.apache.org
> Sent: Monday, March 30, 2009 5:45:30 PM
> Subject: Wildcard searches
> 
> Hello again,
> 
> I'm in the process of converting one of our services that was
previously
> using Lucene to use Solr instead.  The main focus here is to preserve
> backwards compatibility (even if some searches are not as efficient).
> There are currently two scenarios that are giving me problems right
now.
> 
> 1. Leading wildcard searches/suffix searches (e.g. *ickey)
> I've looked at https://issues.apache.org/jira/browse/SOLR-218.  Is the
> best approach to create a QParserPlugin and change the parser to allow
> leading wildcards - setAllowLeadingWildcard(true)?  At the moment
we're
> trying to avoid indexing terms in reverse order.
> 
> 2. Phrase searches with wildcards (e.g. "Mickey Mou*")
> From what I understand, Solr/Lucene doesn't support this but we used
to
> get results with the following code:
> 
> new WildcardQuery(new Term("U_name", " Mickey Mou*"))
> 
> Is it possible for me to allow this capability in a QParserPlugin?  Is
> there another way for me to do it?
> 
> Thanks,
> Laurent Vauthrin



solr 1.4 facet boost field according to another field

2009-04-06 Thread sunnyfr

Hi,

I have title, description and tag fields... Depending on where the searched
word is found, I would like to boost other fields, like nb_views or
rating, differently:

if the word is found in title, then nb_views^10 and rating^10
if the word is found in description, then nb_views^2 and rating^2

Thanks a lot for your help,
-- 
View this message in context: 
http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p22913642.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr 1.4 indexation or request > memory

2009-04-06 Thread sunnyfr

Hi 

I would like to know whether it uses less memory to facet or to weight a field
at index time than when I do it in a dismax request.

Thanks,

-- 
View this message in context: 
http://www.nabble.com/solr-1.4-indexation-or-request-%3E-memory-tp22913679p22913679.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr 1.4 memory jvm

2009-04-06 Thread sunnyfr

Hi,

Sorry, I can't find the issue: during replication my query response time
gets very slow.
I'm using the replication handler; is there a way to throttle the transfer
rate?

11G index size
8G ram 
20 requests/sec 
Java HotSpot(TM) 64-Bit Server VM


10.0-b22
Java HotSpot(TM) 64-Bit Server VM
4

-Xms4G
-Xmx5G
-XX:ScavengeBeforeFullGC
-XX:+UseConcMarkSweepGC
-XX:+HeapDumpOnOutOfMemoryError
-Xloggc:/data/solr/logs/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps


Is this a problem?
0.21
(error executing: uname -a)
(error executing: ulimit -n)
(error executing: uptime)

Thanks

-- 
View this message in context: 
http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22913742.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Remote Access To Schema Data

2009-04-06 Thread Fink, Clayton R.
The LukeRequest class gets me what I wanted. Thanks! 

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Friday, April 03, 2009 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Remote Access To Schema Data

On 4/3/09, Erik Hatcher  wrote:
>
> On Apr 3, 2009, at 9:26 AM, Shalin Shekhar Mangar wrote:
>> Note that the luke handler gives out a lot of information like term 
>> frequency and therefore takes a longer time to execute.
>
> It's fast if you say &numTerms=0 though, which is good enough to get 
> field/type info.

Nice. I didn't know that. Thanks Erik.

--
Regards,
Shalin Shekhar Mangar.


Term Counts/Term Frequency Vector Info

2009-04-06 Thread Fink, Clayton R.
I want the functionality that Lucene IndexReader.termDocs gives me. That or 
access on the document level to the term vector. This 
(http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector))
seems to suggest that this will be available in 1.4. Is there any way to do 
this in 1.3?

Thanks,

Clay



Re: Searching on multi-core Solr

2009-04-06 Thread vivek sar
Hi,

  Any help on this. I've looked at DistributedSearch on Wiki, but that
doesn't seem to be working for me on multi-core and multiple Solr
instances on the same box.

Scenario,

1) Two boxes (localhost, 10.4.x.x)
2) Two Solr instances on each box (8080 and 8085 ports)
3) Two cores on each instance (core0, core1)

I'm not sure how to construct my search on the above setup if I need
to search across all the cores on all the boxes. Here is what I'm
trying,

http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan

I get 404 error. Is this the right URL construction for my setup? How
else can I do this?

Thanks,
-vivek

On Fri, Apr 3, 2009 at 1:02 PM, vivek sar  wrote:
> Hi,
>
>  I've a multi-core system (one core per day), so there would be around
> 30 cores in a month on a box running one Solr instance. We have two
> boxes running the Solr instance, and input data is fed to them in
> round-robin fashion. Each box can have up to 30 cores in a month. Here
> are questions,
>
>  1) How would I search for a term in multiple cores on same box?
>
>  Single core I'm able to search like,
>   http://localhost:8080/solr/20090402/select?q=*:*
>
> 2) How would I search for a term in multiple cores on both boxes at
> the same time?
>
> 3) Is it possible to have two Solr instances on one box with one doing
> the indexing and other perform only searches on that index? The idea
> is have two JVMs with each doing its own task - I'm not sure whether
> the indexer process needs to know about searcher process - like do
> they need to have the same solr.xml (for multicore etc). We don't want
> to replicate the indexes also (we got very light search traffic, but
> very high indexing traffic) so they need to use the same index.
>
>
> Thanks,
> -vivek
>


Coming up with a model of memory usage

2009-04-06 Thread Joe Pollard
To combat our frequent OutOfMemory exceptions, I'm attempting to come up
with a model so that we can determine how much memory to give Solr based
on how much data we have (as we expand to more data types eligible to be
supported, this becomes more important).

Are there any published guidelines on how much memory a particular
document takes up in memory, based on the data types, etc?

I have several stored fields, numerous other non-stored fields, a
largish copyTo field, and I am doing some sorting on indexed, non-stored
fields.

Any pointers would be appreciated!

Thanks,
-Joe



Re: Searching on multi-core Solr

2009-04-06 Thread Fergus McMenemie
vivek,

404 from the URL you provided in the message! Similar URLs work
OK for me.

hmm try http://localhost:8080/solr/admin/cores?action=status and see 
if that gives a 404.

Also are you running a nightly build or a svn checkout? Using tomcat?
Perhaps it should be

http://localhost:8080/apache-solr-1.4-dev/admin/cores?action=status

Fergus.

>Hi,
>
>  Any help on this. I've looked at DistributedSearch on Wiki, but that
>doesn't seem to be working for me on multi-core and multiple Solr
>instances on the same box.
>
>Scenario,
>
>1) Two boxes (localhost, 10.4.x.x)
>2) Two Solr instances on each box (8080 and 8085 ports)
>3) Two cores on each instance (core0, core1)
>
>I'm not sure how to construct my search on the above setup if I need
>to search across all the cores on all the boxes. Here is what I'm
>trying,
>
>http://localhost:8080/solr/core0/select?shards=localhost:8080/solr/core0,localhost:8085/solr/core0,localhost:8080/solr/core1,localhost:8085/solr/core1,10.4.x.x:8080/solr/core0,10.4.x.x:8085/solr/core0,10.4.x.x:8080/solr/core1,10.4.x.x:8085/solr/core1&indent=true&q=vivek+japan
>
>I get 404 error. Is this the right URL construction for my setup? How
>else can I do this?
>
>Thanks,
>-vivek
>
>On Fri, Apr 3, 2009 at 1:02 PM, vivek sar  wrote:
>> Hi,
>>
>>  I've a multi-core system (one core per day), so there would be around
>> 30 cores in a month on a box running one Solr instance. We have two
>> boxes running the Solr instance, and input data is fed to them in
>> round-robin fashion. Each box can have up to 30 cores in a month. Here
>> are questions,
>>
>>  1) How would I search for a term in multiple cores on same box?
>>
>>  Single core I'm able to search like,
>>   http://localhost:8080/solr/20090402/select?q=*:*
>>
>> 2) How would I search for a term in multiple cores on both boxes at
>> the same time?
>>
>> 3) Is it possible to have two Solr instances on one box with one doing
>> the indexing and other perform only searches on that index? The idea
>> is have two JVMs with each doing its own task - I'm not sure whether
>> the indexer process needs to know about searcher process - like do
>> they need to have the same solr.xml (for multicore etc). We don't want
>> to replicate the indexes also (we got very light search traffic, but
>> very high indexing traffic) so they need to use the same index.
>>
>>
>> Thanks,
>> -vivek
>>

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Term Counts/Term Frequency Vector Info

2009-04-06 Thread Grant Ingersoll

See also http://wiki.apache.org/solr/TermsComponent

You might be able to apply these patches to 1.3 and have them work,
but there is no guarantee.  You can also get some termDocs-like
capabilities through Solr's faceting capabilities, but I am not aware
of any way to get at the term vector capabilities.


HTH,
Grant

On Apr 6, 2009, at 1:49 PM, Fink, Clayton R. wrote:

I want the functionality that Lucene IndexReader.termDocs gives me.  
That or access on the document level to the term vector. This
(http://wiki.apache.org/solr/TermVectorComponent?highlight=(term)|(vector))
seems to suggest that this will be available in 1.4. Is
there any way to do this in 1.3?


Thanks,

Clay



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: solr 1.4 memory jvm

2009-04-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
hi sunnyfr,

I wish to clarify something.

You say that the performance is poor "during" the replication.

I suspect that the performance is poor soon after the replication. The
reason being, replication is a low-CPU activity. If you think
otherwise, let me know how you found it out.

If the perf is low soon after the replication is completed (I mean after
the index files are downloaded and the searcher is getting opened), it is
understandable. That is the time when warming is done. Have you set up
auto-warming?
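
Auto-warming, for reference, is configured in solrconfig.xml; a minimal sketch
with illustrative values:

    <filterCache class="solr.LRUCache" size="512" initialSize="512"
                 autowarmCount="256"/>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">some warming query</str></lst>
      </arr>
    </listener>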

On Mon, Apr 6, 2009 at 11:12 PM, sunnyfr  wrote:
>
> Hi,
>
> Sorry, I can't find the issue: during replication my query response time
> gets very slow.
> I'm using the replication handler; is there a way to throttle the transfer
> rate?
>
> 11G index size
> 8G ram
> 20 requests/sec
> Java HotSpot(TM) 64-Bit Server VM
>
> 
> 10.0-b22
> Java HotSpot(TM) 64-Bit Server VM
> 4
>
> -Xms4G
> -Xmx5G
> -XX:ScavengeBeforeFullGC
> -XX:+UseConcMarkSweepGC
> -XX:+HeapDumpOnOutOfMemoryError
> -Xloggc:/data/solr/logs/gc.log
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> 
>
> Is this a problem?
> 0.21
> (error executing: uname -a)
> (error executing: ulimit -n)
> (error executing: uptime)
>
> Thanks
>
> --
> View this message in context: 
> http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22913742.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


response time

2009-04-06 Thread CIF Search
Hi,

I have around 10 Solr servers running indexes of around 80-85 GB each,
with 16,000,000 docs each. When I use distrib for querying, I am not
getting a satisfactory response time. My response time is around 4-5
seconds. Any suggestions to improve the response time for queries (to bring
it below 1 second)? Is the response slow due to the size of the index? I
have already gone through the pointers provided at:
http://wiki.apache.org/solr/SolrPerformanceFactors

Regards,
CI