Re: Cache full text into memory

2010-07-14 Thread findbestopensource
You have two options
1. Store the compressed text as part of stored field in Solr.
2. Using external caching.
http://www.findbestopensource.com/tagged/distributed-caching
You could use ehcache / Memcache / Membase.

The problem with external caching is that you need to synchronize deletions
and modifications yourself. Fetching the stored field from Solr is also faster.
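For option 1, something along these lines might work on the indexing side --
gzip the text yourself before adding it as a stored field (just a sketch; the
class and method names are made up, and you would still need to base64 or
otherwise handle the bytes when sending them to Solr):

import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class TextCompressor {
    // Gzip a document body before storing it in Solr as a compressed stored field.
    public static byte[] gzip(String text) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        try {
            gz.write(text.getBytes("UTF-8"));
        } finally {
            gz.close();
        }
        return bos.toByteArray();
    }
}

If I remember right, Solr 1.4 can also do this for you if you mark the field
with compressed="true" in schema.xml.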

Regards
Aditya
www.findbestopensource.com


On Wed, Jul 14, 2010 at 12:08 PM, Li Li  wrote:

> I want to cache full text in memory to improve performance.
> Full text is only used for highlighting in my application (but it's very
> time consuming: my avg query time is about 250ms, and I guess it would be
> about 50ms if I just fetched the top 10 full texts. Things get worse when
> fetching more full text because, on disk, it is scattered everywhere for a query).
> My full text per machine is about 200GB. The memory available for
> storing full text is about 10GB. So I want to compress it in memory.
> Supposing a compression ratio of 1:5, I can then load 1/4 of the full text into
> memory. I need a cache component for it. Has anyone faced this problem
> before? I need some advice. Is it possible to use external tools such
> as MemCached? Thank you.
>


Re: Ranking position in solr

2010-07-14 Thread Chamnap Chhorn
I sent this command: curl http://localhost:8081/solr/update -F stream.body='
', but it doesn't reload.

It doesn't reload automatically after every commit or optimize, unless I add a
new document and then commit.

Any idea?

On Tue, Jul 13, 2010 at 4:54 PM, Ahmet Arslan  wrote:

> > I'm using solr 1.4 and only one core.
> > The elevate xml file is quite big, and
> > i wonder can solr handle that? How to reload the core?
>
> Markus Jelsma's suggestion is more robust. You don't need to restart or
> reload anything. Put elevate.xml under the data directory. It will be reloaded
> automatically after every commit or optimize.
>
>
>
>


-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: Cache full text into memory

2010-07-14 Thread Li Li
I have already stored it in the lucene index. But it is on disk, and when a
query comes, Solr must seek the disk to get it. I am not familiar with the
lucene cache. I just want to fully use my memory: load 10GB of it
into memory, with an LRU strategy when the cache is full. To load more into
memory, I want to compress it "in memory". I don't care much about
disk space, so it doesn't matter whether or not it's compressed in lucene.

2010/7/14 findbestopensource :
> You have two options
> 1. Store the compressed text as part of stored field in Solr.
> 2. Using external caching.
> http://www.findbestopensource.com/tagged/distributed-caching
>    You could use ehcache / Memcache / Membase.
>
> The problem with external caching is you need to synchronize the deletions
> and modification. Fetching the stored field from Solr is also faster.
>
> Regards
> Aditya
> www.findbestopensource.com
>
>
> On Wed, Jul 14, 2010 at 12:08 PM, Li Li  wrote:
>
>>     I want to cache full text into memory to improve performance.
>> Full text is only used to highlight in my application(But it's very
>> time consuming, My avg query time is about 250ms, I guess it will cost
>> about 50ms if I just get top 10 full text. Things get worse when get
>> more full text because in disk, it scatters erverywhere for a query.).
>> My full text per machine is about 200GB. The memory available for
>> store full text is about 10GB. So I want to compress it in memory.
>> Suppose compression ratio is 1:5, then I can load 1/4 full text in
>> memory. I need a Cache component for it. Has anyone faced the problem
>> before? I need some advice. Is it possbile using external tools such
>> as MemCached? Thank you.
>>
>


Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve,

Thanks for your kind response. I checked PositionFilterFactory
(re-indexed as well) but that also didn't solve the problem. Interestingly,
the problem is not reproducible from Solr's Field Analysis page; it
manifests only when it's in a query.

I guess the subject of this post is not quite right: it's not that
ShingleFilter is failing, but that -- using ShingleFilter -- no score is
provided by the shingle field when I pass more terms than were indexed.
I observe this using debugQuery.

I had actually posted to solr-user but have received no response yet,
probably because the problem is not clear at first glance. However,
there's an example in the mail for anyone interested to try out and
check whether there's a problem. Let's see if I receive any
response.

-Ethan

On Tue, Jul 13, 2010 at 9:15 PM, Steven A Rowe  wrote:
> Hi Ethan,
>
> You'll probably get better answers about Solr specific stuff on the 
> solr-u...@a.l.o list.
>
> Check out PositionFilterFactory - it may address your issue:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
>
> Steve
>
>> -Original Message-
>> From: Ethan Collins [mailto:collins.eth...@gmail.com]
>> Sent: Tuesday, July 13, 2010 3:42 AM
>> To: java-u...@lucene.apache.org
>> Subject: ShingleFilter failing with more terms than index phrase
>>
>> I am using lucene 2.9.3 (via Solr 1.4.1) on windows and am trying to
>> understand ShingleFilter. I wrote the following code and find that if I
>> provide more words than the actual phrase indexed in the field, then the
>> search on that field fails (no score found with debugQuery=true).
>>
>> Here is an example to reproduce, with field names:
>> Id: 1
>> title_1: Nina Simone
>> title_2: I put a spell on you
>>
>> Query (dismax) with:
>> - “Nina Simone I put”  <- Fails i.e. no score shown from title_1 search
>> (using debugQuery)
>> - “Nina Simone” <- SUCCESS
>>
>> But, when I used Solr’s Field Analysis with the ‘shingle’ field (given
>> below) and tried “Nina Simone I put”, it succeeds. It’s only during the
>> query that no score is provided. I also checked ‘parsedquery’ and it shows
>> disjunctionMaxQuery issuing the string “Nina_Simone Simone_I I_put” to the
>> title_1 field.
>>
>> title_1 and title_2 fields are of type ‘shingle’, defined as:
>>
>>    > positionIncrementGap="100" indexed="true" stored="true">
>>        
>>            
>>            
>>            > maxShingleSize="2" outputUnigrams="false"/>
>>        
>>        
>>            
>>            
>>            > maxShingleSize="2" outputUnigrams="false"/>
>>        
>>    
>>
>> Note that I also have a catchall field which is text. I have qf set
>> to: 'id^2 catchall' and pf set to: 'title_1^1.5 title_2^1.2'
>>
>> If I am missing something or doing something wrong please let me know.
>>
>> -Ethan
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I have just provided you two options. Since you already store it as part of the
index, you could try external caching. Try using ehcache / Membase:
http://www.findbestopensource.com/tagged/distributed-caching . The caching
system will do LRU and is much more efficient.

On Wed, Jul 14, 2010 at 12:39 PM, Li Li  wrote:

> I have already store it in lucene index. But it is in disk and When a
> query come, it must seek the disk to get it. I am not familiar with
> lucene cache. I just want to fully use my memory that load 10GB of it
> in memory and a LRU stragety when cache full. To load more into
> memory, I want to compress it "in memory". I don't care much about
> disk space so whether or not it's compressed in lucene .
>
> 2010/7/14 findbestopensource :
>  > You have two options
> > 1. Store the compressed text as part of stored field in Solr.
> > 2. Using external caching.
> > http://www.findbestopensource.com/tagged/distributed-caching
> >You could use ehcache / Memcache / Membase.
> >
> > The problem with external caching is you need to synchronize the
> deletions
> > and modification. Fetching the stored field from Solr is also faster.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com
> >
> >
> > On Wed, Jul 14, 2010 at 12:08 PM, Li Li  wrote:
> >
> >> I want to cache full text into memory to improve performance.
> >> Full text is only used to highlight in my application(But it's very
> >> time consuming, My avg query time is about 250ms, I guess it will cost
> >> about 50ms if I just get top 10 full text. Things get worse when get
> >> more full text because in disk, it scatters erverywhere for a query.).
> >> My full text per machine is about 200GB. The memory available for
> >> store full text is about 10GB. So I want to compress it in memory.
> >> Suppose compression ratio is 1:5, then I can load 1/4 full text in
> >> memory. I need a Cache component for it. Has anyone faced the problem
> >> before? I need some advice. Is it possbile using external tools such
> >> as MemCached? Thank you.
> >>
> >
>


Re: Ranking position in solr

2010-07-14 Thread Ahmet Arslan
> I sent this command: curl http://localhost:8081/solr/update -F stream.body='
> ', but it doesn't reload.
> 
> It doesn't reload automatically after every commit or
> optimize unless I add
> new document then i commit.

Hmm. Maybe there is an easier way to force it (add an empty/dummy doc)?
But if you are okay with the core reload/restart, you can use this custom code 
to do it. You need to register it in solrconfig.xml. 

If you don't want to use custom code you need to use 
http://wiki.apache.org/solr/CoreAdmin#RELOAD

import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

public class DummyRequestHandler extends RequestHandlerBase {

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        try {
            // Reload the (default) core through its CoreContainer.
            req.getCore().getCoreDescriptor().getCoreContainer().reload("");
            rsp.add("message", "core reloaded successfully");
        } catch (final Throwable t) {
            rsp.add("message", t.getMessage());
        }
    }

    // RequestHandlerBase also requires getDescription(), getSource(),
    // getSourceId() and getVersion() to be implemented.
}
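To register it, something like this in solrconfig.xml should do (the handler
name and package are just examples):

<requestHandler name="/reload" class="com.example.DummyRequestHandler" />

Then hitting http://localhost:8081/solr/reload would trigger the reload.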







MultiValue dynamicField and copyField

2010-07-14 Thread Jan Simon Winkelmann
Hi everyone,

I was wondering if the following is possible somehow:





As in: using copyField to copy a multiValued field into another multiValued 
field.
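Roughly like this (the field names are just examples):

<dynamicField name="*_tags" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="all_tags" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="*_tags" dest="all_tags"/>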

Cheers,
Jan


Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
Hi Steve,

Thanks, wrapping with PositionFilter actually made the search and scoring
work -- I made a mistake while re-indexing last time.

Trying to analyze PositionFilter: I didn't understand why the earlier
search for 'Nina Simone I Put' failed, since at least the phrase 'Nina
Simone' should have matched against the title_0 field. Any clue?

I am also trying to understand the impact of PositionFilter on phrase
search quality and score. Unfortunately there is not much
literature/help to be found via Google.

-Ethan


Re: Cache full text into memory

2010-07-14 Thread Li Li
Thank you. I don't know which cache system to use. In my application,
the cache system must support a compression algorithm with a high
compression ratio and fast decompression speed (because every time something
is fetched from the cache, it must be decompressed).

2010/7/14 findbestopensource :
> I have just provided you two options. Since you already store as part of the
> index, You could try external caching. Try using ehcache / Membase
> http://www.findbestopensource.com/tagged/distributed-caching . The caching
> system will do LRU and is much more efficient.
>
> On Wed, Jul 14, 2010 at 12:39 PM, Li Li  wrote:
>
>> I have already store it in lucene index. But it is in disk and When a
>> query come, it must seek the disk to get it. I am not familiar with
>> lucene cache. I just want to fully use my memory that load 10GB of it
>> in memory and a LRU stragety when cache full. To load more into
>> memory, I want to compress it "in memory". I don't care much about
>> disk space so whether or not it's compressed in lucene .
>>
>> 2010/7/14 findbestopensource :
>>  > You have two options
>> > 1. Store the compressed text as part of stored field in Solr.
>> > 2. Using external caching.
>> > http://www.findbestopensource.com/tagged/distributed-caching
>> >    You could use ehcache / Memcache / Membase.
>> >
>> > The problem with external caching is you need to synchronize the
>> deletions
>> > and modification. Fetching the stored field from Solr is also faster.
>> >
>> > Regards
>> > Aditya
>> > www.findbestopensource.com
>> >
>> >
>> > On Wed, Jul 14, 2010 at 12:08 PM, Li Li  wrote:
>> >
>> >>     I want to cache full text into memory to improve performance.
>> >> Full text is only used to highlight in my application(But it's very
>> >> time consuming, My avg query time is about 250ms, I guess it will cost
>> >> about 50ms if I just get top 10 full text. Things get worse when get
>> >> more full text because in disk, it scatters erverywhere for a query.).
>> >> My full text per machine is about 200GB. The memory available for
>> >> store full text is about 10GB. So I want to compress it in memory.
>> >> Suppose compression ratio is 1:5, then I can load 1/4 full text in
>> >> memory. I need a Cache component for it. Has anyone faced the problem
>> >> before? I need some advice. Is it possbile using external tools such
>> >> as MemCached? Thank you.
>> >>
>> >
>>
>


Re: ShingleFilter failing with more terms than index phrase

2010-07-14 Thread Ethan Collins
> Trying to analyze PositionFilter: didn't understand why earlier the
> search of 'Nina Simone I Put' failed since atleast the phrase 'Nina
> Simone' should have matched against title_0 field. Any clue?

Please note that I have configured the ShingleFilter as bigrams without unigrams.

[Honestly, I am still struggling to understand why this one worked and the
earlier one didn't.]
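For reference, the field type now looks roughly like the sketch below (the
tokenizer and lowercase filter are just placeholders, not necessarily my exact
schema; the shingle and position filter settings are the ones discussed in
this thread):

<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
        <filter class="solr.PositionFilterFactory"/>
    </analyzer>
</fieldType>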

-Ethan


Re: Cache full text into memory

2010-07-14 Thread findbestopensource
I doubt it. A caching system is a key-value store; you have to use a
compression library yourself to compress and decompress your data. The caching
system just helps you retrieve it fast. Anyway, please take a look at the
features of each caching system.

Regards
Aditya
www.findbestopensource.com



On Wed, Jul 14, 2010 at 3:06 PM, Li Li  wrote:

> Thank you. I don't know which cache system to use. In my application,
> the cache system must support compression algorithm which has high
> compression ratio and fast decompression speed(because each time it
> get from cache, it must decompress).
>
> 2010/7/14 findbestopensource :
> > I have just provided you two options. Since you already store as part of
> the
> > index, You could try external caching. Try using ehcache / Membase
> > http://www.findbestopensource.com/tagged/distributed-caching . The
> caching
> > system will do LRU and is much more efficient.
> >
> > On Wed, Jul 14, 2010 at 12:39 PM, Li Li  wrote:
> >
> >> I have already store it in lucene index. But it is in disk and When a
> >> query come, it must seek the disk to get it. I am not familiar with
> >> lucene cache. I just want to fully use my memory that load 10GB of it
> >> in memory and a LRU stragety when cache full. To load more into
> >> memory, I want to compress it "in memory". I don't care much about
> >> disk space so whether or not it's compressed in lucene .
> >>
> >> 2010/7/14 findbestopensource :
> >>  > You have two options
> >> > 1. Store the compressed text as part of stored field in Solr.
> >> > 2. Using external caching.
> >> > http://www.findbestopensource.com/tagged/distributed-caching
> >> >You could use ehcache / Memcache / Membase.
> >> >
> >> > The problem with external caching is you need to synchronize the
> >> deletions
> >> > and modification. Fetching the stored field from Solr is also faster.
> >> >
> >> > Regards
> >> > Aditya
> >> > www.findbestopensource.com
> >> >
> >> >
> >> > On Wed, Jul 14, 2010 at 12:08 PM, Li Li  wrote:
> >> >
> >> >> I want to cache full text into memory to improve performance.
> >> >> Full text is only used to highlight in my application(But it's very
> >> >> time consuming, My avg query time is about 250ms, I guess it will
> cost
> >> >> about 50ms if I just get top 10 full text. Things get worse when get
> >> >> more full text because in disk, it scatters erverywhere for a
> query.).
> >> >> My full text per machine is about 200GB. The memory available for
> >> >> store full text is about 10GB. So I want to compress it in memory.
> >> >> Suppose compression ratio is 1:5, then I can load 1/4 full text in
> >> >> memory. I need a Cache component for it. Has anyone faced the problem
> >> >> before? I need some advice. Is it possbile using external tools such
> >> >> as MemCached? Thank you.
> >> >>
> >> >
> >>
> >
>


DataImporter

2010-07-14 Thread Amdebirhan, Samson, VF-Group
Hi all,

 

Can someone help me in this ?

 

When importing 2 different entities one by one (specified through the entity
parameter), why does the second import delete the index previously created
for the first entity, and vice-versa?

 

The documentation provided by the solr website reports that :

 

"entity : Name of an entity directly under the  tag. Use this
to execute one or more entities selectively. Multiple 'entity'
parameters can be passed on to run multiple entities at once. If nothing
is passed , all entities are executed".

 

The problem is that it results in the deletion of the index already present
for other entities.

 

 

Regards.

Sam

 

 

 



Re: DataImporter

2010-07-14 Thread Bilgin Ibryam
Is it possible that you have the same IDs in both entities?
Could you show here your entity mappings?

Bilgin Ibryam

On Wed, Jul 14, 2010 at 11:48 AM, Amdebirhan, Samson, VF-Group <
samson.amdebir...@vodafone.com> wrote:

> Hi all,
>
>
>
> Can someone help me in this ?
>
>
>
> Importing 2 different entities one by one (specifying through the entity
> parameter) why is the second import deleting the previous created index
> for first entity and vice-versa?
>
>
>
> The documentation provided by the solr website reports that :
>
>
>
> "entity : Name of an entity directly under the  tag. Use this
> to execute one or more entities selectively. Multiple 'entity'
> parameters can be passed on to run multiple entities at once. If nothing
> is passed , all entities are executed".
>
>
>
> The problem results the deletion of index already present for other
> entities.
>
>
>
>
>
> Regards.
>
> Sam
>
>
>
>
>
>
>
>


question on wild card

2010-07-14 Thread Mark N
I have a database field = hello world and I am indexing it to the *text* field
with the standard analyzer (text is a copy field in solr).

Now when the user gives a query   text:"hello world%"  , how is the query
interpreted in the background?

Are we actually searching    text:hello OR text:world%    (considering the
default operator is OR)?






-- 
Nipen Mark


RE: DataImporter

2010-07-14 Thread Amdebirhan, Samson, VF-Group
Hi Bilgin

That's right, I have the same primary key, but I am testing with the property 
"preImportDeleteQuery" in the entity tag of data_config.xml. Now it is working: 
it deletes only the indexes/docs of the entity I full-import, based on the field 
I declare in the preImportDeleteQuery.
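Roughly like this in data_config.xml (the table, the field names and the
entity_type marker field are just examples):

<entity name="books"
        query="select id, title, 'books' as entity_type from books"
        preImportDeleteQuery="entity_type:books">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <field column="entity_type" name="entity_type"/>
</entity>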

Thanks for your time.




-Original Message-
From: Bilgin Ibryam [mailto:bibr...@gmail.com] 
Sent: mercoledì 14 luglio 2010 14.46
To: solr-user@lucene.apache.org
Subject: Re: DataImporter

Is it possible that you have the same IDs in both entities?
Could you show here your entity mappings?

Bilgin Ibryam

On Wed, Jul 14, 2010 at 11:48 AM, Amdebirhan, Samson, VF-Group <
samson.amdebir...@vodafone.com> wrote:

> Hi all,
>
>
>
> Can someone help me in this ?
>
>
>
> Importing 2 different entities one by one (specifying through the entity
> parameter) why is the second import deleting the previous created index
> for first entity and vice-versa?
>
>
>
> The documentation provided by the solr website reports that :
>
>
>
> "entity : Name of an entity directly under the  tag. Use this
> to execute one or more entities selectively. Multiple 'entity'
> parameters can be passed on to run multiple entities at once. If nothing
> is passed , all entities are executed".
>
>
>
> The problem results the deletion of index already present for other
> entities.
>
>
>
>
>
> Regards.
>
> Sam
>
>
>
>
>
>
>
>


Re: Strange "the" when search with dismax

2010-07-14 Thread kenf_nc

Sounds like you want the 'text' fieldType (or equivalent) and are using
'string' or 'lowercase'. Those must match exactly (well, case-insensitively
in the case of 'lowercase').  The TextType field types (like 'text') do
tokenization, so matches will occur under many more conditions.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Strange-the-when-search-with-dismax-tp965473p966524.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MultiValue dynamicField and copyField

2010-07-14 Thread kenf_nc

Yep, my schema does this all day long.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiValue-dynamicField-and-copyField-tp965941p966536.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Strange "the" when search with dismax

2010-07-14 Thread Jonathan Rochkind
"the" sounds like it might be a stopword. Are you using stopwords in any
of your fields covered by the dismax search? But not in some of the
other fields covered by dismax? The combination of dismax and stopwords
can result in unexpected behavior if you aren't careful.

I wrote about this a bit here, you might find it helpful:
http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

marship wrote:
> Hi. All.
>I am using solr dismax to search over my books in db. I indexed them all 
> using solr.
>the problem I noticed today is,
> Everything start with I want to search for a book "
> The Girl Who Kicked the Hornet's Nest
> "
> but nothing is returned. I'm sure I have this book in DB. So I stripped some 
> keyword and finally I found when I search for "the girl who kicked hornet's 
> nest" , I got the book.
> Then I test more
> when I search for "the first world war", solr return the book successfully to 
> me.
> But when I search for "the first world war the", solr returns NOTHING!
>
>
> So strange!
> So the issue is, if there are 2 "the" in query keywords, solr/dismax simply 
> return nothing!
>
>
> Why is this happening?
>
>
> Please help.
> Thanks.
> Regards.
> Scott
>
>
>
>   


DIH: post-delta-import DB cleanup hook?

2010-07-14 Thread Joachim M

I'm updating my solr index using a "queue" table in my database.  When
records get updated, a row gets inserted into the queue table with pk,
timestamp, deleted flag, and status.  DIH made it easy to use this to
identify new/updated records as well as deletes.

I need to do some post processing however to either 1) update the status
field so these queue records can be archived/deleted or 2) delete the queue
record.  This is a functional requirement, since DIH can handle the old
queue records in there.

I've perused the list and didn't see any concrete suggestions for handling
this.  The key requirement is that I'm not just blindly deleting from the
queue table but updating or deleting by pk that was _actually_ processed by
DIH.  This way, if there is an error (onError="skip") we can detect this and
follow up.

Any thoughts?  I'm using 1.4 but could move to trunk if there were a
solution there.

Thanks --Joachim
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-post-delta-import-DB-cleanup-hook-tp966672p966672.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: MultiValue dynamicField and copyField

2010-07-14 Thread Jan Simon Winkelmann
I figured out where the problem was. The destination wildcard was actually 
matching the wrong field. I changed the fieldnames around a bit and now 
everything works fine. Thanks!

> -Ursprüngliche Nachricht-
> Von: kenf_nc [mailto:ken.fos...@realestate.com]
> Gesendet: Mittwoch, 14. Juli 2010 15:56
> An: solr-user@lucene.apache.org
> Betreff: Re: MultiValue dynamicField and copyField
> 
> 
> Yep, my schema does this all day long.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/MultiValue-dynamicField-and-
> copyField-tp965941p966536.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: Foreign characters question

2010-07-14 Thread Blargy

Thanks for the reply but that didn't help. 

Tomcat is accepting foreign characters, but for some reason when it reads the
synonyms file and encounters the character ñ, it doesn't appear correctly
in the Field Analysis admin. It shows up as �. If I query exactly for ñ it
will work, but the synonyms file is screwy.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p966740.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Foreign characters question

2010-07-14 Thread Robert Muir
is your synonyms file in UTF-8 encoding?

On Wed, Jul 14, 2010 at 11:11 AM, Blargy  wrote:

>
> Thanks for the reply but that didnt help.
>
> Tomcat is accepting foreign characters but for some reason when it reads
> the
> synonyms file and it encounters that character ñ it doesnt appear correctly
> in the Field Analysis admin. It shows up as �. If I query exactly for ñ it
> will work but the synonyms file is srcrewy.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p966740.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Robert Muir
rcm...@gmail.com


date boosting and dismax

2010-07-14 Thread Shawn Heisey
 I've started a couple of previous threads on this topic, but I did not 
have a good date field in my index to use at the time.  I now have a 
schema with the document's post_date in tdate format, so I would like to 
actually do some implementation.  Right now, we are not doing relevancy 
ranking at all - we sort by descending post_date.  We have been working 
on our application code so we can switch to dismax and use relevancy, 
but it's still important to have a small bias towards newer content.


The idea is nothing this list hasn't heard before - to give newer 
documents a slight relevancy boost.  An important sub-goal is to ensure 
that the adjustment doesn't render Solr's caches useless.  I'm thinking 
that this means that at a minimum, I need to round dates to a resolution 
of 1 day, but if it's doable, 1 week might be even better.  I do like 
the idea of having different boosts for different time ranges.


Can anyone give me a starting point on how to do this?  I will need 
actual URL examples and dismax configuration snippets.


Thanks,
Shawn



RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
I used this before my search term and it works well:

{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

It's enough that when I search for *:* the articles appear in
chronological order.

Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, July 14, 2010 11:47 AM
To: solr-user@lucene.apache.org
Subject: date boosting and dismax


  I've started a couple of previous threads on this topic, but I did not
have a good date field in my index to use at the time.  I now have a 
schema with the document's post_date in tdate format, so I would like to
actually do some implementation.  Right now, we are not doing relevancy 
ranking at all - we sort by descending post_date.  We have been working 
on our application code so we can switch to dismax and use relevancy, 
but it's still important to have a small bias towards newer content.

The idea is nothing this list hasn't heard before - to give newer 
documents a slight relevancy boost.  An important sub-goal is to ensure 
that the adjustment doesn't render Solr's caches useless.  I'm thinking 
that this means that at a minimum, I need to round dates to a resolution
of 1 day, but if it's doable, 1 week might be even better.  I do like 
the idea of having different boosts for different time ranges.

Can anyone give me a starting point on how to do this?  I will need 
actual URL examples and dismax configuration snippets.

Thanks,
Shawn



Re: Foreign characters question

2010-07-14 Thread Blargy

How can I tell and/or create a UTF-8 synonyms file? Do I have to instruct
solr that this file is UTF-8?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p967037.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Foreign characters question

2010-07-14 Thread Blargy

Nevermind. Apparently my IDE (Netbeans) was set to "No encoding"... wtf.
Changed it to UTF-8 and recreated the file and all is good now. Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p967058.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: date boosting and dismax

2010-07-14 Thread Shawn Heisey
One of the replies I got on a previous thread mentioned range queries, 
with this example:


[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Something like this seems more flexible, and into it, I read an 
implication that the performance would be better than the boost function 
you've shown, but I don't know how to actually put it into a URL or 
handler config.
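For concreteness, I imagine it would end up as something like the following in
a dismax handler (post_date rounded to the day to keep the caches useful, and
the boosts taken from the ranges above), but I'm not sure whether that's the
recommended way:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="bq">post_date:[NOW/DAY-6MONTHS TO NOW/DAY]^5.0</str>
    <str name="bq">post_date:[NOW/DAY-1YEARS TO NOW/DAY-6MONTHS]^3.0</str>
    <str name="bq">post_date:[NOW/DAY-2YEARS TO NOW/DAY-1YEARS]^2.0</str>
    <str name="bq">post_date:[* TO NOW/DAY-2YEARS]^1.0</str>
  </lst>
</requestHandler>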


I also seem to remember seeing something about how to do "less than" in 
range queries as well as the "less than or equal to" implied by the 
above, but I cannot find it now.


Thanks,
Shawn


On 7/14/2010 10:26 AM, Tim Gilbert wrote:

I used this before my search term and it works well:

{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

Its enough that when I search for *:* the articles appear in
chronological order.

Tim




RE: date boosting and dismax

2010-07-14 Thread Tim Gilbert
Re: flexibility.

This boost decays over time: the further it gets from now, the less
of a boost it receives.  You are right though, it doesn't allow a fine
degree of control, particularly if you don't want to smoothly decay the
boost.  I hadn't considered your suggestion, so I'll keep it in mind if
the need arises.

Re:  Adding boost to query:

I am no expert, but I did this and it worked:

SolrJ:  solrQuery.setQuery("{!boost
b=recip(ms(NOW,publishdate),3.16e-11,1,1)} " + queryparam);

Where queryparam is what you are searching for.  You quite literally
just prepend it.


Via http://localhost:8080/apache-solr-1.4.0/select, just prepend it to
your q= like this: 
q={!boost+b%3Drecip(ms(NOW,publishdate),3.16e-11,1,1)}+findthis

Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, July 14, 2010 1:16 PM
To: solr-user@lucene.apache.org
Subject: Re: date boosting and dismax

One of the replies I got on a previous thread mentioned range queries, 
with this example:

[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0

Something like this seems more flexible, and into it, I read an 
implication that the performance would be better than the boost function

you've shown, but I don't know how to actually put it into a URL or 
handler config.

I also seem to remember seeing something about how to do "less than" in 
range queries as well as the "less than or equal to" implied by the 
above, but I cannot find it now.

Thanks,
Shawn


On 7/14/2010 10:26 AM, Tim Gilbert wrote:
> I used this before my search term and it works well:
>
> {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}
>
> Its enough that when I search for *:* the articles appear in
> chronological order.
>
> Tim



Re: Foreign characters question

2010-07-14 Thread Robert Muir
On Wed, Jul 14, 2010 at 12:59 PM, Blargy  wrote:

>
> Nevermind. Apparently my IDE (Netbeans) was set to "No encoding"... wtf.
> Changed it to UTF-8 and recreated the file and all is good now. Thanks!
>
>
fyi I created an issue with your example here:
https://issues.apache.org/jira/browse/SOLR-2003

In this case, the wrong encoding could have been detected and saved you some
time...

-- 
Robert Muir
rcm...@gmail.com


Re: dismax and date boosts

2010-07-14 Thread Shawn Heisey

 I have finally figured out how to turn this off in Thunderbird 3:

Go to Tools, Options, Display, and turn off "Display emoticons as 
graphics".


On 4/12/2010 12:04 PM, Shawn Heisey wrote:

On 4/12/2010 11:55 AM, Shawn Heisey wrote:

[NOW-6MONTHS TO NOW]^5.0 ,
[NOW-1YEARS TO NOW-6MONTHS]^3.0
[NOW-2YEARS TO NOW-1YEARS]^2.0
[* TO NOW-2YEARS]^1.0


And here we have the perfect example of something I mentioned a while 
ago - my Thunderbird (v3.0.4 on Win7) turning Solr boost syntax into 
pretty exponents.  Attached a cropped screenshot.  If anyone has any 
idea how to turn this off, I would appreciate knowing how.


Shawn





Re: date boosting and dismax

2010-07-14 Thread Jonathan Rochkind

Shawn Heisey wrote:

[* TO NOW-2YEARS]^1.0
  


I also seem to remember seeing something about how to do "less than" in 
range queries as well as the "less than or equal to" implied by the 
above, but I cannot find it now.
  
Ranges with square brackets [] are inclusive. Ranges with curly braces {} are 
exclusive.  And you have a less than example above:


[* TO value]   is a 'less than or equal to value' (inclusive)
{* TO value}  is a 'less than, not including value' (exclusive)

Now, if you want inclusive on one end but exclusive on the other, I'm 
pretty sure you're out of luck. :)


Jonathan



Re: Using hl.regex.pattern to print complete lines

2010-07-14 Thread Peter Spam
Any other thoughts, Chris?  I've been messing with this a bit, and can't seem 
to get (?m)^.*$ to do what I want.

1) I don't care how many characters it returns, I'd like entire lines all the 
time
2) I just want it to always return 3 lines: the line before, the actual line, 
and the line after.
3) This should be like "grep -C1"

Thanks for your time!


-Pete

On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:

> Ah, this makes sense.  I've changed my regex to "(?m)^.*$", and it works 
> better, but I still get fragments before and after some returns.
> Thanks for the hint!
> 
> 
> -Pete
> 
> On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
> 
>> 
>> : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
>> : is available that is for getting entire field contents with search terms
>> : highlighted. To use it, set hl.useFastVectorHighlighter to true.
>> 
>> He doesn't want the entire field -- his stored field values contain 
>> multi-line strings (using newline characters) and he wants to make 
>> fragments per "line" (ie: bounded by newline characters, or the start/end 
>> of the entire field value)
>> 
>> Peter: i haven't looked at the code, but i expect that the problem is that 
>> the java regex engine isn't being used in a way that makes ^ and $ match 
>> any line boundary -- they are probably only matching the start/end of the 
>> field (and . is probably only matching non-newline characters)
>> 
>> java regexes support embedded flags (ie: "(?xyz)your regex") so you might 
>> try that (i don't remember what the correct modifier flag is for the 
>> multiline mode off the top of my head)
>> 
>> -Hoss
>> 
> 



Multiple cores or not?

2010-07-14 Thread scrapy

 

 Hi,

We are planning to host, on the same server, different websites that will use solr.

What would be the best approach?

One core with a field in the schema (site1, site2, etc.) and then add this to every
query

Or one core per site?

Thanks for your help





limiting the total number of documents matched

2010-07-14 Thread Paul
I'd like to limit the total number of documents that are returned for
a search, particularly when the sort order is not based on relevancy.

In other words, if the user searches for a very common term, they
might get tens of thousands of hits, and if they sort by "title", then
very high relevancy documents will be interspersed with very low
relevancy documents. I'd like to set a limit to the 1000 most relevant
documents, then sort those by title.

Is there a way to do this?

I guess I could always retrieve the top 1000 documents and sort them
in the client, but that seems particularly inefficient. I can't find
any other way to do this, though.

Thanks,
Paul


RE: limiting the total number of documents matched

2010-07-14 Thread Nagelberg, Kallin
So you want to take the top 1000 sorted by score, then sort those by another 
field. It's a strange case, and I can't think of a clean way to accomplish it. 
You could do it in two queries, where the first is by score and you only 
request your IDs to keep it snappy, then do a second query against the IDs and 
sort by your other field. 1000 seems like a lot for that approach, but who 
knows until you try it on your data.
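Something like this, as a sketch (field names made up):

1) q=your+query&fl=id&rows=1000                       <- default sort by score
2) q=id:(3 OR 17 OR 42 ...)&sort=title+asc&rows=1000  <- re-query those ids, sorted by title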

-Kallin Nagelberg 


-Original Message-
From: Paul [mailto:p...@nines.org] 
Sent: Wednesday, July 14, 2010 4:16 PM
To: solr-user
Subject: limiting the total number of documents matched

I'd like to limit the total number of documents that are returned for
a search, particularly when the sort order is not based on relevancy.

In other words, if the user searches for a very common term, they
might get tens of thousands of hits, and if they sort by "title", then
very high relevancy documents will be interspersed with very low
relevancy documents. I'd like to set a limit to the 1000 most relevant
documents, then sort those by title.

Is there a way to do this?

I guess I could always retrieve the top 1000 documents and sort them
in the client, but that seems particularly inefficient. I can't find
any other way to do this, though.

Thanks,
Paul


setting up clustering

2010-07-14 Thread Justin Lolofie
I'm trying to enable clustering in solr 1.4. I'm following these instructions:

http://wiki.apache.org/solr/ClusteringComponent

However, `ant get-libraries` fails for me. Before it tries to download
the 4 jar files, it tries to compile lucene? Is this necessary?

Has anyone gotten clustering working properly?

My next attempt was to just copy contrib/clustering/lib/*.jar and
contrib/clustering/lib/downloads/*.jar to WEB-INF/lib and enable
clustering in solrconfig.xml, but this doesn't work either and I can't
tell from the error log whether it just couldn't find the jar files or
if there is some other problem:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'


Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I was hoping for a way to do this purely by configuration and making
the correct GET requests, but if there is a way to do it by creating a
custom Request Handler, I suppose I could plunge into that. Would that
yield the best results, and would that be particularly difficult?

On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin
 wrote:
> So you want to take the top 1000 sorted by score, then sort those by another 
> field. It's a strange case, and I can't think of a clean way to accomplish 
> it. You could do it in two queries, where the first is by score and you only 
> request your IDs to keep it snappy, then do a second query against the IDs 
> and sort by your other field. 1000 seems like a lot for that approach, but 
> who knows until you try it on your data.
>
> -Kallin Nagelberg
>
>
> -Original Message-
> From: Paul [mailto:p...@nines.org]
> Sent: Wednesday, July 14, 2010 4:16 PM
> To: solr-user
> Subject: limiting the total number of documents matched
>
> I'd like to limit the total number of documents that are returned for
> a search, particularly when the sort order is not based on relevancy.
>
> In other words, if the user searches for a very common term, they
> might get tens of thousands of hits, and if they sort by "title", then
> very high relevancy documents will be interspersed with very low
> relevancy documents. I'd like to set a limit to the 1000 most relevant
> documents, then sort those by title.
>
> Is there a way to do this?
>
> I guess I could always retrieve the top 1000 documents and sort them
> in the client, but that seems particularly inefficient. I can't find
> any other way to do this, though.
>
> Thanks,
> Paul
>


Re: limiting the total number of documents matched

2010-07-14 Thread Paul
I thought of another way to do it, but I still have one thing I don't
know how to do. I could do the search without sorting for the 50th
page, then look at the relevancy score on the first item on that page,
then repeat the search, but add score > that relevancy as a parameter.
Is it possible to do a search with "score:[5 to *]"? It didn't work in
my first attempt.

On Wed, Jul 14, 2010 at 5:34 PM, Paul  wrote:
> I was hoping for a way to do this purely by configuration and making
> the correct GET requests, but if there is a way to do it by creating a
> custom Request Handler, I suppose I could plunge into that. Would that
> yield the best results, and would that be particularly difficult?
>
> On Wed, Jul 14, 2010 at 4:37 PM, Nagelberg, Kallin
>  wrote:
>> So you want to take the top 1000 sorted by score, then sort those by another 
>> field. It's a strange case, and I can't think of a clean way to accomplish 
>> it. You could do it in two queries, where the first is by score and you only 
>> request your IDs to keep it snappy, then do a second query against the IDs 
>> and sort by your other field. 1000 seems like a lot for that approach, but 
>> who knows until you try it on your data.
>>
>> -Kallin Nagelberg
>>
>>
>> -Original Message-
>> From: Paul [mailto:p...@nines.org]
>> Sent: Wednesday, July 14, 2010 4:16 PM
>> To: solr-user
>> Subject: limiting the total number of documents matched
>>
>> I'd like to limit the total number of documents that are returned for
>> a search, particularly when the sort order is not based on relevancy.
>>
>> In other words, if the user searches for a very common term, they
>> might get tens of thousands of hits, and if they sort by "title", then
>> very high relevancy documents will be interspersed with very low
>> relevancy documents. I'd like to set a limit to the 1000 most relevant
>> documents, then sort those by title.
>>
>> Is there a way to do this?
>>
>> I guess I could always retrieve the top 1000 documents and sort them
>> in the client, but that seems particularly inefficient. I can't find
>> any other way to do this, though.
>>
>> Thanks,
>> Paul
>>
>


Less convoluted way to query for an empty string?

2010-07-14 Thread Mat Brown
Hi all,

I can't seem to find a way to query for an empty string that is
simpler than this:

field_name:[* to ""]

Things that don't work:

field_name:""
field_name["" TO ""]

Is the one I'm using the simplest option? If so, is there a particular
reason the other ones I mention don't work? Just curious mostly.

Thanks!
Mat


Re: csv response writer

2010-07-14 Thread Tommy Chheng
  I fixed the path of the queryResponseWriter class in the example 
solrconfig.xml. This was successfully applied against solr 4.0 trunk.


A few quirks:

   * When I didn't specify a default Delimiter, it printed out null as
 delimiter. I couldn't figure out why because init(NamedList args)
 specifies it'll use a default of ","
 "organization"null"2"null"

   * If I don't specify the column names, the output doesn't put in
 empty "" correctly.
 eg: output has a mismatched number of commas.
 "organization","1","Test","Name","2"," ","200","8",
 "organization","4","Solar","4","0",

added the patch to https://issues.apache.org/jira/browse/SOLR-1925

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 7/13/10 1:41 PM, Erik Hatcher wrote:

Tommy,

It's not committed to trunk or any other branch at the moment, so no 
future released version until then.


Have you tested it out?  Any feedback we should incorporate?

When I can carve out some time over the next week or so I'll review 
and commit if there are no issues brought up.


Erik

On Jul 13, 2010, at 3:42 PM, Tommy Chheng wrote:


Hi,
Which next version of solr is the csv response writer set to be 
included in?

https://issues.apache.org/jira/browse/SOLR-1925

--
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: 
http://gradschoolnow.com






Re: Less convoluted way to query for an empty string?

2010-07-14 Thread Lukas Kahwe Smith

On 15.07.2010, at 00:09, Mat Brown wrote:

> Hi all,
> 
> I can't seem to find a way to query for an empty string that is
> simpler than this:
> 
> field_name:[* to ""]
> 
> Things that don't work:
> 
> field_name:""
> field_name["" TO ""]
> 
> Is the one I'm using the simplest option? If so, is there a particular
> reason the other ones I mention don't work? Just curious mostly.
> 

Yonik recently suggested:
Hmmm, if this is on a String field, it seemed to work for me.
http://localhost:8983/solr/select?debugQuery=on&q=foo_s:"";

The raw query parser would also work (it skips analysis):
http://localhost:8983/solr/select?debugQuery=on&q={!raw f=foo_s}

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: Solr search streaming/callback

2010-07-14 Thread Chris Hostetter

: I was wondering if anyone was aware of any existing functionality where
: clients/server components could register some search criteria and be
: notified of newly committed data matching the search when it becomes
: available

you can register a "postCommit" listener in your solrconfig.xml file ... 
that could either be a custom plugin to execute searches and "push" them 
somewhere else, or using the existing RunExecutableListener you could 
execute any command line app to "pull" the data on demand (and push it 
where ever you want) w/o customizing solr at all.
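something like this in solrconfig.xml, for example (the script path is
obviously just a placeholder):

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">/path/to/push-new-matches.sh</str>
    <str name="dir">.</str>
    <bool name="wait">false</bool>
  </listener>
</updateHandler>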




-Hoss



Re: Solr index optimizing help

2010-07-14 Thread Erick Erickson
Does your schema have a unique id specified? If so, is it possible that you
indexed many documents that had the same ID, thus deleting previous
documents with the same ID? That would account for it, but it's a shot in
the dark...

Best
Erick

On Tue, Jul 13, 2010 at 6:20 AM, Karthik K  wrote:

> Thanks a lot for the reply,
> is it independent of merge factor??
> My index size reduced a lot (almost by 40%) after optimization and i am
> worried that i might have lost the data. I have no deletes at all but a
> high
> merge factor. Any suggestions?
>
> Thanks,
> Karthik
>


Re: stemmed terms and phrases in a combined query

2010-07-14 Thread Chris Hostetter
: My question is how do i query that? 
: q=text_clean:Nike's new text_orig:"running shoes"
: seems like it would work, but not sure its the best way.

that's a perfectly good way to do it.

: Is there a way i can tell the parser(or extend it) so that every phrase
: query it will use one field and for others will use the default field?

not with any of the existing parsers.



-Hoss



Re: Using stored terms for faceting

2010-07-14 Thread Chris Hostetter

: is it possible to use the stored terms of a field for a faceted search?

No, the only thing stored fields can be used for is document-centric 
operations (ie: once you have a small set of individual docIds, you can 
access the stored fields to return to the user, or highlight, etc...)

: I mean, I don't want to get the term frequency per document as it is
: shown here:
: http://wiki.apache.org/solr/TermVectorComponentExampleOptions
: 
: I want to get the frequency of the term of my special search and show
: only the 10 most frequent terms and all the nice things that I can do
: for faceting.

i honestly have no idea what you are saying you want -- can you provide 
a concrete use case explaining what you mean?  describe some example data 
and then explain what type of logic would happen and what type of result 
you'd get back?



-Hoss



Re: question on wild card

2010-07-14 Thread Erick Erickson
The best way to understand how things are parsed is to go to the solr admin
page (Full interface link?) and click the "debug info" box and submit your
query. That'll tell you exactly what happens.

Alternatively, you can put &debugQuery=on on your URL...
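
For instance (host, core and field are whatever you use):

http://localhost:8983/solr/select?q=text:%22hello+world%25%22&debugQuery=on

and look at the parsedquery section of the response.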

HTH
Erick

On Wed, Jul 14, 2010 at 8:48 AM, Mark N  wrote:

> I have a database field  = hello world and i am indexing to *text* field
> with standard analyzer ( text is a copy field of solr)
>
> Now when user  gives a query   text:"hello world%"  , how does the query is
> interpreted in the background
>
> are we actually searchingtext: hello OR  text: world%( consider by
> default operator is OR )
>
>
>
>
>
>
> --
> Nipen Mark
>


Re: Strange "the" when search with dismax

2010-07-14 Thread Erick Erickson
If the other suggestions don't work, you need to show us the relevant
portions of your schema.xml, and probably query output with
&debug=on tacked on...

Here are some pointers for getting help...

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

2010/7/14 Jonathan Rochkind 

> "the" sounds like it might be a stopword. Are you using stopwords in any
> of your fields covered by the dismax search? But not in some of the
> other fields covered by dismax? the combination of dismax and stopwords
> can result in unexpected behavior if you aren't careful.
>
> I wrote about this a bit here, you might find it helpful:
> http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/
>
> marship wrote:
> > Hi. All.
> >I am using solr dismax to search over my books in db. I indexed them
> all using solr.
> >the problem I noticed today is,
> > Everything start with I want to search for a book "
> > The Girl Who Kicked the Hornet's Nest
> > "
> > but nothing is returned. I'm sure I have this book in DB. So I stripped
> some keyword and finally I found when I search for "the girl who kicked
> hornet's nest" , I got the book.
> > Then I test more
> > when I search for "the first world war", solr return the book
> successfully to me.
> > But when I search for "the first world war the", solr returns NOTHING!
> >
> >
> > So strange!
> > So the issue is, if there are 2 "the" in query keywords, solr/dismax
> simply return nothing!
> >
> >
> > Why is this happening?
> >
> >
> > Please help.
> > Thanks.
> > Regards.
> > Scott
> >
> >
> >
> >
>


Re: How to find first document for the ALL search

2010-07-14 Thread Chris Hostetter

: I have found that this search crashes:
: 
: /solr/select?q=*%3A*&fq=&start=0&rows=1&fl=id

Ouch .. that exception is kind of hairy.  it suggests that your index may 
have been corrupted in some way -- do you have any idea what happened?  
have you tried using the CheckIndex tool to see what it says?

(I'd hate to help you work around this but get bit by a timebomb of some 
other bad docs later)

: It looks like just that first document is bad. I am happy to delete it - but
: not sure how to get to it. Does anyone know how to find it?

CheckIndex might help ... if it doesn't, the next thing you might try is 
asking for a legitimate field name that you know no document has (ie: if 
you have a dynamicField with the pattern "str_*" because you have fields 
like "str_foo" and "str_bar" but you never have a field named 
"str_BOGUS", then use fl=str_BOGUS) and then add debugQuery=true to 
the URL -- the debug info should contain the id.

I'll be honest though: i'm guessing that if your example query doesn't 
work, my suggestion won't either -- because if you get that error just 
trying to access the "id" field, the same thing will probably happen when 
the debugComponent tries to look it up as well.



-Hoss



Re: range faceting with integers

2010-07-14 Thread Chris Hostetter

: Subject: range faceting with integers
: References: 
: In-Reply-To: 

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss



Re: Solr index optimizing help

2010-07-14 Thread Karthik K
yeah, that happened :( -- lost a lot of data because of it.
Can someone explain the terms numDocs and maxDoc? Will the difference
indicate the duplicates?

Thank you,
karthik


about warm up

2010-07-14 Thread Li Li
I want to load full text into an external cache, so I added some code
in newSearcher, where I found the warm up takes place. I add my code
before solr's warm up, which is configured in solrconfig.xml like this:

  
  ...
  


public void newSearcher(SolrIndexSearcher newSearcher,
                        SolrIndexSearcher currentSearcher) {
    warmTextCache(newSearcher, warmTextCache, new String[]{"title", "content"});

    for (NamedList nlst : (List) args.get("queries")) {

    }
}

In warmTextCache I need a reader to get some docs, roughly:

    for (int i = 0; i < ...; i++) { ... }

but the call blocks inside SolrCore.getSearcher (around line 1000), waiting
while _searcher is still null:

    if (onDeckSearchers > 0 && !forceNew && _searcher == null) {
        try {
            searcherLock.wait();
        } catch (InterruptedException e) {
            log.info(SolrException.toStr(e));
        }
    }

And about 5 minutes later it's ok.

So how can I get a "safe" reader in this situation?


Re: Solr index optimizing help

2010-07-14 Thread Otis Gospodnetic
Hi,

The difference indicates deletes.  Optimize the index (which expunges docs 
marked as deleted) and the difference disappears.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Karthik K 
> To: solr-user@lucene.apache.org
> Sent: Wed, July 14, 2010 10:26:12 PM
> Subject: Re: Solr index optimizing help
> 
> yeah, that happened :( ,lost lot of data because of it.
> Can some one explain  the terms numDocs and maxDoc ?? will the difference
> indicate the  duplicates??
> 
> Thank you,
> karthik
> 


Re: Multiple cores or not?

2010-07-14 Thread Otis Gospodnetic
Hello there,

I'm guessing the sites will be searched separately.  In that case I'd recommend 
a core for each site.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "scr...@asia.com" 
> To: solr-user@lucene.apache.org
> Sent: Wed, July 14, 2010 3:02:36 PM
> Subject: Multiple cores or not?
> 
> 
> 
> 
>  Hi,
> 
> We are planning to host on same server different  website that will use solr.
> 
> What will be the best?
> 
> One core with a  field i schema: site1, site2 etc... and then add this in 
> every 
>query
> 
> Or  one core per site?
> 
> Thanks for your help
> 
> 
> 
> 


How to speed up solr search speed

2010-07-14 Thread marship
Hi All.
I have a problem with distributed solr search. The issue is:
I have 76M documents spread over 76 solr instances; each instance handles
1M documents.
   Previously I put all 76 instances on a single server, and when I tested I found
that each run takes quite a long time, mostly 10-20s, to finish a
search.
   Now I have split these instances onto 2 servers, each one with 38 instances. The
search speed is about 5-10s each time.
10s is a bit unacceptable for me. Based on my observation, the slowness is
caused by disk operations, as all these instances are on the same server:
when I test each single instance it is really fast, always ~400ms, but when I use
distributed search I find some instances say they need 7000+ms.
   Our server has plenty of free memory. I am thinking, is there a way we
can make solr use more memory instead of the on-disk index, like loading all
indexes into memory so it can speed up?

Any help is welcome.
Thanks.
Regards.
Scott


how to eliminating scoring from a query?

2010-07-14 Thread oferiko

In http://www.lucidimagination.com/files/file/LIWP_WhatsNew_Solr1.4.pdf, under
the performance section, it mentions:
"Queries that don’t sort by score can eliminate scoring, which speeds up
queries"
How exactly can I do that? If I don't mention which sort I want, it
automatically sorts by "score desc".
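Is it maybe a matter of putting the restricting clauses in fq (filter queries
aren't scored) and sorting explicitly on a field? Something like (field names
made up):

http://localhost:8983/solr/select?q=*:*&fq=category:books&sort=price+asc&rows=10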

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-eliminating-scoring-from-a-query-tp968581p968581.html
Sent from the Solr - User mailing list archive at Nabble.com.