date:20100403

Re: deleted segments.gen

2010-04-03 Thread stockii


I got the same problem. Twice, deleting the index helped, but why does it happen?

Solr runs and everything seems okay, and then after a while I get the
FileNotFoundException: segement_XXX.gen is missing ... ^^ =(
-- 
View this message in context: 
http://n3.nabble.com/deleted-segments-gen-tp487421p694627.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Related terms/combined terms

2010-04-03 Thread MitchK


It sounds a little bit like clustering. 

Have a look at ClusteringComponent in the wiki. 
http://wiki.apache.org/solr/ClusteringComponent

Does this fit your needs?

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Related-terms-combined-terms-tp694083p694682.html

Minimum Should Match the other way round

2010-04-03 Thread MitchK


Hello,

I want to tinker a little bit with Solr, so I need some feedback:
Is it possible to define a Minimum Should Match for the document itself?

I mean, it is possible to say that a query "this is my query" should only
match a document if the document matches three of the four queried terms.

However, I am looking for the reverse: the document has to consist of the
query "this is my query" plus at most - for example - two other terms.

Example:
Query: "this is my query"
Doc1: "this is my favorite query"
Doc2: "I am searching for a lot of stuff, so this is my query"
Doc3: "I'd like to say: this is my query"

If at most two other terms may occur in the document, Solr
should return only Doc1.
If this is not possible out-of-the-box, I think one has to work with
TermVectors, am I right?

I think it's possible to do so outside of Lucene/Solr by taking the response
of the TermVectorsComponent and filtering the result list. But I'd like to
integrate this into Lucene/Solr itself.
Any ideas which components I have to customize? 
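
To make the requested rule concrete, here is a minimal sketch of the "filter outside of Lucene/Solr" approach in Python. It assumes the document text (or its term list, e.g. from a TermVectorsComponent response) is already available on the client; the names `tokens` and `matches_with_max_extra` are hypothetical, and the simple regex tokenizer is only a stand-in for whatever analysis the real field uses.

```python
import re


def tokens(text):
    """Lowercase the text and split it into simple word tokens (illustrative only)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_with_max_extra(query, doc, max_extra):
    """True if doc contains every query term and at most max_extra other terms."""
    query_terms = set(tokens(query))
    doc_terms = tokens(doc)
    if not query_terms.issubset(doc_terms):
        return False
    extra = [t for t in doc_terms if t not in query_terms]
    return len(extra) <= max_extra


query = "this is my query"
docs = {
    "Doc1": "this is my favorite query",
    "Doc2": "I am searching for a lot of stuff, so this is my query",
    "Doc3": "I'd like to say: this is my query",
}

# Keep only documents with at most two terms beyond the query terms.
hits = [name for name, text in docs.items()
        if matches_with_max_extra(query, text, max_extra=2)]
print(hits)
```

With the three example documents above, only Doc1 passes: it has one extra term, while Doc3 already has four.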

At the moment I am guessing that I have to customize the class that
collects the results before passing them to the ResponseWriter.

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Minimum-Should-Match-the-other-way-round-tp694867p694867.html

Replicas not deleting old index.* folders

2010-04-03 Thread Peter Sturge

Hi,

I've got a question regarding replication index.* folders - can
anyone help?

Note: There is a somewhat related thread here:

http://www.lucidimagination.com/search/document/15a740cca17eed56/solr_1_4_replication_index_directories#e4b0af2f321204d7

I have a replica that is sent fetchindex commands on a periodic basis
when it's time to replicate (i.e. replication is managed by the server
application, not by replica polling).
The master that sends these fetchindex commands tells the replica to
replicate one of its cores, but which core it is changes over time.
This has the effect of the replica periodically saying: 'oh, these files
are totally different, I'll create a brand new index.* folder,
upload the master's files to it and reload'. On its own, this is absolutely
fine.
The problem is that any previous index folders are left lying around - i.e.
not deleted, so eventually (quickly for large indexes) the replica runs out
of disk space.

Is there a way to either tell the replica to always 'reuse' the /index
folder (ideal) regardless of file name/content, or set its deletionPolicy or
similar so that it deletes any and all 'old' index.* folders and only keeps
the current one?


Many thanks,
Peter

Re: selecting documents older than 4 hours

2010-04-03 Thread herceg_novi


Ok, 

Field type is as follows: 





I changed the system date to:

# date
Wed Mar 31 19:50:48 PDT 2010

Run the query: 
http://localhost:8983/solr/select/?q=last_update_date:[NOW/DAY-7DAYS%20TO%20NOW/HOUR-5HOURS]&fl=last_update_date&debugQuery=true

I should not be getting the 3 entries below with last update date
2010-03-31T19:40:34Z. 

NOW/HOUR-5HOURS evaluates to 2010-03-31T21:00:00 which should not be the
case if the current time is Wed Mar 31 19:50:48 PDT 2010. Is SOLR converting
NOW to GMT time? 
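
The arithmetic behind the bounds shown in the debug output can be checked with a short Python sketch. It assumes that NOW is evaluated in UTC and that `/HOUR` and `/DAY` round down to the start of the hour or day; the variable and function names are illustrative only, and PDT is taken as UTC-7.

```python
from datetime import datetime, timedelta, timezone

# Local clock reading from the `date` command above (PDT assumed to be UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")
local_now = datetime(2010, 3, 31, 19, 50, 48, tzinfo=PDT)

# The same instant expressed in UTC.
utc_now = local_now.astimezone(timezone.utc)


def round_down_hour(dt):
    """Equivalent of NOW/HOUR: truncate to the start of the hour."""
    return dt.replace(minute=0, second=0, microsecond=0)


def round_down_day(dt):
    """Equivalent of NOW/DAY: truncate to the start of the day."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def fmt(dt):
    """Format like the timestamps in the query output."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


lower = round_down_day(utc_now) - timedelta(days=7)     # NOW/DAY-7DAYS
upper = round_down_hour(utc_now) - timedelta(hours=5)   # NOW/HOUR-5HOURS

print(fmt(utc_now))  # 2010-04-01T02:50:48Z
print(fmt(lower))    # 2010-03-25T00:00:00Z
print(fmt(upper))    # 2010-03-31T21:00:00Z
```

Under these assumptions the computed range is exactly the one in the parsed query, and a stored value of 2010-03-31T19:40:34Z falls inside it.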

Here is the result: 




Response header: status 0, QTime 2
Params:
  q = last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-5HOURS]
  fl = last_update_date
  debugQuery = true

Results (last_update_date):
  2010-03-31T19:40:34Z
  2010-03-31T19:40:34Z
  2010-03-31T19:40:34Z

Debug:
  rawquerystring: last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-5HOURS]
  querystring: last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-5HOURS]
  parsedquery: last_update_date:[2010-03-25T00:00:00Z TO 2010-03-31T21:00:00Z]
  parsedquery_toString: last_update_date:[2010-03-25T00:00:00 TO 2010-03-31T21:00:00]

  explain (identical for all three documents):
  1.0 = (MATCH) ConstantScoreQuery(last_update_date:[2010-03-25T00:00:00 TO
  2010-03-31T21:00:00]), product of:
    1.0 = boost
    1.0 = queryNorm

  QParser: LuceneQParser

  timing: 2.0 total
    prepare: 1.0 (per component: 1.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    process: 1.0 (per component: 0.0, 0.0, 0.0, 0.0, 0.0, 1.0)





-- 
View this message in context: 
http://n3.nabble.com/selecting-documents-older-than-4-hours-tp689975p695037.html

Re: Replicas not deleting old index.* folders

2010-04-03 Thread Peter Sturge

Upon further investigation, I believe this is potentially quite a
serious situation for replication servers.

In SOLR-561, the 'create a new index.* folder' concept was introduced
mainly, as I understand it, because Windows locks files/folders that are in
use.
I'm not sure why this is a problem, given that it is only the Solr process
itself that is 'using' these, so the file handles can simply be closed (by
terminating fSyncService or similar), the folder deleted, and work resumed.
This is somewhat by-the-by, as the code is out there now.

The real issue that remains is that whenever the slave feels it needs to do
a full copy, any existing index folders are left behind. For large indexes
and/or long-running slaves, this is a path to disk starvation.
From the admittedly little I know about SnapPuller, I've come up with 2
possible solutions:

1. Change the inherent behaviour as described above, so that only 1 index
folder is ever used (i.e. /index unless an explicit index.properties is
specified).
2. Add an optional parameter that tells the SnapPuller to delete all index*
folders in dataDir except the new 'live' one.

I've modified SnapPuller in our test environment for Option 2, and this
works very well. It takes an optional str parameter in the /replication
section of solrconfig.xml:
   {{<str name="autoCleanOldIndexes">true</str>}}
This parameter is really only relevant for slaves, but maybe there's a use
case for masters.

This option is a little bit 'brute force'ish, and not as elegant as Option
1, but it does have the advantage of being completely transparent if
{{autoCleanOldIndexes}} is not specified.
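
The cleanup rule of Option 2 can be sketched outside of Solr as a small Python function. This is not the SnapPuller patch itself, only an illustration of the rule "delete every index* folder in dataDir except the live one"; the function name `clean_old_indexes` and the folder names in the usage example are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def clean_old_indexes(data_dir, live_name):
    """Delete every index* directory in data_dir except live_name.

    Returns the names of the directories that were removed.
    """
    removed = []
    for entry in sorted(Path(data_dir).iterdir()):
        if (entry.is_dir()
                and entry.name.startswith("index")
                and entry.name != live_name):
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


# Usage example on a throwaway directory (folder names are made up).
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("index", "index.20100401", "index.20100403", "spellchecker"):
        (Path(data_dir) / name).mkdir()
    removed = clean_old_indexes(data_dir, live_name="index.20100403")
    remaining = sorted(p.name for p in Path(data_dir).iterdir())
    print(removed, remaining)
```

Folders that do not start with "index" are left alone, so only stale index folders are affected.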

If the experts in this area feel it is worthwhile, I can create a JIRA issue
for this and associated SnapPuller patch. Comments most welcome.

Thanks,
Peter

On Sat, Apr 3, 2010 at 3:32 PM, Peter Sturge wrote:

> Hi,
>
> I've got a question regarding replication index.* folders - can
> anyone help?
>
> Note: There is a somewhat related thread here:
>
> http://www.lucidimagination.com/search/document/15a740cca17eed56/solr_1_4_replication_index_directories#e4b0af2f321204d7
>
> I have a replica that is sent fetchindex commands on a periodic basis
> when it's time to replicate (i.e. replication is managed by the server
> application, not by replica polling).
> The master that sends these fetchindex commands tells the replica to
> replicate one of its cores, but which core it is changes over time.
> This has the effect of the replica periodically saying: 'oh, these files
> are totally different, I'll create a brand new index.* folder,
> upload the master's files to it and reload'. On its own, this is absolutely
> fine.
> The problem is that any previous index folders are left lying around - i.e.
> not deleted, so eventually (quickly for large indexes) the replica runs out
> of disk space.
>
> Is there a way to either tell the replica to always 'reuse' the /index
> folder (ideal) regardless of file name/content, or set its deletionPolicy or
> similar so that it deletes any and all 'old' index.* folders and only keeps
> the current one?
>
>
> Many thanks,
> Peter
>
> 
>
>
