Re: deleted segments.gen
I have the same problem. Twice now, deleting the index helped, but why does it happen? Solr runs and everything seems to be okay, and then after a while I get a FileNotFoundException saying that segments_XXX.gen is missing ... ^^ =( -- View this message in context: http://n3.nabble.com/deleted-segments-gen-tp487421p694627.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Related terms/combined terms
It sounds a little bit like clustering. Have a look at ClusteringComponent in the wiki: http://wiki.apache.org/solr/ClusteringComponent Does this fit your needs? Kind regards - Mitch
Minimum Should Match the other way round
Hello, I want to tinker a little bit with Solr, so I need a little feedback: Is it possible to define a Minimum Should Match for the document itself? I mean, it is possible to say that a query "this is my query" should only match a document if the document matches three of the four queried terms. However, I am looking for the reverse: given "this is my query", the document has to consist of this query plus at most, for example, two other terms. Example: Query: "this is my query" Doc1: "this is my favorite query" Doc2: "I am searching for a lot of stuff, so this is my query" Doc3: "I'd like to say: this is my query" If at most two other terms may occur in the document, Solr should return only Doc1. If this is not possible out of the box, I think one has to work with TermVectors, am I right? I think it's possible to do this outside of Lucene/Solr by taking the response of the TermVectorsComponent and filtering the result list. But I'd like to integrate this into Lucene/Solr itself. Any ideas which components I have to customize? At the moment I am speculating that I have to customize the class that collects the results before they are passed to the ResponseWriter. Kind regards - Mitch
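A minimal sketch of the "filter the result list outside of Lucene/Solr" variant described above, in Python. It assumes the document text (or its terms) is already available on the client; the function names and the simple regex tokenizer are hypothetical and are not Solr's analysis chain or the TermVectorsComponent response format.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (hypothetical analyzer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_extra_term_limit(query, doc_text, max_extra_terms):
    """Return True if the document contains every query term and at most
    max_extra_terms additional tokens beyond the query terms."""
    query_counts = Counter(tokenize(query))
    doc_counts = Counter(tokenize(doc_text))
    # Every query term must occur in the document at least as often as in the query.
    if any(doc_counts[term] < count for term, count in query_counts.items()):
        return False
    # Counter subtraction keeps only positive counts: the tokens not covered by the query.
    extra_terms = sum((doc_counts - query_counts).values())
    return extra_terms <= max_extra_terms


docs = [
    "this is my favorite query",
    "I am searching for a lot of stuff, so this is my query",
    "I'd like to say: this is my query",
]
matches = [d for d in docs if within_extra_term_limit("this is my query", d, 2)]
print(matches)
```

With the three example documents from the message, only the first one passes the filter. Doing the same thing inside Solr would need a custom component, which this sketch does not attempt.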
Replicas not deleting old index.* folders
Hi, I've got a question regarding replication index.* folders - can anyone help? Note: There is a somewhat related thread here: http://www.lucidimagination.com/search/document/15a740cca17eed56/solr_1_4_replication_index_directories#e4b0af2f321204d7 I have a replica that is sent fetchindex commands on a periodic basis when it's time to replicate (i.e. replication is managed by the server application, not by replica polling). The master that sends these fetchindex commands tells the replica to replicate one of its cores, but which core it is changes over time. This has the effect of the replica periodically saying: 'oh, these files are totally different, I'll create a brand new index.* folder, copy the master's files to it and reload'. On its own, this is absolutely fine. The problem is that any previous index folders are left lying around, i.e. not deleted, so eventually (quickly for large indexes) the replica runs out of disk space. Is there a way to either tell the replica to always 'reuse' the /index folder (ideal) regardless of file name/content, or set its deletionPolicy or similar so that it deletes any and all 'old' index.* folders and only keeps the current one? Many thanks, Peter
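For readers unfamiliar with the push-style setup described above, here is a small Python sketch of how such a fetchindex request could be assembled. The host names, port, and core name are hypothetical placeholders; the sketch only builds the URL for the ReplicationHandler's `command=fetchindex` with a `masterUrl` parameter and does not send anything.

```python
from urllib.parse import urlencode


def fetchindex_url(replica_base, master_core_base):
    """Build the URL that asks a replica to pull the index of one master core.

    replica_base and master_core_base are base URLs such as
    'http://replica-host:8983/solr' (hypothetical hosts).
    """
    params = urlencode({
        "command": "fetchindex",
        # Point the replica at the replication handler of the chosen master core.
        "masterUrl": master_core_base + "/replication",
    })
    return replica_base + "/replication?" + params


url = fetchindex_url("http://replica-host:8983/solr",
                     "http://master-host:8983/solr/core1")
print(url)
```

Changing the master core passed in from one call to the next corresponds to the situation in the message, where the replica sees a completely different set of files and starts a new index.* folder.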
Re: selecting documents older than 4 hours
Ok, Field type is as follows: I changed the date to: # date Wed Mar 31 19:50:48 PDT 2010 Then I ran the query: http://localhost:8983/solr/select/?q=last_update_date:[NOW/DAY-7DAYS%20TO%20NOW/HOUR-5HOURS]&fl=last_update_date&debugQuery=true I should not be getting the 3 entries below with last update date 2010-03-31T19:40:34Z. NOW/HOUR-5HOURS evaluates to 2010-03-31T21:00:00, which should not be the case if the current time is Wed Mar 31 19:50:48 PDT 2010. Is Solr converting NOW to GMT? Here is the result: 0 2 last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-5HOURS] last_update_date true 2010-03-31T19:40:34Z 2010-03-31T19:40:34Z 2010-03-31T19:40:34Z last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-5HOURS] last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-5HOURS] last_update_date:[2010-03-25T00:00:00Z TO 2010-03-31T21:00:00Z] last_update_date:[2010-03-25T00:00:00 TO 2010-03-31T21:00:00] 1.0 = (MATCH) ConstantScoreQuery(last_update_date:[2010-03-25T00:00:00 TO 2010-03-31T21:00:00]), product of: 1.0 = boost 1.0 = queryNorm 1.0 = (MATCH) ConstantScoreQuery(last_update_date:[2010-03-25T00:00:00 TO 2010-03-31T21:00:00]), product of: 1.0 = boost 1.0 = queryNorm 1.0 = (MATCH) ConstantScoreQuery(last_update_date:[2010-03-25T00:00:00 TO 2010-03-31T21:00:00]), product of: 1.0 = boost 1.0 = queryNorm LuceneQParser 2.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
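The bounds in the debug output above are what you get if NOW is taken in UTC rather than in the server's local time zone, which is how Solr date math works. The Python sketch below redoes the arithmetic for the clock value quoted in the message; the function name is hypothetical, and PDT is modelled as a fixed UTC-7 offset.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # Pacific Daylight Time as a fixed offset
FMT = "%Y-%m-%dT%H:%M:%SZ"


def date_math_bounds(local_now):
    """Evaluate NOW/DAY-7DAYS and NOW/HOUR-5HOURS with NOW converted to UTC."""
    now_utc = local_now.astimezone(timezone.utc)
    # NOW/DAY-7DAYS: round down to the start of the UTC day, then go back 7 days.
    lower = now_utc.replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=7)
    # NOW/HOUR-5HOURS: round down to the start of the UTC hour, then go back 5 hours.
    upper = now_utc.replace(minute=0, second=0, microsecond=0) - timedelta(hours=5)
    return lower.strftime(FMT), upper.strftime(FMT)


lower, upper = date_math_bounds(datetime(2010, 3, 31, 19, 50, 48, tzinfo=PDT))
print(lower, "TO", upper)
```

19:50:48 PDT on Mar 31 is 02:50:48 UTC on Apr 1, so the range becomes 2010-03-25T00:00:00Z TO 2010-03-31T21:00:00Z, matching the parsed query in the debug output. A stored value of 2010-03-31T19:40:34Z falls inside that range; if that value was actually a local wall-clock time written with a Z suffix (an assumption, the message does not say), it would compare as roughly seven hours older than it really is.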
Re: Replicas not deleting old index.* folders
Upon further investigation of this, I believe this is potentially quite a serious situation for replication servers. In SOLR-561, the 'create a new index.* folder' concept was introduced mainly, as I understand it, because Windows locks files/folders that are in use. I'm not sure why this is a problem, given that it is only the Solr process itself that is 'using' these, so the file handles can simply be closed (by terminating fSyncService or similar), the folder deleted, and processing carried on. This is somewhat by-the-by, as the code is out there now. The real issue that remains is that whenever the slave feels it needs to do a full copy, any existing index folders are left behind. For large indexes and/or long-running slaves, this is a path to disk starvation. From the admittedly little I know about SnapPuller, I've come up with 2 possible solutions: 1. Change the inherent behaviour as described above, so that only 1 index folder is ever used (i.e. /index unless an explicit index.properties is specified). 2. Add an optional parameter that tells the SnapPuller to delete all index* folders in dataDir except the new 'live' one. I've modified SnapPuller in our test environment for Option 2, and this works very well. This takes an optional str parameter in solrconfig.xml /replication as: {{<str name="autoCleanOldIndexes">true</str>}} This parameter is really only relevant for slaves, but maybe there's a use case for masters. This option is a little bit 'brute force'ish, and not as elegant as Option 1, but it does have the advantage of being completely transparent if {{autoCleanOldIndexes}} is not specified. If the experts in this area feel it is worthwhile, I can create a JIRA issue for this and an associated SnapPuller patch. Comments most welcome. Thanks, Peter On Sat, Apr 3, 2010 at 3:32 PM, Peter Sturge wrote:
> Hi, > > I've got a question regarding replication index.* folders - can anyone help? > > Note: There is a somewhat related thread here: > > http://www.lucidimagination.com/search/document/15a740cca17eed56/solr_1_4_replication_index_directories#e4b0af2f321204d7 > > I have a replica that is sent fetchindex commands on a periodic basis when it's time to replicate (i.e. replication is managed by the server application, not by replica polling). > The master that sends these fetchindex commands tells the replica to replicate one of its cores, but which core it is changes over time. > This has the effect of the replica periodically saying: 'oh, these files are totally different, I'll create a brand new index.* folder, copy the master's files to it and reload'. On its own, this is absolutely fine. > The problem is that any previous index folders are left lying around, i.e. not deleted, so eventually (quickly for large indexes) the replica runs out of disk space. > > Is there a way to either tell the replica to always 'reuse' the /index folder (ideal) regardless of file name/content, or set its deletionPolicy or similar so that it deletes any and all 'old' index.* folders and only keeps the current one? > > Many thanks, > Peter
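The behaviour described under Option 2 can be sketched as a standalone Python function. This is not the SnapPuller patch mentioned in the message (that change is in Java and is not shown in this thread); it is only an illustration of "delete every index* folder in dataDir except the live one". The function name and the idea of passing the live folder name in explicitly are assumptions made for the example.

```python
import shutil
from pathlib import Path


def clean_old_indexes(data_dir, live_index_name):
    """Delete every directory in data_dir whose name starts with 'index',
    except the one currently in use. Returns the names that were removed."""
    removed = []
    for entry in sorted(Path(data_dir).iterdir()):
        # Plain files such as index.properties are skipped by the is_dir() check.
        if not entry.is_dir():
            continue
        if entry.name.startswith("index") and entry.name != live_index_name:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Running something like this while Solr still holds files open in an old folder could fail or misbehave on Windows, which is the file-locking concern raised above, so it would only be safe after the new index has been loaded and the old searcher closed.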