RE: Removing irrelevant URLS

2010-11-07 Thread Eric Martin
Subject: Re: Removing irrelevant URLS
You can always do a delete-by-query, but that presupposes you can form a query that would remove only those documents with URLs you want removed... Assuming you do this, an optimize would then physically remove the documents from your index

Re: Removing irrelevant URLS

2010-11-07 Thread Erick Erickson
You can always do a delete-by-query, but that presupposes you can form a query that would remove only those documents with URLs you want removed... Assuming you do this, an optimize would then physically remove the documents from your index (delete-by-query only marks the docs as deleted). Solr h
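
As a rough illustration of the delete-by-query, commit, and optimize sequence described in this reply, here is a minimal sketch in Python. It assumes a Solr instance reachable at http://localhost:8983/solr/update and an indexed field named `host` holding each document's hostname; neither the URL nor the field name comes from the thread, and the helper names (`delete_by_query_xml`, `post_update`, `purge_hosts`) are hypothetical. Adjust all of them to the actual schema and deployment.

```python
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape

# Assumption: a local Solr instance with a single-core XML update handler.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def delete_by_query_xml(field, values):
    """Build a <delete> message matching any of the given values in one field."""
    # Quote each value so dots and dashes are treated as literal text.
    clauses = " OR ".join(
        '{}:"{}"'.format(field, v.replace('"', '\\"')) for v in values
    )
    # Escape &, < and > so the query is safe inside the XML element.
    return "<delete><query>{}</query></delete>".format(escape(clauses))


def post_update(xml, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status code."""
    req = Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(req) as resp:
        return resp.status


def purge_hosts(hosts, field="host"):
    """Mark matching docs as deleted, commit, then optimize to reclaim space."""
    post_update(delete_by_query_xml(field, hosts))  # only marks docs as deleted
    post_update("<commit/>")                        # makes the deletes visible
    post_update("<optimize/>")                      # physically removes them
```

Calling `purge_hosts(["twitter.com"])` from a scheduled job would be one way to run the periodic cleanup asked about in the thread. It is worth running the same query as an ordinary search first to confirm it matches only the documents meant to be removed.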

Removing irrelevant URLS

2010-11-05 Thread Eric Martin
Hi, I have 100k URLs in my index. I specifically crawled sites relating to law. However, during my initial crawls I didn't specify urlfilters, so I am stuck with extrinsic and often irrelevant URLs like twitter, etc. Is there some way in Solr that I can run periodic URL cleanings to remov