Re: Listing Terms by Ascending IDF value . . ?

2010-01-04 Thread Shalin Shekhar Mangar
On Tue, Jan 5, 2010 at 9:15 AM, Christopher Ball < christopher.b...@metaheuristica.com> wrote: > Hello, > > I am trying to get a list of highly unusual terms or phrases (for example a > TF of 1 or 2) within an entire index (essentially this would be the inverse > of how Luke gives 'top terms' on t

Re: Improvising solr queries

2010-01-04 Thread Shalin Shekhar Mangar
On Tue, Jan 5, 2010 at 11:16 AM, dipti khullar wrote: > > This assettype is variable. It can have around 6 values at a time. > But this is true that we apply facet mostly on just one field - assettype. > > Ian has a good point. You are faceting on assettype and you are also filtering on it so you

Re: Improvising solr queries

2010-01-04 Thread dipti khullar
Hey Ian This assettype is variable. It can have around 6 values at a time. But this is true that we apply facet mostly on just one field - assettype. Any idea if the use of date range queries is expensive? Also if Shalin can put in some comments on "sorting by date was pretty rough on CPU", I can

Re: Rules engine and Solr

2010-01-04 Thread Avlesh Singh
Thanks for the response, Shalin. I am still in two minds over doing it "inside" Solr versus "outside". I'll get back with more questions, if any. Cheers Avlesh On Mon, Jan 4, 2010 at 5:11 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Mon, Jan 4, 2010 at 10:24 AM, Avlesh Singh

Listing Terms by Ascending IDF value . . ?

2010-01-04 Thread Christopher Ball
Hello, I am trying to get a list of highly unusual terms or phrases (for example a TF of 1 or 2) within an entire index (essentially this would be the inverse of how Luke gives 'top terms' on the 'Overview' tab). I see how I can do this within a specific query using the Term Vector Componen

Re: Improvising solr queries

2010-01-04 Thread Ian Holsman
On 1/5/10 12:46 AM, Shalin Shekhar Mangar wrote: sitename:XYZ OR sitename:"All Sites") AND (localeid:1237400589415) AND > ((assettype:Gallery)) AND (rbcategory:"ABC XYZ" ) AND (startdate:[* TO > 2009-12-07T23:59:00Z] AND enddate:[2009-12-07T00:00:00Z TO > *])&rows=9&start=63&sort=date > desc

Re: Indexing the latests MS Office documents

2010-01-04 Thread Peter Wolanin
You must have been searching old documentation - I think Tika 0.3+ has support for the new MS formats, but don't take my word for it - why don't you build Tika and try it? -Peter On Sun, Jan 3, 2010 at 7:00 PM, Roland Villemoes wrote: > Hi All, > > Anyone who knows how to index the latest MS o

RE: Non-leading wildcard search

2010-01-04 Thread Peter S
FYI: I have found the root of this behaviour. It has to do with a test patch I've been working on for working 'round pre SOLR-219 (case insensitive wildcard searching). With the test patch switched out, it works as expected. Although the case insensitive wildcard search reverts to pre-SOLR

RE: Non-leading wildcard search

2010-01-04 Thread Peter S
Hi Yonik, Thanks for your quick reply. No, the queries themselves aren't in quotes. Since I sent the initial email, I have managed to get non-leading wildcard queries to work with this, but by unexpected means (for me at least :-). If I add a LowerCaseFilterFactory to the fieldType,

Re: Non-leading wildcard search

2010-01-04 Thread Yonik Seeley
On Mon, Jan 4, 2010 at 5:38 PM, Peter S wrote: > When I query:  "Something" or "Something Else" or "*thing"  or "*omething*", > I get back the expected results. > If, however, I query: "Some*" or "S*" or "s*" etc, I get no results (although > this type of non-leading wildcard works fine with oth

Non-leading wildcard search

2010-01-04 Thread Peter S
Hello, There are lots of questions and answers in the forum regarding varying wildcard behaviour, but I haven't been able to find any that address this particular behaviour. Perhaps someone could help? Problem: I have a fieldType that only goes through a KeywordTokenizer at index time, to ensure

Re: Improvising solr queries

2010-01-04 Thread Tom Hill
Hi - Something doesn't make sense to me here: On Mon, Jan 4, 2010 at 5:55 AM, dipti khullar wrote: > - optimize runs on master in every 7 minutes > - using postOptimize , we execute snapshooter on master > - snappuller/snapinstaller on 2 slaves runs after every 10 minutes > > Why would you optim

Re: Any way to modify result ranking using an integer field?

2010-01-04 Thread Andy
Thank you Ahmet. Is there any way I can configure Solr to always use {!boost b=log(popularity)} as the default for all queries? I'm using Solr through django-haystack, so all the Solr queries are actually generated by haystack. It'd be much cleaner if I could configure Solr to always use Boost

Phrase search issue with XMLPayload? Is it the better solution?

2010-01-04 Thread Shairon
I have a project that involves words extracted by OCR, each page has words, each word has its geometry to blink a highlight to end user. I've been trying represent this document structure by xml foo bar baz qux Using the field 'fulltext_st' ,

Re: Implementing Autocomplete/Query Suggest using Solr

2010-01-04 Thread Prasanna R
On Mon, Jan 4, 2010 at 1:20 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Wed, Dec 30, 2009 at 3:07 AM, Prasanna R wrote: > > > I looked into the Solr/Lucene classes and found the required > information. > > Am summarizing the same for the benefit of those that might refer to t

Re: Any way to modify result ranking using an integer field?

2010-01-04 Thread Ahmet Arslan
> Thanks Ahmet. > > Do I need to do anything to enable BoostQParserPlugin in > Solr, or is it already enabled? I just confirmed that it is already enabled. You can see its effect by appending &debugQuery=on to your search URL.
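
As a rough illustration of the exchange above, the sketch below builds a search URL whose q parameter carries the {!boost b=log(popularity)} prefix discussed in this thread and appends debugQuery=on. The base URL, the `popularity` field name, and the helper name `boosted_query_url` are assumptions made for illustration; they do not come from the original messages.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def boosted_query_url(base_url, user_query, boost_field="popularity", debug=True):
    """Build a Solr select URL whose q parameter uses the boost query parser.

    base_url and boost_field are hypothetical; adjust them to your own setup.
    """
    params = {"q": "{!boost b=log(%s)}%s" % (boost_field, user_query)}
    if debug:
        # debugQuery=on asks Solr to include score explanations in the response.
        params["debugQuery"] = "on"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = boosted_query_url("http://localhost:8983/solr/select", "ipod")
    print(url)
    # Decode the query string again to check what Solr would receive.
    print(parse_qs(urlsplit(url).query))
```

Comparing the explain section of the response with and without the boost prefix should show how the boost changes each document's score.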

Re: High Availability

2010-01-04 Thread rob
I'm also not sure what hooks you could put in upon the IP floating to the other machine, to start/stop replication - if it IS an issue anyway. On Mon 04/01/10 16:28 , Matthew Inger wrote: > So, when the masters switch back, does that mean, we have to force a > full delta update, correct? >

Re: High Availability

2010-01-04 Thread rob
Even when Master 1 is alive again, it shouldn't get the floating IP until Master 2 actually fails. So you'd ideally want them replicating to each other, but since one will only be updated/Live at a time, it shouldn't cause an issue with cobbling data (?). Just a suggestion tho, not done it myse

Re: High Availability

2010-01-04 Thread Matthew Inger
So, when the masters switch back, does that mean, we have to force a full delta update, correct? mattin...@yahoo.com "Once you start down the dark path, forever will it dominate your destiny. Consume you it will " - Yoda - Original Message From: "r...@intelcompute.com" To: so

Re: Facets and distributed search

2010-01-04 Thread Yonik Seeley
Something looks wrong... that type of slowdown is certainly not expected. You should be able to see both the main query and a sub-query in the logs... could you post an actual example? -Yonik http://www.lucidimagination.com On Mon, Jan 4, 2010 at 4:15 AM, Aleksander Stensby wrote: > Hi everyone

Re: High Availability

2010-01-04 Thread rob
Have you looked into a basic floating IP setup? Have the master also replicate to another hot-spare master. Any downtime during an outage of the 'live' master would be minimal as the hot-spare takes up the floating IP. On Mon 04/01/10 16:13 , Matthew Inger wrote: > I'm kind of stuck and l

High Availability

2010-01-04 Thread Matthew Inger
I'm kind of stuck and looking for suggestions for high availability options. I've figured out without much trouble how to get the master-slave replication working. This eliminates any single points of failure in the application in terms of the application's searching capability. I would setup

Re: Improvising solr queries

2010-01-04 Thread Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 7:25 PM, dipti khullar wrote: > Thanks Shalin. > > Following are the relevant details: > > There are 2 search servers in a virtualized VMware environment. Each has 2 > instances of Solr running on separates ports in tomcat. > Server 1: hosts 1 master(application 1), 1 slave

RE: Reverse sort facet query [SOLR-1672]

2010-01-04 Thread Peter 4U
> Date: Sun, 3 Jan 2010 22:18:33 -0800 > From: hossman_luc...@fucit.org > To: solr-user@lucene.apache.org > Subject: RE: Reverse sort facet query [SOLR-1672] > > > : Yes, I thought about adding some 'new syntax', but I opted for a separate > 'facet.sortorder' parameter, > : > : mainly beca

Re: Improvising solr queries

2010-01-04 Thread Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 6:39 PM, dipti khullar wrote: > We have tried out various configurations settings to improvise the > performance of the site which is majorly using Solr but still the response > time remains about 4-5 reqs/sec. We also did some performance tests on Solr > 1.4 but still there

Re: Invalid CRLF - StreamingUpdateSolrServer ?

2010-01-04 Thread Patrick Sauts
The issue was sometimes null result during facet navigation or simple search, results were back after a refresh, we tried to changed the cache to . But same behaviour. *My implementation was :* (maybe wrong ?) LBHttpSolrServer solrServer = new LBHttpSolrServer(new HttpClient(), new XMLResponse

Improvising solr queries

2010-01-04 Thread dipti khullar
Hi We have tried out various configurations settings to improvise the performance of the site which is majorly using Solr but still the response time remains about 4-5 reqs/sec. We also did some performance tests on Solr 1.4 but still there is a very minute improvement in performance. Currently we

Re: Invalid CRLF - StreamingUpdateSolrServer ?

2010-01-04 Thread Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 6:11 PM, Patrick Sauts wrote: > > I've also tested LBHttpSolrServer (We wanted to have it as a "backup" for > HAproxy) and it appears not to be thread safe ( what is also curious about > it, is that there's no way to manage the connections' pool ). If you're > interresting

Re: Invalid CRLF - StreamingUpdateSolrServer ?

2010-01-04 Thread Patrick Sauts
Thank you Yonik for your answer. The platform encoding is "fr_FR.UTF-8", so it's still UTF-8, it should be I guess "en_US.UTF-8" ? I've also tested LBHttpSolrServer (We wanted to have it as a "backup" for HAproxy) and it appears not to be thread safe ( what is also curious about it, is that

Re: Rules engine and Solr

2010-01-04 Thread Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 10:24 AM, Avlesh Singh wrote: > I have a Solr (version 1.3) powered search server running in production. > Search is keyword driven is supported using custom fields and tokenizers. > > I am planning to build a rules engine on top search. The rules are database > driven and

Re: Configuring Solr to use RAMDirectory

2010-01-04 Thread Shalin Shekhar Mangar
On Thu, Dec 31, 2009 at 3:36 PM, dipti khullar wrote: > Hi > > Can somebody let me know if its possible to configure RAMDirectory from > solrconfig.xml. Although its clearly mentioned in > https://issues.apache.org/jira/browse/SOLR-465 by Mark that he has worked > upon it, but still I couldn't fin

Re: Search both diacritics and non-diacritics

2010-01-04 Thread Shalin Shekhar Mangar
On Sun, Jan 3, 2010 at 6:01 AM, Lance Norskog wrote: > The ASCIIFoldingFilter is a superset of the ISOLatin1Filter - > ISOLatin1 is deprecated. Here's the Javadoc from ASCIIFoldingFilter. > You did not mention which language you want to search. > > Unfortunately, the ASCIIFoldingFilter is not mentione

Re: performance question

2010-01-04 Thread Erik Hatcher
On Jan 4, 2010, at 12:04 AM, A. Steven Anderson wrote: dynamic fields don't make it worse ... the number of actual field names you sort on makes it worse. If you sort on 100 fields, the cost is the same regardless of whether all 100 of those fields exist because of a single declaration

Re: Solr Cell - PDFs plus literal metadata - GET or POST ?

2010-01-04 Thread Shalin Shekhar Mangar
On Wed, Dec 30, 2009 at 7:49 AM, Ross wrote: > Hi all > > I'm experimenting with Solr. I've successfully indexed some PDFs and > all looks good but now I want to index some PDFs with metadata pulled > from another source. I see this example in the docs. > > curl " > http://localhost:8983/solr/upd

Re: Implementing Autocomplete/Query Suggest using Solr

2010-01-04 Thread Shalin Shekhar Mangar
On Wed, Dec 30, 2009 at 3:07 AM, Prasanna R wrote: > I looked into the Solr/Lucene classes and found the required information. > Am summarizing the same for the benefit of those that might refer to this > thread in the future. > > The change I had to make was very simple - make a call to getPre

Facets and distributed search

2010-01-04 Thread Aleksander Stensby
Hi everyone! I've posted a similar question earlier, but in a thread related to facets in general, so I thought I'd repost it here as a separate thread. I have a faceted search that is very fast when I executed the query on a single solr server, but is significantly slower when executed in a distr

Re: Optimize not having any effect on my index

2010-01-04 Thread Aleksander Stensby
Hey, I managed to run it correctly after a few restarts. Don't really know what happened. Can't really see what this would have had to do with compound file format tho? But no, I'm not using compound file format. Cheers and thanks for your replies, Aleks On Mon, Dec 21, 2009 at 8:27 AM, gurudev

Re: Search algorithm used in Solr

2010-01-04 Thread Shalin Shekhar Mangar
On Mon, Jan 4, 2010 at 11:39 AM, wrote: > Hello everyone, > > Is there an article which explains (on a high level) the algorithm of > search in Solr? > > How does Solr search approach compare to the "inverted index" technique? > > Solr uses Lucene. It is the same inverted index technique at work.
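
To make the "inverted index" idea in the reply above concrete, here is a toy sketch that maps each term to the ids of the documents containing it and answers an all-terms query by intersecting those lists. It is only an illustration of the general technique under simplifying assumptions (lowercasing and whitespace splitting, no scoring, no positions); it is not how Lucene is implemented, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the sorted ids of docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return the ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect posting lists so only docs matching all terms remain.
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = {1: "Solr uses Lucene", 2: "Lucene builds an inverted index", 3: "Solr search"}
    index = build_inverted_index(docs)
    print(search_all(index, "solr lucene"))
```

The point of the structure is that a lookup touches only the posting lists of the query terms rather than scanning every document.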

Re: Remove the deleted docs from the Solr Index

2010-01-04 Thread Shalin Shekhar Mangar
On Wed, Dec 30, 2009 at 12:10 AM, Mohamed Parvez wrote: > Ditto. There should have been an DIH command to re-sync the Index with the > DB. > But there is such a command; it is called full-import. -- Regards, Shalin Shekhar Mangar.
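
For readers who have not used it, the full-import command mentioned above is normally triggered with an HTTP request to the DataImportHandler. The sketch below only builds such a URL. The host, the `/dataimport` handler path (it is whatever name is registered in solrconfig.xml), and the helper name `full_import_url` are assumptions for illustration, and the `clean` parameter is shown on the understanding that it controls whether the existing index is cleared before the import; check that against the DIH documentation for your Solr version.

```python
from urllib.parse import urlencode


def full_import_url(solr_base, handler="/dataimport", clean=True):
    """Build a URL that asks a DataImportHandler endpoint to run a full import.

    solr_base and handler are hypothetical defaults, not taken from the thread.
    """
    params = {
        "command": "full-import",
        # Assumption: clean=true clears the index first, so rows deleted
        # from the database also disappear from the index.
        "clean": "true" if clean else "false",
    }
    return solr_base.rstrip("/") + handler + "?" + urlencode(params)


if __name__ == "__main__":
    print(full_import_url("http://localhost:8983/solr"))
```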

Re: Spatial Solr (JTeam)

2010-01-04 Thread Thomas Rabaix
I have also moved the jar into the global core's lib directory, and I still have this issue. I am running Mac OS X Snow Leopard, java version "1.6.0_17" Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025) Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode) I really d