Re: How do I make sure the resulting documents contain the query terms?

2011-06-06 Thread pravesh
>k0 --> A | C >k1 --> A | B >k2 --> A | B | C >k3 --> B | C >Now let q=k1, how do I make sure C doesn't appear as a result since it doesn't contain any occurrence of k1? Do we bother to do that? Now that's what Lucene does :) -- View this message in context: http://lucene.472066.n3.nabble.com/Ho

Re: How do I make sure the resulting documents contain the query terms?

2011-06-06 Thread Gabriele Kahlout
Sorry for being unclear, and thank you for answering. Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3), where A, B, C are document identifiers and the ks in brackets are the terms each contains. So Solr's inverted index should be something like: k0 --> A | C k1 --> A | B
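As a rough illustration of the index described in this thread, here is a minimal Python sketch. It is not Solr or Lucene code, and the names `build_inverted_index` and `lookup` are hypothetical. It builds the term-to-document postings for A, B, and C and looks up a single term. Because a lookup for k1 only reads k1's postings, C is never considered as a candidate.

```python
from collections import defaultdict

# Documents from the example: identifier -> terms it contains.
DOCS = {
    "A": ["k0", "k1", "k2"],
    "B": ["k1", "k2", "k3"],
    "C": ["k0", "k2", "k3"],
}

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def lookup(index, term):
    """Return the documents whose postings list contains the term."""
    return index.get(term, [])

INDEX = build_inverted_index(DOCS)
print(lookup(INDEX, "k1"))
```

Only the documents listed under the queried term are returned, which is the "boolean property" being asked about: matching is decided by the postings list, and scoring only orders the documents that matched.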

Re: problem: zooKeeper Integration with solr

2011-06-06 Thread bmdakshinamur...@gmail.com
Instead of integrating zookeeper, you could create shards over multiple machines and specify the shards while you are querying solr. Eg: http://localhost:8983/solr/select?shards=*:/,* *:/*&indent=true&q= On Mon, Jun 6, 2011 at 5:59 PM, Mohammad Shariq wrote: > Hi folk, > I am using solr to inde
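A minimal sketch of building such a request in Python, assuming the `shards` parameter takes a comma-separated list of host:port/path entries. The shard addresses below are hypothetical placeholders, since the hosts in the message above were elided, and `build_sharded_query_url` is an illustrative helper, not part of any Solr client.

```python
from urllib.parse import urlencode

def build_sharded_query_url(base_url, shards, query):
    """Build a select URL whose `shards` parameter lists every shard to search."""
    params = {
        "shards": ",".join(shards),  # comma-separated host:port/path entries
        "indent": "true",
        "q": query,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical shard addresses (assumption, not from the original message).
SHARDS = ["shard1.example.com:8983/solr", "shard2.example.com:8983/solr"]
URL = build_sharded_query_url("http://localhost:8983/solr/select", SHARDS, "*:*")
print(URL)
```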

Re: Master Slave help

2011-06-06 Thread Jayendra Patil
Do you mean the replication happens every time you restart the server? If so, you would need to modify the events on which you want replication to happen. Check for the replicateAfter tag and remove the startup option, if you don't need it. startup commit

Re: synonyms problem

2011-06-06 Thread Erick Erickson
Please take a look at the analysis page for the field in question. I don't even know what happens if you define ONLY a query analyzer (or you left things out as an efficiency). Substituting synonyms to a string field is suspicious, I assume you're only indexing single tokens in that field. You ha

Re: synonyms problem

2011-06-06 Thread deniz
Well, I was trying to say that I have changed the config files for synonyms and so on, but nothing happens, so I thought I needed to do something in Java code too... I was trying to ask about that... - Smart, but doesn't work... If he worked, he'd manage it...

Re: SpellCheckComponent performance

2011-06-06 Thread Erick Erickson
Hmmm, how are you configuring your spell checker? The first-time slowdown is probably due to cache warming, but subsequent 500 ms slowdowns seem odd. How many unique terms are there in your spellcheck index? It'd probably be best if you showed us your fieldtype and field definition... Best Erick

Re: How do I make sure the resulting documents contain the query terms?

2011-06-06 Thread Erick Erickson
I'm having a hard time understanding what you're driving at, can you provide some examples? This *looks* like filter queries, but I think you already know about those... Best Erick On Mon, Jun 6, 2011 at 4:00 PM, Gabriele Kahlout wrote: > Hello, > > I've seen that through boosting it's possible

Re: Minimum Should Match + External Field + Function Query with boost

2011-06-06 Thread fbytes
Seem to have a solution, but I am still trying to figure out how/why it works. Adding "defType=edismax" to the boost query seems to honor "MM" and boost correctly based on the external file source. The new query syntax: q={!boost b=dishRating v=$qq defType=edismax}&qq=hot chicken wings
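For anyone sending this query from code, here is a small Python sketch of encoding the two parameters. The `q` and `qq` values are copied from the message; everything else is an assumption made for illustration. The braces, `$`, and spaces need URL escaping, and the round trip below checks that the escaping is lossless.

```python
from urllib.parse import urlencode, parse_qs

# Local-params query copied from the message. The main query text is passed
# separately in `qq` and referenced from `q` as $qq.
PARAMS = {
    "q": "{!boost b=dishRating v=$qq defType=edismax}",
    "qq": "hot chicken wings",
}

QUERY_STRING = urlencode(PARAMS)

# Decode again to confirm the escaped form carries the original values.
DECODED = {key: values[0] for key, values in parse_qs(QUERY_STRING).items()}
print(QUERY_STRING)
```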

SpellCheckComponent performance

2011-06-06 Thread Demian Katz
I'm continuing to work on tuning my Solr server, and now I'm noticing that my biggest bottleneck is the SpellCheckComponent. This is eating multiple seconds on most first-time searches, and still taking around 500ms even on cached searches. Here is my configuration: basicSpell

How do I make sure the resulting documents contain the query terms?

2011-06-06 Thread Gabriele Kahlout
Hello, I've seen that through boosting it's possible to influence the scoring function, but what I would like is sort of a boolean property. In some way it's to search only the indexed documents by that keyword (or the intersection/union) rather than the whole set. Is this supported in any way?

Re: Solr Indexing Patterns

2011-06-06 Thread Jonathan Rochkind
This is a start, for many common best practices: http://wiki.apache.org/solr/SolrRelevancyFAQ Many of the questions in there have an answer that involves de-normalizing, as an example. It may be that even if your specific problem isn't in there, I myself anyway found reading through there ga

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
I do think that Solr would be better served if there was a best practice section of the site. Looking at the majority of emails to this list, they revolve around "how do I do X?". Seems like tutorials with real world examples would serve Solr no end of good. I still do not have an example of th

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
Thanks On 6 June 2011 19:32, Erick Erickson wrote: > #Everybody# (including me) who has any RDBMS background > doesn't want to flatten data, but that's usually the way to go in > Solr. > > Part of whether it's a good idea or not depends on how big the index > gets, and unfortunately the only way

Re: Solr performance tuning - disk i/o?

2011-06-06 Thread Erick Erickson
If you're seeing results, things must be OK. It's a little strange, though, I'm seeing warmup times of 1 on the trivial reload of the example documents. But I wouldn't worry too much here. Those are pretty high autowarm counts, you might have room to reduce them but absent long autowarm times ther

Re: Solr Indexing Patterns

2011-06-06 Thread Erick Erickson
#Everybody# (including me) who has any RDBMS background doesn't want to flatten data, but that's usually the way to go in Solr. Part of whether it's a good idea or not depends on how big the index gets, and unfortunately the only way to figure that out is to test. But that's the first approach I'

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Tomás Fernández Löbbe
On Mon, Jun 6, 2011 at 1:47 PM, Naveen Gupta wrote: > Hi Tomas, > > 1. Regarding SolrInputDocument, > > We are not using java client, rather we are using php solr, wrapping > content > in SolrInputDocument, i am not sure how to do in PHP client? In this case, > we need tika related jars to avail

Re: Need query help

2011-06-06 Thread Alexey Serba
See "Tagging and excluding Filters" section * http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters 2011/6/6 Denis Kuzmenok : > For now i have a collection with: > id (int) > price (double) multivalue > brand_id (int) > filters (string) multivalue > > I need to get a

Re: Auto-scaling solr setup

2011-06-06 Thread Akshay
Yes, sadly... I too have not much clue about AWS. The SolrReplication API doesn't give me exactly what I want. For the time being I have hacked my way into the Amazon image, bootstrapping the replication check in a shell script (curl & awk, a very dirty way). Once the check succeeds I enable the ser

RE: Solr performance tuning - disk i/o?

2011-06-06 Thread Demian Katz
All of my cache autowarmCount settings are either 1 or 5. maxWarmingSearchers is set to 2. I previously shared the contents of my firstSearcher and newSearcher events -- just a "queries" array surrounded by a standard-looking tag. The events are definitely firing -- in addition to t

Re: Auto-scaling solr setup

2011-06-06 Thread Erick Erickson
The HTTP interface (http://wiki.apache.org/solr/SolrReplication#HTTP_API) can be used to control lots of parts of replication. As to warmups, I don't know of a good way to test that. I don't know whether getting the current status on the slave includes whether warmup is completed or not. At worst,

Re: SolrJ and Range Faceting

2011-06-06 Thread Jamie Johnson
Small error, shouldn't be using this.start but should instead be using Double.parseDouble(this.getValue()); and sdf.parse(count.getValue()); respectively. On Mon, Jun 6, 2011 at 1:16 PM, Jamie Johnson wrote: > Thanks Martijn. I pulled your patch and it looks like what I was looking > for. The

Re: SolrJ and Range Faceting

2011-06-06 Thread Jamie Johnson
Thanks Martijn. I pulled your patch and it looks like what I was looking for. The original FacetField class has a getAsFilterQuery method which returns the criteria to use as an fq parameter, I have logic which does this in my class which works, any chance of getting something like this added to

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Naveen Gupta
Hi Tomas, 1. Regarding SolrInputDocument, We are not using java client, rather we are using php solr, wrapping content in SolrInputDocument, i am not sure how to do in PHP client? In this case, we need tika related jars to avail the metadata such as content .. we certainly don't want to handle al

Re: How to get default result?

2011-06-06 Thread Tomás Fernández Löbbe
Hi Richard, are you setting the value to 0 at index time when the housenumber is not present? If you are, this would be as simple as modifying the query at the application layer to city = a, street= b, housenumber=(14 OR 0). If you are not doing anything at index time with the not present housenumbers
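The application-layer rewrite for the first case can be sketched in Python as follows. This assumes the missing house numbers really were indexed as 0 and that the fields are named city, street, and housenumber as in the question. The `field:value` form with AND is an assumption about how the query is written, and `build_address_query` is a hypothetical helper.

```python
def build_address_query(city, street, housenumber, fallback=0):
    """Match the requested house number or the fallback indexed for 'not present'."""
    return "city:{} AND street:{} AND housenumber:({} OR {})".format(
        city, street, housenumber, fallback
    )

QUERY = build_address_query("a", "b", 14)
print(QUERY)
```

Real values containing spaces or query-syntax characters would additionally need quoting or escaping before being placed in the query string.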

Default query parser operator

2011-06-06 Thread Brian Lamb
Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=f

How to get default result?

2011-06-06 Thread richardr
Dear list, I have a question regarding my address search: I am searching for address data. If there is one address field not defined (in this case the housenumber) for the specific query (e.g. city = a, street = b, housenumber=14), I am getting no result. For every street there exists at least one

Re: Solr performance tuning - disk i/o?

2011-06-06 Thread Erick Erickson
Polling interval was in reference to slaves in a multi-machine master/slave setup, so probably not a concern just at present. Warmup time of 0 is not particularly normal, I'm not quite sure what's going on there but you may want to look at firstSearcher, newSearcher and autowarm parameters in conf

Re: Solr Indexing Patterns

2011-06-06 Thread Judioo
On 5 June 2011 14:42, Erick Erickson wrote: > See: http://wiki.apache.org/solr/SchemaXml > > By adding ' "multiValued="true" ' to the field, you can add > the same field multiple times in a doc, something like > > > > value1 > value2 > > > > I can't see how that would work as one would need

Re: Search with Synonyms in two fields

2011-06-06 Thread Jonathan Rochkind
On 6/5/2011 3:36 AM, occurred wrote: Ok, thx for the answer. My idea now is to store both field-values in one field and pre- and suffix the values from field2 with something very special. Also then the synonyms have to have the special pre- and suffixes. What are you actually trying to do? Us

Master Slave help

2011-06-06 Thread Rohit Gupta
Hi, I have configured my master slave server and everything seems to be running fine, the replication completed the first time it ran. But every time I go to the replication link in the admin panel after restarting the server or server startup I notice the replication starting from scratch or a

Need query help

2011-06-06 Thread Denis Kuzmenok
For now i have a collection with: id (int) price (double) multivalue brand_id (int) filters (string) multivalue I need to get available brand_id, filters, price values and list of id's for current query. For example now i'm doing queries with facet.field=brand_id/filters/price: 1) to ge

Auto-scaling solr setup

2011-06-06 Thread Akshay
So I am trying to set up an auto-scaling search system of EC2 Solr slaves which scale up as the number of requests increases, and vice versa. Here is what I have: 1. A Solr master and underlying slaves (scalable), and an elastic load balancer to distribute the load. 2. The ec2-auto-scaling setup fires nodes

RE: Solr performance tuning - disk i/o?

2011-06-06 Thread Demian Katz
Thanks once again for the helpful suggestions! Regarding the selection of facet fields, I think publishDate (which is actually just a year) and callnumber-first (which is actually a very broad, high-level category) are okay. authorStr is an interesting problem: it's definitely a useful facet (

problem: zooKeeper Integration with solr

2011-06-06 Thread Mohammad Shariq
Hi folk, I am using solr to index around 100mn docs. now I am planning to move to cluster based solr, so that I can scale the indexing and searching process. since solrCloud is in development stage, I am trying to index in shard based environment using zooKeeper. I followed the steps from http://

Re: Applying synonyms increase the data size from MB to GBs

2011-06-06 Thread Erick Erickson
Have you considered query-time expansion rather than index-time expansion? In general this will lead to more complex queries, but smaller indexes. Take a look at the analysis page available from the admin page to see exactly what happens. What is the high-level problem you're trying to solve? Hav

Re: java.io.IOException: The specified network name is no longer available

2011-06-06 Thread Erick Erickson
Yep, but note the discussion. It's not at all clear that Solr is the place to deal with an unreliable network, and it sounds like that's the root of your issue. It doesn't look like anyone's hot to change Solr's behavior here, and it's arguable that Solr isn't the place to compensate for an unreli

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Tomás Fernández Löbbe
1. About the commit strategy, all the ExtractingRequestHandler (request handler that uses Tika to extract content from the input file) will do is extract the content of your file and add it to a SolrInputDocument. The commit strategy should not change because of this, compared to other documents yo

Re: synonyms problem

2011-06-06 Thread Erick Erickson
What does "call synonym methods in Java" mean? That is, what are you trying to accomplish and from where? Best Erick On Sun, Jun 5, 2011 at 9:48 PM, deniz wrote: > well i have changed it into text... but still confused about how to use > synonyms... > > and also I want to know how to call synony

Re: Expunging deletes from a very large index

2011-06-06 Thread Michael McCandless
You can drop your mergeFactor to 2 and then run expungeDeletes? This will make the operation take longer but (assuming you have > 3 segments in your index) should use less transient disk space. You could also make a custom merge policy, that expunges one segment at a time (even slower but even le

Re: Applying synonyms increase the data size from MB to GBs

2011-06-06 Thread Ahmet Arslan
> Is there a way where in I can apply all those file to same > tag with some > delimiter separated? > > like this: >         class="solr.SynonymFilterFactory" > synonyms="BODYTaxonomy.txt > , ClinicalObs.txt, MicTaxo.txt, SPTaxo.txt" > ignoreCase="true" > expand="true"/> Yes, you can perfectly

Travel Assistance applications now open for ApacheCon NA 2011

2011-06-06 Thread Simon Willnauer
The Apache Software Foundation (ASF)'s Travel Assistance Committee (TAC) is now accepting applications for ApacheCon North America 2011, 7-11 November in Vancouver BC, Canada. The TAC is seeking individuals from the Apache community at-large --users, developers, educators, students, Committers, an

Re: Solr Field name restrictions

2011-06-06 Thread Marc SCHNEIDER
Hi, Using Solr 3.1 I'm getting errors when trying to sort on fields containing dashes in the name... So that's true stay away from dashes if you can. Marc. On Sun, Jun 5, 2011 at 3:46 PM, Erick Erickson wrote: > I'd stay away from dashes too. It's too easy for the query parsers > to mistake the

Re: Feature: skipping caches and info about cache use

2011-06-06 Thread pravesh
Solr 1.3+ logs only the fresh queries in the logs. If you re-run the same query then it is served from cache, and not printed in the logs (unless the cache(s) are not warmed or the searcher is reopened). So, Otis's proposal would definitely help in doing some benchmarks & baselining the search :)