RE: facet on field aliases of same field

2014-10-29 Thread Michael Ryan
It is indeed possible. Just need to use a different syntax. As far as I know, the facet parameters need to be local parameters, like this... &facet.range={!key=date_decade facet.range.start=1600-01-01T00:00:00Z facet.range.end=2000-01-01T00:00:00Z facet.range.gap=%2B10YEARS}date&facet.range={!k

Changing/merging terms of existing documents without reindexing them

2014-10-22 Thread Michael Ryan
I have the following problem: I have many (let's say hundreds of millions) of documents in an existing distributed index that have a field with a variety of values. Two of these values are "dog" and "puppy". I have decided that I want to reclassify these to just all be "dog". I do queries on th

RE: Exact match on string field with special characters

2014-10-06 Thread Michael Ryan
This should do what you want: String fq = "Field1:" + "\"" + org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(value) + "\""; -Michael -Original Message- From: tedsolr [mailto:tsm...@sciquest.com] Sent: Monday, October 06, 2014 10:49 AM To: solr-user@lucene.apache.org Subje

RE: Inconsistent response time

2014-10-03 Thread Michael Ryan
It could be due to the minimum timer resolution on Windows. Do a search for "windows 15ms" and you'll find a lot of information about it. Though, I'm not sure which versions of Windows and/or Java have that problem. You could test it out by timing things other than Solr and see if they also take

RE: Exact match on string field with special characters

2014-10-01 Thread Michael Ryan
When you call addFacetField, the parameter you pass it should just be the fieldName. The fieldValue shouldn't come into play at all (unless I'm misunderstanding what you're trying to do). If you ever do need to escape a value for a query, you can use org.apache.solr.client.solrj.util.ClientUtil

RE: Content-Charset header in HttpSolrServer

2014-08-10 Thread Michael Ryan
Done. https://issues.apache.org/jira/browse/SOLR-6360 -Michael -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, August 06, 2014 7:55 PM To: solr-user@lucene.apache.org Subject: Re: Content-Charset header in HttpSolrServer : I was reviewing the

RE: solr update dynamic field generates multiValued error

2014-08-04 Thread Michael Ryan
Are the latLong_0_coordinate and latLong_1_coordinate fields populated using copyField? If so, this sounds like it could be https://issues.apache.org/jira/browse/SOLR-3502. -Michael -Original Message- From: Franco Giacosa [mailto:fgiac...@gmail.com] Sent: Monday, August 04, 2014 9:05 P

Content-Charset header in HttpSolrServer

2014-07-27 Thread Michael Ryan
I was reviewing the httpclient code in HttpSolrServer and noticed that it sets a "Content-Charset" header. As far as I know this is not a real header and is not necessary. Anyone know a reason for this to be there? I'm guessing this was just a mistake when converting from httpclient3 to httpclie

RE: DocValues without re-index?

2014-07-22 Thread Michael Ryan
/watch?v=9h3ax5Wmxpk On Tue, Jul 22, 2014 at 6:50 AM, Michael Ryan wrote: > Is it possible to use DocValues on an existing index without first > re-indexing? > > -Michael > -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics <http://www.griddynamics.com>

DocValues without re-index?

2014-07-21 Thread Michael Ryan
Is it possible to use DocValues on an existing index without first re-indexing? -Michael

RE: Group only top 50 results not All results.

2014-07-11 Thread Michael Ryan
I suggest doing this in two queries. In the first query, retrieve the unique ids of the top 50 documents. In the second query, just query for those ids (e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on that query. -Michael -Original Message- From: Aaron Gibbons [mailto:agib
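
The second query in that two-step approach could be assembled along these lines. This is only a sketch in plain Java: the class and method names are made up for illustration, `ids` is simply the field name used in the example above, and the ids are assumed to need no escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class TopIdsQuerySketch {

    // Builds a query such as ids:(2 13 55 62 81) from the unique ids
    // that the first query returned for the top documents.
    static String idsQuery(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("need at least one id");
        }
        return "ids:(" + String.join(" ", ids) + ")";
    }

    public static void main(String[] args) {
        List<String> top = Arrays.asList("2", "13", "55", "62", "81");
        // The facet parameters would then be added to this second request.
        System.out.println("q=" + idsQuery(top));
    }
}
```

In a real client the list would hold the 50 ids from the first response rather than hard-coded values.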

RE: Best way to fix "Document contains at least one immense term"?

2014-07-01 Thread Michael Ryan
cript in JavaScript using the stateless script update processor. Can you tell us more about the nature of your data? I mean, sometimes analyzer filters strip or fold accented characters anyway, so count of characters versus UTF-8 bytes may be a non-problem. -- Jack Krupansky -----Original Messag

Best way to fix "Document contains at least one immense term"?

2014-07-01 Thread Michael Ryan
In LUCENE-5472, Lucene was changed to throw an error if a term is too long, rather than just logging a message. I have fields with terms that are too long, but I don't care - I just want to ignore them and move on. The recommended solution in the docs is to use LengthFilterFactory, but this lim

RE: Multiterm analysis in complexphrase query

2014-07-01 Thread Michael Ryan
f you have any questions. LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and make it available to all parsers that use QueryParserBase, including the ComplexPhraseQueryParser. Best, Tim -Original Message----- From: Michael Ryan [mailto:mr...@moreover.com] Se

Multiterm analysis in complexphrase query

2014-06-29 Thread Michael Ryan
I've been using a modified version of the complex phrase query parser patch from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6, and I'm currently upgrading to 4.9, which has this built-in. I'm having trouble with using accents in wildcard queries, support for which was added in ht

RE: Date truncation and time zone when searching

2014-05-21 Thread Michael Ryan
Well for CEST, which is 2 hours ahead, I would think you could just do... datefield:[* TO NOW/MONTH-2HOURS] That would give you everything up to 2014-04-30 22:00:00 GMT, which is 2014-05-01 00:00:00 CEST. Always always always store the correct value. -Michael -Original Message- From:
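
To double-check the arithmetic behind NOW/MONTH-2HOURS, the same rounding can be mimicked outside Solr. The sketch below is plain Java with invented names; it assumes rounding happens in UTC and hard-codes the 2-hour CEST offset from the message, so it would not account for a different offset outside summer time.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical illustration of the date math; it does not call Solr.
final class DateMathSketch {

    // Rounds "now" down to the start of its month in UTC (like NOW/MONTH),
    // then subtracts the given number of hours (like -2HOURS).
    static ZonedDateTime startOfMonthMinusHours(ZonedDateTime now, int hours) {
        ZonedDateTime utc = now.withZoneSameInstant(ZoneOffset.UTC);
        ZonedDateTime monthStart =
                utc.toLocalDate().withDayOfMonth(1).atStartOfDay(ZoneOffset.UTC);
        return monthStart.minusHours(hours);
    }

    public static void main(String[] args) {
        // The date of the message is used as "now".
        ZonedDateTime now = ZonedDateTime.of(2014, 5, 21, 12, 0, 0, 0, ZoneOffset.UTC);
        // Prints 2014-04-30T22:00:00Z, i.e. 2014-05-01 00:00:00 CEST.
        System.out.println(startOfMonthMinusHours(now, 2).toInstant());
    }
}
```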

score retrieval performance

2014-05-19 Thread Michael Ryan
Is there any significant difference in query speed when retrieving the score pseudo-field? E.g., does... q=foo&sort=date+desc&fl=*,score ...take longer to run than... q=foo&sort=date+desc&fl=* I know there's different code paths in Solr depending on whether the score is needed or not, but not

RE: timeAllowed query parameter not working?

2014-03-27 Thread Michael Ryan
Unfortunately the timeAllowed parameter doesn't apply to the part of the processing that makes wildcard queries so slow. It only applies to a later part of the processing when the matching documents are being collected. There's some discussion in the original ticket that implemented this (https

RE: Solr 3.6.1 stalling with high CPU and blocking on field cache

2013-11-26 Thread Michael Ryan
My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto

RE: Interesting edismax/qs bug in Solr 3.5

2013-09-22 Thread Michael Ryan
Sounds like https://issues.apache.org/jira/browse/LUCENE-3821 (issue seems to be fixed but still shows as open). -Michael -Original Message- From: Arcadius Ahouansou [mailto:arcad...@menelic.com] Sent: Sunday, September 22, 2013 11:15 PM To: solr-user Subject: Interesting edismax/qs bug

RE: JVM Crash using solr 4.4 on Centos

2013-09-19 Thread Michael Ryan
This is a known bug in that JDK version. Upgrade to a newer version of JDK 7 (any build within the last two years or so should be fine). If that's not possible for you, you can add -XX:-UseLoopPredicate as a command line option to java to work around this. -Michael -Original Message- F

RE: Memory usage during aggregation - SolrCloud with very large numbers of facet terms.

2013-09-03 Thread Michael Ryan
> However, the Solr instance we direct our client query to is consuming > significantly more RAM (10GB) and is still failing after a few queries when > it runs out of heap space. This is presumably due to the role it plays, > aggregating the results from each shard. That seems quite odd... Wha

RE: swap and GC

2013-07-29 Thread Michael Ryan
This is interesting... How are you measuring the heap size? -Michael -Original Message- From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] Sent: Monday, July 29, 2013 5:34 AM To: solr-user@lucene.apache.org Subject: swap and GC Something interesting I have noticed today, after

RE: Solr 3.6 optimize and field cache question

2013-07-08 Thread Michael Ryan
I'm 99% sure that the deleted docs will indeed use up space in the field cache, at least until the segments that those documents are in are merged - that is what an optimize will do. Of course, these segments will automatically be merged eventually, but it might take days for this to happen, dep

Using per-segment FieldCache or DocValues in custom component?

2013-07-01 Thread Michael Ryan
I have some custom code that uses the top-level FieldCache (e.g., FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign this to use the per-segment FieldCaches so that re-opening a Searcher is fast(er). In most cases, I've got a docId and I want to get the value for a part

RE: why does the has to be indexed.

2013-06-24 Thread Michael Ryan
To enforce uniqueness, Solr needs to be able to search on the id to see if it is currently in the index. -Michael -Original Message- From: Mysurf Mail [mailto:stammail...@gmail.com] Sent: Monday, June 24, 2013 11:52 AM To: solr-user@lucene.apache.org Subject: why does the has to be ind

RE: Restarting SOLR will remove all cache?

2013-06-21 Thread Michael Ryan
Restarting Solr won't clear the disk cache. When I'm doing perf testing, I'll sometimes run this on the server before each test to clear out the disk cache: echo 1 > /proc/sys/vm/drop_caches -Michael -Original Message- From: Learner [mailto:bbar...@gmail.com] Sent: Friday, June 21, 201

RE: Stats facet on int/tint fields

2013-04-22 Thread Michael Ryan
Sounds like this could be https://issues.apache.org/jira/browse/SOLR-2976. -Michael -Original Message- From: vinothkumar raman [mailto:vinothkr.k...@gmail.com] Sent: Monday, April 22, 2013 5:54 AM To: solr-user@lucene.apache.org; solr-...@lucene.apache.org Subject: Stats facet on int/tin

RE: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Michael Ryan
I've investigated this in the past. The worst case is 2*indexSize additional disk space (3*indexSize total) during an optimize. In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of 10. We would see the worst case happen when there were exactly 20 segments (or some oth

RE: NPE when faceting TEXTfield in a distributed search query

2013-04-10 Thread Michael Ryan
Yes, this is a distributed search thing. In a distributed search, it will first make a somewhat normal facet request to all of the shards, get back the facet values, then make a second request in order to get the full counts of the facet values - this second request contains a list of facet term

RE: NPE when faceting TEXTfield in a distributed search query

2013-04-10 Thread Michael Ryan
Large facet.limit values cause a very large amount of form data to be sent to the shards, though I'm not sure why this would cause a NullPointerException. Perhaps the web server you are using is truncating the data instead of returning a form too large error, which is somehow causing an NPE. Are

RE: Slow performance on distributed search

2013-03-26 Thread Michael Ryan
Depending on your use case and the particulars of your system, a previous post I made about using a FieldCache in SolrIndexSearcher for id retrieval (see http://osdir.com/ml/solr-user.lucene.apache.org/2013-01/msg01574.html) may help you. In your case, it might not be the merging process on the

RE: Slow performance on distributed search

2013-03-26 Thread Michael Ryan
What are the values of the start and rows parameters you are using? When you say the controller shard takes a long time, how long is it taking - 100ms, 1s, 10s...? -Michael -Original Message- From: qungg [mailto:qzheng1...@gmail.com] Sent: Tuesday, March 26, 2013 11:17 AM To: solr-user

Nested queries with proximity/slop

2013-03-19 Thread Michael Ryan
I was wondering if anyone is aware of an existing Jira for this bug... _query_:"\"a b\"~2" ...is parsed as... PhraseQuery(someField:"a b") ...instead of the expected... PhraseQuery(someField:"a b"~2) _query_:"\"a b\""~2 ...is parsed as... PhraseQuery(someField:"a b"~2) _query_:"\"a b\"~2"~3 ...i

RE: Distributed Search and the Stale Check

2013-02-25 Thread Michael Ryan
I don't have anything to add besides saying "this is awesome". Great analysis. -Michael

RE: Can't determine Sort Order: 'prijs ASC', pos=5

2013-02-13 Thread Michael Ryan
I think the order needs to be in lowercase. Try "asc" instead of "ASC". -Michael -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, February 13, 2013 7:30 PM To: solr-user@lucene.apache.org Subject: Can't determine Sort Order: 'prijs ASC', pos=5 On this

RE: solr j response

2013-02-10 Thread Michael Ryan
Assuming that createdDate is a DateField in your schema.xml, the object returned by SolrJ will be a Date object (though you will need to cast it to a Date). -Michael

RE: LocalParam tag does not work when is placed in brackets

2013-02-07 Thread Michael Ryan
I'm pretty sure the local params have to be at the very start of the query. But you should be able to do this with nested queries. Try this... fq=_query_:"{!tag=d0feea8}category:\"5\" OR otherField:\"otherValue\"" AND type:DOCUMENT -Michael -Original Message- From: Karol Sikora [mailto

RE: A question about attaching shards to load balancers

2013-01-30 Thread Michael Ryan
From a performance point of view, I can't imagine it mattering. In our setup, we have a dedicated Solr server that is not a shard that takes incoming requests (we call it the "coordinator"). This server is very lightweight and practically has no load at all. My gut feeling is that having a

Using FieldCache in SolrIndexSearcher for distributed id retrieval

2013-01-29 Thread Michael Ryan
Following up from a post I made back in 2011... > I am a user of Solr 3.2 and I make use of the distributed search capabilities > of Solr using > a fairly simple architecture of a coordinator + some shards. > > Correct me if I am wrong: In a standard distributed search with > QueryComponent, t

RE: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Michael Ryan
Are you able to see any evidence that some of the 500k docs are being added twice? Check the maxDocs on the Solr admin page. I vaguely recall there being some issue with docs in SolrCloud being added multiple times (which under the covers is really add, delete, add). I think that could cause the

RE: Solr 4.0 - timeAllowed in distributed search

2013-01-20 Thread Michael Ryan
(This is based on my knowledge of 3.6 - not sure if this has changed in 4.0) You are using rows=3, which requires retrieving 3 documents from disk. In a non-distributed search, the QTime will not include the time it takes to retrieve these documents, but in a distributed search, it will.

RE: SOlr 3.5 and sharding

2013-01-14 Thread Michael Ryan
If you have the same documents -- with the same uniqueKey -- across multiple shards, the count will not be what you expect. You'll need to ensure that each document exists only on a single shard. -Michael -Original Message- From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wante

RE: wildcard faceting in solr cloud

2013-01-08 Thread Michael Ryan
I'd guess that the patch simply doesn't implement it for distributed searches. The code for distributed facets is quite a bit more complicated, and I don't see it touched in this patch. -Michael -Original Message- From: jmozah [mailto:jmo...@gmail.com] Sent: Tuesday, January 08, 2013 1

RE: Question about GC logging timestamps

2013-01-05 Thread Michael Ryan
From my own experience, the timestamp seems to be logged at the start of the garbage collection. -Michael

RE: Odd exceptions in both 3.5 and 4.1-SNAPSHOT

2013-01-03 Thread Michael Ryan
We see these EofExceptions in our system occasionally. I believe they occur when our SolrJ client times out and closes the connection, before Jetty returns the response. -Michael -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, January 03, 2013 10:07 AM

RE: Where does schema.xml's schema/@name displays?

2012-12-20 Thread Michael Ryan
In our system (using 3.6), it is displayed on /solr/admin/. I'd guess that the value in solr.xml overrides the one in schema.xml, but not sure. -Michael -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Thursday, December 20, 2012 12:08 AM To: solr-user@lu

RE: Sort speed asc vs desc - is desc slower?

2012-12-12 Thread Michael Ryan
> Perhaps if there are a lot more ties on one end vs the other? > Or if the values being sorted on aren't that random? Do they naturally > increase like a timestamp? It's a unique id field. The id is a simple sequential id, so docs with a lower doc id will naturally also have a lower id. I thi

Highlighting data stored outside of Solr

2012-12-11 Thread Michael Ryan
Has anyone ever attempted to highlight a field that is not stored in Solr? We have been considering not storing fields in Solr, but still would like to use Solr's built-in highlighting. On first glance, it looks like it would be fairly simple to modify DefaultSolrHighlighter to get the stored

RE: SolrCloud - Query performance degrades with multiple servers

2012-12-05 Thread Michael Ryan
As you add nodes, the average response time of the slowest node will likely increase. For example, consider an extreme case where you have something like 1 million nodes - you're practically guaranteed that one of them is going to be doing something like a stop-the-world garbage collection. So e

Occasional "failed to respond" errors

2012-12-05 Thread Michael Ryan
We have a longstanding issue with "failed to respond" errors in Solr when our coordinator is querying our Solr shards. To elaborate further... we're using the built-in distributed capabilities of Solr 3.6, and using Jetty as our server. Occasionally, we will have a query fail due to an error

RE: Solr 4 : Optimize very slow

2012-12-04 Thread Michael Ryan
When I upgraded from 3.2 to 3.6, I found that an optimize - all other variables being the same - took about twice as long. Eventually I was able to track this down to the new default of MMapDirectory. By changing back to NIOFSDirectory, I was able to get the optimize time back down to what it fo

RE: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread Michael Ryan
Yeah, the situation is kind of a pain right now. In https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and there is no way to disable without patching SolrQueryParser. There's also the edismax parser which doesn't have a setting for this, which I've made a jira for at ht

RE: Solr - Disk writes and set up suggestions

2012-11-03 Thread Michael Ryan
I'd recommend not optimizing every hour. Are you seeing a significant performance increase from optimizing this frequently? -Michael

RE: facet by "in the past" and "in the future"

2012-10-18 Thread Michael Ryan
This should do it: facet=true&facet.query=yourDateField:([* TO NOW/DAY-1MILLI])&facet.query=yourDateField:([NOW/DAY TO *]) -Michael -Original Message- From: Paul [mailto:p...@nines.org] Sent: Thursday, October 18, 2012 5:28 PM To: solr-user@lucene.apache.org Subject: facet by "in the pa
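
When these parameters go into a URL by hand, the brackets, spaces and slashes need encoding. A rough sketch of that step follows, using only the JDK; the class and method names are hypothetical, and `yourDateField` is the placeholder from the message above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class PastFutureFacetSketch {

    // URL-encodes a single parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds the two facet.query parameters: everything before today,
    // and everything from the start of today onwards.
    static String facetParams(String dateField) {
        String past = dateField + ":([* TO NOW/DAY-1MILLI])";
        String future = dateField + ":([NOW/DAY TO *])";
        return "facet=true"
                + "&facet.query=" + encode(past)
                + "&facet.query=" + encode(future);
    }

    public static void main(String[] args) {
        System.out.println(facetParams("yourDateField"));
    }
}
```

A client library such as SolrJ would normally handle the encoding itself, in which case only the two unencoded query strings are needed.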

RE: How many documents in each Lucene segment?

2012-10-15 Thread Michael Ryan
Easiest way I know of without parsing any of the index files is to take the size of the fdx file in bytes and divide by 8. This will give you the exact number of documents before 4.0, and a close approximation in 4.0. Though, the fdx file might not be on disk if you haven't committed. -Michael
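
As a quick illustration of that rule of thumb, here is a small Java sketch. It assumes, as the message implies, 8 bytes of .fdx data per document; the class and method names are invented, and integer division simply discards any leftover bytes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; it does not use any Lucene API.
final class FdxDocCountSketch {

    // Estimates the document count of a segment from its .fdx size,
    // assuming 8 bytes per document.
    static long estimateDocCount(long fdxSizeBytes) {
        return fdxSizeBytes / 8;
    }

    // Same estimate, reading the size of an .fdx file on disk.
    static long estimateDocCount(Path fdxFile) throws IOException {
        return estimateDocCount(Files.size(fdxFile));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Pass the path of an .fdx file as the only argument.
            System.out.println(estimateDocCount(Paths.get(args[0])));
        } else {
            System.out.println(estimateDocCount(8_000_000L));
        }
    }
}
```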

RE: Building solr with maven

2012-10-14 Thread Michael Ryan
We have a maven project to build a war containing everything from the Solr war, plus some of our own code. Here's the relevant stuff from our pom.xml: war org.apache.solr solr-core org.apache.solr solr

RE: Funny behavior in facet query on large dataset

2012-10-08 Thread Michael Ryan
Facets are only really useful if you want the counts for multiple values (e.g., "eldudearino", "ladudearina"). I'd suggest just leaving all the facet parameters off of that query - the numFound that is returned should give you what you want. The slowness may be due to the facet cache needing to

RE: Strange "spikes" in query response times...any ideas where else to look?

2012-06-28 Thread Michael Ryan
A few questions... 1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query? 2) How are you measuring the response time? In my experience there are three different ways to measure query speed. Usually all of them will be approximately equ

RE: KeywordTokenizerFactory with SynonymFilterFactory

2012-06-16 Thread Michael Ryan
Try changing the tokenizer2 SynonymFilterFactory filter to this: By default, it seems that it uses WhitespaceTokenizer. -Michael

RE: Wildcard-Search Solr 3.5.0

2012-05-23 Thread Michael Ryan
I'd guess that this is because SnowballPorterFilterFactory does not implement MultiTermAwareComponent. Not sure, though. -Michael

RE: Changing precisionStep without a re-index

2012-04-18 Thread Michael Ryan
In case anyone tries to do this... If you facet on a TrieField and change the precisionStep to 0, you'll need to re-index. Changing precisionStep to 0 changes the prefix returned by TrieField.getMainValuePrefix(FieldType), which then causes facets with a value of "0" to be returned. -Michael

RE: Changing precisionStep without a re-index

2012-04-16 Thread Michael Ryan
> Not really - it changes what tokens are indexed for the numbers and > range queries won't work correctly. > Sorting (FieldCache), function queries, etc, would still work, and > exact match queries would still work. Thanks. So it is just range queries that won't work correctly? That's okay for

Changing precisionStep without a re-index

2012-04-16 Thread Michael Ryan
Is it safe to change the precisionStep for a TrieField without doing a re-index? Specifically, I want to change a field from this: to this: By "safe", I mean that searches will return the correct results, a FieldCache on the field will still work, clowns won't eat me... -Michael

RE: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Michael Ryan
It looks like the first format was removed in 3.6 as part of https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 3.x versions. -Michael -Original Message- From: Peter Wolanin [mailto:peter.wola...@acquia.com] Sent: Friday, April 13, 2012 12:32 PM To: solr-us

RE: How to limit the number of open searchers?

2012-03-11 Thread Michael Ryan
> I'm curious, why can't you do a master/slave setup? It's just not all that useful for this particular application. Indexing new docs and merging segments - which as I understand is the main strength of having a write-only master - is a relatively small part of our app. What really is expensiv

RE: How to limit the number of open searchers?

2012-03-07 Thread Michael Ryan
> Unless you have warming happening, there should > only be a single searcher open at any given time. > So it seems to me that maxWarmingSearchers > should give you what you need. What I'm seeing is that if a query takes a very long time to run, and runs across the duration of multiple commits (I

RE: How can Solr do parallel query warming with and ?

2012-03-05 Thread Michael Ryan
https://issues.apache.org/jira/browse/SOLR-2548 may be of interest to you. -Michael

How to limit the number of open searchers?

2012-03-05 Thread Michael Ryan
Is there a way to limit the number of searchers that can be open at a given time? I know there is a maxWarmingSearchers configuration that limits the number of warming searchers, but that's not quite what I'm looking for... Ideally, when I commit, I want there to only be one searcher open befor

RE: Update Solr Schema To Store Field

2012-02-01 Thread Michael Ryan
This should be fine. From my experience, changing a field from stored="false" to stored="true" and vice versa is generally safe to do and has no unexpected behavior. -Michael

RE: HTMLStripCharFilterFactory not working in Solr4?

2012-01-24 Thread Michael Ryan
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory instead of after it. I vaguely recall being burned by something like this before. -Michael

RE: Question on Reverse Indexing

2012-01-21 Thread Michael Ryan
> Can this be the reason why it is working automatically although there are no > reversed tokens being stored and even without the > ReversedWildcardFilterFactory being set, solr automatically is allowing > leading wild card search? Yes, that's correct. See https://issues.apache.org/jira/browse

RE: Question about sorting by a field

2012-01-19 Thread Michael Ryan
How about having a single-valued field named "firstDestination" that has the first destination in the list, and then your query could be something like 'destination:"Buenos Aires" firstDestination:"Buenos Aires"'. Docs that match both should have a higher score and thus will be listed first. -M

TrieField precisionStep effect on non-range queries and sorting

2012-01-02 Thread Michael Ryan
I was wondering... how does the TrieField precisionStep value affect the speed of non-range queries and sorting? I'm assuming that int (precisionStep=0) is no slower than tint (precisionStep=8) for these - is that correct? tint is just faster for range queries? Is int any faster than tint for

RE: Replication Index Fetch error

2011-12-19 Thread Michael Ryan
According to http://lucene.apache.org/java/3_4_0/fileformats.html, the FNMVersion changed from -2 to -3 in Lucene 3.4. Is it possible that the new master is actually running 3.4, and the new slave is running 3.2? (This is just a wild guess.) -Michael

RE: Poor performance on distributed search

2011-12-19 Thread Michael Ryan
I had a similar requirement in my project, where a user might ask for up to 3000 results. What I did was change SolrIndexSearcher.doc(int, Set) to retrieve the unique key from the field cache instead of retrieving it as a stored field from disk. This resulted in a massive speed improvement for t

RE: multi value field search

2011-12-17 Thread Michael Ryan
> The problem I have is that at search time, I have faceting turned on for > this field and therefore, I get the four facets "canadian", "imperial", > "bank", and "commerce", which all refer to the same record. > > How can I go about searching for any word contained in the company name but > then

UnInvertedField vs FieldCache for facets for single-token text fields

2011-11-03 Thread Michael Ryan
I have some fields I facet on that are TextFields but have just a single token. The fieldType looks like this: SimpleFacets uses an UnInvertedField for these fields because multiValuedFieldCache() returns true for TextField. I tried changing the type for these fields to the plain "s

RE: Query time help

2011-10-30 Thread Michael Ryan
Another thing to note is that QTime does not include the time it takes to retrieve the stored documents to include in the response. So if you're using a high rows value in your query, QTime may be much smaller than the actual time Solr spends generating the response. Try adding rows=1 to your quer

Applying hl.requireFieldMatch to "groups" of fields

2011-10-27 Thread Michael Ryan
I am trying to highlight FieldA when a user searches on either FieldA or FieldB, but I do not want to highlight FieldA when a user searches on FieldC. To explain further: I have a field named "content" and a field named "contentCS". The content field is a stored text field that uses LowerCaseFilte

How to make UnInvertedField faster?

2011-10-19 Thread Michael Ryan
I was wondering if anyone has any ideas for making UnInvertedField.uninvert() faster, or other alternatives for generating facets quickly. The vast majority of the CPU time for our Solr instances is spent generating UnInvertedFields after each commit. Here's an example of one of our slower fields

RE: setting a large positionIncrementGap

2011-10-11 Thread Michael Ryan
> Separately: why do you want to make the gap so large? No reason, really. I'm just curious about how it works under the covers. -Michael

setting a large positionIncrementGap

2011-10-11 Thread Michael Ryan
Are there any negative side effects of setting a very large positionIncrementGap? For example, I use positionIncrementGap=100 right now - is there any reason for me to not use positionIncrementGap=1, or even greater? I saw a thread from a few months ago asking something like this, but I did

RE: Example setting TieredMergePolicy for Solr 3.3 or 3.4?

2011-09-21 Thread Michael Ryan
I think the problem is that the config needs to be inside of the config, rather than after it as you have. -Michael

Using the contrib flexible query parser in Solr

2011-09-13 Thread Michael Ryan
Has anyone used the "Flexible Query Parser" (https://issues.apache.org/jira/browse/LUCENE-1567) in Solr? I'm just starting to look at it for the first time and was wondering if it is something that can be dropped into Solr fairly easily, or if more extensive changes are needed. I thought perh

RE: High facet.limit (with only 2-3 actual facets) -> Massive bandwidth consumption in DistributedSearch

2011-09-08 Thread Michael Ryan
> yep - facet.mincount=1 Yeah, I've run into this same issue, though I never looked too closely into it. What is happening is that the facet.mincount parameter is removed when the query is made to the shards, so each shard is returning about 3 facet values, most of them with a count of 0. I

RE: High facet.limit (with only 2-3 actual facets) -> Massive bandwidth consumption in DistributedSearch

2011-09-08 Thread Michael Ryan
Are you using facet.mincount in the query? -Michael

RE: Optimize concern in Solr 3.2

2011-09-02 Thread Michael Ryan
> I have recently upgraded from Solr 1.4 to Solr 3.2. In Solr 1.4 only 3 > files (one .cfs & two segments) were made in *index/* directory. > (after > doing optimize). > > Now, in Solr 3.2, the optimize seems not to be working. My final number of > files in *index/* directory are in 7-8 in numb

RE: Query vs Filter Query Usage

2011-08-25 Thread Michael Ryan
> 10,000,000 document index > Internal Document id is 32 bit unsigned int > Max Memory Used by a single cache slot in the filter cache = 32 bits x > 10,000,000 docs = 320,000,000 bits or 38 MB I think it depends on where exactly the result set was generated. I believe the result set will usually
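
The quoted estimate can be checked with a few lines of arithmetic. The sketch below only reproduces the quoted worst case (every document matches and each id is kept as a 32-bit int); it makes no claim about how Solr actually stores a given result set, and the class and method names are invented for the example.

```java
// Hypothetical arithmetic helper for illustration; not part of Solr.
final class FilterCacheEstimateSketch {

    // Bytes needed if each matching document id is held as a 32-bit int.
    static long intArrayBytes(long matchingDocs) {
        return matchingDocs * Integer.BYTES;
    }

    // Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytes = intArrayBytes(10_000_000L);
        // 10,000,000 docs * 4 bytes = 40,000,000 bytes, roughly 38 MiB.
        System.out.printf("%d bytes = %.1f MiB%n", bytes, toMiB(bytes));
    }
}
```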

Optimize requires 50% more disk space when there are exactly 20 segments

2011-08-24 Thread Michael Ryan
I'm using Solr 3.2 with a mergeFactor of 10 and no merge policy configured, thus using the default LogByteSizeMergePolicy. Before I do an optimize, typically the largest segment will be about 90% of the total index size. When I do an optimize, the total disk space required is usually about 2x t

RE: Requiring multiple matches of a term

2011-08-21 Thread Michael Ryan
> One simple way of doing this is maybe to write a wrapper for TermQuery > that only returns docs with a Term Frequency > X as far as I > understand the question those terms don't have to be within a certain > window right? Correct. Terms can be anywhere in the document. I figured term frequencie

Requiring multiple matches of a term

2011-08-19 Thread Michael Ryan
Is there a way to specify in a query that a term must match at least X times in a document, where X is some value greater than 1? For example, I want to only get documents that contain the word "dog" three times. I've thought that using a proximity query with an arbitrary large distance value

RE: Solr Accent Insensitive and sensitive search

2011-08-17 Thread Michael Ryan
Are you using the same analyzer for both type="query" and type="index"? Can you show us the fieldType from your schema? -Michael

RE: copyfields in schema.xml

2011-08-11 Thread Michael Ryan
Nope. The 'text' field will just have the 'titulo' contents. To have both, you would have to do something like this: -Michael

RE: How come this query string starts with wildcard?

2011-08-10 Thread Michael Ryan
I think this is because ")" is treated as a token delimiter. So "(foo)bar" is treated the same as "(foo) bar" (that is, bar is treated as a separate word). So "(foo)*" is really parsed as "(foo) *" and thus the * is treated as the start of a new word. -Michael

RE: schema.xml changes, need re-indexing ?

2011-07-27 Thread Michael Ryan
You should be fine - no need to re-index your data. Adding and removing fields is generally safe to do without a re-index. Changing a field (its type, analyzers, etc) requires more caution and generally does require a re-index. -Michael

RE: Returning total matched document count with SolrJ

2011-06-30 Thread Michael Ryan
SolrDocumentList docs = queryResponse.getResults(); long totalMatches = docs.getNumFound(); -Michael

RE: Sorting by vale of field

2011-06-29 Thread Michael Ryan
You could try adding a new int field (like "typeSort") that has the desired sort values. So when adding a document with type:car, also add typeSort:1; when adding type:van, also add typeSort:2; etc. Then you could do "sort=typeSort asc" to get them in your desired order. I think this is also po

Using FieldCache in SolrIndexSearcher - crazy idea?

2011-06-28 Thread Michael Ryan
I am a user of Solr 3.2 and I make use of the distributed search capabilities of Solr using a fairly simple architecture of a coordinator + some shards. Correct me if I am wrong: In a standard distributed search with QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or

omitTermFreqAndPositions in a TextField fieldType

2011-06-16 Thread Michael Ryan
Is it possible to use omitTermFreqAndPositions="true" in a <fieldType> declaration that uses class="solr.TextField"? I've tried doing this and it does not seem to work (i.e., the prx file size does not change). Using it in a <field> declaration does work, but I'd rather set it in the <fieldType> so I don't have to repeat i
