It is indeed possible. Just need to use a different syntax. As far as I know,
the facet parameters need to be local parameters, like this...
&facet.range={!key=date_decade facet.range.start=1600-01-01T00:00:00Z
facet.range.end=2000-01-01T00:00:00Z
facet.range.gap=%2B10YEARS}date&facet.range={!k
I have the following problem: I have many (let's say hundreds of millions) of
documents in an existing distributed index that have a field with a variety of
values. Two of these values are "dog" and "puppy". I have decided that I want
to reclassify these to just all be "dog".
I do queries on th
This should do what you want:
String fq = "Field1:" + "\"" +
org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(value) + "\"";
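For example, to use it as a filter with SolrJ (a sketch - assumes an existing
SolrServer instance named "server"; names are illustrative):

SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery(fq);  // fq built as above, e.g. Field1:"some escaped value"
QueryResponse rsp = server.query(query);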
-Michael
-Original Message-
From: tedsolr [mailto:tsm...@sciquest.com]
Sent: Monday, October 06, 2014 10:49 AM
To: solr-user@lucene.apache.org
Subje
It could be due to the minimum timer resolution on Windows. Do a search for
"windows 15ms" and you'll find a lot of information about it. Though, I'm not
sure which versions of Windows and/or Java have that problem. You could test it
out by timing things other than Solr and see if they also take
When you call addFacetField, the parameter you pass it should just be the
fieldName. The fieldValue shouldn't come into play at all (unless I'm
misunderstanding what you're trying to do).
If you ever do need to escape a value for a query, you can use
org.apache.solr.client.solrj.util.ClientUtil
Done. https://issues.apache.org/jira/browse/SOLR-6360
-Michael
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, August 06, 2014 7:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Content-Charset header in HttpSolrServer
: I was reviewing the
Are the latLong_0_coordinate and latLong_1_coordinate fields populated using
copyField? If so, this sounds like it could be
https://issues.apache.org/jira/browse/SOLR-3502.
-Michael
-Original Message-
From: Franco Giacosa [mailto:fgiac...@gmail.com]
Sent: Monday, August 04, 2014 9:05 P
I was reviewing the httpclient code in HttpSolrServer and noticed that it sets
a "Content-Charset" header. As far as I know this is not a real header and is
not necessary. Anyone know a reason for this to be there? I'm guessing this was
just a mistake when converting from httpclient3 to httpclie
/watch?v=9h3ax5Wmxpk
On Tue, Jul 22, 2014 at 6:50 AM, Michael Ryan wrote:
> Is it possible to use DocValues on an existing index without first
> re-indexing?
>
> -Michael
>
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
Is it possible to use DocValues on an existing index without first re-indexing?
-Michael
I suggest doing this in two queries. In the first query, retrieve the unique
ids of the top 50 documents. In the second query, just query for those ids
(e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on that query.
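A rough SolrJ sketch of the two-query approach (assumes a SolrServer instance
named "server" and a unique key field named "id" - adjust to your schema):

// Query 1: fetch only the ids of the top 50 documents.
SolrQuery q1 = new SolrQuery("your main query");
q1.setFields("id");
q1.setRows(50);
StringBuilder ids = new StringBuilder();
for (SolrDocument doc : server.query(q1).getResults()) {
    if (ids.length() > 0) ids.append(' ');
    ids.append(doc.getFieldValue("id"));
}
// Query 2: facet over just those 50 documents.
SolrQuery q2 = new SolrQuery("id:(" + ids + ")");
q2.setRows(0);
q2.setFacet(true);
q2.addFacetField("yourFacetField");
QueryResponse rsp = server.query(q2);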
-Michael
-Original Message-
From: Aaron Gibbons [mailto:agib
cript in JavaScript using the stateless script
update processor.
Can you tell us more about the nature of your data? I mean, sometimes analyzer
filters strip or fold accented characters anyway, so count of characters versus
UTF-8 bytes may be a non-problem.
-- Jack Krupansky
-----Original Messag
In LUCENE-5472, Lucene was changed to throw an error if a term is too long,
rather than just logging a message. I have fields with terms that are too long,
but I don't care - I just want to ignore them and move on.
The recommended solution in the docs is to use LengthFilterFactory, but this
lim
f you have any questions.
LUCENE-5470 and LUCENE-5504 would move multiterm analysis farther down and make
it available to all parsers that use QueryParserBase, including the
ComplexPhraseQueryParser.
Best,
Tim
-Original Message-----
From: Michael Ryan [mailto:mr...@moreover.com]
Se
I've been using a modified version of the complex phrase query parser patch
from https://issues.apache.org/jira/browse/SOLR-1604 in Solr 3.6, and I'm
currently upgrading to 4.9, which has this built-in.
I'm having trouble with using accents in wildcard queries, support for which
was added in ht
Well for CEST, which is 2 hours ahead, I would think you could just do...
datefield:[* TO NOW/MONTH-2HOURS]
That would give you everything up to 2014-04-30 22:00:00 GMT, which is
2014-05-01 00:00:00 CEST.
Always always always store the correct value.
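For example, when indexing from Java, one way to make sure the value you send
is always UTC (a sketch):

java.text.SimpleDateFormat fmt = new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
fmt.setTimeZone(java.util.TimeZone.getTimeZone("UTC"));
String solrDate = fmt.format(new java.util.Date());  // e.g. 2014-05-01T12:00:00Z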
-Michael
-Original Message-
From:
Is there any significant difference in query speed when retrieving the score
pseudo-field? E.g., does...
q=foo&sort=date+desc&fl=*,score
...take longer to run than...
q=foo&sort=date+desc&fl=*
I know there's different code paths in Solr depending on whether the score is
needed or not, but not
Unfortunately the timeAllowed parameter doesn't apply to the part of the
processing that makes wildcard queries so slow. It only applies to a later part
of the processing when the matching documents are being collected. There's some
discussion in the original ticket that implemented this
(https
My gut instinct is that your heap size is way too high. Try decreasing it to
like 5-10G. I know you say it uses more than that, but that just seems bizarre
unless you're doing something like faceting and/or sorting on every field.
-Michael
-Original Message-
From: Patrick O'Lone [mailto
Sounds like https://issues.apache.org/jira/browse/LUCENE-3821 (issue seems to
be fixed but still shows as open).
-Michael
-Original Message-
From: Arcadius Ahouansou [mailto:arcad...@menelic.com]
Sent: Sunday, September 22, 2013 11:15 PM
To: solr-user
Subject: Interesting edismax/qs bug
This is a known bug in that JDK version. Upgrade to a newer version of JDK 7
(any build within the last two years or so should be fine). If that's not
possible for you, you can add -XX:-UseLoopPredicate as a command line option to
java to work around this.
-Michael
-Original Message-
F
> However, the Solr instance we direct our client query to is consuming
> significantly more RAM (10GB) and is still failing after a few queries when
> it runs out of heap space. This is presumably due to the role it plays,
> aggregating the results from each shard.
That seems quite odd... Wha
This is interesting... How are you measuring the heap size?
-Michael
-Original Message-
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Monday, July 29, 2013 5:34 AM
To: solr-user@lucene.apache.org
Subject: swap and GC
Something interesting I have noticed today, after
I'm 99% sure that the deleted docs will indeed use up space in the field cache,
at least until the segments that those documents are in are merged - that is
what an optimize will do. Of course, these segments will automatically be
merged eventually, but it might take days for this to happen, dep
I have some custom code that uses the top-level FieldCache (e.g.,
FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign
this to use the per-segment FieldCaches so that re-opening a Searcher is
fast(er). In most cases, I've got a docId and I want to get the value for a
part
To enforce uniqueness, Solr needs to be able to search on the id to see if it
is currently in the index.
-Michael
-Original Message-
From: Mysurf Mail [mailto:stammail...@gmail.com]
Sent: Monday, June 24, 2013 11:52 AM
To: solr-user@lucene.apache.org
Subject: why does the has to be ind
Restarting Solr won't clear the disk cache. When I'm doing perf testing, I'll
sometimes run this on the server before each test to clear out the disk cache:
echo 1 > /proc/sys/vm/drop_caches
-Michael
-Original Message-
From: Learner [mailto:bbar...@gmail.com]
Sent: Friday, June 21, 201
Sounds like this could be https://issues.apache.org/jira/browse/SOLR-2976.
-Michael
-Original Message-
From: vinothkumar raman [mailto:vinothkr.k...@gmail.com]
Sent: Monday, April 22, 2013 5:54 AM
To: solr-user@lucene.apache.org; solr-...@lucene.apache.org
Subject: Stats facet on int/tin
I've investigated this in the past. The worst case is 2*indexSize additional
disk space (3*indexSize total) during an optimize.
In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of
10. We would see the worst case happen when there were exactly 20 segments (or
some oth
Yes, this is a distributed search thing. In a distributed search, it will first
make a somewhat normal facet request to all of the shards, get back the facet
values, then make a second request in order to get the full counts of the facet
values - this second request contains a list of facet term
Large facet.limit values cause a very large amount of form data to be sent to
the shards, though I'm not sure why this would cause a NullPointerException.
Perhaps the web server you are using is truncating the data instead of
returning a form too large error, which is somehow causing an NPE. Are
Depending on your use case and the particulars of your system, a previous post
I made about using a FieldCache in SolrIndexSearcher for id retrieval (see
http://osdir.com/ml/solr-user.lucene.apache.org/2013-01/msg01574.html) may help
you. In your case, it might not be the merging process on the
What are the values of the start and rows parameters you are using? When you
say the controller shard takes a long time, how long is it taking - 100ms, 1s,
10s...?
-Michael
-Original Message-
From: qungg [mailto:qzheng1...@gmail.com]
Sent: Tuesday, March 26, 2013 11:17 AM
To: solr-user
I was wondering if anyone is aware of an existing Jira for this bug...
_query_:"\"a b\"~2"
...is parsed as...
PhraseQuery(someField:"a b")
...instead of the expected...
PhraseQuery(someField:"a b"~2)
_query_:"\"a b\""~2
...is parsed as...
PhraseQuery(someField:"a b"~2)
_query_:"\"a b\"~2"~3
...i
I don't have anything to add besides saying "this is awesome". Great analysis.
-Michael
I think the order needs to be in lowercase. Try "asc" instead of "ASC".
-Michael
-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com]
Sent: Wednesday, February 13, 2013 7:30 PM
To: solr-user@lucene.apache.org
Subject: Can't determine Sort Order: 'prijs ASC', pos=5
On this
Assuming that createdDate is a DateField in your schema.xml, the object
returned by SolrJ will be a Date object (though you will need to cast it to a
Date).
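For example (a sketch - assumes the field is named "createdDate"):

SolrDocument doc = queryResponse.getResults().get(0);
java.util.Date created = (java.util.Date) doc.getFieldValue("createdDate");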
-Michael
I'm pretty sure the local params have to be at the very start of the query. But
you should be able to do this with nested queries. Try this...
fq=_query_:"{!tag=d0feea8}category:\"5\" OR otherField:\"otherValue\"" AND
type:DOCUMENT
-Michael
-Original Message-
From: Karol Sikora [mailto
From a performance point of view, I can't imagine it mattering. In our setup,
we have a dedicated Solr server that is not a shard that takes incoming
requests (we call it the "coordinator"). This server is very lightweight and
practically has no load at all.
My gut feeling is that having a
Following up from a post I made back in 2011...
> I am a user of Solr 3.2 and I make use of the distributed search capabilities
> of Solr using
> a fairly simple architecture of a coordinator + some shards.
>
> Correct me if I am wrong: In a standard distributed search with
> QueryComponent, t
Are you able to see any evidence that some of the 500k docs are being added
twice? Check the maxDocs on the Solr admin page. I vaguely recall there being
some issue with docs in SolrCloud being added multiple times (which under the
covers is really add, delete, add). I think that could cause the
(This is based on my knowledge of 3.6 - not sure if this has changed in 4.0)
You are using rows=3, which requires retrieving 3 documents from disk.
In a non-distributed search, the QTime will not include the time it takes to
retrieve these documents, but in a distributed search, it will.
If you have the same documents -- with the same uniqueKey -- across multiple
shards, the count will not be what you expect. You'll need to ensure that each
document exists only on a single shard.
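One simple way to route each document to exactly one shard at index time (a
sketch - any deterministic function of the uniqueKey works):

// Mask off the sign bit so the result is non-negative for every hashCode.
int shard = (uniqueKey.hashCode() & 0x7fffffff) % numShards;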
-Michael
-Original Message-
From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wante
I'd guess that the patch simply doesn't implement it for distributed searches.
The code for distributed facets is quite a bit more complicated, and I don't
see it touched in this patch.
-Michael
-Original Message-
From: jmozah [mailto:jmo...@gmail.com]
Sent: Tuesday, January 08, 2013 1
From my own experience, the timestamp seems to be logged at the start of the
garbage collection.
-Michael
We see these EofExceptions in our system occasionally. I believe they occur
when our SolrJ client times out and closes the connection, before Jetty returns
the response.
-Michael
-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Thursday, January 03, 2013 10:07 AM
In our system (using 3.6), it is displayed on /solr/admin/. I'd guess that the
value in solr.xml overrides the one in schema.xml, but not sure.
-Michael
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, December 20, 2012 12:08 AM
To: solr-user@lu
> Perhaps if there are a lot more ties on one end vs the other?
> Or of the values being sorted on aren't that random? Do they naturally
> increase like a timestamp?
It's a unique id field. The id is a simple sequential id, so docs with a lower
doc id will naturally also have a lower id.
I thi
Has anyone ever attempted to highlight a field that is not stored in Solr? We
have been considering not storing fields in Solr, but still would like to use
Solr's built-in highlighting. On first glance, it looks like it would be
fairly simple to modify DefaultSolrHighlighter to get the stored
As you add nodes, the average response time of the slowest node will likely
increase. For example, consider an extreme case where you have something like 1
million nodes - you're practically guaranteed that one of them is going to be
doing something like a stop-the-world garbage collection. So e
We have a longstanding issue with "failed to respond" errors in Solr when our
coordinator is querying our Solr shards.
To elaborate further... we're using the built-in distributed capabilities of
Solr 3.6, and using Jetty as our server. Occasionally, we will have a query
fail due to an error
When I upgraded from 3.2 to 3.6, I found that an optimize - all other variables
being the same - took about twice as long. Eventually I was able to track this
down to the new default of MMapDirectory. By changing back to NIOFSDirectory, I
was able to get the optimize time back down to what it fo
Yeah, the situation is kind of a pain right now. In
https://issues.apache.org/jira/browse/SOLR-2438, it was enabled by default and
there is no way to disable without patching SolrQueryParser. There's also the
edismax parser which doesn't have a setting for this, which I've made a jira
for at ht
I'd recommend not optimizing every hour. Are you seeing a significant
performance increase from optimizing this frequently?
-Michael
This should do it:
facet=true&facet.query=yourDateField:([* TO
NOW/DAY-1MILLI])&facet.query=yourDateField:([NOW/DAY TO *])
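The same thing from SolrJ would look roughly like this (a sketch, using the
field name from the example above):

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);
q.setFacet(true);
q.addFacetQuery("yourDateField:[* TO NOW/DAY-1MILLI]");
q.addFacetQuery("yourDateField:[NOW/DAY TO *]");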
-Michael
-Original Message-
From: Paul [mailto:p...@nines.org]
Sent: Thursday, October 18, 2012 5:28 PM
To: solr-user@lucene.apache.org
Subject: facet by "in the pa
Easiest way I know of without parsing any of the index files is to take the
size of the fdx file in bytes and divide by 8. This will give you the exact
number of documents before 4.0, and a close approximation in 4.0.
Though, the fdx file might not be on disk if you haven't committed.
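As arithmetic, it's just this (a sketch - assumes a single-segment, pre-4.0
index, and the segment file name is illustrative):

java.io.File fdx = new java.io.File(indexDir, "_a.fdx");
long docCount = fdx.length() / 8;  // one 8-byte pointer per document before 4.0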
-Michael
We have a maven project to build a war containing everything from the Solr war,
plus some of our own code. Here's the relevant stuff from our pom.xml:
<packaging>war</packaging>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr</artifactId>
  <type>war</type>
</dependency>
Facets are only really useful if you want the counts for multiple values (e.g.,
"eldudearino", "ladudearina"). I'd suggest just leaving all the facet
parameters off of that query - the numFound that is returned should give you
what you want.
The slowness may be due to the facet cache needing to
A few questions...
1) Do you only see these spikes when running JMeter? I.e., do you ever see a
spike when you manually run a query?
2) How are you measuring the response time? In my experience there are three
different ways to measure query speed. Usually all of them will be
approximately equ
Try changing the tokenizer2 SynonymFilterFactory filter to this:
By default, it seems that it uses WhitespaceTokenizer.
-Michael
I'd guess that this is because SnowballPorterFilterFactory does not implement
MultiTermAwareComponent. Not sure, though.
-Michael
In case anyone tries to do this... If you facet on a TrieField and change the
precisionStep to 0, you'll need to re-index. Changing precisionStep to 0
changes the prefix returned by TrieField.getMainValuePrefix(FieldType), which
then causes facets with a value of "0" to be returned.
-Michael
> Not really - it changes what tokens are indexed for the numbers, and
> range queries won't work correctly.
> Sorting (FieldCache), function queries, etc, would still work, and
> exact match queries would still work.
Thanks. So it is just range queries that won't work correctly? That's okay for
Is it safe to change the precisionStep for a TrieField without doing a re-index?
Specifically, I want to change a field from this:
to this:
By "safe", I mean that searches will return the correct results, a FieldCache
on the field will still work, clowns won't eat me...
-Michael
It looks like the first format was removed in 3.6 as part of
https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all
3.x versions.
-Michael
-Original Message-
From: Peter Wolanin [mailto:peter.wola...@acquia.com]
Sent: Friday, April 13, 2012 12:32 PM
To: solr-us
> I'm curious, why can't you do a master/slave setup?
It's just not all that useful for this particular application. Indexing new
docs and merging segments - which as I understand is the main strength of
having a write-only master - is a relatively small part of our app. What really
is expensiv
> Unless you have warming happening, there should
> only be a single searcher open at any given time.
> So it seems to me that maxWarmingSearchers
> should give you what you need.
What I'm seeing is that if a query takes a very long time to run, and runs
across the duration of multiple commits (I
https://issues.apache.org/jira/browse/SOLR-2548 may be of interest to you.
-Michael
Is there a way to limit the number of searchers that can be open at a given
time? I know there is a maxWarmingSearchers configuration that limits the
number of warming searchers, but that's not quite what I'm looking for...
Ideally, when I commit, I want there to only be one searcher open befor
This should be fine. From my experience, changing a field from stored="false"
to stored="true" and vice versa is generally safe to do and has no unexpected
behavior.
-Michael
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory
instead of after it. I vaguely recall being burned by something like this
before.
-Michael
> Can this be the reason why it is working automatically although there are no
> reversed tokens being stored and even without the
> ReversedWildcardFilterFactory being set, solr automatically is allowing
> leading wild card search?
Yes, that's correct. See https://issues.apache.org/jira/browse
How about having a single-valued field named "firstDestination" that has the
first destination in the list, and then your query could be something like
'destination:"Buenos Aires" firstDestination:"Buenos Aires"'. Docs that match
both should have a higher score and thus will be listed first.
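At indexing time that could look like this (a sketch, with the field names from
the suggestion above):

SolrInputDocument doc = new SolrInputDocument();
doc.addField("destination", "Buenos Aires");  // multi-valued, in itinerary order
doc.addField("destination", "Madrid");
doc.addField("firstDestination", "Buenos Aires");  // single-valued: first entry only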
-M
I was wondering... how does the TrieField precisionStep value affect the speed
of non-range queries and sorting?
I'm assuming that int (precisionStep=0) is no slower than tint
(precisionStep=8) for these - is that correct? tint is just faster for range
queries?
Is int any faster than tint for
According to http://lucene.apache.org/java/3_4_0/fileformats.html, the
FNMVersion changed from -2 to -3 in Lucene 3.4. Is it possible that the new
master is actually running 3.4, and the new slave is running 3.2? (This is just
a wild guess.)
-Michael
I had a similar requirement in my project, where a user might ask for up to
3000 results. What I did was change SolrIndexSearcher.doc(int, Set) to retrieve
the unique key from the field cache instead of retrieving it as a stored field
from disk. This resulted in a massive speed improvement for t
> The problem I have is that at search time, I have faceting turned on for
> this field and therefore, I get the four facets "canadian", "imperial",
> "bank", and "commerce", which all refer to the same record.
>
> How can I go about searching for any word contained in the company name but
> then
I have some fields I facet on that are TextFields but have just a single token.
The fieldType looks like this:
SimpleFacets uses an UnInvertedField for these fields because
multiValuedFieldCache() returns true for TextField. I tried changing the type
for
these fields to the plain "s
Another thing to note is that QTime does not include the time it takes to
retrieve the stored documents to include in the response. So if you're using a
high rows value in your query, QTime may be much smaller than the actual time
Solr spends generating the response.
Try adding rows=1 to your quer
I am trying to highlight FieldA when a user searches on either FieldA or FieldB,
but I do not want to highlight FieldA when a user searches on FieldC.
To explain further: I have a field named "content" and a field named
"contentCS". The content field is a stored text field that uses
LowerCaseFilte
I was wondering if anyone has any ideas for making UnInvertedField.uninvert()
faster, or other alternatives for generating facets quickly.
The vast majority of the CPU time for our Solr instances is spent generating
UnInvertedFields after each commit. Here's an example of one of our slower
fields
> Separately: why do you want to make the gap so large?
No reason, really. I'm just curious about how it works under the covers.
-Michael
Is there any negative side-effects of setting a very large
positionIncrementGap? For example, I use positionIncrementGap=100 right now -
is there any reason for me to not use positionIncrementGap=1, or even
greater?
I saw a thread from a few months ago asking something like this, but I did
I think the problem is that the config needs to be inside of the
config, rather than after it as you have.
-Michael
Has anyone used the "Flexible Query Parser"
(https://issues.apache.org/jira/browse/LUCENE-1567) in Solr? I'm just starting
to look at it for the first time and was wondering if it is something that can
be dropped into Solr fairly easily, or if more extensive changes are needed. I
thought perh
> yep - facet.mincount=1
Yeah, I've ran into this same issue, though I never looked too closely into it.
What is happening is that the facet.mincount parameter is removed when the
query is made to the shards, so each shard is returning about 3 facet
values, most of them with a count of 0. I
Are you using facet.mincount in the query?
-Michael
> I have recently upgraded from Solr 1.4 to Solr 3.2. In Solr 1.4 only 3
> files (one .cfs & two segments) file were made in *index/* directory.
> (after
> doing optimize).
>
> Now, in Solr 3.2, the optimize seems not to be working. My final number of
> files in *index/* directory are 7-8 in numb
> 10,000,000 document index
> Internal Document id is 32 bit unsigned int
> Max Memory Used by a single cache slot in the filter cache = 32 bits x
> 10,000,000 docs = 320,000,000 bits or 38 MB
I think it depends on where exactly the result set was generated. I believe the
result set will usually
I'm using Solr 3.2 with a mergeFactor of 10 and no merge policy configured,
thus using the default LogByteSizeMergePolicy. Before I do an optimize,
typically the largest segment will be about 90% of the total index size.
When I do an optimize, the total disk space required is usually about 2x t
> One simple way of doing this is maybe to write a wrapper for TermQuery
> that only returns docs with a Term Frequency > X as far as I
> understand the question those terms don't have to be within a certain
> window right?
Correct. Terms can be anywhere in the document. I figured term frequencie
Is there a way to specify in a query that a term must match at least X times in
a document, where X is some value greater than 1?
For example, I want to only get documents that contain the word "dog" three
times. I've thought that using a proximity query with an arbitrarily large
distance value
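(One option in Solr 4.0 and later is a function range query over the termfreq
function, e.g. q={!frange l=3}termfreq(text,'dog') to require at least three
occurrences - the field name here is illustrative.)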
Are you using the same analyzer for both type="query" and type="index"? Can you
show us the fieldType from your schema?
-Michael
Nope. The 'text' field will just have the 'titulo' contents. To have both, you
would have to do something like this:
-Michael
I think this is because ")" is treated as a token delimiter. So "(foo)bar" is
treated the same as "(foo) bar" (that is, bar is treated as a separate word).
So "(foo)*" is really parsed as "(foo) *" and thus the * is treated as the
start of a new word.
-Michael
You should be fine - no need to re-index your data.
Adding and removing fields is generally safe to do without a re-index. Changing
a field (its type, analyzers, etc) requires more caution and generally does
require a re-index.
-Michael
SolrDocumentList docs = queryResponse.getResults();
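// getNumFound() is the total number of matching documents, not just the rows returned: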
long totalMatches = docs.getNumFound();
-Michael
You could try adding a new int field (like "typeSort") that has the desired
sort values. So when adding a document with type:car, also add typeSort:1; when
adding type:van, also add typeSort:2; etc. Then you could do "sort=typeSort
asc" to get them in your desired order.
I think this is also po
I am a user of Solr 3.2 and I make use of the distributed search capabilities
of Solr using a fairly simple architecture of a coordinator + some shards.
Correct me if I am wrong: In a standard distributed search with
QueryComponent, the first query sent to the shards asks for fl=myUniqueKey or
Is it possible to use omitTermFreqAndPositions="true" in a <fieldType>
declaration that uses class="solr.TextField"? I've tried doing this and it does
not seem to work (i.e., the prx file size does not change). Using it in a
<field> declaration does work, but I'd rather set it in the <fieldType> so I
don't have to repeat it