terms in the main query returns results in
milliseconds.
Note that I am not using any wildcard queries; in each case I am
specifying the field to search and the terms to search on. Where
should I start to debug?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Thu, Jun 27, 2013 at 12:14 PM, Upayavira wrote:
> can you give an example?
>
Thank you. This is an example query:
select
?q=search_field:iraq
&fq={!cache=false}search_field:love%20obama
&defType=edismax
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
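
For readers who want to reproduce the example query above from a script, here is a minimal sketch that assembles the same parameters with proper URL encoding. The base URL and the helper name build_select_url are assumptions for illustration; only the q, fq, and defType values come from the message.

```python
from urllib.parse import urlencode


def build_select_url(base, main_query, filter_query, cache_filter=True):
    """Build a Solr /select URL with one main query and one filter query.

    When cache_filter is False, the {!cache=false} local parameter from
    the message is prepended to the filter query.
    """
    local_params = "" if cache_filter else "{!cache=false}"
    params = [
        ("q", main_query),
        ("fq", local_params + filter_query),
        ("defType", "edismax"),
    ]
    return f"{base}/select?{urlencode(params)}"


# Assumed host and core path; adjust to the actual installation.
select_url = build_select_url(
    "http://localhost:8983/solr",
    "search_field:iraq",
    "search_field:love obama",
    cache_filter=False,
)
print(select_url)
```

Building the two variants (cache_filter=True and False) makes it easy to time the same filter query with and without the filter cache.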
ividual
second in faceted on. The issue remains the same even when reversing
the order of the pivot:
&facet.pivot=provider,added
Is this a Solr bug, or am I pivoting wrong? This is on Solr 4.1.0
running on OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) on
Ubuntu Server 12.04. T
ss the release?
http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957/
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
down OR this OR that OR left
OR right OR north OR south OR east OR west
My index currently has 77461952 documents, most under 1 KiB each but
upwards of ten fields.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ords:
I (3 times matched)
eat (2 times matched)
love, cake, you, will, candy (1 time each)
Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Thank you Jack and Koji. I will take a look at MLT and also at the
.zip files from LUCENE-474. Koji, did you have to modify the code for
the latest Solr?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
tually, the 'disk'
is an Amazon Web Service EBS volume.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote:
> Does adding facet.mincount=2 help?
>
>
In fact, when adding facet.mincount=20 (I know that some dupes are in
the hundreds) I got the OutOfMemoryError in seconds instead of
minutes.
--
Dotan Cohen
http://gibberish.co.il
http:
olr 4.1 we were using overwrite=false&allowDups=false in order to
discard the new document, not overwrite the extant document. We knew
at the time that the features were deprecated, and apparently
allowDups=false stopped working in 4.3. We are testing new solutions,
but we need to identify the dupe
he query ran for almost 2 full minutes but it returned
results! I'll google for how to increase the disk cache for queries
like this. Other than the Qtime, is there no way to judge the amount
of memory required for a particular query to run?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
that this is not a one-time problem; rather, I should learn now how
to tune Solr for intensive queries such as this one. I learn from the
problems I encounter!
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Tue, Jul 30, 2013 at 9:56 PM, Shawn Heisey wrote:
> On 7/30/2013 12:49 PM, Dotan Cohen wrote:
>>
>> Thanks, the query ran for almost 2 full minutes but it returned
>> results! I'll google for how to increase the disk cache for queries
>> like this. Other th
.apache.org/solr/TermsComponent and found that it can be
> really memory modest (i.e. without sort or limit).
> Be aware that the df-s returned by that component are unaware of deleted
> documents, hence expungeDeletes beforehand.
>
Thank you, I will look into that.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
indexSearcher.decref();
// if i'm here then it's a new document
return super.addDoc(cmd);
}
}
> And I give a bunch of examples in my book.
>
I anticipate the book with esteem!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
host:8983/solr/terms?terms.fl=id&terms.mincount=2";
>
Thanks, Jack. This returns results with comparable Qtimes to the
faceting on enum. Good to know!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell wrote:
> On Jul 30, 2013, at 12:34 PM, Dotan Cohen wrote:
>> On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote:
>>> Does adding facet.mincount=2 help?
>>
>> In fact, when adding facet.mincount=20 (I know that some dupes
buntu Server. Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
caching altogether rarely does.
>
Thanks. The problem is that the queries with filter queries are taking
much longer to run (~60-80 ms) than the queries without (~1-4 ms). I
figured that the problem may have been with the caching.
In fact, running a query with a filter query and caching disabl
0
The server is 64-bit Ubuntu Server 12.04 LTS running Solr 4.1 and the
following Java:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ase the heap until Solr serves the
> facets without OOM.
>
Thanks, I will start with "-Xmx8g" and test.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
e how this goes.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen wrote:
> On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:
>
> [Toke: maxWarmingSearchers limit exceeded?]
>
>> Thank you Toke, this is exactly on my "list of things to learn about
>> Solr". We do get the error
ds have?
>
Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true
Should I remove commit=true and run a cron job to commit once per minute?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ords), in
order to determine what the top keywords / topics are. That query
would take up to 200 seconds to run, but it does not have to return
the results in real-time (the output goes to another process, not to a
waiting user).
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
r 4.1 as it contains
so many examples?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> remove commit=true and run a cron job to commit once per minute?
>
>
> Even better, it sounds like a job for CommitWithin :
> http://wiki.apache.org/solr/CommitWithin
>
I'll look into that. Thank you!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
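
As a rough illustration of the commitWithin suggestion above: instead of passing commit=true on every batch, the update request can carry a commitWithin value in milliseconds and leave the commit timing to Solr. This is only a sketch. The core path is taken from the URL quoted earlier in the thread, while the helper name, the 60-second window, and the sample documents are assumptions.

```python
import json
from urllib.parse import urlencode


def build_update_request(docs, commit_within_ms=60000,
                         base="http://127.0.0.1:8983/solr/core"):
    """Return (url, body) for a JSON update that uses commitWithin.

    commit_within_ms asks Solr to commit the batch within that many
    milliseconds, so several batches can share a single commit.
    """
    url = f"{base}/update/json?{urlencode({'commitWithin': commit_within_ms})}"
    body = json.dumps(docs)
    return url, body


# Hypothetical batch; real documents must match the schema in use.
update_url, update_body = build_update_request(
    [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]
)
print(update_url)
```

The body would then be POSTed with a Content-Type of application/json by whatever HTTP client is already in use.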
ut I think that I eliminated it
as a possibility because I actually need the top keywords related to a
specific keyword. For instance, I need to know which words are most
commonly used with the word "coffee".
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
.xml file differs from the Solr default value. I think that
this is bad practice: a single default should be decided upon and Solr
should use this value when nothing is specified in solrconfig.xml, and
that _same_value_ should be specified in the stock solrconfig.xml. Is
it not a reasonable assumption th
the invalid queries which also log as SEVERE. I thought that this
would be easy to Google for, but it is not! If there is a concise
document that examines this issue, I would love to know where on the
wild wild web it exists.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey wrote:
> On 4/2/2013 3:09 AM, Dotan Cohen wrote:
>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>>
setup will eventually have a spectacular
> blow up with OutOfMemory, rather than semi-silently ignoring commits. A
> searcher object contains caches and uses a lot of memory, so having lots of
> them around will eventually use up the entire heap.
>
Silently dropping data is by far the worse choice, I agree, especially
as a default setting.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
hanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ch
that they will show as a dupe for the
RemoveDuplicatesTokenFilterFactory? That seems odd.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> definition, they are a new value for the term text.
>
>
I see, for some reason I did not concentrate on this key quote of yours:
"...to remove the tokens that did not produce a stem ..."
Now it makes perfect sense.
Thank you, Jack!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
removing and adding fields to the schema has shown almost no change in
the extant index results returned.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ly impact on future indexing. Whether
> your existing index will still be valid depends upon the changes you are
> making.
>
> Upayavira
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
", or does the Solr index directory contents or even the directory
> itself need to be explicitly deleted first? I believe it is the latter, but
> the former "seems" to work, most of the time. Deleting the directory itself
> "seems" to be the best answer, to date - but no guarantees!
>
I don't have an answer for that, sorry!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
mind that while the reindex is happening, clients will be performing
searches and a few hundred documents will be written per minute. Note
that the machine running Solr is an EC2 instance running on Amazon Web
Services, and that the 'disk' on which the Solr index is stored is an
EBS v
you know you're up-to-date and can wait, say, 30s before
> making another request.
>
Actually, I would add a filter query for documents whose last_index
value is before the last schema change, and stop when fewer documents
were returned than were requested.
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
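
The stopping rule described above (filter on last_index older than the schema change, stop when a batch comes back smaller than requested) can be sketched as a simple loop. Everything here is an assumption for illustration: fetch_stale and reindex stand in for the real Solr query and update calls, and the in-memory list stands in for the index.

```python
def reindex_until_current(fetch_stale, reindex, batch_size=100):
    """Reindex stale documents batch by batch.

    fetch_stale(limit) must return at most `limit` documents whose
    last_index value precedes the last schema change. reindex(docs)
    must rewrite them so that they no longer match that filter.
    The loop stops once a batch is smaller than the requested size.
    """
    total = 0
    while True:
        docs = fetch_stale(batch_size)
        if docs:
            reindex(docs)
            total += len(docs)
        if len(docs) < batch_size:
            return total


# In-memory stand-in for the index (hypothetical data).
SCHEMA_CHANGED_AT = 10
store = [{"id": i, "last_index": 0} for i in range(250)]


def fetch_stale(limit):
    stale = [d for d in store if d["last_index"] < SCHEMA_CHANGED_AT]
    return stale[:limit]


def reindex(docs):
    for d in docs:
        d["last_index"] = 20  # any value after the schema change


reindexed = reindex_until_current(fetch_stale, reindex)
print(reindexed)
```

Against a live index, the reindexed documents would need to be visible to the next search (committed) before the next batch is fetched, otherwise the same documents could be returned again.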
new value, so I see
that the change was properly committed.
Is there a known bug that overwriting such a doc...:
a
b
...with this doc...:
a
...has no effect? Can multiValue fields be only added, but not removed?
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
as new documents are being pushed in.
> My earlier reply to your other message has some other ideas that will
> hopefully help.
>
Thank you Shawn!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> half-upgraded -- one copy of my index is version 3.5.0, the other is
> 4.2.1. Switching to SolrCloud with sharding and replication would
> eliminate this flexibility, unless I maintained two separate clouds.
>
Thank you. I am not using Solr Cloud but if I ever consider it, then I
will keep this in mind.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
to append a couple of values:
>
>
>
>doc-id
>a
>b
>
>
>
> To empty out a multivalued field:
>
>
>
>doc-id
>
>
>
>
Thank you. I will see about translating that into the JSON format that
I work with.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
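
For the JSON translation mentioned above, a minimal sketch of the two atomic-update shapes, assuming Solr 4's "add" and "set" update modifiers and a unique key field named id. The field name tags is hypothetical; the XML in the quoted reply did not survive in the archive, so this is not a transcription of it.

```python
import json


def append_values(doc_id, field, values):
    """Atomic update that appends values to a multivalued field."""
    return {"id": doc_id, field: {"add": values}}


def clear_field(doc_id, field):
    """Atomic update that empties a field by setting it to null."""
    return {"id": doc_id, field: {"set": None}}


# "tags" is a hypothetical multivalued field name.
append_payload = json.dumps([append_values("doc-id", "tags", ["a", "b"])])
clear_payload = json.dumps([clear_field("doc-id", "tags")])
print(append_payload)
print(clear_payload)
```

Each payload would be sent to the JSON update handler on its own. Python's None serializes to JSON null, which is what the "set" modifier expects for removal.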
der. Apache Solr is an amazing product, but it is often obtuse and
unintuitive. Other times one does not even know what Solr is capable
of, such as the case in this thread, where I was parsing entire
documents to change the multiField value.
Thank you very much!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
bout it. If there is any fine
manual that is particularly urgent that I should read, please do
mention it. Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
'dotan-', even if a document has other tags
such as 'beatles'?
4) How to have Solr return only those faceting values which are larger than 0?
Thank you!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
e the parameter facet.mincount - looks like you want to set it to 1,
> instead of the default which is 0.
Perfect, thank you Raymond!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
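
A small sketch of the facet.mincount suggestion above, built as a URL so that zero-count facet values are left out of the response. The base URL and helper name are assumptions; the query parameters follow the select example quoted in this thread, and the per-field form uses Solr's f.<field>.facet.mincount parameter naming.

```python
from urllib.parse import urlencode


def build_facet_url(base, facet_field, filter_query, mincount=1,
                    per_field=False):
    """Build a facet-only query that hides facet values below mincount."""
    if per_field:
        mincount_key = f"f.{facet_field}.facet.mincount"
    else:
        mincount_key = "facet.mincount"
    params = [
        ("q", "*:*"),
        ("fq", filter_query),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("rows", 0),  # only the facet counts are wanted
        (mincount_key, mincount),
    ]
    return f"{base}/select?{urlencode(params)}"


# Assumed host and core path.
facet_url = build_facet_url("http://localhost:8983/solr", "tags", "tags:dotan-*")
print(facet_url)
```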
On Wed, Jun 5, 2013 at 3:41 PM, Brendan Grainger
wrote:
> Hi Dotan,
>
> I think all you need to do is add:
>
> facet.mincount=1
>
> i.e.
>
> select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&
> rows=0&facet.mincount=1
>
> Note that you can do it per field as well:
>
> select?q=*:*&fq=tags:d
sults with
_both_ term1 _and_ term2, which could be between 0-10 documents.
Note that in the application, users will be searching for any
arbitrary number of terms, in fact they will be entering phrases. I
can limit these phrases to 140 characters if needed.
Thank you in advance!
--
Dotan C
On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey wrote:
> On 6/5/2013 9:03 AM, Dotan Cohen wrote:
>> How would one write a query which should perform set union on the
>> search terms (term1 OR term2 OR term3), and yet also perform phrase
>> matching if both terms are found? I tr
ing from an arbitrary-length phrase, but it wouldn't
be pretty! Edismax does in fact meet my need, though.
Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
gle the user-input in order to
add the "~1" to the end of each term.
Note that the ExtendedDisMax page does in fact mention that fuzziness
is supported:
http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
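
The munging mentioned above, appending "~1" to each term of the user input, could look roughly like this. It is a deliberately naive sketch: it splits on whitespace only and does not escape query-syntax characters or handle quoted phrases, both of which a real application would have to deal with. The function name is an assumption.

```python
def add_fuzziness(user_input, distance=1):
    """Append a fuzzy-match suffix (e.g. ~1) to every whitespace-separated term."""
    return " ".join(f"{term}~{distance}" for term in user_input.split())


fuzzy_query = add_fuzziness("love obama")
print(fuzzy_query)
```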
On Wed, Jun 5, 2013 at 9:04 PM, Eustache Felenc
wrote:
> There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice
> examples.
>
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
to field size in words.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> that produces the above form.
>
Thank you Shawn, that is much cleaner and will be easier to debug when
/ if things go wrong.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ount. But I commend you taking notice and taking an
interest. Thank you!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
rfect test for Phoronix, and much more
relevant for some readers than Jack-the-Ripper or Quake.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
checking which classes are available?
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
wherever. FWIW you
> can find the same question and my response on Stackoverflow.
>
> ~ David
>
Thank you David. In fact I do frequent Stack Overflow.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
de.
>
Hi Alex. Would you mind posting the new analyzers?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
schema.xml have changed. Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
finding any good ways of going about
this.
Note that we are talking about ~18,000,000 (yes, 18 million) small
documents similar to 'tweets' (mostly under 1 KiB each, very very few
over 5 KiB).
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
s N documents at a time, indexes them into
Solr 4.1, then requests another N documents to index? Or is there
an internal Solr / Lucene facility for this? I've actually looked for
such a facility, but as I am unable to find such a thing I ask.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Fri, Mar 1, 2013 at 12:22 PM, Rafał Kuć wrote:
> Hello!
>
> As far as I know you have to re-index using external tool.
>
Thank you Rafał. That is what I figured.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
eature of 4.2 are you suggesting for this issue? Can
Solr 4.2 natively import from a Solr index?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
I see, thanks. Actually, running a clean 4.1 with no previous index
does not have the issues.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Mon, Aug 20, 2012 at 3:00 PM, Markus Jelsma
wrote:
> Date queries are described here: http://wiki.apache.org/solr/SolrQuerySyntax
>
Terrific, thank you!
> You must first make sure your dates end up in a Date fieldType and are in the
> proper format.
>
Thanks.
--
Do
On Mon, Sep 3, 2012 at 5:50 PM, Alexey Serba wrote:
> http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting
>
Thank you, that does seem to be only available on Solr 4.0. Luckily,
we're using Websolr so upgrading is rather easy!
Thanks!
--
Dotan
tamp
for compatibility with other software. I had planned on converting it
all over to the internal Solr Datetime type, but I now see that I
should leave it as a timestamp.
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
its type to string?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ld that is to be pivoted
on needs to be a string field? Is that documented, as I cannot find
that in the docs.
Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> doesn't work properly, as it indexes multiple terms per value and you'd get
> odd values. Pivot faceting was initially implemented only with textual
> terms in mind, and string is generally the desired type.
>
Thanks for the insight. I'll see how much time for experimentation I
might afford.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
pivot faceting, but all of the other types of
> faceting take this into account (hence faceting works fine on trie
> fields).
>
Thanks. I am not familiar with the trie field, but I'll look into it.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
s a lady whos sure
all that glitters is gold
and shes buying a stairway to heaven
I would prefer to get this result:
all that glitters is gold
(pseudo-XML from memory, may not be accurate but illustrates the point)
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
et! In fact, I did not know that the
updateRequestProcessorChain needed to be defined in solrconfig.xml and
I had tried to define it in schema.xml. I don't have access to
solrconfig.xml (I am using Websolr) but I will contact them about
adding it.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
r-PHP-Client. In fact,
preceding the variable with (int) does resolve the issue I
have found. This looks like an issue with PHP being weakly typed.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
d by an order
of magnitude.
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
n application and in production schema do in fact match!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
tice that the
highlighted sections come after the main results. The highlighting
feature works as expected.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
gold
Whereas I need:
all that glitters is gold
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Where did the additional relative paths 'collection1',
'collection1/data', and 'collection1/data/index' come from? I know
that I can change the value of CWD with the -Dsolr.solr.home flag, but
what affects the relative paths mentioned?
Thanks.
--
Dotan C
which is how Solr knows about the
> relative paths. There should also be a README.txt file that will tell you
> more about how the directory is expected to be organized.
>
> Cheers,
> Tricia
>
Thanks. I read the top-level README.txt but now I see that the answer
is in the so
tp://wiki.apache.org/solr/SchemaXml
I do have a Solr 4 Beta index running on Websolr that does not have
such a field. It works, but throws many "Service Unavailable" and
"Communication Error" errors. Might the lack of the _version_ field be
the reason?
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
't say anything about the cause of
> those errors without seeing the exception.
>
I see, thanks. I don't think that I'm using the SolrCloud feature. Is
it enabled because there exist "solr/collection1" and also
"multicore/core0"?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
d to your schema and forget about it.
> You don't need to do anything with it as it is used internally by
> Solr.
>
That is exactly my plan, but I would also like to understand more
about what is going on. I don't like cut-and-paste programming.
Thank you very much!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
eferring to the example schema.xml file provided with
Solr, then I'd love to. I'm signing up for the dev list now. Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ping_onDeckSearchers.3DX.22_mean_in_my_logs.3F
I happen to know that the script will try to commit once every 60
seconds. How does one "reduce the work in newSearcher listeners"? What
effect will this have? What effect will reducing the autowarmCount on
caches have?
Thanks.
--
Dotan Cohe
panel twice and I cannot
find the cache statistics. Where are they?
> Lowering the autowarmCount should lower the time needed to warm up,
> however you can also look at your warming queries (if you have such)
> and see how long they take.
>
Thank you, I will look at that!
--
Dotan Cohe
On Mon, Oct 22, 2012 at 5:27 PM, Mark Miller wrote:
> Are you using Solr 3X? The occasional long commit should no longer
> show up in Solr 4.
>
Thank you Mark. In fact, this is the production release of Solr 4.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey wrote:
> On 10/22/2012 9:58 AM, Dotan Cohen wrote:
>>
>> Thank you, I have gone over the Solr admin panel twice and I cannot find
>> the cache statistics. Where are they?
>
>
> If you are running Solr4, you can see indi
I make the best use of that for Solr assuming both heavy reads
and writes?
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
free shared buffers cached
Mem: 14 2 12 0 0 1
-/+ buffers/cache: 0 14
Swap: 0 0 0
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
In Ultraseek Server, it was
> called "force merge" and we had to tell people to stop doing that nearly
> every month.
>
Thank you for those links. I commented on the Solr bug. There are some
very insightful comments in there.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
had to reduce my autowarm count
> on the filter cache to FOUR, with a cache size of 512. When it is 8 or
> higher, it can take over a minute to autowarm.
>
I will have to experiment with the warming. Thank you for the tips.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
soft commits, so you have that option to have the
> documents immediately available for search.
>
Thanks, Erick. I'll play around with different configurations. So far
just removing the periodic optimize command worked wonders. I'll see
how much it helps or hurts to run that daily or m
en merge to see the
effect.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ers/cache: 729 14250
Swap: 0 0 0
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
he following
> bits with a header to indicate what each section is:
> CACHE->filterCache
> CACHE->queryResultCache
> CORE->searcher
>
> Thanks,
> Shawn
>
Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
h the main java executable:
>
> update-alternatives --config java
>
Thanks, I will take a look at the current Oracle JVM.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com