On Wed, Jul 31, 2013 at 4:56 AM, Bill Bell wrote:
> On Jul 30, 2013, at 12:34 PM, Dotan Cohen wrote:
>> On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote:
>>> Does adding facet.mincount=2 help?
>>
>> In fact, when adding facet.mincount=20 (I know that some dupes
host:8983/solr/terms?terms.fl=id&terms.mincount=2";
>
Thanks, Jack. This returns results with comparable Qtimes to the
faceting on enum. Good to know!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
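
For readers following the duplicate-hunting thread, the TermsComponent request quoted above can be assembled programmatically. This is only a minimal sketch: the base URL, the choice of the `id` field, and the presence of a `/terms` request handler in solrconfig.xml are assumptions, not details confirmed in the thread.

```python
from urllib.parse import urlencode

def terms_dupes_url(base_url, field="id", mincount=2, limit=-1):
    """Build a TermsComponent URL listing terms of `field` whose
    document frequency is at least `mincount` (i.e. likely duplicates)."""
    params = {
        "terms": "true",
        "terms.fl": field,           # field whose terms are enumerated
        "terms.mincount": mincount,  # skip terms seen in fewer documents
        "terms.limit": limit,        # a negative limit asks for all terms
        "terms.sort": "index",       # index order avoids sorting by count
        "wt": "json",
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)

# Hypothetical host; substitute the real Solr base URL.
url = terms_dupes_url("http://localhost:8983/solr")
print(url)
```

As noted elsewhere in the thread, the counts returned this way ignore deletions, so they may overstate duplicates until deleted documents are expunged.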
indexSearcher.decref();
// if i'm here then it's a new document
return super.addDoc(cmd);
}
}
> And I give a bunch of examples in my book.
>
I anticipate the book with esteem!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
.apache.org/solr/TermsComponent and found that it can be
> really memory modest (i.e. without sort or limit).
> Be aware that df-s returned by that component are unaware of deleted
> documents, hence expungeDeletes before.
>
Thank you, I will look into that.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Tue, Jul 30, 2013 at 9:56 PM, Shawn Heisey wrote:
> On 7/30/2013 12:49 PM, Dotan Cohen wrote:
>>
>> Thanks, the query ran for almost 2 full minutes but it returned
>> results! I'll google for how to increase the disk cache for queries
>> like this. Other th
that this is not a one-time problem; rather, I should learn now how
to tune Solr for intensive queries such as this. I learn from the
problems encountered!
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
he query ran for almost 2 full minutes but it returned
results! I'll google for how to increase the disk cache for queries
like this. Other than the Qtime, is there no way to judge the amount
of memory required for a particular query to run?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
olr 4.1 we were using overwrite=false&allowDups=false in order to
discard the new document, not overwrite the extant document. We knew
at the time that the features were deprecated, and apparently
allowDups=false stopped working in 4.3. We are testing new solutions,
but we need to identify the dupe
On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal wrote:
> Does adding facet.mincount=2 help?
>
>
In fact, when adding facet.mincount=20 (I know that some dupes are in
the hundreds) I got the OutOfMemoryError in seconds instead of
minutes.
--
Dotan Cohen
http://gibberish.co.il
http:
tually, the 'disk'
is an Amazon Web Service EBS volume.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Thank you Jack and Koji. I will take a look at MLT and also at the
.zip files from LUCENE-474. Koji, did you have to modify the code for
the latest Solr?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ords:
I (3 times matched)
eat (2 times matched)
love, cake, you, will, candy (1 time each)
Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
down OR this OR that OR left
OR right OR north OR south OR east OR west
My index currently has 77461952 documents, most under 1 KiB each but
upwards of ten fields.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ss the release?
http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957/
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ividual
second is faceted on. The issue remains the same even when reversing
the order of the pivot:
&facet.pivot=provider,added
Is this a Solr bug, or am I pivoting wrong? This is on Solr 4.1.0
running on OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) on
Ubuntu Server 12.04. T
On Thu, Jun 27, 2013 at 12:14 PM, Upayavira wrote:
> can you give an example?
>
Thank you. This is an example query:
select
?q=search_field:iraq
&fq={!cache=false}search_field:love%20obama
&defType=edismax
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
terms in the main query returns results in
milliseconds.
Note that I am not using any wildcard queries; in each case I am
specifying the field to search and the terms to search on. Where
should I start to debug?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
to field size in words.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Wed, Jun 5, 2013 at 9:04 PM, Eustache Felenc
wrote:
> There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice
> examples.
>
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
gle the user-input in order to
add the "~1" to the end of each term.
Note that the ExtendedDisMax page does in fact mention that fuzziness
is supported:
http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
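
The idea of rewriting user input so that every term ends in "~1" can be sketched as below. This is a naive illustration under stated assumptions: terms are split on whitespace only, and quoted phrases, boolean operators, and characters that are special to the query parser are not handled. The function name is hypothetical.

```python
def add_fuzziness(user_input, distance=1):
    """Append a fuzzy-match suffix (~distance) to each whitespace-separated term."""
    return " ".join(f"{term}~{distance}" for term in user_input.split())

fuzzy_query = add_fuzziness("all that glitters")
print(fuzzy_query)
```

In a real application the input would need escaping before this step, since a user-supplied "~" or ":" would change the meaning of the query.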
ing from an arbitrary-length phrase, but it wouldn't
be pretty! Edismax does in fact meet my need, though.
Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey wrote:
> On 6/5/2013 9:03 AM, Dotan Cohen wrote:
>> How would one write a query which should perform set union on the
>> search terms (term1 OR term2 OR term3), and yet also perform phrase
>> matching if both terms are found? I tr
sults with
_both_ term1 _and_ term2, which could be between 0-10 documents.
Note that in the application, users will be searching for any
arbitrary number of terms, in fact they will be entering phrases. I
can limit these phrases to 140 characters if needed.
Thank you in advance!
--
Dotan C
On Wed, Jun 5, 2013 at 3:41 PM, Brendan Grainger
wrote:
> Hi Dotan,
>
> I think all you need to do is add:
>
> facet.mincount=1
>
> i.e.
>
> select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&
> rows=0&facet.mincount=1
>
> Note that you can do it per field as well:
>
> select?q=*:*&fq=tags:d
e the parameter facet.mincount - looks like you want to set it to 1,
> instead of the default which is 0.
Perfect, thank you Raymond!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
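
The facet.mincount advice above can be sketched as a small URL builder. The query, filter, and `tags` field are taken from the quoted example; the function name is hypothetical, and the per-field form uses Solr's `f.<field>.facet.mincount` parameter naming.

```python
from urllib.parse import urlencode, parse_qs

def facet_url(field, fq, mincount=1, per_field=False):
    """Build a facet-only select URL that hides facet values below `mincount`."""
    # Per-field override: f.<field>.facet.mincount; otherwise the global parameter.
    mincount_key = f"f.{field}.facet.mincount" if per_field else "facet.mincount"
    params = [
        ("q", "*:*"),
        ("fq", fq),
        ("facet", "true"),
        ("facet.field", field),
        ("rows", 0),               # only the facet counts are wanted
        (mincount_key, mincount),
    ]
    return "select?" + urlencode(params)

global_url = facet_url("tags", "tags:dotan-*")
field_url = facet_url("tags", "tags:dotan-*", per_field=True)
print(global_url)
print(field_url)
```

The per-field variant is useful when several facet.field parameters are sent and only some of them should suppress zero counts.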
'dotan-', even if a document has other tags
such as 'beatles'?
4) How to have Solr return only those faceting values which are larger than 0?
Thank you!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
bout it. If there is any fine
manual that is particularly urgent that I should read, please do
mention it. Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
der. Apache Solr is an amazing product, but it is often obtuse and
unintuitive. Other times one does not even know what Solr is capable
of, such as the case in this thread, where I was parsing entire
documents to change the multiField value.
Thank you very much!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
to append a couple of values:
>
>
>
>doc-id
>a
>b
>
>
>
> To empty out a multivalued field:
>
>
>
>doc-id
>
>
>
>
Thank you. I will see about translating that into the JSON format that
I work with.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
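
A rough JSON translation of the two quoted operations might look like the sketch below. The field name `tags` and the unique-key name `id` are assumptions for illustration (the field names in the quoted XML did not survive), and atomic updates of this kind also depend on the schema and update log being set up for them.

```python
import json

def append_values(doc_id, field, values, id_field="id"):
    """Atomic update that appends `values` to a multivalued field."""
    return {id_field: doc_id, field: {"add": list(values)}}

def clear_field(doc_id, field, id_field="id"):
    """Atomic update that empties a field by setting it to null."""
    return {id_field: doc_id, field: {"set": None}}

# Each payload would be POSTed separately to the JSON update handler.
append_payload = json.dumps([append_values("doc-id", "tags", ["a", "b"])])
clear_payload = json.dumps([clear_field("doc-id", "tags")])
print(append_payload)
print(clear_payload)
```

Sending only the modifier for the changed field avoids fetching and re-posting the whole document.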
> half-upgraded -- one copy of my index is version 3.5.0, the other is
> 4.2.1. Switching to SolrCloud with sharding and replication would
> eliminate this flexibility, unless I maintained two separate clouds.
>
Thank you. I am not using Solr Cloud but if I ever consider it, then I
will keep this in mind.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
as new documents are being pushed in.
> My earlier reply to your other message has some other ideas that will
> hopefully help.
>
Thank you Shawn!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
new value, so I see
that the change was properly committed.
Is there a known bug that overwriting such a doc...:
a
b
...with this doc...:
a
...has no effect? Can values in multiValued fields only be added, but not removed?
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
you know you're up-to-date and can wait, say, 30s before
> making another request.
>
Actually, I would add a filter query for documents whose last_index
value is before the last schema change, and stop when fewer documents
are returned than were requested.
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
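
The stopping rule described above can be sketched as a loop that keeps asking for the first page of stale documents until a short page comes back. This is a sketch, not a tested reindexer: `fetch_batch` and `reindex_doc` are hypothetical callables supplied by the caller, the `last_index` field name comes from the message, and the filter string assumes a Solr version that accepts a range with an exclusive upper bound.

```python
def stale_filter(schema_changed_at):
    """Filter query matching documents last indexed before the schema change."""
    return "last_index:[* TO %s}" % schema_changed_at

def reindex_stale(fetch_batch, reindex_doc, batch_size=100):
    """Reindex stale documents in batches; stop on the first short batch.

    fetch_batch(rows) must return up to `rows` documents matching the stale
    filter. Reindexed documents get a fresh last_index and so fall out of the
    filter, which is why every request starts from the first result.
    """
    total = 0
    while True:
        docs = fetch_batch(batch_size)
        for doc in docs:
            reindex_doc(doc)
        total += len(docs)
        if len(docs) < batch_size:  # fewer returned than requested: done
            return total

# In-memory stand-in for Solr, only to show the control flow.
stale = [{"id": i} for i in range(250)]
done = reindex_stale(lambda rows: stale[:rows], stale.remove, batch_size=100)
print(done, len(stale))
```

Against a live index the reindexed documents must become visible (committed) before the next fetch, otherwise the same batch would be returned again.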
mind that while the reindex is happening, clients will be performing
searches and a few hundred documents will be written per minute. Note
that the machine running Solr is an EC2 instance running on Amazon Web
Services, and that the 'disk' on which the Solr index is stored is an
EBS v
", or does the Solr index directory contents or even the directory
> itself need to be explicitly deleted first? I believe it is the latter, but
> the former "seems" to work, most of the time. Deleting the directory itself
> "seems" to be the best answer, to date - but no guarantees!
>
I don't have an answer for that, sorry!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ly impact on future indexing. Whether
> your existing index will still be valid depends upon the changes you are
> making.
>
> Upayavira
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
removing and adding fields to the schema has shown almost no change in
the extant index results returned.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> definition, they are a new value for the term text.
>
>
I see, for some reason I did not concentrate on this key quote of yours:
"...to remove the tokens that did not produce a stem ..."
Now it makes perfect sense.
Thank you, Jack!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ch
that they will show as a dupe for the
RemoveDuplicatesTokenFilterFactory? That seems odd.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
hanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
setup will eventually have a spectacular
> blow up with OutOfMemory, rather than semi-silently ignoring commits. A
> searcher object contains caches and uses a lot of memory, so having lots of
> them around will eventually use up the entire heap.
>
Silently dropping data is by far the worse choice, I agree, especially
as a default setting.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey wrote:
> On 4/2/2013 3:09 AM, Dotan Cohen wrote:
>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>>
the invalid queries which also log as SEVERE. I thought that this
would be easy to Google for, but it is not! If there is a concise
document that examines this issue, I would love to know where on the
wild wild web it exists.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
.xml file differs from the Solr default value. I think that
this is bad practice: a single default should be decided upon and Solr
should use this value when nothing is specified in solrconfig.xml, and
that _same_value_ should be specified in the stock solrconfig.xml. Is
it not a reasonable assumption th
ut I think that I eliminated it
as a possibility because I actually need the top keywords related to a
specific keyword. For instance, I need to know which words are most
commonly used with the word "coffee".
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
; remove commit=true and run a cron job to commit once per minute?
>
>
> Even better, it sounds like a job for CommitWithin :
> http://wiki.apache.org/solr/CommitWithin
>
I'll look into that. Thank you!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
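
The difference between the two approaches comes down to one request parameter. The sketch below reuses the update URL quoted in the thread; the 60-second window is an assumption chosen to match the "once per minute" idea, and the function name is hypothetical.

```python
from urllib.parse import urlencode

UPDATE_URL = "http://127.0.0.1:8983/solr/core/update/json"

def update_url(commit_within_ms=None):
    """Return the update URL with either commitWithin or an immediate commit."""
    if commit_within_ms is None:
        params = {"commit": "true"}                  # commit after every batch
    else:
        params = {"commitWithin": commit_within_ms}  # let Solr batch commits
    return UPDATE_URL + "?" + urlencode(params)

print(update_url())
print(update_url(commit_within_ms=60000))
```

With commitWithin, each batch only promises that its documents will be committed within the given number of milliseconds, so several batches can share one commit and no cron job is needed.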
r 4.1 as it contains
so many examples?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ords), in
order to determine what the top keywords / topics are. That query
would take up to 200 seconds to run, but it does not have to return
the results in real-time (the output goes to another process, not to a
waiting user).
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ds have?
>
Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true
Should I remove commit=true and run a cron job to commit once per minute?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen wrote:
> On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:
>
> [Toke: maxWarmingSearchers limit exceeded?]
>
>> Thank you Toke, this is exactly on my "list of things to learn about
>> Solr". We do get the error
e how this goes.
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ase the heap until Solr serves the
> facets without OOM.
>
Thanks, I will start with "-Xmx8g" and test.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
0
The server is 64-bit Ubuntu Server 12.04 LTS running Solr 4.1 and the
following Java:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
caching altogether rarely does.
>
Thanks. The problem is that the queries with filter queries are taking
much longer to run (~60-80 ms) than the queries without (~1-4 ms). I
figured that the problem may have been with the caching.
In fact, running a query with a filter query and caching disabl
buntu Server. Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
I see, thanks. Actually, running a clean 4.1 with no previous index
does not have the issues.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
eature of 4.2 are you suggesting for this issue? Can
Solr 4.2 natively import from a Solr index?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Fri, Mar 1, 2013 at 12:22 PM, Rafał Kuć wrote:
> Hello!
>
> As far as I know you have to re-index using external tool.
>
Thank you Rafał. That is what I figured.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
s N documents at a time, indexes them into
Solr 4.1, then requests another N documents to index? Or is there an
internal Solr / Lucene facility for this? I've actually looked for
such a facility, but as I am unable to find such a thing I ask.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
finding any good ways of going about
this.
Note that we are talking about ~18,000,000 (yes, 18 million) small
documents similar to 'tweets' (mostly under 1 KiB each, very very few
over 5 KiB).
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
schema.xml have changed. Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
de.
>
Hi Alex. Would you mind posting the new analyzers?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
wherever. FWIW you
> can find the same question and my response on Stackoverflow.
>
> ~ David
>
Thank you David. In fact I do frequent Stack Overflow.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
checking which classes are available?
Thank you.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
rfect test for Phoronix, and much more
relevant for some readers than Jack-the-Ripper or Quake.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ount. But I commend you taking notice and taking an
interest. Thank you!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> that produces the above form.
>
Thank you Shawn, that is much cleaner and will be easier to debug when
/ if things go wrong.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Wed, Nov 7, 2012 at 5:16 PM, Walter Underwood wrote:
> You are probably thinking of SweetSpotSimilarity. You might also want to look
> at pivoted document normalization.
>
Thanks, I'll take a look at that.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
smoother. Try doing that.
>
> Otis
Thanks, Otis. I'll start googling for Solr and Lucene Similarity.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
trying again?
>
Thanks. Seeing how all the documents are being added, either there is
a valid format in the created_iso8601 field or it is empty. I've
pretty much ruled out empty in code, but still nothing in the index.
I'll play around some more and update the list. At least I am learning
nding
the value of the created_iso8601 field to another field, I can ensure
that the value is legal and does exist! On the other hand, it seems
that there is no such value being stored in the index, but new
documents are being added that ostensibly should have that value.
I'll try adding a document with post.jar and see what happens. I'll
update the thread.
Thanks!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
> field comes first or last (I don't know which).
>
Thank you. In fact, I am being careful to try to pull up records after
the date on which the application was updated to populate the field.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
the field remains
empty. Perhaps I'm writing ISO 8601 wrong, I'll get to looking at that
now. I'm surprised that Solr accepts the documents with bad data in
some of the fields; I will look into that as well.
Have a peaceful Saturday.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
absolutely sure that that document has the new field? Solr happily sorts
> documents that do not have a value for a field, that's the purpose of
> sortMissingFirst/Last.
>
All the newest documents have the field, and I'm sorting by time
descending. In fact, I did test with more rows
f the
created_iso8601 field did not exist. That field is indexed and stored,
with no parameters defined on handlers that may list the fields to
return as Alexandre had mentioned.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
hat route I would have it recognize some
LocalParams (such as omitNorms=true right there) to be flexible at
query time. I'm actually surprised that this doesn't yet exist.
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
, in the sense that
one could not apply on-the-fly score computation component
coefficients. Surely I'm not the first dev to run into an issue with
the default scoring algorithm and want to tweak it only on specific
queries!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ory). Only full-text fields or fields that
> need an index-time boost need norms. "
>
> http://wiki.apache.org/solr/SchemaXml
Thank you, but I am looking for a query-time modifier. I do need the
fieldNorm enabled in the general sense.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
523.html
>
Thank you Shawn! Those are exactly the documents that I need. Google
should hire you to fill in the pages when someone searches for "java
garbage collection". Interestingly, I just checked, and bing.com does
list the Oracle page on the first page of results. I shudder to think
tha
h the main java executable:
>
> update-alternatives --config java
>
Thanks, I will take a look at the current Oracle JVM.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
he following
> bits with a header to indicate what each section is:
> CACHE->filterCache
> CACHE->queryResultCache
> CORE->searcher
>
> Thanks,
> Shawn
>
Thank you Shawn. The information is here:
http://pastebin.com/aqEfeYVA
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
ers/cache:        729      14250
Swap:             0          0          0
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
en merge to see the
effect.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
soft commits, so you have that option to have the
> documents immediately available for search.
>
Thanks, Erick. I'll play around with different configurations. So far
just removing the periodic optimize command worked wonders. I'll see
how much it helps or hurts to run that daily or m
had to reduce my autowarm count
> on the filter cache to FOUR, with a cache size of 512. When it is 8 or
> higher, it can take over a minute to autowarm.
>
I will have to experiment with the warming. Thank you for the tips.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
In Ultraseek Server, it was
> called "force merge" and we had to tell people to stop doing that nearly
> every month.
>
Thank you for those links. I commented on the Solr bug. There are some
very insightful comments in there.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
                                    free     shared    buffers     cached
Mem:            14          2         12          0          0          1
-/+ buffers/cache:          0         14
Swap:            0          0          0
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
I make the best use of that for Solr assuming both heavy reads
and writes?
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
On Mon, Oct 22, 2012 at 7:29 PM, Shawn Heisey wrote:
> On 10/22/2012 9:58 AM, Dotan Cohen wrote:
>>
>> Thank you, I have gone over the Solr admin panel twice and I cannot find
>> the cache statistics. Where are they?
>
>
> If you are running Solr4, you can see indi
On Mon, Oct 22, 2012 at 5:27 PM, Mark Miller wrote:
> Are you using Solr 3X? The occasional long commit should no longer
> show up in Solr 4.
>
Thank you Mark. In fact, this is the production release of Solr 4.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
panel twice and I cannot
find the cache statistics. Where are they?
> Lowering the autowarmCount should lower the time needed to warm up,
> however you can also look at your warming queries (if you have such)
> and see how long they take.
>
Thank you, I will look at that!
--
Dotan Cohe
ping_onDeckSearchers.3DX.22_mean_in_my_logs.3F
I happen to know that the script will try to commit once every 60
seconds. How does one "reduce the work in newSearcher listeners"? What
effect will this have? What effect will reducing the autowarmCount on
caches have?
Thanks.
--
Dotan Cohe
eferring to the example schema.xml file provided with
Solr, then I'd love to. I'm signing up for the dev list now. Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
d to your schema and forget about it.
> You don't need to do anything with it as it is used internally by
> Solr.
>
That is exactly my plan, but I would also like to understand more
about what is going on. I don't like cut-and-paste programming.
Thank you very much!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
't say anything about the cause of
> those errors without seeing the exception.
>
I see, thanks. I don't think that I'm using the SolrCloud feature. Is
it enabled because there exist "solr/collection1" and also
"multicore/core0"?
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
tp://wiki.apache.org/solr/SchemaXml
I do have a Solr 4 Beta index running on Websolr that does not have
such a field. It works, but throws many "Service Unavailable" and
"Communication Error" errors. Might the lack of the _version_ field be
the reason?
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
which is how Solr knows about the
> relative paths. There should also be a README.txt file that will tell you
> more about how the directory is expected to be organized.
>
> Cheers,
> Tricia
>
Thanks. I read the top-level README.txt but now I see that the answer
is in the so
Where did the additional relative paths 'collection1',
'collection1/data', and 'collection1/data/index' come from? I know
that I can change the value of CWD with the -Dsolr.solr.home flag, but
what affects the relative paths mentioned?
Thanks.
--
Dotan C
gold
Whereas I need:
all that glitters is gold
Thanks.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
tice that the
highlighted sections come after the main results. The highlighting
feature works as expected.
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com