Uwe,
I'm sorry for the confusion: https://issues.apache.org/jira/browse/SOLR-7730
goes into 5.4 only. Hence, to get fast DV facets you need to apply the patch
(it's pretty small).
Accelerating non-DV facets is not so clear so far. Please show a profiler
snapshot for non-DV facets if you wish to go this way.
Well done Mikhail,
curious to see the performance!
Apart from the disk usage (of course building docValues will cost more space),
taking into consideration the field cardinality: in the past, when the field
cardinality was low (few unique values in the field), the enum approach
was suggested (so DocVal
TrieLongField works the other way around: you can index Long data with a
view to running efficient range queries on it.
But you want to actually index a range, and query for values (matching
only the docs which have valid ranges for that field).
Not sure there's something like that Ou
Although docValues provide NRT faceting with great performance (since
5.4), the enum method is still really important for edge cases (many docs,
a small number of terms).
Also, Solr's UnInvertedField had a really smart BigTerms strategy, where the
fattest terms were counted by enum and the remaining ones with fc. Do
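For reference, the enum-vs-fc trade-off above can be exercised per field from the request side. A minimal sketch, just building the query string with Python's stdlib (the field name "category" is invented for illustration):

```python
from urllib.parse import urlencode

# facet.method=enum walks the term list and intersects filters, which
# pays off when a field has few unique values; fc/dv is better for
# high-cardinality fields. The per-field f.<field>.* override lets you
# mix strategies in one request.
params = {
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "category",
    "f.category.facet.method": "enum",  # per-field override
}
query_string = urlencode(params)
print(query_string)
```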
When I submit this:
http://localhost:8983/solr/EventLog/select?q=deeper&wt=json&indent=true
then I get these (empty) results:
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"deeper",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":0,"start":
Thanks for replying. I have tried Geo Search, but I cannot manage to
query documents using edismax, because in geo search one has to use the
Contains, Intersects etc. keywords and I don't know how to put them in an
edismax query...
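For what it's worth, the usual pattern is to keep the free-text part in q with defType=edismax and put the spatial predicate in a separate fq, which is where Intersects/Contains live. A hedged sketch (the field name "geo" and the polygon are invented):

```python
from urllib.parse import urlencode

# Sketch only: edismax handles the keyword relevance in q, while the
# spatial clause is a filter query against a spatial field.
params = {
    "q": "coffee shop",
    "defType": "edismax",
    "qf": "title description",
    "fq": 'geo:"Intersects(POLYGON((0 0, 0 10, 10 10, 10 0, 0 0)))"',
}
qs = urlencode(params)
print(qs)
```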
In my case I just dropped the prefix from my ranges and it happened to fit Int
On Tue, 2015-09-22 at 11:56 -0700, Erick Erickson wrote:
> FWIW, there is work being done for "high cardinality faceting" with
> some of the recent Streaming Aggregation code.
The JIRA I can find that seems relevant is
https://issues.apache.org/jira/browse/SOLR-7903
I can see streaming faceting w
If you go to the Analysis tool, index and query time, what can you see
for your "deeper" query text and your field content
(using the log text field)?
Have you verified the current tokens in the index for that field?
I quickly went through your config files, and they look ok, but it is q
Hi Mark,
Search is not working properly because you have defined the "ELall" field as a
text type which is not defined properly.
You have to modify the schema.xml with these changes.
Hi Solr community,
I can find many blog posts on how to deploy Solr with docker but I am
wondering if Solr/Docker is really ready for production.
Has anybody ever run Solr in production with Docker?
Thank you for your feedback,
Aurélien
Hi Erik,
Thank you for your reply. I wrote it into a file. All my love to cat * > file.
I structured my JSON in a format that I would like to upload into Solr,
defined a schema.xml and solrconfig.xml and took it from there. Initially I
uploaded a 1G file with post, then I got a bit too overzealous I guess.
Hi all,
I want to get each doc's score in the search results, and restrict the results
to docs whose score is above a score that I set (I mean I set a
minimum score for the search and only get docs scoring higher than that).
I need this in normal search with edismax, and in More Like This in pysolr.
I und
It's quite common to hear about the benefit of sharding.
Until we reach the I/O bound on our machines, sharding is likely to reduce
the query time.
Furthermore, working on smaller indexes will make the single searches faster
on the smaller nodes.
But what about the other way around?
What if we actu
Mugeesh, I believe you are on the right path and I was eager to try out
your suggestion. So my schema.xml now contains this snippet (changes
indicated by ~):
required="true" />
~ stored="true" required="true" />
required="true" />
required="true" />
~ stored="true" multiValue
On Wed, Sep 23, 2015, at 02:00 PM, aurelien.mazo...@francelabs.com
wrote:
> Hi Solr community,
>
> I can find many blog posts on how to deploy Solr with docker but I am
> wondering if Solr/Docker is really ready for production.
> Has anybody ever ran Solr in production with Docker?
Hi Aurelien
m so those 2 are the queries at the minute :
1) logtext:deeper
2) logtext:*deeper*
According to your schema, the log text field is of type "text_en".
This should be completely fine.
Have you ever changed your schema at runtime, without re-indexing your old
docs?
What happens if you use your ana
I've tried to run different shards on different machines, and there is a
slight improvement in performance (about 3 mins faster for 1GB worth of
data, from 22 mins to 19 mins).
Is this a normal scenario? Both of my machines are running on an Intel i7 core.
Regards,
Edwin
On 21 September 2015
Hello everyone,
In our development efforts, we came into the necessity of sharing Solr
indexes with some initial documents to be deployed alongside our
application. For that, I just started copying the collection directory
with its conf and data subdirs.
That worked for some time, but it now
Hi,
Would like to check: will StandardTokenizerFactory work well for indexing
both English and Chinese (bilingual) documents, or do we need tokenizers
that are customised for Chinese (e.g. HMMChineseTokenizerFactory)?
Regards,
Edwin
You can request the “score” field in the “fl” parameter.
Why do you want to cut off at a particular score value?
Solr scores don’t work like that. They are not absolute relevance scores, they
change with each query. There is no such thing as a 100% match or a 50% match.
Setting a lower score li
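To illustrate the two points above: scores come back by adding "score" to fl, and a hard cutoff can be approximated with an frange filter over the query function. A sketch only, and the 0.5 threshold is purely illustrative; as noted, absolute score cutoffs are rarely meaningful:

```python
from urllib.parse import urlencode

# "score" in fl asks Solr to return the score pseudo-field per doc;
# the frange filter keeps only docs whose main-query score is >= l.
params = {
    "q": "deeper",
    "defType": "edismax",
    "fl": "id,score",
    "fq": "{!frange l=0.5}query($q)",  # hypothetical cutoff of 0.5
}
qs = urlencode(params)
print(qs)
```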
Honestly, it is highly discouraged to share an index by making N Solr nodes
use it.
Can you express your requirement better? Why can't you replicate the
index?
Cheers
2015-09-23 15:37 GMT+01:00 Henrique O. Santos :
> Hello everyone,
>
> In our development efforts, we came into the necessity of sh
Using a second machine, you will have fresh memory, disk and CPUs at your
disposal.
So assuming you succeeded in saturating the first machine's indexing power,
of course it is normal that you improve your indexing time by giving an
additional node to serve the indexing process.
Are you balancing 50:50 or with som
Hi Alessandro,
The requirement is pretty simple. We have a product that makes use of
Solr collections. Anyone can download the product and deploy it locally
(alongside a local Solr instance) on their own machine and start using
it. To make it clear, each installation of the product operates by
In a word, no. The CJK languages in general don't
necessarily tokenize on whitespace, so using a tokenizer
that uses whitespace as its default delimiter simply won't
work.
Have you tried it? It seems a simple test would get you
an answer faster.
Best,
Erick
On Wed, Sep 23, 2015 at 7:41 AM, Zheng
You should not be copying things into a Solr index unless
1> you absolutely and totally guarantee that no current Solr is running
2> you absolutely and totally guarantee that you replace it entirely
You're really just asking for maintenance issues with this approach. I'd do
one
of two things:
1>
For what it's worth, we've had good luck using the ICUTokenizer and
associated filters. A native Chinese speaker here at the office gave us an
enthusiastic thumbs up on our Chinese search results. Your mileage may vary
of course.
On Wed, Sep 23, 2015 at 11:04 AM, Erick Erickson
wrote:
> In a wor
On 9/23/2015 10:21 AM, Alessandro Benedetti wrote:
> m so those 2 are the queries at the minute :
> 1) logtext:deeper
> 2) logtext:*deeper*
> According to your schema, the log text field is of type "text_en".
> This should be completely fine.
> Have you ever changed your schema on run ? without re-inde
Too busy these days at work. I'd like to continue this topic.
I have 10M+ traditional-Chinese news articles. Due to lack of time to write my
own traditional-Chinese tokenizer, I use the CJK tokenizer. However, CJK uses
bigrams and thus will create a very large index in my case. I don't know if I
You may find the following articles interesting:
http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html
( a whole epic journey)
https://dzone.com/articles/indexing-chinese-solr
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newslet
This is totally weird.
Don't only re-index your old docs, find the data directory and
rm -rf data (with Solr stopped) and re-index.
re: the analysis page Alessandro mentioned.
Go to the Solr admin UI (http://localhost:8983/solr). You'll
see a drop-down on the left that lets you select a core,
sel
I leave it to the default settings for now, which should be balancing 50-50
across both shards.
Regards,
Edwin
On 23 September 2015 at 22:49, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:
> Using a second machines , you will dispose of fresh memory, disk and CPUs.
> So assuming you s
Hi,
There seems to have been a significant spike in spam emails forwarded
to moderators. While the volume is still tractable, what surprises me
is that almost all of these ought to have been caught by automated
spam filters. E.g., a significant fraction of the spam seems to
originate from just a f
On Wed, Sep 23, 2015, at 04:34 PM, Gora Mohanty wrote:
> Hi,
>
> There seems to have been a significant spike in spam emails forwarded
> to moderators. While the volume is still tractable, what surprises me
> is that almost all of these ought to have been caught by automated
> spam filters. E.g.
Our test Solr and Elasticsearch instances for Quepid(http://quepid.com) are
now hosted on docker (specifically kubernetes)
It's worked pretty well. I'd suggest if you're curious to speak to my
devops focussed colleague Chris Bradford that has a great deal of
experience here. I haven't encountered
Hi!
I keep getting nodes that fall into recovery mode and then issue the
following WARN log every 10 seconds:
WARN Stopping recovery for core= coreNodeName=core_node7
and sometimes this appears as well:
PERFORMANCE WARNING: Overlapping onDeckSearchers=2
At higher traffic time, this gets
On 23 September 2015 at 21:10, Upayavira wrote:
>
>
> On Wed, Sep 23, 2015, at 04:34 PM, Gora Mohanty wrote:
>> Hi,
>>
>> There seems to have been a significant spike in spam emails forwarded
>> to moderators. While the volume is still tractable, what surprises me
>> is that almost all of these ou
Sure, prototyping is the best answer ;).
Well, the biggest question is "how long
does each shard spend on the query"?
Adding more shards will likely decrease
the time each shard takes to process the
first-pass query. But if your base is
50ms, and sharding some more takes it to
40ms I don't think
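Erick's point can be put as a toy cost model (numbers are invented, not from the thread): distributed query time is roughly the slowest shard's first-pass time plus a fixed coordination/merge overhead, so shrinking per-shard time has diminishing returns once the overhead dominates.

```python
# Illustrative only: a hypothetical flat 20 ms of distributed overhead.
def total_query_ms(per_shard_ms: float, overhead_ms: float = 20.0) -> float:
    return per_shard_ms + overhead_ms

before = total_query_ms(50.0)  # 50 ms shards -> 70.0 ms total
after = total_query_ms(40.0)   # 40 ms shards -> 60.0 ms total
print(before, after)
```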
Shawn Heisey-2 wrote
> On 9/22/2015 11:54 AM, vsilgalis wrote:
>> I've actually read that article a few times.
>>
>> Yeah I know we aren't perfect in opening searchers. Yes we are committing
>> from the client, this is something that is changing in our next code
>> release, AND we are auto soft com
Wow, this is not expected at all. There's no
way you should, on the face of it, get
overlapping on-deck searchers.
I recommend you put your maxWarmingSearchers
back to 2, that's a fail-safe that is there to make
people look at why they're warming a bunch of
searchers at once.
With those settings,
Yep, going to smaller files (and possibly feeding
them through multiple clients) is the way to go
here.
You could also use a SolrJ client which would be
more efficient, here's a place to start. Admittedly
it doesn't parse JSON, but should give you an idea
of how you could go about it if you wanted
I forgot some additional details:
solr version is 5.0.0
and when one of the nodes enter recovery mode the leader says this:
The current zkClientTimeout is 15 seconds. I am going to try increasing it to
30 seconds. The process is running like this:
usr/lib/jvm/java-8-oracle/bin/java -server -Xss2
Hi Doug,
The Dockerfiles we use have been pushed up to a GitHub repo
https://github.com/o19s/solr-docker. I'm happy to answer any questions
about them.
~Chris
On Wed, Sep 23, 2015 at 8:47 AM Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:
> Our test Solr and Elasticsearch instances
Hi,
I have found in my code few boost functions like this one:
bf=sum(product(max(field_a,1),10),product(max(field_b,1),100))
The question is: may I split this in two:
bf=product(max(field_a,1),10)
bf=product(sub(has_image,1),100)
and be sure to always get the same results as the first b
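For what it's worth, the arithmetic behind the question can be checked directly: with (e)dismax, each bf function's value is added to the score, so a single bf of sum(f1, f2) should contribute the same total as two separate bf params. A sketch with made-up field values:

```python
# Mimic Solr's product() and max() function queries over two
# hypothetical field values.
def product(*xs):
    result = 1
    for x in xs:
        result *= x
    return result

field_a, field_b = 3, 7
combined = product(max(field_a, 1), 10) + product(max(field_b, 1), 100)
split = (product(max(field_a, 1), 10), product(max(field_b, 1), 100))
assert combined == sum(split)
print(combined)  # 730
```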
On 9/23/2015 11:28 AM, Erick Erickson wrote:
> This is totally weird.
> Don't only re-index your old docs, find the data directory and
> rm -rf data (with Solr stopped) and re-index.
I pretty much do that. The thing is: I don't have a data directory
anywhere! Most of my stuff is in /localapps/dev/E
You're really mixing text and numeric concepts here:
numeric types are pretty much completely unanalyzed.
The notion of subjecting them to an analysis chain hasn't
been done at all.
You could consider adding an update processor to
your update chain, I suspect the script update processor
is likely
On 9/23/2015 10:10 AM, vsilgalis wrote:
> Thanks guys, this is exactly what I needed, something to dig into and follow
> up on.
>
> I do have question in regards to searcher warmup, when looking here:
> http://0.0.0.0.43:8080/solr/#/collections/plugins/core?entry=searcher
>
> is the warmuptime spec
Hi Erick,
Yes, I did try the StandardTokenizer, and it seems to work well for
both English and Chinese words. Also, it has faster indexing and response
times during queries. Just that with StandardTokenizer, which tokenizes on
whitespace, the cutting of the Chinese words will be individual character
On 23 September 2015 at 18:08, Erick Erickson
wrote:
> Wow, this is not expected at all. There's no
> way you should, on the face of it, get
> overlapping on-deck searchers.
>
> I recommend you put your maxWarmingSearchers
> back to 2, that's a fail-safe that is there to make
> people look at why
Thanks Rich and Alexandre,
I'll probably test out the CJKTokenizer as well.
Previously I had some issues with the Paoding in Solr 5.2.1. But I haven't
tested it on 5.3.0 yet.
Regards,
Edwin
On 23 September 2015 at 23:23, Alexandre Rafalovitch
wrote:
> You may find the following articles inter
Then my next guess is you're not pointing at the index you think you are
when you 'rm -rf data'
Just ignore the Elall field for now I should think, although get rid of it
if you don't think you need it.
DIH should be irrelevant here.
So let's back up.
1> go ahead and "rm -fr data" (with Solr sto
Thanks Erick, your insights are really useful.
Honestly I agree, and when the prototyping comes I will definitely
proceed as you suggested!
Cheers
2015-09-23 16:53 GMT+01:00 Erick Erickson :
> Sure, prototyping is the best answer ;).
>
> Well, the biggest question is "how long
> does each
I believe so, though I would test it out.
You could try it out in http://splainer.io and confirm the scores are
identical
Cheers
-Doug
On Wed, Sep 23, 2015 at 12:10 PM, Vincenzo D'Amore
wrote:
> Hi,
>
> I have found in my code few boost functions like this one:
>
> bf=sum(product(max(field_a,1
Hi Doug,
I have ported SolrCloud to Docker too, I hope you can find something
interesting here:
https://github.com/freedev/solrcloud-zookeeper-docker
This project runs a ZooKeeper ensemble and a SolrCloud cluster within many
containers.
Now, in my spare time, I'm trying to port this project t
bq: and when one of the nodes enter recovery mode the leader says this:
Hmmm, nothing came through, the mail filter is pretty aggressive
about stripping attachments though.
bq: You mean 10 seconds apart ?
Hmmm, no I mean 10 minutes. That would
explain the overlapping searchers since the only
tim
Nice! starred. We'll keep that in mind should we go to docker beyond one
instance.
Cheers
-Doug
On Wed, Sep 23, 2015 at 12:39 PM, Vincenzo D'Amore
wrote:
> Hi Doug,
>
> I have ported solrcloud to docker too, I hope you can found something
> interesting here:
>
> https://github.com/freedev/solrc
On 9/23/2015 12:30 PM, Erick Erickson wrote:
> Then my next guess is you're not pointing at the index you think you are
> when you 'rm -rf data'
> Just ignore the Elall field for now I should think, although get rid of it
> if you don't think you need it.
> DIH should be irrelevant here.
> So let's back u
Hi Doug,
thank you for your git repo. I am planning a Kubernetes + Solr integration
as well.
Can you tell us how you organize your pods and services (or other resources)
regarding ZooKeeper management? How do you organize your pods/services/etc.
along with Solr instances, ZK nodes etc.?
Thanks in advance
Here are the logs that didn't make it through the image (sorry for the
misalignment of the logs):
9/23/2015, 7:14:49 PM ERROR StreamingSolrClients error
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting
for connection from pool
at
org.apache.http.impl.conn.PoolingClientCon
Hi,
we're using Solr 4.10.4 and the dismax query parser to search across
multiple fields. One of the fields is configured with a
StandardTokenizer (type "text_general"). I set mm=100% to only get hits
that match all terms.
This does not seem to work for queries that are split into multiple
Hi Andreas,
That's weird. It looks like the mm calculation is done before tokenization
takes place.
You can try setting autoGeneratePhraseQueries to true,
or replace dashes with whitespace on the client side.
Ahmet
On Wednesday, September 23, 2015 10:00 PM, Andreas Hubold
wrote:
Hi,
we're usin
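The client-side workaround Ahmet suggests can be as simple as the sketch below; a real normalizer may need to be smarter about which hyphens to touch:

```python
# Replace dashes with spaces before sending the query, so mm counts the
# same number of clauses that the StandardTokenizer will produce.
def normalize_query(q: str) -> str:
    return q.replace("-", " ")

print(normalize_query("foo-bar baz"))  # foo bar baz
```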
Use fq
Bill Bell
Sent from mobile
> On Sep 23, 2015, at 1:00 PM, Andreas Hubold
> wrote:
>
> Hi,
>
> we're using Solr 4.10.4 and the dismax query parser to search across multiple
> fields. One of the fields is configured with a StandardTokenizer (type
> "text_general"). I set mm=100% to o
Hi Epo,
We aren't using Zookeeper or the SolrCloud stuff on docker yet but it looks
like Vincenzo was using three ZK containers, each with a different port.
Sincerely,
Joe Lawson
On Wed, Sep 23, 2015 at 1:28 PM, Epo Jemba wrote:
> Hi Doug,
>
> thank you for your git repo. I am planning a kube
Hi,
just curious: what you get by running Solr into a Docker container ?
Best
Ugo
On Wed, Sep 23, 2015 at 5:39 PM, Vincenzo D'Amore
wrote:
> Hi Doug,
>
> I have ported solrcloud to docker too, I hope you can found something
> interesting here:
>
> https://github.com/freedev/solrcloud-zookeeper
We get to run commands like "docker run solr" and have Solr working!
Containers make new application deployments a breeze.
On Wed, Sep 23, 2015 at 4:35 PM, Ugo Matrangolo
wrote:
> Hi,
>
> just curious: what you get by running Solr into a Docker container ?
>
> Best
> Ugo
>
> On Wed, Sep 23, 2015
Hi Ugo,
I do not yet use Solr in Docker, but in my case Docker alone is not enough;
used in conjunction with Kubernetes, what I'm after is elasticity.
I mean adding/removing nodes and leaving scaling and fault tolerance to
Kubernetes out of the box.
All you have to do is define your blueprint well, templa
bq: 9/23/2015, 7:14:49 PM WARN ZkController Leader is publishing core=dawanda
coreNodeName=core_node10 state=down on behalf of un-reachable replica
Ok, this brings up a different possibility. If you happen to be indexing at
a very high rate there's some possibility that the followers get so busy
tha
OK, this is bizarre. You'd have had to set up SolrCloud by specifying the
-zkRun command when you start Solr or the -zkHost; highly unlikely. On the
admin page there would be a "cloud" link on the left side, I really doubt
one's there.
You should have a data directory, it should be the parent of t
I'm using Solr 4.10.4 in a 3 node cloud setup. I have 3 shards and 3 replicas
for the collection.
I want to analyze the logs to extract the queries and query times. Is there a
tool or script someone has created already for this?
Thanks,
Magesh
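Absent a dedicated tool, a quick-and-dirty sketch of pulling the params and QTime out of a typical 4.x-style request-log line; the exact log format varies with your logging config, so treat the sample line and the regex as a starting point, not a spec:

```python
import re

# A sample line shaped like a Solr 4.x request log entry (illustrative).
LINE = ('INFO  - 2015-09-23 12:00:01.123; org.apache.solr.core.SolrCore; '
        '[collection1] webapp=/solr path=/select '
        'params={q=deeper&wt=json} hits=0 status=0 QTime=1')

# Capture everything between params={...} and the trailing QTime value.
pattern = re.compile(r'params=\{(?P<params>[^}]*)\}.*QTime=(?P<qtime>\d+)')
m = pattern.search(LINE)
if m:
    print(m.group('params'), m.group('qtime'))  # q=deeper&wt=json 1
```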
On 9/18/2015 3:27 PM, Shawn Heisey wrote:
> A query that works fine in Solr 4.9.1 doesn't work in 5.2.1 with the
> same schema. The field that I am grouping on does not have docValues.
> I get this exception:
>
> java.lang.IllegalStateException: unexpected docvalues type SORTED_SET
> for field '
Recently I installed 5.3.0 and started seeing a weird exception which baffled
me. Has anybody encountered such an issue? The indexing was done via DIH;
the field that is causing the issue is a TrieDateField, defined as below.
Looking at the following exceptions, it feels like the wrong exception,
ity
What tools do you use for the "auto setup"? How do you get your config
automatically uploaded to zk?
On Tue, Sep 22, 2015 at 2:35 PM, Gili Nachum wrote:
> Our auto setup sequence is:
> 1.deploy 3 zk nodes
> 2. Deploy solr nodes and start them connecting to zk.
> 3. Upload collection config to zk
bq: What tools do you use for the "auto setup"? How do you get your config
automatically uploaded to zk?
Both uploading the config to ZK and creating collections are one-time
operations, usually done manually. Currently uploading the config set is
accomplished with zkCli (yes, it's a little clumsy
What were you trying to do when this happened?
Bear in mind that a tdate field *is* by definition multivalued. It is
indexed at multiple levels of precision.
I bet if you reindexed with this field as a date field type, you won't
hit this issue. The date field type is still a TrieDateField, but it