Hi,
one-way managed synonyms seem to work fine, but I cannot make two-way
synonyms work.
Steps to reproduce with Solr 5.4.1:
1. create a core:
$ bin/solr create_core -c test -d server/solr/configsets/basic_configs
2. edit schema.xml so fieldType text_general looks like this:
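(Illustrative sketch - the analyzer details and the managed resource name
"english" are my assumptions, not the exact snippet:)

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- synonyms managed via the REST API under the resource name "english" -->
    <filter class="solr.ManagedSynonymFilterFactory" managed="english"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A two-way (symmetric) set is then registered by PUTting a JSON list rather
than a one-way map, e.g.:

$ curl -X PUT -H 'Content-type:application/json' \
    --data-binary '["mad","angry","upset"]' \
    "http://localhost:8983/solr/test/schema/analysis/synonyms/english"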
The script essentially automates what you would do manually the first time
you start up the system.
It is no different from extracting the archive, setting permissions, etc.,
yourself.
So the next time you want to stop or restart Solr, you'll have to do it
using the solr script.
That being
Hi,
I'm trying to optimize a Solr application.
The bottleneck is queries that request 1000 rows from Solr.
Unfortunately the application can't be modified at the moment. Can you
suggest what could be done on the Solr side to improve performance?
The bottleneck is just on fetching the re
If you're fetching large text fields, consider highlighting on them and
just returning the snippets. I faced such a problem some time ago and
highlighting sped things up nearly 10x for us.
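E.g., something along these lines (field names are illustrative):

q=...&rows=1000&fl=id,score&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=150

That way the large stored field itself never travels over the wire, only the
snippets.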
On Thu, 11 Feb 2016, 15:03 Matteo Grolla wrote:
> Hi,
> I'm trying to optimize a solr application.
> T
Hi,
I need to evaluate the performance of different boost solutions and I can't
find any relevant documentation about it. Are fieldCache and/or DocValues
used by function queries?
On Thu, Feb 11, 2016, at 09:33 AM, Matteo Grolla wrote:
> Hi,
> I'm trying to optimize a solr application.
> The bottleneck are queries that request 1000 rows to solr.
> Unfortunately the application can't be modified at the moment, can you
> suggest me what could be done on the solr side to
Hi Upayavira,
I'm working with Solr 4.0, sorting on score (default).
I tried setting the document cache size to 2048 so that all docs of a single
request fit (actually, two requests fit).
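(For reference, the solrconfig.xml entry I mean; size is the value above, the
other attributes are the stock ones:)

<documentCache class="solr.LRUCache"
               size="2048"
               initialSize="2048"
               autowarmCount="0"/>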
The first time I execute a query it takes 24s.
I re-execute it, with all docs in the documentCache, and it tak
On Thu, 2016-02-11 at 11:53 +0100, Matteo Grolla wrote:
> I'm working with solr 4.0, sorting on score (default).
> I tried setting the document cache size to 2048, so all docs of a single
> request fit (2 requests fit actually)
> If I execute a query the first time it takes 24s
> I reexecu
Thanks Toke. Yes, those are long times, and Solr QTime (to execute the
query) is a fraction of a second.
The response in javabin format is around 300k.
Currently I can't limit the rows requested or the fields requested; those
are fixed for me.
2016-02-11 13:14 GMT+01:00 Toke Eskildsen :
> On Thu,
Hi Guys,
Is it possible to get any feedback?
Is there any process to speed up bug resolution / discussions?
I just want to understand whether the patch is not good enough and I need to
improve it, or whether simply no one has taken a look ...
https://issues.apache.org/jira/browse/LUCENE-6954
Cheers
On 11 January
Hi Mark,
Nothing comes for free :) With a doc per action, you will have to handle a
large number of docs. There is a hard limit on the number of docs per shard:
roughly 2 billion (the size of a signed int), so sharding is mandatory. It is
most likely that you will have to have more than one collection. Depending on
On Wed, Feb 10, 2016 at 12:13 PM, Markus Jelsma
wrote:
> Hi Tom - thanks. But judging from the article and SOLR-6348 faceting stats
> over ranges is not yet supported. More specifically, SOLR-6352 is what we
> would need.
>
> [1]: https://issues.apache.org/jira/browse/SOLR-6348
> [2]: https://is
Can you check your log level? A log level of ERROR would probably suffice for
your purpose, and it would most certainly reduce your log size(s).
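E.g., in server/resources/log4j.properties (the appender names below are the
stock Solr 5.x ones; adjust to your setup):

# raise the threshold from INFO to ERROR
log4j.rootLogger=ERROR, file, CONSOLE
# and/or cap the size and number of retained log files
log4j.appender.file.MaxFileSize=50MB
log4j.appender.file.MaxBackupIndex=5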
On Thu, Feb 11, 2016 at 12:53 PM, kshitij tyagi wrote:
> Hi,
> I have migrated to solr 5.2 and the size of logs are high.
>
> Can anyone help me out here
On Wed, Feb 10, 2016 at 5:21 AM, Markus Jelsma
wrote:
> Hi - if we assume the following simple documents:
>
> <doc>
>   <field name="time">2015-01-01T00:00:00Z</field>
>   <field name="rank">2</field>
> </doc>
> <doc>
>   <field name="time">2015-01-01T00:00:00Z</field>
>   <field name="rank">4</field>
> </doc>
> <doc>
>   <field name="time">2015-01-02T00:00:00Z</field>
>   <field name="rank">3</field>
> </doc>
> <doc>
>   <field name="time">2015-01-02T00:00:00Z</field>
>   <field name="rank">7</field>
> </doc>
>
> Can I get a daily average for the fiel
Hi Matteo,
as an addition to Upayavira's observation: how is the memory assigned for
that Solr instance?
How much memory is assigned to Solr and how much is left for the OS?
Is this a VM on top of a physical machine? That is, is real physical
memory being used, or could swapping happen frequently?
Is th
On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla wrote:
> Thanks Toke, yes, they are long times, and solr qtime (to execute the
> query) is a fraction of a second.
> The response in javabin format is around 300k.
OK, that tells us a lot.
And if you actually tested so that all the docs would be in t
Related to this, I just created:
https://issues.apache.org/jira/browse/SOLR-8672
To be fair, I see no utility in returning duplicate suggestions (if they
have no different payload, they are indistinguishable from a human
perspective, hence duplication is useless).
I would like to hear so
Hi Yonik,
after the first query I find 1000 docs in the document cache.
I'm using curl to send the request and requesting javabin format to mimic
the application.
GC activity is low.
I managed to load the entire 50GB index into the filesystem cache; after
that queries don't cause disk activity an
Your biggest issue here is likely to be HTTP connections. Making an HTTP
connection to Solr is far more expensive than the act of adding a single
document to the index. If you are expecting to add 24 billion docs per
day, I'd suggest somehow merging those documents into batches
before sending
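A rough sketch of that idea in SolrJ (URL, field names, and batch size are
illustrative, not a drop-in):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
    public static void main(String[] args) throws Exception {
      SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
      List<SolrInputDocument> batch = new ArrayList<>();
      for (int i = 0; i < 100_000; i++) {      // stand-in for your real doc source
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Integer.toString(i));
        batch.add(doc);
        if (batch.size() >= 1000) {            // one HTTP round trip per 1000 docs
          client.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) client.add(batch); // flush the tail
      client.commit();
      client.close();
    }
  }

ConcurrentUpdateSolrClient also does this kind of buffering for you behind a
single client.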
On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla wrote:
> Hi Yonic,
> after the first query I find 1000 docs in the document cache.
> I'm using curl to send the request and requesting javabin format to mimic
> the application.
> gc activity is low
> I managed to load the entire 50GB index in th
I see a lot of time spent in splitOnTokens,
which is called by (last part of the stack trace):
BinaryResponseWriter$Resolver.writeResultsBody()
...
org.apache.solr.search.ReturnFields.wantsField()
org.apache.commons.io.FilenameUtils.wildcardMatch()
org.apache.commons.io.FilenameUtils.splitOnTokens()
2016-02-11 15:42 GMT+01:00 Matte
Thanks. But this yields an error in FacetModule:
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
at
org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:100)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHan
On Thu, Feb 11, 2016 at 10:04 AM, Markus Jelsma
wrote:
> Thanks. But this yields an error in FacetModule:
>
> java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
> at
> org.apache.solr.search.facet.FacetModule.prepare(FacetModule.java:100)
> at
> org.ap
I am trying to select distinct records from a collection (I need distinct
names and the corresponding ids).
I have tried grouping with group.format=simple, but that takes a
long time to execute and sometimes runs into an out-of-memory exception.
Another limitation seems to be that the total number of gr
[image: inline image 1]
2016-02-11 16:05 GMT+01:00 Matteo Grolla :
> I see a lot of time spent in splitOnTokens
>
> which is called by (last part of stack trace)
>
> BinaryResponseWriter$Resolver.writeResultsBody()
> ...
> solr.search.ReturnsField.wantsField()
> commons.io.FileNameUtils.w
Is this a scenario that was working fine and suddenly deteriorated, or has
it always been slow?
-- Jack Krupansky
On Thu, Feb 11, 2016 at 4:33 AM, Matteo Grolla
wrote:
> Hi,
> I'm trying to optimize a solr application.
> The bottleneck are queries that request 1000 rows to solr.
> Unfortun
You should edit the files installed by install_solr_service.sh - change the
init.d script to pass the -p argument to ${SOLRINSTALLDIR}/bin/solr.
By the way, my initscript is modified (a) to support the
/etc/sysconfig/ convention, and (b) to run solr as a different
user than the use
What version of Solr are you using?
Have you taken a look at the Collapsing Query Parser? It basically performs
the same function as grouping but is much more efficient at doing it.
Take a look here:
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
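E.g., instead of the group.* parameters (field name illustrative):

q=*:*&fq={!collapse field=name}

and add &expand=true if you also need the members of each collapsed group back.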
On Thu, Feb 11, 2016
Responses have always been slow, but previously the time was dominated by
faceting.
After a few optimizations this is my bottleneck.
My suggestion has been to properly implement paging and reduce rows;
unfortunately this is not possible, at least not soon.
2016-02-11 16:18 GMT+01:00 Jack Krupansky :
> Is t
Are queries scaling linearly - does a query for 100 rows take 1/10th the
time (1 sec vs. 10 sec or 3 sec vs. 30 sec)?
Does the app need/expect exactly 1,000 documents for the query or is that
just what this particular query happened to return?
What does the query look like? Is it complex or use
I am using
Solr 5.1.0
On Thu, Feb 11, 2016 at 9:19 AM, Binoy Dalal wrote:
> What version of Solr are you using?
> Have you taken a look at the Collapsing Query Parser. It basically performs
> the same functions as grouping but is much more efficient at doing it.
> Take a look here:
>
> https://
For my application, the solution I implemented is to log the chunk that
failed to a file. This file is then post-processed one record at a
time. The ones that fail are reported to the admin and never looked at
again until the admin takes action. This is not the most efficient
solution right no
Hi Jack,
response time scales with rows. The relationship doesn't seem linear, but
below 400 rows times are much faster.
I view query times from the Solr logs and they are fast:
the same query with rows=1000 takes 8s;
with rows=10 it takes 0.2s.
2016-02-11 16:22 GMT+01:00 Jack Krupansky :
> Are querie
Hi - I was sending the following value for json.facet:
json.facet=by_day:{type : range, start : NOW-30DAY/DAY, end : NOW/DAY, gap :
"+1DAY", facet:{x : "avg(rank)"}}
I now also notice I didn't include the time field. But adding it gives the
same error:
json.facet=by_day:{type : range, field : ti
On Thu, Feb 11, 2016 at 11:07 AM, Markus Jelsma
wrote:
> Hi - i was sending the following value for json.facet:
> json.facet=by_day:{type : range, start : NOW-30DAY/DAY, end : NOW/DAY, gap :
> "+1DAY", facet:{x : "avg(rank)"}}
>
> I now also notice i didn't include the time field. But adding it g
On 2/10/2016 10:33 AM, McCallick, Paul wrote:
> We’re trying to fine tune our query and ingestion performance and would like
> to get more metrics out of SOLR around this. We are capturing the standard
> logs as well as the jetty request logs. The standard logs get us QTime,
> which is not a g
Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but still
relatively bad. Even 50ms for 10 rows would be considered barely okay.
But... again, it depends on query complexity - simple queries should be well
under 50ms on decent modern hardware.
-- Jack Krupansky
On Thu, Feb 11, 2
Solr 6.0 supports SELECT DISTINCT (SQL) queries. You can even choose
between a MapReduce implementation and a JSON Facet implementation. The
MapReduce implementation supports extremely high cardinality for the
distinct fields. The JSON Facet implementation supports lower cardinality
but high QPS.
Joel
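E.g. (collection and field names illustrative):

curl --data-urlencode "stmt=SELECT DISTINCT name, id FROM mycollection" \
  "http://localhost:8983/solr/mycollection/sql?aggregationMode=map_reduce"

aggregationMode=facet selects the JSON Facet variant instead.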
Virtual hardware; 200ms is taken on the client until the response is written
to disk.
QTime on Solr is ~90ms:
not great, but acceptable.
Is it possible that the method FilenameUtils.splitOnTokens is really so
heavy when requesting a lot of rows on slow hardware?
2016-02-11 17:17 GMT+01:00 Jack Krupansky
Awesome! The surrounding braces did the trick; I had fixed the quotes just
before. Many thanks!!
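For the archives, the working shape is along these lines (whole value wrapped
in braces, date math quoted, field included):

json.facet={by_day : {type : range, field : time, start : "NOW-30DAY/DAY",
end : "NOW/DAY", gap : "+1DAY", facet : {x : "avg(rank)"}}}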
The remaining issue is that some source files in the o.a.s.search.facet
package are package-protected or private. I can't implement a custom Agg
using FacetContext and such. Created issue: https://issues.apach
Hi Guys,
I'm having a problem with master-slave syncing.
I have two cores: a small core (which keeps frequently used data for fast
results) and a big core (for rare queries and for searching everything).
Both cores have the same solrconfig file. But small core replication is
fine, other tha
Out of curiosity, have you tried to debug that Solr version to see which
text arrives at the splitOnTokens method?
In the latest Solr that part has changed completely.
I would be curious to understand what it tries to tokenise by ? and * !
Cheers
On 11 February 2016 at 16:33, Matteo Grolla wrote:
>
I have tried to use the Collapsing feature but it appears that it leaves
duplicated records in the result set.
Is that expected? Or any suggestions on working around it?
Thanks
On Thu, Feb 11, 2016 at 9:30 AM, Brian Narsi wrote:
> I am using
>
> Solr 5.1.0
>
> On Thu, Feb 11, 2016 at 9:19 AM,
Hello,
I am trying to implement a Solr cluster with mutual authentication using
client and server SSL certificates. I have both client and server
certificates signed by a CA. The setup is working well; however, any client
cert that chains up to the issuer CA is able to access the Solr cluster
without v
I'm looking for an option to write a Solr plugin which can deal with a
custom binary input stream. Unfortunately, Solr's javabin protocol is
not an option for us.
I already had a look at some possibilities like writing a custom request
handler, but it seems like the classes/interfaces one woul
The CollapsingQParserPlugin shouldn't have duplicates in the result set.
Can you provide the details?
Joel Bernstein
http://joelsolr.blogspot.com/
On Thu, Feb 11, 2016 at 12:02 PM, Brian Narsi wrote:
> I have tried to use the Collapsing feature but it appears that it leaves
> duplicated records
OK, I see that the Collapsing feature requires documents to be co-located in
the same shard in SolrCloud.
Could that be the reason for the duplication?
On Thu, Feb 11, 2016 at 11:09 AM, Joel Bernstein wrote:
> The CollapsingQParserPlugin shouldn't have duplicates in the result set.
> Can you provide the
Hey Solr folks,
Current dismax parser behavior is different for unigrams versus bigrams.
For unigrams it's MAX-ed across fields (hence "dismax"), but for
bigrams it's SUM-ed since Solr 4.10 (according to
https://issues.apache.org/jira/browse/SOLR-6062).
Given this inconsistency, the dilem
What is your replication configuration in solrconfig.xml on both
master and slave?
bq: big core is doing full sync every time wherever it start (every minute).
Do you mean Solr is restarting every minute, or is the polling
interval 60 seconds?
The Solr logs should tell you something about wh
Steven's solution is a very common one, complete with the
notion of re-chunking. Depending on the throughput requirements,
simply resending the docs of the offending packet one at a time is often
sufficient (but not _efficient_). I can imagine fallback scenarios
like "try chunking 100 at a time, for those chunks
You can also look at your log4j properties file and manipulate the
max log size, how many old versions are retained, etc.
If you're talking about the console log, people often just disable
console logging (again, in the logging properties file).
Best,
Erick
On Thu, Feb 11, 2016 at 6:11 AM, Aditya
bq: We want the hits on solr servers to be distributed
True, this happens automatically in SolrCloud, but a simple load
balancer in front of master/slave does the same thing.
bq: what if master node fail what should be our fail over strategy ?
This is, indeed, one of the advantages of SolrCloud
It's possible with JDBC settings (see the specific ones for your
driver), but dangerous. What if the number of rows is 1B or something?
You'll blow Solr's memory out of the water.
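(The kind of setting in question, sketched for MySQL - batchSize="-1" makes
the driver stream rows instead of buffering the whole result set:)

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/db"
            user="user" password="pass"
            batchSize="-1"/>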
Best,
Erick
On Wed, Feb 10, 2016 at 12:45 PM, Troy Edwards wrote:
> Is it possible for the Data Import Handler to
I first wrote the “fall back to one at a time” code for Solr 1.3.
It is pretty easy if you plan for it. Make the batch size variable. When a
batch fails, retry with a batch size of 1 for that particular batch. Then keep
going or fail; either way, you have good logging on which one failed.
wunder
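A sketch of that pattern in SolrJ (names illustrative; error handling trimmed):

  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.common.SolrInputDocument;

  class FallbackIndexer {
    // Try the whole batch; on failure, retry one doc at a time
    // so the bad doc is isolated.
    static void addWithFallback(SolrClient client, List<SolrInputDocument> batch) {
      try {
        client.add(batch);
      } catch (Exception batchFailure) {
        for (SolrInputDocument doc : batch) {
          try {
            client.add(doc);
          } catch (Exception docFailure) {
            // good logging on exactly which doc failed
            System.err.println("Failed doc: " + doc.getFieldValue("id"));
          }
        }
      }
    }
  }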
Hello everyone!
I hope this email finds you well. I hope everyone is as excited about
ApacheCon as I am!
I'd like to remind you all of a couple of important dates, as well as ask for
your assistance in spreading the word! Please use your social media platform(s)
to get the word out! The more v
Hi,
I noticed while running an indexing job (2M docs, but per-doc size could be
2-3 MB) that one of the shards goes down just after the commit (not
related to OOM or high CPU/load). This marks the shard as "down" in ZK and
even a reload of the collection does not recover the state.
There are n
Hi,
I am currently indexing individual Outlook messages and searching is
working fine.
I created the Solr core using the following command:
./solr create -c sreenimsg1 -d data_driven_schema_configs
I am using the following command to index individual messages:
curl "
http://localhost:8983/solr/
Yes, runtime libs cannot be used for loading container-level plugins
yet. Eventually they must be. You can open a ticket.
On Mon, Jan 4, 2016 at 1:07 AM, tine-2 wrote:
> Hi,
>
> are there any news on this? Was anyone able to get it to work?
>
> Cheers,
>
> tine
Yeah that would be the reason. If you want distributed unique capabilities,
then you might want to start testing out 6.0. Aside from SELECT DISTINCT
queries, you also have a much more mature Streaming Expression library
which supports the unique operation.
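E.g., a streaming expression like this, posted to the collection's /stream
handler (collection and field names illustrative; the inner search must be
sorted by the "over" field):

unique(
  search(mycollection, q="*:*", fl="name,id", sort="name asc"),
  over="name"
)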
Joel Bernstein
http://joelsolr.blogspot.c
Y, this looks like a Tika feature. If you run the tika-app.jar [1] on your
file and you get the same output, then that's Tika's doing.
Drop a note on the u...@tika.apache.org list if Tika isn't meeting your needs.
-Original Message-
From: Sreenivasa Kallu [mailto:sreenivasaka...@gmail.co
I have found that when you deal with large amounts of all sorts of files, in
the end you find stuff (PDFs are typically nasty) that will hang Tika. That
is even worse than a crash or OOM.
We used Aperture instead of Tika because at the time it provided a watchdog
feature to kill what seemed like a h
x-post to Tika users' list.
Y and n. If you run tika-app as:
java -jar tika-app.jar
it runs tika-batch under the hood (TIKA-1330, done as part of TIKA-1302). This
creates a parent and a child process; if the child process notices a hung
thread, it dies, and the parent restarts it. Or if your OS gets u
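That is, pointing it at directories, something like (paths illustrative):

java -jar tika-app.jar -i /path/to/input_dir -o /path/to/output_dir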
Should have looked at how we handle PSTs before my earlier response... sorry.
What you're seeing is Tika's default treatment of embedded documents: it
concatenates them all into one string. It'll do the same thing for ZIP files
and other container files. The default Tika format is XHTML, and we i
Hi Erick,
Below is the master/slave config:

Master:

<lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">optimize</str>
  <str name="numberToKeep">2</str>
</lst>

Slave:

<lst name="slave">
  <str name="masterUrl">http://master:8983/solr/big_core/replication</str>
  <str name="pollInterval">00:00:60</str>
  <str name="httpBasicAuthUser">username</str>
  <str name="httpBasicAuthPassword">password</str>
</lst>
Do you mean the Solr is restarting every minute or the polling
inte
Typo? That's 60 seconds, but that's not especially interesting either way.
Do the actual segments look identical after the polling?
On Thu, Feb 11, 2016 at 1:16 PM, Novin Novin wrote:
> Hi Erick,
>
> Below is the master/slave config:
>
> Master:
>
> <lst name="master">
>   <str name="replicateAfter">commit</str>
>   <str name="replicateAfter">optimize</str>
Clarification needed on the edismax query parser "pf" field.

*SOLR Query:*
/query?q=refrigerator water filter&qf=P_NAME^1.5 CategoryName&wt=xml&debugQuery=on&pf=P_NAME CategoryName&mm=2&fl=CategoryName P_NAME score&defType=edismax

*Parsed Query from DebugQuery results:*
(+((DisjunctionMaxQuery((P_NAM
Tim,
In my case, I have to use Tika as follows:
java -jar tika-app.jar -t <filename>
I will be invoking the above command from my Java app
using Runtime.getRuntime().exec(). I will capture stdout and stderr to get
back the raw text I need. My app use case will not allow me to use a
, it is out of t
For sure, if I need heavy-duty text extraction again, Tika would be the
obvious choice if it covers dealing with hangs. I never used tika-server
myself (not sure if it existed at the time); I just used Tika from my own JVM.
On Thu, Feb 11, 2016 at 8:45 PM, Allison, Timothy B.
wrote:
> x-post to Tik
In order to use the Collapsing feature I will need to use Document Routing
to co-locate related documents in the same shard in SolrCloud. What are the
advantages and disadvantages of Document Routing?
Thanks,
On Thu, Feb 11, 2016 at 12:54 PM, Joel Bernstein wrote:
> Yeah that would be the reaso
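(For context, co-location with the compositeId router just means prefixing
the document ids; e.g. ids like

name1!doc42
name1!doc43

hash to the same shard, and queries can optionally be restricted with
&_route_=name1! - the main trade-off is that skewed prefixes give you
unevenly sized shards.)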
After more debugging, I figured out that it is related to this:
https://issues.apache.org/jira/browse/SOLR-3274
Is there a recommended fix (apart from running a ZK ensemble)?
On Thu, Feb 11, 2016 at 10:29 AM, KNitin wrote:
> Hi,
>
> I noticed while running an indexing job (2M docs but per doc
Well, I'd imagine you could spawn threads and monitor/kill them as
necessary, although that doesn't deal with OOM errors
FWIW,
Erick
On Thu, Feb 11, 2016 at 3:08 PM, xavi jmlucjav wrote:
> For sure, if I need heavy duty text extraction again, Tika would be the
> obvious choice if it covers d
Try comma instead of space delimiting?
On Thu, Feb 11, 2016 at 2:33 PM, Senthil wrote:
> Clarification needed on edismax query parser "pf" field.
>
> *SOLR Query:*
> /query?q=refrigerator water filter&qf=P_NAME^1.5
> CategoryName&wt=xml&debugQuery=on&pf=P_NAME
> CategoryName&mm=2&fl=CategoryName
Y, and you can't actually kill a thread. You can ask nicely via
Thread.interrupt(), but some of our dependencies don't bother to listen for
that. So, you're pretty much left with a separate process as the only robust
solution.
So, we did the parent-child process thing for directory-> directo
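A minimal sketch of the separate-process approach with a timeout (file names
and the 60s budget are illustrative):

  import java.io.File;
  import java.util.concurrent.TimeUnit;

  class TikaWatchdog {
    // Run tika-app on one file in a child JVM; kill the child if it hangs.
    static void extract(File input, File output) throws Exception {
      ProcessBuilder pb =
          new ProcessBuilder("java", "-jar", "tika-app.jar", "-t", input.getPath());
      pb.redirectOutput(output);               // raw text lands in the output file
      Process p = pb.start();
      if (!p.waitFor(60, TimeUnit.SECONDS)) {  // hung parser? kill the whole child JVM
        p.destroyForcibly();
        System.err.println("Timed out: " + input);
      }
    }
  }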
Again, first things first... debugQuery=true and see which Solr search
components are consuming the bulk of qtime.
-- Jack Krupansky
On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla
wrote:
> virtual hardware, 200ms is taken on the client until response is written to
> disk
> qtime on solr is ~90
We upgraded our Solr version last night and are getting the following error:
org.apache.solr.common.SolrException: Bad content Type for search handler
:application/octet-stream
What should I do to remove this?
Erick,
bq: We want the hits on solr servers to be distributed
True, this happens automatically in SolrCloud, but a simple load
balancer in front of master/slave does the same thing.
Midas: in the case of a SolrCloud architecture, do we not need a load
balancer?
On Thu, Feb 11, 2016 at 11:42 PM
My log keeps growing; this is urgent.
On Fri, Feb 12, 2016 at 10:43 AM, Midas A wrote:
> we have upgraded solr version last night getting following error
>
> org.apache.solr.common.SolrException: Bad content Type for search handler
> :application/octet-stream
>
> what i should do ? to remove th
On 2/11/2016 10:13 PM, Midas A wrote:
> we have upgraded solr version last night getting following error
>
> org.apache.solr.common.SolrException: Bad content Type for search handler
> :application/octet-stream
>
> what i should do ? to remove this .
What version did you upgrade from and what vers
Solr 5.2.1
On Fri, Feb 12, 2016 at 12:59 PM, Shawn Heisey wrote:
> On 2/11/2016 10:13 PM, Midas A wrote:
> > we have upgraded solr version last night getting following error
> >
> > org.apache.solr.common.SolrException: Bad content Type for search handler
> > :application/octet-stream
> >
> > wh