Hi,
I have a text field in my index containing extended characters, which
I'd like to match against when searching without the extended characters.
e.g. field contains "Ensō" which I want to match when searching for
just "enso".
My current config for that field (type) is given below:
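A common way to get "Ensō" to match a plain "enso" query is an ASCIIFoldingFilterFactory in both the index and query analyzer chains — a generic sketch, not necessarily the poster's actual setup (field type name and tokenizer are placeholders):

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds extended Latin characters to ASCII, e.g. "ō" -> "o" -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same chain runs at query time, both "Ensō" and "enso" end up as the token "enso".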
of them.
And check whether your swap device is working. With a working swap disk, maybe
your system would just slow down instead of crashing. No, sorry, your swap _is_
working, and java is mostly swapped out. It must be slow. Cheers -- Rick
On May 26, 2017 1:25:55 PM EDT, Robert Brown wrote
Thanks Shawn,
It's more inquisitiveness now than anything.
http://web.lavoco.com/top.png
(forgot to mention mariadb on there too :)
On 26/05/17 16:20, Shawn Heisey wrote:
On 5/26/2017 11:01 AM, Robert Brown wrote:
Let's assume I can't get more RAM - why would an i
s and filtering, about 20 fields in total.
Currently just 500 docs.
On 26/05/17 15:43, Erick Erickson wrote:
Or get more physical memory? Solr _likes_ memory, you won't be able to
do much with only 2G physical memory.
On Fri, May 26, 2017 at 2:00 AM, Robert Brown wrote:
Thanks Rick,
ually, every one).
cheers -- Rick
On 2017-05-25 06:55 PM, Robert Brown wrote:
Hi,
I'm currently running 6.5.1 with a tiny index, less than 1MB.
When I restart another app on the same server as Solr, Solr
occasionally dies, but no solr_oom_killer.log file.
Heap size is 256MB (~30MB used)
Hi,
I'm currently running 6.5.1 with a tiny index, less than 1MB.
When I restart another app on the same server as Solr, Solr occasionally
dies, but no solr_oom_killer.log file.
Heap size is 256MB (~30MB used), Physical RAM 2GB, typically using 1.5GB.
How else can I debug what's causing it?
Hi All,
I have an index with 10m documents.
When performing an MLT query and grouping by a field, response times are
roughly 20s.
The group field is currently populated with unique values, as we now
start to manually group documents (hence using MLT).
The group field has docValues turned o
In addition to a separate proxy you could use iptables, I use this
technique for another app (running on port 5000 but requests come in
port 80)...
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 5000
COMMIT
MaryJo, I think you've misunderstood. The counts are different simply
because the 2nd query contains a filter on a facet value from the 1st
query - that's completely expected.
The issue is how to get the original facet counts (with no filters but
same q) in the same call as also filtering b
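For what it's worth, the standard way to get both in one call is to tag the filter and exclude that tag when faceting — a sketch with hypothetical field and tag names:

```text
q=ipod&fq={!tag=fTag}category:books&facet=true&facet.field={!ex=fTag}category
```

The counts for "category" are then computed as if the fq were absent, while the result set itself stays filtered.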
Hi,
Currently we import data-sets from various sources (csv, xml, json,
etc.) and POST to Solr, after some pre-processing to get it into a
consistent format, and some other transformations.
We currently dump out to a json file in batches of 1,000 documents and
POST that file to Solr.
Rough
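As a sketch of that flow (hypothetical host, collection, and file names), each batch file can go straight to Solr's JSON update handler:

```text
curl 'http://localhost:8983/solr/mycollection/update' \
  -H 'Content-Type: application/json' \
  --data-binary @batch-0001.json
```

where batch-0001.json is a JSON array of up to 1,000 documents, e.g. [{"id":"1","name":"foo"}, ...].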
, Shawn Heisey wrote:
On 4/28/2016 3:13 PM, Robert Brown wrote:
I operate several collections (about 7-8) all using the same 5-node
ZooKeeper cluster. They've been in production for 3 months, with only
2 previous issues where a Solr node went down.
Tonight, during several updates to the vario
Hi,
I operate several collections (about 7-8) all using the same 5-node
ZooKeeper cluster. They've been in production for 3 months, with only 2
previous issues where a Solr node went down.
Tonight, during several updates to the various collections, a handful
failed due to the below error.
Hi,
I have a collection with 2 shards, 1 replica each.
When I send updates, I currently /admin/ping each of the nodes, and then
pick one at random.
I'm guessing it makes more sense to only send updates to one of the
leaders, so I'm contemplating getting the collection status instead, and
fi
Hi,
My autoSoftCommit is set to 1 minute. Does it actually do anything
if no documents have been updated/created? Will this also
affect the clearing of any caches?
Is this also the same for hard commits, whether via autoCommit or an
explicit HTTP request to commit?
Th
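For reference, these are the solrconfig.xml knobs involved (times in milliseconds; a sketch, not the poster's actual config):

```xml
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every minute -->
  <openSearcher>false</openSearcher> <!-- flush to disk without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- soft commit every minute, for visibility -->
</autoSoftCommit>
```

A hard commit with openSearcher=false flushes to disk without invalidating caches; it's opening a new searcher (a soft commit, or a hard commit with openSearcher=true) that discards them.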
Hi,
My collection had issues earlier, 1 shard showed as Down, the other only
replica was Gone.
Both were actually still up and running, no disk or CPU issues.
This occurred during updates.
The server since recovered after a reboot.
Upon trying to update the index again, I'm now getting cons
It's a string field, ean...
http://paste.scsys.co.uk/510132
On 04/11/2016 06:00 PM, Yonik Seeley wrote:
On Mon, Apr 11, 2016 at 12:52 PM, Robert Brown wrote:
Hi,
When I perform a range query of ['' TO *] to filter out docs where a
particular field has a value, this does wha
Hi,
When I perform a range query of ['' TO *] to filter out docs where a
particular field has a value, this does what I want, but I thought using
the square brackets was inclusive, so empty-string values should
actually be included?
The JSON I post to Solr has empty values, not null/undefine
excellent writeup on this
subtlety:
https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/
Best,
Erick
On Sat, Apr 9, 2016 at 3:51 AM, Robert Brown wrote:
Hi,
I have this delete query: "*partner:pg AND market:us AND last_seen:[* TO
2016-04-09T02:01:06Z]*"
And would like to
Hi,
I have this delete query: "*partner:pg AND market:us AND last_seen:[* TO
2016-04-09T02:01:06Z]*"
And would like to add "AND merchant_id != 12345 AND merchant_id != 98765"
Would this be done by including "*AND -merchant_id:12345 AND
-merchant_id:98765*" ?
Thanks,
Rob
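For what it's worth, the combined delete-by-query would then look something like this when POSTed to the update handler (values taken from the thread):

```xml
<delete>
  <query>partner:pg AND market:us AND last_seen:[* TO 2016-04-09T02:01:06Z] AND -merchant_id:12345 AND -merchant_id:98765</query>
</delete>
```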
ents will make a difference too - so the comparison
to 300 - 500 on other cloud setups may or may not be comparing apples to
oranges...
Are the "new" documents actually new or are you overwriting existing solr
doc ID's? If you are overwriting, you may want to optimize and see if that
Hi,
I'm currently posting updates via cURL, in batches of 1,000 docs in JSON
files.
My setup consists of 2 shards, 1 replica each, 50m docs in total.
These updates are hitting a node at random, from a server across the
Internet.
Apart from the obvious delay, I'm also seeing QTime's of 1,00
was really important you could have two or more
"indexing" servers and fire multiple threads at each one...
You probably already know this, but the key is how often you "commit" and
force the indexing to occur...
On Mon, Apr 4, 2016 at 3:33 PM, Robert Brown wrote:
Hi,
Does So
Hi,
Does Solr have any sort of limit when attempting multiple updates, from
separate clients?
Are there any safe thresholds one should try to stay within?
I have an index of around 60m documents that gets updated at key points
during the day from ~200 downloaded files - I'd like to fork off
et use facet.range.gap to set how dates
are "truncated".
Regards,
Emir
On 31.03.2016 10:52, Robert Brown wrote:
> Hi,
>
> Is it possible to facet by a date (solr.TrieDateField) but
truncated
> to the day, or even the hour?
>
> If not, are there any other options apart from
Hi,
Is it possible to facet by a date (solr.TrieDateField) but truncated to
the day, or even the hour?
If not, are there any other options apart from storing that truncated
data in another (string?) field?
Thanks,
Rob
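Range faceting effectively does truncate dates into buckets — a sketch with a hypothetical field name (the + in end/gap must be URL-encoded as %2B when sent over HTTP):

```text
facet=true
facet.range=created_date
facet.range.start=NOW/DAY-30DAYS
facet.range.end=NOW/DAY+1DAY
facet.range.gap=+1DAY
```

Using facet.range.gap=+1HOUR (with NOW/HOUR rounding on start/end) gives per-hour buckets instead.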
olve those random slowdowns - or at least rule it out.
On 03/24/2016 01:44 PM, Shawn Heisey wrote:
On 3/24/2016 4:02 AM, Robert Brown wrote:
If my index data directory size is 70G, and I don't have 70G (plus
heap, etc) in the system, this will occasionally affect search speed
right? When Sol
Hi,
If my index data directory size is 70G, and I don't have 70G (plus heap,
etc) in the system, this will occasionally affect search speed right?
When Solr has to resort to reading from disk?
Before I go out and throw more RAM into the system, in the above
example, what would you recommend
which
is probably sounding confused.
Cheers,
Rob
On 03/23/2016 04:03 PM, Tom Evans wrote:
On Wed, Mar 23, 2016 at 3:43 PM, Robert Brown wrote:
So I setup a new solr server to point to my existing ZK configs.
When going to the admin UI on this new server I can see the shards/replica
ote:
The whole _point_ of configsets is to re-use them in multiple
collections, so please do!
Best,
Erick
On Tue, Mar 22, 2016 at 5:38 AM, Robert Brown wrote:
Hi,
Is it safe to create a new cluster but use an existing config set that's in
zookeeper? Or does that config set contain the cluster
"why do you care? just do this ..."
I see this a lot on mailing lists these days, it's usually a learning
curve/task/question. I know I fall into these types of questions/tasks
regularly.
Which usually leads to "don't tell me my approach is wrong, just explain
what's going on, and why", or
isn't it? (I added it to ZK as per the docs),
just a bit confusing to see some files/directories from ZK, and some not.
Thanks for any more insight.
On 03/22/2016 04:57 PM, Shawn Heisey wrote:
On 3/22/2016 6:38 AM, Robert Brown wrote:
Is it safe to create a new cluster but use an exist
Hi,
Is it safe to create a new cluster but use an existing config set that's
in zookeeper? Or does that config set contain the cluster status too?
I want to (re)-build a cluster from scratch, with a different amount of
shards, but not using shard-splitting.
Thanks,
Rob
e used to boost individual
products for specific keywords - I'm beginning to think this is actually
our best hope? e.g. A multi-valued field containing keywords that
resulted in a click on that product.
On 03/18/2016 04:14 PM, Robert Brown wrote:
That does sound rather useful!
We c
rity boost can also be useful:
you can measure it by sales or by number of clicks. I use a combination
of both, and store those values using partial updates.
Hope it helps,
John
On 17/03/16 09:36, Robert Brown wrote:
Hi,
I currently have an index of ~50m docs representing shopping products:
name, de
about how much memory you have on your machine, how much
RAM you're allocating to Solr and the like so it's hard to say much other
than generalities
Best,
Erick
On Sat, Mar 19, 2016 at 10:41 AM, Shawn Heisey wrote:
On 3/19/2016 11:12 AM, Robert Brown wrote:
I have an index o
l.
On Mar 18, 2016 10:40 AM, "Robert Brown" wrote:
Thanks for the added input.
I'll certainly look into the machine learning aspect, will be good to put
some basic knowledge I have into practice.
I'd been led to believe the tie parameter didn't actually do a lot. :-/
Hi,
I have an index of 60m docs split across 2 shards (each with a replica).
When load testing queries (picking random keywords I know exist), and
randomly requesting facets too, 95% of my responses are under 0.5s.
However, during some random manual tests, sometimes I see searches
taking bet
osted.
Even if you haven't seen them at all.
Cheers
On Fri, Mar 18, 2016 at 4:21 PM, Robert Brown wrote:
It's also worth mentioning that our platform contains shopping products in
every single category, and will be searched by absolutely anyone, via an
API made available to various websi
als, and you need
to carefully tune the features of your interest.
But the results could be surprising .
[1] https://issues.apache.org/jira/browse/SOLR-8542
[2] Learning to Rank in Solr <https://www.youtube.com/watch?v=M7BKwJoh96s>
Cheers
On Thu, Mar 17, 2016 at 10:15 AM, Robert Brown
wrot
Hi,
I currently have an index of ~50m docs representing shopping products:
name, description, brand, category, etc.
Our "qf" is currently setup as:
name^5
brand^2
category^3
merchant^2
description^1
mm: 100%
ps: 5
I'm getting complaints from the business concerning relevancy, and was
hopin
Hi,
I'm looking for some advice and possible options for dealing with our
relevancy when searching through shopping products.
A search for "tablet" returns pills, when the user would expect
electronic devices.
Without any extra criteria (like category), how would/could you manage
this situ
Hi,
I have 2 shards, each with 1 replica.
When sending the same request to the cluster, I'm seeing the same
results, but ordered differently, and with different scores.
Does this highlight an issue with my index, or is this an accepted anomaly?
Example of 8 results:
1st call:
160.2047
160.
Within the shard directory there should be multiple directories - "tlog" and
"index.*". Do you see multiple "index.*" directories in there
for the shard which has more data on disk?
On Sat, Mar 5, 2016 at 6:39 PM, Robert Brown wrote:
Hi,
I have an index with 65m docs
Thanks Shawn,
I'm just about to remove that node and rebuild it, at least there won't
be any actual downtime.
On 05/03/16 14:44, Shawn Heisey wrote:
On 3/5/2016 6:09 AM, Robert Brown wrote:
I have an index with 65m docs spread across 2 shards, each with 1
replica.
The replica1
Nope, we never run optimise.
Would there be some tell-tale files in the index dir to indicate if
someone else had run an optimise?
On 05/03/16 13:11, Binoy Dalal wrote:
Have you executed an optimize across that particular shard?
On Sat, 5 Mar 2016, 18:39 Robert Brown, wrote:
Hi,
I
Hi,
I have an index with 65m docs spread across 2 shards, each with 1 replica.
The replica1 of shard2 is using up nearly double the amount of disk
space as the other shards/replicas.
Could there be a reason/fix for this?
/home/s123/solr/data/de_shard1_replica1 = 72G
numDocs:34,786,026
maxD
Hi,
As a pure C user, without wishing to use Java, what's my best approach
for managing the SolrCloud environment?
I operate a FastCGI environment, so I have the persistence to cache the
state of the "cloud".
So far I see good utilisation of the collections API being my best bet?
Any other
erhaps it will work better for you.
Upayavira
On Sun, Jan 31, 2016, at 06:31 PM, Robert Brown wrote:
Hi,
I've had to switch to using the MLT component, rather than the handler,
since I'm running on Solrcloud (5.4) and if I hit a node without the
starting document, I get nothing back.
Hi,
I've had to switch to using the MLT component, rather than the handler,
since I'm running on Solrcloud (5.4) and if I hit a node without the
starting document, I get nothing back.
When I perform a MLT query, I only get back the ID and score for the
similar documents, yet my fl=*,score.
Hi,
During some testing, I've found that the queryResultCache is not used
when I use grouping.
Is there another cache that is being used in this scenario, if so,
which, and how can I ensure they're providing a real benefit?
Thanks,
Rob
Hi,
I have 2 shards, 1 leader and 1 replica in each.
I've just removed a leader from one of the shards but the replica hasn't
become a leader yet.
How quickly should this normally happen?
tickTime=2000
dataDir=/home/rob/zoodata
clientPort=2181
initLimit=5
syncLimit=2
Thanks,
Rob
hat I was after.
On 01/11/2016 05:16 PM, Alessandro Benedetti wrote:
mmm i think there is a misconception here :
On 10 January 2016 at 19:00, Robert Brown wrote:
I'm thinking more about how the external load-balancer will know if a node
is down, as to take it out the pool of active ser
care to Or just let Zookeeper do that
for you. One of the tasks of Zookeeper is pinging all the machines
with all the replicas and, if any of them are unreachable, telling the
rest of the cluster that that machine is down.
Best,
Erick
On Sun, Jan 10, 2016 at 5:19 AM, Robert Brown wrote:
T
to start the 6.0
release process, so it's up in the air.
On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown wrote:
Hi,
(btw, when is 5.5 due? I see the docs reference it, but not the download
page)
Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it best/good
to get the CL
Hi,
(btw, when is 5.5 due? I see the docs reference it, but not the
download page)
Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
best/good to get the CLUSTERSTATUS via the collection API and explicitly
send queries to a replica to ensure I don't send queries to the leade
k0.example.com:2181/myapp -c collection1 --node node1.example.com --slice
shard2
I mention this tool every now and then on this list because I like it, but I’m
the author, so take that with a pretty big grain of salt. Feedback is very
welcome.
On 1/8/16, 1:18 PM, "Robert Brown" wrote:
Hi,
I'm having trouble identifying a replica to delete...
I've created a 3-shard cluster, all 3 created on a single host, then
added a replica for shard2 onto another host, no problem so far.
Now I want to delete the original shard, but got this error when trying
a *replica* param value I th
stion for your product manager
>
> Best
> Erick
>
> On Wed, Feb 8, 2012 at 9:23 AM, Robert Brown wrote:
>> Thanks Erick,
>>
>> I didn't get confused with multiple tokens vs multiValued :)
>>
>> Before I go ahead and re-index 4m docs, and belie
Thanks Erick,
I didn't get confused with multiple tokens vs multiValued :)
Before I go ahead and re-index 4m docs, and believe me I'm using the
analysis page like a mad-man!
What do I need to configure to have the following both indexed with and
without the dots...
.net
sales manager.
£12.50
This all seems a bit too much work for such a real-world scenario?
---
IntelCompute
Web Design & Local Online Marketing
http://www.intelcompute.com
On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
wrote:
>> I'm still finding matches across
>> newlines
>>
>> index...
>>
>> i am fluent
>>
I'm still finding matches across newlines
index...
i am fluent
german racing
search...
"fluent german"
Any suggestions? I've currently got this in wdftypes.txt for
WordDelimiterFilterFactory
\u000A => ALPHANUM
\u000B => ALPHANUM
\u000C => ALPHANUM
\u000D => ALPHANUM
# \u000D\u000A => ALPHA
mapping dots to spaces. I don't think that's workable anyway since
".net" would cause issues.
Trying out the wdftypes now...
---
IntelCompute
Web Design & Local Online Marketing
http://www.intelcompute.com
On Mon, 6 Feb 2012 04:10:18 -0800 (PST), Ahmet Arslan
wrote:
>> My fear is what will t
My fear is what will then happen with highlighting if I use re-mapping?
On Mon, 6 Feb 2012 03:33:03 -0800 (PST), Ahmet Arslan
wrote:
>> I need to tokenise on whitespace, full-stop, and comma
>> ONLY.
>>
>> Currently using solr.WhitespaceTokenizerFactory with
>> WordDelimiterFilterFactory but th
is it good practice, common, or even possible to put symbols in my
list of synonyms?
I'm having trouble indexing and searching for "A&E", with it being
split on the &.
we already convert .net to dotnet, but don't want to store every
combination of 2 letters, A&E, M&E, etc.
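Symbols are fine inside synonyms.txt itself; the catch is that the tokenizer/WordDelimiterFilterFactory must not split "A&E" before the synonym filter sees it. A sketch (hypothetical entries), declaring "&" as ALPHA in the word-delimiter types file so it survives tokenisation:

```text
# synonyms.txt
A&E, accident and emergency

# wdftypes.txt (referenced via types="wdftypes.txt" on WordDelimiterFilterFactory)
& => ALPHA
```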
--
IntelComp
Hi,
I need to tokenise on whitespace, full-stop, and comma ONLY.
Currently using solr.WhitespaceTokenizerFactory with
WordDelimiterFilterFactory but this is also splitting on &, /,
new-line, etc.
It seems such a simple setup, what am I doing wrong? what do you use
for such "normal searchin
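One option matching this requirement is PatternTokenizerFactory, splitting only on runs of whitespace, full stops, and commas — a sketch (note it would also strip the dot from ".net", which matters given the earlier thread):

```xml
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s.,]+"/>
```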
The trailing full-stop above is not being matched when searching for
"sage 200" for the below field type...
Do I need the WordDelimiterFilterFactory for this to work as expected?
I don't see any mention of periods being discussed in the docs.
positionIncrementGap="100">
I have a field which is indexed and queried as follows:
<filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
When search
I have a text field, using stopwords...
Index and query analysers setup as follows:
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SnowballPorterFilterFactory
Searching for "front of house" brings back perfect matches, but
doesn't highlight the "of".
When searching against 1 field, is it possible to have highlighting
returned 2 different ways?
We'd like the full field returned with keywords highlighted, but then
also returned as snippets.
Any possible approaches?
--
IntelCompute
Web Design & Local Online Marketing
http://www.intelcomp
is it possible to lower the score for synonym matches?
we setup...
admin => administration
but if someone searches specifically for "admin", we want those
specific matches to rank higher than matches for "administration"
--
IntelCompute
Web Design & Local Online Marketing
http://www.inte
se, the boost and fields in the "qf" parameter won't be
> considered for the search. With this query Solr will search for documents
> with the terms "this" and/or (depending on your default operator) "that" in
> the field1 and the term "other"
l not be considered and if
> you use LuceneQP the qf are not considered and it is going to use the
> default search field for the term "other" and no boost.
>
> You can see this very easily turning on the "debugQuery".
>
> Regards,
>
> Tomás
>
>
If I have a set list in solrconfig for my "qf" along with their
boosts, and I then specify field names directly in q (where I could
also override the boosts), are the boosts left in place, or reset to 1?
this^3
that^2
other^9
ie q=field1:+(this that) +(other)
--
IntelCompute
We
I have a query which is highlighting 3 snippets in 1 field, and 1
snippet in another field.
By enabling hl.requireFieldMatch, only the latter highlighted field is
returned.
from this...
plc Whetstone Temporary [hl-on]Sales[hl-off] Assistant Customer
service Cashier work 08
and custom
nt this capability? I'd strongly advise that you
> just forget about
> this feature unless and until there's a demonstrated need. Here's a
> blog I made at
> Lucid. Long-winded, but I'm like that sometimes
>
> http://www.lucidimagination.com/blog/2011/11/03/sto
Boosts can be included there too can't they?
so this is valid?
q=+(stemmed^2:perl or stemmed^3:java) +unstemmed^5:"development
manager"
is it possible to have different boosts on the same field btw?
We currently search across 5 fields anyway, so my queries are gonna
start getting messy. :-/
Is it possible to search a field but not be affected by the snowball
filter?
ie, searching for "manage" is matching "management", but a user may
want to restrict results to only containing "manage".
I was hoping that simply quoting the term would do this, but it
doesn't appear to make any di
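Quoting alone won't bypass stemming; the usual pattern is to index the text twice via copyField — one stemmed, one not — and query the unstemmed field for exact matches. A sketch with hypothetical field and type names:

```xml
<field name="body" type="text_stemmed" indexed="true" stored="true"/>
<field name="body_exact" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="body" dest="body_exact"/>
```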
Solr 3.3.0
I have a field/type indexed as below.
For a particular document the content of this field is
'FreeBSD,Perl,Linux,Unix,SQL,MySQL,Exim,Postgresql,Apache,Exim'
Using eDismax, mm=1
When I query for...
+perl +(apache sql) +(linux unix)
Strangely, the highlighting is being returned as
011 11:43:11 +0200, Michael Kuhlmann
wrote:
> Am 28.10.2011 11:16, schrieb Robert Brown:
>> Is there no way to return the total number of docs as part of a search?
>
> No, it isn't. Usually this information is of absolutely no value to the
> end user.
>
> A workar
Currently I'm making 2 calls to Solr to be able to state "matched 20
out of 200 documents".
Is there no way to return the total number of docs as part of a
search?
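The total can at least be fetched cheaply, without retrieving any documents, by asking for zero rows — a sketch of the second call:

```text
q=*:*&rows=0
```

The response's numFound then gives the index-wide document count, while the real search's numFound gives the "matched 20" figure.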
--
IntelCompute
Web Design & Local Online Marketing
http://www.intelcompute.com
When we display search results to our users we include a percentage
score.
Top result being 100%, then all others normalised based on the
maxScore, calculated outside of Solr.
We now want to limit returned docs with a percentage score higher than
say, 50%.
e.g. We want to search but only r
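Since the normalisation already happens outside Solr, the 50% cut-off can be applied in the same client-side step — a minimal sketch in plain Python over raw scores (not a Solr feature):

```python
def percent_filter(scores, cutoff=50.0):
    """Normalise raw scores so the top hit is 100%, then keep hits at/above cutoff."""
    if not scores:
        return []
    max_score = max(scores)
    percents = [s / max_score * 100.0 for s in scores]
    # drop anything scoring below the cutoff percentage
    return [p for p in percents if p >= cutoff]
```

e.g. percent_filter([4.0, 2.0, 1.9, 0.4]) returns [100.0, 50.0].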
Hi,
We do regular searches against documents, with highlighting on. To
then view a document in more detail, we re-do the search but using
fq=id:12345 to return the single document of interest, but still want
highlighting on, so sending the q param back again.
Is there anything you would rec
ene ecosystem search :: http://search-lucene.com/
>
>
>
>
>>
>>From: Robert Brown
>>To: solr-user@lucene.apache.org
>>Sent: Monday, October 17, 2011 4:01 AM
>>Subject: Re: Multi CPU Cores
>>
>>Where exactly do you set this up?
Where exactly do you set this up? We're running Solr3.4 under tomcat,
OpenJDK 1.6.0.20
btw, is the JRE just a different name for the VM? Apologies for such a
newbie Java question.
On Sun, 16 Oct 2011 12:51:44 -0400, Johannes Goll
wrote:
> we use the the following in production
>
> java -ser
Expanding CA to California sounds like a use for a synonyms config
file? You can then do that translation at index and query time, if
needed.
On Thu, 6 Oct 2011 12:01:33 -0400, Jason Toy
wrote:
> Hi Otis,
> Thanks for the response. So just to make sure I understand clearly, so I
> would store
We don't want to limit the number of results coming back, so
unfortunately grouping doesn't quite fix it, plus it would, by nature,
group docs by a particular Author together which might not necessarily
be adjacent.
On Thu, 6 Oct 2011 07:16:48 -0700 (PDT), Ahmet Arslan
wrote:
>> For the sake of
Hi,
For the sake of simplicity, I have an index with docs containing the
following fields:
Title
Description
Author
Some searches will obviously be saturated by docs from any given
author if they've simply written more.
I'd like to give a negative boost to these matches, there-by making
s