Hi,
I'm currently looking at Zappos' Solr implementation on their website.
One thing that makes me curious is how their facet filters work.
If you look at the Zappos facet filters, there are some facets that allow us to
filter using multiple values, for example size and brand. The
behaviour allows the user to select multiple values.
You need to store the color field as a multivalued stored field. You have to
do the pagination manually. If you are worried about that, then use a database: have a table
with Product Name and Color, and retrieve the data with pagination.
If you still want to achieve it via Solr, have a separate record for every
product
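To illustrate that second option, here is a minimal SolrJ sketch (the field names, core URL, and id scheme are my assumptions, not from the original mail) that indexes one document per product/color combination and then facets on color:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class ProductVariationIndexer {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL.
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/products");

    String[] colors = {"red", "blue", "green"};
    for (String color : colors) {
      // One Solr document per product/color combination.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "productA_" + color);
      doc.addField("product_name", "Product A");
      doc.addField("color", color);
      server.add(doc);
    }
    server.commit();

    // Facet on color so every variation shows up as its own filter value.
    SolrQuery query = new SolrQuery("product_name:\"Product A\"");
    query.setFacet(true);
    query.addFacetField("color");
    QueryResponse rsp = server.query(query);
    for (FacetField.Count c : rsp.getFacetField("color").getValues()) {
      System.out.println(c.getName() + " (" + c.getCount() + ")");
    }
    server.shutdown();
  }
}

Each color then shows up as its own facet value that users can filter on.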
Thanks Shawn and Yonik!
Yonik: I noticed this error appears to be fairly trivial, but it is not
appearing after a previous crash. Every time I run this high-volume test
that produced my stack trace, I zero out the logs, Solr data and
Zookeeper data and start over from scratch with a brand new
: Thanks for your help. I found a workaround for this use case, which is to
: avoid using a shards query and just asking each shard for a dump of the
that would be (step #1 in) the method I would recommend for your use case of
"check what's in the entire index" because it drastically reduces the
On 7/25/2013 6:53 PM, Tim Vaillancourt wrote:
> Thanks for the reply Shawn, I can always count on you :).
>
> We are using 10GB heaps and have over 100GB of OS cache free to answer the
> JVM question, Young has about 50% of the heap, all CMS. Our max number of
> processes for the JVM user is 10k,
Thank you very much for replying.
In fact, we made some progress on this problem: we found that when the master is building
the index, it cleans its own index while building. So the slave, which syncs the index
every minute, destroys its own index folder.
By the way: we build the index using
dataimport0?command=full-impor
On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt wrote:
> "ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
> Failure to open existing log file (non fatal)
>
That itself isn't necessarily a problem (and why it says "non fatal")
- it just means that most likely a transac
Thanks for the reply Shawn, I can always count on you :).
We are using 10GB heaps and have over 100GB of OS cache free to answer the
JVM question, Young has about 50% of the heap, all CMS. Our max number of
processes for the JVM user is 10k, which is where Solr dies when it blows
up with 'cannot c
On 7/25/2013 5:44 PM, Tim Vaillancourt wrote:
The transaction log error I receive after about 10-30 minutes of load
testing is:
"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)
/opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_sha
Stack trace:
http://timvaillancourt.com.s3.amazonaws.com/tmp/solrcloud.nodeC.2013-07-25-16.jstack.gz
Cheers!
Tim
On 25 July 2013 16:44, Tim Vaillancourt wrote:
> Hey guys,
>
> I am reaching out to the Solr list with a very vague issue: under high
> load against a SolrCloud 4.3.1 cluster of 3
Hey guys,
I am reaching out to the Solr list with a very vague issue: under high load
against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas (2
cores per instance), I eventually see failure messages related to
transaction logs, and shortly after these stacktraces occur the cluster
On 7/25/2013 4:45 PM, Tom Burton-West wrote:
Thanks for your help. I found a workaround for this use case, which is to
avoid using a shards query and just asking each shard for a dump of the
unique ids. i.e. run an *:* query and ask for 1 million rows at a time.
This should be a no scoring quer
Hi Shawn,
Thanks for your help. I found a workaround for this use case, which is to
avoid using a shards query and just asking each shard for a dump of the
unique ids. i.e. run an *:* query and ask for 1 million rows at a time.
This should be a no scoring query, so I would think that it doesn't
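For reference, a SolrJ sketch of that kind of per-shard dump, assuming you point it at each shard's core URL in turn (the URL and the unique-key field name are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class ShardIdDump {
  public static void main(String[] args) throws Exception {
    // Query one shard's core directly, not through a distributed request.
    HttpSolrServer shard = new HttpSolrServer("http://shard1-host:8983/solr/core1");

    SolrQuery q = new SolrQuery("*:*");
    q.set("distrib", "false");   // keep the request on this shard only
    q.setFields("id");           // only the unique ids, no stored bodies
    q.setRows(1000000);          // one million rows per request, as in the workaround

    for (SolrDocument doc : shard.query(q).getResults()) {
      System.out.println(doc.getFieldValue("id"));
    }
    shard.shutdown();
  }
}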
http://wiki.apache.org/solr/DocValues#Specifying_a_different_Codec_implementation
OK, it seems there's no back compat for disk based docvalues
implementation. I have to reindex documents to get rid of this issue.
On 25 July 2013 22:17, Marcin Rzewucki wrote:
> Hi,
>
> After upgrading from solr
On 7/25/2013 3:09 PM, Tom Burton-West wrote:
Thanks Shawn,
I was confused by the error message: "Invalid version (expected 2, but 60)
or the data in not in 'javabin' format"
Your explanation makes sense. I didn't think about what the shards have to
send back to the head shard.
Now that I look
Yes, your assumption is wrong. It does what it says, "only the fields in this
list will be included" in the response.
wunder
On Jul 25, 2013, at 2:44 PM, Matt Lieber wrote:
> Hi,
>
> I only want to return one field in the documents being returned from my query.
> I know there is the 'fl' param
fl is on the server side. Try it in a browser and you'll see that.
Upayavira
On Thu, Jul 25, 2013, at 10:44 PM, Matt Lieber wrote:
> Hi,
>
> I only want to return one field in the documents being returned from my
> query.
> I know there is the 'fl' parameter, which is described in the
> document
Hi,
I only want to return one field in the documents being returned from my query.
I know there is the 'fl' parameter, which is described in the documentation
http://wiki.apache.org/solr/CommonQueryParameters as:
"This parameter can be used to specify a set of fields to return, limiting the
amo
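For illustration, a short SolrJ sketch of that (the core URL and field name are assumptions); it has the same effect as adding &fl=id to the request URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class SingleFieldQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    q.setFields("id");   // equivalent to &fl=id on the request URL

    for (SolrDocument doc : server.query(q).getResults()) {
      System.out.println(doc);   // each SolrDocument now contains only the id field
    }
    server.shutdown();
  }
}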
I'm not entirely clear about your question. However, in replication, you
should never commit docs directly to your slave, it will mess up the
synchronisation of your indexes, and hence mess up your replication. If
that's what you are proposing, don't do it!
Upayavira
On Thu, Jul 25, 2013, at 08:2
Thanks Shawn,
I was confused by the error message: "Invalid version (expected 2, but 60)
or the data in not in 'javabin' format"
Your explanation makes sense. I didn't think about what the shards have to
send back to the head shard.
Now that I look in my logs, I can see the posts that the shard
Use LucidWorks Search, define a file system data source and set the schedule
to crawl the directory every minute, 5 minutes, 30 seconds, or whatever
interval you want.
http://docs.lucidworks.com/display/lweug/Simple+Filesystem+Data+Sources
http://docs.lucidworks.com/display/help/Schedules
-- J
Yeah, those are the rules. They are more of a heuristic that manages to work
most of the time reasonably well, but like most heuristics, it is not
perfect.
In this particular case, your best bet would be to use an update processor
to discard the "ignored" field values before Solr actually sees
Hi,
After upgrading from solr 4.3.1 to solr 4.4 I have the following issue:
ERROR - 2013-07-25 20:00:15.433; org.apache.solr.core.CoreContainer; Unable
to create core: awslocal_shard5
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCo
I have a Flume sink directory where new files are being written periodically.
How can I instruct Solr to index the files in the directory every time a
new file gets written?
Any ideas?
Thanks,
Rajesh
We are using SOLR 4.3.1 but are not using SolrCloud now.
We currently support both push and pull indexing, and we use soft commit for
push indexing. Now whenever we perform pull indexing (using an indexer
program), the changes made by the push indexing process (during indexing
time) might get lost h
Great, this guidance is definitely pointing me in the right direction.
Thanks Shawn, Erick, and Hoss. I'll pursue this some more and see if I
can get it working.
Brian
Hi,
I send a query to SOLR, which returns exactly one document. It's a
"id:some_doc_id" search. Here are the parameters as shown in the response:
params: {
mlt.mindf: "1",
mlt.count: "5",
mlt.fl: "text",
fl: "id,,application_id,...
project_start,project_end,project_title,score",
start
Thanks for all answers.
It appears that we will not have a data-center failure tolerant deployment of
zookeeper without a 3rd datacenter. The other alternative is to forget about
running zookeepers across datacenters, and instead have a live-warm deployment
(and we'd have to manually switch/fa
I agree with your comment on separating noise from the actually relevant
results.
My approach to separating relevant results from noise is not algorithmic but
an absolute measure, i.e. the top 5 or top 10 results will always be relevant
(at least the probability is higher).
But again, that kind of simple sor
On 26 July 2013 00:11, kaustubh147 wrote:
> Hi,
>
> When I am connecting my application to solr thru a load balancer
> (https://domain name/apache-solr-4.0.0), it is significantly slow. but if I
> connect Solr directly (https://11.11.1.11:8080/apache-solr-4.0.0) on the
> application server it work
On 7/25/2013 12:26 PM, Shawn Heisey wrote:
Either multipartUploadLimitInKB doesn't work properly, or there may be
some hard limits built into the servlet container, because I set
multipartUploadLimitInKB in the requestDispatcher config to 32768 and it
still didn't work. I wonder, perhaps there i
Hi,
When I am connecting my application to Solr through a load balancer
(https://domain name/apache-solr-4.0.0), it is significantly slow, but if I
connect to Solr directly (https://11.11.1.11:8080/apache-solr-4.0.0) on the
application server it works better.
Ideally, use of a load balancer should give b
Hi Jack,
I should have pointed out our use case. In any reasonable case where
actual end users will be looking at search results, paging 1,000 at a time
is reasonable. But what we are doing is a dump of the unique ids with a
"*:*" query. This allows us to verify that what our system thinks has
On 7/25/2013 11:39 AM, Tom Burton-West wrote:
Hello,
I am running solr 4.2.1 on 3 shards and have about 365 million documents in
the index total.
I sent a query asking for 1 million rows at a time, but I keep getting an
error claiming that there is an invalid version or data not in javabin
form
As usual, there is no published hard limit per se, but I would urge caution
about requesting more than 1,000 rows at a time or even 250. Sure, in a fair
number of cases 5,000 or 10,000 or even 100,000 MAY work (at least
sometimes), but Solr and Lucene are more appropriate for "paged" results,
w
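As an illustration of that paged pattern, a SolrJ sketch under an assumed page size and core URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class PagedDump {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    final int pageSize = 1000;

    SolrQuery q = new SolrQuery("*:*");
    q.setFields("id");
    q.setRows(pageSize);

    long fetched = 0;
    while (true) {
      q.setStart((int) fetched);                 // classic start/rows paging
      SolrDocumentList page = server.query(q).getResults();
      fetched += page.size();
      System.out.println("fetched " + fetched + " of " + page.getNumFound());
      if (page.size() < pageSize) {
        break;                                   // last page reached
      }
    }
    server.shutdown();
  }
}

Note that very deep start offsets get expensive too, which is part of why huge rows values are discouraged.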
Hello,
I am running solr 4.2.1 on 3 shards and have about 365 million documents in
the index total.
I sent a query asking for 1 million rows at a time, but I keep getting an
error claiming that there is an invalid version or data not in javabin
format (see below)
If I lower the number of rows re
Look for the presentations online. You are not the first store to use Solr,
there are some explanations around. Try one from Gilt, but I think there
were more.
You will want to store data at the lowest meaningful level of search
granularity. So, in your case, it might be ProductVariation (shoes+co
I was hoping to do this from within Solr, that way I don't have to manually
mess around with pagination. The number of items on each page would be
indeterminate.
On Jul 25, 2013, at 9:48 AM, Anshum Gupta wrote:
> Have a multivalued stored 'color' field and just iterate on it outside of
> so
Have a multivalued stored 'color' field and just iterate on it outside of
solr.
On Thu, Jul 25, 2013 at 10:12 PM, Mark wrote:
> How would I go about doing something like this. Not sure if this is
> something that can be accomplished on the index side or its something that
> should be done in ou
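A minimal SolrJ sketch of that approach, assuming a multivalued stored field named "color" and a local core URL:

import java.util.Collection;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class MultiValuedColorExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/products");

    // Index one document for the product, with all of its colors.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "productA");
    doc.addField("product_name", "Product A");
    for (String color : new String[] {"red", "blue", "green"}) {
      doc.addField("color", color);   // multivalued: addField appends a value
    }
    server.add(doc);
    server.commit();

    // Query it back and iterate over the stored colors in the application.
    SolrDocument found = server.query(new SolrQuery("id:productA")).getResults().get(0);
    Collection<Object> colors = found.getFieldValues("color");
    for (Object c : colors) {
      System.out.println(c);
    }
    server.shutdown();
  }
}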
How would I go about doing something like this? Not sure if this is something
that can be accomplished on the index side or if it's something that should be done
in our application.
Say we are an online store for shoes and we are selling Product A in red, blue
and green. Is there a way when we sea
On 7/25/2013 8:21 AM, Brian Robinson wrote:
The sentence on the admin page just tells me to check the logs, but I
don't appear to have any yet. Those are located in
solr/collection1/data/tlog/, right?
Those are transaction logs - for durability in the face of failure and
for the real-time get
I think the default SpellingQueryConverter has a hard time with terms that
contain numbers. Can you provide a failing case...the query you're executing
(with all the spellcheck.xxx params) and the spellcheck response (or lack
thereof). Is it producing any hits?
James Dyer
Ingram Content Group
Hi,
SPM for Solr shows numDocs, maxDocs, and their delta. Is that what
you are after? See http://sematext.com/spm
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Wed, Jul 17, 2013 at 4:06 PM, Furkan KAMACI wrote:
> I have cra
Hi,
given a dynamic field declared with
stored="true" />
There are some other suffix-based fields as well. And some of the fields
in the document should be ignored; they have a "nosolr_" prefix. But defining
a matching dynamic field with
stored="false" />
even at the start of the schema does not work; the field
"nosolr_inv_dunning_boolean" is r
Thanks Eric and Flavio for your responses. Sorry, I meant to say I was creating
collections and not cores.
I used the same article as suggested by Flavio to set up the SolrCloud, and
I did it twice. Both times I am facing the same issue. I am not sure
where the problem is.
I am using the follo
if you get an error on the admin UI, there should be specifics about
*what* the initialization failure is -- at least one sentence, and there
should be a full stack trace in the logs -- having those details will
help understand the root of your first problem, which may explain your
second problem
I'm looking into some possible slow down after long indexing issues when I get
back from vacation. This could be related. Very early guess though.
Another thing you might try - Lucene recently changed the merge scheduler
policy defaults (in 4.1) - it used to use up to 3 threads to merge and have a
Forgot to attach server and solr configurations:
SolrCloud 4.1, internal Zookeeper, 16 shards, custom java importer.
Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192gb RAM, 10tb
SSD and 50tb SAS storage
On Thu, Jul 25, 2013 at 3:20 PM, Radu Ghita wrote:
>
> Hi,
>
> We are having
Hi,
I think you are pushing it too far - there is no 'string search' without an
index. And besides, these things are just better done by a few lines of
code - and if your array is too big, then you should create the index...
roman
On Thu, Jul 25, 2013 at 9:06 AM, Rohit Kumar wrote:
> Hi,
>
>
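For what it's worth, the "few lines of code" could look something like this in plain Java (names are illustrative):

import java.util.ArrayList;
import java.util.List;

public class SimpleStringMatch {
  public static void main(String[] args) {
    String[] array = {"Input1 is good", "Input2 is better", "Input2 is sweet", "Input3 is bad"};
    String[] inputArray = {"Input1", "Input2"};

    // For each query term, collect every sentence that contains it.
    for (String term : inputArray) {
      List<String> matches = new ArrayList<String>();
      for (String sentence : array) {
        if (sentence.contains(term)) {
          matches.add(sentence);
        }
      }
      System.out.println(term + " -> " + matches);
    }
  }
}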
Hi,
I have a scenario.
String array = ["Input1 is good", ""Input2 is better", "Input2 is sweet",
"Input3 is bad"]
I want to compare the string array against the given input :
String inputarray= ["Input1", "Input2"]
It involves no indexes. I just want to use the power of string search to do
a r
Ok, now it works perfectly. In the past version I had renamed the default
collection, but
with
http://myserver/solr/
I was accessing directly
http://myserver/solr/corename/
probably because the default collection became the one that I had renamed.
Thanks for the help!
Well, we have hit the aforementioned jira issue with about 80 shards. The
sharding for us is a pure function of memory consumption, and we use lots of
RAM. With Solr 4, however, things look much better, and hopefully having
migrated from Solr 3 we can live for a long time without hitting the limit
again.
Auto soft commit is great for real time access, but you need to do hard
commits periodically or else the transaction log (which is what assures that
soft commits are durable) gets too big - it needs to be replayed on startup
and is used for real-time search.
So, set the auto soft commit to the
My actual solrconfig.xml is:
${solr.ulog.dir:}
1
true
I tried (solrj 4.3.1) and after 10 sec I got results:
1) server.add(doc) - nothing in index
2) server.add(doc, 1) - nothing in index
3) server.add(doc) and server.commit() - all fine, b
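For comparison, a small SolrJ sketch of those three variants (URL and field names are assumptions); the second argument to add() is commitWithin in milliseconds:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitVariants {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "commit-test-1");
    doc.addField("title_s", "commit test");

    // 1) Plain add: visibility depends entirely on autoCommit/autoSoftCommit in solrconfig.xml.
    server.add(doc);

    // 2) commitWithin: ask Solr to make the document searchable within 10 seconds.
    server.add(doc, 10000);

    // 3) Explicit hard commit: visible right away, but expensive if done per document.
    server.add(doc);
    server.commit();

    server.shutdown();
  }
}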
Hi,
We have a client with a business model that requires indexing a billion rows
a month into Solr from MySQL in a small time-frame. The
documents are very light, but the number is very high and we need to
achieve speeds of around 80-100k/s. The built-in Solr indexer tops out at
40-50k, but
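One common way to get more throughput out of a custom SolrJ importer is to batch documents and stream them through ConcurrentUpdateSolrServer; a sketch under assumed URL, field names, queue, and thread settings:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // Buffers adds client-side and streams them to Solr from several threads.
    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 50000, 8);

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (long i = 0; i < 1000000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Long.toString(i));
      doc.addField("value_s", "row-" + i);
      batch.add(doc);

      if (batch.size() == 1000) {     // send in batches rather than one at a time
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.blockUntilFinished();      // wait for the background queue to drain
    server.commit();
    server.shutdown();
  }
}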
Start here: http://wiki.apache.org/solr/HowToContribute
Then, when your patch is ready submit a JIRA and attach
your patch. Then nudge (gently) if none of the committers
picks it up and applies it
NOTE: It is _not_ necessary that the first version of your
patch is completely polished. I often
I find this article very interesting about cloud deployment:
http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html
Best,
Flavio
On Thu, Jul 25, 2013 at 1:59 PM, Erick Erickson wrote:
> I'd advise you to tear it down and start over. You should be
> creating new _collections_, no
Vicky:
Please define "&distrib=false doesn't work".
_What_ doesn't work? What are the symptoms?
It could be a bug or it could be a misunderstanding, I
have no way of even guessing.
Best
Erick
On Thu, Jul 25, 2013 at 3:52 AM, vicky desai wrote:
> Hi,
>
> I have also noticed that once I put the c
Actually, you're getting a solr.xml file but you don't know it.
When Solr doesn't find solr.xml, there's a default one hard-
coded that is used. See ConfigSolrOld.java, at the end
DEF_SOLR_XML is defined.
So, as Hoss says, it's much better to make one anyway so
you know what you're getting.
Consi
I don't think there is any hard limit, but it will be more of a
performance-based limit. Going beyond a couple dozen shards (lets say, 25)
would take you into uncharted territory, where a sophisticated proof of
concept implementation is essential. "Hundreds" or "thousands" of shards are
likely
I'd advise you to tear it down and start over. You should be
creating new _collections_, not cores at this level I believe. And
manually editing the cluster state is just _asking_ for
trouble unless you really understand what's happening under
the covers, and since you say you're relatively new
If that level of scripting is difficult for you, consider the LucidWorks
Search product which has a built-in scheduler for crawl/import jobs,
including web sites, file system directories, sharepoint repositories, and
databases.
See:
http://docs.lucidworks.com/display/help/Crawling+Content
--
Mikhail,
Yes, +1.
This question comes up a few times a year. Grant created a JIRA issue
for this many moons ago.
https://issues.apache.org/jira/browse/LUCENE-2127
https://issues.apache.org/jira/browse/SOLR-1726
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring
"collection1" is the default, so when you enter
http://myserver/solr/, under the covers you get
http://myserver/solr/collection1/.
So go ahead and rename your cores, but address
them specifically as
http://myserver/solr/corename/
Best
Erick
On Thu, Jul 25, 2013 at 7:10 AM, santonel wrote:
> Hi
The query syntax is case sensitive; "and" is treated as a search term and
not as an operator.
On Thu, Jul 25, 2013 at 1:00 PM, Payal.Mulani <
payal.mul...@highqsolutions.com> wrote:
> Hi,
>
> I am using solr14 and when I search with 'and' the it searches the
> documents
> containing 'and' as a t
Picking up on what Dominique mentioned: your ZK configuration
isn't doing you much good. Not only do you have an even number,
6 (which is actually _less_ robust than having 5), but by splitting
them between two data centers you're effectively requiring the data
center with 4 nodes to always be up. If
Hi
I've upgraded my solr server (a single core with single collection) from
4.3.1 to 4.4.0, using the new solr.xml
configuration file from example and setting the new core.properties (with my
collection name) under the instance dir.
When I check the status of Solr via the web interface, all is up an
Hi,
Context:
* https://issues.apache.org/jira/browse/SOLR-4956
*
http://search-lucene.com/c/Solr:/core/src/java/org/apache/solr/update/SolrCmdDistributor.java%7C%7CmaxBufferedAddsPerServer
As you can see, maxBufferedAddsPerServer = 10.
We have an app that sends 20K docs to SolrCloud using Cloud
Hi,
I am using solr14 and when I search with 'and' it searches for documents
containing 'and' as text, but if I search with 'AND' it will not treat
'and' as text and instead takes it as a logical operator, so does anyone
have an idea why these two behave differently?
Also, both give dif
I have to execute this command for full-import
http://localhost:8983/solr/dataimport?command=full-import
Can you explain how I would use the Java timer to fire this HTTP request?
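A minimal sketch using plain java.util.Timer (host, port, and interval are assumptions; adjust the URL to your data import handler):

import java.io.InputStream;
import java.net.URL;
import java.util.Timer;
import java.util.TimerTask;

public class DataImportScheduler {
  public static void main(String[] args) {
    final String importUrl =
        "http://localhost:8983/solr/dataimport?command=full-import";

    TimerTask task = new TimerTask() {
      @Override
      public void run() {
        try {
          // Fire the HTTP request; the response body is ignored here.
          InputStream in = new URL(importUrl).openStream();
          in.close();
          System.out.println("Triggered: " + importUrl);
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    };

    // Run immediately, then every 5 minutes.
    new Timer("dataimport-trigger").schedule(task, 0, 5 * 60 * 1000L);
  }
}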
Hi
You could use a Java timer: trigger your DB import every X minutes. Another
option: you may know when your DB is updated. Whenever the DB gets changed,
trigger the request to index the newly added data.
Regards
Aditya
www.findbestopensource.com
On Thu, Jul 25, 2013 at 11:42 AM, archit2112 wrote:
Nicole,
According to our findings, there is also a limit for the number of shards
depending on the volume of the returned data. See this jira:
https://issues.apache.org/jira/browse/SOLR-4903
Dmitry
On Thu, Jul 25, 2013 at 11:25 AM, Nicole Lacoste wrote:
> Oh found the answer myself. Its the
Hi,
I am using SOLR 3.6.1 and have implemented spellcheck. I found that numbers in the
spellcheck query do not return any results. Below are my solrconfig.xml and
schema.xml details. Please let me know what needs to be done in order
to get spell check working for numbers.
solrConfig
Hello,
I have implemented a Solr EventListener, which should be fired after
committing.
This works fine on the Solr-Master Instance and it also worked in Solr 3.5
on any Slave Instance.
I upgraded my installation to Solr 4.2 and now the postCommit event is not
fired any more on the replication (S
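For reference, a bare-bones listener of this kind (class name illustrative) implements org.apache.solr.core.SolrEventListener and gets registered as a postCommit listener under <updateHandler> in solrconfig.xml:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

/** Bare-bones listener that just logs commit events. */
public class LoggingCommitListener implements SolrEventListener {
  @Override
  public void init(NamedList args) {
    // No configuration needed for this sketch.
  }

  @Override
  public void postCommit() {
    System.out.println("postCommit fired");
  }

  @Override
  public void postSoftCommit() {
    System.out.println("postSoftCommit fired");
  }

  @Override
  public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
    // Not used here.
  }
}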
Hi,
I have implemented it like Chris described:
The field is indexed as numeric, but displayed as string, according to
configuration.
It applies to facet, pivot, group and query.
How do we proceed? How do I contribute it?
Thanks.
-Original Message-
From: Chris Hostetter [mailto:hossman
Oh, found the answer myself. It's the GET method's URL length that limits the
number of shards.
Niki
On 25 July 2013 10:14, Nicole Lacoste wrote:
> Is there a limit on the number of shards?
>
> Niki
>
>
> On 24 July 2013 01:14, Jack Krupansky wrote:
>
>> 2.1 billion documents (including deleted
BTW, how does Solr's MoreLikeThis component work? Which algorithm does it use
underneath?
2013/7/24 Roman Chyla
> This paper contains an excellent algorithm for plagiarism detection, but
> beware the published version had a mistake in the algorithm - look for
> corrections - I can't find them no
Is there a limit on the number of shards?
Niki
On 24 July 2013 01:14, Jack Krupansky wrote:
> 2.1 billion documents (including deleted documents) per Lucene index, but
> essentially per Solr shard as well.
>
> But don’t even think about going that high. In fact, don't plan on going
> above 100
Hi,
I have also noticed that once I put the core up on both machines,
&distrib=false works well. Could this be a possible bug, that when a core is
down on one instance &distrib=false doesn't work?
Hi Erik,
Thanks for the reply.
But does &distrib=true work for replicas as well? As I mentioned earlier, I
have a set up of 1 leader and 1 replica. If a core is up on either of the
instances, querying both instances gives me results even with
&distrib=false