When we have duplicated documents (same uniqueID) among the shards, the query
results can be non-deterministic; this is a known issue.
The consequence when we display the search results on our UI page with
pagination is: if the user clicks the 'last page', it can display an empty page
since the to
Yandong,
have you figured out whether using one collection per customer works for you?
We have a similar use case to yours: customer IDs are used as core names.
That was the reason our company did not upgrade to SolrCloud ... I might
remember it wrong, but I vaguely remember I looked into using
We just tried to use
.../solr/admin/cores?action=RENAME&core=core0&other=core5
to rename a core 'old' to 'new'.
After the request is done, solr.xml has the new core name, and the Solr
admin shows the new core name in the list. But the index dir still has the
old name as the directory name. I loo
Hi Shawn,
I do have persistent="true" in my solr.xml:
...
the command I ran was to rename from '413' to '413a'.
When I debug through the Solr CoreAdminHandler, I notice the persistent flag
only controls whether the new data will be persisted to solr.xml or not, thus as
you can se
thanks Shawn for filing the issue.
by the way my solrconfig.xml has:
${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}
For now I will have to shut down Solr and write a script to modify
solr.xml manually and rename the core data directory to the new one.
by the way when I try to re
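A minimal sketch of that offline workaround, assuming Solr is fully stopped first. The directory layout, file names, and the naive solr.xml text substitution here are illustrative assumptions, not taken from any actual Solr tooling:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: with Solr stopped, rename a core's data directory
// and rewrite the core name in solr.xml. Layout and names are assumptions.
public class RenameCoreDir {
    static void renameCore(Path solrRoot, String oldName, String newName) throws IOException {
        // 1. Rename the on-disk data directory.
        Files.move(solrRoot.resolve(oldName), solrRoot.resolve(newName));
        // 2. Rewrite the core name in solr.xml (naive text substitution;
        //    a real script should edit the XML with a proper parser).
        Path solrXml = solrRoot.resolve("solr.xml");
        String xml = Files.readString(solrXml);
        Files.writeString(solrXml,
            xml.replace("name=\"" + oldName + "\"", "name=\"" + newName + "\""));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("solrroot");
        Files.createDirectory(root.resolve("413"));
        Files.writeString(root.resolve("solr.xml"),
            "<solr><cores><core name=\"413\" instanceDir=\"413\"/></cores></solr>");
        renameCore(root, "413", "413a");
        System.out.println(Files.exists(root.resolve("413a"))); // prints true
    }
}
```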
Yeah, I realize using ${solr.core.name} for dataDir must be the cause of the
issue we see... it is fair to say SWAP and RENAME just create an alias
that still points to the old dataDir.
If they cannot fix it then it is not a bug :-) at least we understand
exactly what is going on there.
than
Hi -
when I execute a shard query like:
[myhost]:8080/solr/mycore/select?q=type:message&rows=14&...&qt=standard&wt=standard&explainOther=&hl.fl=&shards=solrserver1:8080/solr/mycore,solrserver2:8080/solr/mycore,solrserver3:8080/solr/mycore
everything works fine until I query against a large
any update on this?
will this be addressed/fixed?
In our system, our UI will allow users to paginate through search results.
As my in-depth testing found out, if rows=0, the result size is consistently
the total sum of the documents on all shards regardless of whether there are
any duplicates; if the rows
OK, now that my head has cooled down, I remember this old-school issue... I
have been dealing with it myself.
So I do not expect this can be straightened out or fixed in any way.
Basically, when you have two sorted result sets you need to merge and
paginate through, it is never an easy job (if all i
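A rough sketch of why the merge comes up short when a uniqueID exists on more than one shard. This is illustrative coordinator-side logic, not Solr's actual implementation; the Doc class and field names are made up for the example:

```java
import java.util.*;

// Illustrative coordinator-side merge, not Solr's actual code: each shard
// returns a list sorted by descending score; duplicates (same uniqueID)
// collapse to one entry, so the merged total can be smaller than the sum
// of per-shard counts -- which is what makes the last page come up empty.
public class ShardMerge {
    static class Doc {
        final String uniqueId;
        final float score;
        Doc(String uniqueId, float score) { this.uniqueId = uniqueId; this.score = score; }
    }

    static List<Doc> mergePage(List<List<Doc>> shardResults, int start, int rows) {
        // Global order by descending score across all shards.
        PriorityQueue<Doc> pq = new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<Doc> shard : shardResults) pq.addAll(shard);
        // Keep only the first occurrence of each uniqueID.
        Set<String> seen = new HashSet<>();
        List<Doc> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            Doc d = pq.poll();
            if (seen.add(d.uniqueId)) merged.add(d);
        }
        int from = Math.min(start, merged.size());
        int to = Math.min(start + rows, merged.size());
        return merged.subList(from, to);
    }

    public static void main(String[] args) {
        List<List<Doc>> shards = List.of(
            List.of(new Doc("a", 3f), new Doc("b", 2f)),
            List.of(new Doc("a", 3f), new Doc("c", 1f))); // "a" lives on both shards
        // Per-shard counts sum to 4, but only 3 distinct docs survive the merge.
        System.out.println(mergePage(shards, 0, 10).size()); // prints 3
    }
}
```

With rows=0 nothing is merged, which is consistent with the observation above that the reported total is just the sum over shards.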
Did anyone verify the following is true?
> the Description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:
>
> *quote*
> If a core with the same name exists, while the "new" created core is
> initalizing, the "old" one will continue to accept requests. Once it
> has finished, all new request
Thanks for the information, you are right, I was using the same instance dir.
I agree with you; I would like to see an error if I am creating a core with
the name of an existing core.
Right now I have to ping first, and check whether the returned code is 404
or not.
Jie
--
View this message
I am trying to get the value of 'dataDir' that was set in solrconfig.xml,
other than querying Solr with
http://[host]:8080/solr/default/admin/file/?contentType=text/xml;charset=utf-8&file=solrconfig.xml
and parsing the dataDir element using some XML parser, then resolving all
possible environment vari
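If parsing solrconfig.xml by hand turns out to be the only route, the placeholder resolution itself is manageable. Here is a hedged sketch of expanding Solr-style ${name:default} variables; the lookup map stands in for system properties and per-core info, and none of this is a Solr API:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of resolving ${name:default} placeholders in a value such as
// ${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name} --
// the kind of dataDir string read straight out of solrconfig.xml.
public class PropertyResolver {
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String def = m.group(2) == null ? "" : m.group(2);  // fallback after ':'
            String replacement = props.getOrDefault(name, def);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dataDir = "${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}";
        // MYSOLRROOT is unset, so its default applies; the core name is supplied.
        System.out.println(resolve(dataDir, Map.of("solr.core.name", "413")));
        // prints /mysolrroot/messages/solr/data/413
    }
}
```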
Hi -
our indexed documents currently store Solr fields like 'digest' or 'type',
for which most of our documents end up with the same value (such as 'sha1' for
the field 'digest', or 'message' for the field 'type', etc.).
On each Solr server, we usually have hundreds of millions of documents indexed,
and with the sam
thank you David!
I cleaned up the Solr schema by changing a small portion of the stored fields
to stored="false".
For 5000 documents (about 500 MB total size of original documents), I ran a
benchmark comparing the Solr index size between the schema before and after
the cleanup.
The first run showed about a 40% redu
This is related to my previous post, on which I have not gotten any feedback
yet... I am going through an exercise to reduce the disk usage of the Solr
index files.
The first step I took was to move some fields from stored to not stored; this
reduced the size of the .fdt files by 30-60%.
Very promising... however, I notice
thanks for the information...
I did come across that discussion; I guess I will try to write a customized
Similarity class and disable tf.
I hope this is not a totally odd thing to do ... I do notice about 10 GB of
.frq files in cores that have 10-30 GB of .fdt files in total. I hope the
benchmark will show me
Thanks Erik ... I did run optimize on both indices to get rid of the deleted
data before comparing them to each other. (And my benchmark tests were just
indexing 5000 new documents, without duplicates, into a new core... but I did
optimize just to make sure.)
I think one result is consistent: the .f
thanks, this is very helpful
Hi Otis,
do you think I should customize both tf and idf to disable the term
frequency?
i.e. something like:
public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
}

public float idf(int docFreq, int numDocs) {
    return docFreq > 0 ? 1.0f : 0.0f;
}
t
Hi Otis,
I customized the Similarity class and added it at the end of schema.xml:
... ...
and mypackage.NoTfSimilarity.java is like:
public class NoTfSimilarity extends DefaultSimilarity
{
    public float tf(float freq)
    {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public flo
Hi Otis,
Here is the debug output on the query... it seems all tf and idf indeed return
1.0f as I customized... I did not override queryNorm or weight, etc... see
below.
But the bottom line is that if my purpose is to reduce the .frq file size,
customizing the Similarity won't help with that. I guess th
When I use HttpClient and its PostMethod to post a query with some Chinese,
Solr either fails to return any records, or returns everything.
... ...
method = new PostMethod(solrReq);
method.getParams().setContentCharset("UTF-8");
method.setRequestHeader("Conten
:-) Otis, I also looked at the SolrJ source code; it seems to do exactly what
I am doing here... but I will probably do what you suggested ... thanks
Jie
Unfortunately SolrJ is not an option here...
We will have to make a quick fix with a patch out in production.
I am still unable to make Solr (3.5) take a URL-encoded query. Again,
passing a non-URL-encoded query string works with non-ASCII (Chinese), but
fails to return anything when sending the request wi
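For what it is worth, the usual checklist here is to URL-encode parameter values as UTF-8 on the client and make sure the container decodes with the same charset (for GET parameters on Tomcat, that is URIEncoding="UTF-8" on the connector in server.xml). A minimal sketch of the encoding side:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Minimal sketch: percent-encode a non-ASCII query value as UTF-8
// before putting it in a URL or form body.
public class QueryEncoding {
    static String encode(String q) throws UnsupportedEncodingException {
        return URLEncoder.encode(q, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Each Chinese character becomes three %XX bytes in UTF-8.
        System.out.println(encode("中文")); // prints %E4%B8%AD%E6%96%87
    }
}
```

If both the client and a layer underneath it encode, the server decodes the wrong bytes; that double-encoding mismatch is a common reason a URL-encoded query fails while the raw string appears to work.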
What will happen if, in my query, I specify a greater number for rows than the
queryResultWindowSize in my solrconfig.xml?
For example, if queryResultWindowSize=100, but I need to process a batch query
from Solr with rows=1000 each time, advancing the start as I go... what will
happen? if I do not turn o
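For reference, the setting lives in the <query> section of solrconfig.xml; my understanding (worth verifying) is that the window only controls how large a superset of document IDs is collected and cached per query, so rows=1000 against a window of 100 should still work, just without the cache-window benefit. The values below are illustrative:

```xml
<!-- solrconfig.xml, <query> section; values are illustrative -->
<query>
  <!-- Number of document IDs collected and cached as a block per query;
       paging within this window can be served from the queryResultCache. -->
  <queryResultWindowSize>100</queryResultWindowSize>
  <!-- Upper bound on documents cached per cache entry. -->
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
</query>
```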
any suggestions?
Hi Erik,
No, I don't have any evidence; it is just a precautionary question.
So according to your explanation, this cache only keeps the document IDs, so
if the client pages to the next group of documents in the window, there will
be another query to the Solr server to retrieve those docs, correct?
OK, that is good to kno
Hi -
With a corrupted core:
1. if I run CheckIndex with -fix, it will drop the reference to the corrupted
segment, but the segment files are still there. When we have a lot of
corrupted segments, we have to pick them out and remove them manually; is
there a way the tool can suffix them or prefix them
Very often when we try to shut down Tomcat, we get the following error in
catalina.out indicating a Solr thread cannot be stopped; Tomcat ends up
hanging and we have to kill -9 it, which we think leads to some core
corruption in our production environment. please help ...
catalina.out:
... ...
Oct 19,
By the way, I am running Tomcat 6, Solr 3.5 on Red Hat 2.6.18-274.el5 #1 SMP
Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Found a Solr/Lucene bug: TimeLimitingCollector starts threads in a static {}
block with no way to stop them:
https://issues.apache.org/jira/browse/LUCENE-2822
Is this the same issue? It is fixed in Lucene 3.5, but I am using Solr 3.5
with Lucene 2.9.3 (the matched Lucene version).
Can anyone shed some light
any input on this?
thanks
Jie
I have a question about Solr replication (master/slaves).
While indexing activity is ongoing on the master, the slave sends in a
filelist command to get a version (actually, to my understanding, a
point-in-time snapshot) of all files and their size/timestamp, etc.
Then the slave will decide which files need
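My mental model of that decision step, as a hedged sketch: the slave diffs the master's file list for the advertised version against what it has locally and fetches anything missing or different. The class and the name-to-size map are illustrative, not Solr's actual replication structures:

```java
import java.util.*;

// Illustrative sketch, not Solr's replication code: given the master's file
// list for a committed index version (name -> size), fetch files the slave
// lacks or whose sizes differ.
public class ReplicationDiff {
    static List<String> filesToFetch(Map<String, Long> master, Map<String, Long> slave) {
        List<String> fetch = new ArrayList<>();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long localSize = slave.get(e.getKey());
            if (localSize == null || !localSize.equals(e.getValue())) {
                fetch.add(e.getKey()); // missing locally, or size mismatch
            }
        }
        Collections.sort(fetch);
        return fetch;
    }

    public static void main(String[] args) {
        Map<String, Long> master = Map.of("_1.fdt", 100L, "_1.frq", 50L, "segments_2", 10L);
        Map<String, Long> slave  = Map.of("_1.fdt", 100L, "segments_1", 10L);
        System.out.println(filesToFetch(master, slave)); // prints [_1.frq, segments_2]
    }
}
```

Because Lucene segment files are write-once, a same-name same-size file can be assumed unchanged, which is what makes this simple diff workable even while the master keeps indexing.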
thanks ...
Could you please point me to some more detailed explanation online, or will I
have to read the code to find out? I would like to understand a little more
about how this is achieved. thanks!
Jie
Thanks... I just read the related code... now I understand: it seems the
master keeps replicable snapshots (versions), so the file list should be
static. Thank you Otis!
We are using Solr 3.5 in production and we deal with terabytes of customer
data.
We use shards for large customers and wrote our own replica management in
our software.
Now, with the rapid growth of data, we are looking into SolrCloud for its
robust sharding and replication.
I unde
thanks for your feedback Erick.
I am also aware of the current limitation that the number of shards in a
collection is fixed; changing the number requires re-configuring and
re-indexing. If that limitation gets lifted in a near-future release, I would
then consider setting up a collection for each customer, whi