Thank you, Shawn! This is very helpful.
Renee
First of all, I apologize for the length of this message ... there are a few
questions I would appreciate your help with, please:
1. Originally I wanted to use SolrJ in my application layer (a webapp deployed
with Tomcat) to query the Solr server(s) in a multi-core, non-cloud setup.
Since I need to send back
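A minimal sketch of the setup the thread title refers to (sharing one PoolingHttpClientConnectionManager across SolrJ requests), assuming SolrJ 6.x, where HttpSolrClient replaces the older HttpSolrServer; the URL and pool sizes below are placeholders:

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PooledSolrQuery {
    public static void main(String[] args) throws Exception {
        // one shared connection pool for all Solr requests issued by the webapp
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setMaxTotal(100);              // hypothetical sizing
        pool.setDefaultMaxPerRoute(20);     // hypothetical sizing

        CloseableHttpClient httpClient = HttpClients.custom()
                .setConnectionManager(pool)
                .build();

        // base URL is a placeholder; reuse one client per core rather than one per request
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore")
                .withHttpClient(httpClient)
                .build();

        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());

        solr.close();
        httpClient.close();
    }
}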
Hi -
I have a schema that looks like:
(text_nost and text_st are just field types defined without/with stopwords...
irrelevant to the issues here)
These 3 fields are parallel in terms of their values. I want to be able to
match these values and be able to search for something like:
give me all attach
thanks for your time!
Hi Chris,
Since I have been playing with this install, I am not certain whether I have
unknowingly messed up some other settings, and I want to avoid filing a false
Jira and wasting your time.
I wiped out everything on my Solr box and did a fresh install of Solr
6.4.0, and made sure my configset files are in place
Thanks Erick!
I looked at the Solr wiki, though; if configSetBaseDir is not set, the default
should be SOLR_HOME/configsets:
configSetBaseDir
The directory under which configsets for solr cores can be found.
Defaults to SOLR_HOME/configsets
and I do have my Solr started with:
-Dsolr.solr.
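For reference, a hedged sketch of pointing configSetBaseDir somewhere explicit in solr.xml instead of relying on the default (the path below is just a placeholder for a shared data folder):

<solr>
  <!-- hypothetical shared location; overrides the SOLR_HOME/configsets default -->
  <str name="configSetBaseDir">/opt/solrdata/configsets</str>
</solr>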
Hi -
We use separate Solr install and data folders with a shared schema/config
(configsets) in a multi-core setup; it seems the configsets need to be
deployed in both places (we are running Solr 6.4.0)?
For example, Solr is installed in /opt/solr, thus there is the folder:
/opt/solr/server/solr/con
Thanks John... yes, that was the first idea that came to our mind, but it
would require doubling our servers (in the replica data centers as well, etc.),
and we definitely can't afford the cost.
We have thought of first establishing a small pool of 'hot' servers and using
them to take incoming new index data using u
Shawn and Ari,
The 3rd-party jars are exactly one of the concerns I have.
We have more than just a multi-lingual integration; we have to integrate with
many other 3rd-party tools. We basically deploy all those jars into an
'external' lib extension path in production, then for each 3rd-party too
I just read through the following link Shawn shared in his reply:
https://wiki.apache.org/solr/WhyNoWar
While the following statement is true:
"Supporting a single set of binary bits is FAR easier than worrying
about what kind of customized environment the user has chosen for their
deployment
Thanks everyone, I think this is very helpful... I will post more specific
questions once we start to get more familiar with Solr 6.
Thanks ... but that is an extremely simplified situation.
We are not just looking at Solr as a new tool to start using.
In our production, we have had cloud-based big data indexing using Solr for many
years. We have developed lots of business-related logic/components deployed as
webapps working seaml
Need some general advice please...
Our infra is built with multiple webapps on Tomcat ... the scale layer is
achieved on top of those webapps, which work hand-in-hand with Solr admin
APIs / shard queries / commit or optimize / core management, etc.
While I have not gotten a chance to actually p
Thanks Yonik... I bet with Solr 3.5 we do not have JSON facet API support
yet ...
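For reference, on Solr 5.1 and later, where the JSON Facet API does exist, the sum alone can be requested roughly like this (a hedged sketch; the host, core, and field names are placeholders):

curl 'http://localhost:8983/solr/mycore/select' \
     --data-urlencode 'q=*:*' \
     --data-urlencode 'rows=0' \
     --data-urlencode 'json.facet={ total : "sum(myfieldname)" }'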
Also, Yonik, out of curiosity... when I run stats on a large msg set (such as
200 million msgs), it tends to use a lot of memory; this should be expected,
correct?
If I were able to use !sum=true to get only the sum, a clever algorithm should
be able to tell that only the sum is required, and it will avoid memory
Now I think that with Solr 3.5 (which we are using), !sum=true (overriding the default)
is probably not supported yet :-(
thanks
Renee
I did try single quotes with a backslash on the bang.
I also tried disabling history chars...
It did not work for me.
Unfortunately, we are using Solr 3.5, which probably does not support the JSON format?
Thanks!
But it is silly that I can't seem to escape the {!sum=true} properly to make
it work in my curl :-(
time curl -d
'q=*:*&rows=0&shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2&stats=true&stats.field={!sum=true}myfieldname'
http://localhost:8080/solr/413-1/select/? | xmllint --format -
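A hedged variant that sidesteps the shell-escaping problem by letting curl URL-encode each parameter (note that Solr 3.5 will most likely just ignore the {!sum=true} local param anyway):

time curl 'http://localhost:8080/solr/413-1/select' \
     --data-urlencode 'q=*:*' \
     --data-urlencode 'rows=0' \
     --data-urlencode 'shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2' \
     --data-urlencode 'stats=true' \
     --data-urlencode 'stats.field={!sum=true}myfieldname' | xmllint --format -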
Hi -
I have been using stats to get the sum of a field's data (int) like:
&stats=true&stats.field=my_field_name&rows=0
It works fine, but when the index has hundreds of millions of messages on
sharded indices, it takes a long time.
I noticed that 'stats' gives out more information than I need (just the sum), I
Thanks a lot, Shawn, for the details; it is very helpful!
Shawn, thanks so much, and this user forum is so helpful!
I will start using autocommit with confidence; it will greatly help reduce
the false commit requests (a lot) from processes in our system.
Regarding the Solr version, it is actually a big problem we have to resolve
sooner or later.
When we
Unfortunately, we are still using Solr 3.5 with Lucene 2.9.3 :-( If we upgrade
to Solr 4.x, it will require upgrading Lucene away from 2.x.x, which will
require a re-index of all our data. With current measures, it might take about
8-9 for the data we have to be re-indexed, which is a big concern.
So to understan
Thank you! I will look into that.
Also, I came across autoSoftCommit; it seems to be useful... we are still
using Solr 3.5, and I hope autoSoftCommit is included in Solr 3.5...
Walter, thanks!
I will do some tests using autocommit; I guess if there is a requirement for
the console UI to make documents searchable within 10 minutes, we will need to
use autocommit with maxTime instead of maxDocs.
I wonder whether, in case we need to do a 'force commit', the autocommit will not
get in
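A hedged sketch of what that autocommit setting could look like in solrconfig.xml, assuming maxTime is in milliseconds (600000 ms = 10 minutes); explicit commit requests would still be honored alongside it:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit automatically at most 10 minutes after the first uncommitted add -->
    <maxTime>600000</maxTime>
  </autoCommit>
</updateHandler>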
This makes sense now. Thanks!
Why I got onto this idea is:
In our system we have a large customer base and lots of cores; each customer
may have multiple cores.
There are also a lot of processes running in our system processing the data
for these customers, and once in a while, they would ask a center p
[core]/index is a folder holding index files.
But the index files in that folder are not just being deleted or added; they are
also being updated.
On a Linux file system, the folder's timestamp will only be updated if the
files in it are being added or deleted, NOT updated. So if I check the index
folde
Hmm... at the beginning I also assumed segment index files would only be deleted
or added, but not modified.
But I did a test with heavy indexing ongoing, and observed that the index file
in [core]/index with the latest updated timestamp kept growing for about 7
minutes... not sure if the new write caused an
I will need to figure out when the last index activity on a core was.
I can't use the [corename]/index timestamp, because it only reflects file
deletion or addition, not file updates.
I am curious if there is any Solr core admin RESTful API sort of thing I can
use to get the last modified timestamp on physica
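In the meantime, a hedged shell sketch (GNU find; the path is a placeholder) that reports the newest modification time among a core's index files, which sidesteps the directory-timestamp limitation:

find /opt/solrdata/mycore/data/index -type f -printf '%T@ %TY-%Tm-%Td %TH:%TM:%TS %p\n' \
    | sort -n | tail -1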
Thanks Shawn...
On the other side, I have just created a thin-layer webapp that I deploy with
Solr/Tomcat. This webapp provides a RESTful API that allows all kinds of clients in
our system to call and request a commit on a certain core on that Solr
server.
I put it in with the idea of having a centre/final pla
Hi Erick... as Shawn pointed out... I am not using SolrCloud; I am using a
more complicated, home-grown sharding scheme...
Thanks for your response :-)
Renee
Hi Shawn,
I think we have a similar structure, where we use frontier/back instead of
hot/cold :-)
So yes, we will probably have to do the same.
Since we have large customers, and some of them may have terabytes of data and
end up with hundreds of cold cores, the blind delete broadcasting to all
of th
Shawn,
Thanks for the reply.
I have a sharded index. When I re-index a document (vs. a new index, which is a
different process), I need to delete the old one first to avoid duplicates. We all
know that if there is only one core, the newly added document will replace
the old one, but with multiple core indexes
I ran this curl trying to delete some messages:
curl
'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><query>abacd</query></delete>'
| xmllint --format -
or
curl
'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><query>myfield:mycriteria</query></delete>'
| xmllint --format -
The results I got are like:
%
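For what it's worth, a hedged sketch of checking how many documents match before deleting, and of sending the delete as an XML POST body (which avoids the stream.body quoting issues); the core and field names are placeholders:

# count the matching documents first
curl 'http://localhost:8080/solr/mycore/select?q=myfield:mycriteria&rows=0&wt=json'
# then delete by query and commit
curl 'http://localhost:8080/solr/mycore/update?commit=true' \
     -H 'Content-Type: text/xml' \
     --data-binary '<delete><query>myfield:mycriteria</query></delete>'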
Sorry, I should have elaborated on that earlier...
In our production environment, we have multiple cores and they ingest
continuously all day long; we only optimize periodically, and optimize
once a day at midnight.
So sometimes we could see a 'too many open files' error. To prevent it from
happening, in
Yeah, I can figure out the segment number by going to the stats page of Solr...
But my question was how to figure out the exact total number of files in the 'index'
folder for each core.
Like I mentioned in the previous message, I currently have 8 files per segment
(.prx, .tii, etc.), but it seems this might change i
Thanks!
It seems the file count in the index directory is the segment# * 8 in my dev
environment...
I see there are .fnm .frq .fdt .fdx .nrm .prx .tii .tis (8) file extensions,
and each has as many files as the segment#.
Is it always safe to calculate the file count by multiplying the segment number
by 8?
OK, I dug more into this and realized the file extensions can vary depending on
the schema, right?
For instance, we don't have *.tvx, *.tvd, *.tvf (not using term vectors)... and
I suspect the file extensions may change with future Lucene releases?
Now it seems we can't just count the files using any formul
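A hedged shell sketch of counting the actual files rather than estimating from the segment count (the path is a placeholder):

# total number of files in the core's index directory
ls -1 /opt/solrdata/mycore/data/index | wc -l
# the same count broken down by file extension
ls -1 /opt/solrdata/mycore/data/index | sed 's/.*\.//' | sort | uniq -c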
Hi Hoss,
Thanks for your response...
You are right, I had a typo in my question, but I did use maxSegments, and
here is the exact URL I used:
curl
'http://localhost:8080/solr/97/update?optimize=true&maxSegments=10&waitFlush=true'
I used jconsole and du -sk to monitor each partial optimize, and
I have a core with 120+ segment files, and I tried a partial optimize specifying
maxNumSegments=10; after the optimize, the segment files were reduced to 64 files.
I did the same optimize again, and it reduced to 30-something;
this keeps going and eventually it drops to a teen number.
I was expecting to see the o
Just an update on this issue...
We turned off the new/first searchers (upgraded to Solr 1.4.1) and ran
benchmark tests; there is no noticeable performance impact on the queries we
perform compared with the Solr 1.3 benchmark tests WITH new/first searchers.
Also, the memory usage was reduced by 5.5 GB after
Ken,
Looks like we posted at the same time :-)
Thanks very much!
Renee
Thanks Michael, I got it resolved last night... you are right, it is more of
an HttpClient issue, which I confirmed after trying another link unrelated to Solr. If
anyone is interested, here is the working code:
HttpClientParams httpClientParams = new HttpClientParams();
httpClientParams.setSoTim
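The snippet above is cut off in the archive; a fuller sketch of the commons-httpclient 3.x timeout setup it describes might look like the following (the timeout value and ping URL are placeholders):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpClientParams;

public class SolrPingWithTimeout {
    public static void main(String[] args) throws Exception {
        int timeoutMillis = 5000;  // hypothetical value

        HttpClientParams clientParams = new HttpClientParams();
        clientParams.setSoTimeout(timeoutMillis);                 // socket read timeout
        clientParams.setConnectionManagerTimeout(timeoutMillis);  // wait for a free pooled connection

        HttpClient client = new HttpClient(clientParams);
        // TCP connect timeout lives on the connection manager params
        client.getHttpConnectionManager().getParams().setConnectionTimeout(timeoutMillis);

        GetMethod method = new GetMethod("http://localhost:8080/solr/mycore/admin/ping");
        method.getParams().setSoTimeout(timeoutMillis);           // per-request socket timeout

        int status = client.executeMethod(method);
        System.out.println("ping status=" + status);
        method.releaseConnection();
    }
}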
I am using the following code to send out a Solr request from a webapp. Please
notice the timeout setting:
HttpClient client = new HttpClient();
HttpMethod method = new GetMethod(solrReq);
method.getParams().setParameter(HttpConnectionParams.SO_TIMEOUT,
I also added the following timeouts for the connection; it is still not working:
client.getParams().setSoTimeout(httpClientPingTimeout);
client.getParams().setConnectionManagerTimeout(httpClientPingTimeout);
Hi Yonik,
I tried the fix suggested in your comments (using "solr.TrieDateField"),
and it loaded up 130 cores in 1 minute with 1.3GB of memory (a little more than the 1GB
when turning off the static warm cache, and much less than the 6.5GB when using
'solr.DateField').
Will this have any impact on first query or per
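For reference, a hedged sketch of the schema.xml change (Solr 1.4-era syntax; the type and field names are placeholders):

<!-- trie-encoded date type instead of solr.DateField -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
<field name="my_date_field" type="tdate" indexed="true" stored="true"/>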
http://lucene.472066.n3.nabble.com/file/n1617135/solrconfig.xml
solrconfig.xml
Hi Yonik,
I have uploaded our solrconfig.xml file for your reference.
We also tried 1.4.1; for the same index data, it took about 30-55 minutes to
load up all 130 cores, so it did not help at all.
There is no query running
Hi Yonik,
I attached the solrconfig.xml in the previous post, and we do have
firstSearcher and newSearcher hook-ups.
I commented them out, and all 130 cores loaded up in 1 minute, same as in Solr
1.3; total memory took about 1GB. Whereas in 1.3, with the hook-ups, it took
about 6.5GB for the same amount of
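For context, a hedged sketch of the kind of firstSearcher/newSearcher warming hooks (in the <query> section of solrconfig.xml) that were commented out; the query inside is purely illustrative:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>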
Hi Yonik,
Thanks for your reply.
I entered a bug for this at:
https://issues.apache.org/jira/browse/SOLR-2138
To answer your questions here:
- do you have any warming queries configured?
> no, all autowarmCount values are set to 0 for all caches
- do the cores have documents already, and i
Hi -
I posted this problem but got no response; I guess I need to post this in the
Solr-User forum. Hopefully you will help me with this.
We had been running Solr 1.3 for a long time, with 130 cores. We just upgraded to Solr
1.4, and when we start Solr, it takes about 45 minutes. The catalina.log
shows Solr