Yes, absolutely correct: a comma is missing at the end of line 10.
All key-value pairs inside the same block should be comma-separated, except
the last one.
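For illustration (field names are made up), the rule looks like this in JSON: every pair but the last inside a block carries a trailing comma.

```json
{
  "field1": "value1",
  "field2": "value2",
  "field3": "value3"
}
```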
From: Shawn Heisey
Reply: solr-user@lucene.apache.org
Date: April 25, 2017 at 2:29:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Cause
;s no results? My strategy is to prefer an AND for large
> collections (or a higher mm than 1) and prefer closer to an OR for
smaller
> collections.
>
> -Doug
>
> On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi wrote:
>
>> Thank you Ahmet, I will try it; sounds rea
explicitly set similarity to tf-idf and see how it goes?
Ahmet
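A schema.xml sketch of that suggestion: in Solr 6.x the classic TF-IDF scoring is exposed as ClassicSimilarity (this fragment is an assumption about your schema, not taken from the thread).

```xml
<!-- schema.xml: revert to classic TF-IDF scoring instead of BM25 (the Solr 6 default) -->
<similarity class="solr.ClassicSimilarityFactory"/>
```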
On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi wrote:
Hello,
Default TF-IDF performs poorly with 200 million indexed documents.
Query "Michael Jackson" may run 300ms, and "Michael The Jackson" over 3
secon
chael Jackson” runs 300ms instead of 3ms just because of the huge number
of hits and TF-IDF calculations. Solr 6.3.
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
user
pass
dbname
localhost
1433
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
From: Per Newgro
Reply: solr-user@lucene.apache.org
Date: February 7
Were you indexing new documents while reloading? “Previously we’ve done
reloads of a collection after changing solrconfig.xml without any issues.”
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
From: Kelly, Frank
Reply: solr-user
Correct: multivalued field with 1 shop IDs. Use case: shopping network
in U.S. for example for a big brand such as Walmart, when user implicitly
provides IP address or explicitly Postal Code, so that we can find items in
his/her neighbourhood.
You basically provide “join” information via this
No; historical logs for document updates are not provided. Users need to
implement such functionality themselves if needed.
From: Mahmoud Almokadem
Reply: solr-user@lucene.apache.org
Date: February 6, 2017 at 3:32:34 PM
To: solr-user@lucene.apache.org
Subject: Time of insert
Hello,
I'm u
simplify life ;)
On November 4, 2016 at 12:05:13 PM, Fuad Efendi (f...@efendi.ca) wrote:
Yes we need that documented,
http://stackoverflow.com/questions/8924102/restricting-ip-addresses-for-jetty-and-solr
Of course Firewall is a must for extremely strong environments / large
corporations, DMZ
+ DMZ(s)
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevancy, Recommender Systems
On November 4, 2016 at 9:28:21 AM, David Smiley (david.w.smi...@gmail.com)
wrote:
I was just researching how to secure Solr by IP address and I finally
figured it out. Perhaps this might go in
ould
be very different.
I recently had an assignment at a well-known retail shop where we even designed
pre-query custom boosts so that we could customize typical (most important
for the business) queries per business needs
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevan
e I've eliminated general connectivity/authentication problems.
Thanks,
Jamie
On Wed, Nov 2, 2016 at 4:58 PM, Fuad Efendi wrote:
> In MySQL, this command will explicitly allow to connect from
> remote ICZ2002912 host, check MySQL documentation:
>
> GRANT ALL ON mys
In MySQL, this command will explicitly allow to connect from remote ICZ2002912
host, check MySQL documentation:
GRANT ALL ON mysite.* TO 'root'@'ICZ2002912' IDENTIFIED BY 'Oakton123';
On November 2, 2016 at 4:41:48 PM, Fuad Efendi (f...@efendi.ca) wrote:
This is the root
stance
I suspect you need to allow MySQL & Co. to accept connections from ICZ2002912.
Plus, check DNS resolution, etc.
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Recommender Systems
On November 2, 2016 at 2:37:08 PM, Jamie Jackson (jamieja...@gmail.com) wrote:
I
sider sharding / SolrCloud if you need huge memory
just for field cache. And you will be forced to consider it if you have more
than 2 billion documents (am I right? Lucene internal limitation,
Integer.MAX_VALUE)
Thanks,
--
Fuad Efendi
(416) 993-2060
http://www.tokenizer.ca
Search Relevanc
internal
caches.
Solr has a way to warm up internal caches before making a new searcher
available:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
Make these queries typical of your use cases (for instance, *:* with faceting):
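As a solrconfig.xml sketch (the faceted field name "country" is only an example; use queries typical of your own workload):

```xml
<!-- solrconfig.xml: warm the new searcher before it serves traffic -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">country</str>
    </lst>
  </arr>
</listener>
```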
Thanks,
--
Fuad Efendi
(416
.
But it works fine with KeywordTokenizer.
Any idea why? Thanks,
--
Fuad Efendi
http://www.tokenizer.ca
Data Mining, Vertical Search
"what is the best way to stop Solr when it hits OOM" (or just
becomes unresponsive because of swallowed exceptions)
--
Fuad Efendi
416-993-2060(cell)
On February 25, 2016 at 2:37:45 PM, CP Mishra (mishr...@gmail.com) wrote:
Looking at the previous threads (and in our tests), oom script spec
> I can
> manually create an httpclient and set up authentication but then I can't use
> solrj.
Yes; correct; except that you _can_ use SolrJ with this custom HttpClient
instance (which will intercept authentication, support cookies, SSL
or plain HTTP, Keep-Alive, etc.)
You can
Hi,
Please add me: FuadEfendi
Thanks!
--
http://www.tokenizer.ca
Hi,
A few months ago I was able to modify the Wiki; I can't do it now, probably
because of http://wiki.apache.org/solr/ContributorsGroup
Please add me: FuadEfendi
Thanks!
--
Fuad Efendi, PhD, CEO
C: (416)993-2060
F: (416)800-6479
Tokenizer Inc., Canada
http://www.tokenizer.ca
eaders when you POST your file to
Solr)
-Fuad Efendi
http://www.tokenizer.ca
-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: October-03-12 1:30 PM
To: solr-user@lucene.apache.org
Subject: RE: Can SOLR Index UTF-16 Text
Something is missing from the body of your E
re and etc...
-Fuad Efendi
http://www.tokenizer.ca
-Original Message-
From: vybe3142 [mailto:vybe3...@gmail.com]
Sent: October-03-12 12:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Can SOLR Index UTF-16 Text
Thanks for all the responses. Problem partially solved (see below)
1.
Solr can index bytearrays too: unigram, bigram, trigram... even bitsets,
tritsets, qatrisets ;- )
LOL I got strong cold...
BTW, don't forget to configure UTF-8 as your default (Java) container
encoding...
-Fuad
ion unique terms - per document. Do you have
>>such
>> large documents? This appears to be a hard limit based on 24 bits in a
>>Java
>> int.
>>
>> You can try facet.method=enum, but that may be too slow.
>>
>> What release of Solr are you running
Hi there,
"Load term Info" shows 3650 for a specific term "MyTerm", but when I execute
the query "channel:MyTerm" it shows 650 documents found… possibly a bug… it
happens after I commit data too; nothing changes, and this field is a
single-valued non-tokenized string.
-Fuad
)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
--
Fuad Efendi
http://www.tokenizer.ca
can get more information and also download from here:
>
>http://solr-ra.tgels.org
>
>Regards
>
>- Nagendra Nagarajayya
>http://solr-ra.tgels.org
>http://rankingalgorithm.tgels.org
>
>ps. Note: Apache Solr 4.0 with RankingAlgorithm 1.4.4 is an external
>implementation
we can use Facets with Near Real Time
feature
The service layer will accumulate search results from three layers; it will be
near real time.
Any thoughts? Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
http://www.tokenizer.ca
http://www.linkedin.com/in/lucene
> FWIW, when asked at what point one would want to split JVMs and shard,
> on the same machine, Grant Ingersoll mentioned 16GB, and precisely for
> GC cost reasons. You're way above that.
- his index is 75GB, and Grant mentioned RAM heap size; we can use terabytes
of index with 16GB of memory.
, Web Services, Moreover, Web Ping, SQL-import, sitemaps-based,
intranets, and more.
Additionally to that, I can design super-rich UI extremely fast using tools
such as Liferay Portal, Apache Wicket, Vaadin.
Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
http://www.tokenizer.ca <h
I agree that SSD boosts performance... in some rare, not-real-life scenarios:
- super-frequent commits
That's it; nothing more, except that Lucene compile time including
tests takes up to two minutes on a MacBook with SSD, versus forty to fifty
minutes on Windows with HDD.
Of course, with non-empty
lient and SOLRJ...
Fuad Efendi
http://www.tokenizer.ca
Sent from my iPad
On 2011-11-19, at 9:14 PM, alx...@aim.com wrote:
> Hello,
>
> I use solr 3.4 with jetty that is included in it. Periodically, I see this
> error in the jetty output
>
> SEVERE: org.mortb
I am using Lily for atomic index updates (implemented very nicely;
transactional; plus MapReduce; plus auto-denormalizing)
http://www.lilyproject.org
It slows down the "mean time" 7-10 times, but TPS stays the same
- Fuad
http://www.tokenizer.ca
Sent from my iPad
On 2011-11-10, at 9:59 PM, M
e they have at
least 100k fields per instance they don't have any problem outside Amazon
;)))
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca
On 11-08-17 11:08 PM, "Fuad Efendi" wrote:
>more investigation and I see
I agree with Yonik of course;
But
You should see OOM errors in this case. With "virtualization",
however, it is unpredictable, and the JVM may not have a few bytes left to output
the OOM into the log file (because we are catching "Throwable" and trying to
generate HTTP 500 instead!!! Freaky
memory required,
> currently I use -Xms3072M .
"Large CPU" instance is "virtualization" and behaviour is unpredictable.
Choose "cluster" instance with explicit Intel XEON CPU (instead of
"CPU-Units") and compare behaviour; $1.60/hour. Please share result
p://java.sun.com/webapps/bugreport/crash.jsp
>> #
>>
>>
>>
>> However, I can start it and run without any problems by removing
>> -XX:+AggressiveOpts (which has to be default setting "in upcoming
>>releases"
>> Java 6)
>>
>>
>>
>> Do we need to disable -XX:-DoEscapeAnalysis as IBM suggests?
>> http://www-01.ibm.com/support/docview.wss?uid=swg21422605
>>
>>
>>
>> Thanks,
>> Fuad Efendi
>>
>> http://www.tokenizer.ca
>>
>>
>>
>
>
>
>--
>lucidimagination.com
Hi Otis,
I am recalling the "pagination" feature; it is still unresolved (with the default
scoring implementation): even with small documents, searching and retrieving
documents 1 to 10 can take 0 milliseconds, but from 100,000 to 100,010 can
take a few minutes (I saw it with the trunk version 6 months ago, and wi
I think the question is strange... Maybe you are wondering about possible
OOM exceptions? I think we can pass to Lucene a single document containing a
comma-separated list of "term, term, ..." (a few billion times)... Except
"stored" and "TermVectorComponent"...
I believe thousands of companies are already in
WHERE KEY2=? ORDER BY KEY1" -
check everything...
Thanks,
--
Fuad Efendi
416-993-2060
Tokenizer Inc., Canada
Data Mining, Search Engines
http://www.tokenizer.ca <http://www.tokenizer.ca/>
On 11-06-05 12:09 AM, "Rohit Gupta" wrote:
>No didn't double post, my b
Nice article... 2 ms is better than 20 ms, but in another chart 50 seconds is not
as good as 3 seconds... Sorry for my vision...
SOLR pushed a huge number of performance improvements into Lucene Core...
Sent on the TELUS Mobility network with BlackBerry
-Original Message-
From: Shashi Kant
Interesting wordings:
"we want real-time search, we want simple multi-tenancy, and we want a
solution that is built for the cloud"
And later,
" built on top of Lucene."
Is that possible? :)
(what does "real-time search" mean anyway... and what is "cloud"?)
community is growing!
P.S.
I neve
It could be environment-specific (specifics of your "top" command
implementation, OS, etc.)
On CentOS I have 2986m "virtual" memory showing although -Xmx2g;
you have 10g "virtual" although -Xmx6g.
Don't trust it too much... the "top" command may count OS buffers for opened
files, network sockets, JVM D
Has anyone noticed that it doesn't work? It's already been 2 weeks:
https://issues.apache.org/jira/browse/INFRA-3667
I don't receive WIKI change notifications. I CC to 'Apache Wiki'
wikidi...@apache.org
Something is bad.
-Fuad
nnections" even for huge SQL-side max_connections.
If you are interested, I can continue work on SOLR-2233. CC: dev@lucene (is
anyone working on DIH improvements?)
Thanks,
Fuad Efendi
http://www.tokenizer.ca/
-Original Message-
From: François Schiettecatte [mailto:fschietteca...@gm
Related: SOLR-846
Sent on the TELUS Mobility network with BlackBerry
-Original Message-
From: Erick Erickson
Date: Tue, 7 Dec 2010 08:11:41
To:
Reply-To: solr-user@lucene.apache.org
Subject: Re: Out of memory error
Have you seen this page? http://wiki.apache.org/solr/DataImportHandler
Batch size "-1"??? Strange, but it could be a problem.
Note also that you can't pass parameters to the default startup.sh command; you
should modify setenv.sh instead
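For example, a minimal $CATALINA_HOME/bin/setenv.sh (the JVM options shown are illustrative, not a recommendation):

```shell
#!/bin/sh
# setenv.sh is sourced by catalina.sh on every start;
# put JVM parameters here instead of passing them to startup.sh.
CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx2g -XX:+HeapDumpOnOutOfMemoryError"
export CATALINA_OPTS
```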
--Original Message--
From: sivaprasad
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Out of memory
I experienced similar problems. It was because we didn't perform load stress
tests properly before going to production. Nothing lasts forever: replace
controllers, change hardware vendors, maintain low temperature inside a rack.
Thanks
--Original Message--
From: Robert Gründler
To: solr-user
To make the Solr admin password-protected,
I used the Path Based Authentication from
http://wiki.apache.org/solr/SolrSecurity.
This way my admin area, search, delete, and add-to-index are protected. But
now,
when I make Solr authenticated, then for every update/delete f
> You could set up a firewall that forbids any connection to your Solr
> server port from everyone except the computer that hosts the application
> that connects to Solr.
> So, only your application will be able to connect to Solr.
I believe firewalling is the only possible solution since SOLR doesn'
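A minimal iptables sketch of that advice, assuming the application server is 10.0.0.5 and Solr listens on port 8983 (both values are hypothetical):

```shell
# Permit only the application server to reach Solr; drop everyone else
iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.5 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP
```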
Hi,
I've read very interesting interview with Ryan,
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and
-Videos/Interview-Ryan-McKinley
Another finding is
https://issues.apache.org/jira/browse/SOLR-773
(lucene/contrib/spatial)
Is there any more stuff going on for SOLR
of the default relevancy stuff but construct your own
> based on some other criterias?
>
> --
> Jan Høydahl - search architect
> Cominvent AS - www.cominvent.com
>
> On 13. feb. 2010, at 19.26, Fuad Efendi wrote:
>
> > Hi,
> > I execute query "word1", a
Hi,
I execute the query "word1", and it returns 100k results where the top 10k are just
"word1".
How do I filter it, to show "word1 word2" in the top 10?
Thanks
> or since you specificly asked about delteing anything older
> then X days (in this example i'm assuming x=7)...
>
> createTime:[NOW-7DAYS TO *]
createTime:[* TO NOW-7DAYS]
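As an update-message sketch, the same range deletes documents older than 7 days (the field name createTime is taken from the thread above):

```xml
<!-- POST to /update; commit separately or add ?commit=true -->
<delete>
  <query>createTime:[* TO NOW-7DAYS]</query>
</delete>
```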
Funny, Arrays.copyOf() for HashMap... but something similar...
Anyway, I use the same value for initial size and max size, to be safe... and
to get the OOM at startup :)
> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: February-12-10 6:55 PM
> T
I always use initial size = max size,
just to avoid Arrays.copyOf()...
The initial (default) capacity for HashMap is 16; when that is not enough, the
array is copied to a new 32-element array, then to 64, ...
- too much wasted space! (same for ConcurrentHashMap)
Excuse me if I didn't understand the question...
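A sketch of the pre-sizing idea above (class name and entry count are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheSizing {
    // Pre-size the map so it never resizes: capacity must exceed
    // expectedEntries / defaultLoadFactor (0.75), otherwise HashMap
    // doubles its bucket array (16 -> 32 -> 64 ...) and copies on each step.
    static <V> Map<String, V> presized(int expectedEntries) {
        return new HashMap<>((int) (expectedEntries / 0.75f) + 1);
    }

    public static void main(String[] args) {
        Map<String, Boolean> cache = presized(100_000);
        for (int i = 0; i < 100_000; i++) {
            cache.put("key-" + i, Boolean.TRUE);
        }
        System.out.println(cache.size()); // 100000
    }
}
```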
> hello *, quick question, what would i have to change in the query
> parser to allow wildcarded terms to go through text analysis?
I believe it is illogical. "wildcarded terms" will go through terms
enumerator.
SOLR doesn't come with such things...
Look at www.liferay.com; they have plugin for SOLR (in SVN trunk) so that
all documents / assets can be automatically indexed by SOLR (and you have
full freedom with defining specific SOLR schema settings); their portlets
support WebDAV, and "Open Office" look
-based, JSR-168, JSR-286 (and
it supports PHP-portlets, but I never tried).
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
> -Original Message-
> From: Peter [mailto:zarato...@gmx.net]
> Sent: January-16-10 10:17 AM
> To: solr-user@lucene.apache.org
> Subject: Fu
The Levenshtein algorithm is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1
and 3.0...
There are samples of other distances in the "contrib" folder
If you want to play with distance, check
http://issues.apache.org/jira/browse/LUCENE-2230
It works if the distance is an integer and follows "metric space axioms
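For reference, the classic dynamic-programming Levenshtein distance mentioned above is integer-valued and satisfies the metric space axioms (non-negativity, symmetry, triangle inequality); a minimal standalone sketch:

```java
public class EditDistance {
    // Two-row DP: prev holds row i-1, curr holds row i of the edit matrix.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
    }
}
```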
'!'
:)))
Plus, FastLRUCache (previous one was synchronized)
(and of course warming-up time) := start complaining after ensuring there are
no complaints :)
(and of course OS needs time to cache filesystem blocks, and Java HotSpot,
... - few minutes at least...)
> On Feb 3, 2010, at 1:38 PM, Rajat Gar
uest handler
and etc.
It may work well (but only if the query contains a term from the dictionary; it
can't work as a spellchecker)
A combination of the 2 algos can boost performance extremely...
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
I can only tell that the Liferay Portal (WebDAV) Document Library Portlet has the
same functionality as SharePoint (it even has a /servlet/ URL with the suffix
'/sharepoint'); Liferay also has a plugin (web-hook) for SOLR (it has a generic
search wrapper; any kind of search service provider can be hooked into
Liferay
> >> Even if "commit" takes 20 minutes?
> I've never seen a commit take 20 minutes... (anything taking that long
> is broken, perhaps in concept)
"index merge" can take from a few minutes to a few hours. That's why nothing can
beat SOLR Master/Slave and sharding for huge datasets. And reopening of
I
> >> Why to embed "indexing" as a transaction dependency? Extremely weird
> idea.
> There is nothing weird about different use cases requiring different
> approaches
>
> If you're just thinking documents and text search ... then its less of
> an issue.
> If you have an online application where
http://issues.apache.org/jira/browse/LUCENE-2230
Enjoy!
> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: January-19-10 11:32 PM
> To: solr-user@lucene.apache.org
> Subject: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree
>
> Hi,
&
t to create an index? Absolutely nothing.
Why embed "indexing" as a transaction dependency? An extremely weird idea. But
I understand some selling points...
SOLR: it is faster than Lucene. Filtered queries run faster than traditional
"AND" queries! And this is a real selling point.
T
...don't have to worry that field "USA" (3 characters) is
repeated in a few million documents, and field "Canada" (6 characters) in
another few; nothing "relational", it's done automatically without any
Compass/Hibernate/Table(s)
Don't think "relational"
Is there a limit on the size of the query string?
It looks like I get exceptions when the query string is longer than 400 characters
(on average)
Thanks!
! (although I need to use a classic int
instead of the float distance from Lucene/Levenshtein etc.)
Thanks,
Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca/
Data Mining, Vertical Search
> Seeley
> Sent: January-03-10 10:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR: Replication
>
> On Sat, Jan 2, 2010 at 11:35 PM, Fuad Efendi wrote:
> > I tried... I set APR to improve performance... server is slow while
> replica;
> but "top"
ation
>
> On Sat, Jan 2, 2010 at 5:48 PM, Fuad Efendi wrote:
> > I used RSYNC before, and 20Gb replica took less than an hour (20-40
> > minutes); now, HTTP, and it takes 5-6 hours...
> > Admin screen shows 952Kb/sec average speed; 100Mbps network, full-
> duplex; I
>
I used RSYNC before, and a 20GB replication took less than an hour (20-40
minutes); now, with HTTP, it takes 5-6 hours...
The admin screen shows 952KB/sec average speed; 100Mbps network, full-duplex; I
am using Tomcat Native for APR. 10x slower...
-Fuad
http://www.tokenizer.ca
, WIKIs, Forum
Posts) is automatically indexed. Having separate SOLR definitely helps:
instead of hardcoding (with Lucene) we can now intelligently manage stop
words, stemming, language settings, and more.
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http
ote:
>
> >
> > On Dec 24, 2009, at 11:36 AM, Walter Underwood wrote:
> >> When do users do a query like that? --wunder
> >
> > Well, SolrEntityProcessor "users" do :)
> >
> > http://issues.apache.org/jira/browse/SOLR-1499
> >
> low values for start=12345. Queries like start=28838540 take 40-60
> seconds,
> > and even cause OutOfMemoryException.
> >
> > I use highlight, faceting on nontokenized "Country" field, standard
> handler.
> >
> >
> > It even seems to be a
ttp://issues.apache.org/jira/browse/SOLR-1499
> > (which by the way I plan on polishing and committing over the holidays)
> >
> > Erik
> >
> >
> >
> >>
> >> On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote:
> >>
> >>> I used paginati
OutOfMemoryException.
I use highlight, faceting on nontokenized "Country" field, standard handler.
It even seems to be a bug...
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
bilities. Log output will default to the standard /logs folder of Tomcat.
You may find additional logging configuration settings by googling for "Java 5
Logging" etc.
>
>
> 2009/12/20 Fuad Efendi :
> > After researching how to configure default SOLR & Tomcat logging, I
&
q,rsp);
setResponseHeaderValues(handler,req,rsp);
StringBuilder sb = new StringBuilder();
for (int i=0; i
> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: December-20-09 2:54 PM
> To: solr-user@lucene.apache.org
> Subject: SOLR Performance Tuning
denly synchronous I/O by the Java/Tomcat Logger slows down performance
much more than the read-only I/O of Lucene.
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
> By that I mean that the java/tomcat
> process just disappears.
I had a similar problem when I started Tomcat via SSH and then improperly
closed SSH without the "exit" command.
In some cases (OutOfMemory) there is not enough memory to generate a log (or
the CPU can be overloaded by the Garbage Collector to su
-Fuad
> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: November-03-09 5:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
>
> On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi wrote:
> > I b
FieldCache internally uses a WeakHashMap... nothing wrong, but... no
Garbage Collection tuning will help if the allocated RAM is not enough
for replacing Weak** with Strong**, especially for SOLR faceting... 10%-15%
CPU taken by GC has been reported...
-Fuad
Even in a simplistic scenario, when it is Garbage Collected, we still
_need_to_be_able_ to allocate enough RAM to the FieldCache on demand... a linear
dependency on document count...
>
> Hi Mark,
>
> Yes, I understand it now; however, how will StringIndexCache size down in
a
> production system facetin
> if (t < mterms.length) {
> // if there are less terms than documents,
> // trim off the dead array space
> String[] terms = new String[t];
> System.arraycopy (mterms, 0, terms, 0, t);
> mterms = terms;
> }
>
> StringIndex value = new StringInd
o be
safe, use this in your basic memory estimates:
[512MB ~ 1GB] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes]
-Fuad
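Plugging numbers into that estimate (the field count and document count below are illustrative assumptions, not measurements):

```java
public class FieldCacheEstimate {
    // Estimate = base heap + (fields x docs x 8 bytes per pointer/ord).
    static long estimateBytes(long baseBytes, int nonTokenizedFields, long maxDoc) {
        return baseBytes + nonTokenizedFields * maxDoc * 8L;
    }

    public static void main(String[] args) {
        long base = 512L * 1024 * 1024;                      // 512 MB baseline (lower bound)
        long total = estimateBytes(base, 4, 100_000_000L);   // 4 fields, 100M docs
        System.out.println(total / (1024 * 1024) + " MB");   // 3563 MB
    }
}
```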
> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: November-02-09 7:37 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Lucene
To be correct, I analyzed FieldCache a while ago and I believe it never
"sizes down"...
/**
* Expert: The default cache implementation, storing all values in memory.
* A WeakHashMap is used for storage.
*
* Created: May 19, 2004 4:40:36 PM
*
* @since lucene 1.4
*/
Will it size down? Onl
ira/browse/LUCENE-1990 to make this
memory requirement even lower... but please correct me if I am wrong with
formula, and I am unsure how it is currently implemented...
Thanks,
Fuad
> -Original Message-
> From: Fuad Efendi [mailto:f...@efendi.ca]
> Sent: November-02-09 8:21
Mark,
I don't understand this:
> so with a ton of docs and a few uniques, you get a temp boost in the RAM
> reqs until it sizes it down.
Sizes down??? Why is it called a Cache, indeed? And how does SOLR use it if it is
not a cache?
And this:
> A pointer for each doc.
Why can't we use (int) DocumentID?
I just did some tests on a completely new index (Slave): sort by a
low-distributed non-tokenized field (such as Country) takes milliseconds,
but a sort (ascending) on a tokenized field with heavy distribution took 30
seconds (initially). A second sort (descending) took milliseconds. Generic
query *:*; Fiel
the unsupported exception in that method
> for things like multi reader and just do the work to get the right
> number (currently there is a comment that the user should do that work
> if necessary, making the call unreliable for this).
>
> Fuad Efendi
negligible (for your case) memory to hold the actual string values).
>
> Note that for your use case, this is exceptionally wasteful. If
> Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this)
> then it'd take much fewer bits to reference the values, sinc