Re: Caused by: org.noggit.JSONParser$ParseException: Expected ',' or '}': char=",position=312 BEFORE='ssions"

2017-04-25 Thread Fuad Efendi
Yes, absolutely correct: the comma is missing at the end of line 10. All key-value pairs inside the same block must be comma-separated, except the last one. From: Shawn Heisey Reply: solr-user@lucene.apache.org Date: April 25, 2017 at 2:29:03 PM To: solr-user@lucene.apache.org Subject: Re: Cause
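The rule is easy to get right when entries are joined with a separator instead of having a comma appended after each one. A minimal Java sketch, assuming flat string values that need no JSON escaping (the keys and values are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class JsonPairs {
    // Emits {"k1":"v1","k2":"v2"}: StringJoiner places "," only between
    // entries, so no comma is missing and none trails the last pair.
    // Assumption: keys and values contain no characters that need escaping.
    static String toJsonObject(Map<String, String> pairs) {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            joiner.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("name", "my_field");
        pairs.put("type", "string");
        System.out.println(toJsonObject(pairs)); // {"name":"my_field","type":"string"}
    }
}
```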

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Fuad Efendi
's no results? My strategy is to prefer an AND for large collections (or a higher mm than 1) and prefer closer to an OR for smaller collections. -Doug On Tue, Feb 21, 2017 at 1:39 PM Fuad Efendi wrote: Thank you Ahmet, I will try it; sounds rea
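With mm, "OR" corresponds to requiring one optional clause and "AND" to requiring all of them; anything in between is a count or a percentage. A rough Java sketch of that mapping, assuming only the two simplest mm forms (it is not Solr's parser):

```java
final class MinShouldMatch {
    // Simplified reading of a dismax-style "mm" value: either an absolute
    // count ("1") or a positive percentage ("75%", rounded down).
    // Solr's full mm syntax (negative values, conditional specs) is not handled.
    static int requiredClauses(String mm, int optionalClauses) {
        int required;
        if (mm.endsWith("%")) {
            int pct = Integer.parseInt(mm.substring(0, mm.length() - 1));
            required = (int) Math.floor(optionalClauses * pct / 100.0);
        } else {
            required = Integer.parseInt(mm);
        }
        // Never require more clauses than exist, nor fewer than zero.
        return Math.max(0, Math.min(required, optionalClauses));
    }

    public static void main(String[] args) {
        int terms = 3; // e.g. "michael the jackson"
        for (String mm : new String[] {"1", "75%", "100%"}) {
            System.out.println("mm=" + mm + " -> " + requiredClauses(mm, terms) + " of " + terms);
        }
    }
}
```

With three terms, mm=1 behaves like OR, mm=100% like AND, and 75% requires two of the three.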

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Fuad Efendi
explicitly set similarity to tf-idf and see how it goes? Ahmet On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi wrote: Hello, Default TF-IDF performs poorly with 200 million indexed documents. Query "Michael Jackson" may run 300ms, and "Michael The Jackson" over 3 secon

CPU Intensive Scoring Alternatives

2017-02-20 Thread Fuad Efendi
chael Jackson” runs 300ms instead of 3ms just because of the huge number of hits and TF-IDF calculations. Solr 6.3. Thanks, -- Fuad Efendi (416) 993-2060 http://www.tokenizer.ca Search Relevancy, Recommender Systems
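The cost grows with the number of hits because a score is computed for every matching document, and a stop word like "The" matches almost every document while contributing almost nothing to the ranking. A rough Java sketch of the classic formulas, modelled on Lucene 6.x ClassicSimilarity (treat the exact constants as an assumption; the document frequencies are made up):

```java
final class ClassicTfIdf {
    // Term-frequency factor in the style of Lucene's classic similarity.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency, modelled on Lucene 6.x ClassicSimilarity.
    static double idf(long docFreq, long docCount) {
        return Math.log((docCount + 1) / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long docCount = 200_000_000L;
        long dfThe = 150_000_000L;  // hypothetical: stop word in most documents
        long dfJackson = 40_000L;   // hypothetical: rare term
        System.out.printf("idf(the)     = %.2f%n", idf(dfThe, docCount));
        System.out.printf("idf(jackson) = %.2f%n", idf(dfJackson, docCount));
        // An OR over all terms still has to score every document containing
        // "the", i.e. roughly dfThe documents, each contributing almost nothing.
    }
}
```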

Re: Solr 5.5.0 MSSQL Datasource Example

2017-02-07 Thread Fuad Efendi
user pass dbname localhost 1433 From: Per Newgro Reply: solr-user@lucene.apache.org Date: February 7
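The surviving values (user, pass, dbname, localhost, 1433) look like the pieces of a JDBC data source definition. Assuming Microsoft's JDBC driver is the one on Solr's classpath, they combine into a connection URL like this (the values are the placeholders from the snippet):

```java
final class MssqlJdbcUrl {
    // URL format of Microsoft's JDBC driver
    // (driver class com.microsoft.sqlserver.jdbc.SQLServerDriver).
    // User name and password are passed separately, e.g. to
    // DriverManager.getConnection(url, user, password).
    static String url(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    public static void main(String[] args) {
        System.out.println(url("localhost", 1433, "dbname"));
        // jdbc:sqlserver://localhost:1433;databaseName=dbname
    }
}
```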

Re: Solr 5.3.1: Collection reload results in IndexWriter is closed exception

2017-02-07 Thread Fuad Efendi
Were you indexing new documents while reloading? “Previously we’ve done reloads of a collection after changing solrconfig.xml without any issues.” From: Kelly, Frank Reply: solr-user

Re: Help with design choice: join or multiValued field

2017-02-06 Thread Fuad Efendi
Correct: a multivalued field with 1 shop IDs. Use case: a shopping network in the U.S., for example for a big brand such as Walmart, where the user implicitly provides an IP address or explicitly provides a postal code, so that we can find items in his/her neighbourhood. You basically provide “join” information via this
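On the query side, the "join" then becomes a plain filter on the multivalued field. A Java sketch, assuming the IP address or postal code has already been resolved to nearby shop IDs elsewhere; "shop_ids" is a hypothetical field name:

```java
import java.util.List;
import java.util.stream.Collectors;

final class ShopFilter {
    // Builds a filter query matching items whose multivalued shop-ID field
    // contains any shop near the user.
    static String filterQuery(String field, List<Integer> nearbyShopIds) {
        if (nearbyShopIds.isEmpty()) {
            throw new IllegalArgumentException("no nearby shops");
        }
        return nearbyShopIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(filterQuery("shop_ids", List.of(101, 205, 330)));
        // shop_ids:(101 OR 205 OR 330)
    }
}
```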

Re: Time of insert

2017-02-06 Thread Fuad Efendi
No; a historical log of document updates is not provided. Users need to implement such functionality themselves if needed. From: Mahmoud Almokadem Reply: solr-user@lucene.apache.org Date: February 6, 2017 at 3:32:34 PM To: solr-user@lucene.apache.org Subject: Time of insert Hello, I'm u

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread Fuad Efendi
simplify life ;) On November 4, 2016 at 12:05:13 PM, Fuad Efendi (f...@efendi.ca) wrote: Yes, we need that documented: http://stackoverflow.com/questions/8924102/restricting-ip-addresses-for-jetty-and-solr Of course, a firewall is a must for high-security environments / large corporations, DMZ
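The idea behind restricting by IP fits in a few lines. This is a stdlib-only Java illustration of IPv4 CIDR matching, not Jetty's or Solr's API, and as the post says, it complements a firewall rather than replacing it:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

final class IpAllowList {
    private final List<String> cidrs; // e.g. "10.0.0.0/8", "192.168.1.10/32"

    IpAllowList(List<String> cidrs) {
        this.cidrs = cidrs;
    }

    // True if the IPv4 literal falls inside any allowed CIDR block.
    boolean isAllowed(String ip) {
        int addr = toInt(ip);
        for (String cidr : cidrs) {
            String[] parts = cidr.split("/");
            int prefix = Integer.parseInt(parts[1]);
            int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
            if ((addr & mask) == (toInt(parts[0]) & mask)) {
                return true;
            }
        }
        return false;
    }

    // Parses an IPv4 literal into an int (literals need no DNS lookup).
    private static int toInt(String ipv4) {
        try {
            byte[] b = InetAddress.getByName(ipv4).getAddress();
            if (b.length != 4) throw new IllegalArgumentException("IPv4 only: " + ipv4);
            return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + ipv4, e);
        }
    }
}
```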

Re: How-To: Secure Solr by IP Address

2016-11-04 Thread Fuad Efendi
+ DMZ(s) On November 4, 2016 at 9:28:21 AM, David Smiley (david.w.smi...@gmail.com) wrote: I was just researching how to secure Solr by IP address and I finally figured it out. Perhaps this might go in

Re: Different Sorts based on Different Groups

2016-11-04 Thread Fuad Efendi
ould be very different. I recently had an assignment at a well-known retail shop where we even designed pre-query custom boosts so that we could customize typical queries (the most important ones for the business) as per business needs. Thanks,

Re: Problem with Password Decryption in Data Import Handler

2016-11-02 Thread Fuad Efendi
e I've eliminated general connectivity/authentication problems. Thanks, Jamie On Wed, Nov 2, 2016 at 4:58 PM, Fuad Efendi wrote: In MySQL, this command will explicitly allow to connect from remote ICZ2002912 host, check MySQL documentation: GRANT ALL ON mys

Re: Problem with Password Decryption in Data Import Handler

2016-11-02 Thread Fuad Efendi
In MySQL, this command will explicitly allow connections from the remote host ICZ2002912; check the MySQL documentation: GRANT ALL ON mysite.* TO 'root'@'ICZ2002912' IDENTIFIED BY 'Oakton123'; On November 2, 2016 at 4:41:48 PM, Fuad Efendi (f...@efendi.ca) wrote: This is the root

Re: Problem with Password Decryption in Data Import Handler

2016-11-02 Thread Fuad Efendi
stance I suspect you need to allow MySQL & Co. to accept connections from ICZ2002912. Plus, check DNS resolution, etc. Thanks, On November 2, 2016 at 2:37:08 PM, Jamie Jackson (jamieja...@gmail.com) wrote: I
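A quick way to check resolution from the Solr machine is a tiny program run on the same JVM, which is closer to what the JDBC driver sees than a shell tool. A Java sketch (the default host name is the one from the thread):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

final class DnsCheck {
    // Addresses that a host name resolves to from this machine,
    // or an empty array if it does not resolve at all.
    static InetAddress[] resolve(String host) {
        try {
            return InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return new InetAddress[0];
        }
    }

    public static void main(String[] args) {
        // Default is the host name from the thread; pass another as an argument.
        String host = args.length > 0 ? args[0] : "ICZ2002912";
        InetAddress[] addrs = resolve(host);
        System.out.println(host + " -> "
                + (addrs.length == 0 ? "does not resolve" : Arrays.toString(addrs)));
    }
}
```

On the MySQL side, the host part of 'root'@'ICZ2002912' is matched against the name MySQL derives for the connecting client, so reverse resolution on the database server matters as well.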

Re: Timeout occurred while waiting response from server at: http://***/solr/commodityReview

2016-11-02 Thread Fuad Efendi
sider sharding / SolrCloud if you need huge memory just for the field cache. And you will be forced to consider it if you have more than 2 billion documents (am I right? A Lucene internal limitation, Integer.MAX_VALUE) Thanks,
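The arithmetic behind the 2-billion limit, as a Java sketch; 5 billion documents is a made-up example, and real shard counts should leave plenty of headroom below the hard ceiling:

```java
final class ShardEstimate {
    // Lucene document IDs are Java ints, so one index (one shard) holds at most
    // about Integer.MAX_VALUE documents; the exact Lucene limit is slightly lower.
    static long minShards(long totalDocs, long maxDocsPerShard) {
        return (totalDocs + maxDocsPerShard - 1) / maxDocsPerShard; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);                             // 2147483647
        System.out.println(minShards(5_000_000_000L, Integer.MAX_VALUE)); // 3
    }
}
```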

Re: Timeout occurred while waiting response from server at: http://***/solr/commodityReview

2016-11-01 Thread Fuad Efendi
internal caches. Solr has a way to warm up internal caches before making a new searcher available: https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig Make these queries typical for your use cases (for instance, *:* with faceting): Thanks,

Foot, Inch: Stripping Out Special Characters: DisMax: WhitespaceTokenizer vs. Keyword Tokenizer

2016-03-10 Thread Fuad Efendi
. But it works fine with KeywordTokenizer. Any idea why? Thanks,

Re: Stopping Solr JVM on OOM

2016-02-25 Thread Fuad Efendi
“what is the best way to stop Solr when it gets in OOM” (or just becomes unresponsive because of swallowed exceptions) On February 25, 2016 at 2:37:45 PM, CP Mishra (mishr...@gmail.com) wrote: Looking at the previous threads (and in our tests), oom script spec

RE: Solr HTTP client authentication

2014-11-17 Thread Fuad Efendi
> I can manually create an httpclient and set up authentication but then I can't use solrj.
Yes; correct; except that you _can_ use SolrJ with this custom HttpClient instance (which will intercept authentication and will support cookies, SSL or plain HTTP, Keep-Alive, etc.) You can
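Whatever the client library, HTTP Basic authentication boils down to one request header that the custom HttpClient ends up sending. A stdlib-only Java sketch of that header value; it does not show the HttpClient or SolrJ wiring, which differs between versions:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {
    // Value of the "Authorization" header for HTTP Basic authentication:
    // "Basic " + Base64("user:password").
    static String headerValue(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(headerValue("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

Basic credentials are only encoded, not encrypted, so they belong on SSL connections.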

contributor group

2013-04-05 Thread Fuad Efendi
Hi, Please add me: FuadEfendi Thanks! -- http://www.tokenizer.ca

Please add me: FuadEfendi

2013-04-05 Thread Fuad Efendi
Hi, A few months ago I was able to modify the Wiki; I can't do it now, probably because of http://wiki.apache.org/solr/ContributorsGroup Please add me: FuadEfendi Thanks! -- Fuad Efendi, PhD, CEO C: (416)993-2060 F: (416)800-6479 Tokenizer Inc., Canada http://www.tokenizer.ca

RE: Can SOLR Index UTF-16 Text

2012-10-03 Thread Fuad Efendi
eaders when you POST your file to Solr) -Fuad Efendi http://www.tokenizer.ca -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: October-03-12 1:30 PM To: solr-user@lucene.apache.org Subject: RE: Can SOLR Index UTF-16 Text Something is missing from the body of your E

RE: Can SOLR Index UTF-16 Text

2012-10-03 Thread Fuad Efendi
re and etc... -Fuad Efendi http://www.tokenizer.ca -Original Message- From: vybe3142 [mailto:vybe3...@gmail.com] Sent: October-03-12 12:30 PM To: solr-user@lucene.apache.org Subject: Re: Can SOLR Index UTF-16 Text Thanks for all the responses. Problem partially solved (see below) 1.

RE: Can SOLR Index UTF-16 Text

2012-10-02 Thread Fuad Efendi
Solr can index byte arrays too: unigram, bigram, trigram... even bitsets, tritsets, qatrisets ;-) LOL, I've got a bad cold... BTW, don't forget to configure UTF-8 as your default (Java) container encoding... -Fuad
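One client-side option, if the input files are UTF-16, is to transcode them to UTF-8 before posting, so that nothing downstream depends on a UTF-16 charset declaration. A minimal Java sketch; it relies on Java's UTF-16 decoder, which honours a byte-order mark and assumes big-endian input without one:

```java
import java.nio.charset.StandardCharsets;

final class Utf16ToUtf8 {
    // Decodes UTF-16 bytes (BOM-aware) and re-encodes them as UTF-8.
    static byte[] transcode(byte[] utf16) {
        String text = new String(utf16, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf16 = "Solr \u00e9".getBytes(StandardCharsets.UTF_16); // includes a BOM
        byte[] utf8 = transcode(utf16);
        System.out.println(utf16.length + " bytes UTF-16 -> " + utf8.length + " bytes UTF-8");
    }
}
```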

Re: UnInvertedField limitations

2012-09-06 Thread Fuad Efendi
ion unique terms - per document. Do you have such large documents? This appears to be a hard limit based on 24 bits in a Java int. You can try facet.method=enum, but that may be too slow. What release of Solr are you running

Re: UnInvertedField limitations

2012-09-06 Thread Fuad Efendi
- per document. Do you have such large documents? This appears to be a hard limit based on 24 bits in a Java int. You can try facet.method=enum, but that may be too slow. What release of Solr are you running? -- Jack Krupansky -Original Message
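Read as 24 bits, the limit quoted above works out to about 16.8 million. A trivial Java check of that arithmetic (the 24-bit figure comes from the quoted reply and is not verified here):

```java
final class TwentyFourBitLimit {
    // Number of distinct values representable in 24 bits: 2^24 = 16,777,216.
    static final int MAX = 1 << 24;

    // True if values 0..count-1 ... i.e. a value of "count" itself fits in 24 bits.
    static boolean fits(long count) {
        return count >= 0 && count < MAX;
    }

    public static void main(String[] args) {
        System.out.println(MAX);                // 16777216
        System.out.println(fits(20_000_000L));  // false
    }
}
```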

RE: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

2012-08-25 Thread Fuad Efendi
"channel:MyTerm" it shows 650 documents found… possibly a bug… it happens after I commit data too, nothing changes; and this field is a single-valued non-tokenized string. -Fuad

RE: Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

2012-08-24 Thread Fuad Efendi
possibly a bug… it happens after I commit data too, nothing changes; and this field is a single-valued non-tokenized string. -Fuad

Solr-4.0.0-Beta Bug with "Load Term Info" in Schema Browser

2012-08-24 Thread Fuad Efendi
Hi there, "Load term Info" shows 3650 for a specific term "MyTerm", and when I execute query "channel:MyTerm" it shows 650 documents found… possibly a bug… it happens after I commit data too, nothing changes; and this field is a single-valued non-tokenized string. -Fuad

UnInvertedField limitations

2012-08-20 Thread Fuad Efendi
) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)

UnInvertedField limitations

2012-08-20 Thread Fuad Efendi
:1561)

Re: Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-20 Thread Fuad Efendi
can get more information and also download from here: http://solr-ra.tgels.org Regards - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org ps. Note: Apache Solr 4.0 with RankingAlgorithm 1.4.4 is an external implementation

Near Real Time + Facets + Hierarchical Faceting (Pivot Table) with Date Range: huge data set

2012-08-13 Thread Fuad Efendi
we can use Facets with the Near Real Time feature. The service layer will accumulate search results from three layers, so it will be near real time. Any thoughts? Thanks,
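For the facet part, the service layer's job reduces to summing counts per value. A Java sketch under two stated assumptions: no document appears in more than one layer, and each layer returns full counts rather than a truncated top-N list (otherwise the sum is only approximate). The layer names are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FacetMerge {
    // Sums per-value facet counts returned by several independent indexes ("layers").
    static Map<String, Long> merge(List<Map<String, Long>> layers) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> layer : layers) {
            layer.forEach((value, count) -> total.merge(value, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> archive = Map.of("2012-07", 5L, "2012-08", 10L);
        Map<String, Long> recent = Map.of("2012-08", 3L);
        Map<String, Long> realTime = Map.of("2012-08", 1L);
        System.out.println(merge(List.of(archive, recent, realTime)));
    }
}
```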

RE: Using Solr 3.4 running on tomcat7 - very slow search

2012-07-17 Thread Fuad Efendi
> FWIW, when asked at what point one would want to split JVMs and shard, on the same machine, Grant Ingersoll mentioned 16GB, and precisely for GC cost reasons. You're way above that.
- His index is 75G, and Grant was talking about RAM heap size; we can use terabytes of index with 16GB of memory.

Solr Consultant Available in Canada: Solr, HBase, Hadoop, Lily

2012-04-16 Thread Fuad Efendi
, Web Services, Moreover, Web Ping, SQL-import, sitemaps-based, intranets, and more. In addition, I can design super-rich UIs extremely fast using tools such as Liferay Portal, Apache Wicket, Vaadin. Thanks,

Solr Consultant Available in Canada: Solr, HBase, Hadoop, Mahout, Lily

2012-04-16 Thread Fuad Efendi
, Web Services, Moreover, Web Ping, SQL-import, sitemaps-based, intranets, and more. In addition, I can design super-rich UIs extremely fast using tools such as Liferay Portal, Apache Wicket, Vaadin. Thanks,

Re: How to accelerate your Solr-Lucene application by 4x

2012-01-19 Thread Fuad Efendi
I agree that SSD boosts performance... but only in a rare, not-real-life scenario: super-frequent commits. That's it, nothing more, except the fact that Lucene compile time including tests takes up to two minutes on a MacBook with SSD, versus forty to fifty minutes on Windows with HDD. Of course, with non-empty

Re: jetty error, broken pipe

2011-11-19 Thread Fuad Efendi
lient and SOLRJ... Fuad Efendi http://www.tokenizer.ca On 2011-11-19, at 9:14 PM, alx...@aim.com wrote: Hello, I use solr 3.4 with jetty that is included in it. Periodically, I see this error in the jetty output SEVERE: org.mortb

Re: HBase Datasource

2011-11-10 Thread Fuad Efendi
I am using Lily for atomic index updates (implemented very nicely: transactional, plus MapReduce, plus auto-denormalizing) http://www.lilyproject.org It slows down the "mean time" 7-10 times, but TPS stays the same - Fuad http://www.tokenizer.ca On 2011-11-10, at 9:59 PM, M

Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
e they have at least 100k fields per instance… they don't have any problem outside Amazon ;))) On 11-08-17 11:08 PM, "Fuad Efendi" wrote: more investigation and I see

Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
I agree with Yonik, of course; but… you should see OOM errors in this case. With "virtualization", however, it is unpredictable… and if the JVM doesn't have a few bytes to write the OOM into the log file (because we are catching "Throwable" and trying to generate HTTP 500 instead!!! Freaky…
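The difference between catching Throwable and catching Exception is small in code but large in effect. A Java sketch of a request wrapper that turns ordinary failures into HTTP 500 while letting OutOfMemoryError and other Errors propagate (the wrapper and its status codes are illustrative, not Solr's code):

```java
import java.util.concurrent.Callable;

final class ErrorToHttp {
    // Runs a request and returns its HTTP status. Ordinary failures
    // (Exceptions) become 500; Errors such as OutOfMemoryError are not
    // caught here, so they propagate instead of being swallowed, as a
    // blanket catch (Throwable t) would do.
    static int handle(Callable<Integer> request) {
        try {
            return request.call();
        } catch (Exception e) {
            return 500;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(() -> 200));                                       // 200
        System.out.println(handle(() -> { throw new IllegalStateException("x"); })); // 500
    }
}
```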

Re: solr keeps dying every few hours.

2011-08-17 Thread Fuad Efendi
> memory required, currently I use -Xms3072M .
A "Large CPU" instance is virtualized, and its behaviour is unpredictable. Choose a "cluster" instance with an explicit Intel XEON CPU (instead of "CPU-Units") and compare the behaviour; $1.60/hour. Please share result

Re: Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Fuad Efendi
p://java.sun.com/webapps/bugreport/crash.jsp # However, I can start it and run it without any problems by removing -XX:+AggressiveOpts (which is supposed to become the default setting "in upcoming releases" of Java 6). Do we need to disable escape analysis (-XX:-DoEscapeAnalysis), as IBM suggests? http://www-01.ibm.com/support/docview.wss?uid=swg21422605 Thanks, Fuad Efendi http://www.tokenizer.ca -- lucidimagination.com

Solr Performance Tuning: -XX:+AggressiveOpts

2011-07-27 Thread Fuad Efendi
run without any problems by removing -XX:+AggressiveOpts (which is supposed to become the default setting "in upcoming releases" of Java 6). Do we need to disable escape analysis (-XX:-DoEscapeAnalysis), as IBM suggests? http://www-01.ibm.com/support/docview.wss?uid=swg21422605 Thanks, Fuad Efendi http://www.tokenizer.ca

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
Hi Otis, I am recalling the "pagination" feature; it is still unresolved (with the default scoring implementation): even with small documents, searching and retrieving documents 1 to 10 can take 0 milliseconds, but documents 100,000 to 100,010 can take a few minutes (I saw it with the trunk version 6 months ago, and wi

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
I think the question is strange... Maybe you are wondering about possible OOM exceptions? I think we can pass Lucene a single document containing a comma-separated list of "term, term, ..." (a few billion times)... except for "stored" and "TermVectorComponent"... I believe thousands of companies already in

Re: URGENT HELP: Improving Solr indexing time

2011-06-04 Thread Fuad Efendi
WHERE KEY2=? ORDER BY KEY1" - check everything... Thanks, On 11-06-05 12:09 AM, "Rohit Gupta" wrote: No didn't double post, my b

Re: Solr vs ElasticSearch

2011-05-31 Thread Fuad Efendi
Nice article... 2 ms is better than 20 ms, but in another chart 50 seconds is not as good as 3 seconds... Sorry for my vision... SOLR pushed a huge amount of performance improvements into Lucene Core... -Original Message- From: Shashi Kant

RE: Solr vs ElasticSearch

2011-05-31 Thread Fuad Efendi
Interesting wording: "we want real-time search, we want simple multi-tenancy, and we want a solution that is built for the cloud" And later, "built on top of Lucene." Is that possible? :) (what does "real time search" mean anyway... and what is "cloud"?) The community is growing! P.S. I neve

RE: Solr memory consumption

2011-05-31 Thread Fuad Efendi
It could be environment-specific (specific to your "top" command implementation, OS, etc.). On CentOS I have 2986m of "virtual" memory showing although -Xmx2g; you have 10g "virtual" although -Xmx6g. Don't trust it too much... the "top" command may count OS buffers for opened files, network sockets, JVM D
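To compare what the JVM itself reports with what top shows, a small Java sketch using Runtime; the heap figures it prints are bounded by -Xmx, while the process's virtual size is not:

```java
final class HeapVsProcess {
    // {max, committed, free} heap bytes as seen by the JVM itself.
    static long[] heapSummary() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {rt.maxMemory(), rt.totalMemory(), rt.freeMemory()};
    }

    public static void main(String[] args) {
        long[] h = heapSummary();
        long mb = 1024 * 1024;
        System.out.println("max heap (-Xmx): " + h[0] / mb + " MB");
        System.out.println("committed heap:  " + h[1] / mb + " MB");
        System.out.println("free in heap:    " + h[2] / mb + " MB");
        // top's "virtual" size also covers thread stacks, JIT code cache,
        // class metadata, direct buffers, memory-mapped index files and
        // shared libraries, none of which appear in the numbers above.
    }
}
```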

WIKI alerts

2011-05-31 Thread Fuad Efendi
Has anyone noticed that it doesn't work? It has been 2 weeks already: https://issues.apache.org/jira/browse/INFRA-3667 I don't receive WIKI change notifications. I am CC'ing 'Apache Wiki' (wikidi...@apache.org). Something is wrong. -Fuad

RE: DIH: Exception with "Too many connections"

2011-05-31 Thread Fuad Efendi
nnections" even for huge SQL-side max_connections. If you are interested, I can continue work on SOLR-2233. CC: dev@lucene (is anyone working on DIH improvements?) Thanks, Fuad Efendi http://www.tokenizer.ca/ -Original Message- From: François Schiettecatte [mailto:fschietteca...@gm

Re: Out of memory error

2010-12-07 Thread Fuad Efendi
Related: SOLR-846 -Original Message- From: Erick Erickson Date: Tue, 7 Dec 2010 08:11:41 To: Reply-To: solr-user@lucene.apache.org Subject: Re: Out of memory error Have you seen this page? http://wiki.apache.org/solr/DataImportHandler

Re: Out of memory error

2010-12-06 Thread Fuad Efendi
Batch size "-1"??? Strange, but it could be a problem. Note also that you can't pass parameters to the default startup.sh command; you should modify setenv.sh instead. --Original Message-- From: sivaprasad To: solr-user@lucene.apache.org ReplyTo: solr-user@lucene.apache.org Subject: Out of memory

Re: Dataimporthandler crashed raidcontroller

2010-11-04 Thread Fuad Efendi
I experienced similar problems. It was because we didn't perform load stress tests properly before going to production. Nothing lasts forever: replace the controller, change the hardware vendor, maintain a low temperature inside the rack. Thanks --Original Message-- From: Robert Gründler To: solr-user

RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
To make my Solr admin password-protected, I used the Path Based Authentication from http://wiki.apache.org/solr/SolrSecurity. In this way my admin area, search, delete, and add-to-index operations are protected. But now when I make Solr authenticated, then for every update/delete f

RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
> You could set a firewall that forbid any connection to your Solr's server port to everyone, except the computer that host your application that connect to Solr. So, only your application will be able to connect to Solr.
I believe firewalling is the only possible solution since SOLR doesn'

Range Queries, Geospatial

2010-02-16 Thread Fuad Efendi
Hi, I've read a very interesting interview with Ryan, http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Interview-Ryan-McKinley Another finding is https://issues.apache.org/jira/browse/SOLR-773 (lucene/contrib/spatial) Is there any more stuff going on for SOLR
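One common way to express a geospatial filter with plain range queries is a bounding box on latitude and longitude fields. A rough Java sketch, assuming hypothetical numeric fields named lat and lon and a spherical-earth approximation (it ignores the poles and the 180th meridian):

```java
import java.util.Locale;

final class BoundingBox {
    static final double KM_PER_DEGREE_LAT = 111.32; // approximate

    // Two range clauses enclosing a circle of radiusKm around (lat, lon).
    static String rangeFilter(double lat, double lon, double radiusKm) {
        double dLat = radiusKm / KM_PER_DEGREE_LAT;
        double dLon = radiusKm / (KM_PER_DEGREE_LAT * Math.cos(Math.toRadians(lat)));
        return String.format(Locale.ROOT, "lat:[%.4f TO %.4f] AND lon:[%.4f TO %.4f]",
                lat - dLat, lat + dLat, lon - dLon, lon + dLon);
    }

    public static void main(String[] args) {
        System.out.println(rangeFilter(43.65, -79.38, 10)); // 10 km around Toronto
    }
}
```

The box is larger than the circle, so an exact distance check on the remaining candidates is still needed if precision matters.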

RE: Removing single-term results / reordering

2010-02-13 Thread Fuad Efendi
of the default relevancy stuff but construct your own based on some other criterias? -- Jan Høydahl - search architect Cominvent AS - www.cominvent.com On 13. feb. 2010, at 19.26, Fuad Efendi wrote: Hi, I execute query "word1", a

Removing single-term results / reordering

2010-02-13 Thread Fuad Efendi
Hi, I execute query "word1", and it returns 100k results where the top 10k are just "word1". How can I filter them, and show "word1 word2" in the top 10? Thanks

RE: expire/delete documents

2010-02-12 Thread Fuad Efendi
> or since you specifically asked about deleting anything older than X days (in this example i'm assuming x=7)... createTime:[NOW-7DAYS TO *]
To delete documents older than X days, the range has to be the other way around: createTime:[* TO NOW-7DAYS]
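A Java sketch of what [* TO NOW-7DAYS] selects, with inclusive bounds as in the range syntax; in Solr itself the range would simply be the body of a delete-by-query:

```java
import java.time.Duration;
import java.time.Instant;

final class ExpiryCutoff {
    // Mirrors createTime:[* TO NOW-7DAYS] for age = 7 days:
    // true when createTime is at or before (now - age).
    static boolean olderThan(Instant createTime, Instant now, Duration age) {
        return !createTime.isAfter(now.minus(age));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2010-02-12T00:00:00Z");
        Duration week = Duration.ofDays(7);
        System.out.println(olderThan(Instant.parse("2010-02-01T00:00:00Z"), now, week)); // true
        System.out.println(olderThan(Instant.parse("2010-02-10T00:00:00Z"), now, week)); // false
    }
}
```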

RE: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Fuad Efendi
Funny, Arrays.copy() for HashMap... but something similar... Anyway, I use the same values for initial size and max size, to be safe... and to get any OOM at startup :) -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: February-12-10 6:55 PM T

RE: For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Fuad Efendi
I always use initial size = max size, just to avoid Arrays.copyOf()... The initial (default) capacity of a HashMap is 16; when that is not enough (by default, once the map is 75% full), the table is copied into a new 32-element array, then 64, ... - too much wasted space! (same for ConcurrentHashMap) Excuse me if I didn't understand the question...
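The growth pattern can be counted directly. A Java sketch that simulates HashMap's default sizing (capacity 16, load factor 0.75, doubling on each resize) rather than calling HashMap internals; whether a given Solr cache passes initialSize straight to its backing map is an assumption here:

```java
final class HashMapGrowth {
    // How many times a map starting at the default capacity (16) doubles its
    // table while n entries are inserted, with the default load factor 0.75
    // (a resize happens once size exceeds capacity * 0.75).
    static int resizes(int n) {
        int capacity = 16;
        int count = 0;
        while (n > capacity * 0.75) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    // Initial capacity that avoids any resize for n entries.
    static int presizedCapacity(int n) {
        return (int) Math.ceil(n / 0.75);
    }

    public static void main(String[] args) {
        System.out.println(resizes(1000));          // 7
        System.out.println(presizedCapacity(1000)); // 1334
    }
}
```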

RE: analysing wild carded terms

2010-02-10 Thread Fuad Efendi
> hello *, quick question, what would i have to change in the query parser to allow wildcarded terms to go through text analysis?
I believe it is illogical: "wildcarded terms" go through the terms enumerator.
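In other words, a wildcard term is not a token to be analyzed; it is a pattern matched against terms that were already analyzed at index time. For the simplest case, a trailing "*", a Java sketch over a sorted term list shows the enumeration (real Lucene uses its own term dictionary and, in newer versions, automata; this is only the idea):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class PrefixTerms {
    // A prefix query such as "luc*" walks the sorted term dictionary from the
    // prefix onward; the matching terms form one contiguous range.
    static List<String> expand(NavigableSet<String> sortedTerms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break; // left the contiguous range
            out.add(term);
        }
        return out;
    }
}
```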

RE: Solr integration with document management systems

2010-02-06 Thread Fuad Efendi
SOLR doesn't come with such things... Look at www.liferay.com; they have a plugin for SOLR (in SVN trunk) so that all documents / assets can be automatically indexed by SOLR (and you have full freedom in defining specific SOLR schema settings); their portlets support WebDAV, and "Open Office" look

RE: Fundamental questions of how to build up solr for huge portals

2010-02-05 Thread Fuad Efendi
-based, JSR-168, JSR-286 (and it supports PHP portlets, but I never tried). -Original Message- From: Peter [mailto:zarato...@gmx.net] Sent: January-16-10 10:17 AM To: solr-user@lucene.apache.org Subject: Fu

RE: fuzzy matching / configurable distance function?

2010-02-04 Thread Fuad Efendi
The Levenshtein algorithm is currently hardcoded (FuzzyTermEnum class) in Lucene 2.9.1 and 3.0... There are samples of other distances in the "contrib" folder. If you want to play with distance, check http://issues.apache.org/jira/browse/LUCENE-2230 It works if the distance is an integer and follows the "metric space axioms
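For reference, the integer edit distance in question, in Java: unit-cost insertions, deletions and substitutions. Being an integer metric, it is the kind of distance the LUCENE-2230 approach needs; this sketch is standalone, not Lucene's FuzzyTermEnum code:

```java
final class Levenshtein {
    // Classic dynamic-programming edit distance, keeping only two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // "" -> b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // a[0..i) -> ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```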

RE: Solr response extremely slow

2010-02-04 Thread Fuad Efendi
'!' :))) Plus, FastLRUCache (the previous one was synchronized) (and of course warming-up time): start complaining only after making sure there is nothing left to complain about :) (and of course the OS needs time to cache filesystem blocks, and Java HotSpot needs time too, ... a few minutes at least...) On Feb 3, 2010, at 1:38 PM, Rajat Gar

SOLR Performance Tuning: Fuzzy Search

2010-02-03 Thread Fuad Efendi
uest handler and etc. It may work well (but only if the query contains a term from the dictionary; it can't work as a spellchecker). Combining the two algorithms can boost performance dramatically...

RE: Comparison of Solr with Sharepoint Search

2010-01-26 Thread Fuad Efendi
I can only tell that the Liferay Portal (WebDAV) Document Library Portlet has the same functionality as SharePoint (it even has a /servlet/ URL with the suffix '/sharepoint'); Liferay also has a plugin (web-hook) for SOLR (it has a generic search wrapper; any kind of search service provider can be hooked into Liferay

RE: Solr vs. Compass

2010-01-25 Thread Fuad Efendi
>> Even if "commit" takes 20 minutes?
> I've never seen a commit take 20 minutes... (anything taking that long is broken, perhaps in concept)
"Index merge" can take from a few minutes to a few hours. That's why nothing can beat SOLR Master/Slave and sharding for huge datasets. And reopening of I

RE: Solr vs. Compass

2010-01-25 Thread Fuad Efendi
>> Why to embed "indexing" as a transaction dependency? Extremely weird idea.
> There is nothing weird about different use cases requiring different approaches
> If you're just thinking documents and text search ... then its less of an issue.
> If you have an online application where

RE: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree

2010-01-22 Thread Fuad Efendi
http://issues.apache.org/jira/browse/LUCENE-2230 Enjoy! -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: January-19-10 11:32 PM To: solr-user@lucene.apache.org Subject: SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree Hi,

RE: Solr vs. Compass

2010-01-22 Thread Fuad Efendi
t to create an index? Absolutely nothing. Why embed "indexing" as a transaction dependency? An extremely weird idea. But I understand some selling points... SOLR: it is faster than Lucene. Filtered queries run faster than traditional "AND" queries! And this is a real selling point. T
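Part of why filtered queries are fast in Solr is that each filter is cached as a set of matching documents (the filterCache) and takes no part in scoring, so repeating a filter costs a lookup plus a set intersection. A Java sketch of that idea with java.util.BitSet; the class and filter strings are illustrative, not Solr's implementation, and there is no eviction as a real cache would have:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> evaluator; // computes matching doc IDs for a filter

    FilterCacheSketch(Function<String, BitSet> evaluator) {
        this.evaluator = evaluator;
    }

    // Intersects the main query's matches with a filter; the filter's doc set
    // is computed once and then served from the cache.
    BitSet apply(BitSet queryMatches, String filter) {
        BitSet filterDocs = cache.computeIfAbsent(filter, evaluator);
        BitSet result = (BitSet) queryMatches.clone(); // leave the caller's set untouched
        result.and(filterDocs);
        return result;
    }
}
```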

RE: Solr vs. Compass

2010-01-22 Thread Fuad Efendi
't have to worry that the value "USA" (3 characters) is repeated in a few million documents, and the value "Canada" (6 characters) in another few; nothing "relational": it's done automatically without any Compass/Hibernate/Table(s). Don't think "relational"
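The mechanism behind that: each distinct term is stored once in the index's term dictionary, and documents refer to it rather than repeating the text. A simplified Java illustration with ordinals (not Lucene's actual on-disk format):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TermDictionary {
    private final Map<String, Integer> ordByTerm = new HashMap<>();
    private final List<String> terms = new ArrayList<>();

    // Returns the value's ordinal, adding the value the first time it is seen.
    // Each distinct value is stored once; documents would keep only the int.
    int ordinalOf(String value) {
        return ordByTerm.computeIfAbsent(value, v -> {
            terms.add(v);
            return terms.size() - 1;
        });
    }

    String termAt(int ordinal) {
        return terms.get(ordinal);
    }

    int uniqueTerms() {
        return terms.size();
    }

    public static void main(String[] args) {
        TermDictionary dict = new TermDictionary();
        for (String country : new String[] {"USA", "Canada", "USA", "USA"}) {
            System.out.print(dict.ordinalOf(country) + " ");
        }
        System.out.println("| unique: " + dict.uniqueTerms()); // 0 1 0 0 | unique: 2
    }
}
```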

Is there limit on size of query string?

2010-01-22 Thread Fuad Efendi
Is there a limit on the size of the query string? It looks like I get exceptions when the query string is longer than 400 characters (on average). Thanks!

SOLR Performance Tuning: Fuzzy Searches, Distance, BK-Tree

2010-01-19 Thread Fuad Efendi
! (although I need to use a classic int instead of the float distance used by Lucene/Levenshtein etc.) Thanks,
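A BK-tree is the structure that makes an integer metric pay off: terms are stored under their distance to a parent node, and the triangle inequality prunes most of the dictionary during a fuzzy lookup. A compact Java sketch (a textbook BK-tree with its own Levenshtein; it is not the LUCENE-2230 patch):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BkTree {
    private static final class Node {
        final String term;
        final Map<Integer, Node> children = new HashMap<>(); // edge label = distance to child
        Node(String term) { this.term = term; }
    }

    private Node root;

    void add(String term) {
        if (root == null) {
            root = new Node(term);
            return;
        }
        Node node = root;
        while (true) {
            int d = distance(term, node.term);
            if (d == 0) return; // already present
            Node child = node.children.get(d);
            if (child == null) {
                node.children.put(d, new Node(term));
                return;
            }
            node = child;
        }
    }

    // All stored terms within maxDistance of query. By the triangle inequality,
    // only children whose edge label lies in [d - maxDistance, d + maxDistance]
    // can contain matches, so other subtrees are skipped.
    List<String> search(String query, int maxDistance) {
        List<String> out = new ArrayList<>();
        if (root == null) return out;
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            int d = distance(query, node.term);
            if (d <= maxDistance) out.add(node.term);
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                if (Math.abs(e.getKey() - d) <= maxDistance) pending.push(e.getValue());
            }
        }
        return out;
    }

    // Integer Levenshtein distance (unit costs), a metric as BK-trees require.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```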

RE: SOLR: Replication

2010-01-03 Thread Fuad Efendi
Seeley Sent: January-03-10 10:03 AM To: solr-user@lucene.apache.org Subject: Re: SOLR: Replication On Sat, Jan 2, 2010 at 11:35 PM, Fuad Efendi wrote: I tried... I set APR to improve performance... server is slow while replica; but "top"

RE: SOLR: Replication

2010-01-02 Thread Fuad Efendi
ation On Sat, Jan 2, 2010 at 5:48 PM, Fuad Efendi wrote: I used RSYNC before, and 20Gb replica took less than an hour (20-40 minutes); now, HTTP, and it takes 5-6 hours... Admin screen shows 952Kb/sec average speed; 100Mbps network, full-duplex; I

SOLR: Replication

2010-01-02 Thread Fuad Efendi
I used RSYNC before, and a 20Gb replica took less than an hour (20-40 minutes); now, over HTTP, it takes 5-6 hours... The admin screen shows an average speed of 952Kb/sec; 100Mbps network, full-duplex; I am using Tomcat Native for APR. 10 times slower... -Fuad http://www.tokenizer.ca
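The numbers in the post are consistent with each other if "952Kb/sec" means kilobytes per second (read as kilobits, 20 GB would take about two days). A Java sketch of the arithmetic, assuming 1 GB = 1024^3 bytes and ignoring protocol overhead:

```java
final class ReplicationMath {
    // Seconds needed to copy sizeBytes at a sustained bytesPerSecond.
    static double seconds(double sizeBytes, double bytesPerSecond) {
        return sizeBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        double size = 20.0 * 1024 * 1024 * 1024; // 20 GB index
        double observed = 952.0 * 1024;          // 952 KB/s, read as kilobytes
        double lineRate = 100_000_000.0 / 8;     // 100 Mbps, no protocol overhead
        System.out.printf("at 952 KB/s:  %.1f h%n", seconds(size, observed) / 3600);  // 6.1 h
        System.out.printf("at line rate: %.0f min%n", seconds(size, lineRate) / 60); // 29 min
    }
}
```

So the observed rate is roughly 13 times below the line rate, in line with the "10 times slower" in the post.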

SOLR: Portlet (Plugin) for Liferay Portal

2009-12-25 Thread Fuad Efendi
, WIKIs, Forum Posts) is automatically indexed. Having a separate SOLR definitely helps: instead of hardcoding (with Lucene) we can now intelligently manage stop words, stemming, language settings, and more.

RE: SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
ote: On Dec 24, 2009, at 11:36 AM, Walter Underwood wrote: When do users do a query like that? --wunder Well, SolrEntityProcessor "users" do :) http://issues.apache.org/jira/browse/SOLR-1499

RE: SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
low values for start=12345. Queries like start=28838540 take 40-60 seconds, and even cause OutOfMemoryException. I use highlight, faceting on nontokenized "Country" field, standard handler. It even seems to be a

RE: SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
ttp://issues.apache.org/jira/browse/SOLR-1499 (which by the way I plan on polishing and committing over the holidays) Erik On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote: I used paginati

SOLR Performance Tuning: Pagination

2009-12-24 Thread Fuad Efendi
OutOfMemoryException. I use highlighting, faceting on the non-tokenized "Country" field, and the standard handler. It even seems to be a bug...
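The cost of a deep page comes from how top-N collection works: to return hits start..start+rows, the searcher has to keep the best start+rows hits, not just rows of them. A simplified Java sketch of that collection step (not Lucene's collector; no ties, field sorting, or highlighting):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNPage {
    // Returns the scores ranked [start, start + rows) in descending order.
    // Like a top-N collector, it must keep the best start + rows hits seen so
    // far, so memory and heap work grow with start, not just with rows.
    static List<Double> page(double[] scores, int start, int rows) {
        if (rows <= 0) return new ArrayList<>();
        int n = start + rows;
        PriorityQueue<Double> heap = new PriorityQueue<>(n); // min-heap: weakest kept hit on top
        for (double s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
        }
        List<Double> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder());
        return new ArrayList<>(best.subList(Math.min(start, best.size()), best.size()));
    }
}
```

Note that PriorityQueue(n) allocates its backing array up front, so with start=28838540 this sketch alone would allocate an array of roughly 28.8 million references before seeing a single hit.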

RE: SOLR Performance Tuning: Disable INFO Logging.

2009-12-21 Thread Fuad Efendi
bilities. Log output will go by default to Tomcat's standard /logs folder. You can find additional logging configuration settings by googling "Java 5 Logging" etc. 2009/12/20 Fuad Efendi : After researching how to configure default SOLR & Tomcat logging, I

RE: SOLR Performance Tuning: Disable INFO Logging.

2009-12-20 Thread Fuad Efendi
q,rsp); setResponseHeaderValues(handler,req,rsp); StringBuilder sb = new StringBuilder(); for (int i=0; i -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: December-20-09 2:54 PM To: solr-user@lucene.apache.org Subject: SOLR Performance Tuning

SOLR Performance Tuning: Disable INFO Logging.

2009-12-20 Thread Fuad Efendi
denly synchronous I/O by the Java/Tomcat logger slows down performance much more than Lucene's read-only I/O.
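If Solr runs with JDK logging (java.util.logging), which is the assumption here, the same effect is a one-line level change, org.apache.solr.level = WARNING in logging.properties, or programmatically as in this Java sketch:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietSolrLogging {
    // Keep a strong reference: java.util.logging may garbage-collect a logger
    // (and forget its level) if nothing references it.
    private static final Logger SOLR = Logger.getLogger("org.apache.solr");

    // Drops INFO (per-request) messages from org.apache.solr and from child
    // loggers that have no level of their own; WARNING and SEVERE still pass.
    static void raiseLevel() {
        SOLR.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        raiseLevel();
        Logger core = Logger.getLogger("org.apache.solr.core.SolrCore");
        System.out.println(core.isLoggable(Level.INFO)); // false
    }
}
```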

RE: solr stops running periodically

2009-11-16 Thread Fuad Efendi
> By that I mean that the java/tomcat process just disappears.
I had a similar problem when I started Tomcat via SSH and then closed SSH improperly, without the "exit" command. In some cases (OutOfMemory) there is not enough memory to generate a log (or the CPU can be overloaded by the Garbage Collector to su

RE: Lucene FieldCache memory requirements

2009-11-03 Thread Fuad Efendi
-Fuad -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: November-03-09 5:00 AM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache memory requirements On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi wrote: I b

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
FieldCache internally uses a WeakHashMap... nothing wrong with that, but... no amount of garbage-collection tuning will help if the allocated RAM is not enough for replacing Weak** with Strong**, especially for SOLR faceting... 10%-15% of CPU taken by GC has been reported... -Fuad

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Even in a simplistic scenario, when it is garbage-collected, we still _need_to_be_able_ to allocate enough RAM to the FieldCache on demand... a linear dependency on document count...
> Hi Mark,
> Yes, I understand it now; however, how will StringIndexCache size down in a production system facetin

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
> if (t < mterms.length) {
>   // if there are less terms than documents,
>   // trim off the dead array space
>   String[] terms = new String[t];
>   System.arraycopy (mterms, 0, terms, 0, t);
>   mterms = terms;
> }
>
> StringIndex value = new StringInd

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
o be safe, use this in your basic memory estimates: [512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes] -Fuad -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: November-02-09 7:37 PM To: solr-user@lucene.apache.org Subject: RE: Lucene
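Plugging numbers into that estimate, as a Java sketch (three fields and 100 million documents are made-up inputs; the formula is the rule of thumb above, not a measured value):

```java
final class FieldCacheEstimate {
    // Rule of thumb: base + non-tokenized fields x maxDoc x 8 bytes.
    static long estimateBytes(long baseBytes, int nonTokenizedFields, long maxDoc) {
        return baseBytes + (long) nonTokenizedFields * maxDoc * 8L;
    }

    public static void main(String[] args) {
        long base = 512L * 1024 * 1024;                    // lower end of [512Mb ~ 1Gb]
        long bytes = estimateBytes(base, 3, 100_000_000L); // hypothetical: 3 fields, 100M docs
        System.out.printf("%d bytes = %.2f GiB%n", bytes, bytes / (1024.0 * 1024 * 1024));
    }
}
```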

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
To be correct, I analyzed FieldCache a while ago and I believed it never "sizes down"...
/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 *
 * Created: May 19, 2004 4:40:36 PM
 *
 * @since lucene 1.4
 */
Will it size down? Onl

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
ira/browse/LUCENE-1990 to make this memory requirement even lower... but please correct me if I am wrong about the formula; I am also unsure how it is currently implemented... Thanks, Fuad -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: November-02-09 8:21

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Mark, I don't understand this:
> so with a ton of docs and a few uniques, you get a temp boost in the RAM reqs until it sizes it down.
Sizes down??? Why is it called a cache, then? And how does SOLR use it if it is not a cache? And this:
> A pointer for each doc.
Why can't we use an (int) document ID?

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
I just did some tests on a completely new index (Slave): sorting by a low-cardinality non-tokenized field (such as Country) takes milliseconds, but sorting (ascending) on a tokenized field with many distinct terms took 30 seconds (initially). The second sort (descending) took milliseconds. Generic query *.*; Fiel

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
the unsupported exception in that method for things like multi reader and just do the work to get the right number (currently there is a comment that the user should do that work if necessary, making the call unreliable for this). Fuad Efendi

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
negligible (for your case) memory to hold the actual string values). Note that for your use case, this is exceptionally wasteful. If Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this) then it'd take much fewer bits to reference the values, sinc
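To see why bit-packing matters here: an ordinal only needs enough bits to count the distinct values, not a full 32-bit int. A Java sketch of the arithmetic (200 distinct values and 100 million documents are made-up inputs):

```java
final class BitPacking {
    // Bits needed to store an ordinal in [0, uniqueValues).
    static int bitsRequired(int uniqueValues) {
        if (uniqueValues <= 1) return 1;
        return 32 - Integer.numberOfLeadingZeros(uniqueValues - 1);
    }

    // Bytes for maxDoc ordinals packed at that width (rounded up).
    static long packedBytes(long maxDoc, int uniqueValues) {
        return (maxDoc * bitsRequired(uniqueValues) + 7) / 8;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L;
        System.out.println(bitsRequired(200) + " bits instead of 32");
        System.out.println(packedBytes(maxDoc, 200) + " bytes packed vs " + maxDoc * 4 + " as int[]");
    }
}
```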
