Re: [ANN] word2vec for Lucene

2014-11-20 Thread Glen Newton
Hi Koji, Semantic vectors is here: http://code.google.com/p/semanticvectors/ It is a project that has been around for a number of years and used by many people (including me http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html ). If you could compare and contrast word2vec

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-05-02 Thread Glen Newton
$100 for anyone who gets me a working Long.MAX_VALUE branch! ;-) I know that for many of the SOLR with faceting use cases, things will not scale to Long documents, but there are a number of more straightforward use cases, where SOLR/Lucene will scale to Long. Like simple searches, small numbers o

Re: Benefits of Solr over Lucene?

2013-02-12 Thread Glen Newton
der > > On Feb 12, 2013, at 12:26 PM, Glen Newton wrote: > >> Is there a page on the wiki that points out the use cases (or the >> features) that are best suited for Lucene adoption, and those best >> suited for SOLR adoption? >> >> -Glen >> >> On

Re: Benefits of Solr over Lucene?

2013-02-12 Thread Glen Newton
Is there a page on the wiki that points out the use cases (or the features) that are best suited for Lucene adoption, and those best suited for SOLR adoption? -Glen On Tue, Feb 12, 2013 at 3:11 PM, Shawn Heisey wrote: > On 2/12/2013 11:19 AM, JohnRodey wrote: >> >> So I have had a fair amount of

Re: [Announce] Apache Solr 4.0 with RankingAlgorithm 1.4.4 and Realtime NRT available for download

2012-10-29 Thread Glen Newton
+10 On Mon, Oct 29, 2012 at 12:17 PM, Michael Della Bitta wrote: > As an external observer, I think the main problem is your branding. > "Realtime Near Realtime" is definitely an oxymoron, and your ranking > algorithm is called "Ranking Algorithm," which is generic enough to > suggest that a. it'

Re: CLASSPATH

2012-05-09 Thread Glen Newton
rrow for a reason. You should post to the nutch list. If you think the nutch list is not a responsive space, post to http://stackoverflow.com/ with the appropriate tags nutch tagged questions: http://stackoverflow.com/questions/tagged/nutch constructively, Glen Newton On Wed, May 9, 2012 at 4:55 A

Re: Evaluating Solr

2012-04-04 Thread Glen Newton
"Re-Index your data" ~= Reload your data On Wed, Apr 4, 2012 at 12:46 PM, Joseph Werner wrote: > Hi, > > I'm evaluating Solr for use in a project. In the Solr FAQ under "How can I > rebuild my index from scratch if I change my schema?"  After restarting the > server, step  5 is to "Re-Index your

Re: Lucene vs Solr design decision

2012-03-09 Thread Glen Newton
millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan wrote: > Solr has no limitation on the number of cores. It's limited by your hardware, > inodes and how many files you could keep open. > > I think even if you went the Lucene route you would run into same hardw

Re: Unusually long data import time?

2012-02-22 Thread Glen Newton
; _usually_) - Network issues if non-local - DB configuration (driver, etc) If you can give more information about the above, people on this list should be able to better indicate whether 18 hours sounds right for your situation. -Glen Newton On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten wrote

Re: OutOfMemoryError coming from TermVectorsReader

2011-09-19 Thread Glen Newton
Please include information about your heap size (and other Java command-line arguments), as well as platform OS (version, swap size, etc), Java version, and underlying hardware (RAM, etc) for us to better help you. From the information you have given, increasing your heap size should help. Thanks, Gl

Re: indexing 30million records to Solr using solrj doen't work but works for small files

2011-09-02 Thread Glen Newton
Please show how it "doesn't work", i.e. does the application throw an exception and if yes, could you please post the stacktrace. If no, please be more explicit. Thanks, Glen Newton On Fri, Sep 2, 2011 at 10:35 AM, angel wrote: > Hi below is my java program for indexing around

Re: Stable Linux Release

2011-08-18 Thread Glen Newton
Please take this discussion off list. Thanks, Glen On Thu, Aug 18, 2011 at 3:02 PM, Gora Mohanty wrote: > On Fri, Aug 19, 2011 at 12:15 AM, Cupbearer wrote: >> What are the prerequisite libraries required to get Solr to work in Php. >> Php.net has libxml2 and libcurlx I think (off the top of my

Re: Indexing SharePoint from SolrJ

2011-07-27 Thread Glen Newton
+1 On 7/27/11, Twomey, David wrote: > > Does anyone have examples of indexing SP content using the Google Connectors > API and using SolrJ. > > I know Lucid Imagination has a Sharepoint connector and I have used that > successfully. > > However, I would like to create a thumbnail image of PDF's

Re: Using Solr over Lucene effects performance?

2011-03-11 Thread Glen Newton
On Fri, Mar 11, 2011 at 5:26 PM, Yonik Seeley wrote: > That's an apples to oranges comparison - lucene is a library and solr > is a server. I partially agree ;-) Lucene is a library and Solr is an http server wrapper-plus around Lucene. Solr also adds (all sorts of great) significant functional

Re: Using Solr over Lucene effects performance?

2011-03-11 Thread Glen Newton
I have seen little repeatable empirical evidence for the usual answer "mostly no". With respect: everyone in the Solr universe seems to answer this question in the way Yonik has. However, with a large number of requests the XML serialization/deserialization must have some, likely significant, impa

Re: Architecture decisions with Solr

2011-02-09 Thread Glen Newton
> This application will be built to serve many users If this means that you have thousands of users, 1000s of VMs and/or 1000s of cores is not going to scale. Have an ID in the index for each user, and filter using it. Then they can see only their own documents. Assuming that you are building an
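The per-user filtering idea above could be sketched as follows. This is only a minimal illustration, not the poster's code: the field name `owner_id` and the class `UserFilter` are hypothetical, and it assumes each document was indexed with the ID of the user allowed to see it. It builds the parameter string for a Solr select request, putting the restriction in a filter query (`fq`) so it is applied independently of the user's own query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UserFilter {
    // Hypothetical field holding the owning user's ID on every document.
    private static final String OWNER_FIELD = "owner_id";

    // Build "q=...&fq=owner_id:"<id>"" with both values URL-encoded.
    public static String selectParams(String userQuery, String userId) {
        // Quote the ID as a phrase and escape backslashes and quotes inside it.
        String escaped = userId.replace("\\", "\\\\").replace("\"", "\\\"");
        String fq = OWNER_FIELD + ":\"" + escaped + "\"";
        return "q=" + encode(userQuery) + "&fq=" + encode(fq);
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(selectParams("lucene", "u42"));
    }
}
```

The filter must be added on the server side of the application, never taken from user input, otherwise a user could simply ask for someone else's ID.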

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Glen Newton
Where do you get your Lucene/Solr downloads from? [x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. -Glen Newton

Re: DIH and UTF-8

2010-12-28 Thread Glen Newton
e of this list... On Tue, Dec 28, 2010 at 4:15 PM, Mark wrote: > It was due to the way I was writing to the DB using our Rails application. > Everything looked correct but when retrieving it using the JDBC driver it was > all mangled. > > On 12/27/10 4:38 PM, Glen Newton wrote:

Re: DIH and UTF-8

2010-12-27 Thread Glen Newton
ysql/share/mysql/charsets/ | > +------++ > 8 rows in set (0.00 sec) > > > Any other ideas? Thanks > > > On 12/27/10 3:23 PM, Glen Newton wrote: >> >> [client] >> >  default-character-set = utf8 >> >  [mysql] >> >  default-character-set=utf8 >> >  [mysqld] >> >  character_set_server = utf8 >> >  character_set_client = utf8 > -- -

Re: DIH and UTF-8

2010-12-27 Thread Glen Newton
racter-set = utf8 > [mysql] > default-character-set=utf8 > [mysqld] > character_set_server = utf8 > character_set_client = utf8 -Glen On Mon, Dec 27, 2010 at 6:15 PM, Mark wrote: > I tried both of those with no such luck. > > On 12/27/10 2:49 PM, Glen Newton wrote: >>

Re: DIH and UTF-8

2010-12-27 Thread Glen Newton
1 - Verify your mysql is set up using UTF-8 2 - Does your JDBC connect string contain: useUnicode=true&characterEncoding=UTF-8 See: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-charsets.html Glen http://zzzoot.blogspot.com/ On Mon, Dec 27, 2010 at 5:15 PM, Mark wrote: > Solr: 1.4
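As a small sketch of point 2, the two connection parameters can be appended to a MySQL JDBC URL like this. The class and method names, host, and database name are placeholders for illustration only; the two parameters are the ones quoted in the post. The second helper reflects the fact that a literal `&` has to be written as `&amp;` when the URL is placed inside an XML attribute such as a DataImportHandler data-config file.

```java
public class MysqlUtf8Url {
    // Connection parameters quoted in the post above.
    private static final String UTF8_PARAMS = "useUnicode=true&characterEncoding=UTF-8";

    // Build a JDBC URL for the given (placeholder) host and database.
    public static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database + "?" + UTF8_PARAMS;
    }

    // Same URL, with '&' escaped so it can be pasted into an XML attribute value.
    public static String jdbcUrlForXml(String host, String database) {
        return jdbcUrl(host, database).replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("localhost", "mydb"));
        System.out.println(jdbcUrlForXml("localhost", "mydb"));
    }
}
```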

Re: Dataimport performance

2010-12-16 Thread Glen Newton
he Lucene list. If you have any questions, please contact me. Thanks, Glen Newton http://zzzoot.blogspot.com --> Old LuSql benchmarks: http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html On Thu, Dec 16, 2010 at 12:04 PM, Dyer, James wrote: > We have ~50 lon

IndexTank technology...

2010-11-11 Thread Glen Newton
Does anyone know what technology they are using: http://www.indextank.com/ Is it Lucene under the hood? Thanks, and apologies for cross-posting. -Glen http://zzzoot.blogspot.com

Swap on large memory multi-core multi-cpu NUMA

2010-09-29 Thread Glen Newton
In a recent blog entry ("The MySQL “swap insanity” problem and the effects of the NUMA architecture" http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/), Jeremy Cole describes a particular but common problem with large memory installations of MySql on multi-core

Re: Memcache for Solr

2010-08-31 Thread Glen Newton
Apologies Chris: my mistake. -Glen On 31 August 2010 23:27, Chris Hostetter wrote: > > : ? > : The second post was relevant to the original post. > : And even dealt with some of the questions asked in the original: > > The first msg with subject "Memcache for Solr" was a thread-jack of > an exist

Re: Memcache for Solr

2010-08-31 Thread Glen Newton
? The second post was relevant to the original post. And even dealt with some of the questions asked in the original: Q > are there any down sides to it and difficult to implement A > We found it wasn't feasible to cache arbitrary result sets... ? -glen On 31 August 2010 15:11, Chris Hostette

Re: Having problems with the java api in 1.4.0

2010-08-24 Thread Glen Newton
Liz, I've built terabyte (1-2 TB) test Lucene indexes, but have not reached the petabyte level, so I am not sure. Certainly there is overhead in using the HTTP and XML marshaling/de-marshaling, which may or may not be a critical factor for you. Could you give more information with respect to

Huge pages

2010-07-07 Thread Glen Newton
I was wondering if anyone has any experience using huge pages[1] to improve SOLR (or Lucene) performance (esp on 64bit). Some are reporting major performance gains in large, memory intense applications (like EJBs)[2]. Also, ephemeral but significant performance reductions have also been solved usin

Re: document level security: indexing/searching techniques

2010-07-06 Thread Glen Newton
d when the permissions change. Does SOLR expose this kind of functionality? -Glen Newton http://zzzoot.blogspot.com/ http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html On 7 July 2010 00:38, RL wrote: > > I've a question about indexing/searching techniques in relatio

Re: DataImport issue with large number of documents

2010-06-08 Thread Glen Newton
=true - "Increase the netTimeoutForStreamingResults value" from http://lucene.grantingersoll.com/2008/07/16/mysql-solr-and-communications-link-failure/ See also: http://lucene.472066.n3.nabble.com/Recommended-MySQL-JDBC-driver-td817458.html -Glen Newton http://zzzoot.blogspot.com/ On 09/06/

Re: Experience with Solr and JVM heap sizes over 2 GB

2010-03-31 Thread Glen Newton
I have used up to 27GB of heap with no issues, both SOLR and (just) Lucene. -Glen Newton http://zzzoot.blogspot.com/ On 31 March 2010 11:34, Burton-West, Tom wrote: > Hello all, > > We have been running a configuration in production with 3 solr instances > under one  tomcat with 16

Re: Stopwords

2010-03-17 Thread Glen Newton
That discussion cites a paper via a URL: http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf Unfortunately when I go to this URL I get: "L'accès à ce document est limité." But I tracked down the paper. Here is its refe

Re: Moving From Oracle Text Search To Solr

2010-03-16 Thread Glen Newton
I've also indexed a concatenation of 50k journal articles (making a single document of several hundred MB of text) and it did not give me an OOM. -glen On 16 March 2010 15:57, Erick Erickson wrote: > Why do you think you'd hit OOM errors? How big is "very large"? I've > indexed, as a single docum

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Glen Newton
I've run Lucene with heap sizes as large as 28GB of RAM (on a 32GB machine, 64bit, Linux) and a ramBufferSize of 3GB. While I haven't noticed the GC issues mark mentioned in this configuration, I have seen them in the ranges he discusses (on 1.6 http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/i

Re: Indexing large text documents

2010-01-05 Thread Glen Newton
(In Lucene) I break the document into smaller pieces, then add each piece to the Document field in a loop. This seems to work better, but will mess-around with analysis like term offsets. This should work in your example. In Lucene, you can also add the field using a Reader to the file in question

Re: scanning folders recursively / Tika

2009-11-13 Thread Glen Newton
Have one thread recursing depth first down the directories & adding to a queue (fixed size). Have many threads reading off of the queue and doing the work. -glen http://zzzoot.blogspot.com/ 2009/11/13 Peter Gabriel : > Hello. > > I am on work with Tika 0.5 and want to scan a folder system about 1
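The one-producer, many-consumers layout described above might be sketched like this. It is an assumption-laden illustration, not the poster's code: the class name `QueueWalk`, the sentinel approach for shutdown, and the placeholder counter (standing in for the real parsing/indexing work) are all invented here. It uses `Files.walk`, which traverses depth-first, and a fixed-size `ArrayBlockingQueue` so the walker blocks when the workers fall behind.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class QueueWalk {
    // Sentinel object telling a worker to stop; compared by identity only.
    private static final Path STOP = Paths.get("");

    // Walk 'root' on the calling thread, hand files to 'workers' threads,
    // and return how many files were handled.
    public static int process(Path root, int workers, int queueSize) throws Exception {
        BlockingQueue<Path> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger handled = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    for (Path p = queue.take(); p != STOP; p = queue.take()) {
                        // Placeholder for the real work (e.g. parse and index the file).
                        handled.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer: depth-first traversal; put() blocks while the queue is full.
        try (Stream<Path> files = Files.walk(root)) {
            for (Path p : (Iterable<Path>) files.filter(Files::isRegularFile)::iterator) {
                queue.put(p);
            }
        }
        for (int i = 0; i < workers; i++) {
            queue.put(STOP); // one sentinel per worker
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return handled.get();
    }
}
```

The bounded queue is the important design choice: it keeps memory use flat regardless of how many files the directory tree contains.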

Re: Solr and LSA

2009-10-30 Thread Glen Newton
I am using the Semantic Vectors[1] implementation of LSA in a large-scale digital library project called Project Torngat[2]. I presented some of the work at the European Conference on Digital Libraries (ECDL)[3], at the Very Large Digital Libraries (VLDL) workshop[4] in September. A pre-print of the p

Re: How to reduce the Solr index size..

2009-08-27 Thread Glen Newton
2009/8/27 Fuad Efendi : > stored="true" means that this piece of info will be stored in a filesystem. > So that your index will contain 1Mb of pure log PLUS some info related to > indexing itself: terms, etc. > > Search speed is more important than index size... Not if you run out of space for the

Visualizing Semantic Journal Space (large scale) using full-text

2009-07-29 Thread Glen Newton
tion using only the full-text (no metadata). For more info & howto: http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html Glen Newton

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Glen Newton
doop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Glen Newton >> To: solr-user@lucene.apache.org >> Sent: Thursday, July 23, 2009 5:52:43 AM >> Subject: Re: DataImportHandler / Import from DB : one data set comes in >> multiple rows >

Re: DataImportHandler / Import from DB : one data set comes in multiple rows

2009-07-23 Thread Glen Newton
://code4lib.org/files/glen_newton_LuSql.pdf [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql Disclosure: I am the author of LuSql. Glen Newton http://zzzoot.blogspot.com/ 2009/7/22 Chantal Ackermann : > Hi all, > > this is my first post, as I am new to SOLR (some Lucene exp).

Re: Any benefit from compressed object pointers? (java6u14)

2009-07-16 Thread Glen Newton
I am going to do some (large scale) indexing tests using Lucene & will post to both this and the Lucene list. More info on compressed pointers: http://wikis.sun.com/display/HotSpotInternals/CompressedOops -Glen Newton http://zzzoot.blogspot.com/search?q=lucene 2009/7/16 Kevin Peterson :

Re: Improve indexing time

2009-07-13 Thread Glen Newton
/cistilabswiki/index.php/LuSql Disclosure: I am the author of LuSql. Glen Newton http://zzzoot.blogspot.com/ 2009/7/13 Gurjot Singh : > Hi, > We have a solr index of size 626 MB and number of documents indexed are > 141810. We have configured index based spellchecker with buildOnCommit >

Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Glen Newton
Try putting all the PDF URLs into a file, download with something like 'wget' then index locally. Glen Newton http://zzzoot.blogspot.com/ 2009/7/8 ahammad : > > Hello, > > I can index rich documents like pdf for instance that are on the filesystem. > Can we use Extracti

Re: Is there any other way to load the index beside using "http" connection?

2009-07-02 Thread Glen Newton
.blogspot.com/search?q=lucene > > > Thanks > > Francis > > > -Original Message- > From: Glen Newton [mailto:glen.new...@gmail.com] > Sent: Thursday, July 02, 2009 8:22 AM > To: solr-user@lucene.apache.org > Subject: Re: Is there any other way to load the index be

Re: Is there any other way to load the index beside using "http" connection?

2009-07-02 Thread Glen Newton
cis > > -Original Message- > From: Glen Newton [mailto:glen.new...@gmail.com] > Sent: Wednesday, July 01, 2009 8:06 PM > To: solr-user@lucene.apache.org > Subject: Re: Is there any other way to load the index beside using "http" > connection? > > You can

Re: Is there any other way to load the index beside using "http" connection?

2009-07-02 Thread Glen Newton
ster? > > You mentioned about LuSql, I am not familiar with that. Can you provide us > the docs or something? Again I am not the database Guys, I am only the solr > Guy. The database we have is a different box than Solr master and both are > running linux(RedHat). > > Tha

Re: Is there any other way to load the index beside using "http" connection?

2009-07-01 Thread Glen Newton
You can directly load to the backend Lucene using LuSql[1]. It is faster than Solr, sometimes as much as an order of magnitude faster. Disclosure: I am the author of LuSql -Glen http://zzzoot.blogspot.com/ [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/7/1 Francis Y

OPI: Article on Sunspot

2009-06-03 Thread Glen Newton
"Sunspot: A Solr-Powered Search Engine for Ruby" http://www.linux-mag.com/id/7341 glen http://zzzoot.blogspot.com/

Re: recommendation for document store to use alongside Solr?

2009-05-26 Thread Glen Newton
han if the document is accessed 99% of the time it is in the search result. I think you could do this with 2 cores in Solr, if I understand Solr correctly. I have also had good experience with BDB for (non-networked) document storage. Glen Newton http://zzzoot.blogspot.com/ 2009/5/26 Peter Keane

Re: Advice on custom DIH or other solutions: LuSql

2009-04-29 Thread Glen Newton
The next version of LuSql[1] supports solutions for this kind of issue: reading from JDBC (which may include a long and complex query) and then writing the results to a single (flattened) JDBC table that can subsequently be the source table for Solr. This might be helpful for your particular issue.

Re: DataImportHandler Questions-Load data in parallel and temp tables

2009-04-28 Thread Glen Newton
Amit, You might want to take a look at LuSql[1] and see if it may be appropriate for the issues you have. thanks, Glen [1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/4/27 Amit Nithian : > All, > I have a few questions regarding the data import handler. We have some

Re: Using Solr to index a database

2009-04-20 Thread Glen Newton
You have not indicated how you wish to use the index (inside Solr or not). It is possible that LuSql might be a preferable alternative to Solr/DataImportHandler, depending on your requirements. LuSql: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql Disclaimer: I am the autho

Re: Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Glen Newton
s. It would need to be ported > to a different implementation of SolrServer (the base class), one that uses > java.net.URL. I suggest “JavaNetUrlHttpSolrServer”. > > ~ David Smiley > > > On 4/14/09 1:13 PM, "Glen Newton" wrote: > > I was wondering if those m

Using Solr from AppEngine application via SolrJ: any problematic issues?

2009-04-14 Thread Glen Newton
I was wondering if those more up on SolrJ internals could take a look if there were any serious gotchas with the AppEngine's Java urlfetch with respect to SolrJ. http://code.google.com/appengine/docs/java/urlfetch/overview.html "The URL must use the standard ports for HTTP (80) and HTTPS (443). Th

Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Glen Newton
't yet implemented work stealing] -glen 2009/4/9 Glen Newton : > For Solr / Lucene: > - use -XX:+AggressiveOpts > - If available, huge pages can help. See > http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html > I haven't yet followed-up with my Luce

Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Glen Newton
For Solr / Lucene: - use -XX:+AggressiveOpts - If available, huge pages can help. See http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html I haven't yet followed-up with my Lucene performance numbers using huge pages: it is 10-15% for large indexing jobs. For Lucene: - mu

Re: Using constants with DataImportHandler and MySQL ?

2009-04-08 Thread Glen Newton
In MySQL at least, you can achieve what I think you want by manipulating the SQL, like this: mysql> select "foo" as Constant1, id from Article limit 10; +---++ | Constant1 | id | +---++ | foo | 1 | | foo |

Re: jetty vs tomcat

2009-03-05 Thread Glen Newton
Performance comparison link: - "Jetty vs Tomcat: A Comparative Analysis". prepared by Greg Wilkins - May, 2008. http://www.webtide.com/choose/jetty.jsp 2009/3/5 Erik Hatcher : > That being said... I don't think there is a strong reason to go out of your > way to install Tomcat and do the addition

Re: public apology for company spam

2009-03-05 Thread Glen Newton
and your colleagues do not have infinite social capital, and hopefully you will have no reason to be forced to spend this capital in such an unfortunate manner in the future. :-) sincerely, Glen Newton 2009/3/5 Yonik Seeley : > This morning, an apparently over-zealous marketing firm, on behalf

Re: How to search the database tables using solr.

2009-03-04 Thread Glen Newton
Also take a look at LuSql: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql 2009/3/4 Shalin Shekhar Mangar : > On Wed, Mar 4, 2009 at 7:32 PM, Radha C. wrote: > >> Hi, >> >> I am working in a software concern. We are having some R&D base work like >> making use of solr search

Re: [ANN] Lucid Imagination

2009-01-26 Thread Glen Newton
Congrats & good-luck on this new endeavour! -Glen :-) 2009/1/26 Grant Ingersoll : > Hi Lucene and Solr users, > > As some of you may know, Yonik, Erik, Sami, Mark and I teamed up with > Marc Krellenstein to create a company to provide commercial > support (with SLAs), training, value-add compone

Re: Customizing Solr to handle Leading Wildcard queries

2009-01-15 Thread Glen Newton
If we are talking short single term fields (like a file field that has a single term like "foo.pdf") then do what the DBMS b-tree indexes did a long time ago: for every field you want a leading wildcard, insert it in reverse order. So field file:"foo.pdf" is also stored, indexed as reverseField:"f
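The reversed-field trick above can be illustrated without any Lucene dependency. This is a sketch only: the class and method names are made up for the example, and it covers just the string handling. At index time the term is stored a second time in reversed form; at query time a leading-wildcard pattern such as `*.pdf` is turned into a trailing-wildcard (prefix) pattern against the reversed field, which an inverted index can answer efficiently.

```java
public class ReverseField {
    // Index-time step: "foo.pdf" is also stored as "fdp.oof" in the reversed field.
    public static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // Query-time step: rewrite a leading wildcard ("*.pdf") into a prefix
    // pattern for the reversed field ("fdp.*").
    public static String toReversedPrefix(String leadingWildcard) {
        if (!leadingWildcard.startsWith("*")) {
            throw new IllegalArgumentException("expected a pattern starting with '*'");
        }
        return reverse(leadingWildcard.substring(1)) + "*";
    }

    public static void main(String[] args) {
        System.out.println(reverse("foo.pdf"));        // fdp.oof
        System.out.println(toReversedPrefix("*.pdf")); // fdp.*
    }
}
```

The cost is one extra indexed field per term, which is why the post limits the idea to short single-term fields.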

Re: emample for using SOLR for search against database tables

2008-12-23 Thread Glen Newton
Depending on your requirements, using Lucene directly instead of Solr might be appropriate. Even in a web environment. Not likely a popular statement on the Solr list, but one that you should consider. :-) -Glen 2008/12/23 Manupriya : > > Yes... At present I want SOLR to run within my standalone

Data Import Request Handler problem: Odd performance behaviour for large number of records

2008-12-18 Thread Glen Newton
Hello, I am using Solr 1.4 (solr-2008-11-19) with Lucene 2.4 dropped in instead of 2.9. I am indexing 500k records using the JDBC Data Import Request Handler. Config: Linux openSUSE 10.2 (X86-64) Dual core dual core 64bit Xeon 3GHz Dell blade 8GB RAM java version "1.6.0_07" Java(TM) SE Runtim

Re: Solr on Solaris

2008-12-05 Thread Glen Newton
When you are saying "application server" do you mean tomcat? If yes, I have allocated >8GB of heap to tomcat and it uses it all no problem (64 bit Intel/64 bit Java). -glen 2008/12/5 Jeryl Cook <[EMAIL PROTECTED]>: > your out of memory :). > > each instance of an application server you can techn

Help with Solr configuration for LuSql performance comparison

2008-12-01 Thread Glen Newton
Hello, I am putting together some performance comparisons of LuSql[1] and Solr's Data Import Request Handler[2], JdbcDataSource[3]. I want to make sure I am comparing apples with apples, so would appreciate the community helping me to make sure I am doing so. First, LuSql default uses Lucene's St

Re: range queries on string field with millions of values

2008-11-28 Thread Glen Newton
Hi Naomi, Try fixing your data. :-) No, really: 1 - Sort all of your call numbers using whatever sort makes sense to you. 2 - Assign them - in your sort order - sort keys that are floats, starting: 0.01 0.02 ... 1.01 1.02 ... 79,999.98 79,999.99 This should ap
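Steps 1 and 2 above might look like the following sketch. It makes several assumptions that are not in the post: the class name `SortKeys`, the use of `BigDecimal` rather than a binary float (to keep the 0.01 steps exact), and a caller-supplied `Comparator` standing in for "whatever sort makes sense to you". Each call number receives a numeric key in sort order, so a range over call numbers can then be expressed as a numeric range over the keys.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortKeys {
    private static final BigDecimal STEP = new BigDecimal("0.01");

    // Sort the call numbers with the given ordering, then assign keys
    // 0.01, 0.02, ... in that order. The map preserves the sorted order.
    public static Map<String, BigDecimal> assign(Collection<String> callNumbers,
                                                 Comparator<String> order) {
        List<String> sorted = new ArrayList<>(callNumbers);
        sorted.sort(order);
        Map<String, BigDecimal> keys = new LinkedHashMap<>();
        BigDecimal key = STEP;
        for (String callNumber : sorted) {
            keys.put(callNumber, key);
            key = key.add(STEP);
        }
        return keys;
    }
}
```

For example, `SortKeys.assign(List.of("QA76 .B3", "PS3 .A1"), Comparator.naturalOrder())` gives `PS3 .A1` the key 0.01 and `QA76 .B3` the key 0.02; a real call-number comparator would replace the natural string ordering.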

Lucene 2.3.1 vs 2.4 benchmarks using LuSql

2008-11-24 Thread Glen Newton
I have some simple indexing benchmarks comparing Lucene 2.3.1 with 2.4: http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html In the next couple of days I will be running benchmarks comparing Solr's DataImportHandler/JdbcDataSource indexing performance with LuSql and wil

Re: Solr schema Lucene's StandardAnalyser equivalent?

2008-11-19 Thread Glen Newton
ry.java > ./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java > > > Does that do it? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > From: Glen Newton <[EMAIL PROTECTED]>

Solr schema Lucene's StandardAnalyser equivalent?

2008-11-19 Thread Glen Newton
Hello, I am looking for the Solr schema equivalent to Lucene's StandardAnalyzer. Is it the Solr schema type:

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Glen Newton
s.apache.org/jira/browse/SOLR-853 > > On Tue, Nov 18, 2008 at 8:26 PM, Glen Newton <[EMAIL PROTECTED]> wrote: > >> Erik, >> >> Right now there is no real abstraction like DIH in LuSql. But as >> indicated in the TODO section of the documentation, I was plan

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-18 Thread Glen Newton
IH could borrow from? Or vice versa? > > Erik > > > On Nov 17, 2008, at 11:03 PM, Glen Newton wrote: >> >> That said, I am very interested in making LuSql useful to the Solr >> community as well as the broader Lucene community, so if any of you >> can off

Re: Software Announcement: LuSql: Database to Lucene indexing

2008-11-17 Thread Glen Newton
Hello, I'm Glen Newton, LuSql author. Thanks for the kind words about LuSql! :-) I have just joined the Solr list, and while knowing about Solr, I have not used it and have only limited technical knowledge of Solr. That said, I am very interested in making LuSql useful to the Solr comm