from:"Rich Cariens"

Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

2015-09-23 Thread Rich Cariens

For what it's worth, we've had good luck using the ICUTokenizer and associated filters. A native Chinese speaker here at the office gave us an enthusiastic thumbs up on our Chinese search results. Your mileage may vary of course. On Wed, Sep 23, 2015 at 11:04 AM, Erick Erickson wrote: > In a wor

Re: Implementing custom analyzer for multi-language stemming

2014-08-06 Thread Rich Cariens

the language detection tool do its best and not sweat it. On Wed, Aug 6, 2014 at 12:11 AM, TK wrote: > > On 8/5/14, 8:36 AM, Rich Cariens wrote: > >> Of course this is extremely primitive and basic, but I think it would be >> possible to write a CharFilter or Tok

Re: Implementing custom analyzer for multi-language stemming

2014-08-05 Thread Rich Cariens

I've started a GitHub project to try out some cross-lingual analysis ideas ( https://github.com/whateverdood/cross-lingual-search). I haven't played over there for about 3 months, but plan on restarting work there shortly. In a nutshell, the interesting component ("SimplePolyGlotStemmingTokenFilter

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-30 Thread Rich Cariens

ept for OS system wide / per-process limits imposed) > you should be able to mmap up to the full 64 bit address space. > > Your virtual memory is unlimited (from "ulimit" output), so that's good. > > Mike McCandless > > http://blog.mikemccandless.com > > On

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-12 Thread Rich Cariens

11 at 8:58 PM, Lance Norskog wrote: > > > Do you need to use the compound format? > > > > On Thu, Sep 8, 2011 at 3:57 PM, Rich Cariens wrote: > > > >> I should add some more context: > >> > >> 1. the problem index included several cfs s

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-08 Thread Rich Cariens

open files ulimit. Do the MultiMMapIndexInput ByteBuffer arrays each consume a file handle/descriptor? On Thu, Sep 8, 2011 at 5:19 PM, Rich Cariens wrote: > FWiW I optimized the index down to a single segment and now I have no > trouble opening an MMapDirectory on that index, even though

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-08 Thread Rich Cariens

FWiW I optimized the index down to a single segment and now I have no trouble opening an MMapDirectory on that index, even though the 23G cfx segment file remains. On Thu, Sep 8, 2011 at 4:27 PM, Rich Cariens wrote: > Thanks for the response. "free -g" reports: > >

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-08 Thread Rich Cariens

iettecatte > My memory of this is a little rusty but isn't mmap also limited by mem + > swap on the box? What does 'free -g' report? > > François > > On Sep 7, 2011, at 12:25 PM, Rich Cariens wrote: > > > Ahoy ahoy! > > > > I've run

MMapDirectory failed to map a 23G compound index segment

2011-09-07 Thread Rich Cariens

Ahoy ahoy! I've run into the dreaded OOM error with MMapDirectory on a 23G cfs compound index segment file. The stack trace looks pretty much like every other trace I've found when searching for OOM & "map failed"[1]. My configuration follows: Solr 1.4.1/Lucene 2.9.3 (plus SOLR-1969

Re: SSD experience

2011-08-22 Thread Rich Cariens

ut a 40% boost in performance on our tests with no changes > > except the disk. > > > > On Mon, Aug 22, 2011 at 10:54 AM, Rich Cariens wrote: > > > >> Ahoy ahoy! > >> > >> Does anyone have any experiences or stories they can share with

SSD experience

2011-08-22 Thread Rich Cariens

Ahoy ahoy! Does anyone have any experiences or stories they can share with the list about how SSDs impacted search performance for better or worse? I found a Lucene SSD performance benchmark doc

Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Rich Cariens

We patched our 1.4.1 build with SOLR-1969 (making MMapDirectory configurable) and realized a 64% search performance boost on our Linux hosts. On Mon, Aug 8, 2011 at 10:05 AM, Dyer, James wrote: > If you want to try MMapDirectory with Solr 1.4, then

Re: document storage

2011-05-13 Thread Rich Cariens

We've decided to store the original document in both Solr and external repositories. This is to support the following: 1. highlighting - We need to mark-up the entire document with hit-terms. However if this was the only reason to store the text I'd seriously consider calling out to the e

Re: Guidance for event-driven indexing

2011-02-15 Thread Rich Cariens

and if you want to apply an UpdateChain, that would look like this: > > > > myPipeline > > > > See http://wiki.apache.org/solr/SolrRequestHandler for details > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com

Re: Guidance for event-driven indexing

2011-02-15 Thread Rich Cariens

o choose where :) > > A JMSUpdateHandler sounds heavy weight, but does not need to be, and might > be the logically best place for such a feature imo. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > On 14. feb. 2011, at 17.42, Rich Carien

Re: Guidance for event-driven indexing

2011-02-14 Thread Rich Cariens

itect > Cominvent AS - www.cominvent.com > > On 14. feb. 2011, at 16.53, Rich Cariens wrote: > > > Hello, > > > > I've built a system that receives JMS events containing links to docs > that I > > must download and index. Right now the JMS receiving, downloa

Guidance for event-driven indexing

2011-02-14 Thread Rich Cariens

Hello, I've built a system that receives JMS events containing links to docs that I must download and index. Right now the JMS receiving, downloading, and transformation into SolrInputDoc's happens in a separate JVM that then uses Solrj javabin HTTP POSTs to distribute these docs across many index

Re: Full text hit term highlighting

2010-12-05 Thread Rich Cariens

This works, as long as you don't need query > highlighting." Have you found a way around that, or have you decided not to > use highlighting after all? Or am I missing something? > ____ > From: Rich Cariens [richcari...@gmail.com]

Re: Full text hit term highlighting

2010-12-05 Thread Rich Cariens

e next of > any of the terms. > > On Sat, Dec 4, 2010 at 4:10 PM, Rich Cariens wrote: > > Anyone ever use Solr to present a view of a document with hit-terms > > highlighted within? Kind of like Google's cached <http://bit.ly/hgudWq> copies? > > > > > > -- > Lance Norskog > goks...@gmail.com >

Full text hit term highlighting

2010-12-04 Thread Rich Cariens

Anyone ever use Solr to present a view of a document with hit-terms highlighted within? Kind of like Google's cached copies?

Re: Optimize Index

2010-11-04 Thread Rich Cariens

For what it's worth, the Solr class instructor at the Lucene Revolution conference recommended *against* optimizing, and instead suggested just letting the merge factor do its job. On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote: > On 11/4/2010 7:22 AM, stockiii wrote: > >> how can i start an

Re: StreamingUpdateSolrServer hangs

2010-04-16 Thread Rich Cariens

I experienced the hang described with the Solr 1.4.0 build. Yonik - I also thought the streaming updater was blocking on commits but updates never resumed. To be honest I was in a bit of a rush to meet a deadline so after spending a day or so tinkering I bailed out and just wrote a component by h

Re: Index "transaction log" or equivalent?

2010-04-08 Thread Rich Cariens

Thanks Mark. That's sort of what I was thinking of doing. On Thu, Apr 8, 2010 at 10:33 AM, Mark Miller wrote: > On 04/08/2010 09:23 AM, Rich Cariens wrote: > >> Are there any best practices or built-in support for keeping track of >> what's >> been inde

Index "transaction log" or equivalent?

2010-04-08 Thread Rich Cariens

Are there any best practices or built-in support for keeping track of what's been indexed in a Solr application so as to support a full rebuild? I'm not indexing from a single source, but from many, sometimes arbitrary, sources including: 1. A document repository that fires events (containing

Re: an OR filter query

2010-04-04 Thread Rich Cariens

Why not just make your "mature:false" filter query a default value instead of always appended? I.e.: -snip- mature:false -snip- That way if someone wants mature items in their results the search client explicitly sets "fq=mature:*" or whatever. Would that work? On Sun, Apr 4, 2010 at
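The difference between a default and an appended filter query can be sketched with a toy model. This is not Solr code: `merge_params` and the variable names are hypothetical, and the snippet only assumes the usual behaviour that a default applies when the client omits the parameter, while an appended value is always added to whatever the client sent.

```python
def merge_params(request, defaults=None, appends=None):
    """Toy model (hypothetical) of combining request params with handler config."""
    merged = {name: list(values) for name, values in request.items()}
    for name, values in (defaults or {}).items():
        # A default applies only when the client did not send the parameter.
        merged.setdefault(name, list(values))
    for name, values in (appends or {}).items():
        # An appended value is always added, whatever the client sent.
        merged.setdefault(name, []).extend(values)
    return merged


mature_filter = {"fq": ["mature:false"]}

# Client sends no fq: the default filter is applied.
as_default_plain = merge_params({"q": ["solr"]}, defaults=mature_filter)

# Client sends its own fq: it replaces the default.
as_default_override = merge_params(
    {"q": ["solr"], "fq": ["mature:*"]}, defaults=mature_filter
)

# With an appended filter, the client's fq cannot remove mature:false.
as_append_override = merge_params(
    {"q": ["solr"], "fq": ["mature:*"]}, appends=mature_filter
)
```

In this model only the default variant lets a client opt in to mature items, which is the behaviour the suggestion above is after.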

Re: Experience with indexing billions of documents?

2010-04-02 Thread Rich Cariens

A colleague of mine is using native Lucene + some home-grown patches/optimizations to index over 13B small documents in a 32-shard environment, which is around 406M docs per shard. If there's a 2B doc id limitation in Lucene then I assume he's patched it himself. On Fri, Apr 2, 2010 at 1:17 PM,
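The per-shard figure can be checked with a few lines of arithmetic. The sketch below assumes an even spread of documents and that each shard is a separate Lucene index; Lucene document numbers are Java ints, so the roughly 2.1B ceiling applies to each index rather than to the whole collection. The variable names are illustrative only.

```python
total_docs = 13_000_000_000   # "over 13B small documents"
shards = 32

# Even split across shards (an assumption; real shards are rarely perfectly balanced).
docs_per_shard = total_docs // shards

# Largest value of a signed 32-bit Java int, the type Lucene uses for doc numbers.
java_int_max = 2**31 - 1

# How far each shard sits below the per-index ceiling.
headroom = java_int_max - docs_per_shard
fits_in_one_index = docs_per_shard <= java_int_max
```

Under these assumptions each shard holds about 406M documents, well below the per-index ceiling.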

26 matches
