RE: Near Real Time
> > Further, without the NRT features present, what's the closest I can
> > expect to real time for the typical use case (obviously this will vary,
> > but the average deploy)? One hour? One minute? It seems like there are
> > a few hacks to get somewhat close. Thanks so much.
>
> Depends a lot on the nature of the requests and the size of the index,
> but one minute is often doable.
> On a large index that facets on many fields per request, one minute is
> probably still out of reach.

With no facets, what index size is considered, in general, out of reach for NRT? Is a 9GB index with 7 million records out of reach? How about 3GB with 3 million records? 3GB with 800K records? This is for a 1-minute NRT setting.

Thanks.

-- George
Solr* != solr*
Hi Folks,

Can someone tell me what I might have set up wrong? After indexing my data, I can search just fine on, let's say, "sol*" but not on "Sol*" (note upper case 'S' vs. lower case 's'): I get 0 hits.

Here is my customized schema.xml setting:

Btw, "Solr", "solr", "sOlr", etc. work. It's a problem with wildcards only.

Thanks in advance.

-- George
schema.xml for CJK, German, French, etc.
Hi Folks,

Has anyone created a schema.xml for languages other than English? I'd like to see a working example, mainly for CJK, German and French. If you have one, can you share it?

To get me started, I created the following for German:

Will those filters work on German text?

Thanks.

-- George
RE: schema.xml for CJK, German, French, etc.
Thanks Erik!

Trouble is, I don't know those languages well enough to conclude that my setup is correct, especially for CJK. It's less problematic for European languages, but then again, should I be using those English filters with the German SnowballPorterFilterFactory? That is, will WordDelimiterFilterFactory work with a German filter? Etc.

It would be nice if folks shared their settings (generic for each language) and then we could add them to the Solr wiki.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 02, 2008 9:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: schema.xml for CJK, German, French, etc.
>
> On Jul 2, 2008, at 9:16 PM, George Aroush wrote:
> > Has anyone created a schema.xml for languages other than English?
>
> Indeed.
>
> > I'd like to see a working example, mainly for CJK, German and
> > French. If you have one, can you share it?
> >
> > To get me started, I created the following for German:
> >
> >   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >   <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >           generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >           catenateAll="0"/>
> >   <filter class="solr.LowerCaseFilterFactory"/>
> >   <filter class="solr.SnowballPorterFilterFactory" language="German"/>
> >
> > Will those filters work on German text?
>
> One tip that will help is visiting
> http://localhost:8983/solr/admin/analysis.jsp
> and testing it out to see that you're getting the tokenization
> that you desire on some sample text. Solr's analysis
> introspection is quite nice and easy to tinker with.
>
> Removing stop words before lower casing won't quite work
> though, as StopFilter is case-sensitive with all stop words
> generally lowercased, but other than relocating the
> StopFilterFactory in that chain it seems reasonable.
>
> As always, though, it depends on what you want to do with
> these languages to offer more concrete recommendations.
>
> Erik
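Following Erik's note about stop-word removal happening before lowercasing, a reordered chain might look like the sketch below. This is only a guess at a workable setup, assuming the stock Solr filter factories, and it still needs a German stopwords.txt to be useful:

```xml
<!-- Sketch only: StopFilterFactory relocated after LowerCaseFilterFactory,
     per Erik's suggestion, so case-sensitive stop-word matching always
     sees lowercased tokens. stopwords.txt would need German entries. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="German"/>
</analyzer>
```

As Erik says, the analysis.jsp page is the quickest way to confirm the chain behaves as intended on sample German text.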
RE: Solr* != solr*
Hi Erik and all,

I'm still trying to solve this issue and I'd like to know how others might have solved it in their clients. I can't modify Solr / Lucene code, and I'm using Solr 1.2.

What I have done is simple: given a user input, I break it into words and then analyze each word. Any word that contains a wildcard (* or ?) I lowercase. While the logic is simple, I'm not comfortable with it, because the word-breaking isn't based on the analyzer in use by Lucene; in my case, I can't tell which analyzer is used.

So my question is: did you run into this problem, and if so, how did you work around it? That is, is breaking on generic whitespace (independent of the analyzer in use) "good enough"?

Thanks.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 01, 2008 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr* != solr*
>
> George - wildcard expressions, in Lucene/Solr's QueryParser,
> are not analyzed. There is one trick in the API that isn't
> yet wired to Solr's configuration, and that is
> setLowercaseExpandedTerms(true). This would solve the Sol*
> issue because, when indexed, all terms for the "text" field
> are lowercased during analysis.
>
> A functional alternative, of course, is to have the client
> lowercase the query expression before sending it to Solr
> (careful, though - consider AND/OR/NOT).
>
> Erik
>
> On Jul 1, 2008, at 8:14 PM, George Aroush wrote:
> > Hi Folks,
> >
> > Can someone tell me what I might have set up wrong? After
> > indexing my data, I can search just fine on, let's say,
> > "sol*" but not on "Sol*" (note upper case 'S' vs. lower
> > case 's'): I get 0 hits.
> >
> > Here is my customized schema.xml setting:
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > Btw, "Solr", "solr", "sOlr", etc. work. It's a problem with
> > wildcards only.
> >
> > Thanks in advance.
> >
> > -- George
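The client-side workaround George describes could be sketched as below. The class and method names are made up for illustration, and it assumes whitespace splitting is "good enough", which is exactly the caveat raised above (phrase queries and non-whitespace analyzers would need more care):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardLowercaser {

    // Lowercase only the whitespace-separated terms that contain a
    // wildcard (* or ?). Boolean operators (AND/OR/NOT) and plain
    // terms contain no wildcards, so they pass through unchanged;
    // e.g. "Sol* AND Lucene" becomes "sol* AND Lucene".
    public static String lowercaseWildcardTerms(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            boolean hasWildcard = word.indexOf('*') >= 0 || word.indexOf('?') >= 0;
            if (hasWildcard) {
                out.add(word.toLowerCase(Locale.ROOT));
            } else {
                out.add(word);
            }
        }
        return String.join(" ", out);
    }
}
```

This matches Erik's warning about AND/OR/NOT: since the operators themselves never contain a wildcard, they are left intact while wildcard terms are lowercased to match the index.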
Commit frequency
Hi Folks,

I'm trying to collect some data -- if you can share it -- about the commit frequency you have set on your index, and at what rate you found it acceptable. This is for a non-master/slave setup.

For my case, in a test environment, I have experimented with a 1-minute interval (each minute I commit anywhere between 0 and 10 new documents, and 0 and 10 updated documents). While the commit is ongoing, I'm also searching the index. For this experiment, my index size is about 3.5 GB, and I have about 1.2 million documents. My experiment was done on a Windows 2003 server, with 4 GB RAM and 2x 3 GHz Xeon CPUs.

So, if you can share your setup, at least the commit frequency, I would appreciate it. What I'm trying to get out of this is the shortest commit interval that Solr can handle.

Regards,

-- George
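For comparison, Solr can also be left to trigger commits itself via autocommit in solrconfig.xml, rather than the client committing on a timer. A hedged sketch mirroring the 1-minute experiment above (which elements are supported varies by Solr version, so check your version's example solrconfig.xml before relying on this):

```xml
<!-- Illustrative only: ask Solr to auto-commit after 60 seconds or
     10 pending documents, whichever comes first. Verify the exact
     elements against your Solr version's example solrconfig.xml. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10</maxDocs>
    <maxTime>60000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```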
RE: multiple indices
> > I was going through some old emails on this topic. Rafael Rossini
> > figured out how to run multiple indices on a single instance of
> > Jetty, but it has to be Jetty Plus. I guess Jetty doesn't allow
> > this? I suppose I can add additional jars and make it work, but I
> > haven't tried that. It'll always be much safer/simpler/less playing
> > around if a feature is available out of the box.
>
> The example that comes with Solr is meant to be a starting
> point for users. It is a relatively functional and
> well-commented example, and its config files are pretty much
> the canonical documentation for Solr config, and for many
> people they can modify it for their own production use --
> but it is still just an example application.
>
> By the time people want to do expert-level activities with
> Solr (multi-index falls into that category), they should be
> able to configure their own servlet container, whether it be
> Jetty Plus, Tomcat, Resin, etc.

Does this mean Solr 1.2 supports MultiSearcher?

-- George
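As a concrete illustration of the "configure your own servlet container" route, one commonly described approach under Tomcat is to deploy the Solr WAR once per index, with each context pointing at its own solr/home via JNDI. The paths and filenames below are made up for the example:

```xml
<!-- Hypothetical Tomcat context fragment, e.g. saved as
     conf/Catalina/localhost/solr1.xml: each deployed context gets
     its own Solr home directory, giving one independent index per
     webapp. All paths here are examples only. -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home1" override="true"/>
</Context>
```

A second fragment (say, solr2.xml pointing at /opt/solr/home2) would stand up a second, fully separate index on the same Tomcat instance.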
Solr and WebSphere 6.1
Hi folks,

Has anyone managed to get Solr 1.2 to run under WebSphere 6.1? If so, can you share your experience: what configuration, settings, etc. you had to do? Someone asked this question earlier this month, but I don't see that anyone followed up -- so I'm asking again, since I have this need too.

Thanks.

-- George