RE: Near Real Time
> > Further, without the NRT features present, what's the closest I can
> > expect to real time for the typical use case (obviously this will vary,
> > but the average deploy)? One hour? One minute? It seems like there are
> > a few hacks to get somewhat close. Thanks so much.
>
> Depends a lot on the nature of the requests and the size of the index,
> but one minute is often doable.
> On a large index that facets on many fields per request, one minute is
> probably still out of reach.

With no facets, what index size is considered, in general, out of reach for NRT? Is a 9GB index with 7 million records out of reach? How about 3GB with 3 million records? 3GB with 800K records? This is for a 1-minute NRT setting.

Thanks.

-- George
Solr* != solr*
Hi Folks,

Can someone tell me what I might have set up wrong? After indexing my data, I can search just fine on, let's say, "sol*" but not on "Sol*" (note upper case 'S' vs. lower case 's'): I get 0 hits.

Here is my customized schema.xml setting:

Btw, "Solr", "solr", "sOlr", etc. work. It's a problem with wildcards only.

Thanks in advance.

-- George
schema.xml for CJK, German, French, etc.
Hi Folks,

Has anyone created a schema.xml for languages other than English? I'd like to see a working example, mainly for CJK, German and French. If you have one, can you share it?

To get me started, I created the following for German:

Will those filters work on German text?

Thanks.

-- George
RE: schema.xml for CJK, German, French, etc.
Thanks Erik!

Trouble is, I don't know those languages well enough to conclude that my setup is correct, especially for CJK. It's less problematic for European languages, but then again, should I be using those English filters with the German SnowballPorterFilterFactory? That is, will WordDelimiterFilterFactory work with a German filter? Etc.

It would be nice if folks shared their settings (generic for each language) and then we could add them to the Solr wiki.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 02, 2008 9:40 PM
> To: solr-user@lucene.apache.org
> Subject: Re: schema.xml for CJK, German, French, etc.
>
> On Jul 2, 2008, at 9:16 PM, George Aroush wrote:
> > Has anyone created a schema.xml for languages other than English?
>
> Indeed.
>
> > I'd like to see a working example, mainly for CJK, German and
> > French. If you have one, can you share it?
> >
> > To get me started, I created the following for German:
> >
> >   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >   <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >           generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >           catenateAll="0"/>
> >   <filter class="solr.LowerCaseFilterFactory"/>
> >   <filter class="solr.SnowballPorterFilterFactory" language="German"/>
> >
> > Will those filters work on German text?
>
> One tip that will help is visiting
> http://localhost:8983/solr/admin/analysis.jsp
> and testing it out to see that you're getting the tokenization
> that you desire on some sample text. Solr's analysis
> introspection is quite nice and easy to tinker with.
>
> Removing stop words before lower casing won't quite work
> though, as StopFilter is case-sensitive with all stop words
> generally lowercased, but other than relocating the
> StopFilterFactory in that chain it seems reasonable.
>
> As always, though, it depends on what you want to do with
> these languages to offer more concrete recommendations.
>
> Erik
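Following Erik's note about stop-word removal happening before lowercasing, a reordered chain might look like the sketch below. This is only a guess at a workable setup, assuming the stock Solr filter factories, and it still needs a German stopwords.txt to be useful:

```xml
<!-- Sketch only: StopFilterFactory relocated after LowerCaseFilterFactory,
     per Erik's suggestion, so case-sensitive stop-word matching always
     sees lowercased tokens. stopwords.txt would need German entries. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="German"/>
</analyzer>
```

As Erik says, the analysis.jsp page is the quickest way to confirm the chain behaves as intended on sample German text.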
RE: Solr* != solr*
Hi Erik and all,

I'm still trying to solve this issue and I'd like to know how others might have solved it in their clients. I can't modify Solr / Lucene code, and I'm using Solr 1.2.

What I have done is simple: given a user input, I break it into words and then analyze each word. Any word that contains a wildcard (* or ?) I lowercase. While the logic is simple, I'm not comfortable with it, because the word-breaking isn't based on the analyzer in use by Lucene; in my case, I can't tell which analyzer is used.

So my question is: did you run into this problem, and if so, how did you work around it? That is, is breaking on generic whitespace (independent of the analyzer in use) "good enough"?

Thanks.

-- George

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 01, 2008 9:35 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr* != solr*
>
> George - wildcard expressions, in Lucene/Solr's QueryParser,
> are not analyzed. There is one trick in the API that isn't
> yet wired to Solr's configuration, and that is
> setLowercaseExpandedTerms(true). This would solve the Sol*
> issue because, when indexed, all terms for the "text" field
> are lowercased during analysis.
>
> A functional alternative, of course, is to have the client
> lowercase the query expression before sending it to Solr
> (careful, though - consider AND/OR/NOT).
>
> Erik
>
> On Jul 1, 2008, at 8:14 PM, George Aroush wrote:
> > Hi Folks,
> >
> > Can someone tell me what I might have set up wrong? After
> > indexing my data, I can search just fine on, let's say,
> > "sol*" but not on "Sol*" (note upper case 'S' vs. lower
> > case 's'): I get 0 hits.
> >
> > Here is my customized schema.xml setting:
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > Btw, "Solr", "solr", "sOlr", etc. work. It's a problem with
> > wildcards only.
> >
> > Thanks in advance.
> >
> > -- George
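The client-side workaround George describes could be sketched as below. The class and method names are made up for illustration, and it assumes whitespace splitting is "good enough", which is exactly the caveat raised above (phrase queries and non-whitespace analyzers would need more care):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardLowercaser {

    // Lowercase only the whitespace-separated terms that contain a
    // wildcard (* or ?). Boolean operators (AND/OR/NOT) and plain
    // terms contain no wildcards, so they pass through unchanged;
    // e.g. "Sol* AND Lucene" becomes "sol* AND Lucene".
    public static String lowercaseWildcardTerms(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            boolean hasWildcard = word.indexOf('*') >= 0 || word.indexOf('?') >= 0;
            if (hasWildcard) {
                out.add(word.toLowerCase(Locale.ROOT));
            } else {
                out.add(word);
            }
        }
        return String.join(" ", out);
    }
}
```

This matches Erik's warning about AND/OR/NOT: since the operators themselves never contain a wildcard, they are left intact while wildcard terms are lowercased to match the index.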
Commit frequency
Hi Folks,

I'm trying to collect some data -- if you can share it -- about the commit frequency you have set on your index, and at what rate you found it acceptable. This is for a non-master/slave setup.

For my case, in a test environment, I have experimented with a 1-minute interval (each minute I commit anywhere between 0 and 10 new documents, and 0 and 10 updated documents). While the commit is ongoing, I'm also searching the index. For this experiment, my index size is about 3.5 GB, and I have about 1.2 million documents. My experiment was done on a Windows 2003 server, with 4 GB RAM and 2x 3 GHz Xeon CPUs.

So, if you can share your setup, at least the commit frequency, I would appreciate it. What I'm trying to get out of this is the shortest commit interval that Solr can handle.

Regards,

-- George
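For comparison, Solr can also be left to trigger commits itself via autocommit in solrconfig.xml, rather than the client committing on a timer. A hedged sketch mirroring the 1-minute experiment above (which elements are supported varies by Solr version, so check your version's example solrconfig.xml before relying on this):

```xml
<!-- Illustrative only: ask Solr to auto-commit after 60 seconds or
     10 pending documents, whichever comes first. Verify the exact
     elements against your Solr version's example solrconfig.xml. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10</maxDocs>
    <maxTime>60000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```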
RE: multiple indices
> > I was going through some old emails on this topic. Rafael Rossini
> > figured out how to run multiple indices on a single instance of
> > Jetty, but it has to be Jetty Plus. I guess Jetty doesn't allow
> > this? I suppose I can add additional jars and make it work, but I
> > haven't tried that. It'll always be much safer/simpler/less playing
> > around if a feature is available out of the box.
>
> The example that comes with Solr is meant to be a starting
> point for users. It is a relatively functional and
> well-commented example, and its config files are pretty much
> the canonical documentation for Solr config, and for many
> people they can modify it for their own production use --
> but it is still just an example application.
>
> By the time people want to do expert-level activities with
> Solr (multi-index falls into that category), they should be
> able to configure their own servlet container, whether it be
> Jetty Plus, Tomcat, Resin, etc.

Does this mean Solr 1.2 supports MultiSearcher?

-- George
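As a concrete illustration of the "configure your own servlet container" route, one commonly described approach under Tomcat is to deploy the Solr WAR once per index, with each context pointing at its own solr/home via JNDI. The paths and filenames below are made up for the example:

```xml
<!-- Hypothetical Tomcat context fragment, e.g. saved as
     conf/Catalina/localhost/solr1.xml: each deployed context gets
     its own Solr home directory, giving one independent index per
     webapp. All paths here are examples only. -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home1" override="true"/>
</Context>
```

A second fragment (say, solr2.xml pointing at /opt/solr/home2) would stand up a second, fully separate index on the same Tomcat instance.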
Solr and WebSphere 6.1
Hi folks,

Has anyone managed to get Solr 1.2 to run under WebSphere 6.1? If so, can you share your experience: what configuration, settings, etc. you had to do? Someone asked this question earlier this month, but I don't see that anyone followed up -- so I'm asking again, since I have this need too.

Thanks.

-- George