Yonik/Erick,
We are building a custom search which is to be done in 2 parts executed at
different points of time. In the first step, we want to tokenize
the information and store it; we then want to retrieve it at a later point of
time for further processing and then store it back into the
Hi All,
I understand that a leading wildcard search is not allowed as it is a very
costly operation. There is an issue logged for it (
http://issues.apache.org/jira/browse/SOLR-218). Is there any other way of
enabling leading wildcards apart from doing it in code by calling
QueryParser.setAl
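One workaround sometimes used instead of enabling leading wildcards is to index a reversed copy of each token, so that a leading-wildcard query becomes an ordinary (cheap) prefix query. Below is a minimal Python sketch of the idea only; the function names are hypothetical, and the real Lucene/Solr implementation would live in a Java analyzer chain, not in code like this:

```python
# Sketch of the reversed-token trick for leading wildcards.
# Assumption: a query like "*tion" is expensive as a leading wildcard,
# but cheap if every token is also indexed reversed, turning it into
# the prefix query "noit*".

def index_tokens(tokens):
    """Build a sorted list of reversed tokens (a stand-in for a term index)."""
    return sorted(t[::-1] for t in tokens)

def leading_wildcard_match(reversed_index, pattern):
    """Match '*suffix' patterns by prefix-scanning the reversed index."""
    assert pattern.startswith("*")
    prefix = pattern[1:][::-1]          # "*tion" -> "noit"
    return sorted(t[::-1] for t in reversed_index if t.startswith(prefix))

idx = index_tokens(["station", "nation", "native", "index"])
print(leading_wildcard_match(idx, "*tion"))   # ['nation', 'station']
```

The cost is a second copy of the terms in the index, traded for avoiding a full term scan at query time.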
org/ is a separate world language database
> project. I found it at the bottom of the WordNet wikipedia page. Thanks
> for starting me on the search!
>
> Lance
>
> -Original Message-
> From: Eswar K [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 26, 2007 6:50 PM
Analyzer is slower, for example. I
> recently indexed ca. 20MM large docs on an 8-core, 8 GB RAM box in 10 hours -
> 550 docs/second. No CJK, just English.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>
(or know) that
> Google does on East Asian text? I don't think you can treat the three
> languages in the same way here. Japanese has multi-morphemic words,
> but Chinese doesn't really.
>
> jds
>
> On Nov 27, 2007 11:54 AM, Eswar K <[EMAIL PROTECTED]> wrote:
Is there any specific reason why the CJK analyzers in Solr were chosen to be
n-gram based instead of a morphological analyzer of the kind implemented at
Google, which is considered to be more effective than the n-gram ones?
Regards,
Eswar
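For reference, the n-gram side of this comparison is simple to sketch. The following Python illustration shows 2-gram (bigram) character tokenization, which is roughly the core of a bigram-based CJK tokenizer; the real CJKAnalyzer is Java and also handles mixed Latin/CJK text, which this sketch ignores:

```python
def cjk_bigrams(text):
    """Emit overlapping character bigrams, the core of a 2-gram CJK tokenizer."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(cjk_bigrams("北京大学"))  # ['北京', '京大', '大学']
```

A morphological analyzer would instead try to segment the same string into dictionary words (here, 北京 and 大学), at the cost of needing a dictionary and language model; the bigram approach needs neither but produces noisier tokens like 京大.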
On Nov 27, 2007 7:57 AM, Eswar K <[EM
ng your own analyses.
>
> http://en.wikipedia.org/wiki/WordNet
> http://wordnet.princeton.edu/
>
> Lance
>
> -Original Message-
> From: Eswar K [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 26, 2007 6:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re:
yword,
Where a plain keyword search will fail if there is no exact match, this
algorithm will often return relevant documents that don't contain the keyword
at all.
- Eswar
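A crude way to see how a search can return documents lacking the query keyword is to expand the query with terms that co-occur with it elsewhere in the corpus. The Python sketch below illustrates only that intuition; it is not LSI (which derives its associations from an SVD of the term-document matrix rather than raw co-occurrence), and the corpus and function names are invented for the example:

```python
# Toy illustration: a document with no exact keyword match can still be
# returned if the query is expanded with co-occurring terms.
docs = {
    "d1": ["car", "engine", "wheel"],
    "d2": ["automobile", "engine"],
    "d3": ["banana", "fruit"],
}

def cooccurring(term):
    """Terms that appear in the same document as `term`."""
    related = set()
    for words in docs.values():
        if term in words:
            related.update(words)
    return related

def search(term):
    expanded = cooccurring(term) | {term}
    return sorted(d for d, words in docs.items() if expanded & set(words))

print(search("car"))   # ['d1', 'd2'] -- d2 matches without containing "car"
```

Here "car" co-occurs with "engine" in d1, so d2 (which mentions "engine" but never "car") is returned, whereas a plain keyword match would only find d1.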
On Nov 27, 2007 7:51 AM, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
>
> On Nov 26, 2007, at 6:06 PM, E
> On Nov 27, 2007 10:01 AM, Eswar K <[EMAIL PROTECTED]> wrote:
>
> > What is the performance of these CJK analyzers (one in lucene and
> hylanda
> > )?
> > We would potentially be indexing millions of documents.
> >
> > James,
> >
> > We w
esults using LSI?
>
> I suppose if someone said they had a patch for Lucene/Solr that
> implemented it, we could ask on legal-discuss for advice.
>
> -Grant
>
> On Nov 26, 2007, at 1:13 PM, Eswar K wrote:
>
> > I was just searching for info on LSA and came across Se
AIL PROTECTED]> wrote:
> I don't think NGram is a good method for Chinese.
>
> CJKAnalyzer of Lucene is 2-Gram.
>
> Eswar K:
> if it is a Chinese analyzer, I recommend hylanda (www.hylanda.com); it is
> the best Chinese analyzer, and it is not free.
> if you want a free Chinese an
Hoss,
Thanks a lot. Will look into it.
Regards,
Eswar
On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Does Solr come with Language analyzers for CJK? If not, can you please
> : direct me to some good CJK analyzers?
>
> Lucene has a CJKTokenizer and CJKAnalyzer in the
_indexing) is
> > patented, so it is not likely to happen unless the authors donate the
> > patent to the ASF.
> >
> > -Grant
> >
> >
> >
> > On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
> >
> > > All,
> > >
> > &
Hi,
Does Solr come with Language analyzers for CJK? If not, can you please
direct me to some good CJK analyzers?
Regards,
Eswar
All,
Is there any plan to implement Latent Semantic Analysis as part of Solr
anytime in the near future?
Regards,
Eswar
t 20MM docs. Redo the search, and it's 1 ms (cached).
> This is without any load nor serious benchmarking, clearly.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> From: Eswar K <[EMAIL PROTECTED]>
> To
Hi Otis,
I understand that this is a slightly off-track question, but I am just curious
to know the performance of search on a 20 GB index file. What has been your
observation?
Regards,
Eswar
On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
> Mike is right about the occasional slo
All,
Is there any difference in the way any of Solr's features work on
Windows/Linux? Ideally there should not be, as it is a Java implementation. I
was looking at CollectionDistribution and its documentation (
http://wiki.apache.org/solr/CollectionDistribution). It appeared that it
uses rsync which i
have five servers to do that. We have a separate server
> for indexing and use the Solr distribution scripts.
>
> We have a relatively small index, about 250K docs.
>
> wunder
>
>
> On 11/19/07 8:48 PM, "Eswar K" <[EMAIL PROTECTED]> wrote:
>
> >
> Do you mean that you're expecting about 1000 QPS over an index with up
> to 20 million documents?
>
> --Matthew
>
> On Nov 19, 2007, at 6:00 AM, Eswar K wrote:
>
> > All,
> >
> > Can you give some information on this or at least let me know where
All,
Can you give some information on this, or at least let me know where I can
find this information if it is already listed anywhere.
Regards,
Eswar
On Nov 18, 2007 9:45 PM, Eswar K <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I understand that Solr can be used on different Linux fl
Kishore,
Solr has a SynonymFilterFactory which might be of use to you (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46)
Regards,
Eswar
On Nov 18, 2007 10:39 PM, Kishore AVK. Veleti <[EMAIL PROTECTED]>
wrote:
> Hi All,
>
> I am new to
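As a rough illustration of what synonym expansion does at analysis time, here is a Python sketch. The synonym table mimics what a synonyms.txt entry like "tv, television" would produce, but this is only a conceptual model; the real SynonymFilterFactory is configured in schema.xml and operates on the Lucene token stream:

```python
# Sketch of query-side synonym expansion: each token is replaced by its
# synonym group, so "tv" also matches documents that only say "television".
SYNONYMS = {
    "tv": {"tv", "television"},
    "television": {"tv", "television"},
}

def expand(tokens):
    """Expand each token into its (sorted) synonym group."""
    out = []
    for t in tokens:
        out.extend(sorted(SYNONYMS.get(t, {t})))
    return out

print(expand(["my", "tv", "set"]))  # ['my', 'television', 'tv', 'set']
```

Whether expansion is done at index time or query time is a trade-off: index-time expansion grows the index but keeps queries simple, while query-time expansion (as sketched) keeps the index small.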
Is there any plan to implement that feature in the upcoming releases?
Regards,
Eswar
On Nov 18, 2007 9:35 PM, Stuart Sierra <[EMAIL PROTECTED]> wrote:
> On Nov 18, 2007 10:50 AM, Eswar K <[EMAIL PROTECTED]> wrote:
> > We have a scenario, where we want to find out document
Hi,
I understand that Solr can be used on different Linux flavors. Is there any
preferred flavor (like Red Hat, Ubuntu, etc.)?
Also, what kind of hardware configuration (processors, RAM, etc.) would be
best suited for the install?
We expect to load it with millions of documents (varying from 2 -
We have a scenario where we want to find documents which are similar in
content. To elaborate a little more on what we mean here, let's take an
example.
This email chain in which we are interacting can best be used to illustrate
the concept of near-dupes (We are not getti