Re: Chinese Phonetic search

2012-02-07 Thread Li Li
You can convert Chinese words to pinyin and use n-grams to search for phonetically similar words. On Wed, Feb 8, 2012 at 11:10 AM, Floyd Wu wrote: > Hi there, > > Has anyone here ever implemented phonetic search, especially with > Chinese (traditional/simplified), using SOLR or Lucene? > > Please share some
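A minimal sketch of the pinyin-plus-n-gram idea, outside of Solr. The tiny `PINYIN` table is hand-written and purely illustrative (a real setup would need a proper pinyin converter), and the function names and the 0.5 threshold are assumptions, not anything from the thread.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of `text`, ignoring spaces."""
    s = text.replace(" ", "").lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard similarity between the n-gram sets of two strings."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


# Hypothetical lookup table: word -> toneless pinyin (illustration only).
PINYIN = {
    "北京": "beijing",
    "背景": "beijing",
    "南京": "nanjing",
    "上海": "shanghai",
}


def phonetic_matches(query, threshold=0.5):
    """Return other known words whose pinyin n-grams overlap enough with the query's."""
    q = PINYIN[query]
    return sorted(
        word
        for word, pinyin in PINYIN.items()
        if word != query and ngram_similarity(q, pinyin) >= threshold
    )
```

With this table, `phonetic_matches("北京")` keeps only the homophone 背景; lowering the threshold would also admit partial matches such as 南京.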

Re: Typical Cache Values

2012-02-07 Thread Pranav Prakash
> > * > * > This is not unusual, but there's also not much reason to give this much > memory in your case. This is the cache that is hit when a user pages > through result set. Your numbers would seem to indicate one of two things: > 1> your window is smaller than 2 pages, see solrconfig.xml, >

Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-07 Thread Lance Norskog
Experience has shown that it is much faster to run Solr with a small amount of memory and let the rest of the RAM be used by the operating system "disk cache". That is, the OS is very good at keeping the right disk blocks in memory, much better than Solr. How much RAM is in the server and how much

How to use nested query in fq?

2012-02-07 Thread Yandong Yao
Hi Guys, I am using Solr 3.5, and would like to use a fq like 'getField(getDoc(uuid:workspace_${workspaceId})), "isPublic"):true? - workspace_${workspaceId}: workspaceId is indexed field. - getDoc(uuid:concat("workspace_", workspaceId): return the document whose uuid is "workspace_${workspaceI

Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Koji Sekiguchi
It seems like a bug to me. Can you open a ticket? Thank you. Koji Sekiguchi from iPhone On 2012/02/08, at 13:32, Shyam Bhaskaran wrote: > Hi Koji, > > Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200 I > see improvements; below is the highlighted value > > "The synthesis t

RE: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Shyam Bhaskaran
Hi Koji, Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200 I see improvements; below is the highlighted value "The synthesis tool only supports the resolution functions for std_logic and std_logic_vector." But in other cases I also see that some of the words break in

struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-07 Thread geeky2
hello all, i am struggling with getting solr.WordDelimiterFilterFactory to behave as is indicated in the solr book (Smiley) on page 54. the example in the book reads like this: >> Here is an example exercising all options: WiFi-802.11b to Wi, Fi, WiFi, 802, 11, 80211, b, WiFi80211b << essentia

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Chris Hostetter
: This all seems a bit too much work for such a real-world scenario? You haven't really told us what your scenario is. You said you want to split tokens on whitespace, full-stop (aka: period) and comma only, but then in response to some suggestions you added comments other things that you neve

Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Koji Sekiguchi
(12/02/08 1:54), Shyam Bhaskaran wrote: Hi Koji, I have tried using hl.bs.type=SENTENCE and still no improvement. We are storing PDF-extracted content in the field which has termVectors enabled. For example, the field contains the following data extracted from PDF "User-defined resolution function

URI Encoding with Solr and Weblogic

2012-02-07 Thread Elisabeth Adler
Hi, I try to get Solr 3.3.0 to process Arabic search requests using its admin interface. I have successfully managed to set it up on Tomcat using the URIEncoding attribute but fail miserably on WebLogic 10. Invoking the URL http://localhost:7012/solr/select/?q=? returns the XML below:

RE: Multi word synonyms

2012-02-07 Thread Zac Smith
Are you able to explain how I would create another field to fit my scenario? -Original Message- From: O. Klein [mailto:kl...@octoweb.nl] Sent: Tuesday, February 07, 2012 1:28 PM To: solr-user@lucene.apache.org Subject: RE: Multi word synonyms Well, if you want both multi word and single

RE: Multi word synonyms

2012-02-07 Thread O. Klein
Well, if you want both multi word and single words I guess you will have to create another field :) Or make queries like you suggested. -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3724009.html Sent from the Solr - User mailing list archive at

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Erik Hatcher
A custom tokenizer/tokenfilter could set the position increment when a newline comes through as well. Erik On Feb 7, 2012, at 15:28, Erick Erickson wrote: > Well, this is a common approach. Someone has to split up the > input as "sentences" (whatever they are). Putting them in multi-valued

RE: Multi word synonyms

2012-02-07 Thread Zac Smith
It doesn't seem to do it for me. My field type is: I am using edism

I want to specify multiple facet prefixes per field

2012-02-07 Thread Yuhao
I simulated a hierarchical faceting browsing scheme using facet.prefix.  However, it seems there can only be one facet.prefix per field.  For OR queries, the browsing scheme requires multiple facet prefixes.  For example: fq=facet1:term1 OR facet1:term2 OR facet1:term3 Something like the above

RE: Multi word synonyms

2012-02-07 Thread O. Klein
Isn't that what autoGeneratePhraseQueries="true" is for? -- View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3723886.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Phonetic search and matching

2012-02-07 Thread Erick Erickson
Yes, you could do that. I guess numbers will give you trouble under all circumstances. You may be able to do something like search against your non-phonetic field with higher boosts to preferentially do those matches. Best Erick On Tue, Feb 7, 2012 at 2:30 PM, Dirk Högemann wrote: > Thanks Eri

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Erick Erickson
Well, this is a common approach. Someone has to split up the input as "sentences" (whatever they are). Putting them in multi-valued fields is trivial. Then you confine things to within sentences, then you start searching phrases with a slop less than your incrementGap... Best Erick On Tue, Feb 7
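A rough sketch of why this approach keeps phrase matches inside one value. It splits the input on newlines client-side and then models token positions with a gap between values, in the spirit of Solr's `positionIncrementGap`. The position arithmetic is a simplified model for illustration, not Lucene's actual code, and the function names and the gap of 100 are assumptions.

```python
def split_lines(text):
    """Split a document into non-empty lines, one per multi-valued field value."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def token_positions(values, position_increment_gap=100):
    """Assign a position to every token, adding a gap between consecutive values."""
    positions = []
    pos = 0
    for index, value in enumerate(values):
        if index > 0:
            pos += position_increment_gap  # jump ahead at each value boundary
        for token in value.split():
            positions.append((token, pos))
            pos += 1
    return positions


def phrase_matches(positions, first, second, slop=0):
    """True if `second` follows `first` within `slop` extra positions."""
    firsts = [p for token, p in positions if token == first]
    seconds = [p for token, p in positions if token == second]
    return any(0 < s - f <= 1 + slop for f in firsts for s in seconds)
```

For the two-line example from this thread, "fluent" and "german" end up about a hundred positions apart, so the phrase "fluent german" only matches if the slop is at least as large as the gap.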

RE: Multi word synonyms

2012-02-07 Thread Zac Smith
I suppose I could translate every user query to include the term with quotes. e.g. if someone searches for stock syrup I send a query like: q=stock syrup OR "stock syrup" Seems like a bit of a hack though, is there a better way of doing this? Zac -Original Message- From: Zac Smith Sent
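The query rewrite described here could be done client-side along these lines. This is only a sketch: the helper name `with_phrase_variant` is made up, and dropping double quotes from the user input is a simplifying assumption rather than full query escaping.

```python
from urllib.parse import urlencode


def with_phrase_variant(user_query):
    """Return the query plus a quoted-phrase variant, e.g. a b OR "a b"."""
    terms = user_query.replace('"', " ").split()
    if len(terms) < 2:
        # A single term has no separate phrase form.
        return " ".join(terms)
    plain = " ".join(terms)
    return f'{plain} OR "{plain}"'


def select_params(user_query):
    """URL-encode the rewritten query as a q= parameter."""
    return urlencode({"q": with_phrase_variant(user_query)})
```

For the example in the message, `with_phrase_variant("stock syrup")` yields `stock syrup OR "stock syrup"`.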

Missing search result...

2012-02-07 Thread Tim Hibbs
Hi, all... I have a small problem retrieving the full set of query responses I need and would appreciate any help. I have a query string as follows: +((Title:"sales") (+Title:sales) (TOC:"sales") (+TOC:sales) (Keywords:"sales") (+Keywords:sales) (text:"sales") (+text:sales) (sales)) +(RepType:"W

Re: Phonetic search and matching

2012-02-07 Thread Dirk Högemann
Thanks Erick. In the first place we thought of removing numbers with a pattern filter. Setting inject to false will have the "same" effect. If we want to be able to search for numbers in the content this solution will not work, but another field without phonetic filtering and searching in both fields

Re: Commit call - ReadTimeoutException -> usage scenario for big update requests and the ioexception case

2012-02-07 Thread Torsten Krah
On 07.02.2012 15:12, Erick Erickson wrote: Right, I suspect you're hitting merges. Guess so. How often are you committing? One time, after all work is done. In other words, why are you committing explicitly? It's often better to use commitWithin on the add command and just let Solr do i

Re: solrcore.properties

2012-02-07 Thread darul
Walter Underwood wrote > > Looking at SOLR-1335 and the wiki, I'm not quite sure of the final > behavior for this. > > These properties are per-core, and not visible in other cores, right? > > yes it is. Walter Underwood wrote > > > Are variables substituted in solr.xml, so I can swap in

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Robert Brown
This all seems a bit too much work for such a real-world scenario? --- IntelCompute Web Design & Local Online Marketing http://www.intelcompute.com On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan wrote: >> I'm still finding matches across >> newlines >> >> index... >> >> i am fluent >>

Re: Realtime profile data

2012-02-07 Thread Pawel Rog
Thank you. I'll try NRT and some post-filter :) On Tue, Feb 7, 2012 at 3:09 PM, Erick Erickson wrote: > You have several options: > 1> if you can go to trunk (bleeding edge, I admit), you can >     get into the near real time (NRT) stuff. > 2> You could maintain essentially a post-filter step wh

RE: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Shyam Bhaskaran
Hi Koji, I have tried using hl.bs.type=SENTENCE and still no improvement. We are storing PDF-extracted content in the field which has termVectors enabled. For example, the field contains the following data extracted from PDF "User-defined resolution functions. The synthesis tool only supports the r

Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Koji Sekiguchi
(12/02/08 0:50), Shyam Bhaskaran wrote: Hi, We are using Solr 4.0 along with FVH and there is an issue we are facing while highlighting. We want the highlighted search result to start with the beginning of the sentence and need help to get this done. As of now this i

Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Shyam Bhaskaran
Hi, We are using Solr 4.0 along with FVH and there is an issue we are facing while highlighting. We want the highlighted search result to start with the beginning of the sentence and need help to get this done. As of now this is not happening and the highlighted output

Re: Typical Cache Values

2012-02-07 Thread Erick Erickson
See below... On Tue, Feb 7, 2012 at 8:21 AM, Pranav Prakash wrote: > The hit ratios of my caches seem to be pretty low. Here they > are. What are typical values in your production setup? What are some of > the things that can be done to improve the ratios? > > queryResultCache > >

Re: Commit call - ReadTimeoutException -> usage scenario for big update requests and the ioexception case

2012-02-07 Thread Erick Erickson
Right, I suspect you're hitting merges. How often are you committing? In other words, why are you committing explicitly? It's often better to use commitWithin on the add command and just let Solr do its work without explicitly committing. Going forward, this is fixed in trunk by the DocumentWriter
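As a sketch of the commitWithin suggestion: the XML update message accepts a `commitWithin` attribute (in milliseconds) on the `<add>` element, so the client never has to send an explicit commit. The helper name, the field names and the 10-second window below are illustrative assumptions, and posting the message to Solr is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, commit_within_ms):
    """Build a Solr XML <add> message that asks for a commit within the given time."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


message = build_add_message([{"id": 1, "title": "hello"}], commit_within_ms=10000)
```

The resulting string would be sent as the body of the update request in place of an add followed by a separate commit.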

Re: Realtime profile data

2012-02-07 Thread Erick Erickson
You have several options: 1> if you can go to trunk (bleeding edge, I admit), you can get into the near real time (NRT) stuff. 2> You could maintain essentially a post-filter step where your app maintains a list of deleted messages and removes them from the response. This will cause
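Option 2, the application-side post-filter, might look roughly like this. It assumes the response documents are plain dictionaries with an `id` key and that the application keeps the deleted ids in a set; both are assumptions made for illustration.

```python
def filter_deleted(docs, deleted_ids, id_field="id"):
    """Drop documents whose id is in the application's set of deleted ids."""
    return [doc for doc in docs if doc.get(id_field) not in deleted_ids]


# The application records deletions immediately, before the index catches up.
deleted_ids = {"msg-2"}
response_docs = [
    {"id": "msg-1", "subject": "first"},
    {"id": "msg-2", "subject": "second"},
    {"id": "msg-3", "subject": "third"},
]
visible = filter_deleted(response_docs, deleted_ids)
```

One consequence of filtering after the fact is that a page of results can come back shorter than the number of rows requested.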

Re: Improving performance for SOLR geo queries?

2012-02-07 Thread Erick Erickson
So the obvious question is "what is your performance like without the distance filters?" Without that knowledge, we have no clue whether the modifications you've made had any hope of speeding up your response times. As for the docs, any improvements you'd like to contribute would be happily re

Re: Phonetic search and matching

2012-02-07 Thread Erick Erickson
What happens if you do NOT inject? Setting inject="false" stores only the phonetic reduction, not the original text. In that case your false match on "13" would go away. Not sure what that means for the rest of your app though. Best Erick On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann wrote:

Re: Symbols in synonyms

2012-02-07 Thread Erick Erickson
You're probably looking at a custom tokenizer and/or filter chain here. Or at least creatively combining the ones that exist. The admin/analysis page will be your friend. Even if you define these as synonyms, the rest of the analysis chain may break them up so you really have to look at the effect

Re: Parallel indexing in Solr

2012-02-07 Thread Per Steffensen
You could try to isolate the bottleneck by testing the indexing speed from the local machine hosting Solr. Also tools like iostat or sar might give you more details about the disk side. Yes, I am doing different stuff to isolate the bottleneck. I'm also profiling the JVM. And I am using iostat, top a

Typical Cache Values

2012-02-07 Thread Pranav Prakash
The hit ratios of my caches seem to be pretty low. Here they are. What are typical values in your production setup? What are some of the things that can be done to improve the ratios? queryResultCache lookups : 3234602 hits : 496 hitratio : 0.00 inserts : 3234239 evictions : 323014

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Ahmet Arslan
> I'm still finding matches across > newlines > > index... > > i am fluent > german racing > > search... > > "fluent german" > > Any suggestions?  You can use a multiValued field for this. Split your document according to new line at client side. i am fluent german racing positionIncreme

Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Robert Brown
I'm still finding matches across newlines index... i am fluent german racing search... "fluent german" Any suggestions? I've currently got this in wdftypes.txt for WordDelimiterFilterFactory \u000A => ALPHANUM \u000B => ALPHANUM \u000C => ALPHANUM \u000D => ALPHANUM # \u000D\u000A => ALPHA

Re: indexing data on solr

2012-02-07 Thread alessio crisantemi
OK, I'll try. But I think: if I index a zip archive containing PDF files and then run a query on Solr, I see only the list of the PDF titles in my archive, but it can't search inside the individual documents. I read in the Tika documentation that "Package formats can contain multiple separate docu

Re: Parallel indexing in Solr

2012-02-07 Thread Sami Siren
On Mon, Feb 6, 2012 at 5:55 PM, Per Steffensen wrote: > Sami Siren wrote: > >> On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen >> wrote: >> >> >> >>> >>> Actually right now, I am trying to find out what my bottleneck is. The >>> setup >>> is more complex than I would bother you with, but basicall

more sql-like commands for solr

2012-02-07 Thread Li Li
hi all, we have used solr to provide searching service in many products. I found for each product, we have to do some configurations and query expressions. our users are not used to this. they are familiar with sql and they may describe like this: I want a query that can search books whose

external jar as processor

2012-02-07 Thread nagarjuna
Hi everybody, I have the following entities. I added the jar file to the WEB-INF/lib folder and I don't know how to specify the field names in schema.xml. Please help me, anybody. http://test.xxx.com"; appKey="qto9gjtI68pi7JRxVZ8Z" lastUpdate="${dataimporter.last_index_time}" /> http://abcs.xxx.com"; p