Re: Customizing results

2009-06-04 Thread Fergus McMenemie
> Generally a good idea, but be prepared to entertain requests that should
> also ask you to be able to perform the query using those aliases. I mean
> when you talk about something "similar" to aliases in SQL, those aliases can
> be used in SQL scripts in the where clause too.
>
> Cheers
> Avlesh
I am

Re: timeouts

2009-06-04 Thread James liu
Collins: I don't understand what you want to say.
--
regards
j.L (I live in Shanghai, China)

Re: using UpdateRequestProcessor from a custom analyzer

2009-06-04 Thread Shalin Shekhar Mangar
On Fri, Jun 5, 2009 at 5:54 AM, Kir4 wrote:
> Is it possible to create a custom analyzer (index time) that uses
> UpdateRequestProcessor to add new fields to posts, based on the tokens
> generated by the other analyzers that have been run (before my custom
> analyzer)?
No, UpdateRequestProces

Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jun 5, 2009 at 10:20 AM, Avlesh Singh wrote:
> Generally a good idea, but be prepared to entertain requests that should
> also ask you to be able to perform the query using those aliases. I mean
> when you talk about something "similar" to aliases in SQL, those aliases can
> be used in SQL

Re: Customizing results

2009-06-04 Thread Avlesh Singh
Generally a good idea, but be prepared to entertain requests that should also ask you to be able to perform the query using those aliases. I mean when you talk about something "similar" to aliases in SQL, those aliases can be used in SQL scripts in the where clause too. Cheers Avlesh 2009/6/5 Nob

Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Otis,
Is it a good idea to provide an aliasing feature for Solr, similar to SQL's 'as'? In SQL we can do
select location_da_dk as location
Solr could have
fl.alias=location_da_dk:location
--Noble
On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic wrote:
> Aha, so you really want to ren
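Until something like the proposed `fl.alias` exists, the renaming Kalyan asks for could be done client-side after the response is parsed. A minimal sketch, assuming each returned document is available as a plain `Map<String, Object>`; `FieldAliaser` and its `source:alias` spec format (borrowed from the proposal above) are hypothetical, not a Solr or SolrJ API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side stand-in for the proposed fl.alias parameter.
// Nothing here assumes Solr itself supports aliasing.
class FieldAliaser {
    // Parses "source:alias" pairs, e.g. "location_da_dk:location,title_da_dk:title".
    static Map<String, String> parseAliases(String spec) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            String[] parts = pair.trim().split(":", 2);
            if (parts.length == 2 && !parts[0].isEmpty() && !parts[1].isEmpty()) {
                aliases.put(parts[0], parts[1]);
            }
        }
        return aliases;
    }

    // Copies a returned document, renaming aliased fields and keeping all others as-is.
    static Map<String, Object> apply(Map<String, Object> doc, Map<String, String> aliases) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            renamed.put(aliases.getOrDefault(field.getKey(), field.getKey()), field.getValue());
        }
        return renamed;
    }
}
```

Applied to every document in the result, this lets the client read only `location`, whichever locale-specific field it came from. If an alias collides with a field that is also returned, the one appearing later in the document's field order wins.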

Re: how to do exact search with solrj

2009-06-04 Thread Avlesh Singh
And the field should be of type text, right, Otis? Does one still need those "anchors" if the type is string with the filters you suggested?
Cheers
Avlesh
On Fri, Jun 5, 2009 at 6:35 AM, Otis Gospodnetic wrote:
> I re-read your original request. Here is the recipe that should work:
> * Def

Re: Customizing results

2009-06-04 Thread Avlesh Singh
Nice suggestion, Noble! If you are using SolrJ, then this particular binding can be an answer to your question.
Cheers
Avlesh
2009/6/5 Noble Paul നോബിള്‍ नोब्ळ्
> How are you accessing Solr? SolrJ?
> does this help?
> https://issues.apache.org/jira/browse/SOLR-1129
> On Fri, Jun 5, 2009 at 3

Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
How are you accessing Solr? SolrJ? Does this help?
https://issues.apache.org/jira/browse/SOLR-1129
On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan wrote:
> Otis,
> With that solution, the client has to accept all type location fields
> (location_de_de, location_it_it). I want to copy t

Re: Determining Search Query Category

2009-06-04 Thread Walter Underwood
Can you analyze the logs to see which categories people choose for each query? When there are enough queries and a clear preference, you can highlight that choice.
wunder
On 6/4/09 9:21 PM, "Avlesh Singh" wrote:
> If you haven't already given this a thought, you may want to try out an
> auto-co
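The log-analysis suggestion can be sketched as a small aggregation over click logs: count which category users picked for each query, and only highlight a category once there are enough samples and one category clearly dominates. `CategoryPreference` and both thresholds (`minTotal`, `minShare`) are illustrative assumptions, not values from the thread:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical aggregator over (query, chosen category) pairs taken from logs.
class CategoryPreference {
    // normalized query -> (category -> number of times users picked it)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void record(String query, String chosenCategory) {
        counts.computeIfAbsent(normalize(query), k -> new HashMap<>())
              .merge(chosenCategory, 1, Integer::sum);
    }

    // Returns the dominant category only if at least minTotal choices were logged
    // and the top category holds at least minShare of them.
    Optional<String> preferred(String query, int minTotal, double minShare) {
        Map<String, Integer> byCategory = counts.get(normalize(query));
        if (byCategory == null) {
            return Optional.empty();
        }
        int total = 0;
        int bestCount = 0;
        String best = null;
        for (Map.Entry<String, Integer> e : byCategory.entrySet()) {
            total += e.getValue();
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        boolean clearPreference = total >= minTotal && (double) bestCount / total >= minShare;
        return clearPreference ? Optional.of(best) : Optional.empty();
    }

    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }
}
```

Requiring both a minimum sample size and a minimum share keeps a handful of early clicks from deciding the highlighted category.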

Re: Index Comma Separated numbers

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you try the NumberFormatTransformer?
On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai wrote:
> Hi, One of the fields to be indexed is price which is comma separated, e.g.,
> 12,034.00. How can I indexed it as a number?
> I am using DIH to pull the data. Thanks.
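The NumberFormatTransformer is configured in the DIH data config; the sketch below only illustrates the kind of locale-aware parsing involved, using the JDK's `java.text.NumberFormat`. It assumes US-style formatting ("," for thousands, "." for decimals); `PriceParser` is a hypothetical helper, not part of DIH:

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper: parses "12,034.00"-style prices, assuming Locale.US conventions.
class PriceParser {
    static double parsePrice(String raw) {
        String s = raw.trim();
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        ParsePosition pos = new ParsePosition(0);
        Number value = format.parse(s, pos);
        // parse() silently stops at the first unparseable character,
        // so insist that the whole string was consumed.
        if (value == null || pos.getIndex() != s.length()) {
            throw new IllegalArgumentException("Not a US-formatted number: " + raw);
        }
        return value.doubleValue();
    }
}
```

`PriceParser.parsePrice("12,034.00")` gives `12034.0`. The full-consumption check matters because a value such as "12,034.00 USD" would otherwise be read as 12034 without any error.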

Re: Determining Search Query Category

2009-06-04 Thread Avlesh Singh
If you haven't already given this a thought, you may want to try out an auto-complete feature, suggesting those categories upfront.
Cheers
Avlesh
On Fri, Jun 5, 2009 at 3:56 AM, ram_sj wrote:
> Hi,
> I have more than 20 categories for my search application. I'm interested in
> finding the c
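A toy version of the category auto-complete idea, done client-side against a fixed category list. `CategorySuggester` and the example categories are hypothetical; the sorted-map prefix lookup is just one simple way to do it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical prefix suggester over a small, fixed list of category names.
class CategorySuggester {
    // Lowercased name -> display name; sorted keys make every prefix a contiguous range.
    private final TreeMap<String, String> byKey = new TreeMap<>();

    CategorySuggester(List<String> categories) {
        for (String name : categories) {
            byKey.put(name.toLowerCase(Locale.ROOT), name);
        }
    }

    List<String> suggest(String typed, int limit) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> suggestions = new ArrayList<>();
        // tailMap starts at the first key >= prefix; stop once keys no longer share the prefix.
        for (Map.Entry<String, String> e : byKey.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix) || suggestions.size() >= limit) {
                break;
            }
            suggestions.add(e.getValue());
        }
        return suggestions;
    }
}
```

With categories "Restaurants", "Real Estate", "Retail" and "Hotels", typing "re" yields `[Real Estate, Restaurants, Retail]` in alphabetical order.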

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Jun 4, 2009 at 11:29 PM, Robert Purdy wrote:
> Thanks for the Good information :) Well I haven't had any evictions in any of
> the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77 in
> documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So
> in yo

Re: indexing Chinese language

2009-06-04 Thread James liu
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun wrote:
> Hi,
> When I index chinese content using chinese tokenizer and analyzer in solr
> 1.3 ,some of the chinese text files are getting indexed but others are not.
Are you sure your analyzer handles it well? If you're not sure, you can use the analyzer link in

Re: indexing Chinese language

2009-06-04 Thread James liu
First: you don't have to restart Solr. You can replace the old data with new data and tell Solr to use a new searcher; you can find something along these lines in the shell scripts that come with Solr. Second: you don't have to restart Solr; just keep the id the same. Example: old is id:1, title:hi; new is id:1, title:welcome. Just index the new data; it will

Re: how to do exact search with solrj

2009-06-04 Thread Otis Gospodnetic
I re-read your original request. Here is the recipe that should work:
* Define a new field type that:
  - uses KeywordTokenizer
  - uses LowerCaseFilter
* Make your field be of the above type.
* Use those begin/end anchor characters at index and search time.
I believe that should work. Please tr

Re: how to do exact search with solrj

2009-06-04 Thread Otis Gospodnetic
I don't think there is anything ready to be used in Solr (though it would be easy to add), but if you indexed your values with custom "beginning of string" and "end of string" anchors, you'd be able to get exact matching working. For example, convert "hello the world" to "$hello the world$" before i
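A toy model of the two approaches in this thread, without Lucene: `keywordMatches` mimics a KeywordTokenizer + LowerCaseFilter field (the whole value becomes one lowercased token), and `phraseMatches` mimics a phrase query on a tokenized field, with and without the `$` anchors from the example. The regex tokenizer is a crude stand-in for a real analyzer, and `ExactMatchDemo` is a made-up name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: no Lucene classes are used.
class ExactMatchDemo {
    // Anchor marker following the "$hello the world$" example.
    static final String ANCHOR = "$";

    // Wrap the raw value both before indexing and before building the phrase query.
    static String anchor(String value) {
        return ANCHOR + value + ANCHOR;
    }

    // KeywordTokenizer + LowerCaseFilter: the entire value is a single lowercased token.
    static boolean keywordMatches(String field, String query) {
        return field.toLowerCase(Locale.ROOT).equals(query.toLowerCase(Locale.ROOT));
    }

    // Crude tokenizer: lowercase, split on anything that is not a letter, digit or '$'.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}$]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Phrase match: the query tokens appear as one contiguous run in the field tokens.
    static boolean phraseMatches(String field, String query) {
        List<String> q = tokenize(query);
        return !q.isEmpty() && Collections.indexOfSubList(tokenize(field), q) >= 0;
    }
}
```

With plain phrase matching, "hello the world, Jack" matches the query "hello the world", which is the behavior reported in this thread. With anchors on both sides, the first and last query tokens become "$hello" and "world$", so they only line up with the very start and end of the stored value and the longer title no longer matches.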

Re: Determining Search Query Category

2009-06-04 Thread Otis Gospodnetic
Ram,
Typical queries are short, so they are hard to categorize using statistical approaches. Maybe categorization of queries would work with a custom set of rules applied to queries?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: ram_
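The rule-based idea could start as an ordered table of patterns mapped to categories, where the first matching rule wins. A minimal sketch; `QueryCategorizer` and the example rules are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical rule table for categorizing short queries.
class QueryCategorizer {
    // Insertion order matters: the first rule that matches wins.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    QueryCategorizer addRule(String regex, String category) {
        rules.put(Pattern.compile(regex, Pattern.CASE_INSENSITIVE), category);
        return this;
    }

    Optional<String> categorize(String query) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(query).find()) {
                return Optional.of(rule.getValue());
            }
        }
        return Optional.empty();
    }
}
```

For example, with a rule `"\\b(hotel|motel|inn)s?\\b"` mapped to "Hotels", `categorize("motels near the airport")` returns `Optional[Hotels]`; queries that match no rule fall back to an empty result, where the UI could still show the full category list.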

using UpdateRequestProcessor from a custom analyzer

2009-06-04 Thread Kir4
Is it possible to create a custom analyzer (index time) that uses UpdateRequestProcessor to add new fields to posts, based on the tokens generated by the other analyzers that have been run (before my custom analyzer)? The content of said fields must differ from post to post based on the tokens ext

Re: Questions regarding IT search solution

2009-06-04 Thread Jeff Hammerbacher
Hey,
Your system sounds similar to the work done by Stu Hood at Rackspace in their Mailtrust unit. See http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data for more details and inspiration.
Regards,
Jeff
On Thu, Jun 4, 2009 at 4:58 PM, wrote:
> Hi,
> This i

Re: Questions regarding IT search solution

2009-06-04 Thread silentsurfer77
Hi, It is encouraging to know that a Solr/Lucene solution may work. Could anyone using Solr/Lucene for such a scenario confirm that the solution is in use and working fine? That would be really helpful, as I only started looking into the Solr/Lucene solution a couple of days back and might be di

Re: Field Compression

2009-06-04 Thread Fer-Bj
Here is what we have: all the documents have a field called "small_body", a text field of at most 60 characters where we store the "abstract" for each article. We have about 8,000,000 documents indexed, and we usually display this small_body on our "listing pages". For each listing pa

Re: indexing Chinese language

2009-06-04 Thread Fer-Bj
What we usually do to reindex is:
1. stop Solr
2. rm -r data (that is, remove everything in /opt/solr/data/)
3. mkdir data
4. start Solr
5. start the reindex.
This way we're sure there are no old copies of the index. To check the index size we do:
cd data
du -sh
Otis Gospodnetic wro

Re: how to do exact search with solrj

2009-06-04 Thread Jianbin Dai
I still have a problem with exact matching.
query.setQuery("title:\"hello the world\"");
This will return all docs with a title containing "hello the world"; e.g., "hello the world, Jack" will also be matched. What I want is exactly "hello the world". Setting this field to string instead of text

Determining Search Query Category

2009-06-04 Thread ram_sj
Hi, I have more than 20 categories for my search application. I'm interested in dynamically finding the category of the query entered by the user, instead of asking the user to filter the results through a long list of categories. It's a general question, not specific to Solr, though. Any suggestion abo

Re: Questions regarding IT search solution

2009-06-04 Thread Otis Gospodnetic
My guess is Solr/Lucene would work. Not sure how well/fast, but it would, esp. if you avoid range queries (or use tdate), and esp. if you shard/segment indices smartly, so that at query time you send (or distribute if you have to) the query to only those shards that have the data (if your quer
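The shard-selection idea can be illustrated for time-sliced log data: keep each shard's date range on the client and send the query only to shards whose range overlaps the query's. `TimeShardRouter`, the one-shard-per-period layout, and the host names are assumptions for illustration:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: each shard holds documents from one inclusive date range.
class TimeShardRouter {
    static final class Shard {
        final String address;
        final LocalDate from;
        final LocalDate to;

        Shard(String address, LocalDate from, LocalDate to) {
            this.address = address;
            this.from = from;
            this.to = to;
        }
    }

    private final List<Shard> shards = new ArrayList<>();

    void add(Shard shard) {
        shards.add(shard);
    }

    // Returns the shards whose [from, to] range overlaps the query's [qFrom, qTo] range.
    List<String> shardsFor(LocalDate qFrom, LocalDate qTo) {
        List<String> selected = new ArrayList<>();
        for (Shard s : shards) {
            boolean overlaps = !s.to.isBefore(qFrom) && !s.from.isAfter(qTo);
            if (overlaps) {
                selected.add(s.address);
            }
        }
        return selected;
    }
}
```

The selected addresses, joined with commas, could then be passed as the `shards` parameter of a distributed search request, so a query about a single day touches only the shard holding that day.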

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi, As Alex correctly pointed out, my main intention is to figure out whether Solr/Lucene offers the functionality to replicate what Splunk does in terms of building indexes etc. to enable search. We evaluated Splunk, but it is not a very cost-effective solution for us, as we may hav

Re: Customizing results

2009-06-04 Thread Otis Gospodnetic
Aha, so you really want to rename the field at response time? I wonder if this is something that could be done with (or should be added to) response writers. That's where I'd go look first.
Otis
- Original Message
> Fro

Re: indexing Chinese language

2009-06-04 Thread Otis Gospodnetic
I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead?
Otis
- Original Message
> From: Fer-Bj
> To: solr-user@lucene.apache.

Re: Is there Downside to a huge synonyms file?

2009-06-04 Thread Yonik Seeley
On Tue, Jun 2, 2009 at 11:28 PM, anuvenk wrote:
> I'm using query time synonyms.
These don't currently work if the synonyms expand to more than one option, and those options have a different number of words.
-Yonik
http://www.lucidimagination.com

RE: Customizing results

2009-06-04 Thread Manepalli, Kalyan
Otis, With that solution, the client has to accept every type of location field (location_de_de, location_it_it). I want to copy the result into a "location" field, so that the client can just accept location. Thanks, Kalyan Manepalli
-Original Message- From: Otis Gospodnetic [mailto:otis_

Re: Questions regarding IT search solution

2009-06-04 Thread Alexandre Rafalovitch
I would also be interested to know what other solutions exist. Splunk's advantage is that it extracts the fields and offers advanced search functionality (it has lexers/parsers for multiple content types). I believe that's the Solr functionality desired in the original posting. At the tim

Re: Customizing results

2009-06-04 Thread Otis Gospodnetic
Hello, If you know what language the user specified (or is associated with), then you just have to ensure the "fl" URL parameter contains that field (and any other fields you want returned). So if the language/locale is de_de, then make sure the request has fl=location_de_de,another_field,anot
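A sketch of building that `fl` value from the user's locale, following the `location_<language>_<country>` naming used in this thread. `FieldListBuilder` is hypothetical, and the extra field names are placeholders:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that puts the locale-specific location field first in fl.
class FieldListBuilder {
    static String fieldList(Locale locale, String... otherFields) {
        // Locale.GERMANY -> "de" + "_" + "DE" -> "de_de"
        String suffix = (locale.getLanguage() + "_" + locale.getCountry()).toLowerCase(Locale.ROOT);
        List<String> fields = new ArrayList<>(Arrays.asList(otherFields));
        fields.add(0, "location_" + suffix);
        return String.join(",", fields);
    }
}
```

`FieldListBuilder.fieldList(Locale.GERMANY, "id", "name")` returns `"location_de_de,id,name"`, which can then be sent as the `fl` request parameter.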

Re: How to disable posting updates from a remote server

2009-06-04 Thread Eric Pugh
Take a look at the security section in the wiki; you could do this with firewall rules or password access.
On Thursday, June 4, 2009, ashokc wrote:
> Hi,
> I find that I am freely able to post to my production SOLR server, from any
> other host that can run the post command. So somebody can wip

How to disable posting updates from a remote server

2009-06-04 Thread ashokc
Hi, I find that I am freely able to post to my production SOLR server, from any other host that can run the post command. So somebody can wipe out the whole index by posting a delete query. Is there a way SOLR can be configured so that it will take updates ONLY from the server on which it is runn
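The usual answer is to handle this outside Solr, with firewall rules or password access. If a small filter or proxy sits in front of the update handler instead, its core decision could look like this sketch. `UpdateGuard` is hypothetical; the input is assumed to be a literal IP address such as the value `ServletRequest.getRemoteAddr()` returns. Behind a reverse proxy, that address is the proxy's, so the check would need adjusting:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical check: allow update requests only from the local machine.
class UpdateGuard {
    static boolean allowUpdate(String remoteAddr) {
        try {
            // For IP literals, getByName parses the address without a DNS lookup.
            return InetAddress.getByName(remoteAddr).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false; // unparseable or unknown address: refuse the update
        }
    }
}
```

Both `127.0.0.1` and `::1` count as loopback, so local IPv4 and IPv6 clients are allowed, while a request from any other host is refused.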

Re: Faceting on text fields

2009-06-04 Thread Yao Ge
Yes. I am using 1.3. When is 1.4 due for release?
Yonik Seeley-2 wrote:
> Are you using Solr 1.3?
> You might want to try the latest 1.4 test build - faceting has changed a
> lot.
> -Yonik
> http://www.lucidimagination.com
> On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge wrote:
>> I am ind

Index Comma Separated numbers

2009-06-04 Thread Jianbin Dai
Hi, One of the fields to be indexed is price, which is comma separated, e.g., 12,034.00. How can I index it as a number? I am using DIH to pull the data. Thanks.

Re: HashDocSet's maxSize and loadFactor

2009-06-04 Thread Yonik Seeley
On Thu, Jun 4, 2009 at 7:52 AM, Marc Sturlese wrote:
> Hey there, I am trying to optimize the setup of hasDocSet.
Be aware that in the latest versions of Solr 1.4, HashDocSet is no longer used by Solr.
https://issues.apache.org/jira/browse/SOLR-1169
> Have read the documentation here:
> http://w

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Robert Purdy
Thanks for the good information :) Well, I haven't had any evictions in any of the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77 in documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So, in your opinion, should the documentCache and queryResultCache use th

Re: Faceting on text fields

2009-06-04 Thread Yonik Seeley
Are you using Solr 1.3?
You might want to try the latest 1.4 test build - faceting has changed a lot.
-Yonik
http://www.lucidimagination.com
On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge wrote:
> I am index a database with over 1 millions rows. Two of fields contain
> unstructured text but size of e

Customizing results

2009-06-04 Thread Manepalli, Kalyan
Hi, I am trying to customize the response that I receive from Solr. In the index I have multiple fields that contain the same data in different languages. At query time, the client specifies the language. Based on this param, I want to return the value copied into a different field. E

Re: SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Shalin Shekhar Mangar
On Thu, Jun 4, 2009 at 7:24 PM, Michael Ludwig wrote:
> Shalin Shekhar Mangar wrote:
> | If you use spellcheck.q parameter for specifying
> | the spelling query, then the field's analyzer will
> | be used [...] If you use the q parameter, then the
> | SpellingQueryConverter is used.
> http://

Re: Field Compression

2009-06-04 Thread Grant Ingersoll
On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
> It *will* cause performance issues if you load that field for a large number of documents on a particular search. I know Lucene itself has lazy field loading that helps in this case, but I don't know how to persuade SOLR to use it (it may even

statistics about word distances in solr

2009-06-04 Thread Jens Fischer
Hi, I was wondering if there's an option to return statistics about distances from the query terms to the most frequent terms in the result documents. At present I return the most frequent terms using facetSearch, which returns for each word in the result documents the number of occurrences (wit
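If the result documents' text (or their term positions) is available to the client, a per-document distance between a query term and a frequent term can be computed in a single pass over the tokens. A minimal sketch, assuming simple regex tokenization rather than the field's real analyzer; `TermDistance` is a hypothetical helper, and how to aggregate these per-document values into statistics is left open:

```java
import java.util.Locale;

// Hypothetical helper: smallest gap, in token positions, between two terms in one text.
class TermDistance {
    // Returns -1 if either term does not occur.
    static int minDistance(String text, String termA, String termB) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        String a = termA.toLowerCase(Locale.ROOT);
        String b = termB.toLowerCase(Locale.ROOT);
        int lastA = -1;
        int lastB = -1;
        int best = Integer.MAX_VALUE;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(a)) lastA = i;
            if (tokens[i].equals(b)) lastB = i;
            // Only the most recent occurrence of each term can form a closer pair.
            if (lastA >= 0 && lastB >= 0) {
                best = Math.min(best, Math.abs(lastA - lastB));
            }
        }
        return best == Integer.MAX_VALUE ? -1 : best;
    }
}
```

Tracking only the last position of each term keeps this linear in the number of tokens, since any closer pair must involve the most recent occurrence of one of the two terms.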

Faceting on text fields

2009-06-04 Thread Yao Ge
I am indexing a database with over 1 million rows. Two of the fields contain unstructured text, but the size of each field is limited (256 characters). I came up with the idea to visualize the text fields as a text cloud by turning the two text fields into facets. The weight of font and size is of each
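For the text cloud, one simple approach is to map each facet count linearly onto a font-size range. A sketch assuming the counts come from the facet results for the two text fields; `TagCloud` and the pixel range are assumptions, and a logarithmic scale is a common alternative when a few terms dominate:

```java
// Hypothetical helper: linear mapping from a facet count to a font size in [minPx, maxPx].
class TagCloud {
    static int fontSize(long count, long minCount, long maxCount, int minPx, int maxPx) {
        if (maxCount <= minCount) {
            return (minPx + maxPx) / 2; // all counts equal: use the middle size
        }
        double t = (double) (count - minCount) / (maxCount - minCount);
        return (int) Math.round(minPx + t * (maxPx - minPx));
    }
}
```

With counts between 10 and 110 and a 12 to 32 px range, a term seen 60 times is drawn at 22 px.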

Re: Questions regarding IT search solution

2009-06-04 Thread Walter Underwood
Why build one? Don't those already exist? Personally, I'd start with Hadoop instead of Solr. Putting logs in a search index is guaranteed to not scale. People were already trying different approaches ten years ago.
wunder
On 6/4/09 8:41 AM, "Silent Surfer" wrote:
> Hi,
> Any help/pointers on t

Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Yonik Seeley
2009/6/4 Noble Paul നോബിള്‍ नोब्ळ्:
> FastLRUCache is designed to be lock free so it is well suited for
> caches which are hit several times in a request. I guess there is no
> harm in using FastLRUCache across all the caches.
Gets are cheaper, but evictions are more expensive. If the cache hit
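To make the numbers in this thread concrete: the hit ratio in Solr's cache statistics is hits divided by lookups, and evictions are counted separately. The sketch below is an illustration only, a single-threaded LRU cache built on `LinkedHashMap`; it is not Solr's LRUCache or FastLRUCache, and `HitRatioCache` is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: not Solr's LRUCache or FastLRUCache, and not thread-safe.
class HitRatioCache<K, V> {
    private final Map<K, V> map;
    private long lookups;
    private long hits;
    private long evictions;

    HitRatioCache(final int maxSize) {
        // accessOrder = true: iteration runs from least to most recently accessed entry.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                boolean evict = size() > maxSize;
                if (evict) {
                    evictions++;
                }
                return evict;
            }
        };
    }

    V get(K key) {
        lookups++;
        V value = map.get(key);
        if (value != null) {
            hits++;
        }
        return value;
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    long getEvictions() {
        return evictions;
    }
}
```

With a capacity of 2, putting "a" and "b", reading "a", then putting "c" evicts "b" (the least recently used entry); a later miss on "b" leaves one hit in two lookups, a hit ratio of 0.5. A cache that never evicts, as reported above, never pays the eviction cost at all.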

Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
Any help/pointers on the following message would really help me.
Thanks,
Surfer
--- On Tue, 6/2/09, Silent Surfer wrote:
From: Silent Surfer
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM
Hi, I am new to Lucene forum and

Re: spell checking

2009-06-04 Thread Walter Underwood
"query suggest"
--wunder
On 6/4/09 1:25 AM, "Michael Ludwig" wrote:
> Yao Ge wrote:
>> Maybe we should call this "alternative search terms" or
>> "suggested search terms" instead of spell checking. It is
>> misleading as there is no right or wrong in spelling, there
>> is only popular (term

SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Michael Ludwig
Shalin Shekhar Mangar wrote:
| If you use spellcheck.q parameter for specifying
| the spelling query, then the field's analyzer will
| be used [...] If you use the q parameter, then the
| SpellingQueryConverter is used.
http://markmail.org/message/k35r7qmpatjvllsc - message
http://markmail.org/t

Re: Field Compression

2009-06-04 Thread Erick Erickson
Warning: this is from a Lucene perspective. I don't think it matters. I'm pretty sure that COMPRESS only applies to *storing* the data, not to putting the tokens in the index (the latter is what's searched)... It *will* cause performance issues if you load that field for a large number of document
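To illustrate that compression concerns only the stored bytes, here is a round trip with the JDK's `java.util.zip` Deflater/Inflater. It is a stand-in, not a claim about the exact codec Lucene uses; `StoredFieldCompression` is a hypothetical name. The indexed tokens would be produced from the original text either way, while every compressed stored value that is loaded for display has to be inflated again, which is the per-document cost that grows with the number of documents loaded:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of compressing a stored value and reading it back.
class StoredFieldCompression {
    static byte[] compress(String value) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(value.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // The extra work a reader pays each time a compressed stored value is loaded.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished() && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported compressed value");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed value", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For a short field such as the 60-character small_body described earlier in this digest, the saved bytes are likely to be small relative to the decompression work on every listing page; comparing `compress(value).length` against the UTF-8 length on real data is a cheap way to check.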

Re: indexing Chinese language

2009-06-04 Thread Erick Erickson
Hmmm, are you quite sure that you emptied the index first and didn't just add all the documents a second time to the index? Also, when you say the index almost doubled, were you looking only at the size of the *directory*? SOLR might have been holding a copy of the old index open while you built a

HashDocSet's maxSize and loadFactor

2009-06-04 Thread Marc Sturlese
Hey there, I am trying to optimize the setup of HashDocSet. I have read the documentation here: http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab but can't quite understand it. Does it mean that the maxSize should be 0.005 x NumberDocsOfMyIndex or that
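Under the first reading in the question (maxSize as a fraction 0.005 of the total document count), the arithmetic for a hypothetical index of one million documents works out as below. This is only that interpretation worked through, not a confirmed rule, and per Yonik's reply HashDocSet is no longer used in recent Solr 1.4 builds:

```latex
\text{maxSize} \approx 0.005 \times N_{\text{docs}}, \qquad
N_{\text{docs}} = 1{,}000{,}000 \;\Rightarrow\;
\text{maxSize} \approx 0.005 \times 1{,}000{,}000 = 5{,}000
```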

Re: spell checking

2009-06-04 Thread Michael Ludwig
Yao Ge wrote:
> Maybe we should call this "alternative search terms" or "suggested search terms" instead of spell checking. It is misleading as there is no right or wrong in spelling, there is only popular (term frequency?) alternatives.
I had exactly the same difficulty in understanding the c

Re: Field Compression

2009-06-04 Thread Fer-Bj
Is it correct to assume that using field compression will cause performance issues if we decide to allow search over this field? I.e., if I decide to add "compressed=true" to the BODY field... and I allow search on body... would that be a problem? At the same time: if I add compress