Re: Search for misspelled words in corpus

2013-06-08 Thread Otis Gospodnetic
Hm, I was purposely avoiding mentioning ngrams because just ngramming all indexed tokens would balloon the index. My assumption was that only *some* words are misspelled, in which case it may be better not to ngram all tokens. Otis -- Solr & ElasticSearch Support http://sematext.com/ On

Velocity / Solritas not works in solr 4.3 and Tomcat 6

2013-06-08 Thread andy tang
Could anyone help me see why the Solritas page failed? I can go to http://localhost:8080/solr without a problem, but fail to go to http://localhost:8080/solr/browse. Below is the status report. Any help is appreciated. Thanks! Andy type Status report mess

Re: Search for misspelled words in corpus

2013-06-08 Thread Jagdish Nomula
Another theoretical answer to this question is the ngram approach. You can index the word and its trigrams. Query the index by the string as well as its trigrams, with a % match search. You then pass the exhaustive result set through a more expensive scoring such as Smith-Waterman. Thanks, Jagdish
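The two-stage idea above (cheap trigram recall, then a more expensive rescoring pass) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the Solr index, the "% match" is interpreted as the fraction of the query's trigrams a word shares, and the 0.4 threshold, the `$` padding, and the Smith-Waterman scoring weights are all arbitrary choices that do not come from the thread.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of character trigrams of a word, padded with '$'."""
    padded = "$$" + word.lower() + "$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def build_index(words):
    """Map each trigram to the set of words containing it (stand-in for Solr)."""
    index = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def candidates(index, query, min_match=0.4):
    """Cheap recall step: words sharing at least min_match of the query trigrams."""
    grams = trigrams(query)
    counts = defaultdict(int)
    for gram in grams:
        for word in index.get(gram, ()):
            counts[word] += 1
    return [w for w, c in counts.items() if c / len(grams) >= min_match]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def find_misspellings(index, query, min_match=0.4):
    """Rescore the trigram candidates and return them best-first."""
    scored = [(smith_waterman(query, w), w) for w in candidates(index, query, min_match)]
    return [w for _, w in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Calling `find_misspellings(build_index(corpus_words), "search")` returns the exact spelling first, followed by near variants that survived the trigram threshold. Lowering `min_match` trades more rescoring work for better recall.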

Re: load balancing internal Solr on Azure

2013-06-08 Thread Otis Gospodnetic
Hi Kevin, Would http://search-lucene.com/?q=LBHttpSolrServer work for you? Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, May 24, 2013 at 3:12 PM, Kevin Osborn wrote: > We are looking to install SolrCloud on Azure. We want it to be an internal > service. For some application

Re: HyperLogLog for Solr

2013-06-08 Thread Otis Gospodnetic
I have not heard of anyone using HLL in Solr, but: https://docs.google.com/presentation/d/1ESNiqd7HuIfuwXSSK81PAAu6AmEPEE0u_vyk4FU5x9o/present#slide=id.p https://github.com/ptdavteam/elasticsearch-approx-plugin Otis -- Solr & ElasticSearch Support http://sematext.com/ On Tue, May 28, 2013 at

Re: Note on The Book

2013-06-08 Thread Otis Gospodnetic
It's 2013 and people suffer from ADD. Break it up into a la carte chapter books. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Wed, May 29, 2013 at 6:23 PM, Jack Krupansky wrote: > Markus, > > Okay, more pages it is! > > -- Jack Krupansky > > -Original Message- From:

Re: Search for misspelled words in corpus

2013-06-08 Thread Shashi Kant
n-grams might help, followed by an edit distance metric such as Jaro-Winkler or Smith-Waterman-Gotoh to filter the results further. On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic wrote: > Interesting problem. The first thing that comes to mind is to do > "word expansion" during indexing. Kind of lik

Re: Search for misspelled words in corpus

2013-06-08 Thread Otis Gospodnetic
Interesting problem. The first thing that comes to mind is to do "word expansion" during indexing. Kind of like synonym expansion, but maybe a bit more dynamic. If you can have a dictionary of correctly spelled words, then for each token emitted by the tokenizer you could look up the dictionary a

Dataless nodes in SolrCloud?

2013-06-08 Thread Otis Gospodnetic
Hi, Is there a notion of a data-node vs. non-data node in SolrCloud? Something a la http://www.elasticsearch.org/guide/reference/modules/node/ Thanks, Otis Solr & ElasticSearch Support http://sematext.com/

Re: index merge question

2013-06-08 Thread Sourajit Basak
I have noticed that when I write a doc with an id that already exists, it creates a new revision with only the fields from the second write. I guess there is a REST API in the latest Solr version which updates only selected fields. In my opinion, merge should be creating a doc which is a union

Re: Custom Data Clustering

2013-06-08 Thread Otis Gospodnetic
Hello, This sounds like a custom SearchComponent. Which clustering library you want to use or DIY is up to you, but go with the SearchComponent approach. You will still need to process N hits, but you won't need to first send them all over the wire. Otis -- Solr & ElasticSearch Support http://se

Query-node+shard stickiness?

2013-06-08 Thread Otis Gospodnetic
Hi, Is there anything in SolrCloud that would support query-node/shard affinity/stickiness? What I mean by that is a mechanism that is smart enough to keep sending the same query X to the same node(s)+shard(s)... with the goal being better utilization of Solr and OS caches? Example: * Imagine a
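To make the kind of affinity being asked about concrete, here is a minimal client-side sketch. It is not a SolrCloud feature and does not answer whether one exists. It only assumes the client knows a fixed list of node URLs (the ones below are hypothetical) and hashes the query string to pick one, so the same query keeps landing on the same node and can reuse that node's warm caches.

```python
import hashlib


def pick_node(query, nodes):
    """Deterministically map a query string to one node from a fixed list."""
    digest = hashlib.md5(query.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Hypothetical node URLs, for illustration only.
NODES = [
    "http://solr-node-1:8983/solr",
    "http://solr-node-2:8983/solr",
    "http://solr-node-3:8983/solr",
]
```

Plain modulo hashing reshuffles most queries whenever the node list changes. A consistent-hashing ring would limit that churn, at the cost of more code.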

Re: Help required with fq syntax

2013-06-08 Thread Kamal Palei
Though the syntax looks fine, I get all the records. As per the example given above I get all the documents, meaning the filtering did not work. I am curious to know if my indexing went fine or not. I will check and report back. On Sun, Jun 9, 2013 at 7:21 AM, Otis Gospodnetic wrote: > Try: > > ...

Re: Help required with fq syntax

2013-06-08 Thread Kamal Palei
Also please note that for some documents, blocked_company_ids may not be present as well. In such cases that document should be present in search result as well. BR, Kamal On Sun, Jun 9, 2013 at 7:07 AM, Kamal Palei wrote: > Dear All > I have a multi-valued field blocked_company_ids in index.

Re: Help required with fq syntax

2013-06-08 Thread Otis Gospodnetic
Try: ...&q=*:*&fq=-blocked_company_ids:5 Otis -- Solr & ElasticSearch Support http://sematext.com/ On Sat, Jun 8, 2013 at 9:37 PM, Kamal Palei wrote: > Dear All > I have a multi-valued field blocked_company_ids in index. > > You can think like > > 1. document1 , blocked_company_ids: 1, 5, 7

Help required with fq syntax

2013-06-08 Thread Kamal Palei
Dear All I have a multi-valued field blocked_company_ids in index. You can think like 1. document1 , blocked_company_ids: 1, 5, 7 2. document2 , blocked_company_ids: 2, 6, 7 3. document3 , blocked_company_ids: 4, 5, 6 and so on. If I want to retrieve all the documents where blocked_compan

Re: does solr support query time only stopwords?

2013-06-08 Thread Otis Gospodnetic
Maybe returned hits match other query terms. Otis Solr & ElasticSearch Support http://sematext.com/ On Jun 8, 2013 6:34 PM, "jchen2000" wrote: > I wanted to analyze high frequency terms using Solr's Luke request handler > and keep updating the stopwords file for new queries from time to time. >

Re: Entire query is stopwords

2013-06-08 Thread Jan Høydahl
Remove the stopFilter from the "index" section of your fieldType, only keep it in the "query" section. This way your stopwords will always be indexed and edismax will be able to selectively remove stopwords from the query depending on whether all words are stopwords or not. -- Jan Høydahl, sear

does solr support query time only stopwords?

2013-06-08 Thread jchen2000
I wanted to analyze high frequency terms using Solr's Luke request handler and keep updating the stopwords file for new queries from time to time. Obviously I have to index all terms whether they belong to stopwords list or not. So I configured query analyzer stopwords list but disabled index anal

Re: Lucene/Solr Filesystem tunings

2013-06-08 Thread Mark Miller
Turning swappiness down to 0 can have some decent performance impact. - http://en.wikipedia.org/wiki/Swappiness In the past, I've seen better performance with ext3 over ext4 around commits/fsync. Tests were actually slow enough (lots of these operations) that I made a special ext3 partition w

Re: index merge question

2013-06-08 Thread Mark Miller
On Jun 8, 2013, at 12:52 PM, Jamie Johnson wrote: > When merging through the core admin ( > http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for > conflicts during the merge? So for instance if I am merging core 1 and > core 2 into core 0 (first example), what happens if core

index merge question

2013-06-08 Thread Jamie Johnson
When merging through the core admin (http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for conflicts during the merge? So for instance if I am merging core 1 and core 2 into core 0 (first example), what happens if core 1 and core 2 both have a document with the same key, say core

Re: custom field tutorial

2013-06-08 Thread Jack Krupansky
Usually, people want to do the opposite - store the numeric code as a numeric field for perceived efficiency and let the user query and view results with the text form. But, there isn't any evidence of any great performance benefit of doing so - just store the string code in a string field. A