Re: performance between ExternalFileField and Join

2012-03-01 Thread Tommaso Teofili
Also, regarding the Join functionality, I remember Yonik pointed out it's O(# unique terms), but I agree with Erick on the ExternalFileField, as you can use it only inside a function query, for example for boosting. Tommaso 2012/3/1 Erick Erickson > Hmmm. ExternalFileFields can only be float values,

Architectural question structuring solr, multiple instances or filters

2012-03-01 Thread Ramo Karahasan
Hi, I face the issue that I have n business users. Each business user has his own set of products. I want to provide an interface for each business user where he can find only the products he offers. What would be a better solution: 1.) To have one big index and filter by customer-name

Re: Making additional solr requests in an QueryResponseWriter

2012-03-01 Thread Mikhail Khludnev
Hello Donnie, 1. Nothing besides design considerations prevents you from doing a search in a QueryResponseWriter. You have a request, which isn't closed yet, from which you can obtain a searcher. 2. Your use case isn't clear. If you need just to search categories, and return the lists of subcategories p

Re: Couple issues with edismax in 3.5

2012-03-01 Thread Way Cool
Thanks Ahmet! That's good to know someone else also tried to make phrase queries to fix multi-word synonym issue. :-) On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan wrote: > > I don't think mm will help here because it defaults to 100% > > already by the > > following code. > > Default behavior

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-01 Thread Koji Sekiguchi
(12/03/02 6:05), Ahmet Arslan wrote: I have the same problem. This happens only for some documents in the index. Andrew, can you provide a document string and a query pair? I will try to re-produce the exception. Then we can create a test case that fails. Others can look into it. +1. Please

Search by url starting with

2012-03-01 Thread lackadaisical
Hi, I am sorry if this has already been posted. I am new to Solr. I am crawling my site using Nutch and posting it to Solr. I am trying to implement a feature where I want to get all data where the url starts with "http://someurl/". Any thoughts? Thanks, Stan

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller
> I'm assuming the windows configuration looked correct? Yeah, so far I cannot spot any smoking gun... I'm confounded at the moment. I'll re-read through everything once more... - Mark

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker
I reindex every time I change something. I also delete any zookeeper data too. I'm assuming the windows configuration looked correct? On Thu, Mar 1, 2012 at 3:39 PM, Mark Miller wrote: > P.S. FYI you will have to reindex after adding _version_ back the schema... > > On Mar 1, 2012, at 3:35 PM, M

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker
I tried publishing to /update/extract request handler using manifold, but got the same result. I also tried swapping out the replication handlers too, but that didn't do anything. Otherwise, that's it. On Thu, Mar 1, 2012 at 3:35 PM, Mark Miller wrote: > Any other customizations you are making

Using MLT Handler to find similar documents but also filter similar documents by a keyword.

2012-03-01 Thread Ravish Bhagdev
Hi, Apologies if this has been answered before, I tried searching for it and didn't find anything answering this exactly. I want to find similar documents using MLT Handler using some specified fields but I want to filter down the returned matches with some keywords as well. I looked at the exam

Re: Too many values for UnInvertedField faceting on field topic

2012-03-01 Thread Yonik Seeley
On Thu, Mar 1, 2012 at 3:34 AM, Michael Jakl wrote: > The topic field holds roughly 5 > values per doc, but I wasn't able to compute the correct number right > now. How many unique values for that field in the whole index? If you have log output (or output from the stats page for fieldValueCache)

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Ahmet Arslan
> @iorixxx: Where can I find that > example schema.xml? Please find text_general_rev at http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/schema.xml > And when I find it, can I just make the title field which > currently is of > "text" type then of "text_rev" type? Yes,

Making additional solr requests in an QueryResponseWriter

2012-03-01 Thread Donnie McNeal
Hi all, The documents in our solr index have a parent-child relationship which we have basically flattened in our solr queries. We have massaged solr into being the query API for third-party data. The relationship is a simple parent-child relationship as follows: category +-sub-category this ult

Re: alphanumeric buckets

2012-03-01 Thread Emmanuel Espina
Only one interval? In that case you could add a filter query and facet in the regular way. That is: facet.field=person&fq=person:[A TO C] But consider that you will get only the search results that include those persons. Thanks Emmanuel 2012/3/1 AlexR : > Hi > > i need to build buckets with al
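The filter-query suggestion in this reply can be sketched as a small request builder. This is only an illustration: the endpoint URL is hypothetical, and the `q=*:*`, `rows=0` and `facet.mincount=1` parameters are assumptions added here, not part of the original reply.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and handler path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def facet_interval_url(field, low, high, base_url=SOLR_SELECT_URL):
    """Build a select URL that facets on `field`, restricted to [low TO high]."""
    params = [
        ("q", "*:*"),             # match everything; the fq does the narrowing
        ("rows", "0"),            # assumption: only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),  # assumption: hide values with no matching docs
        ("fq", f"{field}:[{low} TO {high}]"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_interval_url("person", "A", "C"))
```

As the reply warns, the filter query also restricts the search results themselves, so this fits only when a single interval is needed per request.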

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Erick Erickson
One frequent method of doing leading and trailing wildcards is to use ngrams (as distinct from edgengrams). That in combination with phrase queries might work well in this case. You also might be surprised at how little space bigrams take; give it a test and see. Best Erick On Thu, Mar 1, 2012
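To see why ngrams plus phrase queries can stand in for leading and trailing wildcards, here is a minimal sketch in plain Python. It is not Solr's analysis chain; `char_ngrams` and `contains_phrase` are hypothetical helpers that only mimic the idea of indexing bigrams and matching the query's bigrams as a contiguous run.

```python
def char_ngrams(text, n=2):
    """Return the list of character n-grams of `text`, lowercased."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def contains_phrase(doc_grams, query_grams):
    """True if `query_grams` occurs as a contiguous run inside `doc_grams`."""
    m = len(query_grams)
    if m == 0:
        return False
    return any(
        doc_grams[i:i + m] == query_grams
        for i in range(len(doc_grams) - m + 1)
    )


if __name__ == "__main__":
    indexed = char_ngrams("smartphone")
    for query in ("martph", "smar", "smart", "phart"):
        print(query, contains_phrase(indexed, char_ngrams(query)))
```

The substrings "martph", "smar" and "smart" all match because their bigrams appear in order inside the bigrams of "smartphone", while "phart" does not. Queries shorter than the gram size produce no grams here and never match, which is one trade-off of this approach.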

Re: Modify Standalone solr server to use it application without http request

2012-03-01 Thread Erick Erickson
I'm really confused here. Your first question seemed to be about http involved in index replication, which really doesn't seem to be related to your latest post. Can you start over from the beginning? Best Erick On Thu, Mar 1, 2012 at 9:56 AM, Neel wrote: > Hi Erick, Thanks for your post. > > We

Re: Solr Design question on spatial search

2012-03-01 Thread Venu Gmail Dev
I don't think Spatial search will fully fit into this. I have 2 approaches in mind but I am not satisfied with either one of them. a) Have 2 separate indexes. First one to store the information about all the cities and second one to store the retail stores information. Whenever user searches fo

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread PeterKerk
@iorixxx: Where can I find that example schema.xml? I downloaded the latest version here: ftp://apache.mirror.easycolocate.nl//lucene/solr/3.5.0 And checked \example\example-DIH\solr\db\conf\schema.xml But no text_rev type is defined in there. And when I find it, can I just make the title field w

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Ahmet Arslan
--- On Thu, 3/1/12, PeterKerk wrote: > From: PeterKerk > Subject: Re: Need tokenization that finds part of stringvalue > To: solr-user@lucene.apache.org > Date: Thursday, March 1, 2012, 6:59 PM > @iorixxx: yes, that is what I need. > But also when its IN the text, not > necessarily at the begi

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-01 Thread Ahmet Arslan
> I have the same problem. This happens > only for some documents in the index. Andrew, can you provide a document string and a query pair? I will try to re-produce the exception. Then we can create a test case that fails. Others can look into it.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller
P.S. FYI you will have to reindex after adding _version_ back the schema... On Mar 1, 2012, at 3:35 PM, Mark Miller wrote: > Any other customizations you are making to solrconfig? > > On Mar 1, 2012, at 1:48 PM, Matthew Parker wrote: > >> Added it back in. I still get the same result. >> >> On

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller
Any other customizations you are making to solrconfig? On Mar 1, 2012, at 1:48 PM, Matthew Parker wrote: > Added it back in. I still get the same result. > > On Wed, Feb 29, 2012 at 10:09 PM, Mark Miller wrote: > Do you have a _version_ field in your schema? I actually just came back to > this

Re: search.highlight.InvalidTokenOffsetsException in Solr 3.5

2012-03-01 Thread andrew
I have the same problem. This happens only for some documents in the index. Like sharadgaur, the problem ceased when I removed ReversedWildcardFilterFactory from my analysis chain, HTMLStripCharFilterFactory has been there before and after. I am running branch-3.6 r1238628. As far as I can tell,

Simple poll

2012-03-01 Thread ku3ia
Hi, all! It may seem strange, but could those of you who read this post answer some questions? I want to understand whether I am asking too much of my Solr, so: 1) Solr version; 2) Summary doc count; 3) Shards count (if exists); 4) rows count at query (from ... into); 5) Average queries per minute (QP

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker
Added it back in. I still get the same result. On Wed, Feb 29, 2012 at 10:09 PM, Mark Miller wrote: > Do you have a _version_ field in your schema? I actually just came back to > this thread with that thought and then saw your error - so that remains my > guess. > > I'm going to improve the doc

alphanumeric buckets

2012-03-01 Thread AlexR
Hi, I need to build buckets with alphanumeric values. For example: facet.field=person person: Alex(10), Ben(5), George(8), Paul(3), Peter(2), Stefan(9) Now I need all persons in the interval A-C. With facet.query=person:[A TO C] I only get the number of matches (15), but I want to have the values

RE: Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Thanks Robert. Yes, that's right: I can get some more accuracy if I use transposition in addition to substitution, insertion and deletion. From: Robert Muir [rcm...@gmail.com] Sent: Thursday, March 01, 2012 9:50 PM To: solr-user@lucene.apache.org Subject: Re: Spe
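The effect of counting a transposition as a single edit can be checked with a small sketch. This is a textbook dynamic-programming edit distance (the restricted "optimal string alignment" variant when transpositions are enabled), written for illustration; it is not the implementation used by Solr's spell checker. The words come from the "Marien" example in this thread.

```python
def edit_distance(a, b, transpositions=False):
    """Edit distance with insert, delete, substitute and, optionally,
    adjacent transposition each counting as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap neighbours
    return d[-1][-1]


if __name__ == "__main__":
    for candidate in ("Marine", "Market"):
        print(
            candidate,
            edit_distance("Marien", candidate),
            edit_distance("Marien", candidate, transpositions=True),
        )
```

Without transpositions, "Marine" and "Market" are both two edits away from "Marien", so frequency alone would decide between them. With transpositions, "Marine" drops to one edit while "Market" stays at two.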

RE: Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Thanks James. I loved the last line in your mail "But in the end, especially with 1-word queries, I doubt even the best algorithms are going to always accurately guess what the user wanted." Absolutely, I agree with this; if it is a phrase (instead of single word) then probably we can apply some N

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread PeterKerk
@iorixxx: yes, that is what I need. But also when it's IN the text, not necessarily at the beginning. So using the * character like: q=smart* the product is found, but when I do this: q=*mart* it isn't... why is that?

Re: Spelling Corrector Algorithm

2012-03-01 Thread Robert Muir
On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar wrote: > Hi > > For spell checking component I set extendedResults to get the frequencies and > then select the word with the best frequency. I understand the spell check > algorithm based on Edit Distance. For an example: > > Query to Solr: Marien >

Re: handling case insensitive and regex

2012-03-01 Thread Ahmet Arslan
> but the following doesn't work. > TESTING* Please see the following writeups: http://wiki.apache.org/solr/MultitermQueryAnalysis http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Ahmet Arslan
> if title holds "smartphone" I want it to be found when > someone types > "martph" or "smar" or "smart". Peter, so you want a beginsWith/startsWith type of search? You can use wildcard search (with the star operator) for this, e.g. &q=smar* Alternatively, if your index size is not huge, you

RE: Need tokenization that finds part of stringvalue

2012-03-01 Thread Dyer, James
Speaking of which, there is a spellchecker in jira that will detect word-break errors like this. See "WordBreakSpellChecker" at https://issues.apache.org/jira/browse/LUCENE-3523 . To use it with Solr, you'd also need to apply SOLR-2993 (https://issues.apache.org/jira/browse/SOLR-2993). This S

RE: Spelling Corrector Algorithm

2012-03-01 Thread Dyer, James
Yavar, When you listed what the spell checker returns you put them in this order: > Marine (Freq: 120), Market (Freq: 900) and others Was "Marine" listed first, and then did you pick "Market" because you thought higher frequency is better? If so, you probably have the right settings already b

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Walter Underwood
I once used a spell checker to break up compound words. It was slow, but worked pretty well. wunder On Mar 1, 2012, at 5:53 AM, Erick Erickson wrote: > Right, there's nothing in Solr that I know of that'll help here. How would > a tokenizer understand that "smartphone" should be "smart" "phone"

errata for solr tutorial

2012-03-01 Thread Nicolai Scheer
Hi! Having just worked through the solr tutorial (http://lucene.apache.org/solr/tutorial.html) I think I found two minor "bugs": 1. The "delete by query" example java -Ddata=args -jar post.jar "" should read java -Ddata=args -jar post.jar "name:DDR" 2. The link to the mailing lists at the end

Re: flashcache and solr/lucene

2012-03-01 Thread Robert Stewart
Any segment files on SSD will be faster in cases where the file is not in OS cache. If you have enough RAM a lot of index segment files will end up in OS cache so it won't have to go to disk anyway. Since most indexes are bigger than RAM an SSD helps a lot. But if index is much larger than

Re: Modify Standalone solr server to use it application without http request

2012-03-01 Thread Neel
Hi Erick, Thanks for your post. We are not directly providing search results from the Lucene index to the user. We are processing the Lucene search result and adding additional information to it by getting it from different sources [from other Lucene indexes or from databases]. So, consuming search results fro

handling case insensitive and regex

2012-03-01 Thread Neil Hart
I'm just starting out... for either testing QA TESTING QA I can query with the following strings and find my text: testing TESTING testing* but the following doesn't work. TESTING* any ideas? thanks Neil

Re: AW: AW: Problem using double quotes in search string

2012-03-01 Thread Ahmet Arslan
> what about, if a search string starts with "$o$" ? this is > not recognized by > dismax too, right? Is there another filter I have to use? I don't fully follow your question but it seems that you want to search special characters too? With raw or term query parser plugin you can do that. htt

AW: AW: Problem using double quotes in search string

2012-03-01 Thread Ramo Karahasan
Hi, what about, if a search string starts with "$o$" ? this is not recognized by dismax too, right? Is there another filter I have to use? Thanks, Ramo -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Thursday, March 1, 2012 12:44 To: solr-user@lucene.a

Re: searching top matches of each facet

2012-03-01 Thread Paul
Perfect! Thanks! On Wed, Feb 29, 2012 at 3:29 PM, Emmanuel Espina wrote: > I think that what you want is FieldCollapsing: > > http://wiki.apache.org/solr/FieldCollapsing > > For example > &q=my search&group=true&group.field=subject&group.limit=5 > > Test it to see if that is what you want. > > Th

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread PeterKerk
I think I didn't explain myself clearly: I need to be able to find substrings. So, it's not that I'd expect Solr to find synonyms, but rather if a piece of text contains the searched text, for example: if title holds "smartphone" I want it to be found when someone types "martph" or "smar" or "smart"

Re: Need tokenization that finds part of stringvalue

2012-03-01 Thread Erick Erickson
Right, there's nothing in Solr that I know of that'll help here. How would a tokenizer understand that "smartphone" should be "smart" "phone"? There's no general solution for this issue. You can do domain-specific solutions with synonyms for instance, or some other word list that contains terms yo

Re: Modify Standalone solr server to use it application without http request

2012-03-01 Thread Erick Erickson
Currently, the page you referenced here: http://wiki.apache.org/solr/SolrReplication is the standard way to replicate incremental indexes. You say you're "worried about the extra http". Why? Do you have any evidence that this would be a problem? Http isn't inherently inefficient at all, and even if

Re: performance between ExternalFileField and Join

2012-03-01 Thread Erick Erickson
Hmmm. ExternalFileFields can only be float values, so I'm not sure "the necessary data" is straight-forward. Additionally, they are used in function queries. Does this still work? I really don't know the performance characteristics if, say, you have users with access to all documents for SOLR-2272

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma
On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote: > What is netstat telling you about the connections on the servers? > > Any connections in "CLOSE_WAIT" (passive close) hanging? I can't tell exact numbers right now but there were a lot between all the cores and the indexing clients. >

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Sami Siren
Do you have autocommit enabled? I tested this with 1m docs indexed by using the default example config and saw used file descriptors go up to 2400 (did not come down even after the final commit at the end). Then I disabled autocommit, reindexed and the descriptor count stayed pretty much flat at ar

flashcache and solr/lucene

2012-03-01 Thread dan sutton
Hi, Just wondering if anyone had any experience with solr and flashcache [https://wiki.archlinux.org/index.php/Flashcache], my guess is it might be particularly useful for indices not changing that often, and for large indices where an SSD of that size is prohibitive. Cheers, Dan

Re: [SoldCloud] leaking file descriptors

2012-03-01 Thread Bernd Fehling
What is netstat telling you about the connections on the servers? Any connections in "CLOSE_WAIT" (passive close) hanging? Saw this on my servers last week. Used a little proggi to spoof a local connection on those servers ports and was able to fake the TCP-stack to close those connections. It a

Re: AW: Problem using double quotes in search string

2012-03-01 Thread Ahmet Arslan
> does that effect my result list? Because if i use the  > dismax, and type into > my search field the title "blue on blue" (without quotes), I > get this > product as a first result. If I use dismax without boosting > and search for > "blue on blue" (without quotes) I'm not getting this result >

Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Hi For spell checking component I set extendedResults to get the frequencies and then select the word with the best frequency. I understand the spell check algorithm based on Edit Distance. For an example: Query to Solr: Marien Spell Check Text Returned: Marine (Freq: 120), Market (Freq: 900)

[SoldCloud] leaking file descriptors

2012-03-01 Thread Markus Jelsma
Hi, Yesterday we had an issue with too many open files, which was solved because a username was misspelled. But there is still a problem with open files. We cannot successfully index a few million documents from MapReduce to a 5-node Solr cloud cluster. One of the problems is that after a wh

AW: Problem using double quotes in search string

2012-03-01 Thread Ramo Karahasan
Hi, does that affect my result list? Because if I use the dismax, and type into my search field the title "blue on blue" (without quotes), I get this product as a first result. If I use dismax without boosting and search for "blue on blue" (without quotes) I'm not getting this result in the first

Re: Problem using double quotes in search string

2012-03-01 Thread Ahmet Arslan
> I've got an issue when searching with a searchtstring > like:  'title:"Blue" > on "Blu' . the original searchstring is: 'title:"Blue" on > "Blue"' and this > works well. If I now delete the last double quote and the > "e" than I get the > error below. Is there any filter that can handle such >

Problem using double quotes in search string

2012-03-01 Thread Ramo Karahasan
Hi, I've got an issue when searching with a search string like: 'title:"Blue" on "Blu' . The original search string is: 'title:"Blue" on "Blue"' and this works well. If I now delete the last double quote and the "e" then I get the error below. Is there any filter that can handle such searches w

Re: Couple issues with edismax in 3.5

2012-03-01 Thread Ahmet Arslan
> I don't think mm will help here because it defaults to 100% > already by the > following code. Default behavior of mm has changed recently. So it is a good idea to explicitly set it to 100%. Then all of the search terms must match. > Regarding multi-word synonym, what is the best way to handle

Re: Too many values for UnInvertedField faceting on field topic

2012-03-01 Thread Michael Jakl
Hi! On Wed, Feb 29, 2012 at 22:21, Emmanuel Espina wrote: > No. But probably we can find another way to do what you want. Please > describe the problem and include some "numbers" to give us an idea of > the sizes that you are handling. Number of documents, size of the > index, etc. Thank you! Ou

Re: Solr Cloud, Commits and Master/Slave configuration

2012-03-01 Thread eks dev
Thanks Mark, Good, this is probably good enough to give it a try. My analyzers are normally fast, doing duplicate analysis (at each replica) is probably not going to cost a lot, if there is some decent "batching" Can this be somehow controlled (depth of this buffer / time till flush or some such