Re: PositionIncrement gap and multi-valued fields.

2011-08-09 Thread Marco Martinez
Hi Luis, As far as I know, the position increment gap only affects some queries, such as phrase queries that use slop. The position increment gap does not affect Lucene's similarity scoring formula: score(q,d) = coord(q,d)
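
Editor's sketch of the point above, under stated assumptions: this is a simplified model of token positions in a multi-valued field, not Lucene code. The function names, the whitespace tokenization, and the sample values are all hypothetical. It only illustrates why a large gap stops a sloppy phrase from matching across two values of the same field.

```python
def assign_positions(values, gap):
    """Return (token, position) pairs for a multi-valued field.

    Simplified model (an assumption, not Lucene's implementation):
    tokens are whitespace-separated, and the first token of every
    value after the first is pushed `gap` extra positions away.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def sloppy_pair_matches(positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` positions between them."""
    firsts = [p for t, p in positions if t == first]
    seconds = [p for t, p in positions if t == second]
    return any(0 <= s - f - 1 <= slop for f in firsts for s in seconds)


values = ["red shoes", "blue hat"]          # two values of one field
no_gap = assign_positions(values, gap=0)
with_gap = assign_positions(values, gap=100)

# Without a gap, "shoes blue" looks like an exact phrase across values.
print(sloppy_pair_matches(no_gap, "shoes", "blue", slop=0))
# With a gap of 100, the same phrase needs a slop of at least 100.
print(sloppy_pair_matches(with_gap, "shoes", "blue", slop=0))
print(sloppy_pair_matches(with_gap, "shoes", "blue", slop=100))
```

In this model the gap changes only which positions the terms occupy, which is consistent with the remark that it matters for position-sensitive queries rather than for the scoring formula.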

Possible bug in FastVectorHighlighter

2011-08-09 Thread Massimo Schiavon
In my Solr (3.3) configuration I specified these two params: when I do a simple search I obtain correctly highlighted results where matches are enclosed with the correct tag. If I do the same request with hl.useFastVectorHighlighter=true in the HTTP query string (or specifying the same parameter

Re: ServerSolrException: No such core: collection1

2011-08-09 Thread Shinichiro Abe
Sorry. Some of the required jar files were missing. Regards, Shinichiro Abe On 2011/08/08, at 14:31, Shinichiro Abe wrote: > Hi. > I use EmbeddedSolrServer. The SolrJ indexing code (attached) worked well > on Solr 1.4 but didn't work on Solr 3.3 (since 3.1). Do I need to do anything > else? > > Exception

Re: Possible bug in FastVectorHighlighter

2011-08-09 Thread Jayendra Patil
Try using - Regards, Jayendra On Tue, Aug 9, 2011 at 4:46 AM, Massimo Schiavon wrote: > In my Solr (3.3) configuration I specified these two params: > > > > > when I do a simple search I obtain correctly highlighted results where > matches are enclosed with correct tag. > If I do

Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?

2011-08-09 Thread Christian Bordis
Hi! After 1.5 days of digging through Google, the Solr wiki, the Solr 1.4 book (Smiley/Pugh), and the solr-user mailing list, no solution has turned up for my problem *sigh*. I use: - Solr 3.3 - Data Import Handler 3.3 - JDBC source is MySQL Constraints: - No changes to core database schema - I can only add new views, stored p

question about query parsing

2011-08-09 Thread Bernd Fehling
Hi list, while searching with debug on I see strange query parsing: identifier:"ub.uni-bielefeld.de" identifier:"ub.uni-bielefeld.de" +MultiPhraseQuery(identifier:"(ub.uni-bielefeld.de ub) uni bielefeld de") +identifier:"(ub.uni-bielefeld.de ub) uni bielefeld de" It is a PhraseQuery, but -

Trying to index pdf docs - lazy loading error - ClassNotFoundException: solr.extraction.ExtractingRequestHandler

2011-08-09 Thread Rode González
Hi all. I've tried to index PDF documents using the libraries included in the example distribution of Solr 3.3.0. I've copied all the jars included in the /dist and /contrib directories into a common /lib directory and I've added this path to the solrconfig.xml file. The request handler f

Re: strip html from data

2011-08-09 Thread Erick Erickson
OK, what does "not working" mean? You never answered Markus' question: "Are you looking at the returned result set or what you've actually indexed? Analyzers are not run on the stored data, only on indexed data." If "not working" means that your returned results contain the markup, then you're co

Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread Erick Erickson
The most common way to handle this is to just index to language-specific fields, e.g. text_ex, text_en, text_de. Since you know what language the user is searching in, you can route the queries to the correct set of fields. That said, this is an interesting approach. You don't necessarily need
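
Editor's sketch of the routing idea above, assuming the client already knows the user's language. The field map, the fallback field name "text", and the function name are hypothetical; only text_en and text_de come from the message.

```python
# Hypothetical mapping from a language code to a language-specific field.
LANGUAGE_FIELDS = {"en": "text_en", "de": "text_de"}


def build_query(user_query, language, default_field="text"):
    """Build a phrase query against the field that matches the user's language.

    `default_field` is an assumed catch-all field for unknown languages.
    """
    field = LANGUAGE_FIELDS.get(language, default_field)
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(build_query("rote schuhe", "de"))
print(build_query("red shoes", "fr"))
```

Because each field has its own analysis chain, the stemmer for one language never sees text in another, which is the main reason this layout is the common choice.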

Re: XPathProcessor foreach not working properly inside another entity

2011-08-09 Thread penela
After a bit of better-targeted searching on the forum, I've found this solution by Noble Paul: http://lucene.472066.n3.nabble.com/DIH-Http-input-bug-problem-with-two-level-RSS-walker-tp491046p491047.html Using rootEntity="false" in the outer entity seems to make it work as expected. Thanks!

Re: question about query parsing

2011-08-09 Thread Ahmet Arslan
> while searching with debug on I see strange query parsing: > > name="rawquerystring">identifier:"ub.uni-bielefeld.de" > name="querystring">identifier:"ub.uni-bielefeld.de" > > +MultiPhraseQuery(identifier:"(ub.uni-bielefeld.de ub) uni > bielefeld de") > > > +identifier:"(ub.uni-bielefeld.de

Re: matching exact/whole phrase

2011-08-09 Thread Erick Erickson
Single quotes aren't part of the Lucene syntax. If you tack on &debugQuery=on, you'll see something like this returned: title_string:'One text:shot Note that the default field in my schema is "text", also note the single quote that's part of " 'One" but not "shot". Try with double quotes, that g

Strip special chars like "-"

2011-08-09 Thread roySolr
Hello, I have some terms in my index with special characters. An example is "manchester-united". I want users to be able to search for "manchester-united", "manchester united" and "manchesterunited". What's the best way to fix this? I have used the patternReplaceFilter and some tokenizers but it coul

Re: Strip special chars like "-"

2011-08-09 Thread Jayendra Patil
Use http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory which can generate the tokens you need to match the search patterns. Regards, Jayendra On Tue, Aug 9, 2011 at 9:27 AM, roySolr wrote: > Hello, > > I have some terms in my index with specials characters.

Re: Strip special chars like "-"

2011-08-09 Thread roySolr
With the WordDelimiterFilter I can only fix the first two situations ("manchester-united" and "manchester united"). I can use something like generateWordParts, but I think this doesn't fix the problem with "manchesterunited".

RE: Problem with DIH: How to map key value pair stored in 1-N relation from a JDBC Source?

2011-08-09 Thread Dyer, James
Christian, It looks like you should probably write a Transformer for your DIH script. I assume you have a child entity set up for "PriceTable". Add a Transformer to this entity that will look at the value of "currency" and "price", remove these from the row, then add them back in with "curren

Master switching using BigIP

2011-08-09 Thread Uomesh
Hi, I am working on a Solr server setup in a production environment. Could you please advise me whether the following architecture will work without index corruption issues? 2 master instances (1 primary master,2 backup/inactive master) Primary master - this will be active master and will doin

Re: Strip special chars like "-"

2011-08-09 Thread Markus Jelsma
Use the catenateWordParts option On Tuesday 09 August 2011 16:02:47 roySolr wrote: > With the worddelimiter i can only fix the first 2 > situations("manchester-united" and "manchester united") > > I can use something like generateWordParts. But i think this doesn't fix > the problem with "manches

Re: question about query parsing

2011-08-09 Thread Bernd Fehling
Am 09.08.2011 14:58, schrieb Ahmet Arslan: while searching with debug on I see strange query parsing: identifier:"ub.uni-bielefeld.de" identifier:"ub.uni-bielefeld.de" +MultiPhraseQuery(identifier:"(ub.uni-bielefeld.de ub) uni bielefeld de") +identifier:"(ub.uni-bielefeld.de ub) uni bielefe

Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread cnyee
You are right - the stemmer was only instantiated twice. Not sure why it was instantiated twice (I tested with 10 and 50 records; maybe it was associated with the auto-commit cycle). What a bummer. Back to the drawing board again. Thanks for your input anyway. I was struggling with weird search b

Re: Strip special chars like "-"

2011-08-09 Thread roySolr
The catenateWordParts option has the following effect: manchester-united => "manchester","united" The query "manchesterunited" will not match "manchester","united". Maybe I'm wrong, but I have tested something similar in the past.

Re: Strip special chars like "-"

2011-08-09 Thread Jayendra Patil
catenateWordParts would club the two words as mentioned in the example @ http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory catenateWords="1" causes maximum runs of word parts to be catenated: "wi-fi" => "wifi" Regards, Jayendra On Tue, Aug 9, 2011 at 10

Re: Strip special chars like "-"

2011-08-09 Thread roySolr
Yes, I understand the difference between generateWordParts and catenateWords. But I can't fix my problem with these options; they don't cover all the possibilities.

problem with terms component results ?

2011-08-09 Thread Royi Ronen
Hi, I am using the terms component. Many times an 'e' at the end of the word is missing. E.g., it gives 'googl' instead of 'google', 'youtub' instead of 'youtube'. The problem does not exist for some other words ending with 'e'. Any ideas why it happens? Royi

Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread Erick Erickson
Frankly I don't know what gremlins lurk in this approach. You might hop over to the dev list and ask the question there, the language gurus will almost certainly weigh in. I'd ask the question without starting a JIRA just to see what the response is... Off hand, I can imagine that there would be s

edismax, mixing wildcards with specific terms

2011-08-09 Thread Mark juszczec
Hello all Will the edismax QueryParser allow you to mix search terms with wildcards and search terms with specific values in the same query? Or is it better to switch between Query Parsers at run time after analyzing the query? IOW if it contains wildcards, use edismax, otherwise use the default

XPathProcessor foreach not working properly inside another entity

2011-08-09 Thread penela
Hi! What I'm trying to do is get RSS URLs from a MySQL DB of my own, and use them as the URL endpoint for indexing the feed articles (mixing the db and rss core DIH examples to some extent). My data-config looks like this:

Re: Strip special chars like "-"

2011-08-09 Thread Erick Erickson
OK, what are the other possibilities that it doesn't fix? Just saying "it won't work" without some examples doesn't leave much to go on... Best Erick On Tue, Aug 9, 2011 at 10:41 AM, roySolr wrote: > Yes, i understand the difference between generateWordParts and catenateWords. > But i can't fix

Re: problem with terms component results ?

2011-08-09 Thread Erick Erickson
The TermsComponent is looking at *indexed* terms that have been passed through the analysis chain. So I suspect you're seeing the results of stemming. WordDelimiterFilterFactory will also break things up, as will other tokenizers/analyzers. If you want your original input you'll need to have a pre

Re: How to retreive data from mysql table using DataImportHandler?

2011-08-09 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists Have you looked at: http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS Best Erick On Tue, Aug 9, 2011 at 7:28 AM, nagarjuna wrote: > Hi everybody ... >       pls help me to get the data from mysql table using DataImportHan

Re: problem with terms component results ?

2011-08-09 Thread Erik Hatcher
Because you've got a stemmer in your analysis chain for those fields. If you want unstemmed terms, remove the stemmer, or copyField to a different field to use for the terms component. Erik On Aug 9, 2011, at 10:20 , Royi Ronen wrote: > Hi, > I am using the terms component. > Many tim

Re: Strip special chars like "-"

2011-08-09 Thread roySolr
OK, there are three query possibilities: Manchester-united Manchester united Manchesterunited The original name of the club is "manchester-united". generateWordParts will fix two of these possibilities: "Manchester-united" => "manchester","united" I can search for "Manchester-united" and

Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread cnyee
I believe that the FilterFactory is not designed to be called for each instance of field processing. Think about it: that would be terribly inefficient. The instantiated stemmer is meant to be reused as much as possible. Maybe the FilterFactory is called to instantiate a new stemmer in association with

Remote backup of Solr index over low-bandwith connection

2011-08-09 Thread Peter Kritikos
Hello, everyone, My company will be using Solr on the server appliance we deliver to our clients. We would like to maintain remote backups of clients' search indexes to avoid rebuilding a large index when an appliance fails. One of our clients backs up their data onto a remote server provided

Re: Strip special chars like "-"

2011-08-09 Thread lee carroll
Hi I might be wrong as I've not tried it out to be sure but from the wiki docs: These parameters may be combined in any way. Example of generateWordParts="1" and catenateWords="1": "PowerShot" -> 0:"Power", 1:"Shot" 1:"PowerShot" (where 0,1,1 are token positions) does that fit the bill ? On 9 A
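
Editor's sketch of the combination described above. This is a toy model written for illustration, not the WordDelimiterFilter implementation: it handles letters only, splits on case changes and non-letter characters, and the function and parameter names are hypothetical stand-ins for the generateWordParts and catenateWords options.

```python
import re


def word_delimiter(token, generate_word_parts=True, catenate_words=True):
    """Return (position, text) pairs for one input token.

    Toy model: a "part" is a run of letters, broken at lower-to-upper
    case changes and at any non-letter character such as "-".
    """
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+", token)
    out = []
    if generate_word_parts:
        out.extend((i, part) for i, part in enumerate(parts))
    if catenate_words and len(parts) > 1:
        # The joined form shares the position of the last part.
        out.append((len(parts) - 1, "".join(parts)))
    # If nothing was emitted, pass the token through unchanged.
    return out or [(0, token)]


print(word_delimiter("PowerShot"))
print(word_delimiter("manchester-united"))
```

Under this model, indexing "manchester-united" with both options on yields the terms "manchester", "united" and "manchesterunited", so the queries "manchester united" and "manchesterunited" each have an indexed term to match. Whether the real filter behaves the same for a given schema is best checked on Solr's analysis page.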

Re: Remote backup of Solr index over low-bandwith connection

2011-08-09 Thread Jonathan Rochkind
You can use rsync to automatically only transfer the files that have changed. I don't think you'll have to home grow your own 'only transfer the diffs' solution, I think rsync will do that for you. But yes, running an optimization, after many updates/deletes, will generally mean nearly everyth

Re: Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-08-09 Thread Chris Hostetter
: during indexing). However, due to the pre-analysis whitespace tokenization : done by lucene query parser, the reverse is not handled well - document with : string 'thunderbolt' being matched to query 'thunder bolt'. it's not so much "pre-analysis whitespace tokenization" as it is "query parse

Solr repllication oddities

2011-08-09 Thread Dan Pinkard
We've seen a few problems lately, and I'm hoping someone can offer insight on resolving them. We are currently on 1151296 on machines that are definitely not overloaded on mem/CPU/IO/network. 1) When moving to build 1151296 from 1150478 the index format changed, or some other marker that

Michigan Information Retrieval Enthusiasts Group Quarterly Meetup - August 17th 2011 - Solr in the Cloud, Erick Erickson

2011-08-09 Thread Provalov, Ivan
Next IR Meetup will be held at Farmington Hills Community Library on August 17, 2011. Please RSVP here: http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group Thank you, Ivan Provalov

Re: "Weighted" facet strings

2011-08-09 Thread Chris Hostetter
: Subject: "Weighted" facet strings First off: a terminology clarification. what you are describing has very little to do with facets. it's true that your "category" field is a "facet" of your documents, but in the context of your question, you aren't asking about any facet related features of

Re: Multiple Cores on different machines?

2011-08-09 Thread Chris Hostetter
: A quick question - is it possible to have 2 cores in Solr on two different : machines? your question is a little vague ... like asking "is it possible to have two betamax VCRs in two different rooms of my house" ... sure, if you want ... but why are you asking the question? are you e

Re: Query Rewrite

2011-08-09 Thread Chris Hostetter
: then in the CustomQueryParser I iterate over all the arguments adding : each key/value to a Map. I then pass in this to the constructor of a : basically copied ExtendedDismaxQParser (only difference is the added : aliases and the logic to add those to the ExtendedSolrQParser). : : Now, the thi

Re: extending edismax?

2011-08-09 Thread Chris Hostetter
: E.g. I want to pass the query "red shoes" as q="shoes"&fq=color:red. I have : a service that can tell me that in the phrase "red shoes" the word red is : the color. : : My question is where should I invoke this external service, : : 1) should my search client call the service, form the request

Re: Solr and External Fields

2011-08-09 Thread Chris Hostetter
: I recently modified the DefaultSolrHighlighter to support external : fields, but is there a way to do this for solr itself? I'm looking to : store a field in an external store and give Solr access to that field. : Where in Solr would I do this? it depends on when/how you want to use that ente

Re: solr chewing up system swap

2011-08-09 Thread Chris Hostetter
: I have arrived at a site where Solr is being run under Jetty. It is Ubuntu 10.04 : i386 hosted on AWS (Xen). Our combined Solr index size is a mere 21 MB. What : I am seeing is that Solr is steadily consuming about 150 MB of swap per week : and won't relinquish it until Sunspot is restarted. how much

edismax, inconsistencies with implicit/explicit AND when used with explicit OR

2011-08-09 Thread Mark juszczec
Hello all We've just switched from the default parser to the edismax parser and a user has noticed some inconsistencies when using implicit/explicit ANDs, ORs and grouping search terms in parentheses. First, the default query operator is AND. I switched it from OR today. The query: customersJo

Cache replication

2011-08-09 Thread arian487
I'm wondering if the caches on all the slaves are replicated across (such as queryResultCache). That is to say, if I hit one of my slaves and cache a result, and I make a search later and that search happens to hit a different slave, will that first cached result be available for use? This is pre

Re: Strip special chars like "-"

2011-08-09 Thread Sujit Pal
I have done this using a custom tokenfilter that (among other things) detects hyphenated words and converts it to the 3 variations, using a regex match on the incoming token: (\w+)-(\w+) that runs the following regex transform: s/(\w+)-(\w+)/$1$2__$1 $2/ and then splits by "__" and passes the or
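
Editor's sketch of the regex step described above, written as a plain function rather than a Lucene TokenFilter. The match pattern, the replacement, and the "__" split are taken from the message; the function name is hypothetical, and keeping the original token as one of the three variations is an assumption, since the message is cut off at that point.

```python
import re

# Same pattern as in the message: two word runs joined by one hyphen.
HYPHENATED = re.compile(r"(\w+)-(\w+)")


def hyphen_variations(token):
    """Return the variations emitted for one incoming token."""
    if not HYPHENATED.fullmatch(token):
        return [token]
    # Equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/ followed by a split on "__".
    transformed = HYPHENATED.sub(r"\1\2__\1 \2", token)
    # Assumption: the original token is passed through alongside the two new forms.
    return [token] + transformed.split("__")


print(hyphen_variations("manchester-united"))
print(hyphen_variations("arsenal"))
```

One caveat of the "__" separator: `\w` also matches underscores, so a token that already contains "__" would be split in the wrong place. A separator that `\w` cannot match would avoid that.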

Re: Cache replication

2011-08-09 Thread Erick Erickson
No, caches are not replicated across slaves. You really have two choices: 1> use some sort of "sticky" addressing whereby requests from the same client are sent to the same slave. 2> don't worry about it. Examine your cache stats to see how often your caches, particularly your Query
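
Editor's sketch of option 1>, assuming the routing decision is made in client code or a thin proxy. The host names and the function name are hypothetical; in practice a load balancer's own persistence feature would usually do this job.

```python
import hashlib


def pick_slave(client_id, slaves):
    """Map a client identifier to one slave, always the same one for that client."""
    digest = hashlib.md5(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(slaves)
    return slaves[index]


# Hypothetical slave hosts.
slaves = ["solr-slave-1:8983", "solr-slave-2:8983", "solr-slave-3:8983"]
print(pick_slave("session-abc123", slaves))
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would break stickiness across restarts. Adding or removing a slave remaps many clients under this simple modulo scheme, so caches start cold after such a change.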

Re: Strip special chars like "-"

2011-08-09 Thread Erick Erickson
That's not what I get. This is for Solr 3.3, but there's no reason that I know of that other versions should give different results. Here's the field def from the 3.3 example, this is just the standard implementation.

unique terms and multi-valued fields

2011-08-09 Thread Kevin Osborn
Please verify my understanding. I have a field called "category" and it has a value "computers". If I use this same field and value for all of my documents, it is really only stored on disk once because "category:computers" is a unique term. Is this correct? But, what about multi-valued fields.

Re: Cache replication

2011-08-09 Thread arian487
Thanks for the informative response. I'll consider using the 'sticky' addressing as you suggested. The reason cache is so important for me is because I'm actually doing more processing after the query component to come up with my query result and I want to avoid that processing as much as possibl

Re: Multiple Cores on different machines?

2011-08-09 Thread Satish Talim
Chris, sorry for not being clear when I asked the question. We are still experimenting with Solr. We have 2 tables in Postgres that we want to migrate to Solr for faster query results. One index is of static data and the other related index would be of data that changes once or twice a month. Some

Re: Multiple Cores on different machines?

2011-08-09 Thread Shashi Kant
"Betamax VCR"? really ? :-) On Tue, Aug 9, 2011 at 3:38 PM, Chris Hostetter wrote: > > : A quick question - is it possible to have 2 cores in Solr on two > different > : machines? > > your question is a little vague ... like asking "is it possible to have to > have two betamax VCRs in two diffe

RE: Multiple Cores on different machines?

2011-08-09 Thread Jonathan Rochkind
> tables. Others are suggesting 2 separate indexes on 2 different machines and > using SOLRs capacity to combine cores and generate a third index that > denormalizes the tables for us. What capability is that, exactly? I think you may be imagining it. Solr does have some capability to distribut

Re: Cache replication

2011-08-09 Thread Paul Libbrecht
Arian, I've been doing results post-processing in some versions of the ActiveMath server and it has been the wrong choice as much as possible. Maybe this is not what you do, but the biggest flaw was that the post-processing was eliminating or adding results (for insiders of ActiveMath: converti

Re: edismax, inconsistencies with implicit/explicit AND when used with explicit OR

2011-08-09 Thread Ahmet Arslan
Hi Mark, I suspect that the issue you are facing is https://issues.apache.org/jira/browse/SOLR-2649 You can verify this by toggling the default operator between 'AND' and 'OR'. --- On Wed, 8/10/11, Mark juszczec wrote: > From: Mark juszczec > Subject: edismax, inconsistencies with implicit/explicit