Re: FastVectorHighlighter wiki corrections

2012-01-10 Thread Michael Lissner
Hi, I didn't hear any responses here, so I went ahead and made a number of changes to the highlighting parameters wiki: - Highlighter is now known as Original Highlighter, so it is clearer that Highlighter doesn't just refer to the highlighting utilities generally. - I need help with fragsize

Multiple Sort for Group/Folding

2012-01-10 Thread Mauro Asprea
Hi, I'm having some issues trying to sort my grouped results by more than one field. If I use just one field, whichever it is, it works fine (I mean it sorts). I have a case where the first sort key is equal for all the head docs of each group, so I expect to return the groups sor
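
For readers following this thread, a request that combines result grouping with a multi-field sort could be built roughly as below. This is only a sketch: the base URL and the field names (`event_id`, `start_date`) are made-up placeholders, not taken from the original message. In Solr's grouping support, `sort` orders the groups, while `group.sort` orders documents inside each group.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, query, group_field, sort_keys):
    """Build a Solr select URL that groups results and sorts by several fields.

    sort_keys is a list of (field, direction) pairs, applied in order.
    """
    sort = ",".join(f"{field} {direction}" for field, direction in sort_keys)
    params = {
        "q": query,
        "group": "true",
        "group.field": group_field,
        "sort": sort,  # orders the groups themselves
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host and field names, for illustration only.
url = grouped_query_url(
    "http://localhost:8983/solr/select",
    "*:*",
    "event_id",
    [("start_date", "asc"), ("score", "desc")],
)
print(url)
```

The second sort key only matters when the first one ties, which is the case described in the message.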

Re: Solr core as a dispatcher

2012-01-10 Thread shlomi java
Straying a bit from the subject, don't you think it would be useful to have the shards parameter used also in the index, in order to maintain document uniqueness? I mean as an out-of-the-box feature of Solr. Because the situation today is that a Solr client working with a sharded Solr is respons

Re: stopwords as privacy measure

2012-01-10 Thread Michael Lissner
It's a bit of a privacy through obscurity measure, unfortunately. The problem is that American courts do a lousy job of removing social security numbers from cases that I put on my site. I do anonymization before sending the cases to Solr, but if you're clever (and the stopwords weren't in plac

Re: Stemming numbers

2012-01-10 Thread Otis Gospodnetic
Hi Tanner, Here is another simple way: AutoComplete. You know what your users are searching for, you can identify top queries and you can identify common queries that are not finding matches.  This all allows you to figure out what to feed in AutoComplete.  And hopefully your AutoComplete doesn

Re: Stemming numbers

2012-01-10 Thread Ted Dunning
I was afraid you would say that. See http://fora.tv/2009/10/14/ACM_Data_Mining_SIG_Ted_Dunning#fullprogram, click on the Recommendations section to skip to the good part. The point is that cross recommendation can let you learn what sorts of rewrites of this kind are needed. The idea is that you

Re: Stemming numbers

2012-01-10 Thread Tanner Postert
You mention "that is one way to do it". Is there another I'm not seeing? On Jan 10, 2012, at 4:34 PM, Ted Dunning wrote: > On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert > wrote: > >> We've had some issues with people searching for a document with the >> search term '200 movies'. The document i

Re: Stemming numbers

2012-01-10 Thread Ted Dunning
On Tue, Jan 10, 2012 at 5:32 PM, Tanner Postert wrote: > We've had some issues with people searching for a document with the > search term '200 movies'. The document is actually title 'two hundred > movies'. > > Do we need to add every number to our synonyms dictionary to > accomplish this? Tha

RE: ignoreTikaException value

2012-01-10 Thread TRAN-NGOC Minh
Thanks for your reply. I added the argument in the solrconfig.xml and it worked like a charm. Thanks again, Minh -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, January 10, 2012 01:25 To: solr-user@lucene.apache.org Subject: Re: ignoreTikaException value

Stemming numbers

2012-01-10 Thread Tanner Postert
We've had some issues with people searching for a document with the search term '200 movies'. The document is actually titled 'two hundred movies'. Do we need to add every number to our synonyms dictionary to accomplish this? Is it best done at index or search time?
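
If the synonyms route is taken, the entries do not have to be typed by hand. The sketch below generates comma-separated lines in the style of a Solr synonyms file (for example `200, two hundred`). It is an assumption-laden illustration, not something proposed in the thread: it only covers 0 to 999 and writes compound numbers with a space ("forty two"). Multi-word synonyms are often reported to behave better when expanded at index time, so treat the output as a starting point.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 in English words."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def synonym_line(n):
    """Return one synonyms-file line that treats the digits and words as equivalent."""
    return f"{n}, {number_to_words(n)}"


# Print a few sample lines; a real file would cover whatever range is needed.
for value in (7, 42, 200):
    print(synonym_line(value))
```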

Re: SpellCheck Help

2012-01-10 Thread Donald Organ
my copyField was defined as copyfield <--- notice the lowercase f On Tue, Jan 10, 2012 at 2:50 PM, Dyer, James wrote: > Three things to check: > > 1. Use a higher spellcheck.count than 1. Try 10. IndexBasedSpellChecker > pre-filters the possibilities in a first pass of a 2-pass process.

RE: SpellCheck Help

2012-01-10 Thread Dyer, James
Three things to check: 1. Use a higher spellcheck.count than 1. Try 10. IndexBasedSpellChecker pre-filters the possibilities in a first pass of a 2-pass process. If spellcheck.count is too low, all the good suggestions might get filtered on the first pass and then it won't find anything on
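
As a rough illustration of the first suggestion, the snippet below builds a request with `spellcheck.count` raised to 10. The host, handler path, and misspelled query term are placeholders, not details from the thread; it assumes the spellcheck component is attached to the handler being queried.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url, query, count=10):
    """Build a Solr query URL that asks for up to `count` spelling suggestions."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.count": count,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and query term.
print(spellcheck_url("http://localhost:8983/solr/select", "hte"))
```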

SpellCheck Help

2012-01-10 Thread Donald Organ
I am trying to get the IndexBasedSpellChecker to work. I believe I have everything set up properly and the spellcheck component seems to be running, but the suggestions list is empty. I am using Solr 3.5 with Jetty. My solrconfig.xml and schema.xml are as follows: solrconfig.xml: http://pastie.o

Re: How to debug DIH with MySQL?

2012-01-10 Thread Walter Underwood
Right, but that says exactly nothing about how that identifier is used. --wunder On Jan 10, 2012, at 11:23 AM, Gora Mohanty wrote: > On Wed, Jan 11, 2012 at 12:37 AM, Walter Underwood > wrote: >> Thanks! That looks like it fixed the problem. This list continues to be >> awesome. >> >> Is the f

Re: How to debug DIH with MySQL?

2012-01-10 Thread Gora Mohanty
On Wed, Jan 11, 2012 at 12:37 AM, Walter Underwood wrote: > Thanks! That looks like it fixed the problem. This list continues to be > awesome. > > Is the function of the name attribute actually described in the docs? I could > not figure out what it was for. Yes, it is, though maybe not very pr

Re: How to debug DIH with MySQL?

2012-01-10 Thread Walter Underwood
Thanks! That looks like it fixed the problem. This list continues to be awesome. Is the function of the name attribute actually described in the docs? I could not figure out what it was for. wunder On Jan 10, 2012, at 10:41 AM, dan whelan wrote: > just a guess but this might need to change fro

Re: How to debug DIH with MySQL?

2012-01-10 Thread dan whelan
Just a guess, but this might need to change from ${biblio.id} to ${book.id}, since the entity name is book instead of biblio. On 1/10/12 10:37 AM, Walter Underwood wrote: I see a missing required "title" field for every document when I'm using DIH. Yes, these documents have titles in the dat
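
To see why the entity name matters, here is a toy model of placeholder resolution. It is not DIH's actual implementation; the SQL text, the row data, and the choice to replace unknown placeholders with an empty string are all assumptions made for illustration. The point is only that `${book.id}` finds a value when the parent entity is named `book`, while `${biblio.id}` finds nothing.

```python
import re

# Matches placeholders of the form ${entity.column}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template, rows):
    """Substitute ${entity.column} using the current row of each named entity.

    Unknown entities or columns become an empty string (an assumption of this sketch).
    """
    def lookup(match):
        entity, column = match.groups()
        return str(rows.get(entity, {}).get(column, ""))

    return PLACEHOLDER.sub(lookup, template)


# The parent entity is named "book", so only ${book.*} can be resolved.
current_rows = {"book": {"id": 42}}

print(resolve("select title from titles where book_id=${book.id}", current_rows))
print(resolve("select title from titles where book_id=${biblio.id}", current_rows))
```

The second call produces a query with nothing after `book_id=`, which would match the symptom of child fields such as "title" coming back empty.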

How to debug DIH with MySQL?

2012-01-10 Thread Walter Underwood
I see a missing required "title" field for every document when I'm using DIH. Yes, these documents have titles in the database. Is there a way to see what exact queries are sent to MySQL or received by MySQL? Here is a relevant chunk of the dataConfig:

Re: Solr core as a dispatcher

2012-01-10 Thread Shawn Heisey
On 1/9/2012 5:15 PM, Hector Castro wrote: Hi, Has anyone had success with multicore single node Solr configurations that have one core acting solely as a dispatcher for the other cores? For example, say you had 4 populated Solr cores – configure a 5th to be the definitive endpoint with `shar
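
For context, a distributed request through such a dispatcher core amounts to passing the populated cores in the `shards` parameter. The sketch below shows the shape of that request; the host, port, and core names (`dispatcher`, `core1` to `core4`) are invented for the example and do not come from the thread.

```python
from urllib.parse import urlencode


def distributed_query_url(dispatcher_url, query, shard_addresses):
    """Build a query URL that fans out to the given shards via one endpoint.

    shard_addresses are host:port/path strings without a scheme,
    which is the form the shards parameter expects.
    """
    params = {"q": query, "shards": ",".join(shard_addresses)}
    return f"{dispatcher_url}?{urlencode(params)}"


# Hypothetical single-node layout: four populated cores plus an empty dispatcher.
shards = [f"localhost:8983/solr/core{i}" for i in range(1, 5)]
print(distributed_query_url("http://localhost:8983/solr/dispatcher/select", "*:*", shards))
```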

Re: Two documents with same ID but different hash

2012-01-10 Thread Erick Erickson
I have no idea what you mean by "different hash", and you haven't provided much information to go on here. What is your evidence that the document is in the index twice? If you're inspecting the index at a low level that's expected, since documents are just marked as deleted not immediately removed f

Re: best way to force substitutions in data

2012-01-10 Thread Gora Mohanty
On Tue, Jan 10, 2012 at 9:04 PM, geeky2 wrote: > thank you both for the information. > > Gora, when you mentioned: > >>> > - For keeping both values, use synonyms. > << > > what did you mean exactly. [...] Please take a look at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.Syno

Re: Doing url search in solr is slow

2012-01-10 Thread yu shen
Hi Erick, I changed all my URL fields into text (they were string fields before), and added a WordDelimiterFilterFactory, so that URL fields can be tokenized into several words. But I still got around 15 seconds response time measured using debugQuery=on, and most of the time is still spent on DebugC

Re: best way to force substitutions in data

2012-01-10 Thread geeky2
thank you both for the information. Gora, when you mentioned: >> - For keeping both values, use synonyms. << what did you mean exactly. mark -- View this message in context: http://lucene.472066.n3.nabble.com/best-way-to-force-substitutions-in-data-tp3646195p3647920.html Sent from the Solr -

Re: Do Highlighting + proximity using surround query parser

2012-01-10 Thread Ahmet Arslan
> I am not able to do highlighting with surround query parser > on the returned > results. > I have tried the highlighting component but it does not > return highlighted > results. Highlighter does not recognize Surround Query. It must be re-written to enable highlighting in o.a.s.search.QParser#

Re: Two documents with same ID but different hash

2012-01-10 Thread Hyttinen Lauri
Hello again, Well, after further review the IDs are different. The difference was just so small I missed it after staring at it for a few hours. BR, Lauri On 01/10/2012 02:20 PM, Hyttinen Lauri wrote: Hello, I sent some data into the solr/lucene index but when I query the data I see weird resu

Re: how to rebuild snowball lib in solr

2012-01-10 Thread Erick Erickson
On a very quick glance, it looks like the source is at: ./lucene/contrib/analyzers/common/src/java/org/tartarus/snowball and from there just compile Lucene and/or Solr as you normally would. See: http://wiki.apache.org/solr/HowToContribute Best Erick On Mon, Jan 9, 2012 at 2:13 PM, wrote: > H

RE: Match raw query string

2012-01-10 Thread McCarroll, Robert
Thank you for your patience and assistance. XML is not my forte, but layoffs and attrition have reduced IT staff well below minimum functional levels here. Thanks to your help, the exact title matches have made it to the first page of results. Robert McCarroll Systems Administration NYS De

Two documents with same ID but different hash

2012-01-10 Thread Hyttinen Lauri
Hello, I sent some data into the solr/lucene index but when I query the data I see weird results. There are documents with identical id fields but they have different hash values. Apart from the hash values the results are the same. I thought it was impossible to have documents with same uniq

Re: Solr core as a dispatcher

2012-01-10 Thread Hector Castro
In my case the cores are populated with different records that adhere to the same schema. The question about randomly distributing requests is because each core has the `shards` parameter populated so that it can hit the other cores' indexes. My question is more about the advantages (if any) of

Re: Facet Query using Dates

2012-01-10 Thread Mauro Asprea
I think I solved it... It seems to be because of the - that's just before the query facet name -- Mauro Asprea E-Mail: mauroasp...@gmail.com Mobile: +34 654297582 Skype: mauro.asprea On Tuesday, January 10, 2012 at 11:33 AM, Mauro Asprea wrote: > Hi, I'm having issues using the "new" way

Facet Query using Dates

2012-01-10 Thread Mauro Asprea
Hi, I'm having issues using the "new" way of faceting dates with the Query Facets. The issue is that it is returning wrong counts. I tested it using a Date Facet instead and the Date Facet did return correct counts. I'm using the Sunspot RSolr client and I'm also using the new folding/group feature.

Re: best way to force substitutions in data

2012-01-10 Thread Gora Mohanty
On Tue, Jan 10, 2012 at 4:44 AM, geeky2 wrote: [...] > i have a database with approximately 7Million rows that i am bringing in to > solr. > > for a very small sub-set of these 7Million rows (about 130 rows), i need to > substitute an old part number for a new part number.  i know ahead of time >

Re: best way to force substitutions in data

2012-01-10 Thread Dmitry Kan
how about using regular expressions: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory On Tue, Jan 10, 2012 at 1:14 AM, geeky2 wrote: > Hello all, > > i have been reading the solr book as well as searching the archives of this > list to learn how
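
The regular-expression idea can be tried outside Solr first. The sketch below applies a small old-to-new lookup table with Python's `re` module. It illustrates the technique only and is not a PatternReplaceCharFilterFactory configuration; the part numbers are invented, and the rule that a part number must not be glued to other word characters is an assumption of this sketch.

```python
import re


def build_substituter(replacements):
    """Return a function that swaps whole old part numbers for new ones.

    replacements maps old part number -> new part number.
    """
    if not replacements:
        return lambda text: text
    # Longest keys first so a longer part number wins over its own prefix.
    alternatives = "|".join(
        re.escape(old) for old in sorted(replacements, key=len, reverse=True)
    )
    # Lookarounds keep us from rewriting the middle of a longer token.
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)


# Invented part numbers, for illustration only.
substitute = build_substituter({"OLD-123": "NEW-456", "OLD-1": "NEW-9"})
print(substitute("replacement belt for OLD-123 and OLD-1"))
```

With only about 130 substitutions, a table like this could also be applied to the rows before they are sent to Solr, which avoids touching the analysis chain at all.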

Question about updating index with custom field types

2012-01-10 Thread 罗赛
Hello everyone, I have a question about how to update the index using XML messages when there are some complex custom field types in my index, like: And the field offer has some attributes in it... I've read the page http://wiki.apache.org/solr/UpdateXmlMessages, and the example shows that the XML should be like:

impact of omitTermFreqAndPositions="true"

2012-01-10 Thread Samarendra Pratap
Hi, I understand that setting omitTermFreqAndPositions="true" for a field in schema.xml stores less information in the index, with some restrictions, e.g. on phrase search. But does setting this property to "true" for a field which is of type "string", "int" or is analyzed by KeywordAnalyzer make an