Re: Use of Soundex in solr spellchecker

2012-06-06 Thread Lance Norskog
Metaphone and DoubleMetaphone are more advanced than Soundex, and they already exist as filters. There is no independent measure of accuracy for Solr; you have to decide if you like the results. On Wed, Jun 6, 2012 at 4:36 AM, nutchsolruser wrote: > Does incorporating soundex algorithm into solr

Question on addBean and deleteByQuery

2012-06-06 Thread Darin Pope
When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or deleteByQuery, the POST body has numbers before and after the XML (47 and 0 as noted in the example below): *** POST /solr/123456/update?wt=xml&version=2.2 HTTP/1.1 User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSo

Re: Solr, I have perfomance problem for indexing.

2012-06-06 Thread Jihyun Suh
Each table has 35,000 rows (35 thousand). I will check the log for each step of indexing. I run Solr 3.5. 2012/6/6 Jihyun Suh > I have 128 tables of mysql 5.x and each table have 3,5000 rows. > When I start dataimport(indexing) in Solr, it takes 5 minutes for one > table. > But When Solr ind

Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Gregg Donovan
Thanks for the suggestion, Erick. I created a JIRA and moved the patch to SVN, just to be safe. [1] --Gregg [1] https://issues.apache.org/jira/browse/SOLR-3514 On Wed, Jun 6, 2012 at 2:35 PM, Erick Erickson wrote: > > Hmmm, it would be better to open a Solr JIRA and attach this as a patch. > Al

Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread Jack Krupansky
First, it appears that you are using the "dismax" query parser, not the extended dismax ("edismax") query parser. My hunch is that some of those fields may be non-tokenized "string" fields in which one or more of your search keywords do appear but not as the full string value or maybe with a diff

Re: Extract information from url field

2012-06-06 Thread Jack Krupansky
Yes, using PatternTokenizerFactory. Here's an example field type that if you define a "department" field with this type and do a copyField from "url" to "department", it will end up with the department name alone. It handles embedded punctuation (e.g., dot, dash, and underscore) and mixed case wo

Re: pass custom parameters from client to solr

2012-06-06 Thread srinir
What would be a good place to read the custom Solr params I passed from the client to Solr? I saw that all the params passed to Solr are available in rb.req. I have a business requirement to collapse or combine some properties together based on some conditions. Currently I have a custom component

Re: TermComponent and Optimize

2012-06-06 Thread lboutros
It is possible to use the "expungeDeletes" option in the commit, which could solve your problem. http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22 Sadly, there is currently a bug with the TieredMergePolicy: https://issues.apache.org/jira/browse/SOLR-2725 SOLR-272
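As a rough illustration of the "expungeDeletes" commit option mentioned above, the update message could be built like this in Python. This is only a sketch: the helper name commit_message is made up for the example, and actually posting the message to Solr's update handler is left out.

```python
import xml.etree.ElementTree as ET


def commit_message(expunge_deletes: bool = True) -> str:
    """Build a Solr XML <commit/> message (hypothetical helper for this sketch)."""
    commit = ET.Element("commit")
    if expunge_deletes:
        # Optional attribute described on the UpdateXmlMessages wiki page.
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")
```

The resulting string would be sent as the body of an update request; note the TieredMergePolicy issue referenced in the same message before relying on it.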

Re: Replication

2012-06-06 Thread Erick Erickson
A couple of things to check. 1> Are you optimizing all the time? An optimization will merge all the segments into a single segment, which will cause the whole index to be replicated after each optimization. Best Erick On Wed, Jun 6, 2012 at 1:33 AM, William Bell wrote: > We are using S

Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?

2012-06-06 Thread Erick Erickson
Hmmm, it would be better to open a Solr JIRA and attach this as a patch. Although we've had some folks provide a Git-based rather than an SVN-based patch. Anyone can open a JIRA, but you must create a signon to do that. It'd get more attention that way Best Erick On Tue, Jun 5, 2012 at 2:19

Re: Boost by Nested Query / Join Needed?

2012-06-06 Thread Erick Erickson
Generally, you just have to bite the bullet and denormalize. Yes, it really runs counter to your DB mindset. But before jumping that way, how many denormalized records are we talking here? 1M? 100M? 1B? Solr has (4.x) some join capability, but it makes a lousy general-purpose database. Yo

Single term boosting with dismax

2012-06-06 Thread matteosilv
Hi, I'm using the dismax query parser. I would like to boost a single term at query time, instead of the whole field. I should probably use the standard query parser; however, I've also overridden the dismax query parser to handle payload boosting on terms. What I want to obtain is a double boo

Levenstein Distance

2012-06-06 Thread Gau
I have a list of synonyms which is being expanded at query time. This yields a lot of results (in the millions). My use case is name search. I want to sort the results by Levenshtein distance. I know this can be done with the strdist function. But sorting being inefficient and Solr function adding to its w
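For reference, the edit distance the poster wants to sort by can be sketched in a few lines of Python. This only illustrates the metric itself; it is not how Solr's strdist function is implemented, and the function name levenshtein is made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]
```

Each call costs time proportional to the product of the two string lengths, which is one reason sorting millions of results this way is expensive.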

RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Dyer, James
Markus, With "maxCollationTries=0", it is not going out and querying the collations to see how many hits they each produce. So it doesn't know the # of hits. That is why if you also specify "collateExtendedResults=true", all the hit counts are zero. It would probably be better in this case i

Re: Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Nicolò Martini
Great! Thank you a lot, that solved all my problems. Regards, Nicolò On 06 Jun 2012, at 14:55, Jack Krupansky wrote: > This is a known (unfixed) bug. The workaround is to add a space between each > left parenthesis and field name. > > See: > https://issues.apache.org/jira/bro

Re: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Jack Krupansky
Do single-word queries return hits? Is this a multi-shard environment? Does the request list all the shards needed to give hits for all the collations you expect? Maybe the queries are being done locally and don't have hits for the collations locally. -- Jack Krupansky -Original Message-

problem with mapping-iso accents

2012-06-06 Thread Gastone Penzo
Hi, I have a problem with the ISOaccent tokenize filter. I have a field in my schema with this filter: if I try this filter with the analysis tool in the Solr admin panel it works. For example: sarà => sara. But when I create indexes it doesn't work. In the index the field is "sarà" with the accent. Why? i use

solrj library requirements: slf4j-jdk14-1.5.5.jar

2012-06-06 Thread Welty, Richard
the section of the solrj wiki page on setting up the class path calls for slf4j-jdk14-1.5.5.jar which is supposed to be in a lib/ subdirectory. i don't see this jar or any like it with a different version anywhere in either the 3.5.0 or 3.6.0 distributions. is it really needed or is this just sli

Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread André Maldonado
Erick, thanks for your reply and sorry for the confusion in last e-mail. But it is hard to explain the situation without that bunch of code. In my schema I have a field called textoboost that contains copies of a lot of other fields. Doing the query in this field I got this: +(((textoboost:aparta

Re: highlighter not respecting sentence boundry

2012-06-06 Thread Jack Krupansky
I don't quite understand the problem. What is an example snippet that you think is incorrect and what do you think the snippet should be? Also, try the /browse handler in the Solr example after following the Solr tutorial to post data. Do a search that will highlight terms similar to what you

Re: Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Jack Krupansky
This is a known (unfixed) bug. The workaround is to add a space between each left parenthesis and field name. See: https://issues.apache.org/jira/browse/SOLR-3377 So, q=(field2:ciao) becomes: q=( field2:ciao) -- Jack Krupansky -Original Message- From: Nicolò Martini Sent: Wednesd
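The workaround above (a space between each left parenthesis and the field name) could be applied on the client side before the query is sent. A minimal Python sketch, assuming field names consist only of letters, digits, and underscores and that parentheses inside quoted phrases need no special handling; the function name pad_fielded_parens is made up for the example.

```python
import re


def pad_fielded_parens(q: str) -> str:
    """Insert a space after "(" wherever a field name and colon follow directly."""
    # The lookahead leaves "(" untouched when it is followed by a space,
    # another parenthesis, or a plain (non-fielded) term.
    return re.sub(r"\((?=\w+:)", "( ", q)
```

With this, q=(field2:ciao) becomes q=( field2:ciao), while a query that already has the space is left unchanged.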

Re: Exception when optimizing index

2012-06-06 Thread Jack Krupansky
It could be related to https://issues.apache.org/jira/browse/LUCENE-2975. At least the exception comes from the same function. "Caused by: java.io.IOException: Invalid vInt detected (too many bits) at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)" What hardware and Java vers

Fielded searches with Solr ExtendedDisMax Query Parser

2012-06-06 Thread Nicolò Martini
Hi all, I'm having a problem using the Solr ExtendedDisMax Query Parser with queries that contain fielded searches inside not-plain queries. The case is the following. If I send to Solr an edismax request (defType=edismax) with parameters 1. qf=field1^10 2. q=field2:ciao 3. debugQuery=on (for

Re: Efficiently mining or parsing data out of XML source files

2012-06-06 Thread Jack Krupansky
I did see a mention yesterday of a situation involving DIH and large XML files where it was unusually slow, but if the big XML file was broken into many smaller files it went really fast for the same amount of data. If that is the case, you don't need to parse all of the XML, just detect the bo

RE: ReadTimeout on commit

2012-06-06 Thread spring
Hi Jack, hi Erik, thanks for the tips! It's solr 3.6 I increased the batch to 1000 docs and the timeout to 10 s. Now it works. And I will implement the retry around the commit-call. Thx! > -Original Message- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Mittwoch, 6. J
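The "retry around the commit-call" mentioned above might look roughly like the following Python sketch. It is deliberately client-agnostic: commit is any zero-argument callable supplied by the caller (a hypothetical stand-in for the real client call), and the attempt count and delay are arbitrary example values.

```python
import socket
import time


def commit_with_retry(commit, attempts=3, delay_seconds=1.0):
    """Call commit(); on a timeout, wait and try again, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return commit()
        except (TimeoutError, socket.timeout):
            if attempt == attempts:
                raise  # out of attempts: let the caller see the timeout
            time.sleep(delay_seconds)
```

As noted elsewhere in this thread, a commit can succeed on the server even though the request timed out, so a retry may simply repeat a commit that already went through.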

Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Shameema Umer
OK Jack. Will do. On Wed, Jun 6, 2012 at 5:29 PM, Jack Krupansky wrote: > Check your Solr log file to see whether errors or warnings are issued. If > Nutch is sending bogus date values, they should produce warnings. > > At this stage there are two strong possibilities: > > 1. Nutch is simply not

Re: ReadTimeout on commit

2012-06-06 Thread Mark Miller
Looks like the commit is taking longer than your set timeout. On Jun 5, 2012, at 6:51 AM, wrote: > Hi, > > I'm indexing documents in batches of 100 docs. Then commit. > > Sometimes I get this exception: > > org.apache.solr.client.solrj.SolrServerException: > java.net.SocketTimeoutException:

Re: Efficiently mining or parsing data out of XML source files

2012-06-06 Thread Mike Sokolov
I agree, that seems odd. We routinely index XML using either HTMLStripCharFilter, or XmlCharFilter (see patch: https://issues.apache.org/jira/browse/SOLR-2597), both of which parse the XML, and we don't see such a huge speed difference from indexing other field types. XmlCharFilter also allo

Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Jack Krupansky
Check your Solr log file to see whether errors or warnings are issued. If Nutch is sending bogus date values, they should produce warnings. At this stage there are two strong possibilities: 1. Nutch is simply not sending that date field value at all. 2. Solr is rejecting the date field value be

Re: ReadTimeout on commit

2012-06-06 Thread Jack Krupansky
As Erick says, you are probably hitting an occasional automatic background merge which takes a bit longer. That is not an indication of a problem. Increase your connection timeout. Check the log to see how long the merge or "slow commit" takes. You have a timeout of 1000 which is 1 second. Make

Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Shameema Umer
Versions: Nutch: 1.4 and Solr: 3.4 My schema file contains But I do not know whether this feed plugin is working or not as I am new to nutch and solr. Here is my query http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDat

Re: Schema / Config Error?

2012-06-06 Thread Jack Krupansky
Read CHANGES.txt carefully, especially the section entitled "Upgrading from Solr 3.5". For example, "* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of solrconfig.xml are deprecated and replaced with a new <indexConfig> section. Read more in SOLR-1052 below." If you simply copied your schema/config directly, uncha

Re: How to find the age of a page

2012-06-06 Thread Jack Krupansky
See the reply on the other email thread you started. -- Jack Krupansky -Original Message- From: Shameema Umer Sent: Wednesday, June 06, 2012 6:28 AM To: solr-user@lucene.apache.org Subject: Re: How to find the age of a page Hi Syed Abdul, I am sorry to ask this basic question as I

Re: How to find the age of a page

2012-06-06 Thread Jack Krupansky
My misunderstanding. I thought you were "publishing" to SOLR and wanted the date when that occurred (indexing). -- Jack Krupansky -Original Message- From: Shameema Umer Sent: Wednesday, June 06, 2012 4:45 AM To: solr-user@lucene.apache.org Subject: Re: How to find the age of a page H

Re: sort by publishedDate and get published Date in solr query results

2012-06-06 Thread Jack Krupansky
Step 1: Verify that "publishedDate" is in fact the field name that Nutch uses for "published date". Step 2: Make sure that Nutch is passing the date in the format YYYY-MM-DDTHH:MM:SSZ. Whether you need a "Nutch plugin" to do that is not a question for this Solr mailing list. My (very limited) u
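To make the expected date format concrete, here is a small Python sketch that renders a datetime as a UTC string of the form YYYY-MM-DDTHH:MM:SSZ. The function name to_solr_date is made up for the example, and treating naive datetimes as already being in UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone


def to_solr_date(dt: datetime) -> str:
    """Format dt as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    # Assumption: a datetime without tzinfo is already expressed in UTC.
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, 6 June 2012 at 12:29:00 UTC is rendered as 2012-06-06T12:29:00Z.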

Re: Schema / Config Error?

2012-06-06 Thread Shameema Umer
Make sure your port is 8983 or 8080. On Wed, Jun 6, 2012 at 4:27 PM, Erick Erickson wrote: > That implies one of two things: > 1> you changed solr.xml. I'd go back to the original and re-edit > anything you've changed > 2> you somehow got a corrupted download. Try blowing your installation > away

Re: ReadTimeout on commit

2012-06-06 Thread Erick Erickson
You're probably hitting a background merge and the request is timing out even though the commit succeeds. Try querying for the data in the last packet to test this. And you don't say what version of Solr you're using. One test you can do is increase the number of documents before a commit. If mer

Re: ExtendedDisMax Question - Strange behaviour

2012-06-06 Thread Erick Erickson
Sorry, but your post is really hard to read with all the data inline. Try running with &debugQuery=on and looking at the parsed query, I suspect your field lists aren't the same even though you think they are. Perhaps a typo somewhere? Best Erick On Mon, Jun 4, 2012 at 1:26 PM, André Maldonado

Re: Schema / Config Error?

2012-06-06 Thread Erick Erickson
That implies one of two things: 1> you changed solr.xml. I'd go back to the original and re-edit anything you've changed 2> you somehow got a corrupted download. Try blowing your installation away and getting a new copy Because it works perfectly for me. Best Erick On Wed, Jun 6, 2012 at 4:14 AM

Re: How to find the age of a page

2012-06-06 Thread Shameema Umer
Hi Syed Abdul, I am sorry to ask this basic question as I am new to Nutch and Solr (even new to Java applications). Can you tell me how to add tstamp to published date after re-indexing? Is an update query enough? Also, I am not able to get the field *publishedDate* in my query results to check whe

issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults

2012-06-06 Thread Markus Jelsma
Hi, We've had some issues with a bad zero-hits collation being returned for a two word query where one word was only one edit away from the required collation. With spellcheck.maxCollations set to a reasonable number we saw the various suggestions without the required collation. We decreased thres

Re: How to find the age of a page

2012-06-06 Thread in.abdul
Whenever you reindex, add the current timestamp; that will be the publish date, and from there you can calculate. Thanks and Regards, S SYED ABDUL KATHER On Wed, Jun 6, 2012 at 2:16 PM, Shameema Umer [via Lucene] < ml-node+s472066n3987930...@n3.nabble.com> wrote: > Hi abdul a

Issue with Solrcloud /solr 4.0 : Discrepancy in number of groups and ngroups value

2012-06-06 Thread Nitesh Nandy
We are using Solr 4.0 (svn build 30th May, 2012) with Solr Cloud. While querying, we use field collapsing with ngroups set to true. However, there is a difference between the number of results returned and the "ngroups" value returned. Ex: http://localhost:8983/solr/select?q=messagebody:monit%20AND%20usergr

Re: How to find the age of a page

2012-06-06 Thread Shameema Umer
Hi abdul and Jack, i got the tstamp working but I really need to know the published date of each page. On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky wrote: > If you uncomment the "timestamp" field in the Solr example, Solr will > automatically initialize it for each new document to be the tim

Re: Schema / Config Error?

2012-06-06 Thread G.Long
Hi :) Looks like you forgot to paste your schema.xml and the error in your e-mail : o Gary On 06/06/2012 10:14, Spadez wrote: Hi, I installed a fresh copy of Solr 3.6.0 on my server but I get the following page when I try to access Solr: http://176.58.103.78:8080/solr/ It says errors t

Schema / Config Error?

2012-06-06 Thread Spadez
Hi, I installed a fresh copy of Solr 3.6.0 on my server but I get the following page when I try to access Solr: http://176.58.103.78:8080/solr/ It says errors to do with my solr.xml. This is my solr.xml: I really can't figure out how I am meant to fix this, so if anyone is able to give some in