Metaphone and DoubleMetaphone are more advanced than Soundex, and they
already exist as filters.
There is no independent measure of accuracy for Solr; you have to
decide if you like the results.
On Wed, Jun 6, 2012 at 4:36 AM, nutchsolruser wrote:
> Does incorporating soundex algorithm into solr
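For reference, the classic Soundex coding the thread asks about can be sketched in plain Java. This is a simplified variant (it treats H and W like vowels rather than applying the full separator rule), not Solr's actual filter implementation:

```java
public class Soundex {
    // Code table for letters a..z: vowels/h/w/y map to '0' (dropped),
    // consonants map to their Soundex digit.
    private static final String CODES = "01230120022455012623010202";

    // Keep the first letter, append digits for following consonants,
    // collapse adjacent duplicates, pad/truncate to 4 characters.
    public static String encode(String name) {
        String s = name.toUpperCase().replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char code = CODES.charAt(s.charAt(i) - 'A');
            if (code != '0' && code != last) out.append(code);
            last = code;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert")); // R163
    }
}
```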
When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or
deleteByQuery, the POST body has numbers before and after the XML (47 and 0
as noted in the example below):
***
POST /solr/123456/update?wt=xml&version=2.2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSo
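The numbers before and after the XML look consistent with HTTP/1.1 chunked transfer encoding, where each chunk is prefixed by its size in hexadecimal and a zero-length chunk terminates the body. A sketch of that framing (the helper name is mine, not SolrJ's):

```java
import java.nio.charset.StandardCharsets;

public class ChunkedDemo {
    // Frame a body as a single HTTP/1.1 chunk: hex length, CRLF, data, CRLF,
    // then the terminating zero-length chunk.
    static String frameAsSingleChunk(String body) {
        int len = body.getBytes(StandardCharsets.UTF_8).length;
        return Integer.toHexString(len) + "\r\n" + body + "\r\n" + "0\r\n\r\n";
    }

    public static void main(String[] args) {
        System.out.println(frameAsSingleChunk("<delete><query>id:1</query></delete>"));
    }
}
```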
Each table has 35,000 rows (35 thousand).
I will check the log for each step of indexing.
I run Solr 3.5.
2012/6/6 Jihyun Suh
> I have 128 tables in MySQL 5.x, and each table has 35,000 rows.
> When I start dataimport (indexing) in Solr, it takes 5 minutes for one
> table.
> But when Solr ind
Thanks for the suggestion, Erick. I created a JIRA and moved the patch
to SVN, just to be safe. [1]
--Gregg
[1] https://issues.apache.org/jira/browse/SOLR-3514
On Wed, Jun 6, 2012 at 2:35 PM, Erick Erickson wrote:
>
> Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
> Al
First, it appears that you are using the "dismax" query parser, not the
extended dismax ("edismax") query parser.
My hunch is that some of those fields may be non-tokenized "string" fields
in which one or more of your search keywords do appear but not as the full
string value or maybe with a diff
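The string-vs-tokenized distinction can be illustrated in a few lines (a toy model of matching for illustration, not Lucene's actual implementation):

```java
import java.util.Arrays;
import java.util.List;

public class FieldMatchDemo {
    // Non-tokenized "string" field: a hit requires the entire stored value to match.
    static boolean stringFieldHit(String stored, String keyword) {
        return stored.equals(keyword);
    }

    // Tokenized text field: a hit requires only one lowercased token to match.
    static boolean textFieldHit(String stored, String keyword) {
        List<String> tokens = Arrays.asList(stored.toLowerCase().split("\\s+"));
        return tokens.contains(keyword);
    }

    public static void main(String[] args) {
        String stored = "Red Widget Pro";
        System.out.println(stringFieldHit(stored, "widget")); // false
        System.out.println(textFieldHit(stored, "widget"));   // true
    }
}
```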
Yes, using PatternTokenizerFactory. Here's an example field type: if you
define a "department" field with this type and do a copyField from "url" to
"department", it will end up with the department name alone. It handles
embedded punctuation (e.g., dot, dash, and underscore) and mixed case wo
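The same idea can be sketched in plain Java. The URL layout and the position of the department segment are assumptions for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DepartmentDemo {
    // Assume URLs like http://host/<department>/page.html: pull out the first
    // path segment, lowercase it, and normalize embedded punctuation to spaces.
    static String department(String url) {
        Matcher m = Pattern.compile("https?://[^/]+/([^/]+)/").matcher(url);
        if (!m.find()) return "";
        return m.group(1).toLowerCase().replaceAll("[._-]+", " ");
    }

    public static void main(String[] args) {
        System.out.println(department("http://example.com/Home-Garden/item.html")); // home garden
    }
}
```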
What would be a good place to read the custom Solr params I passed from the
client to Solr? I saw that all the params passed to Solr are available in
rb.req.
I have a business requirement to collapse or combine some properties
together based on some conditions. Currently I have a custom component
It is possible to use the "expungeDeletes" option in the commit; that could
solve your problem.
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22
Sadly, there is currently a bug with the TieredMergePolicy:
https://issues.apache.org/jira/browse/SOLR-2725
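Per that wiki page, the expungeDeletes commit message looks like:

```xml
<commit expungeDeletes="true"/>
```

This merges away segments that contain deleted documents without doing a full optimize.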
A couple of things to check.
1> Are you optimizing all the time? An optimization will merge all the
segments into a single segment, which will cause the whole
index to be replicated after each optimization.
Best
Erick
On Wed, Jun 6, 2012 at 1:33 AM, William Bell wrote:
> We are using S
Hmmm, it would be better to open a Solr JIRA and attach this as a patch.
Although we've had some folks provide a Git-based rather than an SVN-based
patch.
Anyone can open a JIRA, but you must create a sign-on to do that. It'd get more
attention that way.
Best
Erick
On Tue, Jun 5, 2012 at 2:19
Generally, you just have to bite the bullet and denormalize. Yes, it
really runs counter to your DB mindset.
But before jumping that way, how many denormalized records are we
talking here? 1M? 100M? 1B?
Solr has (4.x) some join capability, but it makes a lousy general-purpose
database.
Yo
Hi, I'm using the dismax query parser.
I would like to boost on a single term at query time, instead of on the
whole field.
I should probably use the standard query parser; however, I've also overridden
the dismax query parser to handle payload boosting on terms.
What I want to obtain is a double boo
I have a list of synonyms which is being expanded at query time. This yields
a lot of results (in the millions). My use case is name search.
I want to sort the results by Levenshtein distance. I know this can be done
with the strdist function. But sorting being inefficient and Solr function
adding to its w
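For reference, the Levenshtein distance that strdist computes is the standard dynamic-programming edit distance; a minimal Java version:

```java
public class Levenshtein {
    // Classic DP edit distance, kept to two rows of the matrix.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```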
Markus,
With "maxCollationTries=0", it is not going out and querying the collations to
see how many hits they each produce. So it doesn't know the # of hits. That
is why if you also specify "collateExtendedResults=true", all the hit counts
are zero. It would probably be better in this case i
Great! Thank you a lot, that solved all my problems.
Regards,
Nicolò
On 06 Jun 2012, at 14:55, Jack Krupansky wrote:
> This is a known (unfixed) bug. The workaround is to add a space between each
> left parenthesis and field name.
>
> See:
> https://issues.apache.org/jira/bro
Do single-word queries return hits?
Is this a multi-shard environment? Does the request list all the shards
needed to give hits for all the collations you expect? Maybe the queries are
being done locally and don't have hits for the collations locally.
-- Jack Krupansky
-Original Message-
Hi,
I have a problem with the ISO accent filter.
I have a field in my schema with this filter:
If I try this filter with the analysis tool in the Solr admin panel, it works.
For example:
sarà => sara.
But when I create indexes it doesn't work. In the index the field is "sarà"
with the accent. Why?
I use
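Outside Solr, the folding that filter performs can be reproduced with the JDK, which is a quick way to check the expected index-time output:

```java
import java.text.Normalizer;

public class AccentFold {
    // Decompose to NFD, then strip combining marks: "sarà" -> "sara".
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("sarà")); // sara
    }
}
```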
The section of the SolrJ wiki page on setting up the class path calls for
slf4j-jdk14-1.5.5.jar, which is supposed to be in a lib/ subdirectory.
I don't see this jar, or any like it with a different version, anywhere
in either the 3.5.0 or 3.6.0 distributions.
Is it really needed, or is this just sli
Erick, thanks for your reply, and sorry for the confusion in my last e-mail.
But it is hard to explain the situation without that bunch of code.
In my schema I have a field called textoboost that contains copies of a lot
of other fields. Doing the query in this field I got this:
+(((textoboost:aparta
I don't quite understand the problem. What is an example snippet that you
think is incorrect, and what do you think the snippet should be?
Also, try the /browse handler in the Solr example after following the Solr
tutorial to post data. Do a search that will highlight terms similar to what
you
This is a known (unfixed) bug. The workaround is to add a space between each
left parenthesis and field name.
See:
https://issues.apache.org/jira/browse/SOLR-3377
So,
q=(field2:ciao)
becomes:
q=( field2:ciao)
-- Jack Krupansky
-Original Message-
From: Nicolò Martini
Sent: Wednesd
It could be related to https://issues.apache.org/jira/browse/LUCENE-2975. At
least the exception comes from the same function.
"Caused by: java.io.IOException: Invalid vInt detected (too many bits)
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112)"
What hardware and Java vers
Hi all,
I'm having a problem using the Solr ExtendedDisMax query parser with a query
that contains fielded searches inside not-plain queries.
The case is the following.
If I send to SOLR an edismax request (defType=edismax) with parameters
1. qf=field1^10
2. q=field2:ciao
3. debugQuery=on (for
I did see a mention yesterday of a situation involving DIH and large XML
files where it was unusually slow, but if the big XML file was broken into
many smaller files it went really fast for the same amount of data. If that
is the case, you don't need to parse all of the XML, just detect the
bo
Hi Jack, hi Erik,
thanks for the tips! It's solr 3.6
I increased the batch to 1000 docs and the timeout to 10 s. Now it works.
And I will implement the retry around the commit-call.
Thx!
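The retry-around-the-commit idea mentioned above can be sketched generically. The Committer interface here is a stand-in for the real SolrJ commit call, not an actual SolrJ type:

```java
public class RetryDemo {
    interface Committer { void commit() throws Exception; }

    // Retry a commit up to maxAttempts times, sleeping between failed tries;
    // rethrow the last exception if every attempt fails.
    static void commitWithRetry(Committer c, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try { c.commit(); return; }
            catch (Exception e) { last = e; Thread.sleep(backoffMs); }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated commit that fails twice, then succeeds on the third attempt.
        commitWithRetry(() -> {
            if (++calls[0] < 3) throw new Exception("timeout");
        }, 3, 10);
        System.out.println("succeeded after " + calls[0] + " attempts");
    }
}
```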
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Mittwoch, 6. J
OK Jack. Will do.
On Wed, Jun 6, 2012 at 5:29 PM, Jack Krupansky wrote:
> Check your Solr log file to see whether errors or warnings are issued. If
> Nutch is sending bogus date values, they should produce warnings.
>
> At this stage there are two strong possibilities:
>
> 1. Nutch is simply not
Looks like the commit is taking longer than your set timeout.
On Jun 5, 2012, at 6:51 AM, wrote:
> Hi,
>
> I'm indexing documents in batches of 100 docs. Then commit.
>
> Sometimes I get this exception:
>
> org.apache.solr.client.solrj.SolrServerException:
> java.net.SocketTimeoutException:
I agree, that seems odd. We routinely index XML using either
HTMLStripCharFilter, or XmlCharFilter (see patch:
https://issues.apache.org/jira/browse/SOLR-2597), both of which parse
the XML, and we don't see such a huge speed difference from indexing
other field types. XmlCharFilter also allo
Check your Solr log file to see whether errors or warnings are issued. If
Nutch is sending bogus date values, they should produce warnings.
At this stage there are two strong possibilities:
1. Nutch is simply not sending that date field value at all.
2. Solr is rejecting the date field value be
As Erick says, you are probably hitting an occasional automatic background
merge which takes a bit longer. That is not an indication of a problem.
Increase your connection timeout. Check the log to see how long the merge or
"slow commit" takes. You have a timeout of 1000 which is 1 second. Make
Versions: Nutch: 1.4 and Solr: 3.4
My schema file contains
But I do not know whether this feed plugin is working or not as I am new to
nutch and solr.
Here is my query
http://localhost:8983/solr/select/?q=title:'.$v.'
content:'.$v.'&sort=publishedDat
Read CHANGES.txt carefully, especially the section entitled "Upgrading from
Solr 3.5". For example,
"* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of
solrconfig.xml are deprecated
and replaced with a new <indexConfig> section. Read more in SOLR-1052
below."
If you simply copied your schema/config directly, uncha
See the reply on the other email thread you started.
-- Jack Krupansky
-Original Message-
From: Shameema Umer
Sent: Wednesday, June 06, 2012 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: How to find the age of a page
Hi Syed Abdul,
I am sorry to ask this basic question as I
My misunderstanding. I thought you were "publishing" to SOLR and wanted the
date when that occurred (indexing).
-- Jack Krupansky
-Original Message-
From: Shameema Umer
Sent: Wednesday, June 06, 2012 4:45 AM
To: solr-user@lucene.apache.org
Subject: Re: How to find the age of a page
H
Step 1: Verify that "publishedDate" is in fact the field name that Nutch
uses for "published date".
Step 2: Make sure that Nutch is passing the date in the format
YYYY-MM-DDTHH:MM:SSZ. Whether you need a "Nutch plugin" to do that is not a
question for this Solr mailing list. My (very limited) u
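A quick way to produce that format from Java is the JDK's ISO instant formatter, whose UTC output matches what Solr date fields expect:

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

public class SolrDate {
    // Solr date fields expect YYYY-MM-DDTHH:MM:SSZ in UTC;
    // ISO_INSTANT produces exactly that for whole-second instants.
    static String format(Instant t) {
        return DateTimeFormatter.ISO_INSTANT.format(t);
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.EPOCH)); // 1970-01-01T00:00:00Z
    }
}
```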
Make sure your port is 8983 or 8080.
On Wed, Jun 6, 2012 at 4:27 PM, Erick Erickson wrote:
> That implies one of two things:
> 1> you changed solr.xml. I'd go back to the original and re-edit
> anything you've changed
> 2> you somehow got a corrupted download. Try blowing your installation
> away
You're probably hitting a background merge and the request is timing
out even though the commit succeeds. Try querying for the data in
the last packet to test this.
And you don't say what version of Solr you're using.
One test you can do is increase the number of documents before
a commit. If mer
Sorry, but your post is really hard to read with all the data inline.
Try running with &debugQuery=on and looking at the parsed query, I suspect
your field lists aren't the same even though you think they are.
Perhaps a typo somewhere?
Best
Erick
On Mon, Jun 4, 2012 at 1:26 PM, André Maldonado
That implies one of two things:
1> you changed solr.xml. I'd go back to the original and re-edit
anything you've changed
2> you somehow got a corrupted download. Try blowing your installation
away and getting a new copy
Because it works perfectly for me.
Best
Erick
On Wed, Jun 6, 2012 at 4:14 AM
Hi Syed Abdul,
I am sorry to ask this basic question as I am new to Nutch and Solr (even new
to Java applications). Can you tell me how to add tstamp to published date
after re-indexing? Is an update query enough?
Also, I am not able to get the field *publishedDate* in my query results to
check whe
Hi,
We've had some issues with a bad zero-hits collation being returned for a
two-word query where one word was only one edit away from the required collation.
With spellcheck.maxCollations set to a reasonable number, we saw the various
suggestions without the required collation. We decreased
thres
Whenever you reindex, add the current timestamp; that will be the publish
date, and from there you can calculate the age.
Thanks and Regards,
S SYED ABDUL KATHER
On Wed, Jun 6, 2012 at 2:16 PM, Shameema Umer [via Lucene] <
ml-node+s472066n3987930...@n3.nabble.com> wrote:
> Hi abdul a
We are using Solr 4.0 (svn build, 30th May 2012) with SolrCloud. While
querying, we use field collapsing with ngroups set to true. However, there
is a difference between the number of results returned and the "ngroups"
value returned.
Ex:
http://localhost:8983/solr/select?q=messagebody:monit%20AND%20usergr
Hi abdul and Jack,
i got the tstamp working but I really need to know the published date of
each page.
On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky wrote:
> If you uncomment the "timestamp" field in the Solr example, Solr will
> automatically initialize it for each new document to be the tim
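The example-schema timestamp field being referred to looks like this (as shipped in the Solr example schema.xml):

```xml
<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```

The default="NOW" is what makes Solr fill it in automatically at index time.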
Hi :)
Looks like you forgot to paste your schema.xml and the error in your
e-mail :o
Gary
On 06/06/2012 10:14, Spadez wrote:
Hi,
I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
page when I try to access Solr:
http://176.58.103.78:8080/solr/
It says errors t
Hi,
I installed a fresh copy of Solr 3.6.0 on my server, but I get the following
page when I try to access Solr:
http://176.58.103.78:8080/solr/
It says there are errors to do with my solr.xml. This is my solr.xml:
I really can't figure out how I am meant to fix this, so if anyone is able to
give some in