I ran into this today, and it took me longer than it should have to figure
out the problem, so I wanted to share my experience to save someone else
some time. Neither a web search nor a search through the mail archives
shed any light on it.
If you run SolrJ 4.0.0 BETA connecting to Sol
--- Begin Message ---
Apache Solr 1.4 has been released and is now available for public
download!
http://www.apache.org/dyn/closer.cgi/lucene/solr/
Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful ful
Apologies. Meant to forward the message to a corporate internal list.
I blame my e-mail address auto-complete. ;-)
Sean Timm wrote:
Subject: [ANN] Solr 1.4.0 Released
From: Grant Ingersoll
Date: Tue, 10 Nov 2009 11
If you have a number of long queries running, your system can become
CPU-bound, resulting in low throughput and high response times. There are
many ways you can construct a query that will cause it to take a long
time to process, but the SOLR-502 patch can only address the ones where
the work i
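A minimal sketch of the idea behind this kind of time limit, assuming nothing about the actual SOLR-502 code (which builds on Lucene's TimeLimitedCollector): the hit-collection loop checks a deadline and, once it has passed, stops and returns the partial results gathered so far. The class and method names are hypothetical, and the clock is passed in so the behavior can be tested. Only work done inside the loop is bounded; anything before or after it still runs to completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongSupplier;

/** Hypothetical illustration of deadline-bounded hit collection (not Solr or Lucene code). */
class TimeLimitedCollection {

    /** The hits gathered and whether collection stopped early. */
    static final class Result {
        final List<Integer> hits;
        final boolean partial;

        Result(List<Integer> hits, boolean partial) {
            this.hits = hits;
            this.partial = partial;
        }
    }

    /**
     * Visits doc ids 0..maxDoc-1, keeping those that match, until the time
     * budget runs out. Only this loop is timed.
     */
    static Result collect(int maxDoc, IntPredicate matches, long budgetMillis,
                          LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (clockMillis.getAsLong() > deadline) {
                return new Result(hits, true); // out of time: return what we have
            }
            if (matches.test(doc)) {
                hits.add(doc);
            }
        }
        return new Result(hits, false);
    }

    public static void main(String[] args) {
        Result r = collect(1_000_000, doc -> doc % 7 == 0, 50, System::currentTimeMillis);
        System.out.println(r.hits.size() + " hits, partial=" + r.partial);
    }
}
```

Passing System::currentTimeMillis gives real wall-clock behavior; a caller can check the partial flag to tell users that the results may be incomplete.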
This should be part of the lucene-core-2.4-dev.jar, which is in
lucene/solr/trunk/lib
% unzip -l lucene-core-2.4-dev.jar | grep TimeLimitedCollector
   251  06-19-08 08:57   org/apache/lucene/search/TimeLimitedCollector$1.class
  1564  06-19-08 08:57   org/apache/lucene/search/TimeLimitedCo
So how about a runoff between #2 (the straight-line family member with the
most votes) and #3 (normal font)?
-Sean
Yonik Seeley wrote:
OK, so looking at family totals:
33 - the curvy family (9,10,11)
36 - #3 (normal font)
64 - straight line family
Again 36 and 64 aren't directly comparable since
Patrick is on vacation this week. You can get the authoritative answer
when he is back, but I believe localsolr is on an older trunk build and
has not yet made the mods needed to work with the latest UpdateProcessor
changes. With the flurry of activity on the Solr trunk in the past
month, I t
Length normalization in the Similarity class will generally favor
shorter fields. For example, with DefaultSimilarity the length norm
for a two-term field is 0.625, and for a three-term field it is 0.5.
The score is multiplied by the norm.
I say "generally will favor" because the length norm
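The 0.625 and 0.5 figures are the values after the norm is stored. DefaultSimilarity computes the length norm as 1/sqrt(numTerms) (about 0.707 for two terms and 0.577 for three), and storing it in a single byte keeps only three significant bits, so nearby lengths collapse to the same value. The sketch below imitates that truncation with plain Java math; it is an approximation of Lucene's byte encoding that ignores its range limits, not Lucene code.

```java
/** Approximates Lucene 2.x length norms after single-byte encoding (illustration only). */
class NormQuantization {

    /** DefaultSimilarity-style length norm: 1 / sqrt(numTerms). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /**
     * Truncates a positive value to 3 significant binary digits (the implied
     * leading 1 plus 2 mantissa bits), similar to the norm byte encoding.
     */
    static double quantize(double x) {
        if (x <= 0) {
            return 0.0;
        }
        int exp = Math.getExponent(x);                   // x = m * 2^exp with 1 <= m < 2
        double mantissa = x / Math.scalb(1.0, exp);
        double truncated = Math.floor(mantissa * 4) / 4; // keeps 1.00, 1.25, 1.50 or 1.75
        return Math.scalb(truncated, exp);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d terms: raw=%.3f stored=%s%n",
                    n, lengthNorm(n), quantize(lengthNorm(n)));
        }
    }
}
```

With this truncation a three-term field and a four-term field both end up at 0.5, so the stored norm alone cannot tell them apart.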
By using our custom length norm function, Doc2's score will be multiplied
by 1.0f and Doc1's by 0.875f, resulting in the desired behavior.
Doc1: Chevrolet Tahoe Hybrid 2008
Doc2: Chevrolet Tahoe 2008
-Sean
Mark Miller wrote:
Sean Timm wrote:
To solve this, we wrote our own Similarity clas
https://issues.apache.org/jira/browse/LUCENE-1360
Simon Hu wrote:
I am definitely interested in trying your Similarity class. Can you please
post the patch in jira?
thanks
-Simon
Sean Timm wrote:
In the example below, Doc1 and Doc2 will both have the same score for
the query
Chris--
Sorry, your e-mail got lost in the noise. You're right, there does
appear to be a problem. I can reproduce this by setting the "root"
level to "OFF" and then setting it back to "INFO". I'll take a look
into it. Have you opened a JIRA issue for this?
-Sean
Chris Hostetter wrote:
I didn't see a bug on this issue, so I opened SOLR-774 with a patch to
fix this.
-Sean
soll wrote:
See also https://issues.apache.org/jira/browse/SOLR-502 (timeout
searches)
and https://issues.apache.org/jira/browse/LUCENE-997
This is committed on trunk and will be in 1.3. Don't ask me how it
works, b/c I haven't tried it yet, but maybe Sean Timm or someone can
help o
From the XML 1.0 spec: "Legal characters are tab, carriage return,
line feed, and the legal graphic characters of Unicode and ISO/IEC
10646." So, \005 is not a legal XML character. It appears the old StAX
implementation was more lenient than it should have been and Woodstox is
doing the corr
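If the bad characters cannot be cleaned up at the source, one option is to drop them before the document is sent to Solr. The hypothetical helper below checks each code point against the XML 1.0 Char production (tab, line feed, carriage return, and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF) and removes everything else. Whether silently dropping characters is acceptable depends on your data.

```java
/** Hypothetical helper: checks characters against the XML 1.0 Char production. */
class XmlChars {

    /** Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s with every code point that XML 1.0 forbids removed. */
    static String stripIllegal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(XmlChars::isLegalXml10).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "field value\u0005 with a control character";
        System.out.println(stripIllegal(raw));
    }
}
```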
Add echoParams=all to your URL and look for the "cat" field in the
passed parameters, specifically in pf and qf. Defaults for these can be
set in the solrconfig.xml file.
-Sean
Jon Drukman wrote:
whenever i try to use qt=dismax i get the following error:
Sep 22, 2008 11:50:48 AM org.apach
I heard a story that the 'r' in Solr back in the CNet days stood for
Resin (the servlet container). True? Clearly the "w/ replication"
expansion makes more sense now, since Tomcat and Jetty deployments are
probably both more common.
Just curious,
Sean
Chris Hostetter wrote:
: Can we spell out the au
Is it possible to do date math in a FunctionQuery? This doesn't work,
but I'm looking for something like:
bf=recip((NOW-updated),1,200,10) when using DisMax to get the elapsed
time between NOW and when the document was updated (where updated is a
Date field).
I know one can do rord(updated)
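For reference, Solr's recip(x,m,a,b) computes a/(m*x+b), so with m=1, a=200 and b=10 the boost is 20 for a brand-new document and has halved by the time x reaches 10. The plain-Java evaluation below assumes x is the document's age in days, which the question leaves open; it only shows the shape of the curve, not how to obtain the elapsed time inside Solr.

```java
/** Evaluates the recip(x,m,a,b) = a / (m*x + b) curve outside Solr. */
class RecipBoost {

    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        // Assumption: x is the document age in days; the thread does not fix the unit.
        double[] agesInDays = {0, 10, 90, 365};
        for (double age : agesInDays) {
            System.out.printf("age=%.0f days -> boost=%.3f%n", age, recip(age, 1, 200, 10));
        }
    }
}
```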
http://issues.apache.org/jira/browse/SOLR-527 (An XML commit only
request handler) is pertinent to this discussion as well.
-Sean
Ian Holsman wrote:
There was a patch by Sean Timm you should investigate as well.
It limited a query so it would take a maximum of X seconds to execute,
and
I believe the Solr replication scripts require POSTing a commit to read
in the new index--so at least limited POST capability is required in
most scenarios.
-Sean
Lance Norskog wrote:
About that "read-only" switch for Solr: one of the basic HTTP design
guidelines is that GET should only retur
https://issues.apache.org/jira/secure/attachment/12394165/solr-logo.png
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impressio
Matthew Runo wrote:
Which papers did you see that actually talked about using clicks? I
don't see those, beyond "Addressing Malicious Noise in Clickthrough
Data" by Filip Radlinski and also his "Query Chains: Learning to Rank
from Implicit Feedback" - but neither is really on topic.
Here are t
Jana, Kumar Raja wrote:
2. If I set SolrQuery.setTimeAllowed(2000) Will this kill query
processing after 2 secs? (I know this question sounds silly but I just
want a confirmation from the experts ☺)
That is the idea, but only some of the code is within the timer. So,
there are cases where
From: Sean Timm [mailto:tim...@aol.com]
Sent: Wednesday, February 18, 2009 1:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Query regarding setTimeAllowed(Integer) and setRows(Integer)
We use Heritrix too. We tried Nutch first, but Nutch was not finding all
of the documents that it was supposed to. When Nutch and Heritrix were
both set to crawl our own site to a depth of three, Nutch missed some
pages that were linked directly from the seed. We ended up with 10%-20%
fewer pages in
> Tony
>
> On Fri, Mar 6, 2009 at 8:08 AM, Sean Timm wrote:
This is the funniest e-mail I've had all day. SOLer is the typical
pronunciation, but I've heard solAR as well. It's the description of
"pirate-like" that made me chuckle.
-Sean
Charles Federspiel wrote:
Hi,
My company is evaluating different open-source indexing and search software
and we
It looks like the dataimporter.functions.escapeSql(String) function
escapes quotes but fails to escape '\' characters, which are problematic,
especially when the field value ends in a '\'. Also, on failure, I get an
alarming notice of a possible resource leak. I couldn't find JIRA
issues for eit
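A sketch of an escaper that handles both cases, assuming the target database treats backslash as an escape character inside string literals (MySQL does by default; standard SQL does not). The name mirrors the DataImportHandler function, but this is a hypothetical stand-alone helper, not DIH code. The two replacements are independent, so their order does not matter.

```java
/** Hypothetical SQL literal escaper that handles backslashes as well as quotes. */
class SqlEscape {

    /**
     * Escapes a value for a single-quoted SQL literal on a database that treats
     * backslash as an escape character (an assumption, e.g. MySQL's default mode).
     */
    static String escapeSql(String value) {
        return value
                .replace("\\", "\\\\") // a trailing \ would otherwise escape the closing quote
                .replace("'", "''");   // standard SQL quote doubling
    }

    public static void main(String[] args) {
        String path = "C:\\data\\";
        System.out.println("SELECT * FROM files WHERE path = '" + escapeSql(path) + "'");
    }
}
```

When the SQL is assembled from templates, as with DIH queries, escaping is the tool at hand; where the code controls the statement itself, parameter binding with a JDBC PreparedStatement avoids the problem altogether.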
Chris Hostetter wrote:
this can be accomplished by indexing a numeric field containing the
"length" of the field as a number, and then doing a secondary sort on it.
the fieldNorm typically takes care of this sort of thing for you, but is
more of a generalized concept, and doesn't give you exac
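Sketched outside Solr, the suggestion amounts to a two-key sort: score first, then a numeric field holding the field's length, which is what sort=score desc,length asc would request. The field name length and the Hit class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch: break score ties using a separately indexed numeric length field. */
class LengthTieBreak {

    static final class Hit {
        final String id;
        final float score;
        final int length; // hypothetical field holding the number of terms in the title

        Hit(String id, float score, int length) {
            this.id = id;
            this.score = score;
            this.length = length;
        }
    }

    /** Higher score first; among equal scores, the shorter field first. */
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.length);

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(Arrays.asList(
                new Hit("Doc1", 1.0f, 4),   // Chevrolet Tahoe Hybrid 2008
                new Hit("Doc2", 1.0f, 3))); // Chevrolet Tahoe 2008
        hits.sort(ORDER);
        hits.forEach(h -> System.out.println(h.id));
    }
}
```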
Java provides one. You probably want to use UTF-8 as the encoding scheme.
http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html
Note that you will also want to strip or escape characters that are
meaningful in the Solr/Lucene query syntax.
http://lucene.apache.org/java/2_4_0/queryparsersyn
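A minimal sketch of both steps: backslash-escape the characters listed as special in the Lucene 2.4 query syntax documentation (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \, with & and | escaped individually), then URL-encode the result as UTF-8 with URLEncoder. The helper names are hypothetical. Escaping also disables any syntax the user meant to type, so it suits free-text input.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helpers: escape query-syntax characters, then URL-encode as UTF-8. */
class QueryEncoding {

    // Characters with special meaning in the Lucene 2.4 query parser syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Backslash-escapes each special character so it is searched literally. */
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** URL-encodes a parameter value using UTF-8. */
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String userInput = "ljusbl\u00e5 (AC/DC)";
        System.out.println("q=" + urlEncode(escapeQueryChars(userInput)));
    }
}
```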
It should probably be configurable: (1) return nothing if there is no
match, (2) substitute an alternate field, or (3) return the first
sentence or the first N tokens.
-Sean
Yonik Seeley wrote on 8/9/2007, 5:50 PM:
> On 8/9/07, Benjamin Higgins <[EMAIL PROTECTED]> wrote:
> > Thanks Mike. I didn't thin
This may be your problem. The docs below are for the HTTP connector;
similar configuration can be applied to the AJP and other connectors.
See
http://tomcat.apache.org/tomcat-6.0-doc/config/http.html
URIEncoding
This specifies the character encoding used to decode the URI bytes,
after %xx decodin
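When URIEncoding is not set, Tomcat 6 decodes URI bytes as ISO-8859-1. URLDecoder can imitate what that does to UTF-8 percent-encoded bytes such as %C3%A5 (å): decoded as ISO-8859-1 they become two characters, while UTF-8 gives back the original one. This only mimics the container's decoding step; the fix itself is setting URIEncoding="UTF-8" on the connector.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Shows why the container's URI decoding charset matters for %xx-encoded UTF-8. */
class UriDecodingDemo {

    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String param = "ljusbl%C3%A5";
        // ISO-8859-1 maps byte C3 to U+00C3 and byte A5 to U+00A5: two wrong characters.
        System.out.println(decode(param, "ISO-8859-1"));
        // UTF-8 reads C3 A5 as one character, U+00E5.
        System.out.println(decode(param, "UTF-8"));
    }
}
```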
Indexes cannot be directly compared unless they have similar collection
statistics. That is, the same terms occur with the same frequency
across all indexes, and the average document lengths are about the same
(though the default similarity in Lucene may not care about average
document length--I
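To see why, consider idf alone. Lucene's DefaultSimilarity computes idf as 1 + ln(numDocs / (docFreq + 1)), so the same term gets very different weights in an index where it is rare and one where it is common, and scores from the two indexes are not on the same scale. The document counts below are made up for illustration.

```java
/** Same term, different collection statistics: DefaultSimilarity-style idf differs. */
class IdfComparison {

    /** DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical statistics for the same term in two separate indexes.
        System.out.printf("index A: %.3f%n", idf(9, 1000));   // term is rare here
        System.out.printf("index B: %.3f%n", idf(499, 1000)); // term is common here
    }
}
```

With these figures the term weighs about 5.6 in index A but about 1.7 in index B.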
Similarly, if you know that you are dealing with domain names or IP
addresses (or other text with discrete parts), you can reverse the order
of the parts rather than the characters, which keeps the value more
human-readable: www.example.com becomes com.example.www. Your query would
then be sent as com.example.*
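A minimal sketch of the part reversal, using a hypothetical reverseParts helper; the same function produces both the indexed value and the prefix used at query time.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Reverses dot-separated parts so a suffix match becomes a prefix query. */
class ReverseParts {

    static String reverseParts(String host) {
        List<String> parts = Arrays.asList(host.split("\\."));
        Collections.reverse(parts); // fixed-size list, but in-place reversal is allowed
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(reverseParts("www.example.com"));    // value to index
        System.out.println(reverseParts("example.com") + ".*"); // prefix query
    }
}
```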
-Sean
Ian
It seems the best thing to do would be a case-insensitive spellcheck
that returns the suggestion in the case the user originally typed, or
at least to make this an option. Users are often
lazy about capitalization, especially with search where they've learned
from web search e
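One way to implement the case-preserving option, assuming the spellchecker works on lowercased terms and returns a lowercase suggestion: copy the user's capitalization pattern (all caps, leading capital, or lowercase) onto the suggestion. matchCase is a hypothetical helper; mixed-case input such as "iPod" is passed through unchanged.

```java
import java.util.Locale;

/** Hypothetical helper: applies the user's capitalization to a lowercase suggestion. */
class CasePreservingSuggestion {

    static String matchCase(String original, String suggestion) {
        if (original.isEmpty() || suggestion.isEmpty()) {
            return suggestion;
        }
        boolean allUpper = original.equals(original.toUpperCase(Locale.ROOT))
                && !original.equals(original.toLowerCase(Locale.ROOT));
        if (allUpper) {
            return suggestion.toUpperCase(Locale.ROOT);        // CHEVROLT -> CHEVROLET
        }
        if (Character.isUpperCase(original.charAt(0))) {
            return Character.toUpperCase(suggestion.charAt(0)) // Chevrolt -> Chevrolet
                    + suggestion.substring(1);
        }
        return suggestion;                                     // chevrolt -> chevrolet
    }

    public static void main(String[] args) {
        System.out.println(matchCase("Chevrolt", "chevrolet"));
    }
}
```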
That is one of my peeves with the Solr Javadocs. Few of the @deprecated
tags (if any) say what you should use instead. In this particular
case, the answer is very simple. The class merely moved to a new package:
from
http://lucene.apache.org/solr/api/org/apache/solr/request/DisMaxReques
Take a look at https://issues.apache.org/jira/browse/SOLR-236 Field
Collapsing.
-Sean
Head wrote:
I would like to be able to tell SOLR to dedup the results based on a certain
set of fields. For example, I like to return only one instance of the set
of documents that have the same 'name' and
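Field collapsing (SOLR-236) does this inside Solr. For comparison, a naive client-side version keeps only the first, highest-ranked document for each combination of the chosen fields. This is a hypothetical sketch that only dedups the page of results it is given, so pages can come back shorter than the requested rows, which is one reason server-side collapsing is preferable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Client-side sketch: keep only the first (best-ranked) result per field combination. */
class DedupByFields {

    static List<Map<String, String>> dedup(List<Map<String, String>> rankedDocs,
                                           List<String> keyFields) {
        Map<List<String>, Map<String, String>> firstByKey = new LinkedHashMap<>();
        for (Map<String, String> doc : rankedDocs) {
            List<String> key = new ArrayList<>();
            for (String field : keyFields) {
                key.add(doc.get(field)); // a missing field contributes null
            }
            firstByKey.putIfAbsent(key, doc); // earlier (better-ranked) docs win
        }
        return new ArrayList<>(firstByKey.values());
    }

    /** Builds a document from alternating field names and values. */
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = Arrays.asList(
                doc("id", "1", "name", "Tahoe", "year", "2008"),
                doc("id", "2", "name", "Tahoe", "year", "2008"),
                doc("id", "3", "name", "Tahoe", "year", "2007"));
        for (Map<String, String> d : dedup(results, Arrays.asList("name", "year"))) {
            System.out.println(d.get("id"));
        }
    }
}
```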
Music is another domain where this is a real problem. E.g., "The The",
"The Who", not to mention the song and album names.
-Sean
Walter Underwood wrote:
We do a similar thing with a no stopword, no stemming field.
There are a surprising number of movie titles that are entirely
stopwords. "Be
Send the URL with the å character URL-encoded as %C3%A5. That is the
UTF-8 URL encoding.
http://myserver:8080/solrproducts/select/?q=all_SV:ljusbl%C3%A5+status:online&fl=id%2Cartno%2Ctitle_SV%2CtitleSort_SV%2Cdescription_SV%2C&sort=titleSort_SV+asc,id+asc&start=0&q.op=AND&rows=25
-Sean
Danie
Jonathan Ariel wrote:
How do you partition the data into a static set and a dynamic set, and then
combine them at query time? Do you have a link to read about that?
One way would be distributed search (SOLR-303), but distributed idf is
not part of the current patch anymore, so you may have
Noble--
You should probably include SOLR-505 in your DataImportHandler patch.
-Sean
Noble Paul നോബിള് नोब्ळ् wrote:
It is caused by the new caching feature in Solr. The caching is done
at the browser level. Solr just sends the appropriate headers. We had
raised an issue to disable that.
BTW Th
It seems that the DisMaxRequestHandler tries hard to handle any query
that the user can throw at it.
From http://wiki.apache.org/solr/DisMaxRequestHandler:
"Quotes can be used to group phrases, and +/- can be used to denote
mandatory and optional clauses ... but all other Lucene query parser
s
I can take a stab at this. I need to see why SOLR-502 isn't working for
Otis first though.
-Sean
Bram de Jong wrote:
On Tue, Jun 3, 2008 at 1:26 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
+1. Fault tolerance good. ParseExceptions bad.
Can you open a JIRA issue for it? If you feel
+1
Shridhar Venkatraman wrote on 4/7/2007, 12:13 AM:
B is a bit cartoony (someone said
that earlier)..mainly because of the
letters, yet fresh.
A appears dated (an 80's look).
An alternate (C?) that retains the sunflare from B but changes the
letters to be more staid may add the requir
It may not be easy or even possible without major changes, but having
global collection statistics would allow scores to be compared across
searchers. To do this, the master indexes would need to be able to
communicate with each other.
Another approach to merging across searchers is describe
Yes, for good (hopefully)
or bad.
-Sean
Shridhar Venkatraman wrote on 5/7/2007, 12:37 AM:
Interesting..
Surrogates can also bring the searcher's subjectivity (opinion and
context) into it by the learning process ?
shridhar
Sean Timm wrote:
It may not be easy or even pos