Hi,
I would like to return results sorted by score (desc), but I would also like to
insert random results into some predefined slots (let's say 10, 14, and 18).
The reason I want to do that is that I boost click-through-rate-based features
significantly, and I want to give a chance to documents which don't ha
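For anyone reading the archive: the slot placement itself is simple to do on the client or in a custom component; a minimal sketch of the idea (function and variable names are mine, not a Solr API):

```python
import random

def insert_random_slots(ranked_docs, candidate_pool, slots=(10, 14, 18), seed=None):
    """Place randomly chosen candidates into fixed 1-based result positions,
    shifting the score-sorted docs down to make room."""
    rng = random.Random(seed)
    results = list(ranked_docs)
    # One random pick per slot, skipping docs already in the ranked list.
    picks = rng.sample([d for d in candidate_pool if d not in results], len(slots))
    # Inserting in ascending slot order keeps each pick at its intended position.
    for slot, doc in zip(sorted(slots), picks):
        results.insert(slot - 1, doc)
    return results

ranked = [f"doc{i}" for i in range(20)]   # score-sorted results
pool = [f"rand{i}" for i in range(5)]     # low-CTR docs to promote
page = insert_random_slots(ranked, pool, seed=42)
```

In a distributed setup, the same splice could happen wherever the merged result list is assembled.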
Thanks, I'll check the issues.
-Original message-
> From:Jack Krupansky
> Sent: Mon 04-Jun-2012 17:19
> To: solr-user@lucene.apache.org
> Subject: Re: Add HTTP-header from ResponseWriter
>
> There is some commented-out code in SolrDispatchFilter.doFilter:
>
> // add info to http heade
Another option I could think of is to write a custom component which implements
handleResponses, where I can pick random documents from across shards and
insert them into the ResponseBuilder's resultIds. I would place this
component at the end (or after QueryComponent). Will that work? Is there a
bet
Hi,
On trunk the maxScore response attribute is always returned even if score is
not part of fl. Is this intentional?
Thanks,
The reason multi word synonyms work better if you use LUCENE_33 is because
then Solr uses the SlowSynonymFilter instead of SynonymFilterFactory
(FSTSynonymFilterFactory).
But I don't know if the difference between them is a bug or not. Maybe
someone has more insight?
Bernd Fehling-2 wrote:
Hello,
I made progress on my problem.
The index and fieldtype are good:
I had forgotten to copyField "body_strip_html" onto text, the defaultSearchField.
A newbie's mistake.
Now, Solr returns all the XML files I want.
But, in PHP, the text isn't displayed for 2 XML files (with the term "castor"
snipped by html or xml t
Do you have test cases?
What are you sending to your SynonymFilterFactory?
What are you expecting it should return?
What is it returning when setting to Version.LUCENE_33?
What is it returning when setting to Version.LUCENE_36?
On 05.06.2012 10:56, O. Klein wrote:
> The reason multi word s
Hi,
We use SolrCloud in production, and we are facing some issues with queries
that take a very long time, especially deep-paging queries; these queries keep our
servers very busy. I am looking for a way to stop (kill) queries taking
longer than a specific amount of time (say 5 seconds). I checked timeAllo
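(The truncated word is presumably the timeAllowed parameter, which caps the time Solr spends collecting hits and flags the response as partial. A toy model of that behavior in Python, not actual Solr code; as noted downthread, this only bounds part of what a slow query does:)

```python
import time

def collect_with_budget(doc_ids, matches, budget_ms):
    """Collect matching docs until a time budget runs out -- roughly what
    timeAllowed does; 'partial' plays the role of partialResults=true."""
    deadline = time.monotonic() + budget_ms / 1000.0
    hits, partial = [], False
    for doc in doc_ids:
        if time.monotonic() > deadline:
            partial = True   # budget exhausted: stop and report partial results
            break
        if matches(doc):
            hits.append(doc)
    return hits, partial
```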
Maybe look into your solrconfig.xml file to see whether fl is set by default on
your request handler.
--
View this message in context:
http://lucene.472066.n3.nabble.com/maxScore-always-returned-tp3987727p3987733.html
Sent from the Solr - User mailing list archive at Nabble.com.
Older versions of Solr didn't really sort correctly on multivalued fields; they
just didn't complain.
Hmmm. Off the top of my head, you can:
1> You don't say what the documents to be indexed are. Are they Solr-style
documents on disk or do you process them with, say, a SolrJ program?
Hi,
I'm indexing documents in batches of 100 docs. Then commit.
Sometimes I get this exception:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketTimeoutException: Read timed out
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java
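A common mitigation is to commit less frequently than every batch (or rely on commitWithin) and to raise the client's socket timeout. The batching itself, sketched with plain Python lists rather than SolrJ:

```python
def batches(docs, size=100):
    """Yield successive fixed-size batches of documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# for batch in batches(all_docs): server.add(batch)
# ...then commit once at the end, or let commitWithin handle it,
# so a long commit doesn't trip the read timeout on every batch.
```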
Hi,
I'm running a cluster of Solr servers for an index split up into a lot of shards.
Each shard is replicated. The current setup is one Tomcat instance per shard, even
if the Tomcats are running on the same machine.
My question is this:
Would it be more advisable to run one Tomcat per machine with
Hi.
We set fl in the request handler's defaults, without score.
thanks
-Original message-
> From:darul
> Sent: Tue 05-Jun-2012 12:05
> To: solr-user@lucene.apache.org
> Subject: Re: maxScore always returned
>
> maybe look into your solrconfig.xml file whether fl not set by default on
>
Hi,
I'm adding the numFound to the HTTP response header in a custom
SolrDispatchFilter in the writeResponse() method, similar to the commented code
in doFilter(). This works just fine but not for distributed requests. I'm
trying to read "hits" from the SolrQueryResponse but it is not there for
There isn't a working solution for killing long-running queries.
On Tue, Jun 5, 2012 at 1:34 AM, arin_g wrote:
> Hi,
> We use SolrCloud in production, and we are facing some issues with queries
> that take a very long time, especially deep-paging queries; these queries keep our
> servers very busy. I a
There's an open issue for improving deep paging performance:
https://issues.apache.org/jira/browse/SOLR-1726
-Original message-
> From:arin_g
> Sent: Tue 05-Jun-2012 12:03
> To: solr-user@lucene.apache.org
> Subject: Search timeout for Solrcloud
>
> Hi,
> We use solrcloud in producti
Is it possible to filter out numbers and disclaimers (repeated content)
while indexing to Solr?
This is all surplus information, and I do not want to index it.
I have tried using the boilerpipe algorithm as well to remove surplus
information from web pages, such as navigational elements, templates, and
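Besides boilerpipe, a blunt pre-indexing cleanup step sometimes helps; a sketch (the disclaimer regex is invented for illustration and would need tuning to your actual boilerplate):

```python
import re

# Hypothetical pattern for a trailing legal disclaimer; adjust to your data.
DISCLAIMER = re.compile(r"(?is)this e-?mail .*confidential.*$")

def strip_noise(text):
    """Drop a trailing disclaimer and bare numbers before sending text to Solr."""
    text = DISCLAIMER.sub("", text)
    text = re.sub(r"\b\d+\b", "", text)   # remove standalone numbers
    return re.sub(r"\s+", " ", text).strip()
```

In Solr itself this kind of scrubbing would typically live in an UpdateRequestProcessor or in the crawler, before analysis.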
Say I have various categories of 'tags'. I want a keyword search to search
through my index of articles. So I search over:
1) the title.
2) the body
3) about 10 of these tag-categories. Each tag category is multivalued with a
few words per value.
Without considering the effect on 'relevance', and
I resolved my problem:
I had to specify the fields to return with my query.
Thanks A LOT for your help!
--
View this message in context:
http://lucene.472066.n3.nabble.com/Strip-html-tp3987051p398.html
Sent from the Solr - User mailing list archive at Nabble.com.
By saying "dirty data" you imply that only one of the values is "good" or
"clean" and that the others can be safely discarded/ignored, as opposed to
true multi-valued data where each value is there for good reason and needs
to be preserved. In any case, how do you know/decide which value should
Hi Gora,
> Your configuration files look fine. It would seem that something
> is going wrong with the SELECT in Oracle, or with the JDBC
> driver used to access Oracle. Could you try:
> * Manually doing the SELECT for the entity, and sub-entity
> to ensure that things are working.
>
The SELECTs
Take a look at "query elevation". It may do exactly what you want, but at a
minimum, it would show you how this kind of thing can be done.
See:
http://wiki.apache.org/solr/QueryElevationComponent
-- Jack Krupansky
-Original Message-
From: srinir
Sent: Tuesday, June 05, 2012 3:08 AM
T
Hello SOLR users,
is there someone who wrote plugins for HypericHQ to monitor the very many
metrics SOLR exposes through JMX?
I am kind of a newbie to JMX, and the tutorials for Hyperic aren't simple enough
for my taste... so I'd be helped if someone has done it already.
thanks in advance
Paul
I successfully use Oracle with DIH, although none of my imports have
sub-entities. (Slight difference: I'm on ojdbc5.jar w/10g...) It may be you
have a driver that doesn't play well with DIH in some cases. You might want to
try these possible workarounds:
- rename the columns in SELECT with "
Hello Grant,
I need to frame a query that is a combination of two query parts and I use a
'function' query to prepare the same. Something like:
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
where $uq and $cq are two queries.
Now, I want a search result returned only if I
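For the archives, my understanding of the query() function: it yields the wrapped query's score for documents that match it, and the supplied default otherwise. So the 0.0 default on $uq zeroes out documents that don't match $uq, while the 0.1 default on $cq merely dampens them (note that a zero score does not by itself exclude a document from function-query results). A toy model of the arithmetic:

```python
def query_func(score, default):
    """Model of Solr's query(subquery, default): the subquery score when the
    doc matches (score is not None), otherwise the default."""
    return score if score is not None else default

def combined(uq_score, cq_score):
    """product(query($uq, 0.0), query($cq, 0.1)) from the message above."""
    return query_func(uq_score, 0.0) * query_func(cq_score, 0.1)
```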
Hi,
One of the possibilities for this kind of issue to occur may be the case
sensitivity of column names in Oracle.
Can you apply a transformer and check the entity map, which actually
contains the keys and their values?
Also, please try specifying upper case field names for Oracle and try if
that
I don't have the answer to your question, but I certainly don't think
anybody should be slapped in the face for asking a question!
Michael Della Bitta
Appinions, Inc. -- Where Influence Isn't a Game.
http://www.appinions.com
On Tue, Jun 5, 2012 a
Hi,
Sorry, I am stumped, and cannot help further without
access to Oracle. Please disregard the bit about the
quotes: I was reading a single quote followed by a
double quote as three single quotes. There was no
issue there.
Since your configurations for Oracle, and mysql are
different, are you us
Hi James.
Thanks for your advice.
As I said, alias works for me. I use joins instead of sub-entities...
Heavily...
These config files work for me...
db-data-config.xml
My (very limited) understanding of "boilerpipe" in Tika is that it strips
out "short text", which is great for all the menu and navigation text, but
the typical disclaimer at the bottom of an email is not very short and
frequently can be longer than the email message body itself. You may have to
Hi Gora,
Yes, I restart Solr for each change I do.
Thanks for your help...
A small question: does DIH work well with an Oracle database, using all the
features it offers?
On Tue, Jun 5, 2012 at 9:32 AM, Gora Mohanty wrote:
> Hi,
>
> Sorry, I am stumped, and cannot help further without
> acces
On 5 June 2012 20:05, Rafael Taboada wrote:
> Hi James.
>
> Thanks for your advice.
>
> As I said, alias works for me. I use joins instead of sub-entities...
> Heavily...
> These config files work for me...
[...]
How about NULL values in the column that
you are doing a left outer join on? Cannot
On 5 June 2012 20:08, Rafael Taboada wrote:
> Hi Gora,
>
> Yes, I restart Solr for each change I do.
>
> Thanks for your help...
>
> A small question: does DIH work well with an Oracle database, using all the
> features it offers?
Unfortunately, I have never used DIH with Oracle. However,
this sho
There may be a raw performance advantage to having all values in a single
combined field, but then you lose the opportunity to boost title and tag
field hits.
With the extended dismax query parser you have the ability to specify the
field list in the "qf" request parameter so that the query c
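For example, a request along these lines (field names and boost factors are hypothetical):

```
q=lucene
defType=edismax
qf=title^3.0 body^1.0 tag_color^0.5 tag_material^0.5
```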
Quick reminder, we're meeting at The Plough in Bloomsbury tomorrow night.
Details and RSVP on the meetup page:
http://www.meetup.com/london-search-social/events/65873032/
--
Richard Marr
On 3 Jun 2012, at 00:29, Richard Marr wrote:
>
> Apologies for the short notice guys, we're meeting up at
Thanks for the responses,
By saying "dirty data" you imply that only one of the values is "good" or
> "clean" and that the others can be safely discarded/ignored, as opposed to
> true multi-valued data where each value is there for good reason and needs
> to be preserved. In any case, how do you k
IIRC, the Lucene in Action book comes back to this point in almost every chapter:
multi-field queries are faster.
On Tue, Jun 5, 2012 at 7:04 PM, Jack Krupansky wrote:
> There may be a raw performance advantage to having all values in a single
> combined field, but then you lose the opportunity to boost titl
On 5 June 2012 22:05, Mikhail Khludnev wrote:
> IIRC, the Lucene in Action book comes back to this point in almost every chapter:
> multi-field queries are faster.
[...]
Surely this is dependent on the type, and volume of one's
data? As with many issues, isn't the answer that "it depends",
i.e., one should pr
I'm curious... how deep is the paging that is becoming problematic? Tens of pages,
hundreds, thousands, millions?
And when you say deep paging, are you incrementing through all pages down to
the depth or "gapping" to some very large depth outright? If the former, I
am wondering if the Solr cache is bu
Hi,
First off, I'm about a week into all things Solr, and still trying to figure
out how to fit my relational-shaped peg through a denormalized hole. Please
forgive my ignorance below :-D
I need to store a one-to-N type relationship, and boost on a
related field.
Let's say I want to
We've encountered GC spikes at Etsy after adding new
ExternalFileFields a decent number of times. I was always a little
confused by this behavior -- isn't it just one big float[]? why does
that cause problems for the GC? -- but looking at the FileFloatSource
code a little more carefully, I wonder i
for example when we set the start parameter to 1000, 2000 or higher (page
100, 200 ...), it takes very long (20, 30 seconds, sometimes even 100
seconds).
this usually happens when there is a big gap between pages, mostly hit by
web crawlers (when they crawl the last page link on our website).
--
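The usual explanation for the cost: to return the page at offset start, each shard has to collect and rank the top start+rows documents, so the work grows with the offset even though the page size stays the same. A toy illustration:

```python
import heapq

def fetch_page(scored_docs, start, rows):
    """Rank the top start+rows docs, then slice off the requested page --
    the ranking work is what grows as 'start' gets deeper."""
    top = heapq.nlargest(start + rows, scored_docs, key=lambda d: d[1])
    return top[start:start + rows]

docs = [(f"doc{i}", float(i % 97)) for i in range(10_000)]
```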
Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I can get a clean
4.0 commit for production use?
I did an SVN checkout from
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/
and it looks like they have migrated to 5.0. From the link below it looks
like that happened by the end of
: Hey guys, I am trying to upgrade to Solr 4.0. Do you know where I get a clean
Clarification: 4.0 does not exist yet. What does exist is the 4x branch,
from which you can build snapshots that should be very similar to what
will eventually be released as 4.0.
: http://svn.apache.org/repos/asf
The Nightly Build wiki still says it is "4.x" even though it is now 5.x.
See:
https://wiki.apache.org/solr/NightlyBuilds
AFAIK, there isn't a 4.x nightly build running. (Is that going to happen
soon??)
You can checkout the repo for the 4x branch:
http://svn.apache.org/repos/asf/lucene/dev/bran
: The Nightly Build wiki still says it is "4.x" even though it is now 5.x.
: See:
: https://wiki.apache.org/solr/NightlyBuilds
:
: AFAIK, there isn't a 4.x nightly build running. (Is that going to happen
: soon??)
Yes...
http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3c3fd307e
I've updated the wiki to try and fill in some of these holes...
http://wiki.apache.org/solr/ExtractingRequestHandler
: i'm looking at using Tika to index a bunch of documents. the wiki page seems
to be a little bit out of date ("// TODO: this is out of date as of Solr 1.4 -
dist/apache-solr-ce
: The real issue here is that the docs are created externally, and the
: producer won't (yet) guarantee that fields that should appear once will
: actually appear once. Because of this, I don't want to declare the field as
: multiValued="false" as I don't want to cause indexing errors. It would be
: It seems that TermComponent is looking at all versions of documents in the
index.
:
: Is this the expected behavior for TermComponent? Any suggestion about
: how to solve this?
Yes...
http://wiki.apache.org/solr/TermsComponent
"The doc frequencies returned are the number of documents tha
Hoss,
In your edit, I noticed that the wiki makes "SolrPlugin" a link, but to a
nonexistent page, although the page "SolrPlugins" does exist.
See: "it is provided as a SolrPlugin,"
http://wiki.apache.org/solr/ExtractingRequestHandler
I also noticed a few other things:
1. Reference to the "/s
It probably can work out reasonably well in both scenarios, but you do get
some additional flexibility with multiple Tomcat instances:
1. Any "per-instance" Tomcat limits become per-core rather than for all
cores on that machine.
2. If you have to restart Tomcat, only a single shard is impacte
Thanks for your reply!
I tried using the types file in WordDelimiterFilterFactory, passing a
text file which classified % and $ as alphabetic. But even then they didn't
get indexed, and neither did they show up in search results.
Am I missing something?
Thanks,
Kushal
--
View this message in c
I used MySQL 3.x.
After I migrated to MySQL 5.x, I don't get the same error, like 'Unable to
execute query'.
Maybe the lower version of MySQL and Solr have some problems; I don't know
exactly.
2012/6/5 Jihyun Suh
> That's why I made a new DB for dataimport test. So my tables have no
> access or activ
I have 128 tables in MySQL 5.x, and each table has 3,5000 rows.
When I start dataimport (indexing) in Solr, it takes 5 minutes for one
table.
But when Solr indexes the 20th table, it takes around 10 minutes for one table.
And then when it indexes the 40th table, it takes around 20 minutes for one
table.
Solr
Thanks Jack for your help!
I found my mistake: rather than classifying those special characters as
ALPHA, I had classified them as DIGIT. Also, I had missed the same entry for the search
analyzer. So probably that was the reason for not getting relevant results.
I spent a lot of time figuring this out. So I'l
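For anyone hitting this later: the file named by the types attribute of WordDelimiterFilterFactory maps characters to token classes, one mapping per line. The fix described above would look something like this (the file name is whatever your filter's types attribute points at):

```
% => ALPHA
$ => ALPHA
```

The same filter, with the same types file, has to appear in both the index and query analyzer chains, which was the second half of the fix.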
Thanks. I'm sure someone else will have the same issue at some point.
-- Jack Krupansky
-Original Message-
From: KPK
Sent: Tuesday, June 05, 2012 9:51 PM
To: solr-user@lucene.apache.org
Subject: Re: index special characters solr
Thanks Jack for your help!
I found my mistake, rather th
You wrote "3,5000", but is that 35 hundred (3,500) or 35 thousand (35,000)??
Your numbers seem far worse than what many people typically see with Solr
and DIH.
Is the database running on the same machine?
Check the Solr log file to see if some errors (or warnings) might be
occurring frequent
Which Solr do you run?
On Tue, Jun 5, 2012 at 8:02 PM, Jack Krupansky wrote:
> You wrote "3,5000", but is that 35 hundred (3,500) or 35 thousand (35,000)??
>
> Your numbers seem far worse than what many people typically see with Solr
> and DIH.
>
> Is the database running on the same machine?
>
>
Hi,
We are hiring multiple Lucene/Solr engineers, tech leads, and architects based
in Minneapolis - both full-time and consulting - to develop a new search
platform.
Please reach out to me - svamb...@gmail.com
Thanks,
Venkat Ambati
Sr. Manager, Best Buy
We are using SOLR 1.4, and we are experiencing full index replication
every 15 minutes.
I have checked the solrconfig, and it has maxsegments set to 20. It
appears that it is indexing a segment but replicating the whole
index.
How can I verify this and possibly fix the issue?
--
Bill Bell
billnb.