Besides changing the scoring algorithm, what about "Field Collapsing" -
http://wiki.apache.org/solr/FieldCollapsing - to collapse the results from
same website url?
Yunfei
On Mon, Oct 29, 2012 at 12:43 AM, Alexander Aristov <
alexander.aris...@gmail.com> wrote:
> Hi everybody,
>
> I have a ques
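A minimal sketch of such a request, assuming a hypothetical single-valued
indexed field site_s holding each document's source host (parameter names are
from the FieldCollapsing wiki page above):

```
q=obama&group=true&group.field=site_s&group.limit=1
```

Each site then contributes at most one document to the result page.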
Interesting, but not exactly what I want to get.
If I group items then I will get a small number of docs. I don't want this; I
need all of them.
Best Regards
Alexander Aristov
On 29 October 2012 12:05, yunfei wu wrote:
> Besides changing the scoring algorithm, what about "Field Collapsing" -
> h
On Mon, Oct 29, 2012 at 7:04 AM, Shawn Heisey wrote:
> They are indeed Java options. The first two control the maximum and
> starting heap sizes. NewRatio controls the relative size of the young and
> old generations, making the young generation considerably larger than it is
> by default. The
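As a sketch, the three options discussed here would appear on the Java command
line roughly like this (the heap sizes and ratio are placeholders, not
recommendations):

```
java -Xms1g -Xmx4g -XX:NewRatio=1 -jar start.jar
```

-Xms sets the starting heap, -Xmx the maximum heap, and a low NewRatio
(old/young generation ratio) enlarges the young generation relative to the
default.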
On 29.10.2012 0:09, Lance Norskog wrote:
1) Do you use compound files (CFS)? This adds a lot of overhead to merging.
I do not know. What's the Solr configuration statement for turning them on/off?
2) Does ES use the same merge policy code as Solr?
ES rate limiting:
http://www.elasticsearch.or
Hi,
It seems to work without the cache option; the downside is it will take
ages for everything to be indexed, and my test set is 20 times smaller than
the product set.
Indexing just the root item takes 3 minutes (>600K), but every subentity
takes more time, which is obvious, but I would've hoped it w
Is there a JIRA ticket dedicated to throttling segment merges? I could not
find any, but the JIRA search kinda sucks.
It should be ported from ES because it's not much code.
>
> Is there way to set-up logging to output something when segment merging
>>> runs?
>>>
>>> I think segment merging is logged when you enable infoStream logging
>> (you
>> should see it commented in the solrconfig.xml)
>>
> No, segment merging is not logged at the info level. It needs customized lo
With Lucene 4.0, FSDirectory now supports merge bytes/sec throttling
(FSDirectory.setMaxMergeWriteMBPerSec): it rate-limits the maximum
bytes/sec load on the IO system due to merging.
Not sure if it's been exposed in Solr / ElasticSearch yet ...
Mike McCandless
http://blog.mikemccandless.com
On Mo
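If you construct the Directory yourself in embedded Lucene code, the call Mike
mentions would look roughly like this (a sketch only; the path is a
placeholder, and the setter is assumed to exist as described above):

```java
FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
dir.setMaxMergeWriteMBPerSec(20.0); // throttle merge writes to ~20 MB/sec
```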
Hello Solr Gurus,
I am a newbie to the Solr application; below are my requirements:
1. We have 7 folders of indexed files at which the Solr application is to be
pointed. I understand the shards feature can be used for searching; is there
any other alternative? Each folder has around 24 million documents.
2.
I don't think you're reading the grouping right. When you use grouping,
you get the top N groups, and within each group you get the top M
scoring documents. So you can actually get _more_ documents back than in
the non-grouping case and your app can then intelligently intersperse them
however you w
Hi.
Is there any solution to facet documents with a specified prefix on some tokenized field, but get the original value of the field in the result?
E.g.:
Indexed value: "toys for children"
Query: q=&start=0&rows=0&facet.limit=-1&facet.mincount=1&f.category_ac.facet.prefix=chi&facet.field=
Hello!
Do you have to use faceting for prefixing? Maybe it would be better to use
an ngram-based field and return the stored value?
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
> Hi.
> Is there any solution to facet documents with speci
On 29 October 2012 15:36, mroosendaal wrote:
> Hi,
>
> It seems to work without the cache option, the downside is it will takes
> ages for everything to be indexed and my testset is 20 times smaller than
> the productset.
Someone else will have to respond as to how to use CachedSqlEntity in
Solr
Hi,
We want to remove some results from the result set based on the result of some
algorithms on some fields in adjacent documents. For example, if doc2 resembles
doc1, we want to remove it. We cannot do this in a search component because
of problems with paging, maintaining rows=N results de
Hi everyone,
at this year's Berlin Buzzwords conference someone (Sematext?) described
a technique of a hot shard. The idea is to have a slim shard to
maximize the update throughput during the day (when millions of docs need to
be posted) and make sure the indexed documents are immediately sear
Hi all,
I have the following schema
offer:
- offer_id
- offer_title
- offer_description
- related_shop_id
- related_shop_name
- offer_price
Each offer is related to a shop.
In one shop, we have many offers
I would like to show on one page (26 offers) only one offer per shop.
I need
Hi!
I am very excited to announce the availability of Apache Solr 4.0 with
RankingAlgorithm 1.4.4 and Realtime NRT. Realtime NRT is a high-performance
and more granular NRT implementation compared to soft commit. The
update performance is about 70,000 documents / sec* (almost 1.5-2x
performance imp
Hello!
Try the grouping feature of Solr -
http://wiki.apache.org/solr/FieldCollapsing . You can collapse
documents based on the field value, which would be a shop identifier
in your case.
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> I have
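A sketch of such a request, assuming related_shop_id is indexed and
single-valued (with grouping enabled, rows counts groups, so this yields up to
26 shops with one offer each):

```
q=*:*&group=true&group.field=related_shop_id&group.limit=1&rows=26
```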
hello dear,
I am trying to run apache-solr 3.6.1 on my Linux host using java -jar start.jar,
and everything runs fine,
but Java stops running as soon as the command terminal closes. I have searched
for ways of making it run continuously but did not find any.
Please, I need help on making apache-solr r
Hello!
What you are experiencing is normal - when you run an application in
the foreground, it will close when you close the terminal. You can use
a command like screen, or you can use nohup. However, I would advise
installing a standalone Jetty or Tomcat and just deploying Solr as any
other web applicati
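For completeness, the stop-gap variant with nohup looks like this (the log
file name is arbitrary):

```
nohup java -jar start.jar > solr.log 2>&1 &
```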
Hi!
I am very excited to announce the availability of an integrated Apache
Solr 4.0 with Realtime NRT download:
http://solr-ra.tgels.org/realtime-nrt.jsp
Realtime NRT has been contributed back to Solr, see JIRA:
https://issues.apache.org/jira/browse/SOLR-3816
Here is more info about Realtime
Hi - Detach it from the terminal: java -jar start.jar &
-Original message-
> From:zakari mohammed
> Sent: Mon 29-Oct-2012 15:22
> To: solr-user@lucene.apache.org
> Subject: Having problem runing apache-solr on my linux server
>
>
> hello dear,
> I try running apache-solr 3.6.1 on
Could any of the committers here confirm whether this is a legitimate
effort? I mean, how could anything labeled "Apache ABC with XYZ" be an
"external project" and be sanctioned/licensed by Apache? In fact, the linked
web page doesn't even acknowledge the ownership of the Apache trademarks or
A
If your subentities are large, the default DIH Cache probably isn't going to
work because it stores all the data in-memory. (This is
CachedSqlEntityProcessor for Solr 3.5 or earlier;
cacheImpl="SortedMapBackedCache" for 3.6 or later)
DIH for Solr 3.6 and later supports pluggable caches (see
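A sketch of the 3.6+ attributes in data-config.xml (the entity, query, and key
names here are placeholders, not from the original message):

```xml
<entity name="subentity"
        query="select root_id, name from sub"
        cacheImpl="SortedMapBackedCache"
        cacheKey="root_id"
        cacheLookup="root.id"/>
```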
It certainly seems to be a rogue project, but I can't understand the
meaning of "realtime near realtime (NRT)" either. At best, it's oxymoronic.
On 10/29/2012 10:30 AM, Jack Krupansky wrote:
Could any of the committers here confirm whether this is a legitimate
effort? I mean, how could anything
Hi Tobias,
You can pass a comma-delimited list of Zk addresses in your ensemble, such
as:
zk1:2181,zk2:2181,zk3:2181, etc.
Cheers,
Tim
On Mon, Oct 29, 2012 at 2:42 AM, Tobias Kraft wrote:
> Hi,
>
> when I need high availability for my Solr environment it is recommended to
> run a Zookeeper ens
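Concretely, the ensemble list goes into the zkHost system property when
starting Solr (host names are placeholders):

```
java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar
```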
Hi Erick,
I have upgraded from 3.5 to 4.0. On the first index build I was able to see the
distinct terms in the UI under the Schema Browser section. I had to add a new
date field as stored to the schema and re-index.
After that, in the UI, for every field I see the distinct terms as
"-1".
Can you please adv
Hi,
the subject says it all :)
Is there something like sum() available in a Solr query to sum all
values of a field?
Regards,
Markus Mirsberger
Hi, SOLR gurus
we're experiencing a crash with SOLR 4.0 whenever the results contain
multibyte characters (more precisely: German umlauts, utf-8 encoded).
The crashes only occur when using ReversedWildcardFilterFactory (which
is necessary in 4.0 to be able to have wildcards at the beginning of
th
Unfortunately, neither the subject nor your message says it all. Be
specific - what exactly do you want to sum? All matching docs? Just the
returned docs? By group? Or... what?
You can of course develop your own search component that does whatever it
wants with the search results.
-- Jack Kr
Jack:
I respect your hard-work responding to user problems on the mail list.
So it would be nicer to try out Realtime NRT then pass rogue comments,
whether a contribution is legit/spam or a scam... I guess it illuminates
the narrow minded view of oneself ... The spirit of open source is
con
I have for example an integer field and want to sum all these values for
all the matching documents.
Similar to this in sql:
SELECT SUM(expression)
FROM tables
WHERE predicates;
Regards,
Markus
On 29.10.2012 22:25, Jack Krupansky wrote:
Unfortunately, neither the subject nor your message
Hello!
Take a look at StatsComponent - http://wiki.apache.org/solr/StatsComponent
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
> I have for example an integer field and want to sum all these values for
> all the matching documents.
> Similar
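A sketch of such a request, assuming the integer field is named myfield; the
sum (along with min, max, count, etc.) over all matching documents appears in
the stats section of the response:

```
q=*:*&rows=0&stats=true&stats.field=myfield
```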
Hi Tomas,
I think this is the same case Marian reported before.
https://issues.apache.org/jira/browse/SOLR-3193
https://issues.apache.org/jira/browse/SOLR-3901
--- On Mon, 10/29/12, Tomas Zerolo wrote:
> From: Tomas Zerolo
> Subject: SOLR 4.0 + ReversedWildcardFilterFactory + DefaultSolrHighligh
I'd like to use faceting. I don't want a list of documents.
Using ngrams would give me a response which is useless for me.
Querying something like this:
fq=category_ngram:child&facet.field=category_exactly
would give me something like this (for multivalued category fields):
"toys for children"
"games"
"m
On Mon, Oct 29, 2012 at 08:55:27AM -0700, Ahmet Arslan wrote:
> Hi Tomas,
>
> I think this is same case Marian reported before.
>
> https://issues.apache.org/jira/browse/SOLR-3193
> https://issues.apache.org/jira/browse/SOLR-3901
Thanks, Ahmet. Yes, by the descriptions they look very similar. I'
On 10/28/2012 11:34 PM, maneesha wrote:
For creating the full index for my core every quarter from scratch:
- I create a new core e.g. "blahNov2012" using admin url with option
action=CREATE and I give it a new dataDir property e.g.
/home/blah/data/Nov2012.
- I do a full import on blahNov2012 t
As an external observer, I think the main problem is your branding.
"Realtime Near Realtime" is definitely an oxymoron, and your ranking
algorithm is called "Ranking Algorithm," which is generic enough to
suggest that a. it's the only ranking algorithm available, and b. by
implication, that Solr do
We want to put all the files into a file system (HDFS). It is easy to maintain
if both config and index files are in the same place. You are right, we can
put a dummy one to bypass that. But I have another idea. We subclass the
Directory class to handle all the file access. If QueryElevationComponent
+10
On Mon, Oct 29, 2012 at 12:17 PM, Michael Della Bitta
wrote:
> As an external observer, I think the main problem is your branding.
> "Realtime Near Realtime" is definitely an oxymoron, and your ranking
> algorithm is called "Ranking Algorithm," which is generic enough to
> suggest that a. it'
Maybe we do need to think more seriously about some of those higher-level
SQL-like features.
Function queries, which can now be used as "pseudo-fields" in the "fl"
parameter do have a "sum" function, but that merely adds an explicit list of
functions/field names for a single document.
Maybe
On 10/29/2012 7:55 AM, Dmitry Kan wrote:
Hi everyone,
at this year's Berlin Buzz words conference someone (sematext?) have
described a technique of a hot shard. The idea is to have a slim shard to
maximize the update throughput during a day (when millions of docs need to
be posted) and make sure
Hi!
We're currently facing a strange memory issue we can't explain, so I'd
like to kindly ask if anyone is able to shed light on the behaviour
we encounter.
We use a Solr 3.5 instance on a Windows Server 2008 machine equipped
with 16GB of RAM.
The index uses 8 cores, 10 million documents, disk s
I think I get it the right way.
Referring back to my example,
I will get 3 groups:
a large group with 8 documents in it, and
two other groups with one document each.
If I limit a group to 5 docs then the 1st group will have only 5 docs and the
other two will still contain one doc each.
And the order (based on
You've mentioned that you want to "improve" the scores of these documents,
but you haven't really given any specifics about when/how/why you want to
improve the score in general -- ie: in this example you have a total of
10 docs, but how do you distinguish the 2 special docs from the 8 other
Hi again!
On 29 October 2012 18:39, Nicolai Scheer wrote:
> Hi!
>
> We're currently facing a strange memory issue we can't explain, so I'd
> like to kindly ask if anyone is able to shed a light an the behavour
> we encounter.
>
> We use a Solr 3.5 instance on a Windows Server 2008 machine equippe
: Transformer is great to augment Documents before shipping to response,
: but what would be a way to prevent document from being delivered?
DocTransformers can only modify the documents -- not the Document List.
what you are describing would have to be done as a SearchComponent (or in
the QPar
Perhaps this is an XY problem.
First of all, I don't have a site which I want to boost. All docs are equal.
Secondly, I will explain what I have. I have 100 docs indexed. I do a query
which returns 10 found docs, 8 of them from one site and 2 from other
different sites. I don't like the order. Technical
: > field:"value" OR (*:* AND NOT field:[* TO *])
: Instead of field:[* TO *], you can define a default value in schema.xml.
: Or DefaultValueUpdateProcessorFactory in solrconfig.
right -- the most efficient way to query for this kind of "has value in
fieldX" or "does not have a value in field
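The two query shapes under discussion, for a field fieldX:

```
fq=fieldX:[* TO *]           -> documents that have a value in fieldX
fq=(*:* -fieldX:[* TO *])    -> documents that have no value in fieldX
```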
You haven't really explained things enough for us to help you...
: First of all I don't have a site which I want to boost. All docs are equal.
:
: Secondly I will explain what I have. I have 100 docs indexed. I do a query
: which returns 10 found docs. 8 of them from one site and 2 from other
:
I don't know if there is an easy way to address the key crux of your
question...
: only applied of there are 0 values for the field. Is there a way when
: using the DIH to replace 'null' or missing values with a default, such
: that I can ensure that I always have the same number of values in ea
You absolutely follow my problem. I want to put Obama from ESPN on top just
because this is an exceptional and probably interesting occurrence. And the
score is low because the content is long or there are no matches in the title.
On 29.10.2012 23:18, "Chris Hostetter"
wrote:
>
> You haven't really
How did you get the 7 directories anyway? From your message,
they sound like they are _solr_ indexes, in which case you
somehow created them with Solr. But I don't really understand
the setup in that case.
If these are Solr/Lucene indexes, you can use the "multicore"
features. This treats them lik
I agree with Chris Hostetter that we might not be able to provide
suggestions for the use cases unless clear reasons are provided
("don't like the order" is the feeling, not the reason for how you want to
adjust the order).
- if you want to put some results on top based on some terms regardless
: We are currently working on having Solr files read from HDFS. We extended
: some of the classes so as to avoid modifying the original Solr code and
: make it compatible with the future release. So here comes the question, I
: found in QueryElevationComponent, there is a piece of code checking wh
I suspect what's happening is that the index format changed
between 3.x and 4.x and somehow the Luke request
handler is getting mixed up. I'm further guessing that
the -1 is just the default value, since it's clearly bogus
it's a flag the luke request handler just didn't see what it
expected...
O
In short, no. The problem is that faceting works by counting
documents with distinct tokens in the field. So in your example
I'd expect you to see facets for "toys", "for", "children". All it
has to work with are the tokens, the fact that the original input
was three words is completely lost a
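One partial workaround (an assumption on my part, since the reply is cut off):
copyField the raw value into an untokenized string field and facet on that.
Note this only matches prefixes of the whole value, so "chi" would still not
match "toys for children":

```xml
<field name="category_exact" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="category" dest="category_exact"/>
```

Faceting would then use facet.field=category_exact with
f.category_exact.facet.prefix.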
Thanks Hoss,
I probably did not formulate the question properly, but you gave me an answer.
I do it already in a SearchComponent; I just wanted to centralise this
control of the depth and width of the response in a single place in the
code [style={minimal, verbose, full...}].
It just sounds logical t
Hi All,
I am trying to run two SolrCloud clusters with 3 and 2 shards respectively
(let's say Cloud3shards and Clouds2Shards). All servers are identical with
18GB RAM (16GB assigned to Java).
I am facing a few issues on both clouds and would be grateful if any one
else has seen / solved these.
1) Every
: I did not look where pagination happens, but it looks like
: DocTransform gets applied at the very end (response writer), which in
: turn means pagination is not an issue, just some pages might get
: shorter due to this additional filtering, but that is quite ok for me.
it depends on what you
On 10/29/2012 3:26 PM, shreejay wrote:
I am trying to run two SolrCloud with 3 and 2 shards respectively (lets say
Cloud3shards and Clouds2Shards). All servers are identical with 18GB Ram
(16GB assigned for Java).
This bit right here sets off warning bells right away. You're only
leaving 2GB
Do updates always start at the shard leader first? If so one can save one
internal request by only sending updates to the shard leader. I am
assuming that when the shard leader is down, SolrJ's CloudSolrServer is
smart enough to use the newly elected shard leader after a failover has
occurred. A
On 29.10.2012 12:18, Michael McCandless wrote:
With Lucene 4.0, FSDirectory now supports merge bytes/sec throttling
(FSDirectory.setMaxMergeWriteMBPerSec): it rate limits that max
bytes/sec load on the IO system due to merging.
Not sure if it's been exposed in Solr / ElasticSearch yet ...
i
I could well be doing something wrong here, but so far I haven't figured it
out. I currently run SOLR 4 BETA / multicore and I was investigating
migrating to SOLR 4.0 (on my workstation).
I've even backed out my custom schema and solrconfig so I'm running as close
to original as possible with no
Hi all,
would you know why I get (notice square brackets)
[1969 Harley Davidson Ultimate Chopper]
not
1969 Harley Davidson Ultimate Chopper
when calling
var description = row.get("ProductName").toString();
in a script transformer?
Thank you,
Radek.
On 10/29/2012 5:28 PM, vybe3142 wrote:
I could well be doing something wrong here, but so far I haven't figured it
out. I currently run SOLR 4 BETA / multicore and I was investigating
migrating to SOLR 4.0 (on my workstation).
I've even backed out my custom schema and solrconfig so I'm running
Hi there,
I am new to SOLR and trying to use MapReduce to index on 4.0. Per online
suggestions, I tried both ConcurrentUpdateSolrServer and CloudSolrServer.
For ConcurrentUpdateSolrServer, I did this:
in setup:
int taskId = context.getTaskAttemptID().getTaskID().getId();
int serverId = taskId
realtime comes from the tag used to enable the functionality in
solrconfig.xml. nrt is used as an acronym, as in radar/laser/jpeg/cdrom,
etc. nrt is so well known I did not expect it to be expanded to its full
form, which does make it oxymoronic ...
Regards,
Nagendra Nagarajayya
http://solr-ra
Thanks Michael for the feedback. Will take a look at this ...
Regards,
Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org
On 10/29/2012 9:17 AM, Michael Della Bitta wrote:
As an external observer, I think the main problem is your branding.
"Realtime Near Realtime"
This looks like a MySQL permissions problem and not a Solr problem.
"Caused by: java.sql.SQLException: Access denied for user
'readonly'@'10.86.29.32'
(using password: NO)"
I'd advise reading your stack traces a bit more carefully. You should
check your permissions or if you don't own the DB, chec
Sounds like it is multivalued - the square brackets indicate an array.
-- Jack Krupansky
-Original Message-
From: Radek Zajkowski
Sent: Monday, October 29, 2012 8:37 PM
To: solr-user@lucene.apache.org
Subject: row.get() in script transformer adds square brackets [] to string
value
H
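If the column is indeed multivalued, one hedged way to unwrap it inside the
script transformer (a sketch in the DIH's Rhino JavaScript; names are from the
question):

```javascript
var v = row.get("ProductName");
// DIH hands a multivalued column over as a java.util.List,
// whose toString() adds the square brackets
if (v instanceof java.util.List) {
    v = v.get(0); // take the first value
}
var description = String(v);
```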
Hi,
I have created a Solr XML data source, and on it I am working with the
less-than operator.
I tried q=SerialNo:[* TO 500], but it is showing records having SerialNo=1000.
Could you help me with this?
Thanks,
Leena Jawale
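A likely cause (an assumption, since the schema isn't shown): if SerialNo is a
string-typed field, range queries compare lexicographically, character by
character, and under that ordering "1000" sorts before "500". A small Python
illustration of the difference:

```python
# String comparison is character-by-character: '1' < '5', so "1000" < "500"
print("1000" < "500")  # True -> "1000" falls inside a string range [* TO 500]

# Numeric comparison behaves as expected
print(1000 < 500)      # False -> a numeric field type (e.g. int) would exclude it
```

If that is the case, declaring SerialNo with a numeric field type and
re-indexing should fix the range query.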