Hi,
I am relatively new to Solr and I am looking for a neat way to implement
weak documents with Solr.
Whenever a document is updated or deleted, all its dependent documents
should be removed from the index. In other words, they should exist only
as long as the document they refer to exists.
Hi all,
Is there any way the Solr caches (document / field / query) can be
persisted on disk? In case of a system crash, can I have the new cache
loaded from the persisted cache?
Thanks,
Prasi
Just a guess, I haven't investigated them fully yet, but I wonder if
block joins could serve you here, as they involve creating docs in a
parent-child relationship.
Or, you could easily fake it: give the parent document an id of
abcd
and give each dependent document a field like
parent:abcd
Not sure if that syntax is completely right, but using that sort of
thing would work.
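A hedged sketch of that idea in Solr XML update format (the parent field
name is only an illustration and would need to exist in your schema):

  <add>
    <doc>
      <field name="id">abcd</field>            <!-- the document others depend on -->
    </doc>
    <doc>
      <field name="id">abcd-dep-1</field>      <!-- a dependent document -->
      <field name="parent">abcd</field>        <!-- reference to its parent -->
    </doc>
  </add>

Whenever abcd is updated or deleted, its dependents can then be dropped with
a delete-by-query:

  <delete><query>parent:abcd</query></delete>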
As Shawn stated above, when you start up Solr there will be no such thing as
caches or old searchers.
If you want to warm up, you can only rely on firstSearcher and newSearcher
queries.
/"What would happen to the autowarmed queries , cache , old searcher now ?"/
They're all gone.
-
Thanks,
Caches are only valid as long as the IndexSearcher is valid. So, if you make
a commit that opens a new searcher, the caches will be invalidated.
However, in this scenario you can configure your caches so that the new
searcher keeps a certain number of cache entries from the previous one
(autowarming).
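For reference, a hedged example of such an autowarm configuration in
solrconfig.xml (class choices and sizes are illustrative only; the document
cache cannot be autowarmed because internal doc ids change between searchers):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="64"/>

autowarmCount is the number of entries carried over from the old cache and
regenerated against the new searcher.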
Thomas,
our experience with Curriki.org is that evaluating what I call the "related
documents" is a procedure that needs access to the complete content and thus is
run at the DB level and not the Solr level.
For example, if a user changes a part of his name, we need to reindex all of
his resources
I'm sorry, I forgot to write the problem.
adfel70 wrote
> 1. take one of the replicas of shard1 down (it doesn't matter which one)
> 2. continue indexing documents (that's important for this scenario)
> 3. take down the second replica of shard1 (now the shard is down and we
> cannot index anymore)
>
We have a webapp running with a very high heap size (24 GB) and we have
no problems with it AFTER we enabled the new GC that is meant to replace
the CMS GC sometime in the future, but you have to have Java 6 update
"some number I couldn't find, but the latest should cover it" to be able to use:
1. Remo
Currently, once Solr is started, we run a batch that fires queries at
Solr (just something like the firstSearcher does). Once this is done,
the users start using search.
In case the server is restarted or anything crashes, then again I have to
run this batch, which I cannot control
Thanks Sujit, I got the problem and fixed it.
2013/11/26 Sujit Pal
> Hi Furkan,
>
> In the stock definition of the payload field:
>
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup
>
> the analyzer for the payloads field type is a WhitespaceTokenizer
On 27.11.2013 at 09:58, Paul Libbrecht wrote:
Thomas,
our experience with Curriki.org is that evaluating what I call the
"related documents" is a procedure that needs access to the complete
content and thus is run at the DB level and not the Solr level.
For example, if a user changes a part of i
Now this is strange,
while using TrimFilterFactory with attribute "updateOffsets=true" as described
in
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.TrimFilterFactory
and
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-TrimFilter
I get "
Hi
I am facing a problem in Solr with the tt_news URL. Every time it shows all
news items from one detail page.
For example, I have two categories of news:
1. Corporate
2. Human
So the URL for corporate should be formed like:
domainname/pagename/corporate/detail/article/newsheading
And for human
Hello,
this is a repost. This message was originally posted on the 'general' list but
it was suggested that the 'user' list might be a better place to ask.
Original Message
Hi,
we are passing a multivalued field to the LanguageIdentifierUpdateProcessor.
This multivalued field contains
You could just add the queries you have set up in your batch script to the
firstSearcher queries. That way, you wouldn't need to run the script
every time you restart Solr.
As for crash protection and immediate action, that's outside the scope of
the Solr mailing list. You could set up a watchdog t
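A minimal sketch of moving such warm-up queries into solrconfig.xml (the
queries themselves are placeholders to be replaced by your batch's queries):

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">frequent query one</str></lst>
      <lst><str name="q">frequent query two</str><str name="sort">price asc</str></lst>
    </arr>
  </listener>

A matching newSearcher listener covers warming after each commit, not just
at startup.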
I think when a replica becomes leader, it tries to sync *from* all the
other replicas to see if anyone else is more up to date than it is, then it
syncs back out *to* the replicas. But that probably won't happen in your
case, since when replica1 comes back (step 4) it is the only contender, so
it
Hi,
on https://issues.apache.org/jira/browse/SOLR-3583
there are some patches listed. I currently can't really figure out for
which Solr version this patch is valid, since the issue listed there is
still open and should be fixed in version 4.6.
I'm wondering if this patch can be appli
Hi Team,
As per the latest updates in the support ticket in the Lucid portal, we have
some concerns, as below:
1. The join key ids seem to have to be integers. It says they require
longs, but I am having trouble with anything but an integer as the "from" and
"to" key values.
--regarding the
Yes. This is going to hurt you a lot. The intent of M/S is that
you should be indexing to one, and only one, machine: the
master. All slaves pull their indexes from the master. Frankly,
I don't know quite what will happen in the configuration you're
talking about. I strongly recommend you do not do this.
Please review: http://wiki.apache.org/solr/UsingMailingLists
You've given us almost no information to go on here.
Best,
Erick
On Tue, Nov 26, 2013 at 2:21 PM, GOYAL, ANKUR wrote:
> Hi,
>
> I am working on using term vector component with solr 4.2.1. If I use solr
> in a multicore environment,
I suspect that it is an oversight for a use case that was not considered. I
mean, it should probably either ignore or convert non-text/string values.
Hmmm... are you using JSON input? I mean, how are the types being set? Solr
XML doesn't have a way to set the value types.
You could work around
What about using some JSONP techniques since the results in the Solr
instance rest as key/value pairs?
On 11/26/13, 10:53 AM, "Markus Jelsma" wrote:
>I don't think you mean client-side proxy. You need a server side layer
>such as a normal web application or good proxy. We use Nginx, it is very
The sense of "fq" clauses is "for all the docs that
match my primary query, only show the ones that
match the fq clause". There's no primary query to
work with.
If you really need this capability, you can add a default query of *:* to
the defaults section of your request handler in solrconfig.xml.
The OOB request handl
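A hedged sketch of such a defaults entry (the handler name is a placeholder;
with the dismax/edismax parsers the fallback parameter would be q.alt instead):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- used only when the request supplies no q, so fq-only requests match all docs -->
      <str name="q">*:*</str>
    </lst>
  </requestHandler>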
Just bite the bullet and do the query at your application level. I mean,
Solr/Lucene would have to do the same amount of work internally anyway. If
the perceived performance overhead is too great, get beefier hardware.
-- Jack Krupansky
-Original Message-
From: Thomas Scheffler
Sent:
My _guess_, and it's only a guess since you haven't shown us
anything about your Solr setup, is that all your documents
are getting indexed with the same ID so you only have one
live document.
You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best,
Erick
On Wed, Nov 27, 2013 at 5:
As Daniel says, there's no information available
in step 4 for that node to know it's out of date.
"Don't do that" isn't very helpful. I think the only
recovery strategy I can think of offhand is to
reindex from some time T prior to step <1>...
Best,
Erick
On Wed, Nov 27, 2013 at 6:07 AM, Danie
I am interested in retrieving the tf for terms that matched the query, not
all terms in the document. Is this possible? Looking at the example, when
I search for the word "cable" I get the response shown below; ideally
I'd like to see only the tf for the word "cable". Is this possible, or woul
"Try it and see". Not really helpful, but the best we I can do.
There's no formal method for insuring that a patch will work
with an arbitrary version. At least you're trying to apply it
to a version newer than it was created on.
Not much help, but
If you _do_ apply it to 4.5.1, and if you ha
Would it serve to return the tf or ttf? You'd have to
tack on clauses like
fl=*,ttf(name,drive)
or
fl=*,tf(name,drive)
Which implies that you'd have to do some work
on the query side to add the tf or ttf clauses.
See:
http://wiki.apache.org/solr/FunctionQuery#tf
Best,
Erick
On Wed, Nov 27, 20
That information would be included in the debugQuery output as well.
-- Jack Krupansky
-Original Message-
From: Jamie Johnson
Sent: Wednesday, November 27, 2013 9:32 AM
To: solr-user@lucene.apache.org
Subject: Term Vector Component Question
I am interested in retrieving the tf for
Hi
I have various Solr-related projects in a single environment.
These projects are not related to one another.
I'm thinking of building a Solr architecture in which all the projects
use different Solr collections in the same cluster, as opposed to having a
Solr cluster for each project.
1. as
Why complicate it? I think the simplest solution to the poster's question
is either a transparent proxy, or proxying Jetty (or Tomcat) via Apache Web
Server.
I don't think there will be any difference between the two, only in how easy
one or the other is to implement.
HTH,
Guido.
On 27/11/13 14:13
Mark,
As a 2nd thought: maybe I was just focusing on what I thought you
needed initially, which is to allow clients to query Solr and at the same
time restrict specific request parameters. Both Apache and any rich
transparent proxy can do the job easily; Apache can rewrite the URL and
map only
Hi All,
on our test environment we have implemented a new search engine based on
Solr 4.3, with 2 instances hosted on different servers and 1 shard
present on each servlet container.
During some stress tests we noticed a bottleneck in the crawling of large
PDF files that blocks the serving of resu
> I suspect that it is an oversight for a use case that was not considered.
> I mean, it should probably either ignore or convert non text/string
> values.
Ok, I'll see to it that I provide a patch against trunk. It actually
ignores non-string values, but is unable to check the remaining values
of a multivalued field.
Right. Delete by query "id:foo OR dependsOn:foo". --wunder
On Nov 27, 2013, at 6:23 AM, "Jack Krupansky" wrote:
> Just bite the bullet and do the query at your application level. I mean,
> Solr/Lucene would have to do the same amount of work internally anyway. If
> the perceived performance o
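A hedged example of issuing that delete over HTTP (host, collection name,
and the field name dependsOn are placeholders from the discussion above):

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: text/xml' \
    -d '<delete><query>id:foo OR dependsOn:foo</query></delete>'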
Hi Sujit;
Your example has this line:
override def decodeNormValue(b: Byte) = 1.0F
However, it is a final class. Do you have any idea how to handle that?
2013/11/27 Furkan KAMACI
> Thanks Sujit, I got the problem and fixed it.
>
>
> 2013/11/26 Sujit Pal
>
>> Hi Furkan,
>>
>> In the stock defini
"it is a final *method*". Can not be overrided at Solr 4.5.1?
2013/11/27 Furkan KAMACI
> Hi Sujit;
>
> Your example has that line:
>
> override def decodeNormValue(b: Byte) = 1.0F
>
>
> However it is a final class. Do you have any idea to handle it?
>
>
>
> 2013/11/27 Furkan KAMACI
>
>> Thank
Hi,
I am using Solr 4.6 with an external ZooKeeper 3.4.5:
5 nodes, 5 shards, 3 replicas.
I uploaded the collection configuration to ZooKeeper.
I am using the new core discovery mode.
I have this issue when I try to create a collection with this call:
http://10.0.5.227:8101/solr/admin/collections?action=C
Hi,
There's nothing unusual in what you are trying to do; this scenario is very
common.
To answer your questions:
> 1. as I understand I can separate the configs of each collection in
> zookeeper. is it correct?
Yes, that's correct. You'll have to upload your configs to ZK and use the
Collection
Lansing,
I ran the command without any issue:
http://localhost:8983/solr/admin/collections?action=CREATE&name=Current1&numShards=5&replicationFactor=3&maxShardsPerNode=15&collection.configName=default
The only difference was that I have only one box and used the default config
from the example folde
Hello,
I'm trying to get a list of top terms for a field called "Tags".
One way to do this would be to query all documents with *:* and then facet
on the Tags field:
/solr/collection/admin/select?q=*:*&rows=0&facet=true&facet.field=Tags
I've noticed another way to do this is using the luke interface like
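The Luke request handler is presumably what's meant here; a hedged example
of that style of request (field name and term count are illustrative):

  /solr/collection/admin/luke?fl=Tags&numTerms=100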
Hi,
I got a new issue now. I have Solr 4.3.0 running just fine. However, on Solr
4.3.1, it won't load. I get this issue:
{msg=SolrCore 'mycore' is not available due to init failure: Plugin
init failure for [schema.xml] fieldType "text_ws": Plugin init failure
for [schema.xml] analyzer/filter: Erro
Jack,
I'm not following; are you suggesting turning on debug and then parsing the
explain? Seems very roundabout if that is the case, no?
On Wed, Nov 27, 2013 at 9:40 AM, Jack Krupansky wrote:
> That information would be included in the debugQuery output as well.
>
> -- Jack Krupansky
>
> -
You can always expose the admin handler on a non-admin URL. That's all just
definitions in solrconfig.xml.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, i
Since your users shouldn't be allowed to access Solr directly at any time,
it's up to you to implement that on the client side anyway.
I can't tell if there is a technical difference between the two calls you
named, but I'd guess that the second might be a more direct way to access this
informa
I definitely want tf, the number of times the matched term appears in the
document; the key is that I want only the term that was searched for, not
all terms.
Looking at the tf function, this is close, except it needs the exact
term; I really need it to be the user-entered text. So, for insta
There is an XML version of explain as well, if parsing the structured text
is too difficult for your application. The point is that the debug "explain"
details precisely the term vector values for the actual query terms.
Don't let the "debug" moniker throw you - this parameter simply gives
you acc
It certainly seems to be faster (in my limited testing).
I just don't want to base my software on the Luke scripts if they're
prone to change in the future.
And yes, I realize there are ways to make this secure. I just wanted
to know if it's something I should avoid doing (perhaps for reasons
I'm assuming you're using the ExtractingRequestHandler. Offloading
the entire work onto your Solr box, which is also serving queries
and indexing, is not going to scale well.
Consider using Tika via SolrJ (Tika is what the ERH uses anyway) to
offload the PDF parsing to as many clients as you can aff
Hi
I am trying to set up a test SolrCloud 4.5.1 implementation. My synonym file
is about 1.6 MB. When I try to add the collection to ZooKeeper 3.4.5 on Ubuntu
12.04, it fails because of the 1 MB limit of ZooKeeper. Has anyone any
experience with using such synonym files? Can I store them in some other
lo
You can use jute.maxbuffer > 1M as a workaround.
You must set -Djute.maxbuffer in both ZooKeeper and Solr for this to work properly.
--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Wednesday, November 27, 2013 at 5:15 PM, Puneet Pawaia wrote:
> Hi
>
> I am trying to setup a t
Thanks, I'm looking at this now; debug seems pretty close to what I want.
Is there a way to exclude information from the debug response? For
instance, I don't need idf, fieldnorm, timing information, etc. Again,
thanks.
On Wed, Nov 27, 2013 at 11:49 AM, Jack Krupansky wrote:
> There is an XML ver
A little more reading got me there. I can just do debug=results, but that
still includes idf and fieldnorm. Much less, though, so it's a step ;) If
there is any way to get rid of just the idf, that would be great; otherwise no big deal.
On Wed, Nov 27, 2013 at 12:18 PM, Jamie Johnson wrote:
> thanks I'm looki
Yago, I'm not sure if this is a good idea. The docs say this is dangerous stuff.
Anyway, not being a Linux or Java expert, I would appreciate it if you could
point me to an implementation of this.
Regards
Puneet Pawaia
On 27 Nov 2013 22:54, "Yago Riveiro" wrote:
> You can use the jute.maxbuffer > 1M as
To be honest, this kind of question comes up so often that it is probably
worth a Jira to have a more customized or parameterized "explain".
Function queries in the "fl" list give you a lot more control, but not at
the level of the actual terms that matched.
-- Jack Krupansky
-Original Mess
On 11/27/2013 9:37 AM, Raheel Hasan wrote:
I got a new issue now. I have Solr 4.3.0 running just fine. However on Solr
4.3.1, it wont load. I get this issue:
{msg=SolrCore 'mycore' is not available due to init failure: Plugin
init failure for [schema.xml] fieldType "text_ws": Plugin init failur
I have a payload field in my schema (Solr 4.5.1). When a user searches for a
keyword, I will calculate the usual score, and if a match occurs at that
payload field, I will add the payload to the general score (payload * normalization
coefficient).
How can I do that? A custom payload similarity class or a custom
How are you launching Solr?
Do you have an ensemble, or are you running ZooKeeper embedded?
Yes, the doc says that jute.maxbuffer is dangerous, but without it you can store
nothing larger than 1M in ZooKeeper … and at some point you can have a
clusterstate.json with a size greater than 1M
-
Hi,
So, this query does just what I want, but it's typically 3 times slower
than the edismax query without the functions:
select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(
query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),
product(0.25,field(myfield)))&fq={!query v=$qq}
They are just trying to keep users from using ZK in a bad way. Storing and
accessing a ton of huge files is not what ZooKeeper was designed for. A 1MB
limit is a fairly arbitrary limiter to make sure you don’t shoot yourself in
the foot and store lots of large files. With modern networks and har
: So, this query does just what I want, but it's typically 3 times slower
: than the edismax query without the functions:
that's because the scale() function is inherently slow (it has to
compute the min & max value for every document in order to know how to
scale them)
what you are seeing is
Although the 'scale' is a big part of it, here's a closer breakdown. Here
are 4 queries with increasing numbers of functions, and their response times
(caching turned off in solrconfig):
100 msec:
select?q={!edismax v='news' qf='title^2 body'}
135 msec:
select?qq={!edismax v='news' qf='title^2 body'}&q={!fun
I'm curious how much compression you get with your synonym file using
something basic like gzip? If significant, would it make sense to
store the compressed syn file in ZooKeeper (or any other metadata you
need to distribute around the cluster)? This would require the code
that reads the syn file f
Hi Dave,
Have you looked at the TermsComponent?
http://wiki.apache.org/solr/TermsComponent It is easy to wire into an
existing request handler and allows you to return the top terms for a
field. The example server even includes an example request handler that
uses it:
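The config snippet was stripped by the mailer; a hedged reconstruction along
the lines of the example server's /terms handler (handler name and defaults
are assumptions):

  <searchComponent name="terms" class="solr.TermsComponent"/>

  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <bool name="terms">true</bool>
      <bool name="distrib">false</bool>
    </lst>
    <arr name="components">
      <str>terms</str>
    </arr>
  </requestHandler>

Top terms for a field can then be fetched with, e.g.,
/terms?terms.fl=Tags&terms.limit=10.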
Thanks Jack, I'll see if I can find anything on Jira about this and if not
I'll create a ticket for it.
On Wed, Nov 27, 2013 at 12:28 PM, Jack Krupansky wrote:
> To be honest, this kind of question comes up so often, that it probably is
> worth a Jira to have a more customized or parameterized "
I didn't see anything so I created this
https://issues.apache.org/jira/browse/SOLR-5511
On Wed, Nov 27, 2013 at 2:35 PM, Jamie Johnson wrote:
> Thanks Jack, I'll see if I can find anything on Jira about this and if not
> I'll create a ticket for it.
>
>
> On Wed, Nov 27, 2013 at 12:28 PM, Jack
Hi Erick,
On our architecture we use Apache ManifoldCF to invoke the scheduling
from Manifold-web, and we use the Manifold-agent to take the PDF files
from the filesystem to the Solr instances. Is it possible to redirect the
Manifold scheduling to the SolrJ instance for specific schedules?
Tha
jca recently pointed out on the #solr IRC channel that normal (i.e.
non-committer) confluence-users are not able to post comments on any
pages of the Solr Ref Guide.
This is evidently due to a change made by Infra that was mentioned in an
email to all PMC members on Oct 1 -- but the ramificati
Hi,
I'd like to check - there is something I don't understand about caches - and
I don't know if it is a bug or a feature.
The following calls return a cache:
FieldCache.DEFAULT.getTerms(reader, idField);
FieldCache.DEFAULT.getInts(reader, idField, false);
the resulting arrays *will* contain entries
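Judging from the replies below, the complaint is that these arrays include
entries for deleted documents. A hedged Java sketch of skipping them on the
consumer side (Lucene 4.x API; reader is assumed to be an AtomicReader):

  import org.apache.lucene.index.AtomicReader;
  import org.apache.lucene.search.FieldCache;
  import org.apache.lucene.util.Bits;

  FieldCache.Ints ids = FieldCache.DEFAULT.getInts(reader, "id", false);
  Bits liveDocs = reader.getLiveDocs();   // null means the reader has no deletions
  for (int docId = 0; docId < reader.maxDoc(); docId++) {
    if (liveDocs != null && !liveDocs.get(docId)) {
      continue;                           // skip deleted documents
    }
    int value = ids.get(docId);
    // ... use value ...
  }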
Hi,
I would like to post a comment about the problem below on the Solr Confluence
documentation, but comments are disabled right now for confluence-users (at
least at the time I'm writing this - it was confirmed on IRC a minute ago).
The page I would like to comment on is: https://cwiki.apache.org/conf
FYI: https://issues.apache.org/jira/browse/INFRA-7058
: Changing this back for just the ref guide wiki space would be fairly easy --
: but i don't want to do that until i have a chance to talk to Infra about it.
-Hoss
Thanks Tim,
That seems to be exactly what I'm looking for!
-Dave
> On Nov 27, 2013, at 2:34 PM, Timothy Potter wrote:
>
> Hi Dave,
>
> Have you looked at the TermsComponent?
> http://wiki.apache.org/solr/TermsComponent It is easy to wire into an
> existing request handler and allows you to re
Jamie:
Before jumping into using debug, do take a bit to test
the performance! I've seen the debug component take
up to 80% of the query time. Admittedly, that was, I
think, 3.6 or something so it may be much different now.
But I should have asked first: "Why do you care?" What
is your use case?
Yep, it's expected. Segments are write-once. It's been
a long-standing design that deleted data is
reclaimed on segment merge, but not before. It's
pretty expensive to change the terms loaded on the
fly to respect a deleted document's removed data.
Best,
Erick
On Wed, Nov 27, 2013 at 4:07 PM,
Are you using old-style solr.xml files with a <cores> tag
and maybe <core> tags as well? If so, see:
https://issues.apache.org/jira/browse/SOLR-5510
Short form: you may have better luck if you're using
old-style solr.xml files by adding:
genericCoreNodeNames="${genericCoreNodeNames:true}"
to your <cores> tag, something li
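Presumably something like this hedged sketch of an old-style solr.xml (the
other attributes and the core entry are placeholders):

  <solr persistent="true">
    <cores adminPath="/admin/cores"
           genericCoreNodeNames="${genericCoreNodeNames:true}">
      <core name="mycore" instanceDir="mycore"/>
    </cores>
  </solr>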
I understand that changes would be expensive, but shouldn't the cache
simply skip the deleted docs, in the same way as the cache for multivalued
fields (which accepts liveDocs bits)?
Thanks,
roman
On Wed, Nov 27, 2013 at 6:26 PM, Erick Erickson wrote:
> Yep, it's expected. Segments are write-o
FYI: comments should now be working for all registered users.
If comment spam becomes a problem too unwieldy to manage by deleting after
the fact, we'll have to consider going the same route as we do with MoinMoin:
an explicit whitelist of users.
: Date: Wed, 27 Nov 2013 14
Hello Lucene folks,
how can I view the Lucene merge process?
What do you really want to do/accomplish? I mean, for what purpose?
You can turn on the Lucene infoStream for logging of index writing.
See:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
Set infoStream to "true".
There are some examples in my e-book.
-- Jack Krupansky
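A hedged example of what that looks like in solrconfig.xml (per the page
referenced above):

  <indexConfig>
    <infoStream>true</infoStream>
  </indexConfig>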
I am running an ensemble.
Can I get examples of how to use the option? I think there are not many
examples of the exact usage available.
Regards
Puneet
On 27 Nov 2013 23:23, "Yago Riveiro" wrote:
> How are you launching Solr?
>
> Do you have an ensemble or you're running zookeeper embedded?
>
>
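A hedged sketch of the flag (the value is in bytes and is illustrative; the
same value must be set on every ZooKeeper server and every Solr node):

  # ZooKeeper: add to the JVM flags picked up by zkServer.sh,
  # e.g. in conf/java.env
  SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"

  # Solr: add the same system property to the startup command
  java -Djute.maxbuffer=4194304 -jar start.jar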
Hi Julien,
Please see : http://search-lucene.com/m/MTRUH1cyNGZ1 and
https://issues.apache.org/jira/browse/INFRA-7058
On Wednesday, November 27, 2013 11:19 PM, Julien Canquelain
wrote:
Hi,
I would like to post a comment about the problem below on Solr Confluence
documentation, but com
Hello all,
Please add my username (iorixxx) to the Contributors Group. With this, will I
be able to edit Confluence too?
Hi,
Please add my username ( shinichiro ) to Contributors Group.
Thanks in advance,
Shinichiro Abe
Thank you for your replies.
I am using the new-style discovery.
It worked after adding this setting:
${genericCoreNodeNames:true}