On 10/3/2013 11:29 PM, Sadler, Anthony wrote:
> Time:
> -
> On some servers we're dealing with something in the region of a million or
> more files. Indexing that many files takes upwards of 48 hours. While
> the script is now fairly stable and fault tolerant, that is still a pretty
No direct help but a bunch of related random thoughts:
1) How are you running Tika? As a jar loading from scratch every time? Tika
can also run in a server mode where it listens to a network socket. You
send the file, it sends the extract back. Might be faster.
2) Deleting old stuff. You can inde
Thanks for the tips. When I get time, I will look into it and try to use
Solr via the embedded Jetty.
Regards,
Roland.
On Thu, Oct 3, 2013 at 3:26 PM, Shawn Heisey wrote:
> On 8/14/2013 5:16 AM, Roland Everaert wrote:
> > For the past months I have deployed and used Solr 4.3.
I suggest you look here:
http://www.javadocexamples.com/java_source/org/apache/lucene/wikipedia/analysis/WikipediaTokenizerTest.java.html
2013/10/4 Ken Krugler
> Hi all,
>
> Where's the documentation on the WikipediaTokenizer?
>
> Specifically I'm wondering how pieces from the source XML
Did you check this in the logs:
*Caused by: java.io.FileNotFoundException:
/opt/solr/myCore/data/index/_2he9.si (No such file or directory)*
2013/10/4 tamanjit.bin...@yahoo.co.in
> Hi,
> We migrated to Solr 4.3 from 3.5 yesterday. We use multicore Master Slave
> architecture and use external scrip
There is an old question like that:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201101.mbox/%3CAANLkTi=itRz7ni6HV-m=GTThzLb9G8XkWi92jBn=p...@mail.gmail.com%3E
You can also check this page for general information:
http://websolr.com/guides/solr-clients
2013/10/4 Neeraj Pandey
> Hi a
Hi!
Imagine a collection, collection1, with 3 shards, replicationFactor=2 and
maxShardsPerNode=2 hosted on three machines.
Then add a new collection, collection2 configured the same.
Cool, so now we have three machines each with 4 cores: a shard leader and a
replica for each of the two collections.
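The core count above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes cores are spread evenly across the machines.

```python
def cores_per_machine(collections, shards, replication_factor, machines):
    """Return how many cores each machine hosts, assuming an even spread."""
    total_cores = collections * shards * replication_factor
    if total_cores % machines != 0:
        raise ValueError("cores do not divide evenly across machines")
    return total_cores // machines


# Two collections, 3 shards each, replicationFactor=2, on three machines.
print(cores_per_machine(collections=2, shards=3, replication_factor=2, machines=3))  # prints 4
```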
Hi
Does the size of the ID field matter in terms of memory usage and query
performance?
I.e., will Solr use more memory if you use a URL string as the ID field
instead of an int value?
./zahoor
Hi,
I have been asked the same question. There are only DELETEALIAS and CREATEALIAS
actions available, so is there a way to achieve an uninterrupted switch of an
alias from one index to another? Are we lacking a MOVEALIAS command?
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominv
Using arbitrary strings affects at least the traffic between the
shard(s) and a querying client or shards and a frontend solr instance. We
have actually hit such an issue, described here:
https://issues.apache.org/jira/browse/SOLR-4903, which has triggered the
suggestion for ids compaction:
http
Atomic Updates were introduced in recent Solr versions:
http://wiki.apache.org/solr/Atomic_Updates
I'm wondering whether updating a field that is
stored="true" indexed="true"
would be any different from updating a field that is
stored="true" indexed="false"
Would Solr try to reindex the doc only if the fi
I did see that. The file it's looking for no longer exists after the slaves
have been updated. I suspect there was a segment info file of that name before
the sync happened, and once the file was removed the searcher still looked for
it. Strange behavior; I don't know why this should happen.
--
View t
Hello,
I am using Solr 4.0 and want to send a list of objects to Solr as a request
parameter. Is that possible? Please let me know.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Can-I-pass-some-Object-as-request-parameter-to-solr-server-tp4093463.html
Sent from the Solr -
On Fri, Oct 4, 2013, at 11:19 AM, maephisto wrote:
> Atomic Updates were introduced in recent Solr versions:
> http://wiki.apache.org/solr/Atomic_Updates
>
> I'm wondering whether updating a field that is
> stored="true" indexed="true"
> would be any different from updating a field that is
> stored="true"
I've used this feature to great effect. I have logs coming in, and I
create a core for each day. At the end of each day, I create a new core
for tomorrow, unload any cores over 2 months old, then create a set of
aliases ("all", "month", "week", "today") pointing to just the cores
that are needed fo
The two techniques for sending lists of values/objects to Solr are a single
parameter whose value is comma- or space-delimited, or multiple instances of
the parameter name, one per value/object.
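As a rough sketch of those two techniques, the following builds both kinds of query string using only the Python standard library. The helper names and the sample values are hypothetical, and whether a given Solr parameter accepts a delimited list or repeated instances depends on that parameter.

```python
from urllib.parse import urlencode


def list_as_single_param(name, values, sep=","):
    """Encode all values into one delimited parameter value."""
    return urlencode({name: sep.join(values)})


def list_as_repeated_param(name, values):
    """Encode one instance of the parameter name per value."""
    return urlencode([(name, value) for value in values])


print(list_as_single_param("ids", ["a1", "b2", "c3"]))
# ids=a1%2Cb2%2Cc3
print(list_as_repeated_param("fq", ["type:book", "lang:en"]))
# fq=type%3Abook&fq=lang%3Aen
```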
Solr has no object-encoding per se, and has no parameters that are
generalized objects, but you can use any o
Hi guys,
I was wondering how to activate DocValues for faceting.
Tell me if I am correct:
1) In schema.xml, set the docValues attribute to true for the field of
interest.
2) Use one of these two faceting strategies: fc (Field Cache) or fcs (Per
segment Field Cache)
3) T
I've implemented a SearchRequest class in my application. It has some
custom fields that are filled automatically via web services from a JSON
object (via Jackson). Inside a core class I retrieve the proper attributes
from that class and send a query to the Solr server.
If you have an API class to reach your S
It all depends. I mean, if you have 20 million URLs averaging 40 characters
each, that's 800 MB, not a big deal at all, but if you have 20 billion URLs
that would take up 800 GB, which might be a big deal. But if you shard those
20 billion documents into 10 shards, 80 GB may or may not be a big dea
Hi
I want to display results as one dataset through Solr using multicore. One
core contains EnglishCollectionData and another contains HindiCollectionData.
When I join the two cores, the result is displayed when I give an English
parameter, but it does not work for a Hindi parameter. Could someone give me the so
Thank you!
My question is actually: what's the difference between updating an indexed
field and updating a non-indexed field? Will updating an indexed field trigger
a "refresh" in the Solr indexes while in the other case it wouldn't?
--
View this message in context:
http://lucene.472066.n3.nabble.com/U
I'm trying to get rid of the following warnings:
[2013-10-03 17:37:56.981] WARNING Using deprecated class:
XmlUpdateRequestHandler -- replace with UpdateRequestHandler
[2013-10-03 17:37:56.983] WARNING Using deprecated class:
BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
How
On 10/4/2013 6:56 AM, maephisto wrote:
> My question is actually: what's the difference between updating an indexed
> field and updating a non-indexed field? Will updating an indexed field trigger
> a "refresh" in the Solr indexes while in the other case it wouldn't?
A change is a change. It doesn't matte
On 10/4/2013 7:16 AM, Erlend Garåsen wrote:
> [2013-10-03 17:37:56.981] WARNING Using deprecated class:
> XmlUpdateRequestHandler -- replace with UpdateRequestHandler
> [2013-10-03 17:37:56.983] WARNING Using deprecated class:
> BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
>
> H
Check out https://issues.apache.org/jira/browse/SOLR-5302, which can do this
using query facets.
On Fri, Jul 12, 2013 at 11:35 AM, Jack Krupansky wrote:
> sum(x, y, z) = x + y + z (sums those specific fields values for the
> current document)
>
> sum(x, y) = x + y (sum of those two specific field value
Check out https://issues.apache.org/jira/browse/SOLR-5302; it supports the
median value.
On Wed, Jul 3, 2013 at 12:11 PM, William Bell wrote:
> If you are a programmer, you can modify it and attach a patch in Jira...
>
>
>
>
> On Tue, Jun 4, 2013 at 4:25 AM, Marcin Rzewucki
> wrote:
>
> > Hi there,
Many thanks for your reply.
We're running Solr 4.4.0 in production, but SolrJ 3.6 is still used by
our CMS connector. I will add your reply as a comment in our own Jira
and make the necessary changes later when we are ready to upgrade SolrJ
for our connector.
Erlend
On 10/4/13 4:05 PM, Sha
Is there a way to use a function's return value in a range query?
For example: I have two price fields, pricea and priceb, and I want to get
the documents where the sum of pricea and priceb is in the range [0 TO 5].
Something like *select?q={!func}sum(pricea,priceb):[0 TO 5]*
I can't calculate thi
I think the best you can do is compute sum(pricea,priceb) at index time as a
third field, say priceSum, and then you can do a range query on that
priceSum field.
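A minimal sketch of that approach, assuming documents are prepared as plain dictionaries before being sent to Solr. The helper names are hypothetical; only the field names pricea, priceb and priceSum come from this thread.

```python
def with_price_sum(doc):
    """Return a copy of the document with the precomputed priceSum field."""
    enriched = dict(doc)
    enriched["priceSum"] = doc["pricea"] + doc["priceb"]
    return enriched


def price_sum_range_query(low, high):
    """Build an ordinary range query on the precomputed field."""
    return f"priceSum:[{low} TO {high}]"


doc = with_price_sum({"id": "1", "pricea": 2, "priceb": 1.5})
print(doc["priceSum"])              # 3.5
print(price_sum_range_query(0, 5))  # priceSum:[0 TO 5]
```

The sum has to be recomputed and the document reindexed whenever either price changes.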
It would be nice to be able to have a query that evaluates arbitrary
expressions combining field values, but there is no such featur
Thanks for the quick answer. I thought so :-)
Is there any plan to add such functionality in the future, or is it completely
against the concept?
Bests Sandro
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, October 4, 2013 16:41
To:
I have some info and examples for the WikipediaTokenizer in my book, but a
tokenizer does not direct tokens to a field. Rather, you would use the
tokenizer in the analyzer for whatever field you wish to store values in.
You could use the same input for multiple fields and then filter the tokens
No plan that I know of, but there is a new Lucene "expression module", so
maybe it is not so farfetched. Its performance might not be so great, but if
you need the flexibility it might be worth it.
-- Jack Krupansky
-Original Message-
From: Sandro Zbinden
Sent: Friday, October 04, 20
Thanks for the guidance, Shawn. I am in fact using Tomcat instead of
Jetty, but the logging was working OK at one point, so I'm not sure what
changed to make it not work. I'll have to investigate that. I'll check
out your other suggestions as well.
Brian
On 10/4/2013 1:43 AM, Shawn Heisey wrot
Hello,
and thank you for your answer Shawn.
I tried to simplify my problem, but I realize I chose a bad example: I
don't process phone numbers, and I do process unstructured documents.
My GATE application might return several annotations for the same group of
words (because I'm using an ontology
Hi,
When a distributed search is done, the initial query is forwarded to all
shards that are part of the specific collection that we are querying.
My question is: which machine does the aggregation of the results from the
shards?
Is it the machine that receives the initial request?
I nee
It turned out that I just needed to restart Tomcat. Once you mentioned the
logging, it sounded like something somewhere just got stuck, so
rebooting took care of it. I should have just done that in the first place.
Brian
On 10/4/2013 1:43 AM, Shawn Heisey wrote:
On 10/3/2013 8:03 PM, Brian Robin
I need to sort the returned documents by descending score and by the
descending value of an int field within the document. How do I ensure the
proper sort order as well as good performance?
I don't need the sort-order defined by sort=score desc,intField desc. The
sort order needs to be somewhat like wh
Hi,
I'm playing around with Solr 4.4. I just started it and tried to create new
cores. The first hurdle was that no solrconfig.xml and no schema.xml
were found in the classpath. So I copied them into example/resources and
configured them, because that directory is in the classpath.
Now I am able to c
On 10/4/2013 1:49 PM, helt wrote:
Now I am able to create cores, but new cores don't have their own config
file. Is this intended behavior? I haven't found much documentation for
these features.
I think I'm missing something basic here. Is it really normal that no
configuration files are being creat
Yes, the machine that gets the initial request is the one that distributes
to the shards and then aggregates the results.
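To illustrate the aggregation step only, here is a small sketch of how a coordinating machine could merge per-shard hit lists into one ranked list. It is not Solr's actual implementation; the function name and the (score, doc_id) tuple layout are assumptions made for the example, and each shard's list is assumed to arrive already sorted by descending score.

```python
import heapq
from itertools import islice


def aggregate(shard_results, rows):
    """Merge per-shard hit lists (sorted by descending score) into the top `rows` hits."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


shard1 = [(0.9, "a"), (0.4, "b")]
shard2 = [(0.7, "c"), (0.1, "d")]
print(aggregate([shard1, shard2], rows=3))
# [(0.9, 'a'), (0.7, 'c'), (0.4, 'b')]
```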
On Fri, Oct 4, 2013 at 9:55 AM, yriveiro wrote:
> Hi,
>
> When a distributed search is done, the initial query is forwarded to all
> shards that are part of the specific coll