Hi, I'm using Solr 4's Data Import Handler to index an Oracle 10g XE database. I'm
using full imports as well as delta imports. I want these processes to be
automatic (e.g. the imports can run on a schedule, or be executed as
soon as any data in the database is modified). I searched for the same online
Thanks Erick.
It seems the approach suggested by you is the one which I was looking for;
thanks a lot for the reply.
--
View this message in context:
http://lucene.472066.n3.nabble.com/how-to-get-unique-latest-results-from-solr-tp4080034p4080228.html
Sent from the Solr - User mailing list archive
You can certainly just include the attachment count in the
response and have the app apply the secondary sort. But
that doesn't separate the "noise" as you say.
How would you identify "noise"? If you don't have an algorithmic
way to do that, I don't know how you'd manage to separate
the signal
Not unless you are using "atomic updates", which require that you
store all fields.
Personally, it sounds like you're using Data Import Handler. You may
want to consider using SolrJ and caching some values to make it work.
Second approach: Use some of the cached entity capabilities in DIH
to make
I don't know the code in detail, but I suspect your answer is that
in general any incoming query has to be satisfied before a searcher
is closed. Any queries that are in process will hold open a reference
to the "snapshot" of the index at the time they started (i.e. the
segments current at that time).
You might be able to get close with grouping (by employee) and
sorting within groups by update time.
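As a sketch (the field names are assumptions), the grouping parameters might look like:

```
q=*:*&group=true&group.field=employee_id&group.sort=update_time desc&group.limit=1
```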
Best
Erick
On Wed, Jul 24, 2013 at 10:13 AM, Jack Krupansky
wrote:
> In that case, the answer is that no, Solr does not have such a feature.
>
> You could simulate it by doing a separate query (u
: Doable at Lucene level by any chance?
Given how well the Trie fields compress (ByteField and ShortField have
been deprecated in favor of TrieIntField for this reason) it probably
just makes sense to treat it as a numeric at the Lucene level.
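A hedged schema.xml sketch of that approach (the field name and label-to-int mapping are assumptions; the mapping itself would live in indexing code):

```xml
<fieldType name="tint" class="solr.TrieIntField" precisionStep="0"
           positionIncrementGap="0"/>
<!-- e.g. Critical=4, High=3, Medium=2, Low=1, assigned at index time -->
<field name="severity" type="tint" indexed="true" stored="true"/>
```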
: > If there's positive feedback, I'll open an issue.
: Subject: Processing a lot of results in Solr
: Message-ID:
: In-Reply-To: <1374612243070-4079869.p...@n3.nabble.com>
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message
: I get what looks like the admin page, but it says that there are solr core
: initialization failures, and the links on the page just bring me back to the
: same page.
if you get an error on the admin UI, there should be specifics about
*what* the initialization failure is -- at least one sentence
Hi,
I am new to Solr and am trying to set up a Solr cloud.
I have created a 3-server Solr cloud and 1 ZooKeeper instance, and I am facing the
following problems with my setup.
1) When I create a new core using the collections API, the cores are
created, but all are in the down state. How can I make them active?
With 6 ZooKeeper instances you need at least 4 instances running at the same
time. How can you decide to stop 4 instances and have only 2 instances running?
ZooKeeper can't work anymore in these conditions.
Dominique
On 25 Jul 2013, at 00:16, "Joshi, Shital" wrote:
> We have SolrCloud cl
Solr 4.4 is already released!!!
http://lucene.apache.org/solr/
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-4-3-0-SolrCloud-lost-all-documents-when-leaders-got-rebuilt-tp4080185p4080188.html
Sent from the Solr - User mailing list archive at Nabble.com.
That makes sense about all bets being off. I wanted to make sure that
people whose systems are behaving sensibly weren't going to have problems.
I think I need to tame the base amount of memory the field cache takes. We
currently do boosting on several fields during most queries. We boost by at
le
We have SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes
(cloud), where 5 machines (leaders) are in datacenter1 and replicas on
datacenter2. We have 6 zookeeper instances - 4 on datacenter1 and 2 on
datacenter2. The zookeeper instances are on same hosts as Solr nodes. We'
I have a solr query which has a bunch of boost params for relevancy. This
search works fine and returns the most relevant documents as per the user
query. For example, if user searches for: "iphone 5", keywords like
"apple", "wifi" etc are boosted. I get these keywords from external
training. The t
Well, it seems to work. I wonder what the best way to test this would be?
How can I remove a node from a cluster but still have it be up and running?
Jim
On Wed, Jul 24, 2013 at 12:10 PM, Jim Musil wrote:
> Wow! Awesome. Give me a bit to try to plug this into my environment.
>
> The other way I
I'd say you can work on this step and not bother with the next steps until you
get this one going. The main question would be where did your Apache+Solr
setup come from? Was it working with earlier versions of Solr and you
upgraded? Was it some automatic install? Something else?
It was installed by
FWIW,
I did a prototype with the following differences:
- it streams straight to the socket output stream
- it streams on the go during collecting, without the necessity to store a
bitset.
It might have some limited, extreme uses. Is there anyone interested?
On Wed, Jul 24, 2013 at 7:19 PM, Roman C
On Wed, Jul 24, 2013 at 3:00 PM, Brian Robinson
wrote:
> I get what looks like the admin page, but it says that there are solr core
> initialization failures, and the links on the page just bring me back to
> the same page.
>
I'd say you can work on this step and not bother with next steps until
Hello,
I'm trying to get Solr 4.4 up and running. I only want a single core
(for now). I'm running this with Tomcat on Apache. I have a couple of
different issues, which may be resolved by including a properly written
solr.xml file.
First, when I navigate to
http://{myhost}:8080/solr/
I get
On 7/24/2013 11:16 AM, Shawn Heisey wrote:
> Just an FYI - 15 seconds is a VERY short time to do an autocommit with
> openSearcher set to false. If you are doing this with openSearcher set
> to true, then it would be better for you to do this with autoSoftCommit
> and do the autoCommit on a longer
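A hedged sketch of that split in solrconfig.xml (the intervals shown are assumptions, not recommendations):

```xml
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit: flush + truncate tlog -->
  <openSearcher>false</openSearcher> <!-- do not open a new searcher here -->
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>           <!-- soft commit: controls visibility -->
</autoSoftCommit>
```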
On 7/24/2013 10:33 AM, Neil Prosser wrote:
> The log for server09 starts with it throwing OutOfMemoryErrors. At this
> point I externally have it listed as recovering. Unfortunately I haven't
> got the GC logs for either box in that time period.
There's a lot of messages in this thread, so I apolo
This paper contains an excellent algorithm for plagiarism detection, but
beware: the published version had a mistake in the algorithm -- look for
corrections. I can't find them now, but I know they have been published
(perhaps by one of the co-authors). You could do it with Solr, to create an
index
On 7/24/2013 9:38 AM, jimtronic wrote:
> I've encountered an OOM that seems to come after the server has been up for a
> few weeks.
>
> While I would love for someone to just tell me "you did X wrong", I'm more
> interested in trying to debug this. So, given the error below, where would I
> look
Note that this can lead to performance issues. Queries with lots of
hits require lots of scoring and this will make queries slower.
We had this case with a client about 2 weeks ago. We were able to
spot it in the change in the average number of hits
before/after query changes (meant to h
Hi Manasi,
Have a look at http://sematext.com/products/dym-researcher/index.html
- it sounds like exactly what you are after.
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 23, 2013 at 1:29 AM, smanad wrote:
> Hey,
>
On 7/23/2013 4:56 PM, SolrLover wrote:
> For ex: If I have the hard autocommit set to 10 minutes and a softcommit
> every second, new documents will show up every second but in case of JVM
> crash or power goes out I will lose all the documents after the last hard
> commit.
It's my understanding t
On 7/24/2013 3:33 AM, Furkan KAMACI wrote:
> I am indexing and checking the admin stats page. I see that:
>
> commits:471
>
> autocommit maxTime:15000ms
>
> autocommits:414
>
> soft autocommits:0
>
> optimizes:12
>
> docsPending:
Wow! Awesome. Give me a bit to try to plug this into my environment.
The other way I was going to attempt this was to use the health check file
option for the ping request handler. I would have to write a separate
process in python or something that would ping zookeeper for active nodes
and if the
On 7/24/2013 3:32 AM, archit2112 wrote:
> However, this is not working and I'm getting the following error -
> Unable to execute query: SELECT * FROM PRODUCT WHERE PID= Processing
> Document # 1
> Caused by: java.sql.SQLException: ORA-00936: missing expression
Here's your first entity:
It seems
: Subject: Can the admins of this list please boot wired...@yahoo.com
:
: Apologize if this is not the correct way to request mailing list admin
: support but it's pretty clear that wired...@yahoo.com is spamming this
: list and should be booted out.
Please see the existing thread on this matter.
Hi Jim,
Based on our discussion, I cooked up this solution for my book Solr in
Action and would appreciate you looking it over to see if it meets
your needs. The basic idea is to extend Solr's built-in
PingRequestHandler to verify a replica is connected to Zookeeper and
is in the "active" state. T
One thing I'm seeing in your logs is the leaderVoteWait safety
mechanism that I mentioned previously:
>>>
2013-07-24 07:06:19,856 INFO o.a.s.c.ShardLeaderElectionContext -
Waiting until we see more replicas up: total=2 found=1 timeoutin=45792
<<<
From Mark M: This is a safety mechanism - you ca
Sorry, good point...
https://gist.github.com/neilprosser/d75a13d9e4b7caba51ab
I've included the log files for two servers hosting the same shard for the
same time period. The logging settings exclude anything below WARN
for org.apache.zookeeper, org.apache.solr.core.SolrCore
and org.apache.solr.u
_One_ idea would be to configure your Java to dump core on the OOM error -
you can then load the dump into some analyzers, e.g. Eclipse, and that may
give you the desired answers (I unfortunately don't remember off the top of
my head how to activate the dump, but Google will give you the answer)
r
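For HotSpot JVMs the flags below do that; the setenv.sh location and the dump path are assumptions for a Tomcat setup:

```shell
# e.g. in Tomcat's bin/setenv.sh (path is an assumption)
CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/tmp/solr-oom.hprof"
```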
bq: also want to make the commits more reliable.
How are they "unreliable"? It sounds like you're saying they're not
doing what you expect.
If you're talking about predicting when the documents will be
searchable, then Mikhail is spot on.
There's also "real time get" which fetches the most recen
Hello,
As mentioned a week ago, I have setup a contest to collect Solr Usability
ideas (http://search-lucene.com/m/QMVb129wpXc/). It is fully explained in a
blog post:
http://blog.outerthoughts.com/2013/07/announcing-solr-usability-contest/
I am hoping that this will be announced at several commu
Solr supports pure negative queries, but only at the top level. Pure
negative sub-queries are not supported. To work around this limitation your
need to add "*:*" to the sub-query:
(offTime:[2013-07-24T14:35:46.319Z TO *]) OR (*:* NOT offTime:[* TO *])
-- Jack Krupansky
-Original Message
Are you using soft commits heavily? I heard that using soft commits heavily
(every second) and not using a hard commit for a long time causes out-of-memory
issues, since Solr uses a hashmap for the transaction log.
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-debug-an-OutOfMemo
Thanks for your response.
We are planning to move to Solr 4.3.1 from 3.5.x. Currently we just use hard
commits every 30 minutes (as we are using 3.x), but we want to do soft commits
in the new version of Solr and we also want to make the commits more reliable.
--
View this message in context:
http
I've encountered an OOM that seems to come after the server has been up for a
few weeks.
While I would love for someone to just tell me "you did X wrong", I'm more
interested in trying to debug this. So, given the error below, where would I
look next? The only odd thing that sticks out to me is t
Soft commits and NRT don't exist in Solr 3.x, so
I'm really confused. In 3.6 you have to do hard commits
and they're expensive; 1/sec is far too often...
Erick
On Wed, Jul 24, 2013 at 5:10 AM, Mikhail Khludnev
wrote:
> Hello,
>
> First of all, I don't think it can commit (even soft) every second,
On Tue, Jul 23, 2013 at 10:05 PM, Matt Lieber wrote:
> That sounds like a satisfactory solution for the time being -
> I am assuming you dump the data from Solr in a csv format?
>
JSON
> How did you implement the streaming processor ? (what tool did you use for
> this? Not familiar with that)
Ali:
Thanks for bringing closure to this...
On Tue, Jul 23, 2013 at 5:00 PM, Ali, Saqib wrote:
> Thanks Alan and Shawn. Just installed Solr 4.4, and no longer experiencing
> the issue.
>
> Thanks! :)
>
>
> On Tue, Jul 23, 2013 at 7:21 AM, Shawn Heisey wrote:
>
>> On 7/23/2013 7:50 AM, Alan Wood
You're mixing a couple of things here.
1> With SolrCloud, you don't need to specify shards. That's only
really for non-SolrCloud mode.
2> You can add &distrib=false to your query to only return the results
from the node you direct the query to.
e.g.
http://localhost:7574/solr/collection1/select?
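Spelled out (host, port and query are assumptions), such a request might look like:

```
http://localhost:7574/solr/collection1/select?q=*:*&distrib=false
```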
Your application has to handle it; this is kind of a
"federated search" thing. So a sorted map is a fine way
to go, or you could just have a sorted set with an
overridden compare function.
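A minimal sketch of that suggestion, assuming nothing about the real schema (the Doc class and field names are illustrative stand-ins for documents returned by each Solr instance): a TreeSet with an overridden compare function merges two hit lists into one ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: merge hit lists from two Solr instances into one
// ranking using a SortedSet with an overridden compare function.
public class MergeResults {

    // Minimal stand-in for one document returned by a shard.
    static class Doc {
        final String id;
        final float score;
        Doc(String id, float score) { this.id = id; this.score = score; }
    }

    // Higher score first; tie-break on id so equal-scored documents
    // are not silently collapsed by the set.
    static final Comparator<Doc> BY_SCORE_DESC = new Comparator<Doc>() {
        public int compare(Doc a, Doc b) {
            int c = Float.compare(b.score, a.score); // descending score
            return c != 0 ? c : a.id.compareTo(b.id);
        }
    };

    static List<String> merge(List<Doc> a, List<Doc> b) {
        SortedSet<Doc> merged = new TreeSet<Doc>(BY_SCORE_DESC);
        merged.addAll(a);
        merged.addAll(b);
        List<String> ids = new ArrayList<String>();
        for (Doc d : merged) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> master = Arrays.asList(new Doc("a", 1.2f), new Doc("b", 0.4f));
        List<Doc> other  = Arrays.asList(new Doc("c", 0.9f));
        System.out.println(merge(master, other)); // [a, c, b]
    }
}
```

The tie-break on id is the one design point worth noting: without it, a TreeSet would treat two documents with equal scores as duplicates and drop one.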
On Wed, Jul 24, 2013 at 2:05 AM, Vineet Mishra wrote:
> Hi
>
> I have a Master Solr through which I am query
Mikhail,
It is a slightly hacked JSONWriter - actually, while poking around, I have
discovered that dumping big hitsets would be possible - the main hurdle
right now is that the writer is expecting to receive documents with fields
loaded, but if it received something that loads docs lazily, you could
Dear Solr-list,
We are experiencing an unexpected behaviour of Solr using date range
queries combined with OR. The use case is (reduced to this simple
example): We want to get all content which does not have the offTime
field defined or, if the offTime field is defined, it should be in the
future
Hello folks,
I have a query that is very, very slow because of too many sub-queries, and I
want to know if there is another way to populate some fields after executing
the query.
Example: first I execute my 'main' query and when it completes, I get every
'id' field from my documents and execute a secondary quer
Apologize if this is not the correct way to request mailing list admin
support but it's pretty clear that wired...@yahoo.com is spamming this
list and should be booted out.
Tim
To eliminate the possibility of errors, you need to buffer the query as
indicated in the wiki. If you don't and you use a super-small maxDistErr
as you tell me you are doing, then you are merely making the probability
of hitting an error small (perhaps even very very small), but not
nonexistent.
Log messages?
On Wed, Jul 24, 2013 at 1:37 AM, Neil Prosser wrote:
> Great. Thanks for your suggestions. I'll go through them and see what I can
> come up with to try and tame my GC pauses. I'll also make sure I upgrade to
> 4.4 before I start. Then at least I know I've got all the latest changes
+1 for this. Some use cases:
* JIRA issue sorting
* Log level sorting
* User role sorting
...
Doable at Lucene level by any chance?
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Wed, Jul 24, 2013 at 12:15 PM, Elran Dvir wrot
Yes, per-field facet method is supported.
-- Jack Krupansky
-Original Message-
From: GaneshSe
Sent: Wednesday, July 24, 2013 10:20 AM
To: solr-user@lucene.apache.org
Subject: facet.method value per field
We are trying to use facet across multiple fields, we would like to know how
to c
Quick behavior check on whether Solr continues to process queries and
index documents during a collection reload?
For example, after I upload new config documents to Zookeeper, I issue
a reload command using the collections API. Of course this propagates
a core reload across all nodes in the colle
The debugQuery=true parameter will give you an "explain" section that
details what terms matched in each document. There is an XML version as well
("debug.explain.structured").
Unfortunately, these are the analyzed (stemmed, lower case, synonyms
expanded) terms. Pick your poison!
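For example (the query term is an assumption), both debug options can be combined in one request:

```
/select?q=surgery&debugQuery=true&debug.explain.structured=true
```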
-- Jack Kru
We are trying to use facet across multiple fields, we would like to know how
to control the facet.method based on the field that we are using. There are
some fields for us which makes sense to use facet.method=enum and for some
other fields facet.method=fc makes more sense. This depends on the type
Details here:
http://wiki.apache.org/solr/RealTimeGet
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Wednesday, July 24, 2013 5:07 AM
To: solr-user@lucene.apache.org
Subject: Usage Of Real Time Get Handler Of Solr
Hi;
There is a real time get handler at Solr:
In that case, the answer is that no, Solr does not have such a feature.
You could simulate it by doing a separate query (using the method I
suggested) for each of the 10 employees, one at a time.
-- Jack Krupansky
-Original Message-
From: Alok Bhandari
Sent: Wednesday, July 24, 2013
Thanks Jack.
It may be the case that I was unable to explain the query correctly.
Actually I don't want it for a single employee; I want it for all the
employees that are updated in that time range. So if, let's say, 10 employees'
data is updated in the given time range, and that also multiple times, the
Do your time range query, sort by the time field as "descending", and take
the first result.
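As a sketch (field names and the range are assumptions), the request parameters would be:

```
q=employee_id:123 AND update_time:[NOW-7DAYS TO NOW]&sort=update_time desc&rows=1
```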
-- Jack Krupansky
-Original Message-
From: Alok Bhandari
Sent: Wednesday, July 24, 2013 9:08 AM
To: solr-user@lucene.apache.org
Subject: how to get unique latest results from solr
Hello All,
I tried reducing the maxDistErr to "0.01", just to test making it smaller.
I got maxLevels down to 45, and slightly better query times (Indexing time
was about the same). However, my queries are not accurate anymore. I need
to pad by 2 or 3 whole numbers to get a hit now, which won't work in real
u
Hello All,
I am using Solr 4.0. I have data in my Solr index where, on each review of a
document, a new entry for the document is added in Solr. Each document also has
a field which holds employee_id, and each entry also holds the timestamp of
when that record is added.
Now I want to query this index
I'm using java-1.7.0-openjdk-1.7.0.3-2.1.el6.1.x86_64 and
tomcat6-6.0.24-48.el6_3.noarch.
I tested with the 4.4 solr version but I still have the bug.
Elodie
Thanks Michael, adding autoCommit sorted it.
cheers,
Alistair
--
mov eax,1
mov ebx,0
int 80h
On 23/07/2013 18:34, "Michael Della Bitta"
wrote:
>Hi Alistair,
>
>You probably need a commit, and not an optimize.
>
>Which version of Solr are you running against? The 4.0 releases have more
>co
Sounds like this problem has to be in your client environment. You should check
whether your client environment requires different authentication or whether
there is a firewall that is blocking your request.
On Wed, Jul 24, 2013 at 9:20 PM, bucci wrote:
> Hello,
> we've got following error when we are trying to
I can connect to Solr directly, and I can search for articles, so Solr querying
seems to work fine. The problem is when I try to add/edit documents, so it
looks like some issue in the add or commit operation. The added documents
are not large.
--
View this message in context:
http://lucene.47206
Hello,
we've got the following error when we are trying to index (add) a single
document to our Knowledge Base project. It works fine on our test environment,
but any attempt to add to the index in the client environment throws a read
timeout error.
When searching articles it's OK; the problem occurs only at inser
Hey Guys,
I was wondering if anyone has successfully been able to connect to SOLR
4.3.1 using DIGEST authentication with HttpSolrServer ?
*How I generated the password*
./digest.sh -a md5 admin:secure:password
admin:secure:password:e430caca84c337d4b820c44c1ebc943a
*I can successfully log in via
Hi All,
We have encountered a use case in our system where we have a few fields
(Severity, Risk, etc.) with a closed set of values, where the sort order for
these values is pre-determined but not lexicographic (Critical is higher than
High). Generically this is very close to how enums work.
To i
I am indexing and checking the admin stats page. I see that:
commits:471
autocommit maxTime:15000ms
autocommits:414
soft autocommits:0
optimizes:12
docsPending:388
adds:305
cumulative_adds:2154245
I'm new to Solr. I have successfully indexed an Oracle 10g XE database. I'm
trying to perform a delta import on the same.
The delta query requires a comparison of the last_modified column of the table
with ${dih.last_index_time}.
However, in my application I do not have such a column. Also, I cannot add
this c
Roman,
Can you disclose how that streaming writer works? What does it stream,
docList or docSet?
Thanks
On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla wrote:
> Hello Matt,
>
> You can consider writing a batch processing handler, which receives a query
> and instead of sending results back, it
Because it's a get and not a search handler. It takes the id parameter and
returns the "latest stored fields of" document with the specified ID.
-Original message-
> From:Furkan KAMACI
> Sent: Wednesday 24th July 2013 11:07
> To: solr-user@lucene.apache.org
> Subject: Usage Of Real Tim
Hello,
First of all, I don't think it can commit (even soft) every second, afaik
it's too frequent for typical deployment. Hence, if you really need such
(N)RT I suggest you experiment with it right now, to face the bummer
sooner.
Also, one second durability sounds like over-expectation for Solr,
Hi;
There is a real time get handler in Solr:
<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
    <str name="wt">json</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
I'm not bothered about the leader. I just want to check if a particular core
is up on a particular Solr instance.
My Use case is as follows
I have to create a core on one instance and then there is some DB code. If
after creating the core the DB action fails then the entire task is repeated
again.
I am looking for a feature in Solr that will give me all matched words in
the document when I search with a word.
My search FIELD uses Stemming and as well as Synonym filters.
For example I have documents and part of the text goes like below
1.We were very careful about my surgery
2.are still
Can I get more insight into why you'd want to query a specific replica?
Also, leader election is dynamic in SolrCloud, so unless you have a ZK
aware client (CloudSolrServer) you may not be hitting the leader when you
think you are.
Having said that, why would you be bothered with 'who's the leader'
Hi,
I had a requirement wherein I wanted to query a specific core on a specific
solr instance . I found the following content in solr wiki
*
Explicitly specify the addresses of shards you want to query:
http://localhost:7574/solr/collection1/select?shards=localhost:7574/solr/collection1/*
Now su
Great. Thanks for your suggestions. I'll go through them and see what I can
come up with to try and tame my GC pauses. I'll also make sure I upgrade to
4.4 before I start. Then at least I know I've got all the latest changes.
In the meantime, does anyone have any idea why I am able to get leaders