Yes, before indexing, we check whether that document is already in the
index or not.
Because along with the document, we also have metadata that needs to be
appended.
So we have a few multivalued metadata fields, which we update if the same
document is found again.
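To make the merge step concrete, here is a minimal in-memory sketch of the idea (the key and field names are made up, and the map stands in for the real index; an actual implementation would have to search Solr for the key first, which is the slow part discussed below):

```java
import java.util.*;

// Toy sketch: if a document with the same unique key (e.g. an md5 hash)
// was already added, append the new values to its multivalued metadata
// fields instead of adding a duplicate document.
class MetadataMerge {
    // toy "index" keyed by the document's unique key
    static final Map<String, Map<String, List<String>>> index = new HashMap<>();

    static void addOrMerge(String key, Map<String, List<String>> doc) {
        Map<String, List<String>> existing = index.get(key);
        if (existing == null) {
            index.put(key, doc);                 // first time seen: plain add
            return;
        }
        // seen before: append values to the multivalued metadata fields
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            existing.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .addAll(e.getValue());
        }
    }
}
```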
On Fri,
So you will need to do a search for each document before adding it to the
index, in case it is already there. That will be slow.
And where do you store the last-assigned number?
And there are plenty of other problems, like reloading after a corrupted index
(disk failure), or deleted documents w
Actually not.
If I am updating an existing document, I need to keep the old number
itself.
Maybe we can do it this way:
If we pass the number to the field, it will take that value; if we don't
pass it, it will auto-increment.
Because if we update, I will have the old number and I will pass it as a
Why?
When you reindex, is it OK if they all change?
If you reindex one document, is it OK if it gets a new sequential number?
wunder
On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote:
> We already have a unique key (We use md5 value).
> We need another id (sequential numbers).
>
> On Fri, Apr 6,
We already have a unique key (We use md5 value).
We need another id (sequential numbers).
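For reference, a sketch of how a unique key like the one described here can be derived — the md5 of the document content, rendered as a fixed-width hex string (the exact key derivation used in the original setup is an assumption):

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Derive a unique document key as the md5 of the content, in hex.
class Md5Key {
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            // pad to 32 hex characters so every key has the same width
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }
}
```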
On Fri, Apr 6, 2012 at 9:47 AM, Chris Hostetter wrote:
: We need to have a document id available for every document (Per core).
: We can pass docid as one of the parameters for fq, and it will return the
: docid in the search result.
So it sounds like you need a *unique* id, but nothing you described
requires that it be a counter.
Take a look at th
We need to have a document id available for every document (Per core).
There is a DocID in the Lucene index, but I did not find any API to expose
it using Solr.
Maybe we can alter Solr to optionally return the DocId (which is
unique),
We can pass docid as one of the parameters for fq, and it will retur
> I am considering writing a small tool that would read from
> one solr core
> and write to another as a means of quick re-indexing of
> data. I have a
> large-ish set (hundreds of thousands) of documents that I've
> already parsed
> with Tika and I keep changing bits and pieces in schema and
> co
Hi, Erick.
Thanks, first of all.
I watched the status of the JVM at runtime with the help of "jconsole" and "jmap".
1) When "Xmx" was not assigned, the "Old Gen" area was full; its size was
up to 1.5 GB and its major content was instances of "String",
when the whole size of the heap was up to the maxim
maxPosAsterisk - maximum position (1-based) of the asterisk wildcard ('*')
that triggers the reversal of the query term. Asterisks that occur at
positions higher than this value will not cause reversal of the query term.
Defaults to 2, meaning that asterisks at positions 1 and 2 will cause a reversal.
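The rule quoted above boils down to a simple position check. This sketch mirrors the documented behavior, not the actual ReversedWildcardFilterFactory source:

```java
// A wildcard term is reversed only when its first '*' sits at a
// 1-based position less than or equal to the configured limit.
class ReversalCheck {
    static boolean shouldReverse(String term, int maxPosAsterisk) {
        int pos = term.indexOf('*');   // 0-based index, -1 if no asterisk
        return pos >= 0 && pos + 1 <= maxPosAsterisk;
    }
}
```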
Apologies if this is a very straightforward schema design problem that
should be fairly obvious, but I'm not seeing a good way to do it.
Let's say I have an index that wants to model Albums and Tracks, and
they all have arbitrary tags attached to them (represented by
multivalue string type fields).
Hi,
I am using EmbeddedSolrServer for full indexing (Multi core)
and StreamingUpdateSolrServer for incremental indexing.
The steps involved are mentioned below.
Full indexing (Daily)
1) Start EmbeddedSolrServer
2) Delete all docs
3) Add all docs
4) Commit and optimize collection
5) Stop Embedde
First of all, what I was seeing was different from what I thought I was seeing
because a few weeks ago I uncommented the block in the
solrconfig.xml file and I didn't realize it until yesterday just before I went
home, so that was controlling the commits more than the add and commit calls
that
Because the first query result doesn't meet my requirement,
I have to do a secondary process manually based on the full results of the
first query.
Only after I finish the secondary process do I begin to show it to the end
user based on specific records (for instance, 10 records at a time, as Solr
does).
one
"What's memory"? Really, how are you measuring it?
If it's virtual, you don't need to worry about it. Is this
causing you a real problem or are you just nervous about
the difference?
Best
Erick
On Wed, Apr 4, 2012 at 11:23 PM, a sd wrote:
> Hi, all.
> I have written a program which sends data to
Of course putting more clauses in an OR query will
have a performance cost; there's more work to do.
OK, smart-alec answer aside, you will probably
be fine with a few hundred clauses. The question
is simply whether the performance hit is acceptable.
I'm afraid that question can't be answered in
Solr version? I suspect your outlier is due to merging
segments; if so, this should have happened quite some time
into the run. See Simon Willnauer's blog post on the
DocumentsWriterPerThread (trunk) code.
What commitWithin time are you using?
Best
Erick
On Wed, Apr 4, 2012 at 7:50 PM, Mike O'Leary wr
This is really difficult to imagine working well. Even if you
do choose the appropriate analysis chain (and it must
be a chain here), and manage to appropriately tokenize
for each language, what happens at query time?
How do you expect to get matches on, say, Ukrainian when
the tokens of the query
The default query operator is pretty much ignored with (e)dismax-style
parsers. You can get there by varying the "mm" parameter.
See:
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
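For example, a request like the following (the field and term values here are made up for illustration) requires at least two of the three query terms to match:

```
q=apache solr mailing&defType=edismax&qf=title text&mm=2
```

mm also accepts percentages and conditional expressions; see the wiki page above.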
Best
Erick
On Tue, Apr 3, 2012 at 10:58 PM, neosky wrote:
> 1.I did 5 gram t
Hi folks, I'm a little stumped here.
I have an existing solr 1.4 setup which is well configured. I want to
upgrade to the latest solr release, and after reading release notes, the
wiki, etc, I concluded the correct path would be to not change any
config items and just replace the solr.war file
I'm pretty sure what you are seeing here is a variation on the "stopwords"
confusion people tend to have about dismax (and edismax).
Just like the lucene qparser, "whitespace" in the query string is
significant, and is
: > Is it possible to define a field as "Counter Column" which can be
: > auto-incremented.
a feature like this does not exist in Solr at the moment, but it would be
possible to implement this fairly easily in an UpdateProcessor -- however
it would only be functional in very limited situations
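To illustrate what such an UpdateProcessor would do at its core, here is a sketch stripped of Solr's actual API: keep an incoming value if the field is present, otherwise assign the next counter value. The field name is made up, the counter lives only in memory, and none of the hard parts raised earlier in the thread (persisting the last-assigned number, surviving a corrupted index) are handled:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Assign-if-absent counter logic, independent of Solr's UpdateProcessor plumbing.
class CounterAssign {
    static final AtomicLong counter = new AtomicLong(0);

    static long assignIfAbsent(Map<String, Object> doc) {
        Object existing = doc.get("seq_id");
        if (existing != null) {
            return (Long) existing;              // update: keep the old number
        }
        long next = counter.incrementAndGet();   // new doc: auto-increment
        doc.put("seq_id", next);
        return next;
    }
}
```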
I worked through the Solr tutorial and everything worked like a charm;
I figured I would go ahead and install Jetty and try to install Solr
and get a functional prototype search engine up. Unfortunately, my
Jetty installation seems to be broken:
HTTP ERROR 500
Problem accessing /solr/admin/index.
Thanks for all the replies on this. It turns out that the reason I
wasn't getting the expected results is that I had not properly indexed
one of the fields. My content type display settings for that field were set
to hidden in Drupal. After I corrected this and re-indexed I started
getting
> It looks like somehow the query is getting converted from "library" to
> "librari". Any idea how that would happen?
Yeah, that happens from having stemming involved in your query time analysis
(look at your field type, you've surely got Snowball in there)
Also, you're using the dismax query pa
On Thu, Apr 5, 2012 at 12:19 AM, Jamie Johnson wrote:
> Not sure if this got lost in the shuffle, were there any thoughts on this?
Sorting by "id" could be pretty expensive (memory-wise), so I don't
think it should be default or anything.
We also need a way for a client to hit the same set of ser
Hi,
I am getting the logs below:
Apr 5, 2012 6:27:59 PM org.apache.commons.httpclient.HttpMethodDirector
executeWithRetry
INFO: I/O exception (org.apache.commons.httpclient.NoHttpResponseException)
caught when processing request: The server 192.168.6.135 failed to respond
Apr 5, 2012 6:27:59 PM
(12/04/05 15:34), Thomas Werthmüller wrote:
Hi
I configured Solr so that word parts are also found. When I search "Monday"
or "Mond", the right document is found. This is done with the following
configuration in schema.xml:
Now, when I add hl=true to the query string, the excerpt for "Monday"
Hi
I deployed a Solr cluster; the code version is
"NightlyBuilds apache-solr-4.0-2012-03-19_09-25-37".
The cluster has 4 nodes named "A", "B", "C", "D", with "num_shards=2"; A and
C are in shard1, B and D in shard2, and A and B are the leaders of their shards.
It has run for 2 days, added 20m docs, all of th
The problem is how to determine, for each document, the degree of
separation and then apply boosting. For example:
say there is a user A with friends X, Y, Z, and another user B with
friends L, M.
If there is a doc D1 in the index with author field Z, and another doc D2 in
the index with author L,
I
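One way to express that kind of weighting, reusing the example names above, is a boost query that weights first-degree authors higher than second-degree ones (the boost values here are arbitrary):

```
bq=author:(X OR Y OR Z)^2.0 author:(L OR M)^0.5
```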
Hi Manuel,
Why don't you create a program to parse the HTML files, maybe using XSLT,
and then submit the output to Solr?
---
Marcelo
On Thursday, April 5, 2012, Manuel Antonio Novoa Proenza <
mano...@estudiantes.uci.cu> wrote:
> Hello,
>
> I would like to know the method of extracting from the i
>
> Hi,
> Is it possible to define a field as "Counter Column" which can be
> auto-incremented.
>
> Thanks,
> Manish.
>