You could MD4 the parts you care about, store that, fetch it and compare.
If there is a reliable timestamp, you could use that. But that would be
app-dependent.
In general, you need to store some info about each source document
and figure out whether it is new. This gets much hairier with a web
spi
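A minimal sketch of the "hash the parts you care about, store that, fetch it and compare" idea above. SHA-256 from Python's hashlib stands in for the MD4 mentioned in the message, since MD4 is not reliably available there. The function names, the field list, and the `stored` dictionary (a stand-in for whatever persistent store you use) are all assumptions for illustration.

```python
import hashlib


def fingerprint(doc, fields):
    """Return a hex digest computed over the selected fields of a document."""
    h = hashlib.sha256()
    for name in sorted(fields):
        value = str(doc.get(name, ""))
        part = f"{name}={value}".encode("utf-8")
        # Length-prefix each part so adjacent values cannot run together.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


def needs_reindex(doc_id, doc, fields, stored):
    """Compare the new fingerprint with the stored one and update the store.

    `stored` is a hypothetical dict of doc_id -> fingerprint.
    """
    new = fingerprint(doc, fields)
    if stored.get(doc_id) == new:
        return False
    stored[doc_id] = new
    return True
```

Fields left out of `fields` (a fetch timestamp, for example) can change without triggering a reindex, which is the point of hashing only the parts you care about.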
: number of records. We collect our data from a number of sources and each
: source produces around 50,000 docs. Each of these documents has a "sourceId"
: field indicating the source of the document. Now assuming we're indexing all
: documents from SourceA (sourceId="SourceA"), majority of these d
Hi,
I am not sure if this can be done. Let's say that periodically there is a
big batch to be indexed and we don't want to replicate the data before
the batch is completely indexed. We would like to avoid post commit
hook as we will be periodically committing to reduce the memory usage
and we a
On 14-Sep-07, at 3:38 PM, Tom Hill wrote:
Hi Mike,
Thanks for clarifying what has been a bit of a black box to me.
A couple of questions, to increase my understanding, if you don't
mind.
If I am only using fields with multiValued="false", with a type of
"string"
or "integer" (untokenize
Hi Mike,
Thanks for clarifying what has been a bit of a black box to me.
A couple of questions, to increase my understanding, if you don't mind.
If I am only using fields with multiValued="false", with a type of "string"
or "integer" (untokenized), does solr automatically use approach 2? Or is
: When you say "outside of Solr" do you mean outside of solr.war? We finally
: got php/curl working with jetty's Basic Authentication. We had to unpack and
: repack solr.war to edit web.xml and it would have been nice to use some
: other method.
it should not be necessary to unpack the war ... yo
I apologize for missing that. I added an anchor at the top and a link
where the word "overrides" is in the wiki.
Thanks,
-Nathan
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, September 14, 2007 10:53 AM
To: solr-user@lucene.apache.org
Subject: Re: hl.sni
You could index into multiple fields with different analyzers
and search all of them.
text_en: uses English stemmer
text_de: uses German stemmer
text_exact: no stemming
text_strip: uses ISOLatin1AccentFilter
You can search all of these and put different boosts on them,
with higher boosts for
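A rough sketch of searching several of the fields listed above with different boosts, using the Lucene query syntax `field:term^boost`. The snippet is cut off before it says which fields get the higher boosts, so the boost values below are arbitrary placeholders, and `boosted_query` is a hypothetical helper. It assumes a single term with no special query characters.

```python
def boosted_query(term, boosts):
    """Build an OR query over several fields, one boosted clause per field.

    `boosts` maps a field name to its boost; the values are illustrative only.
    """
    clauses = [f"{field}:{term}^{boost}" for field, boost in boosts.items()]
    return " OR ".join(clauses)


# Placeholder boosts, not taken from the original message.
query = boosted_query("running", {"text_exact": 4, "text_en": 2, "text_de": 1})
print(query)
```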
You can try the public/private key certificate system. You deploy it to
jetty/tomcat somehow, and curl has options to send it.
We haven't tried this. The authentication happens at the http container
level, not in the solr config.
-Original Message-
From: Bill Au [mailto:[EMAIL PROTECTED]
S
Hi Tom,
thanks for your professional response -- works fine and looks good :-).
Since I am playing around with mixed texts (English and German), I do
not have any idea whether or not an EnglishPorter will be useful for
German texts. But I will find it out by playing around ;-)
Regards from G
I meant outside of the Solr code. You are right that it is still in the
Solr war file since you will need to put the authentication configuration
into web.xml.
Bill
On 9/14/07, jenix <[EMAIL PROTECTED]> wrote:
>
>
> When you say "outside of Solr" do you mean outside of solr.war? We finally
> got
Hi Marc,
The searches are going to look for an exact match of the query (after
analysis) in the index (after analysis).
So, realli will not match really.
So you want to have the same stemmer (probably not the English one, given
your examples) in both the index analyzer and the query analyzer. I'
Hi Tom,
thanks for your response -- and sorry for the newbie question, which may sound
somewhat silly ;-). Here is the quick result of the analysis UI:
Index for "really": 5* really. Query for "really": 5* really, 2* realli
(from: EnglishPorterFilterFactory {protected=protwords.txt},
RemoveDuplicates
Hi Marc,
Are you using the same stemmer on your queries that you use when indexing?
Try the analysis function in the admin UI, to see how things are stemmed for
indexing vs. querying. If they don't match for really and fünny, and do
match for kraßen, then that's your problem.
Tom
On 9/14/07, M
On 14-Sep-07, at 5:19 AM, Thompson,Roger wrote:
Hi there!
I am embarking on re-engineering an application using Solr/Lucene (If
you'd like to see the current manifestation go to:
fictionfinder.oclc.org). The database for this application
consists of
approximately 1.4 million records of varyin
On Sep 14, 2007, at 8:19 AM, Thompson,Roger wrote:
I am embarking on re-engineering an application using Solr/Lucene (If
you'd like to see the current manifestation go to:
fictionfinder.oclc.org). The database for this application
consists of
approximately 1.4 million records of varying size f
On Sep 14, 2007, at 12:33 PM, Nathaniel E. Powell wrote:
http://wiki.apache.org/solr/HighlightingParameters#head-23ecd5061bc2c86a561f85dc1303979fe614b956
where it talks about the hl.snippets parameter, it says that it can be
overridden on a per-field basis. I haven't been able to find any
In the wiki:
http://wiki.apache.org/solr/HighlightingParameters#head-23ecd5061bc2c86a561f85dc1303979fe614b956
where it talks about the hl.snippets parameter, it says that it can be
overridden on a per-field basis. I haven't been able to find any
information in the documentation or on the m
Hi,
oops, the URIEncoding was lost during the update to tomcat 6.0.14.
Thanks for the advice.
But now I am really curious. After indexing the document from scratch,
I have the effect that queries to "this" and "is" work fine, whereas
queries to "really" and "fünny" do not return the result. Fü
When you say "outside of Solr" do you mean outside of solr.war? We finally
got php/curl working with jetty's Basic Authentication. We had to unpack and
repack solr.war to edit web.xml and it would have been nice to use some
other method.
--
View this message in context:
http://www.nabble.com/Aut
Add/Update, Commit/Optimize, Delete, and Delete by query in Solr are done
using the url /update. So you should be able to protect that url at the
container level outside of Solr. If you want you can protect the query url
/select or the admin pages too. Container level authentication is
transparent
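A small sketch of what a client does once /update is protected with HTTP Basic authentication at the container level. It only builds the request and does not send it. The base URL, credentials, and the helper name `build_update_request` are assumptions for illustration; the container-side configuration (web.xml and the container's user realm) is not shown.

```python
import base64
import urllib.request


def build_update_request(base_url, user, password, xml_body):
    """Build a POST to <base_url>/update carrying a Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        url=base_url.rstrip("/") + "/update",
        data=xml_body.encode("utf-8"),
        method="POST",
    )
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "text/xml; charset=utf-8")
    return req


# Hypothetical host and credentials; pass the result to
# urllib.request.urlopen(req) to actually send it.
req = build_update_request("http://localhost:8983/solr", "admin", "secret", "<commit/>")
```

Because the check happens in the container, Solr itself sees an ordinary update request; only the client and the container configuration need to know about the credentials.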
Hi,
What methods are available for user authentication? I'm using Jetty and
php/curl and Basic HTTP Auth does not seem to work. I just need something
simple so that only the Admin can add, update or delete documents.
Regards,
Jennifer Seaman
--
View this message in context:
http://www.nabble.
Hi Erik,
>>So in your case #1, documents are reindexed with this scheme - so if you
>>truly need to skip a reindexing for some reason (why, though?) you'll
>>need to come up with some other mechanism. [perhaps update could be
>>enhanced to allow ignoring a duplicate id rather than reindexing?]
I
You might find the dynamic fields useful. From the schema.xml:
So you could have a document like:
Ed Summers
Library of Congress
without having to explicitly name these fields in the schema.xml. Does
that help at all?
//Ed
Hi there!
I am embarking on re-engineering an application using Solr/Lucene (If
you'd like to see the current manifestation go to:
fictionfinder.oclc.org). The database for this application consists of
approximately 1.4 million records of varying size for the "work" record,
and another database o
Cuong,
I accomplished this (in Collex) by attaching a "batch number" to each
document. When indexing a batch (or source), a GUID is generated and
every document from that batch/source gets that same identifier
attached to it. At the end of the indexing run, I delete everything
with that sour
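An in-memory sketch of the batch-number approach described above. The message is cut off, so this reads the final step as "delete everything from that source that does not carry the current batch identifier"; that reading is an assumption. The `index` dict stands in for the Solr index, and the `batchId` field name and `sync_source` helper are hypothetical.

```python
import uuid


def sync_source(index, source_id, fresh_docs):
    """Reindex one source and drop its documents that were not in this run.

    `index` is a dict of doc id -> document, a stand-in for the real index.
    """
    batch_id = str(uuid.uuid4())  # one GUID per indexing run
    for doc in fresh_docs:
        tagged = dict(doc, sourceId=source_id, batchId=batch_id)
        index[tagged["id"]] = tagged  # same id overwrites the old copy
    # Anything from this source still carrying an older batch id is stale.
    stale = [
        doc_id
        for doc_id, d in index.items()
        if d["sourceId"] == source_id and d["batchId"] != batch_id
    ]
    for doc_id in stale:
        del index[doc_id]
    return batch_id, stale
```

Against a real index, the cleanup step would presumably be a single delete by query that matches the source and excludes the current batch identifier, rather than a loop over documents.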
Hi,
I am new to Solr and am trying to implement a solution for indexing and
searching using Embedded Solr.
However, I have a question about SolrSchema:
How do I generate the schema fields programmatically, instead of defining
them in the schema.xml?
Regards,
Venkat
[apologies for sending a WIP
Hi,
I am new to Solr and am t
--
Blog @ http://blizzardzblogs.blogspot.com
Hi all,
I've been struggling to find a good way to synchronize Solr with a large
number of records. We collect our data from a number of sources and each
source produces around 50,000 docs. Each of these documents has a "sourceId"
field indicating the source of the document. Now assuming we're inde