Hi,
I'm after a bit of clarification about the 'limitations' section of the
distributed search page on the wiki.
The first two limitations say:
* Documents must have a unique key and the unique key must be stored
(stored="true" in schema.xml)
* When duplicate doc IDs are received, Solr chooses
Mark Miller-3 wrote:
>
> The 'doc ID' in the second point refers to the unique key in the first
> point.
>
I thought so but thanks for clarifying. Maybe a wording change on the wiki
would be good?
Cheers,
Andrew.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Duplicat
Another question...
I have a series of cores representing historical data, only the most recent
of which gets indexed to.
I'd like to alias the most recent one to 'current' so that when they roll
over I can just change the alias, and the cron jobs etc. which manage
indexing don't have to change.
Mark Miller-3 wrote:
>
> On 7/4/10 12:49 PM, Andrew Clegg wrote:
>> I thought so but thanks for clarifying. Maybe a wording change on the
>> wiki
>
> Sounds like a good idea - go ahead and make the change if you'd like.
>
That page seems to be marked immutable.
Chris Hostetter-3 wrote:
>
> a cleaner way to deal with this would be do use something like
> RewriteRule -- either in your appserver (if it supports a feature like
> that) or in a proxy sitting in front of Solr.
>
I think we'll go with this -- seems like the most bulletproof way.
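For anyone finding this thread later, a minimal sketch of the RewriteRule idea, assuming Apache httpd with mod_rewrite and mod_proxy sitting in front of Solr; the core name and port are made up for illustration, not from the thread:

```apache
# Hypothetical sketch: alias /solr/current/ to whichever core is newest.
# When the cores roll over, only this one rule needs updating; the cron
# jobs keep indexing to /solr/current/. Core name and port are illustrative.
RewriteEngine On
RewriteRule ^/solr/current/(.*)$ http://localhost:8983/solr/core2011/$1 [P]
```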
Cheers,
Is anyone using ZooKeeper-based Solr Cloud in production yet? Any war
stories? Any problematic missing features?
Thanks,
Andrew.
Hi,
I'm a little confused about how the tuning params in solrconfig.xml actually
work.
My index currently has mergeFactor=25 and maxMergeDocs=2147483647.
So this means that up to 25 segments can be created before a merge happens,
and each segment can have up to 2bn docs in, right?
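In solrconfig.xml terms, the settings being asked about would look something like this (the values are the ones quoted above; in Solr 1.x these live in the indexDefaults section):

```xml
<!-- Sketch of the merge settings under discussion: up to 25 segments
     accumulate before a merge, and no merged segment is capped below
     Integer.MAX_VALUE documents. -->
<mergeFactor>25</mergeFactor>
<maxMergeDocs>2147483647</maxMergeDocs>
```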
But this pag
Okay, thanks Marc. I don't really have any complaints about performance
(yet!) but I'm still wondering how the mechanics work, e.g. when you have a
number of segments equal to mergeFactor, and each contains maxMergeDocs
documents.
The docs are a bit fuzzy on this...
Hi,
First off, sorry about previous accidental post, had a sausage-fingered
moment.
Anyway...
If I merge two indices with CoreAdmin, as detailed here...
http://wiki.apache.org/solr/MergingSolrIndexes
What happens to duplicate documents between the two? i.e. those that have
the same unique key
(Many apologies if this appears twice, I tried to send it via Nabble
first but it seems to have got stuck, and is fairly urgent/serious.)
Hi,
I'm trying to use the replication handler to take snapshots, then
archive them and ship them off-site.
Just now I got a message from tar that worried me:
January 2011 12:30, Andrew Clegg wrote:
> (Many apologies if this appears twice, I tried to send it via Nabble
> first but it seems to have got stuck, and is fairly urgent/serious.)
>
> Hi,
>
> I'm trying to use the replication handler to take snapshots, then
> archive th
ripts.conf,solrconfig_slave.xml:solrconfig.xml,stopwords.txt,synonyms.txt
00:00:10
Thanks,
Andrew.
On 16 January 2011 12:55, Andrew Clegg wrote:
> PS one other point I didn't mention is that this server has a very
> fast autocommit limit (2 seconds max time).
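The "very fast autocommit" mentioned would look something like this in solrconfig.xml, with 2000 ms matching the "2 seconds max time" quoted:

```xml
<!-- Sketch of a 2-second time-based autocommit (solrconfig.xml,
     inside the updateHandler section). -->
<autoCommit>
  <maxTime>2000</maxTime>
</autoCommit>
```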
First of all, apologies if you get this twice. I posted it by email an hour
ago but it hasn't appeared in any of the archives, so I'm worried it's got
junked somewhere.
I'm trying to use a DataImportHandler to merge some data from a database
with some other fields from a collection of XML files,
Chantal Ackermann wrote:
>
> Hi Andrew,
>
> your inner entity uses an XML type datasource. The default entity
> processor is the SQL one, however.
>
> For your inner entity, you have to specify the correct entity processor
> explicitly. You do that by adding the attribute "processor", and th
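A hedged sketch of what Chantal describes — an inner XML entity with its processor set explicitly; the entity names, field names, datasource name, and URL pattern here are all illustrative, not taken from the thread:

```xml
<entity name="item" query="SELECT id, code FROM items">
  <!-- Inner entity: without processor="XPathEntityProcessor" the default
       SQL entity processor would be assumed, which is the problem above. -->
  <entity name="details"
          processor="XPathEntityProcessor"
          dataSource="xmlSource"
          url="${item.code}.xml"
          forEach="/doc">
    <field column="title" xpath="/doc/title"/>
  </entity>
</entity>
```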
Erik Hatcher wrote:
>
>
> On Jul 30, 2009, at 11:54 AM, Andrew Clegg wrote:
>>> url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
>> forEach="/">
>>> xpath="//*[local-name(
Chantal Ackermann wrote:
>
>
> my experience with XPathEntityProcessor is non-existent. ;-)
>
>
Don't worry -- your hints put me on the right track :-)
I got it working with:
Now, to get it to ignore missing files without an error... Hmm...
Cheers,
A couple of questions about the DIH XPath syntax...
The docs say it supports:
xpath="/a/b/subject[@qualifier='fullTitle']"
xpath="/a/b/subject/@qualifier"
xpath="/a/b/c"
Does the second one mean "select the value of the attribute called qualifier
in the /a/b/subject element"?
e.g. For
Andrew Clegg wrote:
>
>
>
Sorry, Nabble swallowed my XML example. That was supposed to be
[a]
[b]
[subject qualifier="some text" /]
[/b]
[/a]
... but in XML.
Andrew.
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> On Thu, Aug 13, 2009 at 6:35 PM, Andrew Clegg
> wrote:
>
>> Does the second one mean "select the value of the attribute called
>> qualifier
>> in the /a/b/subject element"?
>
> yes you are right. Isn
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> yes. look at the 'flatten' attribute in the field. It should give you
> all the text (not attributes) under a given node.
>
>
I missed that one -- many thanks.
Andrew.
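For reference, the flatten usage Noble describes looks roughly like this (field and xpath names are illustrative):

```xml
<!-- Sketch of the flatten attribute: concatenates all the text beneath
     the matched node (attribute values are not included). -->
<field column="body" xpath="/a/b" flatten="true"/>
```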
Hi folks,
I'm trying to use the Debug Now button in the development console to test
the effects of some changes in my data import config (see attached).
However, each time I click it, the right-hand frame fails to load -- it just
gets replaced with the standard 'connection reset' message from Firefox.
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> apparently I do not see any command full-import, delta-import being
> fired. Is that true?
>
It seems that way -- they're not appearing in the logs. I've tried Debug Now
with both full and delta selected from the dropdown, no difference either
way.
If
ahammad wrote:
>
> Is it possible to add a prefix to the data in a Solr field? For example,
> right now, I have a field called "id" that gets data from a DB through the
> DataImportHandler. The DB returns a 4-character string like "ag5f". Would
> it be possible to add a prefix to the data that
Try an sdouble or sfloat field type?
Andrew.
johan.sjoberg wrote:
>
> Hi,
>
> we're performing range queries of a field which is of type double. Some
> queries which should generate results do not, and I think it's best
> explained by the following examples; it's also expected to exist data
Paul Tomblin wrote:
>
> Is there such a thing as a wildcard search? If I have a simple
> solr.StrField with no analyzer defined, can I query for "foo*" or
> "foo.*" and get everything that starts with "foo" such as 'foobar" and
> "foobaz"?
>
Yes. foo* is fine even on a simple string field.
You can use the Data Import Handler to pull data out of any XML or SQL data
source:
http://wiki.apache.org/solr/DataImportHandler
Andrew.
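For Elaine's use case, a minimal data-config.xml along these lines is a reasonable starting point; the file path, element names, and forEach expression are all placeholders to adapt to the actual XML:

```xml
<!-- Hedged sketch of a DIH config for indexing local XML files. -->
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <entity name="doc"
            processor="XPathEntityProcessor"
            url="/data/example.xml"
            forEach="/records/record">
      <field column="id"   xpath="/records/record/@id"/>
      <field column="text" xpath="/records/record/text"/>
    </entity>
  </document>
</dataConfig>
```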
Elaine Li wrote:
>
> Hi,
>
> I am new solr user. I want to use solr search to run query against
> many xml files I have.
> I have set up the solr server
Hi all, I'm having problems getting Solr to start on Tomcat 6.
Tomcat is installed in /opt/apache-tomcat , solr is in
/opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr .
My config file is in /opt/solr/conf/solrconfig.xml .
I have a Solr-specific context file in
/opt/apach
Constantijn Visinescu wrote:
>
> This might be a bit of a hack but I got this in the web.xml of my
> application and it works great.
>
>
>
> <env-entry>
>   <env-entry-name>solr/home</env-entry-name>
>   <env-entry-value>/Solr/WebRoot/WEB-INF/solr</env-entry-value>
>   <env-entry-type>java.lang.String</env-entry-type>
> </env-entry>
>
>
That worked, thanks. You're right though, it is a
hossman wrote:
>
>
> : Hi all, I'm having problems getting Solr to start on Tomcat 6.
>
> which version of Solr?
>
>
Sorry -- a nightly build from about a month ago. Re. your other message, I
was sure the two machines had the same version on, but maybe not -- when I'm
back in the office tomorrow.
Andrew Clegg wrote:
>
>
> hossman wrote:
>>
>>
>> This is why the examples of using context files on the wiki talk about
>> keeping the war *outside* of the webapps directory, and using docBase in
>> your Context declaration...
>
Hi folks,
I'm using the 2009-09-30 build, and any single or double quotes in the query
string cause an NPE. Is this normal behaviour? I never tried it with my
previous installation.
Example:
http://myserver:8080/solr/select/?title:%22Creatine+kinase%22
(I've also tried without the URL encoding
Erik Hatcher-4 wrote:
>
> don't forget q=... :)
>
> Erik
>
> On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
>
>>
>> Hi folks,
>>
>> I'm using the 2009-09-30 build, and any single or double quotes in
>> the query
>> string cause an NPE
Hi,
I have a field in my index called related_ids, indexed and stored, with the
following field type:
Several records in my index contain the token 1cuk in the related_ids field,
but only *some* of them are returned.
> Is there a chance that you're hitting that
> limit? That 1cuk is past the 10,000th term
> in record 2.40?
>
> For this to be possible, I have to assume that the FieldAnalysis
> tool ignores this limit
>
> FWIW
> Erick
>
> On Fri, Oct 23, 2009 at 12:01 PM, Andrew Clegg
> wrote:
Morning,
Last week I was having a problem with terms visible in my search results in
large documents not causing query hits:
http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-td26029040.html#a26029351
Erick suggested it might be related to maxFieldLength,
I can reproduce a problem with maxFieldLength being
> ignored.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> On Mon, Oct 26, 2009 at 7:11 AM, Andrew Clegg
> wrote:
>>
>> Morning,
>>
>> Last week I was having a problem with terms visible
Yonik Seeley-2 wrote:
>
> Sorry Andrew, this is something that's bitten people before.
> search for maxFieldLength and you will see *2* of them in your config
> - one for indexDefaults and one for mainIndex.
> The one in mainIndex is set at 10000 and hence overrides the one in
> indexDefaults.
Yonik Seeley-2 wrote:
>
> If you could, it would be great if you could test commenting out the
> one in mainIndex and see if it inherits correctly from
> indexDefaults... if so, I can comment it out in the example and remove
> one other little thing that people could get wrong.
>
Yep, it seems
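To summarise the gotcha Yonik describes, the two competing settings look like this (the large indexDefaults value is illustrative):

```xml
<!-- maxFieldLength appears twice in solrconfig.xml; the mainIndex
     value silently wins over the indexDefaults one. -->
<indexDefaults>
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>
<mainIndex>
  <maxFieldLength>10000</maxFieldLength> <!-- overrides indexDefaults -->
</mainIndex>
```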
Hi,
If I have a DataImportHandler query with a greater-than sign in, like this:
Everything's fine. However, if it contains a less-than sign:
I get this exception:
INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml}
[Fatal Error] :240:129: The value o
e, so it has to obey XML encoding rules which
> make it ugly but whatcha gonna do?
>
> Erik
>
> On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote:
>
>>
>> Hi,
>>
>> If I have a DataImportHandler query with a greater-than sign in,
>> like th
Hi,
If I give a query that matches a single document, and facet on a particular
field, I get a list of all the terms in that field which appear in that
document.
(I also get some with a count of zero, I don't really understand where they
come from... ?)
Is it possible with faceting, or a simila
> For terms - http://wiki.apache.org/solr/TermsComponent
>
> Helps?
>
> Cheers
> Avlesh
>
> On Wed, Oct 28, 2009 at 11:32 PM, Andrew Clegg
> wrote:
>
>>
>> Hi,
>>
>> If I give a query that matches a single document, and facet on a
>> partic
Morning,
Can someone clarify how dismax queries work under the hood? I couldn't work
this particular point out from the documentation...
I get that they pretty much issue the user's query against all of the fields
in the schema -- or rather, all of the fields you've specified in the qf
parameter
set of analyzers
> assigned to that particular field for queries (as opposed to indexing).
> For
> example, if "test" is matched against a "string" vs "text" field,
> different
> analyzers may be applied to "string" or "text"
>
optimize, there should be no 0-value facets.
>
> On Wed, Oct 28, 2009 at 11:36 AM, Andrew Clegg
> wrote:
>>
>>
>> Isn't the TermVectorComponent more for one document at a time, and the
>> TermsComponent for the whole index?
>>
>> Actually -- having
Actually Avlesh pointed me at that, earlier in the thread. But thanks :-)
Yonik Seeley-2 wrote:
>
> On Wed, Oct 28, 2009 at 2:02 PM, Andrew Clegg
> wrote:
>> If I give a query that matches a single document, and facet on a
>> particular
>> field, I get a list of
Hi,
I've recently added the TermVectorComponent as a separate handler, following
the example in the supplied config file, i.e.:
  <bool name="tv">true</bool>
  <arr name="last-components"><str>tvComponent</str></arr>
It works, but with one quirk. When you use tf.all=true, you
Hi everyone,
I'm experimenting with highlighting for the first time, and it seems
shockingly slow for some queries.
For example, this query:
http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on
takes 313ms. But when I add highlighting:
http://server:80
although not with those really long response
> times). Fixed by moving to JRE 1.6 and tuning garbage collection.
>
> Bye,
>
> Jaco.
>
> 2009/11/3 Andrew Clegg
>
>>
>> Hi everyone,
>>
>> I'm experimenting with highlighting for the first tim
Nicolas Dessaigne wrote:
>
> Alternatively, you could use a copyfield with a maxChars limit as your
> highlighting field. Works well in my case.
>
Thanks for the tip. We did think about doing something similar (only
enabling highlighting for certain shorter fields) but we decided that
perhaps
Hi,
If I run a MoreLikeThis query like the following:
http://www.cathdb.info/solr/mlt?q=id:3.40.50.720&rows=0&mlt.interestingTerms=list&mlt.match.include=false&mlt.fl=keywords&mlt.mintf=1&mlt.mindf=1
one of the hits in the results is "and" (I don't do any stopword removal on
this field).
Howev
Lukáš Vlček wrote:
>
> I am looking for good arguments to justify implementation a search for
> sites
> which are available on the public internet. There are many sites in
> "powered
> by Solr" section which are indexed by Google and other search engines but
> still they decided to invest resour
Morning all,
I'm having problems joining a child entity from one database to a
parent from another...
My entity definitions look like this (names changed for brevity):
c is getting indexed fine (it's stored, I can see field 'c' in the search
results) but child.d isn't. I know
Lukáš Vlček wrote:
>
> When you need to search for something Lucene or Solr related, which one do
> you use:
> - generic Google
> - go to a particular mail list web site and search from here (if there is
> any search form at all)
>
Both of these (Nabble in the second case) in case any recent p
Any ideas on this? Is it worth sending a bug report?
Those links are live, by the way, in case anyone wants to verify that MLT is
returning suggestions with very low tf.idf.
Cheers,
Andrew.
Andrew Clegg wrote:
>
> Hi,
>
> If I run a MoreLikeThis query like the followin
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> no obvious issues.
> you may post your entire data-config.xml
>
Here it is, exactly as last attempt but with usernames etc. removed.
Ignore the comments and the unused FileDataSource...
http://old.nabble.com/file/p26335171/dataimport.temp.xml dataimpo
Chantal Ackermann wrote:
>
> no idea, I'm afraid - but could you sent the output of
> interestingTerms=details?
> This at least would show what MoreLikeThis uses, in comparison to the
> TermVectorComponent you've already pasted.
>
I can, but I'm afraid they're not very illuminating!
http://
Chantal Ackermann wrote:
>
> your URL does not include the parameter mlt.boost. Setting that to
> "true" made a noticeable difference for my queries.
>
Hmm, I'm really not sure if this is doing the right thing either. When I add
it I get:
1.0
0.60737264
0.27599618
0.2476748
0.24487767
aerox7 wrote:
>
> Hi Andrew,
> I download the last build of solr (1.4) and i have the same probleme with
> DebugNow in Dataimport dev Console. have you found a solution ?
>
Sorry about the slow reply, I've been on holiday. No, I never found a
solution; it worked in some nightlies but not in others.
Hi,
I'm interested in near-dupe removal as mentioned (briefly) here:
http://wiki.apache.org/solr/Deduplication
However the link for TextProfileSignature hasn't been filled in yet.
Does anyone have an example of using TextProfileSignature that demonstrates
the tunable parameters mentioned in th
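For reference, a sketch of what a TextProfileSignature update chain might look like in solrconfig.xml; the minTokenLen and quantRate values here are illustrative guesses at the tunables, not documented defaults:

```xml
<!-- Hedged sketch of a near-dupe update chain; field names are made up. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">body</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
    <str name="minTokenLen">3</str>
    <str name="quantRate">0.3</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```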
I'm missing something.
Thanks again,
Andrew.
Erik Hatcher-4 wrote:
>
>
> On Jan 12, 2010, at 7:56 AM, Andrew Clegg wrote:
>> I'm interested in near-dupe removal as mentioned (briefly) here:
>>
>> http://wiki.apache.org/solr/Deduplication
>>
>> Howe
Erik Hatcher-4 wrote:
>
>
> On Jan 12, 2010, at 9:15 AM, Andrew Clegg wrote:
>> Thanks Erik, but I'm still a little confused as to exactly where in
>> the Solr
>> config I set these parameters.
>
> You'd configure them within the element, so
Hi,
Is there a way to get the DataImportHandler to skip already-seen records
rather than reindexing them?
The UpdateHandler has a capability which (as I
understand it) means that a document whose uniqueKey matches one already in
the index will be skipped instead of overwritten.
Can the DIH be
Marc Sturlese wrote:
>
> You can use deduplication to do that. Create the signature based on the
> unique field or any field you want.
>
Cool, thanks, I hadn't thought of that.
Hi,
I'm trying to get the Velocity / Solritas feature to work for one core of a
two-core Solr instance, but it's not playing nice.
I know the right jars are being loaded, because I can see them mentioned in
the log, but still I get a class not found exception:
09-May-2010 15:34:02 org.apache.so
Erik Hatcher-4 wrote:
>
> What version of Solr? Try switching to
> class="solr.VelocityResponseWriter", and if that doesn't work use
> class="org.apache.solr.request.VelocityResponseWriter". The first
> form is the recommended way to do it. The actual package changed in
> trunk not t
Sorry -- in the second of those error messages (the NPE) I meant
lucene
not standard.
Andrew Clegg wrote:
>
>
> Erik Hatcher-4 wrote:
>>
>> What version of Solr? Try switching to
>> class="solr.VelocityResponseWriter", an
in or /solr/itas and insert your core name in the
middle.
(Does anyone know if there'd be a simple way to make that automatic?)
Andrew Clegg wrote:
>
>
> Erik Hatcher-4 wrote:
>>
>> What version of Solr? Try switching to
>> class="solr.Velocit
Hi folks,
I had a Solr instance (in Jetty on Linux) taken down by a process monitoring
tool (God) with a SIGKILL recently.
How bad is this? Can it cause index corruption if it's in the middle of
indexing something? Or will it just lose uncommitted changes? What if the
signal arrives in the middle
Hi Solr gurus,
I'm wondering if there is an easy way to keep the targets of hyperlinks from
a field which may contain HTML fragments, while stripping the HTML.
e.g. if I had a field that looked like this:
"This is the entire content of my field, but http://example.com/ some of
the words are a
Lance Norskog-2 wrote:
>
> The PatternReplace and HTMLStrip tokenizers might be the right bet.
> The easiest way to go about this is to make a bunch of text fields
> with different analysis stacks and investigate them in the Schema
> Browser. You can paste an HTML document into the text box and s
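A sketch of the kind of analysis stack Lance suggests trying; note that this strips markup including href targets, so preserving the link URLs would still need a separate pattern-based step. The field type name is made up:

```xml
<!-- Hedged sketch: strip HTML markup before tokenizing. -->
<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```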
findbestopensource wrote:
>
> Could you tell us your schema used for indexing. In my opinion, using
> standardanalyzer / Snowball analyzer will do the best. They will not break
> the URLs. Add href, and other related html tags as part of stop words and
> it
> will removed while indexing.
>
Thi
Neeb wrote:
>
> Just wondering if you ever managed to run TextProfileSignature based
> deduplication. I would appreciate it if you could send me the code
> fragment for it from solrconfig.
>
Actually the project that was for got postponed and I got distracted by
other things, for now at least
Andrew Clegg wrote:
>
> Re. your config, I don't see a minTokenLength in the wiki page for
> deduplication, is this a recent addition that's not documented yet?
>
Sorry about this -- stupid question -- I should have read back through the
thread and refreshed my memory.
Markus Jelsma wrote:
>
> Well, it got me too! KMail didn't properly order this thread. Can't seem
> to
> find Hatcher's reply anywhere. ??!!?
>
Whole thread here:
http://lucene.472066.n3.nabble.com/Filtering-near-duplicates-using-TextProfileSignature-tt479039.html