Hi,
Tell us more about your deploy.
How many documents, and how large?
How much RAM?
What kind of physical disk system and how is it allocated to the VM?
When do you measure - during indexing or during search load?
Have you tried to throw more search load on the system? When (how many QPS)
does i
Hi Peter,
this scenario would be really great for us - I didn't know that this is
possible and works, so: thanks!
At the moment we are doing something similar by replicating to the read-only
instance, but
the replication is somewhat lengthy and resource-intensive at this
data volume ;-)
Regards,
Peter.
> 1
Hi Guys,
I encountered a problem when enabling WordDelimiterFilterFactory for both
index and query (pasted the relevant part of schema.xml at the bottom of the email).
*1. Steps to reproduce:*
1.1 The indexed sample document contains only one sentence: "This is a
TechNote."
1.2 Query is: q=TechNo
I've got Solr set up now with two cores which I call live and rebuild and
which point to core0 and core1 directories respectively. My solr.xml file
contains:
In my Spring MVC application I have Solr set up as an embedded server and
have two singleton beans which I use to refer to
Really well done problem statement by the way
On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote:
> Hi Guys,
>
> I encountered a problem when enabling WordDelimiterFilterFactory for both
> index and query (pasted the relevant part of schema.xml at the bottom of the
> email).
>
> *1. Steps to reprodu
hi,
it's the second time I have stumbled across some strange behaviour:
in my schema.xml i have defined
i can't place the PatternReplaceFilter before the WhitespaceTokenizer. i
have the schema like above, di
Hello!
The tokenizer is executed before the filters, because the tokenizer
"generates" the tokens and then the filters operate on them.
> hi,
> it's the second time I have stumbled across some strange behaviour:
> in my schema.xml i have defined
> positionIncrementGap="100">
>
>
>
did you index with solr 1.4 (or are you using solr 1.4) ?
at a quick glance, it looks like it might be this:
https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1
On Tue, Sep 14, 2010 at 5:40 AM, yandong yao wrote:
> Hi Guys,
>
> I encountered a problem when enabling WordDe
I found
http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/
Does anybody know how to install and use this lib on an existing Solr instance?
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-install-DuplicatesDetectorService-tp147256
Why do you want to? Perhaps there's a better solution for your underlying
problem
if you'd explain what it is...
Best
Erick
On Tue, Sep 14, 2010 at 8:05 AM, hellboy wrote:
>
> I found
>
>
> http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/
>
> Is anybody
Peter Sturge,
this was a nice hint, thanks again! If you are here in Germany anytime I
can invite you to a beer or an apfelschorle ! :-)
I only needed to change the lockType to none in the solrconfig.xml,
disable the replication and set the data dir to the master data dir!
Regards,
Peter Karich.
CharFilters go before Tokenizers which go before (token) Filters.
Token filters (called just <filter> in the config) operate on tokens, so they need
to go after the tokenizer. WhitespaceTokenizer is a tokenizer.
PatternReplaceFilterFactory is a token filter.
What you probably want instead is solr.Patter
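
The ordering described in this reply (char filters, then the tokenizer, then token filters) can be illustrated with a minimal plain-Java sketch. It does not use the Lucene/Solr analysis API: the class and method names (AnalysisOrderSketch, charFilter, tokenize, tokenFilter) are made up for illustration, and the regular expression is only an assumed example of a pattern replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: plain Java, not the Lucene/Solr analysis API.
final class AnalysisOrderSketch {

    // Stage 1, "char filter": rewrites the raw text before any token exists.
    // Assumed example pattern: every character that is neither alphanumeric
    // nor whitespace becomes a space.
    static String charFilter(String text) {
        return text.replaceAll("[^A-Za-z0-9\\s]", " ");
    }

    // Stage 2, "tokenizer": splits the (already rewritten) text on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 3, "token filter": sees one token at a time, here lower-casing it.
    static List<String> tokenFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // The stages can only run in this order: a token filter has nothing to
    // work on until the tokenizer has produced tokens.
    static List<String> analyze(String text) {
        return tokenFilter(tokenize(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("This is a Tech-Note."));
    }
}
```

Because the replacement in this sketch runs on the raw text, it can change where the tokenizer splits; a token filter placed after the tokenizer could only rewrite tokens that already exist.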
Hi, I am using SolrCloud, which uses an ensemble of 3 ZooKeeper instances.
I am performing survivability tests:
After taking one of the ZooKeeper instances down, I would expect the client to use a
different ZooKeeper server instance.
But as you can see in the below logs attached
Depending on which insta
Hi Robert,
I am using solr 1.4, will try with 1.4.1 tomorrow.
Thanks very much!
Regards,
Yandong Yao
2010/9/14 Robert Muir
> did you index with solr 1.4 (or are you using solr 1.4) ?
>
> at a quick glance, it looks like it might be this:
> https://issues.apache.org/jira/browse/SOLR-1852 , whi
Hi Shaun,
I think it would be easier to fix this problem if we had more information
about what is going on in your application.
Please, could you provide the CoreAdminResponse returned by car.process()
for us?
Kind regards,
- Mitch
Hi Mitch
Thanks for responding. Not actually sure what you wanted from
CoreAdminResponse but I put the following in:
CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName("live");
car.setOtherCoreName("rebuild");
car.setAction(CoreAdminPar
Hey guys,
Is there a way of doing the following:
We want to get the highest value from a list of multiple fields within a
document.
Example below:
max(field1,field2,field3,field4)
The values are as follows:
field1 = 100
field2 = 300
field3 = 250
field4 = not indexed in document (null)
The hig
Shawn Heisey wrote:
The one called PatternReplaceFilterFactory (no Char) has been around
forever. It is not mentioned on the Wiki page about analyzers. The one
called PatternReplaceCharFilterFactory is only available from svn.
This seems to be true, which I hadn't realized either. The
The stats component will give you the maximum value within one field:
http://wiki.apache.org/solr/StatsComponent
You're going to have to compute the max amongst several fields
client-side, having StatsComponent return the max for each field, and
then just max-ing them client side. Not hard.
Oh wait, I misunderstood, you want just the highest value _for one
document_, from stored fields, given for each document? StatsComponent
won't help you there.
Either do it client side, or do it at index time in a single stored
field, that's it. Maybe there's some confusing way to use a quer
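
The client-side option mentioned in this reply could look roughly like the sketch below. It assumes the stored field values have already been read from the returned document into Integer objects, with null standing in for a field that is not present in the document; the class and method names (FieldMax, maxOf) are hypothetical and not part of SolrJ.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.OptionalInt;

final class FieldMax {

    // Returns the largest non-null value, or an empty OptionalInt when every
    // field is missing from the document.
    static OptionalInt maxOf(Integer... fieldValues) {
        return Arrays.stream(fieldValues)
                .filter(Objects::nonNull)      // skip fields that are not in the document
                .mapToInt(Integer::intValue)
                .max();
    }

    public static void main(String[] args) {
        // field1..field3 from the example; field4 is not indexed, hence null.
        OptionalInt max = maxOf(100, 300, 250, null);
        System.out.println(max.isPresent() ? "max = " + max.getAsInt() : "no value");
        // prints "max = 300"
    }
}
```

The same computation could run at index time instead, with the result written to a single extra stored field, which is the other option the reply names.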
Hey guys,
Has anyone successfully compiled and used Field Collapsing patch (236)
with Solr 1.4.1?
I keep getting this exception when I search:
null
java.lang.NullPointerException
at
org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser$FloatValueFieldComparator.compare(NonA
Hi,
I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create
tokens, based solely on lower-casing characters. Is there a way to tell it
NOT to drop non-characters? It's amazingly frustrating that the
TokenizerFactory and the FilterFactory have two entirely different modes of
behav
Can SOLR be configured out of the box to handle rolling log files?
Kind regards,
Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.
Tel: (212) 552.5097
Hi,
I'm interested in using geographic clustering of records in a Solr
search index. Specifically, I want to be able to efficiently produce a
map with clustered bubbles that represent the number of documents that
are indexed with points in that general area. I'd like to combine this
with other f
On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea wrote:
> Hi,
>
> I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create
> tokens, based solely on lower-casing characters. Is there a way to tell it
> NOT to drop non-characters? It's amazingly frustrating that the
> TokenizerFactor
Hi,
Has anyone come across a situation where they have seen their facet
field values wrap into a new facet entry when the value exceeds 256
characters?
For example:
2302
1403
1382
419
236
236*
As you can see the last value in the tissue-antology list is split
between two facet values.
Faceting on a multi-value field?
I wonder if your positionIncrementGap for your field definition in your
schema is 256. I am not sure what it defaults to. But it seems possible
if it's 256 it could lead to what you observed. Try explicitly defining
it to be really really big maybe? I'm not
On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor
wrote:
> Has anyone come across a situation where they have seen their facet field
> values wrap into a new facet entry when the value exceeds 256 characters?
Yes, for indexed string fields, there currently is a limit of 256
chars per token. It's b
From: Simon Willnauer [simon.willna...@googlemail.com]
Sent: Tuesday, 14 September 2010 17:47
To: solr-user@lucene.apache.org
Subject: Re: Field names
>On Tue, Sep 14, 2010 at 1:39 AM, Peter A. Kirk wrote:
>>
>>
>>
>> So it only finds 9?
>
>Since the "gb" term says 18 occurrences throughout th
Hmmm, were you logged in on the Wiki? If not, you can create a login
pretty easily...
Or someone might pick it up..
Erick
On Tue, Sep 14, 2010 at 12:18 PM, Jonathan Rochkind wrote:
>
>
> Shawn Heisey wrote:
>
>>
>> The one called PatternReplaceFilterFactory (no Char) has been around
>> forever.
What does "handle" mean? Create them or index them?
Erick
On Tue, Sep 14, 2010 at 2:02 PM, Vladimir Sutskever <
vladimir.sutske...@jpmorgan.com> wrote:
> Can SOLR be configured out of the box to handle rolling log files?
>
>
> Kind regards,
>
> Vladimir Sutskever
> Investment Bank - Technology
>
hi... i am using solr for indexing local files (solrj) and indexing
crawled nutch-documents...
i have configured the fieldtype
and use this type by field
stored="true"/>
the pattern for the date is as described in DateField.java:
yyyy-MM-dd'T'HH:mm:ssZ
i need this date for sorting my se
On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com
wrote:
> SEVERE: org.apache.solr.common.SolrException: Error while creating field
> 'metadata_last_modified{type=date,properties=indexed,stored,omitNorms}' from
> value '2010-09-14T22:29:24+0200'
Different timezones are currently not allowed -
I opened a bug for this issue:
https://issues.apache.org/jira/browse/SOLR-2120
On 09/14/2010 03:51 PM, Yonik Seeley wrote:
On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor
wrote:
Has anyone come across a situation where they have seen their facet field
values wrap into a new facet entry whe
It would be a nice feature if Solr supported time-zone-aware queries on
an index where all times are UTC. There is some chatter about this in SOLR-750,
but I haven't found an issue that would add support for time zone queries.
Did I do a lousy search, or is the issue missing as of yet?
I went for a different route:
https://issues.apache.org/jira/browse/LUCENE-2644
Scott
On Tue, Sep 14, 2010 at 11:18 AM, Robert Muir wrote:
> On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea wrote:
>
> > Hi,
> >
> > I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create
> > token
Erick Erickson wrote:
Hmmm, were you logged in on the Wiki? If not, you can create a login
pretty easily...
Or someone might pick it up..
I was logged in, created an account just for that purpose in fact. The
page still said "protected" or something and wouldn't let me edit it. I
tried, rea
If you're using Java's SimpleDateFormat, try enclosing your Z in the format
string with single quotes, like:
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
HTH
Erick
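
One caveat with the quoted 'Z': it is written out as a literal character, so the formatter itself should be set to UTC or the resulting string will claim UTC while carrying local time. A minimal sketch under that assumption follows; the class and method names (SolrUtcDateSketch, toUtcString, offsetToUtc) are made up for illustration, and the input value is the one from the error message quoted earlier in the thread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SolrUtcDateSketch {

    // Formats a Date as UTC with a literal trailing 'Z'.
    static String toUtcString(Date date) {
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        utc.setTimeZone(TimeZone.getTimeZone("UTC")); // without this, local time would be labelled 'Z'
        return utc.format(date);
    }

    // Parses a timestamp carrying an RFC 822 offset such as +0200 and
    // re-formats it in UTC.
    static String offsetToUtc(String withOffset) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ", Locale.ROOT);
        return toUtcString(in.parse(withOffset));
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(offsetToUtc("2010-09-14T22:29:24+0200"));
        // prints 2010-09-14T20:29:24Z
    }
}
```

SimpleDateFormat is not thread-safe, which is why the sketch creates a new instance per call instead of sharing a static one.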
On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com <
h00kpub...@googlemail.com> wrote:
> hi... i am u
Why would you want to do that, instead of just using another tokenizer
and a lowercasefilter? It's more confusing less DRY code to leave them
separate -- the LowerCaseTokenizerFactory combines anyway because
someone decided it was such a common use case that it was worth it for
the demonstrat
Jonathan, you bring up an excellent point.
I think it's worth our time to actually benchmark this LowerCaseTokenizer
versus LetterTokenizer + LowerCaseFilter
This tokenizer is quite old, and although I can understand there is no doubt
it's technically faster than LetterTokenizer + LowerCaseFilter e
I'm using for a field, indexing, then looking at the terms component.
I'm seeing shingles that consist of only 2 terms, whereas I'm
expecting all the terms to be at least 4 terms... What's up? Thanks.
There doesn't seem to have been anything readily available. All of the
tokenizers make their own assumptions about how I want to treat the data.
The end result is that this felt like the most direct approach. The
default behavior of "LowerCaseTokenizer"(+Factory) was retained, while
allowing it
Hi,
Still working on extending my proof of concept by working off the example
configuration and modifying the schema.xml. Having trouble with wildcard
searches:
factory OR faction -- 40 results (ok)
factory -- 1 result (ok)
faction -- 39 results (ok)
facti?n -- 39 results (ok)
fact* -- 40 resul
I'd agree with your point entirely. My attacking LowerCaseTokenizer was a
result of not wanting to create yet more Classes.
That said, rightfully dumping LowerCaseTokenizer would probably have me
creating my own Tokenizer.
I could very well be thinking about this wrong... But what if I wanted t
To answer my own question, and this sucks :) the minShingleSize isn't
set in at least 1.4.2. I'm guessing a later version though?
On Tue, Sep 14, 2010 at 5:49 PM, Jason Rutherglen
wrote:
> positionIncrementGap="100">
>
>
>
> words="stopwords.txt"/>
> maxShingleSize="4" outputUnigrams="fal
How about patching the LetterTokenizer to be capable of tokenizing how
you want, which can then be combined with a LowerCaseFilter (or not) as
desired. Or indeed creating a new tokenizer to do exactly what you want,
possibly (but one that doesn't combine an embedded lowercasefilter in
there too
And here's the issue... https://issues.apache.org/jira/browse/SOLR-1740
On Tue, Sep 14, 2010 at 6:08 PM, Jason Rutherglen
wrote:
> To answer my own question, and this sucks :) the minShingleSize isn't
> set in at least 1.4.2. I'm guessing a later version though?
>
> On Tue, Sep 14, 2010 at 5:49
> but
>
> facto?y -- 0 (expecting 1)
>
>
Do you have stemming enabled for the field? Stemming will make your wildcards
behave strangely, because it likely turned "factory" into "factori" or similar.
I would recommend you turn it off.
> I thought these are all valid searches but am I missing somethi
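
Why the stemmed field breaks this wildcard can be shown with a small plain-Java sketch. It rests on two assumptions: wildcard terms are matched against the indexed terms without being stemmed themselves, and the stemmer rewrites a trailing "y" to "i". The toyStem method is a toy stand-in, not Solr's actual stemmer, and both method names are hypothetical.

```java
import java.util.regex.Pattern;

final class StemmedWildcardSketch {

    // Toy stand-in for a stemmer (assumption): a trailing "y" becomes "i",
    // so "factory" is indexed as "factori".
    static String toyStem(String term) {
        return term.endsWith("y") ? term.substring(0, term.length() - 1) + "i" : term;
    }

    // Matches a wildcard expression against one indexed term:
    // '?' stands for exactly one character, '*' for any run of characters.
    static boolean wildcardMatches(String wildcard, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        String indexed = toyStem("factory");                       // "factori"
        System.out.println(wildcardMatches("facto?y", indexed));   // false: the term now ends in "i"
        System.out.println(wildcardMatches("fact*", indexed));     // true: the prefix survives stemming
        System.out.println(wildcardMatches("facto?y", "factory")); // true on an unstemmed term
    }
}
```

This mirrors the results reported in the thread: fact* still finds the document, while facto?y no longer does once the indexed term has been stemmed.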
I didn't see any open Jira issues for this, so i created one...
https://issues.apache.org/jira/browse/SOLR-2121
: Date: Tue, 7 Sep 2010 01:35:39 -0700 (PDT)
: From: Marc Sturlese
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Null pointer exception when
K, just making sure.
Erick
On Tue, Sep 14, 2010 at 5:20 PM, Jonathan Rochkind wrote:
> Erick Erickson wrote:
>
>> Hmmm, were you logged in on the Wiki? If not, you can create a login
>> pretty easily...
>>
>> Or someone might pick it up..
>>
>>
> I was logged in, created an account just for
That was it! Thank you very much.
- Original Message
From: Robert Muir
To: solr-user@lucene.apache.org
Sent: Tue, September 14, 2010 5:58:03 PM
Subject: Re: wildcard searches not consistent
> but
>
> facto?y -- 0 (expecting 1)
>
>
you have stemming enabled for the field? stemming will
You are probably not talking about clusters in the physical structure of data
on this disk, right?
What do YOU mean by clusters if not?
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/
There are a lot of reasons, with the performance hit being notable--but also
because I feel that using a regex on something this basic amounts to a lazy
hack. I'm typically against regular expressions in XML.
I'm vehemently opposed to them in cases where not using them should
otherwise be quite tri
Because (just IMO, I'm not an expert here either) the basic framework in
Solr is that tokenizers tokenize, but they don't generally change bytes
inside values. What changes bytes (or adds or removes tokens to the
token stream initially created by a tokenizer, etc) is filters. And
there's alrea
On 09/14/2010 07:48 PM, Dennis Gearon wrote:
> You are probably not talking about clusters in the physical structure of data
> on this disk, right?
>
> What do YOU mean by clusters if not?
I mean basically "range facets", where the ranges are 2-dimensional
distances between documents that have i
Thanks Mark for taking the time to reply. What else could cause this issue to
happen so frequently? We have a master/slave configuration and only one
update server that writes to the index. We have plenty of disk space available.
Thanks
Bharat Jain
On Fri, Sep 10, 2010 at 8:19 AM, Mark Miller wrote:
Dear All:
I am studying SolrCloud now. I downloaded it
from: https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/
but I found that there is no
webapps: https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/example/webapps/
but we need http://localhost:8983/solr/collection1/admin/zookee
After upgrading to 1.4.1, it is fixed.
Thanks very much for your help!
Regards,
Yandong Yao
2010/9/14 yandong yao
> Hi Robert,
>
> I am using solr 1.4, will try with 1.4.1 tomorrow.
>
> Thanks very much!
>
> Regards,
> Yandong Yao
>
> 2010/9/14 Robert Muir
>
> did you index with solr 1.4 (or
Hi
when using LukeRequestHandler, I can for example call:
http://localhost:8983/solr/admin/luke?fl=name&fl=cat
which will return data including the frequency of the top 10 search terms in
the specified fields.
I can also add a "numTerms" parameter to obtain more than the top 10.
But how do I e
> when using LukeRequestHandler...
> But how do I ensure I get *all* the terms in the index returned? Can I set
> "numTerms=ALL" or something like that?
I'm not sure about LukeRequestHandler, but you can do that with the
TermsComponent instead.
/terms?terms.fl=name&terms.limit=-1
Will give y
So, basically, faceting geographically?
within 100 meters
within 300 meters
within 1km
within 3km
within 10km
within 100km
This type of results?
Dennis Gearon
Can you give us a scenario:
1/ Like an OOP sequence diagram, "This happens, that happens, now that"
2/ Where you see it useful?
Isn't it possible to convert before storing/after retrieving?
Couldn't a timezone offset (or local timezone designation) be stored as a
separate field to fil
From what I can tell, it's being controlled in the browser. I CAN'T tell if
it's being generated in the browser or in the server.
Which is it in the example, and where do you want it generated? Do you want the
DATA for the clusters, or the actual icons also?
Looks like a display object way to
I saw something about having separate reader vs writer to an index. The email
said that the reader had to do occasional (empty) commits to keep the cache
warm and for another reason. Is this relevant?
Dennis Gearon
Hi All,
What is the difference between using shards, SolrCloud and ZooKeeper?
Which is the best way to scale Solr?
I need to reduce the index size in every system and reduce the search time
for a query...
Regards,
satya