I have spent a lot of time in the past day playing with this setup and finally
made it work. Here are a few bits of interest:
- Solr v4.0
- linux, java7, local filesystem
- big index, 1 RW instance + 2 RO instances (sharing the same index)
lock is acquired when solr is writing data - if you happen t
So in this case, since the field type is String, does adding
omitTermFreqAndPositions="true" really help in reducing the index size?
On Wed, Jul 3, 2013 at 10:00 PM, Jack Krupansky wrote:
> Oops... I wasn't reading carefully enough - frequencies and positions only
> relate to tokenized fields
Sorry, change the query to:
label: (Google AND Cloud AND Storage)
or will Solr add AND / OR behind the scenes?
On Wed, Jul 3, 2013 at 9:59 PM, Ali, Saqib wrote:
> So do I have to change my query to
> label: (Google Cloud Storage) ?
>
> or will Solr add AND / OR behind the scenes?
>
>
> On Wed,
Oops... I wasn't reading carefully enough - frequencies and positions only
relate to tokenized fields (text) - not string fields.
That doesn't impact your ability to do AND and OR of discrete string terms
of a multivalued string field.
-- Jack Krupansky
-Original Message-
From: Jack
So do I have to change my query to
label: (Google Cloud Storage) ?
or will Solr add AND / OR behind the scenes?
On Wed, Jul 3, 2013 at 9:54 PM, Jack Krupansky wrote:
> Yes, but it is simply doing an AND or OR of the individual terms - no
> phrases or implied ordering of the terms.
>
>
> -- Jack
Yes, but it is simply doing an AND or OR of the individual terms - no
phrases or implied ordering of the terms.
-- Jack Krupansky
-Original Message-
From: Ali, Saqib
Sent: Thursday, July 04, 2013 12:52 AM
To: solr-user@lucene.apache.org
Subject: Re: omitTermFreqAndPositions="true" in
Jack,
Thanks for the explanation! :
We have a multi-value field as following:
Most of these labels are phrases of two or more words, e.g.
1) Google Reader
2) Google Mail
3) Google Cloud Storage
etc. etc.
if we add omitTermFreqAndPositions="true" to this field:
Will we be able to execute queries
The split/group implementation in RegexTransformer is not as efficient
as CSVLoader. Perhaps we need a specialized csv loader in DIH.
SOLR-2549 aims to add this support. I'll take a look.
On Tue, Jul 2, 2013 at 12:26 AM, Mike L. wrote:
> Hey Ahmet / Solr User Group,
>
>I tried using the buil
Hi Kathryn,
I wonder if you could index all your terms as separate documents and then
construct a new query (2nd pass)
q=term:term1 OR term:term2 OR term:term3
and use func to score them
*idf(other_field,field(term))*
the 'term' index cannot be multi-valued, obviously.
Other than that, if y
If you have a text field and simply want to be able to query whether
individual terms are present in the text without needing to know either how
frequently the terms occur or that some terms may be present in phrases.
So, you can do AND and OR for individual terms in that field, but not
phra
Thank you Shawn for the excellent use case. :)
On Wed, Jul 3, 2013 at 9:34 AM, Shawn Heisey wrote:
> On 7/3/2013 9:22 AM, Ali, Saqib wrote:
>
>> What would be the use case for such a field:
>>
>> > stored="false"/>
>>
>>
>> and
>>
>> > stored="false"/>
>>
>
> I have a field li
Thanks Jack! That was very helpful.
On Wed, Jul 3, 2013 at 9:54 AM, Jack Krupansky wrote:
> If never used, they take up zero space in the index.
>
> If they were used but are no longer used, they're still there, but any new
> or replaced documents will not take up any space for the unused field
Hello,
Can anyone please explain omitTermFreqAndPositions="true" to me in easy
English, please?
Thanks.
We have a single Solr instance with a lot of indexed documents. Now we would
like to move to a SolrCloud implementation.
Can we move the existing index to SolrCloud? If so, how? Or do we need to
reindex our data in SolrCloud?
Thanks,
Saqib
I'm using a Solr 4.3 server and accessing it from both a Java based desktop
application using SolrJ and an Android based mobile application using my
home-grown REST adaptor. I'm trying to make sure that versions of the
application are synchronized with updates to the server (too often testers
That sounds more feasible. Solr 4 does have support for swapping the core
over. The only issue is that you need to make sure you know which core is
active after restart. Solr can save the changes (mapping of core name to
directory) in solr.xml, but you need to ensure all the permissions are set
cor
On Wed, Jul 3, 2013 at 5:40 PM, slevytam wrote:
> Hi Yonik,
>
> Can you offer any insight as to how one might ensure that documents reside
> on the same shard as the document you'd like them to join?
>
> For example:
> I'd like to do a simple join of user actions to a specific document. So, I
> w
Hi Jan,
http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue -
SOLR-1792?
Otis
--
Performance Monitoring -- http://sematext.com/spm
Solr & ElasticSearch Support -- http://sematext.com/
On Wed, Jul 3, 2013 at 5:59 PM, Jan Morlock wrote:
> Hi,
>
> we would like to observe the m
Awesome, thanks. What about indexing in a different core, then renaming it once
it's done?
Thanks
Brendan
On Jul 3, 2013, at 6:48 PM, Shawn Heisey wrote:
> On 7/3/2013 2:45 PM, Brendan Grainger wrote:
>> I'm experimenting with indexing using the EmbeddedSolrServer. Just to be
>> sure, as I under
On 7/3/2013 2:45 PM, Brendan Grainger wrote:
> I'm experimenting with indexing using the EmbeddedSolrServer. Just to be
> sure, as I understand it, I do not need a running instance of solr to use
> this, it literally is a running instance of solr.
You are correct, EmbeddedSolrServer starts a compl
Hi,
I'm looking for a way to cluster (or should I call it group) geospatial
points on a map based on the current zoom level and get the median
coordinate for that cluster.
Let's say I'm on the world level, and I want to cluster spatial points
within a 1000 km radius. When I zoom in I only wa
Hi,
we would like to observe the mean value of the average time per request for
the last N (e.g. 20) queries (a.k.a. simple moving average) of our Solr
server using Nagios. Does anybody know if such an observable is already
implemented?
If not, I think the perfect place for it would be the getSta
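
For reference, the "simple moving average over the last N requests" asked about above can be sketched outside Solr as follows. This is only a minimal illustration, not an existing Solr feature: the class name `MovingAverage` and the sample timings are made up, and the request times are assumed to come from wherever the monitoring script already collects them.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the most recent `window` samples."""

    def __init__(self, window=20):
        if window <= 0:
            raise ValueError("window must be positive")
        # A bounded deque silently drops the oldest sample once it is full.
        self._samples = deque(maxlen=window)

    def add(self, value):
        """Record one request time (e.g. in milliseconds)."""
        self._samples.append(value)

    @property
    def value(self):
        """Mean of the stored samples, or 0.0 if nothing was recorded yet."""
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)


# Hypothetical request times in milliseconds.
ma = MovingAverage(window=3)
for request_time in [10, 20, 30, 40]:
    ma.add(request_time)
print(ma.value)  # mean of the last three samples: 20, 30, 40
```

With N as small as 20, recomputing the sum on every read is cheap and avoids the rounding drift of a running total.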
Hi Yonik,
Can you offer any insight as to how one might ensure that documents reside
on the same shard as the document you'd like them to join?
For example:
I'd like to do a simple join of user actions to a specific document. So, I
would query for a list of documents and have the user actions on
Hi;
I've written an e-mail to the dev list and I want to share the same e-mail here.
I've opened two issues in Jira and I want to get feedback from the community.
First issue is: https://issues.apache.org/jira/browse/SOLR-4995
Currently Solr servers are interacting with only one Solr node. I think
that there s
Hey Shawn / Solr User Group,
This makes perfect sense to me. Thanks for the thorough answer.
"The CSV update handler works at a lower level than the DataImport
handler, and doesn't have "clean" or "full-import" options, which defaults to
clean=true. The DIH is
like a full application em
It looks like spellcheck's collations feature is implemented using hit
counts but I'm wondering if it'd be useful/possible to be able to sort by
the maximum score as it would have been calculated by the query. I'm
really thinking about the case when one of the spelling suggestions yields
a perfect
Hi,
I'm experimenting with indexing using the EmbeddedSolrServer. Just to be
sure, as I understand it, I do not need a running instance of solr to use
this, it literally is a running instance of solr.
Given the above, how safe is it to use an EmbeddedSolrServer for indexing
an index that might be
On Tue, Jul 2, 2013 at 10:59 AM, Andy Pickler wrote:
> SELECT
> br.other_content AS replyContent
> FROM block_reply
> ">
> *THIS DOESN'T WORK!*
>
shouldn't it be
column="replyContent"
since you are renaming it in SELECT?
Regards,
Alex.
Personal website: http://www.ou
I am using the switch parser plugin as below. Whenever there is a value for
$latlong it will invoke $fq_bbox, or else it will invoke $fq_simple. Now I
need to add one more case: whenever $where is not present in the query, I need
to call fq_all and ignore both fq_bbox and fq_simple. Can someone let
Katie,
This case is actually really hard to understand. Let me offer a
counterexample, so that you can explain the problem better by spotting the gap.
What if I say that debugQuery=true provides tf and idf for the terms and
documents from the requested page of results? Why can't you use explain to
solve the
I'll give that a shot, thanks!
On Wed, Jul 3, 2013 at 12:28 PM, Shawn Heisey wrote:
> On 7/3/2013 9:29 AM, Neal Ensor wrote:
>
>> Posted the solr config up as http://apaste.info/4eKC (hope that works).
>> Note that this is largely a hold-over from upgrades of previous solr
>> versions, there
Hi,
We are upgrading solr 4.0 to solr 4.3.1 on tomcat 7.
We would like to use the "compositeId" router. It seems that there are two ways
to do that: 1. using collections API to create a new collection by passing
"numShards"; 2. Passing "numShards" in bootstrap process.
For 1, we have a large a
If never used, they take up zero space in the index.
If they were used but are no longer used, they're still there, but any new
or replaced documents will not take up any space for the unused fields
(subject to the fact that deleted fields still exist until a merge/optimize
compresses them aw
On 7/3/2013 9:22 AM, Ali, Saqib wrote:
What would be the use case for such a field:
and
I have a field like this in my schema. That field is used as one of the
source fields that get copied to my "catchall" field. I don't need the
field by itself, but I use it in conj
Hi,
Try this instead:
http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
Background info: https://issues.apache.org/jira/browse/SOLR-1499
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Wed, Jul 3, 2013 at 2:50 AM
On 7/3/2013 9:29 AM, Neal Ensor wrote:
Posted the solr config up as http://apaste.info/4eKC (hope that works).
Note that this is largely a hold-over from upgrades of previous solr
versions, there may be lots of cruft left over. If it's advisable to do
so, I would certainly be open to starting
You can do a reload, yes, but a commit() is considerably faster.
On Tue, Jul 2, 2013 at 10:35 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:
> Wouldn't it be better to do a RELOAD?
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> Michael Della Bitta
>
> Applications Deve
Sounds like some jars cannot be found.
Maybe you can show the diff?
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Wed, Jul 3, 2013 at 12:19 PM, Van Tassell, Kristian
wrote:
> I made a minor change in my solr schema and suddenl
Exactly. And the newest shard can also be kept small (e.g. maybe just
last 12h is OK to hit first and dig deeper only if you can't find
enough stories in the last 12h), which means it will fit in memory and
be crazy fast.
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Mo
I made a minor change in my solr schema and suddenly Solr won't start (4.2
running in Tomcat). I have the same files and configuration on another machine
(running 4.2 and Tomcat), and the same configuration on yet another (a JBoss
one).
All other webapps start ok. Has anyone seen this before?
If you are a programmer, you can modify it and attach a patch in Jira...
On Tue, Jun 4, 2013 at 4:25 AM, Marcin Rzewucki wrote:
> Hi there,
>
> StatsComponent currently does not have median on the list of results. Is
> there a plan to add it in the next release(s) ? Shall I add a ticket in
>
Hi,
Does that OR query need to be scored?
Does it repeat?
If answers are no and yes, you should use fq, not q.
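
As a rough illustration of the advice above, the sketch below moves a repeated, unscored OR clause from `q` into `fq`. `q` and `fq` are standard Solr request parameters; the field name `category`, the terms, and the helper name `build_select_params` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_select_params(main_query, field, or_terms):
    """Build a query string with the OR clause as a filter query (fq).

    The fq clause restricts the result set without contributing to the
    score, and Solr can cache it for reuse across requests.
    """
    filter_query = "{}:({})".format(field, " OR ".join(or_terms))
    return urlencode({"q": main_query, "fq": filter_query})


# Hypothetical field and terms; append the result to .../select?
params = build_select_params("*:*", "category", ["books", "music", "film"])
print(params)
print(parse_qs(params)["fq"])
```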
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Wed, Jul 3, 2013 at 12:07 PM, Kevin Osborn wrote:
> Also, what is th
Hi,
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters says:
catenateWords="1" causes maximum runs of word parts to be catenated:
"wi-fi" => "wifi"
default is 0
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Wed, Ju
Also, what is the total document count for your result set? We have an
application that is also very slow because it does a lot of OR queries. The
problem is that the result set is very large because of the ORs. Profiling
showed that Solr was spending the bulk of its time scoring the documents.
Al
Hi,
I think nobody in the community is focused on field
collapsing/grouping, so I suspect there won't be a fix until somebody
gets a strong-enough itch or a business requires it so much that it
decides it pays to invest in the contribution.
Otis
--
Solr & ElasticSearch Support -- http://sematext.c
Hi,
In my schema.xml, I have the following settings:
This does a great job for most of my text, but one thing it does that I don't
like is that it won't replace underscores with spaces; it strips them. For example,
Or you may have dynamic field as stored but ignore some specific fields
that would otherwise be matching the dynamic field mask. Useful if you are
trying to get metadata but not content out of something.
This is based on specific field names matching before dynamic ones.
Regards,
Alex.
Person
We should consider adding another parameter "RealTime" in the log. That
would really help all of us trying to figure out how much time a query is
taking.
On Tue, Jun 4, 2013 at 5:14 PM, Otis Gospodnetic wrote:
> Right. The main takeaway is that QTime is not exactly what user sees.
> What users
Hello all,
Do unused fields in Solr schema.xml increase the size of the index files?
Should we be cleaning up those fields?
Thanks.
Saqib
very interesting. thank you all for the explanation!!! :)
On Wed, Jul 3, 2013 at 8:32 AM, Jack Krupansky wrote:
> Setting both indexed and stored to false means to ignore input values for
> that field.
>
> The effective use case is that these fields may have values in the update
> input stream a
In your schema you can define a Field type, and have it remove anything
after the ".".
Or use something like
http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
On Wed, Jul 3, 2013 at 12:35 AM, archit2112 wrote:
> I'm successfully able to index pdf, doc, ppt, etc. files using the Data
Good point. You might have input fields for year, month, and day. You could
ignore those, but use them to make a date field.
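
A small sketch of that idea, done on the client side before the document is sent to Solr. The helper name `to_solr_date` is hypothetical, and treating the value as midnight UTC is an assumption; the output uses the ISO 8601 form with a trailing "Z" that Solr date fields expect.

```python
from datetime import datetime


def to_solr_date(year, month, day):
    """Combine separate year/month/day inputs into one Solr date string.

    Assumption: the date is interpreted as midnight UTC.
    datetime() raises ValueError for impossible dates such as month 13.
    """
    return datetime(int(year), int(month), int(day)).isoformat() + "Z"


print(to_solr_date("2013", "7", "3"))
```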
wunder
On Jul 3, 2013, at 8:32 AM, Jack Krupansky wrote:
> Setting both indexed and stored to false means to ignore input values for
> that field.
>
> The effective use
Thanks! Unfortunately it is not working. I use Solr 3.6.2.
This is how I have defined the field type:
When I query for "*\?" I still get false positives, i.e. values that do not
end with a question mark.
Am I miss
Most tokenizers would treat "?" as punctuation - to be ignored. The white
space tokenizer will preserve all punctuation.
Or, use a raw string field ("string"/StrField).
In either case, you would need to escape the "?" in the query parser (with a
backslash) since it is a wildcard character.
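
To make the escaping step concrete, here is a minimal client-side sketch. The helper name `escape_query_term` is made up; the character set is the list of special characters from the Lucene query parser syntax, each of which is prefixed with a backslash.

```python
# Characters with special meaning in the Lucene/Solr query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_term(term):
    """Prefix every query-syntax character in `term` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


# A literal value is escaped as a whole ...
print(escape_query_term("2013?"))
# ... while an intentional wildcard is added unescaped around the escaped part,
# giving the "ends with a literal question mark" pattern.
print("*" + escape_query_term("?"))
```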
Y
Setting both indexed and stored to false means to ignore input values for
that field.
The effective use case is that these fields may have values in the update
input stream and they will be ignored. Without these field definitions,
those same field values would cause exceptions - references to
Maybe to ignore?
You can set a dynamic Field to ignore as well.
On Wed, Jul 3, 2013 at 9:22 AM, Ali, Saqib wrote:
> Hello all,
>
>
> What would be the use case for such a field:
>
> stored="false"/>
>
>
> and
>
> stored="false"/>
>
>
> ?
>
>
> Thanks.
>
--
Bill Bell
billn
Posted the solr config up as http://apaste.info/4eKC (hope that works).
Note that this is largely a hold-over from upgrades of previous solr
versions, there may be lots of cruft left over. If it's advisable to do
so, I would certainly be open to starting from scratch with a 4.3+ example
configura
Hello all,
What would be the use case for such a field:
and
?
Thanks.
Hi all,
Currently, I am experimenting with the tokenizers.
Assume, I have the values: "2013?", "1900?", "87?".
I want to retrieve all values that end with the question mark as literal.
How do I need to define the field type in the schema.xml to allow for such
a query?
I assume it would be like: *\?
Is
+1
On 3 July 2013 14:58, Jack Krupansky wrote:
> Design your own application layer for both indexing and query that knows
> about both SQL and Solr. Give it a REST API and then your client
> applications can talk to your REST API and not have to care about the
> details of Solr or SQL. That's t
On Jul 3, 2013, at 7:47 AM, Erick Erickson wrote:
> Usually most people
> care about today's news, and a hot story will
> generate lots of queries, all of which are serviced
> by today's shard.
That's really the whole point though - rather than slamming your whole cluster
with every search, th
Design your own application layer for both indexing and query that knows
about both SQL and Solr. Give it a REST API and then your client
applications can talk to your REST API and not have to care about the
details of Solr or SQL. That's the best starting point.
-- Jack Krupansky
-Origin
Indeed, Roman. Thanks for mentioning that. I just took a quick look at that
issue and will look at it even deeper as time permits.
Erik
On Jul 3, 2013, at 09:43 , Roman Chyla wrote:
> Hi Niran, all,
> Please look at JIRA LUCENE-5014. There you will find a Lucene parser that
> does bot
Hi Niran, all,
Please look at JIRA LUCENE-5014. There you will find a Lucene parser that
does both analysis and span queries, equivalent to combination of
lucene+surround, and much more. The ticket needs your review.
Roman
Hi Niran,
No analysis is done with surround. For example if you have lowercase filter in
your index analysis chain then 'Wuthering OR Heights' won't match. Try using
'wuthering OR heights'. Stemming can be problematic in this too.
See limitations in http://wiki.apache.org/solr/SurroundQueryPar
Niran -
Looks like you're being bitten by a known "feature"* of the surround query
parser. It does not analyze the text, as some of the other more commonly used
query parsers do. The dismax, edismax, and "lucene" query parsers all
leverage field analysis on the query terms or phrases. The
One option could be to get the clusterstate.json via the following Solr url
& figure out the leader from the response json:
http://server:port/solr/zookeeper?detail=true&path=%2Fclusterstate.json
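
A hedged sketch of the "figure out the leader from the response json" step. It assumes the cluster state has already been fetched and decoded, and that it follows the Solr 4.x layout of collection, then shards, then replicas, with the leader replica carrying a "leader" property set to "true". The sample document, host names, and the helper name `find_leaders` are made up for illustration.

```python
import json

# Illustrative cluster state only; real output will differ.
SAMPLE_CLUSTER_STATE = """
{
  "collection1": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"base_url": "http://host1:8983/solr",
                         "core": "collection1", "leader": "true"},
          "core_node2": {"base_url": "http://host2:8983/solr",
                         "core": "collection1"}
        }
      }
    }
  }
}
"""


def find_leaders(cluster_state):
    """Map (collection, shard) to the base URL of that shard's leader."""
    leaders = {}
    for collection, info in cluster_state.items():
        for shard, shard_info in info.get("shards", {}).items():
            for replica in shard_info.get("replicas", {}).values():
                if replica.get("leader") == "true":
                    leaders[(collection, shard)] = replica.get("base_url")
    return leaders


print(find_leaders(json.loads(SAMPLE_CLUSTER_STATE)))
```

Leadership can change at any time, so a client relying on this would need to re-read the state rather than cache the result indefinitely.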
On Wed, Jul 3, 2013 at 5:57 PM, vicky desai wrote:
> Hi,
>
> I have a requirement where in I want
Hi,
I have a requirement wherein I want to write to the leader and read from
the replica. The reason is that if a write request is sent to the replica, it
relays it to the leader, and then the leader relays it to all the replicas.
This will help me in saving some network traffic as my application perform
I have the same question. My purpose is to start the dih full process on
the leader and not on a replica.
I tried full import on a replica but watching logs it seemed to me that the
replica was loading data to send it to the leader which in turn has to
update all the replicas.
At least this is what
On Wed, 2013-07-03 at 05:48 +0200, huasanyelao wrote:
> The response time for "OR" query is around 1-2 seconds (the "AND" query is just
> about 30ms-40ms).
The number of hits will also be much lower for the AND-query. To check
whether it is the OR or the size of the result set that is the problem,
You can always query Zookeeper and find that information out.
Take a look at CloudSolrServer, maybe ZkCoreNodeProps etc.
for examples since CloudSolrServer is "leader aware", it
should have some clues...
Or maybe ZkStateReader? I haven't been in that code much,
so I can't be more specific...
But
OK, let me explain the problem.
I have lots of crawled data indexed in my Solr (4.3.0). Let's say a user
creates a search criterion 'X1' and he/she wants to know the occurrences of a
specific term in the result set of that 'X1' search criterion.
And then again he/she creates another search criteria '
Hi,
I have tried to get the surround query parser working against SOLR 4.3.0 and
SOLR 4.0.0 but it does not seem to return any documents for me. I need this
parser to make ordered proximity searches with the operator "W". I have tried
the parser with normal boolean operators like OR and it stil
Not automatically as far as I know. You can use
custom routing to achieve this though.
That said, be careful before you jump this way.
While it _seems_ elegant, something like having
one shard per day for the retention period (say 30 days)
and just rotate the oldest one out, it may not be
best. Cons
I don't think you can, thus the silence. But why do you want
to do this thing? Smells like an XY problem, you've asked
how to do a specific thing without stating the problem. Perhaps
there's a better approach that _is_ do-able.
Best
Erick
On Wed, Jul 3, 2013 at 2:14 AM, Tony Mullins wrote:
> An
On Jul 3, 2013, at 05:48 , huasanyelao wrote:
> Nowadays, I've got an urgent task to improve the "OR" query performance with
> Solr.
> I have deployed 9 shards with SolrCloud on two servers (each server: 16
> cores, 32G RAM).
> The total document count: 60,000,000, total index size : 9G.
> Accord
On Wed, Jul 3, 2013 at 6:48 AM, huasanyelao wrote:
> Nowadays, I've got an urgent task to improve the "OR" query performance with
> Solr.
> I have deployed 9 shards with SolrCloud on two servers (each server: 16
> cores, 32G RAM).
> The total document count: 60,000,000, total index size : 9G.
> Ac
Nowadays, I've got an urgent task to improve the "OR" query performance with
Solr.
I have deployed 9 shards with SolrCloud on two servers (each server: 16 cores,
32G RAM).
The total document count: 60,000,000, total index size : 9G.
According to the requirement, I have to use the "OR" query to get
Hi Fabio,
Sandeep is right - it'll take time. Solr isn't straightforward when you first
start out, but the tutorial is the best first step. You can then adapt the
various config files in the tutorial to suit your situation. I'd recommend
a simple approach to get the hang of it and just index
Hi everyone,
I'm seeing very bad performance when grouping (field collapsing) using
group.facet=true with a large result set.
- I have an index with 2 million documents, and I query with five facet
fields (each with 30+ groups)
- If I set group.facet=false the query can take 2000ms on first r
Many thanks, Tanguy, for your response.
My boss has asked me to upgrade the Solr version first, because we are on 3.3.0
(in order to switch to 4.3.1).
I will try with the new version and come back to you later, OK?
Many thanks again, Tanguy :-)
Kind regards,
Adrien Ruffié
LD : +33 1 73 03 29 50
Tél : +33
Hello I made a very simple custom similarity, mainly for testing and
learning:
public class StaticNormSimilarity extends DefaultSimilarity {
    private static final Logger LOG =
        LoggerFactory.getLogger(StaticNormSimilarity.class);
    private float norm = 0.1f;
    public void setNorm(float norm) {
Hi Sandeep
Thank you for your reply
I'll have a read through the tutorials now that I understand the principle of
all this.
I would ideally like to keep MSSQL and bolt Solr on top of it, so that we
can keep MSSQL, as we have a 200GB database.
Cheers
--
View this message in context:
http://luc
Hello Adrien,
Looking quickly at your schema, I suspect that the suggestions field isn't
populated, so the suggester dictionary is empty.
How is input sent to that field ? Providing a few sample documents you are
indexing could help understand what is going on.
If you intended to copy content
Hi all,
Flax is running a Lucene/SOLR hack day here in Cambridge on Friday, 26th July,
with committer and LucidWorks co-founder Grant Ingersoll. We'll provide the
venue, some food and the internet - you provide enthusiasm and great ideas for
hacking!
Details here:
http://www.meetup.com/Enter
Hi,
We have a solr cloud cluster with 2 different collections, each
collection having many nodes. We try to get the status of each
collection using CoreAdminRequest.
The code gets all live nodes from the cluster and sends a request to
each node until it gets a valid response.
We would like to ha
Hi ,
I have a setup of 1 leader and 1 replica, and I have a requirement wherein
I need to find the leader core from the collection. Is there an API in SolrJ
by means of which this can be achieved?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Is-it-possible-to-find-a-lea