On 5/16/2014 9:24 AM, aiguofer wrote:
> Jack Krupansky-2 wrote
>> Typically the white space tokenizer is the best choice when the word
>> delimiter filter will be used.
>>
>> -- Jack Krupansky
>
> If we wanted to keep the StandardTokenizer (because we make use of the token
> types) but wanted to
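Jack's suggestion can be sketched as a Solr field type pairing the whitespace tokenizer with the word delimiter filter. This is only an illustrative fragment; the attribute values are assumptions, not a recommendation for any particular schema:

```xml
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits on case/number transitions and intra-word delimiters -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```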
On 5/16/2014 6:43 PM, Steve McKay wrote:
> Doing this doesn't avoid the need to configure and administrate ZK. Running a
> special snowflake setup to avoid downloading a tar.gz doesn't seem like a
> good trade-off to me.
>
> On May 15, 2014, at 3:27 PM, Upayavira wrote:
>
>> Hi,
>>
>> I need t
On 5/15/2014 1:34 AM, Alexandre Rafalovitch wrote:
> I thought the date math rounding was for _caching_ the repeated
> queries, not so much the speed of the query itself.
Absolutely correct. When NOW is used without rounding, caching is
completely ineffective. This is because if the same query u
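The caching effect can be sketched in a few lines of Python. `round_to_day` is a hypothetical helper mimicking Solr's `NOW/DAY` date math, not Solr code:

```python
from datetime import datetime, timezone

def round_to_day(dt):
    # Mimic Solr date math "NOW/DAY": truncate to midnight UTC.
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

# Two requests arriving at different times on the same day:
t1 = datetime(2014, 5, 16, 9, 24, 13, 123000, tzinfo=timezone.utc)
t2 = datetime(2014, 5, 16, 18, 43, 2, 987000, tzinfo=timezone.utc)

# Unrounded NOW differs per request, so each produces a distinct
# filter-cache key and the cache never gets a hit:
print(t1 == t2)                              # False
# Rounded to /DAY, both requests share one cache key all day:
print(round_to_day(t1) == round_to_day(t2))  # True
```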
Aman,
The option you have got is:
- write custom components like request handlers, collectors & response
writers..
- first you would do the join, then apply the pagination
- you will get the docList in response writer, you would need to make a
call to the second core (you could be smart to use the
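The join-then-paginate order can be sketched roughly in Python. All names here are hypothetical; a real implementation would live inside the custom Solr components described above:

```python
def join_then_paginate(primary_docs, secondary_by_key, start, rows):
    # 1. Join: enrich each primary-core doc with fields from the second core.
    joined = [{**doc, **secondary_by_key.get(doc["id"], {})}
              for doc in primary_docs]
    # 2. Paginate only after the join, so page boundaries stay correct.
    return joined[start:start + rows]

primary = [{"id": i, "title": f"doc{i}"} for i in range(5)]
secondary = {i: {"price": i * 10} for i in range(5)}
page = join_then_paginate(primary, secondary, start=2, rows=2)
print(page)  # docs 2 and 3, each with a joined "price" field
```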
Regarding your question: "That said, are you sure you want to be using
the payload feature of Lucene?"
I don't know, because I don't know what the benefits of this tokenizer are,
or what payload means here!
On Sat, May 17, 2014 at 2:45 AM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n413
On 5/15/2014 8:29 AM, danny teichthal wrote:
> I wonder about performance difference of 2 indexing options: 1- multivalued
> field 2- separate fields
>
> The case is as follows: Each document has 100 “properties”: prop1..prop100.
> The values are strings and there is no relation between different
Thanks Jack. I am using *q.alt* just for testing purposes; we use
*q=query* in our general production environment, and *mcat.intent* is
our request handler to add an extra number of rows and so on.
I made some mistakes in properly explaining the situation, so I am
sorry for that.
*Requ
Hi Bijan,
Have you tried to set hl.maxAnalyzedChars parameter to larger number?
hl.maxAnalyzedChars
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
As the default value of the parameter is 51200, if the second "Andy" is
at the end paragraph of your large stored field, the
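For example, a request raising that limit (and asking for several snippets); the field name and values are only illustrative:

```
q=content:andy&hl=true&hl.fl=content&hl.snippets=5&hl.maxAnalyzedChars=1000000
```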
I thought the date math rounding was for _caching_ the repeated
queries, not so much the speed of the query itself.
Also, if you are using TrieDateField, precisionStep value is how
optimization is done. There is bucketing at different level of
precision, so the range search works at the least gran
Doing this doesn't avoid the need to configure and administrate ZK. Running a
special snowflake setup to avoid downloading a tar.gz doesn't seem like a good
trade-off to me.
On May 15, 2014, at 3:27 PM, Upayavira wrote:
> Hi,
>
> I need to set up a zookeeper ensemble. I could download Zookeep
Erick,
Thanks for your update.
The problem is that this data will remain until the whole document in the section
is deleted.
I understand this causes optimize to double-scan the index folder in this case.
We may add some logic to do this scan only when the file
size is too big.
Yon
Hi,
How many auto-warming queries are supported per collection in Solr 4.4 and
higher? We see only one of our three queries in the log when a new searcher is created.
Thanks!
@Matt, sorry, we don't want to use products from any organisation other than
the Apache Foundation. Thanks anyway.
Is there anybody else here who can help me with the default Tomcat installation
along with Solr to configure SolrCloud?
With Regards
Aman Tandon
On Wed, May 14, 2014 at 8:13 AM, Matt Kuiper (Spr
Hi Markus,
SPDY does provide lower latency in the case when I have multiple requests to
the same server/domain. It compresses the header and reduces the number of
connections. But since it uses TLS I am not sure if it will be faster than HTTP
1.1. That is why I wanted to test SPDY with solr for
This seems to be a relevant discussion:
http://stackoverflow.com/questions/9932722/android-app-solr .
Including some code links.
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
On Sat, May 17, 2014
Hi,
In my Solr 4.2 we were using the two cores as described below:
How should I set up Solr 4.7 with core.properties for the mcat and cat
cores to use SolrCloud?
With Regards
Aman Tandon
The config is stored in ZooKeeper.
/configs/myconf/velocity/pagination_bottom.vm is a ZooKeeper path, not a
filesystem path. The data on disk is in ZK's binary format. Solr uses the ZK
client library to talk to the embedded server and read config data.
On May 16, 2014, at 2:47 AM, Aman Tandon
On Thu, May 15, 2014 at 3:44 PM, Jean-Sebastien Vachon
wrote:
> I spent some time today playing around with subfacets and facet functions
> now available in Heliosearch 0.05 and I have some concerns... They look
> very promising.
Thanks, glad for the feedback!
[...]
> the response looks go
Hi everyone,
I'm investigating migrating over to an HDFS-based Solr Cloud install.
We use Cloudera Manager here to maintain a few other clusters, so
maintaining our Solr cluster with it as well is attractive. However, just
from reading the documentation, it's not totally clear to me what
version(
It's "spellcheck.maxResultsForSuggest".
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest
James Dyer
Ingram Content Group
(615) 213-4311
-Original Message-
From: Jan Verweij - Reeleez [mailto:j...@reeleez.nl]
Sent: Monday, May 12, 2014 2:12 AM
To: solr-user@
Hi all!
I'm using Solr 4.6.0. I created three collections and combined them into an
alias via the CREATEALIAS API.
I ran a delete request via curl, e.g.:
curl
"http://127.0.0.1:8080/solr/all/update?stream.body=%3Cdelete%3E%3Cquery%3EText:dummy%3C/query%3E%3C/delete%3E";
where "all" is an alias of three collection
> Hi,
>
> We have a couple of Solr servers acting as master and slave, and each
> server have the same amount of cores, we are trying to configure the
> solrcore.properties so that an script is able to add cores without
> changing the solrcore.properties using a hack like this:
>
> enable.master=fa
Add the debugQuery=true parameter and look at the "timing" section to see
which search component is consuming the time. Are you using faceting or
highlighting?
7 million documents is actually a fairly small index.
-- Jack Krupansky
-Original Message-
From: mizayah
Sent: Wednesday, M
1. Indexing 100-200 docs per second.
2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
indexing).
3. Indexing for 10-20 minutes and doing hard commit.
4. Doing Pkill -9 java to the leader and then starting one replica in shard
3 (while indexing).
I think you're in uncharted t
Could you please share your solrconfig and schema here to help debug
the issue? You could also try adding the extra parameter
(&debugQuery=true) to your request params. Then you can view
parsed_query, the actual query as parsed by Solr.
With Regards
Aman Tandon
On Thu, May 1
This is almost always that you're committing too often, either soft
commit or hard commit with openSearcher=true. Shouldn't have any
effect on the consistency of your index though.
It _is_ making your Solr work harder than you want it to, so consider
increasing the commit intervals substantially.
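A hedged solrconfig.xml sketch with longer intervals; the millisecond values are illustrative assumptions, to be tuned to your visibility requirements:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to disk, but do not open a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: controls visibility; make it as long as you can tolerate -->
  <autoSoftCommit>
    <maxTime>30000</maxTime>
  </autoSoftCommit>
</updateHandler>
```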
My e-book has an example of an update processor that rounds to any specified
resolution (e.g., day, year, hour, etc.)
The performance reason was for filter queries, to keep their uniqueness
down, not random user queries, which should be fine unrounded, except that
they can't be used for exact q
Take a look at Solr's use of DocValues:
https://cwiki.apache.org/confluence/display/solr/DocValues.
There are docValues options that use less memory than the FieldCache.
Joel Bernstein
Search Engineer at Heliosearch
On Thu, May 15, 2014 at 6:39 AM, Jeongseok Son wrote:
> Hello, I'm struggling
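For instance, a schema.xml field declared with docValues; the field name and type here are assumptions for illustration:

```xml
<field name="price" type="tdouble" indexed="true" stored="true" docValues="true"/>
```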
Any help here?
With Regards
Aman Tandon
On Thu, May 15, 2014 at 7:33 PM, Aman Tandon wrote:
> Hi,
>
> In my solr-4.2 we were using the two cores as described below:
>
>
>hostContext="${hostContext:}" zkClientTimeout="${zkClientTimeout:15000}">
>
>
Really, really, _really_ consider denormalizing the data. You're
trying to use Solr
as a RDBMS. Solr is a _great_ search engine, but it's not a DB and trying
to make it behave as one is almost always a mistake.
Using joins should really be something you try _last_.
Best,
Erick
On Tue, May 13, 20
the first time you use any fq clause, it's evaluated pretty much as
though you'd just ANDed it in to the main clause. It's only if you use
the fq clause again that the query can take advantage of the caching.
But one query does not a pattern make. Is this right after you've
started the server? Or
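The behavior can be modeled with a toy filter cache in Python (hypothetical names; Solr's real filterCache is configured in solrconfig.xml):

```python
filter_cache = {}
evaluations = []

def evaluate_filter(fq):
    # Stand-in for matching every document against the fq clause.
    evaluations.append(fq)
    return {"docs-matching-" + fq}

def cached_filter(fq):
    # First use: full evaluation, roughly as costly as ANDing fq
    # into the main query. Later uses: served from the cache.
    if fq not in filter_cache:
        filter_cache[fq] = evaluate_filter(fq)
    return filter_cache[fq]

cached_filter("category:books")   # evaluated
cached_filter("category:books")   # cache hit, no re-evaluation
print(len(evaluations))           # 1
```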
Hello,
I am trying to import data from my DB into Solr.
In the DB I have two tables:
- orders [order_id, user_id, created_at, store]
- order_items [order_id, item_id] (one-to-many relation)
I would like to import this into my Solr collection:
- orders [user_id (unique), item_id (multivalued)]
i s
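With the DataImportHandler, this one-to-many shape is usually expressed as a nested entity. A hedged data-config.xml sketch, with table and column names taken from the message and connection details omitted:

```xml
<document>
  <entity name="orders"
          query="SELECT order_id, user_id FROM orders">
    <field column="user_id" name="user_id"/>
    <!-- One row per item; all rows map onto one multivalued field -->
    <entity name="order_items"
            query="SELECT item_id FROM order_items
                   WHERE order_id = '${orders.order_id}'">
      <field column="item_id" name="item_id"/>
    </entity>
  </entity>
</document>
```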
Hello all,
I am trying to return multiple snippets from a single document with a field
which includes many (5+) instances of the word ‘andy’ in the text. For some
reason, I can only get it to return one snippet. Any ideas?
Here’s the query and the response:
http://codejaw.com/2gwoozr
Thanks!
Hi list
I created a small token filter which I'd gladly "contribute", but want to
know if there's any interest in it before I go and make it pretty, add
documentation, etc... ;)
I originally created it to index domain names: I wanted to be able to
search for "google.com" and find "www.google.com"
Thanks Dmitry!
On 05/15/2014 07:54 AM, Dmitry Kan wrote:
Hi Mike,
The core name can be accessed via: ${solr.core.name} in solrconfig.xml
(verified in a solr replication config).
HTH,
Dmitry
On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
It seems as
Yes, thank you.
I managed to solve it by adding a literal when indexing.
Now I'm trying to implement it in my Android application. I used SolrJ,
but I found out that SolrJ is just for Java applications and does not work
on Android.
Can you suggest a way to index a folder from my Android appli
That's a lot of tweets. There is an article talking about smaller
scale lessons, might be still useful:
http://ricston.com/blog/guerrilla-search-solr-run-3-million-documents-search-15month-machine/
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-s
Hi
How does one become a Solr committer? Any suggestions?
Regards
Mukund
Hi Aiguofer,
You mean ClassicTokenizer? Because StandardTokenizer does not set token types
(e-mail, url, etc).
I wouldn't go with the JFlex edit, mainly because of maintenance costs. It will be
a burden to maintain a custom tokenizer.
MappingCharFilters could be used to manipulate tokenizer beha
Hi
I have set up a default cloud cluster 4.6.0 with the inbuilt ZooKeeper running on
Jetty. As I started indexing, things went fine until a few thousand documents, but
after some 5000 documents or so it started giving an error (please find it below)
and the indexing stopped too, as the ZooKeeper leader selection was in
: me incorporate these config files as before. I'm (naively?) trying the
: following:
:
: final StandardQueryParser parser = new StandardQueryParser();
: final Query luceneQuery = parser.parse(query, "text");
: luceneIndex.getIndexSearcher().search(luceneQuery, collector);
Hi Mike,
The core name can be accessed via: ${solr.core.name} in solrconfig.xml
(verified in a solr replication config).
HTH,
Dmitry
On Fri, May 9, 2014 at 4:07 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
> It seems as if the location of the suggester dictionary directory is n
Hi,
Can anybody tell me where the embedded ZooKeeper keeps your config
files? When we specify the configName when starting SolrCloud, it
gives that name to the directory, as guessed from the Solr logs.
*4409 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath:
/conf
On 5/14/2014 7:15 AM, nativecoder wrote:
> Can someone please tell me the difference between searching a text in the
> following ways
>
> 1. q=Exact_Word:"samplestring" -> What does it tell to solr ?
>
> 2. q=samplestring&qf=Exact_Word -> What does it tell to solr ?
>
> 3. q="samplestring"&qf=
Hello, I'm struggling with large data indexed and searched by Solr.
The schema of the documents consist of date(-MM-DD), text(tokenized and
indexed with Natural Language Toolkit), and several numerical fields.
Each document is small-sized but the number of the docs is very large,
which is
Jack Krupansky-2 wrote
> Typically the white space tokenizer is the best choice when the word
> delimiter filter will be used.
>
> -- Jack Krupansky
If we wanted to keep the StandardTokenizer (because we make use of the token
types) but wanted to use the WDFF to get combinations of words that ar
Hi,
I don't have a system that searches on URLs, so I don't fully follow.
But I remember people use URLClassifyProcessorFactory
On Friday, May 16, 2014 8:33 PM, Nitzan Shaked wrote:
Doesn't look like it. If I understand it correctly,
PathHierarchyTokenizerFactory
will only output prefixes. I su
Hi,
There was a mention either on solr wiki or on this list, that in order to
optimize the date range queries, it is beneficial to round down the range
values.
For example, if a range query is:
DateTime:[NOW-3DAYS TO NOW]
then if the precision up to msec is not required, we can safely round tha
Some of the data providers for Twitter offer a search API. Depending on
what you're doing, you might not even need to host this yourself.
My company does do search and analytics over tweets, but by the time we end
up indexing them, we've winnowed down the initial set to 10% of what we've
initially
Hi Nitzan,
Can't you do what you described with PathHierarchyTokenizerFactory?
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizerFactory.html
Ahmet
On Friday, May 16, 2014 5:13 PM, Nitzan Shaked wrote:
Hi list
I created a small token
How are you getting the data into Solr?
Solr is not a storage or database system. It's a search engine. So,
usually, you would have your filesystem with files and then you feed
those to Solr for indexing. When you found what you are looking for,
you can have the particular file delivered by what
Doesn't look like it. If I understand it correctly,
PathHierarchyTokenizerFactory
will only output prefixes. I support suffixes as well, plus the
ever-so-useful "unanchored" sub-sequences. Using domains again as an
example, I can use my suggestion to query "market.ebay" and find "
www.market.ebay.c
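The "unanchored sub-sequences" idea can be sketched in a few lines of Python; this is a toy model of the token filter, not the actual implementation:

```python
def label_subsequences(domain):
    # Emit every contiguous run of dot-separated labels.
    labels = domain.split(".")
    return {".".join(labels[i:j])
            for i in range(len(labels))
            for j in range(i + 1, len(labels) + 1)}

tokens = label_subsequences("www.market.ebay.com")
print("ebay.com" in tokens)     # True  (suffix)
print("market.ebay" in tokens)  # True  (unanchored sub-sequence)
print("google.com" in tokens)   # False
```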
Please add me to the list of contributors. Username: al.krinker
There are some minor CSS tweaks that I would like to fix.
I work with Solr almost daily, so I would love to contribute to make it
better.
Thanks,
Al
Have you looked at "spellcheck.collate", which re-writes the entire query with
one or more corrected words? See
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate . There are
several options shown at this link that control how the "collate" feature
works.
James Dyer
Ingram C
Al
I've added you :)
A minor note aside: being listed in the contributors group in the wiki doesn't
mean you can change/commit to the Lucene/Solr repository automatically, but
improvements are always welcome. You can read about it on
https://wiki.apache.org/solr/HowToContribute
-Stefan
O
Hi all,
I created a new patch https://issues.apache.org/jira/browse/SOLR-6063 ,
enabling changes in core properties without the need to unload and create it.
Considering the change in patch,
is reloading a core with transient=true and loadOnStartup=false equivalent in
memory footprint to unloadi
Thanks that worked
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-How-to-index-scripts-sh-and-SQL-tp4135627p4136207.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 5/13/2014 8:56 AM, nativecoder wrote:
> Exact_Word" omitPositions="true" termVectors="false"
> omitTermFreqAndPositions="true" compressed="true" type="string_ci"
> multiValued="false" indexed="true" stored="true" required="false"
> omitNorms="true"/>
>
> multiValued="false" indexed="true" stor
> The q.alt param specifies only the parameter to use if the q parameter is
> missing. Could you verify whether that is really the case? Typically
> solrconfig gives a default of "*:*" for the q parameter. Specifying a
> query
> via the q.alt parameter seems like a strange approach - what is your
>
> Also could you please tell me the difference between searching a text in
> the
> following ways
>
> q=Exact_Word:"samplestring"
>
> q=samplestring&qf=Exact_Word
>
> I am trying to understand how enclosing the full term in "" is resolving
> this problem ? What does it tell to solr ?
The quotes t
I guess you will need to modify your extraction SELECT in order to fix it,
using some date functions provided by the database vendor. For
example, in some projects when using Oracle as a data source I've been
using the following recipe to convert the Oracle TIMESTAMP(6) datatype to fit the
solr da
For these specific examples, the results should be the same, but mostly
that's because the term is a simple sequence of letters.
I have an extended discussion of characters in terms in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product
Hi;
When I index documents via SolrJ it sends the documents as XML. Solr processes
them with XMLLoader, then sends the response in javabin format.
Why doesn't the SolrJ client send data in javabin format by default?
PS: I use Solr 4.5.1
Thanks;
Furlan KAMACI
Are you passing LBHttpSolrServer to the c'tor of CloudSolrServer or
just using it bare?
On Wed, May 14, 2014 at 12:16 AM, lboutros wrote:
> In other words, is there a way for the LBHttpSolrServer to ignore replicas
> which are currently "cold" ?
>
> Ludovic.
>
>
>
> -
> Jouve
> France.
> --
I haven't tried a situation like this, but as per your requirements you can
build the schema to define all the fields required by you, like date,
location, etc. You can also configure faceting from solrconfig.xml if
you want the same for every request.
You should give it a try by allocating t
Aman,
if you don't trust the Tomcat bits repackaged by Heliosearch, perhaps the best
step for you is to look at the Heliosearch packaging and configs in a
test environment; you can diff out the deltas between how they set up Tomcat
to work with Solr from the regular distribution you mig
On 5/13/2014 3:12 AM, Anon15 wrote:
> Thanks for replying !
>
> This is my Schema.xml.
The XML is gone. I would imagine that this is happening because you're
posting on the Nabble forum, not directly to the mailing list. Nabble
is a two-way mirror of the list, the actual mailing list is the tr
On 5/15/2014 2:10 AM, Aman Tandon wrote:
> @Matt, Sorry we don't want to use any other organisation product other than
> Apache Foundation. Thanks anyway.
>
> Anybody else here who can help me with the default tomcat installation
> along with solr to configure solrcloud.
Heliosearch was created b
After years of building world-wide search services, I disagree.
The general rule is to do everything in Unicode and UTC and to convert at the
edges of the service. If you use local character sets or local time, you will
pay for it.
wunder
On May 14, 2014, at 5:27 AM, "Jack Krupansky" wrote:
Hi,
There is one more problem today: I indexed the mcat core, copied it again,
and then started the shard
(as described in the above thread).
*And I was taking my non-sharded index (mcat's index) and copying it to node 1
as well as node 2, and starting the first node as:*
I noticed that there i
Hello,
We are using Solr 4.3.1 and running it in SolrCloud mode. We would
like to keep some dynamic configs under the data directory of every shard
and/or replica of a collection. I would like to know, if a node is not
in an active state (let's say it is in recovery or another state), whether it
co
Hmm, I got a message yesterday about emails sent to me from the list
bouncing; I wonder if there's something odd going on with the mailing list?
Cheers
Peri (x4082)
On 10 May 2014 13:30, Aman Tandon wrote:
> Hi,
>
> I am not getting any mails from this group, did my subscription just got
> ended
Is there a way to unload the complete collection in SolrCloud env? I can
achieve the same by unloading all shards of collection using core admin
API, but is there a better/cleaner approach?
-Saumitra
--
View this message in context:
http://lucene.472066.n3.nabble.com/Unload-collection-in-Solr
I will try with the SolrEntityProcessor
but I'm still interested to know why it will not work with the
XPathEntityProcessor
--
View this message in context:
http://lucene.472066.n3.nabble.com/URLDataSource-indexing-from-other-Solr-servers-tp4135321p4135730.html
Sent from the Solr - User mailin
Perhaps because du reports disk block usage, not total file size?
Upayavira
On Wed, May 7, 2014, at 04:34 AM, Darrell Burgan wrote:
Hello all, I’m trying to reconcile what I’m seeing in the file system
for a Solr index versus what it is reporting in the UI. Here’s what I
see in the UI for
Hi,
Recently I have set up an image with SOLR. My goal is to index and extract
files on a Windows and Linux server. It is possible for me to index and
extract data from multiple file types. This is done by the SOLR CELL request
handler. See the post.jar cmd below.
java -Dauto -Drecursive -jar po
Are you talking about static warming queries, which you define as
newSearcher and firstSearcher events? If so, you should see all three
queries in the log. If you're still having the issue, can you post your
warming query configuration?
Joel Bernstein
Search Engineer at Heliosearch
On Wed, May 7
Thank you Mark.
The issue : https://issues.apache.org/jira/browse/SOLR-6086
Ludovic.
-
Jouve
France.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Replica-active-during-warming-tp4135274p4136038.html
Sent from the Solr - User mailing list archive at Nabble.com.
I wonder about performance difference of 2 indexing options: 1- multivalued
field 2- separate fields
The case is as follows: Each document has 100 “properties”: prop1..prop100.
The values are strings and there is no relation between different
properties. I would like to search by exact match on se
Is the mailing list working again yet?
-- Jack Krupansky
There is also solrjmeter tool that wraps jmeter inside:
https://github.com/romanchyla/solrjmeter
I have tried it and saw more interesting graphs.
You can also plot the Solr cache stats and other metrics by querying
with the /admin/mbeans?stats=true&wt=json suffix on your core/collection and
using som
Hi All,
I spent some time today playing around with subfacets and facet functions now
available in Heliosearch 0.05 and I have some concerns... They look very
promising.
I indexed 10 000 documents and built some queries to look at each feature and
found some weird behaviour that I could no
To achieve what you want, you need to specify a lightly analyzed field (no
stemming) for spellcheck. For instance, if your "solr.SpellCheckComponent" in
solrconfig.xml is set up with "field" of "title_full", then try using
"title_full_unstemmed". Also, if you are specifying a
"queryAnalyzerFi
When a core is unloaded it is unregistered from ZooKeeper and stops taking
requests, while retaining data on disk (with default params).
Can someone explain what happens internally and how memory, CPU and network
bandwidth will be affected if we load/unload shards frequently in SolrCloud
setup using
Please read:
http://wiki.apache.org/solr/UsingMailingLists
and the contained link:
http://catb.org/~esr/faqs/smart-questions.html
On Tue, May 13, 2014 at 12:03 AM, Kamal Kishore
wrote:
> NO reply from anybody..seems strange ?
>
>
> On Fri, May 9, 2014 at 9:47 AM, Kamal Kishore
> wrote:
>
>> Any
Hi,
I need to set up a zookeeper ensemble. I could download Zookeeper and do
it that way. I already have everything I need to run Zookeeper within a
Solr install.
Is it possible to run a three node zookeeper ensemble by starting up
three Solr nodes with Zookeeper enabled? Obviously, I'd only use
Hi All,
Is there an API in Solr to change transientCacheSize dynamically without the
need to restart Solr?
Are there other Solr configuration parameters that can be changed dynamically?
Thanks.
Hi,
I am not getting any mails from this group; did my subscription just
end? Is there anybody who can help?
With Regards
Aman Tandon
Have you looked at the results after adding &debug=query? That often
gives you valuable insights into such questions. Admittedly, the debug
output can be "interesting" to get used to...
Best,
Erick
On Tue, May 13, 2014 at 9:11 PM, nativecoder wrote:
> Yes that happens due to the ! mark.
>
> Also ca
Any help here?
With Regards
Aman Tandon
On Thu, May 15, 2014 at 10:17 PM, Aman Tandon wrote:
> Hi,
>
> Can anybody tell me where does the embedded zookeeper keeps your config
> files.when we describe the configName in starting the solrcloud then it
> gives that name to the directory, as guesse
ok, thanks a lot, I'll check that out.
2014-05-14 14:20 GMT+02:00 Markus Jelsma :
> Elisabeth, i think you are looking for SOLR-3211 that introduced
> spellcheck.collateParam.* to override e.g. dismax settings.
>
> Markus
>
> -Original message-
> From:elisabeth benoit
> Sent:Wed 14-05-2
It hasn't had any improvements for a long time now (it doesn't have any
SolrCloud-related features, for example). I just added a note on the Solr wiki to
alert users about that.
you have any other questions.
Tomás
On Wed, May 14, 2014 at 3:56 AM, Ahme