Well, I recently chose it for a personal project, and the deciding
factor for me was its nice integration with CouchDB.
Thanks,
Bryan Rasmussen
On Wed, Jun 1, 2011 at 4:33 AM, Mark wrote
> I've been hearing more and more about ElasticSearch. Can anyone give me a
> rough overview on how these
Hi all,
We're using Solr to search on a Shop index and a Product index. Currently a
Shop has a field `shop_keyword` which also contains the keywords of the
products assigned to it. The shop keywords are separated by a space.
Consequently, if there is a product which has a keyword "apple" and anot
Hello to all,
I have a collection of text phrases in more than 20 languages that I'm indexing
in solr. Each phrase belongs to one of about 30 different phrase categories. I
have specified different fields for each language and added a synonym filter at
query time. I would however like the syno
I don't think you can assign a synonyms file dynamically to a field.
You would need to create separate fields for each language/category of
phrases, each referencing its own synonyms file. That would
be a lot of fields.
On 1 June 2011 09:59, Spyros Kapnissis wrote:
> Hello to all,
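A sketch of what that one-field-per-combination setup could look like in schema.xml (the field-type and file names here are made up for illustration):

```xml
<!-- One field type per language/category pair, each pointing at its own synonyms file -->
<fieldType name="phrase_en_products" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en_products.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
<!-- ...repeated for every other language/category combination -->
```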
If your index size is smaller (a few hundred MBs), you can consider Solr's
operational script tools, provided with the distribution, to sync indexes from
the master to the slave servers. They copy the latest index snapshot
from the master to the slave(s). The Solr wiki provides good info on how to set them as
C
Thanks pravesh ^_^
You said, "BTW, SOLR1.4+ also has a feature where only the changed segments get
synched".
Can you give me a document or some more detailed information, please? I've looked
through the online documents but didn't find anything.
Thanks very much.
From: pravesh
Sent: 2011-06-01
>>We're using Solr to search on a Shop index and a Product index
Do you have two separate indexes (using distributed shard search)? I suspect
you actually have only a single index.
>> Currently a Shop has a field `shop_keyword` which also contains the
>> keywords of the products assigned to it.
The Solr wiki will provide help on this. You might be interested in the pure Java-based
replication too. I'm not sure whether the Solr operational scripts have this
feature (syncing only the changed segments). You might need to change the
configuration in solrconfig.xml
--
View this message in context:
http://lucene
On Tue, 31 May 2011 19:38 -0700, "Jason Rutherglen"
wrote:
> Mark,
>
> Nice email address. I personally have no idea, maybe ask Shay Banon
> to post an answer? I think it's possible to make Solr more elastic,
> eg, it's currently difficult to make it move cores between servers
> without a lot
That's pretty awesome. Thanks, Renaud!
On Tue, 2011-05-31 at 22:56 +0100, Renaud Delbru wrote:
> Hi,
>
> have a look at the flexible query parser of lucene (contrib package)
> [1]. It provides a framework to easily create different parsing logic.
> You should be able to access the AST and to mod
Thanks for your point. That issue was really tripping me up. But now I need a
bit more help.
As far as I have noticed, in the case of a value like "role_delete",
WordDelimiterFilterFactory
indexes two words, "role" and "delete", and in both search results with
the term "role" and "delete" w
My OS is also CentOS (5.4). If it were 10 GB all the time it would be
ok, but it grows to 13-15 GB and hurts other services =\
> It could be environment specific (specific of your "top" command
> implementation, OS, etc)
> I have on CentOS 2986m "virtual" memory showing although -Xmx2g
> You
Hi guys,
Just to let you know we're meeting up to talk all-things-search on Monday
13th June. There's usually a good mix of backgrounds and experience levels
so if you're free and in the London area then it'd be good to see you there.
Details:
7pm - The Elgin - 96 Ladbrooke Grove
http://www.meetu
You may be interested in Solr's replication feature?
http://wiki.apache.org/solr/SolrReplication
On 6/1/2011 2:07 AM, wrote:
Hi everyone,
If I have two servers, their indexes should be synchronized. I change A's
index by sending document objects via HTTP. Is there any config or some plug-in
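For reference, the Java-based replication described on that wiki page is configured roughly like this (host names, ports, and the poll interval below are illustrative):

```xml
<!-- solrconfig.xml on server A (master) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on server B (slave) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://serverA:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```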
On 6/1/2011 6:12 AM, pravesh wrote:
The Solr wiki will provide help on this. You might be interested in the pure Java-based
replication too. I'm not sure whether the Solr operational scripts have this
feature (syncing only the changed segments). You might need to change the
configuration in solrconfig.xml
Yes, thi
Lee,
Thank you very much for your answer.
Using the signature field as the uniqueKey is effectively what I was
doing, so the "overwriteDupes=true" parameter in my solrconfig was
somewhat redundant, although I wasn't aware of it! =D
In practice it works perfectly, and that's the nice part.
By
Could you post one of your pairs of definitions? I don't recognize
queryMode, and a web search doesn't turn anything up, so I'm puzzled.
Best
Erick
On Wed, Jun 1, 2011 at 1:13 AM, Mike Schultz wrote:
> We have very long schema files for each of our language dependent query
> shards. One
If I read this correctly, one approach is to specify an
increment gap in a multiValued field, then search for phrases
with a slop less than that increment gap. i.e.
incrementGap=100 in your definition, and search for
"apple orange"~99
If this is gibberish, please post some examples and we'll
try s
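For example (the gap value and field-type name below are just placeholders), the gap goes on the field type in schema.xml:

```xml
<!-- A positionIncrementGap of 100 puts 100 phantom positions between the values
     of a multiValued field, so "apple orange"~99 cannot match across two values -->
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```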
Take a look here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
I think you want generateWordParts=1, catenateWords=1 and preserveOriginal=1,
but check it out with the admin/analysis page.
Oh, and your index-time and query-time patterns for WDFF will
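In the analyzer chain those settings would look something like this (a sketch; do verify the behavior on the admin/analysis page):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        catenateWords="1"
        preserveOriginal="1"/>
```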
You might have more luck going the other way, applying the
field collapsing patch to trunk. This is currently being worked
on, see:
https://issues.apache.org/jira/browse/SOLR-2564
Best
Erick
On Wed, Jun 1, 2011 at 12:22 AM, Isha Garg wrote:
> Hi,
> Actually currently I am using solr ve
Hi Tomás,
Thank you very much for your suggestion. I took another crack at it using
your recommendation and it worked ideally. The only thing I had to change
was
to
The first did not produce any results but the second worked beautifully.
Thanks!
Brian Lamb
2011/5/31 Tomás Fernánde
Yes, that would probably be a lot of fields. I guess one way would be to extend
the SynonymFilter and change the format of the synonyms.txt file to take the
categories into account.
Thanks again for your answer.
From: lee carroll
To: solr-user@lucene.apache.o
> I'm likely to try playing with moving cores between hosts soon. In
> theory it shouldn't be hard. We'll see what the practice is like!
Right, in theory it's quite simple; in practice I've set up a master,
then a slave, then had to add replication to both, then call create
core, then replicate, th
On Wed, 01 Jun 2011 07:52 -0700, "Jason Rutherglen"
wrote:
> > I'm likely to try playing with moving cores between hosts soon. In
> > theory it shouldn't be hard. We'll see what the practice is like!
>
> Right, in theory it's quite simple, in practice I've setup a master,
> then a slave, then h
> And some way to delete the core when it has been transferred.
Right, I manually added that to CoreAdminHandler. I opened an issue
to try to solve this problem: SOLR-2569
On Wed, Jun 1, 2011 at 8:26 AM, Upayavira wrote:
>
>
> On Wed, 01 Jun 2011 07:52 -0700, "Jason Rutherglen"
> wrote:
>> > I
Be a little careful here. LowerCaseTokenizerFactory is different from
KeywordTokenizerFactory.
LowerCaseTokenizerFactory will give you more than one term, e.g.
the string "Intelligence can't be MeaSurEd" will give you 5 terms,
any of which may match, i.e.
"intelligence", "can", "t", "be", "measure
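Side by side, the two setups (a minimal sketch):

```xml
<!-- LowerCaseTokenizerFactory: splits on non-letters AND lowercases,
     so "Intelligence can't be MeaSurEd" becomes five separate terms -->
<analyzer>
  <tokenizer class="solr.LowerCaseTokenizerFactory"/>
</analyzer>

<!-- KeywordTokenizerFactory: keeps the whole input as one token;
     lowercasing needs its own filter -->
<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```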
On 6/1/2011 10:52 AM, Jason Rutherglen wrote:
nightmarish to set up. The problem is, it freezes each core into a
respective role, so if I wanted to then 'move' the slave, I can't,
because it's still set up as a slave.
Don't know if this helps or not, but you CAN set up a core as both a
master and
On 6/1/2011 11:26 AM, Upayavira wrote:
Probably the ReplicationHandler would need a 'one-off' replication
command...
It's got one already, if you mean a command you can issue to a slave to
tell it to pull replication right now. The thing is, you can only issue
this command if the core is co
Jonathan,
This is all true, however it ends up being hacky (this is from
experience) and the core on the source needs to be deleted. Feel free
to post to the issue.
Jason
On Wed, Jun 1, 2011 at 8:44 AM, Jonathan Rochkind wrote:
> On 6/1/2011 10:52 AM, Jason Rutherglen wrote:
>>
>> nightmarish
On 5/31/2011 3:02 PM, Markus Jelsma wrote:
Hi,
I've seen the stats page many times, for quite a few installations and even
more servers. There's one issue that keeps bothering me: the cumulative hit
ratio of the query result cache; it's almost never higher than 50%.
What are your stats? How do y
I believe you need SOME query cache even with low hit counts, for things
like a user paging through results. You want the query to still be in
the cache when they go to the next page or what have you. Other
operations like this may depend on the query cache too for good
performance.
So even w
Here is output after about 24 hours running solr. Maybe there is some
way to limit memory consumption? :(
test@d6 ~/solr/example $ java -Xms3g -Xmx6g -D64
-Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar
2011-05-31 17:05:14.265:INFO::Logging to STDERR via
On Wed, 01 Jun 2011 11:47 -0400, "Jonathan Rochkind"
wrote:
> On 6/1/2011 11:26 AM, Upayavira wrote:
> >
> > Probably the ReplicationHandler would need a 'one-off' replication
> > command...
>
> It's got one already, if you mean a command you can issue to a slave to
> tell it to pull replication
I should have explained that the queryMode parameter is for our own custom
filter. So the result is that we have 8 filters in our field definition.
All the filter parameters (30 or so) of the query time and index time are
identical EXCEPT for our one custom filter which needs to know if it's in
q
Thanks. I'll have to create a Jira account to vote, I guess.
We are already using KStemmer in 1.4.2 production and I would like to
upgrade to 3.1. In the meantime, what is another stemmer I could use out
of the box that would behave similarly to KStemmer?
Thanks
On 5/28/11 10:02 AM, Steven A Ro
Are you in fact out of swap space, as the Java error suggested?
The way JVMs work, if you tell it -Xmx6g, it WILL use all 6 GB
eventually. The JVM doesn't garbage collect until it's going to run out
of heap space, until it gets to your Xmx. It will keep using RAM until
it reaches your
You _could_ configure it as a slave, if you plan to sometimes use it as
a slave. It can be configured as both a master and a slave. You can
configure it as a slave, but turn off automatic polling. And then issue
one-off replicate commands whenever you want.
But yeah, it gets messy, your use
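A sketch of that dual-role setup (the master URL is a placeholder; with no pollInterval the slave does not poll automatically, and you trigger replication yourself with /replication?command=fetchindex):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://other-host:8983/solr/replication</str>
    <!-- no pollInterval: no automatic polling -->
  </lst>
</requestHandler>
```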
So what should I do to avoid that error?
I can use 10 GB on the server; now I try to run with the flags:
java -Xms6G -Xmx6G -XX:MaxPermSize=1G -XX:PermSize=512M -D64
Or should I set Xmx to a lower number, and what about the other params?
Sorry, I don't know much about Java/the JVM =(
Wednesday, June 1, 2011, 7:29:
Hi,
We use solr and lucene fieldcache like this
static DocTerms myfieldvalues =
org.apache.lucene.search.FieldCache.DEFAULT.getTerms(reader, "myField");
which is initialized at first use and will stay in memory for fast retrieval
of field values based on DocID
The problem is after an index/commit
There is no simple answer.
All I can say is you don't usually want to use an Xmx that's more than
you actually have available RAM, and _can't_ use more than you have
available RAM+swap, and the Java error seems to be suggesting you are
using more than is available in RAM+swap. That may not be
Overall memory on the server is 24 GB, plus 24 GB of swap; most of the time
the swap is free and not used at all, which is why "no free swap" sounds
strange to me..
> There is no simple answer.
> All I can say is you don't usually want to use an Xmx that's more than
> you actually have available RAM, a
PermSize and MaxPermSize don't need to be higher than 64M. You should read up on
JVM tuning. The permanent generation is only used for the code that's being
executed.
> So what should i do to evoid that error?
> I can use 10G on server, now i try to run with flags:
> java -Xms6G -Xmx6G -XX:MaxPer
Could be related to your crazy-high MaxPermSize, like Markus said.
I'm no JVM tuning expert either. Few people are; it's confusing. So if
you don't understand it either, why are you trying to throw in very
non-standard parameters you don't understand? Just start with whatever
the Solr example
Hello everyone,
I have just gotten extracting information from files with Solr Cell working. Some of
the files we are indexing are large and have much content. I would like to
limit the amount of data I index to a specified limit of characters (for example,
300 chars), which I will use as a document preview
There were no parameters at all, and Java hit "out of memory"
almost every day; then I tried to add parameters but nothing changed.
Xms/Xmx did not solve the problem either. Now I'm trying MaxPermSize,
because it's the last thing I haven't tried yet :(
Wednesday, June 1, 2011, 9:00:56 PM,
Apologies if this question has already been raised. I tried searching but
couldn't find the relevant posts.
We've indexed a bunch of documents by different authors. Then for search
results, we'd like to show the authors that have 1 or more documents
matching the search keywords.
The problem i
Hi All,
I need to change the default scoring formula of Solr. How should I hack the
code to do so?
Also, is there any way to stop Solr from doing its default scoring and sorting?
Thanks,
Gaurav
About once a day a Solr/Jetty process gets hung on my server, consuming 100% of
one of the CPUs. Once this happens the server no longer responds to requests.
I've looked through the logs to try and see if anything stands out but so far
I've found nothing out of the ordinary.
My current remedy
Taking a thread dump will tell you what's going on.
Bill
On Wed, Jun 1, 2011 at 3:04 PM, Chris Cowan wrote:
> About once a day a Solr/Jetty process gets hung on my server consuming 100%
> of one of the CPU's. Once this happens the server no longer responds to
> requests. I've looked through the log
Hi Otis,
Sending to solr-user mailing list.
We see these CLOSE_WAIT connections even when I do a simple HTTP request via
curl, that is, even when I do a simple curl using a primary and secondary
shard query, e.g.
curl "
http://primaryshardhost:8180/solr/core0/select?q=*%3A*&shards=seconda
I'm pretty green... is that something I can do while the event is happening, or
is there something I need to configure ahead of time to capture the dump?
I've tried to reproduce the problem by putting the server under load but that
doesn't seem to be the issue.
Chris
On Jun 1, 2011, at 12:06
Sorry... I just found it. I will try that next time. I have a feeling it won't
work, since the server usually stops accepting connections.
Chris
On Jun 1, 2011, at 12:12 PM, Chris Cowan wrote:
> I'm pretty green... is that something I can do while the event is happening
> or is there something
I think in my case LowerCaseTokenizerFactory will be sufficient because
there will never be spaces in this particular field. But thank you for the
useful link!
Thanks,
Brian Lamb
On Wed, Jun 1, 2011 at 11:44 AM, Erick Erickson wrote:
> Be a little careful here. LowerCaseTokenizerFactory is diff
Hi Gaurav, not sure what your use case is (and if no sorting at all is ever
required, is Solr / Lucene what you need?).
You can certainly sort by a field (or more) in descending or ascending order
by using the "sort" parameter.
You can customize the scoring algorithm by overriding the DefaultSimila
There's no great way to do that.
One approach would be using facets, but that will just get you the
author names (as stored in fields), and not the documents under it. If
you really only want to show the author names, facets could work. One
issue with facets though is Solr won't tell you the t
Is it possible to do a search based on a PDF file? I know it's possible to
update the index with a PDF, but can you do just a regular search with one?
Thanks,
Brian Lamb
First guess (and it really is just a guess) would be Java garbage
collection taking over. There are some JVM parameters you can use to
tune the GC process; especially if the machine is multi-core, making
sure GC happens in a separate thread is helpful.
But figuring out exactly what's going on
How are you implementing your custom cache? If you're defining
it in the solrconfig, couldn't you implement the regenerator? See:
http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches
Best
Erick
On Wed, Jun 1, 2011 at 12:38 PM, oleole wrote:
> Hi,
>
> We use solr and lucene fieldcach
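Declaring such a user cache with a regenerator in solrconfig.xml looks roughly like this (the cache sizes and the regenerator class name here are made up):

```xml
<cache name="myFieldValuesCache"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"
       autowarmCount="1024"
       regenerator="com.example.MyFieldCacheRegenerator"/>
```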
If you can live with an across-the-board limit, you can set maxFieldLength
in your solrconfig.xml file. Note that this is in terms rather than
chars though...
Best
Erick
On Wed, Jun 1, 2011 at 2:22 PM, Greg Georges wrote:
> Hello everyone,
>
> I have just gotten extracting information from files
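In solrconfig.xml that limit looks like this (the value is illustrative; remember it counts terms, not characters):

```xml
<indexDefaults>
  <maxFieldLength>10000</maxFieldLength>
</indexDefaults>
```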
Hi,
I need to provide NRT search with faceting. Been looking at the options out
there. Wondered if anyone could clarify some questions I have and perhaps share
your NRT experiences.
The various NRT options:
1) Solr
-Solr doesn't have NRT, yet. What is the expected time frame for NRT? Is it a
I'm not quite sure what you mean by "regular search". When
you index a PDF (Presumably through Tika or Solr Cell) the text
is indexed into your index and you can certainly search that. Additionally,
there may be meta data indexed in specific fields (e.g. author,
date modified, etc).
But what does
Thanks, Tomas. Well, I am sorting results by a function query. I don't want
Solr to spend extra effort calculating the score for each document and eating up my
CPU cycles. Also, I need to use an "if" condition in the score calculation, which I
emulated through the "map" function, but the map function does not accept a funct
Tanner,
I just entered SOLR-2571 to fix the float-parsing bug that breaks
"thresholdTokenFrequency". It's just a 1-line code fix, so I also included a
patch that should cleanly apply to Solr 3.1. See
https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.
This parameter appears a
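Once the fix is in, the parameter is set on the spellchecker in solrconfig.xml, roughly like this (the field name and threshold are illustrative):

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell</str>
  <!-- only add terms appearing in at least 1% of documents to the dictionary -->
  <float name="thresholdTokenFrequency">.01</float>
</lst>
```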
Don't manually group by author from your results, the list will always
be incomplete... use faceting instead to show the authors of the books
you have found in your search.
http://wiki.apache.org/solr/SolrFacetingOverview
-Original Message-
From: beccax [mailto:bec...@gmail.com]
Sent: W
I think facet.offset allows facet paging nicely by letting you index
into the list of facet values. It is working for me...
http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset
-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, June 01, 2011
How do you know whether to provide a 'next' button, or whether you are at
the end of your facet list?
On 6/1/2011 4:47 PM, Robert Petersen wrote:
I think facet.offset allows facet paging nicely by letting you index
into the list of facet values. It is working for me...
http://wiki.apache.org/sol
Yes, that is exactly the issue... we're thinking of just always having a
next button; if you go too far you simply get zero results. The user gets
what the user asks for, and could simply back up to
where the facet still has values. We could also detect an empty facet
result on the
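The fetch-one-extra approach the thread converges on can be sketched in a few lines; plain Python stands in for the Solr round trip, and the function name is made up for illustration:

```python
def facet_page(all_values, offset, limit):
    """Return one page of facet values plus a has_next flag.

    Mimics requesting facet.limit=limit+1 and facet.offset=offset from Solr:
    if more than `limit` values come back, there is a next page.
    """
    fetched = all_values[offset:offset + limit + 1]  # what Solr would return
    return fetched[:limit], len(fetched) > limit

# First page of a 5-value facet, 2 per page: two values back, and more remain.
page, has_next = facet_page(["a", "b", "c", "d", "e"], offset=0, limit=2)
```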
Hey Denis,
* How big is your index in terms of number of documents and index size?
* Is it production system where you have many search requests?
* Is there any pattern for OOM errors? I.e. right after you start your
Solr app, after some search activity or specific Solr queries, etc?
* What are 1)
Maybe HTMLStripTransformer is what you are looking for.
* http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
On Tue, May 31, 2011 at 5:35 PM, Erick Erickson wrote:
> Convert them to what? Individual fields in your docs? Text?
>
> If the former, you might get some joy from the Xpa
> I've tried to use a spellcheck dictionary built from my own content, but my
> content ends up having a lot of misspelled words so the spellcheck ends up
> being less than effective.
You can try to use sp.dictionary.threshold parameter to solve this problem
* http://wiki.apache.org/solr/SpellCheck
> Will it be slow if there are 3-5 million key/value rows?
AFAIK it shouldn't affect search time significantly, as Solr caches it
in memory after you reload the Solr core / issue a commit.
But obviously you need more memory, and the commit/reload will take more time.
Hi Andy:
Here is a white paper that shows screenshots of faceting working with
Solr and RankingAlgorithm under NRT:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search
The implementation (src) is also available with the download and is
described in the below document:
http://solr-ra.tgels
Nagendra,
Thanks. Can you comment on the performance impact of NRT on facet search? The
pages you linked to don't really touch on that.
My concern is that with NRT, the facet cache will be constantly invalidated.
How will that impact the performance of faceting?
Do you have any benchmark compa
Hi All,
We need to score documents based on some parameters received in the query
string. This was not possible via a function query: we need an
"if" condition, which can be emulated through the map function, but one of the
output values of the "if" condition has to be a function, whereas map only
ac
It's working as I was hoping. Thanks, Mr. Erick.
On Wed, Jun 1, 2011 at 8:29 PM, Erick Erickson wrote:
> Take a look here:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> I think you want generateWordParts=1, catenateWords=1 and
> preserveOrig
Hi all,
here is a piece from my solrconfig:
but somehow the synonyms are not read... I mean, there is no match when I use a
word from the synonym file... any ideas?
-
Smart, but it doesn't work... If it worked, it would manage...
I ran out of memory on some big indexes when using Solr 1.4. I found out that
increasing
termInfosIndexDivisor
in solrconfig.xml could help a lot.
It may slow down searching your index.
cheers,
:-Dennis
On 02/06/2011, at 01.16, Alexey Serba wrote:
> Hey Denis,
>
> * How big is your in
On Fri, May 20, 2011 at 12:40 AM, Chris Hostetter
wrote:
>
> : It is fairly simple to generate facets for ranges or 'buckets' of
> : distance in Solr:
> : http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance.
> : What isnt described is how to generate the links for these facets
>
> a