On Fri, May 20, 2011 at 12:40 AM, Chris Hostetter
wrote:
>
> : It is fairly simple to generate facets for ranges or 'buckets' of
> : distance in Solr:
> : http://wiki.apache.org/solr/SpatialSearch#How_to_facet_by_distance.
> : What isnt described is how to generate the links for these facets
>
> a
I ran out of memory on some big indexes when using Solr 1.4. Found out that
increasing
termInfosIndexDivisor
in solrconfig.xml could help a lot.
It may slow down searching your index, though.
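For reference, on Solr 1.4 this is configured through the index reader factory in solrconfig.xml (a sketch; the divisor value 4 is just an illustrative starting point, not a recommendation):

```xml
<!-- solrconfig.xml: trade term-lookup speed for a smaller in-memory term
     index. A divisor of N loads only every Nth indexed term into RAM. -->
<indexReaderFactory name="IndexReaderFactory"
                    class="solr.StandardIndexReaderFactory">
  <int name="setTermIndexDivisor">4</int>
</indexReaderFactory>
```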
cheers,
:-Dennis
On 02/06/2011, at 01.16, Alexey Serba wrote:
> Hey Denis,
>
> * How big is your in
Hi all,
here is a piece from my solrconfig:
but somehow the synonyms are not read... I mean there is no match when I use a
word from the synonym file... any ideas?
-
Smart, but it doesn't work... If it worked, it would do the job...
--
View this message in context:
http://lucene.47
It's working just as I was looking for. Thanks, Mr. Erick.
On Wed, Jun 1, 2011 at 8:29 PM, Erick Erickson wrote:
> Take a look here:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> I think you want generateWordParts=1, catenateWords=1 and
> preserveOrig
Hi All,
We need to score documents based on some parameters received in the query
string. This is not straightforward via a function query: we need an
"if" condition, which can be emulated through the map function, but one of the
output values of the "if" condition has to be a function, whereas map only
ac
Nagendra,
Thanks. Can you comment on the performance impact of NRT on facet search? The
pages you linked to don't really touch on that.
My concern is that with NRT, the facet cache will be constantly invalidated.
How will that impact the performance of faceting?
Do you have any benchmark compa
Hi Andy:
Here is a white paper that shows screenshots of faceting working with
Solr and RankingAlgorithm under NRT:
http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search
The implementation (src) is also available with the download and is
described in the below document:
http://solr-ra.tgels
> Will it be slow if there are 3-5 million key/value rows?
AFAIK it shouldn't affect search time significantly, as Solr caches it
in memory after you reload the Solr core / issue a commit.
But obviously you need more memory, and commit/reload will take more time.
> I've tried to use a spellcheck dictionary built from my own content, but my
> content ends up having a lot of misspelled words so the spellcheck ends up
> being less than effective.
You can try the sp.dictionary.threshold parameter to solve this problem:
* http://wiki.apache.org/solr/SpellCheck
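A related knob in the SpellCheckComponent is thresholdTokenFrequency (a sketch; the field name "spell" and the 0.01 value are illustrative — 0.01 would mean a term must occur in roughly 1% of documents before it enters the dictionary):

```xml
<!-- solrconfig.xml: only admit reasonably frequent terms into the
     spellcheck dictionary, so one-off misspellings in the indexed
     content are filtered out -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <float name="thresholdTokenFrequency">.01</float>
  </lst>
</searchComponent>
```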
Maybe HTMLStripTransformer is what you are looking for.
* http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
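A minimal DIH sketch of what that looks like (entity name, SQL, and column names are hypothetical placeholders):

```xml
<!-- data-config.xml: strip HTML markup from the "description" column
     before it is indexed -->
<entity name="docs"
        transformer="HTMLStripTransformer"
        query="select id, description from docs">
  <field column="description" stripHTML="true"/>
</entity>
```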
On Tue, May 31, 2011 at 5:35 PM, Erick Erickson wrote:
> Convert them to what? Individual fields in your docs? Text?
>
> If the former, you might get some joy from the Xpa
Hey Denis,
* How big is your index in terms of number of documents and index size?
* Is it production system where you have many search requests?
* Is there any pattern for OOM errors? I.e. right after you start your
Solr app, after some search activity or specific Solr queries, etc?
* What are 1)
Yes, that is exactly the issue... we're thinking of just always having a
next button; if you go too far you simply get zero results. The user gets
what the user asks for, and could simply back up if desired to
where the facet still has values. We could also detect an empty facet
results on the
How do you know whether to provide a 'next' button, or whether you are at
the end of your facet list?
On 6/1/2011 4:47 PM, Robert Petersen wrote:
I think facet.offset allows facet paging nicely by letting you index
into the list of facet values. It is working for me...
http://wiki.apache.org/sol
I think facet.offset allows facet paging nicely by letting you index
into the list of facet values. It is working for me...
http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset
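As a sketch of how the paging parameters fit together, here they are as request-handler defaults (the handler name and the "category" field are illustrative; facet.offset=40 with facet.limit=20 would be "page 3" of the facet values):

```xml
<!-- solrconfig.xml: facet paging via facet.limit (page size) and
     facet.offset (index into the full list of facet values) -->
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet">true</str>
    <str name="facet.field">category</str>
    <int name="facet.limit">20</int>
    <int name="facet.offset">40</int>
  </lst>
</requestHandler>
```

The same parameters can of course be passed directly on the query string instead of being baked into defaults.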
-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, June 01, 2011
Don't manually group by author from your results; the list will always
be incomplete... use faceting instead to show the authors of the books
you have found in your search.
http://wiki.apache.org/solr/SolrFacetingOverview
-Original Message-
From: beccax [mailto:bec...@gmail.com]
Sent: W
Tanner,
I just entered SOLR-2571 to fix the float-parsing bug that breaks
"thresholdTokenFrequency". It's just a 1-line code fix, so I also included a
patch that should cleanly apply to Solr 3.1. See
https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.
This parameter appears a
Thanks Tomas. Well, I am sorting results by a function query. I do not want
Solr to spend extra effort calculating a score for each document and eat up my
CPU cycles. Also, I need to use an "if" condition in the score calculation, which I
emulated through the "map" function, but the map function does not accept a funct
I'm not quite sure what you mean by "regular search". When
you index a PDF (Presumably through Tika or Solr Cell) the text
is indexed into your index and you can certainly search that. Additionally,
there may be meta data indexed in specific fields (e.g. author,
date modified, etc).
But what does
Hi,
I need to provide NRT search with faceting. Been looking at the options out
there. Wondered if anyone could clarify some questions I have and perhaps share
your NRT experiences.
The various NRT options:
1) Solr
-Solr doesn't have NRT, yet. What is the expected time frame for NRT? Is it a
If you can live with an across-the-board limit, you can set maxFieldLength
in your solrconfig.xml file. Note that this is in terms rather than
chars though...
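A sketch of the setting (the value 10000 is just an example; in the stock solrconfig.xml this element typically appears under the index settings):

```xml
<!-- solrconfig.xml: index at most the first 10000 tokens (terms, not
     characters) of each field; anything beyond that is silently dropped -->
<maxFieldLength>10000</maxFieldLength>
```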
Best
Erick
On Wed, Jun 1, 2011 at 2:22 PM, Greg Georges wrote:
> Hello everyone,
>
> I have just gotten extracting information from files
How are you implementing your custom cache? If you're defining
it in the solrconfig, couldn't you implement the regenerator? See:
http://wiki.apache.org/solr/SolrCaching#User.2BAC8-Generic_Caches
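A sketch of such a declaration (cache name, sizes, and the regenerator class com.example.MyRegenerator are all placeholders you'd substitute with your own):

```xml
<!-- solrconfig.xml: a user/generic cache with a regenerator class, so
     entries are rebuilt (autowarmed) against the new searcher after
     each commit instead of being lost -->
<cache name="myUserCache"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"
       autowarmCount="1024"
       regenerator="com.example.MyRegenerator"/>
```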
Best
Erick
On Wed, Jun 1, 2011 at 12:38 PM, oleole wrote:
> Hi,
>
> We use solr and lucene fieldcach
First guess (and it really is just a guess) would be Java garbage
collection taking over. There are some JVM parameters you can use to
tune the GC process; especially if the machine is multi-core, making
sure GC happens in a separate thread is helpful.
But figuring out exactly what's going on
Is it possible to do a search based on a PDF file? I know it's possible to
update the index with a PDF, but can you do just a regular search with it?
Thanks,
Brian Lamb
There's no great way to do that.
One approach would be using facets, but that will just get you the
author names (as stored in fields), and not the documents under it. If
you really only want to show the author names, facets could work. One
issue with facets though is Solr won't tell you the t
Hi Gaurav, not sure what your use case is (and if no sorting at all is ever
required, is Solr / Lucene what you need?).
You can certainly sort by a field (or more) in descending or ascending order
by using the "sort" parameter.
You can customize the scoring algorithm by overriding the DefaultSimila
I think in my case LowerCaseTokenizerFactory will be sufficient because
there will never be spaces in this particular field. But thank you for the
useful link!
Thanks,
Brian Lamb
On Wed, Jun 1, 2011 at 11:44 AM, Erick Erickson wrote:
> Be a little careful here. LowerCaseTokenizerFactory is diff
Sorry ... I just found it. I will try that next time. I have a feeling it won't
work, since the server usually stops accepting connections.
Chris
On Jun 1, 2011, at 12:12 PM, Chris Cowan wrote:
> I'm pretty green... is that something I can do while the event is happening
> or is there something
I'm pretty green... is that something I can do while the event is happening or
is there something I need to configure to capture the dump ahead of time.
I've tried to reproduce the problem by putting the server under load but that
doesn't seem to be the issue.
Chris
On Jun 1, 2011, at 12:06
Hi Otis,
Sending to solr-user mailing list.
We see these CLOSE_WAIT connections even when I do a simple HTTP request via
curl, that is, even when I do a simple curl using a primary and secondary
shard query, e.g.
curl "
http://primaryshardhost:8180/solr/core0/select?q=*%3A*&shards=seconda
Taking a thread dump will tell you what's going on.
Bill
On Wed, Jun 1, 2011 at 3:04 PM, Chris Cowan wrote:
> About once a day a Solr/Jetty process gets hung on my server consuming 100%
> of one of the CPU's. Once this happens the server no longer responds to
> requests. I've looked through the log
About once a day a Solr/Jetty process gets hung on my server consuming 100% of
one of the CPU's. Once this happens the server no longer responds to requests.
I've looked through the logs to try and see if anything stands out but so far
I've found nothing out of the ordinary.
My current remedy
Hi All,
I need to change the default scoring formula of Solr. How shall I hack the
code to do so?
Also, is there any way to stop Solr from doing its default scoring and sorting?
Thanks,
Gaurav
--
View this message in context:
http://lucene.472066.n3.nabble.com/Change-default-scoring-formula-tp301219
Apologies if this question has already been raised. I tried searching but
couldn't find the relevant posts.
We've indexed a bunch of documents by different authors. Then for search
results, we'd like to show the authors that have 1 or more documents
matching the search keywords.
The problem i
There were no parameters at all, and Java hit "out of memory"
almost every day; then I tried to add parameters but nothing changed.
Xms/Xmx did not solve the problem either. Now I'm trying MaxPermSize,
because it's the last thing I haven't tried yet :(
Wednesday, June 1, 2011, 9:00:56 PM,
Hello everyone,
I have just gotten extracting information from files with Solr Cell working. Some of
the files we are indexing are large and have much content. I would like to
limit the amount of data I index to a specified limit of characters (for example
300 chars), which I will use as a document preview
Could be related to your crazy high MaxPermSize like Marcus said.
I'm no JVM tuning expert either. Few people are, it's confusing. So if
you don't understand it either, why are you trying to throw in very
non-standard parameters you don't understand? Just start with whatever
the Solr example
PermSize and MaxPermSize don't need to be higher than 64M. You should read on
JVM tuning. The permanent generation is only used for the code that's being
executed.
> So what should I do to avoid that error?
> I can use 10G on the server, now I try to run with flags:
> java -Xms6G -Xmx6G -XX:MaxPer
Overall memory on the server is 24G, plus 24G of swap; most of the time
swap is free and not used at all, that's why "no free swap" sounds
strange to me..
> There is no simple answer.
> All I can say is you don't usually want to use an Xmx that's more than
> you actually have available RAM, a
There is no simple answer.
All I can say is you don't usually want to use an Xmx that's more than
you actually have available RAM, and _can't_ use more than you have
available ram+swap, and the Java error seems to be suggesting you are
using more than is available in ram+swap. That may not be
Hi,
We use solr and lucene fieldcache like this
static DocTerms myfieldvalues =
org.apache.lucene.search.FieldCache.DEFAULT.getTerms(reader, "myField");
which is initialized at first use and will stay in memory for fast retrieval
of field values based on DocID
The problem is after an index/commit
So what should I do to avoid that error?
I can use 10G on the server; now I try to run with these flags:
java -Xms6G -Xmx6G -XX:MaxPermSize=1G -XX:PermSize=512M -D64
Or should I set Xmx to lower numbers, and what about other params?
Sorry, I don't know much about java/jvm =(
Wednesday, June 1, 2011, 7:29:
You _could_ configure it as a slave, if you plan to sometimes use it as
a slave. It can be configured as both a master and a slave. You can
configure it as a slave, but turn off automatic polling. And then issue
one-off replicate commands whenever you want.
But yeah, it gets messy, your use
Are you in fact out of swap space, as the java error suggested?
The way JVM's work always, if you tell it -Xmx6g, it WILL use all 6g
eventually. The JVM doesn't Garbage Collect until it's going to run out
of heap space, until it gets to your Xmx. It will keep using RAM until
it reaches your
Thanks. I'll have to create a Jira account to vote, I guess.
We are already using KStemmer in 1.4.2 production and I would like to
upgrade to 3.1. In the meantime, what is another stemmer I could use out
of the box that would behave similarly to KStemmer?
Thanks
On 5/28/11 10:02 AM, Steven A Ro
I should have explained that the queryMode parameter is for our own custom
filter. So the result is that we have 8 filters in our field definition.
All the filter parameters (30 or so) of the query time and index time are
identical EXCEPT for our one custom filter which needs to know if it's in
q
On Wed, 01 Jun 2011 11:47 -0400, "Jonathan Rochkind"
wrote:
> On 6/1/2011 11:26 AM, Upayavira wrote:
> >
> > Probably the ReplicationHandler would need a 'one-off' replication
> > command...
>
> It's got one already, if you mean a command you can issue to a slave to
> tell it to pull replication
Here is output after about 24 hours running solr. Maybe there is some
way to limit memory consumption? :(
test@d6 ~/solr/example $ java -Xms3g -Xmx6g -D64
-Dsolr.solr.home=/home/test/solr/example/multicore/ -jar start.jar
2011-05-31 17:05:14.265:INFO::Logging to STDERR via
I believe you need SOME query cache even with low hit counts, for things
like a user paging through results. You want the query to still be in
the cache when they go to the next page or what have you. Other
operations like this may depend on the query cache too for good
performance.
So even w
On 5/31/2011 3:02 PM, Markus Jelsma wrote:
Hi,
I've seen the stats page many times, of quite a few installations and even
more servers. There's one issue that keeps bothering me: the cumulative hit
ratio of the query result cache, it's almost never higher than 50%.
What are your stats? How do y
Jonathan,
This is all true, however it ends up being hacky (this is from
experience) and the core on the source needs to be deleted. Feel free
to post to the issue.
Jason
On Wed, Jun 1, 2011 at 8:44 AM, Jonathan Rochkind wrote:
> On 6/1/2011 10:52 AM, Jason Rutherglen wrote:
>>
>> nightmarish
On 6/1/2011 11:26 AM, Upayavira wrote:
Probably the ReplicationHandler would need a 'one-off' replication
command...
It's got one already, if you mean a command you can issue to a slave to
tell it to pull replication right now. The thing is, you can only issue
this command if the core is co
On 6/1/2011 10:52 AM, Jason Rutherglen wrote:
nightmarish to setup. The problem is, it freezes each core into a
respective role, so if I wanted to then 'move' the slave, I can't
because it's still setup as a slave.
Don't know if this helps or not, but you CAN set up a core as both a
master and
Be a little careful here. LowerCaseTokenizerFactory is different than
KeywordTokenizerFactory.
LowerCaseTokenizerFactory will give you more than one term. e.g.
the string "Intelligence can't be MeaSurEd" will give you 5 terms,
any of which may match. i.e.
"intelligence", "can", "t", "be", "measure
> And some way to delete the core when it has been transferred.
Right, I manually added that to CoreAdminHandler. I opened an issue
to try to solve this problem: SOLR-2569
On Wed, Jun 1, 2011 at 8:26 AM, Upayavira wrote:
>
>
> On Wed, 01 Jun 2011 07:52 -0700, "Jason Rutherglen"
> wrote:
>> > I
On Wed, 01 Jun 2011 07:52 -0700, "Jason Rutherglen"
wrote:
> > I'm likely to try playing with moving cores between hosts soon. In
> > theory it shouldn't be hard. We'll see what the practice is like!
>
> Right, in theory it's quite simple, in practice I've setup a master,
> then a slave, then h
> I'm likely to try playing with moving cores between hosts soon. In
> theory it shouldn't be hard. We'll see what the practice is like!
Right, in theory it's quite simple, in practice I've setup a master,
then a slave, then had to add replication to both, then call create
core, then replicate, th
Yes, that would probably be a lot of fields... I guess one way would be to extend
the SynonymFilter and change the format of the synonyms.txt file to take the
categories into account.
Thanks again for your answer.
From: lee carroll
To: solr-user@lucene.apache.o
Hi Tomás,
Thank you very much for your suggestion. I took another crack at it using
your recommendation and it worked ideally. The only thing I had to change
was
to
The first did not produce any results but the second worked beautifully.
Thanks!
Brian Lamb
2011/5/31 Tomás Fernánde
You might have more luck going the other way, applying the
field collapsing patch to trunk. This is currently being worked
on, see:
https://issues.apache.org/jira/browse/SOLR-2564
Best
Erick
On Wed, Jun 1, 2011 at 12:22 AM, Isha Garg wrote:
> Hi,
> Actually currently I am using solr ve
Take a look here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
I think you want generateWordParts=1, catenateWords=1 and preserveOriginal=1,
but check it out with the admin/analysis page.
Oh, and your index-time and query-time patterns for WDFF will
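Concretely, the suggested settings as a schema.xml analyzer sketch (a minimal fragment; the rest of the analyzer chain is omitted):

```xml
<!-- schema.xml: split "role_delete" into "role" and "delete",
     also index the catenated form, and keep the original unsplit
     token alongside the parts -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        catenateWords="1"
        preserveOriginal="1"/>
```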
If I read this correctly, one approach is to specify an
increment gap in a multiValued field, then search for phrases
with a slop less than that increment gap. i.e.
incrementGap=100 in your definition, and search for
"apple orange"~99
If this is gibberish, please post some examples and we'll
try s
Could you post one of your pairs of definitions? Because
I don't recognize queryMode and a web search doesn't turn
anything up, so I'm puzzled.
Best
Erick
On Wed, Jun 1, 2011 at 1:13 AM, Mike Schultz wrote:
> We have very long schema files for each of our language dependent query
> shards. One
Lee,
Thank you very much for your answer.
Using the signature field as the uniqueKey is effectively what I was
doing, so the "overwriteDupes=true" parameter in my solrconfig was
somehow redundant, although I wasn't aware of it! =D
In practice it works perfectly and that's the nice part.
By
On 6/1/2011 6:12 AM, pravesh wrote:
SOLR wiki will provide help on this. You might be interested in pure Java
based replication too. I'm not sure whether SOLR's operational scripts will have this
feature (synching only changed segments). You might need to change
configuration in searchconfig.xml
Yes, thi
You may be interested in Solr's replication feature?
http://wiki.apache.org/solr/SolrReplication
On 6/1/2011 2:07 AM, wrote:
Hi everyone,
If I have two server ,their indexes should be synchronized. I changed A's
index via HTTP send document objects, Is there any config or some plug-in
Hi guys,
Just to let you know we're meeting up to talk all-things-search on Monday
13th June. There's usually a good mix of backgrounds and experience levels
so if you're free and in the London area then it'd be good to see you there.
Details:
7pm - The Elgin - 96 Ladbrooke Grove
http://www.meetu
My OS is also CentOS (5.4). If it were 10 GB all the time it would be
OK, but it grows to 13-15 GB and hurts other services =\
> It could be environment specific (specific of your "top" command
> implementation, OS, etc)
> I have on CentOS 2986m "virtual" memory showing although -Xmx2g
> You
Thanks for your point. I was really tripped up by that issue. But now I need a
bit more help.
As far as I have noticed, in the case of a value like "*role_delete*",
WordDelimiterFilterFactory
indexes two words, "*role*" and "*delete*", and in both search results with
the term "*role*" and "*delete*" w
That's pretty awesome. Thanks, Renaud!
On Tue, 2011-05-31 at 22:56 +0100, Renaud Delbru wrote:
> Hi,
>
> have a look at the flexible query parser of lucene (contrib package)
> [1]. It provides a framework to easily create different parsing logic.
> You should be able to access the AST and to mod
On Tue, 31 May 2011 19:38 -0700, "Jason Rutherglen"
wrote:
> Mark,
>
> Nice email address. I personally have no idea, maybe ask Shay Banon
> to post an answer? I think it's possible to make Solr more elastic,
> eg, it's currently difficult to make it move cores between servers
> without a lot
SOLR wiki will provide help on this. You might be interested in pure Java
based replication too. I'm not sure whether SOLR's operational scripts will have this
feature (synching only changed segments). You might need to change
configuration in searchconfig.xml
--
View this message in context:
http://lucene
>>We're using Solr to search on a Shop index and a Product index
Do you have 2 separate indexes (using distributed shard search)? I suspect
you are actually using only a single index.
>> Currently a Shop has a field `shop_keyword` which also contains the
>> keywords of the products assigned to it.
Thanks pravesh ^_^
You said "BTW, SOLR1.4+ ,also has feature where only the changed segment gets
synched".
Can you give me a document or some detail information please ? I've looked up
at online documents but didn't find any information .
Thanks very much .
发件人: pravesh
发送时间: 2011-06-01
If your index size is smaller (a few hundred MBs), you can consider SOLR's
operational script tools provided with the distribution to sync indexes from
Master to Slave servers. It will update (copy) the latest index snapshot
from Master to Slave(s). SOLR wiki provides good info on how to set them as
C
I don't think you can assign a synonyms file dynamically to a field.
You would need to create multiple fields for each lang/cat phrase set,
with its own synonyms file referenced for each field. That would
be a lot of fields.
On 1 June 2011 09:59, Spyros Kapnissis wrote:
> Hello to all,
Hello to all,
I have a collection of text phrases in more than 20 languages that I'm indexing
in solr. Each phrase belongs to one of about 30 different phrase categories. I
have specified different fields for each language and added a synonym filter at
query time. I would however like the syno
Hi all,
We're using Solr to search on a Shop index and a Product index. Currently a
Shop has a field `shop_keyword` which also contains the keywords of the
products assigned to it. The shop keywords are separated by a space.
Consequently, if there is a product which has a keyword "apple" and anot
Well, I recently chose it for a personal project, and the deciding
thing for me was that it had nice integration with CouchDB.
Thanks,
Bryan Rasmussen
On Wed, Jun 1, 2011 at 4:33 AM, Mark wrote
> I've been hearing more and more about ElasticSearch. Can anyone give me a
> rough overview on how these