Hi,
I think your needs would be better met by Distributed Search http://wiki.apache.org/solr/DistributedSearch
which allows shards to live on different servers and searches
across all of those shards when a query comes in. There are a few patches
which will hopefully be available in the S
To scale Solr, take a look at this article:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
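As a sketch (hostnames here are made up), a distributed request just lists the
shards to fan out to:

http://server1:8983/solr/select?q=ipod&shards=server1:8983/solr,server2:8983/solr

Each listed shard is queried and the partial results are merged before the
response is returned.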
Juan Pedro Danculovic
CTO - www.linebee.com
On Thu, Feb 11, 2010 at 4:12 AM, abhishes wrote:
>
> Suppose I am indexing very large data (5 billion rows
Suppose I am indexing very large data (5 billion rows in a database)
Now I want to use the Solr Core feature to split the index into manageable
chunks.
However, I have two questions:
1. Can Cores reside on different physical servers?
2. when a query comes, will the query be answered by index i
> Claudio - fields with '-' in them can be problematic.
Why's that?
On Wed, Feb 10, 2010 at 2:38 PM, Otis Gospodnetic
wrote:
> Claudio - fields with '-' in them can be problematic.
>
> Side comment: do you really want to search across all languages at once? If
> not, maybe 3 different dismax c
It appears the hl.maxAlternateFieldLength parameter default setting in
solrconfig.xml does not take effect. I can only get it to work by explicitly
sending the parameter via the client request. It is not a big deal, but it
appears to be a bug.
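For reference, this is roughly how I expected the default to be picked up
(a sketch of the usual solrconfig.xml convention; the handler name and the
value 200 are just examples):

<requestHandler name="standard" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl.maxAlternateFieldLength">200</str>
  </lst>
</requestHandler>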
:
http://wiki.apache.org/solr/FAQ#How_can_I_rebuild_my_index_from_scratch_if_I_change_my_schema.3F
:
: but it is not clear about the times when this is needed. So I wonder, do I
: need to do it after adding a field, removing a field, changing field type,
: changing indexed/stored/multiValue prop
: I tried your suggestion, Hoss, but committing to the new coordinator
: core doesn't change the indexVersion and therefore the ETag value isn't
: changed.
Hmmm... so the "empty" commit doesn't change the indexVersion? ... I
didn't realize that.
Well, I suppose you could replace your empty comm
I am perplexed by the behavior I am seeing of the Solr Analyzer and Filters
with regard to Underscores.
I am trying to get rid of underscores ('_') when shingling, but seem unable
to do so with a Stopwords Filter.
And yet underscores are being removed, when I am not even trying, by the
WordDelimi
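The workaround I am experimenting with (just a sketch, untested) is a
PatternReplaceFilterFactory ahead of the shingle filter to strip the
underscores explicitly:

<filter class="solr.PatternReplaceFilterFactory"
        pattern="_" replacement="" replace="all"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="2"/>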
: I want to recompile Lucene with
: http://issues.apache.org/jira/browse/LUCENE-2230, but I'm not sure
: which source tree to use. I tried using the implied trunk revision
: from the admin/system page, but Solr fails to build with the generated
: jars, even if I exclude the patches from 2230...
Hmm
Hi All,
I found a file corruption issue when using both "EmbeddedSolrServer" &
"Solr 1.4 Java based replication" together on a slave server.
In my slave server, I have 2 webapps in a tomcat instance.
1) "multicore" webapp with slave config
2) "my custom" webapp using EmbeddedSolrServer
Is it possible to do query elevation based on field?
Basically, I would like to search the same term on three different
fields:
q=field1:term OR field2:term OR field3:term
and I would like to sort the results by fourth field
sort=field4+asc
However, I would like to elevate all
: Okay. So we have to leave this question open for now. There might be
: other (more advanced) users who can answer this question. For sure,
: the solution we found is not quite good.
The question really isn't "open", it's a FAQ...
http://wiki.apache.org/solr/FAQ#How_can_I_get_ALL_the_ma
: NOTE: Please start a new email thread for a new topic (See
: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)
FWIW: I'm the most nit-picky person I know about Thread-Hijacking, but I
don't see any MIME headers to indicate that Jose did that.
: > If I follow this path can I then
: Subject: Indexing / querying multiple data types
: In-Reply-To: <8cf3f00d0572f8479efcd0783be11eb1927...@xmb-rcd-104.cisco.com>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing
: Subject: How to configure multiple data import types
: In-Reply-To: <4b6c0de5.8010...@zib.de>
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh e
Check out the configuration of WordDelimiterFilterFactory in your
schema.xml.
Depending on your settings, it's probably tokenizing 13th into "13" and
"th". You can also have them concatenated back into a single token, but I
can't remember the exact parameter. I think it could be catenateAll.
O
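For what it's worth, a sketch of the filter configuration in question
(attribute values here are illustrative, not a recommendation):

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="1"
        splitOnCaseChange="1"/>

With catenateAll="1", "13th" should be indexed as the single token "13th" in
addition to "13" and "th".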
I'm using the standard "text" type for a field, and part of the data
being indexed is "13th", as in "Friday the 13th".
I can't seem to get it to match when I'm querying for "Friday the 13th"
either quoted or not.
One thing that does match is "13 th" if I send the search query with a
space between
Claudio - fields with '-' in them can be problematic.
Side comment: do you really want to search across all languages at once? If
not, maybe 3 different dismax configs would make your searches better.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :
FYI this does not work. It appears that the update runs on a
different thread from the analysis, perhaps because the update is done
when the commit happens? I'm sending the document XML with
commitWithin="6".
I would appreciate any other ideas. I'm drawing a blank on how to
implement
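For context, the update XML I am sending looks roughly like this (a sketch;
the fields are placeholders):

<add commitWithin="6">
  <doc>
    <field name="id">...</field>
  </doc>
</add>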
Hi all,
I had DataImportHandler working perfectly on Solr 1.4 nightly build from
June 2009. I upgraded Solr to the 1.4 release and started getting errors:
Caused by: com.mysql.jdbc.exceptions.MySQLNonTransientConnectionException:
Server connection failure during transaction. Due to underlying
Lance,
After a bit more reading & cleaning up my configuration (case sensitivity
corrected, but it didn't appear to be affecting the indexing, & I don't use the
atomID field for querying anyhow),
I've added a docType field when I index my data and now use the fq parameter to
filter on that new fiel
Hello list,
I have a corpus with 3 languages, so I set up a text content field (with
no stemming) and 3 text-[en|it|de] fields with language-specific Snowball stemmers.
I copyField the text to my language-aware fields. So, I set up this dismax
searchHandler:
dismax
title^1.2 content-en^0.8 content-it
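(A sketch of the handler this config corresponds to; the XML wrapper and
anything after content-it are assumptions on my part:)

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^1.2 content-en^0.8 content-it ...</str>
  </lst>
</requestHandler>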
Thanks,
I bypassed haproxy as a test and it did reduce the number of connections,
but it did not seem as though these connections were hurting anything.
Ian.
On Tue, Feb 9, 2010 at 11:01 PM, Lance Norskog wrote:
> This goes through the Apache Commons HTTP client library:
> http://hc.apache.org
Hi Joe,
See this recent thread from a user with a very similar issue:
http://old.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--td24162104.html
In the above thread, Mark Miller mentions that Lucene's AnalyzingQueryParser
should do the trick, but would need to be integrated into So
Joe Calderon-2 wrote:
>
> you can do that very easily yourself in a post processing step after
> you receive the solr response
>
True (and I am already doing so).
I was thinking that, with this done as part of the field collapsing code, it
might be faster than doing so via post-processing (i.e. no
Sorry, what I meant to say is: apply text analysis to the part of the
query that is wildcarded. For example, if a term with Latin-1 diacritics
is wildcarded, I'd still like to run it through ISOLatin1Filter.
On Wed, Feb 10, 2010 at 4:59 AM, Fuad Efendi wrote:
>> hello *, quick question, what would i h
you can do that very easily yourself in a post processing step after
you receive the solr response
On Wed, Feb 10, 2010 at 8:12 AM, gdeconto
wrote:
>
> I have been able to apply and use the solr-236 patch (field collapsing)
> successfully.
>
> Very, very cool and powerful.
>
> My one comment/conc
I have been able to apply and use the solr-236 patch (field collapsing)
successfully.
Very, very cool and powerful.
My one comment/concern is that the collapseCount and aggregate function
values in the collapse_counts list only represent the collapsed documents
(i.e. the ones that are not shown in
Hi,
There is a solution to update via DIH, but is there also a way to define a
query that fetches IDs for documents that should be removed?
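To make it concrete, the kind of thing I am after is a sketch like this
(deletedPkQuery is the DIH delta-import attribute I found; table and column
names are made up):

<entity name="item" pk="id"
        query="SELECT id, title FROM item"
        deltaQuery="SELECT id FROM item WHERE updated > '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT id FROM item WHERE deleted = 1"/>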
regards,
Lukas Kahwe Smith
m...@pooteeweet.org
I meant available in total, not just what satisfies the particular query.
You should have at least an estimate of the total number of documents, even if
it grows daily.
And if you are talking about millions of rows, and you are trying to retrieve them
all, IMHO, not getting all of them will
Okay. So we have to leave this question open for now. There might be other
(more advanced) users who can answer this question. For sure, the
solution we found is not quite good.
In the meantime, I will look for a way to submit a feature request. :)
Original-Message
> D
Solr will not do this efficiently. Getting all rows will be very slow. Adding a
parameter will not make it fast.
Why do you want to do this?
wunder
On Feb 10, 2010, at 7:06 AM, ego...@gmx.de wrote:
> Setting the 'rows' parameter to a number larger than the number of documents
> available requ
Yes, I tried the q=&rows=-1 approach the other day and gave up.
But as you say, it wouldn't help, because you might get
a) timeouts because you have to wait a 'long' time for the large set of results
to be returned
b) exceptions being thrown because you're retrieving too much info to be thrown
around the
Setting the 'rows' parameter to a number larger than the number of documents
available requires that you know how many are available. That's what I intended
to retrieve via the LukeRequestHandler.
Anyway, nice approach, Stefan. I'm afraid I forgot this 'numFound' aspect. :)
But still, it feels li
2010/2/10 Jan Simon Winkelmann :
> I am (still) trying to get JMX to work. I have finally managed to get a Jetty
> installation running with the right parameters to enable JMX. Now the next
> problem appeared. I need to get Solr to register its MBeans with the Jetty
> MBeanServer. Using service
Just set rows to a very large number, larger than the number of documents
available.
It is useful to set the fl parameter to just the fields required, to avoid memory
problems if each document contains a lot of information.
- Original Message -
From: "stefan maric"
To: solr-user@lucene.a
Egon,
If you first run your query with q=&rows=0,
then you get back an indication of the total number of docs.
Now your app can query again to get the 1st n rows & manage forward|backward
traversal of results with subsequent queries.
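Something like this sketch (host, port, and query are illustrative):

http://host:8983/solr/select?q=foo&rows=0        (read numFound from the response)
http://host:8983/solr/select?q=foo&start=0&rows=100
http://host:8983/solr/select?q=foo&start=100&rows=100
...and so on until start reaches numFound.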
Regards
Stefan Maric
-Original Message-
From: ego..
Hi Stefan,
you are right. I noticed this page-based result handling too. For web pages it
is handy to maintain a number-of-results-per-page parameter together with an
offset to browse result pages. Both can be done with Solr's 'start' and 'rows'
parameters.
But as I don't use Solr in a web contex
I am using Solr 1.3, and my server is embedded and accessed using SolrJ.
I would like to setup my searches so that exact matches are the first
results returned, followed by near matches, and finally token based
matches.
For example, if I have a summary field in schema which is created
using copyFiel
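To make the goal concrete, a sketch of what I imagine (field names are made
up): copy the summary into an untokenized field and boost it ahead of the
tokenized one in a dismax qf:

<copyField source="summary" dest="summary_exact"/>
(summary_exact being a string or KeywordTokenizer-based type)

<str name="qf">summary_exact^10 summary^1</str>
<str name="pf">summary^5</str>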
I was just thinking along similar lines.
As far as I can tell, you can use the parameters start & rows in combination to
control the retrieval of query results.
So
http://:/solr/select/?q=
will retrieve results 1..10 (rows defaults to 10), and
http://:/solr/select/?q=&start=10&rows=10
will retrieve results 11..20 (start is 0-based).
How can we get the max and min date from the Solr index? I would need these
dates to draw a graph (for example, a timeline graph).
Also, can we use date faceting to show how many documents are indexed every
month?
Consider that I need to draw a timeline graph for the current year to show how many
records
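One way I am considering (a sketch; the field name 'timestamp' is assumed):
sort on the date field with rows=1 to get min and max, and use date faceting
for the per-month counts:

q=*:*&rows=1&fl=timestamp&sort=timestamp asc     (min date)
q=*:*&rows=1&fl=timestamp&sort=timestamp desc    (max date)

q=*:*&rows=0&facet=true&facet.date=timestamp
    &facet.date.start=2010-01-01T00:00:00Z
    &facet.date.end=2011-01-01T00:00:00Z
    &facet.date.gap=%2B1MONTH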
> hello *, quick question, what would i have to change in the query
> parser to allow wildcarded terms to go through text analysis?
I believe it is illogical. "Wildcarded terms" will go through the terms
enumerator.
2010/2/10 Jan Simon Winkelmann :
> I am (still) trying to get JMX to work. I have finally managed to get a Jetty
> installation running with the right parameters to enable JMX. Now the next
> problem appeared. I need to get Solr to register its MBeans with the Jetty
> MBeanServer. Using service
Hi all,
I'm working with Solr 1.4 and came across the point that Solr limits the number
of documents retrieved in a Solr response. This number can be changed with the
common query parameter 'rows'.
In my scenario it is very important that the response contains ALL documents in
the index! I pl
Hello, all!
I have a problem with spellcheck! I downloaded, built, and connected a
dictionary (~500,000 words)! It works fine! But I get suggestions for any word
(even correct words)! Is it possible to get suggestions only for wrong
words?
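For reference, a sketch of the spellcheck parameters involved (values are
illustrative):

q=someword&spellcheck=true
    &spellcheck.onlyMorePopular=false
    &spellcheck.extendedResults=true

With extendedResults=true the response includes origFreq for the queried word,
so words already present in the index can be filtered out on the client side.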
Hi,
I am (still) trying to get JMX to work. I have finally managed to get a Jetty
installation running with the right parameters to enable JMX. Now the next
problem appeared. I need to get Solr to register its MBeans with the Jetty
MBeanServer. Using , Solr doesn't
complain on loading, but the
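(For context, a sketch of what I have so far; port and flags are illustrative:)

<!-- solrconfig.xml -->
<jmx/>

# Jetty startup
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9999 \
     -Dcom.sun.management.jmxremote.ssl=false \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -jar start.jar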
Yes, more details would be great...
Is this easily repeated?
The exists?=false is particularly spooky.
It means, somehow, a new segment was being flushed, containing 1285
docs, but then after closing the doc stores, the stored fields index
file (_X.fdx) had been deleted.
Can you turn on IndexWr
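If it helps, the example solrconfig.xml shipped with Solr 1.4 already carries
a switch for IndexWriter's debug output (a sketch; flip it to true):

<indexDefaults>
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>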
Hi,
It would be possible to add that to the main Solr, but the problem is,
let's face it (example):
we have around 1.5 million documents in the Solr master. These documents are
books.
These books have fields like title, IDs, numbers, authors, and more.
This Solr is global.
Now: the slave Solr