How many documents are in the index?
If you haven't already done this, I'd take a really close look at your
schema and make sure you're only storing the things that should really
be stored, and the same with the indexed fields. I drastically reduced my
index size just by changing some indexed/stored options.
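To illustrate the trade-off, a hypothetical schema.xml sketch (the field names here are made up):

```xml
<!-- stored="true" keeps the raw value for returning in results;
     indexed="true" makes the field searchable. Turn off whichever you don't need. -->
<field name="title"     type="text"   indexed="true"  stored="true"/>
<!-- searchable but never returned: a big size win for large text fields -->
<field name="body"      type="text"   indexed="true"  stored="false"/>
<!-- returned in results but never searched on -->
<field name="thumb_url" type="string" indexed="false" stored="true"/>
```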
Sorry, I've figured out my own problem. There was a problem with the
way I create the XML document for indexing that was causing some of
the "comments" fields not to be listed correctly in the default search
field, "content".
On 10/12/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
I've found an odd situation where solr is not returning all of the
documents that I think it should. A search for "Geckoplp4-M" returns 3
documents but I know that there are at least 100 documents with that
string.
Here is an example query for that phrase and the result set:
http://localhost:9020/
> index created by nutch so small in comparison (about 27 mb
> approx) but it still returns snippets!
Are you storing the complete html? If so I think you should strip out
the html then index the document.
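As a sketch of the stripping step (the sed one-liner is only a crude illustration; a real pipeline should use a proper HTML parser to handle entities, scripts, and malformed markup):

```shell
# Remove anything that looks like a tag before sending the text to Solr.
echo '<html><body><p>Hello <b>world</b></p></body></html>' \
  | sed -e 's/<[^>]*>//g'
```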
>
> On 10/9/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
> >
ize now.
Kevin
On 8/20/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
> On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote:
>
> > Are there any tips on reducing the index size or what factors most
> > impact index size?
> >
> > My index has 2.7 million documents an
on and
> score if the XML response is really big)
>
>
> : Date: Fri, 5 Oct 2007 11:21:48 -0700
> : From: Kevin Lewandowski <[EMAIL PROTECTED]>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: strange sorting problem
> :
I'm having a problem with sorting on a certain field. In my schema.xml
it's defined as a string (not analyzed, indexed/stored verbatim). But
when I look at my results (sorted on that field ascending) I get
things like the following:
Yr City's A Sucker
Movement b/w Yr City's A Sucker
X, Y & Sometim
Are there any tips on reducing the index size or what factors most
impact index size?
My index has 2.7 million documents and is 200 gigabytes and growing.
Most documents are around 2-3kb and there are about 30 indexed fields.
thanks,
Kevin
Is data_dir set up correctly in conf/scripts.conf? That's where
snappuller puts the snapshots.
Bill
On 7/12/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
>
> I've been running solr replication for several months with no issues
> but recently had an instance where snappuller was
I've been running solr replication for several months with no issues
but recently had an instance where snappuller was running for about
1.5 hours. rsync was still active, so it was still copying data. I
also noticed that there was a snapshot.200707 directory inside of
the main index directory
snapshooter does create incremental builds of the index. It may not
appear so if you look at the contents, because the existing files are
hard links. But it is incremental.
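You can see the hard-link trick on any Linux box; the paths below are just stand-ins for the index and snapshot directories:

```shell
# snapshooter hard-links existing segment files instead of copying them,
# so a "full" snapshot of unchanged segments costs almost nothing on disk.
mkdir -p /tmp/index_demo
echo "segment data" > /tmp/index_demo/_0.cfs
ln -f /tmp/index_demo/_0.cfs /tmp/index_demo/snapshot._0.cfs
# Both names now share one inode: the link count is 2, no data was copied.
stat -c '%h' /tmp/index_demo/_0.cfs
```

Only segments written since the last snapshot show up as genuinely new data, which is why the process is effectively incremental.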
On 4/20/07, Doss <[EMAIL PROTECTED]> wrote:
Hi Yonik,
Thanks for your quick response, my question is this, can we take incr
I recommend you build your query with facet options in raw format and
make sure you're getting back the data you want. Then build it into
your app.
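For example, a raw facet request against the standard handler (host, port, and field name are examples; eyeball the XML response before writing any app code):

```
http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=genre&facet.limit=10
```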
On 4/18/07, Jennifer Seaman <[EMAIL PROTECTED]> wrote:
Does anyone have any sample code (php, perl, etc) how to setup facet
browsing with paging? I
Thanks for sharing the info, Cass. Is eBay still using Texis? (this used to be
obvious from eBay's URLs a few years ago). I used Texis with their Vortex
script before Lucene was born.
I'd guess no. I read a PDF about ebay's architecture a few months ago
and it said all of the search stuff wa
snapshooter copies all files but most files in the snapshot
directories are hard links pointing to segments in the main index
directory. So only new segments end up getting copied.
We've been running replication on discogs.com for several months and
it works great.
On 2/13/07, escher2k <[EMAIL PROTECTED]> wrote:
This should explain most everything:
http://wiki.apache.org/solr/CollectionDistribution
I've been running solr replication on discogs.com for a few months and
it works great!
Kevin
On 1/23/07, S Edirisinghe <[EMAIL PROTECTED]> wrote:
Hi,
I just started looking into solr. I like the features t
Yes! There's no shortage of puns when using solr. We're always talking
about "creating a solr system" or "one of the solr systems is down" :)
On 12/21/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
It's all about "sol(a)r", ya know? More day light, please!
OK, this may fix it:
https://issues.apache.org/jira/browse/SOLR-77
A war with this patch included is here:
http://people.apache.org/~yonik/solr/current/solr.war
You also need to configure some queries to be done on the firstSearcher event
in solrconfig.xml. Uncomment and customize the example o
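Once uncommented, the firstSearcher listener in the example solrconfig.xml looks roughly like this (the query itself is just a placeholder you should customize to warm your own caches):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```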
Hmmm, on most Linux/UNIX systems, sending the QUIT signal does nothing
else but generate a stack trace to the console or a log file. If you
don't start tomcat by hand, the stack trace may go somewhere else, I
suppose. It would be useful to learn how to do this on your particular
system (and we shoul
accept connections for 3 or 4 hours ... did you try taking some thread
dumps like yonik suggested to see what all the threads were doing?
A kill -3 will not kill the process. It does nothing and there's no
thread dump on the console. kill -9 does kill it though.
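For what it's worth, signal 3 is SIGQUIT, which a healthy JVM answers with a thread dump on its stdout (catalina.out under Tomcat) while continuing to run; the pid lookup below is only one way to find the process:

```shell
# Signal number 3 maps to QUIT; a JVM prints a thread dump on SIGQUIT
# and keeps running, while kill -9 (SIGKILL) terminates it with no dump.
kill -l 3
# To request a dump from a running Tomcat (the pgrep pattern is an assumption):
# kill -3 "$(pgrep -f tomcat | head -1)"
```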
btw, this has been a bigger prob
> My solr installation has been running fine for a few weeks but now
> after a server reboot it starts and runs for a few seconds, then stops
> responding. I don't see any errors in the logfiles, apart from
> snapinstaller not being able to issue a commit. Also, the process is
> using 100% cpu and
My solr installation has been running fine for a few weeks but now
after a server reboot it starts and runs for a few seconds, then stops
responding. I don't see any errors in the logfiles, apart from
snapinstaller not being able to issue a commit. Also, the process is
using 100% cpu and stops res
In the admin interface, if you click statistics, there's a cache section.
On 11/29/06, Tom <[EMAIL PROTECTED]> wrote:
Hi -
I'm starting to try to tune my installation a bit, and I'm looking
for cache statistics. Is there a way to peek into a running
installation, and see what my cache stats are
On Discogs I'm running Solr with two slaves and one master, using the
distribution scripts. The slaves pull and install a new snapshot every
five minutes and this is working very well so far.
Are there any risks with reducing this window to every one or two
minutes? With large caches could the au
I've been using Solr for keyword search on Discogs.com for a few
months with great results.
As of today Solr is running under Tomcat on a single dedicated box.
It's a 2.66Ghz P4, with 1 gig ram. The index has about 1.2 million
documents and is 1.2 gigs in size. This machine handles 250,000
queries
I had the very same article in mind - how would it be simpler in Solr
than in Lucene? A spellchecker is pretty much standard in every major
I meant it would be a simpler implementation in Solr because you don't
have to deal with java or any Lucene API's. You just create a document
for each "corr
Thanks for the help! The problem was I was not using "ulimit -n".
It's back to normal now.
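For anyone hitting the same thing, a quick way to check the limits in the shell that launches Solr (the 8192 value is only an example):

```shell
# The soft limit is what the JVM actually gets; the hard limit is its ceiling.
ulimit -Sn
ulimit -Hn
# Raise the soft limit before starting Tomcat/Solr, e.g.:
# ulimit -n 8192
```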
thanks,
Kevin
On 10/30/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 10/30/06, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
> I'm no longer able to add new d
I have not done one but have been planning to do it based on this article:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html
With Solr it would be much simpler than the java examples they give.
On 10/30/06, Michael Imbeault <[EMAIL PROTECTED]> wrote:
Hello everyone,
Has anybody succ
I'm no longer able to add new data or optimize my index. There are
currently 1600 files in the index directory and it's about 1.1gb. I've
tried changing solrconfig.xml to use the compound file format and that
didn't make a difference. My ulimit is "unlimited" but I've tried
setting it at 100 a
I've searched the docs but could not find an answer. Is this field
microseconds or milliseconds?
thanks,
Kevin
I've had a problem similar to this and it was because of the
schema.xml. It was valid XML but there were some incorrect field
definitions and/or the default field listed was not a defined field.
I'd suggest you start with the default schema and build on it piece by
piece, each time testing for th
No, after you add new documents you simply issue a <commit/> command
and the new docs are searchable.
On Discogs.com we have just over 1 million docs in the index and do
about 20,000 updates per day. Every 15 minutes we read a queue and add
new documents, then commit. And we optimize once per day. I've ha
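The add-then-commit cycle looks roughly like this as update messages POSTed to /solr/update (the id and field names are made up):

```xml
<!-- 1) POST the new documents: -->
<add>
  <doc>
    <field name="id">12345</field>
    <field name="title">Example Release</field>
  </doc>
</add>
<!-- 2) then POST a commit; only now do the new docs become searchable: -->
<commit/>
```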
On the performace wiki page it mentions a test box with 16GB ram. Did
anything special need to be done to use that much ram (with the OS or
java)? Would Solr on a system with Linux x86_64 and Tomcat be able to
use that much ram? (sorry, I don't know Java so I don't know if there
are any limitation
with AND
and three results with OR.
I recommend you try this same scenario out with the tutorial example
data and ensure things work as I've stated here. Let us know more
details if the problem persists.
Erik
On Sep 26, 2006, at 11:02 PM, Kevin Lewandowski wrote:
> I'm
I'm running the latest nightly build (2006-09-27) and cannot seem to
get the q.op parameter working. I have the default operator set to AND
and am testing with a two word query that returns no results. If I add
"OR" to the query I get results. But if I remove the OR and add
"q.op=OR" to the Solr q
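For reference, these are the two requests I'd compare (host/port are examples); note the default operator can also be set in schema.xml via <solrQueryParser defaultOperator="AND"/>:

```
http://localhost:8983/solr/select?q=word1+word2
http://localhost:8983/solr/select?q=word1+word2&q.op=OR
```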
Is it possible that the facets can be based on the contents of an
entire field instead of the terms?
For example say I have a document with this field:
<field name="genre">Hip Hop</field>
A facet query on the genre field returns:
<int name="hip">1</int>
<int name="hop">1</int>
but I'd like it to return:
<int name="Hip Hop">1</int>
thanks,
Kevin
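One common way to get whole-field facets is to copyField into an untokenized string field and facet on that; the field names here are illustrative:

```xml
<field name="genre" type="text" indexed="true" stored="true"/>
<!-- untokenized copy used only for faceting -->
<field name="genre_facet" type="string" indexed="true" stored="false"/>
<copyField source="genre" dest="genre_facet"/>
```

Faceting on genre_facet then counts the full value, so "Hip Hop" comes back as a single facet term.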
this was all just config file changes though right, you didn't need to
write any new java code to load into solr to make those work did you?
That's right. It was all config changes and no new java code, which is
a plus since I've never coded in java :)
Kevin
if i may ask: did you customize the Solr code at all (ie: are you using
any custom request handlers, field types or your own Similarity class) ?
... if not, which request handler are you using (Standard or DisMax) ?
I'm using the Solr from the nightly build, with Standard request
handler, and ha
"Main search engine" would be the search feature, but not
browsing/category listing?
That's correct, just the search function, though I'm looking into
using Solr for other types of browsing.
Are you using Solr for all data storage and search? Or a RDBMS? If so,
what is the split?
All data
I just wanted to say thanks to the Solr developers.
I'm now using Solr for the main search engine on Discogs.com. I've
been through five revisions of the search engine and this was
definitely the least painful. Solr gives me the power of Lucene
without having to deal with the guts. It made for a
You might want to look at acts_as_searchable for Ruby:
http://rubyforge.org/projects/ar-searchable
That's a similar plugin for the Hyperestraier search engine using its
REST interface.
On 8/28/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
I've spent a few hours tinkering with a Ruby ActiveRecord