Yep, but we rent these god damn boxes, and then, my friend, memory costs per
month are not cheap in the long term. Something like $50/month for 2 GB more.
I might be an ultra-geek when it comes to Linux and programming, but I'm not
an ultra-geek building servers from scratch. But I will straighten up and buy
On 22-Jan-08, at 9:46 PM, Marcus Herou wrote:
OK, I reached the conclusion myself: add memory to the box and get some more
boxes :)
I'm glad you've come to that conclusion, but to reinforce it: Solr/Lucene
heavily benefits from loads of memory. Not just for Solr caching, but it
also depends
Thanks!
Yes, I agree (to a certain level) on me being naive. Currently I'm only using
one server for this but will go into distributed snapshot/pull mode soon.
Then I can tune the slaves differently than the master, I believe. The master,
for instance, does not need autowarming or caches if it is not searched
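To make that master/slave split concrete, here is a minimal sketch of what the query section of a master-side solrconfig.xml might look like, assuming the master is only indexed against and never queried. The cache names come from the stock example config; the zeroed sizes are illustrative, not a recommendation:

```xml
<!-- solrconfig.xml on the indexing master (hypothetical sizes) -->
<query>
  <!-- master is never searched, so skip caching and autowarming entirely -->
  <filterCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="0" initialSize="0"/>
  <!-- and omit any newSearcher/firstSearcher warming queries here -->
</query>
```

The slaves would keep their normal cache sizes and warming queries, since they serve the searches.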
On Jan 22, 2008 9:29 PM, Jonathan Ariel <[EMAIL PROTECTED]> wrote:
> If I'm nearly always going to have one value, and in some cases 4 or 5 values,
> will I feel the penalty when faceting? Does it depend on the number of
> values in my field? For those documents that I'm going to have just one
> va
Thanks!
So there is just one penalty when faceting, which is my case.
"TermEnum is good for a limited number of different indexed terms in the
field, and allows multiple terms per field per document"
How much is a "limited number" of different indexed terms in the field?
If I'm going to have nearly alw
On 22-Jan-08, at 4:20 PM, Phillip Farber wrote:
We would need all 7M ids scored so we could push them through a
filter query to reduce them to a much smaller number on the order
of 100-10,000 representing just those that correspond to items in a
collection.
You could pass the filter to S
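The suggestion above is to express the collection restriction as a filter query rather than scoring all 7M ids and filtering client-side. A small sketch of building such a request (the field names `ocr` and `collection_id` are invented for illustration):

```python
# Sketch: restrict a large scored result set to one collection with an
# fq (filter query), which Solr applies unscored and caches, instead of
# pulling millions of scored ids back and filtering them client-side.
from urllib.parse import urlencode

def build_query_url(base, user_query, collection_id, rows=100):
    """Build a Solr select URL whose fq limits results to one collection."""
    params = {
        "q": user_query,                         # scored full-text query
        "fq": f"collection_id:{collection_id}",  # unscored, cacheable filter
        "rows": rows,
    }
    return f"{base}/select?{urlencode(params)}"

url = build_query_url("http://localhost:8983/solr", "ocr:whale", 42)
```

The filter result is cached in Solr's filterCache, so repeated searches within the same collection reuse it.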
: I'm new to Solr and Tomcat and I'm trying to track down some odd errors.
: How do I set up Tomcat to do fine-grained Solr-specific logging? I have
: looked around enough to know that it should be possible to do per-webapp
: logging in Tomcat 5.5, but the details are hard to follow for a newbie
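For the per-webapp logging question: with Tomcat 5.5's JULI, a `logging.properties` dropped into the webapp's `WEB-INF/classes` configures logging for that webapp only. A hedged sketch (directory and prefix values are illustrative):

```properties
# WEB-INF/classes/logging.properties inside the Solr webapp
# (per-webapp logging via Tomcat 5.5's JULI)
handlers = org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

org.apache.juli.FileHandler.level = FINE
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = solr.

# turn up Solr's own loggers for fine-grained output
org.apache.solr.level = FINE
```

This keeps Solr's FINE-level chatter in its own `solr.*` log file instead of mixing it into catalina.out.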
: details. This is a simple join in the db, but how do we achieve this in
: Solr? The problem is that when personal details are changed we will have to
: update all 5 resumes.
That is, in a nutshell, what you need to do.
From the perspective of clients, a Solr index is a very flattened data
structure
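A hypothetical sketch of what that denormalization means in practice: the person/resume "join" is flattened by copying the shared person fields into each resume document, so a change to the personal details forces all five resume documents to be rebuilt and reindexed (field names here are invented):

```python
# Flattening a person/resume join for a flat Solr index.
# There is no join at query time, so shared fields are copied into
# every document that needs them.
person = {"person_id": 7, "name": "Jane Doe", "city": "Umea"}
resumes = [{"resume_id": i, "person_id": 7, "body": f"resume {i}"}
           for i in range(5)]

def to_solr_docs(person, resumes):
    """Flatten: embed the person fields (prefixed) in each resume doc."""
    extra = {f"person_{k}": v for k, v in person.items() if k != "person_id"}
    return [{**r, **extra} for r in resumes]

docs = to_solr_docs(person, resumes)
person["city"] = "Stockholm"          # personal details changed...
docs = to_solr_docs(person, resumes)  # ...so all 5 docs must be resent
```

The cost of the flat model is exactly this write amplification; the benefit is that every query is a single flat lookup.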
Just to add another wrinkle, how clean is your OCR? I've seen it
range from very nice (i.e. 99.9% of the words are actually words) to
horrible (60%+ of the "words" are nonsense). I saw one attempt
to OCR a family tree. As in a stylized tree with the data
hand-written along the various branches in e
Otis Gospodnetic wrote:
Hi,
Some quick notes, since it's late here.
- You'll need to wait for SOLR-303 - there is no way even a big machine will be
able to search such a large index in a reasonable amount of time, plus you may
simply not have enough RAM for such a large index.
Are you bas
On Jan 23, 2008, at 12:47 AM, Brian Whitman wrote:
$document->title = 'Some Title';
$document->content = 'Some content for this wonderful document.
Blah blah blah.';
Did you change the schema? There's no title or content field in the
default example schema. But I believe Solr d
$document->title = 'Some Title';
$document->content = 'Some content for this wonderful document.
Blah blah blah.';
Did you change the schema? There's no title or content field in the
default example schema. But I believe Solr does output different
errors for that.
Hi (again)
I'm trying to add documents using the SolrPhpClient (if there's a
specific mailinglist for it, please let me know and I'll ask there
instead).
I've searched the net for "missing content stream", but found nothing
that makes sense.
This is what solr spits out when I run the ex
: Hi,
: Do you know if there is a performance impact when using multivalued dynamic
: fields when it's not always necessary to store more than one value?
http://www.nabble.com/Performance-penalty-for-Multivalued-field--to9496992.html
-Hoss
: I get OOM errors with Solr 1.3. Autowarming seems to be the villain, in
: conjunction with FieldCache somehow.
: JVM args: -Xmx512m -Xms512m -Xss128k
:
: Index size is ~4 Million docs, where I index text and store database primary
it seems naive to me to only allow 512MB for an index of 4 million docs --
n
: I am trying to apply a stylesheet to the result XML by passing an argument
: like stylesheet=tabular.xml, but it complained
: that the stylesheet may be empty. When I checked the source code for
: XMLWriter, it's looking under /admin
as noted in the wiki the stylesheet param is (vastly) discouraged ...
I am using xalan:tokenize in an xsl that transforms solr output and the
stylesheet is failing to compile. Any ideas? I am sure its straightforward. Any
help appreciated.
Regards,
Sean
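One common reason an xsl using xalan:tokenize fails to compile is a missing extension-namespace declaration (and the transform has to actually run under Xalan-J, since tokenize is a Xalan extension, not standard XSLT 1.0). A hypothetical fragment with invented select paths showing the shape that usually works:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xalan="http://xml.apache.org/xalan"
    exclude-result-prefixes="xalan">
  <!-- xalan:tokenize(string, delimiters) returns a node-set of tokens -->
  <xsl:template match="/">
    <xsl:for-each select="xalan:tokenize(response/result/doc/str[@name='cat'], ',')">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If the namespace is already declared, the next thing to check is which XSLT engine is actually on the classpath doing the transform.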
Hello,
We've got some memory constraint worries from using Java RMI, although I can
see this problem could affect the XML requests too. The Java code doesn't seem
to handle large files as streams. Now we're thinking that there are two
possible solutions: either this exists or we create a file pa
Solr does not now do this. I don't know if the Solr processing stack has
this flexibility, or if it is worth adding it.
I understand every example you have suggested -- I just don't get how it
isn't possible. Can you post an example of the schema+commands that give
you an error?
If your go
I am trying to apply a stylesheet to the result XML by passing an argument
like stylesheet=tabular.xml, but it complained
that the stylesheet may be empty. When I checked the source code for
XMLWriter, it's looking under /admin
private static final char[] XML_START1="<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n".toCharArray();
private static fi
On 22-Jan-08, at 11:05 AM, Phillip Farber wrote:
Currently 1M docs @ ~1.4M/doc. Scaling to 7M docs. This is OCR so
we are talking perhaps 50K words total to index so as you point out
the index might not be too big. It's the *data* that is big, not
the *index*, right? So I don't think S
Thanks for your reply.
I set autowarmcount = 0 for both LRUCache and the queryCache but still I got
these errors on heavy reindexing (4M docs as fast as possible each doc <
10K). I removed firstSearcher and newSearcher but I still got the same
errors.
The strange thing is that now when the server
A more interesting use case:
Analyzing text and finding a number, like the mean word length or the mean
number of repeated words. These are standard tools for spam detection. To
create these, we would want to shovel text into a text processing chain that
creates an integer. We then want to both st
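The kind of text-to-number chain described above can be sketched quickly; this computes two of the classic spam-detection style features mentioned (mean word length, plus a repeated-word ratio), the sort of derived numbers one would then store and index alongside the text:

```python
# Sketch: reduce a text to numeric features suitable for storing in
# numeric fields next to the original text.
from collections import Counter

def text_features(text):
    """Return mean word length and the fraction of distinct words repeated."""
    words = text.lower().split()
    counts = Counter(words)
    mean_len = sum(len(w) for w in words) / len(words)
    repeated = sum(1 for c in counts.values() if c > 1) / len(counts)
    return {"mean_word_len": mean_len, "repeated_word_ratio": repeated}

feats = text_features("buy now buy now limited offer")
```

In the Solr picture, such a chain would run at index time and emit values for plain numeric fields, so they are both storable and sortable/filterable.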
Queries involving sorting can occupy a lot of memory. During
autowarming you need 2x peak memory usage. The only thing you can do
is increase your max heap size or be careful about cache autowarming
(possibly turning it off).
cheers,
-Mike
On 21-Jan-08, at 9:44 PM, Marcus Herou wrote:
Hi,
Do you know if there is a performance impact when using multivalued dynamic
fields when it's not always necessary to store more than one value?
Since I'm going to add dynamic fields to my schema and I'm not sure if the
field will be multivalued or not, I thought about doing them multivalued. In
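For reference, a dynamic field is declared multiValued in schema.xml like this sketch (the `*_s` pattern and type name follow the stock example schema; treat the specifics as illustrative). Documents matching the pattern may then carry one value or several:

```xml
<!-- schema.xml sketch: a multiValued dynamic field pattern -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"
              multiValued="true"/>
```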
Obviously as the number of documents increase the index size must
increase to some degree -- I think linearly? But what index size will
result for 7M documents over 50K words where we're talking just 2 fields
per doc: 1 id field and one OCR field of ~1.4M? Ballpark?
Regarding single word qu
I did try with the latest nightly build and followed the steps outlined in
http://wiki.apache.org/solr/SpellCheckerRequestHandler
with regards to creating new catchall field 'spell' of type 'spell' and
copied my text fields to 'spell' at index time.
Still q=grapics returns 'graphics'
but q=grapic
I did try with the latest nightly build. The problem still exists.
I tested with the example data that comes with solr package.
1)with termsourcefield set to 'word' which is string fieldtype
q=iped nano returns 'ipod nano' which is good
2) with termsourcefield set to 'spell' (which is the c
Ryan McKinley wrote:
We are considering Solr 1.2 to index and search a terabyte-scale
dataset of OCR. Initially our requirements are simple: basic
tokenizing, score sorting only, no faceting. The schema is simple
too. A document consists of a numeric id, stored and indexed and a
large
same way you put any & in xml...
&amp;
Jae Joo wrote:
In the firstSearcher listener, I need to use the special character "&" in the q
string, but it complains "Error - filterStart"
company_desc:"Advertising & Marketing"
0
20
company_name,
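Putting the answer together with the config fragments quoted above, the listener entry would look something like this sketch, with the ampersand written as the XML entity `&amp;` (the listener/arr layout follows the stock example solrconfig.xml; the field values are taken from the question):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- & must be escaped as &amp; anywhere inside an XML config file -->
      <str name="q">company_desc:"Advertising &amp; Marketing"</str>
      <str name="start">0</str>
      <str name="rows">20</str>
      <str name="fl">company_name, score</str>
    </lst>
  </arr>
</listener>
```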
In the firstSearcher listener, I need to use the special character "&" in the q
string, but it complains "Error - filterStart"
company_desc:"Advertising & Marketing"
0
20
company_name, score
Thanks,
Jae Joo