It takes about one hour to replicate a 6 GB index for Solr in my env. But my
network can transfer files at about 10-20 MB/s using scp. So Solr's HTTP
replication seems too slow. Is this normal, or am I doing something wrong?
we have an identical-sized index and it takes ~5 minutes
> It takes about one hour to replicate a 6 GB index for Solr in my env. But
> my network can transfer files at about 10-20 MB/s using scp. So Solr's HTTP
> replication seems too slow. Is this normal, or am I doing something wrong?
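For reference, a minimal sketch of the master/slave solrconfig.xml setup behind
Solr's HTTP replication as typically configured; the host name, poll interval,
and confFiles list below are only placeholders:

  <!-- master: publish the index after every commit -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- slave: poll the master every 5 minutes and pull only changed files -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>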
Not really. The problem here is that to perform this raw, you'd need
to enumerate every term in the index, which is pretty slow.
One solution is to use one of the ngram tokenizers, probably the
NGramFilterFactory to process the output of your tokenizers. Here's a
related place to start...
http://w
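As an illustration of that suggestion, a field type along these lines applies the
NGramFilterFactory to the tokenizer output; the type name and gram sizes are only
example values:

  <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- split each token into 2- to 15-character grams for substring-style matching -->
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    </analyzer>
  </fieldType>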
In what? Where? What's the problem you're seeing? Why do you ask?
Please review: http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Fri, Oct 29, 2010 at 4:19 AM, Tharindu Mathew wrote:
> Hi,
>
> How come $subject is present??
>
> --
> Regards,
>
> Tharindu
>
Oh, I didn't realize that, thanks!
Erick
On Sat, Oct 30, 2010 at 10:27 PM, Lance Norskog wrote:
> Hi-
>
> NOW does not get re-run for each document. If you give a large upload
> batch, the same NOW is given to each document.
>
> It would be handy to have an auto-incrementing date field, so t
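For context, the NOW in question is typically the default value of a timestamp
field in schema.xml, roughly like this (assuming a date field type is already
defined; the field name is just the common example one):

  <!-- NOW is evaluated once per update batch, so every document in the batch gets the same value -->
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>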
I guess that depends on what you mean by re-index, but here are some
guesses.
All of them share the assumption that you can determine #what# you want to
index from the various sites. That is, you have some way of identifying
the content you care about.
Solr won't help you at all in identifying wha
Lance Norskog [goks...@gmail.com] wrote:
> It would be handy to have an auto-incrementing date field, so that
> each document would get a unique number and the timestamp would then
> be the unique ID of the document.
If someone wants to implement this, I'll just note that the granularity of Solr
d
Hmm - personally, I wouldn't want to rely on timestamps as a unique-id
generation scheme. Might we not one day want to have distributed
parallel indexing that merges lazily? Keeping timestamps unique and in
sync across multiple nodes would be a tough requirement. I would be
happy simply havin
I have a city named 's-Hertogenbosch
I want it to be indexed exactly like that, so "'s-Hertogenbosch" (without
the quotes).
But now I get:
1
1
1
What filter should I add/remove from my field definition?
I already tried a new fieldtype with just this, but no luck:
On Sun, Oct 31, 2010 at 12:12 PM, PeterKerk wrote:
>
> I have a city named 's-Hertogenbosch
>
> I want it to be indexed exactly like that, so "'s-Hertogenbosch" (without
> the quotes).
>
> But now I get:
>
>1
>1
>1
>
>
> What filter should I add/remove from my field definition?
>
I already tried the normal string type, but that doesn't work either.
I now use this:
But that doesn't do it either... what else can I try?
Thanks!
Ah haaa. I see now. :-)
I didn't make that connection. Hopefully I would have before I ever tried to
implement that :-)
Kind of like user names and icons on a Windows login :-)
Dennis Gearon
Thanks Erick.
Dennis Gearon
Even microseconds may not be enough on some really good, fast machine.
Dennis Gearon
One way to view how your tokenizer/filter chain transforms your input
terms is to use the analysis page of the Solr admin web application. This
is very handy when troubleshooting issues related to how terms are indexed.
On 31 October 2010 17:13, PeterKerk wrote:
>
> I already tried the normal
Thanks Erick. For the record, we are using Solr 1.4.1 and SolrJ.
On 31 October 2010 01:54, Erick Erickson wrote:
> What version of Solr are you using?
>
> About committing. I'd just let the solr defaults handle that. You configure
> this in the autocommit section of solrconfig.xml. I'm pretty sure thi
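For reference, a minimal sketch of the autocommit section being referred to in
solrconfig.xml; the thresholds are only placeholders:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- commit automatically after 10,000 buffered docs or 60 seconds, whichever comes first -->
      <maxDocs>10000</maxDocs>
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>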
Hi,
I'm trying to implement paging when grouping is on.
The start parameter works, but the result still contains all the documents
that come before it.
http://localhost:8983/solr/select?q=test&group=true&group.field=marketplaceId&group.limit=1&rows=1&start=0
(I get 1 document).
http://localhost:8983/solr/
Ah, seems you're just one day behind. SOLR-2207, paging with field collapsing,
has just been resolved:
https://issues.apache.org/jira/browse/SOLR-2207
> Hi,
>
> I'm trying to implement paging when grouping is on.
>
> Start parameter works, but the result contains all the documents that were
>
Oh, and see the just updated wiki page as well:
http://wiki.apache.org/solr/FieldCollapsing
> Ah, seems you're just one day behind. SOLR-2207, paging with field
> collapsing, has just been resolved:
> https://issues.apache.org/jira/browse/SOLR-2207
>
> > Hi,
> >
> > I'm trying to implement pagin
Dennis Gearon [gear...@sbcglobal.net] wrote:
> Even microseconds may not be enough on some really good, fast machine.
True, especially since the timer might not provide microsecond granularity
although the returned value is in microseconds. However, a unique timestamp
generator should keep trac
Did you restart Solr after the changes? Did you reindex? Because the string
type should do what you want.
And you've only shown us definitions. What are you actually using with
them?
Best
Erick
On Sun, Oct 31, 2010 at 1:13 PM, PeterKerk wrote:
>
> I already tried the normal string type, but that doesnt wo
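To make that concrete, a hedged sketch of what using the string type looks like:
solr.StrField is not tokenized, so the whole value 's-Hertogenbosch is kept as a
single term (the field name below is just an example):

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <!-- an untokenized copy of the city name, indexed exactly as provided -->
  <field name="city_exact" type="string" indexed="true" stored="true"/>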
Another approach for this problem is to use another Solr core for
storing users' queries for auto-complete functionality (see
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
) and index not only the user_query field, but also transliterated and
diff_l
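A rough sketch of the kind of edge-ngram analysis that blog post describes,
assuming the separate core indexes the collected queries with a field type like
this (names and gram sizes are assumptions):

  <fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- index every prefix of the stored query, e.g. "s", "so", "sol", "solr" -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>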
Is there an issue running Solr in /home/lib as opposed to running it
somewhere outside of the virtual hosts like /lib?
Eric
Hi,
I've got some basic usage / design questions.
1. The SolrJ wiki proposes to use the same CommonsHttpSolrServer
instance for all requests to avoid connection leaks.
So if I create a singleton instance at application startup, I can
safely use this instance for ALL queries/updates t
Can you expand on your question? Are you having a problem? Is this idle
curiosity?
Because I have no idea how to respond when there is so little information.
Best
Erick
On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin wrote:
> Is there an issue running Solr in /home/lib as opposed to running it
>
Hi,
Thank you. This is more than idle curiosity. I am trying to debug an issue I
am having with my installation and this is one step in verifying that I have
a setup that does not consume resources. I am trying to debunk my internal
myth that having Solr and Nutch in a virtual host would be causin
What do you actually want to do? Give an example of a string that would be
found in the source document (to index), and a few queries that you want to
match it (and that presumably aren't matching it with the methods you've tried,
since you say "it doesn't work")
Both a string type or a text ty
What servlet container are you putting your Solr in? Jetty? Tomcat? Something
else? Are you fronting it with Apache on top of that? (I think maybe you are,
otherwise I'm not sure how the phrase 'virtual host' applies).
In general, Solr of course doesn't care what directory it's in on disk, so
Excellent information. Thank you. Solr is acting just fine then. I can
connect to it with no issues, it indexes fine, and there didn't seem to be any
complications with it. Now I can rule it out and go about solving what you
pointed out (and I agree) to be a Java/Nutch issue.
Nutch is a crawler I use to
If you are copying from an indexer while you are indexing new content,
this would cause contention for the disk head. Does indexing slow down
during this period?
Lance
2010/10/31 Peter Karich :
> we have an identical-sized index and it takes ~5 minutes
>
>
>> It takes about one hour to replicate
2.
The SolrJ library's handling of content streams is "pull", not "push".
That is, you give it a reader and it pulls content when it feels like
it. If the software feeding the connection wants to write the data,
you have to either buffer the whole thing or do a dual-thread
writer/reader pair.
The e
With virtual hosting you can give CPU & memory quotas to your
different VMs. This allows you to control the Nutch-vs.-the-world
problem. Unfortunately, you cannot allocate disk channel. With two
I/O-bound apps, this is a problem.
On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin wrote:
> Excellent informat
Oh. So I should take the installations out and move them to / as
opposed to inside my virtual host of /home//www?
-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Sunday, October 31, 2010 7:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr in virtual host a