Uri Boness wrote:
Well... yes, it's a tool that Nutch ships with. It also ships with an
example Solr schema which you can use.
hi,
is there any documentation to understand what's going on in the schema?
dismax
explicit
0.01
content^0.5 anchor^1.0 title^5.2
cont
The ReplicationHandler is not enforced as a singleton, but for all
practical purposes it is a singleton for one core.
If an instance (a slice, as you say) is set up as a repeater, it can
act as both a master and a slave.
In the repeater the configuration should be as follows:
MASTER
|_
We have an encoding problem with our Solr application. That is, non-ASCII chars
display fine in Solr, but show up as gobbledegook in our application.
Our Tomcat server.xml file already contains URIEncoding="UTF-8" on the
relevant <Connector>.
A Google search reveals that I should set the encoding for the J
On Aug 25, 2009, at 6:35 PM, Britske wrote:
Moreover, I can't seem to find the actual code in FacetComponent or
anywhere
else for that matter where the {!ex}-param case is treated. I assume
it's in
FacetComponent.refineFacets but I can't seem to get a grip on it..
Perhaps
it's late here..
: We are running an instance of MediaWiki so the text goes through a
: couple of transformations: wiki markup -> html -> plain text.
: Its at this last step that I take a "snippet" and insert that into Solr.
...
: doc.addField("text_snippet_t", article.getSnippet(1000));
ok, well first of
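For what it's worth, a hedged, self-contained sketch (not from this thread) of the kind of sanitizing that can be worth trying before the addField call quoted above, assuming the trailing garbage comes from low control characters that are illegal in XML or from cutting the text in the middle of a surrogate pair; the class and method names are made up:

public class SnippetUtil {
  // Truncate to maxChars without splitting a surrogate pair, and drop the
  // low control characters that are not legal in XML 1.0 output.
  public static String safeSnippet(String text, int maxChars) {
    int end = Math.min(maxChars, text.length());
    if (end > 0 && end < text.length()
        && Character.isHighSurrogate(text.charAt(end - 1))) {
      end--;  // don't end on an unpaired high surrogate
    }
    StringBuilder sb = new StringBuilder(end);
    for (int i = 0; i < end; i++) {
      char c = text.charAt(i);
      if (c == '\t' || c == '\n' || c == '\r' || c >= 0x20) {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}

// usage, mirroring the addField call quoted above:
// doc.addField("text_snippet_t", SnippetUtil.safeSnippet(articleText, 1000));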
So basically the idea is to replace the underlying IndexReader currently
associated with a searcher/solrCore following an update without calling
commit explicitly? Will this also have the effect of bringing in inserts,
by the way, or is it just usable for deletes?
In terms of cache invalidation etc there
But again, why would someone get an OOM??? I never have...
What I discovered is: committing millions of docs (in Solr 1.4) may take
several days (although adding the docs takes a day) if you somehow have
_many_ segments and bad I/O with <= 2 CPUs; I am using a heavy ramBufferSizeMB
instead of a heavy mergeFactor, and
> 1. Exactly which version of Solr / SolrJ are you using?
Solr Specification Version: 1.3.0
Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
The latest SolrJ, which I downloaded a couple of days ago.
> Can you put the original (pre solr, pre solrj, raw untouched, etc..
On Tue, Aug 25, 2009 at 8:37 PM, Lance Norskog wrote:
> The latest Solr 1.4 can index 200k records in several minutes, then commit
> in a few seconds. I don't know but I'm guessing it is due to Lucene
> improvements. It does not use much memory doing this.
If you're using SolrJ, it's due to improv
Hi folks,
I'm writing a search component for Solr and I'm having some trouble with
the ResponseBuilder.
I'd like to add to the response, e.g., only 5 documents of a search.
My problem is when I try to add these docs to the ResponseBuilder.
A snippet of the code:
[...]
QParser parser = QParser.getPars
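For anyone hitting the same wall, here is a rough, hedged sketch (Solr 1.4-era API, not the poster's actual code) of one way a custom SearchComponent can run the query itself and put a small DocList into the response; the class name and the "top5" response key are made up:

import java.io.IOException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocList;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;

public class Top5Component extends SearchComponent {

  public void prepare(ResponseBuilder rb) throws IOException {
  }

  public void process(ResponseBuilder rb) throws IOException {
    try {
      // parse the user query the same way the query component would
      QParser parser = QParser.getParser(rb.getQueryString(), null, rb.req);
      Query q = parser.getQuery();
      SolrIndexSearcher searcher = rb.req.getSearcher();
      // fetch only the first 5 documents (no filter, default score sort)
      DocList top5 = searcher.getDocList(q, (Query) null, null, 0, 5);
      // the DocList shows up under its own key in the response
      rb.rsp.add("top5", top5);
    } catch (ParseException e) {
      throw new RuntimeException(e);
    }
  }

  // SolrInfoMBean boilerplate
  public String getDescription() { return "adds the top 5 docs of the query"; }
  public String getSource()      { return "$URL$"; }
  public String getSourceId()    { return "$Id$"; }
  public String getVersion()     { return "1.0"; }
}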
: Is there a way to calculate a theoretical max score for the current query?
there's been some discussion on this on the java-user list over the years
... the short answer is "yes it's possible, but only in very controlled
situations" ... as i recall it depended on limiting the set of possible
: I'm a bit new to solr and have the following problem, it's about events and
: venues.
: If a user types a name of a venue, then I'd like to return the exact match
: for the venue first and then the list of events taking place at this venue.
: Currently I have defined a document bound to a databa
: Which versions of Lucene, Nutch and Solr work together? I've
: discovered that the Nutch trunk and the Solr trunk use wildly
: different versions of the Lucene jars, and it's causing me problems.
The Solr and Nutch projects don't really target any sort of strict binary
compatibility with each
Matthew: did you ever resolve your issue?
I'm not an expert on the distributed searching code, but there's no reason
I know of why a basic "OR" type query should fail just because you're
using the shards param.
Are you sure both of your solr instances (solr-archway and solr-portal)
are using th
The latest Solr 1.4 can index 200k records in several minutes, then commit
in a few seconds. I don't know but I'm guessing it is due to Lucene
improvements. It does not use much memory doing this.
Lance
On Tue, Aug 25, 2009 at 2:43 PM, Fuad Efendi wrote:
> I do commit once a day, millions of sm
1. Exactly which version of Solr / SolrJ are you using?
2. ...
: I am using the SolrJ client to add documents to my index. My field
: is a normal "text" field type and the text itself is the first 1000
: characters of an article.
Can you put the original (pre solr, pre solrj
: 1) We found the indexing speed starts dipping once the index grows to a
: certain size - in our case around 50G. We don't optimize, but we have
: to maintain a consistent index speed. The only way we could do that
: was keep creating new cores (on the same box, though we do use
Hmmm... it seems
I can give an overview: IW.getReader replaces IR.reopen, so
you'd make the replacement in SolrCore.getSearcher. However, as per another
discussion, IW isn't public yet, so all you'd need to do is
expose it from UpdateHandler. Then it should work as you want,
though there would need to be a new method to create a
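For readers following along, a minimal Lucene-level sketch (Lucene 2.9 API, not the Solr/UpdateHandler wiring being discussed) of what IW.getReader gives you; the Directory, Analyzer and the "id"/"42" term are placeholders:

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class NrtDeleteExample {
  public static void deleteAndSearch(Directory dir, Analyzer analyzer) throws IOException {
    IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.deleteDocuments(new Term("id", "42"));  // delete stays buffered in the writer
    IndexReader nrtReader = writer.getReader();    // sees the delete without writer.commit()
    IndexSearcher searcher = new IndexSearcher(nrtReader);
    // ... run queries here; the deleted doc is already invisible ...
    searcher.close();
    nrtReader.close();
    writer.close();
  }
}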
There were two main reasons we went with a multi-core solution:
1) We found the indexing speed starts dipping once the index grows to a
certain size - in our case around 50G. We don't optimize, but we have
to maintain a consistent index speed. The only way we could do that
was keep creating new cores
Jason,
sounds like a very promising change to me - so much so that I would gladly work
toward creating a patch myself.
Are there any specific points in the code you could point me to if I want to
look at how to start off implementing it?
Lucene/Solr classes involved, etc.? I'll start looking myself anyhow.
Hey there,
Apologies for this not going out sooner -- apparently it was sitting
as a draft in my inbox. A few of you have pinged me, so thanks for
your vigilance.
It's time for another Hadoop/Lucene/Apache Stack meetup! We've had
great attendance in the past few months, let's keep it up! I'm alwa
This will be implemented as you're stating when
IndexWriter.getReader is incorporated. This will carry over
deletes in RAM until IW.commit is called (i.e. Solr commit).
It's a fairly simple change, though perhaps too late for the 1.4
release?
On Tue, Aug 25, 2009 at 3:10 PM, KaktuChakarabati wrote:
>
>
hi,
I'm looking for a way to extend StatsComponent to recognize local params,
especially the {!ex} param.
To my knowledge this isn't implemented in the current trunk.
One of my use-cases for this is to be able to have a javascript
price-slider, where the user can operate the slider and thus set a
Hey,
I was wondering - is there a mechanism in Lucene and/or Solr to mark a
document in the index
as deleted and then have this change reflected in query serving without
performing the whole
commit/warmup cycle? This seems to me largely appealing, as it allows a kind
of solution
where deletes are sim
I have a valid XML document that begins:
mdp.39015052775379
2
Technology transfer and in-house R&D in Indian
industry : in the later 1990s / edited and with an introduction by Binay
Kumar Pattnaik. v.1
Not found
TECHNOLOGY
TRANSFER AND
IN.HOUSE R&D
IN
INDIAN
INDUSTRY
I believe Solr is throwi
I do commit once a day, millions of small docs... it takes 20 minutes on
average... why OOM? I see only reduced I/O...
-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: August-25-09 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: frequency of commit when
Hi,
Can you try to use a single SOLR instance with heavy RAM (so that
ramBufferSizeMB=8192, for instance) and mergeFactor=10? A single SOLR instance
is fast enough (> 100 client threads of Tomcat; configurable) - I usually
prefer a single instance for a single "writable" box with heavy RAM allocation
and g
That's my gut feeling (start big and go lower if OOM occurs) too.
Bill
On Tue, Aug 25, 2009 at 5:34 PM, Edward Capriolo wrote:
> On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> > Just curious, how often do folks commit when building their Solr/Lucene
> > index from scratch for index with milli
On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> Just curious, how often do folks commit when building their Solr/Lucene
> index from scratch for index with millions of documents? Should I just wait
> and do a single commit at the end after adding all the documents to the
> index?
>
> Bill
>
Bil
Just curious, how often do folks commit when building their Solr/Lucene
index from scratch for an index with millions of documents? Should I just wait
and do a single commit at the end after adding all the documents to the
index?
Bill
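To make the question concrete, here is a hedged SolrJ sketch (Solr 1.3/1.4-era API) of the "add everything, single commit at the end" approach being asked about; the URL, batch size, document count and field names are made-up values:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoader {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("text", "document number " + i);
      batch.add(doc);
      if (batch.size() == 1000) {   // stream adds in batches, but do not commit yet
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();                // a single commit at the very end
  }
}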
So I whipped up a quick SolrJ client and ran it against the document
that I referenced earlier. When I retrieve the doc and just print its
field/value pairs to stdout it ends like this:
http://brockwine.com/images/output1.png
They appear to be some kind of garbage characters.
-Rupert
On Tue, Aug
On Aug 25, 2009, at 10:34 AM, Elaine Li wrote:
I am still looking for help on Chinese language search. I tried
ChineseTokenizerFactory as my analyzer, but it did not help. Only words
with white space, commas, etc. around them can be found.
Try using the StandardTokenizerFactory - it handles Ch
I am indexing my data both through DataImportHandler and per transaction from
JPA using @PostXXX listeners.
UpdateRequestProcessor looks like exactly what I need. I don't suppose
there's a scriptable subclass available in 1.4 that is configured from
schema.xml? :-)
Thanks guys!
-
Summary
===
I had about 120,000 objects with a total size of 71.2 GB; those objects are already
indexed using Lucene. The index size is about 111 GB.
I tried to use a Solr 1.4 nightly build to index the same collection. I
divided the collection across three servers; each server had 5 Solr instances (
Hi,
This is a very strange behavior, and the fact that it is caused by one
specific field, again, leads me to believe it's still a data issue. Did
you try using SolrJ to query the data as well? If the same thing happens
when using the binary protocol, then it's probably not a data issue. On
the
Hello,
We are running multiple slices in our environment. I have enabled JMX and I am
inspecting the replication handler mbean to obtain some information about the
master/slave configuration for replication. Is the replication handler mbean a
singleton? I only see one mbean for the entire serv
If you're using DataImportHandler, a custom (Java or script)
transformer could do this.
Also an UpdateProcessor could do it.
But there are no conditional copyField capabilities otherwise.
Keep in mind that pragmatically, if you're doing your own indexing
code, why not have a line like this?
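The snippet is cut off in the archive, but the kind of line being hinted at is presumably something like this hedged SolrJ fragment (field names and values are hypothetical, taken from the max_side/width/length discussion in this thread):

int width = 50, length = 90;                        // example values from your own data
SolrInputDocument doc = new SolrInputDocument();
doc.addField("width", width);
doc.addField("length", length);
doc.addField("max_side", Math.max(width, length));  // single-valued field, safe to sort on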
Is there a way to have the max_side field only in Solr ...as in a conditional
copyField or something like that?
I'd like to push as much of this into Solr as I can because the app and db that
Solr is indexing are not really the best place to add this type of
functionality.
- Origin
One problem is the IT logistics of handling the file set. At 200 million
records you have at least 20G of data in one Lucene index. It takes hours to
optimize this, and 10s of minutes to copy the optimized index around to
query servers.
Another problem is that indexing speed drops off after the ind
Thanks Yonik
So it's basically about how the field is indexed and not how it is stored.
So if I give "the elephant is an animal" and try to get back the document, I
should see the entire string; only the indexing is done on elephant and animal.
I was of the impression that when Solr loads that document it strips out
tho
: We're doing similar thing with multi-core - when a core reaches
: capacity (in our case 200 million records) we start a new core. We are
: doing this via web service call (Create web service),
this whole thread perplexes me ... while i can understand not wanting to
let an index grow without
Thanks Yonik
So the way the StopFilter works is that if I give a string like "the elephant is an
animal", then when I retrieve the document the stored value will always be
the same; only the indexing will be done on "elephant" and "animal".
I was of the impression that Solr automatically takes out that word
The text file at:
http://brockwine.com/solr.txt
Represents one of these truncated responses (this one in XML). It
starts out great, then look at the bottom, boom, game over. :)
I found this document by first running our bigger search, which breaks,
and then zeroing in on a specific broken document by
Can you copy-paste the source data indexed in this field which causes the
error?
Cheers
Avlesh
On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco wrote:
> Using wt=json also yields an invalid document. So after more
> investigation it appears that I can always "break" the response by
> pulling bac
Using wt=json also yields an invalid document. So after more
investigation it appears that I can always "break" the response by
pulling back a specific field via the "fl" parameter. If I leave off a
field then the response is valid, if I include it then Solr yields an
invalid document - a truncated
Well... yes, it's a tool that Nutch ships with. It also ships with an
example Solr schema which you can use.
Fuad Efendi wrote:
Thanks for the link, so, SolrIndex is NOT a plugin, it is an application... I
use a similar approach...
-Original Message-
From: Uri Boness
Hi,
Nutch comes with
Thanks for the link, so, SolrIndex is NOT a plugin, it is an application... I
use a similar approach...
-Original Message-
From: Uri Boness
Hi,
Nutch comes with support for Solr out of the box. I suggest you follow
the steps as described here:
http://www.lucidimagination.com/blog/2009/03/0
On Aug 25, 2009, at 11:29 AM, Fuad Efendi wrote:
"query time relevancy tuning"
It is mentioned at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
-What is it? Just GET request parameters for standard handler?
To me, this primarily refers to dismax client-side parameterization of
"query time relevancy tuning"
It is mentioned at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
-What is it? Just GET request parameters for standard handler?
Thanks
Uri,
Thanks a lot! I don't need to do cross-language search. So Option 2
sounds better, because my corpus is very large.
I am still looking for help on Chinese language search. I tried
ChineseTokenizerFactory as my analyzer, but it did not help. Only words
with white space, commas, etc. around them c
Announcing a new Meetup for SFBay Apache Lucene/Solr Meetup!
What: SFBay Apache Lucene/Solr June Meetup
When: September 3, 2009 6:30 PM
Where: Computer History Museum, 1401 N Shoreline Blvd, Mountain View,
CA 94043
Presentations and discussions on Lucene/Solr, the Apache Open Source
Search
Hi Paras,
> Won't specifying fq=~term1+~term2 do the job?
Briefly looking at the source, it seems that the MLT handler (not the component)
uses the fq parameter, so if you use the MLT handler, it should do the job.
Koji
Paras Chopra wrote:
Hi Koji,
I have already used MLT parameters to refine the query but still I
It seems to me that this configuration actually does what you want -
queries on "title" mostly. The default search field doesn't influence a
dismax query. I would suggest you include the debugQuery=true
parameter; it will help you figure out how the matching is performed.
You can read more
Hi Koji,
I have already used MLT parameters to refine the query, but I'd still like to
exclude additional terms. I was just going through some docs online and came
across the filterQuery mechanism. Won't specifying fq=~term1+~term2 do the job?
Thanks
Paras Chopra
On Tue, Aug 25, 2009 at 4:08 PM, Koji
Hi Erik Earle,
Ahh, I read your mail too fast... Erik Hatcher's method should work.
Thanks!
Koji
Erik Hatcher wrote:
You couldn't sort on a multiValued field though.
I'd simply index a max_side field, and have the indexing client add a
single valued field with max(length,width) to it. The
Hi Paras,
> As I understand from StopFilter,
> it is a static method to exclude terms such as stop words.
Correct. As far as I know, to control which words the MLT component
chooses for generating the BooleanQuery, what you can do is
specify the following parameters:
mlt.mintf
Minimum Term Freq
You couldn't sort on a multiValued field though.
I'd simply index a max_side field, and have the indexing client add a
single valued field with max(length,width) to it. Then sort on
max_side.
Erik
On Aug 25, 2009, at 4:00 AM, Constantijn Visinescu wrote:
make a new multivalued fi
Thanks for your help.
I use the default Nutch configuration and I use solrindex to send the Nutch
results to Solr. I get results when I query, therefore Nutch works properly
(it gives a url, title, content ...).
I would like to query Solr so as to emphasize the "title" field and not the
"content" field.
Make a new multivalued field in your schema.xml, copy both width and length
into that field, and then sort on that field?
On Tue, Aug 25, 2009 at 5:40 AM, erikea...@yahoo.com wrote:
> Clever... but if more than one row adds up to the same value I may get the
> wrong order (like 50, 50 and 10, 90
On Tue, Aug 25, 2009 at 5:26 AM, darniz wrote:
>
> Continuing on this, I have a use case where I have to strip out single
> quotes for certain fields. For example, for testing I added the following
> fieldType in my schema.xml file
>
>
>
>
>
>
> and then i decla
On Tue, Aug 25, 2009 at 2:04 AM, Brian Klippel wrote:
> Hopefully, someone can tell me what is going wrong here.
>
>
>
> I have a field, "SearchObjectType", and a large number of the documents
> indexed in a given core have a value of "USER_PROFILE".
>
>
>
> When I examine the schema browser in ad