Uri Boness wrote:
Well... yes, it's a tool that Nutch ships with. It also ships with an
example Solr schema which you can use.
hi,
is there any documentation to understand what's going on in the schema?
dismax
explicit
0.01
content^0.5 anchor^1.0 title^5.2
cont
The ReplicationHandler is not enforced as a singleton, but for all
practical purposes it is a singleton for one core.
If an instance (a slice, as you say) is set up as a repeater, it can
act as both a master and a slave.
In the repeater the configuration should be as follows:
MASTER
|_
We have an encoding problem with our Solr application. That is, non-ASCII chars
display fine in Solr, but show up as gobbledegook in our application.
Our Tomcat server.xml file already contains URIEncoding="UTF-8" on the
relevant <Connector>.
A Google search reveals that I should set the encoding for the J
On Aug 25, 2009, at 6:35 PM, Britske wrote:
Moreover, I can't seem to find the actual code in FacetComponent or
anywhere
else for that matter where the {!ex}-param case is treated. I assume
it's in
FacetComponent.refineFacets but I can't seem to get a grip on it..
Perhaps
it's late here..
: We are running an instance of MediaWiki so the text goes through a
: couple of transformations: wiki markup -> html -> plain text.
: Its at this last step that I take a "snippet" and insert that into Solr.
...
: doc.addField("text_snippet_t", article.getSnippet(1000));
ok, well first of
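For what it's worth, a hedged, self-contained sketch (not from this thread) of the kind of sanitizing that can be worth trying before the addField call quoted above, assuming the trailing garbage comes from low control characters that are illegal in XML or from cutting the text in the middle of a surrogate pair; the class and method names are made up:

public class SnippetUtil {
  // Truncate to maxChars without splitting a surrogate pair, and drop the
  // low control characters that are not legal in XML 1.0 output.
  public static String safeSnippet(String text, int maxChars) {
    int end = Math.min(maxChars, text.length());
    if (end > 0 && end < text.length()
        && Character.isHighSurrogate(text.charAt(end - 1))) {
      end--;  // don't end on an unpaired high surrogate
    }
    StringBuilder sb = new StringBuilder(end);
    for (int i = 0; i < end; i++) {
      char c = text.charAt(i);
      if (c == '\t' || c == '\n' || c == '\r' || c >= 0x20) {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}

// usage, mirroring the addField call quoted above:
// doc.addField("text_snippet_t", SnippetUtil.safeSnippet(articleText, 1000));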
So basically the idea is to replace the underlying IndexReader currently
associated with a searcher/solrCore following an update without calling
commit explicitly? Will this also have the effect of bringing in inserts,
by the way, or is it just usable for deletes?
In terms of cache invalidation etc there
But again, why would someone get an OOM??? I never have...
What I discovered is: committing millions of docs (in Solr 1.4) may take
several days (although adding the docs takes a day) if you somehow have
_many_ segments and bad I/O with <= 2 CPUs; I am using a heavy ramBufferSizeMB
instead of a heavy mergeFactor, and
> 1. Exactly which version of Solr / SolrJ are you using?
Solr Specification Version: 1.3.0
Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
The latest SolrJ, which I downloaded a couple of days ago.
> Can you put the original (pre solr, pre solrj, raw untouched, etc..
On Tue, Aug 25, 2009 at 8:37 PM, Lance Norskog wrote:
> The latest Solr 1.4 can index 200k records in several minutes, then commit
> in a few seconds. I don't know but I'm guessing it is due to Lucene
> improvements. It does not use much memory doing this.
If you're using SolrJ, it's due to improv
Hi folks,
I'm writing a search component for Solr and I'm having some trouble with
the ResponseBuilder.
I'd like to add to the response, e.g., only 5 documents of a search.
My problem is when I try to add these docs to the ResponseBuilder.
A snippet of the code:
[...]
QParser parser = QParser.getPars
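For anyone hitting the same wall, here is a rough, hedged sketch (Solr 1.4-era API, not the poster's actual code) of one way a custom SearchComponent can run the query itself and put a small DocList into the response; the class name and the "top5" response key are made up:

import java.io.IOException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocList;
import org.apache.solr.search.QParser;
import org.apache.solr.search.SolrIndexSearcher;

public class Top5Component extends SearchComponent {

  public void prepare(ResponseBuilder rb) throws IOException {
  }

  public void process(ResponseBuilder rb) throws IOException {
    try {
      // parse the user query the same way the query component would
      QParser parser = QParser.getParser(rb.getQueryString(), null, rb.req);
      Query q = parser.getQuery();
      SolrIndexSearcher searcher = rb.req.getSearcher();
      // fetch only the first 5 documents (no filter, default score sort)
      DocList top5 = searcher.getDocList(q, (Query) null, null, 0, 5);
      // the DocList shows up under its own key in the response
      rb.rsp.add("top5", top5);
    } catch (ParseException e) {
      throw new RuntimeException(e);
    }
  }

  // SolrInfoMBean boilerplate
  public String getDescription() { return "adds the top 5 docs of the query"; }
  public String getSource()      { return "$URL$"; }
  public String getSourceId()    { return "$Id$"; }
  public String getVersion()     { return "1.0"; }
}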
: Is there a way to calculate a theoretical max score for the current query?
there's been some discussion on this on the java-user list over the years
... the short answer is "yes it's possible, but only in very controlled
situations" ... as i recall it depended on limiting the set of possible
: I'm a bit new to solr and have the following problem, it's about events and
: venues.
: If a user types a name of a venue, then I'd like to return the exact match
: for the venue first and then the list of events taking place at this venue.
: Currently I have defined a document bound to a databa
: Which versions of Lucene, Nutch and Solr work together? I've
: discovered that the Nutch trunk and the Solr trunk use wildly
: different versions of the Lucene jars, and it's causing me problems.
The Solr and Nutch projects don't really target any sort of strict binary
compatibility with each
Matthew: did you ever resolve your issue?
I'm not an expert on the distributed searching code, but there's no reason
I know of why a basic "OR" type query should fail just because you're
using the shards param.
Are you sure both of your solr instances (solr-archway and solr-portal)
are using th
The latest Solr 1.4 can index 200k records in several minutes, then commit
in a few seconds. I don't know but I'm guessing it is due to Lucene
improvements. It does not use much memory doing this.
Lance
On Tue, Aug 25, 2009 at 2:43 PM, Fuad Efendi wrote:
> I do commit once a day, millions of sm
1. Exactly which version of Solr / SolrJ are you using?
2. ...
: I am using the SolrJ client to add documents to my index. My field
: is a normal "text" field type and the text itself is the first 1000
: characters of an article.
Can you put the original (pre solr, pre solrj
: 1) We found the indexing speed starts dipping once the index grows to a
: certain size - in our case around 50G. We don't optimize, but we have
: to maintain a consistent index speed. The only way we could do that
: was keep creating new cores (on the same box, though we do use
Hmmm... it seems
I can give an overview: IW.getReader replaces IR.reopen, so
you'd make the replacement in SolrCore.getSearcher. However, as per another
discussion, IW isn't public yet, so all you'd need to do is
expose it from UpdateHandler. Then it should work as you want,
though there would need to be a new method to create a
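For readers following along, a minimal Lucene-level sketch (Lucene 2.9 API, not the Solr/UpdateHandler wiring being discussed) of what IW.getReader gives you; the Directory, Analyzer and the "id"/"42" term are placeholders:

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

public class NrtDeleteExample {
  public static void deleteAndSearch(Directory dir, Analyzer analyzer) throws IOException {
    IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.deleteDocuments(new Term("id", "42"));  // delete stays buffered in the writer
    IndexReader nrtReader = writer.getReader();    // sees the delete without writer.commit()
    IndexSearcher searcher = new IndexSearcher(nrtReader);
    // ... run queries here; the deleted doc is already invisible ...
    searcher.close();
    nrtReader.close();
    writer.close();
  }
}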
There were two main reasons we went with a multi-core solution:
1) We found the indexing speed starts dipping once the index grows to a
certain size - in our case around 50G. We don't optimize, but we have
to maintain a consistent index speed. The only way we could do that
was keep creating new cores
Jason,
sounds like a very promising change to me - so much so that I would gladly work
toward creating a patch myself.
Are there any specific points in the code you could point me to if I want to
look at how to start off implementing it?
Lucene/Solr classes involved, etc.? I'll start looking myself anyhow.
Hey there,
Apologies for this not going out sooner -- apparently it was sitting
as a draft in my inbox. A few of you have pinged me, so thanks for
your vigilance.
It's time for another Hadoop/Lucene/Apache Stack meetup! We've had
great attendance in the past few months, let's keep it up! I'm alwa
This will be implemented as you're stating when
IndexWriter.getReader is incorporated. This will carry over
deletes in RAM until IW.commit is called (i.e. Solr commit).
It's a fairly simple change, though perhaps too late for the 1.4
release?
On Tue, Aug 25, 2009 at 3:10 PM, KaktuChakarabati wrote:
>
>
hi,
I'm looking for a way to extend StatsComponent to recognize local params,
especially the {!ex} param.
To my knowledge this isn't implemented in the current trunk.
One of my use-cases for this is to be able to have a javascript
price-slider, where the user can operate the slider and thus set a
Hey,
I was wondering - is there a mechanism in Lucene and/or Solr to mark a
document in the index
as deleted and then have this change reflected in query serving without
performing the whole
commit/warmup cycle? This seems to me largely appealing, as it allows a kind
of solution
where deletes are sim
I have a valid XML document that begins:
mdp.39015052775379
2
Technology transfer and in-house R&D in Indian
industry : in the later 1990s / edited and with an introduction by Binay
Kumar Pattnaik. v.1
Not found
TECHNOLOGY
TRANSFER AND
IN.HOUSE R&D
IN
INDIAN
INDUSTRY
I believe Solr is throwi
I do commit once a day, millions of small docs... it takes 20 minutes on
average... why OOM? I see only reduced I/O...
-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: August-25-09 5:35 PM
To: solr-user@lucene.apache.org
Subject: Re: frequency of commit when
Hi,
Can you try to use a single SOLR instance with heavy RAM (so that
ramBufferSizeMB=8192, for instance) and mergeFactor=10? A single SOLR instance
is fast enough (> 100 client threads of Tomcat; configurable) - I usually
prefer a single instance for a single "writable" box with heavy RAM allocation
and g
That's my gut feeling (start big and go lower if OOM occurs) too.
Bill
On Tue, Aug 25, 2009 at 5:34 PM, Edward Capriolo wrote:
> On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> > Just curious, how often do folks commit when building their Solr/Lucene
> > index from scratch for index with milli
On Tue, Aug 25, 2009 at 5:29 PM, Bill Au wrote:
> Just curious, how often do folks commit when building their Solr/Lucene
> index from scratch for index with millions of documents? Should I just wait
> and do a single commit at the end after adding all the documents to the
> index?
>
> Bill
>
Bil
Just curious, how often do folks commit when building their Solr/Lucene
index from scratch for an index with millions of documents? Should I just wait
and do a single commit at the end after adding all the documents to the
index?
Bill
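To make the question concrete, here is a hedged SolrJ sketch (Solr 1.3/1.4-era API) of the "add everything, single commit at the end" approach being asked about; the URL, batch size, document count and field names are made-up values:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoader {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("text", "document number " + i);
      batch.add(doc);
      if (batch.size() == 1000) {   // stream adds in batches, but do not commit yet
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();                // a single commit at the very end
  }
}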
So I whipped up a quick SolrJ client and ran it against the document
that I referenced earlier. When I retrieve the doc and just print its
field/value pairs to stdout it ends like this:
http://brockwine.com/images/output1.png
They appear to be some kind of garbage characters.
-Rupert
On Tue, Aug
On Aug 25, 2009, at 10:34 AM, Elaine Li wrote:
I am still looking for help on Chinese language search. I tried
ChineseTokenizerFactory as my analyzer, but it did not help. Only words
with white space, commas, etc. around them can be found.
Try using the StandardTokenizerFactory - it handles Ch
I am indexing my data both through DataImportHandler and per transaction from
JPA using @PostXXX listeners.
UpdateRequestProcessor looks like exactly what I need. I don't suppose
there's a scriptable subclass available in 1.4 that is configured from
schema.xml? :-)
Thanks guys!
-
Summary
===
I had about 120,000 objects with a total size of 71.2 GB; those objects are already
indexed using Lucene. The index size is about 111 GB.
I tried to use a Solr 1.4 nightly build to index the same collection. I
divided the collection across three servers; each server had 5 Solr instances (
Hi,
This is a very strange behavior, and the fact that it is caused by one
specific field, again, leads me to believe it's still a data issue. Did
you try using SolrJ to query the data as well? If the same thing happens
when using the binary protocol, then it's probably not a data issue. On
the
Hello,
We are running multiple slices in our environment. I have enabled JMX and I am
inspecting the replication handler mbean to obtain some information about the
master/slave configuration for replication. Is the replication handler mbean a
singleton? I only see one mbean for the entire serv
If you're using DataImportHandler, a custom (Java or script)
transformer could do this.
Also an UpdateProcessor could do it.
But there are no conditional copyField capabilities otherwise.
Keep in mind that pragmatically, if you're doing your own indexing
code, why not have a line like this?
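The snippet is cut off in the archive, but the kind of line being hinted at is presumably something like this hedged SolrJ fragment (field names and values are hypothetical, taken from the max_side/width/length discussion in this thread):

int width = 50, length = 90;                        // example values from your own data
SolrInputDocument doc = new SolrInputDocument();
doc.addField("width", width);
doc.addField("length", length);
doc.addField("max_side", Math.max(width, length));  // single-valued field, safe to sort on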
Is there a way to have the max_side field only in Solr ...as in a conditional
copyField or something like that?
I'd like to push as much of this into Solr as I can because the app and db that
Solr is indexing are not really the best place to add this type of
functionality.
- Origin
One problem is the IT logistics of handling the file set. At 200 million
records you have at least 20G of data in one Lucene index. It takes hours to
optimize this, and 10s of minutes to copy the optimized index around to
query servers.
Another problem is that indexing speed drops off after the ind
Thanks Yonik
So it's basically about how the field is indexed and not how it is stored.
So if I give "the elephant is an animal" and try to get back the document, I
should see the entire string; only the indexing is done on elephant and animal.
I was of the impression that when Solr loads that document it strips out
tho
: We're doing similar thing with multi-core - when a core reaches
: capacity (in our case 200 million records) we start a new core. We are
: doing this via web service call (Create web service),
this whole thread perplexes me ... while i can understand not wanting to
let an index grow without
Thanks Yonik
So the way the StopFilter works is that if I give a string like "the elephant is an
animal", then when I retrieve the document the stored value will always be
the same; only the indexing will be done on "elephant" and "animal".
I was of the impression that Solr automatically takes out that word
The text file at:
http://brockwine.com/solr.txt
Represents one of these truncated responses (this one in XML). It
starts out great, then look at the bottom, boom, game over. :)
I found this document by first running our bigger search, which breaks,
and then zeroing in on a specific broken document by
Can you copy-paste the source data indexed in this field which causes the
error?
Cheers
Avlesh
On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco wrote:
> Using wt=json also yields an invalid document. So after more
> investigation it appears that I can always "break" the response by
> pulling bac
Using wt=json also yields an invalid document. So after more
investigation it appears that I can always "break" the response by
pulling back a specific field via the "fl" parameter. If I leave off a
field then the response is valid, if I include it then Solr yields an
invalid document - a truncated
Well... yes, it's a tool that Nutch ships with. It also ships with an
example Solr schema which you can use.
Fuad Efendi wrote:
Thanks for the link, so, SolrIndex is NOT a plugin, it is an application... I
use a similar approach...
-Original Message-
From: Uri Boness
Hi,
Nutch comes with
Thanks for the link, so, SolrIndex is NOT a plugin, it is an application... I
use a similar approach...
-Original Message-
From: Uri Boness
Hi,
Nutch comes with support for Solr out of the box. I suggest you follow
the steps as described here:
http://www.lucidimagination.com/blog/2009/03/0
On Aug 25, 2009, at 11:29 AM, Fuad Efendi wrote:
"query time relevancy tuning"
It is mentioned at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
-What is it? Just GET request parameters for standard handler?
To me, this primarily refers to dismax client-side parameterization of
"query time relevancy tuning"
It is mentioned at
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
-What is it? Just GET request parameters for standard handler?
Thanks
Uri,
Thanks a lot! I don't need to do cross-language search. So Option 2
sounds better, because my corpus is very large.
I am still looking for help on Chinese language search. I tried
ChineseTokenizerFactory as my analyzer, but it did not help. Only words
with white space, commas, etc. around them c
Announcing a new Meetup for SFBay Apache Lucene/Solr Meetup!
What: SFBay Apache Lucene/Solr June Meetup
When: September 3, 2009 6:30 PM
Where: Computer History Museum, 1401 N Shoreline Blvd, Mountain View,
CA 94043
Presentations and discussions on Lucene/Solr, the Apache Open Source
Search
Hi Paras,
> Won't specifying fq=~term1+~term2 do the job?
Briefly looking at the source, it seems that the MLT handler (not the component)
uses the fq parameter, so if you use the MLT handler, it should do the job.
Koji
Paras Chopra wrote:
Hi Koji,
I have already used MLT parameters to refine the query but still I
It seems to me that this configuration actually does what you want -
queries on "title" mostly. The default search field doesn't influence a
dismax query. I would suggest you include the debugQuery=true
parameter; it will help you figure out how the matching is performed.
You can read more
Hi Koji,
I have already used MLT parameters to refine the query, but I'd still like to
exclude additional terms. I was just going through some docs online and came
across the filterQuery mechanism. Won't specifying fq=~term1+~term2 do the job?
Thanks
Paras Chopra
On Tue, Aug 25, 2009 at 4:08 PM, Koji
Hi Erik Earle,
Ahh, I read your mail too fast... Erik Hatcher's method should work.
Thanks!
Koji
Erik Hatcher wrote:
You couldn't sort on a multiValued field though.
I'd simply index a max_side field, and have the indexing client add a
single valued field with max(length,width) to it. The
Hi Paras,
> As I understand from StopFilter,
> it is a static method to exclude terms such as stop words.
Correct. As far as I know, to control which words the MLT component
chooses for generating the BooleanQuery, what you can do is
specify the following parameters:
mlt.mintf
Minimum Term Freq
You couldn't sort on a multiValued field though.
I'd simply index a max_side field, and have the indexing client add a
single valued field with max(length,width) to it. Then sort on
max_side.
Erik
On Aug 25, 2009, at 4:00 AM, Constantijn Visinescu wrote:
make a new multivalued fi
Thanks for your help.
I use the default Nutch configuration and I use solrindex to send the Nutch
results to Solr. I get results when I query, therefore Nutch works properly
(it gives a url, title, content ...).
I would like to query Solr so as to emphasize the "title" field and not the
"content" field.
Make a new multivalued field in your schema.xml, copy both width and length
into that field, and then sort on that field?
On Tue, Aug 25, 2009 at 5:40 AM, erikea...@yahoo.com wrote:
> Clever... but if more than one row adds up to the same value I may get the
> wrong order (like 50, 50 and 10, 90
On Tue, Aug 25, 2009 at 5:26 AM, darniz wrote:
>
> Continuing on this, I have a use case where I have to strip out single
> quotes for certain fields. For example, for testing I added the following
> fieldType in my schema.xml file
>
>
>
>
>
>
> and then i decla
On Tue, Aug 25, 2009 at 2:04 AM, Brian Klippel wrote:
> Hopefully, someone can tell me what is going wrong here.
>
>
>
> I have a field, "SearchObjectType", and a large number of the documents
> indexed in a given core have a value of "USER_PROFILE".
>
>
>
> When I examine the schema browser in ad