date:20101210

Shards + dismax - scoring process?

2010-12-10 Thread bbarani

Hi, We are using 4 cores (starting from core 0 to core 4) for parallel indexing process. We use shards to do distributed indexing and we also use dismax request handler when doing search. I have configured core0 as Shards master core. When I issue a search query (with dismax request handler) on

Re: Search based on images

2010-12-10 Thread Dennis Gearon

Tried one, of Perry Mason's secretary when she was young (and HOOOT), Barbara Hale. http://www.skylighters.org/ggparade/index8.html Didn't find it. 1.8 billion images indexed is probably a DROP in the bucket of what's out there. Dennis Gearon Signature Warning It is alwa

Re: SOLR Config issue

2010-12-10 Thread bbarani

I am not sure if I understand your question correctly.. Are you saying that you are not able to start Jetty server in linux box? or SOLR application is not starting up even after server has started? Thanks, Barani -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Config-

Re: Delete by query or Id very slow

2010-12-10 Thread bbarani

Hi, As Tom suggested removing optimize and passing the ids as list (instead of for loop) will surely increase the speed of deletion. We have a program which fetches complete list of ID from back end (around 10 million) and compares it with the complete list of id's present in SOLR document and d
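
The suggestion above, passing the ids as one list rather than deleting in a loop, can be sketched as follows. This is a minimal illustration and not the poster's program: the ids are hypothetical, and the sketch only builds the XML delete message (one `<delete>` element holding many `<id>` children); sending it to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_delete_message(ids):
    """Build a single Solr XML delete message covering every id in `ids`."""
    root = ET.Element("delete")
    for doc_id in ids:
        # One <id> child per document, all inside the same <delete> request.
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


# Hypothetical ids, standing in for the ids found to be stale.
stale_ids = ["doc-1", "doc-2", "doc-3"]
message = build_delete_message(stale_ids)
print(message)
```

One request carrying many ids avoids the per-request overhead of issuing a delete for each id in turn.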

Re: Concurrent DIH calls

2010-12-10 Thread bbarani

Hi, As far as I know there is no queuing mechanism in SOLR for concurrent indexing requests. It would simply ignore the concurrent request (first come first serve basis).. Solr experts, please correct me if I am wrong.. To achieve concurrency, we have implemented a queue using JMS and we send th

Re: best way to configure DIH for multiple DBS

2010-12-10 Thread bbarani

Just to give you some more clarification.. you can create multiple database config file (separate) to extract the data from different sources and add the hardcoded identifier in SOLR select query corresponding to each source. So you will have multiple data import handler committing the data in to

Re: best way to configure DIH for multiple DBS

2010-12-10 Thread bbarani

I am not sure whether I understand your question properly. If you are trying to get data from different database and dumping it to same index file then you need to specify a way to retrieve a particular data back from that XML (which actually contains the consolidated data from all Db's). For do

Re: Is it possible to assign default value for a particular record when using multivalued field type?

2010-12-10 Thread bbarani

Hi, Thanks a lot for your reply.. I am using database import handler to get the data (DIH) from DB. When I get a null data in single valued attribute the 'default' attribute seems to work perfectly fine. But seems like I need to validate the Null value (like using case when else statement) in

Re: Is it possible to assign default value for a particular record when using multivalued field type?

2010-12-10 Thread Tom Hill

Could you give us a bit more information? How are you getting this information into Solr? SolrJ? DataImportHandler? It's hard to see where the null value is getting dropped, if we don't know the path that it is coming in. I suspect that the default attribute won't do it. It's possible that you mi

Re: command line parameters for solr

2010-12-10 Thread Jack O

Tom, I would like to reach out to you directly. What's your email address? /j From: Tom Hill To: solr-user@lucene.apache.org Sent: Fri, December 10, 2010 9:43:08 PM Subject: Re: command line parameters for solr java -jar start.jar --help More docs here http://do

best way to configure DIH for multiple DBS

2010-12-10 Thread Geek Gamer

Hi group, I have multiple document types indexed on a single core solr instance and each comes from a different DB. What is the best way to configure DIH to read each document type from the corresponding DB? As far as I could find, DIH does not honour multiple document tags inside the data config. Th

Re: Search based on images

2010-12-10 Thread Maciej Lisiewski

On 2010-12-11 06:24, Dennis Gearon wrote: There is actually some image recognition search engine software somewhere I heard about. Take a picture of something, say a poster, upload it, and it will adjust for some lighting/angle/distortion, and try to find it on the web somewhere. tineye

Re: command line parameters for solr

2010-12-10 Thread Jack O

thanks Tom, really appreciate. From: Tom Hill To: solr-user@lucene.apache.org Sent: Fri, December 10, 2010 9:43:08 PM Subject: Re: command line parameters for solr java -jar start.jar --help More docs here http://docs.codehaus.org/display/JETTY/A+look+at+the

Re: command line parameters for solr

2010-12-10 Thread Tom Hill

java -jar start.jar --help More docs here http://docs.codehaus.org/display/JETTY/A+look+at+the+start.jar+mechanism Personally, I usually limit access to localhost by using whatever firewall the machine uses. Tom On Fri, Dec 10, 2010 at 7:55 PM, Jack O wrote: > Hello, > > For starting solr, fr

Re: singular/plurals

2010-12-10 Thread Tom Hill

Check out this page: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Look, in particular, for "stemming". On Fri, Dec 10, 2010 at 7:58 PM, Jack O wrote: > Hello, > > Need one more help: > > What do I have to do so that search will work for singulars and plurals ? > > > > I would real

Re: Search based on images

2010-12-10 Thread Dennis Gearon

There is actually some image recognition search engine software somewhere I heard about. Take a picture of something, say a poster, upload it, and it will adjust for some lighting/angle/distortion, and try to find it on the web somewhere. You hear about crazy stuff like this at dev camps. B

singular/plurals

2010-12-10 Thread Jack O

Hello, Need one more help: What do I have to do so that search will work for singulars and plurals ? I would really appreciate all your help. /J

command line parameters for solr

2010-12-10 Thread Jack O

Hello, For starting solr, from where do i find the list of command line parameters. java -jar start.jar blahblah... I am especially looking for how to specify my own jetty config file. I want to allow access of solr from localhost only. I would really appreciate all your help. /J

Re: Search based on images

2010-12-10 Thread Lance Norskog

Searching for an image with a painted query! Wow. On Wed, Dec 8, 2010 at 11:14 PM, Maciej Lisiewski wrote: > There is imgSeek ( http://www.imgseek.net/isk-daemon ), which while being > far from perfect (can't handle rotated images) is quite simple and has > already been added to xapian. > Paper o

Re: [Multiple] RSS Feeds at a time...

2010-12-10 Thread Lance Norskog

There is I believe no way to do this without separate copies of your script. Each 'handler=/dataimport' has to refer to a separate config file. You can make several copies and name them config1.xml, config2.xml etc. You'll have to call each one manually, so you have to manage your own thread pool.
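
The approach described above (one handler per config file, each called manually from your own thread pool) might look roughly like this. It is only a sketch under stated assumptions: the handler paths `/dataimport1` and so on are hypothetical names for handlers you would register yourself, the base URL is assumed, and `fetch` is a stand-in for whatever HTTP call you use, so the example runs without a Solr server.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed base URL
# Hypothetical handler paths, one per config file (config1.xml, config2.xml, ...).
HANDLERS = ["/dataimport1", "/dataimport2", "/dataimport3"]


def import_url(handler):
    """URL that asks one DataImportHandler instance to run a full import."""
    return SOLR_BASE + handler + "?" + urlencode({"command": "full-import"})


def run_imports(handlers, fetch, max_workers=2):
    """Call every handler through a small thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: fetch(import_url(h)), handlers))


def fake_fetch(url):
    # Stand-in for a real HTTP GET so the sketch has no network dependency.
    return "requested " + url


results = run_imports(HANDLERS, fake_fetch)
for line in results:
    print(line)
```

Keeping `fetch` as a parameter makes it easy to swap in a real HTTP client later and to cap concurrency with `max_workers`.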

Viewing query debug explanation with dismax and multicore

2010-12-10 Thread sara motahari

Hi All, I am trying to debug my queries and see how scoring is done. I have 6 cores and send the query to 6 shards and it's dismax handler (with search on various fields with different boostings). I enable debug, and view source but I'm unable to see the explanations. I'm returning ID and scor

Re: Using synonyms in combination with facets

2010-12-10 Thread Chris Hostetter

: I have a field that I use for facetting. I do not tokenize this field. It : has entries like: : : AWB artikel 2, lid 1 : AWB artikel 8:75 : Algemene Wet Bestuursrecht artikel 8:75 I assume those are names of laws, followed by page/paragraph numbers in various formats? (and evidently "lid" is

Re: search problem after using EdgeNGramFilter

2010-12-10 Thread Chris Hostetter

: I thought that I have to use NGramFilter for wildcard search. : But It was the wrong idea. : Thanks, iorixxx your confusion may be because using EdgeNGramFilter is a way to make "prefix" queries faster by precomputing the prefixes at index time instead of at query time. (trading disk space f
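
To illustrate what "precomputing the prefixes at index time" means, here is a small standalone sketch of edge n-gram generation. It is not Solr's filter implementation; the function name and the `min_gram`/`max_gram` parameters are chosen here only for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=5):
    """Return the leading substrings of `token` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# Each prefix is stored as its own indexed term, so a prefix search becomes
# an ordinary exact term lookup, at the cost of a larger index.
print(edge_ngrams("solr"))
print(edge_ngrams("lucene", min_gram=2, max_gram=4))
```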

Re: Query performance very slow even after autowarming

2010-12-10 Thread Chris Hostetter

: I made the field that is indexed with EdgeNGramFilterFactory as default : search field. All my query responses are very slow, some of them taking more : than 10seconds to respond. based on the info you've given, there's dozens of possible reasons why you might see slow queries -- it's hard to

Re: Multicore and Replication (scripts vs. java, spellchecker)

2010-12-10 Thread Chris Hostetter

: #SOLR-433 "MultiCore and SpellChecker replication" [1]. Based on the : status of this feature request I'd assume that the normal procedure of : keeping the spellchecker index up to date would be running a cron job on : each node/slave that updates the spellchecker. : Is that right? i'm not 100% cer

Re: Tips for 'staggered date facets', i.e. 'last 24 hours, last week, last month, last year' , ala google news?

2010-12-10 Thread Chris Hostetter

: As Solr's standard date faceting does not appear to meet this need, we will : need to use faceting on arbitrary queries, i.e. by passing multiple values : for facet.query correct, facet.date is really just a convenience feature over using facet.query when you want lots of consistently sized ra
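
The multiple `facet.query` values discussed above could be assembled along these lines. This is a sketch, not a tested configuration: the field name `timestamp` is hypothetical, and the four ranges are one reading of "last 24 hours, last week, last month, last year" expressed in Solr date math.

```python
from urllib.parse import urlencode

DATE_FIELD = "timestamp"  # hypothetical date field name
# Range starts in Solr date math; every range ends at NOW.
RANGE_STARTS = ["NOW-1DAY", "NOW-7DAYS", "NOW-1MONTH", "NOW-1YEAR"]


def staggered_facet_params(query):
    """Build request parameters with one facet.query per overlapping date range."""
    params = [("q", query), ("facet", "true")]
    for start in RANGE_STARTS:
        params.append(("facet.query", "%s:[%s TO NOW]" % (DATE_FIELD, start)))
    return params


params = staggered_facet_params("solr")
query_string = urlencode(params)  # repeated keys are kept, one per range
print(query_string)
```

Because the ranges overlap rather than tile the timeline, each needs its own `facet.query`, which is why evenly spaced date faceting does not fit this case.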

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király

Hi Chris, thanks for your description. I should think about this a little bit more, then I will ask some details. The main problem is that Synonyms are one kind of relations, and Thesaurus may contain 6-10 kinds of relations. And it is depending on the user, which types of relations he would like

Re: SOLR Thesaurus

2010-12-10 Thread Chris Hostetter

: The question asked, in good faith, was does solr support or extend to : implementing a thesaurus. It looks like it does not which is fine. It does Well, my point was that "thesaurus" is not a feature description. it's a data structure, and depending on your goals, the existing SynonymFilter

Tips for 'staggered date facets', i.e. 'last 24 hours, last week, last month, last year' , ala google news?

2010-12-10 Thread Will Milspec

hi all, We wish to implement date faceting with a 'sliding date range', 'last 24 hours, last week, last month, last year' . Google News currently implements such faceting when you search for a topic. As Solr's standard date faceting does not appear to meet this need, we will need to use facetin

Is it possible to assign default value for a particular record when using multivalued field type?

2010-12-10 Thread bbarani

Hi, I have a multivalued field for which some of the records have null or empty data in it. Since it's difficult to parse and match empty XML tags in SOLR output, I thought I would assign a default value for those empty data as below. But my approach is not working if this field has at least one

Separate Lines Like Google

2010-12-10 Thread Alejandro Delgadillo

Hi everybody, I'm having some troubles trying to figure out how to separate lines in a paragraph from a search result, I'm indexing PDFs but when I search the highlight terms I can not know when the first line ends and the next one begins, Is there a way to put a [...] like google o a Paragraph

Separate Lines Like Google

2010-12-10 Thread Alejandro Delgadillo

Hi everybody, I'm having some troubles trying to figure out how to separate lines in a paragraph from a search result, I'm indexing PDFs but when I search the highlight terms I can not know when the first line ends and the next one begins, Is there a way to put a [...] like google o a Paragraph

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread John Russell

Thanks a lot for the response. Unfortunately I can't check the statistics page. For some reason the solr webapp itself is only returning a directory listing. This is sometimes fixed when I restart but if I do that I'll lose the state I have now. I can get at the JMX interface. Can I check my i

Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread Tom Hill

Hi John, WeakReferences allow things to get GC'd, if there are no other references to the object referred to. My understanding is that WeakHashMaps use weak references for the Keys in the HashMap. What this means is that the keys in HashMap can be GC'd, once there are no other references to the
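
The weak-key behaviour described above can be observed in a few lines. Note the assumptions: this uses Python's `weakref.WeakKeyDictionary` as an analogue of Java's `WeakHashMap`, not the Java class itself, and the immediate cleanup shown relies on CPython's reference counting. `CacheKey` is a hypothetical key type.

```python
import gc
import weakref


class CacheKey:
    """Hypothetical key object; plain instances can be weakly referenced."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakKeyDictionary()
key = CacheKey("reader-1")
cache[key] = "cached value"
size_while_referenced = len(cache)  # the key is still strongly referenced

del key        # drop the last strong reference to the key
gc.collect()   # not strictly needed on CPython, included for clarity
size_after_release = len(cache)

print(size_while_referenced, size_after_release)
```

The entry disappears only once nothing else holds the key; as long as any other strong reference to the key survives, the entry stays in the map.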

Re: SOLR Thesaurus

2010-12-10 Thread Chris Hostetter

: My imaginative use case: : - the user enters a term and maybe he turns on a flag to get not just : the term, but all terms, which related somehow with this (usually the : synonyms and narrower terms). : - Solr first find the queried term(s) in the thesaurus, then finds the : related terms, modif

OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?

2010-12-10 Thread John Russell

I have been load testing solr 1.4.1 and have been running into OOM errors. Not out of heap but with the GC overhead limit exceeded message meaning that it didn't actually run out of heap space but just spent too much CPU time trying to make room and gave up. I got a heap dump and sent it through t

Re: multiple binary documents into a single solr document - Vignette/OpenText integration

2010-12-10 Thread briankous

Hi there, We are trying to replace opentext (V7.6) autonomy with solr so that we can index other contents, too. Due to lack of manpower and time, the management wants to buy the adapter if available. Do you know of any vendor who sells the adapter or professional service? Thank you. Brian Ko

SOLR geospatial

2010-12-10 Thread George Anthony

In looking at some of the docs support for geospatial search. I see this functionality is mostly scheduled for upcoming release 4.0 (with some playing around with backported code). I note the support for the bounding box filter, but will "bounding box" be one of the supported *data* types fo

[Multiple] RSS Feeds at a time...

2010-12-10 Thread Adam Estrada

All, Right now I am using the default DIH config that comes with the Solr examples. I update my index using the dataimport handler here http://localhost:8983/solr/admin/dataimport.jsp?handler=/dataimport This works fine but I want to be able to index more than just one feed at a time and more im

Re: Indexing documents with SOLR

2010-12-10 Thread Adam Estrada

Nutch is also a great option if you want a crawler. I have found that you will need to use the latest version of PDFBox and its dependencies for better results. Also, make sure to set JAVA_OPT to something really large so that you won't exceed your heap size. Adam On Fri, Dec 10, 2010 at 6:27

Re: SolJSON

2010-12-10 Thread alessandro.ri...@virgilio.it

Hi Lee, Thank you very much for your quick answer! It works fine! Ciao, Alessandro solr-user@lucene.apache.org -Original Message- From: lee carroll [mailto:lee.a.carr...@googlemail.com] Sent: 09 December 2010 18:46 To: solr-user@lucene.apache.org; alessandro.ri...@virgilio.it S

Re: Working Chef Cookbook for Solr

2010-12-10 Thread György Frivolt

Although I access solr from rails by sunspot, the rails server runs on heroku, so on a different machine. I prefer to have solr as stand alone server and want to tell sunspot where it can find the running solr. I am quite new to chef, but if I can I could help with writing a cookbook I would. If y

Re: SOLR Thesaurus

2010-12-10 Thread lee carroll

Two Peters (or rather a stupid English bloke who can't work out how to type fancy accents :-) Sorry Péter (took me 10 minutes to work out I could cut and paste) my reply was to the clustering post by Peter Sturge. Clustering sounds great but being able to define a thesaurus scheme exactly would be

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király

Hi Lee, according to my vision the user could decide which relationship types he would like to attach to his search, and the application would call his attention to other possibilities. So there would be no heuristic method applied, because e.g. broader terms would cause lots of misleading result

Erratic Behaviour From Filters

2010-12-10 Thread Lohrenz, Steven

Hi, I have implemented a QueryParser that queries another solr core and returns a list of values (resourceIds) that are the primary solr key on the main core. I then query the main core using the resourceId to retrieve the Lucene docId. I build up an array of ints of these doc ids. I put this a

Re: Indexing documents with SOLR

2010-12-10 Thread Tommaso Teofili

Hi Pankaj, you can find the needed documentation right here [1]. Hope this helps, Tommaso [1] : http://wiki.apache.org/solr/ExtractingRequestHandler 2010/12/10 pankaj bhatt > Hi All, > I am a newbie to SOLR and trying to integrate TIKA + SOLR. > Can anyone please guide me, how to achieve

Indexing documents with SOLR

2010-12-10 Thread pankaj bhatt

Hi All, I am a newbie to SOLR and trying to integrate TIKA + SOLR. Can anyone please guide me, how to achieve this. * My Req is:* I have a directory containing a lot of PDF,DOC's and i need to make a search within the documents. I am using SOLR web application. I just need some

Re: Working Chef Cookbook for Solr

2010-12-10 Thread Upayavira

I will likely need to create one in the next week or two. Depends upon how soon you need one. The one you've found is probably designed to work with rails apps. It assumes you have solr installed already, and adds another instance/index. I certainly need one that can do something that'll create s

Re: Multicore and Replication (scripts vs. java, spellchecker)

2010-12-10 Thread Martin Grotzke

Hi, that there's no feedback indicates that our plans/preferences are fine. Otherwise it's now a good opportunity to feed back :-) Cheers, Martin On Wed, Dec 8, 2010 at 2:48 PM, Martin Grotzke wrote: > Hi, > > we're just planning to move from our replicated single index setup to > a replicated

Re: SOLR Thesaurus

2010-12-10 Thread lee carroll

Hi Peter, That's way too clever for me :-) Discovering thesaurus relationships would be fantastic but it's not clear what heuristics you would need to use to discover broader, narrower, related documents etc. Although I might be doing the clustering down i'm sceptical about the accuracy. cheers Lee

Working Chef Cookbook for Solr

2010-12-10 Thread György Frivolt

Hi, I tried to set up Solr with Chef and so far found only the Opscode cookbook, but that one sets up only the group and the user for Solr, not the Solr engine. Does anyone know about a maintained Solr Chef cookbook? Thanks for any suggestions! Georg

Re: SOLR Thesaurus

2010-12-10 Thread Peter Sturge

Hi Lee, Perhaps Solr's clustering component might be helpful for your use case? http://wiki.apache.org/solr/ClusteringComponent On Fri, Dec 10, 2010 at 9:17 AM, lee carroll wrote: > Hi Chris, > > It's all a bit early in the morning for this mind :-) > > The question asked, in good faith, was

Re: SOLR Thesaurus

2010-12-10 Thread lee carroll

Hi Chris, It's all a bit early in the morning for this mind :-) The question asked, in good faith, was does solr support or extend to implementing a thesaurus. It looks like it does not which is fine. It does support synonyms and synonym rings which is again fine. The ski example was an illustrat

Re: dismax: limiting term match to one field

2010-12-10 Thread Jan Kurella

On 09.12.2010 21:26, ext Chris Hostetter wrote: : doc1 is name=A B category=B : doc2 is name=A category=B : : when searching for the terms "A" and "B" I want doc2 to get a higher score. : to be more specific, I don't want the term "B" to influence doc1's score in : both and, only in one of them.

Re: SOLR Thesaurus

2010-12-10 Thread Péter Király

I will also try to define the problem. In the library world there are some general and special thesauri, which reveal the relations between concepts. The relations have types as Lee described: Preferred Term (PT), Broader Terms (BT), Narrower Terms (NT), Related Terms (RT) and others. Some of these thes

55 matches
