Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-25 Thread François Schiettecatte
I had meant to also include a link to a blog post of mine that lists some useful links: http://fschiettecatte.wordpress.com/2008/07/23/language-recognition/ François On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote: > You are looking for a language identification tool. You could ch

Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-25 Thread François Schiettecatte
François, I think there is a language identification tool in the Nutch code base, otherwise I have written one in Perl which could easily be translated to Java. I won't have access to it for 10 days (I am traveling), but I am happy to send you a link to it when I get back (and anyone else who wan

Re: solr on the cloud

2011-03-25 Thread Jason Rutherglen
Dmitry, if you're planning on using HBase, you can take a look at https://issues.apache.org/jira/browse/HBASE-3529 I think we may even have a reasonable solution for reading the index [randomly] out of HDFS. Benchmarking will be implemented next. It's not production ready; suggestions are welcome.

Re: solr on the cloud

2011-03-25 Thread Dmitry Kan
Hi Otis, Thanks for elaborating on this and the link (funny!). I have quite a big dataset growing all the time. The problems that I start facing are pretty much predictable: 1. Scalability: this includes indexing time (now some days!, better hours or even minutes, if that's possible) along with ha

Re: Dismax and worddelimiterfilter

2011-03-25 Thread lboutros
You could develop your own tokenizer to extract the different forms of your ids. It is possible to extend the pattern tokenizer. Ludovic. On 25 March 2011 at 21:13, "David Yang [via Lucene]" < ml-node+2732007-1439913827-383...@n3.nabble.com> wrote: > > > Hi, > > > > I am having some really strang

Default operator

2011-03-25 Thread Brian Lamb
Hi all, I know that I can change the default operator in two ways: 1) <solrQueryParser defaultOperator="AND|OR"/> 2) Add q.op=AND I'm wondering if it is possible to change the default operator for a specific field only? For example, if I use the URL: http://localhost:8983/solr/search/?q=anima
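A minimal client-side sketch of option 2, assuming only the request-level `q.op` parameter named in the message. The helper name `build_select_url` and the example query are hypothetical. Note that `q.op` applies to the whole request, not to a single field.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, default_operator=None):
    """Build a Solr search URL, optionally overriding the default operator.

    base_url is the handler URL (for example the /solr/search/ path from
    the message); query and default_operator are illustrative inputs.
    """
    params = {"q": query}
    if default_operator is not None:
        # q.op only accepts the two boolean operators.
        if default_operator not in ("AND", "OR"):
            raise ValueError("default_operator must be 'AND' or 'OR'")
        params["q.op"] = default_operator
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/search/", "foo bar", "AND"))
```

Leaving `default_operator` unset keeps whatever default the schema configures.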

Using ExtractRequestHandler when source site has redirects

2011-03-25 Thread Daniel Sharkey
Hi all, I'm trying to execute the following command: curl "http://localhost:8983/solr/update/extract?&extractOnly=true&stream.url=http://www.nytimes.com/2011/03/26/world/middleeast/26syria.html?hp" but it doesn't work because the NYTimes url has a redirect in it. Is there any way to tell T

Re: solr on the cloud

2011-03-25 Thread Otis Gospodnetic
Hi Dan, This feels a bit like a buzzword soup with mushrooms. :) MR jobs, at least the ones in Hadoopland, are very batch oriented, so that wouldn't be very suitable for most search applications. There are some technologies like Riak that combine MR and search. Let me use this funny litt

Dismax and worddelimiterfilter

2011-03-25 Thread David Yang
Hi, I am having some really strange issues matching "N61JQ-B2". If I had a field "N61JQ-B2", and I wanted to match "N61JQ", "N61JQB2", "N61JQ-B2" and "N61JQ B2" in dismax, what fieldtype should it have? My final fallback is to use ngrams but that would impose a pretty large overhead, since the
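To make the target concrete, here is a rough sketch in plain Python of the forms such an id would need to be indexed under. It is an illustration only, not Solr's WordDelimiterFilter or any Solr API, and the helper name `id_variants` is hypothetical. It also emits the trailing part ("B2") on its own, which the question does not ask for.

```python
import re


def id_variants(raw_id):
    """Return the set of forms under which an id such as 'N61JQ-B2' could be indexed."""
    # Split on any run of non-alphanumeric characters (hyphen, space, ...).
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", raw_id) if p]
    variants = {raw_id}              # original form: N61JQ-B2
    variants.update(parts)           # individual parts: N61JQ, B2
    variants.add("".join(parts))     # concatenated: N61JQB2
    variants.add(" ".join(parts))    # space-separated: N61JQ B2
    return variants


if __name__ == "__main__":
    print(sorted(id_variants("N61JQ-B2")))
```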

Re: Synonyms: whitespace problem

2011-03-25 Thread Ahmet Arslan
> I have a problem with the synonyms. SOLR strips the > synonyms on white space. > An example: > > manchester united, reds, manunited > > My index looks like this: > > manchester > united > red > manunited > > i want this: > manchester united > red > manunited You can escape white spaces with

Re: stopwords not working in multicore setup

2011-03-25 Thread Christopher Bottaro
Ahh, thank you for the hints Martin... German stopwords without Umlaut work correctly. So I'm trying to figure out where the UTF-8 chars are getting messed up. Using the Solr admin web UI, I did a search for title:für and the xml (or json) output in the browser shows the query with the proper enc

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Erick Erickson
Right, and you can go to sharding rather than managing your multiple cores if that's warranted. Erick On Fri, Mar 25, 2011 at 1:31 PM, Brandon Waterloo wrote: > I did finally manage to deploy Solr with multiple cores but we've been > running into so many problems with permissions, index loca

RE: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Brandon Waterloo
I did finally manage to deploy Solr with multiple cores but we've been running into so many problems with permissions, index location, and other things that I (quite fortunately) convinced my boss that multiple cores are not the way to go here. I had in place a single-core system that would fil

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Markus Jelsma
You can only set properties for a lib dir that must be used in solrconfig.xml. You can use sharedLib in solr.xml though. > There are options in solr.xml that point to lib dirs. Make sure you get > them right. > > Upayavira > > On Thu, 24 Mar 2011 23:28 +0100, "Markus Jelsma" > > wrote: > > I be

Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-25 Thread Grant Ingersoll
You are looking for a language identification tool. You could check https://issues.apache.org/jira/browse/SOLR-1979 for the start of this. Otherwise, you have to roll your own or buy a third party one. On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote: > Hello Solrists, > > As it says i

Synonyms: whitespace problem

2011-03-25 Thread royr
Hello, I have a problem with the synonyms. SOLR splits the synonyms on white space. An example: manchester united, reds, manunited. My index looks like this: manchester | united | red | manunited. I want this: manchester united | red | manunited. My configuration:

Re: Create 2 index with solr

2011-03-25 Thread Dmitry Kan
Hi Amel, If you copy example dir from the solr distribution dir to example2 and change jetty's port in the example2/etc/jetty.xml to something different from the one in example/etc/jetty.xml, you'll effectively have two different servers with two separate SOLRs. Now you can independently modify y

Create 2 index with solr

2011-03-25 Thread Amel Fraisse
Hi, I am using Solr to index documents. I would like to index my documents with 2 different analyzers and generate 2 indexes, but I don't know how to generate 2 different indexes. Thank you for your help. Amel.

Re: Search in database and documents

2011-03-25 Thread Jan Høydahl
Hi again :) Please elaborate on what you are trying to do in more detail, and we'll be able to suggest a way forward. Read this page carefully: http://wiki.apache.org/solr/UsingMailingLists -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25. mars 2011, at 14.16, D

Re: Problems with creating a query that matches all the documents I want to display

2011-03-25 Thread Jan-Eirik B . Nævdal
Hi and thanks for all the answers. I finally managed to construct a fq that did what I wanted fq=(-(-obj_todate_dt:[NOW/MINUTE TO *] AND obj_todate_dt:[* TO *]) AND -(-obj_fromdate_dt:[* TO NOW/MINUTE] AND obj_fromdate_dt:[* TO *])) This gave me all documents without opening and closing time, an

Search in database and documents

2011-03-25 Thread Deepak Singh
I'm new to Solr search. I want to change the schema to search in a database and documents.

Re: solr on the cloud

2011-03-25 Thread Upayavira
On Fri, 25 Mar 2011 14:26 +0200, "Dmitry Kan" wrote: > Hi, Upayavira > > Probably I'm confusing the terms here. When I say "distributed faceting" > I'm > more into SOLR on the cloud (e.g. HDFS + MR + cloud of commodity > machines) > rather than into traditional multicore/sharded SOLR on a singl

Re: problem with snowballporterfilterfactory

2011-03-25 Thread Erick Erickson
Why are you using the stemmer at all then? This is the exact inverse of how protwords.txt is usually used. You might think about removing the stemmer from the analysis chain and using synonyms to transform your list of words. Best Erick On Fri, Mar 25, 2011 at 5:59 AM, anurag.walia wrote:

Re: Newbie wants to index XML content.

2011-03-25 Thread Erick Erickson
Solr does not index random XML documents (but see Martin's comments about DIH). Solr will index XML documents that have a specific format, however. The general form is: <add> <doc> <field name="...">value to index</field> <field name="...">value for this field</field> </doc> </add> So you can either try DIH or parse the raw XML yourself and
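For the "parse the raw XML yourself" route, a minimal sketch of producing Solr's XML update format (an `add` element wrapping `doc` elements, each with named `field` children) using the Python standard library. The helper name `build_add_xml` and the field names in the usage line are hypothetical; posting the result to an update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field_name: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            # ElementTree escapes &, < and > in text when serializing.
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml([{"id": "1", "title": "A & B"}]))
```

The field names have to match fields defined in the schema, so the values extracted from the source XML need to be mapped onto those names first.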

Deduplication questions

2011-03-25 Thread eks dev
Q1. Is it possible to pass *analyzed* content to the public abstract class Signature { public void init(SolrParams nl) { } public abstract String calculate(String content); } Q2. Method calculate() is using concatenated fields from name,features,cat. Is there any mechanism I could build "fi

Re: solr on the cloud

2011-03-25 Thread Dmitry Kan
Hi, Upayavira Probably I'm confusing the terms here. When I say "distributed faceting" I'm more into SOLR on the cloud (e.g. HDFS + MR + cloud of commodity machines) rather than into traditional multicore/sharded SOLR on a single or multiple servers with non-distributed file systems (is that what

Suggester spellcheck component and infix

2011-03-25 Thread Kai Schlamp-2
Does the suggester component of Solr also support infix search? (like .*ompute.*) Kai

Re: solr on the cloud

2011-03-25 Thread Upayavira
On Fri, 25 Mar 2011 13:44 +0200, "Dmitry Kan" wrote: > Hi Yonik, > > Oh, this is great. Is distributed faceting available in the trunk? What > is > the basic server setup needed for trying this out, is it cloud with HDFS > and > SOLR with zookeepers? > Any chance to see the related documentation

Re: solr on the cloud

2011-03-25 Thread Dmitry Kan
Hi Yonik, Oh, this is great. Is distributed faceting available in the trunk? What is the basic server setup needed for trying this out, is it cloud with HDFS and SOLR with zookeepers? Any chance to see the related documentation? :) On Fri, Mar 25, 2011 at 1:35 PM, Yonik Seeley wrote: > On Tue, Ma

Re: solr on the cloud

2011-03-25 Thread Yonik Seeley
On Tue, Mar 22, 2011 at 7:51 AM, Dmitry Kan wrote: > Basically, of high interest is checking out the Map-Reduce for distributed > faceting, is it even possible with the trunk? Solr already has distributed faceting, and it's much more performant than a map-reduce implementation would be. I've als

Re: SOLR - problems with non-english symbols when extracting HTML

2011-03-25 Thread Grijesh
Try sending the HTML data wrapped in CDATA. - Thanx: Grijesh www.gettinhahead.co.in

Re: solr on the cloud

2011-03-25 Thread Dmitry Kan
Hi Otis, Ok, thanks. No, the question about distributed faceting was in a 'guess' mode as faceting seems to be a good fit to MR. I probably need to follow the jira tickets closer for a follow-up, but was initially wondering if I missed some documentation on the topic, which didn't apparently happ

Re: Multiple Cores with Solr Cell for indexing documents

2011-03-25 Thread Upayavira
There are options in solr.xml that point to lib dirs. Make sure you get them right. Upayavira On Thu, 24 Mar 2011 23:28 +0100, "Markus Jelsma" wrote: > I believe it's example/solr/lib where it looks for shared libs in > multicore. > But, each core can have its own lib dir, usually in core/lib. Thi

Re: Detecting an empty index during start-up

2011-03-25 Thread Andrzej Bialecki
On 3/25/11 11:25 AM, David McLaughlin wrote: Thanks Chris. I dug into the SolrCore code and after reading some of the code I ended up going with core.getNewestSearcher(true) and this fixed the problem. FYI, openNew=true is not implemented and can result in an UnsupportedOperationException. For

Re: Detecting an empty index during start-up

2011-03-25 Thread David McLaughlin
Thanks Chris. I dug into the SolrCore code and after reading some of the code I ended up going with core.getNewestSearcher(true) and this fixed the problem. David On Thu, Mar 24, 2011 at 7:20 PM, Chris Hostetter wrote: > : I am not familiar with Solr internals, so the approach I wanted to take

Re: Parent-child options

2011-03-25 Thread Jan Høydahl
Otis, Impressive list of possible solutions you've come up with :) I've used Jonathan's "pattern" in several projects, but it quickly becomes unmanageable. My plan was to try to come up with a new FieldType inspired by FAST's Scope-field, which would take JSON in and be able to match hierarchica

Re: problem with snowballporterfilterfactory

2011-03-25 Thread anurag.walia
Thanks in advance. Please try to resolve the issue. Please find the screenshot of the analyzer. I have a problem with the number of characters in Term Text after snowballporterfilterfactory. I entered "Polymer" but after snowballporterfilterfactory it becomes "Polym" even though it does not exist in "protwords.t

SOLR - problems with non-english symbols when extracting HTML

2011-03-25 Thread kushti
When I send plain UTF-8 text to index (non-English text), all is OK, but with HTML I get wrong characters instead of non-ASCII symbols. So $this->solr->extractContents($url, strip_tags($code), array("literal.url"=>$url,"fmap.content"=>"body")); works well, but just $this->solr->extractContents($u

AW: stopwords not working in multicore setup

2011-03-25 Thread Martin Rödig
I have some questions about your config: Is the stopwords-de.txt in the same directory as the schema.xml? Is the title field of type text? Do you have the same problem with German stopwords without umlauts (ü,ö,ä), like the word "denn"? One problem could be that the stopwords-de.txt is not saved as UT

AW: Newbie wants to index XML content.

2011-03-25 Thread Martin Rödig
You can use the DIH (DataImportHandler) to split up and index that XML. http://wiki.apache.org/solr/DataImportHandler Kind regards, M.Sc. Dipl.-Inf. (FH) Martin Rödig SHI Elektronische Medien GmbH - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Broken stats.js

2011-03-25 Thread Mark Mandel
Relatively new to SOLR (only JUST deployed my first SOLR app to production, very proud ;o) ) I went to check out the solr/mycore/admin/stats.jsp page... and all I get is a blank page. Looking into it deeper, it seems that SOLR is returning badly encoded XML to the browser, so it's not rendering.