I had meant to also include a link to a blog post of mine that lists some
useful links:
http://fschiettecatte.wordpress.com/2008/07/23/language-recognition/
François
On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote:
> You are looking for a language identification tool. You could ch
François
I think there is a language identification tool in the Nutch code base;
otherwise, I have written one in Perl which could easily be translated to Java.
I won't have access to it for 10 days (I am traveling), but I am happy to send
you a link to it when I get back (and anyone else who wan
Dmitry,
If you're planning on using HBase, you can take a look at
https://issues.apache.org/jira/browse/HBASE-3529. I think we may even
have a reasonable solution for reading the index [randomly] out of
HDFS. Benchmarking will be implemented next. It's not production-ready;
suggestions are welcome.
Hi Otis,
Thanks for elaborating on this and the link (funny!).
I have quite a big dataset growing all the time. The problems that I am starting to
face are pretty much predictable:
1. Scalability: this includes indexing time (currently several days! Hours, or
even minutes would be better, if that's possible) along with ha
You could develop your own tokenizer to extract the different forms of your
ids.
It is possible to extend the pattern tokenizer.
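As a rough illustration of what such a tokenizer would need to emit, here is a minimal sketch in plain Java (not a Solr Tokenizer subclass; the class and method names are hypothetical). It expands an id such as "N61JQ-B2" into the original form, its alphanumeric parts, and the form with separators removed. A query for "N61JQ B2", once split on whitespace, can then match the individual parts.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class IdVariants {
    // Hypothetical helper: expands one id into the forms a user might type.
    static Set<String> variants(String id) {
        Set<String> out = new LinkedHashSet<>();
        out.add(id);                                      // original form, e.g. N61JQ-B2
        for (String part : id.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                out.add(part);                            // each part, e.g. N61JQ and B2
            }
        }
        out.add(id.replaceAll("[^A-Za-z0-9]", ""));       // separators removed, e.g. N61JQB2
        return out;
    }
}
```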
Ludovic.
On 25 March 2011 21:13, "David Yang [via Lucene]" <
ml-node+2732007-1439913827-383...@n3.nabble.com> wrote:
>
>
> Hi,
>
>
>
> I am having some really strang
Hi all,
I know that I can change the default operator in two ways:
1) <solrQueryParser defaultOperator="AND|OR"/>
2) Add q.op=AND
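Option 2 applies per request rather than per field. A minimal sketch of building such a request URL in Java, assuming a /solr/select endpoint on the default port (the class and method names are hypothetical); the query text is URL-encoded so that spaces survive:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SelectUrl {
    // Builds a query URL that overrides the default operator for this one
    // request via the q.op parameter. The base URL is an assumption.
    static String withOperator(String baseUrl, String query, String op) {
        return baseUrl
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&q.op=" + URLEncoder.encode(op, StandardCharsets.UTF_8);
    }
}
```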
I'm wondering if it is possible to change the default operator for a
specific field only? For example, if I use the URL:
http://localhost:8983/solr/search/?q=anima
Hi all,
I'm trying to execute the following command:
curl "http://localhost:8983/solr/update/extract?&extractOnly=true&stream.url=http://www.nytimes.com/2011/03/26/world/middleeast/26syria.html?hp"
but it doesn't work because the NYTimes url has a redirect in it. Is there
any way to tell T
Hi Dan,
This feels a bit like a buzzword soup with mushrooms. :)
MR jobs, at least the ones in Hadoopland, are very batch-oriented, so that
wouldn't be very suitable for most search applications. There are some
technologies like Riak that combine MR and search. Let me use this funny
litt
Hi,
I am having some really strange issues matching "N61JQ-B2". If I had a
field "N61JQ-B2", and I wanted to match "N61JQ", "N61JQB2", "N61JQ-B2"
and "N61JQ B2" in dismax, what field type should it have? My final
fallback is to use n-grams, but that would impose a pretty large overhead,
since the
> I have a problem with the synonyms. SOLR strips the
> synonyms on white space.
> An example:
>
> manchester united, reds, manunited
>
> My index looks like this:
>
> manchester
> united
> red
> manunited
>
> i want this:
> manchester united
> red
> manunited
You can escape white spaces with
Ah, thank you for the hints, Martin... German stopwords without umlauts work
correctly.
So I'm trying to figure out where the UTF-8 chars are getting messed up.
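For reference, this is what a charset mismatch typically looks like: the UTF-8 bytes of "für" decoded as ISO-8859-1 come out as "fÃ¼r". A minimal sketch that reproduces the symptom (the class name is hypothetical), which can help recognize it when comparing query terms with indexed terms:

```java
import java.nio.charset.StandardCharsets;

final class Mojibake {
    // Encodes text as UTF-8, then decodes those bytes with the wrong charset
    // (ISO-8859-1), turning each multi-byte character into several symbols.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }
}
```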
Using the Solr admin web UI, I did a search for title:für and the XML (or
JSON) output in the browser shows the query with the proper enc
Right, and you can go to sharding rather than managing your multiple
cores if that's warranted.
Erick
On Fri, Mar 25, 2011 at 1:31 PM, Brandon Waterloo
wrote:
> I did finally manage to deploy Solr with multiple cores but we've been
> running into so many problems with permissions, index loca
I did finally manage to deploy Solr with multiple cores, but we've been running
into so many problems with permissions, index location, and other things that I
(quite fortunately) convinced my boss that multiple cores are not the way to go
here. I had in place a single-core system that would fil
In solrconfig.xml you can only set the lib dir that a single core must use.
You can use sharedLib in solr.xml, though.
> There are options in solr.xml that point to lib dirs. Make sure you get
> them right.
>
> Upayavira
>
> On Thu, 24 Mar 2011 23:28 +0100, "Markus Jelsma"
>
> wrote:
> > I be
You are looking for a language identification tool. You could check
https://issues.apache.org/jira/browse/SOLR-1979 for the start of this.
Otherwise, you have to roll your own or buy a third-party one.
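If rolling your own, a common baseline is comparing character n-gram profiles. A minimal sketch, assuming you have some sample text per language to train on; the class name and the cosine-similarity scoring are illustrative choices, not taken from SOLR-1979 or from Nutch:

```java
import java.util.HashMap;
import java.util.Map;

final class TrigramLanguageGuesser {
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    // Counts overlapping character trigrams after lowercasing, collapsing
    // non-letters to single spaces and padding with a space at each end.
    static Map<String, Integer> trigrams(String text) {
        String s = " " + text.toLowerCase().replaceAll("[^\\p{L}]+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    // Adds (or extends) the trigram profile for a language from sample text.
    void train(String language, String sample) {
        Map<String, Integer> profile = profiles.computeIfAbsent(language, k -> new HashMap<>());
        trigrams(sample).forEach((g, c) -> profile.merge(g, c, Integer::sum));
    }

    // Returns the language whose profile has the highest cosine similarity.
    String guess(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    private static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * (double) b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * (double) e.getValue();
        }
        for (int v : b.values()) {
            nb += (double) v * v;
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }
}
```

Very short inputs give the profiles little to work with, so expect those to be less reliable.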
On Mar 24, 2011, at 12:24 PM, fr.jur...@voila.fr wrote:
> Hello Solrists,
>
> As it says i
Hello,
I have a problem with the synonyms. Solr splits the synonyms on whitespace.
An example:
manchester united, reds, manunited
My index looks like this:
manchester
united
red
manunited
I want this:
manchester united
red
manunited
My configuration:
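To illustrate where the split happens, here is a simplified model (not Solr's synonym parser; the class and method names are hypothetical). The rule line itself keeps "manchester united" as a single comma-separated entry, but any whitespace tokenizer applied to that entry, or to the field text, breaks it into two tokens:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SynonymLine {
    // Splits one comma-separated synonym rule into its entries;
    // each entry keeps its internal spaces ("manchester united").
    static List<String> entries(String rule) {
        List<String> out = new ArrayList<>();
        for (String e : rule.split(",")) {
            String t = e.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // A whitespace tokenizer, by contrast, breaks a multi-word entry apart.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }
}
```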
Hi Amel,
If you copy the example dir from the Solr distribution dir to example2 and
change Jetty's port in example2/etc/jetty.xml to something different
from the one in example/etc/jetty.xml, you'll effectively have two different
servers with two separate Solr instances.
Now you can independently modify y
Hi,
I am using Solr to index documents. I would like to index my documents with 2
different analyzers and generate 2 indexes.
How can I generate 2 different indexes?
Thank you for your help.
Amel.
Hi again :)
Please elaborate on what you are trying to do in more detail, and we'll be able
to suggest a way forward.
Read this page carefully: http://wiki.apache.org/solr/UsingMailingLists
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 25 March 2011, at 14.16, D
Hi, and thanks for all the answers.
I finally managed to construct an fq that did what I wanted:
fq=(-(-obj_todate_dt:[NOW/MINUTE TO *] AND obj_todate_dt:[* TO *]) AND
-(-obj_fromdate_dt:[* TO NOW/MINUTE] AND obj_fromdate_dt:[* TO *]))
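To make the logic of that fq easier to check, here is the same condition as a plain Java predicate. This is a sketch: the class name is hypothetical, null stands for a missing date, and the NOW/MINUTE rounding is ignored. A document is excluded only if it has a to-date before now or a from-date after now; the range endpoints are inclusive, so a date equal to now passes.

```java
import java.time.Instant;

final class OpenNowFilter {
    // Mirrors the fq: missing dates (null) never exclude a document.
    static boolean passes(Instant from, Instant to, Instant now) {
        // -obj_todate_dt:[NOW TO *] AND obj_todate_dt:[* TO *]  =>  to-date in the past
        boolean closedAlready = to != null && to.isBefore(now);
        // -obj_fromdate_dt:[* TO NOW] AND obj_fromdate_dt:[* TO *]  =>  from-date in the future
        boolean notYetOpen = from != null && from.isAfter(now);
        return !closedAlready && !notYetOpen;
    }
}
```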
This gave me all documents without opening and closing time, an
I'm new to Solr search. I want to change the schema to search in a database and
documents.
On Fri, 25 Mar 2011 14:26 +0200, "Dmitry Kan"
wrote:
> Hi, Upayavira
>
> Probably I'm confusing the terms here. When I say "distributed faceting"
> I'm
> more into SOLR on the cloud (e.g. HDFS + MR + cloud of commodity
> machines)
> rather than into traditional multicore/sharded SOLR on a singl
Why are you using the stemmer at all, then? This is the exact
inverse of how protwords.txt is usually used.
You might think about removing the stemmer from the analysis
chain and using synonyms to transform your list of words.
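A minimal sketch of that idea in plain Java: instead of an algorithmic stemmer, an explicit, hand-maintained mapping decides the indexed form, which is roughly the role a synonyms file would play in the analysis chain. The class name and the lowercase-then-lookup behavior are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;

final class DictionaryNormalizer {
    // Lowercases the token, then replaces it with its mapped form if one exists.
    // Words not in the mapping pass through unchanged (no stemming at all).
    static String normalize(String token, Map<String, String> mappings) {
        String lower = token.toLowerCase(Locale.ROOT);
        return mappings.getOrDefault(lower, lower);
    }
}
```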
Best
Erick
On Fri, Mar 25, 2011 at 5:59 AM, anurag.walia wrote:
Solr does not index arbitrary XML documents (but see Martin's comments
about DIH). It will, however, index XML documents that have a specific
format. The general form is:
<add>
<doc>
<field name="...">value to index</field>
<field name="...">value for this field</field>
</doc>
</add>
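If you parse the raw XML yourself, the output side can be as simple as the sketch below. It turns field/value pairs into Solr's XML update format (an <add> wrapping a <doc> with one <field name="..."> per value). The class name is hypothetical; values are XML-escaped so characters like & and < don't break the request.

```java
import java.util.Map;

final class SolrAddXml {
    // Escapes the characters that are significant in XML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds <add><doc>...</doc></add> with one <field> per map entry,
    // in the map's iteration order.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }
}
```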
So you can either try DIH or parse the raw XML yourself and
Q1. Is it possible to pass *analyzed* content to the
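import org.apache.solr.common.params.SolrParams;

public abstract class Signature {
    // Optional initialization hook; does nothing by default.
    public void init(SolrParams nl) { }
    // Subclasses compute the signature string for the given content.
    public abstract String calculate(String content);
}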
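For reference, a concrete calculate() typically just hashes the content string. A minimal, self-contained sketch using MD5 follows; it does not extend the Signature class above, so it compiles without Solr on the classpath, and the class and method names are hypothetical. In a real subclass, the body of calculate(String) would go into the overridden method.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5ContentSignature {
    // Returns the MD5 digest of the content as 32 lowercase hex characters.
    static String calculate(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Joins the chosen field values (e.g. name, features, cat) with a space
    // before hashing; the separator is an arbitrary choice.
    static String ofFields(String... fieldValues) {
        return calculate(String.join(" ", fieldValues));
    }
}
```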
Q2. The calculate() method is using concatenated fields from name, features, cat.
Is there any mechanism I could build "fi
Hi, Upayavira
Probably I'm confusing the terms here. When I say "distributed faceting" I'm
more into SOLR on the cloud (e.g. HDFS + MR + cloud of commodity machines)
rather than into traditional multicore/sharded SOLR on a single or multiple
servers with non-distributed file systems (is that what
Does Solr's Suggester component also support infix search (like
.*ompute.*)?
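To pin down the difference being asked about: a prefix lookup only returns terms that start with the typed fragment, while an infix lookup returns terms containing it anywhere. A brute-force sketch of both (the class name is hypothetical; a linear scan like this is only practical for small term lists):

```java
import java.util.List;
import java.util.stream.Collectors;

final class InfixLookup {
    // Infix match: keeps every term that contains the fragment anywhere.
    static List<String> infix(List<String> terms, String fragment) {
        return terms.stream().filter(t -> t.contains(fragment)).collect(Collectors.toList());
    }

    // Prefix match, for contrast: only terms that start with the fragment.
    static List<String> prefix(List<String> terms, String fragment) {
        return terms.stream().filter(t -> t.startsWith(fragment)).collect(Collectors.toList());
    }
}
```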
Kai
--
View this message in context:
http://lucene.472066.n3.nabble.com/Suggester-spellcheck-component-and-infix-tp2729996p2729996.html
Sent from the Solr - User mailing list archive at Nabble.com.
On Fri, 25 Mar 2011 13:44 +0200, "Dmitry Kan"
wrote:
> Hi Yonik,
>
> Oh, this is great. Is distributed faceting available in the trunk? What is
> the basic server setup needed for trying this out? Is it a cloud with HDFS
> and SOLR with ZooKeeper?
> Any chance to see the related documentation
Hi Yonik,
Oh, this is great. Is distributed faceting available in the trunk? What is
the basic server setup needed for trying this out? Is it a cloud with HDFS
and SOLR with ZooKeeper?
Any chance to see the related documentation? :)
On Fri, Mar 25, 2011 at 1:35 PM, Yonik Seeley wrote:
> On Tue, Ma
On Tue, Mar 22, 2011 at 7:51 AM, Dmitry Kan wrote:
> Basically, of high interest is checking out the Map-Reduce for distributed
> faceting, is it even possible with the trunk?
Solr already has distributed faceting, and it's much more performant
than a map-reduce implementation would be.
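The core of distributed faceting is a merge step: sum each term's counts across shards, then keep the top terms. A toy sketch of that step (the class name is hypothetical, and it assumes each shard returns complete counts; a production implementation also has to handle shards that return only their own top terms):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FacetMerge {
    // Sums per-shard facet counts and returns the top n terms by total count
    // (ties broken alphabetically so the output is deterministic).
    static List<Map.Entry<String, Long>> mergeTopN(List<Map<String, Long>> shardCounts, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> shard : shardCounts) {
            shard.forEach((term, count) -> totals.merge(term, count, Long::sum));
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted.subList(0, Math.min(n, sorted.size()));
    }
}
```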
I've als
Try sending the HTML data wrapped in a CDATA section.
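A minimal sketch of wrapping a field value in CDATA (the class name is hypothetical). CDATA only changes how markup characters are escaped; the request body still has to be sent with a correct charset. A literal "]]>" in the text would close the section early, so it is split across two adjacent sections.

```java
final class Cdata {
    // Wraps text in a CDATA section, splitting any embedded "]]>" so the
    // section is not terminated early.
    static String wrap(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```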
-
Thanx:
Grijesh
www.gettinhahead.co.in
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLR-problems-with-non-english-symbols-when-extracting-HTML-tp2729126p2729923.html
Sent from the Solr - User mailing list archive at Nabble
Hi Otis,
Ok, thanks.
No, the question about distributed faceting was a guess, as
faceting seems to be a good fit for MR. I probably need to follow the JIRA
tickets more closely for a follow-up, but I was initially wondering if I had missed some
documentation on the topic, which apparently didn't happ
There are options in solr.xml that point to lib dirs. Make sure you get
them right.
Upayavira
On Thu, 24 Mar 2011 23:28 +0100, "Markus Jelsma"
wrote:
> I believe it's example/solr/lib where it looks for shared libs in
> multicore.
> But each core can have its own lib dir, usually in core/lib. Thi
On 3/25/11 11:25 AM, David McLaughlin wrote:
> Thanks Chris. I dug into the SolrCore code and after reading some of the
> code I ended up going with core.getNewestSearcher(true) and this fixed the
> problem.
FYI, openNew=true is not implemented and can result in an
UnsupportedOperationException. For
Thanks Chris. I dug into the SolrCore code and after reading some of the
code I ended up going with core.getNewestSearcher(true) and this fixed the
problem.
David
On Thu, Mar 24, 2011 at 7:20 PM, Chris Hostetter
wrote:
> : I am not familiar with Solr internals, so the approach I wanted to take
Otis,
Impressive list of possible solutions you've come up with :)
I've used Jonathan's "pattern" in several projects, but it quickly becomes
unmanageable. My plan was to try to come up with a new FieldType inspired by
FAST's Scope-field, which would take JSON in and be able to match hierarchica
Thanks in advance.
Please try to resolve the issue
Please find attached the screenshot of the analyzer. I have a problem with the number of
characters in the Term Text after SnowballPorterFilterFactory. I entered
"Polymer", but after SnowballPorterFilterFactory it became "Polym", while it
did not exist in "protwords.t
When I send plain UTF-8 text (non-English text) to the index, everything is OK, but with
HTML I get wrong characters instead of non-ASCII symbols. So
$this->solr->extractContents($url, strip_tags($code),
array("literal.url"=>$url,"fmap.content"=>"body"));
Works well, but just
$this->solr->extractContents($u
I have some questions about your config:
Is stopwords-de.txt in the same directory as schema.xml?
Is the title field of type text?
Do you have the same problem with German stopwords without umlauts (ü, ö, ä), like
the word "denn"?
One problem could be that stopwords-de.txt is not saved as UT
You can use the DIH (DataImportHandler) to split up and index that XML.
http://wiki.apache.org/solr/DataImportHandler
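Outside of DIH, the same idea (one Solr document per repeating element) can be sketched with the JDK's DOM and XPath APIs. In this minimal sketch, the class name, the record XPath, and the rule that each child element becomes a field are all assumptions for illustration; repeated child names overwrite each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlRecordSplitter {
    // Returns one field map per node matched by recordXPath; each child
    // element of a record becomes a field (element name -> trimmed text).
    static List<Map<String, String>> split(String xml, String recordXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList records = (NodeList) xpath.evaluate(recordXPath, doc, XPathConstants.NODESET);
            List<Map<String, String>> result = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList children = records.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        fields.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
                result.add(fields);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not split XML", e);
        }
    }
}
```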
Kind regards
M.Sc. Dipl.-Inf. (FH) Martin Rödig
SHI Elektronische Medien GmbH
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Relatively new to SOLR (only JUST deployed my first SOLR app to production,
very proud ;o) )
I went to check out the solr/mycore/admin/stats.jsp page... and all I get is
a blank page.
Looking into it deeper, it seems that SOLR is returning badly encoded XML to
the browser, so it's not rendering.
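One way to narrow this down is to save the raw bytes of the stats.jsp response (for example with curl) and check whether they are valid UTF-8. A minimal sketch using the JDK's strict decoder (the class name is hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without malformed or
    // unmappable sequences (the decoder reports errors instead of replacing).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A saved response could then be checked with `Utf8Check.isValidUtf8(Files.readAllBytes(Path.of("stats.xml")))`, where the file name is only an example.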