Nutch 2.3 with MS-SQL?
Hey,

I know this is not quite the right place to ask my Nutch question, but has any of you managed to use MS-SQL as the Gora backend for Nutch 2.3?

As the website we are about to crawl is not that big, I would love to use MS-SQL. So far I haven't worked with Hadoop, and in our company we still have a working MS-SQL cluster.

Thanks
David Kumar
plus sign in request / looking for + in title
Hey,

in our title field we have a word "hd+". Now I want to run a query on exactly this word, but when I do, Solr just looks for "hd" and ignores the plus sign. I really need to search for the whole string, though.

Of course I URL-encoded the plus sign:

q=title:hd%2B

Can anyone please tell me how to search for the plus sign "+"?

thanks
David
AW: plus sign in request / looking for + in title
Hi Erick,

thanks for the reply. Analysis is a good point. I tried "hd+" as the Field Value and you were right. The ST (StandardTokenizer) step shows:

text            hd
raw_bytes       [68 64]
start           0
end             2
positionLength  1
type
position        1

So how can I prevent e.g. the ST (StandardTokenizer) from removing the plus sign? Any suggestions?

thanks

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 3 August 2017 16:46
To: solr-user
Subject: Re: plus sign in request / looking for + in title

> Take a look at your analysis chain. My bet is that the + is being
> stripped by some part of the chain. See the admin UI>>analysis page.
>
> Best,
> Erick
AW: AW: plus sign in request / looking for + in title
Hey, thanks.

Yeah, I found a way. I used my own field type for these fields. In it I'm using the WhitespaceTokenizerFactory for both query and index, and now everything works as it should. :-)

Thanks
David

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 4 August 2017 14:53
To: solr-user@lucene.apache.org
Subject: Re: AW: plus sign in request / looking for + in title

> On 8/4/2017 2:15 AM, d.ku...@technisat.de wrote:
>> So how can I prevent e.g. the ST (StandardTokenizer) from removing the plus sign?
>> Any suggestions?
>
> You can't. The standard tokenizer really isn't configurable at all.
>
> You'd need to change your analysis chain (tokenizer and filters) to
> produce the results you want.
>
> Thanks,
> Shawn
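To illustrate why switching tokenizers helps here, the following is a minimal Python sketch, not Solr code. `whitespace_tokenize` and `punctuation_dropping_tokenize` are hypothetical helper names; the second one is only a crude stand-in for a tokenizer that discards punctuation and does not reproduce the real StandardTokenizer rules.

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace only, so punctuation stays inside tokens."""
    return text.split()


def punctuation_dropping_tokenize(text):
    """Crude stand-in (assumption) for a tokenizer that drops punctuation:
    keep only runs of ASCII letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


if __name__ == "__main__":
    sample = "hd+ receiver"
    # The plus sign survives whitespace splitting ...
    print(whitespace_tokenize(sample))            # ['hd+', 'receiver']
    # ... but is lost when punctuation is discarded.
    print(punctuation_dropping_tokenize(sample))  # ['hd', 'receiver']
```

Because the same splitting is applied at index and query time, the term "hd+" is stored and searched in the same form, which matches the behaviour described in the message above.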
Re: AW: plus sign in request / looking for + in title
Hey,

that is a good point. What is the best way to do the filtering?

As for the plus sign in the request: we URL-encode the whole request.

Thanks
David

> On 04.08.2017 at 17:34, Erick Erickson wrote:
>
> Glad to hear it. Two things:
>
> 1> you might have to do some additional filtering when using
> WhitespaceTokenizer. It, well, splits on whitespace so things like
> punctuation will come through as part of the token. So "My dog has
> fleas." (note the period after fleas) would have the period included
> in the token "fleas.".
>
> 2> getting the plus sign through URL encoding and the parser may be
> fun, you may have to escape it to keep it from being interpreted as an
> operator
>
> Best,
> Erick
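Both of Erick's points can be sketched in a few lines of Python. This is an illustration under assumptions, not the thread's actual setup: `trim_trailing_punctuation`, `escape_term` and `build_query_string` are hypothetical helpers, the set of trimmed characters is an arbitrary choice, and the list of special characters is hand-written from the ones the standard query parser documents as operators. Inside Solr itself, the trimming would more likely be done by a filter in the field type's analysis chain (for example a pattern-replace filter) than in client code.

```python
from urllib.parse import urlencode

# Characters treated as special by the Lucene/Solr standard query parser
# (hand-written list, an assumption for this sketch).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def trim_trailing_punctuation(token):
    """Point 1: drop sentence punctuation at the end of a token,
    but leave a trailing '+' alone so "hd+" stays intact."""
    return token.rstrip(".,;:!?")


def escape_term(term):
    """Point 2: backslash-escape special characters so '+' is read
    as a literal character and not as an operator."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def build_query_string(field, term):
    """Escape the term first, then URL-encode the whole q parameter."""
    return urlencode({"q": f"{field}:{escape_term(term)}"})


if __name__ == "__main__":
    print(trim_trailing_punctuation("fleas."))  # fleas
    print(trim_trailing_punctuation("hd+"))     # hd+
    print(escape_term("hd+"))                   # hd\+
    print(build_query_string("title", "hd+"))   # q=title%3Ahd%5C%2B
```

The order matters: the backslash escape is for the query parser, the percent-encoding is for HTTP, so the term is escaped before the request is encoded.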