Nutch 2.3 with MS-SQL?

2017-07-25 Thread d.ku...@technisat.de
Hey,

I know this is not quite the right place to ask my Nutch question, but has any
of you managed to use MS-SQL as the Gora backend for Nutch 2.3?
As the website we are about to crawl is not that big, I would love to use
MS-SQL. So far I haven't worked with Hadoop, and in our company we still have
a working MS-SQL cluster.


Thanks

David Kumar




plus sign in request / looking for + in title

2017-08-03 Thread d.ku...@technisat.de
Hey,

in our titles we have the word "hd+".
Now I want to query for exactly this word, but when I do, Solr just looks for
"hd" and ignores the plus sign. But I really need to search for the whole
string.
Of course I URL-encoded the plus sign:

q=title:hd%2B

Can anyone please tell me how to search for the plus sign "+"?

thanks

David


AW: plus sign in request / looking for + in title

2017-08-04 Thread d.ku...@technisat.de
Hey Erick,

thanks for the reply.
Analysis is a good point. I tried "hd+" as the Field Value and you were right:

ST
text hd
raw_bytes  [68 64]
start 0
end 2
positionLength 1 
type 
position 1

So how can I prevent e.g. the ST (StandardTokenizer) from removing the plus
sign? Any suggestions?

thanks


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 3 August 2017 16:46
To: solr-user
Subject: Re: plus sign in request / looking for + in title

Take a look at your analysis chain. My bet is that the + is being stripped by 
some part of the chain. See the admin UI>>analysis page.

Best,
Erick

On Aug 3, 2017 06:47, "d.ku...@technisat.de"  wrote:

> Hey,
>
> in our titles we have the word "hd+".
> Now I want to query for exactly this word, but when I do, Solr just
> looks for "hd" and ignores the plus sign. But I really need to search
> for the whole string. Of course I URL-encoded the plus sign:
>
> q=title:hd%2B
>
> Can anyone please tell me how to search for the plus sign "+"?
>
> thanks
>
> David
>


AW: AW: plus sign in request / looking for + in title

2017-08-04 Thread d.ku...@technisat.de
Hey, thanks.

Yeah, I found a way.
I used my own fieldType for these fields. In it I'm using the
WhitespaceTokenizerFactory for both query and index, and now everything works
as it should.

:-)

Thanks

David
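
To illustrate why switching tokenizers helps, here is a rough Python sketch. It
is not Solr code: it assumes the behaviour seen on the analysis page can be
approximated by keeping only runs of letters and digits, and that whitespace
tokenization can be approximated by a plain split. The function names and the
sample title are made up for the illustration.

```python
import re


def word_char_tokens(text):
    """Rough stand-in for a tokenizer that drops punctuation.

    Assumption: only runs of ASCII letters and digits survive, which matches
    what the analysis page showed for "hd+" but is not Solr's real algorithm.
    """
    return re.findall(r"[A-Za-z0-9]+", text)


def whitespace_tokens(text):
    """Rough stand-in for whitespace tokenization.

    Splits on whitespace only, so punctuation stays attached to the token.
    """
    return text.split()


title = "hd+ receiver"  # hypothetical title value
print(word_char_tokens(title))   # ['hd', 'receiver']
print(whitespace_tokens(title))  # ['hd+', 'receiver']
```

The same property that keeps "hd+" intact also keeps other punctuation:
whitespace_tokens("My dog has fleas.") ends with the token "fleas.", so some
extra filtering may be needed depending on the data.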

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 4 August 2017 14:53
To: solr-user@lucene.apache.org
Subject: Re: AW: plus sign in request / looking for + in title

On 8/4/2017 2:15 AM, d.ku...@technisat.de wrote:
> So how can I prevent e.g. the ST (StandardTokenizer) from removing the plus
> sign? Any suggestions?

You can't.  The standard tokenizer really isn't configurable at all.

You'd need to change your analysis chain (tokenizer and filters) to produce the 
results you want.

Thanks,
Shawn



Re: AW: plus sign in request / looking for + in title

2017-08-04 Thread d.ku...@technisat.de
Hey,

that is a good point. What is the best way to do the filtering? As for the plus
sign in the request, we URL-encode the whole request.



Thanks
David
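
A minimal Python sketch of the two layers involved in sending the plus sign:
first a backslash escape so the query parser does not read "+" as an operator,
then URL encoding of the whole parameter. The helper names are hypothetical,
and the character set is an assumption based on the characters the standard
Lucene/Solr query syntax treats as special; check it against the parser
actually in use.

```python
from urllib.parse import urlencode

# Assumption: characters with special meaning in the standard query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_query_term(term):
    """Put a backslash in front of every special character in the term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def build_query_string(field, term):
    """Build the URL-encoded q parameter for a single field:term query."""
    query = f"{field}:{escape_query_term(term)}"
    return urlencode({"q": query})


print(escape_query_term("hd+"))            # hd\+
print(build_query_string("title", "hd+"))  # q=title%3Ahd%5C%2B
```

Escaping only affects how the query is parsed. The field's analysis chain
still has to keep the "+" in the indexed token for the query to match.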


 

> On 04.08.2017 at 17:34, Erick Erickson wrote:
> 
> Glad to hear it. Two things:
> 
> 1> you might have to do some additional filtering when using
> WhitespaceTokenizer. It, well, splits on whitespace so things like
> punctuation will come through as part of the token. So "My dog has
> fleas." (note the period after fleas) would have the period included
> in the token "fleas.".
> 
> 2> getting the plus sign through URL encoding and the parser may be
> fun, you may have to escape it to keep it from being interpreted as an
> operator
> 
> Best,
> Erick
> 
> On Fri, Aug 4, 2017 at 5:55 AM, d.ku...@technisat.de
>  wrote:
>> Hey, thanks.
>> 
>> Yeah, I found a way.
>> I used my own fieldType for these fields. In it I'm using the
>> WhitespaceTokenizerFactory for both query and index, and now everything
>> works as it should.
>> 
>> :-)
>> 
>> Thanks
>> 
>> David
>> 
>> -----Original Message-----
>> From: Shawn Heisey [mailto:apa...@elyograg.org]
>> Sent: Friday, 4 August 2017 14:53
>> To: solr-user@lucene.apache.org
>> Subject: Re: AW: plus sign in request / looking for + in title
>> 
>>> On 8/4/2017 2:15 AM, d.ku...@technisat.de wrote:
>>> So how can I prevent e.g. the ST (StandardTokenizer) from removing the plus
>>> sign? Any suggestions?
>> 
>> You can't.  The standard tokenizer really isn't configurable at all.
>> 
>> You'd need to change your analysis chain (tokenizer and filters) to produce 
>> the results you want.
>> 
>> Thanks,
>> Shawn
>>