Keepwords & DataImportHandler

2014-12-17 Thread leostro
Hi all,

This is my first question in this forum :D
I'm trying to import documents using a DataImportHandler.


 



My first test is to import some documents that have only a title. I want to
index this field as a standard text type.
I'd also like to use a KeepWordFilter to search these title fields for words
I've listed in a file, and to put the words it finds into a second field
named "tags", so I added this row to my configuration:



For example, assume I've specified "Nintendo" in my keepwords file. If I
import a document with the title "Nintendo NES", I'd like the resulting
document to have two fields:

title --> "Nintendo NES"
tags --> "Nintendo"

At the moment both fields have the same value: "Nintendo NES".
If I use the Analysis section of the Solr admin panel, my schema.xml
configuration seems to be correct:

ST    nintendo  NES
KWF   nintendo
LCF   nintendo
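
For reference, the chain shown above can be mimicked outside Solr. This is only a rough Python sketch under assumptions: the tokenizer is approximated with a regex, the keepword match is assumed to be case-insensitive, and all function names are made up for illustration. It is not Solr's implementation.

```python
import re

def standard_tokenize(text):
    # Rough stand-in for the tokenizer step (ST): split on non-word characters.
    return re.findall(r"\w+", text)

def keep_words(tokens, keepwords, ignore_case=True):
    # Stand-in for the keep-word step (KWF): drop every token not in the list.
    if ignore_case:
        allowed = {w.lower() for w in keepwords}
        return [t for t in tokens if t.lower() in allowed]
    allowed = set(keepwords)
    return [t for t in tokens if t in allowed]

def lowercase(tokens):
    # Stand-in for the lowercase step (LCF).
    return [t.lower() for t in tokens]

def analyze_tags(text, keepwords):
    # Same order as in the analysis output: ST, then KWF, then LCF.
    return lowercase(keep_words(standard_tokenize(text), keepwords))

print(analyze_tags("Nintendo NES", ["Nintendo"]))  # ['nintendo']
```

Note that this only describes the terms that would be indexed for "tags"; it says nothing about the value stored and returned for the field.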

So... I'd like to understand whether I'm using DataImportHandler the wrong
way, or whether I need to change something to get the behaviour I described
above.

Hope someone can help me,
regards
leo

 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Keepwords-DataImportHandler-tp4174699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Keepwords & DataImportHandler

2014-12-18 Thread leostro
Hi all,

you are right: my configuration was correct, but I wasn't using facets to
see the result.
I was mixing up indexing and analysis.
Now I'm working on the next problem: keepwords that consist of more than
one word... but that is another problem :)
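
To look at the indexed terms rather than the stored values, a facet request on the "tags" field is one option. A minimal Python sketch follows; the host, port and core name are assumptions and need to be adjusted, while facet, facet.field and rows are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port and core name to your setup.
BASE = "http://localhost:8983/solr/collection1/select"

def facet_url(field, query="*:*"):
    params = {
        "q": query,
        "rows": 0,             # no documents needed, only the facet counts
        "wt": "json",
        "facet": "true",
        "facet.field": field,  # counts come from the indexed terms of this field
    }
    return BASE + "?" + urlencode(params)

print(facet_url("tags"))
```

With a keep-word analyzer on "tags", the facet list should contain only the kept terms, even though the field value returned with each document is still the full title.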

thank you all, your hints were precious!
Leo




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Keepwords-DataImportHandler-tp4174699p4174941.html


set keepword file to be used based on a field value

2014-12-21 Thread leostro
Hi all,

I ran some tests and I'm now able to use keepwords to find some common
"brand" names in the docs in my index.
My docs have only two fields:
- a title
- a categoryId
The tests so far were based on videogame-related rows, so I have a
keepwords.txt containing words like "nintendo", "playstation" and so on.

Now I want to instruct Solr to use a different keepword file depending on
the categoryId value.

So, for docs with categoryId=1 (videogames) I'd like to use keepwords1.txt
(the one with nintendo, playstation, etc.), but if categoryId=2 (cars) I'd
like to use keepwords2.txt (another file containing bmw, audi, etc.).

Can someone help me?
Regards

Leo




--
View this message in context: 
http://lucene.472066.n3.nabble.com/set-keepword-file-to-be-used-based-on-a-field-value-tp4175474.html


Re: set keepword file to be used based on a field value

2014-12-22 Thread leostro
Hi Tomoko,

I understand your first reply and the first hint (one field for each
categoryId).
I thought this was a relatively "common" scenario.
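
The "one field for each categoryId" hint could look roughly like this on the client side. This is a sketch only: the field names tags_cat1 and tags_cat2 are hypothetical, and each would still need its own field type in schema.xml whose keep-word filter points at the matching keepwords file.

```python
# Hypothetical mapping from categoryId to a category-specific tags field.
TAG_FIELD_BY_CATEGORY = {1: "tags_cat1", 2: "tags_cat2"}

def build_doc(title, category_id):
    doc = {"title": title, "categoryId": category_id}
    field = TAG_FIELD_BY_CATEGORY.get(category_id)
    if field is not None:
        # Send the title only to the field whose analyzer uses the
        # keepwords file for this category.
        doc[field] = title
    return doc

print(build_doc("Nintendo NES", 1))
print(build_doc("Audi A4", 2))
```

The cost of this design is one extra field (and field type) per category, which is why it only scales to a modest number of categories.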

I'm interested in understanding the option you describe in your second
reply.

> you can tell "which keepwords set (file) should be used" to custom filter by
> adding special prefix (or something like) to the target field value.
> but of course it makes indexing/querying process slightly complicated.

Are you talking about adding a suffix (like _CAT1) to the value of the field
I'm going to analyze with keepwords? So that if the value ends with "_CAT1"
the filter uses "keepwords1.txt" as its keepword file, and so on?

I can't see how to achieve this. Have you seen any configuration examples?
I didn't find anything :(
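
The selection logic being asked about could be sketched as follows. This is plain Python to illustrate the idea only: a real Solr filter would be a custom Java token filter, and the marker format and the two keepword sets here are assumptions standing in for keepwords1.txt and keepwords2.txt.

```python
import re

# Hypothetical keepword sets, standing in for keepwords1.txt and keepwords2.txt.
KEEPWORDS = {
    "_CAT1": {"nintendo", "playstation"},
    "_CAT2": {"bmw", "audi"},
}

def keep_by_suffix(value):
    # Pick the keepword set from the marker appended to the field value,
    # strip the marker, then keep only the listed words.
    for marker, words in KEEPWORDS.items():
        if value.endswith(marker):
            text = value[: -len(marker)]
            tokens = re.findall(r"\w+", text.lower())
            return [t for t in tokens if t in words]
    return []  # no known marker: keep nothing

print(keep_by_suffix("Nintendo NES_CAT1"))  # ['nintendo']
print(keep_by_suffix("Audi A4_CAT2"))       # ['audi']
```

The marker has to be appended at index time (for example during the import), which is the extra complication the quoted reply warns about.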

Thanks
Leo




--
View this message in context: 
http://lucene.472066.n3.nabble.com/different-keepword-files-for-differents-field-values-tp4175474p4175528.html


edismax and mm: strange behaviour

2015-01-10 Thread leostro
Hi all

I'm studying Solr in order to use it on my website.
I've imported the db and I'm running some tests with edismax and mm.
I'm searching for documents containing "xbox 360".

- If I specify mm=100% (I get the same result setting the default operator to
"AND"), Solr gives me 5 documents:
[http://localhost:8983/solr/Collection1/select?wt=json&indent=true&q=xbox
360&rows=10&defType=edismax&mm=100%&fq=countryid:53]

Ps3 xbox 360 ps4 xbox one
Ps4, xbox one, ps3, xbox 360
Xbox one - scambio con 360 e supplemento
Playstation 3, 2, xbox 360, one
Xbox 360 ps3 xbox one ps4 notebook netbook i-phone

OK, that's right: they all contain both "xbox" and "360".

- BUT the same URL with mm=0 gives me a lot of matching documents
(559, the same number I get with the default operator set to "OR")
[http://localhost:8983/solr/Collection1/select?wt=json&indent=true&q=xbox
360&rows=10&defType=edismax&mm=0%&fq=countryid:53]

But surprisingly, the results include a lot of documents containing both
"xbox" and "360" that aren't returned by the first query.
Here are the first 20 rows:

Ps3 xbox 360 ps4 xbox one
Ps4, xbox one, ps3, xbox 360
Xbox one - scambio con 360 e supplemento
Playstation 3, 2, xbox 360, one
Xbox 360 ps3 xbox one ps4 notebook netbook i-phone
Xbox 360
cerco xbox 360
Xbox 360
Xbox 360
xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
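
One thing worth ruling out with URLs like the two above is encoding, since they contain a raw space and a bare "%". Whether that plays any role here is not established in this thread; the Python sketch below only shows how the same parameters can be percent-encoded so that mm reaches Solr as the literal value 100%. The function name is made up for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/Collection1/select"

def select_url(q, mm, country_id=53):
    # Same parameters as in the URLs above, in the same order.
    params = [
        ("wt", "json"),
        ("indent", "true"),
        ("q", q),
        ("rows", 10),
        ("defType", "edismax"),
        ("mm", mm),
        ("fq", f"countryid:{country_id}"),
    ]
    # urlencode escapes the space as "+" and the "%" as "%25".
    return BASE + "?" + urlencode(params)

print(select_url("xbox 360", "100%"))
print(select_url("xbox 360", "0%"))
```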

How can this happen?
Hope someone can help me.

Regards,
Leo



--
View this message in context: 
http://lucene.472066.n3.nabble.com/edismax-and-mm-strange-behaviour-tp4178532.html


Re: edismax and mm: strange behaviour

2015-01-10 Thread leostro
Hi Ahmet,

I don't specify any qf in this query.
Reading here
(http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29)
it seems that mm refers to the text provided as "q" in the query string. Am
I wrong?
From that doc, my expectation is that if I specify a q value with 2 words
("xbox" and "one") with mm=100%, Solr should return EVERY document that
contains BOTH words in the title (the only field I told it to search in).
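
That expectation can be written down as a small calculation. The sketch below is a simplified reading of the mm rules on the wiki page (plain integers and percentages only, with percentages rounded down); it is not Solr's code and it ignores conditional specs such as 2<-25%.

```python
def min_should_match(spec, optional_clauses):
    """Simplified: how many of the optional clauses must match."""
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # int() truncates toward zero, so the percentage is rounded down
        # in magnitude for both positive and negative values.
        calc = int(optional_clauses * percent / 100)
    else:
        calc = int(spec)
    # A negative value means "this many clauses may be missing".
    required = optional_clauses + calc if calc < 0 else calc
    return max(0, min(optional_clauses, required))

print(min_should_match("100%", 2))  # 2: both words are required
print(min_should_match("0%", 2))    # 0: no clause is required
```

Under this reading, a two-word query with mm=100% requires both words, so every document matching both should be returned.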

Maybe I'm missing something...
Thanks
Leo



--
View this message in context: 
http://lucene.472066.n3.nabble.com/edismax-and-mm-strange-behaviour-tp4178532p4178595.html


Re: edismax and mm: strange behaviour

2015-01-10 Thread leostro
Hi Jack,

I read the documentation here:
http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

My question is quite simple; maybe it's not clear because of my poor English.
As explained in my reply to Ahmet, my goal is to get ALL and ONLY the
documents that contain the two words I specified in "q" ("xbox" and "360").

I can't understand why, if I specify "q=xbox 360" and "mm=100%", some
documents containing both words are not returned.

regards
leo





--
View this message in context: 
http://lucene.472066.n3.nabble.com/edismax-and-mm-strange-behaviour-tp4178532p4178603.html