Keepwords & DataImportHandler
Hi all,

This is my first question in this forum :D

I'm trying to import documents using a DataImportHandler. As a first test I'm importing documents that have only a title, and I want to index this field as a standard text type value. I'd also like to use a KeepWordFilter to search these title fields for words I specified in a file, and to put the words it finds into a second field named "tags", so I added this row in my configuration:

For example, assume I've specified "Nintendo" in my keepwords file. If I import a document with the title "Nintendo NES", I'd like the resulting document to have two fields:

title --> "Nintendo NES"
tags --> "Nintendo"

At the moment I get two fields with the same value: "Nintendo NES".

If I use the Analysis section in the Solr admin panel, it seems that my schema.xml is configured correctly:

ST nintendo NES
KWF nintendo
LCF nintendo

So I'd like to understand whether I'm using DataImportHandler in the wrong way, or whether I need to change something to obtain the behaviour I described above.

Hope someone can help me,
regards
leo

--
View this message in context: http://lucene.472066.n3.nabble.com/Keepwords-DataImportHandler-tp4174699.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Keepwords & DataImportHandler
Hi all,

you are right: my configuration was correct, but I wasn't using facets to see the result. I was confusing indexing with analysis.

Now I'm working on the next problem: keepwords that consist of more than one word... but that is another problem :)

Thank you all, your hints were invaluable!
Leo

--
View this message in context: http://lucene.472066.n3.nabble.com/Keepwords-DataImportHandler-tp4174699p4174941.html
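The distinction that resolved this thread can be sketched in a few lines of Python. This is only a rough model, not Solr code: it assumes "tags" receives the same raw text as "title", and the `analyze` and `index_document` helpers are hypothetical stand-ins for the tokenizer, keep-word and lowercase chain described above. The point is that the stored value returned in search results stays untouched, while only the indexed terms (the ones facets are built from) pass through the filter.

```python
import re


def analyze(text, keepwords):
    """Rough stand-in for a tokenizer -> keep-word -> lowercase chain."""
    tokens = re.findall(r"\w+", text)          # simplified tokenizer
    keep = {w.lower() for w in keepwords}      # case-insensitive keep list
    kept = [t for t in tokens if t.lower() in keep]
    return [t.lower() for t in kept]           # lowercase filter


def index_document(title, keepwords):
    """Model one document: stored values are raw, indexed terms are analyzed."""
    return {
        # What a search response shows for each field (assumption: tags
        # is fed the same raw text as title).
        "stored": {"title": title, "tags": title},
        # What faceting on the tags field would count.
        "indexed_terms": {"tags": analyze(title, keepwords)},
    }


doc = index_document("Nintendo NES", ["nintendo"])
print(doc["stored"]["tags"])          # raw text, unchanged by the filter
print(doc["indexed_terms"]["tags"])   # only the kept, lowercased terms
```

Seen this way, two stored fields with the same value are consistent with a working keep-word filter; faceting on "tags" is what exposes the filtered terms.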
set keepword file to be used based on a field value
Hi all,

I made some tests and I'm now able to use keepwords to find some common "brand" names in the docs I have in my index. My docs have only two fields:

- a title
- a categoryId

The tests I've made so far were based on videogame-related rows, so I have a keepwords.txt containing words like "nintendo", "playstation" and so on.

Now I want to instruct Solr to use a different keepword file depending on the categoryid value of each document. For docs with categoryid=1 (videogames) I'd like to use keepwords1.txt (the one with nintendo, playstation, etc.), but if categoryid=2 (cars) I'd like to use keepwords2.txt (another file containing bmw, audi, etc.).

Can someone help me?

Regards
Leo

--
View this message in context: http://lucene.472066.n3.nabble.com/set-keepword-file-to-be-used-based-on-a-field-value-tp4175474.html
Re: set keepword file to be used based on a field value
Hi Tomoko,

I understand your first reply and the first hint (one field for each categoryid). I thought this was a relatively "common" scenario.

I'm interested in understanding the option you describe in the second reply.

> you can tell "which keepwords set (file) should be used" to custom filter
> by
> adding special prefix (or something like) to the target field value.
> but of course it makes indexing/querying process slightly complicated.

Are you talking about adding a postfix (like _CAT1) to the value of the field I'm going to analyze with keepwords? So that if the value ends with "_CAT1", the filter uses "keepwords1.txt" as the keepword file, and so on?

I can't see how to achieve this. Have you seen any configuration examples? I didn't find anything :(

Thanks
Leo

--
View this message in context: http://lucene.472066.n3.nabble.com/different-keepword-files-for-differents-field-values-tp4175474p4175528.html
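The postfix idea asked about here can be modelled outside Solr to see what such a custom filter would have to do. The sketch below is plain Python, not a Solr filter or configuration: the `_CAT1`/`_CAT2` markers follow the example in the message, while `KEEPWORDS_BY_MARKER`, `split_marker` and `tags_for` are hypothetical names, and the in-memory sets stand in for keepwords1.txt and keepwords2.txt.

```python
# Hypothetical mapping from value postfix to keepword set; the sets stand
# in for the contents of keepwords1.txt and keepwords2.txt.
KEEPWORDS_BY_MARKER = {
    "_CAT1": {"nintendo", "playstation"},
    "_CAT2": {"bmw", "audi"},
}


def split_marker(value):
    """Return (text, marker); marker is None if no known postfix is present."""
    for marker in KEEPWORDS_BY_MARKER:
        if value.endswith(marker):
            return value[: -len(marker)], marker
    return value, None


def tags_for(value):
    """Keep only the words allowed for the category encoded in the postfix."""
    text, marker = split_marker(value)
    if marker is None:
        return []  # no category marker: nothing is kept
    keep = KEEPWORDS_BY_MARKER[marker]
    return [tok for tok in text.lower().split() if tok in keep]


print(tags_for("Nintendo NES_CAT1"))
print(tags_for("Audi A3_CAT2"))
```

As the quoted reply notes, the cost of this approach is that the marker has to be appended at index time and stripped before any other processing, which complicates both indexing and querying.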
edismax and mm: strange behaviour
Hi all,

I'm studying Solr in order to use it in my website. I've imported the db and I'm running some tests with edismax and mm.

I'm searching for documents containing "xbox 360".

- If I specify mm=100% (I get the same result by setting the default operator to "AND"), Solr gives me 5 documents:

[http://localhost:8983/solr/Collection1/select?wt=json&indent=true&q=xbox 360&rows=10&defType=edismax&mm=100%&fq=countryid:53]

Ps3 xbox 360 ps4 xbox one
Ps4, xbox one, ps3, xbox 360
Xbox one - scambio con 360 e supplemento
Playstation 3, 2, xbox 360, one
Xbox 360 ps3 xbox one ps4 notebook netbook i-phone

OK, that's right: they all contain both "xbox" and "360".

- BUT the same URL with mm=0 gives me a lot of matching documents (559, the same number I get with the default operator set to "OR"):

[http://localhost:8983/solr/Collection1/select?wt=json&indent=true&q=xbox 360&rows=10&defType=edismax&mm=0%&fq=countryid:53]

Surprisingly, the results include a lot of documents containing both "xbox" and "360" that aren't returned by the first query. Here are the first 20 rows:

Ps3 xbox 360 ps4 xbox one
Ps4, xbox one, ps3, xbox 360
Xbox one - scambio con 360 e supplemento
Playstation 3, 2, xbox 360, one
Xbox 360 ps3 xbox one ps4 notebook netbook i-phone
Xbox 360
cerco xbox 360
Xbox 360
Xbox 360
xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360
Xbox 360

How can this happen?

Hope someone can help me.

Regards,
Leo

--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-and-mm-strange-behaviour-tp4178532.html
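When comparing two requests like these, it can help to build the URLs programmatically so that the space in q and the percent sign in mm are always encoded the same way. The sketch below uses only Python's standard library; the host, collection and parameter values are the ones from the URLs above, and `build_select_url` is a hypothetical helper name. It is not meant as a diagnosis of the behaviour described, only as a way to rule out encoding differences between the two requests.

```python
from urllib.parse import urlencode


def build_select_url(base, q, mm, country_id, rows=10):
    """Build an edismax select URL with every parameter percent-encoded."""
    params = [
        ("wt", "json"),
        ("indent", "true"),
        ("q", q),
        ("rows", rows),
        ("defType", "edismax"),
        ("mm", mm),                       # "%" is encoded as %25
        ("fq", f"countryid:{country_id}"),
    ]
    return f"{base}/select?{urlencode(params)}"


base = "http://localhost:8983/solr/Collection1"
url_all = build_select_url(base, "xbox 360", "100%", 53)
url_any = build_select_url(base, "xbox 360", "0%", 53)
print(url_all)
print(url_any)
```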
Re: edismax and mm: strange behaviour
Hi Ahmet,

I don't specify any qf in this query. Reading here (http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29), it seems that mm refers to the text provided as "q" in the query string. Am I wrong?

From the doc above, my expectation is that if I specify a q value with 2 words ("xbox" and "one") and mm=100%, Solr should return EVERY document that contains BOTH words in the title (the only field I told it to search in).

Maybe I'm missing something...

Thanks
Leo

--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-and-mm-strange-behaviour-tp4178532p4178595.html
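The expectation stated here can be written down as a small model. This is a simplification in plain Python, not Solr's implementation: it handles only non-negative percentage values of mm, rounds the percentage down, assumes every query word is an optional clause against a single title field, and uses a naive lowercase word tokenizer in place of the field's real analyzer. The function names are hypothetical.

```python
import re


def required_clauses(num_optional, mm_percent):
    """Clauses that must match for a non-negative percentage mm (rounded down)."""
    return int(num_optional * mm_percent / 100)


def terms(text):
    """Naive stand-in for the field analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def matches(title, query, mm_percent):
    """True if enough query words occur in the title under this model."""
    query_terms = terms(query)
    hits = len(query_terms & terms(title))
    need = required_clauses(len(query_terms), mm_percent)
    # With nothing required, at least one optional clause still has to match.
    return hits >= max(need, 1)


print(matches("Ps4, xbox one, ps3, xbox 360", "xbox 360", 100))
print(matches("Xbox one", "xbox 360", 100))
print(matches("Xbox one", "xbox 360", 0))
```

Under this model a title such as "Xbox 360" matches at mm=100%, which is the behaviour expected in the message, so a discrepancy like the one reported would have to come from something the model leaves out, such as which fields are actually searched or how they are analyzed.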
Re: edismax and mm: strange behaviour
Hi Jack,

I read the documentation here: http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29

My question is quite simple; maybe it wasn't clear because of my poor English. As I explained in my response to Ahmet, my goal is to get ALL and ONLY the documents that contain the two words I specified in "q" ("xbox" and "360").

I can't understand why, when I specify "q=xbox 360" and "mm=100%", some documents containing both words are not returned.

regards
leo

--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-and-mm-strange-behaviour-tp4178532p4178603.html