Re: Sorting TEXT Field problems :-(
Kraus, Ralf | pixelhouse GmbH wrote: Hello, Query: {wt=json&rows=30&json.nl=map&start=0&sort=RezeptName+asc} Result: Doppeldecker Eiersalat, Curry - Eiersalat, Eiersalat. Why is my second "Curry..." after "Doppeldecker"??? RezeptName is a normal "text" field defined as: positionIncrementGap="100"> language="German" /> language="German" /> Greets -Ralf- Hi, normally you would define at least one special field for sorting: http://wiki.apache.org/solr/CommonQueryParameters#head-9f40612b42721ed9e1979a4a80d68f4f8524e9b4 You have to use a single-valued, indexed but untokenized field (or use a tokenizer that produces only one token). You might also look at the field "alphaOnlySort" in the example schema. Tom
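A sketch of the kind of untokenized sort field Tom describes, following the "alphaOnlySort" pattern from the example schema (the copyField source `RezeptName` comes from the question; the sort field name `RezeptNameSort` is made up for illustration):

```xml
<!-- schema.xml: a single-token field type usable for sorting -->
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="RezeptNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="RezeptName" dest="RezeptNameSort"/>
```

Queries would then sort with `sort=RezeptNameSort+asc` instead of sorting on the tokenized text field, which sorts by an arbitrary indexed token rather than the whole value.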
Re: Customizing SOLR-236 field collapsing
Is adding QueryComponent to your SearchComponents an option? When combined with the CollapseComponent this approach would return both the collapsed and the complete result set. i.e.: collapse, query, facet, mlt, highlight. Thomas Marc Sturlese wrote: Hey there, I have been testing the latest adjacent field collapsing patch in trunk and it seems to work perfectly. I am trying to modify its behavior but don't know exactly how to do it. What I would like to do is, instead of collapsing the results, send them to the end of the result queue. Apparently it is not possible to do that due to the way it is implemented. I have noticed that you get a DocSet of the ids that "survived" the collapsing and that match the query and filters (collapseFilterDocSet = collapseFilter.getDocSet(); you get it in CollapseComponent.java). Once that is done, the search is executed again, this time with the DocSet obtained before passed as a filter: DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(), collapseFilterDocSet == null ? rb.getFilters() : null, collapseFilterDocSet, rb.getSortSpec().getSort(), rb.getSortSpec().getOffset(), rb.getSortSpec().getCount(), rb.getFieldFlags()); The result of this search gives you the final result (with the correct offset and start). I have thought about saving the collapsed docs in another DocSet and doing something with them afterwards... but I don't know how to manage it. Any clue about how I could reach this goal? Thanks in advance
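Thomas's suggestion amounts to registering both components on the handler in solrconfig.xml. A sketch, assuming the component list he names (collapse, query, facet, mlt, highlight) and a 1.3-style SearchHandler; the component name `collapse` and handler name `/collapse` are illustrative, not taken from the patch verbatim:

```xml
<!-- solrconfig.xml: run CollapseComponent alongside the standard components -->
<searchComponent name="collapse"
                 class="org.apache.solr.handler.component.CollapseComponent"/>

<requestHandler name="/collapse" class="solr.SearchHandler">
  <arr name="components">
    <str>collapse</str>
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
  </arr>
</requestHandler>
```

With both components in the chain, one request can carry the collapsed result set from CollapseComponent and the complete result set from QueryComponent.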
Re: grouping response docs together
Hello Matt, the patch should work with trunk and, after a small fix, with 1.3 too (see my comment in SOLR-236). I just made a successful build to be sure. Do you see any error messages? Thomas Matt Mitchell wrote: Thanks guys. I looked at the dedup stuff, but the documents I'm adding aren't really duplicates. They're very similar, but different. I checked out the field collapsing feature patch, applied the patch but can't get it to build successfully. Will this patch work with a nightly build? Thanks! On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: Matt - you may also want to detect near duplicates at index time: http://wiki.apache.org/solr/Deduplication Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 6:52:48 PM Subject: grouping response docs together Is there a built-in mechanism for grouping similar documents together in the response? I'd like to make it look like there is only one document with multiple "hits". Matt
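The index-time near-duplicate detection Otis links to is configured as an update processor chain. A sketch along the lines of the Deduplication wiki page, using TextProfileSignature for fuzzy matching; the `fields` list and the `signature` field name are assumptions and would need to match your schema:

```xml
<!-- solrconfig.xml: compute a fuzzy signature per document at index time -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,content</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With `overwriteDupes` set to false, near-duplicates are kept but share a signature value, which could then be used to group "similar but different" documents at query time.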
Re: Search combination?
I assume you are using the StandardRequestHandler, so this should work: http://192.168.105.54:8983/solr/itas?q=size:7* AND extension:pdf Also have a look at the following links: http://wiki.apache.org/solr/SolrQuerySyntax http://lucene.apache.org/java/2_4_1/queryparsersyntax.html Thomas Jörg Agatz wrote: Hi users... I have a problem... I want to search for: http://192.168.105.54:8983/solr/itas?q=size:7*&extension:db I mean I search for all documents that have size 7* and extension:pdf, but it doesn't work; I get other files with extension doc or db. What is happening here? Jörg
Re: Converting German special characters / umlaute
Try the SnowballPorterFilterFactory described here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters You should use the German2 variant that converts ä and ae to a, ö and oe to o and so on. More details: http://snowball.tartarus.org/algorithms/german2/stemmer.html Every document in solr can have any number of fields which might have the same source but have different field types and are therefore handled differently (stored as-is, analyzed in different ways...). Use copyField in your schema.xml to feed your data into multiple fields. During searching you decide which fields you want to search on (usually the analyzed ones) and which you retrieve when getting the document back. Tom Matthias Eireiner wrote: Dear list, I have two questions regarding German special characters, or umlaute. Is there an analyzer which automatically converts all German special characters to their dissected form, such as ü to ue and ä to ae, etc.? I would also like the search to always run against the dissected data, but when the results are returned, the initial, unmodified data should be returned. Does the Lucene GermanAnalyzer do this job? I ran across it, but I could not figure out from the documentation whether it does the job or not. Thanks a lot in advance. Matthias
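The copyField setup Tom describes could look like this; the field names `name` and `name_de` are made up for illustration, and the analyzer chain is a minimal sketch around the German2 Snowball filter he recommends:

```xml
<!-- schema.xml: search the stemmed copy, return the raw stored value -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>

<field name="name"    type="string"  indexed="false" stored="true"/>
<field name="name_de" type="text_de" indexed="true"  stored="false"/>
<copyField source="name" dest="name_de"/>
```

Searches go against `name_de` (analyzed), while the response returns the stored `name` field with the original, unmodified text.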
Re: Different search results for (german) singular/plural searches - looking for a solution
in short: use stemming. Try the SnowballPorterFilterFactory with German2 as language attribute first and use synonyms for compound words, i.e. "Herrenhose" => "Herren", "Hose". By using stemming you will maybe get some "interesting" results, but it is much better living with them than having no or far fewer results ;o) Find more info on the Snowball stemming algorithms here: http://snowball.tartarus.org/ Also have a look at the StopFilterFactory; here is a sample stopword list for the German language: http://snowball.tartarus.org/algorithms/german/stop.txt Good luck, Tom Martin Grotzke wrote: Hello, with our application we have the issue that we get different results for singular and plural searches (German language). E.g. for "hose" we get 1,000 documents back, but for "hosen" we get 10,000 docs. The same applies to "t-shirt" and "t-shirts", or e.g. "hut" and "hüte" - lots of cases :) This is absolutely correct according to the schema.xml, as right now we do not have any stemming or synonyms included. Now we want to have similar search results for these singular/plural searches. I'm thinking about a solution for this and want to ask what your experiences are. Basically I see two options: stemming and the usage of synonyms. Are there others? My concern with stemming is that it might produce unexpected results, so that docs are found that do not match the query from the user's point of view. I assume that this needs a lot of testing with different data. The issue with synonyms is that we would have to create a file containing all synonyms, so we would have to figure out all cases, in contrast to a solution that is based on an algorithm. The advantage of this approach is IMHO that it is very predictable which results will be returned for a certain query. Some background information: Our documents contain products (id, name, brand, category, producttype, description, color etc). 
The singular/plural issue basically applies to the fields name, category and producttype, so we would like to restrict the solution to these fields. Do you have suggestions on how to handle this? Thanx in advance for sharing your experiences, cheers, Martin
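Tom's combined approach (stemming plus synonyms for compounds, plus stopwords) could be wired into one analyzer chain like this; the field type name and file names are illustrative, and the single synonym line is the example from his mail:

```xml
<!-- synonyms.txt (one entry, from the example above):
     Herrenhose => Herren, Hose
-->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_de.txt"
            ignoreCase="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>
```

With this chain, "hose" and "hosen" stem to the same token, and "Herrenhose" additionally indexes the parts "Herren" and "Hose" via the synonym entry.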
Re: Different search results for (german) singular/plural searches - looking for a solution
Martin Grotzke wrote: Try the SnowballPorterFilterFactory with German2 as language attribute first and use synonyms for compound words, i.e. "Herrenhose" => "Herren", "Hose". so you use a combined approach? Yes, we define the relevant parts of compound words (keywords only) as synonyms and feed them into a special field that is used for searching and for the product index. I hope there will be a filter that can split compound words some time in the future... By using stemming you will maybe get some "interesting" results, but it is much better living with them than having no or far fewer results ;o) Do you have an example of what "interesting" results I can expect, just to get an idea? Find more info on the Snowball stemming algorithms here: http://snowball.tartarus.org/ Thanx! I also had a look at this site already, but what is missing is a demo where one can see what's happening. I think I'll play a little with stemming to get a feeling for it. I think the Snowball stemmer is very good, so I have no practical example for you. Maybe this is of value to see what happens: http://snowball.tartarus.org/algorithms/german/diffs.txt If you have mixed languages in your content, which sometimes happens in product data, you might run into some trouble. Also have a look at the StopFilterFactory; here is a sample stopword list for the German language: http://snowball.tartarus.org/algorithms/german/stop.txt Our application handles products, do you think such stopwords are useful in this scenario too? I wouldn't expect a user to search for "keine hose" or something like that :) I have seen much worse queries, so you never know ;o) Think of a query like this: "Hose in blau für Herren" You will definitely want to remove "in" and "für" during searching, and removing them during indexing reduces index size. Maybe you will even get better scores when only relevant terms are used. You should optimize the stopword list based on your data. Regards, Tom
Re: AW: Converting German special characters / umlaute
Hi, the SnowballPorterFilterFactory is a complete stemmer that transforms words to their basic form (laufen -> lauf, läufer -> lauf). One part of that process is replacing language-specific special characters. So the SnowballPorterFilterFactory does what you wanted (besides other things). I mentioned it because it is a very good start when using solr, especially when dealing with documents in languages other than English. Tom Matthias Eireiner wrote: Dear list, it has been some time, but here is what I did. I had a look at Thomas Traeger's tip to use the SnowballPorterFilterFactory, which does not actually do the job. Its purpose is to convert regular ASCII into special characters, and I want it the other way around, such that all special characters are converted to regular ASCII. The tip of J.J. Larrea, to use the PatternReplaceFilterFactory, solved the problem. And as Chris Hostetter noted, stored fields always return the initial value, which made the second part of my question obsolete. Thanks a lot for your help! best Matthias -----Original Message----- From: Thomas Traeger [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 23:44 To: solr-user@lucene.apache.org Subject: Re: Converting German special characters / umlaute Try the SnowballPorterFilterFactory described here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters You should use the German2 variant that converts ä and ae to a, ö and oe to o and so on. More details: http://snowball.tartarus.org/algorithms/german2/stemmer.html Every document in solr can have any number of fields which might have the same source but have different field types and are therefore handled differently (stored as-is, analyzed in different ways...). Use copyField in your schema.xml to feed your data into multiple fields. During searching you decide which fields you want to search on (usually the analyzed ones) and which you retrieve when getting the document back. 
Tom Matthias Eireiner wrote: Dear list, I have two questions regarding German special characters, or umlaute. Is there an analyzer which automatically converts all German special characters to their dissected form, such as ü to ue and ä to ae, etc.? I would also like the search to always run against the dissected data, but when the results are returned, the initial, unmodified data should be returned. Does the Lucene GermanAnalyzer do this job? I ran across it, but I could not figure out from the documentation whether it does the job or not. Thanks a lot in advance. Matthias
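The PatternReplaceFilterFactory approach that solved Matthias's problem could look roughly like this, one filter per character mapping; the tokenizer choice and the exact set of mappings are assumptions, not taken from his mail:

```xml
<!-- schema.xml: map umlauts to their ASCII digraphs at analysis time -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="ä" replacement="ae" replace="all"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="ö" replacement="oe" replace="all"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="ü" replacement="ue" replace="all"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="ß" replacement="ss" replace="all"/>
</analyzer>
```

Since this runs only at analysis time, stored fields still return the original text with umlauts intact, which covers the second part of the question.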
Re: All facet.fields for a given facet.query?
Hi, I'm also just at the point where I think I need a wildcard facet.field parameter (or someone points out another solution to my problem...). Here is my situation: I have many products of different types with totally different attributes. There are currently more than 300 attributes. I use dynamic fields to import the attributes into solr without having to define a specific field for each attribute. Now when I make a query I would like to get back all facet.fields that are relevant for that query. I think it would be really nice if I didn't have to know which facet fields are there at query time, but instead could just import attributes into dynamic fields, get the relevant facets back and decide in the frontend which to display and how... What do the experts think about this? Tom
Re: All facet.fields for a given facet.query?
first: sorry for the bad quoting, I found your message in the archive only... I have many products of different types with totally different attributes. There are currently more than 300 attributes. I use dynamic fields to import the attributes into solr without having to define a specific field for each attribute. Now when I make a query I would like to get back all facet.fields that are relevant for that query. I think it would be really nice, if I don't have to know which facet fields are there at query time, instead just import attributes into ... The problem is there may be lots of fields you index but don't want to facet on (full text search fields) and Solr has no easy way of knowing the difference between those and the fields you think it makes sense to facet on ... even if a field does make sense to facet on some of the time, that doesn't mean it makes sense all of the time (as you say "when I make a query I would like to get back all facet.fields that are relevant for that query") ... Solr has no way of knowing which fields make sense for that query unless it tries them all (which can be very expensive) or you tell it. I solve this problem by having metadata stored in my index which tells my custom request handler which fields to facet on for each category ... but I've also got several thousand categories. If you've got fewer than 100 categories, you could easily enumerate them all with default facet.field params in your solrconfig using separate requesthandler instances. What do the experts think about this? you may want to read up on the past discussion of this in SOLR-247 ... in particular note the link to the mail archive where there was additional discussion about it as well. Where we left things is that it might make sense to support true globbing in both fl and facet.field, so you can use naming conventions and say things like facet.field=facet_* but that in general trying to do something like facet.field=* would be a very bad idea even if it was supported. 
http://issues.apache.org/jira/browse/SOLR-247 To make it clear: I agree that it doesn't make sense to facet on all available fields, I only want faceting on those 300 attributes that are stored together with the fields for full-text searches. A product/document typically has only 5-10 attributes. I'd like to decide at index time which attributes of a product might be of interest for faceting and store those in dynamic fields with the attribute name and some kind of prefix or suffix to identify them at query time as facet.fields. Exactly the naming convention you mentioned. I will have a closer look at SOLR-247 and the supplied patch, seems like a good starting point to dig deeper into solr... :o) Tom
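The naming convention plus Hostetter's per-handler defaults could be sketched like this; the dynamic field prefix `facet_*`, the handler name, and the example attribute fields are all made up for illustration:

```xml
<!-- schema.xml: catch-all definition for facet attributes -->
<dynamicField name="facet_*" type="string" indexed="true" stored="false"/>

<!-- solrconfig.xml: a handler with default facet fields for one category,
     as suggested for setups with fewer than ~100 categories -->
<requestHandler name="/shoes" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <bool name="facet">true</bool>
    <str name="facet.field">facet_color</str>
    <str name="facet.field">facet_size</str>
    <int name="facet.mincount">1</int>
  </lst>
</requestHandler>
```

At index time each product's attributes go into `facet_<attributename>` fields; at query time the client (or a per-category handler like the one above) names the `facet_*` fields it wants counts for.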
Re: All facet.fields for a given facet.query?
Martin Grotzke wrote: On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote: Hi, I'm also just at the point where I think I need a wildcard facet.field parameter (or someone points out another solution to my problem...). Here is my situation: I have many products of different types with totally different attributes. There are currently more than 300 attributes. I use dynamic fields to import the attributes into solr without having to define a specific field for each attribute. Now when I make a query I would like to get back all facet.fields that are relevant for that query. I think it would be really nice if I didn't have to know which facet fields are there at query time, but instead could just import attributes into dynamic fields, get the relevant facets back and decide in the frontend which to display and how... Do you really need all facets in the frontend? no, only the subset with matches for the current query. Would it be a solution to have a facet ranking in the field definitions, and then decide at query time which fields to facet on? This would need an additional query parameter like facet.query.count. E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz and you have fields prop1 with facet-ranking 100, prop2 with facet-ranking 90, prop3 with facet-ranking 80, prop4 with facet-ranking 70, prop5 with facet-ranking 60, then you might decide not to facet on prop1 and prop2 as you already have a constraint on them, but to facet on prop3 and prop4 if facet.query.count is 2. Just thinking about that... :) Cheers, Martin One step after the other ;o), the ranking of the facets will be another problem I have to solve; counts of facets and matching documents will be a starting point. Another idea is to use the score of the documents returned by the query to compute a score for the facet.field... Tom
Re: All facet.fields for a given facet.query?
Chris Hostetter wrote: : to make it clear, i agree that it doesn't make sense faceting on all : available fields, I only want faceting on those 300 attributes that are : stored together with the fields for full text searches. A : product/document has typically only 5-10 attributes. : : I like to decide at index time which attributes of a product might be of : interest for faceting and store those in dynamic fields with the : attribute-name and some kind of prefix or suffix to identify them at : query time as facet.fields. Exactly the naming convention you mentioned. but if the facet fields are different for every document, and they use a simple dynamicField prefix (like "facet_*" for example) how do you know at query time which fields to facet on? ... even if wildcards worked in facet.field, using facet.field=facet_* would require solr to compute the counts for *every* field matching that pattern to find out which ones have positive counts for the current result set -- there may only be 5 that actually matter, but it's got to try all 300 of them to find out which 5 that is. I just made a quick test by building a facet query with those 300 attributes. I realized that the facets are built from the whole index, not the subset returned by the initial query. Therefore I get a large number of empty facets which I simply ignore. In my case the query time is somewhat higher (of course) but it is still only a few milliseconds. (wow!!!) :o) So at this stage of my investigation, and in my use case, I don't have to worry about performance even if I use the system in a way that uses more resources than necessary. this is where custom request handlers that understand the faceting "metadata" for your documents become key ... so you can say "when querying across the entire collection, only try to facet on category and manufacturer. if the search is constrained by category, then look up other facet options to offer based on that category name from our metadata store, etc..." 
Faceting on manufacturers and categories first and then presenting the corresponding facets might work under some circumstances, but in my case the category structure is quite deep, detailed and complex. So when the user enters a query I'd like to say to him: "Look, here are the manufacturers and categories with matches for your query, choose one if you want, but maybe there is another one with products that better fit your needs, or products that you didn't even know about. So maybe you would like to filter based on the following attributes." Something like this ;o) The point is that I currently don't want to know too much about the data, I just want to feed it into solr, follow some conventions and get the most out of it as quickly as possible. Optimizations can and will take place at a later time. I hope to find some time to dig into solr's SimpleFacets this weekend. Regards, Tom
Re: All facet.fields for a given facet.query?
: Faceting on manufacturers and categories first and than present the : corresponding facets might be used under some circumstances, but in my case : the category structure is quite deep, detailed and complex. So when : the user enters a query I like to say to him "Look, here are the : manufacturers and categories with matches to your query, choose one if you : want, but maybe there is another one with products that better fit your : needs or products that you didn't even know about. So maybe you like to : filter based on the following attributes." Something like this ;o) categories was just an example i used because it tends to be a common use case ... my point is that the decision about which facet qualifies for the "maybe there is another one with products that better fit your needs" part of the response either requires computing counts for *every* facet constraint and then looking at them to see which ones provide good distribution, or knowing something more about your metadata (ie: having stats that show the majority of people who search on the word "canon" want to facet on "megapixels") ... this is where custom biz logic comes in, because in a lot of situations computing counts for every possible facet may not be practical (even if the syntax to request it was easier) I get your point, but how do I know where additional metadata is of value if not by just trying? Currently I start with a generic approach to see what really is in the product data, to get an overview of the quality of the data and what happens when I use the data in the new search solution. Then I can decide what to do to optimize the system, i.e. try to reduce the number of attributes, get marketing to split somewhat generic attributes into more detailed ones, find a way to display the most relevant facets for the current query first, and so on... Tom