Re: quickie: do facetfields use same cached items in field cache as FQ-param?
As a related question: is there a way to inspect the queries currently in the filterCache?

Britske wrote:
>
> Yeah, I meant filter-cache, thanks.
> It seemed that the particular field (cityname) was using a
> keywordtokenizer (which doesn't show at the front), which is why I missed
> it, I guess :-S. This means the field is tokenized, so the termEnums
> approach is used. This results in about 10.000 inserts on
> facet.field=cityname on a cold searcher, which matches the number of
> different terms in that field. At least that explains that.
>
> So if I understand correctly: if I use that same field in an FQ-param,
> say fq=cityname:amsterdam, and amsterdam is a term of field cityname,
> then the FQ-query can utilize the cached "query" cityname:amsterdam,
> which was already put into the filterCache by the query
> facet.field=cityname, right?
>
> The thing that I still don't get is why my filterCache starts to have
> evictions although its size is 16.000+. This shouldn't be happening
> given that:
> I currently only use faceting on cityname and use this field in FQ as
> well, as already said (which adds +/- 10.000 items to the filterCache,
> given that faceting and fq share cached items).
> Moreover, I use FQ on about 2500 different fields (named _ddp*), but
> only check to see if a value exists, by doing for example:
> fq=_ddp1234:[* TO *]. I sometimes add them together like so:
> fq=_ddp1234:[* TO *]&fq=_ddp2345:[* TO *], but never like so:
> fq=_ddp1234:[* TO *] +_ddp2345:[* TO *]. This means each _ddp*-field is
> only added once to the filterCache.
>
> Wouldn't this mean that at a maximum I can only have 12.500 items in
> the filterCache?
> Still, my filterCache starts to have evictions although its size is
> 16.000+.
>
> What am I missing here?
> Geert-Jan
>
>
> hossman wrote:
>>
>> : ..fq=country:france
>> :
>> : do these queries share cached items in the fieldcache? (in this
>> : example: country:france) or do they somehow live as separate
>> : entities in the cache? The latter would explain my fieldcache
>> : having evictions at the moment.
>>
>> FieldCache can't have evictions. It's a really low-level "cache" where
>> the key is a field name and the value is an array containing a value
>> for every document (you can think of it as an inverted-inverted-index)
>> that Lucene maintains directly. Items are never removed; they just get
>> garbage collected when the IndexReader is no longer used. It's
>> primarily for sorting, but the SimpleFacets code also leverages it for
>> facets in some cases -- Solr has no way of showing you what's in the
>> FieldCache, because Lucene doesn't expose any inspection APIs to query
>> it (it's a heisenberg cache .. once you ask if something is in it,
>> it's in it).
>>
>> Are you referring to the "filterCache"?
>>
>> filterCache contains records whose key is a "query" and whose value is
>> a DocSet (an unordered collection of all docs matching a query) ...
>> it's used whenever you use an "fq" param, for faceting on some fields
>> (when the TermEnum method is used, a filterCache entry is added for
>> each term tested), and even for some sorted queries if the
>> <useFilterForSortedQuery> config option is set to true.
>>
>> The easiest way to know whether your faceting is using the FieldCache
>> is to start your server cold (no newSearcher warming) and then send it
>> a simple query with a single facet.field.
>> Depending on the query, you might get 0 or 1 entries in the
>> filterCache if SimpleFacets is using the FieldCache -- but if it's
>> using the TermEnums, and generating a DocSet per term, you'll see
>> *lots* of inserts into the filterCache.
>>
>>
>> -Hoss
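For reference, the limits that drive those evictions are set per-cache in solrconfig.xml. A minimal sketch of the filterCache entry, with illustrative sizes rather than Geert-Jan's actual settings:

<!-- solrconfig.xml: the filterCache maps a query to a DocSet; entries come
     from fq params and from TermEnum faceting. Once "size" entries exist,
     the LRU cache starts evicting. -->
<filterCache
  class="solr.LRUCache"
  size="16384"
  initialSize="4096"
  autowarmCount="4096"/>

One thing worth remembering when doing the eviction math: on commit, up to autowarmCount entries are copied in from the old searcher, so a freshly warmed cache already starts partly full.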
Re: Opensearch XSLT
There is a request handler in 1.2 for Atom. That might be close.

OpenSearch was a pretty poor design and is dead now, so I wouldn't expect any new implementations. Google's GData (based on Atom) reuses the few useful OpenSearch elements needed for things like number of hits. Solr's Atom support really should include those.

http://code.google.com/apis/gdata/reference.html

wunder

On 10/12/07 4:59 AM, "Robert Young" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Does anyone know of an XSLT out there for transforming Solr's default
> output to Opensearch format? Our current frontend system uses
> opensearch so we would like to integrate it like this.
>
> Cheers
> Rob
Re: Opensearch XSLT
There is a file ${SOLR_HOME}/conf/xslt/example_rss.xsl which is easily modified to transform Solr's output to OpenSearch. Works great, though fixing the date format is a hassle: the supported, searchable Solr date format is not the OpenSearch standard.

On 10/12/07, Robert Young <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Does anyone know of an XSLT out there for transforming Solr's default
> output to Opensearch format? Our current frontend system uses
> opensearch so we would like to integrate it like this.
>
> Cheers
> Rob
>
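A minimal sketch of that kind of stylesheet, assuming Solr's stock XML response format; the opensearch:* element names come from the OpenSearch 1.1 response spec, and the per-document fields pulled out here (id, name) are placeholders for whatever your schema actually uses:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Transforms Solr's default XML response into an RSS 2.0 feed carrying
     OpenSearch result metadata (total hits, offset, page size). -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/response">
    <rss version="2.0">
      <channel>
        <title>Solr search results</title>
        <!-- numFound/start come straight off Solr's result element -->
        <opensearch:totalResults>
          <xsl:value-of select="result/@numFound"/>
        </opensearch:totalResults>
        <opensearch:startIndex>
          <xsl:value-of select="result/@start"/>
        </opensearch:startIndex>
        <opensearch:itemsPerPage>
          <xsl:value-of select="lst[@name='responseHeader']/lst[@name='params']/str[@name='rows']"/>
        </opensearch:itemsPerPage>
        <xsl:apply-templates select="result/doc"/>
      </channel>
    </rss>
  </xsl:template>

  <!-- One RSS item per matching document. -->
  <xsl:template match="doc">
    <item>
      <title><xsl:value-of select="str[@name='name']"/></title>
      <guid isPermaLink="false"><xsl:value-of select="str[@name='id']"/></guid>
    </item>
  </xsl:template>

</xsl:stylesheet>

Dropped into conf/xslt/ as, say, opensearch.xsl, it can be applied with wt=xslt&tr=opensearch.xsl on the request.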
Opensearch XSLT
Hi,

Does anyone know of an XSLT out there for transforming Solr's default output to Opensearch format? Our current frontend system uses opensearch so we would like to integrate it like this.

Cheers
Rob
solr not finding all results
I've found an odd situation where solr is not returning all of the documents that I think it should. A search for "Geckoplp4-M" returns 3 documents, but I know that there are at least 100 documents with that string.

Here is an example query for that phrase and the result set:
http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="fl">comments,id</str>
      <str name="q">Geckoplp4-M</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <arr name="comments"><str>Geckoplp4-M</str></arr>
      <str name="id">m2816500</str>
    </doc>
    <doc>
      <arr name="comments"><str>toptrax recordings. Same tracks.</str>
           <str>Geckoplp4-M</str></arr>
      <str name="id">m2816544</str>
    </doc>
    <doc>
      <arr name="comments"><str>Geckoplp4-M</str></arr>
      <str name="id">m2815903</str>
    </doc>
  </result>
</response>

Now here's an example of a search for two documents that I know have that string, but were not returned in the previous search:
http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&version=2.2&start=0&rows=10&indent=on&fl=id,comments

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="rows">10</str>
      <str name="start">0</str>
      <str name="indent">on</str>
      <str name="fl">id,comments</str>
      <str name="q">id:m2816615 OR id:m2816611</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <arr name="comments"><str>Geckoplp4-M</str></arr>
      <str name="id">m2816611</str>
    </doc>
    <doc>
      <arr name="comments"><str>Geckoplp4-M</str></arr>
      <str name="id">m2816615</str>
    </doc>
  </result>
</response>

Here is the definition for the "comments" field:

<field name="comments" type="text" indexed="true" stored="true"
       multiValued="true"/>

And here is the definition for a "text" field:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Any ideas? Am I doing something wrong?

thanks,
Kevin
Solr, operating systems and globalization
We discovered and verified an issue in SolrSharp whereby indexing and searching can be disrupted if Windows globalization & culture settings are not taken into consideration. For example, European cultures format numeric and date values differently from US/English cultures.

The resolution for this type of issue is to explicitly control the culture settings so that index data formatting works. However, SolrSharp's culture settings should be consistent with the Solr server instance's culture.

This leads to my question: does Solr control its culture & language settings through the various language components that can be incorporated, or does the underlying OS have a say in how that data is treated? Some education on this would be greatly appreciated.

cheers,
jeff r.
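On the date side specifically: Solr's DateField only accepts the locale-independent ISO 8601 form (UTC, with a trailing Z), and numeric fields expect plain decimal text, so a client like SolrSharp has to format values with an invariant culture no matter what the OS is set to. A small sketch of a document as Solr expects to receive it; the field names are illustrative:

<add>
  <doc>
    <field name="id">example-1</field>
    <!-- ISO 8601, UTC, trailing Z: the canonical DateField format -->
    <field name="created">2007-10-12T16:30:00Z</field>
    <!-- decimal point, no grouping separators, whatever the OS culture -->
    <field name="price">1234.56</field>
  </doc>
</add>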
dismax downweighting
I have a dismax query where I want to boost the appearance of the query terms in certain fields but "downboost" their appearance in others. The practical use is a field containing a lot of descriptive text, and then a product-name field where products might be named after a descriptive word. Consider an electric toothbrush called "The Fast And Thorough Toothbrush" -- if a user searches for "fast toothbrush" I'd like to down-weight that particular model's advantage. The name of the product might also be in the descriptive text.

I tried "-name description" but solr didn't like that. Any better ideas?

--
http://variogr.am/
Re: solr not finding all results
Sorry, I've figured out my own problem. There was a problem with the way I create the xml document for indexing that was causing some of the "comments" fields to not be listed correctly in the default search field, "content".

On 10/12/07, Kevin Lewandowski <[EMAIL PROTECTED]> wrote:
> I've found an odd situation where solr is not returning all of the
> documents that I think it should. A search for "Geckoplp4-M" returns 3
> documents, but I know that there are at least 100 documents with that
> string.
>
> Here is an example query for that phrase and the result set:
> http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>     <lst name="params">
>       <str name="rows">10</str>
>       <str name="start">0</str>
>       <str name="indent">on</str>
>       <str name="fl">comments,id</str>
>       <str name="q">Geckoplp4-M</str>
>       <str name="version">2.2</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="3" start="0">
>     <doc>
>       <arr name="comments"><str>Geckoplp4-M</str></arr>
>       <str name="id">m2816500</str>
>     </doc>
>     <doc>
>       <arr name="comments"><str>toptrax recordings. Same tracks.</str>
>            <str>Geckoplp4-M</str></arr>
>       <str name="id">m2816544</str>
>     </doc>
>     <doc>
>       <arr name="comments"><str>Geckoplp4-M</str></arr>
>       <str name="id">m2815903</str>
>     </doc>
>   </result>
> </response>
>
> Now here's an example of a search for two documents that I know have
> that string, but were not returned in the previous search:
> http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&version=2.2&start=0&rows=10&indent=on&fl=id,comments
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="rows">10</str>
>       <str name="start">0</str>
>       <str name="indent">on</str>
>       <str name="fl">id,comments</str>
>       <str name="q">id:m2816615 OR id:m2816611</str>
>       <str name="version">2.2</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="2" start="0">
>     <doc>
>       <arr name="comments"><str>Geckoplp4-M</str></arr>
>       <str name="id">m2816611</str>
>     </doc>
>     <doc>
>       <arr name="comments"><str>Geckoplp4-M</str></arr>
>       <str name="id">m2816615</str>
>     </doc>
>   </result>
> </response>
>
> Here is the definition for the "comments" field:
>
> <field name="comments" type="text" indexed="true" stored="true"
>        multiValued="true"/>
>
> And here is the definition for a "text" field:
>
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldtype>
>
> Any ideas? Am I doing something wrong?
>
> thanks,
> Kevin
>
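For anyone who hits the same symptom: each field value in an update message goes in its own <field> element (repeated for multivalued fields like comments), and the catch-all field is typically populated with a copyField rule. A sketch, where the copyField line is my assumption about how "content" gets filled, not a quote from Kevin's schema:

<add>
  <doc>
    <field name="id">m2816544</field>
    <!-- multivalued field: repeat the element, one value per element -->
    <field name="comments">toptrax recordings. Same tracks.</field>
    <field name="comments">Geckoplp4-M</field>
  </doc>
</add>

<!-- schema.xml (assumed): copy comments into the default search field -->
<copyField source="comments" dest="content"/>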
Re: dismax downweighting
Would a dismax boost that's negative work? i.e. name^-1 and description^-1?

++
| Matthew Runo
| Zappos Development
| [EMAIL PROTECTED]
| 702-943-7833
++

On Oct 12, 2007, at 1:13 PM, Brian Whitman wrote:

> i have a dismax query where I want to boost appearance of the query
> terms in certain fields but "downboost" appearance in others. The
> practical use is a field containing a lot of descriptive text and then
> a product name field where products might be named after a descriptive
> word. Consider an electric toothbrush called "The Fast And Thorough
> Toothbrush" -- if a user searches for fast toothbrush I'd like to
> down-weight that particular model's advantage. The name of the product
> might also be in the descriptive text. I tried -name description but
> solr didn't like that. Any better ideas?
>
> --
> http://variogr.am/
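Lucene doesn't really support negative boosts, but dismax boosts are relative, so a fractional positive boost on the name field achieves the same down-weighting. A sketch against the 1.2 DisMaxRequestHandler, with illustrative field names and weights:

<!-- solrconfig.xml: weight descriptive text well above the product name,
     so a name-only match can't dominate the score. -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="qf">description^2.0 name^0.1</str>
  </lst>
</requestHandler>

The same thing works per-request: q=fast+toothbrush&qf=description^2.0+name^0.1.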
Re: Will turning off the stored setting on a field remove it from the index?
On 12-Oct-07, at 4:39 PM, BrendanD wrote:

> We have some fields that we're currently storing in the index (for
> example product_name, short_description, etc). We'd like to stop
> storing them in the index as we're going to start faulting them in
> from the database instead so that the content is fresh.
>
> If we change our config to stop storing them, when will they get
> removed from the index? After the next commit? After an optimize? Or
> will we have to rebuild the entire index from scratch?

The latter, I'm afraid. Solr never modifies or implicitly changes existing documents due to config changes.

-Mike
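For reference, the change under discussion is just flipping the stored attribute in schema.xml; a sketch, with the field types assumed:

<!-- schema.xml: newly indexed documents will carry no stored value for
     these fields; documents already in the index keep their stored
     values until they are reindexed. -->
<field name="product_name" type="text" indexed="true" stored="false"/>
<field name="short_description" type="text" indexed="true" stored="false"/>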
Will turning off the stored setting on a field remove it from the index?
Hi,

We have some fields that we're currently storing in the index (for example product_name, short_description, etc). We'd like to stop storing them in the index as we're going to start faulting them in from the database instead so that the content is fresh.

If we change our config to stop storing them, when will they get removed from the index? After the next commit? After an optimize? Or will we have to rebuild the entire index from scratch?

Thanks,

Brendan
Re: Structured Lucene documents
Hi All,

The structured (or multi-page, multi-part) document problem is a problem I've been thinking about for a while. A couple of years ago, when the project I was working on was using Lucene only (no Solr), we solved this problem in several steps. At the point of ingestion we created a custom analyzer, and surrounding Java code, that built a mapping from positions to the page each is on (recall that analyzers tokenize the terms in a given field and mark the position of each token). This mapping was stored outside of the Lucene index. At query time, we used home-built Java code to pull the position hits matching the query from the index and augmented the results generated by Lucene. At presentation time the results were molded into XML and then transformed by several XSL sheets, one of which translated the position hits to the pages they were on using the information gleaned from the ingestion stage.

When we moved to Solr, we created a custom QueryResponseWriter in order to get the position locations into the XML results, and kept the same transformation to obtain the page-level hits. The ingestion stage stays the same -- so really we're using Lucene to build the index, but Solr sits on top of it to serve results. I admit this is an awkward hack.

Peter Binkley ([EMAIL PROTECTED]), who I worked with on the project, made this suggested improvement:

> "Paged-Text" FieldType for Solr
>
> A chance to dig into the guts of Solr. The problem: if we index a
> monograph in Solr, there's no way to convert search results into
> page-level hits. The solution: have a "paged-text" fieldtype which keeps
> track of page divisions as it indexes, and reports page-level hits in
> the search results.
>
> The input would contain page milestones, e.g. <page id="234"/>. As Solr
> processed the tokens (using its standard tokenizers and filters), it
> would concurrently build a structural map of the item, indicating which
> term position marked the beginning of which page, e.g.
> <page id="234" firstterm="14324"/>. This map would be stored in an
> unindexed field in some efficient format.
>
> At search time, Solr would retrieve term positions for all hits that
> are returned in the current request, and use the stored map to
> determine page ids for each term position. The results would imitate
> the results for highlighting: for each document, the ids of the pages
> containing hits (e.g. 234, 236) and the term positions within them
> (e.g. 14325).
>
> We have some code that does something like this in a Lucene context,
> which could form the basis for a Solr fieldtype; but it would probably
> be just as easy to start fresh.

My current project would like to have some metadata about each sub-part of the document also included. For example: each page would have a URL, and/or a title associated with the content. This becomes meaningful when we index things like newspapers and monographs which may have page-, chapter-, or section-level content. So a solution would ideally take this into consideration.

Does anyone with more experience know if this is a reasonable approach? Does an issue exist for this feature request? Other comments or questions?

Thanks,
Tricia

Pierre-Yves LANDRON wrote:
>
> Hello,
>
> Is it possible to structure lucene documents via Solr, so one document
> could fit into another one? What I would like to do, for example: I
> want to retrieve full-text articles that each span several pages.
> Results must take into account both the pages and the article the
> search terms are from.
>
> I can create a lucene document for each page of the article AND the
> article itself, and do two requests to get my results, but it would
> duplicate the full text in the index, and would not be too efficient.
> Ideally, what I would like to do is to create a document for indexing
> the text of each page of the article, and group these documents in one
> document that describes the article: this way, when Lucene retrieves a
> requested term, I'll get the article and the page that contains the
> term.
>
> I wonder if there's a way to emulate this behavior elegantly with Solr?
>
> Kind Regards,
> Pierre-Yves Landron
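To make the response side of the quoted proposal concrete, here is a sketch of what a page-level hit section might look like, imitating the shape of Solr's highlighting output. Every element and attribute name here is invented for illustration, since the fieldtype doesn't exist yet:

<!-- Hypothetical "pagehits" section, parallel to the highlighting section:
     for each doc, the pages containing hits and the term positions that
     fell on them. -->
<lst name="pagehits">
  <lst name="doc1">
    <lst name="page">
      <int name="id">234</int>
      <arr name="positions"><int>14325</int></arr>
    </lst>
    <lst name="page">
      <int name="id">236</int>
      <arr name="positions"><int>14501</int></arr>
    </lst>
  </lst>
</lst>

The per-page metadata Tricia asks about (URL, title) could ride along as extra children of each page entry.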