RE: List of indexed terms for a field
Thanks for the answer.

This is not a need for the moment, but it could become one in the near future. If it does, I will see how we can implement such a thing.

As for the syntax, I would see another parameter for the request (and maybe another URL, as the function is clearly different). Something like:

http://localhost:8983/solr/terms/?fl=myfield&rows=10

But perhaps I am completely off course (I am no Java developer, sorry).

-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 7, 2006 15:41
To: solr-user@lucene.apache.org
Subject: Re: List of indexed terms for a field

On 6/7/06, Paul Terray <[EMAIL PROTECTED]> wrote:
> I am trying to make an index: Is there any way to get a list of all indexed
> terms for a field (especially a string or text one)?

Hi Paul,

There isn't currently a way to do this, except perhaps writing your own custom request handler and using the lower-level Lucene TermEnum after getting your hands on the underlying IndexReader.

This feature has been on my wish-list though. There needs to be a syntax to request info like this, and then the implementation. Perhaps something along the lines of a function syntax:

@top10=terms("myfield",10) // request top 10 terms of "myfield", and return result under "top10"

So then the XML result from Solr would have something like this at the end:

term1term2term3

@top10=termFreqs("myfield",10) // request top 10 terms and their frequencies

Returns:

term1142...
OR
142...

-Yonik
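To make the two proposals above concrete, here is a minimal sketch in Python of what terms("myfield",10) and termFreqs("myfield",10) could return, plus the URL form Paul suggests. It is only an illustration: the /solr/terms/ endpoint and the function syntax are proposals from this thread, not an existing Solr API; the helper names (terms_url, term_freqs, terms) are made up; the in-memory list of documents stands in for the index; and "top" is assumed to mean highest document frequency.

```python
from collections import Counter
from urllib.parse import urlencode


def terms_url(field, rows, base="http://localhost:8983/solr/terms/"):
    """Build the request URL in the form Paul proposes (hypothetical endpoint)."""
    return base + "?" + urlencode({"fl": field, "rows": rows})


def term_freqs(docs, field, n):
    """Return the n most frequent terms of `field` with their document frequencies.

    `docs` is a list of dicts mapping a field name to a list of already
    analyzed terms. It stands in for the real index and is purely illustrative.
    """
    counts = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        counts.update(set(doc.get(field, [])))
    # Highest frequency first; ties are broken alphabetically so the
    # result is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def terms(docs, field, n):
    """Like term_freqs, but return only the terms."""
    return [term for term, _ in term_freqs(docs, field, n)]
```

A real implementation would walk the terms of the field through the IndexReader instead of scanning documents; the sketch only pins down the expected shape of the result.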
Re: Finding documents with undefined field
Chris Hostetter wrote:
>
> There are a couple of things you can do here...
>
> 1) Use the same approach I described before if you have a uniqueKey:
> search for all things with a key and then exclude things that have a value
> in your field. Since you are writing a request handler, you could also
> programmatically build up a BooleanQuery containing a MatchAllDocsQuery
> object and your prohibited clause even if you don't have a uniqueKey.
>
> 2) You can fetch the DocSet of all documents that *do* have a value for
> that field, and then get the inverse, and use that for your facet counts.
> This is something that was discussed before in a thread Erik started...
> ..
>

Ok, at last I tried the easy way: when I find a particular predefined "undefined-value" in a filter or facet, I convert the query to parse to:

"type:ad AND -" + field + ":[* TO *]"

"type:ad" matches all my documents; the other type I have is "facets" (many thanks for the unbounded range trick).

I cannot see any particular slowness (but I'm testing with 50,000 docs now), perhaps thanks to Solr's ConstantScoreRangeQuery conversion. Should I worry with bigger numbers, say 300,000 docs?

My two cents on Solr development: a "DocSet.andNot(DocSet other)" capability would surely be valuable for optimizing the undefined-field and other inverse-query problems.

Thanks again
Fabio

--
View this message in context: http://www.nabble.com/Finding-documents-with-undefined-field-t1742872.html#a4773462
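For readers following along, the two approaches in this message can be sketched in a few lines of Python. This is an illustration only: undefined_field_query and and_not are hypothetical helper names, plain Python sets stand in for Solr's DocSet, and "type:ad" is the clause that happens to match every document in Fabio's index.

```python
def undefined_field_query(field, match_all="type:ad"):
    """Build a query string for documents that have no value in `field`.

    The prohibited clause -field:[* TO *] only excludes documents, so it is
    paired with a clause that matches every document ("type:ad" here, which
    is specific to Fabio's index and must be adapted elsewhere).
    """
    return "%s AND -%s:[* TO *]" % (match_all, field)


def and_not(docs, other):
    """Set-based stand-in for the suggested DocSet.andNot(DocSet other).

    Returns the ids in `docs` that are not in `other`.
    """
    return docs - other
```

With the second helper, the set of documents with an undefined field would be and_not(all_docs, docs_with_value), which is the "inverse" described in point 2 of the quoted message.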
Re: Finding documents with undefined field
On 6/8/06, Fabio Confalonieri <[EMAIL PROTECTED]> wrote:
> Ok, at last I tried the easy way: when I find a particular predefined
> "undefined-value" in a filter or facet, I convert the query to parse to:
>
> "type:ad AND -" + field + ":[* TO *]"
>
> "type:ad" matches all my documents; the other type I have is "facets"
> (many thanks for the unbounded range trick).
>
> I cannot see any particular slowness (but I'm testing with 50,000 docs
> now), perhaps thanks to Solr's ConstantScoreRangeQuery conversion. Should
> I worry with bigger numbers, say 300,000 docs?

Provided you have the memory for the number of facets you are using, the filterCache should handle any slowness problem.

There are optimizations that could be done to speed up getting the DocSets (filters) for simple queries, but it hasn't been a priority given that we operate off the filter cache so much.

-Yonik
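The idea behind relying on the filter cache can be sketched as follows. This is not Solr's implementation, just a minimal least-recently-used cache in Python that remembers the set of matching document ids per filter query; the class name FilterCache, the compute callback and the max_size parameter are all assumptions made for the illustration.

```python
from collections import OrderedDict


class FilterCache:
    """Tiny LRU cache mapping a filter query string to its set of doc ids."""

    def __init__(self, compute, max_size=512):
        self._compute = compute        # function: query string -> iterable of doc ids
        self._max_size = max_size
        self._entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def get(self, query):
        if query in self._entries:
            self._entries.move_to_end(query)   # mark as most recently used
            return self._entries[query]
        # Cache miss: run the (possibly slow) query once and remember the result.
        self.misses += 1
        docs = frozenset(self._compute(query))
        self._entries[query] = docs
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return docs
```

Each cached entry costs memory in proportion to the doc set it holds, which is why the advice above is conditional on having enough memory for the number of facets in use.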
Lucene versioning policy
Hello,

I was curious about the policy on staying current with the Lucene codebase. Does Solr use the latest stable release? The bleeding edge (trunk)? An occasional manual svn import?

Also, are there any plans to split Solr into a release/development mode? I'd really like to use Solr in a commercial setting, but having nothing but nightly builds available makes me uneasy.

Thanks in advance,
-Mike
Re: Lucene versioning policy
On 6/8/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> I was curious about the policy on staying current with the Lucene
> codebase. Does Solr use the latest stable release? The bleeding edge
> (trunk)? An occasional manual svn import?

An occasional SVN import based on need (same as hadoop/nutch as far as I can see). Lucene releases are too few and far between to always go with a "stable" version.

Solr is stocked with Lucene committers, so we know what is going in. If we find a Lucene problem, I'd rather make a fix directly to Lucene and use it than attempt to work around it. As a Lucene committer, I also like making sure the current version is stable.

> Also, are there any plans to split Solr into a release/development mode?

Definitely. (Just no dates have been set yet.)

> I'd really like to use Solr in a commercial setting, but having nothing
> but nightly builds available makes me uneasy.

Anything you develop would need to be QA'd for a commercial setting anyway. Perhaps you could pick the latest nightly build, make sure it works for your application, and stick with it a while :-)

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
Re: Lucene versioning policy
On 6/8/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > I'd really like to use solr in a commercial setting, but having nothing but
> > nightly builds available makes me uneasy.
>
> Anything you develop would need to be QA'd for a commercial setting
> anyway. Perhaps you could pick the latest nightly build, make sure it
> works for your application, and stick with it a while :-)

Thanks, Yonik. That is what I have been doing so far, and it is fine for now. The difficulty is not knowing when important bugfixes land or major features are added without closely watching svn activity.

Even something as simple as 'tagging' a nightly build that contains major changes (with a brief changelog) would be helpful. It would also be valuable from a project history perspective.

Thanks again--I think Solr is a phenomenal little product.

-Mike
Re: Lucene versioning policy
On 6/8/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> Even something as simple as 'tagging' a nightly build that contains
> major changes (with a brief changelog) would be helpful. It would also
> be valuable from a project history perspective.

We try to record all non-trivial changes that can have an impact on end users here:
http://svn.apache.org/viewvc/incubator/solr/trunk/CHANGES.txt

But it is imperfect... Perhaps an entry should be added when updating the Lucene version too.

-Yonik
Re: Lucene versioning policy
: http://svn.apache.org/viewvc/incubator/solr/trunk/CHANGES.txt
:
: But it is imperfect... Perhaps an entry should be added when updating
: the Lucene version too.

+1 ... definitely.

-Hoss
Re: Lucene versioning policy
: Also, are there any plans to split solr into a release/development mode?
:
: I'd really like to use solr in a commercial setting, but having nothing but
: nightly builds available makes me uneasy.

I believe that as long as Solr is in the incubator, nightly builds are the only releases we are allowed to have. There is a side note in the incubation policy about exiting incubation...

   Note: incubator projects are not permitted to issue an official
   Release. Test snapshots (however good the quality) and Release plans
   are OK.

...of course, there is some conflicting info higher up in the same doc that suggests releases are allowed, but they require jumping through some hoops...

http://incubator.apache.org/incubation/Incubation_Policy.html#Releases

-Hoss