RE: List of indexed terms for a field

2006-06-08 Thread Paul Terray
Thanks for the answer.

This is not a need for the moment, but it could be in the near future. 

If it becomes so, I will see how we can implement such a thing.

As for the syntax, I would suggest another parameter for the request (and maybe
another URL, since the function is clearly different).

Something like:
http://localhost:8983/solr/terms/?fl=myfield&rows=10

But perhaps I am completely off course (I am no Java developer, sorry).
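
For illustration only, a client could assemble such a request as sketched below. The /solr/terms/ path and the fl and rows parameters are taken from the proposal above and do not describe an existing Solr endpoint; the build_terms_url helper is a hypothetical name.

```python
from urllib.parse import urlencode


def build_terms_url(base, field, rows):
    """Build the terms-listing URL proposed above (hypothetical endpoint)."""
    # fl names the field whose terms are wanted, rows caps how many come back.
    params = urlencode({"fl": field, "rows": rows})
    return f"{base}/terms/?{params}"


url = build_terms_url("http://localhost:8983/solr", "myfield", 10)
print(url)
```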



-Original Message-
From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 7, 2006 15:41
To: solr-user@lucene.apache.org
Subject: Re: List of indexed terms for a field

On 6/7/06, Paul Terray <[EMAIL PROTECTED]> wrote:
> I am trying to make an index: Is there any way to get a list of all
> indexed terms for a field (especially a string or text one)?

Hi Paul,
There isn't currently a way to do this, except perhaps writing your
own custom request handler and using the lower-level Lucene
TermEnum after getting your hands on the underlying IndexReader.

This feature has been on my wish-list though.
There needs to be a syntax to request info like this, and then the
implementation.

perhaps something along the lines of a function syntax

@top10=terms("myfield",10)
  // request top 10 terms of "myfield", and return result under "top10"

So then the XML result from Solr would have something like this at the end:
term1term2term3


@top10=termFreqs("myfield",10)
  // request top 10 terms and their frequencies
Returns:
term1142...
  OR
142...


-Yonik
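
As a rough model of what the two proposed functions might compute, here is a sketch over an in-memory list of documents. It does not use Lucene or Solr. The terms and term_freqs names mirror the proposed syntax, "top" is assumed to mean highest document frequency, and the input format (a dict of field name to already-analyzed terms) is made up for the example.

```python
from collections import Counter


def term_freqs(docs, field, n):
    """Return the n most frequent terms of `field` as (term, doc_freq) pairs.

    `docs` is a list of dicts mapping a field name to a list of terms
    (an assumed stand-in for an index, not a Lucene structure).
    """
    doc_freq = Counter()
    for doc in docs:
        # set() so that a term is counted once per document.
        doc_freq.update(set(doc.get(field, [])))
    # Highest frequency first; ties broken alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def terms(docs, field, n):
    """Return only the term strings of the n most frequent terms."""
    return [term for term, _ in term_freqs(docs, field, n)]


docs = [
    {"myfield": ["solr", "lucene"]},
    {"myfield": ["solr"]},
    {"myfield": ["lucene", "solr", "nutch"]},
    {"other": ["unrelated"]},
]
print(terms(docs, "myfield", 2))
print(term_freqs(docs, "myfield", 2))
```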



Re: Finding documents with undefined field

2006-06-08 Thread Fabio Confalonieri


Chris Hostetter wrote:
> 
> 
> There are a couple of things you can do here...
> 
> 1) Use the same approach i described before if you have a uniqueKey,
> search for all things with a key and then exclude things that have a value
> in your field.  Since you are writing a request handler, you could also
> progromaticaly build up a BooleanQuery containing a MatchAllDocsQuery
> object and your prohibited clause even if you don't have a uniqueKey
> 
> 2) you can fetch the DocSet of all documents that *do* have a value for
> that field, and then get the inverse, and use that for your facet counts.
> this is something that was discussed before in a thread Erik started...
> ..
> 
> 

OK, in the end I took the easy way: when I find a particular predefined 
"undefined-value" in a filter or facet, I convert the query to be parsed to:

   "type:ad AND -" +field+":[* TO *]"

"type:ad" matches all my documents, the other type I have is "facets"
 (many thanks for the unbound range trick).
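
The string concatenation above can be wrapped in a small helper, sketched here. The undefined_field_query name and the "price" field are illustrative assumptions; only the type:ad clause and the [* TO *] range come from the message.

```python
def undefined_field_query(field, base_query="type:ad"):
    """Build a query for documents matching base_query with no value in field.

    The prohibited clause -field:[* TO *] excludes every document that has
    any value at all for the field.
    """
    return f"{base_query} AND -{field}:[* TO *]"


print(undefined_field_query("price"))
```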

I cannot see any particular slowness (but I'm testing with only 50,000 docs
now), perhaps thanks to Solr's ConstantScoreRangeQuery conversion. 
Should I worry with bigger numbers, say 300,000 docs?

My two cents on Solr development: a "DocSet.andNot(DocSet other)"
capability would surely be valuable for optimizing the undefined-field and 
other inverse-query problems.
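
To make the idea concrete, the operation being asked for is a set difference. The sketch below models it with plain Python sets of document ids; it is not Solr's DocSet class, and the ids are invented.

```python
def and_not(docs, other):
    """Return the ids in `docs` that are not in `other` (set difference)."""
    return docs - other


# Hypothetical doc ids: everything matched by type:ad ...
all_ads = frozenset(range(6))
# ... and the subset that has some value for the field (field:[* TO *]).
has_value = frozenset({0, 2, 3})

# Documents where the field is undefined.
undefined = and_not(all_ads, has_value)
print(sorted(undefined))
```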

Thanks again

Fabio
--
View this message in context: 
http://www.nabble.com/Finding-documents-with-undefined-field-t1742872.html#a4773462
Sent from the Solr - User forum at Nabble.com.



Re: Finding documents with undefined field

2006-06-08 Thread Yonik Seeley

On 6/8/06, Fabio Confalonieri <[EMAIL PROTECTED]> wrote:

> Ok at last I tried the easy way so, when I find a particular predefined
> "undefined-value" in a filter or facet, I convert the query to parse to:
>
>    "type:ad AND -" +field+":[* TO *]"
>
> "type:ad" matches all my documents, the other type I have is "facets"
>  (many thanks for the unbound range trick).
>
> I cannot see any particular slowness (but I'm testing with 50,000 docs
> now) perhaps thanks to Solr ConstantScoreRangeQueries conversion,
> should I worry with bigger numbers, say 300,000 docs ?


Provided you have the memory for the number of facets you are using,
the filterCache should handle any slowness problem.

There are optimizations that could be done to speed up getting the
DocSets (filters) for simple queries, but it hasn't been a priority
given that we operate off the filter cache so much.
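
A toy model may help show why the cache matters and why memory is the condition. The sketch below is not Solr's filterCache implementation: the FilterCache class, the least-recently-used eviction, and the sample queries are all assumptions for illustration.

```python
from collections import OrderedDict


class FilterCache:
    """Toy LRU cache mapping a filter query string to its set of doc ids."""

    def __init__(self, max_size, compute):
        self.max_size = max_size
        self.compute = compute  # callable: query string -> frozenset of ids
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        self.misses += 1
        docs = self.compute(query)  # the expensive step a hit avoids
        self.entries[query] = docs
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
        return docs


# Hypothetical precomputed results standing in for real index lookups.
index = {
    "type:ad": frozenset({0, 1, 2}),
    "color:red": frozenset({1, 2}),
    "color:blue": frozenset({0}),
}
cache = FilterCache(max_size=2, compute=index.__getitem__)
cache.get("type:ad")
cache.get("type:ad")      # served from the cache, no recomputation
cache.get("color:red")
cache.get("color:blue")   # cache is full, so "type:ad" is evicted
cache.get("type:ad")      # has to be recomputed
print(cache.misses)
```

With room for only two entries and three distinct filters in use, the last lookup is a miss again, which is the situation the memory caveat above warns about.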

-Yonik


Lucene versioning policy

2006-06-08 Thread Mike Klaas

Hello,

I was curious as to the policy regarding staying current with the Lucene
codebase.  Does Solr use the latest stable release?  Bleeding edge
(trunk)?  An occasional manual svn import?

Also, are there any plans to split solr into a release/development mode?

I'd really like to use solr in a commercial setting, but having nothing but
nightly builds available makes me uneasy.

Thanks in advance,
-Mike


Re: Lucene versioning policy

2006-06-08 Thread Yonik Seeley

On 6/8/06, Mike Klaas <[EMAIL PROTECTED]> wrote:

> I was curious as to policy regarding being current with the lucene
> codebase.  Does solr use the latest stable release?  bleeding edge
> (trunk?)  Occasional manual svn import?


An occasional SVN import based on need (same as hadoop/nutch as far as
I can see).
Lucene releases are too few and far between to always go with a "stable" version.
Solr is stocked with Lucene committers, so we know what is going in.
If we find a Lucene problem, I'd rather make a fix directly to Lucene
and use it rather than attempting to work around it.  As a
Lucene committer, I also like making sure the current version is
stable.


> Also, are there any plans to split solr into a release/development mode?


Definitely.  (just no dates have been set yet)


> I'd really like to use solr in a commercial setting, but having nothing but
> nightly builds available makes me uneasy.


Anything you develop would need to be QA'd for a commercial setting
anyway.  Perhaps you could pick the latest nightly build, make sure it
works for your application, and stick with it a while :-)

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: Lucene versioning policy

2006-06-08 Thread Mike Klaas

On 6/8/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


> > I'd really like to use solr in a commercial setting, but having nothing but
> > nightly builds available makes me uneasy.
>
> Anything you develop would need to be QA'd for a commercial setting
> anyway.  Perhaps you could pick the latest nightly build, make sure it
> works for your application, and stick with it a while :-)


Thanks, Yonik.  That is what I have been doing so far, and it is fine
for now.  The difficulty is not knowing when important bugfixes land or
major features are added without closely watching svn activity.
Even something as simple as 'tagging' a nightly build that contains
major changes (with a brief changelog) would be helpful.  It would
also be valuable from a project history perspective.

Thanks again--I think solr is a phenomenal little product.
-Mike


Re: Lucene versioning policy

2006-06-08 Thread Yonik Seeley

On 6/8/06, Mike Klaas <[EMAIL PROTECTED]> wrote:

> Even something as simple as 'tagging' a nightly build that contains
> major changes (with a brief changelog) would be helpful.  It would
> also be valuable from a project history perspective.


We try to record all non-trivial changes that can have an impact on
end users here:
http://svn.apache.org/viewvc/incubator/solr/trunk/CHANGES.txt

But it is imperfect...  Perhaps an entry should be added when updating
the Lucene version too.

-Yonik


Re: Lucene versioning policy

2006-06-08 Thread Chris Hostetter

: http://svn.apache.org/viewvc/incubator/solr/trunk/CHANGES.txt
:
: But it is imperfect...  Perhaps an entry should be added when updating
: the Lucene version too.

+1 ... definitely.


-Hoss



Re: Lucene versioning policy

2006-06-08 Thread Chris Hostetter

: Also, are there any plans to split solr into a release/development mode?
:
: I'd really like to use solr in a commercial setting, but having nothing but
: nightly builds available makes me uneasy.

I believe that as long as Solr is in the incubator, nightly builds are the
only releases we are allowed to have.  This is a side note in the
incubation policy about exiting incubation...

   Note: incubator projects are not permitted to issue an official
   Release. Test snapshots (however good the quality) and Release
   plans are OK.

...of course, there is some conflicting info higher up in the same doc
that suggests they are allowed, but they require jumping through some
hoops...

http://incubator.apache.org/incubation/Incubation_Policy.html#Releases


-Hoss