Re: Geographical distance searching

2007-09-27 Thread Guillaume Smet
Hi Patrick,

On 9/27/07, patrick o'leary <[EMAIL PROTECTED]> wrote:
>  p.s after a little tidy up I'll be adding this to both lucene and solr's 
> repositories if folks feel that it's a useful addition.

It's definitely very interesting. Did you compare performances of
Lucene with a database allowing you to perform real GIS queries?
I'm more of a PostgreSQL guy and I must admit we usually use the cube contrib
or PostGIS for this sort of thing; with both, we are able to use
indexes for proximity queries and they can be pretty fast. Using the
method you used with MySQL is definitely too slow and becomes unusable
as soon as you have a certain amount of data in your table.

Regards,

-- 
Guillaume
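The proximity queries discussed in this thread reduce to great-circle distance; a minimal haversine sketch in Java (illustrative only — this is not the code from the patch under discussion):

```java
// Great-circle distance via the haversine formula; inputs in decimal degrees.
public class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    public static double km(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // New York -> Los Angeles, roughly 3,940 km
        System.out.println(km(40.7128, -74.0060, 34.0522, -118.2437));
    }
}
```

A search-side implementation would typically combine a cheap bounding-box prefilter with this exact distance check, whether in Lucene or in an RDBMS.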


Re: searching for non-empty fields

2007-09-27 Thread Pieter Berkel
While in theory -URL:"" should be valid syntax, the Lucene query parser
doesn't accept it and throws a ParseException.  I've considered raising this
issue on lucene-dev but it didn't seem to affect many users so I decided not
to pursue the matter.



On 27/09/2007, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> ...and to work around the problem until you reindex...
>
> q=(URL:[* TO *] -URL:"")
>
> ...at least: i'm 97% certain that will work.  it won't help if your "empty"
> values are really " " or "  " or ...
>
>


Re: Geographical distance searching

2007-09-27 Thread patrick o'leary




As far as I'm concerned nothing's going to beat PG's GIS calculations,
but its tsearch was a lot slower than MyISAM.

My goal was a single solution to reduce our complexity, but I am
interested to know whether combining both an RDBMS & Lucene works for
you. Definitely let me know how it goes!

P

Guillaume Smet wrote:

  Hi Patrick,

On 9/27/07, patrick o'leary <[EMAIL PROTECTED]> wrote:
  
  
 p.s after a little tidy up I'll be adding this to both lucene and solr's repositories if folks feel that it's a useful addition.

  
  
It's definitely very interesting. Did you compare performances of
Lucene with a database allowing you to perform real GIS queries?
I'm more of a PostgreSQL guy and I must admit we usually use the cube contrib
or PostGIS for this sort of thing; with both, we are able to use
indexes for proximity queries and they can be pretty fast. Using the
method you used with MySQL is definitely too slow and becomes unusable
as soon as you have a certain amount of data in your table.

Regards,

  


-- 
Patrick O'Leary


You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






Re: searching for non-empty fields

2007-09-27 Thread Brian Whitman

Thanks Pieter, Hoss and Ryan.


q=(URL:[* TO *] -URL:"")


This gives me: 400 Query parsing error: Cannot parse '(URL:[* TO *] -URL:"")':
Lexical error at line 1, column 29.  Encountered: "\"" (34), after : "\""




adding something like:
  


I'll do this but the problem here is I have to wait around for all  
these docs to re-index..


Your query will work if you make sure the URL field is omitted from the
document at index time when the field is blank.


The thing is, I thought I was omitting the field if it's blank.  It's
in a solrj instance that takes a Lucene Document, so maybe it's a
solrj issue?


   if (URL != null && URL.length() > 5)
     doc.add(new Field("URL", URL, Field.Store.YES, Field.Index.UN_TOKENIZED));


And then during indexing:

SimpleSolrDoc solrDoc = new SimpleSolrDoc();
solrDoc.setBoost(null, new Float(doc.getBoost()));
for (Enumeration e = doc.fields(); e.hasMoreElements();) {
  Field field = (Field) e.nextElement();
  if (!ignoreFields.contains(field.name())) {
    solrDoc.addField(field.name(), field.stringValue());
  }
}
try {
  solr.add(solrDoc);
...







LockObtainFailedException

2007-09-27 Thread Jae Joo
will anyone help me why and how?


org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:70)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)

Thanks,

Jae Joo


Re: LockObtainFailedException

2007-09-27 Thread matt davies

quick fix

look for a lucene lock file in your tmp directory and delete it, then  
restart solr, should start


I am an idiot though, so be careful, in fact, I'm worse than an  
idiot, I know a little


:-)

you got a lock file somewhere though, deleting that will help you  
out, for me it was in my /tmp directory
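A hedged illustration of that cleanup, using a throwaway scratch directory in place of the real index path (the real location varies per install; only delete a real write.lock while Solr is stopped):

```shell
# stand-in for the Solr index directory
idx=$(mktemp -d)
# simulate the stale lock left behind by an unclean shutdown
touch "$idx/write.lock"
# remove any lucene write locks under the index dir
# (against a real index: ONLY while Solr is not running)
find "$idx" -name 'write.lock' -delete
# verify nothing is left behind
remaining=$(ls -A "$idx")
echo "remaining: '$remaining'"
```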


On 27 Sep 2007, at 14:10, Jae Joo wrote:


will anyone help me why and how?


org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:70)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)

Thanks,

Jae Joo

Thanks,

Jae Joo




Re: What is facet?

2007-09-27 Thread Erik Hatcher


On Sep 26, 2007, at 7:28 PM, Chris Hostetter wrote:
  cool => (popularity:[100 TO *] (+numFeatures:[10 TO *] +price:[0 TO 10]))
  lame => (+popularity:[* TO 99] +numFeatures:[* TO 9] +price:[11 TO *])


That example is definitely in the cool category.  I couldn't resist
creating a SolrTerminology wiki page linking to your post and
breaking out the definitions we Solr folks want to embrace.  I think
it's a good idea to have some common language definitions we agree upon here.


Erik



Re: searching for non-empty fields

2007-09-27 Thread Yonik Seeley
On 9/27/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> While in theory -URL:"" should be valid syntax, the Lucene query parser
> doesn't accept it and throws a ParseException.

I don't have time to work on that now, but I did just open a bug:
https://issues.apache.org/jira/browse/LUCENE-1006

-Yonik


Request for graphics

2007-09-27 Thread Benjamin Liles
I am trying to make a presentation on SOLR and have been unable to find 
the SOLR graphic in high quality.  Could someone point me in the right 
direction or provide the graphics?


Thanks,

Benjamin Liles

Lead Software Application Developer
Digital Initiatives - Web Services
University Libraries
Texas A&M University
[EMAIL PROTECTED]

3.109E Library Annex | 5000 TAMU | College Station, TX 77843

Tel. 979.862.4948x122

http://library.tamu.edu



moving index

2007-09-27 Thread Jae Joo
Hi,

I need to move the index files, but I am concerned about potential
problems, including performance.
Do I have to keep the original documents for querying?

Thanks,

Jae Joo


Re: moving index

2007-09-27 Thread Yonik Seeley
On 9/27/07, Jae Joo <[EMAIL PROTECTED]> wrote:
> I need to move the index files, but I am concerned about potential
> problems, including performance.
> Do I have to keep the original documents for querying?

I assume you posted XML documents in Solr XML format (like <add><doc>...)?
If so, that is just an example way to get the data into Solr.  Those
XML files aren't needed, and any high-speed indexing will avoid
creating files at all -- just create the XML doc in memory and send it to
solr via HTTP POST.

-Yonik
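Building that `<add>` payload in memory, as suggested, might look like this (field names and values are illustrative, and no XML escaping is done in this sketch):

```java
// Build a Solr <add> XML payload in memory -- no intermediate files --
// ready to POST to the /update handler. Field names/values are examples,
// and no XML escaping is performed in this sketch.
public class SolrAddXml {
    public static String addDoc(String id, String url) {
        return "<add><doc>"
             + "<field name=\"id\">" + id + "</field>"
             + "<field name=\"URL\">" + url + "</field>"
             + "</doc></add>";
    }

    public static void main(String[] args) {
        System.out.println(addDoc("42", "http://example.com/"));
    }
}
```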


Re: Converting German special characters / umlaute

2007-09-27 Thread Steven Rowe
Chris Hostetter wrote:
> : is there an analyzer which automatically converts all german special
> : characters to their specific dissected form, such as ü to ue and ä to
> : ae, etc.?!
> 
> See also the ISOLatin1TokenFilter which does this regardless of language.

Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to
/ae/, etc.

Instead, it converts /ü/ to /u/, /ä/ to /a/, etc.  It *does* convert /ß/
to /ss/, though I've seen some people write that the correct
substitution for /ß/ in German is /sz/ - I don't speak or read German,
so I don't know.

Maybe there should be an option on ISOLatin1TokenFilter to use German
substitutions, in addition to the current behavior of simply stripping
diacritics?
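Such a German-substitution option might reduce to a mapping pass like this sketch (character pairs taken from the discussion; this is NOT the actual ISOLatin1TokenFilter implementation):

```java
// Sketch of a German-aware substitution pass (ä -> ae, ö -> oe, ü -> ue,
// ß -> ss), as proposed in the thread -- not real Lucene filter code.
public class GermanFold {
    public static String fold(String s) {
        return s.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")
                .replace("Ä", "Ae").replace("Ö", "Oe").replace("Ü", "Ue")
                .replace("ß", "ss");
    }

    public static void main(String[] args) {
        System.out.println(fold("Müller in der Straße"));
    }
}
```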

Does anyone know if there are other (Latin-1-utilizing) languages
besides German with standardized diacritic substitutions that involve
something other than just stripping the diacritics?

Steve



Problem with handles holding deleted files

2007-09-27 Thread Danilo Fantinato
Hi,
I'm using EmbeddedSolrServer, and when I start the snapinstaller process I
call the commit method of the EmbeddedSolr through a servlet, but the JVM
keeps deleted files open at the operating-system level and disk usage grows
excessively. Here is a sample line from the command "lsof | grep deleted":
java  17255 weblogic  419r  REG  104,6  437821226462
/domains/solr-indexes/q/OPNPrecoIndex/datasolr/index_22746_preCommit/_2kb.cfs
(deleted)

When the JVM process is restarted, the deleted open files are released and
the disk space is freed.

I need help on this case.


Re: LockObtainFailedException

2007-09-27 Thread Jae Joo
In solrconfig.xml,
false
10
25000
1400
500
1000
1

Is writeLockTimeout too small?

Thanks,

Jae
On 9/27/07, matt davies <[EMAIL PROTECTED]> wrote:
>
> quick fix
>
> look for a lucene lock file in your tmp directory and delete it, then
> restart solr, should start
>
> I am an idiot though, so be careful, in fact, I'm worse than an
> idiot, I know a little
>
> :-)
>
> you got a lock file somewhere though, deleting that will help you
> out, for me it was in my /tmp directory
>
> On 27 Sep 2007, at 14:10, Jae Joo wrote:
>
> > will anyone help me why and how?
> >
> >
> > org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
> > SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
> >         at org.apache.lucene.store.Lock.obtain(Lock.java:70)
> >         at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
> >         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
> >         at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
> >         at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
> >         at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
> >         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
> >         at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
> >         at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
> >
> > Thanks,
> >
> > Jae Joo
>
>


Re: searching for non-empty fields

2007-09-27 Thread Yonik Seeley
On 9/27/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 9/27/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> > While in theory -URL:"" should be valid syntax, the Lucene query parser
> > doesn't accept it and throws a ParseException.
>
> I don't have time to work on that now,

OK, I lied :-)  It was simple (and a nice diversion).

-Yonik

> but I did just open a bug:
> https://issues.apache.org/jira/browse/LUCENE-1006


Re: Converting German special characters / umlaute

2007-09-27 Thread J.J. Larrea
At 12:13 PM -0400 9/27/07, Steven Rowe wrote:
>Chris Hostetter wrote:
>> : is there an analyzer which automatically converts all german special
>> : characters to their specific dissected form, such as ü to ue and ä to
>> : ae, etc.?!
>>
>> See also the ISOLatin1TokenFilter which does this regardless of language.
>
>Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to
>/ae/, etc.
>
>Instead, it converts /ü/ to /u/, /ä/ to /a/, etc.  It *does* convert /ß/
>to /ss/, though I've seen some people write that the correct
>substitution for /ß/ in German is /sz/ - I don't speak or read German,
>so I don't know.

You and lots of other people, including myself... So while there is indeed a
"specific dissected form" -- German speakers clearly understand that when an
input mechanism doesn't allow for umlauted vowels (e.g. ASCII, non-German
typewriters), the /ue/, /ae/, etc. equivalents are to be used -- if maximally
flexible matching between input texts and queries is desired, an information
system used by non-German speakers has to account for them simply ignoring
the umlaut and entering /u/, /a/, etc., while /ß/ needs to be matched as
itself, /ss/, /sz/ (/ß/ is read as 'ess zed'), and I expect even /b/.

So perhaps it would make sense to translate into a canonical form (/ü/ to
/ue/ and /ß/ to /ss/) at both index and query time, but also to then emit
synonym (overlapping) tokens with /ue/ -> /u/, /sz/ -> /ss/, and perhaps even
/b/ -> /ss/.

(This is just thinking aloud and I'd love to be corrected by someone with
more experience in this realm.)

>Maybe there should be an option on ISOLatin1TokenFilter to use German
>substitutions, in addition to the current behavior of simply stripping
>diacritics?

As for implementation, the first part could easily and flexibly accomplished 
with the current PatternReplaceFilter, and I'm thinking the second could be 
done with an extension to that or better yet a new Filter which allows parsing 
synonymous tokens from a flat to overlaid format, e.g. something on the order 
of:


<filter class="solr.PatternReplaceFilterFactory" pattern="ß" replacement="ss" replace="first"/>

or perhaps better,


   

which in my fantasy implementation would map:

Müller -> Mueller|Muller
Mueller -> Mueller|Muller
Muller -> Muller

and could be run at index-time and/or query-time as appropriate.
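The canonical-plus-stripped emission described above could be sketched as follows (a toy stand-in for overlapping synonym tokens, not a real Lucene TokenFilter):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy stand-in for emitting overlapping "synonym" tokens: each input term
// yields its canonical form (ü -> ue, ß -> ss) plus a stripped form (ue -> u).
// The digraph replacement is deliberately naive (it would also hit "Michael").
public class UmlautVariants {
    public static Set<String> variants(String term) {
        String canonical = term.replace("ä", "ae").replace("ö", "oe")
                               .replace("ü", "ue").replace("ß", "ss");
        String stripped = canonical.replace("ae", "a").replace("oe", "o")
                                   .replace("ue", "u");
        Set<String> out = new LinkedHashSet<String>();
        out.add(canonical);
        out.add(stripped);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("Müller")); // [Mueller, Muller]
    }
}
```

This reproduces the mapping in the message: Müller and Mueller both yield {Mueller, Muller}, while Muller yields only {Muller}.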

>Does anyone know if there are other (Latin-1-utilizing) languages
>besides German with standardized diacritic substitutions that involve
>something other than just stripping the diacritics?

I'm curious about this too.

- J.J.


Re: Converting German special characters / umlaute

2007-09-27 Thread Walter Underwood
Accent transforms are language-specific, so an accent filter
should take an ISO language code as an argument.

Some examples:

* In French and English, a dieresis is a hint to pronounce neighboring
vowels separately, as in coöp, naïve, or Noël.

* In German, ü transforms to ue.

* In Swedish, ö is a different letter than o, and should
not be transformed. The same is true for ø in Danish and
Norwegian.

* Then there is Motörhead and Motley Crüe, see:
http://en.wikipedia.org/wiki/Heavy_metal_umlaut

* I don't know of an ISO language code for Tolkien's
Elvish, so we're out of luck for Manwë.

Another approach would be to generate the accent-transformed
terms as synonyms at the same token position. Then you could
generate multiple options.

Obviously, we had to do this right for Ultraseek a few years ago.

wunder

On 9/27/07 9:13 AM, "Steven Rowe" <[EMAIL PROTECTED]> wrote:

> Maybe there should be an option on ISOLatin1TokenFilter to use German
> substitutions, in addition to the current behavior of simply stripping
> diacritics?
> 
> Does anyone know if there are other (Latin-1-utilizing) languages
> besides German with standardized diacritic substitutions that involve
> something other than just stripping the diacritics?



Date facetting and ranges overlapping

2007-09-27 Thread Guillaume Smet
Hi all,

I'm now using date facetting to browse events. It works really fine
and is really useful. The only problem so far is that if I have an
event which is exactly on the boundary of two ranges, it is referenced
2 times.

Say we have a gap of 6 hours starting from 2007-09-27
12:00: the ranges are 2007-09-27 12:00->18:00 and 2007-09-27 18:00->
00:00. An event happening exactly at 18:00 is referenced in both
ranges, so if I select the first range Solr returns both ranges in
facet_dates instead of the first one only.

Couldn't we create the ranges so that they don't overlap? Something like:
2007-09-27 12:00 -> 2007-09-27 17:59:59.999 for the first one and
2007-09-27 18:00 -> 2007-09-27 23:59:59.999 for the second one.

I don't think people use date faceting with a millisecond range, so
losing 1 millisecond shouldn't be too much of a problem in practice.

Thanks for any comment.

--
Guillaume


Re: custom sorting

2007-09-27 Thread Chris Hostetter

: > Previously we were using lucene to do this. by using the
: > SortComparatorSource we could sort the documents returned by distance
: > nicely. we are now switching over to lucene because of the features it
: > provides, however i am not able to see a way to do this in Solr. 

Someone started another thread where they specifically discuss the
"Geographical distance searching" aspect of your question.

to answer the broader question of using customized
Lucene SortComparatorSource objects in solr -- it is in fact possible.

In Solr, all decisions about how to sort are driven by FieldTypes.  You
can subclass any of the FieldTypes that come with Solr and override just
the getSortField method to use whatever sort logic you want, and then use
your new FieldType as you would any other plugin...

http://wiki.apache.org/solr/SolrPlugins

In the case where you have a custom SortComparatorSource that is not
"field" specific (or uses data from more than one field) you would need to
make your field type smart enough to let you configure (via the <fieldtype>
declaration in the schema) which fields (if any) to get its data from,
and then create a marker field of that type, which you don't use to index
or store any data, but which you use to indicate when to trigger your custom
sort logic, ie...


 

   

...and then use "sort=distance+asc" in your query
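The marker-field approach might look roughly like this (a non-runnable pseudocode sketch against the Solr plugin API; the class, package, and comparator names are hypothetical):

```java
// hypothetical plugin sketch -- not compilable as-is
public class DistanceSortFieldType extends FieldType {
    public SortField getSortField(SchemaField field, boolean reverse) {
        // DistanceComparatorSource would read the lat/lon source fields
        // named in the <fieldtype> init args in schema.xml
        return new SortField(field.getName(), new DistanceComparatorSource(), reverse);
    }
    // toInternal()/write() etc. omitted: the marker field indexes no data
}
```

In schema.xml the marker might then be declared as `<fieldtype name="distance" class="com.example.DistanceSortFieldType"/>` plus `<field name="distance" type="distance" indexed="false" stored="false"/>`, and queried with `sort=distance+asc`.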



-Hoss



Re: LockObtainFailedException

2007-09-27 Thread Chris Hostetter

In "normal" solr usage, where Solr is the only thing writing to your 
index, you should never get a lock timeout ... typical reasons for this to 
happen are if your servlet container crashed or was shut down uncleanly and 
Solr wasn't able to clean up its lock file (check your logs).

There is an option to tell Solr to remove the lock file on startup.  
On the trunk there is also an option to tell Solr that it doesn't need to 
bother with a lock file.

(i don't remember what these options are called offhand, but they are fairly 
well documented in the example solrconfig.xml)




-Hoss



Re: Date facetting and ranges overlapping

2007-09-27 Thread Chris Hostetter
: I'm now using date facetting to browse events. It works really fine
: and is really useful. The only problem so far is that if I have an
: event which is exactly on the boundary of two ranges, it is referenced
: 2 times.

yeah, this is one of the big caveats with date faceting right now ... i 
struggled with this a bit when designing it, and ultimately decided to 
punt on the issue.  the biggest hangup was that even if the facet counting 
code was smart about making sure the ranges don't overlap, the range query 
syntax in the QueryParser doesn't support ranges that exclude one endpoint 
(so there wouldn't be a lot you could do with the ranges once you know the 
counts in them).

one idea i had in SOLR-258 was that we could add an "interval" option that 
would define how much to add to the "end" of one range to get the "start" 
of another range (think of the current implementation as having interval 
hardcoded to "0") which would solve the problem and work with range 
queries that were inclusive of both endpoints, but would require people to 
use "-1MILLI" a lot.

a better option (assuming a query parser change) would be a new option 
that says whether each computed range should be inclusive of the low point, 
the high point, both end points, neither end point, or be "smart" (where 
smart is the same as "low" except for the last range, which includes 
both).

(I think there's already a lucene issue to add the query parser support, i 
just haven't had time to look at it)

The simple workaround: if you know all of your data is indexed with 
perfect 0.000-second precision, then put "-1MILLI" at the end of your start 
and end date faceting params.
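Concretely, the workaround might look like this (the field name and dates are illustrative):

```
facet.date=event_date
facet.date.start=2007-09-27T12:00:00Z-1MILLI
facet.date.end=2007-09-28T00:00:00Z-1MILLI
facet.date.gap=+6HOUR
```

Each computed range then ends at ...:59:59.999, so an event stamped exactly at 18:00:00 falls in only one bucket.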



-Hoss



Re: custom sorting

2007-09-27 Thread Erik Hatcher


On Sep 27, 2007, at 2:50 PM, Chris Hostetter wrote:

to answer the broader question of using customized
Lucene SortComparatorSource objects in solr -- it is in fact possible.

In Solr, all decisions about how to sort are driven by FieldTypes.  You
can subclass any of the FieldTypes that come with Solr and override just
the getSortField method to use whatever sort logic you want, and then use
your new FieldType as you would any other plugin...

http://wiki.apache.org/solr/SolrPlugins

In the case where you have a custom SortComparatorSource that is not
"field" specific (or uses data from more than one field) you would need to
make your field type smart enough to let you configure (via the <fieldtype>
declaration in the schema) which fields (if any) to get its data from,
and then create a marker field of that type, which you don't use to index
or store any data, but which you use to indicate when to trigger your custom
sort logic, ie...



   
   

...and then use "sort=distance+asc" in your query


Using something like this, how would the custom SortComparatorSource  
get a parameter from the request to use in sorting calculations?


I haven't looked under the covers of the local-solr stuff that flew
by earlier, but it looks quite well done.  I think I can speak for many
who would love to have geo field types / sorting capability built
into Solr.


Erik



Selecting Distinct values?

2007-09-27 Thread David Whalen
Hi there.

Is there a query I can use to select distinct values in an index?
I thought I could use a facet, but the facets don't seem to return
all the distinct values in the index, only the highest-count ones.

Is there another query I can try?  Or, can I adjust the facets
somehow to make this work?

Thanks,

DW



Re: custom sorting

2007-09-27 Thread Yonik Seeley
On 9/27/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Using something like this, how would the custom SortComparatorSource
> get a parameter from the request to use in sorting calculations?

perhaps hook in via function query:
  dist(10.4,20.2,geoloc)

And either manipulate the score with that and sort by score,

q=+(foo bar)^0 dist(10.4,20.2,geoloc)
sort=score asc

or extend solr's sorting mechanisms to allow specifying a function to sort by.

sort="dist(10.4,20.2,geoloc) asc"

-Yonik


Re: Date facetting and ranges overlapping

2007-09-27 Thread Guillaume Smet
On 9/27/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> a better option (assuming a query parser change) would be a new option
> that says whether each computed range should be inclusive of the low point,
> the high point, both end points, neither end point, or be "smart" (where
> smart is the same as "low" except for the last range, which includes
> both)

That could be really cool.

> The simple workaround: if you know all of your data is indexed with
> perfect 0.000-second precision, then put "-1MILLI" at the end of your start
> and end date faceting params.

Good idea. The only problem is that I'll have to modify my client code
to deal with the fact that solr now returns 17:59:59 instead of
18:00:00. Not difficult but less clean than before.

Thanks for the advice. I'll give it a try.

--
Guillaume


Re: Selecting Distinct values?

2007-09-27 Thread Mike Klaas

On 27-Sep-07, at 12:01 PM, David Whalen wrote:


Hi there.

Is there a query I can use to select distinct values in an index?
I thought I could use a facet, but the facets don't seem to return
all the distinct values in the index, only the highest-count ones.

Is there another query I can try?  Or, can I adjust the facets
somehow to make this work?


http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3
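For the distinct-values question, the key parameter on that page is facet.limit; setting it to -1 removes the cap on returned values (the field name below is illustrative):

```
q=*:*&rows=0&facet=true&facet.field=category&facet.limit=-1
```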


cheers,
-Mike


Re: Date facetting and ranges overlapping

2007-09-27 Thread Guillaume Smet
On 9/27/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> The simple workarround: if you know all of your data is indexed with
> perfect 0.000second precision, then put "-1MILLI" at the end of your start
> and end date faceting params.

It fixed my problem. Thanks.

--
Guillaume


RE: What is facet?

2007-09-27 Thread Teruhiko Kurosaka
Thank you Ezra and Chris for explaining this,
and I like your idea, Erik.  This will make the intro to Solr
easier for newcomers, and make Solr more popular.

-Kuro 


> That example is definitely in the cool category.  I couldn't resist
> creating a SolrTerminology wiki page linking to your post and
> breaking out the definitions we Solr folks want to embrace.  I think
> it's a good idea to have some common language definitions we agree
> upon here.
> 
>   Erik


RE: Selecting Distinct values?

2007-09-27 Thread David Whalen
  Silly me.  Thanks!

  

> -Original Message-
> From: Mike Klaas [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 27, 2007 4:46 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Selecting Distinct values?
> 
> On 27-Sep-07, at 12:01 PM, David Whalen wrote:
> 
> > Hi there.
> >
> > Is there a query I can use to select distinct values in an index?
> > I thought I could use a facet, but the facets don't seem to 
> return all 
> > the distinct values in the index, only the highest-count ones.
> >
> > Is there another query I can try?  Or, can I adjust the 
> facets somehow 
> > to make this work?
> 
> http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3
> 
> cheers,
> -Mike
> 
> 


maxBufferedDocs vs autoCommit->maxDocs

2007-09-27 Thread Bouis, Laurent
 

Hi,

 

What is the difference between the <maxBufferedDocs>1000</maxBufferedDocs>
and the <autoCommit><maxDocs>1000</maxDocs>... parameters in
solrconfig.xml?

Do they influence the frequency of flush to disk and document
distribution in segments in a different way?

 

When I did some tests with identical low values, I saw similar behavior
in terms of the frequency of index data flushed to disk (although the
number of segments sometimes looked a bit different).

 

 

Thanks.

 

 

Laurent



Re: maxBufferedDocs vs autoCommit->maxDocs

2007-09-27 Thread Mike Klaas

On 27-Sep-07, at 3:35 PM, Bouis, Laurent wrote:


What is the difference between the <maxBufferedDocs>1000</maxBufferedDocs>
and the <autoCommit><maxDocs>1000</maxDocs>... parameters in
solrconfig.xml?

Do they influence the frequency of flush to disk and document
distribution in segments in a different way?


maxBufferedDocs affects disk flushing behaviour, but it is a purely
lucene-level thing that might affect performance and memory usage,
but not semantics.

autoCommit is a Solr-level option that affects when added documents
become visible to searches (very important to search semantics).  A
side-effect of a commit is a lucene-level flush, so if
maxBufferedDocs > autoCommit's maxDocs, maxBufferedDocs does not have
any effect.
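A solrconfig.xml sketch of the two knobs side by side (values illustrative):

```
<!-- lucene-level buffer: flushing/performance only -->
<indexDefaults>
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexDefaults>

<!-- solr-level: commit (and thus make docs searchable) every 1000 adds -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>
  </autoCommit>
</updateHandler>
```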


cheers,
-Mike


one query or multiple queries

2007-09-27 Thread Xuesong Luo
Hi, 
I have a user index (each user has a unique index record) and need to get
information for 10 users. Should I run 10 queries or 1 query with
multiple user ids? Is there any performance difference?

Thanks
Xuesong
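One way to phrase the single-request option (assuming the user ids live in the uniqueKey field, here called id; the values are illustrative):

```
q=id:(101 OR 102 OR 103 OR 104 OR 105 OR 106 OR 107 OR 108 OR 109 OR 110)&rows=10
```

A single boolean OR query like this avoids ten HTTP round trips, which usually dominates the cost of ten small lookups.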



Re: anyone can send me jetty-plus

2007-09-27 Thread Matt Kangas
If you're using Jetty 6, there's no need for a separate "Jetty Plus"  
download. The "plus" jarfiles come in the standard distribution.


--matt

On Sep 27, 2007, at 12:10 AM, James liu wrote:

i can't download it from http://jetty.mortbay.org/jetty5/plus/index.html


--
regards
jl


--
Matt Kangas / [EMAIL PROTECTED]