Re: numFound is changing when query across distributed-seach with the same query.

2010-01-01 Thread johnson hong



Yonik Seeley-2 wrote:
> 
> On Thu, Dec 31, 2009 at 2:29 AM, johnson hong wrote:
>>
>> Hi all,
>>    I found a problem with distributed search.
>>    When I query across shards with "?q=keyword&start=0&rows=20", it
>>    returns numFound="181". When I then change the start param from 0
>>    to 100, it returns numFound="131".
> 
> You probably have duplicates (docs on different shards with the same id).
> Deeper paging will detect more of them.
> It does raise the question of if we should be changing numFound, or
> indicating a separate duplicate count.  Duplicates aren't eliminated
> from things like faceting or statistics, so it might be nice to have a
> number that was consistent with those numbers.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
Thank you Yonik, and Happy New Year to all.
I will check the index soon after the festival.
-- 
View this message in context: 
http://old.nabble.com/numFound-is-changing-when-query-across-distributed-seach-with-the-same-query.-tp26976128p26984236.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: numFound is changing when query across distributed-seach with the same query.

2010-01-01 Thread Yonik Seeley
On Thu, Dec 31, 2009 at 10:26 PM, Chris Hostetter wrote:
> why do we bother detecting/removing the duplicates?
>
> strictly speaking, docs with duplicate IDs on multiple shards is a "garbage
> in" situation. I can understand Solr taking a little extra effort to
> not fail hard if this situation is encountered, but why update the
> numFound at all, or remove the duplicates from the list? ... why not leave
> them in as is?  (then numFound would never change)

Distrib search keys some things off of the unique id, so when we
encountered duplicates in the past it failed hard.  IIRC only keeping
one doc with the same id was actually the easiest way to not fail
hard.

-Yonik
http://www.lucidimagination.com
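
To illustrate the behaviour discussed in this thread, here is a minimal sketch of a merge step that sums the per-shard hit counts and then subtracts one for every duplicate id it meets among the rows it actually fetches. It is a simplified model, not Solr's code: the class and method names are made up for illustration, and it assumes each shard returns its top start+rows ids for a request.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a distributed merge; not Solr's implementation.
final class ShardMergeSketch {

    // Each inner list holds one shard's matching ids in rank order.
    static int mergedNumFound(List<List<String>> shards, int start, int rows) {
        int numFound = 0;
        Set<String> seen = new HashSet<>();
        for (List<String> shard : shards) {
            numFound += shard.size();                        // each shard reports its own hit count
            int window = Math.min(start + rows, shard.size());
            for (String id : shard.subList(0, window)) {     // only the fetched rows are inspected
                if (!seen.add(id)) {
                    numFound--;                              // duplicate id found in the fetched window
                }
            }
        }
        return numFound;
    }

    public static void main(String[] args) {
        List<List<String>> shards = List.of(
                List.of("d1", "d2", "d3", "x1", "x2"),
                List.of("d4", "d5", "d6", "x1", "x2"));
        System.out.println(mergedNumFound(shards, 0, 3));    // shallow page: duplicates not yet seen
        System.out.println(mergedNumFound(shards, 2, 3));    // deeper page: x1 and x2 are detected
    }
}
```

In this model the same query reports a smaller total on the deeper page simply because more rows were fetched and compared, which matches the explanation given above.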


Has anyone got Carrot2 working with Solr without using ant?

2010-01-01 Thread Alex Muir
Hi,

I'm about to start using Ant to get Carrot2 working with Solr. However,
I first tried to get it working without Ant by placing jars into
a lib directory in the quickstart example directory, but I couldn't
find any documentation to guide me in this.

If anyone can suggest how to accomplish this, I would be happy to hear about it.

Happy New Year!
Regards

-- 

Alex
https://sites.google.com/a/utg.edu.gm/alex


Re: Help with creating a solr schema

2010-01-01 Thread Israel Ekpo
On Thu, Dec 31, 2009 at 10:26 AM, JaredM  wrote:

>
> Hi,
>
> I'm new to Solr but so far I think it's great.  I've spent 2 weeks reading
> through the wiki and mailing list info.
>
> I have a use case and I'm not sure what the best way is to implement it.  I
> am keeping track of people's calendar schedules in a really simple way: each
> user can log in and input a number of date ranges where they are available
> (so for example, user Alice might be available 1-Jan-2010 to 15-Jan-2010,
> 20-Feb-2010 to 22-Feb-2010 and 1-Mar-2010 to 5-Mar-2010).
>
> In my data model I have this modelled as a one-to-many with a User table
> (consisting of username, some metadata) and an Availability table
> (consisting of start date and end date).
>
> Now I need to search which users are available between a given date range.
> The bit I'm having trouble with is how to store multiple start & end date
> pairs.  Can someone provide some guidance?
>
>

I have done something similar to this before.

You will have to store the username, firstname and lastname as single-valued
fields.

However, the start and end dates should be multivalued tint types.

I decided to store the dates as UNIX timestamps. Each start date is stored
as the UNIX timestamp at midnight at the start of that date (00:00:00).

Each end date is stored as the UNIX timestamp at 11:59:59 PM on the end
date (23:59:59).

This (storing the dates as Trie integers) gave me faster range query
results.

When searching, you will also have to convert the dates to UNIX timestamps
using similar logic before using them in the Solr search query.
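
As a rough sketch of that conversion, the snippet below turns a calendar date into the two timestamps described above and builds a pair of range clauses from them. The field names start_date and end_date, the use of UTC, and the shape of the query are assumptions made for illustration; none of them come from the original post.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Sketch only: field names are placeholders and UTC is an assumed time zone.
final class AvailabilityTimestamps {

    // Seconds since the epoch at 00:00:00 on the given date.
    static long startOfDay(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    // Seconds since the epoch at 23:59:59 on the given date.
    static long endOfDay(LocalDate date) {
        return date.atTime(23, 59, 59).toEpochSecond(ZoneOffset.UTC);
    }

    // Range clauses for "a start on or before 'from' and an end on or after 'to'".
    static String availabilityQuery(LocalDate from, LocalDate to) {
        return "start_date:[* TO " + startOfDay(from) + "] AND end_date:["
                + endOfDay(to) + " TO *]";
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.of(2010, 1, 12);
        LocalDate to = LocalDate.of(2010, 1, 26);
        System.out.println(availabilityQuery(from, to));
    }
}
```

Timestamps for dates in 2010 fit in a 32-bit integer field; the methods return long only because that is what java.time hands back.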

You should use the username of the user as the uniqueKey.

If a user has multiple dates of availability you will enter it like so:

username:    exampleun
firstname:   examplefn
lastname:    exampleln
start dates: 137865661, 137865662, 137865663
end dates:   137865681, 137865682, 137865683


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Has anyone got Carrot2 working with Solr without using ant?

2010-01-01 Thread Grant Ingersoll
You should just be able to copy those files down to the same location, as this 
is all Ant is doing.

On Jan 1, 2010, at 2:11 PM, Alex Muir wrote:

> Hi,
> 
> I'm about to start using Ant to get Carrot2 working with solr however
> I was first trying to get it working without Ant by placing jars into
> a lib directory in the quickstart example directory however I couldn't
> find any documentation to guide me in this.
> 
> If anyone can suggest how to accomplish this I would be happy to hear about 
> it.
> 
> Happy New Year!
> Regards
> 
> -- 
> 
> Alex
> https://sites.google.com/a/utg.edu.gm/alex



Re: Requesting feedback on solr-spatial plugin

2010-01-01 Thread Grant Ingersoll

On Dec 30, 2009, at 9:54 PM, Mat Brown wrote:

> Hi Grant,
> 
> Thanks for the info and your point is well taken. I should have been
> clearer that I have no intention of this project being a long-term
> solution for spatial search in Solr - rather I was looking to build a
> rough and ready solution that gives some basic spatial search
> capabilities to tide us over until the real deal is available in Solr
> 1.5. That being said, I'd love to be of use in the official spatial
> efforts, so I'll be sure to take a look at the related tickets and see
> if there is anywhere I can help out.

FWIW, most of the functionality for spatial is now already committed on trunk.

-Grant

solr 1.4 csv import -- Document missing required field: id

2010-01-01 Thread evana

Hi,

I am trying to import a CSV file (without an "id" field) into Solr 1.4.
In schema.xml the "id" field is set with required="false",
but I am getting "org.apache.solr.common.SolrException: Document missing
required field: id".

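Following is the schema.xml fields section
 [field definitions stripped by the archive; the id field is declared with required="false"]
 <uniqueKey>id</uniqueKey>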

Following is the csv file
company_id,customer_name,active
58,Apache,Y
58,Solr,Y
58,Lucene,Y
60,IBM,Y

Following is the solrj import client
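import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

// Post the CSV file to the CSV update handler and commit.
SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
req.addFile(new File(filename));
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> result = server.request(req);
System.out.println("Result: " + result);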


Could any of you help out please.

Thanks



Re: Help with creating a solr schema

2010-01-01 Thread JaredM

Thanks Ahmet and Israel.  I prefer Israel's approach since the amount of
metadata for the user is quite high, but I'm not clear how to get around one
problem.

Suppose I had 2 availabilities (I've left them in human-readable form instead
of as UNIX timestamps only for ease of understanding):

10-Jan-2010 to 20-Jan-2010
25-Jan-2010 to 28-Jan-2010

If I wanted to query for availability between 12-Jan-2010 and 26-Jan-2010,
wouldn't the above document be returned (even though the user would not be
available 20-25 Jan)?
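
The concern can be reproduced outside Solr. The sketch below models a document whose start and end dates sit in two independent multivalued fields and checks a query the way two separate range clauses would: any start value may satisfy the first clause and any end value the second. The class and method names are hypothetical, and this is only a model of the matching logic, not Solr itself.

```java
import java.time.LocalDate;
import java.util.List;

// Illustration only: models two independent multivalued fields, not Solr itself.
final class MultiValuedFalseMatch {

    // True if ANY start is on or before 'from' and ANY end is on or after 'to'.
    static boolean matches(List<LocalDate> starts, List<LocalDate> ends,
                           LocalDate from, LocalDate to) {
        boolean startOk = starts.stream().anyMatch(s -> !s.isAfter(from));
        boolean endOk = ends.stream().anyMatch(e -> !e.isBefore(to));
        return startOk && endOk;
    }

    public static void main(String[] args) {
        List<LocalDate> starts = List.of(LocalDate.of(2010, 1, 10), LocalDate.of(2010, 1, 25));
        List<LocalDate> ends = List.of(LocalDate.of(2010, 1, 20), LocalDate.of(2010, 1, 28));
        // The 10-Jan start and the 28-Jan end come from different periods,
        // yet together they satisfy both clauses.
        System.out.println(matches(starts, ends,
                LocalDate.of(2010, 1, 12), LocalDate.of(2010, 1, 26)));
    }
}
```

Because the pairing between a start and its end is lost once the values are split across two fields, the document matches even though no single period covers the requested range.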



Re: solr 1.4 csv import -- Document missing required field: id

2010-01-01 Thread Israel Ekpo
On Fri, Jan 1, 2010 at 9:13 PM, evana  wrote:

>
> Hi,
>
> I am trying to import a csv file (without "id" field) on solr 1.4
> In schema.xml "id" field is set with required="false".
> But I am getting "org.apache.solr.common.SolrException: Document missing
> required field: id"
>
> Following is the schema.xml fields section
>  [field definitions stripped by the archive; surviving fragments show
>  required="false" on one field and multiValued="true" on another]
>  <uniqueKey>id</uniqueKey>
>
> Following is the csv file
> company_id,customer_name,active
> 58,Apache,Y
> 58,Solr,Y
> 58,Lucene,Y
> 60,IBM,Y
>
> Following is the solrj import client
> SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
> ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
> req.addFile(new File(filename));
> req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> NamedList result = server.request(req);
> System.out.println("Result: " + result);
>
>
> Could any of you help out please.
>
> Thanks
>
>
The presence of the uniqueKey definition implies that the id field is a
required field in the document, even though the required attribute is set to
false on the field definition.

Try removing the uniqueKey definition for the id field in the schema.xml
file and then run the update script or application again.

The uniqueKey definition is not needed if you are going to build the index
from scratch each time you do the import.

However, if you are doing incremental updates, this field is required and
the uniqueKey definition is also needed to specify what the "primary
key" for the document is.

http://wiki.apache.org/solr/UniqueKey
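
If the uniqueKey has to stay, every row needs an id value before it reaches Solr. One possible approach, which is not something proposed in this thread, is to add an id column to the CSV before uploading it. The sketch below derives the id from the first two columns; that only works if company_id plus customer_name is unique per row, which is an assumption about the data. The class name is hypothetical and the parsing is deliberately naive (no quoted fields or embedded commas).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing step; assumes simple CSV with no quoted commas.
final class CsvIdColumn {

    // Prepends an "id" column built from the first two columns of each data row.
    static List<String> withIds(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i == 0) {
                out.add("id," + line);                       // header row gets the new column name
            } else {
                String[] cols = line.split(",");
                out.add(cols[0] + "-" + cols[1] + "," + line); // e.g. 58-Apache,58,Apache,Y
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "company_id,customer_name,active",
                "58,Apache,Y",
                "60,IBM,Y");
        withIds(csv).forEach(System.out::println);
    }
}
```

A derived id like this stays the same across imports, which is what incremental updates rely on; a running row number would not.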




Re: Help with creating a solr schema

2010-01-01 Thread Israel Ekpo
On Fri, Jan 1, 2010 at 9:47 PM, JaredM  wrote:

>
> Thanks Ahmet and Israel.  I prefer Israel's approach since the amount of
> metadata for the user is quite high but I'm not clear how to get around one
> problem:
>
> If I had 2 availabilities (I've left it in human-readable form instead of
> as
> a UNIX timestamp only for ease of understanding):
>
> 10-Jan-2010
> 20-Jan-2010
> 25-Jan-2010
> 28-Jan-2010
>
> and I wanted to query for availability between 12-Jan-2010 and 26-Jan-2010,
> wouldn't the above document be returned (even though the user
> would not be available 20-25 Jan)?
>
>

Unfortunately, for this particular use case, if you are using only the
out-of-the-box features available in Solr 1.4 (that is, without a custom Solr
plugin using a custom Lucene filter and some special value storage arrangement
for the fields), you will have to store each start and end date pair as a
separate document. So there will be N separate documents for each username if
that user has N distinct periods of availability. The start date and end date
fields would also have to be single valued instead of multi-valued as I
specified in the earlier post.

Sorry.
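
A minimal sketch of the one-document-per-period layout follows. Each record stands in for one Solr document holding a username and a single start/end pair, so a query can only be satisfied by a start and an end that belong together. The class, field and method names are hypothetical, and the matching is done in plain Java purely to show the logic.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the one-document-per-period layout; all names are hypothetical.
final class PerPeriodDocuments {

    // One "document": a username plus a single start/end pair.
    static final class Period {
        final String username;
        final LocalDate start;
        final LocalDate end;

        Period(String username, LocalDate start, LocalDate end) {
            this.username = username;
            this.start = start;
            this.end = end;
        }
    }

    // Usernames with at least one period that covers the whole of [from, to].
    static List<String> availableUsers(List<Period> docs, LocalDate from, LocalDate to) {
        return docs.stream()
                .filter(p -> !p.start.isAfter(from) && !p.end.isBefore(to))
                .map(p -> p.username)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Period> docs = List.of(
                new Period("alice", LocalDate.of(2010, 1, 10), LocalDate.of(2010, 1, 20)),
                new Period("alice", LocalDate.of(2010, 1, 25), LocalDate.of(2010, 1, 28)));
        // 12-26 Jan spans the gap between the two periods, so nobody matches.
        System.out.println(availableUsers(docs, LocalDate.of(2010, 1, 12), LocalDate.of(2010, 1, 26)));
        // 12-18 Jan falls inside the first period.
        System.out.println(availableUsers(docs, LocalDate.of(2010, 1, 12), LocalDate.of(2010, 1, 18)));
    }
}
```

The cost of this layout is that user metadata is repeated in every period document and results have to be collapsed back to distinct usernames on the client side.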