Re: using solr as master for data storage/retrieval?

Norberto Meijome Thu, 08 May 2008 17:59:35 -0700

On Thu, 8 May 2008 09:24:45 -0400 (EDT)
"Phillip Rhodes" <[EMAIL PROTECTED]> wrote:


> 
> B,
> My thoughts are coming from experience while writing and using stitches.  
> Stitches is a java-based project that allows local and remote java clients 
> (using hessian for java, xfire for dotnet) to search, store and retrieve 
> images and image metadata.  We are using it to store 10 GB of images and 
> the search is wicked fast.

Interesting, have you got a URL where I can check it out?

>  We use it to allow users to associate images with galleries, events, etc.  
> It's using compass/lucene right now.  The API is a lot like amazon S3, but 
> this is just a coincidence of solving the same problem.  We are using it in 
> dotnet and java.
> 
> I was thinking that one of the benefits of solr is that of replication.  
> Currently, there is only one production instance of stitches, and we are 
> using it to power image serving, image thumbnails for 5 major sites.   If 
> stitches goes down, these sites would not have images, and I would be in 
> trouble.

OK, but Solr isn't the only thing out there that provides replication. Are you
planning to store the original images in Solr, or only the thumbnails (I
imagine to make a self-contained search-results-with-thumbnails setup)? Don't
forget this will greatly increase the amount of data stored, replicated and
returned by each query. I'm not sure whether it would also increase the size of
the caches (do Solr caches store only references to documents, or the whole
documents?). Also, every image extracted from the stream and served by your
Solr container or front end is, arguably, one fewer query-related operation it
can perform.

> By using solr, I was hoping I could get more scalability by leveraging the 
> rsync/replication so my search index and its data (image binary files) would 
> be clustered across multiple machines. 

Probably. But I think you may be mixing core functionalities / purposes. I
would keep Solr (clustered or not) for its own purpose, and keep image storage
and serving in a system of their own.
I don't see the advantage of using Solr for this compared to, say, a clustered
storage system (GlusterFS / MogileFS / HadoopFS / zfs-copy) with a web server
interface, and the results from Solr pointing to those URLs.
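
To make that split concrete, here is a rough sketch of what I have in mind. It
is only an illustration: the field names (thumb_url, gallery), the host
images.example.com and the URL layout are all made up, and the only thing it
assumes about Solr is the usual JSON response shape (wt=json), where hits are
listed under response.docs. The index holds searchable metadata plus a pointer,
and the image bytes stay in the storage cluster.

```python
# Hypothetical sketch: field names, host name and URL layout are assumptions,
# not part of stitches or of any Solr schema.
from urllib.parse import quote

# Assumed web front end sitting in front of the clustered image store.
IMAGE_BASE_URL = "http://images.example.com"


def image_url(image_id, variant="full"):
    """Build the URL under which the external store serves an image."""
    return f"{IMAGE_BASE_URL}/{variant}/{quote(str(image_id), safe='')}"


def to_index_doc(image_id, title, gallery):
    """Document to send to Solr: metadata and a pointer, never the bytes."""
    return {
        "id": image_id,
        "title": title,
        "gallery": gallery,
        "thumb_url": image_url(image_id, "thumb"),
    }


def thumbnails_from_response(solr_json):
    """Collect thumbnail URLs from a Solr JSON (wt=json) search response."""
    docs = solr_json.get("response", {}).get("docs", [])
    return [d["thumb_url"] for d in docs if "thumb_url" in d]
```

The page that renders search results would then emit those URLs as plain image
links, so the browser fetches the thumbnails from the storage front end and
Solr only ever answers queries.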

But, by all means give it a try and let us know how it works out.

Cheers,
B
_________________________
{Beto|Norberto|Numard} Meijome

"Time exists so everything doesn't happen at once"
   Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
