Thanks Hoss. I agree that the way you restated the question is better
for getting results. BTW I think you've tipped me off to exactly what
I needed with this URL: http://bbyopen.com/
Thanks!
- Pulkit
On Fri, Sep 16, 2011 at 4:35 PM, Chris Hostetter wrote:
: Has anyone ever had to create large mock/dummy datasets for test
: environments or for POCs/Demos to convince folks that Solr was the
: wave of the future? Any tips would be greatly appreciated. I suppose
: it sounds a lot like crawling even though it started out as innocent
: DIH usage.
On Thu, 2011-09-15 at 22:54 +0200, Pulkit Singhal wrote:
> Has anyone ever had to create large mock/dummy datasets for test
> environments or for POCs/Demos to convince folks that Solr was the
> wave of the future?
Yes, but I did it badly. The problem is that real data are not random, so
any simple generator falls short. There are real datasets you can use instead:
http://aws.amazon.com/datasets
DBPedia might be the easiest to work with:
http://aws.amazon.com/datasets/2319
Amazon has a lot of these things.
Infochimps.com is a marketplace for free and paid datasets.
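The point that real data are not random shows up directly in term statistics: real text is roughly Zipf-distributed, while a naive uniform generator is flat, so relevance tuning against generated data can be misleading. A small Python sketch of the difference (purely illustrative, not tied to any dataset above):

```python
import random
from collections import Counter

random.seed(42)

# A toy vocabulary; real corpora have many thousands of terms.
vocab = [f"term{i}" for i in range(1, 101)]

# Zipf-like weights: probability of rank r proportional to 1/r.
zipf_weights = [1.0 / r for r in range(1, len(vocab) + 1)]

# Sample 10,000 tokens two ways: uniformly, and Zipf-weighted.
uniform_sample = random.choices(vocab, k=10_000)
zipf_sample = random.choices(vocab, weights=zipf_weights, k=10_000)

uniform_counts = Counter(uniform_sample)
zipf_counts = Counter(zipf_sample)

def top_to_mid_ratio(counts):
    """Frequency of the most common term divided by the 50th most common:
    near 1 for uniform data, much larger for Zipf-like (real-text-like) data."""
    ranked = [c for _, c in counts.most_common()]
    return ranked[0] / ranked[49]

print(top_to_mid_ratio(uniform_counts))  # close to 1
print(top_to_mid_ratio(zipf_counts))     # much larger
```

With flat data every term scores about the same, so ranking differences barely show up; with skewed data you see the head/tail behavior a real index has.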
Lance
On Thu, Sep 15, 2011 at 6:55 PM, Pulkit Singhal wrote:
> Ah missing } doh!
>
> BTW I still welcome any ideas on how to build an e-commerce test base.
Thanks for all the feedback thus far. Now to get a little technical about it :)
I was thinking of building a file with all the Amazon tags that each yield
roughly 5 results, and then running my RSS DIH off of that. I came up with
the following config, but something is amiss.
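For reference, a DIH setup for this kind of tag-file-plus-RSS pipeline generally looks something like the sketch below. The tag file path, the feed URL pattern, and the field names are all assumptions for illustration, not the original config:

```xml
<dataConfig>
  <!-- Two data sources: one to read the local tag file, one for the feeds. -->
  <dataSource name="file" type="FileDataSource" encoding="UTF-8" />
  <dataSource name="web"  type="URLDataSource"  encoding="UTF-8" />
  <document>
    <!-- Outer entity: LineEntityProcessor reads one tag per line from the
         file and exposes it as ${tags.rawLine}. -->
    <entity name="tags"
            processor="LineEntityProcessor"
            url="/path/to/amazon-tags.txt"
            dataSource="file"
            rootEntity="false">
      <!-- Inner entity: fetch the RSS feed for each tag and map one Solr
           document per <item>. -->
      <entity name="feed"
              processor="XPathEntityProcessor"
              dataSource="web"
              url="http://www.amazon.com/rss/tag/${tags.rawLine}/new"
              forEach="/rss/channel/item">
        <field column="title"       xpath="/rss/channel/item/title" />
        <field column="link"        xpath="/rss/channel/item/link" />
        <field column="description" xpath="/rss/channel/item/description" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

A missing `}` in this kind of config would most likely be in a `${...}` variable reference like the one in the inner entity's url.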
Ah missing } doh!
BTW I still welcome any ideas on how to build an e-commerce test base.
It doesn't have to be Amazon; that was just my approach. Anyone?
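One way to build such a test base without crawling anything is to generate synthetic products from small vocabularies; unique IDs come from a counter, so repeated names don't matter. A minimal Python sketch (all field names, brands, and categories here are invented, and the output shape is just Solr-style JSON):

```python
import itertools
import json
import random

random.seed(0)

BRANDS = ["Acme", "Globex", "Initech", "Umbrella", "Soylent"]
ADJECTIVES = ["Compact", "Deluxe", "Wireless", "Ergonomic", "Portable",
              "Rugged", "Slim", "Classic", "Premium", "Eco"]
NOUNS = ["Headphones", "Keyboard", "Blender", "Backpack", "Lamp",
         "Speaker", "Camera", "Monitor", "Kettle", "Chair"]
CATEGORIES = ["electronics", "home", "kitchen", "office", "outdoors"]

def generate_products(n):
    """Yield n dummy products with unique ids as Solr-style field dicts."""
    combos = itertools.cycle(itertools.product(BRANDS, ADJECTIVES, NOUNS))
    for i, (brand, adj, noun) in enumerate(itertools.islice(combos, n)):
        yield {
            "id": f"SKU-{i:07d}",            # counter keeps ids unique
            "name": f"{brand} {adj} {noun}",
            "brand": brand,
            "category": random.choice(CATEGORIES),
            "price": round(random.uniform(5, 500), 2),
            "in_stock": random.random() > 0.1,
        }

# A small batch as JSON, suitable for posting to Solr's update handler.
batch = list(generate_products(1000))
print(json.dumps(batch[0]))
```

Scaling this to a million documents is just `generate_products(1_000_000)`; it stays a generator, so memory use is flat until you batch it up for indexing.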
- Pulkit
On Thu, Sep 15, 2011 at 8:52 PM, Pulkit Singhal wrote:
> Thanks for all the feedback thus far. Now to get a little technical about it
If we want to test with huge amounts of data, we feed in portions of the
internet. The problem is that it takes a lot of bandwidth and computing power
to get to a "reasonable" size. On the positive side, you deal with real text,
so it's easier to tune for relevance.
I think it's easier to create a synthetic dataset.
I've done it using SolrJ and a *lot* of parallel processes feeding dummy
data into the server.
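The same parallel-feeding idea, sketched in Python with a thread pool and a stand-in client. A real run would use SolrJ or pysolr against an actual server; `FakeSolrClient` here is just a thread-safe stub so the shape of the approach is visible:

```python
import concurrent.futures
import itertools
import queue

def make_docs(n):
    """Stand-in for real product data: n trivial unique documents."""
    return [{"id": str(i), "name": f"product {i}"} for i in range(n)]

class FakeSolrClient:
    """Stub for a Solr client; queue.Queue makes it safe to call
    from many worker threads at once."""
    def __init__(self):
        self._q = queue.Queue()

    def add(self, docs):
        for d in docs:
            self._q.put(d)

    def count(self):
        return self._q.qsize()

def batched(iterable, size):
    """Yield lists of up to `size` items from the iterable."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

client = FakeSolrClient()
docs = make_docs(10_000)

# Feed batches from many workers in parallel, as the SolrJ approach does.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(client.add, batched(docs, 500)))

print(client.count())  # 10000
```

Batching matters more than thread count in practice: one add per document round-trips the network for every document, while a few hundred per batch keeps the indexer busy.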
On Thu, Sep 15, 2011 at 4:54 PM, Pulkit Singhal wrote:
> Hello Everyone,
>
> I have a goal of populating Solr with a million unique products in
> order to create a test environment for a proof of concept.
Hello Everyone,
I have a goal of populating Solr with a million unique products in
order to create a test environment for a proof of concept. I started
out by using DIH with Amazon RSS feeds but I've quickly realized that
there's no way I can glean a million products from one RSS feed. And
I'd go