Hi all, I don't know if this is the correct mailing list, so I apologize
if it isn't. I wasn't sure which other list it would go to.

Anyway, a while back (before I started) my company got Google envy and
decided to purchase a GSA system to store our searchable data. While the
GSA seems OK as a web crawler, it seems woefully inadequate for quick
searching of application (non-web) data. Unfortunately, since they have
already purchased a GSA license and support, I am trying to put together
an argument, based on something other than direct cost, for why I need to
switch our search infrastructure from GSA to Solr. I should preface this by
saying that, while I have used Lucene in various projects in the past (not
very extensively, just for basic search implementations), I have never
used Solr.

I was hoping someone could comment on the areas below where I have run
into friction with the GSA, and let me know whether and how Solr improves
on them.

1) Sorting by anything other than last-modified date or relevancy is
impossible with the GSA. I need to be able to sort results by a specific
piece of metadata.

2) When a search requests a page beyond the last page of results (e.g.,
there are only 2 pages of results but the user asks for page 3), the GSA
returns a total result count of zero, making it impossible to know whether
you have paged too far or whether there were actually zero results.

3) There is no insight into the data being fed into the GSA. When I send
data to the GSA, it lists the data feed on the "feeds" page, but it's
impossible to know which feed contained what data, and if an error occurs
(depending on the error) you have no idea which piece of data was rejected
or caused the failure. Because of this, I have had to send data to the
system only in very small chunks, so that one bad entry doesn't hold back
too many records from being updated.

4) The GSA does not allow searching for data between two dates. The most
it lets you do is define a numerical data field holding the dates (e.g.,
20120901), but the GSA only supports numerical searching up to 6
significant digits, which gives month accuracy but not day accuracy.

5) The GSA does not allow operations nested within OR statements. For
example, you cannot express (x AND y) OR (a AND b).

6) There is no way to selectively flush data in bulk. If I need to flush
all the data in a collection to re-index it, I have to deny a whole URL so
the indexer clears the data out, then re-enable that URL. Sometimes I need
to flush only the data flagged as articles, or only the data for a
specific client.

7) Setting up facet groups is a very manual process in the GSA. There is
also no easy way to use date ranges as search facets: every date range has
to be explicitly defined through the web interface and manually
maintained. I'd rather have it generate facets on a year-by-year or
month-by-month basis.
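To make point 2 concrete: what I'd expect from a search backend is a total
hit count that is reported regardless of which page was requested, so the
client can tell the two cases apart. A rough Python sketch of that
client-side logic (the function and parameter names are my own, purely
illustrative, and not any real GSA or Solr API):

```python
def classify_page(total_hits: int, page: int, page_size: int) -> str:
    """Tell 'no results at all' apart from 'paged past the end'.

    Assumes total_hits is the overall match count, reported independently
    of the requested page, and that pages are numbered from 1.
    """
    if total_hits == 0:
        return "no results"
    # Ceiling division: number of pages needed to hold all hits.
    last_page = (total_hits + page_size - 1) // page_size
    if page > last_page:
        return f"paged too far (last page is {last_page})"
    return "ok"
```

With 15 hits and 10 per page, asking for page 3 yields "paged too far
(last page is 2)" instead of the misleading zero count the GSA gives.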
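To illustrate the precision problem in point 4, here is a quick Python
sketch. It assumes dates are stored as YYYYMMDD integers and models the
6-significant-digit limit as simple truncation; that model is my own
simplification, not documented GSA behavior:

```python
def truncate_to_sig_digits(yyyymmdd: int, sig_digits: int = 6) -> int:
    """Keep only the leading sig_digits digits of a YYYYMMDD integer.

    Hypothetical model of the GSA limit: the day digits are simply lost.
    """
    return int(str(yyyymmdd)[:sig_digits])


# A range like 2012-09-05 .. 2012-09-20 collapses to a single month value.
start = truncate_to_sig_digits(20120905)
end = truncate_to_sig_digits(20120920)
print(start, end)  # both are 201209, so the search matches all of September
```

Any day-level range inside one month becomes indistinguishable from the
whole month, which is why I only get month accuracy.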
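For point 7, what I'm after is date facets whose buckets fall out of the
data itself rather than being hand-maintained in a web interface. A
minimal client-side Python sketch of year-by-year and month-by-month
bucketing (the function name and bucket label formats are my own
illustrative choices):

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable

# Bucket label format per facet granularity.
_FORMATS = {"year": "%Y", "month": "%Y-%m"}


def date_facets(dates: Iterable[date], by: str = "year") -> Dict[str, int]:
    """Count documents per year or per month, sorted by bucket label."""
    try:
        fmt = _FORMATS[by]
    except KeyError:
        raise ValueError("by must be 'year' or 'month'") from None
    counts = Counter(d.strftime(fmt) for d in dates)
    return dict(sorted(counts.items()))
```

For example, dates in May 2011, January 2012 and September 2012 give
{"2011": 1, "2012": 2} by year, with no ranges defined up front.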

Those are the main pain points. There are others, such as community
support (which, between the mailing list and Stack Overflow, I'm not
worried about), but if anyone can give me a quick rundown on whether Solr
addresses any of these issues, I would be immensely thankful.
