Hi all, I don't know if this is the correct mailing list, so I apologize if it isn't. I wasn't sure what other list it would go to.
Anyway, a while back (before I started) my company got Google envy and decided to purchase a GSA to store our searchable data. While the GSA seems OK as a web crawler, it seems woefully inadequate for quick searching of application/non-web data. Unfortunately, since they have already purchased a GSA license and support, I am trying to put together a non-direct-cost argument for why I need to switch our search infrastructure from the GSA to Solr.

To preface this: while I have used Lucene in various projects in the past (not too extensively, just for basic search implementations), I have never used Solr. I was hoping someone could comment on the areas below where I have encountered friction with the GSA and let me know if / how Solr is an improvement.

1) Sorting by anything other than last modified date or relevancy is impossible with the GSA. I need to be able to sort results based on a specific piece of metadata.

2) When performing a search outside of the page bounds (e.g. there are only 2 pages of results but the user asks for page 3), the GSA returns a total results count of zero, making it impossible to know whether you have paged too far or there were actually zero results.

3) No insight into data being fed into the GSA. When I send data to the GSA it lists the data feed on the "feeds" page, but it's impossible to know which feed contained what data, and if an error occurs (depending on the error) you have no idea which piece of data was rejected or caused the failure. Because of this I have had to cut down and only send data to the system in very small chunks, just so one bad entry doesn't hold back too many records from being updated.

4) The GSA does not allow searching for data between two dates. The most it lets you do is define a numerical data field with the dates (e.g. 20120901), but the GSA only supports numerical searching up to 6 significant digits, which means it gives month accuracy but not day accuracy.

5) The GSA does not allow operations nested within OR statements. For example, you cannot do (x and y) or (a and b).

6) No way to selectively flush mass data. If I need to flush all the data in a collection to re-index it, I have to deny a whole URL so the indexer clears the data out, then re-enable that URL. Sometimes I need to flush only data flagged as articles, or only data for a specific client.

7) Setting up facet groups is a very manual process in the GSA. There is also no easy way to have date ranges as search facets: date ranges all have to be explicitly defined through the web interface and manually maintained. I'd rather have it give me facets on a year-by-year or month-by-month basis.

Those are the main pain points. There are others, such as community support (which, between the mailing list and Stack Overflow, I'm not worried about), but if anyone can give me a quick rundown on whether Solr addresses any of these issues I would be immensely thankful.
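
P.S. To make point 4 concrete, here is a small Python sketch of why 6 significant digits only gets me month accuracy on a yyyymmdd number. It assumes the GSA effectively keeps the six leading digits and drops the rest, which is my guess at the behavior rather than anything I have seen documented; the function name is made up purely for illustration.

```python
def truncate_to_significant_digits(value: int, digits: int = 6) -> int:
    """Keep the leading `digits` digits of a positive integer and zero the rest.

    This models the assumed GSA limitation; it is not GSA code.
    """
    length = len(str(value))
    if length <= digits:
        return value
    scale = 10 ** (length - digits)
    return (value // scale) * scale


# Dates encoded as yyyymmdd integers, as in the 20120901 example.
sept_1 = truncate_to_significant_digits(20120901)
sept_30 = truncate_to_significant_digits(20120930)
oct_1 = truncate_to_significant_digits(20121001)

# Both September dates collapse to the same value, so a range query
# cannot tell them apart; only the month boundary survives.
print(sept_1, sept_30, oct_1)  # 20120900 20120900 20121000
```

So under that assumption a query for "between 20120905 and 20120920" is indistinguishable from a query for the whole of September.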