Hi everyone,

We are looking for someone to help us build a similarity engine. Here are some preliminary specs for the project.

1) We want to show similar posts when a user posts a new block of text. A good example of this is StackOverflow: when a user starts to ask a new question, the system displays similar existing questions.
2) This is for a messaging system, so indexing and analysis should preferably happen at the time of posting, not later.
3) Posts will be less than 1000 characters long.
4) We anticipate having millions of posts, so the solution should consider techniques for sharding the indexes across many machines.
5) The solution can be delivered as a standalone Java SE application that runs from the command line. No web development is necessary.
6) We expect clean APIs.

Thanks,
Drew
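
To make items 1, 2, 4, and 6 concrete, the kind of API being asked for might look roughly like the sketch below. Everything in it is an assumption for illustration, not part of the spec: the class name `SimilarityEngine`, the methods `index` and `findSimilar`, the use of Jaccard similarity over word sets, and the in-process shards that only stand in for indexes spread across machines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: an in-memory similarity index, sharded by post id. */
final class SimilarityEngine {

    /** One shard: an inverted index from token to the ids of posts containing it. */
    private static final class Shard {
        final Map<String, Set<Long>> postings = new HashMap<>();
        final Map<Long, Set<String>> tokensById = new HashMap<>();
    }

    private final List<Shard> shards = new ArrayList<>();

    SimilarityEngine(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new Shard());
        }
    }

    /** Lowercases the text and splits it on runs of non-alphanumeric characters. */
    static Set<String> tokenize(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Set<String> tokens = new HashSet<>(Arrays.asList(parts));
        tokens.remove(""); // split can yield an empty leading token
        return tokens;
    }

    /** Indexes a post immediately, i.e. at posting time (item 2). */
    synchronized void index(long postId, String text) {
        // Assumption: shard choice is a simple hash of the post id (item 4).
        Shard shard = shards.get(Math.floorMod(Long.hashCode(postId), shards.size()));
        Set<String> tokens = tokenize(text);
        shard.tokensById.put(postId, tokens);
        for (String token : tokens) {
            shard.postings.computeIfAbsent(token, k -> new HashSet<>()).add(postId);
        }
    }

    /** Returns up to {@code limit} post ids, most similar first (item 1). */
    synchronized List<Long> findSimilar(String text, int limit) {
        Set<String> query = tokenize(text);
        Map<Long, Double> scores = new HashMap<>();
        // Query every shard and merge the results; a distributed version
        // would fan this loop out over the network instead.
        for (Shard shard : shards) {
            Set<Long> candidates = new HashSet<>();
            for (String token : query) {
                candidates.addAll(shard.postings.getOrDefault(token, Collections.emptySet()));
            }
            for (Long id : candidates) {
                Set<String> doc = shard.tokensById.get(id);
                Set<String> common = new HashSet<>(query);
                common.retainAll(doc);
                int union = query.size() + doc.size() - common.size();
                scores.put(id, (double) common.size() / union); // Jaccard similarity
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by ascending post id.
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        int end = Math.max(0, Math.min(limit, ranked.size()));
        return new ArrayList<>(ranked.subList(0, end));
    }
}
```

A caller would index each post as it arrives with `engine.index(id, text)` and then call `engine.findSimilar(draftText, 10)` while the user is typing a new post. Jaccard over word sets is only a placeholder scoring function; a real build would likely need stemming, stop-word handling, and term weighting, and the shard lookup would become a remote call.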