Hi All,

I am new to Lucene / SOLR and am developing a POC as part of my research. My requirement and problem statement are below. I need help on how to index the data so that my POC has good search functionality.
------------------------------------------------------------------
Requirement:
------------------------------------------------------------------
Assume my web application is an online book store that sells books in all categories, such as Computers, Social Studies, Physical Sciences, etc. Each of these categories has sub-categories. For example, Computers has sub-categories such as Software Engineering, Java, SQL Server, etc.

I have a database table called Categories that contains both parent category descriptions and child category descriptions. The structure of the Category table is:

Category_ID_Primay_Key    integer
Parent_Category_ID        integer
Category_Name             varchar(100)
Category_Description      varchar(1000)

------------------------------------------------------------------
My Search UI:
------------------------------------------------------------------
My search page is very simple: a text field with a "Search" button.

------------------------------------------------------------------
User Action:
------------------------------------------------------------------
The user enters the search text below in the text field and clicks the "Search" button:

"Books on Data Center"

------------------------------------------------------------------
My expected behavior:
------------------------------------------------------------------
Since the term "Data Center" is most relevant to Computers, I should show books related to computers.

------------------------------------------------------------------
My problem statement and question to you all:
------------------------------------------------------------------
To provide better search in my web application, what kind of strategy should I follow, and how should I index the data in SOLR/Lucene accordingly? My Lucene index may or may not contain the phrase "data center".
Even so, I should still be able to return relevant results for "data center".

One thought I have is to modify the Category table by adding one more column to it:

Category_ID_Primay_Key           integer
Parent_Category_ID               integer
Category_Name                    varchar(100)
Category_Description             varchar(1000)
Category_Description_Keywords    varchar(8000)

Then take each word in Category_Description, find its synonyms, and store them in the Category_Description_Keywords column. After that, index the Category table records in SOLR/Lucene.

My questions to you all:

Question 1: I would like your feedback on the above approach, or on any other approach that would improve my search so that it returns the most relevant results to the user.

Question 2: Can you suggest good Java-based synonym engines, either open source or commercial? I am looking for an engine that returns as many synonyms of a word as possible.

Thanks in advance,
Kishore Veleti A.V.K.
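
P.S. To make the keyword-column idea concrete, here is a rough sketch in plain Java of how Category_Description_Keywords might be populated before indexing. It is only an illustration under assumptions: the class name KeywordExpander, the method expand, and the hard-coded synonym map are all hypothetical placeholders, and the map stands in for whatever synonym engine ends up being used. It makes no Lucene or SOLR calls.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not part of Lucene or SOLR.
public class KeywordExpander {

    /**
     * Builds a value for the proposed Category_Description_Keywords column:
     * every word of the description, followed by any synonyms found for it.
     * The synonym map is a stand-in for a real synonym engine.
     */
    public static String expand(String description, Map<String, List<String>> synonyms) {
        // LinkedHashSet keeps first-seen order and drops duplicate keywords.
        Set<String> keywords = new LinkedHashSet<>();
        // Lowercase, then split on anything that is not a letter or digit.
        for (String token : description.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            keywords.add(token);
            keywords.addAll(synonyms.getOrDefault(token, List.of()));
        }
        return String.join(" ", keywords);
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, List<String>> synonyms = Map.of(
            "networking", List.of("network", "data center"),
            "hardware", List.of("servers"));
        System.out.println(expand("Computer hardware and networking", synonyms));
    }
}
```

The resulting string would be stored in Category_Description_Keywords and indexed along with the other columns. One limitation of this sketch: it looks up synonyms one word at a time, so a multi-word phrase in the description (such as "data center") would not be matched as a single unit, although multi-word synonyms can appear in the output.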