Highlighting is a complex topic. A field has to be stored to be highlighted. It does not have to be indexed, but if it is not, the highlighter analyzes the stored text just as if it were being indexed in order to highlight it.
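To make that concrete, here is a rough sketch of how a highlighting request could be assembled from a client script. The `hl`, `hl.fl`, `hl.snippets` and `hl.fragsize` parameters are standard Solr request parameters. Everything else is an assumption for illustration: the base URL, the field names (`text`, `description`, `id`, `title`) and the helper function names are hypothetical and would need to match your own schema, where the highlighted fields must be declared with stored="true".

```python
from urllib.parse import urlencode


def highlight_params(query, highlight_fields, snippets=2, fragsize=150):
    """Build the request parameters for a Solr query with highlighting.

    highlight_fields are assumed to be stored fields in the schema;
    the names used in the example below are hypothetical.
    """
    return {
        "q": query,
        "fl": "id,title",                     # hypothetical fields to return
        "hl": "true",                         # switch highlighting on
        "hl.fl": ",".join(highlight_fields),  # fields to highlight
        "hl.snippets": snippets,              # max snippets per field
        "hl.fragsize": fragsize,              # approx. snippet size in chars
        "wt": "json",
    }


def build_highlight_url(base_url, query, highlight_fields):
    """Return a full select URL; base_url is an assumed Solr location."""
    params = highlight_params(query, highlight_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance and field names.
    print(build_highlight_url("http://localhost:8983/solr",
                              "text:lucene",
                              ["text", "description"]))
```

With files as large as the ones described in this thread, it is probably worth keeping `hl.snippets` and `hl.fragsize` small, since the stored text of each matching document has to be analyzed to produce the snippets.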
http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighting
http://www.lucidimagination.com/blog/2009/02/17/highlighting-highlighter-thoughts/

On Sun, Apr 18, 2010 at 10:12 AM, Serdar Sahin <anlamar...@gmail.com> wrote:
> Thanks everyone, It works! I have successfully indexed them. Thanks again!
>
> I have couple of more questions regarding with solr, if you don't mind.
>
> 1-) As I said before, the text files are quite large, between
> 100kb-10mb, but I need to store them as well for highlighting,
> including with their title, description, tags (I concat tags while
> fetching from the db, and treat them as one row). For search result on
> the page, I have to get;
>
> username (string)
> lang (string)
> cat (string)
> view_count (int)
> imgid (int)
> thumbs_up (int)
> thumbs_down (int)
>
> these columns as well. These columns are not used for indexing, just
> for storing. Do you think it is better idea to store these columns as
> well and not query the database? Or, I can just get the ids and query
> the database myself. Which approach is better from memory usage and
> performance perspective? I was using Sphinx for full text searching on
> my production websites, so I am not used to this format as Sphinx only
> returns document IDs.
>
> 2-) I was using Sphinx for other purposes as well, like "browse"
> section on the website. http://www.youtube.com/videos. It gives better
> performance on large datasets (sorting, ordering etc). I know some
> people also use solr(lucene) for this, but I have not seen any website
> that use solr on their "browse" section without using Facets. So, even
> if I don't use Facets, is it still useful to use solr on that section?
> I will be storing a large amount of data on solr, and expect to have 1
> TB data after 6-8 months.
>
> 3-) I will be using http://wiki.apache.org/solr/MoreLikeThis option
> too. As I said the text files are large. Do you have any suggestions
> regarding with this feature?
>
> Thanks again,
>
>
> On Sun, Apr 18, 2010 at 7:53 AM, Lance Norskog <goks...@gmail.com> wrote:
>> Man you people are fast!
>>
>> There is a bug in Solr/Lucene. It keeps memory around from previous
>> fields, so giant text files might run out of memory when they should
>> not. This bug is fixed in the trunk.
>>
>> On 4/17/10, Lance Norskog <goks...@gmail.com> wrote:
>>> The DataImportHandler can let you fetch the file name from the
>>> database record, and then load the file as a field and process the
>>> text with Tika.
>>>
>>> It will not be easy :) but it is possible.
>>>
>>> http://wiki.apache.org/solr/DataImportHandler
>>>
>>> On 4/17/10, Serdar Sahin <anlamar...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I am rather new to Solr and have a question.
>>>>
>>>> We have around 200.000 txt files which are placed into the file cloud.
>>>> The file path is something similar to this:
>>>>
>>>> file/97/8f/840/fa4-1.txt
>>>> file/a6/9d/ab0/ca2-2.txt etc.
>>>>
>>>> and we also store the metadata (like title, description, tags etc)
>>>> about these files in the mysql server. So, what I want to do is to
>>>> index title, description, tags and other data from mysql, and also get
>>>> the txt file from file server, and link them as one record for
>>>> searching, but I could not figure out how to automatize this process.
>>>> I can give the path from the sql query like, Select id, title,
>>>> description, file_path, and then solr can use this path to retrieve
>>>> txt file, but I don't know whether is it possible or not.
>>>>
>>>> What is the best way to index these files with their tag title and
>>>> description without coding in Java (Perl is ok). These txt files are
>>>> large, between 100kb-10mb, so the last option is to store them in the
>>>> database.
>>>>
>>>> Thanks,
>>>>
>>>> Serdar
>>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>

--
Lance Norskog
goks...@gmail.com