SOLR indexing : Multiple content/document types

2010-01-23 Thread Kranti™ K K Parisa
Hi, I would like to know the best strategy/standards to follow for indexing multiple document types through SOLR. In other words, let us say we have a file upload form through which a user would upload files of different types (text, HTML, XML, Word docs, Excel sheets, PDF, JPG, GIF, etc.). Once we save

Dedupe of document results at query-time

2010-01-23 Thread Peter S
Hi, I wonder if someone might be able to shed some insight into this problem: Is it possible and/or what is the best/accepted way to achieve deduplication of documents by field at query-time? For example: Let's say an index contains: Doc1 host:Host1 time:1

Re: Improvising solr queries

2010-01-23 Thread dipti khullar
Thanks, Eric. Correctly said! Initially we had different settings for queryResultCache, which served the purpose of serving queries from the cache. But we changed the settings some days back to see if there were any issues/improvements. I believe we need to switch back to some s

RE: SOLR indexing : Multiple content/document types

2010-01-23 Thread Adamsky, Robert
> I would like to know the best strategy/standards to follow for indexing > multiple document types thru SOLR. > In other words, let us say we have a file upload form thru which user would > upload the files of different types (text, html, xml, word docs, excel http://lucene.apache.org/tika/ http

Re: Solr under tomcat - UTF-8 issue

2010-01-23 Thread Sven Maurmann
Hi, I did not read the original mail, but for the UTF-8 issue with Tomcat you might consult the URL http://wiki.apache.org/solr/SolrTomcat The relevant piece of information is under "URI Charset Config": *** quote *** Edit Tomcat's conf/server.xml and add the following attribute to the correct

Re: Solr vs. Compass

2010-01-23 Thread Uri Boness
Wow... well, transactional or "transactional", whether it's a nice feature to have or just a "selling point". Bottom line: for some applications Compass can be very appealing; for others Solr will be the choice. In the last several years I've integrated both in different applications and gaine

Re: Dedupe of document results at query-time

2010-01-23 Thread Martijn v Groningen
This manner of detecting duplicates at query time really does match what field collapsing does, so I suggest you look into that. As far as I know there isn't any function query that does what you have described in your example. Cheers, Martijn On 23 January 2010 12:31, Peter S wrote:

Re: Find newly added documents

2010-01-23 Thread Simon Rosenthal
"newly added" is a bit vague. Do you mean "since last Sunday" ? "between the last and the one before that" ? Also, do you need to distinguish between updated and newly added documents ? Perhaps you could be more specific about the use case. -Simon On Fri, Jan 22, 2010 at 4:25 AM, Erik Hatcher

Index gets deleted after commit?

2010-01-23 Thread Bogdan Vatkov
After a mass upload of docs in Solr I get some "REMOVING ALL DOCUMENTS FROM INDEX" without any explanation. I have been running indexing with Solr for several weeks now and everything was OK: I indexed 22K+ docs using the SimplePostTool. I was first launching *:* then some 22K+ ... with a finishing But

Re: understanding termVector output

2010-01-23 Thread Koji Sekiguchi
Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: Hi, I'm trying to see if I can use termVectors for a use case I have. Essentially, what I want to know is: where in the indexed value does the query hit occur? I think either tv.positions or tv.offsets would provide that info but I don't really grok

Solr Cache Viewing/Browsing

2010-01-23 Thread Amit Nithian
Hi All, I am using the SolrCache to store some external data in my search app (to be used in a modified DisMaxHandler) and I was wondering if there is a way to get at this data from the JSP pages? I then thought that it might be nice to view more information about the respective caches like the cu

Re: Index gets deleted after commit?

2010-01-23 Thread Amit Nithian
Are you using the DIH? If so, did you try setting clean=false in the URL line? That prevents wiping out the index on load. On Jan 23, 2010 4:06 PM, "Bogdan Vatkov" wrote: After mass upload of docs in Solr I get some "REMOVING ALL DOCUMENTS FROM INDEX" without any explanation. I was running inde
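
A minimal sketch of the `clean=false` suggestion above, assuming the DataImportHandler is registered at `/dataimport` and Solr is reachable at `http://localhost:8983/solr` (both are assumptions; the thread does not state them). The helper name `dih_full_import_url` is hypothetical. The code only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode


def dih_full_import_url(base_url, clean=False):
    """Build a DataImportHandler full-import URL.

    base_url is an assumed Solr root such as http://localhost:8983/solr.
    With clean=False the existing index should not be wiped before the import.
    """
    params = {
        "command": "full-import",
        "clean": "true" if clean else "false",
    }
    return base_url.rstrip("/") + "/dataimport?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance; adjust to the real host and core.
    print(dih_full_import_url("http://localhost:8983/solr"))
```

If the index is still emptied with `clean=false`, the cause probably lies elsewhere (for example an explicit delete query in the posted data), which this sketch does not address.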

wildcard search and hierarchical faceting

2010-01-23 Thread Andy
I'd like to provide hierarchical faceting functionality. An example would be a location drill-down such as USA -> New York -> New York City -> SoHo. The number of levels can be arbitrary. One way to handle this could be to use a special character as separator, store values such as "USA|New York|
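
One possible reading of the separator idea, sketched under assumptions: the message is cut off, so the exact stored format is unknown. Here `|` is the separator, and the helper names `encode_path` and `ancestor_paths` are hypothetical. Each ancestor path could serve as a prefix when drilling down one level at a time.

```python
SEP = "|"  # assumed separator, taken from the "USA|New York|" fragment


def encode_path(levels, sep=SEP):
    """Join hierarchy levels into a single stored value."""
    for level in levels:
        if sep in level:
            # A level containing the separator would make the path ambiguous.
            raise ValueError(f"level {level!r} contains the separator {sep!r}")
    return sep.join(levels)


def ancestor_paths(levels, sep=SEP):
    """Return the encoded path for every depth, from the root down."""
    return [encode_path(levels[:depth], sep) for depth in range(1, len(levels) + 1)]


if __name__ == "__main__":
    levels = ["USA", "New York", "New York City", "SoHo"]
    print(encode_path(levels))
    for path in ancestor_paths(levels):
        print(path)
```

Because the number of levels is arbitrary, the sketch keeps the whole path in one string rather than one field per level; whether that suits a given schema depends on how the facet values are queried.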