LucidWorks Solr

2010-04-18 Thread Andy
Just wanted to know if anyone has used LucidWorks Solr. - How do you compare it to the standard Apache Solr? - the non-blocking IO of LucidWorks Solr -- is that for networking IO or disk IO? what are its effects? - LucidWorks website also talked about "significantly improved faceting performa

Re: LucidWorks Solr

2010-04-18 Thread Paolo Castagna
Thanks for asking, I am interested as well in reading the response to your questions. Paolo Andy wrote: Just wanted to know if anyone has used LucidWorks Solr. - How do you compare it to the standard Apache Solr? - the non-blocking IO of LucidWorks Solr -- is that for networking IO or disk

Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread Praveen Agrawal
Hi, I need to submit thousands of online PDF/html files to Solr. I can submit one file using SolrJ (StreamingUpdateSolrServer and ..solr.common.util.ContentStreamBase.URLStream), setting the literal.id parameter to the URL. I can't do the same with a batch of multiple files, as their 'id' should be uniq

Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread pk
Hi, I need to submit thousands of online PDF/html files to Solr. I can submit one file using SolrJ (StreamingUpdateSolrServer and ..solr.common.util.ContentStreamBase.URLStream), setting literal.id parameter to the url. I can't do the same with a batch of multiple files, as their 'id' should be un

Re: Facet count problem

2010-04-18 Thread Ranveer Kumar
I am using text for type, which is static. For example: type is a field and I am using type for categorization. For news type I am using news and for blog using blog. type is a text field. On Apr 17, 2010 8:38 PM, "Ahmet Arslan" wrote: > I am facing problem to get facet result count. I must be

Solr throws TikaException while parsing sample PDF

2010-04-18 Thread pk
Hi, while posting a sample PDF (that comes with the Solr distribution) to Solr, I'm getting a TikaException. Using Solr-1.4, SolrJ (StreamingUpdateSolrServer) for posting PDF to Solr. Other sample PDFs can be parsed and indexed successfully. I'm getting the same error with some other PDFs also (but adobe read

Re: Solr Schema Question

2010-04-18 Thread Serdar Sahin
Thanks everyone, it works! I have successfully indexed them. Thanks again! I have a couple more questions regarding Solr, if you don't mind. 1-) As I said before, the text files are quite large, between 100kb-10mb, but I need to store them as well for highlighting, including with their titl

Re: Solr throws TikaException while parsing sample PDF

2010-04-18 Thread Grant Ingersoll
Can you extract content from this using Tika's standalone command line tool? PDFs are notorious for problems in extracting. To me, it looks like a bug in PDFBox. I would try to isolate it down to there and then send, if possible, the sample document to PDFBox and see if they can come up w/ a

Re: LucidWorks Solr

2010-04-18 Thread Grant Ingersoll
On Apr 18, 2010, at 3:53 AM, Andy wrote: > Just wanted to know if anyone has used LucidWorks Solr. > > - How do you compare it to the standard Apache Solr? We take a release of Solr. We wrap it w/ an installer, tomcat/jetty, our reference guide, Luke, etc. We also add in an optimized versio

Re: geometric distance

2010-04-18 Thread Darren Govoni
AFAIK, there are no columns per se. But in the past I've just used UTM values for each lat/lon and done basic numeric comparisons (>, <) to search within a bounding geographic region. Add them as numeric fields though. Easy. There is new support for spatial searching, however I'm not sure how it com
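A minimal sketch of the bounding-region idea described in this reply, written as a Solr range query over two numeric fields. The field names `utm_easting` and `utm_northing` are hypothetical, and the sketch assumes all coordinates fall in the same UTM zone, since values from different zones are not directly comparable.

```python
def bounding_box_query(min_x, max_x, min_y, max_y,
                       x_field="utm_easting", y_field="utm_northing"):
    """Build a Lucene/Solr range query matching points inside a rectangle.

    The field names are placeholders; both fields are assumed to be
    indexed as numeric types so that range comparison is numeric.
    """
    if min_x > max_x or min_y > max_y:
        raise ValueError("minimum bound must not exceed maximum bound")
    # Square brackets make both ends of each range inclusive.
    return (f"{x_field}:[{min_x} TO {max_x}] AND "
            f"{y_field}:[{min_y} TO {max_y}]")


if __name__ == "__main__":
    print(bounding_box_query(500000, 510000, 4640000, 4650000))
```

The resulting string could be passed as the `q` or `fq` parameter of a search request.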

Re: Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread Lance Norskog
The DataImportHandler has a tool for doing PDF extraction. This allows you to create new fields, do multiple files, and supply lists of access to get the multiple files. http://wiki.apache.org/solr/TikaEntityProcessor On Sun, Apr 18, 2010 at 9:52 AM, pk wrote: > > Hi, > I need to submit thousand

Re: Solr Schema Question

2010-04-18 Thread Lance Norskog
Highlighting is a complex topic. A field has to be stored to be highlighted. It does not have to be indexed. But, if it is not, highlighting analyzes it just as if it were indexed in order to highlight it. http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighting http://www.luci

Re: Facet count problem

2010-04-18 Thread Erick Erickson
Can we see the actual field definitions from your schema file? Ahmet's question is vital and is best answered if you'll copy/paste the relevant configuration entries. But based on what you *have* posted, I'd guess you're trying to facet on tokenized fields, which is not recommended. You might t
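To illustrate why faceting on a tokenized field can give surprising counts, here is a small simulation. It is not Solr code: it assumes a simple lowercase-and-split-on-whitespace analyzer and made-up field values, and it only mimics the fact that facet counts are computed per indexed term.

```python
from collections import Counter


def facet_counts(values, tokenized):
    """Count documents per indexed term for one field.

    tokenized=True mimics a text field (lowercased, split on whitespace);
    tokenized=False mimics a string field indexed as a single term.
    """
    counts = Counter()
    for value in values:
        # A document is counted at most once per term, hence the set.
        terms = set(value.lower().split()) if tokenized else {value}
        counts.update(terms)
    return counts


if __name__ == "__main__":
    docs = ["news", "blog", "tech news"]  # hypothetical 'type' values
    print(facet_counts(docs, tokenized=True))
    print(facet_counts(docs, tokenized=False))
```

With the tokenized variant, "tech news" contributes to both the "tech" and "news" buckets, so the facet values no longer match the original category strings.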

Re: DIH dataimport.properties with

2010-04-18 Thread Michael Tibben
Because there is a lot of data, and for scalability reasons we want all non-write operations to happen from a slave - we don't want to be using the master unless necessary On 17/04/10 08:28, Otis Gospodnetic wrote: Hm, why not just go to the MySQL master then? Otis Sematext :: http://se

Re: DIH dataimport.properties with

2010-04-18 Thread Michael Tibben
I don't really understand how this will help. Can you elaborate? Do you mean that the last_index_time can be imported from somewhere outside Solr? But I need to be able to *set* what last_index_time is stored in dataimport.properties, not get properties from somewhere else On 18/04/10 10:

Re: Facet count problem

2010-04-18 Thread Ranveer Kumar
Hi Erick, My schema configuration is as follows. On Mon, Apr 19, 20

Re: LucidWorks Solr

2010-04-18 Thread Andy
--- On Sun, 4/18/10, Grant Ingersoll wrote: > > Sure, but I'm biased. ;-)  Hopefully, you will find it > useful, but choose the one that best fits your needs (and > let me know if you need help assessing that.) > Thanks for the explanation Grant. What is the advantage of KStem over the sta

Re: Autofill 'id' field with the URL of files posted to Solr?

2010-04-18 Thread pk
Lance, I can submit and extract PDF contents using Solr and SolrJ, as I indicated earlier. I've made 'id' a mandatory field and I had to supply its value while submitting (request.addParams("literal.id",url)). If I put multiple files/streams in the request, then I can't put 'id' this way as the
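One way around the shared-'id' problem described here is to issue one extract request per file, so each document carries its own `literal.id`. The sketch below only builds the request URLs and does not contact a server. It assumes the extracting handler is mounted at `/update/extract` and that remote streaming is enabled so that `stream.url` is honored; the base URL and file URLs are placeholders.

```python
from urllib.parse import urlencode


def build_extract_requests(solr_base, file_urls):
    """Return one extract-request URL per remote file.

    Each request sets literal.id to the file's own URL, so every
    document gets a unique 'id' without batching streams together.
    """
    requests = []
    for url in file_urls:
        # urlencode percent-escapes the file URL inside the query string.
        params = urlencode({"literal.id": url, "stream.url": url})
        requests.append(f"{solr_base}/update/extract?{params}")
    return requests


if __name__ == "__main__":
    files = ["http://example.com/a.pdf", "http://example.com/b.html"]
    for request_url in build_extract_requests("http://localhost:8983/solr", files):
        print(request_url)
```

A commit would still have to be sent separately once all files have been posted.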

Query regarding "copyField"

2010-04-18 Thread Sandhya Agarwal
Hello, Is it a problem if I use *copyField* for some fields and not for others? In my query, I have both fields, the ones mentioned in copyField and ones that are not copied to a common destination. Will this cause an anomaly in my search results? I am seeing some weird behavior. Thanks, Sandh