Solr is behaving a bit weirdly for some search terms, e.g.
co-ownership, "co ownership".
It works fine with terms like quasi-delict, non-interference etc.
The issue is that it's not returning any excerpts in the "highlighting" key of
the result dictionary. My search query is something like this:
http:/
You could write your query like:
q=fieldName1:searchValue AND fieldName2:value OR fieldName3:value
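As a rough sketch (the field names are placeholders, not from your schema, and the host/core path is an assumption), such a query string can be URL-encoded before sending it to Solr:

```python
from urllib.parse import urlencode

# Placeholder field names; substitute the ones from your schema.
query = 'fieldName1:searchValue AND fieldName2:value OR fieldName3:value'
params = urlencode({'q': query})          # colons become %3A, spaces become +
url = 'http://localhost:8983/solr/select?' + params
print(params)
```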
Regards,
Manas
From: Suram [mailto:reactive...@yahoo.com]
Sent: Wed 3/17/2010 12:44 AM
To: solr-user@lucene.apache.org
Subject: Issue in search
In solr how ca
Thank you, Tommy. But the real problem here is that the XML is dynamic and the
element names will be different in different docs which means that there will
be a lot of field names to be added in schema if I were to index those xml
nodes separately.
Is it possible to have nested indexing (xml with
Just turn your entire disk to RAM
http://www.hyperossystems.co.uk/
800X faster. Who cares if it swaps to 'disk' then :-)
Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--
In Solr, how can I perform AND, OR, NOT searches while querying the data?
--
View this message in context:
http://old.nabble.com/Issue-in-search-tp27927828p27927828.html
Sent from the Solr - User mailing list archive at Nabble.com.
You need to change your similarity object to be more sensitive at the
short end. Here is a patch showing how to do this:
http://issues.apache.org/jira/browse/LUCENE-2187
It involves Lucene coding.
On Fri, Mar 12, 2010 at 3:19 AM, muneeb wrote:
>
> Ah I see.
> Thanks very much Jay for your explan
Hi Giovanni,
Comments below:
> I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance.
> This is what I've tried so far (which was really just me guessing):
>
>
>
> 1. Got the latest version of the trunk code from
> http://svn.apache.org/repos/asf/lucene/tika/trunk
>
>
Use a + sign or %20 for the space. In URL query strings, a plus sign means a space.
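A minimal sketch of the two encodings, using Python's standard library (the term is just an example from this list):

```python
from urllib.parse import quote, quote_plus

term = 'co ownership'
plus_form = quote_plus(term)  # '+' for the space: fine in query strings
pct_form = quote(term)        # '%20' for the space: safe anywhere in a URL
print(plus_form, pct_form)
```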
On Tue, Mar 16, 2010 at 6:06 PM, KshamaPai wrote:
>
> Hi,
I am using autobench to benchmark Solr with the query
> http://localhost:8983/solr/select/?q=body:hotel AND
> _val_:"recip(hsin(0.7113258,-1.291311553,lat_rad
That would be a Tomcat question :)
On Tue, Mar 16, 2010 at 8:36 PM, blargy wrote:
>
> [java] INFO: The APR based Apache Tomcat Native library which allows optimal
> performance in production environments was not found on the
> java.library.path:
> .:/Library/Java/Extensions:/System/Library/Java/E
Hi all, we translated the Solr tutorial to Spanish due to a client's
request. For all you Spanish speakers/readers out there, you can have a look
at it:
http://www.linebee.com/?p=155
We hope this can expand the usage of the project and lower the language
barrier to non-english speakers.
Thanks
org/apache/solr/util/plugin/SolrCoreAware in the stack trace refers to
an interface in the main Solr jar.
I think this means that putting all of the libs in
apache-tomcat-6.0.20/lib is a mistake: the classloader finds
ExtractingRequestHandler in
apache-tomcat-6.0.20/lib/apache-solr-cell-1.4.1-dev.
[java] INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
What the heck is this and why is it recommended for production setti
I was reading "Scaling Lucene and Solr"
(http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
and I came across the section StopWords.
In there it mentioned that it's not recommended to remove stop words at index
time. Why is this the case? Don't all t
There are certainly a number of widely varying opinions on the use of RAM
directory.
Basically, though, if you need the index to be persistent at some point
(i.e. saved across reboots, crashes etc.),
you'll need to write to a disk, so RAM directory becomes somewhat
superfluous in this case.
Genera
Hi,
I am using autobench to benchmark Solr with the query
http://localhost:8983/solr/select/?q=body:hotel AND
_val_:"recip(hsin(0.7113258,-1.291311553,lat_rad,lng_rad,30),1,1,0)"^100
But if I specify the same in the autobench command as
autobench --file bar1.tsv --high_rate 100 --low_rate 20 --rate
It seems that Solr's query parser doesn't pass a single term query
to the Analyzer for the field. For example, if I give it
2001年 (year 2001 in Japanese), the searcher returns 0 hits
but if I quote them with double-quotes, it returns hits.
In this experiment, I configured schema.xml so that
the f
Hey Peter,
Thanks for your reply.
My question was mainly about the fact that there seem to be two different
aspects to the Solr RAM usage: in-process and out-of-process.
By that I mean: yes, I know the many different parameters/caches to do with
Solr in-process memory usage and related culprits; however
Aha. That appears to be the issue. I hadn't realized that the query
handler had all of those definitions there.
-Alex
On 3/16/2010 6:56 PM, Erick Erickson wrote:
I suspect your problem is that you still have "price" defined in
solrconfig.xml for the dismax handler. Look for the section
Besides the other notes here, I agree you'll hit OOM if you try to
read all the rows into memory at once, but I'm absolutely sure you
can read then N at a time instead. Not that I could tell you how, mind
you.
You're on your way...
Erick
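A minimal sketch of the N-at-a-time idea, using an in-memory SQLite table as a stand-in for the real database (the table name, contents, and batch size are all made up):

```python
import sqlite3

# Stand-in for the real database; 10 dummy rows.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE docs (id INTEGER, body TEXT)')
conn.executemany('INSERT INTO docs VALUES (?, ?)',
                 [(i, 'text %d' % i) for i in range(10)])

batch_sizes = []
cur = conn.execute('SELECT id, body FROM docs')
while True:
    rows = cur.fetchmany(4)        # read 4 rows at a time instead of all at once
    if not rows:
        break
    batch_sizes.append(len(rows))  # real code would index this batch here

print(batch_sizes)
```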
On Tue, Mar 16, 2010 at 4:13 PM, Neil Chaudhuri <
nchau
I suspect your problem is that you still have "price" defined in
solrconfig.xml for the dismax handler. Look for the section
You'll see price defined as one of the default fields for "fl" and "bf".
HTH
Erick
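For reference, the relevant section of the example solrconfig.xml looks roughly like this (reconstructed from memory, so treat the exact field list and boost function as approximate):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- "price" appears in the default fl and bf; remove it if the
         field is no longer in your schema -->
    <str name="fl">id,name,price,score</str>
    <str name="bf">recip(rord(price),1,1000,1000)^0.3</str>
  </lst>
</requestHandler>
```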
On Tue, Mar 16, 2010 at 6:55 PM, Alex Thurlow wrote:
> Hi guys,
>Based on some s
On Tue, Mar 16, 2010 at 9:08 PM, KaktuChakarabati wrote:
>
> Hey,
> I am trying to understand what kind of calculation I should do in order to
> come up with reasonable RAM size for a given solr machine.
>
> Suppose the index size is at 16GB.
> The Max heap allocated to JVM is about 12GB.
>
> The
Disclaimer: My Oracle experience is minuscule at best. I am also a
beginner at Solr, so grab yourself the proverbial grain of salt.
I googled a bit on CLOB. One page I found mentioned setting up a view
to return the data type you want. Can you use the functions described
on these pages in
Lance,
I tried that but no luck. Just in case the relative paths were causing a
problem, I also tried using absolute paths but neither seemed to help.
First, I tried adding ** as the
full directory so it would hopefully include everything. When that
didn't work, I tried adding paths directly
Since my original thread was straying to a new topic, I thought it made sense
to create a new thread of discussion.
I am using the DataImportHandler to index 3 fields in a table: an id, a date,
and the text of a document. This is an Oracle database, and the document is an
XML document stored as
The DataImportHandler has tools for this. It will fetch rows from
Oracle and allow you to unpack columns as XML with Xpaths.
http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
http://wiki.apache.org/solr/DataImportHandler#XPathEntityProces
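A hedged sketch of what that configuration can look like; the table, column, and field names here are invented, but the FieldReaderDataSource/XPathEntityProcessor pairing is the documented pattern for unpacking an XML column:

```xml
<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:..."/>
  <dataSource name="fld" type="FieldReaderDataSource"/>
  <document>
    <entity name="doc" dataSource="db"
            query="SELECT id, doc_date, doc_xml FROM documents">
      <field column="id" name="id"/>
      <field column="doc_date" name="date"/>
      <!-- read the XML column and pull values out with XPath -->
      <entity name="body" dataSource="fld" dataField="doc.doc_xml"
              processor="XPathEntityProcessor" forEach="/record">
        <field column="text" xpath="/record/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```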
Hi guys,
Based on some suggestions, I'm trying to use the dismax query
type. I'm getting a weird error though that I think it related to the
default test data set.
From the query tool (/solr/admin/form.jsp), I put in this:
Statement: artist:test title:test +type:video
query type: dismax
I'm pretty unclear on how to patch the Tika 0.7-trunk on our Solr instance.
This is what I've tried so far (which was really just me guessing):
1. Got the latest version of the trunk code from
http://svn.apache.org/repos/asf/lucene/tika/trunk
2. Built this using Maven (mvn install)
3
They are a namespace like other namespaces and are usable in
attributes, just like in the DB query string examples.
As for defaults, you can declare those in the
declarations in solrconfig.xml. Examples of this are in the wiki page
(search for "defaults").
On Tue, Mar 16, 2010 at 7:05 AM, Lukas Kahw
NoClassDefFoundError usually means that the class was found, but it
needs other classes and those were not found. That is, Solr finds the
ExtractingRequestHandler jar but cannot find the Tika jars.
In example/solr/conf/solrconfig.xml, there are several <lib> elements. These give classpath directories
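They typically look like this in the example config (the directory paths depend on your layout, so these are illustrative):

```xml
<lib dir="../../contrib/extraction/lib" />
<lib dir="../../dist/" />
```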
Thanks Chris!
I'll try the patch.
-Original Message-
From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, March 16, 2010 5:37 PM
To: solr-user@lucene.apache.org
Subject: Re: PDFBox/Tika Performance Issues
Guys, I think this is an issue with PDFBOX and t
Guys, I think this is an issue with PDFBOX and the version that Tika 0.6
depends on. Tika 0.7-trunk upgraded to PDFBox 1.0.0 (see [1]), so it may
include a fix for the problem you're seeing.
See this discussion [2] on how to patch Tika to use the new PDFBox if you can't
wait for the 0.7 release
Originally 16 (the number of CPUs on the machine), but even with 5 threads it's
not looking so hot.
-Original Message-
From: Grant Ingersoll [mailto:gsi...@gmail.com] On Behalf Of Grant Ingersoll
Sent: Tuesday, March 16, 2010 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: PDFBox/Ti
That is a great article, David.
For the moment, I am trying an all-Solr approach, but I have run into a small
problem. The documents are stored as XML CLOBs using Oracle's OPAQUE object.
Is there any facility to unpack this into the actual text? Or must I execute
that in the SQL query?
Thank
Hmm, that is an ugly thing in PDFBox. We should probably take this over to the
PDFBox project. How many threads are you indexing with?
FWIW, for that many documents, I might consider using Tika on the client side
to save on a lot of network traffic.
-Grant
On Mar 16, 2010, at 4:37 PM, Giovan
Hey,
I am trying to understand what kind of calculation I should do in order to
come up with reasonable RAM size for a given solr machine.
Suppose the index size is at 16GB.
The Max heap allocated to JVM is about 12GB.
The machine I'm trying now has 24GB.
When the machine is running for a while
Do you have the option of just importing each xml node as a
field/value when you add the document?
That'll let you do the search easily. If you need to store the raw XML,
you can use an extra field.
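A minimal sketch of that flattening idea (the element names are made up; a real indexer would post the dict to Solr rather than print it):

```python
import xml.etree.ElementTree as ET

raw = '<record><title>Hello</title><author>Smith</author></record>'
root = ET.fromstring(raw)

# One Solr field per child element, plus an extra field for the raw markup.
doc = {child.tag: child.text for child in root}
doc['raw_xml'] = raw

print(doc)
```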
Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chhe
If you do stay with Oracle, please report back to the list how that went. In
order to get decent filtering and faceting performance, I believe you will need
to use "bitmapped indexes" which Oracle and some other databases support.
You may want to check out my article on this subject:
http://ww
I've been trying to bulk index about 11 million PDFs, and while profiling our
Solr instance, I noticed that all of the threads that are processing indexing
requests are constantly blocking each other during this call:
http-8080-Processor39 [BLOCKED] CPU time: 9:35
java.util.Collections$Synchroni
I've also indexed a concatenation of 50k journal articles (making a
single document of several hundred MB of text) and it did not give me
an OOM.
-glen
On 16 March 2010 15:57, Erick Erickson wrote:
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single docum
For my purposes, the Porter analyzer was overly aggressive with stemming. So,
we then moved to KStem. It looks like this is no longer being maintained and
Lucid claimed much better performance with theirs, so I gave that a try and it
seems to be working fine. I didn't do any benchmarks though.
Certainly I could use some basic SQL count(*) queries to achieve faceted
results, but I am not sure of the flexibility, extensibility, or scalability of
that approach. And from what I have read, Oracle Text doesn't do faceting out
of the box.
Each document is a few MB, and there will be million
Hello Experts,
I need help on this issue of mine. I am unsure if this scenario is possible.
I have a field in my Solr document named inputxml, the value of which is an
XML string as below. This XML structure is within the inputxml field value. I
needed help on searching this xml structure i.e. if I sear
Why do you think you'd hit OOM errors? How big is "very large"? I've
indexed, as a single document, a 26-volume encyclopedia of civil war
records.
Although as much as I like the technology, if I could get away without using
two technologies, I would. Are you completely sure you can't get what
Kevin,
When you say you just included the war, you mean the /packs/solr.war, correct?
I see that the KStemmer is nicely packed in there but I don't see LucidGaze
anywhere. Have you had any experience using this?
So I'm guessing you would suggest using the LucidWorks solr.war over the
apache-solr-
I am working on an application that currently hits a database containing
millions of very large documents. I use Oracle Text Search at the moment, and
things work fine. However, there is a request for faceting capability, and Solr
seems like a technology I should look at. Suffice to say I am new
If you search the mail archive, you'll find many discussions of
multilingual indexing/searching that'll provide you a plethora
of information.
But the synopsis as I remember is that using a single stemmer for
multiple languages is generally a bad idea
Best
Erick
On Tue, Mar 16, 2010 at 12:19
I'm trying it out right now. I hope it will work well out of the box for
indexing/searching a set of documents with frequent update.
-aj
On Tue, Mar 16, 2010 at 11:52 AM, blargy wrote:
>
> Has anyone used this?:
> http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr
>
> Other than the KStem
I used it mostly for KStemmer, but I also liked the fact that it included about
a dozen or so stable patches since Solr 1.4 was released. We just use the
included WAR in our project however. We don't use the installer or anything
like that.
From: blargy
To:
Most of our documents will be in English, but not all, and we are certainly in
the process of acquiring more international content. Does anyone have any
experience using all of the different stemmers for languages of unknown
origin? Which ones perform the best? Give the most relevant results? What
are
I generate a Solr index on a Hadoop cluster and I want to copy it from HDFS to
a server running solr.
I wish to copy the index on a different disk than the disk that solr
instance is using, then tell the solr server to switch from the current data
dir to the location where I copied the hadoop gene
Hi again,
I just tried the 1.5-dev version from the Solr trunk.
After applying the patch you provided and adding icu4j-3_8_1 to the classpath,
the results are quite different from before.
Now words and texts are not reversed and are displayed correctly, except for
some PDF files' text parts that
Hi,
I am trying to use $deleteDocById to delete rows based on an SQL query in my
db-data-config.xml. The following tag is a top-level tag in the tag.
However, it seems like it's only fetching the rows; it's not actually issuing any
index deletes.
regards,
Lukas Kahwe Smith
m...@pooteew
Hi,
According to the wiki, it's possible to pass parameters to the DIH:
http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
I assume they are just being replaced via simple string replacements, which is
exactly what I need. Can they also be used in all places, even attributes (fo
This is my first post on this list -- apologies if this has been discussed
before; I didn't come upon anything exactly equivalent in searching the
archives via Google.
I'm using Solr 1.4 as part of the VuFind application, and I just noticed that
searches for hyphenated terms are failing in stra
On Mar 15, 2010, at 11:36 AM, Jean-Sebastien Vachon wrote:
> Hi All,
>
> I'm trying to figure out how to perform spatial searches using Solr 1.5 (from
> the trunk).
>
> Is the support for spatial search built-in?
Almost. Main thing missing right now is filtering. There are still ways to do
If you're going to spend time mucking w/ TermPositions, you should just spend
your time working with SpanQuery, as that is what I understand you to be asking
about. AIUI, you want to be able to get at the positions in the document where
the query matched. This is exactly what a SpanQuery and i
Shalin Shekhar Mangar wrote:
>
> On Sat, Mar 13, 2010 at 9:30 AM, Suram wrote:
>
>>
>> Erick Erickson wrote:
>> >
>> > Did you commit your changes?
>> >
>> > Erick
>> >
>> > On Fri, Mar 12, 2010 at 7:38 AM, Suram wrote:
>> >
>> >>
>> >> Can I set my index fields for auto-suggestion, sometime t
Thank you.
This works well as a workaround. Yesterday I got the tip to look for a wrong
solrconfig.xml, and that was right:
when uploading our files, the solrconfig.xml was LOST ;-)
Is it possible to start Java in debug mode for more info?
David
On 16.03.2010 02:02, Tom Hill wrote:
You need a que