Query performance very slow even after autowarming

2010-12-02 Thread johnnyisrael
Hi, I am using EdgeNGramFilterFactory on Solr 1.4.1 for my indexing. Each document will have about 5 fields in it and only one field is indexed with EdgeNGramFilterFactory. I have about 1.4 million documents in my index now and my index size is approx 296MB. I made the field that is indexed

Solr Multi-thread Update Transaction Control

2010-12-02 Thread wangjb
Hi, we are using Solr 1.4.1 and have encountered a problem. When multiple threads update Solr data at the same time, can every thread have its own separate transaction? If this is possible, how can we achieve it? Are there any suggestions? Waiting online. Thank you for any useful reply.

PDF text extracted without spaces

2010-12-02 Thread Ganesh
Hello all, I know this is not the right group to ask this question, but I thought some of you might have experienced it. I am a newbie with Tika. I am using the latest version, 0.8. I extracted text from a PDF document but found spaces and newlines missing. Indexing the data gives wrong results. Cou

Limit number of characters returned

2010-12-02 Thread Mark
Is there a way to limit the number of characters returned from a stored field? For example: Say I have a document (~2K words) and I search for a word that's somewhere in the middle. I would like the document to match the search query but the stored field should only return the first 200 characte
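One client-side way to get this effect, offered only as a sketch: let the query match on the full field and trim the stored value after the response arrives. The function name `truncate_stored_field` and the 200-character default are illustrative assumptions, and this approach does not reduce what Solr sends over the wire.

```python
def truncate_stored_field(value: str, max_chars: int = 200) -> str:
    """Return at most max_chars characters from the start of a stored field value."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    return value[:max_chars]


if __name__ == "__main__":
    body = "word " * 2000  # stand-in for a ~2K-word document
    print(len(truncate_stored_field(body)))
```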

Re: solr/admin/dataimport Not Found

2010-12-02 Thread Ruixiang Zhang
Thank you so much, Koji, the example-DIH works. I'm reading for details... Richard On Thu, Dec 2, 2010 at 4:39 PM, Koji Sekiguchi wrote: > (10/12/03 9:29), Ruixiang Zhang wrote: > >> Hi Koji >> >> Thanks for your reply. >> I pasted the wrong link. >> Actually I tried this first http://mydomain.

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Michael McCandless
On Thu, Dec 2, 2010 at 4:31 PM, Burton-West, Tom wrote: > We turned on infostream.   Is there documentation about how to interpret it, > or should I just grep through the codebase? There isn't any documentation... and it changes over time as we add new diagnostics. > Is the excerpt below what

Re: solr/admin/dataimport Not Found

2010-12-02 Thread Koji Sekiguchi
(10/12/03 9:29), Ruixiang Zhang wrote: Hi Koji Thanks for your reply. I pasted the wrong link. Actually I tried this first http://mydomain.com:8983/solr/dataimport It didn't work. The page should be there after installation, right? Did I miss something? Thanks a lot! Richard To work that URL,

Re: solr/admin/dataimport Not Found

2010-12-02 Thread Ruixiang Zhang
Hi Koji Thanks for your reply. I pasted the wrong link. Actually I tried this first http://mydomain.com:8983/solr/dataimport It didn't work. The page should be there after installation, right? Did I miss something? Thanks a lot! Richard On Thu, Dec 2, 2010 at 4:23 PM, Koji Sekiguchi wrote:

Re: solr/admin/dataimport Not Found

2010-12-02 Thread Koji Sekiguchi
(10/12/03 8:58), Ruixiang Zhang wrote: I tried to import data from MySQL. When I tried to run http://mydomain.com:8983/solr/admin/dataimport , I got this error message: HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin/dataimport *Powered by Jetty:// * Any help wi

solr/admin/dataimport Not Found

2010-12-02 Thread Ruixiang Zhang
I tried to import data from MySQL. When I tried to run http://mydomain.com:8983/solr/admin/dataimport , I got this error message: HTTP ERROR: 404 NOT_FOUND RequestURI=/solr/admin/dataimport *Powered by Jetty:// * Any help will be appreciated!!! Thanks Richard

Re: TermsComponent prefix query with fileds analyzers

2010-12-02 Thread Ahmet Arslan
> Does anyone know how to apply some analyzers over a prefix > query? Lucene has a special QueryParser for this. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html Someone provided a patch to use it in Solr. It was an attachmen

Cannot start Solr anymore

2010-12-02 Thread Ruixiang Zhang
Hi, I'm new here. First, could anyone tell me how to restart solr? I started solr and killed the process. Then when I tried to start it again, it failed: $ java -jar start.jar 2010-12-02 14:28:00.011::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2010-12-02 14:28:00.099::INFO: jetty-6.

Re: spatial query parinsg error: org.apache.lucene.queryParser.ParseException

2010-12-02 Thread Dennis Gearon
It WORKED Thank you so much everybody! I feel like jumping up and down like 'Hiro' on Heroes Dennis Gearon - Original Message - From: "Dennis Gearon" To: Sent: Wednesday, December 01, 2010 7:51 PM Subject: spatial query parinsg error: org.apache.lucene.queryParser.ParseException

Re: Joining Fields in and Index

2010-12-02 Thread Adam Estrada
Hi, I was hoping to do it directly in the index but it was more out of curiosity than anything. I can certainly map it in the DAO but again...I was hoping to learn if it was possible in the index. Thanks for the feedback! Adam On Dec 2, 2010, at 5:48 PM, Savvas-Andreas Moysidis wrote: > Hi,

Re: Joining Fields in and Index

2010-12-02 Thread Savvas-Andreas Moysidis
Hi, If you are able to do a full re-index then you could index the full names and not the codes. When you later facet on the Country field you'll get the actual name rather than the code. If you are not able to re-index then probably this conversion could be added at your application layer prior t
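If re-indexing is not an option, the application-layer conversion mentioned above could look roughly like this. It is a minimal sketch: the `COUNTRY_NAMES` table and the function name `label_country_facets` are hypothetical, and facet counts are assumed to have already been parsed into a plain code-to-count dictionary.

```python
# Hypothetical lookup table; a real one would cover every code in the index.
COUNTRY_NAMES = {"US": "United States", "DE": "Germany", "GR": "Greece"}


def label_country_facets(facet_counts, names=COUNTRY_NAMES):
    """Map {code: count} facet counts to {display name: count}.

    Codes missing from the lookup table are kept unchanged so no facet is lost.
    """
    return {names.get(code, code): count for code, count in facet_counts.items()}


if __name__ == "__main__":
    print(label_country_facets({"US": 1200, "GR": 37, "XX": 2}))
```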

Joining Fields in and Index

2010-12-02 Thread Adam Estrada
All, I have an index that has a field with country codes in it. I have 7 million or so documents in the index and when displaying facets the country codes don't mean a whole lot to me. Is there any way to add a field with the full country names then join the codes in there accordingly? I suppos

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Yonik Seeley
On Wed, Dec 1, 2010 at 3:01 PM, Shawn Heisey wrote: > I have seen this.  In Solr 1.4.1, the .fdt, .fdx, and the .tv* files do not > segment, but all the other files do.  I can't remember whether it behaves > the same under 3.1, or whether it also creates these files in each segment. Yep, that's t

RE: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Burton-West, Tom
Hi Mike, We turned on infostream. Is there documentation about how to interpret it, or should I just grep through the codebase? Is the excerpt below what I am looking for as far as understanding the relationship between ramBufferSize and size on disk? is newFlushedSize the size on disk in byt

Re: TermsComponent prefix query with fileds analyzers

2010-12-02 Thread Jonathan Rochkind
I don't believe you can. If you just need query-time transformation, can't you just do it in your client app? If you need index-time transformation... well, you can do that, but it's up to your schema.xml and will of course apply to the field as a whole, not just for termscomponent queries, be
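As a sketch of the client-side option for the query-time case, the prefix can be stripped of accents before it is sent to the TermsComponent. The field name `title` and the extra lowercasing step are assumptions for illustration; `terms.fl` and `terms.prefix` are the TermsComponent request parameters.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def terms_prefix_params(field: str, user_prefix: str) -> dict:
    """Build TermsComponent parameters with a lowercased, accent-free prefix."""
    return {"terms.fl": field, "terms.prefix": strip_accents(user_prefix).lower()}


if __name__ == "__main__":
    # "title" is a hypothetical field name.
    print(terms_prefix_params("title", "Aná"))
```

This only helps if the indexed terms are themselves stored without accents, as with the "analisis" example in the original question.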

Re: SOLR Thesaurus

2010-12-02 Thread Jonathan Rochkind
No, it doesn't. And it's not entirely clear what (if any) simple way there is to use Solr to expose hierarchically related documents in a way that preserves and usefully allows navigation of the relationships. At least in general, for sophisticated stuff. On 12/2/2010 3:55 AM, lee carroll w

Re: Import Data Into Solr

2010-12-02 Thread Erick Erickson
You can just point your Solr instance at your Lucene index. Really, copy the Lucene index into the right place to be found by solr. HOWEVER, you need to take great care that the field definitions that you used when you built your Lucene index are compatible with the ones configured in your schema.

Re: Return Lucene DocId in Solr Results

2010-12-02 Thread Erick Erickson
You have to call termDocs.next() after termDocs.seek(). Something like: termDocs.seek(term); if (termDocs.next()) { // means there was a term/doc matching and your references should be valid. } On Thu, Dec 2, 2010 at 10:22 AM, Lohrenz, Steven wrote: > I must be missing something as I'm getting a NPE

RE: disabled replication setting

2010-12-02 Thread Xin Li
Does anyone know? Thanks, -Original Message- From: Xin Li [mailto:xin.li@gmail.com] Sent: Thursday, December 02, 2010 12:25 PM To: solr-user@lucene.apache.org Subject: disabled replication setting For solr replication, we can send command to disable replication. Does anyone know w

Exceptions in Embedded Solr

2010-12-02 Thread Tharindu Mathew
Hi everyone, I get the exception below when using Embedded Solr suddenly. If I delete the Solr index it goes back to normal, but it obviously has to start indexing from scratch. Any idea what the cause of this is? java.lang.RuntimeException: java.io.FileNotFoundException: /home/evanthika/WSO2/CAR

disabled replication setting

2010-12-02 Thread Xin Li
For Solr replication, we can send a command to disable replication. Does anyone know where I can verify the replication enabled/disabled setting? I cannot seem to find it on the dashboard or in the details command output. Thanks, Xin

TermsComponent prefix query with fileds analyzers

2010-12-02 Thread Nestor Oviedo
Hi everyone Does anyone know how to apply some analyzers over a prefix query? What I'm looking for is a way to build an autosuggest using the termsComponent that could be able to remove the accents from the query's prefix. For example, I have the term "analisis" in the index and I want to retrieve

Re: Dinamically change master

2010-12-02 Thread Tommaso Teofili
Back with my master resiliency need: talking with Upayavira, we discovered we were proposing the same solution :-) This can be useful if you don't have a VIP with a master/backup polling policy. It goes like this: there are 2 hosts for indexing, one is the main one and one is the backup; the backup one

Import Data Into Solr

2010-12-02 Thread Bing Li
Hi all, I am a new user of Solr. Before using it, I indexed all of the data myself with Lucene. According to Chapter 3 of the book Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh, data in formats such as XML, CSV and even PDF can be imported into Solr. If I wish

Re: SOLR Thesaurus

2010-12-02 Thread lee carroll
Hi Stephen, yes, sorry, I should have been plainer. A term can have a Preferred Term (PT), many Broader Terms (BT), many Narrower Terms (NT), Related Terms (RT), etc. So the user-supplied term is, say: Ski. Preferred term: Skiing. Broader terms could be: Ski and Snow Boarding, Mountain Sports, Sports. Narr
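The thesaurus shape described in this message could be modelled along the following lines. This is a sketch only: the class name `ThesaurusEntry`, the `THESAURUS` dictionary and the case-insensitive `lookup` helper are assumptions, and only the relations spelled out in the message are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    preferred: str                                 # Preferred Term (PT)
    broader: list = field(default_factory=list)    # Broader Terms (BT)
    narrower: list = field(default_factory=list)   # Narrower Terms (NT)
    related: list = field(default_factory=list)    # Related Terms (RT)


# Keys are lowercased user-supplied terms.
THESAURUS = {
    "ski": ThesaurusEntry(
        preferred="Skiing",
        broader=["Ski and Snow Boarding", "Mountain Sports", "Sports"],
    ),
}


def lookup(term: str):
    """Return the entry for a user-supplied term, or None if it is unknown."""
    return THESAURUS.get(term.strip().lower())


if __name__ == "__main__":
    entry = lookup("Ski")
    print(entry.preferred, entry.broader)
```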

Re: SOLR Thesaurus

2010-12-02 Thread Michael Zach
Hello Lee, these bells sound like "SKOS" ;o) AFAIK Solr does not support thesauri, just plain flat synonym lists. One could implement a thesaurus filter and put it at the end of the analyzer chain of Solr. The filter would then do a thesaurus lookup for each token it receives and possibly *

Multi-valued poly fields & search

2010-12-02 Thread Vincent Cautaerts
Hi, (should this be on solr-dev mailing list?) I have this kind of data, about articles in newspapers: article A-001 . published on 2010-10-31, in newspaper "N-1", edition "E1" . published on 2010-10-30, in newspaper "N-2", edition "E2" article A-002 . published on 2010-10-30, in newspaper "N

RE: Return Lucene DocId in Solr Results

2010-12-02 Thread Lohrenz, Steven
I must be missing something as I'm getting a NPE on the line: docIds[i] = termDocs.doc(); here's what I came up with: private int[] getDocIdsFromPrimaryKey(SolrQueryRequest req, List favsBeans) throws ParseException { // open the core & get data directory String indexDir = req.g

Re: Dataimport destroys our harddisks

2010-12-02 Thread Sven Almgren
That's the same series we use... we had problems when running other disk-heavy operations like rsync and backup on them too.. But in our case we mostly had hangs or load > 180 :P... Can you simulate very heavy random disk I/O? If so then you could check if you still have the same problems... Tha

Re: Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
On Dec 2, 2010, at 15:43 , Sven Almgren wrote: > What Raid controller do you use, and what kernel version? (Assuming > Linux). We had problems during high load with a 3Ware raid controller > and the current kernel for Ubuntu 10.04, we had to downgrade the > kernel... > > The problem was a bug i

Re: Dataimport destroys our harddisks

2010-12-02 Thread Sven Almgren
What Raid controller do you use, and what kernel version? (Assuming Linux). We had problems during high load with a 3Ware raid controller and the current kernel for Ubuntu 10.04, we had to downgrade the kernel... The problem was a bug in the driver that only showed up with very high disk load (a

Re: Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
> The very first thing I'd ask is "how much free space is on your disk > when this occurs?" Is it possible that you're simply filling up your > disk? No, I've checked that already. All disks have plenty of space (they have a capacity of 2TB and are currently filled up to 20%). > > do note that a

Re: Return Lucene DocId in Solr Results

2010-12-02 Thread Erick Erickson
Ahhh, you're already down in Lucene. That makes things easier... See TermDocs. Particularly seek(Term). That'll directly access the indexed unique key rather than having to form a bunch of queries. Best Erick On Thu, Dec 2, 2010 at 8:59 AM, Lohrenz, Steven wrote: > I would be interested in hea

RE: Return Lucene DocId in Solr Results

2010-12-02 Thread Lohrenz, Steven
I would be interested in hearing about some ways to improve the algorithm. I have done a very straightforward Lucene query within a loop to get the docIds. Here's what I did to get it working where favsBean are objects returned from a query of the second core, but there is probably a better way

RE: SOLR Thesaurus

2010-12-02 Thread Steven A Rowe
Hi Lee, Can you describe your thesaurus format (it's not exactly self-descriptive) and how you would like it to be applied? I gather you're referring to a thesaurus feature in another product (or product class)? Maybe if you describe that it would help too. Steve > -Original Message-

Re: Return Lucene DocId in Solr Results

2010-12-02 Thread Erick Erickson
Sounds good, especially because your old scenario was fragile. The doc IDs in your first core could change as a result of a single doc deletion and optimize. So the doc IDs stored in the second core would then be wrong... Your user-defined unique key is definitely a better way to go. There are som
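The fragility described here can be illustrated with a toy model. This is not Lucene code: a Python list stands in for the index, list position stands in for the internal doc ID, and `optimize` is a hypothetical helper that compacts away deleted slots.

```python
def optimize(segment):
    """Drop deleted slots (None) and renumber the survivors from 0."""
    return [key for key in segment if key is not None]


def doc_id(segment, unique_key):
    """In this toy model the internal doc ID is the position in the segment."""
    return segment.index(unique_key)


index = ["doc-a", "doc-b", "doc-c"]   # user-defined unique keys
before = doc_id(index, "doc-c")       # 2
index[0] = None                        # delete "doc-a"
index = optimize(index)
after = doc_id(index, "doc-c")        # 1: the doc ID moved, the unique key did not

if __name__ == "__main__":
    print(before, after)
```

A doc ID stored elsewhere before the optimize would now point at the wrong slot, while a lookup by unique key still finds the same document.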

Re: Dataimport destroys our harddisks

2010-12-02 Thread Erick Erickson
The very first thing I'd ask is "how much free space is on your disk when this occurs?" Is it possible that you're simply filling up your disk? do note that an optimize may require up to 2X the size of your index if/when it occurs. Are you sure you aren't optimizing as you add items to your index?
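A rough way to check the headroom mentioned above before a large import is sketched here. The helper names and the default factor of 2.0 are assumptions taken from the "up to 2X" rule of thumb in this message; the check is deliberately conservative and is not a precise model of what an optimize writes.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_headroom_for_optimize(index_bytes: int, free_bytes: int, factor: float = 2.0) -> bool:
    """True if free space is at least factor times the current index size."""
    return free_bytes >= factor * index_bytes


def check_index_dir(index_dir: str) -> bool:
    """Apply the rule of thumb to a real directory on disk."""
    free = shutil.disk_usage(index_dir).free
    return has_headroom_for_optimize(directory_size(index_dir), free)
```

`check_index_dir` would be called with the Solr data directory, whose location depends on the installation.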

Re: Best practice for Delta every 2 Minutes.

2010-12-02 Thread Erick Erickson
In fact, having a master/slave where the master is the indexing/updating machine and the slave(s) are searchers is one of the recommended configurations. The replication is used in many, many sites so it's pretty solid. It's generally not recommended, though, to run separate instances on the *same

Re: Tuning Solr caches with high commit rates (NRT)

2010-12-02 Thread Peter Sturge
In order for the 'read-only' instance to see any new/updated documents, it needs to do a commit (since it's read-only, it is a commit of 0 documents). You can do this via a client service that issues periodic commits, or use autorefresh from within solrconfig.xml. Be careful that you don't do anyth
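The "client service that issues periodic commits" could be as small as the sketch below, which posts an empty `<commit/>` message to the XML update handler. The base URL is an assumption (the usual example port is used in the usage note), and the scheduling around `commit` is left to cron or a timer loop.

```python
import urllib.request


def build_commit_request(solr_base_url: str) -> urllib.request.Request:
    """Build a POST of <commit/> to the XML update handler under solr_base_url."""
    url = solr_base_url.rstrip("/") + "/update"
    return urllib.request.Request(
        url, data=b"<commit/>", headers={"Content-Type": "text/xml"}
    )


def commit(solr_base_url: str, timeout: float = 10.0) -> int:
    """Send the commit and return the HTTP status code."""
    request = build_commit_request(solr_base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `commit("http://localhost:8983/solr")` run every minute or so against the read-only instance would make it reopen its searcher and pick up newly written segments.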

Re: Tuning Solr caches with high commit rates (NRT)

2010-12-02 Thread stockii
Great thread, and exactly my problem :D I set up two Solr instances, one for updating the index and another for searching. When I perform an update, the search instance doesn't get the new documents. When I issue a commit on the searcher, it finds them. How can I tell the searcher to always look not onl

RE: Return Lucene DocId in Solr Results

2010-12-02 Thread Lohrenz, Steven
I know the doc ids from one core have nothing to do with the other. I was going to use the docId returned from the first core in the solr results and store it in the second core that way the second core knows about the doc ids from the first core. So when you query the second core from the Filte

Re: Troubles with forming query for solr.

2010-12-02 Thread Savvas-Andreas Moysidis
Hello, would something similar along those lines: (field1:term AND field2:term AND field3:term)^2 OR (field1:term AND field2:term)^0.8 OR (field2:term AND field3:term)^0.5 work? You'll probably need to experiment with the boost values to get the desired result. Another option could be investigat
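When experimenting with the boost values it can help to generate the query string instead of editing it by hand. The sketch below reproduces the query suggested above; the function names are illustrative, `field1` to `field3` are the placeholder field names from the message, and the term is assumed to need no escaping.

```python
def boosted_clause(fields, term, boost):
    """Build '(f1:term AND f2:term ...)^boost' for the given fields."""
    joined = " AND ".join(f"{name}:{term}" for name in fields)
    return f"({joined})^{boost}"


def build_query(term, boosts=(2, 0.8, 0.5)):
    """Combine the three field groups with OR, one boost per group."""
    groups = (
        ["field1", "field2", "field3"],
        ["field1", "field2"],
        ["field2", "field3"],
    )
    return " OR ".join(
        boosted_clause(fields, term, boost) for fields, boost in zip(groups, boosts)
    )


if __name__ == "__main__":
    print(build_query("term"))
```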

Re: Preventing index segment corruption when windows crashes

2010-12-02 Thread Michael McCandless
On Thu, Dec 2, 2010 at 4:53 AM, Peter Sturge wrote: > As I'm not familiar with the syncing in Lucene, I couldn't say whether > there's a specific problem with regards Win7/2008 server etc. > > Windows has long had the somewhat odd behaviour of deliberately > caching file handles after an explicit

Dataimport destroys our harddisks

2010-12-02 Thread Robert Gründler
Hi, we have a serious harddisk problem, and it's definitely related to a full-import from a relational database into a solr index. The first time it happened on our development server, where the raidcontroller crashed during a full-import of ~ 8 Million documents. This happened 2 weeks ago, and

Re: Best practice for Delta every 2 Minutes.

2010-12-02 Thread stockii
At the moment no OOM occurs, but we are not on the real live system yet ... I thought I might run into this problem ... We are running seven cores and each one needs to be updated very quickly. Only one core has a huge index, with 28M docs. Maybe it makes sense in the future to use Solr with replication? Or can I r

Re: Preventing index segment corruption when windows crashes

2010-12-02 Thread Peter Sturge
As I'm not familiar with the syncing in Lucene, I couldn't say whether there's a specific problem with regards Win7/2008 server etc. Windows has long had the somewhat odd behaviour of deliberately caching file handles after an explicit close(). This has been part of NTFS since NT 4 days, but there

Re: Preventing index segment corruption when windows crashes

2010-12-02 Thread Michael McCandless
On Thu, Dec 2, 2010 at 4:10 AM, Peter Sturge wrote: > The Win7 crashes aren't from disk drivers - they come from, in this > case, a Broadcom wireless adapter driver. > The corruption comes as a result of the 'hard stop' of Windows. > > I would imagine this same problem could/would occur on any OS

Re: Preventing index segment corruption when windows crashes

2010-12-02 Thread Peter Sturge
The Win7 crashes aren't from disk drivers - they come from, in this case, a Broadcom wireless adapter driver. The corruption comes as a result of the 'hard stop' of Windows. I would imagine this same problem could/would occur on any OS if the plug was pulled from the machine. Thanks, Peter On T

SOLR Thesaurus

2010-12-02 Thread lee carroll
Hi List, Coming to the end of a prototype evaluation of SOLR (all very good etc. etc.). Getting to the point of looking at bells and whistles. Does SOLR have a thesaurus? Can't find any reference to one in the docs or on the wiki etc. (Apart from a few mail threads which describe the synonym.txt as

Re: Restrict access to localhost

2010-12-02 Thread Peter Karich
for 1) use the Tomcat configuration in conf/server.xml: address="127.0.0.1" port="8080" ... for 2) if they have direct access to Solr either insert a middleware layer or create a write lock ;-) Hello all, 1) I want to restrict access to Solr only in localhost. How to achieve that? 2) If i wan