Index time boosts, payloads, and long query strings

2009-11-20 Thread Girish Redekar
Hi, I'm relatively new to Solr/Lucene, and am using Solr (and not Lucene directly) primarily because I can use it without writing Java code (the rest of my project is written in Python). My application has the following requirements: (a) ability to search over multiple fields, each with different weight

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Lance Norskog
And, terms whose documents have been deleted are not purged. So, you can merge all you like and the index will not shrink back completely. Only an optimize will remove the "orphan" terms. This is important because the orphan terms affect relevance calculations. So you really want to purge them wit

Re: index-time boost ... query

2009-11-20 Thread Lance Norskog
No, the reverse is true. Sorting is very very fast in Lucene. The first sort operation spends a lot of time making a data structure and then following sort calls use it. On Thu, Nov 19, 2009 at 1:52 PM, Anil Cherian wrote: > Hi David, > > I just now tried a sorting on the results and I got the re

Re: getting total index size & last update date/time from query

2009-11-20 Thread Lance Norskog
solr/admin/stats.jsp gives a much larger XML dump and also includes these two data items. Note that Luke can walk the entire index data structures, so if you have a large index it's like playing with fire. On Thu, Nov 19, 2009 at 8:54 AM, Binkley, Peter wrote: > The Luke request handler (normall

Re: Problem with SolrJ driver for Solr 1.4

2009-11-20 Thread Lance Norskog
Yes, these are both bugs. SolrJ should do field lists right, and distributed search should work exactly the same as normal search. Please file these in the JIRA. On Thu, Nov 19, 2009 at 8:32 AM, Asaf work wrote: > Hi, > > I'm using the SolrJ 1.4 client driver in a sharded Solr configuration and

Re: Control DIH from PHP

2009-11-20 Thread Lance Norskog
Nice! I didn't notice that before. Very useful. 2009/11/19 Noble Paul നോബിള്‍ नोब्ळ् : > you can pass the uniqueId as a param and use it in a sql query > http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters. > --Noble > > On Thu, Nov 19, 2009 at 3:53 PM, Pablo Ferrari wrote

Re: Using DirectSolrConnection with Solrj

2009-11-20 Thread Lance Norskog
DirectSolrConnection is older and has not been changed in a year. SolrJ is the preferred way to code an app against Solr. SolrJ with the Embedded server will have the same performance characteristics as DirectSolrConnection. On Thu, Nov 19, 2009 at 5:55 AM, dipti khullar wrote: > Hi Solr experts

Re: Solr index on multiple drives.

2009-11-20 Thread Otis Gospodnetic
Hi, No, dataDir is a single directory, so it is limited to a single partition on a single drive. But you can always have disks in RAID, and then it could be spread over multiple drives. Yes, if you have multiple Solr cores and multiple drives, you could put them on different drives for performance

Re: creating Lucene document from an external XML file.

2009-11-20 Thread Otis Gospodnetic
Hi, If I understand you correctly, you really want to be constructing SolrInputDocuments (not Lucene's Documents) and indexing those with SolrJ. I don't think there is anything in the API that can read in an XML file and convert it into a SolrInputDocument instance, but aren't there libraries

Re: Huge load and long response times during search

2009-11-20 Thread Otis Gospodnetic
Tom, It looks like the machine might simply be running too many things. If the load is around 1 when Solr is not running, and this is a dual-core server, it shows it's already relatively busy (circa 50% idle). Your caches are not small, so I am guessing you either have to have a relatively big h

RE: schema-based Index-time field boosting

2009-11-20 Thread Chris Hostetter
: The field boost attribute was put there by me back in the 1.3 days, when : I somehow gained the mistaken impression that it was supposed to work! : Of course, despite a lot of searching I haven't been able to find : anything to back up my position ;) solr has never supported anything like a "bo

Embedded solr with third party libraries

2009-11-20 Thread darniz
Hi, We are having an issue running our test cases with a third-party library for embedded Solr. For example, we are using the kstem library, which is not a part of the Solr distribution. When we run test cases our schema.xml has a definition for lucid kstem and it throws a ClassNotFound exception. We declared the depe

Re: comparing index-time boost and sort in the case of a date field

2009-11-20 Thread Smiley, David W.
Using index time boosting isn't really a substitute for sorting. It will be faster (I'm pretty sure) but isn't the same thing. The index time boost is going to influence the score but not totally become the score... which means that in all likelihood there will be documents in search results t

Huge load and long response times during search

2009-11-20 Thread Tomasz Kępski
Hi, I'm using Solr (1.4) to search among about 3,500,000 documents. After the server kernel was updated to 64-bit, the system started to suffer. Our server has 8 GB of RAM and two Intel Core 2 Duo CPUs. We used to have average loads around 2-2.5. It was not as good as it should be, but as long HTTP respo

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 2:32 PM, Michael wrote: > On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley > wrote: >> On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote: >>> So -- I thought I understood you to mean that if I frequently merge, >>> it's basically the same as an optimize, and cruft will get pu

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Michael
On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley wrote: > On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote: >> So -- I thought I understood you to mean that if I frequently merge, >> it's basically the same as an optimize, and cruft will get purged.  Am >> I misunderstanding you? > > That only appli

Re: Filtering query results

2009-11-20 Thread aseem cheema
Thank you much for your responses guys. I do not have ACL. I need to make a web service call to find out if a user has access to a document. I was hoping to get search results, call the web service with the IDs from the search results telling me what IDs the user has access to, and then filter othe

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 12:24 PM, Michael wrote: > So -- I thought I understood you to mean that if I frequently merge, > it's basically the same as an optimize, and cruft will get purged.  Am > I misunderstanding you? That only applies to the segments involved in the merge. The deleted document

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Michael
Hoss, Using Solr 1.4, I see constant index growth until an optimize. I commit (hundreds of updates) every 5 minutes and have a mergeFactor of 10, but every 50 minutes I don't see the index collapse down to its original size -- it's slightly larger. Over the course of a week, the index grew from

Re: Default sort order for filter query

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 11:28 AM, Yonik Seeley wrote: > On Fri, Nov 20, 2009 at 11:15 AM, Mike wrote: >> Sorry for the noise - I think I have just answered my own question. The >> order in which docs are indexed determines the result sort order unless >> overridden via sort query parameters :) > >

Re: Default sort order for filter query

2009-11-20 Thread Mike
Yonik Seeley wrote: On Fri, Nov 20, 2009 at 11:15 AM, Mike wrote: Sorry for the noise - I think I have just answered my own question. The order in which docs are indexed determines the result sort order unless overridden via sort query parameters :) Correct. The internal Lucene docume

comparing index-time boost and sort in the case of a date field

2009-11-20 Thread Anil Cherian
Hi, I have a requirement to get results in the order of latest date of a field called approval_dt, i.e., results having the latest approval date should appear first in the Solr results XML. A sorting "desc" on approval_dt gave me this. Can index-time boost be of use here to improve performance? Coul

Re: Default sort order for filter query

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 11:15 AM, Mike wrote: > Sorry for the noise - I think I have just answered my own question. The > order in which docs are indexed determines the result sort order unless > overridden via sort query parameters :) Correct. The internal Lucene document id is the tiebreaker fo
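
The tiebreak behaviour described in this reply can be pictured with a toy sketch. This is not Lucene code: `rank` and the `(docid, score)` tuples are hypothetical stand-ins, and the only assumption taken from the thread is that hits with equal scores fall back to the internal document id, which reflects index order.

```python
def rank(hits):
    """Order hits by score (highest first), breaking ties by document id.

    `hits` is a list of (docid, score) tuples. This is a hypothetical
    stand-in for a result set, not a Lucene or Solr data structure.
    """
    # Negate the score so higher scores sort first; the docid is the
    # secondary key, so equal-scored hits come out in index order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# With q=*:* every document gets the same score, so the order is
# decided entirely by the document id.
same_score = [(2, 1.0), (0, 1.0), (1, 1.0)]
ranked = rank(same_score)
```

Here `ranked` lists the documents in ascending docid order, which is why the indexing order shows through when no sort parameter is given.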

Re: Default sort order for filter query

2009-11-20 Thread Mike
Mike wrote: When I do a search using q=*:* and then narrow down the result set using a filter query, are there rules that are used for the sort order in the result set? In my results I have a "name" field that appears to be sorted descending in lexicographical order. For example: Wyoming Wynf

Default sort order for filter query

2009-11-20 Thread Mike
When I do a search using q=*:* and then narrow down the result set using a filter query, are there rules that are used for the sort order in the result set? In my results I have a "name" field that appears to be sorted descending in lexicographical order. For example: Wyoming Wynford Wrightsto

Re: Upgrade to solr 1.4

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 10:26 AM, kalidoss wrote: > In version 1.3 the EventDate field type is date; in 1.4 it is also date, but we are > getting the following error. Use the schema you had with 1.3 and it should work. The example schemas are not backward compatible with an index built with the previou

RE: Filtering query results

2009-11-20 Thread Glock, Thomas
Hi Aseem - I had a similar challenge. The solution that works for my case was to add "role" as a repeating string value in the solr schema. Each piece of content contains 1 or more roles and these values are supplied to solr for indexing. Users also have one or more roles (which correspond ex
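
A minimal sketch of how the role-based approach in this reply might look from a Python client, assuming the multi-valued field is named `role` as in the message and that the restriction is applied through Solr's `fq` (filter query) parameter. The helper names `build_role_filter` and `build_query_string` are hypothetical, and the truncated message does not say exactly how the roles were sent to Solr.

```python
from urllib.parse import urlencode


def build_role_filter(roles):
    """Return a filter clause matching documents that carry any given role."""
    if not roles:
        raise ValueError("at least one role is required")
    # Quote each role so spaces or colons in a role name are taken literally.
    quoted = ['"%s"' % r.replace("\\", "\\\\").replace('"', '\\"') for r in roles]
    return "role:(%s)" % " OR ".join(quoted)


def build_query_string(user_query, roles):
    """Combine the user's query with the role filter as URL parameters."""
    params = [("q", user_query), ("fq", build_role_filter(roles))]
    return urlencode(params)


role_filter = build_role_filter(["editor", "admin"])
query_string = build_query_string("solr", ["editor"])
```

The resulting string would be appended to the select URL of whatever Solr instance is in use. Keeping the roles in `fq` rather than in `q` leaves the relevance score untouched by the access restriction.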

RE: Index documents with Solr

2009-11-20 Thread javaxmlsoapdev
Glock, did you get this approach to work? let me know. Thanks, Glock, Thomas wrote: > > I have a similar situation but not expecting any easy setup. Currently > the tables contain both a url to the file and quite a bit of additional > metadata about the file. I'm planning one initial load to

Re: How to use DataImportHandler with ExtractingRequestHandler?

2009-11-20 Thread javaxmlsoapdev
Did you extend DIH to do this work? Can you share code samples? I have a similar requirement where I need to index database records and each record has a column with document path so need to create another index for documents (we allow users to search both indexes separately) in parallel with reading

RE: Solr Cell text extraction - non-issue

2009-11-20 Thread Ian Smith
Sorry guys, the bad request seemed to be caused elsewhere, no need to URL encode now. Ian. -Original Message- From: Ian Smith [mailto:ian.sm...@gossinteractive.com] Sent: 20 November 2009 15:26 To: solr-user@lucene.apache.org Subject: Solr Cell text extraction Hi Guys, I am trying to us

creating Lucene document from an external XML file.

2009-11-20 Thread Phanindra Reva
Hello All, I am a newbie using Solr and Lucene. In my task, I have to create org.apache.lucene.document.Document objects from external valid Solr XML files. To be brief, depending on the names of the fields I need to modify corresponding values which are specific to our project. So I wo

Re: Upgrade to solr 1.4

2009-11-20 Thread kalidoss
In version 1.3 the EventDate field type is date; in 1.4 it is also date, but we are getting the following error. name="EventDate">ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2008-05-16T07:19:28 -kalidoss.m, kalidoss wrote: Even i want to upgrade from v1.3 to 1.4 I did 1.3 index directory replace with

Solr Cell text extraction

2009-11-20 Thread Ian Smith
Hi Guys, I am trying to use Solr Cell to extract body content from documents, and also to pass along some literal field values. Trouble is, some of the literal fields contain spaces, colons etc. which cause a "bad request" exception in the server. However, if I URL encode these fields the encodi

Re: Upgrade to solr 1.4

2009-11-20 Thread kalidoss
I also want to upgrade from v1.3 to 1.4. I used the 1.3 index directory with 1.4 and made the associated schema changes. It's throwing a lot of exceptions, like datatype mismatches with Integer, String, Date, etc. Even the results are coming back with errors, for example: "name="Alias">ERROR:SCHEMA-INDEX

RE: Multi word synonym problem

2009-11-20 Thread Nair, Manas
Hi, I tried using the recommended approach but to no avail. The multiword synonyms are still not appearing in the result. My schema.xml has the following fieldType:

Solr index on multiple drives.

2009-11-20 Thread swatkatz
Hi, Can I have one instance of Solr write the index and data to multiple drives? e.g. Can I configure Solr to do something like - c:\data d:\data e:\data Or is the suggested way to use multiple Solr cores and have the application shard the index across the cores? Or is distributed search (by

Re: Filtering query results

2009-11-20 Thread Grant Ingersoll
On Nov 19, 2009, at 4:59 PM, aseem cheema wrote: > Hey Guys, > I need to filter out some results based on who is performing the > search. In other words, if a document is not accessible to a user > performing search, I don't want it to be in the result set. What is > the best/easiest way to do th

Re: field type definition

2009-11-20 Thread Grant Ingersoll
On Nov 20, 2009, at 7:22 AM, revas wrote: > Hello, > > If I define a field like this in the schema ,is this correct ? > > class="*solr.TextField*"positionIncrementGap > ="*100*"> > - > > >generateWordParts="*1*"generate

Re: Function queries question

2009-11-20 Thread Grant Ingersoll
On Nov 20, 2009, at 3:15 AM, Oliver Beattie wrote: > Hi all, > > I'm a relative newcomer to Solr, and I'm trying to use it in a project > of mine. I need to do a function query (I believe) to filter the > results so they are within a certain distance of a point. For this, I > understand I should

field type definition

2009-11-20 Thread revas
Hello, If I define a field like this in the schema ,is this correct ? - Here I am not differentiating it in terms of query analyzer and the index analyzer and I am assuming that this will be used by both query

Re: Solr - Load Increasing.

2009-11-20 Thread kalidoss
Thank you all. I have increased the heap size from 1 GB to 1.5 GB. Now it's java -Xms512M -Xmx1536M -jar start.jar. My CPU load is normal and Solr is not restarting frequently. I increased my autocommit maxDocs to 200. For the last 24 hours there has been no issue with load/restarts. Thanks Guys. Kalidos

Re: Configuring Solr to use RAMDirectory

2009-11-20 Thread Andrey Klochkov
I thought that SOLR-465 just does what is asked, i.e. one can use any Directory implementation including RAMDirectory. Thomas, take a look at it. On Thu, Nov 12, 2009 at 7:55 AM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > I think not out of the box, but look at SOLR-243 issue in JIRA

RE: schema-based Index-time field boosting

2009-11-20 Thread Ian Smith
Hi David, thanks for replying, The field boost attribute was put there by me back in the 1.3 days, when I somehow gained the mistaken impression that it was supposed to work! Of course, despite a lot of searching I haven't been able to find anything to back up my position ;) Unfortunately our cod

Function queries question

2009-11-20 Thread Oliver Beattie
Hi all, I'm a relative newcomer to Solr, and I'm trying to use it in a project of mine. I need to do a function query (I believe) to filter the results so they are within a certain distance of a point. For this, I understand I should use something like sqedist or hsin, and from the documentation o