RE: One item, multiple fields, and range queries

2011-04-08 Thread wojtekpia
Hi Hoss, I realize I'm reviving a really old thread, but I have the same need, and SpanNumericRangeQuery sounds like a good solution for me. Can you give me some guidance on how to implement that? Thanks, Wojtek -- View this message in context: http://lucene.472066.n3.nabble.com/One-item-multip

Re: SEVERE: Unable to move index file

2010-09-30 Thread wojtekpia
Hi, I ran into this problem again the other night. I've looked through my log files in more detail, and nothing seems out of place (I stripped user queries out and included it below). I have the following setup: 1. Indexer has 2 cores. One core gets incremental updates, the other is for full re-sy

Re: performance sorting multivalued field

2010-06-24 Thread wojtekpia
Chris Hostetter-3 wrote: > > sorting on a multivalued is defined to have un-specified behavior. it > might fail with an error, or it might fail silently. > I learned this the hard way, it failed silently for a long time until it failed with an error: http://lucene.472066.n3.nabble.com/Diffe

Re: DataImportHandler and running out of disk space

2010-06-03 Thread wojtekpia
https://issues.apache.org/jira/browse/SOLR-1939 SOLR-1939 created. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-and-running-out-of-disk-space-tp835125p868133.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: DataImportHandler and running out of disk space

2010-05-21 Thread wojtekpia
I ran through some more failure scenarios (scenarios and results below). The concerning ones in my deployment are when data does not get updated, but the DIH's .properties file does. I could only simulate that scenario when I ran out of disk space (all disk space issues behaved consistently).

DataImportHandler and running out of disk space

2010-05-21 Thread wojtekpia
I'm noticing some data differences between my database and Solr. About a week ago my Solr server ran out of disk space, so now I'm observing how the DataImportHandler behaves when Solr runs out of disk space. In a word, I'd say it behaves badly! It looks like out-of-disk-space exceptions are treat

SEVERE: Unable to move index file

2010-05-12 Thread wojtekpia
Hi I ran into a replication issue yesterday and I have no explanation for it. I see the following in my logs: SEVERE: Unable to move index file from: /my/dir/Solr/data/property/index.20100511050029/_3zj.fdt to: /my/dir/Solr/data/property/index.20100511042539/_3zj.fdt I restarted the subscriber th

Re: Sanity check on numeric types and which of them to use

2010-05-07 Thread wojtekpia
> 3) The only reason to use a "sint" field is for backward compatibility > and/or to use sortMissingFirst/SortMissingLast, correct? > I'm using sint so I can facet and sort facets numerically. -- View this message in context: http://lucene.472066.n3.nabble.com/Sanity-check-on-numeric-types-

Discovering Slaves

2010-02-15 Thread wojtekpia
Is there a way to 'discover' slaves using ReplicationHandler? I'm writing a quick dashboard, and don't have access to a list of slaves, but would like to show some stats about their health. -- View this message in context: http://old.nabble.com/Discovering-Slaves-tp27601334p27601334.html Sent fr

Re: Google Commerce Search

2010-01-19 Thread wojtekpia
While Solr is functionally platform independent, I have seen much better performance on Linux than Windows under high load (related to SOLR-465). MitchK wrote: > > As you know, Solr is fully written in Java and Java is still > platform-independent. ;) > Learn more about Solr on http://www.luc

Re: Dynamically change config file name in DataImportHandler

2010-01-14 Thread wojtekpia
I thought of another way: have two data import request handlers configured in solrconfig.xml, one for each file. wojtekpia wrote: > > I have 2 data import files, and I'd like to be able to switch between > without renaming either file, and without changing solrconfig.x

Dynamically change config file name in DataImportHandler

2010-01-14 Thread wojtekpia
I have 2 data import files, and I'd like to be able to switch between without renaming either file, and without changing solrconfig.xml. Does the DataImportHandler support that? I tried passing a 'config' parameter with the 'reload-config' command, but that didn't work. Thanks, Wojtek -- View thi

Replication Condition (Swapping indexers)

2010-01-13 Thread wojtekpia
Hi, I have a deployment with 2 indexers (2 cores in a single servlet container), and a farm of searchers that replicate from one of the indexers. Once in a while I need to re-index all my data, so I do that on my second indexer (while my original indexer still gets incremental updates), then swap

Re: question about schemas (and SOLR-1131?)

2009-12-04 Thread wojtekpia
Could this be solved with a multi-valued custom field type (including a custom comparator)? The OP's situation deals with multi-valuing products for each customer. If products contain strictly numeric fields then it seems like a custom field implementation (or extension of BinaryField?) *should* b

Re: javabin in .NET?

2009-11-12 Thread wojtekpia
I was thinking of going this route too because I've found that parsing XML result sets using XmlDocument + XPath can be very slow (up to a few seconds) when requesting ~100 documents. Are you getting good performance parsing large result sets? Are you using SAX instead of DOM? Thanks, Wojtek ma

Re: number of Solr indexes per Tomcat instance

2009-10-23 Thread wojtekpia
I ran into trouble running several cores (either as Solr multi-core or as separate web apps) in a single JVM because the Java garbage collector would freeze all cores during a collection. This may not be an issue if you're not dealing with large amounts of memory. My solution is to run each web ap

Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-16 Thread wojtekpia
at's the field > type? > > -Yonik > http://www.lucidimagination.com > > On Fri, Oct 16, 2009 at 3:01 PM, wojtekpia wrote: >> >> I'm seeing the same behavior and I don't have any custom query parsing >> plugins. Similar to the original post, my queries li

Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-16 Thread wojtekpia
I'm seeing the same behavior and I don't have any custom query parsing plugins. Similar to the original post, my queries like: select?q=field:[1 TO *] select?q=field:[1 TO 2] select?q=field:[1 TO 2]&debugQuery=true work correctly, but including an unbounded range appears to break the debug compon

Changing masterUrl in ReplicationHandler at Runtime

2009-10-09 Thread wojtekpia
Hi, I'm trying to change the masterUrl of a search slave at runtime. So far I've found 2 ways of doing it: 1. Change solrconfig_slave.xml on master, and have it replicate to solrconfig.xml on the slave 2. Change solrconfig.xml on slave, then issue a core reload command. (a side note: can I issue

Different sort behavior on same code

2009-10-06 Thread wojtekpia
Hi, I'm running Solr version 1.3.0.2009.07.08.08.05.45 in 2 environments. I have a field defined as: The two environments have different data, but both have single and multi valued entries for myDate. On one environment sorting by myDate works (sort seems to be by the 'last' value if multi va

Multi-valued field cache

2009-09-30 Thread wojtekpia
I want to build a FunctionQuery that scores documents based on a multi-valued field. My intention was to use the field cache, but that doesn't get me multiple values per document. I saw other posts suggesting UnInvertedField as the solution. I don't see a method in the UnInvertedField class that w

Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia
Note that if I change my import file to explicitly list all my files (instead of using the FileListEntityProcessor) as below then everything works as I expect. ... -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25

Re: FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia
Fergus McMenemie-2 wrote: > > > Can you provide more detail on what you are trying to do? ... > You seem to listing all files "d:\my\directory\.*WRK". Do > these WRK files contain lists of files to be indexed? > > That is my complete data config file. I have a directory containing a bunch

FileListEntityProcessor and LineEntityProcessor

2009-09-16 Thread wojtekpia
Hi, I'm trying to import data from a list of files using the FileListEntityProcessor. Here is my import configuration: If I have only one file in d:\my\directory\ then everything works correctly. If I have multiple files then I get the following exception: Sep

Re: Backups using Replication

2009-09-11 Thread wojtekpia
I've verified that renaming backAfter to snapshot works (I should've checked before asking). Thanks Noble! wojtekpia wrote: > > > > > ... > optimize > ... > > > > > -- View this message in context: h

Re: Backups using Replication

2009-09-11 Thread wojtekpia
Do you mean that it's been renamed, so this should work? ... optimize ... Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > before that backupAfter was called "snapshot" > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25407695.

Re: Passing FuntionQuery string parameters

2009-09-10 Thread wojtekpia
It looks like parseArg was added on Aug 20, 2009. I'm working with slightly older code. Thanks! Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > did you implement your own ValueSourceParser . the > FunctionQParser#parseArg() method supports strings > > On Wed, Sep 9, 2009 at 12:10

Re: Backups using Replication

2009-09-10 Thread wojtekpia
I'm using trunk from July 8, 2009. Do you know if it's more recent than that? Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > which version of Solr are you using? the "backupAfter" name was > introduced recently > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25

Passing FuntionQuery string parameters

2009-09-08 Thread wojtekpia
Hi, I'm writing a function query to score documents based on Levenshtein distance from a string. I want my function calls to look like: lev(myFieldName, 'my string to match') I'm running into trouble parsing the string I want to match ('my string to match' above). It looks like all the built in

Backups using Replication

2009-09-08 Thread wojtekpia
I'm trying to create data backups using the ReplicationHandler's built in functionality. I've configured my master as http://wiki.apache.org/solr/SolrReplication documented : ... optimize ... but I don't see any backups created on the master. Do I need the snapshooter scrip

RE: Searching and Displaying Different Logical Entities

2009-08-27 Thread wojtekpia
Funtick wrote: > >>then 2) get all P's by ID, including facet counts, etc. >>The problem I face with this solution is that I can have many matching P's > (10,000+), so my second query will have many (10,000+) constraints. > > SOLR can automatically provide you P's with Counts, and it will be >

Searching and Displaying Different Logical Entities

2009-08-26 Thread wojtekpia
I'm trying to figure out if Solr is the right solution for a problem I'm facing. I have 2 data entities: P(arent) & C(hild). P contains up to 100 instances of C. I need to expose an interface that searches attributes of entity C, but displays them grouped by parent entity, P. I need to include fac

Re: Facets with an IDF concept

2009-08-13 Thread wojtekpia
Hi Asif, Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms: A: appears twice in half the search results B: appears once in every search result I think term A is more "interesting". Using facets sorted by freque

Re: Solr CMS Integration

2009-08-07 Thread wojtekpia
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing lis

Solr CMS Integration

2009-08-07 Thread wojtekpia
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with S

Re: Dedicated Slave Master

2009-07-16 Thread wojtekpia
Hey Grant, It's a middleman, not a backup. We don't have any issues in the current setup, just trying to make sure we have a solution in case this becomes an issue. I'm concerned about a situation with dozens of searchers. The i/o and network load on the indexer might become significant at that p

Dedicated Slave Master

2009-07-15 Thread wojtekpia
I'm building a high load system that will require several search slaves (at least 2, but this may grow to 5-10+ in the near future). I plan to have a single indexer that replicates to the search slaves. I want indexing to be as fast as possible, so I've considered adding another machine between my

Custom Values in dataimport.properties

2009-05-29 Thread wojtekpia
Hi, I'd like to include a data version in my index, and it looks like dataimport.properties would be a nice place for it. Is there a way to add a custom name-value pair to that file? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Custom-Values-in-dataimport.properties-

Solr vs Sphinx

2009-05-13 Thread wojtekpia
I came across this article praising Sphinx: http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article specifically mentions Solr as an 'aging' technology, and states that performance on Sphinx is 2x-4x faster than Solr. Has anyone compared Sphinx to Solr? Or used Sphinx in the past? I re

Re: JVM exception_access_violation

2009-05-08 Thread wojtekpia
I updated to Java 6 update 13 and have been running problem free for just over a month. I'll continue this thread if I run into any problems that seem to be related. Yonik Seeley-2 wrote: > > I assume that you're not using any Tomcat native libs? If you are, > try removing them... if not (and

Re: preImportDeleteQuery

2009-05-08 Thread wojtekpia
I'm using full-import, not delta-import. I tried it with delta-import, and it would work, except that I'm querying for a large number of documents so I can't afford the cost of deltaImportQuery for each document. It sounds like $deleteDocId will work. I just need to update from 1.3 to trunk. Than

preImportDeleteQuery

2009-05-07 Thread wojtekpia
Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but doesn't se

Sorting by 'starts with'

2009-05-07 Thread wojtekpia
I have an index of product names. I'd like to sort results so that entries starting with the user query come first. E.g. q=kitchen Results would sort something like: 1. kitchen appliance 2. kitchenaid dishwasher 3. fridge for kitchen It looks like using a query Function Query comes close, but
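For illustration only, the ordering asked for above can be sketched outside Solr as a stable sort that puts names starting with the query first. This is a client-side sketch, not a Solr function query; `starts_with_first` is a hypothetical helper name, and the sample names are the ones from the post.

```python
def starts_with_first(names, query):
    """Return names with those that start with `query` first.

    sorted() is stable, so the original relative order is kept
    inside each of the two groups.
    """
    q = query.lower()
    # False sorts before True, so prefix matches come first.
    return sorted(names, key=lambda name: not name.lower().startswith(q))


if __name__ == "__main__":
    hits = ["fridge for kitchen", "kitchen appliance", "kitchenaid dishwasher"]
    for rank, name in enumerate(starts_with_first(hits, "kitchen"), start=1):
        print(rank, name)
```

Doing this inside the search engine would need a different mechanism; the sketch only pins down what "starts with comes first" means for the three sample results.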

JVM exception_access_violation

2009-03-20 Thread wojtekpia
I'm running Solr on Tomcat 6.0.18 with Java 6 update 7 on Windows 2003 64 bit. Over the past month or so, my JVM has crashed twice with the error below. Has anyone experienced this? My system is not heavily loaded, and the crash seems to coincide with an update (via DIH). I'm running trunk code fr

Re: Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread wojtekpia
Thanks Otis. Do you know what the most common deployment OS is? I couldn't find much on the mailing list or http://wiki.apache.org/solr/PublicServers Otis Gospodnetic wrote: > > > You should be fine on either Linux or FreeBSD (or any other UNIX flavour). > Running on Solaris would probably gi

Redhat vs FreeBSD vs other unix flavors

2009-02-27 Thread wojtekpia
Is there a recommended unix flavor for deploying Solr on? I've benchmarked my deployment on Red Hat. Our operations team asked if we can use FreeBSD instead. Assuming that my benchmark numbers are consistent on FreeBSD, is there anything else I should watch out for? Thanks. Wojtek -- View this

Re: Reading Core-Specific Config File in a Row Transformer

2009-02-18 Thread wojtekpia
Thanks Shalin. I think you missed the call to .getResourceLoader(), so it should be: context.getSolrCore().getResourceLoader().getInstanceDir() Works great, thanks! Shalin Shekhar Mangar wrote: > > > You can use Context.getSolrCore().getInstanceDir() > > -- View this message in context:

Reading Core-Specific Config File in a Row Transformer

2009-02-17 Thread wojtekpia
I'm using the DataImportHandler to load data. I created a custom row transformer, and inside of it I'm reading a configuration file. I am using the system's solr.solr.home property to figure out which directory the file should be in. That works for a single-core deployment, but not for multi-core

Re: Recent Paging Change?

2009-02-11 Thread wojtekpia
This was a false alarm, sorry. I misinterpreted some results. wojtekpia wrote: > > Has there been a recent change (since Dec 2/08) in the paging algorithm? > I'm seeing much worse performance (75% drop in throughput) when I request > 20 records starting at record 1

Re: Performance degradation caused by choice of range fields

2009-02-11 Thread wojtekpia
Yes, I commit roughly every 15 minutes (via a data update). This update is consistent between my tests, and only causes a performance drop when I'm sorting on fields with many unique values. I've examined my GC logs, and they are also consistent between my tests. Otis Gospodnetic wrote: > > Hi

Re: Recent Paging Change?

2009-02-11 Thread wojtekpia
I'll run a profiler on new and old code and let you know what I find. I have changed my schema between tests: I used to have termVectors turned on for several fields, and now they are always off. My underlying data has not changed. -- View this message in context: http://www.nabble.com/Recent-P

Recent Paging Change?

2009-02-10 Thread wojtekpia
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application). Thanks. Wojtek -- View this message in context: http://www.nabble.com/Recent-Paging-
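As a small worked example of the numbers in the post above: with 20 records per page, page 10 begins at offset 180, assuming 1-based page numbers and a zero-based offset (the convention of Solr's `start` and `rows` parameters). `page_to_start` is a hypothetical helper name.

```python
def page_to_start(page, rows):
    """Convert a 1-based page number to a zero-based start offset."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must both be at least 1")
    return (page - 1) * rows


if __name__ == "__main__":
    # Page 10 at 20 records per page, as described in the post.
    print(page_to_start(10, 20))
```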

Performance degradation caused by choice of range fields

2009-02-09 Thread wojtekpia
In my schema I have two copies of my numeric fields: one with the original value (used for display, sort), and one with a rounded version of the original value (used for range queries). When I use my rounded field for numeric range queries (e.g. q=RoundedValue:[100 TO 1000]), I see very consisten

Re: Performance "dead-zone" due to garbage collection

2009-02-09 Thread wojtekpia
I tried sorting using a function query instead of the Lucene sort and found no change in performance. I wonder if Lance's results are related to something specific to his deployment? -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21

Re: Performance "dead-zone" due to garbage collection

2009-02-09 Thread wojtekpia
Luckily, I'm still hitting my performance requirements, so I'm able to accept that. Thanks for the tips! Wojtek yonik wrote: > > On Tue, Feb 3, 2009 at 11:58 AM, wojtekpia wrote: >> I noticed your wiki post about sorting with a function query instead of >> the >

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Ok, so maybe a better question is: should I bother trying to change the "sorting" algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to verify). Does it become a severe bottleneck? Do you know if alternate sor

Re: Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
That's not quite what I meant. I'm not looking for a custom comparator, I'm looking for a custom sorting algorithm. Is there a way to use quick sort or merge sort or... rather than the current algorithm? Also, what is the current algorithm? Otis Gospodnetic wrote: > > > You can use one of the

Queued Requests during GC

2009-02-04 Thread wojtekpia
During full garbage collection, Solr doesn't acknowledge incoming requests. Any requests that were received during the GC are timestamped the moment GC finishes (at least that's what my logs show). Is there a limit to how many requests can queue up during a full GC? This doesn't seem like a Solr s

Custom Sorting Algorithm

2009-02-04 Thread wojtekpia
Is there an easy way to choose/create an alternate sorting algorithm? I'm frequently dealing with large result sets (a few million results) and I might be able to benefit from domain knowledge in my sort. -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21837721.ht

Re: Performance "dead-zone" due to garbage collection

2009-02-03 Thread wojtekpia
I noticed your wiki post about sorting with a function query instead of the Lucene sort mechanism. Did you see a significantly reduced memory footprint by doing this? Did you reduce the number of fields you allowed users to sort by? Lance Norskog-2 wrote: > > Sorting creates a large array with

Solr on Sun Java Real-Time System

2009-01-30 Thread wojtekpia
Has anyone tried Solr on the Sun Java Real-Time JVM (http://java.sun.com/javase/technologies/realtime/index.jsp)? I've read that it includes better control over the garbage collector. Thanks. Wojtek -- View this message in context: http://www.nabble.com/Solr-on-Sun-Java-Real-Time-System-tp2175

RE: Performance "dead-zone" due to garbage collection

2009-01-30 Thread wojtekpia
I profiled our application, and GC is definitely the problem. The IBM JVM didn't change much. I'm currently looking into ways of reducing my memory footprint. -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21758001.html S

Re: Intermittent high response times

2009-01-23 Thread wojtekpia
The type of garbage collector definitely affects performance, but there are other settings as well. There's a related thread currently discussing this: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-td21588427.html hbi dev wrote: > > Hi wojtekpia,

Re: Performance "dead-zone" due to garbage collection

2009-01-22 Thread wojtekpia
I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE paths, is there anything else I need to do to run inside the IBM JVM? (e.g. re-compiling?) Walter Underwood wrote: > > What JVM and garbage collector setting? We are using the IBM JVM with > their concurre

Re: Intermittent high response times

2009-01-22 Thread wojtekpia
I'm experiencing similar issues. Mine seem to be related to old generation garbage collection. Can you monitor your garbage collection activity? (I'm using JConsole to monitor it: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html). In my system, garbage collection usually doesn'

Re: Performance "dead-zone" due to garbage collection

2009-01-21 Thread wojtekpia
(Thanks for the responses) My filterCache hit rate is ~60% (so I'll try making it bigger), and I am CPU bound. How do I measure the size of my per-request garbage? Is it (total heap size before collection - total heap size after collection) / # of requests to cause a collection? I'll try your
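The estimate proposed in the question above can be written out as plain arithmetic. This only restates the formula being asked about; whether it is the right measure is exactly what the post asks. The function name and the sample figures are hypothetical.

```python
def garbage_per_request(heap_before, heap_after, requests):
    """Average bytes of garbage per request between two collections.

    heap_before / heap_after: heap usage in bytes just before and just
    after a collection; requests: number of requests served since the
    previous collection.
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    return (heap_before - heap_after) / requests


if __name__ == "__main__":
    # Hypothetical figures: 1.0 GB before, 0.4 GB after, 3000 requests.
    print(garbage_per_request(1_000_000_000, 400_000_000, 3000))
```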

Re: Performance "dead-zone" due to garbage collection

2009-01-21 Thread wojtekpia
I'm using a recent version of Sun's JVM (6 update 7) and am using the concurrent generational collector. I've tried several other collectors, none seemed to help the situation. I've tried reducing my heap allocation. The search performance got worse as I reduced the heap. I didn't monitor the gar

Re: Performance Hit for Zero Record Dataimport

2009-01-21 Thread wojtekpia
Created SOLR 974: https://issues.apache.org/jira/browse/SOLR-974 -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21588634.html Sent from the Solr - User mailing list archive at Nabble.com.

Performance "dead-zone" due to garbage collection

2009-01-21 Thread wojtekpia
I'm intermittently experiencing severe performance drops due to Java garbage collection. I'm allocating a lot of RAM to my Java process (27GB of the 32GB physically available). Under heavy load, the performance drops approximately every 10 minutes, and the drop lasts for 30-40 seconds. This coinci

Re: Performance Hit for Zero Record Dataimport

2009-01-21 Thread wojtekpia
Thanks Shalin, a short circuit would definitely solve it. Should I open a JIRA issue? Shalin Shekhar Mangar wrote: > > I guess Data Import Handler still calls commit even if there were no > documents created. We can add a short circuit in the code to make sure > that > does not happen. > --

Performance Hit for Zero Record Dataimport

2009-01-20 Thread wojtekpia
I have a transient SQL table that I use to load data into Solr using the DataImportHandler. I run an update every 15 minutes (dataimport?command=full-import&clean=false&optimize=false), but my table will frequently have no new data for me to import. When the table contains no data, it looks like S

Overlapping Replication Scripts

2009-01-08 Thread wojtekpia
I have set up cron jobs that update my index every 15 minutes. I have a distributed setup, so the steps are: 1. Update index on indexer machine (and possibly optimize) 2. Invoke snapshooter on indexer 3. Invoke snappuller on searcher 4. Invoke snapinstaller on searcher. These updates are small, d

Re: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
I'm optimizing because I thought I should. I'll be updating my index somewhere between every 15 minutes, and every 2 hours. That means between 12 and 96 updates per day. That seems like a lot of index files (and it scared me a little), so that's my second reason for wanting to optimize nightly. I
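The "between 12 and 96 updates per day" figure above follows directly from the two intervals mentioned; a quick check, with a hypothetical helper name:

```python
MINUTES_PER_DAY = 24 * 60


def updates_per_day(interval_minutes):
    """Number of updates in a day for a fixed update interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_DAY // interval_minutes


if __name__ == "__main__":
    print(updates_per_day(120))  # one update every 2 hours
    print(updates_per_day(15))   # one update every 15 minutes
```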

Re: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
I use my warm up queries to fill the field cache (or at least that's the idea). My filterCache hit rate is ~99% & queryResultCache is ~65%. I update my index several times a day with no 'optimize', and performance is seamless. I also update my index once nightly with an 'optimize', and that's wh

RE: Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
Sorry, I forgot to include that. All my autowarmcount's are set to 0. Feak, Todd wrote: > > First suspect would be Filter Cache settings and Query Cache settings. > > If they are auto-warming at all, then there is a definite difference > between the first start behavior and the post-commit beh

Snapinstaller vs Solr Restart

2009-01-06 Thread wojtekpia
I'm running load tests against my Solr instance. I find that it typically takes ~10 minutes for my Solr setup to "warm-up" while I throw my test queries at it. Also, I have the same two warm-up queries specified for the firstSearcher and newSearcher event listeners. I'm now benchmarking the affe

Re: new faceting algorithm

2008-12-12 Thread wojtekpia
It looks like my filterCache was too big. I reduced my filterCache size from 700,000 to 20,000 (without changing the heap size) and all my performance issues went away. I experimented with various GC settings, but none of them made a significant difference. I see a 16% increase in throughput by a

Re: Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia
Reducing the amount of memory given to java slowed down Solr at first, then quickly caused the garbage collector to behave badly (same issue as I referenced above). I am using the concurrent cache for all my caches. -- View this message in context: http://www.nabble.com/Smaller-filterCache-giv

Smaller filterCache giving better performance

2008-12-05 Thread wojtekpia
I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: Reducing filterCache size has increased performance. I have posted my setup here: http://www.nabble.com/Throughput-Optimization-td20335132.html. My original fil

Re: NIO not working yet

2008-12-04 Thread wojtekpia
I've updated my deployment to use NIOFSDirectory. Now I'd like to confirm some previous results with the original FSDirectory. Can I turn it off with a parameter? I tried: java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.FSDirectory ... but that didn't work. -- View this mes

Re: new faceting algorithm

2008-12-04 Thread wojtekpia
Yonik Seeley wrote: > > > Are you doing commits at any time? > One possibility is the caching mechanism (weak-ref on the > IndexReader)... that's going to be changing soon hopefully. > > -Yonik > No commits during this test. Should I start looking into my heap size distribution and garbage

Re: Throughput Optimization

2008-12-04 Thread wojtekpia
New faceting stuff off because I'm encountering some problems when I turn it on, I posted the details: http://www.nabble.com/new-faceting-algorithm-td20674902.html#a20840622 Yonik Seeley wrote: > > On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia <[EMAIL PROTECTED]> wrote: >

Re: new faceting algorithm

2008-12-04 Thread wojtekpia
. I described my deployment scenario in an earlier post: http://www.nabble.com/Throughput-Optimization-td20335132.html Does it sound like the new faceting algorithm could be the culprit? wojtekpia wrote: > > Definitely, but it'll take me a few days. I'll also report findin

Re: Throughput Optimization

2008-12-04 Thread wojtekpia
It looks like file locking was the bottleneck - CPU usage is up to ~98% (from the previous peak of ~50%). I'm running the trunk code from Dec 2 with the faceting improvement (SOLR-475) turned off. Thanks for all the help! Yonik Seeley wrote: > > FYI, SOLR-465 has been committed. Let us know if

Re: new faceting algorithm

2008-12-02 Thread wojtekpia
Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks) Noble Paul നോബിള്‍ नोब्ळ् wrote: > > wojtek, you can report back the numbers if possible > > It would be nice to know how the new impl performs in real-world > > > -- Vi

Re: new faceting algorithm

2008-12-02 Thread wojtekpia
Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case. Yonik Seeley wrote: > > And if you want to verify that the new faceting code has indeed kicked > in, some statistics are logged, like: > > Nov 24, 2008 11:14:32

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
I'd like to integrate this improvement into my deployment. Is it just a matter of getting the latest Lucene jars (Lucene nightly build)? Yonik Seeley wrote: > > You're probably hitting some contention with the locking around the > reading of index files... this has been recently improved in Luc

RE: Throughput Optimization

2008-11-05 Thread wojtekpia
I'll try changing my other caches to LRUCache and observe performance. Interestingly, the FastLRUCache has given me a ~10% increase in performance, much lower than I've read on the SOLR-667 thread. Would compressing some of my stored fields significantly improve performance? Most of my stored fie

RE: Throughput Optimization

2008-11-05 Thread wojtekpia
iginal Message- > From: wojtekpia [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 05, 2008 8:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Throughput Optimization > > > Yes, I am seeing evictions. I've tried setting my filterCache higher, > but > then I

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
Where is the alt directory in the source tree (or what is the JIRA issue number)? I'd like to apply this patch and re-run my tests. Does changing the lockType in solrconfig.xml address this issue? (My lockType is the default - single). markrmiller wrote: > > The latest alt directory patch uses

Re: Throughput Optimization

2008-11-05 Thread wojtekpia
If so, it isn't set large enough to handle the faceting > you're doing. > > Erik > > > On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: > >> >> I've been running load tests over the past week or 2, and I can't >> figure out >>

Throughput Optimization

2008-11-04 Thread wojtekpia
I've been running load tests over the past week or 2, and I can't figure out my system's bottleneck that prevents me from increasing throughput. First I'll describe my Solr setup, then what I've tried to optimize the system. I have 10 million records and 59 fields (all are indexed, 37 are stored

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
ad of string? > > > Thank you very much for the help by the way. > > > On Tue, Sep 23, 2008 at 2:49 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Your fields are all of string type. String fields aren't tokenized or >> analyzed, so you have to

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
ot sure why it's not working. We use > this live and do very complex queries including facets that work fine. > > www.donorschoose.org > > > > On Tue, Sep 23, 2008 at 2:20 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Try a query where you're sure

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
ied on > > stored="true"/> > compressed="true"/> > > > > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Make sure the fields you're trying to highlight are stored in your schema &g

Re: Highlight Fragments

2008-09-23 Thread wojtekpia
Make sure the fields you're trying to highlight are stored in your schema (e.g. ) David Snelling-2 wrote: > > Ok, I'm very frustrated. I've tried every configuration I can and > parameters > and I cannot get fragments to show up in the highlighting in solr. (no > fragments at the bottom or hig

Re: dataimporter.last_index_time not set for full-import query

2008-09-10 Thread wojtekpia
I created a JIRA issue for this and attached a patch: https://issues.apache.org/jira/browse/SOLR-768 wojtekpia wrote: > > I would like to use (abuse?) the dataimporter.last_index_time variable in > my full-import query, but it looks like that variable is only set when > running a

dataimporter.last_index_time not set for full-import query

2008-09-10 Thread wojtekpia
I would like to use (abuse?) the dataimporter.last_index_time variable in my full-import query, but it looks like that variable is only set when running a delta-import. My use case: I'd like to use a stored procedure to manage how data is given to the DataImportHandler so I can gracefully handle

Re: Faceting MoreLikeThisComponent results

2008-09-08 Thread wojtekpia
Thanks Hoss. I created SOLR 760: https://issues.apache.org/jira/browse/SOLR-760 hossman wrote: > > > : When using the MoreLikeThisHandler with facets turned on, the facets > show > : counts of things that are more like my original document. When I use the > : MoreLikeThisComponent, the facets

Re: Creating dynamic fields with DataImportHandler

2008-08-29 Thread wojtekpia
e (if you know it beforehand) > to > your data config and use the Transformer to set the value. If you don't > know > the field name before hand then this will not work for you. > > On Sat, Aug 30, 2008 at 1:31 AM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >
