Re: performance sorting multivalued field
Chris Hostetter-3 wrote: > > sorting on a multivalued is defined to have un-specified behavior. it > might fail with an error, or it might fail silently. > I learned this the hard way, it failed silently for a long time until it failed with an error: http://lucene.472066.n3.nabble.com/Different-sort-behavior-on-same-code-td503761.html -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p920012.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SEVERE: Unable to move index file
Hi, I ran into this problem again the other night. I've looked through my log files in more detail, and nothing seems out of place (I stripped user queries out and included it below). I have the following setup: 1. Indexer has 2 cores. One core gets incremental updates, the other is for full re-syncs with a database. The last step in my full re-sync process is to swap cores (so that the searchers don't have to change their replication master URLs). 2. Searcher that is subscribed to a constant indexer URL. I noticed this replication error occurred right after I swapped my indexer's cores. Since the index version and generation numbers are independent across the 2 cores, could the searcher's index clean up be pre-emptively deleting the active searcher index? When the error occurred, index.20100921053730 did not exist, but index.properties was pointing to it. Previous entries in the log make it seem like the directory did exist a few minutes earlier (replication + warmup succeeded pointing at that directory). I've tried to reproduce this in a development environment, but haven't been able to so far. https://issues.apache.org/jira/browse/SOLR-1822?focusedCommentId=12845175 SOLR-1822 seems to address a similar issue. I suspect that it would solve what I'm seeing, but it treats the symptom rather than the cause (and I'd like to be able to repro before trying it). Any insight/theories are appreciated. Thanks, Wojtek Sep 21, 2010 5:35:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Master's version: 1271723727936, generation: 18616 Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave's version: 1271723727935, generation: 18615 Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Number of files in latest index in master: 118 Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13n9.prx Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13nx.fnm ... Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13m5.fnm ... 
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13n9.frq Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Total time taken for download : 0 secs Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2 __AW_commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false) Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher INFO: Opening searc...@61080339 main Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2 __AW_commit INFO: end_commit_flush Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm INFO: autowarming searc...@61080339 main from searc...@26aebd8c main fieldValueCache{lookups=866,hits=866,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493365,cumulative_hits=493351,cumulative_hitratio=0.99,cumulative_inserts=7,cumulative_evictions=0,item_FeaturesFacet={field=FeaturesFacet,memSize=51896798,tindexSize=56,time=988,phase1=936,nTerms=50,bigTerms=9,termInstances=5403271,uses=146},...} ... Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm INFO: autowarming result for searc...@61080339 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=2036931,cumulative_hits=836191,cumulative_hitratio=0.41,cumulative_inserts=1200740,cumulative_evictions=1103563} Sep 21, 2010 5:37:31 PM org.apache.solr.core.QuerySenderListener __AW_newSearcher INFO: QuerySenderListener sending requests to searc...@61080339 main Sep 21, 2010 5:37:31 PM org.apache.solr.request.UnInvertedField uninvert INFO: UnInverted multi-valued field {field=BedFacet,memSize=48178130,tindexSize=42,time=313,phase1=261,nTerms=6,bigTerms=4,termInstances=328351,uses=0} ... INFO: [] webapp=null path=null params={*:*} hits=11546888 status=0 QTime=20687 Sep 21, 2010 5:37:58 PM org.apache.solr.core.QuerySenderListener __AW_newSearcher INFO: QuerySenderListener done. Sep 21, 2010 5:37:58 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@61080339 main Sep 21, 2010 5:37:58 PM org.apache.solr.search.SolrIndexSearcher __AW_close INFO: Closing searc...@26aebd8c main fieldValueCache{lookups=950,hits=950,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493449,cumulative_hits=493435,cumulative_hitratio=0.99,cumulat
RE: One item, multiple fields, and range queries
Hi Hoss, I realize I'm reviving a really old thread, but I have the same need, and SpanNumericRangeQuery sounds like a good solution for me. Can you give me some guidance on how to implement that? Thanks, Wojtek -- View this message in context: http://lucene.472066.n3.nabble.com/One-item-multiple-fields-and-range-queries-tp475030p2796613.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr CMS Integration
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer-facing website with a combination of articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr CMS Integration
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facets with an IDF concept
Hi Asif, Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms: A: appears twice in half the search results B: appears once in every search result I think term A is more "interesting". Using facets sorted by frequency, term B is more important (since it shows up first). To me, terms that appear in all documents aren't really that interesting. I'm thinking of using a combination of document count (in the result set, not globally) and term frequency (in the result set, not globally) to come up with a facet sort order. Wojtek -- View this message in context: http://www.nabble.com/Facets-with-an-IDF-concept-tp24071160p24959192.html Sent from the Solr - User mailing list archive at Nabble.com.
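Solr doesn't offer this sort order out of the box, so the ranking would have to be computed client-side or in a custom component from per-result-set counts. A rough sketch of the weighting I have in mind, assuming you can obtain the result-set size plus each term's document frequency and total term frequency within that result set (the class and method names below are made up for illustration):

  public class FacetInterestingness {
      /**
       * Score a facet term by an IDF-like weight computed over the current
       * result set instead of by raw document frequency.
       *   numResults - documents in the result set
       *   docFreq    - documents in the result set that contain the term
       *   termFreq   - total occurrences of the term across the result set
       */
      public static double score(long numResults, long docFreq, long termFreq) {
          if (docFreq == 0) {
              return 0.0;
          }
          // A term present in every result scores termFreq * log(1) = 0;
          // a term in half the results scores termFreq * log(2) > 0.
          return termFreq * Math.log((double) numResults / docFreq);
      }
  }

With this weight, term A above (twice in half the results) outranks term B (once in every result), which matches the intuition that terms appearing in all documents aren't interesting.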
Searching and Displaying Different Logical Entities
I'm trying to figure out if Solr is the right solution for a problem I'm facing. I have 2 data entities: P(arent) & C(hild). P contains up to 100 instances of C. I need to expose an interface that searches attributes of entity C, but displays them grouped by parent entity, P. I need to include facet counts in the result, and the counts are based on P. My first solution was to create 2 Solr instances: one for each entity. I would have to execute 2 queries each time: 1) get a list of matching P's based on a query of the C instance (facet by P ID in C instance to get unique list of P's), then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. My second (and current) solution is to create a single instance, and flatten all C attributes into the appropriate P record using dynamic fields. For example, if C has an attribute CA, then I have a dynamic field in P called CA*. I name this field incrementally based on the number of C's per P (CA1, CA2, ...). This works, except that each query is very long (CA1:condition OR CA2: condition ...). Neither solution is ideal. I'm wondering if I'm missing something obvious, or if I'm using the wrong solution for this problem. Any insight is appreciated. Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25156301.html Sent from the Solr - User mailing list archive at Nabble.com.
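For concreteness, a minimal sketch of the flattened layout from my second solution (the field and attribute names are placeholders): each P becomes one document, and each child attribute CA is written into a numbered dynamic field declared in schema.xml as

  <dynamicField name="CA*" type="string" indexed="true" stored="true"/>

so that a query against a parent with up to three children ends up looking like q=CA1:condition OR CA2:condition OR CA3:condition, which is exactly the long-query problem described above.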
RE: Searching and Displaying Different Logical Entities
Funtick wrote: > >>then 2) get all P's by ID, including facet counts, etc. >>The problem I face with this solution is that I can have many matching P's > (10,000+), so my second query will have many (10,000+) constraints. > > SOLR can automatically provide you P's with Counts, and it will be > _unique_... > > I assume you mean to facet by P in the C index. My next problem is to sort those P's based on some attribute of P (as opposed to alphabetically or by occurrence in C). Funtick wrote: > > Even if cardinality of P is 10,000+ SOLR is very fast now (expect few > seconds response time for initial request). You need single query with > "faceting"... > Is there a practical limit for maxBooleanClauses? The default is 1024, but I need at least 10,000. Funtick wrote: > > (!) You do not need P's ID. > > Single document will have unique ID, and fields such as P, C (with > possible > attributes). Do not think in terms of RDBMS... Lucene does all > 'normalization' behind the scenes, and SOLR will give you Ps with Cs... > If I put both P's and C's into a single index, then I agree, I don't need P's ID. If I have P and C in separate indices then I still need to maintain the logical relationship between P and C. It wasn't clear to me if you suggested I continue with either of my 2 proposed solutions. Can you clarify? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25181664.html Sent from the Solr - User mailing list archive at Nabble.com.
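For reference, the clause limit is configurable in solrconfig.xml; it sets Lucene's BooleanQuery maximum and is effectively JVM-wide. Something along these lines should cover 10,000+ clauses, though a single query that large tends to be slow:

  <query>
    <!-- default is 1024; raise it to cover the ~10,000 parent IDs -->
    <maxBooleanClauses>20000</maxBooleanClauses>
  </query>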
Backups using Replication
I'm trying to create data backups using the ReplicationHandler's built-in functionality. I've configured my master as documented at http://wiki.apache.org/solr/SolrReplication : ... optimize ... but I don't see any backups created on the master. Do I need the snapshooter script available? I did not deploy it on my master, I assumed it was part of the 'old' way of doing replication. If I invoke the backup command over HTTP (http://master_host:port/solr/replication?command=backup) then it seems to work - I get directories like "snapshot.20090908094423". Thanks, Wojtek -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25350083.html Sent from the Solr - User mailing list archive at Nabble.com.
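The master-side configuration described on the wiki looks roughly like this (the event names here are illustrative, not necessarily my exact values):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="backupAfter">optimize</str>
    </lst>
  </requestHandler>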
Passing FuntionQuery string parameters
Hi, I'm writing a function query to score documents based on Levenshtein distance from a string. I want my function calls to look like: lev(myFieldName, 'my string to match') I'm running into trouble parsing the string I want to match ('my string to match' above). It looks like all the built in support is for parsing field names and numeric values. Am I missing the string parsing support, or is it not there, and if not, why? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25351825.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Backups using Replication
I'm using trunk from July 8, 2009. Do you know if it's more recent than that? Noble Paul നോബിള് नोब्ळ्-2 wrote: > > which version of Solr are you using? the "backupAfter" name was > introduced recently > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25386886.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Passing FuntionQuery string parameters
It looks like parseArg was added on Aug 20, 2009. I'm working with slightly older code. Thanks! Noble Paul നോബിള് नोब्ळ्-2 wrote: > > did you implement your own ValueSourceParser . the > FunctionQParser#parseArg() method supports strings > > On Wed, Sep 9, 2009 at 12:10 AM, wojtekpia wrote: >> >> Hi, >> >> I'm writing a function query to score documents based on Levenshtein >> distance from a string. I want my function calls to look like: >> >> lev(myFieldName, 'my string to match') >> >> I'm running into trouble parsing the string I want to match ('my string >> to >> match' above). It looks like all the built in support is for parsing >> field >> names and numeric values. Am I missing the string parsing support, or is >> it >> not there, and if not, why? >> >> Thanks, >> >> Wojtek >> -- >> View this message in context: >> http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25351825.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25386910.html Sent from the Solr - User mailing list archive at Nabble.com.
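For anyone finding this later, a rough sketch of the plugin on a build that has FunctionQParser.parseArg(). LevenshteinValueSource is a hypothetical ValueSource you'd still have to write, and the parser would be registered in solrconfig.xml as <valueSourceParser name="lev" class="com.example.LevenshteinValueSourceParser"/>:

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.solr.search.FunctionQParser;
  import org.apache.solr.search.ValueSourceParser;
  import org.apache.solr.search.function.ValueSource;

  public class LevenshteinValueSourceParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws ParseException {
          ValueSource field = fp.parseValueSource(); // myFieldName
          String target = fp.parseArg();             // 'my string to match'
          // LevenshteinValueSource is hypothetical: it would score each document
          // by the edit distance between the field value and the target string.
          return new LevenshteinValueSource(field, target);
      }
  }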
Re: Backups using Replication
Do you mean that it's been renamed, so this should work? ... optimize ... Noble Paul നോബിള് नोब्ळ्-2 wrote: > > before that backupAfter was called "snapshot" > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25407695.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Backups using Replication
I've verified that renaming backupAfter to snapshot works (I should've checked before asking). Thanks Noble! wojtekpia wrote: > > > > > ... > optimize > ... > > > > > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25407846.html Sent from the Solr - User mailing list archive at Nabble.com.
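So, for anyone else on an older build, the relevant line in the master section is the old parameter name, and on newer builds it's the renamed one (values here are illustrative):

  <!-- builds that predate the rename -->
  <str name="snapshot">optimize</str>
  <!-- newer builds -->
  <str name="backupAfter">optimize</str>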
FileListEntityProcessor and LineEntityProcessor
Hi, I'm trying to import data from a list of files using the FileListEntityProcessor. Here is my import configuration: If I have only one file in d:\my\directory\ then everything works correctly. If I have multiple files then I get the following exception:

Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more
Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more

Note that my input files have 53812 lines, which is the same as the document number that I'm choking on. Does anyone know what I'm doing wrong? Thanks, Wojtek -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25476443.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FileListEntityProcessor and LineEntityProcessor
Fergus McMenemie-2 wrote: > > > Can you provide more detail on what you are trying to do? ... > You seem to listing all files "d:\my\directory\.*WRK". Do > these WRK files contain lists of files to be indexed? > > That is my complete data config file. I have a directory containing a bunch of files that have one entity per line. Each line contains "blocks" of data. I parse out each block and process it appropriately using myTransformer. Is this use of FileListEntityProcessor with LineEntityProcessor not supported? -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25477613.html Sent from the Solr - User mailing list archive at Nabble.com.
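A sketch of the nesting being described, in case it helps pinpoint the problem (the directory, the .*WRK pattern from Fergus's question, and the transformer class are placeholders):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="f" processor="FileListEntityProcessor"
              baseDir="d:\my\directory" fileName=".*WRK" rootEntity="false">
        <entity name="line" processor="LineEntityProcessor"
                url="${f.fileAbsolutePath}"
                transformer="com.example.MyTransformer">
          <field column="rawLine" name="rawLine"/>
        </entity>
      </entity>
    </document>
  </dataConfig>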
Re: FileListEntityProcessor and LineEntityProcessor
Note that if I change my import file to explicitly list all my files (instead of using the FileListEntityProcessor) as below then everything works as I expect. ... -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25480830.html Sent from the Solr - User mailing list archive at Nabble.com.
Multi-valued field cache
I want to build a FunctionQuery that scores documents based on a multi-valued field. My intention was to use the field cache, but that doesn't get me multiple values per document. I saw other posts suggesting UnInvertedField as the solution. I don't see a method in the UnInvertedField class that will give me a list of field values per document. I only see methods that give values per document set. Should I use one of those methods and create document sets of size 1 for each document? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Multi-valued-field-cache-tp25684952p25684952.html Sent from the Solr - User mailing list archive at Nabble.com.
Different sort behavior on same code
Hi, I'm running Solr version 1.3.0.2009.07.08.08.05.45 in 2 environments. I have a field defined as: The two environments have different data, but both have single and multi valued entries for myDate. On one environment sorting by myDate works (sort seems to be by the 'last' value if multi valued). On the other environment I get: HTTP Status 500 - there are more terms than documents in field "myDate", but it's impossible to sort on tokenized fields java.lang.RuntimeException: there are more terms than documents in field I've read that I shouldn't sort by multi-valued fields, so my solution will be to add a single-valued date field for sorting. But I don't understand why my two environments behave differently, and it doesn't seem like the error message makes sense (are date fields tokenized?). Any thoughts? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Different-sort-behavior-on-same-code-tp25774769p25774769.html Sent from the Solr - User mailing list archive at Nabble.com.
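The workaround I'm planning looks roughly like this in schema.xml. The single-valued field has to be populated with one chosen value (for example the latest date) at index time, since copying every value across from the multi-valued field would just reintroduce the problem:

  <!-- existing multi-valued field -->
  <field name="myDate"     type="date" indexed="true" stored="true" multiValued="true"/>
  <!-- single-valued companion used only for sorting -->
  <field name="myDateSort" type="date" indexed="true" stored="false" multiValued="false"/>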
Changing masterUrl in ReplicationHandler at Runtime
Hi, I'm trying to change the masterUrl of a search slave at runtime. So far I've found 2 ways of doing it: 1. Change solrconfig_slave.xml on master, and have it replicate to solrconfig.xml on the slave 2. Change solrconfig.xml on slave, then issue a core reload command. (a side note: can I issue the reload-core command without having a solr.xml file? I had to run a single core in multi-core mode to make this work) So far I like solution 2 better. Does it make sense to add a 'sticky' parameter to the ReplicationHandler's fetchindex command? Something like: http://slave_host:port/solr/replication?command=fetchindex&masterUrl=myUrl&stickyMasterUrl=true If true then 'myUrl' would continue being used for replication, including future polling. Are there other solutions? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Changing-masterUrl-in-ReplicationHandler-at-Runtime-tp25829843p25829843.html Sent from the Solr - User mailing list archive at Nabble.com.
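For reference, the single-core-in-multi-core setup for option 2 is just a minimal solr.xml plus a CoreAdmin reload call (the core name here is arbitrary):

  <!-- solr.xml -->
  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="."/>
    </cores>
  </solr>

  http://slave_host:port/solr/admin/cores?action=RELOAD&core=core0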
Re: how can I use debugQuery if I have extended QParserPlugin?
I'm seeing the same behavior and I don't have any custom query parsing plugins. Similar to the original post, my queries like: select?q=field:[1 TO *] select?q=field:[1 TO 2] select?q=field:[1 TO 2]&debugQuery=true work correctly, but including an unbounded range appears to break the debug component: select?q=field:[1 TO *]&debugQuery=true My stack trace is the same as the original post. gdeconto wrote: > > my apologies, you are correct; I put the stack trace in an edit of the > post and not in the original post. > > re version info: > > Solr Specification Version: 1.3.0.2009.07.08.08.05.45 > Solr Implementation Version: nightly exported - yonik - 2009-07-08 > 08:05:45 > > NOTE: I have some more info on this NPE problem. I get the NPE error > whenever I use debugQuery and the query range has an asterix in it, even > tho the query itself should work. For example: > > These work ok: > > http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1] > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *] > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000] > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000]&debugQuery=true > > These do not work ok: > > http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1]&debugQuery=true > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *]&debugQuery=true > http://127.0.0.1:8994/solr/select?q=myfield:* > http://127.0.0.1:8994/solr/select?q=myfield:*&debugQuery=true > > Not sure if the * gets translated somewhere into a null value parameter (I > am just starting to look at the solr code) per your comment > -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25930610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how can I use debugQuery if I have extended QParserPlugin?
Good catch. I was testing on a nightly build from mid-July. I just tested on a similar deployment with nightly code from Oct 5th and everything seems to work. My mid-July deployment breaks on sints, integers, sdouble, doubles, slongs and longs. My more recent deployment works with tints, sints, integers, tdoubles, sdoubles, doubles, tlongs, slongs, and longs. (I don't have any floats in my schema so I didn't test those). Sounds like another reason to upgrade to 1.4. Wojtek Yonik Seeley-3 wrote: > > Is this with trunk? I can't seem to reproduce this... what's the field > type? > > -Yonik > http://www.lucidimagination.com > > On Fri, Oct 16, 2009 at 3:01 PM, wojtekpia wrote: >> >> I'm seeing the same behavior and I don't have any custom query parsing >> plugins. Similar to the original post, my queries like: >> >> select?q=field:[1 TO *] >> select?q=field:[1 TO 2] >> select?q=field:[1 TO 2]&debugQuery=true >> >> work correctly, but including an unboundd range appears to break the >> debug >> component: >> select?q=field:[1 TO *]&debugQuery=true >> >> My stack trace is the same as the original post. >> >> >> gdeconto wrote: >>> >>> my apologies, you are correct; I put the stack trace in an edit of the >>> post and not in the original post. >>> >>> re version info: >>> >>> Solr Specification Version: 1.3.0.2009.07.08.08.05.45 >>> Solr Implementation Version: nightly exported - yonik - 2009-07-08 >>> 08:05:45 >>> >>> NOTE: I have some more info on this NPE problem. I get the NPE error >>> whenever I use debugQuery and the query range has an asterix in it, even >>> tho the query itself should work. For example: >>> >>> These work ok: >>> >>> http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1] >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *] >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000] >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000]&debugQuery=true >>> >>> These do not work ok: >>> >>> http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1]&debugQuery=true >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *]&debugQuery=true >>> http://127.0.0.1:8994/solr/select?q=myfield:* >>> http://127.0.0.1:8994/solr/select?q=myfield:*&debugQuery=true >>> >>> Not sure if the * gets translated somewhere into a null value parameter >>> (I >>> am just starting to look at the solr code) per your comment >>> >> >> -- >> View this message in context: >> http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25930610.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25932460.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: number of Solr indexes per Tomcat instance
I ran into trouble running several cores (either as Solr multi-core or as separate web apps) in a single JVM because the Java garbage collector would freeze all cores during a collection. This may not be an issue if you're not dealing with large amounts of memory. My solution is to run each web app in its own JVM and Tomcat instance. -- View this message in context: http://www.nabble.com/number-of-Solr-indexes-per-Tomcat-instance-tp26027238p26029243.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: javabin in .NET?
I was thinking of going this route too because I've found that parsing XML result sets using XmlDocument + XPath can be very slow (up to a few seconds) when requesting ~100 documents. Are you getting good performance parsing large result sets? Are you using SAX instead of DOM? Thanks, Wojtek mausch wrote: > > It's one of my pending issues for SolrNet ( > http://code.google.com/p/solrnet/issues/detail?id=71 ) > I've looked at the code, it doesn't seem terribly complex to port to C#. > It > would be kind of cumbersome to test it though. > I just didn't implement it yet because I'm getting good enough performance > with XML (and other people as well: > http://groups.google.com/group/solrnet/msg/4de8224a33279906 ) > > Cheers, > Mauricio > -- View this message in context: http://old.nabble.com/javabin-in-.NET--tp26321914p26323001.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about schemas (and SOLR-1131?)
Could this be solved with a multi-valued custom field type (including a custom comparator)? The OP's situation deals with multi-valuing products for each customer. If products contain strictly numeric fields then it seems like a custom field implementation (or extension of BinaryField?) *should* be easy - only the comparator part needs work. I'm not clear on how the existing query parsers would handle this though, so there's probably some work there too. SOLR-1131 seems like a more general solution that supports analysis that numeric fields don't need. gdeconto wrote: > > I saw an interesting thread in the solr-dev forum about multiple fields > per fieldtype (https://issues.apache.org/jira/browse/SOLR-1131) > > from the sounds of it, it might be of interest and/or use in these types > of problems; for your example, you might be able to define a fieldtype > that houses the product data. > > note that I only skimmed the thread. hopefully, I'll get get some time to > look at it more closely > -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26636170.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler running out of memory
I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
I'm trying with batchSize=-1 now. So far it seems to be working, but very slowly. I will update when it completes or crashes. Even with a batchSize of 100 I was running out of memory. I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I believe that's the maximum for my environment. The batchSize parameter doesn't seem to control what happens... when I select top 5,000,000 with a batchSize of 10,000, it works. When I select top 10,000,000 with the same batchSize, it runs out of memory. Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM. Noble Paul നോബിള് नोब्ळ् wrote: > > DIH streams rows one by one. > set the fetchSize="-1" this might help. It may make the indexing a bit > slower but memory consumption would be low. > The memory is consumed by the jdbc driver. try tuning the -Xmx value for > the VM > --Noble > > On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar > <[EMAIL PROTECTED]> wrote: >> Setting the batchSize to 1 would mean that the Jdbc driver will keep >> 1 rows in memory *for each entity* which uses that data source (if >> correctly implemented by the driver). Not sure how well the Sql Server >> driver implements this. Also keep in mind that Solr also needs memory to >> index documents. You can probably try setting the batch size to a lower >> value. >> >> The regular memory tuning stuff should apply here too -- try disabling >> autoCommit and turn-off autowarming and see if it helps. >> >> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote: >> >>> >>> I'm trying to load ~10 million records into Solr using the >>> DataImportHandler. >>> I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) >>> as >>> soon as I try loading more than about 5 million records. >>> >>> Here's my configuration: >>> I'm connecting to a SQL Server database using the sqljdbc driver. I've >>> given >>> my Solr instance 1.5 GB of memory. I have set the dataSource batchSize >>> to >>> 1. My SQL query is "select top XXX field1, ... from table1". I have >>> about 40 fields in my Solr schema. >>> >>> I thought the DataImportHandler would stream data from the DB rather >>> than >>> loading it all into memory at once. Is that not the case? Any thoughts >>> on >>> how to get around this (aside from getting a machine with more memory)? >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver. -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html Sent from the Solr - User mailing list archive at Nabble.com.
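For anyone hitting the same thing, the working data source definition ends up looking something like this (host, database name and credentials are placeholders); the key part is responseBuffering=adaptive in the JDBC URL, optionally combined with a batchSize setting:

  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbhost;databaseName=mydb;responseBuffering=adaptive"
              user="solr" password="secret"/>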
Re: Search query optimization
If I know that condition C will eliminate more results than either A or B, does specifying the query as: "C AND A AND B" make it any faster (than the original "A AND B AND C")? -- View this message in context: http://www.nabble.com/Search-query-optimization-tp17544667p18205504.html Sent from the Solr - User mailing list archive at Nabble.com.
"Similarity" of numbers in MoreLikeThisHandler
I have a numeric field that I'm using for getting more records like the current one. Does the MoreLikeThisHandler do numeric comparisons on numeric fields (e.g. 4 is "similar" to 5), or is it a string comparison? -- View this message in context: http://www.nabble.com/%22Similarity%22-of-numbers-in-MoreLikeThisHandler-tp1827p1827.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: "Similarity" of numbers in MoreLikeThisHandler
I stored 2 copies of a single field: one as a number, the other as a string. The MLT handler returned the same documents regardless of which of the 2 fields I used for similarity. So to answer my own question, the MoreLikeThisHandler does not do numeric comparisons on numeric fields. -- View this message in context: http://www.nabble.com/%22Similarity%22-of-numbers-in-MoreLikeThisHandler-tp1827p18285373.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: "Similarity" of numbers in MoreLikeThisHandler
I didn't realize that subsets were used to evaluate similarity. From your example, I assume that the strings: 456 and 123456 are "similar". If I store them as integers instead of strings, will Solr/Lucene still use subsets to assign similarity? -- View this message in context: http://www.nabble.com/%22Similarity%22-of-numbers-in-MoreLikeThisHandler-tp1827p18286144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: get the fields of solr
Thanks. Can I search for fields using the luke handler? I'd like to be able to say something like: solr/admin/luke?fl=a* where the '*' is a wildcard not necessarily related to dynamic fields. I will have at least a few hundred dynamic fields, so I'd rather not load all fields into memory in the UI. -- View this message in context: http://www.nabble.com/get-the-fields-of-solr-tp14431354p18350707.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler current_index_time & post-completion action
I have two questions: 1. I am pulling data from 2 data sources using the DIH. I am using the deltaQuery functionality. Since the data sources pull data sequentially, I find that some data is getting unnecessarily re-indexed from my second data source. Hopefully this helps illustrate my problem: Assume last_index_time is 0. At time = 1, pull data from data source 1 with a query that includes "last_modified > '${dataimporter.last_index_time}'". Note that this pulls data for the time interval [0,1]. This step takes 1 time interval. At time = 2, data source 2 is polled with the same query. This step takes 1 time interval. Note that this pulls data for the time interval [0,2]. At t=3, last_index_time is set to 1. Next time I run the DIH, I will be unnecessarily re-indexing data that appeared in data source 2 in the interval [1,2]. Ideally, I'd like to have access to something like ${dataimporter.current_index_time}, so I could restrict my delta query to: "last_modified > '${dataimporter.last_index_time}' AND last_modified < '${dataimporter.current_index_time}'" Is this available? 2. I have a transient table that I query with the DIH to load my index. After loading values into the index, I want to delete them from the transient table. Is there a way to do this from the DIH? I tried stuffing a delete statement into the deltaQuery attribute, but that didn't work: -- View this message in context: http://www.nabble.com/DataImportHandler-current_index_time---post-completion-action-tp18498832p18498832.html Sent from the Solr - User mailing list archive at Nabble.com.
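In other words, with a hypothetical ${dataimporter.current_index_time} variable, the second entity's delta query would become something like the following (entity and table names are placeholders, and the comparison operators are XML-escaped inside the attribute):

  <!-- current_index_time is the proposed variable, not something DIH provides today -->
  <entity name="source2" pk="id"
          deltaQuery="select id from source2_table
                      where last_modified &gt; '${dataimporter.last_index_time}'
                        and last_modified &lt;= '${dataimporter.current_index_time}'"/>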
termVectors and faceting
Does setting termVectors to true affect faceting speed on a field? I changed a field definition from: to: And I see a significant performance improvement (~6x faster). MyFacetField has ~25,000 unique values. Does it make sense that this change caused the improvement? I made several other changes to my schema, but I know that faceting on MyFacetField was by far the slowest part of my queries. Thanks. Wojtek -- View this message in context: http://www.nabble.com/termVectors-and-faceting-tp18717622p18717622.html Sent from the Solr - User mailing list archive at Nabble.com.
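For completeness, the change was along these lines (the field type and the other attributes shown here are illustrative):

  <!-- before -->
  <field name="MyFacetField" type="string" indexed="true" stored="true" multiValued="true"/>
  <!-- after -->
  <field name="MyFacetField" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>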
Return results for suggested SpellCheck terms
I'd like to have a handler that 1) executes a query, 2) provides spelling suggestions for incorrectly spelled words, and 3) if the original query returns 0 results, returns results based on the spell check suggestions. 1 & 2 are straightforward using the SpellCheckComponent, but I can't figure out 3 without writing custom code. Can I do it with just configuration settings? -- View this message in context: http://www.nabble.com/Return-results-for-suggested-SpellCheck-terms-tp18897102p18897102.html Sent from the Solr - User mailing list archive at Nabble.com.
Duplicate Data Across Fields
I have 2 fields which will sometimes contain the same data. When they do contain the same data, am I paying the same performance cost as when they contain unique data? I think the real question here is: does Lucene index values per field, or per document? -- View this message in context: http://www.nabble.com/Duplicate-Data-Across-Fields-tp18986515p18986515.html Sent from the Solr - User mailing list archive at Nabble.com.
Faceting MoreLikeThisComponent results
When using the MoreLikeThisHandler with facets turned on, the facets show counts of things that are more like my original document. When I use the MoreLikeThisComponent, the facets show counts of things that match my original document (I'm querying by document ID), so there is only one result, and the facets are not interesting. I tried changing the order of search components (facet after mlt), but that didn't change the behavior. How can I facet the results of the MoreLikeThisComponent? -- View this message in context: http://www.nabble.com/Faceting-MoreLikeThisComponent-results-tp19206833p19206833.html Sent from the Solr - User mailing list archive at Nabble.com.
Creating dynamic fields with DataImportHandler
I have a custom row transformer that I'm using with the DataImportHandler. When I try to create a dynamic field from my transformer, it doesn't get created. If I do exactly the same thing from my dataimport handler config file, it works as expected. Has anyone experienced this? I'm using a nightly build from about 3 weeks ago. I realize there were some fixes done to the DataImportHandler since then, but if I understand them correctly, they seem unrelated to my issue (http://www.nabble.com/localsolr-and-dataimport-problems-to18849983.html#a18854923). -- View this message in context: http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-tp19226532p19226532.html Sent from the Solr - User mailing list archive at Nabble.com.
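For context, the transformer looks roughly like this (the source column and the generated field name are placeholders); the row key it adds is built at runtime and is meant to be picked up by a dynamicField pattern such as *_s in schema.xml:

  import java.util.Map;

  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  public class MyRowTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
          Object value = row.get("SOURCE_COLUMN"); // placeholder source column
          if (value != null) {
              // The field name isn't known ahead of time; it should match
              // a <dynamicField name="*_s" .../> declaration in the schema.
              row.put("attr_" + value.toString().toLowerCase() + "_s", value);
          }
          return row;
      }
  }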
Re: Creating dynamic fields with DataImportHandler
I have created SOLR-742: http://issues.apache.org/jira/browse/SOLR-742 For my case, I don't know the field name ahead of time. Shalin Shekhar Mangar wrote: > > Yes, sounds like a bug. Do you mind opening a jira issue for this? > > A simple workaround is to add the field name (if you know it beforehand) > to > your data config and use the Transformer to set the value. If you don't > know > the field name before hand then this will not work for you. > > On Sat, Aug 30, 2008 at 1:31 AM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> I have a custom row transformer that I'm using with the >> DataImportHandler. >> When I try to create a dynamic field from my transformer, it doesn't get >> created. >> >> If I do exactly the same thing from my dataimport handler config file, it >> works as expected. >> >> Has anyone experienced this? I'm using a nightly build from about 3 weeks >> ago. I realize there were some fixes done to the DataImportHandler since >> then, but if I understand them correctly, they seem unrelated to my issue >> ( >> http://www.nabble.com/localsolr-and-dataimport-problems-to18849983.html#a18854923 >> ). >> -- >> View this message in context: >> http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-tp19226532p19226532.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-tp19226532p19227919.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting MoreLikeThisComponent results
Thanks Hoss. I created SOLR 760: https://issues.apache.org/jira/browse/SOLR-760 hossman wrote: > > > : When using the MoreLikeThisHandler with facets turned on, the facets > show > : counts of things that are more like my original document. When I use the > : MoreLikeThisComponent, the facets show counts of things that match my > : original document (I'm querying by document ID), so there is only one > ... > : How can I facet the results of the MoreLikeThisComponent? > > I don't think you can at this point. The good news is MoreLikeThisHandler > isn't getting removed anytime soon. > > > What we need to do is provide more options on the componets to dictate > their behavior when deciding what to process and how to return it ... your > example could be solved be either adding an option to MLTComponent telling > it to overwrite hte main result set; or by adding an option to > FacetComponent specifying the name of a DocSet in the response to use in > it's intersections. > > I think it would be good to do both. > > (HighlightComponent should probably also have an option just like the one > i discribed for FacetComponent) > > Would you mind filing a feature request? > > > -Hoss > > > -- View this message in context: http://www.nabble.com/Faceting-MoreLikeThisComponent-results-tp19206833p19376403.html Sent from the Solr - User mailing list archive at Nabble.com.
dataimporter.last_index_time not set for full-import query
I would like to use (abuse?) the dataimporter.last_index_time variable in my full-import query, but it looks like that variable is only set when running a delta-import. My use case: I'd like to use a stored procedure to manage how data is given to the DataImportHandler so I can gracefully handle failed imports. The stored procedure would take in the last successful data import time and decide which records should be returned. I looked into using the delta-import functionality, but it didn't seem like the right fit for my need. Any thoughts? -- View this message in context: http://www.nabble.com/dataimporter.last_index_time-not-set-for-full-import-query-tp19419383p19419383.html Sent from the Solr - User mailing list archive at Nabble.com.
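The sort of full-import entity I'd like to be able to write (the stored procedure name is made up):

  <entity name="item"
          query="exec dbo.get_changed_records '${dataimporter.last_index_time}'"/>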
Re: dataimporter.last_index_time not set for full-import query
I created a JIRA issue for this and attached a patch: https://issues.apache.org/jira/browse/SOLR-768 wojtekpia wrote: > > I would like to use (abuse?) the dataimporter.last_index_time variable in > my full-import query, but it looks like that variable is only set when > running a delta-import. > > My use case: > I'd like to use a stored procedure to manage how data is given to the > DataImportHandler so I can gracefully handle failed imports. The stored > procedure would take in the last successful data import time and decide > which records should be returned. > > I looked into using the delta-import functionality, but it didn't seem > like the right fit for my need. > > Any thoughts? > -- View this message in context: http://www.nabble.com/dataimporter.last_index_time-not-set-for-full-import-query-tp19419383p19425162.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight Fragments
Make sure the fields you're trying to highlight are stored in your schema (e.g. ) David Snelling-2 wrote: > > Ok, I'm very frustrated. I've tried every configuraiton I can and > parameters > and I cannot get fragments to show up in the highlighting in solr. (no > fragments at the bottom or highlights in the text. I must be > missing something but I'm just not sure what it is. > > /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true > > And I get highlight segment, but no fragments or phrase highlighting. > > My goal - if I'm doing this completely wrong - is to get google like > snippets of text around the query term (or at mimimum to highlight the > query > term itself). > > Results: > > synopsis > true > 3 > gap > crayon > synopsis > standard > true > 2.1 > > > − > > − > ... > . > .. > > > > > > > > > > > > > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight Fragments
Try a query where you're sure to get something to highlight in one of your highlight fields, for example: /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription David Snelling-2 wrote: > > This is the configuration for the two fields I have tried on > > stored="true"/> > compressed="true"/> > > > > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Make sure the fields you're trying to highlight are stored in your schema >> (e.g. ) >> >> >> >> David Snelling-2 wrote: >> > >> > Ok, I'm very frustrated. I've tried every configuraiton I can and >> > parameters >> > and I cannot get fragments to show up in the highlighting in solr. (no >> > fragments at the bottom or highlights in the text. I must be >> > missing something but I'm just not sure what it is. >> > >> > >> /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true >> > >> > And I get highlight segment, but no fragments or phrase highlighting. >> > >> > My goal - if I'm doing this completely wrong - is to get google like >> > snippets of text around the query term (or at mimimum to highlight the >> > query >> > term itself). >> > >> > Results: >> > >> > synopsis >> > true >> > 3 >> > gap >> > crayon >> > synopsis >> > standard >> > true >> > 2.1 >> > >> > >> > − >> > >> > − >> > ... >> > . >> > .. >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > -- >> > "hic sunt dracones" >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19637261.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight Fragments
Your fields are all of string type. String fields aren't tokenized or analyzed, so you have to match the entire text of those fields to actually get a match. Try the following: /select/?q=firstname:Kathryn&hl=on&hl.fl=firstname The reason you're seeing results with just q=students, but not q=synopsis:students is because you're copying the synopsis field into your field named 'text', which is of type 'text', which does get tokenized and analyzed, and 'text' is your default search field. The reason you don't see any highlights with the following query is because your 'text' field isn't stored. select/?q=text:students&hl=on&hl.fl=text David Snelling-2 wrote: > > Hmmm. That doesn't actually return anything which is odd because I know > it's in the field if I do a query without specifying the field. > > http://qasearch.donorschoose.org/select/?q=synopsis:students > > returns nothing > > http://qasearch.donorschoose.org/select/?q=students > > returns items with query in synopsis field. > > This may be causing issues but I'm not sure why it's not working. We use > this live and do very complex queries including facets that work fine. > > www.donorschoose.org > > > > On Tue, Sep 23, 2008 at 2:20 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Try a query where you're sure to get something to highlight in one of >> your >> highlight fields, for example: >> >> >> /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription >> >> >> >> David Snelling-2 wrote: >> > >> > This is the configuration for the two fields I have tried on >> > >> > > > stored="true"/> >> > > > compressed="true"/> >> > >> > >> > >> > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> >> wrote: >> > >> >> >> >> Make sure the fields you're trying to highlight are stored in your >> schema >> >> (e.g. ) >> >> >> >> >> >> >> >> David Snelling-2 wrote: >> >> > >> >> > Ok, I'm very frustrated. I've tried every configuraiton I can and >> >> > parameters >> >> > and I cannot get fragments to show up in the highlighting in solr. >> (no >> >> > fragments at the bottom or highlights in the text. I must >> be >> >> > missing something but I'm just not sure what it is. >> >> > >> >> > >> >> >> /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true >> >> > >> >> > And I get highlight segment, but no fragments or phrase >> highlighting. >> >> > >> >> > My goal - if I'm doing this completely wrong - is to get google like >> >> > snippets of text around the query term (or at mimimum to highlight >> the >> >> > query >> >> > term itself). >> >> > >> >> > Results: >> >> > >> >> > synopsis >> >> > true >> >> > 3 >> >> > gap >> >> > crayon >> >> > synopsis >> >> > standard >> >> > true >> >> > 2.1 >> >> > >> >> > >> >> > − >> >> > >> >> > − >> >> > ... >> >> > . >> >> > .. >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > "hic sunt dracones" >> >> > >> >> > >> >> >> >> -- >> >> View this message in context: >> >> http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> > -- >> > "hic sunt dracones" >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Highlight-Fragments-tp19636705p19637261.html >> Sent from the Solr - User mailing list archive at Nabble.com. 
>> >> > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19637801.html Sent from the Solr - User mailing list archive at Nabble.com.
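To make the string-vs-text distinction above concrete, the difference in schema.xml is just the field type (the analyzer behavior comes from the 'text' fieldType definition):

  <!-- exact, whole-value matches only; no tokenization, so no per-word highlighting -->
  <field name="synopsis" type="string" indexed="true" stored="true"/>

  <!-- tokenized and analyzed; supports word-level matching and highlighting -->
  <field name="synopsis" type="text" indexed="true" stored="true"/>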
Re: Highlight Fragments
Yes, you can use text (or some custom derivative of it) for your fields. David Snelling-2 wrote: > > Ok, thanks, that makes a lot of sense now. > So, how should I be storing the text for the synopsis or shortdescription > fields so it would be tokenized? Should it be text instead of string? > > > Thank you very much for the help by the way. > > > On Tue, Sep 23, 2008 at 2:49 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Your fields are all of string type. String fields aren't tokenized or >> analyzed, so you have to match the entire text of those fields to >> actually >> get a match. Try the following: >> >> /select/?q=firstname:Kathryn&hl=on&hl.fl=firstname >> >> The reason you're seeing results with just q=students, but not >> q=synopsis:students is because you're copying the synopsis field into >> your >> field named 'text', which is of type 'text', which does get tokenized and >> analyzed, and 'text' is your default search field. >> >> The reason you don't see any highlights with the following query is >> because >> your 'text' field isn't stored. >> >> select/?q=text:students&hl=on&hl.fl=text >> >> >> >> >> >> David Snelling-2 wrote: >> > >> > Hmmm. That doesn't actually return anything which is odd because I >> know >> > it's in the field if I do a query without specifying the field. >> > >> > http://qasearch.donorschoose.org/select/?q=synopsis:students >> > >> > returns nothing >> > >> > http://qasearch.donorschoose.org/select/?q=students >> > >> > returns items with query in synopsis field. >> > >> > This may be causing issues but I'm not sure why it's not working. We >> use >> > this live and do very complex queries including facets that work fine. >> > >> > www.donorschoose.org >> > >> > >> > >> > On Tue, Sep 23, 2008 at 2:20 PM, wojtekpia <[EMAIL PROTECTED]> >> wrote: >> > >> >> >> >> Try a query where you're sure to get something to highlight in one of >> >> your >> >> highlight fields, for example: >> >> >> >> >> >> >> /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription >> >> >> >> >> >> >> >> David Snelling-2 wrote: >> >> > >> >> > This is the configuration for the two fields I have tried on >> >> > >> >> > > >> > stored="true"/> >> >> > > >> > compressed="true"/> >> >> > >> >> > >> >> > >> >> > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> >> >> wrote: >> >> > >> >> >> >> >> >> Make sure the fields you're trying to highlight are stored in your >> >> schema >> >> >> (e.g. ) >> >> >> >> >> >> >> >> >> >> >> >> David Snelling-2 wrote: >> >> >> > >> >> >> > Ok, I'm very frustrated. I've tried every configuraiton I can and >> >> >> > parameters >> >> >> > and I cannot get fragments to show up in the highlighting in >> solr. >> >> (no >> >> >> > fragments at the bottom or highlights in the text. I >> must >> >> be >> >> >> > missing something but I'm just not sure what it is. >> >> >> > >> >> >> > >> >> >> >> >> >> /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true >> >> >> > >> >> >> > And I get highlight segment, but no fragments or phrase >> >> highlighting. >> >> >> > >> >> >> > My goal - if I'm doing this completely wrong - is to get google >> like >> >> >> > snippets of text around the query term (or at mimimum to >> highlight >> >> the >> >> >> > query >> >> >> > term itself). 
>> >> >> > >> >> >> > Results: >> >> >> > >> >> >> > synopsis >> >> >> > true >> >> >> > 3 >> >> >> > gap >> >> >> > crayon >> >> >> > synopsis >> >> >> > standard >> >> >> > true >> >> >> > 2.1 >> >> >> > >> >> >> > >> >> >> > − >> >> >> > >> >> >> > − >> >> >> > ... >> >> >> > . >> >> >> > .. >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > "hic sunt dracones" >> >> >> > >> >> >> > >> >> >> >> >> >> -- >> >> >> View this message in context: >> >> >> http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html >> >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> >> >> >> > >> >> > >> >> > -- >> >> > "hic sunt dracones" >> >> > >> >> > >> >> >> >> -- >> >> View this message in context: >> >> http://www.nabble.com/Highlight-Fragments-tp19636705p19637261.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> > -- >> > "hic sunt dracones" >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Highlight-Fragments-tp19636705p19637801.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19638296.html Sent from the Solr - User mailing list archive at Nabble.com.
Throughput Optimization
I've been running load tests over the past week or two, and I can't figure out my system's bottleneck that prevents me from increasing throughput. First I'll describe my Solr setup, then what I've tried to optimize the system.

I have 10 million records and 59 fields (all are indexed, 37 are stored, 17 have termVectors, 33 are multi-valued) which takes about 15GB of disk space. Most field values are very short (single word or number), and usually about half the fields have any data at all. I'm running on an 8-core, 64-bit, 32GB RAM Red Hat box. I allocate about 24GB of memory to the Java process, and my filterCache size is 700,000. I'm using a version of Solr between 1.3 and the current trunk (including the latest SOLR-667 (FastLRUCache) patch), and Tomcat 6.0.

I'm running a ramp-test, increasing the number of users every few minutes. I measure the maximum number of requests that Solr can handle per second with a fixed response time, and call that my throughput. I'd like to see a single physical resource be maxed out at some point during my test so I know it is my bottleneck. I generated random queries for my dataset representing a more or less realistic scenario. The queries include faceting by up to 6 fields, and querying by up to 8 fields.

I ran a baseline on the un-optimized setup, and saw peak CPU usage of about 50%, IO usage around 5%, and negligible network traffic. Interestingly, the CPU peaked when I had 8 concurrent users, and actually dropped down to about 40% when I increased the users beyond 8. Is that because I have 8 cores?

I changed a few settings and observed the effect on throughput:
1. Increased the filterCache size, and throughput increased by about 50%, but it seems to peak.
2. Put the entire index on a RAM disk, and significantly reduced the average response time, but my throughput didn't change (i.e. even though my response time was 10X faster, the maximum number of requests I could make per second didn't increase). This makes no sense to me, unless there is another bottleneck somewhere.
3. Reduced the number of records in my index. The throughput increased, but the shape of all my graphs stayed the same, and my CPU usage was identical.

I have a few questions:
1. Can I get more than 50% CPU utilization?
2. Why does CPU utilization fall when I make more than 8 concurrent requests?
3. Is there an obvious bottleneck that I'm missing?
4. Does Tomcat have any settings that affect Solr performance?

Any input is greatly appreciated.
-- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html Sent from the Solr - User mailing list archive at Nabble.com.
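The filterCache mentioned above is declared in solrconfig.xml. A minimal sketch of what such a declaration might look like, assuming the SOLR-667 FastLRUCache patch is applied; only the size matches the figure given above, the other attributes are illustrative:

<!-- sketch only: size matches the 700,000 above, other values are illustrative -->
<filterCache
    class="solr.FastLRUCache"
    size="700000"
    initialSize="512"
    autowarmCount="0"/>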
Re: Throughput Optimization
Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory exceptions. My filterCache hit ratio is > .99. It looks like I've hit a RAM bound here. I ran a test without faceting. The response times / throughput were both significantly higher, there were no evictions from the filter cache, but I still wasn't getting > 50% CPU utilization. Any thoughts on what physical bound I've hit in this case? Erik Hatcher wrote: > > One quick question are you seeing any evictions from your > filterCache? If so, it isn't set large enough to handle the faceting > you're doing. > > Erik > > > On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: > >> >> I've been running load tests over the past week or 2, and I can't >> figure out >> my system's bottle neck that prevents me from increasing throughput. >> First >> I'll describe my Solr setup, then what I've tried to optimize the >> system. >> >> I have 10 million records and 59 fields (all are indexed, 37 are >> stored, 17 >> have termVectors, 33 are multi-valued) which takes about 15GB of >> disk space. >> Most field values are very short (single word or number), and >> usually about >> half the fields have any data at all. I'm running on an 8-core, 64- >> bit, 32GB >> RAM Redhat box. I allocate about 24GB of memory to the java process, >> and my >> filterCache size is 700,000. I'm using a version of Solr between 1.3 >> and the >> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >> and >> Tomcat 6.0. >> >> I'm running a ramp-test, increasing the number of users every few >> minutes. I >> measure the maximum number of requests that Solr can handle per >> second with >> a fixed response time, and call that my throughput. I'd like to see >> a single >> physical resource be maxed out at some point during my test so I >> know it is >> my bottle neck. I generated random queries for my dataset >> representing a >> more or less realistic scenario. The queries include faceting by up >> to 6 >> fields, and quering by up to 8 fields. >> >> I ran a baseline on the un-optimized setup, and saw peak CPU usage >> of about >> 50%, IO usage around 5%, and negligible network traffic. >> Interestingly, the >> CPU peaked when I had 8 concurrent users, and actually dropped down >> to about >> 40% when I increased the users beyond 8. Is that because I have 8 >> cores? >> >> I changed a few settings and observed the effect on throughput: >> >> 1. Increased filterCache size, and throughput increased by about >> 50%, but it >> seems to peak. >> 2. Put the entire index on a RAM disk, and significantly reduced the >> average >> response time, but my throughput didn't change (i.e. even though my >> response >> time was 10X faster, the maximum number of requests I could make per >> second >> didn't increase). This makes no sense to me, unless there is another >> bottle >> neck somewhere. >> 3. Reduced the number of records in my index. The throughput >> increased, but >> the shape of all my graphs stayed the same, and my CPU usage was >> identical. >> >> I have a few questions: >> 1. Can I get more than 50% CPU utilization? >> 2. Why does CPU utilization fall when I make more than 8 concurrent >> requests? >> 3. Is there an obvious bottleneck that I'm missing? >> 4. Does Tomcat have any settings that affect Solr performance? >> >> Any input is greatly appreciated. 
>> >> -- >> View this message in context: >> http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Throughput Optimization
Where is the alt directory in the source tree (or what is the JIRA issue number)? I'd like to apply this patch and re-run my tests. Does changing the lockType in solrconfig.xml address this issue? (My lockType is the default - single). markrmiller wrote: > > The latest alt directory patch uses It. > > - Mark > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20345965.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Throughput Optimization
My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using FastLRUCache on all 3 of the caches. Feak, Todd wrote: > > What are your other cache hit rates looking like? > Which caches are you using the FastLRUCache on? > > -Todd Feak > > -Original Message- > From: wojtekpia [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 05, 2008 8:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Throughput Optimization > > > Yes, I am seeing evictions. I've tried setting my filterCache higher, > but > then I start getting Out Of Memory exceptions. My filterCache hit ratio > is > > .99. It looks like I've hit a RAM bound here. > > I ran a test without faceting. The response times / throughput were both > significantly higher, there were no evictions from the filter cache, but > I > still wasn't getting > 50% CPU utilization. Any thoughts on what > physical > bound I've hit in this case? > > > > Erik Hatcher wrote: >> >> One quick question are you seeing any evictions from your >> filterCache? If so, it isn't set large enough to handle the faceting > >> you're doing. >> >> Erik >> >> >> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: >> >>> >>> I've been running load tests over the past week or 2, and I can't >>> figure out >>> my system's bottle neck that prevents me from increasing throughput. > >>> First >>> I'll describe my Solr setup, then what I've tried to optimize the >>> system. >>> >>> I have 10 million records and 59 fields (all are indexed, 37 are >>> stored, 17 >>> have termVectors, 33 are multi-valued) which takes about 15GB of >>> disk space. >>> Most field values are very short (single word or number), and >>> usually about >>> half the fields have any data at all. I'm running on an 8-core, 64- >>> bit, 32GB >>> RAM Redhat box. I allocate about 24GB of memory to the java process, > >>> and my >>> filterCache size is 700,000. I'm using a version of Solr between 1.3 > >>> and the >>> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >>> and >>> Tomcat 6.0. >>> >>> I'm running a ramp-test, increasing the number of users every few >>> minutes. I >>> measure the maximum number of requests that Solr can handle per >>> second with >>> a fixed response time, and call that my throughput. I'd like to see >>> a single >>> physical resource be maxed out at some point during my test so I >>> know it is >>> my bottle neck. I generated random queries for my dataset >>> representing a >>> more or less realistic scenario. The queries include faceting by up >>> to 6 >>> fields, and quering by up to 8 fields. >>> >>> I ran a baseline on the un-optimized setup, and saw peak CPU usage >>> of about >>> 50%, IO usage around 5%, and negligible network traffic. >>> Interestingly, the >>> CPU peaked when I had 8 concurrent users, and actually dropped down >>> to about >>> 40% when I increased the users beyond 8. Is that because I have 8 >>> cores? >>> >>> I changed a few settings and observed the effect on throughput: >>> >>> 1. Increased filterCache size, and throughput increased by about >>> 50%, but it >>> seems to peak. >>> 2. Put the entire index on a RAM disk, and significantly reduced the > >>> average >>> response time, but my throughput didn't change (i.e. even though my >>> response >>> time was 10X faster, the maximum number of requests I could make per > >>> second >>> didn't increase). This makes no sense to me, unless there is another > >>> bottle >>> neck somewhere. >>> 3. Reduced the number of records in my index. 
The throughput >>> increased, but >>> the shape of all my graphs stayed the same, and my CPU usage was >>> identical. >>> >>> I have a few questions: >>> 1. Can I get more than 50% CPU utilization? >>> 2. Why does CPU utilization fall when I make more than 8 concurrent >>> requests? >>> 3. Is there an obvious bottleneck that I'm missing? >>> 4. Does Tomcat have any settings that affect Solr performance? >>> >>> Any input is greatly appreciated. >>> >>> -- >>> View this message in context: >>> > http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> > > -- > View this message in context: > http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20346663.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Throughput Optimization
I'll try changing my other caches to LRUCache and observe performance. Interestingly, the FastLRUCache has given me a ~10% increase in performance, much lower than I've read on the SOLR-667 thread. Would compressing some of my stored fields significantly improve performance? Most of my stored fields contain single words or numbers, but I do have one relatively large stored field that contains up to a couple paragraphs of text. I agree that my 3% query cache hit rate is quite low (probably unrealistically low). I'm treating these results as the worst-case. Feak, Todd wrote: > > Yonik said something about the FastLRUCache giving the most gain for > high hit-rates and the LRUCache being faster for low hit-rates. It's in > his Nov 1 comment on SOLR-667. I'm not sure if anything changed since > then, as it's an active issue, but you may want to try the LRUCache for > your query cache. > > It sounds like you are memory bound already, but you may want to > investigate the tradeoffs of your filter cache vs. document cache. High > document hit-rate was a big performance boost for us, as document > garbage collection is a lot of overhead. I believe that would show up as > CPU usage though, so it may not be your bottleneck. > > This also brings up an interesting question. 3% hit rate on your query > cache seems low to me. Are you sure your load test is mimicking > realistic query patterns from your user base? I realize this probably > isn't part of your bottleneck, just curious. > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20348749.html Sent from the Solr - User mailing list archive at Nabble.com.
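On the compression question: in schema.xml of this Solr vintage a stored field can be flagged as compressed. A hedged sketch only — the field name is made up, and whether this actually helps throughput is exactly what is being asked above:

<!-- hypothetical large stored field; compressed="true" trades CPU time for smaller stored data -->
<field name="longDescription" type="text" indexed="true" stored="true" compressed="true"/>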
Re: Throughput Optimization
I'd like to integrate this improvement into my deployment. Is it just a matter of getting the latest Lucene jars (Lucene nightly build)? Yonik Seeley wrote: > > You're probably hitting some contention with the locking around the > reading of index files... this has been recently improved in Lucene > for non-Windows boxes, and we're integrating that into Solr (should > def be in the next release). > > -Yonik > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20349247.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case. Yonik Seeley wrote: > > And if you want to verify that the new faceting code has indeed kicked > in, some statistics are logged, like: > > Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert > INFO: UnInverted multi-valued field features, memSize=14584, time=47, > phase1=47, > nTerms=285, bigTerms=99, termInstances=186 > > -Yonik > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20797812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks) Noble Paul നോബിള് नोब्ळ् wrote: > > wojtek, you can report back the numbers if possible > > It would be nice to know how the new impl performs in real-world > > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20798456.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Throughput Optimization
It looks like file locking was the bottleneck - CPU usage is up to ~98% (from the previous peak of ~50%). I'm running the trunk code from Dec 2 with the faceting improvement (SOLR-475) turned off. Thanks for all the help! Yonik Seeley wrote: > > FYI, SOLR-465 has been committed. Let us know if it improves your > scenario. > > -Yonik > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20840017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
I'm seeing some strange behavior with my garbage collector that disappears when I turn off this optimization. I'm running load tests on my deployment. For the first few minutes, everything is fine (and this patch does make things faster - I haven't quantified the improvement yet). After that, the garbage collector stops collecting. Specifically, the new generation part of the heap is full, but never garbage collected, and the old generation is emptied, then never gets anything more. This throttles Solr performance (average response times that used to be ~500ms are now ~25s). I described my deployment scenario in an earlier post: http://www.nabble.com/Throughput-Optimization-td20335132.html Does it sound like the new faceting algorithm could be the culprit? wojtekpia wrote: > > Definitely, but it'll take me a few days. I'll also report findings on > SOLR-465. (I've been on holiday for a few weeks) > > > Noble Paul നോബിള് नोब्ळ् wrote: >> >> wojtek, you can report back the numbers if possible >> >> It would be nice to know how the new impl performs in real-world >> >> >> > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20840622.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Throughput Optimization
The new faceting stuff is off because I'm encountering some problems when I turn it on. I posted the details: http://www.nabble.com/new-faceting-algorithm-td20674902.html#a20840622 Yonik Seeley wrote: > > On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia <[EMAIL PROTECTED]> wrote: >> It looks like file locking was the bottleneck - CPU usage is up to ~98% >> (from >> the previous peak of ~50%). > > Great to hear it! > >> I'm running the trunk code from Dec 2 with the >> faceting improvement (SOLR-475) turned off. Thanks for all the help! > > new faceting stuff off because it didn't improve things in your case, > or because you didn't want to change that variable just now? > > -Yonik > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20840668.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
Yonik Seeley wrote: > > > Are you doing commits at any time? > One possibility is the caching mechanism (weak-ref on the > IndexReader)... that's going to be changing soon hopefully. > > -Yonik > No commits during this test. Should I start looking into my heap size distribution and garbage collector selection? -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20841219.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NIO not working yet
I've updated my deployment to use NIOFSDirectory. Now I'd like to confirm some previous results with the original FSDirectory. Can I turn it off with a parameter? I tried: java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.FSDirectory ... but that didn't work. -- View this message in context: http://www.nabble.com/NIO-not-working-yet-tp20468152p20845732.html Sent from the Solr - User mailing list archive at Nabble.com.
Smaller filterCache giving better performance
I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: reducing the filterCache size has increased performance. I have posted my setup here: http://www.nabble.com/Throughput-Optimization-td20335132.html. My original filterCache was 700,000. Reducing it to 20,000, I found:
- Average response time decreased by 85%
- Average throughput increased by 250%
- CPU time used by the garbage collector decreased by 85%
- The system showed none of the weird GC issues I reported yesterday at: http://www.nabble.com/new-faceting-algorithm-td20674902.html
Further reducing the filterCache to 10,000:
- Average response time decreased by another 27%
- Average throughput increased by another 30%
- GC CPU usage also dropped
- System behavior changed after ~30 minutes, with a slight performance degradation
These results came from a load test. I'm running trunk code from Dec 2 with Yonik's faceting improvement turned on. Any thoughts?
-- View this message in context: http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20863674.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Smaller filterCache giving better performance
Reducing the amount of memory given to java slowed down Solr at first, then quickly caused the garbage collector to behave badly (same issue as I referenced above). I am using the concurrent cache for all my caches. -- View this message in context: http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20864928.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
It looks like my filterCache was too big. I reduced my filterCache size from 700,000 to 20,000 (without changing the heap size) and all my performance issues went away. I experimented with various GC settings, but none of them made a significant difference. I see a 16% increase in throughput by applying this patch. Yonik Seeley wrote: > > ... This can be a big chunk of memory > per-request, and is most likely what changed your GC profile (i.e. > changing the GC settings may help). > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20984502.html Sent from the Solr - User mailing list archive at Nabble.com.
Snapinstaller vs Solr Restart
I'm running load tests against my Solr instance. I find that it typically takes ~10 minutes for my Solr setup to "warm up" while I throw my test queries at it. Also, I have the same two warm-up queries specified for the firstSearcher and newSearcher event listeners. I'm now benchmarking the effect of updating an index under load. I'm finding that after running snapinstaller, Solr takes ~1 hour to get back to the same performance numbers I was getting 10 minutes after a restart. If I can justify being offline for a few moments, it seems like I'll be better off restarting Solr rather than running snapinstaller. Any ideas why? Thanks. -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315273.html Sent from the Solr - User mailing list archive at Nabble.com.
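For reference, the firstSearcher/newSearcher warm-up queries mentioned above live in solrconfig.xml as QuerySenderListener entries. A minimal sketch with placeholder queries rather than the two actually used here:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">warmup query one</str></lst>
    <lst><str name="q">warmup query two</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">warmup query one</str></lst>
    <lst><str name="q">warmup query two</str></lst>
  </arr>
</listener>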
RE: Snapinstaller vs Solr Restart
Sorry, I forgot to include that. All my autowarmCount values are set to 0. Feak, Todd wrote: > > First suspect would be Filter Cache settings and Query Cache settings. > > If they are auto-warming at all, then there is a definite difference > between the first start behavior and the post-commit behavior. This > affects what's in memory, caches, etc. > > -Todd Feak > > > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315654.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Snapinstaller vs Solr Restart
I use my warm-up queries to fill the field cache (or at least that's the idea). My filterCache hit rate is ~99% and my queryResultCache is ~65%. I update my index several times a day with no 'optimize', and performance is seamless. I also update my index once nightly with an 'optimize', and that's where I see the performance drop. I'll try turning autowarming on. Could this have to do with file caching by the OS? Otis Gospodnetic wrote: > > Is autowarm count of 0 a good idea, though? > If you don't want to autowarm any caches, doesn't that imply that you have > very low hit rate and therefore don't care to autowarm? And if you have a > very low hit rate, then perhaps caches are not needed at all? > > > How about this. Do you optimize your index at any point? > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21319344.html Sent from the Solr - User mailing list archive at Nabble.com.
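Turning autowarming on, as mentioned above, just means giving the caches a non-zero autowarmCount in solrconfig.xml. A sketch with illustrative sizes and counts, not the values actually used in this deployment:

<filterCache class="solr.FastLRUCache" size="20000" initialSize="512" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="10000" initialSize="512" autowarmCount="1024"/>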
Re: Snapinstaller vs Solr Restart
I'm optimizing because I thought I should. I'll be updating my index somewhere between every 15 minutes, and every 2 hours. That means between 12 and 96 updates per day. That seems like a lot of index files (and it scared me a little), so that's my second reason for wanting to optimize nightly. I haven't benchmarked the performance hit for not optimizing. That'll be my next step. If the hit isn't too bad, I'll look into optimizing less frequently (weekly, ...). Thanks Otis! Otis Gospodnetic wrote: > > OK, so that question/answer seems to have hit the nail on the head. :) > > When you optimize your index, all index files get rewritten. This means > that everything that the OS cached up to that point goes out the window > and the OS has to slowly re-cache the hot parts of the index. If you > don't optimize, this won't happen. Do you really need to optimize? Or > maybe a more direct question: why are you optimizing? > > > Regarding autowarming, with such high fq hit rate, I'd make good use of fq > autowarming. The result cache rate is lower, but still decent. I > wouldn't turn off autowarming the way you have. > > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html Sent from the Solr - User mailing list archive at Nabble.com.
Overlapping Replication Scripts
I have set up cron jobs that update my index every 15 minutes. I have a distributed setup, so the steps are: 1. Update index on indexer machine (and possibly optimize) 2. Invoke snapshooter on indexer 3. Invoke snappuller on searcher 4. Invoke snapinstaller on searcher. These updates are small, don't optimize, and take at most 2 minutes (end to end). Nightly (or weekly or monthly), I will want to optimize my index. When I optimize, the end-to-end time jumps to ~1 hour. With my current cron job setup, that means I'll trigger 3 more unoptimized updates before the optimized update completes. What happens if I overlap the execution of my cron jobs? Do any of these scripts detect that another instance is already executing? Thanks. -- View this message in context: http://www.nabble.com/Overlapping-Replication-Scripts-tp21362434p21362434.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance Hit for Zero Record Dataimport
I have a transient SQL table that I use to load data into Solr using the DataImportHandler. I run an update every 15 minutes (dataimport?command=full-import&clean=false&optimize=false), but my table will frequently have no new data for me to import. When the table contains no data, it looks like Solr is doing a lot more work than it needs to. The performance degradation is the same for loading zero records as it is for loading a couple thousand records (while the system is under heavy load). I noticed that when no data is imported, no new index files are created, so it seems like something (Lucene?) is aware of the empty update. But since the performance degradation is the same, I'm guessing that a new Searcher is still created, warmed, and registered. Is that correct? -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21572935.html Sent from the Solr - User mailing list archive at Nabble.com.
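For context, the 15-minute update above is just an HTTP call against the DataImportHandler, which is registered in solrconfig.xml roughly as below. The config file name is the conventional one, so treat this as a sketch rather than the actual setup:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>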
Re: Performance Hit for Zero Record Dataimport
Thanks Shalin, a short circuit would definitely solve it. Should I open a JIRA issue? Shalin Shekhar Mangar wrote: > > I guess Data Import Handler still calls commit even if there were no > documents created. We can add a short circuit in the code to make sure > that > does not happen. > -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21588124.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance "dead-zone" due to garbage collection
I'm intermittently experiencing severe performance drops due to Java garbage collection. I'm allocating a lot of RAM to my Java process (27GB of the 32GB physically available). Under heavy load, the performance drops approximately every 10 minutes, and the drop lasts for 30-40 seconds. This coincides with the size of the old generation heap dropping from ~27GB to ~6GB. Is there a way to reduce the impact of garbage collection? A couple ideas we've come up with (but haven't tried yet) are: increasing the minimum heap size, more frequent (but hopefully less costly) garbage collection. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21588427.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance Hit for Zero Record Dataimport
Created SOLR-974: https://issues.apache.org/jira/browse/SOLR-974 -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21588634.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I'm using a recent version of Sun's JVM (6 update 7) and am using the concurrent generational collector. I've tried several other collectors, none seemed to help the situation. I've tried reducing my heap allocation. The search performance got worse as I reduced the heap. I didn't monitor the garbage collector in those tests, but I imagine that it would've gotten better. (As a side note, I do lots of faceting and sorting, I have 10M records in this index, with an approximate index file size of 10GB). This index is on a single machine, in a single Solr core. Would splitting it across multiple Solr cores on a single machine help? I'd like to find the limit of this machine before spreading the data to more machines. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21590150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
(Thanks for the responses) My filterCache hit rate is ~60% (so I'll try making it bigger), and I am CPU bound. How do I measure the size of my per-request garbage? Is it (total heap size before collection - total heap size after collection) / # of requests to cause a collection? I'll try your suggestions and post back any useful results. -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21593661.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Intermittent high response times
I'm experiencing similar issues. Mine seem to be related to old generation garbage collection. Can you monitor your garbage collection activity? (I'm using JConsole to monitor it: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html). In my system, garbage collection usually doesn't cause any trouble. But once in a while, the size of the old generation flat-lines for some time (~dozens of seconds). When this happens, I see really bad response times from Solr (not quite as bad as you're seeing, but almost). The old-gen flat-lines always seem to be right before, or right after the old-gen is garbage collected. -- View this message in context: http://www.nabble.com/Intermittent-high-response-times-tp21602475p21608986.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE paths, is there anything else I need to do run inside the IBM JVM? (e.g. re-compiling?) Walter Underwood wrote: > > What JVM and garbage collector setting? We are using the IBM JVM with > their concurrent generational collector. I would strongly recommend > trying a similar collector on your JVM. Hint: how much memory is in > use after a full GC? That is a good approximation to the working set. > > -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21616078.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Intermittent high response times
The type of garbage collector definitely affects performance, but there are other settings as well. There's a related thread currently discussing this: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-td21588427.html hbi dev wrote: > > Hi wojtekpia, > > That's interesting, I shall be looking into this over the weekend so I > shall > look at the GC also. I was briefly reading about GC last night, am I right > in thinking it could be affected by what version of the jvm I'm using > (1.5.0.8), and also what type of Collector is set? What collector is the > default, and what would people recommend for an application like Solr? > Thanks > Waseem > -- View this message in context: http://www.nabble.com/Intermittent-high-response-times-tp21602475p21628769.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Performance "dead-zone" due to garbage collection
I profiled our application, and GC is definitely the problem. The IBM JVM didn't change much. I'm currently looking into ways of reducing my memory footprint. -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21758001.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr on Sun Java Real-Time System
Has anyone tried Solr on the Sun Java Real-Time JVM (http://java.sun.com/javase/technologies/realtime/index.jsp)? I've read that it includes better control over the garbage collector. Thanks. Wojtek -- View this message in context: http://www.nabble.com/Solr-on-Sun-Java-Real-Time-System-tp21758035p21758035.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I noticed your wiki post about sorting with a function query instead of the Lucene sort mechanism. Did you see a significantly reduced memory footprint by doing this? Did you reduce the number of fields you allowed users to sort by? Lance Norskog-2 wrote: > > Sorting creates a large array with "roughly" an entry for every document > in > the index. If it is not on an 'integer' field it takes even more memory. > If > you do a sorted request and then don't sort for a while, that will drop > the > sort structures and trigger a giant GC. > > We went through some serious craziness with sorting. > -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21814038.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom Sorting Algorithm
Is there an easy way to choose/create an alternate sorting algorithm? I'm frequently dealing with large result sets (a few million results) and I might be able to benefit from domain knowledge in my sort. -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21837721.html Sent from the Solr - User mailing list archive at Nabble.com.
Queued Requests during GC
During full garbage collection, Solr doesn't acknowledge incoming requests. Any requests that were received during the GC are timestamped the moment GC finishes (at least that's what my logs show). Is there a limit to how many requests can queue up during a full GC? This doesn't seem like a Solr setting, but rather a container/OS setting (I'm using Tomcat on Linux). Thanks. Wojtek -- View this message in context: http://www.nabble.com/Queued-Requests-during-GC-tp21837898p21837898.html Sent from the Solr - User mailing list archive at Nabble.com.
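The queueing described above happens in the servlet container rather than in Solr. In Tomcat 6 the relevant knobs sit on the HTTP connector in server.xml; the values below are illustrative defaults, not this deployment's actual settings. acceptCount is the backlog of connections the OS will queue once all maxThreads worker threads are busy (for example, during a full GC pause):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="200"
           acceptCount="100"/>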
Re: Custom Sorting Algorithm
That's not quite what I meant. I'm not looking for a custom comparator, I'm looking for a custom sorting algorithm. Is there a way to use quick sort or merge sort or... rather than the current algorithm? Also, what is the current algorithm? Otis Gospodnetic wrote: > > > You can use one of the exiting function queries (if they fit your need) or > write a custom function query to reorder the results of a query. > > -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21838804.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Sorting Algorithm
Ok, so maybe a better question is: should I bother trying to change the "sorting" algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to verify). Does it become a severe bottleneck? Do you know if alternate sort algorithms have been tried during Lucene development? markrmiller wrote: > > It would not be simple to use a new algorithm. The current > implementation takes place at the Lucene level and uses a priority > queue. When you ask for the top n results, a priority queue of size n is > filled with all of the matching documents. The ordering in the priority > queue is the sort. The non-Sort method orders by relevance score - the > Sort method orders by field, relevance, or doc id. > -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21840299.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I've been able to reduce these GC outages by: 1) Optimizing my schema. This reduced my index size by more than 50% 2) Smaller cache sizes. I started with filterCache, documentCache & queryCache sizes of ~10,000. They're now at ~500 3) Reduce heap allocation. I started at 27 GB, now I'm 'only' allocating 8 GB 4) Update to trunk (was using Dec 2/08 code, now using Jan 26/09) I still see outages due to garbage collection every ~10 minutes, but they last ~2 seconds (instead of 20+ seconds). Note that my throughput dropped from ~30 hits/second to ~23 hits/second. Luckily, I'm still hitting my performance requirements, so I'm able to accept that. Thanks for the tips! Wojtek yonik wrote: > > On Tue, Feb 3, 2009 at 11:58 AM, wojtekpia wrote: >> I noticed your wiki post about sorting with a function query instead of >> the >> Lucene sort mechanism. Did you see a significantly reduced memory >> footprint >> by doing this? > > FunctionQuery derives field values from the FieldCache... so it would > use the same amount of memory as sorting. > > -Yonik > > -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21922773.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I tried sorting using a function query instead of the Lucene sort and found no change in performance. I wonder if Lance's results are related to something specific to his deployment? -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21922851.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance degradation caused by choice of range fields
In my schema I have two copies of my numeric fields: one with the original value (used for display, sort), and one with a rounded version of the original value (used for range queries). When I use my rounded field for numeric range queries (e.g. q=RoundedValue:[100 TO 1000]), I see very consistent results under load. My hit rate stays the same (at ~23 hits/sec) throughout long running load tests. When I use my original field for range queries, I get performance degradation over time (while under load), rather than consistently worse throughput. For the first 15 minutes, I see throughput similar to my throughput with rounded values, of about 23 hits/second. For the next 15 minutes, I'm down to about 20 hits/second. For the next 15 minutes, I'm down to about 18 hits/second, etc. I expected worse performance by using the non-rounded original value, but I didn't expect degradation. I expected to see throughput of X < 23 hits/second, but consistent at all times. I don't understand why my performance gets worse over time. Any ideas why? I have ~1000 unique values in my rounded field, and ~ 100,000 unique values in my un-rounded field. Thanks. Wojtek -- View this message in context: http://www.nabble.com/Performance-degradation-caused-by-choice-of-range-fields-tp21924197p21924197.html Sent from the Solr - User mailing list archive at Nabble.com.
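To make the setup concrete: the "two copies" are simply two declared fields, with the rounding done at feed time (a copyField alone wouldn't round anything). A sketch with hypothetical field names, using the sortable numeric types from the example schema of that era:

<!-- original value: display and sort -->
<field name="Price" type="sfloat" indexed="true" stored="true"/>
<!-- rounded value: range queries, ~1,000 unique terms instead of ~100,000 -->
<field name="RoundedPrice" type="sfloat" indexed="true" stored="false"/>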
Recent Paging Change?
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application). Thanks. Wojtek -- View this message in context: http://www.nabble.com/Recent-Paging-Change--tp21946610p21946610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recent Paging Change?
I'll run a profiler on new and old code and let you know what I find. I have changed my schema between tests: I used to have termVectors turned on for several fields, and now they are always off. My underlying data has not changed. -- View this message in context: http://www.nabble.com/Recent-Paging-Change--tp21946610p21958267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance degradation caused by choice of range fields
Yes, I commit roughly every 15 minutes (via a data update). This update is consistent between my tests, and only causes a performance drop when I'm sorting on fields with many unique values. I've examined my GC logs, and they are also consistent between my tests. Otis Gospodnetic wrote: > > Hi, > > Did you commit (reopen the searcher) during the performance degradation > period and did any of your queries use sort? If so, perhaps your JVM is > accumulating those thrown-away FieldCache objects and then GC has more and > more garbage to clean up, causing pauses and lowering your overall > throughput. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > -- View this message in context: http://www.nabble.com/Performance-degradation-caused-by-choice-of-range-fields-tp21924197p21958268.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recent Paging Change?
This was a false alarm, sorry. I misinterpreted some results. wojtekpia wrote: > > Has there been a recent change (since Dec 2/08) in the paging algorithm? > I'm seeing much worse performance (75% drop in throughput) when I request > 20 records starting at record 180 (page 10 in my application). > > Edit: the 75% drop is compared to my throughput for page 10 queries using > Dec 2/08 code. > > Thanks. > > Wojtek > -- View this message in context: http://www.nabble.com/Recent-Paging-Change--tp21946610p21969121.html Sent from the Solr - User mailing list archive at Nabble.com.
Reading Core-Specific Config File in a Row Transformer
I'm using the DataImportHandler to load data. I created a custom row transformer, and inside of it I'm reading a configuration file. I am using the system's solr.solr.home property to figure out which directory the file should be in. That works for a single-core deployment, but not for multi-core deployments (since I'm always looking in solr.solr.home/conf/file.txt). Is there a clean way to resolve the actual conf directory path from within a custom row transformer so that it works for both single-core and multi-core deployments? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Reading-Core-Specific-Config-File-in-a-Row-Transformer-tp22069449p22069449.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reading Core-Specific Config File in a Row Transformer
Thanks Shalin. I think you missed the call to .getResourceLoader(), so it should be: context.getSolrCore().getResourceLoader().getInstanceDir() Works great, thanks! Shalin Shekhar Mangar wrote: > > > You can use Context.getSolrCore().getInstanceDir() > > -- View this message in context: http://www.nabble.com/Reading-Core-Specific-Config-File-in-a-Row-Transformer-tp22069449p22086846.html Sent from the Solr - User mailing list archive at Nabble.com.
Redhat vs FreeBSD vs other unix flavors
Is there a recommended unix flavor for deploying Solr on? I've benchmarked my deployment on Red Hat. Our operations team asked if we can use FreeBSD instead. Assuming that my benchmark numbers are consistent on FreeBSD, is there anything else I should watch out for? Thanks. Wojtek -- View this message in context: http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Redhat vs FreeBSD vs other unix flavors
Thanks Otis. Do you know what the most common deployment OS is? I couldn't find much on the mailing list or http://wiki.apache.org/solr/PublicServers Otis Gospodnetic wrote: > > > You should be fine on either Linux or FreeBSD (or any other UNIX flavour). > Running on Solaris would probably give you access to goodness like dtrace, > but you can live without it. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > -- View this message in context: http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251260.html Sent from the Solr - User mailing list archive at Nabble.com.
JVM exception_access_violation
I'm running Solr on Tomcat 6.0.18 with Java 6 update 7 on Windows 2003 64 bit. Over the past month or so, my JVM has crashed twice with the error below. Has anyone experienced this? My system is not heavily loaded, and the crash seems to coincide with an update (via DIH). I'm running trunk code from late January. Note that I update my index ~50 times per day, and this crash has happened twice in the past month (so 2 of 1500 updates seem to have triggered the crash). This Windows deployment is for demos, so I'm not too concerned about it. Interestingly, my production deployment is on a 64 bit Linux system (same versions of everything) and I haven't been able to reproduce the bug there. # # An unexpected error has been detected by Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x080e51c3, pid=4404, tid=956 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b23 mixed mode windows-amd64) # Problematic frame: # V [jvm.dll+0xe51c3] # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x01de2000): GCTaskThread [stack: 0x,0x] [id=956] siginfo: ExceptionCode=0xc005, reading address 0x Registers: EAX=0x3000, EBX=0x01e40330, ECX=0x000184b49821, EDX=0x000184b4b580 ESP=0x07cff9b0, EBP=0x, ESI=0x000184b4b580, EDI=0x0935 EIP=0x080e51c3, EFLAGS=0x00010206 Top of Stack: (sp=0x07cff9b0) 0x07cff9b0: 01e40330 0x07cff9c0: 000184b4dd88 0935 0x07cff9d0: 08464b08 01dbbdc0 0x07cff9e0: 01dbf190 8a65 0x07cff9f0: 2f5b4000 0002015f 0x07cffa00: 0002 01dbf2f0 0x07cffa10: 01e40330 01dbf430 0x07cffa20: 01dbf4f0 000201602d18 0x07cffa30: 07effa00 07cffb40 0x07cffa40: 0x07cffa50: 0830484d 0x07cffa60: 0002015f 0002 0x07cffa70: 0048 0001 0x07cffa80: 0001 00bb8501 0x07cffa90: 01dbf378 080ea807 0x07cffaa0: 07cffb40 07cffb40 Instructions: (pc=0x080e51c3) 0x080e51b3: 4c 8d 44 24 20 48 8b d6 48 8b 41 10 48 83 c1 10 0x080e51c3: ff 90 c0 01 00 00 44 8b 1d 08 f2 44 00 45 85 db Stack: [0x,0x], sp=0x07cff9b0, free space=127998k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [jvm.dll+0xe51c3] [error occurred during error reporting (printing native stack), id 0xc005] --- P R O C E S S --- Java Threads: ( => current thread ) 0x10286c00 JavaThread "Thread-135" daemon [_thread_blocked, id=4892, stack(0x1169,0x1179)] 0x10285400 JavaThread "http-8084-10" daemon [_thread_blocked, id=5108, stack(0x1201,0x1211)] 0x10287400 JavaThread "http-8084-9" daemon [_thread_blocked, id=1772, stack(0x149a,0x14aa)] 0x1028a400 JavaThread "http-8084-8" daemon [_thread_blocked, id=1656, stack(0x11f1,0x1201)] 0x01dc2c00 JavaThread "http-8084-7" daemon [_thread_blocked, id=2056, stack(0x11e1,0x11f1)] 0x10288400 JavaThread "http-8084-6" daemon [_thread_blocked, id=4792, stack(0x11d1,0x11e1)] 0x10286800 JavaThread "MultiThreadedHttpConnectionManager cleanup" daemon [_thread_blocked, id=3792, stack(0x1251,0x1261)] 0x0f6e8400 JavaThread "http-8084-5" daemon [_thread_blocked, id=3540, stack(0x11c1,0x11d1)] 0x0f6e7800 JavaThread "http-8084-4" daemon [_thread_blocked, id=4048, stack(0x11b1,0x11c1)] 0x0f6e8000 JavaThread "http-8084-3" daemon [_thread_blocked, id=1932, stack(0x1159,0x1169)] 0x0f6e7000 JavaThread "http-8084-2" daemon [_thread_blocked, id=996, stack(0x1149,0x1159)] 0x01dc6000 JavaThread "http-8084-1" daemon [_thread_blocked, id=4924, stack(0x1139,0x1149)] 0x01dc5800 JavaThread "TP-Monitor" daemon [_thread_blocked, id=2288, stack(0x1121,0x1131)] 0x01dc5400 JavaThread "TP-Processor4" daemon 
[_thread_in_native, id=4588, stack(0x,0x1121)] 0x01dc4c00 JavaThread "TP-Processor3" daemon [_thread_blocked, id=652, stack(0x1101,0x)] 0x01dc4400
Sorting by 'starts with'
I have an index of product names. I'd like to sort results so that entries starting with the user query come first. E.g. for q=kitchen, results would sort something like:
1. kitchen appliance
2. kitchenaid dishwasher
3. fridge for kitchen
It looks like using the query() function query comes close, but I don't know how to write a subquery that only matches if the value starts with the query string. Has anyone solved a similar need? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Sorting-by-%27starts-with%27-tp23432815p23432815.html Sent from the Solr - User mailing list archive at Nabble.com.
preImportDeleteQuery
Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but it doesn't seem to do what I'm looking for. I'm looking to do something like: preImportDeleteQuery="ItemId={select ItemId from table where status='delete'}" http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059 seems to address this, but I couldn't find any documentation for it in the wiki. Can someone provide an example of how to use this? Thanks, Wojtek -- View this message in context: http://www.nabble.com/preImportDeleteQuery-tp23437674p23437674.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: preImportDeleteQuery
I'm using full-import, not delta-import. I tried it with delta-import, and it would work, except that I'm querying for a large number of documents so I can't afford the cost of deltaImportQuery for each document. It sounds like $deleteDocId will work. I just need to update from 1.3 to trunk. Thanks! Noble Paul നോബിള് नोब्ळ्-2 wrote: > > are you doing a full-import or a delta-import? > > for delta-import there is an option of deletedPkQuery which should > meet your needs > > -- View this message in context: http://www.nabble.com/preImportDeleteQuery-tp23437674p23450308.html Sent from the Solr - User mailing list archive at Nabble.com.
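For reference, the deletedPkQuery mentioned above goes directly on the entity in data-config.xml and is only evaluated during delta-import. A sketch reusing the table and column names from the original question; the field mappings are omitted:

<entity name="item" pk="ItemId"
        query="select * from table"
        deletedPkQuery="select ItemId from table where status='delete'">
  <!-- field mappings omitted -->
</entity>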
Re: JVM exception_access_violation
I updated to Java 6 update 13 and have been running problem free for just over a month. I'll continue this thread if I run into any problems that seem to be related. Yonik Seeley-2 wrote: > > I assume that you're not using any Tomcat native libs? If you are, > try removing them... if not (and the crash happened more than once in > the same place) then it looks like a JVM bug rather than flakey > hardware and the easiest path forward would be to try the latest Java6 > (update 12). > -- View this message in context: http://www.nabble.com/JVM-exception_access_violation-tp22623667p23451994.html Sent from the Solr - User mailing list archive at Nabble.com.