Re: performance sorting multivalued field
Chris Hostetter-3 wrote: > > sorting on a multivalued is defined to have un-specified behavior. it > might fail with an error, or it might fail silently. > I learned this the hard way, it failed silently for a long time until it failed with an error: http://lucene.472066.n3.nabble.com/Different-sort-behavior-on-same-code-td503761.html -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p920012.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SEVERE: Unable to move index file
Hi, I ran into this problem again the other night. I've looked through my log files in more detail, and nothing seems out of place (I stripped user queries out and included it below). I have the following setup: 1. Indexer has 2 cores. One core gets incremental updates, the other is for full re-syncs with a database. The last step in my full re-sync process is to swap cores (so that the searchers don't have to change their replication master URLs). 2. Searcher that is subscribed to a constant indexer URL. I noticed this replication error occurred right after I swapped my indexer's cores. Since the index version and generation numbers are independent across the 2 cores, could the searcher's index clean up be pre-emptively deleting the active searcher index? When the error occurred, index.20100921053730 did not exist, but index.properties was pointing to it. Previous entries in the log make it seem like the directory did exist a few minutes earlier (replication + warmup succeeded pointing at that directory). I've tried to reproduce this in a development environment, but haven't been able to so far. https://issues.apache.org/jira/browse/SOLR-1822?focusedCommentId=12845175 SOLR-1822 seems to address a similar issue. I suspect that it would solve what I'm seeing, but it treats the symptom rather than the cause (and I'd like to be able to repro before trying it). Any insight/theories are appreciated. Thanks, Wojtek Sep 21, 2010 5:35:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave in sync with master. Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Master's version: 1271723727936, generation: 18616 Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave's version: 1271723727935, generation: 18615 Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Number of files in latest index in master: 118 Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13n9.prx Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13nx.fnm ... Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13m5.fnm ... 
Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller downloadIndexFiles INFO: Skipping download for /solr/data/index.20100921053730/_13n9.frq Sep 21, 2010 5:37:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Total time taken for download : 0 secs Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2 __AW_commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false) Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher INFO: Opening searc...@61080339 main Sep 21, 2010 5:37:31 PM org.apache.solr.update.DirectUpdateHandler2 __AW_commit INFO: end_commit_flush Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm INFO: autowarming searc...@61080339 main from searc...@26aebd8c main fieldValueCache{lookups=866,hits=866,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493365,cumulative_hits=493351,cumulative_hitratio=0.99,cumulative_inserts=7,cumulative_evictions=0,item_FeaturesFacet={field=FeaturesFacet,memSize=51896798,tindexSize=56,time=988,phase1=936,nTerms=50,bigTerms=9,termInstances=5403271,uses=146},...} ... Sep 21, 2010 5:37:31 PM org.apache.solr.search.SolrIndexSearcher __AW_warm INFO: autowarming result for searc...@61080339 main documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=2036931,cumulative_hits=836191,cumulative_hitratio=0.41,cumulative_inserts=1200740,cumulative_evictions=1103563} Sep 21, 2010 5:37:31 PM org.apache.solr.core.QuerySenderListener __AW_newSearcher INFO: QuerySenderListener sending requests to searc...@61080339 main Sep 21, 2010 5:37:31 PM org.apache.solr.request.UnInvertedField uninvert INFO: UnInverted multi-valued field {field=BedFacet,memSize=48178130,tindexSize=42,time=313,phase1=261,nTerms=6,bigTerms=4,termInstances=328351,uses=0} ... INFO: [] webapp=null path=null params={*:*} hits=11546888 status=0 QTime=20687 Sep 21, 2010 5:37:58 PM org.apache.solr.core.QuerySenderListener __AW_newSearcher INFO: QuerySenderListener done. Sep 21, 2010 5:37:58 PM org.apache.solr.core.SolrCore registerSearcher INFO: [] Registered new searcher searc...@61080339 main Sep 21, 2010 5:37:58 PM org.apache.solr.search.SolrIndexSearcher __AW_close INFO: Closing searc...@26aebd8c main fieldValueCache{lookups=950,hits=950,hitratio=1.00,inserts=0,evictions=0,size=11,warmupTime=0,cumulative_lookups=493449,cumulative_hits=493435,cumulative_hitratio=0.99,cumulat
RE: One item, multiple fields, and range queries
Hi Hoss, I realize I'm reviving a really old thread, but I have the same need, and SpanNumericRangeQuery sounds like a good solution for me. Can you give me some guidance on how to implement that? Thanks, Wojtek -- View this message in context: http://lucene.472066.n3.nabble.com/One-item-multiple-fields-and-range-queries-tp475030p2796613.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr CMS Integration
I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer-facing website with a combination of articles, blogs, white papers, etc. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24868462.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr CMS Integration
Thanks for the responses. I'll give Drupal a shot. It sounds like it'll do the trick, and if it doesn't then at least I'll know what I'm looking for. Wojtek -- View this message in context: http://www.nabble.com/Solr-CMS-Integration-tp24868462p24870218.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facets with an IDF concept
Hi Asif, Did you end up implementing this as a custom sort order for facets? I'm facing a similar problem, but not related to time. Given 2 terms: A: appears twice in half the search results B: appears once in every search result I think term A is more "interesting". Using facets sorted by frequency, term B is more important (since it shows up first). To me, terms that appear in all documents aren't really that interesting. I'm thinking of using a combination of document count (in the result set, not globally) and term frequency (in the result set, not globally) to come up with a facet sort order. Wojtek -- View this message in context: http://www.nabble.com/Facets-with-an-IDF-concept-tp24071160p24959192.html Sent from the Solr - User mailing list archive at Nabble.com.
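Solr doesn't offer this sort order out of the box, so the ranking would have to be computed client-side or in a custom component from per-result-set counts. A rough sketch of the weighting I have in mind, assuming you can obtain the result-set size plus each term's document frequency and total term frequency within that result set (the class and method names below are made up for illustration):

  public class FacetInterestingness {
      /**
       * Score a facet term by an IDF-like weight computed over the current
       * result set instead of by raw document frequency.
       *   numResults - documents in the result set
       *   docFreq    - documents in the result set that contain the term
       *   termFreq   - total occurrences of the term across the result set
       */
      public static double score(long numResults, long docFreq, long termFreq) {
          if (docFreq == 0) {
              return 0.0;
          }
          // A term present in every result scores termFreq * log(1) = 0;
          // a term in half the results scores termFreq * log(2) > 0.
          return termFreq * Math.log((double) numResults / docFreq);
      }
  }

With this weight, term A above (twice in half the results) outranks term B (once in every result), which matches the intuition that terms appearing in all documents aren't interesting.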
Searching and Displaying Different Logical Entities
I'm trying to figure out if Solr is the right solution for a problem I'm facing. I have 2 data entities: P(arent) & C(hild). P contains up to 100 instances of C. I need to expose an interface that searches attributes of entity C, but displays them grouped by parent entity, P. I need to include facet counts in the result, and the counts are based on P. My first solution was to create 2 Solr instances: one for each entity. I would have to execute 2 queries each time: 1) get a list of matching P's based on a query of the C instance (facet by P ID in C instance to get unique list of P's), then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. My second (and current) solution is to create a single instance, and flatten all C attributes into the appropriate P record using dynamic fields. For example, if C has an attribute CA, then I have a dynamic field in P called CA*. I name this field incrementally based on the number of C's per P (CA1, CA2, ...). This works, except that each query is very long (CA1:condition OR CA2: condition ...). Neither solution is ideal. I'm wondering if I'm missing something obvious, or if I'm using the wrong solution for this problem. Any insight is appreciated. Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25156301.html Sent from the Solr - User mailing list archive at Nabble.com.
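For concreteness, a minimal sketch of the flattened layout from my second solution (the field and attribute names are placeholders): each P becomes one document, and each child attribute CA is written into a numbered dynamic field declared in schema.xml as

  <dynamicField name="CA*" type="string" indexed="true" stored="true"/>

so that a query against a parent with up to three children ends up looking like q=CA1:condition OR CA2:condition OR CA3:condition, which is exactly the long-query problem described above.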
RE: Searching and Displaying Different Logical Entities
Funtick wrote: > >>then 2) get all P's by ID, including facet counts, etc. >>The problem I face with this solution is that I can have many matching P's > (10,000+), so my second query will have many (10,000+) constraints. > > SOLR can automatically provide you P's with Counts, and it will be > _unique_... > > I assume you mean to facet by P in the C index. My next problem is to sort those P's based on some attribute of P (as opposed to alphabetically or by occurrence in C). Funtick wrote: > > Even if cardinality of P is 10,000+ SOLR is very fast now (expect few > seconds response time for initial request). You need single query with > "faceting"... > Is there a practical limit for maxBooleanClauses? The default is 1024, but I need at least 10,000. Funtick wrote: > > (!) You do not need P's ID. > > Single document will have unique ID, and fields such as P, C (with > possible > attributes). Do not think in terms of RDBMS... Lucene does all > 'normalization' behind the scenes, and SOLR will give you Ps with Cs... > If I put both P's and C's into a single index, then I agree, I don't need P's ID. If I have P and C in separate indices then I still need to maintain the logical relationship between P and C. It wasn't clear to me if you suggested I continue with either of my 2 proposed solutions. Can you clarify? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25181664.html Sent from the Solr - User mailing list archive at Nabble.com.
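For reference, the clause limit is configurable in solrconfig.xml; it sets Lucene's BooleanQuery maximum and is effectively JVM-wide. Something along these lines should cover 10,000+ clauses, though a single query that large tends to be slow:

  <query>
    <!-- default is 1024; raise it to cover the ~10,000 parent IDs -->
    <maxBooleanClauses>20000</maxBooleanClauses>
  </query>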
Backups using Replication
I'm trying to create data backups using the ReplicationHandler's built-in functionality. I've configured my master as documented at http://wiki.apache.org/solr/SolrReplication : ... optimize ... but I don't see any backups created on the master. Do I need the snapshooter script available? I did not deploy it on my master, I assumed it was part of the 'old' way of doing replication. If I invoke the backup command over HTTP (http://master_host:port/solr/replication?command=backup) then it seems to work - I get directories like "snapshot.20090908094423". Thanks, Wojtek -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25350083.html Sent from the Solr - User mailing list archive at Nabble.com.
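The master-side configuration described on the wiki looks roughly like this (the event names here are illustrative, not necessarily my exact values):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="backupAfter">optimize</str>
    </lst>
  </requestHandler>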
Passing FuntionQuery string parameters
Hi, I'm writing a function query to score documents based on Levenshtein distance from a string. I want my function calls to look like: lev(myFieldName, 'my string to match') I'm running into trouble parsing the string I want to match ('my string to match' above). It looks like all the built in support is for parsing field names and numeric values. Am I missing the string parsing support, or is it not there, and if not, why? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25351825.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Backups using Replication
I'm using trunk from July 8, 2009. Do you know if it's more recent than that? Noble Paul നോബിള് नोब्ळ्-2 wrote: > > which version of Solr are you using? the "backupAfter" name was > introduced recently > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25386886.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Passing FuntionQuery string parameters
It looks like parseArg was added on Aug 20, 2009. I'm working with slightly older code. Thanks! Noble Paul നോബിള് नोब्ळ्-2 wrote: > > did you implement your own ValueSourceParser . the > FunctionQParser#parseArg() method supports strings > > On Wed, Sep 9, 2009 at 12:10 AM, wojtekpia wrote: >> >> Hi, >> >> I'm writing a function query to score documents based on Levenshtein >> distance from a string. I want my function calls to look like: >> >> lev(myFieldName, 'my string to match') >> >> I'm running into trouble parsing the string I want to match ('my string >> to >> match' above). It looks like all the built in support is for parsing >> field >> names and numeric values. Am I missing the string parsing support, or is >> it >> not there, and if not, why? >> >> Thanks, >> >> Wojtek >> -- >> View this message in context: >> http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25351825.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25386910.html Sent from the Solr - User mailing list archive at Nabble.com.
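For anyone finding this later, a rough sketch of the plugin on a build that has FunctionQParser.parseArg(). LevenshteinValueSource is a hypothetical ValueSource you'd still have to write, and the parser would be registered in solrconfig.xml as <valueSourceParser name="lev" class="com.example.LevenshteinValueSourceParser"/>:

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.solr.search.FunctionQParser;
  import org.apache.solr.search.ValueSourceParser;
  import org.apache.solr.search.function.ValueSource;

  public class LevenshteinValueSourceParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws ParseException {
          ValueSource field = fp.parseValueSource(); // myFieldName
          String target = fp.parseArg();             // 'my string to match'
          // LevenshteinValueSource is hypothetical: it would score each document
          // by the edit distance between the field value and the target string.
          return new LevenshteinValueSource(field, target);
      }
  }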
Re: Backups using Replication
Do you mean that it's been renamed, so this should work? ... optimize ... Noble Paul നോബിള് नोब्ळ्-2 wrote: > > before that backupAfter was called "snapshot" > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25407695.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Backups using Replication
I've verified that renaming backupAfter to snapshot works (I should've checked before asking). Thanks Noble! wojtekpia wrote: > > > > > ... > optimize > ... > > > > > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25407846.html Sent from the Solr - User mailing list archive at Nabble.com.
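So, for anyone else on an older build, the relevant line in the master section is the old parameter name, and on newer builds it's the renamed one (values here are illustrative):

  <!-- builds that predate the rename -->
  <str name="snapshot">optimize</str>
  <!-- newer builds -->
  <str name="backupAfter">optimize</str>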
FileListEntityProcessor and LineEntityProcessor
Hi, I'm trying to import data from a list of files using the FileListEntityProcessor. Here is my import configuration: If I have only one file in d:\my\directory\ then everything works correctly. If I have multiple files then I get the following exception:

Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: f document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more
Sep 16, 2009 9:48:46 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Problem reading from input Processing Document # 53812
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:112)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:348)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:376)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:224)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:316)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:376)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:355)
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedReader.ensureOpen(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at java.io.BufferedReader.readLine(Unknown Source)
        at org.apache.solr.handler.dataimport.LineEntityProcessor.nextRow(LineEntityProcessor.java:109)
        ... 8 more

Note that my input files have 53812 lines, which is the same as the document number that I'm choking on. Does anyone know what I'm doing wrong? Thanks, Wojtek -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25476443.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FileListEntityProcessor and LineEntityProcessor
Fergus McMenemie-2 wrote: > > > Can you provide more detail on what you are trying to do? ... > You seem to listing all files "d:\my\directory\.*WRK". Do > these WRK files contain lists of files to be indexed? > > That is my complete data config file. I have a directory containing a bunch of files that have one entity per line. Each line contains "blocks" of data. I parse out each block and process it appropriately using myTransformer. Is this use of FileListEntityProcessor with LineEntityProcessor not supported? -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25477613.html Sent from the Solr - User mailing list archive at Nabble.com.
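A sketch of the nesting being described, in case it helps pinpoint the problem (the directory, the .*WRK pattern from Fergus's question, and the transformer class are placeholders):

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="f" processor="FileListEntityProcessor"
              baseDir="d:\my\directory" fileName=".*WRK" rootEntity="false">
        <entity name="line" processor="LineEntityProcessor"
                url="${f.fileAbsolutePath}"
                transformer="com.example.MyTransformer">
          <field column="rawLine" name="rawLine"/>
        </entity>
      </entity>
    </document>
  </dataConfig>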
Re: FileListEntityProcessor and LineEntityProcessor
Note that if I change my import file to explicitly list all my files (instead of using the FileListEntityProcessor) as below then everything works as I expect. ... -- View this message in context: http://www.nabble.com/FileListEntityProcessor-and-LineEntityProcessor-tp25476443p25480830.html Sent from the Solr - User mailing list archive at Nabble.com.
Multi-valued field cache
I want to build a FunctionQuery that scores documents based on a multi-valued field. My intention was to use the field cache, but that doesn't get me multiple values per document. I saw other posts suggesting UnInvertedField as the solution. I don't see a method in the UnInvertedField class that will give me a list of field values per document. I only see methods that give values per document set. Should I use one of those methods and create document sets of size 1 for each document? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Multi-valued-field-cache-tp25684952p25684952.html Sent from the Solr - User mailing list archive at Nabble.com.
Different sort behavior on same code
Hi, I'm running Solr version 1.3.0.2009.07.08.08.05.45 in 2 environments. I have a field defined as: The two environments have different data, but both have single and multi valued entries for myDate. On one environment sorting by myDate works (sort seems to be by the 'last' value if multi valued). On the other environment I get: HTTP Status 500 - there are more terms than documents in field "myDate", but it's impossible to sort on tokenized fields java.lang.RuntimeException: there are more terms than documents in field I've read that I shouldn't sort by multi-valued fields, so my solution will be to add a single-valued date field for sorting. But I don't understand why my two environments behave differently, and it doesn't seem like the error message makes sense (are date fields tokenized?). Any thoughts? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Different-sort-behavior-on-same-code-tp25774769p25774769.html Sent from the Solr - User mailing list archive at Nabble.com.
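The workaround I'm planning looks roughly like this in schema.xml. The single-valued field has to be populated with one chosen value (for example the latest date) at index time, since copying every value across from the multi-valued field would just reintroduce the problem:

  <!-- existing multi-valued field -->
  <field name="myDate"     type="date" indexed="true" stored="true" multiValued="true"/>
  <!-- single-valued companion used only for sorting -->
  <field name="myDateSort" type="date" indexed="true" stored="false" multiValued="false"/>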
Changing masterUrl in ReplicationHandler at Runtime
Hi, I'm trying to change the masterUrl of a search slave at runtime. So far I've found 2 ways of doing it: 1. Change solrconfig_slave.xml on master, and have it replicate to solrconfig.xml on the slave 2. Change solrconfig.xml on slave, then issue a core reload command. (a side note: can I issue the reload-core command without having a solr.xml file? I had to run a single core in multi-core mode to make this work) So far I like solution 2 better. Does it make sense to add a 'sticky' parameter to the ReplicationHandler's fetchindex command? Something like: http://slave_host:port/solr/replication?command=fetchindex&masterUrl=myUrl&stickyMasterUrl=true If true then 'myUrl' would continue being used for replication, including future polling. Are there other solutions? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Changing-masterUrl-in-ReplicationHandler-at-Runtime-tp25829843p25829843.html Sent from the Solr - User mailing list archive at Nabble.com.
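For reference, the single-core-in-multi-core setup for option 2 is just a minimal solr.xml plus a CoreAdmin reload call (the core name here is arbitrary):

  <!-- solr.xml -->
  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="."/>
    </cores>
  </solr>

  http://slave_host:port/solr/admin/cores?action=RELOAD&core=core0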
Re: how can I use debugQuery if I have extended QParserPlugin?
I'm seeing the same behavior and I don't have any custom query parsing plugins. Similar to the original post, my queries like: select?q=field:[1 TO *] select?q=field:[1 TO 2] select?q=field:[1 TO 2]&debugQuery=true work correctly, but including an unbounded range appears to break the debug component: select?q=field:[1 TO *]&debugQuery=true My stack trace is the same as the original post. gdeconto wrote: > > my apologies, you are correct; I put the stack trace in an edit of the > post and not in the original post. > > re version info: > > Solr Specification Version: 1.3.0.2009.07.08.08.05.45 > Solr Implementation Version: nightly exported - yonik - 2009-07-08 > 08:05:45 > > NOTE: I have some more info on this NPE problem. I get the NPE error > whenever I use debugQuery and the query range has an asterix in it, even > tho the query itself should work. For example: > > These work ok: > > http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1] > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *] > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000] > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000]&debugQuery=true > > These do not work ok: > > http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1]&debugQuery=true > http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *]&debugQuery=true > http://127.0.0.1:8994/solr/select?q=myfield:* > http://127.0.0.1:8994/solr/select?q=myfield:*&debugQuery=true > > Not sure if the * gets translated somewhere into a null value parameter (I > am just starting to look at the solr code) per your comment > -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25930610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how can I use debugQuery if I have extended QParserPlugin?
Good catch. I was testing on a nightly build from mid-July. I just tested on a similar deployment with nightly code from Oct 5th and everything seems to work. My mid-July deployment breaks on sints, integers, sdouble, doubles, slongs and longs. My more recent deployment works with tints, sints, integers, tdoubles, sdoubles, doubles, tlongs, slongs, and longs. (I don't have any floats in my schema so I didn't test those). Sounds like another reason to upgrade to 1.4. Wojtek Yonik Seeley-3 wrote: > > Is this with trunk? I can't seem to reproduce this... what's the field > type? > > -Yonik > http://www.lucidimagination.com > > On Fri, Oct 16, 2009 at 3:01 PM, wojtekpia wrote: >> >> I'm seeing the same behavior and I don't have any custom query parsing >> plugins. Similar to the original post, my queries like: >> >> select?q=field:[1 TO *] >> select?q=field:[1 TO 2] >> select?q=field:[1 TO 2]&debugQuery=true >> >> work correctly, but including an unboundd range appears to break the >> debug >> component: >> select?q=field:[1 TO *]&debugQuery=true >> >> My stack trace is the same as the original post. >> >> >> gdeconto wrote: >>> >>> my apologies, you are correct; I put the stack trace in an edit of the >>> post and not in the original post. >>> >>> re version info: >>> >>> Solr Specification Version: 1.3.0.2009.07.08.08.05.45 >>> Solr Implementation Version: nightly exported - yonik - 2009-07-08 >>> 08:05:45 >>> >>> NOTE: I have some more info on this NPE problem. I get the NPE error >>> whenever I use debugQuery and the query range has an asterix in it, even >>> tho the query itself should work. For example: >>> >>> These work ok: >>> >>> http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1] >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *] >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000] >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO 1000]&debugQuery=true >>> >>> These do not work ok: >>> >>> http://127.0.0.1:8994/solr/select?q=myfield:[* TO 1]&debugQuery=true >>> http://127.0.0.1:8994/solr/select?q=myfield:[1 TO *]&debugQuery=true >>> http://127.0.0.1:8994/solr/select?q=myfield:* >>> http://127.0.0.1:8994/solr/select?q=myfield:*&debugQuery=true >>> >>> Not sure if the * gets translated somewhere into a null value parameter >>> (I >>> am just starting to look at the solr code) per your comment >>> >> >> -- >> View this message in context: >> http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25930610.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25932460.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: number of Solr indexes per Tomcat instance
I ran into trouble running several cores (either as Solr multi-core or as separate web apps) in a single JVM because the Java garbage collector would freeze all cores during a collection. This may not be an issue if you're not dealing with large amounts of memory. My solution is to run each web app in its own JVM and Tomcat instance. -- View this message in context: http://www.nabble.com/number-of-Solr-indexes-per-Tomcat-instance-tp26027238p26029243.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: javabin in .NET?
I was thinking of going this route too because I've found that parsing XML result sets using XmlDocument + XPath can be very slow (up to a few seconds) when requesting ~100 documents. Are you getting good performance parsing large result sets? Are you using SAX instead of DOM? Thanks, Wojtek mausch wrote: > > It's one of my pending issues for SolrNet ( > http://code.google.com/p/solrnet/issues/detail?id=71 ) > I've looked at the code, it doesn't seem terribly complex to port to C#. > It > would be kind of cumbersome to test it though. > I just didn't implement it yet because I'm getting good enough performance > with XML (and other people as well: > http://groups.google.com/group/solrnet/msg/4de8224a33279906 ) > > Cheers, > Mauricio > -- View this message in context: http://old.nabble.com/javabin-in-.NET--tp26321914p26323001.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about schemas (and SOLR-1131?)
Could this be solved with a multi-valued custom field type (including a custom comparator)? The OP's situation deals with multi-valuing products for each customer. If products contain strictly numeric fields then it seems like a custom field implementation (or extension of BinaryField?) *should* be easy - only the comparator part needs work. I'm not clear on how the existing query parsers would handle this though, so there's probably some work there too. SOLR-1131 seems like a more general solution that supports analysis that numeric fields don't need. gdeconto wrote: > > I saw an interesting thread in the solr-dev forum about multiple fields > per fieldtype (https://issues.apache.org/jira/browse/SOLR-1131) > > from the sounds of it, it might be of interest and/or use in these types > of problems; for your example, you might be able to define a fieldtype > that houses the product data. > > note that I only skimmed the thread. hopefully, I'll get get some time to > look at it more closely > -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26636170.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler running out of memory
I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
I'm trying with batchSize=-1 now. So far it seems to be working, but very slowly. I will update when it completes or crashes. Even with a batchSize of 100 I was running out of memory. I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I believe that's the maximum for my environment. The batchSize parameter doesn't seem to control what happens... when I select top 5,000,000 with a batchSize of 10,000, it works. When I select top 10,000,000 with the same batchSize, it runs out of memory. Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM. Noble Paul നോബിള് नोब्ळ् wrote: > > DIH streams rows one by one. > set the fetchSize="-1" this might help. It may make the indexing a bit > slower but memory consumption would be low. > The memory is consumed by the jdbc driver. try tuning the -Xmx value for > the VM > --Noble > > On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar > <[EMAIL PROTECTED]> wrote: >> Setting the batchSize to 1 would mean that the Jdbc driver will keep >> 1 rows in memory *for each entity* which uses that data source (if >> correctly implemented by the driver). Not sure how well the Sql Server >> driver implements this. Also keep in mind that Solr also needs memory to >> index documents. You can probably try setting the batch size to a lower >> value. >> >> The regular memory tuning stuff should apply here too -- try disabling >> autoCommit and turn-off autowarming and see if it helps. >> >> On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia <[EMAIL PROTECTED]> wrote: >> >>> >>> I'm trying to load ~10 million records into Solr using the >>> DataImportHandler. >>> I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) >>> as >>> soon as I try loading more than about 5 million records. >>> >>> Here's my configuration: >>> I'm connecting to a SQL Server database using the sqljdbc driver. I've >>> given >>> my Solr instance 1.5 GB of memory. I have set the dataSource batchSize >>> to >>> 1. My SQL query is "select top XXX field1, ... from table1". I have >>> about 40 fields in my Solr schema. >>> >>> I thought the DataImportHandler would stream data from the DB rather >>> than >>> loading it all into memory at once. Is that not the case? Any thoughts >>> on >>> how to get around this (aside from getting a machine with more memory)? >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver. -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html Sent from the Solr - User mailing list archive at Nabble.com.
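For anyone hitting the same thing, the working data source definition ends up looking something like this (host, database name and credentials are placeholders); the key part is responseBuffering=adaptive in the JDBC URL, optionally combined with a batchSize setting:

  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbhost;databaseName=mydb;responseBuffering=adaptive"
              user="solr" password="secret"/>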
Re: Search query optimization
If I know that condition C will eliminate more results than either A or B, does specifying the query as: "C AND A AND B" make it any faster (than the original "A AND B AND C")? -- View this message in context: http://www.nabble.com/Search-query-optimization-tp17544667p18205504.html Sent from the Solr - User mailing list archive at Nabble.com.
"Similarity" of numbers in MoreLikeThisHandler
I have a numeric field that I'm using for getting more records like the current one. Does the MoreLikeThisHandler do numeric comparisons on numeric fields (e.g. 4 is "similar" to 5), or is it a string comparison? -- View this message in context: http://www.nabble.com/%22Similarity%22-of-numbers-in-MoreLikeThisHandler-tp1827p1827.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: "Similarity" of numbers in MoreLikeThisHandler
I stored 2 copies of a single field: one as a number, the other as a string. The MLT handler returned the same documents regardless of which of the 2 fields I used for similarity. So to answer my own question, the MoreLikeThisHandler does not do numeric comparisons on numeric fields. -- View this message in context: http://www.nabble.com/%22Similarity%22-of-numbers-in-MoreLikeThisHandler-tp1827p18285373.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: "Similarity" of numbers in MoreLikeThisHandler
I didn't realize that subsets were used to evaluate similarity. From your example, I assume that the strings: 456 and 123456 are "similar". If I store them as integers instead of strings, will Solr/Lucene still use subsets to assign similarity? -- View this message in context: http://www.nabble.com/%22Similarity%22-of-numbers-in-MoreLikeThisHandler-tp1827p18286144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: get the fields of solr
Thanks. Can I search for fields using the luke handler? I'd like to be able to say something like: solr/admin/luke?fl=a* where the '*' is a wildcard not necessarily related to dynamic fields. I will have at least a few hundred dynamic fields, so I'd rather not load all fields into memory in the UI. -- View this message in context: http://www.nabble.com/get-the-fields-of-solr-tp14431354p18350707.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler current_index_time & post-completion action
I have two questions: 1. I am pulling data from 2 data sources using the DIH. I am using the deltaQuery functionality. Since the data sources pull data sequentially, I find that some data is getting unnecessarily re-indexed from my second data source. Hopefully this helps illustrate my problem: Assume last_index_time is 0. At time = 1, pull data from data source 1 with a query that includes "last_modified > '${dataimporter.last_index_time}'". Note that this pulls data for the time interval [0,1]. This step takes 1 time interval. At time = 2, data source 2 is polled with the same query. This step takes 1 time interval. Note that this pulls data for the time interval [0,2]. At t=3, last_index_time is set to 1. Next time I run the DIH, I will be unnecessarily re-indexing data that appeared in data source 2 in the interval [1,2]. Ideally, I'd like to have access to something like ${dataimporter.current_index_time}, so I could restrict my delta query to: "last_modified > '${dataimporter.last_index_time}' AND last_modified < '${dataimporter.current_index_time}'" Is this available? 2. I have a transient table that I query with the DIH to load my index. After loading values into the index, I want to delete them from the transient table. Is there a way to do this from the DIH? I tried stuffing a delete statement into the deltaQuery attribute, but that didn't work: -- View this message in context: http://www.nabble.com/DataImportHandler-current_index_time---post-completion-action-tp18498832p18498832.html Sent from the Solr - User mailing list archive at Nabble.com.
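In other words, with a hypothetical ${dataimporter.current_index_time} variable, the second entity's delta query would become something like the following (entity and table names are placeholders, and the comparison operators are XML-escaped inside the attribute):

  <!-- current_index_time is the proposed variable, not something DIH provides today -->
  <entity name="source2" pk="id"
          deltaQuery="select id from source2_table
                      where last_modified &gt; '${dataimporter.last_index_time}'
                        and last_modified &lt;= '${dataimporter.current_index_time}'"/>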
termVectors and faceting
Does setting termVectors to true affect faceting speed on a field? I changed a field definition from: to: And I see a significant performance improvement (~6x faster). MyFacetField has ~25,000 unique values. Does it make sense that this change caused the improvement? I made several other changes to my schema, but I know that faceting on MyFacetField was by far the slowest part of my queries. Thanks. Wojtek -- View this message in context: http://www.nabble.com/termVectors-and-faceting-tp18717622p18717622.html Sent from the Solr - User mailing list archive at Nabble.com.
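For completeness, the change was along these lines (the field type and the other attributes shown here are illustrative):

  <!-- before -->
  <field name="MyFacetField" type="string" indexed="true" stored="true" multiValued="true"/>
  <!-- after -->
  <field name="MyFacetField" type="string" indexed="true" stored="true" multiValued="true" termVectors="true"/>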
Return results for suggested SpellCheck terms
I'd like to have a handler that 1) executes a query, 2) provides spelling suggestions for incorrectly spelled words, and 3) if the original query returns 0 results, returns results based on the spell check suggestions. 1 & 2 are straightforward using the SpellCheckComponent, but I can't figure out 3 without writing custom code. Can I do it with just configuration settings? -- View this message in context: http://www.nabble.com/Return-results-for-suggested-SpellCheck-terms-tp18897102p18897102.html Sent from the Solr - User mailing list archive at Nabble.com.
Duplicate Data Across Fields
I have 2 fields which will sometimes contain the same data. When they do contain the same data, am I paying the same performance cost as when they contain unique data? I think the real question here is: does Lucene index values per field, or per document? -- View this message in context: http://www.nabble.com/Duplicate-Data-Across-Fields-tp18986515p18986515.html Sent from the Solr - User mailing list archive at Nabble.com.
Faceting MoreLikeThisComponent results
When using the MoreLikeThisHandler with facets turned on, the facets show counts of things that are more like my original document. When I use the MoreLikeThisComponent, the facets show counts of things that match my original document (I'm querying by document ID), so there is only one result, and the facets are not interesting. I tried changing the order of search components (facet after mlt), but that didn't change the behavior. How can I facet the results of the MoreLikeThisComponent? -- View this message in context: http://www.nabble.com/Faceting-MoreLikeThisComponent-results-tp19206833p19206833.html Sent from the Solr - User mailing list archive at Nabble.com.
Creating dynamic fields with DataImportHandler
I have a custom row transformer that I'm using with the DataImportHandler. When I try to create a dynamic field from my transformer, it doesn't get created. If I do exactly the same thing from my dataimport handler config file, it works as expected. Has anyone experienced this? I'm using a nightly build from about 3 weeks ago. I realize there were some fixes done to the DataImportHandler since then, but if I understand them correctly, they seem unrelated to my issue (http://www.nabble.com/localsolr-and-dataimport-problems-to18849983.html#a18854923). -- View this message in context: http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-tp19226532p19226532.html Sent from the Solr - User mailing list archive at Nabble.com.
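For context, the transformer looks roughly like this (the source column and the generated field name are placeholders); the row key it adds is built at runtime and is meant to be picked up by a dynamicField pattern such as *_s in schema.xml:

  import java.util.Map;

  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  public class MyRowTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
          Object value = row.get("SOURCE_COLUMN"); // placeholder source column
          if (value != null) {
              // The field name isn't known ahead of time; it should match
              // a <dynamicField name="*_s" .../> declaration in the schema.
              row.put("attr_" + value.toString().toLowerCase() + "_s", value);
          }
          return row;
      }
  }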
Re: Creating dynamic fields with DataImportHandler
I have created SOLR-742: http://issues.apache.org/jira/browse/SOLR-742 For my case, I don't know the field name ahead of time. Shalin Shekhar Mangar wrote: > > Yes, sounds like a bug. Do you mind opening a jira issue for this? > > A simple workaround is to add the field name (if you know it beforehand) > to > your data config and use the Transformer to set the value. If you don't > know > the field name before hand then this will not work for you. > > On Sat, Aug 30, 2008 at 1:31 AM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> I have a custom row transformer that I'm using with the >> DataImportHandler. >> When I try to create a dynamic field from my transformer, it doesn't get >> created. >> >> If I do exactly the same thing from my dataimport handler config file, it >> works as expected. >> >> Has anyone experienced this? I'm using a nightly build from about 3 weeks >> ago. I realize there were some fixes done to the DataImportHandler since >> then, but if I understand them correctly, they seem unrelated to my issue >> ( >> http://www.nabble.com/localsolr-and-dataimport-problems-to18849983.html#a18854923 >> ). >> -- >> View this message in context: >> http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-tp19226532p19226532.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-tp19226532p19227919.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting MoreLikeThisComponent results
Thanks Hoss. I created SOLR 760: https://issues.apache.org/jira/browse/SOLR-760 hossman wrote: > > > : When using the MoreLikeThisHandler with facets turned on, the facets > show > : counts of things that are more like my original document. When I use the > : MoreLikeThisComponent, the facets show counts of things that match my > : original document (I'm querying by document ID), so there is only one > ... > : How can I facet the results of the MoreLikeThisComponent? > > I don't think you can at this point. The good news is MoreLikeThisHandler > isn't getting removed anytime soon. > > > What we need to do is provide more options on the componets to dictate > their behavior when deciding what to process and how to return it ... your > example could be solved be either adding an option to MLTComponent telling > it to overwrite hte main result set; or by adding an option to > FacetComponent specifying the name of a DocSet in the response to use in > it's intersections. > > I think it would be good to do both. > > (HighlightComponent should probably also have an option just like the one > i discribed for FacetComponent) > > Would you mind filing a feature request? > > > -Hoss > > > -- View this message in context: http://www.nabble.com/Faceting-MoreLikeThisComponent-results-tp19206833p19376403.html Sent from the Solr - User mailing list archive at Nabble.com.
dataimporter.last_index_time not set for full-import query
I would like to use (abuse?) the dataimporter.last_index_time variable in my full-import query, but it looks like that variable is only set when running a delta-import. My use case: I'd like to use a stored procedure to manage how data is given to the DataImportHandler so I can gracefully handle failed imports. The stored procedure would take in the last successful data import time and decide which records should be returned. I looked into using the delta-import functionality, but it didn't seem like the right fit for my need. Any thoughts? -- View this message in context: http://www.nabble.com/dataimporter.last_index_time-not-set-for-full-import-query-tp19419383p19419383.html Sent from the Solr - User mailing list archive at Nabble.com.
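The sort of full-import entity I'd like to be able to write (the stored procedure name is made up):

  <entity name="item"
          query="exec dbo.get_changed_records '${dataimporter.last_index_time}'"/>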
Re: dataimporter.last_index_time not set for full-import query
I created a JIRA issue for this and attached a patch: https://issues.apache.org/jira/browse/SOLR-768 wojtekpia wrote: > > I would like to use (abuse?) the dataimporter.last_index_time variable in > my full-import query, but it looks like that variable is only set when > running a delta-import. > > My use case: > I'd like to use a stored procedure to manage how data is given to the > DataImportHandler so I can gracefully handle failed imports. The stored > procedure would take in the last successful data import time and decide > which records should be returned. > > I looked into using the delta-import functionality, but it didn't seem > like the right fit for my need. > > Any thoughts? > -- View this message in context: http://www.nabble.com/dataimporter.last_index_time-not-set-for-full-import-query-tp19419383p19425162.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight Fragments
Make sure the fields you're trying to highlight are stored in your schema (e.g. ) David Snelling-2 wrote: > > Ok, I'm very frustrated. I've tried every configuraiton I can and > parameters > and I cannot get fragments to show up in the highlighting in solr. (no > fragments at the bottom or highlights in the text. I must be > missing something but I'm just not sure what it is. > > /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true > > And I get highlight segment, but no fragments or phrase highlighting. > > My goal - if I'm doing this completely wrong - is to get google like > snippets of text around the query term (or at mimimum to highlight the > query > term itself). > > Results: > > synopsis > true > 3 > gap > crayon > synopsis > standard > true > 2.1 > > > − > > − > ... > . > .. > > > > > > > > > > > > > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight Fragments
Try a query where you're sure to get something to highlight in one of your highlight fields, for example: /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription David Snelling-2 wrote: > > This is the configuration for the two fields I have tried on > > stored="true"/> > compressed="true"/> > > > > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Make sure the fields you're trying to highlight are stored in your schema >> (e.g. ) >> >> >> >> David Snelling-2 wrote: >> > >> > Ok, I'm very frustrated. I've tried every configuraiton I can and >> > parameters >> > and I cannot get fragments to show up in the highlighting in solr. (no >> > fragments at the bottom or highlights in the text. I must be >> > missing something but I'm just not sure what it is. >> > >> > >> /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true >> > >> > And I get highlight segment, but no fragments or phrase highlighting. >> > >> > My goal - if I'm doing this completely wrong - is to get google like >> > snippets of text around the query term (or at mimimum to highlight the >> > query >> > term itself). >> > >> > Results: >> > >> > synopsis >> > true >> > 3 >> > gap >> > crayon >> > synopsis >> > standard >> > true >> > 2.1 >> > >> > >> > − >> > >> > − >> > ... >> > . >> > .. >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > -- >> > "hic sunt dracones" >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19637261.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlight Fragments
Your fields are all of string type. String fields aren't tokenized or analyzed, so you have to match the entire text of those fields to actually get a match. Try the following: /select/?q=firstname:Kathryn&hl=on&hl.fl=firstname The reason you're seeing results with just q=students, but not q=synopsis:students is because you're copying the synopsis field into your field named 'text', which is of type 'text', which does get tokenized and analyzed, and 'text' is your default search field. The reason you don't see any highlights with the following query is because your 'text' field isn't stored. select/?q=text:students&hl=on&hl.fl=text David Snelling-2 wrote: > > Hmmm. That doesn't actually return anything which is odd because I know > it's in the field if I do a query without specifying the field. > > http://qasearch.donorschoose.org/select/?q=synopsis:students > > returns nothing > > http://qasearch.donorschoose.org/select/?q=students > > returns items with query in synopsis field. > > This may be causing issues but I'm not sure why it's not working. We use > this live and do very complex queries including facets that work fine. > > www.donorschoose.org > > > > On Tue, Sep 23, 2008 at 2:20 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Try a query where you're sure to get something to highlight in one of >> your >> highlight fields, for example: >> >> >> /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription >> >> >> >> David Snelling-2 wrote: >> > >> > This is the configuration for the two fields I have tried on >> > >> > > > stored="true"/> >> > > > compressed="true"/> >> > >> > >> > >> > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> >> wrote: >> > >> >> >> >> Make sure the fields you're trying to highlight are stored in your >> schema >> >> (e.g. ) >> >> >> >> >> >> >> >> David Snelling-2 wrote: >> >> > >> >> > Ok, I'm very frustrated. I've tried every configuraiton I can and >> >> > parameters >> >> > and I cannot get fragments to show up in the highlighting in solr. >> (no >> >> > fragments at the bottom or highlights in the text. I must >> be >> >> > missing something but I'm just not sure what it is. >> >> > >> >> > >> >> >> /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true >> >> > >> >> > And I get highlight segment, but no fragments or phrase >> highlighting. >> >> > >> >> > My goal - if I'm doing this completely wrong - is to get google like >> >> > snippets of text around the query term (or at mimimum to highlight >> the >> >> > query >> >> > term itself). >> >> > >> >> > Results: >> >> > >> >> > synopsis >> >> > true >> >> > 3 >> >> > gap >> >> > crayon >> >> > synopsis >> >> > standard >> >> > true >> >> > 2.1 >> >> > >> >> > >> >> > − >> >> > >> >> > − >> >> > ... >> >> > . >> >> > .. >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > "hic sunt dracones" >> >> > >> >> > >> >> >> >> -- >> >> View this message in context: >> >> http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> > -- >> > "hic sunt dracones" >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Highlight-Fragments-tp19636705p19637261.html >> Sent from the Solr - User mailing list archive at Nabble.com. 
>> >> > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19637801.html Sent from the Solr - User mailing list archive at Nabble.com.
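To make the string-vs-text distinction above concrete, the difference in schema.xml is just the field type (the analyzer behavior comes from the 'text' fieldType definition):

  <!-- exact, whole-value matches only; no tokenization, so no per-word highlighting -->
  <field name="synopsis" type="string" indexed="true" stored="true"/>

  <!-- tokenized and analyzed; supports word-level matching and highlighting -->
  <field name="synopsis" type="text" indexed="true" stored="true"/>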
Re: Highlight Fragments
Yes, you can use text (or some custom derivative of it) for your fields. David Snelling-2 wrote: > > Ok, thanks, that makes a lot of sense now. > So, how should I be storing the text for the synopsis or shortdescription > fields so it would be tokenized? Should it be text instead of string? > > > Thank you very much for the help by the way. > > > On Tue, Sep 23, 2008 at 2:49 PM, wojtekpia <[EMAIL PROTECTED]> wrote: > >> >> Your fields are all of string type. String fields aren't tokenized or >> analyzed, so you have to match the entire text of those fields to >> actually >> get a match. Try the following: >> >> /select/?q=firstname:Kathryn&hl=on&hl.fl=firstname >> >> The reason you're seeing results with just q=students, but not >> q=synopsis:students is because you're copying the synopsis field into >> your >> field named 'text', which is of type 'text', which does get tokenized and >> analyzed, and 'text' is your default search field. >> >> The reason you don't see any highlights with the following query is >> because >> your 'text' field isn't stored. >> >> select/?q=text:students&hl=on&hl.fl=text >> >> >> >> >> >> David Snelling-2 wrote: >> > >> > Hmmm. That doesn't actually return anything which is odd because I >> know >> > it's in the field if I do a query without specifying the field. >> > >> > http://qasearch.donorschoose.org/select/?q=synopsis:students >> > >> > returns nothing >> > >> > http://qasearch.donorschoose.org/select/?q=students >> > >> > returns items with query in synopsis field. >> > >> > This may be causing issues but I'm not sure why it's not working. We >> use >> > this live and do very complex queries including facets that work fine. >> > >> > www.donorschoose.org >> > >> > >> > >> > On Tue, Sep 23, 2008 at 2:20 PM, wojtekpia <[EMAIL PROTECTED]> >> wrote: >> > >> >> >> >> Try a query where you're sure to get something to highlight in one of >> >> your >> >> highlight fields, for example: >> >> >> >> >> >> >> /select/?qt=standard&q=synopsis:crayon&hl=true&hl.fl=synopsis,shortdescription >> >> >> >> >> >> >> >> David Snelling-2 wrote: >> >> > >> >> > This is the configuration for the two fields I have tried on >> >> > >> >> > > >> > stored="true"/> >> >> > > >> > compressed="true"/> >> >> > >> >> > >> >> > >> >> > On Tue, Sep 23, 2008 at 1:59 PM, wojtekpia <[EMAIL PROTECTED]> >> >> wrote: >> >> > >> >> >> >> >> >> Make sure the fields you're trying to highlight are stored in your >> >> schema >> >> >> (e.g. ) >> >> >> >> >> >> >> >> >> >> >> >> David Snelling-2 wrote: >> >> >> > >> >> >> > Ok, I'm very frustrated. I've tried every configuraiton I can and >> >> >> > parameters >> >> >> > and I cannot get fragments to show up in the highlighting in >> solr. >> >> (no >> >> >> > fragments at the bottom or highlights in the text. I >> must >> >> be >> >> >> > missing something but I'm just not sure what it is. >> >> >> > >> >> >> > >> >> >> >> >> >> /select/?qt=standard&q=crayon&hl=true&hl.fl=synopsis,shortdescription&hl.fragmenter=gap&hl.snippets=3&debugQuery=true >> >> >> > >> >> >> > And I get highlight segment, but no fragments or phrase >> >> highlighting. >> >> >> > >> >> >> > My goal - if I'm doing this completely wrong - is to get google >> like >> >> >> > snippets of text around the query term (or at mimimum to >> highlight >> >> the >> >> >> > query >> >> >> > term itself). 
>> >> >> > >> >> >> > Results: >> >> >> > >> >> >> > synopsis >> >> >> > true >> >> >> > 3 >> >> >> > gap >> >> >> > crayon >> >> >> > synopsis >> >> >> > standard >> >> >> > true >> >> >> > 2.1 >> >> >> > >> >> >> > >> >> >> > − >> >> >> > >> >> >> > − >> >> >> > ... >> >> >> > . >> >> >> > .. >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > -- >> >> >> > "hic sunt dracones" >> >> >> > >> >> >> > >> >> >> >> >> >> -- >> >> >> View this message in context: >> >> >> http://www.nabble.com/Highlight-Fragments-tp19636705p19636915.html >> >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> >> >> >> > >> >> > >> >> > -- >> >> > "hic sunt dracones" >> >> > >> >> > >> >> >> >> -- >> >> View this message in context: >> >> http://www.nabble.com/Highlight-Fragments-tp19636705p19637261.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> > -- >> > "hic sunt dracones" >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Highlight-Fragments-tp19636705p19637801.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > "hic sunt dracones" > > -- View this message in context: http://www.nabble.com/Highlight-Fragments-tp19636705p19638296.html Sent from the Solr - User mailing list archive at Nabble.com.
Throughput Optimization
I've been running load tests over the past week or two, and I can't figure out my system's bottleneck that prevents me from increasing throughput. First I'll describe my Solr setup, then what I've tried to optimize the system.

I have 10 million records and 59 fields (all are indexed, 37 are stored, 17 have termVectors, 33 are multi-valued) which takes about 15GB of disk space. Most field values are very short (single word or number), and usually about half the fields have any data at all. I'm running on an 8-core, 64-bit, 32GB RAM Red Hat box. I allocate about 24GB of memory to the Java process, and my filterCache size is 700,000. I'm using a version of Solr between 1.3 and the current trunk (including the latest SOLR-667 (FastLRUCache) patch), and Tomcat 6.0.

I'm running a ramp-test, increasing the number of users every few minutes. I measure the maximum number of requests that Solr can handle per second with a fixed response time, and call that my throughput. I'd like to see a single physical resource be maxed out at some point during my test so I know it is my bottleneck. I generated random queries for my dataset representing a more or less realistic scenario. The queries include faceting by up to 6 fields, and querying by up to 8 fields.

I ran a baseline on the un-optimized setup, and saw peak CPU usage of about 50%, IO usage around 5%, and negligible network traffic. Interestingly, the CPU peaked when I had 8 concurrent users, and actually dropped down to about 40% when I increased the users beyond 8. Is that because I have 8 cores?

I changed a few settings and observed the effect on throughput:
1. Increased the filterCache size, and throughput increased by about 50%, but it seems to peak.
2. Put the entire index on a RAM disk, and significantly reduced the average response time, but my throughput didn't change (i.e. even though my response time was 10X faster, the maximum number of requests I could make per second didn't increase). This makes no sense to me, unless there is another bottleneck somewhere.
3. Reduced the number of records in my index. The throughput increased, but the shape of all my graphs stayed the same, and my CPU usage was identical.

I have a few questions:
1. Can I get more than 50% CPU utilization?
2. Why does CPU utilization fall when I make more than 8 concurrent requests?
3. Is there an obvious bottleneck that I'm missing?
4. Does Tomcat have any settings that affect Solr performance?

Any input is greatly appreciated.
-- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html Sent from the Solr - User mailing list archive at Nabble.com.
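The filterCache mentioned above is declared in solrconfig.xml. A minimal sketch of what such a declaration might look like, assuming the SOLR-667 FastLRUCache patch is applied; only the size matches the figure given above, the other attributes are illustrative:

<!-- sketch only: size matches the 700,000 above, other values are illustrative -->
<filterCache
    class="solr.FastLRUCache"
    size="700000"
    initialSize="512"
    autowarmCount="0"/>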
Re: Throughput Optimization
Yes, I am seeing evictions. I've tried setting my filterCache higher, but then I start getting Out Of Memory exceptions. My filterCache hit ratio is > .99. It looks like I've hit a RAM bound here. I ran a test without faceting. The response times / throughput were both significantly higher, there were no evictions from the filter cache, but I still wasn't getting > 50% CPU utilization. Any thoughts on what physical bound I've hit in this case? Erik Hatcher wrote: > > One quick question are you seeing any evictions from your > filterCache? If so, it isn't set large enough to handle the faceting > you're doing. > > Erik > > > On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: > >> >> I've been running load tests over the past week or 2, and I can't >> figure out >> my system's bottle neck that prevents me from increasing throughput. >> First >> I'll describe my Solr setup, then what I've tried to optimize the >> system. >> >> I have 10 million records and 59 fields (all are indexed, 37 are >> stored, 17 >> have termVectors, 33 are multi-valued) which takes about 15GB of >> disk space. >> Most field values are very short (single word or number), and >> usually about >> half the fields have any data at all. I'm running on an 8-core, 64- >> bit, 32GB >> RAM Redhat box. I allocate about 24GB of memory to the java process, >> and my >> filterCache size is 700,000. I'm using a version of Solr between 1.3 >> and the >> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >> and >> Tomcat 6.0. >> >> I'm running a ramp-test, increasing the number of users every few >> minutes. I >> measure the maximum number of requests that Solr can handle per >> second with >> a fixed response time, and call that my throughput. I'd like to see >> a single >> physical resource be maxed out at some point during my test so I >> know it is >> my bottle neck. I generated random queries for my dataset >> representing a >> more or less realistic scenario. The queries include faceting by up >> to 6 >> fields, and quering by up to 8 fields. >> >> I ran a baseline on the un-optimized setup, and saw peak CPU usage >> of about >> 50%, IO usage around 5%, and negligible network traffic. >> Interestingly, the >> CPU peaked when I had 8 concurrent users, and actually dropped down >> to about >> 40% when I increased the users beyond 8. Is that because I have 8 >> cores? >> >> I changed a few settings and observed the effect on throughput: >> >> 1. Increased filterCache size, and throughput increased by about >> 50%, but it >> seems to peak. >> 2. Put the entire index on a RAM disk, and significantly reduced the >> average >> response time, but my throughput didn't change (i.e. even though my >> response >> time was 10X faster, the maximum number of requests I could make per >> second >> didn't increase). This makes no sense to me, unless there is another >> bottle >> neck somewhere. >> 3. Reduced the number of records in my index. The throughput >> increased, but >> the shape of all my graphs stayed the same, and my CPU usage was >> identical. >> >> I have a few questions: >> 1. Can I get more than 50% CPU utilization? >> 2. Why does CPU utilization fall when I make more than 8 concurrent >> requests? >> 3. Is there an obvious bottleneck that I'm missing? >> 4. Does Tomcat have any settings that affect Solr performance? >> >> Any input is greatly appreciated. 
>> >> -- >> View this message in context: >> http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html >> Sent from the Solr - User mailing list archive at Nabble.com. > > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Throughput Optimization
Where is the alt directory in the source tree (or what is the JIRA issue number)? I'd like to apply this patch and re-run my tests. Does changing the lockType in solrconfig.xml address this issue? (My lockType is the default - single). markrmiller wrote: > > The latest alt directory patch uses It. > > - Mark > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20345965.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Throughput Optimization
My documentCache hit rate is ~.7, and my queryCache is ~.03. I'm using FastLRUCache on all 3 of the caches. Feak, Todd wrote: > > What are your other cache hit rates looking like? > Which caches are you using the FastLRUCache on? > > -Todd Feak > > -Original Message- > From: wojtekpia [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 05, 2008 8:15 AM > To: solr-user@lucene.apache.org > Subject: Re: Throughput Optimization > > > Yes, I am seeing evictions. I've tried setting my filterCache higher, > but > then I start getting Out Of Memory exceptions. My filterCache hit ratio > is > > .99. It looks like I've hit a RAM bound here. > > I ran a test without faceting. The response times / throughput were both > significantly higher, there were no evictions from the filter cache, but > I > still wasn't getting > 50% CPU utilization. Any thoughts on what > physical > bound I've hit in this case? > > > > Erik Hatcher wrote: >> >> One quick question are you seeing any evictions from your >> filterCache? If so, it isn't set large enough to handle the faceting > >> you're doing. >> >> Erik >> >> >> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote: >> >>> >>> I've been running load tests over the past week or 2, and I can't >>> figure out >>> my system's bottle neck that prevents me from increasing throughput. > >>> First >>> I'll describe my Solr setup, then what I've tried to optimize the >>> system. >>> >>> I have 10 million records and 59 fields (all are indexed, 37 are >>> stored, 17 >>> have termVectors, 33 are multi-valued) which takes about 15GB of >>> disk space. >>> Most field values are very short (single word or number), and >>> usually about >>> half the fields have any data at all. I'm running on an 8-core, 64- >>> bit, 32GB >>> RAM Redhat box. I allocate about 24GB of memory to the java process, > >>> and my >>> filterCache size is 700,000. I'm using a version of Solr between 1.3 > >>> and the >>> current trunk (including the latest SOLR-667 (FastLRUCache) patch), >>> and >>> Tomcat 6.0. >>> >>> I'm running a ramp-test, increasing the number of users every few >>> minutes. I >>> measure the maximum number of requests that Solr can handle per >>> second with >>> a fixed response time, and call that my throughput. I'd like to see >>> a single >>> physical resource be maxed out at some point during my test so I >>> know it is >>> my bottle neck. I generated random queries for my dataset >>> representing a >>> more or less realistic scenario. The queries include faceting by up >>> to 6 >>> fields, and quering by up to 8 fields. >>> >>> I ran a baseline on the un-optimized setup, and saw peak CPU usage >>> of about >>> 50%, IO usage around 5%, and negligible network traffic. >>> Interestingly, the >>> CPU peaked when I had 8 concurrent users, and actually dropped down >>> to about >>> 40% when I increased the users beyond 8. Is that because I have 8 >>> cores? >>> >>> I changed a few settings and observed the effect on throughput: >>> >>> 1. Increased filterCache size, and throughput increased by about >>> 50%, but it >>> seems to peak. >>> 2. Put the entire index on a RAM disk, and significantly reduced the > >>> average >>> response time, but my throughput didn't change (i.e. even though my >>> response >>> time was 10X faster, the maximum number of requests I could make per > >>> second >>> didn't increase). This makes no sense to me, unless there is another > >>> bottle >>> neck somewhere. >>> 3. Reduced the number of records in my index. 
The throughput >>> increased, but >>> the shape of all my graphs stayed the same, and my CPU usage was >>> identical. >>> >>> I have a few questions: >>> 1. Can I get more than 50% CPU utilization? >>> 2. Why does CPU utilization fall when I make more than 8 concurrent >>> requests? >>> 3. Is there an obvious bottleneck that I'm missing? >>> 4. Does Tomcat have any settings that affect Solr performance? >>> >>> Any input is greatly appreciated. >>> >>> -- >>> View this message in context: >>> > http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> > > -- > View this message in context: > http://www.nabble.com/Throughput-Optimization-tp20335132p20343425.html > Sent from the Solr - User mailing list archive at Nabble.com. > > > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20346663.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Throughput Optimization
I'll try changing my other caches to LRUCache and observe performance. Interestingly, the FastLRUCache has given me a ~10% increase in performance, much lower than I've read on the SOLR-667 thread. Would compressing some of my stored fields significantly improve performance? Most of my stored fields contain single words or numbers, but I do have one relatively large stored field that contains up to a couple paragraphs of text. I agree that my 3% query cache hit rate is quite low (probably unrealistically low). I'm treating these results as the worst-case. Feak, Todd wrote: > > Yonik said something about the FastLRUCache giving the most gain for > high hit-rates and the LRUCache being faster for low hit-rates. It's in > his Nov 1 comment on SOLR-667. I'm not sure if anything changed since > then, as it's an active issue, but you may want to try the LRUCache for > your query cache. > > It sounds like you are memory bound already, but you may want to > investigate the tradeoffs of your filter cache vs. document cache. High > document hit-rate was a big performance boost for us, as document > garbage collection is a lot of overhead. I believe that would show up as > CPU usage though, so it may not be your bottleneck. > > This also brings up an interesting question. 3% hit rate on your query > cache seems low to me. Are you sure your load test is mimicking > realistic query patterns from your user base? I realize this probably > isn't part of your bottleneck, just curious. > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20348749.html Sent from the Solr - User mailing list archive at Nabble.com.
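On the compression question: in schema.xml of this Solr vintage a stored field can be flagged as compressed. A hedged sketch only — the field name is made up, and whether this actually helps throughput is exactly what is being asked above:

<!-- hypothetical large stored field; compressed="true" trades CPU time for smaller stored data -->
<field name="longDescription" type="text" indexed="true" stored="true" compressed="true"/>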
Re: Throughput Optimization
I'd like to integrate this improvement into my deployment. Is it just a matter of getting the latest Lucene jars (Lucene nightly build)? Yonik Seeley wrote: > > You're probably hitting some contention with the locking around the > reading of index files... this has been recently improved in Lucene > for non-Windows boxes, and we're integrating that into Solr (should > def be in the next release). > > -Yonik > > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20349247.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case. Yonik Seeley wrote: > > And if you want to verify that the new faceting code has indeed kicked > in, some statistics are logged, like: > > Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert > INFO: UnInverted multi-valued field features, memSize=14584, time=47, > phase1=47, > nTerms=285, bigTerms=99, termInstances=186 > > -Yonik > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20797812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks) Noble Paul നോബിള് नोब्ळ् wrote: > > wojtek, you can report back the numbers if possible > > It would be nice to know how the new impl performs in real-world > > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20798456.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Throughput Optimization
It looks like file locking was the bottleneck - CPU usage is up to ~98% (from the previous peak of ~50%). I'm running the trunk code from Dec 2 with the faceting improvement (SOLR-475) turned off. Thanks for all the help! Yonik Seeley wrote: > > FYI, SOLR-465 has been committed. Let us know if it improves your > scenario. > > -Yonik > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20840017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
I'm seeing some strange behavior with my garbage collector that disappears when I turn off this optimization. I'm running load tests on my deployment. For the first few minutes, everything is fine (and this patch does make things faster - I haven't quantified the improvement yet). After that, the garbage collector stops collecting. Specifically, the new generation part of the heap is full, but never garbage collected, and the old generation is emptied, then never gets anything more. This throttles Solr performance (average response times that used to be ~500ms are now ~25s). I described my deployment scenario in an earlier post: http://www.nabble.com/Throughput-Optimization-td20335132.html Does it sound like the new faceting algorithm could be the culprit? wojtekpia wrote: > > Definitely, but it'll take me a few days. I'll also report findings on > SOLR-465. (I've been on holiday for a few weeks) > > > Noble Paul നോബിള് नोब्ळ् wrote: >> >> wojtek, you can report back the numbers if possible >> >> It would be nice to know how the new impl performs in real-world >> >> >> > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20840622.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Throughput Optimization
The new faceting stuff is off because I'm encountering some problems when I turn it on. I posted the details: http://www.nabble.com/new-faceting-algorithm-td20674902.html#a20840622 Yonik Seeley wrote: > > On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia <[EMAIL PROTECTED]> wrote: >> It looks like file locking was the bottleneck - CPU usage is up to ~98% >> (from >> the previous peak of ~50%). > > Great to hear it! > >> I'm running the trunk code from Dec 2 with the >> faceting improvement (SOLR-475) turned off. Thanks for all the help! > > new faceting stuff off because it didn't improve things in your case, > or because you didn't want to change that variable just now? > > -Yonik > -- View this message in context: http://www.nabble.com/Throughput-Optimization-tp20335132p20840668.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
Yonik Seeley wrote: > > > Are you doing commits at any time? > One possibility is the caching mechanism (weak-ref on the > IndexReader)... that's going to be changing soon hopefully. > > -Yonik > No commits during this test. Should I start looking into my heap size distribution and garbage collector selection? -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20841219.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: NIO not working yet
I've updated my deployment to use NIOFSDirectory. Now I'd like to confirm some previous results with the original FSDirectory. Can I turn it off with a parameter? I tried: java -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.FSDirectory ... but that didn't work. -- View this message in context: http://www.nabble.com/NIO-not-working-yet-tp20468152p20845732.html Sent from the Solr - User mailing list archive at Nabble.com.
Smaller filterCache giving better performance
I've seen some strange results in the last few days of testing, but this one flies in the face of everything I've read on this forum: reducing the filterCache size has increased performance. I have posted my setup here: http://www.nabble.com/Throughput-Optimization-td20335132.html. My original filterCache was 700,000. Reducing it to 20,000, I found:
- Average response time decreased by 85%
- Average throughput increased by 250%
- CPU time used by the garbage collector decreased by 85%
- The system showed none of the weird GC issues I reported yesterday at: http://www.nabble.com/new-faceting-algorithm-td20674902.html
Further reducing the filterCache to 10,000:
- Average response time decreased by another 27%
- Average throughput increased by another 30%
- GC CPU usage also dropped
- System behavior changed after ~30 minutes, with a slight performance degradation
These results came from a load test. I'm running trunk code from Dec 2 with Yonik's faceting improvement turned on. Any thoughts?
-- View this message in context: http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20863674.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Smaller filterCache giving better performance
Reducing the amount of memory given to java slowed down Solr at first, then quickly caused the garbage collector to behave badly (same issue as I referenced above). I am using the concurrent cache for all my caches. -- View this message in context: http://www.nabble.com/Smaller-filterCache-giving-better-performance-tp20863674p20864928.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
It looks like my filterCache was too big. I reduced my filterCache size from 700,000 to 20,000 (without changing the heap size) and all my performance issues went away. I experimented with various GC settings, but none of them made a significant difference. I see a 16% increase in throughput by applying this patch. Yonik Seeley wrote: > > ... This can be a big chunk of memory > per-request, and is most likely what changed your GC profile (i.e. > changing the GC settings may help). > > -- View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20984502.html Sent from the Solr - User mailing list archive at Nabble.com.
Snapinstaller vs Solr Restart
I'm running load tests against my Solr instance. I find that it typically takes ~10 minutes for my Solr setup to "warm up" while I throw my test queries at it. Also, I have the same two warm-up queries specified for the firstSearcher and newSearcher event listeners. I'm now benchmarking the effect of updating an index under load. I'm finding that after running snapinstaller, Solr takes ~1 hour to get back to the same performance numbers I was getting 10 minutes after a restart. If I can justify being offline for a few moments, it seems like I'll be better off restarting Solr rather than running snapinstaller. Any ideas why? Thanks. -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315273.html Sent from the Solr - User mailing list archive at Nabble.com.
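For reference, the firstSearcher/newSearcher warm-up queries mentioned above live in solrconfig.xml as QuerySenderListener entries. A minimal sketch with placeholder queries rather than the two actually used here:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">warmup query one</str></lst>
    <lst><str name="q">warmup query two</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">warmup query one</str></lst>
    <lst><str name="q">warmup query two</str></lst>
  </arr>
</listener>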
RE: Snapinstaller vs Solr Restart
Sorry, I forgot to include that. All my autowarmCount values are set to 0. Feak, Todd wrote: > > First suspect would be Filter Cache settings and Query Cache settings. > > If they are auto-warming at all, then there is a definite difference > between the first start behavior and the post-commit behavior. This > affects what's in memory, caches, etc. > > -Todd Feak > > > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21315654.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Snapinstaller vs Solr Restart
I use my warm-up queries to fill the field cache (or at least that's the idea). My filterCache hit rate is ~99% and my queryResultCache is ~65%. I update my index several times a day with no 'optimize', and performance is seamless. I also update my index once nightly with an 'optimize', and that's where I see the performance drop. I'll try turning autowarming on. Could this have to do with file caching by the OS? Otis Gospodnetic wrote: > > Is autowarm count of 0 a good idea, though? > If you don't want to autowarm any caches, doesn't that imply that you have > very low hit rate and therefore don't care to autowarm? And if you have a > very low hit rate, then perhaps caches are not needed at all? > > > How about this. Do you optimize your index at any point? > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21319344.html Sent from the Solr - User mailing list archive at Nabble.com.
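Turning autowarming on, as mentioned above, just means giving the caches a non-zero autowarmCount in solrconfig.xml. A sketch with illustrative sizes and counts, not the values actually used in this deployment:

<filterCache class="solr.FastLRUCache" size="20000" initialSize="512" autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="10000" initialSize="512" autowarmCount="1024"/>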
Re: Snapinstaller vs Solr Restart
I'm optimizing because I thought I should. I'll be updating my index somewhere between every 15 minutes, and every 2 hours. That means between 12 and 96 updates per day. That seems like a lot of index files (and it scared me a little), so that's my second reason for wanting to optimize nightly. I haven't benchmarked the performance hit for not optimizing. That'll be my next step. If the hit isn't too bad, I'll look into optimizing less frequently (weekly, ...). Thanks Otis! Otis Gospodnetic wrote: > > OK, so that question/answer seems to have hit the nail on the head. :) > > When you optimize your index, all index files get rewritten. This means > that everything that the OS cached up to that point goes out the window > and the OS has to slowly re-cache the hot parts of the index. If you > don't optimize, this won't happen. Do you really need to optimize? Or > maybe a more direct question: why are you optimizing? > > > Regarding autowarming, with such high fq hit rate, I'd make good use of fq > autowarming. The result cache rate is lower, but still decent. I > wouldn't turn off autowarming the way you have. > > -- View this message in context: http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html Sent from the Solr - User mailing list archive at Nabble.com.
Overlapping Replication Scripts
I have set up cron jobs that update my index every 15 minutes. I have a distributed setup, so the steps are: 1. Update index on indexer machine (and possibly optimize) 2. Invoke snapshooter on indexer 3. Invoke snappuller on searcher 4. Invoke snapinstaller on searcher. These updates are small, don't optimize, and take at most 2 minutes (end to end). Nightly (or weekly or monthly), I will want to optimize my index. When I optimize, the end-to-end time jumps to ~1 hour. With my current cron job setup, that means I'll trigger 3 more unoptimized updates before the optimized update completes. What happens if I overlap the execution of my cron jobs? Do any of these scripts detect that another instance is already executing? Thanks. -- View this message in context: http://www.nabble.com/Overlapping-Replication-Scripts-tp21362434p21362434.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance Hit for Zero Record Dataimport
I have a transient SQL table that I use to load data into Solr using the DataImportHandler. I run an update every 15 minutes (dataimport?command=full-import&clean=false&optimize=false), but my table will frequently have no new data for me to import. When the table contains no data, it looks like Solr is doing a lot more work than it needs to. The performance degradation is the same for loading zero records as it is for loading a couple thousand records (while the system is under heavy load). I noticed that when no data is imported, no new index files are created, so it seems like something (Lucene?) is aware of the empty update. But since the performance degradation is the same, I'm guessing that a new Searcher is still created, warmed, and registered. Is that correct? -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21572935.html Sent from the Solr - User mailing list archive at Nabble.com.
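For context, the 15-minute update above is just an HTTP call against the DataImportHandler, which is registered in solrconfig.xml roughly as below. The config file name is the conventional one, so treat this as a sketch rather than the actual setup:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>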
Re: Performance Hit for Zero Record Dataimport
Thanks Shalin, a short circuit would definitely solve it. Should I open a JIRA issue? Shalin Shekhar Mangar wrote: > > I guess Data Import Handler still calls commit even if there were no > documents created. We can add a short circuit in the code to make sure > that > does not happen. > -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21588124.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance "dead-zone" due to garbage collection
I'm intermittently experiencing severe performance drops due to Java garbage collection. I'm allocating a lot of RAM to my Java process (27GB of the 32GB physically available). Under heavy load, the performance drops approximately every 10 minutes, and the drop lasts for 30-40 seconds. This coincides with the size of the old generation heap dropping from ~27GB to ~6GB. Is there a way to reduce the impact of garbage collection? A couple ideas we've come up with (but haven't tried yet) are: increasing the minimum heap size, more frequent (but hopefully less costly) garbage collection. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21588427.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance Hit for Zero Record Dataimport
Created SOLR-974: https://issues.apache.org/jira/browse/SOLR-974 -- View this message in context: http://www.nabble.com/Performance-Hit-for-Zero-Record-Dataimport-tp21572935p21588634.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I'm using a recent version of Sun's JVM (6 update 7) and am using the concurrent generational collector. I've tried several other collectors, none seemed to help the situation. I've tried reducing my heap allocation. The search performance got worse as I reduced the heap. I didn't monitor the garbage collector in those tests, but I imagine that it would've gotten better. (As a side note, I do lots of faceting and sorting, I have 10M records in this index, with an approximate index file size of 10GB). This index is on a single machine, in a single Solr core. Would splitting it across multiple Solr cores on a single machine help? I'd like to find the limit of this machine before spreading the data to more machines. Thanks, Wojtek -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21590150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
(Thanks for the responses) My filterCache hit rate is ~60% (so I'll try making it bigger), and I am CPU bound. How do I measure the size of my per-request garbage? Is it (total heap size before collection - total heap size after collection) / # of requests to cause a collection? I'll try your suggestions and post back any useful results. -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21593661.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Intermittent high response times
I'm experiencing similar issues. Mine seem to be related to old generation garbage collection. Can you monitor your garbage collection activity? (I'm using JConsole to monitor it: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html). In my system, garbage collection usually doesn't cause any trouble. But once in a while, the size of the old generation flat-lines for some time (~dozens of seconds). When this happens, I see really bad response times from Solr (not quite as bad as you're seeing, but almost). The old-gen flat-lines always seem to be right before, or right after the old-gen is garbage collected. -- View this message in context: http://www.nabble.com/Intermittent-high-response-times-tp21602475p21608986.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I'm not sure if you suggested it, but I'd like to try the IBM JVM. Aside from setting my JRE paths, is there anything else I need to do run inside the IBM JVM? (e.g. re-compiling?) Walter Underwood wrote: > > What JVM and garbage collector setting? We are using the IBM JVM with > their concurrent generational collector. I would strongly recommend > trying a similar collector on your JVM. Hint: how much memory is in > use after a full GC? That is a good approximation to the working set. > > -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21616078.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Intermittent high response times
The type of garbage collector definitely affects performance, but there are other settings as well. There's a related thread currently discussing this: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-td21588427.html hbi dev wrote: > > Hi wojtekpia, > > That's interesting, I shall be looking into this over the weekend so I > shall > look at the GC also. I was briefly reading about GC last night, am I right > in thinking it could be affected by what version of the jvm I'm using > (1.5.0.8), and also what type of Collector is set? What collector is the > default, and what would people recommend for an application like Solr? > Thanks > Waseem > -- View this message in context: http://www.nabble.com/Intermittent-high-response-times-tp21602475p21628769.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Performance "dead-zone" due to garbage collection
I profiled our application, and GC is definitely the problem. The IBM JVM didn't change much. I'm currently looking into ways of reducing my memory footprint. -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21758001.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr on Sun Java Real-Time System
Has anyone tried Solr on the Sun Java Real-Time JVM (http://java.sun.com/javase/technologies/realtime/index.jsp)? I've read that it includes better control over the garbage collector. Thanks. Wojtek -- View this message in context: http://www.nabble.com/Solr-on-Sun-Java-Real-Time-System-tp21758035p21758035.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I noticed your wiki post about sorting with a function query instead of the Lucene sort mechanism. Did you see a significantly reduced memory footprint by doing this? Did you reduce the number of fields you allowed users to sort by? Lance Norskog-2 wrote: > > Sorting creates a large array with "roughly" an entry for every document > in > the index. If it is not on an 'integer' field it takes even more memory. > If > you do a sorted request and then don't sort for a while, that will drop > the > sort structures and trigger a giant GC. > > We went through some serious craziness with sorting. > -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21814038.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom Sorting Algorithm
Is there an easy way to choose/create an alternate sorting algorithm? I'm frequently dealing with large result sets (a few million results) and I might be able to benefit from domain knowledge in my sort. -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21837721.html Sent from the Solr - User mailing list archive at Nabble.com.
Queued Requests during GC
During full garbage collection, Solr doesn't acknowledge incoming requests. Any requests that were received during the GC are timestamped the moment GC finishes (at least that's what my logs show). Is there a limit to how many requests can queue up during a full GC? This doesn't seem like a Solr setting, but rather a container/OS setting (I'm using Tomcat on Linux). Thanks. Wojtek -- View this message in context: http://www.nabble.com/Queued-Requests-during-GC-tp21837898p21837898.html Sent from the Solr - User mailing list archive at Nabble.com.
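The queueing described above happens in the servlet container rather than in Solr. In Tomcat 6 the relevant knobs sit on the HTTP connector in server.xml; the values below are illustrative defaults, not this deployment's actual settings. acceptCount is the backlog of connections the OS will queue once all maxThreads worker threads are busy (for example, during a full GC pause):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="200"
           acceptCount="100"/>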
Re: Custom Sorting Algorithm
That's not quite what I meant. I'm not looking for a custom comparator, I'm looking for a custom sorting algorithm. Is there a way to use quick sort or merge sort or... rather than the current algorithm? Also, what is the current algorithm? Otis Gospodnetic wrote: > > > You can use one of the exiting function queries (if they fit your need) or > write a custom function query to reorder the results of a query. > > -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21838804.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Sorting Algorithm
Ok, so maybe a better question is: should I bother trying to change the "sorting" algorithm? I'm concerned that with large data sets, sorting becomes a severe bottleneck (this is an assumption, I haven't profiled anything to verify). Does it become a severe bottleneck? Do you know if alternate sort algorithms have been tried during Lucene development? markrmiller wrote: > > It would not be simple to use a new algorithm. The current > implementation takes place at the Lucene level and uses a priority > queue. When you ask for the top n results, a priority queue of size n is > filled with all of the matching documents. The ordering in the priority > queue is the sort. The non-Sort method orders by relevance score - the > Sort method orders by field, relevance, or doc id. > -- View this message in context: http://www.nabble.com/Custom-Sorting-Algorithm-tp21837721p21840299.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I've been able to reduce these GC outages by: 1) Optimizing my schema. This reduced my index size by more than 50% 2) Smaller cache sizes. I started with filterCache, documentCache & queryCache sizes of ~10,000. They're now at ~500 3) Reduce heap allocation. I started at 27 GB, now I'm 'only' allocating 8 GB 4) Update to trunk (was using Dec 2/08 code, now using Jan 26/09) I still see outages due to garbage collection every ~10 minutes, but they last ~2 seconds (instead of 20+ seconds). Note that my throughput dropped from ~30 hits/second to ~23 hits/second. Luckily, I'm still hitting my performance requirements, so I'm able to accept that. Thanks for the tips! Wojtek yonik wrote: > > On Tue, Feb 3, 2009 at 11:58 AM, wojtekpia wrote: >> I noticed your wiki post about sorting with a function query instead of >> the >> Lucene sort mechanism. Did you see a significantly reduced memory >> footprint >> by doing this? > > FunctionQuery derives field values from the FieldCache... so it would > use the same amount of memory as sorting. > > -Yonik > > -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21922773.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance "dead-zone" due to garbage collection
I tried sorting using a function query instead of the Lucene sort and found no change in performance. I wonder if Lance's results are related to something specific to his deployment? -- View this message in context: http://www.nabble.com/Performance-%22dead-zone%22-due-to-garbage-collection-tp21588427p21922851.html Sent from the Solr - User mailing list archive at Nabble.com.
Performance degradation caused by choice of range fields
In my schema I have two copies of my numeric fields: one with the original value (used for display, sort), and one with a rounded version of the original value (used for range queries). When I use my rounded field for numeric range queries (e.g. q=RoundedValue:[100 TO 1000]), I see very consistent results under load. My hit rate stays the same (at ~23 hits/sec) throughout long running load tests. When I use my original field for range queries, I get performance degradation over time (while under load), rather than consistently worse throughput. For the first 15 minutes, I see throughput similar to my throughput with rounded values, of about 23 hits/second. For the next 15 minutes, I'm down to about 20 hits/second. For the next 15 minutes, I'm down to about 18 hits/second, etc. I expected worse performance by using the non-rounded original value, but I didn't expect degradation. I expected to see throughput of X < 23 hits/second, but consistent at all times. I don't understand why my performance gets worse over time. Any ideas why? I have ~1000 unique values in my rounded field, and ~ 100,000 unique values in my un-rounded field. Thanks. Wojtek -- View this message in context: http://www.nabble.com/Performance-degradation-caused-by-choice-of-range-fields-tp21924197p21924197.html Sent from the Solr - User mailing list archive at Nabble.com.
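To make the setup concrete: the "two copies" are simply two declared fields, with the rounding done at feed time (a copyField alone wouldn't round anything). A sketch with hypothetical field names, using the sortable numeric types from the example schema of that era:

<!-- original value: display and sort -->
<field name="Price" type="sfloat" indexed="true" stored="true"/>
<!-- rounded value: range queries, ~1,000 unique terms instead of ~100,000 -->
<field name="RoundedPrice" type="sfloat" indexed="true" stored="false"/>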
Recent Paging Change?
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application). Thanks. Wojtek -- View this message in context: http://www.nabble.com/Recent-Paging-Change--tp21946610p21946610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recent Paging Change?
I'll run a profiler on new and old code and let you know what I find. I have changed my schema between tests: I used to have termVectors turned on for several fields, and now they are always off. My underlying data has not changed. -- View this message in context: http://www.nabble.com/Recent-Paging-Change--tp21946610p21958267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance degradation caused by choice of range fields
Yes, I commit roughly every 15 minutes (via a data update). This update is consistent between my tests, and only causes a performance drop when I'm sorting on fields with many unique values. I've examined my GC logs, and they are also consistent between my tests. Otis Gospodnetic wrote: > > Hi, > > Did you commit (reopen the searcher) during the performance degradation > period and did any of your queries use sort? If so, perhaps your JVM is > accumulating those thrown-away FieldCache objects and then GC has more and > more garbage to clean up, causing pauses and lowering your overall > throughput. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > -- View this message in context: http://www.nabble.com/Performance-degradation-caused-by-choice-of-range-fields-tp21924197p21958268.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recent Paging Change?
This was a false alarm, sorry. I misinterpreted some results. wojtekpia wrote: > > Has there been a recent change (since Dec 2/08) in the paging algorithm? > I'm seeing much worse performance (75% drop in throughput) when I request > 20 records starting at record 180 (page 10 in my application). > > Edit: the 75% drop is compared to my throughput for page 10 queries using > Dec 2/08 code. > > Thanks. > > Wojtek > -- View this message in context: http://www.nabble.com/Recent-Paging-Change--tp21946610p21969121.html Sent from the Solr - User mailing list archive at Nabble.com.
Reading Core-Specific Config File in a Row Transformer
I'm using the DataImportHandler to load data. I created a custom row transformer, and inside of it I'm reading a configuration file. I am using the system's solr.solr.home property to figure out which directory the file should be in. That works for a single-core deployment, but not for multi-core deployments (since I'm always looking in solr.solr.home/conf/file.txt). Is there a clean way to resolve the actual conf directory path from within a custom row transformer so that it works for both single-core and multi-core deployments? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Reading-Core-Specific-Config-File-in-a-Row-Transformer-tp22069449p22069449.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reading Core-Specific Config File in a Row Transformer
Thanks Shalin. I think you missed the call to .getResourceLoader(), so it should be: context.getSolrCore().getResourceLoader().getInstanceDir() Works great, thanks! Shalin Shekhar Mangar wrote: > > > You can use Context.getSolrCore().getInstanceDir() > > -- View this message in context: http://www.nabble.com/Reading-Core-Specific-Config-File-in-a-Row-Transformer-tp22069449p22086846.html Sent from the Solr - User mailing list archive at Nabble.com.
Redhat vs FreeBSD vs other unix flavors
Is there a recommended unix flavor for deploying Solr on? I've benchmarked my deployment on Red Hat. Our operations team asked if we can use FreeBSD instead. Assuming that my benchmark numbers are consistent on FreeBSD, is there anything else I should watch out for? Thanks. Wojtek -- View this message in context: http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Redhat vs FreeBSD vs other unix flavors
Thanks Otis. Do you know what the most common deployment OS is? I couldn't find much on the mailing list or http://wiki.apache.org/solr/PublicServers Otis Gospodnetic wrote: > > > You should be fine on either Linux or FreeBSD (or any other UNIX flavour). > Running on Solaris would probably give you access to goodness like dtrace, > but you can live without it. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > -- View this message in context: http://www.nabble.com/Redhat-vs-FreeBSD-vs-other-unix-flavors-tp22251134p22251260.html Sent from the Solr - User mailing list archive at Nabble.com.
JVM exception_access_violation
I'm running Solr on Tomcat 6.0.18 with Java 6 update 7 on Windows 2003 64 bit. Over the past month or so, my JVM has crashed twice with the error below. Has anyone experienced this? My system is not heavily loaded, and the crash seems to coincide with an update (via DIH). I'm running trunk code from late January. Note that I update my index ~50 times per day, and this crash has happened twice in the past month (so 2 of 1500 updates seem to have triggered the crash). This Windows deployment is for demos, so I'm not too concerned about it. Interestingly, my production deployment is on a 64 bit Linux system (same versions of everything) and I haven't been able to reproduce the bug there. # # An unexpected error has been detected by Java Runtime Environment: # # EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x080e51c3, pid=4404, tid=956 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b23 mixed mode windows-amd64) # Problematic frame: # V [jvm.dll+0xe51c3] # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x01de2000): GCTaskThread [stack: 0x,0x] [id=956] siginfo: ExceptionCode=0xc005, reading address 0x Registers: EAX=0x3000, EBX=0x01e40330, ECX=0x000184b49821, EDX=0x000184b4b580 ESP=0x07cff9b0, EBP=0x, ESI=0x000184b4b580, EDI=0x0935 EIP=0x080e51c3, EFLAGS=0x00010206 Top of Stack: (sp=0x07cff9b0) 0x07cff9b0: 01e40330 0x07cff9c0: 000184b4dd88 0935 0x07cff9d0: 08464b08 01dbbdc0 0x07cff9e0: 01dbf190 8a65 0x07cff9f0: 2f5b4000 0002015f 0x07cffa00: 0002 01dbf2f0 0x07cffa10: 01e40330 01dbf430 0x07cffa20: 01dbf4f0 000201602d18 0x07cffa30: 07effa00 07cffb40 0x07cffa40: 0x07cffa50: 0830484d 0x07cffa60: 0002015f 0002 0x07cffa70: 0048 0001 0x07cffa80: 0001 00bb8501 0x07cffa90: 01dbf378 080ea807 0x07cffaa0: 07cffb40 07cffb40 Instructions: (pc=0x080e51c3) 0x080e51b3: 4c 8d 44 24 20 48 8b d6 48 8b 41 10 48 83 c1 10 0x080e51c3: ff 90 c0 01 00 00 44 8b 1d 08 f2 44 00 45 85 db Stack: [0x,0x], sp=0x07cff9b0, free space=127998k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [jvm.dll+0xe51c3] [error occurred during error reporting (printing native stack), id 0xc005] --- P R O C E S S --- Java Threads: ( => current thread ) 0x10286c00 JavaThread "Thread-135" daemon [_thread_blocked, id=4892, stack(0x1169,0x1179)] 0x10285400 JavaThread "http-8084-10" daemon [_thread_blocked, id=5108, stack(0x1201,0x1211)] 0x10287400 JavaThread "http-8084-9" daemon [_thread_blocked, id=1772, stack(0x149a,0x14aa)] 0x1028a400 JavaThread "http-8084-8" daemon [_thread_blocked, id=1656, stack(0x11f1,0x1201)] 0x01dc2c00 JavaThread "http-8084-7" daemon [_thread_blocked, id=2056, stack(0x11e1,0x11f1)] 0x10288400 JavaThread "http-8084-6" daemon [_thread_blocked, id=4792, stack(0x11d1,0x11e1)] 0x10286800 JavaThread "MultiThreadedHttpConnectionManager cleanup" daemon [_thread_blocked, id=3792, stack(0x1251,0x1261)] 0x0f6e8400 JavaThread "http-8084-5" daemon [_thread_blocked, id=3540, stack(0x11c1,0x11d1)] 0x0f6e7800 JavaThread "http-8084-4" daemon [_thread_blocked, id=4048, stack(0x11b1,0x11c1)] 0x0f6e8000 JavaThread "http-8084-3" daemon [_thread_blocked, id=1932, stack(0x1159,0x1169)] 0x0f6e7000 JavaThread "http-8084-2" daemon [_thread_blocked, id=996, stack(0x1149,0x1159)] 0x01dc6000 JavaThread "http-8084-1" daemon [_thread_blocked, id=4924, stack(0x1139,0x1149)] 0x01dc5800 JavaThread "TP-Monitor" daemon [_thread_blocked, id=2288, stack(0x1121,0x1131)] 0x01dc5400 JavaThread "TP-Processor4" daemon 
[_thread_in_native, id=4588, stack(0x,0x1121)] 0x01dc4c00 JavaThread "TP-Processor3" daemon [_thread_blocked, id=652, stack(0x1101,0x)] 0x01dc4400
Sorting by 'starts with'
I have an index of product names. I'd like to sort results so that entries starting with the user query come first. E.g. for q=kitchen, results would sort something like:
1. kitchen appliance
2. kitchenaid dishwasher
3. fridge for kitchen
It looks like using the query() function query comes close, but I don't know how to write a subquery that only matches if the value starts with the query string. Has anyone solved a similar need? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Sorting-by-%27starts-with%27-tp23432815p23432815.html Sent from the Solr - User mailing list archive at Nabble.com.
preImportDeleteQuery
Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but it doesn't seem to do what I'm looking for. I'm looking to do something like: preImportDeleteQuery="ItemId={select ItemId from table where status='delete'}" http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059 seems to address this, but I couldn't find any documentation for it in the wiki. Can someone provide an example of how to use this? Thanks, Wojtek -- View this message in context: http://www.nabble.com/preImportDeleteQuery-tp23437674p23437674.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: preImportDeleteQuery
I'm using full-import, not delta-import. I tried it with delta-import, and it would work, except that I'm querying for a large number of documents so I can't afford the cost of deltaImportQuery for each document. It sounds like $deleteDocId will work. I just need to update from 1.3 to trunk. Thanks! Noble Paul നോബിള് नोब्ळ्-2 wrote: > > are you doing a full-import or a delta-import? > > for delta-import there is an option of deletedPkQuery which should > meet your needs > > -- View this message in context: http://www.nabble.com/preImportDeleteQuery-tp23437674p23450308.html Sent from the Solr - User mailing list archive at Nabble.com.
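For reference, the deletedPkQuery mentioned above goes directly on the entity in data-config.xml and is only evaluated during delta-import. A sketch reusing the table and column names from the original question; the field mappings are omitted:

<entity name="item" pk="ItemId"
        query="select * from table"
        deletedPkQuery="select ItemId from table where status='delete'">
  <!-- field mappings omitted -->
</entity>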
Re: JVM exception_access_violation
I updated to Java 6 update 13 and have been running problem free for just over a month. I'll continue this thread if I run into any problems that seem to be related. Yonik Seeley-2 wrote: > > I assume that you're not using any Tomcat native libs? If you are, > try removing them... if not (and the crash happened more than once in > the same place) then it looks like a JVM bug rather than flakey > hardware and the easiest path forward would be to try the latest Java6 > (update 12). > -- View this message in context: http://www.nabble.com/JVM-exception_access_violation-tp22623667p23451994.html Sent from the Solr - User mailing list archive at Nabble.com.