solr / lucene engineering positions in Boston, MA USA @ the Echo Nest

2010-09-10 Thread Brian Whitman
Hi all, brief message to let you know that we're in heavy hire mode at the Echo Nest. As many of you know we are very heavy solr/lucene users (~1bn documents across many many servers) and a lot of our staff have been working with and contributing to the projects over the years. We are a "music inte

autocommit commented out -- what is the default?

2010-12-04 Thread Brian Whitman
Hi, if you comment out the <autoCommit> block in solrconfig.xml, does this mean that (a) commits never happen automatically or (b) some default autocommit is applied?

"document commit" possible?

2008-06-23 Thread Brian Whitman
Could the commit operation be adapted to just have the searchers aware of new stored content in a particular document? e.g. With the understanding that queries for newly indexed fields in this document will not return this newly added document, but a query for the document by its id will r

Re: diversity in results

2008-08-04 Thread Brian Whitman
On Aug 4, 2008, at 12:50 PM, Jason Rennie wrote: Is there any option in solr to encourage diversity in the results? Our solr index has millions of products, many of which are quite similar to each other. Even something simple like max 50% text overlap in successive results would be valuabl

partialResults, distributed search & SOLR-502

2008-08-15 Thread Brian Whitman
I was going to file a ticket like this: "A SOLR-303 query with &shards=host1,host2,host3 when host3 is down returns an error. One of the advantages of a shard implementation is that data can be stored redundantly across different shards, either as direct copies (e.g. when host1 and host3 ar

Re: partialResults, distributed search & SOLR-502

2008-08-18 Thread Brian Whitman
ng would have to get put in that (optionally) catches connection errors and still builds the response from the shards that did respond. On Fri, Aug 15, 2008 at 1:23 PM, Brian Whitman <[EMAIL PROTECTED] > wrote: I was going to file a ticket like this: "A SOLR-303 query with &am

Re: partialResults, distributed search & SOLR-50

2008-08-18 Thread Brian Whitman
On Aug 18, 2008, at 12:31 PM, Yonik Seeley wrote: On Mon, Aug 18, 2008 at 12:16 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Yes, as far as I know, what Brian said is correct. Also, as far as I know, there is nothing that gracefully handles problematic Solr instances during distributed s

Re: which shard is a result coming from

2008-08-19 Thread Brian Whitman
On Aug 19, 2008, at 8:49 AM, Ian Connor wrote: What is the current "special requestHandler" that you can set currently? If you're referring to my issue post, that's just something we have internally (not in trunk solr) that we use instead of /update -- it just inserts a hostname:port/sol

in a RequestHandler's init, how to get solr data dir?

2008-08-26 Thread Brian Whitman
I want to be able to store non-solr data in solr's data directory (like solr/solr/data/stored alongside solr/solr/data/index) The java class that sets up this data is instantiated from a RequestHandlerBase class like: public class StoreDataHandler extends RequestHandlerBase { StoredData sto

Re: in a RequestHandler's init, how to get solr data dir?

2008-08-26 Thread Brian Whitman
On Aug 26, 2008, at 12:24 PM, Shalin Shekhar Mangar wrote: Hi Brian, You can implement the SolrCoreAware interface which will give you access to the SolrCore object through the SolrCoreAware#inform method you will need to implement. It is called after the init method. Shalin, that worke

Re: Adding a field?

2008-08-26 Thread Brian Whitman
On Aug 26, 2008, at 3:09 PM, Jon Drukman wrote: Is there a way to add a field to an existing index without stopping the server, deleting the index, and reloading every document from scratch? You can add a field to the schema at any time without adversely affecting the rest of the index

UpdateRequestProcessorFactory / Chain etc

2008-09-06 Thread Brian Whitman
Trying to build a simple UpdateRequestProcessor that keeps a field (the time of original index) when overwriting a document. 1) Can I make a updateRequestProcessor chain only work as a certain handler or does putting the following in my solrconfig.xml: Just handle all docu

Re: UpdateRequestProcessorFactory / Chain etc

2008-09-06 Thread Brian Whitman
Answered my own qs, I think: Trying to build a simple UpdateRequestProcessor that keeps a field (the time of original index) when overwriting a document. 1) Can I make a updateRequestProcessor chain only work as a certain handler or does putting the following in my solrconfig.xml:

Re: UpdateRequestProcessorFactory / Chain etc

2008-09-07 Thread Brian Whitman
Hm... I seem to be having trouble getting either the Factory or the Processor to do an init() for me. The end result I'd like to see is a function that gets called only once, either on solr init or the first time the handler is called. I can't seem to do that. I have these two classes:

Re: UpdateRequestProcessorFactory / Chain etc

2008-09-07 Thread Brian Whitman
On Sep 7, 2008, at 2:04 PM, Brian Whitman wrote: Hm... I seem to be having trouble getting either the Factory or the Processor to do an init() for me. The end result I'd like to see is a function that gets called only once, either on solr init or the first time the handler is called.

RequestHandler that passes along the query

2008-10-03 Thread Brian Whitman
Not sure if this is possible or easy: I want to make a requestHandler that acts just like select but does stuff with the output before returning it to the client. e.g. http://url/solr/myhandler?q=type:dog&sort=legsdesc&shards=dogserver1;dogserver2 When myhandler gets it, I'd like to take the resu

Re: RequestHandler that passes along the query

2008-10-04 Thread Brian Whitman
Thanks grant and ryan, so far so good. But I am confused about one thing - when I set this up like: public void process(ResponseBuilder rb) throws IOException { And put it as the last-component on a distributed search (a defaults shard is defined in the solrconfig for the handler), the componen

Re: RequestHandler that passes along the query

2008-10-04 Thread Brian Whitman
t stage (GET_FIELDS) that distributedProcess gets called for. On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > Thanks grant and ryan, so far so good. But I am confused about one thing - > when I set this up like: > > public void process(ResponseBuilder

Re: RequestHandler that passes along the query

2008-10-04 Thread Brian Whitman
mplement: process(ResponseBuilder rb) > > ryan > > > > On Oct 4, 2008, at 1:06 PM, Brian Whitman wrote: > > Sorry for the extended question, but I am having trouble making >> SearchComponent that can actually get at the returned response in a >> distributed setup.

maxCodeLen in the doublemetaphone solr analyzer

2008-11-13 Thread Brian Whitman
I want to change the maxCodeLen param that is in Solr 1.3's doublemetaphone plugin. Doc is here: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/DoubleMetaphone.html Is this something I can do in solrconfig or do I need to change it and recompile?

Re: maxCodeLen in the doublemetaphone solr analyzer

2008-11-13 Thread Brian Whitman
oh, thanks! I didn't see that patch. On Thu, Nov 13, 2008 at 3:40 PM, Feak, Todd <[EMAIL PROTECTED]> wrote: > There's a patch in to do that as a separate filter. See > https://issues.apache.org/jira/browse/SOLR-813 >

matching exact terms

2008-11-25 Thread Brian Whitman
This is probably severe user error, but I am curious about how to index docs to make this query work: happy birthday to return the doc with n_name:"Happy Birthday" before the doc with n_name:"Happy Birthday, Happy Birthday" . As it is now, the latter appears first for a query of n_name:"happy birt

cannot allocate memory for snapshooter

2009-01-02 Thread Brian Whitman
I have an indexing machine on a test server (a mid-level EC2 instance, 8GB of RAM) and I run jetty like: java -server -Xms5g -Xmx5g -XX:MaxPermSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap -Dsolr.solr.home=/vol/solr -Djava.awt.headless=true -jar start.jar The indexing maste

debugging long commits

2009-01-02 Thread Brian Whitman
We have a distributed setup that has been experiencing glacially slow commit times on only some of the shards. (10s on a good shard, 263s on a slow shard.) Each shard for this index has about 10GB of lucene index data and the documents are segregated by an md5 hash, so the distribution of document/
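A minimal sketch of hash-based shard assignment, for readers unfamiliar with the idea of segregating documents by an md5 hash. The class name `Md5ShardRouter`, the `shardFor` method, and the modulo reduction are illustrative assumptions; the post does not say how the hash is mapped to a shard.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: picks a shard for a document id from its MD5 digest. */
final class Md5ShardRouter {
    private Md5ShardRouter() {}

    /** Returns a shard index in [0, numShards) derived from the MD5 of docId. */
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(docId.getBytes(StandardCharsets.UTF_8));
            // Read the 16-byte digest as a non-negative integer, then reduce it
            // modulo the shard count (an assumed mapping, not taken from the post).
            return new BigInteger(1, digest).mod(BigInteger.valueOf(numShards)).intValue();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard " + shardFor(id, 4));
        }
    }
}
```

With a uniform hash the document counts per shard should come out roughly even, which is why a large gap in commit time between shards is surprising.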

Re: debugging long commits

2009-01-02 Thread Brian Whitman
a.util.ArrayList 26: 12471 399072 org.apache.lucene.document.Document 27: 3895 372216 [[I 28: 3904 309592 [S 29: 534 249632 30: 3451 220864 org.apache.lucene.index.SegmentReader$Norm 31: 1547 136136 o

Re: debugging long commits

2009-01-02 Thread Brian Whitman
I think I'm getting close with this (sorry for the self-replies) I tried an optimize (which we never do) and it took 30m and said this a lot: Exception in thread "Lucene Merge Thread #4" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: Array index out

Re: cannot allocate memory for snapshooter

2009-01-02 Thread Brian Whitman
-to11423199.html#a11424938 > > Bill > > On Fri, Jan 2, 2009 at 10:52 AM, Brian Whitman wrote: > > > I have an indexing machine on a test server (a mid-level EC2 instance, > 8GB > > of RAM) and I run jetty like: > > > > java -server -Xms5g -Xmx5g -XX

Re: cannot allocate memory for snapshooter

2009-01-05 Thread Brian Whitman
On Sun, Jan 4, 2009 at 9:47 PM, Mark Miller wrote: > Hey Brian, I didn't catch what OS you are using on EC2 by the way. I > thought most UNIX OS's were using memory overcommit - A quick search brings > up Linux, AIX, and HP-UX, and maybe even OSX? > > What are you running over there? EC2, so Linu

lazily loading search components?

2009-02-08 Thread Brian Whitman
We have a standard solr install that we use across a lot of different uses. In that install is a custom search component that loads a lot of data in its inform() method. This means the data is initialized on solr boot. Only about half of our installs actually ever call this search component, so the

general survey of master/replica setups

2009-02-23 Thread Brian Whitman
Say you have a bunch of solr servers that index new data, and then some replica/"slave" setup that snappulls from the master on a cron or some schedule. Live internet facing queries hit the replica, not the master, as indexes/commits on the master slow down queries. But even the query-only solr ins

arcane queryParser parseException

2009-02-23 Thread Brian Whitman
server:/solr/select?q=field:"''anything can go here;" --> Lexical error, encountered after : "\"\'\'anything can go here" server:/solr/select?q=field:"'anything' anything can go here;" --> Same problem server:/solr/select?q=field:"'anything' anything can go here\;" --> No problem (but ClientUtils
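A rough sketch of client-side escaping for an unquoted query value, assuming the fix is simply to backslash-escape anything the parser might treat as syntax. `QueryEscaper` is a standalone illustration, not the ClientUtils implementation; the semicolon is in the list only because the examples above show that escaping it avoids the error.

```java
/** Hypothetical standalone escaper for raw values placed into a query string. */
final class QueryEscaper {
    // Characters assumed to carry meaning for the query parser. The ';' is
    // included because the report above shows "\;" parsing cleanly.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;";

    private QueryEscaper() {}

    /** Prefixes each special or whitespace character with a backslash. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Builds a clause for a field named "field", as in the examples above.
        System.out.println("field:" + escape("anything can go here;"));
    }
}
```

Escaping more characters than strictly necessary only makes the final string uglier; it does not change what matches.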

Re: arcane queryParser parseException

2009-02-24 Thread Brian Whitman
> > : I went ahead and added it since it does not hurt anything to escape more > : things -- it just makes the final string ugly. > > : In 1.3 the escape method covered everything: > > H good call, i didn't realize the escape method had been so > blanket in 1.3. this way we protect people

java.lang.NoSuchMethodError: org.apache.solr.common.util.ConcurrentLRUCache.getLatestAccessedItems(J)Ljava/util/Map;

2009-02-24 Thread Brian Whitman
Seeing this in the logs of an otherwise working solr instance. Commits are done automatically I believe every 10m or 1 docs. This is solr trunk (last updated last night) Any ideas? INFO: [] webapp=/solr path=/select params={fl=thingID,n_thingname,score&q=n_thingname:"Cornell+Dupree"^5+net_th

Re: java.lang.NoSuchMethodError: org.apache.solr.common.util.ConcurrentLRUCache.getLatestAccessedItems(J)Ljava/util/Map;

2009-02-24 Thread Brian Whitman
Yep, did ant clean, made sure all the solr-libs were current, no more exception. Thanks ryan & mark On Tue, Feb 24, 2009 at 1:47 PM, Ryan McKinley wrote: > i hit that one too! > > try: ant clean > > > > On Feb 24, 2009, at 12:08 PM, Brian Whitman wrote: > >

maxCodeLength in PhoneticFilterFactory

2009-04-10 Thread Brian Whitman
i have this version of solr running: Solr Implementation Version: 1.4-dev 747554M - bwhitman - 2009-02-24 16:37:49 and am trying to update a schema to support 8 code length metaphone instead of 4 via this (committed) issue: https://issues.apache.org/jira/browse/SOLR-813 So I change the schema t

Re: maxCodeLength in PhoneticFilterFactory

2009-04-12 Thread Brian Whitman
efinitely a bug - I just reproduced it. Nothing obvious > > jumps out at me... and there's no error in the logs either (that's > > another bug it would seem). Could you open a JIRA issue for this? > > > > > > -Yonik > > http://www.lucidimagination.co

python response handler treats "unschema'd" fields differently

2009-04-17 Thread Brian Whitman
I have a solr index where we removed a field from the schema but it still had some documents with that field in it. Queries using the standard response handler had no problem but the &wt=python handler would break on any query (with fl="*" or asking for that field directly) with: SolrHTTPException

index time boosting on multivalued fields

2009-05-27 Thread Brian Whitman
I can set the boost of a field or doc at index time using the boost attr in the update message, e.g. pet But that won't work for multivalued fields according to the RelevancyFAQ pet animal ( I assume it applies the last boost parsed to all terms? ) Now, say I'd like to do index-time boosting of

Re: Pagination of results and XSLT.

2007-07-23 Thread Brian Whitman
Has anyone tried to handle pagination of results using XSLT's ? I'm not really sure it is possible to do it in pure XSLT because all the response object gives us is a total document count - paginating the results would involve more than what XSLT 1.0 could handle (I'll be very happy if some
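For reference, the arithmetic a paginator needs on top of the total document count is small. Below is a sketch in Java rather than XSLT; the `Pager` class and its method names are made up for illustration, and `numFound`, `start` and `rows` are assumed to come from the response and the request parameters.

```java
/** Hypothetical helper showing the page arithmetic behind result pagination. */
final class Pager {
    private Pager() {}

    /** Number of pages needed to show numFound hits at rows hits per page. */
    static long pageCount(long numFound, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        // Integer ceiling division: a partly filled last page still counts.
        return (numFound + rows - 1) / rows;
    }

    /** 1-based page number that contains the hit at offset start. */
    static long currentPage(long start, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        return start / rows + 1;
    }

    public static void main(String[] args) {
        System.out.println("pages: " + pageCount(5381, 10));
        System.out.println("current page: " + currentPage(20, 10));
    }
}
```

The hard part in XSLT 1.0 is not this arithmetic but looping from 1 to the page count, which usually ends up as a recursive template call.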

Re: Pagination of results and XSLT.

2007-07-24 Thread Brian Whitman
On Jul 24, 2007, at 5:20 AM, Ard Schrijvers wrote: I have been using similar xsls like you describe below in the past, but I think after 3 years of using it I came to realize (500 internal server error) that it can lead to nasty errors when you have a recursive call like (though I am

Re: boost field without dismax

2007-07-24 Thread Brian Whitman
Jul 24, 2007, at 9:42 AM, Alessandro Ferrucci wrote: is there a way to boost a field much like is done in dismax request handler? I've tried doing index-time boosting by providing the boost to the field as an attribute in the add doc but that did nothing to affect the score when I went to s

Re: XML parsing error

2007-07-26 Thread Brian Whitman
On Jul 26, 2007, at 11:25 AM, Yonik Seeley wrote: OK, then perhaps it's a jetty bug with charset handling. I'm using resin btw Could you run the same query, but use the python output? wt=python Seems to be OK: {'responseHeader':{'status':0,'QTime':0,'params':{'start':'7','fl':'c onten

Re: XML parsing error

2007-07-26 Thread Brian Whitman
On Jul 26, 2007, at 11:10 AM, Yonik Seeley wrote: If the '<' truly got destroyed, it's a server (Solr or Jetty) bug. One possibility is that the '<' does exist, but due to a charset mismatch, it's being slurped into a multi-byte char. Just dumped it with curl and did a hexdump: 5a0

XML parsing error

2007-07-26 Thread Brian Whitman
I ended up with this doc in solr: 0name="QTime">17name="fl">content"Pez"~1name="rows">1numFound="5381" start="7">Akatsuki - PE'Z ҳ | ̳ | պ | ŷ | >>> Akatsuki - PE'Z ר | и  | Ů  | ֶ  | պ  | ¸  | tӺ  | Ϸ  | Ӱ  | ϼ  | ŷ>  | ϸ  | ѵ ŷ> >

Re: XML parsing error

2007-07-26 Thread Brian Whitman
On Jul 26, 2007, at 11:49 AM, Yonik Seeley wrote: Could you try it with jetty to see if it's the servlet container? It should be simple to just copy the index directory into solr's example/solr/data directory. Yonik, sorry for my delay, but I did just try this in jetty -- it works (it doe

Re: Any clever ideas to inject into solr? Without http?

2007-08-09 Thread Brian Whitman
On Aug 9, 2007, at 11:12 AM, Kevin Holmes wrote: 2: Is there a way to inject into solr without using POST / curl / http? Check http://wiki.apache.org/solr/EmbeddedSolr There's examples in java and cocoa to use the DirectSolrConnection class, querying and updating solr w/o a web serve

Re: Python Utilitys for Solr

2007-08-14 Thread Brian Whitman
On Aug 14, 2007, at 5:16 AM, Christian Klinger wrote: Hi, I just played a bit with: http://svn.apache.org/repos/asf/lucene/solr/trunk/client/python/solr.py Is it possible that this library is a bit out of date? If I try to get the example running, I get a parse error from the result. May

Re: Indexing a URL

2007-09-05 Thread Brian Whitman
It is apparently attempting to parse &en=499af384a9ebd18f in the URL. I am not clear why it would do this as I specified indexed="false." I need to store this because that is how the user gets to the original article. the ampersand is an XML reserved character. you have to escape it (t
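A small sketch of the escaping step, assuming the update message is being assembled by hand as a string. `XmlEscaper` is a hypothetical helper, not part of Solr or SolrJ; a real XML writer would do this automatically.

```java
/** Hypothetical helper that escapes text for use inside an XML element or attribute. */
final class XmlEscaper {
    private XmlEscaper() {}

    /** Replaces the five XML reserved characters with their predefined entities. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The query-string fragment from the question above.
        System.out.println(escape("&en=499af384a9ebd18f"));
    }
}
```

The escaping applies regardless of whether the field is indexed: the XML has to parse before Solr ever looks at the schema.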

Re: DirectSolrConnection, write.lock and Too Many Open Files

2007-09-10 Thread Brian Whitman
On Sep 10, 2007, at 1:33 AM, Adrian Sutton wrote: After a while we start getting exceptions thrown because of a timeout in acquiring write.lock. It's quite possible that this occurs whenever two updates are attempted at the same time - is DirectSolrConnection intended to be thread safe?

Re: DirectSolrConnection, write.lock and Too Many Open Files

2007-09-10 Thread Brian Whitman
On Sep 10, 2007, at 5:00 PM, Mike Klaas wrote: On 10-Sep-07, at 1:50 PM, Adrian Sutton wrote: We use DirectSolrConnection via JNI in a couple of client apps that sometimes have 100s of thousands of new docs as fast as Solr will have them. It would crash relentlessly if I didn't force all

Re: Term extraction

2007-09-19 Thread Brian Whitman
On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote: I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents. We do it manually (not in solr, but we put the results in solr.) We do it the usual way - chunk (into n-grams, named entities & nou

logging bad stuff separately in resin

2007-09-22 Thread Brian Whitman
We have a largish solr index that handles roughly 200K new docs a day and also roughly a million queries a day from other programs. It's hosted by resin. A couple of times in the past few weeks something "bad" has happened -- a lock error or file handle error, or maybe a required field wa

Re: Term extraction

2007-09-22 Thread Brian Whitman
On Sep 21, 2007, at 3:37 AM, Pieter Berkel wrote: Thanks for the response guys: Grant: I had a brief look at LingPipe, it looks quite interesting but I'm concerned that the licensing may prevent me from using it in my project. Does the opennlp license look good for you? It's LGPL. Not

Re: Nutch with SOLR

2007-09-25 Thread Brian Whitman
Sami has a patch in there which used an older version of the solr client. with the current solr client in the SVN tree, his patch becomes much easier. your job would be to upgrade the patch and mail it back to him so he can update his blog, or post it as a patch for inclusion in nutch/cont

Re: Nutch with SOLR

2007-09-25 Thread Brian Whitman
But we still use a version of Sami's patch that works on both trunk nutch and trunk solr (solrj.) I sent my changes to sami when we did it, if you need it let me know... I put my files up here: http://variogr.am/latest/?p=26 -b

Re: Nutch with SOLR

2007-09-26 Thread Brian Whitman
On Sep 26, 2007, at 4:04 AM, Doğacan Güney wrote: NUTCH-442 is one of the issues that I want to really see resolved. Unfortunately, I haven't received many (as in, none) comments, so I haven't made further progress on it. I am probably your target customer but to be honest all we care about

searching for non-empty fields

2007-09-26 Thread Brian Whitman
I have a large index with a field for a URL. For some reason or another, sometimes a doc will get indexed with that field blank. This is fine but I want a query to return only the set URL fields... If I do a query like: q=URL:[* TO *] I get a lot of empty fields back, like: http://thing.

Re: searching for non-empty fields

2007-09-27 Thread Brian Whitman
thanks Peter, Hoss and Ryan.. q=(URL:[* TO *] -URL:"") This gives me 400 Query parsing error: Cannot parse '(URL:[* TO *] - URL:"")': Lexical error at line 1, column 29. Encountered: "\"" (34), after : "\"" adding something like: I'll do this but the problem here is I have to wait

small rsync index question

2007-09-28 Thread Brian Whitman
I'm not using snap* scripts but i quickly need to sync up two indexes on two machines. I am rsyncing the data dirs from A to B, which works fine. But how can I see the new index on B? For some reason sending a <commit/> is not refreshing the index, and I have to restart resin to see it. Is there some

Re: small rsync index question

2007-09-28 Thread Brian Whitman
Sep 28, 2007, at 5:41 PM, Yonik Seeley wrote: It should... are there any errors in the logs? do you see the commit in the logs? Check the stats page to see info about when the current searcher was last opened too. ugh, nevermind.. was committing the wrong solr index... but Thanks yonik fo

dismax downweighting

2007-10-12 Thread Brian Whitman
i have a dismax query where I want to boost appearance of the query terms in certain fields but "downboost" appearance in others. The practical use is a field containing a lot of descriptive text and then a product name field where products might be named after a descriptive word. Consider

Lock obtain timed out

2007-10-18 Thread Brian Whitman
We have a very active large index running a solr trunk from a few weeks ago that has been going down about once a week for this: [11:08:17.149] No lockType configured for /home/bwhitman/XXX/XXX/ discovered-solr/data/index assuming 'simple' [11:08:17.150] org.apache.lucene.store.LockObtainFaile

Re: Lock obtain timed out

2007-10-18 Thread Brian Whitman
Thanks to ryan and matt.. so far so good. true single

grouped clause search in dismax

2007-10-20 Thread Brian Whitman
I have a dismax handler to match product names found in free text that looks like: explicit 0.01 name^5 nec_name^3 ne_name * 100 *:* name is type string, nec_name and ne_name are special types that do domain-specif

Re: How to get number of indexed documents?

2007-11-01 Thread Brian Whitman
does http://.../solr/admin/luke work for you? 601818 ... On Nov 1, 2007, at 10:39 PM, Papalagi Pakeha wrote: Hello, Is there any way to get XML version of statistics like how many documents are indexed etc? I have found http://.../solr/admin/properties which is cool but doesn't give me th

"overlapping onDeckSearchers" message

2007-11-03 Thread Brian Whitman
I have a solr index that hasn't had many problems recently but I had the logs open and noticed this a lot during indexing: [16:23:34.086] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 Not sure what it means, google didn't come back with much.

Re: start.jar -Djetty.port= not working

2007-11-07 Thread Brian Whitman
On Nov 7, 2007, at 10:00 AM, Mike Davies wrote: java -Djetty.port=8521 -jar start.jar However when I run this it seems to ignore the command and still start on the default port of 8983. Any suggestions? Are you using trunk solr or 1.2? I believe 1.2 still shipped with an older version

Re: start.jar -Djetty.port= not working

2007-11-07 Thread Brian Whitman
On Nov 7, 2007, at 10:07 AM, Mike Davies wrote: I'm using 1.2, downloaded from http://apache.rediris.es/lucene/solr/ Where can i get the trunk version? svn, or http://people.apache.org/builds/lucene/solr/nightly/

Re: LSA Implementation

2007-11-26 Thread Brian Whitman
On Nov 26, 2007 6:58 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is patented, so it is not likely to happen unless the authors donate the patent to the ASF. -Grant There are many ways to catch a bird... LSA reduces to SVD on the

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Brian Whitman
On Nov 27, 2007, at 6:08 PM, bbrown wrote: I couldn't tell if this was asked before. But I want to perform a nutch crawl without any solr plugin which will simply write to some index directory. And then ideally I would like to use solr for searching? I am assuming this is possible?

Re: Solr and nutch, for reading a nutch index

2007-11-27 Thread Brian Whitman
Cc: [EMAIL PROTECTED] Sent: Tuesday, November 27, 2007 8:33:18 PM Subject: Re: Solr and nutch, for reading a nutch index On Tue, 27 Nov 2007 18:12:13 -0500 Brian Whitman <[EMAIL PROTECTED]> wrote: On Nov 27, 2007, at 6:08 PM, bbrown wrote: I couldn't tell if this was asked before. B

can I do *thing* substring searches at all?

2007-11-29 Thread Brian Whitman
With a fieldtype of string, can I do any sort of *thing* search? I can do thing* but not *thing or *thing*. Workarounds?

Re: Re:

2007-12-02 Thread Brian Whitman
On Dec 2, 2007, at 5:43 PM, Ryan McKinley wrote: try \& rather than %26 or just put quotes around the whole url. I think curl does the right thing here.

Re: RE: Re:

2007-12-02 Thread Brian Whitman
On Dec 2, 2007, at 6:00 PM, Andrew Nagy wrote: On Dec 2, 2007, at 5:43 PM, Ryan McKinley wrote: try \& rather than %26 or just put quotes around the whole url. I think curl does the right thing here. I tried all the methods: converting & to %26, converting & to \& and encapsulating

Re: RE: Re:

2007-12-02 Thread Brian Whitman
On Dec 2, 2007, at 5:29 PM, Andrew Nagy wrote: Sorry for not explaining myself clearly: I have header=true as you can see from the curl command and there is a header line in the csv file. was this your actual curl request? curl http://localhost:8080/solr/update/csv?header=true%26seper
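The likely catch with %26 is that it encodes the ampersand as data, so the server sees one long value for header instead of two parameters. A sketch of building the query string programmatically, where only names and values are encoded and a literal & separates the pairs; `QueryStringBuilder` and the `example` parameter are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that joins request parameters into a query string. */
final class QueryStringBuilder {
    private QueryStringBuilder() {}

    /** Encodes each name and value, then joins the pairs with a literal '&'. */
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&'); // separator: must stay unencoded
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("header", "true");
        params.put("example", "a&b"); // made-up parameter whose value contains '&'
        System.out.println(build(params));
    }
}
```

On the command line the equivalent is to leave the separating & as-is and quote the whole URL so the shell does not interpret it.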

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
For faceting and sorting, yes. For normal search, no. Interesting you mention that, because one of the other changes since last week besides the index growing is that we added a sort to an sint field on the queries. Is it reasonable that a sint sort would require over 2.5GB of heap on

out of heap space, every day

2007-12-04 Thread Brian Whitman
This may be more of a general java q than a solr one, but I'm a bit confused. We have a largish solr index, about 8M documents, the data dir is about 70G. We're getting about 500K new docs a week, as well as about 1 query/second. Recently (when we crossed about the 6M threshold) resin has

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms. Then double that to allow for a warming searcher. This is great, but can you help me parse this? Assume 8M docs and I'm sorting on an int field that is unix time (seconds since epoch.) For the purposes of the experiment assume eve
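A back-of-the-envelope version of the formula quoted above. The 8M document count is from the question; the number of unique terms, the bytes per array reference, and the average bytes per term are assumptions chosen only to make the arithmetic concrete, and the `SortHeapEstimate` class is hypothetical.

```java
/** Hypothetical calculator for the sort-cache estimate quoted above. */
final class SortHeapEstimate {
    private SortHeapEstimate() {}

    /**
     * 2 * (int[maxDoc] + String[nTerms] + size of all unique terms), in bytes.
     * bytesPerReference and avgTermBytes are caller-supplied guesses.
     */
    static long estimateBytes(long maxDoc, long nTerms,
                              long bytesPerReference, long avgTermBytes) {
        long ordinals = 4L * maxDoc;                 // int[maxDoc()]: 4 bytes per doc
        long termRefs = bytesPerReference * nTerms;  // String[nTerms()]: one reference per term
        long termData = avgTermBytes * nTerms;       // size_of_all_unique_terms
        return 2L * (ordinals + termRefs + termData); // doubled for a warming searcher
    }

    public static void main(String[] args) {
        // Assumed inputs: 8M docs, 8M unique terms, 8-byte references,
        // and roughly 40 bytes per term string including object overhead.
        long bytes = estimateBytes(8_000_000L, 8_000_000L, 8L, 40L);
        System.out.println(bytes / 1_000_000L + " MB (decimal)");
    }
}
```

Under those assumptions the total is 2 * (32 MB + 64 MB + 320 MB) = 832 MB, so the term strings dominate and the int array is a small part of the cost.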

solrj - adding a SolrDocument (not a SolrInputDocument)

2007-12-06 Thread Brian Whitman
Writing a utility in java to do a copy from one solr index to another. I query for the documents I want to copy: SolrQuery q = new SolrQuery(); q.setQuery("dogs"); QueryResponse rq = source_solrserver.query(q); for( SolrDocument d : rq.getResults() ) { // now I want to add these to a new

Re: solrj - adding a SolrDocument (not a SolrInputDocument)

2007-12-06 Thread Brian Whitman
On Dec 6, 2007, at 3:07 PM, Ryan McKinley wrote: public static SolrInputDocument toSolrInputDocument( SolrDocument d ) { SolrInputDocument doc = new SolrInputDocument(); for( String name : d.getFieldNames() ) { doc.addField( name, d.getFieldValue(name), 1.0f ); } retur

Re: Solr and Flex

2007-12-13 Thread Brian Whitman
On Dec 13, 2007, at 10:42 AM, jenix wrote: I'm using Flex for the frontend interface and Solr on backend for the search engine. I'm new to Flex and Flash and thought someone might have some code integrating the two. We've done light stuff querying solr w/ actionscript. It is pretty si

Re: debugging slowness

2007-12-20 Thread Brian Whitman
On Dec 20, 2007, at 11:02 AM, Otis Gospodnetic wrote: Sounds like GC to me. That is, the JVM not having large enough heap. Run jconsole and you'll quickly see if this guess is correct or not (kill -QUIT is also your friend, believe it or not). We recently had somebody who had a nice litt

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Brian Whitman
On Jan 8, 2008, at 10:58 AM, Kirk Beers wrote: curl http://localhost:8080/solr/update -H "Content-Type:text/xml" -- data-binary '/overwritePending="true">0001field>TitleIt was the best of times it was the worst of times blah blah blahdoc>' Why the / after the first single quote?

Re: Status 500 - ParseError at [row,col]:[1,1] Message Content is not allowed in Prolog

2008-01-08 Thread Brian Whitman
I found that on the Wiki at http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef under the title: Updating a Data Record via curl. I removed it and now have the following: 0name="QTime">122This response format is experimental. It is likely to cha

index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman
We had an index run out of disk space. Queries work fine but commits return 500 doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo

Re: index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman
On Jan 14, 2008, at 4:08 PM, Ryan McKinley wrote: ug -- maybe someone else has better ideas, but you can try: http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java thanks for the tip, i did run that, but I stopped it 30 minutes in, as it was s

Re: Missing Content Stream

2008-01-15 Thread Brian Whitman
On Jan 15, 2008, at 1:50 PM, Ismail Siddiqui wrote: Hi Everyone, I am new to solr. I am trying to index xml using http post as follows Ismail, you seem to have a few spelling mistakes in your xml string. "fiehld, nadme" etc. (a) try fixing them, (b) try solrj instead, I agree w/ otis.

Re: best way to get number of documents in a Solr index

2008-01-15 Thread Brian Whitman
On Jan 15, 2008, at 3:47 PM, Maria Mosolova wrote: Hello, I am looking for the best way to get the number of documents in a Solr index. I'd like to do it from a java code using solrj. public int resultCount() { try { SolrQuery q = new SolrQuery("*:*"); QueryResponse rq =

Re: Newbie with Java + typo

2008-01-21 Thread Brian Whitman
On Jan 21, 2008, at 11:13 AM, Daniel Andersson wrote: Well, no. "Immutable Page", and as far as I know (english not being my mother tongue), that means I can't edit the page You need to create an account first.

Re: SolrPhpClient with example jetty

2008-01-22 Thread Brian Whitman
$document->title = 'Some Title'; $document->content = 'Some content for this wonderful document. Blah blah blah.'; did you change the schema? There's no title or content field in the default example schema. But I believe solr does output different errors for that.

Re: Cache size clarification

2008-01-28 Thread Brian Whitman
On Jan 28, 2008, at 6:05 PM, Alex Benjamen wrote: I need some clarification on the cache size parameters in the solrconfig. Suppose I'm using these values: A lot of this is here: http://wiki.apache.org/solr/SolrCaching

Re: SEVERE: java.lang.OutOfMemoryError: Java heap space

2008-01-28 Thread Brian Whitman
On Jan 28, 2008, at 7:06 PM, Leonardo Santagada wrote: On 28/01/2008, at 20:44, Alex Benjamen wrote: I could allocate more physical memory, but I can't seem to increase the -Xmx option to 3800 I get an error : "Could not reserve enough space for object heap", even though I have more than

Re: SEVERE: java.lang.OutOfMemoryError: Java heap space

2008-01-28 Thread Brian Whitman
But on Intel, where I'm having the problem it shows: java version "1.6.0_10-ea" Java(TM) SE Runtime Environment (build 1.6.0_10-ea-b10) Java HotSpot(TM) Server VM (build 11.0-b09, mixed mode) I can't seem to find the Intel 64 bit JDK binary, can you pls. send me the link? I was downloading f

date math syntax

2008-01-29 Thread Brian Whitman
Is there a wiki page or more examples of the "date math" parsing other than this: http://www.mail-archive.com/solr-user@lucene.apache.org/msg01563.html out there somewhere? From an end user query perspective. -b

Re: Converting Solr results to java query/collection/map object

2008-02-19 Thread Brian Whitman
On Feb 19, 2008, at 3:08 PM, Paul Treszczotko wrote: Hi, I'm pretty new to SOLR and I'd like to ask your opinion on the best practice for converting XML results you get from SOLR into something that is better fit to display on a webpage. I'm looking for performance and relatively small foo

will hardlinks work across partitions?

2008-02-23 Thread Brian Whitman
Will the hardlink snapshot scheme work across physical disk partitions? Can I snapshoot to a different partition than the one holding the live solr index?

can I form a SolrQuery and query a SolrServer in a request handler?

2008-02-25 Thread Brian Whitman
I'm in a request handler: public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) { And in here i want to form a SolrQuery based on the req, query the searcher and return results. But how do I get a SolrServer out of the req? I can get a SolrIndexSearcher but that does

Re: can I form a SolrQuery and query a SolrServer in a request handler?

2008-02-25 Thread Brian Whitman
Perhaps back up and see if we can do this a simpler way than a request handler... What is the query structure you are trying to generate? I have two dismax queries defined in a solrconfig. Something like ... raw^4 name^1 ... tags^3 typ

Re: can I form a SolrQuery and query a SolrServer in a request handler?

2008-02-25 Thread Brian Whitman
Would query ?qt=q1&q=kittens&bf=2&fl=id, then ? qt=q2&q=kittens&bf=2&fl=id. Sorry, I meant: ?qt=q1&q=kittens&bf=sortable^2&fl=id, then ? qt=q2&q=kittens&bf=sortable^2&fl=id

invalid XML character

2008-03-01 Thread Brian Whitman
Once in a while we get this javax.xml.stream.XMLStreamException: ParseError at [row,col]:[4,790470] [14:32:21.877] Message: An invalid XML character (Unicode: 0x6) was found in the element content of the document. [14:32:21.877] at com.sun.org.apache.xerces.internal.impl.XMLStreamR
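One common workaround, sketched here as an assumption rather than as what the thread settled on, is to drop characters that XML 1.0 does not allow before building the update message. The ranges follow the Char production of the XML 1.0 specification; `XmlCharFilter` is a hypothetical helper.

```java
/** Hypothetical filter that removes characters not permitted in XML 1.0 documents. */
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every disallowed code point removed. */
    static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dirty = "ab" + (char) 0x6 + "cd"; // contains the 0x6 from the error above
        System.out.println(strip(dirty));
    }
}
```

Entity-escaping does not help here: control characters such as 0x6 are not legal in XML 1.0 even as numeric character references, so they have to be removed or replaced.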
