any docs on using the GeoHashField?

2011-09-02 Thread Peter Wolanin
looking at http://wiki.apache.org/solr/SpatialSearchDev I would think I could index a lat,lon pair into a GeoHashField (that works) and then retrieve the field value to see the computed geohash. however, that doesn't seem to work. If I index: 21.4,33.5 The retrieved value is not a hash, but ap

Re: any docs on using the GeoHashField?

2011-09-14 Thread Peter Wolanin
When I retrieve the value the lat/lon pair that comes out is not exactly the same as what I indexed, which made me think it was actually stored as the hash and then transformed back? Anyhow - I'm trying to understand the actual use case for the field as it exists - essentially you are saying I cou
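
The round-trip loss described above is what one would expect from any geohash encoding, since a hash of finite length names a cell rather than a point. The following is a minimal sketch of the generic geohash algorithm in Python; it is not taken from Solr's GeoHashField code, and whether that field stores the hash internally is exactly what the thread is asking. The function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=12):
    """Encode a lat/lon pair by interleaving longitude and latitude bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # even-numbered bits refine longitude, odd ones latitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for bit in bits[i:i + 5]:
            n = (n << 1) | bit
        chars.append(BASE32[n])
    return "".join(chars)


def geohash_decode(geohash):
    """Return the centre of the cell named by the hash as (lat, lon)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True
    for char in geohash:
        n = BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (n >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return ((lat_range[0] + lat_range[1]) / 2,
            (lon_range[0] + lon_range[1]) / 2)


if __name__ == "__main__":
    h = geohash_encode(21.4, 33.5, precision=8)
    # The decoded pair is close to, but not exactly, the indexed pair.
    print(h, geohash_decode(h))
```

Decoding returns the centre of the cell, so the retrieved pair differs slightly from the indexed one, and the difference shrinks as the hash gets longer.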

Re: Setting up Solr 3.4 example with Tomcat 7

2011-10-02 Thread Peter Wolanin
I've seen a number of users fail to get Solr working correctly in combination with the Drupal client code when using the .deb installer so I have been strongly recommending against it personally. It's also a rather stale version of Solr, generally. -Peter On Sun, Oct 2, 2011 at 4:04 AM, Gora Moh

Retrieving matched tokens and their payload?

2011-12-05 Thread Peter Wolanin
A colleague came to me with a problem that intrigued me. I can see partly how to solve it with Solr, but looking for insight into solving the last step. The problem: 1) Start from a set of text transcriptions of videos where there is a timestamp associated with each word. 2) Index into Solr wit

Re: Lucene/Solr

2011-12-05 Thread Peter Wolanin
Assuming you are using Drupal for the website, you can have Solr set up and integrated with Drupal in < 5 minutes for local development purposes. See: https://drupal.org/node/1358710 for a pre-configured download. -Peter On Mon, Dec 5, 2011 at 11:46 AM, Achebe, Ike, JCL wrote: > Hi, > My name i

Polish language support?

2010-07-09 Thread Peter Wolanin
In IRC trying to help someone find Polish-language support for Solr. Seems lucene has nothing to offer? Found one stemmer that looks to be compatibly licensed in case someone wants to take a shot at incorporating it: http://www.getopt.org/stempel/ -Peter -- Peter M. Wolanin, Ph.D. Momentum Sp

access control for spellcheck suggestions?

2010-10-07 Thread Peter Wolanin
We have a content access control system that works well for the actual search results, but we see that the spellcheck suggestions include words that are not within the set of documents the current user is allowed to access. Does anyone have an approach to this problem for Solr 1.4.x? Anything new

Re: access control for spellcheck suggestions?

2010-10-08 Thread Peter Wolanin
t Group > (615) 213-4311 > > > -Original Message- > From: Peter Wolanin [mailto:peter.wola...@acquia.com] > Sent: Thursday, October 07, 2010 9:00 AM > To: solr-user@lucene.apache.org > Subject: access control for spellcheck suggestions? > > We have a content access c

mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin
Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig: - org.apache.lucene.index.LogByteSizeMergePolicy + I don't see this mentioned in the release notes - is the second format use

Re: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin
browse/SOLR-1052. The second format works > in all 3.x versions. > > -Michael > > -Original Message- > From: Peter Wolanin [mailto:peter.wola...@acquia.com] > Sent: Friday, April 13, 2012 12:32 PM > To: solr-user@lucene.apache.org > Subject: mergePolicy el
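
The archive stripped the XML from the question, so the two forms below are an assumption about what was being compared: the older style with the class name as element text, and the newer style with a class attribute. As a sketch of how an integration script could read either form, in Python (the helper name is hypothetical):

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the two solrconfig.xml forms under discussion.
OLD_STYLE = "<mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy>"
NEW_STYLE = '<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>'


def merge_policy_class(fragment):
    """Return the merge policy class name from either element form."""
    element = ET.fromstring(fragment)
    # Prefer the attribute form; fall back to the element text.
    return element.get("class") or (element.text or "").strip()


if __name__ == "__main__":
    for fragment in (OLD_STYLE, NEW_STYLE):
        print(merge_policy_class(fragment))
```

Per the reply above, shipping only the second form should be enough for a module that targets all 3.x versions.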

Re: Highlighting words with non-ascii chars

2011-05-02 Thread Peter Wolanin
Does your servlet container have the URI encoding set correctly, e.g. URIEncoding="UTF-8" for tomcat6? http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config Older versions of Jetty use ISO-8859-1 as the default URI encoding, but jetty 6 should use UTF-8 as default: http://docs.codehaus.org/d
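
To see why the container's URI encoding matters, here is a small Python sketch that percent-decodes the same query term under both charsets. It only illustrates the decoding mismatch; it does not touch Tomcat or Jetty, and the sample term is made up.

```python
from urllib.parse import quote, unquote

term = "café"                      # hypothetical search term
encoded = quote(term)              # client percent-encodes the UTF-8 bytes

# What the container hands to Solr, depending on its URI charset setting.
as_utf8 = unquote(encoded, encoding="utf-8")
as_latin1 = unquote(encoded, encoding="iso-8859-1")

if __name__ == "__main__":
    print(encoded, as_utf8, as_latin1)
```

With ISO-8859-1 decoding, each two-byte UTF-8 character arrives as two separate characters, so the query term no longer matches the indexed token and nothing is highlighted.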

what data type for geo fields?

2011-07-27 Thread Peter Wolanin
Looking at the example schema: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml the solr.PointType field type uses double (is this just an example field, or used for geo search?), while the solr.LatLonType field uses tdouble and it's unclear ho

Re: what data type for geo fields?

2011-07-28 Thread Peter Wolanin
l 27, 2011 at 9:01 AM, Peter Wolanin > wrote: >> Looking at the example schema: >> >> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3/solr/example/solr/conf/schema.xml >> >> the solr.PointType field type uses double (is this just an example >

Solr 4.0 - distributed updates without zookeeper?

2012-11-11 Thread Peter Wolanin
Looking at how we could upgrade some of our infrastructure to Solr 4.0 - I would really like to take advantage of distributed updates to get NRT, but we want to keep our fixed master and slave server roles since we use different hardware appropriate to the different roles. Looking at the solr 4.0

Re: Solr 4.0 - distributed updates without zookeeper?

2012-11-13 Thread Peter Wolanin
> Otis > -- > Search Analytics - http://sematext.com/search-analytics/index.html > Performance Monitoring - http://sematext.com/spm/index.html > > > On Sun, Nov 11, 2012 at 7:42 PM, Peter Wolanin > wrote: > >> Looking at how we could upgrade some of our infrastruct

Re: Solr 4.0 - distributed updates without zookeeper?

2012-11-14 Thread Peter Wolanin
down, but adding HA there would be helpful in some cases. -Peter On Tue, Nov 13, 2012 at 9:12 PM, Peter Wolanin wrote: > Yes, basically I want to at least avoid leader election and the other > dynamic behaviors. I don't have any experience with ZK, and a lot of > "magic" beha

tika 0.4?

2009-07-24 Thread Peter Wolanin
Sadly, I had to miss the meetup in NYC, but looking over the slides (http://files.meetup.com/1482573/YonikSeeley_NYCMeetup_solr14_features.pdf) I see: Solr Cell: Integrates Apache Tika (v0.4) into Solr My current checkout of solr still has tika 0.3, and I don't see a jira issue for updating to 0

Re: server won't start using configs from Drupal

2009-07-24 Thread Peter Wolanin
Looks like we better update our schema for the Drupal module - what rev of Solr incorporates this change? -Peter On Fri, Jul 24, 2009 at 8:38 AM, Koji Sekiguchi wrote: > David, > > Try to change solr.CharStreamAwareWhitespaceTokenizerFactory to > solr.WhitespaceTokenizerFactory > in your schema.

Re: "standard" requestHandler components

2009-09-14 Thread Peter Wolanin
I just copied this information to the wiki at http://wiki.apache.org/solr/SolrRequestHandler -Peter On Fri, Sep 11, 2009 at 7:43 PM, Jay Hill wrote: > RequestHandlers are configured in solrconfig.xml. If no components are > explicitly declared in the request handler config the defaults are u

Re: dismax + wildcard

2009-11-09 Thread Peter Wolanin
There are some open issues (not for 1.4 at this point) to make dismax more flexible or add wildcard handling, e.g: https://issues.apache.org/jira/browse/SOLR-756 https://issues.apache.org/jira/browse/SOLR-758 You might participate in those to try to get this in a future version and/or get a worki

any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Peter Wolanin
This fairly recent blog post: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ describes the use of the solr.EdgeNGramFilterFactory as the tokenizer for the index. I don't see any mention of that tokenizer on the Solr wiki - is it just waiting t

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-10 Thread Peter Wolanin
analyzer in the wild a few months > back. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Peter Wolanin >> To: so

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-11 Thread Peter Wolanin
fferent than the normal n-gram > tokenizer. > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Peter Wolanin >> To: sol

Re: any docs on solr.EdgeNGramFilterFactory?

2009-11-13 Thread Peter Wolanin
different than the normal n-gram >> tokenizer. >> > >> > Otis >> > -- >> > Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> > >> > >>

changes to highlighting config or syntax in 1.4?

2009-11-13 Thread Peter Wolanin
I'm testing out the final release of Solr 1.4 as compared to the build I have been using from around June. I'm using the dismax handler for searches. I'm finding that highlighting is completely broken as compared to previously. Much more text is returned than it should for each string in , but t

Re: changes to highlighting config or syntax in 1.4?

2009-11-13 Thread Peter Wolanin
Apparently one of my conf files was broken - odd that I didn't see any exceptions. Anyhow - excuse my haste, I don't see the problem now. -Peter On Fri, Nov 13, 2009 at 11:06 PM, Peter Wolanin wrote: > I'm testing out the final release of Solr 1.4 as compared to the build >

Re: Newbie Solr questions

2009-11-15 Thread Peter Wolanin
Take a look at the example schema - you can have dynamic fields that are used based on wildcard matching to the field name if a field doesn't match the name of an existing field. -Peter On Sun, Nov 15, 2009 at 10:50 AM, yz5od2 wrote: > Thanks for the reply: > > I follow the schema.xml concept, b
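
The lookup order described above (explicit field first, then wildcard patterns) can be sketched in Python as follows. The field names, types, and patterns are made up for illustration, and taking the first matching pattern is a simplification; Solr's own precedence rules among several matching dynamic fields are not modelled here.

```python
from fnmatch import fnmatchcase


def resolve_field_type(name, explicit_fields, dynamic_patterns):
    """Return the type for a field name, or None if nothing matches."""
    # An explicitly declared field always wins.
    if name in explicit_fields:
        return explicit_fields[name]
    # Otherwise fall back to the first wildcard pattern that matches.
    for pattern, field_type in dynamic_patterns:
        if fnmatchcase(name, pattern):
            return field_type
    return None


if __name__ == "__main__":
    explicit = {"title": "text"}                         # hypothetical schema
    dynamic = [("*_i", "int"), ("*_s", "string"), ("*", "ignored")]
    for field in ("title", "price_i", "unexpected"):
        print(field, resolve_field_type(field, explicit, dynamic))
```

A catch-all pattern such as `*` at the end of the list is what lets documents with unknown fields index without errors.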

using Xinclude with multi-core

2009-11-27 Thread Peter Wolanin
I'm trying to take advantage of the Solr 1.4 Xinclude feature to include a different xml fragment (e.g. a different analyzer chain in schema.xml) for each core in a multi-core setup. When the Xinclude operates on a relative path, it seems to NOT be acting relative to the xml file with the Xinlclud

is it possible to use Xinclude in schema.xml?

2009-11-28 Thread Peter Wolanin
I'm trying to determine if it's possible to use Xinclude to (for example) have a base schema file and then substitute various pieces. It seems that the schema fieldTypes throw exceptions if there is an unexpected attribute? SEVERE: java.lang.RuntimeException: schema fieldtype text(org.apache.solr

Re: is it possible to use Xinclude in schema.xml?

2009-11-28 Thread Peter Wolanin
Follow-up: it seems the schema parser doesn't barf if you use xinclude with a single analyzer element, but so far seems like it's impossible for a field type. So this seems to work: ... ... On Sat, Nov 28, 2009 at 1:40 PM, Peter Wola

boosting certain terms within one field?

2008-11-29 Thread Peter Wolanin
I've recently started working on the Drupal integration module for SOLR, and we are looking for suggestions for how to address this question: how do we boost the importance of a subset of terms within a field. For example, we are using the standard request handler for queries, and the default fie

Re: boosting certain terms within one field?

2008-11-30 Thread Peter Wolanin
nt way of encoding the byte array and putting it into the XML > format, such that one can send in payloads when indexing. It's not > particularly hard, but no one has done it yet. > > -Grant > > > On Nov 29, 2008, at 10:45 PM, Peter Wolanin wrote: > >> I've

Re: problem index accented character with release version of solr 1.3

2008-12-09 Thread Peter Wolanin
We have been having this problem also, and have resorted to just stripping control characters before sending the text for indexing: preg_replace('@[\x00-\x08\x0B\x0C\x0E-\x1F]@', '', $text); -Peter On Tue, Dec 9, 2008 at 7:59 AM, knietzie <[EMAIL PROTECTED]> wrote: > > hi joshua, > > i'm having
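
For clients not written in PHP, the same character class carries over directly. A minimal Python equivalent of the one-liner above (the function name is hypothetical):

```python
import re

# Same class as the PHP pattern: all C0 control characters except
# tab (\x09), line feed (\x0A) and carriage return (\x0D).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_ctl_chars(text):
    """Remove control characters that are not allowed in XML 1.0."""
    return CONTROL_CHARS.sub("", text)


if __name__ == "__main__":
    print(repr(strip_ctl_chars("a\x00b\tc\n\x1fd")))
```

Tab, line feed, and carriage return are left in place because XML 1.0 permits them.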

does this break Solr? dynamicField name="*" type="ignored"

2008-12-18 Thread Peter Wolanin
I'm seeing a weird effect with a '*' field. In the example schema.xml, there is a commented out sample: We have this un-commented, and in the schema browser via the admin interface I see that all non-dynamic fields get a type of "ignored". I see this in the Solr admin interface: Field:

Re: does this break Solr? dynamicField name="*" type="ignored"

2008-12-18 Thread Peter Wolanin
-Yonik > > > On Thu, Dec 18, 2008 at 3:20 PM, Peter Wolanin > wrote: >> I'm seeing a weird effect with a '*' field. In the example >> schema.xml, there is a commented out sample: >> >> >> >> >> We have this un-commented, and

Re: How can i omit the illegal characters,when indexing the docs?

2009-01-04 Thread Peter Wolanin
For documents we are indexing via the PHP client, we are currently using the following regex to strip control characters from each field that might contain them: function apachesolr_strip_ctl_chars($text) { // See: http://w3.org/International/questions/qa-forms-utf-8.html // Printable utf-8 d

can the TermsComponent be used in combination with fq?

2009-02-16 Thread Peter Wolanin
We have been trying to figure out how to construct, for example, a directory page with an overview of available facets for several fields. Looking at the issue and wiki http://wiki.apache.org/solr/TermsComponent https://issues.apache.org/jira/browse/SOLR-877 It would seem like this component wou

Re: Finding total range of dates for date faceting

2009-02-17 Thread Peter Wolanin
It *looks* as though Solr supports returning the results of arbitrary calculations: http://wiki.apache.org/solr/SolrQuerySyntax However, I am so far unable to get any example working except in the context of a dismax bf. It seems like one ought to be able to write a query to return the doc match

Re: Store content out of solr

2009-02-17 Thread Peter Wolanin
Sure, we are doing essentially that with our Drupal integration module - each search result contains a link to the "real" content, which is stored in MySQL, etc, and presented via the Drupal CMS. http://drupal.org/project/apachesolr -Peter On Tue, Feb 17, 2009 at 11:57 AM, roberto wrote: > Hell

make the suggested ignored field multi-valued?

2009-02-18 Thread Peter Wolanin
In the example schema.xml, there is a field type 'ignored' which it is suggested can be used with the wildcard * to prevent errors when a document contains fields that don't match any in the schema. My experience recently in using this is that it does not work as desired if the unmatched field

Re: why don't we have a forum for discussion?

2009-02-18 Thread Peter Wolanin
If some stuff is asked over and over again, it would be great to grab some reasonable responses and add them to the wiki. I've edited it a few times when I've struggled with what's there and found something that wasn't covered or was out of date - even the best forum or mailing list will not repli

Suggested hardening of Solr schema.jsp admin interface

2009-02-20 Thread Peter Wolanin
My colleague Paul opened this issue and supplied a patch and I commented on it regarding a potential security weakness in the admin interface: https://issues.apache.org/jira/browse/SOLR-1031 -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com

What is the performance impact of a fq that matches all docs?

2009-02-20 Thread Peter Wolanin
We are working on integration with the Drupal CMS, and so are writing code that carries out operations that might be relevant for only a small subset of the sites/indexes that might use the integration module. In this regard, I'm wondering if adding to the query (using the dismax or mlt handl

Re: Error with highlighter and UTF-8 chars?

2009-02-23 Thread Peter Wolanin
We are using Solr trunk (1.4) - currently " nightly exported - yonik - 2009-02-05 08:06:00" -Peter On Mon, Feb 23, 2009 at 8:07 AM, Koji Sekiguchi wrote: > Jacob, > > What Solr version are you using? There is a bug in SolrHighlighter of Solr > 1.3, > you may want to look at: > > https://issues.

Re: Error with highlighter and UTF-8 chars?

2009-02-24 Thread Peter Wolanin
- und Anwender-) − die Entstehungsgeschichte des Portals) auch dokumentiert worden, denn Ihr vermutet schon richtig, daß da You can see the "strong" tags each get offset one character more from where they are supposed to be. -Peter On Mon, Feb 23, 2009 at 8:24 AM, Peter Wolan

Re: Error with highlighter and UTF-8 chars?

2009-02-24 Thread Peter Wolanin
can see the "strong" tags each get offset one character more from > where they are supposed to be. > > > -Peter > > > > On Mon, Feb 23, 2009 at 8:24 AM, Peter Wolanin > wrote: >> We are using Solr trunk (1.4)  - currently " nightly exported - yonik >

Re: Error with highlighter and UTF-8 chars?

2009-02-24 Thread Peter Wolanin
, but looks like the real bug is in Solr. -Peter On Tue, Feb 24, 2009 at 4:28 PM, Peter Wolanin wrote: > So - something in the highlighting code is counting bytes when it > should be counting characters.  Looks like a lucene bug, so I'm > surprised others have not hit this before.
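
The symptom reported in this thread, tags drifting one position further for each accented character, is what a byte-versus-character offset mix-up would produce. The Python sketch below shows only that general mechanism on a made-up German string; it is not a reproduction of the actual Solr or Lucene code path.

```python
def utf8_byte_offset(text, char_offset):
    """Byte position in the UTF-8 encoding of a given character position."""
    return len(text[:char_offset].encode("utf-8"))


text = "größer, daß da"              # hypothetical sample text
char_pos = text.rindex("da")         # where the highlight tag belongs
byte_pos = utf8_byte_offset(text, char_pos)

if __name__ == "__main__":
    # The gap grows by one for every two-byte character before the match.
    print(char_pos, byte_pos, byte_pos - char_pos)
```

Using the byte position as if it were a character position places the tag too far to the right, by exactly the number of extra bytes that precede it.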

up/down sides to using compound file format for index?

2009-03-09 Thread Peter Wolanin
Trying to set up a server to host multiple Solr cores, we have run into the issue of too many open files a few times. The 2nd ed "Lucene in Action" book suggests using the compound file format to reduce the required number of files when having multiple indexes, but mentions a possible ~10% slow-do

Re: Query Boosting using both BQ and BF

2009-03-09 Thread Peter Wolanin
This doesn't seem to match what I'm seeing in terms of using bq - using any value > 0 increases the score. For example, with no bq: solr title,score,type 2.2 1.6885357 Building a killer search for Drupal wikipage 1.5547959 New Solr module available for testing story

Re: ExtractingRequestHandler and SolrRequestHandler issue

2009-04-22 Thread Peter Wolanin
I had problems with this when trying to set this up with multiple cores - I had to set the shared lib as: in example/solr/solr.xml in order for it to find the jars in example/solr/lib -Peter On Wed, Apr 22, 2009 at 11:43 AM, Grant Ingersoll wrote: > > On Apr 20, 2009, at 12:46 PM, francisco t

bug? No highlighting results with dismax and q.alt=*:*

2009-05-07 Thread Peter Wolanin
For the Drupal Apache Solr Integration module, we are exploring the possibility of doing facet browsing - since we are using dismax as the default handler, this would mean issuing a query with an empty q and falling back to q.alt='*:*' or some other q.alt that matches all docs. However, I noti

Re: bug? No highlighting results with dismax and q.alt=*:*

2009-05-08 Thread Peter Wolanin
Possibly this issue is related: https://issues.apache.org/jira/browse/SOLR-825 Though it seems that might affect the standard handler, while what I'm seeing is more specific to the dismax handler. -Peter On Thu, May 7, 2009 at 8:27 PM, Peter Wolanin wrote: > For the Drupal Apa

Re: bug? No highlighting results with dismax and q.alt=*:*

2009-05-09 Thread Peter Wolanin
.alt using the params q and qf. Highlight will work in that case (I > sorted it out doing that) > > Peter Wolanin-2 wrote: >> >> Possibly this issue is related: >> https://issues.apache.org/jira/browse/SOLR-825 >> >> Though it seems that might affect the standard

Re: Replication master+slave

2009-05-13 Thread Peter Wolanin
Indeed - that looks nice - having some kind of conditional includes would make many things easier. -Peter On Wed, May 13, 2009 at 4:22 PM, Otis Gospodnetic wrote: > > This looks nice and simple.  I don't know enough about this stuff to see any > issues.  If there are no issues.? > > Otis >

Re: Solr memory requirements?

2009-05-17 Thread Peter Wolanin
I think that if you have in your index any documents with norms, you will still use norms for those fields even if the schema is changed later. Did you wipe and re-index after all your schema changes? -Peter On Fri, May 15, 2009 at 9:14 PM, vivek sar wrote: > Some more info, > >  Profiling the

exceptions when using existing index with latest build

2009-05-25 Thread Peter Wolanin
Building Solr last night from updated svn, I'm now getting the exception below when I use any fq parameter searching a pre-existing index. So far, I cannot fix it by tweaking config files; I had to delete and re-index. I note that Solr was recently updated to the latest lucene build, so maybe so

Re: Recover crashed solr index

2009-05-25 Thread Peter Wolanin
you can use the lucene jar with solr to invoke the CheckIndex method - this will possibly allow you to recover if you pass the -fix param. You may lose some docs, however, so this is only viable if you can, for example, query to check what's missing. The command looks like (from the root of the

NPE when unloading an absent

2009-06-03 Thread Peter Wolanin
Is this a known bug? When I try to unload a core that does not exist, Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandl

Re: NPE when unloading an absent

2009-06-03 Thread Peter Wolanin
I did not find any relevant issue, so here's a new issue with a patch: https://issues.apache.org/jira/browse/SOLR-1200 -Peter On Wed, Jun 3, 2009 at 4:56 PM, Peter Wolanin wrote: > Is this a known bug?  When I try to unload a core that does not exist, > Solr throws a NullPoint

Re: Dismax request handler and highlighting

2009-06-07 Thread Peter Wolanin
I had the same problem - I think the answer is that highlighting is not currently supported with q.alt and dismax. http://www.nabble.com/bug--No-highlighting-results-with-dismax-and-q.alt%3D*%3A*-td23438048.html#a23438048 -Peter On Sun, Jun 7, 2009 at 7:51 AM, Fouad Mardini wrote: > Hello, > >

can Trie fields be stored?

2009-06-11 Thread Peter Wolanin
Looking at the new examples of solr.TrieField http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/schema.xml I see that all have indexed="true" stored="false" in the field type definition. Does this mean that you cannot ever store a value for one of these fields? I.e. if I want t

multi-core, autocommit and resource use

2009-06-18 Thread Peter Wolanin
A question for anyone familiar with the details of the time-based autocommit mechanism in Solr: if I am running several core on the same server and send updates to each core at the same time, what happens? If all the cores have their autocommit time run out at the same time, will every core try

Re: multi-core, autocommit and resource use

2009-06-18 Thread Peter Wolanin
So for now would it make sense to spread out the autocommit times for the different cores? Thanks. -Peter On Thu, Jun 18, 2009 at 7:07 PM, Yonik Seeley wrote: > On Thu, Jun 18, 2009 at 4:27 PM, Peter Wolanin > wrote: >>  I think I understand >> that all the pending changes a
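
One way to try the idea of spreading out autocommit times is to generate a slightly different maxTime for each core's solrconfig.xml. The Python sketch below only computes the values; the core names, base interval, and step are hypothetical, and whether staggering actually helps is the open question of this thread.

```python
def staggered_max_times(core_names, base_ms=60000, step_ms=5000):
    """Assign each core its own autocommit maxTime, in milliseconds."""
    # Sort so the assignment is stable across runs.
    return {
        name: base_ms + index * step_ms
        for index, name in enumerate(sorted(core_names))
    }


if __name__ == "__main__":
    for core, max_time in staggered_max_times(["core2", "core0", "core1"]).items():
        print(core, max_time)
```

Because the autocommit timer starts from the first pending update, cores that receive updates at the same moment would then commit a few seconds apart rather than all at once.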

Re: facets: case and accent insensitive sort

2009-06-28 Thread Peter Wolanin
Seems like this might be approached using a Lucene payload? For example where the original string is stored as the payload and available in the returned facets for display purposes? Payloads are byte arrays stored with Terms on Fields. See https://issues.apache.org/jira/browse/LUCENE-755 Solr se

Select tika output for extract-only?

2009-07-11 Thread Peter Wolanin
I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode as if from the CLI with the tika jar: -x or --xmlOutput XHTML content (default) -h or --html Output HTML content -t or --text Ou

Re: Select tika output for extract-only?

2009-07-13 Thread Peter Wolanin
ee SOLR-284) > A quick patch to specify the output format should make it into 1.4 - > but you may want to wait until I finish. > > -Yonik > http://www.lucidimagination.com > > On Sat, Jul 11, 2009 at 5:39 PM, Peter Wolanin > wrote: >> I had been assuming that I could cho

lucene or Solr bug with dismax?

2009-07-13 Thread Peter Wolanin
I have been getting exceptions thrown when users try to send boolean queries into the dismax handler. In particular, with a leading 'OR'. I'm really not sure why this happens - I thought the dismax parser ignored AND/OR? I'm using rev 779609 in case there were recent changes to this. Is this a k

Re: lucene or Solr bug with dismax?

2009-07-13 Thread Peter Wolanin
t issue (is there another)? https://issues.apache.org/jira/browse/SOLR-874 -Peter On Mon, Jul 13, 2009 at 4:12 PM, Mark Miller wrote: > It doesn't ignore OR and AND, though it probably should. I think there is a > JIRA issue for it somewhere. > > On Mon, Jul 13, 2009 at 4:10 PM, Peter W

Re: lucene or Solr bug with dismax?

2009-07-13 Thread Peter Wolanin
I can still generate this error with Solr built from svn trunk just now. http://localhost:8983/solr/select/?qt=dismax&q=OR+vti+OR+foo I'm doubly perplexed by this since 'or' is in the stopwords file. -Peter On Mon, Jul 13, 2009 at 3:15 PM, Peter Wolanin wrote: > I have b
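
Until the parser handles this, a client could drop dangling operators before sending user input to dismax. The Python sketch below rebuilds the failing URL from above and shows one such client-side cleanup; the cleanup is an assumption about a possible workaround, not something proposed in the thread, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

BOOLEAN_OPERATORS = {"AND", "OR"}


def strip_dangling_operators(query):
    """Drop boolean operators that start or end the user's query."""
    tokens = query.split()
    while tokens and tokens[0] in BOOLEAN_OPERATORS:
        tokens.pop(0)
    while tokens and tokens[-1] in BOOLEAN_OPERATORS:
        tokens.pop()
    return " ".join(tokens)


def dismax_url(query, base="http://localhost:8983/solr/select/"):
    """Build a select URL that routes the query to the dismax handler."""
    return base + "?" + urlencode({"qt": "dismax", "q": query})


if __name__ == "__main__":
    raw = "OR vti OR foo"
    print(dismax_url(raw))                            # the failing request
    print(dismax_url(strip_dangling_operators(raw)))  # cleaned variant
```

Only operators at the very start or end are removed, so operators between terms still reach the parser unchanged.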

Re: Multivalued fields and scoring/sorting

2009-07-16 Thread Peter Wolanin
Assuming that you know the unique ID when constructing the query (which it sounds like you do) why not try a boost query with a high boost for 2 and a lower boost for 1 - then the default sort by score should match your desired ordering, and this order can be further tweaked with other bf or bq ar

Re: Wikipedia or reuters like index for testing facets?

2009-07-16 Thread Peter Wolanin
AWS provides some standard data sets, including an extract of all wikipedia content: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2345&categoryID=249 Looks like it's not being updated often, so this or another AWS data set could be a consistent basis for benchmarking? -Pe

Re: spellcheck with misspelled words in index

2009-07-16 Thread Peter Wolanin
I think you can just tell the spellchecker to only supply "more popular" suggestions, which would naturally omit these rare misspellings: true -Peter On Wed, Jul 15, 2009 at 7:30 PM, Jay Hill wrote: > We had the same thing to deal with recently, and a great solution was posted > to the lis

Re: Obtaining SOLR index size on disk

2009-07-20 Thread Peter Wolanin
Actually, if you have a server enabled as a replication master, the stats.jsp page reports the index size, so that information is available in some cases. -Peter On Sat, Jul 18, 2009 at 8:14 AM, Erik Hatcher wrote: > > On Jul 17, 2009, at 8:45 PM, J G wrote: >> >> Is it possible to obtain the SOL

Re: SOLR: Replication

2010-01-03 Thread Peter Wolanin
Related to the difference between rsync and native Solr replication - we are seeing issues with Solr 1.4 where search queries that come in during a replication request hang for an excessive amount of time (up to hundreds of seconds for a result that normally takes ~50 ms). We are replicating pretty ofte

Re: SOLR Performance Tuning: Pagination

2010-01-03 Thread Peter Wolanin
At the NOVA Apache Lucene/Solr Meetup last May, one of the speakers from Near Infinity (Aaron McCurry I think) mentioned that he had a patch for lucene that enabled unlimited depth memory-efficient paging. Is anyone in contact with him? -Peter On Thu, Dec 24, 2009 at 11:27 AM, Grant Ingersoll w

Re: Indexing the latests MS Office documents

2010-01-04 Thread Peter Wolanin
You must have been searching old documentation - I think tika 0.3+ has support for the new MS formats, but don't take my word for it - why don't you build tika and try it? -Peter On Sun, Jan 3, 2010 at 7:00 PM, Roland Villemoes wrote: > Hi All, > > Anyone who knows how to index the latest MS o

dramatic load from stats.jsp page

2010-01-05 Thread Peter Wolanin
The attached screenshot shows the transition on a master search server when we updated from a Solr 1.4 dev build (revision 779609 from 2009-05-28) to the Solr 1.4.0 released code. Every 3 hours we have a cron task to log some of the data from the stats.jsp page from each core (about 100 cores, mos

Re: internal XML parser used in Solr

2010-01-05 Thread Peter Wolanin
Config.java (which parses e.g. solrconfig.xml) in the solr core code has: import org.w3c.dom.Document; import org.w3c.dom.Node; import org.xml.sax.SAXException; import org.apache.solr.common.SolrException; import org.apache.solr.common.util.DOMUtil; import javax.xml.parsers.*; import javax.xml.xpa

Re: SOLR Performance Tuning: Pagination

2010-01-07 Thread Peter Wolanin
patch. > So far he has 2 x +1 from Grant and me to stick his patch in JIRA. > >  Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > - Original Message >> From: Peter Wolanin >> To: solr-user@lucene.apache.org >> Sent: S

Re: Solr 1.4 - stats page slow

2010-01-07 Thread Peter Wolanin
I recently noticed the same sort of thing. The attached screenshot shows the transition on a search server when we updated from a Solr 1.4 dev build (revision 779609 from 2009-05-28) to the Solr 1.4.0 released code. Every 3 hours we have a cron task to log some of the data from the stats.jsp page

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Peter Wolanin
ripped by > ML manager.  Maybe upload it somewhere? >  Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > ----- Original Message >> From: Peter Wolanin >> To: solr-user@lucene.apache.org >> Sent: Thu, January 7, 2010 9:32:

Re: Basic questions about Solr cost in programming time

2010-01-26 Thread Peter Wolanin
Having worked quite a bit on the Drupal integration - here's my quick take: If you have someone help you the first time, you can have a basic implementation running in Jetty in about 15 minutes. On your own, a couple hours maybe. For a non-public site (intranet) with modest traffic and no require

Re: Solr 1.4 - stats page slow

2010-01-26 Thread Peter Wolanin
Sorry for not following up sooner - been a busy last couple weeks. We do see a significant insanity count - could this be due to updating indexes from the dev Solr build? E.g. on one server I see 61 and entries like: SUBREADER: Found caches for de

Re: schema.xml and Xinclude

2010-01-26 Thread Peter Wolanin
It doesn't really work with the schema.xml - I beat my head on it for a few hours not long ago - maybe I sent an e-mail to this list about it? Yes, here: http://www.lucidimagination.com/search/document/ba68aa6f2f7702c3/is_it_possible_to_use_xinclude_in_schema_xml -Peter On Wed, Jan 6, 2010 at

Re: Solr 1.4 - stats page slow

2010-02-07 Thread Peter Wolanin
Yes, we do have some fields (like the creation date) that we use for both sorting and faceting. -Peter On Tue, Jan 26, 2010 at 8:55 PM, Yonik Seeley wrote: > On Tue, Jan 26, 2010 at 8:49 PM, Peter Wolanin > wrote: >> Sorry for not following up sooner- been a busy last couple wee

Re: Solr/Drupal Integration - Query Question

2010-02-24 Thread Peter Wolanin
Can you tell me more about the rord() performance issues? I'm one of the maintainers of the Drupal module, so I'd like to switch if there is a better option. Thanks, Peter On Wed, Feb 10, 2010 at 12:00 AM, Lance Norskog wrote: > The admin/form.jsp is supposed to prepopulate fl= with '*,score'

Re: Solr/Drupal Integration - Query Question

2010-02-24 Thread Peter Wolanin
The Drupal schema and solrconfig and the example schema and solrconfig have different fields and defaults, and likely Drupal won't find the fields it's looking for and might not even be using the right query parser. -Peter On Thu, Feb 11, 2010 at 3:19 PM, jaybytez wrote: > > So I got it to work b

Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
Ran into an odd situation today searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the st

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
Hi Mitch, I am also seeing this locally with the exact same solr.war, solrconfig.xml, and schema.xml running under Jetty, as well as on 2 different production servers with the same content indexed. So this is really weird - this seems to be influenced by the surrounding text: "would be great to

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
If I empty the stopword file and re-index, all expected matches happen. So maybe that provides a further suggestion of where the problem is. This certainly feels like a Solr bug (or lucene bug?). -Peter On Sat, Mar 27, 2010 at 3:05 PM, Peter Wolanin wrote: > Hi Mitch, > > I am al

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
The output on the analysis screen does look correct. Here are 2 screen shots: empty stopwords: http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png standard stopwords: http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png -Peter On Sat, Mar 27, 2010 at 4:13 PM, MitchK wrote: >

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
ct to have that directive here, or is this a bug? -Peter On Sat, Mar 27, 2010 at 4:25 PM, Peter Wolanin wrote: > The output on the analysis screen does look correct. Here are 2 screen shots: > > empty stopwords: http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png > > s

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
ens, not a phrase). -Peter On Sat, Mar 27, 2010 at 4:32 PM, Peter Wolanin wrote: > The stopwords stanza looks like: > >                        ignoreCase="true" >                words="stopwords.txt" >                enablePositionIncrements="true" >    

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-27 Thread Peter Wolanin
Created a new issue: https://issues.apache.org/jira/browse/SOLR-1852 further discussion there. -Peter On Sat, Mar 27, 2010 at 5:51 PM, Peter Wolanin wrote: > Discussing this with Mark Miller in IRC - we are honing in on the problem. > > Looks as though Identi.ca is treated as phrase

Re: Solr 1.4 bug? search fails but analyzer indicates a match

2010-03-28 Thread Peter Wolanin
I think it is clearly a bug - see comments on the issue by Robert Muir. https://issues.apache.org/jira/browse/SOLR-1852 The patch is a backport by Mark Miller of Robert's fixes for other problems for the WordDelimiterFilter in Solr trunk. Those fixes also fix this bug as a side effect. -Peter

Re: Evangelism

2010-04-29 Thread Peter Wolanin
A very abbreviated list of sites using Apache Solr + Drupal here: http://drupal.org/node/447564 -Peter On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman wrote: > Hi I'm new to the list here, > > > > I'd like to steer someone in the direction of Solr, and I see the list of > companies using solr,