Managed schema used with Cloudera MapreduceIndexerTool and morphlines?

2017-03-17 Thread Jay Hill
I've got a very difficult project to tackle. I've been tasked with using schemaless mode to index json files that we receive. The structure of the json files will always be very different as we're receiving files from different customers totally unrelated to one another. We are attempting to build

Re: Very long running replication.

2014-02-27 Thread Jay Hill
Bumping this. I'm seeing the error mentioned earlier in the thread - "Unable to download completely. Downloaded 0!=" often in my logs. I'm dealing with a situation where maxDoc count is growing at a faster rate than numDocs and is now almost twice as large. I'm not optimizing but rather relying o

Re: Loading custom update request handler on startup

2012-07-09 Thread Jay Hill
tup up the thread for my polling UpdateRequestHandler. This seems to work, but if anyone has a better (or more tested) approach please let us know. -Jay On Mon, Jul 9, 2012 at 2:33 PM, Jay Hill wrote: > I'm writing a custom update request handler that will poll a "hot" > d

Loading custom update request handler on startup

2012-07-09 Thread Jay Hill
I'm writing a custom update request handler that will poll a "hot" directory for Solr xml files and index anything it finds there. The custom class implements Runnable, and when the run method is called the loop starts to do the polling. How can I tell Solr to load this class on startup to fire off

Re: Solr Single Core vs Multiple Cores installation for localization

2012-05-21 Thread Jay Hill
Usually I would recommend trying to index all languages into one Solr core. The determining factor for me is how much "overlap" there is in fields for each language, i.e. how many common fields for each language. For example if you have 60 common fields to all languages, but only 8 fields that are

Re: TermsComponent show only terms that matched query?

2012-02-27 Thread Jay Hill
> > Best > Erick > > On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill wrote: > > I have a situation where I want to show the term counts as is done in the > > TermsComponent, but *only* for terms that are *matched* in a query, so I > > get something returned like this (p

TermsComponent show only terms that matched query?

2012-02-24 Thread Jay Hill
I have a situation where I want to show the term counts as is done in the TermsComponent, but *only* for terms that are *matched* in a query, so I get something returned like this (pseudo code): q=title:(golf swing) title: golf legends show how to improve your golf swing on the golf course ...ot

Complex query, need filtering after query not before

2012-01-27 Thread Jay Hill
I have a project where we need to search 1B docs and still have results < 700ms. The problem is, we are using geofiltering and that is happening * before* the queries, so we have to geofilter on the 1B docs to restrict our set of docs first, and then do the query on a name field. But it seems that

Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
ibuted search, meaning if a response wasn't received w/in the timeAllowed, and if partialResults is true, then that shard would not be waited on for results. is that correct? thanks, -jay On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill wrote: > We're on the trunk: > 4.0-2011-10-26_08-

Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
We're on the trunk: 4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47 Client timeouts are set to 4 seconds. Thanks, -Jay On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller wrote: > > On Jan 26, 2012, at 1:28 PM, Jay Hill wrote: > > > > > I've tried sett

Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm on a project where we have 1B docs sharded across 20 servers. We're not in production yet and we're doing load tests now. We're sending load to hit 100qps per server. As the load increases we're seeing query times sporadically increasing to 10 seconds, 20 seconds, etc. at times. What we're tryi

/no_coord in dismax scoring explain

2012-01-06 Thread Jay Hill
What does "/no_coord" mean in the dismax scoring output? I've looked through the wiki mail archives, lucidfind, and can't find any reference. -- ¡jah!

Re: facet search and UnInverted multi-valued field?

2011-05-03 Thread Jay Hill
UnInvertedField is similar to Lucene's FieldCache, except, while the FieldCache cannot work with multivalued fields, UnInvertedField is designed for that very purpose. So since your f_dcperson field is multivalued, by default you use UnInvertedField. You're not doing anything wrong, that's default

Scaling Search with Big Data/Hadoop and Solr now available at Lucene Revolution

2011-04-25 Thread Jay Hill
I've worked with a lot of different Solr implementations, and one area that is emerging more and more is using Solr in combination with other "big data" solutions. My company, Lucid Imagination, has added a two-day course to our upcoming Lucene Revolution conference, "Scaling Search with Big Data a

Re: Multiple Tags and Facets

2011-04-21 Thread Jay Hill
I don't think I understand what you're trying to do. Are you trying to preserve all facets after a user clicks on a facet, and thereby triggers a filter query, which excludes the other facets? If that's the case, you can use local parameters to tag the filter queries so they are not used for the fa

Re: Understanding the DisMax tie parameter

2011-04-15 Thread Jay Hill
Looks good, thanks Tom. -Jay On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom wrote: > Thanks everyone. > > I updated the wiki. If you have a chance please take a look and check to > make sure I got it right on the wiki. > > http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.2

Re: Understanding the DisMax tie parameter

2011-04-14 Thread Jay Hill
Dismax works by first selecting the highest scoring sub-query of all the sub-queries that were run. If I want to search on three fields, manu, name and features, I can configure dismax like this: dismax * 0.0* manu name features *:* Now I'll use this query: http

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-13 Thread Jay Hill
As Hoss mentioned earlier in the thread, you can use the statistics page from the admin console to view the current number of segments. But if you want to know by looking at the files, each segment will have a unique prefix, such as "_u". There will be one unique prefix for every segment in the ind
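The prefix rule in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes per-segment files start with an underscore followed by a short alphanumeric id (as in the "_u" example), and that files such as "segments_4" or "segments.gen" describe the index as a whole. The function name and the sample file list are hypothetical.

```python
import re

# Assumption: per-segment files look like "_u.fdt" or "_u_1.del", i.e. an
# underscore plus a lowercase alphanumeric id. Index-wide files such as
# "segments_4" and "segments.gen" do not start with "_" and are skipped.
SEGMENT_PREFIX = re.compile(r"^(_[0-9a-z]+)")


def segment_prefixes(filenames):
    """Return the set of unique segment prefixes found in a file listing."""
    prefixes = set()
    for name in filenames:
        match = SEGMENT_PREFIX.match(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


# Hypothetical listing of an index directory.
files = ["segments_4", "segments.gen", "_u.fdt", "_u.fdx", "_u_1.del", "_v.cfs"]
print(len(segment_prefixes(files)), "segments:", sorted(segment_prefixes(files)))
```

To try this against a real index directory, pass `os.listdir(path)` as the file listing; the count of prefixes should match the segment count shown on the statistics page if the naming assumption holds.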

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
You can always try something like this out in the analysis.jsp page, accessible from the Solr Admin home. Check out that page and see how it allows you to enter text to represent what was indexed, and text for a query. You can then see if there are matches. Very handy to see how the various filters

Re: phrase, inidividual term, prefix, fuzzy and stemming search

2011-02-04 Thread Jay Hill
You mentioned that dismax does not support wildcards, but edismax does. Not sure if dismax would have solved your other problems, or whether you just had to shift gears because of the wildcard issue, but you might want to have a look at edismax. -Jay http://www.lucidimagination.com On Mon, Jan 3

Re: Tuning Solr

2010-10-05 Thread Jay Hill
Removing those components is not likely to impact performance very much, if at all. I would focus on other areas when tuning performance, such as looking at memory usage and configuration, query design, etc. But there isn't any harm in removing them either. Why not do some load tests with the componen

Re: OutOfMemoryErrors

2010-08-17 Thread Jay Hill
A merge factor of 100 is very high and out of the norm. Try starting with a value of 10. I've never seen a running system with a value anywhere near this high. Also, what is your setting for ramBufferSizeMB? -Jay On Tue, Aug 17, 2010 at 10:46 AM, rajini maski wrote: > yeah sorry I forgot to men

Creating new Solr cores using relative paths

2010-08-16 Thread Jay Hill
I'm having trouble getting the core CREATE command to work with relative paths in the solr.xml configuration. I'm working with a layout like this: /opt/solr [this is solr.solr.home: $SOLR_HOME] /opt/solr/solr.xml /opt/solr/core0/ [this is the "template" core] /opt/solr/core0/conf/schema.xml [etc.]

SolrJ: Setting multiple parameters

2010-06-20 Thread Jay Hill
Working with SolrJ I'm doing a query using the StatsComponent, and the stats.facet parameter. I'm not able to set multiple fields for the "stats.facet" parameter using SolrJ. Here is the query I'm trying to create: http://localhost:8983/solr/select/?q=*:*&stats=on&stats.field=fieldForStats&stats.f

Anyone using Solr spatial from trunk?

2010-06-07 Thread Jay Hill
I was wondering about the production readiness of the new-in-trunk spatial functionality. Is anyone using this in a production environment? -Jay

Re: Index-time vs. search-time boosting performance

2010-06-04 Thread Jay Hill
I've done a lot of recency boosting to documents, and I'm wondering why you would want to do that at index time. If you are continuously indexing new documents, what was "recent" when it was indexed becomes, over time "less recent". Are you unsatisfied with your current performance with the boost f

Auto-suggest internal terms

2010-06-02 Thread Jay Hill
I've got a situation where I'm looking to build an auto-suggest where any term entered will lead to suggestions. For example, if I type "wine" I want to see suggestions like this: french *wine* classes *wine* book discounts burgundy *wine* etc. I've tried some tricks with shingles, but the only

Re: field length normalization

2010-03-11 Thread Jay Hill
The fieldNorm is computed like this: fieldNorm = lengthNorm * documentBoost * documentFieldBoosts and the lengthNorm is: lengthNorm = 1/(numTermsInField)**.5 [note that the value is encoded as a single byte, so there is some precision loss] So the values are not pre-set for the lengthNorm, but
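As a rough worked example of the two formulas quoted in the message above, the sketch below computes lengthNorm and fieldNorm directly. It deliberately ignores the single-byte encoding mentioned in the message, so real stored norms will be less precise than these values. The function and parameter names are hypothetical.

```python
import math


def length_norm(num_terms_in_field):
    """lengthNorm = 1 / sqrt(numTermsInField), before any byte encoding."""
    return 1.0 / math.sqrt(num_terms_in_field)


def field_norm(num_terms_in_field, document_boost=1.0, field_boosts=1.0):
    """fieldNorm = lengthNorm * documentBoost * documentFieldBoosts."""
    return length_norm(num_terms_in_field) * document_boost * field_boosts


# A 4-term field with no boosts, then the same field with a document boost of 2.
print(field_norm(4))
print(field_norm(4, document_boost=2.0))
```

The point of the example is that shorter fields get a larger norm: a 4-term field scores 0.5 here, while a 100-term field would score 0.1 before boosts.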

Re: Question about fieldNorms

2010-03-07 Thread Jay Hill
Yes, if omitNorms=true, then no lengthNorm calculation will be done, and the fieldNorm value will be 1.0, and lengths of the field in question will not be a factor in the score. To see an example of this you can do a quick test. Add two "text" fields, and on one omitNorms: Index a doc wi

Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-02-26 Thread Jay Hill
Yes, it will be recorded and available to view after the presentation. -Jay On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > Yonk, can you please advise whether this event will be recorded and > available for later download? (It starts 5am our t

Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
Thanks for clearing that up guys, I misspoke slightly. It's just that, in a running system, it's probably very rare that there is only a single segment for any meaningful length of time. Unless that merge-down-to-one occurs right when indexing stops there will almost always be a new (small) segment

Re: optimize is taking too much time

2010-02-22 Thread Jay Hill
With a mergeFactor set to anything > 1 you would never have only one segment - unless you optimized. So Lucene will never naturally merge all the segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've never tested that. It's hard to picture how that would work. If I understan

Re: score computation for dismax handler

2010-02-22 Thread Jay Hill
Set the "tie" parameter to 1.0. This param is set between 0.0 (pure disjunction maximum) and 1.0 (pure disjunction sum): http://wiki.apache.org/solr/DisMaxRequestHandler#tie_.28Tie_breaker.29 -Jay On Thu, Feb 18, 2010 at 4:24 AM, bharath venkatesh < bharathv6.proj...@gmail.com> wrote: > Hi , >
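To make the effect of "tie" concrete, here is a minimal sketch of how a disjunction-max score combines sub-query scores: the best score plus tie times the sum of the remaining scores. It is an illustration of the arithmetic only, not Solr's code; the function name and sample scores are hypothetical.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Combine per-field scores: best score plus tie * (sum of the others)."""
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    others = sum(subquery_scores) - best
    return best + tie * others


# Hypothetical scores for one document across three fields.
scores = [0.5, 0.2, 0.1]
print(dismax_score(scores, tie=0.0))  # only the best field counts
print(dismax_score(scores, tie=1.0))  # all fields are summed
```

With tie=0.0 only the highest-scoring field contributes; with tie=1.0 the result is the plain sum, which is what the message above recommends when every matching field should count.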

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-22 Thread Jay Hill
Looks like multi-threaded support was added to the DIH recently: http://issues.apache.org/jira/browse/SOLR-1352 -Jay On Fri, Feb 19, 2010 at 6:27 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Glen may be referring to LuSql indexing with multiple threads? > Does/can DIH do that, to

For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Jay Hill
If I've done a lot of research and have a very good idea of where my cache sizes are having monitored the stats right before commits, is there any reason why I wouldn't just set the initialSize and size counts to the same values? Is there any reason to set a smaller initialSize if I know reliably t

Solr Analysis Webinar Jan 28, 2010

2010-01-20 Thread Jay Hill
My colleague at Lucid Imagination, Tom Hill, will be presenting a free webinar focused on analysis in Lucene/Solr. If you're interested, please sign up and join us. Here is the official notice: We'd like to invite you to a free webinar our company is offering next Thursday, 28 January, at 2PM Eas

Re: solr blocking on commit

2010-01-19 Thread Jay Hill
A couple of follow up questions: - What type of garbage collector is in use? - How often are you optimizing the index? - In solrconfig.xml what is the setting for ? - Right before and after you see this pause, check the output of http://:/solr/admin/system, specifically the output of and send thi

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Jay Hill
Actually my cases were all with customers I work with, not just one case. A common practice is to monitor cache stats to tune the caches properly. Also, noting the warmup times for new IndexSearchers, etc. I've worked with people that have excessive auto-warm count values which is causing extremely

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Jay Hill
It's definitely still an issue. I've seen this with at least four different Solr implementations. It clearly seems to be a problem when there is a large field cache. It would be bad enough if the stats.jsp was just slow to load (usually takes 1 to 2 minutes), but when monitoring memory usage with j

Re: Indexing the latests MS Office documents

2010-01-05 Thread Jay Hill
The version of Tika in the 1.4 release definitely parses the most current Office formats (.docx, .pptx, etc.) and they index as expected. -Jay On Mon, Jan 4, 2010 at 6:02 PM, Peter Wolanin wrote: > You must have been searching old documentation - I think tika 0,3+ has > support for the new MS f

Re: Solr 1.4 - stats page slow

2009-12-24 Thread Jay Hill
Also, what is your heap size and the amount of RAM on the machine? I've also noticed that, when watching memory usage through JConsole or YourKit while loading the stats page, the memory usage spikes dramatically - are you seeing this as well? -Jay On Thu, Dec 24, 2009 at 9:12 AM, Jay

Re: Solr 1.4 - stats page slow

2009-12-24 Thread Jay Hill
I've noticed this as well, usually when working with a large field cache. I haven't done in-depth analysis of this yet, but it seems like when the stats page is trying to pull data from a large field cache it takes quite a long time. Are you doing a lot of sorting? If so, what are the field types

Re: Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
Oh, forgot to add (just to keep the thread complete), the field is being used for a sort, so it was able to use TrieDoubleField. Thanks again, -Jay On Sat, Dec 19, 2009 at 12:21 PM, Jay Hill wrote: > This field is of class type solr.SortableDoubleField. > > I'm actually migra

Re: Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
n Sat, Dec 19, 2009 at 11:37 AM, Yonik Seeley wrote: > On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill wrote: > > One thing that struck me as odd in the output of the stats.jsp page is > that > > the field cache always shows a String type for a field, even if it is not > a > >

Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
I'm on a project where I'm trying to determine the size of the field cache. We're seeing lots of memory problems, and I suspect that the field cache is extremely large, but I'm trying to get exact counts on what's in the field cache. One thing that struck me as odd in the output of the stats.jsp p

Sanity check on numeric types and which of them to use

2009-12-04 Thread Jay Hill
Looking at the example version of schema.xml there seems to be some confusion on which numeric field types are best used in different situations. What confused me was that the type of "int" is now set to a TrieIntField, but with a precisionStep of 0: ' the "tint" type is set up as a TrieIntFiel

Re: nested queries

2009-11-19 Thread Jay Hill
I don't think your queries are actually nested queries. Nested queries key off of the "magic" field name _query_. You're right however that there is very little in the way of documentation or examples of nested queries. If you haven't seen this blog about them yet you might find this a helpful over

Replication admin page auto-reload

2009-11-16 Thread Jay Hill
The replication admin page on slaves used to have an auto-reload set to reload every few seconds. In the official 1.4 release this doesn't seem to be working, but it does in a nightly build from early June. Was this changed on purpose or is this a bug? I looked through CHANGES.txt to see if anythin

Re: Wildcards at the Beginning of a Search.

2009-11-16 Thread Jay Hill
There is a "text_rev" field type in the example schema.xml file in the official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse a field. You can do a copyField from the field you want to use for leading wildcard searches to a field using the text_rev field, and then do a regular
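The idea behind reversing a field can be shown with a toy example: once each term is stored reversed, a leading-wildcard pattern such as "*ing" becomes an ordinary prefix lookup on "gni". The sketch below is a simplified illustration of that idea only; it does not reproduce how ReversedWildcardFilterFactory marks or stores tokens, and all names are hypothetical.

```python
def reverse_token(token):
    """Reverse the characters of a token."""
    return token[::-1]


def build_reversed_index(terms):
    """Map each reversed term back to its original form."""
    return {reverse_token(term): term for term in terms}


def leading_wildcard_search(pattern, reversed_index):
    """Answer a '*suffix' query as a prefix match on the reversed terms."""
    if not pattern.startswith("*"):
        raise ValueError("expected a pattern with a leading '*'")
    prefix = reverse_token(pattern[1:])
    return sorted(
        original
        for reversed_term, original in reversed_index.items()
        if reversed_term.startswith(prefix)
    )


index = build_reversed_index(["running", "jumping", "runner"])
print(leading_wildcard_search("*ing", index))
```

A prefix match can walk a sorted term list from one starting point, which is why the reversed copy makes leading wildcards affordable.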

Re: specify multiple files in for DataImportHandler

2009-11-05 Thread Jay Hill
You can set up multiple request handlers each with their own configuration file. For example, in addition to the config you listed you could add something like this: data-two-config.xml and so on with as many handlers as you need. -Jay http://www.lucidimagination.com On Thu, Nov 5, 2009 a

Re: Sending file to Solr via HTTP POST

2009-11-05 Thread Jay Hill
Here is a brief example of how to use SolrJ with the ExtractingRequestHandler: ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); req.addFile(fileToIndex); req.setParam("literal.id", getId(fileToIndex)); req.setParam("literal

Re: CPU utilization and query time high on Solr slave when snapshot install

2009-11-02 Thread Jay Hill
So assuming you set up a few sample sort queries to run in the firstSearcher config, and had very low query volume during that ten minutes so that there were no evictions before a new Searcher was loaded, would those queries run by the firstSearcher be passed along to the cache for the next Searche

Re: solr web ui

2009-10-30 Thread Jay Hill
Have a look at the VelocityResponseWriter ( http://wiki.apache.org/solr/VelocityResponseWriter). It's in the contrib area, but the wiki has instructions on how to move it into your core Solr. Solr uses response writers to return results. The default is XML but responses can be returned in JSON, Rub

Re: Facets - ORing attribute values

2009-10-29 Thread Jay Hill
1.4 has a good chance of being released next week. There was a hope that it might make it this week, but another bug in Lucene 2.9.1 was found, pushing things back just a little bit longer. -Jay http://www.lucidimagination.com On Thu, Oct 29, 2009 at 11:43 AM, beaviebugeater wrote: > > Do you h

Re: java -Dsolr.solr.home=core -jar start.jar not working for me

2009-10-09 Thread Jay Hill
> 2009-10-09 13:37:05.096::INFO: Started SocketConnector @ 0.0.0.0:8983 > > And http://localhost:8983/solr/admin yields a 404 error. > > On Fri, Oct 9, 2009 at 1:27 PM, Jay Hill wrote: > > Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar > > > > an

Re: java -Dsolr.solr.home=core -jar start.jar not working for me

2009-10-09 Thread Jay Hill
Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar and then hit url: http://localhost:8983/solr/core0/admin/ or http://localhost:8983/solr/core1/admin/ -Jay http://www.lucidimagination.com On Fri, Oct 9, 2009 at 1:17 PM, Jason Rutherglen wrote: > I have a fresh checkout from t

Re: Dynamic Data Import from multiple identical tables

2009-10-09 Thread Jay Hill
You could use separate DIH config files for each of your three tables. This might be overkill, but it would keep them separate. The DIH is not limited to one request handler setup, so you could create a unique handler for each case with a unique name: table1-config.xml

Re: concatenating tokens

2009-10-09 Thread Jay Hill
Use copyField to copy to a field with a field type like this: This works for your example, however I can't be sure if it will work for all of your content, but give it a try and see. -Jay http://www.lucid

Re: DIH: Setting rows= on full-import has no effect

2009-10-09 Thread Jay Hill
//issues.apache.org/jira/browse/SOLR-1501 > > > > On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill wrote: > > > In the past setting rows=n with the full-import command has stopped the > > DIH > > > importing at the number I passed in, but now this doesn't se

DIH: Setting rows= on full-import has no effect

2009-10-08 Thread Jay Hill
In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using: curl ' http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100' But when 100 docs are imported

Re: ISOLatin1AccentFilter before or after Snowball?

2009-10-07 Thread Jay Hill
Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory deprecated in favor of: in 1.4? -Jay http://www.lucidimagination.com On Wed, Oct 7, 2009 at 1:44 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Tue, Oct 6, 2009 at 4:33 PM, Chantal Ackermann < > chantal.

Re: TermsComponent or auto-suggest with filter

2009-10-07 Thread Jay Hill
"Two other approaches are to use either the TermsComponent (new in Solr > > 1.4) or faceting." > > > > On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill wrote: > > > Have a look at a blog I posted on how to use EdgeNGrams to build an > > auto-suggest tool: >

Re: TermsComponent or auto-suggest with filter

2009-10-06 Thread Jay Hill
Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ You could easily add filter queries to this approach. For example, the query used in the blog could add filter

Batching requests using SolrCell with SolrJ

2009-09-19 Thread Jay Hill
When working with SolrJ I have typically batched a Collection of SolrInputDocument objects before sending them to the Solr server. I'm working with the latest nightly build and using the ExtractingRequestHandler to index documents, and everything is working fine. Except I haven't been able to figur

Any way to encrypt/decrypt stored fields?

2009-09-16 Thread Jay Hill
For security reasons (say I'm indexing very sensitive data, medical records for example) is there a way to encrypt data that is stored in Solr? Some businesses I've encountered have such needs and this is a barrier to them adopting Solr to replace other legacy systems. Would it require a custom-wri

Re: KStem download

2009-09-14 Thread Jay Hill
The two jar files are all you should need, and the configuration is correct. However I noticed that you are on Solr 1.3. I haven't tested the Lucid KStemmer on a non-Lucid-certified distribution of 1.3. I have tested it on recent versions of 1.4 and it works fine (just tested with the most recent n

Re: Is it possible to query for "everything" ?

2009-09-14 Thread Jay Hill
With dismax you can use q.alt when the q param is missing: q.alt=*:* should work. -Jay On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco wrote: > Thanks Jay & Matt > > I tried *:* on my app, and it didn't work > > I tried it on the solr admin, and it did > > I checked the solr config file, and

Re: Is it possible to query for "everything" ?

2009-09-14 Thread Jay Hill
Use: ?q=*:* -Jay http://www.lucidimagination.com On Mon, Sep 14, 2009 at 4:18 PM, Jonathan Vanasco wrote: > I'm using Solr for seach and faceted browsing > > Is it possible to have solr search for 'everything' , at least as far as q > is concerned ? > > The request handlers I've found don't li

Re: Highlighting in SolrJ?

2009-09-12 Thread Jay Hill
Will do Shalin. -Jay http://www.lucidimagination.com On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Jay, it would be great if you can add this example to the Solrj wiki: > > http://wiki.apache.org/solr/Solrj > > On Fri, Sep 11,

Re: "standard" requestHandler components

2009-09-11 Thread Jay Hill
RequestHandlers are configured in solrconfig.xml. If no components are explicitly declared in the request handler config then the defaults are used. They are: - QueryComponent - FacetComponent - MoreLikeThisComponent - HighlightComponent - StatsComponent - DebugComponent If you wanted to have a cus

Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
gh lighted, even if the search term only occurs in the > first line of a 300 page field. I'm not sure if mergeContinuous will > do that, or if it will miss everything after the last line that > contains the search term. > > On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill wrote: > >

Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
one line out of the > whole field as a snippet. > > On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill wrote: > > Set up the query like this to highlight a field named "content": > > > >SolrQuery query = new SolrQuery(); > >query.setQuery("foo"); >

Re: Highlighting in SolrJ?

2009-09-10 Thread Jay Hill
Set up the query like this to highlight a field named "content": SolrQuery query = new SolrQuery(); query.setQuery("foo"); query.setHighlight(true).setHighlightSnippets(1); //set other params as needed query.setParam("hl.fl", "content"); QueryResponse queryResponse =getSolrSe

Re: TermsComponent

2009-09-10 Thread Jay Hill
If you need an alternative to using the TermsComponent for auto-suggest, have a look at this blog on using EdgeNGrams instead of the TermsComponent. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ -Jay http://www.lucidimagination.com On Wed, S

Re: Pagination with solr json data

2009-09-10 Thread Jay Hill
All you have to do is use the "start" and "rows" parameters to get the results you want. For example, the query for the first page of results might look like this, ?q=solr&start=0&rows=10 (other params omitted). So you'll start at the beginning (0) and get 10 results. The next page would be ?q=sol
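The paging arithmetic described above can be sketched as a small helper that turns a 1-based page number into "start" and "rows" values. The helper name and its defaults are hypothetical, and it builds only the query string, leaving out the other parameters the message says were omitted.

```python
from urllib.parse import urlencode


def page_query(q, page, rows=10):
    """Build a query string for a 1-based page number using start and rows."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * rows  # offset of the first result on this page
    return "?" + urlencode({"q": q, "start": start, "rows": rows})


print(page_query("solr", 1))
print(page_query("solr", 3))
```

Each page simply advances "start" by "rows", so page 1 begins at 0 as in the message and later pages begin at multiples of the page size.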

Re: Sort a Multivalue field

2009-09-09 Thread Jay Hill
Unfortunately you can't sort on a multi-valued field. In order to sort on a field it must be indexed but not multi-valued. Have a look at the FieldOptions wiki page for a good description of what values to set for different use cases: http://wiki.apache.org/solr/FieldOptionsByUseCase -Jay www.luc

Re: Field names with whitespaces

2009-08-31 Thread Jay Hill
This seems to work: ?q=field\ name:something Probably not a good idea to have field names with whitespace though. -Jay 2009/8/28 Marcin Kuptel > Hi, > > Is there a way to query solr about fields which names contain whitespaces? > Indexing such data does not cause any problems but I have been

Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-09 Thread Jay Hill
8, 2009, at 10:42 AM, Ken Krugler wrote: > > >> On Aug 7, 2009, at 5:23pm, Jay Hill wrote: >> >> I'm using the MoreLikeThisHandler with a content stream to get documents >>> from my index that match content from an html page like this: >>> >>> http:

MoreLikeThis: How to get quality terms from html from content stream?

2009-08-07 Thread Jay Hill
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not su

Re: DIH: Any way to make update on db table?

2009-08-04 Thread Jay Hill
updates. > > > Writing a database procedure might be a good idea. In that case your > > query > > > will simply be > .../>. > > > All the heavy lifting can be done by this query. > > > > > > Moreover, update queries, only return the

DIH: Any way to make update on db table?

2009-08-03 Thread Jay Hill
Is it possible for the DataImportHandler to update records in the table it is querying? For example, say I have a query like this in my entity: query="select field1, field2, from someTable where hasBeenIndexed=false" Is there a way I can mark each record processed by updating the hasBeenIndexed f

Re: How can i get lucene index format version information?

2009-07-30 Thread Jay Hill
Check the system request handler: http://localhost:8983/solr/admin/system Should look something like this: 1.3.0.2009.07.28.10.39.42 1.4-dev 797693M - jayhill - 2009-07-28 10:39:42 2.9-dev 2.9-dev 794238 - 2009-07-15 18:05:08 -Jay On Thu, Jul 30, 2009 at 10:32 AM, Walter Underwood wrote: > I

FieldCollapsing: Two response elements returned?

2009-07-27 Thread Jay Hill
I'm doing some testing with field collapsing, and early results look good. One thing seems odd to me however. I would expect to get back one block of results, but I get two - the first one contains the collapsed results, the second one contains the full non-collapsed results: ... ... This see

Re: DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
Actually, "my good" after all. The parameter does not take effect. If commit=false is passed in a commit still happens. Will open a JIRA and supply a patch shortly. -Jay On Wed, Jul 15, 2009 at 5:50 PM, Jay Hill wrote: > My bad, I had a configuration setting overriding this

Re: DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
My bad, I had a configuration setting overriding this value. Sorry for the mistake. -Jay On Wed, Jul 15, 2009 at 12:07 PM, Jay Hill wrote: > I am trying to run full and delta imports with the commit=false option, but > it doesn't seem to take effect - after the import a commit alw

Re: spellcheck with misspelled words in index

2009-07-15 Thread Jay Hill
We had the same thing to deal with recently, and a great solution was posted to the list. Create a stopwords filter on the field you're using for your spell checking, and then populate a custom stopwords file with known misspelled words: Y

DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
I am trying to run full and delta imports with the commit=false option, but it doesn't seem to take effect - after the import a commit always happens no matter what params I send. I've looked at the source and unless I'm missing something it doesn't seem to process the commit param. Here's the url

Spell checking: Is there a way to exclude words known to be wrong?

2009-07-13 Thread Jay Hill
We're building a spell index from a field in our main index with the following configuration: textSpell default spell ./spellchecker true This works great and re-builds the spelling index on commits as expected. However, we know there are misspellings in

Re: Creating DataSource for DIH to Oracle Database

2009-07-09 Thread Jay Hill
Francis, your question is a little vague. Are you looking for the configuration for connecting the DIH to a JNDI datasource set up in Weblogic? -Jay On Mon, Jul 6, 2009 at 2:41 PM, Francis Yakin wrote: > > Have any one had experience creating a datasource for DIH to an Oracle > Database?

Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Jay Hill
I haven't tried this myself, but it sounds like what you're looking for is enabling remote streaming: http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf As the link above shows you should be able to enable remote streaming like this: and then something like t

Re: about defaultSearchField

2009-07-08 Thread Jay Hill
Just to be sure: You mentioned that you "adjusted" schema.xml - did you re-index after making your changes? -Jay On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin wrote: > Thanks for your reply. But it works not. > > Yang > > 2009/7/8 Yao Ge > > > > > Try with fl=* or fl=*,score added to your request

Re: Indexing XML

2009-07-07 Thread Jay Hill
Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to index different types of datasources including relational databases and XML files. In particular have a look at the XpathEntityProcessor ( http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d9

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
Thanks Fergus, setting the field to multivalued did work: gets all the elements as multivalue fields in the body field. The only thing is, the body field is used by some other content sources, so I have to look at the implications setting it to multi-valued will have on the other data sour

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
I'm on the trunk, built on July 2: 1.4-dev 789506 Thanks, -Jay On Thu, Jul 2, 2009 at 11:33 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Thu, Jul 2, 2009 at 11:38 PM, Mark Miller > wrote: > > > Shalin Shekhar Mangar wrote: > > > >> > >> It selects all matching nodes. But if t

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
dy/chapter//p) doesn't seem to be > >supported. > > > >Thanks, > >-Jay > > > > > >2009/7/1 Noble Paul > > > >> complete xpath is not supported > >> > >> /book/body/chapter/p > >> > >> s

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
under irrespective of nesting , tag > names use this > > > > > > > > On Thu, Jul 2, 2009 at 5:31 AM, Jay Hill wrote: > > I'm using the XPathEntityProcessor to parse an xml structure that looks > like > > this: > > > > >

DIH: Limited xpath syntax unable to parse all xml elements

2009-07-01 Thread Jay Hill
I'm using the XPathEntityProcessor to parse an xml structure that looks like this: Joe Smith World Atlas Content I want is here More content I want is here. Still more content here.>/p> The author and title parse out fine:

DIH: Distributing docs to more than one Solr instance

2009-07-01 Thread Jay Hill
I'm using the DIH to index records from a relational database. No problems, everything works great. But now, due to the size of index (70GB w/ 25M+ docs) I need to shard and want the DIH to distribute documents evenly between two shards. Current approach is to modify the sql query in the config fil

PlainTextEntitiyProcessor not putting any text into a field in index

2009-06-18 Thread Jay Hill
I'm having some trouble getting the PlainTextEntityProcessor to populate a field in an index. I'm using the TemplateTransformer to fill 2 fields, and have a timestamp field in schema.xml, and these fields make it into the index. Only the plaintText data is missing. Here is my configuration:

Re: Query faceting

2009-06-08 Thread Jay Hill
In order to get the values you want for the service field you will need to change the fieldType definition in schema.xml for "service" to use something that doesn't alter your original values. Try the "string" fieldType to start and look at the fieldType definition for "string". I'm guessing yo
