Managed schema used with Cloudera MapreduceIndexerTool and morphlines?

2017-03-17 Thread Jay Hill
I've got a very difficult project to tackle. I've been tasked with using schemaless mode to index json files that we receive. The structure of the json files will always be very different as we're receiving files from different customers totally unrelated to one another. We are attempting to build

Re: Very long running replication.

2014-02-27 Thread Jay Hill
Bumping this. I'm seeing the error mentioned earlier in the thread - "Unable to download completely. Downloaded 0!=" often in my logs. I'm dealing with a situation where maxDoc count is growing at a faster rate than numDocs and is now almost twice as large. I'm not optimizing but rather relying o
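
For context on the two counters mentioned above: maxDoc includes documents that have been deleted but not yet merged away, while numDocs counts only live documents, so the gap between them is the number of deleted documents still on disk. A minimal Python sketch of that arithmetic follows; the function name and the numbers are made up for illustration.

```python
def deleted_doc_stats(num_docs, max_doc):
    """Return (deleted_count, deleted_fraction) for an index.

    num_docs: live documents (numDocs)
    max_doc:  live + deleted-but-not-yet-merged documents (maxDoc)
    """
    if max_doc < num_docs:
        raise ValueError("maxDoc cannot be smaller than numDocs")
    deleted = max_doc - num_docs
    fraction = deleted / max_doc if max_doc else 0.0
    return deleted, fraction


# Illustrative numbers only: maxDoc almost twice numDocs.
deleted, fraction = deleted_doc_stats(num_docs=1_000_000, max_doc=1_900_000)
print(deleted, round(fraction, 3))
```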

/no_coord in dismax scoring explain

2012-01-06 Thread Jay Hill
What does "/no_coord" mean in the dismax scoring output? I've looked through the wiki, mail archives, and lucidfind, and can't find any reference. -- ¡jah!

Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
I'm on a project where we have 1B docs sharded across 20 servers. We're not in production yet and we're doing load tests now. We're sending load to hit 100qps per server. As the load increases we're seeing query times sporadically increasing to 10 seconds, 20 seconds, etc. at times. What we're tryi

Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
We're on the trunk: 4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47 Client timeouts are set to 4 seconds. Thanks, -Jay On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller wrote: > > On Jan 26, 2012, at 1:28 PM, Jay Hill wrote: > > > > > I've tried sett

Re: Shard timeouts on large (1B docs) Solr cluster

2012-01-26 Thread Jay Hill
ibuted search, meaning if a response wasn't received w/in the timeAllowed, and if partialResults is true, then that shard would not be waited on for results. is that correct? thanks, -jay On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill wrote: > We're on the trunk: > 4.0-2011-10-26_08-

SolrJ: Setting multiple parameters

2010-06-20 Thread Jay Hill
Working with SolrJ I'm doing a query using the StatsComponent, and the stats.facet parameter. I'm not able to set multiple fields for the "stats.facet" parameter using SolrJ. Here is the query I'm trying to create: http://localhost:8983/solr/select/?q=*:*&stats=on&stats.field=fieldForStats&stats.f
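
At the HTTP level, what the question above asks for is a request in which the stats.facet parameter appears more than once. A minimal sketch in Python (standard library only) of building such a URL; the facet field names and the helper name are hypothetical, and this shows only the shape of the request, not the SolrJ calls.

```python
from urllib.parse import urlencode


def stats_query_url(base_url, stats_field, facet_fields):
    """Build a StatsComponent request with one stats.facet entry per field."""
    params = [("q", "*:*"), ("stats", "on"), ("stats.field", stats_field)]
    # A list of pairs (rather than a dict) lets the same key repeat.
    params += [("stats.facet", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


url = stats_query_url(
    "http://localhost:8983/solr/select",
    "fieldForStats",
    ["facetField1", "facetField2"],  # hypothetical field names
)
print(url)
```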

Creating new Solr cores using relative paths

2010-08-16 Thread Jay Hill
I'm having trouble getting the core CREATE command to work with relative paths in the solr.xml configuration. I'm working with a layout like this: /opt/solr [this is solr.solr.home: $SOLR_HOME] /opt/solr/solr.xml /opt/solr/core0/ [this is the "template" core] /opt/solr/core0/conf/schema.xml [etc.]

Re: OutOfMemoryErrors

2010-08-17 Thread Jay Hill
A merge factor of 100 is very high and out of the norm. Try starting with a value of 10. I've never seen a running system with a value anywhere near this high. Also, what is your setting for ramBufferSizeMB? -Jay On Tue, Aug 17, 2010 at 10:46 AM, rajini maski wrote: > yeah sorry I forgot to men

Re: Tuning Solr

2010-10-05 Thread Jay Hill
Removing those components is not likely to impact performance very much, if at all. I would focus on other areas when tuning performance, such as looking at memory usage and configuration, query design, etc. But there isn't any harm in removing them either. Why not do some load tests with the componen

Complex query, need filtering after query not before

2012-01-27 Thread Jay Hill
I have a project where we need to search 1B docs and still have results < 700ms. The problem is, we are using geofiltering and that is happening *before* the queries, so we have to geofilter on the 1B docs to restrict our set of docs first, and then do the query on a name field. But it seems that

TermsComponent show only terms that matched query?

2012-02-24 Thread Jay Hill
I have a situation where I want to show the term counts as is done in the TermsComponent, but *only* for terms that are *matched* in a query, so I get something returned like this (pseudo code): q=title:(golf swing) title: golf legends show how to improve your golf swing on the golf course ...ot

Re: TermsComponent show only terms that matched query?

2012-02-27 Thread Jay Hill
> > Best > Erick > > On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill wrote: > > I have a situation where I want to show the term counts as is done in the > > TermsComponent, but *only* for terms that are *matched* in a query, so I > > get something returned like this (p

Re: Solr Single Core vs Multiple Cores installation for localization

2012-05-21 Thread Jay Hill
Usually I would recommend trying to index all languages into one Solr core. The determining factor for me is how much "overlap" there is in fields for each language, i.e. how many common fields for each language. For example if you have 60 common fields to all languages, but only 8 fields that are

Re: phrase, individual term, prefix, fuzzy and stemming search

2011-02-04 Thread Jay Hill
You mentioned that dismax does not support wildcards, but edismax does. Not sure if dismax would have solved your other problems, or whether you just had to shift gears because of the wildcard issue, but you might want to have a look at edismax. -Jay http://www.lucidimagination.com On Mon, Jan 3

Re: WordDelimiterFilterFactory

2011-02-04 Thread Jay Hill
You can always try something like this out in the analysis.jsp page, accessible from the Solr Admin home. Check out that page and see how it allows you to enter text to represent what was indexed, and text for a query. You can then see if there are matches. Very handy to see how the various filters

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-13 Thread Jay Hill
As Hoss mentioned earlier in the thread, you can use the statistics page from the admin console to view the current number of segments. But if you want to know by looking at the files, each segment will have a unique prefix, such as "_u". There will be one unique prefix for every segment in the ind
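
The file-based check described above can be sketched in a few lines of Python: collect the distinct "_xx" prefixes from the file names in the index directory and count them. The file names below are made up, and the pattern is an assumption about typical Lucene segment file naming rather than an exhaustive rule.

```python
import re

# A segment file starts with "_", then a short alphanumeric name,
# followed by "." (extension) or "_" (generation suffix).
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def segment_prefixes(file_names):
    """Return the set of distinct segment prefixes found in file_names."""
    prefixes = set()
    for name in file_names:
        match = SEGMENT_FILE.match(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


files = ["_u.fdt", "_u.fdx", "_u_1.del", "_v.cfs", "segments_2", "segments.gen"]
print(len(segment_prefixes(files)))
```

On a real index the list of names could come from os.listdir() on the index directory.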

Re: Understanding the DisMax tie parameter

2011-04-14 Thread Jay Hill
Dismax works by first selecting the highest scoring sub-query of all the sub-queries that were run. If I want to search on three fields, manu, name and features, I can configure dismax like this: dismax * 0.0* manu name features *:* Now I'll use this query: http
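
The role of the tie parameter can be sketched numerically: the best sub-query score is kept in full, and the scores of the other matching sub-queries are added after being multiplied by tie. A minimal Python sketch with made-up scores; the function name is hypothetical.

```python
def dismax_score(sub_scores, tie=0.0):
    """Combine per-field sub-query scores the way a disjunction-max query does.

    tie=0.0 -> only the best field counts
    tie=1.0 -> all field scores are simply summed
    """
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return best + tie * others


scores = [0.5, 0.25, 0.25]  # e.g. matches in manu, name, features
for tie in (0.0, 0.5, 1.0):
    print(tie, dismax_score(scores, tie))
```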

Re: Understanding the DisMax tie parameter

2011-04-15 Thread Jay Hill
Looks good, thanks Tom. -Jay On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom wrote: > Thanks everyone. > > I updated the wiki. If you have a chance please take a look and check to > make sure I got it right on the wiki. > > http://wiki.apache.org/solr/DisMaxQParserPlugin#tie_.28Tie_breaker.2

Re: Multiple Tags and Facets

2011-04-21 Thread Jay Hill
I don't think I understand what you're trying to do. Are you trying to preserve all facets after a user clicks on a facet, and thereby triggers a filter query, which excludes the other facets? If that's the case, you can use local parameters to tag the filter queries so they are not used for the fa
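
The local-parameter technique mentioned above pairs a tagged filter query with a facet field that excludes that tag, so the facet counts ignore the filter the user just clicked. A minimal sketch in Python of the request parameters; the field name, value, and tag name are hypothetical.

```python
from urllib.parse import urlencode


def multi_select_facet_params(query, field, selected_value, tag="sel"):
    """Return request parameters that filter on a facet value while still
    computing counts for that facet as if the filter were absent."""
    return [
        ("q", query),
        ("facet", "true"),
        # Tag the filter query so it can be referred to by name.
        ("fq", "{!tag=%s}%s:%s" % (tag, field, selected_value)),
        # Exclude the tagged filter when counting this facet field.
        ("facet.field", "{!ex=%s}%s" % (tag, field)),
    ]


params = multi_select_facet_params("*:*", "doctype", "pdf")
print(urlencode(params))
```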

Scaling Search with Big Data/Hadoop and Solr now available at Lucene Revolution

2011-04-25 Thread Jay Hill
I've worked with a lot of different Solr implementations, and one area that is emerging more and more is using Solr in combination with other "big data" solutions. My company, Lucid Imagination, has added a two-day course to our upcoming Lucene Revolution conference, "Scaling Search with Big Data a

Re: facet search and UnInverted multi-valued field?

2011-05-03 Thread Jay Hill
UnInvertedField is similar to Lucene's FieldCache, except, while the FieldCache cannot work with multivalued fields, UnInvertedField is designed for that very purpose. So since your f_dcperson field is multivalued, by default you use UnInvertedField. You're not doing anything wrong, that's default

Loading custom update request handler on startup

2012-07-09 Thread Jay Hill
I'm writing a custom update request handler that will poll a "hot" directory for Solr xml files and index anything it finds there. The custom class implements Runnable, and when the run method is called the loop starts to do the polling. How can I tell Solr to load this class on startup to fire off

Re: Loading custom update request handler on startup

2012-07-09 Thread Jay Hill
tup up the thread for my polling UpdateRequestHandler. This seems to work, but if anyone has a better (or more tested) approach please let us know. -Jay On Mon, Jul 9, 2012 at 2:33 PM, Jay Hill wrote: > I'm writing a custom update request handler that will poll a "hot" > d

FieldCollapsing: Two response elements returned?

2009-07-27 Thread Jay Hill
I'm doing some testing with field collapsing, and early results look good. One thing seems odd to me however. I would expect to get back one block of results, but I get two - the first one contains the collapsed results, the second one contains the full non-collapsed results: ... ... This see

Re: How can i get lucene index format version information?

2009-07-30 Thread Jay Hill
Check the system request handler: http://localhost:8983/solr/admin/system Should look something like this: 1.3.0.2009.07.28.10.39.42 1.4-dev 797693M - jayhill - 2009-07-28 10:39:42 2.9-dev 2.9-dev 794238 - 2009-07-15 18:05:08 -Jay On Thu, Jul 30, 2009 at 10:32 AM, Walter Underwood wrote: > I

DIH: Any way to make update on db table?

2009-08-03 Thread Jay Hill
Is it possible for the DataImportHandler to update records in the table it is querying? For example, say I have a query like this in my entity: query="select field1, field2, from someTable where hasBeenIndexed=false" Is there a way I can mark each record processed by updating the hasBeenIndexed f

Re: DIH: Any way to make update on db table?

2009-08-04 Thread Jay Hill
updates. > > > Writing a database procedure might be a good idea. In that case your > > query > > > will simply be > .../>. > > > All the heavy lifting can be done by this query. > > > > > > Moreover, update queries, only return the

MoreLikeThis: How to get quality terms from html from content stream?

2009-08-07 Thread Jay Hill
I'm using the MoreLikeThisHandler with a content stream to get documents from my index that match content from an html page like this: http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTL&mlt.fl=body&rows=4&debugQuery=true But, not su

Re: MoreLikeThis: How to get quality terms from html from content stream?

2009-08-09 Thread Jay Hill
8, 2009, at 10:42 AM, Ken Krugler wrote: > > >> On Aug 7, 2009, at 5:23pm, Jay Hill wrote: >> >> I'm using the MoreLikeThisHandler with a content stream to get documents >>> from my index that match content from an html page like this: >>> >>> http:

Re: Field names with whitespaces

2009-08-31 Thread Jay Hill
This seems to work: ?q=field\ name:something Probably not a good idea to have field names with whitespace though. -Jay 2009/8/28 Marcin Kuptel > Hi, > > Is there a way to query solr about fields which names contain whitespaces? > Indexing such data does not cause any problems but I have been
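
The backslash escaping shown above can be applied programmatically before the query string is sent. A small Python sketch; it handles only spaces and backslashes in the field name, and the helper names are hypothetical.

```python
def escape_field_name(field_name):
    """Backslash-escape spaces (and existing backslashes) in a field name."""
    return field_name.replace("\\", "\\\\").replace(" ", "\\ ")


def field_query(field_name, value):
    """Build a simple field:value query clause."""
    return escape_field_name(field_name) + ":" + value


print(field_query("field name", "something"))
```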

Re: Sort a Multivalue field

2009-09-09 Thread Jay Hill
Unfortunately you can't sort on a multi-valued field. In order to sort on a field it must be indexed but not multi-valued. Have a look at the FieldOptions wiki page for a good description of what values to set for different use cases: http://wiki.apache.org/solr/FieldOptionsByUseCase -Jay www.luc

Re: Pagination with solr json data

2009-09-10 Thread Jay Hill
All you have to do is use the "start" and "rows" parameters to get the results you want. For example, the query for the first page of results might look like this, ?q=solr&start=0&rows=10 (other params omitted). So you'll start at the beginning (0) and get 10 results. The next page would be ?q=sol
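
The paging arithmetic described above reduces to start = (page - 1) * rows. A minimal Python sketch; the helper name is hypothetical and pages are numbered from 1.

```python
def page_params(query, page, rows=10):
    """Return the q/start/rows parameters for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"q": query, "start": (page - 1) * rows, "rows": rows}


print(page_params("solr", 1))
print(page_params("solr", 2))
```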

Re: TermsComponent

2009-09-10 Thread Jay Hill
If you need an alternative to using the TermsComponent for auto-suggest, have a look at this blog on using EdgeNGrams instead of the TermsComponent. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ -Jay http://www.lucidimagination.com On Wed, S

Re: Highlighting in SolrJ?

2009-09-10 Thread Jay Hill
Set up the query like this to highlight a field named "content": SolrQuery query = new SolrQuery(); query.setQuery("foo"); query.setHighlight(true).setHighlightSnippets(1); //set other params as needed query.setParam("hl.fl", "content"); QueryResponse queryResponse =getSolrSe
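
At the HTTP level, the SolrJ setup above corresponds to a handful of plain request parameters (hl, hl.snippets, hl.fl). A sketch in Python, standard library only; the base URL and helper name are hypothetical.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, field, snippets=1):
    """Build a select URL that asks for highlighting on one field."""
    params = [
        ("q", query),
        ("hl", "true"),                  # turn highlighting on
        ("hl.snippets", str(snippets)),  # snippets per field
        ("hl.fl", field),                # field(s) to highlight
    ]
    return base_url + "?" + urlencode(params)


print(highlight_query_url("http://localhost:8983/solr/select", "foo", "content"))
```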

Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
one line out of the > whole field as a snippet. > > On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill wrote: > > Set up the query like this to highlight a field named "content": > > > >SolrQuery query = new SolrQuery(); > >query.setQuery("foo"); >

Re: Highlighting in SolrJ?

2009-09-11 Thread Jay Hill
gh lighted, even if the search term only occurs in the > first line of a 300 page field. I'm not sure if mergeContinuous will > do that, or if it will miss everything after the last line that > contains the search term. > > On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill wrote: > &g

Re: "standard" requestHandler components

2009-09-11 Thread Jay Hill
RequestHandlers are configured in solrconfig.xml. If no components are explicitly declared in the request handler config, then the defaults are used. They are: - QueryComponent - FacetComponent - MoreLikeThisComponent - HighlightComponent - StatsComponent - DebugComponent If you wanted to have a cus

Re: Highlighting in SolrJ?

2009-09-12 Thread Jay Hill
Will do Shalin. -Jay http://www.lucidimagination.com On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Jay, it would be great if you can add this example to the Solrj wiki: > > http://wiki.apache.org/solr/Solrj > > On Fri, Sep 11,

Re: Is it possible to query for "everything" ?

2009-09-14 Thread Jay Hill
Use: ?q=*:* -Jay http://www.lucidimagination.com On Mon, Sep 14, 2009 at 4:18 PM, Jonathan Vanasco wrote: > I'm using Solr for search and faceted browsing > > Is it possible to have solr search for 'everything' , at least as far as q > is concerned ? > > The request handlers I've found don't li

Re: Is it possible to query for "everything" ?

2009-09-14 Thread Jay Hill
With dismax you can use q.alt when the q param is missing: q.alt=*:* should work. -Jay On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco wrote: > Thanks Jay & Matt > > I tried *:* on my app, and it didn't work > > I tried it on the solr admin, and it did > > I checked the solr config file, and
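
A client can rely on this fallback by always sending q.alt and only adding q when the user actually typed something. A minimal Python sketch; the helper name is hypothetical, and selecting dismax through the defType parameter is an assumption (a handler preconfigured for dismax would work the same way).

```python
def dismax_params(user_query):
    """Return dismax request parameters, matching all docs when q is empty."""
    params = {"defType": "dismax", "q.alt": "*:*"}
    if user_query and user_query.strip():
        params["q"] = user_query.strip()
    return params


print(dismax_params(""))
print(dismax_params("ipod"))
```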

Re: KStem download

2009-09-14 Thread Jay Hill
The two jar files are all you should need, and the configuration is correct. However I noticed that you are on Solr 1.3. I haven't tested the Lucid KStemmer on a non-Lucid-certified distribution of 1.3. I have tested it on recent versions of 1.4 and it works fine (just tested with the most recent n

Any way to encrypt/decrypt stored fields?

2009-09-16 Thread Jay Hill
For security reasons (say I'm indexing very sensitive data, medical records for example) is there a way to encrypt data that is stored in Solr? Some businesses I've encountered have such needs and this is a barrier to them adopting Solr to replace other legacy systems. Would it require a custom-wri

Batching requests using SolrCell with SolrJ

2009-09-19 Thread Jay Hill
When working with SolrJ I have typically batched a Collection of SolrInputDocument objects before sending them to the Solr server. I'm working with the latest nightly build and using the ExtractingRequestHandler to index documents, and everything is working fine. Except I haven't been able to figur

Re: TermsComponent or auto-suggest with filter

2009-10-06 Thread Jay Hill
Have a look at a blog I posted on how to use EdgeNGrams to build an auto-suggest tool: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ You could easily add filter queries to this approach. For example, the query used in the blog could add filter
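
To make the EdgeNGram idea concrete: at index time each term is expanded into its leading prefixes, so a partially typed word matches as an ordinary term. A toy Python sketch of that expansion; it is not the Solr filter itself, and the default gram sizes are assumptions.

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Return the leading-edge n-grams of term, shortest first."""
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("solr"))
```

A user who has typed "so" then matches any indexed phrase containing a term whose gram list includes "so".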

Re: TermsComponent or auto-suggest with filter

2009-10-07 Thread Jay Hill
"Two other approaches are to use either the TermsComponent (new in Solr > > 1.4) or faceting." > > > > On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill wrote: > > > Have a look at a blog I posted on how to use EdgeNGrams to build an > > auto-suggest tool: >

Re: ISOLatin1AccentFilter before or after Snowball?

2009-10-07 Thread Jay Hill
Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory deprecated in favor of ASCIIFoldingFilterFactory in 1.4? -Jay http://www.lucidimagination.com On Wed, Oct 7, 2009 at 1:44 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Tue, Oct 6, 2009 at 4:33 PM, Chantal Ackermann < > chantal.

DIH: Setting rows= on full-import has no effect

2009-10-08 Thread Jay Hill
In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using: curl ' http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100' But when 100 docs are imported

Re: DIH: Setting rows= on full-import has no effect

2009-10-09 Thread Jay Hill
//issues.apache.org/jira/browse/SOLR-1501 > > > > On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill wrote: > > > In the past setting rows=n with the full-import command has stopped the > > DIH > > > importing at the number I passed in, but now this doesn't se

Re: concatenating tokens

2009-10-09 Thread Jay Hill
Use copyField to copy to a field with a field type like this: This works for your example, however I can't be sure if it will work for all of your content, but give it a try and see. -Jay http://www.lucid

Re: Dynamic Data Import from multiple identical tables

2009-10-09 Thread Jay Hill
You could use separate DIH config files for each of your three tables. This might be overkill, but it would keep them separate. The DIH is not limited to one request handler setup, so you could create a unique handler for each case with a unique name: table1-config.xml

Re: java -Dsolr.solr.home=core -jar start.jar not working for me

2009-10-09 Thread Jay Hill
Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar and then hit url: http://localhost:8983/solr/core0/admin/ or http://localhost:8983/solr/core1/admin/ -Jay http://www.lucidimagination.com On Fri, Oct 9, 2009 at 1:17 PM, Jason Rutherglen wrote: > I have a fresh checkout from t

Re: java -Dsolr.solr.home=core -jar start.jar not working for me

2009-10-09 Thread Jay Hill
> 2009-10-09 13:37:05.096::INFO: Started SocketConnector @ 0.0.0.0:8983 > > And http://localhost:8983/solr/admin yields a 404 error. > > On Fri, Oct 9, 2009 at 1:27 PM, Jay Hill wrote: > > Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar > > > > an

Re: Facets - ORing attribute values

2009-10-29 Thread Jay Hill
1.4 has a good chance of being released next week. There was a hope that it might make it this week, but another bug in Lucene 2.9.1 was found, pushing things back just a little bit longer. -Jay http://www.lucidimagination.com On Thu, Oct 29, 2009 at 11:43 AM, beaviebugeater wrote: > > Do you h

Re: solr web ui

2009-10-30 Thread Jay Hill
Have a look at the VelocityResponseWriter ( http://wiki.apache.org/solr/VelocityResponseWriter). It's in the contrib area, but the wiki has instructions on how to move it into your core Solr. Solr uses response writers to return results. The default is XML but responses can be returned in JSON, Rub

Re: CPU utilization and query time high on Solr slave when snapshot install

2009-11-02 Thread Jay Hill
So assuming you set up a few sample sort queries to run in the firstSearcher config, and had very low query volume during that ten minutes so that there were no evictions before a new Searcher was loaded, would those queries run by the firstSearcher be passed along to the cache for the next Searche

Re: Sending file to Solr via HTTP POST

2009-11-05 Thread Jay Hill
Here is a brief example of how to use SolrJ with the ExtractingRequestHandler: ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); req.addFile(fileToIndex); req.setParam("literal.id", getId(fileToIndex)); req.setParam("literal

Re: specify multiple files in for DataImportHandler

2009-11-05 Thread Jay Hill
You can set up multiple request handlers each with their own configuration file. For example, in addition to the config you listed you could add something like this: data-two-config.xml and so on with as many handlers as you need. -Jay http://www.lucidimagination.com On Thu, Nov 5, 2009 a

Re: Wildcards at the Beginning of a Search.

2009-11-16 Thread Jay Hill
There is a "text_rev" field type in the example schema.xml file in the official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse a field. You can do a copyField from the field you want to use for leading wildcard searches to a field using the text_rev field, and then do a regular
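
The reason reversal helps can be shown with a toy model: if tokens are stored reversed, a leading-wildcard query such as *ing becomes a cheap prefix query on the reversed text. The Python sketch below illustrates only that idea; it is a simplification and does not model the real filter's internals.

```python
def reverse_token(token):
    """Reverse a token, as done for the copy stored in the reversed field."""
    return token[::-1]


def rewrite_leading_wildcard(query_term):
    """Turn '*ing' into 'gni*'; leave other terms untouched."""
    if query_term.startswith("*") and not query_term.endswith("*"):
        return reverse_token(query_term[1:]) + "*"
    return query_term


def prefix_matches(prefix_query, indexed_token):
    """Check a trailing-wildcard query against one indexed token."""
    return indexed_token.startswith(prefix_query.rstrip("*"))


rewritten = rewrite_leading_wildcard("*ing")
print(rewritten, prefix_matches(rewritten, reverse_token("running")))
```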

Replication admin page auto-reload

2009-11-16 Thread Jay Hill
The replication admin page on slaves used to have an auto-reload set to reload every few seconds. In the official 1.4 release this doesn't seem to be working, but it does in a nightly build from early June. Was this changed on purpose or is this a bug? I looked through CHANGES.txt to see if anythin

Re: nested queries

2009-11-19 Thread Jay Hill
I don't think your queries are actually nested queries. Nested queries key off of the "magic" field name _query_. You're right however that there is very little in the way of documentation or examples of nested queries. If you haven't seen this blog about them yet you might find this a helpful over
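
For illustration, a nested query clause is the magic field name _query_ followed by a quoted sub-query, usually with local parameters selecting the parser. A Python sketch that assembles such a clause; the parser settings and field names are hypothetical examples.

```python
def nested_query(parser, local_params, text):
    """Build a _query_:"{!parser k=v}text" clause with quotes escaped."""
    locals_str = " ".join("%s=%s" % (k, v) for k, v in local_params.items())
    head = "{!%s %s}" % (parser, locals_str) if locals_str else "{!%s}" % parser
    # The sub-query sits inside double quotes, so escape \ and " in it.
    escaped = (head + text).replace("\\", "\\\\").replace('"', '\\"')
    return '_query_:"%s"' % escaped


clause = nested_query("dismax", {"qf": "name"}, "ipod")
print("inStock:true AND " + clause)
```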

Sanity check on numeric types and which of them to use

2009-12-04 Thread Jay Hill
Looking at the example version of schema.xml there seems to be some confusion on which numeric field types are best used in different situations. What confused me was that the type of "int" is now set to a TrieIntField, but with a precisionStep of 0: ' the "tint" type is set up as a TrieIntFiel

Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
I'm on a project where I'm trying to determine the size of the field cache. We're seeing lots of memory problems, and I suspect that the field cache is extremely large, but I'm trying to get exact counts on what's in the field cache. One thing that struck me as odd in the output of the stats.jsp p

Re: Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
n Sat, Dec 19, 2009 at 11:37 AM, Yonik Seeley wrote: > On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill wrote: > > One thing that struck me as odd in the output of the stats.jsp page is > that > > the field cache always shows a String type for a field, even if it is not > a > >

Re: Sort fields all look Strings in field cache, no matter schema type

2009-12-19 Thread Jay Hill
Oh, forgot to add (just to keep the thread complete), the field is being used for a sort, so it was able to use TrieDoubleField. Thanks again, -Jay On Sat, Dec 19, 2009 at 12:21 PM, Jay Hill wrote: > This field is of class type solr.SortableDoubleField. > > I'm actually migra

Re: Phrase matching on a text field

2009-05-07 Thread Jay Hill
The string fieldtype is not being tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter, which doesn't happen with the string field type (no tokenizing). Have a look at the schema.xml in the example dir and look at the default configuration f
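
A toy model of the difference, in Python: a text-style analyzer splits the value into tokens and drops stop words, while a string-style field keeps the whole value as a single token. The stop word list and function names are made up and far simpler than a real analysis chain.

```python
STOP_WORDS = {"for", "the", "a", "of"}  # illustrative list only


def analyze_text(value):
    """Tokenize on whitespace, lowercase, and drop stop words."""
    return [t for t in value.lower().split() if t not in STOP_WORDS]


def analyze_string(value):
    """No analysis: the whole value is indexed as one token."""
    return [value]


print(analyze_text("Search for Solr"))
print(analyze_string("Search for Solr"))
```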

Re: French and SpellingQueryConverter

2009-05-07 Thread Jay Hill
It seems to me that this is just the expected behavior of the FrenchAnalyzer using the FrenchStemmer. I'm not familiar with the French language, but in English words like running, runner, and runs are all stemmed down to "run" as intended. I don't know what other words in French would stem down to

Re: Solr Logging issue

2009-05-12 Thread Jay Hill
Usually that means there is another log4j.properties or log4j.xml file in your classpath that is being found before the one you are intending to use. Check your classpath for other versions of these files. -Jay On Tue, May 12, 2009 at 3:38 AM, Sagar Khetkade wrote: > > Hi, > I have solr impleme

Re: Selective Searches Based on User Identity

2009-05-12 Thread Jay Hill
The only downside would be that you would have to update a document anytime a user was granted or denied access. You would have to query before the update to get the current values for grantedUID and deniedUID, remove/add values, and update the index. If you don't have a lot of changes in the syste

Re: master/slave failure scenario

2009-05-13 Thread Jay Hill
- Migrate configuration files from old master (or backup) to new master. - Replicate from a slave to the new master. - Resume indexing to new master. -Jay On Wed, May 13, 2009 at 4:26 AM, nk 11 wrote: > Nice. > What if the master fails permanently (like a disk crash...) and the new > master is

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread Jay Hill
If that is your complete input file then it looks like you are missing the wrapping element: F8V7067-APL-KIT > field> > Belkin Mobile Power Cord for iPod w/ Dock > Belkin > electronics > connector > car power adapter, white > 4 > 19.95 > 1 > false > Is it possible you just forgot

Re: what does the version parameter in the query mean?

2009-05-21 Thread Jay Hill
I was interested in this recently and also couldn't find anything on the wiki. I found this in the list archive: The version parameter determines the XML protocol used in the response. Clients are strongly encouraged to ''always'' specify the protocol version, so as to ensure that the format of th

Re: Question about field types and querying

2009-05-28 Thread Jay Hill
Try using the admin analysis tool (http://<host>:<port>/solr/admin/analysis.jsp) to see what the analysis chain is doing to your query. Enter the field name ("question" in your case) and the Field value (Index) "customize" (since that's what's in the document). For Field value (Query) enter "customer". Check

Re: Highlighting and Field options

2009-06-01 Thread Jay Hill
Use the fl param to ask for only the fields you need, but also keep hl=true. Something like this: http://localhost:8080/solr/select/?q=bear&version=2.2&start=0&rows=10&indent=on&hl=true&fl=id Note that &fl=id means the only field returned in the XML will be the id field. Highlights are still ret

Re: query issue /special character and case

2009-06-08 Thread Jay Hill
Regarding being able to search SCHOLKOPF (o with no umlaut) and match SCHÖLKOPF (with umlaut) try using the ISOLatin1AccentFilterFactory in your analysis chain: <filter class="solr.ISOLatin1AccentFilterFactory"/> This filter removes accented chars and replaces them with non-accented versions. As always, make sure to add it to the <analyzer> for both

Re: Query faceting

2009-06-08 Thread Jay Hill
In order to get the values you want for the service field you will need to change the fieldType definition in schema.xml for "service" to use something that doesn't alter your original values. Try the "string" fieldType to start and look at the fieldType definition for "string". I'm guessing yo

PlainTextEntityProcessor not putting any text into a field in index

2009-06-18 Thread Jay Hill
I'm having some trouble getting the PlainTextEntityProcessor to populate a field in an index. I'm using the TemplateTransformer to fill 2 fields, and have a timestamp field in schema.xml, and these fields make it into the index. Only the plainText data is missing. Here is my configuration:

DIH: Distributing docs to more than one Solr instance

2009-07-01 Thread Jay Hill
I'm using the DIH to index records from a relational database. No problems, everything works great. But now, due to the size of index (70GB w/ 25M+ docs) I need to shard and want the DIH to distribute documents evenly between two shards. Current approach is to modify the sql query in the config fil

DIH: Limited xpath syntax unable to parse all xml elements

2009-07-01 Thread Jay Hill
I'm using the XPathEntityProcessor to parse an xml structure that looks like this: Joe Smith World Atlas Content I want is here More content I want is here. Still more content here.>/p> The author and title parse out fine:

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
under irrespective of nesting , tag > names use this > > > > > > > > On Thu, Jul 2, 2009 at 5:31 AM, Jay Hill wrote: > > I'm using the XPathEntityProcessor to parse an xml structure that looks > like > > this: > > > > >

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
dy/chapter//p) doesn't seem to be > >supported. > > > >Thanks, > >-Jay > > > > > >2009/7/1 Noble Paul > > > >> complete xpath is not supported > >> > >> /book/body/chapter/p > >> > >> s

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
I'm on the trunk, built on July 2: 1.4-dev 789506 Thanks, -Jay On Thu, Jul 2, 2009 at 11:33 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Thu, Jul 2, 2009 at 11:38 PM, Mark Miller > wrote: > > > Shalin Shekhar Mangar wrote: > > > >> > >> It selects all matching nodes. But if t

Re: DIH: Limited xpath syntax unable to parse all xml elements

2009-07-02 Thread Jay Hill
Thanks Fergus, setting the field to multivalued did work: gets all the elements as multivalue fields in the body field. The only thing is, the body field is used by some other content sources, so I have to look at the implications setting it to multi-valued will have on the other data sour

Re: Indexing XML

2009-07-07 Thread Jay Hill
Mathieu, have a look at Solr's DataImportHandler. It provides a configuration-based approach to index different types of datasources including relational databases and XML files. In particular have a look at the XPathEntityProcessor ( http://wiki.apache.org/solr/DataImportHandler#head-f1502b1ed71d9

Re: about defaultSearchField

2009-07-08 Thread Jay Hill
Just to be sure: You mentioned that you "adjusted" schema.xml - did you re-index after making your changes? -Jay On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin wrote: > Thanks for your reply. But it works not. > > Yang > > 2009/7/8 Yao Ge > > > > > Try with fl=* or fl=*,score added to your request

Re: Indexing rich documents from websites using ExtractingRequestHandler

2009-07-08 Thread Jay Hill
I haven't tried this myself, but it sounds like what you're looking for is enabling remote streaming: http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf As the link above shows you should be able to enable remote streaming like this: and then something like t

Re: Creating DataSource for DIH to Oracle Database

2009-07-09 Thread Jay Hill
Francis, your question is a little vague. Are you looking for the configuration for connecting the DIH to a JNDI datasource set up in Weblogic? -Jay On Mon, Jul 6, 2009 at 2:41 PM, Francis Yakin wrote: > > Have any one had experience creating a datasource for DIH to an Oracle > Database?

Spell checking: Is there a way to exclude words known to be wrong?

2009-07-13 Thread Jay Hill
We're building a spell index from a field in our main index with the following configuration: textSpell default spell ./spellchecker true This works great and re-builds the spelling index on commits as expected. However, we know there are misspellings in

DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
I am trying to run full and delta imports with the commit=false option, but it doesn't seem to take effect - after the import a commit always happens no matter what params I send. I've looked at the source and unless I'm missing something it doesn't seem to process the commit param. Here's the url

Re: spellcheck with misspelled words in index

2009-07-15 Thread Jay Hill
We had the same thing to deal with recently, and a great solution was posted to the list. Create a stopwords filter on the field you're using for your spell checking, and then populate a custom stopwords file with known misspelled words: Y
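
The approach described above can be sketched as a field type in schema.xml. This is an illustration only: the field type name, the tokenizer choice, and the file name misspelled_words.txt are hypothetical and do not come from the original message.

```xml
<!-- schema.xml: sketch of a spellcheck source field that drops known misspellings -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- misspelled_words.txt is a hypothetical file: one known-bad word per line -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
  </analyzer>
</fieldType>
```

Because the stop filter removes the listed terms at index time, they never reach the spelling index and so cannot be offered as suggestions. The spelling index needs to be rebuilt after the stopwords file changes.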

Re: DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
My bad, I had a configuration setting overriding this value. Sorry for the mistake. -Jay On Wed, Jul 15, 2009 at 12:07 PM, Jay Hill wrote: > I am trying to run full and delta imports with the commit=false option, but > it doesn't seem to take effect - after the import a commit alw

Re: DIH: On import (full or delta) commit=false seems to not take effect

2009-07-15 Thread Jay Hill
Actually, "my good" after all. The parameter does not take effect. If commit=false is passed in, a commit still happens. Will open a JIRA and supply a patch shortly. -Jay On Wed, Jul 15, 2009 at 5:50 PM, Jay Hill wrote: > My bad, I had a configuration setting overriding this

Re: Solr 1.4 - stats page slow

2009-12-24 Thread Jay Hill
I've noticed this as well, usually when working with a large field cache. I haven't done in-depth analysis of this yet, but it seems like when the stats page is trying to pull data from a large field cache it takes quite a long time. Are you doing a lot of sorting? If so, what are the field types

Re: Solr 1.4 - stats page slow

2009-12-24 Thread Jay Hill
Also, what is your heap size and the amount of RAM on the machine? I've also noticed that, when watching memory usage through JConsole or YourKit while loading the stats page, the memory usage spikes dramatically - are you seeing this as well? -Jay On Thu, Dec 24, 2009 at 9:12 AM, Jay

Re: Indexing the latests MS Office documents

2010-01-05 Thread Jay Hill
The version of Tika in the 1.4 release definitely parses the most current Office formats (.docx, .pptx, etc.) and they index as expected. -Jay On Mon, Jan 4, 2010 at 6:02 PM, Peter Wolanin wrote: > You must have been searching old documentation - I think tika 0,3+ has > support for the new MS f

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Jay Hill
It's definitely still an issue. I've seen this with at least four different Solr implementations. It clearly seems to be a problem when there is a large field cache. It would be bad enough if the stats.jsp was just slow to load (usually takes 1 to 2 minutes), but when monitoring memory usage with j

Re: Solr 1.4 - stats page slow

2010-01-08 Thread Jay Hill
Actually my cases were all with customers I work with, not just one case. A common practice is to monitor cache stats to tune the caches properly. Also, noting the warmup times for new IndexSearchers, etc. I've worked with people that have excessive auto-warm count values which are causing extremely

Re: solr blocking on commit

2010-01-19 Thread Jay Hill
A couple of follow up questions: - What type of garbage collector is in use? - How often are you optimizing the index? - In solrconfig.xml what is the setting for ? - Right before and after you see this pause, check the output of http://:/solr/admin/system, specifically the output of and send thi

Solr Analysis Webinar Jan 28, 2010

2010-01-20 Thread Jay Hill
My colleague at Lucid Imagination, Tom Hill, will be presenting a free webinar focused on analysis in Lucene/Solr. If you're interested, please sign up and join us. Here is the official notice: We'd like to invite you to a free webinar our company is offering next Thursday, 28 January, at 2PM Eas

For caches, any reason to not set initialSize and size to the same value?

2010-02-12 Thread Jay Hill
If I've done a lot of research and have a very good idea of where my cache sizes are, having monitored the stats right before commits, is there any reason why I wouldn't just set the initialSize and size counts to the same values? Is there any reason to set a smaller initialSize if I know reliably t
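
For context, the setting being asked about looks roughly like this in solrconfig.xml. The cache class and the numbers are illustrative assumptions, not values from the original message.

```xml
<!-- solrconfig.xml: sketch of a cache whose initial capacity equals its maximum -->
<!-- size is the maximum number of entries; initialSize is the starting capacity of the backing map -->
<filterCache class="solr.LRUCache"
             size="16384"
             initialSize="16384"
             autowarmCount="0"/>
```

Setting the two values equal allocates the backing map at full capacity up front, which avoids resizing as the cache fills at the cost of reserving that memory immediately.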
