Re: Hierarchical faceting

2014-11-17 Thread Jason Hellman
I realize you want to avoid putting depth details into the field values, but something has to imply the depth. So with that in mind, here is another approach (with the assumption that you are chasing down a single branch of a tree (and all its subbranch offshoots)), Use dynamic fields Step fro

Re: Boost documents having a field value

2014-06-02 Thread Jason Hellman
Hakim, That is what Boost Query (bq=) does. http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29 Jason On Jun 2, 2014, at 10:58 AM, Hakim Benoudjit wrote: > Hi guys, > Is it possible in solr to boost documents having a field value (Ex. > :)? > I know that it's possible to boo
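A minimal sketch of what a bq request might look like, assuming a hypothetical field "category" with value "electronics" and a boost of 5.0 (none of these come from the thread). It only builds the query string; the qf setup is assumed to live in the request handler.

```python
from urllib.parse import urlencode

def boosted_query(user_query, boost_field, boost_value, boost=5.0):
    """Build dismax parameters that boost (rather than filter) documents
    whose boost_field holds boost_value. Field and value are hypothetical."""
    params = [
        ("defType", "dismax"),
        ("q", user_query),
        # bq adds an optional clause that only influences scoring
        ("bq", f"{boost_field}:{boost_value}^{boost}"),
    ]
    return urlencode(params)

print(boosted_query("ipod", "category", "electronics"))
```

Documents that do not match the bq clause are still returned; they simply score lower than otherwise equal documents that do match.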

Re: openSearcher, default commit settings

2014-06-02 Thread Jason Hellman
Boon, I expect you will find many definitions of “proper usage” depending upon context and expected results. Personally, I don’t believe this is Solr’s job to enforce, and there are many ways through the use of directives in the servlet container layer that can allow restrictions if you feel th

Re: Enforcing a hard timeout on shard requests?

2014-05-30 Thread Jason Hellman
Gregg, I don’t have an answer to your question but I’m very curious what use case you have that permits such arbitrary partial-results. Is it just an edge case or do you want to permit a common occurrence? Jason On May 30, 2014, at 3:05 PM, Gregg Donovan wrote: > I'd like a to add a hard ti

Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-30 Thread Jason Hellman
I just realized I failed my own reading comprehension :) You have maxDocs, not maxTime for hard commit. Please disregard. On May 30, 2014, at 1:46 PM, Jason Hellman wrote: > I’m also not sure I understand the practical purpose of your hard/soft auto > commit settings. You are stati

Re: Error enquiry- exceeded limit of maxWarmingSearchers=2

2014-05-30 Thread Jason Hellman
I’m also not sure I understand the practical purpose of your hard/soft auto commit settings. You are stating the following: Every 10 seconds I want data written to disk, but not be searchable. Every 15 seconds I want data to be written into memory and searchable. I would consider whether your s

Re: SolrCloud: Understanding Replication

2014-05-30 Thread Jason Hellman
Marc, Fundamentally it’s a good solution design to always be capable of reposting (reindexing) your data to Solr. You are demonstrating a classic use case of this, which is upgrade. Is there a critical reason why you are avoiding this step? Jason On May 30, 2014, at 10:38 AM, Marc Campeau

Re: Solr interface

2014-04-07 Thread Jason Hellman
This. And so much this. As much this as you can muster. On Apr 7, 2014, at 1:49 PM, Michael Della Bitta wrote: > The speed of ingest via HTTP improves greatly once you do two things: > > 1. Batch multiple documents into a single request. > 2. Index with multiple threads at once. > > Michael
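A rough sketch of the two points Michael lists (batching and multiple threads), assuming a caller-supplied send_batch function that posts one batch to Solr. The batch size of 500 and the 4 workers are illustrative guesses, not recommendations from the thread, and the sender here is a stand-in that does no HTTP.

```python
from concurrent.futures import ThreadPoolExecutor

def batches(docs, size):
    """Yield consecutive slices of docs, each at most `size` long."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

def index_all(docs, send_batch, batch_size=500, workers=4):
    """Send batches concurrently; results come back in batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_batch, batches(docs, batch_size)))

def fake_send(batch):
    # Stand-in for an HTTP POST to the update handler (hypothetical).
    return len(batch)

docs = [{"id": str(i)} for i in range(1200)]
print(index_all(docs, fake_send))
```

In practice the right batch size and thread count depend on document size and hardware, so they bear testing.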

Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-19 Thread Jason Hellman
Here’s a rather obvious question: have you rebuilt your spell index recently? Is it possible the offending numbers snuck into the spell dictionary? The terms component will show you what’s in your current, searchable field…but not the dictionary. If my memory serves correctly, with collate=t

Re: Caching Solr boost functions?

2014-02-19 Thread Jason Hellman
Gregg, The QueryResultCache caches a sorted int array of results matching a query. This should overlap very nicely with your desired behavior, as a hit in this cache will neither perform a Lucene query nor need to calculate a score. Now, ‘for the life of the Searcher’ is the trick here. You

Re: Exact fragment length in highlighting

2014-02-19 Thread Jason Hellman
Juan, Pay close attention to the boundary scanner you’re employing: http://wiki.apache.org/solr/HighlightingParameters#hl.boundaryScanner You can be explicit to indicate a type (hl.bs.type) with options such as CHARACTER, WORD, SENTENCE, and LINE. The default is WORD (as the wiki indicates) a

Re: block join and atomic updates

2014-02-18 Thread Jason Hellman
Thinking in terms of normalized data in the context of a Lucene index is dangerous. It is not a relational data model technology, and the join behaviors available to you have limited use. Each approach requires compromises that are likely impermissible for certain use cases. If it is at al

Re: Solr server requirements for 100+ million documents

2014-02-11 Thread Jason Hellman
Whether you use the same machines as Solr or separate machines is a matter suited to taste. If you are the CTO, then you should make this decision. If not, inform management that risk conditions are greater when you share function and control on a single piece of hardware. A single failure of

Re: Memory Usage on Windows Os while indexing

2014-01-21 Thread Jason Hellman
To a very large extent, the capability of a platform is measurable by the skill of the team administering it. If core competencies lie in Windows OS then I would wager heavily the platform will outperform a similar Linux OS installation in the long haul. All things being equal, it’s really hard

Re: how to best convert some term in q to a fq

2013-12-28 Thread Jason Hellman
I second this notion. My reasoning focuses mostly on maintainability, where I posit that your client code will be far easier to extend/modify/troubleshoot than any effort spent attempting to do this within Solr. Jason On Dec 23, 2013, at 12:07 PM, Joel Bernstein wrote: > I would suggest ha

Re: Function query matching

2013-11-07 Thread Jason Hellman
You can, of course, use a function range query: select?q=text:news&fq={!frange l=0 u=100}sum(x,y) http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html This will give you a bit more flexibility to meet your goal. On Nov 7, 2013, at 7:26 AM, Erik Hat
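A small sketch that assembles the frange filter shown above from its parts. The helper name is made up for illustration; the l and u local params are the lower and upper bounds from the example.

```python
from urllib.parse import urlencode

def frange_filter(function, lower, upper):
    """Return an fq value restricting `function` to [lower, upper]."""
    return "{!frange l=" + str(lower) + " u=" + str(upper) + "}" + function

fq = frange_filter("sum(x,y)", 0, 100)
print(fq)
# URL-encoded form suitable for appending to /select?
print(urlencode([("q", "text:news"), ("fq", fq)]))
```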

Re: Problem with size of segments

2013-11-07 Thread Jason Hellman
David, I find Mike McCandless’ blog article to be very informative. Give it a go and let us know if you are still seeking clarification: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html Jason On Nov 7, 2013, at 5:09 AM, david.dav...@correo.aeat.es wrote: > Hi, >

Re: Replacing Google Mini Search Appliance with Solr?

2013-10-30 Thread Jason Hellman
Nutch is an excellent option. It should feel very comfortable for people migrating away from the Google appliances. Apache Droids is another possible way to approach, and I’ve found people using Heritrix or Manifold for various use cases (and usually in combination with other use cases where t

Re: Reclaiming disk space from (large, optimized) segments

2013-10-29 Thread Jason Hellman
If I gauge Otis’ intent here it is to create shards on the basis of intervals of time. A shard represents a single interval (let’s say a year’s worth of data) and when that data is no longer necessary it is simply shut down and no longer included in queries. So, for example, you could have thre
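One way to picture the time-interval idea, as a sketch only: keep a map from year to shard address and build the shards parameter from whichever years a query needs. The host names below are hypothetical, and retiring a year is modelled simply by removing it from the map.

```python
def shards_for_years(shard_by_year, first_year, last_year):
    """Return a comma-separated shards value covering the requested years."""
    selected = [
        shard_by_year[year]
        for year in sorted(shard_by_year)
        if first_year <= year <= last_year
    ]
    return ",".join(selected)

# Hypothetical shard addresses, one per year of data.
shard_by_year = {
    2011: "solr2011:8983/solr/core",
    2012: "solr2012:8983/solr/core",
    2013: "solr2013:8983/solr/core",
}

print(shards_for_years(shard_by_year, 2012, 2013))

# Dropping a year's shard takes its data out of every later query.
del shard_by_year[2011]
print(shards_for_years(shard_by_year, 2011, 2013))
```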

Re: When is/should qf different from pf?

2013-10-29 Thread Jason Hellman
It is probable that, with no additional boost to pf fields, the sum of the scores will be higher. But it is *possible* that they are not, and adding a boost to pf gives greater probability that they will be. All of this bears testing to confirm what search use cases merit what level of boost.

Re: SOLRJ replace document

2013-10-18 Thread Jason Hellman
Keep in mind that DataStax has a custom update handler, and as such isn't exactly a vanilla Solr implementation (even though in many ways it still is). Since updates are co-written to Cassandra and Solr you should always tread a bit carefully when slightly outside what they perceive to be norms

Re: field "title_ngram" was indexed without position data; cannot run PhraseQuery

2013-10-15 Thread Jason Hellman
If you consider what n-grams do this should make sense to you. Consider the following piece of data: White iPod If the field is fed through a bigram filter (n-gram with size of 2) the resulting token stream would appear as such: wh hi it te ip po od The usual use of n-grams is to match those
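The bigram stream above can be reproduced with a few lines, assuming the field is split on whitespace and lowercased before the n-gram step (an assumption about the analysis chain, which the thread does not spell out).

```python
def char_ngrams(text, n=2):
    """Lowercase, split on whitespace, then emit every n-character window
    of each word in order."""
    grams = []
    for word in text.lower().split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

print(" ".join(char_ngrams("White iPod")))
```

Each gram is an independent token, which is why a phrase query over such a field has little meaning unless positions are kept.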

Re: Concurent indexing

2013-10-14 Thread Jason Hellman
The limit on how many threads you can use to load data is primarily driven by factors on your hardware: CPU, heap usage, I/O, and the like. It is common for most index load processes to be able to handle more incoming data on the Solr side of the equation than can typically be loaded fro

Re: Solr auto suggestion not working

2013-10-10 Thread Jason Hellman
Very specifically, what is the field definition that is being used for the suggestions? On Oct 10, 2013, at 5:49 AM, Furkan KAMACI wrote: > What is your configuration for auto suggestion? > > > 2013/10/10 ar...@skillnetinc.com > >> >> >> Hi, >> >> We are encountering an issue in solr sea

Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-10 Thread Jason Hellman
The best use case I see for atomic updates typically involves avoiding transmission of large documents for small field updates. If you are updating a "readCount" field of a PDF document that is 1MB in size you will avoid resending the 1MB PDF document's data in order to increment the "readCount"
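A sketch of the small payload such an update might carry, using the atomic-update "inc" modifier in JSON form. The document id "pdf-42" is hypothetical, and sending the payload to the update handler is left out.

```python
import json

def increment_payload(doc_id, field, amount=1):
    """Build a JSON atomic-update body that increments one numeric field."""
    return json.dumps([{"id": doc_id, field: {"inc": amount}}])

payload = increment_payload("pdf-42", "readCount")
print(payload)
print(len(payload), "bytes instead of the full document")
```

Note that atomic updates rebuild the document from stored values on the server, so the other fields generally need to be stored for this to work.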

Re: Update existing documents when using ExtractingRequestHandler?

2013-10-10 Thread Jason Hellman
As an endorsement of Erick's like, the primary benefit I see to processing through your own code is better error-, exception-, and logging-handling which is trivial for you to write. Consider that your code could reside on any server, either receiving through a PUSH or PULLing the data from you

Re: How to achieve distributed spelling check in SolrCloud ?

2013-10-08 Thread Jason Hellman
The shards.qt parameter is the easiest one to forget, with the most dramatic of consequences! On Oct 8, 2013, at 11:10 AM, shamik wrote: > James, > > Thanks for your reply. The "shards.qt" did the trick. I read the > documentation earlier but was not clear on the implementation, now it > tota

Re: Adding OR operator in querystring and grouping fields?

2013-10-07 Thread Jason Hellman
fq=here:there OR this:that For the lurker: an AND should be: fq=here:there&fq=this:that While you can, technically, pass: fq=here:there AND this:that Solr will cache the separate fq= parameters and reuse them in any context. The AND(ed) filter will be cached as a single entr
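To make the AND case concrete, here is a sketch that emits one fq parameter per restriction instead of a single ANDed filter. The helper name is made up; the field/value pairs are the placeholders from the message.

```python
from urllib.parse import urlencode

def filter_params(query, filters):
    """Return a query string with one fq per (field, value) pair, so each
    filter can be cached and reused on its own."""
    params = [("q", query)]
    params += [("fq", f"{field}:{value}") for field, value in filters]
    return urlencode(params)

print(filter_params("*:*", [("here", "there"), ("this", "that")]))
```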

Re: Delete a field - Atomic updates (SOLR 4.1.0) without using null="true"

2013-10-07 Thread Jason Hellman
I don't know if there's a way to accomplish your goal directly, but as a pure workaround, you can write a routine to fetch all the stored values and resubmit the document without the field in question. This is what atomic updates do, minus the overhead of the transmission. On Oct 7, 2013, at 1

Re: Some text not indexed in solr4.4

2013-09-17 Thread Jason Hellman
Utkarsh, Check to see if the value is actually indexed into the field by using the Terms request handler: http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d (adjust the prefix to whatever you're looking for) This should get you going in the right direction. Jason On Sep 17, 2013,

Re: data/index naming format

2013-09-05 Thread Jason Hellman
The circumstance I've most typically seen the index. show up is when an update is sent to a slave server. The replication then appears to preserve the updated slave index in a separate folder while still respecting the correct data from the master. On Sep 5, 2013, at 8:03 PM, Shawn Heisey w

Re: JSON update request handler & commitWithin

2013-09-05 Thread Jason Hellman
They have modified the mechanisms for committing documents…Solr in DSE is not stock Solr...so you are likely encountering a boundary where stock Solr behavior is not fully supported. I would definitely reach out to them to find out if they support the request. On Sep 5, 2013, at 8:27 AM, "Ryan,

Re: SolrCloud Set up

2013-08-30 Thread Jason Hellman
One additional thought here: from a paranoid risk-management perspective it's not a good idea to have two critical services dependent upon a single point of failure if the hardware fails. Obviously risk-management is suited to taste, so you may feel the cost/benefit does not merit the separati

Re: Indexing hangs when more than 1 server in a cluster

2013-08-14 Thread Jason Hellman
ed the SOLR-4816 patch that someone >> indicated might help. I also reduced the CSV upload chunk size to 500. It >> seemed like things got a little better, but still eventually hung. >> >> I also see SOLR-5081, but I don't know if that is my issue or not. At least

Re: Indexing hangs when more than 1 server in a cluster

2013-08-13 Thread Jason Hellman
While I don't have a past history of this issue to use as reference, if I were in your shoes I would consider trying your updates with softCommit disabled. My suspicion is you're experiencing some issue with the transaction logging and how it's managed when your hard commit occurs. If you can

Re: Facet field display name

2013-08-13 Thread Jason Hellman
It's been my experience that using the convenient feature to change the output key still doesn't save you from having to map it back to the field name underlying it in order to trigger the filter query. With that in mind it just makes more sense to me to leave the effort in the View portion of

Re: Spelling suggestions.

2013-08-09 Thread Jason Hellman
The majority of the behavior outlined in that wiki page should work quite sufficiently for 3.5.0. Note that there are only a few items that are marked Solr4.0 only (DirectSolrSpellChecker and WordBreakSolrSpellChecker, for example). On Aug 9, 2013, at 6:26 AM, Kamaljeet Kaur wrote: > Hello,

Re: Phrase query with prefix query

2013-08-02 Thread Jason Hellman
Or shingles, presuming you want to tokenize and output unigrams. On Aug 2, 2013, at 11:33 AM, Walter Underwood wrote: > Search against a field using edge N-grams. --wunder > > On Aug 2, 2013, at 11:16 AM, T. Kuro Kurosaka wrote: > >> Is there a query parser that supports a phrase query with p

Re: solr - set fileds as default search field

2013-07-29 Thread Jason Hellman
Or use the copyField technique to a single searchable field and set df= to that field. The example schema does this with the field called "text". On Jul 29, 2013, at 8:35 AM, Ahmet Arslan wrote: > Hi, > > > df is a single valued parameter. Only one field can be a default field. > > To query

Re: Solr 4.3.1 - query does not return documents, just numFounds, 2 shards, replication Factor 1

2013-07-29 Thread Jason Hellman
Nitin, You need to ensure the fields you wish to see are marked stored="true" in your schema.xml file, and you should include fields in your fl= parameter (fl=*,score is a good place to start). Jason On Jul 29, 2013, at 8:08 AM, Nitin Agarwal <2nitinagar...@gmail.com> wrote: > Hi, I am using

Re: restricting a query by a "set" of field values

2013-07-29 Thread Jason Hellman
Ben, This could be constructed as so: fl=date_deposited&fq=date:[2013-07-01T00:00:00Z TO 2013-07-31T23:59:00Z]&fq=collection_id:(1 2 n)&q.op=OR The parentheses around the 1 2 n set indicate a boolean query, and we're ensuring they are an OR boolean by the q.op parameter. This should get you the
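A sketch of building those two filters from a list of ids and a date range, assuming the usual field:value syntax. The helper names are invented for illustration.

```python
def set_filter(field, values):
    """Return field:(v1 v2 ...), a boolean clause over a set of values."""
    return f"{field}:({' '.join(str(v) for v in values)})"

def date_range_filter(field, start, end):
    """Return an inclusive range clause, field:[start TO end]."""
    return f"{field}:[{start} TO {end}]"

print(set_filter("collection_id", [1, 2, 7]))
print(date_range_filter("date", "2013-07-01T00:00:00Z", "2013-07-31T23:59:00Z"))
```

Whether the values inside the parentheses are ORed or ANDed depends on the default operator in effect, which is why the message sets q.op=OR.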

Re: solr 4.3, autocommit, maxdocs

2013-07-15 Thread Jason Hellman
Jonathan, Please note the openSearcher=false part of your configuration. This is why you don't see documents. The commits are occurring, and being written to segments on disk, but they are not visible to the search engine because a Solr searcher class has not opened them for visibility. You

Re: Using the Schema API from SolrJ

2013-07-06 Thread Jason Hellman
Steven, Some information can be gleaned from the "system" admin request handler: http://localhost:8983/solr/admin/system I am specifically looking at this: example Mind you, that is a manually-set value in the schema file. But just in case you want to get crazy you can also call the "file" a

Re: 2.1billion+ document

2013-07-05 Thread Jason Hellman
Saqib: At the simplest level: 1) Source the machine 2) Install Java 3) Install a servlet container of your choice 4) Copy your Solr WAR and conf directories as desired (probably a rough mirror of your current single server) 5) Start it up and start sending data there 6) Query both by simpl

Re: Surprising score?

2013-07-05 Thread Jason Hellman
Also consider using the SweetSpotSimilarityFactory class which allows you to still engage normalization but control how intrusive it is. This, combined with the ability to set a custom Similarity class on a per-fieldType basis may be extremely useful. More info: http://lucene.apache.org/sol

Re: how to replicate Solr Cloud

2013-06-25 Thread Jason Hellman
Kevin, I can imagine this working if you consider your second data center a pure slave relationship to your SolrCloud cluster. I haven't tried it, but I don't see why the solrconfig.xml can't identify as a master allowing you to call any of your cores in the cluster to replicate out. That bei

Re: Restarting SOLR will remove all cache?

2013-06-24 Thread Jason Hellman
Shalin, There's one point to test without caches, which is to establish how much value a cache actually provides. For me, this primarily means providing a benchmark by which to decide when to stop obsessing over caches. But yes, for load testing I definitely agree :) Jason On Jun 21, 2013,

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Jason Hellman
the capabilities of write perf - but not over the edge! > In particular, I am interested in knowing the symptoms of failure, to help > us troubleshoot the underlying problems if and when they arise. > > Thanks, > > Scott > > On Monday, June 24, 2013, Jason Hellman wrote

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Jason Hellman
ds. > > On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman < > jhell...@innoventsolutions.com> wrote: > >> Vinay, >> >> What autoCommit settings do you have for your indexing process? >> >> Jason >> >> On Jun 24, 2013, at 1:28 PM,

Re: [solr cloud] solr hangs when indexing large number of documents from multiple threads

2013-06-24 Thread Jason Hellman
Vinay, What autoCommit settings do you have for your indexing process? Jason On Jun 24, 2013, at 1:28 PM, Vinay Pothnis wrote: > Here is the ulimit -a output: > > core file size (blocks, -c) 0 data seg size(kbytes, > -d) unlimited scheduling priority (-

Re: in Solr 3.5, optimization increase the index size to double

2013-06-16 Thread Jason Hellman
And let's not forget the interesting bug in MMapDirectory: http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/store/MMapDirectory.html "NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file

Re: Filtering down terms in suggest

2013-06-12 Thread Jason Hellman
43 AM, Aloke Ghoshal wrote: > Barani - the fq option doesn't work. > Jason - the dynamic field option won't work due to the high number of > groups and users. > > > > On Wed, Jun 12, 2013 at 1:12 AM, Jason Hellman < > jhell...@innoventsolutions.com> wr

Re: Filtering down terms in suggest

2013-06-11 Thread Jason Hellman
Aloke, If you do not have a factorial problem in the combination of userid and groupid (which I can imagine you might) you could consider creating a field for each combination (u1g1, u2g2) which can easily be done via dynamic fields. Use CopyField to get data into these various constructs (aga

Re: Two instances of solr - the same datadir?

2013-06-04 Thread Jason Hellman
Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla wrote: > OK, so I have verified th

Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jason Hellman
Well, there is a hack(ish) way to do it: _query_:"{!type=edismax qf='someField' v='$q' mm=100%}" This is clearly not a solrconfig.xml settings, but part of your query string using LocalParam behavior. This is going to get really messy if you have plenty of fields you'd like to search, where yo

Re: Getting tons of EofException with jetty/SolrCloud

2013-05-31 Thread Jason Hellman
Those are default, though autoSoftCommit is commented out by default. Keep in mind about the hard commit running every 15 seconds: it is not updating your searchable data (due to the openSearcher=false setting). In theory, your data should be searchable due to autoSoftCommit running every 1 s

Re: 2 VM setup for SOLRCLOUD?

2013-05-30 Thread Jason Hellman
Jamey, You will need a load balancer on the front end to direct traffic into one of your SolrCore entry points. It doesn't matter, technically, which one though you will find benefits to narrowing traffic to fewer (for purposes of better cache management). Internally SolrCloud will round-robi

Re: split document or not

2013-05-28 Thread Jason Hellman
You may wish to explore the concept of using the Result Grouping (Field Collapsing) feature in which your paragraphs are individual documents that share a field to group them by (the ID of the document/book/article/whatever). http://wiki.apache.org/solr/FieldCollapsing This will net you absolut

Re: Nested Facets and distributed shard system.

2013-05-28 Thread Jason Hellman
You have mentioned Pivot Facets, but have you looked at the Path Hierarchy Tokenizer Factory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory This matches your use case, as best as I understand it. Jason On May 28, 2013, at 12:47 PM, vibhoreng04

Re: filter query by string length or word count?

2013-05-22 Thread Jason Hellman
Sam, I would highly suggest counting the words in your external pipeline and sending that value in as a specific field. It can then be queried quite simply with a: wordcount:{80 TO *] (Note the { next to 80, excluding the value of 80) Jason On May 22, 2013, at 11:37 AM, Sam Lee wrote: > I
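A minimal sketch of the pipeline side, assuming a hypothetical text field named "body" and plain whitespace splitting as the definition of a word (Solr's own tokenization may count differently).

```python
def add_word_count(doc, text_field="body", count_field="wordcount"):
    """Return a copy of doc with a whitespace-based word count added."""
    enriched = dict(doc)
    enriched[count_field] = len(doc.get(text_field, "").split())
    return enriched

def more_than(count_field, minimum):
    """Build an exclusive-lower-bound range query, e.g. wordcount:{80 TO *]."""
    return count_field + ":{" + str(minimum) + " TO *]"

doc = add_word_count({"id": "1", "body": "the quick brown fox"})
print(doc["wordcount"])
print(more_than("wordcount", 80))
```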

Re: multiple cache for same field

2013-05-20 Thread Jason Hellman
Most definitely not the number of unique elements in each segment. My 32-document sample index (built from the default example docs data) has the following: entry#0: 'StandardDirectoryReader(segments_b:29 _8(4.2.1):C32)'=>'manu_exact',class org.apache.lucene.index.SortedDocValues,0.5=>org.ap

Re: Not able to search Spanish word with ascent in solr

2013-05-20 Thread Jason Hellman
And use the /terms request handler to view what is present in the field: /solr/terms?terms.fl=text_es&terms.prefix=a You're looking to ensure the index does, in fact, have the accented characters present. It's just a sanity check, but could possibly save you a little (sanity, that is). Jason

Re: Upgrading from SOLR 3.5 to 4.2.1 Results.

2013-05-18 Thread Jason Hellman
Rishi, Fantastic! Thank you so very much for sharing the details. Jason On May 17, 2013, at 12:29 PM, Rishi Easwaran wrote: > > > Hi All, > > Its Friday 3:00pm, warm & sunny outside and it was a good week. Figured I'd > share some good news. > I work for AOL mail team and we use SOLR for

Re: Deleting an entry from a collection when they key has ":" in it

2013-05-16 Thread Jason Hellman
The first rule of Solr without Unique Key is that we don't talk about Solr without a Unique Key. The second rule... On May 16, 2013, at 8:47 PM, Jack Krupansky wrote: > Technically, core Solr does not require a unique key. A lot of features in > Solr do require unique keys, and it is recommen

Re: Aggregate word counts over a subset of documents

2013-05-16 Thread Jason Hellman
David, A Pivot Facet could possibly provide these results by the following syntax: facet.pivot=category,includes We would presume that includes is a tokenized field and thus a set of facet values would be rendered from the terms resulting from that tokenization. This would be nested in each ca

Re: Solr - Best Java Combination for performance?

2013-05-11 Thread Jason Hellman
I have run across plenty of implementations using just about every common servlet container on the market, and haven't run across any common problems to dissuade you against any one of them. On the JVM front most people seem to use Oracle because of its ubiquity. But I have also run across a

Re: Does Distributed Search are Cached Only the By Node That Runs Query?

2013-05-10 Thread Jason Hellman
And for 10,000 documents across n shards, that can be significant! On May 10, 2013, at 11:43 AM, Joel Bernstein wrote: > How many shards are in your collection? The query aggregator node will pull > pack that results from each shard and hold the results in memory. Then it > will add the results

Re: SOLR guidance required

2013-05-10 Thread Jason Hellman
One more tip on the use of filter queries. DO: &fq=name1:value1&fq=name2:value2&fq=namen:valuen DON'T: fq=name1:value1 AND name2:value2 AND name3:value3 Where OR operators apply, this does not matter. But your Solr cache will be much more savvy with the first construct. Jason On May 10, 20

Re: Sharing index data between two Solr instances

2013-05-10 Thread Jason Hellman
the read-only side each > time the solr.xml has changed (additional cores may be added by the updating > machine depending on the imported data). > > Thanks again and best regards! > > Milen > > > -Ursprüngliche Nachricht- > Von: Jason Hellman [mailto:jhel

Re: Sharing index data between two Solr instances

2013-05-10 Thread Jason Hellman
Milen, At some point you'll need to call a commit to search your data, either via AutoCommit policy or deterministically. There are various schools of thought on which way to go but something needs to do this. If you go the AutoCommit route be sure to pay attention to the openSearcher value.

Re: Looking for Best Practice of Spellchecker

2013-05-10 Thread Jason Hellman
Nicholas, Also consider that some misspellings are better handled through Synonyms (or injected metadata). You can garner a great deal of value out of the spell checker by following the great advice James is giving here…but you'll find a well-placed "helper" synonym or metavalue can often sa

Re: Negative Boosting at Recent Versions of Solr?

2013-05-10 Thread Jason Hellman
You learn the gosh-darndest things: http://localhost:8983/solr/browse?q=ipod&boost=product(price,-2)&debugQuery=on …nets: -0.3797992 = (MATCH) sum of: 0.13510442 = (MATCH) max of: 0.045963455 = (MATCH) weight(text:ipod^0.5 in 4) [DefaultSimilarity], result of: 0.045963455 = score(

Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Jason Hellman
ame number of items, I'll have to carefully > calculate the results that should be returned for each page of 20 items and > probably make several solr calls per page rendered. > > > On Thu, May 9, 2013 at 1:07 PM, Jason Hellman < > jhell...@innoventsolutions.com> wrote:

Re: 4.3 logging setup

2013-05-09 Thread Jason Hellman
If you nab the jars in example/lib/ext and place them within the appropriate folder in Tomcat (and this will somewhat depend on which version of Tomcat you are using…let's presume tomcat/lib as a brute-force approach) you should be back in business. On May 9, 2013, at 11:41 AM, richardg wrote:

Re: More Like This and Caching

2013-05-09 Thread Jason Hellman
Purely from empirical observation, both the DocumentCache and QueryResultCache are being populated and reused in reloads of a simple MLT search. You can see in the cache inserts how much extra-curricular activity is happening to populate the MLT data by how many inserts and lookups occur on the

Re: 4.3 logging setup

2013-05-09 Thread Jason Hellman
From: http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0 Slf4j/logging jars are no longer included in the Solr webapp. All logging jars are now in example/lib/ext. Changing logging impls is now as easy as updating the jars in this folder with those necessar

Re: Grouping search results by field returning all search results for a given query

2013-05-09 Thread Jason Hellman
Luis, I am presuming you do not have an overarching grouping value here…and simply wish to show a standard search result that shows 1 item per company. You should be able to accomplish your second page of desired items (the second item from each of your 20 represented companies) by using the gr

Re: Use case for storing positions and offsets in index?

2013-05-08 Thread Jason Hellman
Consider further that term vector data and highlighting becomes very useful if you highlight externally to Solr. That is to say, you have the data stored externally and wish to re-parse positions of terms (especially synonyms) from source material. This is a (not too uncommon) technique used f

Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Jason Hellman
I have to imagine I'm quibbling with the original assertion that "Solr 4.x is architected with a dependency on Zookeeper" when I say the following: Solr 4.x is not architected with a dependency on Zookeeper. SolrCloud, however, is. As such, if a line of reasoning drives greater concern about