Re: Solr Cloud, 100 shards, shards progressively become slower

2015-01-08 Thread Jack Krupansky
mean there will be a reduction in the amount of system memory needed for file caching of the Lucene index. 100 / 4 * 2.8GB = 70 GB of RAM needed on each server. -- Jack Krupansky On Thu, Jan 8, 2015 at 10:57 AM, Andrew Butkus < andrew.but...@c6-intelligence.com> wrote: > Hi Shawn, >
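The arithmetic in that snippet can be checked with a short sketch. It assumes, taking the formula at face value, that the 100 shards are spread evenly over 4 servers and that each shard's index is roughly 2.8 GB. The function name is made up for illustration.

```python
# Back-of-the-envelope estimate of OS file-cache RAM per server.
# Assumption (read off the formula in the post): shards are spread evenly
# across servers and every shard index is about the same size.

def file_cache_gb_per_server(num_shards: int, num_servers: int, gb_per_shard: float) -> float:
    """Return the index size, in GB, that one server would want to keep cached."""
    shards_per_server = num_shards / num_servers
    return shards_per_server * gb_per_shard


if __name__ == "__main__":
    # 100 shards over 4 servers at 2.8 GB per shard
    print(file_cache_gb_per_server(100, 4, 2.8))
```

This is file-cache memory only. JVM heap and other processes come on top of it.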

Re: Determining the Number of Solr Shards

2015-01-08 Thread Jack Krupansky
table performance for both indexing and a full range of queries, and then use 10x that RAM for the 100% load. That's the OS system memory for file caching, not the total system RAM. -- Jack Krupansky On Thu, Jan 8, 2015 at 4:55 PM, Nishanth S wrote: > Thanks guys for your inpu

Re: Tokenizer or Filter ?

2015-01-09 Thread Jack Krupansky
Consider an update processor - it can take any input, break it up any way you want, and then output multiple field values. You can even use the stateless script update processor to write the logic in JavaScript. -- Jack Krupansky On Fri, Jan 9, 2015 at 6:47 AM, tomas.kalas wrote: > Hello

Re: How does text-rev work?

2015-01-09 Thread Jack Krupansky
that the field type uses the reversed wildcard filter, and then it generates a wildcard query that uses the reversed query token and wildcard pattern so that the leading wildcard becomes a trailing wildcard or prefix query -- Jack Krupansky On Fri, Jan 9, 2015 at 3:15 PM, Alexandre Rafalovitch
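The reversal idea can be illustrated outside Solr. This is a simplified sketch of the principle only, not Solr's implementation: it ignores any marker characters or extra tokens the real filter may add, and it uses Python's `fnmatch` as a stand-in for wildcard matching.

```python
from fnmatch import fnmatchcase


def reverse_text(text: str) -> str:
    """Reverse a token or a wildcard pattern character by character."""
    return text[::-1]


# At index time the token is stored reversed (illustrative value).
indexed_token = reverse_text("running")

# At query time the leading-wildcard pattern is reversed too,
# which moves the wildcard to the end and makes it a prefix match.
query_pattern = reverse_text("*ing")

if __name__ == "__main__":
    print(indexed_token, query_pattern, fnmatchcase(indexed_token, query_pattern))
```

A prefix match is cheap because the index terms are sorted, whereas a leading wildcard would otherwise force a scan over every term.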

Re: How does text-rev work?

2015-01-10 Thread Jack Krupansky
"expert" feature. And there should be doc on how to use it. I do have some doc in my e-book, with some examples, but even that does not show the complete end-to-end config and schema. -- Jack Krupansky On Sat, Jan 10, 2015 at 1:13 AM, Alexandre Rafalovitch wrote: > So, Query Parser does

Re: ignoring bad documents during index

2015-01-10 Thread Jack Krupansky
Correct, Solr clearly needs improvement in this area. Feel free to comment on the Jira about what options you would like to see supported. -- Jack Krupansky On Sat, Jan 10, 2015 at 5:49 AM, SolrUser1543 wrote: > From reading this (https://issues.apache.org/jira/browse/SOLR-445) I see >

Re: ignoring bad documents during index

2015-01-10 Thread Jack Krupansky
the server rather than optimize performance. -- Jack Krupansky On Sat, Jan 10, 2015 at 6:02 AM, SolrUser1543 wrote: > Would it be a good solution to index single document instead of bulk ? > In this case I will know about the status of each message . > > What is recommendation

Re: edismax and mm: strange behaviour

2015-01-10 Thread Jack Krupansky
"required".) So, please explain in plain English what effect you are trying to achieve. mm is not for newbies! Also, please point us to whatever doc or other material you were reading that gave you the impression that mm was appropriate for your use case, so that we can correct any bad documen

Re: Extending solr analysis in index time

2015-01-11 Thread Jack Krupansky
ities/TFIDFSimilarity.html And to use your custom similarity class in Solr: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity -- Jack Krupansky On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian wrote: > Hi everybody, > > I am going to add some analy

Re: Frequent deletions

2015-01-11 Thread Jack Krupansky
than this optimize operation? -- Jack Krupansky On Sun, Jan 11, 2015 at 1:46 AM, ig01 wrote: > Thank you all for your response, > The thing is that we have 180G index while half of it are deleted > documents. > We tried to run an optimization in order to shrink index size but it

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Jack Krupansky
client or app layer code, then maybe you just need to put more intelligence into that query-generation code in the client. -- Jack Krupansky On Sun, Jan 11, 2015 at 12:08 PM, Michael Lackhoff wrote: > Hi Ahmet, > > > You might find this useful : > > https://lucidworks.com/blog/

Re: pf doesn't work like normal phrase query

2015-01-11 Thread Jack Krupansky
detect some common use cases and handle them specially in your client. Such as the example you gave - you could extract the terms and generate separate bq parameters. -- Jack Krupansky On Sun, Jan 11, 2015 at 1:28 PM, Michael Lackhoff wrote: > Am 11.01.2015 um 18:30 schrieb Jack Krupan

Re: Extending solr analysis in index time

2015-01-11 Thread Jack Krupansky
Won't function queries do the job at query time? You can add or multiply the tf*idf score by a function of the term frequency of arbitrary terms, using the tf, mul, and add functions. See: https://cwiki.apache.org/confluence/display/solr/Function+Queries -- Jack Krupansky On Sun, Jan 11,

Re: Extending solr analysis in index time

2015-01-12 Thread Jack Krupansky
Could you clarify what you mean by "Lucene reverse index"? That's not a term I am familiar with. -- Jack Krupansky On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian wrote: > Dear Jack, > Thank you very much. > Yeah I was thinking of function query for sorting, but I have to

Re: Solr grouping problem - need help

2015-01-13 Thread Jack Krupansky
That's your job. The easiest way is to do a copyField to a "string" field. -- Jack Krupansky On Tue, Jan 13, 2015 at 7:33 AM, Naresh Yadav wrote: > *Schema :* > > > *Code :* > SolrQuery q = new SolrQuery().setQuery("*:*"); > q.set(GroupParams.GR

Re: Extending solr analysis in index time

2015-01-13 Thread Jack Krupansky
A function query or an update processor to create a separate field are still your best options. -- Jack Krupansky On Tue, Jan 13, 2015 at 4:18 AM, Ali Nazemian wrote: > Dear Markus, > > Unfortunately I can not use payload since I want to retrieve this score to > each user as a

Re: Tokenizer or Filter ?

2015-01-13 Thread Jack Krupansky
ipt update processors, see my Solr e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html -- Jack Krupansky On Tue, Jan 13, 2015 at 9:21 AM, tomas.kalas wrote: > Thanks Jack for your advice. Can you please explain me little

Re: Tokenizer or Filter ?

2015-01-13 Thread Jack Krupansky
s only . You can use a second pattern char filter to remove the "<[/]d[12>" markers as well, probably changing them to a space in both cases. See: http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceCharFilterFactory.html -- Jack K

Re: Engage custom hit collector for special search processing

2015-01-13 Thread Jack Krupansky
umber of unique row sets. -- Jack Krupansky On Tue, Jan 13, 2015 at 4:29 PM, tedsolr wrote: > I have a complicated problem to solve, and I don't know enough about > lucene/solr to phrase the question properly. This is kind of a shot in the > dark. My requirement is to return searc

Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
It should replace all occurrences of the pattern. Post your specific filter XML. Patterns can be very tricky. Use the Solr Admin UI analysis page to see how the filtering is occurring. -- Jack Krupansky On Wed, Jan 14, 2015 at 7:16 AM, tomas.kalas wrote: > Jack, thanks for help, but if i u

Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
I was suspecting it might do that - the pattern is "greedy" and takes the longest matching pattern. Add a question mark after the asterisk to use stingy mode that matches the shortest pattern. -- Jack Krupansky On Wed, Jan 14, 2015 at 8:37 AM, tomas.kalas wrote: > I just used Sol
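The difference between the greedy pattern and what the post calls "stingy" mode (Java's documentation calls it reluctant) can be seen in a small sketch. Python's `re` is used here only for illustration, since Solr itself uses Java regular expressions. The `<d1>`/`<d2>` markers and the sample text are hypothetical.

```python
import re

# Hypothetical input with two marked-up regions that should be removed.
text = "keep <d1>drop this</d1> keep <d2>and this</d2> keep"

# Greedy: ".*" runs to the LAST closing marker, swallowing the middle "keep".
greedy = re.sub(r"<d[12]>.*</d[12]>", " ", text)

# Reluctant: ".*?" stops at the FIRST closing marker, so each region
# is replaced separately and the text between them survives.
reluctant = re.sub(r"<d[12]>.*?</d[12]>", " ", text)

if __name__ == "__main__":
    print(greedy.split())
    print(reluctant.split())
```

The same `*?` quantifier exists in Java's `java.util.regex.Pattern`, so the pattern carries over to a pattern-replace char filter in principle.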

Re: Tokenizer or Filter ?

2015-01-14 Thread Jack Krupansky
It's what Java has, whatever that is: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html So, maybe the correct answer is neither, but similar to both. -- Jack Krupansky On Wed, Jan 14, 2015 at 9:06 AM, tomas.kalas wrote: > Oh yeah, that is it. Thank you very much

Distributed mode for stats component?

2015-01-14 Thread Jack Krupansky
ow the new analytics component doesn't support distributed mode, but my question is about the old "stats" component. -- Jack Krupansky

Re: Distributed mode for stats component?

2015-01-14 Thread Jack Krupansky
admittedly, it's moot if stats is eventually to be superseded by the analytics component. -- Jack Krupansky On Wed, Jan 14, 2015 at 12:26 PM, Chris Hostetter wrote: > > : Does anybody know for sure whether the stats component fully supports > : distributed mode? It is listed in

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Jack Krupansky
to do customization, entity extraction, boiler-plate removal, etc. in app-friendly code, before transport to the Solr server. The extraction request handler is a really cool feature and quite sufficient for a lot of scenarios, but additional architectural flexibility would be a big win. -- Jack

Re: shards per disk

2015-01-20 Thread Jack Krupansky
It sounds like your app needs a lot more RAM so that it is not doing so much I/O. -- Jack Krupansky On Tue, Jan 20, 2015 at 9:24 AM, Nimrod Cohen wrote: > Hi > > I done some performance test, and I wanted to know if any one saw the same > behavior. > > > > We need to

Re: Avoiding wildcard queries using edismax query parser

2015-01-22 Thread Jack Krupansky
The problem is that the presence of a wildcard causes Solr to skip the usual token analysis. But... you could add a "multiterm" analyzer, and then the wildcard would just get treated as punctuation. -- Jack Krupansky On Thu, Jan 22, 2015 at 4:33 PM, Jorge Luis Betancourt González <

Re: How do you query a sentence composed of multiple words in a description field?

2015-01-22 Thread Jack Krupansky
Solr tried to find the remaining terms in the default query field. -- Jack Krupansky On Thu, Jan 22, 2015 at 5:47 PM, Carl Roberts wrote: > Hi, > > How do you query a sentence composed of multiple words in a description > field? > > I want to search for sentence "Oracle Fusi

Re: Avoiding wildcard queries using edismax query parser

2015-01-22 Thread Jack Krupansky
The dismax query parser does not support wildcards. It is designed to be simpler. -- Jack Krupansky On Thu, Jan 22, 2015 at 5:57 PM, Jorge Luis Betancourt González < jlbetanco...@uci.cu> wrote: > I was also suspecting something like that, the odd thing was that the with > the dismax

Re: Avoiding wildcard queries using edismax query parser

2015-01-23 Thread Jack Krupansky
Presence of a wildcard in a query term is detected by the traditional Solr and edismax query parsers and causes normal term analysis to be bypassed. As I said, wildcards are a specific feature that dismax specifically doesn't support - this has nothing to do with edismax. -- Jack Krupansk

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
/org/apache/solr/handler/FieldAnalysisRequestHandler.html and in solrconfig.xml -- Jack Krupansky On Thu, Jan 22, 2015 at 8:42 AM, Amit Jha wrote: > Hi, > > I need to know how can I retrieve phonetic codes. Does solr provide it as > part of result? I need codes for record matching. >

Re: Retrieving Phonetic Code as result

2015-01-23 Thread Jack Krupansky
That's what the phonetic filter is doing - transforming text into phonetic codes at index time. And at query time as well to do the phonetic matching in the query. The actual phonetic codes are stored in the index for the purposes of query matching. -- Jack Krupansky On Fri, Jan 23, 2015 at 12:

Re: Solr regex query help

2015-01-24 Thread Jack Krupansky
or maybe use a Solr update processor to pull the string apart and store the individual pieces as separate fields. As always, the first question is not how to store your data, but how your users intend to access your data. Post some sample queries. I imagine that any sane user would like to refere

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
which treated the colons as token separators. -- Jack Krupansky On Sat, Jan 24, 2015 at 3:28 PM, Alexandre Rafalovitch wrote: > You are using keywords here that seem to contradict with each other. > Or your use case is not clear. > > Specifically, you are saying you are getting s

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
How are you currently importing data? -- Jack Krupansky On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts wrote: > Sorry if I was not clear. What I am asking is this: > > How can I parse the data during import to tokenize it by (:) and strip the > cpe:/o? > > > > On 1/2

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
Take a look at the RegexTransformer. Or, in some cases you may need to use the raw ScriptTransformer. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler -- Jack Krupansky On Sat, Jan 24, 2015 at 3:49 PM, Carl Roberts wrote

Re: Solr facet search improvements

2015-01-28 Thread Jack Krupansky
need to be able to handle. -- Jack Krupansky On Wed, Jan 28, 2015 at 5:56 AM, thakkar.aayush wrote: > I have around 1 million job titles which are indexed on Solr and am looking > to improve the faceted search results on job title matches. > > For example: a job search for *Resear

Re: CopyField exclude patterns

2015-02-02 Thread Jack Krupansky
Sorry, that feature is not available in Solr at this time. You could implement an update processor which copied only the desired input field values. This can be done in JavaScript using the script update processor. -- Jack Krupansky On Mon, Feb 2, 2015 at 2:53 AM, danny teichthal wrote: >

Re: Where can we set the parameters in Solr Config?

2015-02-03 Thread Jack Krupansky
The Solr properties can also be defined in solrcore.properties and core.properties files: https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml -- Jack Krupansky On Tue, Feb 3, 2015 at 3:31 PM, O. Olson wrote: > Thank you Jim. I was hoping if there is an alternative

Re: Exception while loading 2 Billion + Documents in Solr 4.8.0

2015-02-04 Thread Jack Krupansky
l not be a matter of how many documents you can load, but whether the query response latency for those documents is sufficient. -- Jack Krupansky On Wed, Feb 4, 2015 at 4:54 PM, Arumugam, Suresh wrote: > Hi All, > > > > We are trying to load 14+ Billion documents into Solr. But we a

Re: Exception while loading 2 Billion + Documents in Solr 4.8.0

2015-02-11 Thread Jack Krupansky
this front? -- Jack Krupansky On Wed, Feb 11, 2015 at 8:05 AM, Erick Erickson wrote: > bq: Are there any such structures? > > Well, I thought there were, but I've got to admit I can't call any to mind > immediately. > > bq: 2b is just the hard limit > > Yeah,

Re: Multy-tenancy and quarantee of service per application (tenant)

2015-02-12 Thread Jack Krupansky
tenant has their own app and the service provider controls the Solr server but has no control over the app or load. The first is supported by Solr. The second is not, other than the service provider spinning up separate instances of Solr on separate physical servers. -- Jack Krupansky On Thu

Re: Book progress (Solr 4.x Deep Dive) - see my blog

2013-06-25 Thread Jack Krupansky
Please report any comments or issues to my email address or comment on my blog. Comments on the blog will benefit other readers, but the choice is yours. Thanks! -- Jack Krupansky -Original Message- From: Bernd Fehling Sent: Tuesday, June 25, 2013 2:06 AM To: solr-user

Re: Solr indexer and Hadoop

2013-06-25 Thread Jack Krupansky
Solr does not have any integrated Hadoop/HDFS crawling or indexing support today. Sorry. LucidWorks Search does have HDFS crawling support: http://docs.lucidworks.com/display/lweug/Using+the+High+Volume+HDFS+Crawler Cloudera Search has HDFS support as well. -- Jack Krupansky -Original

Re: Pivot-Facets with ranges

2013-06-25 Thread Jack Krupansky
No, facet.pivot takes a comma-separated list of "fields", with no support for "ranges". But, you can have a combination of field and range facets without pivoting. -- Jack Krupansky -Original Message- From: Jakob Frank Sent: Tuesday, June 25, 2013 6

Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
There are examples in my book: http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html But... I still think you should use a tokenized text field as well - use all three: raw string, tokenized text, and URL classification fields. -- Jack

Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
-sequences that occur in the URL without the need for wildcards or regular expressions. -- Jack Krupansky -Original Message- From: Jan Høydahl Sent: Tuesday, June 25, 2013 6:28 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing Probably a good match for the RegExp

Re: Solr indexer and Hadoop

2013-06-25 Thread Jack Krupansky
??? Hadoop=HDFS If the data is not in Hadoop/HDFS, just use the normal Solr indexing tools, including SolrCell and Data Import Handler, and possibly ManifoldCF. -- Jack Krupansky -Original Message- From: engy.morsy Sent: Tuesday, June 25, 2013 8:10 AM To: solr-user

Re: URL search and indexing

2013-06-25 Thread Jack Krupansky
), you automatically get most of that. The user can query by a URL fragment, such as "apache.org", ".org", "lucene.apache.org", etc. and the tokenization will strip out the punctuation. I'll add this script to my list of examples to add in the next rev of my

Re: Querying multiple collections in SolrCloud

2013-06-25 Thread Jack Krupansky
ection - add all the fields to one schema - there is no time or space penalty if most of the fields are empty for most documents. -- Jack Krupansky -Original Message- From: Chris Toomey Sent: Tuesday, June 25, 2013 6:08 PM To: solr-user@lucene.apache.org Subject: Querying multiple col

Re: Is it possible to searh Solr with a longer query string?

2013-06-25 Thread Jack Krupansky
/tomcat-5.5-doc/config/http.html) --- If you're not using Tomcat, your container may have a similar limit. -- Jack Krupansky -Original Message- From: yang, gang Sent: Tuesday, June 25, 2013 5:47 PM To: solr-user@lucene.apache.org Cc: Meng, Fan Subject: RE: Is it possible to searh

Re: Is there a way to capture div tag by id?

2013-06-25 Thread Jack Krupansky
Guide mislead people with examples that clearly can never run as expected with real data. -- Jack Krupansky -Original Message- From: eShard Sent: Tuesday, June 25, 2013 1:17 PM To: solr-user@lucene.apache.org Subject: Is there a way to capture div tag by id? let's say I have a div

Re: StatsComponent doesn't work if field's type is TextField - can I change field's type to String

2013-06-26 Thread Jack Krupansky
You could use an update processor to turn the text string into multiple string values. A short snippet of JavaScript in a StatelessScriptUpdateProcessor could do the trick. The field could then be a multivalued string field. -- Jack Krupansky -Original Message- From: Elran Dvir

Re: How to truncate a particular field, LimitTokenCountAnalyzer or LimitTokenCountFilter?

2013-06-26 Thread Jack Krupansky
/4_3_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html The new Apache Solr Reference? No mention of the filter. -- Jack Krupansky -Original Message- From: Daniel Collins Sent: Wednesday, June 26, 2013 3:38 AM To: solr-user@lucene.apache.org

Re: URL search and indexing

2013-06-26 Thread Jack Krupansky
If there is a bug... we should identify it. What's a sample post command that you issued? -- Jack Krupansky -Original Message- From: Flavio Pompermaier Sent: Wednesday, June 26, 2013 10:53 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing I was doing ex

Re: Solr indexer and Hadoop

2013-06-26 Thread Jack Krupansky
o 4.4. If not in 4.4, 4.5 is probably a slam-dunk. -- Jack Krupansky -Original Message- From: David Larochelle Sent: Wednesday, June 26, 2013 11:24 AM To: solr-user@lucene.apache.org Subject: Re: Solr indexer and Hadoop Pardon, my unfamiliarity with the Solr development process. Now

Re: Dynamic Type For Solr Schema

2013-06-26 Thread Jack Krupansky
ence Guide nor current release from Lucid, but see the detailed examples in my book. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Wednesday, June 26, 2013 10:51 AM To: solr-user@lucene.apache.org Subject: Dynamic Type For Solr Schema I use Solr 4.3.1 as SolrCloud. I k

Re: Solr 4.2.1 - master taking long time to respond after tomcat restart

2013-06-26 Thread Jack Krupansky
You need to do occasional hard commits, otherwise the update log just grows and grows and gets replayed on each server start. -- Jack Krupansky -Original Message- From: Arun Rangarajan Sent: Wednesday, June 26, 2013 1:18 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 - master

Re: Solr document auto-upload?

2013-06-26 Thread Jack Krupansky
directly implemented in Solr -- Jack Krupansky -Original Message- From: aspielman Sent: Wednesday, June 26, 2013 2:16 PM To: solr-user@lucene.apache.org Subject: Solr document auto-upload? Is it possible to to configure Solr to automatically grab documents in a specidfied directory, with

Re: Solr admin search with wildcard

2013-06-27 Thread Jack Krupansky
No, you cannot use wildcards within a quoted term. Tell us a little more about what your strings look like. You might want to consider tokenizing or using ngrams to avoid the need for wildcards. -- Jack Krupansky -Original Message- From: Amit Sela Sent: Thursday, June 27, 2013 3:33

Re: Solr admin search with wildcard

2013-06-27 Thread Jack Krupansky
Just copy from the string field to a "text" field and use standard tokenization, then you can search the text field for "youtube" or even "something" that is a component of the URL path. No wildcard required. -- Jack Krupansky -Original Message- From: Amit

Re: how to delete on column of a doc in solr

2013-06-27 Thread Jack Krupansky
me, and then you can update with atomic update. You may want to rethink your data model. -- Jack Krupansky -Original Message- From: anurag.jain Sent: Thursday, June 27, 2013 8:28 AM To: solr-user@lucene.apache.org Subject: how to delete on column of a doc in solr In my solr sche

Re: displaying one result per domain

2013-06-27 Thread Jack Krupansky
in the book. You can also use a regular expression tokenfilter to extract the host name as well. And you can use standard Solr "grouping" to group by the field containing host name. -- Jack Krupansky -Original Message- From: Wojciech Kapelinski Sent: Thursday, June 27, 20

Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Jack Park
arvestServer.getHttpClient().getParams().setParameter("update.chain", "harvest"); In short, the original exception was based on a gross misinterpretation of how one goes about equating solrconfig.xml with configurations of SolrJ. Hope that helps more than it confuses! Cheers Jack On

Re: Context search in solr

2013-06-28 Thread Jack Krupansky
. Sure, people don't like seeing the mis-matched results in the list and a larger number of results, but it's all a tradeoff to assure that the most relevant results are higher and exact matching is a little looser. -- Jack Krupansky -Original Message- From: Erick Erickson Sent:

Re: Replicating files containing external file fields

2013-06-28 Thread Jack Krupansky
Show us your directive. Maybe there is some subtle error in the file name. -- Jack Krupansky -Original Message- From: Arun Rangarajan Sent: Friday, June 28, 2013 1:06 PM To: solr-user@lucene.apache.org Subject: Re: Replicating files containing external file fields Erick, Thx for

Re: An issue with atomic updates?

2013-06-28 Thread Jack Krupansky
Well, it is known to me and documented in my book. BTW, that field value is simply ignored. There are tons of places in Solr where undefined values or outright garbage are simply ignored, silently. Go ahead and file a Jira though. -- Jack Krupansky -Original Message- From: Sam

Re: change solr core schema and config via http

2013-06-28 Thread Jack Krupansky
How could you not have ssh access to the Solr host machine? I mean, how are you managing that server, without ssh access? And if you are not managing the server, what business do you have trying to change the Solr configuration?!?!? Something fishy here! -- Jack Krupansky -Original

Re: change solr core schema and config via http

2013-06-28 Thread Jack Krupansky
Ah, yes, good old multi-tenant - I should have known. Yeah, the Solr API is evolving, albeit too slowly for the needs of some. -- Jack Krupansky -Original Message- From: Wu, James C. Sent: Friday, June 28, 2013 7:06 PM To: solr-user@lucene.apache.org Subject: RE: change solr core

Re: Replicating files containing external file fields

2013-06-28 Thread Jack Krupansky
to.) Sorry, I don't have the answer to the reload question at the tip of my tongue. -- Jack Krupansky -Original Message- From: Arun Rangarajan Sent: Friday, June 28, 2013 7:42 PM To: solr-user@lucene.apache.org Subject: Re: Replicating files containing external file fields Ja

Re: Schema design for parent child field

2013-06-29 Thread Jack Krupansky
to simulate the effect of a simple join in a single clean query. But you can do a separate query to get parent record details. -- Jack Krupansky -Original Message- From: Sperrink Sent: Saturday, June 29, 2013 5:08 AM To: solr-user@lucene.apache.org Subject: Schema design for parent child

Re: increase search score of certain category only for certain keyword

2013-06-29 Thread Jack Krupansky
is good for keyword search. Use the text variant in qf. -- Jack Krupansky -Original Message- From: winsu Sent: Friday, June 28, 2013 9:26 PM To: solr-user@lucene.apache.org Subject: increase search score of certain category only for certain keyword Hi, Currently i've certain sample

Re: No date.gap on pivoted facets

2013-06-30 Thread Jack Krupansky
s correspond to your date gap. You can do that with an update processor, or do it before you send the data to Solr. In the next release of my book I have a script for a StatelessScriptUpdateProcessor (with examples) that supports truncation of dates to a desired resolution, copying or modifyi
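For the "do it before you send the data to Solr" option, date truncation on the client side might look like the sketch below. It is not the script from the book. The function name and the set of resolutions are invented for illustration, and the input is assumed to already be in UTC.

```python
from datetime import datetime

# Supported resolutions, coarsest first (an assumption of this sketch).
_RESOLUTIONS = ("year", "month", "day", "hour", "minute")
# Values used for the components that get truncated away.
_DEFAULTS = [1, 1, 1, 0, 0]


def truncate_date(value: datetime, resolution: str) -> str:
    """Truncate a UTC datetime and format it as an ISO-8601 'Z' timestamp."""
    if resolution not in _RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution}")
    keep = _RESOLUTIONS.index(resolution) + 1
    parts = [value.year, value.month, value.day, value.hour, value.minute][:keep]
    parts += _DEFAULTS[keep:]
    return datetime(*parts).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    stamp = datetime(2013, 6, 30, 14, 25, 7)
    print(truncate_date(stamp, "day"))
    print(truncate_date(stamp, "month"))
```

The truncated value would go into a separate field, so that a plain field facet (or pivot) on that field behaves like a date-gap bucket.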

Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
It all depends on your data model - tell us more about your data model. For example, how will users or applications query these documents and what will they expect to be able to do with the ID/key for the documents? How are you expecting to identify documents in your data model? -- Jack

Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
" data model - which includes what expectations you have about the unique ID/key for each document. So, for that first PDF file, what expectation (according to your data model) do you have for what its ID/key should be? -- Jack Krupansky -Original Message- From: archit2112 Sent

Re: RemoveDuplicatesTokenFilterFactory to avoid import duplicate values in multivalued field

2013-07-01 Thread Jack Krupansky
g" is inappropriate for this email list (or any email list.) -- Jack Krupansky -Original Message- From: tuedel Sent: Monday, July 01, 2013 8:15 AM To: solr-user@lucene.apache.org Subject: Re: RemoveDuplicatesTokenFilterFactory to avoid import duplicate values in multivalued field H

Re: Converting nested data model to solr schema

2013-07-01 Thread Jack Krupansky
to get parent or child IDs and then do a second query filtered by those IDs. And, yes, this only approximates the full power of an SQL join - but at a tiny fraction of the cost. -- Jack Krupansky -Original Message- From: adfel70 Sent: Monday, July 01, 2013 9:56 AM To: solr-user
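The second step of that two-query approach, filtering by the IDs returned from the first query, might be built on the client like this. It is a sketch only: the field name `parent_id` and the helper name are hypothetical, and very long ID lists could run into boolean clause or URL length limits.

```python
def ids_filter_query(field: str, ids: list[str]) -> str:
    """Build a filter query such as: parent_id:("a1" OR "b2")."""
    if not ids:
        raise ValueError("need at least one ID")
    quoted = []
    for doc_id in ids:
        # Quote each ID and escape backslashes and quotes inside it.
        escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"


if __name__ == "__main__":
    # IDs as they might come back from the first query (made-up values).
    print(ids_filter_query("parent_id", ["a1", "b2", "c3"]))
```

The resulting string would be passed as an `fq` parameter on the second request.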

Re: Distinct values in multivalued fields

2013-07-01 Thread Jack Krupansky
Unfortunately, update processors only "see" the new, fresh, incoming data, not any existing document data. This is a case where your best bet may be to read the document first and then merge your new value into the existing list of values. -- Jack Krupansky -Original Message-
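The read-then-merge step suggested there could look like the following sketch on the client side. Fetching the existing document and re-sending it are left out. The function name and the sample values are invented for illustration.

```python
def merge_distinct(existing: list[str], incoming: list[str]) -> list[str]:
    """Merge new values into an existing multivalued field, dropping duplicates.

    Order of first appearance is preserved.
    """
    merged: list[str] = []
    seen: set[str] = set()
    for value in list(existing) + list(incoming):
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


if __name__ == "__main__":
    current_values = ["red", "green"]   # values read from the stored document
    new_values = ["green", "blue"]      # values arriving with the update
    print(merge_distinct(current_values, new_values))
```

Note that a read-merge-write cycle like this is not atomic, so concurrent writers to the same document could still overwrite each other.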

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Jack Krupansky
You can write any function query in the field list of the "fl" parameter. Sounds like you want "termfreq": termfreq(field_arg,term) fl=id,a,b,c,termfreq(a,xyz) -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Monday, July 01, 2013 10
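A client would need to URL-encode that `fl` value before sending it. A minimal sketch using only the Python standard library follows. The field names `a`, `b`, `c` and the term `xyz` are the placeholders from the post, and the helper name is made up.

```python
from urllib.parse import urlencode, parse_qs


def termfreq_params(fields: list[str], tf_field: str, term: str) -> str:
    """Return an encoded query string whose fl includes termfreq(field,term)."""
    fl = ",".join(fields + [f"termfreq({tf_field},{term})"])
    return urlencode({"q": "*:*", "fl": fl})


if __name__ == "__main__":
    query_string = termfreq_params(["id", "a", "b", "c"], "a", "xyz")
    print(query_string)
    # Decoding shows the fl value exactly as written in the post.
    print(parse_qs(query_string)["fl"][0])
```

The encoded string would be appended to the select handler URL of whatever core or collection is being queried.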

Re: are fields stored or unstored by default xml

2013-07-01 Thread Jack Krupansky
"stored" and "indexed" both default to "true". This is legal: This detail will be in Early Access Release #2 of my book on Friday. -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Monday, July 01, 2013 2:21 PM To: solr-user@lucen

Re: are fields stored or unstored by default xml

2013-07-01 Thread Jack Krupansky
Correct - the field definitions inherit the attributes of the field type, and it is the field type that has the actual default values for indexed and stored (and other attributes.) -- Jack Krupansky -Original Message- From: Yonik Seeley Sent: Monday, July 01, 2013 3:56 PM To: solr

Re: How to re-index Solr & get term frequency within documents

2013-07-01 Thread Jack Krupansky
sources. But, yeah, as Otis says, "re-index" is really just a euphemism for deleting your Solr data directory and indexing from scratch from the original data sources. -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Monday, July 01, 2013 2:26 PM To:

Re: Solr 4.3 Pivot Performance Issue

2013-07-02 Thread Jack Krupansky
What is the nature of your degradation? -- Jack Krupansky -Original Message- From: solrUserJM Sent: Tuesday, July 02, 2013 4:22 AM To: solr-user@lucene.apache.org Subject: Solr 4.3 Pivot Performance Issue Hi There, I notice with the upgrade from solr 4.0 to solr 4.3 that we had a

Re: need distance in miles not in kilometers

2013-07-02 Thread Jack Krupansky
Simply multiply by the number of miles per kilometer, 0.621371: fl=_dist_:mul(geodist(),0.621371) -- Jack Krupansky -Original Message- From: irshad siddiqui Sent: Tuesday, July 02, 2013 5:19 AM To: solr-user@lucene.apache.org Subject: need distance in miles not in kilometers Hi, I
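If the conversion is done on the client instead of in the `fl` function, the same factor applies. A small sketch, using the 0.621371 constant from the post (the function name is made up):

```python
MILES_PER_KM = 0.621371  # conversion factor quoted in the post


def km_to_miles(kilometers: float) -> float:
    """Convert a distance in kilometers (as geodist() returns) to miles."""
    return kilometers * MILES_PER_KM


if __name__ == "__main__":
    print(km_to_miles(100.0))
```

Doing it in `fl` as shown in the post keeps the conversion in one place, while doing it client-side avoids the extra function call per document. Either gives the same number.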

Re: Converting nested data model to solr schema

2013-07-02 Thread Jack Krupansky
It sounds like 4.4 will have an RC next week, so the prospects for block join in 4.4 are kind of dim. I mean, such a significant feature should have more than a few days to bake before getting released. But... who knows what Yonik has planned! -- Jack Krupansky -Original Message

Re: Newbie SolR - Need advice

2013-07-02 Thread Jack Krupansky
Start with the Solr Tutorial. http://lucene.apache.org/solr/tutorial.html -- Jack Krupansky -Original Message- From: fabio1605 Sent: Tuesday, July 02, 2013 11:16 AM To: solr-user@lucene.apache.org Subject: Newbie SolR - Need advice Hi we have a MSSQL Server which is just getting

Re: Newbie SolR - Need advice

2013-07-02 Thread Jack Krupansky
Consider DataStax Enterprise - it combines Cassandra for NoSql data storage with Solr for indexing - fully integrated. http://www.datastax.com/ -- Jack Krupansky -Original Message- From: fabio1605 Sent: Tuesday, July 02, 2013 12:44 PM To: solr-user@lucene.apache.org Subject: Re

Re: How to query Solr for empty field or specific value

2013-07-02 Thread Jack Krupansky
*&fq=((*:* -color.not_null:[* TO *]) OR color:blue) -- Jack Krupansky -Original Message- From: Van Tassell, Kristian Sent: Tuesday, July 02, 2013 3:47 PM To: solr-user@lucene.apache.org Subject: How to query Solr for empty field or specific value Hello, I'm using Solr 4.2 and am trying to get a s
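The "missing or equal to a value" filter has a general shape that can be generated on the client. This sketch uses the same field for both clauses, which differs from the `color.not_null` name in the post. The helper name is hypothetical and the value is assumed to need no escaping.

```python
def missing_or_value_fq(field: str, value: str) -> str:
    """Build a filter matching docs where the field is absent OR equals value.

    The (*:* -field:[* TO *]) clause selects every document and then
    subtracts those that have any value in the field.
    """
    return f"((*:* -{field}:[* TO *]) OR {field}:{value})"


if __name__ == "__main__":
    print(missing_or_value_fq("color", "blue"))
```

The leading `*:*` matters: a purely negative clause nested inside parentheses has nothing to subtract from and would match nothing.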

Re: How to show just the parent domains from results in Solr

2013-07-02 Thread Jack Krupansky
tom script with the Stateless Script update processor. My book has examples for URL Classify. -- Jack Krupansky -Original Message- From: A Geek Sent: Tuesday, July 02, 2013 1:47 PM To: solr user Subject: How to show just the parent domains from results in Solr hi All, I've indexed

Re: Partial Matching in both query and field

2013-07-02 Thread Jack Krupansky
You will need to set q.op to "OR", and... use a field type that has the autoGeneratePhraseQueries attribute set to "false". -- Jack Krupansky -Original Message- From: James Bathgate Sent: Tuesday, July 02, 2013 5:10 PM To: solr-user@lucene.apache.org Subject: Part

Re: Partial Matching in both query and field

2013-07-02 Thread Jack Krupansky
Ahhh... you put autoGeneratePhraseQueries="false" on the field - but it needs to be on the field type. You can see from the parsed query that it generated the phrase. -- Jack Krupansky -Original Message- From: James Bathgate Sent: Tuesday, July 02, 2013 5:35 PM To:
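One way to check where the attribute actually lives is to resolve a field to its field type and read the setting from there. The sketch below assumes a made-up schema fragment (the names "text_partial" and "title" are placeholders) and uses only the standard library XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment: the attribute sits on the fieldType, not the field.
SCHEMA = """<schema name="example" version="1.5">
  <types>
    <fieldType name="text_partial" class="solr.TextField"
               autoGeneratePhraseQueries="false"/>
  </types>
  <fields>
    <field name="title" type="text_partial" indexed="true" stored="true"/>
  </fields>
</schema>"""


def phrase_query_setting(schema_xml, field_name):
    """Return autoGeneratePhraseQueries as declared on the field's type.

    Returns None when the field type does not set the attribute.
    """
    root = ET.fromstring(schema_xml)
    field = root.find(f".//field[@name='{field_name}']")
    field_type = root.find(f".//fieldType[@name='{field.get('type')}']")
    return field_type.get("autoGeneratePhraseQueries")


setting = phrase_query_setting(SCHEMA, "title")
# Request side of the advice in this thread: default operator set to OR.
request_params = {"q": "title:(red widget)", "q.op": "OR"}
```

If the function returns None for a field, the attribute was probably placed on the field element, which is the situation described in the reply.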

Re: Newbie SolR - Need advice

2013-07-03 Thread Jack Krupansky
Design your own application layer for both indexing and query that knows about both SQL and Solr. Give it a REST API and then your client applications can talk to your REST API and not have to care about the details of Solr or SQL. That's the best starting point. -- Jack Krup

Re: Use case indexed="false" stored="false" field

2013-07-03 Thread Jack Krupansky
to undefined fields. In other words, you are telling Solr that it is okay to have inputs for these fields - simply ignore them. But... you could still have update processors that look at the values of "ignored" fields and maybe assign them to other, non-ignored fields. -- Jack

Re: Search for string ending with question mark

2013-07-03 Thread Jack Krupansky
nce it is a wildcard character. Yes, string_field:*\? should match any string field that ends with a "?". -- Jack Krupansky -Original Message- From: JZ Sent: Wednesday, July 03, 2013 10:59 AM To: solr-user@lucene.apache.org Subject: Search for string ending with question mark
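A small sketch of the escaping involved, assuming the query is built client-side in Python. The helper names are hypothetical; the only claim taken from the reply is that a literal "?" must be backslash-escaped because it is otherwise a wildcard.

```python
def escape_wildcards(text):
    """Backslash-escape the wildcard characters ? and * (and backslash itself)."""
    return text.replace("\\", "\\\\").replace("?", "\\?").replace("*", "\\*")


def ends_with_question_mark_query(field):
    """Query for string values in `field` that end with a literal '?'.

    The leading * stays a wildcard; only the trailing ? is escaped.
    """
    return f"{field}:*" + escape_wildcards("?")


query = ends_with_question_mark_query("string_field")
```

Printing query shows the same expression given in the reply: the field name, a colon, a wildcard star, and an escaped question mark.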

Re: unused fields in Solr schema.xml increase the index size

2013-07-03 Thread Jack Krupansky
view differences. -- Jack Krupansky -Original Message- From: Ali, Saqib Sent: Wednesday, July 03, 2013 11:55 AM To: solr-user@lucene.apache.org Subject: unused fields in Solr schema.xml increase the index size Hello all, Do unused fields in Solr schema.xml increase the size of the

Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Jack Krupansky
phrases, and there is no scoring difference whether a term occurs once or a thousand times in that field for each document. A lot less information needs to be stored in the index. -- Jack Krupansky -Original Message- From: Ali, Saqib Sent: Wednesday, July 03, 2013 10:31 PM To: solr-user

Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Jack Krupansky
Yes, but it is simply doing an AND or OR of the individual terms - no phrases or implied ordering of the terms. -- Jack Krupansky -Original Message- From: Ali, Saqib Sent: Thursday, July 04, 2013 12:52 AM To: solr-user@lucene.apache.org Subject: Re: omitTermFreqAndPositions="tru

Re: omitTermFreqAndPositions="true" in easy English, please?

2013-07-03 Thread Jack Krupansky
Oops... I wasn't reading carefully enough - frequencies and positions only relate to tokenized fields (text) - not string fields. That doesn't impact your ability to do AND and OR of discrete string terms of a multivalued string field. -- Jack Krupansky -Original Message-

Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Jack Krupansky
ew feature/improvement. -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Thursday, July 04, 2013 9:45 AM To: solr-user@lucene.apache.org Subject: Total Term Frequency per ResultSet in Solr 4.3 ? Hi , I have lots of crawled data, indexed in my Solr (4.3.0) and lets say user

Re: Find related words

2013-07-04 Thread Jack Krupansky
You can take a look at the MoreLikeThis/Find Similar feature. That gives you an approximation, but using documents rather than discrete terms. You would have to write a custom component of your own based on logic from MLT. -- Jack Krupansky -Original Message- From: Dotan Cohen Sent
