Re: RealTimeGetHandler doesn't retrieve documents

2015-11-19 Thread Jack Krupansky
Do the failing IDs have any special characters that might need to be escaped? Can you find the documents using a normal query on the unique key field? -- Jack Krupansky On Thu, Nov 19, 2015 at 10:27 AM, Jérémie MONSINJON < jeremie.monsin...@gmail.com> wrote: > Hello everyone ! >

Re: Search with very large boolean filter

2015-11-20 Thread Jack Krupansky
IDs in use during a particular interval of time? -- Jack Krupansky On Fri, Nov 20, 2015 at 4:50 PM, jichi wrote: > Hi, > > I am using Solr 4.7.0 to search text with an id filter, like this: > > id:(100 OR 2 OR 5 OR 81 OR 10 ...) > > The number of IDs in the boolean fi

Re: Querying nested datastructures

2015-11-24 Thread Jack Krupansky
The primary recommendation is that you flatten nested documents. That means one Solr document per cpc, not multivalued. As always, queries should drive your data model, so please specify what a typical query might be like, in plain English. -- Jack Krupansky On Tue, Nov 24, 2015 at 4:39 AM

Re: Range Query on a language specific field

2015-11-24 Thread Jack Krupansky
I'm not sure how useful it will be. -- Jack Krupansky On Tue, Nov 24, 2015 at 4:06 AM, Manohar Sripada wrote: > I have a requirement where I need to be able to query on a field (say > "salary"). This field contains data in Chinese. > > Is it possible in Solr to do a ra

Re: [Edismax] * escaping

2015-11-25 Thread Jack Krupansky
Yeah, this stuff is poorly documented, not very intuitive, and the terminology is poorly designed in the first place, so it's completely expected to easily get confused by it. Not even a mention of it in the Solr reference guide. -- Jack Krupansky On Wed, Nov 25, 2015 at 4:39 AM, Aless

Re: Difference in query behavior.

2015-11-30 Thread Jack Krupansky
The mm parameter or default operator logic only applies to the top level of the query. Once you get nested in parentheses below the top level, Solr/Lucene reverts to the default of the OR (SHOULD) operator. -- Jack Krupansky On Mon, Nov 30, 2015 at 5:45 AM, Modassar Ather wrote: > Hi, >

Re: Synonyms in Search Results and More Accurate Matches

2015-12-01 Thread Jack Krupansky
recall (even the most remote partial match to avoid missing any documents) with a much higher boost for exact matches. -- Jack Krupansky On Tue, Dec 1, 2015 at 10:10 AM, Erik Hatcher wrote: > One technique that works well is to use copyField to end up with two > indexed fields, on

Re: Stop adding content in Solr through /update URL

2015-12-04 Thread Jack Krupansky
Never made it into CHANGES.txt either. Not part of any patch either. Appears to have been secretly committed as a part of SOLR-6787 (Blob API) via Revision *1650448 <http://svn.apache.org/viewvc?view=revision&revision=1650448>* in Solr 5.1. -- Jack Krupansky On Fri, Dec 4, 2015 a

Re: capacity of storage a single core

2015-12-08 Thread Jack Krupansky
constantly re-read portions of the index into memory. The practical limit for documents is not per core or number of cores but across all cores on the node since it is mostly a memory limit and the available CPU resources for accessing that memory. -- Jack Krupansky On Tue, Dec 8, 2015 at 8:57 AM

Re: capacity of storage a single core

2015-12-09 Thread Jack Krupansky
monly. And, yes, each app has its own latency requirements. The purpose of a general rule is to generally avoid unhappiness, but if you have an appetite and tolerance for unhappiness, then go for it. Replica vs. shard? They're basically the same - a replica is a copy of a shard. -- Jack Kr

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Jack Krupansky
You can also use Solr Cell to send entire PDF or office documents: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika -- Jack Krupansky On Wed, Dec 9, 2015 at 3:09 AM, subinalex wrote: > Hi, > > I am a solr newbie,just got a quick

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-11 Thread Jack Krupansky
in a separate table (use the same partition key to assure that the join will be more efficient by being on the same node.) -- Jack Krupansky On Fri, Dec 11, 2015 at 6:21 AM, Andrea Gazzarini wrote: > Hi Vikram, > sounds like you're using those "dynamic" fields only for visua

Re: Help Indexing Large File

2015-12-14 Thread Jack Krupansky
and then index the raw text. -- Jack Krupansky On Mon, Dec 14, 2015 at 12:04 PM, Antelmo Aguilar wrote: > Hello, > > I am trying to index a very large file in Solr (around 5GB). However, I > get out of memory errors using Curl. I tried using the post script and I > had some

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
You would need to define an alternate field which copied a base field but then had the desired alternate similarity, using SchemaSimilarityFactory. See: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements -- Jack Krupansky On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan wrote

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
same things as well. -- Jack Krupansky On Tue, Dec 15, 2015 at 2:42 PM, Chris Hostetter wrote: > > : Sweetspot does require reindexing but is that the only one? I have not > : investigated some exotic implementations, anyone to confirm sweetspot is > : the only one? In that case you

Re: Partial sentence match with block join

2015-12-15 Thread Jack Krupansky
ink of the company as being named "Apple Computer" even though they dropped "Computer" from the name back in 2007. Also, it is "Inc.", not "Company", so a proper search would be for "Apple Inc." or the old "Apple Computer, Inc." -- Jack Kr

Re: Solr High Availability

2015-12-15 Thread Jack Krupansky
Solr Cloud provides HA when you configure at least two replicas for each shard and have at least 3 zookeepers. That's it. No deck or detail document is needed. -- Jack Krupansky On Tue, Dec 15, 2015 at 9:07 PM, wrote: > Hi Team, > > Can you help me in understanding in achieving

Re: Solr High Availability

2015-12-15 Thread Jack Krupansky
There is no HA with a single replica for each shard. Replication factor must be at least 2 for HA. -- Jack Krupansky On Wed, Dec 16, 2015 at 12:38 AM, Peter Tan wrote: > Hi Jack, What happens when there is only one replica setup? > > On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky

Re: 43sec commit duration - blocked by index merge events?

2015-02-13 Thread Jack Krupansky
that soft commit waits for background merges! (Hoss??) -- Jack Krupansky On Fri, Feb 13, 2015 at 4:47 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Check > http://search-lucene.com/?q=commit+wait+block&fc_type=mail+_hash_+user > > e.g. http://search-

Re: Solr - Mahout

2015-02-13 Thread Jack Krupansky
There is no recommendation built into Solr itself, but you might get some good ideas from this presentation: http://www.slideshare.net/treygrainger/building-a-real-time-solrpowered-recommendation-engine -- Jack Krupansky On Fri, Feb 13, 2015 at 8:33 AM, wrote: > Sir , >I need to kno

Re: Question about session affinity and SolrCloud

2015-02-14 Thread Jack Krupansky
oss users, so a given query is likely to have been queried recently by another user. -- Jack Krupansky On Sat, Feb 14, 2015 at 3:39 PM, jaime spicciati wrote: > All, > This is my current understanding of how SolrCloud load balancing works... > > Within SolrCloud, for a cluster with more

Re: Solr 4.8.1 : Response Code 500 when creating the new request handler

2015-02-15 Thread Jack Krupansky
t in invariants, but also in the actual request, which is a contradiction in terms - what is your actual intent? This isn't the cause of the exception, but does raise questions of what you are trying to do. 4. Why don't you have a q parameter for the actual query? -- Jack Krupansky On

Re: AND query not working on stopwords as expected

2015-02-16 Thread Jack Krupansky
time when they are not at either end of the query. This way, queries such as "to be or not to be", "vitamin a", and "the office" can still provide meaningful and precise matches even as stop words are generally ignored. -- Jack Krupansky On Mon, Feb 16, 2015 at 4

Re: AND query not working on stopwords as expected

2015-02-16 Thread Jack Krupansky
ueries with operators and the case of a leading or trailing stopword. The old Lucid query parser did have better support for queries with stop words, but that's no longer available in their current product. -- Jack Krupansky On Mon, Feb 16, 2015 at 8:16 PM, Alexandre Rafalovitch wrote:

Re: How to achieve lemmatization for english words in Solr 4.10.2

2015-02-18 Thread Jack Krupansky
Please provide a few examples that illustrate your requirements. Specifically, requirements that are not met by the existing Solr stemming filters. What is your specific goal? -- Jack Krupansky On Wed, Feb 18, 2015 at 10:50 AM, dinesh naik wrote: > Hi, > IS there a way to achieve lemmati

Re: edismax removes query string: (pg_int:-1) becomes ()

2015-02-21 Thread Jack Krupansky
he edismax query parser has a few too many parsing heuristics, causing way too many odd combinations that are not exhaustively tested. -- Jack Krupansky On Sat, Feb 21, 2015 at 5:43 PM, Tang, Rebecca wrote: > Hi there, > > I have a field pg_int which is number of pages stored as intege

Re: more like this and term vectors

2015-02-23 Thread Jack Krupansky
It's never helpful when you merely say that it "did not work" - detail the symptom, please. Post both the query and the response. As well as the field and type definitions for the fields for which you expected term vectors - no term vectors are enabled by default. -- Jack Krupans

Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
Is it really a string field - as opposed to a text field? Show us the field and field type. Besides, if it really were a "raw" name, wouldn't that be a capital "B"? -- Jack Krupansky On Mon, Feb 23, 2015 at 6:52 PM, Arun Rangarajan wrote: > I have a string fi

Re: Special character and wildcard matching

2015-02-23 Thread Jack Krupansky
eyword tokenizer and then filter it for lower case, such as when the user query might have a capital "B". String field is most appropriate when the field really is 100% raw. -- Jack Krupansky On Mon, Feb 23, 2015 at 7:37 PM, Arun Rangarajan wrote: > Yes, it is a string field and not

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
Please post the info I requested - the exact query, and the Solr response. -- Jack Krupansky On Tue, Feb 24, 2015 at 12:45 PM, Arun Rangarajan wrote: > In our case, the lower-casing is happening in a custom Java indexer code, > via Java's String.toLowerCase() method. > > I

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
u provided in this thread. -- Jack Krupansky On Tue, Feb 24, 2015 at 2:35 PM, Arun Rangarajan wrote: > Exact query: > /select?q=raw_name:beyonce*&wt=json&fl=raw_name > > Response: > > { "responseHeader": {"status": 0,"QTime": 0,

Re: Special character and wildcard matching

2015-02-24 Thread Jack Krupansky
It's a string field, so there shouldn't be any analysis. (read back in the thread for the field and field type.) -- Jack Krupansky On Tue, Feb 24, 2015 at 3:19 PM, Alexandre Rafalovitch wrote: > What happens if the query does not have wildcard expansion (*)? If the > behavior

Re: Problem with queries that includes NOT

2015-02-25 Thread Jack Krupansky
As a general proposition, your first stop with any query interpretation questions should be to add the debugQuery=true parameter and look at the parsed_query in the query response which shows how the query is really interpreted. -- Jack Krupansky On Wed, Feb 25, 2015 at 8:21 AM, wrote: >
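A minimal sketch of adding the debug parameter to a request, assuming a hypothetical Solr instance at localhost:8983 with a collection named mycollection (neither appears in the thread). It only builds the URL; the debug section of the response to such a request shows how the query was parsed.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a select URL that asks Solr to include debug output."""
    params = {"q": query, "debugQuery": "true", "wt": "json"}
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, collection name, and query, for illustration only.
url = debug_query_url(
    "http://localhost:8983/solr/mycollection",
    "title:solr -title:lucene",
)
print(url)
```

Pasting the printed URL into a browser or curl is usually enough to inspect the parsed form of a query that behaves unexpectedly.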

Re: Add fields without manually editing Schema.xml.

2015-02-25 Thread Jack Krupansky
Solr also now has a schema API to dynamically edit the schema without the need to manually edit the schema file: https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-AddaDynamicFieldRule -- Jack Krupansky On Wed, Feb 25, 2015 at 3:15 PM, Vishal Swaroop wrote: > Thanks a

Re: Unable to find query result in solr 5.0.0

2015-02-26 Thread Jack Krupansky
. Please confirm which doc you were reading for the tutorial steps. -- Jack Krupansky On Thu, Feb 26, 2015 at 6:17 AM, rupak wrote: > Hi, > > I am new in Solr and using Solr 5.0.0 search server. After installing when > I’m going to search any keyword in solr 5.0.0 it does not give any re

Re: qt.shards in solrconfig.xml

2015-02-26 Thread Jack Krupansky
s the qt.shards parameter as suggested, to re-emphasize to people that if they want to use a custom handler in distributed mode, then they will most likely need this parameter. -- Jack Krupansky On Thu, Feb 26, 2015 at 11:28 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > Hel

Re: Leading Wildcard Support (ReversedWildcardFilterFactory)

2015-02-26 Thread Jack Krupansky
Please post your field type... or at least confirm a comparison to the example in the javadoc: http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html -- Jack Krupansky On Thu, Feb 26, 2015 at 2:38 PM, jaime spicciati wrote: > All, >

Re: Leading Wildcard Support (ReversedWildcardFilterFactory)

2015-02-26 Thread Jack Krupansky
Most of the magic is done internal to the query parser which actually inspects the index analyzer chain when a leading wildcard is present. Look at the parsed_query in the debug response, and you should see that special prefix query. -- Jack Krupansky On Thu, Feb 26, 2015 at 3:49 PM, jaime

Re: Encrypt Data in SOLR

2015-02-27 Thread Jack Krupansky
You could simply hash the value before sending it to Solr and then hash the user query before sending it to Solr as well. Do you need or want only exact matches, or do you need keyword search, wildcards, etc? -- Jack Krupansky On Fri, Feb 27, 2015 at 4:38 PM, Alexandre Rafalovitch wrote
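A rough sketch of the hashing idea, assuming SHA-256 and a trim-and-lowercase normalization step (both are illustrative choices, not something the thread specifies). The same function is applied to the field value at index time and to the user's term at query time, so only exact matches on the whole value can succeed.

```python
import hashlib


def hash_value(value):
    """Return a hex SHA-256 digest of the normalized value."""
    # Normalization is an assumption: without it, case or whitespace
    # differences would produce different hashes and no match.
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


indexed = hash_value("Alice@Example.com")    # value stored in the Solr document
queried = hash_value(" alice@example.com ")  # value used as the query term
print(indexed == queried)
```

Because the digest of a prefix bears no relation to the digest of the full value, keyword search and wildcards are not possible on a field stored this way.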

Re: Search over a multiValued field

2015-03-03 Thread Jack Krupansky
just trying to match the product name and availability. -- Jack Krupansky On Tue, Mar 3, 2015 at 4:51 PM, Tom Devel wrote: > Hi, > > I am running Solr 5.0.0 and have a question about proximity search and > multiValued fields. > > I am indexing xml files of the following form

Re: DocumentAnalysisRequestHandler

2015-03-12 Thread Jack Krupansky
citly registered (refer to SOLR-6792)*". IOW, remove the XML element from your solrconfig. As far as the document analysis request handler, that should still be fine. Are you encountering some problem? The first log line you gave is just an INFO - information only, not a problem. -- Jack Krupans

Distributed IDF performance

2015-03-13 Thread Jack Krupansky
le now using Distributed IDF as their default? I'm not currently using this, but the existing doc and Jira is too minimal to offer guidance as requested above. Mostly I'm just curious. Thanks. -- Jack Krupansky

Re: Parsing error on space

2015-03-13 Thread Jack Krupansky
sted query term with "\u0020". -- Jack Krupansky On Fri, Mar 13, 2015 at 2:37 AM, Rajesh wrote: > Hi, > > I want to retrieve the parent document which contain "Test Street" in > street > field or if any of it's child contain "Test Street" in

Re: Distributed IDF performance

2015-03-13 Thread Jack Krupansky
Oops... I said "StatsInfo" and that should have been "StatsCache" (""). -- Jack Krupansky On Fri, Mar 13, 2015 at 6:04 PM, Anshum Gupta wrote: > There's no rough formula or performance data that I know of at this point. > About the guidance, if you wa

Re: discrepancy between LuceneQParser and ExtendedDismaxQParser

2015-03-16 Thread Jack Krupansky
There was a Solr release with a bug that required that you put a space between the left parenthesis and the "*:*". The edismax parsed query here indicates that the "*:*" has not parsed properly. You have "area", but in your jira you had a range query. -- Jack Krupan

Re: Re[2]: discrepancy between LuceneQParser and ExtendedDismaxQParser

2015-03-17 Thread Jack Krupansky
Great, glad to hear it! One last question: What release of Solr are you using? -- Jack Krupansky On Tue, Mar 17, 2015 at 11:43 AM, Arsen wrote: > Hello Jack, > > Jack, you made "my day" for me. > > Indeed, when I inserted space between "(" and "*:*

Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Jack Krupansky
. I think it's worth a Jira - text types should use language codes, not country codes. -- Jack Krupansky On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru wrote: > Hi, > > First of all, a bit of a disclaimer: I am not a Czech language speaker, at > all. > > We are using Sol

Re: Solr Unexpected Query Parser Exception

2015-03-20 Thread Jack Krupansky
Which query parser are you using? The dismax query parser does not support wild cards or "*:*". Either way, the error message is unhelpful - worth filing a Jira. -- Jack Krupansky On Fri, Mar 20, 2015 at 7:21 AM, Vishnu Mishra wrote: > Hi, I am using solr 4.10.3 and doing dist

Re: SOLR indexing strategy

2015-03-20 Thread Jack Krupansky
have a slice of the fields. Then separate Solr clusters could be used for each of the slices. -- Jack Krupansky On Fri, Mar 20, 2015 at 7:12 AM, varun sharma wrote: > Requirements of the system that we are trying to build are for each date > we need to create a SOLR index containing abo

Re: SOLR indexing strategy

2015-03-21 Thread Jack Krupansky
only the native keys for the matching records, and then you would do a database lookup in your bulk storage engine directly by those keys to fetch just the records that match the query results. What do your queries tend to look like? -- Jack Krupansky On Sat, Mar 21, 2015 at 5:36 AM, varun sharma
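The two-step lookup described here can be sketched as follows, with an in-memory SQLite table standing in for the bulk storage engine. The table name, column names, and key values are all hypothetical, and the list of keys is hard-coded where a real application would read it from the Solr response.

```python
import sqlite3


def fetch_records(conn, keys):
    """Fetch full records from bulk storage for the keys Solr returned."""
    if not keys:
        return []
    placeholders = ",".join("?" for _ in keys)
    sql = (
        "SELECT id, payload FROM records WHERE id IN ("
        + placeholders
        + ") ORDER BY id"
    )
    return conn.execute(sql, list(keys)).fetchall()


# Hypothetical table standing in for the bulk storage engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("a1", "first"), ("b2", "second"), ("c3", "third")],
)

# Pretend these keys came back from a Solr query that returned only the key field.
matching_keys = ["c3", "a1"]
rows = fetch_records(conn, matching_keys)
print(rows)
```

Keeping only the keys in Solr keeps the index small, at the cost of a second round trip per result page.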

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
? Also be careful to be clear about using the Solr term "shard" (a slice, across all replica nodes) as distinct from the Elasticsearch term "shard" (a single slice of an index for a single replica, analogous to a Solr "core".) -- Jack Krupansky On Tue, Mar 24, 2015 at 9

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
multi-tenant cluster, and if there are a large number of tenants or even a moderate number of large tenants, you can't expect them to all run reasonably on a relatively small cluster. Think about scalability. -- Jack Krupansky On Tue, Mar 24, 2015 at 1:22 PM, Ian Rose wrote: > Let me gi

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
Don't confuse customers and tenants. -- Jack Krupansky On Tue, Mar 24, 2015 at 2:24 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Sorry Jack. That doesn't scale when you have millions of customers. And > these are good problems to have! > > On Tue, Ma

Re: rough maximum cores (shards) per machine?

2015-03-24 Thread Jack Krupansky
s entity controlling the configuration of a single (Solr) server is a recipe for disaster. Solr works well if there is an architect for the system. Ever hear the old saying "Too many cooks spoil the stew"? -- Jack Krupansky On Tue, Mar 24, 2015 at 3:54 PM, Toke Eskildsen wrote:

Re: rough maximum cores (shards) per machine?

2015-03-25 Thread Jack Krupansky
s for lazy-loading of cores. That may work for you with hundreds (thousands?!) of cores/collections for tenants who are mostly idle or dormant, but if the server is running long enough, it may build up a lot of memory usage for collections that were active but have gone idle after days or weeks.

Re: Problem with Terms Query Parser

2015-03-25 Thread Jack Krupansky
That should work. Check to be sure that you really are running Solr 5.0. Was it an old version of trunk or the 5x branch before last August when the terms query parser was added? -- Jack Krupansky On Tue, Mar 24, 2015 at 5:15 PM, Shamik Bandopadhyay wrote: > Hi, > > I'm tryin

Re: Can SOLR custom analyzer access another field's value?

2015-03-27 Thread Jack Krupansky
? -- Jack Krupansky On Fri, Mar 27, 2015 at 12:22 PM, Alex Sylka wrote: > I am trying to write a custom analyzer, whose execution is determined by > the value of another field within the document. > > For example if the locale field in the document has 'de' as the value, then >

Re: Structured and Unstructured data indexing in SolrCloud

2015-03-29 Thread Jack Krupansky
The first step is to work out the queries that you wish to perform - that will determine how the data should be organized in the Solr schema. -- Jack Krupansky On Sun, Mar 29, 2015 at 4:04 PM, Vijay Bhoomireddy < vijaya.bhoomire...@whishworks.com> wrote: > Hi, > > > >

Re: how do you replicate solr-cloud between datacenters?

2015-03-30 Thread Jack Krupansky
That's an open issue. See: https://issues.apache.org/jira/browse/SOLR-6273 -- Jack Krupansky On Mon, Mar 30, 2015 at 5:45 PM, Timothy Ehlers wrote: > Can you use /replication ??? How would you do this between datacenters? > > -- > Tim Ehlers >

Re: Stopwords magic

2015-03-31 Thread Jack Krupansky
Use the Solr Admin UI analysis page to see how the text is analyzed at both index and query time. My e-book does have more narrative and examples for stop word processing: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html -- Jack

Re: Customzing Solr Dedupe

2015-04-01 Thread Jack Krupansky
ld by field comparison to all existing documents. -- Jack Krupansky On Wed, Apr 1, 2015 at 6:35 AM, thakkar.aayush wrote: > I'm facing challenges using de-duplication of Solr documents. > > De-duplicate is done using TextProfileSignature with following parameters: > field1, fi

Re: Alphanumeric Wild card search

2015-04-02 Thread Jack Krupansky
ually happened to match the full indexing filtering. This is a limitation of Solr. You just have to learn to live with it. Or... don't use the word delimiter filter when you need to be able to do wildcards of multi-part terms. -- Jack Krupansky On Thu, Apr 2, 2015 at 3:43 AM, Palagiri,

Re: edismax operators

2015-04-02 Thread Jack Krupansky
distribute to each term within the nested query. They don't magically distribute to all nested queries. Let's see your full set of query parameters, both on the request and in solrconfig. -- Jack Krupansky On Thu, Apr 2, 2015 at 7:12 AM, Mahmoud Almokadem wrote: > Hello, >

Re: Question regarding enablePositionIncrements

2015-04-02 Thread Jack Krupansky
Position increments were considered problematic, especially for highlighting. Did you get this for the stop filter? There was a Jira for this - check CHANGES.TXT and the Jira for details. For some discussion, see: https://issues.apache.org/jira/browse/SOLR-6468 -- Jack Krupansky On Thu, Apr 2

Re: Question regarding enablePositionIncrements

2015-04-02 Thread Jack Krupansky
That's my understanding - but use the Solr Admin UI analysis page to confirm exactly what happens, for both index and query analysis. -- Jack Krupansky On Thu, Apr 2, 2015 at 10:04 AM, Aman Tandon wrote: > Hi Jack, > > I read that jira, i understand the concern of heaven. >

Re: edismax operators

2015-04-02 Thread Jack Krupansky
the committers sort out whether it is really a bug or simply needs better doc for its expected behavior on this specific issue. -- Jack Krupansky On Thu, Apr 2, 2015 at 1:02 PM, Mahmoud Almokadem wrote: > Thanks all for your response, > > But the parsed_query and number of results s

Re: WordDelimiterFilterFactory - tokenizer question

2015-04-05 Thread Jack Krupansky
You have to tell the filter what types of tokens to generate - words, numbers. You told it to generate... nothing. You did tell it to preserve the original, unfiltered token though, which is fine. -- Jack Krupansky On Sun, Apr 5, 2015 at 3:39 AM, Mike L. wrote: > Solr User Group, >

Re: search on special characters

2015-04-08 Thread Jack Krupansky
this punctuation. You can specify a character type map to treat specific characters as letters. See the doc. (or the examples in my e-book.) -- Jack Krupansky On Wed, Apr 8, 2015 at 2:50 AM, avinash09 wrote: > not able to search on special characters like . ,_ > > my query > http

Re: Keeping frequently changing fields out of SOLR

2015-04-08 Thread Jack Krupansky
How much RAM do you have? Check whether your system is compute-bound or I/O-bound? If all or most of your index doesn't fit in the system memory available for file caching, you're asking for trouble. Is the indexing time also unacceptably slow, or just the query time? -- Jack Krupans

Re: SOLR searching

2015-04-08 Thread Jack Krupansky
, price_, and fill in the user-id when doing the query. -- Jack Krupansky On Wed, Apr 8, 2015 at 5:21 PM, Brian Usrey wrote: > I am extremely new to SOLR and am wondering if it is possible to do > something like the following. Basically I have been tasked with researching > SOLR to see

Re: Query regarding solr input from queue

2015-04-11 Thread Jack Krupansky
It would be better to implement such logic as a separate process - watching for events or reading from a stream, and then feeding discrete requests (or modest-sized batches of documents) to Solr in parallel with such processing. -- Jack Krupansky On Sat, Apr 11, 2015 at 1:49 AM, vishal dsouza
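A small sketch of the "modest-sized batches" part of this advice, assuming the separate process already has an iterable of documents read from the queue. The batch size and document shape are placeholders, and the actual call that posts each batch to Solr is deliberately left out.

```python
from itertools import islice


def batches(documents, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stream of documents read from a queue.
stream = ({"id": str(i)} for i in range(5))
for batch in batches(stream, 2):
    # A real feeder would send this batch to Solr here.
    print(len(batch), [doc["id"] for doc in batch])
```

Because the helper accepts any iterable, the same code works whether the source is a message queue consumer, a file reader, or a database cursor.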

Re: Bq Question - Solr 4.10

2015-04-11 Thread Jack Krupansky
get a score of 0.7/2 = 0.35. IOW, apply an additive boost of 1.0 and then a multiplicative boost of 0.5. -- Jack Krupansky On Sat, Apr 11, 2015 at 12:28 PM, Mike L. wrote: > Hello - > I have qf boosting setup and that works well and balanced across > different fields. >

Re: Problem related to filter on Zero value for DateField

2015-04-14 Thread Jack Krupansky
What does your main query look like? Normally we don't speak of "searching" with the fq parameter - it filters the results, but the actual searching is done via the main query with the q parameter. -- Jack Krupansky On Tue, Apr 14, 2015 at 4:17 AM, Ali Nazemian wrote: > Dea

Re: Indexing PDF and MS Office files

2015-04-14 Thread Jack Krupansky
as a bitmap image, so no text is extracted. -- Jack Krupansky On Tue, Apr 14, 2015 at 10:57 AM, Vijaya Narayana Reddy Bhoomi Reddy < vijaya.bhoomire...@whishworks.com> wrote: > Hi, > > I am trying to index PDF and Microsoft Office files (.doc, .docx, .ppt, > .pptx, .xlx, an

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Jack Krupansky
Check to see if there are any errors in the Solr log for jpg and zip files. Solr should do something for them - if not, file a Jira to suggest that it should, as an improvement. Zip should give a list of the enclosed files. Images should at least give the metadata. -- Jack Krupansky On Wed, Apr

Re: Correspondance table ?

2015-04-20 Thread Jack Krupansky
PI service layer could be the way to go. In any case, don't try to load too much work onto the Solr server itself. -- Jack Krupansky On Mon, Apr 20, 2015 at 7:32 AM, Bruno Mannina wrote: > Hi Alex, > > well ok but if I have a big table ? more than 10 000 entries ? > is it safe to do

Re: Boolean filter query not working as expected

2015-04-22 Thread Jack Krupansky
A purely negative sub-query is not supported by Lucene - you need to have at least one positive term, such as "*:*", at each level of sub-query. Try: ((*:* -(field:V1) AND -(field:V2)) AND -(field:V3)) -- Jack Krupansky On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh wrote: > I
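One way a client might apply this rule when assembling filter strings is sketched below. The helper name is made up, it flattens the exclusions into a single level rather than reproducing the nested form suggested above, and it does not escape special characters in the clauses.

```python
def exclude_all(clauses):
    """Build a sub-query that excludes every clause, anchored on *:*."""
    # The leading *:* is the positive term that a purely negative
    # sub-query would otherwise lack.
    negated = " ".join("-" + clause for clause in clauses)
    return "(*:* " + negated + ")"


print(exclude_all(["field:V1", "field:V2", "field:V3"]))
```

The resulting string can be used on its own or combined with other clauses, since it now carries its own match-all term.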

Re: TIKA OCR not working

2015-04-23 Thread Jack Krupansky
It's not clear if OCR would happen automatically in Solr Cell, or if changes to Solr would be needed. For Tika OCR info, see: https://issues.apache.org/jira/browse/TIKA-93 https://wiki.apache.org/tika/TikaOCR -- Jack Krupansky On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch

Re: Odp.: solr issue with pdf forms

2015-04-30 Thread Jack Krupansky
Or use a Solr update processor to scrub the source values. The regex pattern replacement processor could do the trick: http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html -- Jack Krupansky On Thu, Apr 30, 2015 at 11:17 AM, Erick

Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Jack Krupansky
en each virtual machine gets only a fairly tiny amount of SSD disk storage space? Just guessing here. A little clarification is in order. In any case, if you really only have such a limited amount of storage per node, that probably simply means that you need more nodes. -- Jack Krupansky On Thu, M

Re: Limit the documents for each shard in solr cloud

2015-05-07 Thread Jack Krupansky
number of shards times the replication factor. But then divided by shards per node if you do place more than one shard per node. -- Jack Krupansky On Thu, May 7, 2015 at 1:29 AM, Jilani Shaik wrote: > Hi, > > Is it possible to restrict number of documents per shard in Solr cloud? > > L
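Reading the truncated reply as an estimate of how many nodes a collection needs, the arithmetic can be sketched like this. Rounding up to a whole node is an added assumption, and the function and parameter names are illustrative only.

```python
import math


def nodes_needed(num_shards, replication_factor, shards_per_node=1):
    """Estimate node count: total cores divided by cores placed on each node."""
    total_cores = num_shards * replication_factor
    # Round up, since a partially filled node is still a node.
    return math.ceil(total_cores / shards_per_node)


print(nodes_needed(4, 2))     # one core per node
print(nodes_needed(4, 2, 3))  # up to three cores per node
```

With 4 shards and a replication factor of 2 there are 8 cores to place, so the answer depends entirely on how many of them each node is allowed to host.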

Re: Transactional Behavior

2015-05-12 Thread Jack Krupansky
Solr does have a rollback command, but it is an expert feature, and it is not so clear how it works in SolrCloud. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers and https://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22 -- Jack Krupansky On Tue, May 12, 2015

Re: Cannot get "uf" (user fields) to work

2015-05-14 Thread Jack Krupansky
goes into the q parameter, not the client app that is forming the overall Solr query request. -- Jack Krupansky On Thu, May 14, 2015 at 7:49 AM, Steven White wrote: > Hi Everyone, > > I'm trying to utilize "uf" but it doesn't work. My reading of it per: > >

Re: Problem with solr.LengthFilterFactory

2015-05-15 Thread Jack Krupansky
/solr-core/org/apache/solr/update/processor/TruncateFieldUpdateProcessorFactory.html -- Jack Krupansky On Fri, May 15, 2015 at 11:38 AM, Charles Sanders wrote: > Yes, that is what I am seeing. Looking in the code myself, I see no reason > for this behavior. That is why I assumed I was

Re: Problem with solr.LengthFilterFactory

2015-05-15 Thread Jack Krupansky
But... if your term is a string anyway, you could just use the keyword tokenizer. -- Jack Krupansky On Fri, May 15, 2015 at 4:06 PM, Charles Sanders wrote: > Shawn, > Thanks a bunch for working with me on this. > > I have deleted all records from my index. Stopped solr. Made t

Re: Wildcard/Regex Searching with Decimal Fields

2015-05-18 Thread Jack Krupansky
that is causing you to resort to pattern matching and wildcards? I can't wait to hear! I mean, if you simply want to match one of a set of numbers that are not in a consecutive range, try the OR operator. -- Jack Krupansky On Mon, May 18, 2015 at 11:20 AM, Todd Long wrote: > I'

Re: Problem with solr.LengthFilterFactory

2015-05-18 Thread Jack Krupansky
text field, so that Lucene still knows it as an unanalyzed string field? You need to delete the index and start over if you want to change the field types like that. -- Jack Krupansky On Mon, May 18, 2015 at 8:33 AM, Charles Sanders wrote: > Jack, > Thanks for the information. If I understan

Re: Problem with solr.LengthFilterFactory

2015-05-18 Thread Jack Krupansky
nizer was it? -- Jack Krupansky On Mon, May 18, 2015 at 12:21 PM, Charles Sanders wrote: > No, the field has always been text. And from the error, it's obviously > passing a very large token to the index, regardless of the tokenizer and > filter. > > So I guess I will have to tok

Re: Deduplication

2015-05-19 Thread Jack Krupansky
e the distribution step, but is that distribution to the leader, or distribution from leader to replicas for a shard? -- Jack Krupansky On Tue, May 19, 2015 at 9:01 AM, Shawn Heisey wrote: > On 5/19/2015 3:02 AM, Bram Van Dam wrote: > > I'm looking for a way to have Solr reject doc

Re: Suggestion on field type

2015-05-19 Thread Jack Krupansky
"double" (solr.TrieDoubleField) gives more precision See: https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/schema/TrieDoubleField.html -- Jack Krupansky On Tue, May 19, 2015 at 11:27 AM, Vishal Swaroop wrote: > Please suggest which numeric field type to use so t

Re: Term Frequency Calculation - Clarification

2015-05-20 Thread Jack Krupansky
Yes. tf is both 1 and 2 - tf is per document, which is 1 for the first document and 2 for the second document. See: http://lucene.apache.org/core/5_1_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html -- Jack Krupansky On Wed, May 20, 2015 at 6:13 AM, ariya bala wrote: >
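To illustrate that term frequency is counted per document, here is a toy calculation over two made-up documents (the original poster's documents are not shown in this snippet). It uses naive whitespace tokenization and reports the raw count, before any similarity-specific scaling.

```python
from collections import Counter


def term_frequency(text, term):
    """Count how often a term occurs in one document."""
    tokens = text.lower().split()
    return Counter(tokens)[term.lower()]


# Hypothetical documents: the term appears once in the first, twice in the second.
docs = ["solr search server", "solr indexes and solr queries"]
tfs = [term_frequency(doc, "solr") for doc in docs]
print(tfs)
```

Each document gets its own value, which is why a single term can have a tf of 1 and a tf of 2 in the same result set.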

Re: When is too many fields in "qf" is too many?

2015-05-20 Thread Jack Krupansky
pushed you to have documents with 1500 fields? Also, are these 1500 fields that are always populated, or are there really a larger number of different record types, each with a relatively small number of fields populated in a particular document? -- Jack Krupansky On Wed, May 20, 2015 at 8:27 AM, S

Re: Removing characters like '\n \n' from indexing

2015-05-26 Thread Jack Krupansky
. Again, the distinction is between indexed field values and stored field values. -- Jack Krupansky On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo wrote: > It is showing up in the search results. Just to confirm, does this > UpdateProcessor method remove the characters during index

Re: When is too many fields in "qf" is too many?

2015-05-28 Thread Jack Krupansky
horribly wrong. Focus on designing your app to exploit the capabilities of Solr, not to misuse them. In short, to answer the original question, more than a couple dozen fields in qf is indeed too many. More than a dozen raises a yellow flag for me. -- Jack Krupansky On Thu, May 28, 2015 at 8:13 A

Re: HW requirements

2015-05-28 Thread Jack Krupansky
. Solr and Lucene do not merely index a bulk blob of bytes, but semi-structured data, in the form of documents and fields. In some cases the indexed data can be smaller than the source data, but it can sometimes be larger as well. -- Jack Krupansky On Wed, May 27, 2015 at 12:33 PM, Sznajder

Re: When is too many fields in "qf" is too many?

2015-05-28 Thread Jack Krupansky
en in doubt, make your schema as clean and simple as possible. Simplicity over complexity. -- Jack Krupansky On Thu, May 28, 2015 at 12:06 PM, Erick Erickson wrote: > Gotta agree with Jack here. This is an insane number of fields, query > performance on any significant corpus will be "

Re: any changes about limitations on huge number of fields lately?

2015-05-30 Thread Jack Krupansky
small documents, not large documents. -- Jack Krupansky On Sat, May 30, 2015 at 3:05 PM, Erick Erickson wrote: > Nothing's really changed in that area lately. Your co-worker is > perhaps confusing the statement that "Solr has no a-priori limit on > the number of distinct

Re: Looking for doc on LimitTokenCountFilterFactory

2015-06-02 Thread Jack Krupansky
other than the Lucene limit of unique terms per segment, which is in the billions. Yeah, that should be more clearly documented. -- Jack Krupansky On Tue, Jun 2, 2015 at 10:29 AM, Steven White wrote: > Hi everyone > > I cannot find much useful info on LimitTokenCountFilterFactory other t

Re: Solr Atomic Updates

2015-06-02 Thread Jack Krupansky
://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents -- Jack Krupansky On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова wrote: > Hi! > > I'm using *SOLR 4.4.0* for searching in my project. > Now I am facing a problem of atomic updates in multiple cores. > From

Re: Solr Atomic Updates

2015-06-03 Thread Jack Krupansky
Explain a little about why you have separate cores, and how you decide which core a new document should reside in. Your scenario still seems a bit odd, so help us understand. -- Jack Krupansky On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова wrote: > Hi! > > Thanks for your qu
