Re: EdgeNGramFilterFactory for Chinese characters

2015-10-25 Thread Tomoko Uchida
> Will try to see if there is any way to manage it with only a single field? Of course, you can try to create a custom Tokenizer or TokenFilter that perfectly meets your needs. I would copy the source code of EdgeNGramTokenFilter and modify the incrementToken() method. That seems a reasonable way to me. incr

Re: missing in json facet does not work for stream?

2015-10-25 Thread Gopal Patwa
Docs are available in the Ref Guide for JSON Facet and the JSON Request API: https://cwiki.apache.org/confluence/display/solr/Faceted+Search https://cwiki.apache.org/confluence/display/solr/JSON+Request+API On Sun, Oct 25, 2015 at 7:08 PM, hao jin wrote: > Thanks, Yonik. > > When the shards parameter i

Re: EdgeNGramFilterFactory for Chinese characters

2015-10-25 Thread Zheng Lin Edwin Yeo
Hi Tomoko, Thank you for your recommendation. I wasn't in favour of using copyField at first to have 2 separate fields for English and Chinese tokens, as it not only increases the index size, but also slows down the performance of both indexing and querying. Will try to see if there is any way to

Re: Order of actions in Update request

2015-10-25 Thread Jamie Johnson
Yes, if they are in separate requests I imagine it would work, though I haven't tested it. I was wondering if there was a way to execute these actions in a single request and maintain order. On Oct 24, 2015 3:25 PM, "Shawn Heisey" wrote: > On 10/24/2015 5:21 AM, Jamie Johnson wrote: > > Looking at th

Re: missing in json facet does not work for stream?

2015-10-25 Thread hao jin
Thanks, Yonik. When the shards parameter is specified in a JSON facet query with the stream method, the result is still ordered by count by default. In our performance test with a total of 100,000,000 docs, the stream method performed best and the enum method did not work for field faceting. I saw JSON

Re: How to get the join data by multiple cores?

2015-10-25 Thread Mikhail Khludnev
Raised https://issues.apache.org/jira/browse/SOLR-8208 There are a lot of questions to discuss. On Thu, Oct 22, 2015 at 11:47 PM, Erick Erickson wrote: > Mikhail: > > Brilliant! Assuming we can get the "from" and "to" parameters out of > the query and, perhaps, the fromIndex (for cross-core) the

Re: DIH Caching with Delta Import

2015-10-25 Thread Erick Erickson
Have you considered using SolrJ instead of DIH? I've seen situations where that can make a difference for things like caching small tables at the start of a run, see: searchhub.org/2012/02/14/indexing-with-solrj/ Best, Erick On Sat, Oct 24, 2015 at 6:17 PM, Todd Long wrote: > Dyer, James-2 wrot

Re: Indexing multiple cores simultaneously

2015-10-25 Thread Erick Erickson
Let's back up a bit and ask what your primary goal is. Just indexing a bunch of stuff as fast as possible? By and large, I'd index to a single core with multiple threads rather than the approach you're taking (I'm assuming that there's a MERGEINDEXES somewhere in this process). You should be able t
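
A minimal sketch of the "single core, multiple threads" approach suggested above. It only shows the batching and threading structure: `RecordingSink` and its `send_batch` method are hypothetical stand-ins for whatever client actually posts documents to the core, and the thread count and batch size are assumed values, not recommendations from the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def chunked(docs, size):
    """Yield consecutive slices of docs, each with at most `size` items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


class RecordingSink:
    """Hypothetical stand-in for a client that sends one batch to one core."""

    def __init__(self):
        self._lock = Lock()
        self.received = []

    def send_batch(self, batch):
        # A real client would issue an update request here.
        with self._lock:
            self.received.extend(batch)


def index_with_threads(docs, sink, threads=5, batch_size=1000):
    """Send all batches to the same sink from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() drains the iterator so any worker exception is raised here.
        list(pool.map(sink.send_batch, chunked(docs, batch_size)))


if __name__ == "__main__":
    docs = [{"id": i} for i in range(5000)]
    sink = RecordingSink()
    index_with_threads(docs, sink)
    print(len(sink.received), "documents sent")
```

Because every worker writes to the same target, no merge step is needed afterwards, which is the main difference from the n-cores plan.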

Indexing multiple cores simultaneously

2015-10-25 Thread Peri Subrahmanya
Hi, I wanted to check if the following would work: 1. Spawn n threads 2. Create n cores 3. Index n records simultaneously in n cores 4. Merge all core indexes into a single master core. I have been able to successfully do this for 5 threads (5 cores) with 1000 documents each. However, I wanted

Re: Analytics using Solr

2015-10-25 Thread Alexandre Rafalovitch
It is a very general question. So, the general answer is yes. To get a sample of what's possible, I recommend you check out Solr Revolution presentations from this year and presentations+video from last year. There were at least a couple that you may find interesting. http://www.slideshare.net/lu

Re: Analytics using Solr

2015-10-25 Thread Erik Hatcher
For sure! It’s a very common way our customers are leveraging Solr (and our platform, Fusion) along with banana (/Silk) dashboards. What kind of analytics are you aiming to achieve? Chances are, Solr’s got your needs covered. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.c

Analytics using Solr

2015-10-25 Thread Salman Ansari
Hi, I was wondering if it is possible (and recommended) to run analytics using Solr, for example big data analytics. Any ideas? Regards, Salman

Re: Solr Pagination

2015-10-25 Thread Salman Ansari
Thanks guys for your responses. That's a very very large cache size. It is likely to use a VERY large amount of heap, and autowarming up to 4096 entries at commit time might take many *minutes*. Each filterCache entry is maxDoc/8 bytes. On an index core with 70 million documents, each filterCac
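
The arithmetic behind the maxDoc/8 rule quoted above can be worked through as a rough sketch. The 70 million document count and the 4096 figure come from the message; treating 4096 as the number of entries actually held in the cache is an assumption made here for illustration, since the configured cache size is not shown.

```python
def filter_cache_entry_bytes(max_doc: int) -> int:
    """Bytes for one entry at one bit per document, rounded up to whole bytes."""
    return (max_doc + 7) // 8


max_doc = 70_000_000   # documents in the core, from the message
entries = 4096         # assumption: cache holds as many entries as are autowarmed

per_entry = filter_cache_entry_bytes(max_doc)
total = per_entry * entries

print(f"{per_entry / 1e6:.2f} MB per entry")
print(f"{total / 1e9:.2f} GB for {entries} entries")
```

Under these assumptions a single entry is about 8.75 MB, so a few thousand entries run into tens of gigabytes of heap.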

Re: Using the ExtractRequestHandler

2015-10-25 Thread Erik Hatcher
Salonee - What are you trying to do exactly? Have a read of https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika and let us know what’s not working

Re: EdgeNGramFilterFactory for Chinese characters

2015-10-25 Thread Tomoko Uchida
Hi, Edwin, > This means it is better to have 2 separate fields for English and Chinese words? Yes. I mean, 1. Define FIELD_1 that uses HMMChineseTokenizerFactory to extract English and Chinese tokens. 2. Define FIELD_2 that uses PatternTokenizerFactory to extract English tokens and EdgeNGramFilter
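
To illustrate what the second field is meant to hold, here is a small sketch in plain Python, not Solr configuration: it keeps only the English words from a mixed string and emits their edge n-grams (prefixes). The `[A-Za-z]+` pattern, the lowercasing, and the function names are assumptions for illustration; the actual pattern and filter settings in the thread are not shown.

```python
import re

# Assumption: "English token" means a run of ASCII letters.
ENGLISH_WORD = re.compile(r"[A-Za-z]+")


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def english_prefixes(text, min_gram=1, max_gram=20):
    """Extract English words from text and expand each into edge n-grams."""
    grams = []
    for word in ENGLISH_WORD.findall(text):
        grams.extend(edge_ngrams(word.lower(), min_gram, max_gram))
    return grams


if __name__ == "__main__":
    print(english_prefixes("Solr 搜索", min_gram=2, max_gram=4))
```

The Chinese characters produce no n-grams here, which is the point of routing only English tokens through the prefix field.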

Re: EdgeNGramFilterFactory for Chinese characters

2015-10-25 Thread Zheng Lin Edwin Yeo
Hi Tomoko, Thank you for your reply. > If you need to perform partial (prefix) match for **only English words**, > you can create a separate field that keeps only English words (I've never > tried that, but might be possible by PatternTokenizerFactory or other > tokenizer/filter chains...,) and a
