Re: questions about shard key

2016-10-12 Thread hairymcclarey
See here: https://lucidworks.com/blog/2013/06/13/solr-cloud-document-routing/ The default is to take 16 bits from the prefix and 16 from the ID. Not sure about the second part of your question, maybe someone else can answer that. On Wednesday, October 12, 2016 9:26 PM, "Huang, Daniel" wro
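
To illustrate the "16 bits from the prefix and 16 from the ID" split described above, here is a rough sketch. It assumes a CRC32 stand-in hash (Solr itself uses MurmurHash3, so the actual values will differ), and the names `stand_in_hash`, `composite_hash` and `prefix_bits` are made up for this example.

```python
import zlib


def stand_in_hash(text: str) -> int:
    """Stand-in 32-bit hash (CRC32). Assumption: Solr uses MurmurHash3 instead."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str, prefix_bits: int = 16) -> int:
    """Combine the top bits of the shard-key hash with the low bits of the ID hash."""
    if "!" not in doc_id:
        # No shard key: the whole ID is hashed as one piece.
        return stand_in_hash(doc_id)
    shard_key, _, rest = doc_id.partition("!")
    # Mask selecting the upper `prefix_bits` bits of a 32-bit value.
    upper_mask = (0xFFFFFFFF << (32 - prefix_bits)) & 0xFFFFFFFF
    lower_mask = ~upper_mask & 0xFFFFFFFF
    return (stand_in_hash(shard_key) & upper_mask) | (stand_in_hash(rest) & lower_mask)


# Two documents with the same prefix share their upper 16 bits,
# so they fall into the same slice of the hash range.
a = composite_hash("IBM!12345")
b = composite_hash("IBM!67890")
print(hex(a >> 16), hex(b >> 16))
```

Because the upper bits come only from the prefix, all documents with the same shard key land in one contiguous part of the hash ring, which is what keeps them together on a shard.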

Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
Why didn't I think of that? That's another alternative. Thank you for your suggestion. Appreciate it. On 10/13/2016 5:41 AM, Georg Sorst wrote: You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character. Derek Poh wrote on Wed, 12 Oct 2016 1

Re: London Lucene Hackday is now running

2016-10-12 Thread Charlie Hull
On the Flax blog (www.flax.co.uk/blog) eventually, but for now I've made some notes at https://github.com/flaxsearch/london-hackday-2016 . We had 20 people on Friday in London and 15 in Boston on Tuesday, everyone seemed to enjoy themselves - and we made some real progress on a number of issues. So

[Solr 5.1.0] - Ignoring Whitespaces as delimiters

2016-10-12 Thread deniz
Hello, Are there any built-in tokenizers which will do something like StandardTokenizer, but will not tokenize on whitespace? E.g. field:abc cde-rfg will be tokenized as "abc cde" and "rfg", not "abc", "cde", "rfg". I have checked the existing tokenizers/analyzers and it seems like there is no other w

Re: qf boosts with MoreLikeThis query parser

2016-10-12 Thread Ere Maijala
Answering my own question: I did some digging and found out that boosts work if qf is repeated in the local params, at least in Solr 6.2, like this: {!mlt qf=title^100 qf=author^50}recordid However, it doesn't work properly with CloudMLTQParser used in SolrCloud mode. I'm working on a proposed fix
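
A small sketch of how such a query string could be assembled, assuming the intended boost syntax is field^boost with one qf per field. The helper name `mlt_query` is hypothetical; only the `{!mlt qf=...}` form comes from the message above.

```python
from urllib.parse import urlencode


def mlt_query(doc_id: str, boosts: dict) -> str:
    """Build an {!mlt} query with one repeated qf local param per boosted field."""
    local_params = " ".join(f"qf={field}^{boost}" for field, boost in boosts.items())
    return f"{{!mlt {local_params}}}{doc_id}"


query = mlt_query("recordid", {"title": 100, "author": 50})
print(query)

# URL-encode the query so it can be appended to a select request.
print(urlencode({"q": query}))
```

Repeating qf, rather than packing all fields into a single qf value, is the form the message reports as working in Solr 6.2.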

Re: How to retrieve 200K documents from Solr 4.10.2

2016-10-12 Thread Nick Vasilyev
Check out cursorMark, it should be available in your release. There is some good information on this page: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results On Wed, Oct 12, 2016 at 5:46 PM, Salikeen, Obaid < obaid.salik...@iacpublishinglabs.com> wrote: > Hi, > > I am using
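
A minimal sketch of the cursorMark loop, for illustration only. The `fetch` callable is an assumption standing in for whatever HTTP client sends the parameters to the select handler and returns the parsed JSON response. The sort is assumed to be on a uniqueKey field named `id`, since cursors require the uniqueKey in the sort.

```python
from typing import Callable, Dict, Iterator


def iterate_with_cursor(
    fetch: Callable[[Dict[str, object]], dict],
    query: str = "*:*",
    rows: int = 1000,
    sort: str = "id asc",
) -> Iterator[dict]:
    """Yield every matching document by following nextCursorMark until it stops changing."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        response = fetch({"q": query, "rows": rows, "sort": sort, "cursorMark": cursor})
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            break
        cursor = next_cursor
```

Adding an `fl` parameter limited to the one field of interest would keep each page small, and a moderate `rows` value spreads the load over many cheap requests instead of one large one.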

How to retrieve 200K documents from Solr 4.10.2

2016-10-12 Thread Salikeen, Obaid
Hi, I am using Solr 4.10.2. I have 200K documents sitting on a Solr cluster (it has 3 nodes), and let me first state that I am new to Solr. I want to retrieve all documents from Solr (essentially just one field from each document). What is the best way of fetching this much data without overloading

Re: Split words with period in between into separate tokens

2016-10-12 Thread Georg Sorst
You could use a PatternReplaceCharFilter before your tokenizer to replace the dot with a space character. Derek Poh wrote on Wed, 12 Oct 2016 11:38: > Seems like LetterTokenizerFactory tokenizes on/discards numbers as well. The > field does have values with numbers in them, therefore it is not a
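
The effect of that suggestion can be imitated outside Solr. This is a rough sketch only: the regex substitution stands in for the char filter, and a plain whitespace split stands in for the tokenizer (an assumption, since StandardTokenizer applies more rules than that). The function name is made up.

```python
import re


def replace_dots_then_tokenize(text: str) -> list:
    """Replace every period with a space, then split on whitespace."""
    without_dots = re.sub(r"\.", " ", text)
    return without_dots.split()


print(replace_dots_then_tokenize("Co.Ltd"))  # ['Co', 'Ltd']
```

Because the replacement happens before tokenizing, digits in the input are left alone, which was the sticking point with the letter-only tokenizer.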

questions about shard key

2016-10-12 Thread Huang, Daniel
Hi, I was reading about document routing with compositeId (https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud). The document says that I could prefix a shard key to a document ID like “IBM!12345”. It further mentioned that I could specify the number of bits for

Re: Split words with period in between into separate tokens

2016-10-12 Thread Derek Poh
It seems LetterTokenizerFactory also splits on and discards numbers. The field does have values with numbers in them, so it is not applicable. Thank you. On 10/12/2016 4:22 PM, Dheerendra Kulkarni wrote: You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni On Wed,
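
To show why a letter-only tokenizer loses numbers, here is a simplified imitation: it keeps maximal runs of letters and drops everything else. The regex is an approximation of that behaviour, not the Lucene implementation, and the sample input is invented.

```python
import re


def letter_tokenize(text: str) -> list:
    """Return maximal runs of letters; digits and punctuation act as separators."""
    return re.findall(r"[^\W\d_]+", text)


print(letter_tokenize("Co.Ltd"))          # ['Co', 'Ltd']
print(letter_tokenize("Model X200.Pro"))  # ['Model', 'X', 'Pro']
```

The second call shows the problem reported above: "200" disappears entirely, so values that contain numbers cannot be matched on them.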

multivalued coordinate for geospatial search

2016-10-12 Thread Chris Chris
Hello solr users! I am trying to use geospatial search to do some basic distance search in Solr 4.10. At the moment, I have it working if I have just one set of coordinates (latitude,longitude) per document. However, I need to get it to work when I have an unknown number of sets of coordinates per document:

Re: Split words with period in between ("Co.Ltd") into separate tokens

2016-10-12 Thread Derek Poh
Thank you for pointing out the flags. I set generateWordParts=1 and the term is split up. On 10/12/2016 3:26 PM, Modassar Ather wrote: Hi, The flags set in your WordDelimiterFilterFactory definition are all 0. You can try with generateWordParts=1 and splitOnCaseChange=1 and see if it breaks as per y

Re: Split words with period in between into separate tokens

2016-10-12 Thread Dheerendra Kulkarni
You can use LetterTokenizerFactory instead. Regards, Dheerendra Kulkarni On Wed, Oct 12, 2016 at 6:24 AM, Derek Poh wrote: > Hi > > How can I split words with period in between into separate tokens. > Eg. "Co.Ltd" => "Co" "Ltd" . > > I am using StandardTokenizerFactory and it does not replace pe

Re: Split words with period in between ("Co.Ltd") into separate tokens

2016-10-12 Thread Modassar Ather
Hi, The flags set in your WordDelimiterFilterFactory definition are all 0. You can try with generateWordParts=1 and splitOnCaseChange=1 and see if it breaks as per your requirement. You can also try with other available flags enabled. Best, Modassar On Wed, Oct 12, 2016 at 12:44 PM, Derek Poh wrote:
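
As a loose illustration of what those two flags ask for, here is a simplified, ASCII-only imitation. It is not the filter's real algorithm (which also handles numeric parts, catenation and more); the function name and parameter are invented for this sketch.

```python
import re


def split_word_parts(token: str, split_on_case_change: bool = False) -> list:
    """Split a token on non-alphanumeric delimiters, optionally also at lower-to-upper case changes."""
    # Roughly what generateWordParts=1 asks for: break at delimiters such as "." or "-".
    parts = [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]
    if split_on_case_change:
        # Roughly what splitOnCaseChange=1 asks for: break "PowerShot" into "Power", "Shot".
        parts = [
            piece
            for part in parts
            for piece in re.sub(r"(?<=[a-z])(?=[A-Z])", " ", part).split()
        ]
    return parts


print(split_word_parts("Co.Ltd"))                                   # ['Co', 'Ltd']
print(split_word_parts("PowerShot.Pro", split_on_case_change=True))  # ['Power', 'Shot', 'Pro']
```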

Re: Split words with period in between ("Co.Ltd") into separate tokens

2016-10-12 Thread Derek Poh
I tried adding Word Delimiter Filter to the field, but it either does not process the term "Co.Ltd" or truncates it away. generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/> On 10/12/2016 8:54 AM, Derek Poh wrote: Hi How can I split words with period

Re: Re: Config for massive inserts into Solr master

2016-10-12 Thread Reinhard Budenstecher
> That is not correct as of version 4.0. > > The only kind of update I've run into that cannot proceed at the same > time as an optimize is a deleteByQuery operation. If you do that, then > it will block until the optimize is done, and I think it will also block > > any update you do after it. >