Spell checking the synonym list?

2015-07-09 Thread Ryan Yacyshyn
Hi all, I'm wondering if it's possible to have spell checking performed on terms in the synonym list? For example, let's say I have documents with the word "lawyer" in them and I add "lawyer, attorney" in the synonyms.txt file. Then a query is made for the word "atorney". Is there any way to prov

Re: Lost connection to Zookeeper

2015-07-09 Thread Eirik Hungnes
Hi, We are facing the same issues on our setup: 3 ZooKeeper nodes, 1 shard, 10 collections, 1 replica, Solr 5.0.0 with default startup parameters. Solr servers: 2-core CPU, 7 GB memory. Index size: 28 GB, 3 GB heap. This setup was running on 4.6 before we upgraded to 5, without any of these errors. The timeout seems to

Re: Can I instruct the Tika Entity Processor to skip the first page using the DIH?

2015-07-09 Thread Charlie Hull
On 08/07/2015 20:39, Allison, Timothy B. wrote: Unfortunately, no. We can't even do that now with straight Tika. I imagine this is for pdf files? If you'd like to add this as a feature, please submit a ticket over on Tika. Another alternative is to pre-process the PDF files to remove the fir

Re: Solr cache when using custom scoring

2015-07-09 Thread amid
Mikhail, We've now overridden equals and hashCode of the custom query to use this new param as well, and it works like a charm. Thanks a lot, Ami -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-cache-when-using-custom-scoring-tp4216419p4216496.html Sent from the Solr - U
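Solr's query result cache looks entries up by the query object. If a parameter that changes scoring is left out of `equals()` and `hashCode()`, two differently scored queries can collide on one cache entry. The plain-Java sketch below illustrates the fix Ami describes. `CustomScoringQuery` and its `scoringParam` field are hypothetical names, and a real implementation would extend Lucene's `Query` class rather than stand alone.

```java
import java.util.Objects;

// Hypothetical stand-in for a custom scoring query; not a Lucene/Solr class.
// A cache keyed on the query must see two queries that differ only in the
// scoring parameter as different keys.
final class CustomScoringQuery {
    private final String field;
    private final String term;
    private final double scoringParam; // hypothetical parameter that affects scoring

    CustomScoringQuery(String field, String term, double scoringParam) {
        this.field = field;
        this.term = term;
        this.scoringParam = scoringParam;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomScoringQuery)) return false;
        CustomScoringQuery other = (CustomScoringQuery) o;
        return field.equals(other.field)
                && term.equals(other.term)
                && Double.compare(scoringParam, other.scoringParam) == 0;
    }

    @Override
    public int hashCode() {
        // Every field compared in equals() also feeds the hash.
        return Objects.hash(field, term, scoringParam);
    }
}
```

With the parameter included, a `HashMap` keyed on these objects (a rough stand-in for the cache) keeps separate entries for a 1.0 and a 2.0 variant of the same term.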

RE: Do I really need copyField when my app can do the copy?

2015-07-09 Thread Nir Barel
Hi, I want to add a question regarding copyField and LowerCaseFilterFactory. We noticed (via profiling) that LowerCaseFilterFactory takes a huge part of the CPU for the text field. Can we avoid it or improve that implementation (while keeping case-insensitive search)? Best Regards, Nir Barel

Re: Too many Soft commits and opening searchers realtime

2015-07-09 Thread Alessandro Benedetti
Cool! So you were actually not using the default you defined in solrconfig, but the value was loaded from a Java environment property set to "3" ms? Cheers 2015-07-09 4:21 GMT+01:00 Summer Shire : > Yonik, Mikhail, Alessandro > > After a lot of digging around and isolation, All u guys were
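Alessandro's point is that a solrconfig.xml value written as a placeholder, for instance `${solr.autoSoftCommit.maxTime:-1}`, falls back to its inline default only when no Java system property of that name is set. The sketch below is a simplified resolver for that `${name:default}` syntax, written for illustration only. It is not Solr's own code, and the property name is just an example.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified resolver for ${name:default} placeholders (illustration only,
// not Solr's implementation). A set property always wins over the default.
final class PropertySubstitution {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            if (value == null) value = m.group(2); // fall back to the inline default
            if (value == null) {
                throw new IllegalArgumentException("no value for " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Passing `System.getProperties()` as `props` mimics a value supplied with `-D` on the command line silently overriding whatever default the config file shows.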

Re: Do I really need copyField when my app can do the copy?

2015-07-09 Thread Alessandro Benedetti
Let me answer in line : 2015-07-09 9:35 GMT+01:00 Nir Barel : > Hi, > > I wants to add a question regarding copyField and LowerCaseFilterFactory > We notice that LowerCaseFilterFactory takes huge part of the CPU ( via > profiling ) for the text filed > Can we avoid it or improve that implementati

Data form cataloggroup and catalogentry cores

2015-07-09 Thread santosh sidnal
Hi All, Is there a way to get combined data from 2 different cores in a single call? For example, data from both the CatalogEntry and CatalogGroup cores in a single call to Solr. -- Regards, Santosh Sidnal

Re: Ranking based on term position

2015-07-09 Thread JACK
Hi Li Li, I am experiencing the same problem. Can you explain in a little more detail? Where do I change these methods? I am using Solr 5.0.0. And how do I query this? Does anything change at query time? -- View this message in context: http://lucene.472066.n3.nabble.com/Ranking-based-on-term-position-tp

RE: Spell checking the synonym list?

2015-07-09 Thread Reitzel, Charles
One of the uses of synonyms is to replace a misspelled query term with a correctly spelled value. The "2-sided" synonym file format allows you to control which values "survive" into the actual query, for example: lawyer, attorney, ambulance chaser, atorney, lowyor => lawyer, attorney I am not aware, howev
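To make the `=>` semantics concrete: every term on the left side is replaced by all the terms on the right side, so a misspelling such as "atorney" never reaches the query itself. The hypothetical `ExplicitSynonyms` class below is a deliberately simplified sketch, not Solr's SynonymFilter. It matches only single whitespace-separated tokens, so a multi-word entry such as "ambulance chaser" is stored but never matched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of an explicit "lhs => rhs" synonym rule.
// Single-token matching only; a later rule with the same left-side term
// overwrites the earlier one here.
final class ExplicitSynonyms {
    private final Map<String, List<String>> map = new HashMap<>();

    void addRule(String line) {
        String[] sides = line.split("=>");
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected 'lhs => rhs': " + line);
        }
        List<String> rhs = new ArrayList<>();
        for (String s : sides[1].split(",")) rhs.add(s.trim());
        for (String s : sides[0].split(",")) map.put(s.trim(), rhs);
    }

    // A token found on a left side is replaced by every right-side term;
    // all other tokens pass through unchanged.
    List<String> rewrite(String query) {
        List<String> out = new ArrayList<>();
        for (String tok : query.trim().split("\\s+")) {
            out.addAll(map.getOrDefault(tok, List.of(tok)));
        }
        return out;
    }
}
```

After adding the rule quoted above, rewriting "atorney" yields [lawyer, attorney], while an unrelated term like "judge" comes back as-is.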

Restore index API does not work in solr 5.1.0 ?

2015-07-09 Thread dinesh naik
Hi all, How can we restore the index in Solr 5.1.0? We did the following: 1) Started SolrCloud with: bin/solr start -e cloud -noprompt 2) Posted some documents to Solr from the examples folder using: java -Dc=gettingstarted -jar post.jar *.xml 3:- Backed up t

SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Paden
Hello, I've been working to get a search engine up and running for a little while now. I'm using Solr to index from both a database and a file system. However, I'm using the filepath contained inside the database to find the file in the filesystem and then merge the metadata in the DB and the

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Paden
I posted the code anyway; I just forgot to get rid of that line in the post. Sorry -- View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Tika-custom-indexer-not-indexing-CERTAIN-doc-text-tp4216541p4216542.html

RE: LowerCaseFilterFactory burns CPU

2015-07-09 Thread Reitzel, Charles
That should be fixable. In a past life, I generated a perfect hash to fold case for Unicode in a locale-neutral manner and it was very fast. If I remember right, there are only about 2500 Unicode characters that can be case folded at all. So the generated, collision-free hash function was v
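Charles describes a generated perfect hash. A simpler relative of that idea is a precomputed table, and the sketch below swaps one in: a dense 64K-entry `char` lookup table built once from `Character.toLowerCase(char)`, which is locale-independent. This is an illustration under stated assumptions, not a proposed patch. It is not a perfect hash, it does not lowercase supplementary code points (surrogate halves pass through unchanged), and whether it actually beats the stock code path would have to be measured.

```java
// Table-driven, locale-neutral lowercasing for BMP chars (illustration only).
// A dense lookup table stands in for the perfect hash described above.
final class TableLowerCase {
    private static final char[] LOWER = new char[Character.MAX_VALUE + 1];

    static {
        // Precompute the lowercase mapping for every UTF-16 code unit once.
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            LOWER[c] = Character.toLowerCase((char) c);
        }
    }

    // Lowercase the first len chars of buf in place, the way a token filter
    // would rewrite its term buffer. Surrogates map to themselves.
    static void lowerInPlace(char[] buf, int len) {
        for (int i = 0; i < len; i++) {
            buf[i] = LOWER[buf[i]];
        }
    }
}
```

The per-character work becomes a single array read, at the cost of 128 KB of static memory for the table.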

Re: Do I really need copyField when my app can do the copy?

2015-07-09 Thread Shawn Heisey
On 7/9/2015 2:35 AM, Nir Barel wrote: > I wants to add a question regarding copyField and LowerCaseFilterFactory > We notice that LowerCaseFilterFactory takes huge part of the CPU ( via > profiling ) for the text filed > Can we avoid it or improve that implementation? ( keeping the insensitive >

RE: Spell checking the synonym list?

2015-07-09 Thread Dyer, James
Ryan, If you use index-time synonyms on the spellcheck field, this will give you what you want. For instance, if the document has "lawyer" and you index both terms "lawyer","attorney", then the spellchecker will see that "atorney" is 1 edit away from an indexed term and will suggest "attorney"
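The "1 edit away" reasoning can be checked with a textbook Levenshtein distance, where insertions, deletions and substitutions each cost 1. This sketch only illustrates the arithmetic. Solr's spellcheckers have their own distance implementations and thresholds, which it does not reproduce.

```java
// Classic two-row Levenshtein edit distance (illustration only).
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from ""
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            int[] tmp = prev; // reuse the arrays: current row becomes previous
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

"atorney" is one insertion away from "attorney" but several edits away from "lawyer". That is why indexing the synonym "attorney" alongside "lawyer" gives the spellchecker a close term to suggest.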

RE: Do I really need copyField when my app can do the copy?

2015-07-09 Thread Reitzel, Charles
-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Thursday, July 09, 2015 9:55 AM To: solr-user@lucene.apache.org Subject: Re: Do I really need copyField when my app can do the copy? On 7/9/2015 2:35 AM, Nir Barel wrote: > I wants to add a question regarding copyF

Class loading problem from ${solr.solr.home}/lib in 5.2.1

2015-07-09 Thread Shawn Heisey
I was having a problem in a 4.x version of Solr and wanted to check 5.2.1 to see if it still had the same problem, so I copied my fieldType into a 5.2.1 example schema. My fieldType uses some ICU analysis classes, so I also put the contrib jars into server/solr/lib. I ran into a problem similar t

RE: LowerCaseFilterFactory burns CPU

2015-07-09 Thread Reitzel, Charles
Combining under new subject to reflect new question. Took a quick look at both the LowerCaseFilter and Java implementation it uses. A perfect hash would be much faster and, since LowerCaseFilter does not consider locale, applicable. ICUFoldingFilter is a somewhat different animal. But I ta

[JOB] Financial search engine company AlphaSense is looking for Search Engineers

2015-07-09 Thread Dmitry Kan
Company: AlphaSense https://www.alpha-sense.com/ Position: Search Engineer AlphaSense is a one-stop financial search engine for financial research analysts all around the world. AlphaSense is looking for Search Engineers experienced with Lucene / Solr and search architectures in general. Position

How to determine cache setting in Solr Search Instance

2015-07-09 Thread wwang525
Hi All, I ran a load test of 800 requests in total (at 40 concurrent requests per second) against a Solr index with 14M records. Performance was good (< 1 second), especially once the test had been running for a short while. BTW, the second round of the load test was even better. The local m

Re: Data form cataloggroup and catalogentry cores

2015-07-09 Thread Erick Erickson
You can try using the "shards" parameter. The problem will be, though, that the score calculations may not really be comparable... Best, Erick On Thu, Jul 9, 2015 at 3:40 AM, santosh sidnal wrote: > Hi All, > > Is there a way to get a combined data from 2 different cores together in a > single c
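A request using the "shards" parameter is sent to one core and lists every core to query as comma-separated host:port/path entries. The helper below only builds such a URL. The host, port and core names (`CatalogEntry`, `CatalogGroup`) are hypothetical placeholders, and Erick's caveat about scores not being comparable across cores still applies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Builds a select URL that fans one query out to several cores via "shards".
// All host/port/core names passed in are placeholders, not real endpoints.
final class ShardsUrl {
    static String build(String baseCoreUrl, String query, List<String> shards) {
        String shardList = String.join(",", shards);
        return baseCoreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&shards=" + URLEncoder.encode(shardList, StandardCharsets.UTF_8);
    }
}
```

Both parameter values are URL-encoded, so the commas, colons and slashes in the shard list arrive intact after Solr decodes the request.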

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Erick Erickson
Wow, that code looks familiar ;)... Anyway, what have you tried? bq: It would pull it but when I got the results in Solr it would look blank How do you know this? Do _some_ docs have text in Solr but some don't or are all of your text fields blank? In this case I suspect you're not storing the da

RE: Can I instruct the Tika Entity Processor to skip the first page using the DIH?

2015-07-09 Thread Allison, Timothy B.
Concur on both points. You can also use PDFBox's app "ExtractText" with -startPage and -endPage parameters: https://pdfbox.apache.org/1.8/commandline.html#extractText -Original Message- From: Charlie Hull [mailto:char...@flax.co.uk] Sent: Thursday, July 09, 2015 3:55 AM To: solr-user@

Re: How to determine cache setting in Solr Search Instance

2015-07-09 Thread Shawn Heisey
On 7/9/2015 9:48 AM, wwang525 wrote: > I did a load test with a total of 800 requests (at 40 concurrent requests > per second) to be executed against Solr index with 14 M records. Performance > was good (< 1 second) especially after a short period of time of the test. > BTW, the second round of loa

Re: How to determine cache setting in Solr Search Instance

2015-07-09 Thread Erick Erickson
It's actually unlikely that increasing the documentCache will help materially. It's primarily so various components won't have to fetch the documents off disk for a _single_ request. I've heard some anecdotal evidence that it helps in some situations, but that's been rare in my experience. Your fi

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Paden
Haha, no need to reinvent the wheel, especially when you don't know Java. It's just a prototype anyway. I made a very strong assumption that it was pulling the text as blank because I would copy the EXACT same text from one file in the file system and put it into another file under a different name, but in

RE: Problem with distributed search using grouping and highlighting

2015-07-09 Thread Cario, Elaine
Rich, I've run into various problems with group.query and highlighting. You noted one below (SOLR-5046), and there is also SOLR-6712, which might be related to what you are experiencing. Still waiting for that patch to be reviewed... -Original Message- From: Rich Hume [mailto:rh...@id

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-09 Thread Erick Erickson
I rather doubt that it's a Solr issue. Text is text after all. If some docs display text, then it's probably a matter of not getting the text in the first place. My _guess_ is that you're not getting any text at all from the document. Either the document isn't being found or it's not a form that T

Re: How to determine cache setting in Solr Search Instance

2015-07-09 Thread wwang525
Hi, The real production requests will not be randomly generated, and a lot of requests will be repeated. I think the performance will be better due to the repeated requests. In addition, I am sure the configuration will need to be adjusted once the application is in production. For the time being

Solr Grouping - sorting groups based on the sum of the scores of the documents within each group

2015-07-09 Thread Emilio Borraz
Hi, I have a similar use case and am still looking for a solution. I have posted a question about it on Stack Overflow ( http://stackoverflow.com/questions/31281640/sum-field-and-sort-on-solr ). Did you solve it? Regards. -- Emilio Borraz *Back-end Developer* emilio.bor...@sonatasmx.com

Re: How to determine cache setting in Solr Search Instance

2015-07-09 Thread Erick Erickson
I'd examine the filter queries used to see whether they make sense as well. You really have to re-tune after you start getting real user queries though as anything you generate won't reflect reality. I'd start _much_ smaller, 512 or 1024 and work _up_ with real data. Raising the document cache lim

Re: Windows Version

2015-07-09 Thread Allan Elkowitz
Thanks for all your help. I decided to switch to Ubuntu Linux. Allan Elkowitz elkow...@alumni.caltech.edu On Wednesday, July 8, 2015 1:44 AM, Shawn Heisey wrote: On 7/7/2015 10:43 AM, Allan Elkowitz wrote: > So I am a newbie at Solr and am having trouble getting the examples workin

Get content in response from ExtractingRequestHandler

2015-07-09 Thread trung.ht
Hi everyone, I use Solr to index and search Office files (docx, pptx, ...). To reduce the size of the Solr index, I do not store the file content in Solr; however, my customer now wants to preview the content of the file. I have read the documentation of ExtractingRequestHandler, but it seems that

LogTransformer

2015-07-09 Thread Midas A
I want to log the queries run through DIH. Should I use LogTransformer to do that?

RE: LogTransformer

2015-07-09 Thread Jagdish Vasani
One thing I noted: you need to give the full package name when specifying the transformer. For example, I have added below mailto:test.mi...@gmail.com] Sent: Friday, July 10, 2015 11:08 AM To: solr-user@lucene.apache.org Subject: LogTransformer I want to log query running through DIH should i use LogT

Re: LogTransformer

2015-07-09 Thread Midas A
Hi Jagdish, not working for me. On Fri, Jul 10, 2015 at 11:21 AM, Jagdish Vasani wrote: > One thing I noted that you need to give full package detail while > mentioning transformer. > Like, I have added bellow > > Hope this will help you. > > Thanks, > Jagdish > -Original Message- > Fr