Re: Data Import faile in solr 4.3.0

2013-08-21 Thread Montu v Boda
Thanks for the suggestion, but in our view re-indexing all the data each and every time is not the right approach, that is, whenever we migrate Solr from an older version to the latest one. Solr should provide some solution for this, because re-indexing 50 lakh (5 million) documents is not an easy job.

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
After trying some search cases and different parameter combinations of WordDelimiter, I wonder what the best strategy is to index the string "2DA012_ISO MARK 2" so that it can be found by the term "2DA012". What if I just want _ to be removed at both query and index time? What should I configure, and how? Floyd 2013/8/22 Floy

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
Thank you all. By the way, Jack, I am going to buy your book. Where can I buy it? Floyd 2013/8/22 Jack Krupansky > "I thought that the StandardTokenizer always split on punctuation, " > > Proving that you haven't read my book! The section on the standard > tokenizer details the rules that the tokenizer uses

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Jack Krupansky
"I thought that the StandardTokenizer always split on punctuation, " Proving that you haven't read my book! The section on the standard tokenizer details the rules that the tokenizer uses (in addition to extensive examples.) That's what I mean by "deep dive." -- Jack Krupansky -Original

Re: How to avoid underscore sign indexing problem?

2013-08-21 Thread Shawn Heisey
On 8/21/2013 7:54 PM, Floyd Wu wrote: > When using StandardAnalyzer to tokenize string "Pacific_Rim" will get > > ST: text=pacific_rim, raw_bytes=[70 61 63 69 66 69 63 5f 72 69 6d], start=0, end=11, position=1 > > How to make this string to be tokenized to these two tokens "Pacific", > "Rim"? > Set _

Re: removing duplicates

2013-08-21 Thread Liu
This picture is extracted from apache-solr-ref-guide-4.4.pdf; maybe it will help you. You can download the document from https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ -Original Message- From: Ali, Saqib [mailto:docbook@gmail.com] Sent: August 22, 2013 5:15 To: solr-user@lucene.apache.

How to avoid underscore sign indexing problem?

2013-08-21 Thread Floyd Wu
When using StandardAnalyzer to tokenize the string "Pacific_Rim", I get a single token: ST: text=pacific_rim, raw_bytes=[70 61 63 69 66 69 63 5f 72 69 6d], start=0, end=11, position=1. How can I make this string tokenize into the two tokens "Pacific" and "Rim"? Set _ as a stopword? Please kindly help on this. Many thanks. F
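
The result being asked for can be pictured as "turn the underscore into a space before tokenizing". The sketch below only imitates that outcome in plain Python; it is not Solr code, and `split_on_underscore` is a hypothetical helper name. In Solr the equivalent change would live in the field type's analysis chain, which this excerpt does not show.

```python
import re


def split_on_underscore(text):
    """Replace underscores with spaces, split on whitespace, and lowercase.

    This imitates the desired analysis result only; it is not how Solr
    implements tokenization.
    """
    cleaned = text.replace("_", " ")
    return [token.lower() for token in re.split(r"\s+", cleaned) if token]


tokens = split_on_underscore("Pacific_Rim")
```

With this behavior a search for either word would have a separate token to match, which is what the question is after.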

Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs

2013-08-21 Thread Shawn Heisey
On 8/21/2013 6:23 PM, dmarini wrote: Shawn, Thanks for your reply. All of these suggestions look like good ideas and I will follow up. We are running Solr via the Jetty process on Windows as well as all of our ZooKeepers on the same boxes as the clouds. The reason for this is that we're on EC2 ser

Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs

2013-08-21 Thread dmarini
Shawn, Thanks for your reply. All of these suggestions look like good ideas and I will follow up. We are running Solr via the Jetty process on Windows as well as all of our ZooKeepers on the same boxes as the clouds. The reason for this is that we're on EC2 servers so it gets ultra expensive to have

Re: Geo spatial clustering of points

2013-08-21 Thread Chris Atkinson
Did you get any resolution for this? I'm about to implement something identical. On 3 Jul 2013 23:03, "Jeroen Steggink" wrote: > Hi, > > I'm looking for a way to cluster (or should I call it group) geo > spatial points on a map based on the current zoom level and get the median > coordinate for

RE: removing duplicates

2013-08-21 Thread Petersen, Robert
This would describe the facet parameters we're talking about: http://wiki.apache.org/solr/SimpleFacetParameters Query something like this: http://localhost:8983/solr/select?q=*:*&fl=id&rows=0&facet=true&facet.limit=-1&facet.field=&facet.mincount=2 Then filter on each facet returned with a filter

Re: removing duplicates

2013-08-21 Thread Aloke Ghoshal
Hi, This will help you identify the duplicates: q=*:*&fl=id&facet=true&facet.mincount=2&rows=0&facet.field= To actually remove them from Solr, you will have to do something like Robert suggested. Write an application that uses the results to build a delete by id query ( http://wiki.apache.org/sol
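
The facet request described here can be assembled programmatically. The following is a minimal sketch, assuming the JSON response writer and Solr's default flat facet layout (`[value, count, value, count, ...]`); the field name `dup_key` and both function names are hypothetical, since the thread does not name the field.

```python
from urllib.parse import parse_qs, urlencode


def duplicate_facet_query(field):
    """Build the query string for a facet request listing values seen 2+ times."""
    params = [
        ("q", "*:*"),
        ("fl", "id"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 2),
        ("facet.limit", -1),
        ("wt", "json"),
    ]
    return urlencode(params)


def duplicated_values(flat_facet_list):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    values = flat_facet_list[0::2]
    counts = flat_facet_list[1::2]
    return dict(zip(values, counts))


# "dup_key" is a placeholder for the numeric field mentioned in the thread.
query_string = duplicate_facet_query("dup_key")
decoded = parse_qs(query_string)
```

Each key returned by `duplicated_values` is a field value shared by more than one document and would be the input to the follow-up filter queries.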

Re: removing duplicates

2013-08-21 Thread Ali, Saqib
Thanks Aloke and Robert. Can you please give me code/query snippets? (newbie here) On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal wrote: > Hi, > > Facet by one of the duplicate fields (probably by the numeric field that > you mentioned) and set facet.mincount=2. > > Regards, > Aloke > > > On Th

RE: removing duplicates

2013-08-21 Thread Petersen, Robert
Hi, perhaps you could query for all documents, asking for the id field to be returned, and then facet on the field you say you can key off of for duplicates. Set the facet mincount to 2; then you would have to filter on each facet value and page through all doc IDs (except skip the first document
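
The "skip the first document, delete the rest" step can be sketched as follows. This is an illustration under assumptions: the IDs for one duplicate group have already been fetched with a filter query, and the function names are hypothetical. The output uses Solr's XML delete-by-id message form.

```python
from xml.sax.saxutils import escape


def ids_to_delete(doc_ids):
    """Keep the first ID of a duplicate group and return the remaining ones."""
    return list(doc_ids[1:])


def delete_by_id_xml(doc_ids):
    """Build an XML update message that deletes the given document IDs."""
    body = "".join(
        "<id>{}</id>".format(escape(str(doc_id))) for doc_id in doc_ids
    )
    return "<delete>{}</delete>".format(body)


# Hypothetical IDs for one group of documents sharing the same key value.
group = ["a1", "a2", "a3"]
message = delete_by_id_xml(ids_to_delete(group))
```

The message would then be posted to the update handler and followed by a commit; which copy counts as "first" depends on the sort order of the filter query, so it is worth making that order explicit.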

Re: removing duplicates

2013-08-21 Thread Aloke Ghoshal
Hi, Facet by one of the duplicate fields (probably by the numeric field that you mentioned) and set facet.mincount=2. Regards, Aloke On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib wrote: > hello, > > We have documents that are duplicates i.e. the ID is different, but rest of > the fields are sam

removing duplicates

2013-08-21 Thread Ali, Saqib
Hello, we have documents that are duplicates, i.e. the ID is different but the rest of the fields are the same. Is there a query that can remove the duplicates and leave just one copy of each document in Solr? There is one numeric field that we can key off to find duplicates. Please advise. Thanks

Solr 4.4, enablePositionIncrements=true and PhraseQueries

2013-08-21 Thread Ronald K. Braun
Hello, I'm working on an upgrade from Solr 1.4.1 to 4.4. One of my field analyzers uses StopWordFilter, for which, as of 4.4, setting enablePositionIncrements to false is forbidden. As a consequence, some hand-constructed phrase queries (basically generated via calls to SolrPluginUtils.parseQueryString

Re: edismax type query in different sets of fields?

2013-08-21 Thread Rafael Calsaverini
Hum! It seems to be exactly what I need. Thanks! I'll look for it in the docs. Rafael Calsaverini Data Scientist @ Catho cell: +55 11 7525.6222 On Wed, Aug 21, 2013 at 12:08 PM, Erick Erickson wrote: > Ha

Re: loading solr from Pig?

2013-08-21 Thread Utkarsh Sengar
That's a good point; we load data from Pig to Solr every day. 1. What we do: a Pig job creates a CSV dump, we scp it over to a Solr node, and the UpdateCSV request handler loads the data into Solr. A complete rebuild of the index for about 50M documents (20 GB) takes 20 minutes (pig job which pulls and processes data

Re: 4.3 Cloud looks good on the outside, but lots of errors in the logs

2013-08-21 Thread Shawn Heisey
On 8/20/2013 10:52 PM, dmarini wrote: I'm running a solr 4.3 cloud in a 3 machine setup that has the following configuration: each machine is running 3 zookeepers on different ports each machine is running a jetty instance PER zookeeper.. Essentially, this gives us the ability to host 3 isolated

Re: Solr Indexing Status

2013-08-21 Thread Shalin Shekhar Mangar
Yes, you can invoke http://<host>:<port>/solr/dataimport?command=status which will return how many Solr docs have been added etc. On Wed, Aug 21, 2013 at 4:56 PM, Prasi S wrote: > Hi, > I am using solr 4.4 to index csv files. I am using solrj for this. At > frequent intervals my user may request for "Status"

Re: Data Import faile in solr 4.3.0

2013-08-21 Thread Shalin Shekhar Mangar
I guess you are trying to index another Solr index via DIH's SolrEntityProcessor. That processor wasn't really designed for migrating huge indexes. You're better off re-indexing content directly to another Solr. As far as this error is concerned, my guess is that it is due to an error thrown by yo

loading solr from Pig?

2013-08-21 Thread geeky2
Hello All, Is anyone loading Solr from a Pig script / process? I was talking to another group in our company and they have standardized on MongoDB instead of Solr - apparently there is very good support between MongoDB and Pig - allowing users to "stream" data directly from a Pig process in to Mo

How to SOLR file in svn repository

2013-08-21 Thread jiunarayan
I have an SVN repository and an SVN file path. How can I use Solr to search the content of the file in SVN? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-SOLR-file-in-svn-repository-tp4085904.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Sharing SolrCloud collection configs w/overrides

2013-08-21 Thread Tim Vaillancourt
Well, the mention of DIH is a bit off-topic. I'll simplify and say all I need is the ability to set ANY variables in solrconfig.xml without having to make N number of copies of the same configuration to achieve that. Essentially I need 10+ collections to use the exact same config dir in Zookeeper w

Re: Solr Indexing Status

2013-08-21 Thread Furkan KAMACI
You know the size of the CSV files, so you can calculate it if you want. 2013/8/21 Prasi S > Hi, > I am using solr 4.4 to index csv files. I am using solrj for this. At > frequent intervals my user may request for "Status". I have to send get > something like in DIH " Indexing in progress.. Added x

Filter results based on their number of terms, relative to the search query

2013-08-21 Thread Spyros Kapnissis
Hi, We have an index of several small expressions, let's say 4-20 words on average. I have a requirement to search for "approximate" results only, relevant to the search query. For example, when someone searches for (+a +b +c), we would like to return only those expressions that contain all t

Re: Facing Solr performance during query search

2013-08-21 Thread Jack Krupansky
I'd like to see a screen shot of a search results web page that has 2,000 facets. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Wednesday, August 21, 2013 11:24 AM To: solr-user@lucene.apache.org Subject: Re: Facing Solr performance during query search ~2,000 facets

Re: What filter to use to search with spaces omitted/included between words?

2013-08-21 Thread Erick Erickson
Jack: That's a consequence of keyword tokenizer I hadn't thought of before Erick On Wed, Aug 21, 2013 at 11:17 AM, Jack Krupansky wrote: > The reason that a query of "bestbuy" matches indexing of "best buy" in > this case is that the keyword tokenizer treats the entire input text as one >

Re: Facing Solr performance during query search

2013-08-21 Thread Erick Erickson
~2,000 facets kind of worries me, but let's skip that for now. Your original problem statement was that replication was the thing that changed. So the first thing I'd do is not replicate. If you turn it off, do your slaves still perform poorly? Allocating that much RAM to the JVM is probably not

Re: What filter to use to search with spaces omitted/included between words?

2013-08-21 Thread Jack Krupansky
The reason that a query of "bestbuy" matches indexing of "best buy" in this case is that the keyword tokenizer treats the entire input text as one token, including the space between "best" and "buy" and then the WDF treats any embedded white space as if it were punctuation and then the catenateA

Re: Prevent Some Keywords at Analyzer Step

2013-08-21 Thread Furkan KAMACI
How can I remove unnecessary tokens after shingle filter? 2013/8/20 Jeff Porter > Why not use ShingleFilterFactory and then match on that token if you find > it? > > > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory > > > Jeff Porter > co-founder > email: j

Re: What filter to use to search with spaces omitted/included between words?

2013-08-21 Thread Erick Erickson
Keyword tokenizer will probably cause you problems, since you'll never match "best", and searching name:best AND name:buy would fail as well. And I'm surprised this is working at all; I'd really scrutinize why bestbuy matches an index with Best Buy, since that makes no sense on the surface. If you have

Re: edismax type query in different sets of fields?

2013-08-21 Thread Erick Erickson
Have you tried "nested queries"? I think you can specify the full edismax syntax on the URL in a nested query for your second search field... Best, Erick On Tue, Aug 20, 2013 at 5:17 PM, Rafael Calsaverini < rafael.calsaver...@gmail.com> wrote: > Hi there, > > > suppose I have documents with fi

Re: Sharing SolrCloud collection configs w/overrides

2013-08-21 Thread Erick Erickson
Hmmm. I'm going to leave the DIH stuff for someone else, but could you raise a JIRA (and assign it to me) to think about a way to add a core.properties file to the collection creation step? I haven't thought it through very well, but currently I think we just assign some defaults. Some thought tha

Re: convert text file to solr document where delimiter fields are fields of document

2013-08-21 Thread Jack Krupansky
Yes, post.jar supports csv files. -- Jack Krupansky -Original Message- From: bharat Sent: Wednesday, August 21, 2013 1:57 AM To: solr-user@lucene.apache.org Subject: Re: convert text file to solr document where delimiter fields are fields of document Thanks all of you for quick repl

Re: get term frequency, just only keywords search

2013-08-21 Thread Jack Krupansky
Probably your best bet is to use the "debug.explain.structured" parameter, set to true, to get the XML version of the debug explain section and then you can traverse looking for the desired phrase and then the "phraseFreq". But, be aware that the terms in a Lucene query have been "analyzed", so
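
The traversal suggested here can be sketched as a small recursive walk. This is a rough illustration, assuming the structured explain output has been parsed into nested dicts with `description`, `value`, and an optional `details` list; the sample tree below is made up for illustration and is not real Solr output.

```python
def find_phrase_freqs(node):
    """Collect the values of explain nodes whose description mentions phraseFreq."""
    found = []
    if "phraseFreq" in node.get("description", ""):
        found.append(node.get("value"))
    for child in node.get("details", []):
        found.extend(find_phrase_freqs(child))
    return found


# Hypothetical, hand-written explain tree used only to exercise the function.
sample_explain = {
    "match": True,
    "value": 0.75,
    "description": "weight(example), product of:",
    "details": [
        {
            "match": True,
            "value": 1.5,
            "description": "tf(freq=2.0), with freq of:",
            "details": [
                {"match": True, "value": 2.0, "description": "phraseFreq=2.0"}
            ],
        }
    ],
}

phrase_freqs = find_phrase_freqs(sample_explain)
```

As the message notes, the terms in the explain output are the analyzed forms, so matching on the original query text may not work without applying the same analysis.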

Re: Solr Filter Query

2013-08-21 Thread Jack Krupansky
As with many features in Solr, there is no hard limit per se, but the "rule" is to use the feature in moderation. If you find yourself using a "big" filter query, it likely means that you have chosen a poor design or are misusing Solr in some way. The response should be to correct your design,

Re: Solr 4.4 problem with loading DisMaxRequestHandler

2013-08-21 Thread Jack Krupansky
You must have upgraded from a very old release of Solr. There is no DisMaxRequestHandler. Just use the standard request handler for "/select" in the Solr example config and then add a boolean for the "defType" parameter to set it to dismax to enable the dismax query parser. -- Jack Krupansky
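
On the request side, selecting the dismax query parser comes down to passing `defType=dismax` to the standard `/select` handler. The sketch below builds such a request path; the field names `title` and `body` and the function name are hypothetical, and the same parameter could instead be set as a default in solrconfig.xml.

```python
from urllib.parse import parse_qs, urlencode


def dismax_select_path(user_query, query_fields):
    """Build a /select request path that uses the dismax query parser."""
    params = [
        ("q", user_query),
        ("defType", "dismax"),
        ("qf", " ".join(query_fields)),  # fields dismax should search
    ]
    return "/select?" + urlencode(params)


# "title" and "body" are placeholder field names.
path = dismax_select_path("pacific rim", ["title", "body"])
decoded = parse_qs(path.split("?", 1)[1])
```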

Re: Prevent Some Keywords at Analyzer Step

2013-08-21 Thread Jeff Porter
Why not use ShingleFilterFactory and then match on that token if you find it? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Jeff Porter co-founder email: jpor...@o2ointeractive.com mobile: +1-303-332-4006 On Aug 19, 2013, at 11:23 AM, Dan Davis wrote: >

Data Import faile in solr 4.3.0

2013-08-21 Thread Montu v Boda
When we import the entire Solr 3.5 index into 4.3, the import fails each and every time with the error below. Caused by: org.apache.solr.common.SolrException: parsing error Caused by: org.apache.http.MalformedChunkCodingException: Unexpected content at the end of chunk we have 50 lac document is

Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Stefan Matheis
Thanks Vladimir, i've created SOLR-5178 - Stefan On Wednesday, August 21, 2013 at 1:29 PM, Vladimir Vagaitsev wrote: > Stefan, > > It's done! Here is the "system" key: > > "system":{"name":"Linux","version":"3.2.0-39-virtual","arch":"amd64","systemLoadAverage":3.38,"committedVirtualMemorySiz

Re: Measuring SOLR performance

2013-08-21 Thread Dmitry Kan
Hi Roman, I have noticed a difference with different solr.xml config contents. It is probably legit, but thought to let you know (tests run on fresh checkout as of today). As mentioned before, I have two cores configured in solr.xml. If the file is: [code] [/code] then the

Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Vladimir Vagaitsev
Stefan, It's done! Here is the "system" key: "system":{"name":"Linux","version":"3.2.0-39-virtual","arch":"amd64","systemLoadAverage":3.38,"committedVirtualMemorySize":32454287360,"freePhysicalMemorySize":912945152,"freeSwapSpaceSize":0,"processCpuTime":5627465000,"totalPhysicalMemorySize":71

Solr Indexing Status

2013-08-21 Thread Prasi S
Hi, I am using Solr 4.4 to index CSV files, with SolrJ. At frequent intervals my user may request a "Status". I have to send back something like in DIH: "Indexing in progress.. Added xxx documents". Is there anything like in DIH, where we can fire a command=status to get the status

Re: "Path must not end with / character" error during performance tests

2013-08-21 Thread Tanya
Erick, The problem does not happen on every call, so I assume that it is not a configuration problem. BR Tanya >It looks like you've specified your zkHost (?) as something like >machine:port/solr/ > >rather than >machine:port/solr > >Is that possible? > >Best, >Erick On Tue,
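
If the zkHost string is built by application code, one defensive option is to strip any trailing slash before using it, so the `machine:port/solr/` form from the quoted suggestion can never be passed along. This is a minimal sketch; `normalize_zk_host` is a hypothetical helper and port 2181 is only an example value.

```python
def normalize_zk_host(zk_host):
    """Remove trailing slashes so the chroot path does not end with '/'."""
    return zk_host.rstrip("/")


cleaned = normalize_zk_host("machine:2181/solr/")
```

This does not explain an error that appears only intermittently, as reported here, but it rules out one source of the malformed path.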

Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Stefan Matheis
Vladimir As Shawn said .. there is/was a change in configuration - my explanation was perhaps not the best. if you try that one, it should work: http://localhost:8983/solr/collection1/admin/system?wt=json otherwise, let us know which is the url you're using to access the Admin UI - Stefan On

Solr 4.4 problem with loading DisMaxRequestHandler

2013-08-21 Thread danielitos85
Hi guys, I'm using a clean Solr 4.4 installation and I have added the following lines to my solrconfig.xml: all 0.01 *:* 0 regex but when I start Solr it returns an error: *Caused by: java.lang.ClassNotFoundException: solr.DisMaxRequestHandler* In my dist

Re: Issue in Swap Space display at Solr Admin

2013-08-21 Thread Vladimir Vagaitsev
Stefan, the link still doesn't work. I'm using solr-4.3.1 and I have the following solr.xml file: 2013/8/20 Shawn Heisey > On 8/20/2013 9:49 AM, Stefan Matheis wrote: > >> Vladimir >> >> That shouldn't matter .. perhaps i did not provide enough information? >> depen

Re: Solr Filter Query

2013-08-21 Thread tamanjit.bin...@yahoo.co.in
I am unsure what you mean when you say "how big a filter query can be?". Do you mean how long a single filter query can be, or a limit on the number of filter queries that can be used? For the former you may want to look at maxBooleanClauses in your solrconfig. Try the link: http://wiki.apac

Re: get term frequency, just only keywords search

2013-08-21 Thread danielitos85
Thanks a lot guys, @Jack in my search I use dismax (as defType) and I search for either a term or a phrase, but I need to get the number that shows how many times that term or phrase occurs in the document. I could get it from debugQuery but I would like to get it directly from the results. What do you sugge

Re: Facing Solr performance during query search

2013-08-21 Thread sivaprasad
Here I am providing the slave solrconfig information. 1 35 35 6 1 1024 20 static firstSearcher warming in solrconfig.xml

Re: High memory usage on solr 3.6

2013-08-21 Thread Samuel García Martínez
You were right. I've attached VisualVM to the process and forced a System.gc(): used memory went down to nearly 1.8 GB. So I don't understand the VisualVM dump reports. They said that all those char[] references have a SolrDispatchFilter instance as GC root. Another example (1M+ references with the exact