Re: when to use Fieldnorm ??

2013-10-14 Thread Karan jindal
Thanks, Shawn, for the quick insight. I will look into this further and share my experience. Thanks, Karan Jindal On Tue, Oct 15, 2013 at 12:10 AM, Shawn Heisey wrote: > On 10/14/2013 3:05 AM, Karan jindal wrote: > >> Is there a standard way of checking whether switching off fiel

Re: fq caching question

2013-10-14 Thread Tim Vaillancourt
Thanks Koji! Cheers, Tim On 14/10/13 03:56 PM, Koji Sekiguchi wrote: Hi Tim, (13/10/15 5:22), Tim Vaillancourt wrote: Hey guys, Sorry for such a simple question, but I am curious as to the differences in caching between a "combined" filter query, and many separate filter queries. Here ar

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
Can you do this data in CSV format? There is a CSV reader in the DIH. The SEP was not intended to read from files, since there are already better tools that do that. Lance On 10/14/2013 04:44 PM, Josh Lincoln wrote: Shawn, I'm able to read in a 4mb file using SEP, so I think that rules out th

Re: Find documents that are composed of % words

2013-10-14 Thread Chris Hostetter
: bq: but you cannot ask this to client. : : You _can_ ask this of a client. IMO you are obligated to. +1. >> When you are given a requirement/request from your client, >> always verify that you aren't dealing with an XY Problem: >> http://people.apache.org/~hossman/#xyproblem ... >> Don'

Re: Replace NULL with 0 while Indexing

2013-10-14 Thread Shawn Heisey
On 10/11/2013 3:02 PM, keshari.prerna wrote: One of my indexed fields has NULL values and I want them to be replaced with 0 during indexing, so that a search after indexing returns 0 instead of NULL. Most of your other replies have concentrated on the SQL side of things. One men

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Josh Lincoln
Shawn, I'm able to read in a 4mb file using SEP, so I think that rules out the POST buffer being the issue. Thanks for suggesting I test this. The full file is over a gig. Lance, I'm actually pointing SEP at a static file (I simply named the file "select" and put it on a Web server). SEP thinks it

Re: Replace NULL with 0 while Indexing

2013-10-14 Thread Developer
You can also use SELECT ISNULL(myColumn, 0 ) FROM myTable Reference: http://www.w3schools.com/sql/sql_isnull.asp -- View this message in context: http://lucene.472066.n3.nabble.com/Replace-NULL-with-0-while-Indexing-tp4095059p4095550.html Sent from the Solr - User mailing list archive at Nabb
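The two-argument `ISNULL(myColumn, 0)` form above is SQL Server syntax; `COALESCE` is the standard-SQL equivalent and is accepted by most databases. A minimal sketch, assuming an in-memory SQLite database stands in for the real source; `myTable` and `myColumn` come from the post, while the `id` column and the sample rows are made up for illustration:

```python
import sqlite3

def fetch_with_default(rows):
    """Load rows into an in-memory table and read them back with NULL mapped to 0."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only myTable/myColumn appear in the original post.
    conn.execute("CREATE TABLE myTable (id INTEGER, myColumn INTEGER)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT id, COALESCE(myColumn, 0) FROM myTable ORDER BY id"
    ).fetchall()
    conn.close()
    return result

sample = [(1, 5), (2, None), (3, 7)]
print(fetch_with_default(sample))
```

The same `COALESCE(...)` expression could go in the SELECT that feeds the indexer, so the substitution happens before Solr ever sees the row.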

prepareCommit vs Commit

2013-10-14 Thread Phani Chaitanya
Hi all, I'd like to know in a bit more detail what happens behind the scenes with prepareCommit vs Commit. Also, I read somewhere in the comments of the Lucene/Solr code (I don't remember where, but I'll try to dig it up) that if an indexing request comes while commit is requested, it o

Re: fq caching question

2013-10-14 Thread Koji Sekiguchi
Hi Tim, (13/10/15 5:22), Tim Vaillancourt wrote: Hey guys, Sorry for such a simple question, but I am curious as to the differences in caching between a "combined" filter query, and many separate filter queries. Here are 2 example queries, one with combined fq, one separate: 1) "/select?q=*:

Re: DIH - stream file with solrEntityProcessor

2013-10-14 Thread Lance Norskog
On 10/13/2013 10:02 AM, Shawn Heisey wrote: On 10/13/2013 10:16 AM, Josh Lincoln wrote: I have a large solr response in xml format and would like to import it into a new solr collection. I'm able to use DIH with solrEntityProcessor, but only if I first truncate the file to a small subset of the

Average term position

2013-10-14 Thread Saar Carmi
I hope someone could give me some direction on what to read in order to implement the following: Given a query and a term, how could I calculate the average position of the term within every document in the resultset and return that average? I am looking for the fastest (performance wise) solutio
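To make the question concrete, here is a minimal sketch of the quantity being asked for, computed outside Solr on plain strings. It assumes whitespace tokenization, 0-based positions, and that "average" means the mean of the per-document averages over documents that contain the term; none of these choices come from the original post, and in Solr the positions would have to come from the index itself rather than from re-tokenizing text.

```python
def average_term_position(docs, term):
    """Mean of the per-document average positions of `term`.

    Documents that do not contain the term are skipped; returns None
    when no document contains it.
    """
    term = term.lower()
    per_doc = []
    for text in docs:
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        positions = [i for i, tok in enumerate(text.lower().split()) if tok == term]
        if positions:
            per_doc.append(sum(positions) / len(positions))
    return sum(per_doc) / len(per_doc) if per_doc else None

results = ["solr is fast", "i like solr and solr"]
print(average_term_position(results, "solr"))
```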

RE: Please any idea? Highlighting exact phrases with solr

2013-10-14 Thread Bryan Loofbourrow
Sil, When you switched over to using the Fast Vector Highlighter, did you change your schema so that the fields that you want to highlight provide term vector information, and reindex your documents? Term vectors are necessary when using the Fast Vector Highlighter. Posting your schema may show va

Re: My posts are NOT getting accepted by the mailing list.

2013-10-14 Thread Developer
Thanks Shawn. Actually, some of my posts are still pending while others were successfully accepted by the mailing list. I never used HTML formatting, but hopefully I am not listed as a spammer.

fq caching question

2013-10-14 Thread Tim Vaillancourt
Hey guys, Sorry for such a simple question, but I am curious as to the differences in caching between a "combined" filter query, and many separate filter queries. Here are 2 example queries, one with combined fq, one separate: 1) "/select?q=*:*&fq=type:bid&fq=user_id:3" 2) "/select?q=*:*&fq=
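As background for the question: Solr caches the result of each `fq` parameter as its own filterCache entry, so a request with two `fq` parameters yields two independently reusable entries, while a single `fq` containing both clauses yields one. A minimal sketch that only lists which filter strings a request would cache; the helper name is hypothetical, and the combined request shown is my assumption because the original example is cut off:

```python
from urllib.parse import urlsplit, parse_qs

def filter_cache_keys(request):
    """Return the fq strings in a request; each would be cached separately."""
    query = urlsplit(request).query
    return parse_qs(query).get("fq", [])

separate = "/select?q=*:*&fq=type:bid&fq=user_id:3"
# Assumed shape of a combined filter; the original post is truncated here.
combined = "/select?q=*:*&fq=type:bid+AND+user_id:3"

print(filter_cache_keys(separate))
print(filter_cache_keys(combined))
```

With separate filters, a later request for `fq=type:bid` paired with a different `user_id` can reuse the cached `type:bid` entry; the combined form can only be reused by a request with the same combined string.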

Re: Update existing documents when using ExtractingRequestHandler?

2013-10-14 Thread Jeroen Steggink
Thanks for your advice Erick and Jason. I implemented the document extraction on a separate server, indeed better load balancing and error handling. Cheers, Jeroen On 10-10-2013 17:09, Jason Hellman wrote: As an endorsement of Erick's like, the primary benefit I see to processing through you

Re: My posts are NOT getting accepted by the mailing list.

2013-10-14 Thread Shawn Heisey
On 10/14/2013 12:39 PM, Developer wrote: For some reason, my posts are not getting accepted by the mailing list even though I have been a subscriber for more than 3 years now. Did anything change in the recent past? Do I need to subscribe to this list again? One of the most common reasons that email will get

Re: when to use Fieldnorm ??

2013-10-14 Thread Shawn Heisey
On 10/14/2013 3:05 AM, Karan jindal wrote: Is there a standard way of checking whether switching off fieldNorm helps or not? If you will *NEVER* care about how the length of a field affects your relevancy score, you can omit norms for that field. If you don't care at *all* about how ma
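For a sense of what is being given up when norms are omitted: in Lucene's classic TF-IDF similarity the length component of the norm is 1/sqrt(number of terms in the field), so shorter fields get a larger factor. A minimal sketch of just that formula; it ignores index-time boosts and the lossy single-byte encoding Lucene applies to stored norms, and the function name is made up for illustration:

```python
import math

def length_norm(num_terms):
    """Classic length normalization: shorter fields score higher."""
    return 1.0 / math.sqrt(num_terms)

# Compare a short field with progressively longer ones.
for n in (3, 10, 100):
    print(n, round(length_norm(n), 3))
```

If every value of a field has roughly the same length, this factor is nearly constant across documents, which is one informal way to judge whether omitting norms would change the ranking at all.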

My posts are NOT getting accepted by the mailing list.

2013-10-14 Thread Developer
For some reason, my posts are not getting accepted by the mailing list even though I have been a subscriber for more than 3 years now. Did anything change in the recent past? Do I need to subscribe to this list again?

Re: solrnet sample

2013-10-14 Thread Developer
You can get most of the faceting information from the links below. SolrNet faceting info: https://github.com/mausch/SolrNet/blob/master/Documentation/Facets.md Solr faceting info: http://wiki.apache.org/solr/SolrFacetingOverview

Re: Clustering unstructured text data

2013-10-14 Thread tamanjit.bin...@yahoo.co.in
You may want to have a look at SolrJ

Re: Fetch Unique Values

2013-10-14 Thread tamanjit.bin...@yahoo.co.in
/fetch first 10,000 unique records based on a field/ Do you mean fetch 10,000 records each with a unique value of the field? Is Grouping what you are looking for?

Re: [Schemaless mode] Solr admin - core schema exception

2013-10-14 Thread Steve Rowe
Hi Alessandro, I know I've seen this problem in the past, but I just tried it with Solr 4.5, and it works as expected - the URL shown in the Schema view is - notice that the file is not "n

Re: Doing time sensitive search in solr

2013-10-14 Thread Erick Erickson
When you specify a sort parameter it totally overrides the scoring. You can specify multiple sort criteria, e.g. both live_dt and score. If you specify two sort criteria, any ties in the first are broken by the second and so on through as many sort criteria as you have specified. Note that specify
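The tie-breaking behaviour described above can be sketched outside Solr with an ordinary multi-key sort. This assumes both criteria are descending (roughly `sort=live_dt desc, score desc`); `live_dt` and `score` come from the post, while the sample documents and the `id` field are made up:

```python
def sort_docs(docs):
    """Order by live_dt descending; ties are broken by score descending."""
    return sorted(docs, key=lambda d: (d["live_dt"], d["score"]), reverse=True)

docs = [
    {"id": "a", "live_dt": "2013-10-01", "score": 1.0},
    {"id": "b", "live_dt": "2013-10-14", "score": 0.5},
    {"id": "c", "live_dt": "2013-10-14", "score": 2.0},
]
print([d["id"] for d in sort_docs(docs)])
```

Documents `b` and `c` share a `live_dt`, so only for that pair does the score decide the order; `a` sorts last regardless of its score.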

Re: Concurent indexing

2013-10-14 Thread Steve Rowe
Hi maephisto, This issue can cause an update deadlock, and may have caused the problem you were seeing: https://issues.apache.org/jira/browse/SOLR-4327 - a fix will be included in forthcoming 4.5.1. Steve On Oct 14, 2013, at 10:20 AM, maephisto wrote: > Thank you! > > I was worried because

Re: Concurent indexing

2013-10-14 Thread maephisto
Thank you! I was worried because I was experimenting with this system, and at some point I was processing 2 big files and both indexing processes had added about 750k docs when suddenly Solr simply refused to accept any more added docs. Querying was working fine but trying to add 1 more single doc

Re: Concurent indexing

2013-10-14 Thread Jason Hellman
The limit on how many threads you can use to load data is primarily driven by factors on your hardware: CPU, heap usage, I/O, and the like. It is common for most index load processes to be able to handle more incoming data on the Solr side of the equation than can typically be loaded fro

Concurent indexing

2013-10-14 Thread maephisto
Hi, I have a collection (numShards=3, replicationFactor=2) split across 2 machines. Since the amount of data I have to index is huge, I would like to start multiple instances of the same process that would index data to Solr. Is there any limitation or contraindication in this area? The indexing clie

Re: Fwd: Index JTS Point in Solr/Lucene index

2013-10-14 Thread Guido Medina
Yeap, AFAIK you can only send the field in WKT format POINT (X Y), here is my definition for lat lons using polygons in the map: *JTS field definition:* class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFact

Re: Storing 2 dimension array in Solr

2013-10-14 Thread David Philip
Hi, I will check out pseudo join. Jack, I doubt further de-normalization. I will take the rest of the points that you gave me. Thank you. Basically, we have 2 different Solr indexes. One table is rarely updated but this group-disease table has frequent updates and new diseases are added very o

Re: Fwd: Index JTS Point in Solr/Lucene index

2013-10-14 Thread Shahbaz lodhi
Thanks Guido for the reply, Just to clarify; this means that we cannot index JTS POINT in format like Pt(x=55.76056,y=24.19167). Is that so? Thanks again. On Mon, Oct 14, 2013 at 4:20 PM, Guido Medina wrote: > WKT format should work, like explained in the wiki: > > http://en.wikipedia.org/wiki

Solr DocValues - String

2013-10-14 Thread Michael Tyler
Hi All, I wanted to learn more about docValues. I did a fair amount of Google searching but I didn't understand how to use docValues as column fields. How can we use them as column stride fields? Right now, we have less data in HBase, and we are thinking of moving it to Solr itself if

Re: SolrDocumentList - bitwise operation

2013-10-14 Thread Michael Tyler
Hi Shawn, This is a time-consuming operation. I already have this in my application. I was pondering whether I can get a bit set from both the Solr indexes, bitset.and, then retrieve only those matched? I don't know how to retrieve the bitset - wanted to try this and test the performance. Regards

Re: Fwd: Index JTS Point in Solr/Lucene index

2013-10-14 Thread Guido Medina
WKT format should work, as explained in the wiki: http://en.wikipedia.org/wiki/Well-known_text Guido. On 14/10/13 11:50, Shahbaz lodhi wrote: Hi, *Story:* I am trying to index a *JTS point* in the following format, without success so far: Pt(x=55.76056,y=24.19167) It is the format that I get by ct
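One way to act on the WKT suggestion is to rewrite the `Pt(x=...,y=...)` string into a WKT point before sending it to Solr. A minimal sketch, assuming the input always has exactly the shape shown in the post (plain decimal numbers, no spaces); the helper name is hypothetical, and whether x and y are in the order your field expects is something to check against your own data:

```python
import re

# Matches strings such as Pt(x=55.76056,y=24.19167); the shape is assumed from the post.
_PT = re.compile(r"Pt\(x=(-?\d+(?:\.\d+)?),y=(-?\d+(?:\.\d+)?)\)")

def pt_to_wkt(text):
    """Convert 'Pt(x=X,y=Y)' into the WKT form 'POINT (X Y)'."""
    match = _PT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a Pt(x=...,y=...) string: {text!r}")
    x, y = match.groups()
    return f"POINT ({x} {y})"

print(pt_to_wkt("Pt(x=55.76056,y=24.19167)"))
```

The coordinates are passed through as text rather than parsed as floats, so no digits are lost or reformatted along the way.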

Index JTS Point in Solr/Lucene index

2013-10-14 Thread Shahbaz lodhi
Hi, *Story:* I am trying to index a *JTS point* in the following format, without success so far: Pt(x=55.76056,y=24.19167) It is the format that I get by ctx.readShape( shapeString ). I don't get any error when reading the shape or adding it to the solrInputDocument, but I get "*error reading WKT*" on adding

Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-14 Thread Peter Schmidt
But it's used, it's in the JAVA_OPTIONS listed by service jetty check 2013/10/14 Peter Schmidt > But the flag is not listed under the Dashboard->Args in Solr Admin > Interface. > > > 2013/10/14 Peter Schmidt > >> It is necessary to configure the update-alternatives for Oracle Java JDK >> 7. A

Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-14 Thread Peter Schmidt
But the flag is not listed under the Dashboard->Args in Solr Admin Interface. 2013/10/14 Peter Schmidt > It is necessary to configure the update-alternatives for Oracle Java JDK > 7. Afterwards i can use the -server flag > > > 2013/10/14 Peter Schmidt > >> I downloaded the Linux 64bit version

Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-14 Thread Peter Schmidt
It is necessary to configure the update-alternatives for Oracle Java JDK 7. Afterwards I can use the -server flag 2013/10/14 Peter Schmidt > I downloaded the Linux 64bit version jdk-7u40-linux-x64.tar.gz > > > 2013/10/11 Guido Medina > >> Then I think you downloaded the wrong JDK 7 (32bits JDK

Fwd: Index JTS Point in Solr/Lucene index

2013-10-14 Thread Shahbaz lodhi
Hi, *Story:* I am trying to index a *JTS point* in the following format, without success so far: Pt(x=55.76056,y=24.19167) It is the format that I get by ctx.readShape( shapeString ). I don't get any error when reading the shape or adding it to the solrInputDocument, but I get "*error reading WKT*" on adding

SpanNot Queries in Lucene 4.4

2013-10-14 Thread Ankit Kumar
Does Lucene's ComplexPhraseQueryParser support SpanNot queries implicitly, or do we need to change the QueryParser.jj file to recognize SpanNot queries?

Fetch Unique Values

2013-10-14 Thread kobe.free.wo...@gmail.com
Hi, I wish to execute a Solr query and fetch the first 10,000 unique records based on a field. How do I achieve this in Solr? I am using version 4.2.0. Thanks in advance!

Re: when to use Fieldnorm ??

2013-10-14 Thread Karan jindal
Is there a standard way of checking whether switching off fieldNorm helps or not? Regards, Karan Jindal On Mon, Oct 14, 2013 at 2:30 PM, Karan jindal wrote: > Thanks Upayavira for a quick reply, > > True what you have said. But the example you have given is more of > descriptive nature. >

Re: when to use Fieldnorm ??

2013-10-14 Thread Karan jindal
Thanks Upayavira for a quick reply, What you have said is true, but the example you have given is more descriptive in nature. Will the same argument apply if the field length can't be more than some threshold (say 10), as in the case of "Title"? Titles are generally short. Consider the following exa

Re: Problems using DataImportHandler and TikaEntityProcessor

2013-10-14 Thread PeteBleackley
OK, so I put my pdf files in a directory /path/to/pdf, and edited example-DIH/solr/tika/conf/tika-data-config.xml to contain the parameter What should I do next? Shawn Heisey-4 wrote > On 10/11/2013 9:32 A

Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-14 Thread Peter Schmidt
I downloaded the Linux 64bit version jdk-7u40-linux-x64.tar.gz 2013/10/11 Guido Medina > Then I think you downloaded the wrong JDK 7 (32bits JDK?), if you are > running JDK 7 64bits the -server flag should be recognized. According to > the stackoverflow link you mentioned before. > > Guido. > >

Re: Please any idea? Highlighting exact phrases with solr

2013-10-14 Thread Silvia Suárez
Good morning, Could anyone please help with an idea or solution to this problem? Thanks a lot in advance, Sil. Silvia Suárez Barón I+D+I 972 989 470 / s...@anpro21.com /

Re: when to use Fieldnorm ??

2013-10-14 Thread Upayavira
You search for the word "jack". Which of these three field values best matches? 1) Jack is great. 2) Billy was a young man. Billy studied well and lived well. Jack didn't. Billy went travelling and had a great time. 3) Billy didn't actually like Jack. Jack could at times be difficult. Jack would g

when to use Fieldnorm ??

2013-10-14 Thread Karan jindal
Hi all, I have a general query about fieldNorm. Is it advisable to use fieldNorm (which kind of gives importance to shorter fields)? Is there any set of standard factors on which the decision to turn fieldNorm on/off can be based? *In my use case:-* I have user-generated data and prim