Boosting documents matching in a specific shard

2012-08-23 Thread Husain, Yavar
I am aware that IDF is not distributed. Suppose I have to boost or give higher rank to documents which are matching in a specific/particular shard, how can I accomplish that?

RE: NGram Indexing Basic Question

2012-07-20 Thread Husain, Yavar
, Jul 19, 2012 at 7:29 AM, Husain, Yavar wrote: > I have set some of my fields to be NGram Indexed. Have also set analyzer both > at query as well as index level. > > Most of the stuff works fine except for use cases where I simply interchange > couple of characters. > >

NGram Indexing Basic Question

2012-07-19 Thread Husain, Yavar
I have set some of my fields to be NGram indexed and have set the analyzer at both query and index level. Most things work fine except for cases where I simply interchange a couple of characters. For example: "springfield" retrieves correct matches, "springfi" retrieves correc
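To see why swapped letters hurt less with plain n-grams than with prefix (edge) grams, here is a minimal sketch, assuming character bigrams (n=2) are produced at both index and query time. The misspelling "sprnigfield" is a hypothetical example, since the original message is cut off, and the overlap fraction is only an illustration, not how Solr scores a match.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Return the set of contiguous character n-grams of `text`, lowercased."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def gram_overlap(indexed: str, query: str, n: int = 2) -> float:
    """Fraction of the query's n-grams that also occur in the indexed term."""
    query_grams = char_ngrams(query, n)
    if not query_grams:
        return 0.0
    return len(query_grams & char_ngrams(indexed, n)) / len(query_grams)


for query in ("springfield", "springfi", "sprnigfield"):
    print(query, round(gram_overlap("springfield", query), 2))
# springfield 1.0
# springfi 1.0
# sprnigfield 0.7
```

Only the three bigrams around the swap ("rn", "ni", "ig") are lost, so most of the query still overlaps with the indexed term.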

RE: NGram for misspelt words

2012-07-18 Thread Husain, Yavar
create the following grams while indexing for 'ludlow': lu lud ludl ludlo ludlow and hence would not match 'ludlwo'. Either you need to create grams at query time as well, or use edit distance. On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar wrote: > I have configur
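The reply's explanation can be reproduced with a small sketch of edge n-gram generation, assuming a minimum gram size of 2 (which yields exactly the grams listed) and that the query itself is not grammed. The maximum gram size of 15 is an arbitrary placeholder. Under these assumptions, prefixes of the indexed term match while a transposed spelling does not.

```python
def edge_ngrams(term: str, min_gram: int = 2, max_gram: int = 15) -> list:
    """Prefixes of `term` with lengths min_gram..max_gram (edge n-grams)."""
    term = term.lower()
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def matches_edge_grams(indexed: str, query: str) -> bool:
    """True if the whole (ungrammed) query equals one of the indexed edge grams,
    i.e. the query must be an exact prefix of the indexed term."""
    return query.lower() in edge_ngrams(indexed)


print(edge_ngrams("ludlow"))  # ['lu', 'lud', 'ludl', 'ludlo', 'ludlow']
for q in ("Ludlow", "Ludlo", "Ludl", "Ludlwo"):
    print(q, matches_edge_grams("ludlow", q))
# Ludlow True / Ludlo True / Ludl True / Ludlwo False
```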

NGram for misspelt words

2012-07-18 Thread Husain, Yavar
I have configured NGram indexing for some fields. Say I search for the city Ludlow: I get the results (normal search). If I search for Ludlo (with w omitted), I get the results. If I search for Ludl (with ow omitted), I still get the results. I know that they are all partial strings of the main

RE: Solr On Fly Field creation from full text for N-Gram Indexing

2012-05-10 Thread Husain, Yavar
@lucene.apache.org Subject: Re: Solr On Fly Field creation from full text for N-Gram Indexing You can use "Regex Transformer" to extract from a source field. See: http://wiki.apache.org/solr/DataImportHandler#RegexTransformer -- Jack Krupansky -Original Message----- From: Husain, Yavar Sent: Thu

Solr On Fly Field creation from full text for N-Gram Indexing

2012-05-10 Thread Husain, Yavar
I have full text in my database and I am indexing it using Solr. Now, at runtime, i.e. while indexing is going on, can I extract certain parameters based on a regex and create another field/column on the fly using Solr for that extracted text? For example, my DB has just 2 columns (DocId & FullT
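The RegexTransformer mentioned in the reply applies a regular expression to a source column during import. The same idea, sketched outside Solr in Python: the pattern (an "Order No: 12345" style number) and the field names `doc_id`, `full_text` and `order_no` are hypothetical placeholders, not the poster's schema.

```python
import re
from typing import Optional

# Hypothetical pattern: capture an order number written like "Order No: 12345".
ORDER_NO = re.compile(r"Order\s+No:\s*(\d+)", re.IGNORECASE)


def extract_field(full_text: str, pattern=ORDER_NO) -> Optional[str]:
    """Return the first capture group of `pattern` in `full_text`, or None."""
    match = pattern.search(full_text)
    return match.group(1) if match else None


def to_document(doc_id: int, full_text: str) -> dict:
    """Original columns plus one derived field (all field names hypothetical)."""
    doc = {"doc_id": doc_id, "full_text": full_text}
    value = extract_field(full_text)
    if value is not None:
        doc["order_no"] = value  # only added when the pattern matches
    return doc


print(to_document(1, "Shipped to Ludlow. Order No: 4711"))
```

Skipping the derived field when nothing matches keeps documents without the pattern from getting an empty value.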

RE: Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).

2012-04-23 Thread Husain, Yavar
I am sorry, I should have raised this issue on the Tomcat forums. However, I was just trying my luck here as it was indirectly related to Solr. From: Husain, Yavar Sent: Monday, April 23, 2012 11:07 PM To: solr-user@lucene.apache.org Subject: Apache Tomcat 6

Apache Tomcat 6 service terminated unexpectedly. It has done this 2 time(s).

2012-04-23 Thread Husain, Yavar
Solr 3.5 was not returning results. To my surprise, Tomcat 6.x (64 bit) was not running on my Windows machine. There were absolutely no errors in the logs, no crash dumps, nothing. I restarted it and everything seems to be fine now. I went to the Windows Event Viewer and exported the following information

RE: Regarding Indexing Multiple Columns Best Practise

2012-03-16 Thread Husain, Yavar
t, if you use a fieldType with the increment gap > 1 (the default is often set to 100), phrase queries (slop) will perform differently depending upon which option you choose. Best Erick On Thu, Mar 15, 2012 at 10:49 AM, Husain, Yavar wrote: > Say I have around 30-40 fields (SQL Ta
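Erick's point about the increment gap can be illustrated with a simplified position model: separate values of a multi-valued field (e.g. several sources copied into one field) are separated by `gap` extra positions, while a single value concatenated in the database has none. This sketch assumes whitespace tokenization and counts only in-order matches for a two-term phrase; Lucene's real sloppy phrase matching also allows reordered terms at extra cost. The column values are hypothetical.

```python
def positions(values: list, gap: int) -> dict:
    """Map each token to its positions in a multi-valued field.

    Tokens within one value get consecutive positions; `gap` extra positions
    are skipped between values (a simplified model of positionIncrementGap).
    """
    pos_map = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            pos_map.setdefault(token, []).append(pos)
    return pos_map


def phrase_matches(pos_map: dict, first: str, second: str, slop: int) -> bool:
    """In-order two-term phrase: `second` at most `slop` positions after `first`."""
    for p1 in pos_map.get(first, []):
        for p2 in pos_map.get(second, []):
            if 0 <= p2 - p1 - 1 <= slop:
                return True
    return False


concatenated = positions(["alpha beta gamma delta"], gap=100)
multi_valued = positions(["alpha beta", "gamma delta"], gap=100)
print(phrase_matches(concatenated, "beta", "gamma", slop=0))  # True
print(phrase_matches(multi_valued, "beta", "gamma", slop=5))  # False
```

With DB-level concatenation a phrase can match across the boundary of two original columns; with separate values and a large gap it cannot, unless the slop exceeds the gap.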

Indexing Halts for long time and then restarts

2012-03-16 Thread Husain, Yavar
Since Erick is actively answering right now, posting a quick question :) I am using DIH with Solr 3.5 on Windows to build an auto-recommendation utility. I have around 1 billion query strings (3-6 words each) in the database and am indexing them using NGram. Merge Factor = 30, Auto Commit not set. DIH halted a

Regarding Indexing Multiple Columns Best Practise

2012-03-15 Thread Husain, Yavar
Say I have around 30-40 fields (SQL table columns) indexed using Solr from the database. I concatenate those fields into one field by using the Solr copyField directive and then make it the default search field. If at the database level itself I perform concatenation of all those fields

RE: Solr out of memory exception

2012-03-15 Thread Husain, Yavar
Thanks a ton. From: Li Li [fancye...@gmail.com] Sent: Thursday, March 15, 2012 12:11 PM To: Husain, Yavar Cc: solr-user@lucene.apache.org Subject: Re: Solr out of memory exception it seems you are using 64bit jvm(32bit jvm can only allocate about 1.5GB

RE: Solr out of memory exception

2012-03-14 Thread Husain, Yavar
r out of memory exception How much memory is allocated to the JVM? On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar wrote: > Solr is giving out of memory exception. Full Indexing was completed fine. > Later while searching maybe when it tries to load the results in memory it > starts giving this

ngram synonyms & dismax together

2012-03-05 Thread Husain, Yavar
I have ngram-indexed 2 fields (columns in the database) and the third one is my full text field. Now my default text field is the full text field and while querying I use dismax handler and specify in it both the ngrammed field with certain boost values and also full text field with a certain
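A dismax request like the one described might be assembled as below. `defType` and `qf` are standard dismax parameters; the field names and boost values are hypothetical, and the host/port simply mirror the localhost:8085 URLs used elsewhere in these messages.

```python
from urllib.parse import parse_qs, urlencode


def dismax_query(query: str, field_boosts: dict, rows: int = 10) -> str:
    """Encode a dismax request; `qf` lists each field with its boost."""
    qf = " ".join(f"{field}^{boost:g}" for field, boost in field_boosts.items())
    return urlencode({"q": query, "defType": "dismax", "qf": qf, "rows": rows})


# Hypothetical fields: two n-gram fields and the full-text field.
params = dismax_query("ludlow", {"city_ngram": 2.0, "street_ngram": 1.5, "fulltext": 1.0})
print("http://localhost:8085/solr/select/?" + params)
print(parse_qs(params)["qf"])  # ['city_ngram^2 street_ngram^1.5 fulltext^1']
```

Letting `urlencode` handle escaping avoids hand-writing `%5E` for `^` and `+` for the spaces between `qf` entries.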

Dismax weird behavior wrt defType

2012-03-02 Thread Husain, Yavar
A weird behavior with respect to "defType". Any clues will be appreciated. Query 1: http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&defType=dismax [defType with capital T -- does not fetch results] Query 2: http://localhost:8085/solr/select/?q=abc&version=2.2&sta

RE: Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
: Spelling Corrector Algorithm On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar wrote: > Hi > > For spell checking component I set extendedResults to get the frequencies and > then select the word with the best frequency. I understand the spell check > algorithm based on Edit Distance.

RE: Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
tting you know which suggestions are going to truly return hits in context (and how many). 4. Try Jaro-Winkler (as mentioned above). Hope this helps. But in the end, especially with 1-word queries, I doubt even the best algorithms are going to always accurately guess what the user wanted. J
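Jaro-Winkler, suggested in the reply, favours candidates that share a prefix with the query. A minimal sketch of the textbook definition (common prefix capped at 4 characters, scaling factor 0.1); Lucene ships its own implementation, which may differ in details.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


for candidate in ("marine", "market"):
    print(candidate, round(jaro_winkler("marien", candidate), 4))
# marine 0.9667
# market 0.8444
```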

Spelling Corrector Algorithm

2012-03-01 Thread Husain, Yavar
Hi, for the spell checking component I set extendedResults to get the frequencies and then select the word with the best frequency. I understand the spell check algorithm based on Edit Distance. For example: Query to Solr: Marien Spell Check Text Returned: Marine (Freq: 120), Market (Freq: 900)
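In this example, plain Levenshtein distance puts both candidates at distance 2 from "marien" (two substitutions each), so a frequency tie-break favours Market. Counting an adjacent transposition as a single edit (optimal string alignment distance) separates them. The sketch illustrates that ranking idea; it is not how Solr's spellchecker itself ranks suggestions.

```python
def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance; with transpositions=True an adjacent swap costs 1
    (optimal string alignment), otherwise plain Levenshtein."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def best_suggestion(query: str, candidates: dict, transpositions: bool = True) -> str:
    """Closest candidate first; ties broken by higher frequency."""
    q = query.lower()
    return min(candidates,
               key=lambda c: (osa_distance(q, c.lower(), transpositions), -candidates[c]))


freqs = {"Marine": 120, "Market": 900}
print(best_suggestion("Marien", freqs, transpositions=False))  # Market
print(best_suggestion("Marien", freqs))                        # Marine
```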

Solr Basic Performance Test with duplicated data

2012-02-10 Thread Husain, Yavar
Will testing Solr on duplicated data in the database give the same performance statistics as testing Solr with completely unique data? By test I mean routine performance tests like time to index, time to search, etc. Will Solr perform any kind of optimization that will result i

Solr Indexing Running Time 32bit vs 64bit

2012-01-23 Thread Husain, Yavar
I was running 32 bit Java (JDK, JRE & Tomcat) on my 64 bit Windows. For indexing I was not able to allocate more than 1.5GB heap space on my machine. Each time, my Tomcat process would hit the upper bound (i.e. 1.5GB) very quickly, so I decided to switch to 64 bit Java/Tomcat. Now I don't see

Solr Tomcat Maximum Heap Memory

2011-12-21 Thread Husain, Yavar
I know this is a Solr forum; however, my problem is related to Solr running on Tomcat on a 64 bit Windows OS. I am running a 32 bit JVM on a 64 bit Windows 2008 Server. The max heap space I am able to allocate is around 1.5 GB, though I have 10 GB of RAM on my system and there is no other pr

RE: Solr sentiment analysis

2011-12-15 Thread Husain, Yavar
This is a generic machine learning question and is not related to Solr (which this list is for). You can ask it on Stackoverflow.com. However, one approach: go through the chapter on Non-Negative Matrix Factorization in O'Reilly's Programming Collective Intelligence. That
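For reference, a minimal pure-Python sketch of NMF itself, using Lee and Seung's multiplicative update rules to factor a nonnegative document-term matrix V into W (document weights) and H (features). Applying it to sentiment, as the reply suggests, would still require building V from your documents and interpreting the features; that part is not shown, and the small matrix in the usage lines is made up.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative V (docs x terms) into W (docs x rank), H (rank x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


V = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 3]]
w0, h0 = nmf(V, rank=2, iterations=0)  # the random starting point
w, h = nmf(V, rank=2)
print(frobenius_error(V, w0, h0), frobenius_error(V, w, h))
```

Multiplicative updates keep every entry nonnegative without any projection step, which is why they are a common first choice for small examples.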

SQL Server Solr RAM issue

2011-12-08 Thread Husain, Yavar
When I start Solr indexing, the RAM taken by MS SQL Server 2008 R2 also keeps increasing: from around 1GB initially, it went up to 3.6 GB (when indexing was completed for just 1 million records/5GB). I have set the responseBuffering parameter to adaptive in data-config.xml; however, it didn't help me

RE: SolR - Index problems

2011-12-07 Thread Husain, Yavar
Hi Jiggy, when you query the index, what do you get in the Tomcat logs? (Check the tomcat/logs directory.) How much heap memory have you allocated to Tomcat? - Yavar From: jiggy [new...@trash-mail.com] Sent: Wednesday, December 07, 2011 9:53 P

RE: Autocommit & Index Size

2011-12-06 Thread Husain, Yavar
, 2011 12:00 AM To: solr-user@lucene.apache.org Subject: Re: Autocommit & Index Size On 12/6/2011 1:01 AM, Husain, Yavar wrote: > In solrconfig.xml I was experimenting with Indexing Performance. When I set > the maxDocs (in autoCommit) to say 1 documents the index size is double >

Autocommit & Index Size

2011-12-06 Thread Husain, Yavar
In solrconfig.xml I was experimenting with indexing performance. When I set maxDocs (in autoCommit) to, say, 1 documents, the index size is double what it is when I don't use autoCommit at all (i.e. keep it commented out and commit only at the end, after adding all documents). Does autoCommit affect the index

RE: DIH Strange Problem

2011-11-28 Thread Husain, Yavar
ns of Java 6 update 30 and Java 6 update 30 build 12. We are in contact with Java on these issues and we will update this blog once we have more information." Should work with update 28. Kai -Original Message----- From: Husain, Yavar [mailto:yhus...@firstam.com] Sent: Monday, Novembe

RE: DIH Strange Problem

2011-11-28 Thread Husain, Yavar
the data-config.xml use this statement: driver="net.sourceforge.jtds.jdbc.Driver" 4. Also in data-config.xml, specify the url like this: url="jdbc:jTDS:sqlserver://localhost:1433;databaseName=XXX" 5. Now run your indexing. It should solve the problem.

RE: Unable to index documents using DataImportHandler with MSSQL

2011-11-28 Thread Husain, Yavar
] Sent: Monday, November 28, 2011 4:11 PM To: Husain, Yavar Cc: solr-user@lucene.apache.org Subject: Re: Unable to index documents using DataImportHandler with MSSQL Right. This is REALLY weird - I've now started from scratch on another machine (this time Windows 7), and got _exactly_ the same pr

RE: Unable to index documents using DataImportHandler with MSSQL

2011-11-27 Thread Husain, Yavar
Hi Ian, I am having exactly the same problem you are having, on Win 7 and 2008 Server: http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html So far I have not received any replies that solve my problem. Please do let me know if you have arrived at some solution

RE: DIH Strange Problem

2011-11-23 Thread Husain, Yavar
Hi Thanks for your replies. I carried out these 2 steps (it did not solve my problem): 1. I tried setting responseBuffering to adaptive. Did not work. 2. For checking Database connection I wrote a simple java program to connect to database and fetch some results with the same driver that I use

DIH Strange Problem

2011-11-23 Thread Husain, Yavar
I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing was working perfectly fine. However, today when I started full indexing again, Solr halts/gets stuck at the line "Creating a connection for entity." There are no further messages after that.

RE: Solr Performance/Architecture

2011-11-22 Thread Husain, Yavar
. -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Monday, November 21, 2011 7:47 PM To: solr-user@lucene.apache.org Subject: Re: Solr Performance/Architecture On 11/21/2011 12:41 AM, Husain, Yavar wrote: > Number of rows in SQL Table (Indexed till now using Solr)

Solr Performance/Architecture

2011-11-20 Thread Husain, Yavar
Number of rows in SQL table (indexed till now using Solr): 1 million; total size of data in the table: 4GB; total index size: 3.5 GB; total number of rows that I have to index: 20 million (approximately 100 GB of data) and growing. What are the best practices with respect to distributing the index? Wha
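A back-of-the-envelope projection from the figures in this message, assuming index size grows linearly. The two extrapolations disagree because 1 million rows at 4 GB implies about 80 GB for 20 million rows, not the roughly 100 GB quoted. The 20 GB-per-shard target is a hypothetical number chosen only to show the arithmetic, not a recommendation.

```python
import math

# Figures quoted in the message.
ROWS_INDEXED = 1_000_000
DATA_GB_INDEXED = 4.0
INDEX_GB_INDEXED = 3.5
ROWS_TOTAL = 20_000_000
DATA_GB_TOTAL = 100.0


def projected_index_gb() -> tuple:
    """Linear extrapolations of index size: (by row count, by data volume)."""
    by_rows = INDEX_GB_INDEXED * ROWS_TOTAL / ROWS_INDEXED
    by_data = INDEX_GB_INDEXED * DATA_GB_TOTAL / DATA_GB_INDEXED
    return by_rows, by_data


def shards_needed(index_gb: float, target_gb_per_shard: float) -> int:
    """Smallest number of equal shards keeping each at or below the target."""
    return math.ceil(index_gb / target_gb_per_shard)


by_rows, by_data = projected_index_gb()
print(by_rows, by_data)  # 70.0 87.5
# Hypothetical target of 20 GB per shard, for illustration only.
print(shards_needed(by_rows, 20), shards_needed(by_data, 20))  # 4 5
```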

write-lock issue

2011-11-18 Thread Husain, Yavar
Environment: Solr 1.4 on Windows/MS SQL Server. A write lock is created whenever I try to do a full-import of documents using DIH. The logs say "Creating a connection with the database." and the process does not go forward (it is not getting a database connection). So the indexes are no

Solr Indexing Time

2011-11-10 Thread Husain, Yavar
Solr 1.4 is doing great with respect to Indexing on a dedicated physical server (Windows Server 2008). For Indexing around 1 million full text documents (around 4 GB size) it takes around 20 minutes with Heap Size = 512M - 1G & 4GB RAM. However while using Solr on a VM, with 4 GB RAM it took

Solr Indexing Time varying each time I index

2011-11-10 Thread Husain, Yavar
Solr 1.4 is doing great with respect to Indexing on a dedicated physical server (Windows Server 2008). For Indexing around 1 million full text documents (around 4 GB size) it takes around 20 minutes with Heap Size = 512M - 1G & 4GB RAM. However while using Solr on a VM, with 4 GB RAM it took 50