I am aware that IDF is not distributed. Suppose I have to boost or give a higher
rank to documents that match in a particular shard. How can I accomplish that?
On Thu, Jul 19, 2012 at 7:29 AM, Husain, Yavar wrote:
> I have set some of my fields to be NGram indexed and have also set the analyzer
> both at query and at index level.
>
> Most of the stuff works fine except for use cases where I simply interchange
> a couple of characters.
>
>
I have set some of my fields to be NGram indexed and have also set the analyzer
both at query and at index level.
Most of the stuff works fine except for use cases where I simply interchange
a couple of characters.
For example: "springfield" retrieves correct matches, and "springfi" retrieves
correct matches as well.
Your NGram analyzer would create the following grams while indexing 'ludlow':
lu lud ludl ludlo ludlow, and hence it would not match 'ludlwo'. Either you
need to create grams while querying as well, or use edit distance.
On Wed, Jul 18, 2012 at 7:43 PM, Husain, Yavar wrote:
>
> I have configured NGram Indexing for some fields.
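A minimal sketch of what such a setup might look like in schema.xml, assuming
EdgeNGramFilterFactory and illustrative gram sizes (the exact filter and sizes
used in the original configuration are not shown in this thread):

  <fieldType name="text_ngram" class="solr.TextField">
    <!-- Index-time analyzer: emits edge grams (lu, lud, ludl, ...) per token -->
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
    </analyzer>
    <!-- Query-time analyzer: no gram filter, so only prefixes of terms match -->
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With this setup a transposition like 'ludlwo' produces the query token 'ludlwo',
which is not among the indexed grams, matching the behavior described above.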
I have configured NGram Indexing for some fields.
Say I search for the city Ludlow, I get the results (normal search).
If I search for Ludlo (with the w omitted), I still get the results.
If I search for Ludl (with the ow omitted), I still get the results.
I know that they are all partial strings of the main word.
To: solr-user@lucene.apache.org
Subject: Re: Solr On Fly Field creation from full text for N-Gram Indexing
You can use "Regex Transformer" to extract from a source field.
See:
http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
-- Jack Krupansky
-----Original Message-----
From: Husain, Yavar
Sent: Thu
I have full text in my database and I am indexing it using Solr. Now, at
runtime, i.e. while the indexing is going on, can I extract certain parameters
based on a regex and have Solr create another field/column on the fly for that
extracted text?
For example, my DB has just 2 columns (DocId & FullText).
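A minimal data-config.xml sketch of the RegexTransformer idea suggested above,
assuming a source column named FullText and a made-up pattern and field name
(none of these are from the original setup):

  <entity name="doc" query="SELECT DocId, FullText FROM Documents"
          transformer="RegexTransformer">
    <field column="DocId"/>
    <field column="FullText"/>
    <!-- extract the first yyyy-mm-dd style date from FullText into a new field -->
    <field column="docDate" sourceColName="FullText" regex="(\d{4}-\d{2}-\d{2})"/>
  </entity>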
I am sorry, I should have raised this issue on the Tomcat forums. However, I
was just trying my luck here, as it was indirectly related to Solr.
From: Husain, Yavar
Sent: Monday, April 23, 2012 11:07 PM
To: solr-user@lucene.apache.org
Subject: Apache Tomcat 6
Solr 3.5 was not returning results. To my surprise, Tomcat 6.x (64 bit) was not
running on my Windows machine. There were absolutely no errors in the logs, no
crash dumps, nothing. I restarted it and everything seems to be fine now.
I went to the Windows Event Viewer and exported the following information
But if you use a fieldType with an increment gap > 1 (the default is often set
to 100), phrase queries (slop) will perform differently depending upon which
option you choose.
Best
Erick
On Thu, Mar 15, 2012 at 10:49 AM, Husain, Yavar wrote:
> Say I have around 30-40 fields (SQL Table Columns) indexed using Solr from the
> database.
Since Erick is really active in answering right now, I am posting a quick
question :)
I am using:
DIH
Solr 3.5 on Windows
Building Auto Recommendation Utility
Having around 1 billion query strings (3-6 words each) in the database.
Indexing them using NGram.
Merge Factor = 30
Auto Commit not set.
DIH halted a
Say I have around 30-40 fields (SQL Table Columns) indexed using Solr from the
database. I concatenate those fields into one field using the Solr copyField
directive and then make it the default search field, which is what I search.
If I instead perform the concatenation of all those fields at the database
level itself, will there be any difference?
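For context, the copyField approach being discussed looks roughly like this in
schema.xml (field names here are illustrative, not from the original setup):

  <!-- catch-all field that the source columns are concatenated into -->
  <field name="allText" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="field1" dest="allText"/>
  <copyField source="field2" dest="allText"/>
  <!-- ... one copyField per source column ... -->
  <defaultSearchField>allText</defaultSearchField>

Note that each copied value is separated by the fieldType's
positionIncrementGap, which is why, as Erick points out above, phrase (slop)
queries behave differently than they would against a single pre-concatenated
column.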
Thanks a ton.
From: Li Li [fancye...@gmail.com]
Sent: Thursday, March 15, 2012 12:11 PM
To: Husain, Yavar
Cc: solr-user@lucene.apache.org
Subject: Re: Solr out of memory exception
it seems you are using a 64-bit JVM (a 32-bit JVM can only allocate about 1.5GB
of heap).
Subject: Re: Solr out of memory exception
How much memory is allocated to the JVM?
On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar wrote:
> Solr is giving an out of memory exception. Full indexing was completed fine.
> Later, while searching, maybe when it tries to load the results in memory, it
> starts giving this exception.
I have ngram-indexed 2 fields (columns in the database) and the third one is my
full text field. Now my default text field is the full text field, and while
querying I use the dismax handler and specify in it both the ngrammed fields
with certain boost values and also the full text field with a certain boost.
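A minimal sketch of that kind of dismax setup in solrconfig.xml, with made-up
field names and boost values (the actual ones are not given in this thread):

  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- the two ngrammed fields boosted over the plain full text field -->
      <str name="qf">city_ngram^5 name_ngram^3 fullText^1</str>
    </lst>
  </requestHandler>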
A weird behavior with respect to "defType". Any clues will be appreciated.
Query 1:
http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&defType=dismax
[defType with capital T -- does not fetch results]
Query 2:
http://localhost:8085/solr/select/?q=abc&version=2.2&start=0&rows=10&indent=on&deftype=dismax
[deftype with lowercase t -- fetches results]
Subject: Re: Spelling Corrector Algorithm
On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar wrote:
> Hi
>
> For the spell checking component I set extendedResults to get the frequencies
> and then select the word with the best frequency. I understand the spell check
> algorithm is based on Edit Distance.
letting you know which suggestions are going to truly return hits in context
(and how many).
4. Try Jaro-Winkler (as mentioned above).
Hope this helps. But in the end, especially with 1-word queries, I doubt even
the best algorithms are going to always accurately guess what the user wanted.
J
Hi
For the spell checking component I set extendedResults to get the frequencies
and then select the word with the best frequency. I understand the spell check
algorithm is based on Edit Distance. For example:
Query to Solr: Marien
Spell Check Text Returned: Marine (Freq: 120), Market (Freq: 900)
Will testing Solr on duplicated data in the database result in the same
performance statistics as testing Solr with completely unique data?
By test I mean routine performance tests like time to index, time to search,
etc. Will Solr perform any kind of optimization that will result i
I was running 32-bit Java (JDK, JRE & Tomcat) on my 64-bit Windows. For
indexing, I was not able to allocate more than 1.5GB of heap space on my
machine. Each time my Tomcat process used to touch the upper bound (i.e. 1.5GB)
very quickly, so I thought of moving to 64-bit Java/Tomcat. Now I don't see
I know this is a Solr forum; however, my problem is related to Solr running on
Tomcat on a 64-bit Windows OS.
I am running a 32-bit JVM on a 64-bit Windows 2008 Server. The max heap space I
am able to allocate is around 1.5 GB, though I have 10 GB of RAM on my system
and there is no other pr
This is a generic machine learning question and is not related to Solr (which
this thread is for). You can ask this question on stackoverflow.com.
However, one of the approaches: just go through the chapter on Non-Negative
Matrix Factorization in O'Reilly's Programming Collective Intelligence. That
When I start Solr indexing, the RAM taken by MS SQL Server 2008 R2 also keeps
increasing: from some initial 1GB it went up to 3.6GB (when indexing was
completed for just 1 million records/5GB). I have set the responseBuffering
parameter to adaptive in data-config.xml, however it didn't help me
Hi Jiggy
When you query the index, what do you get in the Tomcat logs? (Check that out
in the tomcat/logs directory.)
How much heap memory have you allocated to Tomcat?
- Yavar
From: jiggy [new...@trash-mail.com]
Sent: Wednesday, December 07, 2011 9:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Autocommit & Index Size
On 12/6/2011 1:01 AM, Husain, Yavar wrote:
> In solrconfig.xml I was experimenting with indexing performance. When I set
> the maxDocs (in autoCommit) to, say, 1 document, the index size is double
>
In solrconfig.xml I was experimenting with indexing performance. When I set the
maxDocs (in autoCommit) to, say, 1 document, the index size is double compared
to when I just don't use autoCommit (i.e. keep it commented out and commit only
once at the end, after adding all documents).
Does autoCommit affect the index size?
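For reference, the setting under discussion, sketched in solrconfig.xml (the
maxDocs value here is arbitrary):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- commit automatically once this many documents are pending -->
    <autoCommit>
      <maxDocs>10000</maxDocs>
    </autoCommit>
  </updateHandler>

Each commit can leave older segments and deleted-document space on disk until a
merge or optimize runs, which is one reason frequent commits can inflate the
on-disk index size relative to a single commit at the end.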
"... of Java 6 update 30 and
Java 6 update 30 build 12. We are in contact with Java on these issues and we
will update this blog once we have more information."
Should work with update 28.
Kai
-----Original Message-----
From: Husain, Yavar [mailto:yhus...@firstam.com]
Sent: Monday, Novembe
3. In the data-config.xml use this statement:
driver="net.sourceforge.jtds.jdbc.Driver"
4. Also in data-config.xml mention the url like this:
url="jdbc:jtds:sqlserver://localhost:1433;databaseName=XXX"
5. Now run your indexing.
It should solve the problem.
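Put together, the dataSource element would look roughly like this (host, port,
database name and credentials are placeholders):

  <dataSource type="JdbcDataSource"
              driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://localhost:1433;databaseName=XXX"
              user="username" password="password"/>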
-----Original Message-----
Sent: Monday, November 28, 2011 4:11 PM
To: Husain, Yavar
Cc: solr-user@lucene.apache.org
Subject: Re: Unable to index documents using DataImportHandler with MSSQL
Right.
This is REALLY weird - I've now started from scratch on another
machine (this time Windows 7), and got _exactly_ the same problem.
Hi Ian
I am having exactly the same problem that you are having, on Win 7 and 2008
Server: http://lucene.472066.n3.nabble.com/DIH-Strange-Problem-tc3530370.html
I still have not received any replies that solve my problem. Please do let me
know if you have arrived at some solution.
Hi
Thanks for your replies.
I carried out these 2 steps (they did not solve my problem):
1. I tried setting responseBuffering to adaptive. It did not work.
2. To check the database connection, I wrote a simple Java program to connect
to the database and fetch some results with the same driver that I use
I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing
data. Indexing and everything was working perfectly fine. However, today when I
started full indexing again, Solr halts/gets stuck at the line "Creating a
connection for entity." There are no further messages after that.
-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Monday, November 21, 2011 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Performance/Architecture
On 11/21/2011 12:41 AM, Husain, Yavar wrote:
> Number of rows in SQL Table (Indexed till now using Solr): 1 million
Number of rows in SQL Table (indexed till now using Solr): 1 million
Total size of data in the table: 4GB
Total index size: 3.5GB
Total number of rows that I have to index: 20 million (approximately 100GB of
data) and growing.
What are the best practices with respect to distributing the index?
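For what it's worth, in this generation of Solr a distributed query is driven
by the shards parameter at query time; a sketch with placeholder host names:

http://host1:8983/solr/select/?q=abc&shards=host1:8983/solr,host2:8983/solr

Each listed shard holds a slice of the index, and the receiving node merges the
per-shard results (with the usual caveat, noted at the top of this page, that
IDF is not distributed).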
Environment: Solr 1.4 on Windows/MS SQL Server
A write lock is getting created whenever I try to do a full-import of documents
using DIH. The logs say "Creating a connection with the database." and the
process does not go forward (it never gets a database connection), so the
indexes are not getting created.
Solr 1.4 is doing great with respect to indexing on a dedicated physical server
(Windows Server 2008). For indexing around 1 million full text documents
(around 4GB in size) it takes around 20 minutes with heap size = 512M-1G & 4GB
RAM.
However, while using Solr on a VM with 4GB RAM, it took 50