RE: Newbie SolR - Need advice

2013-07-02 Thread David Quarterman
Hi Fabio,

Like Jack says, try the tutorial. But to answer your question, SOLR isn't a 
bolt on to SQLServer or any other DB. It's a fantastically fast 
indexing/searching tool. You'll need to use the DataImportHandler (see the 
tutorial) to import your data from the DB into the indices that SOLR uses. Once 
in there, you'll have more power & flexibility than SQLServer would ever give 
you!

Haven't tried SOLR on Windows (I guess your environment) but I'm sure it'll 
work using Jetty or Tomcat as web container.

Stick with it. The ride can be bumpy but the experience is sensational!

DQ

-Original Message-
From: fabio1605 [mailto:fabio.to...@btinternet.com] 
Sent: 02 July 2013 16:16
To: solr-user@lucene.apache.org
Subject: Newbie SolR - Need advice

Hi

We have an MSSQL Server database which is getting far too large now and 
performance is dying! The majority of our webservers are mainly doing search, 
so I thought it may be best to move to SolR, but I know very little about it!

My questions are:

Does SolR Run as a bolt on to MSSQL - as in the data is still in MSSQL and SolR 
is just the search bit between?

I'm really struggling to understand the point of SOLR etc., so if someone could 
point me to a Dummies website I'd appreciate it! Google is throwing too much 
confusion at me!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Newbie SolR - Need advice

2013-07-02 Thread David Quarterman
Don’t worry Fabio - nobody knows everything (apart from Hossman). Following on 
from Sandeep, to use SOLR, you extract the data from your MSSQL DB using the 
DataImportHandler and you can then query it, facet it, pivot it to your heart's 
content. And fast!

You can use almost anything to build the SOLR queries - Java & PHP being 
probably most popular. There is a library for Perl I think but never tried it.
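
As a rough illustration of building a query from a script, here is a minimal 
Python sketch (Python is used only for brevity; the same idea applies in Perl, 
PHP or Java). The host, port, collection name and field names are assumptions 
for the example, not details of Fabio's setup. It only builds the select URL 
and does not contact a server.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10, facet_field=None):
    """Return a Solr /select URL for the given query string."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_field is not None:
        # Ask Solr to return facet counts for one field alongside the results.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core URL and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/collection1", "name:boots", facet_field="brand"
)
print(url)
```

Any HTTP client can then fetch that URL and parse the JSON response.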

So, you keep your mssql database, you just don't use it for searches - that'll 
relieve some of the load. Searches then all go through SOLR & its Lucene 
indexes. If your various tables need SQL joins, you specify those in the 
DataImportHandler (DIH) config. That way, when SOLR indexes everything, it 
indexes the data the way you want to see it.

DIH handles the data export from mssql -> SOLR and it's not too difficult to 
set up. 

You imply you're adding (inserting) data. How much, how often? DIH has a delta 
import feature so you can add data on the fly to SOLR's indexes.

Much of it comes down to the data model you have. My advice would be to try it 
and see. You will be pleasantly surprised!



-Original Message-
From: fabio1605 [mailto:fabio.to...@btinternet.com] 
Sent: 02 July 2013 17:10
To: solr-user@lucene.apache.org
Subject: RE: Newbie SolR - Need advice

Thanks guys

So SolR is actually a database replacement for MSSQL... am I right?


We have a lot of Perl scripts that contain lots of SQL insert queries, etc.


How do we query the SolR database from scripts? I know I have a lot to 
learn still, so excuse my ignorance.

Also... what is Mongo and how does it compare?

I just don't understand how in 10 years of web development I have never heard 
of SolR till last week.




Sent from Samsung Mobile


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4074782.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Newbie SolR - Need advice

2013-07-03 Thread David Quarterman
Hi Fabio,

Sandeep is right - it'll take time. SOLR isn't straightforward when you first 
start out but the tutorial is the best first step. You can then adapt the 
various config files in the tutorial to suit your situation. I'd recommend 
a simple approach to get the hang of it: just index one table, specifying 
some fields to be searched in the schema.xml.

There are some good books around too (Sandeep's recommendation on Lucidworks 
is a good one): Apache Solr 3.1 Cookbook by Rafal Kuc (still valid for 4.x.x), 
Jack Krupansky's Solr 4.x Deep Dive - Early Access Release, and Solr In Action 
by Trey Grainger & Tim Potter.

If you need help, shout! It's a great community.

Cheers, DQ

-Original Message-
From: fabio1605 [mailto:fabio.to...@btinternet.com] 
Sent: 03 July 2013 09:55
To: solr-user@lucene.apache.org
Subject: Re: Newbie SolR - Need advice

Hi Sandeep

Thank you for your reply 

I'll have a read through the tutorials now that I understand the principle of 
all this.

I would ideally like to keep MSSQL and bolt Solr on top of it, since we have a 
200GB database.

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Newbie-SolR-Need-advice-tp4074746p4075026.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR 4.0 frequent admin problem

2013-07-04 Thread David Quarterman
Hi,

About once a week the admin system comes up with SolrCore Initialization 
Failures. There's nothing in the logs and SOLR continues to work in the 
application it's supporting and in the 'direct access' mode (i.e. 
http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).

The cure is to restart Jetty (8.1.7) and then we can use the admin system again 
via PCs. However, a colleague can get into admin on an iPad with no trouble 
when no browser on a PC can!

Anyone any ideas? It's really frustrating!

Best regards,

DQ



RE: SOLR 4.0 frequent admin problem

2013-07-04 Thread David Quarterman
Cheers, Roman! It was a default Jetty set up so now added a 'work' directory 
and that's in use now.

-Original Message-
From: Roman Chyla [mailto:roman.ch...@gmail.com] 
Sent: 04 July 2013 15:00
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.0 frequent admin problem

Yes :-)  see SOLR-118, seems an old issue...


RE: Commit different database rows to solr with same "id" value?

2013-07-10 Thread David Quarterman
Hi Jason,

Assuming you're using DIH, why not build a new, unique id within the query to 
use as the 'doc_id' for SOLR? We do something like this in one of our 
collections. In MySQL, try this (I don't know what it would be for any other 
db but there must be equivalents):

select @rownum:=@rownum+1 rowid, t.* from (main select query) t, (select 
@rownum:=0) s
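
A variation on the same idea, sketched here in Python rather than SQL: instead 
of a row counter, prefix each primary key with its table name so that ids 
cannot collide across tables. This is not the MySQL technique above, just an 
alternative way to get a unique key. The dictionary is only a stand-in for 
Solr's behaviour of overwriting documents that share a uniqueKey, and the 
function and field names are made up for the example.

```python
def make_doc_id(table, primary_key):
    """Combine table name and primary key into one unique document id."""
    return f"{table}:{primary_key}"


# Stand-in for a Solr index: adding a doc with an existing id replaces it.
index = {}


def add_document(doc):
    index[doc["id"]] = doc


# Two rows from different tables that share primary key 3.
add_document({"id": make_doc_id("table1", 3), "source": "table1"})
add_document({"id": make_doc_id("table2", 3), "source": "table2"})

print(sorted(index))
```

Keeping the table name in its own field as well (here "source") makes it easy 
to filter by origin later.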

Regards,

DQ

-Original Message-
From: Jason Huang [mailto:jason.hu...@icare.com] 
Sent: 10 July 2013 15:50
To: solr-user@lucene.apache.org
Subject: Commit different database rows to solr with same "id" value?

Hello,

I am trying to use Solr to store fields from two different database tables, 
where the primary keys are in the format of "1, 2, 3, "

In Java, we build different POJO classes for these two database tables:

table1.java

@SolrIndex(name="id")
private String idTable1;


table2.java

@SolrIndex(name="id")
private String idTable2;



And later we add these fields defined in the two different types of tables and 
commit it to solrServer.


Here is the scenario where I am having issues:

(1) commit a row from table1 with primary key = "3", this generates a document 
in Solr

(2) commit another row from table2 with the same value of primary key = "3", 
this overwrites the document generated in step (1).


What we really want to achieve is to keep both rows in (1) and (2) because they 
are from different tables. I've read something from google search and it 
appears that we might be able to do it via keeping multiple cores in solr? 
Could anyone point at how to implement multiple core to achieve this?
To be more specific, when I commit the row as a document, I don't have a place 
to pick a certain core and I am not sure if it makes any sense for me to 
specify a core when I commit the document since the layer I am working on 
should abstract it away from me.



The second question is - if we don't want to do a multicore (since we can't 
easily search for related data between multiple cores), how can we resolve this 
issue so both rows from different database table which shares the same primary 
key still exist? We don't want to have to always change the primary key format 
to ensure a uniqueness of the primary key among all different types of database 
tables.


thanks!


Jason


RE: Facet sorting seems weird

2013-07-15 Thread David Quarterman
Hi Henrik,

Try setting up a copyField in your schema and give the copied field a type 
along the lines of 'text_ws' with LowerCaseFilterFactory in its analyzer 
chain. Then sort on the copied field.
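
The ordering effect can be reproduced outside Solr. This small Python sketch 
uses the two brand names from Henrik's message; it assumes that an index-order 
sort compares the raw term values, which for plain ASCII behaves like Python's 
default string sort.

```python
brands = ["bObles", "LEGO"]

# Default sort compares code points, so every uppercase letter comes before
# any lowercase letter.
case_sensitive = sorted(brands)

# Sorting on a lowercased key ignores case while keeping the original values.
case_insensitive = sorted(brands, key=str.lower)

print(case_sensitive)
print(case_insensitive)
```

A lowercased copy of the field plays the role of the sort key here.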

Regards,

DQ

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com] 
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we 
recently switched all of our search core from Sphinx to Solr with very great 
results. In general we've been very happy with the switch, and everything seems 
to work just as we want it to.

Today, however, we've run into a bit of an issue regarding faceted sort.

For example we have a field called "brand" in our core, defined as the text_en 
datatype from the example Solr core. This field is copied into facet_brand with 
the datatype string (since we don't really need to do much with it except show 
it for faceted navigation).

Now, given these two entries into the field on different documents, "LEGO" and 
"bObles", and given facet.sort=index, it appears that LEGO is sorted as being 
before bObles. I assume this is because of casing differences.

My question then is, how do we define a decent datatype in our schema, where 
the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff


Edismax odd results

2013-02-19 Thread David Quarterman
Hi all,

We have an index of boots which contains harness boots, engineer boots, ankle 
boots, etc. An edismax search on the index for 'harness boots' brings back 
2,175 boots with 'harness' results at the top. Searching 'engineer boots' 
brings back everything but 'engineer boots', same for 'ankle boots' - in fact, 
the same result set of 1,873 mostly boots but a few other products mixed in.

We're on SOLR 4.0 and the field we're querying is stemmed (snowball), 
lowercased on WhiteSpaceTokenizer. Any ideas?

Regards,

 

David Q



RE: Edismax odd results

2013-02-19 Thread David Quarterman
Hi Jack,

Here's a test query we've been using:

select?q=+engineer+boots&defType=edismax&fl=prodname&qf=prodnameplurals&pf2=prodnameplurals^2.0

This still produces a result set where the first 'engineer boot' is way down 
the list and subsequent ones are interspersed with other boots. They're all in 
there, just not at the top. Below is the debug on the first item that is an 
engineer boot.


0.23492618 = (MATCH) sum of:
  0.23492618 = (MATCH) product of:
    0.46985236 = (MATCH) sum of:
      0.46985236 = (MATCH) weight(prodnameplurals:boot in 48270) [DefaultSimilarity], result of:
        0.46985236 = score(doc=48270,freq=1.0 = termFreq=1.0), product of:
          0.22236869 = queryWeight, product of:
            4.8295836 = idf(docFreq=1867, maxDocs=86009)
            0.046043035 = queryNorm
          2.112943 = fieldWeight in 48270, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            4.8295836 = idf(docFreq=1867, maxDocs=86009)
            0.4375 = fieldNorm(doc=48270)
    0.5 = coord(1/2)
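
The numbers in this explain output can be checked by hand. The Python sketch 
below redoes the arithmetic using the values printed above; it assumes the 
classic Lucene TF-IDF formulas used by DefaultSimilarity (idf = 1 + 
ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)). The variable names are only 
for illustration.

```python
import math

# Values copied from the explain output above.
doc_freq, max_docs = 1867, 86009
query_norm = 0.046043035
field_norm = 0.4375
term_freq = 1.0
coord = 1 / 2  # only 1 of the 2 query clauses matched this document

idf = 1.0 + math.log(max_docs / (doc_freq + 1))
tf = math.sqrt(term_freq)

query_weight = idf * query_norm        # "queryWeight" line
field_weight = tf * idf * field_norm   # "fieldWeight" line
score = query_weight * field_weight * coord

print(idf, query_weight, field_weight, score)
```

The coord(1/2) factor is the telling part: only the 'boot' clause contributes, 
so the document gets no credit for 'engineer' at all.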


Regards,

DQ

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 19 February 2013 15:31
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

Show us your qf and pf params. Do you have PF2 set? That's the key for getting 
the phrase "engineer boots" boosted higher than just boots. You may also simply 
have to give a higher PF2 boost since "boots" probably has a much higher term 
frequency than "engineer" or even the natural Lucene score for "engineer boot".

Also check the &debugQuery=true "explain" scoring to see how engineer, boot, 
and "engineer boot" are being scored - you may have to add some specific query 
phrases to force "engineer boot" into the top results to compare the scoring.

-- Jack Krupansky




RE: Edismax odd results

2013-02-19 Thread David Quarterman
Hi Shawn,

I checked the admin analysis earlier. Stemming is taking 'engineer' down to 
'engin', but then I'd have thought that a search on 'engin boots' would work 
but it doesn't.

I'll try turning the wick back up on the logging - we set it to 'warning'.

Regards,

DQ

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 19 February 2013 16:25
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

I do not see the word engineer (or any other similar word) in the score 
calculation, only boots.  A test on my own index shows both words in the 
calculations.  I would use the analysis admin page on the prodnameplurals field 
to see what happens to the input of "engineer boots" on both index and query - 
see what part of your analysis chain removes it.

If you don't see any problem there, then the Solr log (assuming you haven't 
changed the default log level of INFO) should have a record of what parameters 
were actually received when the query was made.

Thanks,
Shawn





RE: Edismax odd results

2013-02-19 Thread David Quarterman
Hi Shawn/Jack,

The log shows the query going in okay, nothing gets stripped out so we're still 
at a loss to understand this. Could it be that Snowball stemming is too 
invasive?

Regards,

DQ




RE: Edismax odd results

2013-02-19 Thread David Quarterman
Hi,

This is definitely driving us mad now! Changed to PorterStemming and there's 
very little difference. 

If we add fq=engineer, we get 0 results. Add fq=engineer* and we get the 90 in 
the system. Try with fq=ankle* and we get 2. Correct. Try with fq=harness* and 
we get 0!

The stemming reduces 'engineer' to 'engin' so I'd have expected a lot more 
results.

Anyone got any ideas?

Regards,

DQ






Re: Edismax odd results

2013-02-19 Thread David Quarterman
Hi Shawn,

Now finished for the day but will post the schema tomorrow. Thanks for the help 
(and Jack too).

Regards,

DQ

P.S. did reindex after changing schema and the analyzer/query stuff matches 
precisely!!

Shawn Heisey  wrote:

Did you completely reindex when you changed your schema?  You must reindex.

Does the index analysis match the query analysis?  Some specific 
differences are allowed (and sometimes encouraged), but stemming must be 
done to both.  Can you share your schema?  Use a paste website like 
pastie.org for that.

Thanks,
Shawn



RE: Edismax odd results

2013-02-20 Thread David Quarterman
Hi Shawn,

Schema's at http://justpaste.it/davidqhog. It's the basic SOLR 4.0 with 
additions!

Regards,

DQ





RE: Edismax odd results

2013-02-20 Thread David Quarterman
Hi Erick,

Debug=all posted on http://justpaste.it/davidqhogdebug. Can't see anything 
obvious myself, but then I'm not an expert!

Regards,

DQ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 20 February 2013 02:02
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

When you get back to this tomorrow, also try to paste the parsed query bits 
you get back when you append &debug=all. Sometimes it's surprising what the 
parsed query _really_ looks like.

Best
Erick




RE: Edismax odd results

2013-02-20 Thread David Quarterman
Hi Erick,

I understand the wildcard issue -  that was more desperation on our part than 
logic!

TermsComponent showed 

222
197

so the term is in the index.

Using the explainOther, I can see that the relevance of documents with 
'engineer boots' in the name is low compared to the others and they appear 
randomly distributed through the resultset (I know it's not random). We've 
tried all sorts of things to boost them but to no avail. Trying 'logger boots' 
or 'harness boots' gives good results with the required terms at the top of the 
set.

I'm mystified.

Regards,

DQ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 20 February 2013 12:49
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

OK, first:
wildcarding and stemming don't get along well together. Since you've stemmed 
the field, enginee* would not match the stemmed term engin. This is actually 
pretty tricky to try to implement. For instance, how would enginee stem? So the 
fqs you posted are going to mislead you in that regard.
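
A minimal sketch of why that happens, assuming the wildcard pattern is compared against the indexed term without being stemmed itself. The helper below is purely illustrative and is not how Solr implements prefix queries:

```python
def prefix_matches(pattern, indexed_term):
    """Toy prefix match: compare the literal prefix against the stored term."""
    prefix = pattern.rstrip("*")
    return indexed_term.startswith(prefix)

# The stemmer stored 'engin' for 'engineer', so that is all the index knows.
indexed_term = "engin"

print(prefix_matches("enginee*", indexed_term))  # the longer prefix cannot match
print(prefix_matches("engin*", indexed_term))    # a prefix of the stem can
```

The pattern is longer than the stored stem, so there is nothing for it to match.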

If you want to examine the actual values in your index, consider using 
TermsComponent or Luke. Either will show you exactly what's being searched 
against.

I suspect that your fq entries (as typed) are going against the default field 
of "text" as defined in your schema, which doesn't stem, so that may well be 
leading you astray.

Finally, you may be getting bitten by scoring, field norms and all that. If you 
have a doc ID that you _know_ contains "engineers boots", try using debug with 
explainOther (http://wiki.apache.org/solr/CommonQueryParameters#explainOther), 
which might help you understand what's happening with the doc you care about.
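
As a rough sketch, the two requests suggested above could be built like this. The host, the /terms handler path, the field name "name" and the id value are assumptions for illustration; only the debug, explainOther, terms.fl and terms.prefix parameters come from Solr itself:

```python
from urllib.parse import urlencode

# Assumed values for illustration only; substitute your own host, field and id.
SOLR_BASE = "http://localhost:8983/solr"

def debug_url(query, explain_other=None):
    """Build a /select URL that returns the parsed query and scoring details."""
    params = [("q", query), ("defType", "edismax"), ("debug", "all")]
    if explain_other:
        # explainOther takes a query identifying the doc(s) you want explained.
        params.append(("explainOther", explain_other))
    return SOLR_BASE + "/select?" + urlencode(params)

def terms_url(field, prefix):
    """Build a TermsComponent URL listing indexed terms that start with prefix."""
    params = [("terms.fl", field), ("terms.prefix", prefix)]
    return SOLR_BASE + "/terms?" + urlencode(params)

print(debug_url("engineer boots", explain_other="doc_id:12345"))
print(terms_url("name", "engin"))
```

The terms request only works if a /terms handler is registered in solrconfig.xml, which may not be the case in every install.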

Best
Erick


On Wed, Feb 20, 2013 at 7:13 AM, David Quarterman wrote:

> Hi Erick,
>
> Debug=all posted on http://justpaste.it/davidqhogdebug. Can't see 
> anything obvious myself, but then I'm not an expert!
>
> Regards,
>
> DQ
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 20 February 2013 02:02
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax odd results
>
> When you get back to this tomorrow, also try and paste the parsed 
> query bits you get back when you append &debug=all. Sometimes it's 
> surprising what the parsed query _really_ looks like
>
> Best
> Erick
>
>
> On Tue, Feb 19, 2013 at 3:13 PM, David Quarterman 
> wrote:
>
> > Hi Shawn,
> >
> > Now finished for the day but will post the schema tomorrow. Thanks 
> > for the help (and Jack too).
> >
> > Regards,
> >
> > DQ
> >
> > P.S. did reindex after changing schema and the analyzer/query stuff 
> > matches precisely!!
> >
> > Shawn Heisey  wrote:
> >
> > On 2/19/2013 11:16 AM, David Quarterman wrote:
> > > This is definitely driving us mad now! Changed to PorterStemming 
> > > and
> > there's very little difference.
> > >
> > > If we add fq=engineer, we get 0 results. Add fq=engineer* and we 
> > > get the
> > 90 in the system. Try with fq=ankle* and we get 2. Correct. Try with
> > fq=harness* and we get 0!
> > >
> > > The stemming reduces 'engineer' to 'engin' so I'd have expected a 
> > > lot
> > more results.
> > >
> > > Anyone got any ideas?
> >
> > Did you completely reindex when you changed your schema?  You must
> reindex.
> >
> > Does the index analysis match the query analysis?  Some specific 
> > differences are allowed (and sometimes encouraged), but stemming 
> > must be done to both.  Can you share your schema?  Use a paste 
> > website like pastie.org for that.
> >
> > Thanks,
> > Shawn
> >
> >
>


RE: If we Open Source our platform, would it be interesting to you?

2013-02-21 Thread David Quarterman
Hi Marcelo,

Looked through your site and the framework looks very powerful as an 
aggregator. We do a lot of data aggregation from many different sources in many 
different formats (XML, JSON, text, CSV, etc) using RDBMS as the main 
repository for eventual SOLR indexing. A 'one-stop-shop' for all this would be 
very appealing.

Have you looked at products like Talend & Jitterbit? These offer transformation 
from almost anything to almost anything using graphical interfaces (Jitterbit 
is better) and a PHP-like coding format for trickier work. If you (or somebody) 
could add a graphical interface, the world would beat a path to your door!

Regards,

DQ

-Original Message-
From: Marcelo Elias Del Valle [mailto:marc...@s1mbi0se.com.br] 
Sent: 20 February 2013 18:18
To: solr-user@lucene.apache.org
Subject: If we Open Source our platform, would it be interesting to you?

Hello All,

I’m sending this email because I think it may be interesting for Solr users, as 
this project makes heavy use of the Solr platform.

We are strongly considering opening the source of our DMP (Data Management 
Platform), if it proves to be technically interesting to other developers / 
companies.

More details: http://www.s1mbi0se.com/s1mbi0se_DMP.html

All comments, questions and criticism are happening on HN:
http://news.ycombinator.com/item?id=5251780

Please feel free to send questions, comments and criticism... We will try to 
reply to them all.

Regards,
Marcelo


RE: Edismax odd results

2013-02-22 Thread David Quarterman
Hi Erick,

Funnily enough, I cracked it about 5 minutes before your email arrived! The 
problem was that we were using the WhitespaceTokenizer instead of the 
StandardTokenizer AND had the LowerCaseFilter after the PorterStemFilter. 
Getting them in the right order has solved all the problems and we get all our 
engineer boots and ankle boots at the top of the set!
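
For anyone hitting the same thing, here is a toy sketch of the two effects. It does not use Lucene: the regex tokenizer is only a rough stand-in for a standard tokenizer, toy_stem is a made-up suffix stripper (not the Porter algorithm), and the sample title is invented. It only shows why punctuation glued to a token and stemming before lowercasing both stop index and query terms from lining up:

```python
import re

def whitespace_tokens(text):
    # Split on whitespace only, so punctuation stays glued to the token.
    return text.split()

def word_tokens(text):
    # Rough stand-in for a standard tokenizer: keep runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)

def toy_stem(token):
    # Made-up suffix stripper that, like the Porter stemmer, expects lowercase.
    for suffix in ("eers", "eer", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text, tokenizer, lowercase_first):
    # Run each token through lowercase and stem in the chosen order.
    if lowercase_first:
        return [toy_stem(tok.lower()) for tok in tokenizer(text)]
    return [toy_stem(tok).lower() for tok in tokenizer(text)]

title = "ENGINEER boots: black"
bad = analyze(title, whitespace_tokens, lowercase_first=False)
good = analyze(title, word_tokens, lowercase_first=True)
query = analyze("engineer boots", word_tokens, lowercase_first=True)

print(bad)    # ['engineer', 'boots:', 'black']
print(good)   # ['engin', 'boot', 'black']
print(query)  # ['engin', 'boot']
```

With the second chain the query terms are the same strings as the indexed terms, which is what matching needs.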

Many thanks to all who took part.

Regards,

DQ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 22 February 2013 12:59
To: solr-user@lucene.apache.org
Subject: Re: Edismax odd results

OK, let's see the debug data for explainOther.

One thing, though. Your analysis chain is apt to be surprising. The fact that 
you have 222 terms with the ":" says that you're probably not getting what I'd 
guess you want. That ':' is part of your token and will not match 
"engineering". Consider changing some of your filters to remove stuff like 
that.

Best
Erick


RE: Building a central index with Lucene + Solr

2013-03-05 Thread David Quarterman
Hi Alvaro,

I agree with Otis & Alexandre (esp. Windows + PHP!). However, there are plenty 
of people using Solr & PHP out there very successfully. There's another good 
package at http://code.google.com/p/solr-php-client/ which is easy to implement 
and has some example usage.

Regards,

DQ

 

From: Álvaro Vargas Quezada [mailto:al...@outlook.com] 
Sent: 05 March 2013 14:53
To: solr-user@lucene.apache.org
Subject: Building a central index with Lucene + Solr

 

Hi everyone!

 

I'm trying to develop a central index. I installed Solr and reached the screen 
that I attach, but I don't know how to continue from this point. I want to 
develop an app in PHP which uses Solr, but I don't know how. Can anyone help 
me, maybe with a tutorial or something like that?

 

Thanks and greetz from Chile!

 



SOLR 4.0 Beta documents being duplicated

2012-10-05 Thread David Quarterman
Hi,

We've been using V4.x of SOLR since last November without too much
trouble. Our MySQL database is refreshed daily and a full import is run
automatically after the refresh and generally produces around 86,000
products, obviously on unique doc_id's.

 

So, we upgraded to 4.0 Beta a few days ago, with only mild difficulty,
reindexed and all was fine. Except after the next data refresh and
full-import, we had duplicate products appearing on different unique
doc_ids. Not all products are being duplicated, just random ones. We've
just deleted the data directory and reindexed and the product count has
dropped from 116,711 to 86,543. There'll be another refresh/import early
tomorrow morning and I fear we'll have more duplicates.

 

The call to the import now contains clean=true, commit=true and
optimize=true but it seems to make no difference.
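
For reference, a minimal sketch of the import call described above. The host, core path and the /dataimport handler name are assumptions (the handler path is whatever solrconfig.xml registers); the command, clean, commit and optimize parameters are the ones named in this message:

```python
from urllib.parse import urlencode

def full_import_url(core_url, handler="/dataimport"):
    """Build a DataImportHandler full-import URL with the flags from the post."""
    params = [
        ("command", "full-import"),
        ("clean", "true"),     # delete existing docs before importing
        ("commit", "true"),    # commit when the import finishes
        ("optimize", "true"),  # optimize the index afterwards
    ]
    return core_url + handler + "?" + urlencode(params)

# Hypothetical core URL, for illustration only.
print(full_import_url("http://localhost:8983/solr"))
```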

 

Anyone have any ideas?

 

Regards,

 

David Q

 



RE: SOLR 4.0 Beta documents being duplicated

2012-10-05 Thread David Quarterman
Thanks Erick.
We've added the '_version_' and we'll see if that makes a difference
tomorrow. Also, have downloaded the RC1 and will try that next week.

Regards,

David Q

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 05 October 2012 15:40
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.0 Beta documents being duplicated

How are you indexing? There was a problem with indexing from SolrJ if
you indexed documents in batches, server.add(doclist) that's fixed in
4.0 RC#. The work-around is to add docs singly, server.add(doc)

Second thing. Bad Things Happen if you don't have a _version_ field in
your schema.xml. Solr 4.0 RC# isn't happy on startup if this field is
missing...

Personally, I think you'd be better off using one of the release
candidates.
Robert cut one here:
http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0RC1-rev1391144/solr/

There will be an RC2 sometime, a couple of problems have been found, but
using RC1 should minimize any update to the official 4.0 plus have a lot
of improvements over BETA...

Best
Erick

On Fri, Oct 5, 2012 at 10:25 AM, David Quarterman wrote:
> Hi,
>
> We've been using V4.x of SOLR since last November without too much
> trouble. Our MySQL database is refreshed daily and a full import is
> run automatically after the refresh and generally produces around
> 86,000 products, obviously on unique doc_id's.
>
>
>
> So, we upgraded to 4.0 Beta a few days ago, with only mild difficulty,
> reindexed and all was fine. Except after the next data refresh and
> full-import, we had duplicate products appearing on different unique
> doc_ids. Not all products are being duplicated, just random ones.
> We've just deleted the data directory and reindexed and the product
> count has dropped from 116,711 to 86,543. There'll be another
> refresh/import early tomorrow morning and I fear we'll have more
> duplicates.
>
>
>
> The call to the import now contains clean=true, commit=true and 
> optimize=true but it seems to make no difference.
>
>
>
> Anyone have any ideas?
>
>
>
> Regards,
>
>
>
> David Q
>
>
>


RE: Feature & design question: use autocomplete to search on 2 different fields, and return 2 different data groups

2012-11-01 Thread David Quarterman
We had a similar requirement and found the best solution (unfortunately)
was to spend a small amount of money. Have a look at Sematext's site
(www.sematext.com). Their Autocomplete is awesome and we have a
fantastic looking AC now on our development site, grouped by category,
product & brand with product pictures to boot!

It's very, very quick in operation too.

Best,

DQ

-Original Message-
From: fernando.beck [mailto:fernando.b...@gmail.com] 
Sent: 01 November 2012 13:40
To: solr-user@lucene.apache.org
Subject: Feature & design question: use autocomplete to search on 2
different fields, and return 2 different data groups

Hello,

 

We're facing a new feature request and can't find the right way to
come up with a working solution.

 

Context: we have a list of businesses. For each business we have: name,
category, address, city.
 
One business may have 1 or more categories.

 

Example:

Name: Outback SteakHouse

Category: Restaurants , American

Address: xx

City: Rio de Janeiro

  

Name: Starbucks

Category: Bar, Coffee

Address: y

City: Rio de Janeiro

 

Name:  Pizza Hut

Category: Restaurant, Pizza

Address: 
 
City: New York

 

and so on.

 

What we need to do:  create an "autocomplete" feature; whenever someone
starts to type, we will need to search the term BOTH on CompanyName AND
Category.
 
Example:  I type pizz

 

and the result should be coming back in 2 groups.

Group 1: Categories  (displaying  Pizza)

Group 2:  all those businesses featuring pizza in their name, i.e. Pizza
Hut.
 
 

Right now we can not find a way to get this done.

 

Schema (since we're running a Portuguese-based application, there are 2
fieldTypes added for it):

[schema.xml markup was stripped by the archive; the only surviving value is LocalBusinessId]
 

Thanks,

 

F



--
View this message in context:
http://lucene.472066.n3.nabble.com/Feature-design-question-use-autocomple-te-to-search-on-2-different-fields-and-return-2-different-dats-tp4017528.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Feature & design question: use autocomplete to search on 2 different fields, and return 2 different data groups

2012-11-01 Thread David Quarterman
Fernando,

Pretty much the problem we came up against. We had a basic AC running
using SpellChecker a while ago but it was the grouping that floored us
and sent us elsewhere. Again, multiple queries seemed like the only
possible answer but in an AC scenario, even with SOLR's speed, probably
too slow under load.

Best,

DQ

-Original Message-
From: fernando.beck [mailto:fernando.b...@gmail.com] 
Sent: 01 November 2012 13:55
To: solr-user@lucene.apache.org
Subject: RE: Feature & design question: use autocomplete to search on 2
different fields, and return 2 different data groups

David,

I appreciate the suggestion. Our current autocomplete feature is
actually working pretty well.
No performance issues; functionally it is providing 100% of the results
as expected.
I checked sematext and also http://www.cominvent.com; they are great,
but our budget to go get them is 0.

At this time, and given the presented schema, my question would be: is
it even possible to get it done somehow, with 1 query, and "group" those
results while autocompleting on 2 different search fields?
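
To make the target output concrete, here is a small in-memory sketch of the grouping being asked for. It is not a Solr query and says nothing about how Solr would produce this in one request; the data is the three example businesses from the original post and the suggest function is hypothetical:

```python
BUSINESSES = [
    {"name": "Outback SteakHouse", "categories": ["Restaurants", "American"]},
    {"name": "Starbucks", "categories": ["Bar", "Coffee"]},
    {"name": "Pizza Hut", "categories": ["Restaurant", "Pizza"]},
]

def suggest(prefix, businesses=BUSINESSES):
    """Return prefix matches split into a category group and a business group."""
    p = prefix.lower()
    # Group 1: distinct categories starting with the typed text.
    categories = sorted(
        {c for b in businesses for c in b["categories"] if c.lower().startswith(p)}
    )
    # Group 2: businesses with any word in the name starting with the typed text.
    names = sorted(
        b["name"]
        for b in businesses
        if any(word.lower().startswith(p) for word in b["name"].split())
    )
    return {"categories": categories, "businesses": names}

print(suggest("pizz"))
```

Typing "pizz" should give Pizza under categories and Pizza Hut under businesses, which is the two-group shape described in the original post.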



--
View this message in context:
http://lucene.472066.n3.nabble.com/Feature-design-question-use-autocomple-te-to-search-on-2-different-fields-and-return-2-different-dats-tp4017528p4017534.html
Sent from the Solr - User mailing list archive at Nabble.com.