sharding Query

2011-02-21 Thread Isha Garg
Can anyone tell me whether a distributed (sharded) index supports Solr's
MoreLikeThis feature?


Faceting

2011-02-21 Thread Praveen Parameswaran
Hi,

Is it possible to have 100% accuracy for facet counts using Solr? Since
this is for a product price comparison site, I need the search to
return accurate results. For example, if I search for "sony lcd Tv" I do not want
"sony Led Tv" to be returned in the results. Please let me know if this is
possible, and how.


Thanks

Prav


Re: Faceting

2011-02-21 Thread Tommaso Teofili
Hi Praveen,
As far as I understand, you have to make the type of the field(s) you are
searching over conservative.
For example, leave out the stemmer and lowercase filters and use only
a whitespace tokenizer; moreover, you should search with the default
operator set to AND.
Faceting over those field(s) will then depend on those type settings.
You may find the following wiki page useful:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
My 2 cents,
Tommaso
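
A minimal sketch of what "conservative" matching means, written in plain Python rather than as Solr configuration. The function names and sample product strings are made up for illustration; the sketch only mimics whitespace tokenization without lowercasing or stemming, combined with an AND default operator.

```python
def conservative_tokens(text):
    # Whitespace-only tokenization: no lowercasing, no stemming.
    return text.split()


def matches_all(query, document):
    # Default operator AND: every query token must occur in the document.
    doc_tokens = set(conservative_tokens(document))
    return all(tok in doc_tokens for tok in conservative_tokens(query))


# Hypothetical product titles, for illustration only.
products = ["sony lcd Tv 40 inch", "sony Led Tv 40 inch"]
hits = [p for p in products if matches_all("sony lcd Tv", p)]
print(hits)
```

With this kind of analysis the "Led" product is not matched, but neither is a query typed as "sony LCD Tv", which is the recall cost of being this strict.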

2011/2/21 Praveen Parameswaran 

> Hi,
>
> Is it possible to have 100% accuracy for facet counts using solr ? Since
> this is for a product price comparison site I would need the search to
> return accurate results. for example if I search "sony lcd Tv" I do not
> want
> "sony Led Tv" to be returned int he results.  Please let me know if this is
> possible and how?
>
>
> Thanks
>
> Prav
>


Re: Faceting

2011-02-21 Thread Jan Høydahl
Hi,

Facet counts in Solr are 100% accurate. That's not the problem.

The problem lies in how you configure search:
- In what fields do you search?
  If you search the product description and it says "LED is much better than
  LCD", you get that product in your facet counts.
- What synonyms/stemming etc. are you applying?
  If someone searches for "apple" they may get "apples", which may not be intended.

All in all, you want a tradeoff between helping customers find stuff without the
exact right wording (recall) and keeping the query
precise (precision). This is not an exact science.

The best bet would perhaps be to build an intelligent query interface where you
detect "lcd" in the query as the feature "LCD" and add that as a structured
filter. This takes a lot of effort to build but will probably give the best
results at the end of the day.
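
A rough sketch of that idea in Python, assuming a hypothetical indexed field named display_type and a small hand-made feature dictionary (neither comes from this thread). It only builds the q and fq request parameters; sending them to Solr is left out.

```python
# Hypothetical feature vocabulary; a real one would come from the product catalog.
KNOWN_FEATURES = {"lcd": "LCD", "led": "LED", "plasma": "Plasma"}


def build_params(user_query, feature_field="display_type"):
    """Split a raw query into free text (q) and structured filters (fq)."""
    free_text = []
    filters = []
    for token in user_query.split():
        feature = KNOWN_FEATURES.get(token.lower())
        if feature is not None:
            # Recognized feature: move it out of the text query into a filter.
            filters.append('%s:"%s"' % (feature_field, feature))
        else:
            free_text.append(token)
    # Fall back to match-all when only features were typed.
    return {"q": " ".join(free_text) or "*:*", "fq": filters}


params = build_params("sony lcd Tv")
print(params)
```

Because the feature ends up in a filter on a structured field, a description that merely mentions "LCD" no longer pulls an LED product into the results or the facet counts.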

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 21. feb. 2011, at 12.39, Praveen Parameswaran wrote:

> Hi,
> 
> Is it possible to have 100% accuracy for facet counts using solr ? Since
> this is for a product price comparison site I would need the search to
> return accurate results. for example if I search "sony lcd Tv" I do not want
> "sony Led Tv" to be returned int he results.  Please let me know if this is
> possible and how?
> 
> 
> Thanks
> 
> Prav



Re: sharding Query

2011-02-21 Thread Otis Gospodnetic
Hello Isha,

I strongly suggest:

1) a Hello :)

2) searching before asking, e.g., 
http://search-lucene.com/?q=%22more+like+this%22+mlt+%2Bdistributed&fc_project=Solr&fc_type=jira


3) Thanks/signature/something-human :)

I think people will be more inclined to help you that way.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Isha Garg 
> To: solr-user@lucene.apache.org
> Sent: Mon, February 21, 2011 6:09:37 AM
> Subject: sharding Query
> 
> can anyone tell me whether Distributed index or sharding support morelikethis 
>  
>feature of solr.
> 


Re: change in field_type

2011-02-21 Thread Otis Gospodnetic
Hello,

When you change types you typically want to reindex everything.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Isha Garg 
> To: solr-user@lucene.apache.org
> Sent: Mon, February 21, 2011 12:21:10 AM
> Subject: change in field_type
> 
> Hii,
> 
>   I want to confirm that if a change the type of  a field from "sting" to 
>"text" in schema.xml then whether i have rebuild the  index or will it works 
>fine with previous index.
> 
> 
> 
> 


Re: Solr 4.0 trunk in production

2011-02-21 Thread Otis Gospodnetic
Hi Mark,

Check out the Wiki - 
http://search-lucene.com/?q=nightly+builds&fc_project=Solr&fc_type=wiki

Now, these are nightly builds, not necessarily stable snapshots :)  But after 
you test them you can call them stable snapshots for your purposes/app.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Mark 
> To: solr-user@lucene.apache.org
> Sent: Sun, February 20, 2011 3:10:36 PM
> Subject: Re: Solr 4.0 trunk in production
> 
> Thanks for the advice.
> 
> Where can I find stable snapshots, I only know of  checking out from head?
> 
> On 2/20/11 11:56 AM, Ryan McKinley wrote:
> >  Not crazy -- but be aware of a few *key* caveats.
> >
> > 1. Do good  testing on a stable snapshot.
> > 2. Don't get surprised if you have to  rebuild the index from scratch
> > to upgrade in the future.  The  official releases will upgrade smoothly
> > -- but within dev builds,  anything may happen.
> >
> >
> >
> > On Sat, Feb 19, 2011 at  9:50 AM, Mark   wrote:
> >> Would I be crazy even to consider putting this in production?  Thanks
> >>
> 


Re: SolrCloud new....

2011-02-21 Thread Otis Gospodnetic
Hi Stijn,

Yes, there is a link to log in or create an account at either the top or bottom of
every page.

Please do edit it directly if you see things that are incorrect or outdated.
If you share them in email, I'm 99.9% sure nobody will take the time to transfer
that to the Wiki.
If you are afraid to mess up something on that page, you could simply append to
the end of it and clearly label it as "I'm not sure, but this is what I did and
what worked for me as of YYYY-MM-DD".  That will help people who follow
instructions without messing up anything existing.

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Stijn Vanhoorelbeke 
> To: solr-user@lucene.apache.org
> Sent: Sun, February 20, 2011 2:09:58 PM
> Subject: Re: SolrCloud new
> 
> 
> Hi,
> 
> Can I edit the SolrCloud page?
> I never thought about it - but  since it's a wiki -- everyone can edit,
> right?
> 
> For the moment I'll not  write stuff onto it - but if you need some help, I
> can share you some of my (  little, but some ) experience.
> 
> 2011/2/20 Otis Gospodnetic-2 [via Lucene]  <
> ml-node+2538747-751169843-301...@n3.nabble.com>
> 
> >  Hi Stijn,
> >
> > Thank you for sharing this.
> > Would it at all  be possible for you to update the parts of SolrCloud page
> > that
> >  are incorrect and that you figured out or add anything new that's not on
> >  that
> > page yet?
> >
> > Thanks!
> >
> > Otis
> >  
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem  search :: http://search-lucene.com/
> >
> >
> >
> > - Original  Message 
> >
> > > From: Stijn Vanhoorelbeke <[hidden  
>email]>
> >
> >  > To: [hidden email]
> > > Sent:  Fri, February 18, 2011 6:42:10 AM
> > > Subject: Re: SolrCloud  new
> > >
> > >
> > > Hi,
> > >
> > > I'm  busy doing the exact same thing.
> > > I figured things out all by myself - the wiki page is a nice 'first
> > > view', but doesn't go in depth...
> > >
> > > Lets go  ahead:
> > > 1)Should i copy the libraries from cloud to   trunk???
> > > 2)should i keep the cloud module in every  system???
> > >
> > > A: Yes,  you should.
> > > You  should get yourself the latest dev trunk and compile  it.
> >  >
> > > The steps I followed:
> > > + grab latest trunk & build solr
> > > +  backup all solr config files
> > > + in  dir tomcat6/webapps/ remove the dir  'solr'
> > > + copy the new  solr.war ( which you build in first step ) to
> >   tomcat6/webapps
> > > + On your Solr_home/conf dir solrconfig.xml need to  be  replaced by a new
> > one
> > > ( you take from example dir  of your build) -- some  other config files (
> > like
> > >  schema.xml ) you may keep using the old ones.
> > > +  Adapt the new  files to represent the old configuration
> > > + restart tomcat and   it will install new version of solr
> > >
> > > It seems the index  isn't compatible -  so you need to flush your whole
> > index
> >  > and re-index all data.
> > > And finally  you have your solr  system back with zookeeper integrated in
> > > /admin zone   :)
> > >
> > >
> > > 3) I am not using any cores in the  solr. It is a single solr in  every
> > > system.can solrcloud  support it??
> > >
> > > A: Actually you are using one core - so that is no problem.
> > > But be sure to check you have solr.xml file  in  your solr_home dir.
> > > This file just mentions all cores - in  your case just one  core;
> > > ( you can find examples of layout of  this file easily on
> > > http://wiki.apache.org/solr/CoreAdmin )
> > >
> >  > 4) the example is given in  jetty.Is it the same way to make it  in
> > tomcat???
> > >
> > > A: Right now - it is the   same way.
> > > You have to edit your /etc/init.d/tomcat6 startup script.  In the  start)
> > > section you can specify all the JAVA_OPTS ( the  ones the solrcloud  wiki
> > > mentions).
> > >
> > >  Be sure to set following one:
> > > export  JAVA_OPTS="$JAVA_OPTS  -DhostPort=8080" ( if tomcat runs on port
> >  8080
> > >  )
> > >
> > > At first I didn't -->  my zookeeper pointed  to standard  8983 port, which
> >
> > > gave errors.
> >  >
> > >
> > > In the above I gave you a quick peek at how to get the SolrCloud feature.
> > > In above the Zookeeper is embedded in  one  of your solr machines. If you
> > > don't want this you may  place zookeeper on a  different machine ( like
> > I'm
> > >  doing right now).
> > >
> > > If you need more help -  you  can contact me.
> > > Stijn Vanhoorelbeke,
> > >
> >  >
> > > --
> > > View this message  in context:
> >  
>>http://lucene.472066.n3.nabble.com/SolrCloud-new-tp1528872p2526080.html
>
> >  > Sent  from the Solr - User mailing list archive at  Nabble.com.
> > >
> >
> >
> >  --

Re: Synonyms.txt

2011-02-21 Thread Otis Gospodnetic
Hi Marc,

Check hit#3: http://search-lucene.com/?q=free+synonyms&fc_project=Solr

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Marc Kalberer 
> To: solr-user@lucene.apache.org
> Sent: Sun, February 20, 2011 1:09:41 PM
> Subject: Re: Synonyms.txt
> 
> Thanks Lee !
> Your are right,  I'm new to solr, and I didn't think about  that point. 
> But in my case, target is newspaper articles, so it's more  generic
> contents that specialized contents and I suppose that a generic  synonym
> dictionnary should fit my needs. Thanks for the link  !
> Marc
> 
> Programmers.ch
> Solutions libres et Opensources
> Tel: ++41  76 44 888 72
> Site: http://www.programmers.ch
> 
> 
> Le 20. 02. 11 17:27, lee carroll a  écrit :
> > Hi Marc,
> > I don't want to sound to prissy and also assume  to much about your
> > application but a generic synonym file could do more  harm than good. Lots 
of
> > applications have specific vocabularies and a  specific synonym list is what
> > is needed. Remember synonyms increase  recall but reduce precision. The
> > better matched your synonym list is to  your users and their searches the
> > better this ratio between recall and  precision will be.
> >
> > Without knowing your app or motivation I'd  say don't go for a generic list
> > but maybe its right for your  circumstances.
> >
> > see this thread here
> >
> > 
>http://lucene.472066.n3.nabble.com/French-synonyms-amp-Online-synonyms-td488829.html
>
> >
> >  On 20 February 2011 15:58, Marc Kalberer   wrote:
> >
> >> Hello,
> >> Is there any free Synonyms.txt  available on internet ?  Wasn't able to 
find
> >> any.   Specially interested by the french version.
> >> ++
> >>  Marc
> >> --
> >> *Programmers.ch*
> >> Développement  WEB
> >> Solutions libres et Opensources
> >> Tel: ++41 76 44 888  72
> >> Site: http://www.programmers.ch
> >>
>


how to update Solr using Patch files

2011-02-21 Thread Isha Garg

Hi,

I am currently using Solr 1.4. I want to update it using patch files.
Can anyone tell me how to update it using these files?


Re: how to update Solr using Patch files

2011-02-21 Thread Jan Høydahl
Hi,

I recommend browsing through the wiki at http://wiki.apache.org/solr/
You will learn a lot about Solr, much faster than by asking every question on the
list.

http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 21. feb. 2011, at 13.45, Isha Garg wrote:

> hii,
> 
> I am currently using solr1.4 . I want to  update it  using patch files .Can 
> anyone tell me hoow to update it using these files .



Problem with XML encode UTF-8

2011-02-21 Thread jayronsoares

Hi, I'm using solr py to store PDF files; however, when I run the script
it shows me this error:

 An invalid XML character (Unicode: 0xc) was found in the element content of
the document.

Could someone give me some help?

cheers
jayron
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Any-new-python-libraries-tp493419p2545020.html
Sent from the Solr - User mailing list archive at Nabble.com.
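
One common way to avoid this error is to remove characters that XML 1.0 does not allow before the text is sent to Solr. Character 0xC is a form feed, which text extracted from PDFs often contains at page breaks. The sketch below is a guess at a workable fix, not something confirmed in this thread; the function name is made up, and the character ranges follow the Char production of the XML 1.0 specification.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text, replacement=" "):
    """Replace characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub(replacement, text)


# A form feed (0x0C) between two extracted pages becomes a plain space.
cleaned = strip_invalid_xml_chars("page one\x0cpage two")
print(repr(cleaned))
```

The cleaning would be applied to each text field value just before the document is handed to the Solr client.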


Re: change in field_type

2011-02-21 Thread François Schiettecatte
Hello

What about adding or deleting fields? I have been reindexing after doing that 
but is it needed?

François

On Feb 21, 2011, at 7:16 AM, Otis Gospodnetic wrote:

> Hello,
> 
> When you change types you typically want to reindex everything.
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
>> From: Isha Garg 
>> To: solr-user@lucene.apache.org
>> Sent: Mon, February 21, 2011 12:21:10 AM
>> Subject: change in field_type
>> 
>> Hii,
>> 
>>  I want to confirm that if a change the type of  a field from "sting" to 
>> "text" in schema.xml then whether i have rebuild the  index or will it works 
>> fine with previous index.
>> 
>> 
>> 
>> 



Re: Query performance very slow even after autowarming

2011-02-21 Thread johnnyisrael

No Wease,

We got the performance improvement after doing the following stuff

--> Reduced the merge factor from 10 to 3.
--> Auto-warming queries as I mentioned in my initial thread.

Thanks,

Johnny 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-performance-very-slow-even-after-autowarming-tp2010384p2545451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: change in field_type

2011-02-21 Thread Otis Gospodnetic
Hi,

Adding and deleting fields is not something you do regularly in production, so
I assume you are in the development phase, in which case I'd suggest just reindexing.
I'm not sure whether you get an error if you, say, request fl=MyOldFieldName
and the query returns documents without that field.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: François Schiettecatte 
> To: solr-user@lucene.apache.org
> Sent: Mon, February 21, 2011 9:14:06 AM
> Subject: Re: change in field_type
> 
> Hello
> 
> What about adding or deleting fields? I have been reindexing after  doing 
> that 
>but is it needed?
> 
> François
> 
> On Feb 21, 2011, at 7:16 AM,  Otis Gospodnetic wrote:
> 
> > Hello,
> > 
> > When you change  types you typically want to reindex everything.
> > 
> > Otis
> >  
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem  search :: http://search-lucene.com/
> > 
> > 
> > 
> > - Original  Message 
> >> From: Isha Garg 
> >> To: solr-user@lucene.apache.org
> >>  Sent: Mon, February 21, 2011 12:21:10 AM
> >> Subject: change in  field_type
> >> 
> >> Hii,
> >> 
> >>   I want to confirm that if a change the type of  a field from "sting" 
> >>  
>to 
>
> >> "text" in schema.xml then whether i have rebuild the  index  or will it 
>works 
>
> >> fine with previous index.
> >> 
> >> 
> >> 
> >> 
> 
>


Datetime problems with dataimport

2011-02-21 Thread MOuli

Hey guys.

I want to evaluate Solr as search engine, but now I have got an "Invalid
Date String"-Exception.

Here is the Error Message:

WARNUNG: Error creating document :
SolrInputDocument[{machineId=machineId(1.0)={1151665},
priceBrutto=priceBrutto(1.0)={56525.0},
priceNetto=priceNetto(1.0)={47500.0}, city=city(1.0)={Stiens},
zipcode=zipcode(1.0)={9051}, manufacturerName=manufacturerName(1.0)={},
manufacturerId=manufacturerId(1.0)={163}, model=model(1.0)={BB950R},
offerStatus=offerStatus(1.0)={commercial}, typeId=typeId(1.0)={168},
countryIsocode=countryIsocode(1.0)={NL},
yearConstruction=yearConstruction(1.0)={2003},
typeName=typeName(1.0)={Pressen}, lon=lon(1.0)={0.10050304},
lat=lat(1.0)={0.92957632}, de_b=de_b(1.0)={0}, nl_b=nl_b(1.0)={1},
uk_b=uk_b(1.0)={0}, pl_b=pl_b(1.0)={0}, fr_b=fr_b(1.0)={0},
hu_b=hu_b(1.0)={0}, 426_t=426_t(1.0)={Pers heeft +/- 57000 pakken gemaakt.},
attributes=attributes(1.0)={[Weitere Beschreibung des Angebots Pers heeft
+/- 57000 pakken gemaakt., Erstzulassung / Inbetriebnahme [B@1266f55,
Zentralschmierung, Weitwinkel-Gelenkwelle, Tandemachse, Schneidwerk, Knoter
Reinigungsgebläse, elektronische Überwachung, Ballenauswerfer, Bauart
Quaderballenpresse]}, 481_dt=481_dt(1.0)={[B@1266f55},
178_b=178_b(1.0)={true}, 175_b=175_b(1.0)={true}, 164_b=164_b(1.0)={true},
146_b=146_b(1.0)={true}, 112_b=112_b(1.0)={true}, 74_b=74_b(1.0)={true},
46_b=46_b(1.0)={true}, 188_i=188_i(1.0)={40}}]
org.apache.solr.common.SolrException: Invalid Date String:'[B@1266f55'
at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)


and the dataimport.xml
---
[dataimport.xml markup was not preserved by the list archive]
---

I want all attributes to go into the multivalued text field "attributes",
and for faceting each should get its own dynamic field. Does anyone have an
idea how to fix this problem?

Best Regards
Alexander
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Datetime-problems-with-dataimpor

Unicode case folding

2011-02-21 Thread Avi Rosenschein
Is there any analyzer that can do full Unicode case folding (for example, as
described at
http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
)?

Specifically, in a German index, I would like the sharp s character (ß) to
be normalized into ss, which isn't done by any of the Unicode Normal Forms,
but only by case folding.

If there isn't an analyzer for this - any suggestions on how to roll my own?
Should I simply apply String.toUpperCase() followed by .toLowerCase()?

Thanks,
-- Avi


Re: Unicode case folding

2011-02-21 Thread Robert Muir
On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein
 wrote:
> Is there any analyzer that can do full Unicode case folding (for example, as
> described at
> http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> )?

Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do
this (normalization mode NFKC_CF)

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java

You can simply use this instead of LowerCaseFilter (just set up your
solr/lib with the solr-analysis-extras.jar, icu jar, and lucene's
contrib-icu jar).

> If there isn't an analyzer for this - any suggestions on how to roll my own?
> Should I simply apply String.toUpperCase() followed by .toLowerCase()?

No, I would recommend using the actual full case folding (with
normalization) instead. This is not the same as uppercase + lowercase.
For example, it will correctly handle the 3 forms of Greek sigma.
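
To illustrate what full case folding does, independent of Solr, here is a small Python sketch using the built-in str.casefold(). It is not the ICU filter mentioned above and it performs no NFKC normalization; it only shows the folding step on the sharp s and on the sigma forms.

```python
def fold(text):
    # str.casefold() applies Unicode full case folding.
    return text.casefold()


# Sharp s folds to "ss", so both spellings end up as the same term.
print(fold("Straße"), fold("STRASSE"))

# Plain lowercasing leaves the sharp s untouched.
print("Straße".lower())

# Capital, small and final sigma all fold to the same character.
print(fold("Σ") == fold("σ") == fold("ς"))
```

In an index, the same folding has to be applied at both index time and query time for the terms to line up.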


Re: Datetime problems with dataimport

2011-02-21 Thread Bill Bell
[B@1266f55

It looks like you are using MySQL.

The date field needs to be in yyyy-MM-dd'T'hh:mm:ss format.

I would probably convert the datetime in MySQL to varchar() in this format.
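
For context, "[B@1266f55" is what Java prints for a byte array that has no readable string form, which suggests the value reached Solr as raw bytes instead of a date string. Solr date fields expect UTC timestamps such as 1995-12-31T23:59:59Z. The sketch below shows the conversion in Python purely as an illustration; the function name is made up, and it assumes the stored values are already in UTC and that MySQL's zero date should be treated as missing.

```python
from datetime import datetime


def mysql_to_solr_date(value):
    """Turn a MySQL DATETIME string into the form Solr date fields accept."""
    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        # Raw bytes from the driver: decode them first.
        value = value.decode("ascii")
    if value.startswith("0000-00-00"):
        # MySQL "zero date": there is no valid Solr equivalent.
        return None
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    # Assumption: the stored value is already UTC, so only the format changes.
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(mysql_to_solr_date("2003-05-17 08:30:00"))
```

The same reshaping can be done in the SQL query itself, as suggested above, so that the import receives a plain string.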



On 2/21/11 8:40 AM, "MOuli"  wrote:

>
>Hey guys.
>
>I want to evaluate Solr as search engine, but now I have got an "Invalid
>Date String"-Exception.
>
>Here is the Error Message:
>
>WARNUNG: Error creating document :
>SolrInputDocument[{machineId=machineId(1.0)={1151665},
>priceBrutto=priceBrutto(1.0)={56525.0},
>priceNetto=priceNetto(1.0)={47500.0}, city=city(1.0)={Stiens},
>zipcode=zipcode(1.0)={9051}, manufacturerName=manufacturerName(1.0)={},
>manufacturerId=manufacturerId(1.0)={163}, model=model(1.0)={BB950R},
>offerStatus=offerStatus(1.0)={commercial}, typeId=typeId(1.0)={168},
>countryIsocode=countryIsocode(1.0)={NL},
>yearConstruction=yearConstruction(1.0)={2003},
>typeName=typeName(1.0)={Pressen}, lon=lon(1.0)={0.10050304},
>lat=lat(1.0)={0.92957632}, de_b=de_b(1.0)={0}, nl_b=nl_b(1.0)={1},
>uk_b=uk_b(1.0)={0}, pl_b=pl_b(1.0)={0}, fr_b=fr_b(1.0)={0},
>hu_b=hu_b(1.0)={0}, 426_t=426_t(1.0)={Pers heeft +/- 57000 pakken
>gemaakt.},
>attributes=attributes(1.0)={[Weitere Beschreibung des Angebots Pers heeft
>+/- 57000 pakken gemaakt., Erstzulassung / Inbetriebnahme [B@1266f55,
>Zentralschmierung, Weitwinkel-Gelenkwelle, Tandemachse, Schneidwerk,
>Knoter
>Reinigungsgebläse, elektronische Überwachung, Ballenauswerfer, Bauart
>Quaderballenpresse]}, 481_dt=481_dt(1.0)={[B@1266f55},
>178_b=178_b(1.0)={true}, 175_b=175_b(1.0)={true}, 164_b=164_b(1.0)={true},
>146_b=146_b(1.0)={true}, 112_b=112_b(1.0)={true}, 74_b=74_b(1.0)={true},
>46_b=46_b(1.0)={true}, 188_i=188_i(1.0)={40}}]
>org.apache.solr.common.SolrException: Invalid Date String:'[B@1266f55'
>at org.apache.solr.schema.DateField.parseMath(DateField.java:163)
>at
>org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:171)
>at
>org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
>at
>org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246
>)
>at
>org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdatePr
>ocessorFactory.java:60)
>at
>org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
>at
>org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHa
>ndler.java:292)
>at
>org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.jav
>a:392)
>at
>org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:2
>42)
>at
>org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>at
>org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.
>java:331)
>at
>org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:3
>89)
>at
>org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:37
>0)
>
>
>and the dataimport.xml
>---
>
>
>function attributesFunction(row) {
>
>if(row.get('att_unit') != null &&
>!row.get('att_unit').equals('null') &&
>!row.get('att_unit').equals('Null')) {
>row.put('attributes', row.get('att_name') + ' ' +
>row.get('atr_value') +
>row.get('att_unit'));
>} else {
>row.put('attributes', row.get('att_name') + ' ' +
>row.get('atr_value'));
>}
>
>if (row.get('att_type').equals('int')) {
>row.put(row.get('attribute_id') + '_i',
>row.get('atr_value'));
>} else if (row.get('att_type').equals('float')) {
>row.put(row.get('attribute_id') + '_f',
>row.get('atr_value'));
>} else if (row.get('att_type').equals('string')) {
>row.put(row.get('attribute_id') + '_s',
>row.get('atr_value'));
>} else if (row.get('att_type').equals('text')) {
>row.put(row.get('attribute_id') + '_t',
>row.get('atr_value'));
>} else if (row.get('att_type').equals('datetime')) {
>row.put(row.get('attribute_id') + '_dt',
>row.get('atr_value'));
>}
>
>row.remove('attribute_id');
>row.remove('att_name');
>row.remove('att_unit');
>row.remove('atr_value');
>row.remove('att_type');
>return row;
>}
>]]>
>
>
>
>url="jdbc:mysql://localhost/mydb" user="root" password=""/>
>
>
>
>
>
>
>transformer="script:attributesFunction"
>query="SELECT att.attribute_id, thl.thl_translation as att_name,
>att.att_unit, if(av.atr_value = '0000-00-00 00:00:00', null ,
>av.atr_value)
>as atr_value, att.att_type  FROM attribute_values_datetime av  join
>attributes att on att.attribute_id = av.attribute_id join texts t on
>t.text_key=att.at

Re: Unicode case folding

2011-02-21 Thread Avi Rosenschein
Excellent. Thanks, Robert!

-- Avi

On Mon, Feb 21, 2011 at 19:24, Robert Muir  wrote:

> On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein
>  wrote:
> > Is there any analyzer that can do full Unicode case folding (for example,
> as
> > described at
> >
> http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> > )?
>
> Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do
> this (normalization mode NFKC_CF)
>
>
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java
>
> You can simply use this instead of LowerCaseFilter (just setup your
> solr/lib with the solr-analysis-extras.jar, icu jar, and lucene's
> contrib-icu jar).
>
> > If there isn't an analyzer for this - any suggestions on how to roll my
> own?
> > Should I simply apply String.toUpperCase() followed by .toLowerCase()?
>
> No, I would recommend using the actual full case folding (with
> normalization) instead. This is not the same as uppercase + lowercase.
> For example, it will correctly handle the 3 forms of greek sigma.
>


Question regarding indexing multiple languages, stopwords, etc.

2011-02-21 Thread Greg Georges
Hello all,

I have gotten my DataImportHandler to index my data from my MySQL database. I
was looking at the schema tool and noticed that stopwords in different
languages are being indexed as terms. The 6 languages we have are English,
French, Spanish, Chinese, German and Italian.

Right now I am using the basic schema configuration for English. How do I
define them for other languages? I have looked at the wiki page
(http://wiki.apache.org/solr/LanguageAnalysis) but I would like to have an
example configuration for all the languages I need. I also need a list of
stopwords for these languages. So far I have this:


  







  

Thanks in advance

Greg


Solr 4.0 DIH

2011-02-21 Thread Mark
I downloaded Solr 4.0 from trunk today and tried using a custom
Evaluator during my full/delta-importing.


Within the evaluate method, though, the Context is always null. When
using this same class with Solr 1.4.1 the context always exists. Is this
a bug or is this behavior expected?


Thanks


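import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Evaluator;

public class MyEvaluator extends Evaluator {
  @Override
  public String evaluate(String argument, Context context) {
    // Argument is present; however, context is always null!
    return argument; // placeholder so the snippet compiles; real logic omitted
  }
}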


FieldCollapsing

2011-02-21 Thread Mark
Checked out a version of 4.0 to test field collapsing. When I field 
collapse the numFound always returns the number of documents BEFORE 
collapsing. Is there a way to get the total number of documents after 
collapsing?


Thanks


Any plan to make Field Collapsing available for distributed search?

2011-02-21 Thread Andy
Hello,

I'm looking into Field Collapsing. According to the documentation one 
limitation is that "distributed search support for result grouping has not yet 
been implemented."

Just wondering if there's any plan to add distributed search support to field
collapsing. Or is there any technical obstacle that makes such a feature
unlikely?

Thanks

Andy


  


Re: Question regarding indexing multiple languages, stopwords, etc.

2011-02-21 Thread Otis Gospodnetic
Greg,

You need to get stopword lists for your 6 languages.  Then you need to create
new field types just like that 'text' type, one for each language.  Point them
to the appropriate stopwords files and instead of "English" specify each one of
your languages.  You can either index each language in its own index or put
them all in the same index, in which case you'll want fields like title_en,
title_fr, etc.
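
For the single-index option, the indexing code has to route each document's text into the field for its language. A minimal sketch in Python, assuming two-letter language codes as suffixes (title_en and title_fr come from the message above; the other codes and the function names are illustrative guesses):

```python
# Assumed suffixes for the six languages mentioned in the question.
SUPPORTED_LANGUAGES = ("en", "fr", "es", "zh", "de", "it")


def localized_field(base, lang):
    """Return the per-language field name, e.g. title + fr -> title_fr."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError("no field type configured for language %r" % lang)
    return "%s_%s" % (base, lang)


def build_doc(doc_id, lang, title):
    # Each document fills only the title field that matches its language.
    return {"id": doc_id, localized_field("title", lang): title}


doc = build_doc("42", "fr", "Solutions libres")
print(doc)
```

At query time the same helper can pick the field to search once the user's language is known.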

Check http://search-lucene.com/ - this multilingual stuff is a common topic.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
> From: Greg Georges 
> To: "solr-user@lucene.apache.org" 
> Sent: Mon, February 21, 2011 4:27:46 PM
> Subject: Question regarding indexing multiple languages, stopwords, etc.
> 
> Hello all,
> 
> I have gotten my DataImporthandler to index my data from my  MySQL database. 
> I 
>was looking at the schema tool and noticing that stopwords in  different 
>languages are being indexed as terms. The 6 languages we have are  English, 
>French, Spanish, Chinese, German and Italian.
> 
> Right now I am  using the basic schema configuration for English. How do I 
>define them for  others languages? I have looked at the wiki page 
>(http://wiki.apache.org/solr/LanguageAnalysis) but I would like to have an  
>example configuration for all the languages I need. Also I need a list of  
>stopwords for these languages.  So far I have this
> 
> 
>
>  
>  
> 
>   ignoreCase="true"
>  words="stopwords.txt"
>  enablePositionIncrements="true"
>  />
>  generateWordParts="1"  
>generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="  
>splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
>
> 
> Thanks in advance
> 
> Greg
> 


Re: Faceting

2011-02-21 Thread Praveen Parameswaran
Hi,
@Tommaso @Jan Høydahl Thanks for the response :)

I've done it almost as Tommaso suggested, and yes, it's about
70-80% accurate.
I understand the tension in search: customers should find stuff without
the exact right wording (recall), while at the same time you want the query to
be precise (precision).

In my scenario both cases apply as well, but mostly a customer would
know which product name he is searching for and would be interested in
comparing the prices that different merchants offer. What I feel is that
maybe the "Search" itself has to be classified based on the context.

Will it be possible in Solr to have both of the following?
1. A customer uses the correct product name to search and gets accurate
results.
2. A customer uses a keyword, or searches without the exact name, and gets the most
relevant results.

The 2nd part is fine, as it's working well. The 1st part is where I'm struggling.

thanks
Praveen

On Mon, Feb 21, 2011 at 5:23 PM, Tommaso Teofili
wrote:

> Hi Praveen,
> as far as I understand you have to set the type of the field(s) you are
> searching over to be conservative.
> So for example you won't include stemmer and lowercase filters and use only
> a whitespace tokenizer, more over you should search with the default
> operator set to AND.
> Then faceting over those field(s) will depend on those type settings.
> You may find the following wiki page useful:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> My 2 cents,
>
>
> 2011/2/21 Praveen Parameswaran 
>
> > Hi,
> >
> > Is it possible to have 100% accuracy for facet counts using solr ? Since
> > this is for a product price comparison site I would need the search to
> > return accurate results. for example if I search "sony lcd Tv" I do not
> > want
> > "sony Led Tv" to be returned int he results.  Please let me know if this
> is
> > possible and how?
> >
> >
> > Thanks
> >
> > Prav
> >
>


AlternateDistributedMLT.patch not working

2011-02-21 Thread Isha Garg

Hello,

 I tried to use SOLR-788 with Solr 1.4 so that distributed MLT works
well. While working with this patch I got an error message like


1 out of 1 hunk FAILED -- saving rejects to file 
src/java/org/apache/solr/handler/component/MoreLikeThisComponent.java.rej


Can anybody help me out?

Thanks!
Isha Garg