RE: DIH delta-import question

2010-10-19 Thread Ephraim Ofir
According to the DIH wiki, delta-import is only supported by SQL data sources
(http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command-1)


Ephraim Ofir

-Original Message-
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] 
Sent: Friday, October 15, 2010 8:20 AM
To: solr-user@lucene.apache.org
Subject: DIH delta-import question

Dear list,

I'm trying to delta-import with datasource FileDataSource and
processor FileListEntityProcessor. I want to load only files
which are newer than dataimport.properties -> last_index_time.
It looks like newerThan="${dataimport.last_index_time}" has
no effect.
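For reference, a configuration of the kind being described might look like this (paths, entity names, and XPaths are illustrative, not Bernd's actual config):

```xml
<entity name="files" processor="FileListEntityProcessor"
        baseDir="/path/to/xml" fileName=".*\.xml" rootEntity="false"
        newerThan="${dataimport.last_index_time}">
  <!-- the nested entity parses each file picked up above -->
  <entity name="doc" processor="XPathEntityProcessor"
          url="${files.fileAbsolutePath}" forEach="/record" stream="true">
    <field column="title" xpath="/record/title"/>
  </entity>
</entity>
```

The question is whether newerThan filters the file list produced by the outer entity, or is only passed down to the nested entity.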

Could it be that newerThan is configured on the FileListEntityProcessor
but applied to the next entity processor in the chain rather than to
the FileListEntityProcessor itself?

In my case that next processor is the XPathEntityProcessor, which doesn't
support newerThan.
The version is Solr 4.0 from trunk.

Regards,
Bernd


RE: DIH - configure password in 1 place and store it in encrypted form?

2010-10-19 Thread Ephraim Ofir
You could include a common file with the JdbcDataSource
(http://wiki.apache.org/solr/SolrConfigXml#XInclude) or add the password
as a property in solr.xml in the container scope
(http://wiki.apache.org/solr/CoreAdmin#Configuration) so it will be
available to all cores.
Personally, I use a single configuration for all cores with soft-linked
config files, so I only have to change the config in one place.
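As a sketch of the solr.xml option (the property name and value are made up), a container-scope property can be declared once and referenced from every core:

```xml
<!-- solr.xml: property declared at container scope -->
<solr persistent="true">
  <property name="db.password" value="secret"/>
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>
```

Each core's data-config.xml could then use password="${db.password}" in its dataSource element.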

Ephraim Ofir


-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Sunday, October 17, 2010 7:05 PM
To: solr-user@lucene.apache.org
Subject: Re: DIH - configure password in 1 place and store it in
encrypted form?

On Sun, Oct 17, 2010 at 7:02 PM, Arunkumar Ayyavu wrote:
> Hi!
>
> I have multiple cores reading from the same database and I've provided
> the user credentials in all data-config.xml files. Is there a way to
> tell JdbcDataSource in data-config.xml to read the username and
> password from a file? This would help me not to change the
> username/password in multiple data-config.xml files.
>
> And is it possible to store the password in encrypted form and let DIH
> call a decrypter to read the password?
[...]

As far as I am aware, it is not possible to do either of the two
options above. However, one could extend the JdbcDataSource
class to add such functionality.
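A sketch of such an extension (MyCrypto is a hypothetical helper you would supply yourself; only the init override is Solr DIH API):

```java
// Sketch only: decrypt the DIH password before the JDBC connection is made.
// MyCrypto stands in for whatever decryption utility you actually use.
public class DecryptingJdbcDataSource extends JdbcDataSource {
    @Override
    public void init(Context context, Properties initProps) {
        String encrypted = initProps.getProperty("password");
        if (encrypted != null) {
            // Replace the encrypted value before the connection is created.
            initProps.setProperty("password", MyCrypto.decrypt(encrypted));
        }
        super.init(context, initProps);
    }
}
```

The class would then be referenced from data-config.xml via the dataSource type attribute.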

Regards,
Gora


Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-19 Thread Markus Jelsma
Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be 
upgraded to Tika 0.8 (when it's released or just the current trunk). Also, the 
Boilerpipe API needs to be exposed through Nutch configuration: which extractor 
can be used, which parameters need to be set, etc.

Upgrading to Tika's trunk might be relatively easy but exposing Boilerpipe 
surely isn't.

On Tuesday, October 19, 2010 06:47:43 am Otis Gospodnetic wrote:
> Hi Israel,
> 
> You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika
> Not sure if it's built into Nutch, though...
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
> 
> > From: Israel Ekpo 
> > To: solr-user@lucene.apache.org; u...@nutch.apache.org
> > Sent: Mon, October 18, 2010 9:01:50 PM
> > Subject: Removing Common Web Page Header and Footer from All Content
> > Fetched by Nutch
> >
> > Hi All,
> > 
> > I am indexing a web application with approximately 9500 distinct URLs and
> > their contents using Nutch and Solr.
> >
> > I use Nutch to fetch the URLs and links, and to crawl the entire web
> > application to extract the content of all pages.
> >
> > Then I run the solrindex command to send the content to Solr.
> >
> > The problem I have now is that the first 1000 or so characters of
> > some pages and the last 400 characters are showing up in
> > the search results.
> >
> > These are the contents of the common header and footer used in the site,
> > respectively.
> >
> > The only workaround I have now is to index everything and then go
> > through each document one at a time to remove the first 1000 characters
> > if the Levenshtein distance between the first 1000 characters of the
> > page and the common header is less than a certain value. The same applies
> > to the footer content common to all pages.
> >
> > Is there a way to ignore certain "stop phrases", so to speak, in the Nutch
> > configuration, based on Levenshtein distance or Jaro-Winkler distance, so
> > that the parts of the fetched data that match these stop phrases will not
> > be parsed?
> >
> > Any useful pointers would be highly appreciated.
> >
> > Thanks in advance.
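The Levenshtein-based post-processing workaround described in the quoted message can be sketched in plain Java (class name, threshold, and strings are illustrative; this is not Nutch or Solr API):

```java
// Strip a common header from a page when its leading characters are within
// a small edit distance of that header.
public class HeaderStripper {
    // Classic dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Remove the leading header-sized chunk when it is "close enough" to the header.
    static String stripHeader(String content, String header, int maxDistance) {
        int n = Math.min(header.length(), content.length());
        String lead = content.substring(0, n);
        return levenshtein(lead, header) <= maxDistance ? content.substring(n) : content;
    }

    public static void main(String[] args) {
        String header = "Common Site Header";
        String page = "Common Site Header - actual article text";
        System.out.println(stripHeader(page, header, 3)); // prints " - actual article text"
    }
}
```

Note that full O(n*m) Levenshtein on 1000-character prefixes is affordable per document but adds up over a large crawl; a banded or early-exit variant would be a natural optimization.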

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

2010-10-19 Thread Israel Ekpo
Thanks Otis and Markus for your input.

I will check it out today.

On Tue, Oct 19, 2010 at 4:45 AM, Markus Jelsma
wrote:

> Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be
> upgraded to Tika 0.8 (when it's released or just the current trunk). Also,
> the Boilerpipe API needs to be exposed through Nutch configuration: which
> extractor can be used, which parameters need to be set, etc.
>
> Upgrading to Tika's trunk might be relatively easy but exposing Boilerpipe
> surely isn't.
>
> On Tuesday, October 19, 2010 06:47:43 am Otis Gospodnetic wrote:
> > Hi Israel,
> >
> > You can use this: http://search-lucene.com/?q=boilerpipe&fc_project=Tika
> > Not sure if it's built into Nutch, though...
> >
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> >
> >
> > [...]
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Uppercase and lowercase queries

2010-10-19 Thread PeterKerk

I want to query on cityname. This works when I query for example:
"Boston"

But when I query "boston" it didn't show any results. The database stores
"Boston".

So I thought: I should change the filter on this field to make everything
lowercase.


The field definition for city is: <field name="city" type="string" indexed="true" stored="true"/>

So I changed its fieldtype "string" from: <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

TO:

[fieldType XML with a lowercase filter; stripped by the mail archive]

But it still doesn't show any results when I query "boston"... why?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Uppercase-and-lowercase-queries-tp1731349p1731349.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Uppercase and lowercase queries

2010-10-19 Thread Pradeep Singh
Use text field.

On Tue, Oct 19, 2010 at 3:19 AM, PeterKerk  wrote:

>
> I want to query on cityname. This works when I query for example:
> "Boston"
>
> But when I query "boston" it didn't show any results. The database stores
> "Boston".
>
> So I thought: I should change the filter on this field to make everything
> lowercase.
>
>
> The field definition for city is: <field name="city" type="string" indexed="true" stored="true"/>
>
> So I changed its fieldtype "string" from: <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>
> TO:
>
> [fieldType XML with a lowercase filter; stripped by the mail archive]
>
> But it still doesn't show any results when I query "boston"... why?
>


Re: Uppercase and lowercase queries

2010-10-19 Thread Markus Jelsma
Because you need to reindex.

On Tuesday, October 19, 2010 12:19:53 pm PeterKerk wrote:
> I want to query on cityname. This works when I query for example:
> "Boston"
> 
> But when I query "boston" it didn't show any results. The database stores
> "Boston".
> 
> So I thought: I should change the filter on this field to make everything
> lowercase.
> 
> 
> The field definition for city is: <field name="city" type="string" indexed="true" stored="true"/>
>
> So I changed its fieldtype "string" from: <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>
> TO:
>
> [fieldType XML with a lowercase filter; stripped by the mail archive]
>
> But it still doesn't show any results when I query "boston"... why?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Uppercase and lowercase queries

2010-10-19 Thread Markus Jelsma
Yes, and reindex. I also suggest not using `string` as the name of the 
fieldType, as it will cause confusion later.
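Markus's suggested fieldType XML was stripped by the mail archive; a reconstruction of a typical lowercasing type from the example schema of that era (the name is illustrative, not his original) would be:

```xml
<fieldType name="text_lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keep the whole value as one token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With KeywordTokenizerFactory the field still behaves like a string field, except that matching becomes case-insensitive, so after reindexing a query for "boston" matches an indexed "Boston".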


On Tuesday, October 19, 2010 12:25:53 pm Pradeep Singh wrote:
> Use text field.
> 
> On Tue, Oct 19, 2010 at 3:19 AM, PeterKerk  wrote:
> > I want to query on cityname. This works when I query for example:
> > "Boston"
> > 
> > But when I query "boston" it didn't show any results. The database stores
> > "Boston".
> > 
> > So I thought: I should change the filter on this field to make everything
> > lowercase.
> > 
> > 
> > The field definition for city is: <field name="city" type="string" indexed="true" stored="true"/>
> >
> > So I changed its fieldtype "string" from: <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
> >
> > TO:
> >
> > [fieldType XML with a lowercase filter; stripped by the mail archive]
> >
> > But it still doesn't show any results when I query "boston"... why?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Uppercase and lowercase queries

2010-10-19 Thread PeterKerk

I now used a text field... and it works, so thanks! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Uppercase-and-lowercase-queries-tp1731349p1731423.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commits on service after shutdown

2010-10-19 Thread Jan Høydahl / Cominvent
You never get full control of commits, as Solr will auto-commit anyway whenever 
the (configurable) input buffer is full. With the current architecture you 
cannot really trust adds or commits to be 100% successful, because 
the server may have been restarted between an add and a commit() without you 
noticing, etc. So your feeder app should expect failures to happen, including 
added docs never being committed, and be prepared to re-submit any documents needed 
after a failure. This can be achieved by querying the index at regular 
intervals to see if you are in sync. Or you could help implement SOLR-1924 to 
get a reliable callback mechanism :)
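For reference, the auto-commit behaviour mentioned above is controlled in solrconfig.xml; a sketch with illustrative thresholds:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- commit automatically once either threshold is reached -->
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- pending documents -->
    <maxTime>60000</maxTime>  <!-- milliseconds since first uncommitted add -->
  </autoCommit>
</updateHandler>
```

Removing the autoCommit element disables time/size-based commits, but as noted it does not change what happens at shutdown.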

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. okt. 2010, at 21.50, Ezequiel Calderara wrote:

> I understand, but I want to have control over what is committed or not.
> In our scenario, we want to add documents to the index, and maybe after an
> hour trigger the commit.
> 
> If in the middle, we have a server shutdown or any process sending a
> Shutdown signal to the process. I don't want those documents being commited.
> 
> Should I file a bug issue or an enhancement issue?
> 
> Thanks
> 
> 
> On Mon, Oct 18, 2010 at 3:54 PM, Israel Ekpo  wrote:
> 
>> The documents should be implicitly committed when the Lucene index is
>> closed.
>> 
>> When you perform a graceful shutdown, the Lucene index gets closed and the
>> documents get committed implicitly.
>> 
>> When the shutdown is abrupt as in a KILL -9, then this does not happen and
>> the updates are lost.
>> 
>> You can use the auto commit parameter when sending your updates so that the
>> changes are saved right away, though this could slow down the indexing
>> speed considerably; but I do not believe there are parameters to keep those
>> un-committed documents "alive" after a kill.
>> 
>> 
>> 
>> On Mon, Oct 18, 2010 at 2:46 PM, Ezequiel Calderara >> wrote:
>> 
>>> Hi, i'm new in the mailing list.
>>> I'm implementing Solr in my actual job, and i'm having some problems.
>>> I was testing the consistency of the "commits". I found, for example, that
>>> if we add X documents to the index (without committing) and then we restart
>>> the service, the documents are committed. They show up in the results. I
>>> interpret this as an error.
>>> But when we add X documents to the index (without committing) and then we
>>> kill the process and we start it again, the documents don't appear. This
>>> behaviour is the one I want.
>>> 
>>> Is there any param to avoid the auto-committing of documents after a
>>> shutdown?
>>> Is there any param to keep those un-committed documents "alive" after a
>>> kill?
>>> 
>>> Thanks!
>>> 
>>> --
>>> __
>>> Ezequiel.
>>> 
>>> http://www.ironicnet.com/
>>> 
>> 
>> 
>> 
>> --
>> °O°
>> "Good Enough" is not good enough.
>> To give anything less than your best is to sacrifice the gift.
>> Quality First. Measure Twice. Cut Once.
>> http://www.israelekpo.com/
>> 
> 
> 
> 
> -- 
> __
> Ezequiel.
> 
> http://www.ironicnet.com



Re: count(*) equivilent in Solr/Lucene

2010-10-19 Thread Grant Ingersoll

On Oct 19, 2010, at 2:09 AM, Dennis Gearon wrote:

> I/my team will have to look at that and decode it, LOL! I get some of it.
> 
> The database version returns 1 row, with the answer.
> 
> What does this return and how fast is it on BIG indexes?

rows=0 returns 0 rows, but the total count will be returned.  You can do rows=0 
with any query to get the total number of matches.
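For example (URL and count are illustrative), any query with rows=0 returns an empty result set whose numFound attribute carries the total:

```
http://localhost:8983/solr/select?q=*:*&rows=0

<result name="response" numFound="1234567" start="0"/>
```

So the count(*) equivalent is a single cheap request: Solr computes numFound for every query anyway, regardless of how many rows are returned.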

> 
> PS, that should have been:
> .
> .
> .
>   date_column2 < :end_date;
> .
> 
> Dennis Gearon
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a 
> better idea to learn from others’ mistakes, so you do not have to make them 
> yourself. from 
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> EARTH has a Right To Life,
>  otherwise we all die.
> 
> 
> --- On Mon, 10/18/10, Chris Hostetter  wrote:
> 
>> From: Chris Hostetter 
>> Subject: Re: count(*) equivilent in Solr/Lucene
>> To: solr-user@lucene.apache.org
>> Date: Monday, October 18, 2010, 10:26 PM
>> : 
>> : SELECT 
>> :   COUNT(*) 
>> : WHERE
>> :   date_column1 > :start_date AND
>> :   date_column2 > :end_date;
>> 
>>    q=*:*&fq=column1:[start TO *]&fq=column2:[end TO *]&rows=0
>> 
>> ...every result includes a total count.
>> 
>> -Hoss
>> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



boosting injection

2010-10-19 Thread Andrea Gazzarini

 Hi all,
I have a client that is sending this query

q=title:history AND author:joyce

is it possible to "transform" this query at runtime into:

q=title:history^10 AND author:joyce^5

?

Best regards,
Andrea




Re: boosting injection

2010-10-19 Thread Ken Stanley
Andrea,

Using the SOLR dismax query handler, you could set up queries like this to
boost on fields of your choice. Basically, the q parameter would be the
query terms (without the field definitions), and a qf (Query Fields)
parameter that you use to define your boost(s):
http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
would be to parse the query in whatever application is sending the queries
to the SOLR instance to make the necessary transformations.
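A sketch of such a dismax request (host, field names, and weights are illustrative):

```
http://localhost:8983/solr/select?defType=dismax&q=history+joyce&qf=title^10+author^5
```

Here qf applies the boosts, so the client only has to send the bare terms rather than a fielded query.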

Regards,

Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini <
andrea.gazzar...@atcult.it> wrote:

>  Hi all,
> I have a client that is sending this query
>
> q=title:history AND author:joyce
>
> is it possible to "transform" at runtime this query in this way:
>
> q=title:history^10 AND author:joyce^5
>
> ?
>
> Best regards,
> Andrea
>
>
>


Re: snapshot-4.0 and maven

2010-10-19 Thread Matt Mitchell
Hey thanks Tommy. To be more specific, I'm trying to use SolrJ in a
clojure project. When I try to use SolrJ using what you showed me, I
get errors saying Lucene classes can't be found, etc. Is there a way
to build everything SolrJ (snapshot-4.0) needs into one jar?

Matt

On Mon, Oct 18, 2010 at 11:01 PM, Tommy Chheng  wrote:
> Once you built the solr 4.0 jar, you can use mvn's install command like
> this:
>
> mvn install:install-file -DgroupId=org.apache -DartifactId=solr \
>     -Dpackaging=jar -Dversion=4.0-SNAPSHOT \
>     -Dfile=solr-4.0-SNAPSHOT.jar -DgeneratePom=true
>
> @tommychheng
>
> On 10/18/10 7:28 PM, Matt Mitchell wrote:
>
> I'd like to get solr snapshot-4.0 pushed into my local maven repo. Is
> this possible to do? If so, could someone give me a tip or two on
> getting started?
>
> Thanks,
> Matt
>
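Once installed as in Tommy's command, the jar can be declared as a dependency in a project's pom.xml (the coordinates mirror that install command; the Lucene jars Matt mentions would still need their own entries):

```xml
<dependency>
  <groupId>org.apache</groupId>
  <artifactId>solr</artifactId>
  <version>4.0-SNAPSHOT</version>
</dependency>
```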


Re: **SPAM** Re: boosting injection

2010-10-19 Thread Andrea Gazzarini
Hi Ken, 
thanks for your response...unfortunately it doesn't solve my problem.

I cannot change the client behaviour, so the query must be a full query and not only 
the query terms.
In this scenario, it would be great, for example, if I could declare the boost 
in the schema field definition... but I think it's not possible, is it?

Regards
Andrea 
  _  

From: Ken Stanley [mailto:doh...@gmail.com]
To: solr-user@lucene.apache.org
Sent: Tue, 19 Oct 2010 15:05:31 +0200
Subject: **SPAM**  Re: boosting injection

Andrea,
  
  Using the SOLR dismax query handler, you could set up queries like this to
  boost on fields of your choice. Basically, the q parameter would be the
  query terms (without the field definitions, and a qf (Query Fields)
  parameter that you use to define your boost(s):
  http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
  would be to parse the query in whatever application is sending the queries
  to the SOLR instance to make the necessary transformations.
  
  Regards,
  
  Ken
  
  It looked like something resembling white marble, which was
  probably what it was: something resembling white marble.
  -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
  
  
  On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini <
  andrea.gazzar...@atcult.it> wrote:
  
  >  Hi all,
  > I have a client that is sending this query
  >
  > q=title:history AND author:joyce
  >
  > is it possible to "transform" at runtime this query in this way:
  >
  > q=title:history^10 AND author:joyce^5
  >
  > ?
  >
  > Best regards,
  > Andrea
  >
  >
  >




Re: **SPAM** Re: boosting injection

2010-10-19 Thread Markus Jelsma
Index-time boosting maybe?
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
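A sketch of a field-level index-time boost in an update message (the values mirror the ones asked about; changing them requires reindexing):

```xml
<add>
  <doc>
    <field name="title" boost="10.0">history</field>
    <field name="author" boost="5.0">joyce</field>
  </doc>
</add>
```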

On Tuesday, October 19, 2010 04:23:46 pm Andrea Gazzarini wrote:
> Hi Ken,
> thanks for your response...unfortunately it doesn't solve my problem.
> 
> I cannot change the client behaviour so the query must be a query and not
> only the query terms. In this scenario, it would be great, for example, if
> I could declare the boost in the schema field definition... but I think
> it's not possible, is it?
> 
> Regards
> Andrea
>   _
> 
> From: Ken Stanley [mailto:doh...@gmail.com]
> To: solr-user@lucene.apache.org
> Sent: Tue, 19 Oct 2010 15:05:31 +0200
> Subject: **SPAM**  Re: boosting injection
> 
> Andrea,
> 
>   Using the SOLR dismax query handler, you could set up queries like this
> to boost on fields of your choice. Basically, the q parameter would be the
> query terms (without the field definitions, and a qf (Query Fields)
> parameter that you use to define your boost(s):
>   http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
>   would be to parse the query in whatever application is sending the
> queries to the SOLR instance to make the necessary transformations.
> 
>   Regards,
> 
>   Ken
> 
>   It looked like something resembling white marble, which was
>   probably what it was: something resembling white marble.
>   -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
> 
> 
>   On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini <
> 
>   andrea.gazzar...@atcult.it> wrote:
>   >  Hi all,
>   > 
>   > I have a client that is sending this query
>   > 
>   > q=title:history AND author:joyce
>   > 
>   > is it possible to "transform" at runtime this query in this way:
>   > 
>   > q=title:history^10 AND author:joyce^5
>   > 
>   > ?
>   > 
>   > Best regards,
>   > Andrea

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: boosting injection

2010-10-19 Thread Andrea Gazzarini

Y-E-A-H! I think that's it!
Markus, what are the disadvantages of this boosting strategy?

Thanks a lot
Andrea

On 19/10/2010 16:25, Markus Jelsma wrote:

Index-time boosting maybe?
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22

On Tuesday, October 19, 2010 04:23:46 pm Andrea Gazzarini wrote:

Hi Ken,
thanks for your response...unfortunately it doesn't solve my problem.

I cannot change the client behaviour so the query must be a query and not
only the query terms. In this scenario, it would be great, for example, if
I could declare the boost in the schema field definition... but I think
it's not possible, is it?

Regards
Andrea
   _

From: Ken Stanley [mailto:doh...@gmail.com]
To: solr-user@lucene.apache.org
Sent: Tue, 19 Oct 2010 15:05:31 +0200
Subject: **SPAM**  Re: boosting injection

Andrea,

   Using the SOLR dismax query handler, you could set up queries like this
to boost on fields of your choice. Basically, the q parameter would be the
query terms (without the field definitions, and a qf (Query Fields)
parameter that you use to define your boost(s):
   http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
   would be to parse the query in whatever application is sending the
queries to the SOLR instance to make the necessary transformations.

   Regards,

   Ken

   It looked like something resembling white marble, which was
   probably what it was: something resembling white marble.
   -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


   On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini<

   andrea.gazzar...@atcult.it>  wrote:
   >   Hi all,
   >
   >  I have a client that is sending this query
   >
   >  q=title:history AND author:joyce
   >
   >  is it possible to "transform" at runtime this query in this way:
   >
   >  q=title:history^10 AND author:joyce^5
   >
   >  ?
   >
   >  Best regards,
   >  Andrea


Re: **SPAM** Re: boosting injection

2010-10-19 Thread Ken Stanley
Andrea,

Another approach, aside from Markus's suggestion, would be to create your own
handler that could intercept the query and perform whatever necessary
transformations you need at query time. However, that would require
Java knowledge (about which I make no assumption).

Regards,

Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Tue, Oct 19, 2010 at 10:23 AM, Andrea Gazzarini <
andrea.gazzar...@atcult.it> wrote:

>  Hi Ken,
> thanks for your response...unfortunately it doesn't solve my problem.
>
> I cannot change the client behaviour, so the query must be a query and not
> only the query terms.
> In this scenario, it would be great, for example, if I could declare the
> boost in the schema field definition... but I think it's not possible, is
> it?
>
> Regards
> Andrea
>
> --
> *From:* Ken Stanley [mailto:doh...@gmail.com]
> *To:* solr-user@lucene.apache.org
> *Sent:* Tue, 19 Oct 2010 15:05:31 +0200
> *Subject:* **SPAM** Re: boosting injection
>
> Andrea,
>
> Using the SOLR dismax query handler, you could set up queries like this to
> boost on fields of your choice. Basically, the q parameter would be the
> query terms (without the field definitions, and a qf (Query Fields)
> parameter that you use to define your boost(s):
> http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
> would be to parse the query in whatever application is sending the queries
> to the SOLR instance to make the necessary transformations.
>
> Regards,
>
> Ken
>
> It looked like something resembling white marble, which was
> probably what it was: something resembling white marble.
> -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
>
>
> On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini <
> andrea.gazzar...@atcult.it> wrote:
>
> > Hi all,
> > I have a client that is sending this query
> >
> > q=title:history AND author:joyce
> >
> > is it possible to "transform" at runtime this query in this way:
> >
> > q=title:history^10 AND author:joyce^5
> >
> > ?
> >
> > Best regards,
> > Andrea
> >
> >
> >
>
>


Documents and cores

2010-10-19 Thread Olson, Ron
Hi all-

I have a newbie design question about documents, especially with SQL databases. 
I am trying to set up Solr to go against a database that, for example, has 
"items" and "people". The way I see it (and I don't know if this is right or 
not, thus the question), both are separate documents: an item 
may contain a list of parts, which the user may want to search, and, as part of 
the "item", view the list of people who have ordered the item.

Then there's the actual "people", who the user might want to search to find a 
name and, consequently, what items they ordered. To me they are both "top 
level" things, with some overlap of fields. If I'm searching for "people", I'm 
likely not going to be interested in the parts of the item, while if I'm 
searching for "items" the likelihood is that I may want to search for "42532" 
which is, in this instance, a SKU, and not get hits on the zip code section of 
the "people".

Does it make sense, then, to separate these two out as separate documents? I 
believe so because the documentation I've read suggests that a document should 
be analogous to a row in a table (in this case, very de-normalized). What is 
tripping me up is, as far as I can tell, you can have only one document type 
per index, and thus one document per core. So in this example, I have two 
cores, "items" and "people". Is this correct? Should I embrace the idea of 
having many cores, or am I supposed to have a single, unified index with all 
documents (which Solr doesn't seem to support)?

The ultimate question comes down to the search interface. I don't necessarily 
want to have the user explicitly state which document they want to search; I'd 
like them to simply type "42532" and get documents from both cores, and then 
possibly allow for filtering results after the fact, not before. As I've only 
used the admin site so far (which is core-specific), does the client API allow 
for unified searching across all cores? Assuming it does, I'd think my idea of 
multiple-documents is okay, but I'd love to hear from people who actually know 
what they're doing. :)

Thanks,

Ron

DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.


Re: boosting injection

2010-10-19 Thread Andrea Gazzarini

 Hi Ken,
yes, I'm a Java developer, so I think I should be able to do that, but I 
was wondering if there's a way to solve my issue without coding.
The problem is that I need to adjust this query in a short time, and in 
addition I cannot justify (at this stage of the project) additional 
software artifacts.


Anyway thanks for your support

Best Regards,
Andrea

On 19/10/2010 16:33, Ken Stanley wrote:

Andrea,

Another approach, aside of Markus' suggestion, would be to create your own
handler that could intercept the query and perform whatever necessary
transformations that you need at query time. However, that would require
having Java knowledge (which I make no assumption).

Regards,

Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
 -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Tue, Oct 19, 2010 at 10:23 AM, Andrea Gazzarini<
andrea.gazzar...@atcult.it>  wrote:


  Hi Ken,
thanks for your response... unfortunately it doesn't solve my problem.

I cannot change the client behaviour, so the query must be a full query and not
only the query terms.
In this scenario it would be great, for example, if I could declare the
boost in the schema field definition, but I think that's not possible, is
it?

Regards
Andrea

--
*From:* Ken Stanley [mailto:doh...@gmail.com]
*To:* solr-user@lucene.apache.org
*Sent:* Tue, 19 Oct 2010 15:05:31 +0200
*Subject:* Re: boosting injection

Andrea,

Using the SOLR dismax query handler, you could set up queries like this to
boost on fields of your choice. Basically, the q parameter would be the
query terms (without the field definitions, and a qf (Query Fields)
parameter that you use to define your boost(s):
http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR alternative
would be to parse the query in whatever application is sending the queries
to the SOLR instance to make the necessary transformations.

Regards,

Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"


On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini<
andrea.gazzar...@atcult.it>  wrote:


Hi all,
I have a client that is sending this query

q=title:history AND author:joyce

is it possible to "transform" at runtime this query in this way:

q=title:history^10 AND author:joyce^5

?

Best regards,
Andrea
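As an illustration of the query-rewriting approach suggested above, here is a minimal, hypothetical sketch of intercepting the query string and appending boosts per field before it reaches Solr (the boost values are assumptions taken from the example query, not anything Solr provides):

```python
import re

# Hypothetical per-field boosts, taken from the example query in this thread.
BOOSTS = {"title": 10, "author": 5}

def inject_boosts(q):
    """Append ^boost to each fielded clause, e.g. title:history -> title:history^10."""
    def repl(m):
        field, term = m.group(1), m.group(2)
        boost = BOOSTS.get(field)
        return f"{field}:{term}^{boost}" if boost is not None else m.group(0)
    return re.sub(r"(\w+):(\S+)", repl, q)

print(inject_boosts("title:history AND author:joyce"))
# -> title:history^10 AND author:joyce^5
```

Fields without a configured boost are left untouched, so the transformation is safe to apply to every incoming query.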







Timeouts in distributed search using Solr + Zookeeper

2010-10-19 Thread Cinquini, Luca (3880)
Hi,
we are looking at Solr+Zookeeper as the architecture for enabling 
federated searches among geographically distributed data centers.

I wonder if anybody can comment on the status of enabling timeouts with 
respect to distributed searches in a Solr-Zookeeper environment. Specifically, 
following example C) in the Solr Cloud wiki:

http://wiki.apache.org/solr/SolrCloud

it seems the system is resilient to any single Solr server (out of 4) being 
unavailable, but if both Solr servers serving the same shard go down, then a 
distributed query results in an error instead of returning partial results. Is 
there any special configuration that needs to be set on the Solr and/or 
Zookeeper servers, or any request parameter that needs to be added, to make 
the distributed query just return results from the only available shard? Or 
is this feature not yet operational?

thanks a lot,
Luca



Negative filter using the "appends" element

2010-10-19 Thread Kevin Cunningham
I'm using Solr 1.4 with the standard request handler and attempting to apply a 
negative fq for all requests via the "appends" element, but it's not being 
applied.  Is this an intended limitation?  I looked in JIRA for an existing 
issue but nothing jumped out.

Works fine:

  <lst name="appends">
    <str name="fq">tag:test</str>
  </lst>

Does not work:

  <lst name="appends">
    <str name="fq">-tag:test</str>
  </lst>



query results file for trec_eval

2010-10-19 Thread Valli Indraganti
Hello!

I am a student trying to run an evaluation on TREC-format documents. I
have the relevance judgments, and I would like to have the output of my queries
for use with the trec_eval software. Can someone please show me how to make
Solr output results in this format, or at least point me to some material that
guides me through this?

Thanks,
Valli


RE: query results file for trec_eval

2010-10-19 Thread abhatna...@vantage.com

If I understand your use case correctly, you will have to write your own 
response writer. Only the response writers below are available:

XMLResponseWriter - The most general-purpose response format; outputs results 
in XML.

XSLTResponseWriter - Applies a specified XSLT transformation to the output of 
the XMLResponseWriter. The tr parameter in the request specifies the name of 
the XSLT transformation to use; the transformation must exist in the Solr 
home's conf/xslt directory.

JSONResponseWriter - Outputs results in JavaScript Object Notation (JSON), a 
simple, human-readable data-interchange format that is also easy for machines 
to parse.

RubyResponseWriter - Extends the JSON format so that the results can safely 
be evaluated in Ruby (useful if you are using Ruby with Solr, e.g. with 
acts_as_solr or Flare).

PythonResponseWriter - Extends the JSON output format for safe use in the 
Python eval method.

QueryResponseWriters are added to Solr in the solrconfig.xml file using the 
<queryResponseWriter> tag and affiliated attributes. The response type is 
specified in the request using the wt parameter. The default is "standard," 
which is set in solrconfig.xml to be the XMLResponseWriter. Finally, instances 
of QueryResponseWriter must provide thread-safe implementations 
of the write() and getContentType() methods used to create responses.

-Ankit

From: Valli Indraganti [via Lucene] 
[mailto:ml-node+1732965-820449511-24...@n3.nabble.com]
Sent: Tuesday, October 19, 2010 11:30 AM
To: Ankit Bhatnagar
Subject: query results file for trec_eval

Hello!

I am a student and I am trying to run evaluation for TREC format document. I
have the judgments. I would like to have the output of my queries for use
with trec_eval software. Can someone please point me how to make Solr spit
out output in this format? Or at least point me to some material that guides
me through this.

Thanks,
Valli




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/query-results-file-for-trec-eval-tp1732965p1732999.html
Sent from the Solr - User mailing list archive at Nabble.com.


FW: Dismax phrase boosts on multi-value fields

2010-10-19 Thread Jason Brown



-Original Message-
From: Jason Brown
Sent: Tue 19/10/2010 13:45
To: d...@lucene.apache.org
Subject: Dismax phrase boosts on multi-value fields
 

Hi - I have a multi-value field, so say for example it consists of 

'my black cat'
'my white dog'
'my blue rabbit'

The field is whitespace parsed when put into the index.

I have a phrase query boost configured on this field which I understand kicks 
in when my search term is found entirely in this field.

So, if the search term is 'my blue rabbit', then I understand that my phrase 
boost will be applied, as this is found entirely in this field. 

My question/presumption is that, as this is a multi-valued field, only one value 
of the multi-value needs to match for the phrase query boost (given my very 
imaginative set of test data :-) above, you can see that this obviously matches 
one value and not all of them)

Thanks for your help.





If you wish to view the St. James's Place email disclaimer, please use the link 
below

http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer


Re: query results file for trec_eval

2010-10-19 Thread Ezequiel Calderara
I don't know anything about the TREC document format, but I think if you
want text output, you can do it by using the
http://wiki.apache.org/solr/XsltResponseWriter to transform the XML to
text...

On Tue, Oct 19, 2010 at 12:29 PM, Valli Indraganti <
valli.indraga...@gmail.com> wrote:

> Hello!
>
> I am a student and I am trying to run evaluation for TREC format document.
> I
> have the judgments. I would like to have the output of my queries for use
> with trec_eval software. Can someone please point me how to make Solr spit
> out output in this format? Or at least point me to some material that
> guides
> me through this.
>
> Thanks,
> Valli
>



-- 
__
Ezequiel.

Http://www.ironicnet.com


does solr support posting gzipped content?

2010-10-19 Thread danomano

Hi folks, I was wondering if there is any native support for posting gzipped
files to solr?

i.e. I'm testing a project where we inject our log files into solr for
indexing. These log files are gzipped, and I figure it would take less
network bandwidth to inject the gzipped files directly. Is there a way to do
this, other than implementing my own ServletFilter or some such?

thanx
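One client-side option -- a sketch only, since stock Solr and servlet containers are not guaranteed to decompress request bodies -- is to gzip the update payload yourself and send it with a Content-Encoding: gzip header, provided something on the server side (e.g. a servlet filter) is configured to inflate it:

```python
import gzip
import io

def gzip_payload(xml_bytes):
    """Compress an update payload for sending with Content-Encoding: gzip.
    The server side must be configured to decompress it (not guaranteed)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(xml_bytes)
    return buf.getvalue()

payload = gzip_payload(b"<add><doc>...</doc></add>")
# Round-trip check: the compressed bytes inflate back to the original payload.
assert gzip.decompress(payload) == b"<add><doc>...</doc></add>"
```

This only saves network bandwidth; whether the container accepts it depends entirely on server-side configuration.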
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-posting-gzipped-content-tp1733178p1733178.html
Sent from the Solr - User mailing list archive at Nabble.com.


Facet Use Case

2010-10-19 Thread Edgar Espina
Hi Guys,

 Let me describe the use case for our search application:
 a- The user enters the search application; the latest 20 documents are
displayed.
 b- A tag cloud component is populated with the facets available from a.
 c- The user types something in the text box.
 d- The documents are tagged in some way (there is a tags field).
 e- Get the tags of the first document returned by c) and build a facet
result with documents containing the same tags.

Does that make sense?

Is it possible to do this with a single Solr request?

Thanks in advance.
-- 
edgar


Dismax phrase boosts on multi-value fields

2010-10-19 Thread Jason Brown
 

Hi - I have a multi-value field, so say for example it consists of 

'my black cat'
'my white dog'
'my blue rabbit'

The field is whitespace parsed when put into the index.

I have a phrase query boost configured on this field which I understand kicks 
in when my search term is found entirely in this field.

So, if the search term is 'my blue rabbit', then I understand that my phrase 
boost will be applied, as this is found entirely in this field. 

My question/presumption is that, as this is a multi-valued field, only one value 
of the multi-value needs to match for the phrase query boost (given my very 
imaginative set of test data :-) above, you can see that this obviously matches 
one value and not all of them)

Thanks for your help.








Re: Dismax phrase boosts on multi-value fields

2010-10-19 Thread Jonathan Rochkind
You are correct.  The query needs to match as a phrase. It doesn't need 
to match "everything". Note that if a value is:


"long sentence with my blue rabbit in it",

then query "my blue rabbit" will also match as a phrase, for phrase 
boosting or query purposes.


Jonathan
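In dismax terms, this phrase boost is what the pf (phrase fields) parameter configures alongside qf; a hypothetical request sketch (the description field name and boost value are assumptions, not from Jason's schema):

```python
from urllib.parse import urlencode

# Hypothetical dismax request: qf scores individual term matches, while pf
# adds the phrase boost on the (assumed) multivalued field -- a document
# containing the whole phrase in any ONE value of the field gets the boost.
params = {
    "defType": "dismax",
    "q": "my blue rabbit",
    "qf": "description",
    "pf": "description^10",
}
query_string = urlencode(params)
print(query_string)
```

The phrase only has to occur within a single value; it does not need to match every value of the multi-valued field.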

Jason Brown wrote:
 

Hi - I have a multi-value field, so say for example it consists of 


'my black cat'
'my white dog'
'my blue rabbit'

The field is whitespace parsed when put into the index.

I have a phrase query boost configured on this field which I understand kicks 
in when my search term is found entirely in this field.

So, if the search term is 'my blue rabbit', then I understand that my phrase boost will be applied, as this is found entirely in this field. 


My question/presumption is that as this is a multi-valued field, only 1 value 
of the multi-value needs to match for the phrase query boost (given my very 
imaginative set of test data :-) above, you can see that this obviously matches 
1 value and not them all)

Thanks for your help.







  




Re: I need to indexing the first character of a field in another field

2010-10-19 Thread Renato Wesenauer
Hi guys,

I read all suggestions and I did some tests, and finally, the indexing
process is working.

I did the extraction of initial character of three fields. Here are the
functions:

function extraiInicial(valor) {
    if (valor != "" && valor != null) {
        valor = valor.substring(0, 1).toUpperCase();
    } else {
        valor = '';
    }
    return valor;
}

function extraiIniciaisAutorEditoraSebo(linha) {
    linha.put("inicialautor", extraiInicial(linha.get("autor")));
    linha.put("inicialeditora", extraiInicial(linha.get("editora")));
    linha.put("inicialsebo", extraiInicial(linha.get("sebo")));
    return linha;
}

Thank you for your help,

Renato F. Wesenauer




2010/10/18 Chris Hostetter 

>
> This exact topic was just discussed a few days ago...
>
>
> http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#3059a28929451cb4
>
> My comments on when/where it makes sense to put this logic...
>
>
> http://search.lucidimagination.com/search/document/7b6e2cc37bbb95c8/faceting_and_first_letter_of_fields#7b6e2cc37bbb95c8
>
>
> : Date: Mon, 18 Oct 2010 19:31:28 -0200
> : From: Renato Wesenauer 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: I need to indexing the first character of a field in another
> field
> :
> : Hello guys,
> :
> : I need to indexing the first character of the field "autor" in another
> field
> : "inicialautor".
> : Example:
> :autor = Mark Webber
> :inicialautor = M
> :
> : I did a javascript function in the dataimport, but the field
>  inicialautor
> : indexing empty.
> :
> : The function:
> :
> : function InicialAutor(linha) {
> : var aut = linha.get("autor");
> : if (aut != null) {
> :   if (aut.length > 0) {
> :   var ch = aut.charAt(0);
> :   linha.put("inicialautor", ch);
> :   }
> :   else {
> :   linha.put("inicialautor", '');
> :   }
> : }
> : else {
> : linha.put("inicialautor", '');
> : }
> : return linha;
> : }
> :
> : What's wrong?
> :
> : Thank's,
> :
> : Renato Wesenauer
> :
>
> -Hoss
>


Re: Documents and cores

2010-10-19 Thread Chris Hostetter

: Subject: Documents and cores
: References: <4cbd939c.3020...@atcult.it>
:  
: In-Reply-To: 

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-19 Thread Chris Hostetter

: We will take this approach in our production environment but meanwhile I am
: curious if this issue will be addressed: it seems the new/first searchers do
: not really buy any performance benefits because it uses so much memory,
: especially at core loading time.

There's nothing inherently wrong with using newSearcher/firstSearcher -- 
for many people they do in fact provide a perf improvement for "real" 
users (at the cost of some initial time spent warming before those users 
ever get access to the searcher).

As I understand it from this thread, your issue is not actually the 
firstSearcher/newSearcher -- your issue (per Yonik's comments) is that 
with per-segment sorting in 1.4, the FieldCache for some of your fields 
requires a lot more RAM in 1.4 than it would have in Solr 1.3 -- 
which caused GC thrashing during initialization.

Even w/o using firstSearcher/newSearcher, all that RAM is still going to 
be used if/when you sort on those fields -- all removing the 
firstSearcher/newSearcher queries on those fields has done for you is 
delay when the time spent initializing those FieldCaches happens and when 
that RAM first starts getting used.

It's possible you never actually sort on those fields, in which case 
removing those warming queries completely is definitely the way to go -- 
but if you do sort on them, then the warming queries can still legitimately 
be helpful (in that they pay the cost up front before a real user issues 
queries).

As Yonik mentioned, the real "fix" for the amount of memory being used is 
to switch to TrieDateFields, which use much more efficient FieldCaches 
for sorting -- with that change you can probably start using the warming 
queries again.  (Depending on how you tested, you may not have noticed 
much advantage to having them, because you'll really only see the 
advantages on the initial queries that do sorting -- those should show 
huge outlier times w/o the warming queries, but once those poor unlucky 
users have paid the price of initializing the FieldCache, everyone else's 
sorts should be fast.)

-Hoss


Re: SolrJ new javabin format

2010-10-19 Thread Chris Hostetter

:  The CHANGES.txt file in branch_3x says that the javabin format has changed in
: Solr 3.1, so you need to update SolrJ as well as Solr.  Is the SolrJ included
: in 3.1 compatible with both 3.1 and 1.4.1?  If not, that's going to make a
: graceful upgrade of my replicated distributed installation a little harder.

The formats are not currently compatible.  The first priority was to get 
the format fixed so it was using true UTF8 (instead of Java's bastardized 
modified UTF8) in a way that would generate a clear error if people 
attempted to use an older SolrJ to talk to a newer Solr server (or vice 
versa).

The consensus was that fixing that problem was worth the added complexity 
during upgrading -- people who want to use SolrJ 1.4 to talk to a Solr 
3.x server can always use the XML format instead of the binary format.

If you'd like to help improve the codec so that 3.x can recognize when a 
1.4 client connects and switch to the older format, patches along those 
lines would certainly be welcome.


-Hoss
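To illustrate the incompatibility being fixed here: Java's modified UTF-8 (as written by DataOutput.writeUTF) encodes U+0000 as an overlong two-byte sequence and supplementary characters via encoded surrogate pairs, while standard UTF-8 does neither. A small sketch of the NUL case:

```python
def encodings_of_nul():
    """Standard UTF-8 encodes U+0000 as a single 0x00 byte; Java's modified
    UTF-8 instead uses the overlong two-byte sequence 0xC0 0x80."""
    standard = "\x00".encode("utf-8")
    modified = b"\xc0\x80"
    return standard, modified

std, mod = encodings_of_nul()
print(std, mod)
# A strict UTF-8 decoder rejects the overlong form, which is why mixed
# old/new clients and servers cannot silently interoperate.
assert std == b"\x00" and len(mod) == 2
```

This is why a version mismatch produces a decode error rather than quietly wrong strings.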


Documents and Cores, take 2

2010-10-19 Thread Olson, Ron
Hi all-

I have a newbie design question about documents, especially with SQL databases. 
I am trying to set up Solr to go against a database that, for example, has 
"items" and "people". The way I see it, and I don't know if this is right or 
not (thus the question), is that I see both as separate documents as an item 
may contain a list of parts, which the user may want to search, and, as part of 
the "item", view the list of people who have ordered the item.

Then there's the actual "people", who the user might want to search to find a 
name and, consequently, what items they ordered. To me they are both "top 
level" things, with some overlap of fields. If I'm searching for "people", I'm 
likely not going to be interested in the parts of the item, while if I'm 
searching for "items" the likelihood is that I may want to search for "42532" 
which is, in this instance, a SKU, and not get hits on the zip code section of 
the "people".

Does it make sense, then, to separate these two out as separate documents? I 
believe so because the documentation I've read suggests that a document should 
be analogous to a row in a table (in this case, very de-normalized). What is 
tripping me up is that, as far as I can tell, you can have only one document type 
per index, and thus one document type per core. So in this example, I have two 
cores, "items" and "people". Is this correct? Should I embrace the idea of 
having many cores, or am I supposed to have a single, unified index with all 
documents (which Solr doesn't seem to support)?

The ultimate question comes down to the search interface. I don't necessarily 
want to have the user explicitly state which document they want to search; I'd 
like them to simply type "42532" and get documents from both cores, and then 
possibly allow for filtering results after the fact, not before. As I've only 
used the admin site so far (which is core-specific), does the client API allow 
for unified searching across all cores? Assuming it does, I'd think my idea of 
multiple-documents is okay, but I'd love to hear from people who actually know 
what they're doing. :)

Thanks,

Ron

BTW: Sorry about the problem with the previous message; I didn't know about 
thread hijacking.



Re: How can i get collect stemmed query?

2010-10-19 Thread Ahmet Arslan
Oh you are constructing the string 'fly +body:away' in your StemFilter?
Just to make sure, does this q=+body:(fly away) return your document?
And analysis.jsp (at query time) displays 'fly +body:away' from the string 
'flyaway'?

I don't know why you are doing this, but your stem filter should return only 
terms, not field names attached to them.

Maybe you will find this useful, so that you can do what you want without writing 
custom code:
 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory



 

--- On Tue, 10/19/10, Jerad  wrote:

> From: Jerad 
> Subject: Re: How can i get collect stemmed query?
> To: solr-user@lucene.apache.org
> Date: Tuesday, October 19, 2010, 5:10 AM
> 
> Thanks for your reply :)
> 
> 1. I tested "q=*:*&fl=body"; 1 doc was returned as a result, as I expected.
> 
> 2. I edited my schema.xml as you instructed:
> 
>     <analyzer class="com.testsolr.ir.customAnalyzer.MyCustomQueryAnalyzer">
>         //No filter description.
>     </analyzer>
> 
>     but no result was returned.
> 
> 3. I wonder about this...
> 
>     Typically the tokenizer and filter flow is:
> 
>     1) The input stream provides a text stream to the tokenizer or filter.
>     2) The tokenizer or filter gets a token, and the processed token and
>        offset attribute info are returned.
>     3) The offset attributes hold the token's information.
> 
>     This is part of a typical filter source, as I understand it:
> 
>         public class CustomStemFilter extends TokenFilter {
> 
>             private MyCustomStemmer stemmer;
>             private TermAttribute termAttr;
>             private OffsetAttribute offsetAttr;
>             private TypeAttribute typeAttr;
>             private Hashtable reserved = new Hashtable();
> 
>             public CustomStemFilter(TokenStream tokenStream, boolean isQuery, MyCustomStemmer stemmer) {
>                 super(tokenStream);
> 
>                 this.stemmer = stemmer;
>                 termAttr   = (TermAttribute) addAttribute(TermAttribute.class);
>                 offsetAttr = (OffsetAttribute) addAttribute(OffsetAttribute.class);
>                 typeAttr   = (TypeAttribute) addAttribute(TypeAttribute.class);
>                 addAttribute(PositionIncrementAttribute.class);
> 
>                 //Some of my custom logic here.
>                 //do something.
>             }
> 
>             public boolean incrementToken() throws IOException {
>                 clearAttributes();
> 
>                 if (!input.incrementToken())
>                     return false;
> 
>                 StringBuffer queryBuffer = new StringBuffer();
> 
>                 //stemming logic here.
>                 //the generated query string is appended to queryBuffer.
> 
>                 termAttr.setTermBuffer(queryBuffer.toString(), 0, queryBuffer.length());
>                 offsetAttr.setOffset(0, queryBuffer.length());
>                 offSet += queryBuffer.length();
>                 typeAttr.setType("word");
> 
>                 return true;
>             }
>         }
>        
> 
> 
>         ※ MyCustomStemmer analyzes the input string "flyaway" to the query
>            string "fly +body:away" and returns it.
> 
>         At index time, the content to be searched is normally analyzed and
>         indexed as below:
> 
>         a) Content to be indexed: fly away
>         b) The token "fly" and the length of "fly" = 3 (set via the offset
>            attribute method) are returned by the filter or analyzer.
>         c) The next token "away" and the length of "away" = 4 are returned.
> 
>         I think this is the general index flow.
> 
>         But I customized MyCustomFilter so that the filter generates a query
>         string, not a token.
>         In the process, the offset value is changed to the query's length,
>         not a single token's length.
> 
>         I wonder whether the value set by the offsetAttr.setOffset() method
>         has an influence on the search result when using Solr?
>         (I tested this in the main page's query input box at
>         http://localhost:8983/solr/admin/ )
> 

> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-can-i-get-collect-search-result-from-custom-filtered-query-tp1723055p1729717.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 





Re: Documents and Cores, take 2

2010-10-19 Thread Ken Stanley
Ron,

In the past I've worked with SOLR for a product that required the ability to
search - separately - for companies, people, business lists, and a
combination of the previous three. In designing this in SOLR, I found that
using a combination of explicit field definitions and dynamic fields (
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields) gave me the best
possible solution for the problem.

In essence, I created explicit fields that would be shared among all
document "types": a unique id, a document type, an indexed date, a modified
date, and maybe a couple of other fields that share traits with all document
types (i.e., name, a "market" specific to our business, etc). The unique id
was built as a string, and was prefixed with the document type, and it ended
with the unique id from the database.

The dynamic fields can be configured to be as flexible as you need, and in
my experience I would strongly recommend documenting each type of dynamic
field for each of your document types as a reference for your developers
(and yourself). :)

This allows us to build queries that can be focused on specific document
types, or that combine all of the types into a "super" search. For example, you
could do something to the effect of: (docType:people) AND (df_firstName:John
AND df_lastName:Hancock), (docType:companies) AND
(df_BusinessName:Acme+Inc), or even ((df_firstName:John AND
df_lastName:Hancock) OR (df_BusinessName:Acme+Inc)).

I hope this helps!

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
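A tiny sketch of that single-index approach (the docType and text field names are assumptions for illustration, not from Ron's schema): one query string spanning several document types, which can later be narrowed by filtering on docType:

```python
def cross_type_query(term, types=("people", "companies", "items")):
    """Build one query that searches a term across several docType values
    in a single unified index (field names here are assumptions)."""
    clauses = [f"(docType:{t} AND text:{term})" for t in types]
    return " OR ".join(clauses)

print(cross_type_query("42532"))
# -> (docType:people AND text:42532) OR (docType:companies AND text:42532) OR (docType:items AND text:42532)
```

Filtering after the fact then becomes an fq on docType rather than a choice of core.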


On Tue, Oct 19, 2010 at 4:57 PM, Olson, Ron  wrote:

> Hi all-
>
> I have a newbie design question about documents, especially with SQL
> databases. I am trying to set up Solr to go against a database that, for
> example, has "items" and "people". The way I see it, and I don't know if
> this is right or not (thus the question), is that I see both as separate
> documents as an item may contain a list of parts, which the user may want to
> search, and, as part of the "item", view the list of people who have ordered
> the item.
>
> Then there's the actual "people", who the user might want to search to find
> a name and, consequently, what items they ordered. To me they are both "top
> level" things, with some overlap of fields. If I'm searching for "people",
> I'm likely not going to be interested in the parts of the item, while if I'm
> searching for "items" the likelihood is that I may want to search for
> "42532" which is, in this instance, a SKU, and not get hits on the zip code
> section of the "people".
>
> Does it make sense, then, to separate these two out as separate documents?
> I believe so because the documentation I've read suggests that a document
> should be analogous to a row in a table (in this case, very de-normalized).
> What is tripping me up is, as far as I can tell, you can have only one
> document type per index, and thus one document per core. So in this example,
> I have two cores, "items" and "people". Is this correct? Should I embrace
> the idea of having many cores or am I supposed to have a single, unified
> index with all documents (which doesn't seem like Solr supports).
>
> The ultimate question comes down to the search interface. I don't
> necessarily want to have the user explicitly state which document they want
> to search; I'd like them to simply type "42532" and get documents from both
> cores, and then possibly allow for filtering results after the fact, not
> before. As I've only used the admin site so far (which is core-specific),
> does the client API allow for unified searching across all cores? Assuming
> it does, I'd think my idea of multiple-documents is okay, but I'd love to
> hear from people who actually know what they're doing. :)
>
> Thanks,
>
> Ron
>
> BTW: Sorry about the problem with the previous message; I didn't know about
> thread hijacking.
>


Re: Negative filter using the "appends" element

2010-10-19 Thread Ahmet Arslan
> Does not work:
>     <lst name="appends">
>       <str name="fq">-tag:test</str>
>     </lst>

Can you append &echoParams=all to your search URL and verify that 
fq=-tag:test is included in the response?





Spatial

2010-10-19 Thread Pradeep Singh
https://issues.apache.org/jira/browse/LUCENE-2519

If I change my code as per 2519

to have this  -

public double[] coords(double latitude, double longitude) {
    double rlat = Math.toRadians(latitude);
    double rlong = Math.toRadians(longitude);
    double nlat = rlong * Math.cos(rlat);
    return new double[]{nlat, rlong};
}

and return this -

    x = (gamma - gamma[0]) * cos(phi)
    y = phi

would it make it give correct results? Correct projections, tier ids?

I am not talking about changing Lucene/Solr code, I can duplicate the
classes to create my own version. Just wanted to be sure about the results.

Pradeep
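For reference, the formulas quoted above describe a sinusoidal projection; a small sketch of that transform follows (this only illustrates the math itself, not whether LUCENE-2519's tier ids come out correct):

```python
import math

def sinusoidal(lat_deg, lon_deg, lon0_deg=0.0):
    """Sinusoidal projection: x = (lon - lon0) * cos(lat), y = lat (radians).
    lon0_deg is the assumed reference meridian (gamma[0] in the thread)."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    lam0 = math.radians(lon0_deg)
    return (lam - lam0) * math.cos(phi), phi

# On the equator, x is simply the longitude offset in radians.
x, y = sinusoidal(0.0, 90.0)
assert abs(x - math.pi / 2) < 1e-9 and y == 0.0
```

Near the poles cos(lat) approaches zero, so x collapses toward the central meridian, which is the distortion behavior this projection is known for.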


xi:include

2010-10-19 Thread Peter A. Kirk
Hi

I am trying to use xi:include in my solrconfig.xml.

For example:
<xi:include href="http://localhost/config/config.aspx" />

This works fine, as long as config.aspx exists, and as long as it returns valid 
xml.

Sometimes though, the config.aspx can fail, and return invalid xml. Then I get 
a problem, as Solr's parsing of the solrconfig.xml fails. If I use xi:fallback, 
eg:

<xi:include href="http://localhost/config/config.aspx">
  <xi:fallback>
    <str name="qf">text^0.4 n^1.2 c^1.5 d^0.4 b^3</str>
  </xi:fallback>
</xi:include>

This helps if config.aspx does not exist -- then the fallback is used. But if 
config.aspx returns invalid xml, then the fallback does not appear to be used, 
and I get exceptions when I start Solr up. How can I get Solr to fall back if 
the included xml fails?

Thanks,
Peter



Re: query results file for trec_eval

2010-10-19 Thread Ahmet Arslan
> I am a student and I am trying to run evaluation for TREC
> format document. I
> have the judgments. I would like to have the output of my
> queries for use
> with trec_eval software. Can someone please point me how to
> make Solr spit
> out output in this format? Or at least point me to some
> material that guides
> me through this.

Lucene has a package (org.apache.lucene.benchmark.quality.trec) for this.

http://search-lucene.com/jd/lucene/org/apache/lucene/benchmark/quality/package-summary.html


  


Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?

2010-10-19 Thread Israel Ekpo
Hi All,

Just wanted to post an update on where we stand with all the requests for
new features


List of Features Requested In SOLR PECL Extension

1. Ability to send custom requests to custom URLs other than select, update,
terms, etc.
2. Ability to add files (pdf, office documents etc)
3. Windows version of latest releases.
4. Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al
returns an array consistently.
5. Lowering Libxml version to 2.6.16

If there is anything you think I left out, please let me know. This is just a
summary.

On Wed, Oct 13, 2010 at 3:48 AM, Stefan Matheis <
matheis.ste...@googlemail.com> wrote:

> On Tue, Oct 12, 2010 at 6:29 PM, Israel Ekpo  wrote:
>
> > I think this feature will take care of this.
> >
> > What do you think?
>
>
> sounds good!
>



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Spatial

2010-10-19 Thread Grant Ingersoll

On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:

> https://issues.apache.org/jira/browse/LUCENE-2519
> 
> If I change my code as per 2519
> 
> to have this  -
> 
> public double[] coords(double latitude, double longitude) {
>double rlat = Math.toRadians(latitude);
>double rlong = Math.toRadians(longitude);
>double nlat = rlong * Math.cos(rlat);
>return new double[]{nlat, rlong};
> 
>  }
> 
> 
> return this -
> 
> x = (gamma - gamma[0]) cos(phi)
> y = phi
> 
> would it make it give correct results? Correct projections, tier ids?

I'm not sure.  I have a lot of doubt about that code.  After making that 
correction, I spent several days trying to get the tests to pass and ultimately 
gave up.  Does that mean it is wrong?  I don't know.  I just don't have enough 
confidence to recommend it, given that the things I was asking it to do I could 
verify through other tools.  Personally, I would recommend seeing if one of the 
non-tier-based approaches suffices for your situation and using that.

-Grant

Re: boosting injection

2010-10-19 Thread Erick Erickson
The main disadvantage of index-time boosting is that you must reindex your
corpus entirely if you want to alter the boost factors. And there's no very
good way to anticipate what boost factors will give you the results you want.

I wonder if you could cheat and do some basic string processing on the
query to add your boosts? That'd be tricky unless you have very predictable
strings.

Best
Erick

On Tue, Oct 19, 2010 at 10:33 AM, Andrea Gazzarini <
andrea.gazzar...@atcult.it> wrote:

>   Y-E-A-H! I think it's so!
> Markus, what are disadvantages of this boosting strategy?
>
> Thanks a lot
> Andrea
>
> Il 19/10/2010 16:25, Markus Jelsma ha scritto:
>
>> Index-time boosting maybe?
>>
>> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22
>>
>>
>> On Tuesday, October 19, 2010 04:23:46 pm Andrea Gazzarini wrote:
>>
>>> Hi Ken,
>>> thanks for your response...unfortunately it doesn't solve my problem.
>>>
>>> I cannot chnage the client behaviour so the query must be a query and not
>>> only the query terms. In this scenario, It would be great, for example,
>>> if
>>> I could declare the boost in the schema field definitionbut I think
>>> it's not possible isn't it?
>>>
>>> Regards
>>> Andrea
>>>   _
>>>
>>> From: Ken Stanley [mailto:doh...@gmail.com]
>>> To: solr-user@lucene.apache.org
>>> Sent: Tue, 19 Oct 2010 15:05:31 +0200
>>> Subject: **SPAM**  Re: boosting injection
>>>
>>> Andrea,
>>>
>>>   Using the SOLR dismax query handler, you could set up queries like this
>>> to boost on fields of your choice. Basically, the q parameter would be
>>> the
>>> query terms (without the field definitions, and a qf (Query Fields)
>>> parameter that you use to define your boost(s):
>>>   http://wiki.apache.org/solr/DisMaxQParserPlugin. A non-SOLR
>>> alternative
>>>   would be to parse the query in whatever application is sending the
>>> queries to the SOLR instance to make the necessary transformations.
>>>
>>>   Regards,
>>>
>>>   Ken
>>>
>>>   It looked like something resembling white marble, which was
>>>   probably what it was: something resembling white marble.
>>>   -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
>>>
>>>
>>>   On Tue, Oct 19, 2010 at 8:48 AM, Andrea Gazzarini<
>>>
>>>   andrea.gazzar...@atcult.it>  wrote:
>>>   >   Hi all,
>>>   >
>>>   >  I have a client that is sending this query
>>>   >
>>>   >  q=title:history AND author:joyce
>>>   >
>>>   >  is it possible to "transform" at runtime this query in this way:
>>>   >
>>>   >  q=title:history^10 AND author:joyce^5
>>>   >
>>>   >  ?
>>>   >
>>>   >  Best regards,
>>>   >  Andrea
>>>
>>


Re: Documents and cores

2010-10-19 Thread Erick Erickson
This is something most everybody has to get over when transitioning from the
DB world to Solr/Lucene. The schema describes the #possible# fields in a
document. There is absolutely no requirement that #every# document in the
index have all these fields in them (unless #you# define it so with
required="true" in the schema).

Solr will happily index documents that have fields missing, so feel free...
You should be able to define your people and parts documents as you
choose, with perhaps some common fields.

You'll have to take some care not to form queries like name:ralph AND
sku:12345, assuming that the name field is only in people and sku only in
parts.

Do continue down the path of de-normalization. That's another thing most DB
folks
don't want to do. Each document you index should contain all the data you
need.
The moment you find yourself asking "how do I do a join" you should stop and
consider further de-normalization.

HTH
Erick
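As a Solr-free sketch of what that de-normalization step looks like, the snippet below flattens a normalized person/item join into one self-contained document per ordered item, each carrying all the fields a search would need. The Denormalize class and the field names (name, zip, sku) are hypothetical illustrations, not anything from Solr:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative de-normalization: merge a person's fields into each
// ordered-item row so every resulting "document" stands alone.
public class Denormalize {

    public static List<Map<String, String>> flatten(
            Map<String, String> person, List<Map<String, String>> orderedItems) {
        List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
        for (Map<String, String> item : orderedItems) {
            Map<String, String> doc = new HashMap<String, String>(person); // person fields
            doc.putAll(item);                                              // plus item fields
            docs.add(doc);
        }
        return docs;
    }
}
```

Each flattened document can then be indexed as-is; no join is needed at query time.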


On Tue, Oct 19, 2010 at 10:39 AM, Olson, Ron  wrote:

> Hi all-
>
> I have a newbie design question about documents, especially with SQL
> databases. I am trying to set up Solr to go against a database that, for
> example, has "items" and "people". The way I see it, and I don't know if
> this is right or not (thus the question), is that I see both as separate
> documents as an item may contain a list of parts, which the user may want to
> search, and, as part of the "item", view the list of people who have ordered
> the item.
>
> Then there's the actual "people", who the user might want to search to find
> a name and, consequently, what items they ordered. To me they are both "top
> level" things, with some overlap of fields. If I'm searching for "people",
> I'm likely not going to be interested in the parts of the item, while if I'm
> searching for "items" the likelihood is that I may want to search for
> "42532" which is, in this instance, a SKU, and not get hits on the zip code
> section of the "people".
>
> Does it make sense, then, to separate these two out as separate documents?
> I believe so because the documentation I've read suggests that a document
> should be analogous to a row in a table (in this case, very de-normalized).
> What is tripping me up is, as far as I can tell, you can have only one
> document type per index, and thus one document per core. So in this example,
> I have two cores, "items" and "people". Is this correct? Should I embrace
> the idea of having many cores or am I supposed to have a single, unified
> index with all documents (which doesn't seem like Solr supports).
>
> The ultimate question comes down to the search interface. I don't
> necessarily want to have the user explicitly state which document they want
> to search; I'd like them to simply type "42532" and get documents from both
> cores, and then possibly allow for filtering results after the fact, not
> before. As I've only used the admin site so far (which is core-specific),
> does the client API allow for unified searching across all cores? Assuming
> it does, I'd think my idea of multiple-documents is okay, but I'd love to
> hear from people who actually know what they're doing. :)
>
> Thanks,
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with
> it is  unauthorized and strictly prohibited.  If you have received this
> message in error, please notify the sender immediately by reply e-mail and
> permanently delete and destroy this message and its attachments, along with
> any copies thereof. This message does not create any contractual obligation
> on behalf of the sender or Law Bulletin Publishing Company.
> Thank you.
>


Re: Negative filter using the "appends" element

2010-10-19 Thread Erick Erickson
I suspect, but don't know for sure, that you need to modify it to
*:* -tag:test

but I confess I'm not at all sure that it'll work in this context...

Best
Erick

On Tue, Oct 19, 2010 at 11:10 AM, Kevin Cunningham <
kcunning...@telligent.com> wrote:

> I'm using Solr 1.4 with the standard request handler and attempting to
> apply a negative fq for all requests via the "appends" elements but its not
> being applied.  Is this an intended limitation?  I looked in JIRA for an
> existing issue but nothing jumped out.
>
> Works fine:
>
> <lst name="appends">
>   <str name="fq">tag:test</str>
> </lst>
>
> Does not work:
>
> <lst name="appends">
>   <str name="fq">-tag:test</str>
> </lst>
>


Multiple partial word searching with dismax handler

2010-10-19 Thread Chamnap Chhorn
Hi,

I have a problem combining a query with multiple partial-word
searching in the dismax handler. To do multiple partial-word
searching, I use EdgeNGramFilterFactory, and my query must be something like
this: "name_ngram:sun name_ngram:hot" in q.alt, combined with my search
handler (
http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&qt=products).
I wonder how to combine this with my search handler.

Here is my search handler config:
  <requestHandler name="products" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">20</int>
      <str name="defType">dismax</str>
      <str name="qf">name^200 full_text</str>
      <str name="pf">fap^15</str>
      <str name="fl">uuid</str>
      <str name="version">2.2</str>
      <str name="hl">on</str>
      <float name="tie">0.1</float>
    </lst>
    <lst name="appends">
      <str name="fq">type:Product</str>
    </lst>
    <lst name="invariants">
      <str name="spellcheck">false</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
      <str>elevateProducts</str>
    </arr>
  </requestHandler>

If I query with this url
http://localhost:8081/solr/select/?q.alt=name_ngram:sun%20name_ngram:hot&q=sun%20hot&qt=products,
it doesn't show the correct answer like the previous query does.

How could I configure this in my search handler with boost scores?

-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/
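For readers unfamiliar with EdgeNGramFilterFactory: it indexes every leading prefix of each token between minGramSize and maxGramSize, which is what makes a partial-word query like name_ngram:sun match. A minimal sketch of that idea follows; it is an illustration of the concept only, not the Lucene implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of edge n-gram generation: emit every leading
// prefix of a token whose length is between min and max (inclusive).
public class EdgeNGrams {

    public static List<String> edgeNGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<String>();
        for (int len = min; len <= Math.min(max, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "sun" is among the indexed prefixes of "sunhot", so a
        // partial-word query on the ngram field can match it.
        System.out.println(edgeNGrams("sunhot", 1, 6));
    }
}
```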


Not able to subscribe to ML

2010-10-19 Thread Abdullah Shaikh
Just a test mail to check if my mails are reaching the ML.

I don't know, but my mails are failing to reach the ML with the following
error:

Delivery to the following recipient failed permanently:

solr-user@lucene.apache.org

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient
domain. We recommend contacting the other email provider for further
information about the cause of this error. The error that the other server
returned was: 552 552 spam score (5.7) exceeded threshold (state 18).


- Abdullah


Re: Implementing Search Suggestion on Solr

2010-10-19 Thread Pablo Recio
Yeah, I know.

Could anyone tell me which one is the right way?

Regards,
> What an interesting application :-)
>
> Dennis Gearon
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself. from '
http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
> --- On Mon, 10/18/10, Pablo Recio Quijano  wrote:
>
>> From: Pablo Recio Quijano 
>> Subject: Implementing Search Suggestion on Solr
>> To: solr-user@lucene.apache.org
>> Date: Monday, October 18, 2010, 3:53 AM
>> Hi!
>>
>> I'm trying to implement some kind of Search Suggestion on a
>> search engine I have implemented. This search suggestions
>> should not be automatically like the one described for the
>> SpellCheckComponent [1]. I'm looking something like:
>>
>> "SAS oppositions" => "Public job offers for
>> some-company"
>>
>> So I will have to define it manually. I was thinking about
>> synonyms [2] but I don't know if it's the proper way to do
>> it, because semantically those terms are not synonyms.
>>
>> Any ideas or suggestions?
>>
>> Regards,
>>
>> [1] http://wiki.apache.org/solr/SpellCheckComponent
>> [2]
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
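If the synonyms route is taken anyway, note that the synonyms file supports explicit one-way mappings with =>, and the two sides of such a mapping need not be true synonyms. An illustrative query-time entry, using the names from the example above:

```
# synonyms.txt (illustrative entry; one-way mapping)
SAS oppositions => Public job offers for some-company
```

Whether that is semantically appropriate for this use case is, as the thread notes, a separate question.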


Re: Lucene vs Solr

2010-10-19 Thread Pradeep Singh
Is that right?

On Tue, Oct 19, 2010 at 11:08 PM, findbestopensource <
findbestopensou...@gmail.com> wrote:

> Hello all,
>
> I have posted an article Lucene vs Solr
> http://www.findbestopensource.com/article-detail/lucene-vs-solr
>
> Please feel free to add your comments.
>
> Regards
> Aditya
> www.findbestopensource.com
>


Re: SolrJ new javabin format

2010-10-19 Thread Shawn Heisey

 On 10/19/2010 2:40 PM, Chris Hostetter wrote:

The formats are not currently compatible.  The first priority was to get
the format fixed so it was using true UTF8 (instead of Java's bastardized
modified UTF8) in a way that would generate a clear error if people
attempted to use an older SolrJ to talk to a newer Solr server (or vice
versa).

The consensus was that fixing that problem was worth the added complexity
during upgrading -- people that want to use SolrJ 1.4 to talk to a Solr
3.x server can always use the XML format instead of the binary format.


What happens with distributed search, which uses javabin behind the 
scenes?  I don't query my actual index machines with a shards parameter, 
I have dedicated brokers (with empty indexes) that have the shards 
parameter included in the request handler, pointed at load balancer IP 
addresses.  Is there any way to have that use XML instead of javabin, or 
do I need to be cautious about not mixing versions during the upgrade?


Thanks,
Shawn