schedule indexing with DataImportHandler

2008-09-15 Thread rameshgalla

hi,

Is it possible to schedule indexing with the Solr DataImportHandler?

E.g., I want to run a delta-import automatically every day at 12 AM.

or 

Is it possible to initiate a delta-import automatically whenever there is a
modification in the database?
-- 
View this message in context: 
http://www.nabble.com/schedule-indexing-with-DataImportHandler-tp19488273p19488273.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: schedule indexing with DataImportHandler

2008-09-15 Thread Shalin Shekhar Mangar
On Mon, Sep 15, 2008 at 1:02 PM, rameshgalla <[EMAIL PROTECTED]>wrote:

>
> Is it possible to schedule indexing with solr DataImportHandler?
>
> eg: I want to do delta import automatically everyday at 12AM like that.


Only through external means. Create a cron job that uses wget to hit the
delta-import command URL.
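A minimal sketch of that cron + wget approach (the host, port, and /dataimport path are assumptions about a default single-core install; adjust for your deployment):

```shell
# Crontab entry (commented out): fire a delta-import every day at midnight.
# 0 0 * * * wget -q -O /dev/null "http://localhost:8983/solr/dataimport?command=delta-import"

# The same request, built here so the command can be inspected before scheduling:
SOLR_HOST="localhost:8983"
CMD="wget -q -O /dev/null http://${SOLR_HOST}/solr/dataimport?command=delta-import"
echo "${CMD}"
```

The delta-import command only re-indexes rows that the deltaQuery in your DIH config identifies as changed, so it is usually cheap enough to run nightly.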

Is it possible to initiate delta import automatically whenever there is a
> modification in
> database?


No. DataImportHandler cannot know when the data has changed. You must
achieve this through your application.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching for Index-Time Boosting in FAQ

2008-09-15 Thread Shalin Shekhar Mangar
You can specify a boost while indexing by adding a 'boost' attribute to the
field tag in the XML.

For example (field name and boost value are illustrative):

<doc>
  <field name="myfield" boost="2.0">value</field>
</doc>


In the same manner, boost can also be specified on the document tag to boost
the score for the whole document.

This information is not very prominent in the wiki but some documentation is
at http://wiki.apache.org/solr/UpdateXmlMessages
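As a sketch, posting such a boosted document with curl could look like the following (the URL, field name, and boost values are illustrative assumptions; the script only builds and prints the update message):

```shell
# Build an XML update message with a document-level and a field-level boost.
DOC='<add><doc boost="2.0"><field name="title" boost="3.0">value</field></doc></add>'
echo "${DOC}"
# To actually send it (assumes a default local Solr install):
# curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary "${DOC}"
```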

On Mon, Sep 15, 2008 at 2:25 PM, Luca Molteni <[EMAIL PROTECTED]> wrote:

> Hello, dear Solr Users,
>
> I'm starting to learn Solr and Lucene, since I want to use this technology
> in my project, but I ran into some trouble in the "index-time boosting"
> section of the documentation. I'm probably missing something, but since I
> can't figure it out by myself, I decided to write here.
>
> For example, in the scoring faq:
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ
>
> There are references to "index-time boosting" options for fields, but I
> haven't found anything in either the field configuration section of the
> wiki or the schema.xml file.
>
> From what I understand, these are the attributes for fields:
>
> name, indexed, stored, compressed, multiValued, omitNorms, termVectors
>
> But none of these solved my problem.
>
> At the end of the scoring FAQ there is this note:
>
>   - when should index-time boosts be used
>
> I'd like to see "how" index-time boosts should be used, please.
>
> Thank you very much,
>
> Bye.
>
> L.M.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching for Index-Time Boosting in FAQ

2008-09-15 Thread Luca Molteni
Now that's clear!

Since it's an index-time boost, it was in the indexing documents section. I
should have checked there. Would you mind if I update the scoring FAQ with a
link to this page?

"To increase the scores for certain documents that match a query, regardless
of what that query may be, one can use index-time boosts.

Index-time boosts can also be specified per field, so only queries matching
on that specific field will get the extra boost. An index-time boost on a
value of a multiValued field applies to all values for that field."

"Index-time boosts are assigned with the optional attribute "boost" on the
<field> elements of the XML update messages - LINK TO PAGE".
L.M.


2008/9/15 Shalin Shekhar Mangar <[EMAIL PROTECTED]>

> You can specify a boost while indexing by adding a 'boost' attribute to the
> field tag in the XML.
>
> For example:
> <doc>
>   <field name="myfield" boost="2.0">value</field>
> </doc>
>
> In the same manner, boost can also be specified on the document tag to
> boost
> the score for the whole document.
>
> This information is not very prominent in the wiki but some documentation
> is
> at http://wiki.apache.org/solr/UpdateXmlMessages
>
> On Mon, Sep 15, 2008 at 2:25 PM, Luca Molteni <[EMAIL PROTECTED]> wrote:
>
> > Hello, dear Solr Users,
> >
> > I'm starting to learn Solr and Lucene, since I want to use this
> > technology in my project, but I ran into some trouble in the "index-time
> > boosting" section of the documentation. I'm probably missing something,
> > but since I can't figure it out by myself, I decided to write here.
> >
> > For example, in the scoring faq:
> >
> > http://wiki.apache.org/solr/SolrRelevancyFAQ
> >
> > There are references to "index-time boosting" options for fields, but I
> > haven't found anything in either the field configuration section of the
> > wiki or the schema.xml file.
> >
> > From what I understand, these are the attributes for fields:
> >
> > name, indexed, stored, compressed, multiValued, omitNorms, termVectors
> >
> > But none of these solved my problem.
> >
> > At the end of the scoring FAQ there is this note:
> >
> >   - when should index-time boosts be used
> >
> > I'd like to see "how" index-time boosts should be used, please.
> >
> > Thank you very much,
> >
> > Bye.
> >
> > L.M.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Searching for Index-Time Boosting in FAQ

2008-09-15 Thread Shalin Shekhar Mangar
Please go ahead :-)

On Mon, Sep 15, 2008 at 3:04 PM, Luca Molteni <[EMAIL PROTECTED]> wrote:

> Now that's clear!
>
> Since it's an index-time boost, it was in the indexing documents section.
> I should have checked there. Would you mind if I update the scoring FAQ
> with a link to this page?
>
> "To increase the scores for certain documents that match a query,
> regardless
> of what that query may be, one can use index-time boosts.
>
> Index-time boosts can also be specified per field, so only queries matching
> on that specific field will get the extra boost. An index-time boost on a
> value of a multiValued field applies to all values for that field."
>
> "Index-time boosts are assigned with the optional attribute "boost" on the
> <field> elements of the XML update messages - LINK TO PAGE".
> L.M.
>
>
> 2008/9/15 Shalin Shekhar Mangar <[EMAIL PROTECTED]>
>
> > You can specify a boost while indexing by adding a 'boost' attribute to
> the
> > field tag in the XML.
> >
> > For example:
> > <doc>
> >   <field name="myfield" boost="2.0">value</field>
> > </doc>
> >
> > In the same manner, boost can also be specified on the document tag to
> > boost
> > the score for the whole document.
> >
> > This information is not very prominent in the wiki but some documentation
> > is
> > at http://wiki.apache.org/solr/UpdateXmlMessages
> >
> > On Mon, Sep 15, 2008 at 2:25 PM, Luca Molteni <[EMAIL PROTECTED]>
> wrote:
> >
> > > Hello, dear Solr Users,
> > >
> > > I'm starting to learn Solr and Lucene, since I want to use this
> > > technology in my project, but I ran into some trouble in the
> > > "index-time boosting" section of the documentation. I'm probably
> > > missing something, but since I can't figure it out by myself, I
> > > decided to write here.
> > >
> > > For example, in the scoring faq:
> > >
> > > http://wiki.apache.org/solr/SolrRelevancyFAQ
> > >
> > > There are references to "index-time boosting" options for fields,
> > > but I haven't found anything in either the field configuration
> > > section of the wiki or the schema.xml file.
> > >
> > > From what I understand, these are the attributes for fields:
> > >
> > > name, indexed, stored, compressed, multiValued, omitNorms, termVectors
> > >
> > > But none of these solved my problem.
> > >
> > > At the end of the scoring FAQ there is this note:
> > >
> > >   - when should index-time boosts be used
> > >
> > > I'd like to see "how" index-time boosts should be used, please.
> > >
> > > Thank you very much,
> > >
> > > Bye.
> > >
> > > L.M.
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Sending queries to multicore installation

2008-09-15 Thread Henrib

Hi,
If you are sure that you indexed your documents through the intended core,
it might be that your solrconfig.xml does not use the 'dataDir' property you
declared in solr.xml for your two cores.

The shopping & tourism solrconfig.xml should each have a line stating:
<dataDir>${dataDir}</dataDir>

And *not* the default:
<dataDir>${solr.data.dir:./solr/data}</dataDir>
which would make both cores use the same index.

Hope this helps,
Henrib


rogerio.araujo wrote:
> 
> Hi!
> 
> I have a multicore installation with the following configuration:
> 
> <solr persistent="true">
>   <cores adminPath="/admin/cores">
>     <core name="shopping" instanceDir="shopping">
>       <property name="dataDir" value="shopping/data" />
>     </core>
>     <core name="tourism" instanceDir="tourism">
>       <property name="dataDir" value="tourism/data" />
>     </core>
>   </cores>
> </solr>
> 
> Each core uses a different schema. I indexed some docs on the shopping
> core and a few others on the tourism core, but when I send a query "a*" to
> the tourism core I'm getting docs from the shopping core. Is this the
> expected behaviour? Should I define a "core" field on both schemas and use
> this field as a filter, like we have here
> (http://wiki.apache.org/solr/MultipleIndexes#head-9e6bee989c8120974eee9df0944b58a28d489ba2),
> to avoid it?
> 
> -- 
> Regards,
> 
> Rogério (_rogerio_)
> 
> [Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org] [Twitter:
> http://twitter.com/ararog]
> 
> "Faça a diferença! Ajude o seu país a crescer, não retenha conhecimento,
> distribua e aprenda mais."
> (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Sending-queries-to-multicore-installation-tp19486412p19489077.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: What's the bottleneck?

2008-09-15 Thread r.prieto
Hi Jason,

I'd like to know how you solved the problem.
Could you post the solution?

Thanks,

Raúl
-----Original Message-----
From: Jason Rennie [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 11, 2008 21:58
To: solr-user@lucene.apache.org
Subject: Re: What's the bottleneck?

On Thu, Sep 11, 2008 at 1:29 PM, <[EMAIL PROTECTED]> wrote:

> What is your index configuration?


Not sure what you mean.  We're using 1.2, though we've tested with a recent
nightly and didn't see a significant change in performance...


> What is the average size of your returned fields?


Returned fields are relatively small, ~200 characters total per document.
We're requesting the top 10 or so docs.

> How much memory does your system have?


8g.  We give the jvm a 2g (max) heap.  We have another solr running on the
same box also w/ 2g heap.  The Linux kernel caches ~2.5g of disk.


> Do you have long fields that are returned in the queries?


No.  The searched and returned fields are relatively short.  One
searched-over (but not returned) field can get up to a few hundred
characters, but it's safe to assume they're all < 1k.


> Do you have highlighting activated in the request?


Nope.


> Are you using a multi-valued field for filtering?


No, it does not have the multiValue attribute turned on.  The qf field is
just an integer.

Any thoughts/comments are appreciated.

Thanks,

Jason



Searching for Index-Time Boosting in FAQ

2008-09-15 Thread Luca Molteni
Hello, dear Solr Users,

I'm starting to learn Solr and Lucene, since I want to use this technology
in my project, but I ran into some trouble in the "index-time boosting"
section of the documentation. I'm probably missing something, but since I
can't figure it out by myself, I decided to write here.

For example, in the scoring faq:

http://wiki.apache.org/solr/SolrRelevancyFAQ

There are references to "index-time boosting" options for fields, but I
haven't found anything in either the field configuration section of the
wiki or the schema.xml file.

From what I understand, these are the attributes for fields:

name, indexed, stored, compressed, multiValued, omitNorms, termVectors

But none of these solved my problem.

At the end of the scoring FAQ there is this note:

   - when should index-time boosts be used

I'd like to see "how" index-time boosts should be used, please.

Thank you very much,

Bye.

L.M.


Date field mystery

2008-09-15 Thread Kolodziej Christian
Hello everybody,

We have a big problem searching our Solr index and filtering by date. Let me
give you an example: there is a record with date 30.04.2008, 15:32:00. My query
contains "+date:[20080101T12:00:00Z TO 20080915T13:59:00Z]" but the record is
not found. But when I search "+date:[20071231T12:00:00Z TO 20080915T13:59:00Z]"
the record is found.

Has anyone had similar problems? Does anyone have an idea what is going
wrong? Or do you need more detailed information?

Best regards,
Christian
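An editorial aside, worth checking: the dates above lack the dashes of Solr's canonical DateField format (yyyy-MM-ddTHH:mm:ssZ), which may be related to the odd range behaviour. A sketch of a correctly formatted range query, built as a string for inspection only:

```shell
# Solr dates use the full ISO-8601 form, with dashes and a trailing Z.
FROM="2008-01-01T12:00:00Z"
TO="2008-09-15T13:59:00Z"
QUERY="+date:[${FROM} TO ${TO}]"
echo "${QUERY}"
# → +date:[2008-01-01T12:00:00Z TO 2008-09-15T13:59:00Z]
```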


Re: Sending queries to multicore installation

2008-09-15 Thread Rogerio Pereira
It worked! Thanks Henrib!

2008/9/15 Henrib <[EMAIL PROTECTED]>

>
> Hi,
> If you are sure that you did index your documents through the intended
> core,
> it might be that your solrconfig.xml does not use the 'dataDir' property
> you
> declared in solr.xml for your 2 cores.
>
> The shopping & tourims solconfig.xml should have a line stating:
> ${dataDir}
>
> And *not* the default:
> ${solr.data.dir:./solr/data}
> Which will make both cores use the same index.
>
> Hope this helps,
> Henrib
>
>
> rogerio.araujo wrote:
> >
> > Hi!
> >
> > I have a multicore installation with the following configuration:
> >
> > 
> >   
> > 
> > 
> > 
> > 
> > 
> > 
> >   
> > 
> >
> > Each core uses different schemas, I indexed some docs shopping core and a
> > few others on tourism core, when I send a query "a*" to tourism core I'm
> > getting docs from shopping core, this is the expected behaviour? Should I
> > define a "core" field on both schemas and use this field as filter, like
> > we
> > have
> > here<
> http://wiki.apache.org/solr/MultipleIndexes#head-9e6bee989c8120974eee9df0944b58a28d489ba2
> >,
> > to avoid it?
> >
> > --
> > Regards,
> >
> > Rogério (_rogerio_)
> >
> > [Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org]
> [Twitter:
> > http://twitter.com/ararog]
> >
> > "Faça a diferença! Ajude o seu país a crescer, não retenha conhecimento,
> > distribua e aprenda mais."
> > (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Sending-queries-to-multicore-installation-tp19486412p19489077.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,

Rogério (_rogerio_)

[Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org] [Twitter:
http://twitter.com/ararog]

"Faça a diferença! Ajude o seu país a crescer, não retenha conhecimento,
distribua e aprenda mais."
(http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)


Re: Turkish stemming ??? stemming?

2008-09-15 Thread sunnyfr

Hi Grant,

Sorry, I'm new -- can you explain to me how to apply a patch? And what
exactly is the trunk version?

Thanks,
Sunny



Grant Ingersoll-6 wrote:
> 
> Snowball has a Turkish stemmer.  It is available in the trunk version  
> of Solr.
> 
> On Sep 12, 2008, at 11:29 AM, sunnyfr wrote:
> 
>>
>> Hi everybody,
>>
>> Does somebody found a way to manage Turkish's language?
>>
>> Thanks,
>> Sunny
>> -- 
>> View this message in context:
>> http://www.nabble.com/Turkish-stemming-stemming--tp19458041p19458041.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Turkish-stemming-stemming--tp19458041p19493018.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: update solr

2008-09-15 Thread Kashyap, Raghu
Solr 1.3.0 is in the process of being released soon. If you wait for it, you
can get the latest official release.

http://wiki.apache.org/solr/SolrInstall

http://wiki.apache.org/solr/Solr1.3?highlight=(1.3)

-Raghu

-Original Message-
From: sunnyfr [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 12, 2008 4:36 AM
To: solr-user@lucene.apache.org
Subject: update solr


Hi - I am a newbie to Solr and would like to know how to update my Solr
version properly.
I saw a lot of patches everywhere and I don't want to mess anything up.
My environment is Linux.

Thanks a lot,
Sunny
-- 
View this message in context:
http://www.nabble.com/update-solr-tp19452613p19452613.html
Sent from the Solr - User mailing list archive at Nabble.com.



apply patch

2008-09-15 Thread sunnyfr

Hello,

I'm new to Solr / Linux.

I would like to know how to check whether there is a Solr update, and where.
Also, how can I apply a patch? I read a bit everywhere about a trunk folder,
but I don't have one.

How does it work?

Thanks,
Sunny

-- 
View this message in context: 
http://www.nabble.com/apply-patch-tp19493267p19493267.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Help with Dismax query Handler

2008-09-15 Thread Vaijanath N. Rao

Hi Shalin,

This works for me.

--Thanks and Regards
Vaijanath

Shalin Shekhar Mangar wrote:

On Sun, Sep 14, 2008 at 10:08 AM, Vaijanath N. Rao <[EMAIL PROTECTED]>wrote:

  

We have one field called language, i.e. the language of the documents. We
want people to search for their required query terms but limit them to the
selected language.

Instead, the q can be of the form:
q=field1:en +xyz&qt=dismax




It seems to me that the language field is only being used for filtering, so
what you actually need is the fq parameter. Your query should look like:
q=xyz&fq=field1:en&qt=dismax
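That advice as a full request URL (host and core path are assumptions; the script just prints the URL without sending it):

```shell
# fq restricts the result set without affecting the dismax relevance score,
# and filter queries are cached independently of q.
Q="xyz"
FQ="field1:en"
URL="http://localhost:8983/solr/select?qt=dismax&q=${Q}&fq=${FQ}"
echo "${URL}"
# curl "${URL}"   # against a running Solr instance
```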


  




Re: Date field mystery

2008-09-15 Thread Erick Erickson
The guys who really know will be able to provide much better feedback if
you include:
  - your field definitions
  - probably your locale settings

And have you looked at your index with Luke to see what the data actually
looks like for that field in that record? Is it possible that the date is
getting interpreted as month 30, day 4, 2008? (How that would make this
date appear earlier than 2008 I have no idea, but stranger things have
happened...)

Best
Erick

On Mon, Sep 15, 2008 at 8:07 AM, Kolodziej Christian <
[EMAIL PROTECTED]> wrote:

> Hello everybody,
>
> We have a big problem searching our Solr index and filtering by date.
> Let me give you an example: there is a record with date 30.04.2008,
> 15:32:00. My query contains "+date:[20080101T12:00:00Z TO
> 20080915T13:59:00Z]" but the record is not found. But when I search
> "+date:[20071231T12:00:00Z TO 20080915T13:59:00Z]" the record is found.
>
> Has anyone had similar problems? Does anyone have an idea what is going
> wrong? Or do you need more detailed information?
>
> Best regards,
> Christian
>


Some new SOLR features

2008-09-15 Thread Jason Rutherglen
Hello,

There are a few features I would like to see in SOLR going forward and
I am interested in finding out what other folks thought about them to
get a priority list.  I believe there are many features that Google
and FAST have that SOLR and Lucene will want to implement in future
releases.

1. Machine-learning-based suggest feature
https://issues.apache.org/jira/browse/LUCENE-626, which is similar to
Google's suggest implementation.  The fuzzy-based spellchecker is OK, but
it would be better to incorporate user behavior.
2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
and work being planned for IndexWriter
3. Realtime untokenized field updates
https://issues.apache.org/jira/browse/LUCENE-1292
4. BM25 Scoring
5. Integration with an open source SQL database such as H2.  This
would mean under the hood, SOLR would enable storing data in a
relational database to allow for joins and things.  It would need to
be combined with realtime updates.  H2 has Lucene integration but it
is the usual index everything at once, non-incrementally.  The new
system would simply index as a new row in a table is added.  The SOLR
schema could allow for certain fields being stored in an SQL database.
6. SOLR schema allowing for multiple indexes without using the
multicore.  The indexes could be defined like SQL tables in the
schema.xml file.
7. Crowding feature ala GBase
http://code.google.com/apis/base/attrs-queries.html#crowding which is
similar to Field Collapsing.  I am thinking it is advantageous from a
performance perspective to obtain an excess of results and then filter
down the result set, rather than sorting the full result set first.
8. Improved relevance based on user clicks of individual query results
for individual queries.  This can be thought of as similar to what
Digg does.  I'm sure Google does something similar.  It is a feature
that would be of value to almost any SOLR implementation.
9. Integration of LocalSolr into the standard SOLR distribution.
Location is something many sites use these days and is standard in
GBase and most likely other products like FAST.
10. Distributed search and updates using object serialization
(https://issues.apache.org/jira/browse/LUCENE-1336).  This allows span
queries, custom payload queries, custom similarities, and custom
analyzers, without compiling and deploying a new SOLR war file to
individual servers.

Cheers,
Jason


Re: Some new SOLR features

2008-09-15 Thread Ryan McKinley




Here are my gut reactions to this list... in general, most of this  
comes down to "sounds great, if someone did the work I'm all for it"!


Also, no need to post to solr-user AND solr-dev, probably better to  
think of solr-user as a superset of solr-dev.




1. Machine-learning-based suggest feature
https://issues.apache.org/jira/browse/LUCENE-626, which is similar to
Google's suggest implementation.  The fuzzy-based spellchecker is OK, but
it would be better to incorporate user behavior.
2. Realtime updates https://issues.apache.org/jira/browse/LUCENE-1313
and work being planned for IndexWriter
3. Realtime untokenized field updates
https://issues.apache.org/jira/browse/LUCENE-1292


Without knowing the details of these patches, everything sounds great.

In my view, SOLR should offer a nice interface to anything in lucene  
core/contrib




4. BM25 Scoring


Again, no idea, but if it is implemented in Lucene, yes.



5. Integration with an open source SQL database such as H2.  This
would mean under the hood, SOLR would enable storing data in a
relational database to allow for joins and things.  It would need to
be combined with realtime updates.  H2 has Lucene integration but it
is the usual index everything at once, non-incrementally.  The new
system would simply index as a new row in a table is added.  The SOLR
schema could allow for certain fields being stored in an SQL database.


Sounds interesting -- what is the basic problem you are addressing?

(It seems you are pointing to something specific, and describing your  
solution)





6. SOLR schema allowing for multiple indexes without using the
multicore.  The indexes could be defined like SQL tables in the
schema.xml file.


Is this just a configuration issue?  I definitely hope we can make
configuration easier in the future.


As is, a custom handler can look at multiple indexes... why is there a
need to have multiple lucene indexes within a single SolrCore?





7. Crowding feature ala GBase
http://code.google.com/apis/base/attrs-queries.html#crowding which is
similar to Field Collapsing.  I am thinking it is advantageous from a
performance perspective to obtain an excess of results and then filter
down the result set, rather than sorting the full result set first.


Again, sounds great!  I would love to see it.



8. Improved relevance based on user clicks of individual query results
for individual queries.  This can be thought of as similar to what
Digg does.  I'm sure Google does something similar.  It is a feature
that would be of value to almost any SOLR implementation.


Agreed -- if there is a good way to quickly update a field used for  
sorting/scoring, this would happen




9. Integration of LocalSolr into the standard SOLR distribution.
Location is something many sites use these days and is standard in
GBase and most likely other products like FAST.


I'm working on it... it will be a Lucene contrib package and cooked
into the core Solr distribution.





10. Distributed search and updates using object serialization
(https://issues.apache.org/jira/browse/LUCENE-1336).  This allows span
queries, custom payload queries, custom similarities, and custom
analyzers, without compiling and deploying a new SOLR war file to
individual servers.



sounds good (but I have no technical basis to say so)


ryan



RE: apply patch

2008-09-15 Thread Steven A Rowe
Hi Sunny,

This wiki page should answer your questions:

http://wiki.apache.org/solr/HowToContribute

Look under the sections "Getting the source code" and "Working With Patches".

Good luck,
Steve

On 09/15/2008 at 9:45 AM, sunnyfr wrote:
> 
> Hello,
> 
> I'm new in Solr / Linux.
> 
> I would like to know how to check whether there is a Solr update, and
> where. Also, how can I apply a patch? I read a bit everywhere about a
> trunk folder, but I don't have one.
>
> How does it work?
> 
> Thanks,
> Sunny
> 
> -- View this message in context:
> http://www.nabble.com/apply-patch-tp19493267p19493267.html Sent from the
> Solr - User mailing list archive at Nabble.com.
> 
>

 



Re: SolrJ and JSON in Solr -1.3

2008-09-15 Thread Ryan McKinley
I also have trouble understanding why you would care how solrj talks  
to the server...  the javabin option is the fastest available.


If you need to give JSON to a client, can't you just put in a proxy?
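For reference, getting JSON straight from Solr is just a writer-type parameter; a sketch (host and params are assumptions, and the script only prints the URL):

```shell
# wt=json asks Solr's JSON response writer to render the response.
BASE="http://localhost:8983/solr/select"
PARAMS="q=*:*&wt=json&rows=10"
echo "${BASE}?${PARAMS}"
# curl "${BASE}?${PARAMS}"   # returns the response body as JSON
```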


On Sep 15, 2008, at 12:46 AM, Erik Hatcher wrote:

If the client wants JSON, then it seems passing it straight from  
Solr through the application server tier (hypothetical architecture  
here) to the client as JSON is a nice way to go.  If the client can  
talk directly to Solr, then definitely just &wt=json and carry on,  
but more often than not an application server is in the middle.


Curious: SolrJ with javabin format to an app server that converts to  
JSON, pros/cons to the raw response writer?


What are others doing in the Ajaxed client world with Solr?

Erik



On Sep 15, 2008, at 12:01 AM, Jon Baer wrote:

Hmm am I missing something but isn't the real point of SolrJ to be  
able to use the binary (javabin) format to keep it small / tight /  
compressed?  I have had to proxy Solr recently and found just  
throwing a SolrDocumentList as a JSONArray (via json.org libs)  
works pretty well (YMMV).  I was just under the impression that the  
Java to Java bridge was the best way to go ...


It would be nice to have util methods on the SolrDocumentList  
(toJSON(), toXML(), etc) maybe?


- Jon

On Sep 14, 2008, at 11:14 PM, Erik Hatcher wrote:



On Sep 14, 2008, at 2:51 PM, Julio Castillo wrote:

What is the status of JSON support via SolrJ?


Requires a custom ResponseParser.  See SOLR-402 for a couple of  
implementation ideas:




Maybe this code is no longer current to trunk?

I want to be able to specify a parser such as the  
XMLResponseParser on my

SolrServer. What are my options?


Use SolrServer#setParser() for one of the above implementations.

I guess I could get an XML response and then convert it to JSON?  
I rather

not.


Ewww, don't do that.

There is a JIRA entry SOLR-402, but real resolution to it per the  
comments

that follow in the feature request.
https://issues.apache.org/jira/browse/SOLR-402


Did the RawResponseParser work for you?   If so, we can build that  
into Solr trunk - +1.  I shoulda done that a while ago, sorry.   
This actually fits well with SOLR-620, in my nefarious plans to  
build a web framework out of Solr ;)


Erik







RE: apply patch

2008-09-15 Thread Steven A Rowe
(I'm responding on the mailing list to a personal email.  Sunny, please use the 
mailing list, rather than replying to my personal email address.  Note that 
this is a community policy/convention, not just my own preference.)

On 09/15/2008 at 11:04 AM, [EMAIL PROTECTED] wrote:
> Hi thanks a lot for your quick answer,
> Just little question what is exactly solr trunk .. is it the
> root folder with build folder, build.xml file ...client, dist,
> src folder ? is it this place ?

You should read about Subversion, the version control repository that Solr uses 
to track changes to its source files.  Here's a link to the free online book 
for version 1.4:



In particular, Chapter 4 "Branching and Merging" discusses (on the second web 
page of the chapter) what "trunk" means:



Also, there is a link from the HowToContribute web page to the wiki page "Solr
Version Control System", which contains a link to a web site that lets you
browse the contents of the repository using your web browser.  If you go
there, you'll see that one of the top-level directory names is "trunk".

Steve

On 09/15/2008 at 10:58 AM, Steven A Rowe wrote:
> Hi Sunny,
> 
> This wiki page should answer your questions:
>
> http://wiki.apache.org/solr/HowToContribute
>
> Look under the sections "Getting the source code" and
> "Working With Patches".
> 
> Good luck,
> Steve
> 
> On 09/15/2008 at 9:45 AM, sunnyfr wrote:
> > 
> > Hello,
> > 
> > I'm new in Solr / Linux.
> > 
> > I would like to know how to check whether there is a Solr update, and
> > where. Also, how can I apply a patch? I read a bit everywhere about a
> > trunk folder, but I don't have one.
> >
> > How does it work?
> > 
> > Thanks,
> > Sunny

 



Re: No server response code on insert: how do I avoid this at high speed?

2008-09-15 Thread Paleo Tek
Good questions. 


Otis Gospodnetic wrote:

>Perhaps the container logs explain what happened

1)  I can't find anything interesting in the container logs.  To the
best of my knowledge, neither of the containers notices the drop.  Jetty
did show "out of threads" type errors before I tweaked the thread
parameters.  Once it was tuned a bit, I stopped seeing those entries in
the log, but I did not stop getting the errors.



How about just throttling to the point where the failure rate is 0%?  Too slow?



2) Throttling to 0 errors really slows things down.  The last time I ran
stats, performance scaled almost linearly with additional threads until
we reached the approximate number of CPUs in the system.  Anything above
two threads shows progressively more errors if I don't apply any
throttling.  The churn I need to keep up with makes that undesirable.


I'll put together some stats on insert rates, number of threads, and 
error rates and post them here.  It's a classic trade off: tolerating 
poor results that require additional processing in exchange for higher 
performance.  A set of heuristics for this situation might be useful, 
since I'm likely not the only one with an indexing bottleneck.
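A minimal sketch of the track-and-retry trade-off described above (the post_doc stub and the backoff policy are assumptions, not something from this thread): re-post documents whose insert got no response code, backing off exponentially so retries don't add to the overload.

```shell
# Retry a failed insert with exponential backoff.
# post_doc stands in for the real curl POST to Solr's /update handler;
# here it is simulated to fail twice and then succeed.
post_doc() {
  [ "${ATTEMPT}" -ge 2 ]
}
ATTEMPT=0
DELAY=1
until post_doc; do
  ATTEMPT=$((ATTEMPT + 1))
  echo "attempt ${ATTEMPT} failed; backing off ${DELAY}s"
  # sleep "${DELAY}"   # disabled so the sketch runs instantly
  DELAY=$((DELAY * 2))
done
echo "insert succeeded after ${ATTEMPT} retries"
```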


 -Jim

Otis Gospodnetic wrote:

Perhaps the container logs explain what happened?
How about just throttling to the point where the failure rate is 0%?  Too slow?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: Paleo Tek <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, September 12, 2008 11:19:52 AM
Subject: No server response code on insert:  how do I avoid this at high speed?

I have a largish index with a lot of churn, and inserts that come in 
large bursts.  My server is a multiprocessor with plenty of memory, so I 
can multi-thread and stuff in about 1.6 million records per hour, going 
full speed.  I use a dozen or so threads to post curl inserts, and 
monitor the responses.


Using Jetty, there is a ~10% failure rate with no server response code
received.  Switching to Tomcat reduces the error rate to around 2%
(which makes me like Tomcat a lot, even though I'm a dog person...).  I
suspect I'm overrunning the capacity of the servlet container.  Tweaking
parameters in Jetty improved performance, and I can tune Tomcat.  But
then I'll just be overrunning a tuned system, at a slightly faster rate.


My work around is to keep track of which inserts fail, but I suspect 
there's a better approach.  Any suggestions how I can balance maximum 
insert speed with a low error rate?  Thanks!


  -Jim




  




Re: No server response code on insert: how do I avoid this at high speed?

2008-09-15 Thread Yonik Seeley
On Mon, Sep 15, 2008 at 2:17 PM, Paleo Tek <[EMAIL PROTECTED]> wrote:
> 1)  I can't find anything interesting in the container logs.

Is the client timing out the connection?
If Solr were encountering errors, they would be logged.

-Yonik


RE: SolrJ and JSON in Solr -1.3

2008-09-15 Thread Julio Castillo
Jon,
Is the binary (javabin) format implied by selecting the RawResponseParser? I
guess I don't know what the javabin format is.

So you took a SolrDocumentList and converted it into a JSON Array?

Thanks

** julio

-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED] 
Sent: Sunday, September 14, 2008 9:01 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and JSON in Solr -1.3

Hmm am I missing something but isn't the real point of SolrJ to be able to
use the binary (javabin) format to keep it small / tight / compressed?  I
have had to proxy Solr recently and found just throwing a SolrDocumentList
as a JSONArray (via json.org libs) works pretty well (YMMV).  I was just
under the impression that the Java to Java bridge was the best way to go ...

It would be nice to have util methods on the SolrDocumentList (toJSON(),
toXML(), etc) maybe?

- Jon

On Sep 14, 2008, at 11:14 PM, Erik Hatcher wrote:

>
> On Sep 14, 2008, at 2:51 PM, Julio Castillo wrote:
>> What is the status of JSON support via SolrJ?
>
> Requires a custom ResponseParser.  See SOLR-402 for a couple of 
> implementation ideas:
>
>  
>
> Maybe this code is no longer current to trunk?
>
>> I want to be able to specify a parser such as the XMLResponseParser 
>> on my SolrServer. What are my options?
>
> Use SolrServer#setParser() for one of the above implementations.
>
>> I guess I could get an XML response and then convert it to JSON? I 
>> rather not.
>
> Ewww, don't do that.
>
>> There is a JIRA entry SOLR-402, but real resolution to it per the 
>> comments that follow in the feature request.
>> https://issues.apache.org/jira/browse/SOLR-402
>
> Did the RawResponseParser work for you?   If so, we can build that  
> into Solr trunk - +1.  I shoulda done that a while ago, sorry.  This 
> actually fits well with SOLR-620, in my nefarious plans to build a web 
> framework out of Solr ;)
>
>   Erik
>



RE: SolrJ and JSON in Solr -1.3

2008-09-15 Thread Julio Castillo
Erik,
Yes indeed my architecture has a middle tier and was hoping to use a solrj
client interface to perform the handshake between a Solr server and the
browser.

And so I was hoping to get hold of the response stream already in JSON
format and just pass it through without having to convert it. Of course, if
I could get hold of the XML stream, then I could pass it through to the
browser too, I suppose.

** julio
 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Sunday, September 14, 2008 9:46 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and JSON in Solr -1.3

If the client wants JSON, then it seems passing it straight from Solr
through the application server tier (hypothetical architecture here) to the
client as JSON is a nice way to go.  If the client can talk directly to
Solr, then definitely just &wt=json and carry on, but more often then not an
application server is in the middle.

Curious: SolrJ with javabin format to an app server that converts to JSON,
pros/cons to the raw response writer?

What are others doing in the Ajaxed client world with Solr?

Erik



On Sep 15, 2008, at 12:01 AM, Jon Baer wrote:

> Hmm am I missing something but isn't the real point of SolrJ to be 
> able to use the binary (javabin) format to keep it small / tight / 
> compressed?  I have had to proxy Solr recently and found just throwing 
> a SolrDocumentList as a JSONArray (via json.org libs) works pretty 
> well (YMMV).  I was just under the impression that the Java to Java 
> bridge was the best way to go ...
>
> It would be nice to have util methods on the SolrDocumentList 
> (toJSON(), toXML(), etc) maybe?
>
> - Jon
>
> On Sep 14, 2008, at 11:14 PM, Erik Hatcher wrote:
>
>>
>> On Sep 14, 2008, at 2:51 PM, Julio Castillo wrote:
>>> What is the status of JSON support via SolrJ?
>>
>> Requires a custom ResponseParser.  See SOLR-402 for a couple of 
>> implementation ideas:
>>
>> 
>>
>> Maybe this code is no longer current to trunk?
>>
>>> I want to be able to specify a parser such as the XMLResponseParser 
>>> on my SolrServer. What are my options?
>>
>> Use SolrServer#setParser() for one of the above implementations.
>>
>>> I guess I could get an XML response and then convert it to JSON? I 
>>> rather not.
>>
>> Ewww, don't do that.
>>
>>> There is a JIRA entry SOLR-402, but real resolution to it per the 
>>> comments that follow in the feature request.
>>> https://issues.apache.org/jira/browse/SOLR-402
>>
>> Did the RawResponseParser work for you?   If so, we can build that  
>> into Solr trunk - +1.  I shoulda done that a while ago, sorry.   
>> This actually fits well with SOLR-620, in my nefarious plans to build 
>> a web framework out of Solr ;)
>>
>>  Erik
>>



RE: 1.3.0 candidate

2008-09-15 Thread Teruhiko Kurosaka
The release candidate is up again. 

> -Original Message-
> From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 08, 2008 10:34 AM
> To: solr-user@lucene.apache.org
> Subject: Re: 1.3.0 candidate
> 
> This is temporarily removed, as I need to create another.
> 
> On Sep 7, 2008, at 8:45 PM, Grant Ingersoll wrote:
> 
> > I've posted what I hope is the final 1.3.0 candidate at 
> > http://people.apache.org/~gsingers/solr/1.3.0/
> >
> > Please try it out and provide feedback. Note, this is not 
> an official 
> > release.
> >
> > Cheers,
> > Grant
> 
> 
> 


Re: Turkish stemming ??? stemming?

2008-09-15 Thread Grant Ingersoll
The trunk version is just the latest development version, and can be  
obtained via Subversion: svn checkout http://svn.apache.org/repos/asf/lucene/solr/trunk


See also http://wiki.apache.org/solr/HowToContribute

Or, you could just wait for Solr 1.3.0 which will be out this week (I  
promise!).


On Sep 15, 2008, at 9:31 AM, sunnyfr wrote:



Hi Grant,

Sorry, I'm new. Can you explain how to apply a patch, and what exactly is
the trunk version?

Thanks,
Sunny



Grant Ingersoll-6 wrote:


Snowball has a Turkish stemmer.  It is available in the trunk version
of Solr.

On Sep 12, 2008, at 11:29 AM, sunnyfr wrote:



Hi everybody,

Does somebody found a way to manage Turkish's language?

Thanks,
Sunny
--
View this message in context:
http://www.nabble.com/Turkish-stemming-stemming--tp19458041p19458041.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: SolrJ and JSON in Solr -1.3

2008-09-15 Thread Jon Baer
From what I understand you don't have to select a thing; the SolrCore  
would detect SolrJ and do it automatically(?) ...


44. SOLR-486: Binary response format, faster and smaller
than XML and JSON response formats (use wt=javabin).
BinaryResponseParser for utilizing the binary format via SolrJ
and is now the default.
(Noble Paul, yonik)

On Sep 15, 2008, at 2:40 PM, Julio Castillo wrote:


Jon,
Is the binary (javabin) format implied by selecting the  
RawResponseParser? I

guess I don't know what the javabin format is.

So you took a SolrDocumentList and converted it into a JSON Array?

Thanks

** julio

-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 14, 2008 9:01 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and JSON in Solr -1.3

Hmm am I missing something but isn't the real point of SolrJ to be  
able to
use the binary (javabin) format to keep it small / tight /  
compressed?  I
have had to proxy Solr recently and found just throwing a  
SolrDocumentList
as a JSONArray (via json.org libs) works pretty well (YMMV).  I was  
just
under the impression that the Java to Java bridge was the best way  
to go ...


It would be nice to have util methods on the SolrDocumentList  
(toJSON(),

toXML(), etc) maybe?

- Jon

On Sep 14, 2008, at 11:14 PM, Erik Hatcher wrote:



On Sep 14, 2008, at 2:51 PM, Julio Castillo wrote:

What is the status of JSON support via SolrJ?


Requires a custom ResponseParser.  See SOLR-402 for a couple of
implementation ideas:



Maybe this code is no longer current to trunk?


I want to be able to specify a parser such as the XMLResponseParser
on my SolrServer. What are my options?


Use SolrServer#setParser() for one of the above implementations.


I guess I could get an XML response and then convert it to JSON? I
rather not.


Ewww, don't do that.


There is a JIRA entry SOLR-402, but real resolution to it per the
comments that follow in the feature request.
https://issues.apache.org/jira/browse/SOLR-402


Did the RawResponseParser work for you?   If so, we can build that
into Solr trunk - +1.  I shoulda done that a while ago, sorry.  This
actually fits well with SOLR-620, in my nefarious plans to build a  
web

framework out of Solr ;)

Erik







Solr stops listening

2008-09-15 Thread Peter Williams
I am using Solr 1.2.0 with Jetty and I am experiencing some odd failures of
Solr.  Solr seems to just stop listening for new TCP connections.  The Solr
process continues running and the log contains nothing suspicious (to me,
anyway), but curl requests against the server fail with "connection refused"
and `netstat --listening` does not list the Solr port.  Restarting Solr
fixes the problem (curl requests work, etc.).  This happens once or more a
day.

Anyone have any ideas about how to figure out what is going wrong?

Peter Williams


Re: Solr stops listening

2008-09-15 Thread Fuad Efendi


Solr's main servlet catches all Throwables. In case of the very common OOME  
with the standard JVM from Sun, you will get exactly this behaviour.

==
http://www.tokenizer.org/bot.html


Quoting Peter Williams <[EMAIL PROTECTED]>:


I am using Solr 1.2.0 with Jetty and I am experiencing some odd failures of
Solr.  Solr seems to just stop listening for new TCP connections.  The Solr
process continues running and the log contains nothing suspicious (to me,
anyway), but curl requests against the server fail with "connection refused"
and `netstat --listening` does not list the Solr port.  Restarting Solr
fixes the problem (curl requests work, etc.).  This happens once or more a
day.

Anyone have any ideas about how to figure out what is going wrong?

Peter Williams







RE: Adding bias to Distributed search feature?

2008-09-15 Thread Lance Norskog
Thanks!  We made variants of this and a couple of other files.

As to why we have the same document in different shards with different
contents: once you hit a certain index size and ingest rate, it is easiest
to create a series of indexes and leave the older ones alone. In the future,
please consider this as a legitimate use case instead of simply a mistake.

Thanks again,

Lance

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Saturday, September 13, 2008 5:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding bias to Distributed search feature?

On Thu, Sep 11, 2008 at 10:31 PM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Is it possible to add a bias to the ordering in the distributed search 
> feature? That is, if the search finds the same content in two 
> different indexes, it always favors the document from the first index over
the second.

Handling duplicates is not currently done as a feature, but as a check
against a mistake.
It's not currently deterministic... first one returned will win.

Here's the relevant code from QueryComponent:

  String prevShard = uniqueDoc.put(id, srsp.getShard());
  if (prevShard != null) {
    // duplicate detected
    numFound--;

    // For now, just always use the first encountered since we can't currently
    // remove the previous one added to the priority queue.  If we switched
    // to the Java5 PriorityQueue, this would be easier.
    continue;
    // make which duplicate is used deterministic based on shard
    // if (prevShard.compareTo(srsp.shard) >= 0) {
    //   TODO: remove previous from priority queue
    //   continue;
    // }
  }
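The commented-out deterministic rule in that snippet (prefer the lexicographically smaller shard name) can be sketched in isolation. This is an illustrative sketch under that assumption, not QueryComponent code; the `dedup` helper and its id/shard pairs are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class DedupByShard {
    // For each document id, keep the hit from the lexicographically
    // smallest shard name -- a deterministic version of the tie-break
    // that is currently left to arrival order.
    static Map<String, String> dedup(String[][] hits) {   // each hit: {id, shard}
        Map<String, String> winner = new HashMap<>();
        for (String[] hit : hits) {
            String prev = winner.get(hit[0]);
            if (prev == null || hit[1].compareTo(prev) < 0) {
                winner.put(hit[0], hit[1]);   // first hit seen, or a smaller shard name
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        String[][] hits = {{"doc1", "shardB"}, {"doc1", "shardA"}, {"doc2", "shardC"}};
        Map<String, String> w = dedup(hits);
        System.out.println(w.get("doc1") + " " + w.get("doc2")); // shardA shardC
    }
}
```

Applied after merging shard responses, this makes the surviving duplicate independent of which shard happened to answer first.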



Searching for future or "null" dates

2008-09-15 Thread Chris Maxwell

I'm having a lot of trouble getting this query syntax to work correctly. How
can I search for a date that is either in the future OR missing completely
(meaning open-ended)?

I've tried -endDate:[* TO *] OR endDate[NOW TO *] but that doesn't work.
Adding parentheses doesn't help either.

Any help would be appreciated.
-- 
View this message in context: 
http://www.nabble.com/Searching-for-future-or-%22null%22-dates-tp19502167p19502167.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: SolrJ and JSON in Solr -1.3

2008-09-15 Thread Julio Castillo
I guess I'm still confused about how to use the binary response format.
I was looking for examples of SolrJ consumers of the response object, but
didn't find anything.
The only example I see listed in the documentation uses the
XMLResponseParser, as follows (excerpt):

CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setParser(new XMLResponseParser());
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

If I use a Binary format instead, do I still use the same steps to extract
the data?

thanks

** julio

-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 15, 2008 1:11 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and JSON in Solr -1.3

 From what I understand you don't have to select a thing, the SolrCore would
detect SolrJ and do it automatically(?) ...

44. SOLR-486: Binary response format, faster and smaller
 than XML and JSON response formats (use wt=javabin).
 BinaryResponseParser for utilizing the binary format via SolrJ
 and is now the default.
 (Noble Paul, yonik)

On Sep 15, 2008, at 2:40 PM, Julio Castillo wrote:

> Jon,
> Is the binary (javabin) format implied by selecting the 
> RawResponseParser? I guess I don't know what the javabin format is.
>
> So you took a SolrDocumentList and converted it into a JSON Array?
>
> Thanks
>
> ** julio
>
> -Original Message-
> From: Jon Baer [mailto:[EMAIL PROTECTED]
> Sent: Sunday, September 14, 2008 9:01 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrJ and JSON in Solr -1.3
>
> Hmm am I missing something but isn't the real point of SolrJ to be 
> able to use the binary (javabin) format to keep it small / tight / 
> compressed?  I have had to proxy Solr recently and found just throwing 
> a SolrDocumentList as a JSONArray (via json.org libs) works pretty 
> well (YMMV).  I was just under the impression that the Java to Java 
> bridge was the best way to go ...
>
> It would be nice to have util methods on the SolrDocumentList 
> (toJSON(), toXML(), etc) maybe?
>
> - Jon
>
> On Sep 14, 2008, at 11:14 PM, Erik Hatcher wrote:
>
>>
>> On Sep 14, 2008, at 2:51 PM, Julio Castillo wrote:
>>> What is the status of JSON support via SolrJ?
>>
>> Requires a custom ResponseParser.  See SOLR-402 for a couple of 
>> implementation ideas:
>>
>> 
>>
>> Maybe this code is no longer current to trunk?
>>
>>> I want to be able to specify a parser such as the XMLResponseParser 
>>> on my SolrServer. What are my options?
>>
>> Use SolrServer#setParser() for one of the above implementations.
>>
>>> I guess I could get an XML response and then convert it to JSON? I 
>>> rather not.
>>
>> Ewww, don't do that.
>>
>>> There is a JIRA entry SOLR-402, but real resolution to it per the 
>>> comments that follow in the feature request.
>>> https://issues.apache.org/jira/browse/SOLR-402
>>
>> Did the RawResponseParser work for you?   If so, we can build that
>> into Solr trunk - +1.  I shoulda done that a while ago, sorry.  This 
>> actually fits well with SOLR-620, in my nefarious plans to build a 
>> web framework out of Solr ;)
>>
>>  Erik
>>
>



Re: SolrJ and JSON in Solr -1.3

2008-09-15 Thread Ryan McKinley
The SolrJ API does not care how data is passed around; the interface  
you use is identical.


If you create a CommonsHttpSolrServer and don't set the parser, it  
will by default use the javabin parser.


  SolrServer server = new CommonsHttpSolrServer(url);
  SolrQuery query = new SolrQuery();
  query.setQuery("*:*");
  QueryResponse rsp = server.query(query);
  SolrDocumentList docs = rsp.getResults();

Are you connecting to a 1.2 server?  If not, there is no reason to use  
the XMLResponseParser:

http://wiki.apache.org/solr/Solrj#head-12c26b2d7806432c88b26cf66e236e9bd6e91849

ryan


On Sep 15, 2008, at 11:40 PM, Julio Castillo wrote:


I guess I'm still confused on how to use the Binary response format.
I was looking for examples of SolrJ consumers of the response  
object, but

didn't find anything.
The only example I see listed on the documentation is uses the
XMLResponseParser as follows (excerpt):

CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.setParser(new XMLResponseParser());
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();

If I use a Binary format instead, do I still use the same steps to  
extract

the data?

thanks

** julio

-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED]
Sent: Monday, September 15, 2008 1:11 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and JSON in Solr -1.3

From what I understand you don't have to select a thing, the  
SolrCore would

detect SolrJ and do it automatically(?) ...

44. SOLR-486: Binary response format, faster and smaller
than XML and JSON response formats (use wt=javabin).
BinaryResponseParser for utilizing the binary format via SolrJ
and is now the default.
(Noble Paul, yonik)

On Sep 15, 2008, at 2:40 PM, Julio Castillo wrote:


Jon,
Is the binary (javabin) format implied by selecting the
RawResponseParser? I guess I don't know what the javabin format is.

So you took a SolrDocumentList and converted it into a JSON Array?

Thanks

** julio

-Original Message-
From: Jon Baer [mailto:[EMAIL PROTECTED]
Sent: Sunday, September 14, 2008 9:01 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ and JSON in Solr -1.3

Hmm am I missing something but isn't the real point of SolrJ to be
able to use the binary (javabin) format to keep it small / tight /
compressed?  I have had to proxy Solr recently and found just  
throwing

a SolrDocumentList as a JSONArray (via json.org libs) works pretty
well (YMMV).  I was just under the impression that the Java to Java
bridge was the best way to go ...

It would be nice to have util methods on the SolrDocumentList
(toJSON(), toXML(), etc) maybe?

- Jon

On Sep 14, 2008, at 11:14 PM, Erik Hatcher wrote:



On Sep 14, 2008, at 2:51 PM, Julio Castillo wrote:

What is the status of JSON support via SolrJ?


Requires a custom ResponseParser.  See SOLR-402 for a couple of
implementation ideas:



Maybe this code is no longer current to trunk?


I want to be able to specify a parser such as the XMLResponseParser
on my SolrServer. What are my options?


Use SolrServer#setParser() for one of the above implementations.


I guess I could get an XML response and then convert it to JSON? I
rather not.


Ewww, don't do that.


There is a JIRA entry SOLR-402, but real resolution to it per the
comments that follow in the feature request.
https://issues.apache.org/jira/browse/SOLR-402


Did the RawResponseParser work for you?   If so, we can build that
into Solr trunk - +1.  I shoulda done that a while ago, sorry.  This
actually fits well with SOLR-620, in my nefarious plans to build a
web framework out of Solr ;)

Erik









Solr 1.3 and Lucene 2.4 dev

2008-09-15 Thread Lance Norskog
Is it possible to run Solr 1.3 with Lucene 2.3.2, the last official release
of Lucene?  We're running into a problem with our very very large index and
wonder if there is a bug in the development Lucene.
 
Thanks,
 
Lance Norskog


Re: Adding bias to Distributed search feature?

2008-09-15 Thread Andrzej Bialecki

Lance Norskog wrote:

Thanks!  We made variants of this and a couple of other files.

As to why we have the same document in different shards with different
contents: once you hit a certain index size and ingest rate, it is easiest
to create a series of indexes and leave the older ones alone. In the future,
please consider this as a legitimate use case instead of simply a mistake.


You may be interested in implementing something like this:

"Compact Features for Detection of Near-Duplicates in Distributed 
Retrieval", Yaniv Bernstein, Milad Shokouhi, and Justin Zobel


It sounds straightforward, and relieves you of the need to 
de-duplicate your collection.


--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



AW: Date field mystery

2008-09-15 Thread Kolodziej Christian
Hi Eric,

>The guys who really know will be able to provide you much better
>feedback if you include:
>your field definitions

I hope the following fields are enough.

>probably your locale settings.

The standard locale is en_US.UTF-8 and Java doesn't seem to use another locale. 
Solr is running on a CentOS-based server. Let me know if you need other 
settings, too. I don't know what else might be interesting...

>And have you looked with Luke at your index to see what
>the data actually looks like for that field in that record? Is it
>possible that the date is getting interpreted as month
>30, day 4, 2008? (How that would make this date appear
>earlier than 2008 I have no idea, but stranger things have
>happened )

The problem is that the date field isn't stored, for storage reasons (the 
index shouldn't get too big). We only need the id and can rebuild the found 
record later. Or is there a possibility to check how the date is saved? Your 
thoughts may be correct, we also thought about this but couldn't find a 
way to "look in an unstored field".

By the way, does Solr make a difference between "20080915T16:28:00Z" and 
"2008-09-15T16:28:00Z"? I saw and tried both spellings and no error occurred.

Best regards,
Christian


>On Mon, Sep 15, 2008 at 8:07 AM, Kolodziej Christian <
>[EMAIL PROTECTED]> wrote:
>
>> Hello everybody,
>>
>> We have a big problem searching our Solr index and filtering by the
>> date.
>> Let me give you an example: there is a record with date 30.04.2008,
>> 15:32:00. My query contains "+date:[20080101T12:00:00Z TO
>> 20080915T13:59:00Z]" but the record is not found. But when I search
>> "+date:[20071231T12:00:00Z TO 20080915T13:59:00Z]" the record is
>> found.
>>
>> Has anyone had similar problems? Or does anyone have an idea what is
>> going wrong? Or do you need more detailed information?
>>
>> Best regards,
>> Christian
>>