Re: Any filter to map multiple tokens into one ?

2012-10-12 Thread Konrad Lötzsch
You can build shingles and then use the synonym filter. In that case you
will have to think about all the tokens that you don't need after the
shingle filter.
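
A rough sketch of what such a query-side chain could look like, assuming the
stock factories (the tokenizer, shingle sizes and synonyms file name here are
only placeholders, not a tested configuration):

  <!-- Sketch: shingle the single-character tokens, then collapse the
       shingled token back into one token via a synonym rule. The shingled
       form (e.g. "* : *") would need its embedded spaces escaped on the
       left-hand side of the rule in the synonyms file. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="3"
            maxShingleSize="3" outputUnigrams="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="fold-synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>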



On 12.10.2012 01:35, T. Kuro Kurosaka wrote:
I am looking for a way to fold a particular sequence of tokens into 
one token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and 
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single 
input token. "* : * => *:*" seems to be interpreted

as one input token of 5 characters "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens 
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc, 
seems to pass the entire string "*:*" to the query analyzer (I suspect 
a bug.),

and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


rawquerystring: *:*
querystring: *:*
parsedquery: +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)
parsedquery_toString: +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01


Notice that there is a space between * and : in 
DisjunctionMaxQuery((body:"* : *" )


Probably because of this, the hit score is as low as 0.109, while it 
is 1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make 
DisjunctionMaxQuery happy.



Thanks.


T. "Kuro" Kurosaka






[ANNOUNCE] Apache Solr 4.0 released.

2012-10-12 Thread Robert Muir
October 12 2012, Apache Solr™ 4.0 available.
The Lucene PMC is pleased to announce the release of Apache Solr 4.0.

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the
search and navigation features of many of the world's largest internet
sites.

Solr 4.0 is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Solr 4.0 Release Highlights:

The largest set of features goes by the development code-name
"SolrCloud" and involves bringing easy scalability to Solr.  See
http://wiki.apache.org/solr/SolrCloud for more details.
* Distributed indexing designed from the ground up for near real-time
(NRT) and NoSQL features such as realtime-get, optimistic locking, and
durable updates.
* High availability with no single points of failure.
* Apache Zookeeper integration for distributed coordination and
cluster metadata and configuration storage.
* Immunity to split-brain issues due to Zookeeper's Paxos distributed
consensus protocols.
* Updates sent to any node in the cluster and are automatically
forwarded to the correct shard and replicated to multiple nodes for
redundancy.
* Queries sent to any node automatically perform a full distributed
search across the cluster with load balancing and fail-over.
* A collection management API.
* Smart SolrJ client (CloudSolrServer) that knows to send documents
only to the shard leaders

Solr 4.0 includes more NoSQL features for those using Solr as a
primary data store:
* Update durability – A transaction log ensures that even uncommitted
documents are never lost.
* Real-time Get – The ability to quickly retrieve the latest version
of a document, without the need to commit or open a new searcher
* Versioning and Optimistic Locking – combined with real-time get,
this allows read-update-write functionality that ensures no
conflicting changes were made concurrently by other clients.
* Atomic updates - the ability to add, remove, change, and increment
fields of an existing document without having to send in the complete
document again.

Many additional improvements include:
* New spatial field types with polygon support.
* Pivot Faceting – Multi-level or hierarchical faceting where the top
constraints for one field are found for each top constraint of a
different field.
* Pseudo-fields – The ability to alias fields, or to add metadata
along with returned documents, such as function query values and
results of spatial distance calculations.
* A spell checker implementation that can work directly from the main
index instead of creating a sidecar index.
* Pseudo-Join functionality – The ability to select a set of documents
based on their relationship to a second set of documents.
* Function query enhancements including conditional function queries
and relevancy functions.
* New update processors to facilitate modifying documents prior to indexing.
* A brand new web admin interface, including support for SolrCloud and
improved error reporting
* Numerous bug fixes and optimizations.

Noteworthy changes since 4.0-BETA:
* New spatial field types with polygon support.
* Various Admin UI improvements.
* SolrCloud related performance optimizations in writing the
transaction log, PeerSync recovery, Leader election, and ClusterState
caching.
* Numerous bug fixes and optimizations.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases.  It is possible that the mirror you
are using may not have replicated the release yet.  If that is the
case, please try another mirror.  This also goes for Maven access.

Happy searching,

Apache Lucene/Solr Developers


find a way to solr netbeans

2012-10-12 Thread Iwan Hanjoyo
Hi list,

Does anyone know how to integrate Solr with NetBeans?


The reasons I want to have solr in netbeans:
+ to avoid the long classpath configuration in the environment variables
+ avoid complicated steps (especially when starting and restarting the
glassfish server),
+ help with debugging the app.

*=* *It simply integrates all the processes.*

So far, it is OK. I have NetBeans run the app in the browser and view the
admin pages,
but I get an error when I click the search button.

Here is the error message:
HTTP Status 400 - Missing solr core name in path

I found the glassfish' log file reporting:
[#|2012-10-11T23:19:51.468+0700|INFO|glassfish3.1.2|org.apache.solr.handler.component.HttpShardHandlerFactory|..Setting
urlScheme to: http://|#]

This happened since I put the solr/home folder into the NetBeans project
and hardcoded the
solr/home path in the solr.xml file.

This is what I have done to fix "Setting urlScheme to: http://": I added
this configuration in the solrconfig.xml file


1000
5000
http://127.0.0.1:8080/SolrRedo

  

Results:
The GlassFish log file indicates this progress:
[#|2012-10-11T23:19:51.796+0700|INFO|glassfish3.1.2|org.apache.solr.handler.component.HttpShardHandlerFactory|_ThreadID=68;_ThreadName=Thread-2;|Setting
urlScheme to: http://127.0.0.1:8080/SolrRedo|#]

However, the problem still happened (HTTP Status 400 - Missing solr core
name in path).

Can anyone help? Many thanks in advance.

Kind regards,


Hanjoyo


Re: Reloading ExternalFileField blocks Solr

2012-10-12 Thread Mikhail Khludnev
Martin,

I found a slide deck quite relevant to what you are asking about.

http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr


On Tue, Oct 9, 2012 at 7:57 AM, Otis Gospodnetic  wrote:

> Hi Martin,
>
> Perhaps you could make a small change in Solr to add "don't reload EFF
> if it hasn't been modified since it was last opened".  I assume you
> commit pretty often, but don't modify EFF files that often, so this
> could save you some needless loading.  That said, I'd be surprised EFF
> doesn't already do this... I didn't check.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Mon, Oct 8, 2012 at 4:55 AM, Martin Koch  wrote:
> > Hi List
> >
> > We're using Solr-4.0.0-Beta with a 7M document index running on a single
> > host with 16 shards. We'd like to use an ExternalFileField to hold a
> value
> > that changes often. However, we've discovered that the file is apparently
> > re-read by every shard/core on *every commit*; the index is unresponsive
> in
> > this period (around 20s on the host we're running on). This is
> unacceptable
> > for our needs. In the future, we'd like to add other values as
> > ExternalFileFields, and this will make the problem worse.
> >
> > It would be better if the external file were instead read in in the
> > background, updating previously read relevant values for each shard as
> they
> > are read in.
> >
> > I guess a change in the ExternalFileField code would be required to
> achieve
> > this, but I have no experience here, so suggestions are very welcome.
> >
> > Thanks,
> > /Martin Koch - Issuu - Senior Systems Architect.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: add shard to index

2012-10-12 Thread Radim Kolar

On 11.10.2012 1:12, Upayavira wrote:

That is what is being discussed already. The thing is, at present, Solr
requires an even distribution of documents across shards, so you can't
just add another shard, assign it to a hash range, and be done with it.

You can use shard size as part of the scoring mechanism.


Re: SolrJ, optimize, maxSegments

2012-10-12 Thread Erick Erickson
Hmmm, I dug around in the code and found this bit:
   *  Forces merging of all segments that have deleted
   *  documents.  The actual merges to be executed are
   *  determined by the {@link MergePolicy}.  For example,
   *  the default {@link TieredMergePolicy} will only
   *  pick a segment if the percentage of
   *  deleted docs is over 10%.

see IndexWriter.forceMergeDeletes. So perhaps this limit
was never hit?
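
For the record, a minimal SolrJ sketch of asking for that merge explicitly
(assuming SolrJ 3.x; whether any segment is actually rewritten still depends
on the merge policy threshold quoted above, and "server" is an existing
SolrServer instance):

  // Sketch: a commit that requests expungeDeletes from SolrJ.
  UpdateRequest req = new UpdateRequest();
  req.setAction(UpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
  req.setParam("expungeDeletes", "true");
  UpdateResponse rsp = req.process(server);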

Best
Erick

On Thu, Oct 11, 2012 at 6:10 PM, Shawn Heisey  wrote:
> On 10/11/2012 2:02 PM, Shawn Heisey wrote:
>>
>> UpdateResponse ur = server.optimize(true, true, 20);
>>
>> What happens with this if I am already below 20 segments? Will it still
>> expunge all of my (typically several thousand) deleted documents?  I am
>> hoping that what it will do is rebuild any segment that contains deleted
>> documents and leave the other segments alone.
>
>
> I have just tried this on a test system with 11 segments via curl, not
> SolrJ.  I don't expect that it would be any different with SolrJ, though.
>
> curl
> 'http://localhost:8981/solr/s0live/update?optimize=true&maxSegments=20&expungeDeletes=true&waitFlush=true'
>
> It didn't work.  When I changed maxSegments to 10, it did reduce the index
> from 11 segments to 10, but there are still deleted documents in the index
> -- maxDoc > numDocs on the statistics screen.
>
> numDocs : 12782762
> maxDoc : 12788156
>
> I don't think expungeDeletes is actually a valid parameter for optimize, but
> I included it anyway.  I also tried doing a commit with expungeDeletes=true
> and that didn't work either.
>
> Is this a bug?  The server is 3.5.0.  Because I haven't finished getting my
> configuration worked out, I don't have the ability right now to try this on
> 4.0.0.
>
> Thanks,
> Shawn
>


Re: Custom html headers/footers to solr admin console

2012-10-12 Thread Erick Erickson
Well, I'm certainly not all that up on how that all works, I was mostly
trying to make sure you really needed to, and you do

But this capability, though rarely requested, seems harmless, so if
you wanted to create a patch that allows this but doesn't put
anything in the header/footer (or maybe a minimal message), it might
be worth including.

But usually when the presentation is required to follow some rules,
people put _all_ solr access behind a firewall, only allow access from
known IP addresses where the application runs and do any company-
specific presentation manipulations in the application.

But that may not satisfy your requirement...

Best
Erick

On Thu, Oct 11, 2012 at 6:38 PM, Billy Newman  wrote:
> I take that answer as a no ;)
>
> And no admin only page. But you can query from that page. And the data 
> returned could be sensitive. As such our company requires us to flag in a 
> header/footer that the contents of the page could could be sensitive. So even 
> though it will just be for admin access I still need those headers.
>
> Sound like I am gonna have to dive into the HTML and make custom changes.
>
> Thanks for the quick response.
> Billy
>
> Sent from my iPhone
>
> On Oct 11, 2012, at 3:26 PM, Erick Erickson  wrote:
>
>> Uhhmmm, why do you want to do this? The admin screen is pretty
>> much purely intended for developers/in-house use. Mostly I just
>> want to be sure you aren't thinking about letting users, say, see
>> this page. Consider
>> /update?stream.body=<delete><query>*:*</query></delete>
>>
>> Best
>> Erick
>>
>> On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman  wrote:
>>> Hello all,
>>>
>>>
>>> I was just poking around in my solr distribution and I noticed some files:
>>> admin-extra.html
>>> admin-extra.menu-top.html
>>> admin-extra.menu-bottom.html
>>>
>>>
>>> I was really hoping that that was html inserted into the solr admin
>>> page and I could modify the:
>>> admin-extra.menu-top.html
>>> admin-extra.menu-bottom.html
>>>
>>> files to make a header/footer.
>>>
>>> I un-commented out admin-extra.html and can now see that html in the
>>> admin extras section for my core so not exactly what I was looking
>>> for.
>>>
>>> Are the top/bottom html files used and are they really inserted at the
>>> top and bottom of the page?
>>>
>>> Any way to get some headers in the static admin page?  I would usually
>>> just modify the html, but in this case there might already be
>>> something I can use.
>>>
>>> Thanks,
>>> Billy


performance of group.ngroups=true

2012-10-12 Thread Rikke Willer
Hi,

I was wondering if there are any plans to work on this issue: 
https://issues.apache.org/jira/browse/SOLR-2963 ?
And possibly any thoughts on how difficult it will be to resolve?

Thanks,
Rikke



Re: Can I rely on correct handling of interrupted status of threads?

2012-10-12 Thread Robert Krüger
On Tue, Oct 2, 2012 at 11:48 AM, Robert Krüger  wrote:
> Hi,
>
> I'm using Solr 3.6.1 in an application embedded directly, i.e. via
> EmbeddedSolrServer, not over an HTTP connection, which works
> perfectly. Our application uses Thread.interrupt() for canceling
> long-running tasks (e.g. through Future.cancel). A while (and a few
> Solr versions) back a colleague of mine implemented a workaround
> because he said that Solr didn't handle the thread's interrupted
> status correctly, i.e. not setting the interrupted status after having
> caught an InterruptedException or rethrowing it, thus killing the
> information that an interrupt has been requested, which breaks
> libraries relying on that. However, I did not find anything up-to-date
> in mailing list or forum archives on the web. Is that still or was it
> ever the case? What does one have to watch out for when interrupting a
> thread that is doing anything within Solr/Lucene?
>
> Any advice would be appreciated.
>
> Regards,
>
> Robert

Just in case anyone else has the same question: You cannot. Thread
interruption is not handled properly so you should not use Solr in
code that you plan to interrupt via Thread.interrupt. You will get
stuff like IOExceptions so you cannot cleanly tell the difference
between "real" errors and stuff caused by interruption. Reading old
Jira issues, it looks as if this was a conscious decision because of
the amount of work it would cause in API design changes and lots of
checked exceptions the caller would have to handle.
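
One defensive pattern, purely as a sketch (the variable names are
illustrative): keep the Solr/Lucene work on its own executor and cancel
without interrupting, so the interrupt flag never reaches Lucene's I/O code.
A running task isn't stopped this way, but it also isn't corrupted by an
interrupt; the caller simply stops waiting for it.

  ExecutorService solrExecutor = Executors.newSingleThreadExecutor();
  Future<QueryResponse> pending = solrExecutor.submit(new Callable<QueryResponse>() {
      public QueryResponse call() throws Exception {
          // any EmbeddedSolrServer call; "embeddedServer" is assumed to exist
          return embeddedServer.query(new SolrQuery("*:*"));
      }
  });
  // cancel(false): never delivers Thread.interrupt() to the worker thread
  pending.cancel(false);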


Re: add shard to index

2012-10-12 Thread Otis Gospodnetic
Hi,

Can you share more please?  Have you tried this?  How well did it work for you?

Thanks,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 12, 2012 at 7:17 AM, Radim Kolar  wrote:
> On 11.10.2012 1:12, Upayavira wrote:
>>
>> That is what is being discussed already. The thing is, at present, Solr
>> requires an even distribution of documents across shards, so you can't
>> just add another shard, assign it to a hash range, and be done with it.
>
> You can use shard size as part of scoring mechanism.


Re: Search in specific website

2012-10-12 Thread Otis Gospodnetic
Hi Tolga,

You'll get more help on the Nutch mailing list.  I don't know the
schema Nutch uses for Solr off the top of my head, so I can't tell you
if maybe it uses "site" for a field or "host" or "url" or "domain" or
...
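
Whichever field it turns out to be, restricting a search to one site is just
a filter query on that field; assuming the schema has, say, a "host" field,
something like:

  http://localhost:8983/solr/select?q=your+terms&fq=host:www.example.com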

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 12, 2012 at 2:30 AM, Tolga  wrote:
> Hi,
>
> I use Nutch to crawl my website and index to Solr. However, how can I search
> for a piece of content in a specific website? I use multiple URLs
>
> Regards,


Re: [ANNOUNCE] Apache Solr 4.0 released.

2012-10-12 Thread Péter Király
I would like to thank you all who participated in this!

Thank you very much!
Péter


2012/10/12 Robert Muir :
> October 12 2012, Apache Solr™ 4.0 available.
> The Lucene PMC is pleased to announce the release of Apache Solr 4.0.
>
> Solr is the popular, blazing fast, open source NoSQL search platform
> from the Apache Lucene project. Its major features include powerful
> full-text search, hit highlighting, faceted search, dynamic
> clustering, database integration, rich document (e.g., Word, PDF)
> handling, and geospatial search.  Solr is highly scalable, providing
> fault tolerant distributed search and indexing, and powers the
> search and navigation features of many of the world's largest internet
> sites.
>
> Solr 4.0 is available for immediate download at:
>http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
>
> See the CHANGES.txt file included with the release for a full list of details.
>
> Solr 4.0 Release Highlights:
>
> The largest set of features goes by the development code-name
> "SolrCloud" and involves bringing easy scalability to Solr.  See
> http://wiki.apache.org/solr/SolrCloud for more details.
> * Distributed indexing designed from the ground up for near real-time
> (NRT) and NoSQL features such as realtime-get, optimistic locking, and
> durable updates.
> * High availability with no single points of failure.
> * Apache Zookeeper integration for distributed coordination and
> cluster metadata and configuration storage.
> * Immunity to split-brain issues due to Zookeeper's Paxos distributed
> consensus protocols.
> * Updates sent to any node in the cluster and are automatically
> forwarded to the correct shard and replicated to multiple nodes for
> redundancy.
> * Queries sent to any node automatically perform a full distributed
> search across the cluster with load balancing and fail-over.
> * A collection management API.
> * Smart SolrJ client (CloudSolrServer) that knows to send documents
> only to the shard leaders
>
> Solr 4.0 includes more NoSQL features for those using Solr as a
> primary data store:
> * Update durability – A transaction log ensures that even uncommitted
> documents are never lost.
> * Real-time Get – The ability to quickly retrieve the latest version
> of a document, without the need to commit or open a new searcher
> * Versioning and Optimistic Locking – combined with real-time get,
> this allows read-update-write functionality that ensures no
> conflicting changes were made concurrently by other clients.
> * Atomic updates - the ability to add, remove, change, and increment
> fields of an existing document without having to send in the complete
> document again.
>
> Many additional improvements include:
> * New spatial field types with polygon support.
> * Pivot Faceting – Multi-level or hierarchical faceting where the top
> constraints for one field are found for each top constraint of a
> different field.
> * Pseudo-fields – The ability to alias fields, or to add metadata
> along with returned documents, such as function query values and
> results of spatial distance calculations.
> * A spell checker implementation that can work directly from the main
> index instead of creating a sidecar index.
> * Pseudo-Join functionality – The ability to select a set of documents
> based on their relationship to a second set of documents.
> * Function query enhancements including conditional function queries
> and relevancy functions.
> * New update processors to facilitate modifying documents prior to indexing.
> * A brand new web admin interface, including support for SolrCloud and
> improved error reporting
> * Numerous bug fixes and optimizations.
>
> Noteworthy changes since 4.0-BETA:
> * New spatial field types with polygon support.
> * Various Admin UI improvements.
> * SolrCloud related performance optimizations in writing the
> transaction log, PeerSync recovery, Leader election, and ClusterState
> caching.
> * Numerous bug fixes and optimizations.
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring
> network for distributing releases.  It is possible that the mirror you
> are using may not have replicated the release yet.  If that is the
> case, please try another mirror.  This also goes for Maven access.
>
> Happy searching,
>
> Apache Lucene/Solr Developers



-- 
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc


Re: Any filter to map multiple tokens into one ?

2012-10-12 Thread T. Kuro Kurosaka

On 10/11/12 4:47 PM, Jack Krupansky wrote:
The ":" which normally separates a field name from a term (or quoted 
string or parenthesized sub-query) is "parsed" by the query parser 
before analysis gets called, and "*:*" is recognized before analysis 
as well. So, any attempt to recreate "*:*" in analysis will be too 
late to affect query parsing and other pre-analysis processing.
That's why I suspect a bug in Solr. The tokenizer shouldn't play any role 
here, but it is affecting the score calculation. I am seeing evidence 
that "*:*" is being passed to my tokenizer.
I'm trying to find a way to work around this by reconstructing "*:*" in 
the analysis chain.


But, what is it you are really trying to do? What's the real problem? 
(This sounds like a proverbial "XY Problem".)


-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map mutiple tokens into one ?

I am looking for a way to fold a particular sequence of tokens into one
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token. "* : * => *:*" seems to be interpreted
as one input token of 5 characters "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer  (I suspect
a bug.),
and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


rawquerystring: *:*
querystring: *:*
parsedquery: +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)
parsedquery_toString: +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01

Notice that there is a space between * and : in
DisjunctionMaxQuery((body:"* : *" )

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.


Thanks.


T. "Kuro" Kurosaka





Re: PointType doc reindex issue

2012-10-12 Thread Ravi Solr
Thank you very much Hoss, I knew I was doing something stupid. I will
change the dynamic fields to stored="false" and check it out.
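
For reference, in the stock example schema that declaration looks roughly
like this (the tdouble type name is taken from the example schema and may
differ in your setup):

  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>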

Thanks

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 3:02 PM, Chris Hostetter
 wrote:
> : I have a weird problem, Whenever I read the doc from solr and
> : then index the same doc that already exists in the index (aka
> : reindexing) I get the following error. Can somebody tell me what I am
> : doing wrong. I use solr 3.6 and the definition of the field is given
> : below
>
> When you use the LatLonType field type you get "synthetic" *_coordinate"
> fields automatically constructed under the covers from each of your fields
> that use a "latlon" fieldType.  Because you have configured the
> "*_coordinate" fields to be "stored" they are included in the response
> when you request the doc.
>
> this means that unless you explicitly remove those synthetically
> constructed values before "reindexing", they will still be there in
> addition to the new (possibly redundant) synthetic values created while
> indexing.
>
> This is why the "*_coordinate" dynamicField in the solr example schema.xml
> is marked 'stored="false"' so that this field doesn't come back in the
> response -- it's not meant for end users.
>
>
> :  subFieldSuffix="_coordinate"/>
> :  stored="true"/>
> :
> : Exception in thread "main"
> : org.apache.solr.client.solrj.SolrServerException: Server at
> : http://testsolr:8080/solr/mycore returned non ok status:400,
> : message:ERROR: [doc=1182684] multiple values encountered for non
> : multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
> :   at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
> :   at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
> :   at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
> :
> :
> : The data in the index looks as follows
> :
> : 39.017608,-77.375239
> : 
> :  39.017608
> :  39.017608
> : 
> : 
> : -77.375239
> : -77.375239
> : 
> :
> : Thanks
> :
> : Ravi Kiran Bhaskar
> :
>
> -Hoss


Re: Any filter to map multiple tokens into one ?

2012-10-12 Thread Jack Krupansky
Okay, let's back up. First, hold off mixing in your proposed solution until 
after we understand the actual, original problem:


1. What is your field and field type (with analyzer details)?
2. What is your query parser (defType)?
3. What is your query request URL?
4. What is the parsed query (add &debugQuery=true to your query request)? 
(Actually, I think you gave us that)


I just tried the following query with the fresh 4.0 release and it works 
fine:


http://localhost:8983/solr/collection1/select?q=*:*&wt=xml&debugQuery=true&defType=edismax

*:*

The parsed query is:

(+MatchAllDocsQuery(*:*))/no_coord

And this was with the 4.0 example schema, adding *.xml and books.json 
documents.


If you could try your scenario with 4.0 that would be a help. If it's a bug 
in 3.5 that is fixed now... oh well. I mean, feel free to check the revision 
history for edismax since the 3.5 release.


-- Jack Krupansky

-Original Message- 
From: T. Kuro Kurosaka

Sent: Friday, October 12, 2012 11:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

On 10/11/12 4:47 PM, Jack Krupansky wrote:
The ":" which normally separates a field name from a term (or quoted 
string or parenthesized sub-query) is "parsed" by the query parser before 
analysis gets called, and "*:*" is recognized before analysis as well. So, 
any attempt to recreate "*:*" in analysis will be too late to affect query 
parsing and other pre-analysis processing.

That's why I suspect a bug in Solr. The tokenizer shouldn't play any role
here, but it is affecting the score calculation. I am seeing evidence
that "*:*" is being passed to my tokenizer.
I'm trying to find a way to work around this by reconstructing "*:*" in
the analysis chain.


But, what is it you are really trying to do? What's the real problem? 
(This sounds like a proverbial "XY Problem".)


-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map mutiple tokens into one ?

I am looking for a way to fold a particular sequence of tokens into one
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token. "* : * => *:*" seems to be interpreted
as one input token of 5 characters "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer  (I suspect
a bug.),
and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


rawquerystring: *:*
querystring: *:*
parsedquery: +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)
parsedquery_toString: +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01

Notice that there is a space between * and : in
DisjunctionMaxQuery((body:"* : *" )

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.


Thanks.


T. "Kuro" Kurosaka





Re: Shutting down Solr in Cygwin on Wndows

2012-10-12 Thread Jack Krupansky

That’s "used to see several".

-- Jack Krupansky

-Original Message- 
From: Jack Krupansky

Sent: Friday, October 12, 2012 12:54 PM
To: solr-user@lucene.apache.org
Subject: Shutting down Solr in Cygwin on Wndows

I used to several several “shutdown” messages when I hit ^C while running 
Solr in Cygwin on Windows, but now, I instantly get a bash prompt. Is 
orderly shutdown of Solr still occurring? Or is there some other preferred 
way to shutdown Solr?


I recall seeing some dev list traffic related to some unit testing issue 
related to Solr shutdown.


I also recently upgraded Cygwin.

-- Jack Krupansky 



SolrCloud with PHP

2012-10-12 Thread Shaddy Zeineddine

Hello,

I have some questions about the SolrCloud.

Can I take full advantage of the Cloud with the PECL Solr client? It was 
last updated for Solr 3.1 http://pecl.php.net/package/solr


Is Jetty the recommended servlet for the Cloud?

The documentation about configuring, optimizing, and accessing the Solr 
cloud is very sparse. How can I get more information about these topics?


Thanks for any help you can provide!
Shaddy


Shaddy Zeineddine, PHP Developer
szeinedd...@breakmedia.com  |  310.360.4141 Ext. 297  
Explore: http://acumen.breakmedia.com/ - Trends, Insights and ideas about Men


Re: SolrCloud with PHP

2012-10-12 Thread Mark Miller

On 10/12/2012 01:42 PM, Shaddy Zeineddine wrote:

Hello,

I have some questions about the SolrCloud.

Can I take full advantage of the Cloud with the PECL Solr client? It 
was last updated for Solr 3.1 http://pecl.php.net/package/solr


I don't know for sure, I don't know that client. If it's HTTP based, it 
probably still works, but somehow else might no better. The answer 
probably involves some testing.




Is Jetty the recommended servlet for the Cloud?


It's what I would recommend.



The documentation about configuring, optimizing, and accessing the 
Solr cloud is very sparse. How can I get more information about these 
topics?


What would you like to see added to the SolrCloud wiki page?


- Mark



Re: SolrCloud with PHP

2012-10-12 Thread Mark Miller
On 10/12/2012 01:42 PM, Shaddy Zeineddine wrote:
> Hello,
>
> I have some questions about the SolrCloud.
>
> Can I take full advantage of the Cloud with the PECL Solr client? It was last 
> updated for Solr 3.1 http://pecl.php.net/package/solr 

I don't know for sure, I don't know that client. If it's HTTP based, it 
probably still works, but somehow else might no better. The answer probably 
involves some testing.

>
> Is Jetty the recommended servlet for the Cloud? 

It's what I would recommend.

>
> The documentation about configuring, optimizing, and accessing the Solr cloud 
> is very sparse. How can I get more information about these topics? 

What would you like to see added to the SolrCloud wiki page?


- Mark



Solr Cloud and Hadoop

2012-10-12 Thread Rui Vaz
Hello,

Solr Cloud and Hadoop are new to me. And I am figuring out an
architecture to do a
distributed indexing/searching system in a cluster. Integrating them is an
option.

I would like to know if Hadoop + Solr is still a good option to build a
big index in a cluster,
using HDFS and MapReduce, or if the new functionalities in Solr Cloud make
Hadoop unnecessary.

I know I provided little insight about the number of shards, or whether I have more
network throughput
or memory constraints. I want to launch the discussion and see different
points of view.

Thank you very much,
-- 
Rui Vaz


Re: multi-core sharing synonym map

2012-10-12 Thread simon
I definitely haven't tried this ;=) but perhaps you could create your own
XXXSynonymFilterFactory  as a subclass of SynonymFilterFactory,  which
would allow you to share the synonym map across all cores - though I think
there would need to be a nasty global variable to hold a reference to it...
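
The "nasty global" could be as simple as a JVM-wide cache keyed by the
synonyms file name; purely a sketch, with the map held as a plain Object
because the concrete synonym-map type and the factory hook that would call
this differ between Solr versions:

  import java.util.concurrent.Callable;
  import java.util.concurrent.ConcurrentHashMap;

  public final class SharedSynonymCache {
      private static final ConcurrentHashMap<String, Object> CACHE =
              new ConcurrentHashMap<String, Object>();

      // Build the map at most once per synonyms file; later callers reuse it.
      public static Object get(String synonymsFile, Callable<Object> loader) throws Exception {
          Object map = CACHE.get(synonymsFile);
          if (map == null) {
              Object built = loader.call();                      // parse the file once
              Object prev = CACHE.putIfAbsent(synonymsFile, built);
              map = (prev != null) ? prev : built;               // first writer wins
          }
          return map;
      }

      private SharedSynonymCache() {}
  }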

-Simon

On Fri, Oct 12, 2012 at 12:27 PM, Phil Hoy wrote:

> Hi,
>
> We have a multi-core set up with a fairly large synonym file, all cores
> share the same schema.xml and synonym file but when solr loads the cores,
> it loads multiple instances of the synonym map, this is a little wasteful
> of memory and lengthens the start-up time. Is there a way to get all cores
> to share the same map?
>
>
> Phil
>


Re: Solr Cloud and Hadoop

2012-10-12 Thread Otis Gospodnetic
Hello Rui,

If your data to be indexed is in HDFS, using MapReduce to parallelize
indexing is still a good idea.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 12, 2012 at 2:35 PM, Rui Vaz  wrote:
> Hello,
>
> Solr Cloud and Hadoop are new to me. And I am figuring out an
> architecture to do a
> distributed indexing/searching system in a cluster. Integrating them is an
> option.
>
> I would like to know if Hadoop + Solr is still a good option to build the a
> big index in a cluster,
> using HDFS and MapReduce, or if the new functionalities in Solr Cloud make
> Hadoop unnecessary.
>
> I know I provided few insight about the number of shards, or if I have more
> network throughput
> or memory constraints. I want to launch the discussion and see diferent
> points of view.
>
> Thank you very much,
> --
> Rui Vaz


Re: Any filter to map multiple tokens into one ?

2012-10-12 Thread T. Kuro Kurosaka

Jack,
It goes like this:

http://myhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on

and edismax is the default query parser in solrconfig.xml.

There is a field named text_jpn that uses a Tokenizer that we developed 
as a product, which we can't share here.


But I can simulate our situation using NGramTokenizer.
After indexing the Solr sample docs normally, stop the Solr and insert:

positionIncrementGap="100">









Replace the field definition for "name", for example:


In solrconfig.xml, change the default search handler's definition like this:
edismax
name^0.5
(I guess I could just have these in the URL.)

Start Solr and give this URL:

http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

Hopefully you'll see a score of 0.3663672 and
+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5))
in the debug output.

The score calculation should not be done when the query is *:*, which has 
a special meaning, should it?
And even if the score calculation is done, "*:*" shouldn't be fed to 
tokenizers, should it?


On 10/12/12 9:44 AM, Jack Krupansky wrote:
Okay, let's back up. First, hold off mixing in your proposed solution 
until after we understand the actual, original problem:


1. What is your field and field type (with analyzer details)?
2. What is your query parser (defType)?
3. What is your query request URL?
4. What is the parsed query (add &debugQuery=true to your query 
request)? (Actually, I think you gave us that)


I just tried the following query with the fresh 4.0 release and it 
works fine:


http://localhost:8983/solr/collection1/select?q=*:*&wt=xml&debugQuery=true&defType=edismax 



*:*

The parsed query is:

(+MatchAllDocsQuery(*:*))/no_coord

And this was with the 4.0 example schema, adding *.xml and books.json 
documents.


If you could try your scenario with 4.0 that would be a help. If it's 
a bug in 3.5 that is fixed now... oh well. I mean, feel free to check 
the revision history for edismax since the 3.5 release.


-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Friday, October 12, 2012 11:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

On 10/11/12 4:47 PM, Jack Krupansky wrote:
The ":" which normally separates a field name from a term (or quoted 
string or parenthesized sub-query) is "parsed" by the query parser 
before analysis gets called, and "*:*" is recognized before analysis 
as well. So, any attempt to recreate "*:*" in analysis will be too 
late to affect query parsing and other pre-analysis processing.

That's why I suspect a bug in Solr. The tokenizer shouldn't play any role
here, but it is affecting the score calculation. I am seeing evidence
that "*:*" is being passed to my tokenizer.
I'm trying to find a way to work around this by reconstructing "*:*" in
the analysis chain.


But, what is it you are really trying to do? What's the real problem? 
(This sounds like a proverbial "XY Problem".)


-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map mutiple tokens into one ?

I am looking for a way to fold a particular sequence of tokens into one
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token. "* : * => *:*" seems to be interpreted
as one input token of 5 characters "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer  (I suspect
a bug.),
and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


rawquerystring: *:*
querystring: *:*
parsedquery: +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)
parsedquery_toString: +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01

Notice that there is a space between * and : in
DisjunctionMaxQuery((body:"* : *" )

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.


Thanks.


T. "Kuro" Kurosaka







RE: multi-core sharing synonym map

2012-10-12 Thread Phil Hoy
Yes I was thinking the same thing, although I was hoping there was a more 
elegant mechanism exposed by the solr infrastructure code to handle the shared 
map, aside from just using a global that is. 

Phil

-Original Message-
From: simon [mailto:mtnes...@gmail.com] 
Sent: 12 October 2012 19:38
To: solr-user@lucene.apache.org
Subject: Re: multi-core sharing synonym map

I definitely haven't tried this ;=) but perhaps you could create your own 
XXXSynonymFilterFactory  as a subclass of SynonymFilterFactory,  which would 
allow you to share the synonym map across all cores - though I think there 
would need to be a nasty global variable to hold a reference to it...

-Simon

On Fri, Oct 12, 2012 at 12:27 PM, Phil Hoy wrote:

> Hi,
>
> We have a multi-core set up with a fairly large synonym file, all 
> cores share the same schema.xml and synonym file but when solr loads 
> the cores, it loads multiple instances of the synonym map, this is a 
> little wasteful of memory and lengthens the start-up time. Is there a 
> way to get all cores to share the same map?
>
>
> Phil
>




Re: SolrCloud with PHP

2012-10-12 Thread Shaddy Zeineddine

What I'd like to see added to the SolrCloud wiki page:

- The wiki page states that you can send your request to any server, but 
what if that server goes down? Doesn't there need to be an aliased IP 
address pointing to an active server? Or, is there client side support 
like MongoDB replica sets where the client gets a seed list of nodes, 
then gets a complete list, and the client can send requests to alternate 
nodes if one doesn't respond.


- Are all the possible cloud configurations described or implied by the 
examples on the wiki page? Is it possible to have only 1 shard and 1 
replica? Is it easy to then split the single shard into more as you add 
new nodes? How much of organizing the shards/replicas is automated and 
what should be manually configured for optimal performance for my situation?


Those are the areas which are still unclear to me after reading over the 
wiki page. I'm sure I'll have more questions as I move forward to 
integrating the SolrCloud with our searching.


Shaddy

On 10/12/2012 11:14 AM, Mark Miller wrote:

On 10/12/2012 01:42 PM, Shaddy Zeineddine wrote:

Hello,

I have some questions about the SolrCloud.

Can I take full advantage of the Cloud with the PECL Solr client? It
was last updated for Solr 3.1 http://pecl.php.net/package/solr

I don't know for sure, I don't know that client. If it's HTTP based, it
probably still works, but somehow else might no better. The answer
probably involves some testing.


Is Jetty the recommended servlet for the Cloud?

It's what I would recommend.


The documentation about configuring, optimizing, and accessing the
Solr cloud is very sparse. How can I get more information about these
topics?

What would you like to see added to the SolrCloud wiki page?


- Mark





Shaddy Zeineddine, PHP Developer
szeinedd...@breakmedia.com  |  310.360.4141 Ext. 297  
Explore: http://acumen.breakmedia.com/ - Trends, Insights and ideas about Men


Re: SolrJ, optimize, maxSegments

2012-10-12 Thread Shawn Heisey

On 10/12/2012 6:04 AM, Erick Erickson wrote:

Hmmm, I dug around in the code and found this bit:
*  Forces merging of all segments that have deleted
*  documents.  The actual merges to be executed are
*  determined by the {@link MergePolicy}.  For example,
*  the default {@link TieredMergePolicy} will only
*  pick a segment if the percentage of
*  deleted docs is over 10%.

see IndexWriter.forceMergeDeletes. So perhaps this limit
was never hit?


My own digging based on yours turned up this:

https://issues.apache.org/jira/browse/SOLR-2725

This sounds like there is currently no way to change this in the Solr 
config, so it looks like my choices right now are to make a source code 
change and respin the Solr 3.5.0 that I'm using or just continue to use 
optimize with no options until SOLR-2725 gets resolved and I can 
upgrade.  Is that right?


Thanks,
Shawn



Re: Solr Cloud and Hadoop

2012-10-12 Thread Timothy Potter
Hi Rui,

If you're going to shard and/or replicate your index, then be sure to take
a look at CloudSolrServer in the SolrJ client library. CloudSolrServer is
an extension to SolrServer that works with Zookeeper to understand the
shards and replicas in a Solr cluster. Using CloudSolrServer, there is no
single point-of-failure during distributed indexing.

At my company, we use Pig (on top of Hadoop) to "enrich" documents before
they are indexed so we developed a Pig StoreFunc that uses CloudSolrServer
under the covers. We achieve very high throughput rates with this
configuration. Also, you mentioned you are new to Hadoop so definitely take
a look at Pig vs. doing lower-level MapReduce tasks.

Cheers,
Tim

On Fri, Oct 12, 2012 at 1:41 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hello Rui,
>
> If your data to be indexed is in HDFS, using MapReduce to parallelize
> indexing is still a good idea.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Fri, Oct 12, 2012 at 2:35 PM, Rui Vaz  wrote:
> > Hello,
> >
> > Solr Cloud and Hadoop are new to me. And I am figuring out an
> > architecture to do a
> > distributed indexing/searching system in a cluster. Integrating them is
> an
> > option.
> >
> > I would like to know if Hadoop + Solr is still a good option to build
> the a
> > big index in a cluster,
> > using HDFS and MapReduce, or if the new functionalities in Solr Cloud
> make
> > Hadoop unnecessary.
> >
> > I know I provided few insight about the number of shards, or if I have
> more
> > network throughput
> > or memory constraints. I want to launch the discussion and see diferent
> > points of view.
> >
> > Thank you very much,
> > --
> > Rui Vaz
>


Re: SolrCloud with PHP

2012-10-12 Thread Mark Miller

bq. but somehow else might no better.

* But someone else might know better* - brain is a bit scrambled today.

I'll try and address your questions on the wiki.

- Mark

On 10/12/2012 03:32 PM, Shaddy Zeineddine wrote:

What I'd like to see added to the SolrCloud wiki page:

- The wiki page states that you can send your request to any server, 
but what if that server goes down? Doesn't there need to be an aliased 
IP address pointing to an active server? Or, is there client side 
support like MongoDB replica sets where the client gets a seed list of 
nodes, then gets a complete list, and the client can send requests to 
alternate nodes if one doesn't respond.


- Are all the possible cloud configurations described or implied by 
the examples on the wiki page? Is it possible to have only 1 shard and 
1 replica? Is it easy to then split the single shard into more as you 
add new nodes? How much of organizing the shards/replicas is automated 
and what should be manually configured for optimal performance for my 
situation?


Those are the areas which are still unclear to me after reading over 
the wiki page. I'm sure I'll have more questions as I move forward to 
integrating the SolrCloud with our searching.


Shaddy

On 10/12/2012 11:14 AM, Mark Miller wrote:

On 10/12/2012 01:42 PM, Shaddy Zeineddine wrote:

Hello,

I have some questions about the SolrCloud.

Can I take full advantage of the Cloud with the PECL Solr client? It
was last updated for Solr 3.1 http://pecl.php.net/package/solr

I don't know for sure, I don't know that client. If it's HTTP based, it
probably still works, but somehow else might no better. The answer
probably involves some testing.


Is Jetty the recommended servlet for the Cloud?

It's what I would recommend.


The documentation about configuring, optimizing, and accessing the
Solr cloud is very sparse. How can I get more information about these
topics?

What would you like to see added to the SolrCloud wiki page?


- Mark





Shaddy Zeineddine, PHP Developer
szeinedd...@breakmedia.com  |  310.360.4141 Ext. 297  Explore: 
http://acumen.breakmedia.com/ - Trends, Insights and ideas about Men




Re: Solr Cloud and Hadoop

2012-10-12 Thread Jack Krupansky
You may also want take a look at the DataStax Enterprise product which 
combines Cassandra, Solr, and Hadoop.


See:
http://www.datastax.com/products/enterprise

-- Jack Krupansky

-Original Message- 
From: Rui Vaz

Sent: Friday, October 12, 2012 2:35 PM
To: solr-user@lucene.apache.org
Subject: Solr Cloud and Hadoop

Hello,

Solr Cloud and Hadoop are new to me. And I am figuring out an
architecture to do a
distributed indexing/searching system in a cluster. Integrating them is an
option.

I would like to know if Hadoop + Solr is still a good option to build the a
big index in a cluster,
using HDFS and MapReduce, or if the new functionalities in Solr Cloud make
Hadoop unnecessary.

I know I provided few insight about the number of shards, or if I have more
network throughput
or memory constraints. I want to launch the discussion and see diferent
points of view.

Thank you very much,
--
Rui Vaz 



How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?

2012-10-12 Thread jefferyyuan
I have a main Solr server (solr1) which stores indexes of all docs, and want
to implement the following function:
1. First make a full import of my docs updated/created recently (last 1 or 2
weeks) from solr1.
2. Make a delta import at intervals to copy the changes to my docs from solr1 to
solr2 - docs may be deleted, updated, or created during this period.

-- like the functionality SqlEntityProcessor provides for importing data from a DB to
Solr.

http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
SolrEntityProcessor can make a full-import from one Solr to another Solr
based on a query (using the query parameter in the config file), but it seems it
can't do a delta import later: there is no deltaImportQuery or deltaQuery
configuration, as there is in SqlEntityProcessor.

I have a field last_modified which records the timestamp a doc is created
or updated.
Task 1 can be easily implemented: ;

But how can I implement incremental import with SolrEntityProcessor? It seems
SolrEntityProcessor doesn't support "command=delta-import".
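
One workaround I'm considering, sketched here with placeholder names: keep
using an ordinary full-import, but narrow the SolrEntityProcessor query to the
recently changed docs and run it with clean=false so existing docs in solr2
are kept (deletes would still have to be handled separately):

  <entity name="fromSolr1" processor="SolrEntityProcessor"
          url="http://solr1:8983/solr/core1"
          query="last_modified:[NOW-1DAY TO NOW]"
          rows="500"/>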

Thanks for any reply and help :)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-a-part-of-index-from-main-Solr-server-based-on-a-query-to-another-Solr-server-and-then-tp4013479.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any filter to map multiple tokens into one ?

2012-10-12 Thread Jack Krupansky
I don't have a Solr 3.5 to check, but SOLR-3261, which was fixed in Solr 3.6 
may be your culprit.


See:
https://issues.apache.org/jira/browse/SOLR-3261

So, try Solr 3.6 or 3.6.1 or 4.0 to see if your issue goes away.

-- Jack Krupansky

-Original Message- 
From: T. Kuro Kurosaka

Sent: Friday, October 12, 2012 3:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

Jack,
It goes like this:

http://myhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on

and edismax is the default query parser in solrconfig.xml.

There is a field named text_jpn that uses a Tokenizer that we developed
as a product, which we can't share here.

But I can simulate our situation using NGramTokenizer.
After indexing the Solr sample docs normally, stop the Solr and insert:










Replace the field definition for "name", for example:


In solrconfig.xml, change the default search handler's definition like this:
edismax
name^0.5
(I guess I could just have these in the URL.)

Start Solr and give this URL:

http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

Hopefully you'll see a score of 0.3663672 and
+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5))
in the debug output.
in the debug output.

The score calculation should not be done when the query is *:* which has
the special meaning, should it ?
And even if the score calculation is done, "*:*" shouldn't be fed to
Tokenizers, should it?

On 10/12/12 9:44 AM, Jack Krupansky wrote:

Okay, let's back up. First, hold off mixing in your proposed solution
until after we understand the actual, original problem:

1. What is your field and field type (with analyzer details)?
2. What is your query parser (defType)?
3. What is your query request URL?
4. What is the parsed query (add &debugQuery=true to your query
request)? (Actually, I think you gave us that)

I just tried the following query with the fresh 4.0 release and it
works fine:

http://localhost:8983/solr/collection1/select?q=*:*&wt=xml&debugQuery=true&defType=edismax


*:*

The parsed query is:

(+MatchAllDocsQuery(*:*))/no_coord

And this was with the 4.0 example schema, adding *.xml and books.json
documents.

If you could try your scenario with 4.0 that would be a help. If it's
a bug in 3.5 that is fixed now... oh well. I mean, feel free to check
the revision history for edismax since the 3.5 release.

-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Friday, October 12, 2012 11:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Any filter to map mutiple tokens into one ?

On 10/11/12 4:47 PM, Jack Krupansky wrote:

The ":" which normally separates a field name from a term (or quoted
string or parenthesized sub-query) is "parsed" by the query parser
before analysis gets called, and "*:*" is recognized before analysis
as well. So, any attempt to recreate "*:*" in analysis will be too
late to affect query parsing and other pre-analysis processing.

That's why I suspect a bug in Solr. The tokenizer shouldn't play any role
here, but it is affecting the score calculation. I am seeing evidence
that "*:*" is being passed to my tokenizer.
I'm trying to find a way to work around this by reconstructing "*:*" in
the analysis chain.


But, what is it you are really trying to do? What's the real problem?
(This sounds like a proverbial "XY Problem".)

-- Jack Krupansky

-Original Message- From: T. Kuro Kurosaka
Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map mutiple tokens into one ?

I am looking for a way to fold a particular sequence of tokens into one
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token. "* : * => *:*" seems to be interpreted
as one input token of 5 characters "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer  (I suspect
a bug.),
and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


rawquerystring: *:*
querystring: *:*
parsedquery: +MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)
parsedquery_toString: +*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01

Notice that there is a space between * and : in
DisjunctionMaxQuery((body:"* : *" )

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.


Thanks.


T. "Kuro" Kurosaka







RE: anyone have any clues about this exception

2012-10-12 Thread Petersen, Robert
Hi Erick,

After reading the discussion you guys were having about renaming optimize to 
forceMerge I realized I was guilty of over-optimizing like you guys were 
worried about!  We have about 15 million docs indexed now and we spin about 
50-300 adds per second 24/7, most of them being updates to existing documents 
whose data has changed since the last time it was indexed (which we keep track 
of in a DB table).  There are some new documents being added in the mix and 
some deletes as well.

I understand now how the merge policy caps the number of segments.  I used to 
think they would grow unbounded and thus optimize was required.  How does the 
large number of updates of existing documents affect the need to optimize, by 
causing a large number of deletes with a 're-add'?  And so I suppose that means 
the index size tends to grow with the deleted docs hanging around in the 
background, as it were.
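
For what it's worth, one way I can watch that growth (host and core name are
placeholders) is the Luke request handler, whose response includes both
counts:

  http://master:8983/solr/core0/admin/luke?numTerms=0

The gap between maxDoc and numDocs there is the number of deleted documents
still taking up space.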

So in our situation, what frequency of optimize would you recommend?  We're on 
3.6.1 btw...

Thanks,
Robi

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, October 11, 2012 5:29 AM
To: solr-user@lucene.apache.org
Subject: Re: anyone have any clues about this exception

Well, you'll actually be able to optimize, it's just called forceMerge.

But the point is that optimize seems like something that _of course_ you want 
to do, when in reality it's not something you usually should do at all. 
Optimize does two things:
1> merges all the segments into one (usually)
2> removes all of the info associated with deleted documents.

Of the two, point <2> is the one that really counts and that's done whenever 
segment merging is done anyway. So unless you have a very large number of 
deletes (or updates of the same document), optimize buys you very little. You 
can tell this by the difference between numDocs and maxDoc in the admin page.
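For instance, the same two numbers are also reported by the Luke request handler,
assuming it is mapped at its default path (host, port and path are placeholders):
  http://localhost:8983/solr/admin/luke?numTerms=0
which returns numDocs and maxDoc for the index.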

So what happens if you just don't bother to optimize? Take a look at merge 
policy to help control how merging happens perhaps as an alternative.

Best
Erick

On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert  wrote:
> You could be right.  Going back in the logs, I noticed it used to happen less 
> frequently and always towards the end of an optimize operation.  It is 
> probably my indexer timing out waiting for updates to occur during optimizes. 
>  The errors grew recently due to my upping the indexer threadcount to 22 
> threads, so there's a lot more timeouts occurring now.  Also our index has 
> grown to double the old size so the optimize operation has started taking a 
> lot longer, also contributing to what I'm seeing.   I have just changed my 
> optimize frequency from three times a day to one time a day after reading the 
> following:
>
> Here they are talking about completely deprecating the optimize 
> command in the next version of solr... 
> https://issues.apache.org/jira/browse/SOLR-3141
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Wednesday, October 10, 2012 11:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: anyone have any clues about this exception
>
> Something timed out, the other end closed the connection. This end tried to 
> write to closed pipe and died, something tried to catch that exception and 
> write its own and died even worse? Just making it up really, but sounds good 
> (plus a 3-year Java tech-support hunch).
>
> If it happens often enough, see if you can run WireShark on that machine's 
> network interface and catch the whole network conversation in action. Often, 
> there is enough clues there by looking at tcp packets and/or stuff 
> transmitted. WireShark is a power-tool, so takes a little while the first 
> time, but the learning will pay for itself over and over again.
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all 
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert  wrote:
>> Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
>> instance contains lots of these exceptions but solr itself seems to be doing 
>> fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
>> servers btw, just the master where we do our indexing only.
>>
>>
>>
>> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve
>> invoke
>> SEVERE: Servlet.service() for servlet default threw exception 
>> java.lang.IllegalStateException
>> at 
>> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
>> 

Re: Solr Cloud and Hadoop

2012-10-12 Thread Rui Vaz
Thank you very much for your replies,

Yes Otis, one possibility is to copy my data to HDFS and then apply a Map
function to create the intermediate indexes across the cluster using the
Solr Java library in HDFS.

I have some doubts concerning this solution:

  1 - Do the intermediate indexes that are created really need to be merged?
      I mean, is there any mechanism in Solr Cloud to easily combine those
      intermediate indexes and serve them as if they were a "whole index",
      in a distributed fashion?

  2 - Can I serve these different indexes with Solr or Solr Cloud directly
      in HDFS? Google says no :), so maybe I need to copy the indexes to a
      local file system and point Solr at it.

Timothy, thank you for your tips. I am looking at Pig.
CloudSolrServer seems an interesting piece of the architecture, especially
for discovering Solr endpoints and then possibly replicating my index, but I
was wondering if I need to implement that or if Solr will take care of it
for me. Maybe I just didn't get your tip due to my newbie knowledge of Solr.
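
As an illustration, a minimal SolrJ 4.0 sketch of what CloudSolrServer gives you
(the ZooKeeper address, collection and field names are placeholders); documents
added this way are routed to the correct shard leader and replicated without any
extra code on the client side:

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class CloudIndexingSketch {
    public static void main(String[] args) throws Exception {
      // Placeholder ZooKeeper ensemble address and collection name.
      CloudSolrServer server = new CloudSolrServer("zkhost1:2181,zkhost2:2181");
      server.setDefaultCollection("collection1");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("title", "hello solrcloud");

      server.add(doc);     // forwarded to the correct shard leader automatically
      server.commit();
      server.shutdown();
    }
  }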

I am sorry if I am confusing some concepts or not being very precise in my
words.

Jack, thank you for sharing the DataStax solution, I will definitely take a look
since it's free :). But anyway, the objective
of this project is for me to learn Solr and Hadoop. :)

Thank you,
Rui Vaz


Re: multi-core sharing synonym map

2012-10-12 Thread Erick Erickson
There are a lot of sub-tasks having to do with lots of cores here:
http://wiki.apache.org/solr/LotsOfCores
I don't see a reference to this particular issue, but it sure seems
like this could be a reasonable thing to add to the list. By extension,
all of the files that can be specified (stopwords, query elevation,
to name two) could reasonably be shared.

Erick

On Fri, Oct 12, 2012 at 3:21 PM, Phil Hoy  wrote:
> Yes I was thinking the same thing, although I was hoping there was a more 
> elegant mechanism exposed by the solr infrastructure code to handle the 
> shared map, aside from just using a global that is.
>
> Phil
>
> -Original Message-
> From: simon [mailto:mtnes...@gmail.com]
> Sent: 12 October 2012 19:38
> To: solr-user@lucene.apache.org
> Subject: Re: multi-core sharing synonym map
>
> I definitely haven't tried this ;=) but perhaps you could create your own 
> XXXSynonymFilterFactory  as a subclass of SynonymFilterFactory,  which would 
> allow you to share the synonym map across all cores - though I think there 
> would need to be a nasty global variable to hold a reference to it...
>
> -Simon
>
> On Fri, Oct 12, 2012 at 12:27 PM, Phil Hoy wrote:
>
>> Hi,
>>
>> We have a multi-core set up with a fairly large synonym file, all
>> cores share the same schema.xml and synonym file but when solr loads
>> the cores, it loads multiple instances of the synonym map, this is a
>> little wasteful of memory and lengthens the start-up time. Is there a
>> way to get all cores to share the same map?
>>
>>
>> Phil
>>
>
>
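
A rough sketch of the subclass-plus-static-cache idea simon describes above.
The two hook methods are hypothetical stand-ins, since how the parsed
SynonymMap is read out of (or pushed back into) the stock factory differs
between Solr versions, so treat this as a pattern rather than drop-in code:

  import java.util.concurrent.ConcurrentHashMap;

  import org.apache.solr.analysis.SynonymFilterFactory;
  import org.apache.solr.common.ResourceLoader;

  // Sketch only: share one parsed synonym map across all cores that reference
  // the same synonyms file.  extractParsedMap()/applyParsedMap() are hypothetical.
  public class SharedSynonymFilterFactory extends SynonymFilterFactory {
    private static final ConcurrentHashMap<String, Object> CACHE =
        new ConcurrentHashMap<String, Object>();

    @Override
    public void inform(ResourceLoader loader) {
      String key = getArgs().get("synonyms");   // synonyms file name from schema.xml
      Object shared = CACHE.get(key);
      if (shared == null) {
        super.inform(loader);                   // parse the file once
        CACHE.putIfAbsent(key, extractParsedMap());
      } else {
        applyParsedMap(shared);                 // reuse the already-parsed map
      }
    }

    // Hypothetical hooks: a real implementation would access whatever field
    // the factory of your Solr version stores its parsed SynonymMap in.
    private Object extractParsedMap() { throw new UnsupportedOperationException("sketch"); }
    private void applyParsedMap(Object map) { throw new UnsupportedOperationException("sketch"); }
  }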


Re: SolrJ, optimize, maxSegments

2012-10-12 Thread Erick Erickson
Sounds reasonable although I admit I haven't looked deeply.

Erick

On Fri, Oct 12, 2012 at 3:41 PM, Shawn Heisey  wrote:
> On 10/12/2012 6:04 AM, Erick Erickson wrote:
>>
>> Hmmm, I dug around in the code and found this bit:
>> *  Forces merging of all segments that have deleted
>> *  documents.  The actual merges to be executed are
>> *  determined by the {@link MergePolicy}.  For example,
>> *  the default {@link TieredMergePolicy} will only
>> *  pick a segment if the percentage of
>> *  deleted docs is over 10%.
>>
>> see IndexWriter.forceMergeDeletes. So perhaps this limit
>> was never hit?
>
>
> My own digging based on yours turned up this:
>
> https://issues.apache.org/jira/browse/SOLR-2725
>
> This sounds like there is currently no way to change this in the Solr
> config, so it looks like my choices right now are to make a source code
> change and respin the Solr 3.5.0 that I'm using or just continue to use
> optimize with no options until SOLR-2725 gets resolved and I can upgrade.
> Is that right?
>
> Thanks,
> Shawn
>
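
For reference, a sketch of the SolrJ call the subject line refers to (3.x API;
the URL and segment count are placeholders):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class OptimizeSketch {
    public static void main(String[] args) throws Exception {
      // Placeholder URL; arguments are waitFlush, waitSearcher, maxSegments.
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      server.optimize(true, true, 5);
    }
  }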


Re: How to import a part of index from main Solr server(based on a query) to another Solr server and then do incremental import at intervals later(the updated index)?

2012-10-12 Thread Erick Erickson
Hmmm, not quite what you asked, but would it work to just
replicate from Solr1 to Solr2 when you want to synch? You
can trigger this via http, see:
http://wiki.apache.org/solr/SolrReplication#HTTP_API
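For example, the pull can be triggered on the target with something like
(host, port and core are placeholders):
  http://solr2:8983/solr/replication?command=fetchindex&masterUrl=http://solr1:8983/solr/replication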

If you're talking about individual documents, then the answer
is no. There's nothing as far as I know that moves just a single
document. If you've stored all the fields, you could fetch the doc
from solr1 and index it to solr2, but that's kinda kludgy...

Best
Erick

On Fri, Oct 12, 2012 at 4:44 PM, jefferyyuan  wrote:
> I have a main Solr server (solr1) which stores indexes of all docs, and want
> to implement the following function:
> 1. First make a full import of my docs updated/created recently (last 1 or 2
> weeks) from solr1.
> 2. Make delta imports at intervals to copy the changes to my docs from solr1 to
> solr2 - docs may be deleted, updated, or created during this period.
>
> -- like the functionality supported by SqlEntityProcessor to import data from a DB to
> Solr.
>
> http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
> SolrEntityProcessor can make a full-import from one Solr to another Solr
> based on a query (using the query parameter in the config file), but it seems it can't do
> a delta import later: there is no deltaImportQuery or deltaQuery configuration, as
> is supported in SqlEntityProcessor.
>
> I have a field last_modified which records the timestamp a doc is created
> or updated.
> Task 1 can be easily implemented: <entity processor="SolrEntityProcessor" query="+from:jeffery
> +last_modified:[${dataimporter.request.start_time} TO NOW]"
> url="mainsolr:8080/solr/"/>
>
> But how can I implement incremental import with SolrEntityProcessor? It seems
> SolrEntityProcessor doesn't support "command=delta-import".
>
> Thanks for any reply and help :)
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-import-a-part-of-index-from-main-Solr-server-based-on-a-query-to-another-Solr-server-and-then-tp4013479.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using

2012-10-12 Thread Erick Erickson
I've been building 4.x regularly. Have you tried "ant clean-jars"?

Best
Erick

On Fri, Oct 12, 2012 at 6:32 PM, P Williams
 wrote:
> Hi,
>
> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/> with Apache
> Ivy in their project?
>
> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
> [ivy:resolve] :: problems summary ::
> [ivy:resolve]  WARNINGS
> [ivy:resolve]   [FAILED ]
> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>  (0ms)
> [ivy:resolve]    shared: tried
> [ivy:resolve]
> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
> [ivy:resolve]    public: tried
> [ivy:resolve]
> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
> [ivy:resolve]   ::
> [ivy:resolve]   ::  FAILED DOWNLOADS::
> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
> [ivy:resolve]   ::
> [ivy:resolve]   ::
> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
> [ivy:resolve]   ::
> [ivy:resolve]
> [ivy:resolve]
> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>
> Can anybody point me to the source of this error or a workaround?
>
> Thanks,
> Tricia
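
One workaround often suggested for this particular failure -- Ivy looks for a
".orbit" file because of the artifact's packaging type, which the Maven repo
does not serve -- is to pin the artifact extension explicitly in ivy.xml; a
sketch, not verified against this project:

  <dependency org="org.eclipse.jetty.orbit" name="javax.servlet"
              rev="3.0.0.v201112011016" transitive="false">
    <artifact name="javax.servlet" type="orbit" ext="jar"/>
  </dependency>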


Re: anyone have any clues about this exception

2012-10-12 Thread Erick Erickson
Right. If I've multiplied right, you're essentially replacing your entire index
every day given the rate you're adding documents.

Have a look at MergePolicy, here are a couple of references:
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
https://lucene.apache.org/core/old_versioned_docs/versions/3_2_0/api/core/org/apache/lucene/index/MergePolicy.html
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
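
For illustration, the kind of knob those references cover lives in the index
settings of solrconfig.xml (3.x style; the values shown are just the
TieredMergePolicy defaults, not a recommendation):

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>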

But unless you're having problems with performance, I'd consider just
optimizing once a day at off-peak hours.

FWIW,
Erick

On Fri, Oct 12, 2012 at 5:35 PM, Petersen, Robert  wrote:
> Hi Erick,
>
> After reading the discussion you guys were having about renaming optimize to 
> forceMerge I realized I was guilty of over-optimizing like you guys were 
> worried about!  We have about 15 million docs indexed now and we spin about 
> 50-300 adds per second 24/7, most of them being updates to existing documents 
> whose data has changed since the last time it was indexed (which we keep 
> track of in a DB table).  There are some new documents being added in the mix 
> and some deletes as well too.
>
> I understand now how the merge policy caps the number of segments.  I used to 
> think they would grow unbounded and thus optimize was required.  How does the 
> large number of updates of existing documents affect the need to optimize, by 
> causing a large number of deletes with a 're-add'?  And so I suppose that 
> means the index size tends to grow with the deleted docs hanging around in 
> the background, as it were.
>
> So in our situation, what frequency of optimize would you recommend?  We're 
> on 3.6.1 btw...
>
> Thanks,
> Robi
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, October 11, 2012 5:29 AM
> To: solr-user@lucene.apache.org
> Subject: Re: anyone have any clues about this exception
>
> Well, you'll actually be able to optimize, it's just called forceMerge.
>
> But the point is that optimize seems like something that _of course_ you want 
> to do, when in reality it's not something you usually should do at all. 
> Optimize does two things:
> 1> merges all the segments into one (usually)
> 2> removes all of the info associated with deleted documents.
>
> Of the two, point <2> is the one that really counts and that's done whenever 
> segment merging is done anyway. So unless you have a very large number of 
> deletes (or updates of the same document), optimize buys you very little. You 
> can tell this by the difference between numDocs and maxDoc in the admin page.
>
> So what happens if you just don't bother to optimize? Take a look at merge 
> policy to help control how merging happens perhaps as an alternative.
>
> Best
> Erick
>
> On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert  wrote:
>> You could be right.  Going back in the logs, I noticed it used to happen 
>> less frequently and always towards the end of an optimize operation.  It is 
>> probably my indexer timing out waiting for updates to occur during 
>> optimizes.  The errors grew recently due to my upping the indexer 
>> threadcount to 22 threads, so there's a lot more timeouts occurring now.  
>> Also our index has grown to double the old size so the optimize operation 
>> has started taking a lot longer, also contributing to what I'm seeing.   I 
>> have just changed my optimize frequency from three times a day to one time a 
>> day after reading the following:
>>
>> Here they are talking about completely deprecating the optimize
>> command in the next version of solr...
>> https://issues.apache.org/jira/browse/SOLR-3141
>>
>>
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Wednesday, October 10, 2012 11:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: anyone have any clues about this exception
>>
>> Something timed out, the other end closed the connection. This end tried to 
>> write to closed pipe and died, something tried to catch that exception and 
>> write its own and died even worse? Just making it up really, but sounds good 
>> (plus a 3-year Java tech-support hunch).
>>
>> If it happens often enough, see if you can run WireShark on that machine's 
>> network interface and catch the whole network conversation in action. Often, 
>> there is enough clues there by looking at tcp packets and/or stuff 
>> transmitted. WireShark is a power-tool, so takes a little while the first 
>> time, but the learning will pay for itself over and over again.
>>
>> Regards,
>>Alex.
>>
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert  wrote:
>>> Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (m

Re: anyone have any clues about this exception

2012-10-12 Thread Walter Underwood
If you are updating all the time, don't forceMerge at all, unless you want to 
put the overhead of big merges at a known time. Otherwise, leave it alone.

wunder

On Oct 12, 2012, at 3:56 PM, Erick Erickson wrote:

> Right. If I've multiplied right, you're essentially replacing your entire 
> index
> every day given the rate you're adding documents.
> 
> Have a look at MergePolicy, here are a couple of references:
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
> https://lucene.apache.org/core/old_versioned_docs/versions/3_2_0/api/core/org/apache/lucene/index/MergePolicy.html
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> 
> But unless you're having problems with performance, I'd consider just
> optimizing once a day at off-peak hours.
> 
> FWIW,
> Erick
> 
> On Fri, Oct 12, 2012 at 5:35 PM, Petersen, Robert  wrote:
>> Hi Erick,
>> 
>> After reading the discussion you guys were having about renaming optimize to 
>> forceMerge I realized I was guilty of over-optimizing like you guys were 
>> worried about!  We have about 15 million docs indexed now and we spin about 
>> 50-300 adds per second 24/7, most of them being updates to existing 
>> documents whose data has changed since the last time it was indexed (which 
>> we keep track of in a DB table).  There are some new documents being added 
>> in the mix and some deletes as well too.
>> 
>> I understand now how the merge policy caps the number of segments.  I used 
>> to think they would grow unbounded and thus optimize was required.  How does 
>> the large number of updates of existing documents affect the need to 
>> optimize, by causing a large number of deletes with a 're-add'?  And so I 
>> suppose that means the index size tends to grow with the deleted docs 
>> hanging around in the background, as it were.
>> 
>> So in our situation, what frequency of optimize would you recommend?  We're 
>> on 3.6.1 btw...
>> 
>> Thanks,
>> Robi
>> 
>> -Original Message-
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Thursday, October 11, 2012 5:29 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: anyone have any clues about this exception
>> 
>> Well, you'll actually be able to optimize, it's just called forceMerge.
>> 
>> But the point is that optimize seems like something that _of course_ you 
>> want to do, when in reality it's not something you usually should do at all. 
>> Optimize does two things:
>> 1> merges all the segments into one (usually)
>> 2> removes all of the info associated with deleted documents.
>> 
>> Of the two, point <2> is the one that really counts and that's done whenever 
>> segment merging is done anyway. So unless you have a very large number of 
>> deletes (or updates of the same document), optimize buys you very little. 
>> You can tell this by the difference between numDocs and maxDoc in the admin 
>> page.
>> 
>> So what happens if you just don't bother to optimize? Take a look at merge 
>> policy to help control how merging happens perhaps as an alternative.
>> 
>> Best
>> Erick
>> 
>> On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert  wrote:
>>> You could be right.  Going back in the logs, I noticed it used to happen 
>>> less frequently and always towards the end of an optimize operation.  It is 
>>> probably my indexer timing out waiting for updates to occur during 
>>> optimizes.  The errors grew recently due to my upping the indexer 
>>> threadcount to 22 threads, so there's a lot more timeouts occurring now.  
>>> Also our index has grown to double the old size so the optimize operation 
>>> has started taking a lot longer, also contributing to what I'm seeing.   I 
>>> have just changed my optimize frequency from three times a day to one time 
>>> a day after reading the following:
>>> 
>>> Here they are talking about completely deprecating the optimize
>>> command in the next version of solr...
>>> https://issues.apache.org/jira/browse/SOLR-3141
>>> 
>>> 
>>> -Original Message-
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Wednesday, October 10, 2012 11:10 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: anyone have any clues about this exception
>>> 
>>> Something timed out, the other end closed the connection. This end tried to 
>>> write to closed pipe and died, something tried to catch that exception and 
>>> write its own and died even worse? Just making it up really, but sounds 
>>> good (plus a 3-year Java tech-support hunch).
>>> 
>>> If it happens often enough, see if you can run WireShark on that machine's 
>>> network interface and catch the whole network conversation in action. 
>>> Often, there is enough clues there by looking at tcp packets and/or stuff 
>>> transmitted. WireShark is a power-tool, so takes a little while the first 
>>> time, but the learning will pay for itself over and over again.
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> Personal blog: http://blog.outerthoughts.com/

Re: which api to use to manage solr ?

2012-10-12 Thread Otis Gospodnetic
Good evening,

SolrJ lives in the same house as Solr itself, so...

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 12, 2012 5:39 PM, "autregalaxie"  wrote:

> Good morning everybody,
>
> I'm a new user of Solr, and I have to develop a new interface to manage Solr. I
> have found several APIs to do that (Blacklight, Sunspot, Solrj,
> ruby-Solr...) and I need your help to know which one is better and more
> reliable.
>
> Thank You
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/which-api-to-use-to-manage-solr-tp4013491.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: which api to use to manage solr ?

2012-10-12 Thread Lance Norskog
SolrJ is in Java, RSolr and ruby-solr are for ruby, etc. These are for
low-level programming.
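
For a sense of what "low-level" means here, a minimal SolrJ sketch (URL, core
and field names are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class QuerySketch {
    public static void main(String[] args) throws Exception {
      // Placeholder base URL and query; prints the total hit count.
      SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
      SolrQuery query = new SolrQuery("title:hello");
      query.setRows(10);
      QueryResponse rsp = server.query(query);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }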

There are Solr plugins for WordPress, Django, Magento e-commerce, and
some other apps. Blacklight is a content manager for libraries.

What do you want to do with Solr?

On Fri, Oct 12, 2012 at 4:45 PM, Otis Gospodnetic
 wrote:
> Good evening,
>
> SolrJ lives in the same house as Solr itself, so...
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Oct 12, 2012 5:39 PM, "autregalaxie"  wrote:
>
>> Good morning everybody,
>>
>> I'm a new user of Solr, and I have to develop a new interface to manage Solr. I
>> have found several APIs to do that (Blacklight, Sunspot, Solrj,
>> ruby-Solr...) and I need your help to know which one is better and more
>> reliable.
>>
>> Thank You
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/which-api-to-use-to-manage-solr-tp4013491.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>



-- 
Lance Norskog
goks...@gmail.com


Re: Using

2012-10-12 Thread Lance Norskog
After that, remove your Ivy cache (~/.ivy2) and try again. And rename your
Maven repository too, just to rule out any interference.

I have had weird problems with connectivity to different Ivy
repositories. I use a VPN service that pops out in different countries
(blackVPN), and some countries worked while others did not.

On Fri, Oct 12, 2012 at 3:52 PM, Erick Erickson  wrote:
> I've been building 4.x regularly. Have you tried "ant clean-jars"?
>
> Best
> Erick
>
> On Fri, Oct 12, 2012 at 6:32 PM, P Williams
>  wrote:
>> Hi,
>>
>> Has anyone tried using <dependency org="org.apache.solr" name="solr-test-framework" rev="4.0.0" conf="test->default"/> with Apache
>> Ivy in their project?
>>
>> rev 3.6.1 works but any of the 4.0.0 ALPHA, BETA and release result in:
>> [ivy:resolve] :: problems summary ::
>> [ivy:resolve]  WARNINGS
>> [ivy:resolve]   [FAILED ]
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit:
>>  (0ms)
>> [ivy:resolve]    shared: tried
>> [ivy:resolve]
>> C:\Users\pjenkins\.ant/shared/org.eclipse.jetty.orbit/javax.servlet/3.0.0.v201112011016/orbits/javax.servlet.orbit
>> [ivy:resolve]    public: tried
>> [ivy:resolve]
>> http://repo1.maven.org/maven2/org/eclipse/jetty/orbit/javax.servlet/3.0.0.v201112011016/javax.servlet-3.0.0.v201112011016.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::  FAILED DOWNLOADS::
>> [ivy:resolve]   :: ^ see resolution messages for details  ^ ::
>> [ivy:resolve]   ::
>> [ivy:resolve]   ::
>> org.eclipse.jetty.orbit#javax.servlet;3.0.0.v201112011016!javax.servlet.orbit
>> [ivy:resolve]   ::
>> [ivy:resolve]
>> [ivy:resolve]
>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>
>> Can anybody point me to the source of this error or a workaround?
>>
>> Thanks,
>> Tricia



-- 
Lance Norskog
goks...@gmail.com