Large number of collections in SolrCloud

2015-07-27 Thread Olivier
Hi,

I have a SolrCloud cluster with 3 nodes: 3 shards per node and a replication
factor of 3.
The number of collections is around 1000. All the collections use the same
ZooKeeper configuration.
So when I create each collection, the configuration is pulled from ZK and the
configuration files are stored in the JVM.
I thought that if the configuration was the same for each collection, the
impact on the JVM would be insignificant because the configuration should be
loaded only once. But that is not the case: for each collection created, the
JVM heap usage increases because the configuration is loaded again. Am I correct?

With a small configuration folder this is not a problem: the folder is less
than 500 KB, so 1000 collections x 500 KB means a JVM impact of about 500 MB.
But we manage a lot of languages with dictionaries, so our configuration
folder is about 6 MB, and the JVM impact becomes very significant: it can be
more than 6 GB (1000 x 6 MB).

So I would like to hear feedback from people who also run a cluster with a
large number of collections. Do I have to change some settings to handle this
case better? What can I do to optimize this behaviour?
For now, we have just increased the RAM per node to 16 GB, but we plan to
increase the number of collections.

Thanks,

Olivier


reload collections timeout

2015-08-03 Thread olivier

Hi everybody,

I have about 1300 collections, 3 shards, replicationFactor=3,
maxShardsPerNode=3.
I have 3 boxes with 64 GB of RAM (32 GB for the JVM).

When I want to reload all my collections I get a timeout error.
Is there a way to run the reload asynchronously, as for collection creation
(async=requestid)?
I saw in this issue that it was implemented, but it does not seem to work.

https://issues.apache.org/jira/browse/SOLR-5477

How do I use async mode to reload collections?
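
(The general pattern for async Collections API calls is to pass async=<some id> and then poll REQUESTSTATUS; whether RELOAD accepts async depends on the Solr version, so the sketch below is an assumption, with placeholder host, collection and request ids.)

  # submit the reload asynchronously, if this version accepts async for RELOAD
  curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1&async=reload-001"

  # poll the status of the submitted request until it is completed or failed
  curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=reload-001"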

thanks a lot

Olivier Damiot



Re: Large number of collections in SolrCloud

2015-08-03 Thread Olivier
Hi,

Thanks a lot Erick and Shawn for your answers.
I am aware that this is a very particular issue and not a common use of
Solr; I just wondered whether other people had a similar business case. For
information, we need a very large number of collections with the same
configuration for legal reasons: each collection represents one of our
customers, and by contract we have to keep each customer's data separate.
If we had the choice, we would just have one collection with a 'Customers'
field and do filter queries on it, but we can't!

Anyway, thanks again for your answers. In the end we did not add the
per-language dictionaries to each collection, and with more resources added
to the servers it works fine for 1K+ customers.

Best,

Olivier Tavard



2015-07-27 17:53 GMT+02:00 Shawn Heisey :

> On 7/27/2015 9:16 AM, Olivier wrote:
> > I have a SolrCloud cluster with 3 nodes :  3 shards per node and
> > replication factor at 3.
> > The collections number is around 1000. All the collections use the same
> > Zookeeper configuration.
> > So when I create each collection, the ZK configuration is pulled from ZK
> > and the configuration files are stored in the JVM.
> > I thought that if the configuration was the same for each collection, the
> > impact on the JVM would be insignifiant because the configuration should
> be
> > loaded only once. But it is not the case, for each collection created,
> the
> > JVM size increases because the configuration is loaded again, am I
> correct ?
> >
> > If I have a small configuration folder size, I have no problem because
> the
> > folder size is less than 500 KB so if we count 1000 collections x 500 KB,
> > the JVM impact is 500 MB.
> > But we manage a lot of languages with some dictionaries so the
> > configuration folder size is about 6 MB. The JVM impact is very important
> > now because it can be more than 6 GB (1000 x 6 MB).
> >
> > So I would like to have the feeback of people who have a cluster with a
> > large number of collections too. Do I have to change some settings to
> > handle this case better ? What can I do to optimize this behaviour ?
> > For now, we just increase the RAM size per node at 16 GB but we plan to
> > increase the collections number.
>
> Severe issues were noticed when dealing with many collections, and this
> was with a simple config, and completely empty indexes.  A complex
> config and actual index data would make it run that much more slowly.
>
> https://issues.apache.org/jira/browse/SOLR-7191
>
> Memory usage for the config wasn't even considered when I was working on
> reporting that issue.
>
> SolrCloud is highly optimized to work well when there are a relatively
> small number of collections.  I think there is work that we can do which
> will optimize operations to the point where thousands of collections
> will work well, especially if they all share the same config/schema ...
> but this is likely to be a fair amount of work, which will only benefit
> a handful of users who are pushing the boundaries of what Solr can do.
> In the open source world, a problem like that doesn't normally receive a
> lot of developer attention, and we rely much more on help from the
> community, specifically from knowledgeable users who are having the
> problem and know enough to try and fix it.
>
> FYI -- 16GB of RAM per machine is quite small for Solr, particularly
> when pushing the envelope.  My Solr machines are maxed at 64GB, and I
> frequently wish I could install more.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
>
> One possible solution for your dilemma is simply adding more machines
> and spreading your collections out so each machine's memory requirements
> go down.
>
> Thanks,
> Shawn
>
>


Large multivalued field and overseer problem

2015-11-19 Thread Olivier
Hi,

We have a SolrCloud cluster with 3 nodes (4 processors, 24 GB RAM per node).
We have 3 shards per node and the replication factor is 3. We host 3
collections; the biggest has only about 40K documents.
The important point is a multivalued field with about 200K to 300K values
per document (each value is a kind of product reference, of type String).
We have some very big issues with our SolrCloud cluster. It crashes entirely,
very frequently, at indexing time. It starts with an overseer issue:

Session expired for the overseer: KeeperErrorCode = Session expired for
/overseer_elect/leader

Then another node is elected overseer, but the recovery phase seems to fail
indefinitely. It seems that communication between the overseer and ZK is
impossible.
After a short period of time the whole cluster is unavailable (JVM
out-of-memory error) and we have to restart it.

So I wanted to know whether we can keep using such a huge multivalued field
with SolrCloud.
We are on Solr 4.10.4 for now; do you think that upgrading to Solr 5, with an
overseer per collection, could fix our issues?
Or do we have to rethink the schema to avoid this very large multivalued
field?
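
(A hedged aside: on versions that have it, 4.8 and later, OVERSEERSTATUS shows which node currently holds the overseer role and the state of its queues, which can help see what is happening before the session expires. The host name is a placeholder.)

  # ask the Collections API who the current overseer is and what its queues look like
  curl "http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json"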

Thanks,
Best,

Olivier


Problems for indexing large documents on SolrCloud

2014-09-10 Thread Olivier
Hi,

I have some problems indexing large documents in a SolrCloud cluster of
3 servers (Solr 4.8.1) with 3 shards and 2 replicas per shard, on Tomcat 7.
A specific document (with 300K values in a multivalued field) could not be
indexed on SolrCloud, but I could index it on a single Solr instance on my
own PC.

Indexing is done with Solarium from a database. The indexed data are
e-commerce products with classic fields like name, price, description,
instock, etc. The large field (type int) consists of other product ids.
The only difference from the documents that index fine is the size of that
multivalued field: the documents that index correctly all have between 100K
and 200K values for that field.
The index size is 11 MB for 20 documents.

To solve it, I tried to change several parameters, including zkClientTimeout
in solr.xml.

In the solrcloud section:

  6
  10
  10

In the shardHandlerFactory section:

  <int name="socketTimeout">${socketTimeout:10}</int>
  <int name="connTimeout">${connTimeout:10}</int>

I also tried to increase these values in solrconfig.xml.

I also tried to increase the amount of RAM (these are VMs): each server has
4 GB of RAM, with 3 GB for the JVM.

Are there other settings I might have forgotten that could solve the problem?


The error messages are:

  ERROR  SolrDispatchFilter  null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset
  ERROR  SolrDispatchFilter  null:ClientAbortException: java.net.SocketException: broken pipe
  ERROR  SolrDispatchFilter  null:ClientAbortException: java.net.SocketException: broken pipe
  ERROR  SolrCore  org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
  ERROR  SolrCore  org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
  ERROR  SolrCore  org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier
  ERROR  SolrCore  org.apache.solr.common.SolrException: Unexpected EOF in attribute value
  ERROR  SolrCore  org.apache.solr.common.SolrException: Unexpected end of input block in start tag

Thanks,

Olivier

Re: Problems for indexing large documents on SolrCloud

2014-09-22 Thread Olivier
Hi,

First, thanks for your advice.
I did several tests and finally I could index all the data on my SolrCloud
cluster.
The error was client-side; it's documented in this post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3ccfc09ae1.94f8%25rebecca.t...@ucsf.edu%3E

"EofException from Jetty means one specific thing:  The client software
disconnected before Solr was finished with the request and sent its
response.  Chances are good that this is because of a configured socket
timeout on your SolrJ client or its HttpClient.  This might have been
done with the setSoTimeout method on the server object."

So I increased the Solarium timeout from 5 to 60 seconds and all the data
is now indexed correctly. The error was not reproducible on my development
PC because the database and Solr were on the same local virtual machine with
plenty of available resources, so indexing was faster there than in the
SolrCloud cluster.

Thanks,

Olivier


2014-09-11 0:21 GMT+02:00 Shawn Heisey :

> On 9/10/2014 2:05 PM, Erick Erickson wrote:
> > bq: org.apache.solr.common.SolrException: Unexpected end of input
> > block; expected an identifier
> >
> > This is very often an indication that your packets are being
> > truncated by "something in the chain". In your case, make sure
> > that Tomcat is configured to handle inputs of the size that you're
> sending.
> >
> > This may be happening before things get to Solr, in which case your
> settings
> > in solrconfig.xml aren't germane; the problem is earlier than that.
> >
> > A "semi-smoking-gun" here is that there's a size of your multivalued
> > field that seems to break things... That doesn't rule out time problems
> > of course.
> >
> > But I'd look at the Tomcat settings for maximum packet size first.
>
> The maximum HTTP request size is actually is controlled by Solr itself
> since 4.1, with changes committed for SOLR-4265.  Changing the setting
> on Tomcat probably will not help.
>
> An example from my own config which sets this to 32MB - the default is
> 2048, or 2MB:
>
>   <requestParsers multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/>
>
> Thanks,
> Shawn
>
>


Leader election

2015-07-29 Thread Olivier Damiot
Hello everybody,

I use Solr 5.2.1 and am having a big problem.
I have about 1200 collections, 3 shards, replicationFactor=3,
maxShardsPerNode=3.
I have 3 boxes with 64 GB of RAM (32 GB for the JVM).
I have no problems with collection creation or indexing, but when I lose a
node (VM full or killed) and restart it, all my collections are down.
When I look in the logs I can see leader-election problems, e.g.:
  - Checking if I (core=test339_shard1_replica1, coreNodeName=core_node5)
should try and be the leader.
  - Cloud says we are still state leader.

I feel that the servers are all passing the buck!

I do not understand this error, especially since from reading the mailing
list I have the impression that this bug was solved long ago.

What should I do to get my collections up properly?

Could someone help me?
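
(One way to see which replicas are actually down or leaderless after such a restart is CLUSTERSTATUS, available since Solr 4.8; the host is a placeholder and test339 is taken from the log line above.)

  # dump the cluster state and look for replicas stuck in "down" or shards with no leader
  curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test339&wt=json"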

thank you a lot

Olivier


Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Hi,

I am looking for a fast and easy-to-maintain way to do autocomplete for a
large dataset in Solr. I heard about the Ternary Search Tree (TST)
<https://en.wikipedia.org/wiki/Ternary_search_tree>.
But I would like to know if there is something I have missed, such as a best
practice or a new Solr feature. Any suggestion is welcome. Thank you.

Regards
Olivier


Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Erick for your reply.
If I understand correctly, these approaches use the index to hold the terms.
As the index grows bigger, that can become a performance issue.
Is that right? Could you please check this article
<http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see
what I mean? Thank you.

Regards
Olivier


2015-08-01 17:42 GMT+02:00 Erick Erickson :

> Well, defining what you mean by "autocomplete" would be a start. If it's
> just
> a user types some letters and you suggest the next N terms in the list,
> TermsComponent will fix you right up.
>
> If it's more complicated, the AutoSuggest functionality might help.
>
> If it's correcting spelling, there's the spellchecker.
>
> Best,
> Erick
>
> On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
>  wrote:
> > Hi,
> >
> > I am looking for a fast and easy to maintain way to do autocomplete for
> > large dataset in solr. I heard about Ternary Search Tree (TST)
> > <https://en.wikipedia.org/wiki/Ternary_search_tree>.
> > But I would like to know if there is something I missed such as best
> > practice, Solr new feature. Any suggestion is welcome. Thank you.
> >
> > Regards
> > Olivier
>


Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Erick,

I would like to implement autocomplete for a large dataset. The autocomplete
should show the phrase or the question the user wants as the user types. The
requirement is that the autocomplete should be fast (not slowed down by the
volume of data as the dataset gets bigger) and easy to maintain. The
autocomplete can have its own Solr server. It is an autocomplete like any
other; it just has to be fast and easy to maintain.

What are the limitations of the suggesters mentioned in the article? Thank you.

Regards
Olivier


2015-08-01 19:41 GMT+02:00 Erick Erickson :

> Not really. There's no need to use ngrams as the article suggests if the
> terms component does what you need. Which is why I asked you about what
> autocomplete means in your context. Which you have not clarified. Have you
> even looked at terms component?  Especially the terms.prefix option?
>
> Terms component has it's limitations, but performance isn't one of them.
> The suggesters mentioned in the article have other limitations. It's really
> useless to discuss those limitations, though, until the problem you're
> trying to solve is clearly stated.
> On Aug 1, 2015 1:01 PM, "Olivier Austina" 
> wrote:
>
> > Thank you Eric for your reply.
> > If I understand it seems that these approaches are using index to hold
> > terms. As the index grows bigger, it can be a performance issues.
> > Is it right? Please can you check this article
> > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see
> > what I mean?   Thank you.
> >
> > Regards
> > Olivier
> >
> >
> > 2015-08-01 17:42 GMT+02:00 Erick Erickson :
> >
> > > Well, defining what you mean by "autocomplete" would be a start. If
> it's
> > > just
> > > a user types some letters and you suggest the next N terms in the list,
> > > TermsComponent will fix you right up.
> > >
> > > If it's more complicated, the AutoSuggest functionality might help.
> > >
> > > If it's correcting spelling, there's the spellchecker.
> > >
> > > Best,
> > > Erick
> > >
> > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
> > >  wrote:
> > > > Hi,
> > > >
> > > > I am looking for a fast and easy to maintain way to do autocomplete
> for
> > > > large dataset in solr. I heard about Ternary Search Tree (TST)
> > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>.
> > > > But I would like to know if there is something I missed such as best
> > > > practice, Solr new feature. Any suggestion is welcome. Thank you.
> > > >
> > > > Regards
> > > > Olivier
> > >
> >
>


Re: Fast autocomplete for large dataset

2015-08-01 Thread Olivier Austina
Thank you Erick for your replies and the link.

Regards
Olivier


2015-08-02 3:47 GMT+02:00 Erick Erickson :

> Here's some background:
>
> http://lucidworks.com/blog/solr-suggester/
>
> Basically, the limitation is that to build the suggester all docs in
> the index need to be read to pull out the stored field and build
> either the FST or the sidecar Lucene index, which can be a _very_
> costly operation (as in minutes/hours for a large dataset).
>
> bq: The requirement is that the autocomplete should be fast (not
> slowdown by the volume of data as dataset become bigger)
>
> Well, in some alternate universe this may be possible. But the larger
> the corpus the slower the processing will be, there's just no way
> around that. Whether it's fast enough for your application is a better
> question ;).
>
> Best,
> Erick
>
>
> On Sat, Aug 1, 2015 at 2:05 PM, Olivier Austina
>  wrote:
> > Thank you Eric,
> >
> > I would like to implement an autocomplete for large dataset.  The
> > autocomplete should show the phrase or the question the user want as the
> > user types. The requirement is that the autocomplete should be fast (not
> > slowdown by the volume of data as dataset become bigger), and easy to
> > maintain. The autocomplete can have its own Solr server.  It is an
> > autocomplete like others but it should be only fast and easy to maintain.
> >
> > What is the limitations of suggesters mentioned in the article? Thank
> you.
> >
> > Regards
> > Olivier
> >
> >
> > 2015-08-01 19:41 GMT+02:00 Erick Erickson :
> >
> >> Not really. There's no need to use ngrams as the article suggests if the
> >> terms component does what you need. Which is why I asked you about what
> >> autocomplete means in your context. Which you have not clarified. Have
> you
> >> even looked at terms component?  Especially the terms.prefix option?
> >>
> >> Terms component has it's limitations, but performance isn't one of them.
> >> The suggesters mentioned in the article have other limitations. It's
> really
> >> useless to discuss those limitations, though, until the problem you're
> >> trying to solve is clearly stated.
> >> On Aug 1, 2015 1:01 PM, "Olivier Austina" 
> >> wrote:
> >>
> >> > Thank you Eric for your reply.
> >> > If I understand it seems that these approaches are using index to hold
> >> > terms. As the index grows bigger, it can be a performance issues.
> >> > Is it right? Please can you check this article
> >> > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to
> see
> >> > what I mean?   Thank you.
> >> >
> >> > Regards
> >> > Olivier
> >> >
> >> >
> >> > 2015-08-01 17:42 GMT+02:00 Erick Erickson :
> >> >
> >> > > Well, defining what you mean by "autocomplete" would be a start. If
> >> it's
> >> > > just
> >> > > a user types some letters and you suggest the next N terms in the
> list,
> >> > > TermsComponent will fix you right up.
> >> > >
> >> > > If it's more complicated, the AutoSuggest functionality might help.
> >> > >
> >> > > If it's correcting spelling, there's the spellchecker.
> >> > >
> >> > > Best,
> >> > > Erick
> >> > >
> >> > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina
> >> > >  wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I am looking for a fast and easy to maintain way to do
> autocomplete
> >> for
> >> > > > large dataset in solr. I heard about Ternary Search Tree (TST)
> >> > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>.
> >> > > > But I would like to know if there is something I missed such as
> best
> >> > > > practice, Solr new feature. Any suggestion is welcome. Thank you.
> >> > > >
> >> > > > Regards
> >> > > > Olivier
> >> > >
> >> >
> >>
>


SOLR cloud (5.2.1) recovery

2015-08-18 Thread Olivier Damiot
hello,

I'm a bit confused about how SolrCloud recovery is supposed to work exactly
in the case of losing a single node completely.

My 600 collections are created with
numShards=3&replicationFactor=3&maxShardsPerNode=3

However, how do I configure a new node to take the place of the dead node,
or recover if I accidentally delete the data dir?

I bring up a new node which is completely empty (empty data dir), install
Solr, and connect it to ZooKeeper. Is it supposed to work automatically from
there? All the shards/replicas on this node show as down (I suppose because
there are no cores in the data dir).

Do I need to recreate the cores first?

Can I copy/paste the data directory from another node to this one? I think
not, because I would have to rename all the variables in core.properties
that are specific to each node (like name or coreNodeName).
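
(For what it's worth, the usual way to repopulate an empty replacement node is through the Collections API rather than by copying data directories. A sketch assuming the 5.x ADDREPLICA/DELETEREPLICA calls; collection, shard, replica and node names are placeholders.)

  # drop the registration of a replica that lived on the dead node
  curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3"

  # create a fresh replica for that shard on the new, empty node
  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=coll1&shard=shard1&node=newnode:8983_solr"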

thanks,

Olivier Damiot


How to dereference boost values?

2015-07-14 Thread Olivier Lebra
Is it possible to do something like this: bf=myfield^$myfactor

Thanks,
Olivier


Dereferencing boost values?

2015-07-14 Thread Olivier Lebra
Is there a way to do something like this: " bf=myfield^$myfactor " ?
(Doesn't work, the boost value has to be a direct number)

Thanks,
Olivier


Re: Dereferencing boost values?

2015-07-14 Thread Olivier Lebra

Thanks guys...
I'm using edismax, and I have a long bf parameter that I want as a default
in a Solr requestHandler config, but customizable via the query string,
something like this:

  product(a,$a)^$fa sum(b,$b1,$b2)^$fb c^$fc ...

where the caller would pass $a, $fa, $b1, $b2, $fb, $fc (and a, b, c are
numeric fields).

So my problem is with $fa, $fb, and $fc. Solr doesn't accept that syntax.

For numeric operands, is the dismax boost operator ^ just a pow()? If so,
my problem is solved by writing:

  pow(product(a,$a1),$fa) pow(sum(b,$b1,$b2),$fb) pow(c,$fc)

Is a^b equivalent to pow(a,b)?

Thanks,
Olivier


On 7/14/2015 2:31 PM, Chris Hostetter wrote:

To clarify the difference:

- "bf" is a special param of the dismax parser, which does an *additive*
boost function - that function can be something as simple as a numeric
field

- alternatively, you can use the "boost" parser in your main query string,
to wrap any parser (dismax, edismax, standard, whatever) in a
*multiplicative* boost, where the boost function can be anything

- multiplicative boosts are almost always what people really want; additive
boosts are a lot less useful.

- when specifying any function, you can use variable dereferencing for any
function params.

So in the example Upayavira gave, you can use any arbitrary query param to
specify the function to use as a multiplicative boost around an arbitrary
query -- which could still use dismax if you want (just specify the
necessary parser "type" as a localparam on the inner query, or use a
defType localparam on the original boost query).  Or you could explicitly
specify a function that incorporates a field value with some other
dynamic params, and use that entire function as your multiplicative boost.

a more elaborate example using the "bin/solr -e techproducts" data...

http://localhost:8983/solr/techproducts/query?debug=query&q={!boost%20b=$boost_func%20defType=dismax%20v=$qq}&qf=name+title&qq=apple%20ipod&boost_func=pow%28$boost_field,$boost_factor%29&boost_field=price&boost_factor=2

 "params":{
   "qq":"apple ipod",
   "q":"{!boost b=$boost_func defType=dismax v=$qq}",
   "debug":"query",
   "qf":"name title",
   "boost_func":"pow($boost_field,$boost_factor)",
   "boost_factor":"2",
   "boost_field":"price"}},







: Date: Tue, 14 Jul 2015 21:58:36 +0100
: From: Upayavira 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Dereferencing boost values?
:
: You could do
:
: q={!boost b=$b v=$qq}
: qq=your query
: b=YOUR-FACTOR
:
: If what you want is to provide a value outside.
:
: Also, with later Solrs, you can use ${whatever} syntax in your main
: query, which might work for you too.
:
: Upayavira
:
: On Tue, Jul 14, 2015, at 09:28 PM, Olivier Lebra wrote:
: > Is there a way to do something like this: " bf=myfield^$myfactor " ?
: > (Doesn't work, the boost value has to be a direct number)
: >
: > Thanks,
: > Olivier
:

-Hoss
http://www.lucidworks.com/




Querying specific database attributes or table

2014-03-16 Thread Olivier Austina
Hi,
I am new to Solr.

I would like to index and query a relational database. Is it possible to
query a specific table or attribute of the database? For example, if I have
2 tables A and B that both have the attribute "name", can I get only the
results from table A and not from table B?
Can I restrict the query to only one table without getting results from the
other tables?
Is it possible to query a specific attribute of a table?
Is it possible to do a join query as in SQL?
Any suggestion is welcome. Thank you.
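
(A common way to get this behaviour, not taken from this thread, is to add a field holding the source table when indexing and then restrict queries with a filter query. The field, host and collection names below are assumptions.)

  # only match documents that were indexed from table A
  curl "http://localhost:8983/solr/collection1/select?q=name:smith&fq=source_table:A&wt=json"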

Regards
Olivier


Topology of Solr use

2014-04-17 Thread Olivier Austina
Hi All,
I would like to have an idea about Solr usage: number of users, industries,
countries, or any other helpful information. Thank you.
Regards
Olivier


Re: Topology of Solr use

2014-04-17 Thread Olivier Austina
Thank you Markus, the link is very useful.


Regards
Olivier



2014-04-17 18:24 GMT+02:00 Markus Jelsma :

> This may help a bit:
>
> https://wiki.apache.org/solr/PublicServers
>
> -Original message-
> From:Olivier Austina 
> Sent:Thu 17-04-2014 18:16
> Subject:Topology of Solr use
> To:solr-user@lucene.apache.org;
> Hi All,
> I would to have an idea about Solr usage: number of users, industry,
> countries or any helpful information. Thank you.
> Regards
> Olivier
>


Problem indexing email attachments

2014-04-23 Thread Olivier . Masseau
Hello, 

I'm trying to index email files with Solr (4.7.2)

The files have the extension .eml (message/rfc822) 

The mail body is correctly indexed, but attachments are not indexed if they
are not .txt files.

If the attachments are .txt files it works, but if the attachments are .pdf
or .docx files they are not indexed.


I checked the extracted text by calling: 

curl "
http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&extractFormat=text
" -F "myfile=@Test1.eml" 

The returned extracted text does not contain the content of the 
attachments if they are not .txt files. 


It is not a problem with the Apache Tika library not being able to process 
attachments, because running the standalone Apache Tika app by calling: 


java -jar tika-app-1.4.jar -t Test1.eml 


on my eml files correctly displays the attachments' text. 



Maybe it is a problem with how Tika is called by Solr?

Is there something to modify in the default configuration?


Thanks for any help ;)
 
Olivier 

Re: Problem indexing email attachments

2014-04-23 Thread Olivier . Masseau
As I said, it is not a problem in the Tika library ;)

I have tried with Tika 1.5 jars and it gives the same results.



Guido Medina  wrote on 23/04/2014 16:15:11:

> From: Guido Medina 
> To: solr-user@lucene.apache.org
> Date: 23/04/2014 16:15
> Subject: Re: Problem indexing email attachments
> 
> We particularly massage solr.war and put our own updated jars, maybe 
> this helps:
> 
> http://www.apache.org/dist/tika/CHANGES-1.5.txt
> 
> We using Tika 1.5 inside Solr with POI 3.10-Final, etc...
> 
> Guido.
> 
> On 23/04/14 14:38, olivier.mass...@real.lu wrote:
> > Hello,
> >
> > I'm trying to index email files with Solr (4.7.2)
> >
> > The files have the extension .eml (message/rfc822)
> >
> > The mail body is correctly indexed but attachments are not indexed if 
they
> > are not .txt files.
> >
> > If attachments are .txt files it works, but if attachment are .pdf of
> > .docx files they are not indexed.
> >
> >
> >
> > I checked the extracted text by calling:
> >
> > curl "
> > http://localhost:8983/solr/update/extract?
> literal.id=doc1&commit=true&extractOnly=true&extractFormat=text
> > " -F "myfile=@Test1.eml"
> >
> > The returned extracted text does not contain the content of the
> > attachments if they are not .txt files.
> >
> >
> > It is not a problem with the Apache Tika library not being able to 
process
> > attachments, because running the standalone Apache Tika app by 
calling:
> >
> >
> > java -jar tika-app-1.4.jar -t Test1.eml
> >
> >
> > on my eml files correctly displays the attachments' text.
> >
> >
> >
> > Maybe is it a problem with how Tika is called by Solr ?
> >
> > Is there something to modify in the default configuration ?
> >
> >
> > Thanx for any help ;)
> > 
> > Olivier
> 


Website running Solr

2014-05-11 Thread Olivier Austina
Hi All,
Is there a way to know whether a website uses Solr? Thanks.
Regards
Olivier


Subject=How to Get Highlighting Working in Velocity (Solr 4.8.0)

2014-07-27 Thread Olivier FOSTIER
Maybe you missed that your field "dom_title" should have
indexed="true" termVectors="true" termPositions="true" termOffsets="true"


Re: feedback on Solr 4.x LotsOfCores feature

2013-10-18 Thread Soyez Olivier
15K cores takes around 4 minutes: no network drive, just a spinning disk.
One important thing: to simulate a cold start (i.e. a useless Linux buffer
cache), I used the following command to empty the buffer cache:

sync && echo 3 > /proc/sys/vm/drop_caches

Then I started Solr and got the result above.


Le 11/10/2013 13:06, Erick Erickson a écrit :


bq: sharing the underlying solrconfig object the configset introduced
in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode

SOLR-4478 will NOT share the underlying config objects, it simply
shares the underlying directory. Each core will, at least as presently
envisioned, simply read the files that exist there and create their
own solrconfig object. Schema objects may be shared, but not config
objects. It may turn out to be relatively easy to do in the configset
situation, but last time I looked at sharing the underlying config
object it was too fraught with problems.

bq: 15K cores is around 4 minutes

I find this very odd. On my laptop, spinning disk, I think I was
seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I
have no idea what's going on here. If this is just reading the files,
you should be seeing horrible disk contention. Are you on some kind of
networked drive?

bq: To do that in background and to block on that request until core
discovery is complete, should not work for us (due to the worst case).
What other choices are there? Either you have to do it up front or
with some kind of blocking. Hmmm, I suppose you could keep some kind
of custom store (DB? File? ZooKeeper?) that would keep the last known
layout. You'd still have some kind of worst-case situation where the
core you were trying to load wouldn't be in your persistent store and
you'd _still_ have to wait for the discovery process to complete.

bq: and we will use the cores Auto option to create load or only load
the core on
Interesting. I can see how this could all work without any core
discovery but it does require a very specific setup.

On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
<mailto:olivier.so...@worldline.com> wrote:
> The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, 
> including the new Cores options :
> - "numBuckets" to create a subdirectory based on a hash on the corename % 
> numBuckets in the core Datadir
> - "Auto" with 3 differents values :
>   1) false : default behaviour
>   2) createLoad : create, if not exist, and load the core on the fly on the 
> first incoming request (update, select)
>   3) onlyLoad : load the core on the fly on the first incoming request 
> (update, select), if exist on disk
>
> Concerning :
> - sharing the underlying solrconfig object, the configset introduced in JIRA 
> SOLR-4478 seems to be the solution for non-SolrCloud mode.
> We need to test it for our use case. If another solution exists, please tell 
> me. We are very interested in such functionality and to contribute, if we can.
>
> - the possibility of lotsOfCores in SolrCloud, we don't know in details how 
> SolrCloud is working.
> But one possible limit is the maximum number of entries that can be added to 
> a zookeeper node.
> Maybe, a solution will be just a kind of hashing in the zookeeper tree.
>
> - the time to discover cores in Solr 4.4 : with spinning disk under linux, 
> all cores with transient="true" and loadOnStartup="false", the linux buffer 
> cache empty before starting Solr :
> 15K cores is around 4 minutes. It's linear in the cores number, so for 50K 
> it's more than 13 minutes. In fact, it corresponding to the time to read all 
> core.properties files.
> To do that in background and to block on that request until core discovery is 
> complete, should not work for us (due to the worst case).
> So, we will just disable the core Discovery, because we don't need to know 
> all cores from the start. Start Solr without any core entries in solr.xml, 
> and we will use the cores Auto option to create load or only load the core on 
> the fly, based on the existence of the core on the disk (absolute path 
> calculated from the core name).
>
> Thanks for your interest,
>
> Olivier
> 
> De : Erick Erickson [erickerick...@gmail.com<mailto:erickerick...@gmail.com>]
> Date d'envoi : lundi 7 octobre 2013 14:33
> À : solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org>
> Objet : Re: feedback on Solr 4.x LotsOfCores feature
>
> Thanks for the great writeup! It's always interesting to see how
> a feature plays out "in the real world". A couple of questions
> though:
>
> bq: We added 2 Cores options :
> Do you mean you patched Solr? If so are you willing to shard the code
> 

Re: feedback on Solr 4.x LotsOfCores feature

2013-10-22 Thread Soyez Olivier
Another way to "simulate" core discovery is:

time find $PATH_TO_CORES -name core.properties -type f -exec cat '{}' > /dev/null 2>&1 \;

or just the core.properties read time:

find $PATH_TO_CORES -name core.properties > cores.list
time for i in `cat cores.list`; do cat $i > /dev/null 2>&1; done

Olivier

Le 19/10/2013 11:57, Erick Erickson a écrit :

For my quick-and-dirty test I just rebooted my machine totally and still
had 1K/sec core discovery. So this still puzzles me greatly. The time
do do this should be approximated by the time it takes to just walk
your tree, find all the core.properties and read them. I it possible to
just write a tiny Java program to do that? Or rip off the core discovery
code and use that for a small stand-alone program? Because this is quite
a bit at odds with what I've seen. Although now that I think about it,
the code has gone through some revisions since then, but I don't think
they should have affected this...

Best
Erick


On Fri, Oct 18, 2013 at 2:59 PM, Soyez Olivier
<mailto:olivier.so...@worldline.com>wrote:

> 15K cores is around 4 minutes : no network drive, just a spinning disk
> But, one important thing, to simulate a cold start or an useless linux
> buffer cache,
> I used the following command to empty the linux buffer cache :
> sync && echo 3 > /proc/sys/vm/drop_caches
> Then, I started Solr and I found the result above
>
>
> Le 11/10/2013 13:06, Erick Erickson a écrit :
>
>
> bq: sharing the underlying solrconfig object the configset introduced
> in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode
>
> SOLR-4478 will NOT share the underlying config objects, it simply
> shares the underlying directory. Each core will, at least as presently
> envisioned, simply read the files that exist there and create their
> own solrconfig object. Schema objects may be shared, but not config
> objects. It may turn out to be relatively easy to do in the configset
> situation, but last time I looked at sharing the underlying config
> object it was too fraught with problems.
>
> bq: 15K cores is around 4 minutes
>
> I find this very odd. On my laptop, spinning disk, I think I was
> seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I
> have no idea what's going on here. If this is just reading the files,
> you should be seeing horrible disk contention. Are you on some kind of
> networked drive?
>
> bq: To do that in background and to block on that request until core
> discovery is complete, should not work for us (due to the worst case).
> What other choices are there? Either you have to do it up front or
> with some kind of blocking. Hmmm, I suppose you could keep some kind
> of custom store (DB? File? ZooKeeper?) that would keep the last known
> layout. You'd still have some kind of worst-case situation where the
> core you were trying to load wouldn't be in your persistent store and
> you'd _still_ have to wait for the discovery process to complete.
>
> bq: and we will use the cores Auto option to create load or only load
> the core on
> Interesting. I can see how this could all work without any core
> discovery but it does require a very specific setup.
>
> On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier
> <mailto:olivier.so...@worldline.com>
>  wrote:
> > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in
> SOLR-5316, including the new Cores options :
> > - "numBuckets" to create a subdirectory based on a hash on the corename
> % numBuckets in the core Datadir
> > - "Auto" with 3 differents values :
> >   1) false : default behaviour
> >   2) createLoad : create, if not exist, and load the core on the fly on
> the first incoming request (update, select)
> >   3) onlyLoad : load the core on the fly on the first incoming request
> (update, select), if exist on disk
> >
> > Concerning :
> > - sharing the underlying solrconfig object, the configset introduced in
> JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode.
> > We need to test it for our use case. If another solution exists, please
> tell me. We are very interested in such functionality and to contribute, if
> we can.
> >
> > - the possibility of lotsOfCores in SolrCloud, we don't know in details
> how SolrCloud is working.
> > But one possible limit is the maximum number of entries that can be
> added to a zookeeper node.
> > Maybe, a solution will be just a kind of hashing in the zookeeper tree.
> >
> > - the time to discover cores in Solr 4.4 : with spinning disk under
> linux, all cores with transient="true" and

Remove indexes of XML file

2014-10-24 Thread Olivier Austina
Hi,

This is a newbie question. I have indexed some documents using some XML
files, as described in the tutorial
<http://lucene.apache.org/solr/4_10_1/tutorial.html>, with the command:

java -jar post.jar *.xml

I have seen how to delete the index entry for one document, but how do I
delete all the indexed documents that came from one XML file? For example,
if I have indexed files A, B, C, D, etc., how do I delete the indexed
documents that came from file C? Is there a command like the one above, or
another solution that does not require individual IDs? Thank you.


Regards
Olivier


Re: Remove indexes of XML file

2014-10-25 Thread Olivier Austina
Thank you Alex, I think I can use the file to delete the corresponding indexed documents.

Regards
Olivier


2014-10-24 21:51 GMT+02:00 Alexandre Rafalovitch :

> You can delete individually, all (*:* query) or by specific query. So,
> if there is no common query pattern you may need to do a multi-id
> query - something like "id:(id1 id2 id3 id4)" which does require you
> knowing the IDs.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 24 October 2014 15:44, Olivier Austina 
> wrote:
> > Hi,
> >
> > This is newbie question. I have indexed some documents using some XML
> files
> > as indicating in the tutorial
> > <http://lucene.apache.org/solr/4_10_1/tutorial.html> with the command :
> >
> > java -jar post.jar *.xml
> >
> > I have seen how to delete an index for one document but how to delete
> > all indexes
> > for documents within an XML file. For example if I have indexed some
> > files A, B, C, D etc.,
> > how to delete indexes of documents from file C. Is there a command
> > like above or other
> > solution without using individual ID? Thank you.
> >
> >
> > Regards
> > Olivier
>
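
(Alexandre's multi-id suggestion translates to a delete-by-query posted to the update handler; a sketch with placeholder ids, host and core name.)

  # delete the documents that came from file C, given their ids
  curl "http://localhost:8983/solr/collection1/update?commit=true" \
       -H "Content-Type: text/xml" \
       -d "<delete><query>id:(id1 OR id2 OR id3)</query></delete>"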


OpenExchangeRates.Org rates in solr

2014-10-26 Thread Olivier Austina
Hi,

Is there a way to see, somewhere, the OpenExchangeRates.Org
<http://www.OpenExchangeRates.Org> rates used in Solr? I have changed the
configuration to use these rates. Thank you.
Regards
Olivier


Re: OpenExchangeRates.Org rates in solr

2014-10-26 Thread Olivier Austina
Hi Will,

I am learning Solr now. I can use it  later for business or for free
access. Thank you.

Regards
Olivier


2014-10-26 17:32 GMT+01:00 Will Martin :

> Hi Olivier:
>
> Can you clarify this message? Are you using Solr at the business? Or are
> you giving free access to solr installations?
>
> Thanks,
> Will
>
>
> -Original Message-
> From: Olivier Austina [mailto:olivier.aust...@gmail.com]
> Sent: Sunday, October 26, 2014 10:57 AM
> To: solr-user@lucene.apache.org
> Subject: OpenExchangeRates.Org rates in solr
>
> Hi,
>
> There is a way to see the OpenExchangeRates.Org <
> http://www.OpenExchangeRates.Org> rates used in Solr somewhere. I have
> changed the configuration to use these rates. Thank you.
> Regards
> Olivier
>
>


Indexing documents/files for production use

2014-10-28 Thread Olivier Austina
Hi All,

I am reading the Solr documentation. I have understood that post.jar
<http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29>
is not meant for production use and that cURL
<https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing>
is not recommended. Is SolrJ better for production? Thank you.
Regards
Olivier


Re: Indexing documents/files for production use

2014-10-30 Thread Olivier Austina
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me.

Regards
Olivier


2014-10-28 23:35 GMT+01:00 Erick Erickson :

> And one other consideration in addition to the two excellent responses
> so far
>
> In a SolrCloud environment, SolrJ via CloudSolrServer will automatically
> route the documents to the correct shard leader, saving some additional
> overhead. Post.jar and cURL send the docs to a node, which in turn
> forward the docs to the correct shard leader which lowers
> throughput
>
> Best,
> Erick
>
> On Tue, Oct 28, 2014 at 2:32 PM, "Jürgen Wagner (DVT)"
>  wrote:
> > Hello Olivier,
> >   for real production use, you won't really want to use any toys like
> > post.jar or curl. You want a decent connector to whatever data source
> there
> > is, that fetches data, possibly massages it a bit, and then feeds it into
> > Solr - by means of SolrJ or directly into the web service of Solr via
> binary
> > protocols. This way, you can properly handle incremental feeding,
> processing
> > of data from remote locations (with the connector being closer to the
> data
> > source), and also source data security. Also think about what happens if
> you
> > do processing of incoming documents in Solr. What happens if Tika runs
> out
> > of memory because of PDF problems? What if this crashes your Solr node?
> In
> > our Solr projects, we generally do not do any sizable processing within
> Solr
> > as document processing and document indexing or querying have all
> different
> > scaling properties.
> >
> > "Production use" most typically is not achieved by deploying a vanilla
> Solr,
> > but rather having a bit more glue and wrappage, so the whole will fit
> your
> > requirements in terms of functionality, scaling, monitoring and
> robustness.
> > Some similar platforms like Elasticsearch try to alleviate these pains of
> > going to a production-style infrastructure, but that's at the expense of
> > flexibility and comes with limitations.
> >
> > For proof-of-concept or demonstrator-style applications, the plain tools
> out
> > of the box will be fine. For production applications, you want to have
> more
> > robust components.
> >
> > Best regards,
> > --Jürgen
> >
> >
> > On 28.10.2014 22:12, Olivier Austina wrote:
> >
> > Hi All,
> >
> > I am reading the solr documentation. I have understood that post.jar
> > <
> http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29
> >
> > is not meant for production use, cURL
> > <
> https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing
> >
> > is not recommanded. Is SolrJ better for production?  Thank you.
> > Regards
> > Olivier
> >
> >
> >
> > --
> >
> > Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
> > уважением
> > i.A. Jürgen Wagner
> > Head of Competence Center "Intelligence"
> > & Senior Cloud Consultant
> >
> > Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
> > Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864
> 1543
> > E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
> >
> > 
> > Managing Board: Jürgen Hatzipantelis (CEO)
> > Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
> > Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
> >
> >
>


UI for Solr

2014-12-23 Thread Olivier Austina
Hi,

I would like to build a user interface on top of Solr for PC and mobile. I
am wondering if there is a framework or best practice commonly used. I want
Solr features such as suggestions, autocomplete, and facets to be available
in the UI. Any suggestion is welcome. Thank you.

Regards
Olivier


Re: UI for Solr

2014-12-23 Thread Olivier Austina
Hi Alex,

Thank you for the prompt reply. I was not aware of Spring.io's Spring Data Solr.

Regards
Olivier


2014-12-23 16:50 GMT+01:00 Alexandre Rafalovitch :

> You don't expose Solr directly to the user, it is not setup for
> full-proof security out of the box. So you would need a client to talk
> to Solr.
>
> Something like Spring.io's Spring Data Solr could be one of the things
> to check. You can see an auto-complete example for it at:
> https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main
> and embedded in action at
> http://www.solr-start.com/javadoc/solr-lucene/index.html (search box
> on the top)
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 23 December 2014 at 10:45, Olivier Austina 
> wrote:
> > Hi,
> >
> > I would like to build a User Interface on top of Solr for PC and mobile.
> I
> > am wondering if there is a framework, best practice commonly used. I want
> > Solr features such as suggestion, auto complete, facet to be available
> for
> > UI. Any suggestion is welcome. Than you.
> >
> > Regards
> > Olivier
>


Architecture for PHP web site, Solr and an application

2014-12-26 Thread Olivier Austina
Hi,

I would like to query only some fields in Solr, depending on the user
input, since I know the fields.

The user sends an HTML form to the PHP website. The application gets the
fields and their content from the PHP website, and then formulates a query
to Solr based on these fields and other contextual information. Only fields
from the HTML form are used, and the forms don't all have the same fields.
The application is not yet developed. It could be in C++, Java or another
language using a database, and it uses more resources.

I am wondering which architecture is suitable for this case:
- How to make the architecture scalable (to support more users)
- How to make PHP communicate with the application if the application is
not written in PHP

Any suggestion is welcome. Thank you.

 Regards
Olivier


How to implement Auto complete, suggestion client side

2015-01-26 Thread Olivier Austina
Hi All,

I would say I am new to web technology.

I would like to implement autocomplete/suggestions in the user search box
as the user types (like Google, for example). I am using Solr as the
database. Basically I am familiar with Solr and I can formulate suggestion
queries.

But now I don't know how to implement the suggestions in the user interface.
Which technologies do I need? The website is in PHP. Any suggestions,
examples, or basic tutorials are welcome. Thank you.



Regards
Olivier


Re: How to implement Auto complete, suggestion client side

2015-01-28 Thread Olivier Austina
Hi,

Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me.

Regards
Olivier


2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch :

> You've got a lot of options depending on what you want. But since you
> seem to just want _an_ example, you can use mine from
> http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search
> box there).
>
> You can see the source for the test screen (using Spring Boot and
> Spring Data Solr as a middle-layer) and Select2 for the UI at:
> https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer.
> The Solr definition is at:
>
> https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf
>
> Other implementation pieces are in that (and another) public
> repository as well, but it's all in Java. You'll probably want to do
> something similar in PHP.
>
> Regards,
>Alex.
> 
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 26 January 2015 at 17:11, Olivier Austina 
> wrote:
> > Hi All,
> >
> > I would say I am new to web technology.
> >
> > I would like to implement auto complete/suggestion in the user search box
> > as the user type in the search box (like Google for example). I am using
> > Solr as database. Basically I am  familiar with Solr and I can formulate
> > suggestion queries.
> >
> > But now I don't know how to implement suggestion in the User Interface.
> > Which technologies should I need. The website is in PHP. Any suggestions,
> > examples, basic tutorial is welcome. Thank you.
> >
> >
> >
> > Regards
> > Olivier
>


feedback on Solr 4.x LotsOfCores feature

2013-10-07 Thread Soyez Olivier
Hello,

In my company, we use Solr in production to offer full-text search on
mailboxes.
We host dozens of millions of mailboxes, but only webmail users have this
feature (a few million). We have the following use case:
- non-static indexes with more updates (indexing and deleting) than select
requests (ratio 7:1)
- homogeneous configuration for all indexes
- not many concurrent users

We started to index mailboxes with Solr 1.4 in 2010, on a subset of
400,000 users:
- we had a cluster of 50 servers, 4 Solr instances per server, 2,000 users
per Solr instance
- we grew to 6,000 users per Solr instance, 8 Solr instances per server,
60 GB per index (~2 million users)
- we upgraded to Solr 3.5 in 2012
As the indexes grew, IOPS and response times increased more and more.

The index size was mainly due to stored fields (large .fdt files).
Retrieving these fields from the index was costly because of many seeks in
large files, with no way to limit that usage.
There is also an overhead on queries: too many results are filtered just to
keep the results concerning one user.
For these reasons and others (users not being pooled, hardware savings,
better scoring, some requests that do not support filtering), we decided to
use the LotsOfCores feature.

Our goal was to change the I/O pattern: from lots of random I/O on huge
segments to mostly sequential I/O on small segments.
For our use case it's not a big deal that the first query to a not-yet-loaded
core will be slow, and we don't need to fit all the cores into memory at
once.

We started from the SOLR-1293 issue and the LotsOfCores wiki page, and
finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1
core).
We no longer need to run so many Solr instances per node. We are now able to
have around 5 cores per Solr instance and we plan to grow to 100,000 cores
per instance.
At first, we used solr.xml persistence. All cores have
loadOnStartup="false" and transient="true" attributes, so a cold start is
very quick. The response times were better than ever, compared to the poor
response times we had before using LotsOfCores.

We added 2 core options:
- "numBuckets": create a subdirectory based on a hash of the core name %
numBuckets in the core dataDir, because all the cores cannot live in the
same directory
- "Auto", with 3 different values:
1) false: default behaviour
2) createLoad: create the core if it does not exist, and load it on the fly
on the first incoming request (update, select)
3) onlyLoad: load the core on the fly on the first incoming request
(update, select), if it exists on disk

Then, to improve performance and avoid synchronization in the solr.xml
persistence, we disabled it.
The drawback is that we can no longer see the full list of available cores
with the admin core status command, only those already warmed up.
Finally, we achieve very good performance with Solr LotsOfCores:
- Index 5 emails (avg) + commit + search: 4.9x faster response time (mean),
5.4x faster (95th percentile)
- Delete 5 documents (avg): 8.4x faster response time (mean), 7.4x faster
(95th percentile)
- Search: 3.7x faster response time (mean), 4x faster (95th percentile)

In fact, the better performance is mainly due to the small size of each
index, but also to the isolation between cores (updates and queries on many
mailboxes don't have side effects on each other).
One important thing with the LotsOfCores feature is to take care of:
- the number of file descriptors: it uses a lot of them (we needed to
increase the global max and the per-process fd limit; see the sketch after
this list)
- the value of transientCacheSize, depending on the RAM size and the
allocated PermGen size
- the ClassLoader leak that increases minor GC times when the CMS GC is
enabled (use -XX:+CMSClassUnloadingEnabled)
- the overhead of parsing solrconfig.xml and loading dependencies to open
each core
- LotsOfCores doesn't work with SolrCloud, so we store index locations
outside of Solr and have Solr proxies to route requests to the right
instance.
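
(A sketch of the kind of checks and settings meant by "increase global max and per-process fd"; the numbers and the <solr-pid> placeholder are illustrative, not taken from this message.)

  # limits and current usage for a running Solr process (replace <solr-pid>)
  grep "Max open files" /proc/<solr-pid>/limits
  ls /proc/<solr-pid>/fd | wc -l          # descriptors currently open

  # per-process limit for the solr user, e.g. in /etc/security/limits.conf:
  #   solr  soft  nofile  65535
  #   solr  hard  nofile  65535

  # system-wide maximum
  sysctl fs.file-max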

Outside of production, we tried the core discovery feature in Solr 4.4 with
lots of cores.
When you start, it spends a lot of time discovering cores because of the
large number of cores, and meanwhile all requests fail
(SolrDispatchFilter.init() not done yet). It would be great to have, for
example, an option to run core discovery in the background, or just to be
able to disable it, as we do in our use case.

If someone is interested in these new options for the LotsOfCores feature,
just tell me.



Re: Re: feedback on Solr 4.x LotsOfCores feature

2013-10-10 Thread Soyez Olivier
The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316,
including the new core options:
- "numBuckets": create a subdirectory based on a hash of the core name %
numBuckets in the core dataDir
- "Auto", with 3 different values:
  1) false: default behaviour
  2) createLoad: create the core if it does not exist, and load it on the fly
on the first incoming request (update, select)
  3) onlyLoad: load the core on the fly on the first incoming request
(update, select), if it exists on disk

Concerning:
- sharing the underlying solrconfig object: the configset introduced in JIRA
SOLR-4478 seems to be the solution for non-SolrCloud mode.
We need to test it for our use case. If another solution exists, please tell
me. We are very interested in such functionality, and in contributing if we
can.

- the possibility of LotsOfCores in SolrCloud: we don't know in detail how
SolrCloud works.
But one possible limit is the maximum number of entries that can be added to
a ZooKeeper node.
Maybe a solution would be just a kind of hashing in the ZooKeeper tree.

- the time to discover cores in Solr 4.4: with a spinning disk under Linux,
all cores having transient="true" and loadOnStartup="false", and the Linux
buffer cache empty before starting Solr:
15K cores takes around 4 minutes. It is linear in the number of cores, so for
50K it's more than 13 minutes. In fact, it corresponds to the time needed to
read all the core.properties files.
Doing that in the background and blocking on a request until core discovery
is complete would not work for us (due to the worst case).
So we will just disable core discovery, because we don't need to know all
the cores from the start: start Solr without any core entries in solr.xml,
and use the "Auto" core option to create-and-load or only load the core on
the fly, based on the existence of the core on disk (absolute path calculated
from the core name).

Thanks for your interest,

Olivier
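
For illustration only: the automatic create/load-on-first-request behaviour
described above comes from the SOLR-5316 patch, but stock Solr can at least
create and load a core on demand through the CoreAdmin API. A minimal SolrJ
sketch (assuming SolrJ 3.6+ for HttpSolrServer; the URL, core name and
instance dir below are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreOnDemand {
    public static void main(String[] args) throws Exception {
        // Core admin requests go to the container URL, not to a specific core.
        SolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // Creates and loads a core whose instance directory (with conf/)
        // already exists on disk; with core discovery, core.properties can
        // additionally set transient=true and loadOnStartup=false to keep it
        // out of memory until it is actually used.
        CoreAdminRequest.createCore("user_00042", "/indexes/user_00042", admin);
    }
}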

From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, 7 October 2013 14:33
To: solr-user@lucene.apache.org
Subject: Re: feedback on Solr 4.x LotsOfCores feature

Thanks for the great writeup! It's always interesting to see how
a feature plays out "in the real world". A couple of questions
though:

bq: We added 2 Cores options :
Do you mean you patched Solr? If so, are you willing to share the code
back? If both are "yes", please open a JIRA, attach the patch and assign
it to me.

bq:  the number of file descriptors, it used a lot (need to increase global
max and per process fd)

Right, this makes sense since you have a bunch of cores all with their
own descriptors open. I'm assuming that you hit a rather high max
number and it stays pretty steady

bq: the overhead to parse solrconfig.xml and load dependencies to open
each core

Right, I tried to look at sharing the underlying solrconfig object but
it seemed pretty hairy. There are some extensive comments in the
JIRA of the problems I foresaw. There may be some action on this
in the future.

bq: lotsOfCores doesn’t work with SolrCloud

Right, we haven't concentrated on that, it's an interesting problem.
In particular it's not clear what happens when nodes go up/down,
replicate, resynch, all that.

bq: When you start, it spend a lot of times to discover cores due to a big

How long? I tried 15K cores on my laptop and I think I was getting 15
second delays or roughly 1K cores discovered/second. Is your delay
on the order of 50 seconds with 50K cores?

I'm not sure how you could do that in the background, but I haven't
thought about it much. I tried multi-threading core discovery and that
didn't help (SSD disk), I assumed that the problem was mostly I/O
contention (but didn't prove it). What if a request came in for a core
before you'd found it? I'm not sure what the right behavior would be
except perhaps to block on that request until core discovery was
complete. Hm. How would that work for your case? That
seems do-able.

BTW, so far you get the prize for the most cores on a node I think.

Thanks again for the great feedback!

Erick

On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier
 wrote:
> Hello,
>
> In my company, we use Solr in production to offer full text search on
> mailboxes.
> We host dozens of millions of mailboxes, but only webmail users have this
> feature (a few million).
> We have the following use case :
> - non-static indexes, with more updates (indexing and deleting) than
> select requests (ratio 7:1)
> - homogeneous configuration for all indexes
> - not many users at the same time
>
> We started to index mailboxes with Solr 1.4 in 2010, on a subset of
> 400,000 users.
> - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr
> instance
> - we grow to 6000 

Re: solr distributed search don't work

2011-09-01 Thread olivier sallou
   

 
   explicit
   enum
   1
   10
  192.168.1.6/solr/,192.168.1.7/solr/
 
  

2011/8/19 Li Li 

> could you please show me your configuration in solrconfig.xml?
>
> On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou
>  wrote:
> > Hi,
> > I do not use spell but I use distributed search, using qt=spell is
> correct,
> > should not use qt=\spell.
> > For "shards", I specify it in solrconfig directly, not in url, but should
> > work the same.
> > Maybe an issue in your spell request handler.
> >
> >
> > 2011/8/19 Li Li 
> >
> >> hi all,
> >> I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
> >> but there is something wrong.
> >> the url given my the wiki is
> >>
> >>
> http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
> >> but it does not work. I trace the codes and find that
> >> qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell
> >> After modification of url, It return all documents but nothing
> >> about spell check.
> >> I debug it and find the
> >> AbstractLuceneSpellChecker.getSuggestions() is called.
> >>
> >
>
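
As a side note, the shards / shards.qt parameters shown in the solrconfig
fragment above can also be set per request from SolrJ. A minimal sketch
(host names are placeholders; it assumes a spellcheck-enabled handler
registered as "/spell" in solrconfig.xml and SolrJ 3.6+ for HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSpellcheck {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://solr-shard1:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.set("qt", "/spell");           // handler on the node we query
        q.set("shards.qt", "/spell");    // handler each shard should use
        q.set("shards", "solr-shard1:8983/solr,solr-shard2:8983/solr");
        q.set("spellcheck", "true");
        q.set("spellcheck.q", "toyata");

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getSpellCheckResponse().getSuggestions());
    }
}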


Solr 3.5 MoreLikeThis on Date fields

2012-01-16 Thread Jaco Olivier
Hi Everyone,

Please help out if you know what is going on.
We are upgrading to Solr 3.5 (from 1.4.1) and are busy with a re-index and test 
of our data.

Everything seems OK, but date fields seem to be "broken" when used with the 
MoreLikeThis handler 
(I also saw the same error on date fields using the highlighter in another 
forum post: "Invalid Date String for highlighting any date field match @ Mon 
2011/08/15 13:10").
* I deleted the index/core and only loaded a few records, and I still get the 
error when using MoreLikeThis with "docdate" as part of the mlt.fl params.
* I double-checked all the data that was loaded; the dates parse 100% and I 
can see no problems with any of the data loaded.

Type: 
Definition:   
A sample result: 1999-06-28T00:00:00Z

THE MLT QUERY:

Jan 16, 2012 4:09:16 PM org.apache.solr.core.SolrCore execute
INFO: [legal_spring] webapp=/solr path=/select 
params={mlt.fl=doctitle,pld_pubtype,docdate,pld_cluster,pld_port,pld_summary,alltext,subclass&mlt.mintf=1&mlt=true&version=2.2&fl=doc_id,doctitle,docdate,prodtype&qt=mlt&mlt.boost=true&mlt.qf=doctitle^5.0+alltext^0.2&json.nl=map&wt=json&rows=50&mlt.mindf=1&mlt.count=50&start=0&q=doc_id:PLD23996}
 status=400 QTime=1

THE ERROR:

Jan 16, 2012 4:09:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'94046400'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at 
org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106)
at 
org.apache.solr.analysis.TrieTokenizer.<init>(TrieTokenizerFactory.java:76)
at 
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51)
at 
org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41)
at 
org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68)
at 
org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75)
at 
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
at 
org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:876)
at 
org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:820)
at 
org.apache.lucene.search.similar.MoreLikeThis.like(MoreLikeThis.java:629)
at 
org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:311)
at 
org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:149)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

Sincerely,
Jaco Olivier
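
A common workaround for this kind of error (a sketch only, not a fix confirmed
in this thread) is to leave the trie date field out of mlt.fl: the stack trace
shows MoreLikeThis re-analyzing the docdate value with the trie date tokenizer,
which then fails to parse it. Field names and the core URL below are taken from
the query above; SolrJ 3.6+ is assumed for HttpSolrServer:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class MltWithoutDateFields {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/legal_spring");

        SolrQuery q = new SolrQuery("doc_id:PLD23996");
        q.set("qt", "mlt");    // dispatches to the MoreLikeThis handler
        q.set("mlt", "true");
        // docdate removed from the similarity fields to avoid the
        // "Invalid Date String" parse error on the trie date field
        q.set("mlt.fl",
              "doctitle,pld_pubtype,pld_cluster,pld_port,pld_summary,alltext,subclass");
        q.set("mlt.mintf", "1");
        q.set("mlt.mindf", "1");
        q.set("mlt.count", "50");

        System.out.println(solr.query(q).getResponse());
    }
}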


Faceted search outofmemory

2010-06-29 Thread olivier sallou
Hi,
I am trying to run a faceted search on a very large index (around 200 GB with
200M docs).
I get an out-of-memory error. With no facets it works fine.

There are quite a few questions around this, but I could not find the answer.
How can we estimate the required memory when facets are used, so that I can
scale my server/index correctly to handle it?

Thanks

Olivier


Re: Faceted search outofmemory

2010-06-29 Thread olivier sallou
How do I page over facets?

2010/6/29 Ankit Bhatnagar 

>
> Did you trying paging them?
>
>
> -Original Message-
> From: olivier sallou [mailto:olivier.sal...@gmail.com]
> Sent: Tuesday, June 29, 2010 2:04 PM
> To: solr-user@lucene.apache.org
> Subject: Faceted search outofmemory
>
> Hi,
> I try to make a faceted search on a very large index (around 200GB with
> 200M
> doc).
> I have an out of memory error. With no facet it works fine.
>
> There are quite many questions around this but I could not find the answer.
> How can we know the required memory when facets are used so that I try to
> scale my server/index correctly to handle it.
>
> Thanks
>
> Olivier
>


Re: Re: Faceted search outofmemory

2010-06-29 Thread olivier sallou
I already use facet.limit in my query. I also tried facet.method=enum and,
though it does not seem to fix everything, some requests now run without the
out-of-memory error.
Best would be to have a rule for calculating the required memory for this type
of query.

2010/6/29 Markus Jelsma 

> http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
>
> -Original message-
> From: olivier sallou 
> Sent: Tue 29-06-2010 20:11
> To: solr-user@lucene.apache.org;
> Subject: Re: Faceted search outofmemory
>
> How do make paging over facets?
>
> 2010/6/29 Ankit Bhatnagar 
>
> >
> > Did you trying paging them?
> >
> >
> > -Original Message-
> > From: olivier sallou [mailto:olivier.sal...@gmail.com]
> > Sent: Tuesday, June 29, 2010 2:04 PM
> > To: solr-user@lucene.apache.org
> > Subject: Faceted search outofmemory
> >
> > Hi,
> > I try to make a faceted search on a very large index (around 200GB with
> > 200M
> > doc).
> > I have an out of memory error. With no facet it works fine.
> >
> > There are quite many questions around this but I could not find the
> answer.
> > How can we know the required memory when facets are used so that I try to
> > scale my server/index correctly to handle it.
> >
> > Thanks
> >
> > Olivier
> >
>


Re: Faceted search outofmemory

2010-06-29 Thread olivier sallou
I have given 6G to Tomcat. Using facet.method=enum and facet.limit seems to
fix the issue in a few tests, but I know that it is not a "final" solution; it
will only work under certain configurations.

The real "issue" is being able to know the required RAM for an index...

2010/6/29 Nagelberg, Kallin 

> How much memory have you given the solr jvm? Many servlet containers have
> small amount by default.
>
> -Kal
>
> -Original Message-
> From: olivier sallou [mailto:olivier.sal...@gmail.com]
> Sent: Tuesday, June 29, 2010 2:04 PM
> To: solr-user@lucene.apache.org
> Subject: Faceted search outofmemory
>
> Hi,
> I try to make a faceted search on a very large index (around 200GB with
> 200M
> doc).
> I have an out of memory error. With no facet it works fine.
>
> There are quite many questions around this but I could not find the answer.
> How can we know the required memory when facets are used so that I try to
> scale my server/index correctly to handle it.
>
> Thanks
>
> Olivier
>
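
For reference, the facet.limit / facet.method=enum combination discussed above
can be paged with facet.offset. A minimal SolrJ sketch (the field name
"category" and the URL are placeholders; SolrJ 3.6+ is assumed for
HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PagedFacets {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                    // only facet counts are needed
        q.setFacet(true);
        q.addFacetField("category");
        q.set("facet.method", "enum");   // enumerates terms with the filterCache
                                         // instead of a large in-memory field cache
        q.setFacetLimit(100);            // page size
        q.set("facet.offset", "0");      // increase by 100 for the next page
        q.setFacetMinCount(1);

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFacetField("category").getValues());
    }
}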


Re: Tag generation

2010-07-15 Thread Olivier Dobberkau

On 15.07.2010 at 17:34, kenf_nc wrote:

> A colleague mentioned that he knew of services where you pass some content
> and it spits out some suggested Tags or Keywords that would be best suited
> to associate with that content.
> 
> Does anyone know if there is a contrib to Solr or Lucene that does something
> like this? Or a third party tool that can be given a solr index or solr
> query and it comes up with some good Tag suggestions?

Hi

there is something from http://www.zemanta.com/
and something from Basis Tech: http://www.basistech.com/

I am not sure if this would help. You could have a look at

http://uima.apache.org/

greetings,

olivier

--

Olivier Dobberkau



Spatial filtering

2010-07-19 Thread Olivier Ricordeau

Hi folks,

I can't manage to have the new spatial filtering feature (added in 
r962727 by Grant Ingersoll, see 
https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to 
get all the documents located within a circle defined by its center and 
radius.
I've modified my query url as specified in 
http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add 
the "pt", "d" and "meas" parameters. Here is what my query parameters 
look like (from Solr's response with debug mode activated):


[params] => Array
(
[explainOther] => true
[mm] => 2<-75%
[d] => 50
[sort] => date asc
[qf] =>
[wt] => php
[rows] => 5000
[version] => 2.2
[fl] => object_type object_id score
[debugQuery] => true
[start] => 0
[q] => *:*
[meas] => hsin
[pt] => 48.85341,2.3488
[bf] =>
[qt] => standard
[fq] => +object_type:Concert 
+date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z]

)



With this query, I get 3859 results. And some (lots) of the found 
documents are not located within the circle! :(
If I run the same query without spatial filtering (if I remove the "pt", 
"d" and "meas" parameters from the url), I get 3859 results too. So it 
looks like my spatial filtering constraint is not taken into account in 
the first search query (the one where "pt", "d" and "meas" are set). Is 
the wiki's doc up to date?


In the comments of SOLR-1568, I've seen someone talking about adding 
"{!sfilt fl=latlon_field_name}". So I tried the following request:


[params] => Array
(
[explainOther] => true
[mm] => 2<-75%
[d] => 50
[sort] => date asc
[qf] =>
[wt] => php
[rows] => 5000
[version] => 2.2
[fl] => object_type object_id score
[debugQuery] => true
[start] => 0
[q] => *:*
[meas] => hsin
[pt] => 48.85341,2.3488
[bf] =>
[qt] => standard
[fq] => +object_type:Concert 
+date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] +{!sfilt 
fl=coords_lat_lon,units=km,meas=hsin}

)

This leads to 2713 results (which is smaller than 3859, good). But some 
(lots) of the results are once more out of the circle :(


Can someone help me get spatial filtering working? I really don't 
understand the search results I'm getting.


Cheers,
Olivier

--
- *Olivier RICORDEAU* -
 oliv...@ricordeau.org
http://olivier.ricordeau.org



How to get the list of all available fields in a (sharded) index

2010-07-19 Thread olivier sallou
Hi,
I cannot find any info on how to get the list of current fields in an index
(possibly sharded). With dynamic fields, I cannot simply parse the schema to
know which fields are available.
Is there any way to get it via a request (or something easily programmable)? I
know the information is available in one of the Lucene-generated files, but I'd
like to get it via a query for my whole index.

Thanks

Olivier
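
One option (a sketch, not necessarily the only way) is the Luke request
handler, which reports the fields actually present in the Lucene index,
including those created by dynamic field rules. /admin/luke is not distributed,
so for a sharded index the request has to be sent to each shard and the field
lists merged. Assuming SolrJ 3.6+ for HttpSolrServer:

import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class ListIndexFields {
    public static void main(String[] args) throws Exception {
        // Run once per shard and union the resulting field names.
        HttpSolrServer shard = new HttpSolrServer("http://localhost:8983/solr");

        LukeRequest luke = new LukeRequest();   // hits /admin/luke
        luke.setNumTerms(0);                    // field names only, no top terms

        LukeResponse rsp = luke.process(shard);
        for (Map.Entry<String, LukeResponse.FieldInfo> e : rsp.getFieldInfo().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().getType());
        }
    }
}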


Re: dismax request handler without q

2010-07-19 Thread olivier sallou
Hi,
this is not very clear: if you need to query only keyphrase, why don't you
query it directly? e.g. q=keyphrase:hotel ?
Furthermore, why dismax if only the keyphrase field is of interest? dismax is
used to query multiple fields automatically.

In any case, dismax does not appear in your query (via the query type). Is it
set in your config for your default request handler?

2010/7/20 Chamnap Chhorn 

> I wonder how could i make a query to return only *all books* that has
> keyphrase "web development" using dismax handler? A book has multiple
> keyphrases (keyphrase is multivalued column). Do I have to pass q
> parameter?
>
>
> Is it the correct one?
> http://locahost:8081/solr/select?&q=hotel&fq=keyphrase:%20hotel
>
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>


Re: Spatial filtering

2010-07-20 Thread Olivier Ricordeau



On 20/07/2010 04:18, Lance Norskog wrote:

Add the debugQuery=true parameter and it will show you the Lucene
query tree, and how each document is evaluated. This can help with the
more complex queries.


Do you see something wrong?

 [debug] => Array
(
[rawquerystring] => *:*
[querystring] => *:*
[parsedquery] => MatchAllDocsQuery(*:*)
[parsedquery_toString] => *:*
[explain] => Array
(
[doc_45269] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[doc_50206] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[doc_50396] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[doc_51199] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm

[]

)

[QParser] => LuceneQParser
[filter_queries] => Array
(
[0] => +object_type:Concert 
+date:[2010-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] +{!sfilt 
fl=coords_lat_lon,units=km,meas=hsin}

)

[parsed_filter_queries] => Array
(
[0] => +object_type:Concert +date:[127958400 TO 
1311206399000] +name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin}

)

[...]

I'm not sure about the "parsed_filter_queries" entry. It looks like the 
"+{!sfilt fl=coords_lat_lon,units=km,meas=hsin}" is not interpreted correctly 
(it seems to be interpreted as a range). Does anyone know what the 
right syntax is? This is not documented...


Cheers,
Olivier



On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau
  wrote:

Hi folks,

I can't manage to have the new spatial filtering feature (added in r962727
by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568)
working. I'm trying to get all the documents located within a circle defined
by its center and radius.
I've modified my query url as specified in
http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the
"pt", "d" and "meas" parameters. Here is what my query parameters looks like
(from Solr's response with debug mode activated):

[params] =>  Array
(
[explainOther] =>  true
[mm] =>  2<-75%
[d] =>  50
[sort] =>  date asc
[qf] =>
[wt] =>  php
[rows] =>  5000
[version] =>  2.2
[fl] =>  object_type object_id score
[debugQuery] =>  true
[start] =>  0
[q] =>  *:*
[meas] =>  hsin
[pt] =>  48.85341,2.3488
[bf] =>
[qt] =>  standard
[fq] =>  +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z]
)



With this query, I get 3859 results. And some (lots) of the found documents
are not located whithin the circle! :(
If I run the same query without spatial filtering (if I remove the "pt", "d"
and "meas" parameters from the url), I get 3859 results too. So it looks
like my spatial filtering constraint is not taken into account in the first
search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc
up to date?

In the comments of SOLR-1568, I've seen someone talking about adding
"{!sfilt fl=latlon_field_name}". So I tried the following request:

[params] =>  Array
(
[explainOther] =>  true
[mm] =>  2<-75%
[d] =>  50
[sort] =>  date asc
[qf] =>
[wt] =>  php
[rows] =>  5000
[version] =>  2.2
[fl] =>  object_type object_id score
[debugQuery] =>  true
[start] =>  0
[q] =>  *:*
[meas] =>  hsin
[pt] =>  48.85341,2.3488
[bf] =>
[qt] =>  standard
[fq] =>  +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin}
)

This leads to 2713 results (which is smaller than 3859, good). But some
(lots) of the results are once more out of the circle :(

Can someone help me get spatial filtering working? I really don't understand
the search results I'm getting.

Cheers,
Olivier

--
- *Olivier RICORDEAU* -
  oliv...@ricordeau.org
http://olivier.ricordeau.org








--
- *Olivier RICORDEAU* -
 oliv...@ricordeau.org
http://olivier.ricordeau.org



Re: Spatial filtering

2010-07-20 Thread Olivier Ricordeau
Ok, I have found a big bug in my indexing script. Things are getting 
better. I managed to get my parsed_filter_query to:
+coords_lat_lon_0_latLon:[48.694179707855874 TO 49.01213545059667] 
+coords_lat_lon_1_latLon:[2.1079512793239767 TO 2.5911832073858765]


For the record, here are the parameters which made it work:
[params] => Array
(
[explainOther] => true
[mm] => 2<-75%
[d] => 25
[sort] => date asc
[qf] =>
[wt] => php
[rows] => 5000
[version] => 2.2
[fl] => * score
[debugQuery] => true
[start] => 0
[q] => *:*
[meas] => hsin
[pt] => 48.85341,2.3488
[bf] =>
[qt] => standard
[fq] => {!sfilt fl=coords_lat_lon} 
+object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]

)
But I am facing one problem: the " +object_type:Concert + 
date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]" part of my fq 
parameter is not taken into account (see the parsed_filter_query above).

So here is my question:
How can I mix the "{!sfilt fl=coords_lat_lon}" part of the fq parameter 
with "usual" fq parameters (eg: "+object_type:Concert")?


Can anyone help?

Regards,
Olivier
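
One way to avoid the parsing clash (a sketch only, using the trunk-era {!sfilt}
syntax from this thread; in released Solr 3.1+ the equivalent parser is
{!geofilt} with an sfield parameter) is to send the spatial filter and the
ordinary filters as separate fq parameters, so the local params only apply to
the spatial clause. With SolrJ (3.6+ assumed for HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SpatialPlusNormalFilters {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        // Each filter goes in its own fq, so {!sfilt ...} is not mixed with
        // the other clauses in a single filter string.
        q.addFilterQuery("{!sfilt fl=coords_lat_lon}");
        q.addFilterQuery("+object_type:Concert "
            + "+date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]");
        q.set("pt", "48.85341,2.3488");   // circle center
        q.set("d", "25");                 // radius in km

        System.out.println(solr.query(q).getResults().getNumFound());
    }
}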


On 20/07/2010 09:53, Olivier Ricordeau wrote:



On 20/07/2010 04:18, Lance Norskog wrote:

Add the debugQuery=true parameter and it will show you the Lucene
query tree, and how each document is evaluated. This can help with the
more complex queries.


Do you see something wrong?

[debug] => Array
(
[rawquerystring] => *:*
[querystring] => *:*
[parsedquery] => MatchAllDocsQuery(*:*)
[parsedquery_toString] => *:*
[explain] => Array
(
[doc_45269] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[doc_50206] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[doc_50396] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[doc_51199] =>
1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm

[]

)

[QParser] => LuceneQParser
[filter_queries] => Array
(
[0] => +object_type:Concert +date:[2010-07-20T00:00:00Z TO
2011-07-20T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin}
)

[parsed_filter_queries] => Array
(
[0] => +object_type:Concert +date:[127958400 TO 1311206399000]
+name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin}
)

[...]

I'm not sure about the "parsed_filter_queries" entry. It looks like the
"+{!sfilt fl=coords_lat_lon,units=km,meas=hsin}" is not well interpreted
(seems like it's interpreted as a range). Does anyone know what the
right syntax? This is not documented...

Cheers,
Olivier



On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau
 wrote:

Hi folks,

I can't manage to have the new spatial filtering feature (added in
r962727
by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568)
working. I'm trying to get all the documents located within a circle
defined
by its center and radius.
I've modified my query url as specified in
http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to
add the
"pt", "d" and "meas" parameters. Here is what my query parameters
looks like
(from Solr's response with debug mode activated):

[params] => Array
(
[explainOther] => true
[mm] => 2<-75%
[d] => 50
[sort] => date asc
[qf] =>
[wt] => php
[rows] => 5000
[version] => 2.2
[fl] => object_type object_id score
[debugQuery] => true
[start] => 0
[q] => *:*
[meas] => hsin
[pt] => 48.85341,2.3488
[bf] =>
[qt] => standard
[fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z
TO 2011-07-19T23:59:59Z]
)



With this query, I get 3859 results. And some (lots) of the found
documents
are not located whithin the circle! :(
If I run the same query without spatial filtering (if I remove the
"pt", "d"
and "meas" parameters from the url), I get 3859 results too. So it looks
like my spatial filtering constraint is not taken into account in the
first
search query (the one where "pt", "d" and "meas" are set). Is the
wiki's doc
up to date?

In the comments of SOLR-1568, I've seen someone talking about adding
"{!sfilt fl=latlon_field_name}". So I tried the following request:

[params] => Array
(
[explainOther] => true
[mm] => 2<-75%
[d] => 50
[sort] => date asc
[qf] =>
[wt] => php
[rows] => 5000
[version] => 2.2
[fl] => object_type object_id score
[debugQuery] => true
[start] => 0
[q] => *:*
[meas] =>

Re: dismax request handler without q

2010-07-20 Thread olivier sallou
q will search in the defaultSearchField if no field name is set, but you can
specify in your "q" param the fields you want to search in.

Dismax is a handler where you can specify a number of fields to look in for
the input query. In this case, you do not specify the fields and dismax will
look in the fields specified in its configuration.
However, by default, dismax is not used; it needs to be enabled with the
query type parameter (qt=dismax).

In the default Solr config, you can call ...solr/select?q=keyphrase:hotel if
keyphrase is a declared field in your schema.
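
To make that concrete, a minimal SolrJ sketch of both options (it assumes a
dismax request handler registered under the name "dismax" in solrconfig.xml,
as in the stock example config, and SolrJ 3.6+ for HttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class KeyphraseQueries {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");

        // Option 1: dismax handler, restricted to the keyphrase field via qf
        SolrQuery dismax = new SolrQuery("web development");
        dismax.set("qt", "dismax");
        dismax.set("qf", "keyphrase");

        // Option 2: standard handler, filter on the multivalued keyphrase field
        SolrQuery filtered = new SolrQuery("*:*");
        filtered.addFilterQuery("keyphrase:\"web development\"");

        System.out.println(solr.query(dismax).getResults().getNumFound());
        System.out.println(solr.query(filtered).getResults().getNumFound());
    }
}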

2010/7/20 Chamnap Chhorn 

> I can't put q=keyphrase:hotel in my request using dismax handler. It
> returns
> no result.
>
> On Tue, Jul 20, 2010 at 1:19 PM, Chamnap Chhorn  >wrote:
>
> > There are some default configuration on my solrconfig.xml that I didn't
> > show you. I'm a little confused when reading
> > http://wiki.apache.org/solr/DisMaxRequestHandler#q. I think q is for
> plain
> > user input query.
> >
> >
> > On Tue, Jul 20, 2010 at 12:08 PM, olivier sallou <
> olivier.sal...@gmail.com
> > > wrote:
> >
> >> Hi,
> >> this is not very clear, if you need to query only keyphrase, why don't
> you
> >> query directly it? e.g. q=keyphrase:hotel ?
> >> Furthermore, why dismax if only keyphrase field is of interest? dismax
> is
> >> used to query multiple fields automatically.
> >>
> >> At least dismax do not appear in your query (using query type). It is
> set
> >> in
> >> your config for your default request handler?
> >>
> >> 2010/7/20 Chamnap Chhorn 
> >>
> >> > I wonder how could i make a query to return only *all books* that has
> >> > keyphrase "web development" using dismax handler? A book has multiple
> >> > keyphrases (keyphrase is multivalued column). Do I have to pass q
> >> > parameter?
> >> >
> >> >
> >> > Is it the correct one?
> >> > http://locahost:8081/solr/select?&q=hotel&fq=keyphrase:%20hotel
> >> >
> >> > --
> >> > Chhorn Chamnap
> >> > http://chamnapchhorn.blogspot.com/
> >> >
> >>
> >
> >
> >
> > --
> > Chhorn Chamnap
> > http://chamnapchhorn.blogspot.com/
> >
>
>
>
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>


Solr and Lucene in South Africa

2010-07-30 Thread Jaco Olivier
Hi to all Solr/Lucene Users...

Our team had a discussion today regarding the Solr/Lucene community closer to 
home.
I am hereby putting out an SOS to all Solr/Lucene users in the South African 
market and wish to organize a meet-up (or user support group) if at all 
possible.
It would be great to share some triumphs and pitfalls that were experienced.

* Sorry for hogging the user mailing list with a non-technical question, but I 
think this is the easiest way to get it done :)

Jaco Olivier
Web Specialist



Replication and CPU

2010-10-12 Thread Olivier RICARD

Hello,

I set up a server for Solr replication. I used 2 cores and configured 
replication for each one. I followed the tutorial at 
http://wiki.apache.org/solr/SolrReplication.


Replication is OK for each core. However, CPU usage is at 100% on 
the slave. The master and slave are 2 servers with the same hardware 
configuration. I don't understand what can cause the problem. The slave 
is launched by:



java -Dsolr.solr.home=/solr/multicore -Denable.master=false 
-Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar 
start.jar


If I comment the replication the server is OK.

Anyone have an idea ?

Regards,
Olivier


Re: Replication and CPU

2010-10-12 Thread Olivier RICARD

Hello Peter,

On the slave server http://slave/solr/core0/admin/replication/index.jsp

Poll Interval: 00:30:00
Local Index - Index Version: 1284026488242, Generation: 13102
Location: /solr/multicore/core0/data/index
Size: 26.9 GB
Times Replicated Since Startup: 289
Previous Replication Done At: Tue Oct 12 12:00:00 GMT+02:00 2010
Config Files Replicated At: 1286790818824
Config Files Replicated: [solrconfig_slave.xml]
Times Config Files Replicated Since Startup: 1
Next Replication Cycle At: Tue Oct 12 12:30:00 GMT+02:00 2010

The request Handler on the slave  :


<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">00:30:00</str>
  </lst>
</requestHandler>



I increased the poll interval because I thought that there were too many 
changes. Currently there are no changes on the master and the slave is 
still at 100% CPU.



On the master, I have



<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt</str>
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>



Regards,
Olivier


On 12/10/2010 12:11, Peter Karich wrote:

Hi Olivier,

maybe the slave replicates after startup? check replication status here:
http://localhost/solr/admin/replication/index.jsp

what is your poll frequency (could you paste the replication part)?

Regards,
Peter.


Hello,

I setup a server for the replication of Solr. I used 2 cores and for
each one I specified the replication. I followed the tutorial on
http://wiki.apache.org/solr/SolrReplication.

The replication is OK for each cores. However the CPU is used to 100%
on the slave. The master and slave are 2 servers with the same
hardware configuration. I don't understand which can cause the
problem. The slave is launched by :


java -Dsolr.solr.home=/solr/multicore -Denable.master=false
-Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar
start.jar

If I comment the replication the server is OK.

Anyone have an idea ?

Regards,
Olivier








Re: Can solr index folder can be moved from one system to another?

2012-03-22 Thread olivier sallou
The index is not tied to its directory; there is no path information in the
index. You can create an index and then move it anywhere (or merge it with
another one).

I often do this; there is no issue.

Olivier

2012/3/22 ravicv 

> Hi Tomás,
>
> I can not use Solr replcation in my scenario. My requirement is to gzip the
> solr index folder and send to dotnet system through webservice.
> Then in dotnet the same index folder should be unzipped and same folder
> should be used as an index folder through solrnet .
>
> Whether my requirement is possible?
>
> Thanks
> Ravi
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-solr-index-folder-can-be-moved-from-one-system-to-another-tp3844919p3847725.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

gpg key id: 4096R/326D8438  (keyring.debian.org)

Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438


Solr Cell and operations on metadata extracted

2011-05-16 Thread Olivier Tavard
Hi,



I have a question about Solr Cell please.

I index some files. For example, if I want to extract the filename, apply
a hash function to it like MD5 and then store it in Solr: is the correct way
to use Tika "manually" to extract the metadata I want, do the
transformations on it and then send it to Solr?

I can't use Solr Cell directly in this case because I can't modify
the extracted metadata, right?





Thanks,



Olivier
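
Using Tika directly is one common way to do this; a minimal sketch of that path
(the field names, the core URL and the choice of hashing the filename with MD5
are illustrative only, and Tika plus SolrJ 3.6+ are assumed on the classpath):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class ManualTikaIndexer {
    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);

        // 1. Extract text and metadata with Tika
        BodyContentHandler text = new BodyContentHandler(-1);   // no size limit
        Metadata meta = new Metadata();
        InputStream in = new FileInputStream(file);
        try {
            new AutoDetectParser().parse(in, text, meta, new ParseContext());
        } finally {
            in.close();
        }

        // 2. Transform the metadata, e.g. hash the filename with MD5
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        StringBuilder hash = new StringBuilder();
        for (byte b : md5.digest(file.getName().getBytes("UTF-8"))) {
            hash.append(String.format("%02x", b));
        }

        // 3. Send the result to Solr as a regular document
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", hash.toString());
        doc.addField("filename", file.getName());
        doc.addField("content_type", meta.get(Metadata.CONTENT_TYPE));
        doc.addField("content", text.toString());

        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        solr.add(doc);
        solr.commit();
    }
}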


Re: how to request for Json object

2011-06-02 Thread olivier sallou
Ajax does not allow requests to another domain.
The only way, unless using server-side requests, is to go through a proxy that
hides the host origin, so that the Ajax request thinks both servers are the
same.

2011/6/2 Romi 

> How to parse Json through ajax when your ajax pager is on one
> server(Tomcat)and Json object is of onther server(solr server). i mean i
> have to make a request to another server, how can i do it .
>
> -
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014138.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SOlr upgrade: Invalid version (expected 2, but 1) error when using shards

2011-08-16 Thread olivier sallou
Hi,
I just migrated to solr 3.3 from 1.4.1.
My index is still in 1.4.1 format (will be migrated soon).

I have an error when I use sharding with the new version:

org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid
version (expected 2, but 1) or the data in not in 'javabin' format

However, if I request each shard independently (/request), the answer is
correct. So the error is triggered only by the shard mechanism.

While I plan to upgrade my indexes, I'd like to understand the issue,
e.g. is it an "upgrade" issue, or do shards not support using an "old" format?

Thanks

Olivier


lucene 3 and merge/optimize

2011-08-18 Thread olivier sallou
Hi,
after an upgrade to Solr/Lucene 3, I tried to change the code to remove
deprecated functions, though the new MergePolicy etc. are not really
clear.

I now have issues with the merge and optimize functions.

I have a command-line application (Java/Lucene API) that merges multiple
indexes into a single one, or optimizes an existing index (this is done
offline).

When I execute my code, the merge creates a new index, but it looks to contain
more files than before (with Solr 1.4.1), why not...
When I try to optimize, the code says OK, but I still have many files, segments
: (below for a very small example)
_0.fdt  _0.tis  _1.tii  _2.prx  _3.nrm  _4.frq  _5.fnm  _6.fdx  _7.fdt
 _7.tis  _8.tii  _9.prx  _a.nrm  _b.frq
_0.fdx  _1.fdt  _1.tis  _2.tii  _3.prx  _4.nrm  _5.frq  _6.fnm  _7.fdx
 _8.fdt  _8.tis  _9.tii  _a.prx  _b.nrm
_0.fnm  _1.fdx  _2.fdt  _2.tis  _3.tii  _4.prx  _5.nrm  _6.frq  _7.fnm
 _8.fdx  _9.fdt  _9.tis  _a.tii  _b.prx
_0.frq  _1.fnm  _2.fdx  _3.fdt  _3.tis  _4.tii  _5.prx  _6.nrm  _7.frq
 _8.fnm  _9.fdx  _a.fdt  _a.tis  _b.tii
_0.nrm  _1.frq  _2.fnm  _3.fdx  _4.fdt  _4.tis  _5.tii  _6.prx  _7.nrm
 _8.frq  _9.fnm  _a.fdx  _b.fdt  _b.tis
_0.prx  _1.nrm  _2.frq  _3.fnm  _4.fdx  _5.fdt  _5.tis  _6.tii  _7.prx
 _8.nrm  _9.frq  _a.fnm  _b.fdx  segments_1
_0.tii  _1.prx  _2.nrm  _3.frq  _4.fnm  _5.fdx  _6.fdt  _6.tis  _7.tii
 _8.prx  _9.nrm  _a.frq  _b.fnm  segments.gen

I'd like to reduce the number of files to the minimum with the optimize or the
merge; my index is read-only and does not change.

Here is the code for optimize; am I doing something wrong?

 IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
     new StandardAnalyzer(Version.LUCENE_33));

 conf.setRAMBufferSizeMB(50);

 LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();

 policy.setMaxMergeDocs(10);

 conf.setMergePolicy(policy);

 IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR), getIndexConfig());


 writer.optimize();

 writer.close();



Thanks


Olivier


Re: lucene 3 and merge/optimize

2011-08-18 Thread olivier sallou
answering myself, to be checked...

I used policy.setMaxMergeDocs(10), limiting to a small number of files, at
least for the merge.
I am going to test.

2011/8/18 olivier sallou 

> Hi,
> after an upgrade to solr/lucene 3, I tried to change the code to remove
> deprecated functions  Though new MergePolicy etc... are not really
> clear.
>
> I have now issues with the merge and optimize functions.
>
> I have a command line application (Java/Lucene api) that merge multiple
> indexes in a single one, or optimize an existing index (this is done
> offline)
>
> When I execute my code, the merge creates a new index, but looks to contain
> more files than before (with solr 4.1), why not...
> When I try to optimize, code says OK, but I still have many files, segments
> : (below for a very small example)
> _0.fdt  _0.tis  _1.tii  _2.prx  _3.nrm  _4.frq  _5.fnm  _6.fdx  _7.fdt
>  _7.tis  _8.tii  _9.prx  _a.nrm  _b.frq
> _0.fdx  _1.fdt  _1.tis  _2.tii  _3.prx  _4.nrm  _5.frq  _6.fnm  _7.fdx
>  _8.fdt  _8.tis  _9.tii  _a.prx  _b.nrm
> _0.fnm  _1.fdx  _2.fdt  _2.tis  _3.tii  _4.prx  _5.nrm  _6.frq  _7.fnm
>  _8.fdx  _9.fdt  _9.tis  _a.tii  _b.prx
> _0.frq  _1.fnm  _2.fdx  _3.fdt  _3.tis  _4.tii  _5.prx  _6.nrm  _7.frq
>  _8.fnm  _9.fdx  _a.fdt  _a.tis  _b.tii
> _0.nrm  _1.frq  _2.fnm  _3.fdx  _4.fdt  _4.tis  _5.tii  _6.prx  _7.nrm
>  _8.frq  _9.fnm  _a.fdx  _b.fdt  _b.tis
> _0.prx  _1.nrm  _2.frq  _3.fnm  _4.fdx  _5.fdt  _5.tis  _6.tii  _7.prx
>  _8.nrm  _9.frq  _a.fnm  _b.fdx  segments_1
> _0.tii  _1.prx  _2.nrm  _3.frq  _4.fnm  _5.fdx  _6.fdt  _6.tis  _7.tii
>  _8.prx  _9.nrm  _a.frq  _b.fnm  segments.gen
>
> I'd like to reduce with the optimize or the merge to the minimum the number
> of files, my index is read only and does not change.
>
> Here is the code for optimize, am I doing something wrong?
>
>  IndexWriterConfig conf = new 
> IndexWriterConfig(Version.LUCENE_33,newStandardAnalyzer(Version.
> LUCENE_33));
>
>  conf.setRAMBufferSizeMB(50);
>
>  LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
>
>  policy.setMaxMergeDocs(10);
>
>  conf.setMergePolicy(policy);
>
>  IndexWriter writer = 
> newIndexWriter(FSDirectory.open(INDEX_DIR),getIndexConfig() );
>
>
>   writer.optimize();
>
>  writer.close();
>
>
>
> Thanks
>
>
> Olivier
>
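
As an aside (not from the original thread), two things are worth checking in
the snippet above: the configured conf object is never passed to the
IndexWriter (the writer is opened with getIndexConfig() instead, so the merge
policy shown may not take effect), and setMaxMergeDocs(10) caps the segments
the policy is willing to merge, which tends to leave many small segments
behind. A minimal Lucene 3.3 sketch that collapses a read-only index into as
few files as possible:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class OptimizeIndex {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
                new StandardAnalyzer(Version.LUCENE_33));
        conf.setRAMBufferSizeMB(50);

        // Default LogByteSizeMergePolicy, without setMaxMergeDocs(), so
        // optimize() is free to merge everything down to a single segment.
        conf.setMergePolicy(new LogByteSizeMergePolicy());

        // Pass the config that was just built (not another one) to the writer.
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File(args[0])), conf);
        writer.optimize();   // deprecated in later 3.x in favour of forceMerge(1)
        writer.close();
    }
}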


Re: solr distributed search don't work

2011-08-19 Thread olivier sallou
Hi,
I do not use spell but I use distributed search, using qt=spell is correct,
should not use qt=\spell.
For "shards", I specify it in solrconfig directly, not in url, but should
work the same.
Maybe an issue in your spell request handler.


2011/8/19 Li Li 

> hi all,
> I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
> but there is something wrong.
> the url given my the wiki is
>
> http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
> but it does not work. I trace the codes and find that
> qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell
> After modification of url, It return all documents but nothing
> about spell check.
> I debug it and find the
> AbstractLuceneSpellChecker.getSuggestions() is called.
>


Re: Solr CMS Integration

2009-08-07 Thread Olivier Dobberkau


On 07.08.2009 at 19:01, wojtekpia wrote:

I've been asked to suggest a framework for managing a website's  
content and
making all that content searchable. I'm comfortable using Solr for  
search,

but I don't know where to start with the content management system. Is
anyone using a CMS (open source or commercial) that you've  
integrated with
Solr for search and are happy with? This will be a consumer facing  
website

with a combination or articles, blogs, white papers, etc.



Hi Wojtek,

Have a look at TYPO3. http://typo3.org/
It is quite powerful.
Ingo and I are currently implementing a SOLR extension for it.
We currently use it at http://www.be-lufthansa.com/
Contact me if you want an insight.

Many greetings,

Olivier


--
Olivier Dobberkau
. . . . . . . . . . . . . .
Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstrasse 73
D 60329 Frankfurt/Main

Fon:  +49 (0)69 - 247 52 18 - 0
Fax:  +49 (0)69 - 247 52 18 - 99

Mail: olivier.dobber...@dkd.de
Web: http://www.dkd.de

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

Aktuelle Projekte:
http://bewegung.taz.de - Launch (Ruby on Rails)
http://www.hans-im-glueck.de - Relaunch (TYPO3)
http://www.proasyl.de - Relaunch (TYPO3)



Re: Showcase: Facetted Search for Wine using Solr

2009-09-28 Thread Olivier Dobberkau

Marian Steinbach schrieb:

On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog  wrote:
  

Have you seen this? It is another Solr/Typeo3 integration project.

http://forge.typo3.org/projects/show/extension-solr

Would you consider open-sourcing your Solr/Typo3 integration?




Hi Lance!

I wasn't aware of that extension. Having looked at the website, it
does something very different from what we did. The solr extension
mentioned above tries to provide a better website search for the Typo3
CMS on top of Solr.

Our integration doesn't index web pages but product data from an XML
file. I'd say the implementation is pretty much customer-specific so
that I don't see a real benefit of making it open source.

Regards,

Marian
  


Hi Marian.
Our extension will be able to do so as well, once we have set up the 
indexing queue for the TYPO3 backend.
We have a concept called TYPO3 extension connectors, so that you will be 
able to add index documents to your index.
Feel free to contact Ingo about the contribution possibilities in our 
Solr project.
If you use open source software you should definitely contribute. This 
gives you great karma.

Or, as we at TYPO3 say: inspire people to share!

olivier


Re: i want to use something like *query* similar to database - %query% like search

2009-12-02 Thread Olivier Dobberkau

On 02.12.2009 at 09:55, amittripathi wrote:

> its accepting the trailing wildcard character but solr is not accepting the
> leading wildcard character

The error message says it all.

'*' or '?' not allowed as first character in WildcardQuery 

Solr is not SQL.

Olivier

--

Olivier Dobberkau


RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi Regan,

I am using STRING fields only for values that in most cases will be used
to FACET on.
I suggest using TEXT fields as per the default examples...

ALSO, remember that if you do not specify the
"solr.LowerCaseFilterFactory", your search has just become case
sensitive. I struggled with that one before, so make sure what you are
indexing is what you are searching for.
* Stick to the default examples that are provided with the SOLR distro
and you should be fine.


  








  
  







  
    

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 06:15
To: solr-user@lucene.apache.org
Subject: Re: why no results?



Tom Hill-7 wrote:
> 
> Try solr.TextField instead.
> 


Thanks Tom,

I've replaced the  section above with...






deleted my index, restarted Solr and re-indexed my documents - but the
search still returns nothing.

Do I need to change the type in the  sections as well?

regan
-- 
View this message in context:
http://old.nabble.com/why-no-results--tp26688249p26688469.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: why no results?

2009-12-08 Thread Jaco Olivier
Hi,

Try changing your TEXT field to type "text"
 (without the  of course :))

That is your problem... also use the "text" type as per default examples
with SOLR distro :)

Jaco Olivier


-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 08 December 2009 05:44
To: solr-user@lucene.apache.org
Subject: why no results?


hi all - newbie solr question - I've indexed some documents and can
search /
receive results using the following schema - BUT ONLY when searching on
the
"id" field. If I try searching on the title, subtitle, body or text
field I
receive NO results. Very confused. :confused: Can anyone see anything
obvious I'm doing wrong? Regan.











 






 

 
 id

 
 text

 
 

 






-- 
View this message in context:
http://old.nabble.com/why-no-results--tp26688249p26688249.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: do copyField's need to exist as Fields?

2009-12-08 Thread Jaco Olivier
Hi Regan,

Something I noticed on your setup...
The ID field in your setup I assume to be your uniqueID for the book or
journal (The ISSN or something)
Try making this a string as TEXT is not the ideal field to use for
unique IDs



Congrats on figuring out SOLR fields - I suggest getting the SOLR 1.4
Book.. It really saved me a 1000 questions on this mailing list :)

Jaco Olivier

-Original Message-
From: regany [mailto:re...@newzealand.co.nz] 
Sent: 09 December 2009 00:48
To: solr-user@lucene.apache.org
Subject: Re: do copyField's need to exist as Fields?



regany wrote:
> 
> Is there a different way I should be setting it up to achieve the
above??
> 


Think I figured it out.

I set up the  so they are present, but get ignored except for
the
"text" field which gets indexed...







and then copyField the first 4 fields to the "text" field:







Seems to be working!? :drunk:
-- 
View this message in context:
http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp267017
06p26702224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


On 04.02.2009 at 13:33, Anto Binish Kaspar wrote:


Hi,
I am trying to configure Solr on an Ubuntu server and I am getting the  
following exception. I am able to make it work on a Windows box.



Hi Anto.

Have you installed the solr package 1.2 from ubuntu?
Or the release 1.3 as war file?

Olivier

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


On 04.02.2009 at 13:54, Anto Binish Kaspar wrote:


Hi Olivier

Thanks for your quick reply. I am using the release 1.3 as war file.

- Anto Binish Kaspar


OK.
As far as I understood, you need to make sure that your Solr home is set.
This needs to be done in

Quting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home  
being in the current working directory (./solr) you can alternately  
add the solr.solr.home system property to your JVM settings before  
starting Tomcat...


export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/"

...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragments can be used to configure the JNDI property  
needed to specify your Solr Home directory.


Just put a context fragment file under $CATALINA_HOME/conf/Catalina/ 
localhost that looks something like this...


$ cat /tomcat55/conf/Catalina/localhost/solr.xml


   



Greetings,

Olivier

PS: Maybe it would be great if we could provide an Ubuntu dpkg with  
1.3? Any takers?


--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau

A slash?

Olivier

Sent from my iPhone


On 04.02.2009 at 14:06, Anto Binish Kaspar wrote:


I am using Context file, here is my solr.xml

$ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml






I changed the ownership of the folder (usr/local/solr/solr-1.3/solr)  
to tomcat6:tomcat6 from root:root


Anything I am missing?

- Anto Binish Kaspar


-Original Message-
From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de]
Sent: Wednesday, February 04, 2009 6:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Severe errors in solr configuration


On 04.02.2009 at 13:54, Anto Binish Kaspar wrote:


Hi Olivier

Thanks for your quick reply. I am using the release 1.3 as war file.

- Anto Binish Kaspar


OK.
As far a i understood you need to make sure that your solr home is  
set.

this needs to be done in

Quting:

http://wiki.apache.org/solr/SolrTomcat

In addition to using the default behavior of relying on the Solr Home
being in the current working directory (./solr) you can alternately
add the solr.solr.home system property to your JVM settings before
starting Tomcat...

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/ 
dir/"


...or use a Context file to configure the Solr Home using JNDI

A Tomcat context fragments can be used to configure the JNDI property
needed to specify your Solr Home directory.

Just put a context fragment file under $CATALINA_HOME/conf/Catalina/
localhost that looks something like this...

$ cat /tomcat55/conf/Catalina/localhost/solr.xml


   


Greetings,

Olivier

PS: May be it would be great if we could provide an ubuntu dpkg with
1.3 ? Any takers?

--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)



Re: Severe errors in solr configuration

2009-02-04 Thread Olivier Dobberkau


On 04.02.2009 at 15:50, Anto Binish Kaspar wrote:

Yes, I removed it; I still have the same issue. Any idea what may be  
the cause of this issue?



Have you solved your problem?

Olivier
--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Re: Severe errors in solr configuration

2009-02-05 Thread Olivier Dobberkau


On 05.02.2009 at 12:07, Anto Binish Kaspar wrote:


Do I need to give some permissions to the folder?



i would guess so.

Olivier
--
Olivier Dobberkau

Je TYPO3, desto d.k.d

d.k.d Internet Service GmbH
Kaiserstr. 79
D 60329 Frankfurt/Main

Registergericht: Amtsgericht Frankfurt am Main
Registernummer: HRB 45590
Geschäftsführer:
Olivier Dobberkau, Søren Schaffstein, Götz Wegenast

fon:  +49 (0)69 - 43 05 61-70
fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de
home: http://www.dkd.de

aktuelle TYPO3-Projekte:
www.licht.de - Relaunch (TYPO3)
www.lahmeyer.de - Launch (TYPO3)
www.seb-assetmanagement.de - Relaunch (TYPO3)


Apachecon 2009 Europe

2009-03-27 Thread Olivier Dobberkau

Hi all,

I came back with a head full of impressions from ApacheCon Europe.
Thanks a lot for the great speeches and the inspiring personal talks.

I strongly believe that Solr will have a great future.

Olivier

--
Olivier Dobberkau
d.k.d Internet Service GmbH
fon:  +49 (0)69 - 43 05 61-70 fax:  +49 (0)69 - 43 05 61-90
mail: olivier.dobber...@dkd.de home: http://www.dkd.de


Re: indexing/crawling HTML + solr

2009-06-03 Thread Olivier Dobberkau

Hi

Have a look at the Droids project in the Apache Incubator.

Olivier

Sent from my iPhone


On 03.06.2009 at 12:09, Gena Batsyan wrote:


Hi!

to be short, where to start with the subject?

Any pointers to some [semi-]functional solutions that crawl the web  
as a normal crawler, take care about html parsing, etc, and feed the  
crawled stuff as solr-documents per   ?


regards!



Re: Best approach to multiple languages

2009-07-22 Thread Olivier Dobberkau


On 22.07.2009 at 18:31, Ed Summers wrote:


In case you are curious I've attached a copy of our schema.xml to give
you an idea of what we did.



Thanks for sharing!

--
Olivier Dobberkau


Re: How to set User.dir or CWD for Solr during Tomcat startup

2010-01-07 Thread Olivier Dobberkau

On 07.01.2010 at 00:07, Turner, Robbin J wrote:

> I've been doing a bunch of googling and haven't seen if there is a parameter 
> to set within Tomcat other than the solr/home which is setup in the solr.xml 
> under the $CATALINA_HOME/conf/Catalina/localhost/.

Hi.

We set this in solr.xml


   


http://wiki.apache.org/solr/SolrTomcat#Simple_Example_Install

hope this helps.

olivier

--

Olivier Dobberkau
. . . . . . . . . . . . . .
Je TYPO3, desto d.k.d



Re: Interesting stuff; Solr as a syslog store.

2010-02-12 Thread Olivier Dobberkau

On 13.02.2010 at 03:02, Antonio Lobato wrote:

> Just thought this would be a neat story to share with you all.  I've really 
> grown to love Solr, it's something else!

Hi Antonio,

Great.

Would you also share the source code somewhere? 
May the Source be with you. 

Thanks.

Olivier




Re: ubuntu lucid package

2010-04-30 Thread Olivier Dobberkau

On 30.04.2010 at 09:24, Gora Mohanty wrote:

> Also, the standard Debian/Ubuntu way of finding out what files a
> package installed is:
>  dpkg -l 
> 
> Regards,
> Gora

You might try:

# dpkg -L solr-common
/.
/etc
/etc/solr
/etc/solr/web.xml
/etc/solr/conf
/etc/solr/conf/admin-extra.html
/etc/solr/conf/elevate.xml
/etc/solr/conf/mapping-ISOLatin1Accent.txt
/etc/solr/conf/protwords.txt
/etc/solr/conf/schema.xml
/etc/solr/conf/scripts.conf
/etc/solr/conf/solrconfig.xml
/etc/solr/conf/spellings.txt
/etc/solr/conf/stopwords.txt
/etc/solr/conf/synonyms.txt
/etc/solr/conf/xslt
/etc/solr/conf/xslt/example.xsl
/etc/solr/conf/xslt/example_atom.xsl
/etc/solr/conf/xslt/example_rss.xsl
/etc/solr/conf/xslt/luke.xsl
/usr
/usr/share
/usr/share/solr
/usr/share/solr/WEB-INF
/usr/share/solr/WEB-INF/lib
/usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar
/usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar
/usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar
/usr/share/solr/WEB-INF/weblogic.xml
/usr/share/solr/scripts
/usr/share/solr/scripts/abc
/usr/share/solr/scripts/abo
/usr/share/solr/scripts/backup
/usr/share/solr/scripts/backupcleaner
/usr/share/solr/scripts/commit
/usr/share/solr/scripts/optimize
/usr/share/solr/scripts/readercycle
/usr/share/solr/scripts/rsyncd-disable
/usr/share/solr/scripts/rsyncd-enable
/usr/share/solr/scripts/rsyncd-start
/usr/share/solr/scripts/rsyncd-stop
/usr/share/solr/scripts/scripts-util
/usr/share/solr/scripts/snapcleaner
/usr/share/solr/scripts/snapinstaller
/usr/share/solr/scripts/snappuller
/usr/share/solr/scripts/snappuller-disable
/usr/share/solr/scripts/snappuller-enable
/usr/share/solr/scripts/snapshooter
/usr/share/solr/admin
/usr/share/solr/admin/_info.jsp
/usr/share/solr/admin/action.jsp
/usr/share/solr/admin/analysis.jsp
/usr/share/solr/admin/analysis.xsl
/usr/share/solr/admin/distributiondump.jsp
/usr/share/solr/admin/favicon.ico
/usr/share/solr/admin/form.jsp
/usr/share/solr/admin/get-file.jsp
/usr/share/solr/admin/get-properties.jsp
/usr/share/solr/admin/header.jsp
/usr/share/solr/admin/index.jsp
/usr/share/solr/admin/jquery-1.2.3.min.js
/usr/share/solr/admin/meta.xsl
/usr/share/solr/admin/ping.jsp
/usr/share/solr/admin/ping.xsl
/usr/share/solr/admin/raw-schema.jsp
/usr/share/solr/admin/registry.jsp
/usr/share/solr/admin/registry.xsl
/usr/share/solr/admin/replication
/usr/share/solr/admin/replication/header.jsp
/usr/share/solr/admin/replication/index.jsp
/usr/share/solr/admin/schema.jsp
/usr/share/solr/admin/solr-admin.css
/usr/share/solr/admin/solr_small.png
/usr/share/solr/admin/stats.jsp
/usr/share/solr/admin/stats.xsl
/usr/share/solr/admin/tabular.xsl
/usr/share/solr/admin/threaddump.jsp
/usr/share/solr/admin/threaddump.xsl
/usr/share/solr/admin/debug.jsp
/usr/share/solr/admin/dataimport.jsp
/usr/share/solr/favicon.ico
/usr/share/solr/index.jsp
/usr/share/doc
/usr/share/doc/solr-common
/usr/share/doc/solr-common/changelog.Debian.gz
/usr/share/doc/solr-common/README.Debian
/usr/share/doc/solr-common/TODO.Debian
/usr/share/doc/solr-common/copyright
/usr/share/doc/solr-common/changelog.gz
/usr/share/doc/solr-common/NOTICE.txt.gz
/usr/share/doc/solr-common/README.txt.gz
/var
/var/lib
/var/lib/solr
/var/lib/solr/data
/usr/share/solr/WEB-INF/lib/xml-apis.jar
/usr/share/solr/WEB-INF/lib/xml-apis-ext.jar
/usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar
/usr/share/solr/WEB-INF/lib/slf4j-api.jar
/usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar
/usr/share/solr/WEB-INF/lib/lucene-snowball.jar
/usr/share/solr/WEB-INF/lib/lucene-queries.jar
/usr/share/solr/WEB-INF/lib/lucene-highlighter.jar
/usr/share/solr/WEB-INF/lib/lucene-core.jar
/usr/share/solr/WEB-INF/lib/lucene-analyzers.jar
/usr/share/solr/WEB-INF/lib/jetty-util.jar
/usr/share/solr/WEB-INF/lib/jetty.jar
/usr/share/solr/WEB-INF/lib/commons-io.jar
/usr/share/solr/WEB-INF/lib/commons-httpclient.jar
/usr/share/solr/WEB-INF/lib/commons-fileupload.jar
/usr/share/solr/WEB-INF/lib/commons-csv.jar
/usr/share/solr/WEB-INF/lib/commons-codec.jar
/usr/share/solr/WEB-INF/web.xml
/usr/share/solr/conf

If I recall correctly, some parts of Apache Solr will not work with the Ubuntu
Lucid distribution.

http://solr.dkd.local/update/extract throws an error:

The server encountered an internal error (lazy loading error
org.apache.solr.common.SolrException: lazy loading error at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
at

Maybe someone from ubuntu reading this list can confirm this.

Olivier
--

Olivier Dobberkau

d.k.d Internet Service GmbH
Kaiserstraße 73
60329 Frankfurt/Main

mail: olivier.dobber...@dkd.de
web: http://www.dkd.de


Solr 1.4 query fails against all fields, but succeed if field is specified.

2010-05-31 Thread olivier sallou
Hi,
I have created an index with several fields.
If I query my index in the admin section of solr (or via http request), I
get results for my search if I specify the requested field:
Query:   note:Aspergillus  (look for "Aspergillus" in field "note")
However, if I query the same word against all fields ("Aspergillus" or
"all:Aspergillus"), I get no match in the response from Solr.

Do you have any idea of what can be wrong with my index?

Regards

Olivier


Re: Solr 1.4 query fails against all fields, but succeed if field is specified.

2010-05-31 Thread olivier sallou
OK,
I use the default, i.e. the standard request handler.
Using "*:Aspergillus" does not work either.

I can try with DisMax, but this means I have to know all the field names. My
schema declares a number of them, but some other fields are defined via
dynamic fields (I know the type, but I do not know their names).
Is there any way to query all fields, including dynamic ones?
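
For reference, a minimal SolrJ sketch of what the DisMax route would look
like; the qf field names are just placeholders, which is exactly the problem
in my case, since I would have to list every field:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DismaxQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("Aspergillus");
        query.set("defType", "dismax");            // use the dismax query parser
        query.set("qf", "note title description"); // fields to search: placeholders
        QueryResponse response = server.query(query);
        System.out.println("Found " + response.getResults().getNumFound() + " docs");
    }
}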

thanks

Olivier

2010/5/31 Michael Kuhlmann 

> Am 31.05.2010 11:50, schrieb olivier sallou:
> > Hi,
> > I have created in index with several fields.
> > If I query my index in the admin section of solr (or via http request), I
> > get results for my search if I specify the requested field:
> > Query:   note:Aspergillus  (look for "Aspergillus" in field "note")
> > However, if I query the same word against all fields  ("Aspergillus" or
> > "all:Aspergillus") , I have no match in response from Solr.
>
> Querying "Aspergillus" without a field does only work if you're using
> DisMaxHandler.
>
> Do you have a field "all"?
>
> Try "*:Aspergillus" instead.
>


Re: Solr 1.4 query fails against all fields, but succeed if field is specified.

2010-05-31 Thread olivier sallou
I finally got a solution. Since I use dynamic fields, I use copyField to copy
everything into a global indexed field, and I specify this field as the
defaultSearchField in my schema.

The *:term query with the "standard" query type fails without this...

This solution requires doubling the indexed data, but it works in all
cases...

In my schema I have:

Some other fields are "lowercase" or "int" types.

Regards

2010/5/31 Michael Kuhlmann 

> Am 31.05.2010 12:36, schrieb olivier sallou:
> > Is there any way to query all fields including dynamic ones?
>
> Yes, using the *:term query. (Please note that the asterisk should not
> be quoted.)
>
> To answer your question, we need more details on your Solr
> configuration, esp. the part of schema.xml that defines your "note" field.
>
> Greetings,
> Michael
>
>
>


Re: newbie question on how to batch commit documents

2010-06-01 Thread olivier sallou
I would additionally suggest using EmbeddedSolrServer for large uploads if
possible; performance is better.
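
Independent of embedded vs. HTTP, a minimal SolrJ sketch of the batching
pattern: send the documents in chunks and commit only once at the end, so
there are never several commits racing and opening overlapping searchers.
The URL, batch size and fields below are placeholders, not taken from your
code:

import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "widget " + i);
            batch.add(doc);
            if (batch.size() == 1000) {   // send a chunk, but do NOT commit yet
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);            // flush the last partial chunk
        }
        server.commit();                  // a single commit at the very end
    }
}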

2010/5/31 Steve Kuo 

> I have a newbie question on what is the best way to batch add/commit a
> large
> collection of document data via solrj.  My first attempt  was to write a
> multi-threaded application that did following.
>
> Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
> for (Widget w : widges) {
>    SolrInputDocument doc = new SolrInputDocument();
>    doc.addField("id", w.getId());
>    doc.addField("name", w.getName());
>    doc.addField("price", w.getPrice());
>    doc.addField("category", w.getCat());
>    doc.addField("srcType", w.getSrcType());
>    docs.add(doc);
>
>    // commit docs to solr server
>    server.add(docs);
>    server.commit();
> }
>
> And I got this exception.
>
> org.apache.solr.common.SolrException:
>
> Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later
>
>
> Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later
>
>at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
>at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>at
> org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
>
> The solrj wiki/documents seemed to indicate that because multiple threads
> were calling SolrServer.commit(), which in turn called
> CommonsHttpSolrServer.request(), multiple searchers were being opened.  My
> first thought was to change the configs for autowarming.  But after looking
> at the autowarm params, I am not sure what can be changed, or perhaps a
> different approach is recommended.
>
> <filterCache
>  class="solr.FastLRUCache"
>  size="512"
>  initialSize="512"
>  autowarmCount="0"/>
>
> <queryResultCache
>  class="solr.LRUCache"
>  size="512"
>  initialSize="512"
>  autowarmCount="0"/>
>
> <documentCache
>  class="solr.LRUCache"
>  size="512"
>  initialSize="512"
>  autowarmCount="0"/>
>
> Your help is much appreciated.
>


Re: solr itas

2010-06-11 Thread olivier sallou
Did you update solrconfig.xml to add the /itas request handler?

2010/6/11 

> Hi,
>
> When I type http://127.0.0.1:8080/solr/itas
>
> I receive this result in the web page instead of an HTML page. Does anyone
> know the reason and/or have a suggestion to fix it?
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">62</int>
>   </lst>
>   <result name="response" numFound="..." start="...">
>     <doc>
>       <float name="score">1.0</float>
>       <arr name="...">
>         <str>Lucid Imagination</str>
>       </arr>
>       <arr name="...">
>         <str>USA</str>
>       </arr>
>       ...
>     </doc>
>   </result>
> </response>
>
> Thanks,
>
>
>


Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread olivier sallou
Hi,
I use Solr Cell to send specific content files. I developed a dedicated
Parser for specific mime types.
However, I cannot get Solr to accept my new mime types.

In solrconfig, in the update/extract requestHandler I specified <str
name="tika.config">./tika-config.xml</str>, where tika-config.xml is in the
conf directory (same as solrconfig).

In tika-config I added my mime types:

<parser class="org.irisa.genouest.tools.readseq.ReadSeqParser">
   <mime>biosequence/document</mime>
   <mime>biosequence/embl</mime>
   <mime>biosequence/genbank</mime>
</parser>

I do not know for:
  <mimeTypeRepository resource="..." magic="false"/>

where I am not sure whether the path to the Tika mimetypes file should be
absolute or relative... or even whether this file needs to be redefined if
"magic" is not used.


When I run my update/extract request, I get an error saying that
"biosequence/document" does not match any known parser.
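
For context, this is roughly how a custom Tika parser advertises the mime
types it handles. It is only a sketch against the Tika 1.x Parser API (not
my actual parser, and not necessarily the exact API of the Tika version
bundled with Solr at the time); the class and mime-type names just follow
my example above:

import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class ReadSeqParserSketch extends AbstractParser {

    private static final Set<MediaType> SUPPORTED_TYPES;
    static {
        Set<MediaType> types = new HashSet<MediaType>();
        // Must match the mime types registered for the parser in tika-config.xml
        types.add(MediaType.parse("biosequence/document"));
        types.add(MediaType.parse("biosequence/embl"));
        types.add(MediaType.parse("biosequence/genbank"));
        SUPPORTED_TYPES = Collections.unmodifiableSet(types);
    }

    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // Real parsing of the sequence format would go here; the sketch only
        // emits an empty XHTML document so Solr Cell has something to index.
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.endDocument();
    }
}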

Thanks

Olivier


Re: Need help on Solr Cell usage with specific Tika parser

2010-06-14 Thread olivier sallou
Yep, I do.
As magic is not set, that is why it looks for this specific
mime type. Unfortunately, it seems it either does not read my specific
tika-config file or the mime-type file. But there is no error log concerning
those files... (is it not trying to load them?)


2010/6/14 Ken Krugler 

> Hi Olivier,
>
> Are you setting the mime type explicitly via the stream.type parameter?
>
> -- Ken
>
>
> On Jun 14, 2010, at 9:14am, olivier sallou wrote:
>
>  Hi,
>> I use Solr Cell to send specific content files. I developped a dedicated
>> Parser for specific mime types.
>> However I cannot get Solr accepting my new mime types.
>>
>> In solrconfig, in the update/extract requestHandler I specified <str
>> name="tika.config">./tika-config.xml</str>, where tika-config.xml is in the
>> conf directory (same as solrconfig).
>>
>> In tika-config I added my mimetypes:
>>
>> <parser class="org.irisa.genouest.tools.readseq.ReadSeqParser">
>>   <mime>biosequence/document</mime>
>>   <mime>biosequence/embl</mime>
>>   <mime>biosequence/genbank</mime>
>> </parser>
>>
>> I do not know for:
>>  <mimeTypeRepository resource="..." magic="false"/>
>>
>> whereas path to tika mimetypes should be absolute or relative... and even
>> if
>> this file needs to be redefined if "magic" is not used.
>>
>>
>> When I run my update/extract, I have an error that "biosequence/document"
>> does not match any known parser.
>>
>> Thanks
>>
>> Olivier
>>
>
> 
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>


Re: Need help on Solr Cell usage with specific Tika parser

2010-06-15 Thread olivier sallou
Thanks,
moving it to a direct child worked.

Olivier

2010/6/14 Chris Hostetter 

>
> : In solrconfig, in update/extract requesthandler I specified <str
> : name="tika.config">./tika-config.xml</str> , where tika-config.xml is in
> : conf directory (same as solrconfig).
>
> can you show us the full requestHandler declaration? ... tika.config needs
> to be a direct child of the requestHandler (not in the defaults)
>
> I also don't know if using a "local" path like that will work -- depends
> on how that file is loaded (if solr loads it, then you might want to
> remove the "./";  if solr just gives the path to tika, then you probably
> need an absolute path.
>
>
> -Hoss
>
>


ConfigSet API V2 issue with configSetProp.property present

2018-11-12 Thread Olivier Tavard
Hi,

I have an issue creating a configset with the V2 API when using a configset
property.
If I enter the command:
curl -X POST -H 'Content-type: application/json' -d '{ "create":{"name":
"Test", "baseConfigSet": "myConfigSet","configSetProp.immutable":
"false"}}'  http://localhost:8983/api/cluster/configs?omitHeader=true
(the same as in the documentation:
https://lucene.apache.org/solr/guide/7_5/configsets-api.html)
It fails with the error :
"errorMessages":["Unknown field 'configSetProp.immutable' in object : {\n
\"name\":\"Test\",\n  \"baseConfigSet\":\"myConfigSet\",\n
\"configSetProp.immutable\":\"false\"}"]}],
"msg":"Error in command payload",
"code":400}}

If I enter the same command, still with the V2 API but without the
configSetProp.immutable property, it succeeds.

With the V1 API, no problem with or without the presence of the configset
property.

The tests were done with Solr 7.4 and Solr 7.5.
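
For what it is worth, a minimal SolrJ sketch of the same creation (as far as
I can tell, ConfigSetAdminRequest goes through the V1 /admin/configs
endpoint); the Solr URL is a placeholder and the configset property is left
out, since that is the part that fails over V2:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ConfigSetAdminRequest;
import org.apache.solr.client.solrj.response.ConfigSetAdminResponse;

public class CreateConfigSet {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build();

        ConfigSetAdminRequest.Create create = new ConfigSetAdminRequest.Create();
        create.setConfigSetName("Test");            // new configset name
        create.setBaseConfigSetName("myConfigSet"); // existing base configset

        ConfigSetAdminResponse response = create.process(client);
        System.out.println("status: " + response.getStatus());
        client.close();
    }
}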

Did I miss something about the configset property usage?

Thanks,
Best regards,
Olivier


Backup collections using SolrJ

2018-05-04 Thread Olivier Tavard
Hi,

I have a question regarding the backup of a Solr collection using SolrJ. I
use Solr 7.
I want to build a JAR for that and run it from a cron job.

So far, no problem with the request: I use
CollectionAdminRequest.backupCollection and then the processAsync method.

The command is transmitted to Solr correctly.

My problem is parsing the response and handling the different failure cases
in the code.

Let's say that the Solr response is the following after sending the
asynchronous backup request (the request id is "solrbackup"):

{
"responseHeader": {
"status": 0,
"QTime": 1
},
"success": {
"IP:8983_solr": {
"responseHeader": {
"status": 0,
"QTime": 0
}
},
"IP:8983_solr": {
"responseHeader": {
"status": 0,
"QTime": 0
}
}
},
"solrbackup5704378348890743": {
"responseHeader": {
"status": 0,
"QTime": 0
},
"STATUS": "failed",
"Response": "Failed to backup core=Test_shard1_replica1 because
java.io.IOException: Aucun espace disponible sur le périphérique"
},
"status": {
"state": "completed",
"msg": "found [solrbackup] in completed tasks"
}
}
If I use the code:
System.out.println(CollectionAdminRequest.requestStatus("solrbackup").process(solr).getRequestStatus());

The output is : "COMPLETED".
But that is not enough to check whether the backup actually succeeded. For
example, in this case the task is completed but the backup was not
successful because there was no space left on the device (that is what the
French IOException message in the response means).
So the interesting part is into the solrbackup5704378348890743 section of
the response.

My first question is why some numbers are added to the request-id name ?

Because if I write:
CollectionAdminRequest.requestStatus("solrbackup").getRequestId(), the
response is "solrbackup" and not solrbackup5704378348890743.
So retrieving the section related to solrbackup5704378348890743 in the
response is not very easy.
I cannot directly use (NamedList)
CollectionAdminRequest.requestStatus("solrbackup").process(solr).getResponse().get("solrbackup"),
but instead I have to iterate over the entire Solr response and check the
beginning of each key to find the section that begins with solrbackup, and
finally get the elements that I want.

Am I doing this correctly, or is there a simpler way?
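
Roughly, the sketch I have in mind looks like this (simplified; the Solr
URL, collection name and backup location are placeholders):

import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.RequestStatusState;
import org.apache.solr.common.util.NamedList;

public class BackupJob {
    public static void main(String[] args) throws Exception {
        SolrClient solr =
                new HttpSolrClient.Builder("http://localhost:8983/solr").build();
        String asyncId = "solrbackup";

        // Submit the backup asynchronously.
        CollectionAdminRequest.Backup backup =
                CollectionAdminRequest.backupCollection("Test", "Test_backup");
        backup.setLocation("/backups/solr");
        backup.processAsync(asyncId, solr);

        // Poll until the task leaves the SUBMITTED/RUNNING states.
        RequestStatusState state;
        do {
            Thread.sleep(5000);
            state = CollectionAdminRequest.requestStatus(asyncId)
                    .process(solr).getRequestStatus();
        } while (state == RequestStatusState.SUBMITTED
                || state == RequestStatusState.RUNNING);

        // COMPLETED only means the task finished; the per-core result lives in
        // a section whose key starts with the async id (e.g. solrbackup57043...).
        NamedList<Object> response =
                CollectionAdminRequest.requestStatus(asyncId).process(solr).getResponse();
        for (Map.Entry<String, Object> entry : response) {
            if (entry.getKey().startsWith(asyncId)
                    && entry.getValue() instanceof NamedList) {
                NamedList<?> section = (NamedList<?>) entry.getValue();
                if ("failed".equals(section.get("STATUS"))) {
                    System.err.println("Backup failed: " + section.get("Response"));
                }
            }
        }
        solr.close();
    }
}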

Thanks,
Olivier Tavard


Cannot find Solr 7.4.1 release

2021-02-18 Thread Olivier Tavard
Hi,

I wanted to download Solr 7.4.1, but I cannot find the 7.4.1 release in
http://archive.apache.org/dist/lucene/solr/ : there is Solr 7.4 and then
directly 7.5.
Of course I can build from source code, but this is frustrating because I
can see that in the 7_4 branch there is a fix that I need (SOLR-12594) whose
status is fixed in the 7.4.1 and 7.5 versions. Everything seems to have
been prepared to release 7.4.1, but I cannot find it.
Does this release exist?

Thank you,

Olivier


filtering facets

2009-08-30 Thread Olivier H. Beauchesne

Hi,

Long time lurker, first time poster.

I have a multi-valued field, let's call it article_outlinks, containing
all outgoing URLs from a document. I want to get all matching URLs
sorted by count.


For example, I want to get all outgoing Wikipedia URLs in my documents,
sorted by count.


So I execute a query like this:
q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks

But I get facets containing the other URLs in the documents. I can get
something close by using facet.prefix=http://en.wikipedia.org but I want
to include other subdomains of wikipedia.org (e.g. fr.wikipedia.org).


Is there a way to do a search and get facets matching only my query?

I know facet.prefix isn't a query, but is there a way to get that behavior?

Is it easy to extend solr to do something like that?

Thank you,

Olivier

Sorry for my english.


Re: filtering facets

2009-08-31 Thread Olivier H. Beauchesne

Hi Mike,

No, my problem is that the field article_outlinks is multivalued, so it
contains several URLs not related to my search. I would like to facet
only on URLs matching my query.


For example (only one document shown, but my search targets over 1M docs):

Doc1:
article_url:
url1.com/1
url2.com/2
url1.com/1
url1.com/3

And my query is: article_url:url1.com* and I facet by article_url and I 
want it to give me:

url1.com/1 (2)
url1.com/3 (1)

But right now, because url2.com/2 is contained in a multivalued field 
with the matching urls, I get this:

url1.com/1 (2)
url1.com/3 (1)
url2.com/2 (1)

I can use facet.prefix to filter, but it's not very flexible if my url 
contains a subdomain as facet.prefix doesn't support wildcards.


Thank you,

Olivier

Mike Topper wrote:

Hi Olivier,

are the facet counts on the urls you dont want 0?

if so you can use facet.mincount to only return results greater than 0.

-Mike

Olivier H. Beauchesne wrote:
  

Hi,

Long time lurker, first time poster.

I have a multi-valued field, let's call it article_outlinks containing
all outgoing urls from a document. I want to get all matching urls
sorted by counts.

For exemple, I want to get all outgoing wikipedia url in my documents
sorted by counts.

So I execute a query like this:
q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks

But I get facets containing the other urls in the documents. I can get
something close by using facet.prefix=http://en.wikipedia.org but I
want to include other subdomains on wikipedia (ex: fr.wikipedia.org).

Is there a way to do a search and getting facets only matching my query?

I know facet.prefix isn't a query, but is there a way to get that
behavior?

Is it easy to extend solr to do something like that?

Thank you,

Olivier

Sorry for my english.





  


Re: filtering facets

2009-08-31 Thread Olivier H. Beauchesne
Yeah, but then I would have to retrieve *a lot* of facets. I think for
now I'll retrieve all the subdomains with facet.prefix and then merge
those queries. Not ideal, but when I have more motivation, I will
submit a patch to Solr :-)
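
Something like this minimal SolrJ sketch is what I mean by merging: one
facet request per known subdomain prefix, with the counts merged in a map.
The subdomain list and the Solr URL are placeholders; the field name is the
one from my original post:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WikipediaOutlinkFacets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        String[] prefixes = { "http://en.wikipedia.org", "http://fr.wikipedia.org" };
        Map<String, Long> merged = new HashMap<String, Long>();

        for (String prefix : prefixes) {
            SolrQuery query = new SolrQuery("article_outlinks:http*wikipedia.org*");
            query.setRows(0);                      // only the facets are needed
            query.setFacet(true);
            query.addFacetField("article_outlinks");
            query.setFacetLimit(-1);
            query.setFacetMinCount(1);
            query.setFacetPrefix(prefix);          // one subdomain at a time
            QueryResponse response = server.query(query);
            FacetField field = response.getFacetField("article_outlinks");
            if (field == null || field.getValues() == null) {
                continue;
            }
            for (FacetField.Count count : field.getValues()) {
                Long previous = merged.get(count.getName());
                merged.put(count.getName(),
                        (previous == null ? 0L : previous) + count.getCount());
            }
        }
        System.out.println(merged);
    }
}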


Michael wrote:

You could post-process the response and remove urls that don't match your
domain pattern.

On Mon, Aug 31, 2009 at 9:45 AM, Olivier H. Beauchesne wrote:

  

Hi Mike,

No, my problem is that the field article_outlinks is multivalued thus it
contains several urls not related to my search. I would like to facet only
urls matching my query.

For exemple(only on one document, but my search targets over 1M docs):

Doc1:
article_url:
url1.com/1
url2.com/2
url1.com/1
url1.com/3

And my query is: article_url:url1.com* and I facet by article_url and I
want it to give me:
url1.com/1 (2)
url1.com/3 (1)

But right now, because url2.com/2 is contained in a multivalued field with
the matching urls, I get this:
url1.com/1 (2)
url1.com/3 (1)
url2.com/2 (1)

I can use facet.prefix to filter, but it's not very flexible if my url
contains a subdomain as facet.prefix doesn't support wildcards.

Thank you,

Olivier

Mike Topper wrote:

 Hi Olivier,


are the facet counts on the urls you dont want 0?

if so you can use facet.mincount to only return results greater than 0.

-Mike

Olivier H. Beauchesne wrote:


  

Hi,

Long time lurker, first time poster.

I have a multi-valued field, let's call it article_outlinks containing
all outgoing urls from a document. I want to get all matching urls
sorted by counts.

For exemple, I want to get all outgoing wikipedia url in my documents
sorted by counts.

So I execute a query like this:
q=article_outlinks:http*wikipedia.org*  and I facet on article_outlinks

But I get facets containing the other urls in the documents. I can get
something close by using facet.prefix=http://en.wikipedia.org but I
want to include other subdomains on wikipedia (ex: fr.wikipedia.org).

Is there a way to do a search and getting facets only matching my query?

I know facet.prefix isn't a query, but is there a way to get that
behavior?

Is it easy to extend solr to do something like that?

Thank you,

Olivier

Sorry for my english.