Large number of collections in SolrCloud
Hi, I have a SolrCloud cluster with 3 nodes: 3 shards per node and a replication factor of 3. The number of collections is around 1000. All the collections use the same ZooKeeper configuration. So when I create each collection, the configuration is pulled from ZK and the configuration files are kept in the JVM. I thought that if the configuration was the same for each collection, the impact on the JVM would be insignificant because the configuration should be loaded only once. But that is not the case: for each collection created, the JVM size increases because the configuration is loaded again, am I correct? If the configuration folder is small, I have no problem: the folder is less than 500 KB, so with 1000 collections x 500 KB the JVM impact is 500 MB. But we manage a lot of languages with some dictionaries, so the configuration folder is about 6 MB. The JVM impact is now very significant because it can be more than 6 GB (1000 x 6 MB). So I would like feedback from people who also run a cluster with a large number of collections. Do I have to change some settings to handle this case better? What can I do to optimize this behaviour? For now we just increased the RAM per node to 16 GB, but we plan to increase the number of collections. Thanks, Olivier
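For illustration, a minimal sketch of sharing one configuration across all collections: upload it once as a named configset and reference it from every CREATE call, so only a single copy lives in ZooKeeper. Host, paths and names below are placeholders, and the zkcli.sh location varies by Solr version; also note that, as observed above, each loaded core still parses that shared tree into its own in-heap objects, so this alone does not remove the per-collection heap cost.

# upload the shared configuration once
server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/shared/conf -confname shared_conf

# every collection references the same configset at creation time
curl "http://host:8983/solr/admin/collections?action=CREATE&name=customer_0001&numShards=3&replicationFactor=3&maxShardsPerNode=3&collection.configName=shared_conf"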
reload collections timeout
Hi everybody, I have about 1300 collections, 3 shards, replicationFactor=3, maxShardsPerNode=3. I have 3 boxes with 64 GB (32 GB JVM). When I want to reload all my collections I get a timeout error. Is there a way to run the reload asynchronously, as when creating collections (async=requestid)? I saw on this issue that it was done, but it did not seem to work: https://issues.apache.org/jira/browse/SOLR-5477 How do I use the async mode to reload collections? Thanks a lot, Olivier Damiot
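For illustration, the rough shape of an async Collections API call, assuming a Solr version where RELOAD actually honors the async parameter (SOLR-5477 is the relevant work; as noted above, whether it takes effect for RELOAD needs verifying on your version). Host and collection names are placeholders; in practice you would loop this over the 1300 collections and poll the request ids afterwards.

# submit the reload without waiting for it to finish
curl "http://host:8983/solr/admin/collections?action=RELOAD&name=collection42&async=reload-collection42"

# poll the status of that request id later
curl "http://host:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=reload-collection42"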
Re: Large number of collections in SolrCloud
Hi, Thanks a lot Erick and Shawn for your answers. I am aware that this is a very particular issue and not a common use of Solr; I just wondered whether other people had a similar business case. For information, we need a very large number of collections with the same configuration for legal reasons. Each collection represents one of our customers, and by contract we have to separate the data of each of them. If we had the choice, we would just have one collection with a 'Customer' field and do filter queries on it, but we can't! Anyway, thanks again for your answers. For now, we finally did not add the different language dictionaries per collection, and it is fine for 1K+ customers with more resources added to the servers. Best, Olivier Tavard 2015-07-27 17:53 GMT+02:00 Shawn Heisey : > On 7/27/2015 9:16 AM, Olivier wrote: > > I have a SolrCloud cluster with 3 nodes : 3 shards per node and > > replication factor at 3. > > The collections number is around 1000. All the collections use the same > > Zookeeper configuration. > > So when I create each collection, the ZK configuration is pulled from ZK > > and the configuration files are stored in the JVM. > > I thought that if the configuration was the same for each collection, the > > impact on the JVM would be insignificant because the configuration should > be > > loaded only once. But it is not the case, for each collection created, > the > > JVM size increases because the configuration is loaded again, am I > correct ? > > > > If I have a small configuration folder size, I have no problem because > the > > folder size is less than 500 KB so if we count 1000 collections x 500 KB, > > the JVM impact is 500 MB. > > But we manage a lot of languages with some dictionaries so the > > configuration folder size is about 6 MB. The JVM impact is very important > > now because it can be more than 6 GB (1000 x 6 MB). > > > > So I would like to have the feedback of people who have a cluster with a > > large number of collections too. Do I have to change some settings to > > handle this case better ? What can I do to optimize this behaviour ? > > For now, we just increase the RAM size per node at 16 GB but we plan to > > increase the collections number. > > Severe issues were noticed when dealing with many collections, and this > was with a simple config, and completely empty indexes. A complex > config and actual index data would make it run that much more slowly. > > https://issues.apache.org/jira/browse/SOLR-7191 > > Memory usage for the config wasn't even considered when I was working on > reporting that issue. > > SolrCloud is highly optimized to work well when there are a relatively > small number of collections. I think there is work that we can do which > will optimize operations to the point where thousands of collections > will work well, especially if they all share the same config/schema ... > but this is likely to be a fair amount of work, which will only benefit > a handful of users who are pushing the boundaries of what Solr can do. > In the open source world, a problem like that doesn't normally receive a > lot of developer attention, and we rely much more on help from the > community, specifically from knowledgeable users who are having the > problem and know enough to try and fix it. > > FYI -- 16GB of RAM per machine is quite small for Solr, particularly > when pushing the envelope. My Solr machines are maxed at 64GB, and I > frequently wish I could install more.
> > https://wiki.apache.org/solr/SolrPerformanceProblems#RAM > > One possible solution for your dilemma is simply adding more machines > and spreading your collections out so each machine's memory requirements > go down. > > Thanks, > Shawn > >
Large multivalued field and overseer problem
Hi, We have a SolrCloud cluster with 3 nodes (4 processors, 24 GB RAM per node). We have 3 shards per node and the replication factor is 3. We host 3 collections; the biggest is only about 40K documents. The important point is a multivalued field with about 200K to 300K values per document (each value is a kind of product reference, of type String). We have some very big issues with our SolrCloud cluster: it crashes entirely, very frequently, at indexing time. It starts with an overseer issue, an overseer session expiry: KeeperErrorCode = Session expired for /overseer_elect/leader Then another node is elected overseer, but the recovery phase seems to fail indefinitely. It seems that communication between the overseer and ZK is impossible, and after a short period of time the whole cluster is unavailable (JVM out of memory error) and we have to restart it. So I wanted to know whether we can continue to use such a huge multivalued field with SolrCloud. We are on Solr 4.10.4 for now; do you think that upgrading to Solr 5, with an overseer per collection, could fix our issues? Or do we have to rethink the schema to avoid this very large multivalued field? Thanks, Best, Olivier
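For illustration only, two checks that are sometimes useful in this situation; the host is a placeholder, OVERSEERSTATUS is part of the Collections API from Solr 4.10 on, and the zkClientTimeout change assumes solr.xml still uses the stock ${zkClientTimeout:...} substitution. Neither addresses the root cause, which the description above points at the very large multivalued field and the resulting JVM pressure.

# see which node currently holds the overseer role and its queue statistics
curl "http://host:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json"

# if the session expirations line up with long GC pauses, a larger ZooKeeper
# session timeout can be passed as a system property at startup
# (run from the example directory, usual flags omitted)
java -DzkClientTimeout=60000 -jar start.jar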
Problems for indexing large documents on SolrCloud
Hi, I have some problems indexing large documents in a SolrCloud cluster of 3 servers (Solr 4.8.1) with 3 shards and 2 replicas for each shard, on Tomcat 7. For a specific document (with 300K values in a multivalued field), I couldn't index it on SolrCloud, but I could do it in a single Solr instance on my own PC. The indexing is done with Solarium from a database. The indexed data are e-commerce products with classic fields like name, price, description, instock, etc. The large field (type int) consists of the ids of other products. The only difference from the documents that index correctly is the size of that multivalued field: the documents that index correctly all have between 100K and 200K values for that field. The index size is 11 MB for 20 documents. To solve it, I tried to change several parameters, including the ZK timeouts in solr.xml (in the solrcloud section: 6 10 10, and in the shardHandlerFactory section: ${socketTimeout:10} ${connTimeout:10}). I also tried to increase these values in solrconfig.xml, and to increase the amount of RAM (these are VMs): each server has 4 GB of RAM with 3 GB for the JVM. Are there other settings I may have forgotten which could solve the problem? The error messages are: ERROR SolrDispatchFilter null:java.lang.RuntimeException: [was class java.net.SocketException] Connection reset ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe ERROR SolrDispatchFilter null:ClientAbortException: java.net.SocketException: broken pipe ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block; expected an identifier ERROR SolrCore org.apache.solr.common.SolrException: Unexpected EOF in attribute value ERROR SolrCore org.apache.solr.common.SolrException: Unexpected end of input block in start tag Thanks, Olivier
Re: Problems for indexing large documents on SolrCloud
Hi, First, thanks for your advice. I did several tests and finally I could index all the data on my SolrCloud cluster. The error was client-side; it's documented in this post: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201406.mbox/%3ccfc09ae1.94f8%25rebecca.t...@ucsf.edu%3E "EofException from Jetty means one specific thing: The client software disconnected before Solr was finished with the request and sent its response. Chances are good that this is because of a configured socket timeout on your SolrJ client or its HttpClient. This might have been done with the setSoTimeout method on the server object." So I increased the Solarium timeout from 5 to 60 seconds and all the data is now indexed correctly. The error was not reproducible on my development PC because the database and Solr were on the same local virtual machine with plenty of available resources, so indexing was faster than in the SolrCloud cluster. Thanks, Olivier 2014-09-11 0:21 GMT+02:00 Shawn Heisey : > On 9/10/2014 2:05 PM, Erick Erickson wrote: > > bq: org.apache.solr.common.SolrException: Unexpected end of input > > block; expected an identifier > > > > This is very often an indication that your packets are being > > truncated by "something in the chain". In your case, make sure > > that Tomcat is configured to handle inputs of the size that you're > sending. > > > > This may be happening before things get to Solr, in which case your > settings > > in solrconfig.xml aren't germane, the problem is earlier than that. > > > > A "semi-smoking-gun" here is that there's a size of your multivalued > > field that seems to break things... That doesn't rule out time problems > > of course. > > > > But I'd look at the Tomcat settings for maximum packet size first. > > The maximum HTTP request size is actually controlled by Solr itself > since 4.1, with changes committed for SOLR-4265. Changing the setting > on Tomcat probably will not help. > > An example from my own config which sets this to 32MB - the default is > 2048, or 2MB: > > <requestParsers multipartUploadLimitInKB="32768" formdataUploadLimitInKB="32768"/> > > Thanks, > Shawn > >
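For illustration, roughly the same failure mode reproduced from the shell: if the client gives up before Solr answers, Solr logs the EofException / broken pipe errors seen earlier, and raising the client-side limit is what the Solarium change above amounts to. Host, core name and bigdoc.xml are placeholders.

# a 5-second client timeout: on a slow update this aborts the connection,
# producing the "broken pipe" errors on the Solr side
curl --max-time 5 "http://host:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary @bigdoc.xml

# the equivalent of the Solarium fix: give the client 60 seconds
curl --max-time 60 "http://host:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary @bigdoc.xml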
Leader election
Hello everybody, I use Solr 5.2.1 and am having a big problem. I have about 1200 collections, 3 shards, replicationFactor=3, maxShardsPerNode=3. I have 3 boxes with 64 GB (32 GB JVM). I have no problems with collection creation or indexing, but when I lose a node (VM crash or kill) and restart it, all my collections are down. Looking in the logs I can see leader election problems, e.g.: - Checking if I (core = test339_shard1_replica1, coreNodeName = core_node5) should try and be the leader. - Cloud says we are still state leader. I feel that all the servers pass the buck! I do not understand this error, especially since from reading the mailing list I have the impression that this bug was solved long ago. What should I do to start my collections properly? Could someone help me? Thank you a lot, Olivier
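For reference, a hedged way to see what ZooKeeper currently records for the affected shards while the nodes argue about the election; CLUSTERSTATUS is part of the Collections API in Solr 5.x, and the host and collection names are placeholders. It does not fix the election itself, but it shows, per shard, which replica (if any) is marked leader and which replicas are down.

curl "http://host:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=test339&wt=json"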
Fast autocomplete for large dataset
Hi, I am looking for a fast and easy-to-maintain way to do autocomplete for a large dataset in Solr. I heard about Ternary Search Trees (TST) <https://en.wikipedia.org/wiki/Ternary_search_tree>. But I would like to know if there is something I missed, such as a best practice or a new Solr feature. Any suggestion is welcome. Thank you. Regards Olivier
Re: Fast autocomplete for large dataset
Thank you Erick for your reply. If I understand correctly, it seems that these approaches use the index to hold the terms. As the index grows bigger, that can become a performance issue. Is that right? Can you please check this article <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see what I mean? Thank you. Regards Olivier 2015-08-01 17:42 GMT+02:00 Erick Erickson : > Well, defining what you mean by "autocomplete" would be a start. If it's > just > a user types some letters and you suggest the next N terms in the list, > TermsComponent will fix you right up. > > If it's more complicated, the AutoSuggest functionality might help. > > If it's correcting spelling, there's the spellchecker. > > Best, > Erick > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina > wrote: > > Hi, > > > > I am looking for a fast and easy to maintain way to do autocomplete for > > large dataset in solr. I heard about Ternary Search Tree (TST) > > <https://en.wikipedia.org/wiki/Ternary_search_tree>. > > But I would like to know if there is something I missed such as best > > practice, Solr new feature. Any suggestion is welcome. Thank you. > > > > Regards > > Olivier >
Re: Fast autocomplete for large dataset
Thank you Erick, I would like to implement autocomplete for a large dataset. The autocomplete should show the phrase or question the user wants as the user types. The requirement is that the autocomplete should be fast (not slowed down by the volume of data as the dataset becomes bigger) and easy to maintain. The autocomplete can have its own Solr server. It is an autocomplete like any other, but it must above all be fast and easy to maintain. What are the limitations of the suggesters mentioned in the article? Thank you. Regards Olivier 2015-08-01 19:41 GMT+02:00 Erick Erickson : > Not really. There's no need to use ngrams as the article suggests if the > terms component does what you need. Which is why I asked you about what > autocomplete means in your context. Which you have not clarified. Have you > even looked at terms component? Especially the terms.prefix option? > > Terms component has its limitations, but performance isn't one of them. > The suggesters mentioned in the article have other limitations. It's really > useless to discuss those limitations, though, until the problem you're > trying to solve is clearly stated. > On Aug 1, 2015 1:01 PM, "Olivier Austina" > wrote: > > > Thank you Eric for your reply. > > If I understand it seems that these approaches are using index to hold > > terms. As the index grows bigger, it can be a performance issues. > > Is it right? Please can you check this article > > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to see > > what I mean? Thank you. > > > > Regards > > Olivier > > > > > > 2015-08-01 17:42 GMT+02:00 Erick Erickson : > > > > > Well, defining what you mean by "autocomplete" would be a start. If > it's > > > just > > > a user types some letters and you suggest the next N terms in the list, > > > TermsComponent will fix you right up. > > > > > > If it's more complicated, the AutoSuggest functionality might help. > > > > > > If it's correcting spelling, there's the spellchecker. > > > > > > Best, > > > Erick > > > > > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina > > > wrote: > > > > Hi, > > > > > > > > I am looking for a fast and easy to maintain way to do autocomplete for > > > > large dataset in solr. I heard about Ternary Search Tree (TST) > > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>. > > > > But I would like to know if there is something I missed such as best > > > > practice, Solr new feature. Any suggestion is welcome. Thank you. > > > > > > > > Regards > > > > Olivier > > > > > >
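For illustration, what the TermsComponent lookup Erick mentions looks like, assuming the /terms handler from the example solrconfig.xml is enabled; host, collection and field names are placeholders.

curl "http://host:8983/solr/collection1/terms?terms=true&terms.fl=name&terms.prefix=ipo&terms.limit=10&wt=json"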
Re: Fast autocomplete for large dataset
Thank you Eric for your replies and the link. Regards Olivier 2015-08-02 3:47 GMT+02:00 Erick Erickson : > Here's some background: > > http://lucidworks.com/blog/solr-suggester/ > > Basically, the limitation is that to build the suggester all docs in > the index need to be read to pull out the stored field and build > either the FST or the sidecar Lucene index, which can be a _very_ > costly operation (as in minutes/hours for a large dataset). > > bq: The requirement is that the autocomplete should be fast (not > slowdown by the volume of data as dataset become bigger) > > Well, in some alternate universe this may be possible. But the larger > the corpus the slower the processing will be, there's just no way > around that. Whether it's fast enough for your application is a better > question ;). > > Best, > Erick > > > On Sat, Aug 1, 2015 at 2:05 PM, Olivier Austina > wrote: > > Thank you Eric, > > > > I would like to implement an autocomplete for large dataset. The > > autocomplete should show the phrase or the question the user want as the > > user types. The requirement is that the autocomplete should be fast (not > > slowdown by the volume of data as dataset become bigger), and easy to > > maintain. The autocomplete can have its own Solr server. It is an > > autocomplete like others but it should be only fast and easy to maintain. > > > > What is the limitations of suggesters mentioned in the article? Thank > you. > > > > Regards > > Olivier > > > > > > 2015-08-01 19:41 GMT+02:00 Erick Erickson : > > > >> Not really. There's no need to use ngrams as the article suggests if the > >> terms component does what you need. Which is why I asked you about what > >> autocomplete means in your context. Which you have not clarified. Have > you > >> even looked at terms component? Especially the terms.prefix option? > >> > >> Terms component has it's limitations, but performance isn't one of them. > >> The suggesters mentioned in the article have other limitations. It's > really > >> useless to discuss those limitations, though, until the problem you're > >> trying to solve is clearly stated. > >> On Aug 1, 2015 1:01 PM, "Olivier Austina" > >> wrote: > >> > >> > Thank you Eric for your reply. > >> > If I understand it seems that these approaches are using index to hold > >> > terms. As the index grows bigger, it can be a performance issues. > >> > Is it right? Please can you check this article > >> > <http://www.norconex.com/serving-autocomplete-suggestions-fast/> to > see > >> > what I mean? Thank you. > >> > > >> > Regards > >> > Olivier > >> > > >> > > >> > 2015-08-01 17:42 GMT+02:00 Erick Erickson : > >> > > >> > > Well, defining what you mean by "autocomplete" would be a start. If > >> it's > >> > > just > >> > > a user types some letters and you suggest the next N terms in the > list, > >> > > TermsComponent will fix you right up. > >> > > > >> > > If it's more complicated, the AutoSuggest functionality might help. > >> > > > >> > > If it's correcting spelling, there's the spellchecker. > >> > > > >> > > Best, > >> > > Erick > >> > > > >> > > On Sat, Aug 1, 2015 at 10:00 AM, Olivier Austina > >> > > wrote: > >> > > > Hi, > >> > > > > >> > > > I am looking for a fast and easy to maintain way to do > autocomplete > >> for > >> > > > large dataset in solr. I heard about Ternary Search Tree (TST) > >> > > > <https://en.wikipedia.org/wiki/Ternary_search_tree>. > >> > > > But I would like to know if there is something I missed such as > best > >> > > > practice, Solr new feature. 
Any suggestion is welcome. Thank you. > >> > > > > >> > > > Regards > >> > > > Olivier > >> > > > >> > > >> >
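For illustration, the rough shape of the suggester calls the blog post describes, assuming a /suggest request handler configured with a suggester named mySuggester (both names are placeholders); the one-off build call is the expensive step Erick refers to, since it reads the stored field for every document.

# build (or rebuild) the suggester structure - potentially slow on a large index
curl "http://host:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.build=true"

# after that, lookups are cheap
curl "http://host:8983/solr/collection1/suggest?suggest=true&suggest.dictionary=mySuggester&suggest.q=toy&wt=json"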
SOLR cloud (5.2.1) recovery
Hello, I'm a bit confused about how SolrCloud recovery is supposed to work exactly in the case of losing a single node completely. My 600 collections are created with numShards=3&replicationFactor=3&maxShardsPerNode=3. How do I configure a new node to take the place of the dead node, or recover if I accidentally delete the data dir? I bring up a new node which is completely empty (empty data dir), install Solr, and connect it to ZooKeeper. Is it supposed to work automatically from there? All my shards/replicas on this node show as down (I suppose because there are no cores in the data dir). Do I need to recreate the cores first? Can I copy/paste the data directory from another node to this one? I think not, because I would have to rename all the variables in core.properties which are specific to each node (like name or coreNodeName). Thanks, Olivier Damiot
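For illustration, one manual way this is often handled, assuming the Solr 5.2 Collections API; collection, shard, replica and node names are placeholders (the coreNodeName of the dead replica can be read from clusterstate.json in ZooKeeper). This has to be repeated for each affected shard of each of the 600 collections, so in practice it is scripted.

# 1) remove the registration of the replica that lived on the lost node
curl "http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node5"

# 2) create a fresh replica for that shard on the new, empty node;
#    it then syncs its index from the shard leader
curl "http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=coll1&shard=shard1&node=newnode:8983_solr"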
How to dereference boost values?
Is it possible to do something like this: bf=myfield^$myfactor Thanks, Olivier
Dereferencing boost values?
Is there a way to do something like this: " bf=myfield^$myfactor " ? (Doesn't work, the boost value has to be a direct number) Thanks, Olivier
Re: Dereferencing boost values?
Thanks guys... I'm using edismax, and I have a long bf field, that I want in a solr's requesthandler config as default, but customizable via query string, something like that: product(a,$a)^$fa sum(b,$b1,$b2)^$fb c^$fc ... where the caller would pass $a, $fa, $b1, $b2, $fb, $fc (and a, b, c are numeric fields) So my problem is with $fa, $fb, and $fc. Solr doesn't take that syntax. For numeric operands, is the dismax boost operator ^ just a pow()? If so, my problem is solved by doing that: pow(product(a,$a1),$fa) pow(sum(b,$b1,$b2),$fb) pow(c,$fc) Is a^b equiv to pow(a,b)? Thanks, Olivier On 7/14/2015 2:31 PM, Chris Hostetter wrote: To clarify the difference: - "bf" is a special param of the dismax parser, which does an *additive* boost function - that function can be something as simple as a numeric field - alternatively, you can use the "boost" parser in your main query string, to wrap any parser (dismax, edismax, standard, whatever) in a *multiplicitive* boost, where the boost function can be anything - multiplicitve boosts are almost always what people really want, additive boosts are a lot less useful. - when specifying any function, you can use variable derefrencing for any function params. So in the example Upayavira gave, you can use any arbitrary query param to specify the function to use as a multiplicitive boost arround an arbitrary query -- which could still use dismax if you want (just specify the neccessary parser "type" as a localparam on the inner query, or use a defType localparam on the original boost query). Or you could explicitly specify a function that incorporates a field value with some other dynamic params, and use that entire function as your multiplicitive boost. a more elaborate example using the "bin/solr -e techproducts" data... http://localhost:8983/solr/techproducts/query?debug=query&q={!boost%20b=$boost_func%20defType=dismax%20v=$qq}&qf=name+title&qq=apple%20ipod&boost_func=pow%28$boost_field,$boost_factor%29&boost_field=price&boost_factor=2 "params":{ "qq":"apple ipod", "q":"{!boost b=$boost_func defType=dismax v=$qq}", "debug":"query", "qf":"name title", "boost_func":"pow($boost_field,$boost_factor)", "boost_factor":"2", "boost_field":"price"}}, : Date: Tue, 14 Jul 2015 21:58:36 +0100 : From: Upayavira : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Dereferencing boost values? : : You could do : : q={!boost b=$b v=$qq} : qq=your query : b=YOUR-FACTOR : : If what you want is to provide a value outside. : : Also, with later Solrs, you can use ${whatever} syntax in your main : query, which might work for you too. : : Upayavira : : On Tue, Jul 14, 2015, at 09:28 PM, Olivier Lebra wrote: : > Is there a way to do something like this: " bf=myfield^$myfactor " ? : > (Doesn't work, the boost value has to be a direct number) : > : > Thanks, : > Olivier : -Hoss http://www.lucidworks.com/
Querying specific database attributes or table
Hi, I am new to Solr. I would like to index and query a relational database. Is it possible to query a specific table or attribute of the database? For example, if I have 2 tables A and B which both have the attribute "name", and I want only the results from table A and not from table B, is that possible? Can I restrict the query to only one table without getting results from other tables? Is it possible to query a specific attribute of a table? Is it possible to do join queries like in SQL? Any suggestion is welcome. Thank you. Regards Olivier
Topology of Solr use
Hi All, I would like to have an idea of Solr usage: number of users, industry, countries or any other helpful information. Thank you. Regards Olivier
Re: Topology of Solr use
Thank you Markus, the link is very useful. Regards Olivier 2014-04-17 18:24 GMT+02:00 Markus Jelsma : > This may help a bit: > > https://wiki.apache.org/solr/PublicServers > > -Original message- > From:Olivier Austina > Sent:Thu 17-04-2014 18:16 > Subject:Topology of Solr use > To:solr-user@lucene.apache.org; > Hi All, > I would to have an idea about Solr usage: number of users, industry, > countries or any helpful information. Thank you. > Regards > Olivier >
Problem indexing email attachments
Hello, I'm trying to index email files with Solr (4.7.2). The files have the extension .eml (message/rfc822). The mail body is correctly indexed, but attachments are not indexed if they are not .txt files. If attachments are .txt files it works, but if attachments are .pdf or .docx files they are not indexed. I checked the extracted text by calling: curl " http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&extractOnly=true&extractFormat=text " -F "myfile=@Test1.eml" The returned extracted text does not contain the content of the attachments if they are not .txt files. It is not a problem with the Apache Tika library not being able to process attachments, because running the standalone Apache Tika app by calling: java -jar tika-app-1.4.jar -t Test1.eml on my eml files correctly displays the attachments' text. Maybe it is a problem with how Tika is called by Solr? Is there something to modify in the default configuration? Thanks for any help ;) Olivier
Re: Problem indexing email attachments
As I said, it is not a problem in the Tika library ;) I have tried with Tika 1.5 jars and it gives the same results. Guido Medina wrote on 23/04/2014 16:15:11: > From: Guido Medina > To: solr-user@lucene.apache.org > Date: 23/04/2014 16:15 > Subject: Re: Problem indexing email attachments > > We particularly massage solr.war and put our own updated jars, maybe > this helps: > > http://www.apache.org/dist/tika/CHANGES-1.5.txt > > We using Tika 1.5 inside Solr with POI 3.10-Final, etc... > > Guido. > > On 23/04/14 14:38, olivier.mass...@real.lu wrote: > > Hello, > > > > I'm trying to index email files with Solr (4.7.2) > > > > The files have the extension .eml (message/rfc822) > > > > The mail body is correctly indexed but attachments are not indexed if they > > are not .txt files. > > > > If attachments are .txt files it works, but if attachment are .pdf of > > .docx files they are not indexed. > > > > > > > > I checked the extracted text by calling: > > > > curl " > > http://localhost:8983/solr/update/extract? > literal.id=doc1&commit=true&extractOnly=true&extractFormat=text > > " -F "myfile=@Test1.eml" > > > > The returned extracted text does not contain the content of the > > attachments if they are not .txt files. > > > > > > It is not a problem with the Apache Tika library not being able to process > > attachments, because running the standalone Apache Tika app by calling: > > > > > > java -jar tika-app-1.4.jar -t Test1.eml > > > > > > on my eml files correctly displays the attachments' text. > > > > > > > > Maybe is it a problem with how Tika is called by Solr ? > > > > Is there something to modify in the default configuration ? > > > > > > Thanx for any help ;) > > > > Olivier >
Website running Solr
Hi All, Is there a way to know if a website uses Solr? Thanks. Regards Olivier
How to Get Highlighting Working in Velocity (Solr 4.8.0)
Maybe you missed that your field "dom_title" should be indexed="true" termVectors="true" termPositions="true" termOffsets="true"
Re: feedback on Solr 4.x LotsOfCores feature
15K cores is around 4 minutes : no network drive, just a spinning disk. But, one important thing, to simulate a cold start or an empty Linux buffer cache, I used the following command to empty the buffer cache : sync && echo 3 > /proc/sys/vm/drop_caches Then I started Solr and found the result above. On 11/10/2013 13:06, Erick Erickson wrote: bq: sharing the underlying solrconfig object the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode SOLR-4478 will NOT share the underlying config objects, it simply shares the underlying directory. Each core will, at least as presently envisioned, simply read the files that exist there and create their own solrconfig object. Schema objects may be shared, but not config objects. It may turn out to be relatively easy to do in the configset situation, but last time I looked at sharing the underlying config object it was too fraught with problems. bq: 15K cores is around 4 minutes I find this very odd. On my laptop, spinning disk, I think I was seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I have no idea what's going on here. If this is just reading the files, you should be seeing horrible disk contention. Are you on some kind of networked drive? bq: To do that in background and to block on that request until core discovery is complete, should not work for us (due to the worst case). What other choices are there? Either you have to do it up front or with some kind of blocking. Hmmm, I suppose you could keep some kind of custom store (DB? File? ZooKeeper?) that would keep the last known layout. You'd still have some kind of worst-case situation where the core you were trying to load wouldn't be in your persistent store and you'd _still_ have to wait for the discovery process to complete. bq: and we will use the cores Auto option to create load or only load the core on Interesting. I can see how this could all work without any core discovery but it does require a very specific setup. On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier <mailto:olivier.so...@worldline.com> wrote: > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, > including the new Cores options : > - "numBuckets" to create a subdirectory based on a hash on the corename % > numBuckets in the core Datadir > - "Auto" with 3 differents values : > 1) false : default behaviour > 2) createLoad : create, if not exist, and load the core on the fly on the > first incoming request (update, select) > 3) onlyLoad : load the core on the fly on the first incoming request > (update, select), if exist on disk > > Concerning : > - sharing the underlying solrconfig object, the configset introduced in JIRA > SOLR-4478 seems to be the solution for non-SolrCloud mode. > We need to test it for our use case. If another solution exists, please tell > me. We are very interested in such functionality and to contribute, if we can. > > - the possibility of lotsOfCores in SolrCloud, we don't know in details how > SolrCloud is working. > But one possible limit is the maximum number of entries that can be added to > a zookeeper node. > Maybe, a solution will be just a kind of hashing in the zookeeper tree. > > - the time to discover cores in Solr 4.4 : with spinning disk under linux, > all cores with transient="true" and loadOnStartup="false", the linux buffer > cache empty before starting Solr : > 15K cores is around 4 minutes. It's linear in the cores number, so for 50K > it's more than 13 minutes.
In fact, it corresponding to the time to read all > core.properties files. > To do that in background and to block on that request until core discovery is > complete, should not work for us (due to the worst case). > So, we will just disable the core Discovery, because we don't need to know > all cores from the start. Start Solr without any core entries in solr.xml, > and we will use the cores Auto option to create load or only load the core on > the fly, based on the existence of the core on the disk (absolute path > calculated from the core name). > > Thanks for your interest, > > Olivier > > From: Erick Erickson [erickerick...@gmail.com<mailto:erickerick...@gmail.com>] > Sent: Monday, 7 October 2013 14:33 > To: solr-user@lucene.apache.org<mailto:solr-user@lucene.apache.org> > Subject: Re: feedback on Solr 4.x LotsOfCores feature > > Thanks for the great writeup! It's always interesting to see how > a feature plays out "in the real world". A couple of questions > though: > > bq: We added 2 Cores options : > Do you mean you patched Solr? If so are you willing to share the code >
Re: feedback on Solr 4.x LotsOfCores feature
Another way to "simulate" the core discovery is : time find $PATH_TO_CORES -name core.properties -type f -exec cat '{}' > /dev/null 2>&1 \; or just the core.properties read time : find $PATH_TO_CORES -name core.properties > cores.list time for i in `cat cores.list`; do cat $i > /dev/null 2>&1; done; Olivier On 19/10/2013 11:57, Erick Erickson wrote: For my quick-and-dirty test I just rebooted my machine totally and still had 1K/sec core discovery. So this still puzzles me greatly. The time to do this should be approximated by the time it takes to just walk your tree, find all the core.properties and read them. Is it possible to just write a tiny Java program to do that? Or rip off the core discovery code and use that for a small stand-alone program? Because this is quite a bit at odds with what I've seen. Although now that I think about it, the code has gone through some revisions since then, but I don't think they should have affected this... Best Erick On Fri, Oct 18, 2013 at 2:59 PM, Soyez Olivier <mailto:olivier.so...@worldline.com> wrote: > 15K cores is around 4 minutes : no network drive, just a spinning disk > But, one important thing, to simulate a cold start or an empty linux > buffer cache, > I used the following command to empty the linux buffer cache : > sync && echo 3 > /proc/sys/vm/drop_caches > Then, I started Solr and I found the result above > > > On 11/10/2013 13:06, Erick Erickson wrote: > > > bq: sharing the underlying solrconfig object the configset introduced > in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode > > SOLR-4478 will NOT share the underlying config objects, it simply > shares the underlying directory. Each core will, at least as presently > envisioned, simply read the files that exist there and create their > own solrconfig object. Schema objects may be shared, but not config > objects. It may turn out to be relatively easy to do in the configset > situation, but last time I looked at sharing the underlying config > object it was too fraught with problems. > > bq: 15K cores is around 4 minutes > > I find this very odd. On my laptop, spinning disk, I think I was > seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I > have no idea what's going on here. If this is just reading the files, > you should be seeing horrible disk contention. Are you on some kind of > networked drive? > > bq: To do that in background and to block on that request until core > discovery is complete, should not work for us (due to the worst case). > What other choices are there? Either you have to do it up front or > with some kind of blocking. Hmmm, I suppose you could keep some kind > of custom store (DB? File? ZooKeeper?) that would keep the last known > layout. You'd still have some kind of worst-case situation where the > core you were trying to load wouldn't be in your persistent store and > you'd _still_ have to wait for the discovery process to complete. > > bq: and we will use the cores Auto option to create load or only load > the core on > Interesting. I can see how this could all work without any core > discovery but it does require a very specific setup.
> > On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier > <mailto:olivier.so...@worldline.com><mailto:olivier.so...@worldline.com> > wrote: > > The corresponding patch for Solr 4.2.1 LotsOfCores can be found in > SOLR-5316, including the new Cores options : > > - "numBuckets" to create a subdirectory based on a hash on the corename > % numBuckets in the core Datadir > > - "Auto" with 3 differents values : > > 1) false : default behaviour > > 2) createLoad : create, if not exist, and load the core on the fly on > the first incoming request (update, select) > > 3) onlyLoad : load the core on the fly on the first incoming request > (update, select), if exist on disk > > > > Concerning : > > - sharing the underlying solrconfig object, the configset introduced in > JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode. > > We need to test it for our use case. If another solution exists, please > tell me. We are very interested in such functionality and to contribute, if > we can. > > > > - the possibility of lotsOfCores in SolrCloud, we don't know in details > how SolrCloud is working. > > But one possible limit is the maximum number of entries that can be > added to a zookeeper node. > > Maybe, a solution will be just a kind of hashing in the zookeeper tree. > > > > - the time to discover cores in Solr 4.4 : with spinning disk under > linux, all cores with transient="true" and
Remove indexes of XML file
Hi, This is a newbie question. I have indexed some documents using some XML files, as indicated in the tutorial <http://lucene.apache.org/solr/4_10_1/tutorial.html>, with the command: java -jar post.jar *.xml I have seen how to delete one document from the index, but how do I delete all the documents that came from one XML file? For example, if I have indexed some files A, B, C, D, etc., how do I delete the documents that came from file C? Is there a command like the one above, or another solution that does not use individual IDs? Thank you. Regards Olivier
Re: Remove indexes of XML file
Thank you Alex, I think I can use the file to delete corresponding indexes. Regards Olivier 2014-10-24 21:51 GMT+02:00 Alexandre Rafalovitch : > You can delete individually, all (*:* query) or by specific query. So, > if there is no common query pattern you may need to do a multi-id > query - something like "id:(id1 id2 id3 id4)" which does require you > knowing the IDs. > > Regards, >Alex. > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On 24 October 2014 15:44, Olivier Austina > wrote: > > Hi, > > > > This is newbie question. I have indexed some documents using some XML > files > > as indicating in the tutorial > > <http://lucene.apache.org/solr/4_10_1/tutorial.html> with the command : > > > > java -jar post.jar *.xml > > > > I have seen how to delete an index for one document but how to delete > > all indexes > > for documents within an XML file. For example if I have indexed some > > files A, B, C, D etc., > > how to delete indexes of documents from file C. Is there a command > > like above or other > > solution without using individual ID? Thank you. > > > > > > Regards > > Olivier >
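For illustration, the delete-by-query form Alexandre mentions, using the same /update endpoint as the tutorial. This assumes the documents from file C share some identifiable value, here a hypothetical "source" field that the stock example schema does not have, so it would need to be added and populated at index time; otherwise the multi-id form applies.

# delete everything whose (hypothetical) source field says it came from file C
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>source:fileC</query></delete>"

# or, knowing the ids, the multi-id variant from the reply
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>id:(id1 id2 id3)</query></delete>"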
OpenExchangeRates.Org rates in solr
Hi, Is there a way to see the OpenExchangeRates.Org <http://www.OpenExchangeRates.Org> rates used in Solr somewhere? I have changed the configuration to use these rates. Thank you. Regards Olivier
Re: OpenExchangeRates.Org rates in solr
Hi Will, I am learning Solr now. I can use it later for business or for free access. Thank you. Regards Olivier 2014-10-26 17:32 GMT+01:00 Will Martin : > Hi Olivier: > > Can you clarify this message? Are you using Solr at the business? Or are > you giving free access to solr installations? > > Thanks, > Will > > > -Original Message- > From: Olivier Austina [mailto:olivier.aust...@gmail.com] > Sent: Sunday, October 26, 2014 10:57 AM > To: solr-user@lucene.apache.org > Subject: OpenExchangeRates.Org rates in solr > > Hi, > > There is a way to see the OpenExchangeRates.Org < > http://www.OpenExchangeRates.Org> rates used in Solr somewhere. I have > changed the configuration to use these rates. Thank you. > Regards > Olivier > >
Indexing documents/files for production use
Hi All, I am reading the Solr documentation. I have understood that post.jar <http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29> is not meant for production use, and that cURL <https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing> is not recommended. Is SolrJ better for production? Thank you. Regards Olivier
Re: Indexing documents/files for production use
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me. Regards Olivier 2014-10-28 23:35 GMT+01:00 Erick Erickson : > And one other consideration in addition to the two excellent responses > so far > > In a SolrCloud environment, SolrJ via CloudSolrServer will automatically > route the documents to the correct shard leader, saving some additional > overhead. Post.jar and cURL send the docs to a node, which in turn > forward the docs to the correct shard leader which lowers > throughput > > Best, > Erick > > On Tue, Oct 28, 2014 at 2:32 PM, "Jürgen Wagner (DVT)" > wrote: > > Hello Olivier, > > for real production use, you won't really want to use any toys like > > post.jar or curl. You want a decent connector to whatever data source > there > > is, that fetches data, possibly massages it a bit, and then feeds it into > > Solr - by means of SolrJ or directly into the web service of Solr via > binary > > protocols. This way, you can properly handle incremental feeding, > processing > > of data from remote locations (with the connector being closer to the > data > > source), and also source data security. Also think about what happens if > you > > do processing of incoming documents in Solr. What happens if Tika runs > out > > of memory because of PDF problems? What if this crashes your Solr node? > In > > our Solr projects, we generally do not do any sizable processing within > Solr > > as document processing and document indexing or querying have all > different > > scaling properties. > > > > "Production use" most typically is not achieved by deploying a vanilla > Solr, > > but rather having a bit more glue and wrappage, so the whole will fit > your > > requirements in terms of functionality, scaling, monitoring and > robustness. > > Some similar platforms like Elasticsearch try to alleviate these pains of > > going to a production-style infrastructure, but that's at the expense of > > flexibility and comes with limitations. > > > > For proof-of-concept or demonstrator-style applications, the plain tools > out > > of the box will be fine. For production applications, you want to have > more > > robust components. > > > > Best regards, > > --Jürgen > > > > > > On 28.10.2014 22:12, Olivier Austina wrote: > > > > Hi All, > > > > I am reading the solr documentation. I have understood that post.jar > > < > http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29 > > > > is not meant for production use, cURL > > < > https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing > > > > is not recommanded. Is SolrJ better for production? Thank you. > > Regards > > Olivier > > > > > > > > -- > > > > Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С > > уважением > > i.A. Jürgen Wagner > > Head of Competence Center "Intelligence" > > & Senior Cloud Consultant > > > > Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany > > Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 > 1543 > > E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de > > > > > > Managing Board: Jürgen Hatzipantelis (CEO) > > Address of Record: 64331 Weiterstadt, Germany; Commercial Register: > > Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071 > > > > >
UI for Solr
Hi, I would like to build a user interface on top of Solr for PC and mobile. I am wondering if there is a framework or best practice commonly used. I want Solr features such as suggestions, autocomplete and facets to be available in the UI. Any suggestion is welcome. Thank you. Regards Olivier
Re: UI for Solr
Hi Alex, Thank you for prompt reply. I am not aware of Spring.io's Spring Data Solr. Regards Olivier 2014-12-23 16:50 GMT+01:00 Alexandre Rafalovitch : > You don't expose Solr directly to the user, it is not setup for > full-proof security out of the box. So you would need a client to talk > to Solr. > > Something like Spring.io's Spring Data Solr could be one of the things > to check. You can see an auto-complete example for it at: > https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer/src/main > and embedded in action at > http://www.solr-start.com/javadoc/solr-lucene/index.html (search box > on the top) > > Regards, >Alex. > > Sign up for my Solr resources newsletter at http://www.solr-start.com/ > > > On 23 December 2014 at 10:45, Olivier Austina > wrote: > > Hi, > > > > I would like to build a User Interface on top of Solr for PC and mobile. > I > > am wondering if there is a framework, best practice commonly used. I want > > Solr features such as suggestion, auto complete, facet to be available > for > > UI. Any suggestion is welcome. Than you. > > > > Regards > > Olivier >
Architecture for PHP web site, Solr and an application
Hi, I would like to query only some fields in Solr, depending on the user input, as I know the fields. The user sends an HTML form to the PHP website. The application gets the fields and their content from the PHP website. The application then formulates a query to Solr based on these fields and other contextual information. Only fields from the HTML form are used. The forms don't all have the same fields. The application is not yet developed. It could be in C++, Java or another language using a database. It uses more resources. I am wondering which architecture is suitable for this case: - How to make the architecture scalable (to support more users) - How to make PHP communicate with the application if this application is not in PHP. Any suggestion is welcome. Thank you. Regards Olivier
How to implement Auto complete, suggestion client side
Hi All, I would say I am new to web technology. I would like to implement autocomplete/suggestions in the user search box as the user types (like Google, for example). I am using Solr as the database. Basically I am familiar with Solr and I can formulate suggestion queries. But now I don't know how to implement the suggestions in the user interface. Which technologies do I need? The website is in PHP. Any suggestions, examples or basic tutorials are welcome. Thank you. Regards Olivier
Re: How to implement Auto complete, suggestion client side
Hi, Thank you Dan Davis and Alexandre Rafalovitch. This is very helpful for me. Regards Olivier 2015-01-27 0:51 GMT+01:00 Alexandre Rafalovitch : > You've got a lot of options depending on what you want. But since you > seem to just want _an_ example, you can use mine from > http://www.solr-start.com/javadoc/solr-lucene/index.html (gray search > box there). > > You can see the source for the test screen (using Spring Boot and > Spring Data Solr as a middle-layer) and Select2 for the UI at: > https://github.com/arafalov/Solr-Javadoc/tree/master/SearchServer. > The Solr definition is at: > > https://github.com/arafalov/Solr-Javadoc/tree/master/JavadocIndex/JavadocCollection/conf > > Other implementation pieces are in that (and another) public > repository as well, but it's all in Java. You'll probably want to do > something similar in PHP. > > Regards, >Alex. > > Sign up for my Solr resources newsletter at http://www.solr-start.com/ > > > On 26 January 2015 at 17:11, Olivier Austina > wrote: > > Hi All, > > > > I would say I am new to web technology. > > > > I would like to implement auto complete/suggestion in the user search box > > as the user type in the search box (like Google for example). I am using > > Solr as database. Basically I am familiar with Solr and I can formulate > > suggestion queries. > > > > But now I don't know how to implement suggestion in the User Interface. > > Which technologies should I need. The website is in PHP. Any suggestions, > > examples, basic tutorial is welcome. Thank you. > > > > > > > > Regards > > Olivier >
feedback on Solr 4.x LotsOfCores feature
Hello, In my company, we use Solr in production to offer full-text search on mailboxes. We host dozens of millions of mailboxes, but only webmail users have this feature (a few million). We have the following use case : - non-static indexes, with more updates (indexing and deleting) than select requests (ratio 7:1) - homogeneous configuration for all indexes - not many users at the same time We started to index mailboxes with Solr 1.4 in 2010, on a subset of 400,000 users. - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr instance - we grew to 6000 users per Solr instance, 8 Solr per server, 60 GB per index (~2 million users) - we upgraded to Solr 3.5 in 2012 As indexes grew, IOPS and response times increased more and more. The index size was mainly due to stored fields (large .fdt files). Retrieving these fields from the index was costly, because of many seeks in large files, and no limit usage was possible. There is also an overhead on queries : too many results are filtered to find only the results concerning one user. For these reasons and others, like users not being pooled, hardware savings, better scoring, and some requests that do not support filtering, we decided to use the LotsOfCores feature. Our goal was to change the current I/O usage : from lots of random I/O access on huge segments to mostly sequential I/O access on small segments. For our use case, it is not a big deal that the first query to a not-yet-loaded core will be slow. And we don't need to fit all the cores into memory at once. We started from the SOLR-1293 issue and the LotsOfCores wiki page to finally use a patched Solr 4.2.1 LotsOfCores in production (1 user = 1 core). We no longer need to run so many Solr instances per node. We are now able to have around 5 cores per Solr and we plan to grow to 100,000 cores per instance. At first, we used the solr.xml persistence. All cores have loadOnStartup="false" and transient="true" attributes, so a cold start is very quick. The response times were better than ever, in comparison with the poor response times we had before using LotsOfCores. We added 2 Cores options : - "numBuckets" to create a subdirectory based on a hash on the corename % numBuckets in the core dataDir, because all cores cannot live in the same directory - "Auto" with 3 different values : 1) false : default behaviour 2) createLoad : create, if it does not exist, and load the core on the fly on the first incoming request (update, select) 3) onlyLoad : load the core on the fly on the first incoming request (update, select), if it exists on disk Then, to improve performance and avoid synchronization in the solr.xml persistence, we disabled it. The drawback is that we can no longer see the list of all available cores with the admin core status command, only those warmed up. Finally, we achieve very good performance with Solr LotsOfCores : - Index 5 emails (avg) + commit + search : x4.9 faster response time (mean), x5.4 faster (95th percentile) - Delete 5 documents (avg) : x8.4 faster response time (mean), x7.4 faster (95th percentile) - Search : x3.7 faster response time (mean), x4 faster (95th percentile) In fact, the better performance is mainly due to the small size of each index, but also to the isolation between cores (updates and queries on many mailboxes don't have side effects on each other).
One important thing with the LotsOfCores feature is to take care of : - the number of file descriptors, it uses a lot (you need to increase the global max and the per-process fd limit; see the sketch after this message) - the value of transientCacheSize, depending on the RAM size and the allocated PermGen size - ClassLoader leaks that increase minor GC times when the CMS GC is enabled (use -XX:+CMSClassUnloadingEnabled) - the overhead of parsing solrconfig.xml and loading dependencies to open each core - LotsOfCores doesn't work with SolrCloud, so we store index locations outside of Solr. We have Solr proxies to route requests to the right instance. Outside of production, we tried the core discovery feature in Solr 4.4 with a lot of cores. When you start, it spends a lot of time discovering cores because of the large number of cores, and meanwhile all requests fail (SolrDispatchFilter.init() not done yet). It would be great to have, for example, an option to run core discovery in the background, or just to be able to disable it, like we do in our use case. If someone is interested in these new options for the LotsOfCores feature, just tell me.
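For illustration of the file-descriptor point above, a few shell checks; the pgrep pattern assumes a Jetty start.jar launch, and the 65536 figure is only an example, not a tested recommendation.

ulimit -n                                    # per-process soft limit in the current shell
cat /proc/$(pgrep -f start.jar)/limits | grep "open files"
ls /proc/$(pgrep -f start.jar)/fd | wc -l    # descriptors Solr currently holds

# raising the limit is typically done in /etc/security/limits.conf, e.g.:
#   solr  soft  nofile  65536
#   solr  hard  nofile  65536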
Re: Re: feedback on Solr 4.x LotsOfCores feature
The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, including the new Cores options : - "numBuckets" to create a subdirectory based on a hash on the corename % numBuckets in the core dataDir - "Auto" with 3 different values : 1) false : default behaviour 2) createLoad : create, if it does not exist, and load the core on the fly on the first incoming request (update, select) 3) onlyLoad : load the core on the fly on the first incoming request (update, select), if it exists on disk Concerning : - sharing the underlying solrconfig object, the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode. We need to test it for our use case. If another solution exists, please tell me. We are very interested in such functionality and in contributing, if we can. - the possibility of LotsOfCores in SolrCloud, we don't know in detail how SolrCloud works. But one possible limit is the maximum number of entries that can be added to a ZooKeeper node. Maybe a solution would just be a kind of hashing in the ZooKeeper tree. - the time to discover cores in Solr 4.4 : with spinning disk under Linux, all cores with transient="true" and loadOnStartup="false", the Linux buffer cache empty before starting Solr : 15K cores is around 4 minutes. It's linear in the number of cores, so for 50K it's more than 13 minutes. In fact, it corresponds to the time to read all core.properties files. Doing that in the background and blocking on requests until core discovery is complete would not work for us (due to the worst case). So we will just disable core discovery, because we don't need to know all cores from the start. We start Solr without any core entries in solr.xml, and we will use the cores Auto option to create+load or only load the core on the fly, based on the existence of the core on the disk (absolute path calculated from the core name). Thanks for your interest, Olivier From: Erick Erickson [erickerick...@gmail.com] Sent: Monday, 7 October 2013 14:33 To: solr-user@lucene.apache.org Subject: Re: feedback on Solr 4.x LotsOfCores feature Thanks for the great writeup! It's always interesting to see how a feature plays out "in the real world". A couple of questions though: bq: We added 2 Cores options : Do you mean you patched Solr? If so are you willing to share the code back? If both are "yes", please open a JIRA, attach the patch and assign it to me. bq: the number of file descriptors, it used a lot (need to increase global max and per process fd) Right, this makes sense since you have a bunch of cores all with their own descriptors open. I'm assuming that you hit a rather high max number and it stays pretty steady. bq: the overhead to parse solrconfig.xml and load dependencies to open each core Right, I tried to look at sharing the underlying solrconfig object but it seemed pretty hairy. There are some extensive comments in the JIRA of the problems I foresaw. There may be some action on this in the future. bq: lotsOfCores doesn’t work with SolrCloud Right, we haven't concentrated on that, it's an interesting problem. In particular it's not clear what happens when nodes go up/down, replicate, resynch, all that. bq: When you start, it spend a lot of times to discover cores due to a big How long? I tried 15K cores on my laptop and I think I was getting 15 second delays or roughly 1K cores discovered/second. Is your delay on the order of 50 seconds with 50K cores? I'm not sure how you could do that in the background, but I haven't thought about it much.
I tried multi-threading core discovery and that didn't help (SSD disk); I assumed that the problem was mostly I/O contention (but didn't prove it). What if a request came in for a core before you'd found it? I'm not sure what the right behavior would be, except perhaps to block on that request until core discovery was complete. Hm. How would that work for your case? That seems do-able. BTW, so far you get the prize for the most cores on a node, I think. Thanks again for the great feedback! Erick On Mon, Oct 7, 2013 at 3:53 AM, Soyez Olivier wrote: > Hello, > > In my company, we use Solr in production to offer full text search on > mailboxes. > We host dozens of millions of mailboxes, but only webmail users have this > feature (a few million). > We have the following use case : > - non-static indexes with more updates (indexing and deleting) than > select requests (ratio 7:1) > - homogeneous configuration for all indexes > - not many concurrent users > > We started to index mailboxes with Solr 1.4 in 2010, on a subset of > 400,000 users. > - we had a cluster of 50 servers, 4 Solr per server, 2000 users per Solr > instance > - we grow to 6000
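For readers trying to reproduce the lazy-loading setup discussed in this thread with stock Solr 4.x core discovery (the numBuckets/Auto options only exist in the SOLR-5316 patch and are not shown), a minimal sketch of the per-core configuration could look like the following; the core name, path and cache size are made-up examples, and whether transientCacheSize lives in old- or new-style solr.xml depends on your version.

  # core.properties, one file per core directory, found by core discovery
  name=customer_00042
  transient=true
  loadOnStartup=false

  <!-- new-style solr.xml: cap how many transient cores stay loaded at once -->
  <solr>
    <int name="transientCacheSize">1000</int>
  </solr>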
Re: solr distributed search don't work
explicit enum 1 10 192.168.1.6/solr/,192.168.1.7/solr/ 2011/8/19 Li Li > could you please show me your configuration in solrconfig.xml? > > On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou > wrote: > > Hi, > > I do not use spell but I use distributed search, using qt=spell is > correct, > > should not use qt=\spell. > > For "shards", I specify it in solrconfig directly, not in url, but should > > work the same. > > Maybe an issue in your spell request handler. > > > > > > 2011/8/19 Li Li > > > >> hi all, > >> I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent > >> but there is something wrong. > >> the url given my the wiki is > >> > >> > http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr > >> but it does not work. I trace the codes and find that > >> qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell > >> After modification of url, It return all documents but nothing > >> about spell check. > >> I debug it and find the > >> AbstractLuceneSpellChecker.getSuggestions() is called. > >> > > >
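The solrconfig.xml snippet quoted above lost its XML markup in the archive. A hedged reconstruction of the kind of handler it describes, a SearchHandler whose defaults carry the shards list, might look like this; the IP addresses come from the message, while the spellcheck wiring is an assumption based on the thread.

  <requestHandler name="/spell" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="spellcheck">true</str>
      <str name="shards.qt">/spell</str>
      <str name="shards">192.168.1.6/solr/,192.168.1.7/solr/</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>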
Solr 3.5 MoreLikeThis on Date fields
Hi Everyone, Please help out if you know what is going on. We are upgrading to Solr 3.5 (from 1.4.1) and busy with a Re-Index and Test on our data. Everything seems OK, but Date Fields seem to be "broken" when using with the MoreLikeThis handler (I also saw the same error on Date Fields using the HighLighter in another forum post "Invalid Date String for highlighting any date field match @ Mon 2011/08/15 13:10 "). * I deleted the index/core and only loaded a few records and still get the error when using the MoreLikeThis using the "docdate" as part of the mlt.fl params. * I double checked all the data that was loaded and the dates parse 100% and can see no problems with any of the data loaded. Type: Definition: A sample result: 1999-06-28T00:00:00Z THE MLT QUERY: Jan 16, 2012 4:09:16 PM org.apache.solr.core.SolrCore execute INFO: [legal_spring] webapp=/solr path=/select params={mlt.fl=doctitle,pld_pubtype,docdate,pld_cluster,pld_port,pld_summary,alltext,subclass&mlt.mintf=1&mlt=true&version=2.2&fl=doc_id,doctitle,docdate,prodtype&qt=mlt&mlt.boost=true&mlt.qf=doctitle^5.0+alltext^0.2&json.nl=map&wt=json&rows=50&mlt.mindf=1&mlt.count=50&start=0&q=doc_id:PLD23996} status=400 QTime=1 THE ERROR: Jan 16, 2012 4:09:16 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'94046400' at org.apache.solr.schema.DateField.parseMath(DateField.java:165) at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:106) at org.apache.solr.analysis.TrieTokenizer.(TrieTokenizerFactory.java:76) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:51) at org.apache.solr.analysis.TrieTokenizerFactory.create(TrieTokenizerFactory.java:41) at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:68) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:75) at org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385) at org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:876) at org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:820) at org.apache.lucene.search.similar.MoreLikeThis.like(MoreLikeThis.java:629) at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:311) at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:149) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:619) Sincerely, Jaco Olivier
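The field and type definitions pasted into the message above were stripped by the archive. Assuming the stock example schema of that era, the docdate field would typically be declared along these lines (only the field name comes from the query; the rest is an assumption):

  <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
  <field name="docdate" type="tdate" indexed="true" stored="true"/>

The stack trace suggests MoreLikeThis is feeding the raw indexed trie term ('94046400') back through the date analyzer, which is what raises the Invalid Date String error; leaving docdate out of mlt.fl is one possible workaround until the underlying issue is resolved.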
Faceted search outofmemory
Hi, I am trying to run a faceted search on a very large index (around 200 GB with 200M docs) and I get an out-of-memory error. With no facets it works fine. There are quite a few questions around this, but I could not find the answer. How can I estimate the memory required when facets are used, so that I can scale my server/index correctly to handle it? Thanks Olivier
Re: Faceted search outofmemory
How do make paging over facets? 2010/6/29 Ankit Bhatnagar > > Did you trying paging them? > > > -Original Message- > From: olivier sallou [mailto:olivier.sal...@gmail.com] > Sent: Tuesday, June 29, 2010 2:04 PM > To: solr-user@lucene.apache.org > Subject: Faceted search outofmemory > > Hi, > I try to make a faceted search on a very large index (around 200GB with > 200M > doc). > I have an out of memory error. With no facet it works fine. > > There are quite many questions around this but I could not find the answer. > How can we know the required memory when facets are used so that I try to > scale my server/index correctly to handle it. > > Thanks > > Olivier >
Re: Re: Faceted search outofmemory
I already use facet.limit in my query. I tried however facet.method=enum and though it does not seem to fix everything, I have some requests without the outofmemory error. Best would be to have a calculation rule of required memory for such type of query. 2010/6/29 Markus Jelsma > http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit > > -Original message- > From: olivier sallou > Sent: Tue 29-06-2010 20:11 > To: solr-user@lucene.apache.org; > Subject: Re: Faceted search outofmemory > > How do make paging over facets? > > 2010/6/29 Ankit Bhatnagar > > > > > Did you trying paging them? > > > > > > -Original Message- > > From: olivier sallou [mailto:olivier.sal...@gmail.com] > > Sent: Tuesday, June 29, 2010 2:04 PM > > To: solr-user@lucene.apache.org > > Subject: Faceted search outofmemory > > > > Hi, > > I try to make a faceted search on a very large index (around 200GB with > > 200M > > doc). > > I have an out of memory error. With no facet it works fine. > > > > There are quite many questions around this but I could not find the > answer. > > How can we know the required memory when facets are used so that I try to > > scale my server/index correctly to handle it. > > > > Thanks > > > > Olivier > > >
Re: Faceted search outofmemory
I have given 6G to Tomcat. Using facet.method=enum and facet.limit seems to fix the issue with a few tests, but I do know that it is not a "final" solution. Will work under certain configurations. Real "issue" is to be able to know what is the required RAM for an index... 2010/6/29 Nagelberg, Kallin > How much memory have you given the solr jvm? Many servlet containers have > small amount by default. > > -Kal > > -Original Message- > From: olivier sallou [mailto:olivier.sal...@gmail.com] > Sent: Tuesday, June 29, 2010 2:04 PM > To: solr-user@lucene.apache.org > Subject: Faceted search outofmemory > > Hi, > I try to make a faceted search on a very large index (around 200GB with > 200M > doc). > I have an out of memory error. With no facet it works fine. > > There are quite many questions around this but I could not find the answer. > How can we know the required memory when facets are used so that I try to > scale my server/index correctly to handle it. > > Thanks > > Olivier >
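To make the suggestions in this thread concrete, here is a hedged example of a facet request that combines facet.method=enum with facet paging; the host and the field name "category" are placeholders, not taken from the original messages.

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=enum&facet.limit=100&facet.offset=0

facet.limit and facet.offset page through the facet values, and facet.method=enum trades the large per-field cache array for one filter per term, which can reduce heap usage for fields with a moderate number of distinct values.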
Re: Tag generation
On 15.07.2010 at 17:34, kenf_nc wrote: > A colleague mentioned that he knew of services where you pass some content > and it spits out some suggested Tags or Keywords that would be best suited > to associate with that content. > > Does anyone know if there is a contrib to Solr or Lucene that does something > like this? Or a third party tool that can be given a solr index or solr > query and it comes up with some good Tag suggestions? Hi there, there is something from http://www.zemanta.com/ and something from Basis Tech at http://www.basistech.com/, though I am not sure if this would help. You could also have a look at http://uima.apache.org/ Greetings, olivier -- Olivier Dobberkau
Spatial filtering
Hi folks, I can't manage to have the new spatial filtering feature (added in r962727 by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to get all the documents located within a circle defined by its center and radius. I've modified my query url as specified in http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the "pt", "d" and "meas" parameters. Here is what my query parameters looks like (from Solr's response with debug mode activated): [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] ) With this query, I get 3859 results. And some (lots) of the found documents are not located whithin the circle! :( If I run the same query without spatial filtering (if I remove the "pt", "d" and "meas" parameters from the url), I get 3859 results too. So it looks like my spatial filtering constraint is not taken into account in the first search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc up to date? In the comments of SOLR-1568, I've seen someone talking about adding "{!sfilt fl=latlon_field_name}". So I tried the following request: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) This leads to 2713 results (which is smaller than 3859, good). But some (lots) of the results are once more out of the circle :( Can someone help me get spatial filtering working? I really don't understand the search results I'm getting. Cheers, Olivier -- - *Olivier RICORDEAU* - oliv...@ricordeau.org http://olivier.ricordeau.org
How to get the list of all available fields in a (sharded) index
Hi, I cannot find any info on how to get the list of current fields in an index (possibly sharded). With dynamic fields, I cannot simply parse the schema to know which fields are available. Is there any way to get it via a request (or something easily programmable)? I know the information is available in one of the Lucene-generated files, but I'd like to get it via a query for my whole index. Thanks Olivier
Re: dismax request handler without q
Hi, this is not very clear: if you only need to query keyphrase, why don't you query it directly, e.g. q=keyphrase:hotel ? Furthermore, why dismax if only the keyphrase field is of interest? Dismax is used to query multiple fields automatically. Also, dismax does not appear in your query (via the query type); is it set in your config for your default request handler? 2010/7/20 Chamnap Chhorn > I wonder how could i make a query to return only *all books* that has > keyphrase "web development" using dismax handler? A book has multiple > keyphrases (keyphrase is multivalued column). Do I have to pass q > parameter? > > > Is it the correct one? > http://locahost:8081/solr/select?&q=hotel&fq=keyphrase:%20hotel > > -- > Chhorn Chamnap > http://chamnapchhorn.blogspot.com/ >
Re: Spatial filtering
Le 20/07/2010 04:18, Lance Norskog a écrit : Add the debugQuery=true parameter and it will show you the Lucene query tree, and how each document is evaluated. This can help with the more complex queries. Do you see something wrong? [debug] => Array ( [rawquerystring] => *:* [querystring] => *:* [parsedquery] => MatchAllDocsQuery(*:*) [parsedquery_toString] => *:* [explain] => Array ( [doc_45269] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50206] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50396] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_51199] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [] ) [QParser] => LuceneQParser [filter_queries] => Array ( [0] => +object_type:Concert +date:[2010-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) [parsed_filter_queries] => Array ( [0] => +object_type:Concert +date:[127958400 TO 1311206399000] +name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin} ) [...] I'm not sure about the "parsed_filter_queries" entry. It looks like the "+{!sfilt fl=coords_lat_lon,units=km,meas=hsin}" is not well interpreted (seems like it's interpreted as a range). Does anyone know what the right syntax? This is not documented... Cheers, Olivier On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau wrote: Hi folks, I can't manage to have the new spatial filtering feature (added in r962727 by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to get all the documents located within a circle defined by its center and radius. I've modified my query url as specified in http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the "pt", "d" and "meas" parameters. Here is what my query parameters looks like (from Solr's response with debug mode activated): [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] ) With this query, I get 3859 results. And some (lots) of the found documents are not located whithin the circle! :( If I run the same query without spatial filtering (if I remove the "pt", "d" and "meas" parameters from the url), I get 3859 results too. So it looks like my spatial filtering constraint is not taken into account in the first search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc up to date? In the comments of SOLR-1568, I've seen someone talking about adding "{!sfilt fl=latlon_field_name}". So I tried the following request: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) This leads to 2713 results (which is smaller than 3859, good). But some (lots) of the results are once more out of the circle :( Can someone help me get spatial filtering working? I really don't understand the search results I'm getting. 
Cheers, Olivier -- - *Olivier RICORDEAU* - oliv...@ricordeau.org http://olivier.ricordeau.org -- - *Olivier RICORDEAU* - oliv...@ricordeau.org http://olivier.ricordeau.org
Re: Spatial filtering
Ok, I have found a big bug in my indexing script. Things are getting better. I managed to have my parsed_filter_query to: +coords_lat_lon_0_latLon:[48.694179707855874 TO 49.01213545059667] +coords_lat_lon_1_latLon:[2.1079512793239767 TO 2.5911832073858765] For the record, here are the parameters which made it work: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 25 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => * score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => {!sfilt fl=coords_lat_lon} +object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] ) But I am facing one problem: the " +object_type:Concert + date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]" part of my fq parameter is not taken into account (see the parsed_filter_query above). So here is my question: How can I mix the "{!sfilt fl=coords_lat_lon}" part of the fq parameter with "usual" fq parameters (eg: "+object_type:Concert")? Can anyone help? Regards, Olivier Le 20/07/2010 09:53, Olivier Ricordeau a écrit : Le 20/07/2010 04:18, Lance Norskog a écrit : Add the debugQuery=true parameter and it will show you the Lucene query tree, and how each document is evaluated. This can help with the more complex queries. Do you see something wrong? [debug] => Array ( [rawquerystring] => *:* [querystring] => *:* [parsedquery] => MatchAllDocsQuery(*:*) [parsedquery_toString] => *:* [explain] => Array ( [doc_45269] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50206] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_50396] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [doc_51199] => 1.0 = (MATCH) MatchAllDocsQuery, product of: 1.0 = queryNorm [] ) [QParser] => LuceneQParser [filter_queries] => Array ( [0] => +object_type:Concert +date:[2010-07-20T00:00:00Z TO 2011-07-20T23:59:59Z] +{!sfilt fl=coords_lat_lon,units=km,meas=hsin} ) [parsed_filter_queries] => Array ( [0] => +object_type:Concert +date:[127958400 TO 1311206399000] +name:{!sfilt TO fl=coords_lat_lon,units=km,meas=hsin} ) [...] I'm not sure about the "parsed_filter_queries" entry. It looks like the "+{!sfilt fl=coords_lat_lon,units=km,meas=hsin}" is not well interpreted (seems like it's interpreted as a range). Does anyone know what the right syntax? This is not documented... Cheers, Olivier On Mon, Jul 19, 2010 at 3:35 AM, Olivier Ricordeau wrote: Hi folks, I can't manage to have the new spatial filtering feature (added in r962727 by Grant Ingersoll, see https://issues.apache.org/jira/browse/SOLR-1568) working. I'm trying to get all the documents located within a circle defined by its center and radius. I've modified my query url as specified in http://wiki.apache.org/solr/SpatialSearch#Spatial_Filter_QParser to add the "pt", "d" and "meas" parameters. Here is what my query parameters looks like (from Solr's response with debug mode activated): [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] => hsin [pt] => 48.85341,2.3488 [bf] => [qt] => standard [fq] => +object_type:Concert +date:[2010-07-19T00:00:00Z TO 2011-07-19T23:59:59Z] ) With this query, I get 3859 results. And some (lots) of the found documents are not located whithin the circle! 
:( If I run the same query without spatial filtering (if I remove the "pt", "d" and "meas" parameters from the url), I get 3859 results too. So it looks like my spatial filtering constraint is not taken into account in the first search query (the one where "pt", "d" and "meas" are set). Is the wiki's doc up to date? In the comments of SOLR-1568, I've seen someone talking about adding "{!sfilt fl=latlon_field_name}". So I tried the following request: [params] => Array ( [explainOther] => true [mm] => 2<-75% [d] => 50 [sort] => date asc [qf] => [wt] => php [rows] => 5000 [version] => 2.2 [fl] => object_type object_id score [debugQuery] => true [start] => 0 [q] => *:* [meas] =>
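One approach that may answer the question above about mixing {!sfilt} with conventional filters (not verified against that pre-release spatial code) is to send them as separate fq parameters, so that the local-params prefix only applies to its own filter; the values below are taken from the messages, and the spaces and '+' signs would need URL-encoding in a real request.

  ...&pt=48.85341,2.3488&d=25
     &fq={!sfilt fl=coords_lat_lon}
     &fq=+object_type:Concert +date:[2008-07-20T00:00:00Z TO 2011-07-20T23:59:59Z]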
Re: dismax request handler without q
q will search in the defaultSearchField if no field name is given, but you can specify in your "q" param the fields you want to search. Dismax is a handler where you can configure a number of fields to look in for the input query: in that case you do not specify the fields in the query, and dismax looks in the fields specified in its configuration. However, by default dismax is not used; it needs to be enabled with the query type parameter (qt=dismax). With the default Solr config you can call ...solr/select?q=keyphrase:hotel if keyphrase is a declared field in your schema 2010/7/20 Chamnap Chhorn > I can't put q=keyphrase:hotel in my request using dismax handler. It > returns > no result. > > On Tue, Jul 20, 2010 at 1:19 PM, Chamnap Chhorn >wrote: > > > There are some default configuration on my solrconfig.xml that I didn't > > show you. I'm a little confused when reading > > http://wiki.apache.org/solr/DisMaxRequestHandler#q. I think q is for > plain > > user input query. > > > > > > On Tue, Jul 20, 2010 at 12:08 PM, olivier sallou < > olivier.sal...@gmail.com > > > wrote: > > > >> Hi, > >> this is not very clear, if you need to query only keyphrase, why don't > you > >> query directly it? e.g. q=keyphrase:hotel ? > >> Furthermore, why dismax if only keyphrase field is of interest? dismax > is > >> used to query multiple fields automatically. > >> > >> At least dismax do not appear in your query (using query type). It is > set > >> in > >> your config for your default request handler? > >> > >> 2010/7/20 Chamnap Chhorn > >> > >> > I wonder how could i make a query to return only *all books* that has > >> > keyphrase "web development" using dismax handler? A book has multiple > >> > keyphrases (keyphrase is multivalued column). Do I have to pass q > >> > parameter? > >> > > >> > > >> > Is it the correct one? > >> > http://locahost:8081/solr/select?&q=hotel&fq=keyphrase:%20hotel > >> > > >> > -- > >> > Chhorn Chamnap > >> > http://chamnapchhorn.blogspot.com/ > >> > > >> > > > > > > > > -- > > Chhorn Chamnap > > http://chamnapchhorn.blogspot.com/ > > > > > > -- > Chhorn Chamnap > http://chamnapchhorn.blogspot.com/ >
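As a hedged illustration of the two options described above (the port and field name are reused from the thread, and the spaces and quotes would need URL-encoding in practice):

  Standard handler, explicit field:  http://localhost:8081/solr/select?q=keyphrase:"web development"
  Dismax handler, fields set in qf:  http://localhost:8081/solr/select?qt=dismax&q=web development&qf=keyphrase

Note that dismax treats the whole q value as plain user input, so a fielded expression like keyphrase:hotel is not parsed as a field query there, which is consistent with the empty result reported earlier in the thread.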
Solr and Lucene in South Africa
Hi to all Solr/Lucene users... Our team had a discussion today regarding the Solr/Lucene community closer to home. I am hereby putting out an SOS to all Solr/Lucene users in the South African market and wish to organize a meet-up (or user support group) if at all possible. It would be great to share some triumphs and pitfalls that were experienced. * Sorry for hogging the user mailing list with a non-technical question, but I think this is the easiest way to get it done :) Jaco Olivier Web Specialist
Replication and CPU
Hello, I set up a server for Solr replication. I used 2 cores and configured replication for each one, following the tutorial at http://wiki.apache.org/solr/SolrReplication. Replication works for each core. However, CPU usage is at 100% on the slave. The master and slave are 2 servers with the same hardware configuration, and I don't understand what could cause the problem. The slave is launched with: java -Dsolr.solr.home=/solr/multicore -Denable.master=false -Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar start.jar If I comment out the replication config, the server is fine. Does anyone have an idea? Regards, Olivier
Re: Replication and CPU
Hello Peter, On the slave server http://slave/solr/core0/admin/replication/index.jsp Poll Interval00:30:00 Local Index Index Version: 1284026488242, Generation: 13102 Location: /solr/multicore/core0/data/index Size: 26.9 GB Times Replicated Since Startup: 289 Previous Replication Done At: Tue Oct 12 12:00:00 GMT+02:00 2010 Config Files Replicated At: 1286790818824 Config Files Replicated: [solrconfig_slave.xml] Times Config Files Replicated Since Startup: 1 Next Replication Cycle At: Tue Oct 12 12:30:00 GMT+02:00 2010 The request Handler on the slave : name="masterUrl">http://master/solr/${solr.core.name}/replication 00:30:00 I increased the poll interval because I thought that there were too many changes. Currently there are no changes on the master and the slave is always to 100% of cpu. On the master, I have startup commit name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt 00:00:10 Regards, Olivier Le 12/10/2010 12:11, Peter Karich a écrit : Hi Olivier, maybe the slave replicates after startup? check replication status here: http://localhost/solr/admin/replication/index.jsp what is your poll frequency (could you paste the replication part)? Regards, Peter. Hello, I setup a server for the replication of Solr. I used 2 cores and for each one I specified the replication. I followed the tutorial on http://wiki.apache.org/solr/SolrReplication. The replication is OK for each cores. However the CPU is used to 100% on the slave. The master and slave are 2 servers with the same hardware configuration. I don't understand which can cause the problem. The slave is launched by : java -Dsolr.solr.home=/solr/multicore -Denable.master=false -Denable.slave=true -Xms512m -Xmx1536m -XX:+UseConcMarkSweepGC -jar start.jar If I comment the replication the server is OK. Anyone have an idea ? Regards, Olivier
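The replication configuration quoted above lost its XML tags in the archive. Reconstructing it from the values that survived (masterUrl, pollInterval, replicateAfter, confFiles), it would look roughly like this; exact details should be checked against the SolrReplication wiki page.

  <!-- master solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml,protwords.txt,spellings.txt,synonyms.txt</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master/solr/${solr.core.name}/replication</str>
      <str name="pollInterval">00:30:00</str>
    </lst>
  </requestHandler>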
Re: Can solr index folder can be moved from one system to another?
The index is not tied to its directory; there is no path information in the index. You can create an index and then move it anywhere (or merge it with another one). I often do this and there is no issue. Olivier 2012/3/22 ravicv > Hi Tomás, > > I can not use Solr replcation in my scenario. My requirement is to gzip the > solr index folder and send to dotnet system through webservice. > Then in dotnet the same index folder should be unzipped and same folder > should be used as an index folder through solrnet . > > Whether my requirement is possible? > > Thanks > Ravi > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Can-solr-index-folder-can-be-moved-from-one-system-to-another-tp3844919p3847725.html > Sent from the Solr - User mailing list archive at Nabble.com. > -- gpg key id: 4096R/326D8438 (keyring.debian.org) Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438
Solr Cell and operations on metadata extracted
Hi, I have a question about Solr Cell, please. I index some files. For example, if I want to extract the filename, apply a hash function such as MD5 to it and then store the result in Solr: is the correct way to use Tika "manually" to extract the metadata I want, do the transformations on it, and then send it to Solr? I can't use Solr Cell directly in this case because I can't modify the extracted metadata, right? Thanks, Olivier
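A minimal sketch of the "Tika manually" route described in the question, assuming Tika, commons-codec and SolrJ are on the classpath; the field names, the target URL and the HttpSolrServer class (which varies with the SolrJ version, older releases use CommonsHttpSolrServer) are illustrative, not taken from the original message.

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.InputStream;
  import org.apache.commons.codec.digest.DigestUtils;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.parser.ParseContext;
  import org.apache.tika.sax.BodyContentHandler;

  public class ManualTikaIndexer {
    public static void main(String[] args) throws Exception {
      File file = new File(args[0]);
      Metadata metadata = new Metadata();
      BodyContentHandler text = new BodyContentHandler(-1); // no write limit
      InputStream in = new FileInputStream(file);
      try {
        new AutoDetectParser().parse(in, text, metadata, new ParseContext());
      } finally {
        in.close();
      }
      // Transform the extracted metadata before indexing, e.g. hash the filename.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", DigestUtils.md5Hex(file.getName()));
      doc.addField("filename", file.getName());
      doc.addField("content", text.toString());
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
      solr.add(doc);
      solr.commit();
    }
  }

If the transformed value can be computed on the client before the request, Solr Cell can also receive caller-supplied values through literal.<field> parameters, which avoids writing a custom indexer at all.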
Re: how to request for Json object
Ajax does not allow requests to another domain. The only way, unless you use server-side requests, is to go through a proxy that hides the host origin, so that the Ajax request thinks both servers are the same. 2011/6/2 Romi > How to parse Json through ajax when your ajax pager is on one > server(Tomcat)and Json object is of onther server(solr server). i mean i > have to make a request to another server, how can i do it . > > - > Thanks & Regards > Romi > -- > View this message in context: > http://lucene.472066.n3.nabble.com/how-to-request-for-Json-object-tp3014138p3014138.html > Sent from the Solr - User mailing list archive at Nabble.com. >
SOlr upgrade: Invalid version (expected 2, but 1) error when using shards
Hi, I just migrated to Solr 3.3 from 1.4.1. My index is still in the 1.4.1 format (it will be migrated soon). I get an error when I use sharding with the new version: org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format However, if I request each shard independently (/request), the answer is correct, so the error is triggered only by the shard mechanism. While I do plan to upgrade my indexes, I'd like to understand the issue, i.e. is it an "upgrade" issue, or do shards simply not support using an "old" format? Thanks Olivier
lucene 3 and merge/optimize
Hi, after an upgrade to solr/lucene 3, I tried to change the code to remove deprecated functions Though new MergePolicy etc... are not really clear. I have now issues with the merge and optimize functions. I have a command line application (Java/Lucene api) that merge multiple indexes in a single one, or optimize an existing index (this is done offline) When I execute my code, the merge creates a new index, but looks to contain more files than before (with solr 4.1), why not... When I try to optimize, code says OK, but I still have many files, segments : (below for a very small example) _0.fdt _0.tis _1.tii _2.prx _3.nrm _4.frq _5.fnm _6.fdx _7.fdt _7.tis _8.tii _9.prx _a.nrm _b.frq _0.fdx _1.fdt _1.tis _2.tii _3.prx _4.nrm _5.frq _6.fnm _7.fdx _8.fdt _8.tis _9.tii _a.prx _b.nrm _0.fnm _1.fdx _2.fdt _2.tis _3.tii _4.prx _5.nrm _6.frq _7.fnm _8.fdx _9.fdt _9.tis _a.tii _b.prx _0.frq _1.fnm _2.fdx _3.fdt _3.tis _4.tii _5.prx _6.nrm _7.frq _8.fnm _9.fdx _a.fdt _a.tis _b.tii _0.nrm _1.frq _2.fnm _3.fdx _4.fdt _4.tis _5.tii _6.prx _7.nrm _8.frq _9.fnm _a.fdx _b.fdt _b.tis _0.prx _1.nrm _2.frq _3.fnm _4.fdx _5.fdt _5.tis _6.tii _7.prx _8.nrm _9.frq _a.fnm _b.fdx segments_1 _0.tii _1.prx _2.nrm _3.frq _4.fnm _5.fdx _6.fdt _6.tis _7.tii _8.prx _9.nrm _a.frq _b.fnm segments.gen I'd like to reduce with the optimize or the merge to the minimum the number of files, my index is read only and does not change. Here is the code for optimize, am I doing something wrong? IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,newStandardAnalyzer(Version. LUCENE_33)); conf.setRAMBufferSizeMB(50); LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy(); policy.setMaxMergeDocs(10); conf.setMergePolicy(policy); IndexWriter writer = newIndexWriter(FSDirectory.open(INDEX_DIR),getIndexConfig() ); writer.optimize(); writer.close(); Thanks Olivier
Re: lucene 3 and merge/optimize
answer to myself, to be checked... I used policy.setMaxMergeDocs(10), limiting to small number of filesat least for merge. I gonna test. 2011/8/18 olivier sallou > Hi, > after an upgrade to solr/lucene 3, I tried to change the code to remove > deprecated functions Though new MergePolicy etc... are not really > clear. > > I have now issues with the merge and optimize functions. > > I have a command line application (Java/Lucene api) that merge multiple > indexes in a single one, or optimize an existing index (this is done > offline) > > When I execute my code, the merge creates a new index, but looks to contain > more files than before (with solr 4.1), why not... > When I try to optimize, code says OK, but I still have many files, segments > : (below for a very small example) > _0.fdt _0.tis _1.tii _2.prx _3.nrm _4.frq _5.fnm _6.fdx _7.fdt > _7.tis _8.tii _9.prx _a.nrm _b.frq > _0.fdx _1.fdt _1.tis _2.tii _3.prx _4.nrm _5.frq _6.fnm _7.fdx > _8.fdt _8.tis _9.tii _a.prx _b.nrm > _0.fnm _1.fdx _2.fdt _2.tis _3.tii _4.prx _5.nrm _6.frq _7.fnm > _8.fdx _9.fdt _9.tis _a.tii _b.prx > _0.frq _1.fnm _2.fdx _3.fdt _3.tis _4.tii _5.prx _6.nrm _7.frq > _8.fnm _9.fdx _a.fdt _a.tis _b.tii > _0.nrm _1.frq _2.fnm _3.fdx _4.fdt _4.tis _5.tii _6.prx _7.nrm > _8.frq _9.fnm _a.fdx _b.fdt _b.tis > _0.prx _1.nrm _2.frq _3.fnm _4.fdx _5.fdt _5.tis _6.tii _7.prx > _8.nrm _9.frq _a.fnm _b.fdx segments_1 > _0.tii _1.prx _2.nrm _3.frq _4.fnm _5.fdx _6.fdt _6.tis _7.tii > _8.prx _9.nrm _a.frq _b.fnm segments.gen > > I'd like to reduce with the optimize or the merge to the minimum the number > of files, my index is read only and does not change. > > Here is the code for optimize, am I doing something wrong? > > IndexWriterConfig conf = new > IndexWriterConfig(Version.LUCENE_33,newStandardAnalyzer(Version. > LUCENE_33)); > > conf.setRAMBufferSizeMB(50); > > LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy(); > > policy.setMaxMergeDocs(10); > > conf.setMergePolicy(policy); > > IndexWriter writer = > newIndexWriter(FSDirectory.open(INDEX_DIR),getIndexConfig() ); > > > writer.optimize(); > > writer.close(); > > > > Thanks > > > Olivier >
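For reference, a corrected sketch of the optimize path discussed above, against the Lucene 3.x API: it drops the setMaxMergeDocs(10) cap (which limits merged segments to ten documents and is what leaves so many segment files behind) and passes the IndexWriterConfig that was actually built. The index path argument stands in for the INDEX_DIR of the original snippet.

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.LogByteSizeMergePolicy;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class OptimizeIndex {
    public static void main(String[] args) throws Exception {
      File indexDir = new File(args[0]);
      IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_33,
          new StandardAnalyzer(Version.LUCENE_33));
      conf.setRAMBufferSizeMB(50);
      conf.setMergePolicy(new LogByteSizeMergePolicy()); // no setMaxMergeDocs cap
      IndexWriter writer = new IndexWriter(FSDirectory.open(indexDir), conf);
      writer.optimize(1); // merge the read-only index down to a single segment
      writer.close();
    }
  }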
Re: solr distributed search don't work
Hi, I do not use spellcheck, but I do use distributed search; using qt=spell is correct, you should not have to use qt=\spell. For "shards", I specify it directly in solrconfig, not in the URL, but it should work the same. Maybe there is an issue in your spell request handler. 2011/8/19 Li Li > hi all, > I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent > but there is something wrong. > the url given my the wiki is > > http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr > but it does not work. I trace the codes and find that > qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell > After modification of url, It return all documents but nothing > about spell check. > I debug it and find the > AbstractLuceneSpellChecker.getSuggestions() is called. >
Re: Solr CMS Integration
Am 07.08.2009 um 19:01 schrieb wojtekpia: I've been asked to suggest a framework for managing a website's content and making all that content searchable. I'm comfortable using Solr for search, but I don't know where to start with the content management system. Is anyone using a CMS (open source or commercial) that you've integrated with Solr for search and are happy with? This will be a consumer facing website with a combination or articles, blogs, white papers, etc. Hi Wojtek, Have a look at TYPO3. http://typo3.org/ It is quite powerful. Ingo and I are currently implementing a SOLR extension for it. We currently use it at http://www.be-lufthansa.com/ Contact me if you want an insight. Many greetings, Olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstrasse 73 D 60329 Frankfurt/Main Fon: +49 (0)69 - 247 52 18 - 0 Fax: +49 (0)69 - 247 52 18 - 99 Mail: olivier.dobber...@dkd.de Web: http://www.dkd.de Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast Aktuelle Projekte: http://bewegung.taz.de - Launch (Ruby on Rails) http://www.hans-im-glueck.de - Relaunch (TYPO3) http://www.proasyl.de - Relaunch (TYPO3)
Re: Showcase: Facetted Search for Wine using Solr
Marian Steinbach wrote: On Sat, Sep 26, 2009 at 3:22 AM, Lance Norskog wrote: Have you seen this? It is another Solr/TYPO3 integration project. http://forge.typo3.org/projects/show/extension-solr Would you consider open-sourcing your Solr/Typo3 integration? Hi Lance! I wasn't aware of that extension. Having looked at the website, it does something very different from what we did. The solr extension mentioned above tries to provide a better website search for the Typo3 CMS on top of Solr. Our integration doesn't index web pages but product data from an XML file. I'd say the implementation is pretty much customer-specific, so I don't see a real benefit in making it open source. Regards, Marian Hi Marian. Our extension will be able to do that as well once we have set up the indexing queue for the TYPO3 backend. We have a concept called TYPO3 extension connectors, so that you will be able to add index documents to your index. Feel free to contact Ingo about the contribution possibilities in our Solr project. If you use open source software you should definitely contribute; this gives you great karma. Or, as we at TYPO3 say: inspire people to share! olivier
Re: i want to use something like *query* similar to database - %query% like search
On 02.12.2009 at 09:55, amittripathi wrote: > it's accepting the trailing wildcard character but solr is not accepting the > leading wildcard character The error message says it all: '*' or '?' not allowed as first character in WildcardQuery. Solr is not SQL. Olivier -- Olivier Dobberkau
RE: why no results?
Hi Regan, I am using STRING fields only for values that in most cases will be used to FACET on. I suggest using TEXT fields as per the default examples. ALSO, remember that if you do not specify solr.LowerCaseFilterFactory, your search has just become case sensitive. I struggled with that one before, so make sure what you are indexing is what you are searching for. * Stick to the default examples provided with the SOLR distro and you should be fine. Jaco Olivier -Original Message- From: regany [mailto:re...@newzealand.co.nz] Sent: 08 December 2009 06:15 To: solr-user@lucene.apache.org Subject: Re: why no results? Tom Hill-7 wrote: > > Try solr.TextField instead. > Thanks Tom, I've replaced the section above with... deleted my index, restarted Solr and re-indexed my documents - but the search still returns nothing. Do I need to change the type in the sections as well? regan -- View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688469.html Sent from the Solr - User mailing list archive at Nabble.com.
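A hedged sketch of the kind of analyzed field type being recommended here, modelled on the stock example schema; the field names come from the original question, while the exact analyzer chain is an assumption.

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="body"  type="text" indexed="true" stored="true"/>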
RE: why no results?
Hi, Try changing your TEXT field to type "text" (without the of course :)) That is your problem... also use the "text" type as per the default examples in the SOLR distro :) Jaco Olivier -Original Message- From: regany [mailto:re...@newzealand.co.nz] Sent: 08 December 2009 05:44 To: solr-user@lucene.apache.org Subject: why no results? hi all - newbie solr question - I've indexed some documents and can search / receive results using the following schema - BUT ONLY when searching on the "id" field. If I try searching on the title, subtitle, body or text field I receive NO results. Very confused. :confused: Can anyone see anything obvious I'm doing wrong? Regan. id text -- View this message in context: http://old.nabble.com/why-no-results--tp26688249p26688249.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: do copyField's need to exist as Fields?
Hi Regan, Something I noticed on your setup... The ID field in your setup I assume to be your unique ID for the book or journal (the ISSN or something). Try making this a string, as TEXT is not the ideal field type to use for unique IDs. Congrats on figuring out SOLR fields - I suggest getting the SOLR 1.4 book; it really saved me a thousand questions on this mailing list :) Jaco Olivier -Original Message- From: regany [mailto:re...@newzealand.co.nz] Sent: 09 December 2009 00:48 To: solr-user@lucene.apache.org Subject: Re: do copyField's need to exist as Fields? regany wrote: > > Is there a different way I should be setting it up to achieve the above?? > Think I figured it out. I set up the so they are present, but get ignored except for the "text" field which gets indexed... and then copyField the first 4 fields to the "text" field: Seems to be working!? :drunk: -- View this message in context: http://old.nabble.com/do-copyField%27s-need-to-exist-as-Fields--tp26701706p26702224.html Sent from the Solr - User mailing list archive at Nabble.com.
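A plausible reconstruction of the schema fragment Regan describes (fields kept but not indexed, with their content funnelled into one indexed catch-all field); only the field names and the copyField idea come from the messages, the attribute values are assumptions.

  <field name="id"       type="string" indexed="true"  stored="true"/>
  <field name="title"    type="text"   indexed="false" stored="true"/>
  <field name="subtitle" type="text"   indexed="false" stored="true"/>
  <field name="body"     type="text"   indexed="false" stored="true"/>
  <field name="text"     type="text"   indexed="true"  stored="false" multiValued="true"/>
  <copyField source="title"    dest="text"/>
  <copyField source="subtitle" dest="text"/>
  <copyField source="body"     dest="text"/>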
Re: Severe errors in solr configuration
On 04.02.2009 at 13:33, Anto Binish Kaspar wrote: Hi, I am trying to configure solr on an ubuntu server and I am getting the following exception. I am able to get it working on a windows box. Hi Anto. Have you installed the solr 1.2 package from ubuntu? Or the 1.3 release as a war file? Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Re: Severe errors in solr configuration
Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar: Hi Olivier Thanks for your quick reply. I am using the release 1.3 as war file. - Anto Binish Kaspar OK. As far a i understood you need to make sure that your solr home is set. this needs to be done in Quting: http://wiki.apache.org/solr/SolrTomcat In addition to using the default behavior of relying on the Solr Home being in the current working directory (./solr) you can alternately add the solr.solr.home system property to your JVM settings before starting Tomcat... export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/dir/" ...or use a Context file to configure the Solr Home using JNDI A Tomcat context fragments can be used to configure the JNDI property needed to specify your Solr Home directory. Just put a context fragment file under $CATALINA_HOME/conf/Catalina/ localhost that looks something like this... $ cat /tomcat55/conf/Catalina/localhost/solr.xml Greetings, Olivier PS: May be it would be great if we could provide an ubuntu dpkg with 1.3 ? Any takers? -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
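The context fragment quoted from the wiki lost its XML in the archive; its usual shape is shown below. The docBase path is a placeholder, and the solr/home value reuses the directory mentioned later in this thread.

  <?xml version="1.0" encoding="utf-8"?>
  <Context docBase="/path/to/apache-solr-1.3.0.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/usr/local/solr/solr-1.3/solr" override="true"/>
  </Context>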
Re: Severe errors in solr configuration
A slash? Olivier Von meinem iPhone gesendet Am 04.02.2009 um 14:06 schrieb Anto Binish Kaspar : I am using Context file, here is my solr.xml $ cat /var/lib/tomcat6/conf/Catalina/localhost/solr.xml I change the ownership of the folder (usr/local/solr/solr-1.3/solr) to tomcat6:tomcat6 from root:root Anything I am missing? - Anto Binish Kaspar -Original Message- From: Olivier Dobberkau [mailto:olivier.dobber...@dkd.de] Sent: Wednesday, February 04, 2009 6:30 PM To: solr-user@lucene.apache.org Subject: Re: Severe errors in solr configuration Am 04.02.2009 um 13:54 schrieb Anto Binish Kaspar: Hi Olivier Thanks for your quick reply. I am using the release 1.3 as war file. - Anto Binish Kaspar OK. As far a i understood you need to make sure that your solr home is set. this needs to be done in Quting: http://wiki.apache.org/solr/SolrTomcat In addition to using the default behavior of relying on the Solr Home being in the current working directory (./solr) you can alternately add the solr.solr.home system property to your JVM settings before starting Tomcat... export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/my/custom/solr/home/ dir/" ...or use a Context file to configure the Solr Home using JNDI A Tomcat context fragments can be used to configure the JNDI property needed to specify your Solr Home directory. Just put a context fragment file under $CATALINA_HOME/conf/Catalina/ localhost that looks something like this... $ cat /tomcat55/conf/Catalina/localhost/solr.xml Greetings, Olivier PS: May be it would be great if we could provide an ubuntu dpkg with 1.3 ? Any takers? -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Re: Severe errors in solr configuration
Am 04.02.2009 um 15:50 schrieb Anto Binish Kaspar: Yes I removed, still I have the same issue. Any idea what may be cause of this issue? Have you solved your problem? Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Re: Severe errors in solr configuration
Am 05.02.2009 um 12:07 schrieb Anto Binish Kaspar: Do I need to give some permissions to the folder? i would guess so. Olivier -- Olivier Dobberkau Je TYPO3, desto d.k.d d.k.d Internet Service GmbH Kaiserstr. 79 D 60329 Frankfurt/Main Registergericht: Amtsgericht Frankfurt am Main Registernummer: HRB 45590 Geschäftsführer: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de aktuelle TYPO3-Projekte: www.licht.de - Relaunch (TYPO3) www.lahmeyer.de - Launch (TYPO3) www.seb-assetmanagement.de - Relaunch (TYPO3)
Apachecon 2009 Europe
Hi all, we came back with our heads full of impressions from ApacheCon Europe. Thanks a lot for the great speeches and the inspiring personal talks. I strongly believe that Solr will have a great future. Olivier -- Olivier Dobberkau d.k.d Internet Service GmbH fon: +49 (0)69 - 43 05 61-70 fax: +49 (0)69 - 43 05 61-90 mail: olivier.dobber...@dkd.de home: http://www.dkd.de
Re: indexing/crawling HTML + solr
Hi, have a look at the Droids project in the Incubator. Olivier Sent from my iPhone On 03.06.2009 at 12:09, Gena Batsyan wrote: Hi! to be short, where to start with the subject? Any pointers to some [semi-]functional solutions that crawl the web as a normal crawler, take care of HTML parsing, etc., and feed the crawled stuff as solr-documents per ? regards!
Re: Best approach to multiple languages
On 22.07.2009 at 18:31, Ed Summers wrote: In case you are curious I've attached a copy of our schema.xml to give you an idea of what we did. Thanks for sharing! -- Olivier Dobberkau
Re: How to set User.dir or CWD for Solr during Tomcat startup
Am 07.01.2010 um 00:07 schrieb Turner, Robbin J: > I've been doing a bunch of googling and haven't seen if there is a parameter > to set within Tomcat other than the solr/home which is setup in the solr.xml > under the $CATALINA_HOME/conf/Catalina/localhost/. Hi. We set this in solr.xml http://wiki.apache.org/solr/SolrTomcat#Simple_Example_Install hope this helps. olivier -- Olivier Dobberkau . . . . . . . . . . . . . . Je TYPO3, desto d.k.d
Re: Interesting stuff; Solr as a syslog store.
On 13.02.2010 at 03:02, Antonio Lobato wrote: > Just thought this would be a neat story to share with you all. I've really > grown to love Solr, it's something else! Hi Antonio, Great. Would you also share the source code somewhere? May the Source be with you. Thanks. Olivier
Re: ubuntu lucid package
Am 30.04.2010 um 09:24 schrieb Gora Mohanty: > Also, the standard Debian/Ubuntu way of finding out what files a > package installed is: > dpkg -l > > Regards, > Gora You might try: # dpkg -L solr-common /. /etc /etc/solr /etc/solr/web.xml /etc/solr/conf /etc/solr/conf/admin-extra.html /etc/solr/conf/elevate.xml /etc/solr/conf/mapping-ISOLatin1Accent.txt /etc/solr/conf/protwords.txt /etc/solr/conf/schema.xml /etc/solr/conf/scripts.conf /etc/solr/conf/solrconfig.xml /etc/solr/conf/spellings.txt /etc/solr/conf/stopwords.txt /etc/solr/conf/synonyms.txt /etc/solr/conf/xslt /etc/solr/conf/xslt/example.xsl /etc/solr/conf/xslt/example_atom.xsl /etc/solr/conf/xslt/example_rss.xsl /etc/solr/conf/xslt/luke.xsl /usr /usr/share /usr/share/solr /usr/share/solr/WEB-INF /usr/share/solr/WEB-INF/lib /usr/share/solr/WEB-INF/lib/apache-solr-core-1.4.0.jar /usr/share/solr/WEB-INF/lib/apache-solr-dataimporthandler-1.4.0.jar /usr/share/solr/WEB-INF/lib/apache-solr-solrj-1.4.0.jar /usr/share/solr/WEB-INF/weblogic.xml /usr/share/solr/scripts /usr/share/solr/scripts/abc /usr/share/solr/scripts/abo /usr/share/solr/scripts/backup /usr/share/solr/scripts/backupcleaner /usr/share/solr/scripts/commit /usr/share/solr/scripts/optimize /usr/share/solr/scripts/readercycle /usr/share/solr/scripts/rsyncd-disable /usr/share/solr/scripts/rsyncd-enable /usr/share/solr/scripts/rsyncd-start /usr/share/solr/scripts/rsyncd-stop /usr/share/solr/scripts/scripts-util /usr/share/solr/scripts/snapcleaner /usr/share/solr/scripts/snapinstaller /usr/share/solr/scripts/snappuller /usr/share/solr/scripts/snappuller-disable /usr/share/solr/scripts/snappuller-enable /usr/share/solr/scripts/snapshooter /usr/share/solr/admin /usr/share/solr/admin/_info.jsp /usr/share/solr/admin/action.jsp /usr/share/solr/admin/analysis.jsp /usr/share/solr/admin/analysis.xsl /usr/share/solr/admin/distributiondump.jsp /usr/share/solr/admin/favicon.ico /usr/share/solr/admin/form.jsp /usr/share/solr/admin/get-file.jsp /usr/share/solr/admin/get-properties.jsp /usr/share/solr/admin/header.jsp /usr/share/solr/admin/index.jsp /usr/share/solr/admin/jquery-1.2.3.min.js /usr/share/solr/admin/meta.xsl /usr/share/solr/admin/ping.jsp /usr/share/solr/admin/ping.xsl /usr/share/solr/admin/raw-schema.jsp /usr/share/solr/admin/registry.jsp /usr/share/solr/admin/registry.xsl /usr/share/solr/admin/replication /usr/share/solr/admin/replication/header.jsp /usr/share/solr/admin/replication/index.jsp /usr/share/solr/admin/schema.jsp /usr/share/solr/admin/solr-admin.css /usr/share/solr/admin/solr_small.png /usr/share/solr/admin/stats.jsp /usr/share/solr/admin/stats.xsl /usr/share/solr/admin/tabular.xsl /usr/share/solr/admin/threaddump.jsp /usr/share/solr/admin/threaddump.xsl /usr/share/solr/admin/debug.jsp /usr/share/solr/admin/dataimport.jsp /usr/share/solr/favicon.ico /usr/share/solr/index.jsp /usr/share/doc /usr/share/doc/solr-common /usr/share/doc/solr-common/changelog.Debian.gz /usr/share/doc/solr-common/README.Debian /usr/share/doc/solr-common/TODO.Debian /usr/share/doc/solr-common/copyright /usr/share/doc/solr-common/changelog.gz /usr/share/doc/solr-common/NOTICE.txt.gz /usr/share/doc/solr-common/README.txt.gz /var /var/lib /var/lib/solr /var/lib/solr/data /usr/share/solr/WEB-INF/lib/xml-apis.jar /usr/share/solr/WEB-INF/lib/xml-apis-ext.jar /usr/share/solr/WEB-INF/lib/slf4j-jdk14.jar /usr/share/solr/WEB-INF/lib/slf4j-api.jar /usr/share/solr/WEB-INF/lib/lucene-spellchecker.jar /usr/share/solr/WEB-INF/lib/lucene-snowball.jar /usr/share/solr/WEB-INF/lib/lucene-queries.jar 
/usr/share/solr/WEB-INF/lib/lucene-highlighter.jar /usr/share/solr/WEB-INF/lib/lucene-core.jar /usr/share/solr/WEB-INF/lib/lucene-analyzers.jar /usr/share/solr/WEB-INF/lib/jetty-util.jar /usr/share/solr/WEB-INF/lib/jetty.jar /usr/share/solr/WEB-INF/lib/commons-io.jar /usr/share/solr/WEB-INF/lib/commons-httpclient.jar /usr/share/solr/WEB-INF/lib/commons-fileupload.jar /usr/share/solr/WEB-INF/lib/commons-csv.jar /usr/share/solr/WEB-INF/lib/commons-codec.jar /usr/share/solr/WEB-INF/web.xml /usr/share/solr/conf If i reckon correctly some parts of apache solr will not work with the ubuntu lucid distribution. http://solr.dkd.local/update/extract throws an error: The server encountered an internal error (lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at Maybe someone from ubuntu reading this list can confirm this. Olivier -- Olivier Dobberkau d.k.d Internet Service GmbH Kaiserstraße 73 60329 Frankfurt/Main mail: olivier.dobber...@dkd.de web: http://www.dkd.de
Solr 1.4 query fails against all fields, but succeed if field is specified.
Hi, I have created an index with several fields. If I query my index in the admin section of Solr (or via an HTTP request), I get results for my search if I specify the requested field: Query: note:Aspergillus (look for "Aspergillus" in the field "note") However, if I query the same word against all fields ("Aspergillus" or "all:Aspergillus"), I have no match in the response from Solr. Do you have any idea of what can be wrong with my index? Regards Olivier
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
OK, I use the default, i.e. the standard request handler. Using "*:Aspergillus" does not work either. I can try with DisMax, but that means I need to know all field names. My schema declares a number of them, but some other fields are defined via dynamic fields (I know the type, but I do not know their names). Is there any way to query all fields, including dynamic ones? thanks Olivier 2010/5/31 Michael Kuhlmann > On 31.05.2010 11:50, olivier sallou wrote: > > Hi, > > I have created an index with several fields. > > If I query my index in the admin section of solr (or via http request), I > > get results for my search if I specify the requested field: > > Query: note:Aspergillus (look for "Aspergillus" in field "note") > > However, if I query the same word against all fields ("Aspergillus" or > > "all:Aspergillus") , I have no match in response from Solr. > > Querying "Aspergillus" without a field does only work if you're using > DisMaxHandler. > > Do you have a field "all"? > > Try "*:Aspergillus" instead. >
Re: Solr 1.4 query fails against all fields, but succeed if field is specified.
I finally got a solution. As I use dynamic fields, I use copyField to copy everything to a global indexed field, and specify this field as the defaultSearchField in my schema. The *:term query with the "standard" query type fails without this... This solution roughly doubles the amount of indexed data but works in all cases... In my schema I have: Some other fields are of "lowercase" or "int" types. Regards 2010/5/31 Michael Kuhlmann > On 31.05.2010 12:36, olivier sallou wrote: > > Is there any way to query all fields including dynamic ones? > > Yes, using the *:term query. (Please note that the asterisk should not > be quoted.) > > To answer your question, we need more details on your Solr > configuration, esp. the part of schema.xml that defines your "note" field. > > Greetings, > Michael > > >
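The schema lines referenced by "In my schema I have:" were stripped by the archive. A hedged reconstruction of the catch-all approach described here (the field name "all" appears earlier in the thread, the rest is illustrative):

  <dynamicField name="*_txt" type="text" indexed="true" stored="true"/>
  <field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="all"/>
  <defaultSearchField>all</defaultSearchField>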
Re: newbie question on how to batch commit documents
I would additionally suggest using EmbeddedSolrServer for large uploads if possible; performance is better. 2010/5/31 Steve Kuo > I have a newbie question on what is the best way to batch add/commit a > large > collection of document data via solrj. My first attempt was to write a > multi-threaded application that did the following. > > Collection docs = new ArrayList(); > for (Widget w : widgets) { > doc.addField("id", w.getId()); > doc.addField("name", w.getName()); > doc.addField("price", w.getPrice()); > doc.addField("category", w.getCat()); > doc.addField("srcType", w.getSrcType()); > docs.add(doc); > > // commit docs to solr server > server.add(docs); > server.commit(); > } > > And I got this exception. > > org.apache.solr.common.SolrException: > > Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later > > > Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later > > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243) > at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > at > org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86) > > The solrj wiki/documents seemed to indicate that this happened because multiple threads > were calling SolrServer.commit(), which in turn called > CommonsHttpSolrServer.request(), resulting in multiple searchers. My first > thought was to change the configs for autowarming. But after looking at > the > autowarm params, I am not sure what can be changed, or perhaps a different > approach is recommended. > > class="solr.FastLRUCache" > size="512" > initialSize="512" > autowarmCount="0"/> > > class="solr.LRUCache" > size="512" > initialSize="512" > autowarmCount="0"/> > > class="solr.LRUCache" > size="512" > initialSize="512" > autowarmCount="0"/> > > Your help is much appreciated. >
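Beyond that, a rough sketch of how the loop can be restructured so that commit() is called only once at the end (Widget is the domain class from the post above; the batch size and Solr URL are placeholders):

  import java.util.ArrayList;
  import java.util.Collection;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
      // Widget is the caller's domain class from the original post
      public static void index(Iterable<Widget> widgets) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
          for (Widget w : widgets) {
              SolrInputDocument doc = new SolrInputDocument(); // a new document per widget
              doc.addField("id", w.getId());
              doc.addField("name", w.getName());
              doc.addField("price", w.getPrice());
              doc.addField("category", w.getCat());
              doc.addField("srcType", w.getSrcType());
              docs.add(doc);
              if (docs.size() >= 1000) { // send in batches to keep memory bounded
                  server.add(docs);
                  docs.clear();
              }
          }
          if (!docs.isEmpty()) {
              server.add(docs);
          }
          server.commit(); // a single commit, so only one warming searcher is opened
      }
  }

With a single commit at the end (or relying on autoCommit in solrconfig.xml), the maxWarmingSearchers limit should no longer be hit by the indexing loop itself.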
Re: solr itas
Did you update solrconfig.xml to add the /itas request handler? 2010/6/11 > Hi, > > When I type http://127.0.0.1:8080/solr/itas > > I receive this result in the web page instead of an HTML page. Does anyone > know the reason and/or have a suggestion to fix it. > > > - > - > 0 > 62 > > - > - > 1.0 > - > Lucid Imagination > > - > USA > > - > > > > > Thanks, > > >
Need help on Solr Cell usage with specific Tika parser
Hi, I use Solr Cell to send specific content files. I developed a dedicated parser for specific MIME types. However, I cannot get Solr to accept my new MIME types. In solrconfig.xml, in the update/extract requestHandler, I specified ./tika-config.xml as the tika.config value, where tika-config.xml is in the conf directory (same as solrconfig.xml). In tika-config I added my mimetypes: biosequence/document biosequence/embl biosequence/genbank I do not know whether the path to the Tika mimetypes file should be absolute or relative... or even whether this file needs to be redefined if "magic" is not used. When I run my update/extract request, I get an error that "biosequence/document" does not match any known parser. Thanks Olivier
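For reference, my tika-config.xml looks roughly like this (sketched from memory of the Tika 0.x configuration layout, which may differ in detail; only the parser class and MIME types are really mine):

  <properties>
    <!-- magic detection disabled; the MIME type is passed explicitly -->
    <mimeTypeRepository resource="/org/apache/tika/mime/tika-mimetypes.xml" magic="false"/>
    <parsers>
      <parser name="parse-readseq" class="org.irisa.genouest.tools.readseq.ReadSeqParser">
        <mime>biosequence/document</mime>
        <mime>biosequence/embl</mime>
        <mime>biosequence/genbank</mime>
      </parser>
    </parsers>
  </properties>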
Re: Need help on Solr Cell usage with specific Tika parser
Yes, I do. As magic is not set, that is why it looks for this specific mime-type. Unfortunately, it seems it either does not read my specific tika-config file or the mime-type file. But there is no error logged concerning those files... (is it not trying to load them?) 2010/6/14 Ken Krugler > Hi Olivier, > > Are you setting the mime type explicitly via the stream.type parameter? > > -- Ken > > > On Jun 14, 2010, at 9:14am, olivier sallou wrote: > > Hi, >> I use Solr Cell to send specific content files. I developed a dedicated >> parser for specific mime types. >> However I cannot get Solr to accept my new mime types. >> >> In solrconfig, in update/extract requesthandler I specified > name="tika.config">./tika-config.xml , where tika-config.xml is in >> conf directory (same as solrconfig). >> >> In tika-config I added my mimetypes: >> >> > class="org.irisa.genouest.tools.readseq.ReadSeqParser"> >> biosequence/document >> biosequence/embl >> biosequence/genbank >> >> >> I do not know for: >> >> >> whether path to tika mimetypes should be absolute or relative... and even >> if >> this file needs to be redefined if "magic" is not used. >> >> >> When I run my update/extract, I have an error that "biosequence/document" >> does not match any known parser. >> >> Thanks >> >> Olivier >> > > > Ken Krugler > +1 530-210-6378 > http://bixolabs.com > e l a s t i c w e b m i n i n g > > > > >
Re: Need help on Solr Cell usage with specific Tika parser
Thanks, moving it to a direct child worked. Olivier 2010/6/14 Chris Hostetter > > : In solrconfig, in update/extract requesthandler I specified : name="tika.config">./tika-config.xml , where tika-config.xml is in > : conf directory (same as solrconfig). > > Can you show us the full requestHandler declaration? ... tika.config needs > to be a direct child of the requestHandler (not in the defaults). > > I also don't know if using a "local" path like that will work -- depends > on how that file is loaded (if Solr loads it, then you might want to > remove the "./"; if Solr just gives the path to Tika, then you probably > need an absolute path). > > > -Hoss > >
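For anyone hitting the same problem, the shape of the working declaration is roughly the following (a sketch only: the handler name and defaults are the usual ones, and the path may need to be absolute as Hoss notes; the key point is that tika.config is a direct child of requestHandler):

  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <!-- direct child of requestHandler, not inside the defaults list -->
    <str name="tika.config">./tika-config.xml</str>
    <lst name="defaults">
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>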
ConfigSet API V2 issue with configSetProp.property present
Hi, I have an issue creating a configset with the V2 API when using a configset property. Indeed, if I enter the command: curl -X POST -H 'Content-type: application/json' -d '{ "create":{"name": "Test", "baseConfigSet": "myConfigSet","configSetProp.immutable": "false"}}' http://localhost:8983/api/cluster/configs?omitHeader=true (the same one as in the documentation: https://lucene.apache.org/solr/guide/7_5/configsets-api.html) it fails with the error: "errorMessages":["Unknown field 'configSetProp.immutable' in object : {\n \"name\":\"Test\",\n \"baseConfigSet\":\"myConfigSet\",\n \"configSetProp.immutable\":\"false\"}"]}], "msg":"Error in command payload", "code":400}} If I enter the same command, still with the V2 API but without the configSetProp.immutable property, it succeeds. With the V1 API, there is no problem with or without the configset property. The tests were done with Solr 7.4 and Solr 7.5. Did I miss something about the configset property usage? Thanks, Best regards, Olivier
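For comparison, the V1 call that works for me with the property looks something like this (same host and configset names as above):

  curl "http://localhost:8983/solr/admin/configs?action=CREATE&name=Test&baseConfigSet=myConfigSet&configSetProp.immutable=false&omitHeader=true"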
Backup collections using SolrJ
Hi, I have a question regarding the backup of a Solr collection using SolrJ. I use Solr 7. I want to build a JAR for that and launch it from a cron job. So far, no problem with the request: I use CollectionAdminRequest.backupCollection and then the processAsync method. The command is well transmitted to Solr. My problem is parsing the response and handling the different failure cases in the code. Let's say that the Solr response is the following after sending the asynchronous backup request (the request id is "solrbackup"): { "responseHeader": { "status": 0, "QTime": 1 }, "success": { "IP:8983_solr": { "responseHeader": { "status": 0, "QTime": 0 } }, "IP:8983_solr": { "responseHeader": { "status": 0, "QTime": 0 } } }, "solrbackup5704378348890743": { "responseHeader": { "status": 0, "QTime": 0 }, "STATUS": "failed", "Response": "Failed to backup core=Test_shard1_replica1 because java.io.IOException: Aucun espace disponible sur le périphérique" }, "status": { "state": "completed", "msg": "found [solrbackup] in completed tasks" } } If I use the code: System.out.println(CollectionAdminRequest.requestStatus("solrbackup").process(solr).getRequestStatus()); the output is "COMPLETED". But that is not enough to check whether the backup went well or not. For example, in this case the task is completed but the backup was not successful because there was no space left on the disk. So the interesting part is in the solrbackup5704378348890743 section of the response. My first question is: why are some numbers appended to the request-id name? Because if I write CollectionAdminRequest.requestStatus("solrbackup").getRequestId(), the response is "solrbackup" and not solrbackup5704378348890743. So retrieving the section related to solrbackup5704378348890743 in the response is not very easy. I cannot directly use (NamedList) CollectionAdminRequest.requestStatus("solrbackup").process(solr).getResponse().get("solrbackup"), but instead I have to iterate over the entire Solr response and check the beginning of each key to retrieve the section that begins with solrbackup, and finally get the elements that I want. Is this the right approach, or is there a simpler way to do it? Thanks, Olivier Tavard
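For what it's worth, this is roughly what I do today (a sketch only: the backup location, collection name and polling are simplified, and the key-prefix scan is the workaround described above):

  import java.util.Map;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.CollectionAdminRequest;
  import org.apache.solr.client.solrj.response.RequestStatusState;
  import org.apache.solr.common.util.NamedList;

  public class BackupCheck {
      public static void main(String[] args) throws Exception {
          try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
              // trigger the asynchronous backup
              CollectionAdminRequest.backupCollection("Test", "Test_backup")
                      .setLocation("/backups")
                      .processAsync("solrbackup", solr);

              // ... poll until the task is no longer SUBMITTED/RUNNING ...
              RequestStatusState state =
                      CollectionAdminRequest.requestStatus("solrbackup").process(solr).getRequestStatus();
              NamedList<Object> response =
                      CollectionAdminRequest.requestStatus("solrbackup").process(solr).getResponse();

              // the per-task section has a suffixed key, so scan the keys by prefix
              for (Map.Entry<String, Object> entry : response) {
                  if (entry.getKey() != null && entry.getKey().startsWith("solrbackup")) {
                      NamedList<?> details = (NamedList<?>) entry.getValue();
                      System.out.println(state + " STATUS=" + details.get("STATUS")
                              + " Response=" + details.get("Response"));
                  }
              }
          }
      }
  }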
Cannot find Solr 7.4.1 release
Hi, I wanted to download Solr 7.4.1, but I cannot find the 7.4.1 release at http://archive.apache.org/dist/lucene/solr/ : there is Solr 7.4 and then directly 7.5. Of course I can build from the source code, but this is frustrating because I can see that in the 7_4 branch there is a fix that I need (SOLR-12594) with the status fixed in the 7.4.1 and 7.5 versions. Everything seems to have been prepared to release 7.4.1, but I cannot find it. Does this release exist? Thank you, Olivier
filtering facets
Hi, Long-time lurker, first-time poster. I have a multi-valued field, let's call it article_outlinks, containing all outgoing URLs from a document. I want to get all matching URLs sorted by counts. For example, I want to get all outgoing Wikipedia URLs in my documents sorted by counts. So I execute a query like this: q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks. But I get facets containing the other URLs in the documents. I can get something close by using facet.prefix=http://en.wikipedia.org but I want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org). Is there a way to do a search and get facets matching only my query? I know facet.prefix isn't a query, but is there a way to get that behavior? Is it easy to extend Solr to do something like that? Thank you, Olivier Sorry for my English.
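Concretely, the request looks something like this (host and handler path are placeholders; adding facet.prefix=http://en.wikipedia.org is the partial workaround mentioned above):

  curl "http://localhost:8983/solr/select?q=article_outlinks:http*wikipedia.org*&rows=0&facet=true&facet.field=article_outlinks&facet.mincount=1&facet.limit=100"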
Re: filtering facets
Hi Mike, No, my problem is that the field article_outlinks is multivalued, so it contains several URLs not related to my search. I would like to facet only on the URLs matching my query. For example (only on one document, but my search targets over 1M docs): Doc1: article_url: url1.com/1 url2.com/2 url1.com/1 url1.com/3 And my query is: article_url:url1.com* and I facet by article_url and I want it to give me: url1.com/1 (2) url1.com/3 (1) But right now, because url2.com/2 is contained in a multivalued field with the matching URLs, I get this: url1.com/1 (2) url1.com/3 (1) url2.com/2 (1) I can use facet.prefix to filter, but it's not very flexible if my URL contains a subdomain, as facet.prefix doesn't support wildcards. Thank you, Olivier Mike Topper wrote: Hi Olivier, Are the facet counts on the URLs you don't want 0? If so, you can use facet.mincount to only return results greater than 0. -Mike Olivier H. Beauchesne wrote: Hi, Long-time lurker, first-time poster. I have a multi-valued field, let's call it article_outlinks, containing all outgoing URLs from a document. I want to get all matching URLs sorted by counts. For example, I want to get all outgoing Wikipedia URLs in my documents sorted by counts. So I execute a query like this: q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks. But I get facets containing the other URLs in the documents. I can get something close by using facet.prefix=http://en.wikipedia.org but I want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org). Is there a way to do a search and get facets matching only my query? I know facet.prefix isn't a query, but is there a way to get that behavior? Is it easy to extend Solr to do something like that? Thank you, Olivier Sorry for my English.
Re: filtering facets
Yeah, but then I would have to retrieve *a lot* of facets. I think for now I'll retrieve all the subdomains with facet.prefix and then merge those queries. Not ideal, but when I have more motivation, I will submit a patch to Solr :-) Michael wrote: You could post-process the response and remove urls that don't match your domain pattern. On Mon, Aug 31, 2009 at 9:45 AM, Olivier H. Beauchesne wrote: Hi Mike, No, my problem is that the field article_outlinks is multivalued, so it contains several URLs not related to my search. I would like to facet only on the URLs matching my query. For example (only on one document, but my search targets over 1M docs): Doc1: article_url: url1.com/1 url2.com/2 url1.com/1 url1.com/3 And my query is: article_url:url1.com* and I facet by article_url and I want it to give me: url1.com/1 (2) url1.com/3 (1) But right now, because url2.com/2 is contained in a multivalued field with the matching URLs, I get this: url1.com/1 (2) url1.com/3 (1) url2.com/2 (1) I can use facet.prefix to filter, but it's not very flexible if my URL contains a subdomain, as facet.prefix doesn't support wildcards. Thank you, Olivier Mike Topper wrote: Hi Olivier, Are the facet counts on the URLs you don't want 0? If so, you can use facet.mincount to only return results greater than 0. -Mike Olivier H. Beauchesne wrote: Hi, Long-time lurker, first-time poster. I have a multi-valued field, let's call it article_outlinks, containing all outgoing URLs from a document. I want to get all matching URLs sorted by counts. For example, I want to get all outgoing Wikipedia URLs in my documents sorted by counts. So I execute a query like this: q=article_outlinks:http*wikipedia.org* and I facet on article_outlinks. But I get facets containing the other URLs in the documents. I can get something close by using facet.prefix=http://en.wikipedia.org but I want to include other subdomains of Wikipedia (e.g. fr.wikipedia.org). Is there a way to do a search and get facets matching only my query? I know facet.prefix isn't a query, but is there a way to get that behavior? Is it easy to extend Solr to do something like that? Thank you, Olivier Sorry for my English.
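In code, the merging I have in mind looks roughly like this (a sketch only: the subdomain list, Solr URL and the SolrJ 1.4 client are assumptions on my side):

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class WikipediaFacets {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          // one facet.prefix query per subdomain, merged client-side
          String[] prefixes = {"http://en.wikipedia.org", "http://fr.wikipedia.org"};
          Map<String, Long> merged = new HashMap<String, Long>();
          for (String prefix : prefixes) {
              SolrQuery query = new SolrQuery("article_outlinks:http*wikipedia.org*");
              query.setRows(0);
              query.setFacet(true);
              query.addFacetField("article_outlinks");
              query.setFacetPrefix(prefix);
              query.setFacetMinCount(1);
              query.setFacetLimit(-1);
              QueryResponse response = server.query(query);
              FacetField field = response.getFacetField("article_outlinks");
              if (field.getValues() != null) {
                  for (FacetField.Count count : field.getValues()) {
                      Long previous = merged.get(count.getName());
                      merged.put(count.getName(),
                              (previous == null ? 0L : previous) + count.getCount());
                  }
              }
          }
          System.out.println(merged); // url -> summed count across subdomains
      }
  }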