Faceting unknown fields

2010-07-08 Thread Mickael Magniez

Hello,

I'm wondering if it's possible to index and facet "unknown" fields.

Let me explain:
I've got a set of 1M products (from computers to freezers), and each category
of product has its own attributes, so the number of attributes is pretty large
(1000+).


I've started to describe each attribute in my schema, but I think it will be
hard to maintain.

So, can I index and facet these fields without describing them in my schema?

I will first try with dynamic fields, but I'm not sure it's going to work.

Anyone's got some idea?


Mickael.


Re: Faceting unknown fields

2010-07-08 Thread Rebecca Watson
hi,

> So, can I index and facet these fields without describing them in my schema?
>
> I will first try with dynamic fields, but I'm not sure it's going to work.

we do all our facet fields this way, with just a general string field for
single- and multi-valued fields:

<!-- illustrative reconstruction: the original declarations were stripped by
     the archive; the *_s / *_ss pattern names are assumptions -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true" />
<dynamicField name="*_ss" type="string" indexed="true" stored="true" multiValued="true" />

and faceting works...

but you will still need to know the specific name of the field(s) to pass in the
facet.field URL parameter (i.e. it works as long as your UI knows them!).
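
For example, a facet request over one of those dynamic fields would look like
this (host, core and field name are illustrative):

http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=brand_s&facet.limit=10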

hope that helps

bec :)


Re: Faceting unknown fields

2010-07-08 Thread Mickael Magniez

Thanks,

I'll test your solution shortly


Mickael.


Spellcheck help

2010-07-08 Thread Marc Ghorayeb

Hello,

I've been trying to get rid of a bug when using the spellcheck, but so far
with no success :( When searching for a word that starts with a number, for
example "3dsmax", I get the results that I want, BUT the spellcheck says it is
not correctly spelled AND the collation gives me "33dsmax". Further
investigation shows that the spellcheck is actually only checking "dsmax",
which it considers nonexistent, and it suggests "3dsmax" as a better match;
but since I have spellcheck.collate = true, the collation I show is "33dsmax",
with the first 3 being the one discarded by the spellchecker... Otherwise, the
spellcheck works correctly for normal words... any ideas? :(

My spellcheck field is fairly classic: whitespace tokenizer, with lowercase
filter... Any help would be greatly appreciated :)

Thanks,
Marc
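
(For reference, a field type matching that description - the type name is
illustrative:)

<fieldType name="textSpell" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>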

Score boosting

2010-07-08 Thread Chamnap Chhorn
Hi everyone,

I have a requirement to meet, but I can't figure out how to do it. I hope
someone can help me.

Here is the requirement: a book has several keyphrases (available for use in
searching). An author can either buy a search-result position for these
keyphrases or simply add keyphrases related to the book. So I need to
implement a search whose ranking is affected by this position field.

I'm not sure how to implement this requirement. I hope someone can help!

-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Distributed Indexing

2010-07-08 Thread Li Li
Are there any tools for "Distributed Indexing"? I'm referring to
KattaIntegration and ZooKeeperIntegration in
http://wiki.apache.org/solr/DistributedSearch.
But it seems that they focus more on error handling and
replication. I need a dispatcher that routes different docs by
uniqueKey (such as url) to different machines, so that when a doc is
updated, it is sent to the machine that already contains that url. I also
need the docs to be spread evenly across all the machines, so that when I do
a distributed search the idfs of the different machines are similar,
because the current distributed search's idf is per-machine (local).


Re: How do I get the matched terms of my query?

2010-07-08 Thread osocurious2

if you want only documents that have both values then make your q
   q=content:videos+AND+content:songs

If you want the more open query, but to be able to tell which docs have
videos, which have songs, and which have both... then I'm not sure. Using
debugQuery=on might help your understanding, but it isn't a good runtime
solution if you need that per query.


Re: Score boosting

2010-07-08 Thread osocurious2

Sounds like you want Payloads. I don't think you can guarantee a position,
but you can boost relative to others. You can give one author/book a boost
of 0 for the phrase Cooking, another author/book a boost of 0.5, and yet
another a boost of 1.0. For searches that include the phrase Cooking, the
scores should reflect the boosts, and the authors that bought the higher
boost value will sort higher. These discuss Payloads (it isn't a trivial
task, by the way):
  http://www.ultramagnus.org/?p=1
 
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
or use this to see other Solr-User group discussions on the topic:

http://lucene.472066.n3.nabble.com/template/NodeServlet.jtp?tpl=search-page&node=472068&query=Using+Lucene's+payload+in+Solr
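
As a rough sketch of the index-time half, payloads can be attached to terms
with the delimited-payload filter (the field type name and boost values are
illustrative, and you'd still need a payload-aware similarity and query type
for them to affect scores):

<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- indexing "cooking|1.0 baking|0.5" stores 1.0 and 0.5 as payloads -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>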



Filter multivalue fields from search result

2010-07-08 Thread Alex J. G. Burzyński
Hi,

Is it possible to remove from search results the multivalued field values
that don't match the search criteria?

My schema is defined as:

<!-- illustrative reconstruction: the original field declarations were stripped
     by the archive; field types are assumptions, names follow the table below -->
<field name="id"   type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text"   indexed="true" stored="true" />
<field name="town" type="string" indexed="true" stored="true" multiValued="true" />
<field name="date" type="date"   indexed="true" stored="true" multiValued="true" />
And example docs are:

+----+----------------------+------------+------------+
| id | name                 | town       | date       |
+----+----------------------+------------+------------+
| 1  | Microsoft Excel      | London     | 2010-08-20 |
|    |                      | Glasgow    | 2010-08-24 |
|    |                      | Leeds      | 2010-08-28 |
| 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
|    |                      | Reading    | 2010-08-25 |
|    |                      | London     | 2010-08-29 |
| 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
|    |                      | Leeds      | 2010-08-26 |
+----+----------------------+------------+------------+

so the query for q=name:Microsoft town:Leeds returns docs 1 & 3.

How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?

Or should I create a separate doc for each name/town event?

Thanks,
Alex


solr connection question

2010-07-08 Thread ZAROGKIKAS,GIORGOS
Hi solr users,

I need to know how solr manages connections when we make a request (select,
update, commit).
Is there any connection pooling, or an article where I can learn about its
connection management?
How can I log the connections to the solr server in a file?

I have set up my solr 1.4 with tomcat

Thanks in advance 





Re: solr connection question

2010-07-08 Thread Sven Maurmann

Hi,

Solr runs as a Web application. The requests you most probably mean
are just HTTP-requests to the underlying container. Internally each
request is processed against the Lucene index, usually being a file-
based one. Therefore there are no connections like in a database
application, where you have a pool of connections to your remote
database server.

Best,
  Sven

--On Thursday, 8 July 2010 15:46 +0300 "ZAROGKIKAS,GIORGOS"
 wrote:



Hi solr users

I need to know how solr manages connections when we make a
request (select, update, commit). Is there any connection pooling, or an
article to learn about its connection management? How can I log the
connections to the solr server in a file?

I have setup my solr 1.4 with tomcat

Thanks in advance


Re: solr connection question

2010-07-08 Thread Ruben Abad
Jorl, ok I will have to change my vacation request :(
Rubén Abad


On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS <
g.zarogki...@multirama.gr> wrote:

> Hi solr users
>
> I need to know how solr manages connections when we make a
> request (select, update, commit)
> Is there any connection pooling, or an article to learn about its connection
> management?
> How can I log the connections to the solr server in a file?
>
> I have setup my solr 1.4 with tomcat
>
> Thanks in advance
>
>
>
>


RE: solr connection question

2010-07-08 Thread ZAROGKIKAS,GIORGOS
Yes, I mean HTTP requests.
How can I log them?
-Original Message-
From: Sven Maurmann [mailto:sven.maurm...@kippdata.de] 
Sent: Thursday, July 08, 2010 3:56 PM
To: solr-user@lucene.apache.org
Subject: Re: solr connection question

Hi,

Solr runs as a Web application. The requests you most probably mean
are just HTTP-requests to the underlying container. Internally each
request is processed against the Lucene index, usually being a file-
based one. Therefore there are no connections like in a database
application, where you have a pool of connections to your remote
database server.

Best,
   Sven

--On Thursday, 8 July 2010 15:46 +0300 "ZAROGKIKAS,GIORGOS"
 wrote:

> Hi solr users
>
> I need to know how solr manages connections when we make a
> request (select, update, commit). Is there any connection pooling, or an
> article to learn about its connection management? How can I log the
> connections to the solr server in a file?
>
> I have setup my solr 1.4 with tomcat
>
> Thanks in advance


Re: solr connection question

2010-07-08 Thread Alejandro Gonzalez
ok please don't forget it :)

2010/7/8 Ruben Abad 

> Jorl, ok I will have to change my vacation request :(
> Rubén Abad
>
>
> On Thu, Jul 8, 2010 at 2:46 PM, ZAROGKIKAS,GIORGOS <
> g.zarogki...@multirama.gr> wrote:
>
> > Hi solr users
> >
> > I need to know how solr manages connections when we make a
> > request (select, update, commit)
> > Is there any connection pooling, or an article to learn about its
> > connection management?
> > How can I log the connections to the solr server in a file?
> >
> > I have setup my solr 1.4 with tomcat
> >
> > Thanks in advance
> >
> >
> >
> >
>


RE: Distributed Indexing

2010-07-08 Thread Yuval Feinstein
Li,
as far as I know, you still have to do this part yourself.
A possible way to shard is to number the shards from 0 to numShards-1,
calculate hash(uniqueKey) % numShards for each document,
and send the document to the resulting shard number.
This mapping is consistent and distributes documents uniformly across the shards.
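
A minimal sketch of that routing in Java (the class and method names are
illustrative, and String.hashCode is just one possible hash):

import java.util.List;

public class ShardRouter {
    private final List<String> shardUrls; // e.g. ["http://host0:8983/solr", ...]

    public ShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    /** Map a uniqueKey (such as a url) to the shard that should index it. */
    public String shardFor(String uniqueKey) {
        // mask the sign bit so the modulo is non-negative for negative hashCodes
        int bucket = (uniqueKey.hashCode() & 0x7fffffff) % shardUrls.size();
        return shardUrls.get(bucket);
    }
}
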
-- Yuval

-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: Thursday, July 08, 2010 2:44 PM
To: solr-user@lucene.apache.org
Subject: Distributed Indexing

Are there any tools for "Distributed Indexing"? I'm referring to
KattaIntegration and ZooKeeperIntegration in
http://wiki.apache.org/solr/DistributedSearch.
But it seems that they focus more on error handling and
replication. I need a dispatcher that routes different docs by
uniqueKey (such as url) to different machines, so that when a doc is
updated, it is sent to the machine that already contains that url. I also
need the docs to be spread evenly across all the machines, so that when I do
a distributed search the idfs of the different machines are similar,
because the current distributed search's idf is per-machine (local).


Determining matched tokens in original query

2010-07-08 Thread Mark Holland
Hi,

I'm trying to find out which tokens in a user's query matched against each
result. I've been trying to use the highlight component for this, however it
doesn't quite fit the bill.

I'm using edismax, with mm set to 50%, and I want to extract for each
matching doc which tokens /didn't/ match (I then strip the matching tokens
from the search string and run the remaining query against a different solr
index).

My problem is that the highlighter, naturally, applies highlighting to
fields after analysis filters have been applied. This means it's tricky to
map the highlighted terms back to the original query, because things like
synonyms, stemmed words and possessives may be matched.

E.g. with the search string:
mr banana's shop

I could get a highlighted fragment like:
Mister Banana's frozen banana stand

Is there some other approach I could use?

Thanks,
Mark


Realtime + Batch indexing

2010-07-08 Thread bbarani

Hi,

Currently we are trying to achieve both realtime and batch indexing using
SOLR.

For batch indexing we have set up a master SOLR server which uses DIH and
indexes the data.

For real-time updates we post XML to the SOLR slave, adding to the
existing SOLR index.

Now my issue is that when I replicate the data present in the master to the
slave, the data that was added to the slave (by posting XML) gets
overwritten.

I can't post the XML to the master, as replicating the whole master to the
slave again and again would cause performance issues. My questions are:

Is there a way to replicate just the modified data in the master (a delta) to
the slave environment?

What is the best approach to implementing both batch and real-time indexing in
a master/slave environment?


One more issue: when I post some documents to the slave directly using the
/update handler, some of the attributes get lost in the existing index. Any
reason why this might be happening?


Any help / suggestions would be of great help

Thanks,
BB







DIH batch job

2010-07-08 Thread Sanjeev Kakar
Hi,

  We are trying to import data from the ORACLE database into Solr 1.4
for free text search and would like to provide a faceted search
experience. There are files on the network which we are indexing as
well.

  We are using the DIH for indexing the data from the database and have
written a batch job for iterating over the network files and indexing
them using Tika 0.7.

  We have a couple of questions:

1) How do we schedule a batch job using DIH? (We need fine-grained
access to log any error messages and decide whether to continue or abort
the job.) Is there a patch for Solr 1.5 we can take a look at? Currently
we use Solr 1.4.

2) Can we upgrade the Tika libraries in Solr 1.4 to leverage the
latest Tika enhancements and use the Solr Cell module?

  It would be great if you could provide guidance.

Thanks,

Sanjeev Kakar



Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Peter Spam
To clarify, I never want a snippet, I always want a whole line returned.  Is 
this possible?  Thanks!


-Pete

On Jul 7, 2010, at 5:33 PM, Peter Spam wrote:

> Hi,
> 
> I have a text file broken apart by carriage returns, and I'd like to only 
> return entire lines.  So, I'm trying to use this:
> 
>   &hl.fragmenter=regex
>   &hl.regex.pattern=^.*$
> 
> ... but I still get fragments, even if I crank up the hl.regex.slop to 3 or 
> so.  I also tried a pattern of "\n.*\n" which seems to work better, but still 
> isn't right.  Any ideas?
> 
> 
> -Pete



Delta Import by ID

2010-07-08 Thread Frank A
I'm still having issues - my config follows the standard DIH delta pattern,
with a deltaQuery keyed on CreationDate (the XML itself was stripped by the
archive).

However, I really don't want to use CreationDate, but rather just pass in the
id (as done in the deltaImportQuery). Can I do that directly? If so, how do
I specify the value for dataimporter.delta.id?
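
(For reference, the standard delta pattern is roughly the following, with
table and column names being illustrative:)

<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE CreationDate > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id='${dataimporter.delta.id}'"/>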

(P.S. sorry for a new thread, I kept getting my mail bounced back when I did
a reply, so I'm trying a new thread.)


Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Koji Sekiguchi

(10/07/09 2:44), Peter Spam wrote:

To clarify, I never want a snippet, I always want a whole line returned.  Is 
this possible?  Thanks!


-Pete

Hello Pete,

Use NullFragmenter. It can be used via GapFragmenter with
hl.fragsize=0.

Koji

--
http://www.rondhuit.com/en/



Indexing slowdowns

2010-07-08 Thread Mark Holland
Since I began using the 2010-05-18 nightly, I've been experiencing indexing
slowdowns which I didn't see with solr-1.4.

I'm seeing indexing slow down roughly every 7m records. I'm indexing about
28m in total. These records are batched into csv files of 1m rows, which are
loaded with stream.file. Solr happily chugs away at the first 7m at around
50s/million. It then consistently takes around 20 minutes to index the
7m-8m batch, after which it returns to around 50s/million until reaching the
14m-15m batch, which again takes around 20 minutes, and so on.

There are essentially no differences in configuration between my 1.4 setup
and the nightly. I've played around with mergeFactor and other params to no
avail. I've also hooked up yourkit to jetty, but haven't seen anything
obvious in the results. That said, my java foo is not so strong, so I may be
missing something.

Can anyone suggest where I might start looking for answers? I have a yourkit
snapshot if anyone would care to see it.

Thanks,
Mark


Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Peter Spam
Thanks for the note, Koji.  However, hl.fragsize=0 seems to return the entire
document, rather than just one single line.

Here's what I tried (what I previously had is commented out):

regexv = "^.*$"
thequery = '/solr/select?facet=true&facet.limit=10&fl=id,score,filename' +
           '&tv=true&timeAllowed=3000&facet.field=filename&qt=tvrh&wt=ruby' +
           (p['fq'].empty? ? '' : ('&fq=' + p['fq'].to_s)) +
           '&q=' + CGI::escape(p['q'].to_s) +
           '&rows=' + p['rows'].to_s +
           "&hl=true&hl.snippets=1&hl.fragsize=0"
           # "&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" + CGI::escape(regexv)

Thanks for your help.


-Peter

On Jul 8, 2010, at 3:47 PM, Koji Sekiguchi wrote:

> (10/07/09 2:44), Peter Spam wrote:
>> To clarify, I never want a snippet, I always want a whole line returned.  Is 
>> this possible?  Thanks!
>> 
>> 
>> -Pete
>> 
>>   
> Hello Pete,
> 
> Use NullFragmenter. It can be used via GapFragmenter with
> hl.fragsize=0.
> 
> Koji
> 
> -- 
> http://www.rondhuit.com/en/
> 



Re: Indexing slowdowns

2010-07-08 Thread Robert Muir
On Thu, Jul 8, 2010 at 7:44 PM, Mark Holland wrote:

>
> Can anyone suggest where I might start looking for answers? I have a
> yourkit
> snapshot if anyone would care to see it.
>
>
Doesn't sound good. I'd like to see whatever data you can provide (I worry
it might be something in analysis).


-- 
Robert Muir
rcm...@gmail.com


Re: DIH batch job

2010-07-08 Thread Lance Norskog
There is no batch job scheduling in Solr. You will have to script this
with your OS tools (probably the 'cron' program).

Tika is integrated into the DataImportHandler in Solr 1.5. This gives
you flexibility in indexing and is worth the extra effort.
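
For example, a crontab entry that kicks off a nightly DIH full import could
look like this (the URL path and schedule are illustrative; the handler name
depends on your solrconfig.xml):

# run a DIH full import every night at 02:00
0 2 * * * curl -s "http://localhost:8983/solr/dataimport?command=full-import" >> /var/log/solr-dih.log 2>&1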

On Thu, Jul 8, 2010 at 10:48 AM, Sanjeev Kakar  wrote:
> Hi,
>
>
>
>  We are trying to import data from the ORACLE database into Solr 1.4
> for free text search and would like to provide a faceted search
> experience. There are files on the network which we are indexing as
> well.
>
>
>
>  We are using the DIH for indexing the data from the database and have
> written a batch job for iterating over the network files and indexing
> them using Tika 0.7.
>
>
>
>  We have a couple of questions:
>
> 1) How do we schedule a batch job using DIH? (We need fine-grained
> access to log any error messages and decide whether to continue or abort
> the job.) Is there a patch for Solr 1.5 we can take a look at? Currently
> we use Solr 1.4.
>
> 2) Can we upgrade the Tika libraries in Solr 1.4 to leverage the
> latest Tika enhancements and use the Solr Cell module?
>
>
>
>  It would be great if you could provide guidance.
>
>
>
> Thanks,
>
> Sanjeev Kakar
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Realtime + Batch indexing

2010-07-08 Thread Lance Norskog
You cannot add to the same index on two different solrs. You can set
up separate shards for the batch and incremental indexes and use
distributed search to query both of them.

On Thu, Jul 8, 2010 at 10:04 AM, bbarani  wrote:
>
> Hi,
>
> Currently we are trying to achieve both realtime and batch indexing using
> SOLR.
>
> For batch indexing we have setup a master SOLR server which uses DIH and
> indexes the data.
>
> For slave we post the XML (real time) in to the SOLR slave and add that to
> the existing SOLR document.
>
> Now my issue is that when I replicate the data present in master in to the
> slave the data that was added to the slave (by posting XML) will get
> overwritten.
>
> I can't post the XML to the master as replication of the whole master to
> slave again and again will cause performance issues. My question is
>
> Is there a way to replicate just the modified data in Master (delta) to
> slave environment?
>
> What is the best approach to implement both batch / real time indexing in
> Master / Slave environment?
>
>
> One more issue is that when I post some documents to slave directly using
> /update handler some of the attributes are getting lost in the existing
> index. Any reason why this might be happening?
>
>
> Any help / suggestions would be of great help
>
> Thanks,
> BB
>
>
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Filter multivalue fields from search result

2010-07-08 Thread Lance Norskog
Yes, denormalizing the index into separate (name,town) pairs is the
common design for this problem.
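
In other words, each (name, town, date) event becomes its own document, e.g.
(the id scheme is illustrative):

<add>
  <doc>
    <field name="id">1-london</field>
    <field name="name">Microsoft Excel</field>
    <field name="town">London</field>
    <field name="date">2010-08-20T00:00:00Z</field>
  </doc>
</add>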

2010/7/8 "Alex J. G. Burzyński" :
> Hi,
>
> Is it possible to remove from search results the multivalued fields that
> don't pass the search criteria?
>
> My schema is defined as:
>
> <!-- illustrative reconstruction of the stripped field declarations -->
> <field name="id"   type="string" indexed="true" stored="true" required="true" />
> <field name="name" type="text"   indexed="true" stored="true" />
> <field name="town" type="string" indexed="true" stored="true" multiValued="true" />
> <field name="date" type="date"   indexed="true" stored="true" multiValued="true" />
>
> And example docs are:
>
> ++--+++
> | id | name                 | town       | date       |
> ++--+++
> | 1  | Microsoft Excel      | London     | 2010-08-20 |
> |    |                      | Glasgow    | 2010-08-24 |
> |    |                      | Leeds      | 2010-08-28 |
> | 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
> |    |                      | Reading    | 2010-08-25 |
> |    |                      | London     | 2010-08-29 |
> | 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
> |    |                      | Leeds      | 2010-08-26 |
> ++--+++
>
> so the query for q=name:Microsoft town:Leeds returns docs 1 & 3.
>
> How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?
>
> Or is it that I should create separate doc for each name-event?
>
> Thanks,
> Alex
>



-- 
Lance Norskog
goks...@gmail.com


Re: Indexing slowdowns

2010-07-08 Thread Yonik Seeley
Hmm, did the default number of background merge threads change
sometime recently?  I seem to recall so, but I can't find a reference
to it.

-Yonik
http://www.lucidimagination.com


Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Koji Sekiguchi

(10/07/09 9:30), Peter Spam wrote:

Thanks for the note, Koji.  However, hl.fragsize=0 seems to return the entire
document, rather than just one single line.

Here's what I tried (what I previously had is commented out):

regexv = "^.*$"
thequery = '/solr/select?facet=true&facet.limit=10&fl=id,score,filename' +
           '&tv=true&timeAllowed=3000&facet.field=filename&qt=tvrh&wt=ruby' +
           (p['fq'].empty? ? '' : ('&fq=' + p['fq'].to_s)) +
           '&q=' + CGI::escape(p['q'].to_s) +
           '&rows=' + p['rows'].to_s +
           "&hl=true&hl.snippets=1&hl.fragsize=0"
           # "&hl.regex.slop=.8&hl.fragsize=200&hl.fragmenter=regex&hl.regex.pattern=" + CGI::escape(regexv)

Thanks for your help.

-Peter

Peter,

Are you sure you are using GapFragmenter when you set fragsize to 0?

I've never tried the regex fragmenter...

If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
is available, which gets the entire field contents with search terms
highlighted. To use it, set hl.useFastVectorHighlighter to true.

Koji

--
http://www.rondhuit.com/en/



Re: Indexing slowdowns

2010-07-08 Thread Mark Miller
On 7/8/10 8:55 PM, Yonik Seeley wrote:
> Hmm, did the default number of background merge threads change
> sometime recently?  I seem to recall so, but I can't find a reference
> to it.
> 
> -Yonik
> http://www.lucidimagination.com

It did change - from 3 to 1-3:

maxThreadCount = Math.max(1, Math.min(3,
Runtime.getRuntime().availableProcessors()/2));

- Mark


Re: Using symlinks to alias cores

2010-07-08 Thread Chris Hostetter

: However, the wiki recommends against using the ALIAS command in CoreAdmin in
: a couple of places, and SOLR-1637 says it's been removed now anyway.

correct, there were a lot of problems with how to cleanly/sanely deal with
core operations on aliases -- the command may return at some future date if
there is a better separation between the concept of an "authoritative"
name for a core and its aliases -- but in the meantime, I wouldn't recommend
using it even in older versions of Solr where it (sort of) worked.

: If I can't use ALIAS safely, is it okay to just symlink the most recent
: core's instance (or data) directory to 'current', and bring it up in Solr as
: a separate core? Will this be safe, as long as all index writing happens via
: the 'current' core?

I would not recommend that -- as long as you only index to one of those
cores, and use "commits" to force the other instances to reload from disk,
there wouldn't be any errors -- but you'll wind up duplicating all of the
internal memory structures (index objects, and caches).

a cleaner way to deal with this would be to use something like
RewriteRule -- either in your appserver (if it supports a feature like
that) or in a proxy sitting in front of Solr.
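
for instance, with Apache's mod_rewrite in front of Solr, something along
these lines (the dated core name is illustrative):

RewriteEngine On
# alias /solr/current/ to whichever core is currently newest
RewriteRule ^/solr/current/(.*)$ /solr/core-2010-07-08/$1 [PT,L]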

Frankly though: indexing code can usually be made fairly smart -- pretty
much every programming language in the world makes it fairly easy to
generate a string using the pattern
"http://server:8983/solr/${YY-MM-DD}/update", and then you just POST to
that.


-Hoss



Re: Using hl.regex.pattern to print complete lines

2010-07-08 Thread Chris Hostetter

: If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
: is available, which gets the entire field contents with search terms
: highlighted. To use it, set hl.useFastVectorHighlighter to true.

He doesn't want the entire field -- his stored field values contain 
multi-line strings (using newline characters) and he wants to make 
fragments per "line" (ie: bounded by newline characters, or the start/end 
of the entire field value)

Peter: I haven't looked at the code, but I expect that the problem is that
the java regex engine isn't being used in a way that makes ^ and $ match
any line boundary -- they are probably only matching the start/end of the
field (and . is probably only matching non-newline characters).

java regexes support embedded flags (ie: "(?xyz)your regex") so you might
try that (I don't remember what the correct modifier flag is for
multiline mode off the top of my head).
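
(for the record, Java's embedded flag for multiline mode is m, so something
like this should make ^ and $ match at line boundaries:)

&hl.fragmenter=regex
&hl.regex.pattern=(?m)^.*$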

-Hoss



Re: Realtime + Batch indexing

2010-07-08 Thread bbarani

Hi,

Thanks a lot for your reply.

As you suggested, the best option is to have another core started on the same
/ a different port and use shards for distributed search.

I had also thought of another approach, where I would write the real-time
data to both master and slave; hence it would be available on the slave when
the user is searching, and it would also be present after replication from
master to slave.

Do you think my suggestion would work? I would surely go for shards, but for
the time being I am planning to implement the 2nd approach, as we would need
to change the UI code if we went with shards.

Thanks,
BB


Re: Realtime + Batch indexing

2010-07-08 Thread Lance Norskog
No, this second part will not work. Lucene creates new index files
independent of when and what you index. So copying files from one
indexer to another will never work: the indexes will be out of sync.

You don't have to change your UI to use distributed search. You can
add a new request handler that forwards requests to the other shards
(with different URLs!):



<!-- illustrative reconstruction: the original config was stripped by the
     archive; the handler name is a placeholder, and in practice the shard
     values are host:port/path strings -->
<requestHandler name="/broker" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">shard1,shard2</str>
  </lst>
</requestHandler>



Now, solr/broker?q=word goes to shard1/solr?q=word and
shard2/solr?q=word with no UI changes.

I usually make a new core to broker these sharded queries. It makes it
easier to track what I'm doing.

On Thu, Jul 8, 2010 at 7:22 PM, bbarani  wrote:
>
> Hi,
>
> Thanks a lot for your reply.
>
> As you suggested, the best option is to have another core started on the same
> / a different port and use shards for distributed search.
>
> I had also thought of another approach, where I would write the real-time
> data to both master and slave; hence it would be available on the slave when
> the user is searching, and it would also be present after replication from
> master to slave.
>
> Do you think my suggestion would work? I would surely go for shards, but for
> the time being I am planning to implement the 2nd approach, as we would need
> to change the UI code if we went with shards.
>
> Thanks,
> BB
>



-- 
Lance Norskog
goks...@gmail.com


making rotating timestamped logs from solr output

2010-07-08 Thread Cam Bazz
Hello,

I would like to log the solr console output. Although solr logs requests in
timestamped format, this only logs the requests, i.e. it does not log the
number of hits for a given query, etc.

Is there any easy way to do this, other than resorting to capturing solr's
console output? I usually run solr on my server using the screen command:
starting screen first, running solr, then detaching from the console.

It would be nice to have output logging instead of just request logging.

best regards,
c.b.


Re: Realtime + Batch indexing

2010-07-08 Thread bbarani

Thanks a ton for your reply.. your suggestions have always helped me out :)

Your inputs on configuring shards via the SOLR config will help us a lot!!!

One final question about replication. When I initiate replication, I thought
SOLR would delete the existing index on the slave and just transfer the master
index to the slave. If that's the case, there won't be any sync-up issues,
right?

I am asking this because every time I initiate replication the index size of
both slave and master becomes the same (even if, for some reason, the index
size of the slave is bigger than the master's, it gets reduced to the same
size as the master's after replication), so I figured that SOLR just deletes
the slave index and then moves all the files over from the master.

Again, thanks for your help

Thanks,
BB