Re: Issues with whitespace tokenization in QueryParser

2012-06-11 Thread Bernd Fehling
Because we often use multi-term search together with synonyms as a
thesaurus, we had to develop a solution for this. There is a whole
chain of pitfalls through the system and you have to be careful.

The thesaurus (synonym.txt) maps not only single terms to multi-terms
but also multi-terms to single terms, multi-terms to multi-terms and,
naturally, single terms to single terms.
All of this is combined with some boosting where needed.
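
To illustrate the four cases (a minimal sketch with made-up terms, not
our actual thesaurus), entries in the synonyms file used by
SynonymFilterFactory can look like this:

    # single-term to single-term
    laptop, notebook
    # single-term to multi-term
    tv => television set
    # multi-term to single-term
    dress shoes => dress_shoes
    # multi-term to multi-term
    new york city => big apple

Note that the multi-term cases on the query side run straight into the
whitespace splitting discussed below, which is one reason index-time
expansion is the usual workaround.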

Maybe we can/should provide a general solution for thesaurus support to the Solr
community...?
But right now we have some other, more important issues on our list.

If you want to start with multi-term synonyms, turn the weakness of the
QueryParser into its strength. (Wow, that sounds like Zen wisdom.)

Regards
Bernd

Am 11.06.2012 05:02, schrieb John Berryman:
> According to https://issues.apache.org/jira/browse/LUCENE-2605, the Lucene
> QueryParser tokenizes on white space before giving any text to the
> Analyzer. This makes it impossible to use multi-term synonyms because the
> SynonymFilter only receives one word at a time.
> 
> Resolution to this would really help with my current project. My project
> client sells clothing and accessories online. They have plenty of examples
> of compound words e.g."rain coat". But some of these compound words are
> really tripping them up. A prime example is that a search for "dress shoes"
> returns a list of dresses and random shoes (not necessarily dress shoes). I
> wish that I was able to synonym compound words to single tokens (e.g.
> "dress shoes => dress_shoes"), but with this whitespace tokenization issue,
> it's impossible.
> 
> Has anything happened with this bug recently? For a short time I've got a
> client that would be willing to pay for this issues to be fixed if it's not
> too much of a rabbit hole. Anyone care to catch me up with what this might
> entail?
> 
> 


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I have roughly read the code of RAMDirectory. It uses a list of 1024-byte
arrays, with a lot of overhead.
But as far as I know, with MMapDirectory I can't prevent page faults. The
OS will swap less frequently used pages out. Even if I allocate enough
memory for the JVM, I can't guarantee all the files in the directory stay
in memory. Am I understanding this right? If so, then some less frequent
queries will be slow. How can I keep them in memory at all times?

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog  wrote:
> Yes, use MMapDirectory. It is faster and uses memory more efficiently
> than RAMDirectory. This sounds wrong, but it is true. With
> RAMDirectory, Java has to work harder doing garbage collection.
>
> On Fri, Jun 8, 2012 at 1:30 AM, Li Li  wrote:
>> hi all
>>   I want to use lucene 3.6 providing searching service. my data is
>> not very large, raw data is less that 1GB and I want to use load all
>> indexes into memory. also I need save all indexes into disk
>> persistently.
>>   I originally want to use RAMDirectory. But when I read its javadoc.
>>
>>   Warning: This class is not intended to work with huge indexes.
>> Everything beyond several hundred megabytes
>>  will waste resources (GC cycles), because it uses an internal buffer
>> size of 1024 bytes, producing millions of byte
>>  [1024] arrays. This class is optimized for small memory-resident
>> indexes. It also has bad concurrency on
>>  multithreaded environments.
>> It is recommended to materialize large indexes on disk and use
>> MMapDirectory, which is a high-performance
>>  directory implementation working directly on the file system cache of
>> the operating system, so copying data to
>>  Java heap space is not useful.
>>
>>    should I use MMapDirectory? it seems another contrib instantiated.
>> anyone test it with RAMDirectory?
>
>
>
> --
> Lance Norskog
> goks...@gmail.com


Re: what's better for in memory searching?

2012-06-11 Thread Michael Kuhlmann
Set the swappiness to 0 to avoid memory pages being swapped to disk too
early.


http://en.wikipedia.org/wiki/Swappiness
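
On most Linux systems that is something like (note it is a system-wide
setting):

    sysctl -w vm.swappiness=0                     # takes effect immediately
    echo "vm.swappiness = 0" >> /etc/sysctl.conf  # persists across reboots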

-Kuli

Am 11.06.2012 10:38, schrieb Li Li:

I have roughly read the codes of RAMDirectory. it use a list of 1024
byte arrays and many overheads.
But as far as I know, using MMapDirectory, I can't prevent the page
faults. OS will swap less frequent pages out. Even if I allocate
enough memory for JVM, I can guarantee all the files in the directory
are in memory. am I understanding right? if it is, then some less
frequent queries will be slow.  How can I let them always in memory?

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog  wrote:

Yes, use MMapDirectory. It is faster and uses memory more efficiently
than RAMDirectory. This sounds wrong, but it is true. With
RAMDirectory, Java has to work harder doing garbage collection.

On Fri, Jun 8, 2012 at 1:30 AM, Li Li  wrote:

hi all
   I want to use lucene 3.6 providing searching service. my data is
not very large, raw data is less that 1GB and I want to use load all
indexes into memory. also I need save all indexes into disk
persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
indexes. It also has bad concurrency on
  multithreaded environments.
It is recommended to materialize large indexes on disk and use
MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
the operating system, so copying data to
  Java heap space is not useful.

should I use MMapDirectory? it seems another contrib instantiated.
anyone test it with RAMDirectory?




--
Lance Norskog
goks...@gmail.com




Re: what's better for in memory searching?

2012-06-11 Thread Li Li
1. This setting is global. I just want my Lucene search program not to
swap; other, less important programs can still swap.
2. Do I need to call MappedByteBuffer.load() explicitly, or do I have to
warm up the indexes to guarantee all my files are in physical memory?
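
As far as I can tell, MMapDirectory does not expose its MappedByteBuffers,
so I probably cannot call load() directly. The warm-up I have in mind would
be a rough sketch like this, for Lucene 3.6 (path, field and term are
placeholders):

    import java.io.File;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;

    public class WarmIndex {
        public static void main(String[] args) throws Exception {
            // placeholder path -- point this at the real index directory
            Directory dir = new MMapDirectory(new File("/path/to/index"));
            IndexReader reader = IndexReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);
            // touch the index by running representative warm-up queries
            searcher.search(new TermQuery(new Term("body", "someCommonTerm")), 10);
            searcher.close();
            reader.close();
            dir.close();
        }
    }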

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann  wrote:
> Set the swapiness to 0 to avoid memory pages being swapped to disk too
> early.
>
> http://en.wikipedia.org/wiki/Swappiness
>
> -Kuli
>
> Am 11.06.2012 10:38, schrieb Li Li:
>
>> I have roughly read the codes of RAMDirectory. it use a list of 1024
>> byte arrays and many overheads.
>> But as far as I know, using MMapDirectory, I can't prevent the page
>> faults. OS will swap less frequent pages out. Even if I allocate
>> enough memory for JVM, I can guarantee all the files in the directory
>> are in memory. am I understanding right? if it is, then some less
>> frequent queries will be slow.  How can I let them always in memory?
>>
>> On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog  wrote:
>>>
>>> Yes, use MMapDirectory. It is faster and uses memory more efficiently
>>> than RAMDirectory. This sounds wrong, but it is true. With
>>> RAMDirectory, Java has to work harder doing garbage collection.
>>>
>>> On Fri, Jun 8, 2012 at 1:30 AM, Li Li  wrote:

 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?
>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>
>


Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht
Li Li,

have you considered allocating a RAM-Disk?
It's not the most flexible thing... but it's certainly close in performance to
a RAMDirectory.
MMapping on top of that is likely to be useless, but I doubt you can set it to zero.
That would need experimenting.

Also, doesn't caching and auto-warming provide the lowest latency for all 
"expected queries" ?

Paul


Le 11 juin 2012 à 10:50, Li Li a écrit :

>   I want to use lucene 3.6 providing searching service. my data is
> not very large, raw data is less that 1GB and I want to use load all
> indexes into memory. also I need save all indexes into disk
> persistently.
>   I originally want to use RAMDirectory. But when I read its javadoc.




Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Do you mean a software RAM disk, i.e. using RAM to simulate a disk? How do
I deal with persistence?

Maybe I can hack around it by increasing RAMOutputStream.BUFFER_SIZE from 1024 to 1024*1024.
That may waste some memory, but I can adjust my merge policy to avoid too many segments.
I will have a "big" segment and a "small" segment and merge them every night.
Newly added documents will be flushed into a new segment, and I will merge
that newly generated segment with the small one.
Our update operations are not very frequent.

On Mon, Jun 11, 2012 at 4:59 PM, Paul Libbrecht  wrote:
> Li Li,
>
> have you considered allocating a RAM-Disk?
> It's not the most flexible thing... but it's certainly close, in performance 
> to a RAMDirectory.
> MMapping on that is likely to be useless but I doubt you can set it to zero.
> That'd need experiment.
>
> Also, doesn't caching and auto-warming provide the lowest latency for all 
> "expected queries" ?
>
> Paul
>
>
> Le 11 juin 2012 à 10:50, Li Li a écrit :
>
>>   I want to use lucene 3.6 providing searching service. my data is
>> not very large, raw data is less that 1GB and I want to use load all
>> indexes into memory. also I need save all indexes into disk
>> persistently.
>>   I originally want to use RAMDirectory. But when I read its javadoc.
>
>


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I am sorry, I made a mistake. Even using RAMDirectory, I cannot guarantee
the pages are not swapped out.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann  wrote:
> Set the swapiness to 0 to avoid memory pages being swapped to disk too
> early.
>
> http://en.wikipedia.org/wiki/Swappiness
>
> -Kuli
>
> Am 11.06.2012 10:38, schrieb Li Li:
>
>> I have roughly read the codes of RAMDirectory. it use a list of 1024
>> byte arrays and many overheads.
>> But as far as I know, using MMapDirectory, I can't prevent the page
>> faults. OS will swap less frequent pages out. Even if I allocate
>> enough memory for JVM, I can guarantee all the files in the directory
>> are in memory. am I understanding right? if it is, then some less
>> frequent queries will be slow.  How can I let them always in memory?
>>
>> On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog  wrote:
>>>
>>> Yes, use MMapDirectory. It is faster and uses memory more efficiently
>>> than RAMDirectory. This sounds wrong, but it is true. With
>>> RAMDirectory, Java has to work harder doing garbage collection.
>>>
>>> On Fri, Jun 8, 2012 at 1:30 AM, Li Li  wrote:

 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?
>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>
>


Re: what's better for in memory searching?

2012-06-11 Thread Michael Kuhlmann
You cannot guarantee this when you're running out of RAM; you'd have a
problem then anyway.


Why do you care that much? Have you actually had performance issues yet? 1GB
should load really fast, and both auto-warming and the OS cache should help
a lot as well. With such an index, you usually don't need to fine-tune
performance that much.


Have you thought about using an SSD? Since you want to persist your index,
you'll need to live with disk IO anyway.


Greetings,
Kuli

Am 11.06.2012 11:20, schrieb Li Li:

I am sorry. I make a mistake. even use RAMDirectory, I can not
guarantee they are not swapped out.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann  wrote:

Set the swapiness to 0 to avoid memory pages being swapped to disk too
early.

http://en.wikipedia.org/wiki/Swappiness

-Kuli

Am 11.06.2012 10:38, schrieb Li Li:


I have roughly read the codes of RAMDirectory. it use a list of 1024
byte arrays and many overheads.
But as far as I know, using MMapDirectory, I can't prevent the page
faults. OS will swap less frequent pages out. Even if I allocate
enough memory for JVM, I can guarantee all the files in the directory
are in memory. am I understanding right? if it is, then some less
frequent queries will be slow.  How can I let them always in memory?

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskogwrote:


Yes, use MMapDirectory. It is faster and uses memory more efficiently
than RAMDirectory. This sounds wrong, but it is true. With
RAMDirectory, Java has to work harder doing garbage collection.

On Fri, Jun 8, 2012 at 1:30 AM, Li Liwrote:


hi all
   I want to use lucene 3.6 providing searching service. my data is
not very large, raw data is less that 1GB and I want to use load all
indexes into memory. also I need save all indexes into disk
persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
indexes. It also has bad concurrency on
  multithreaded environments.
It is recommended to materialize large indexes on disk and use
MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
the operating system, so copying data to
  Java heap space is not useful.

should I use MMapDirectory? it seems another contrib instantiated.
anyone test it with RAMDirectory?





--
Lance Norskog
goks...@gmail.com







Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Yes, I need an average query time of less than 10 ms; the faster the better.
I have enough memory for Lucene because I know there is not too much data.
There are not many modifications: every day there are a few hundred
document updates. If the indexes are not in physical memory, then IO
operations will cost a few ms.
Besides, full GC may also add uncertainty, so I need to optimize as much
as possible.
On Mon, Jun 11, 2012 at 5:27 PM, Michael Kuhlmann  wrote:
> You cannot guarantee this when you're running out of RAM. You'd have a
> problem then anyway.
>
> Why are you caring that much? Did you yet have performance issues? 1GB
> should load really fast, and both auto warming and OS cache should help a
> lot as well. With such an index, you usually don't need to fine tune
> performance that much.
>
> Did you think about using a SSD? Since you want to persist your index,
> you'll need to live with disk IO anyway.
>
> Greetings,
> Kuli
>
> Am 11.06.2012 11:20, schrieb Li Li:
>
>> I am sorry. I make a mistake. even use RAMDirectory, I can not
>> guarantee they are not swapped out.
>>
>> On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann
>>  wrote:
>>>
>>> Set the swapiness to 0 to avoid memory pages being swapped to disk too
>>> early.
>>>
>>> http://en.wikipedia.org/wiki/Swappiness
>>>
>>> -Kuli
>>>
>>> Am 11.06.2012 10:38, schrieb Li Li:
>>>
 I have roughly read the codes of RAMDirectory. it use a list of 1024
 byte arrays and many overheads.
 But as far as I know, using MMapDirectory, I can't prevent the page
 faults. OS will swap less frequent pages out. Even if I allocate
 enough memory for JVM, I can guarantee all the files in the directory
 are in memory. am I understanding right? if it is, then some less
 frequent queries will be slow.  How can I let them always in memory?

 On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog
  wrote:
>
>
> Yes, use MMapDirectory. It is faster and uses memory more efficiently
> than RAMDirectory. This sounds wrong, but it is true. With
> RAMDirectory, Java has to work harder doing garbage collection.
>
> On Fri, Jun 8, 2012 at 1:30 AM, Li Li    wrote:
>>
>>
>> hi all
>>   I want to use lucene 3.6 providing searching service. my data is
>> not very large, raw data is less that 1GB and I want to use load all
>> indexes into memory. also I need save all indexes into disk
>> persistently.
>>   I originally want to use RAMDirectory. But when I read its javadoc.
>>
>>   Warning: This class is not intended to work with huge indexes.
>> Everything beyond several hundred megabytes
>>  will waste resources (GC cycles), because it uses an internal buffer
>> size of 1024 bytes, producing millions of byte
>>  [1024] arrays. This class is optimized for small memory-resident
>> indexes. It also has bad concurrency on
>>  multithreaded environments.
>> It is recommended to materialize large indexes on disk and use
>> MMapDirectory, which is a high-performance
>>  directory implementation working directly on the file system cache of
>> the operating system, so copying data to
>>  Java heap space is not useful.
>>
>>    should I use MMapDirectory? it seems another contrib instantiated.
>> anyone test it with RAMDirectory?
>
>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>>>
>>>
>>>
>


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I found this:
http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux
It can provide fine-grained control of swapping.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann  wrote:
> Set the swapiness to 0 to avoid memory pages being swapped to disk too
> early.
>
> http://en.wikipedia.org/wiki/Swappiness
>
> -Kuli
>
> Am 11.06.2012 10:38, schrieb Li Li:
>
>> I have roughly read the codes of RAMDirectory. it use a list of 1024
>> byte arrays and many overheads.
>> But as far as I know, using MMapDirectory, I can't prevent the page
>> faults. OS will swap less frequent pages out. Even if I allocate
>> enough memory for JVM, I can guarantee all the files in the directory
>> are in memory. am I understanding right? if it is, then some less
>> frequent queries will be slow.  How can I let them always in memory?
>>
>> On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog  wrote:
>>>
>>> Yes, use MMapDirectory. It is faster and uses memory more efficiently
>>> than RAMDirectory. This sounds wrong, but it is true. With
>>> RAMDirectory, Java has to work harder doing garbage collection.
>>>
>>> On Fri, Jun 8, 2012 at 1:30 AM, Li Li  wrote:

 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?
>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>
>


Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht

Le 11 juin 2012 à 11:16, Li Li a écrit :

> do you mean software RAM disk?

Right. OS level.

> using RAM to simulate disk?

Yes.
That generally gives you a disk that is extremely fast for both reading and writing.

> How to deal with Persistence?

Synchronization (slaving?).
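
For example (made-up paths and size; a tmpfs mount plus a periodic copy
back to real disk):

    mount -t tmpfs -o size=2g tmpfs /mnt/index-ram
    cp -a /data/index/. /mnt/index-ram/              # load the index at startup
    rsync -a --delete /mnt/index-ram/ /data/index/   # sync back to disk periodically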

paul



Re: what's better for in memory searching?

2012-06-11 Thread Toke Eskildsen
On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
> yes, I need average query time less than 10 ms. The faster the better.
> I have enough memory for lucene because I know there are not too much
> data. there are not many modifications. every day there are about
> hundreds of document update. if indexes are not in physical memory,
> then IO operations will cost a few ms.

I'm with Michael on this one: It seems that you're doing a premature
optimization. Guessing that your final index will be < 5GB in size with
1 million documents (give or take 900.000:-), relatively simple queries
and so on, an average response time of 10 ms should be attainable even
on spinning drives. One hundred document updates per day are not many,
so again I would not expect problems.

As is often the case on this mailing list, the advice is "try it". Using
a normal on-disk index and doing some warm up is the easy solution to
implement and nearly all of your work on this will be usable for a
RAM-based solution, if you are not satisfied with the speed. Or you
could buy a small & cheap SSD and have no more worries...
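
If this ends up running inside Solr rather than raw Lucene, the stock way
to do that warm-up is a firstSearcher/newSearcher listener in
solrconfig.xml (the queries below are placeholders):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">some common query</str><str name="sort">price asc</str></lst>
        <lst><str name="q">*:*</str></lst>
      </arr>
    </listener>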

Regards,
Toke Eskildsen



edismax and untokenized field

2012-06-11 Thread Vijay Ramachandran
Hello. I'm trying to understand the behaviour of edismax in solr 3.4 when
it comes to searching fields similar to "string" types, i.e., untokenized.
My document is data about products available in various stores. One of the
fields in my schema is the name of the merchant, and I would like to match
only the entire name in the merchant field to cut out false positives. For
example, I want "The Gap" to match in merchant, but not "gap".

To do this, I configured the field as such:


  




  


All the other fields are product descriptors such as category, product
name, etc., which I store as "text_en" field from the example schemas.

I have a merchant in the data called "Jones New York". If my query is
simply the 3 words, i.e., "q=jones+new+york", the merchant field doesn't
match. The debugQuery shows that the query splits the words up, like thus:
+((DisjunctionMaxQuery((summary:jones^2.0 |
title:jones^3.0 | merchant:jones^3.0 | cats4match:jones)~0.1)
DisjunctionMaxQuery((merchant:new^3.0)~0.1)
DisjunctionMaxQuery((summary:york^2.0 | title:york^3.0 | merchant:york^3.0
| cats4match:york)~0.1))~1) DisjunctionMaxQuery((summary:"jones ?
york"~3^5.0 | title:"jones ? york"~3^10.0 | cats4match:"jones ?
york"~3^5.0)~0.1) ()

My edismax is configured like this:
  

 edismax
 explicit
 0.1
 
   dealid,category,subcategory,merchant, merchant_id, title
 
 1
 
   cats4match^1.0 merchant^3.0 title^3.0 summary^2.0
 
 
   cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
 
 3
 
   cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
title_phrases^10.0 summary_phrases^5.0
 
 
   cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
title_phrases^10.0 summary_phrases^5.0
 
 3
 *:*

  


What gives? Can I query a string-type field together with other tokenized
fields? Or am I missing the point entirely, and do I need to do this some
other way?

thanks in advance for your help.
Vijay


Re: edismax and untokenized field

2012-06-11 Thread Tanguy Moal
Hello,
I think you have to issue a phrase query in such a case, because otherwise
each "token" is searched independently in the merchant field: the query
parser splits the query on spaces!

Compare the debug outputs when you search for "Jones New York" as a phrase;
you should get what you expected.

Hope this helps,

--
Tanguy

2012/6/11 Vijay Ramachandran 

> Hello. I'm trying to understand the behaviour of edismax in solr 3.4 when
> it comes to searching fields similar to "string" types, i.e., untokenized.
> My document is data about products available in various stores. One of the
> fields in my schema is the name of the merchant, and I would like to match
> only the entire name in the merchant field to cut out false positives. For
> e.g., I want "The Gap" to match in merchant, but not "gap".
>
> To do this, I configured the field as such:
>
> positionIncrementGap="100">
>  
>
>
>
> synonyms="names-synonyms.txt" ignoreCase="true" expand="true"/>
>  
>
>
> All the other fields are product descriptors such as category, product
> name, etc., which I store as "text_en" field from the example schemas.
>
> I have a merchant in the data called "Jones New York". If my query is
> simply the 3 words, i.e., "q=jones+new+york", the merchant field doesn't
> match. The debugQuery shows that the query splits the words up, like thus:
> +((DisjunctionMaxQuery((summary:jones^2.0 |
> title:jones^3.0 | merchant:jones^3.0 | cats4match:jones)~0.1)
> DisjunctionMaxQuery((merchant:new^3.0)~0.1)
> DisjunctionMaxQuery((summary:york^2.0 | title:york^3.0 | merchant:york^3.0
> | cats4match:york)~0.1))~1) DisjunctionMaxQuery((summary:"jones ?
> york"~3^5.0 | title:"jones ? york"~3^10.0 | cats4match:"jones ?
> york"~3^5.0)~0.1) ()
>
> My edismax is configured this:
>  
>
> edismax
> explicit
> 0.1
> 
>   dealid,category,subcategory,merchant, merchant_id, title
> 
> 1
> 
>   cats4match^1.0 merchant^3.0 title^3.0 summary^2.0
> 
> 
>   cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
> 
> 3
> 
>   cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
> title_phrases^10.0 summary_phrases^5.0
> 
> 
>   cats4match^5.0 merchant^10.0 title^10.0 summary^5.0
> title_phrases^10.0 summary_phrases^5.0
> 
> 3
> *:*
>
>  
>
>
> What gives? Can I achieve trying to query a string type field together with
> other tokenized fields? Or am I missing the point entirely, and I need to
> do this some other way?
>
> thanks in advance for your help.
> Vijay
>


Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Nitesh Nandy
Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud  (2 slices and
2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. While querying, we use field collapsing
with "ngroups" set as true as we need the number of search results.

However, there is a difference in the number of "result list" returned and
the "ngroups" value returned.

Ex:
http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true


The response XMl looks like




0
46

id
true
true
messagebody:monit AND usergroupid:3




10
9


320043

...



398807
...



346878
...


346880
...






So you can see that the ngroups value returned is 9 and the actual number
of groups returned is 4

Why do we have this discrepancy between ngroups, matches and the actual
number of groups? Is this an open issue?

 Any kind of help is appreciated.

-- 
Regards,

Nitesh Nandy


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Martijn v Groningen
ngroups returns the number of groups that have matched the query. However,
if you want ngroups to be correct in a distributed environment, you need
to put documents belonging to the same group into the same shard.
Groups can't cross shard boundaries. I guess you need to do
some manual document partitioning.
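
A rough sketch of what manual partitioning can look like from SolrJ,
assuming you index straight to a chosen shard and nothing re-routes the
document by its unique key afterwards (shard URLs and the routing rule are
made up):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class GroupRouter {
        // made-up shard URLs
        private static final String[] SHARDS = {
            "http://host1:8983/solr/collection1",
            "http://host2:8983/solr/collection1"
        };

        public static void index(String groupValue, SolrInputDocument doc) throws Exception {
            // same group value -> same shard, so a group never crosses shard boundaries
            int shard = (groupValue.hashCode() & 0x7fffffff) % SHARDS.length;
            SolrServer server = new HttpSolrServer(SHARDS[shard]);
            server.add(doc);
            server.commit();
        }
    }

Whether SolrCloud's own hash-based routing interferes with this is exactly
the follow-up question in this thread.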

Martijn

On 11 June 2012 14:29, Nitesh Nandy  wrote:
> Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud  (2 slices and
> 2 shards)
>
> The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud
>
> We are doing distributed search. While querying, we use field collapsing
> with "ngroups" set as true as we need the number of search results.
>
> However, there is a difference in the number of "result list" returned and
> the "ngroups" value returned.
>
> Ex:
> http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true
>
>
> The response XMl looks like
>
> 
> 
> 
> 0
> 46
> 
> id
> true
> true
> messagebody:monit AND usergroupid:3
> 
> 
> 
> 
> 10
> 9
> 
> 
> 320043
> 
> ...
> 
> 
> 
> 398807
> ...
> 
> 
> 
> 346878
> ...
> 
> 
> 346880
> ...
> 
> 
> 
> 
> 
>
> So you can see that the ngroups value returned is 9 and the actual number
> of groups returned is 4
>
> Why do we have this discrepancy in the ngroups, matches and actual number
> of groups. Is this an open issue ?
>
>  Any kind of help is appreciated.
>
> --
> Regards,
>
> Nitesh Nandy



-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Building a heat map from geo data in index

2012-06-11 Thread Stefan Matheis
I'm not entirely sure that it has to be that complicated... what about using
for example http://www.patrick-wied.at/static/heatmapjs/ ? You could collect
all the geo-related data and do the (heat)map stuff on the client.



On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:

> I had a request from a customer which to this point I have not seen
> much similar so I figured I'd pose the question here. I've been asked
> if it was possible to build a heat map from the results of a query. I
> can imagine a process to do this through some post processing, but
> that sounds very expensive for large/distributed indices so I was
> wondering if with all of the new geospatial support that is being
> added to lucene/solr there was a way to do geospatial faceting. What
> I am imagining is bounding box being defined and that box being broken
> into an N by N matrix, each of which would return counts so a heat map
> could be constructed. Any other thoughts on this would be greatly
> appreciated, right now I am really just fishing for some ideas.





defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Rohit
Hi,

 

We have just migrated from solr3.5 to solr3.6, for all this time we have
been querying solr as,

 

http://122.166.9.144:8080/solr/

<>/?q=apple

 

But now this is not working and the name of the search field needs to be
provided every time, which was not the case earlier. What might be causing
this?

 

Regards,

Rohit

 



Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Jack Krupansky
Is there a Solr wiki that discusses these issues, such as "Groups can't 
cross shard boundaries"? Seems like it should be highlighted prominently, 
maybe here:

http://wiki.apache.org/solr/FieldCollapsing

Seems like it should be mentioned on the distributed/SolrCloud wiki(s) as 
well.


Is this a "distributed IDF" type of issue or something else? Is this an 
outright bug or an (insurmountable?) limitation?


I did notice SOLR-2066, but didn't see mention of the limitation. Are there 
any other limitations for distributed grouping?


-- Jack Krupansky

-Original Message- 
From: Martijn v Groningen

Sent: Monday, June 11, 2012 8:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Issue with field collapsing in solr 4 while performing 
distributed search


The ngroups returns the number of groups that have matched with the
query. However if you want ngroups to be correct in a distributed
environment you need
to put document belonging to the same group into the same shard.
Groups can't cross shard boundaries. I guess you need to do
some manual document partitioning.

Martijn

On 11 June 2012 14:29, Nitesh Nandy  wrote:
Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud  (2 slices 
and

2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. While querying, we use field collapsing
with "ngroups" set as true as we need the number of search results.

However, there is a difference in the number of "result list" returned and
the "ngroups" value returned.

Ex:
http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true


The response XMl looks like




0
46

id
true
true
messagebody:monit AND usergroupid:3




10
9


320043

...



398807
...



346878
...


346880
...






So you can see that the ngroups value returned is 9 and the actual number
of groups returned is 4

Why do we have this discrepancy in the ngroups, matches and actual number
of groups. Is this an open issue ?

 Any kind of help is appreciated.

--
Regards,

Nitesh Nandy




--
Met vriendelijke groet,

Martijn van Groningen 



Re: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Jack Krupansky
Add the "df" parameter to your query request handler. It names the default 
field. Or use "qf" for the edismax query parser.


-- Jack Krupansky

-Original Message- 
From: Rohit

Sent: Monday, June 11, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: defaultSearchField not working after upgrade to solr3.6

Hi,



We have just migrated from solr3.5 to solr3.6, for all this time we have
been querying solr as,



http://122.166.9.144:8080/solr/

<>/?q=apple



But now this is not working and the name of the search field needs to be
provided everytime, which was not the case earlier. What might be casing
this?



Regards,

Rohit





RE: Writing custom data import handler for Solr.

2012-06-11 Thread Dyer, James
More specifically, the 3.6 Data Import Handler code (DIH) can be seen here:

http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/

The main wiki page is here:

http://wiki.apache.org/solr/DataImportHandler

The architecture of DIH is such that each import entity is driven by an 
EntityProcessor that reads data from a DataSource.  So you could create a 
KindaLikeAmazonE3DataSource and then a KindaLikeAmazonE3EntityProcessor.  The 
DataSource reads the data and passes it to the EntityProcessor.  

See also SolrEntityProcessor.  This is an Entity Processor that reads from 1 
solr core to re-index the same data in another solr core.  This Entity 
Processor, I believe, does its own data reading and doesn't use a DataSource.  
This might be a simpler approach for you.

On the wiki page, see these 2 sections:

http://wiki.apache.org/solr/DataImportHandler#EntityProcessor
http://wiki.apache.org/solr/DataImportHandler#DataSource

In the code your extension points are these 2 classes:

http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataSource.java
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/EntityProcessor.java

For a good example that you might want to base your code from, see 
SolrEntityProcessor:

http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/SolrEntityProcessor.java
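
As a starting point, a bare-bones custom DataSource could look something
like this (the bucket-fetching part is a placeholder, not a real SDK call):

    import java.io.Reader;
    import java.io.StringReader;
    import java.util.Properties;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataSource;

    public class MyBucketDataSource extends DataSource<Reader> {

        private String bucket;

        @Override
        public void init(Context context, Properties initProps) {
            // settings declared on the <dataSource> element in data-config.xml
            bucket = initProps.getProperty("bucket");
        }

        @Override
        public Reader getData(String query) {
            // 'query' is whatever the entity hands you (e.g. an object key)
            return new StringReader(fetchFromBucket(bucket, query));
        }

        @Override
        public void close() {
            // release any client resources here
        }

        private String fetchFromBucket(String bucket, String key) {
            // placeholder: call your storage client here and return the content
            return "";
        }
    }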

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Saturday, June 09, 2012 7:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Writing custom data import handler for Solr.

Nope, the code is all you get.

On Sat, Jun 9, 2012 at 12:16 AM, ram anam  wrote:
>
> Thanks for the guidance. But is there any documentation that describes the 
> steps to implement custom data source and integrate it with SOLR. The data 
> source I am trying to integrate is like Amazon S3 Buckets. But provider is 
> different.
>
> Thanks and regards,Ram Anam.
>
>> Date: Fri, 8 Jun 2012 20:40:05 -0700
>> Subject: Re: Writing custom data import handler for Solr.
>> From: goks...@gmail.com
>> To: solr-user@lucene.apache.org
>>
>> The DataImportHandler is a toolkit in Solr. It has a few different
>> kinds of plugins. It is very possible that you do not have to write
>> any Java code.
>>
>> If you have an unusual external data feed (database, file system,
>> Amazon S3 buckets) then you would write a Datasource. The only
>> examples are the source code in trunk/solr/contrib/dataimporthandler.
>>
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> On Fri, Jun 8, 2012 at 8:35 PM, ram anam  wrote:
>> >
>> > Hi Eric,
>> > I cannot disclose the data source which we are planning to index inside 
>> > SOLR as it is confidential. But client wants it be in the form of Import 
>> > Handler. We plan to install Solr and our custom data import handlers so 
>> > that client can just consume it. Could you please provide me the pointers 
>> > to examples of Custom Data Import Handlers.
>> >
>> > Thanks and regards,Ram Anam.
>> >
>> >> Date: Fri, 8 Jun 2012 13:59:34 -0400
>> >> Subject: Re: Writing custom data import handler for Solr.
>> >> From: erickerick...@gmail.com
>> >> To: solr-user@lucene.apache.org
>> >>
>> >> You need to back up a bit and describe _why_ you want to do this,
>> >> perhaps there's
>> >> an easy way to do what you want. This could easily be an XY problem...
>> >>
>> >> For instance, you can write a SolrJ program to index data, which _might_ 
>> >> be
>> >> what you want. It's a separate process runnable anywhere. See:
>> >> http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Fri, Jun 8, 2012 at 1:29 PM, ram anam  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I am planning to write a custom data import handler for SOLR for some 
>> >> > data source. Could you give me some pointers to documentation, examples 
>> >> > on how to write a custom data import handler and how to integrate it 
>> >> > with SOLR. Thank you for help. Thanks and regards,Ram Anam.
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread Nitesh Nandy
Martijn,

How do we add a custom algorithm for distributing documents in Solr Cloud?
According to this discussion
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
 , Mark discourages users from using a custom distribution mechanism in Solr
Cloud.

Load balancing is not an issue for us at the moment. In that case, how
should we implement a custom partitioning algorithm?


On Mon, Jun 11, 2012 at 6:23 PM, Martijn v Groningen <
martijn.v.gronin...@gmail.com> wrote:

> The ngroups returns the number of groups that have matched with the
> query. However if you want ngroups to be correct in a distributed
> environment you need
> to put document belonging to the same group into the same shard.
> Groups can't cross shard boundaries. I guess you need to do
> some manual document partitioning.
>
> Martijn
>
> On 11 June 2012 14:29, Nitesh Nandy  wrote:
> > Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud  (2 slices
> and
> > 2 shards)
> >
> > The setup was done as per the wiki:
> http://wiki.apache.org/solr/SolrCloud
> >
> > We are doing distributed search. While querying, we use field collapsing
> > with "ngroups" set as true as we need the number of search results.
> >
> > However, there is a difference in the number of "result list" returned
> and
> > the "ngroups" value returned.
> >
> > Ex:
> >
> http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true
> >
> >
> > The response XMl looks like
> >
> > 
> > 
> > 
> > 0
> > 46
> > 
> > id
> > true
> > true
> > messagebody:monit AND usergroupid:3
> > 
> > 
> > 
> > 
> > 10
> > 9
> > 
> > 
> > 320043
> > 
> > ...
> > 
> > 
> > 
> > 398807
> > ...
> > 
> > 
> > 
> > 346878
> > ...
> > 
> > 
> > 346880
> > ...
> > 
> > 
> > 
> > 
> > 
> >
> > So you can see that the ngroups value returned is 9 and the actual number
> > of groups returned is 4
> >
> > Why do we have this discrepancy in the ngroups, matches and actual number
> > of groups. Is this an open issue ?
> >
> >  Any kind of help is appreciated.
> >
> > --
> > Regards,
> >
> > Nitesh Nandy
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>



-- 
Regards,

Nitesh Nandy


Re: Building a heat map from geo data in index

2012-06-11 Thread Jamie Johnson
That is certainly an option, but collecting the heat map data is really
the question.

I saw this

http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes

but I don't have a really good understanding of how this would be
accomplished. I need to get a firmer grasp of geohashes, as my
understanding is quite limited at this point.

On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
 wrote:
> I'm not entirely sure, that it has to be that complicated .. what about using 
> for example http://www.patrick-wied.at/static/heatmapjs/ ? You could collect 
> all the geo-related data and do the (heat)map stuff on the client.
>
>
>
> On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
>
>> I had a request from a customer which to this point I have not seen
>> much similar so I figured I'd pose the question here. I've been asked
>> if it was possible to build a heat map from the results of a query. I
>> can imagine a process to do this through some post processing, but
>> that sounds very expensive for large/distributed indices so I was
>> wondering if with all of the new geospatial support that is being
>> added to lucene/solr there was a way to do geospatial faceting. What
>> I am imagining is bounding box being defined and that box being broken
>> into an N by N matrix, each of which would return counts so a heat map
>> could be constructed. Any other thoughts on this would be greatly
>> appreciated, right now I am really just fishing for some ideas.
>
>
>


RE: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Rohit
Hi Jack,

I understand that df would make this work normally, but why did
defaultSearchField suddenly stop working? I notice that there is talk about
deprecating it, but even then it should continue to work, right?

Regards,
Rohit

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 11 June 2012 18:49
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

Add the "df" parameter to your query request handler. It names the default
field. Or use "qf" for the edismax query parser.

-- Jack Krupansky

-Original Message-
From: Rohit
Sent: Monday, June 11, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: defaultSearchField not working after upgrade to solr3.6

Hi,



We have just migrated from solr3.5 to solr3.6, for all this time we have
been querying solr as,



http://122.166.9.144:8080/solr/

<>/?q=apple



But now this is not working and the name of the search field needs to be
provided everytime, which was not the case earlier. What might be casing
this?



Regards,

Rohit






Re: Building a heat map from geo data in index

2012-06-11 Thread Dmitry Kan
So it sounds to me that a geohash is just a hash representation of lat/lon
coordinates for easier referencing (see e.g.
http://en.wikipedia.org/wiki/Geohash).
I would probably start with something simpler: take the bbox lat/lon
coordinate pair of the top-left corner (in some coordinate systems it is
the bottom-left corner), break the bbox into cells of size w/N by h/N (and
probably those are equal), then loop over the cells and compute your
counts with the bbox of each cell, as in the sketch below. You could then
evolve this to geohashes if you want, but at least you would know where to
start.
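
A naive sketch of that loop with SolrJ, assuming a LatLonType field called
"store" and its range-query syntax for each cell's bounding box (URL, field
name and coordinates are all placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class HeatmapCounts {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            int n = 10;
            double minLat = 40.0, maxLat = 41.0, minLon = -74.5, maxLon = -73.5;
            double dLat = (maxLat - minLat) / n, dLon = (maxLon - minLon) / n;
            long[][] counts = new long[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    double lat1 = minLat + i * dLat, lon1 = minLon + j * dLon;
                    SolrQuery q = new SolrQuery("*:*");
                    // LatLonType range query over this cell's bounding box
                    q.addFilterQuery("store:[" + lat1 + "," + lon1 + " TO "
                            + (lat1 + dLat) + "," + (lon1 + dLon) + "]");
                    q.setRows(0); // only the count is needed
                    QueryResponse rsp = solr.query(q);
                    counts[i][j] = rsp.getResults().getNumFound();
                }
            }
        }
    }

That is N*N queries per map, which is exactly why the geohash-prefix
faceting idea sounds attractive for larger grids.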

-- Dmitry

On Mon, Jun 11, 2012 at 4:48 PM, Jamie Johnson  wrote:

> That is certainly an option but the collecting of the heat map data is
> really the question.
>
> I saw this
>
>
> http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes
>
> but don't have a really good understanding of how this would be
> accomplished.  I need to get a more firm understanding of geohashes as
> my understanding is extremely lacking at this point.
>
> On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
>  wrote:
> > I'm not entirely sure, that it has to be that complicated .. what about
> using for example http://www.patrick-wied.at/static/heatmapjs/ ? You
> could collect all the geo-related data and do the (heat)map stuff on the
> client.
> >
> >
> >
> > On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
> >
> >> I had a request from a customer which to this point I have not seen
> >> much similar so I figured I'd pose the question here. I've been asked
> >> if it was possible to build a heat map from the results of a query. I
> >> can imagine a process to do this through some post processing, but
> >> that sounds very expensive for large/distributed indices so I was
> >> wondering if with all of the new geospatial support that is being
> >> added to lucene/solr there was a way to do geospatial faceting. What
> >> I am imagining is bounding box being defined and that box being broken
> >> into an N by N matrix, each of which would return counts so a heat map
> >> could be constructed. Any other thoughts on this would be greatly
> >> appreciated, right now I am really just fishing for some ideas.
> >
> >
> >
>



-- 
Regards,

Dmitry Kan


Re: How to do custom sorting in Solr?

2012-06-11 Thread Afroz Ahmad
You may want to look at
http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
While it is not the same requirement, this should give you an idea of how
to do custom sorting.
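
For the concrete case in this thread (push markdown products to the bottom
without a custom comparator), sorting by a function query may already be
enough. A sketch with made-up field names, where is_markdown_i is 0 or 1:

    &sort=sum(sort_order_i,product(is_markdown_i,1000000)) asc

This still populates the FieldCache for both fields, so it does not remove
the memory concern, but it avoids writing a custom comparator.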

Thanks
Afroz

On Sun, Jun 10, 2012 at 4:43 PM, roz dev  wrote:

> Yes, these documents have lots of unique values as the same product could
> be assigned to lots of other categories and that too, in a different sort
> order.
>
> We did some evaluation of heap usage and found that with kind of queries we
> generate, heap usage was going up to 24-26 GB. I could trace it to the fact
> that
> fieldCache is creating an array of 2M size for each of the sort fields.
>
> Since same products are mapped to multiple categories, we incur significant
> memory overhead. Therefore, any solve where memory consumption can be
> reduced is a good one for me.
>
> In fact, we have situations where same product is mapped to more than 1
> sub-category in the same category like
>
>
> Books
>  -- Programming
>  - Java in a nutshell
>  -- Sale (40% off)
>  - Java in a nutshell
>
>
> So,another thought in my mind is to somehow use second pass collector to
> group books appropriately in Programming and Sale categories, with right
> sort order.
>
> But, i have no clue about that piece :(
>
> -Saroj
>
>
> On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson  >wrote:
>
> > 2M docs is actually pretty small. Sorting is sensitive to the number
> > of _unique_ values in the sort fields, not necessarily the number of
> > documents.
> >
> > And sorting only works on fields with a single value (i.e. it can't have
> > more than one token after analysis). So for each field you're only
> talking
> > 2M values at the vary maximum, assuming that the field in question has
> > a unique value per document, which I doubt very much given your
> > problem description.
> >
> > So with a corpus that size, I'd "just try it'.
> >
> > Best
> > Erick
> >
> > On Sun, Jun 10, 2012 at 7:12 PM, roz dev  wrote:
> > > Thanks Erik for your quick feedback
> > >
> > > When Products are assigned to a category or Sub-Category then they can
> be
> > > in any order and price type can be regular or markdown.
> > > So, reg and markdown products are intermingled  as per their assignment
> > but
> > > I want to sort them in such a way that we
> > > ensure that all the products which are on markdown are at the bottom of
> > the
> > > list.
> > >
> > > I can use these multiple sorts but I realize that they are costly in
> > terms
> > > of heap used, as they are using FieldCache.
> > >
> > > I have an index with 2M docs and docs are pretty big. So, I don't want
> to
> > > use them unless there is no other option.
> > >
> > > I am wondering if I can define a custom function query which can be
> like
> > > this:
> > >
> > >
> > >   - check if product is on the markdown
> > >   - if yes then change its sort order field to be the max value in the
> > >   given sub-category, say 99
> > >   - else, use the sort order of the product in the sub-category
> > >
> > > I have been looking at existing function queries but do not have a good
> > > handle on how to make one of my own.
> > >
> > > - Another option could be use a custom sort comparator but I am not
> sure
> > > about the way it works
> > >
> > > Any thoughts?
> > >
> > >
> > > -Saroj
> > >
> > >
> > >
> > >
> > > On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson <
> erickerick...@gmail.com
> > >wrote:
> > >
> > >> Skimming this, I two options come to mind:
> > >>
> > >> 1> Simply apply primary, secondary, etc sorts. Something like
> > >>   &sort=subcategory asc,markdown_or_regular desc,sort_order asc
> > >>
> > >> 2> You could also use grouping to arrange things in groups and sort
> > within
> > >>  those groups. This has the advantage of returning some members
> > >>  of each of the top N groups in the result set, which makes it
> > easier
> > >> to
> > >>  get some of each group rather than having to analyze the whole
> > >> list
> > >>
> > >> But your example is somewhat contradictory. You say
> > >> "products which are on markdown, are at
> > >> the bottom of the documents list"
> > >>
> > >> But in your examples, products on "markdown" are intermingled
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Sun, Jun 10, 2012 at 3:36 AM, roz dev  wrote:
> > >> > Hi All
> > >> >
> > >> >>
> > >> >> I have an index which contains a Catalog of Products and
> Categories,
> > >> with
> > >> >> Solr 4.0 from trunk
> > >> >>
> > >> >> Data is organized like this:
> > >> >>
> > >> >> Category: Books
> > >> >>
> > >> >> Sub Category: Programming
> > >> >>
> > >> >> Products:
> > >> >>
> > >> >> Product # 1,  Price: Regular Sort Order:1
> > >> >> Product # 2,  Price: Markdown, Sort Order:2
> > >> >> Product # 3   Price: Regular, Sort Order:3
> > >> >> Product # 4   Price: Regular, Sort Order:4
> > >> >> 
> > >> >> .
> > >> >> ...
> > >> >> Product # 100   Price: Regular, Sort Order:100
> > >> >>
> > >> >> Sub Ca

Re: Building a heat map from geo data in index

2012-06-11 Thread Tanguy Moal
There is definitely something interesting to do around geohashes.

I'm wondering how one could map the N by N requested tiles to a range of
geohashes (where the gap would be a function of N).
What I mean is that I don't know whether a bijective function exists
between tiles and geohash ranges.
I don't even know whether a contiguous range of geohashes ends up as a
square box.

Because if you can find such a function, then you could probably solve the
issue by asking Solr for facet ranges on a geohash field.

I don't know if that helps, but the topic is very interesting to me...
Please share your findings, if any :-)

--
Tanguy

2012/6/11 Dmitry Kan 

> so it sounds to me, that the geohash is just a hash representation of lat,
> lon coordinates for an easier referencing (see e.g.
> http://en.wikipedia.org/wiki/Geohash).
> I would probably start with something easier, having bbox lat,lon
> coordinate pairs of top left corner (or in some coordinate systems, it is
> down left corner), break each bbox into cells of size w/N, h/N (and
> probably, that's equal numbers). Then you can loop over the cells and
> compute your facet counts with bbox of a cell. You could then evolve this
> to geohashes, if you want, but at least you would know where to start.
>
> -- Dmitry
>
> On Mon, Jun 11, 2012 at 4:48 PM, Jamie Johnson  wrote:
>
> > That is certainly an option but the collecting of the heat map data is
> > really the question.
> >
> > I saw this
> >
> >
> >
> http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes
> >
> > but don't have a really good understanding of how this would be
> > accomplished.  I need to get a more firm understanding of geohashes as
> > my understanding is extremely lacking at this point.
> >
> > On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
> >  wrote:
> > > I'm not entirely sure, that it has to be that complicated .. what about
> > using for example http://www.patrick-wied.at/static/heatmapjs/ ? You
> > could collect all the geo-related data and do the (heat)map stuff on the
> > client.
> > >
> > >
> > >
> > > On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
> > >
> > >> I had a request from a customer which to this point I have not seen
> > >> much similar so I figured I'd pose the question here. I've been asked
> > >> if it was possible to build a heat map from the results of a query. I
> > >> can imagine a process to do this through some post processing, but
> > >> that sounds very expensive for large/distributed indices so I was
> > >> wondering if with all of the new geospatial support that is being
> > >> added to lucene/solr there was a way to do geospatial faceting. What
> > >> I am imagining is bounding box being defined and that box being broken
> > >> into an N by N matrix, each of which would return counts so a heat map
> > >> could be constructed. Any other thoughts on this would be greatly
> > >> appreciated, right now I am really just fishing for some ideas.
> > >
> > >
> > >
> >
>
>
>
> --
> Regards,
>
> Dmitry Kan
>


Re: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Jack Krupansky
Just to clarify one point from my original response, the "df" parameter is 
already set for the default request handlers, so all you need to do is 
change it from the "text" field to your preferred default field.
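
For example, in solrconfig.xml (put your own field name in):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="df">your_default_field</str>
      </lst>
    </requestHandler>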


Or, you can simply uncomment the deprecated defaultSearchField element in 
your schema and you should get the old behavior.


As far as the rationale, the discussion is here:
https://issues.apache.org/jira/browse/SOLR-2724
"Deprecate defaultSearchField and defaultOperator defined in schema.xml"

In 4.x, this change was reverted, so the defaultSearchField element is 
present.


The issue is still open for 4.x.

Feel free to comment directly on that Jira.

-- Jack Krupansky

-Original Message- 
From: Rohit

Sent: Monday, June 11, 2012 9:49 AM
To: solr-user@lucene.apache.org
Subject: RE: defaultSearchField not working after upgrade to solr3.6

Hi Jack,

I understand that df would make this work normaly, but why did
defaultSearchField stop working suddenly. I notice that there is talk about
deprecating it, but even then it should continue to work right?

Regards,
Rohit

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 11 June 2012 18:49
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

Add the "df" parameter to your query request handler. It names the default
field. Or use "qf" for the edismax query parser.

-- Jack Krupansky

-Original Message-
From: Rohit
Sent: Monday, June 11, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: defaultSearchField not working after upgrade to solr3.6

Hi,



We have just migrated from solr3.5 to solr3.6, for all this time we have
been querying solr as,



http://122.166.9.144:8080/solr/

<>/?q=apple



But now this is not working and the name of the search field needs to be
provided everytime, which was not the case earlier. What might be casing
this?



Regards,

Rohit





Re: Building a heat map from geo data in index

2012-06-11 Thread Jamie Johnson
If you look at the Stack Overflow response from David, he suggested breaking
the geohash up into pieces and then using a prefix for refining
precision.  I hadn't imagined limiting this to a particular area, just
limiting it based on the prefix (which would be based on the user's zoom
level or something), allowing the information to become more precise as
the user zoomed in.  That seemed a very reasonable approach to the
problem.
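
For illustration, a rough sketch of that prefix idea as a plain facet request (all names here are assumptions, not something already in the schema): suppose each document gets a string field geohash_5 holding its geohash truncated to 5 characters (and similar fields for other lengths). For a viewport whose shared geohash prefix is "dr5", something like

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
    &facet.field=geohash_5&facet.prefix=dr5&facet.limit=-1&facet.mincount=1

would return one facet count per length-5 cell inside "dr5", which is essentially the per-tile count needed for the heat map; switching the field (geohash_4, geohash_6, ...) changes the cell size as the user zooms.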

On Mon, Jun 11, 2012 at 10:55 AM, Tanguy Moal  wrote:
> There is definitely something interesting to do around geohashes.
>
> I'm wondering how one could map the N by N requested tiles to a range
> of geohashes (where the gap would be a function of N).
> What I mean is that I don't know whether a bijective function exists
> between tiles and geohash ranges.
> I don't even know if a contiguous range of geohashes ends up as a square
> box.
>
> Because if you can find such a function, then you could probably solve the
> issue by asking facet ranges on a geohash field to solr.
>
> I don't know if that helps, but the topic is very interesting to me...
> Please share your findings, if any :-)
>
> --
> Tanguy
>
> 2012/6/11 Dmitry Kan 
>
>> so it sounds to me, that the geohash is just a hash representation of lat,
>> lon coordinates for an easier referencing (see e.g.
>> http://en.wikipedia.org/wiki/Geohash).
>> I would probably start with something easier, having bbox lat,lon
>> coordinate pairs of top left corner (or in some coordinate systems, it is
>> down left corner), break each bbox into cells of size w/N, h/N (and
>> probably, that's equal numbers). Then you can loop over the cells and
>> compute your facet counts with bbox of a cell. You could then evolve this
>> to geohashes, if you want, but at least you would know where to start.
>>
>> -- Dmitry
>>
>> On Mon, Jun 11, 2012 at 4:48 PM, Jamie Johnson  wrote:
>>
>> > That is certainly an option but the collecting of the heat map data is
>> > really the question.
>> >
>> > I saw this
>> >
>> >
>> >
>> http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes
>> >
>> > but don't have a really good understanding of how this would be
>> > accomplished.  I need to get a more firm understanding of geohashes as
>> > my understanding is extremely lacking at this point.
>> >
>> > On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
>> >  wrote:
>> > > I'm not entirely sure, that it has to be that complicated .. what about
>> > using for example http://www.patrick-wied.at/static/heatmapjs/ ? You
>> > could collect all the geo-related data and do the (heat)map stuff on the
>> > client.
>> > >
>> > >
>> > >
>> > > On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
>> > >
>> > >> I had a request from a customer which to this point I have not seen
>> > >> much similar so I figured I'd pose the question here. I've been asked
>> > >> if it was possible to build a heat map from the results of a query. I
>> > >> can imagine a process to do this through some post processing, but
>> > >> that sounds very expensive for large/distributed indices so I was
>> > >> wondering if with all of the new geospatial support that is being
>> > >> added to lucene/solr there was a way to do geospatial faceting. What
>> > >> I am imagining is bounding box being defined and that box being broken
>> > >> into an N by N matrix, each of which would return counts so a heat map
>> > >> could be constructed. Any other thoughts on this would be greatly
>> > >> appreciated, right now I am really just fishing for some ideas.
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Regards,
>>
>> Dmitry Kan
>>


Re: Building a heat map from geo data in index

2012-06-11 Thread Tanguy Moal
Yes it looks interesting and is not too difficult to do.
However, the length of the geohashes gives you very little control on the
size of the regions to colorize. Quoting wikipedia :
geohash length    km error
     1            ±2500
     2            ±630
     3            ±78
     4            ±20
     5            ±2.4
     6            ±0.61
     7            ±0.076
     8            ±0.019
This is interesting also : http://wiki.openstreetmap.org/wiki/QuadTiles
But it does what you're looking for, somehow :)

--
Tanguy


2012/6/11 Jamie Johnson 

> If you look at the Stack response from David he had suggested breaking
> the geohash up into pieces and then using a prefix for refining
> precision.  I hadn't imagined limiting this to a particular area, just
> limiting it based on the prefix (which would be based on users zoom
> level or something) allowing the information to become more precise as
> the user zoomed in.  That seemed a very reasonable approach to the
> problem.
>
> On Mon, Jun 11, 2012 at 10:55 AM, Tanguy Moal 
> wrote:
> > There is definitely something interesting to do around geohashes.
> >
> > I'm wondering how one could map the N by N tiles requested tiles to a
> range
> > of geohashes. (Where the gap would be a function of N).
> > What I try to mean is that I don't know if a bijective function exist
> > between tiles and geohash ranges.
> > I don't even know if a contiguous range of geohashes ends up in a squared
> > box.
> >
> > Because if you can find such a function, then you could probably solve
> the
> > issue by asking facet ranges on a geohash field to solr.
> >
> > I don't if that helps but the topic is very interesting to me...
> > Please share your findings, if any :-)
> >
> > --
> > Tanguy
> >
> > 2012/6/11 Dmitry Kan 
> >
> >> so it sounds to me, that the geohash is just a hash representation of
> lat,
> >> lon coordinates for an easier referencing (see e.g.
> >> http://en.wikipedia.org/wiki/Geohash).
> >> I would probably start with something easier, having bbox lat,lon
> >> coordinate pairs of top left corner (or in some coordinate systems, it
> is
> >> down left corner), break each bbox into cells of size w/N, h/N (and
> >> probably, that's equal numbers). Then you can loop over the cells and
> >> compute your facet counts with bbox of a cell. You could then evolve
> this
> >> to geohashes, if you want, but at least you would know where to start.
> >>
> >> -- Dmitry
> >>
> >> On Mon, Jun 11, 2012 at 4:48 PM, Jamie Johnson 
> wrote:
> >>
> >> > That is certainly an option but the collecting of the heat map data is
> >> > really the question.
> >> >
> >> > I saw this
> >> >
> >> >
> >> >
> >>
> http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes
> >> >
> >> > but don't have a really good understanding of how this would be
> >> > accomplished.  I need to get a more firm understanding of geohashes as
> >> > my understanding is extremely lacking at this point.
> >> >
> >> > On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
> >> >  wrote:
> >> > > I'm not entirely sure, that it has to be that complicated .. what
> about
> >> > using for example http://www.patrick-wied.at/static/heatmapjs/ ? You
> >> > could collect all the geo-related data and do the (heat)map stuff on
> the
> >> > client.
> >> > >
> >> > >
> >> > >
> >> > > On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
> >> > >
> >> > >> I had a request from a customer which to this point I have not seen
> >> > >> much similar so I figured I'd pose the question here. I've been
> asked
> >> > >> if it was possible to build a heat map from the results of a
> query. I
> >> > >> can imagine a process to do this through some post processing, but
> >> > >> that sounds very expensive for large/distributed indices so I was
> >> > >> wondering if with all of the new geospatial support that is being
> >> > >> added to lucene/solr there was a way to do geospatial faceting.
> What
> >> > >> I am imagining is bounding box being defined and that box being
> broken
> >> > >> into an N by N matrix, each of which would return counts so a heat
> map
> >> > >> could be constructed. Any other thoughts on this would be greatly
> >> > >> appreciated, right now I am really just fishing for some ideas.
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >>
> >> Dmitry Kan
> >>
>


Indexing Multiple Datasources

2012-06-11 Thread Kay
Hello,

We have 2 MS SQL Server databases which we want to index, but most of the
columns in the databases have the same names. For example, both DBs have
the columns First name, Last name, etc.

How can we index multiple databases using a single db-data-config file and
one schema?

Here is my data-config file
























 


And schema file:











 BusinessEntityID

 LastName


We would appreciate your help!

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Multiple-Datasources-tp3988957.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Building a heat map from geo data in index

2012-06-11 Thread Jamie Johnson
Yeah I'll have to play to see how useful it is, I really don't know at
this point.

On another note, we are already using some binning like what is described in the
wiki you sent, specifically http://code.google.com/p/javageomodel/ for
other purposes.  Not sure if that could be used or not; I guess I'd have
to think on it harder.


On Mon, Jun 11, 2012 at 12:04 PM, Tanguy Moal  wrote:
> Yes it looks interesting and is not too difficult to do.
> However, the length of the geohashes gives you very little control on the
> size of the regions to colorize. Quoting wikipedia :
> geohash length    km error
>      1            ±2500
>      2            ±630
>      3            ±78
>      4            ±20
>      5            ±2.4
>      6            ±0.61
>      7            ±0.076
>      8            ±0.019
> This is interesting also : http://wiki.openstreetmap.org/wiki/QuadTiles
> But it does what you're looking for, somehow :)
>
> --
> Tanguy
>
>
> 2012/6/11 Jamie Johnson 
>
>> If you look at the Stack response from David he had suggested breaking
>> the geohash up into pieces and then using a prefix for refining
>> precision.  I hadn't imagined limiting this to a particular area, just
>> limiting it based on the prefix (which would be based on users zoom
>> level or something) allowing the information to become more precise as
>> the user zoomed in.  That seemed a very reasonable approach to the
>> problem.
>>
>> On Mon, Jun 11, 2012 at 10:55 AM, Tanguy Moal 
>> wrote:
>> > There is definitely something interesting to do around geohashes.
>> >
>> > I'm wondering how one could map the N by N tiles requested tiles to a
>> range
>> > of geohashes. (Where the gap would be a function of N).
>> > What I try to mean is that I don't know if a bijective function exist
>> > between tiles and geohash ranges.
>> > I don't even know if a contiguous range of geohashes ends up in a squared
>> > box.
>> >
>> > Because if you can find such a function, then you could probably solve
>> the
>> > issue by asking facet ranges on a geohash field to solr.
>> >
>> > I don't if that helps but the topic is very interesting to me...
>> > Please share your findings, if any :-)
>> >
>> > --
>> > Tanguy
>> >
>> > 2012/6/11 Dmitry Kan 
>> >
>> >> so it sounds to me, that the geohash is just a hash representation of
>> lat,
>> >> lon coordinates for an easier referencing (see e.g.
>> >> http://en.wikipedia.org/wiki/Geohash).
>> >> I would probably start with something easier, having bbox lat,lon
>> >> coordinate pairs of top left corner (or in some coordinate systems, it
>> is
>> >> down left corner), break each bbox into cells of size w/N, h/N (and
>> >> probably, that's equal numbers). Then you can loop over the cells and
>> >> compute your facet counts with bbox of a cell. You could then evolve
>> this
>> >> to geohashes, if you want, but at least you would know where to start.
>> >>
>> >> -- Dmitry
>> >>
>> >> On Mon, Jun 11, 2012 at 4:48 PM, Jamie Johnson 
>> wrote:
>> >>
>> >> > That is certainly an option but the collecting of the heat map data is
>> >> > really the question.
>> >> >
>> >> > I saw this
>> >> >
>> >> >
>> >> >
>> >>
>> http://stackoverflow.com/questions/8798711/solr-using-facets-to-sum-documents-based-on-variable-precision-geohashes
>> >> >
>> >> > but don't have a really good understanding of how this would be
>> >> > accomplished.  I need to get a more firm understanding of geohashes as
>> >> > my understanding is extremely lacking at this point.
>> >> >
>> >> > On Mon, Jun 11, 2012 at 8:55 AM, Stefan Matheis
>> >> >  wrote:
>> >> > > I'm not entirely sure, that it has to be that complicated .. what
>> about
>> >> > using for example http://www.patrick-wied.at/static/heatmapjs/ ? You
>> >> > could collect all the geo-related data and do the (heat)map stuff on
>> the
>> >> > client.
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Sunday, June 10, 2012 at 7:49 PM, Jamie Johnson wrote:
>> >> > >
>> >> > >> I had a request from a customer which to this point I have not seen
>> >> > >> much similar so I figured I'd pose the question here. I've been
>> asked
>> >> > >> if it was possible to build a heat map from the results of a
>> query. I
>> >> > >> can imagine a process to do this through some post processing, but
>> >> > >> that sounds very expensive for large/distributed indices so I was
>> >> > >> wondering if with all of the new geospatial support that is being
>> >> > >> added to lucene/solr there was a way to do geospatial faceting.
>> What
>> >> > >> I am imagining is bounding box being defined and that box being
>> broken
>> >> > >> into an N by N matrix, each of which would return counts so a heat
>> map
>> >> > >> could be constructed. Any other thoughts on this would be greatly
>> >> > >> appreciated, right now I am really just fishing for some ideas.
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> Dmitry Kan
>> >>
>>


RE: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Rohit
Thanks for the pointers, Jack. Actually the strange part is that the
defaultSearchField element is present and uncommented, yet not working.

<uniqueKey>docKey</uniqueKey>
<defaultSearchField>searchText</defaultSearchField>


Regards,
Rohit

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 11 June 2012 20:35
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

Just to clarify one point from my original response, the "df" parameter is
already set for the default request handlers, so all you need to do is
change it from the "text" field to your preferred default field.

Or, you can simply uncomment the deprecated defaultSearchField element in
your schema and you should get the old behavior.

As far as the rationale, the discussion is here:
https://issues.apache.org/jira/browse/SOLR-2724
"Deprecate defaultSearchField and defaultOperator defined in schema.xml"

In 4.x, this change was reverted, so the defaultSearchField element is
present.

The issue is still open for 4.x.

Feel free to comment directly on that Jira.

-- Jack Krupansky

-Original Message-
From: Rohit
Sent: Monday, June 11, 2012 9:49 AM
To: solr-user@lucene.apache.org
Subject: RE: defaultSearchField not working after upgrade to solr3.6

Hi Jack,

I understand that df would make this work normally, but why did
defaultSearchField stop working suddenly? I notice that there is talk about
deprecating it, but even then it should continue to work, right?

Regards,
Rohit

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 11 June 2012 18:49
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

Add the "df" parameter to your query request handler. It names the default
field. Or use "qf" for the edismax query parser.

-- Jack Krupansky

-Original Message-
From: Rohit
Sent: Monday, June 11, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: defaultSearchField not working after upgrade to solr3.6

Hi,



We have just migrated from solr3.5 to solr3.6, for all this time we have
been querying solr as,



http://122.166.9.144:8080/solr/

<>/?q=apple



But now this is not working and the name of the search field needs to be
provided every time, which was not the case earlier. What might be causing
this?



Regards,

Rohit






Re: defaultSearchField not working after upgrade to solr3.6

2012-06-11 Thread Jack Krupansky

Correct. In 3.6 it is simply ignored. In 4.x it currently does work.

Generally, Solr ignores any elements that it does not support.

-- Jack Krupansky

-Original Message- 
From: Rohit 
Sent: Monday, June 11, 2012 12:55 PM 
To: solr-user@lucene.apache.org 
Subject: RE: defaultSearchField not working after upgrade to solr3.6 


Thanks for the pointers Jack, actually the strange part is that the
defaultSearchField element is present and uncommented yet not working.

<uniqueKey>docKey</uniqueKey>
<defaultSearchField>searchText</defaultSearchField>


Regards,
Rohit

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 11 June 2012 20:35

To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

Just to clarify one point from my original response, the "df" parameter is
already set for the default request handlers, so all you need to do is
change it from the "text" field to your preferred default field.

Or, you can simply uncomment the deprecated defaultSearchField element in
your schema and you should get the old behavior.

As far as the rationale, the discussion is here:
https://issues.apache.org/jira/browse/SOLR-2724
"Deprecate defaultSearchField and defaultOperator defined in schema.xml"

In 4.x, this change was reverted, so the defaultSearchField element is
present.

The issue is still open for 4.x.

Feel free to comment directly on that Jira.

-- Jack Krupansky

-Original Message-
From: Rohit
Sent: Monday, June 11, 2012 9:49 AM
To: solr-user@lucene.apache.org
Subject: RE: defaultSearchField not working after upgrade to solr3.6

Hi Jack,

I understand that df would make this work normally, but why did
defaultSearchField stop working suddenly? I notice that there is talk about
deprecating it, but even then it should continue to work, right?

Regards,
Rohit

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 11 June 2012 18:49
To: solr-user@lucene.apache.org
Subject: Re: defaultSearchField not working after upgrade to solr3.6

Add the "df" parameter to your query request handler. It names the default
field. Or use "qf" for the edismax query parser.

-- Jack Krupansky

-Original Message-
From: Rohit
Sent: Monday, June 11, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: defaultSearchField not working after upgrade to solr3.6

Hi,



We have just migrated from solr3.5 to solr3.6, for all this time we have
been querying solr as,



http://122.166.9.144:8080/solr/

<>/?q=apple



But now this is not working and the name of the search field needs to be
provided every time, which was not the case earlier. What might be causing
this?



Regards,

Rohit





Re: edismax and untokenized field

2012-06-11 Thread Vijay Ramachandran
Thank you for your reply. Sending this as a phrase query does change the
results as expected.

On Mon, Jun 11, 2012 at 4:39 PM, Tanguy Moal  wrote:

> I think you have to issue a phrase query in such a case because otherwise
> each "token" is searched independently in the merchant field : the query
> parser splits the query on spaces!
>
>
So the parsing of the query depends in part on the query handler itself,
independent of the field definition?


> Check the difference between debug outputs when you search for "Jones New
> York", you'd get what you expected.
>

Yes, that gives the expected result. So, I should make a separate query to
the merchant field as a phrase?

thanks!
Vijay


Re: Issue with field collapsing in solr 4 while performing distributed search

2012-06-11 Thread roz dev
I think that there is no way around doing custom logic in this case.

If the indexing process knows that documents have to be grouped, then they
had better be together.

-Saroj


On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy  wrote:

> Martijn,
>
> How do we add a custom algorithm for distributing documents in Solr Cloud?
> According to this discussion
>
> http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
>  , Mark discourages users from using custom distribution mechanism in Solr
> Cloud.
>
> Load balancing is not an issue for us at the moment. In that case, how
> should we implement a custom partitioning algorithm.
>
>
> On Mon, Jun 11, 2012 at 6:23 PM, Martijn v Groningen <
> martijn.v.gronin...@gmail.com> wrote:
>
> > The ngroups returns the number of groups that have matched with the
> > query. However if you want ngroups to be correct in a distributed
> > environment you need
> > to put document belonging to the same group into the same shard.
> > Groups can't cross shard boundaries. I guess you need to do
> > some manual document partitioning.
> >
> > Martijn
> >
> > On 11 June 2012 14:29, Nitesh Nandy  wrote:
> > > Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud  (2 slices
> > and
> > > 2 shards)
> > >
> > > The setup was done as per the wiki:
> > http://wiki.apache.org/solr/SolrCloud
> > >
> > > We are doing distributed search. While querying, we use field
> collapsing
> > > with "ngroups" set as true as we need the number of search results.
> > >
> > > However, there is a difference in the number of "result list" returned
> > and
> > > the "ngroups" value returned.
> > >
> > > Ex:
> > >
> >
> http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&&group=true&group.field=id&group.ngroups=true
> > >
> > >
> > > The response XMl looks like
> > >
> > > 
> > > 
> > > 
> > > 0
> > > 46
> > > 
> > > id
> > > true
> > > true
> > > messagebody:monit AND usergroupid:3
> > > 
> > > 
> > > 
> > > 
> > > 10
> > > 9
> > > 
> > > 
> > > 320043
> > > 
> > > ...
> > > 
> > > 
> > > 
> > > 398807
> > > ...
> > > 
> > > 
> > > 
> > > 346878
> > > ...
> > > 
> > > 
> > > 346880
> > > ...
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > > So you can see that the ngroups value returned is 9 and the actual
> number
> > > of groups returned is 4
> > >
> > > Why do we have this discrepancy in the ngroups, matches and actual
> number
> > > of groups. Is this an open issue ?
> > >
> > >  Any kind of help is appreciated.
> > >
> > > --
> > > Regards,
> > >
> > > Nitesh Nandy
> >
> >
> >
> > --
> > Met vriendelijke groet,
> >
> > Martijn van Groningen
> >
>
>
>
> --
> Regards,
>
> Nitesh Nandy
>


Re: search for alphabetic version of numbers

2012-06-11 Thread Jack Krupansky
You can certainly do a modest number of special cases as replacement 
synonyms, but if you are serious about arbitrary number support, it might be 
best to go with a custom update processor and query preprocessor that map 
text numbers to simple numeric form.
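
For the handful-of-special-cases route, a minimal synonyms.txt sketch using replacement mappings (the entries are just examples, and this only ever covers what you enumerate), applied with a SynonymFilterFactory on the field in question; multi-word mappings are generally safer applied at index time:

   two hundred => 200
   three hundred => 300
   two thousand three hundred, twenty three hundred => 2300
   two hundred million => 200000000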


How about cases like 2,300 or 2,300.00 (embedded commas or even decimal 
point) - two thousand three hundred or 23 hundred or twenty three hundred?


Or 200 million vs 200,000,000 vs. 2?

In any case, synonyms get really messy really quickly, but with 
preprocessors you can do whatever you want


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Monday, June 11, 2012 2:41 PM
To: solr-user@lucene.apache.org
Subject: search for alphabetic version of numbers

Hi everybody,

I have a requirement to support searching for numbers either by their
alphabetic form or by digits.
For example, if we have a document with a field's value of '200',
if we search for "two hundred", that document should match.

I haven't found anything like this yet. Do we have other option than
define the most common numbers and their string versions as
synonyms?

Thanks


--
Alireza Salimi
Java EE Developer 



Re: Sharing common config between different search handlers

2012-06-11 Thread Chris Hostetter

: But I would like those two Searchhandlers to share the rest of their
: configuration. Because if anything needs to be changed, it need to be
: done for both Searchhandlers. I think that's kind of ugly.

take a look at using XML includes (aka "xinclude") ... that would let you 
keep the common child elements in a distinct file that would be included.
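
For illustration, a minimal sketch (the file and handler names are made up): put the shared elements in their own well-formed XML file in the conf directory, e.g. common-defaults.xml:

   <lst name="defaults">
     <str name="df">text</str>
     <int name="rows">10</int>
   </lst>

and pull it into each handler in solrconfig.xml:

   <requestHandler name="/searchA" class="solr.SearchHandler">
     <xi:include href="common-defaults.xml"
                 xmlns:xi="http://www.w3.org/2001/XInclude"/>
   </requestHandler>

   <requestHandler name="/searchB" class="solr.SearchHandler">
     <xi:include href="common-defaults.xml"
                 xmlns:xi="http://www.w3.org/2001/XInclude"/>
   </requestHandler>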



-Hoss


Re: Sorting with customized function of score

2012-06-11 Thread Chris Hostetter

: I'm using the solr 4.0 nightly build version. In fact I intend to sort with
: a more complicate function including score, geodist() and other factors, so
: this example is to simplify the issue that I cannot sort with a customized
: function of score.
: More concrete, how can i make the sort like:
: sort=product(div(1,geodist()),score) desc ?

in order to include a "score" of a query in a function (any function, 
regardless of where/how you are using that function -- in this case in a 
"sort") you need to be explict about the query whose score you want to use 
via the "query()" function.

So something like this would probably do what you want...

q=whatever_your_query_is&sort=product(div(1,geodist()),query($q))+desc

...but i believe it would be more efficient as ...

qq=whatever_your_query_is&q={!boost+b=div(1,geodist())+v=$qq}&sort=score+desc


-Hoss


Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Chris Hostetter

: The new FieldValueSubsetUpdateProcessorFactory classes look phenomenal. I
: haven't looked yet, but what are the chances these will be back-ported to
: 3.6 (or how hard would it be to backport them?)... I'll have to check out
: the source in more detail.

3.x is bug fix only as we now focus on 4.0 ... but these particular 
classes are fairly straightforward and isolated, so it should be relatively easy 
for someone with Java knowledge to backport them to 3.6

: If stuck on 3.6, what would be the best way to deal with this situation?
: It's currently looking like it will have to be a custom update handler, but

In this day and age, a custom update handler is almost never the right 
answer to a problem -- nor is a custom request handler that does updates 
(those two things are actually different) ... my advice is always to 
start by trying to implement what you need as an UpdateRequestProcessor, 
and if that doesn't work out then refactor your code to be a Request 
Handler instead.

-Hoss


Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Aaron Daubman
While I look into doing some refactoring, as well as creating some new
UpdateRequestProcessors (and/or backporting), would you please point me to
some reading material on why you say the following:

In this day and age, a custom update handler is almost never the right
> answer to a problem -- nor is a custom request handler that does updates
> (theose two things are actaully different) ... my advice is always to
> start by trying to impliment what you need as an UpdateRequestProcessor,
> and if that doesn't work out then refactor your code to be a Request
> Handler instead.
>

e.g. benefits of UpdateRequestProcessor over custom update handler?

Thanks again for the great pointers,
  Aaron


Re: score filter

2012-06-11 Thread Chris Hostetter

: I need to frame a query that is a combination of two query parts and I use a
: 'function' query to prepare the same. Something like:
: q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
: 
: where $uq and $cq are two queries.
: 
: Now, I want a search result returned only if I get a hit on $uq. So, I
: specify default value of $uq query as 0.0 in order for the final score to be
: zero in cases where $uq doesn't record a hit. Even though, the scoring works
: as expected (i.e, document that don't match $uq have a score of zero), all
: the documents are returned as search results. Is there a way to filter
: search results that have a score of zero?

a) you could wrap your query in {!frange} .. but that will make everything 
that does have a value > 0.0 get the same final score

b) you could use an fq={!frange} that refers back to your original $q

c) you could just use an fq that refers directly to your $uq since that's 
what you say you actually want to filter on in the first place...

uq=...
cq=...
q={!type=func q.op=AND df=text}product(query($uq,0.0),query($cq,0.1))
fq={!v=$uq}

-Hoss


Re: Correct way to deal with source data that may include a multivalued field that needs to be used for sorting?

2012-06-11 Thread Chris Hostetter

: In this day and age, a custom update handler is almost never the right
: > answer to a problem -- nor is a custom request handler that does updates
: > (theose two things are actaully different) ... my advice is always to
: > start by trying to impliment what you need as an UpdateRequestProcessor,
: > and if that doesn't work out then refactor your code to be a Request
: > Handler instead.
: >
: 
: e.g. benefits of UpdateRequestProcessor over custom update handler?

purely from a code reuse standpoint.  Request Handler is really the 
coarsest, broadest level of plugin you can implement.  You can write one 
that does almost anything, but that requires you to do everything 
yourself.

writing an UpdateRequestProcessor instead of a Request Handler lets you 
re-use your customizations with any Request Handler, and it lets you mix 
and match the ordering w/ other Update Processors (instead of it being 
in your handler where you have to do all your special stuff before you 
call out to the processor chain) and makes it usable regardless of whether 
your documents are coming from the XmlUpdateRequestHandler or DIH, or 
whatever.


-Hoss


Re: Question on addBean and deleteByQuery

2012-06-11 Thread Chris Hostetter

: Transfer-Encoding: chunked
: Content-Type: application/xml; charset=UTF-8
: 
: 47
: name:fred AND currency:USD
: 0

...

: Due to the way our servers are setup, we get an error and we think it is due
: to these numbers being in the body of the request. 

please be specific about the errors you are seeing.

If your servlet container can't handle "Transfer-Encoding: chunked" 
requests, that suggests that it isn't HTTP/1.1 compliant -- unless it's 
returning "501 Unimplemented", in which case it just sounds like a 
really silly HTTP/1.1 server (chunked encoding has been around since 
the dark ages)

Ergo: I'm suspicious that chunked encoding is really the problem.  Details 
(of the error you are getting) matter.



-Hoss


Re: what's better for in memory searching?

2012-06-11 Thread Mikhail Khludnev
The point about premature optimization makes sense to me. However, some time
ago I bookmarked a potentially useful approach:
http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html
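
(On the warm-up suggestion in the quoted reply below, a minimal solrconfig.xml sketch with the stock QuerySenderListener; the queries are placeholders and should mirror your real traffic so the relevant index files and caches get touched on startup and after each commit:)

   <listener event="firstSearcher" class="solr.QuerySenderListener">
     <arr name="queries">
       <lst><str name="q">some typical query</str></lst>
       <lst><str name="q">*:*</str><str name="rows">10</str></lst>
     </arr>
   </listener>
   <listener event="newSearcher" class="solr.QuerySenderListener">
     <arr name="queries">
       <lst><str name="q">some typical query</str></lst>
     </arr>
   </listener>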

On Mon, Jun 11, 2012 at 3:02 PM, Toke Eskildsen wrote:

> On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
> > yes, I need average query time less than 10 ms. The faster the better.
> > I have enough memory for lucene because I know there are not too much
> > data. there are not many modifications. every day there are about
> > hundreds of document update. if indexes are not in physical memory,
> > then IO operations will cost a few ms.
>
> I'm with Michael on this one: It seems that you're doing a premature
> optimization. Guessing that your final index will be < 5GB in size with
> 1 million documents (give or take 900.000:-), relatively simple queries
> and so on, an average response time of 10 ms should be attainable even
> on spinning drives. One hundred document updates per day are not many,
> so again I would not expect problems.
>
> As is often the case on this mailing list, the advice is "try it". Using
> a normal on-disk index and doing some warm up is the easy solution to
> implement and nearly all of your work on this will be usable for a
> RAM-based solution, if you are not satisfied with the speed. Or you
> could buy a small & cheap SSD and have no more worries...
>
> Regards,
> Toke Eskildsen
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


After a full data import from a database

2012-06-11 Thread Jin Chun
Hi there,

I just did a full import of data from a database... where should I be
looking for the indexed file (likely to be in XML format)?

I've already checked the folder I set as <dataDir> in solrconfig.xml, and
all I see in there is a bunch of .fdt, .fdx, .frq, .tis files... Any
suggestions?

Thanks,

J


Re: After a full data import from a database

2012-06-11 Thread Michael Della Bitta
Hi Jin,

The file never shows up on disk anywhere. It's parsed and various bits
of it are stored in various different ways, depending on your schema.
The raw stored data, if you've so specified, is in the .fdt file, but
that's not going to be a very convenient file format for you to look
at directly.
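
If you want to inspect what actually got indexed, the Luke request handler gives a per-field summary without touching the raw files, assuming the admin handlers are enabled in your solrconfig.xml (they are in the stock example) and the example host/port:

http://localhost:8983/solr/admin/luke?numTerms=10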

HTH,

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Mon, Jun 11, 2012 at 6:12 PM, Jin Chun  wrote:
> Hi there,
>
> I just did a full import of data from a database... and where should I be
> looking for the indexed file (likely be in xml format)?
>
> I've already checked the folder I set as  in solrconfig.xml.. and
> all I see in there is bunch of .fdt, .fdx, .frq, .tis files... Any
> suggestions?
>
> Thanks,
>
> J


Re: After a full data import from a database

2012-06-11 Thread Jack Krupansky

Do a query such as:

http://localhost:8983/solr/select/?q=*:*

to see the count of documents that were indexed.

-- Jack Krupansky

-Original Message- 
From: Michael Della Bitta

Sent: Monday, June 11, 2012 6:41 PM
To: solr-user@lucene.apache.org
Subject: Re: After a full data import from a database

Hi Jin,

The file never shows up on disk anywhere. It's parsed and various bits
of it are stored in various different ways, depending on your schema.
The raw stored data, if you've so specified, is in the .fdt file, but
that's not going to be a very convenient file format for you to look
at directly.

HTH,

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Mon, Jun 11, 2012 at 6:12 PM, Jin Chun  wrote:

Hi there,

I just did a full import of data from a database... and where should I be
looking for the indexed file (likely be in xml format)?

I've already checked the folder I set as  in solrconfig.xml.. and
all I see in there is bunch of .fdt, .fdx, .frq, .tis files... Any
suggestions?

Thanks,

J 




Something like 'bf' or 'bq' with MoreLikeThis

2012-06-11 Thread entdeveloper
I'm looking for a way to improve the relevancy of my MLT results. For my
index based on movies, the MoreLikeThisHandler is doing a great job of
returning related documents by the fields I specify like 'genre', but within
my "bands" of results (groups of documents with the same score cause they
all match on the mlt.fl and mlt.qf params), there's nothing else to sort the
results /within/ those "bands".

A good way to help this would be to have a
bf=recip(rord(created_at),1,1000,1000), so the newer movies show up
higher, but I don't think the MLT handler supports bf or bq. Is there
something similar I could use that would accomplish the same thing, maybe
using the _val_: hook somewhere?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing Multiple Datasources

2012-06-11 Thread Jack Krupansky
You can do it by giving each database data source a "name" attribute, which 
is what you reference in the dataSource attribute of your entity.


See:
http://wiki.apache.org/solr/DataImportHandler#multipleds

Or, are you in fact trying to join or merge the tables based on first name 
and last name or something similar?
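
For illustration, a minimal sketch of the first approach (driver, URLs, credentials, table and column names are all placeholders, not your actual config):

   <dataConfig>
     <dataSource name="db1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                 url="jdbc:sqlserver://host1;databaseName=DB1" user="u" password="p"/>
     <dataSource name="db2" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                 url="jdbc:sqlserver://host2;databaseName=DB2" user="u" password="p"/>
     <document>
       <entity name="people1" dataSource="db1"
               query="select BusinessEntityID, FirstName, LastName from Person">
         <field column="BusinessEntityID" name="id"/>
         <field column="FirstName" name="firstName"/>
         <field column="LastName" name="lastName"/>
       </entity>
       <entity name="people2" dataSource="db2"
               query="select BusinessEntityID, FirstName, LastName from Person">
         <field column="BusinessEntityID" name="id"/>
         <field column="FirstName" name="firstName"/>
         <field column="LastName" name="lastName"/>
       </entity>
     </document>
   </dataConfig>

One thing to watch: if the same BusinessEntityID can occur in both databases, the second import will overwrite the first document with the same uniqueKey, so you may want to prefix the id per source (for example with a TemplateTransformer).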


-- Jack Krupansky

-Original Message- 
From: Kay

Sent: Monday, June 11, 2012 11:59 AM
To: solr-user@lucene.apache.org
Subject: Indexing Multiple Datasources

Hello,

We have 2 MS SQL Server Databases which we wanted to index .But most of the
columns in the Databases have the same names. For e.g. Both the DB’s have
the columns First name ,Last name ,etc.

How can you index multiple Databases using single db-data-config file and
one schema?

Here is my data-config file



   










   









   


And schema file:











BusinessEntityID

LastName


We would appreciate your help!

Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Multiple-Datasources-tp3988957.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Is this method equivalent to setting vm.swappiness, which is global?
Or can it set the swappiness for the JVM process only?

On Tue, Jun 12, 2012 at 5:11 AM, Mikhail Khludnev
 wrote:
> Point about premature optimization makes sense for me. However some time
> ago I've bookmarked potentially useful approach
> http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html.
>
> On Mon, Jun 11, 2012 at 3:02 PM, Toke Eskildsen 
> wrote:
>
>> On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
>> > yes, I need average query time less than 10 ms. The faster the better.
>> > I have enough memory for lucene because I know there are not too much
>> > data. there are not many modifications. every day there are about
>> > hundreds of document update. if indexes are not in physical memory,
>> > then IO operations will cost a few ms.
>>
>> I'm with Michael on this one: It seems that you're doing a premature
>> optimization. Guessing that your final index will be < 5GB in size with
>> 1 million documents (give or take 900.000:-), relatively simple queries
>> and so on, an average response time of 10 ms should be attainable even
>> on spinning drives. One hundred document updates per day are not many,
>> so again I would not expect problems.
>>
>> As is often the case on this mailing list, the advice is "try it". Using
>> a normal on-disk index and doing some warm up is the easy solution to
>> implement and nearly all of your work on this will be usable for a
>> RAM-based solution, if you are not satisfied with the speed. Or you
>> could buy a small & cheap SSD and have no more worries...
>>
>> Regards,
>> Toke Eskildsen
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  


Re: Something like 'bf' or 'bq' with MoreLikeThis

2012-06-11 Thread Jack Krupansky
The MLT handler may not have those params, but you could use the MLT search 
"component" to generate the MLT queries (and results) and then add your own 
component that would revise the MLT queries to be boosted as you desire.
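
For illustration, a rough sketch of where that could hang off solrconfig.xml (the custom class name is entirely hypothetical; the stock pieces are the built-in "mlt" search component and its mlt.* params):

   <searchComponent name="mltReorder" class="com.example.MltReorderComponent"/>

   <requestHandler name="/withmlt" class="solr.SearchHandler">
     <lst name="defaults">
       <bool name="mlt">true</bool>
       <str name="mlt.fl">genre</str>
       <int name="mlt.count">10</int>
     </lst>
     <arr name="last-components">
       <str>mltReorder</str>
     </arr>
   </requestHandler>

where the hypothetical MltReorderComponent would re-sort each MLT list in its process() method (e.g. by created_at) before the response is written.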


-- Jack Krupansky

-Original Message- 
From: entdeveloper

Sent: Monday, June 11, 2012 7:13 PM
To: solr-user@lucene.apache.org
Subject: Something like 'bf' or 'bq' with MoreLikeThis

I'm looking for a way to improve the relevancy of my MLT results. For my
index based on movies, the MoreLikeThisHandler is doing a great job of
returning related documents by the fields I specify like 'genre', but within
my "bands" of results (groups of documents with the same score cause they
all match on the mlt.fl and mlt.qf params), there's nothing else to sort the
results /within/ those "bands".

A good way to help this would be to have a
bf=recip(rord(created_at),1,1000,1000), so the newer movies should up
higher, but I don't think the MLT handler supports bf or bq. Is there
something similar I could use that would accomplish the same thing, maybe
using the _val_: hook somewhere?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Exception when optimizing index

2012-06-11 Thread Rok Rejc
Just as an addon:

I have deleted the whole index directory and loaded the data from the start. After
the data was loaded (and I committed it) I ran CheckIndex again.
Again, there was a bunch of broken segments.

I will try with the latest trunk to see if the problem still exists.

Regards,
Rok


On Mon, Jun 11, 2012 at 8:32 AM, Rok Rejc  wrote:

> Hi all,
>
> I have run CheckIndex. It seems that the index is corrupted. I've got
> plenty of exceptions like:
>
>   test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException
> java.lang.ArrayIndexOutOfBoundsException
> at
> org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:181)
> at
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.nextLeaf(BlockTreeTermsReader.java:2414)
> at
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.next(BlockTreeTermsReader.java:2400)
> at
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2074)
> at
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:771)
> at
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
>
>
> and
>
>   test: terms, freq, prox...ERROR: java.lang.RuntimeException: term [6f 70
> 65 72 61 63 69 6a 61]: doc 105407 <= lastDoc 105407
> java.lang.RuntimeException: term [6f 70 65 72 61 63 69 6a 61]: doc 105407
> <= lastDoc 105407
> at
> org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:858)
> at
> org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1164)
> at
> org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:602)
> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1748)
> test: stored fields...OK [723321 total field count; avg 3 fields
> per doc]
>
>
>
> final warning was:
>
>
> WARNING: 154 broken segments (containing 48127608 documents) detected
> WARNING: would write new segments file, and 48127608 documents would be
> lost, if -fix were specified
>
>
> As I mentioned - I have run optimization after the initial import (no further
> adds or deletions were made).
> For import I'm creating csv files and I'm loading them through csv upload
> with multiple threads.
>
> The index is otherwise queryable.
>
> Any ideas what should I do next? Is this a bug in lucene?
>
> Many thanks...
>
> Rok
>
>
>
>
>
>
>
>
>
> On Thu, Jun 7, 2012 at 5:05 PM, Jack Krupansky wrote:
>
>> Is the index otherwise usable for queries? And it is only the optimize
>> that is failing?
>>
>> I suppose it is possible that the index could be corrupted, but it is
>> also possible that there is a bug in Lucene.
>>
>> I would suggest running Lucene "CheckIndex" next. See what it has to say.
>>
>> See:
>> https://builds.apache.org/job/**Lucene-trunk/javadoc/core/org/**
>> apache/lucene/index/**CheckIndex.html#main(java.**lang.String[])
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Rok Rejc
>> Sent: Thursday, June 07, 2012 5:50 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Exception when optimizing index
>>
>>
>> Hi Jack,
>>
>> its the virtual machine running on a VMware vSphere 5 Enterprise Plus.
>> Machine has 30 GB vRAM, 8 core vCPU 3.0 GHz, 2 TB SATA RAID-10 over iSCSI.
>> Operation system is CentOS 6.2 64bit.
>>
>> Here are java infos:
>>
>>
>>  - catalina.​base/usr/share/**tomcat6
>>  - catalina.​home/usr/share/**tomcat6
>>  - catalina.​useNamingtrue
>>  - common.​loader
>>  ${catalina.base}/lib,${**catalina.base}/lib/*.jar,${**
>> catalina.home}/lib,${catalina.**home}/lib/*.jar
>>  - file.​encodingUTF-8
>>  - file.​encoding.​pkgsun.io
>>  - file.​separator/
>>  - java.​awt.​graphicsenvsun.awt.**X11GraphicsEnvironment
>>  - java.​awt.​printerjobsun.**print.PSPrinterJob
>>  - java.​class.​path
>>  /usr/share/tomcat6/bin/**bootstrap.jar
>>  /usr/share/tomcat6/bin/tomcat-**juli.jar/usr/share/java/**
>> commons-daemon.jar
>>  - java.​class.​version50.0
>>  - java.​endorsed.​dirs
>>  - java.​ext.​dirs
>>  /usr/lib/jvm/java-1.6.0-**openjdk-1.6.0.0.x86_64/jre/**lib/ext
>>  /usr/java/packages/lib/ext
>>  - java.​home/usr/lib/jvm/java-1.**6.0-openjdk-1.6.0.0.x86_64/jre
>>  - java.​io.​tmpdir/var/cache/**tomcat6/temp
>>  - java.​library.​path
>>  /usr/lib/jvm/java-1.6.0-**openjdk-1.6.0.0.x86_64/jre/**lib/amd64/server
>>  /usr/lib/jvm/java-1.6.0-**openjdk-1.6.0.0.x86_64/jre/**lib/amd64
>>  /usr/lib/jvm/java-1.6.0-**openjdk-1.6.0.0.x86_64/jre/../**lib/amd64
>>  /usr/java/packages/lib/amd64/**usr/lib64/lib64/lib/usr/lib
>>  - java.​naming.​factory.​initial
>>  org.apache.naming.java.**javaURLContextFactory
>>  - java.​naming.​factory.​url.

Re: help with map reduce

2012-06-11 Thread Gora Mohanty
On 12 June 2012 11:13, Sachin Aggarwal  wrote:
> hello,
>
>
> I need help writing a map-reduce program that can take records from an
> HBase table and insert them into a Lily repository...
> Which would be the better option: doing the indexing in the same job,
> or performing the insertion first and then calling a batch update?
>
> Whichever method we follow, please guide me on what the mapper and the
> reducer should do... I have gone through the pages given in the
> documentation but it is still not clear to me.

Sorry, why are you posting this question to lists where
it is not relevant? In particular, it is absurd to include
solr-user-request, and solr-user-subscribe.

Please try:
(a) Doing due diligence by searching Google. E.g.,
 searching for "Hbase Lily" seems to turn up
 many possibilities.
(b) Ask on a Hbase/Lily -specific list
You might also want to read http://wiki.apache.org/solr/UsingMailingLists

Regards,
Gora


Re: help with map reduce

2012-06-11 Thread Sachin Aggarwal
ok...my fault..

On Tue, Jun 12, 2012 at 11:31 AM, Gora Mohanty  wrote:

> On 12 June 2012 11:13, Sachin Aggarwal  wrote:
> > hello,
> >
> >
> > I need help to write a map reduce program that can take the records from
> > hbase table and insert into lily repository...
> > which method will be a better option to do doing indexing in the same job
> > or just perform insertion operation first and then call a batch update.
> >
> > Any method we follow...plz guide me what will the mapper will do and what
> > the reducer will do ...i have gone through the pages given in the
> > documentation still its not clear for me.
>
> Sorry, why are you posting this question to lists where
> it is not relevant? In particular, it is absurd to include
> solr-user-request, and solr-user-subscribe.
>
> Please try:
> (a) Doing due diligence by searching Google. E.g.,
> searching for "Hbase Lily" seems to turn up
> many possibilities.
> (b) Ask on a Hbase/Lily -specific list
> You might also want to read http://wiki.apache.org/solr/UsingMailingLists
>
> Regards,
> Gora
>



-- 

Thanks & Regards

Sachin Aggarwal
7760502772


Changing Index directory?

2012-06-11 Thread Bruno Mannina

Dear All,

For tests, I would like to install Solr in a standard directory
(/home/solr) but with the index on an external hard disk (/media/myExthdd).

I suppose it will decrease performance, but that's not a problem.

Where can I find the index directory path variable?

Thanks a lot,
Bruno


Re: Changing Index directory?

2012-06-11 Thread Bruno Mannina

On 12/06/2012 08:49, Bruno Mannina wrote:

Dear All,

For tests, I would like to install Solr on standard directory 
(/home/solr) but with the index in a External HardDisk (/media/myExthdd).

I suppose it will decrease performance but it's not a problem.

Where can I find the Index Directory Path variable?

Thanks a lot,
Bruno



Sorry, found it: solrconfig.xml ...
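
For reference, a minimal sketch (the path is just an example): in solrconfig.xml point dataDir at the external disk,

   <dataDir>/media/myExthdd/solr/data</dataDir>

or make it overridable from the command line with <dataDir>${solr.data.dir:/media/myExthdd/solr/data}</dataDir>.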