escaping special characters does not seem to be escaping in query

2011-08-30 Thread ramdev.wudali
Hi All:
I have a few fields that are of the form "A:2B" or "G:U2" and so on. I 
would like to be able to search the field using a wildcard search like 
A:2* or G:U*. I have tried modifying the field_type definitions to allow 
such queries, but without any luck.

Could anyone provide me with a fieldType that uses the canned 
tokenizers and filters and will allow me to do a search as described?

Thanks much

Ramdev 
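One possible starting point, offered as a sketch rather than a tested answer: keep each value as a single token with the stock KeywordTokenizerFactory, so that "A:2B" is indexed whole and a trailing-wildcard (prefix) query can match it. The type name code_exact and the field name code are made up for illustration. Lowercasing is deliberately left out, on the assumption that wildcard terms are not analyzed at query time in Solr releases of this period.

```xml
<!-- schema.xml: hypothetical field type; the name "code_exact" is an assumption -->
<fieldType name="code_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emit the whole field value (e.g. "A:2B") as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using that type -->
<field name="code" type="code_exact" indexed="true" stored="true"/>
```

With such a field, the colon inside the value has to be escaped in the query so that the query parser does not read it as a field separator, for example code:A\:2* or code:G\:U*.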



How do I go about adding a score attribute to a field

2012-01-09 Thread ramdev.wudali
Hi All:
I have been using Solr for a few months now. However, I have run into a 
situation where I now need to add additional values (like a score) to a 
multivalued field. 
for example:
   field def : 
   

For each of the values, there is a corresponding score that I need to keep 
track of. The best way I can think of is, for the score to be an attribute to 
the str tag within the multivalued field 

Is there a way I could do this ?

Thanks for the community's help

Ramdev




is the SolrJ call to add collection of documents a blocking function call ?

2012-03-19 Thread ramdev.wudali
Hi:
   I am trying to index a collection of SolrInputDocument objects to a Solr server. I was 
wondering if the call I make to add the documents (the 
add(Collection<SolrInputDocument>) call) is a blocking function call?

I would also like to know whether the add call takes longer for 
a larger collection of documents.


Thanks

Ramdev


Multi-valued polyfields - Do they exist in the wild ?

2012-03-20 Thread ramdev.wudali
Hi:
   We have been keen on using polyfields for a while, but we have been 
restricted from using them because they do not seem to support multiple values 
(yet). I am wondering whether there are any custom implementations, or whether 
there is an ETA for a Solr release that includes multivalued polyfields.

Thanks for the support

Ramdev


copyField question

2012-03-21 Thread ramdev.wudali
Hi:
   Is it possible to store a value and a corresponding score in Solr as part 
of a single field definition? And can this field be a multivalued field?
I have several terms that are scored. I would like to store them as part of a 
single field definition rather than having to create two different fields (one 
storing the score and the other the value).

However, if a multivalued complex data field is not possible, is it possible 
to use the copyField directive to copy fields only if a certain score is higher 
than a threshold?


Thanks

Ramdev


Re: Multi-valued polyfields - Do they exist in the wild ?

2012-03-21 Thread ramdev.wudali
Hi Yonik:
Thanks. I am looking for a field (example: Currency) which can have
multiple values within a document (i.e. different currencies and
corresponding conversion rates).
I would like to store that information as part of one multivalued field.

Even better would be a solution that, when queried, would return
only the items that satisfy the query (i.e. when a multivalued field is
queried, the results are the entries within the multivalued field
that satisfy the query)...

Is there any magic like that?

If it is possible to store complex fields as multivalued fields,
then maybe I could use copyField with a condition to copy only the
content that satisfies a threshold score.


Thanks for the support/answers

Ramdev



On 3/20/12 2:12 PM, "Yonik Seeley"  wrote:

>On Tue, Mar 20, 2012 at 2:17 PM,  
>wrote:
>> Hi:
>>   We have been keen on using polyfields for a while. But we have been
>>restricted from using it because they do not seem to support
>>Multi-values (yet).
>
>Poly-fields should support multi-values, it's more what uses them may not.
>For example LatLon isn't multiValued because it doesn't have a
>mechanism to correlate multiple values per document.
>
>-Yonik
>lucenerevolution.com - Lucene/Solr Open Source Search Conference.
>Boston May 7-10



Re: copyField question

2012-03-21 Thread ramdev.wudali
Hi Tomás:
   I think there is simplicity in your solution ;)  A document would have
tens of different values (at most 20)...

So if I were to follow your suggestion of naming a dynamic field with the
value as the name of the field and the corresponding score as the value,
how would I go about changing the schema?

Thanks

Ramdev
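A minimal sketch of the schema change being asked about, assuming a score_ prefix for the dynamic fields and a numeric field type named float that supports range queries (both are assumptions, not something stated in the thread). Each distinct value becomes a field name, and its score becomes that field's value.

```xml
<!-- schema.xml, inside <fields>: one pattern covers every value-named field.
     The "score_" prefix is hypothetical; "float" must match a fieldType
     that is actually defined in your schema. -->
<dynamicField name="score_*" type="float" indexed="true" stored="true" omitNorms="true"/>

<!-- Update message: a document carrying two scored values.
     "id", "USD" and "EUR" are illustrative names only. -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="score_USD">0.87</field>
    <field name="score_EUR">0.42</field>
  </doc>
</add>
```

A range query such as score_USD:[0.5 TO *] would then select documents whose score for that value is at least 0.5. Values containing spaces or query-syntax characters would probably need to be normalized before being used as field names.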


On 3/21/12 3:24 PM, "Tomás Fernández Löbbe"  wrote:

>> However, If the multivalued complex data field is not possible. Is it
>> possible to use copyField directive to copy fields if a certain score is
>> higher than a threshold ?
>I don't think that's possible out of the box, but you could use a custom
>UpdateRequestProcessor for that.
>
>How many different values do you have? tens? hundreds? thousands?...
>millions? If those are not too many, you could use dynamic fields, using
>the value as field name and the score as field value. Unless I'm
>oversimplifying your problem.
>
>Tomás
>
>
>On Wed, Mar 21, 2012 at 5:16 PM,  wrote:
>
>> Hi:
>>   Is it possible to store a value and a corresponding score in Solr as
>> part of a single Field definition. And Can this field be a multivalued
>> field ?
>> I have several terms that are scored. I would like to store them as part of
>> a single field definition rather than having to create two different fields
>> (one storing score and the other the value).
>>
>> However, If the multivalued complex data field is not possible. Is it
>> possible to use copyField directive to copy fields if a certain score is
>> higher than a threshold ?
>>
>>
>> Thanks
>>
>> Ramdev
>>



Re: copyField question

2012-03-22 Thread ramdev.wudali
Hi Tomas:

These fields are for searching only.

Currently we have around 1.8M docs indexed. Assume each doc has about
20 of these additional fields to be created as dynamic fields (worst-case
scenario), and that there are about 6K of these different values (i.e. if
we were to create static field definitions, there would be 6K fields).

I did create dynamic fields as you suggested, but only on a subset of docs
(10K). I have not done extensive performance analysis on it or anything (it's a
rather simple schema/index structure).


Thanks

Ramdev


On 3/22/12 7:42 AM, "Tomás Fernández Löbbe"  wrote:

>I meant, how many values in total? A single document may have 20, but are
>those 20 shared with other documents (even if they have different scores), or
>will each document have 10-20 completely different values? I think Solr
>could handle a couple hundred fields, but I don't know how it would
>behave with thousands (really, I don't know; you should test it).
>
>You should be using a dynamic field for creating those fields dynamically,
>and make sure you have the omitNorms attribute set to true.
>
>What do you need to use those fields for? searching? displaying?
>
>
>On Wed, Mar 21, 2012 at 5:49 PM,  wrote:
>
>> Hi Tomás:
>>   I think there is simplicity in your solution ;)  A document would have
>> Tens of different values. (at the most 20)...
>>
>> So If I were to follow your suggestion of naming a dynamic field with the
>> value as the name of the field and the corresponding Score as the value.
>> How would I go about changing the schema ?
>>
>> Thanks
>>
>> Ramdev
>>
>>
>> On 3/21/12 3:24 PM, "Tomás Fernández Löbbe" wrote:
>>
>> >> However, If the multivalued complex data field is not possible. Is it
>> >> possible to use copyField directive to copy fields if a certain score is
>> >> higher than a threshold ?
>> >I don't think that's possible out of the box, but you could use a custom
>> >UpdateRequestProcessor for that.
>> >
>> >How many different values do you have? tens? hundreds? thousands?...
>> >millions? If those are not too many, you could use dynamic fields, using
>> >the value as field name and the score as field value. Unless I'm
>> >oversimplifying your problem.
>> >
>> >Tomás
>> >
>> >
>> >On Wed, Mar 21, 2012 at 5:16 PM, wrote:
>> >
>> >> Hi:
>> >>   Is it possible to store a value and a corresponding score in Solr as
>> >> part of a single Field definition. And Can this field be a multivalued
>> >> field ?
>> >> I have several terms that are scored. I would like to store them as part of
>> >> a single field definition rather than having to create two different fields
>> >> (one storing score and the other the value).
>> >>
>> >> However, If the multivalued complex data field is not possible. Is it
>> >> possible to use copyField directive to copy fields if a certain score is
>> >> higher than a threshold ?
>> >>
>> >>
>> >> Thanks
>> >>
>> >> Ramdev
>> >>
>>
>>



help with Solr installation within Tomcat7

2011-03-22 Thread ramdev.wudali
Hi All:
   I have just started using Solr and have it successfully installed within a 
Tomcat7 webapp server.
I have also indexed documents using the SolrJ interfaces. The following is my 
problem:

I installed Solr under the Tomcat7 folders and set up an XML configuration file to 
indicate the Solr home variables as detailed on the wiki (for Solr install 
within Tomcat).
The indexes seem to reside within the solr_home folder under the data folder 
(/data/index).

However, when I make a zip copy of the complete install (i.e. Tomcat with 
Solr), move it to a different machine, and unzip/install it, 
the index seems to be inaccessible. (I did change the solr.xml configuration 
variables to point to the new location.)

From what I know, with Tomcat installations, it should be as simple as zipping 
a current working installation and unzipping/installing it on a different 
machine/location.

Am I missing something that makes Solr "hardcode" the path to the index in an 
install?

Simply put, I would like to know how to "transport" an existing install of 
Solr within Tomcat 7 from one machine to another and still have it working.

Ramdev
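For reference, the Tomcat context fragment described on the Solr wiki usually looks like the sketch below. The paths shown are placeholders, not the poster's actual locations. Both attributes hold absolute paths, so they are a natural first thing to re-check after unzipping on another machine.

```xml
<!-- Typically placed at $CATALINA_HOME/conf/Catalina/localhost/solr.xml.
     Both paths are placeholders and must exist on the new machine. -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- JNDI entry that tells the Solr webapp where its home directory is -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/solr_home" override="true"/>
</Context>
```

It may also be worth checking the dataDir element in solrconfig.xml: in some releases a relative value there is resolved against the directory Tomcat is started from, which would make the index location depend on how the server is launched.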


assit with the Clustering component in Solr/Lucene

2011-03-30 Thread ramdev.wudali
Hi:
  I recently included the clustering component in Solr and updated the 
requestHandler accordingly (in solrconfig.xml).
Snippet of the config for the clustering:

  


  
  default
  
  org.carrot2.clustering.lingo.LingoClusteringAlgorithm
  
  20


  stc
  org.carrot2.clustering.stc.STCClusteringAlgorithm

  

snippet of the Config for requestHandler
  

 
   explicit
   
   true
   default
   true
   
   headline
   pi
   
   headline
   
   true
   
   
   
   false
 

  clusteringComponent

  


When I perform a search, I see that the cluster section within the Solr results
shows me results that are not quite consistent. There are two documents that 
are each reported in two different clusters.

Are there parameters that can be set that will prevent this from happening ?


Thanks much

Ramdev
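The archive stripped the XML tags from both snippets in this message. What follows is a best-effort reconstruction of the searchComponent only. The name, enable and class attributes and the carrot.algorithm parameter name survive in the quoted copy further down this thread; name="name" and LingoClusteringAlgorithm.desiredClusterCountBase are assumptions taken from the stock Solr example configuration and may not match what was actually posted.

```xml
<searchComponent name="clusteringComponent"
                 enable="${solr.clustering.enabled:false}"
                 class="org.apache.solr.handler.clustering.ClusteringComponent">
  <!-- First engine: Lingo, registered under the name "default" -->
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <!-- Assumed parameter name for the value 20 shown in the post -->
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
  </lst>
  <!-- Second engine: STC -->
  <lst name="engine">
    <str name="name">stc</str>
    <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
  </lst>
</searchComponent>
```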



Re: assit with the Clustering component in Solr/Lucene

2011-03-31 Thread ramdev.wudali
Hi Staszek:
 I added the parameter you suggested 
(LingoClusteringAlgorithm.clusterMergingThreshold) to the searchComponent 
section that describes the clustering module.
Changing the value of the parameter did not have any effect on my search 
results.

However, when I used the Carrot2 workbench, I could see the effect of changing 
the value (it went from 6 clusters down to 2).

here is the XML snippet for the searchComponent:

  


  
  default
  
  org.carrot2.clustering.lingo.LingoClusteringAlgorithm
  
  20
  0.0

  


I would appreciate any insights into this behavior. 

Thanks

Ramdev


On Mar 30, 2011, at 11:51 AM, Stanislaw Osinski wrote:


Hi Ramdev,

Both of the clustering algorithms that ship with Solr (Lingo and STC) 
are designed to allow one document to appear in more than one cluster, which 
actually does make sense in many scenarios. There's no easy way to force them 
to produce hard clusterings because this would require a complete change in the 
way the algorithms work. If you need each document to belong to exactly one 
cluster, you'd have to post-process the clusters to remove the redundant 
document assignments. Alternatively, in case of the Lingo algorithm, you can 
try lowering the "LingoClusteringAlgorithm.clusterMergingThreshold" to some 
value in the range of 0.2--0.5. If you do that, clusters containing overlapping 
documents will get merged. For more information about this attribute, see here: 
http://download.carrot2.org/stable/manual/#section.attribute.LingoClusteringAlgorithm.clusterMergingThreshold.

Cheers,

Staszek


On Wed, Mar 30, 2011 at 18:21, Markus Jelsma wrote:


Yes, you can set engine-specific parameters. Check the comments in your
snippet.


> Hi:
>   I recently included the CLustering component into Solr and 
updated the
> requestHandler accordingly (in solrconfig.xml). Snippet of 
the Config for
> the CLuserting:
>
>name="clusteringComponent"
> enable="${solr.clustering.enabled:false}"
> 
class="org.apache.solr.handler.clustering.ClusteringComponent" >
> 
> 
>   
>   default
>   
>
name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgori
> thm 
>   20
> 
> 
>   stc
>
name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm<
> /str> 
>   
>
> snippet of the Config for requestHandler
>default="true"> 
>  
>explicit
>
>true
>default
>true
>
>headline
>pi
>
>headline
>
>true
>
>
>
>false
>  
> 
>   clusteringComponent
> 
>   
>
>
> When I perform a search, I see that the Cluster section 
within the Solr
> results shows me results that are not quite consistent. There 
are two
> documents that are reported in two different documents
>
> Are there parameters that can be set that will prevent this 
from happening
> ?
>
>
> Thanks much
>
> Ramdev






Re: assit with the Clustering component in Solr/Lucene

2011-03-31 Thread ramdev.wudali
That did make a difference. I now see the exact number of clusters I see in 
the workbench.
I am of course interested in why the config changes did not have much effect. 
However, I am happy that adding the threshold to my request URL produces the 
desired results.

Let me know if I can do any more tests and I will do so. Thanks much

Ramdev



On Mar 31, 2011, at 10:18 AM, Stanislaw Osinski wrote:



 I added the parameter as you suggested. 
(LingoClusteringAlgorithm.clusterMergingThreshold) into the searchComponent 
section that describes the Clustering module
Changing the value of the parameter  did not have any effect on 
my search results.

However, when I used the Carrot2 workbench, I could see the 
effect of changing the value. (from 6 clusters it went down to 2 clusters)


Interesting... Can you, for the sake of debugging, append 
&LingoClusteringAlgorithm.clusterMergingThreshold=0.0 to your request URL?

S.
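Since request-handler defaults are applied as ordinary request parameters, the URL parameter suggested above can probably also be set once in solrconfig.xml instead of on every request. A sketch, assuming a handler named /clustering (hypothetical; the thread does not give the handler's name) and the component name clusteringComponent from the original post. This has not been verified against the poster's setup.

```xml
<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Intended to act like appending
         &LingoClusteringAlgorithm.clusterMergingThreshold=0.0 to the request URL -->
    <str name="LingoClusteringAlgorithm.clusterMergingThreshold">0.0</str>
  </lst>
  <arr name="last-components">
    <str>clusteringComponent</str>
  </arr>
</requestHandler>
```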






Re: assit with the Clustering component in Solr/Lucene

2011-05-16 Thread ramdev.wudali
Thanks much Stan,


Ramdev

On May 16, 2011, at 11:38 AM, Stanislaw Osinski wrote:


Both of the clustering algorithms that ship with Solr 
(Lingo and STC) are designed to allow one document to appear in more than one 
cluster, which actually does make sense in many scenarios. There's no easy way 
to force them to produce hard clusterings because this would require a complete 
change in the way the algorithms work. If you need each document to belong to 
exactly one cluster, you'd have to post-process the clusters to remove the 
redundant document assignments.



On second thought, I have a simple implementation of 
k-means clustering that could do hard clustering for you. It's not available 
yet; it will most probably be part of the next major release of Carrot2 (the 
package that does the clustering). Please watch this issue 
http://issues.carrot2.org/browse/CARROT-791 to get updates on this.



Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and 
branch_3x, so you can use the bisecting k-means clustering algorithm 
(org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm), which will 
produce non-overlapping clusters for you. The downside of this simple 
implementation of k-means is that, for the time being, it produces one-word 
cluster labels rather than phrases as Lingo and STC do.

Cheers,

S.
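To try the algorithm mentioned above, an additional engine could be declared next to the existing ones inside the clustering searchComponent. The engine name kmeans is an arbitrary label chosen for this sketch; only the algorithm class name comes from the message.

```xml
<!-- Goes inside the existing clustering <searchComponent>, alongside the other engines -->
<lst name="engine">
  <!-- "kmeans" is a hypothetical engine name -->
  <str name="name">kmeans</str>
  <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
</lst>
```

The engine would then be selected per request with clustering.engine=kmeans.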