escaping special characters does not seem to be escaping in query
Hi All: I have a few fields whose values are of the form "A:2B" or "G:U2", and so on. I would like to be able to search these fields with a wildcard query such as A:2* or G:U*. I have tried modifying the fieldType definitions to allow such queries, but without any luck. Could someone provide me with a fieldType that uses the stock tokenizers and filters and allows the kind of search described above? Thanks much, Ramdev
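One possible cause of the behaviour described above, sketched under assumptions: with the classic Lucene/Solr query parser, the colon inside the value is itself query syntax, so it has to be backslash-escaped while the trailing * is left alone to act as the wildcard. The field name `code` is hypothetical, and the sketch assumes the field is indexed untokenized (for example as a string type), so that the prefix is compared against the whole stored value.

```python
import re

# Characters the classic Lucene/Solr query parser treats as syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(text: str) -> str:
    """Backslash-escape every query-syntax character in a literal term."""
    return _SPECIAL.sub(r'\\\1', text)


def prefix_query(field: str, prefix: str) -> str:
    """Build field:prefix* with the prefix escaped and the final * kept as a wildcard."""
    return f"{field}:{escape_term(prefix)}*"


if __name__ == "__main__":
    # "code" is a hypothetical field name used only for illustration.
    print(prefix_query("code", "A:2"))  # code:A\:2*
    print(prefix_query("code", "G:U"))  # code:G\:U*
```

When the query is sent over HTTP, the resulting string still has to be URL-encoded like any other parameter value.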
How do I go about adding a score attribute to a field
Hi All: I have been using Solr for a few months now. However, I have run into a situation where I need to attach additional values (such as a score) to the entries of a multivalued field. For example: field def : For each of the values, there is a corresponding score that I need to keep track of. The best way I can think of is for the score to be an attribute of the str tag within the multivalued field. Is there a way I could do this? Thanks for the community's help, Ramdev
is the SolrJ call to add collection of documents a blocking function call ?
Hi: I am trying to index a collection of SolrInputDocument objects to a Solr server. I was wondering whether the call I make to add the documents (the add(Collection) call) is a blocking call. I would also like to know whether the add call takes longer for a larger collection of documents. Thanks, Ramdev
Multi-valued polyfields - Do they exist in the wild ?
Hi: We have been keen on using polyfields for a while, but we have been held back because they do not seem to support multiple values (yet). I am wondering whether there are any custom implementations, or whether there is an ETA for a Solr release that includes multivalued polyfields. Thanks for the support, Ramde
copyField question
Hi: Is it possible to store a value and a corresponding score in Solr as part of a single field definition? And can this field be a multivalued field? I have several terms that are scored. I would like to store them as part of a single field definition rather than having to create two different fields (one storing the score and the other the value). However, if a multivalued complex data field is not possible, is it possible to use the copyField directive to copy fields only if a certain score is higher than a threshold? Thanks, Ramdev
Re: Multi-valued polyfields - Do they exist in the wild ?
Hi Yonik: Thanks. I am looking at a field (for example, Currency) that can have multiple values within a document (i.e. different currencies and their corresponding conversion rates). I would like to store that information as part of one multivalued field. Even better would be a solution that, when queried, returns only the items that satisfy the query (i.e. when a multivalued field is queried, the results are the entries within the multivalued field that satisfy the query). Is there any magic like that? If it is possible to store complex fields as multivalued fields, then maybe I could use copyField with a condition to copy only the content that satisfies a threshold score. Thanks for the support/answers, Ramdev
On 3/20/12 2:12 PM, "Yonik Seeley" wrote:
> On Tue, Mar 20, 2012 at 2:17 PM, wrote:
>> Hi:
>> We have been keen on using polyfields for a while. But we have been restricted from using it because they do not seem to support Multi-values (yet).
>
> Poly-fields should support multi-values, it's more what uses them may not. For example LatLon isn't multiValued because it doesn't have a mechanism to correlate multiple values per document.
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: copyField question
Hi Tomás: I think there is simplicity in your solution ;) A document would have tens of different values (20 at the most). So if I were to follow your suggestion of naming a dynamic field after the value and storing the corresponding score as the field's value, how would I go about changing the schema? Thanks, Ramdev
On 3/21/12 3:24 PM, "Tomás Fernández Löbbe" wrote:
>> However, If the multivalued complex data field is not possible. Is it possible to use copyField directive to copy fields if a certain score is higher than a threshold ?
> I don't think that's possible out of the box, but you could use a custom UpdateRequestProcessor for that.
>
> How many different values do you have? tens? hundreds? thousands?... millions? If those are not too many, you could use dynamic fields, using the value as field name and the score as field value. Unless I'm oversimplifying your problem.
>
> Tomás
>
> On Wed, Mar 21, 2012 at 5:16 PM, wrote:
>> Hi:
>> Is it possible to store a value and a corresponding score in Solr as part of a single Field definition. And Can this field be a multivalued field ?
>> I have several terms that are score. I would like to store them as part of a single field definition rather than having to create two different fields (one storing score and the other the value).
>>
>> However, If the multivalued complex data field is not possible. Is it possible to use copyField directive to copy fields if a certain score is higher than a threshold ?
>>
>> Thanks
>>
>> Ramdev
Re: copyField question
Hi Tomás: These fields are for searching only. Currently we have around 1.8M docs indexed. Assume each doc has about 20 of these additional fields to be created as dynamic fields (worst-case scenario), and that there are about 6K of these different values (i.e. if we were to create static field definitions, there would be 6K fields). I did create dynamic fields as you suggested, but only on a subset of the docs (10K). I have not done any extensive performance analysis on it (it is a rather simple schema/index structure). Thanks, Ramdev
On 3/22/12 7:42 AM, "Tomás Fernández Löbbe" wrote:
> I meant, how many values in total? A single document may have 20, but are those 20 shared with other documents (even if they have different scores), or will each document have 10-20 completely different values? I think Solr could handle a couple hundred fields, but I don't know how it would behave with thousands (really, I don't know; you should test it).
>
> You should be using a dynamic field for creating those fields dynamically, and make sure you have the omitNorms attribute set to true.
>
> What do you need to use those fields for? searching? displaying?
>
> On Wed, Mar 21, 2012 at 5:49 PM, wrote:
>> Hi Tomás:
>> I think there is simplicity in your solution ;) A document would have Tens of different values. (at the most 20)
>>
>> So If were to follow your suggestion of naming a dynamic field with the value as the name of the field and the corresponding Score as the value. How would I go about changing the schema ?
>>
>> Thanks
>>
>> Ramdev
>>
>> On 3/21/12 3:24 PM, "Tomás Fernández Löbbe" wrote:
>>>> However, If the multivalued complex data field is not possible. Is it possible to use copyField directive to copy fields if a certain score is higher than a threshold ?
>>> I don't think that's possible out of the box, but you could use a custom UpdateRequestProcessor for that.
>>>
>>> How many different values do you have? tens? hundreds? thousands?... millions? If those are not too many, you could use dynamic fields, using the value as field name and the score as field value. Unless I'm oversimplifying your problem.
>>>
>>> Tomás
>>>
>>> On Wed, Mar 21, 2012 at 5:16 PM, wrote:
>>>> Hi:
>>>> Is it possible to store a value and a corresponding score in Solr as part of a single Field definition. And Can this field be a multivalued field ?
>>>> I have several terms that are score. I would like to store them as part of a single field definition rather than having to create two different fields (one storing score and the other the value).
>>>>
>>>> However, If the multivalued complex data field is not possible. Is it possible to use copyField directive to copy fields if a certain score is higher than a threshold ?
>>>>
>>>> Thanks
>>>>
>>>> Ramdev
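The dynamic-field idea discussed in this thread (value as field name, score as field value) could be sketched on the client side roughly as follows. This is only an illustration under assumptions: the `score_` prefix is hypothetical and would have to match a dynamicField pattern such as `score_*` declared in the schema (with omitNorms set to true, as suggested above), and the threshold check stands in for the conditional copy that copyField does not offer out of the box.

```python
def build_doc(doc_id, scored_values, threshold=0.0, prefix="score_"):
    """Turn (value, score) pairs into one dynamic field per value.

    prefix is a hypothetical naming convention; the schema is assumed to
    declare a matching dynamicField pattern (for example "score_*").
    """
    doc = {"id": doc_id}
    for value, score in scored_values:
        # Client-side stand-in for a conditional copy: skip low scores.
        if score >= threshold:
            doc[prefix + value] = score
    return doc


if __name__ == "__main__":
    pairs = [("USD", 0.9), ("EUR", 0.2)]
    print(build_doc("d1", pairs, threshold=0.5))
    # {'id': 'd1', 'score_USD': 0.9}
```

With numeric dynamic fields, a search such as `score_USD:[0.5 TO *]` would then select documents whose score for that value is at least 0.5. Values that contain characters unsuitable for field names would need to be normalised first.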
help with Solr installation within Tomcat7
Hi All: I have just started using Solr and have it successfully installed within a Tomcat 7 web application server. I have also indexed documents using the SolrJ interfaces. My problem is the following: I installed Solr under the Tomcat 7 folders and set up an XML configuration file to indicate the Solr home variables, as detailed on the wiki (for a Solr install within Tomcat). The indexes seem to reside within the solr_home folder, under the data folder (/data/index). However, when I make a zip copy of the complete install (i.e. Tomcat with Solr), move it to a different machine, and unzip/install it, the index seems to be inaccessible. (I did change the solr.xml configuration variables to point to the new location.) From what I know, with Tomcat installations it should be as simple as zipping a current working installation and unzipping/installing it on a different machine/location. Am I missing something that makes Solr "hardcode" the path to the index in an install? Simply put, I would like to know how to "transport" an existing install of Solr within Tomcat 7 from one machine to another and still have it working. Ramdev
assist with the Clustering component in Solr/Lucene
Hi: I recently included the Clustering component in Solr and updated the requestHandler accordingly (in solrconfig.xml). Snippet of the config for the clustering component: default org.carrot2.clustering.lingo.LingoClusteringAlgorithm 20 stc org.carrot2.clustering.stc.STCClusteringAlgorithm Snippet of the config for the requestHandler: explicit true default true headline pi headline true false clusteringComponent When I perform a search, I see that the cluster section of the Solr results is not quite consistent: there are two documents that are each reported in two different clusters. Are there parameters that can be set to prevent this from happening? Thanks much, Ramdev
Re: assist with the Clustering component in Solr/Lucene
Hi Staszek: I added the parameter you suggested (LingoClusteringAlgorithm.clusterMergingThreshold) to the searchComponent section that configures the Clustering module. Changing the value of the parameter did not have any effect on my search results. However, when I used the Carrot2 Workbench, I could see the effect of changing the value (it went from 6 clusters down to 2). Here is the XML snippet for the searchComponent: default org.carrot2.clustering.lingo.LingoClusteringAlgorithm 20 0.0 I would appreciate any insights into this behavior. Thanks, Ramdev
On Mar 30, 2011, at 11:51 AM, Stanislaw Osinski wrote:
> Hi Ramdev,
>
> Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work. If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments.
>
> Alternatively, in case of the Lingo algorithm, you can try lowering the "LingoClusteringAlgorithm.clusterMergingThreshold" to some value in the range of 0.2--0.5. If you do that, clusters containing overlapping documents will get merged. For more information about this attribute, see here: http://download.carrot2.org/stable/manual/#section.attribute.LingoClusteringAlgorithm.clusterMergingThreshold.
>
> Cheers,
> Staszek
>
> On Wed, Mar 30, 2011 at 18:21, Markus Jelsma wrote:
>> Yes, you can set engine specific parameters. Check the comments in your snippet.
>>
>>> Hi:
>>> I recently included the CLustering component into Solr and updated the requestHandler accordingly (in solrconfig.xml). Snippet of the Config for the CLuserting:
>>>
>>> name="clusteringComponent" enable="${solr.clustering.enabled:false}" class="org.apache.solr.handler.clustering.ClusteringComponent"
>>> default
>>> name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm
>>> 20
>>> stc
>>> name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
>>>
>>> snippet of the Config for requestHandler
>>>
>>> default="true">
>>> explicit
>>> true
>>> default
>>> true
>>> headline
>>> pi
>>> headline
>>> true
>>> false
>>> clusteringComponent
>>>
>>> When I perform a search, I see that the Cluster section within the Solr results shows me results that are not quite consistent. There are two documents that are reported in two different documents
>>>
>>> Are there parameters that can be set that will prevent this from happening ?
>>>
>>> Thanks much
>>>
>>> Ramdev
Re: assist with the Clustering component in Solr/Lucene
That did make a difference: I now see exactly the number of clusters that I see in the Workbench. I am of course interested in why the config changes did not have much effect. However, I am happy that adding the threshold to my request URL produces the desired results. Let me know if I can do any more tests, and I will do so. Thanks much, Ramdev
On Mar 31, 2011, at 10:18 AM, Stanislaw Osinski wrote:
>> I added the parameter as you suggested. (LingoClusteringAlgorithm.clusterMergingThreshold) into the searchComponent section that describes the Clustering module Changing the value of the parameter did not have any effect on my search results. However, when I used the Carrot2 workbench, I could see the effect of changing the value. (from 6 clusters it went down to 2 clusters)
>
> Interesting... Can you, for the sake of debugging, append &LingoClusteringAlgorithm.clusterMergingThreshold=0.0 to your request URL?
>
> S.
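For anyone repeating the debugging step from this exchange, here is a small sketch of building a request URL with the threshold passed as a request parameter. The base URL and the query are hypothetical placeholders; only the parameter name LingoClusteringAlgorithm.clusterMergingThreshold comes from the thread, and any other parameters the clustering handler needs are left out.

```python
from urllib.parse import urlencode


def clustering_url(base, query, threshold):
    """Append the Lingo merging threshold to a search request URL."""
    params = {
        "q": query,
        # Parameter name as given in the thread; the value is sent per request.
        "LingoClusteringAlgorithm.clusterMergingThreshold": threshold,
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, port, and handler path; adjust to the actual setup.
    print(clustering_url("http://localhost:8983/solr/select", "solr", 0.0))
```

Passing the value per request, as done here, is the variant reported to work in the message above.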
Re: assist with the Clustering component in Solr/Lucene
Thanks much Stan, Ramdev
On May 16, 2011, at 11:38 AM, Stanislaw Osinski wrote:
> Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work. If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments.
>
> On second thought, I have a simple implementation of k-means clustering that could do hard clustering for you. It's not available yet; it will most probably be part of the next major release of Carrot2 (the package that does the clustering). Please watch this issue http://issues.carrot2.org/browse/CARROT-791 to get updates on this.
>
> Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and branch_3x, so you can use the bisecting k-means clustering algorithm (org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm), which will produce non-overlapping clusters for you. The downside of this simple implementation of k-means is that, for the time being, it produces one-word cluster labels rather than phrases as Lingo and STC do.
>
> Cheers,
> S.
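The post-processing route mentioned in the quoted reply (removing redundant document assignments so that each document ends up in exactly one cluster) could look roughly like the sketch below. The input shape, a list of (label, document ids) pairs in the order the clusters were returned, and the first-cluster-wins policy are assumptions made for illustration; they are not something Carrot2 or Solr prescribes.

```python
def harden(clusters):
    """Keep each document only in the first cluster that lists it.

    clusters: list of (label, [doc_id, ...]) pairs, assumed to be in the
    order returned by the clustering component.
    """
    seen = set()
    result = []
    for label, doc_ids in clusters:
        kept = [d for d in doc_ids if d not in seen]
        seen.update(kept)
        if kept:  # drop clusters that end up empty
            result.append((label, kept))
    return result


if __name__ == "__main__":
    overlapping = [("A", ["1", "2"]), ("B", ["2", "3"]), ("C", ["1"])]
    print(harden(overlapping))
    # [('A', ['1', '2']), ('B', ['3'])]
```

A different tie-break, such as preferring the smaller or the higher-scored cluster, only changes the order in which the clusters are visited.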