Grouping based on multiple filters/criteria
Is it possible to have multiple filters/criteria on grouping? I am trying to do something like the tickets below, and judging from their statuses, I am assuming it isn't possible?

https://issues.apache.org/jira/browse/SOLR-2553
https://issues.apache.org/jira/browse/SOLR-2526
https://issues.apache.org/jira/browse/LUCENE-3257

To make everything clear, here are the details of what I am planning to do with Solr. There is an activity feed on a site that basically works like the Facebook or LinkedIn news feed, though there is no relationship between users: it doesn't matter whether I am following someone or not; as long as their settings allow me to see their posts and they hit my search filter, I will see their posts.

The part related to grouping is tricky. Let's assume that you are able to see my posts, and I have posted 8 activities in the last hour. Those activities should appear differently from other posts, as a combined view of the posts, i.e.:

activity one
activity two
.
activity eight
single activity
single activity
activity one
activity two

So here the results should be grouped depending on their post times. On Solr 4.7.2, I am indexing activities as documents, and each document has a bunch of fields including timestamp and source_user etc. Is it possible to do this on current Solr? (In case the details are not clear, please feel free to ask for more details :) )

- Zeki ama calismiyor... Calissa yapar...
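For comparison, plain result grouping on the user field is the closest out-of-the-box feature; a minimal SolrJ sketch follows, where the core name "feed" is made up and the field names are taken from the message above. Grouping on a computed bucket such as "same user within the same hour" is exactly what the linked tickets would be needed for.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.GroupParams;

    public class FeedGrouping {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/feed"); // core name assumed
            SolrQuery q = new SolrQuery("*:*");
            q.set(GroupParams.GROUP, true);                  // group=true
            q.set(GroupParams.GROUP_FIELD, "source_user");   // one group per posting user
            q.set(GroupParams.GROUP_SORT, "timestamp desc"); // newest activity first inside a group
            q.set(GroupParams.GROUP_LIMIT, 8);               // up to 8 activities per combined entry
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getGroupResponse().getValues().get(0).getValues().size() + " groups returned");
            server.shutdown();
        }
    }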
Please help: filtering with group.limit
Dear everyone. My problem involves two query steps: 1) get the top 1 of each group (group.limit=1 AND group.sort=date desc & group.field=ABC); 2) filter the document of each group against a condition; if the document doesn't match the condition, remove it from the result list. Help me. Thank you. Hải
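For step 1, the grouping parameters map directly onto a request; a SolrJ sketch follows, where the core name and the status:active condition are made-up placeholders. Note that fq filters documents before they are grouped; there is no built-in "having"-style filter that drops whole groups afterwards, so step 2 has to be done client-side on the returned groups.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class TopPerGroup {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // core assumed
            SolrQuery q = new SolrQuery("*:*");
            q.set("group", true);
            q.set("group.field", "ABC");
            q.set("group.sort", "date desc");  // newest document wins within each group
            q.set("group.limit", 1);           // keep only the top document per group
            q.addFilterQuery("status:active"); // placeholder condition; applied BEFORE grouping
            System.out.println(server.query(q).getGroupResponse().getValues());
            server.shutdown();
        }
    }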
optimize and .nfsXXXX files
Hi, I am using Solr 3.6.2. I use NFS and my index folder is a mounted folder. When I run the command: :/solr/collection1/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true in order to optimize my index, some .nfsXXXX files are created while the optimize is running. The problem I am having is that after the optimize finishes its run, the .nfs files aren't deleted. When I close the Solr process they immediately disappear. I don't want to restart the Solr process after each optimize; is there anything that can be done so that Solr gets rid of those files? Thanks,
Solr clustering component gives different results than the Carrot2 Workbench
Though I am interacting with Dawid (the creator of Carrot2) on the Carrot2 mailing list, I just wanted to post my problem to a wider audience. I am using Solr 4.7 (on both Windows and Linux) and saved my lingo-attributes.xml file from the Workbench, which I am using in Solr. Note that for testing I have just one Solr index and all the queries are fired against it. Now the clusters that I am getting are good in the Workbench (Carrot2) but pathetic in Solr. In the (Jetty) logs I can see: Loaded Solr resource: clustering/carrot2/lingo-attributes.xml, so that indicates that my attribute file is being loaded. I am really confused about what accounts for the difference in the two outputs (Workbench vs. Solr). Again, to reiterate, the data sources are the same (just one Solr index and the same queries with 100 results). This is happening on both Linux and Windows. Given below is my search component and request handler configuration: lingo org.carrot2.clustering.lingo.LingoClusteringAlgorithm 30 clustering/carrot2 true true org.carrot2.clustering.lingo.LingoClusteringAlgorithm clustering/carrot2 film_id description true false 100 clustering
Re: optimize and .nfsXXXX files
Soft commit (i.e. opening a new IndexReader in Lucene and closing the old one) should make those go away? The .nfsX files are created when a file is deleted but a local process (in this case, the current Lucene IndexReader) still has the file open. Mike McCandless http://blog.mikemccandless.com On Mon, Aug 18, 2014 at 5:20 AM, BorisG wrote: > Hi, > I am using solr 3.6.2. > I use NFS and my index folder is a mounted folder. > When I run the command: > :/solr/collection1/update?optimize=true&maxSegments=1&waitFlush=true&expungeDeletes=true > in order to optimize my index, I have some .nfsX files created while the > optimize is running. > The problem that i am having is that after optimize finishes its run the > .nfs files aren't deleted. > When I close the solr process they immediately disappear. > I don't want to restart the solr process after each optimize, is there > anything that can be done in order for solr to get rid of those files. > > Thanks, > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/optimize-and-nfs-files-tp4153473.html > Sent from the Solr - User mailing list archive at Nabble.com.
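A practical note: Solr 3.6 has no soft commits (those arrived with 4.0), but any commit that reopens the searcher has the effect Mike describes. A minimal SolrJ 3.6.x sketch, using the stock example core URL as a placeholder; the same thing over HTTP is simply an update request with commit=true:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitToReleaseNfsFiles {
        public static void main(String[] args) throws Exception {
            // HttpSolrServer is the SolrJ client class in the 3.6.x line
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
            // waitFlush=true, waitSearcher=true: when this returns, a new searcher
            // is open and the old IndexReader has been closed, releasing its file
            // handles so NFS can reap the .nfsXXXX files.
            server.commit(true, true);
            server.shutdown();
        }
    }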
Editing http://wiki.apache.org/solr/PublicServers
Hi, My name is Istvan Kulcsar and I would like to edit this page: http://wiki.apache.org/solr/PublicServers

Here are some Solr-powered search sites:
http://www.odrportal.hu/kereso/
http://idea.unideb.hu/idealista/
http://www.jobmonitor.hu
http://www.profession.hu/
http://webicina.com/
http://www.cylex.hu/
http://kozbeszerzes.ceu.hu/

Thanks for the help. Greets, Steve
Re: Retrieving and updating large set of documents on Solr 4.7.2
Hi, Not sure if you've seen https://issues.apache.org/jira/browse/SOLR-5244 ? It's not in Solr 4.7.2, but may be a good excuse to update Solr. Otis -- Solr Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Aug 18, 2014 at 4:09 AM, deniz wrote: > I am trying to implement an activity feed for a website, and planning to > use > Solr for this case. As it does not have any follower/following relation, > Solr is fitting for the requirements. > > There is one point which makes me concerned about performance. So as user > A, > I may have 10K activities in the feed, and then I have updated my > preferences, so the activities that I have posted should be updated too > (imagine that I am changing my user name, so all of the activities would > have my new username). In order to update the all 10K activities, i need to > retrieve the unique document ids from Solr, then update them. Retrieving > 10K > docs at once is not a good idea, if you imagine bunch of other users are > also doing a similar change. I have checked docs and forums, using Cursors > on Solr seems ok, but still makes me think about the performance (after id > retrieval, i need to update each activity) > > Are there any other ways to handle this without Cursors? Or I should better > use another tool/backend to have something like a username - activity_id > mapping, so i can directly retrieve the ids to update? > > Regards, > > > - > Zeki ama calismiyor... Calissa yapar... > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Retrieving-and-updating-large-set-of-documents-on-Solr-4-7-2-tp4153457.html > Sent from the Solr - User mailing list archive at Nabble.com. >
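Since 4.7 already ships cursorMark support, here is a hedged sketch of the cursor plus atomic-update loop deniz describes. The core name and the username/source_user field names are assumptions taken from the question, and atomic updates additionally require an updateLog and stored fields:

    import java.util.Collections;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class RenameUser {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/feed"); // core name assumed
            SolrQuery q = new SolrQuery("source_user:userA"); // field name assumed
            q.setRows(500);
            q.setSort(SolrQuery.SortClause.asc("id"));        // cursors need a sort ending on the uniqueKey
            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = server.query(q);
                for (SolrDocument d : rsp.getResults()) {
                    SolrInputDocument update = new SolrInputDocument();
                    update.addField("id", d.getFieldValue("id"));
                    // atomic update: only the username field is rewritten
                    update.addField("username", Collections.singletonMap("set", "newName"));
                    server.add(update);
                }
                String next = rsp.getNextCursorMark();
                if (cursor.equals(next)) break; // unchanged cursor means no more pages
                cursor = next;
            }
            server.commit();
            server.shutdown();
        }
    }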
Re: How to search for phrase "IAE_UPC_0001"
Hi Guys I've been checking into this further and have deleted the index a couple of times and rebuilt it with the suggestions you've supplied. I had a bit of an epiphany last week and decided to check if the document I was searching for was actually in the index (did this by doing a *:* query to a file and grep'ing for the 'IAE_UPC_0001' string). It seems it isn't!! Not sure if it was in the original index or not, tho' I suspect not. As far as I can see anything with the reference in the form IAE_UPC_ has not been indexed while those with the reference in the form IAE-UPC- has. Not sure if that's a coincidence or not. Need to see if I can get the docs into the index and then check if the search works or not. Will see if the guys on the Nutch list can shed any light. All the best. P On 4 August 2014 17:09, Jack Krupansky wrote: > The standard tokenizer treats underscore as a valid token character, not a > delimiter. > > The word delimiter filter will treat underscore as a delimiter though. > > Make sure your query-time WDF does not have preserveOriginal="1" - but the > index-time WDF should have preserveOriginal="1". Otherwise, the query > phrase will generate an extra token which will participate in the matching > and might cause a mismatch. > > -- Jack Krupansky > > -Original Message- From: Paul Rogers > Sent: Monday, August 4, 2014 5:55 PM > > To: solr-user@lucene.apache.org > Subject: Re: How to search for phrase "IAE_UPC_0001" > > Hi Guys > > Thanks for the replies. I've had a look at the WordDelimiterFilterFactory > and the Term Info for the url field. It seems that all the terms exist and > I now understand that each url is being broken up using the delimiters > specified. But I think I'm still missing something. > > Am I correct in assuming the minus sign (-) is also a delimiter? > > If so why then does url:"IAE-UPC-0001" return a result (when the url > contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't > (when the url contains the substring IAE_UPC_0001)? > > Secondly if the url has indeed been broken into the terms IAE UPC and 0001 > why do all the searches suggested or tried succeed when the delimiter is a > minus sign (-) but not when the delimiter is an underscore (_), returning > zero matches? > > Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is > looking for is the three terms? > > Many thanks for any enlightenment. > > P > > > > > On 4 August 2014 01:33, Harald Kirsch wrote: > > This all depends on how the tokenizers take your URLs apart. To quickly >> see what ended up in the index, go to a core in the UI, select Schema >> Browser, select the field containing your URLs, click on "Load Term Info". >> >> In your case, for the field holding the URL you could try to switch to a >> tokenizer that defines tokens as a sequence of alphanumeric characters, >> roughly [a-z0-9]+ plus diacritics. In particular punctuation and >> separation >> characters like dash, underscore, slash, dot and the like would never be >> part of a token, i.e. they don't make a difference. >> >> Then you can search the url parts with a phrase query ( >> https://cwiki.apache.org/confluence/display/solr/The+ >> Standard+Query+Parser#TheStandardQueryParser- >> SpecifyingTermsfortheStandardQueryParserwhich) like >> >> url:"IAE-UPC-0001" >> >> In the same way as during indexing, the dashes are removed to end up with >> three tokens, namely IAE, UPC and 0001. Further they have to be in that >> order. Naturally this will then match anything like: >> >> "IAE_UPC_0001" >> "IAE UPC 0001" >> "IAE/UPC+0001" >> "IAE\UPC\0001" >> "IAE.UPC,0001" >> >> Depending on how your URLs are structured, there is the chance for false >> positives, of course. >> >> The Really Good Thing here is, that you don't need to use wildcards. >> >> I have not yet looked at the wildcard-queries implementation in >> Solr/Lucene, but with the commercial search engines I know, they are a >> great way to lose the confidence of your users, because they just don't >> work as expected by anyone not knowing the implementation. Either they >> deliver only partial results or they kill the performance or they even go >> OOM. If Solr committers have not done something really ingenious, >> Solr/Lucene does have the same problems. >> >> Harald. >> >> >> >> >> >> >> On 31.07.2014 18:31, Paul Rogers wrote: >> >> Hi Guys >>> >>> I have a Solr application searching on data uploaded by Nutch. The >>> search >>> I wish to carry out is for a particular document reference contained >>> within >>> the "url" field, e.g. IAE-UPC-0001. >>> >>> The problem is is that the file names that comprise the url's are not >>> consistent, so a url might contain the reference as IAE-UPC-0001 or >>> IAE_UPC_0001 (ie using either the minus or underscore as the delimiter) >>> but >>> not both. >>> >>> I have created the query (in the solr admin interface): >>> >>> url:"IAE-UPC-0001" >>> >>> which works (retur
Combining a String Tag with a Numeric Value
Hello! I have some new entity data that I'm indexing which takes the form of:

String: EntityString
Float: Confidence

I want to add these to a generic "Tags" field (for faceting), but I'm not sure how to hold onto the confidence. Token Payloads seem like one method, but then I'm not sure how to extract the Payload. Alternatively I could create two fields: TagIndexed which stores just the string value and TagStored which contains a delimited String|Float. What's the right way to do this? Thanks! -D
Re: Editing http://wiki.apache.org/solr/PublicServers
Steve: Sure. What we need in order to add you to the contributor's group is your Wiki logon, though. Provide us with that and we'll add you ASAP. Best, Erick On Mon, Aug 18, 2014 at 3:14 AM, wrote: > Hy, > > My name is Istvan Kulcsar and i would like to edit this page: > http://wiki.apache.org/solr/PublicServers > > Here is some SOLR search: > http://www.odrportal.hu/kereso/ > http://idea.unideb.hu/idealista/ > http://www.jobmonitor.hu > http://www.profession.hu/ > http://webicina.com/ > http://www.cylex.hu/ > Én (14.08.13 23:14) > http://kozbeszerzes.ceu.hu/ > > Thanks for help. > > Greets, > Steve
Re: How to search for phrase "IAE_UPC_0001"
I'd pull Nutch out of the mix here as a test. Create some test docs (use the exampleDocs directory?) and go from there at least long enough to insure that Solr does what you expect if the data gets there properly. You can set this up in about 10 minutes, and test it in about 15 more. May save you endless hours. Because you're conflating two issues here: 1> whether Nutch is sending the data 2> whether Solr is indexing and searching as you expect. Some of the Solr/Lucene analysis chains do transformations that may not be what you assume, particularly things like StandardTokenizer and WordDelimiterFilterFactory. So I'd take the time to see that the values you're dealing with are behaving as you expect. The admin/analysis page will help you a _lot_ here. Best, Erick On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers wrote: > Hi Guys > > I've been checking into this further and have deleted the index a couple of > times and rebuilt it with the suggestions you've supplied. > > I had a bit of an epiphany last week and decided to check if the document I > was searching for was actually in the index (did this by doing a *.* query > to a file and grep'ing for the 'IAE_UPC_0001@ string). It seems it isn't!! > Not sure if it was in the original index or not, tho' I suspect not. > > As far as I can see anything with the reference in the form IAE_UPC_ > has not been indexed while those with the reference in the form > IAE-UPC- has. Not sure if that's a coincidence or not. > > Need to see if I can get the docs into the index and then check if the > search works or not. Will see if the guys on the Nutch list can shed any > light. > > All the best. > > P > > > On 4 August 2014 17:09, Jack Krupansky wrote: > >> The standard tokenizer treats underscore as a valid token character, not a >> delimiter. >> >> The word delimiter filter will treat underscore as a delimiter though. >> >> Make sure your query-time WDF does not have preserveOriginal="1" - but the >> index-time WDF should have preserveOriginal="1". Otherwise, the query >> phrase will generate an extra token which will participate in the matching >> and might cause a mismatch. >> >> -- Jack Krupansky >> >> -Original Message- From: Paul Rogers >> Sent: Monday, August 4, 2014 5:55 PM >> >> To: solr-user@lucene.apache.org >> Subject: Re: How to search for phrase "IAE_UPC_0001" >> >> Hi Guys >> >> Thanks for the replies. I've had a look at the WordDelimiterFilterFactory >> and the Term Info for the url field. It seems that all the terms exist and >> I now understand that each url is being broken up using the delimiters >> specified. But I think I'm still missing something. >> >> Am I correct in assuming the minus sign (-) is also a delimiter? >> >> If so why then does url:"IAE-UPC-0001" return a result (when the url >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't >> (when the url contains the substring IAE_UPC_0001)? >> >> Secondly if the url has indeed been broken into the terms IAE UPC and 0001 >> why do all the searches suggested or tried succeed when the delimiter is a >> minus sign (-) but not when the delimiter is an underscore (_), returning >> zero matches? >> >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is >> looking for is the three terms? >> >> Many thanks for any enlightenment. >> >> P >> >> >> >> >> On 4 August 2014 01:33, Harald Kirsch wrote: >> >> This all depends on how the tokenizers take your URLs apart. 
To quickly >>> see what ended up in the index, go to a core in the UI, select Schema >>> Browser, select the field containing your URLs, click on "Load Term Info". >>> >>> In your case, for the field holding the URL you could try to switch to a >>> tokenizer that defines tokens as a sequence of alphanumeric characters, >>> roughly [a-z0-9]+ plus diacritics. In particular punctuation and >>> separation >>> characters like dash, underscore, slash, dot and the like would never be >>> part of a token, i.e. they don't make a difference. >>> >>> Then you can search the url parts with a phrase query ( >>> https://cwiki.apache.org/confluence/display/solr/The+ >>> Standard+Query+Parser#TheStandardQueryParser- >>> SpecifyingTermsfortheStandardQueryParserwhich) like >>> >>> url:"IAE-UPC-0001" >>> >>> In the same way as during indexing, the dashes are removed to end up with >>> three tokens, namely IAE, UPC and 0001. Further they have to be in that >>> order. Naturally this will then match anything like: >>> >>> "IAE_UPC_0001" >>> "IAE UPC 0001" >>> "IAE/UPC+0001" >>> "IAE\UPC\0001" >>> "IAE.UPC,0001" >>> >>> Depending on how your URLs are structured, there is the chance for false >>> positives, of course. >>> >>> The Really Good Thing here is, that you don't need to use wildcards. >>> >>> I have not yet looked at the wildcard-queries implementation in >>> Solr/Lucene, but with the commercial search engines I know, they are a >>> great way to loose the confidence of your u
Re: Combining a String Tag with a Numeric Value
Hmmm, there's no particular "right way". It'd be simpler to index these as two separate fields _if_ there's only one pair per document. If there are more and you index them as two multiValued fields, there's no good way at _query_ time to retain the association. The returned multiValued fields are guaranteed to be in the same order of insertion so you can display the correct pairs, but you can't use the association to score docs. Hmmm, somewhat abstract. OK say you want to associate two tag/value pairs, tag1:50 and tag2:100. Say further that you have two multiValued fields, Tags and Values, and then index tag1 and tag2 into Tags and 50 and 100 into Values. There's no good way to express "q=tags:tag1 and factor the associated value of 50 into the score". Note that the returned _values_ will be

Tags: tag1 tag2
Values: 50 100

So at that point you can see the associations. That said, if there's only _one_ such tag/value pair per document, it's easy to write a FunctionQuery (http://wiki.apache.org/solr/FunctionQuery) that does this. *** If you have many tag/value pairs, payloads are probably what you want. Here's an end-to-end example: http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ Best, Erick On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer wrote: > Hello! > > I have some new entity data that I'm indexing which takes the form of: > > String: EntityString > Float: Confidence > > I want to add these to a generic "Tags" field (for faceting), but I'm not > sure how to hold onto the confidence. Token Payloads seem like one method, > but then I'm not sure how to extract the Payload. > > Alternatively I could create two fields: TagIndexed which stores just the > string value and TagStored which contains a delimited String|Float. > > What's the right way to do this? > > Thanks! > > -D
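A tiny SolrJ sketch of the parallel-multiValued-field layout Erick describes; the core name is assumed, and Tags/Values must be declared multiValued in the schema:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class TagValuePairs {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // core assumed
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc1");
            // Two parallel multiValued fields: insertion order is preserved on
            // return, so Tags[i] pairs with Values[i] at display time, but the
            // association is not usable for scoring, as described above.
            doc.addField("Tags", "tag1");
            doc.addField("Values", 50);
            doc.addField("Tags", "tag2");
            doc.addField("Values", 100);
            server.add(doc);
            server.commit();
            server.shutdown();
        }
    }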
need help in field collapsing
Hi, I have about 15 fields in my Solr schema, but there are two fields, let's call them field1 and field2, that matter here. For most searches I feel I have a perfect schema, but for one use case it is not apt. *Problem*: I have to group by field1, and then I have to search for a particular value "a" in field1 only when "b" is not present in any instance of field2 within the respective group (same as using "having" after group by in MySQL). Is there a way to do this in Solr, or do I have to maintain a separate schema for this (which would be a very costly operation for us)? Thanks in advance
solr cloud going down repeatedly
Hi guys. I have a Solr cloud consisting of 3 ZooKeeper VMs running 3.4.5, backported from Ubuntu 14.04 LTS to 12.04 LTS. They are orchestrating 4 Solr nodes, which have 2 cores. Each core is sharded, so 1 shard is on each of the Solr nodes. Solr runs under Tomcat 7 and Ubuntu's latest OpenJDK 7. The version of Solr is 4.2.1. Each of the nodes has around 7GB of data, and the JVM is set to run an 8GB heap. All Solr nodes have 16GB RAM. A few weeks back we started having issues with this installation. Tomcat was filling up catalina.out with the following message: SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: The only solution was to restart all 4 Tomcats on the 4 Solr nodes. After that, the issue would rectify itself, but would occur again approximately a week after a restart. This happened last time yesterday, and I succeeded in recording some of the stuff happening on the boxes via Zabbix and atop. Basically at 15:35 the load on the machine went berserk, jumping from around 0.5 to around 30+. Zabbix and atop didn't notice any heavy IO; all the other processes were practically idle, only the JVM (Tomcat) exploded, with CPU usage increasing from the standard ~80% to around ~750%. These are the parts of the atop recordings on one of the nodes. Note that they are 10 mins apart:

(15:28:42) CPL | avg1 0.12 | avg5 0.36 | avg15 0.38 |
(15:38:42) CPL | avg1 8.54 | avg5 3.62 | avg15 1.61 |
(15:48:42) CPL | avg1 30.14 | avg5 27.09 | avg15 14.73 |

This is the status of Tomcat at the last point (15:48:42):

28891 tomcat7 tomcat7 411 8.68s 70m14s 209.9M 204K 0K 5804K -- - S 5 704% java

I have noticed similar stuff happening around the Solr nodes. At 17:41 the on-call person decided to hard reset all the Solr nodes, and the cloud came back up running normally after that. These are the logs that I found on the first node: Aug 17, 2014 3:44:58 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: Aug 17, 2014 3:46:12 PM org.apache.solr.cloud.OverseerCollectionProcessor run WARNING: Overseer cannot talk to ZK Aug 17, 2014 3:46:12 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader WARNING: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader Then a bunch of: Aug 17, 2014 3:46:42 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: until the server was rebooted.
On other nodes I can see: node2: Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.103:8080_solr_myappcore=myapp Aug 17, 2014 3:44:58 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.103:8080_solr_myapp2core=myapp2 Aug 17, 2014 3:46:24 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://node1:8080/solr/myapp node4: Aug 17, 2014 3:44:06 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.105:8080_solr_myapp2core=myapp2 Aug 17, 2014 3:44:09 PM org.apache.solr.cloud.RecoveryStrategy close WARNING: Stopping recovery for zkNodeName=10.100.254.105:8080_solr_myappcore=myapp Aug 17, 2014 3:45:37 PM org.apache.solr.common.SolrException log SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props My impression is that the garbage collector is at fault here. This is the cmdline of Tomcat: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC -DnumShards=2 -Djetty.port=8080 -DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 -javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp org.apache.catalina.startup.Bootstrap start So, I am using MarkSweepGC. Do you have any suggestions on how I can debug this further and potentially eliminate the issue causing the downtimes?
Re: How to restore an index from a backup over HTTP
I'm able to do cross-solrcloud-cluster index copy using nothing more than careful use of the "fetchindex" replication handler command. I'm using this as a build/deployment tool, so I manually create a collection in two clusters, index into one, test, and then ask the other cluster to fetchindex from it on each shard/replica. Some caveats:

1. It seems like fetchindex may silently decline if it thinks the index it has is newer.
2. I'm not doing this on an index that's currently receiving updates.
3. SolrCloud replication doesn't come into this flow, even if you fetchindex on a leader. (although once you're done, updates should get replicated normally)
4. Both collections must be created with the same number of shards and sharding mechanism. (although replication factor can vary)

I've got a tool for automating this that I'd like to push to github at some point, let me know if you're interested. On 8/16/14, 3:03 AM, "Greg Solovyev" wrote: >Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty >straight forward, but the main concern I have is the internal data format >that ReplicationHandler and SnapPuller use. This new handler as well as >the code that I've already written to download the index files from Solr >will depend on that format. Unfortunately, this format is not documented >and is not abstracted by SolrJ, so I wonder what I can do to make sure it >does not change on us without notice. > >Thanks, >Greg > >- Original Message - >From: "Shawn Heisey" >To: solr-user@lucene.apache.org >Sent: Friday, August 15, 2014 7:31:19 PM >Subject: Re: How to restore an index from a backup over HTTP > >On 8/15/2014 5:51 AM, Greg Solovyev wrote: >> What I want to achieve is being able to send the backed up index to >>Solr (either standalone or with ZooKeeper) in a way similar to creating >>a new Collection. I.e. create a new collection and upload an existing >>index directly into that Collection. I've looked through Solr code and >>so far I have not found a handler that would allow this scenario. So, >>the last idea is to implement a special handler for this case, perhaps >>extending CoreAdminHandler. ReplicationHandler together with SnapPuller >>do pretty much what I need to do, except that the action has to be >>initiated by the receiving Solr server and I need to initiate the action >>externally. I.e., instead of having Solr slave download an index from >>Solr master, I need to feed the index to Solr master and ideally this >>would work the same way in standalone and SolrCloud modes. > >I have not made any attempt to verify what I'm stating below. It may >not work. > >What I think I would *try* is setting up a standalone Solr (no cloud) on >the backup server. Use scripted index/config copies and Solr start/stop >actions to get the index up and running on a known core in the >standalone Solr. Then use the replication handler's HTTP API to >replicate the index from that standalone server to each of the replicas >in your cluster. > >https://wiki.apache.org/solr/SolrReplication#HTTP_API >https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexRe >plication-HTTPAPICommandsfortheReplicationHandler > >One thing that I do not know is whether SolrCloud itself might interfere >with these actions, or whether it might automatically take care of >additional replicas if you replicate to the shard leader. If SolrCloud >*would* interfere, then this idea might need special support in >SolrCloud, perhaps as an extension to the Collections API.
If it won't >interfere, then the use-case would need to be documented (on the user >wiki at a minimum) so that committers will be aware of it and preserve >the capability in future versions. An extension to the Collections API >might be a good idea either way -- I've seen a number of questions about >capability that falls under this basic heading. > >Thanks, >Shawn
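For anyone wanting to try Jeff's recipe: the command he describes is the one documented at the replication wiki links above. Issued per shard/replica, it looks roughly like the following; the hosts and core names are placeholders, and the exact masterUrl form should be checked against that wiki page, since it has varied between Solr versions:

    http://target-host:8983/solr/mycoll_shard1_replica1/replication?command=fetchindex&masterUrl=http://source-host:8983/solr/mycoll_shard1_replica1/replication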
Re: How to restore an index from a backup over HTTP
Thanks Jeff, I'd be interested in taking a look at the code for this tool. My github ID is grishick. Thanks, Greg - Original Message - From: "Jeff Wartes" To: solr-user@lucene.apache.org Sent: Monday, August 18, 2014 9:49:28 PM Subject: Re: How to restore an index from a backup over HTTP I'm able to do cross-solrcloud-cluster index copy using nothing more than careful use of the "fetchindex" replication handler command. I'm using this as a build/deployment tool, so I manually create a collection in two clusters, index into one, test, and then ask the other cluster to fetchindex from it on each shard/replica. Some caveats:

1. It seems like fetchindex may silently decline if it thinks the index it has is newer.
2. I'm not doing this on an index that's currently receiving updates.
3. SolrCloud replication doesn't come into this flow, even if you fetchindex on a leader. (although once you're done, updates should get replicated normally)
4. Both collections must be created with the same number of shards and sharding mechanism. (although replication factor can vary)

I've got a tool for automating this that I'd like to push to github at some point, let me know if you're interested. On 8/16/14, 3:03 AM, "Greg Solovyev" wrote: >Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty >straight forward, but the main concern I have is the internal data format >that ReplicationHandler and SnapPuller use. This new handler as well as >the code that I've already written to download the index files from Solr >will depend on that format. Unfortunately, this format is not documented >and is not abstracted by SolrJ, so I wonder what I can do to make sure it >does not change on us without notice. > >Thanks, >Greg > >- Original Message - >From: "Shawn Heisey" >To: solr-user@lucene.apache.org >Sent: Friday, August 15, 2014 7:31:19 PM >Subject: Re: How to restore an index from a backup over HTTP > >On 8/15/2014 5:51 AM, Greg Solovyev wrote: >> What I want to achieve is being able to send the backed up index to >>Solr (either standalone or with ZooKeeper) in a way similar to creating >>a new Collection. I.e. create a new collection and upload an existing >>index directly into that Collection. I've looked through Solr code and >>so far I have not found a handler that would allow this scenario. So, >>the last idea is to implement a special handler for this case, perhaps >>extending CoreAdminHandler. ReplicationHandler together with SnapPuller >>do pretty much what I need to do, except that the action has to be >>initiated by the receiving Solr server and I need to initiate the action >>externally. I.e., instead of having Solr slave download an index from >>Solr master, I need to feed the index to Solr master and ideally this >>would work the same way in standalone and SolrCloud modes. > >I have not made any attempt to verify what I'm stating below. It may >not work. > >What I think I would *try* is setting up a standalone Solr (no cloud) on >the backup server. Use scripted index/config copies and Solr start/stop >actions to get the index up and running on a known core in the >standalone Solr. Then use the replication handler's HTTP API to >replicate the index from that standalone server to each of the replicas >in your cluster.
> >https://wiki.apache.org/solr/SolrReplication#HTTP_API >https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexRe >plication-HTTPAPICommandsfortheReplicationHandler > >One thing that I do not know is whether SolrCloud itself might interfere >with these actions, or whether it might automatically take care of >additional replicas if you replicate to the shard leader. If SolrCloud >*would* interfere, then this idea might need special support in >SolrCloud, perhaps as an extension to the Collections API. If it won't >interfere, then the use-case would need to be documented (on the user >wiki at a minimum) so that committers will be aware of it and preserve >the capability in future versions. An extension to the Collections API >might be a good idea either way -- I've seen a number of questions about >capability that falls under this basic heading. > >Thanks, >Shawn
Re: How to restore an index from a backup over HTTP
Shawn, the format that I am referencing is "filestream", which starts with 2 bytes carrying file size, then 4 bytes carrying checksum (optional) and then the actual bits of the file. Thanks, Greg - Original Message - From: "Shawn Heisey" To: solr-user@lucene.apache.org Sent: Sunday, August 17, 2014 12:28:12 AM Subject: Re: How to restore an index from a backup over HTTP On 8/16/2014 4:03 AM, Greg Solovyev wrote: > Thanks Shawn, this is a pretty cool idea. Adding the handler seems pretty > straight forward, but the main concern I have is the internal data format > that ReplicationHandler and SnapPuller use. This new handler as well as the > code that I've already written to download the index files from Solr will > depend on that format. Unfortunately, this format is not documented and is > not abstracted by SolrJ, so I wonder what I can do to make sure it does not > change on us without notice. I am not really sure what format you're referencing here, but I'm about 99% sure the format *over the wire* is javabin. When the javabin format changed between 1.4.1 and 3.1.0, replication between those versions became impossible. Historical: The Solr version made a huge leap after the Solr and Lucene development was merged -- it was synchronized with the Lucene version. There are no 1.5, 2.x, or 3.0 versions of Solr. https://issues.apache.org/jira/browse/SOLR-2204 Thanks, Shawn
Re: solr cloud going down repeatedly
On 8/18/2014 11:30 AM, Jakov Sosic wrote: > My impression is that garbage collector is at fault here. > > This is the cmdline of tomcat: > > /usr/lib/jvm/java-7-openjdk-amd64/bin/java > -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties > -Djava.awt.headless=true -Xmx8192m -XX:+UseConcMarkSweepGC > -DnumShards=2 -Djetty.port=8080 > -DzkHost=10.215.1.96:2181,10.215.1.97:2181,10.215.1.98:2181 > -javaagent:/opt/newrelic/newrelic.jar -Dcom.sun.management.jmxremote > -Dcom.sun.management.jmxremote.port=9010 > -Dcom.sun.management.jmxremote.local.only=false > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager > -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath > /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar > -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 > -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp > org.apache.catalina.startup.Bootstrap start With an 8GB heap and "UseConcMarkSweepGC" as your only GC tuning, I can pretty much guarantee that you'll see occasional GC pauses of 10-15 seconds, because I saw exactly that happening with my own setup. This is what I use now: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning I can't claim that my problem is 100% solved, but collections that go over one second are *very* rare now, and I'm pretty sure they are all under two seconds. Thanks, Shawn
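To make that concrete, tuned startup options of the sort the linked page discusses look something like the following; the flags and values here are illustrative only, not the page's exact list, and need to be tuned per heap and workload rather than copied verbatim:

    /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xms8192m -Xmx8192m \
      -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:+ParallelRefProcEnabled \
      ... (remaining Tomcat/Solr options unchanged)

The idea is that CMS alone, with default generation sizing on an 8GB heap, tends to promote too much and to start concurrent collection too late, which is what produces the long stop-the-world pauses; sizing the young generation and triggering CMS earlier shortens them.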
Need details on this query
Hi, This might be a silly question.. I came across the below query online but I couldn't really understand the bolded part. Can someone help me understand this part of the query? deviceType_:"Cell" OR deviceType_:"Prepaid" *OR (phone -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How To - StepByStep"))* Thanks, Barani
how to generate stats based on time segments?
hi I have a dataset in Solr like:

id|time|price
1|t0|100
1|t1|10
1|t2|20
1|t3|30

What I want is: when I query Solr for time > t0, I want to get back data like t1, 100 rest, 60 (which is the sum of the price for t1, t2, t3). Is that something that can be done?
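I don't know of a single request that returns both numbers, but a two-request approximation with the StatsComponent is straightforward; a 4.x SolrJ sketch follows, with the core name assumed, the field names taken from the question, and t0 standing in for the real cutoff value:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FieldStatsInfo;

    public class PriceAfterCutoff {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // core assumed
            // Request 1: the earliest row after the cutoff.
            SolrQuery first = new SolrQuery("*:*");
            first.addFilterQuery("time:{t0 TO *]");          // t0 = the real cutoff value
            first.setSort(SolrQuery.SortClause.asc("time"));
            first.setRows(1);
            System.out.println(server.query(first).getResults());
            // Request 2: stats=true&stats.field=price over the same range
            // returns the sum (60 for t1+t2+t3 in the example) in one pass.
            SolrQuery rest = new SolrQuery("*:*");
            rest.addFilterQuery("time:{t0 TO *]");
            rest.setRows(0);
            rest.setGetFieldStatistics("price");
            FieldStatsInfo stats = server.query(rest).getFieldStatsInfo().get("price");
            System.out.println("sum = " + stats.getSum());
            server.shutdown();
        }
    }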
faceted query with stats not working in solrj
Hi. I have a query that works just fine in the browser. It rolls up documents by the facet field and gives me stats on the stats field: http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=Spend&stats.facet=Supplier Posting this works just fine. However, I cannot get stats from SolrJ or the Solr admin console. From the admin console (on the Query tab) I see: can not use FieldCache on a field which is neither indexed nor has doc values: Supplier?wt=xml Both Spend and Supplier are indexed. The error must be referring to something else. In Java, I use query.addStatsFieldFacets("Spend", "Supplier"); but the stats object comes back null: response.getFieldStatsInfo() == null. Thanks so much for any suggestions. Using Solr 4.9
Re: faceted query with stats not working in solrj
On 8/18/2014 12:47 PM, tedsolr wrote: > Hi. I have a query that works just fine in the browser. It rolls up documents > by the facet field and gives me stats on the stats field: > > http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=Spend&stats.facet=Supplier > > Posting this works just fine. However I cannot get stats from SolrJ or the > solr admin console. From the admin console (on the Query tab) I see: > can not use FieldCache on a field which is neither indexed > nor has doc values: Supplier?wt=xml > > Both Spend and Supplier are indexed. The error must be referring to > something else. > > In Java, I use > query.addStatsFieldFacets("Spend", "Supplier"); > but the stats object comes back null. > response.getFieldStatsInfo() == null I won't claim to know how the stats stuff works, but one thing to do is make sure Solr is logging at the INFO level or finer, then look at the Solr log to see what the differences are in the actual query that Solr is receiving when you do it in the browser and when you do it with SolrJ. You will need to look at the actual log file, not the logging tab in the admin UI. When using the example included in the Solr download, the logfile is at logs/solr.log. If you're using another method for starting Solr, that may be different. Thanks, Shawn
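One way to take SolrJ's convenience methods out of the equation while comparing the two requests in solr.log is to set the raw parameters exactly as in the working browser URL; a sketch, with "corename" kept as the placeholder from the question:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class RawStatsParams {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/corename");
            SolrQuery q = new SolrQuery("*:*");
            // Mirror the URL that works in the browser, parameter for parameter.
            q.set("stats", true);
            q.set("stats.field", "Spend");
            q.set("stats.facet", "Supplier");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getFieldStatsInfo());
            server.shutdown();
        }
    }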
Currency field type not supported for stats
Just looking for confirmation that the currency field is not supported for stats. When I use a currency field as the stats.field I get this error: http://localhost:8983/solr/corename/select?q=*:*&stats=on&stats.field=SpendAsCurrency&stats.facet=Supplier Field type currency{class=org.apache.solr.schema.CurrencyField,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={precisionStep=8, multiValued=false, currencyConfig=currency.xml, defaultCurrency=USD, class=solr.CurrencyField}} is not currently supported When I run stats on a long type it works fine. I can of course work around this by modifying my schema. So is currency not a numeric type in Solr? thanks
Re: How to search for phrase "IAE_UPC_0001"
Hi Erick Thanks for the assist. Did as you suggested (tho' I used Nutch). Cleared out solr's index and Nutch's crawl DB and then emptied all the documents out of the web server bar 10 of each type (IAE-UPC- and IAE_UPC_). Then crawled the site using Nutch. Then confirmed that all 20 docs had been uploaded and that a *:* search returned all 20 docs. Now when I do a url search on either (for example) q=url:"IAE-UPC-220" or q="IAE_UPC_0001" I get a result returned for each as expected, ie it now works as expected. So seems I now need to figure out why Nutch isn't crawling the documents. Again many thanks. P On 18 August 2014 11:22, Erick Erickson wrote: > I'd pull Nutch out of the mix here as a test. Create > some test docs (use the exampleDocs directory?) and > go from there at least long enough to insure that Solr > does what you expect if the data gets there properly. > > You can set this up in about 10 minutes, and test it > in about 15 more. May save you endless hours. > > Because you're conflating two issues here: > 1> whether Nutch is sending the data > 2> whether Solr is indexing and searching as you expect. > > Some of the Solr/Lucene analysis chains do transformations > that may not be what you assume, particularly things > like StandardTokenizer and WordDelimiterFilterFactory. > > So I'd take the time to see that the values you're dealing > with are behaving as you expect. The admin/analysis page > will help you a _lot_ here. > > Best, > Erick > > > > > On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers > wrote: > > Hi Guys > > > > I've been checking into this further and have deleted the index a couple > of > > times and rebuilt it with the suggestions you've supplied. > > > > I had a bit of an epiphany last week and decided to check if the > document I > > was searching for was actually in the index (did this by doing a *.* > query > > to a file and grep'ing for the 'IAE_UPC_0001@ string). It seems it > isn't!! > > Not sure if it was in the original index or not, tho' I suspect not. > > > > As far as I can see anything with the reference in the form IAE_UPC_ > > has not been indexed while those with the reference in the form > > IAE-UPC- has. Not sure if that's a coincidence or not. > > > > Need to see if I can get the docs into the index and then check if the > > search works or not. Will see if the guys on the Nutch list can shed any > > light. > > > > All the best. > > > > P > > > > > > On 4 August 2014 17:09, Jack Krupansky wrote: > > > >> The standard tokenizer treats underscore as a valid token character, > not a > >> delimiter. > >> > >> The word delimiter filter will treat underscore as a delimiter though. > >> > >> Make sure your query-time WDF does not have preserveOriginal="1" - but > the > >> index-time WDF should have preserveOriginal="1". Otherwise, the query > >> phrase will generate an extra token which will participate in the > matching > >> and might cause a mismatch. > >> > >> -- Jack Krupansky > >> > >> -Original Message- From: Paul Rogers > >> Sent: Monday, August 4, 2014 5:55 PM > >> > >> To: solr-user@lucene.apache.org > >> Subject: Re: How to search for phrase "IAE_UPC_0001" > >> > >> Hi Guys > >> > >> Thanks for the replies. I've had a look at the > WordDelimiterFilterFactory > >> and the Term Info for the url field. It seems that all the terms exist > and > >> I now understand that each url is being broken up using the delimiters > >> specified. But I think I'm still missing something. > >> > >> Am I correct in assuming the minus sign (-) is also a delimiter? > >> > >> If so why then does url:"IAE-UPC-0001" return a result (when the url > >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't > >> (when the url contains the substring IAE_UPC_0001)? > >> > >> Secondly if the url has indeed been broken into the terms IAE UPC and > 0001 > >> why do all the searches suggested or tried succeed when the delimiter > is a > >> minus sign (-) but not when the delimiter is an underscore (_), > returning > >> zero matches? > >> > >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is > >> looking for is the three terms? > >> > >> Many thanks for any enlightenment. > >> > >> P > >> > >> > >> > >> > >> On 4 August 2014 01:33, Harald Kirsch > wrote: > >> > >> This all depends on how the tokenizers take your URLs apart. To quickly > >>> see what ended up in the index, go to a core in the UI, select Schema > >>> Browser, select the field containing your URLs, click on "Load Term > Info". > >>> > >>> In your case, for the field holding the URL you could try to switch th
> >> > >> If so why then does url:"IAE-UPC-0001" return a result (when the url > >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't > >> (when the url contains the substring IAE_UPC_0001)? > >> > >> Secondly if the url has indeed been broken into the terms IAE UPC and > 0001 > >> why do all the searches suggested or tried succeed when the delimiter > is a > >> minus sign (-) but not when the delimiter is an underscore (_), > returning > >> zero matches? > >> > >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is > >> looking for is the three terms? > >> > >> Many thanks for any enlightenment. > >> > >> P > >> > >> > >> > >> > >> On 4 August 2014 01:33, Harald Kirsch > wrote: > >> > >> This all depends on how the tokenizers take your URLs apart. To quickly > >>> see what ended up in the index, go to a core in the UI, select Schema > >>> Browser, select the field containing your URLs, click on "Load Term > Info". > >>> > >>> In your case, for the field holding the URL you could try to switch to > a > >>> tokenizer that defines tokens as a sequence of alphanumeric characters, > >>> roughly [a-z0-9]+ plus diacritics. In particular punctuation and > >>> separation > >>> characters like dash, underscore, slash, dot and the like would never > be > >>> part of a token, i.e. they don't make a difference. > >>> > >>> Then you can search th
logging in solr
Hi, Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom log file, e.g. Solr.log? Thanks... --Arjun
Re: Combining a String Tag with a Numeric Value
Thanks Erick, I'm not sure I need to score the documents based on the numeric value, but I am interested in being able to calculate the average (Mean) of all the numeric values for a given tag. For example, what is the average confidence of Tag1 across all documents. I'm not sure I can do that without building a FunctionQuery. -Dave On Mon, Aug 18, 2014 at 12:46 PM, Erick Erickson wrote: > Hmmm, there's no particular "right way". It'd be simpler > to index these as two separate fields _if_ there's only > one pair per document. If there are more and you index them > as two mutliValued fields, there's no good way at _query_ time > to retain the association. The returned multiValued fields are > guaranteed to be in the same order of insertion so you can > display the correct pairs, but you can't use the association > to score docs. Hmmm, somewhat abstract. OK say you want to > associate two tag/value pairs, tag1:50 and tag2:100. Say further > that you have two multiValued fields, Tags and Values and then > index tag1 and tag2 into Tags and 50 and 100 into Values. > There's no good way to express "q=tags:tag1 and factor the > associated value of 50 into the score" > > Note that the returned _values_ will be > Tags: tag1 tag2 > Values 50 100 > > So at that point you can see the associations. > > that said, if there's only _one_ such tag/value pair per document, > it's easy to write a FunctionQuery ( > http://wiki.apache.org/solr/FunctionQuery) > that does this. > > *** > > If you have many tag/value pairs, payloads are probably what you want. > Here's an end-to-end example: > > http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ > > Best, > Erick > > On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer wrote: > > Hello! > > > > I have some new entity data that I'm indexing which takes the form of: > > > > String: EntityString > > Float: Confidence > > > > I want to add these to a generic "Tags" field (for faceting), but I'm not > > sure how to hold onto the confidence. Token Payloads seem like one > method, > > but then I'm not sure how to extract the Payload. > > > > Alternatively I could create two fields: TagIndexed which stores just the > > string value and TagStored which contains a delimited String|Float. > > > > What's the right way to do this? > > > > Thanks! > > > > -D
Re: logging in solr
Hi, Are you using Tomcat or Jetty? If you use the default Jetty, have a look at: http://wiki.apache.org/solr/LoggingInDefaultJettySetup Regards, Aurélien On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote: Hi, Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile eg: Solr.log. Thanks... --Arjun
Re: logging in solr
Sorry, outdated link. And I suppose you use Tomcat if you are talking about catalina.out. The correct link is: http://wiki.apache.org/solr/SolrLogging#Solr_4.3_and_above On 18/08/2014 23:06, Aurélien MAZOYER wrote: Hi, Are you using tomcat or jetty? If you use the default jetty, have a look to : http://wiki.apache.org/solr/LoggingInDefaultJettySetup Regards, Aurélien On 18/08/2014 22:43, M, Arjun (NSN - IN/Bangalore) wrote: Hi, Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile eg: Solr.log. Thanks... --Arjun
Re: Need details on this query
OR (phone -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How To - StepByStep")) Just an OR clause that searches for all documents that have "phone" (in the default search field, or multiple fields if it's an edismax parser). Remove from that set any documents with a data_source_name that contains any of the three phrases "Catalog", "Device How To - Interactive", or "Device How To - StepByStep", and return all those documents in the query. HTH, Erick On Mon, Aug 18, 2014 at 11:42 AM, bbarani wrote: > Hi, > > This might be a silly question.. > > I came across the below query online but I couldn't really understand the > bolded part. Can someone help me understanding this part of the query? > > deviceType_:"Cell" OR deviceType_:"Prepaid" *OR (phone > -data_source_name:("Catalog" OR "Device How To - Interactive" OR "Device How > To - StepByStep"))* > > Thanks, > Barani > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Need-details-on-this-query-tp4153606.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to search for phrase "IAE_UPC_0001"
NP, glad you're making forward progress! Erick On Mon, Aug 18, 2014 at 12:31 PM, Paul Rogers wrote: > Hi Erick > > Thanks for the assist. Did as you suggested (tho' I used Nutch). Cleared > out solr's index and Nutch's crawl DB and then emptied all the documents > out of the web server bar 10 of each type (IAE-UPC- and IAE_UPC_). > Then crawled the site using Nutch. > > Then confirmed that all 20 docs had been uploaded and that *.* search > returned all 20 docs. > > Now when I do a url search on either (for example) q=url:"IAE-UPC-220" or > q="IAE_UPC_0001" I get a result returned for each as expected, ie it now > works as expected. > > So seems I now need to figure out why Nutch isn't crawling the documents. > > Again many thanks. > > P > > > > > On 18 August 2014 11:22, Erick Erickson wrote: > >> I'd pull Nutch out of the mix here as a test. Create >> some test docs (use the exampleDocs directory?) and >> go from there at least long enough to insure that Solr >> does what you expect if the data gets there properly. >> >> You can set this up in about 10 minutes, and test it >> in about 15 more. May save you endless hours. >> >> Because you're conflating two issues here: >> 1> whether Nutch is sending the data >> 2> whether Solr is indexing and searching as you expect. >> >> Some of the Solr/Lucene analysis chains do transformations >> that may not be what you assume, particularly things >> like StandardTokenizer and WordDelimiterFilterFactory. >> >> So I'd take the time to see that the values you're dealing >> with are behaving as you expect. The admin/analysis page >> will help you a _lot_ here. >> >> Best, >> Erick >> >> >> >> >> On Mon, Aug 18, 2014 at 7:16 AM, Paul Rogers >> wrote: >> > Hi Guys >> > >> > I've been checking into this further and have deleted the index a couple >> of >> > times and rebuilt it with the suggestions you've supplied. >> > >> > I had a bit of an epiphany last week and decided to check if the >> document I >> > was searching for was actually in the index (did this by doing a *.* >> query >> > to a file and grep'ing for the 'IAE_UPC_0001@ string). It seems it >> isn't!! >> > Not sure if it was in the original index or not, tho' I suspect not. >> > >> > As far as I can see anything with the reference in the form IAE_UPC_ >> > has not been indexed while those with the reference in the form >> > IAE-UPC- has. Not sure if that's a coincidence or not. >> > >> > Need to see if I can get the docs into the index and then check if the >> > search works or not. Will see if the guys on the Nutch list can shed any >> > light. >> > >> > All the best. >> > >> > P >> > >> > >> > On 4 August 2014 17:09, Jack Krupansky wrote: >> > >> >> The standard tokenizer treats underscore as a valid token character, >> not a >> >> delimiter. >> >> >> >> The word delimiter filter will treat underscore as a delimiter though. >> >> >> >> Make sure your query-time WDF does not have preserveOriginal="1" - but >> the >> >> index-time WDF should have preserveOriginal="1". Otherwise, the query >> >> phrase will generate an extra token which will participate in the >> matching >> >> and might cause a mismatch. >> >> >> >> -- Jack Krupansky >> >> >> >> -Original Message- From: Paul Rogers >> >> Sent: Monday, August 4, 2014 5:55 PM >> >> >> >> To: solr-user@lucene.apache.org >> >> Subject: Re: How to search for phrase "IAE_UPC_0001" >> >> >> >> Hi Guys >> >> >> >> Thanks for the replies. I've had a look at the >> WordDelimiterFilterFactory >> >> and the Term Info for the url field. 
It seems that all the terms exist >> and >> >> I now understand that each url is being broken up using the delimiters >> >> specified. But I think I'm still missing something. >> >> >> >> Am I correct in assuming the minus sign (-) is also a delimiter? >> >> >> >> If so why then does url:"IAE-UPC-0001" return a result (when the url >> >> contains the substring IAE-UPC-0001) whereas url:"IAE_UPC_0001" doesn't >> >> (when the url contains the substring IAE_UPC_0001)? >> >> >> >> Secondly if the url has indeed been broken into the terms IAE UPC and >> 0001 >> >> why do all the searches suggested or tried succeed when the delimiter >> is a >> >> minus sign (-) but not when the delimiter is an underscore (_), >> returning >> >> zero matches? >> >> >> >> Finally, shouldn't the query url:"IAE UPC 0001"~1 work since all it is >> >> looking for is the three terms? >> >> >> >> Many thanks for any enlightenment. >> >> >> >> P >> >> >> >> >> >> >> >> >> >> On 4 August 2014 01:33, Harald Kirsch >> wrote: >> >> >> >> This all depends on how the tokenizers take your URLs apart. To quickly >> >>> see what ended up in the index, go to a core in the UI, select Schema >> >>> Browser, select the field containing your URLs, click on "Load Term >> Info". >> >>> >> >>> In your case, for the field holding the URL you could try to switch to >> a >> >>> tokenizer that defines tokens as a sequence of alphanumeric character
Re: Combining a String Tag with a Numeric Value
If you're doing this in a sharded environment, it may be "interesting". Good Luck! Erick On Mon, Aug 18, 2014 at 2:03 PM, Dave Seltzer wrote: > Thanks Erick, > > I'm not sure I need to score the documents based on the numeric value, but > I am interested in being able to calculate the average (Mean) of all the > numeric values for a given tag. For example, what is the average confidence > of Tag1 across all documents. > > I'm not sure I can do that without building a FunctionQuery. > > -Dave > > > On Mon, Aug 18, 2014 at 12:46 PM, Erick Erickson > wrote: > >> Hmmm, there's no particular "right way". It'd be simpler >> to index these as two separate fields _if_ there's only >> one pair per document. If there are more and you index them >> as two mutliValued fields, there's no good way at _query_ time >> to retain the association. The returned multiValued fields are >> guaranteed to be in the same order of insertion so you can >> display the correct pairs, but you can't use the association >> to score docs. Hmmm, somewhat abstract. OK say you want to >> associate two tag/value pairs, tag1:50 and tag2:100. Say further >> that you have two multiValued fields, Tags and Values and then >> index tag1 and tag2 into Tags and 50 and 100 into Values. >> There's no good way to express "q=tags:tag1 and factor the >> associated value of 50 into the score" >> >> Note that the returned _values_ will be >> Tags: tag1 tag2 >> Values 50 100 >> >> So at that point you can see the associations. >> >> that said, if there's only _one_ such tag/value pair per document, >> it's easy to write a FunctionQuery ( >> http://wiki.apache.org/solr/FunctionQuery) >> that does this. >> >> *** >> >> If you have many tag/value pairs, payloads are probably what you want. >> Here's an end-to-end example: >> >> http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ >> >> Best, >> Erick >> >> On Mon, Aug 18, 2014 at 7:32 AM, Dave Seltzer wrote: >> > Hello! >> > >> > I have some new entity data that I'm indexing which takes the form of: >> > >> > String: EntityString >> > Float: Confidence >> > >> > I want to add these to a generic "Tags" field (for faceting), but I'm not >> > sure how to hold onto the confidence. Token Payloads seem like one >> method, >> > but then I'm not sure how to extract the Payload. >> > >> > Alternatively I could create two fields: TagIndexed which stores just the >> > string value and TagStored which contains a delimited String|Float. >> > >> > What's the right way to do this? >> > >> > Thanks! >> > >> > -D
Re: logging in solr
On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote:
> Currently in my component Solr is logging to catalina.out. What is
> the configuration needed to redirect those logs to some custom logfile,
> e.g. Solr.log?

Solr uses the slf4j library for logging. Simply change your program to use
slf4j, and very likely the logs will go to the same place the Solr logs do.

http://www.slf4j.org/manual.html

See also the wiki page on logging jars and Solr:

http://wiki.apache.org/solr/SolrLogging

Thanks,
Shawn
[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations
Hello Apache Solr Users,

the Apache Lucene PMC wants to make the users of Solr aware of the
following issue:

Apache Solr versions 4.8.0, 4.8.1, and 4.9.0 bundle Apache POI 3.10-beta2
with their binary release tarballs. This version (and all previous ones) of
Apache POI is vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML parser =

Type: Information disclosure

Description: Apache POI uses Java's XML components to parse OpenXML files
produced by Microsoft Office products (DOCX, XLSX, PPTX, ...). Applications
that accept such files from end-users are vulnerable to XML External Entity
(XXE) attacks, which allow remote attackers to bypass security restrictions
and read arbitrary files via a crafted OpenXML document that provides an
XML external entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML parser =

Type: Denial of service

Description: Apache POI uses Java's XML components and Apache XMLBeans to
parse OpenXML files produced by Microsoft Office products (DOCX, XLSX,
PPTX, ...). Applications that accept such files from end-users are
vulnerable to XML Entity Expansion (XEE) attacks ("XML bombs"), which allow
remote attackers to consume large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues if they enable the "Apache Solr
Content Extraction Library (Solr Cell)" contrib module from the folder
"contrib/extraction" of the release tarball. Users of Apache Solr are
strongly advised to keep the module disabled if they don't use it.

Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can update the
affected libraries by replacing the vulnerable JAR files in the
distribution folder. Users of previous versions have to update their Solr
release first; patching older versions is impossible.

To replace the vulnerable JAR files, follow these steps:

- Download the Apache POI 3.10.1 binary release:
  http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive.
- Delete the following files in your "solr-4.X.X/contrib/extraction/lib"
  folder:
  # poi-3.10-beta2.jar
  # poi-ooxml-3.10-beta2.jar
  # poi-ooxml-schemas-3.10-beta2.jar
  # poi-scratchpad-3.10-beta2.jar
  # xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI
  distribution to the "solr-4.X.X/contrib/extraction/lib" folder:
  # poi-3.10.1-20140818.jar
  # poi-ooxml-3.10.1-20140818.jar
  # poi-ooxml-schemas-3.10.1-20140818.jar
  # poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the
  "solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" folder no longer
  contains any files with version number "3.10-beta2".
- Verify that the folder contains one xmlbeans JAR file, with version
  2.6.0.

If you just want to disable extraction of Microsoft Office documents,
delete the files above and don't replace them. "Solr Cell" will
automatically detect this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for
reporting these issues!

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
Apache Solr Wiki
Dear Solr Wiki admin,

We are using Solr for our multilingual Asian-language keyword search, as
well as for a visual-similarity search engine (via the pixolution plugin).
We would like to update the "Powered by Solr" section, as well as help add
to the knowledge base for other Solr setups.

Can you add me, username "MarkSun", as a contributor to the wiki?

Thank you!

Cheers,
Mark Sun
CTO

MotionElements Pte Ltd
190 Middle Road, #10-05 Fortune Centre
Singapore 188979
mark...@motionelements.com
www.motionelements.com
=
Asia-inspired Stock Animation | Video Footage | AE Template online marketplace
=
This message may contain confidential and/or privileged information. If
you are not the addressee or authorized to receive this for the addressee,
you must not use, copy, disclose or take any action based on this message
or any information herein. If you have received this message in error,
please advise the sender immediately by reply e-mail and delete this
message. Thank you for your cooperation.
Re: Apache Solr Wiki
Done, you should have edit rights now!

Best,
Erick

On Mon, Aug 18, 2014 at 6:01 PM, Mark Sun wrote:
> Dear Solr Wiki admin,
>
> We are using Solr for our multilingual Asian-language keyword search, as
> well as for a visual-similarity search engine (via the pixolution
> plugin). We would like to update the "Powered by Solr" section, as well
> as help add to the knowledge base for other Solr setups.
>
> Can you add me, username "MarkSun", as a contributor to the wiki?
>
> Thank you!
>
> Cheers,
> Mark Sun
> CTO
> MotionElements Pte Ltd
Apache solr sink issue
Hi,

I want to index a log file in Solr using Flume + the Apache Solr sink.
I am referring to the below-mentioned URL:

https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume

Error from the Flume console:

2014-08-19 15:38:56,451 (concurrentUpdateScheduler-2-thread-1) [ERROR -
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.handleError(ConcurrentUpdateSolrServer.java:354)]
error
java.lang.Exception: Bad Request
request: http://xxx.xx.xx:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:208)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Error from the Solr console:

473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore -
org.apache.solr.common.SolrException: Document is missing mandatory
uniqueKey field: id

Can anyone help me with this issue and with the steps for integrating
Flume with the Solr sink?

Regards,
Jeniba Johnson

The contents of this e-mail and any attachment(s) may contain confidential
or privileged information for the intended recipient(s). Unintended
recipients are prohibited from taking action on the basis of information
in this e-mail and using or disseminating the information, and must notify
the sender and delete it from their system. L&T Infotech will not accept
responsibility or liability for the accuracy or completeness of, or the
presence of any virus or disabling code in, this e-mail.
Re: Apache solr sink issue
Do you have the "id" field defined in your schema? It is not mandatory to
have a uniqueKey field, but if you declare one then you have to provide it;
otherwise you can remove it. See the wiki page below for more details:

http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field

Some options to generate this field if your document cannot derive one:

https://wiki.apache.org/solr/UniqueKey

On Mon, Aug 18, 2014 at 10:48 PM, Jeniba Johnson
<jeniba.john...@lntinfotech.com> wrote:
> Hi,
>
> I want to index a log file in Solr using Flume + the Apache Solr sink.
> I am referring to the below-mentioned URL:
> https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume
>
> Error from the Solr console:
> 473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore -
> org.apache.solr.common.SolrException: Document is missing mandatory
> uniqueKey field: id
>
> Can anyone help me with this issue and with the steps for integrating
> Flume with the Solr sink?
>
> Regards,
> Jeniba Johnson
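As an illustration (a sketch assuming the stock Solr 4.x example config,
not the poster's actual setup), here is the uniqueKey declaration the error
refers to, plus an update chain that can auto-generate ids for documents
that arrive without one:

  <!-- schema.xml: declare the uniqueKey field -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>

  <!-- solrconfig.xml: generate a UUID for documents missing an id -->
  <updateRequestProcessorChain name="uuid">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Note the chain only takes effect if it is attached to the update handler,
e.g. via an update.chain parameter on /update requests.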
Any recommendation for Solr Cloud version.
Hi,

I am trying to build a new SolrCloud cluster to replace our old cluster
(2 indexers + 2 searchers). The version I am using now is 4.1. Is newer
better, i.e. should I go with version 4.9.0?

Please give me any suggestions.

Thanks,
Chunki.
Exact match?
If I have a long string, how do I match on 90% of the terms to see if there
is a duplicate? If I add the field and index it, what is the best way to
require a 90% match, i.e. (# of terms matched) / (# of terms in the field)?

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
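One possible direction (a sketch, not a confirmed answer from the thread):
the dismax/edismax query parsers accept a minimum-should-match ("mm")
parameter, which can express "at least 90% of the query terms must match".
A hypothetical handler in solrconfig.xml, with an illustrative field name:

  <!-- Send the candidate duplicate's text as the query string q -->
  <requestHandler name="/dedupe" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">long_string_field</str>
      <str name="mm">90%</str>
    </lst>
  </requestHandler>

Caveat: mm is measured against the terms of the query, not the terms stored
in the field, so this matches "90% of the submitted text's terms occur in
the document", which is close to but not exactly the ratio asked about.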