Schema design for parent child field
Good day, I'm seeking some guidance on how best to represent the following data within a Solr schema. I have a list of subjects which are detailed to n levels. Each document can contain many of these subject entities. As I see it, if there had been just one subject per document, dynamic fields would have been a good solution. Any suggestions on how best to create this structure in a denormalised fashion while maintaining data integrity? For example, a document could have:

Subject level 1: contract
Subject level 2: claims
Subject level 1: patent
Subject level 2: counter claims

If I were to search for level 1 "contract", I would only want the facet count for level 2 to contain "claims" and not "counter claims". Any assistance with this would be much appreciated.
increase search score of certain category only for certain keyword
Hi, I currently have some sample data:

name: summer boot, category: boot shoe
name: snow boot, category: boot shoe
name: boot pant, category: pants
name: modern boot pant, category: pants
name: modern bootcut, category: pants

If the search keyword is "boot", how do I make the items with category "boot shoe" rank higher than those with category "pants"? Can we configure Solr so that, for certain keywords, the "boot shoe" category gets a higher rank than other categories? Thx :)
Re: FileDataSource vs JdbcDataSource (speed) Solr 3.5
Hi Mike,

You could try http://wiki.apache.org/solr/UpdateCSV
And make sure you commit at the very end.

From: Mike L.
To: "solr-user@lucene.apache.org"
Sent: Saturday, June 29, 2013 3:15 AM
Subject: FileDataSource vs JdbcDataSource (speed) Solr 3.5

I've been working on improving index time with a JdbcDataSource DIH based config and found it not to be as performant as I'd hoped, for various reasons, not specifically due to Solr. With that said, I decided to switch gears a bit and test out a FileDataSource setup... I assumed that by eliminating network latency I should see drastic improvements in terms of import time... but I'm a bit surprised that this process seems to run much slower, at least the way I've initially coded it (below).

The below is a barebones file import that I wrote which consumes a tab-delimited file. Nothing fancy here. The regex just separates out the fields... Is there a faster approach to doing this? If so, what is it?

Also, what is the "recommended" approach in terms of indexing/importing data? I know that may come across as a vague question as there are various options available, but which one would be considered the "standard" approach within a production enterprise environment?

(below has been cleansed)

Thanks in advance,
Mike
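For reference, a minimal sketch of the UpdateCSV route Ahmet points to, for a tab-delimited file; the field names and path are illustrative assumptions, and remote streaming must be enabled in solrconfig.xml for stream.file to work:

    # separator=%09 is a URL-encoded tab; header=false because the file
    # has no header row, so fieldnames supplies the column-to-field mapping
    curl "http://localhost:8983/solr/update/csv?commit=true&separator=%09&header=false&fieldnames=field1,field2,field3&stream.file=/path/to/file.csv&stream.contentType=text/plain;charset=utf-8"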
Http status 503 Error in solr cloud setup
Hi,

I setup 2 solr instances on 2 different machines and configured 2 zookeeper servers on these machines also. When I start solr on both machines and try to access the solr web-admin then I get the following error on the browser: "Http status 503 - server is shutting down"

When I setup a single standalone solr without zookeeper, I do not get this error. Any insights?

Thanks and Regards,
Sagar Chaturvedi
Member Of Technical Staff
NEC Technologies India, Noida
Re: Schema design for parent child field
Both dynamic fields and multivalued fields are powerful Solr features that can be used to great effect, but only if used in moderation, with a relatively small number of discrete values (e.g., a few dozen strings). Anything more complex and you are asking for trouble, creating a pseudo-schema that will be difficult to maintain or for anybody else to comprehend.

So, the simple answer to your question: flatten, in the most straightforward manner. Each instance of a "record type" should be a discrete Solr document; give each "record" its own "id" to be the Solr document key/ID. Solr can support multiple document types in the same collection, or you can store each record type in a separate collection. The simplest, cleanest structure is to store each record type in a separate collection and then use multiple Solr queries to emulate SQL join operations as needed. But if you would prefer to "mash" multiple record types into the same Solr collection/schema, you can do that too. Make the schema the union of the schemas for each record type; Solr/Lucene has no significant overhead for fields which do not have values present for a given document.

Each document would have a unique ID field. In addition, each document would have a parent field for each record type, so you can quickly search for all children of a given parent. You can have one common parent ID if you assign unique IDs to all children across all record types, but it can sometimes be cleaner for the child ID to reset to zero/one for each new parent. It's merely a question of whether you want to have a single key value or a tuple of key values to identify a specific child.

You can duplicate a subset of the parent fields in each child to simulate the effect of a simple join in a single clean query. But you can also do a separate query to get the parent record details.

-- Jack Krupansky
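To make the flattened layout concrete for the original example, a sketch with illustrative names (doc_type, parent_id, subject_level1, subject_level2 are assumptions, not a prescribed schema):

    <add>
      <!-- each subject entity becomes its own document, keyed to its parent -->
      <doc>
        <field name="id">doc1-subj1</field>
        <field name="doc_type">subject</field>
        <field name="parent_id">doc1</field>
        <field name="subject_level1">contract</field>
        <field name="subject_level2">claims</field>
      </doc>
      <doc>
        <field name="id">doc1-subj2</field>
        <field name="doc_type">subject</field>
        <field name="parent_id">doc1</field>
        <field name="subject_level1">patent</field>
        <field name="subject_level2">counter claims</field>
      </doc>
    </add>

Faceting then behaves as the original post wants: q=subject_level1:contract&facet=true&facet.field=subject_level2 counts only "claims", because "counter claims" lives in a different subject document.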
Re: increase search score of certain category only for certain keyword
Use the edismax query parser with a higher boost for category than name:

qf=name category^10.0

Tune the boost as needed for your app. Make sure name and category have both "text" and "string" variants; use <copyField>. The string variant is good for facets, the text variant is good for keyword search. Use the text variant in qf.

-- Jack Krupansky
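A sketch of what that could look like; the field and type names here are assumptions for illustration:

    <!-- schema.xml: string variant for faceting, text variant for search -->
    <field name="category"   type="string"       indexed="true" stored="true"/>
    <field name="category_t" type="text_general" indexed="true" stored="false"/>
    <copyField source="category" dest="category_t"/>

    <!-- query: a match on the category text variant scores 10x higher -->
    /select?defType=edismax&q=boot&qf=name+category_t^10.0

With this, "boot" matching inside category ("boot shoe") outweighs a name-only match, so the pants items rank lower.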
Re: cores sharing an instance
It's the singleton pattern, where in my case I want an object (which is RAM-expensive) to be a centralized coordinator of application logic.

thank you

On Jun 29, 2013, at 1:16 AM, Shalin Shekhar Mangar wrote:

> There is very little shared between multiple cores (instanceDir paths, logging config maybe?). Why are you trying to do this?
>
> On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin wrote:
>> Hi
>>
>> I have a multicore setup (in 4.3.0). Is it possible for one core to share an instance of its class with other cores at run time? i.e.
>>
>> At run time core 1 makes an instance of object O_i
>>
>> core 1 --> object O_i
>> core 2
>> ---
>> core n
>>
>> then can core K access O_i? I know they can share properties but is it possible to share objects?
>>
>> thank you
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00
I just double-checked my config. We are using convertType=true. Someone else came up with the config so I am not sure why we are using it. I will try with it set to false to see if something else breaks. Thanks for pointing that out.

This is my first time using DIH. I really like what I have seen so far.

Bill

On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar wrote:

> The default in JdbcDataSource is to use ResultSet.getObject, which returns the underlying database's type. The type-specific methods in ResultSet are not invoked unless you are using convertType="true".
>
> Is MySQL actually returning java.sql.Timestamp objects?
>
> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au wrote:
>> I am running Solr 4.3.0, using DIH to import data from MySQL. I am running into a very strange problem where data from a datetime column is being imported with the right date but the time as 00:00:00. I tried using SQL DATE_FORMAT() and also the DIH DateFormatTransformer but nothing works. In the raw debug response of DIH, it looks like the time portion of the datetime data is already 00:00:00 in the Solr JDBC query result.
>>
>> So I looked at the source code of the DIH JdbcDataSource class. It is using java.sql.ResultSet and its getDate() method to handle date columns. The getDate() method returns java.sql.Date. The Java API doc for java.sql.Date
>>
>> http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>>
>> states that:
>>
>> "To conform with the definition of SQL DATE, the millisecond values wrapped by a java.sql.Date instance must be 'normalized' by setting the hours, minutes, seconds, and milliseconds to zero in the particular time zone with which the instance is associated."
>>
>> This seems to be describing exactly my problem. Has anyone else noticed this problem? Has anyone used DIH to index SQL datetime successfully? If so, can you send me the relevant portion of the DIH config?
>>
>> Bill
Re: cores sharing an instance
Cores can be reloaded; they are inside the solr core loader (I forgot the exact name), and they will have different classloaders (that's a servlet thing). So if you want singletons you must load them outside of the core, using a parent classloader: in the case of jetty, this means writing your own jetty initialization or config to force shared classloaders, or finding a place inside Solr before the core is created. Google for montysolr to see an example of the first approach.

But unless you really have no other choice, using singletons is IMHO a bad idea in this case.

Roman
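To make the classloader caveat concrete, a minimal sketch of a shared singleton, assuming the class is deployed in a classloader common to all cores (e.g. the servlet container's shared lib directory); the class name is hypothetical:

    public final class SharedCoordinator {

        // Double-checked locking; 'volatile' is required for this to be
        // safe under the Java memory model (Java 5+).
        private static volatile SharedCoordinator instance;

        private SharedCoordinator() {
            // RAM-expensive initialization happens exactly once
        }

        public static SharedCoordinator getInstance() {
            if (instance == null) {
                synchronized (SharedCoordinator.class) {
                    if (instance == null) {
                        instance = new SharedCoordinator();
                    }
                }
            }
            return instance;
        }
    }

If the class is instead loaded by each core's own classloader, every core gets its own "singleton", which is exactly the trap Roman describes.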
Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00
Setting convertType=false does solve the datetime issue. But there are now other columns that were working before and are not working now. Since I have already done some research into the datetime-to-date issue and not been able to find a solution, I think I will have to keep convertType set to false and deal with the other column types that are not working now.

Thanks for your help.

Bill
Re: broken links returned from solr search
What links? You haven't shown us what link you're clicking on that generates the 404 error.

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jun 28, 2013 at 2:04 PM, MA LIG wrote:

> Hello,
>
> I ran the Solr example as described in http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some doc files into Solr as described in http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I used to load the files were of the form
>
> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@test.doc"
>
> I can successfully see search results in http://localhost:8983/solr/collection1/browse
>
> However, when I click on a link, I get a 404 not found error. How can I make these links work properly?
>
> Thanks in advance
>
> -gw
Re: documentCache not used in 4.3.1?
It's especially weird that the hit ratio is so high and you're not seeing anything in the cache. Are you perhaps soft committing frequently? Soft commits throw away all the top-level caches including documentCache, I think...

Erick

On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt wrote:

> Thanks Otis,
>
> Yeah, I realized after sending my e-mail that doc cache does not warm, however I'm still lost on why there are no other metrics.
>
> Thanks!
>
> Tim
>
> On 28 June 2013 16:22, Otis Gospodnetic wrote:
>
>> Hi Tim,
>>
>> Not sure about the zeros in 4.3.1, but in SPM we see all these numbers are non-0, though I haven't had the chance to confirm with Solr 4.3.1.
>>
>> Note that you can't really autowarm the document cache...
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>> On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt wrote:
>>> Hey guys,
>>>
>>> This has to be a stupid question/I must be doing something wrong, but after frequent load testing with documentCache enabled under Solr 4.3.1 with autoWarmCount=150, I'm noticing that my documentCache metrics are always zero for non-cumulative.
>>>
>>> At first I thought my commit rate is fast enough that I just never see the non-cumulative result, but after 100s of samples I still always get zero values.
>>>
>>> Here is the current output of my documentCache from Solr's admin for 1 core:
>>>
>>> - documentCache
>>>   - class: org.apache.solr.search.LRUCache
>>>   - version: 1.0
>>>   - description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=150, regenerator=null)
>>>   - src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
>>>   - stats:
>>>     - lookups: 0
>>>     - hits: 0
>>>     - hitratio: 0.00
>>>     - inserts: 0
>>>     - evictions: 0
>>>     - size: 0
>>>     - warmupTime: 0
>>>     - cumulative_lookups: 65198986
>>>     - cumulative_hits: 63075669
>>>     - cumulative_hitratio: 0.96
>>>     - cumulative_inserts: 2123317
>>>     - cumulative_evictions: 1010262
>>>
>>> The cumulative values seem to rise, suggesting doc cache is working, but at the same time it seems I never see non-cumulative metrics, most importantly warmupTime.
>>>
>>> Am I doing something wrong, is this normal/by-design, or is there an issue here?
>>>
>>> Thanks for helping with my silly question! Have a good weekend,
>>>
>>> Tim
Re: Improving performance to return 2000+ documents
Well, depending on how many docs get served from the cache, the time will vary. But this is just ugly; if you can avoid this use-case it would be a Good Thing.

The problem here is that each and every shard must assemble the list of 2,000 documents (just ID and sort criteria, usually score). Then the node serving the original request merges the sub-lists to pick the top 2,000. Then the node sends another request to each shard to get the full documents. Then the node merges this into the full list to return to the user.

Solr really isn't built for this use-case; is it actually a compelling situation?

And having your document cache set at 1M is kinda high if you have very big documents.

FWIW,
Erick

On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:

> Also, I don't see a consistent response time from Solr. I ran ab again and I get this:
>
> ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
>
> Benchmarking x.amazonaws.com (be patient)
> Completed 100 requests
> Completed 200 requests
> Completed 300 requests
> Completed 400 requests
> Completed 500 requests
> Finished 500 requests
>
> Server Software:
> Server Hostname:        x.amazonaws.com
> Server Port:            8983
>
> Document Path:          /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> Document Length:        1538537 bytes
>
> Concurrency Level:      10
> Time taken for tests:   10.858 seconds
> Complete requests:      500
> Failed requests:        8
>    (Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> Write errors:           0
> Total transferred:      769297992 bytes
> HTML transferred:       769268492 bytes
> Requests per second:    46.05 [#/sec] (mean)
> Time per request:       217.167 [ms] (mean)
> Time per request:       21.717 [ms] (mean, across all concurrent requests)
> Transfer rate:          69187.90 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    0   0.3      0       2
> Processing:   110  215  72.0    190     497
> Waiting:       91  180  70.5    152     473
> Total:        112  216  72.0    191     497
>
> Percentage of the requests served within a certain time (ms)
>   50%    191
>   66%    225
>   75%    252
>   80%    272
>   90%    319
>   95%    364
>   98%    420
>   99%    453
>  100%    497 (longest request)
>
> Sometimes it takes a lot of time, sometimes it's pretty quick.
>
> Thanks,
> -Utkarsh
>
> On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar wrote:
>
>> Hello,
>>
>> I have a usecase where I need to retrieve the top 2000 documents matching a query. What are the parameters (in query, solrconfig, schema) I should look at to improve this?
>>
>> I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with 3 shards, with 30GB RAM, 8 vCPU and 7GB JVM heap size.
>>
>> I have documentCache:
>>   <documentCache size="1000000" initialSize="100" autowarmCount="0"/>
>>
>> allText is a copyField.
>>
>> This is the result I get:
>> ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
>>
>> Benchmarking x.amazonaws.com (be patient)
>> Completed 100 requests
>> Completed 200 requests
>> Completed 300 requests
>> Completed 400 requests
>> Completed 500 requests
>> Finished 500 requests
>>
>> Server Software:
>> Server Hostname:        x.amazonaws.com
>> Server Port:            8983
>>
>> Document Path:          /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
>> Document Length:        1538537 bytes
>>
>> Concurrency Level:      10
>> Time taken for tests:   35.999 seconds
>> Complete requests:      500
>> Failed requests:        21
>>    (Connect: 0, Receive: 0, Length: 21, Exceptions: 0)
>> Write errors:           0
>> Non-2xx responses:      2
>> Total transferred:      766221660 bytes
>> HTML transferred:       766191806 bytes
>> Requests per second:    13.89 [#/sec] (mean)
>> Time per request:       719.981 [ms] (mean)
>> Time per request:       71.998 [ms] (mean, across all concurrent requests)
>> Transfer rate:          20785.65 [Kbytes/sec] received
>>
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0    0   0.6      0       8
>> Processing:     9  717 2339.6    199   12611
>> Waiting:        9  635 2233.6    164   12580
>> Total:          9  718 2339.6    199   12611
>>
>> Percentage of the requests served within a certain time (ms)
>>   50%    199
>>   66%    236
>>   75%    263
>>   80%    281
>>   90%    548
>>   95%    838
>>   98%  12475
>>   99%  12545
>>  100%  12611 (longest request)
>>
>> --
>> Thanks,
>> -Utkarsh
Re: FileDataSource vs JdbcDataSource (speed) Solr 3.5
Mike:

One issue is that you're forcing all the work onto the Solr server, and single-threading to boot by using DIH. You can consider moving to a SolrJ model where you can have N clients sending data to Solr, if you can partition the data up amongst the N clients cleanly.

FWIW,
Erick

On Sat, Jun 29, 2013 at 8:20 AM, Ahmet Arslan wrote:

> Hi Mike,
>
> You could try http://wiki.apache.org/solr/UpdateCSV
> And make sure you commit at the very end.
>
> From: Mike L.
> Subject: FileDataSource vs JdbcDataSource (speed) Solr 3.5
>
> (below has been cleansed)
>
> <entity name="[cleansed]"
>         processor="LineEntityProcessor"
>         url="[location_of_file]/file.csv"
>         dataSource="file"
>         transformer="RegexTransformer,TemplateTransformer">
>   <field column="rawLine"
>          regex="^(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)$"
>          groupNames="field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field20,field21,field22" />
> </entity>
>
> Thanks in advance,
> Mike
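For reference, a rough sketch of the SolrJ route against Solr 3.5, using StreamingUpdateSolrServer; the queue size, thread count and field names are illustrative assumptions, and you would run one such client per partition of the file:

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class TabFileIndexer {
        public static void main(String[] args) throws Exception {
            // Buffers up to 10000 docs client-side and streams them to
            // Solr over 4 threads/connections.
            StreamingUpdateSolrServer server =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 10000, 4);

            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", -1);
                SolrInputDocument doc = new SolrInputDocument();
                for (int i = 0; i < cols.length; i++) {
                    doc.addField("field" + (i + 1), cols[i]);
                }
                server.add(doc);
            }
            in.close();

            server.blockUntilFinished();  // drain the client-side queue
            server.commit();              // single commit at the very end
        }
    }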
Re: cores sharing an instance
Well, the code is all in the same JVM, so there's no reason a singleton approach wouldn't work that I can think of. All the multithreaded caveats apply.

Best
Erick
Re: broken links returned from solr search
Sorry, I thought it was obvious. The links that are broken are the links that are returned in the search results. Using the example in the documentation I mentioned below, after loading a Word doc via

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@myworddoc.doc"

the broken link that shows up in the search results is

http://localhost:8983/solr/collection1/doc1

so I just need to know where in the Solr config I can handle requests when the URL points to collection/some_doc.
Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00
So disabling convertType does provide a workaround for my problem with the datetime column. But the problem still exists when convertType is enabled, because DIH is not doing the conversion correctly for a Solr date field. A Solr date field does have a time portion, but java.sql.Date does not. So DIH should not be calling ResultSet.getDate() for a Solr date field. It should really be calling ResultSet.getTimestamp() instead. Is the fix this simple? Am I missing anything?

If the fix is this simple I can submit and commit a patch to DIH.

Bill
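To make the distinction concrete, a small sketch in plain JDBC (not the actual DIH source; the column name is illustrative):

    import java.sql.ResultSet;
    import java.sql.SQLException;

    public final class DatetimeExample {

        // java.sql.Date normalizes the time-of-day to zero, so a MySQL
        // datetime such as 2013-06-29 17:45:12 comes back as
        // 2013-06-29 00:00:00 (the behavior Bill is seeing).
        static java.util.Date timeLost(ResultSet rs) throws SQLException {
            return rs.getDate("last_modified");
        }

        // java.sql.Timestamp preserves the full date and time, which is
        // what a Solr date field needs.
        static java.util.Date timeKept(ResultSet rs) throws SQLException {
            return rs.getTimestamp("last_modified");
        }
    }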
Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00
Yes, we need to use getTimestamp instead of getDate. Please create an issue.

--
Regards,
Shalin Shekhar Mangar.
RE: documentCache not used in 4.3.1?
Yes, we are softCommit'ing every 1000ms, but that should be enough time to see metrics, right? For example, I still get non-cumulative metrics from the other caches (which are also thrown away). I've also curl/sampled enough that I probably should have seen a value by now.

If anyone else can reproduce this on 4.3.1 I will feel less crazy :).

Cheers,

Tim
Re: broken links returned from solr search
There's nothing built into the indexing process that stores URLs allowing you to fetch the document; you have to do that yourself. I'm not sure how the link is getting into the search results. You're assigning "doc1" as the ID of the doc, and I think the browse request handler, aka Solritas, is constructing the link as best it can. But that is only demo code, not intended to fetch the document.

In a typical app, you'll construct a URL for display that has meaning in _your_ environment, typically some way for the app server to know where the document is and how to fetch it. The browse request handler is showing you how you'd do this, but isn't meant to actually fetch the doc.

Best
Erick
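One hedged way to do that is to store the document's real location in its own field at index time and have the UI link to that value; this assumes a stored "url" field exists in the schema, and the host/path below are placeholders:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.url=http://files.example.com/docs/myworddoc.doc&commit=true" -F "myfile=@myworddoc.doc"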
Re: documentCache not used in 4.3.1?
Tim:

Yeah, this doesn't make much sense to me either since, as you say, you should be seeing some metrics upon occasion. But do note that the underlying cache only gets filled when getting documents to return in query results; since there's no autowarming going on, it may come and go.

But you can test this pretty quickly by lengthening your autocommit interval, or just not indexing anything for a while, then running a bunch of queries and looking at your cache stats. That'll at least tell you whether it works at all. You'll have to have hard commits turned off (or openSearcher set to 'false') for that check too.

Best
Erick
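For reference, a hedged sketch of the solrconfig.xml settings involved; the interval values are illustrative, not recommendations:

    <!-- A soft commit every second reopens the searcher and throws away
         the top-level caches, documentCache included. -->
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>

    <!-- A hard commit with openSearcher=false flushes to disk without
         reopening the searcher, so caches survive it. -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>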
Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00
https://issues.apache.org/jira/browse/SOLR-4978

On Sat, Jun 29, 2013 at 2:33 PM, Shalin Shekhar Mangar wrote:

> Yes we need to use getTimestamp instead of getDate. Please create an issue.
Re: Improving performance to return 2000+ documents
Hello Utkarsh,

This may or may not be relevant for your use-case, but the way we deal with this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time (user selectable). We can then page the results, changing the start parameter to return the next set. This allows us to 'retrieve' millions of documents; we just do it at the user's leisure, rather than make them wait for the whole lot in one go. This works well because users very rarely want to see ALL 2000 (or whatever number) documents at one time; it's simply too much to take in.

If your use-case involves an automated or offline procedure (e.g. running a report or some data-mining op), then presumably it doesn't matter so much if it takes a bit longer (as long as it returns in some reasonable time).

Have you looked at doing paging on the client side? This will hugely speed up your search time.

HTH
Peter
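Applied to the query from this thread, client-side paging is just a matter of stepping the start parameter (a sketch; the page size of 100 is arbitrary):

    /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&wt=json&rows=100&start=0
    /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&wt=json&rows=100&start=100
    /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&wt=json&rows=100&start=200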
Re: Varnish
OK.

Here is the answer for us. Here is a sample default.vcl. We are validating the LastModified ( if (!beresp.http.last-modified) ) is changing when the core is indexed and the version changes of the index.

This does 10 minutes caching and a 1hr grace period (if solr is down, it will deliver results up to 1 hr).

This uses the URL for caching.

You can also do:

http://localhost?PURGEME

To clear varnish if your IP is in the ACL list.

backend server1 {
  .host = "XXX.domain.com";
  .port = "8983";
  .probe = {
    .url = "/solr/pingall/select/?q=*%3A*";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
  }
}
backend server2 {
  .host = "XXX1.domain.com";
  .port = "8983";
  .probe = {
    .url = "/solr/pingall/select/?q=*%3A*";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
  }
}
backend server3 {
  .host = "XXX2.domain.com";
  .port = "8983";
  .probe = {
    .url = "/solr/pingall/select/?q=*%3A*";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
  }
}
backend server4 {
  .host = "XXX3.domain.com";
  .port = "8983";
  .probe = {
    .url = "/solr/pingall/select/?q=*%3A*";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
  }
}

director default round-robin {
  { .backend = server1; }
  { .backend = server2; }
  { .backend = server3; }
  { .backend = server4; }
}

acl purge {
  "localhost";
  "10.0.1.0"/24;
  "10.0.3.0"/24;
}

sub vcl_recv {
  if (req.url ~ "\?PURGEME$") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed. " + client.ip;
    }
    ban("req.url ~ /");
    error 200 "Cache Cleared";
  }
  remove req.http.Cookie;
  if (req.backend.healthy) {
    set req.grace = 15s;
  } else {
    set req.grace = 1h;
  }
  return (lookup);
}

sub vcl_fetch {
  set beresp.grace = 1h;
  if (!beresp.http.last-modified) {
    set beresp.ttl = 600s;
  }
  if (beresp.ttl < 600s) {
    set beresp.ttl = 600s;
  }
  unset beresp.http.Set-Cookie;
}

sub vcl_deliver {
  if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
  } else {
    set resp.http.X-Cache = "MISS";
  }
}

sub vcl_hash {
  hash_data(req.url);
  return (hash);
}

On Tue, Jun 25, 2013 at 4:44 PM, Learner wrote:

> Check this link..
> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: Varnish
On a large website, by putting 1 Varnish in front of all 4 Solr boxes we were able to trim 25% off the load time (TTFB) of the page. Our hit ratio was between 55 and 75%. We gave Varnish 24GB of RAM, and were not able to fill it under full load with a 10-minute cache timeout. We get about 2.4M Solr calls every 15 to 20 minutes. One Varnish was able to handle it with almost no lingering connections, and a load average of < 1.

Varnish is very optimized and worth trying.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: Http status 503 Error in solr cloud setup
I do not know what causes the error, but this setup will not work as-is. You need one ZooKeeper, or three for fault tolerance. SolrCloud demands that a majority of the ZK ensemble agree; with two ZKs the majority is two, so losing either one takes the whole ensemble down.

On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote:
Hi,
I set up 2 Solr instances on 2 different machines and also configured 2 ZooKeeper servers on those machines. When I start Solr on both machines and try to access the Solr web admin, I get the following error in the browser: "Http status 503 - server is shutting down". When I set up a single standalone Solr without ZooKeeper, I do not get this error. Any insights?

Thanks and Regards,
Sagar Chaturvedi
Member Of Technical Staff
NEC Technologies India, Noida
09711931646
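For reference, a minimal sketch of the three-ZooKeeper setup recommended above, for a Solr 4.x node (hostnames, ports, and paths are placeholders, not from the original post):

# zoo.cfg -- identical on all three ZooKeeper machines; only each
# machine's myid file differs (1, 2, or 3).
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

# Start each Solr node pointing at the full ensemble:
java -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181 -jar start.jar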
Re: Varnish
Solr HTTP caching also supports "e-tags". These are unique keys for the output of a query: if you send a query twice and the index has not changed, the response will be the same. The e-tag is generated from the query string and the index generation number. If Varnish supports e-tags, you can keep some queries cached longer than your timeout.

Lance

On 06/29/2013 05:51 PM, William Bell wrote:
> On a large website, by putting one Varnish in front of all 4 SOLR boxes we
> were able to trim 25% off the load time (TTFB) of the page. [snip]
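You can watch this from a shell. A sketch, assuming HTTP caching is enabled in solrconfig.xml (the example configs often ship with <httpCaching never304="true"/>, which disables it); the core name and query are placeholders:

# First request: note the ETag header Solr returns.
curl -i "http://localhost:8983/solr/collection1/select?q=*:*"

# Replay with If-None-Match set to that ETag. While the index generation is
# unchanged, Solr answers 304 Not Modified, so a cache may keep its copy.
curl -i -H 'If-None-Match: "<etag-from-above>"' "http://localhost:8983/solr/collection1/select?q=*:*"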
Re: documentCache not used in 4.3.1?
That's a good idea, I'll try that next week. Thanks!

Tim

On 29/06/13 12:39 PM, Erick Erickson wrote:
Tim: Yeah, this doesn't make much sense to me either since, as you say, you should be seeing some metrics upon occasion. But do note that the underlying cache only gets filled when getting documents to return in query results; since there's no autowarming going on, it may come and go.

But you can test this pretty quickly by lengthening your autocommit interval, or just not indexing anything for a while, then running a bunch of queries and looking at your cache stats. That'll at least tell you whether it works at all. You'll have to have hard commits turned off (or openSearcher set to 'false') for that check too.

Best
Erick

On Sat, Jun 29, 2013 at 2:48 PM, Vaillancourt, Tim wrote:
Yes, we are softCommit'ing every 1000ms, but that should be enough time to see metrics, right? For example, I still get non-cumulative metrics from the other caches (which are also thrown away). I've also curl/sampled enough that I probably should have seen a value by now. If anyone else can reproduce this on 4.3.1 I will feel less crazy :).

Cheers,
Tim

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, June 29, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: documentCache not used in 4.3.1?

It's especially weird that the hit ratio is so high and you're not seeing anything in the cache. Are you perhaps soft committing frequently? Soft commits throw away all the top-level caches, including documentCache, I think.

Erick

On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt wrote:
Thanks Otis. Yeah, I realized after sending my e-mail that the doc cache does not warm; however, I'm still lost on why there are no other metrics.

Thanks!
Tim

On 28 June 2013 16:22, Otis Gospodnetic wrote:
Hi Tim, not sure about the zeros in 4.3.1, but in SPM we see all these numbers are non-0, though I haven't had the chance to confirm with Solr 4.3.1. Note that you can't really autowarm the document cache...

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm

On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt wrote:
Hey guys, this has to be a stupid question / I must be doing something wrong, but after frequent load testing with documentCache enabled under Solr 4.3.1 with autoWarmCount=150, I'm noticing that my documentCache metrics are always zero for non-cumulative. At first I thought my commit rate was fast enough that I just never see the non-cumulative result, but after 100s of samples I still always get zero values.
Here is the current output of my documentCache from Solr's admin for 1 core
(http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache):

documentCache
  class: org.apache.solr.search.LRUCache
  version: 1.0
  description: LRU Cache(maxSize=512, initialSize=512, autowarmCount=150, regenerator=null)
  src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
  stats:
    lookups: 0
    hits: 0
    hitratio: 0.00
    inserts: 0
    evictions: 0
    size: 0
    warmupTime: 0
    cumulative_lookups: 65198986
    cumulative_hits: 63075669
    cumulative_hitratio: 0.96
    cumulative_inserts: 2123317
    cumulative_evictions: 1010262

The cumulative values seem to rise, suggesting the doc cache is working, but at the same time it seems I never see non-cumulative metrics, most importantly warmupTime. Am I doing something wrong, is this normal/by-design, or is there an issue here?

Thanks for helping with my silly question! Have a good weekend,

Tim
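To make Erick's test concrete, a sketch of the solrconfig.xml commit settings involved (the values are illustrative only, not from the original posts):

<!-- Hard commits persist to disk but, with openSearcher=false, do not open
     a new searcher, so top-level caches such as documentCache survive them. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Every soft commit opens a new searcher and discards the documentCache;
     lengthen (or disable) this while checking whether the cache fills. -->
<autoSoftCommit>
  <maxTime>300000</maxTime>
</autoSoftCommit>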
Re: broken links returned from solr search
OK thanks. So I guess I will set up my own "normal" webserver and have the Solr server be a sort of private web-based API (or possibly a front end that, when a user clicks on a search result link, just redirects the user to my "normal" web server that has the related file). That's easy enough. If that's not how Solr is supposed to be used, please feel free to let me know. Thanks!

On Jun 29, 2013, at 3:34 PM, Erick Erickson wrote:

> There's nothing built into the indexing process that stores URLs allowing
> you to fetch the document; you have to do that yourself. I'm not sure how
> the link is getting into the search results: you're assigning "doc1" as the
> ID of the doc, and I think the browse request handler, aka Solritas, is
> constructing the link as best it can. But that is only demo code, not
> intended to fetch the document.
>
> In a typical app, you'll construct a URL for display that has meaning in
> _your_ environment, typically some way for the app server to know where the
> document is and how to fetch it. The browse request handler shows you
> how you'd do this, but isn't meant to actually fetch the doc.
>
> Best
> Erick
>
> On Sat, Jun 29, 2013 at 1:29 PM, gilawem wrote:
>
>> Sorry, I thought it was obvious. The links that are broken are the links
>> that are returned in the search results. Using the example in the
>> documentation I mentioned below, to load a Word doc via
>>
>> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@myworddoc.doc"
>>
>> the broken link that shows up in the search results is
>> http://localhost:8983/solr/collection1/doc1
>>
>> so I just need to know where in the Solr config I can handle
>> requests when the URL points to collection/some_doc
>>
>> On Jun 29, 2013, at 1:08 PM, Erick Erickson wrote:
>>
>>> What links? You haven't shown us what link you're clicking on
>>> that generates the 404 error.
>>>
>>> You might want to review:
>>> http://wiki.apache.org/solr/UsingMailingLists
>>>
>>> Best
>>> Erick
>>>
>>> On Fri, Jun 28, 2013 at 2:04 PM, MA LIG wrote:
>>>
>>>> Hello,
>>>>
>>>> I ran the Solr example as described in
>>>> http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some
>>>> doc files into Solr as described in
>>>> http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I
>>>> used to load the files were of the form
>>>>
>>>> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@test.doc"
>>>>
>>>> I can successfully see search results in
>>>> http://localhost:8983/solr/collection1/browse. However, when I
>>>> click on a link, I get a 404 not found error. How can I make these
>>>> links work properly?
>>>>
>>>> Thanks in advance
>>>> -gw
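Following Erick's advice, one way to wire this up is to store a link your own web server understands alongside each document. A sketch only: the "url" field and the file-server hostname below are assumptions for illustration, not stock config.

<!-- schema.xml: a stored field to carry the link your front end renders -->
<field name="url" type="string" indexed="false" stored="true"/>

# Pass the real location as a literal when posting the file
# (the literal.url value may need URL-encoding):
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.url=http://files.example.com/myworddoc.doc&commit=true" -F "myfile=@myworddoc.doc"

Your search UI then links to the stored url field instead of the Solr document URL.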