RE: running SOLR on same server as your website
Just make sure that outside users can't talk directly to your Solr instance. If they can talk to Solr, they can add/delete documents, which will affect your site.

Tim

-----Original Message-----
From: okayndc [mailto:bodymo...@gmail.com]
Sent: Wednesday, September 07, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: running SOLR on same server as your website

Right now, the index is relatively small in size, less than 1 MB. I think it's okay for now, but a couple of years down the road we may have to move Solr onto a separate application server.

On Wed, Sep 7, 2011 at 10:15 AM, Jaeger, Jay - DOT wrote:
> You could host Solr inside the same Tomcat container, or in a different
> servlet container (say, a second Tomcat instance) on the same server.
>
> Be aware of your OS memory requirements, though: in my experience, Solr
> performs best when it has lots of OS memory to cache index files (at least
> if your index is very big). For that reason alone, we chose to host our
> Solr instance (used internally only) in a separate virtual machine in its
> own web app server instance.
>
> It is all a matter of managing your memory, CPU and disk performance. If
> those are already constrained or nearly constrained on your website, then
> adding Solr into that mix is probably not such a good idea. If those are
> not issues on your existing website, and your Solr load is modest, then you
> can probably squeeze it onto the same server.
>
> Like most real-world answers, it comes down to "it depends".
>
> JRJ
>
> -----Original Message-----
> From: okayndc [mailto:bodymo...@gmail.com]
> Sent: Wednesday, September 07, 2011 9:02 AM
> To: solr-user@lucene.apache.org
> Subject: running SOLR on same server as your website
>
> Hi everyone!
>
> Is it not good practice to run Solr on the same server where your website
> files sit? Or is it a must to house Solr on its own application server?
> The problem I'm facing is that my website's files sit on a servlet
> container (Tomcat), and I think it would be more convenient to house the
> Solr instance on the same server. Is this not a good idea? What is your
> Solr setup?
>
> Thanks
RE: Foreign characters question
I had the same problem; the fix differs depending on which application server you are using. If it's Tomcat, try here, near the URI charset section: http://wiki.apache.org/solr/SolrTomcat

I use Glassfish, and I added this entry to the wiki after getting help from this group: http://wiki.apache.org/solr/SolrGlassfish

I hope this helps.

Tim

-----Original Message-----
From: Blargy [mailto:zman...@hotmail.com]
Sent: Tuesday, July 13, 2010 12:55 PM
To: solr-user@lucene.apache.org
Subject: Foreign characters question

I am trying to add the following synonym while indexing/searching:

  swimsuit, bañadores, bañador

I tested searching for "bañadores", however it didn't return any results. After further inspection I noticed in the field analysis admin that swimsuit gets expanded to ba�adores. Not sure if it will show up, but the "n" is a black diamond with a white question mark in it.

So basically, how can I add support for foreign characters? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p964078.html
Sent from the Solr - User mailing list archive at Nabble.com.
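For Tomcat, the SolrTomcat wiki fix referred to above boils down to a single attribute on the HTTP connector in server.xml. A hedged sketch of the relevant fragment — the port and timeout values here are illustrative defaults, not anything specific to this setup:

```xml
<!-- server.xml: make Tomcat decode URI query parameters as UTF-8
     instead of the platform default (illustrative values). -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>
```

Without URIEncoding="UTF-8", multi-byte characters such as ñ arrive in query strings decoded as ISO-8859-1, which produces exactly the replacement-character symptom described in the question.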
RE: date boosting and dismax
I used this before my search term and it works well:

  {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}

It's enough that when I search for *:* the articles appear in chronological order.

Tim

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, July 14, 2010 11:47 AM
To: solr-user@lucene.apache.org
Subject: date boosting and dismax

I've started a couple of previous threads on this topic, but I did not have a good date field in my index to use at the time. I now have a schema with the document's post_date in tdate format, so I would like to actually do some implementation.

Right now, we are not doing relevancy ranking at all - we sort by descending post_date. We have been working on our application code so we can switch to dismax and use relevancy, but it's still important to have a small bias towards newer content. The idea is nothing this list hasn't heard before - to give newer documents a slight relevancy boost.

An important sub-goal is to ensure that the adjustment doesn't render Solr's caches useless. I'm thinking that this means that at a minimum, I need to round dates to a resolution of 1 day, but if it's doable, 1 week might be even better. I do like the idea of having different boosts for different time ranges.

Can anyone give me a starting point on how to do this? I will need actual URL examples and dismax configuration snippets.

Thanks,
Shawn
RE: date boosting and dismax
Re: flexibility. This boost does decay over time; the further it gets from NOW, the less of a boost the document receives. You are right, though, that it doesn't allow a fine degree of control, particularly if you don't want the boost to decay smoothly. I hadn't considered your suggestion, so I'll keep it in mind if the need arises.

Re: adding the boost to a query. I am no expert, but I did this and it worked in SolrJ:

  solrQuery.setQuery("{!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)} " + queryparam);

where queryparam is what you are searching for. You quite literally just prepend it. Via http://localhost:8080/apache-solr-1.4.0/select, just prepend it to your q= like this:

  q={!boost+b%3Drecip(ms(NOW,publishdate),3.16e-11,1,1)}+findthis

Tim

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, July 14, 2010 1:16 PM
To: solr-user@lucene.apache.org
Subject: Re: date boosting and dismax

One of the replies I got on a previous thread mentioned range queries, with this example:

  [NOW-6MONTHS TO NOW]^5.0
  [NOW-1YEARS TO NOW-6MONTHS]^3.0
  [NOW-2YEARS TO NOW-1YEARS]^2.0
  [* TO NOW-2YEARS]^1.0

Something like this seems more flexible, and into it I read an implication that the performance would be better than the boost function you've shown, but I don't know how to actually put it into a URL or handler config. I also seem to remember seeing something about how to do "less than" in range queries as well as the "less than or equal to" implied by the above, but I cannot find it now.

Thanks,
Shawn

On 7/14/2010 10:26 AM, Tim Gilbert wrote:
> I used this before my search term and it works well:
>
> {!boost b=recip(ms(NOW,publishdate),3.16e-11,1,1)}
>
> It's enough that when I search for *:* the articles appear in
> chronological order.
>
> Tim
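For intuition about the decay curve: recip(x,m,a,b) computes a/(m*x + b), so with m = 3.16e-11 (roughly one over the number of milliseconds in a year) and a = b = 1, a document gets a boost of 1.0 at NOW and about 0.5 when it is a year old. A small standalone sketch — pure Java, no Solr involved; the constants simply mirror the function query discussed above:

```java
// Sketch: the decay implemented by recip(ms(NOW,publishdate), 3.16e-11, 1, 1).
// recip(x, m, a, b) = a / (m*x + b); x is the document age in milliseconds.
public class RecipBoost {
    static final double M = 3.16e-11; // ~ 1 / (milliseconds in a year)

    static double boost(double ageMillis) {
        return 1.0 / (M * ageMillis + 1.0);
    }

    public static void main(String[] args) {
        double yearMs = 365.25 * 24 * 60 * 60 * 1000; // ~ 3.156e10 ms
        System.out.printf("now:      %.3f%n", boost(0));           // 1.000
        System.out.printf("1 year:   %.3f%n", boost(yearMs));      // ~ 0.501
        System.out.printf("10 years: %.3f%n", boost(10 * yearMs)); // ~ 0.091
    }
}
```

The curve never reaches zero, which is why old documents still rank — they just need a proportionally better text-relevancy score to compete.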
Advice requested. How to map 1:M or M:M relationships with support for facets
Hi guys,

Question: What is the best way to create a Solr schema which supports a "multivalue" where the value is a two-item array of event category and date? I want to have faceted searches, counts and date-range ability on both the category and the dates.

Details: This is a person database where a Person can have details about them (like address), and a Person has many "Events". Events have a category (type of event) and a date for when that event occurred. At the bottom you will see a simple diagram showing the relationship. Briefly, a Person has many Events, and an Event has a single category and a single person.

What I would like to be able to do is have a facet which shows all of the event categories, with a "sub-facet" that shows category + date. For example, if a category was "Attended Conference" and the date was 2008-09-08, I'd be able to show a count of all "Attended Conference", then have a tree-type control and show the years, for example:

  + Attended Conference (1038)
  |
  +--- 2010 (100)
  +--- 2009 (134)
  +--- 2008 (234)
  |
  + Another Event Category (23432)
  |
  +--- 2010 (234)
  +--- 2009 (245)

Etc.

For scale, I expect to have < 100 "Event Categories" and < a million person_event records on < 250,000 persons. I don't care very much about disk space, so whether it's 1 GB or 100 GB due to indexing, that's okay if the solution works (and it's fast! :-))

Solutions I looked at:

* I looked at poly fields, but they seem to be a fixed length and appeared to be the same type. The typical use case was latitude & longitude. I don't think this will work because there are a variable number of events attached to a person.
* I looked at multiValued, but it didn't seem to permit two fields having a relationship, i.e. Event Category & Event Date. It seemed to me that they need to be broken out. That's not necessarily a bad thing, but it didn't seem ideal.
* I thought about concatenating category & date to create a fake field strictly for faceting purposes, but I believe that will break date ranges. E.g. EventCategoryId + "|" + Date = 1|2009 as a facet would allow me to show counts for that event type. Seems a bit unwieldy to me...

What's the group's advice for handling this situation in the best way?

Thanks in advance. As always, sorry if this question has been asked and answered a few times already. I googled for a few hours before writing this... but things change so fast with Solr that any article older than a year was suspect to me, and there are also so many patches that provide additional functionality...

Tim

Schema:
RE: Advice requested. How to map 1:M or M:M relationships with support for facets
Thank you for your advice. Tim -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, September 07, 2010 11:01 PM To: solr-user@lucene.apache.org Subject: Re: Advice requested. How to map 1:M or M:M relationships with support for facets These days the best practice for a 'drill-down' facet in a UI is to encode both the unique value of the facet and the displayable string into one facet value. In the UI, you unpack and show the display string, and search with the full facet string. If you want to also do date ranges, make a separate matching 'date' field. This will store the date twice. Solr schema design is all about denormalizing. Tim Gilbert wrote: > > Hi guys, > > *Question:* > > What is the best way to create a solr schema which supports a > 'multivalue' where the value is a two item array of event category and > a date. I want to have faceted searches, counts and Date Range ability > on both the category and the dates. > > *Details:* > > This is a person database where Person can have details about them > (like address) and Person have many "Events". Events have a category > (type of event) and a Date for when that event occurred. At the bottom > you will see a simple diagram showing the relationship. Briefly, a > Person has many Events and Events have a single category and a single > person. > > What I would like to be able to do is: > > Have a facet which shows all of the event categories, with a > 'sub-facet' that show Category + date. For example, if a Category was > "Attended Conference" and date was 2008-09-08, I'd be able to show a > count of all "Attended Conference", then have a tree type control and > show the years (for example): > > Eg. > > + Attended Conference (1038) > > | > > + 2010 (100) > > +--- 2009 (134) > > +--- 2008 (234) > > | > > + Another Event Category (23432) > > | > > +-2010 (234) > > +2009 (245) > > Etc. 
> > For scale, I expect to have < 100 "Event Categories" and < a million > person_event records on < 250,000 persons. I don't care very much > about disk space, so if it's a 1 GB or 100 GB due to indexing, that's > okay if the solution works (and its fast! J) > > *Solutions I looked at:* > > * I looked at poly but they seem to be a fixed length and appeared > to be the same type. Typical use case was latitude & longitude. > I don't think this will work because there are a variable number > of events attached to a person. > * I looked at multiValued but it didn't seem to permit two fields > having a relationship, ie. Event Category & Event Date. It > seemed to me that they need to be broken out. That's not > necessarily a bad thing, but it didn't seem ideal. > * I thought about concatenating category & date to create a fake > fields strictly for faceting purposes, but I believe that will > break date ranges. Eg. EventCategoryId + "|" + Date = 1|2009 as > a facet would allow me to show counts for that event type. Seems > a bit unwieldy to me... > > What's the groups advice for handling this situation in the best way? > > Thanks in advance, as always sorry if this question has been asked and > answered a few times already. I googled for a few hours before writing > this... but things change so fast with Solr that any article older than > a year was suspect to me, also there are so many patches that provide > additional functionality... > > Tim > > Schema: >
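Lance's suggestion could look something like the following in schema.xml — the field names here are invented for illustration: one multiValued string field holding the packed "id|display" facet value, and a parallel multiValued date field for range queries:

```xml
<!-- Hypothetical schema.xml fragment: denormalized person/event fields. -->
<!-- Packed facet value, e.g. "17|Attended Conference" -->
<field name="event_facet" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- Matching event dates, for date-range faceting -->
<field name="event_date" type="tdate" indexed="true" stored="true" multiValued="true"/>
```

Note that Solr keeps no positional pairing between two multiValued fields within a document, which is exactly why Lance suggests packing anything that must stay together (id + display string) into the single facet value, and accepting that the date is stored twice.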
RE: Schema required?
Hi Frank,

Check out the Dynamic Fields option from here: http://wiki.apache.org/solr/SchemaXml

Tim

-----Original Message-----
From: Frank Calfo [mailto:fca...@aravo.com]
Sent: Monday, October 18, 2010 5:25 PM
To: solr-user@lucene.apache.org
Subject: Schema required?

We need to index documents where the fields in the document can change frequently.

It appears that we would need to update our Solr schema definition before we can reindex using new fields.

Is there any way to make the Solr schema optional?

--frank
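A dynamic field rule matches any field name fitting a glob pattern, so new fields need no schema change at all. A typical schema.xml sketch — these patterns are the conventional suffix examples, not anything specific to Frank's setup:

```xml
<!-- Any field ending in _s is indexed/stored as a plain string,
     any field ending in _i as an integer -- no schema edit required
     when documents start sending new field names. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
```

A document can then add, say, warehouse_s or shelf_count_i at index time without touching the schema.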
RE: Multiple facet - fq
As Prasad said:

  fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what default operator you have configured (the <solrQueryParser defaultOperator="..."/> setting). You can always specify your operator in your search between your facets:

  fq=(category:corporate AND category:personal)
or
  fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 or more facets with AND, OR, + and - options and it works flawlessly.

  fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-----Original Message-----
From: Pradeep Singh [mailto:pksing...@gmail.com]
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ wrote:
> Under the category facet, there are multiple selections, which can be
> personal, corporate or other.
>
> How can I get both "personal" and "corporate" ones? I tried
> fq=category:corporate&fq=category:personal
>
> It looks easy, but I can't find the solution.
>
> --
> Yavuz Selim YILMAZ
RE: Multiple facet - fq

Sorry, what Pradeep said, not Prasad. My apologies, Pradeep.

-----Original Message-----
From: Tim Gilbert
Sent: Wednesday, October 20, 2010 12:18 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Multiple facet - fq

As Prasad said:

  fq=(category:corporate category:personal)

But you might want to check your schema.xml to see what default operator you have configured (the <solrQueryParser defaultOperator="..."/> setting). You can always specify your operator in your search between your facets:

  fq=(category:corporate AND category:personal)
or
  fq=(category:corporate OR category:personal)

I have an application where I am using searches on 10 or more facets with AND, OR, + and - options and it works flawlessly.

  fq=(+category:corporate AND -category:personal)

meaning category is corporate and not personal.

Tim

-----Original Message-----
From: Pradeep Singh [mailto:pksing...@gmail.com]
Sent: Wednesday, October 20, 2010 11:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple facet - fq

fq=(category:corporate category:personal)

On Wed, Oct 20, 2010 at 7:39 AM, Yavuz Selim YILMAZ wrote:
> Under the category facet, there are multiple selections, which can be
> personal, corporate or other.
>
> How can I get both "personal" and "corporate" ones? I tried
> fq=category:corporate&fq=category:personal
>
> It looks easy, but I can't find the solution.
>
> --
> Yavuz Selim YILMAZ
RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
> Where do you get your Lucene/Solr downloads from?
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [ ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [ ] Other (someone in your company mirrors them internally or via a
> downstream project)

-----Original Message-----
From: Juan Grande [mailto:juan.gra...@gmail.com]
Sent: Friday, January 21, 2011 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

> Where do you get your Lucene/Solr downloads from?
>
> [ ] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [ ] Other (someone in your company mirrors them internally or via a
> downstream project)

Juan Grande
uniqueKey merge documents on commit
Hi,

I have a unique key within my index, but rather than the default behaviour of overwriting, I am wondering if there is a method to "merge" two different documents on commit of the second document. I have a testcase which explains what I'd like to happen:

  @Test
  public void testMerge() throws SolrServerException, IOException {
      SolrInputDocument doc1 = new SolrInputDocument();
      doc1.addField("secid", "testid");
      doc1.addField("value1_i", 1);
      SolrAllSec.GetSolrServer().add(doc1);
      SolrAllSec.GetSolrServer().commit();

      SolrInputDocument doc2 = new SolrInputDocument();
      doc2.addField("secid", "testid");
      doc2.addField("value2_i", 2);
      SolrAllSec.GetSolrServer().add(doc2);
      SolrAllSec.GetSolrServer().commit();

      SolrQuery solrQuery = new SolrQuery();
      solrQuery = solrQuery.setQuery("secid:testid");
      QueryResponse response = SolrAllSec.GetSolrServer().query(solrQuery, METHOD.GET);
      List<SolrDocument> result = response.getResults();
      Assert.isTrue(result.size() == 1);
      Assert.isTrue(result.get(0).containsKey("value1_i"));
      Assert.isTrue(result.get(0).containsKey("value2_i"));
  }

Other than reading "doc1", adding the fields from "doc2" and recommitting, is there another way?

Thanks in advance,

Tim
RE: Solr and Permissions
What about using the BitwiseQueryParserPlugin?

  https://issues.apache.org/jira/browse/SOLR-1913

You could encode your documents with a series of permissions based on bit flags and then OR them at query time.

Tim

-----Original Message-----
From: r...@intelligencebank.com [mailto:r...@intelligencebank.com] On Behalf Of Liam O'Boyle
Sent: Thursday, March 10, 2011 7:53 PM
To: solr-user@lucene.apache.org
Subject: Solr and Permissions

Morning,

We use Solr to index a range of content to which, within our application, access is restricted by a system of user groups and permissions. In order to ensure that search results don't reveal information about items which the user doesn't have access to, we need to somehow filter the results; this needs to be done within Solr itself, rather than after retrieval, so that the facet and result counts are correct.

Currently we do this by creating a filter query which specifies all of the items which may be allowed to match (e.g. id:(foo OR bar OR blarg OR ...)), but this has definite scalability issues - we're starting to run into problems, as this can be a set of ORs of potentially unlimited size (practically, we're hitting the low thousands sometimes). While we can adjust maxBooleanClauses upwards, I understand that this has performance implications...

So, has anyone had to implement something similar in the past? Any suggestions for a more scalable approach? Any advice on safe and sensible limits on how far I can push maxBooleanClauses?

Thanks for your advice,
Liam
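Independent of the plugin itself, the bitwise idea is to pack group membership into an integer stored on each document and test overlap against the querying user's mask. A hedged, Solr-free sketch of the encoding — the group names and masks below are invented for illustration; SOLR-1913 is what would perform the equivalent matching server-side:

```java
// Sketch: permission bit flags. Each document stores an int whose bits mark
// the groups allowed to see it; a user may view the document if the bitwise
// AND of the doc's flags and the user's group mask is non-zero.
public class PermissionBits {
    // Illustrative group flags -- one bit per group.
    static final int GROUP_STAFF   = 1 << 0;
    static final int GROUP_MANAGER = 1 << 1;
    static final int GROUP_ADMIN   = 1 << 2;

    static boolean canView(int docFlags, int userMask) {
        return (docFlags & userMask) != 0;
    }

    public static void main(String[] args) {
        // Document visible to managers and admins only.
        int docFlags = GROUP_MANAGER | GROUP_ADMIN;
        System.out.println(canView(docFlags, GROUP_STAFF));               // false
        System.out.println(canView(docFlags, GROUP_STAFF | GROUP_ADMIN)); // true
    }
}
```

The appeal over the giant id:(foo OR bar OR ...) filter is that the per-document access list collapses into one indexed integer and one bitwise test, regardless of how many documents a user can see.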
RE: keeping data consistent between Database and Solr
I use Solr + MySQL with data coming from several DIH-type "loaders" that I have written to move data from many different databases into my "BI" solution. I don't use DIH itself because I am not simply replicating the data; I am moving/merging/processing the incoming data during the loading.

For me, I have an Aspect (AspectJ) which wraps my Data Access Object, and every time a "persist" is called (I am using Hibernate), I update Solr with the same data an instant later using @Around advice. This handles nearly every event during the day. I have a simple "retry" procedure on my SolrJ add/commit on network error, in hopes that it will eventually work.

In case of error, I rebuild the Solr index from scratch each night by recreating it based on the data in MySQL. That takes about 10 minutes and I run it at night. This allows me to have "eventual consistency" for any issues that cropped up during the day. Obviously the size of my database (< 2 million records) makes this approach manageable. YMMV.

Tim

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Tuesday, March 15, 2011 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: keeping data consistent between Database and Solr

On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:
> But my main question is, how do I guarantee that data between my Cassandra
> database and Solr index are consistent and up-to-date?

Our MySQL database has two unique indexes. One is a document ID, implemented in MySQL as an autoincrement integer and in Solr as a long. The other is what we call a tag id, implemented in MySQL as a varchar and in Solr as a single lowercased token, serving as Solr's uniqueKey.

We have an update trigger on the database that updates the document ID whenever the database document is updated. We have a homegrown build system for Solr. In a nutshell, it keeps track of the newest document ID in the Solr index. If the DIH delta-import fails, it doesn't update the stored ID, which means that on the next run, it will try to index those documents again. Changes to the entries in the database are automatically picked up because the document ID is newer, but the tag id doesn't change, so the document in Solr is overwritten.

Things are actually more complex than I've written, because our index is distributed. Hopefully it can give you some ideas for yours.

Shawn
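The "simple retry procedure" Tim describes for his SolrJ add/commit can be sketched generically in pure Java — here the Callable body stands in for the actual SolrJ calls, and all names, attempt counts and delays are illustrative, not anything from the original setup:

```java
import java.util.concurrent.Callable;

// Sketch: retry a flaky operation a fixed number of times before giving up.
// In the scenario above, op would wrap the SolrJ add/commit, and a final
// failure is tolerable because the nightly rebuild restores consistency.
public class Retry {
    static <T> T withRetries(Callable<T> op, int attempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                    // e.g. a transient network error
                Thread.sleep(backoffMillis); // simple fixed backoff
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice, then succeeds -- simulating a transient network error.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection refused");
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The key design point is the pairing: a bounded retry for transient failures during the day, plus a full nightly rebuild as the backstop, so neither mechanism has to be perfect on its own.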
RE: Javabin->JSon
Markus is right, this isn't the list for Java questions, but you can look into Jackson. Jackson is a Java binder that can convert Java POJOs into JSON:

  http://jackson.codehaus.org/

I use it in Spring MVC to convert my output to JSON.

Tim

-----Original Message-----
From: paulohess [mailto:pauloh...@yahoo.com]
Sent: Tuesday, March 29, 2011 3:16 PM
To: solr-user@lucene.apache.org
Subject: Javabin->JSon

Hi guys,

I have a Javabin object and I need to convert it to a JSON object. How? Please help. I am using solrj (client), which doesn't support JSON, so (wt=json) won't convert it to JSON.

thanks
Paulo

--
View this message in context: http://lucene.472066.n3.nabble.com/Javabin-JSon-tp2750066p2750066.html
Sent from the Solr - User mailing list archive at Nabble.com.
Fast DIH with 1:M multiValue entities
We are working on importing a large number of records into Solr using DIH. We have one schema with ~2000 fields declared, which map off to several database schemas, so that typically each document will have ~500 fields in use. We have about 2 million "rows" which we are importing, and we are seeing < 20 minutes in test across 14 different "entity's" which really map off to one virtual document. Then we added our multiValue stuff and, well, it didn't work out nearly as well. :-)

We have several fields which are 1:M, and so in our data-config.xml we might have something like this: [XML example lost in the archive]

That is a lot of database queries for a small result set, which is really slowing things down for us.

My question is more to ask advice, so it's a multi-parter :-)

1) Is there a way to declare in DIH an in-memory lookup where we can query for the entire "many" side of the query in one database query, and match up on the PK? Then we can declare that field multiValued.

2) Assuming that isn't currently available, I thought of "denormalizing" the 1:M into a delimited list and then using http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory to tokenize. That would allow us to search on individual bits, and build something into the front-end to handle the display. That means we wouldn't use multiValued and we'd have to modify our db, but we'd lose out on some of the abilities.

3) The third option was to open up DIH and try to add the first feature into it ourselves.

Am I approaching this the right way? Are there other ways I haven't considered or don't know about?

Thanks in advance,

Tim
RE: Fast DIH with 1:M multiValue entities
How did I miss that? Thanks, I will try that as it seems to be the "in memory" lookup solution I needed.

Thanks Erick,

Tim

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, April 14, 2011 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Fast DIH with 1:M multiValue entities

I'm not sure this applies, but have you looked at
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Best
Erick

On Thu, Apr 14, 2011 at 9:12 AM, Tim Gilbert wrote:
> We are working on importing a large number of records into Solr using
> DIH. We have one schema with ~2000 fields declared, which map off to
> several database schemas, so that typically each document will have ~500
> fields in use. We have about 2 million "rows" which we are importing,
> and we are seeing < 20 minutes in test across 14 different "entity's"
> which really map off to one virtual document. Then we added our
> multiValue stuff and, well, it didn't work out nearly as well. :-)
>
> We have several fields which are 1:M, and so in our data-config.xml we
> might have something like this:
>
> query="{call dbo.getFundManager_Data(${FundId.FundId})}">
>
> That is a lot of database queries for a small result set, which is really
> slowing things down for us.
>
> My question is more to ask advice, so it's a multi-parter :-)
>
> 1) Is there a way to declare in DIH an in-memory
> lookup where we can query for the entire "many" side of the query in one
> database query, and match up on the PK? Then we can declare that field
> multiValued.
>
> 2) Assuming that isn't currently available, I thought of
> "denormalizing" the 1:M into a delimited list and then using
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> to tokenize. That would allow us to search on
> individual bits, and build something into the front-end to handle the
> display. That means we wouldn't use multiValued and we'd have to modify
> our db, but we'd lose out on some of the abilities.
>
> 3) The third option was to open up DIH and try to add
> the first feature into it ourselves.
>
> Am I approaching this the right way? Are there other ways I haven't
> considered or don't know about?
>
> Thanks in advance,
>
> Tim
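The XML of the original data-config example did not survive the list archive, so here is a hedged reconstruction of how the cached lookup might be wired up, following the pattern on the DataImportHandler wiki. Only the stored-procedure call visible in the quoted text is from the original; the entity, column and field names are guesses for illustration:

```xml
<!-- Hypothetical data-config.xml fragment. With CachedSqlEntityProcessor
     the child entity's query runs once and is cached, then joined in
     memory via the where clause instead of issuing one query per parent
     row (the per-row variant was the stored-proc call
     {call dbo.getFundManager_Data(${FundId.FundId})}). -->
<entity name="FundId" query="SELECT FundId FROM Funds">
  <entity name="managers"
          processor="CachedSqlEntityProcessor"
          query="SELECT FundId, ManagerName FROM FundManagers"
          where="FundId=FundId.FundId">
    <field column="ManagerName" name="managerName_s"/>
  </entity>
</entity>
```

This is exactly the "in-memory lookup matched on the PK" asked about in part 1 of the question: one bulk query for the whole "many" side, with the join done by DIH rather than the database.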
Non-English query via Solr Example Admin corrupts text
Hi guys/gals,

I am using apache-solr-1.4.0.war deployed to glassfishv3 on my development machine, which is Ubuntu 9.10 64-bit. I am using SolrJ 1.4 with the CommonsHttpSolrServer connection to that Solr instance (http://localhost:8080/apache-solr-1.4.0) during my development. To simplify things, however, I have found that I can duplicate my issue directly from the Solr example admin page, so for ease of confirmation I will use the Solr Example Admin page for this example.

I deployed the apache-solr-1.4.0/dist/apache-solr-1.4.0.war file to my glassfishv3 application server. It deploys successfully. I access http://localhost:8080/apache-solr-1.4.0/admin/form.jsp and enter into the "Solr/Lucene Statement" textarea this word: numéro (note the é).

When I check the server.log file, I see this:

  INFO: [] webapp=/apache-solr-1.4.0 path=/select params={indent=on&version=2.2&q=numéro&fq=&start=0&rows=10&fl=*,score&qt=standard&wt=standard&explainOther=&hl.fl=} hits=0 status=0 QTime=16

As well, the output from the Admin system shows the same incorrect decoding.

In my SolrJ-using application, I have a test case which queries for "numéro" and succeeds if I use Embedded and fails if I use CommonsHttpSolrServer... I don't want to use embedded for a number of reasons, including that it's not recommended (http://wiki.apache.org/solr/EmbeddedSolr).

I am sorry if you've dealt with this issue in the past; I've spent a few hours googling for "solr utf-8 query" and "glassfishv3 utf-8 uri" plus other permutations/combinations, but there were seemingly endless amounts of chaff and I couldn't find anything useful after scouring it for a few hours. I can't decide whether it's a glassfish issue or not, so I am not sure where to direct my energy. Any tips or advice are appreciated!

Thanks in advance,

Tim Gilbert
RE: Non-English query via Solr Example Admin corrupts text
Chris,

You are the best. Switching to POST solved the problem. I hadn't noticed that option earlier, but after finding https://issues.apache.org/jira/browse/SOLR-612 I found the option in the code. Thank you, you just made my day.

Secondly, in an effort to narrow down whether this was a glassfish issue or not, here is what I found. Starting with glassfishv3 (I think), UTF-8 is the default for URIs. You can see this by going to the admin site, clicking on Network Config | Network Listeners, then selecting the listener. Select the tab "HTTP" and about half way down you will see URI Encoding: UTF-8.

HOWEVER, that doesn't appear to be correct, because following Abdelhamid Abid's advice I deployed Solr to Tomcat, then followed the directions here: http://wiki.apache.org/solr/SolrTomcat to force Tomcat to UTF-8 for URIs. Then I deployed Solr to Tomcat and, using CommonsHttpSolrServer, connected to that Tomcat-served instance. It worked - first time.

So, it appears that there is a problem with glassfishv3 and UTF-8 URIs, for at least the apache-solr-1.4.0.war. I wonder if I added that sun-web.xml file into the war to force UTF-8 it might work... not sure. However, the workaround is to change the method to POST as Chris suggested. You can do that in SolrJ here:

  server.query(solrQuery, METHOD.POST);

and it works as you'd expect.

Thanks for the advice/tips,

Tim

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Thursday, May 20, 2010 2:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Non-English query via Solr Example Admin corrupts text

: I am using apache-solr-1.4.0.war deployed to glassfishv3 on my
...
: INFO: [] webapp=/apache-solr-1.4.0 path=/select
: params={indent=on&version=2.2&q=numéro&fq=&start=0&rows=10&fl=*,score&qt=standard&wt=standard&explainOther=&hl.fl=}
: hits=0 status=0 QTime=16
...
: In my SolrJ using application, I have a test case which queries for
: "numéro" and succeeds if I use Embedded and fails if I use
: CommonsHttpSolrServer... I don't want to use embedded for a number of
...
: I am sorry if you'd dealt with this issue in the past, I've spent a few
: hours googling for solr utf-8 query and glassfishv3 utf-8 uri plus other
: permutations/combinations but there were seemingly endless amounts of
: chaff that I couldn't find anything useful after scouring it for a few
: hours. I can't decide whether it's a glassfish issue or not so I am not
: sure where to direct my energy. Any tips or advice are appreciated!

I suspect if you switched to using POST instead of GET your problem would go away -- this stems from ambiguity in the way HTTP servers/browsers deal with encoding UTF8 in URLs.

A quick search for "glassfish url encoding" turns up this thread...

  http://forums.java.net/jive/thread.jspa?threadID=38020

which references...

  http://wiki.glassfish.java.net/Wiki.jsp?page=FaqHttpRequestParameterEncoding

...it looks like you want to modify the default-charset attribute of the parameter-encoding element.

-Hoss
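The "numéro" seen in the server.log is exactly what UTF-8 bytes look like when decoded as ISO-8859-1 somewhere in the GET chain. A standalone illustration of the round trip — nothing Solr- or glassfish-specific here:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "num\u00e9ro"; // "numéro"
        // Encode as UTF-8, then (wrongly) decode as ISO-8859-1:
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled); // the é becomes two Latin-1 characters
        // Re-encoding with the wrong charset and decoding with the right one
        // recovers the text, since ISO-8859-1 maps bytes 1:1 to code points:
        String restored = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                     StandardCharsets.UTF_8);
        System.out.println(restored.equals(original)); // true
    }
}
```

POST sidesteps the problem because the parameter bytes travel in the request body with an explicit charset, rather than in the URL, whose decoding charset the container chooses.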
RE: Non-English query via Solr Example Admin corrupts text
I wanted to improve the documentation in the Solr wiki by adding my findings. However, when I try to log in and create a new account, I receive this error message:

  You are not allowed to do newaccount on this page. Login and try again.

Does anyone know how I can get permission to add a page to the documentation?

Tim

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Thursday, May 20, 2010 3:21 PM
To: solr-user@lucene.apache.org
Subject: RE: Non-English query via Solr Example Admin corrupts text

: Starting with glassfishv3 (I think) UTF-8 is the default for URI. You
: can see this by going to the admin site, clicking on Network Config |
: Network Listeners | then select the listener. Select the tab "HTTP" and
: about half way down, you will see URI Encoding: UTF-8.
:
: HOWEVER, that doesn't appear to be correct because following Abdelhamid
...

I know nothing about glassfish, but according to that forum URL I mentioned before, the URI Encoding option in glassfish explicitly (and evidently contentiously) does not apply to the query args -- only the path, hence the two different config options mentioned in the FAQ...

: http://forums.java.net/jive/thread.jspa?threadID=38020
...
: http://wiki.glassfish.java.net/Wiki.jsp?page=FaqHttpRequestParameterEncoding

-Hoss
RE: SolrJ Unicode problem
I had a similar problem a few days ago and I found that the documents were not being loaded correctly as UTF-8 into Solr. In my case, the loader program was a Java .jar I was executing from a cron job. There I added this: java -Dfile.encoding=UTF-8 -jar /home/tim/solr/bin/loadSiteSearch.jar Then, within that program, I wrote a function to take the strings I was loading and expressly declare them as UTF-8 like this: private String toUTF8(String value) { return new String(value.getBytes(), "UTF-8"); } and that solved the problem for me. Tim -Original Message- From: Hugh Cayless [mailto:philomou...@gmail.com] Sent: Friday, May 28, 2010 12:51 PM To: solr-user@lucene.apache.org Subject: SolrJ Unicode problem Hi, I'm a solr newbie, and I'm hoping someone can point me in the right direction. I'm trying to index a bunch of documents with Greek text in them. I can successfully index documents by generating add xml and using curl to send them to my server, but when I use solrj to create and send documents, the encoding gets thoroughly messed up. Instead of the result (from an add doc posted with curl): c.etiq.mom;;2077 Της Βησο ς Χρη εις Πανοπολίτης I get (from a SolrInputDocument loaded with solrj): c.etiq.mom;;2077 ??? ? ??? ??? �?? I can confirm that the SolrInputDocument's transcription field contains Greek text before I call .add(documents) on the StreamingUpdateSolrServer (i.e., I can get Greek back out of it). So I don't know what to do next. Any ideas? Thanks, Hugh
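For what it's worth, `value.getBytes()` without an argument uses the JVM's platform default charset, so the `toUTF8` trick above only works when that default happens to line up with the input (which is why the `-Dfile.encoding=UTF-8` flag mattered). A sketch of making the charset explicit at the byte/String boundary instead (class and helper names are illustrative):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class Utf8Safe {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Decode a byte stream explicitly as UTF-8 instead of relying on the
    // platform default charset (which is what the no-argument
    // String.getBytes() and new String(byte[]) both use).
    static String readUtf8(InputStream in) {
        try {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, UTF8));
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String greek = "Πανοπολίτης";
        byte[] bytes = greek.getBytes(UTF8);  // explicit charset on the way out
        System.out.println(readUtf8(new ByteArrayInputStream(bytes)).equals(greek)); // true
    }
}
```

Passing the charset explicitly at every encode/decode step removes the dependence on JVM startup flags entirely.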
RE: Auto-suggest internal terms
I was interested in the same thing and stumbled upon this article: http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent -and-jquery/ I haven't followed through, but it looked promising to me. Tim -Original Message- From: Jay Hill [mailto:jayallenh...@gmail.com] Sent: Wednesday, June 02, 2010 4:02 PM To: solr-user@lucene.apache.org Subject: Auto-suggest internal terms I've got a situation where I'm looking to build an auto-suggest where any term entered will lead to suggestions. For example, if I type "wine" I want to see suggestions like this: french *wine* classes *wine* book discounts burgundy *wine* etc. I've tried some tricks with shingles, but the only solution that worked was pre-processing my queries into a core in all variations. Anyone know any tricks to accomplish this in Solr without doing any custom work? -Jay
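For context, the article's approach is built on Solr's TermsComponent. A minimal sketch of the solrconfig.xml wiring (handler name and defaults here are illustrative, not from the article):

```xml
<searchComponent name="termsComponent" class="solr.TermsComponent"/>

<requestHandler name="/autosuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
    <int name="terms.limit">10</int>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```

Queried as `/autosuggest?terms.fl=title&terms.prefix=wine`, this returns indexed terms starting with the prefix; surfacing mid-phrase matches like "french wine classes" is exactly where shingle-style indexing tricks come in, since plain prefix matching only sees the start of each term.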
RE: solrj Unicode queries don't return results
I had the same problem a while back. You didn't mention which application server you are using (if any) but some application servers have problems with UTF-8 queries and GET. Tomcat has a well-documented solution http://wiki.apache.org/solr/SolrTomcat (near the bottom), I recently experienced problems with glassfish and switched to POST to solve it (http://wiki.apache.org/solr/SolrGlassfish) Tim -Original Message- From: jlist9 [mailto:jli...@gmail.com] Sent: Monday, June 07, 2010 2:33 PM To: solr-user@lucene.apache.org Cc: dioxide.softw...@gmail.com Subject: solrj Unicode queries don't return results Hi, I'm having a problem with Unicode queries using solrj. I have an index with unicode strings. From /solr/admin web interface, I can find results using the Java unicode format, such as \u751f\u6d3b. (If I just type in a UTF-8 string, I can't find any result though. Not sure why.) But in solrj, I tried having the string in UTF-8 in a UTF-8 encoded Java source file, and I also tried using the Java unicode format in query.setQuery( ), but none of these approaches return any results. When I searched online, I found a similar question here with no answers. http://www.mail-archive.com/solr-user@lucene.apache.org/msg21380.html So what's the right way of doing unicode queries with solrj? Thank you, Jack
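The Tomcat fix linked above boils down to a single attribute on the HTTP connector in server.xml (a sketch; port and other attributes are whatever your connector already uses):

```xml
<!-- server.xml: make Tomcat decode GET query strings as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           URIEncoding="UTF-8"/>
```

Without `URIEncoding`, Tomcat decodes query-string parameters as ISO-8859-1 by default, which silently mangles multi-byte UTF-8 terms before they ever reach Solr.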
RE: TikaEntityProcessor on Solr 1.4?
When I wanted to add some content to the solrj wiki for glassfish, I had a problem in that their anti-spam measures broke the ability to create a new account. Someone here (Chris I think) was kind enough to create a ticket in the correct place: https://issues.apache.org/jira/browse/INFRA-2726 You can see it was very quickly solved. I am not suggesting that the problem is the same, only that this may be the correct place to create a new ticket with the problem of getting a file from the wiki and perhaps someone can help you there. Tim -Original Message- From: Sixten Otto [mailto:six...@sfko.com] Sent: Tuesday, June 08, 2010 3:53 PM To: solr-user@lucene.apache.org Subject: Re: TikaEntityProcessor on Solr 1.4? 2010/5/22 Noble Paul നോബിള് नोब्ळ् : > just copy the dih-extras jar file from the nightly should be fine Now that I've finally got a server on which to attempt to set these things up... this turns out not to be a viable solution. The extras jar does contain the TikaEntityProcessor class, but NOT the BinFileDataSource and BinURLDataSource on which it depends. I tried both replacing the 1.4 DIH jar with the one from the trunk, and adding those two specific classes to the extras jar, neither of which worked. (And I apologize, but I didn't copy down the exceptions involved; if I can find some free time, I might go back and make the attempt again, a bit more methodically.) Sixten