Re: MoreLikeThis Question
Hi!

On Wed, Feb 15, 2012 at 07:27, Jamie Johnson wrote:
> Is there anyway with MLT to say get similar based on all fields or is
> it always a requirement to specify the fields?

It seems that is not possible. But you could append the fields parameter as a default in the solrconfig.xml (see the sketch below).

Cheers,
Michael
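Michael's snippet did not survive the archive; as an illustration only, defaulting mlt.fl in a MoreLikeThisHandler registration might look like this (the handler name and field names are assumptions, not from the original mail):

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- hypothetical field list: every field MLT should compare -->
    <str name="mlt.fl">title,description,tags</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>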
RE: OR-FilterQuery
> > q=some text
> > fq=id:(1 OR 2 OR 3...)
> >
> > Should I better use q=some text AND id:(1 OR 2 OR 3...)?
>
> 1. These two options have different scoring.
> 2. If you hit the same fq=id:(1 OR 2 OR 3...) many times, you have a
> benefit due to reading the DocSet from heap instead of searching on disk.

OK, understood. Thank you.
RE: OR-FilterQuery
> In other words, there's no attempt to decompose the fq clause
> and store parts of it in the cache; it's exact-match or nothing.

Ah ok, thank you.
Solr as part of an API to unburden databases
Hi,

does anyone on this mailing list use Solr as an API to avoid database queries? I know that this depends on the type of data. Imagine you have something like Quora's "Q&A" system, which is mostly just text. If I embedded some of these "Q&A" items into my personal site and invoked the Quora API, I guess they would do some database operations. Would it be possible to call the Quora API so that it internally calls Solr and streams the results back to my website?

This should be highly configurable, but the advantage would be that it would unburden the databases. There would be something like a three-layer architecture:

Client -> API (does some authorization/authentication checks) -> Solr
Solr -> API (may filter the data, remove unofficial data, etc.) -> Client

I'm not really familiar with that kind of architecture, and therefore don't know if it makes any sense. Any comments are appreciated!

Best regards,
Ramo
MoreLikeThis Requesthandler
Hi, I'm quite new to Solr. We want to find similar documents based on a MoreLikeThis query. In general this works fine and gives us reasonable results. Now we want to influence the result score by ranking more recent documents higher than older documents. Is this possible with the MoreLikeThis Requesthandler? If so, how can we achieve this? Thanks in advance, Robert
Error Indexing in solr 3.5
Hi,

When I tried to index in Solr 3.5 I got the following exception:

org.apache.solr.client.solrj.SolrServerException: Error executing query
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
        at com.quartz.test.FullImport.callIndex(FullImport.java:80)
        at com.quartz.test.GetObjectTypes.checkObjectTypeProp(GetObjectTypes.java:245)
        at com.quartz.test.GetObjectTypes.execute(GetObjectTypes.java:640)
        at com.quartz.test.QuartzSchedMain.main(QuartzSchedMain.java:55)
Caused by: java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
        at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)

I placed the latest solrj 3.5 jar in the example/solr/lib directory and re-started, but I am still getting the above exception.

Please let me know if I am missing anything.
Re: Highlighting stopwords
Koji Sekiguchi wrote:
> (12/02/14 22:25), O. Klein wrote:
>> I have not been able to find any logic in the behavior of hl.q and how it
>> analyses the query. Could you explain how it is supposed to work?
>
> Nothing special on hl.q. If you use hl.q, the value of it will be used for
> highlighting rather than the value of q. There's no tricks, I think.
>
> koji
> --
> Apache Solr Query Log Visualizer
> http://soleami.com/

Field definitions:
content_text (no stopwords, only synonyms in index)
content_hl (stopwords, synonyms in index and query, and the only field in hl.fl)

Searching is done with edismax on content_text.

1. If I use a query like hl.q=spell Check, it doesn't highlight terms with uppercase; synonyms get highlighted (all fields have LowerCaseFilterFactory).

2. hl.q=content_hl:(spell Check) does highlight terms with uppercase; synonyms are not highlighted.

3. hl.q=content_hl:(spell Check) content_text:(spell Check) highlights terms with uppercase and synonyms, but sometimes no highlights at all.

So if 1 also highlighted terms with uppercase, I would get the behavior I need. I can do this on the client side, but maybe it's a bug?
Re: Solr binary response for C#?
Hi, I just created a JIRA to investigate an Avro based serialization format for Solr: https://issues.apache.org/jira/browse/SOLR-3135 You're welcome to contribute. Guess we'll first need to define schemas, then create an AvroResponseWriter and then support in the C# Solr client. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 14. feb. 2012, at 15:14, Erick Erickson wrote: > It's not as compact as binary format, but would just using something > like JSON help enough? This is really simple, just specify > &wt=json (there's a method to set this on the server, at least in Java). > > Otherwise, you might get a more knowledgeable response on the > C# java list, I'm frankly clueless > > Best > Erick > > On Mon, Feb 13, 2012 at 1:15 PM, naptowndev wrote: >> Admittedly I'm new to this, but the project we're working on feeds results >> from Solr to an ASP.net application. Currently we are using XML, but our >> payloads can be rather large, some up to 17MB. We are looking for a way to >> minimize that payload and increase performance and I'm curious if there's >> anything anyone has been working out that creates a binary response that can >> be read by C# (similar to the javabin response built into Solr). >> >> That, or if anyone has experience implementing an external protocol like >> Thrift with Solr and consuming it with C# - again all in the effort to >> increase performance across the wire and while being consumed. >> >> Any help and direction would be greatly appreciated! >> >> Thanks! >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Solr-binary-response-for-C-tp3741101p3741101.html >> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stemming and accents (HunspellStemFilterFactory)
Or if you know that you'll always strip accents in your search, you may pre-process your pt_PT.dic to remove the accents from it, and use that custom dictionary in Solr instead.

Another alternative could be to extend HunspellStemFilter so that it can take the class name of a TokenFilter to apply when parsing the dictionary into memory.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 14. feb. 2012, at 16:27, Chantal Ackermann wrote:
> Hi Bráulio,
>
> I don't know about HunspellStemFilterFactory especially, but concerning accents:
>
> There are several accent filters that will remove accents from your
> tokens. If the Hunspell filter factory requires the accents, then simply
> add the accent filters after Hunspell in your index and query filter chains.
>
> You would then have Hunspell produce the tokens as result of the
> stemming, and only afterwards the accents would be removed (your example:
> 'forum' instead of 'fórum'). Do the same on the query side in case
> someone inputs accents.
>
> Accent filters are:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
> (lowercases, as well!)
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
>
> and others on that page.
>
> Chantal
>
> On Tue, 2012-02-14 at 14:48 +0100, Bráulio Bhavamitra wrote:
>> Hello all,
>>
>> I'm evaluating the HunspellStemFilterFactory and I found it works with a
>> pt_PT dictionary.
>>
>> For example, if I search for 'fóruns' it stems it to 'fórum' and then finds
>> 'fórum' references.
>>
>> But if I search for 'foruns' (without accent),
>> then HunspellStemFilterFactory cannot stem the
>> word, as it doesn't exist in its dictionary.
>>
>> Is there any way to make HunspellStemFilterFactory work without accent
>> differences?
>>
>> best,
>> bráulio
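For illustration, the chain Chantal describes might look like this in schema.xml (a sketch only; the type name, tokenizer choice, and dictionary file names are assumptions):

<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stem first, while the accents Hunspell needs are still present -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="pt_PT.dic" affix="pt_PT.aff" ignoreCase="true"/>
    <!-- then fold accents away, so stemmed 'fórum' indexes as 'forum' -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

A single <analyzer> applies at both index and query time, which covers Chantal's "do the same on the query side" point.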
Re: Semantic autocomplete with Solr
Check out http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ You can feed it anything, such as a log of previous searches, or a pre-computed dictionary of "item" + "color" combinations that exist in your DB etc. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 14. feb. 2012, at 23:46, Roman Chyla wrote: > done something along these lines: > > https://svnweb.cern.ch/trac/rcarepo/wiki/InspireAutoSuggest#Autosuggestautocompletefunctionality > > but you would need MontySolr for that - > https://github.com/romanchyla/montysolr > > roman > > On Tue, Feb 14, 2012 at 11:10 PM, Octavian Covalschi > wrote: >> Hey guys, >> >> Has anyone done any kind of "smart" autocomplete? Let's say we have a web >> store, and we'd like to autocomplete user's searches. So if I'll type in >> "jacket" next word that will be suggested should be something related to >> jacket (color, fabric) etc... >> >> It seems to me I have to structure this data in a particular way, but that >> way I can do without solr, so I was wondering if Solr could help us. >> >> Thank you in advance.
Re: Solr as part of an API to unburden databases
On Wed, Feb 15, 2012 at 11:48:14AM +0100, Ramo Karahasan wrote: > Hi, > > > > does anyone of the maillinglist users use solr as an API to avoid database > queries? [...] Like in a... cache? Why not use a cache then? (memcached, for example, but there are more). Regards -- tomás
Re: Solr soft commit feature
If you are looking for NRT functionality with Solr 3.5, you may want to take a look at Solr 3.5 with RankingAlgorithm. This allows you to add/update documents without a commit while being able to search concurrently. The add/update performance while adding 1m docs is about 5000 docs in about 498 ms with one concurrent searcher. You can get more information about Solr 3.5 with RankingAlgorithm from here:

http://tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

Regards,
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 2/14/2012 4:41 PM, Dipti Srivastava wrote:
> Hi All,
> Is there a way to soft commit in the current released version of solr 3.5?
>
> Regards,
> Dipti Srivastava
Re: MoreLikeThis Question
Hi, you would not want to include the unique ID and similar stuff, though? No idea whether it would impact the number of hits but it would most probably influence the scoring if nothing else. E.g. if you compare by certain fields, I would expect that a score of 1.0 indicates a match on all of those fields (haven't tested that explicitly, though). If the unique ID is included you could never reach that score. Just my 2 cents... Chantal On Wed, 2012-02-15 at 07:27 +0100, Jamie Johnson wrote: > Is there anyway with MLT to say get similar based on all fields or is > it always a requirement to specify the fields?
Re: Solr as part of an API to unburden databases
> > > > does anyone of the maillinglist users use solr as an API to avoid database > > queries? [...] > > Like in a... cache? > > Why not use a cache then? (memcached, for example, but there are more). > Good point. A cache only uses lookup by one kind of cache key while SOLR provides lookup by ... well... any search configuration that your index setup (mainly the schema) supports. If the "database queries" always do a find by unique id, then use a cache. Otherwise using SOLR is a valid option. Chantal
Re: Error Indexing in solr 3.5
Hi, I've got these errors when my client used a different SolrJ version from the SOLR server it connected to: SERVER 3.5 responding ---> CLIENT some other version You haven't provided any information on your client, though. Chantal On Wed, 2012-02-15 at 13:09 +0100, mechravi25 wrote: > Hi, > > When I tried to index in solr 3.5 i got the following exception > > org.apache.solr.client.solrj.SolrServerException: Error executing query > at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95) > at com.quartz.test.FullImport.callIndex(FullImport.java:80) > at > com.quartz.test.GetObjectTypes.checkObjectTypeProp(GetObjectTypes.java:245) > at com.quartz.test.GetObjectTypes.execute(GetObjectTypes.java:640) > at com.quartz.test.QuartzSchedMain.main(QuartzSchedMain.java:55) > Caused by: java.lang.RuntimeException: Invalid version or the data in not in > 'javabin' format > at > org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99) > at > org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466) > at > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243) > at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) > > > > I placed the latest solrj 3.5 jar in the example/solr/lib directory and then > re-started the same but still I am getting the above mentioned exception. > > Please let me know if I am missing anything. > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Error-Indexing-in-solr-3-5-tp3746735p3746735.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facet on TrieDateField field without including date
On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson wrote: > I would like to be able to facet based on the time of > day items are purchased across a date span. I was hoping that I could > do a query of something like date:[NOW-1WEEK TO NOW] and then specify > I wanted facet broken into hourly bins. Is this possible? Do I Will range faceting do everything you need? http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range -Yonik lucidimagination.com
Re: Facet on TrieDateField field without including date
I think it would if I indexed the time information separately. Which was my original thought, but I was hoping to store this in one field instead of 2. So my idea was I'd store the time portion as as a number (an int might suffice from 0 to 24 since I only need this to have that level of granularity) then do range queries over that. I couldn't think of a way to do this using the date field though because it would give me bins broken up by hours in a particular day, something like 2012-01-01-00:00:00 - 2012-01-01-01:00:00 10 2012-01-01-01:00:00 - 2012-01-01-02:00:00 20 2012-01-01-02:00:00 - 2012-01-01-03:00:00 5 But what I really want is just the time portion across all days 00:00:00 - 01:00:00 10 01:00:00 - 02:00:00 20 02:00:00 - 03:00:00 5 I would then use the date field to limit the time range in which the facet was operating. Does that make sense? Is there a more efficient way of doing this? On Wed, Feb 15, 2012 at 9:16 AM, Yonik Seeley wrote: > On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson wrote: >> I would like to be able to facet based on the time of >> day items are purchased across a date span. I was hoping that I could >> do a query of something like date:[NOW-1WEEK TO NOW] and then specify >> I wanted facet broken into hourly bins. Is this possible? Do I > > Will range faceting do everything you need? > http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range > > -Yonik > lucidimagination.com
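As a sketch of the approach Jamie describes: with a separate integer field holding just the hour (the field names hour_of_day and purchase_date here are made up), a range facet over the last week could look like:

q=*:*
fq=purchase_date:[NOW-7DAYS TO NOW]
facet=true
facet.range=hour_of_day
facet.range.start=0
facet.range.end=24
facet.range.gap=1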
Re: MoreLikeThis Question
Yes, agree that ID would be one that would need to be ignored. I don't think specifying them is too difficult I was just curious if it was possible to do this or not. On Wed, Feb 15, 2012 at 8:41 AM, Chantal Ackermann wrote: > Hi, > > you would not want to include the unique ID and similar stuff, though? > No idea whether it would impact the number of hits but it would most > probably influence the scoring if nothing else. > > E.g. if you compare by certain fields, I would expect that a score of > 1.0 indicates a match on all of those fields (haven't tested that > explicitly, though). If the unique ID is included you could never reach > that score. > > Just my 2 cents... > > Chantal > > > On Wed, 2012-02-15 at 07:27 +0100, Jamie Johnson wrote: >> Is there anyway with MLT to say get similar based on all fields or is >> it always a requirement to specify the fields? >
Re: Facet on TrieDateField field without including date
On Wed, Feb 15, 2012 at 9:30 AM, Jamie Johnson wrote: > I think it would if I indexed the time information separately. Which > was my original thought, but I was hoping to store this in one field > instead of 2. So my idea was I'd store the time portion as as a > number (an int might suffice from 0 to 24 since I only need this to > have that level of granularity) then do range queries over that. I > couldn't think of a way to do this using the date field though because > it would give me bins broken up by hours in a particular day, > something like > > 2012-01-01-00:00:00 - 2012-01-01-01:00:00 10 > 2012-01-01-01:00:00 - 2012-01-01-02:00:00 20 > 2012-01-01-02:00:00 - 2012-01-01-03:00:00 5 > > But what I really want is just the time portion across all days > > 00:00:00 - 01:00:00 10 > 01:00:00 - 02:00:00 20 > 02:00:00 - 03:00:00 5 > > I would then use the date field to limit the time range in which the > facet was operating. Does that make sense? Is there a more efficient > way of doing this? Hmm, no there's no way to do this. Even if you were to write a custom faceting component, it seems like it would still be very expensive to derive the hour of the day from ms for every doc. -Yonik lucidimagination.com > On Wed, Feb 15, 2012 at 9:16 AM, Yonik Seeley > wrote: >> On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson wrote: >>> I would like to be able to facet based on the time of >>> day items are purchased across a date span. I was hoping that I could >>> do a query of something like date:[NOW-1WEEK TO NOW] and then specify >>> I wanted facet broken into hourly bins. Is this possible? Do I >> >> Will range faceting do everything you need? >> http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range >> >> -Yonik >> lucidimagination.com
Re: Semantic autocomplete with Solr
Thank you! I'll check them out. On Wed, Feb 15, 2012 at 6:50 AM, Jan Høydahl wrote: > Check out > http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/ > You can feed it anything, such as a log of previous searches, or a > pre-computed dictionary of "item" + "color" combinations that exist in your > DB etc. > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > Solr Training - www.solrtraining.com > > On 14. feb. 2012, at 23:46, Roman Chyla wrote: > > > done something along these lines: > > > > > https://svnweb.cern.ch/trac/rcarepo/wiki/InspireAutoSuggest#Autosuggestautocompletefunctionality > > > > but you would need MontySolr for that - > https://github.com/romanchyla/montysolr > > > > roman > > > > On Tue, Feb 14, 2012 at 11:10 PM, Octavian Covalschi > > wrote: > >> Hey guys, > >> > >> Has anyone done any kind of "smart" autocomplete? Let's say we have a > web > >> store, and we'd like to autocomplete user's searches. So if I'll type in > >> "jacket" next word that will be suggested should be something related to > >> jacket (color, fabric) etc... > >> > >> It seems to me I have to structure this data in a particular way, but > that > >> way I can do without solr, so I was wondering if Solr could help us. > >> > >> Thank you in advance. > >
Re: Facet on TrieDateField field without including date
Use multiple fields and you get what you want. The extra fields are going to cost very little and will have a bit of a positive impact.

On Wed, Feb 15, 2012 at 9:30 AM, Jamie Johnson wrote:
> I think it would if I indexed the time information separately. Which
> was my original thought, but I was hoping to store this in one field
> instead of 2. So my idea was I'd store the time portion as a
> number (an int might suffice from 0 to 24 since I only need this to
> have that level of granularity) then do range queries over that. I
> couldn't think of a way to do this using the date field though because
> it would give me bins broken up by hours in a particular day,
> something like
>
> 2012-01-01-00:00:00 - 2012-01-01-01:00:00 10
> 2012-01-01-01:00:00 - 2012-01-01-02:00:00 20
> 2012-01-01-02:00:00 - 2012-01-01-03:00:00 5
>
> But what I really want is just the time portion across all days
>
> 00:00:00 - 01:00:00 10
> 01:00:00 - 02:00:00 20
> 02:00:00 - 03:00:00 5
>
> I would then use the date field to limit the time range in which the
> facet was operating. Does that make sense? Is there a more efficient
> way of doing this?
>
> On Wed, Feb 15, 2012 at 9:16 AM, Yonik Seeley wrote:
>> On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson wrote:
>>> I would like to be able to facet based on the time of
>>> day items are purchased across a date span. I was hoping that I could
>>> do a query of something like date:[NOW-1WEEK TO NOW] and then specify
>>> I wanted facet broken into hourly bins. Is this possible? Do I
>>
>> Will range faceting do everything you need?
>> http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range
>>
>> -Yonik
>> lucidimagination.com
Re: Facet on TrieDateField field without including date
Thanks guys that's what I figured, just wanted to make sure I was going down the right path. On Wed, Feb 15, 2012 at 9:55 AM, Ted Dunning wrote: > Use multiple fields and you get what you want. The extra fields are going > to cost very little and will have a bit positive impact. > > On Wed, Feb 15, 2012 at 9:30 AM, Jamie Johnson wrote: > >> I think it would if I indexed the time information separately. Which >> was my original thought, but I was hoping to store this in one field >> instead of 2. So my idea was I'd store the time portion as as a >> number (an int might suffice from 0 to 24 since I only need this to >> have that level of granularity) then do range queries over that. I >> couldn't think of a way to do this using the date field though because >> it would give me bins broken up by hours in a particular day, >> something like >> >> 2012-01-01-00:00:00 - 2012-01-01-01:00:00 10 >> 2012-01-01-01:00:00 - 2012-01-01-02:00:00 20 >> 2012-01-01-02:00:00 - 2012-01-01-03:00:00 5 >> >> But what I really want is just the time portion across all days >> >> 00:00:00 - 01:00:00 10 >> 01:00:00 - 02:00:00 20 >> 02:00:00 - 03:00:00 5 >> >> I would then use the date field to limit the time range in which the >> facet was operating. Does that make sense? Is there a more efficient >> way of doing this? >> >> On Wed, Feb 15, 2012 at 9:16 AM, Yonik Seeley >> wrote: >> > On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson >> wrote: >> >> I would like to be able to facet based on the time of >> >> day items are purchased across a date span. I was hoping that I could >> >> do a query of something like date:[NOW-1WEEK TO NOW] and then specify >> >> I wanted facet broken into hourly bins. Is this possible? Do I >> > >> > Will range faceting do everything you need? >> > http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range >> > >> > -Yonik >> > lucidimagination.com >>
Re: Facet on TrieDateField field without including date
I've done something like that by calculating the hours during indexing time (in the script part of the DIH config using java.util.Calendar which gives you all those field values without effort). I've also extracted information on which weekday it is (using the integer constants of Calendar). If you need this only for one timezone it is straight forward but if the queries come from different time zones you'll have to shift appropriately. I found that pre-calculating has the advantage that you end up with very simple data: simple integers. And it makes it quite easy to build more complex queries on that. For example I have created a grid (build from facets) where the columns are the weekdays and the rows are the hours of day. The facets are created using a field containing the combination of weekday and hour of day. Chantal On Wed, 2012-02-15 at 15:49 +0100, Yonik Seeley wrote: > On Wed, Feb 15, 2012 at 9:30 AM, Jamie Johnson wrote: > > I think it would if I indexed the time information separately. Which > > was my original thought, but I was hoping to store this in one field > > instead of 2. So my idea was I'd store the time portion as as a > > number (an int might suffice from 0 to 24 since I only need this to > > have that level of granularity) then do range queries over that. I > > couldn't think of a way to do this using the date field though because > > it would give me bins broken up by hours in a particular day, > > something like > > > > 2012-01-01-00:00:00 - 2012-01-01-01:00:00 10 > > 2012-01-01-01:00:00 - 2012-01-01-02:00:00 20 > > 2012-01-01-02:00:00 - 2012-01-01-03:00:00 5 > > > > But what I really want is just the time portion across all days > > > > 00:00:00 - 01:00:00 10 > > 01:00:00 - 02:00:00 20 > > 02:00:00 - 03:00:00 5 > > > > I would then use the date field to limit the time range in which the > > facet was operating. Does that make sense? Is there a more efficient > > way of doing this? > > Hmm, no there's no way to do this. > Even if you were to write a custom faceting component, it seems like > it would still be very expensive to derive the hour of the day from ms > for every doc. > > -Yonik > lucidimagination.com > > > > > > On Wed, Feb 15, 2012 at 9:16 AM, Yonik Seeley > > wrote: > >> On Wed, Feb 15, 2012 at 8:58 AM, Jamie Johnson wrote: > >>> I would like to be able to facet based on the time of > >>> day items are purchased across a date span. I was hoping that I could > >>> do a query of something like date:[NOW-1WEEK TO NOW] and then specify > >>> I wanted facet broken into hourly bins. Is this possible? Do I > >> > >> Will range faceting do everything you need? > >> http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range > >> > >> -Yonik > >> lucidimagination.com
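For illustration, Chantal's pre-calculation could be done with a DIH ScriptTransformer roughly like this (a sketch; the entity, column, and field names are assumptions, and the timestamp column must yield a java.util.Date-compatible value):

<dataConfig>
  <script><![CDATA[
    function addTimeParts(row) {
      // derive hour-of-day and weekday from the raw timestamp
      var cal = java.util.Calendar.getInstance();
      cal.setTime(row.get('purchase_date'));
      row.put('hour_of_day', cal.get(java.util.Calendar.HOUR_OF_DAY));
      row.put('weekday', cal.get(java.util.Calendar.DAY_OF_WEEK));
      return row;
    }
  ]]></script>
  <document>
    <entity name="purchase" transformer="script:addTimeParts"
            query="select id, purchase_date from purchases">
      <field column="id" name="id"/>
      <field column="purchase_date" name="purchase_date"/>
      <field column="hour_of_day" name="hour_of_day"/>
      <field column="weekday" name="weekday"/>
    </entity>
  </document>
</dataConfig>

The resulting integer fields can then be range-faceted directly, or combined into a single weekday/hour field for the grid Chantal describes.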
Solr multiple cores - multiple databases approach
Hello,

I have a use case where I'm trying to integrate Solr:
- 2 databases with the same schema
- I want to index multiple entities from those databases

My question is what is the best way of approaching this topic:
- should I create a core for each database, and inside that core create a document with all the information that I need?
Re: Solr multiple cores - multiple databases approach
Hello Radu,

> - I want to index multiple entities from those databases

Do you want to combine data from both databases within one document, or are you just interested in indexing both databases on their own?

If the second applies: you can do it within one core by using a field (e.g. "source") to filter on, or create a core per database, which would completely separate both indices from each other. It depends on your use case and access patterns.

To tell you more, you should provide us more information.

Regards,
Em

On 15.02.2012 16:23, Radu Toev wrote:
> Hello,
>
> I have a use case where I'm trying to integrate Solr:
> - 2 databases with the same schema
> - I want to index multiple entities from those databases
> My question is what is the best way of approaching this topic:
> - should I create a core for each database and inside that core create a
> document with all the information that I need?
Re: Solr soft commit feature
Hi Nagendra,

Certainly interesting! Would this work in a master/slave setup where the reads are from the slaves and all writes are to the master?

Regards,
Dipti Srivastava

On 2/15/12 5:40 AM, "Nagendra Nagarajayya" wrote:
> If you are looking for NRT functionality with Solr 3.5, you may want to
> take a look at Solr 3.5 with RankingAlgorithm. This allows you to
> add/update documents without a commit while being able to search
> concurrently. The add/update performance while adding 1m docs is about
> 5000 docs in about 498 ms with one concurrent searcher. You can get more
> information about Solr 3.5 with RankingAlgorithm from here:
>
> http://tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x
>
> Regards,
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
>
> On 2/14/2012 4:41 PM, Dipti Srivastava wrote:
>> Hi All,
>> Is there a way to soft commit in the current released version of solr 3.5?
>>
>> Regards,
>> Dipti Srivastava
Search for hashtags and mentions
Hi,

We are using Solr version 3.5 to search through tweets. I am using WordDelimiterFilterFactory with the following setting, to be able to search for @username or #hashtags:

<filter class="solr.WordDelimiterFilterFactory" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1" handleAsChar="@#"/>

I saw the following patch, but this doesn't seem to be working as I expected. Am I missing something?

https://issues.apache.org/jira/browse/SOLR-2059

But searching for @username is also returning results for just username, and #hashtag is just returning results for hashtag. How can I achieve this?

Regards,
Rohit
problem with accents
Hi,

I've got a problem with the configuration of Solr. I have defined a new type of data, "text_fr", to handle accents like "é à è". I have added this to my fieldType definition:

<filter class="solr.ISOLatin1AccentFilterFactory"/>

Everything seems to be ok, data are added fine. But when I go to this address, http://localhost:8983/solr/admin, to run a search, there is a problem. If I search "cherche" and "cherché" the results are different, although they should be the same, shouldn't they?

Thank you guys
Romain
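For reference, a complete "text_fr" type along those lines might look like this (a sketch; the tokenizer and surrounding filters are assumptions, only the accent filter is from Romain's mail):

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>

With a single <analyzer> the same chain runs at index and query time, so "cherche" and "cherché" should analyze to the same token, provided the queried field actually uses this type (Erick's point in the reply below).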
Re: problem with accents
Did you specify the correct field with the search? If you just entered the word in the search box without the field, the search would be made against your default search field (defined in schema.xml).

If you go to the "full interface" link on the admin page, you can then click the debug:enable checkbox, which will give you a lot more information about what the parsed query looks like.

Best
Erick

On Wed, Feb 15, 2012 at 2:12 PM, R M wrote:
> Hi,
> I've got a problem with the configuration of Solr. I have defined a new
> type of data, "text_fr", to handle accents like "é à è". I have added this
> to my fieldType definition:
> <filter class="solr.ISOLatin1AccentFilterFactory"/>
> Everything seems to be ok, data are added fine. But when I go to
> http://localhost:8983/solr/admin to run a search, there is a problem.
> If I search "cherche" and "cherché" the results are different, although
> they should be the same, shouldn't they?
> Thank you guys
> Romain
update extracted docs
Hi

I have a Solr 3.5 database which is populated by using /update/extract (configured pretty much as per the examples) and additional metadata. The uploads are handled by a perl-driven webapp which uses WebService::Solr (which uses behind-the-scenes POSTing). That all works fine.

When I come to update the metadata associated with the stored docs, again using my perl web app, I find the Solr doc (by id), amend or append all the changed metadata and use /update to re-post them. Again that works fine ... but I'm getting nervous because I'm not sure why it works.

If I try to update only the changed fields for a single doc, the unchanged fields are removed. Slightly surprising, but if that's what I should expect, it's not difficult to accept.

So how come using /update doesn't remove the text content (and the indexing on it) which was originally obtained using /update/extract? And can I depend on it being there in future, after optimization, for example?

And if I can't, what is the best technique for updating metadata under these circumstances?

Harold Frayman
Re: Search for hashtags and mentions
Do you want to index the hashtags and usernames to different fields? Probably using http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternTokenizerFactory will solve your problem. However I don't fully understand the problem when you search Thanks Emmanuel 2012/2/15 Rohit : > Hi, > > > > We are using solr version 3.5 to search though Tweets, I am using > WordDelimiterFactory with the following setting, to be able to search for > @username or #hashtags > > > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" > preserveOriginal="1" handleAsChar="@#"/> > > > > I saw the following patch but this doesn't seem to be working as I expected, > am I missing something? > > > > https://issues.apache.org/jira/browse/SOLR-2059 > > > > But searching for @username is also returning results for just username or > #hashtag is just returning result for hastag. How can I achieve this? > > > > Regards, > > Rohit >
Re: update extracted docs
Solr or Lucene does not update documents. It deletes the old one and replaces it with a new one when it has the same id. So if you create a document with the changed fields only, and the same id, and upload that one, the old one will be erased and replaced with the new one. So THAT behaviour is to be expected.

For updating documents you simply add the entire document again with the modified fields. Or, if that is an expensive procedure and you want to avoid the extraction of the metadata, you can store all the fields, retrieve the full document, create a new document with all the fields, even the unmodified ones, and use the /update handler to add it again.

Does that answer your question?

Thanks
Emmanuel

2012/2/15 Harold Frayman:
> Hi
>
> I have a Solr 3.5 database which is populated by using /update/extract
> (configured pretty much as per the examples) and additional metadata. The
> uploads are handled by a perl-driven webapp which uses WebService::Solr
> (which uses behind-the-scenes POSTing). That all works fine.
>
> When I come to update the metadata associated with the stored docs, again
> using my perl web app, I find the Solr doc (by id), amend or append all the
> changed metadata and use /update to re-post them. Again that works fine ...
> but I'm getting nervous because I'm not sure why it works.
>
> If I try to update only the changed fields for a single doc, the unchanged
> fields are removed. Slightly surprising, but if that's what I should
> expect, it's not difficult to accept.
>
> So how come using /update doesn't remove the text content (and the indexing
> on it) which was originally obtained using /update/extract? And can I
> depend on it being there in future, after optimization, for example?
>
> And if I can't, what is the best technique for updating metadata under
> these circumstances?
>
> Harold Frayman
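For illustration, Emmanuel's "retrieve, copy, re-add" approach could look roughly like this in SolrJ (a sketch only: it assumes every field is stored, assumes the query matches exactly one document, and the field name "subject" is hypothetical):

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class UpdateMetadata {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // fetch the stored fields of the existing document
    SolrDocument old = server.query(new SolrQuery("id:doc1")).getResults().get(0);

    // copy every stored field into a fresh input document
    SolrInputDocument doc = new SolrInputDocument();
    for (Map.Entry<String, Object> e : old.entrySet()) {
      doc.setField(e.getKey(), e.getValue());
    }

    // overwrite only the metadata that changed
    doc.setField("subject", "new subject value");

    server.add(doc);   // replaces the old document (same unique id)
    server.commit();
  }
}

Any field that is indexed but not stored (such as extracted body text, unless it is stored) cannot be recovered this way, which is Emmanuel's caveat about storing all the fields.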
Re: feeding mahout cluster output back to solr
I was looking at this: http://java.dzone.com/videos/configuring-mahout-clustering

It seems possible, but can anyone shed more light, especially on the part of mapping clusters back to the original docs?

abhay
Re: Can I rebuild an index and remove some fields?
I implemented an index shrinker and it works. I reduced my test index from 6.6 GB to 3.6 GB by removing a single shingled field I did not need anymore. I'm actually using Lucene.Net for this project so the code is C# using the Lucene.Net 2.9.2 API. But the basic idea is:

Create an IndexReader wrapper that only enumerates the terms you want to keep, and that removes terms from documents when returning documents.

Use the SegmentMerger to re-write each segment (where each segment is wrapped by the wrapper class), writing new segments to a new directory. Collect the SegmentInfos and do a commit in order to create a new segments file in the new index directory.

Done - you now have a shrunk index with the specified terms removed.

The implementation uses a separate thread for each segment, so it re-writes them in parallel. Took about 15 minutes to do a 770,000 doc index on my macbook.

On Tue, Feb 14, 2012 at 10:12 PM, Li Li wrote:
> I have roughly read the codes of 4.0 trunk. maybe it's feasible.
> SegmentMerger.add(IndexReader) will add to-be-merged Readers.
> merge() will call
>     mergeTerms(segmentWriteState);
>     mergePerDoc(segmentWriteState);
>
> mergeTerms() will construct fields from IndexReaders:
>     for(int readerIndex=0; readerIndex < ...; ...) {
>       final MergeState.IndexReaderAndLiveDocs r = mergeState.readers.get(readerIndex);
>       final Fields f = r.reader.fields();
>       final int maxDoc = r.reader.maxDoc();
>       if (f != null) {
>         slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex));
>         fields.add(f);
>       }
>       docBase += maxDoc;
>     }
> So if you wrapper your IndexReader and override its fields() method,
> maybe it will work for merging terms.
>
> For DocValues, it can also override AtomicReader.docValues(). Just
> return null for fields you want to remove. Maybe it should
> traverse CompositeReader's getSequentialSubReaders() and wrapper each
> AtomicReader.
>
> Other things like term vectors and norms are similar.
>
> On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart wrote:
>> I was thinking if I make a wrapper class that aggregates another
>> IndexReader and filter out terms I don't want anymore it might work. And
>> then pass that wrapper into SegmentMerger. I think if I filter out terms
>> on GetFieldNames(...) and Terms(...) it might work.
>>
>> Something like:
>>
>> HashSet ignoredTerms=...;
>> FilteringIndexReader wrapper=new FilterIndexReader(reader);
>> SegmentMerger merger=new SegmentMerger(writer);
>> merger.add(wrapper);
>> merger.Merge();
>>
>> On Feb 14, 2012, at 1:49 AM, Li Li wrote:
>>> for method 2, delete is wrong. we can't delete terms.
>>> you also should hack with the tii and tis file.
>>>
>>> On Tue, Feb 14, 2012 at 2:46 PM, Li Li wrote:
>>>> method 1, dumping data
>>>> for stored fields, you can traverse the whole index and save it
>>>> somewhere else.
>>>> for indexed but not stored fields, it may be more difficult.
>>>> if the indexed and not stored field is not analyzed (fields such as
>>>> id), it's easy to get from FieldCache.StringIndex.
>>>> But for analyzed fields, though theoretically it can be restored from
>>>> term vector and term position, it's hard to recover from the index.
>>>>
>>>> method 2, hack with metadata
>>>> 1. indexed fields
>>>> delete by query, e.g. field:*
>>>> 2. stored fields
>>>> because all fields are stored sequentially, it's not easy to delete
>>>> some fields. this will not affect search speed. but if you want to get
>>>> stored fields, and the useless fields are very long, then it will slow
>>>> down. also it's possible to hack with it, but it needs more effort to
>>>> understand the index file format and traverse the fdt/fdx file.
>>>> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
>>>> this will give you some insight.
>>>>
>>>> On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart wrote:
>>>>> Lets say I have a large index (100M docs, 1TB, split up between 10
>>>>> indexes). And a bunch of the "stored" and "indexed" fields are not used in
>>>>> search at all. In order to save memory and disk, I'd like to rebuild that
>>>>> index *without* those fields, but I don't have original documents to
>>>>> rebuild entire index with (don't have the full-text anymore, etc.). Is
>>>>> there some way to rebuild or optimize an existing index with only a sub-set
>>>>> of the existing indexed fields? Or alternatively is there a way to avoid
>>>>> loading some indexed fields at all (to avoid loading term infos and terms
>>>>> index)?
>>>>>
>>>>> Thanks
>>>>> Bob
Size of suggest dictionary
Hello,

We're building an auto suggest component based on the "label" field of documents. Is there a way to see how many terms are in the dictionary, or how much memory it's taking up? I looked on the statistics page but didn't find anything obvious.

Thanks in advance,

Mike

ps - here's the config:

<searchComponent name="suggestlabel" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggestlabel</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">label</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggestlabel" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggestlabel</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggestlabel</str>
  </arr>
</requestHandler>
Date formatting issue
Hi all,

Here's an interesting one. In my XML import, if I use a very simple xpath I will get the date properly imported; however, if I use an expression for another node which is nested, I will receive this type of exception:

java.text.ParseException: Unparseable date: "Tue Aug 16 20:10:23 EDT 2011"

I have to use the xpath above, as I have a few of those release date nodes and I need to flatten them so we can look at dates per audience/group. I've also run just this:

/document/audiences/audience/audience_release_date

and it works; however, I need a more precise result than that, since different groups could have different release dates.

Any help greatly appreciated,
Radek.

Radoslaw Zajkowski
Senior Developer, Proximity Canada
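Radek's xpath snippets did not survive the archive, but the exception itself usually means the value needs an explicit dateTimeFormat. A hedged sketch with DIH's DateFormatTransformer (the entity and column names are placeholders, and the xpath shown is the simple one from the mail, not the nested one):

<entity name="doc" processor="XPathEntityProcessor"
        transformer="DateFormatTransformer" url="..." forEach="/document">
  <!-- "EEE MMM dd HH:mm:ss zzz yyyy" matches "Tue Aug 16 20:10:23 EDT 2011";
       parsing relies on an English locale for the day/month names -->
  <field column="audience_release_date"
         xpath="/document/audiences/audience/audience_release_date"
         dateTimeFormat="EEE MMM dd HH:mm:ss zzz yyyy"/>
</entity>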
Re: Size of suggest dictionary
Hello Mike, have a look at Solr's Schema Browser. Click on "FIELDS", select "label" and have a look at the number of distinct (term-)values. Regards, Em Am 15.02.2012 23:07, schrieb Mike Hugo: > Hello, > > We're building an auto suggest component based on the "label" field of > documents. Is there a way to see how many terms are in the dictionary, or > how much memory it's taking up? I looked on the statistics page but didn't > find anything obvious. > > Thanks in advance, > > Mike > > ps- here's the config: > > > > suggestlabel > name="classname">org.apache.solr.spelling.suggest.Suggester > name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup > label > true > > > > class="org.apache.solr.handler.component.SearchHandler"> > > true > suggestlabel > 10 > > > suggestlabel > > >
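Another option may be the Luke request handler, e.g. (URL assumes the default example setup):

http://localhost:8983/solr/admin/luke?fl=label

which, if I remember right, reports a "distinct" term count for the field. The memory footprint of the suggester's lookup structure itself isn't exposed there, though.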
Re: Query in starting solr 3.5
: WARNING: XML parse warning in "solrres:/dataimport.xml", line 2, column 95:
: Include operation failed, reverting to fallback. Resource error reading file
: as XML (href='solr/conf/solrconfig_master.xml'). Reason: Can't find resource
: 'solr/conf/solrconfig_master.xml' in classpath or
: '/solr/apache-solr-3.5.0/example/multicore/core1/conf/',
: cwd=/solr/apache-solr-3.5.0/example
:
: The partial content of dataimport file that I used in solr1.4 is as follows
:
: http://www.w3.org/2001/XInclude">

I *think* what happened there is that some fixes were made to what path was used for relative includes -- before it was inconsistent and undefined, and now it's a true relative path from where you do the include.

So in your case, (I think) it is looking for /solr/apache-solr-3.5.0/example/multicore/core1/conf/solr/conf/solrconfig_master.xml and not finding it -- so just fix the path to be what you actually want it to be relative to that file.

(If you look for SOLR-1656 in Solr's CHANGES.txt file it has all the details.)

: The 3 files given in Fallback tag are present in the location. Does solr 3.5
: support fallback? Can someone please suggest a solution?

I think the fallback should be working fine (particularly since they are absolute paths in your case) ... nothing about that error says it's not; it actually says it's using the fallback because the include itself is failing. (So unless you see a *subsequent* error, you are getting the fallbacks.)

: WARNING: the luceneMatchVersion is not specified, defaulting to LUCENE_24
: emulation. You should at some point declare and reindex to at least 3.0,
: because 2.4 emulation is deprecated and will be removed in 4.0. This
: parameter will be mandatory in 4.0.
:
: The solution i got after googling is to apply a patch. Is there any other

Citation please? Where did you read that you need a patch to get rid of that warning?

This warning is just letting you know that, in the absence of explicit configuration, it's assuming you want the legacy behavior you would get if you explicitly configured the option with LUCENE_24. If you add this line to your solrconfig.xml...

<luceneMatchVersion>LUCENE_24</luceneMatchVersion>

...no behavior will change, and the warning will go away. But as the warning points out, you should give serious consideration (on every upgrade) to whether or not you can re-index after upgrade, and then change it to the current value (LUCENE_35) to eliminate some buggy behavior that is supported for back-compat with existing indexes.

-Hoss
Re: Language specific tokenizer for purpose of multilingual search in single-core solr,
: I want to do multilingual search in single-core solr. That requires to : define language specific tokenizers in scheme.xml. Say for example, I have : two tokenizers, one for English ("en") and one for simplified Chinese : ("zh-cn"). Can I just put following definitions together in one schema.xml, : and both sets of the files ( stopwords, synonym, and protwords) in one : directory? absolutely. -Hoss
Re: Search for hashtags and mentions
We need the rest of your fieldType, it's quite possible that other parts of it are stripping out the characters in question. Try looking at the admin/analysis page. If that doesn't help, please show us the whole fieldType definition and the results of attaching &debugQuery=on to the URL. Best Erick On Wed, Feb 15, 2012 at 2:04 PM, Rohit wrote: > Hi, > > > > We are using solr version 3.5 to search though Tweets, I am using > WordDelimiterFactory with the following setting, to be able to search for > @username or #hashtags > > > > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" > preserveOriginal="1" handleAsChar="@#"/> > > > > I saw the following patch but this doesn't seem to be working as I expected, > am I missing something? > > > > https://issues.apache.org/jira/browse/SOLR-2059 > > > > But searching for @username is also returning results for just username or > #hashtag is just returning result for hastag. How can I achieve this? > > > > Regards, > > Rohit >
Spatial Search and faceting
Hi Solr community,

I am doing a spatial search and then facet by city. Is it possible to then sort the faceted cities by distance? We would like to display the hits per city, but sort them by distance.

Thanks & Regards
Ericz

q=iphone
fq={!bbox}
sfield=geopoint
pt=49.594857,8.468614
d=50
fl=id,description,city,geopoint
facet=true
facet.field=city
f.city.facet.limit=10
f.city.facet.sort=count   (wanted: geodist() asc)
Re: Search for hashtags and mentions
On Wed, Feb 15, 2012 at 2:04 PM, Rohit wrote: > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" > preserveOriginal="1" handleAsChar="@#"/> There is no such parameter as 'handleAsChar'. If you want to do this, you need to use a custom types file. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory -- lucidimagination.com
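For illustration, the custom types file Robert mentions could look like this (the file name and the choice to map both characters to ALPHA are assumptions; a literal '#' can't appear on the left-hand side because it would be read as a comment line, hence the unicode escape):

@ => ALPHA
\u0023 => ALPHA

referenced from the filter in place of the non-existent handleAsChar parameter:

<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
        preserveOriginal="1"/>

With @ and # typed as ALPHA, the filter no longer splits "@username" or "#hashtag" into separate tokens.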
Re: Can I rebuild an index and remove some fields?
great. I think you could make it a public tool. maybe others also need such functionality.

On Thu, Feb 16, 2012 at 5:31 AM, Robert Stewart wrote:
> I implemented an index shrinker and it works. I reduced my test index
> from 6.6 GB to 3.6 GB by removing a single shingled field I did not
> need anymore. I'm actually using Lucene.Net for this project so the code
> is C# using the Lucene.Net 2.9.2 API. But the basic idea is:
>
> Create an IndexReader wrapper that only enumerates the terms you want
> to keep, and that removes terms from documents when returning documents.
>
> Use the SegmentMerger to re-write each segment (where each segment is
> wrapped by the wrapper class), writing new segments to a new directory.
> Collect the SegmentInfos and do a commit in order to create a new
> segments file in the new index directory.
>
> Done - you now have a shrunk index with the specified terms removed.
>
> The implementation uses a separate thread for each segment, so it
> re-writes them in parallel. Took about 15 minutes to do a 770,000 doc
> index on my macbook.
>
> [...]
Re: Spatial Search and faceting
One way to do it is to group by city and then sort=geodist() asc select?group=true&group.field=city&sort=geodist() desc&rows=10&fl=city It might require 2 calls to SOLR to get it the way you want. On Wed, Feb 15, 2012 at 5:51 PM, Eric Grobler wrote: > Hi Solr community, > > I am doing a spatial search and then do a facet by city. > Is it possible to then sort the faceted cities by distance? > > We would like to display the hits per city, but sort them by distance. > > Thanks & Regards > Ericz > > q=iphone > fq={!bbox} > sfield=geopoint > pt=49.594857,8.468614 > d=50 > fl=id,description,city,geopoint > > facet=true > facet.field=city > f.city.facet.limit=10 > f.city.facet.sort=count //geodist() asc -- Bill Bell billnb...@gmail.com cell 720-256-8076
RE: Search for hashtags and mentions
Got the problem. I need to use the "types=" parameter so that WordDelimiterFilterFactory does not treat characters like # and @ as delimiters.

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: 16 February 2012 06:22
To: solr-user@lucene.apache.org
Subject: Re: Search for hashtags and mentions

On Wed, Feb 15, 2012 at 2:04 PM, Rohit wrote:
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> preserveOriginal="1" handleAsChar="@#"/>

There is no such parameter as 'handleAsChar'. If you want to do this, you need to use a custom types file.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

--
lucidimagination.com
is it possible to run deltaimport command without delta query?
hi all..

I am new to Solr. Can anybody explain the delta-import and the deltaQuery to me? I also have the questions below:

1. Is it possible to run delta-import without a deltaQuery?
2. Is it possible to write a deltaQuery without having a last_modified column in the database? If yes, please explain.

please help me, anybody
thanx in advance.
Using Solr for a rather busy "Yellow Pages"-type index - good idea or not really?
Hi, all,

I'm new here. Used Solr on a couple of projects before, but didn't need to dive deep into anything until now. These days, I'm doing a spike for a "yellow pages" type search server with the following technical requirements:

~10 mln listings in the database. A listing has a name, address, description, coordinates and a number of tags / filtering fields; no more than a kilobyte all told; i.e. theoretically the whole thing should fit in RAM without sharding. A typical query is either "all text matches on name and/or description within a bounded box", or "some combination of tag matches within a bounded box". Bounded boxes are 1 to 50 km wide, and contain up to 10^5 unfiltered listings (the average is more like 10^3). More than 50% of all the listings are in the frequently requested bounding boxes; however, a vast majority of listings are almost never displayed (because they don't match the other filters).

Data "never changes" (i.e., a daily batch update; rebuild of the entire index and restart of all search servers is feasible, as long as it takes minutes, not hours). This thing ideally should serve up to 10^3 requests per second on a small (as in, "less than 10 commodity boxes") cluster. In other words, a typical request should be CPU bound and take ~100-200 msec to process. Because of coordinates (that are almost never the same), caching of queries makes no sense; from what little I understand about Lucene internals, caching of filters probably doesn't make sense either.

After perusing documentation and some googling (but almost no source code exploring yet), I understand how the schema and the queries will look, and now have to figure out a specific configuration that fits the performance/scalability requirements. Here is what I'm thinking:

1. Search server is an internal service that uses embedded Solr for the indexing part, with RAMDirectoryFactory as index storage.
2. All data is in some sort of persistent storage on a file system, and is loaded into memory when a search server starts up.
3. Data updates are handled as "update the persistent storage, start another cluster, load the world into RAM, flip the load balancer, kill the old cluster".
4. Solr returns IDs with relevance scores; actual presentations of listings (as JSON documents) are constructed outside of Solr and cached in Memcached, as mostly static content with a few templated bits, like <%=DISTANCE_TO(-123.0123, 45.6789) %>.
5. All Solr caching is switched off.

Obviously, we are not the first people to do something like this with Solr, so I'm hoping for some collective wisdom on the following: Does this sound like a feasible set of requirements in terms of performance and scalability for Solr? Are we on the right path to solving this problem well? If not, what should we be doing instead? What nasty technical/architectural gotchas are we probably missing at this stage?

One particular piece of advice I'd be really happy to hear is "you may not need RAMDirectoryFactory if you use instead". Also, is there a blog, wiki page or a mailing list thread where a similar problem is discussed? Yes, we have seen http://www.ibm.com/developerworks/opensource/library/j-spatial; it's a good introduction, but it is outdated and doesn't go into the nasty bits anyway.

Many thanks in advance,
-- Alex Verkhovsky
RE: is it possible to run deltaimport command without delta query?
Hi,

you may want to have a look at
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

hth,
Ramo

-----Original Message-----
From: nagarjuna [mailto:nagarjuna.avul...@gmail.com]
Sent: Thursday, 16 February 2012 07:27
To: solr-user@lucene.apache.org
Subject: is it possible to run deltaimport command without delta query?

hi all..
I am new to Solr. Can anybody explain the delta-import and the deltaQuery to me? I also have the questions below:

1. Is it possible to run delta-import without a deltaQuery?
2. Is it possible to write a deltaQuery without having a last_modified column in the database? If yes, please explain.

please help me, anybody
thanx in advance.
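The gist of that wiki page, as a sketch (table and column names are placeholders; note it still relies on some change marker, such as a last_modified column, which also answers question 2: you need *some* way to identify changed rows):

<entity name="item" pk="id"
        query="select * from item
               where '${dataimporter.request.clean}' != 'false'
                  or last_modified > '${dataimporter.last_index_time}'">
  ...
</entity>

Deltas are then run as a normal full-import that keeps the existing index:

/dataimport?command=full-import&clean=false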