Re: solr multicore vs sharding vs 1 big collection

2015-08-03 Thread Bill Bell
Yeah, a separate collection by month or year is good and can really help in this case.

Bill Bell
Sent from mobile


> On Aug 2, 2015, at 5:29 PM, Jay Potharaju  wrote:
> 
> Shawn,
> Thanks for the feedback. I agree that increasing timeout might alleviate
> the timeout issue. The main problem with increasing timeout is the
> detrimental effect it will have on the user experience, therefore can't
> increase it.
> I have looked at the queries that threw errors, next time I try it
> everything seems to work fine. Not sure how to reproduce the error.
> My concern with increasing the memory to 32GB is what happens when the
> index size grows over the next few months.
> One of the other solutions I have been thinking about is to rebuild
> index(weekly) and create a new collection and use it. Are there any good
> references for doing that?
> Thanks
> Jay
> 
>> On Sun, Aug 2, 2015 at 10:19 AM, Shawn Heisey  wrote:
>> 
>>> On 8/2/2015 8:29 AM, Jay Potharaju wrote:
>>> The document contains around 30 fields and has stored set to true for
>>> almost 15 of them. And these stored fields are queried and updated all
>> the
>>> time. You will notice that the deleted documents are almost 30% of the
>>> docs.  And it has stayed around that percent and has not come down.
>>> I did try optimize but that was disruptive as it caused search errors.
>>> I have been playing with merge factor to see if that helps with deleted
>>> documents or not. It is currently set to 5.
>>> 
>>> The server has 24 GB of memory out of which memory consumption is around
>> 23
>>> GB normally and the jvm is set to 6 GB. And have noticed that the
>> available
>>> memory on the server goes to 100 MB at times during a day.
>>> All the updates are run through DIH.
>> 
>> Using all available memory is completely normal operation for ANY
>> operating system.  If you hold up Windows as an example of one that
>> doesn't ... it lies to you about "available" memory.  All modern
>> operating systems will utilize memory that is not explicitly allocated
>> for the OS disk cache.
>> 
>> The disk cache will instantly give up any of the memory it is using for
>> programs that request it.  Linux doesn't try to hide the disk cache from
>> you, but older versions of Windows do.  In the newer versions of Windows
>> that have the Resource Monitor, you can go there to see the actual
>> memory usage including the cache.
>> 
>>> Every day at least once I see the following error, which results in search
>>> errors on the front end of the site.
>>> 
>>> ERROR org.apache.solr.servlet.SolrDispatchFilter -
>>> null:org.eclipse.jetty.io.EofException
>>> 
>>> From what I have read these are mainly due to timeout and my timeout is
>> set
>>> to 30 seconds and can't set it to a higher number. I was thinking maybe
>> due
>>> to high memory usage, sometimes it leads to bad performance/errors.
>> 
>> Although this error can be caused by timeouts, it has a specific
>> meaning.  It means that the client disconnected before Solr responded to
>> the request, so when Solr tried to respond (through jetty), it found a
>> closed TCP connection.
>> 
>> Client timeouts need to either be completely removed, or set to a value
>> much longer than any request will take.  Five minutes is a good starting
>> value.
>> 
>> If your client timeouts are all set to 30 seconds and you are seeing
>> EofExceptions, that means that your requests are taking longer than 30
>> seconds, and you likely have some performance issues.  It's also
>> possible that some of your client timeouts are set a lot shorter than 30
>> seconds.
>> 
>>> My objective is to stop the errors, adding more memory to the server is
>> not
>>> a good scaling strategy. That is why I was thinking maybe there is an
>> issue
>>> with the way things are set up and need to be revisited.
>> 
>> You're right that adding more memory to the servers is not a good
>> scaling strategy for the general case ... but in this situation, I think
>> it might be prudent.  For your index and heap sizes, I would want the
>> company to pay for at least 32GB of RAM.
>> 
>> Having said that ... I've seen Solr installs work well with a LOT less
>> memory than the ideal.  I don't know that adding more memory is
>> necessary, unless your system (CPU, storage, and memory speeds) is
>> particularly slow.  Based on your document count and index size, your
>> documents are quite small
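
A minimal sketch of the rebuild-and-swap approach Jay asks about, using the Collections API alias feature. The collection and config names below are illustrative only:

1. Create a fresh collection for this rebuild:
   http://localhost:8983/solr/admin/collections?action=CREATE&name=listings_20150809&numShards=2&replicationFactor=2&collection.configName=listings_conf
2. Reindex the full data set into listings_20150809.
3. Atomically repoint the alias the application queries:
   http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=listings&collections=listings_20150809
4. Delete last week's collection once the alias has moved:
   http://localhost:8983/solr/admin/collections?action=DELETE&name=listings_20150802

Because the application only ever talks to the alias, the swap needs no client changes and the old index can be dropped without downtime.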

Re: Solr performance is slow with just 1GB of data indexed

2015-08-23 Thread Bill Bell
We use 8GB to 10GB heaps for indexes of that size all the time.


Bill Bell
Sent from mobile


> On Aug 23, 2015, at 8:52 AM, Shawn Heisey  wrote:
> 
>> On 8/22/2015 10:28 PM, Zheng Lin Edwin Yeo wrote:
>> Hi Shawn,
>> 
>> Yes, I've increased the heap size to 4GB already, and I'm using a machine
>> with 32GB RAM.
>> 
>> Is it recommended to further increase the heap size to like 8GB or 16GB?
> 
> Probably not, but I know nothing about your data.  How many Solr docs
> were created by indexing 1GB of data?  How much disk space is used by
> your Solr index(es)?
> 
> I know very little about clustering, but it looks like you've gotten a
> reply from Toke, who knows a lot more about that part of the code than I do.
> 
> Thanks,
> Shawn
> 


TimeAllowed bug

2015-08-24 Thread Bill Bell
Weird fq caching bug when using timeAllowed:

1. Find a pwid (in this case YLGVQ).
2. Run a query with an fq on the pwid and timeAllowed=1:
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ&timeAllowed=1
3. Ensure #2 returns 0 results.
4. Rerun the query without the timeAllowed param:
   http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ
5. Note that after removing the timeAllowed parameter the query is still returning 0 results.

Solr seems to be caching the fq when the timeAllowed parameter is present.


Bill Bell
Sent from mobile
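
One way to confirm this (and to work around it until it is fixed) is to keep the suspect fq out of the filterCache entirely with the cache=false local param; a sketch against the same test query:

http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq={!cache=false}pwid:YLGVQ&timeAllowed=1

If the partial, timed-out result is never cached, a later run without timeAllowed should no longer be poisoned by it.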



Re: How to boost documents at index time?

2015-03-28 Thread Bill Bell
Issue a Jira ticket?

Did you try debugQuery ?

Bill Bell
Sent from mobile


> On Mar 28, 2015, at 1:49 AM, CKReddy Bhimavarapu  wrote:
> 
> I want to boost docs at index time. I am doing this using the boost
> parameter on the doc field,
> but I can't see a direct impact on the doc using debugQuery.
> 
> My question is: is there any other way to boost a doc at index time and
> see the reflected changes, i.e. a direct impact?
> 
> -- 
> ckreddybh. 
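
For reference, index-time boosts in the XML update format look like the sketch below (field names are placeholders). Note that an index-time boost is folded into the field norm rather than stored as a separate factor, so debugQuery only shows it indirectly through fieldNorm, and the target field must not have omitNorms=true or the boost is silently dropped:

<add>
  <doc boost="2.0">
    <field name="id">doc-1</field>
    <field name="title" boost="3.0">boosted title text</field>
  </doc>
</add>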


Re: ZFS File System for SOLR 3.6 and SOLR 4

2015-03-28 Thread Bill Bell
Is there an advantage to XFS over ext4 for Solr? Has anyone done testing?

Bill Bell
Sent from mobile


> On Mar 27, 2015, at 8:14 AM, Shawn Heisey  wrote:
> 
>> On 3/27/2015 12:30 AM, abhi Abhishek wrote:
>> i am trying to use ZFS as filesystem for my Linux Environment. are
>> there any performance implications of using any filesystem other than
>> ext-3/ext-4 with SOLR?
> 
> That should work with no problem.
> 
> The only time Solr tends to have problems is if you try to use a network
> filesystem.  As long as it's a local filesystem and it implements
> everything a program can typically expect from a local filesystem, Solr
> should work perfectly.
> 
> Because of the compatibility problems that the license for ZFS has with
> the GPL, ZFS on Linux is probably not as well tested as other
> filesystems like ext4, xfs, or btrfs, but I have not heard about any big
> problems, so it's probably safe.
> 
> Thanks,
> Shawn
> 


Re: Facet

2015-04-05 Thread Bill Bell
Ok

Clarification

The limit is set to -1. But the average result is 300. 

The number of strings stored in the field increased a lot, from about 250k to 350k. 
But the number coming back is limited by facet.prefix. 

Would creating 900 fields be better ? Then I could just put the prefix in the 
field name. Like this: proc_ps122

Thoughts ?

So far I have heard SolrCloud and docValues as viable solutions. Stay away from enum.

Bill Bell
Sent from mobile


> On Apr 5, 2015, at 2:56 AM, Toke Eskildsen  wrote:
> 
> William Bell  wrote:
> Sent: 05 April 2015 06:20
> To: solr-user@lucene.apache.org
> Subject: Facet
> 
>> We increased our number of terms (String) in a facet by 50,000.
> 
> Do you mean facet.limit=50000?
> 
>> Now we are getting an error when we facet by this field - so we switched it 
>> to
>> facet.method=enum, and now the results come back. However, when we put
>> it into production we literally hit a wall (CPU went to 100% for 16 cores)
>> after about 30 minutes live.
> 
> It was strange that enum worked. Internally, the difference between 
> facet.limit=100 and facet.limit=50000 is quite small. The real hits are for 
> fine-counting within SolrCloud and serializing the result in order to deliver 
> it to the client. I thought enum behaved the same as fc with regard to those 
> two.
> 
>> We tried adding more machines to reduce the CPU, but it did not help.
> 
> Sounds like SolrCloud. More machines does not help here, it might even be 
> worse. What happens is that distributed faceting is two-phase, where the 
> second phase is fine-counting. The fine-counting essentially makes all shards 
> perform micro-searches for a large part of the terms returned: Your shards 
> are bogged down by tens of thousands of small searches.
> 
> If you are feeling adventurous, you can try putting
> http://tokee.github.io/lucene-solr/
> on a test-installation (I am the author). It changes the way the 
> fine-counting is done.
> 
> 
> Depending on your container, you might need to raise the internal limits for 
> GET-communication. Tomcat has a default of 2MB somewhere (sorry, don't 
> remember the details), which is not a lot for 50,000 values.
> 
>> What are some ideas? We are going to try docValues on the field. Does
>> anyone know if method=fc or method=enum works for docValue? I cannot find
>> any documentation on that.
> 
> If DocValues are enabled, fc will use them. It does not change anything for 
> enum. But I would argue against enum for anything in the thousands anyway.
> 
>> We are thinking of splitting the field into 2 fields (fielda, fieldb). At
>> least the number will be less, but not sure if it will help memory?
> 
> The killer is the number of terms requested/returned.
> 
>> The weird thing is for the first 30 minutes things are performing great.
>> Literally at like 10% CPU across 16 cores, not much memory and normal GC.
> 
> It might be because you have just been lucky. Take a look at
> https://twitter.com/anjacks0n/status/509284768035262464
> for how different performance can be for different result set sizes.
> 
>> Originally the facet was a method=fc. Is there an issue with enum? We have
>> facet.threads=20 set, and not sure this is wise for a enum ?
> 
> Facet threading does not thread within each field, it just means that 
> multiple fields are processed in parallel.
> 
> - Toke Eskildsen
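
A sketch of the docValues route mentioned above, assuming a string field that holds the specialty terms (the field name and prefix value are illustrative):

<field name="proc_specialties" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>

...&facet=true&facet.field=proc_specialties&facet.prefix=ps12&facet.limit=-1&facet.method=fc

With docValues enabled, facet.method=fc reads the per-document values from disk-backed structures instead of un-inverting the field onto the heap, which is usually the safer option when the field holds hundreds of thousands of distinct terms.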


Re: Division with Stats Component when Grouping in Solr

2015-06-13 Thread Bill Bell
It would be cool to be able to set 2 group by with facets 

>> GROUP BY
>>site_id, keyword


Bill Bell
Sent from mobile


On Jun 13, 2015, at 2:28 PM, Yonik Seeley  wrote:

>> GROUP BY
>>site_id, keyword
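
A two-field GROUP BY like the one quoted can be approximated with a pivot facet, which returns nested counts per site_id/keyword pair (counts only, not arbitrary stats); a sketch:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.pivot=site_id,keyword&facet.limit=-1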


Re: boost results within 250km

2014-04-09 Thread Bill Bell
Just take geodist and use the map function and send to bf or boost 

Bill Bell
Sent from mobile


> On Apr 9, 2014, at 8:26 AM, Erick Erickson  wrote:
> 
> Why do you want to do this? This sounds like an XY problem, you're
> asking how to do something specific without explaining why you care,
> perhaps there are other ways to do this.
> 
> Best,
> Erick
> 
>> On Tue, Apr 8, 2014 at 11:30 PM, Aman Tandon  wrote:
>> How can i gave the more boost to the results within 250km than others
>> without using result filtering.
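
A sketch of the geodist()/map() idea Bill describes, assuming an edismax query and a location field named store; map(x,min,max,target,default) returns target when the distance falls within 0-250 km and default otherwise:

q=pizza&defType=edismax&sfield=store&pt=45.15,-93.85&boost=map(geodist(),0,250,2,1)

The boost param multiplies the score, so matches inside 250 km score twice as high; bf=map(geodist(),0,250,10,0) would do the same thing additively.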


Re: stucked with log4j configuration

2014-04-12 Thread Bill Bell
Well I hope log4j2 is something Solr supports when GA

Bill Bell
Sent from mobile


> On Apr 12, 2014, at 7:26 AM, Aman Tandon  wrote:
> 
> I have upgraded my solr4.2 to solr 4.7.1 but in my logs there is an error
> for log4j
> 
> log4j: Could not find resource
> 
> Please find the attachment of the screenshot of the error console
> https://drive.google.com/file/d/0B5GzwVkR3aDzdjE1b2tXazdxcGs/edit?usp=sharing
> -- 
> With Regards
> Aman Tandon


Latest jetty

2014-07-26 Thread Bill Bell
Since we are now on latest Java JDK can we move to Jetty 9?

Thoughts ?

Bill Bell
Sent from mobile



Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Seems way overkill. Are you using /get at all ? If you need the docs avail 
right away - why ? How about after 30 seconds ? How many docs do you get added 
per second during peak ? Even Google has a delay when you do Adwords. 

One idea is yo have an empty core that you insert into and then shard into the 
queries. So one fire would be called newdocs and then you would add this core 
into your query. There are a couple issues with this with scoring but it works 
nicely. I would not even use Solrcloud for that core.

Try to reduce number of Java running. Reduce memory and use one java per 
machine. 

Then if you need faster avail if docs you really need to ask why. Why not 
later? If it got search or just showing the user the info ? If for showing 
maybe query a not indexes table for the few not yet indexed ?? Or just store in 
a db to show the user the info and index later?

Bill Bell
Sent from mobile


> On Aug 1, 2014, at 4:19 AM, "anand.mahajan"  wrote:
> 
> Hello all,
> 
> Struggling to get this going with SolrCloud - 
> 
> Requirement in brief :
> - Ingest about 4M Used Cars listings a day and track all unique cars for
> changes
> - 4M automated searches a day (during the ingestion phase to check if a doc
> exists in the index (based on values of 4-5 key fields) or it is a new one
> or an updated version)
> - Of the 4 M - About 3M Updates to existing docs (for every non-key value
> change)
> - About 1M inserts a day (I'm assuming these many new listings come in
> every day)
> - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
> snapshots of the data to various clients
> 
> My current deployment : 
> i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
> - 24 Core + 96 GB RAM each.
> ii)There are over 190M docs in the SolrCloud at the moment (for all
> replicas its consuming overall disk 2340GB which implies - each doc is at
> about 5-8kb in size.)
> iii) The docs are split into 36 Shards - and 3 replica per shard (in all
> 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
> running on each host)
> iv) There are 60 fields per doc and all fields are stored at the moment  :( 
> (The backend is only Solr at the moment)
> v) The current shard/routing key is a combination of Car Year, Make and
> some other car level attributes that help classify the cars
> vi) We are mostly using the default Solr config as of now - no heavy caching
> as the search is pretty random in nature 
> vii) Autocommit is on - with maxDocs = 1
> 
> Current throughput & Issues :
> With the above mentioned deployment the daily throughput is only at about
> 1.5M on average (Inserts + Updates) - falling way short of what is required.
> Search is slow - Some queries take about 15 seconds to return - and since
> insert is dependent on at least one Search that degrades the write
> throughput too. (This is not a Solr issue - but the app demands it so)
> 
> Questions :
> 
> 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
> down indexing? Its a requirement that all docs are available as soon as
> indexed.
> 
> 2. Should I have been better served had I deployed a Single Jetty Solr
> instance per server with multiple cores running inside? The servers do start
> to swap out after a couple of days of Solr uptime - right now we reboot the
> entire cluster every 4 days.
> 
> 3. The routing key is not able to effectively balance the docs on available
> shards - There are a few shards with just about 2M docs - and others over
> 11M docs. Shall I split the larger shards? But I do not have more nodes /
> hardware to allocate to this deployment. In such case would splitting up the
> large shards give better read-write throughput? 
> 
> 4. To remain with the current hardware - would it help if I remove 1 replica
> each from a shard? But that would mean even when just 1 node goes down for a
> shard there would be only 1 live node left that would not serve the write
> requests.
> 
> 5. Also, is there a way to control where the Split Shard replicas would go?
> Is there a pattern / rule that Solr follows when it creates replicas for
> split shards?
> 
> 6. I read somewhere that creating a Core would cost the OS one thread and a
> file handle. Since a core represents an index in its entirety, would it not be
> allocated the configured number of write threads? (The default being 8)
> 
> 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
> - Would separating the ZK cluster out help?
> 
> Sorry for the long thread _ I thought of asking these all at once rather
> than posting separate ones.
> 
> Thanks,
> Anand
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-Scale-Struggle-tp4150592.html
> Sent from the Solr - User mailing list archive at Nabble.com.
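
On the autocommit question (point 1): committing on every document is very expensive. A more usual split between durability and visibility is sketched below (the numbers are illustrative), with /get (real-time get, which needs the updateLog) used for the does-this-doc-already-exist lookups so they do not depend on a freshly opened searcher, so long as the 4-5 key fields are composed into the uniqueKey:

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit every 60s, flush to disk -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>            <!-- new searcher (visibility) every 15s -->
</autoSoftCommit>

http://localhost:8983/solr/collection1/get?id=SOME_KEY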


Re: SolrCloud Scale Struggle

2014-08-02 Thread Bill Bell
Auto correct not good

Corrected below 

Bill Bell
Sent from mobile


> On Aug 2, 2014, at 11:11 AM, Bill Bell  wrote:
> 
> Seems way overkill. Are you using /get at all ? If you need the docs avail 
> right away - why ? How about after 30 seconds ? How many docs do you get 
> added per second during peak ? Even Google has a delay when you do Adwords. 
> 
> One idea is to have an empty core that you insert into and then shard into 
> the queries. So one core would be called newdocs and then you would add this 
> core into your query. There are a couple issues with this with scoring but it 
> works nicely. I would not even use Solrcloud for that core.
> 
> Try to reduce number of Java instances running. Reduce memory and use one 
> java per machine. 
> 
> Then if you need faster avail of docs you really need to ask why. Why not 
> later? Do you need search or just showing the user the info ? If for showing 
> maybe query a indexed table for the few not yet indexed ?? Or just store in a 
> db to show the user the info and index later?
> 
> Bill Bell
> Sent from mobile
> 
> 
>> On Aug 1, 2014, at 4:19 AM, "anand.mahajan"  wrote:
>> 
>> Hello all,
>> 
>> Struggling to get this going with SolrCloud - 
>> 
>> Requirement in brief :
>> - Ingest about 4M Used Cars listings a day and track all unique cars for
>> changes
>> - 4M automated searches a day (during the ingestion phase to check if a doc
>> exists in the index (based on values of 4-5 key fields) or it is a new one
>> or an updated version)
>> - Of the 4 M - About 3M Updates to existing docs (for every non-key value
>> change)
>> - About 1M inserts a day (I'm assuming these many new listings come in
>> every day)
>> - Daily Bulk CSV exports of inserts / updates in last 24 hours of various
>> snapshots of the data to various clients
>> 
>> My current deployment : 
>> i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
>> - 24 Core + 96 GB RAM each.
>> ii)There are over 190M docs in the SolrCloud at the moment (for all
>> replicas its consuming overall disk 2340GB which implies - each doc is at
>> about 5-8kb in size.)
>> iii) The docs are split into 36 Shards - and 3 replica per shard (in all
>> 108 Solr Jetty processes split over 6 Servers leaving about 18 Jetty JVMs
>> running on each host)
>> iv) There are 60 fields per doc and all fields are stored at the moment  :( 
>> (The backend is only Solr at the moment)
>> v) The current shard/routing key is a combination of Car Year, Make and
>> some other car level attributes that help classify the cars
>> vi) We are mostly using the default Solr config as of now - no heavy caching
>> as the search is pretty random in nature 
>> vii) Autocommit is on - with maxDocs = 1
>> 
>> Current throughput & Issues :
>> With the above mentioned deployment the daily throughput is only at about
>> 1.5M on average (Inserts + Updates) - falling way short of what is required.
>> Search is slow - Some queries take about 15 seconds to return - and since
>> insert is dependent on at least one Search that degrades the write
>> throughput too. (This is not a Solr issue - but the app demands it so)
>> 
>> Questions :
>> 
>> 1. Autocommit with maxDocs = 1 - is that a goof up and could that be slowing
>> down indexing? Its a requirement that all docs are available as soon as
>> indexed.
>> 
>> 2. Should I have been better served had I deployed a Single Jetty Solr
>> instance per server with multiple cores running inside? The servers do start
>> to swap out after a couple of days of Solr uptime - right now we reboot the
>> entire cluster every 4 days.
>> 
>> 3. The routing key is not able to effectively balance the docs on available
>> shards - There are a few shards with just about 2M docs - and others over
>> 11M docs. Shall I split the larger shards? But I do not have more nodes /
>> hardware to allocate to this deployment. In such case would splitting up the
>> large shards give better read-write throughput? 
>> 
>> 4. To remain with the current hardware - would it help if I remove 1 replica
>> each from a shard? But that would mean even when just 1 node goes down for a
>> shard there would be only 1 live node left that would not serve the write
>> requests.
>> 
>> 5. Also, is there a way to control where the Split Shard replicas would go?
>> Is there a pattern / rule that Solr follows when it creates replicas for
>> split shards?
>> 
>> 6. I read somewhere that creating a Core would cost the OS on

DIH

2013-10-15 Thread Bill Bell
We have a custom Field processor in DIH and we are not CPU bound on one core... 
How do we thread it ?? We need to use more cores

The box has 32 cores and 1 is 100% CPU bound.

Ideas ?

Bill Bell
Sent from mobile



Re: DIH

2013-10-15 Thread Bill Bell
We are NOW CPU bound. Thoughts???

Bill Bell
Sent from mobile


> On Oct 15, 2013, at 8:49 PM, Bill Bell  wrote:
> 
> We have a custom Field processor in DIH and we are not CPU bound on one 
> core... How do we thread it ?? We need to use more cores
> 
> The box has 32 cores and 1 is 100% CPU bound.
> 
> Ideas ?
> 
> Bill Bell
> Sent from mobile
> 
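
There is no supported way to multi-thread a single DIH run in the 4.x line (the old threads attribute was removed), so one common workaround, sketched here as an assumption rather than something from this thread, is to partition the source data across several DIH handlers and run them in parallel, one import per core/CPU:

<!-- solrconfig.xml: one handler per partition, each with its own data-config -->
<requestHandler name="/dataimport0" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults"><str name="config">data-config-0.xml</str></lst>
</requestHandler>
<requestHandler name="/dataimport1" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults"><str name="config">data-config-1.xml</str></lst>
</requestHandler>

<!-- data-config-0.xml: partition the source query, e.g. WHERE MOD(id, 2) = 0 -->

http://localhost:8983/solr/core1/dataimport0?command=full-import&clean=false
http://localhost:8983/solr/core1/dataimport1?command=full-import&clean=false

Each handler runs in its own thread, so a custom field transformer gets spread over as many cores as there are partitions.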


Re: Skipping caches on a /select

2013-10-17 Thread Bill Bell
But global on a qt would be awesome !!!

Bill Bell
Sent from mobile


> On Oct 17, 2013, at 2:43 PM, Yonik Seeley  wrote:
> 
> There isn't a global  "cache=false"... it's a local param that can be
> applied to any "fq" or "q" parameter independently.
> 
> -Yonik
> 
> 
>> On Thu, Oct 17, 2013 at 4:39 PM, Tim Vaillancourt  
>> wrote:
>> Thanks Yonik,
>> 
>> Does "cache=false" apply to all caches? The docs make it sound like it is
>> for filterCache only, but I could be misunderstanding.
>> 
>> When I force a commit and perform a /select a query many times with
>> "cache=false", I notice my query gets cached still, my guess is in the
>> queryResultCache. At first the query takes 500ms+, then all subsequent
>> requests take 0-1ms. I'll confirm this queryResultCache assumption today.
>> 
>> Cheers,
>> 
>> Tim
>> 
>> 
>>> On 16/10/13 06:33 PM, Yonik Seeley wrote:
>>> 
>>> On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt
>>> wrote:
>>>> 
>>>> I am debugging some /select queries on my Solr tier and would like to see
>>>> if there is a way to tell Solr to skip the caches on a given /select
>>>> query
>>>> if it happens to ALREADY be in the cache. Live queries are being inserted
>>>> and read from the caches, but I want my debug queries to bypass the cache
>>>> entirely.
>>>> 
>>>> I do know about the "cache=false" param (that causes the results of a
>>>> select to not be INSERTED in to the cache), but what I am looking for
>>>> instead is a way to tell Solr to not read the cache at all, even if there
>>>> actually is a cached result for my query.
>>> 
>>> Yeah, cache=false for "q" or "fq" should already not use the cache at
>>> all (read or write).
>>> 
>>> -Yonik
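
For reference, the local param form Yonik describes looks like this (the URL is a sketch); it applies per clause, so each q or fq you want to bypass needs its own cache=false:

http://localhost:8983/solr/collection1/select?q={!cache=false}text:foo&fq={!cache=false}category:bar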


Re: Spatial Distance Range

2013-10-22 Thread Bill Bell
Yes frange works 

Bill Bell
Sent from mobile


> On Oct 22, 2013, at 8:17 AM, Eric Grobler  wrote:
> 
> Hi Everyone,
> 
> Normally one would search for documents where the location is within a
> specified distance, for example widthin 5 km:
> fq={!geofilt pt=45.15,-93.85 sfield=store
> d=5}<http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&fq=%7B!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5%7D>
> 
> It there a way to specify a range between 10 and 20 km?
> Something like:
> fq={!geofilt pt=45.15,-93.85 sfield=store distancefrom=10
> distanceupto=20}<http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&fq=%7B!geofilt%20pt=45.15,-93.85%20sfield=store%20d=5%7D>
> 
> Thanks
> Ericz
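
The frange form Bill refers to, sketched with Eric's example point and store field; it keeps documents whose geodist() falls between 10 and 20 km:

fq={!frange l=10 u=20}geodist()&sfield=store&pt=45.15,-93.85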


Re: Solr - what's the next big thing?

2013-10-26 Thread Bill Bell
Full JSON support, deep complex object indexing and search. Game changer.

Bill Bell
Sent from mobile


> On Oct 26, 2013, at 1:04 PM, Otis Gospodnetic  
> wrote:
> 
> Hi,
> 
>> On Sat, Oct 26, 2013 at 5:58 AM, Saar Carmi  wrote:
>> LOL,  Jack.  I can imagine Otis saying that.
> 
> Funny indeed, but not really.
> 
>> Otis,  with these marriage,  are we going to see map reduce based queries?
> 
> Can you please describe what you mean by that?  Maybe with an example.
> 
> Thanks,
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> 
>>> On Oct 25, 2013 10:03 PM, "Jack Krupansky"  wrote:
>>> 
>>> But a lot of that big yellow elephant stuff is in 4.x anyway.
>>> 
>>> (Otis: I was afraid that you were going to say that the next big thing in
>>> Solr is... Elasticsearch!)
>>> 
>>> -- Jack Krupansky
>>> 
>>> -Original Message- From: Otis Gospodnetic
>>> Sent: Friday, October 25, 2013 2:43 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr - what's the next big thing?
>>> 
>>> Saar,
>>> 
>>> The marriage with the big yellow elephant is a big deal. It changes the
>>> scale.
>>> 
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Oct 25, 2013 5:32 AM, "Saar Carmi"  wrote:
>>> 
>>> If I am not mistaken the most impressive improvement of Solr 4.0 compared
>>>> to previous versions was the Solr Cloud architecture.
>>>> 
>>>> What would be the next big thing in Solr 5.0 ?
>>>> 
>>>> Saar
>>> 


Re: Proposal for new feature, cold replicas, brainstorming

2013-10-27 Thread Bill Bell
Yeah, replicating to a DR site would be good too. 

Bill Bell
Sent from mobile


> On Oct 24, 2013, at 6:27 AM, yriveiro  wrote:
> 
> I've been wondering for some time whether it's possible to have replicas of a shard
> synchronized but in a state where they can't accept queries, only updates.
> 
> A replica in this "replication" mode would only wake up to accept queries if it's
> the last replica alive, and would go back to replication mode when another replica
> becomes alive and synchronized.
> 
> The motivation for this is simple: I want replication, but I don't want
> n active replicas with full resources allocated (cache and so on).
> This is useful in environments where replication is needed but a high query
> throughput is not fundamental and resources are limited.
> 
> I know that right now this is not possible, but I think it's a feature that
> could be implemented fairly easily by creating a new status for shards.
> 
> The bottom line question is: am I the only one with this kind of
> requirement? Does a functionality like this make sense?
> 
> 
> 
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Proposal-for-new-feature-cold-replicas-brainstorming-tp4097501.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance of "rows" and "start" parameters

2013-11-04 Thread Bill Bell
Do you want to look through them all? Have you considered the Lucene API? Not sure if 
that is better, but it might be.

Bill Bell
Sent from mobile


> On Nov 4, 2013, at 6:43 AM, "michael.boom"  wrote:
> 
> I saw that some time ago there was a JIRA ticket dicussing this, but still i
> found no relevant information on how to deal with it.
> 
> When working with a big number of docs (e.g. 70M in my case), I'm using
> start=0&rows=30 in my requests.
> For the first request the query time is OK, the next one is visibly slower, the
> third even slower, and so on until I get some huge query times of up to
> 140 secs after a few hundred requests. My tests were done with SolrMeter at
> a rate of 1000 qpm. The same thing happens at 100 qpm, though.
> 
> Is there a best practice on how to do in this situation, or maybe an
> explanation why is the query time increasing, from request to request ?
> 
> Thanks!
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-of-rows-and-start-parameters-tp4099194.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Core admin: create new core

2013-11-04 Thread Bill Bell
You could pre-create a bunch of directories and base configs, and create cores as needed. 
Then use the schemaless API to set them up... or make changes in a script and 
reload the core.

Bill Bell
Sent from mobile


> On Nov 4, 2013, at 6:06 AM, Erick Erickson  wrote:
> 
> Right, this has been an issue for a while, there's no current
> way to do this.
> 
> Someday, I'll be able to work on SOLR-4779 which should
> go some toward making this work more easily. It's still not
> exactly what you're looking for, but it might work.
> 
> Of course with SolrCloud you can specify a configuration
> set that is used for multiple collections.
> 
> People are using Puppet or similar to automate this over
> large numbers of nodes, but that's not entirely satisfactory
> either in our case I suspect.
> 
> FWIW,
> Erick
> 
> 
>> On Mon, Nov 4, 2013 at 4:00 AM, Bram Van Dam  wrote:
>> 
>> The core admin CREATE function requires that the new instance dir and
>> schema/config exist already. Is there a particular reason for this? It
>> would be incredible convenient if I could create a core with a new schema
>> and new config simply by calling CREATE (maybe providing the contents of
>> config.xml and schema.xml as base64 encoded strings in HTTP POST or
>> something?).
>> 
>> I'm guessing this isn't currently possible?
>> 
>> Ta,
>> 
>> - bram
>> 
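
A sketch of the pre-create-then-CREATE flow Bill describes, assuming the instance directory and its conf/ files were laid down beforehand:

http://localhost:8983/solr/admin/cores?action=CREATE&name=newcore&instanceDir=/var/solr/precreated/newcore&config=solrconfig.xml&schema=schema.xml&dataDir=data

After a script edits the config in place, a RELOAD picks it up:

http://localhost:8983/solr/admin/cores?action=RELOAD&core=newcore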


Re: Jetty 9?

2013-11-07 Thread Bill Bell
So no Jetty 9 until Solr 5? Java 7 is at release 40 now... Is that our commitment to 
not require Java 7 until Solr 5?

Most people are probably already on Java 7...

Bill Bell
Sent from mobile


> On Nov 7, 2013, at 1:29 AM, Furkan KAMACI  wrote:
> 
> Here is an issue points to that:
> https://issues.apache.org/jira/browse/SOLR-4839
> 
> 
> 2013/11/7 William Bell 
> 
>> When are we moving Solr to Jetty 9?
>> 
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>> 


Re: How to work with remote solr savely?

2013-11-22 Thread Bill Bell
Do you have a sample jetty XML to setup basic auth for updates in Solr?

Sent from my iPad

> On Nov 22, 2013, at 7:34 AM, "michael.boom"  wrote:
> 
> Use HTTP basic authentication, setup in your servlet container
> (jetty/tomcat).
> 
> That should work fine if you are *not* using SolrCloud.
> 
> 
> 
> -
> Thanks,
> Michael
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102613.html
> Sent from the Solr - User mailing list archive at Nabble.com.
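
A sketch of Jetty basic auth protecting the update handler, assuming the Jetty that ships with the Solr 4.x example; the realm, role and user names are illustrative, and because servlet url-patterns cannot wildcard the middle of a path the constraint has to be listed per core (or applied to /* to protect everything):

etc/jetty.xml:
<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">solr-realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
    </New>
  </Arg>
</Call>

etc/webdefault.xml:
<security-constraint>
  <web-resource-collection>
    <web-resource-name>updates</web-resource-name>
    <url-pattern>/collection1/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint><role-name>updater</role-name></auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr-realm</realm-name>
</login-config>

etc/realm.properties:
updateuser: updatepassword, updater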


Re: useColdSearcher in SolrCloud config

2013-11-22 Thread Bill Bell
Wouldn't true mean "use the cold searcher"? It seems backwards to me...

Sent from my iPad

> On Nov 22, 2013, at 2:44 AM, ade-b  wrote:
> 
> Hi
> 
> The definition of useColdSearcher config element in solrconfig.xml is
> 
> "If a search request comes in and there is no current registered searcher,
> then immediately register the still warming searcher and use it.  If "false"
> then all requests will block until the first searcher is done warming".
> 
> By the term 'block', I assume SOLR returns a non 200 response to requests.
> Does anybody know the exact response code returned when the server is
> blocking requests?
> 
> If a new SOLR server is introduced into an existing array of SOLR servers
> (in SOLR Cloud setup), it will sync it's index from the leader. To save you
> having to specify warm-up queries in the solrconfig.xml file for first
> searchers, would/could the new server not auto-warm its caches from the
> caches of an existing server?
> 
> Thanks
> Ade 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/useColdSearcher-in-SolrCloud-config-tp4102569.html
> Sent from the Solr - User mailing list archive at Nabble.com.
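
For reference, the element sits in the <query> section of solrconfig.xml; true means serve requests from the still-warming (cold) searcher immediately, false means block until the first searcher has finished warming:

<query>
  <useColdSearcher>false</useColdSearcher>
</query>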


Re: NullPointerException

2013-11-22 Thread Bill Bell
It seems to be a modified row and referenced in EvaluatorBag.

I am not familiar with either.

Sent from my iPad

> On Nov 22, 2013, at 3:05 AM, Adrien RUFFIE  wrote:
> 
> Hello all,
> 
> I have performed a full indexation with Solr, but when I try to perform an 
> incremental indexation I get the following exception (see attachment).
> 
> Anyone have an idea of the problem?
> 
> Great thanks
> 


Re: Reverse mm(min-should-match)

2013-11-22 Thread Bill Bell
This is an awesome idea!

Sent from my iPad

> On Nov 22, 2013, at 12:54 PM, Doug Turnbull 
>  wrote:
> 
> Instead of specifying a percentage or number of query terms must match
> tokens in a field, I'd like to do the opposite -- specify how much of a
> field must match a query.
> 
> The problem I'm trying to solve is to boost document titles that closely
> match the query string. If a title looks something like
> 
> *Title: *[solr] [the] [worlds] [greatest] [search] [engine]
> 
> I want to be able to specify how much of the field must match the query
> string. This differs from normal mm. Normal mm specifies a how much of the
> query must match a field.
> 
> As an example, with this title, if I use normal mm=100% and perform the
> following query:
> 
> mm=100%
> q=solr
> 
> This will match the title above, as 100% of [solr] matches the field
> 
> What I really want to get at is a reverse mm:
> 
> Rmm=100%
> q=solr
> 
> The title above will not match in this case. Only 1/6 of the tokens in the
> field match the query.
> 
> However an exact search would match:
> 
> Rmm=100%
> q=solr the worlds greatest search engine
> 
> Here 100% of the query matches the title, so I'm good.
> 
> Is there any way to achieve this in Solr?
> 
> -- 
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections 


Re: Call to Solr via TCP

2013-12-10 Thread Bill Bell
Yeah, open a socket to the port and send correct GET syntax and Solr will respond with 
results...



Bill Bell
Sent from mobile


> On Dec 10, 2013, at 2:50 PM, Doug Turnbull 
>  wrote:
> 
> Zwer, is there a reason you need to do this? It's probably very hard to
> get Solr to speak raw TCP. But if you're having a performance or
> infrastructure problem, the group might be able to help you with a far
> simpler solution.
> 
> Sent from my Windows Phone From: Zwer
> Sent: 12/10/2013 12:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Call to Solr via TCP
> Maybe I asked incorrectly.
> 
> 
> Solr is Web Application, hosted by some servlet container and is reachable
> via HTTP.
> 
> HTTP runs on top of TCP, and I would like to know whether there exists some
> lower-level way to communicate with the application (i.e. Solr) hosted by Jetty?
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Call-to-Solr-via-TCP-tp4105932p4105935.html
> Sent from the Solr - User mailing list archive at Nabble.com.
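
Bill's point above in concrete form: Solr only speaks HTTP on that port, but nothing stops a client from opening the TCP socket itself and writing the request by hand; a sketch with netcat:

printf 'GET /solr/collection1/select?q=*:*&wt=json&rows=1 HTTP/1.1\r\nHost: localhost:8983\r\nConnection: close\r\n\r\n' | nc localhost 8983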


Status of 4.6.1?

2014-01-18 Thread Bill Bell
We just need the bug fix for Solr.xml 

https://issues.apache.org/jira/browse/SOLR-5543

Bill Bell
Sent from mobile



Re: Luke 4.6.1 released

2014-02-16 Thread Bill Bell
Yes it works with Solr 

Bill Bell
Sent from mobile


> On Feb 16, 2014, at 3:38 PM, Alexandre Rafalovitch  wrote:
> 
> Does it work with Solr? I couldn't tell what the description was from
> this repo and it's Solr relevance.
> 
> I am sure all the long timers know, but for more recent Solr people,
> the additional information would be useful.
> 
> Regards,
>   Alex.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> 
> 
>> On Mon, Feb 17, 2014 at 3:02 AM, Dmitry Kan  wrote:
>> Hello!
>> 
>> Luke 4.6.1 has been just released. Grab it here:
>> 
>> https://github.com/DmitryKey/luke/releases/tag/4.6.1
>> 
>> fixes:
>> loading the jar from command line is now working fine.
>> 
>> --
>> Dmitry Kan
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: twitter.com/dmitrykan


Re: embedded documents

2014-08-24 Thread Bill Bell
See my Jira. It supports it via json.fsuffix=_json&wt=json

http://mail-archives.apache.org/mod_mbox/lucene-dev/201304.mbox/%3CJIRA.12641293.1365394604231.125944.1365397875874@arcas%3E

Bill Bell
Sent from mobile


> On Aug 24, 2014, at 6:43 AM, "Jack Krupansky"  wrote:
> 
> Indexing and query of raw JSON would be a valuable addition to Solr, so maybe 
> you could simply explain more precisely your data model and transformation 
> rules. For example, when multi-level nesting occurs, what does your loader do?
> 
> Maybe if the fielld names were derived by concatenating the full path of JSON 
> key names, like titles_json.FR, field_naming nesting could be handled in a 
> fully automated manner.
> 
> I had been thinking of filing a Jira proposing exactly that, so that even the 
> most deeply nested JSON maps could be supported, although combinations of 
> arrays and maps would be problematic.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Michael Pitsounis
> Sent: Wednesday, August 20, 2014 7:14 PM
> To: solr-user@lucene.apache.org
> Subject: embedded documents
> 
> Hello everybody,
> 
> I had a requirement to store complicated json documents in solr.
> 
> i have modified the JsonLoader to accept complicated json documents with
> arrays/objects as values.
> 
> It stores the object/array and then flatten it and  indexes the fields.
> 
> e.g  basic example document
> 
> {
>   "titles_json":{"FR":"This is the FR title" , "EN":"This is the EN
> title"} ,
>   "id": 103,
>   "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"
>  }
> 
> It will store titles_json:{"FR":"This is the FR title" , "EN":"This is the
> EN title"}
> and then index fields
> 
> titles.FR:"This is the FR title"
> titles.EN:"This is the EN title"
> 
> 
> Do you see any problems with this approach?
> 
> 
> 
> Regards,
> Michael Pitsounis 


Re: How to solve?

2014-09-06 Thread Bill Bell
Yeah, we already use it. I will try to create a custom function... if I get it 
to work I will post.

The challenge for me is how to dynamically match and add them based on the 
faceting.

Here is a better example.

The doctor core has payloads of the form name:val. The "name" values are doctor specialties. I 
need to pull back by the name since the user faceted on a specialty. So far 
payloads work. But the user now wants to facet on another specialty. For 
example they are looking for a cardiologist and an internal medicine doctor and 
if the doctor practices at the same hospital I need to take the values and add 
them. Else take the max value for the 2 specialties. 

Make sense now ?

Seems like I need to create a payload and my own custom function.

Bill Bell
Sent from mobile


> On Sep 6, 2014, at 12:57 PM, Erick Erickson  wrote:
> 
> Here's a blog with an end-to-end example. Jack's right, it takes some
> configuration and having first-class support in Solr would be a good
> thing...
> 
> http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/
> 
> Best,
> Erick
> 
>> On Sat, Sep 6, 2014 at 10:24 AM, Jack Krupansky  
>> wrote:
>> Payload really don't have first class support in Solr. It's a solid feature
>> of Lucene, but never expressed well in Solr. Any thoughts or proposals are
>> welcome!
>> 
>> (Hmmm... I wonder what the good folks at Heliosearch have up their sleeves
>> in this area?!)
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: William Bell
>> Sent: Friday, September 5, 2014 10:03 PM
>> To: solr-user@lucene.apache.org
>> Subject: How to solve?
>> 
>> 
>> We have a core with each document as a person.
>> 
>> We want to boost based on the sweater color, but if the person has sweaters
>> in their closet which are the same manufactuer we want to boost even more
>> by adding them together.
>> 
>> Peter Smit - Sweater: Blue = 1 : Nike, Sweater: Red = 2: Nike, Sweater:
>> Blue=1 : Polo
>> Tony S - Sweater: Red =2: Nike
>> Bill O - Sweater:Red = 2: Polo, Blue=1: Polo
>> 
>> Scores:
>> 
>> Peter Smit - 1+2 = 3.
>> Tony S - 2
>> Bill O - 2 + 1
>> 
>> I thought about using payloads.
>> 
>> sweaters_payload
>> Blue: Nike: 1
>> Red: Nike: 2
>> Blue: Polo: 1
>> 
>> How do I query this?
>> 
>> http://localhost:8983/solr/persons?q=*:*&sort=??
>> 
>> Ideas?
>> 
>> 
>> 
>> 
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076


Re: Solr Dynamic Field Performance

2014-09-14 Thread Bill Bell
How about perf if you dynamically create 5000 fields ?

Bill Bell
Sent from mobile


> On Sep 14, 2014, at 10:06 AM, Erick Erickson  wrote:
> 
> Dynamic fields, once they are actually _in_ a document, aren't any
> different than statically defined fields. Literally, there's no place
> in the search code that I know of that _ever_ has to check
> whether a field was dynamically or statically defined.
> 
> AFAIK, the only additional cost would be figuring out which pattern
> matched at index time, which is such a tiny portion of the cost of
> indexing that I doubt you could measure it.
> 
> Best,
> Erick
> 
> On Sun, Sep 14, 2014 at 7:58 AM, Saumitra Srivastav
>  wrote:
>> I have a collection with 200 fields and >300M docs running in cloud mode.
>> Each doc have around 20 fields. I now have a use case where I need to
>> replace these explicit fields with 6 dynamic fields. Each of these 200
>> fields will match one of the 6 dynamic field.
>> 
>> I am evaluating performance implications of switching to dynamicFields. I
>> have tested with a smaller dataset (5M docs) but didn't notice any indexing
>> or query performance degradation.
>> 
>> Query on dynamic fields will either be faceting, range query or full text
>> search.
>> 
>> Are there any known performance issues with using dynamicFields instead of
>> explicit ones?
>> 
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Solr-Dynamic-Field-Performance-tp4158737.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Old facet value doesn't go away after index update

2014-12-19 Thread Bill Bell
Set mincount=1

Bill Bell
Sent from mobile


> On Dec 19, 2014, at 12:22 PM, Tang, Rebecca  wrote:
> 
> Hi there,
> 
> I have an index that has a field called collection_facet.
> 
> There was a value 'Ness Motley Law Firm Documents' that we wanted to update 
> to 'Ness Motley Law Firm'.  There were 36,132 records with this value.  So I 
> re-indexed just the 36,132 records.  After the update, I ran a facet query 
> (q=*:*&facet=true&facet.field=collection_facet) to see if the value got 
> updated and I saw
> Ness Motley Law Firm 36,132  -- as expected
> Ness Motley Law Firm Documents 0 — Why is this value still here even though 
> clearly there are no records with this value anymore?  I thought maybe it was 
> cached, so I restarted solr, but I still got the same results.
> 
> "facet_fields": { "collection_facet": [
> … "Ness Motley Law Firm", 36132,
> … "Ness Motley Law Firm Documents", 0 ]
> 
> 
> 
> Rebecca Tang
> Applications Developer, UCSF CKM
> Legacy Tobacco Document Library
> E: rebecca.t...@ucsf.edu
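
Bill's suggestion in query form; the zero-count entry lingers because the term is still present in the index until the segments holding the deleted documents are merged away, and facet.mincount simply hides such buckets:

q=*:*&rows=0&facet=true&facet.field=collection_facet&facet.mincount=1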


Re: How large is your solr index?

2015-01-03 Thread Bill Bell
For Solr 5 why don't we switch it to 64 bit ??

Bill Bell
Sent from mobile


> On Dec 29, 2014, at 1:53 PM, Jack Krupansky  wrote:
> 
> And that Lucene index document limit includes deleted and updated
> documents, so even if your actual document count stays under 2^31-1,
> deleting and updating documents can push the apparent document count over
> the limit unless you very aggressively merge segments to expunge deleted
> documents.
> 
> -- Jack Krupansky
> 
> -- Jack Krupansky
> 
> On Mon, Dec 29, 2014 at 12:54 PM, Erick Erickson 
> wrote:
> 
>> When you say 2B docs on a single Solr instance, are you talking only one
>> shard?
>> Because if you are, you're very close to the absolute upper limit of a
>> shard, internally
>> the doc id is an int or 2^31. 2^31 + 1 will cause all sorts of problems.
>> 
>> But yeah, your 100B documents are going to use up a lot of servers...
>> 
>> Best,
>> Erick
>> 
>> On Mon, Dec 29, 2014 at 7:24 AM, Bram Van Dam 
>> wrote:
>>> Hi folks,
>>> 
>>> I'm trying to get a feel of how large Solr can grow without slowing down
>> too
>>> much. We're looking into a use-case with up to 100 billion documents
>>> (SolrCloud), and we're a little afraid that we'll end up requiring 100
>>> servers to pull it off.
>>> 
>>> The largest index we currently have is ~2billion documents in a single
>> Solr
>>> instance. Documents are smallish (5k each) and we have ~50 fields in the
>>> schema, with an index size of about 2TB. Performance is mostly OK. Cold
>>> searchers take a while, but most queries are alright after warming up. I
>>> wish I could provide more statistics, but I only have very limited
>> access to
>>> the data (...banks...).
>>> 
>>> I'd very grateful to anyone sharing statistics, especially on the larger
>> end
>>> of the spectrum -- with or without SolrCloud.
>>> 
>>> Thanks,
>>> 
>>> - Bram
>> 


Re: Collations are not working fine.

2015-02-09 Thread Bill Bell
Can you order the collations by highest to lowest hits?

Bill Bell
Sent from mobile


> On Feb 9, 2015, at 6:47 AM, Nitin Solanki  wrote:
> 
> I am working on spell checking in Solr. I have implemented Suggestions and
> collations in my spell checker component.
> 
> Most of the time collations work fine, but in a few cases they fail.
> 
> *Working*:
> I tried the query *gone wthh thes wnd*: here "wnd" doesn't give the suggestion
> "wind", but the collation comes out right = "gone with the wind", hits = 117
> 
> 
> *Not working:*
> But when I tried the query *gone wthh thes wint*: here "wint" does give the
> suggestion "wind", but the collation does not come out right. Instead of "gone with
> the wind" it gives "gone with the west", hits = 1.
> 
> And I also want to know what *hits* means in collations.
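
For reference: a collation is the rewritten whole query built from the individual corrections, and hits is the number of documents that collation would return if re-run. As far as I know there is no parameter that orders collations by hits, but with extended results the client can re-sort them itself; a sketch of the relevant spellchecker params (values are illustrative):

<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">5</str>
<str name="spellcheck.maxCollationTries">20</str>
<str name="spellcheck.collateExtendedResults">true</str>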


Re: Sort on multivalued attributes

2015-02-09 Thread Bill Bell
Definitely needed !!

Bill Bell
Sent from mobile


> On Feb 9, 2015, at 5:51 AM, Jan Høydahl  wrote:
> 
> Sure, vote for it. Number of votes do not directly make prioritized sooner.
> So you better also add a comment to the JIRA, it will raise committer's 
> attention.
> Even better of course is if you are able to help bring the issue forward by 
> submitting patches.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 9. feb. 2015 kl. 12.15 skrev Flavio Pompermaier :
>> 
>> Do I have to vote for it..?
>> 
>>> On Mon, Feb 9, 2015 at 11:50 AM, Jan Høydahl  wrote:
>>> 
>>> See https://issues.apache.org/jira/browse/SOLR-2522
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>>> 9. feb. 2015 kl. 10.30 skrev Flavio Pompermaier :
>>>> 
>>>> In my use case it could be very helpful because I use the SIREn plugin to
>>>> index arbitrary JSON-LD and this plugin automatically index also all
>>> nested
>>>> attributes as a Solr field.
>>>> Thus I need for example to gather all entries with a certain value of the
>>>> "type" attribute, ordered by "name" (but name could be a multivalued
>>>> attribute in my use case :( )
>>>> I'd like to avoid to switch to Elasticsearch just to have this single
>>>> feature.
>>>> 
>>>> Thanks for the support,
>>>> Flavio
>>>> 
>>>> On Mon, Feb 9, 2015 at 10:02 AM, Anshum Gupta 
>>>> wrote:
>>>> 
>>>>> Sure, that's correct and makes sense in some use cases. I'll need to
>>> check
>>>>> if Solr functions support such a thing.
>>>>> 
>>>>> On Mon, Feb 9, 2015 at 12:47 AM, Flavio Pompermaier <
>>> pomperma...@okkam.it>
>>>>> wrote:
>>>>> 
>>>>>> I saw that this is possible in Lucene (
>>>>>> https://issues.apache.org/jira/browse/LUCENE-5454) and also in
>>>>>> Elasticsearch. Or am I wrong?
>>>>>> 
>>>>>> On Mon, Feb 9, 2015 at 9:05 AM, Anshum Gupta 
>>>>>> wrote:
>>>>>> 
>>>>>>> Unless I'm missing something here, sorting on a multi-valued field
>>>>> would
>>>>>> be
>>>>>>> non-deterministic in nature.
>>>>>>> 
>>>>>>> On Sun, Feb 8, 2015 at 11:59 PM, Flavio Pompermaier <
>>>>>> pomperma...@okkam.it>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi to all,
>>>>>>>> 
>>>>>>>> Is there any possibility that in the near future Solr could support
>>>>>>> sorting
>>>>>>>> on multivalued fields?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Flavio
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Anshum Gupta
>>>>>>> http://about.me/anshumgupta
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Anshum Gupta
>>>>> http://about.me/anshumgupta
> 


Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline?

Bill Bell
Sent from mobile


On Jul 29, 2013, at 4:25 PM, Erick Erickson  wrote:

> This is very strange. I'd expect slow queries on
> the first few queries while these caches were
> warmed, but after that I'd expect things to
> be quite fast.
> 
> For a 12G index and 256G RAM, you have on the
> surface a LOT of hardware to throw at this problem.
> You can _try_ giving the JVM, say, 18G but that
> really shouldn't be a big issue, your index files
> should be MMaped.
> 
> Let's try the crude thing first and give the JVM
> more memory.
> 
> FWIW
> Erick
> 
> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower  wrote:
>> I've been doing some performance analysis of a spacial search use case I'm
>> implementing in Solr 4.3.0. Basically I'm seeing search times alot higher
>> than I'd like them to be and I'm hoping people may have some suggestions
>> for how to optimize further.
>> 
>> Here are the specs of what I'm doing now:
>> 
>> Machine:
>> - 16 cores @ 2.8ghz
>> - 256gb RAM
>> - 1TB (RAID 1+0 on 10 SSD)
>> 
>> Content:
>> - 45M docs (not very big only a few fields with no large textual content)
>> - 1 geo field (using config below)
>> - index is 12gb
>> - 1 shard
>> - Using MMapDirectory
>> 
>> Field config:
>> 
>> <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
>> distErrPct="0.025" maxDistErr="0.00045"
>> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>> units="degrees"/>
>> > required="false" stored="true" type="geo"/>
>> 
>> 
>> What I've figured out so far:
>> 
>> - Most of my time (98%) is being spent in
>> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
>> driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
>> which from what I gather is basically reading terms from the .tim file
>> in blocks
>> 
>> - I moved from Java 1.6 to 1.7 based upon what I read here:
>> http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
>> and it definitely had some positive impact (i haven't been able to
>> measure this independantly yet)
>> 
>> - I changed maxDistErr from 0.000009 (which is 1m precision per the docs)
>> to 0.00045 (50m precision) ..
>> 
>> - It looks to me that the .tim file are being memory mapped fully (ie
>> they show up in pmap output) the virtual size of the jvm is ~18gb
>> (heap is 6gb)
>> 
>> - I've optimized the index but this doesn't have a dramatic impact on
>> performance
>> 
>> Changing the precision and the JVM upgrade yielded a drop from ~18s
>> avg query time to ~9s avg query time.. This is fantastic but I want to
>> get this down into the 1-2 second range.
>> 
>> At this point it seems that basically i am bottle-necked on basically
>> copying memory out of the mapped .tim file which leads me to think
>> that the only solution to my problem would be to read less data or
>> somehow read it more efficiently..
>> 
>> If anyone has any suggestions of where to go with this I'd love to know
>> 
>> 
>> thanks,
>> 
>> steve
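
A sketch of the older LatLonType setup Bill suggests comparing against (field names are illustrative, and a tdouble type is assumed to exist in the schema); the same geofilt query can then be pointed at either field:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
<field name="store_ll" type="location" indexed="true" stored="true"/>

fq={!geofilt sfield=store_ll pt=45.15,-93.85 d=5}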


Re: How might one search for dupe IDs other than faceting on the ID field?

2013-07-30 Thread Bill Bell
This seems like a fairly large issue. Can you create a Jira issue ?

Bill Bell
Sent from mobile


On Jul 30, 2013, at 12:34 PM, Dotan Cohen  wrote:

> On Tue, Jul 30, 2013 at 9:21 PM, Aloke Ghoshal  wrote:
>> Does adding facet.mincount=2 help?
>> 
>> 
> 
> In fact, when adding facet.mincount=20 (I know that some dupes are in
> the hundreds) I got the OutOfMemoryError in seconds instead of
> minutes.
> 
> -- 
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com
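
One lower-memory alternative, offered here as an assumption rather than something from the thread, is the TermsComponent: it walks the term dictionary directly instead of building per-query counts on the heap. Assuming the stock /terms handler is registered, and remembering that term frequencies include deleted documents (so treat the output as candidates to verify):

http://localhost:8983/solr/collection1/terms?terms.fl=id&terms.mincount=2&terms.limit=1000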


Re: Concat 2 fields in another field

2013-08-27 Thread Bill Bell
If for search just copyField into a multivalued field

Or do it on indexing using DIH or code. A rhino script works too.

Bill Bell
Sent from mobile


On Aug 27, 2013, at 7:15 AM, "Jack Krupansky"  wrote:

> I have additional examples in the two most recent early access releases of my 
> book - variations on using the existing update processors.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Federico Chiacchiaretta
> Sent: Tuesday, August 27, 2013 8:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Concat 2 fields in another field
> 
> Hi,
> we do the same thing using an update request processor chain, this is the
> snippet from solrconfig.xml
> 
> 
> <processor class="solr.CloneFieldUpdateProcessorFactory">
>   <str name="source">firstname</str><str name="dest">concatfield</str>
> </processor>
> <processor class="solr.CloneFieldUpdateProcessorFactory">
>   <str name="source">lastname</str><str name="dest">concatfield</str>
> </processor>
> <processor class="solr.ConcatFieldUpdateProcessorFactory">
>   <str name="fieldName">concatfield</str><str name="delimiter">_</str>
> </processor>
> <processor class="solr.RunUpdateProcessorFactory" />
> 
> 
> 
> Regards,
> Federico Chiacchiaretta
> 
> 
> 
> 2013/8/27 Markus Jelsma 
> 
>> You may be more interested in the ConcatFieldUpdateProcessorFactory:
>> 
>> http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/ConcatFieldUpdateProcessorFactory.html
>> 
>> 
>> 
>> -Original message-
>> > From:Alok Bhandari 
>> > Sent: Tuesday 27th August 2013 14:05
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Concat 2 fields in another field
>> >
>> > Thanks for reply.
>> >
>> > But I don't want to introduce any scripting in my code so want to know > is
>> > there any Java component available for the same.
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://lucene.472066.n3.nabble.com/Concat-2-fields-in-another-field-tp4086786p4086791.html
>> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >
> 
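
For completeness, the copyField alternative Bill mentions at the top of the thread is just a schema.xml sketch like this (field names are placeholders); unlike the processor chain it yields one multivalued search field rather than a single concatenated string:

<field name="fullname" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="firstname" dest="fullname"/>
<copyField source="lastname" dest="fullname"/>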


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-27 Thread Bill Bell
Index and query

<analyzer type="index">

Bill Bell
Sent from mobile


On Aug 26, 2013, at 5:42 AM, skorrapa  wrote:

> I have also re-indexed the data and tried. And also tried with the below:
>   sortMissingLast="true" omitNorms="true">
>  
>
>
>  
>
>
>
>  
>
>
>
>  
>
> This didnt work as well...
> 
> 
> 
> On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] <
> ml-node+s472066n4086601...@n3.nabble.com> wrote:
> 
>> Hello All,
>> 
>> I am still facing the same issue. Case-insensitive search is not working on
>> Solr 4.3
>> I am using the below configurations in schema.xml
>> > sortMissingLast="true" omitNorms="true">
>>  
>>
>>
>>  
>>
>>
>>
>>  
>>
>>
>>
>>  
>>
>> Basically I want my string, which could have spaces or characters like '-'
>> or \, to be searched case-insensitively.
>> Please help.
>> 
>> 
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>> 
>> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html
>> To unsubscribe from Solr 4.2.1 update to 4.3/4.4 problem, click 
>> here<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4081896&code=a29ycmFwYXRpLnN1c2htYUBnbWFpbC5jb218NDA4MTg5Nnw0MjEwNTY0Mzc=>
>> .
>> NAML<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
> Sent from the Solr - User mailing list archive at Nabble.com.
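
A sketch of what such a case-insensitive string type usually looks like with the analyzers written out (a sketch, since the exact schema isn't shown above): KeywordTokenizer keeps the whole value, including spaces, '-' and '\', as a single token, and LowerCaseFilter on both the index and query side is what makes matching case-insensitive:

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>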


Re: Some highlighted snippets aren't being returned

2013-09-08 Thread Bill Bell
Zip up all your configs 

Bill Bell
Sent from mobile


On Sep 8, 2013, at 3:00 PM, "Eric O'Hanlon"  wrote:

> Hi again Everyone,
> 
> I didn't get any replies to this, so I thought I'd re-send in case anyone 
> missed it and has any thoughts.
> 
> Thanks,
> Eric
> 
> On Aug 7, 2013, at 1:51 PM, Eric O'Hanlon  wrote:
> 
>> Hi Everyone,
>> 
>> I'm facing an issue in which my solr query is returning highlighted snippets 
>> for some, but not all results.  For reference, I'm searching through an 
>> index that contains web crawls of human-rights-related websites.  I'm 
>> running solr as a webapp under Tomcat and I've included the query's solr 
>> params from the Tomcat log:
>> 
>> ...
>> webapp=/solr-4.2
>> path=/select
>> params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_.facet.limit=6&group.field=original_url&hl.simple.post=&facet.field=domain&facet.field=date_of_capture_&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true}
>>  hits=8 status=0 QTime=108
>> ...
>> 
>> For the query above (which can be simplified to say: find all documents that 
>> contain the word "unangan" and return facets, highlights, etc.), I get five 
>> search results.  Only three of these are returning highlighted snippets.  
>> Here's the "highlighting" portion of the solr response (note: printed in 
>> ruby notation because I'm receiving this response in a Rails app):
>> 
>> 
>> "highlighting"=>
>> {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  
>> "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  
>> "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
>>   {},
>>  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  
>> "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  
>> "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
>>   {"contents"=>
>> ["...actual snippet is returned here..."]},
>>  
>> "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
>>   {}}
>> 
>> 
>> I have eight (as opposed to five) results above because I'm also doing a 
>> grouped query, grouping by a field called "original_url", and this leads to 
>> five grouped results.
>> 
>> I've confirmed that my highlight-lacking results DO contain the word 
>> "unangan", as expected, and this term is appearing in a text field that's 
>> indexed and stored, and being searched for all text searches.  For example, 
>> one of the search results is for a crawl of this document: 
>> http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
>> 
>> And if you view that document on the web, you'll see that it does contain 
>> "unangan".
>> 
>> Has anyone seen this before?  And does anyone have any good suggestions for 
>> troubleshooting/fixing the problem?
>> 
>> Thanks!
>> 
>> - Eric
> 


Re: Solr 4.5 spatial search - distance and score

2013-09-13 Thread Bill Bell
You can apply his 4.5 patches to 4.4 or take trunk and it is there

Bill Bell
Sent from mobile
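As a sketch of what that looks like once SOLR-4255 is in: the spatial filter parsers
on an RPT (geohash-style) field are expected to take a score local param, so the
distance comes back as the score (field name and point are placeholders; check the
4.5 release notes before relying on the exact syntax):

q={!geofilt sfield=geo pt=45.15,-93.85 d=5 score=distance}&sort=score asc&fl=*,score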


On Sep 12, 2013, at 6:23 PM, Weber  wrote:

> I'm trying to get score by using a custom boost and also get the distance. I
> found David's code* to get it using "Intersects", which I want to replace by
> {!geofilt} or geodist()
> 
> *David's code: https://issues.apache.org/jira/browse/SOLR-4255
> 
> He told me geodist() will be available again for this kind of field, which
> is a geohash type.
> 
> Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
> how it will be done on 4.5 using geodist()
> 
> Thanks in advance.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields

2013-10-09 Thread Bill Bell
You have to update the whole record including all fields...

Bill Bell
Sent from mobile
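In other words, an atomic (partial) update is rebuilt from the stored fields of the
existing document, so any stored="false" field silently falls back to its default.
A quick sketch with placeholder field names:

Atomic update (stored=false fields are lost or reset to their defaults):
  [{"id":"doc1","price":{"set":9.99}}]

Full re-add (safe, because every field is supplied again):
  [{"id":"doc1","price":9.99,"title":"...","inStock":true}]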


> On Oct 9, 2013, at 7:50 PM, deniz  wrote:
> 
> hi all,
> 
> I have encountered some problems and post it on stackoverflow here:
> http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false
>  
> 
> as you can see from the response, does it make sense to open a bug ticket
> for this? because, although i can workaround this by setting everything back
> to stored=true, it does not make sense to keep every field stored while i
> dont need to return them in the search result.. or will anyone can make more
> detailed explanations that this is expected and normal? 
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-10 Thread Bill Bell
Does this work ?
I can suggest -XX:-UseLoopPredicate to switch off predicates.

???

Which version of 7 is recommended ?

Bill Bell
Sent from mobile


> On Oct 10, 2013, at 11:29 AM, "Smiley, David W."  wrote:
> 
> *Don't* use JDK 7u40, it's been known to cause index corruption and
> SIGSEGV faults with Lucene: LUCENE-5212   This has not been unnoticed by
> Oracle.
> 
> ~ David
> 
>> On 10/10/13 12:34 PM, "Guido Medina"  wrote:
>> 
>> 2. Java version: There are huges performance winning between Java 5, 6
>>   and 7; we use Oracle JDK 7u40.
> 


Re: pagination with grouping

2011-09-08 Thread Bill Bell
There are 2 use cases:

1. rows=10 means 10 groups.
2. rows=10 means 10 results (regardless of groups).

I thought there was a total number of groups (ngroups) for case #1.

I don't believe case #2 has been coded.
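For case #1 the total needed for paging is available via group.ngroups (field name
below is a placeholder):

q=*:*&group=true&group.field=category&group.limit=5&rows=10&group.ngroups=true

Here rows counts groups, and the ngroups value in the response gives the total
number of groups, which is what you need to compute the last page.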

On 9/8/11 2:22 PM, "alx...@aim.com"  wrote:

>
> 
>
> Hello,
>
>When trying to implement pagination as in the case without grouping I see
>two issues.
>1. with rows=10 solr feed displays 10 groups not 10 results
>2. there is no total number of results with grouping  to show the last
>page.
>
>In detail:
>1. I need to display only 10 results in one page. For example if I have
>group.limit=5 and the first group has 5 docs, the second 3 and the third
>2 then only these 3 group must be displayed in the first page.
>Currently specifying rows=10, shows 10 groups and if we have 5 docs in
>each group then in the first page we will have 50 docs.
>
>2.I need to show the last page, for which I need total number of results
>with grouping. For example if I have 5 groups with number of docs 5, 4,
>3,2 1 then this total number must be 15.
>
>Any ideas how to achieve this.
>
>Thanks in advance.
>Alex.
>
>
>




Re: Re; DIH Scheduling

2011-09-12 Thread Bill Bell
You can easily use cron with curl to do what you want to do.
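A minimal crontab sketch (core name, schedule and host are placeholders):

*/15 * * * * curl -s "http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false" > /dev/null

The same idea works for command=full-import, and progress can be polled with
/dataimport?command=status.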

On 9/12/11 2:47 PM, "Pulkit Singhal"  wrote:

>I don't see anywhere in:
>http://issues.apache.org/jira/browse/SOLR-2305
>any statement that shows the code's inclusion was "decided against"
>when did this happen and what is needed from the community before
>someone with the powers to do so will actually commit this?
>
>2011/6/24 Noble Paul നോബിള്‍ नोब्ळ् 
>
>> On Thu, Jun 23, 2011 at 9:13 PM, simon  wrote:
>> > The Wiki page describes a design for a scheduler, which has not been
>> > committed to Solr yet (I checked). I did see a patch the other day
>> > (see https://issues.apache.org/jira/browse/SOLR-2305) but it didn't
>> > look well tested.
>> >
>> > I think that you're basically stuck with something like cron at this
>> > time. If your application is written in java, take a look at the
>> > Quartz scheduler - http://www.quartz-scheduler.org/
>>
>> It was considered and decided against.
>> >
>> > -Simon
>> >
>>
>>
>>
>> --
>> -
>> Noble Paul
>>




Re: Distinct elements in a field

2011-09-17 Thread Bill Bell
SOLR-2242 can do it.

On 9/16/11 2:15 AM, "swiss knife"  wrote:

>I could get this number by using
>
> group.ngroups=true&group.limit=0
>
> but doing grouping for this seems like an overkill
>
> Would you advise using JIRA SOLR-1814 ?
>
>- Original Message -
>From: swiss knife
>Sent: 09/15/11 12:43 PM
>To: solr-user@lucene.apache.org
>Subject: Distinct elements in a field
>
> Simple question: I want to know how many distinct elements I have in a
>field and these verify a query. Do you know if there's a way to do it
>today in 3.4. I saw SOLR-1814 and SOLR-2242. SOLR-1814 seems fairly easy
>to use. What do you think ? Thank you




Re: indexing a xml file

2011-09-24 Thread Bill Bell
Send us the example "solr.xml" and "schema.xml". You are missing fields
in the schema.xml that you are referencing.
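The stock example documents reference a "name" field, so the schema you are posting
into needs a matching definition along these lines (the type shown is the one from
the example schema; adjust it to yours):

<field name="name" type="text_general" indexed="true" stored="true"/>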

On 9/24/11 8:15 AM, "ahmad ajiloo"  wrote:

>hello
>Solr Tutorial page explains about index a xml file. but when I try to
>index
>a xml file with this command:
>~/Desktop/apache-solr-3.3.0/example/exampledocs$ java -jar post.jar
>solr.xml
>I get this error:
>SimplePostTool: FATAL: Solr returned an error #400 ERROR:unknown field
>'name'
>
>can anyone help me?
>thanks




Best Solr escaping?

2011-09-24 Thread Bill Bell
What is the best algorithm for escaping strings before sending to Solr? Does
someone have some code?

A few things I have witnessed in "q" using DIH handler
* Double quotes - " that are not balanced can cause several issues from an
error (strip the double quote?), to no results.
* Should we use + or %20 - and what cases make sense:
> * "Dr. Phil Smith" or "Dr.+Phil+Smith" or "Dr.%20Phil%20Smith" - also what is
> the impact of double quotes?
* Unmatched parenthesis I.e. Opening ( and not closing.
> * (Dr. Holstein
> * Cardiologist+(Dr. Holstein
Regular encoding of strings does not always work for the whole string due to
several issues like white space:
* White space works better when we use a backslash escape "Bill\ Bell", especially
when using facets.

Thoughts? Code? Ideas? Better Wikis?
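One option, as a sketch: SolrJ ships a helper that backslash-escapes the special
query characters (whitespace, quotes, parentheses, +, -, &, | and so on), which
covers the unbalanced-quote and unbalanced-paren cases; URL encoding of the final
request is still a separate step handled by the HTTP client.

import org.apache.solr.client.solrj.util.ClientUtils;

String userInput = "Cardiologist (Dr. Holstein";           // unbalanced paren from the user
String escaped = ClientUtils.escapeQueryChars(userInput);  // Cardiologist\ \(Dr.\ Holstein
String q = "name:(" + escaped + ")";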





Re: Search query doesn't work in solr/browse pnnel

2011-09-24 Thread Bill Bell
Yes. It appears that "&" cannot be encoded in the URL, or there are really
bad results.
For example we get an error on first request, but if we refresh it goes
away.



On 9/23/11 2:57 PM, "hadi"  wrote:

>When I create a query like "something&fl=content" in solr/browse the "&"
>and
>"=" in URL converted to %26 and %3D and no result occurs. but it works in
>solr/admin advanced search and also in URL bar directly, How can I solve
>this problem?  Thanks
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Search-query-doesn-t-work-in-solr-brows
>e-pnnel-tp3363032p3363032.html
>Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr stopword problem in Query

2011-09-26 Thread Bill Bell
This is pretty serious issue

Bill Bell
Sent from mobile


On Sep 26, 2011, at 4:09 AM, Isan Fulia  wrote:

> Hi all,
> 
> I have a text field named *textForQuery*.
> Following content has been indexed into solr in field textForQuery
> *Coke Studio at MTV*
> 
> when i fired the query as
> *textForQuery:("coke studio at mtv")* the results showed 0 documents
> 
> After runing the same query in debugMode i got the following results
> 
> 
> 
> <str name="rawquerystring">textForQuery:("coke studio at mtv")</str>
> <str name="querystring">textForQuery:("coke studio at mtv")</str>
> <str name="parsedquery">PhraseQuery(textForQuery:"coke studio ? mtv")</str>
> <str name="parsedquery_toString">textForQuery:"coke studio ? mtv"</str>
> 
> Why the query did not matched any document even when there is a document
> with value of textForQuery as *Coke Studio at MTV*?
> Is this because of the stopword *at* present in stopwordList?
> 
> 
> 
> -- 
> Thanks & Regards,
> Isan Fulia.


Re: Scoring of DisMax in Solr

2011-10-04 Thread Bill Bell
This seems like a bug to me.

On 10/4/11 6:52 PM, "David Ryan"  wrote:

>Hi,
>
>
>When I examine the score calculation of DisMax in Solr,   it looks to me
>that DisMax is using  tf x idf^2 instead of tf x idf.
>Does anyone have insight why tf x idf is not used here?
>
>Here is the score contribution from one one field:
>
>score(q,c) =  queryWeight x fieldWeight
>   = tf x idf x idf x queryNorm x fieldNorm
>
>Here is the example that I used to derive the formula above. Clearly, idf
>is
>multiplied twice in the score calculation.
>*
>http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent
>=on&debugQuery=true&fl=id,score
>*
>
>
>0.18314168 = (MATCH) sum of:
>  0.18314168 = (MATCH) weight(text:gb in 1), product of:
>0.35845062 = queryWeight(text:gb), product of:
>  2.3121865 = idf(docFreq=6, numDocs=26)
>  0.15502669 = queryNorm
>0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
>  1.4142135 = tf(termFreq(text:gb)=2)
>  2.3121865 = idf(docFreq=6, numDocs=26)
>  0.15625 = fieldNorm(field=text, doc=1)
>
>
>
>Thanks!




Re: Scoring of DisMax in Solr

2011-10-05 Thread Bill Bell
Markus,

The calculation is correct.

Look at your output.

Result = queryWeight(text:gb) * fieldWeight(text:gb in 1)

Result = (idf(docFreq=6, numDocs=26) * queryNorm) *
(tf(termFreq(text:gb)=2) * idf(docFreq=6, numDocs=26) *
fieldNorm(field=text, doc=1))

Thus you should notice that idf(docFreq=6, numDocs=26) is repeated twice.

This is just how the weight() is calculated.
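Plugging the numbers quoted below into that product checks out (rounded):

queryWeight = idf * queryNorm           = 2.3121865 * 0.15502669          = 0.35845062
fieldWeight = tf * idf * fieldNorm      = 1.4142135 * 2.3121865 * 0.15625 = 0.5109258
score       = queryWeight * fieldWeight = 0.35845062 * 0.5109258          = 0.18314168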




> > 0.18314168 = (MATCH) sum of:
> >   0.18314168 = (MATCH) weight(text:gb in 1), product of:
> > 0.35845062 = queryWeight(text:gb), product of:
> >   2.3121865 = idf(docFreq=6, numDocs=26)
> >   0.15502669 = queryNorm
> >
> > 0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
> >   1.4142135 = tf(termFreq(text:gb)=2)
> >   2.3121865 = idf(docFreq=6, numDocs=26)
> >   0.15625 = fieldNorm(field=text, doc=1)





On 10/5/11 11:42 AM, "Markus Jelsma"  wrote:

>Hi,
>
>I don't see 2.3121865 * 2 anywhere in your debug output or something that
>looks like that.
>
>
>> Hi Markus,
>> 
>> The idf calculation itself is correct.
>> What I am trying to understand here is  why idf value is multiplied
>>twice
>> in the final score calculation. Essentially,  tf x idf^2 is used instead
>> of tf x idf.
>> I'd like to understand the rational behind that.
>> 
>> On Wed, Oct 5, 2011 at 9:43 AM, Markus Jelsma
>wrote:
>> > In Lucene's default similarity idf = 1 + ln (numDocs / df + 1).
>> > 1 + ln(26 / 7) =~ 2.3121865
>> > 
>> > I don't see a problem.
>> > 
>> > > Hi,
>> > > 
>> > > 
>> > > When I examine the score calculation of DisMax in Solr,   it looks
>>to
>> > > me that DisMax is using  tf x idf^2 instead of tf x idf.
>> > > Does anyone have insight why tf x idf is not used here?
>> > > 
>> > > Here is the score contribution from one one field:
>> > > 
>> > > score(q,c) =  queryWeight x fieldWeight
>> > > 
>> > >= tf x idf x idf x queryNorm x fieldNorm
>> > > 
>> > > Here is the example that I used to derive the formula above.
>>Clearly,
>> > > idf is multiplied twice in the score calculation.
>> > > *
>> > 
>> > 
>>http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&inden
>> > t=
>> > 
>> > > on&debugQuery=true&fl=id,score *
>> > > 
>> > > 
>> > > 
>> > > 0.18314168 = (MATCH) sum of:
>> > >   0.18314168 = (MATCH) weight(text:gb in 1), product of:
>> > > 0.35845062 = queryWeight(text:gb), product of:
>> > >   2.3121865 = idf(docFreq=6, numDocs=26)
>> > >   0.15502669 = queryNorm
>> > > 
>> > > 0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
>> > >   1.4142135 = tf(termFreq(text:gb)=2)
>> > >   2.3121865 = idf(docFreq=6, numDocs=26)
>> > >   0.15625 = fieldNorm(field=text, doc=1)
>> > > 
>> > > 
>> > > 
>> > > 
>> > > Thanks!




Re: is there a way to know which mm value was used?

2011-10-05 Thread Bill Bell
It would be good to output the mm value for debugging.

Something like mm_value = 2

Then you should know the results are right.

On 10/5/11 9:58 AM, "Shawn Heisey"  wrote:

>On 10/5/2011 9:06 AM, elisabeth benoit wrote:
>> thanks for answering.
>>
>> echoParams just echos mm value in solrconfig.xml (in my case mm = 4<-1
>> 6<-2), not the actual value of mm for one particular request.
>>
>> I think would be very useful to be able to know which mm value was
>> effectively used, in particular for request with stopwords.
>>
>> It's of course possible to calculate mm in my own code, but this would
>> necessitate to be synchronize with mm default value in solrconfig.xml +
>>with
>> stopwords.txt + identifying all stopwords in request.
>
>Just tried this on a Solr 3.4.0 server.  I have an edismax handler that
>includes echoParams, set to "all", as well as an mm parameter, set to
>"2<-1 4<-50%".  If I send a request with no mm parameter, that value is
>reflected in the response.  When I add "&mm=50%25" to the URL in my
>browser (%25 being the URL encoding for the percent symbol), the
>response changes the mm value to "50%" as expected, overriding the value
>in solrconfig.xml.  I have not tried it with SolrJ or any of the other
>client APIs, just a browser.
>
>Is this not happening for you?
>
>Thanks,
>Shawn
>




Re: what is the recommended way to store locations?

2011-10-06 Thread Bill Bell
You could use client-side Google Geocoding on what the user typed in.
Then get the lat,long returned from Google, and do a geospatial search.



On 10/6/11 9:27 AM, "Jason Toy"  wrote:

>In our current system ,we have 3 fields for location,  city, state, and
>country.People in our system search for one of those 3 strings.
>So a user can search for "San Francisco" or "California".  In solr I store
>those 3 fields as strings and when a search happens I search with an OR
>statement across those 3 fields.
>
>Is there a more efficient way to store this data storage wise and/or speed
>wise?  We don't currently plan to use any spacial features like "3 miles
>near SF".




Re: Performance issue: Frange with geodist()

2011-10-15 Thread Bill Bell
I added a Jira issue for this:

https://issues.apache.org/jira/browse/SOLR-2840



On 10/13/11 8:15 AM, "Yonik Seeley"  wrote:

>On Thu, Oct 13, 2011 at 9:55 AM, Mikhail Khludnev
> wrote:
>> is it possible with geofilt and facet.query?
>>
>> facet.query={!geofilt pt=45.15,-93.85 sfield=store d=5}
>
>Yes, that should be both possible and faster... something along the lines
>of:
>&sfield=store&pt=45.15,-93.85
>&facet.query={!geofilt d=10 key=d10}
>&facet.query={!geofilt d=20 key=d20}
>&facet.query={!geofilt d=50 key=d50}
>
>Eventually we should implement range faceting over functions and also
>add a max distance you care about to the geodist function.
>
>-Yonik
>http://www.lucene-eurocon.com - The Lucene/Solr User Conference
>
>
>> On Thu, Oct 13, 2011 at 4:20 PM, roySolr 
>>wrote:
>>
>>> I don't want to use some basic facets. When the user doesn't get any
>>> results
>>> i want
>>> to search in the radius of his search location. Example:
>>>
>>> apple store in Manchester gives no result. I want this:
>>>
>>> Click here to see 2 results in a radius of 10km.
>>> Click here to see 11 results in a radius of 50km.
>>> Click here to see 19 results in a radius of 100km.
>>>
>>> With geodist() and facet.query is this possible but the performance
>>>isn't
>>> very good..
>>>
>>>
>>> --
>>> View this message in context:
>>> 
>>>http://lucene.472066.n3.nabble.com/Performance-issue-Frange-with-geodist
>>>-tp3417962p3418429.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail (Mike) Khludnev
>> Developer
>> Grid Dynamics
>> tel. 1-415-738-8644
>> Skype: mkhludnev
>> 
>>  
>>




Question

2010-10-01 Thread Bill Bell
I have a question concerning SOLR 5 Spatial.

I would like to have a field with multiple LAT/LONG values tied to one ID
(multiple offices for a Doctor). Is this possible? Can I do multi-values?

   
   
   


I think I could add multiple fields, but those fields would not be sortable 
across 3 fields

fq={!sfilt%20fl=store_lat_lon}&sort=hsin(6371,true,store,vector(39.7391536,-104.9847034))+asc

I would need something like  :

fq={!sfilt%20fl=store_lat_lon}&sort=hsin(6371,true,store OR store1 OR 
store2,vector(39.7391536,-104.9847034))+asc

Thanks.





Bill Bell | Principal Architect
STATÊRA | www.statera.com<http://www.statera.com>
720.346.0070 x212 - Office | 720.256.8076  - Mobile | 
bb...@statera.com<mailto:bb...@statera.com>



Help with MMapDirectoryFactory in 3.5

2012-02-11 Thread Bill Bell
 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking.. When I set
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp ? I cannot find a place
that indicates it is set or not.

I would assume StandardDirectoryFactory is being used but I do see (when I
set it or NOT set it)

name:  searcher  class:  org.apache.solr.search.SolrIndexSearcher  version:
1.0  description:  index searcher  stats: searcherName :  Searcher@71fc3828
main 
caching :  true 
numDocs :  2121163 
maxDoc :  2121163 
reader :  
SolrIndexReader{this=1867ec28,r=ReadOnlyDirectoryReader@1867ec28,refCnt=1,se
gments=1} 
readerDir :  
org.apache.lucene.store.MMapDirectory@C:\solr\jetty\example\solr\providersea
rch\data\index 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@45c1cfc1
indexVersion :  1324594650551
openedAt :  Sat Feb 11 09:49:31 MST 2012
registeredAt :  Sat Feb 11 09:49:31 MST 2012
warmupTime :  0

Also, how do I set unmap and what is the purpose of chunk size?
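For reference, a sketch of setting both options on the factory in solrconfig.xml
(the parameter names unmap and maxChunkSize are my reading of SOLR-2741, so verify
them against the factory source):

<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory">
  <bool name="unmap">true</bool>
  <int name="maxChunkSize">1073741824</int>
</directoryFactory>

unmap controls whether the mapped buffers are forcibly unmapped when an index input
is closed instead of waiting for garbage collection; the chunk size mainly matters
on 32-bit JVMs, where the index has to be mapped in pieces small enough to fit the
limited address space.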




Re: Help with MMapDirectoryFactory in 3.5

2012-02-11 Thread Bill Bell
Also, does someone have an example of using unmap in 3.5 and chunksize?

From:  Bill Bell 
Date:  Sat, 11 Feb 2012 10:39:56 -0700
To:  
Subject:  Help with MMapDirectoryFactory in 3.5

 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking.. When I set
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp ? I cannot find a place
that indicates it is set or not.

I would assume StandardDirectoryFactory is being used but I do see (when I
set it or NOT set it)

name:  searcher  class:  org.apache.solr.search.SolrIndexSearcher  version:
1.0  description:  index searcher  stats: searcherName : Searcher@71fc3828
main 
caching : true 
numDocs : 2121163 
maxDoc : 2121163 
reader : 
SolrIndexReader{this=1867ec28,r=ReadOnlyDirectoryReader@1867ec28,refCnt=1,se
gments=1} 
readerDir : 
org.apache.lucene.store.MMapDirectory@C:\solr\jetty\example\solr\providersea
rch\data\index 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@45c1cfc1
indexVersion : 1324594650551
openedAt : Sat Feb 11 09:49:31 MST 2012
registeredAt : Sat Feb 11 09:49:31 MST 2012
warmupTime : 0 

Also, how do I set unmap and what is the purpose of chunk size?




boost question. need boost to take a query like bq

2012-02-11 Thread Bill Bell


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we
get the "multi-valued field issue" when we try to do the equivalent queries:
HTTP ERROR 400
Problem accessing /solr/providersearch/select. Reason:
can not use FieldCache on multivalued field: specialties_ids


q=*:*&bq=multi_field:87^2&defType=dismax

How do you do this using boost?

q=*:*&boost=multi_field:87&defType=edismax

We know we can use bq with edismax, but we like the "multiply" feature of
boost.

If I change it to a single valued field I get results, but they are all 1.0.


1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm


q=*:*&boost=single_field:87&defType=edismax  // this works, but I need it on
multivalued






FW: boost question. need boost to take a query like bq

2012-02-11 Thread Bill Bell


I did find a solution, but the output is horrible. Why does explain look so
bad?


6.351252 = (MATCH) boost(*:*,query(specialties_ids:
#1;#0;#0;#0;#0;#0;#0;#0;#0; ,def=0.0)), product of:
  1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm
  6.351252 = query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0;
,def=0.0)=6.351252



defType=edismax&boost=query($param)&param=multi_field:87
--


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we
get the "multi-valued field issue" when we try to do the equivalent queries:
HTTP ERROR 400
Problem accessing /solr/providersearch/select. Reason:
can not use FieldCache on multivalued field: specialties_ids


q=*:*&bq=multi_field:87^2&defType=dismax

How do you do this using boost?

q=*:*&boost=multi_field:87&defType=edismax

We know we can use bq with edismax, but we like the "multiply" feature of
boost.

If I change it to a single valued field I get results, but they are all 1.0.


1.0 = (MATCH) MatchAllDocsQuery, product of:
  1.0 = queryNorm


q=*:*&boost=single_field:87&defType=edismax  // this works, but I need it on
multivalued






Debugging on 3.5

2012-02-14 Thread Bill Bell

I did find a solution, but the output is horrible. Why does explain look so 
bad?


6.351252 = (MATCH) boost(*:*,query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; 
,def=0.0)), product of:
  1.0 = (MATCH) MatchAllDocsQuery, product of:
1.0 = queryNorm
  6.351252 = query(specialties_ids: #1;#0;#0;#0;#0;#0;#0;#0;#0; 
,def=0.0)=6.351252



defType=edismax&boost=query($param)&param=multi_field:87
--


We like the boost parameter in SOLR 3.5 with eDismax.

The question we have is that we would like to replace bq with boost, but we get
the "multi-valued field issue" when we try to do this.

Bill Bell
Sent from mobile



Mmap

2012-02-14 Thread Bill Bell
Does someone have an example of using unmap in 3.5 and chunksize?

 I am using Solr 3.5.

I noticed in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

I don't see this parameter taking.. When I set 
-Dsolr.directoryFactory=solr.MMapDirectoryFactory

How do I see the setting in the log or in stats.jsp ? I cannot find a place 
that indicates it is set or not.

I would assume StandardDirectoryFactory is being used but I do see (when I set 
it or NOT set it)

Bill Bell
Sent from mobile



Re: Improving performance for SOLR geo queries?

2012-02-14 Thread Bill Bell
Can we get this back ported to 3x?

Bill Bell
Sent from mobile


On Feb 14, 2012, at 3:45 AM, Matthias Käppler  wrote:

> hey thanks all for the suggestions, didn't have time to look into them
> yet as we're feature-sprinting for MWC, but will report back with some
> feedback over the next weeks (we will have a few more performance
> sprints in March)
> 
> Best,
> Matthias
> 
> On Mon, Feb 13, 2012 at 2:32 AM, Yonik Seeley
>  wrote:
>> On Thu, Feb 9, 2012 at 1:46 PM, Yonik Seeley  
>> wrote:
>>> One way to speed up numeric range queries (at the cost of increased
>>> index size) is to lower the precisionStep.  You could try changing
>>> this from 8 to 4 and then re-indexing to see how that affects your
>>> query speed.
>> 
>> Your issue, and the fact that I had been looking at the post-filtering
>> code again for another client, reminded me that I had been planning on
>> implementing post-filtering for spatial.  It's now checked into trunk.
>> 
>> If you have the ability to use trunk, you can add a high cost (like
>> cost=200) along with cache=false to trigger it.
>> 
>> More details here:
>> http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/
>> 
>> -Yonik
>> lucidimagination.com
> 
> 
> 
> -- 
> Matthias Käppler
> Lead Developer API & Mobile
> 
> Qype GmbH
> Großer Burstah 50-52
> 20457 Hamburg
> Telephone: +49 (0)40 - 219 019 2 - 160
> Skype: m_kaeppler
> Email: matth...@qype.com
> 
> Managing Director: Ian Brotherston
> Amtsgericht Hamburg
> HRB 95913
> 
> This e-mail and its attachments may contain confidential and/or
> privileged information. If you are not the intended recipient (or have
> received this e-mail in error) please notify the sender immediately
> and destroy this e-mail and its attachments. Any unauthorized copying,
> disclosure or distribution of this e-mail and  its attachments is
> strictly forbidden. This notice also applies to future messages.
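As a concrete sketch of the trick Yonik describes above (field name, point and
distance are placeholders), the spatial filter is pushed into a post filter by
marking it non-cached with a high cost:

fq={!geofilt sfield=store pt=49.45,11.07 d=40 cache=false cost=200}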


Re: Dynamically Load Query Time Synonym File

2012-02-26 Thread Bill Bell
It would depend.

If the synonyms are used on indexing, you need to reindex. Otherwise, you
could reload and use the synonyms on "query".
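If the file is only referenced by query-time analyzers, a core RELOAD is normally
enough to pick up the edited synonyms without restarting the JVM (core name is a
placeholder):

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"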

On 2/26/12 4:05 AM, "Ahmet Arslan"  wrote:

>
>> Is there a way to dynamically load a synonym file without
>> restarting solr core ?
>
>There is an open jira for this :
>https://issues.apache.org/jira/browse/SOLR-1307
>




Re: Vector based queries

2012-03-11 Thread Bill Bell
It is way too slow

Sent from my Mobile device
720-256-8076

On Mar 11, 2012, at 12:07 PM, Pat Ferrel  wrote:

> I found a description here: 
> http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
> 
> If it is the same four years later, it looks like lucene is doing an index 
> lookup for each important term in the example doc boosting each term based on 
> the term weights. My guess would be that this is a little slower than 2-3word 
> query but still scalable.
> 
> Has anyone used this on a very large index?
> 
> Thanks,
> Pat
> 
> On 3/11/12 10:45 AM, Pat Ferrel wrote:
>> MoreLikeThis looks exactly like what I need. I would probably create a new 
>> "like" method to take a mahout vector and build a search? I build the vector 
>> by starting from a doc and reweighting certain terms. The prototype just 
>> reweights words but I may experiment with dirichlet clusters and reweighting 
>> an entire cluster of words so you could boost the importance of a topic in 
>> the results. Still the result of either algorithm would be a mahout vector.
>> 
>> Is there a description of how this works somewhere? Is it basically an index 
>> lookup? I always though the Google feature used precalculated results (and 
>> it probably does). I'm curious but mainly asking to see how fast it is.
>> 
>> Thanks
>> Pat
>> 
>> On 3/11/12 8:36 AM, Paul Libbrecht wrote:
>>> Maybe that's exactly it but... given a document with n tokens A, and m 
>>> tokens B, a query A^n B^m would find what you're looking for or?
>>> 
>>> paul
>>> 
>>> PS  I've always viewed queries as linear forms on the vector space and I'd 
>>> like to see this really mathematically written one day...
>>> Le 11 mars 2012 à 07:23, Lance Norskog a écrit :
>>> 
 Look at the MoreLikeThis feature in Lucene. I believe it does roughly
 what you describe.
 
 On Sat, Mar 10, 2012 at 9:58 AM, Pat Ferrel  wrote:
> I have a case where I'd like to get documents which most closely match a
> particular vector. The RowSimilarityJob of Mahout is ideal for
> precalculating similarity between existing documents but in my case the
> query is constructed at run time. So the UI constructs a vector to be used
> as a query. We have this running in prototype using a run time calculation
> of cosine similarity but the implementation is not scalable to large doc
> stores.
> 
> One thought is to calculate fairly small clusters. The UI will know which
> cluster to target for the vector query. So we might be able to narrow down
> the number of docs per query to a reasonable size.
> 
> It seems like a place for multiple hash functions maybe? Could we use some
> kind of hack of the boost feature of Solr or some other approach?
> 
> Does anyone have a suggestion?
 
 
 -- 
 Lance Norskog
 goks...@gmail.com
>>> 


Re: 3 Way Solr Join . . ?

2012-03-11 Thread Bill Bell
Sure we do this a lot for smaller indexes.

Create a string field. Not text. Store it. Then it will come out when you do a 
simple select query.

  



Sent from my Mobile device
720-256-8076

On Mar 11, 2012, at 11:09 AM, Angelyna Bola  wrote:

> William,
> 
> :: You can also use external fields, or store formatted info into a
> String field in json or xml format.
> 
> Thank you for the idea . . .
> 
> I have tried to load xml formatted data into Solr (not to be confused
> with the Solr XML load format), but not had any luck. Could you please
> point me to an example of how to load and take advatage of xml format
> in a solr core?
> 
> I can see it being straight forward to load json format into a solr
> core, but I do not see how I can leverage it for this problem?  Could
> you please point me to an example?
> 
> External fields are new to me. From what I'm reading I am not seeing
> how I can use them to help with this problem. Could you explain?
> 
> Respectfully,
> 
> Angelyna
> 
> 
> 
> On Sat, Mar 10, 2012 at 7:58 PM, Angelina Bola  
> wrote:
>> Does "Solr" support a 3-way join? i.e.
>> http://wiki.apache.org/solr/Join (I have the 2-way join working)
>> 
>> For example, I am pulling 3 different tables from a RDBMS into one Solr core:
>> 
>>  Table#1: Customers (parent table)
>>  Table#2: Addresses  (child table with foreign key to customers)
>>  Table#3: Phones (child table with foreign key to customers)
>> 
>> with a ONE to MANY relationship between:
>> 
>>   Customers and Addresses
>>   Customers and Phones
>> 
>> When I pull them into Solr I cannot denormalize the relationships as a
>> given customers can have many addresses and many phones.
>> 
>> When they come into the my single core (customerInfo), each document
>> gets a customerInfo_type and a uid corresponding to that type, for
>> example:
>> 
>>   Customer Document
>>   customerInfo_type='customer'
>>   customer_id
>> 
>>   Address Document
>>   customerInfo_type='address'
>>   fk_address_customer_id
>> 
>>   Phone Document
>>   customerInfo_type='phone'
>>   fk_phone_customer_id
>> 
>> Logically, I need to query in Solr for Customers who:
>> 
>>   - Have an address in a given state
>>   - Have a phone in a given area code
>>   - Are a given gender
>> 
>> Syntactically, it would think it would look like:
>> 
>> - http://localhost:8983/solr/customerInfo/select/?
>>q={!join from=fk_address_customer_id to=customer_id}address_State:Maine&
>>fq={!join from=customer_id to=fk_phone_customer_id}phone_area_code:212&
>>fq=customer_gender:female
>> 
>> But that does not work for me.
>> 
>> Appreciate any thoughts,
>> 
>> Angelyna



Re: 3 Way Solr Join . . ?

2012-03-11 Thread Bill Bell
You can do concatenation joins and then put them into Solr. You can denormalize the
results. Everyone is telling you the same thing.

Select customer_name, (select group_concat(city) from address where 
nameid=customers.nameid) as state_bar from customers

DIH handler has a way to split on comma to add to a multiValued field.

As I also mentioned elsewhere you can concat into an XML field and store it 
into the index. That works fantastic to denormalize.

Why do you need everything in the index? Why not use an external field to get it
later? Are you trying to search on something? What? If you need the addresses
searchable, then searching on city or state is pretty useful, as I showed above.

Sent from my Mobile device
720-256-8076
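A sketch of that group_concat plus split approach in the DIH config (table, column
and field names are placeholders, and the target Solr field must be multiValued):

<entity name="customer" transformer="RegexTransformer"
        query="select c.id, c.customer_name,
               (select group_concat(city) from address a where a.customer_id = c.id) as cities
               from customers c">
  <field column="cities" splitBy="," />
</entity>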

On Mar 11, 2012, at 10:59 AM, Angelyna Bola  wrote:

> Walter,
> 
> :: Fields can be multi-valued. Put multiple phone numbers in a field
> and match all of them.
> 
> Thank you for the suggestion, unfortunately I oversimplified my example =(
> 
> Let me try again:
> 
>I should have said that I need to match on 2 fields (as a set) from
> within a given child table.
> 
>Logically, I need to query in Solr for Customers who:
> 
>- Have an address in a given state (e.g. NY) and that address is of
> a given type (e.g. condo)
>- Have a phone in a given area code (e.g. 212) and of a given brand
> (e.g. Nokia)
>- Are a given gender (e.g. male)
> 
> Respectfully,
> 
> Angelyna
> 
> 
> 
> On Sat, Mar 10, 2012 at 7:58 PM, Angelina Bola  
> wrote:
>> Does "Solr" support a 3-way join? i.e.
>> http://wiki.apache.org/solr/Join (I have the 2-way join working)
>> 
>> For example, I am pulling 3 different tables from a RDBMS into one Solr core:
>> 
>>   Table#1: Customers (parent table)
>>   Table#2: Addresses  (child table with foreign key to customers)
>>   Table#3: Phones (child table with foreign key to customers)
>> 
>> with a ONE to MANY relationship between:
>> 
>>Customers and Addresses
>>Customers and Phones
>> 
>> When I pull them into Solr I cannot denormalize the relationships as a
>> given customers can have many addresses and many phones.
>> 
>> When they come into the my single core (customerInfo), each document
>> gets a customerInfo_type and a uid corresponding to that type, for
>> example:
>> 
>>Customer Document
>>customerInfo_type='customer'
>>customer_id
>> 
>>Address Document
>>customerInfo_type='address'
>>fk_address_customer_id
>> 
>>Phone Document
>>customerInfo_type='phone'
>>fk_phone_customer_id
>> 
>> Logically, I need to query in Solr for Customers who:
>> 
>>- Have an address in a given state
>>- Have a phone in a given area code
>>- Are a given gender
>> 
>> Syntactically, it would think it would look like:
>> 
>>  - http://localhost:8983/solr/customerInfo/select/?
>> q={!join from=fk_address_customer_id to=customer_id}address_State:Maine&
>> fq={!join from=customer_id to=fk_phone_customer_id}phone_area_code:212&
>> fq=customer_gender:female
>> 
>> But that does not work for me.
>> 
>> Appreciate any thoughts,
>> 
>> Angelyna


Re: Solr core swap after rebuild in HA-setup / High-traffic

2012-03-17 Thread Bill Bell
DIH sets the time of update to the start time, not the end time.

So when the index is rebuilt, if you run a delta and use the update time you
should be okay. We normally go back a few minutes to make sure we have everything,
as a fail-safe as well.
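A sketch of the matching DIH entity (table and column names are placeholders).
Because last_index_time records when the previous run started, rows changed while
the full import was still running are picked up by the next delta:

<entity name="item" pk="id"
        query="select * from item"
        deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from item where id='${dataimporter.delta.id}'"/>

Run it with /dataimport?command=delta-import against the rebuilt core before the swap.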

Sent from my Mobile device
720-256-8076

On Mar 14, 2012, at 12:58 PM, KeesSchepers  wrote:

> Hello everybody,
> 
> I am designing a new Solr architecture for one of my clients. This sorl
> architecture is for a high-traffic website with million of visitors but I am
> facing some design problems were I hope you guys could help me out.
> 
> In my situation there are 4 Solr servers running, 1 server is master and 3
> are slave. They are running Solr version 1.4.
> 
> I use two cores 'live' and 'rebuild' and I use Solr DIH to rebuild a core
> which goes like this:
> 
> 1. I wipe the reindex core
> 2. I run the DIH to the complete dataset (4 million documents) in peices of
> 20.000 records (to prevent very long mysql locks)
> 3. After the DIH is finished (2 hours) we have to also have to update the
> rebuild core with changes from the last two hours, this is a problem
> 4. After updating is done and the core is not more then some seconds behind
> we want to SWAP the cores.
> 
> Everything goes well except for step 3. The rebuild and the core swap is all
> okay. 
> 
> Because the website is undergoing changes every minute we cannot pauze the
> delta-import on the live and walk behind for 2 hours. The problem is that I
> can't figure out a closing system with not delaying the live core to long
> and use the DIH instead of writing a lot of code.
> 
> Did anyone face this problem before or could give me some tips?
> 
> Thanks!
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-core-swap-after-rebuild-in-HA-setup-High-traffic-tp3826461p3826461.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance Question

2012-03-19 Thread Bill Bell
The size of the index does matter practically speaking.

Bill Bell
Sent from mobile


On Mar 19, 2012, at 11:41 AM, Mikhail Khludnev  
wrote:

> Exactly. That's what I mean.
> 
> On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson  wrote:
> 
>> Mikhail,
>> 
>> Thanks for the response.  Just to be clear you're saying that the size
>> of the index does not matter, it's more the size of the results?
>> 
>> On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev
>>  wrote:
>>> Hello,
>>> 
>>> Frankly speaking, the computational complexity of Lucene search depends on
>>> the size of the search result, numFound*log(start+rows), not on the size of
>>> the index.
>>> 
>>> Regards
>>> 
>>> On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson 
>> wrote:
>>> 
>>>> I'm curious if anyone tell me how Solr/Lucene performs in a situation
>>>> where you have 100,000 documents each with 100 tokens vs having
>>>> 1,000,000 documents each with 10 tokens.  Should I expect the
>>>> performance to be the same?  Any information would be greatly
>>>> appreciated.
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Lucid Certified
>>> Apache Lucene/Solr Developer
>>> Grid Dynamics
>>> 
>>> <http://www.griddynamics.com>
>>> 
>> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
> 
> <http://www.griddynamics.com>
> 


Re: DataImportHandler: backups prior to full-import

2012-03-28 Thread Bill Bell
You could use the Solr Command Utility SCU that runs from Windows and can be 
scheduled to run. 

https://github.com/justengland/Solr-Command-Utility

This is a windows system that will index using a core, and swap it if it 
succeeds. It works it's Solr.

Let me know if you have any questions.

On Mar 28, 2012, at 10:11 PM, Shawn Heisey  wrote:

> On 3/28/2012 12:46 PM, Artem Shnayder wrote:
>> Does anyone know of any work done to automatically run a backup prior to a
>> DataImportHandler full-import?
>> 
>> I've asked this question on #solr and was pointed to
>> https://wiki.apache.org/solr/SolrReplication?highlight=%28backup%29#HTTP_API
>> which
>> is helpful but is not an automatic backup in the context of full-import's.
>> I'm wondering if anyone else has done this work yet.
> 
> I have located a previous message from you where you mention that you are on 
> Ubuntu.  If that's true, you can use hard links to make nearly instantaneous 
> backups with a single command:
> 
> ln /path/to/index/* /path/to/backup/.
> 
> One caveat to that - the backup must be on the same filesystem as the index.  
> If keeping backups on another filesystem (or even another computer) is 
> important, then treat the hard link backup as a temporary directory.  Copy 
> the files from that directory to your remote location, then delete them.
> 
> This works because of the way that Lucene (and by extension Solr) manages 
> files on disk - existing segment files are never modified.  If they get 
> merged, new files are created before the old ones are deleted.  There is only 
> one file in an index directory that does change without getting a new name - 
> segments.gen.  I have verified (on Solr 3.5) that even this file is properly 
> handled so that a hard link backup keeps the correct version.
> 
> For people running on Windows, this particular method won't work.  Newer 
> Windows server versions do have one feature that might actually make it 
> possible to do something similar - shadow copies.  I do not know how to 
> leverage the feature, though.
> 
> Thanks,
> Shawn
> 


Re: Empty facet counts

2012-03-29 Thread Bill Bell
Send schema.xml and did you apply any patches? What version of Solr?

Bill Bell
Sent from mobile


On Mar 29, 2012, at 5:26 AM, Youri Westerman  wrote:

> Hi,
> 
> I'm currently learning how to use solr and everything seems pretty straight
> forward. For some reason when I use faceted queries it returns only empty
> sets in the facet_count section.
> 
> The get params I'm using are:
>  ?q=*:*&rows=0&facet=true&facet.field=urn
> 
> The result:
>  "facet_counts": {
> 
>  "facet_queries": { },
>  "facet_fields": { },
>  "facet_dates": { },
>  "facet_ranges": { }
> 
>  }
> 
> The urn field is indexed and there are enough entries to be counted. When
> adding facet.method=Enum, nothing changes.
> Does anyone know why this is happening? Am I missing something?
> 
> Thanks in advance!
> 
> Youri


Re: ExtractingRequestHandler

2012-04-01 Thread Bill Bell
I have had good luck with creating a separate core index for just data. This is 
a different core than the indexed core.

Very fast.

Bill Bell
Sent from mobile


On Apr 1, 2012, at 11:15 AM, Erick Erickson  wrote:

> Yes, you can. But generally, storing the raw input in Solr is
> not the best approach. The problem here is that pretty soon
> you get a huge index that contains *everything*. Solr was not
> intended to be a data store.
> 
> Besides, you then need to store the binary form of the file. Solr
> only deals with text, not markup.
> 
> Most people index the text in Solr, and enough information
> so the application knows where to go to fetch the original
> document when the user drills down (e.g. file path, database
> PK, etc). Would that work for your situation?
> 
> Best
> Erick
> 
> On Sat, Mar 31, 2012 at 3:55 PM,   wrote:
>> Hi,
>> 
>> I want to index various filetypes in solr, this can easily done with
>> ExtractingRequestHandler. But I also need the extracted content back.
>> I know ext.extract.only but then nothing gets indexed, right?
>> 
>> Can I index the document AND get the content back as with ext.extract.only?
>> In a single request?
>> 
>> Thank you
>> 
>> 


Question concerning date fields

2012-04-20 Thread Bill Bell
We are loading a long (number of seconds since 1970?) value into Solr using 
java and Solrj. What is the best way to convert this into the right Solr date 
fields?

Sent from my Mobile device
720-256-8076
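A sketch with SolrJ (field names are placeholders): Solr date fields accept a
java.util.Date from SolrJ, so the conversion is just seconds to milliseconds.

import java.util.Date;
import org.apache.solr.common.SolrInputDocument;

long epochSeconds = 1335532800L;                 // value coming from the source system
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");
doc.addField("created_at", new Date(epochSeconds * 1000L));  // sent as an ISO-8601 date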


Re: change index/store at indexing time

2012-04-27 Thread Bill Bell
Yes you can. Just use a script that is called for each row.

Bill Bell
Sent from mobile


On Apr 27, 2012, at 6:38 PM, "Vazquez, Maria (STM)"  
wrote:

> Hi,
> I'm migrating a project from Lucene 2.9 to Solr 3.4.
> There is a special case in the code that indexes the same field in two 
> different ways, which is completely legal in Lucene directly but I don't know 
> how to duplicate this same behavior in Solr:
> 
> if (isFirstGeo) {
> document.add(new Field("geoids", geoId, Field.Store.YES, 
> Field.Index.NOT_ANALYZED_NO_NORMS));
> isFirstGeo = false;
> } else {
> if (countProducts < 100)
>  document.add(new Field("geoids", geoId, Field.Store.NO, 
> Field.Index.NOT_ANALYZED_NO_NORMS));
> else
>  document.add(new Field("geoids", geoId, Field.Store.YES, 
> Field.Index.NO));
> }
> 
> Is there any way to do this in Solr in a Tranformer? I'm using the DIH to 
> index and I can't see a way to do this other than having three fields in the 
> schema like geoids_store_index, geoids_nostore_index, and 
> geoids_store_noindex.
> 
> Thanks a lot in advance.
> Maria
> 
> 
> 


Ampersand issue

2012-04-27 Thread Bill Bell
We are indexing a simple XML field from SQL Server into Solr as a stored field. 
We have noticed that the & is outputed as &amp; when using wt=XML. When 
using wt=JSON we get the normal &. If there a way to indicate that we don't 
want to encode the field since it is already XML when using wt=XML ?

Bill Bell
Sent from mobile



Re: commit stops

2012-04-27 Thread Bill Bell
We also see extreme slowness using Solr 3.6 when trying to commit a delete. We 
also get hangs. We do 1 commit at most a week. Rebuilding from scratch using
DIH works fine and has never hung.

Bill Bell
Sent from mobile


On Apr 27, 2012, at 5:59 PM, "mav.p...@holidaylettings.co.uk" 
 wrote:

> Thanks for the reply
> 
> The client expects a response within 2 minutes and after that will report
> an error. When we build fresh it seems to work and the operation takes a
> second or two to complete. Once it gets to a stage it hangs it simply
> won't accept any further commits. I did an index check and all was ok.
> 
> I don't see any major commit happening at any time, it seems to just
> hang. Even starting up and shutting down takes ages.
> 
> We make 3 - 4 commits a day.
> 
> We use solr 3.5
> 
> No autocommit
> 
> 
> 
> On 28/04/2012 00:56, "Yonik Seeley"  wrote:
> 
>> On Fri, Apr 27, 2012 at 9:18 AM, mav.p...@holidaylettings.co.uk
>>  wrote:
>>> We have an index of about 3.5gb which seems to work fine until it
>>> suddenly stops accepting new commits.
>>> 
>>> Users can still search on the front end but nothing new can be
>>> committed and it always times out on commit.
>>> 
>>> Any ideas?
>> 
>> Perhaps the commit happens to cause a major merge which may take a
>> long time (and solr isn't going to allow overlapping commits).
>> How long does a commit request take to time out?
>> 
>> What Solr version is this?  Do you have any kind of auto-commit set
>> up?  How often are you manually committing?
>> 
>> -Yonik
>> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
>> Boston May 7-10
> 


Re: Does Solr fit my needs?

2012-04-27 Thread Bill Bell
You could use SQL Server and External Fields in Solr to get what you need from 
the database on result of the query.

Bill Bell
Sent from mobile


On Apr 27, 2012, at 8:31 AM, "G.Long"  wrote:

> Hi there :)
> 
> I'm looking for a way to save xml files into some sort of database and i'm 
> wondering if Solr would fit my needs.
> The xml files I want to save have a lot of child nodes which also contain 
> child nodes with multiple values. The depth level can be more than 10.
> 
> After having indexed the files, I would like to be able to query for subparts 
> of those xml files and be able to reconstruct them as xml files with all 
> their children included. However, I'm wondering if it is possible with an 
> index like solr lucene to keep or easily recover the structure of my xml data?
> 
> Thanks for your help,
> 
> Regards,
> 
> Gary


Re: Solritas in production

2012-05-08 Thread Bill Bell
I would not use Solritas except for very rudimentary solutions and prototypes.

Sent from my Mobile device
720-256-8076

On May 6, 2012, at 6:02 AM, András Bártházi  wrote:

> Hi,
> 
> We're currently evaluating Solr as a Sphinx replacement. Our site has
> 1.000.000+ pageviews a day, it's a real estate search engine. The
> development is almost done, and it seems to be working fine, however some
> of my colleagues come with the idea that we're using it wrong. We're using
> it as a service from PHP/Symfony.
> 
> They think we should use Solritas as a frontend, so site visitors will
> directly use it, so no PHP will be involved, so it will be use much less
> infrastructure. One of them said that even mobile.de using it that way (I
> have found no clue about it at all).
> 
> Do you think is it a good idea?
> 
> Do you know services using Solritas as a frontend on a public site?
> 
> My personal opinion is that using Solritas in production is a very bad idea
> for us, but have not so much experience with Solr yet, and Solritas
> documentation is far from a detailed, up-to-date one, so don't really know
> what is it really usable for.
> 
> Thanks,
>  Andras


Re: Replication. confFiles and permissions.

2012-05-09 Thread Bill Bell
Why would you replicate data import properties? The master does the importing 
not the slave...

Sent from my Mobile device
720-256-8076

On May 9, 2012, at 7:23 AM, stockii  wrote:

> Hello.
> 
> 
> i running a solr replication. works well, but i need to replicate my
> dataimport-properties. 
> 
> if server1 replicate this file after he create everytime a new file, with
> *.timestamp, because the first replication run create this file with wrong
> permissions ...
> 
> how can is say to solr replication "chmod 755  dataimport-properties ..."  ?
> ;-)
> 
> thx
> 
> -
> --- System 
> 
> 
> One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 
> 1 Core with 45 Million Documents other Cores < 200.000
> 
> - Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
> - Solr2 for Update-Request  - delta every Minute - 4GB Xmx
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Replication-confFiles-and-permissions-tp3973825.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible to limit the bandwidth of replication

2012-05-09 Thread Bill Bell
+1 as well especially for larger indexes

Sent from my Mobile device
720-256-8076

On May 9, 2012, at 9:46 AM, Jan Høydahl  wrote:

>> I think we have to add this for java based rep. 
> +1
> 


Re: slave index not cleaned

2012-05-14 Thread Bill Bell
This is a known issue in 1.4 especially in Windows. Some of it was resolved in 
3x.

Bill Bell
Sent from mobile


On May 14, 2012, at 5:54 AM, Erick Erickson  wrote:

> Hmmm, replication will require up to twice the space of the
> index _temporarily_, just checking if that's what you're seeing
> But that should go away reasonably soon. Out of curiosity, what
> happens if you restart your server, do the extra files go away?
> 
> But it sounds like your index is growing over a longer period of time
> than just a single replication, is that true?
> 
> Best
> Erick
> 
> On Fri, May 11, 2012 at 6:03 AM, Jasper Floor  wrote:
>> Hi,
>> 
>> On Thu, May 10, 2012 at 5:59 PM, Otis Gospodnetic
>>  wrote:
>>> Hi Jasper,
>> 
>> Sorry, I should've added more technical info wihtout being prompted.
>> 
>>> Solr does handle that for you.  Some more stuff to share:
>>> 
>>> * Solr version?
>> 
>> 1.4
>> 
>>> * JVM version?
>> 1.7 update 2
>> 
>>> * OS?
>> Debian (2.6.32-5-xen-amd64)
>> 
>>> * Java replication?
>> yes
>> 
>>> * Errors in Solr logs?
>> no
>> 
>>> * deletion policy section in solrconfig.xml?
>> missing I would say, but I don't see this on the replication wiki page.
>> 
>> This is what we have configured for replication:
>> 
>> 
>>
>> 
>>> name="masterUrl">${solr.master.url}/df-stream-store/replication
>> 
>>00:20:00
>>internal
>>5000
>>1
>> 
>> 
>> 
>> 
>> We will be updating to 3.6 fairly soon however. To be honest, from
>> what I've read, the Solr cloud is what we really want in the future
>> but we will have to be patient for that.
>> 
>> thanks in advance
>> 
>> mvg,
>> Jasper
>> 
>>> You may also want to look at your Index report in SPM 
>>> (http://sematext.com/spm) before/during/after replication and share what 
>>> you see.
>>> 
>>> Otis
>>> 
>>> Performance Monitoring for Solr / ElasticSearch / HBase - 
>>> http://sematext.com/spm
>>> 
>>> 
>>> 
>>> - Original Message -
>>>> From: Jasper Floor 
>>>> To: solr-user@lucene.apache.org
>>>> Cc:
>>>> Sent: Thursday, May 10, 2012 9:08 AM
>>>> Subject: slave index not cleaned
>>>> 
>>>> Perhaps I am missing the obvious but our slaves tend to run out of
>>>> disk space. The index sizes grow to multiple times the size of the
>>>> master. So I just toss all the data and trigger a replication.
>>>> However, can't solr handle this for me?
>>>> 
>>>> I'm sorry if I've missed a simple setting which does this for me, but
>>>> if its there then I have missed it.
>>>> 
>>>> mvg
>>>> Jasper
>>>> 


Re: UI

2012-05-21 Thread Bill Bell
The php.net plugin is the best. SolrPHPClient is missing several features.

Sent from my Mobile device
720-256-8076

On May 21, 2012, at 6:35 AM, Tolga  wrote:

> Hi,
> 
> Can you recommend a good PHP UI to search? Is SolrPHPClient good?



Function Question

2011-02-02 Thread Bill Bell
This is posted as an enhancement on SOLR-2345.

I am willing to work on it. But I am stuck. I would like to loop through the 
lat/long values when they are stored in a multiValue list. But it appears that 
I cannot figure out to do that. For example:


sort=geodist() asc

This should grab the closest point in the MultiValue list, and return the 
distance so that it can be scored.

The problem is I cannot find a way to get the MultiValue list?

In function: 
src/java/org/apache/solr/search/function/distance/HaversineConstFunction.java

Has code similar to:

VectorValueSource p2;
this.p2 = vs
List sources = p2.getSources();
ValueSource latSource = sources.get(0);
ValueSource lonSource = sources.get(1);
DocValues latVals = latSource.getValues(context1, readerContext1);
DocValues lonVals = lonSource.getValues(context1, readerContext1);
double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
etc...

It would be good if I could loop through sources.get() but it only returns 2 
sources even when there are 2 pairs of lat/long. The getSources() only returns 
the following:

sources:[double(store_0_coordinate), double(store_1_coordinate)]

How do I just get the 4 values in the function?



Re: geodist and spatial search

2011-02-04 Thread Bill Bell
Why not just:

q=*:*
fq={!bbox}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc


http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.077721&d=40&fq={!bbox}&sort=geodist%28%29%20asc

That will sort, and filter up to 40km.

No need for the 

fq={!func}geodist()
sfield=store
pt=49.45031,11.077721


Bill




On 2/4/11 4:30 AM, "Eric Grobler"  wrote:

>Hi Grant,
>
>Thanks for the tip
>This seems to work:
>
>q=*:*
>fq={!func}geodist()
>sfield=store
>pt=49.45031,11.077721
>
>fq={!bbox}
>sfield=store
>pt=49.45031,11.077721
>d=40
>
>fl=store
>sort=geodist() asc
>
>
>On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll 
>wrote:
>
>> Use a filter query?  See the {!geofilt} stuff on the wiki page.  That
>>gives
>> you your filter to restrict down your result set, then you can sort by
>>exact
>> distance to get your sort of just those docs that make it through the
>> filter.
>>
>>
>> On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote:
>>
>> > Hi Erick,
>> >
>> > Thanks I saw that example, but I am trying to sort by distance AND
>> specify
>> > the max distance in 1 query.
>> >
>> > The reason is:
>> > running bbox on 2 million documents with a 20km distance takes only
>> 200ms.
>> > Sorting 2 million documents by distance takes over 1.5 seconds!
>> >
>> > So it will be much faster for solr to first filter the 20km documents
>>and
>> > then to sort them.
>> >
>> > Regards
>> > Ericz
>> >
>> > On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson
>>> >wrote:
>> >
>> >> Further down that very page ...
>> >>
>> >> Here's an example of sorting by distance ascending:
>> >>
>> >>  -
>> >>
>> >>  ...&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()
>> >> asc<
>> >>
>> 
>>http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*
>>&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >> The key is just the &sort=geodist(), I'm pretty sure that's
>>independent
>> of
>> >> the bbox, but
>> >> I could be wrong.
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler <
>> impalah...@googlemail.com
>> >>> wrote:
>> >>
>> >>> Hi
>> >>>
>> >>> In http://wiki.apache.org/solr/SpatialSearch
>> >>> there is an example of a bbox filter and a geodist function.
>> >>>
>> >>> Is it possible to do a bbox filter and sort by distance - combine
>>the
>> >> two?
>> >>>
>> >>> Thanks
>> >>> Ericz
>> >>>
>> >>
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem docs using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>




Re: keepword file with phrases

2011-02-05 Thread Bill Bell
You need to switch the order. Do synonyms and expansion first, then
shingles.

Have you tried using analysis.jsp ?

On 2/5/11 10:31 AM, "lee carroll"  wrote:

>Just to add: things are not going as expected before the keepword step; the
>synonym list is not being expanded for shingles. I think I don't understand
>term position.
>
>On 5 February 2011 16:08, lee carroll 
>wrote:
>
>> Hi List
>> I'm trying to achieve the following
>>
>> text in "this aisle contains preserves and savoury spreads"
>>
>> desired index entry for a field to be used for faceting (ie strict set
>>of
>> normalised terms)
>> is "jams" "savoury spreads" ie two facet terms
>>
>> current set up for the field is
>>
>> >positionIncrementGap="100">
>>   
>> 
>> 
>> > outputUnigrams="true"/>
>> > synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>> > words="goodForKeepWords.txt" ignoreCase="true"/>
>>   
>>   
>> 
>> 
>> > outputUnigrams="true"/>
>> > synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>> > words="goodForKeepWords.txt" ignoreCase="true"/>
>>   
>> 
>>
>> The thinking here is
>> get rid of any mark up nonsense
>> split into tokens based on whitespace => "this" "aisle" "contains"
>> "preserves" "and" "savoury" "spreads"
>> produce shingles of 1 or 2 tokens => "this","this aisle", "aisle",
>>"aisle
>> contains", "contains", "contains preserves","preserves","and",
>>   "and savoury",
>> "savoury", "savoury spreads", "spreads"
>>
>> expand synonyms using a synomym file (preserves -> jam) =>
>>
>> "this","this aisle", "aisle", "aisle contains", "contains","contains
>> preserves","preserves","jam","and","and savoury", "savoury", "savoury
>> spreads", "spreads"
>>
>> produce a normalised term list using a keepword file of jam , "savoury
>> spreads" in it
>>
>> which should place "jam" "savoury spreads" into the index field
>>facet.
>>
>> However I don't get savoury spreads in the index. From analysis.jsp
>> everything goes to plan up to the last step, where the keepword file does not
>> like keeping the phrase "savoury spreads". I've tried naively quoting the
>> phrase in the keepword file :-)
>>
>> What is the best way to achieve the above? Is this the correct approach or
>> is there a better way?
>>
>> thanks in advance lee
>>
>>
>>
>>
>>




Re: geodist and spatial search

2011-02-05 Thread Bill Bell
Sure. I just didn't understand why you would use

fq={!func}geodist()
sfield=store
pt=49.45031,11.077721



You would normally use {!geofilt}
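
For completeness, the {!geofilt} form of the same request looks roughly like
this (same field, point and distance as in the quoted examples below; unlike
bbox it filters on exact distance rather than a bounding box):

q=*:*
fq={!geofilt}
sfield=store
pt=49.45031,11.077721
d=40
fl=store
sort=geodist() asc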



On 2/5/11 8:59 AM, "Estrada Groups"  wrote:

>Use the {!geofilt} param like Grant suggested. IMO, it works the best
>especially on larger datasets.
>
>Adam
>
>Sent from my iPhone
>
>On Feb 4, 2011, at 10:56 PM, Bill Bell  wrote:
>
>> Why not just:
>> 
>> q=*:*
>> fq={!bbox}
>> sfield=store
>> pt=49.45031,11.077721
>> d=40
>> fl=store
>> sort=geodist() asc
>> 
>> 
>> 
>>http://localhost:8983/solr/select?q=*:*&sfield=store&pt=49.45031,11.07772
>>1&
>> d=40&fq={!bbox}&sort=geodist%28%29%20asc
>> 
>> That will sort, and filter up to 40km.
>> 
>> No need for the 
>> 
>> fq={!func}geodist()
>> sfield=store
>> pt=49.45031,11.077721
>> 
>> 
>> Bill
>> 
>> 
>> 
>> 
>> On 2/4/11 4:30 AM, "Eric Grobler"  wrote:
>> 
>>> Hi Grant,
>>> 
>>> Thanks for the tip
>>> This seems to work:
>>> 
>>> q=*:*
>>> fq={!func}geodist()
>>> sfield=store
>>> pt=49.45031,11.077721
>>> 
>>> fq={!bbox}
>>> sfield=store
>>> pt=49.45031,11.077721
>>> d=40
>>> 
>>> fl=store
>>> sort=geodist() asc
>>> 
>>> 
>>> On Thu, Feb 3, 2011 at 7:46 PM, Grant Ingersoll 
>>> wrote:
>>> 
>>>> Use a filter query?  See the {!geofilt} stuff on the wiki page.  That
>>>> gives
>>>> you your filter to restrict down your result set, then you can sort by
>>>> exact
>>>> distance to get your sort of just those docs that make it through the
>>>> filter.
>>>> 
>>>> 
>>>> On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote:
>>>> 
>>>>> Hi Erick,
>>>>> 
>>>>> Thanks I saw that example, but I am trying to sort by distance AND
>>>> specify
>>>>> the max distance in 1 query.
>>>>> 
>>>>> The reason is:
>>>>> running bbox on 2 million documents with a 20km distance takes only
>>>> 200ms.
>>>>> Sorting 2 million documents by distance takes over 1.5 seconds!
>>>>> 
>>>>> So it will be much faster for solr to first filter the 20km documents
>>>> and
>>>>> then to sort them.
>>>>> 
>>>>> Regards
>>>>> Ericz
>>>>> 
>>>>> On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson
>>>> >>>> wrote:
>>>>> 
>>>>>> Further down that very page ...
>>>>>> 
>>>>>> Here's an example of sorting by distance ascending:
>>>>>> 
>>>>>> -
>>>>>> 
>>>>>> ...&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()
>>>>>> asc<
>>>>>> 
>>>> 
>>>> 
>>>>http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*
>>>>:*
>>>> &sfield=store&pt=45.15,-93.85&sort=geodist()%20asc
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The key is just the &sort=geodist(), I'm pretty sure that's
>>>> independent
>>>> of
>>>>>> the bbox, but
>>>>>> I could be wrong.
>>>>>> 
>>>>>> Best
>>>>>> Erick
>>>>>> 
>>>>>> On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler <
>>>> impalah...@googlemail.com
>>>>>>> wrote:
>>>>>> 
>>>>>>> Hi
>>>>>>> 
>>>>>>> In http://wiki.apache.org/solr/SpatialSearch
>>>>>>> there is an example of a bbox filter and a geodist function.
>>>>>>> 
>>>>>>> Is it possible to do a bbox filter and sort by distance - combine
>>>> the
>>>>>> two?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Ericz
>>>>>>> 
>>>>>> 
>>>> 
>>>> --
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>> 
>>>> Search the Lucene ecosystem docs using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>> 
>>>> 
>> 
>> 




Re: Is there anything like MultiSearcher?

2011-02-05 Thread Bill Bell
Why not just use sharding across the 2 cores?
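
That is, leave the two indexes as separate cores and let distributed search
merge results at query time by listing both cores in the shards parameter. A
rough sketch (host, port and core names are placeholders; both cores need the
same uniqueKey field):

http://localhost:8983/solr/metadata/select?q=something&shards=localhost:8983/solr/metadata,localhost:8983/solr/fulltext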

On 2/5/11 8:49 AM, "Roman Chyla"  wrote:

>Dear Solr experts,
>
>Could you recommend some strategies or perhaps tell me if I approach
>my problem from the wrong side? I was hoping to use MultiSearcher to
>search across multiple indexes in Solr, but there is no such a thing
>and MultiSearcher was removed according to this post:
>http://osdir.com/ml/solr-user.lucene.apache.org/2011-01/msg00250.html
>
>I thought I had two use cases:
>
>1. maintenance - I wanted to build two separate indexes, one for
>fulltext and one for metadata (the docs have the unique ids) -
>indexing them separately would make things much simpler
>2. ability to switch indexes at search time (ie. for testing purposes
>- one fulltext index could be built by Solr standard mechanism, the
>other by a rather different process - independent instance of lucene)
>
>I think the recommended approach is to use the Distributed search - I
>found a nice solution here:
>http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-
>return-one-result-set
>- however it seems to me, that data are sent over HTTP (5M from one
>core, and 5M from the other core being merged by the 3rd solr core?)
>and I would like to do it only for local indexes and without the
>network overhead.
>
>Could you please shed some light on whether there already exists an optimal
>solution to my use cases? And if not, whether I could just try to
>build a new SolrQuerySearcher that is extending lucene MultiSearcher
>instead of IndexSearch - or you think there are some deeply rooted
>problems there and the MultiSearch-er cannot work inside Solr?
>
>Thank you,
>
>  Roman




Re: keepword file with phrases

2011-02-05 Thread Bill Bell
OK that makes sense.

If you double-quote entries in the synonyms file, will that help with whitespace?

Bill


On 2/5/11 4:37 PM, "Chris Hostetter"  wrote:

>
>: You need to switch the order. Do synonyms and expansion first, then
>: shingles..
>
>except then he would be building shingles out of all the permutations of
>"words" in his symonyms -- including the multi-word synonyms.  i don't
>*think* that's what he wants based on his example (but i may be wrong)
>
>: Have you tried using analysis.jsp ?
>
>he already mentioned he has, in his original mail, and that's how he can
>tell it's not working.
>
>lee: based on your followup post about seeing problems in the synonyms
>output, i suspect the problem you are having is with how the
>synonymfilter 
>"parses" the synonyms file -- by default it assumes it should split on
>certain characters to creates multi-word synonyms -- but in your case the
>tokens you are feeding synonym filter (the output of your shingle filter)
>really do have whitespace in them
>
>there is a "tokenizerFactory" option that Koji added a hwile back to the
>SYnonymFilterFactory that lets you specify the classname of a
>TokenizerFactory to use when parsing the synonym rule -- that may be what
>you need to get your synonyms with spaces in them (so they work properly
>with your shingles)
>
>(assuming of course that i really understand your problem)
>
>
>-Hoss
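
For reference, the tokenizerFactory option Hoss mentions goes on the synonym
filter itself; a rough sketch using Lee's synonyms file name (everything else
is illustrative):

<filter class="solr.SynonymFilterFactory"
        synonyms="goodForSynonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>

With KeywordTokenizerFactory each side of a synonym rule is kept as a single
token, so multi-word entries like "savoury spreads" keep their whitespace and
can line up with the shingled terms.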




Re: How to use q.op

2011-02-05 Thread Bill Bell
That sentence would be great to add to the Wiki. I changed the Wiki to add
that.



On 2/5/11 5:03 PM, "Chris Hostetter"  wrote:

>
>: Dismax uses a strategy called Min-Should-Match which emulates the binary
>: operator in the Standard Handler. In a nutshell, this parameter (called
>mm)
>: specifies how many of the entered terms need to be present in your
>matched
>: documents. You can either specify an absolute number or a percentage.
>: 
>: More information can be found here:
>: 
>http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27
>_Match.29
>
>in future versions of solr, dismax will use the q.op param to provide a
>default for mm, but in Solr 1.4 and prior, you should basically set mm=0
>if you want the equivalent of q.op=OR, and mm=100% if you want the
>equivalent of q.op=AND
>
>-Hoss
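
In request-parameter form that works out to something like this (the query and
qf values are only examples):

defType=dismax
qf=name
q=ipod touch
mm=0       (behaves like q.op=OR)
mm=100%    (behaves like q.op=AND)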




Re: dynamic fields revisited

2011-02-07 Thread Bill Bell
You can change the match to be my* and then insert the name you want. 

Bill Bell
Sent from mobile


On Feb 7, 2011, at 4:15 PM, gearond  wrote:

> 
> Just so anyone else can know and save themselves 1/2 hour if they spend 4
> minutes searching.
> 
> When putting a dynamic field into a document into an index, the name of the
> field RETAINS the 'constant' part of the dynamic field name.
> 
> Example
> -
> If a dynamic integer field is named '*_i' in the schema.xml file,
>  __and__
> you insert a field named 'my_integer_i', which matches the globbed field
> name '*_i',
>  __then__
> the name of the field will be 'my_integer_i' in the index
> and in your GETs/(updating)POSTs to the index on that document and
>  __NOT__
> 'my_integer' like I was kind of hoping that it would be :-(
> 
> I.e., the suffix (or prefix, if you set it up that way) will NOT be dropped.
> I was hoping that everything except the globbing character, '*', would just
> be a flag to the query processor and disappear after being 'noticed'.
> 
> Not so :-)
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/dynamic-fields-revisited-tp2161080p2447814.html
> Sent from the Solr - User mailing list archive at Nabble.com.
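
To illustrate with schema.xml syntax (the type and attributes are only an
example):

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>

A document field sent as my_integer_i matches the pattern and is indexed and
returned under its full name my_integer_i -- the _i suffix is not stripped. To
end up with a field literally called my_integer you would have to send it that
way and declare a pattern such as my_* instead, as suggested above.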


Hits when using group=true

2011-02-10 Thread Bill Bell
It would be good if someone added hits= to the log line when group=true.

We are using this parameter and have built a really cool SOLR log analyzer
(that I am pushing to release to open source).
But it is not as effective if we cannot get group=true to output hits= in
the log - since 90% of our queries are group=true...

There is a ticket in SOLR for this under SOLR-2337. Can someone help me
identify what would be required to get this to work?

Bill





Re: Solr design decisions

2011-02-11 Thread Bill Bell
You could commit on a time schedule. Like every 5 mins. If there is nothing to 
commit it doesn't do anything anyway.

Bill Bell
Sent from mobile
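
If you go the scheduled route, autoCommit in solrconfig.xml handles it for you;
a sketch (maxTime is in milliseconds, so 300000 is five minutes):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime>
  </autoCommit>
</updateHandler>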


On Feb 11, 2011, at 8:22 AM, Greg Georges  wrote:

> Hello all,
> 
> I have just finished the book "Solr 1.4 Enterprise Search Server". I now 
> understand most of the basics of Solr and also how we can scale the solution. 
> Our goal is to have a centralized search service for a multitude of apps.
> 
> Our first application which we want to index, is a system in which we must 
> index documents through Solr Cell. These documents are associated to certain 
> clients (companies). Each client can have a multitude of users, and each user 
> can be part of a group of users. We have permissions on each physical 
> document in the system, and we want this to also be present in our enterprise 
> search for the system.
> 
> I read that we can associate roles and ids to solr documents in order to show 
> only a subset of search results for a particular user. The question I am 
> asking is this. A best practice in Solr is to batch commit changes. The 
> problem in my case is that if we change a documents permissions (role), and 
> if we batch commit there can be a period where the document in the search 
> results can be associated to the old role. What should I do in this case? 
> Should I just commit the change right away? What if this action is done many 
> times by many clients, will the performance still scale even if I do not 
> batch commit my changes? Thanks
> 
> Greg


Re: Solr design decisions

2011-02-11 Thread Bill Bell
Thanks. If you do 2 commits should it do anything? Are people using it to clear 
caches?



Bill Bell
Sent from mobile


On Feb 11, 2011, at 9:55 AM, Yonik Seeley  wrote:

> On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell  wrote:
>> You could commit on a time schedule. Like every 5 mins. If there is nothing 
>> to commit it doesn't do anything anyway.
> 
> It does do something!  A new searcher is opened and caches are invalidated, 
> etc.
> I'd recommend normally using commitWithin instead of explicitly
> committing or using autocommit.
> 
> -Yonik
> http://lucidimagination.com


Development question

2011-02-13 Thread Bill Bell
I am working on https://issues.apache.org/jira/browse/SOLR-2155

Trying to get a list of multiValued fields from the cache...

ValueSource vs = sf.getType().getValueSource(sf, fp);
DocValues llVals = vs.getValues(context, reader);
org.apache.lucene.spatial.geohash.GeoHashUtils.decode(llVals.strVal(doc));

 public String strVal(int doc) {
int ord=termsIndex.getOrd(doc);
if (ord == 0) {
  return null;
} else {
  return termsIndex.lookup(ord, new BytesRef()).utf8ToString();
}
  }

I figure the problem is that lookup only returns one. I need more than 1... I
thought ./lucene/src/java/org/apache/lucene/document/Document.java would
help me, but it didn't much. Would I want to call getFieldables(name) ? Or
would that slow down the caching? Thoughts?

1. What is termIndex ? Why does ord() matter?
2. Is there a helper for getting a multiValued field from the FieldCache?

The strVal(doc) only returns one of the multiple values. Thought one of you
gurus might know the answer.

Thanks.

Bill







Re: slave out of sync

2011-02-14 Thread Bill Bell
We wrote a utility that looks at the index version on both slaves and
complains if they are not at the same version...

Bill


On 2/14/11 5:19 PM, "Tri Nguyen"  wrote:

>Hi,
>
>We're thinking of having a master-slave configuration where there are
>multiple 
>slaves.  Let's say during replication, one of the slaves does not
>replicate 
>properly.
>
>How will we detect that the 1 slave is out of sync?
>
>Tri
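
If you are using the Java (HTTP) replication, the ReplicationHandler already
exposes what such a utility needs; assuming the handler is registered at
/replication as in the example solrconfig (host names are placeholders), the
check is just comparing the output of:

http://master:8983/solr/replication?command=indexversion
http://slave1:8983/solr/replication?command=indexversion

Each response carries an indexversion and generation; a slave whose numbers
still lag the master after a replication cycle is out of sync.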




How to use XML parser in DIH for a database?

2011-02-16 Thread Bill Bell
I am using DIH.

I am trying to take a column in a SQL Server database that returns an XML
string and use Xpath to get data out of it.

I noticed that Xpath works with external files, how do I get it to work with
a database?

I need something like "//insur[5][@name='Blue Cross']"

Thanks.




Re: How to use XML parser in DIH for a database?

2011-02-16 Thread Bill Bell
It only works on FileDataSource, right?

Bill Bell
Sent from mobile


On Feb 16, 2011, at 2:17 AM, Stefan Matheis  
wrote:

> What about using
> http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor ?
> 
> On Wed, Feb 16, 2011 at 10:08 AM, Bill Bell  wrote:
>> I am using DIH.
>> 
>> I am trying to take a column in a SQL Server database that returns an XML
>> string and use Xpath to get data out of it.
>> 
>> I noticed that Xpath works with external files, how do I get it to work with
>> a database?
>> 
>> I need something like "//insur[5][@name='Blue Cross']"
>> 
>> Thanks.
>> 
>> 
>> 
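
For the archives: XPathEntityProcessor is not limited to files. A
FieldReaderDataSource can feed it the XML held in a database column. A rough
data-config sketch (the driver, table, column and field names are made up for
illustration):

<dataConfig>
  <dataSource name="db" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=providers"/>
  <dataSource name="xmlcol" type="FieldReaderDataSource"/>
  <document>
    <entity name="provider" dataSource="db"
            query="select id, insur_xml from provider">
      <field column="id" name="id"/>
      <entity name="insur" dataSource="xmlcol"
              processor="XPathEntityProcessor"
              dataField="provider.insur_xml"
              forEach="/insurances/insur">
        <field column="insurance_name" xpath="/insurances/insur/@name"/>
      </entity>
    </entity>
  </document>
</dataConfig>

The nested entity reads the XML string from the parent row's insur_xml column
and applies the XPath to each repeated element, so there is no need to write
the XML out to files first.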

