Hi Adam,
With FunctionQuery (http://wiki.apache.org/solr/FunctionQuery) & DateMath (
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/util/DateMathParser.html)
- round to the day level, subtract & divide by milliseconds_in_a_day
(86400K).
?q=floor(div(sub(ms(NOW/DAY),ms(NOW
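A sketch of the complete request, assuming a date field named mydatefield
(the field name is an assumption):
http://localhost:8983/solr/collection1/select?q=*:*
&fl=id,days_old:floor(div(ms(NOW/DAY,mydatefield),86400000))
ms(a,b) returns the millisecond difference between its two date arguments,
so dividing by 86,400,000 yields whole days.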
Hi Susheel,
You might be able to pull something off using facet.prefix:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.prefix.
It will work when the prefix is exact and doesn't require any analysis,
something along these lines:
http://solr.pl/en/2013/03/25/autocomplete-on-multivalued-fields-
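A sketch of such a request (field name and prefix are assumptions):
http://localhost:8983/solr/collection1/select?q=*:*&rows=0
&facet=true&facet.field=name_ac&facet.prefix=alo
The facet values returned for name_ac act as the autocomplete suggestions.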
Hi Sandeep,
You are quite likely running below the required capacity with this current
set-up:
http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
A few things for you to confirm:
1. Which version of Solr are you using?
2. The size of your index.
- Are fields stored? How much are these stored fields contr
Upayavira - Nice idea, pushing in a nominal update when all fields are
stored, and it does work. The nominal update could be sent to a boolean
type dynamic field that isn't used for anything other than perhaps
identifying documents that are done re-indexing.
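A sketch of such a nominal update, assuming a boolean dynamic field
matching *_b and the JSON atomic-update syntax:
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id":"doc1","reindexed_b":{"set":true}}]'
Atomic updates need all (non-copyField) fields stored, which matches the
condition above.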
On Wed, Oct 23, 2013 at 7:47 PM,
ions (omitNorms
option)
For a field with default boost (= 1), norm = lengthNorm (approximately
1/sqrt(numTerms)). The norm gets multiplied twice in the query, dividing
the score (approximately) by numTerms.
Hope that helps.
Regards,
Aloke
On Fri, Oct 11, 2013 at 5:36 PM, shahzad73 wrote:
> Aloke
This is something you could do via function queries, though performance
(for 500+ words) is doubtful.
1) With a separate float field (myfieldwordcount) that holds the count of
words from your query field (myfield):
http://localhost:8983/solr/collection1/select?wt=xml&indent=true&defType=func
&fl=id,myfield
&q
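A sketch of what the full request might look like, assuming we score by
the relative frequency of a term (the term 'solr' and the field names are
assumptions):
http://localhost:8983/solr/collection1/select?wt=xml&indent=true&defType=func
&fl=id,myfield,score
&q=div(termfreq(myfield,'solr'),myfieldwordcount)
termfreq gives the raw term count in myfield; dividing by the word-count
field turns it into a per-document ratio.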
Hi Shahzad,
Have you tried with the Minimum Should Match feature:
http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29
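For example, to require at least 75% of the terms to match (a sketch; qf
and the terms are placeholders, %25 is the URL-encoded %):
http://localhost:8983/solr/collection1/select?defType=edismax&qf=myfield
&mm=75%25&q=word1 word2 word3 word4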
Regards,
Aloke
On Wed, Oct 9, 2013 at 4:55 PM, Otis Gospodnetic wrote:
> Hi,
>
> You can take your words, combine some % of them with AND. Then take
Hi David,
A separate Solr document for each section is a good option if you also need
to handle phrases, case, special characters, etc. within the title field.
How do you map them to dynamic fields?
E.g.: "Appendix for cities", "APPENDIX 1: Cities"
Regards,
Aloke
On Wed, Oct 9, 2013 at 9:45 AM
Hi,
Try the UNC path instead: http://wiki.apache.org/tomcat/FAQ/Windows#Q6
Regards,
Aloke
On 9/20/13, johnmu...@aol.com wrote:
> Hi,
>
>
> I'm having this same problem as described here:
> http://stackoverflow.com/questions/17708163/absolute-paths-in-solr-xml-configuration-using-tomcat6-on-wind
Hi Aditya,
You need to start another 6 instances (9 instances in total) to
achieve this. The first 3 instances, as you mention, are already
assigned to the 3 shards. The next 3 will become their replicas,
followed by the next 3 as the next replicas.
You could create two copies each of the exam
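A sketch of starting the extra instances, following the Solr 4.x cloud
example (the ports and zkHost address are assumptions):
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
java -Djetty.port=7575 -DzkHost=localhost:9983 -jar start.jar
...
Each new node registers in ZooKeeper and gets assigned as a replica of one
of the existing shards.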
Hi Eric,
As Bryan suggests, you should look at appropriately setting up the
fragSize & maxAnalyzedChars for long documents.
One issue I find with your search request is that in trying to
highlight across three separate fields, you have added each of them as
a separate request param:
hl.fl=content
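The fields can instead be combined into a single hl.fl value; a sketch
(field names assumed):
hl=true&hl.fl=content,title,summary&hl.fragsize=200&hl.maxAnalyzedChars=200000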
Hi Deepak,
As Hoss explains it, there wouldn't be any effect of changing the order of
individual search terms.
In addition, you could look at the Scoring algo:
http://lucene.apache.org/core/2_9_4/scoring.html#Algorithm,
http://lucene.apache.org/core/2_9_4/api/core/org/apache/lucene/search/package
Hi,
Please refer to my response from a few months back:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201303.mbox/%3ccaht6s2az_w2av04rdmoeeck5e9o0k4ytktf0pjsecsh-lls...@mail.gmail.com%3E
Our modelling is to index N (individual pages) + 1 (original document) in
Solr. Once a document ha
Hi Rob,
I think the wrong Content-type header is getting passed. Try one of these
instead:
curl '
http://localhost:8983/solr/update/csv?commit=true&separator=%09&stream.file=/tmp/sample.tmp
'
OR
curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09' -H
'Content-type:application/
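A sketch of the second form in full (the content type and charset are
assumptions):
curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09' \
  -H 'Content-type: application/csv; charset=utf-8' \
  --data-binary @/tmp/sample.tmp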
http://wiki.apache.org/solr/UpdateXmlMessages#A.22delete.22_documents_by_ID_and_by_Query
).
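A sketch of a delete-by-query request (field and value are placeholders):
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary '<delete><query>myfield:myvalue</query></delete>'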
Regards,
Aloke
On Thu, Aug 22, 2013 at 3:04 AM, Ali, Saqib wrote:
> Thanks Aloke and Robert. Can you please give me code/query snippets?
> (newbie here)
>
>
> On Wed, Aug 21, 2013 at 2:31 P
Hi,
Facet by one of the duplicate fields (probably by the numeric field that
you mentioned) and set facet.mincount=2.
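A sketch of the request (the field name is an assumption):
http://localhost:8983/solr/collection1/select?q=*:*&rows=0
&facet=true&facet.field=mynumfield&facet.mincount=2
Any facet value coming back with a count of 2 or more marks a set of
duplicates.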
Regards,
Aloke
On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib wrote:
> hello,
>
> We have documents that are duplicates i.e. the ID is different, but rest of
> the fields are sam
Hi,
Since your data is well delimited, I'd suggest using CSV Updater, with the
delimiter/separator set to '~'.
See: http://wiki.apache.org/solr/UpdateCSV#separator
Looks like you might also have to additionally split based on your second
delimiter: ';'.
See: http://wiki.apache.org/solr/Update
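A sketch of the combined request (field and file names are assumptions;
%7E encodes '~' and %3B encodes ';'):
curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%7E&f.mymultifield.split=true&f.mymultifield.separator=%3B' \
  -H 'Content-type: text/csv; charset=utf-8' \
  --data-binary @/tmp/data.txt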
Location of the schema.xml:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/collection1/conf/schema.xml
On Mon, Aug 19, 2013 at 6:52 PM, Aloke Ghoshal wrote:
> Here you go, it is the default 4.2.1 schema.xml (
> http://svn.apache.org/repos/asf/luce
Here you go, it is the default 4.2.1 schema.xml (
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1/solr/example/solr/collection1/conf/schema.xml),
with the following additions:
Test with the field ContTest.
Regards,
Aloke
On Mon,
Hi Vicky,
Please check if you have a second "multiValued" field by the name
"content" defined in your schema.xml. It is typically part of the default
schema definition & is different from the one you initially posted, which
had "Content" with a capital C.
Here's the debugQuery on my system (wit
Hi,
That's correct, the Analyzers will get applied at both Index & Query time.
In fact I do get results back for speedPost with this field definition.
Regards,
Aloke
On Fri, Aug 16, 2013 at 5:21 PM, vicky desai wrote:
> Hi,
>
> Another Example I found is q=Content:wi-fi doesn't match for docume
Hi,
Based on your WhitespaceTokenizerFactory & due to the
LowerCaseFilterFactory the words actually indexed are:
speed, post, speedpost
You should get results for: q=Content:speedpost
So either remove the LowerCaseFilterFactory, or add the
LowerCaseFilterFactory as a query time Analyzer as well.
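A sketch of a field type with the filter applied at both index and query
time (a single <analyzer> without a type attribute covers both; the
fieldType name is an assumption):
<fieldType name="text_ws_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>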
Should work once you set up both fields as multiValued (
http://wiki.apache.org/solr/SchemaXml#Common_field_options).
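A sketch of such a field definition (the name and type are assumptions):
<field name="mylist" type="string" indexed="true" stored="true" multiValued="true"/>
Items in the list can then be matched with, e.g., fq=mylist:somevalue.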
On Thu, Aug 15, 2013 at 12:07 AM, Utkarsh Sengar wrote:
> Hello,
>
> Is it possible to load a list in a solr filed and query for items in that
> list?
>
> example_core1:
>
> docu
Hi,
I would suggest boosting over sorting. Something along the lines of:
radius:[0 TO 10]^100 OR radius:[10 TO *]
Regards,
Aloke
On Mon, Aug 12, 2013 at 6:43 PM, Raymond Wiker wrote:
> It will probably have better performance than having a "plan b" query that
> executes if the first query fails...
>
>
>
Compare timings in the following cases:
- Without the wildcard
- With suffix wild card only - test*
- With reverse wild card filter factory and two separate terms - *test OR
test*
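For the last case, the index-time filter would look something like this
(a sketch):
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
With withOriginal="true" both the original and the reversed tokens are
indexed, so the leading wildcard in *test can be rewritten into an
efficient prefix query.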
On Thu, Aug 8, 2013 at 8:15 PM, meena.sri...@mathworks.com <
meena.sri...@mathworks.com> wrote:
> Index size is arou
Does adding facet.mincount=2 help?
On Tue, Jul 30, 2013 at 11:46 PM, Dotan Cohen wrote:
> To search for duplicate IDs, I am running the following query:
> select?q=*:*&facet=true&facet.field=id&rows=0
>
> However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
> OutOfMemoryError error
Hi Floyd,
We use SolrNet to connect to Solr from a C# application. Since SolrNet is
not aware about SolrCloud or ZK, we use a Http load balancer in front of
the Solr nodes & query via the load balancer url. You could use something
like HAProxy or Apache reverse proxy for load balancing.
On the ot
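A minimal HAProxy sketch of such a set-up (node addresses are
assumptions):
listen solr 0.0.0.0:8983
  balance roundrobin
  server solr1 10.0.0.1:8983 check
  server solr2 10.0.0.2:8983 check
SolrNet is then configured with the load balancer's URL instead of an
individual node's.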
One option could be to get the clusterstate.json via the following Solr url
& figure out the leader from the response json:
http://server:port/solr/zookeeper?detail=true&path=%2Fclusterstate.json
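The leader replica carries a "leader":"true" flag in the response; a
trimmed sketch of the relevant JSON:
{"collection1":{"shards":{"shard1":{"replicas":{
  "core_node1":{"state":"active","leader":"true", ...}}}}}}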
On Wed, Jul 3, 2013 at 5:57 PM, vicky desai wrote:
> Hi,
>
> I have a requirement where in I want
Hi Simon,
Good that it works.
The reason, as far as I could make out, is that the standalone
SpellCheckComponent (used by the suggester) is not distributed by itself.
One way to explicitly distribute the search is to provide the shards:
http://wiki.apache.org/solr/SpellCheckComponent#Distributed_S
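A sketch of an explicitly distributed suggest request (the hosts and
handler path are assumptions):
http://host1:8983/solr/suggest?q=tes
&shards=host1:8983/solr,host2:8983/solr
&shards.qt=/suggest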
Hi,
Check the obvious first: that you have rebuilt & reloaded the suggest
dictionary individually on all nodes. Also run through the other checks here:
http://stackoverflow.com/questions/6653186/solr-suggester-not-returning-any-results
Then, try with one of: only the query component, or the distrib=false setting:
http://l
Thanks Barani. Could also work out this way provided we start with a large
set of suggestions initially to increase the likelihood of getting some
matches when filtering down with the second query.
On Wed, Jun 12, 2013 at 10:51 PM, bbarani wrote:
> I would suggest you to take the suggested stri
h wildcard searches, or better yet NGram
> (EdgeNGram) behavior to get the right suggestion data back.
>
> I would suggest an additional core to accomplish this (fed via
> replication) to avoid cache entry collision with your normal queries.
>
> Hope that's useful to you.
>
>
wildcard patterns) and then send the suggestion query
> to the right field.
>
> Obviously this will get out of hand if you have too many of these...so
> this has limits.
>
> Jason
>
> On Jun 11, 2013, at 8:29 AM, Aloke Ghoshal wrote:
>
> > Hi,
> >
> > Trying t
Hi,
Trying to find a way to filter down the suggested terms set based on the
term value of another indexed field?
Let's say we have the following documents indexed in Solr:
userid:1, groupid:1, content:"alpha beta gamma"
userid:2, groupid:1, content:"alternate better garden"
userid:3, groupid:2,
True, the container's request header size limit must be the reason then.
Try:
http://serverfault.com/questions/136249/how-do-we-increase-the-maximum-allowed-http-get-query-length-in-jetty
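With the Jetty bundled with Solr, that limit sits on the connector in
etc/jetty.xml; a sketch (the element name is per Jetty 7+/8, and the value
is an assumption):
<Set name="requestHeaderSize">65536</Set>
Switching the client from GET to POST is another way around the URL length
limit.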
On Sun, Jun 9, 2013 at 11:04 PM, Jack Krupansky wrote:
> Maybe it is hitting some kind of container limit o
Hi Kamal,
You might have to increase the value of maxBooleanClauses in solrconfig.xml
(http://wiki.apache.org/solr/SolrConfigXml). The default value 1024 should
have been fine for 280 search terms.
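A sketch of the solrconfig.xml setting:
<maxBooleanClauses>2048</maxBooleanClauses>
Note that this maps to a static Lucene property, so the last core
initialized in the JVM sets the effective value.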
Though not relevant to your query (an OR query), take a look at this for
an explanation:
http://solr.pl/en/2
Hi,
A work around could be to add columns from the second table as fields to
the Solr document from the first table. E.g. For DB query:
SELECT project_id
FROM projects
MINUS
SELECT project_id
FROM archived_project;
Add archived_projects as a boolean field to Projects in Solr & then query
as:
q=(
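A sketch of such a query, assuming the boolean field is named
archived_projects:
q=*:*&fq=-archived_projects:true
i.e. all projects minus the archived ones, mirroring the SQL MINUS.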
Hi,
We are going about solving this problem by splitting an N-page document
into N separate documents (one per page, type=Page) + 1 additional combined
document (that has all the pages, type=Combined). All the N+1 documents
have the same doc_id.
The search is initially performed against the combi
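A sketch of the two-step flow implied by this model (the field values
come from the description above):
1. q=<user query>&fq=type:Combined  -> matching documents
2. q=<user query>&fq=type:Page&fq=doc_id:<id from step 1>  -> matching pages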
Hi,
If you haven't already, please refer to:
http://www.ngdata.com/site/blog/57-ng.html
http://lucene.472066.n3.nabble.com/solr-cloud-concepts-td3726292.html
http://wiki.apache.org/solr/SolrCloud#FAQ
Regards,
Aloke
On Thu, Jan 3, 2013 at 3:12 PM, Alexandre Rafalovitch wrote:
> Hello,
>
> I am
Hi Marcin,
Since you are thinking of this in the context of Amazon, I would suggest
taking a different route. Assign an Elastic IP (EIP) to each EC2 instance
running the ZK node & use the EIP in Solr. This way you could easily map
the EIP to a new EC2 instance subsequently, if required, and the ch
Hi Tom,
This is great. It should make it into the documentation.
Regards,
Aloke
On Thu, Dec 13, 2012 at 1:23 PM, Burgmans, Tom <
tom.burgm...@wolterskluwer.com> wrote:
> I am also busy with getting this clear. Here are my notes so far (by
> copying and writing myself):
>
>
>
> queryWeight = the
Hi Robert,
You could look at pageDoc & pageScore to improve things for deep paging (
http://wiki.apache.org/solr/CommonQueryParameters#pageDoc_and_pageScore).
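A sketch of fetching the next page with them (the pageDoc and pageScore
values come from the last result of the previous page):
...&start=10&rows=10&pageDoc=1234&pageScore=3.75
Passing the previous page's last internal docid and score lets Solr skip
scoring documents that cannot make this page.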
Regards,
Aloke
On Sat, Dec 8, 2012 at 8:08 AM, Upayavira wrote:
> Yes, expected.
>
> When it does a search for the first, say, 10 resul
ure and indexing DB or XML.
>
> Above project "bootstraps" itself with all of the Java and Solr files it
> needs to run and starts Solr using bundled in Jetty web server, so as long
> as you have Tika in your libs and a configured handler you should be able
> to use it.
Hi,
Looking for feedback on running Solr Core/ Tika parsing engine on Azure.
There's one offering for Solr within Azure from LucidWorks. This offering,
however, doesn't mention Tika.
We are looking at options to make content from files (doc, excel, pdfs,
etc.) stored within Azure storage search-able.