Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Paul Libbrecht
Benson, In mid 2009, I had such a question answered with a nifty bitwise manipulation of the score, at the cost of a little precision. For each result I could pick out the language of a multilingual match. If interested, I can dig. Paul -- Sent from my Android phone with K-9 Mail. Please excuse the brevity. Bens

Category the result search

2012-04-13 Thread hadi
Hi, I am new to Solr. I crawled about 1000 news sites with Nutch and I use Solr to browse the results, but I want to sort the sites into categories such as sport news, political news, science, etc. I know I have to use Solr faceting, but I do not know how to implement this for Solr

Re: remoteLink that change it's text

2012-04-13 Thread Marcelo Carvalho Fernandes
Sorry! Wrong list! Marcelo Carvalho Fernandes +55 21 8272-7970 +55 21 2205-2786 On Fri, Apr 13, 2012 at 10:54 PM, Marcelo Carvalho Fernandes < mcf2...@gmail.com> wrote: > Hi! > > I have the following gsp code... > > > > id="${i}" > update="[succes

remoteLink that change it's text

2012-04-13 Thread Marcelo Carvalho Fernandes
Hi! I have the following gsp code... Select this product How to have each remoteLink to change it's "Select this product" text to what "addaction" renders? The problem I'm facing is that I don't know what to put in 'what-to-put-here ' in order to achieve that. Of course, I'

dynamic analyzer based on condition

2012-04-13 Thread srinir
Hi, I want to pick different analyzers for the same field for different languages. I can determine the language from a different field. I would have different fieldTypes defined in my schema.xml such as text_en, text_de, text_fr, etc., where I specify which analyzer and filter to use during indexing

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Ali S Kureishy
Thanks Otis. I really appreciate the details offered here. This was very helpful information. I'm going to go through Solandra and Elastic Search and see if those make sense. I was also given a suggestion to use SolrCloud on FuseDFS (that's two recommendations for SolrCloud so far), so I will giv

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Benson Margulies
On Fri, Apr 13, 2012 at 7:07 PM, Chris Hostetter wrote: > > : Given a query including a subquery, is there any way for me to learn > : that subquery's contribution to the overall document score? > > You have to just execute the subquery itself ... doc collection > and score calculation doesn't kee

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Lance Norskog
This all comes from a database? Here is what you want. The DataImportHandler includes a toolkit for doing full and incremental loading from databases. Read this first: http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DIHQuickStart Then these: http://www.lucidimaginatio

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Chris Hostetter
: Given a query including a subquery, is there any way for me to learn : that subquery's contribution to the overall document score? You have to just execute the subquery itself ... doc collection and score calculation doesn't keep track of the subscores. You could do this using functions in the "

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread Benson Margulies
On Fri, Apr 13, 2012 at 6:43 PM, John Chee wrote: > On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies > wrote: >> Given a query including a subquery, is there any way for me to learn >> that subquery's contribution to the overall document score? I need this number to be available in a SearchCom

Re: term frequency outweighs exact phrase match

2012-04-13 Thread alxsss
Hello Hoss, Here are the explain tags for two doc 0.021646015 = (MATCH) sum of: 0.021646015 = (MATCH) sum of: 0.02141003 = (MATCH) max plus 0.01 times others of: 2.84194E-4 = (MATCH) weight(content:apache^0.5 in 3578), product of: 0.0029881175 = queryWeight(content:apache^0.5

Re: two structures in solr

2012-04-13 Thread Chris Hostetter
: I need to store *two big structures* in SOLR: projects and contractors. : Contractors will search for available projects and project owners will : search for contractors who would do it for them. http://wiki.apache.org/solr/MultipleIndexes : that *I want to have two structures*. I guess runnin

Re: Can I discover what part of a score is attributable to a subquery?

2012-04-13 Thread John Chee
On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies wrote: > Given a query including a subquery, is there any way for me to learn > that subquery's contribution to the overall document score? > > I can provide 'why on earth would anyone ...' if someone wants to know. Have you tried debugQuery=true?
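
For readers who want to try the debugQuery suggestion, here is a minimal sketch of building such a request URL. The host, path, and query string are hypothetical placeholders, not values from this thread. The explain section of the debug output breaks each score down by clause, which helps with manual inspection but does not by itself expose the number to a search component.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query, rows=10):
    """Build a select URL that asks Solr for score explanations."""
    # debugQuery=true adds an explain section to the response.
    params = {"q": query, "rows": rows, "debugQuery": "true"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and query, for illustration only.
url = debug_query_url("http://localhost:8983/solr", "title:apache")
print(url)
```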

Re: Post Sorting hook before the doc slicing.

2012-04-13 Thread Chris Hostetter
: Basically, I need to find item X in the result set and return say N items : before and N items after. : : < - N items -- Item X --- N items > ... : So I might be wrong, but it looks like the only way would be to create a : custom SolrIndexSearcher which will find the offset and

Re: Options for automagically Scaling Solr (without needing distributed index/replication) in a Hadoop environment

2012-04-13 Thread Jan Høydahl
Hi, For a web crawl+search like this you will probably need a lot of additional Big Data crunching, so a Hadoop based solution is wise. In addition to those products mentioned we also now have Amazon's own CloudSearch http://aws.amazon.com/cloudsearch/ It's new, is not as cool as Solr (not eve

Re: Boosting StandardQuery scores with a "subquery"?

2012-04-13 Thread Chris Hostetter
: I'm having some trouble wrapping my head around boosting StandardQueries. : It looks like the function: query(subquery, default) : is what I want, but the : examples seem to focus on just returning a score (e.g. product of popularity : and th

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
Thanks again for the quick reply. I am a little curious about the procedure you suggested. I had thought of using the same procedure: writing a Java program to fetch the XML records from the db, parse the content, and hand it to Solr for indexing. But what if my database content changes? Should I re r

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Alexander Aristov
Hi This is not solr format. You must re-format your XML into solr XML. you may find examples on solr wiki or in solr examples dir. Best Regards Alexander Aristov On 13 April 2012 23:13, srini wrote: > Erick, > > Thanks for your reply. when you say Solr does not index arbitery xml > document,

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Erick Erickson
Right, that will not work at all for direct transmission to Solr. You could write a Java program that parses this and sends it to Solr via SolrJ. Personally I haven't connected a database to Solr with XPathEntityProcessor in the mix, but I believe I've seen messages go by with this configuration.

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
Erick, Thanks for your reply. When you say Solr does not index arbitrary XML documents, below is what my XML document, which is sitting in Oracle, looks like. Could you suggest the best way of indexing it? Which method should I follow? Should I use XPathEntityProcessor? http://www.w3.org/2001/

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread Erick Erickson
Solr does not index arbitrary XML content. There is an XML form of a solr document that can be sent to Solr, but it is a specific form of XML. An example of the XML you're trying to index and what you mean by "not working" would be helpful. Best Erick On Fri, Apr 13, 2012 at 11:50 AM, srini wr

Re: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin
Ok, thanks for the info. As long as the second one works, we can just use that. I just verified that it works for 3.5 at least. -Peter On Fri, Apr 13, 2012 at 1:12 PM, Michael Ryan wrote: > It looks like the first format was removed in 3.6 as part of > https://issues.apache.org/jira/browse/SO

RE: mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Michael Ryan
It looks like the first format was removed in 3.6 as part of https://issues.apache.org/jira/browse/SOLR-1052. The second format works in all 3.x versions. -Michael -Original Message- From: Peter Wolanin [mailto:peter.wola...@acquia.com] Sent: Friday, April 13, 2012 12:32 PM To: solr-us

mergePolicy element format change in 3.6 vs 3.5?

2012-04-13 Thread Peter Wolanin
Trying to maintain the Drupal integration module across multiple versions of 3.x, we've gotten a bug report suggesting that Solr 3.6 needs this change to solrconfig: - org.apache.lucene.index.LogByteSizeMergePolicy + I don't see this mentioned in the release notes - is the second format use

Re: performance impact using string or float when querying ranges

2012-04-13 Thread Yonik Seeley
On Fri, Apr 13, 2012 at 8:11 AM, Erick Erickson wrote: > Well, I guess my first question is whether using strings > is "fast enough", in which case there's little reason to > make your life more complex. > > But yes, range queries will be significantly faster with > any of the Trie types than with

Re: Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
not sure why CDATA part did not get interpreted. this is how xml content looks like. I added quotes just to present the exact content xml content. "" -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-is-not-extracting-the-CDATA-part-of-xml-tp3908317p3908341.html Sent from

Solr is not extracting the CDATA part of xml

2012-04-13 Thread srini
I am trying to use method that is suggested in solr forum to remove CDATA part of xml. but it is not working. result show whole xml content instead of CDATA part. schema.xml mappings.txt "" => "" my xml content -- View

RE: solr 3.5 taking long to index

2012-04-13 Thread Rohit
Hi Shawn, Thanks for the information, let me give this a try, since this is a live box I will try it during the weekend and update you. Regards, Rohit Mobile: +91-9901768202 About Me: http://about.me/rohitg -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: 13 Apri

Errors during indexing

2012-04-13 Thread Ben McCarthy
Hello We have just switched to Solr4 as we needed the ability to return geodist() along with our results. I use a simple multithreaded java app and solr to ingest the data. We keep seeing the following: 13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-13 Thread Erick Erickson
as to 1) you have to define your request handler with a leading /, as in name= "/partItemNoSearch". Don't forget to restart your server. 3) Of course. The input terms MUST be run through the associated analysis chain to have any hope of matching correctly. Best Erick On Fri, Apr 13, 2012 at 8:36

Re: searching across multiple fields using edismax - am i setting this up right?

2012-04-13 Thread geeky2
thank you for the response. it seems to be working well ;) 1) i tried your suggestion about removing the qt parameter - *somecore/partItemNoSearch*&q=dishwasher&debugQuery=on&rows=10 but this results in a 404 error message - is there some configuration i am missing to support this short-hand s

RE: Solr data export to CSV File

2012-04-13 Thread Ben McCarthy
A combination of the CSV response writer and SolrJ to page through all of the results, sending each line to something like Apache Commons IO FileUtils: FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true); Would be quite quick to knock up in Java. Thank
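
The paging idea above can be sketched roughly as follows. This is an illustration in Python rather than SolrJ, and it only builds the request URLs. The host, page size, and document count are assumed values; `wt=csv`, `start`, `rows`, and `csv.header` are parameters of Solr's CSV response writer.

```python
from urllib.parse import urlencode


def csv_page_urls(base_url, total_docs, rows=1000, query="*:*"):
    """Yield one select URL per page of CSV output."""
    for start in range(0, total_docs, rows):
        params = {
            "q": query,
            "wt": "csv",
            "start": start,
            "rows": rows,
            # Ask for the header row on the first page only, so the
            # pages can be appended to a single output file.
            "csv.header": "true" if start == 0 else "false",
        }
        yield f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and a small document count, for illustration only.
urls = list(csv_page_urls("http://localhost:8983/solr", 2500))
for u in urls:
    print(u)
```

Each URL would be fetched in turn and its body appended to the output file. With tens of millions of documents, large `start` offsets may become slow, so the page size is worth tuning.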

RE: Realtime /get versus SearchHandler

2012-04-13 Thread Darren Govoni
Yes --- Original Message --- On 4/13/2012 06:25 AM Benson Margulies wrote: A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the get handler. In fact, I've seen them turn up in my search component in the search hand

Re: Solr data export to CSV File

2012-04-13 Thread Erick Erickson
Does this help? http://wiki.apache.org/solr/CSVResponseWriter Best Erick On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh wrote: > Hi Team, > > > > A very-very thanks to you guy who had developed such a nice product. > > I have one query regarding solr that I have app 36 Million data in my solr > and I

Re: Issues with language based indexing

2012-04-13 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists there's so little information to go on here that I really can't say anything that isn't a guess. At a minimum we need the raw input, the fieldType definitions from your schema, the results of adding &debugQuery=on to your URL Best Eric

Re: performance impact using string or float when querying ranges

2012-04-13 Thread Erick Erickson
Well, I guess my first question is whether using strings is "fast enough", in which case there's little reason to make your life more complex. But yes, range queries will be significantly faster with any of the Trie types than with strings. Trie types are all numeric types. Best Erick On Fri, A

Re: How to read SOLR cache statistics?

2012-04-13 Thread Erick Erickson
Well, the place to start is here: *stats*: lookups : 98 *hits *: 59 *hitratio *: 0.60 *inserts *: 41 *evictions *: 0 *size *: 41 the important bits are hitratio and evictions. Caches only really start to "show their stuff" when the hit ratio is quite high. That's the percentage of requests that a
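
As a small worked example of the numbers quoted above, the hit ratio is simply hits divided by lookups. The sketch below uses only the figures from this message; the function name is made up for illustration.

```python
def hit_ratio(hits, lookups):
    """Fraction of cache lookups that were answered from the cache."""
    return hits / lookups if lookups else 0.0


# Figures copied from the statistics quoted in the message.
stats = {"lookups": 98, "hits": 59, "inserts": 41, "evictions": 0, "size": 41}
ratio = hit_ratio(stats["hits"], stats["lookups"])
print(f"{ratio:.2f}")  # 59 / 98 rounds to 0.60
```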

Solr data export to CSV File

2012-04-13 Thread Pavnesh
Hi Team, Many thanks to the people who developed such a nice product. I have one question about Solr: I have approximately 36 million documents in my Solr index and I want to export all of the data to a CSV file, but I have found nothing on how to do this, so please help me with this topic. Regards Pav

Re: Facets involving multiple fields

2012-04-13 Thread Erick Erickson
Nope. Information about your higher level use-case would probably be a good thing, this is starting to smell like an "XY" problem. Best Erick On Fri, Apr 13, 2012 at 5:48 AM, Marc SCHNEIDER wrote: > Hi, > > Thanks for your answer. > Yes it works in this case when I know the facet name (Computer)

Re: Trouble handling Unit symbol

2012-04-13 Thread Rajani Maski
Fine. Thank you. I will look at it. On Fri, Apr 13, 2012 at 5:21 PM, Erick Erickson wrote: > Please review: > http://wiki.apache.org/solr/UsingMailingLists > > Especially the bit about adding &debugQuery=on > and showing the results. You're asking people > to guess at solutions without providing

Re: Boost differences in two environments for same query and config

2012-04-13 Thread Erick Erickson
Well, next thing I'd do is just copy your entire directory to the remote machine and try that. If that gives identical results on both, then try moving just your /data directory to the remote machine. I suspect that you've done something different between the two machines that's leading to this,

Re: two structures in solr

2012-04-13 Thread Erick Erickson
bq: Is that right? I don't know, does it work ? You'll probably want an additional field for unique id (just named "id" in the example) that should be disjoint between your types. Best Erick On Fri, Apr 13, 2012 at 3:41 AM, tkoomzaaskz wrote: > Thank you very much Erick for your reply! > > So s

Re: Trouble handling Unit symbol

2012-04-13 Thread Erick Erickson
Please review: http://wiki.apache.org/solr/UsingMailingLists Especially the bit about adding &debugQuery=on and showing the results. You're asking people to guess at solutions without providing much in the way of context. You might try looking at your index with Luke to see what's actually in you

Realtime /get versus SearchHandler

2012-04-13 Thread Benson Margulies
A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the get handler. In fact, I've seen them turn up in my search component in the search handler that is configured with my custom QT. (I have a 'prepare' method that sets ShardPa

Issues with language based indexing

2012-04-13 Thread JGar
Hello, I am new to Solr. It is returning some docs for my search on the string "Acciones y Valores". When I search for the same words in the returned doc manually, I cannot find them. Please help me understand on what basis the doc is found by the search. Thanks -- View this message in context: http://

Re: How to read SOLR cache statistics?

2012-04-13 Thread Kashif Khan
Hi Li Li, I have been through that WIKI before but that does not explain what is *evictions*, *inserts*, *cumulative_inserts*, *cumulative_evictions*, *hitratio *and all. These terms are foreign to me. What does the following line mean? *item_ABC : {field=ABC,memSize=340592,tindexSize=1192,time=

Re: Facets involving multiple fields

2012-04-13 Thread Marc SCHNEIDER
Hi, Thanks for your answer. Yes it works in this case when I know the facet name (Computer). What if I want to automatically compute all facets? facet.query=keyword:* short_title:* doesn't work, right? Marc. On Thu, Apr 12, 2012 at 2:08 PM, Erick Erickson wrote: > facet.query=keywords:computer

Re: Solr Scoring

2012-04-13 Thread Kissue Kissue
Thanks a lot. I had already implemented Walter's solution and was wondering if this was the right way to deal with it. This has now given me the confidence to go with the solution. Many thanks. On Fri, Apr 13, 2012 at 1:04 AM, Erick Erickson wrote: > GAH! I had my head in "make this happen in on

Re: Boost differences in two environments for same query and config

2012-04-13 Thread Kerwin
Hi Erick, Thanks for your suggestions. I did an optimize on the remote installation and this time with the same number of documents but still face the same issue as seen from the debug output below: 9.950362E-4 = (MATCH) sum of: 9.950362E-4 = (MATCH) weight(RECORD_TYPE:info in 35916), pro

Re: two structures in solr

2012-04-13 Thread tkoomzaaskz
Thank you very much Erick for your reply! So should it go something like the following: http://lucene.472066.n3.nabble.com/file/n3907393/solr_index.png sorry for an ugly drawing ;) In this example, the index will have 13 columns: 6 for project, 6 for contractor and one to define the type. Is th

AW: Lexical analysis tools for German language data

2012-04-13 Thread Michael Ludwig
> From: Tomas Zerolo > > > There can be transformations or inflections, like the "s" in > > > "Weihnachtsbaum" (Weihnachten/Baum). > > > > I remember from my linguistics studies that the terminus technicus > > for these is "Fugenmorphem" (interstitial or joint morpheme) [...] > > IANAL (I am not a l

Re: How to read SOLR cache statistics?

2012-04-13 Thread Li Li
http://wiki.apache.org/solr/SolrCaching On Fri, Apr 13, 2012 at 2:30 PM, Kashif Khan wrote: > Does anyone explain what does the following parameters mean in SOLR cache > statistics? > > *name*: queryResultCache > *class*: org.apache.solr.search.LRUCache > *version*: 1.0 > *description*: LRU

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-13 Thread Mikhail Khludnev
Did I get it right that you have two separate processes (different apps) accessing the same Lucene Directory simultaneously? In that case I suggest reading about the locking mechanism. I'm not really experienced with it. You showed logs from the StrUpdHandler failure; that's clear. Can you show logs from Embedded serve

Re: Solr Scoring

2012-04-13 Thread Li Li
Another way is to use payloads: http://wiki.apache.org/solr/Payloads The advantage of payloads is that you only need one field and can make the frq file smaller than when using two fields. The disadvantage is that payloads are stored in the prx file, so I am not sure which approach is faster. Maybe you can try them both. On