Re: Xml representation of indexed document

2012-03-09 Thread Anupam Bhattacharya
You can use Luke to view Lucene indexes. Anupam On Sat, Mar 10, 2012 at 12:27 PM, Chamnap Chhorn wrote: > Hi all, > > I'm doing data import using DIH in Solr 3.5. I'm curious to know whether it > is possible to see the XML representation of indexed data from the browser. Is it > possible? > I just want to m

Xml representation of indexed document

2012-03-09 Thread Chamnap Chhorn
Hi all, I'm doing data import using DIH in Solr 3.5. I'm curious to know whether it is possible to see the XML representation of indexed data from the browser. I just want to make sure the data is indexed correctly with the correct values, and to use it for debugging. -- Chamnap

Re: How to Index Custom XML structure

2012-03-09 Thread Jan Høydahl
You could set up a ManifoldCF job to fetch the XMLs and then set up a new SolrOutputConnection for /solr/update/xslt?tr=myStyleSheet.xsl where myStyleSheet.xsl is the stylesheet to use for that kind of XML. See http://wiki.apache.org/solr/XsltUpdateRequestHandler -- Jan Høydahl, search solution a

Re: Highlighting "text" field when query is for "string" field

2012-03-09 Thread solrdude
Or is it because query is on "keyword" field and I expect matching keywords to be highlighted on "excerpts" field? Any insights would help a lot. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-text-field-when-query-is-for-string-field-tp3475334p3814159.ht

Re: Stemmer Question

2012-03-09 Thread Jamie Johnson
So I've thrown something together fairly quickly which is based on what Ahmet had sent that I believe will preserve the original token as well as the stemmed version. I didn't go as far as weighting them differently using the payloads however. I am not sure how to use the preserveOriginal attribu

RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread mike.rawlins
I knew there had to be an easy way. That was it. Thanks for the tip! Mike Rawlins Sr. Software Engineer Chair, ASC X12 Technical Assessment Subcommittee 18111 Preston Road, Suite 600 Dallas, TX 75252 +1 972.643.3101 direct mike.rawl...@gxs.com www.gxs.com GXS Blog -Original Message- Fr

Re: does solr have a mechanism for intercepting requests - before they are handed off to a request handler

2012-03-09 Thread Mikhail Khludnev
I'm doing something like that by hacking SolrRequestParsers. I tried to find a more legitimate way but haven't found one: http://mail-archives.apache.org/mod_mbox/lucene-dev/201202.mbox/%3CCAF=Pa597RpLjVWZbM=0aktjhpnea4m931j0s1s4bda4qe+t...@mail.gmail.com%3E I added into solrconfig.xml https://github.com/

Knowing which fields matched a search

2012-03-09 Thread Russell Black
When searching across multiple fields, is there a way to identify which field(s) resulted in a match without using highlighting or stored fields?

Re: Time Stats

2012-03-09 Thread Raimon Bosch
second mean is 48.05%... 2012/3/9 Raimon Bosch > The answer is so easy. Just need to create an index with each visit. In > this way I could use faceted date search to create time statistics. > > "flats for rent new york" at 1/12/2011 => bounce_rate=48.6% > "flats for rent new york" at 1/1/2012 =

Re: Time Stats

2012-03-09 Thread Raimon Bosch
The answer is so easy. Just need to create an index with each visit. In this way I could use faceted date search to create time statistics. "flats for rent new york" at 1/12/2011 => bounce_rate=48.6% "flats for rent new york" at 1/1/2012 => bounce_rate=49.7% "flats for rent new york" at 1/2/2012 =

Re: Stemmer Question

2012-03-09 Thread Jamie Johnson
Further digging leads me to believe this is not the case. The Synonym Filter supports this, but the Stemming Filter does not. Ahmet, would you be willing to provide your filter as well? I wonder if we can make it aware of the preserveOriginal attribute on WordDelimiterFilterFactory? On Fri, Ma

Re: Stemmer Question

2012-03-09 Thread Jamie Johnson
Ok, so I'm digging through the code and I noticed in org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of a keepOrig attribute. Doing some googling led me to http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which speaks of an attribute preserveOriginal="1" on solr.Word

Re: Upgrade solr

2012-03-09 Thread Erick Erickson
Take a look at the solr/CHANGES.txt file. Each release has an "Upgrading from " section, the one you're interested in is "Upgrading from Solr 1.4" in the 3.1.0 section, and then the ones that are in subsequent sections. Of course I'd try it on a copy of my index first... If at all possible, the e

Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
On the other hand, I'm aware that if I go with the Lucene approach, failover is something I will have to support manually, which is a nightmare! On Fri, Mar 9, 2012 at 2:13 PM, Alireza Salimi wrote: > This solution makes sense, but I still don't know if I can use SolrCloud > with > t

RE: DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread Dyer, James
Did you try setting "baseDir" to the root directory and "recursive" to true ? (see http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor for more information). James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 From: mike.rawl...@gxs.com [mailto:mike.rawl...@gxs.
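
A rough sketch of what James's suggestion might look like, written as Python that builds the `<entity>` element for data-config.xml. The attribute names (`processor`, `baseDir`, `fileName`, `recursive`, `rootEntity`, `dataSource`) are the ones documented on the DataImportHandler wiki page linked above; the entity name `files`, the path `/data/rdf`, and the file-name pattern are made-up placeholders, not values from the thread.

```python
import xml.etree.ElementTree as ET


def file_list_entity(base_dir: str, file_name_regex: str) -> ET.Element:
    """Build a FileListEntityProcessor entity that walks base_dir recursively."""
    return ET.Element(
        "entity",
        {
            "name": "files",  # hypothetical entity name
            "processor": "FileListEntityProcessor",
            "baseDir": base_dir,  # root directory holding the subdirectories
            "fileName": file_name_regex,  # regex matched against file names
            "recursive": "true",  # descend into subdirectories
            "rootEntity": "false",
            "dataSource": "null",
        },
    )


# Hypothetical root directory and pattern, for illustration only.
entity = file_list_entity("/data/rdf", r".*\.rdf")
print(ET.tostring(entity, encoding="unicode"))
```

The printed element would still need a nested entity that actually reads each file, which the thread does not show.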

Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
This solution makes sense, but I still don't know if I can use SolrCloud with this configuration or not. On Fri, Mar 9, 2012 at 2:06 PM, Robert Stewart wrote: > Split up index into say 100 cores, and then route each search to a > specific core by some mod operator on the user id: > > core_number

Re: Lucene vs Solr design decision

2012-03-09 Thread Robert Stewart
Split up the index into, say, 100 cores, and then route each search to a specific core by some mod operator on the user id: core_number = userid % num_cores; core_name = "core"+core_number. That way each index core is relatively small (maybe 100 million docs or less). On Mar 9, 2012, at 2:02 PM, Glen
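
The routing rule in the message above can be sketched in a few lines of Python. This is only an illustration: the function name `core_for_user`, the default of 100 cores, and the assumption that user ids are non-negative integers are not from the thread.

```python
def core_for_user(user_id: int, num_cores: int = 100) -> str:
    """Map a user id to a core name with the modulo rule from the message."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    core_number = user_id % num_cores
    return "core" + str(core_number)


# Every document and every search for the same user lands on the same core.
print(core_for_user(12345))  # core45
```

Note that changing `num_cores` later changes the mapping for most users, so under this scheme the number of cores would have to be fixed up front or the data re-indexed.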

Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Probably, and besides that, how can I use the features that SolrCloud provides (i.e. high availability and distribution)? The other solution would be to use SolrCloud and keep all of the users' information in a single collection and use NRT. But on the other hand the frequency of updates on that big

Re: Lucene vs Solr design decision

2012-03-09 Thread Glen Newton
millions of cores will not work... ...yet. -glen On Fri, Mar 9, 2012 at 1:46 PM, Lan wrote: > Solr has no limitation on the number of cores. It's limited by your hardware, > inodes and how many files you could keep open. > > I think even if you went the Lucene route you would run into same hardw

Re: Lucene vs Solr design decision

2012-03-09 Thread Lan
Solr has no limitation on the number of cores. It's limited by your hardware, inodes and how many files you can keep open. I think even if you went the Lucene route you would run into the same hardware limits. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-desig

DIH - FileListEntityProcessor reading from Multiple Disk Directories

2012-03-09 Thread mike.rawlins
All, I have an application that has RDF files in multiple subdirectories under a root directory. I'm using the DIH with a FileListEntityProcessor to load the index. All worked fine when the files were in a single directory, but I can't seem to figure out how to make a single data-config.xml rea

Re: Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Sorry, I didn't mention that the number of users can be in the millions, which would mean millions of cores! So I'm not sure if it's a good idea. On Fri, Mar 9, 2012 at 1:35 PM, Lan wrote: > Solr has cores which are independent search indexes. You could create a > separate core per user. > > -- > View this

Re: Lucene vs Solr design decision

2012-03-09 Thread Lan
Solr has cores which are independent search indexes. You could create a separate core per user. -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-vs-Solr-design-decision-tp3813457p3813489.html Sent from the Solr - User mailing list archive at Nabble.com.

Lucene vs Solr design decision

2012-03-09 Thread Alireza Salimi
Hi everybody, Let's say we have a system with billions of small documents (average of 2-3 fields), and each document belongs to JUST ONE user. Searches are user-specific, meaning that when we search for something, we just look into the documents of that user. On the other hand we need to see the n

Re: How to rank an exact match higher?

2012-03-09 Thread Lan
Here's one way to do it using dismax. 1. You'll have two fields: title_text, which has a type of TextField, and title_string, which has type String. The latter is an exact match field. 2. Set the dismax qf=title_string^10 title_text^1 You could even make this better by also handling infix searches
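
As a minimal sketch of step 2, the request parameters could be assembled like this in Python. The field names and boosts are the ones from the message; the function name and the sample query are invented for illustration, and `defType=dismax` is assumed as the way the dismax parser is selected.

```python
from urllib.parse import urlencode


def exact_match_boost_params(user_query: str) -> str:
    """Build a dismax query string that weights the exact-match field 10x."""
    params = {
        "defType": "dismax",
        "q": user_query,
        # An exact hit on title_string outranks a tokenized hit on title_text.
        "qf": "title_string^10 title_text^1",
    }
    return urlencode(params)


# Hypothetical user query; append the result to the select handler URL.
print(exact_match_boost_params("solr in action"))
```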

does solr have a mechanism for intercepting requests - before they are handed off to a request handler

2012-03-09 Thread geeky2
hello all, does solr have a mechanism that could intercept a request (before it is handed off to a request handler). the intent (from the business) is to send in a generic request - then pre-parse the url and send it off to a specific request handler. thank you, mark -- View this message in co

RE: Solr DIH and $deleteDocById

2012-03-09 Thread Dyer, James
This (almost) sounds like https://issues.apache.org/jira/browse/SOLR-2492 which was fixed in Solr 3.4 .. Are you on an earlier version? But maybe not, because you're seeing the # deleted documents increment, and prior to this bug fix (I think) the deleted counter wasn't getting incremented eith

Re: Stemmer Question

2012-03-09 Thread Ahmet Arslan
> I'd be very interested to see how you > did this if it is available. Does > this seem like something useful to the community at large? I PMed it to you. Filter is not a big deal. Just modified from {@link org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it publicly t

Re: Geolocation in SOLR with PHP application

2012-03-09 Thread Adolfo Castro Menna
Hi, Take a look at http://wiki.apache.org/solr/SpatialSearch Then from php, you need to pass the right parameters as described in the link above. On Fri, Mar 9, 2012 at 8:00 AM, Spadez wrote: > A quick, bump, I could really do with some input on this please. > > -- > View this message in contex

Re: Multithreaded DIH giving Operation not allowed after ResultSet closed solr 3.5

2012-03-09 Thread Mikhail Khludnev
Hello, AFAIK DIH is not multi-threaded at all. See https://issues.apache.org/jira/browse/SOLR-3011 Regards On Fri, Mar 9, 2012 at 4:22 PM, Rohit Khanna wrote: > When I try running a multithreaded DIH in Solr 3.5 I get the following > error "Operation not allowed after ResultSet closed". > >

Multicore -Create new Core request errors

2012-03-09 Thread Sujatha Arun
Hello, When I issue this query to create a new Solr Core , I get the error message HTTP Status 500 - Can't find resource 'solrconfig.xml' in classpath or '/home/searchuser/searchinstances/multi_core_prototype/solr/conf/ http:// /multi_core_prototype/admin/cores?action=CREATE&name=coreX&instanceDi

Multithreaded DIH giving Operation not allowed after ResultSet closed solr 3.5

2012-03-09 Thread Rohit Khanna
When I try running a multithreaded DIH in Solr 3.5 I get the following error: "Operation not allowed after ResultSet closed". I have multiple entities mapped to fields; after the first query finishes I get this error for every other query that's been mentioned in my data-config.xml file. I have me

Re: docBoost with "fq" search

2012-03-09 Thread Ahmet Arslan
> if you store your boost in a search-able numeric field... You can simply sort by that field too. q=*:*&sort=your_boost_field desc
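
A small Python sketch of the same request, mainly to show how the parameters look once URL-encoded. `your_boost_field` is the placeholder from the message; the function name is made up.

```python
from urllib.parse import parse_qs, urlencode


def sort_by_field_query(sort_field: str = "your_boost_field") -> str:
    """Build the query string for a match-all search sorted by one field, descending."""
    return urlencode({"q": "*:*", "sort": f"{sort_field} desc"})


query_string = sort_by_field_query()
print(query_string)
# Decoding the string gives back the two parameters from the message.
print(parse_qs(query_string))
```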

Re: Reporting tools

2012-03-09 Thread Koji Sekiguchi
(12/03/09 12:35), Donald Organ wrote: Are there any reporting tools out there? So I can analyze search term frequency, filter frequency, etc? You may be interested in: Free Query Log Visualizer for Apache Solr http://soleami.com/ koji -- Query Log Visualizer for Apache Solr http://soleami.

Re: Geolocation in SOLR with PHP application

2012-03-09 Thread Spadez
A quick, bump, I could really do with some input on this please. -- View this message in context: http://lucene.472066.n3.nabble.com/Geolocation-in-SOLR-with-PHP-application-tp3807120p3812364.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: indexing bigdata

2012-03-09 Thread Robert Stewart
It very much depends on your data and also what query features you will use. How many fields, the size of each field, how many unique values per field, how many fields are stored vs. only indexed, etc. I have a system with 3+ billion docs, and each instance (each index core) has 120 million doc

Re: Reporting tools

2012-03-09 Thread Ahmet Arslan
> Are there any reporting tools out > there? So I can analyze search term > frequency, filter frequency, etc? You might be interested in this: http://www.sematext.com/search-analytics/index.html

Re: docBoost with "fq" search

2012-03-09 Thread Tanguy Moal
Hi Gian Marco, I don't know if it's possible to exploit documents' boost values from function queries (see http://wiki.apache.org/solr/FunctionQuery), but if you store your boost in a search-able numeric field, you could either: do q=*:* AND _val_:"your_boost_field" if you're using default

Re: docBoost with "fq" search

2012-03-09 Thread Gian Marco Tagliani
Hi Ahmet, thanks for the answer. I'm really surprised, because I always thought of docBoost as a kind of sorting tool. And I used it in that way: I'm giving a big boost to the documents I want back first in search. Do you think there is a trick to force the usage of docBoost in my special case? Gian Ma

Re: Reporting tools

2012-03-09 Thread Tommaso Teofili
as Gora says there is the stats component you can take advantage of or you could also use JMX directly [1] or LucidGaze [2][3] or commercial services like [4] or [5] (these are the ones I know but there may be also others), each of them with different level/type of service. Tommaso [1] : http://w

Re: Reporting tools

2012-03-09 Thread Gora Mohanty
On 9 March 2012 09:05, Donald Organ wrote: > Are there any reporting tools out there? So I can analyze search term > frequency, filter frequency, etc? Do not have direct experience of any Solr reporting tool, but please see the Solr StatsComponent: http://wiki.apache.org/solr/StatsComponent T