wrong index version of solr3.2?

2011-06-09 Thread Bernd Fehling
After switching to solr 3.2 and building a new index from scratch I ran check_index which reports: Segments file=segments_or numSegments=1 version=FORMAT_3_1 [Lucene 3.1] Why do I get FORMAT_3_1 and Lucene 3.1, anything wrong with my index? from my schema.xml: from my solrconfig.xml: LUCENE_

Re: Multiple Values not getting Indexed

2011-06-09 Thread Stefan Matheis
Pawan, just separating multiple values by comma does not make them multi-value in solr-speak. But if you're already using DIH, you may try the http://wiki.apache.org/solr/DataImportHandler#RegexTransformer to 'splitBy' the field and get the expected field-values Regards Stefan On Thu, Jun 9, 201

Re: Code for getting distinct facet counts across shards(Distributed Process).

2011-06-09 Thread Bill Bell
I have coded and tested this and it appears to work. Are you having any problems? On 6/9/11 12:35 AM, "rajini maski" wrote: > In solr 1.4.1, for getting "distinct facet terms count" across shards, > > > >The piece of code added for getting count of distinct facet terms across >distributed proce

Re: Multiple Values not getting Indexed

2011-06-09 Thread Bill Bell
Is there a way to splitBy and trim the field after splitting? I know I can do it with Javascript in DIH, but how about using the regex parser? On 6/9/11 1:18 AM, "Stefan Matheis" wrote: >Pawan, > >just separating multiple values by comma does not make them >multi-value in solr-speak. But if you

Re: Multiple Values not getting Indexed

2011-06-09 Thread Bill Bell
You have to take the input and splitBy something like "," to get it into an array and reposted back to Solr... I believe others have suggested that? On 6/8/11 10:14 PM, "Pawan Darira" wrote: >Hi > >I am trying to index 2 fields with multiple values. BUT, it is only >putting >1 value for each &
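A minimal sketch of the client-side approach suggested here: split the raw comma-separated string into trimmed values before the document is sent to Solr. The field names `id` and `category` are hypothetical, chosen only for illustration, and the posting step itself is left out.

```python
def split_multivalue(raw, separator=","):
    """Split a delimited string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


def to_solr_doc(doc_id, raw_value):
    # "id" and "category" are hypothetical field names; "category" would need
    # multiValued="true" in schema.xml for Solr to keep every value.
    return {"id": doc_id, "category": split_multivalue(raw_value)}


doc = to_solr_doc("42", "books, music ,  films,")
```

Each list element would then be sent as a separate value of the multi-valued field, instead of one comma-joined string.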

Solr monitoring: Newrelic

2011-06-09 Thread roySolr
Hello, I found this tool to monitor Solr queries, cache etc.: http://newrelic.com/ I have some problems installing it. I get the following errors: Could not locate a Tomcat, Jetty or JBoss instance in /var/www/sites/royr Try re-running the install command from

Re: Solr monitoring: Newrelic

2011-06-09 Thread Sujatha Arun
You need to install the newrelic folder under the "tomcat" folder, if the app server is Tomcat. Then, from the command line, you need to run the install command given on the New Relic site from your newrelic folder. Once this is done, restart the app server and you should be able to see a log f

Re: Solr monitoring: Newrelic

2011-06-09 Thread roySolr
I use Jetty, it's standard in the solr package. Where can i find the "jetty" folder? then i can start this command: java -jar newrelic.jar install -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-monitoring-Newrelic-tp3042889p3042981.html Sent from the Solr - User mail

Re: Displaying highlights in formatted HTML document

2011-06-09 Thread lboutros
Hi Bryan, how do you index your html files ? I mean do you create fields for different parts of your document (for different stop words lists, stemming, etc) ? with DIH or solrj or something else ? iorixxx, could you please explain a bit more your solution, because I don't see how your solution

Re: Solr monitoring: Newrelic

2011-06-09 Thread Sujatha Arun
There is no jetty folder in the standard package, but the jetty war file is under the example/lib folder, so this is where you need to put the newrelic folder, I guess. Regards Sujatha On Thu, Jun 9, 2011 at 2:03 PM, roySolr wrote: > > I use Jetty, it's standard in the solr package. Where can i find >

Re: Solr monitoring: Newrelic

2011-06-09 Thread roySolr
Yes, that's the problem. There is no jetty folder. I have tried the example/lib directory, but it's not working. There is no jetty war file, only jetty-***.jar files. Same error: could not locate a jetty instance. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-monitoring-Newr

Re: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
> iorixxx, could you please explain a bit more your solution, > because I don't > see how your solution could give an "exact highlighting", I > mean with the > different fields analysis for each fields. It does not work with your use case (e.g. different synonyms applied different parts of the ht

ExtractingRequestHandler - renaming tika generated fields

2011-06-09 Thread Jan Høydahl
Hi, I post a PDF from a CMS client, which has metadata about the document. One of those metadata is the title. I trust the title of the CMS more than the title extracted from the PDF, but I cannot find a way to both send &literal.title=CMS-Title as well as changing the name of the title field

Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Mohammad Shariq
Can I specify multiple language in filter tag in schema.xml ??? like below On 8 June 2011 18:47, Erick Erickson wrote: > This page is a handy reference for individual languages... > http://wiki.apache.org/solr/LanguageAnalysis > > But the usual approa

Re: Solr monitoring: Newrelic

2011-06-09 Thread Sujatha Arun
Try the RPM support, accessed from the account support page, giving all details; they are very helpful. Regards Sujatha On Thu, Jun 9, 2011 at 2:33 PM, roySolr wrote: > Yes, that's the problem. There is no jetty folder. > I have try the example/lib directory, it's not working. There is no jetty

Re: AW: How to deal with many files using solr external file field

2011-06-09 Thread Martin Grotzke
Hi, as I'm also involved in this issue (on the side of Sven) I created a patch, that replaces the float array by a map that stores score by doc, so it contains as many entries as the external scoring file contains lines, but no more. I created an issue for this: https://issues.apache.org/jira/bro

Re: Tokenising based on known words?

2011-06-09 Thread lee carroll
we've played with HyphenationCompoundWordTokenFilterFactory it works better than maintaining a word dictionary to split (although we ended up not using it for reasons i can't recall) see http://lucene.apache.org/solr/api/org/apache/solr/analysis/HyphenationCompoundWordTokenFilterFactory.html O

Boost or sort a query with range values

2011-06-09 Thread jlefebvre
Hello, I am trying to boost a query with a range of values, but I can't find the correct syntax. This is OK: .&bq=myfield:"-1"^5 but I want to do something like this: &bq=myfield:"-1 to 1"^5 Boost values from -1 to 1. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-or-sor

Re: Boost or sort a query with range values

2011-06-09 Thread lee carroll
[* TO *]^5 On 9 June 2011 11:31, jlefebvre wrote: > Hello > > I try to boost a query with a range values but I can't find the correct > syntax : > this is ok .&bq=myfield:"-1"^5 but I want to do something lik this > &bq=myfield:"-1 to 1"^5 > > Boost value from -1 to 1 > > thanks > > -- > View
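The bracketed range syntax in this reply can be applied to the -1 to 1 case from the question. The sketch below builds such a `bq` value and URL-encodes it; the helper name `range_boost` and the sample `q` value are assumptions made for illustration, not part of the thread.

```python
from urllib.parse import urlencode


def range_boost(field, low, high, boost):
    """Build a boost query clause using Lucene range syntax: field:[low TO high]^boost."""
    return f"{field}:[{low} TO {high}]^{boost}"


# "lg optimus" is only a placeholder user query.
params = {"q": "lg optimus", "bq": range_boost("myfield", -1, 1, 5)}
query_string = urlencode(params)
```

Note that `TO` has to be upper case in a range query, which is one difference from the `"-1 to 1"` attempt in the question.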

Re: Boost or sort a query with range values

2011-06-09 Thread jlefebvre
thanks it's ok another question how to do a condition in &bq ? something like &bq=iif(myfield1 = 0 AND myfield2 = 1;1;0) thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-or-sort-a-query-with-range-values-tp3043328p3043406.html Sent from the Solr - User mailing l

Re: Boost or sort a query with range values

2011-06-09 Thread Jan Høydahl
Check the new if() function in Trunk, SOLR-2136. You could then use it in &bf= or &boost= -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. juni 2011, at 13.05, jlefebvre wrote: > thanks it's ok > > another question > how to d

Re: Boost or sort a query with range values

2011-06-09 Thread Jan Høydahl
Btw. your example is a simple boolean query, and this will also work: &bq=(myfield1:0 AND myfield2:1)^100.0 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. juni 2011, at 13.31, Jan Høydahl wrote: > Check the new if() function

Re: London open source search social - 13th June

2011-06-09 Thread Richard Marr
Just a quick reminder that we're meeting on Monday. Come along if you're around. On 1 June 2011 13:27, Richard Marr wrote: > Hi guys, > > Just to let you know we're meeting up to talk all-things-search on Monday > 13th June. There's usually a good mix of backgrounds and experience levels > so i

[Mahout] Integration with Solr

2011-06-09 Thread Adam Estrada
Has anyone integrated Mahout with Solr? I know that Carrot2 is part of the core build but the docs say that it's not very good for very large indexes. Anyone have thoughts on this? Thanks, Adam

Re: Tokenising based on known words?

2011-06-09 Thread Mark Mandel
Synonyms really wouldn't work for every possible combination of words in our index. Thanks for the idea though. Mark On Thu, Jun 9, 2011 at 3:42 PM, Gora Mohanty wrote: > On Thu, Jun 9, 2011 at 4:37 AM, Mark Mandel wrote: > > Not sure if this possible, but figured I would ask the question. >

Edismax sorting help

2011-06-09 Thread Denis Kuzmenok
Hi, everyone. I have fields: text fields: name, title, text boolean field: isflag (true / false) int field: popularity (0 to 9) Now i do query: defType=edismax start=0 rows=20 fl=id,name q=lg optimus fq= qf=name^3 title text^0.3 sort=score desc pf=name bf=isflag sqrt(popularity) mm=100% debug

Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor
Naveen, Not sure our requirement matches yours, but one of the things we index is a "comment" item that can have one or more files attached to it. To index the whole thing as a single Solr document we create a zipfile containing a file with the comment details in it and any additional attach

Re: [Mahout] Integration with Solr

2011-06-09 Thread Tomás Fernández Löbbe
I don't know much of it, but I know Grant Ingersoll posted about that: http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ On Thu, Jun 9, 2011 at 9:24 AM, Adam Estrada wrote: > Has anyone integrated Mahout with Solr? I know that Carro

RE: Tokenising based on known words?

2011-06-09 Thread Steven A Rowe
Hi Mark, Are you familiar with shingles aka token n-grams? http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html Use the empty string for the tokenSeparator to get wordstogether style tokens in your index. I think you'll want to apply this filter only at index-t
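To make the shingle idea concrete, here is a small Python illustration of what token n-grams with an empty separator look like. It only imitates the concept (two-token shingles, unigrams kept); it is not the Lucene ShingleFilter implementation, and the function name and parameters are assumptions for this sketch.

```python
def shingles(tokens, min_size=2, max_size=2, separator="", output_unigrams=True):
    """Return token n-grams ("shingles") joined by `separator`, in stream order."""
    out = []
    for i, token in enumerate(tokens):
        if output_unigrams:
            out.append(token)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


result = shingles(["words", "together", "style"])
```

With an empty separator, "words together" also produces the single token "wordstogether", so a query for the run-together form can match the original two-word text.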

how can I return function results in my query?

2011-06-09 Thread Jason Toy
I want to be able to run a query like idf(text, 'term') and have that data returned with my search results. I've searched the docs,but I'm unable to find how to do it. Is this possible and how can I do that ?

Re: how can I return function results in my query?

2011-06-09 Thread Ahmet Arslan
> I want to be able to run a > query  like idf(text, 'term') and have that data > returned with my search results.  I've searched the > docs,but I'm unable to > find how to do it.  Is this possible and how can I do > that ? http://wiki.apache.org/solr/FunctionQuery#idf

Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Erick Erickson
No, you'd have to create multiple fieldTypes, one for each language Best Erick On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq wrote: > Can I specify multiple language in filter tag in schema.xml ???  like below > > >   >       >       words="stopwords.txt" enablePositionIncrements="true"/

Re: Edismax sorting help

2011-06-09 Thread Yonik Seeley
2011/6/9 Denis Kuzmenok : > Hi, everyone. > > I have fields: > text fields: name, title, text > boolean field: isflag (true / false) > int field: popularity (0 to 9) > > Now i do query: > defType=edismax > start=0 > rows=20 > fl=id,name > q=lg optimus > fq= > qf=name^3 title text^0.3 > sort=sco

Re: Solr monitoring: Newrelic

2011-06-09 Thread Ken Krugler
It sounds like "roySolr" is running embedded Jetty, launching solr using the start.jar If so, then there's no app container where Newrelic can be installed. -- Ken On Jun 9, 2011, at 2:28am, Sujatha Arun wrote: > Try the RPM support accessed from the accout support page ,Giving all > details

Re: Edismax sorting help

2011-06-09 Thread Denis Kuzmenok
Your solution seems to work fine, not perfect, but much better than mine :) Thanks! >> If i do query like "Samsung" i want to see prior most relevant results >> with  isflag:true and bigger popularity, but if i do query like "Nokia >> 6500"  and  there is isflag:false, then it should be higher

Re: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Koji Sekiguchi
(11/06/09 4:24), Burton-West, Tom wrote: We are trying to implement highlighting for wildcard (MultiTerm) queries. This seems to work fine with the regular highlighter but when we try to use the fastVectorHighlighter we don't see any results in the highlighting section of the response. Appe

Re: [Mahout] Integration with Solr

2011-06-09 Thread Tommaso Teofili
Hello Adam, I've managed to create a small POC of integrating Mahout with Solr for a clustering task, do you want to use it for clustering only or possibly for other purposes/algorithms? More generally speaking, I think it'd be nice if Solr could be extended with a proper API for integrating cluste

Indexing data from multiple datasources

2011-06-09 Thread Greg Georges
Hello all, I have checked the forums to see if it is possible to create an index from multiple datasources. I have found references to SOLR-1358, but I don't think this fits my scenario. In all, we have an application where we upload files. On the file upload, I use the Tika extract handler to

[Free Text] Field Tokenizing

2011-06-09 Thread Adam Estrada
All, I am at a bit of a loss here so any help would be greatly appreciated. I am using the DIH to grab data from a DB. The field that I am most interested in has anywhere from 1 word to several paragraphs worth of free text. What I would really like to do is pull out phrases like "Joe's coffee sho

Re: [Mahout] Integration with Solr

2011-06-09 Thread Adam Estrada
Thanks for the reply, Tommaso! I would like to see tighter integration like in the way Nutch integrates with Solr. There is a single param that you set which points to the Solr instance. My interest in Mahout is with its ability to handle large data and find frequency, co-location of data, cluste

RE: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Burton-West, Tom
Hi Koji, Thank you for your reply. >> It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery >> and DisjunctionMaxQuery >> and Query constructed by those queries. Sorry, I'm not sure I understand. Are you saying that FVH supports MultiTerm highlighting? Tom

Re: ExtractingRequestHandler - renaming tika generated fields

2011-06-09 Thread Jan Høydahl
One solution to this problem is to change the order of field operation (http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations) to first do fmap.*= processing, then add the fields from literal.*=. Why would anyone want to rename a field they just have explicitly named any

Re: Does MultiTerm highlighting work with the fastVectorHighlighter?

2011-06-09 Thread Koji Sekiguchi
(11/06/10 0:14), Burton-West, Tom wrote: Hi Koji, Thank you for your reply. It is the feature of FVH. FVH supports TermQuery, PhraseQuery, BooleanQuery and DisjunctionMaxQuery and Query constructed by those queries. Sorry, I'm not sure I understand. Are you saying that FVH supports MultiT

Re: Indexing data from multiple datasources

2011-06-09 Thread Erick Erickson
Hmmm, when you say you use Tika, are you using some custom Java code? Because if you are, the best thing to do is query your database at that point and add whatever information you need to the document. If you're using DIH to do the crawl, consider implementing a Transformer to do the database que

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Erick Erickson
The problem here is that none of the built-in filters or tokenizers have a prayer of recognizing what #you# think are phrases, since it'll be unique to your situation. If you have a list of phrases you care about, you could substitute a single token for the phrases you care about... But the overr

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Adam Estrada
Erick, I totally understand that BUT the keyword tokenizer factory does a really good job extracting phrases (or what look like phrases) from my data. I don't know why exactly but it does do it. I am going to continue working through it to see if I can't figure it out ;-) Adam On Thu, Jun 9

Re: [Free Text] Field Tokenizing

2011-06-09 Thread Erick Erickson
The KeywordTokenizer doesn't do anything to break up the input stream, it just treats the whole input to the field as a single token. So I don't think you'll be able to "extract" anything starting with that tokenizer. Look at the admin/analysis page to see a step-by-step breakdown of what your ana
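The difference described here can be shown with a toy comparison in Python. These two functions only imitate the behaviour (whole input as one token versus splitting on whitespace); they are not Solr code, and the names are assumptions for this sketch.

```python
def keyword_tokenize(text):
    """Imitates a keyword tokenizer: the entire field value becomes one token."""
    return [text]


def whitespace_tokenize(text):
    """Imitates a whitespace tokenizer: one token per whitespace-separated word."""
    return text.split()


value = "Joe's coffee shop on Main Street"
as_keyword = keyword_tokenize(value)
as_words = whitespace_tokenize(value)
```

Because the keyword form never splits the input, a short field value can look like an extracted phrase even though no phrase detection took place.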

RE: Indexing data from multiple datasources

2011-06-09 Thread Greg Georges
Hello Erick, Thanks for the response. No, I am using the extract handler to extract the data from my text files. In your second approach, you say I could use a DIH to update the index which would have been created by the extract handler in the first phase. I thought that lets say I get info fro

Re: Indexing data from multiple datasources

2011-06-09 Thread Erick Erickson
How are you using it? Streaming the files to Solr via HTTP? You can use Tika on the client to extract the various bits from the structured documents, and use SolrJ to assemble various bits of that data Tika exposes into a Solr document that you then send to Solr. At the point you're transferring da

RE: Indexing data from multiple datasources

2011-06-09 Thread David Ross
This thread got me thinking a bit... Does SOLR support the concept of "partial updates" to documents? By this I mean updating a subset of fields in a document that already exists in the index, and without having to resubmit the entire document. An example would be storing/indexing user tags ass

RE: Indexing data from multiple datasources

2011-06-09 Thread Greg Georges
No from what I understand, the way Solr does an update is to delete the document, then recreate all the fields, there is no partial updating of the file.. maybe because of performance issues or locking? -Original Message- From: David Ross [mailto:davidtr...@hotmail.com] Sent: 9 juin 201
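Since an update replaces the whole document, a client that wants to change one field has to resend all of them. A minimal sketch of that read-modify-write step, assuming the current field values are available to the client (for example because every field is stored); the field names are hypothetical and the fetch and post steps are omitted.

```python
def merged_document(stored, changes):
    """Return the full document to resubmit: stored fields overlaid with changes."""
    doc = dict(stored)   # copy, so the fetched document is left untouched
    doc.update(changes)  # changed fields overwrite the old values
    return doc


# Hypothetical document as previously indexed, plus a new set of user tags.
stored = {"id": "doc-1", "title": "Report", "tags": ["draft"]}
full_doc = merged_document(stored, {"tags": ["draft", "reviewed"]})
```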

Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor and RegexTransformer as proposed in http://robotli

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, to make my point more clear: if the CSV has a fixed schema / column layout, using the RegexTransformer is of course a possibility (however awkward). But if you want to implement a (more or less) schema free shopping search engine ... regards On Thu, Jun 9, 2011 at 9:31 PM, Helmut Hoffer von

Unique Results from Edgy Text

2011-06-09 Thread Jamie Johnson
I am using the guide found here ( http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/) to build an autocomplete search capability but in my data set I have some documents which have the same value for the field that is being returned, so for instance

RE: Processing/Indexing CSV

2011-06-09 Thread Dyer, James
Helmut, I recently submitted SOLR-2549 (https://issues.apache.org/jira/browse/SOLR-2549) to handle both fixed-width and delimited flat files. To be honest, I only needed fixed-width support for my app so this might not support everything you mention for delimited files, but it should be a goo

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
Ludovic, >> how do you index your html files ? I mean do you create fields for different parts of your document (for different stop words lists, stemming, etc) ? with DIH or solrj or something else ? << We are sending them over http, and using Tika to strip the HTML, at present. We do not split

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen wrote: > Hi, > > there seems to be no way to index CSV using the DataImportHandler. Looking over the features you want, it looks like you're starting from a CSV file (as opposed to CSV stored in a database). Is there a reason that you

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread lboutros
I am not (yet) a tika user, perhaps that the iorixxx's solution is good for you. We will share the highlighter module and 2 other developments soon. ('have to see how to do that) Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Displaying-highli

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, just looked at your code. Definitely an improvement :-) The problem with the double-quotes is, that the delimiter (let's say ',') might be part of the column value. The goal is to process something like this without any tricky configuration name1,name2,name3 val1,"val2,...",val3 ... The use
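For reference, this is the quoting behaviour being asked for, shown with Python's standard csv module on the sample from this message: a delimiter inside double quotes stays part of the column value. It is only an illustration of the expected parsing, not a DIH or CSVLoader configuration.

```python
import csv
import io

# Sample taken from the message: the second value contains the delimiter.
sample = 'name1,name2,name3\nval1,"val2,...",val3\n'

# DictReader uses the first line as the column names.
rows = list(csv.DictReader(io.StringIO(sample)))
first = rows[0]
```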

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
s/provide and/provide any/ig ,-) On Thu, Jun 9, 2011 at 10:01 PM, Helmut Hoffer von Ankershoffen < helmut...@googlemail.com> wrote: > Hi, > > just looked at your code. Definitely an improvement :-) > > The problem with the double-quotes is, that the delimiter (let's say ',') > might be part of th

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
> -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Wednesday, June 08, 2011 11:56 PM > To: solr-user@lucene.apache.org > Subject: Re: Displaying highlights in formatted HTML document > > > > --- On Thu, 6/9/11, Bryan Loofbourrow > wrote: > > > From: Bryan Loofbour

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. DIH is flexible enough for building the importing part of such a thing but misses elegant handling of CSV data ... Regards On Thu, Jun 9, 2

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen wrote: > Hi, > yes, it's about CSV files loaded via HTTP from shops to be fed into a > shopping search engine. > The CSV Loader cannot map fields (only field values) etc. You can provide your own list of fieldnames and optionally igno

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different csv files (from different shops) with individual column layouts (separators, encodings etc.). The idea is to map known field names to defined field names in th

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
> OK, I think see what you're up to. Might be pretty viable > for me as well. > Can you talk about anything in your mappings.txt files that > is an > important part of the solution? It is not important. I just copied it. Plus html strip char filter does not have mappings parameter. It was a copy

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Bryan Loofbourrow
> > OK, I think see what you're up to. Might be pretty viable > > for me as well. > > Can you talk about anything in your mappings.txt files that > > is an > > important part of the solution? > > It is not important. I just copied it. Plus html strip char filter does > not have mappings parameter.

RE: solr Invalid Date in Date Math String/Invalid Date String

2011-06-09 Thread Chris Hostetter
: Here is the error message: : : Fieldtype: tdate (I use the default one in solr schema.xml) : Field value(Index): 2006-12-22T13:52:13Z : Field value(query): [2006-12-22T00:00:00Z TO 2006-12-22T23:59:59Z] <<< : with '[' and ']' : : And it generates the result below: i think the piece of info

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: > Hi, > > ... that would be an option if there is a defined set of field names and a > single column/CSV layout. The scenario however is different csv files (from > different shops) with individual column layouts (separators, encod

Re: Solr Indexing Patterns

2011-06-09 Thread Judioo
Very informative links and statement Jonathan. thank you. On 6 June 2011 20:55, Jonathan Rochkind wrote: > This is a start, for many common best practices: > > http://wiki.apache.org/solr/SolrRelevancyFAQ > > Many of the questions in there have an answer that involves de-normalizing. > As an e

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler wrote: > > On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: > > > Hi, > > > > ... that would be an option if there is a defined set of field names and > a > > single column/CSV layout. The scenario however is different csv files > (from

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, btw: there seems to be somewhat of a mismatch between the efforts to enhance DIH regarding the CSV format (James Dyer) and the effort to maintain the CSVLoader (Ken Krugler). How about merging your efforts and migrating the CSVLoader to a CSVEntityProcessor (cp. my initial email)? :-) Best Regard

RE: Displaying highlights in formatted HTML document

2011-06-09 Thread Ahmet Arslan
> Yes, I asked the wrong question. What I was subconsciously > getting at is > this: how are you avoiding the possibility of getting hits > in the HTML > elements? Is that accomplished by putting tag names in your > stopwords, or > by some other mechanism? HtmlStripCharFilter removes html tags. Af

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 2:21pm, Helmut Hoffer von Ankershoffen wrote: > Hi, > > btw: there seems to somewhat of a non-match regarding efforts to Enhance DIH > regarding the CSV format (James Dyer) and the effort to maintain the > CSVLoader (Ken Krugler). How about merging your efforts and migrating t

SolrCloud questions

2011-06-09 Thread Upayavira
I'm exploring SolrCloud for a new project, and have some questions based upon what I've found so far. The setup I'm planning is going to have a number of multicore hosts, with cores being moved between hosts, and potentially with cores merging as they get older (cores are time based, so once today

Re: Tokenising based on known words?

2011-06-09 Thread Mark Mandel
Thanks for the feedback! This definitely gives me some options to work on! Mark On Thu, Jun 9, 2011 at 11:21 PM, Steven A Rowe wrote: > Hi Mark, > > Are you familiar with shingles aka token n-grams? > > > http://lucene.apache.org/solr/api/org/apache/solr/analysis/ShingleFilterFactory.html > > U

Where to find the Log file

2011-06-09 Thread Ruixiang Zhang
Where can I find the log file of solr? Is it turned on by default? (I use Jetty) Thanks Ruixiang

Re: Boosting result on query.

2011-06-09 Thread Jeff Boul
Hi, Thank you for your answer. But... I cannot use a boost calculated offline since the boost will change depending on the query made. Each query will boost the query differently. Any other ideas? Jeff -- View this message in context: http://lucene.472066.n3.nabble.com/Boosting-result-on-

Re: Where to find the Log file

2011-06-09 Thread Jack Repenning
On Jun 9, 2011, at 5:45 PM, Ruixiang Zhang wrote: > Where can I find the log file of solr? (I use > Jetty) By default, it's in /solr/logs/solr.log > Is it turned on by default? Yes. Oh, yes. Very much so. Uh-huh, you betcha. -==- Jack Repenning Technologist Codesion Business Unit CollabNet,

Re: Where to find the Log file

2011-06-09 Thread Morris Mwanga
Here's help on how to setup logging http://skybert.wordpress.com/2009/07/22/how-to-get-solr-to-log-to-a-log-file/ - Morris - Original Message - From: "Ruixiang Zhang" To: solr-user@lucene.apache.org Sent: Thursday, June 9, 2011 8:45:30 PM GMT -05:00 US/Canada Eastern Subject: Where to

Re: tika integration exception and other related queries

2011-06-09 Thread Naveen Gupta
Hi Gary, Similar thing we are doing, but we are not creating an XML doc, rather we are leaving TIKA to extract the content and depends on dynamic fields. We are not storing the text as well. But not sure if in future that would be the case. What about microsoft 7 and later related attachments. Is

ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi This is my document in php $xmldoc = 'F_14674gmail.com121sample.pptx'; $ch = curl_init("http://localhost:8080/solr/update"); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_HTTPHEADER, array("Co

Re: how to Index and Search non-Eglish Text in solr

2011-06-09 Thread Mohammad Shariq
Thanks Erick for your help. I have another silly question. Suppose I created multiple fieldTypes, e.g. news_English, news_Chinese, news_Japnese etc. After creating these fields, can I copy all of them to the copyField "defaultquery" like below: and my "defaultquery" looks like: Is this right

Re: Multiple Values not getting Indexed

2011-06-09 Thread Pawan Darira
it did not work :( On Thu, Jun 9, 2011 at 12:53 PM, Bill Bell wrote: > You have to take the input and splitBy something like "," to get it into > an array and reposted back to > Solr... > > I believe others have suggested that? > > On 6/8/11 10:14 PM, "Pawan Darira" wrote: > > >Hi > > > >I am t

Re: Multiple Values not getting Indexed

2011-06-09 Thread Gora Mohanty
On Fri, Jun 10, 2011 at 10:36 AM, Pawan Darira wrote: > it did not work :( [...] Please provide more details of what you tried, what was the error, and any error messages that you got. Just saying that "it did not work" makes it pretty much impossible for anyone to help you. You might take a loo

Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen Gupta
Hi, curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary 'testdoc' Regards Naveen On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta wrote: > Hi > > This is my document > > in php > > $xmldoc = 'F_146 name="userid">74gmail.com name="attachment_size">121 nam

Re: ERROR on posting update request using CURL in php

2011-06-09 Thread Naveen
Hi, Basically i need to post something like this using curl in php The example of php explained in earlier thread, curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary 'testdoc' Should we need to create a temp file and using put command can we do it u

Re: SolrCloud questions

2011-06-09 Thread Mohammad Shariq
I am also planning to move to SolrCloud; since it's still under development, I am not sure about its behavior in production. Please update us once you find it stable. On 10 June 2011 03:56, Upayavira wrote: > I'm exploring SolrCloud for a new project, and have some questions based > upon what

Re: Unique Results from Edgy Text

2011-06-09 Thread Ahmet Arslan
--- On Thu, 6/9/11, Jamie Johnson wrote: > From: Jamie Johnson > Subject: Unique Results from Edgy Text > To: solr-user@lucene.apache.org > Date: Thursday, June 9, 2011, 10:42 PM > I am using the guide found here ( > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-que

Re: Indexing data from multiple datasources

2011-06-09 Thread Tom Gross
it's a feature request since ages ... https://issues.apache.org/jira/browse/SOLR-139 On 06/09/2011 09:25 PM, Greg Georges wrote: No from what I understand, the way Solr does an update is to delete the document, then recreate all the fields, there is no partial updating of the file.. maybe bec