Re: KStemmer for Solr 3.x +

2011-04-08 Thread David Smiley (@MITRE.org)
I see no reason why it would not be compatible. - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

Re: Lucid Works

2011-04-08 Thread Erik Hatcher
On Apr 8, 2011, at 17:32 , Mark wrote: > How come this new version is bundled with rails and why is there no .war > output format? Rails, via JRuby, is used in LucidWorks Enterprise for both the admin and search interfaces. (and also powers the Alerts REST API). > I wanted a simple drop in re

Re: Lucid Works

2011-04-08 Thread Mark
How come this new version is bundled with rails and why is there no .war output format? I wanted a simple drop in replacement for my current war :( On 4/8/11 1:27 PM, Andrzej Bialecki wrote: On 4/8/11 9:55 PM, Andy wrote: --- On Fri, 4/8/11, Andrzej Bialecki wrote: :) If you don't need t

Re: Special characters during indexing and searching

2011-04-08 Thread alexw
Sorry wrong link to the thread, here is the correct one: http://lucene.472066.n3.nabble.com/Special-characters-during-indexing-and-searching-td2795914.html

Re: Special characters during indexing and searching

2011-04-08 Thread alexw
I am using Nabble to view the thread, and the format seems to be ok: http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=reply&node=2796849 1> what version of Solr. Solr 1.4 2> have you looked in your index (admin page and/or luke) to see if what you have indexed there is what you

Re: KStemmer for Solr 3.x +

2011-04-08 Thread Mark
And by alternatives I meant if it was not 3.1 compatible what are other stemmers that behave/perform closely. thanks On 4/8/11 1:59 PM, Mark wrote: I only want an alternative if it is not compatible with 3.1. On 4/8/11 1:26 PM, Smiley, David W. wrote: LucidKStemmer (& LucidGaze) are LGPL lic

Re: KStemmer for Solr 3.x +

2011-04-08 Thread Mark
I only want an alternative if it is not compatible with 3.1. On 4/8/11 1:26 PM, Smiley, David W. wrote: LucidKStemmer (& LucidGaze) are LGPL licensed -- I just verified this with the NOTICE.txt in the download. I wish Lucid's site was more clear on this -- I checked their first but found no i

Re: UIMA example setup w/o OpenCalais

2011-04-08 Thread Jay Luker
Thank you, that worked. For the record, my objection to the OpenCalais service is that their ToS states that they will "retain a copy of the metadata submitted by you", and that by submitting data to the service you "grant Thomson Reuters a non-exclusive perpetual, sublicensable, royalty-free lice

ArrayIndexOutOfBoundsException with facet query

2011-04-08 Thread Burton-West, Tom
The query below results in an array out of bounds exception: select/?q=solr&version=2.2&start=0&rows=0&facet=true&facet.field=topicStr Here is the exception: Exception during facet.field of topicStr:java.lang.ArrayIndexOutOfBoundsException: -1931149 at org.apache.lucene.index.TermInfosR

Re: Lucid Works

2011-04-08 Thread Andrzej Bialecki
On 4/8/11 9:55 PM, Andy wrote: --- On Fri, 4/8/11, Andrzej Bialecki wrote: :) If you don't need the new functionality in 4.x, you don't need the performance improvements, What performance improvements does 4.x have over 3.1? Ah... well, many - take a look at the CHANGES.txt. reindexi

Re: KStemmer for Solr 3.x +

2011-04-08 Thread Smiley, David W.
LucidKStemmer (& LucidGaze) are LGPL licensed -- I just verified this with the NOTICE.txt in the download. I wish Lucid's site was more clear on this -- I checked there first but found no information on the license terms. I don't know why you want an alternative. If you insist I suppose you cou

SOLR-236 (Field Collapsing) patch and 3.1

2011-04-08 Thread Will Milspec
Hi all, We're using the solr-236 (field collapsing) patch on solr 1.4.1 and wish to upgrade to 3.1 Has anyone applied this patch to 3.1, successfully or unsuccessfully? [ftr, Solr 4.x includes field collapsing; 3.1 does not ] The issue has several patch files, including some for 1.4.1 specifica

Re: Special characters during indexing and searching

2011-04-08 Thread Erick Erickson
I'm having real trouble with the formatting. Either Google has changed or somehow all the markup is getting stripped on your end. Could you send as plain text and see if that works? But from what I can make out, we're doing *something* different. Because I get parsed queries like below, and they'r

Re: Lucid Works

2011-04-08 Thread Andy
--- On Fri, 4/8/11, Andrzej Bialecki wrote: > :) If you don't need the new functionality in 4.x, you don't > need the performance improvements, What performance improvements does 4.x have over 3.1? > reindexing cycles are long (indexes tend to stay around) > then 3.1 is a safer bet. If you n

Re: Lucid Works

2011-04-08 Thread Andrzej Bialecki
On 4/8/11 4:58 PM, Mark wrote: Doesn't look like you allow new members to post questions in that forum. There's a "Create new account" link there, you simply need to register and log in. I have just one last question ;) We are deciding whether to upgrade our 1.4 production environment to

Re: Special characters during indexing and searching

2011-04-08 Thread alexw
Thanks Erick. Here is the Solr response with debug on. The productName IS in the qf parameter in dismax. I have also pasted my dismax definition and the "text" field type definition: 0 47 on on 0 bit/star dismax 10 2.2 bit/star bit/star bit/star +DisjunctionMaxQuery((longDesc:

RE: One item, multiple fields, and range queries

2011-04-08 Thread wojtekpia
Hi Hoss, I realize I'm reviving a really old thread, but I have the same need, and SpanNumericRangeQuery sounds like a good solution for me. Can you give me some guidance on how to implement that? Thanks, Wojtek

KStemmer for Solr 3.x +

2011-04-08 Thread Mark
Is there any compatible KStemmer (com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory) or equivalent that works well with 3.1? If not, what would be a decent alternative? Thanks

Re: Trying to Post. Emails rejected as spam.

2011-04-08 Thread Parker Johnson
I have tried to change to plain text format and reword my question several times. Weird and annoying. Here is my question, maybe it'll somehow go through this time: In my master/slave setup, my slaves are polling the master every minute. My indexes are getting large, to the point where I

Re: Tips for getting unique results?

2011-04-08 Thread Peter Spam
Thanks for the note, Shaun, but the documentation indicates that the sorting is only in ascending order :-( facet.sort This param determines the ordering of the facet field constraints. • count - sort the constraints by count (highest count first) • index - to return the constra

Re: Special characters during indexing and searching

2011-04-08 Thread Erick Erickson
This works fine for me. Tack on &debugQuery=on to your URL and post that please unless the stuff below helps But note a couple of things 1> productName isn't part of the default dismax configuration in your solrconfig.xml file, so unless you put it there it's not being searched on. Try putting

Re: Lucid Works

2011-04-08 Thread Erick Erickson
Unless you need the goodies in 4.x, I'd go with 3.1, just on the principle that 4.x is more fluid than 3.1, and I'd go with more static code. 4.x gets whatever patches the committers decide are good whereas 3.1 (or 3.2 if that comes out) will have a smaller set of changes. Both are well tested, it

Re: Strip spaces and new line characters from data

2011-04-08 Thread Erick Erickson
Your schema stuff didn't come through, possibly your mail server is removing it. But two things come to mind. First, Solr has a TrimFilterFactory, see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.TrimFilterFactory

Re: How to index PDF file stored in SQL Server 2008

2011-04-08 Thread Darx Oman
Hi again, what you are missing is the field mapping. There is no need for TikaEntityProcessor since you are not accessing PDF files.

Re: How to index PDF file stored in SQL Server 2008

2011-04-08 Thread Darx Oman
Hi there TikaEntityProcessor is available as part of DIH-extras*.jar in 3.x and 4.0

RE: Problems indexing very large set of documents

2011-04-08 Thread Brandon Waterloo
I think I've finally found the problem. The files that work are PDF version 1.6. The files that do NOT work are PDF version 1.4. I'll look into updating all the old documents to PDF 1.6. Thanks everyone! ~Brandon Waterloo From: Ezequiel Calderara [ezech...@gm

Special characters during indexing and searching

2011-04-08 Thread alexw
Hi, I have a field named "productName" in my schema which uses the standard "text" field type. One of my product names is "star/bit". When I search for "star/bit" (without quotes) using the dismax request handler, NO results were found. After some research, it looks like during indexing, "star/bit"

Re: Trade Mark symbol(TM) in Index

2011-04-08 Thread Markus Jelsma
http://lucene.472066.n3.nabble.com/Indexing-data-with-Trade-Mark-Symbol-td2774421.html > Hi, > > I have to jump into this topic. > > I can not find the mentioned replies, Markus but I still noticed that > problem, too. > > What could be the cause? > > Regards, > Em > > Markus Jelsma-2 wrote:

Re: Problems indexing very large set of documents

2011-04-08 Thread Ezequiel Calderara
Ohh sorry... didn't realize that they already sent you that link :P On Fri, Apr 8, 2011 at 12:35 PM, Ezequiel Calderara wrote: > Maybe those files are created with a different Adobe Format version... > > See this: > http://lucene.472066.n3.nabble.com/PDF-parser-exception-td644885.html > > On Fri,

Re: Problems indexing very large set of documents

2011-04-08 Thread Ezequiel Calderara
Maybe those files are created with a different Adobe Format version... See this: http://lucene.472066.n3.nabble.com/PDF-parser-exception-td644885.html On Fri, Apr 8, 2011 at 12:14 PM, Brandon Waterloo < brandon.water...@matrix.msu.edu> wrote: > A second test has revealed that it is something to

Re: Trade Mark symbol(TM) in Index

2011-04-08 Thread Em
Hi, I have to jump into this topic. I can not find the mentioned replies, Markus but I still noticed that problem, too. What could be the cause? Regards, Em Markus Jelsma-2 wrote: > > You opened the same thread this monday and got two replies. > >> Hi, >> Has anyone indexed the data with T

RE: Problems indexing very large set of documents

2011-04-08 Thread Brandon Waterloo
A second test has revealed that it is something to do with the contents, and not the literal filenames, of the second set of files. I renamed one of the second-format files and tested it and Solr still failed. However, the problem still only applies to those files of the second naming format.

Re: Lucid Works

2011-04-08 Thread Mark
Doesn't look like you allow new members to post questions in that forum. I have just one last question ;) We are deciding whether to upgrade our 1.4 production environment to 4.x or 3.1. What were your decisions when deciding to release 4.x over 3.1? Thanks again On 4/8/11 1:13 AM, Andrzej Bi

Surge 2011 CFP Deadline Extended

2011-04-08 Thread Katherine Jeschke
OmniTI is pleased to announce that the CFP deadline for Surge 2011, the Scalability and Performance Conference, (Baltimore: Sept 28-30, 2011) has been extended to 23:59:59 EDT, April 17, 2011. The event focuses upon case studies that demonstrate successes (and failures) in Web applications and Inte

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread Andy
Perfect. Thank you very much. Andy --- On Fri, 4/8/11, Pascal Coupet wrote: > From: Pascal Coupet > Subject: Re: Very very large scale Solr Deployment = how to do (Expert > Question)? > To: solr-user@lucene.apache.org > Date: Friday, April 8, 2011, 10:20 AM > I dit put a pdf version here: > h

RE: Problems indexing very large set of documents

2011-04-08 Thread Brandon Waterloo
I had some time to do some research into the problems. From what I can tell, it appears Solr is tripping up over the filename. These are strictly examples, but, Solr handles this filename fine: 32-130-A0-84-african_activist_archive-a0a6s3-b_12419.pdf However, it fails with either a parsing er

Strip spaces and new line characters from data

2011-04-08 Thread alexei
Hello Everyone, I am getting my integer field data from xml. Some docs fail because of a newline character at the end of the string. I am attempting to strip spaces and new line characters as follows: The above still results in a numberformatexception. Is this the right appr
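A minimal client-side sketch, assuming the values can be cleaned in Java before the documents are sent to Solr (the original question is about doing this in the schema, which is not shown here). The class and method names are hypothetical and only illustrate trimming surrounding spaces and newlines before parsing:

```java
final class IntFieldCleanerSketch {
    // Hypothetical helper: strips leading/trailing whitespace (spaces, tabs,
    // newlines) before parsing, so "42\n" no longer causes a NumberFormatException.
    static int parseTrimmed(String raw) {
        // String.trim() removes all leading and trailing characters <= U+0020,
        // which includes '\n', '\r', '\t', and ' '.
        return Integer.parseInt(raw.trim());
    }
}
```

Whitespace inside the number (for example "4 2") would still fail, which is probably the desired behaviour for an integer field.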

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread Pascal Coupet
I did put a pdf version here: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B02DHBZQYYT_MmRkZTY0YjQtODJmZS00Mzg0LWJiNTEtOWJjNzViNmNjZjdh&hl=en&authkey=CL2Fq_QG Zoom it to get a better view. Pascal 2011/4/8 Andy > Could anyone please post a version of the document in pdf or

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread Andy
Could anyone please post a version of the document in pdf or openoffice format? I'm on Linux so there's no way for me to use MS Word. Thanks. --- On Fri, 4/8/11, Albert Vila wrote: > From: Albert Vila > Subject: Re: Very very large scale Solr Deployment = how to do (Expert > Question)? > To

Re: MoreLikeThis match

2011-04-08 Thread Brian Lamb
I've looked at both wiki pages and none really clarify the difference between these two. If I copy and paste an existing index value for field and do an mlt search, it shows up under match but not results. What is the difference between these two? On Thu, Apr 7, 2011 at 2:24 PM, Brian Lamb wrote:

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread Albert Vila
Yes, it won't work if you are using OpenOffice. However, it works fine with Microsoft Word. Hope it helps. Albert On 8 April 2011 14:55, Andy wrote: > I can't view the document either -- it showed up empty. > > Has anyone succeeded in viewing it? > > Andy > > --- On Fri, 4/8/11, Albert Vila wro

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread Andy
I can't view the document either -- it showed up empty. Has anyone succeeded in viewing it? Andy --- On Fri, 4/8/11, Albert Vila wrote: > From: Albert Vila > Subject: Re: Very very large scale Solr Deployment = how to do (Expert > Question)? > To: solr-user@lucene.apache.org > Date: Friday,

Tutorial StreamingUpdateSolrServer

2011-04-08 Thread stockii
Hello. i want to change my full-imports from DIH to use of Java and StreamingUpdateSolrServer ... is in the wiki a little how to or something similar ? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Cor

solrj dependency wstx-asl

2011-04-08 Thread Tim Terlegård
solrj has a dependency on wstx-asl. I've successfully used Solr 1.4 maven artifacts for a while and the wstx-asl dependency had the wrong groupId so it's always been missing in my application, but it has still worked fine. Is wstx-asl really needed? Is it only needed in certain circumstances? Is it

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread François Schiettecatte
You might also want to look at the heritrix crawler too: http://crawler.archive.org/ I have written three crawlers in the past, all for RSS feeds, it is not easy. Happy to provide tips and help if you want to go down that route. François On Apr 8, 2011, at 1:53 AM, Andrea Campi wrote:

Re: Indexing pdf files - question.

2011-04-08 Thread Mike
Hi Erick, Thank you for the reply. Now I am able to index the PDF files and search. I am left with a couple of questions: 1. Can I add a custom field to the search response XML (e.g. a description field that gives a brief description of the PDF file)? 2. Currently Solr runs as a separate applicatio

Re: UIMA example setup w/o OpenCalais

2011-04-08 Thread Tommaso Teofili
Hi Jay, you should be able to do so by simply removing the OpenCalaisAnnotator from the execution pipeline commenting the line 124 of the file: solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml Hope this helps, Tommaso 2011/4/7 Jay Luker > Hi, > > I'd wo

Re: Tips for getting unique results?

2011-04-08 Thread Shaun Campbell
Pete, surely the default sort order for facets is by descending count. See http://wiki.apache.org/solr/SimpleFacetParameters. If you really need the results in ascending order, can't you sort them externally, e.g. in Java? Hope that helps. Shaun
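A rough sketch of the external sort suggested above, assuming the facet counts have already been read out of the Solr response into a plain map (how they get there is not shown, and the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FacetSortSketch {
    // Returns the facet constraints ordered by ascending count.
    // Ties are broken by the constraint value so the order is deterministic.
    static List<Map.Entry<String, Long>> ascendingByCount(Map<String, Long> facetCounts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(facetCounts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return entries;
    }
}
```

Because the sort happens on the client, only the constraints Solr actually returned can be reordered, so facet.limit may need to be large enough to include the low-count values.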

RE: Using MLT feature

2011-04-08 Thread Frederico Azeiteiro
Yes, i guess that could be an option, but I'm not very experienced with Java development and SOLR modifications. As my main goal was to create a similar sig in C#, I just use the c# method to create the sig myself before indexing instead of SOLR Deduplicate function. That way, when searching I c

Re: Using MLT feature

2011-04-08 Thread lboutros
Couldn't you extend the TextProfileSignature and modify the TokenComparator class to use lexical order when tokens have the same frequency? Ludovic. 2011/4/8 Frederico Azeiteiro [via Lucene] < ml-node+2794604-1683988626-383...@n3.nabble.com> > Hi. > > Yes, I manage to create a stable comparator
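The tie-break idea can be sketched roughly as follows. This is not the Solr source: the Token class here is a hypothetical stand-in with a `val` string and a `cnt` count, and the comparator only illustrates falling back to lexical order when two counts are equal, so the sorted profile no longer depends on the order in which tokens were inserted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
    // Hypothetical stand-in for the token type used by the signature code.
    static final class Token {
        final String val;
        final int cnt;
        Token(String val, int cnt) { this.val = val; this.cnt = cnt; }
    }

    // Orders by descending count, then by ascending token text on ties.
    static final class TokenComparator implements Comparator<Token> {
        @Override
        public int compare(Token t1, Token t2) {
            int byCount = Integer.compare(t2.cnt, t1.cnt);
            return byCount != 0 ? byCount : t1.val.compareTo(t2.val);
        }
    }

    // Sorts a copy of the profile and returns the token texts in final order.
    static List<String> sortedValues(List<Token> profile) {
        List<Token> copy = new ArrayList<>(profile);
        Collections.sort(copy, new TokenComparator());
        List<String> out = new ArrayList<>();
        for (Token t : copy) {
            out.add(t.val);
        }
        return out;
    }
}
```

With a comparator like this, a port in another language only has to reproduce the same two-level ordering (assuming its string comparison matches Java's compareTo), rather than the iteration order of the underlying map.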

RE: Using MLT feature

2011-04-08 Thread Frederico Azeiteiro
Hi. Yes, I managed to create a stable comparator in C# for profile. The problem comes before that, at: ... tokens.put(s, tok); ... Imagine you have 2 tokens with the same frequency; the stable sort comparator for profile will maintain the original order. The problem is that the original orde

Re: Using MLT feature

2011-04-08 Thread lboutros
It seems that tokens are sorted by frequencies : ... Collections.sort(profile, new TokenComparator()); ... and private static class TokenComparator implements Comparator { public int compare(Token t1, Token t2) { return t2.cnt - t1.cnt; } and cnt is the token count. Ludovic. 20

Re: Sourcesense packager

2011-04-08 Thread Simone Tripodi
Hi Mark, thanks for your interest!!! That's a feature we haven't had the time to work on (yet ;)) As a workaround, I suggest modifying the tomcat-users.xml file inside the produced tomcat, once unzipped. HTH, please let me know!!! Have a nice day, Simo http://people.apache.org/~s

StreamingUpdateSolrServer and PHP

2011-04-08 Thread stockii
is it possible to use StreamingUpdateSolrServer with a php application ? - --- System One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents other Cores < 100.000 - Solr1 for Search-Requests -

Re: Lucid Works

2011-04-08 Thread Andrzej Bialecki
On 4/7/11 10:16 PM, Mark wrote: Andrezej, Thanks for the info. I have a question regarding stability though. How are you able to guarantee the stability of this release when 4.0 is still a work in progress? I believe the last version Lucid released was 1.4 so why did you choose to release a 4.x

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread Albert Vila
Ephraim, I still can't view the document. Don't know if I'm doing something wrong, but I downloaded it and it appears to be empty. Albert On 7 April 2011 09:32, Ephraim Ofir wrote: > You can't view it online, but you should be able to download it from: > https://docs.google.com/leaf?id=0BwOEbnJ