On 21 April 2012 09:12, Bill Bell wrote:
> We are loading a long (number of seconds since 1970?) value into Solr using
> java and Solrj. What is the best way to convert this into the right Solr date
> fields?
[...]
There are various options, depending on the source of
your data, and how you are
We are loading a long (number of seconds since 1970?) value into Solr using
java and Solrj. What is the best way to convert this into the right Solr date
fields?
Sent from my Mobile device
720-256-8076
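Since the long is seconds since 1970 and java.util.Date expects milliseconds, the SolrJ side could look roughly like this minimal sketch (the field names "id" and "created_dt" and the URL are placeholders, not anything from the thread):

// Hedged sketch: turn an epoch-seconds long into a java.util.Date for a Solr date field.
import java.util.Date;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EpochToSolrDate {
  public static void main(String[] args) throws Exception {
    long epochSeconds = 1334995200L;                 // seconds since 1970 (assumed input)
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("created_dt", new Date(epochSeconds * 1000L)); // Date wants milliseconds
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.add(doc);                                 // SolrJ serializes the Date as ISO-8601
    server.commit();
  }
}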
Hi Joe,
You could write a custom URP - Update Request Processor. This URP would take
the value from one SolrDocument field (say the one that has the full path to
your PDF and is thus unique), compute an MD5 using the Java API for that, and
stick that MD5 value in some field that you've de
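A rough sketch of what such a processor might look like (the field names "path" and "md5" are placeholders; in a real setup it would be created by an UpdateRequestProcessorFactory registered in the update chain in solrconfig.xml):

// Hedged sketch of the idea above: hash one field's value and store it in another.
import java.io.IOException;
import java.security.MessageDigest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class Md5SignatureProcessor extends UpdateRequestProcessor {

  public Md5SignatureProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object path = doc.getFieldValue("path");          // e.g. the full path to the PDF
    if (path != null) {
      try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(path.toString().getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
          hex.append(String.format("%02x", b & 0xff));
        }
        doc.setField("md5", hex.toString());          // the computed signature
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
    super.processAdd(cmd);                            // pass the document down the chain
  }
}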
Kristian,
For what it's worth, for http://search-lucene.com and http://search-hadoop.com
we simply check out the source code from the SCM and index from the file
system. It works reasonably well. The only issues that I can recall us having
is with the source code organization under SCM - modu
This might help:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
The bit here is you have to have Tika parse your file
and then extract the content to send to Solr...
Best
Erick
On Fri, Apr 20, 2012 at 7:36 PM, vasuj wrote:
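For reference, the approach in that post boils down to something like the following sketch: parse the file locally with Tika, then send the extracted text with SolrJ (the path, URL and field names are placeholders):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrJIndexer {
  public static void main(String[] args) throws Exception {
    File pdf = new File("/path/to/file.pdf");                // placeholder path
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no character limit
    Metadata metadata = new Metadata();
    InputStream in = new FileInputStream(pdf);
    try {
      new AutoDetectParser().parse(in, handler, metadata);   // Tika extracts the body text
    } finally {
      in.close();
    }
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", pdf.getAbsolutePath());               // the path as a unique key, for example
    doc.addField("text", handler.toString());                // extracted content
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.add(doc);
    server.commit();
  }
}

The alternative is to stream the raw file to Solr's ExtractingRequestHandler and let Tika run server-side, as in the wiki example further down.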
Well, that's just the way Solr works. You can tune range
performance by adjusting precisionStep; Trie
fields are built to make range queries perform well.
Best
Erick
On Fri, Apr 20, 2012 at 10:20 AM, vybe3142 wrote:
> ... Inelegant as opposed to the possibility of using /DAY to speci
OK, this description really sounds like an XY problem. Why do you
want to do this? What is the higher-level problem you're trying to solve?
Best
Erick
On Fri, Apr 20, 2012 at 9:18 AM, Ramprakash Ramamoorthy
wrote:
> Dear all,
>
> Is there any way I can convert a SolrDocumentList to a DocL
I'm trying to index a few pdf documents using SolrJ as described at
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample; below is
the code:
import static
org.apache.solr.handler.extraction.ExtractingParams.LITERALS_PREFIX;
impor
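The wiki example amounts to roughly this sketch, streaming the raw PDF to the ExtractingRequestHandler so Tika runs inside Solr (the path, id and URL are placeholders; exact SolrJ method signatures differ slightly between versions):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPdfExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("/path/to/file.pdf"), "application/pdf"); // stream the raw PDF to Solr
    req.setParam("literal.id", "doc-1");                           // supply the unique key as a literal
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(req);                                           // Tika parsing happens inside Solr
  }
}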
Hello,
I have been trying out deduplication in solr by following:
http://wiki.apache.org/solr/Deduplication. I have defined a signature field
to hold the values of the signature created based on few other fields in a
document and the idea seems to work like a charm in a single solr instance.
But,
You could run the MLT for the document in question, then gather all
those doc ids in the MLT results and negate those in a subsequent
query. Not sure how well that would work with very large result sets,
but it's something to try.
Another approach would be to gather the "interesting terms" from the
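A rough SolrJ sketch of that first idea (the "/mlt" handler name and the "id" and "text" fields are assumptions, not anything from the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class MltExclusionExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // Run MoreLikeThis for the document in question.
    SolrQuery mlt = new SolrQuery("id:doc-1");
    mlt.set("qt", "/mlt");                 // assumes an MLT handler registered at /mlt
    mlt.set("mlt.fl", "text");
    QueryResponse mltResponse = server.query(mlt);

    // Negate every MLT hit in a follow-up query.
    StringBuilder q = new StringBuilder("*:*");
    for (SolrDocument d : mltResponse.getResults()) {
      q.append(" -id:\"").append(d.getFieldValue("id")).append('"');
    }
    QueryResponse remainder = server.query(new SolrQuery(q.toString()));
    System.out.println("docs outside the MLT set: " + remainder.getResults().getNumFound());
  }
}

As noted above, building a huge negated query from a very large MLT result set may not hold up well.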
Hi,
Solr just reuses Tika's language identifier. But you are of course free to do
your language detection on the Nutch side if you choose and not invoke the one
in Solr.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 20. apr.
I believe the SolrJ code round robins which server the request is sent
to and as such probably wouldn't send to the same server in your case,
but if you had an HttpSolrServer for instance and were pointing to
only one particular instance, my guess is that it would be 5
separate requests from the
Thanks Jeevanandam. I couldn't get any regex pattern to work except a basic
one to look for sentence-ending punctuation followed by whitespace:
[.!?](?=\s)
However, this isn't good enough for my needs so I'm switching tactics at the
moment and working on plugging in OpenNLP's SentenceDetector int
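For what it's worth, a minimal sketch of what the OpenNLP side can look like next to that regex (the model file location is an assumption, and the wiring into a Solr analysis chain is not shown):

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceSplitExample {
  public static void main(String[] args) throws Exception {
    String text = "Dr. Smith went to Washington. He arrived at 10 a.m. sharp!";

    // The basic regex above: splits on sentence-ending punctuation followed by whitespace,
    // which also breaks on abbreviations like "Dr." and "a.m.".
    String[] byRegex = text.split("[.!?](?=\\s)");

    // OpenNLP's statistical sentence detector, loaded from a pre-trained model file.
    InputStream modelIn = new FileInputStream("en-sent.bin");   // assumed model location
    SentenceModel model = new SentenceModel(modelIn);
    modelIn.close();
    SentenceDetectorME detector = new SentenceDetectorME(model);
    String[] byOpenNlp = detector.sentDetect(text);

    System.out.println(byRegex.length + " regex splits vs " + byOpenNlp.length + " OpenNLP sentences");
  }
}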
I'm working on using Shuyo's work to improve the language identification of
our search. Apparently, it's been moved from Nutch to Solr. Is there a
reason for this?
http://code.google.com/p/language-detection/issues/detail?id=34
I would prefer to have the processing done in Nutch as that has the
Hello everyone,
I'm in the process of pulling together requirements for a SCM (source code
manager) crawling mechanism for our Solr index. I probably don't need to argue
the need for a crawler, but to be specific, we have an index which receives its
updates from a custom built application. I wo
Gotcha.
Now does that mean that if I have 5 threads all writing to a local shard,
the shard will piggyback those index requests onto a SINGLE connection
to the leader? Or will they spawn 5 connections from the shard to the
leader? I really hope the former; the latter won't scale well.
On Fri, 2012-0
Thanks for looking at this. I'll see if we can sneak an upgrade to 3.6
into the project to get this working.
-Cat
On 04/20/2012 12:03 PM, Erick Erickson wrote:
BTW, nice problem statement...
Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got fixed
Actually, I would like to know the top terms at two levels: the document level
and the index file level.
1. The top terms at the document level means I would like to know the top
term frequencies across all documents (counting a term only once per document).
The Solr schema.jsp seems to provide the top 10 terms, but it
On Fri, Apr 20, 2012 at 12:10 PM, carl.nordenf...@bwinparty.com
wrote:
> Directly injecting the letter "ö" into synonyms like so:
> island, ön
> island, "ön"
>
> renders the following exception on startup (both lines render the same
> error):
>
> java.lang.RuntimeException: java.nio.charset.Malf
Hi,
I'm having issues with special characters in synonyms.txt on Solr 3.5.
I'm running a multi-lingual index and need certain terms to give results across
all languages no matter what language the user uses.
I figured that this should be easily resolved by just adding the different
words to syn
BTW, nice problem statement...
Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got fixed
in the 3.6 time-frame. Don't have the time right now
to go back over the JIRA's to see...
Best
Erick
On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber wrote:
> I'm tr
I have to discard this method at this time. Thank you all the same.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Further-questions-about-behavior-in-ReversedWildcardFilterFactory-tp3905416p3926423.html
Sent from the Solr - User mailing list archive at Nabble.com.
Right, this is often a source of confusion and there's a discussion about
this on the dev list (but the URL escapes me)..
Anyway, qt and defType have pretty much completely different meanings.
Saying "defType=dismax" means you're providing all the dismax
parameters on the URL.
Saying "qt=handlern
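A rough illustration of the difference from the client side (the handler name "/dismax-handler" and the qf fields are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class QtVsDefType {
  public static void main(String[] args) {
    // defType changes how the q parameter is parsed; the dismax parameters ride along on the request.
    SolrQuery byDefType = new SolrQuery("solr rocks");
    byDefType.set("defType", "dismax");
    byDefType.set("qf", "title^2 text");

    // qt picks a request handler; its dismax defaults live in solrconfig.xml instead.
    SolrQuery byHandler = new SolrQuery("solr rocks");
    byHandler.set("qt", "/dismax-handler");
  }
}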
Yeah, this is a pretty ugly problem. You have two
problems, neither of which is all that amenable to
simple solutions.
1> context at index time. St, in your example, is
either Saint or Street. Solr has nothing built
into it to distinguish this, so you need to do some
processing "somew
I have removed most of the file to protect the innocent. As you can see, we
have a high-level item that has a subentity called skus, and those skus
contain subentities for size/width/etc. The database is configured for only 10
open cursors, and voila, when the 11th item is being processed w
Hi,
I want to build an index of quite a number of pdf and msword files using the
Data Import Request Handler and the Tika Entity Processor. It works very well.
Now I would like to use the MD5 digest of the binary (pdf/word) file as the
unique key in the index. But I do not know how to implem
I was able to use solr 3.1 functions to accomplish this logic:
/solr/select?q=_val_:sum(query("{!dismax qf=text v='solr
rocks'}"),product(map(query("{!dismax qf=text v='solr
rocks'}",-1),0,100,0,1), product(this_field,that_field)))
--
View this message in context:
http://lucene.472066.n3.nab
My understanding is that you can send your updates/deletes to any
shard and they will be forwarded to the leader automatically. That
being said, your leader will always be the place where the indexing
happens, and the result is then distributed to the other replicas.
On Fri, Apr 20, 2012 at 7:54 AM, Darren Govoni
... Inelegant as opposed to the possibility of using /DAY to specify day
granularity on a single term query
In any case, if that's how SOLR works, that's fine
Any rough idea of the performance of range queries vs truncated day queries?
Otherwise, I might just write up a quick program to compare t
We cannot avoid auto soft commit, since we need the Lucene NRT feature. And I
use StreamingUpdateSolrServer for adding/updating the index.
On Thu, Apr 19, 2012 at 7:42 AM, Boon Low wrote:
> Hi,
>
> Also came across this error recently, while indexing with > 10 DIH
> processes in parallel + default index
Dear all,
Is there any way I can convert a SolrDocumentList to a DocList and
set it in the QueryResult object?
Or, as a workaround, can I add a SolrDocumentList object to the
QueryResult object?
--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Project Trainee,
Zoho Corporation.
+9
Hmm, reading your reply again I see that Solr only uses the first 10k
tokens from each field, so field length should not be a problem per se. It
could be that my documents contain very large and unorganized tokens;
could this trip up Solr?
On Fri, Apr 20, 2012 at 2:03 PM, Bram Rongen wrote:
> Y
Yeah, I'm indexing some PDF documents.. I've extracted the text through
tika (pre-indexing).. and the largest field in my DB is 20MB. That's quite
extensive ;) My Solution for the moment is to cut this text to the first
500KB, that should be enough for a decent index and search capabilities..
Shoul
Hi,
I just wanted to make sure I understand how distributed indexing works
in solrcloud.
Can I index locally at each shard to avoid throttling a central port? Or
does all the indexing have to go through a single shard leader?
thanks
Hi Jean-Sebastien,
For some grouping features (like total group count and grouped
faceting), the distributed grouping requires you to partition your
documents into the right shard. Basically groups can't cross shards.
Otherwise the group counts or grouped facet counts may not be correct.
If you us
CSV files can also be imported, which may be more
compact.
Best
Erick
On Fri, Apr 20, 2012 at 6:01 AM, Dmitry Kan wrote:
> James,
>
> You could create xml files of format:
>
> <add>
>   <doc>
>     <field name="id">1</field>
>     <field name="Name">...</field>
>     <field name="Surname">...</field>
>   </doc>
> </add>
>
> and then post them to SOLR using, for example, the post.sh utility from
> SO
The only way to get more "elegant" would be to
index the dates with the granularity you want, i.e.
truncate to DAY at index time then truncate
to DAY at query time as well.
Why do you consider ranges inelegant? How else
would you imagine it would be done?
Best
Erick
On Thu, Apr 19, 2012 at 4:07
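A hedged sketch of the two options using Solr date math (the "timestamp" field and the dates are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

public class DayGranularityExample {
  public static void main(String[] args) {
    // Option 1: keep full-precision dates and query a one-day range with date math;
    // the -1MILLI keeps the upper bound inside the same day.
    SolrQuery range = new SolrQuery(
        "timestamp:[2012-04-20T00:00:00Z/DAY TO 2012-04-20T00:00:00Z/DAY+1DAY-1MILLI]");

    // Option 2: truncate to /DAY at index time; then a plain term query on the day's
    // midnight value matches everything indexed for that day, no range needed.
    SolrQuery singleTerm = new SolrQuery("timestamp:\"2012-04-20T00:00:00Z\"");
  }
}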
Hi Rahul,
Thank you for the reply. I tried modifying the
updateRequestProcessorChain as follows:
But still I am not able to see the UIMA fields in the result. I executed
the following curl command to index a file named "test.docx"
curl
"http://localhost:8983/solr/update/extract?fmap.content
Thanks. My colleague also pointed out a previous thread with the solution: add a
new update.chain for the data import/update handlers to bypass the distributed
update processor.
A simpler use-case example for SolrCloud newbies could be distributed
search, to experience the features of the cloud-
Good point! Do you store the large files in your documents, or just index them?
Do you have a "largest file" limit in your environment? Try this:
ulimit -a
What is the "file size"?
On Thu, Apr 19, 2012 at 8:04 AM, Shawn Heisey wrote:
> On 4/19/2012 7:49 AM, Bram Rongen wrote:
>>
>> Yesterday I'v
Working with the DIH is a little easier if you make a database view and
load from that. You can set all of the field names and see exactly
what the DIH gets.
On Thu, Apr 19, 2012 at 10:11 AM, Ramo Karahasan
wrote:
> Hi,
>
> Yes, I use every one of them.
>
> Thanks for your hint... I'll have a look at th
The implementation of grouping in the trunk is completely different
from SOLR-236. Grouping works across distributed search:
https://issues.apache.org/jira/browse/SOLR-2066
committed last September.
On Thu, Apr 19, 2012 at 6:04 PM, Jean-Sebastien Vachon
wrote:
> Hi All,
>
> I am currently trying out
James,
You could create xml files of format:

<add>
  <doc>
    <field name="id">1</field>
    <field name="Name">...</field>
    <field name="Surname">...</field>
  </doc>
</add>
and then post them to SOLR using, for example, the post.sh utility from
SOLR's binary distribution.
HTH,
Dmitry
On Fri, Apr 20, 2012 at 12:35 PM, Spadez wrote:
> Hi,
>
> I am designing a custom scraping solution. I need to store my data, do
The PolySearcher in Lucy seems to do exactly what "Distributed
Search" does in Solr.
On Fri, Apr 20, 2012 at 2:58 AM, Lance Norskog wrote:
> In Solr&Lucene, a "shard" is one part of an "index". There cannot be
> "multiple indices in one shard".
>
> All of the shards in an index share the same schem
In Solr&Lucene, a "shard" is one part of an "index". There cannot be
"multiple indices in one shard".
All of the shards in an index share the same schema, and no document
is in two or more shards. "distributed search" as implemented by solr
searches several shards in one index.
On Thu, Apr 19, 20
Hi,
I am designing a custom scraping solution. I need to store my data, do some
post-processing on it, and then import it into SOLR.
If I want to import data into SOLR in the quickest, easiest way possible,
what format should I be saving my scraped data in? I get the impression
that .XML would
On Thu, Apr 19, 2012 at 3:12 PM, Sami Siren wrote:
> I have a simple solrcloud setup from trunk with default configs; 1
> shard with one replica. As few other people have reported there seems
> to be some kind of leak somewhere that causes the number of open files
> to grow over time when doing in