On 21 April 2012 09:12, Bill Bell wrote:
> We are loading a long (number of seconds since 1970?) value into Solr using
> java and Solrj. What is the best way to convert this into the right Solr date
> fields?
[...]
There are various options, depending on the source of
your data, and how you are
We are loading a long (number of seconds since 1970?) value into Solr using
java and Solrj. What is the best way to convert this into the right Solr date
fields?
Sent from my Mobile device
720-256-8076
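Since the long is seconds since 1970 and java.util.Date expects milliseconds, the SolrJ side could look roughly like this minimal sketch (the field names "id" and "created_dt" and the URL are placeholders, not anything from the thread):

// Hedged sketch: turn an epoch-seconds long into a java.util.Date for a Solr date field.
import java.util.Date;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class EpochToSolrDate {
  public static void main(String[] args) throws Exception {
    long epochSeconds = 1334995200L;                 // seconds since 1970 (assumed input)
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    doc.addField("created_dt", new Date(epochSeconds * 1000L)); // Date wants milliseconds
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.add(doc);                                 // SolrJ serializes the Date as ISO-8601
    server.commit();
  }
}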
Hi Joe,
You could write a custom URP - Update Request Processor. This URP would take
the value from one SolrDocument field (say the one that has the full path to
your PDF and is thus unique), compute an MD5 using the Java API for that, and
stick that MD5 value in some field that you've de
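A rough sketch of what such a processor might look like (the field names "path" and "md5" are placeholders; in a real setup it would be created by an UpdateRequestProcessorFactory registered in the update chain in solrconfig.xml):

// Hedged sketch of the idea above: hash one field's value and store it in another.
import java.io.IOException;
import java.security.MessageDigest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class Md5SignatureProcessor extends UpdateRequestProcessor {

  public Md5SignatureProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object path = doc.getFieldValue("path");          // e.g. the full path to the PDF
    if (path != null) {
      try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(path.toString().getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
          hex.append(String.format("%02x", b & 0xff));
        }
        doc.setField("md5", hex.toString());          // the computed signature
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
    super.processAdd(cmd);                            // pass the document down the chain
  }
}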
Kristian,
For what it's worth, for http://search-lucene.com and http://search-hadoop.com
we simply check out the source code from the SCM and index from the file
system. It works reasonably well. The only issues that I can recall us having
is with the source code organization under SCM - modu
This might help:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
The bit here is you have to have Tika parse your file
and then extract the content to send to Solr...
Best
Erick
On Fri, Apr 20, 2012 at 7:36 PM, vasuj wrote:
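For reference, the approach in that post boils down to something like the following sketch: parse the file locally with Tika, then send the extracted text with SolrJ (the path, URL and field names are placeholders):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrJIndexer {
  public static void main(String[] args) throws Exception {
    File pdf = new File("/path/to/file.pdf");                // placeholder path
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no character limit
    Metadata metadata = new Metadata();
    InputStream in = new FileInputStream(pdf);
    try {
      new AutoDetectParser().parse(in, handler, metadata);   // Tika extracts the body text
    } finally {
      in.close();
    }
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", pdf.getAbsolutePath());               // the path as a unique key, for example
    doc.addField("text", handler.toString());                // extracted content
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.add(doc);
    server.commit();
  }
}

The alternative is to stream the raw file to Solr's ExtractingRequestHandler and let Tika run server-side, as in the wiki example further down.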
Well, that's just the way Solr works. You can tune range
performance by adjusting precisionStep; Trie
fields are built to make range queries perform well.
Best
Erick
On Fri, Apr 20, 2012 at 10:20 AM, vybe3142 wrote:
> ... Inelegant as opposed to the possibility of using /DAY to speci
OK, this description really sounds like an XY problem. Why do you
want to do this? What is the higher-level problem you're trying to solve?
Best
Erick
On Fri, Apr 20, 2012 at 9:18 AM, Ramprakash Ramamoorthy
wrote:
> Dear all,
>
> Is there any way I can convert a SolrDocumentList to a DocL
I'm trying to index a few pdf documents using SolrJ as described at
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample; below is
the code:
import static
org.apache.solr.handler.extraction.ExtractingParams.LITERALS_PREFIX;
impor
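The wiki example amounts to roughly this sketch, streaming the raw PDF to the ExtractingRequestHandler so Tika runs inside Solr (the path, id and URL are placeholders; exact SolrJ method signatures differ slightly between versions):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPdfExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("/path/to/file.pdf"), "application/pdf"); // stream the raw PDF to Solr
    req.setParam("literal.id", "doc-1");                           // supply the unique key as a literal
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    server.request(req);                                           // Tika parsing happens inside Solr
  }
}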
Hello,
I have been trying out deduplication in solr by following:
http://wiki.apache.org/solr/Deduplication. I have defined a signature field
to hold the values of the signature created based on few other fields in a
document and the idea seems to work like a charm in a single solr instance.
But,
You could run the MLT for the document in question, then gather all
those doc ids in the MLT results and negate those in a subsequent
query. Not sure how well that would work with very large result sets,
but it's something to try.
Another approach would be to gather the "interesting terms" from the
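A rough SolrJ sketch of that first idea (the "/mlt" handler name and the "id" and "text" fields are assumptions, not anything from the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class MltExclusionExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // Run MoreLikeThis for the document in question.
    SolrQuery mlt = new SolrQuery("id:doc-1");
    mlt.set("qt", "/mlt");                 // assumes an MLT handler registered at /mlt
    mlt.set("mlt.fl", "text");
    QueryResponse mltResponse = server.query(mlt);

    // Negate every MLT hit in a follow-up query.
    StringBuilder q = new StringBuilder("*:*");
    for (SolrDocument d : mltResponse.getResults()) {
      q.append(" -id:\"").append(d.getFieldValue("id")).append('"');
    }
    QueryResponse remainder = server.query(new SolrQuery(q.toString()));
    System.out.println("docs outside the MLT set: " + remainder.getResults().getNumFound());
  }
}

As noted above, building a huge negated query from a very large MLT result set may not hold up well.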
Hi,
Solr just reuses Tika's language identifier. But you are of course free to do
your language detection on the Nutch side if you choose and not invoke the one
in Solr.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 20. apr.
I believe the SolrJ code round robins which server the request is sent
to and as such probably wouldn't send to the same server in your case,
but if you had an HttpSolrServer for instance and were pointing to
only one particular instance, my guess is that it would be 5
separate requests from the
Thanks Jeevanandam. I couldn't get any regex pattern to work except a basic
one to look for sentence-ending punctuation followed by whitespace:
[.!?](?=\s)
However, this isn't good enough for my needs so I'm switching tactics at the
moment and working on plugging in OpenNLP's SentenceDetector int
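For what it's worth, a minimal sketch of what the OpenNLP side can look like next to that regex (the model file location is an assumption, and the wiring into a Solr analysis chain is not shown):

import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceSplitExample {
  public static void main(String[] args) throws Exception {
    String text = "Dr. Smith went to Washington. He arrived at 10 a.m. sharp!";

    // The basic regex above: splits on sentence-ending punctuation followed by whitespace,
    // which also breaks on abbreviations like "Dr." and "a.m.".
    String[] byRegex = text.split("[.!?](?=\\s)");

    // OpenNLP's statistical sentence detector, loaded from a pre-trained model file.
    InputStream modelIn = new FileInputStream("en-sent.bin");   // assumed model location
    SentenceModel model = new SentenceModel(modelIn);
    modelIn.close();
    SentenceDetectorME detector = new SentenceDetectorME(model);
    String[] byOpenNlp = detector.sentDetect(text);

    System.out.println(byRegex.length + " regex splits vs " + byOpenNlp.length + " OpenNLP sentences");
  }
}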
I'm working on using Shuyo's work to improve the language identification of
our search. Apparently, it's been moved from Nutch to Solr. Is there a
reason for this?
http://code.google.com/p/language-detection/issues/detail?id=34
I would prefer to have the processing done in Nutch as that has the
Hello everyone,
I'm in the process of pulling together requirements for a SCM (source code
manager) crawling mechanism for our Solr index. I probably don't need to argue
the need for a crawler, but to be specific, we have an index which receives its
updates from a custom built application. I wo
Gotcha.
Now does that mean that if I have 5 threads all writing to a local shard,
the shard will piggyback those index requests onto a SINGLE connection
to the leader? Or will they spawn 5 connections from the shard to the
leader? I really hope the former; the latter won't scale well.
On Fri, 2012-0
Thanks for looking at this. I'll see if we can sneak an upgrade to 3.6
into the project to get this working.
-Cat
On 04/20/2012 12:03 PM, Erick Erickson wrote:
BTW, nice problem statement...
Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got fixed
Actually, I would like to know the top terms at two levels: the document level
and the index file level.
1. The top terms at the document level means I would like to know the top
term frequencies across all documents (counting a term only once per document).
The Solr schema.jsp seems to provide the top 10 terms, but it
On Fri, Apr 20, 2012 at 12:10 PM, carl.nordenf...@bwinparty.com
wrote:
> Directly injecting the letter "ö" into synonyms like so:
> island, ön
> island, "ön"
>
> renders the following exception on startup (both lines render the same
> error):
>
> java.lang.RuntimeException: java.nio.charset.Malf
Hi,
I'm having issues with special characters in synonyms.txt on Solr 3.5.
I'm running a multi-lingual index and need certain terms to give results across
all languages no matter what language the user uses.
I figured that this should be easily resolved by just adding the different
words to syn
BTW, nice problem statement...
Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got fixed
in the 3.6 time-frame. Don't have the time right now
to go back over the JIRA's to see...
Best
Erick
On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber wrote:
> I'm tr
I have to discard this method at this time. Thank you all the same.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Further-questions-about-behavior-in-ReversedWildcardFilterFactory-tp3905416p3926423.html
Sent from the Solr - User mailing list archive at Nabble.com.
Right, this is often a source of confusion and there's a discussion about
this on the dev list (but the URL escapes me)..
Anyway, qt and defType have pretty much completely different meanings.
Saying "defType=dismax" means you're providing all the dismax
parameters on the URL.
Saying "qt=handlern
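A rough illustration of the difference from the client side (the handler name "/dismax-handler" and the qf fields are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class QtVsDefType {
  public static void main(String[] args) {
    // defType changes how the q parameter is parsed; the dismax parameters ride along on the request.
    SolrQuery byDefType = new SolrQuery("solr rocks");
    byDefType.set("defType", "dismax");
    byDefType.set("qf", "title^2 text");

    // qt picks a request handler; its dismax defaults live in solrconfig.xml instead.
    SolrQuery byHandler = new SolrQuery("solr rocks");
    byHandler.set("qt", "/dismax-handler");
  }
}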
Yeah, this is a pretty ugly problem. You have two
problems, neither of which is all that amenable to
simple solutions.
1> context at index time. St, in your example, is
either Saint or Street. Solr has nothing built
into it to distinguish this, so you need to do some
processing "somew
I have removed most of the file to protect the innocent. As you can see, we
have a high-level item that has a subentity called skus, and those skus
contain subentities for size/width/etc. The database is configured for only 10
open cursors, and voila, when the 11th item is being processed w
Hi,
I want to build an index of quite a number of pdf and msword files using the
Data Import Request Handler and the Tika Entity Processor. It works very well.
Now I would like to use the MD5 digest of the binary (pdf/word) file as the
unique key in the index. But I do not know how to implem
I was able to use solr 3.1 functions to accomplish this logic:
/solr/select?q=_val_:sum(query("{!dismax qf=text v='solr
rocks'}"),product(map(query("{!dismax qf=text v='solr
rocks'}",-1),0,100,0,1), product(this_field,that_field)))
--
View this message in context:
http://lucene.472066.n3.nab
My understanding is that you can send your updates/deletes to any
shard and they will be forwarded to the leader automatically. That
being said, your leader will always be the place where the indexing
happens, and the result is then distributed to the other replicas.
On Fri, Apr 20, 2012 at 7:54 AM, Darren Govoni
... Inelegant as opposed to the possibility of using /DAY to specify day
granularity on a single term query
In any case, if that's how SOLR works, that's fine
Any rough idea of the performance of range queries vs truncated day queries?
Otherwise, I might just write up a quick program to compare t
We cannot avoid auto soft commit, since we need the Lucene NRT feature. And I
use StreamingUpdateSolrServer for adding/updating the index.
On Thu, Apr 19, 2012 at 7:42 AM, Boon Low wrote:
> Hi,
>
> Also came across this error recently, while indexing with > 10 DIH
> processes in parallel + default index
Dear all,
Is there any way I can convert a SolrDocumentList to a DocList and
set it in the QueryResult object?
Or, as a workaround, can I add a SolrDocumentList object to the
QueryResult object?
--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Project Trainee,
Zoho Corporation.
+9
Hmm, reading your reply again I see that Solr only uses the first 10k
tokens from each field, so field length should not be a problem per se. It
could be that my documents contain very large and unorganized tokens;
could this trip up Solr?
On Fri, Apr 20, 2012 at 2:03 PM, Bram Rongen wrote:
> Y
Yeah, I'm indexing some PDF documents.. I've extracted the text through
tika (pre-indexing).. and the largest field in my DB is 20MB. That's quite
extensive ;) My Solution for the moment is to cut this text to the first
500KB, that should be enough for a decent index and search capabilities..
Shoul
Hi,
I just wanted to make sure I understand how distributed indexing works
in solrcloud.
Can I index locally at each shard to avoid throttling a central port? Or
does all the indexing have to go through a single shard leader?
thanks
Hi Jean-Sebastien,
For some grouping features (like total group count and grouped
faceting), the distributed grouping requires you to partition your
documents into the right shard. Basically groups can't cross shards.
Otherwise the group counts or grouped facet counts may not be correct.
If you us
CSV files can also be imported, which may be more
compact.
Best
Erick
On Fri, Apr 20, 2012 at 6:01 AM, Dmitry Kan wrote:
> James,
>
> You could create xml files of format:
>
> <add>
>   <doc>
>     <field name="id">1</field>
>     <field name="Name">...</field>
>     <field name="Surname">...</field>
>   </doc>
> </add>
>
> and then post them to SOLR using, for example, the post.sh utility from
> SO
The only way to get more "elegant" would be to
index the dates with the granularity you want, i.e.
truncate to DAY at index time then truncate
to DAY at query time as well.
Why do you consider ranges inelegant? How else
would you imagine it would be done?
Best
Erick
On Thu, Apr 19, 2012 at 4:07
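A hedged sketch of the two options using Solr date math (the "timestamp" field and the dates are placeholders):

import org.apache.solr.client.solrj.SolrQuery;

public class DayGranularityExample {
  public static void main(String[] args) {
    // Option 1: keep full-precision dates and query a one-day range with date math;
    // the -1MILLI keeps the upper bound inside the same day.
    SolrQuery range = new SolrQuery(
        "timestamp:[2012-04-20T00:00:00Z/DAY TO 2012-04-20T00:00:00Z/DAY+1DAY-1MILLI]");

    // Option 2: truncate to /DAY at index time; then a plain term query on the day's
    // midnight value matches everything indexed for that day, no range needed.
    SolrQuery singleTerm = new SolrQuery("timestamp:\"2012-04-20T00:00:00Z\"");
  }
}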
Hi Rahul,
Thank you for the reply. I tried modifying the
updateRequestProcessorChain as follows:
But still I am not able to see the UIMA fields in the result. I executed
the following curl command to index a file named "test.docx"
curl
"http://localhost:8983/solr/update/extract?fmap.content
Thanks. My colleague also pointed out a previous thread with the solution: add a
new update.chain for the data import/update handlers to bypass the distributed
update processor.
A simpler use-case example for SolrCloud newbies could be distributed
search, to experience the features of the cloud-
Good point! Do you store the large files in your documents, or just index them?
Do you have a "largest file" limit in your environment? Try this:
ulimit -a
What is the "file size"?
On Thu, Apr 19, 2012 at 8:04 AM, Shawn Heisey wrote:
> On 4/19/2012 7:49 AM, Bram Rongen wrote:
>>
>> Yesterday I'v
Working with the DIH is a little easier if you make a database view and
load from that. You can set all of the field names and see exactly
what the DIH gets.
On Thu, Apr 19, 2012 at 10:11 AM, Ramo Karahasan
wrote:
> Hi,
>
> Yes, I use every one of them.
>
> Thanks for your hint... I'll have a look at th
The implementation of grouping in the trunk is completely different
from SOLR-236. Grouping works across distributed search:
https://issues.apache.org/jira/browse/SOLR-2066
committed last September.
On Thu, Apr 19, 2012 at 6:04 PM, Jean-Sebastien Vachon
wrote:
> Hi All,
>
> I am currently trying out
James,
You could create xml files of format:

<add>
  <doc>
    <field name="id">1</field>
    <field name="Name">...</field>
    <field name="Surname">...</field>
  </doc>
</add>
and then post them to SOLR using, for example, the post.sh utility from
SOLR's binary distribution.
HTH,
Dmitry
On Fri, Apr 20, 2012 at 12:35 PM, Spadez wrote:
> Hi,
>
> I am designing a custom scraping solution. I need to store my data, do
The PolySearcher in Lucy seems to do exactly what "Distributed
Search" does in Solr.
On Fri, Apr 20, 2012 at 2:58 AM, Lance Norskog wrote:
> In Solr&Lucene, a "shard" is one part of an "index". There cannot be
> "multiple indices in one shard".
>
> All of the shards in an index share the same schem
In Solr&Lucene, a "shard" is one part of an "index". There cannot be
"multiple indices in one shard".
All of the shards in an index share the same schema, and no document
is in two or more shards. "distributed search" as implemented by solr
searches several shards in one index.
On Thu, Apr 19, 20
Hi,
I am designing a custom scraping solution. I need to store my data, do some
post-processing on it, and then import it into SOLR.
If I want to import data into SOLR in the quickest, easiest way possible,
what format should I be saving my scraped data in? I get the impression
that .XML would
On Thu, Apr 19, 2012 at 3:12 PM, Sami Siren wrote:
> I have a simple solrcloud setup from trunk with default configs; 1
> shard with one replica. As few other people have reported there seems
> to be some kind of leak somewhere that causes the number of open files
> to grow over time when doing in