Thank you for your answers.
I have a Map and want to boost the scores of those documents
at search time.
In my example I get that map inside a ValueSource and boost the matched
documents' scores.
In the query, if {!graph} is added then it will return the boosted query;
otherwise it will return the regular l
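For illustration only, the map lookup at the heart of that boosting can be sketched standalone. The class and document ids below are hypothetical; in the real plugin this lookup would live inside the ValueSource's per-document value method:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of map-based score boosting: the real code would do this
// lookup per matched document inside a custom ValueSource.
public class MapBoost {
    private final Map<String, Float> boostMap = new HashMap<>();

    public MapBoost() {
        boostMap.put("doc1", 2.5f);   // hypothetical per-document boosts
        boostMap.put("doc7", 0.5f);
    }

    /** Multiply the raw score by the boost for this doc id (1.0 if absent). */
    public float boostedScore(String docId, float rawScore) {
        return rawScore * boostMap.getOrDefault(docId, 1.0f);
    }
}
```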
Erick,
OK. Let me try the plain Java one. Possibly I'll need tighter
integration, like injecting a core into the singleton, etc. But I don't know
yet.
Thanks for your efforts.
On Wed, Dec 28, 2011 at 5:48 PM, Erick Erickson wrote:
> I must be missing something here. Why would this be any dif
Hi Juan,
I'm using Solr 3.1
The type of the date field is long.
Let's say the documents indexed in the Solr server are:
1326c5cc09bbc99a_1
1326c5cc09bbc99a
1316078009000
<.. Some Other fields here ..>
Some subject here...
Some message here...
1321dff33cecd5f4_1
1321dff33cecd5f4
131495631
Alexander,
I have two ideas for implementing fast dedupe externally, assuming your PKs
don't fit into a java.util.*Map:
- your crawler can use an in-process RDBMS (Derby, H2) to track dupes;
- if your crawler is stateless (it doesn't track PKs that have already
been crawled), you can retrieve it
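For illustration, the tracking contract the crawler needs can be sketched like this (an in-memory set stands in for the embedded Derby/H2 table you would use once the PKs no longer fit in the heap):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the crawler-side dedupe contract. The in-memory set stands in
// for an embedded RDBMS (Derby/H2) table: with a database you would replace
// `seen.add(pk)` with an INSERT guarded by a uniqueness constraint.
public class DedupeTracker {
    private final Set<String> seen = new HashSet<>();

    /** Returns true the first time a primary key is offered, false for dupes. */
    public boolean markIfNew(String pk) {
        return seen.add(pk);
    }
}
```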
It seems like my operating system was causing me trouble in some way. I
couldn't find what was triggering this issue, but after migrating the whole
project from wamp to lamp it has been resolved and everything is running
smoothly again.
Thank you very much for your help!
Regards,
Yes, I have been warned that querying the index each time before adding a doc
might be resource-consuming. I will check it.
As for the overwrite parameter, I think the name is not the best then.
People outside the "business", like me, misuse it and assume what I wrote.
Overwrite should mean what it means.
Unfortunately I have a lot of duplicates, and given that searching might
suffer, I will try implementing an update processor.
But your idea is interesting and I will consider it, thanks.
Best Regards
Alexander Aristov
On 28 December 2011 19:12, Tanguy Moal wrote:
> Hello Alexander,
>
> I don
This copying is a bit overstated here because of the way that small
segments are merged into larger segments. Those larger segments are then
copied much less often than the smaller ones.
While you can wind up with lots of copying in certain extreme cases, it is
quite rare. In particular, if you
On Tue, Dec 27, 2011 at 1:10 PM, Ahmet Arslan wrote:
>
> To achieve this behavior, you can use StandardTokenizerFactory and
> EdgeNGramFilterFactory and LowerCaseFilterFactory at index time.
>
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
>
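A hedged sketch of what such an index-time chain could look like in schema.xml (the field type name and gram sizes below are placeholders to tune, not values from the thread):

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```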
Thanks, b
Here is an example of schema design: a PDF file of 5MB might have
maybe 50k of actual text. The Solr ExtractingRequestHandler will find
that text and index only that. If you set the field to stored=true,
the 5MB will be saved. If stored=false, the PDF is not saved. Instead,
you would store a link to
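A hypothetical schema fragment along those lines (field names invented for illustration):

```xml
<!-- Index the extracted text without storing the raw 5MB,
     and store only a link back to the original PDF. -->
<field name="content" type="text_general" indexed="true" stored="false"/>
<field name="pdf_url" type="string" indexed="false" stored="true"/>
```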
Yes, the 3.5 Solr is opening and reading the Solr 1.4 index. When you
do a commit, it will rewrite the index in 3.5 format.
Doing a complete copy of the configs from 1.4 to 3.5 is easy, but
there are a lot of new features and changed defaults in the
solrconfig.xml file. These make indexing faster,
: Subject: Sort facets by defined custom Collator
deja-vu...
http://www.lucidimagination.com/search/p:solr/s:email/l:user/sort:date?q=%22Facet+Ordering%22
-Hoss
: I've seen in the solr faceting overview that it is possible to sort
: either by count or lexicographically, but is there a way to sort so
: the lowest counts come back first?
Peter Sturge looked into this a while back and provided a patch, but there
were some issues with it that never got reso
I have a database where a user is searching for documents, and the
things which I'm faceting on are tags. Tags boil down to things of
interest, perhaps names, places, etc. The user in our case has asked
for the ability to change the ordering so they can easily find things
that appear very infrequ
(11/12/29 5:50), Jamie Johnson wrote:
I've seen in the solr faceting overview that it is possible to sort
either by count or lexicographically, but is there a way to sort so
the lowest counts come back first?
As far as I know, no. What is your use case?
koji
--
http://www.rondhuit.com/en/
: Is it possible that the system is running out of RAM, and swapping,
: or is aggressively swapping for some reason?
it doesn't have to be the solr /tomcat process memory getting swapped out
-- but that's certainly possible -- it could also be that the filesystem
cache is expunging the disk pag
Right, I think that's what's happening here.
Google "swappiness" if you are on Linux.
Alternatively, one could add something to prevent the OS from swapping out
Solr's process. Here is how ElasticSearch does it, for example:
https://github.com/elasticsearch/elasticsearch/issues/464
Otis
On Wed, Dec 28, 2011 at 2:16 AM, Parvin Gasimzade
wrote:
> I have created custom Solr FunctionQuery in Solr 3.4.
> I extended ValueSourceParser, ValueSource, Query and QParserPlugin classes.
Note that you only need a QParserPlugin implementation for top level
query types, not function queries.
Wi
Hi Parvin,
You must also add the query parser definition to solrconfig.xml, for
example:
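A sketch of such a definition, with placeholder names (the prefix you register is the one you then use as {!myparser ...} in queries):

```xml
<!-- Hypothetical names: register the custom QParserPlugin in solrconfig.xml -->
<queryParser name="myparser" class="com.example.MyQParserPlugin"/>
```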
*Juan*
On Wed, Dec 28, 2011 at 4:16 AM, Parvin Gasimzade <
parvin.gasimz...@gmail.com> wrote:
> Hi all,
>
> I have created custom Solr FunctionQuery in Solr 3.4.
> I extended ValueSourceParser, ValueSou
What else, if anything, do you have running on the server?
Because it's possible that pages are being swapped out
for other processes to use.
Solr itself shouldn't, as far as I know, time out anything, so I
expect you're running into issues with the operating system.
Best
Erick
On Wed, Dec 28, 2011 at 1
Hi,
I don't have an answer, but maybe I can help you if you provide more
information, for example:
- Which Solr version are you running?
- Which is the type of the date field?
- The output you are getting
- The output you expect
- Any other information that you consider relevant.
Thanks,
*Juan*
I've seen in the solr faceting overview that it is possible to sort
either by count or lexicographically, but is there a way to sort so
the lowest counts come back first?
: Of course. What I meant to say was there is
: always exactly one token in a non-tokenized
: field and its offset is always exactly 0. There
: will never be tokens at position 1.
:
: So asking to match phrases, which is based on
: term positions is basically a no-op.
That's not always true.
c
: I have a lot of files in my FTP account, and I use curlftpfs to mount
: them to a folder and then start indexing them with the solrj api, but after a
: few minutes something strange happens: the mounted folder is not accessible
: and crashes; also I can not unmount it and the message "device is in us
: Exception in thread "main" java.io.IOException: Job failed!
:   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
:   at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
:   at org.apache.nutch.indexer.solr.SolrDeleteDup
: Can I use a XPathEntityProcessor in conjunction with an
: ExtractingRequestHandler? Also, the scripting language that
: XPathEntityProcessor uses/supports, is that just ECMA/JavaScript?
:
: Or is XPathEntityProcessor only supported for use in conjunction with the
: DataImportHandler?
The Entit
You really haven't posted enough details for people to guess at what
your problem might be (in particular: the actual examples of your configs,
and any log messages during the import).
please consult this wiki page and then post a followup with more
details...
https://wiki.apache.org
: That said, writing your own update request handler
: that detected this case isn't very difficult,
: extend UpdateRequestProcessorFactory/UpdateRequestProcessor
: and use it as a plugin.
I can't find the thread at the moment, but the general issue that has
caused people headaches with this typ
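For what it's worth, the skip-instead-of-replace decision such a processor makes can be sketched standalone (hypothetical class; a real one would extend UpdateRequestProcessor and consult the index rather than a set):

```java
import java.util.HashSet;
import java.util.Set;

// Standalone sketch of the processAdd() decision in a duplicate-skipping
// update processor: if the signature already exists, swallow the add instead
// of letting Solr overwrite the old document.
public class SkipDuplicates {
    private final Set<String> indexedSignatures = new HashSet<>();

    /** Returns true if the doc should be forwarded to the next processor. */
    public boolean processAdd(String signature) {
        if (indexedSignatures.contains(signature)) {
            return false;            // duplicate: drop the update
        }
        indexedSignatures.add(signature);
        return true;                 // new doc: pass it down the chain
    }
}
```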
On Wed, Dec 28, 2011 at 8:52 PM, Odey wrote:
> Hello,
>
> I'm running Solr 3.5 on a XAMPP/Tomcat environment. It's working pretty well,
> with just one exception: when Solr remains idle without handling any requests
> for about 5-10 mins the first request sent again will be delayed for a few
> secon
Hello,
I'm running Solr 3.5 on a XAMPP/Tomcat environment. It's working pretty well,
with just one exception: when Solr remains idle without handling any requests
for about 5-10 mins the first request sent again will be delayed for a few
seconds. Subsequent requests are lightning-fast as usual. So i
Hello Alexander,
I don't know much about your requirements in terms of size and
performances, but I've had a similar use case and found a pretty simple
workaround.
If your duplicate rate is not too high, you can have the
SignatureProcessor generate fingerprints of documents (you already did
On Wed, Dec 28, 2011 at 5:47 AM, ku3ia wrote:
> So, based on p.2) and on my previous research, I conclude that the more
> documents I want to retrieve, the slower the search is, and the main problem
> is the cycle in the writeDocs method. Am I right? Can you advise something
> in this situation?
For the fi
Thanks Erick,
that gives me a direction. I will write a new plugin and get back to the
dev forum with the results, and then we will decide next steps.
Best Regards
Alexander Aristov
On 28 December 2011 18:08, Erick Erickson wrote:
> Well, the short answer is that nobody else has
> 1> had a simil
Right, you were misled by the discussion for that patch;
the option you specified was NOT how the patch was
eventually implemented. Try reading this page instead:
http://wiki.apache.org/solr/MultitermQueryAnalysis
The short form is that with 3.6 (i.e. 3.x at this point) you
may not have to do
Well, the short answer is that nobody else has
1> had a similar requirement
AND
2> not found a suitable work around
AND
3> implemented the change and contributed it back.
So, if you'd like to volunteer...
Seriously. If you think this would be valuable and are
willing to work on it, hop on over
There's no easy/efficient way that I know of to do this. Perhaps a good
question is what value-add this is going to make for your app and is
there a better way to convey this information. For instance, would
highlighting convey "enough" information to your user?
You're right that you don't want to
I must be missing something here. Why would this be any different from
any other singleton? I just did a little experiment where I implemented
the classic singleton pattern in a RequestHandler and accessed it
from a Filter (both plugins) with no problem at all, just the usual
blah var = MySingleton.ge
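The experiment described is nothing more exotic than the textbook pattern; a minimal sketch (class name hypothetical):

```java
// Classic lazy singleton, as shared between two Solr plugins: both the
// RequestHandler and the Filter call MySingleton.getInstance() and see the
// same object, because both plugin classes load in the same classloader.
public class MySingleton {
    private static MySingleton instance;
    private int counter = 0;

    private MySingleton() {}

    public static synchronized MySingleton getInstance() {
        if (instance == null) {
            instance = new MySingleton();
        }
        return instance;
    }

    /** Shared state visible to every caller of getInstance(). */
    public int increment() {
        return ++counter;
    }
}
```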
Could it be a commit you're needing?
curl 'localhost:8983/solr/update?commit=true'
/Martin
On Wed, Dec 28, 2011 at 11:47 AM, mumairshamsi wrote:
> http://lucene.472066.n3.nabble.com/file/n3616191/02.xml
> 02.xml
>
> I am trying to index this file; for this I am using this
> command:
>
> java -jar post.jar *.xml
>
> The command runs fine, but when I search no result is
> displayed.
>
> I think it is an encoding problem, can anyone help?
Thanks community! That helps!
To check practically, I have now set up Solr 3.5 in a test environment. A few
observations on that:
1. I simply copy-pasted one of the Solr 1.4 instances onto the Solr 3.5 setup
(after correcting the schema.xml and solrconfig.xml files based on what is
suited for 3.5). If
Hi,
Thanks a lot, guys. I tried the following options:
1.) Downloaded Solr 3.5.0 and updated the schema.xml file with
the sample fields I have. I then tried to set the property
"ignoreCaseForWildcards=true" for a field type as mentioned in the URL given
for patch-2438, but got the
http://lucene.472066.n3.nabble.com/file/n3616191/02.xml 02.xml
I am trying to index this file; for this I am using this command:
java -jar post.jar *.xml
The command runs fine, but when I search no result is displayed.
I think it is an encoding problem, can anyone help?
The issue I'm facing is that I don't get the expected results when I combine
the "group" param and the "sort" param.
The query is...
http://localhost:8080/solr/core1/select/?qt=nutch&q=*:*&fq=userid:333&group=true&group.field=threadid&group.sort=date%20desc&sort=date%20desc
where "threadid" is a hexadeci
The problem with dedupe (SignatureUpdateProcessor) is that it REPLACES old
docs. I have tried it already.
Best Regards
Alexander Aristov
On 28 December 2011 13:04, Lance Norskog wrote:
> The SignatureUpdateProcessor is for exactly this problem:
>
>
> http://www.lucidimagination.com/search/lin
Thanks iorixxx and Koji for your replies.
So can I fulfill my requirement by using hl.regex.pattern and making
hl.fragmenter=regex?
I was looking at these fields on the wiki. I am thinking of using them to make
my highlighted text show in my desired format.
My string is like the below:
1s: This is v
Hi all.
During my code review, I discovered the following:
1) as I wrote before, there seems to be low disk read speed;
2) at ~/solr-3.5/solr/core/src/java/org/apache/solr/response/XMLWriter.java
and in similar classes there is a writeDocList => writeDocs method, which
contains a loop over all doc
Thanks for your reply. I thought about using the debug mode too, but
the information is not easy to parse and doesn't contain everything I
want. Furthermore, I don't want to enable debug mode in production.
Is there anything else I could try?
On Tue, Dec 27, 2011 at 12:48 PM, Ahmet Arslan wrote:
>
Dear list,
I'd like to follow up on that issue...
IMHO, configuration parsing could be a little bit stricter... At least,
what counts as a "severe" configuration error could be user-defined.
Let me give some examples of common errors that don't trigger
the "abortOnConfigurationError"
You would have to implement this yourself in your indexing code. Solr
has an analysis plugin which does the analysis for your text and then
returns the result, but does not query or index. You can use this to
calculate the fuzzy hash, then search against the index.
You might be able to code this in an
The SignatureUpdateProcessor is for exactly this problem:
http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/Deduplication
On Tue, Dec 27, 2011 at 10:42 PM, Alexander Aristov
wrote:
> I get docs from external sources and the only place I keep them is solr
> index. I have
(11/12/28 17:08), Ahmet Arslan wrote:
FastVectorHighlighter requires Solr3.1
http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter
Right. In addition, the boundaryScanner requires 3.5.
koji
--
http://www.rondhuit.com/en/
> I tried adding a BoundaryScanner in my
> solrconfig.xml and set
> hl.useFastVectorHighlighter=true, termVectors=on,
> termPositions=on and
> termOffsets=on in my query, but I still didn't see any
> effect on my
> highlighting.
> Am I missing anything, or doing anything wrong?
> i like to
Hi Koji,
Thanks for your reply.
I tried adding a BoundaryScanner in my solrconfig.xml and set
hl.useFastVectorHighlighter=true, termVectors=on, termPositions=on and
termOffsets=on in my query, but I still didn't see any effect on my
highlighting.
My solrconfig setting is as below
> I can't delete 1s, 2s, ...etc from my
> field value; I have to keep the text in
> this format... so I'll apply slop in my search to get my
> needed search done.
It is OK if you can't delete 1s, 2s, etc. from the field value. We can eat up
those special markups in the analysis chain. PatternReplaceCharFil