Try this command.
bin/nutch crawl urls//.txt -dir crawl/
-threads 10 -depth 2 -topN 1000
Your folder structure will look like this:
-- urls -- -- .txt
|
|
-- crawl --
The folder name will be for different domains. So for each domain
: But I don't know if it's possible to merge this "autocreated" facet with a
: facet already predefined? I tried to use (adding this to my
: code in my previous post) :
: **
copyField applies to the raw input of those fields -- so the special logic
you have in the analyzer for your text_tag_
(12/02/22 11:58), dhaivat wrote:
Thanks for reply,
But can you please tell me why it's working for some documents and not for
others.
As Solr 1.4.1 does not recognize the hl.useFastVectorHighlighter flag, Solr just
ignores it, but because hl=true is there, Solr tries to create highlight snippets
by usi
Koji Sekiguchi wrote
>
> (12/02/21 21:22), dhaivat wrote:
>> Hi Koji,
>>
>> Thanks for quick reply, i am using solr 1.4.1
>>
>
> Uh, you cannot use FVH on Solr 1.4.1. FVH is available in Solr 3.1 or later.
> So your hl.useFastVectorHighlighter=true flag is ignored.
>
> koji
> --
> Query Log Visu
: How do I see the setting in the log or in stats.jsp ? I cannot find a place
: that indicates it is set or not.
I don't think the DirectoryFactory plugin hook was ever set up so that it
can report its info/stats ... it doesn't look like it implements
SolrInfoMBean, so it can't really report an
: I am using the SolrJ client's StreamingUpdateSolrServer and whenever I
: stop Tomcat, it throws a memory leak warning. Sample error message:
:
: SEVERE: The web application [/MyApplication] appears to have started a
: thread named [pool-1004-thread-1] but has failed to stop it. This is very
:
bq: How could I overlook it?
Easy, the same way I did for a year and more
Best
Erick
On Tue, Feb 21, 2012 at 6:50 PM, Em wrote:
> Erick,
>
> damn!
>
> The NOW of now isn't the same NOW a second later. So obviously. How
> could I overlook it?
>
> Kind regards,
> Em
>
> On 22.02.2012 00:17
(12/02/21 21:22), dhaivat wrote:
Hi Koji,
Thanks for quick reply, i am using solr 1.4.1
Uh, you cannot use FVH on Solr 1.4.1. FVH is available in Solr 3.1 or later.
So your hl.useFastVectorHighlighter=true flag is ignored.
koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
Erick,
damn!
The NOW of now isn't the same NOW a second later. So obviously. How
could I overlook it?
Kind regards,
Em
On 22.02.2012 00:17, Erick Erickson wrote:
> Be a little careful here. Any "fq" that references NOW will probably
> NOT be effectively cached. Think of the fq cache as a ma
I am trying to configure Nutch (1.4) with my Solr 3.2.
But when I run a crawl command
"bin/nutch inject crawl/crawldb urls"
it doesn't work, and it replies with "can't convert a empty path".
Why, in your opinion?
tx
a.
Apples and oranges here.
Filter queries do NOT contribute to score. But they are cached so
if you have a frequent use-case for filtering, you'll get much
faster performance. OTOH, if your filter queries are never
repeated, filter queries aren't helpful.
So if correctness isn't defined by the fq c
Be a little careful here. Any "fq" that references NOW will probably
NOT be effectively cached. Think of the fq cache as a map, with
the key being the fq clause and the value being the set of
documents that match that value.
So something like NOW gives
2012-01-23T00:00:00Z
but issuing that a secon
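
To make the "fq cache as a map" picture concrete, here is a rough Python sketch. It is only a toy model, not Solr code: the FilterCacheModel class and the resolve() helper are made up for illustration, and the real cache keys on parsed query objects rather than strings. The rounded variant mimics what Solr's date math rounding (e.g. NOW/DAY) would do to the key.

```python
from datetime import datetime, timedelta, timezone


class FilterCacheModel:
    """Toy stand-in for the fq cache: resolved clause -> set of matching doc ids."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, resolved_clause, compute_docs):
        if resolved_clause in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[resolved_clause] = compute_docs()
        return self.entries[resolved_clause]


def resolve(now, round_to_day=False):
    """Turn date:[NOW-30DAY TO NOW] into a concrete clause (hypothetical helper)."""
    if round_to_day:
        # Roughly what rounding with NOW/DAY does: drop the time of day.
        now = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start = now - timedelta(days=30)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"date:[{start.strftime(fmt)} TO {now.strftime(fmt)}]"


first = datetime(2012, 2, 22, 0, 17, 0, tzinfo=timezone.utc)
second = first + timedelta(seconds=1)

# Unrounded NOW: every request produces a new key, so nothing is reused.
raw = FilterCacheModel()
for t in (first, second):
    raw.lookup(resolve(t), lambda: {1, 2, 3})

# Rounded NOW: both requests share one key within the same day.
rounded = FilterCacheModel()
for t in (first, second):
    rounded.lookup(resolve(t, round_to_day=True), lambda: {1, 2, 3})

print("raw:", raw.hits, "hits,", raw.misses, "misses")
print("rounded:", rounded.hits, "hits,", rounded.misses, "misses")
```

In this model the unrounded clause never hits the cache, while the rounded one is reused for the rest of the day.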
Hi Ramo,
sorry for confusing you.
Forget everything that I said after "However" - it was wrong (I mixed
something up here).
Yes, you can index documents via any UpdateRequestHandler you like while
using the DIH.
Kind regards,
Em
On 21.02.2012 23:41, Ramo Karahasan wrote:
> Hi,
>
> what do you
Hi,
I'm using SOLR and Lucene in my application for search.
I'm facing an issue where highlighting with FastVectorHighlighter does not work
when I use PayloadTermQueries as clauses of a BooleanQuery.
After debugging I found that in DefaultSolrHighlighter.java,
fvh.getFieldQuery does not return an
Eks,
that sounds strange!
Am I getting you right?
You have a master which indexes batch-updates from time to time.
Furthermore you have some slaves, pulling data from that master to keep
them up-to-date with the newest batch-updates.
Additionally your slaves index their own content in soft-commit mode t
Hi,
what do you mean? Are you referring to the time I add a new document? But that
should be okay, all documents will be added with delta import that are older
than the last time I've indexed, right?
Thanks,
Ramo
-Original Message-
From: Em [mailto:mailformailingli...@yahoo.de]
Gesen
Hi Spadez,
MySQL, as well as any other SQL-database, needs the same amount of work
to integrate its data into Solr.
Choose your favorite database and get started!
Best,
Em
On 21.02.2012 18:32, Spadez wrote:
> Thank you for the information Damien.
>
> Is there a better database to use at the
Hi Ramo,
yes, it's possible.
However keep in mind that your cURL, CSV, XML, JSON etc. update-requests
store the information that is needed to do delta-updates with your DIH
(if needed!).
Kind regards,
Em
On 21.02.2012 23:18, Ramo Karahasan wrote:
> Hi,
>
>
>
> currently i'm indexing via DH
Hi,
> But they [the cache configurations] are default for both tests, can it
> affect the results?
Yes, they affect both results. Try to increase the values for
queryResultCache and documentCache from 512 to 1024 (provided that you
got two distinct queries "bay" and "girl"). In general they should
Hi,
currently I'm indexing via DIH and delta import.
Is it possible to additionally index data via cURL as XML or JSON into the
index which was created via DIH, for example for "real-time" indexing of data,
like comments on a question?
Thank you,
Ramo
Hi,
>>First: I am really surprised that the difference between explicit
>>Date-Values and the more friendly date-keywords is that large.
Maybe it is because I use shards. I have 11 shards, ~310M docs in total.
>>Did you make a server restart between both tests?
I tried to run these tests one after a
Hi,
your QTimes are somewhat slow!
First: I am really surprised that the difference between explicit
Date-Values and the more friendly date-keywords is that large.
Did you make a server restart between both tests?
Second: Could you show us your solrconfig to make sure that your caches
are configu
And drinks on me to those who decoupled implicit commit from close...
this was a tricky trap
On Tue, Feb 21, 2012 at 9:10 PM, eks dev wrote:
> Thanks Mark,
> Hmm, I would like to have this information asap, not to wait until the
> first search gets executed (depends on user) . Is solr going to crea
Hi, Em, thanks for your response. But it seems I have a problem.
I wrote a script which sends queries (curl based) with a certain delay.
I made a dictionary of matched words. I ran my script with a 500ms delay
for 60 seconds. Take a look at the catalina logs:
INFO: [] webapp=/solr path=/select
par
Hi Per,
Solr provides the so called "UniqueKey"-field.
Refer to the Wiki to learn more:
http://wiki.apache.org/solr/UniqueKey
> Optimistic locking (versioning)
... is not provided by Solr out of the box. If you add a new document
with the same UniqueKey it replaces the old one.
You have to do the
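
To illustrate the replace-on-same-key behaviour, here is a small Python sketch. It is a toy in-memory model, not a Solr API: ToyIndex, add_if_absent, update_if_version and the "version" field are all hypothetical names, and the two guarded methods show just one way a client could layer such checks on top, not something Solr does for you.

```python
class ToyIndex:
    """In-memory stand-in for an index with a uniqueKey field (not a Solr API)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # Same key already present? The old document is silently replaced.
        self.docs[doc[self.unique_key]] = dict(doc)

    def add_if_absent(self, doc):
        # Hypothetical client-side "unique key constraint": refuse to overwrite.
        key = doc[self.unique_key]
        if key in self.docs:
            raise ValueError(f"document {key!r} already exists")
        self.add(doc)

    def update_if_version(self, doc, expected_version):
        # Hypothetical client-side optimistic locking on a "version" field.
        current = self.docs.get(doc[self.unique_key])
        if current is None or current.get("version") != expected_version:
            raise ValueError("version conflict")
        self.add({**doc, "version": expected_version + 1})


index = ToyIndex()
index.add({"id": "42", "title": "first", "version": 1})
index.add({"id": "42", "title": "second", "version": 1})  # replaces "first"
index.update_if_version({"id": "42", "title": "third"}, expected_version=1)
print(index.docs["42"])
```

Note that against a real server the check and the write would be separate requests, so a sketch like this is not atomic when several clients write concurrently.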
Thanks Mark,
Hmm, I would like to have this information asap, not to wait until the
first search gets executed (depends on user) . Is solr going to create
new searcher as a part of "replication transaction"...
Just to make it clear why I need it...
I have a simple master, many slaves config where ma
Well, you could create a keyword-file out of your database and join it
with your self-maintained keywordslist.
Doing so, keep in mind that you have to reload your SolrCore in order to
make the changes visible to the indexing-process (and keep in mind that
you have to reindex those documents that ma
Post commit calls are made before a new searcher is opened.
Might be easier to try to hook in with a new searcher listener?
On Feb 21, 2012, at 8:23 AM, eks dev wrote:
> Hi all,
> I am a bit confused with IndexSearcher refresh lifecycles...
> In a master slave setup, I override postCommit listen
In a way I agree that it would be easier to do that, but I really want to
avoid this solution because I prefer to work "harder" on preparing my index
than adding field requests to my front-end query :)
So the only solution I see right now is to do that on my own in order to
have my database fully pre
Hi,
Which is faster for boolean compound expressions: filter queries or a
single query with boolean expressions?
For that matter, is there any difference other than maybe speed?
thanks
Hi,
1) and 2) should have equal performance, given that several searches are
performed with the same fq-param.
Since the filters are cached, 1) and 2) perform better.
Kind regards,
Em
On 21.02.2012 19:06, ku3ia wrote:
> Hi all!
>
> Please advise me:
> 1) q=test&fq=date:[NOW-30DAY+TO+NOW]
> 2
Hi all!
Please advise me:
1) q=test&fq=date:[NOW-30DAY+TO+NOW]
2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]
3) q=test+AND+date:[NOW-30DAY+TO+NOW]
4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z]
where date:
Which of these queries will be faster by QTime at
Thank you for the information Damien.
Is there a better database to use at the core of the site which is more
compatible with SOLR than MYSQL, or is hooking MYSQL up with SOLR simple
enough?
--
View this message in context:
http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-
Wouldn't it be easier to store both types in different fields?
At query-time you are able to do a facet on both and can combine the
results client-side to present them within the GUI.
Kind regards,
Em
On 21.02.2012 17:52, Xavier wrote:
> Sure, the difference between my 2 facets are :
>
> - 'pr
Sure, the difference between my 2 facets are :
- 'predefined_facets' contains values already filled in my database like :
'web langage', 'cooking', 'fishing'
- 'text_tag_facets' will contain the same possible value but determined
automatically from a given wordslist by searching in the docum
Hi Xavier,
> It's maybe because (As I understood) the real (stored) value of this
> dynamic facet is still the initial fulltext ?? (or maybe i'm wrong ...)
Exactly.
CopyField does not copy the analyzed result of a field into another one.
Instead, the original content given to that field (the unan
Thanks for this answer.
I have posted my new question (related to this post) into a new topic ;)
(http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html)
Best regards
--
View this message in context:
http://lucene.472066.n3.nabble.com/H
Hi everyone,
As explained in this post:
http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html
I have created a dynamic facet at index time by searching for terms in a
fulltext field.
But I don't know if it's possible to merg
I would strongly recommend using Solr just for search. Solr is designed for
doing fast search lookups. It is really not designed for performing all the
functions of a relational database system. You certainly COULD use Solr for
everything, and the software is constantly being enhanced to make
setting stored="true" simply places a verbatim copy
of the input in the index. Returning that field in
a document will simply return that verbatim copy,
there's no way to do anything else.
The facet *values* you get back in your response should
be what you put in your index though, why doesn't tha
It seems that's an error in the documentation, with the 'Factory' missing from
the class name!?
I found
That is working fine !!!
In conclusion, I have these files:
*synonymswords.txt :*
php,mysql,html,css=>web_langage
And
*keepwords.txt :*
web langage
With this fieldType :
Hi all,
I am a bit confused with IndexSearcher refresh lifecycles...
In a master slave setup, I override postCommit listener on slave
(solr trunk version) to read some user information stored in
userCommitData on master
--
@Override
public final void postCommit() {
// This returns "stale"
Hi
Does solr/lucene provide any mechanism for "unique key constraint" and
"optimistic locking (versioning)"?
Unique key constraint: That a client will not succeed creating a new
document in solr/lucene if a document already exists having the same
value in some field (e.g. an id field). Of cour
I am new to this but I wanted to pitch a setup to you. I have a website
being coded at the moment, in the very early stages, but is effectively a
full text scraper and search engine. We have decided on SOLR for the search
system.
We basically have two sets of data:
One is the content for the se
Hi Koji,
Thanks for quick reply, i am using solr 1.4.1
i am querying *"camera"*
here is the example of documents :
which matches the
70
Electronics/Cell Phones
/b/l/blackberry-8100-pearl-2.jpg
349.99
BlackBerry 8100 Pearl sports a large 240 x 260 screen
that supports over 65,000
Dhaivat,
Can you give us the concrete document that you are trying to search and make
a highlight snippet? And what is your Solr version?
koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
(12/02/21 20:29), dhaivat wrote:
Hi
I am a newbie to Solr and I am using the SolrJ client to cre
Ok thanks.
But I reviewed some of my searches and the - was not surrounded by
whitespace in all cases, so I'll have to remove Lucene operators myself
from the user input. I understand there is no predefined way to do so.
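
For what it's worth, a minimal Python sketch of doing this on the client side could look like the following. The function names are made up, and the character set is an assumption based on the special characters listed in the Lucene query parser syntax docs ("/" only became special in later Lucene versions, but escaping it should be harmless). The keywords AND, OR and NOT are not handled here.

```python
import re

# Characters the classic Lucene query parser treats as syntax
# (assumed set; check it against the parser version you actually run).
_LUCENE_SPECIAL = re.compile(r'[\\+\-!():^\[\]"{}~*?|&/]')


def escape_lucene(text):
    """Backslash-escape every special character so it is searched literally."""
    return _LUCENE_SPECIAL.sub(lambda m: "\\" + m.group(0), text)


def strip_lucene(text):
    """Replace every special character with a space and collapse whitespace."""
    return " ".join(_LUCENE_SPECIAL.sub(" ", text).split())


user_input = "wi-fi +router"
print(escape_lucene(user_input))
print(strip_lucene(user_input))
```

Escaping keeps the characters searchable as literals, while stripping drops them and leaves it to the analyzer to match the remaining words; which one fits depends on how the field is tokenized.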
--
View this message in context:
http://lucene.472066.n3.nabble.com/lucene
Hi
I am a newbie to Solr and I am using the SolrJ client to create the index and query the
Solr data. When I am querying the data I want to use the highlight feature of
Solr, so I am using the Fast Vector Highlighter to enable highlighting on words. I
found that it's working fine for some documents and for some docum
That's it ! Thanks :)
First time I see that documentation page (which is really helpful) :
http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter
So, now I want to "associate" a wordslist to a value of an existing facet
So i tried i combine
Hi,
Have a look at the following link:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28Lemmatization%29#Stemming
Regards,
Dirceu
On Tue, Feb 21, 2012 at 11:18 AM, dsy99 wrote:
> Dear all,
> I want to know, does SOLR support Lemmatization? If yes, which in-built
> Lemm
Dear all,
I want to know, does SOLR support Lemmatization? If yes, which in-built
Lemmatizer class should be included in SOLR schema file to analyze the
tokens using lemmatization rather than stemming.
Thanks in advance.
With Thanks & Regds:
Divakar Yadav
--
View this message in context:
http:/
Hi Team ,
Is there any article or site where I can learn about lucene index
Method: how is it written and maintained?
And one quick question: the standard method that Lucene uses to handle
indexes, is it an Apache package or does Lucene have its own index writing method? Does
lucene use memory mapped f