Re: nutch and solr

2012-02-21 Thread tamanjit.bin...@yahoo.co.in
Try this command. bin/nutch crawl urls//.txt -dir crawl/ -threads 10 -depth 2 -topN 1000 Your folder structure will look like this: -- urls -- -- .txt | | -- crawl -- The folder name will be for different domains. So for each domain

Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Chris Hostetter
: But i don't know if it's possible to merge this "autocreated" facet with a : facet already predefined ? i tried to used (adding this to my : code in my previous post) : : ** copyField applies to the raw input of those fields -- so the special logic you have in the analyzer for your text_tag_

Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread Koji Sekiguchi
(12/02/22 11:58), dhaivat wrote: Thanks for the reply, but can you please tell me why it's working for some documents and not for others? Because Solr 1.4.1 does not recognize the hl.useFastVectorHighlighter flag, Solr just ignores it, but because hl=true is set, Solr tries to create highlight snippets by usi

Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread dhaivat
Koji Sekiguchi wrote > > (12/02/21 21:22), dhaivat wrote: >> Hi Koji, >> >> Thanks for quick reply, i am using solr 1.4.1 >> > > Uh, you cannot use FVH on Solr 1.4.1. FVH is available Solr 3.1 or later. > So your hl.useFastVectorHighlighter=true flag is ignored. > > koji > -- > Query Log Visu

Re: Help with MMapDirectoryFactory in 3.5

2012-02-21 Thread Chris Hostetter
: How do I see the setting in the log or in stats.jsp ? I cannot find a place : that indicates it is set or not. I don't think the DirectoryFactory plugin hook was ever set up so that it can report its info/stats ... it doesn't look like it implements SolrInfoMBean, so it can't really report an

Re: Solrj Stream Server memory leak

2012-02-21 Thread Chris Hostetter
: I am using the SolrJ client's StreamingUpdateSolrServer and whenever I : stop Tomcat, it throws a memory leak warning. Sample error message: : : SEVERE: The web application [/MyApplication] appears to have started a : thread named [pool-1004-thread-1] but has failed to stop it. This is very :

Re: Date filter query

2012-02-21 Thread Erick Erickson
bq: How could I overlook it? Easy, the same way I did for a year and more Best Erick On Tue, Feb 21, 2012 at 6:50 PM, Em wrote: > Erick, > > damn! > > The NOW of now isn't the same NOW a second later. So obviously. How > could I overlook it? > > Kind regards, > Em > > On 22.02.2012 00:17

Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread Koji Sekiguchi
(12/02/21 21:22), dhaivat wrote: Hi Koji, Thanks for quick reply, i am using solr 1.4.1 Uh, you cannot use FVH on Solr 1.4.1. FVH is available Solr 3.1 or later. So your hl.useFastVectorHighlighter=true flag is ignored. koji -- Query Log Visualizer for Apache Solr http://soleami.com/

Re: Date filter query

2012-02-21 Thread Em
Erick, damn! The NOW of now isn't the same NOW a second later. So obviously. How could I overlook it? Kind regards, Em On 22.02.2012 00:17, Erick Erickson wrote: > Be a little careful here. Any "fq" that references NOW will probably > NOT be effectively cached. Think of the fq cache as a ma

nutch and solr

2012-02-21 Thread alessio crisantemi
I am trying to configure Nutch (1.4) with my Solr 3.2. But when I run the command "bin/nutch inject crawl/crawldb urls" it doesn't work, and it replies with "can't convert a empty path". Why, in your opinion? Thanks, a.

Re: filter query or boolean?

2012-02-21 Thread Erick Erickson
Apples and oranges here. Filter queries do NOT contribute to score. But they are cached so if you have a frequent use-case for filtering, you'll get much faster performance. OTOH, if your filter queries are never repeated, filter queries aren't helpful. So if correctness isn't defined by the fq c
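To make the two request shapes being compared concrete, here is a minimal sketch that builds both query strings. The field names (title, category) and the helper names are made up for illustration; only the q and fq parameter names come from the thread.

```python
from urllib.parse import urlencode


def as_filter_query(main_query, restriction):
    """Restriction goes in fq: it narrows the result set and does not affect scoring."""
    return urlencode([("q", main_query), ("fq", restriction)])


def as_boolean_query(main_query, restriction):
    """Restriction is AND-ed into q: it becomes part of the scored main query."""
    return urlencode([("q", f"({main_query}) AND ({restriction})")])


# Hypothetical field names, used only to show the two shapes side by side.
print(as_filter_query("title:solr", "category:books"))
print(as_boolean_query("title:solr", "category:books"))
```

Both requests select the same documents in this toy case; the difference described above is whether the restriction takes part in scoring and whether it can be reused from the filter cache.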

Re: Date filter query

2012-02-21 Thread Erick Erickson
Be a little careful here. Any "fq" that references NOW will probably NOT be effectively cached. Think of the fq cache as a map, with the key being the fq clause and the value being the set of documents that match that value. So something like NOW gives 2012-01-23T00:00:00Z but issuing that a secon
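The "cache as a map" picture can be sketched with a plain dictionary. This is a toy model, not Solr's implementation: expand_now and run_demo are hypothetical helpers, the document ids are made up, and rounding to midnight stands in for using date-math rounding such as NOW/DAY in the fq.

```python
from datetime import datetime, timedelta, timezone


def expand_now(fq, now, round_to_day=False):
    """Hypothetical stand-in for date-math expansion: put a concrete timestamp where NOW was."""
    if round_to_day:
        # Comparable in spirit to writing NOW/DAY instead of NOW.
        now = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return fq.replace("NOW", now.strftime("%Y-%m-%dT%H:%M:%SZ"))


def run_demo(round_to_day):
    """Issue the same fq three times, one second apart, and count cache hits and misses."""
    cache = {}  # key: expanded fq clause, value: set of matching document ids
    hits = misses = 0
    start = datetime(2012, 2, 21, 18, 0, 0, tzinfo=timezone.utc)
    for second in range(3):
        now = start + timedelta(seconds=second)
        key = expand_now("date:[NOW-30DAY TO NOW]", now, round_to_day)
        if key in cache:
            hits += 1
        else:
            misses += 1
            cache[key] = {1, 2, 3}  # made-up document ids
    return hits, misses


print(run_demo(round_to_day=False))  # (0, 3): every request produces a new key
print(run_demo(round_to_day=True))   # (2, 1): the rounded key is reused
```

In this model the unrounded clause never hits the cache because its key changes every second, while the rounded clause is computed once and then reused.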

Re: mixed indexing through dhi and other ways

2012-02-21 Thread Em
Hi Ramo, sorry for confusing you. Forget everything that I said after "However" - it was wrong (I mixed something up here). Yes, you can index documents via any UpdateRequestHandler you like while using the DIH. Kind regards, Em On 21.02.2012 23:41, Ramo Karahasan wrote: > Hi, > > what do you

Solr Highlighting not working with PayloadTermQueries

2012-02-21 Thread Nitin Arora
Hi, I'm using SOLR and Lucene in my application for search. I'm facing an issue of highlighting using FastVectorHighlighter not working when I use PayloadTermQueries as clauses of a BooleanQuery. After Debugging I found that In DefaultSolrHighlighter.Java, fvh.getFieldQuery does not return an

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread Em
Eks, that sounds strange! Am I getting you right? You have a master which indexes batch-updates from time to time. Furthermore you got some slaves, pulling data from that master to keep them up-to-date with the newest batch-updates. Additionally your slaves index own content in soft-commit mode t

AW: mixed indexing through dhi and other ways

2012-02-21 Thread Ramo Karahasan
Hi, what do you mean? Are you referring to the time I add a new document? But that should be okay, all documents will be added with delta import that are older than the last time I've indexed, right? Thanks, Ramo -Original Message- From: Em [mailto:mailformailingli...@yahoo.de] Gesen

Re: SOLR - Just for search or whole site DB?

2012-02-21 Thread Em
Hi Spadez, MySQL, as well as any other SQL-database, needs the same amount of work to integrate its data into Solr. Choose your favorite database and get started! Best, Em On 21.02.2012 18:32, Spadez wrote: > Thank you for the information Demian. > > Is there a better database to use at the

Re: mixed indexing through dhi and other ways

2012-02-21 Thread Em
Hi Ramo, yes, it's possible. However, keep in mind that your cURL, CSV, XML, JSON etc. update-requests store the information that is needed to do delta-updates with your DIH (if needed!). Kind regards, Em On 21.02.2012 23:18, Ramo Karahasan wrote: > Hi, > > > > currently i'm indexing via DH

Re: Date filter query

2012-02-21 Thread Em
Hi, > But they [the cache configurations] are default for both tests, can it affect on > results? Yes, they affect both results. Try to increase the values for queryResultCache and documentCache from 512 to 1024 (provided that you got two distinct queries "bay" and "girl"). In general they should

mixed indexing through dhi and other ways

2012-02-21 Thread Ramo Karahasan
Hi, currently I'm indexing via DIH and delta import. Is it possible to additionally index data via cURL as XML or JSON into the index which was created via DIH, for example for "real-time" indexing of data, like comments on a question? Thank you, Ramo

Re: Date filter query

2012-02-21 Thread ku3ia
Hi, >>First: I am really surprised that the difference between explicit >>Date-Values and the more friendly date-keywords is that large. Maybe it is because I use shards. I have 11 shards, ~310M docs in total. >>Did you make a server restart between both tests? I tried to run these tests one after a

Re: Date filter query

2012-02-21 Thread Em
Hi, your QTimes are somewhat slow! First: I am really surprised that the difference between explicit Date-Values and the more friendly date-keywords is that large. Did you make a server restart between both tests? Second: Could you show us your solrconfig to make sure that your caches are configu

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
And drinks on me to those who decoupled implicit commit from close... this was a tricky trap On Tue, Feb 21, 2012 at 9:10 PM, eks dev wrote: > Thanks Mark, > Hmm, I would like to have this information asap, not to wait until the > first search gets executed (depends on user) . Is solr going to crea

Re: Date filter query

2012-02-21 Thread ku3ia
Hi, Em, thanks for your response. But it seems I have a problem. I wrote a curl-based script which sends queries with a certain delay. I made a dictionary of matching words. I ran my script with a 500 ms delay for 60 seconds. Take a look at the catalina logs: INFO: [] webapp=/solr path=/select par

Re: Unique key constraint and optimistic locking (versioning)

2012-02-21 Thread Em
Hi Per, Solr provides the so called "UniqueKey"-field. Refer to the Wiki to learn more: http://wiki.apache.org/solr/UniqueKey > Optimistic locking (versioning) ... is not provided by Solr out of the box. If you add a new document with the same UniqueKey it replaces the old one. You have to do the
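The replace-on-same-key behaviour described above, and one conceivable client-side version check, can be sketched with an in-memory stand-in. Everything here is hypothetical (ToyIndex, VersionConflict, the "version" field); it is not a Solr API, and the check-then-write step is not atomic, so concurrent clients would need extra coordination.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the client expected."""


class ToyIndex:
    """In-memory stand-in for an index keyed by a unique 'id' field."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # Adding a document whose id already exists silently replaces the old one.
        self.docs[doc["id"]] = doc


def add_if_version_matches(index, doc, expected_version):
    """Client-side check: write only if the stored version is the one we last read."""
    current = index.docs.get(doc["id"])
    current_version = current["version"] if current else None
    if current_version != expected_version:
        raise VersionConflict(f"expected {expected_version}, found {current_version}")
    index.add({**doc, "version": (current_version or 0) + 1})


index = ToyIndex()
index.add({"id": "42", "title": "first", "version": 1})
index.add({"id": "42", "title": "second", "version": 1})   # replaces "first"
add_if_version_matches(index, {"id": "42", "title": "third"}, expected_version=1)
print(index.docs["42"])
```

A second writer that still holds version 1 would now get a VersionConflict instead of silently overwriting "third".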

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
Thanks Mark, Hmm, I would like to have this information asap, not to wait until the first search gets executed (depends on user) . Is solr going to create new searcher as a part of "replication transaction"... Just to make it clear why I need it... I have simple master, many slaves config where ma

Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Em
Well, you could create a keyword-file out of your database and join it with your self-maintained keywordslist. Doing so, keep in mind that you have to reload your SolrCore in order to make the changes visible to the indexing-process (and keep in mind that you have to reindex those documents that ma

Re: reader/searcher refresh after replication (commit)

2012-02-21 Thread Mark Miller
Post commit calls are made before a new searcher is opened. Might be easier to try to hook in with a new searcher listener? On Feb 21, 2012, at 8:23 AM, eks dev wrote: > Hi all, > I am a bit confused with IndexSearcher refresh lifecycles... > In a master slave setup, I override postCommit listen

Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Xavier
In a way I agree that it would be easier to do that, but I really want to avoid this solution because I prefer to work "harder" on preparing my index than adding field requests on my front query :) So the only solution I see right now is to do that on my own in order to have my database fully pre

filter query or boolean?

2012-02-21 Thread darren
Hi, Which is faster for boolean compound expressions: filter queries or a single query with boolean expressions? For that matter, is there any difference other than maybe speed? Thanks

Re: Date filter query

2012-02-21 Thread Em
Hi, 1) and 2) should have equal performance, given that several searches are performed with the same fq-param. Since the filters are cached, 1) and 2) perform better. Kind regards, Em On 21.02.2012 19:06, ku3ia wrote: > Hi all! > > Please advise me: > 1) q=test&fq=date:[NOW-30DAY+TO+NOW] > 2

Date filter query

2012-02-21 Thread ku3ia
Hi all! Please advise me: 1) q=test&fq=date:[NOW-30DAY+TO+NOW] 2) q=test&fq=date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] 3) q=test+AND+date:[NOW-30DAY+TO+NOW] 4) q=test+AND+date:[2012-01-23T00:00:00Z+TO+2012-02-21T23:59:59Z] where date: Which of these queries will be faster by QTime at

Re: SOLR - Just for search or whole site DB?

2012-02-21 Thread Spadez
Thank you for the information Demian. Is there a better database to use at the core of the site which is more compatible with SOLR than MYSQL, or is hooking MYSQL up with SOLR simple enough? -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Just-for-search-or-whole-site-

Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Em
Wouldn't it be easier to store both types in different fields? At query-time you are able to do a facet on both and can combine the results client-side to present them within the GUI. Kind regards, Em On 21.02.2012 17:52, Xavier wrote: > Sure, the difference between my 2 facets are : > > - 'pr

Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Xavier
Sure, the difference between my 2 facets are : - 'predefined_facets' contains values already filled in my database like : 'web langage', 'cooking', 'fishing' - 'text_tag_facets' will contain the same possible value but determined automatically from a given wordslist by searching in the docum

Re: How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Em
Hi Xavier, > It's maybe because (As I understood) the real (stored) value of this dynamic > facet is still the initial fulltext ?? (or maybe i'm wrong ...) Exactly. CopyField does not copy the analyzed result of a field into another one. Instead, the original content given to that field (the unan
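The point about copyField can be illustrated with a toy indexing pipeline. This is a sketch under stated assumptions, not Solr code: the analyzers, the index function and the destination field name merged_facets are invented for illustration; text_tag_facets is the field name from the thread.

```python
def keep_words_analyzer(keep_words):
    """Build a toy analyzer that lowercases and keeps only whitelisted tokens."""
    def analyze(text):
        return [tok for tok in text.lower().split() if tok in keep_words]
    return analyze


def whitespace_analyzer(text):
    """Toy analyzer that only splits on whitespace."""
    return text.split()


def index(doc, analyzers, copy_fields):
    """Copy raw source values first, then run each field through its own analyzer."""
    raw = dict(doc)
    for source, dest in copy_fields:
        raw[dest] = doc[source]  # the raw input is copied, not the analyzed tokens
    return {field: analyzers[field](value) for field, value in raw.items()}


analyzers = {
    "text_tag_facets": keep_words_analyzer({"php", "mysql"}),
    "merged_facets": whitespace_analyzer,  # hypothetical destination field
}
result = index(
    {"text_tag_facets": "PHP and MySQL tutorial"},
    analyzers,
    copy_fields=[("text_tag_facets", "merged_facets")],
)
print(result["text_tag_facets"])  # ['php', 'mysql']
print(result["merged_facets"])    # ['PHP', 'and', 'MySQL', 'tutorial']
```

The destination field sees the full original text, so any filtering done by the source field's analyzer has to be repeated on the destination if the same tokens are wanted there.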

Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
Thanks for this answer. I have posted my new question (related to this post) into a new topic ;) ( http://lucene.472066.n3.nabble.com/How-to-merge-an-quot-autofacet-quot-with-a-predefined-facet-td3763988.html ) Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/H

How to merge an "autofacet" with a predefined facet

2012-02-21 Thread Xavier
Hi everyone, As explained in this post: http://lucene.472066.n3.nabble.com/How-to-index-a-facetfield-by-searching-words-matching-from-another-Textfield-td3761201.html I have created a dynamic facet at index time by searching for terms in a fulltext field. But I don't know if it's possible to merg

RE: SOLR - Just for search or whole site DB?

2012-02-21 Thread Demian Katz
I would strongly recommend using Solr just for search. Solr is designed for doing fast search lookups. It is really not designed for performing all the functions of a relational database system. You certainly COULD use Solr for everything, and the software is constantly being enhanced to make

Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Erick Erickson
setting stored="true" simply places a verbatim copy of the input in the index. Returning that field in a document will simply return that verbatim copy, there's no way to do anything else. The facet *values* you get back in your response should be what you put in your index though, why doesn't tha

Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
It seems there's an error in the documentation, with the 'Factory' missing in the class name!? I found That is working fine! In conclusion, I have these files: *synonymswords.txt:* php,mysql,html,css=>web_langage And *keepwords.txt:* web langage With this fieldType:
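Since the fieldType itself is missing from the snippet, the following is only a rough sketch of the idea, assuming the analysis chain maps synonyms first and then keeps whitelisted words. The helper names are hypothetical, and the keep list is assumed to contain the same token the synonym rule produces (web_langage).

```python
def parse_synonym_rule(rule):
    """Parse one 'a,b,c=>target' line into a {source: target} mapping."""
    sources, target = rule.split("=>")
    return {source.strip(): target.strip() for source in sources.split(",")}


def tag_tokens(text, synonyms, keep_words):
    """Lowercase and split, map synonyms to their target, then drop everything not whitelisted."""
    tokens = text.lower().split()
    mapped = [synonyms.get(token, token) for token in tokens]
    return [token for token in mapped if token in keep_words]


synonyms = parse_synonym_rule("php,mysql,html,css=>web_langage")
# Assumption: the keep-word list holds the synonym target exactly as written in the rule.
tags = tag_tokens("Learning PHP and CSS quickly", synonyms, {"web_langage"})
print(tags)  # ['web_langage', 'web_langage']
```

The duplicate token is harmless for a facet that counts documents per value, since the document contributes to the value once.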

reader/searcher refresh after replication (commit)

2012-02-21 Thread eks dev
Hi all, I am a bit confused with IndexSearcher refresh lifecycles... In a master slave setup, I override postCommit listener on slave (solr trunk version) to read some user information stored in userCommitData on master -- @Override public final void postCommit() { // This returns "stale"

Unique key constraint and optimistic locking (versioning)

2012-02-21 Thread Per Steffensen
Hi Does solr/lucene provide any mechanism for "unique key constraint" and "optimistic locking (versioning)"? Unique key constraint: That a client will not succeed creating a new document in solr/lucene if a document already exists having the same value in some field (e.g. an id field). Of cour

SOLR - Just for search or whole site DB?

2012-02-21 Thread Spadez
I am new to this but I wanted to pitch a setup to you. I have a website being coded at the moment, in the very early stages, but it is effectively a full text scraper and search engine. We have decided on SOLR for the search system. We basically have two sets of data: One is the content for the se

Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread dhaivat
Hi Koji, Thanks for quick reply, i am using solr 1.4.1 i am querying *"camera"* here is the example of documents : which matches the 70 Electronics/Cell Phones /b/l/blackberry-8100-pearl-2.jpg 349.99 BlackBerry 8100 Pearl sports a large 240 x 260 screen that supports over 65,000

Re: Fast Vector Highlighter Working for some records only

2012-02-21 Thread Koji Sekiguchi
Dhaivat, Can you give us the concrete document that you are trying to search and make a highlight snippet? And what is your Solr version? koji -- Query Log Visualizer for Apache Solr http://soleami.com/ (12/02/21 20:29), dhaivat wrote: Hi I am newbie to Solr and i am using SolrJ Client to cre

Re: lucene operators interfearing in edismax

2012-02-21 Thread jmlucjav
Ok thanks. But I reviewed some of my searches and the - was not surrounded by whitespace in all cases, so I'll have to remove lucene operators myself from the user input. I understand there is no predefined way to do so. -- View this message in context: http://lucene.472066.n3.nabble.com/lucene

Fast Vector Highlighter Working for some records only

2012-02-21 Thread dhaivat
Hi, I am a newbie to Solr and I am using the SolrJ client to create the index and query the Solr data. When querying the data I want to use Solr's highlight feature, so I am using Fast Vector Highlighter to enable highlighting on words. I found that it's working fine for some documents and for some docum

Re: How to index a facetfield by searching words matching from another Textfield

2012-02-21 Thread Xavier
That's it! Thanks :) It's the first time I have seen that documentation page (which is really helpful): http://lucidworks.lucidimagination.com/display/solr/Filter+Descriptions#FilterDescriptions-KeepWordsFilter So now I want to "associate" a wordslist with a value of an existing facet. So i tried i combine

Re: Do SOLR supports Lemmatization

2012-02-21 Thread Dirceu Vieira
Hi, Have a look at the following link: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28Lemmatization%29#Stemming Regards, Dirceu On Tue, Feb 21, 2012 at 11:18 AM, dsy99 wrote: > Dear all, > I want to know, do SOLR support Lemmatization? If yes, which in-built > Lemm

Do SOLR supports Lemmatization

2012-02-21 Thread dsy99
Dear all, I want to know, does SOLR support lemmatization? If yes, which built-in lemmatizer class should be included in the SOLR schema file to analyze the tokens using lemmatization rather than stemming? Thanks in advance. With Thanks & Regds: Divakar Yadav -- View this message in context: http:/

Query regarding Lucene Indexing Method

2012-02-21 Thread syed kather
Hi Team , Is there any article or site where I can learn about lucene index Method: how is it written and maintained? And one quick question : The Standard method that Lucene uses to handle Indexes, Is it apache package or Lucene has own Index writing Method? Does lucene use memory mapped f