solr utf8 for words like compagnieën?

2012-01-31 Thread RT

Hi,

I am having a bit of trouble getting words with characters such as:

ë, á, ø etc into SOLR.

Programming in C++ (using Qt's QString), I am wondering what conversion to 
apply before assembling words with such letters into the Solr query.


Is UTF8 the correct encoding?

Thanks in advance.

Kind regards,

Roland.


Re: language specific fields of "text"

2012-01-31 Thread Paul Libbrecht
Hello bing,

On 31 Jan 2012, at 04:27, bing wrote:
> I understand your point about "text_en" missing from the document. Indeed,
> not "text_en" but "text" exists.

Unless you use copyField or upload the field as another element, it will not 
get fed.

> But then the question arises: is it possible to dynamically add language-specific
> suffixes to an existing field "text"?

not that I know of.

> I am new here. As far as I know, for some field "title", people can create
> "title_en" "title_fr" to incorporate different analyzers in the same schema.
> Even so, I am not seeing it happen. Thus, I am wondering whether I am
> missing some obvious point? 

You'd use copyField.
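
(For illustration, such a setup in schema.xml could look like the sketch
below; the field names and types are assumptions, not from this thread.)

<field name="title"    type="text_general" indexed="true" stored="true"/>
<field name="title_en" type="text_en"      indexed="true" stored="false"/>
<field name="title_fr" type="text_fr"      indexed="true" stored="false"/>

<copyField source="title" dest="title_en"/>
<copyField source="title" dest="title_fr"/>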

> "Bing" is very common in the names of Chinese, as there are several Chinese
> characters corresponding to the same pronunciation. 

good, I learn something new every day.

paul

Re: solr utf8 for words like compagnieën?

2012-01-31 Thread Gora Mohanty
On Tue, Jan 31, 2012 at 1:50 PM, RT  wrote:
> Hi,
>
> I am having a bit of trouble getting words with characters such as:
>
> ë, á, ø etc into SOLR.
>
> Programming in C++ (using Qt's QString), I am wondering what conversion to
> apply before assembling words with such letters into the Solr query.
>
> Is UTF8 the correct encoding?

UTF-8 should be fine; in fact it is what Solr expects over HTTP, so prefer
it over Latin-1 or other 8-bit encodings.
How are you getting the UTF-8 for these strings? Have
you looked at
http://developer.qt.nokia.com/doc/qt-4.8/QString.html#converting-between-8-bit-strings-and-unicode-strings

Regards,
Gora


removing cores solrcloud

2012-01-31 Thread Phil Hoy
Hi,

I am running SolrCloud and I am able to add cores 
(http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin), but how 
does one remove cores? If I use the core admin unload command, distributed 
queries then error as they still query the removed core. Do I need to update 
ZooKeeper somehow?
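
(For reference, the CoreAdmin unload command Phil mentions is a call along
these lines; host, port and core name are illustrative:)

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=mycore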

Phil


Indexing content in XML files

2012-01-31 Thread bing
Hi, all, 

I am investigating the indexing in XML files. Currently, I have two
findings:

1. Use DataImportHandler. This requires creating one more configuration
file for DIH, data-config.xml, which defines the fields specifically for my
XML files. 

2. Use the example package coming with Solr. This only requires defining
the fields in the schema; no additional configuration file is needed. 
\apache-solr-3.5.0\example\exampledocs>java -jar post.jar *.xml
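
(For context, a data-config.xml for method 1 is sketched below; the path,
XPaths and field names are illustrative, not taken from this thread.)

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="record"
            processor="XPathEntityProcessor"
            url="/path/to/docs.xml"
            forEach="/records/record"
            stream="true">
      <field column="id"    xpath="/records/record/id"/>
      <field column="title" xpath="/records/record/title"/>
    </entity>
  </document>
</dataConfig>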

I don't know whether I understand the two methods correctly, but it seems to
me that they are completely different. If I want to index XML files with
many self-defined fields, possibly with embedded fields, which one makes
more sense? 

Thanks. 

Best
Bing



Re: Response status

2012-01-31 Thread Jens Ellenberg
Hello,

Is there a reference for these status codes?


Erik Hatcher wrote
> 
> It means the request was successful.  If the status is non-zero (err,  
> 1) then there was an error of some sort.
> 
>   Erik
> 
> On Dec 4, 2008, at 9:32 AM, Robert Young wrote:
> 
>> In the standard response format, what does the status mean? It  
>> always seems
>> to be 0.
>>
>> Thanks
>> Rob
> 




Re: Indexing content in XML files

2012-01-31 Thread Ahmet Arslan
> 2. Use the example package coming with Solr. This only
> requires to define
> the fields in the schema, and no additional configuration
> file needed. 
> \apache-solr-3.5.0\example\exampledocs>java -jar post.jar
> *.xml

Bing, please see Hoss' explanation about intended usage of post.jar 

http://search-lucene.com/m/O9dek2ngjHf


Document with longer field names and many fields

2012-01-31 Thread tech20nn
We are planning to import data from various tables of ERP DB into a single
Solr/Lucene index.
Since these tables have overlapping columns we are planning to name the
corresponding document fields as <tablename>_<columnname>. I have the
following questions on this.

1) Does having a long field name (<tablename>_<columnname>) affect
performance?
2) We will end up with close to 200 fields per document in the schema
definition. At storage time only about 20 fields will be indexed and stored
for each document. Is there a limitation here? Are we creating a performance
bottleneck by designing the schema this way?

Thanks
Vijay





Re: Response status

2012-01-31 Thread Erik Hatcher

On Jan 31, 2012, at 04:42 , Jens Ellenberg wrote:

> Hello,
> 
> Is there a reference for these status codes?

Just the source code.  SolrCore#setResponseHeaderValues, which predominantly 
uses the codes specified in SolrException:

BAD_REQUEST( 400 ),
UNAUTHORIZED( 401 ),  // not currently used
FORBIDDEN( 403 ),
NOT_FOUND( 404 ),
SERVER_ERROR( 500 ),
SERVICE_UNAVAILABLE( 503 ),
UNKNOWN(0);

Erik


> 
> 
> Erik Hatcher wrote
>> 
>> It means the request was successful.  If the status is non-zero (err,  
>> 1) then there was an error of some sort.
>> 
>>  Erik
>> 
>> On Dec 4, 2008, at 9:32 AM, Robert Young wrote:
>> 
>>> In the standard response format, what does the status mean? It  
>>> always seems
>>> to be 0.
>>> 
>>> Thanks
>>> Rob
>> 
> 
> 



Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi,

I have problems with edismax, filter queries and highlighting.

First of all: can edismax deal with filter queries?

My case:
Edismax is my default requestHandler.
My query in SolrAdminGUI: (roomba OR irobot) AND language:de

You can see that my q is "roomba OR irobot" and my fq is
"language:de" (language is a field in schema.xml).
With these params I turn highlighting on: &hl=true&hl.fl=text,title,url

In the result shown you can see that highlighting matched on
"de" in the url field (the last highlighted element).


Erste Erfahrung mit unserem Roomba
Roboter Staubsauger

 Erste Erfahrung mit unserem Roomba Roboter Staubsauger
 Tags: Haushaltshilfe, Roboter
http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/

in catalina.out I can see the following query:
path=/select/ 
params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de}
hits=1 status=0 QTime=65

language:de is a filter, and shouldn't be highlighted.
Do I have a thinking error, or is my query wrong? Or is it an edismax problem?

Best Regards
Vadim


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Ahmet Arslan
> in catalina.out I can see the following query:
> path=/select/
> params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de}
> hits=1 status=0 QTime=65
> 
> language:de is a filter, and shouldn't be highlighted.
> Do i have a thinking error, or is my query wrong? Or is it
> an edismax problem?

In your example, language:de is part of the query. Use &fq= instead.
q=(roomba OR irobot)&fq=language:de



SolrException with branch_3x

2012-01-31 Thread Bernd Fehling

On January 11th I downloaded branch_3x with svn into eclipse (indigo).
Compiled and tested it without problems.
Today I updated my branch_3x from the repository.
It compiled fine, but now I get a SolrException when starting.

Jan 31, 2012 1:50:15 PM org.apache.solr.core.SolrCore initListeners
INFO: [] Added SolrEventListener for firstSearcher: 
org.apache.solr.core.QuerySenderListener{queries=[{q=*:*,start=0,rows=10,spellcheck.build=true}, {q=(text:(*:*).

Jan 31, 2012 2:00:10 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: QueryResponseWriter init failure
at org.apache.solr.core.SolrCore.initWriters(SolrCore.java:1499)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:557)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:319)
...

It isn't able to init the QueryResponseWriter on startup :-(
My config hasn't changed in the last three weeks.
Can't find any entry in CHANGES.txt related to this.


And something else to mention, in SolrCore.java initWriters at lines 1491 to
1495:

if (info.isDefault()) {
  defaultResponseWriter = writer;
  if (defaultResponseWriter != null)
    log.warn("Multiple default queryResponseWriter registered ignoring: " +
        old.getClass().getName());
}

This will also log.warn for the first defaultResponseWriter, because the null
check runs after the assignment. I would place "defaultResponseWriter =
writer;" _AFTER_ the if/log.warn.
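
(A sketch of the reordering Bernd suggests: warn about the previous writer
before overwriting it.)

if (info.isDefault()) {
  if (defaultResponseWriter != null)
    log.warn("Multiple default queryResponseWriter registered ignoring: " +
        defaultResponseWriter.getClass().getName());
  defaultResponseWriter = writer;
}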


Regards,
Bernd


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi Ahmet,

thanks for the quick response :)
I had also spotted this mistake.
What surprises me is that the query itself works.
For example: query = language:de
I get results which only have language:de.
The fq works too, and I get only the "de" result in my field "language".
I can't understand the behavior. It seems like the fq works, but in the
end my fq params get converted to q params.

Regards
Vadim



2012/1/31 Ahmet Arslan :
>> in catalina.out I can see the following query:
>> path=/select/
>> params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de}
>> hits=1 status=0 QTime=65
>>
>> language:de is a filter, and shouldn't be highlighted.
>> Do i have a thinking error, or is my query wrong? Or is it
>> an edismax problem?
>
> In your example, language:de is a part of query. Use &fq= instead.
> q=(roomba OR irobot)&fq=language:de
>


Re: removing write.lock file in solr after indexing

2012-01-31 Thread Erick Erickson
On Mon, Jan 30, 2012 at 2:42 AM, Shyam Bhaskaran
 wrote:
> Hi,
>
> We are using Solr 4.0, and after every indexing run we observe that the
> write.lock remains without getting cleared; before the next indexing run we
> have to delete the file to get the indexing process running.
>
> We use SolrServer for our indexing and I do not see any methods to close or
> clear the indexes on completion of indexing.
>
>
> I have seen that adding the below lines into solrconfig.xml file avoids the 
> issue of physically removing the write.lock file when doing indexing.
>
>
>
> <indexConfig>
>   <lockType>simple</lockType>
>   <unlockOnStartup>true</unlockOnStartup>
> </indexConfig>
>
>
> But I am hesitant to add this directive, as it might not be a good idea in
> production: it would defeat the purpose of locking the index while another
> process writes into it.
>
> Let me know if we can do this programmatically. Is there something like
> close() which would remove the write.lock file after completion of indexing
> using SolrServer?
>
> Thanks
> Shyam


Re: removing write.lock file in solr after indexing

2012-01-31 Thread Erick Erickson
Oops, fat fingers... Anyway, this is surprising. Can you provide
more details on how you do your indexing?

Best
Erick

On Tue, Jan 31, 2012 at 8:59 AM, Erick Erickson  wrote:
> On Mon, Jan 30, 2012 at 2:42 AM, Shyam Bhaskaran
>  wrote:
>> Hi,
>>
>> We are using Solr 4.0 and after indexing every time it is observed that the 
>> write.lock remains without getting cleared and for the next indexing we have 
>> to delete the file to get the indexing process running.
>>
>> We use SolrServer for our indexing and I do not see any  methods to close or 
>> clear the indexes on completion of indexing.
>>
>>
>> I have seen that adding the below lines into solrconfig.xml file avoids the 
>> issue of physically removing the write.lock file when doing indexing.
>>
>>
>>
>> <indexConfig>
>>   <lockType>simple</lockType>
>>   <unlockOnStartup>true</unlockOnStartup>
>> </indexConfig>
>>
>>
>> But I am hesitant in adding this directive, as it might not be a good idea 
>> to set this directive in production as it would defeat the purpose of 
>> locking the index while another process writes into it.
>>
>> Let me know if we can do this programmatically, is there something like 
>> close() which would remove the write.lock file after completion of indexing 
>> using SolrServer?
>>
>> Thanks
>> Shyam


Re: Query for exact part of sentence

2012-01-31 Thread Erick Erickson
Unless you provide your schema configuration, there's
not much to go on here. Two things though:

1> look at the admin/analysis page to see how your
 data is broken up into tokens.
2> at a guess you have WordDelimiterFilterFactory
 in your chain and perhaps catenateNumbers="1"

Best
Erick

On Mon, Jan 30, 2012 at 3:21 AM, Arkadi Colson  wrote:
> Hi
>
> I'm using the pecl PHP class to query SOLR and was wondering how to query
> for an exact part of a sentence.
>
> There are 2 data items indexed in SOLR:
> 1327497476: 123 456 789
> 1327497521: 1234 5678 9011
>
> However when running the query, both data items are returned as you can see
> below. Any idea why?
>
> Thanks!
>
> SolrObject Object
> (
>    [responseHeader] =>  SolrObject Object
>        (
>            [status] =>  0
>            [QTime] =>  5016
>            [params] =>  SolrObject Object
>                (
>                    [debugQuery] =>  true
>                    [shards] =>
>  solr01:8983/solr,solr02:8983/solr,solr03:8983/solr
>                    [fl] =>
>  id,smsc_module,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_date_created,smsc_date_edited,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subject
>                    [sort] =>  smsc_date_created asc
>                    [indent] =>  on
>                    [start] =>  0
>                    [q] =>  (smsc_content:\"123 456\" ||
> smsc_description:\"123 456\")&&  (smsc_module:Intradesk)&&
>  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&  (smsc_ssid:38)
>                    [distrib] =>  true
>                    [wt] =>  xml
>                    [version] =>  2.2
>                    [rows] =>  55
>                )
>
>        )
>
>    [response] =>  SolrObject Object
>        (
>            [numFound] =>  2
>            [start] =>  0
>            [docs] =>  Array
>                (
>                    [0] =>  SolrObject Object
>                        (
>                            [smsc_module] =>  Intradesk
>                            [smsc_ssid] =>  38
>                            [id] =>  1327497476
>                            [smsc_courseid] =>  0
>                            [smsc_date_created] =>  2011-12-25T10:29:51Z
>                            [smsc_date_edited] =>  2011-12-25T10:29:51Z
>                            [score] =>  10.028017
>                        )
>
>                    [1] =>  SolrObject Object
>                        (
>                            [smsc_module] =>  Intradesk
>                            [smsc_ssid] =>  38
>                            [id] =>  1327497521
>                            [smsc_courseid] =>  0
>                            [smsc_date_created] =>  2011-12-25T10:29:51Z
>                            [smsc_date_edited] =>  2011-12-25T10:29:51Z
>                            [score] =>  5.541335
>                        )
>
>                )
>
>        )
>    [debug] =>  SolrObject Object
>        (
>            [rawquerystring] =>  (smsc_content:\"123 456\" ||
> smsc_description:\"123 456\")&&  (smsc_module:Intradesk)&&
>  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&  (smsc_ssid:38)
>            [querystring] =>  (smsc_content:\"123 456\" ||
> smsc_description:\"123 456\")&&  (smsc_module:Intradesk)&&
>  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&  (smsc_ssid:38)
>            [parsedquery] =>  +(smsc_content:123 smsc_content:456
> smsc_description:123 smsc_content:456) +smsc_module:intradesk
> +smsc_date_created:[2011-12-25T10:29:51Z TO 2012-01-25T13:33:21.098Z]
> +smsc_ssid:38
>            [parsedquery_toString] =>  +(smsc_content:123 smsc_content:456
> smsc_description:123 smsc_content:456) +smsc_module:intradesk
> +smsc_date_created:[2011-12-25T10:29:51 TO 2012-01-25T13:33:21.098]
> +smsc_ssid:`#8;#0;#0;#0;&
>            [QParser] =>  LuceneQParser
>            [timing] =>  SolrObject Object
>


Re: search returns 'categories' instead of url

2012-01-31 Thread remi tassing
After looking at the Carrot2 introduction, it seems this can be solved with
clustering but with pre-defined categories.

Does that make sense?

Remi

On Sun, Jan 29, 2012 at 8:42 PM, remi tassing  wrote:

> Hi,
>
> Let's say Solr is setup and can return relevant urls. What if I wanted to
> get the most cited terms from a predefined list, instead? It could be from
> a list of products, names, cities...
>
> Any ideas?
>
> Remi


Re: Index-Analyzer on Master with StopFilterFactory and Query-Analyzer on Slave with StopFilterFactory

2012-01-31 Thread Erick Erickson
I think it would be easy to get confused about what
was where, resulting in hard-to-track bugs because
the config file wasn't what you were expecting. I also
don't understand why you think this is desirable.
There might be an infinitesimal savings in memory,
due to not instantiating one analysis chain, but I'm not
even sure about that.

The savings is so tiny that the increased risk of
messing up seems far too high a price to pay.

Best
Erick

On Mon, Jan 30, 2012 at 11:44 AM, Daniel Brügge
 wrote:
> Hi,
>
> I am using a 'text_general' fieldType (class = solr.TextField) in my
> schema. And I have a master/slave setup,
> where I index on the master and read from the slaves. In the text_general
> field I am using 2 analyzers. One for
> indexing and one for querying with stopword-filters.
>
> What I am thinking is if it would make sense to have a different schema on
> the master than on the slave? So just the
> index-analyzer on the master's schema and the query-analyzer on the slave's
> schema?
>
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true"/>
>   </analyzer>
> </fieldType>
>
> What do you think?
>
> Thanks & best regards
>
> Daniel


Re: Out of Memory

2012-01-31 Thread Erick Erickson
Right. Multivalued fields use the fieldCache for
faceting (as I remember) whereas single-valued
fields don't under some circumstances. See:
http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache

Before your change, you were probably using the
filterCache for what faceting you were doing.

So yes, you're probably memory-constrained at this
point. How much physical memory do you have anyway?

Best
Erick

On Mon, Jan 30, 2012 at 12:10 PM, Milan Dobrota  wrote:
> Hi,
>
> I have a Solr instance with 6M item index. It normally uses around 3G of
> memory. I have suddenly started getting out of memory errors and increasing
> the Xmx parameter to over 4G didn't fix the problem. It was just buying us
> time. Inspecting the heap, I figured that 90% of memory is occupied by
> FieldCache. Is this normal? We do very little sorting and no faceting.
>
> Is FieldCache ever supposed to get cleared? Can this be done through HTTP?
>
> Do we need more memory? If so, I don't understand why the minimal set of
> changes we introduced (one multivalued field) would cause the memory to
> drastically increase.
>
> The communication with the Solr instance is done via HTTP.
>
> Java version:
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
>
> Milan


Re: SOLVED: Strange things happen when I query with many facet.prefixes and fq filters

2012-01-31 Thread Erick Erickson
Ah, thanks for bringing closure. Should have occurred to me
when I saw your query

On Mon, Jan 30, 2012 at 2:55 PM, Yuhao  wrote:
> Good question.  I checked the output sent to Jetty.  In the case where it 
> returns a blank page, nothing at all is sent to Jetty.  This raised my 
> suspicion that Solr never got a chance to process the query.  Sure enough, it 
> led me to the finding that Jetty by default cannot take more than 4 KB of 
> header.  After I increased that limit, everything works.
> Problem solved.
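
(For reference, that limit lives on the Jetty connector. A hedged sketch for
the jetty.xml shipped with Solr; the setter is headerBufferSize on the bundled
Jetty 6, requestHeaderSize on later Jetty versions, and the value is
illustrative:)

<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port">8983</Set>
      <Set name="headerBufferSize">65536</Set>
    </New>
  </Arg>
</Call>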
>
>
>
>
> 
>  From: Erick Erickson 
> To: solr-user@lucene.apache.org; Yuhao 
> Sent: Sunday, January 29, 2012 1:05 PM
> Subject: Re: Strange things happen when I query with many facet.prefixes and 
> fq filters
>
> The very first question I have is "what do your Solr logs show"? I suspect
> you'll see something interesting there. Otherwise, there's no way really to
> say what's going on here without reproducing your setup...
>
> Best
> Erick
>
> On Fri, Jan 27, 2012 at 6:48 PM, Yuhao  wrote:
>> Hi,
>>
>> I'm having issues when running the following query, which is produced by 
>> expanding several hierarchical facets (implemented the facet.prefix way).  I 
>> realize it's pretty massive, but I'd like to figure out what exactly is 
>> causing the problem.  Is it too many facet.prefix clauses, too many fq 
>> filters, the combo of both, or what.  Anyway, here is the URL I start out
>>  with:
>>
>> http://40.163.5.153:920/solr/browse?&fq=Gene_Ontology_Associations%3A%220%2Fbiological_process%28GO%3A0008150%29%22&fq=Gene_Ontology_Associations%3A%221%2Fbiological_process%28GO%3A0008150%29%3Bmetabolic+process%28GO%3A0008152%29%22&fq=Gene_Ontology_Associations%3A%222%2Fbiological_process%28GO%3A0008150%29%3Bmetabolic+process%28GO%3A0008152%29%3Bsteroid+metabolic+process%28GO%3A0008202%29%22&fq=Gene_Ontology_Associations%3A%223%2Fbiological_process%28GO%3A0008150%29%3Bmetabolic+process%28GO%3A0008152%29%3Bsteroid+metabolic+process%28GO%3A0008202%29%3Bcholesterol+metabolic+process%28GO%3A0008203%29%22&fq=Mouse_Phenotype_Associations%3A%220%2Fmammalian+phenotype%28MP%3A001%29%22&fq=Mouse_Phenotype_Associations%3A%221%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%22&fq=Mouse_Phenotype_Associations%3A%222%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fme
>> tabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%29%22&fq=Mouse_Phenotype_Associations%3A%223%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%29%3Babnormal+lipid+homeostasis%28MP%3A0002118%29%22&fq=Mouse_Phenotype_Associations%3A%224%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%29%3Babnormal+lipid+homeostasis%28MP%3A0002118%29%3Babnormal+cholesterol+homeostasis%28MP%3A0005278%29%22&fq=Mouse_Phenotype_Associations%3A%225%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%29%3Babnormal+lipid+homeostasis%28MP%3A0002118%29%3Babnormal+cholesterol+homeostasis%28MP%3A0005278%29%3Babnormal+cholesterol+level%28MP%3A0003947%29%22&fq=Mouse_Phenotype_Associations%3A%226%2Fm
>> ammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%29%3Babnormal+lipid+homeostasis%28MP%3A0002118%29%3Babnormal+cholesterol+homeostasis%28MP%3A0005278%29%3Babnormal+cholesterol+level%28MP%3A0003947%29%3Bdecreased+cholesterol+level%28MP%3A0003983%29%22&fq=Mouse_Phenotype_Associations%3A%227%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%29%3Babnormal+lipid+homeostasis%28MP%3A0002118%29%3Babnormal+cholesterol+homeostasis%28MP%3A0005278%29%3Babnormal+cholesterol+level%28MP%3A0003947%29%3Bdecreased+cholesterol+level%28MP%3A0003983%29%3Bdecreased+liver+cholesterol+level%28MP%3A0010026%29%22&fq=BKL_Diagnostic_Marker_Associations%3A%220%2FCardiovascular+Diseases%28MESH%3AD002318%29%22&fq=BKL_Molecular_Mechanism_Associations%3A%220%2FCardiovascular+Diseases%28MESH%3AD002318%29%22&fq=
>> BKL_Diagnostic_Marker_Associations%3A%221%2FCardiovascular+Diseases%28MESH%3AD002318%29%3BArteriosclerosis%28MESH%3AD001161%29%22&q=&fq=BKL_Diagnostic_Marker_Associations:%222%2FCardiovascular+Diseases%28MESH%3AD002318%29%3BArteriosclerosis%28MESH%3AD001161%29%3BAtherosclerosis%28MESH%3AD050197%29%22&f.Gene_Ontology_Associations.facet.prefix=4%2Fbiological_process%28GO%3A0008150%29%3Bmetabolic+process%28GO%3A0008152%29%3Bsteroid+metabolic+process%28GO%3A0008202%29%3Bcholesterol+metabolic+process%28GO%3A0008203%29&f.Mouse_Phenotype_Associations.facet.prefix=8%2Fmammalian+phenotype%28MP%3A001%29%3Bhomeostasis%2Fmetabolism+phenotype%28MP%3A0005376%29%3Babnormal+homeostasis%28MP%3A0001764%

Re: product(popularity,score) gives error undefined field score

2012-01-31 Thread Erick Erickson
We need more information on your setup. What version of Solr?

Best
Erick

On Mon, Jan 30, 2012 at 7:10 PM, abhayd  wrote:
> hi
>
> I'm trying to add some weight for popularity to the score returned by a solr
> query:
> http://localhost:10101/solr/syx/select?q={!boost%20b=product(popularity,score)}SIM&rows=100&fl=score,id&debug=true
>
> I get error "undefined field score"
>
> Any idea how to do this?
>


Re: Index-Analyzer on Master with StopFilterFactory and Query-Analyzer on Slave with StopFilterFactory

2012-01-31 Thread Daniel Brügge
OK, thanks Erick. Then I won't touch it. I was just wondering if it would
make sense. But on the other hand the schema.xml is also replicated in my
setup, so it could really get confusing.

Thanks

Daniel

On Tue, Jan 31, 2012 at 3:07 PM, Erick Erickson wrote:

> I think it would be easy to get confused about what
> was where, resulting in hard-to-track bugs because
> the config file wasn't what you were expecting. I also
> don't understand why you think this is desirable.
> There might be an infinitesimal savings in memory,
> due to not instantiating one analysis chain, but I'm not
> even sure about that.
>
> The savings is so tiny that the increased risk of
> messing up seems far too high a price to pay.
>
> Best
> Erick
>
> On Mon, Jan 30, 2012 at 11:44 AM, Daniel Brügge
>  wrote:
> > Hi,
> >
> > I am using a 'text_general' fieldType (class = solr.TextField) in my
> > schema. And I have a master/slave setup,
> > where I index on the master and read from the slaves. In the text_general
> > field I am using 2 analyzers. One for
> > indexing and one for querying with stopword-filters.
> >
> > What I am thinking is if it would make sense to have a different schema
> on
> > the master than on the slave? So just the
> > index-analyzer on the master's schema and the query-analyzer on the
> slave's
> > schema?
> >
> >
> > <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="..."/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="..."/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt, stopwords_en.txt" enablePositionIncrements="true"/>
> >   </analyzer>
> > </fieldType>
> >
> > What do you think?
> >
> > Thanks & best regards
> >
> > Daniel
>


RE: removing write.lock file in solr after indexing

2012-01-31 Thread Shyam Bhaskaran
Hi Erick,


Below is the sample flow.


import java.io.File;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;

String solrHome = "/opt/solr/home";
File solrXml = new File(solrHome, "solr.xml");

CoreContainer container = new CoreContainer();
container.load(solrHome, solrXml);

SolrServer solr = new EmbeddedSolrServer(container, "core1");

// clear the existing index
solr.deleteByQuery("*:*");

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1", 1.0f);
doc1.addField("name", "doc1", 1.0f);

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);

// send the documents to the index (note: an add() call is required here,
// or the documents above are never indexed)
solr.add(docs);
solr.commit();

SolrCore curCore = container.getCore("core1");
curCore.close();



I have also seen that the EmbeddedSolrServer process does not terminate after 
completion of the indexing process; can this be a reason? But even after 
manually terminating the process, the 'write.lock' file stays in the index 
directory.


-Shyam

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, January 31, 2012 7:30 PM
To: solr-user@lucene.apache.org
Subject: Re: removing write.lock file in solr after indexing

Oops, fat fingers... Anyway, this is surprising. Can you provide
more details on how you do your indexing?

Best
Erick

On Tue, Jan 31, 2012 at 8:59 AM, Erick Erickson  wrote:
> On Mon, Jan 30, 2012 at 2:42 AM, Shyam Bhaskaran
>  wrote:
>> Hi,
>>
>> We are using Solr 4.0 and after indexing every time it is observed that the 
>> write.lock remains without getting cleared and for the next indexing we have 
>> to delete the file to get the indexing process running.
>>
>> We use SolrServer for our indexing and I do not see any  methods to close or 
>> clear the indexes on completion of indexing.
>>
>>
>> I have seen that adding the below lines into solrconfig.xml file avoids the 
>> issue of physically removing the write.lock file when doing indexing.
>>
>>
>>
>> <indexConfig>
>>   <lockType>simple</lockType>
>>   <unlockOnStartup>true</unlockOnStartup>
>> </indexConfig>
>>
>>
>> But I am hesitant in adding this directive, as it might not be a good idea 
>> to set this directive in production as it would defeat the purpose of 
>> locking the index while another process writes into it.
>>
>> Let me know if we can do this programmatically, is there something like 
>> close() which would remove the write.lock file after completion of indexing 
>> using SolrServer?
>>
>> Thanks
>> Shyam


Re: Multilingual search in multicore solr

2012-01-31 Thread Erick Erickson
See below:

On Mon, Jan 30, 2012 at 10:16 PM, bing  wrote:
> Hi, Erick Erickson,
>
> Your suggestions are sound.
>
> For (1), if I use SolrJ as the client to access Solr, then java coding
> becomes the most challenging part. Technically, I want to achieve the same
> effect with highlighting, faceting search, language detection, etc. Do you
> know of some example source code that I can refer to?
>

It's actually surprisingly easy. You want to use either the
CommonsHttpSolrServer or the StreamingUpdateSolrServer
to connect to a Solr instance.

From there, you assemble a list of SolrInputDocuments and call
server.add(list).

The basic bits are about 25 lines of code.

Adding Tika in is almost equally as easy.

Don't know of any canned code lying around though.
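
(A minimal sketch of those basic bits against the SolrJ 3.x API; the URL and
field names are illustrative:)

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SimpleIndexer {
    public static void main(String[] args) throws Exception {
        // connect to a running Solr instance
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // assemble a list of documents
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("title", "Hello Solr");
        docs.add(doc);

        // send them over and make them visible to searchers
        server.add(docs);
        server.commit();
    }
}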

Best
Erick

> For (2), I agree with you on the difficulty in detecting language from just
> a few words. Thus, alternatively I can suggest a set of results and let
> users decide.
> You also mentioned score. Say, I have not so many cores, and so for every
> query I direct it to all the cores, which return a set of scores. Is it
> safe to conclude that the highest score gives the most confidence in
> the results?
>
Absolutely not, I misled you a bit in my original suggestion. The cores
all have independent statistics, so the scores are not comparable. Sorry
about that! This is not as bad a problem if you simply have different
*fields* per language in a single core, but still is a concern.

> Thanks.
>
> Best Regards,
> Ni Bing
>


Re: can solr automatically search for different punctuation of a word

2012-01-31 Thread Erick Erickson
Take a look at solrconfig.xml, the <lib> directives there. Either add
a (relative) path there or just plop the jar into one of the dirs
already specified.
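
(For example, entries along these lines; the dir values are illustrative and
resolved relative to the core's instance directory:)

<lib dir="../../contrib/analysis-extras/lib" />
<lib dir="../../contrib/analysis-extras/lucene-libs" />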

Best
Erick

On Mon, Jan 30, 2012 at 10:38 PM,   wrote:
>
>  Hi Chantal,
>
> In the readme file at solr/contrib/analysis-extras/README.txt it says to add
> the ICU library (in lib/).
>
> Do I also need to add ... and where?
>
> Thanks.
> Alex.
>
>
>
>
>
> -Original Message-
> From: Chantal Ackermann 
> To: solr-user 
> Sent: Fri, Jan 13, 2012 1:52 am
> Subject: Re: can solr automatically search for different punctuation of a word
>
>
> Hi Alex,
>
> for me, ICUFoldingFilterFactory works very well. It does lowercasing and
> removes diacritics (this is what umlauts and accents on letters are
> called - punctuation means commas, periods etc.). It will work for any
> language, not only German. And it will also handle apostrophes as in
> "C'est bien".
>
> ICU requires additional libraries in the classpath. For an in-built Solr
> solution have a look at ASCIIFoldingFilterFactory.
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
>
> Example configuration:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> And dependencies (example for Maven) in addition to solr-core:
>
> <dependency>
>   <groupId>org.apache.lucene</groupId>
>   <artifactId>lucene-icu</artifactId>
>   <version>${solr.version}</version>
>   <scope>runtime</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.solr</groupId>
>   <artifactId>solr-analysis-extras</artifactId>
>   <version>${solr.version}</version>
>   <scope>runtime</scope>
> </dependency>
>
> Cheers,
> Chantal
>
> On Fri, 2012-01-13 at 00:09 +0100, alx...@aim.com wrote:
>
>> Hello,
>>
>> I would like to know if solr has a functionality to automatically search
>> for a different punctuation of a word.
>> For example if a user searches for a word Uber, and the stemmer is German,
>> then solr looks for both Uber and Über, like in synonyms.
>>
>> Is it possible to give a file with a list of possible substitutions of
>> letters to solr and have it search for all possible punctuations?
>>
>> Thanks.
>> Alex.


Re: solr custom component

2012-01-31 Thread Erick Erickson
Look at the Sort class. You just specify the field
you want to sort on, direction, and pass the class
to your IndexSearcher.search method.
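
(A hedged sketch against the Lucene 3.x API; the field name is illustrative
and searcher is the SolrIndexSearcher already held by the component:)

import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

// sort by the date field (indexed as a long), newest first, keep the top doc
Sort byDateDesc = new Sort(new SortField("date_l", SortField.LONG, true));
TopDocs top = searcher.search(new MatchAllDocsQuery(), null, 1, byDateDesc);
int newestDocId = top.scoreDocs[0].doc;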

Best
Erick

On Tue, Jan 31, 2012 at 1:24 AM, Peter Markey  wrote:
> Hi Eric,
>
> I tried looking for sample code to sort on Date but was unable to find
> one. I am using version 3.4.
> Any idea as to where I can find one?
>
> Thanks a ton
>
> On Fri, Jan 27, 2012 at 8:13 AM, Erick Erickson 
> wrote:
>
>> Why not just sort on date and take the first doc returned in the list?
>>
>> Best
>> Erick
>>
>> On Thu, Jan 26, 2012 at 10:33 AM, Peter Markey 
>> wrote:
>> > Hello,
>> >
>> > I am building a custom component in Solr and I am trying to construct a
>> > query to get the latest (based on a date field) DocID using
>> SolrIndexSearcher.
>> > Below is a short snippet of my code:
>> >
>> > SolrIndexSearcher searcher =
>> > final SchemaField sf = searcher.getSchema().getField(dateField);
>> > //dateField is one of the fields that contains timestamp of the record
>> >
>> > final IndexSchema schema = searcher.getSchema();
>> >
>> > Query rangeQ = ((DateField)(sf.getType())).getRangeQuery(null,
>> sf,null,NOW,
>> > false,true); //NOW is current Date
>> >
>> > DocList dateDocs = searcher.getDocList(rangeQ, base, null, 0, 1); //base
>> is
>> > a set of doc filters to limit search
>> >
>> >
>> >
>> > Though I get some docs that satisfy the query, my goal is to get the doc
>> > whose's dateField is closest to the current time. Are there any other
>> > queries I can employ for this?
>> >
>> >
>> > Thanks a lot for any suggestions.
>>


Re: Indexing content in XML files

2012-01-31 Thread Erick Erickson
Also, be aware that Solr does NOT index arbitrary XML;
the XML used by the simple post tool is strictly formatted
in a way Solr understands.

A third possibility for arbitrary XML is to write a SolrJ
program that parses your XML and populates
SolrInputDocuments and sends those to Solr.
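
(A hedged sketch of that third route using the JDK DOM parser; the file name,
element names and the server variable are illustrative, and error handling is
omitted:)

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.solr.common.SolrInputDocument;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// parse a custom XML file and map each <record> element to a SolrInputDocument
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
org.w3c.dom.Document xml = builder.parse(new File("records.xml"));

NodeList records = xml.getElementsByTagName("record");
List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (int i = 0; i < records.getLength(); i++) {
    Element record = (Element) records.item(i);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", record.getAttribute("id"));
    doc.addField("text", record.getTextContent());
    docs.add(doc);
}
server.add(docs);   // a SolrServer, e.g. CommonsHttpSolrServer as sketched earlier
server.commit();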

Best
Erick

On Tue, Jan 31, 2012 at 6:34 AM, Ahmet Arslan  wrote:
>> 2. Use the example package coming with Solr. This only
>> requires to define
>> the fields in the schema, and no additional configuration
>> file needed.
>> \apache-solr-3.5.0\example\exampledocs>java -jar post.jar
>> *.xml
>
> Bing, please see Hoss' explanation about intended usage of post.jar
>
> http://search-lucene.com/m/O9dek2ngjHf


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Erick Erickson
Seeing the results with &debugQuery=on would help.

No, fq does NOT get translated into q params, it's a
completely separate mechanism so I'm not quite sure
what you're seeing.

Best
Erick

On Tue, Jan 31, 2012 at 8:40 AM, Vadim Kisselmann
 wrote:
> Hi Ahmet,
>
> thanks for the quick response :)
> I had also spotted this mistake.
> What surprises me is that the query itself works.
> For example: query = language:de
> I get results which only have language:de.
> The fq works too, and I get only the "de" result in my field "language".
> I can't understand the behavior. It seems like the fq works, but in the
> end my fq params get converted to q params.
>
> Regards
> Vadim
>
>
>
> 2012/1/31 Ahmet Arslan :
>>> in catalina.out I can see the following query:
>>> path=/select/
>>> params={hl=true&version=2.2&indent=on&rows=10&start=0&q=(roomba+OR+irobot)+AND+language:de}
>>> hits=1 status=0 QTime=65
>>>
>>> language:de is a filter, and shouldn't be highlighted.
>>> Do i have a thinking error, or is my query wrong? Or is it
>>> an edismax problem?
>>
>> In your example, language:de is a part of query. Use &fq= instead.
>> q=(roomba OR irobot)&fq=language:de
>>


Grouping and sorting results

2012-01-31 Thread Vijay J
Hi,

I'm running into some issues with solr scoring affecting ordering of query
results. Is it possible to run a Solr boolean query to check if
a document contains any search terms and get the results back without any
scoring mechanism besides presence or absence of any of the search
terms? Basically I'd like to turn off tf-idf/Vector space scoring for one
query and get the ordering by boolean instead of score.

As an example, suppose I have a collection of wholesale providers and some
of the wholesale providers are preferred over other providers. I'd like
Solr to get all the preferred providers,
apply an ordering  then get all the non-preferred providers and apply an
ordering.  (i.e. give me a group of preferred providers and apply a sort to
the preferred provider result set and group non-preferred provider and sort
that result set)
The preferred status of the provider is not known ahead of time until query
execution.


Can you give us a lead on this?

Thank you!


Regards,
Vijay


SOLVED: SolrException with branch_3x

2012-01-31 Thread Bernd Fehling

After changing the lines as suggested below and recompiling, branch_3x runs
fine now. The SolrException is gone.

Regards,
Bernd

On 31.01.2012 14:21, Bernd Fehling wrote:

On January 11th I downloaded branch_3x with svn into eclipse (indigo).
Compiled and tested it without problems.
Today I updated my branch_3x from the repository.
It compiled fine, but now I get a SolrException when starting.

Jan 31, 2012 1:50:15 PM org.apache.solr.core.SolrCore initListeners
INFO: [] Added SolrEventListener for firstSearcher:
org.apache.solr.core.QuerySenderListener{queries=[{q=*:*,start=0,rows=10,spellcheck.build=true},
 {q=(text:(*:*).
Jan 31, 2012 2:00:10 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: QueryResponseWriter init failure
at org.apache.solr.core.SolrCore.initWriters(SolrCore.java:1499)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:557)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:466)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:319)
...

It isn't able to init the QueryResponseWriter on startup :-(
My config hasn't changed in the last three weeks.
Can't find any entry in CHANGES.txt related to this.


And something else to mention, in SolrCore.java initWriters at lines 1491 to
1495:

if (info.isDefault()) {
  defaultResponseWriter = writer;
  if (defaultResponseWriter != null)
    log.warn("Multiple default queryResponseWriter registered ignoring: " +
        old.getClass().getName());
}

This will also log.warn for the first defaultResponseWriter, because the null
check runs after the assignment. I would place "defaultResponseWriter =
writer;" _AFTER_ the if/log.warn.


Regards,
Bernd


Re: Document with longer field names and many fields

2012-01-31 Thread Erick Erickson
This should be OK. There's no real issue with Solr docs having
up to 200 fields, and theres no real limitation on what
portion of those fields each doc has. In other words, only
having 20 out of 200 possible fields in a doc isn't a problem.
There's no overhead for "unused" fields.

Depending upon the number of *unique* values in these fields,
you may get some extra memory consumption, but whether
that matters depends a lot on the particulars.

By a huge margin, most of the time is spent in the searching
itself, not in parsing field names, so long ones really don't
matter at all, I think.

Best
Erick

On Tue, Jan 31, 2012 at 7:18 AM, tech20nn  wrote:
> We are planning to import data from various tables of ERP DB into a single
> Solr/Lucene index.
> Since these tables have overlapping columns we are planning to name the
> corresponding document fields as <tablename>_<columnname>. I have the
> following questions on this.
>
> 1) Does having a long field name (<tablename>_<columnname>) affect
> performance?
> 2) We will end up with close to 200 fields per document in the schema
> definition. At storage time only about 20 fields will be indexed and stored
> for each document. Is there a limitation here? Are we creating a performance
> bottleneck by designing the schema this way?
>
> Thanks
> Vijay
>
>
>


Re: Query for exact part of sentence

2012-01-31 Thread Arkadi Colson
The text field in the schema configuration looks like this. I changed
catenateNumbers to 0 but it still doesn't work as expected.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="..."/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="..." protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="..."/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="..." protected="protwords.txt"/>
  </analyzer>
</fieldType>

On 01/31/2012 03:03 PM, Erick Erickson wrote:

Unless you provide your schema configuration, there's
not much to go on here. Two things though:

1>  look at the admin/analysis page to see how your
  data is broken up into tokens.
2>  at a guess you have WordDelimiterFilterFactory
  in your chain and perhaps catenateNumbers="1"

Best
Erick

On Mon, Jan 30, 2012 at 3:21 AM, Arkadi Colson  wrote:

Hi

I'm using the pecl PHP class to query SOLR and was wondering how to query
for an exact part of a sentence.

There are 2 data items indexed in SOLR:
1327497476: 123 456 789
1327497521: 1234 5678 9011

However when running the query, both data items are returned as you can see
below. Any idea why?

Thanks!

SolrObject Object
(
[responseHeader] =>SolrObject Object
(
[status] =>0
[QTime] =>5016
[params] =>SolrObject Object
(
[debugQuery] =>true
[shards] =>
  solr01:8983/solr,solr02:8983/solr,solr03:8983/solr
[fl] =>
  
id,smsc_module,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_date_created,smsc_date_edited,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subject
[sort] =>smsc_date_created asc
[indent] =>on
[start] =>0
[q] =>(smsc_content:\"123 456\" ||
smsc_description:\"123 456\")&&(smsc_module:Intradesk)&&
  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&(smsc_ssid:38)
[distrib] =>true
[wt] =>xml
[version] =>2.2
[rows] =>55
)

)

[response] =>SolrObject Object
(
[numFound] =>2
[start] =>0
[docs] =>Array
(
[0] =>SolrObject Object
(
[smsc_module] =>Intradesk
[smsc_ssid] =>38
[id] =>1327497476
[smsc_courseid] =>0
[smsc_date_created] =>2011-12-25T10:29:51Z
[smsc_date_edited] =>2011-12-25T10:29:51Z
[score] =>10.028017
)

[1] =>SolrObject Object
(
[smsc_module] =>Intradesk
[smsc_ssid] =>38
[id] =>1327497521
[smsc_courseid] =>0
[smsc_date_created] =>2011-12-25T10:29:51Z
[smsc_date_edited] =>2011-12-25T10:29:51Z
[score] =>5.541335
)

)

)
[debug] =>SolrObject Object
(
[rawquerystring] =>(smsc_content:\"123 456\" ||
smsc_description:\"123 456\")&&(smsc_module:Intradesk)&&
  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&(smsc_ssid:38)
[querystring] =>(smsc_content:\"123 456\" ||
smsc_description:\"123 456\")&&(smsc_module:Intradesk)&&
  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&(smsc_ssid:38)
[parsedquery] =>+(smsc_content:123 smsc_content:456
smsc_description:123 smsc_content:456) +smsc_module:intradesk
+smsc_date_created:[2011-12-25T10:29:51Z TO 2012-01-25T13:33:21.098Z]
+smsc_ssid:38
[parsedquery_toString] =>+(smsc_content:123 smsc_content:456
smsc_description:123 smsc_content:456) +smsc_module:intradesk
+smsc_date_created:[2011-12-25T10:29:51 TO 2012-01-25T13:33:21.098]
+smsc_ssid:`#8;#0;#0;#0;&
[QParser] =>LuceneQParser
[timing] =>SolrObject Object





--
Smartbit bvba
Hoogstraat 13
B-3670 Meeuwen
T: +32 11 64 08 80
F: +32 89 46 81 10
W: http://www.smartbit.be
E: ark...@smartbit.be



ShingleFilterFactory not indexing the whole doc, where is the limit ?

2012-01-31 Thread Pierre JdlF
I'm trying to index word-ngrams using the solr.ShingleFilterFactory
(storing their positions + offsets):
...
[field type definition stripped in the archive: an analyzer chain ending in a
solr.ShingleFilterFactory with minShingleSize/maxShingleSize attributes]
...
I'm testing it with a (big?) HTML document [1,300,000 chars] with lots of tags.
Looking at the index (using the Schema Browser web interface), I can see
some ngrams were indexed (8939),
but it appears that they were found only in the beginning of the
document (the first 1/8 of the document).

Other fields are indexing the whole doc without problem,
so I was wondering if solr.ShingleFilterFactory has a limit:
- in the sense of a maximum blob of text it can manage?
- in the sense of a maximum number of ngrams produced?

Note that if I try lower values like minShingleSize="2" maxShingleSize="3",
I obtain 6465 ngrams (corresponding to the first 1/5 of the doc).

I thought the sky was the limit!
Any idea?

-- 
+ Pierre


Re: replication, disk space

2012-01-31 Thread anna headley
Hey Jonathan,

Any update?

We are experiencing the same thing you describe.  As days go on these index
directories continue to collect.  We have deleted timestamped indices that
are not currently in-use, but I've been nervous to remove the one simply
called 'index'.  Did you end up doing that successfully?

Some days, instead of getting additional directories the current index
doubles in size.  It looks like the files are getting mv'ed into it during
replication, but they have different filenames and so don't overwrite the
files that were already in there.  Have you seen this at all?

Some observations:
- We wiped the slave index and triggered a fresh replication.  The problem
was better but not solved for about a week (only had 2 full-size indices,
instead of getting a new one every day).  The problem came back in force
after the master index was deleted and recreated.
- We also have memory issues on both our master and slave machines right
now; we're in the process of moving over to 64-bit servers to alleviate
that problem.
- We are also running red hat (6) and solr 1.4.

Best,
Anna



On Thu, Jan 19, 2012 at 13:25, Dyer, James wrote:

> You can do all the steps to rename the timestamp dir back to "index", but
> I don't think you have to.  Solr will know on restart to use the
> timestamped directory so long as it is in the properties file (sorry, I
> must have told you to look at the wrong file...I'm working on old memories
> here.)  You might want to test this in your dev environment but I think it's
> going to work.  The only thing is if it really bothers you that the index
> isn't being stored in "index"...
>
> The reason why you get into this situation with the timestamped directory
> is explained here:
> http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Thursday, January 19, 2012 11:43 AM
> To: solr-user@lucene.apache.org
> Cc: Dyer, James
> Subject: Re: replication, disk space
>
> Okay, I do have an index.properties file too, and THAT one does contain
> the name of an index directory.
>
> But it's got the name of the timestamped index directory!  Not sure how
> that happened, could have been Solr trying to recover from running out
> of disk space in the middle of a replication? I certainly never did that
> intentionally.
>
> But okay, if someone can confirm if this plan makes sense to restore
> things without downtime:
>
> 1. rm the 'index' directory, which seems to be an old copy of the index
> at this point
> 2. 'mv index.20120113121302 index'
> 3. Manually edit index.properties to have index=index, not
> index=index.20120113121302
> 4. Send reload core command.
>
> Does this make sense?  (I just experimentally tried a reload core
> command, and even though it's not supposed to, it DID result in about 20
> seconds of unresponsiveness from my solr server, not sure why, could
> just be lack of CPU or RAM on the server to do what's being asked of it.
> But if that's the best I can do, 20 seconds of unavailability, I'll take
> it).
>
> On 1/19/2012 12:37 PM, Jonathan Rochkind wrote:
> > Hmm, I don't have a "replication.properties" file, I don't think. Oh
> > wait, yes I do there it is!  I guess the replication process makes
> > this file?
> >
> > Okay
> >
> > I don't see an index directory in the replication.properties file at
> > all though. Below is my complete replication.properties.
> >
> > So I'm still not sure how to properly recover from this situation
> > withotu downtime. It _looks_ to me like the timestamped directory is
> > actually the live/recent one.  It's files have a more recent
> > timestamp, and it's the one that /admin/replication.jsp mentions.
> >
> > replication.properties:
> >
> > #Replication details
> > #Wed Jan 18 10:58:25 EST 2012
> > confFilesReplicated=[solrconfig.xml, schema.xml]
> > timesIndexReplicated=350
> > lastCycleBytesDownloaded=6524299012
> >
> replicationFailedAtList=1326902305288,1326406990614,1326394654410,1326218508294,1322150197956,1321987735253,1316104240679,1314371534794,1306764945741,1306678853902
> >
> > replicationFailedAt=1326902305288
> > timesConfigReplicated=1
> >
> indexReplicatedAtList=1326902305288,1326825419865,1326744428192,1326645554344,1326569088373,1326475488777,1326406990614,1326394654410,1326303313747,1326218508294
> >
> > confFilesReplicatedAt=1316547200637
> > previousCycleTimeInSeconds=295
> > timesFailed=54
> > indexReplicatedAt=1326902305288
> > ~
> >
> >
> > On 1/18/2012 1:41 PM, Dyer, James wrote:
> >> I've seen this happen when the configuration files change on the
> >> master and replication deems it necessary to do a core-reload on the
> >> slave. In this case, replication copies the entire index to the new
> >> directory then does a core re-load to make the new config files and
> >> new

Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi Erick,
thanks for your response:)

Here its my query:
(roomba OR irobot) AND language:de AND
url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/";
Url and language are fields in my schema.xml

With &hl=true&hl.fl=text,url I see this, but I want to see only "roomba"
or "robot" highlighted:
<str name="url">http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/</str>

You see, the whole url is highlighted.

with debugQuery=on:

(roomba OR irobot) AND
language:de AND
url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/";
(roomba OR irobot) AND language:de AND
url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/";
(+(+(DisjunctionMaxQuery((title:roomba)~0.01)
DisjunctionMaxQuery((title:irobot)~0.01)) +language:de
+PhraseQuery(url:"http www blog gedanken de produkte erste erfahrung
mit unserem roomba roboter staubsauger"))
DisjunctionMaxQuery((text:"roomba irobot"~100)~0.01))/no_coord
+(+((title:roomba)~0.01
(title:irobot)~0.01) +language:de +url:"http www blog gedanken de
produkte erste erfahrung mit unserem roomba roboter staubsauger")
(text:"roomba irobot"~100)~0.01

26.130154 = (MATCH) sum of:
  26.130154 = (MATCH) sum of:
0.30008852 = (MATCH) product of:
  0.60017705 = (MATCH) sum of:
0.60017705 = (MATCH) weight(title:roomba in 199491)
[DefaultSimilarity], result of:
  0.60017705 = score(doc=199491,freq=1.0 = termFreq=1
), product of:
0.119503364 = queryWeight, product of:
  13.392695 = idf(docFreq=19, maxDocs=4820692)
  0.008923026 = queryNorm
5.0222607 = fieldWeight in 199491, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1
  13.392695 = idf(docFreq=19, maxDocs=4820692)
  0.375 = fieldNorm(doc=199491)
  0.5 = coord(1/2)
0.08084078 = (MATCH) weight(language:de in 199491)
[DefaultSimilarity], result of:
  0.08084078 = score(doc=199491,freq=1.0 = termFreq=1
), product of:
0.026857855 = queryWeight, product of:
  3.0099492 = idf(docFreq=645950, maxDocs=4820692)
  0.008923026 = queryNorm
3.0099492 = fieldWeight in 199491, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = termFreq=1
  3.0099492 = idf(docFreq=645950, maxDocs=4820692)
  1.0 = fieldNorm(doc=199491)
25.749224 = (MATCH) weight(url:"http www blog gedanken de produkte
erste erfahrung mit unserem roomba roboter staubsauger" in 199491)
[DefaultSimilarity], result of:
  25.749224 = score(doc=199491,freq=1.0 = phraseFreq=1.0
), product of:
0.9586678 = queryWeight, product of:
  107.43752 = idf(), sum of:
1.0006605 = idf(docFreq=4817508, maxDocs=4820692)
1.4342768 = idf(docFreq=3122520, maxDocs=4820692)
4.5387235 = idf(docFreq=140042, maxDocs=4820692)
10.954706 = idf(docFreq=228, maxDocs=4820692)
3.1167865 = idf(docFreq=580497, maxDocs=4820692)
9.476681 = idf(docFreq=1003, maxDocs=4820692)
9.195494 = idf(docFreq=1329, maxDocs=4820692)
11.576243 = idf(docFreq=122, maxDocs=4820692)
6.3489246 = idf(docFreq=22913, maxDocs=4820692)
12.31089 = idf(docFreq=58, maxDocs=4820692)
13.392695 = idf(docFreq=19, maxDocs=4820692)
11.229373 = idf(docFreq=173, maxDocs=4820692)
12.862067 = idf(docFreq=33, maxDocs=4820692)
  0.008923026 = queryNorm
26.85938 = fieldWeight in 199491, product of:
  1.0 = tf(freq=1.0), with freq of:
1.0 = phraseFreq=1.0
  107.43752 = idf(), sum of:
1.0006605 = idf(docFreq=4817508, maxDocs=4820692)
1.4342768 = idf(docFreq=3122520, maxDocs=4820692)
4.5387235 = idf(docFreq=140042, maxDocs=4820692)
10.954706 = idf(docFreq=228, maxDocs=4820692)
3.1167865 = idf(docFreq=580497, maxDocs=4820692)
9.476681 = idf(docFreq=1003, maxDocs=4820692)
9.195494 = idf(docFreq=1329, maxDocs=4820692)
11.576243 = idf(docFreq=122, maxDocs=4820692)
6.3489246 = idf(docFreq=22913, maxDocs=4820692)
12.31089 = idf(docFreq=58, maxDocs=4820692)
13.392695 = idf(docFreq=19, maxDocs=4820692)
11.229373 = idf(docFreq=173, maxDocs=4820692)
12.862067 = idf(docFreq=33, maxDocs=4820692)
  0.25 = fieldNorm(doc=199491)
ExtendedDismaxQParser [timing details garbled in the archive]

I hope you can read it:)

Best Regards
Vadim





2012/1/31 Erick Erickson :
> Seeing the results with &debugQuery=on would help.
>
> No, fq does NOT get translated into q params, it's a
> completely separate mechanism so I'm not quite sure
> what you're seeing.
>
> Best
> Erick
>
> On Tue, Jan 31, 2012 at 8:40 AM, Vadim Kisselmann
>  wrote:
>> Hi Ahmet,

Re: product(popularity,score) gives error undefined field score

2012-01-31 Thread abhayd
I'm using 4.0 from trunk.



Solrj commit affecting documents count

2012-01-31 Thread dprasadx
Hi, I am using the solrj server to commit a few changes to the data in the
master index through a Java program. It works OK as long as we do not do a
full-import. But when I do a full-import (say for 800 records), and I perform
a solrj commit in the middle of the full-import indexing, I see a commit
happen on the index at that time, and the total document count in the index
is affected.
For example, while indexing 800-odd records, if a solrj commit happens in the
middle of indexing, I see the document count in the master change to
400... and once the full-import indexing is completed, I get back all
800 records.

Is there a way to commit the changes into the master without affecting the
count while the indexing is going on? Please help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-commit-affecting-documents-count-tp3703646p3703646.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Erick Erickson
I didn't read your first post carefully enough, I was keying
on the words "filter query". Your query does not have
any filter queries! I thought you were talking
about &fq=language:de type clauses, which is what
I was responding to. Solr/Lucene have no way of
interpreting an extended "q" clause and saying
"this part is a query and should be highlighted and
this part isn't".

Try the &fq option maybe?
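
For example (parameters shown unencoded for readability), moving the
non-keyword clauses into filter queries keeps them out of highlighting:

  q=roomba OR irobot
  &fq=language:de
  &fq=url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/"
  &hl=true&hl.fl=text

Only the q terms (roomba, irobot) are then candidates for highlighting.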

Best
Erick

On Tue, Jan 31, 2012 at 10:08 AM, Vadim Kisselmann
 wrote:
> Hi Erick,
> thanks for your response:)
>
> Here its my query:
> (roomba OR irobot) AND language:de AND
> url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/";
> Url and language are fields in my schema.xml
>
> With &hl=true&hl.fl=text,url i see this, but i want only see "roomba"
> or "robot" highlighted:
>  name="url">http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/
>
> you see, the whole url is highlighted.
>
> with debugQuery=on:
>
> (roomba OR irobot) AND
> language:de AND
> url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/";
> (roomba OR irobot) AND language:de AND
> url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/";
> (+(+(DisjunctionMaxQuery((title:roomba)~0.01)
> DisjunctionMaxQuery((title:irobot)~0.01)) +language:de
> +PhraseQuery(url:"http www blog gedanken de produkte erste erfahrung
> mit unserem roomba roboter staubsauger"))
> DisjunctionMaxQuery((text:"roomba irobot"~100)~0.01))/no_coord
> +(+((title:roomba)~0.01
> (title:irobot)~0.01) +language:de +url:"http www blog gedanken de
> produkte erste erfahrung mit unserem roomba roboter staubsauger")
> (text:"roomba irobot"~100)~0.01
>  name="de.blog-gedanken/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger">
> 26.130154 = (MATCH) sum of:
>  26.130154 = (MATCH) sum of:
>    0.30008852 = (MATCH) product of:
>      0.60017705 = (MATCH) sum of:
>        0.60017705 = (MATCH) weight(title:roomba in 199491)
> [DefaultSimilarity], result of:
>          0.60017705 = score(doc=199491,freq=1.0 = termFreq=1
> ), product of:
>            0.119503364 = queryWeight, product of:
>              13.392695 = idf(docFreq=19, maxDocs=4820692)
>              0.008923026 = queryNorm
>            5.0222607 = fieldWeight in 199491, product of:
>              1.0 = tf(freq=1.0), with freq of:
>                1.0 = termFreq=1
>              13.392695 = idf(docFreq=19, maxDocs=4820692)
>              0.375 = fieldNorm(doc=199491)
>      0.5 = coord(1/2)
>    0.08084078 = (MATCH) weight(language:de in 199491)
> [DefaultSimilarity], result of:
>      0.08084078 = score(doc=199491,freq=1.0 = termFreq=1
> ), product of:
>        0.026857855 = queryWeight, product of:
>          3.0099492 = idf(docFreq=645950, maxDocs=4820692)
>          0.008923026 = queryNorm
>        3.0099492 = fieldWeight in 199491, product of:
>          1.0 = tf(freq=1.0), with freq of:
>            1.0 = termFreq=1
>          3.0099492 = idf(docFreq=645950, maxDocs=4820692)
>          1.0 = fieldNorm(doc=199491)
>    25.749224 = (MATCH) weight(url:"http www blog gedanken de produkte
> erste erfahrung mit unserem roomba roboter staubsauger" in 199491)
> [DefaultSimilarity], result of:
>      25.749224 = score(doc=199491,freq=1.0 = phraseFreq=1.0
> ), product of:
>        0.9586678 = queryWeight, product of:
>          107.43752 = idf(), sum of:
>            1.0006605 = idf(docFreq=4817508, maxDocs=4820692)
>            1.4342768 = idf(docFreq=3122520, maxDocs=4820692)
>            4.5387235 = idf(docFreq=140042, maxDocs=4820692)
>            10.954706 = idf(docFreq=228, maxDocs=4820692)
>            3.1167865 = idf(docFreq=580497, maxDocs=4820692)
>            9.476681 = idf(docFreq=1003, maxDocs=4820692)
>            9.195494 = idf(docFreq=1329, maxDocs=4820692)
>            11.576243 = idf(docFreq=122, maxDocs=4820692)
>            6.3489246 = idf(docFreq=22913, maxDocs=4820692)
>            12.31089 = idf(docFreq=58, maxDocs=4820692)
>            13.392695 = idf(docFreq=19, maxDocs=4820692)
>            11.229373 = idf(docFreq=173, maxDocs=4820692)
>            12.862067 = idf(docFreq=33, maxDocs=4820692)
>          0.008923026 = queryNorm
>        26.85938 = fieldWeight in 199491, product of:
>          1.0 = tf(freq=1.0), with freq of:
>            1.0 = phraseFreq=1.0
>          107.43752 = idf(), sum of:
>            1.0006605 = idf(docFreq=4817508, maxDocs=4820692)
>            1.4342768 = idf(docFreq=3122520, maxDocs=4820692)
>            4.5387235 = idf(docFreq=140042, maxDocs=4820692)
>            10.954706 = idf(docFreq=228, maxDocs=4820692)
>            3.1167865 = idf(docFreq=580497, maxDocs=4820692)
>            9.476681 = idf(docFreq=1003, maxDocs=4820692)
>            9.195494 = idf(docFreq=1329, maxDocs=4820692)
>            11.576243 = idf(docFreq=122, maxDocs=4820692)
>            6.3489246 = idf(docFreq=22913, maxDoc

Re: Response status

2012-01-31 Thread Jens Ellenberg
Thanks,

this helps a lot

greetings
Jens

Am 31.01.2012 13:53, schrieb Erik Hatcher-4 [via Lucene]:
>
> On Jan 31, 2012, at 04:42 , Jens Ellenberg wrote:
>
> > Hello,
> >
> > Is there a reference to this status-codes?
>
> Just the source code.  SolrCore#setResponseHeaderValues, which 
> predominately uses the codes specified in SolrException:
>
> BAD_REQUEST( 400 ),
> UNAUTHORIZED( 401 ),  // not currently used
> FORBIDDEN( 403 ),
> NOT_FOUND( 404 ),
> SERVER_ERROR( 500 ),
> SERVICE_UNAVAILABLE( 503 ),
> UNKNOWN(0);
>
> Erik
>
>
> >
> >
> > Erik Hatcher wrote
> >>
> >> It means the request was successful.  If the status is non-zero (err,
> >> 1) then there was an error of some sort.
> >>
> >> Erik
> >>
> >> On Dec 4, 2008, at 9:32 AM, Robert Young wrote:
> >>
> >>> In the standard response format, what does the status mean? It
> >>> always seems
> >>> to be 0.
> >>>
> >>> Thanks
> >>> Rob
> >>
> >
> >
> > --
> > View this message in context: 
> http://lucene.472066.n3.nabble.com/Response-status-tp490876p3702747.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Response-status-tp490876p3703708.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hi Erick,

> I didn't read your first post carefully enough, I was keying
> on the words "filter query". Your query does not have
> any filter queries! I thought you were talking
> about &fq=language:de type clauses, which is what
> I was responding to.

no problem, i understand:)

> Solr/Lucene have no way of
> interpreting an extended "q" clause and saying
> "this part is a query and should be highlighted and
> this part isn't".
>
> Try the &fq option maybe?

I thought so, unfortunately.
&fq will be the only option. I should rebuild my application :)

Best Regards
Vadim


Re: removing cores solrcloud

2012-01-31 Thread Mark Miller

On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:

> Hi,
> 
> I am running solrcloud and i am able to add cores 
> http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how 
> does one remove cores. If i use the core admin unload command, distributed 
> queries then error as they still query the removed core. Do I need to update 
> zookeeper somehow?
> 
> Phil


Hey Phil - yeah, currently you would have to manually remove the core from 
zookeeper. Once we see it, we expect it to be part of the index - perhaps we 
should remove it on an explicit core reload though?

What version of trunk are you using?

- Mark Miller
lucidimagination.com













Re: ShingleFilterFactory not indexing the whole doc, where is the limit ?

2012-01-31 Thread Ahmet Arslan
> I'm trying to index word-ngrams using
> the solr.ShingleFilterFactory,
> (storing their positions + offset)
> ...
>      class="solr.TextField"
> positionIncrementGap="1">
>       
>            class="solr.HTMLStripCharFilterFactory"/>
>      class="solr.WhitespaceTokenizerFactory" />
>          class="solr.LowerCaseFilterFactory" />
>          class="solr.ShingleFilterFactory" minShingleSize="3"
> maxShingleSize="5" outputUnigrams="false"
> tokenSeparator="_"/>
>       
> ...
>  indexed="true"
> stored="true" multiValued="false" termVectors="true"
> termPositions="true" termOffsets="true"/>
> ...
> i'm testing it with a (big?) html document, [1.300.000
> chars], with lots of tags
> Looking at the index (using Schema browser web interface), i
> can see
> some ngrams were indexed (8939)
> but it appears that they were found only in the beginning of
> the
> document (first 1/8 of the document)

It could be the maxFieldLength setting in solrconfig.xml. Set it to
2147483647.
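
In the 3.x example solrconfig.xml that is the element under <indexDefaults>,
e.g.:

  <maxFieldLength>2147483647</maxFieldLength>

Note it applies at index time, so you need to re-index after raising it.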


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Ahmet Arslan
> > Try the &fq option maybe?
> 
> I thought so, unfortunately.
> &fq will be the only option. I should rebuild my
> application :)

Could hl.q help? http://wiki.apache.org/solr/HighlightingParameters#hl.q
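
A sketch against the query from earlier in this thread (unencoded for
readability):

  q=(roomba OR irobot) AND language:de AND url:"http://www.blog-gedanken.de/produkte/erste-erfahrung-mit-unserem-roomba-roboter-staubsauger/"
  &hl=true&hl.fl=text
  &hl.q=roomba OR irobot

hl.q overrides q for highlighting only, so the language and url clauses
would no longer be marked up.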


Re: Solrj commit affecting documents count

2012-01-31 Thread Andre Bois-Crettez

Why do you commit in the middle of a full import then, if you don't have
to?

dprasadx wrote:

Hi, I am using the solrj server to commit a few changes in the data into the master
index through a Java program. It works OK unless we do a full-import.
But when I do a full-import (say for 800 records), and I perform a solrj
commit in between the full-import indexing, I see a commit happen to the
index at that time, and my total document count in the index gets affected.
For example, while indexing 800-odd records, if the solrj commit happens in the
middle of indexing, I see the document count in the master has changed to
400... and once the full-import indexing is completed, I get back all
800 records.

Is there a way to commit the changes into the master without affecting the
count while the indexing is going on ? Please help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-commit-affecting-documents-count-tp3703646p3703646.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/




Re: Solrj commit affecting documents count

2012-01-31 Thread Ahmet Arslan
Not sure if this helps, but full-import deletes the whole index using a *:* query at 
the beginning of the import. You can disable this behavior by using 
&clean=false&command=full-import
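
For example (default host/port and handler path from the example setup):

  http://localhost:8983/solr/dataimport?command=full-import&clean=false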

--- On Tue, 1/31/12, Andre Bois-Crettez  wrote:

> From: Andre Bois-Crettez 
> Subject: Re: Solrj commit affecting documents count
> To: "solr-user@lucene.apache.org" 
> Date: Tuesday, January 31, 2012, 6:35 PM
> Why do you commit in the middle of a
> full import then, if you don't have
> to ?
> 
> dprasadx wrote:
> > Hi, I am using solrj server to commit few changes in
> the data into the master
> > index through a Java program. It works OK unless we do
> a full-import.
> > But when I do a full-import (say for 800 records), and
> if I perform a solrj
> > commit in between the full-import indexing, I see a
> commit happens to the
> > index at that time, and my total documents in the index
> gets affected.
> > For example while indexing 800 odd records, if solrj
> commit happens in the
> > middle of indexing, I see the documents count in the
> master has changed to
> > 400... and once the full-import indexing is completed,
> I get back all the
> > 800 records.
> >
> > Is there a way to commit the changes into the master
> without affecting the
> > count while the indexing is going on ? Please help.
> >
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/Solrj-commit-affecting-documents-count-tp3703646p3703646.html
> > Sent from the Solr - User mailing list archive at
> Nabble.com.
> >
> >
> 
> --
> André Bois-Crettez
> 
> Search technology, Kelkoo
> http://www.kelkoo.com/
> 
> 
>


Re: Solrj commit affecting documents count

2012-01-31 Thread dprasadx
I need to index our data 10 times a day due to frequent data changes.
We have a mechanism in place where the data entered by the user in the front
end is submitted into the Solr index directly through the solrj server. So, if
the solrj commit occurs in the middle of indexing, I lose all the
records in the index and the index contains only the documents committed up to
that point. Once the indexing is completed, I get back all the records.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-commit-affecting-documents-count-tp3703646p3703972.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query for exact part of sentence

2012-01-31 Thread zarni aung
Did you rebuild the index?  That would help since the index analyzer has
been changed.

On Tue, Jan 31, 2012 at 9:53 AM, Arkadi Colson  wrote:

> The text field in the schema configuration looks like this. I changed
> catenateNumbers to 0 but it still doesn't work as expected.
>
> 
> 
> 
> 
> 
>
> ignoreCase="true"
>words="stopwords_en.txt"
>    enablePositionIncrements="true"
>/>
> ignoreCase="true"
>words="stopwords_du.txt"
>    enablePositionIncrements="true"
>/>
>  generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
>  maxGramSize="15"/>
> 
> 
> 
>  ignoreCase="true" expand="true"/>
> ignoreCase="true"
>words="stopwords_en.txt"
>    enablePositionIncrements="true"
>/>
> ignoreCase="true"
>words="stopwords_du.txt"
>    enablePositionIncrements="true"
>/>
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
>
>
>
> On 01/31/2012 03:03 PM, Erick Erickson wrote:
>
>> Unless you provide your schema configuration, there's
>> not much to go on here. Two things though:
>>
>> 1>  look at the admin/analysis page to see how your
>>  data is broken up into tokens.
>> 2>  at a guess you have WordDelimiterFilterFactory
>>  in your chain and perhaps catenateNumbers="1"
>>
>> Best
>> Erick
>>
>> On Mon, Jan 30, 2012 at 3:21 AM, Arkadi Colson
>>  wrote:
>>
>>> Hi
>>>
>>> I'm using the pecl PHP class to query SOLR and was wondering how to query
>>> for a part of a sentence exactly.
>>>
>>> There are 2 data items index in SOLR
>>> 1327497476: 123 456 789
>>> 1327497521. 1234 5678 9011
>>>
>>> However when running the query, both data items are returned as you can
>>> see
>>> below. Any idea why?
>>>
>>> Thanks!
>>>
>>> SolrObject Object
>>> (
>>>[responseHeader] =>SolrObject Object
>>>(
>>>[status] =>0
>>>[QTime] =>5016
>>>[params] =>SolrObject Object
>>>(
>>>[debugQuery] =>true
>>>[shards] =>
>>>  solr01:8983/solr,solr02:8983/solr,solr03:8983/solr
>>>[fl] =>
>>>  id,smsc_module,smsc_ssid,smsc_description,smsc_content,smsc_courseid,
>>> smsc_date_created,smsc_date_edited,score,metadata_stream_size,
>>> metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,
>>> last_modified,author,title,subject
>>>[sort] =>smsc_date_created asc
>>>[indent] =>on
>>>[start] =>0
>>>[q] =>(smsc_content:\"123 456\" ||
>>> smsc_description:\"123 456\")&&(smsc_module:Intradesk)&&
>>>  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&(smsc_ssid:38)
>>>[distrib] =>true
>>>[wt] =>xml
>>>[version] =>2.2
>>>[rows] =>55
>>>)
>>>
>>>)
>>>
>>>[response] =>SolrObject Object
>>>(
>>>[numFound] =>2
>>>[start] =>0
>>>[docs] =>Array
>>>(
>>>[0] =>SolrObject Object
>>>(
>>>[smsc_module] =>Intradesk
>>>[smsc_ssid] =>38
>>>[id] =>1327497476
>>>[smsc_courseid] =>0
>>>[smsc_date_created] =>2011-12-25T10:29:51Z
>>>[smsc_date_edited] =>2011-12-25T10:29:51Z
>>>[score] =>10.028017
>>>)
>>>
>>>[1] =>SolrObject Object
>>>(
>>>[smsc_module] =>Intradesk
>>>[smsc_ssid] =>38
>>>[id] =>1327497521
>>>[smsc_courseid] =>0
>>>[smsc_date_created] =>2011-12-25T10:29:51Z
>>>[smsc_date_edited] =>2011-12-25T10:29:51Z
>>>[score] =>5.541335
>>>)
>>>
>>>)
>>>
>>>)
>>>[debug] =>SolrObject Object
>>>(
>>>[rawquerystring] =>(smsc_content:\"123 456\" ||
>>> smsc_description:\"123 456\")&&(smsc_module:Intradesk)&&
>>>  (smsc_date_created:[2011-12-25T10:29:51Z TO NOW])&&(smsc_ssid:38)
>>>[querystring] =>(smsc_content:\"123 456\" ||
>>> smsc_description:\"123 456\")&&(smsc_module:Intradesk)&&
>>>  (smsc_

Re: /no_coord in dismax scoring explain

2012-01-31 Thread Chris Hostetter

: What does "/no_coord" mean in the dismax scoring output? I've looked
: through the wiki mail archives, lucidfind, and can't find any reference.

it's part of the BooleanQuery toString output if the BQ was constructed 
with disableCoord=true

-Hoss


Re: "index-time" over boosted

2012-01-31 Thread Chris Hostetter

: it worked (I'm using Solr-3.4.0, not that it matters)!!
: 
: I'll try to figure out what went wrong ...with my limited skills.

skimming the thread, i'm going to guess that even though you were adding 
omitNorms=true and restarting solr you weren't re-indexing until Jan 
suggested starting clean in this message.

omitNorms is a setting that affects documents as they are indexed.
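
it is declared per field (or per fieldType) in schema.xml, e.g. (field name
illustrative):

  <field name="text" type="text" indexed="true" stored="true" omitNorms="true"/>

and only takes effect for documents (re)indexed after the change.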

: The solution omitNorms="true" works for now but it's not a long term
: solution in my opinion. I also need to figure out how to make all that work.

To be crystal clear here, you originally said your problem was...

>> I've come across a problem where newly indexed pages almost always come
>> first even when the term frequency is relatively low.

omitNorms has *NOTHING* to do with the age of a document in the index.  if 
documents were scoring higher because of the fieldNorm then that means 
either:

  1) the documents were getting an explicit index time doc/field boost by 
 the client that added them
  2) the document lengths were the differentiator

...omitting norms may have "fixed" a few examples of your "newer docs score 
higher" situation, but if that's really the core of your problem, then 
omitting norms is a completely orthogonal change.



-Hoss




Re: solr utf8 for words like compagnieën?

2012-01-31 Thread RT

Hi Gora,

thanks a lot for the below feedback. I use toLatin1() frequently and will 
opt for that to see what it does for me.


Thanks again.

Kind regards,

Roland

Gora Mohanty wrote:

On Tue, Jan 31, 2012 at 1:50 PM, RT  wrote:

Hi,

I am having a bit of trouble getting words with characters such as:

ė, į, ų etc into SOLR.

Programming in C++ (Using Qt's QString) I am wondering what conversion to
apply before compiling words with such letters into the solrquery.

Is UTF8 the correct encoding?


UTF8 should be fine, though Latin1 will also work here.
How are you getting the UTF8 for these strings? Have
you looked at
http://developer.qt.nokia.com/doc/qt-4.8/QString.html#converting-between-8-bit-strings-and-unicode-strings

Regards,
Gora



Re: ShingleFilterFactory not indexing the whole doc, where is the limit ?

2012-01-31 Thread Pierre JdlF
Works now! Thanks a lot
... I guess until a document with more than 2.147.483.647 chars.
Happy night
+ Pierre

On Tue, Jan 31, 2012 at 5:23 PM, Ahmet Arslan  wrote:
>> I'm trying to index word-ngrams using
>> the solr.ShingleFilterFactory,
>> (storing their positions + offset)
>> ...
>>     > class="solr.TextField"
>> positionIncrementGap="1">
>>       
>>           > class="solr.HTMLStripCharFilterFactory"/>
>>     > class="solr.WhitespaceTokenizerFactory" />
>>         > class="solr.LowerCaseFilterFactory" />
>>         > class="solr.ShingleFilterFactory" minShingleSize="3"
>> maxShingleSize="5" outputUnigrams="false"
>> tokenSeparator="_"/>
>>       
>> ...
>> > indexed="true"
>> stored="true" multiValued="false" termVectors="true"
>> termPositions="true" termOffsets="true"/>
>> ...
>> i'm testing it with a (big?) html document, [1.300.000
>> chars], with lots of tags
>> Looking at the index (using Schema browser web interface), i
>> can see
>> some ngrams were indexed (8939)
>> but it appears that they were found only in the beginning of
>> the
>> document (first 1/8 of the document)
>
> It could be the maxFieldLength setting in solrconfig.xml . Set it to 
> 2147483647


RE: removing cores solrcloud

2012-01-31 Thread Phil Hoy
Hi Mark,

I am using the embedded ZooKeeper server. How would you recommend I connect to 
it so that I can remove the missing core, or is it only possible when using a 
stand-alone ZooKeeper instance?

You are of course correct; the reload command, as well as a few others, should 
cause a resync with ZooKeeper's state too.

I am currently using version 4.0.0.2011.12.12.09.26.56.

Phil

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: 31 January 2012 16:09
To: solr-user@lucene.apache.org
Subject: Re: removing cores solrcloud


On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:

> Hi,
> 
> I am running solrcloud and i am able to add cores 
> http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how 
> does one remove cores. If i use the core admin unload command, distributed 
> queries then error as they still query the removed core. Do I need to update 
> zookeeper somehow?
> 
> Phil


Hey Phil - yeah, currently you would have to manually remove the core from 
zookeeper. Once we see it, we expect it to be part of the index - perhaps we 
should remove it on an explicit core reload though?

What version of trunk are you using?

- Mark Miller
lucidimagination.com














Re: solr utf8 for words like compagnieën?

2012-01-31 Thread RT

Hi,

Both Latin1 and Utf8 conversion yield the same negative results.

I get compagnieën back from SOLR as:

compagnieën

I post with: toLatin1() and retrieve from SOLR into QString with 
QString::fromLatin1()


Rather disappointing. Any ideas as to what I may be doing wrong are very 
welcome at this stage.


Thanks,

Roland.

Gora Mohanty wrote:

On Tue, Jan 31, 2012 at 1:50 PM, RT  wrote:

Hi,

I am having a bit of trouble getting words with characters such as:

ė, į, ų etc into SOLR.

Programming in C++ (Using Qt's QString) I am wondering what conversion to
apply before compiling words with such letters into the solrquery.

Is UTF8 the correct encoding?


UTF8 should be fine, though Latin1 will also work here.
How are you getting the UTF8 for these strings? Have
you looked at
http://developer.qt.nokia.com/doc/qt-4.8/QString.html#converting-between-8-bit-strings-and-unicode-strings

Regards,
Gora



Re: removing cores solrcloud

2012-01-31 Thread Mark Miller

On Jan 31, 2012, at 1:03 PM, Phil Hoy wrote:

> Hi Mark,
> 
> I am using the embedded zookeeper server, how would you recommend I connect 
> to it so that I can remove the missing core or is it only possible when using 
> a stand-alone zookeeper instance?

Nope, both cases are the same - you just need a ZK tool and the ZK address to 
connect that tool to ZK. ZK itself comes with some command line scripts that 
you could use - their are also a couple GUI tools out there.

If you use eclipse, my favorite way to interact with ZK is 
http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

I think (hard to remember what came in when) you just have to remove the node 
from /node_states and the overseer will update the cluster state. Sami Siren 
might be able to comment more on that.
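
For example, with the CLI that ships with ZooKeeper (address and node name
are illustrative - with the embedded server, ZK listens on Solr's port +
1000, e.g. 9983):

  bin/zkCli.sh -server localhost:9983
  [zk: localhost:9983(CONNECTED) 0] ls /node_states
  [zk: localhost:9983(CONNECTED) 1] delete /node_states/<your_node_name>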

I am looking into doing this automatically when you unload a SolrCore - 
https://issues.apache.org/jira/browse/SOLR-3080

> 
> You are of course correct the reload command as well a few others should 
> cause a resync with the zookeepers state too.
> 
> I am currently using version 4.0.0.2011.12.12.09.26.56.
> 
> Phil
> 
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: 31 January 2012 16:09
> To: solr-user@lucene.apache.org
> Subject: Re: removing cores solrcloud
> 
> 
> On Jan 31, 2012, at 4:49 AM, Phil Hoy wrote:
> 
>> Hi,
>> 
>> I am running solrcloud and i am able to add cores 
>> http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin but how 
>> does one remove cores. If i use the core admin unload command, distributed 
>> queries then error as they still query the removed core. Do I need to update 
>> zookeeper somehow?
>> 
>> Phil
> 
> 
> Hey Phil - yeah, currently you would have to manually remove the core from 
> zookeeper. Once we see it, we expect it to be part of the index - perhaps we 
> should remove it on an explicit core reload though?
> 
> What version of trunk are you using?
> 
> - Mark Miller
> lucidimagination.com
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

- Mark Miller
lucidimagination.com













Re: SolrCloud - issues running with embedded zookeeper ensemble

2012-01-31 Thread Mark Miller
Hey Dipti -

Can you give the exact startup cmds you are using for each of the instances? I 
have got Example C going, so I'll have to try and dig into whatever you are 
seeing.

- mark

On Jan 27, 2012, at 12:53 PM, Dipti Srivastava wrote:

> Hi Mark,
> Did you get a chance to look into the issues with running the embedded 
> Zookeeper ensemble, as per Example C, from the 
> http://wiki.apache.org/solr/SolrCloud2
> 
> Hi All,
> Did anyone else run multiple shards with embedded zk ensemble successfully? 
> If so, I would like some tips on any issues that you came across.
> 
> Regards,
> Dipti
> 
> From: diptis 
> Date: Fri, 23 Dec 2011 10:32:52 -0700
> To: "markrmil...@gmail.com" 
> Subject: Re: Release build or code for SolrCloud
> 
> Hi Mark,
> There is some issue with specifying localhost vs actual host names for zk. 
> When I changed my script to specify the actual hostname (which should be 
> local by default), the first, 2nd and 3rd instances came up; those have the 
> embedded zk running. Now, I am getting the same exception for the 4th AMI, 
> which is NOT part of the zookeeper ensemble. I want to run zk only on 3 of the 4 
> instances.
> 
> java -Dbootstrap_confdir=./solr/conf -DzkRun="9983>"
> -DzkHost=:9983,:9983,:9983 -DnumShards=2 -jar
> start.jar
> 
> Dipti
> 
> From: Mark Miller 
> Reply-To: "markrmil...@gmail.com" 
> Date: Fri, 23 Dec 2011 09:34:52 -0700
> To: diptis 
> Subject: Re: Release build or code for SolrCloud
> 
> I'm having trouble getting a quorum up using the built in SolrZkServer as 
> well - so i have not been able to replicate this - I'll have to keep digging. 
> Not sure if it's due to a ZooKeeper update or what yet.
> 
> 2011/12/21 Dipti Srivastava 
>> Hi Mark,
>> Thanks! So now I am deploying a 4 node cluster on AMI's and the main
>> instance that bootstraps the config to the zookeeper does not come up I
>> get an exception as follows. My solrcloud.sh looks like
>> 
>> #!/usr/bin/env bash
>> 
>> cd ..
>> 
>> rm -r -f example/solr/zoo_data
>> rm -f example/example.log
>> 
>> cd example
>> #java -DzkRun -DnumShards=2 -DSTOP.PORT=7983 -DSTOP.KEY=key -jar start.jar
>> 1>example.log 2>&1 &
>> java -Dbootstrap_confdir=./solr/conf -DzkRun
>> -DzkHost=:9983,:9983,:9983 -DnumShards=2 -jar
>> start.jar
>> 
>> 
>> 
>> 
>> And when I RUN it
>> 
>> --CLOUD--[ec2-user@ cloud-dev]$ ./solrcloud.sh
>> 2011-12-22 02:18:23.352:INFO::Logging to STDERR via
>> org.mortbay.log.StdErrLog
>> 2011-12-22 02:18:23.510:INFO::jetty-6.1-SNAPSHOT
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
>> locateSolrHome
>> INFO: JNDI not configured for solr (NoInitialContextEx)
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
>> locateSolrHome
>> INFO: solr home defaulted to 'solr/' (could not find system property or
>> JNDI)
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader 
>> INFO: Solr home set to 'solr/'
>> Dec 22, 2011 2:18:23 AM org.apache.solr.servlet.SolrDispatchFilter init
>> INFO: SolrDispatchFilter.init()
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
>> locateSolrHome
>> INFO: JNDI not configured for solr (NoInitialContextEx)
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
>> locateSolrHome
>> INFO: solr home defaulted to 'solr/' (could not find system property or
>> JNDI)
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.CoreContainer$Initializer
>> initialize
>> INFO: looking for solr.xml: /home/ec2-user/solrcloud/example/solr/solr.xml
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.CoreContainer 
>> INFO: New CoreContainer 1406140084
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
>> locateSolrHome
>> INFO: JNDI not configured for solr (NoInitialContextEx)
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader
>> locateSolrHome
>> INFO: solr home defaulted to 'solr/' (could not find system property or
>> JNDI)
>> Dec 22, 2011 2:18:23 AM org.apache.solr.core.SolrResourceLoader 
>> INFO: Solr home set to 'solr/'
>> Dec 22, 2011 2:18:24 AM org.apache.solr.cloud.SolrZkServerProps
>> getProperties
>> INFO: Reading configuration from: solr/zoo.cfg
>> Dec 22, 2011 2:18:24 AM org.apache.solr.cloud.SolrZkServerProps
>> parseProperties
>> INFO: Defaulting to majority quorums
>> Dec 22, 2011 2:18:24 AM org.apache.solr.servlet.SolrDispatchFilter init
>> SEVERE: Could not start Solr. Check solr/home property and the logs
>> java.lang.IllegalArgumentException: port out of range:-1
>>at java.net.InetSocketAddress.(InetSocketAddress.java:83)
>>at java.net.InetSocketAddress.(InetSocketAddress.java:63)
>>at
>> org.apache.solr.cloud.SolrZkServerProps.setClientPort(SolrZkServer.java:310
>> )
>>at
>> org.apache.solr.cloud.SolrZkServerProps.getMySeverId(SolrZkServer.java:273)
>>at
>> org.apache.solr.cloud.SolrZkServerProps.parseProperties(SolrZkServer.java:4
>> 50)
>>at 
>> org.apache.solr.cloud.SolrZkServer.parseConfig(SolrZkServer.java:85)
>>at
>>

string encoding

2012-01-31 Thread RT

Hi,

there is a post going on about encoding international characters. In the 
meantime, based on this section:


http://wiki.apache.org/solr/FAQ

where it states that there may be a problem with the servlet container, I 
am using the Jetty setup from the example directory, and am wondering whether 
there are known issues with that.


Although there are apparently no known issues with character encoding in 
solr, are there any known issues with encoding in the example configuration 
of the solr package?


Kind regards and thanks for the help.

Roland


Re: string encoding

2012-01-31 Thread Gora Mohanty
On Tue, Jan 31, 2012 at 11:51 PM, RT  wrote:
> Hi,
>
> there is a post going on encoding international characters. In the mean time
> based on this section:
>
> http://wiki.apache.org/solr/FAQ
>
> where it states that there may be a problem with the Container Servelet, I
> am using the jetty setup from the example directory. And wondering whether
> there are known issues with that.
>
> Although there are apparently no known issues with character encoding in
> solr, are there any known issues with encoding in the example configuration
> of the solr package?

Have used Solr both with the built-in Jetty, and with Tomcat to store Unicode
without issues. You should ensure that the rest of your indexing/searching
chain can also handle Unicode, but by and large, this should be straightforward.

Regards,
Gora


Re: solr utf8 for words like compagnieën?

2012-01-31 Thread Chris Hostetter

: Programming in C++ (Using Qt's QString) I am wondering what conversion to
: apply before compiling words with such letters into the solrquery.

if you are implementing your own client and talking to Solr via HTTP then 
how you escape/encode characters in URLs is largely dependent on how you 
have your servlet container configured.

For example...

https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config


One way to help debug what is going on is to use "echoParams=all" in your 
request, and then look at the "requestHeader" section of the response.
that will echo back *exactly* what solr got from the servlet container for 
each of your input params.  If you use it with the python response writer 
(wt=python) you will see escape sequences for any non-ascii characters -- 
which can be helpful for verifying the problem is in how the utf8 input is 
parsed, and not in how it is being returned to your client
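
for example, against the example setup (%C3%AB is the UTF-8 escape for the
"e" with diaeresis):

  http://localhost:8983/solr/select?q=compagnie%C3%ABn&echoParams=all&wt=python

...then compare what comes back for "q" with what you sent.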

there is also an "example/exampledocs/test_utf8.sh" that is helpful for 
sanity checking these sorts of things independent of your client code.


-Hoss


Re: Solrj commit affecting documents count

2012-01-31 Thread Erick Erickson
I really don't understand this. It seems that not doing a full import
and just re-submitting the changed documents is something
you should consider.

Or just don't commit. Or consider using two cores, the idea here
is that you have your "live" core that serves requests, and
indexing to your new core, then swapping the cores after the
indexing is complete.
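
The swap itself is a single CoreAdmin call once the rebuild core is fully
indexed and committed (core names illustrative):

  http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild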

Best
Erick

On Tue, Jan 31, 2012 at 11:54 AM, dprasadx  wrote:
> I need to index our data 10 times a day due to frequent data changes.
> We have placed a mechanism where the data entered by the user in the front
> end is submitted into the solr index directly through solrj server.  So, if
> the solrj commit occurs during the middle of indexing, I lose all the
> records in the index and the index contains only the documents during that
> point of commit. Once the indexing is completed, I get back all the records.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solrj-commit-affecting-documents-count-tp3703646p3703972.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Vadim Kisselmann
Hmm, I don't know, but I can test it tomorrow at work.
I'm not sure about the right syntax with hl.q (?),
but I'll report back :)




2012/1/31 Ahmet Arslan :
>> > Try the &fq option maybe?
>>
>> I thought so, unfortunately.
>> &fq will be the only option. I should rebuild my
>> application :)
>
> Could hl.q help? http://wiki.apache.org/solr/HighlightingParameters#hl.q


Re: Grouping and sorting results

2012-01-31 Thread Erick Erickson
Nothing that I know of will give you what you want OOB, but
there are two possibilities:

The Query Elevation Component is a broad-brush way to go,
but it's not very flexible, so if 10 minutes looking at how it
works doesn't excite you, don't spend too much time on it.

If, at query time, you know your provider ordering, you can
take advantage of the tf/idf stuff by, say, adding a clause like
(provider1^10 provider2^8 provider3^6) where your
default operator is OR. Made-up boost numbers, BTW.

Best
Erick

On Tue, Jan 31, 2012 at 9:48 AM, Vijay J  wrote:
> Hi,
>
> I'm running into some issues with solr scoring affecting ordering of query
> results. Is it possible to run a Solr boolean query to check if
> a document contains any search terms and get the results back without any
> scoring mechanism besides presence or absence of any of the search
> terms? Basically I'd like to turn off tf-idf/Vector space scoring for one
> query and get the ordering by boolean instead of score.
>
> As an example, suppose I have a collection of wholesale providers and some
> of the wholesale providers are preferred over other providers. I'd like
> Solr to get all the preferred providers,
> apply an ordering  then get all the non-preferred providers and apply an
> ordering.  (i.e. give me a group of preferred providers and apply a sort to
> the preferred provider result set and group non-preferred provider and sort
> that result set)
> The preferred status of the provider is not known ahead of time until query
> execution.
>
>
> Can you give us a lead on this?
>
> Thank you!
>
>
> Regards,
> Vijay


Re: What is the most basic schema.xml you can have for indexing a simple database?

2012-01-31 Thread Chris Hostetter

: Yes, refactoring the various example schema.xml's is what I have been 
: doing up to now. The end result is usually quite verbose with a lot of 
: redundancy. What is the most compact possible schema.xml?

From my Solr OOTB Talk...
http://people.apache.org/~hossman/apachecon2011/

[the minimal schema.xml listing was stripped by the mail archive; see the sketch below]

...that's the most compact possible schema.xml ... works great for 
throwing arbitrary data into solr to see what it looks like and then 
iteratively adding to your schema as you see fit.
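
A sketch of a schema along those lines (reconstructed, since the archive
stripped the XML above; names are illustrative):

  <schema name="minimal" version="1.4">
    <types>
      <fieldType name="string" class="solr.StrField"/>
    </types>
    <fields>
      <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
    </fields>
  </schema>

The catch-all dynamicField accepts any incoming field name, and
multiValued="true" keeps repeated values from being an error.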

-Hoss


Re: Problem with replication

2012-01-31 Thread astubbs
Actually, I get: 
"No files to download for index generation"
This is after deleting the data directory on the slave.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-replication-tp2294313p3704457.html
Sent from the Solr - User mailing list archive at Nabble.com.


On-Demand update if using DIH

2012-01-31 Thread Ramo Karahasan
Hi,

 

I'm using DIH for indexing my data. Currently I always do a delta import
manually as described here:
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

 

The data is fetched from a database. How is it possible to update the index
if new data is inserted into the database? 

We have a PHP application where we import some new data into the database.
I would like to index these data 1.) on-demand, if we don't have many updates,
or 2.) as a batch, for example 50.000 documents, automatically.

 

Would be great to get some hints

 

Best regards,

Ramo



Re: solr utf8 for words like compagnieën?

2012-01-31 Thread RT

Hi Chris, Gora

thanks for the help. I am indeed writing a client conversing with solr with 
http get calls.


Using your suggestions (in particular the echoParams tip) I managed to find 
the problem.


Curiously, it turns out that on sending messages I should not convert to 
UTF-8 or otherwise, while on receiving one has to. Not sure if this is a Qt 
oddity, but things seem to be fine now.


Thanks again,

Roland.


Chris Hostetter wrote:

: Programming in C++ (Using Qt's QString) I am wondering what conversion to
: apply before compiling words with such letters into the solrquery.

if you are implementing your own client and talking to Solr via HTTP then 
how you escape/encode characters in URLs is largely dependent on how you 
have your servlet container configured.


For example...

https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config


One way to help debug what is going on is to use "echoParams=all" in your 
request, and then look at the "requestHeader" section of the response.
that will echo back *exactly* what solr got from the servlet container for 
each of your input params.  If you use it with the python response writer 
(wt=python) you will see escape sequences for any non-ascii characters -- 
which can be helpful for verifying the problem is in how the utf8 input is 
parsed, and not in how it is being returned to your client


there is also an "example/exampledocs/test_utf8.sh" that is helpful for 
sanity checking these sorts of things independent of your client code.



-Hoss



Re: Problem with replication

2012-01-31 Thread astubbs
It may have been a permissions problem, or it started working after the master
had done another fresh scheduled full-import and jumped an index version.
Timestamp issue?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-replication-tp2294313p3704559.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: On-Demand update if using DIH

2012-01-31 Thread Erick Erickson
There's nothing built into Solr or DIH that automatically looks
to see if your DB has changed. People sometimes use
cron jobs or similar to fire off the delta-import query
on a regular schedule.
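
A typical crontab entry (URL and schedule illustrative):

  # fire a delta-import every 15 minutes
  */15 * * * * curl -s "http://localhost:8983/solr/dataimport?command=delta-import" >/dev/null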

Best
Erick

On Tue, Jan 31, 2012 at 3:06 PM, Ramo Karahasan
 wrote:
> Hi,
>
>
>
> I'm using DIH for indexing my data. Currently I always do a delta import
> manually as described here:
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
>
>
>
> The data is fetched from a database. How is it possible to update the index
> if new data is inserted into the database?
>
> We have a PHP application where we import some new data into the database.
> I would like to index these data 1.) on-demand, if we don't have many updates,
> or 2.) as a batch, for example 50.000 documents
>
> Automatically
>
>
>
> Would be great get some hints
>
>
>
> Best regards,
>
> Ramo
>


Re: Solr Join query with fq not correctly filtering results?

2012-01-31 Thread Mike Hugo
I've been looking into this a bit further and am trying to figure out why
the FQ isn't getting applied.

Can anyone point me to a good spot in the code to start looking at how FQ
parameters are applied to query results in Solr4?

Thanks,

Mike

On Thu, Jan 26, 2012 at 10:06 PM, Mike Hugo  wrote:

> I created issue https://issues.apache.org/jira/browse/SOLR-3062 for this
> problem.  I was able to track it down to something in this commit -
> http://svn.apache.org/viewvc?view=revision&revision=1188624 (LUCENE-1536:
> Filters can now be applied down-low, if their DocIdSet implements a new
> bits() method, returning all documents in a random access way
> ) - before that commit the join / fq functionality works as expected /
> documented on the wiki page.  After that commit it's broken.
>
> Any assistance is greatly appreciated!
>
> Thanks,
>
> Mike
>
>
> On Thu, Jan 26, 2012 at 11:04 AM, Mike Hugo  wrote:
>
>> Hello,
>>
>> I'm trying out the Solr JOIN query functionality on trunk.  I have the
>> latest checkout, revision #1236272 - I did the following steps to get the
>> example up and running:
>>
>> cd solr
>> ant example
>> java -jar start.jar
>> cd exampledocs
>> java -jar post.jar *.xml
>>
>> Then I tried a few of the sample queries on the wiki page
>> http://wiki.apache.org/solr/Join.  In particular, this is one that I'm
>> interest in
>>
>> Find all manufacturer docs named "belkin", then join them against
>>> (product) docs and filter that list to only products with a price less than
>>> 12 dollars
>>>
>>> http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:%5B%2A+TO+12%5D
>>
>>
>> However, when I run that query, I get two results, one with a price of
>> 19.95 and another with a price of 11.5  Because of the filter query, I'm
>> only expecting to see one result - the one with a price of 11.99.
>>
>> I was also able to replicate this in a unit test added to
>> org.apache.solr.TestJoin:
>>
>>   @Test
>>   public void testJoin_withFilterQuery() throws Exception {
>> assertU(add(doc("id", "1","name", "john", "title", "Director",
>> "dept_s","Engineering")));
>> assertU(add(doc("id", "2","name", "mark", "title", "VP",
>> "dept_s","Marketing")));
>> assertU(add(doc("id", "3","name", "nancy", "title", "MTS",
>> "dept_s","Sales")));
>> assertU(add(doc("id", "4","name", "dave", "title", "MTS",
>> "dept_s","Support", "dept_s","Engineering")));
>> assertU(add(doc("id", "5","name", "tina", "title", "VP",
>> "dept_s","Engineering")));
>>
>> assertU(add(doc("id","10", "dept_id_s", "Engineering", "text","These
>> guys develop stuff")));
>> assertU(add(doc("id","11", "dept_id_s", "Marketing", "text","These
>> guys make you look good")));
>> assertU(add(doc("id","12", "dept_id_s", "Sales", "text","These guys
>> sell stuff")));
>> assertU(add(doc("id","13", "dept_id_s", "Support", "text","These guys
>> help customers")));
>>
>> assertU(commit());
>>
>> //***
>> //This works as expected - the correct number of results are found
>> //***
>> // find people that develop stuff
>> assertJQ(req("q","{!join from=dept_id_s to=dept_s}text:develop",
>> "fl","id")
>>
>> ,"/response=={'numFound':3,'start':0,'docs':[{'id':'1'},{'id':'4'},{'id':'5'}]}"
>> );
>>
>> //
>> // this fails - the response returned finds all three people - it
>> // should only find John
>> // expected =/response=={"numFound":1,"start":0,"docs":[{"id":"1"}]}
>> //response = {
>> //"responseHeader":{
>> //  "status":0,
>> //  "QTime":4},
>> //"response":{"numFound":3,"start":0,"docs":[
>> //  {
>> //"id":"1"},
>> //  {
>> //"id":"4"},
>> //  {
>> //"id":"5"}]
>> //}}
>> //
>> // find people that develop stuff - but limit via filter query to a
>> // name of "john"
>> assertJQ(req("q","{!join from=dept_id_s to=dept_s}text:develop",
>> "fl","id", "fq", "name:john")
>> ,"/response=={'numFound':1,'start':0,'docs':[{'id':'1'}]}"
>> );
>>
>>   }
>>
>>
>> Interestingly, I know this worked at some point.  I had a snapshot build
>> in my ivy cache from 10/2/2011 and it was working with that
>> build maven_artifacts/org/apache/solr/
>> solr/4.0-SNAPSHOT/solr-4.0-20111002.161157-1.pom"
>>
>>
>> Mike
>>
>
>


Re: On-Demand update if using DIH

2012-01-31 Thread Igor MILOVANOVIC
To have it updated on-demand you could just implement it inside your
application, in the form of an event trigger or hook (depending on your
application's architecture).

For batch updates it is as simple as a cron job script running as
often as every minute. Limits (50k documents) are imposed on your side,
not on Solr's side...

On Tue, Jan 31, 2012 at 9:06 PM, Ramo Karahasan
 wrote:
>
> We have an PHP Applicatoin where we import some new data into the database,
> I would like to index these data 1.) on-demand if we don't have much updates
> 2.) as a batch for example 50.000 documents



--
Igor Milovanović
http://about.me/igor.milovanovic
http://umotvorine.com/


Re: On-Demand update if using DIH

2012-01-31 Thread Ramo Karahasan
This would mean that I call, somewhere in my application, the URL that is 
described in: 
http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ?

Ramo

-----Original Message-----
From: Igor MILOVANOVIC [mailto:pleti...@gmail.com] 
Sent: Tuesday, 31 January 2012 23:15
To: solr-user@lucene.apache.org
Subject: Re: On-Demand update if using DIH

To have it updated on-demand you could just implement it inside your 
application, in the form of an event trigger or hook (depending on your 
application's architecture).

For batch updates it is as simple as a cron job script running as often as 
every minute. Limits (50k documents) are imposed on your side, not on Solr's 
side...

On Tue, Jan 31, 2012 at 9:06 PM, Ramo Karahasan  
wrote:
>
> We have a PHP application where we import some new data into the 
> database. I would like to index these data 1.) on-demand, if we don't 
> have many updates,
> or 2.) as a batch, for example 50.000 documents



--
Igor Milovanović
http://about.me/igor.milovanovic
http://umotvorine.com/



Re: Edismax, Filter Query and Highlighting

2012-01-31 Thread Koji Sekiguchi

(12/02/01 4:28), Vadim Kisselmann wrote:

Hmm, i don´t know, but i can test it tomorrow at work.
i´m not sure about the right syntax with hl.q. (?)
but i report :)


hl.q can accept the same syntax as q, including local params.

koji
--
http://www.rondhuit.com/en/


Re: Source code of post in example package of Solr

2012-01-31 Thread bing
Hi, iorixxx, Thanks. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Closed-Source-code-of-post-in-example-package-of-Solr-tp3702100p3705333.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr shards

2012-01-31 Thread Chris Hostetter

: Now in my case the indices are being built outside of Solr. So basically I
: create three sets of indices through Lucene API's. And at this point, I
: change the schema.xml and define the fields I have in these new indices. I

do you define a uniqueKey field in your schema.xml?  does that field 
actually exist in all of your documents? is it indexed? is it stored? is 
it actaully unique across all of your documents?

(these are things that Solr would normally take care of checking for you 
when indexing, but since you've bypassed Solr building these indexes you 
have to be more vigilant in checking these things -- you are 
deep into "unsupported, experts only" territory)
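
for reference, the usual declaration in schema.xml looks like this (field 
name illustrative):

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>id</uniqueKey>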


-Hoss


Re: Strange things happen when I query with many facet.prefixes and fq filters

2012-01-31 Thread Chris Hostetter

: References: <1327606185216-3691370.p...@n3.nabble.com>
:  
:  <1327704368796-3694787.p...@n3.nabble.com>
: Message-ID: <1327708135.30472.yahoomail...@web160304.mail.bf1.yahoo.com>
: Subject: Strange things happen when I query with many facet.prefixes and fq
:  filters

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Hierarchical faceting in UI

2012-01-31 Thread Chris Hostetter

: References:
: 
: Message-ID: <1327357980.36539.yahoomail...@web160302.mail.bf1.yahoo.com>
: Subject: Hierarchical faceting in UI

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Hierarchical faceting in UI

2012-01-31 Thread Chris Hostetter

I'm not really following your specific example, but a worked-through 
example of the "index full breadcrumb" type approach darren was suggesting 
for doing drill down in a hierarchy is described in slides 32-35 of 
this presentation (which was recorded as a webcast)...

http://people.apache.org/%7Ehossman/apachecon2010/facets/
http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search

in general, i *strongly* recommend that you use unique ids for each 
"node" of your taxonomy as a way to avoid confusion when multiple nodes 
have the same label/name.  (in the presentation i talk about doing that, 
but the slides show simple strings to help viewers follow what's going on)
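
one common encoding from that approach (values illustrative): index the full 
path once per level, e.g.

  category_path: 0/products
  category_path: 1/products/electronics
  category_path: 2/products/electronics/cameras

then facet with facet.field=category_path&facet.prefix=0/ at the top level, 
and drill down with fq=category_path:"1/products/electronics" plus 
facet.prefix=2/products/electronics/ to show that node's children.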

-Hoss


Re: On-Demand update if using DIH

2012-01-31 Thread Gora Mohanty
On Wed, Feb 1, 2012 at 4:13 AM, Ramo Karahasan
 wrote:
> This would mean, that i call somewhere in my application the url that is 
> described in: 
> http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport from my 
> application?
[...]

Yes, if by application you mean the script used by a task scheduler
like cron.

Also, if you are doing frequent delta imports and/or your index is large,
you should consider how often to optimize when doing the import.
Optimization is on by default, so in order not to optimize, you need to
add &optimize=false to the URL. It is advisable to optimize at least
once in a while, say at off-peak times.
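
For example:

  http://localhost:8983/solr/dataimport?command=delta-import&optimize=false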

Regards,
Gora


not displaying html code in the results

2012-01-31 Thread John kim
Hello,

What options do I have to hide "ugly" data in the search results? For
example, I am crawling HTML pages and some documents have loose tags
or a long string such as "32lkj31U682860678Stock "

I could scrub the data before it gets ingested into the index (HTML
parsing, removing strings longer than x characters).

Once the data is in the index, is there anything I can do to the index
to not display ugly data?

Once the data is returned, I could create some rules to hide certain text...

What's the best way to go about this problem?


Re: Indexing leave behind write.lock file.

2012-01-31 Thread Koorosh Vakhshoori
Here is how I got SolrJ to delete the write.lock file. I switched to the
CoreContainer's remove() method. So the new code is:

...
SolrCore curCore = container.remove("core1");
curCore.close();

Now, my understanding of why it is working: based on the Solr source code, the
issue had to do with the core's reference count not ending up at zero when
the close() method is called. The getCore() method increments the reference
count while remove() doesn't. Since the close() method decrements the count
first and if and only if the count is zero it would unlock the core, i.e.
remove the write.lock.
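
A minimal sketch of the whole sequence (assuming an already-initialized
CoreContainer named "container"):

  // remove() detaches the core without bumping its reference count
  SolrCore core = container.remove("core1");
  if (core != null) {
    core.close(); // count reaches zero, so the core shuts down and write.lock is released
  }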

Regards,

Koorosh 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-leave-behind-write-lock-file-tp3701915p3705554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multilingual search in multicore solr

2012-01-31 Thread bing
Hi, Erick, 

Thanks for your comment. Though I have some experience in Solr, I am
completely a newbie in SolrJ, and haven't tried using SolrJ to access Solr.
For now, I have a source package of Solr 3.5.0, and some SolrJ source code
downloaded from the web that I want to incorporate into Solr and try out.
How would I build and run it? Where should I put the source code in the
package? Is an IDE a must for that? 

I cannot find many start-up tutorials about that, thus I would be grateful
for any suggestions and hints. 

Best 
Bing 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multilingual-search-in-multicore-solr-tp3698969p3705556.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing content in XML files

2012-01-31 Thread bing
Hi, all, 

Thanks for the comment. Then I will abandon post.jar, and try to learn SolrJ
instead. 

Best
Bing

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-content-in-XML-files-tp3702795p3705563.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: removing write.lock file in solr after indexing

2012-01-31 Thread Shyam Bhaskaran
Hi Erick,

I was able to resolve the issue with 'write.lock' files.

Using container.remove("core1") or using container.shutdown() is helping to 
remove the 'write.lock' files.

-Shyam



Hierarchical faceting with solr 1.4 version

2012-01-31 Thread Gupta, Veeranjaneya
Hi,

I am using the Solr 1.4 version. I have a requirement to display the data as below.

[the example hierarchy layout was stripped by the mail archive]
Here we store 3 types of polygons: MSA, AREA and HOOD.
One MSA can have more than one AREA type as a child, and one AREA can have 
more than one HOOD type as a child.
How do I store this kind of data in Solr, and how do I query Solr to get this 
hierarchical data?

Thanks, Gupta


Re: hot deploy of newer version of solr schema in production

2012-01-31 Thread roz dev
Thanks Jan for your inputs.

I am keen to know about the way people keep running live sites while there
is a breaking change which calls for complete re-indexing.
we want to build a new index , with new schema (it may take couple of
hours) without impacting live e-commerce site.

any thoughts are welcome

Thanks
Saroj


On Tue, Jan 24, 2012 at 12:21 AM, Jan Høydahl  wrote:

> Hi,
>
> To be able to do a true hot deploy of newer schema without reindexing, you
> must carefully see to that none of your changes are breaking changes. So
> you should test the process on your development machine and make sure it
> works. Adding and deleting fields would work, but not changing the
> field-type or analysis of an existing field. Depending on from/to version,
> you may want to keep the old schema-version number.
>
> The process is:
> 1. Deploy the new schema, including all dependencies such as dictionaries
> 2. Do a RELOAD CORE http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> My preference is to do a more thorough upgrade of schema including new
> functionality and breaking changes, and then do a full reindex. The
> exception is if my index is huge and the reason for Solr upgrade or schema
> change is to fix a bug, not to use new functionality.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 24. jan. 2012, at 01:51, roz dev wrote:
>
> > Hi All,
> >
> > I need community's feedback about deploying newer versions of solr schema
> > into production while existing (older) schema is in use by applications.
> >
> > How do people perform these things? What has been the learning of people
> > about this.
> >
> > Any thoughts are welcome.
> >
> > Thanks
> > Saroj
>
>