Re: Adding a new pseudo field

2012-10-08 Thread Upayavira
If I've understood you correctly, you could achieve this also with the
XSLTResponseWriter; it would be pretty trivial to write an XSLT that
exposes the node position in the results, containing:
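
A minimal sketch of such a stylesheet (an assumption - the field name
"position" and the identity-copy approach are illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy the whole Solr XML response unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- for each doc in the result, inject a 1-based "position" field -->
  <xsl:template match="result/doc">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <int name="position">
        <xsl:value-of select="count(preceding-sibling::doc) + 1"/>
      </int>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>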

Stick that in solr/conf/xslt, and reference it with wt=xslt&tr=.xsl

That way you wouldn't need to modify Solr at all.

Also, look at Solr 4.0, which has calculated fields. I'm not sure if there's
scope to find the document position as a function query, though.

Upayavira

On Mon, Oct 8, 2012, at 05:02 AM, deniz wrote:
> Well, basically I was about to explain and ask once more for your opinions,
> but this morning I just wanted to try something in the source code and it
> succeeded... so here is what I want and what I did to get it:
> 
> 
> What I wanted:
> 
> The exact thing I want is similar to the "score" field. Normally it always
> exists, but we can't see it in a normal query response unless we set
> fl=*,score.
> For my case, I would like to see each document's position in a pseudo
> field like "score", so when I run a query with fl=*,position I want to see
> 5 for the 5th document in the result set.
> So, to make it more clear: when you search for
> "q=name:deniz&fl=*,position,score", the result set will be something like:
> 
> position 1: 9865
> position 2: 10024
> position 3: 1403
> 
> and when the user runs another query, let's say
> "q=name:stephan&fl=*,position,score", the result set will be like:
> 
> position 1: 1408
> position 2: 9865
> position 3: 10021
> 
> as you see, each query will produce different scores; therefore a
> document's position - or ranking, whichever you prefer to say - will
> change according to the query.
> 
> 
> What I did:
> 
> well, after digging through the source code, I am now able to see dynamic
> positions for each different search. I have simply added a position
> function to DocIterator and implemented it in the subclasses. Then I added
> a control block in ReturnFields to check whether fl has position in it. It
> works in a similar way to score. And the last thing to do was adding a
> custom augmenter class, PositionAugmenter - similar to ScoreAugmenter.
> Then I was done :)
> 
> I hope it helps if anyone faces a similar issue...
> 
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Adding-a-new-pseudo-field-tp4011995p4012375.html
> Sent from the Solr - User mailing list archive at Nabble.com.
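
(A rough, hypothetical sketch of the PositionAugmenter deniz describes,
modeled on Solr 4.x's ScoreAugmenter DocTransformer; base-class signatures
vary across versions, and the counter assumes documents are transformed in
result order on a single thread:)

import org.apache.solr.common.SolrDocument;
import org.apache.solr.response.transform.DocTransformer;

public class PositionAugmenter extends DocTransformer {
  private final String name;
  private int position = 0;

  public PositionAugmenter(String display) {
    this.name = display;
  }

  @Override
  public String getName() {
    return name;
  }

  @Override
  public void transform(SolrDocument doc, int docid) {
    // 1-based rank of this document within the current result page
    doc.setField(name, ++position);
  }
}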


Re: Storing queries in Solr

2012-10-08 Thread Upayavira
Solr has a small query cache, but this does not hold queries for any
length of time, so won't suit your purpose.

The LucidWorks Search product has (I believe) a click tracking feature,
but that is about boosting documents that are clicked on, not specific
search terms. Parsing the Solr log, or pushing query terms to a
different core/index would really be the only way to achieve what you're
suggesting, as far as I am aware.

Processing logs would be preferable anyhow, as you don't really want to
be triggering an index write during each query (assuming you have more
queries than updates to your main index), and also if this is for
building a suggester index, then it is unlikely to need updating that
regularly - every hour or every day should be more than sufficient. You
could write a SearchComponent that logs queries in another format,
should the existing log format not be sufficient for you.
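
(A minimal sketch of such a component - the class name, log format, and the
SolrInfoMBean boilerplate are assumptions; it would still need to be
registered in solrconfig.xml:)

import java.io.IOException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryLogComponent extends SearchComponent {
  private static final Logger log = LoggerFactory.getLogger(QueryLogComponent.class);

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    String q = rb.req.getParams().get(CommonParams.Q);
    if (q != null) {
      log.info("user-query\t{}", q); // tab-separated line, easy to parse later
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time; logging happened in prepare()
  }

  @Override
  public String getDescription() {
    return "Logs user queries in a parse-friendly format";
  }

  @Override
  public String getSource() {
    return null; // SolrInfoMBean boilerplate (required override in 3.x/4.x)
  }
}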

Upayavira

On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote:
> Hi!
> 
> I was wondering if there is any built-in mechanism that would allow me to
> store the queries made to a Solr server inside the index itself. I know
> that the suggester module exists, but as far as I know it only works for
> terms existing in the index, and not with queries. I remember reading
> about using some external program to parse the Solr log and push the
> queries or any other interesting data into the index; is this the only
> way of accomplishing this?
> 
> Greetings!
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> 
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci


Re: Adding a new pseudo field

2012-10-08 Thread Upayavira
Good question. I know XSLT can output JSON, but you'd have to write a
stylesheet that transforms the XML into JSON. I'm not sure whether you
can influence the content type of the output with the XSLT response
writer, though.

There's also the Velocity response writer, which sits behind the /browse
interface; that might help you too.

Upayavira

On Mon, Oct 8, 2012, at 08:54 AM, deniz wrote:
> Could the XSLT processor be useful for a JSON response too? Because I will
> be using the response not in a browser but from some other jars...
> 
> 
> 
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Adding-a-new-pseudo-field-tp4011995p4012393.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: add shard to index

2012-10-08 Thread Upayavira
Given that Solr does not support distributed IDF, adding a shard without
balancing the number of documents could seriously skew your scoring. If
you are okay with that, then the next question is: what happens if you
download the clusterstate.json from ZooKeeper, add another entry along
the lines of "shard3":{}, and upload it again?

My theory is that the next host you start up would become the first node
of shard3. Worth a try (unless someone more knowledgeable tells us
otherwise!)

Upayavira

On Mon, Oct 8, 2012, at 01:35 AM, Radim Kolar wrote:
> i am reading this: http://wiki.apache.org/solr/SolrCloud section 
> Re-sizing a Cluster
> 
Is it possible to add a shard to an existing index? I do not need the
data redistributed; it can stay where it is. It's enough for me if
new entries are distributed across the new number of shards. Restarting
Solr is fine.


Re: Problem with relating values in two multi value fields

2012-10-08 Thread Toke Eskildsen
On Mon, 2012-10-08 at 08:42 +0200, Torben Honigbaum wrote:
> sorry, my fault. This was one of my first ideas. My problem is, that
> I've 1.000.000 documents, each with about 20 attributes. Additionally
> each document has between 200 and 500 option-value pairs. So if I
> denormalize the data, it means that I've 1.000.000 x 350 (200 + 500 /
> 2) = 350.000.000 documents, each with 20 attributes. 

If you have a few hundred or less distinct primary attributes (the A, B,
C's in your example), you could create a new field for each of them:


  id: 3
  options: A B C D
  option_A: 200
  option_B: 400
  option_C: 240
  option_D: 310
  ...
  ...


Query for "options:A" and facet on field "option_A" to get facets for
the specific field.
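
A matching request (host and core are assumed) might be:

  http://localhost:8983/solr/select?q=options:A&facet=true&facet.field=option_A&rows=0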

This normalization does increase the index size due to duplicated
secondary values between the option-fields, but since our assumption is
a relatively small amount of primary values, it should not be too much.


Alternatively, if you have many distinct primary attributes, index the
pairs as Jack suggests:

  id: 3
  options: A B C D
  option: A=200
  option: B=400
  option: C=240
  option: D=310
  ...
  ...


Query for "options:A" and facet on field "option" with
facet.prefix="A=". Your result will be A=200 (2), A=450 (1)... so you'll
have to strip the "A=" prefix before display.
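
As a request (host and core assumed), using Solr's facet.prefix parameter,
where A%3D is the URL-encoded "A=":

  http://localhost:8983/solr/select?q=options:A&facet=true&facet.field=option&facet.prefix=A%3D&rows=0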

This normalization is potentially a lot heavier than the previous one,
as we have distinct_primaries * distinct_secondaries distinct values. 

Worst case, where every document only contains distinct combinations of
primary/secondary, we have 350M distinct option-values, which is quite
heavy for a single box to facet on. Whether that is better or worse than
350M documents, I don't know.

> Is denormalization the only way to handle this problem? I 

What you are trying to do does look quite a lot like hierarchical
faceting, which Solr does not support directly. But even if you apply
one of the experimental patches, it does not mitigate the potential
combinatorial explosion of your primary & secondary values.

So that leaves the question: How many distinct combinations of primary
and secondary values do you have?

Regards,
Toke Eskildsen



Re: add shard to index

2012-10-08 Thread Rafał Kuć
Hello!

Radim, there is a JIRA issue -
https://issues.apache.org/jira/browse/SOLR-3755. It is a work in
progress, but once finished Solr will enable you to add additional
shards to a live collection and split the ones that were already
created.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Given that Solr does not support distributed IDF, adding a shard without
> balancing the number of documents could seriously skew your scoring. If
> you are okay with that, then the next question is: what happens if you
> download the clusterstate.json from ZooKeeper, add another entry along
> the lines of "shard3":{}, and upload it again?

> My theory is that the next host you start up would become the first node
> of shard3. Worth a try (unless someone more knowledgeable tells us
> otherwise!)

> Upayavira

> On Mon, Oct 8, 2012, at 01:35 AM, Radim Kolar wrote:
>> i am reading this: http://wiki.apache.org/solr/SolrCloud section 
>> Re-sizing a Cluster
>> 
>> Is it possible to add a shard to an existing index? I do not need the
>> data redistributed; it can stay where it is. It's enough for me if
>> new entries are distributed across the new number of shards. Restarting
>> Solr is fine.



Reloading ExternalFileField blocks Solr

2012-10-08 Thread Martin Koch
Hi List

We're using Solr-4.0.0-Beta with a 7M document index running on a single
host with 16 shards. We'd like to use an ExternalFileField to hold a value
that changes often. However, we've discovered that the file is apparently
re-read by every shard/core on *every commit*; the index is unresponsive in
this period (around 20s on the host we're running on). This is unacceptable
for our needs. In the future, we'd like to add other values as
ExternalFileFields, and this will make the problem worse.
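
(For context, a typical ExternalFileField setup - the field and file names
here are assumed - pairs a schema entry with an external_<fieldname> file in
the index data directory, which Solr re-reads as described above:)

<!-- schema.xml -->
<fieldType name="externalPopularity" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="popularity" type="externalPopularity"/>

# data/external_popularity -- one key=value line per document
doc1=0.52
doc2=1.30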

It would be better if the external file were instead read in the
background, updating each shard's previously read values as the new ones
are read in.

I guess a change in the ExternalFileField code would be required to achieve
this, but I have no experience here, so suggestions are very welcome.

Thanks,
/Martin Koch - Issuu - Senior Systems Architect.


Solr 4 spatial search - point intersects polygon

2012-10-08 Thread Jorge Suja
Hi everyone, 

I've been playing around with the new spatial search functionality
included in the newer versions of Solr (Solr 4.1 and Solr trunk/5.0), and
I've found something strange when I try to find a point inside a polygon
(particularly inside a square).

You can reproduce this problem using the spatial-solr-sandbox project that
has the following config for the fields:

[field configuration stripped in the archive]

I'm trying to find the following document:

G292223 - Dubai - 55.28 25.252220
I want to test if this point is located inside a polygon, so I'm using the
following query:

q=geohash:"Intersects(POLYGON((55.18 25.352220,55.38 25.352220,55.38 25.152220,55.18 25.152220,55.18 25.352220)))"

As you can see, it's a small square that contains the point described
above. I get some results, but that document is not among them, and the
ones returned are wrong since they are not even inside the square.

G1809498 - Guilin - 110.286390 25.281940
[...]

However, if I change the shape of the square a little bit (just one
corner), it returns the result as expected:

q=geohash:"Intersects(POLYGON((55.18 25.352220,55.48 25.352220,55.38 25.152220,55.18 25.152220,55.18 25.352220)))"

(the second corner changed from 55.38 to 55.48)

Now it returns a single result and it's OK:

G292223 - Dubai - 55.28 25.252220


If I use a bbox with the same size and position as the first square, it
correctly returns the document:

q=geohash:"Intersects(55.18 25.152220 55.38 25.352220)"

G292223 - Dubai - 55.28 25.252220

If you draw another polygon, such as a triangle, it works well too.

I've tested this against different points and it's always the same: it
seems that if you draw a straight (axis-aligned) square or rectangle, it
can't find the point inside it, and it returns wrong results.

Am I doing anything wrong?

Thanks in advance

Jorge



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-spatial-search-point-intersects-polygon-tp4012402.html
Sent from the Solr - User mailing list archive at Nabble.com.


I don't understand

2012-10-08 Thread Tolga

Hi,

There are two servers with the same configuration. I crawl the same URL. 
One of them is giving the following error:


Caused by: org.apache.solr.common.SolrException: ERROR: 
[doc=http://bilgisayarciniz.org/] multiple values encountered for non 
multiValued copy field text: bilgisayarciniz web hizmetleri


I really fail to understand. Why does this happen?

Regards,

PS: Neither server has multiValued=true for title field.


Re: I don't understand

2012-10-08 Thread Jan Høydahl
Hi,

Please describe your environment better:

* How do you "crawl", using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr
* Can you share your schema and other relevant config with us?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 8 Oct 2012 at 12:11, Tolga wrote:

> Hi,
> 
> There are two servers with the same configuration. I crawl the same URL. One 
> of them is giving the following error:
> 
> Caused by: org.apache.solr.common.SolrException: ERROR: 
> [doc=http://bilgisayarciniz.org/] multiple values encountered for non 
> multiValued copy field text: bilgisayarciniz web hizmetleri
> 
> I really fail to understand. Why does this happen?
> 
> Regards,
> 
> PS: Neither server has multiValued=true for title field.



solr1.4 code Example

2012-10-08 Thread Sujatha Arun
hi,

I am unable to unzip the 5883_Code.zip file for Solr 1.4 from the packtpub
site. I get the error message:

  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.


any pointers?

Regards
Sujatha


Re: I don't understand

2012-10-08 Thread Tolga

Hi Jan, thanks for your fast reply. Below is the information you requested:

* I use nutch, using the command "nutch crawl urls -dir crawl-$(date 
+%FT%H-%M-%S) -solr http://localhost:8983/solr/ -depth 10 -topN 5"

* What do you mean "which RequestHandler"? How can I find that out?
* 3.6.1
* Both schemas are below:



[both schema.xml listings were stripped by the mailing-list archive;
surviving fragments show standard Nutch-style field types
(sortMissingLast="true", omitNorms="true", precisionStep="0",
positionIncrementGap="0"), a number of multiValued="true" field
definitions, and "id" / "content" entries at the end of each schema]



These schemas mention Nutch because the Nutch tutorial tells me to overwrite
Solr's schema with its own.



On 10/08/2012 01:33 PM, Jan Høydahl wrote:

Hi,

Please describe your environment better:

* How do you "crawl", using which crawler?
* To which RequestHandler do you send the docs?
* Which version of Solr
* Can you share your schema and other relevant config with us?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 8 Oct 2012 at 12:11, Tolga wrote:


Hi,

There are two servers with the same configuration. I crawl the same URL. One of 
them is giving the following error:

Caused by: org.apache.solr.common.SolrException: ERROR: 
[doc=http://bilgisayarciniz.org/] multiple values encountered for non 
multiValued copy field text: bilgisayarciniz web hizmetleri

I really fail to understand. Why does this happen?

Regards,

PS: Neither server has multiValued=true for title field.






Re: QueryElevationComponent not working in Distributed Search

2012-10-08 Thread Erick Erickson
You shouldn't try copying files around, your comment that you
" tried replacing QueryElevationComponent.java" leads me to
think you tried that. Instead, I notice that there's a SOLR-2949.3x
patch. If you want to try that, you can apply the patch to the 3.x code
line. See "working with patches" at
http://wiki.apache.org/solr/HowToContribute

WARNING: I have no clue whether that patch will apply cleanly, nor
whether it will actually fix distrib QEV. It doesn't look like it was
applied to 3.x. Also, looking at the comments it's not clear that
it _would_ work; see Mark's last comment.

What kinds of errors do you get with 4.0? It's true that a bunch
has changed, but I really don't see any other reliable way to
get distributed QEV working other than either using 4.0 or
patching 3.6... and if you do the latter you're kind of on your own.

Best
Erick

On Mon, Oct 8, 2012 at 2:21 AM, vasokan  wrote:
> Hi Erick,
>
> I cannot migrate to 4.0-ALPHA or 4.0-BETA because of the dependency in
> configuration as part of indexing in solrconfig.xml and schema.xml.
>
> When I try to use the 4.0 version, I get a series of errors that pop up.  Also
> I cannot change the entire configuration files that are available to me.
>
> So I tried patching up the diffs that were available as attachments in the
> issue that I have mentioned below.
> https://issues.apache.org/jira/browse/SOLR-2949 .  But still I was facing
> some issues and tried replacing QueryElevationComponent.java from the newer
> versions.  But I still do not find the functionality of elevating to be
> working for distributed search.
>
> Can you please let me know if there is any means by which I can include this
> fix without migrating to newer versions.
>
> Thank you,
> Vinoth
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/QueryElevationComponent-not-working-in-Distributed-Search-tp4011785p4012382.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: add shard to index

2012-10-08 Thread Erick Erickson
Right, but even if that worked, you'd then get docs being assigned
to the wrong shard. The shard assignment would be something
like hash(id) % 3. So a document currently on shard 0 would be
indexed next time, perhaps, on shard 2, leaving two "live" docs
in your system with the same ID. Bad Things would happen
then...

I believe that currently your only real option is to re-index from
scratch when you add more shards.

I was thinking about this at one point. Unless the guys work
some magic, it will be an expensive process. Not as
expensive as re-indexing for sure, but consider 12
documents in 3 shards.

shard1 - 1, 4, 7, 10
shard2 - 2, 5, 8, 11
shard3 - 3, 6, 9, 12

Now you add a shard and the docs are re-distributed
shard1 - 1, 5, 9
shard2 - 2, 6, 10
shard3 - 3, 7, 11
shard4 - 4, 8, 12

In this simple case, only 3 out of your 12 documents stayed on the
same shard! All the rest had to be moved.
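
(A tiny sketch of that arithmetic, assuming simple hash-modulo placement -
an assumption for illustration; SolrCloud actually assigns hash ranges to
shards, but the effect is the same:)

// Count how many of 12 documents keep their shard when going from 3 to 4
// shards under hash-modulo placement (illustrative only).
public class ReshardDemo {
  public static void main(String[] args) {
    int stayed = 0;
    for (int id = 1; id <= 12; id++) {
      int before = id % 3; // shard index with 3 shards
      int after = id % 4;  // shard index with 4 shards
      if (before == after) stayed++;
    }
    // prints: 3 of 12 docs stay on the same shard
    System.out.println(stayed + " of 12 docs stay on the same shard");
  }
}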

Then the indexes have to be distributed across all replicas, then...

Now, there won't have to be any analysis done. You won't have to
reconstruct all of the documents from your system-of-record. You won't
have to do a _ton_ of work that you originally had to do. This should
be enormously faster than re-indexing. But it still won't be
something to casually do on a live system under load .

Disclaimer: I really may be talking through my hat here, but this _sounds_
right.

FWIW
Erick

On Mon, Oct 8, 2012 at 4:33 AM, Upayavira  wrote:
> Given that Solr does not support distributed IDF, adding a shard without
> balancing the number of documents could seriously skew your scoring. If
> you are okay with that, then the next question is: what happens if you
> download the clusterstate.json from ZooKeeper, add another entry along
> the lines of "shard3":{}, and upload it again?
>
> My theory is that the next host you start up would become the first node
> of shard3. Worth a try (unless someone more knowledgeable tells us
> otherwise!)
>
> Upayavira
>
> On Mon, Oct 8, 2012, at 01:35 AM, Radim Kolar wrote:
>> i am reading this: http://wiki.apache.org/solr/SolrCloud section
>> Re-sizing a Cluster
>>
>> Is it possible to add a shard to an existing index? I do not need the
>> data redistributed; it can stay where it is. It's enough for me if
>> new entries are distributed across the new number of shards. Restarting
>> Solr is fine.


Re: I don't understand

2012-10-08 Thread Erick Erickson
Well, the schemas are different. The first schema doesn't have a
copyField directive anywhere in it and the second one does.

And the copyField is in a non-standard place anyway; it's
usually outside the <fields> tag. Kind of surprising it works
at all there; now I've got to go figure out why...

Anyway, apparently you've edited the schemas inconsistently,
and this copyField will never work unless the text field is multiValued...
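
(For illustration, a destination that multiple sources copy into must be
declared multiValued - the field names here are assumed from the Nutch
schema:)

<field name="text" type="text" stored="false" indexed="true" multiValued="true"/>
<copyField source="title" dest="text"/>
<copyField source="content" dest="text"/>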

Best
Erick

On Mon, Oct 8, 2012 at 7:11 AM, Tolga  wrote:
> Hi Jan, thanks for your fast reply. Below is the information you requested:
>
> * I use nutch, using the command "nutch crawl urls -dir crawl-$(date
> +%FT%H-%M-%S) -solr http://localhost:8983/solr/ -depth 10 -topN 5"
> * What do you mean "which RequestHandler"? How can I find that out?
> * 3.6.1
> * Both schemas are below:
>
> [the two schema.xml listings were stripped by the archive; surviving
> fragments show Nutch-style analysis chains (stopwords.txt,
> WordDelimiterFilter with catenate options, protwords.txt), a number of
> multiValued="true" fields, and "id" / "content" entries at the end of
> each schema]
>
> These schemas mention Nutch because the Nutch tutorial tells me to overwrite
> Solr's schema with its own.
>
> Regards,
>
>
> On 10/08/2012 01:33 PM, Jan Høydahl wrote:
>>
>> Hi,
>>
>> Please describe your environment better:
>>
>> * How do you "crawl", using which crawler?
>> * To which RequestHandler do you send the docs?
>> * Which version of Solr
>> * Can you share your schema and other relevant config with us?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 8 Oct 2012 at 12:11, Tolga wrote:
>>
>>> Hi,
>>>
>>> There are two servers with the same configuration. I crawl the same URL.

Re: solr 1.4.1 -> 3.6.1; SOLR-758

2012-10-08 Thread Jack Krupansky
The Extended Dismax query parser (edismax) mostly "obsoletes" Dismax, except
in the sense that some apps prefer the restricted syntax of Dismax:


http://wiki.apache.org/solr/ExtendedDisMax
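
(Switching is typically just a matter of changing defType - an illustrative
set of parameters, with the field names and boosts assumed:)

q=ipod
defType=edismax
qf=title^2 content
mm=75%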

-- Jack Krupansky

-Original Message- 
From: Patrick Kirsch

Sent: Monday, October 08, 2012 2:32 AM
To: solr-user@lucene.apache.org
Subject: solr 1.4.1 -> 3.6.1; SOLR-758

Regarding https://issues.apache.org/jira/browse/SOLR-758 (Enhance
DisMaxQParserPlugin to support full-Solr syntax and to support alternate
escaping strategies.)

I'm updating from solr 1.4.1 to 3.6.1 (I'm aware that it is not beautiful).
After applying the attached patches to 3.6.1 I'm experiencing this problem:
 - SEVERE: org.apache.solr.common.SolrException: Error Instantiating
QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a
org.apache.solr.search.QParserPlugin
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:421)
at
org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:441)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1612)
[...]
   These patches seem to be no longer valid.

Which leads me to the more experienced users here:

- Although not directly mentioned in
https://issues.apache.org/jira/browse/SOLR-758, is there any other (new)
QParser which obsoletes the DisMax?

- Furthermore I tried to make the patches apply ("forward porting"), but
always get the error "Error Instantiating QParserPlugin,
org.apache.solr.search.AdvancedQParserPlugin is not a
org.apache.solr.search.QParserPlugin", although the class dependency is
linear:

./core/src/java/org/apache/solr/search/AdvancedQParserPlugin.java:
[...]
public class AdvancedQParserPlugin extends DisMaxQParserPlugin {
[...]

./core/src/java/org/apache/solr/search/DisMaxQParserPlugin.java:
[...]
public class DisMaxQParserPlugin extends QParserPlugin {
[...]


Thanks,
 Patrick 



Re: solr1.4 code Example

2012-10-08 Thread Toke Eskildsen
On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote:
> I am unable to unzip the  5883_Code.zip file for solr 1.4 from paktpub site
> .I get the error message
> 
>   End-of-central-directory signature not found. [...]

It is a corrupt ZIP file. I'm guessing you got it from
http://www.packtpub.com/files/code/5883_Code.zip
I tried downloading the archive and it was indeed corrupt. You can read
some of the files by using jar for unpacking: 'jar xvf 5883_Code.zip'.

You'll need to contact packtpub to get them to fix it properly. A quick
search indicates that they've had problems before:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/%3c4bf66e8f.4070...@shoptimax.de%3E




Re: long query response time in shards search

2012-10-08 Thread Jack Krupansky
What release of Solr are you on? Solr 4.0 has improved wildcard support (FST
"automatons"), but even then, such heavy use of wildcards may be
problematic.


If you intend to use wildcards in that manner, you might want to create a
custom stemming filter that does that stemming at index time (and query
time) so you don't need to do such heavy wildcarding.


Do these complex queries always run slow (the first time each is tried), or
just sometimes, or only some of the queries? (Solr will cache the results of
a given query so that the next time the same results can be returned without
re-querying the index.)


-- Jack Krupansky

-Original Message- 
From: Jason

Sent: Monday, October 08, 2012 12:26 AM
To: solr-user@lucene.apache.org
Subject: Re: long query response time in shards search

Hi, Otis
Thanks your reply.

yes, all cores are in same server.

* what do you consider "too long"?
just an id (key) query response takes too long;
almost all id (key) query responses take under 10 ms.
example:
-
2012-10-05 16:38:32,078 [http-8080-exec-3979] INFO
org.apache.solr.core.SolrCore - [usp00] webapp=/solr_us path=/select
params={rows=1&shards=usp00,usp01,usp02,usp03,usp04,usp05&fl=cin,score&start=0&q=id:(US200840881A1)}
status=0 QTime=164085

* how many queries are running concurrently?
approximately 5 to 10 queries,
but the queries are very complex - complex meaning many terms, including wildcards.

* can you show some example queries?
example
-
q=(angiogenesis*+OR+neovascula*+OR+(vessel*+OR+vascula*)+N+(proliferat*+OR+growth*))+5N+(inhibit*+OR+prevent*+OR+treat*+OR+thera*+OR+medic*)+AND+(ibd+OR+crohn*+OR+behcet*+OR+inflammat*+2N+(bowel*+OR+intestin*+OR+colitis*+OR+enteritis*+OR+gastroenteritis*)+OR+ulcerative*+W+colitis*+OR+intestin*+W+behcet*+OR+macula*+W+degenerat*+OR+amd+OR+armd)

* how many CPU cores does your server have?
32 cores (server has 4 CPU and 8 cores in each CPU.)
128G RAM

Also, the total index across all cores includes 15 million docs and is 400 GB in size.

Are the complex queries the problem?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-query-response-time-in-shards-search-tp4012366p4012378.html
Sent from the Solr - User mailing list archive at Nabble.com. 



search by multiple 'LIKE' operator connected with 'AND' operator

2012-10-08 Thread gremlin
Hi.

I have trouble with my Solr configuration. I just want to implement a
configuration that operates on the index like the MySQL query: field_name
LIKE '%foo%' AND field_name LIKE '%bar%'.

So, for example, I have 4 indexed titles:
'Kathy Lee',
'Kathy Norris',
'Kathy Davies',
'Kathy Bird'

and with my query Kathy Norris I receive all of these titles. A quoted query
gives no results at all.

latest field definition that I've tried (very simple, just for tests):

[field definition stripped in the archive]


I've also tried a field with ShingleFilterFactory, and ShingleFilterFactory
combined with NGrams. But no results.

Btw, I have the default Solr configuration for the Drupal search_api_solr
module, just modified with a new request handler.

Trying different configurations did not give the expected results.

Thanks for help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing queries in Solr

2012-10-08 Thread Jorge Luis Betancourt Gonzalez
Thanks for the quick response. I'm trying to get query suggestions; I find it
odd that, this being a very common need, Solr doesn't provide any built-in
mechanism for query suggestions, but implementing the other components isn't
so hard either.

Greetings!

On Oct 8, 2012, at 3:38 AM, Upayavira wrote:

> Solr has a small query cache, but this does not hold queries for any
> length of time, so won't suit your purpose.
> 
> The LucidWorks Search product has (I believe) a click tracking feature,
> but that is about boosting documents that are clicked on, not specific
> search terms. Parsing the Solr log, or pushing query terms to a
> different core/index would really be the only way to achieve what you're
> suggesting, as far as I am aware.
> 
> Processing logs would be preferable anyhow, as you don't really want to
> be triggering an index write during each query (assuming you have more
> queries than updates to your main index), and also if this is for
> building a suggester index, then it is unlikely to need updating that
> regularly - every hour or every day should be more than sufficient. You
> could write a SearchComponent that logs queries in another format,
> should the existing log format not be sufficient for you.
> 
> Upayavira
> 
> On Mon, Oct 8, 2012, at 01:24 AM, Jorge Luis Betancourt Gonzalez wrote:
>> Hi!
>> 
>> I was wondering if there is any built-in mechanism that would allow me to
>> store the queries made to a Solr server inside the index itself. I know
>> that the suggester module exists, but as far as I know it only works for
>> terms existing in the index, and not with queries. I remember reading
>> about using some external program to parse the Solr log and push the
>> queries or any other interesting data into the index; is this the only
>> way of accomplishing this?
>> 
>> Greetings!
>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>> INFORMATICAS...
>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>> 
>> http://www.uci.cu
>> http://www.facebook.com/universidad.uci
>> http://www.flickr.com/photos/universidad_uci
> 


10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci


Wildcards and fuzzy/phonetic query

2012-10-08 Thread Hågen Pihlstrøm Hasle
Hi!

I'm quite new to Solr, I was recently asked to help out on a project where the 
previous "Solr-person" quit quite suddenly.  I've noticed that some of our 
searches don't return the expected result, and I'm hoping you guys can help me 
out.

We've indexed a lot of names, and would like to search for a person in our 
system using these names.  We previously used Oracle Text for this, and we 
experience that Solr is much faster.  So far so good! :)  But when we try to 
use wildcards, things start to go wrong.

We're using Solr 3.4, and I see that some of our problems are solved in 3.6.  
Ref SOLR-2438:
https://issues.apache.org/jira/browse/SOLR-2438

But we would also like to be able to combine wildcards with fuzzy searches, and 
wildcards with a phonetic filter.  I don't see anything about phonetic filters 
in SOLR-2438 or SOLR-2921.  (https://issues.apache.org/jira/browse/SOLR-2921)  
Is it possible to make the phonetic filters MultiTermAware?

Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in 
Solr..) and find both christian and kristian.  As far as I understand, this is 
not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.  Is this 
correct, or have I misunderstood anything?  Are there any workarounds or 
filter-combinations I can use to achieve the same result?  I've seen people 
suggest using a boolean query to combine the two, but I don't really see how 
that would solve my "chr*"-problem.

As I mentioned earlier I'm quite new to this, so I apologize if what I'm asking 
about only shows my ignorance..


Regards, Hågen

Re: search by multiple 'LIKE' operator connected with 'AND' operator

2012-10-08 Thread Jack Krupansky
The PositionFilterFactory is probably preventing phrase queries from
working. What are you expecting it to do? It basically makes the query
treat all the quoted terms as occurring at the same position.


SQL "like" is comparable to Lucene wildcard, but change the "%" to "*" and 
"_" to "?".


-- Jack Krupansky

-Original Message- 
From: gremlin

Sent: Monday, October 08, 2012 10:47 AM
To: solr-user@lucene.apache.org
Subject: search by multiple 'LIKE' operator connected with 'AND' operator

Hi.

I have trouble with my Solr configuration. I just want to implement a
configuration that operates on the index like the MySQL query: field_name
LIKE '%foo%' AND field_name LIKE '%bar%'.

So, for example, I have 4 indexed titles:
'Kathy Lee',
'Kathy Norris',
'Kathy Davies',
'Kathy Bird'

and with my query Kathy Norris I receive all of these titles. A quoted query
gives no results at all.

latest field definition that I've tried (very simple, just for tests):

[field definition stripped in the archive]

I've also tried a field with ShingleFilterFactory, and ShingleFilterFactory
combined with NGrams. But no results.

Btw, I have the default Solr configuration for the Drupal search_api_solr
module, just modified with a new request handler.

Trying different configurations did not give the expected results.

Thanks for help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Storing queries in Solr

2012-10-08 Thread Gérard Dupont
Hi Jorge,

As far as I know, there isn't a built-in component to achieve such a
function in Solr (maybe in the latest 4.1, which I haven't explored in depth
yet). However I've done it myself in the past using different approaches.

The first one is similar to Upayavira's suggestion and uses an independent
index where queries and clicks were stored in order to make "popular
queries" suggestions and/or document suggestions. My second implementation
used a dedicated field on the original documents' index in order to add the
terms of queries that led to a click on each particular document (i.e.
re-indexing the document with a new field), and used this field for boosted
terms and/or document suggestion. However this latter solution is likely not
to scale very well, especially if your document index is very dynamic (my
particular case relied on an almost static document repository).

Finally, remember that exploiting queries and clicks may lead to private
data management issues. Since you're storing their queries, warn your users
appropriately.

br,

gdupont

On 8 October 2012 02:24, Jorge Luis Betancourt Gonzalez  wrote:

> Hi!
>
> I was wondering if there is any built-in mechanism that would allow me to
> store the queries made to a Solr server inside the index itself. I know that
> the suggester module exists, but as far as I know it only works for terms
> existing in the index, and not with queries. I remember reading about using
> some external program to parse the Solr log and push the queries or any
> other interesting data into the index; is this the only way of accomplishing
> this?
>
> Greetings!
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>


-- 
Gérard Dupont
Information Processing Control and Cognition (IPCC)
CASSIDIAN - an EADS company

Document & Learning team - LITIS Laboratory


Re: Wildcards and fuzzy/phonetic query

2012-10-08 Thread Jack Krupansky
A regular expression term may provide what you want, but not exactly. Maybe 
something like:


/(ch|k)r.*/

(No guarantee that will actually work.)

See:
http://lucene.apache.org/core/4_0_0-BETA/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Regexp_Searches

And probably slower than desirable.

-- Jack Krupansky

-Original Message- 
From: Hågen Pihlstrøm Hasle

Sent: Monday, October 08, 2012 11:21 AM
To: solr-user@lucene.apache.org
Subject: Wildcards and fuzzy/phonetic query

Hi!

I'm quite new to Solr, I was recently asked to help out on a project where 
the previous "Solr-person" quit quite suddenly.  I've noticed that some of 
our searches don't return the expected result, and I'm hoping you guys can 
help me out.


We've indexed a lot of names, and would like to search for a person in our 
system using these names.  We previously used Oracle Text for this, and we 
experience that Solr is much faster.  So far so good! :)  But when we try to 
use wildcards, things start to go wrong.


We're using Solr 3.4, and I see that some of our problems are solved in 3.6. 
Ref SOLR-2438:

https://issues.apache.org/jira/browse/SOLR-2438

But we would also like to be able to combine wildcards with fuzzy searches, 
and wildcards with a phonetic filter.  I don't see anything about phonetic 
filters in SOLR-2438 or SOLR-2921. 
(https://issues.apache.org/jira/browse/SOLR-2921)

Is it possible to make the phonetic filters MultiTermAware?

Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in 
Solr..) and find both christian and kristian.  As far as I understand, this 
is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined. 
Is this correct, or have I misunderstood anything?  Are there any 
workarounds or filter-combinations I can use to achieve the same result? 
I've seen people suggest using a boolean query to combine the two, but I 
don't really see how that would solve my "chr*"-problem.


As I mentioned earlier I'm quite new to this, so I apologize if what I'm 
asking about only shows my ignorance..



Regards, Hågen



Re: SolrJ - IOException

2012-10-08 Thread Briggs Thompson
I have also just run into this a few times over the weekend in a newly
deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it
is hosted on AWS.

I have a RabbitMQ consumer that reads updates from a queue and posts
updates to Solr via SolrJ. There is quite a bit of error handling around
the indexing request, and even if Solr is not live the consumer application
successfully logs the exception and attempts to move along in the queue.
There are two consumer applications running at once, which at times process
400 requests per minute. The problem does not necessarily occur at
high-volume times, though.

This exception is causing the entire application to hang - which is
surprising considering all SolrJ logic is wrapped with try/catches. Has
anyone found out more information regarding the possible keep alive bug?
Any insight is much appreciated.

Thanks,
Briggs Thompson


Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector
tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing
request: Broken pipe
Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector
tryExecute
INFO: Retrying request
Oct 8, 2012 7:25:48 AM com.<>.rabbitmq.worker.SolrWriter work
SEVERE: {"id":4049703,"datetime":"2012-10-08 07:22:05"}
IOException occured when talking to server at:
<
server>
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at:
<
server>
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:362)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:69)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:96)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:79)
at com.<>.solr.SolrIndexService.IndexCoupon(SolrIndexService.java:57)
at com.<>.solr.SolrIndexService.Index(SolrIndexService.java:36)
at com.<>.rabbitmq.worker.SolrWriter.work(SolrWriter.java:47)
at com.<>.rabbitmq.job.Runner.run(Runner.java:84)
at com.<>.rabbitmq.job.SolrConsumer.main(SolrConsumer.java:10)
Caused by: org.apache.http.client.ClientProtocolException
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:306)
... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
retry request with a non-repeatable request entity. The cause lists the
reason the original request failed.
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
... 13 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at
org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:147)
at
org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:154)
at
org.apache.http.impl.conn.LoggingSessionOutputBuffer.flush(LoggingSessionOutputBuffer.java:95)
at
org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:178)
at
org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:72)
at
org.apache.http.entity.mime.HttpMultipart.doWriteTo(HttpMultipart.java:206)
at org.apache.http.entity.mime.HttpMultipart.writeTo(HttpMultipart.java:224)
at
org.apache.http.entity.mime.MultipartEntity.writeTo(MultipartEntity.java:183)
at
org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
at
org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
at
org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
at
org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
at
org.apache.http.impl.conn.AbstractClientConnAdapter.sendRequestEntity(AbstractClientConnAdapter.java:227)
at
org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExec

Re: multivalued filed question (FieldCache error)

2012-10-08 Thread giovanni.bricc...@banzai.it

Thank you very much!

I've put every fl definition in my solrconfig on a single line with the
stray whitespace removed, and now the app works fine.


Giovanni

Il 05/10/12 20:49, Chris Hostetter ha scritto:

: So extracting the attachment you will be able to track down what happens
:
: this is the query that shows the error, and below you can see the latest stack
: trace and the qt definition

Awesome -- exactly what we needed.

I've reproduced your problem, and verified that it has something to do
with the extra newlines which are confusing the parsing into not
recognizing "store_slug" as a simple field name.

The workarround is to modify the fl in your config to look like this...

  sku,store_slug

...or even like this...

 sku,  store_slug   

...and then it should work fine.

having a newline immediately following the store_slug field name is
somehow confusing things, and making it not recognize "store_slug" as a
simple field name -- so then it tries to parse it as a function, and
since bare field names can also be used as functions that parsing works,
but then you get the error that the field can't be used as a function
since it's multivalued.

I'll try to get a fix for this into 4.0-FINAL...

https://issues.apache.org/jira/browse/SOLR-3916

-Hoss






Re: search by multiple 'LIKE' operator connected with 'AND' operator

2012-10-08 Thread gremlin
Disabling PositionFilterFactory totally breaks multiword search, and I
can find titles only by a single word.

The default solr.TextField field with WhitespaceTokenizerFactory returns only
complete-word matches; enabling NGramFilterFactory for that field doesn't do
anything for me. If I use the field described, I can find by both words, but
not 'both at a time', just 'any one of them'.
A TextField field copied by copyField into an NGram field also doesn't help.

Maybe I missed something in the schema configuration?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536p4012554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wildcards and fuzzy/phonetic query

2012-10-08 Thread Erick Erickson
On whether phonetic filters can be made MultiTermAware:

I'd be leery of this, as I basically don't quite know how that would
behave. You'd have to ensure that the algorithms changed the
first parts of the words uniformly, regardless of what followed. I'm
pretty sure that _some_ phonetic algorithms do not follow this
pattern, i.e. eric wouldn't necessarily have the same beginning
as erickson. That said, some of the algorithms _may_ follow this
rule and might be OK candidates for being MultiTermAware.

But, you don't need this in order to try it out. See the "Expert Level
Schema Possibilities"
at:
http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

You can define your own analysis chain for wildcards as part of your
fieldType definition and include whatever you want, whether or not it's
MultiTermAware, and it will be applied at query time. Use the "multiterm"
analyzer entry as a basis. _But_ you shouldn't include anything in this
section that produces more than one output per input token. Note, "token",
not "field". I.e. a really bad candidate for this section is
WordDelimiterFilterFactory:
if you use the admin/analysis page (which you'll get to know intimately) and
look at a type that has WordDelimiterFilterFactory in its chain and put in
something like erickErickson1234, you'll see what I mean. Make sure to check
the "verbose" box.

If you can determine that some of the phonetic algorithms _should_ be
MultiTermAware, please feel free to raise a JIRA and we can discuss... I suspect
it'll be on a case-by-case basis.

Best
Erick

On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
 wrote:
> Hi!
>
> I'm quite new to Solr, I was recently asked to help out on a project where 
> the previous "Solr-person" quit quite suddenly.  I've noticed that some of 
> our searches don't return the expected result, and I'm hoping you guys can 
> help me out.
>
> We've indexed a lot of names, and would like to search for a person in our 
> system using these names.  We previously used Oracle Text for this, and we 
> experience that Solr is much faster.  So far so good! :)  But when we try to 
> use wildcards, things start to go wrong.
>
> We're using Solr 3.4, and I see that some of our problems are solved in 3.6.  
> Ref SOLR-2438:
> https://issues.apache.org/jira/browse/SOLR-2438
>
> But we would also like to be able to combine wildcards with fuzzy searches, 
> and wildcards with a phonetic filter.  I don't see anything about phonetic 
> filters in SOLR-2438 or SOLR-2921.  
> (https://issues.apache.org/jira/browse/SOLR-2921)
> Is it possible to make the phonetic filters MultiTermAware?
>
> Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in 
> Solr..) and find both christian and kristian.  As far as I understand, this 
> is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.  Is 
> this correct, or have I misunderstood anything?  Are there any workarounds or 
> filter-combinations I can use to achieve the same result?  I've seen people 
> suggest using a boolean query to combine the two, but I don't really see how 
> that would solve my "chr*"-problem.
>
> As I mentioned earlier I'm quite new to this, so I apologize if what I'm 
> asking about only shows my ignorance..
>
>
> Regards, Hågen


Re: SolrJ - IOException

2012-10-08 Thread Briggs Thompson
Also note there were no exceptions in the actual Solr log, only on the
SolrJ side.

Thanks,
Briggs

On Mon, Oct 8, 2012 at 10:45 AM, Briggs Thompson <
w.briggs.thomp...@gmail.com> wrote:

> I have also just run into this a few times over the weekend in a newly
> deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it
> is hosted via AWS.
>
> I have a RabbitMQ consumer that reads updates from a queue and posts
> updates to Solr via SolrJ. There is quite a bit of error handling around
> the indexing request, and even if Solr is not live the consumer application
> successfully logs the exception and attempts to move along in the queue.
> There are two consumer applications running at once, which at times process
> 400 requests per minute. The problem does not necessarily occur at
> high-volume times, though.
>
> This exception is causing the entire application to hang - which is
> surprising considering all SolrJ logic is wrapped with try/catches. Has
> anyone found out more information regarding the possible keep alive bug?
> Any insight is much appreciated.
>
> Thanks,
> Briggs Thompson
>
>
> Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector
> tryExecute
> INFO: I/O exception (java.net.SocketException) caught when processing
> request: Broken pipe
> Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector
> tryExecute
> INFO: Retrying request
> Oct 8, 2012 7:25:48 AM com.<>.rabbitmq.worker.SolrWriter work
> SEVERE: {"id":4049703,"datetime":"2012-10-08 07:22:05"}
> IOException occured when talking to server at: 
> <
> server>
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: 
> <
> server>
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:362)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
> at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:69)
> at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:96)
> at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:79)
> at com.<>.solr.SolrIndexService.IndexCoupon(SolrIndexService.java:57)
> at com.<>.solr.SolrIndexService.Index(SolrIndexService.java:36)
> at com.<>.rabbitmq.worker.SolrWriter.work(SolrWriter.java:47)
> at com.<>.rabbitmq.job.Runner.run(Runner.java:84)
> at com.<>.rabbitmq.job.SolrConsumer.main(SolrConsumer.java:10)
> Caused by: org.apache.http.client.ClientProtocolException
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:306)
> ... 10 more
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity. The cause lists the
> reason the original request failed.
> at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
> at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
> at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> ... 13 more
> Caused by: java.net.SocketException: Broken pipe
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at
> org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:147)
> at
> org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:154)
> at
> org.apache.http.impl.conn.LoggingSessionOutputBuffer.flush(LoggingSessionOutputBuffer.java:95)
> at
> org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:178)
> at
> org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:72)
> at
> org.apache.http.entity.mime.HttpMultipart.doWriteTo(HttpMultipart.java:206)
> at
> org.apache.http.entity.mime.HttpMultipart.writeTo(HttpMultipart.java:224)
> at
> org.apache.http.entity.mime.MultipartEntity.writeTo(MultipartEntity.java:183)
> at
> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> at
> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> at
> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> at
> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(Abst

Re: Problem with relating values in two multi value fields

2012-10-08 Thread Mikhail Khludnev
Toke,
You are absolutely right, concatenating terms is a possible solution. I
found faceting quite complicated in this case, but it was a hot fix
which we delivered to production.

Torben,
This problem arises quite often. Besides the two approaches discussed
here, it is also possible to use SpanQueries and TermPositions - you can
check our experience here:
http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
http://vimeo.com/album/2012142/video/33817062
Our current way is BlockJoin, which is really performant in the case of batched
updates: http://blog.griddynamics.com/2012/08/block-join-query-performs.html.
The bad thing is that there is no open facet component for block join. We
have code, but are not ready to share it yet.

On Mon, Oct 8, 2012 at 12:44 PM, Toke Eskildsen wrote:

> On Mon, 2012-10-08 at 08:42 +0200, Torben Honigbaum wrote:
> > sorry, my fault. This was one of my first ideas. My problem is, that
> > I've 1.000.000 documents, each with about 20 attributes. Additionally
> > each document has between 200 and 500 option-value pairs. So if I
> denormalize the data, it means that I've 1.000.000 x 350 ((200 + 500) / 2)
> = 350.000.000 documents, each with 20 attributes.
>
> If you have a few hundred or less distinct primary attributes (the A, B,
> C's in your example), you could create a new field for each of them:
>
> <doc>
>   <field name="id">3</field>
>   <field name="options">A B C D</field>
>   <field name="option_A">200</field>
>   <field name="option_B">400</field>
>   <field name="option_C">240</field>
>   <field name="option_D">310</field>
>   ...
>   ...
> </doc>
>
> Query for "options:A" and facet on field "option_A" to get facets for
> the specific field.
>
> This normalization does increase the index size due to duplicated
> secondary values between the option-fields, but since our assumption is
> a relatively small amount of primary values, it should not be too much.
>
>
> Alternatively, if you have many distinct primary attributes, index the
> pairs as Jack suggests:
> <doc>
>   <field name="id">3</field>
>   <field name="options">A B C D</field>
>   <field name="option">A=200</field>
>   <field name="option">B=400</field>
>   <field name="option">C=240</field>
>   <field name="option">D=310</field>
>   ...
>   ...
> </doc>
>
> Query for "options:A" and facet on field "option" with
> field.prefix="A=". Your result will be A=200 (2), A=450 (1)... so you'll
> have to strip "=" before display.
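>
> For instance, an illustrative request (host and core name are made up):
> http://localhost:8983/solr/select?q=options:A&facet=true&facet.field=option&facet.prefix=A%3D&facet.mincount=1&rows=0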
>
> This normalization is potentially a lot heavier than the previous one,
> as we have distinct_primaries * distinct_secondaries distinct values.
>
> Worst case, where every document only contains distinct combinations of
> primary/secondary, we have 350M distinct option-values, which is quite
> heavy for a single box to facet on. Whether that is better or worse than
> 350M documents, I don't know.
>
> > Is denormalization the only way to handle this problem? I
>
> What you are trying to do does look quite a lot like hierarchical
> faceting, which Solr does not support directly. But even if you apply
> one of the experimental patches, it does not mitigate the potential
> combinatorial explosion of your primary & secondary values.
>
> So that leaves the question: How many distinct combinations of primary
> and secondary values do you have?
>
> Regards,
> Toke Eskildsen
>
>


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Reloading ExternalFileField blocks Solr

2012-10-08 Thread Mikhail Khludnev
Martin,

Can you tell me what's the content of that field, and how it should affect
search result?

On Mon, Oct 8, 2012 at 12:55 PM, Martin Koch  wrote:

> Hi List
>
> We're using Solr-4.0.0-Beta with a 7M document index running on a single
> host with 16 shards. We'd like to use an ExternalFileField to hold a value
> that changes often. However, we've discovered that the file is apparently
> re-read by every shard/core on *every commit*; the index is unresponsive in
> this period (around 20s on the host we're running on). This is unacceptable
> for our needs. In the future, we'd like to add other values as
> ExternalFileFields, and this will make the problem worse.
>
> It would be better if the external file were instead read in in the
> background, updating previously read relevant values for each shard as they
> are read in.
>
> I guess a change in the ExternalFileField code would be required to achieve
> this, but I have no experience here, so suggestions are very welcome.
>
> Thanks,
> /Martin Koch - Issuu - Senior Systems Architect.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Wildcards and fuzzy/phonetic query

2012-10-08 Thread Otis Gospodnetic
Hi,

Consider looking into synonyms and ngrams.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 8, 2012 11:21 AM, "Hågen Pihlstrøm Hasle" 
wrote:

> Hi!
>
> I'm quite new to Solr, I was recently asked to help out on a project where
> the previous "Solr-person" quit quite suddenly.  I've noticed that some of
> our searches don't return the expected result, and I'm hoping you guys can
> help me out.
>
> We've indexed a lot of names, and would like to search for a person in our
> system using these names.  We previously used Oracle Text for this, and we
> experience that Solr is much faster.  So far so good! :)  But when we try
> to use wildcards things start to go wrong.
>
> We're using Solr 3.4, and I see that some of our problems are solved in
> 3.6.  Ref SOLR-2438:
> https://issues.apache.org/jira/browse/SOLR-2438
>
> But we would also like to be able to combine wildcards with fuzzy
> searches, and wildcards with a phonetic filter.  I don't see anything about
> phonetic filters in SOLR-2438 or SOLR-2921.  (
> https://issues.apache.org/jira/browse/SOLR-2921)
> Is it possible to make the phonetic filters MultiTermAware?
>
> Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in
> Solr..) and find both christian and kristian.  As far as I understand, this
> is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.
>  Is this correct, or have I misunderstood anything?  Are there any
> workarounds or filter-combinations I can use to achieve the same result?
>  I've seen people suggest using a boolean query to combine the two, but I
> don't really see how that would solve my "chr*"-problem.
>
> As I mentioned earlier I'm quite new to this, so I apologize if what I'm
> asking about only shows my ignorance..
>
>
> Regards, Hågen


Re: solr 1.4.1 -> 3.6.1; SOLR-758

2012-10-08 Thread Chris Hostetter

: Regarding https://issues.apache.org/jira/browse/SOLR-758 (Enhance
: DisMaxQParserPlugin to support full-Solr syntax and to support alternate
: escaping strategies.)

FWIW: i'm not really sure what/how that issue relates to the problem you 
are seeing (or how you *think* it relates to the problem you are seeing) 
... so i'm just going to focus on the specifics of your error... 

: After applying the attached patches to 3.6.1 I'm experiencing this problem:

The mailing list typically rejects patches - none came with your message.

:  - SEVERE: org.apache.solr.common.SolrException: Error Instantiating
: QParserPlugin, org.apache.solr.search.AdvancedQParserPlugin is not a
: org.apache.solr.search.QParserPlugin

Besides the obvious problem of not extending the expected class, the other 
possibility is that when compiling your "AdvancedQParserPlugin" you may be 
compiling against the wrong version of solr -- ie: you could get this 
error if the AdvancedQParserPlugin.class file you have was generated when 
your AdvancedQParserPlugin.java file was compiled against a different 
QParserPlugin.class than the one in use at runtime.



-Hoss


Re: solr1.4 code Example

2012-10-08 Thread Sujatha Arun
Did get some files by jar unpacking, but could not get the ones I wanted...
thanks anyway!

On Mon, Oct 8, 2012 at 5:56 PM, Toke Eskildsen wrote:

> On Mon, 2012-10-08 at 13:08 +0200, Sujatha Arun wrote:
> > I am unable to unzip the  5883_Code.zip file for solr 1.4 from paktpub
> site
> > .I get the error message
> >
> >   End-of-central-directory signature not found. [...]
>
> It is a corrupt ZIP-file. I'm guessing you got it from
> http://www.packtpub.com/files/code/5883_Code.zip
> I tried downloading the archive and it was indeed corrupt. You can read
> some of the files by using jar for unpacking: 'jar xvf 5883_Code.zip'.
>
> You'll need to contact packtpub to get them to fix it properly. A quick
> search indicates that they've had problems before:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201005.mbox/%
> 3c4bf66e8f.4070...@shoptimax.de%3E
>
>
>


Re: Wildcards and fuzzy/phonetic query

2012-10-08 Thread Hågen Pihlstrøm Hasle

I guess synonyms would give me a similar result as using regexes, like Jack 
wrote about.  

I've thought about that, but I don't think it would be good enough.  
Substituting "k" for "ch" is easy enough, but the problem is that I have to 
think of every possible substitution in advance.  I'd like "Fil*" to find 
Phillip, I'd like "Hen*" to find "Hansen", and so on.  The possibilities are 
quite endless, and I can't think of them all.  I can't limit myself to 
Norwegian names either, a lot of people living in Norway have names from other 
countries.  I'd like "Moha*" to find "Mouhammed", etc..  Or am I too 
pessimistic?

I haven't read enough about Ngrams yet, so I'm not sure if I've understood it 
properly.  It divides the word into several pieces and tries to find one or 
more matches?  Would that really help in my "Chr*" example?  I guess you mean 
the combination of synonyms and ngrams?  

Is it possible to combine ngrams with a fuzzy query?  So that every piece of a 
word is matched in a fuzzy way?  Could that help me?

I'll certainly look into ngrams more, thanks for the suggestion.


Regards, Hågen  

On Oct 8, 2012, at 7:23 PM, Otis Gospodnetic wrote:

> Hi,
> 
> Consider looking into synonyms and ngrams.
> 
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
> On Oct 8, 2012 11:21 AM, "Hågen Pihlstrøm Hasle" 
> wrote:
> 
>> Hi!
>> 
>> I'm quite new to Solr, I was recently asked to help out on a project where
>> the previous "Solr-person" quit quite suddenly.  I've noticed that some of
>> our searches don't return the expected result, and I'm hoping you guys can
>> help me out.
>> 
>> We've indexed a lot of names, and would like to search for a person in our
>> system using these names.  We previously used Oracle Text for this, and we
>> experience that Solr is much faster.  So far so good! :)  But when we try
>> to use wildcards things start to go wrong.
>> 
>> We're using Solr 3.4, and I see that some of our problems are solved in
>> 3.6.  Ref SOLR-2438:
>> https://issues.apache.org/jira/browse/SOLR-2438
>> 
>> But we would also like to be able to combine wildcards with fuzzy
>> searches, and wildcards with a phonetic filter.  I don't see anything about
>> phonetic filters in SOLR-2438 or SOLR-2921.  (
>> https://issues.apache.org/jira/browse/SOLR-2921)
>> Is it possible to make the phonetic filters MultiTermAware?
>> 
>> Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in
>> Solr..) and find both christian and kristian.  As far as I understand, this
>> is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.
>> Is this correct, or have I misunderstood anything?  Are there any
>> workarounds or filter-combinations I can use to achieve the same result?
>> I've seen people suggest using a boolean query to combine the two, but I
>> don't really see how that would solve my "chr*"-problem.
>> 
>> As I mentioned earlier I'm quite new to this, so I apologize if what I'm
>> asking about only shows my ignorance..
>> 
>> 
>> Regards, Hågen



Re: add shard to index

2012-10-08 Thread Radim Kolar
Do it as it is done in the Cassandra database. Adding a new node and 
redistributing data can be done in a live system without problems. It 
works like this:


Every Cassandra node has a key range assigned. Instead of assigning keys 
to nodes like hash(key) mod nodes, every node owns a portion of the 
hash keyspace. The portions do not need to be equal; some node can have a 
larger portion of the keyspace than another.


Say the hash function's max possible value is 12:

shard1 - 1-4
shard2 - 5-8
shard3 - 9-12

Now let's add a new shard. In Cassandra, adding a new shard by default cuts 
an existing one in half, so you will have:

shard1 - 1-2
shard2 - 3-4
shard3 - 5-8
shard4 - 9-12

See? You only needed to move documents from the old shard1. Usually you are 
adding more than 1 shard during a reorganization, so you do not need to 
rebalance the cluster by moving every node to a different position in the 
hash keyspace that much.
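
A minimal, self-contained sketch of the idea (the toy hash range and all
names are mine; this is not Solr or Cassandra code):

    import java.util.TreeMap;

    public class RangeRing {
        // upper bound of each range -> shard name, e.g. 4->shard1, 8->shard2, 12->shard3
        private final TreeMap<Integer, String> ring = new TreeMap<Integer, String>();

        public void addShard(int upperBound, String name) {
            ring.put(upperBound, name);
        }

        // route a key to the shard owning the first range that covers its hash;
        // assumes 1 <= hash <= max upper bound (no wrap-around in this toy)
        public String shardFor(int hash) {
            return ring.ceilingEntry(hash).getValue();
        }

        public static void main(String[] args) {
            RangeRing ring = new RangeRing();
            ring.addShard(4, "shard1");
            ring.addShard(8, "shard2");
            ring.addShard(12, "shard3");
            // cut shard1's range in half: the new shard takes 1-2, shard1 keeps
            // 3-4; only documents hashed into 1-2 move, shard2/shard3 are untouched
            ring.addShard(2, "shard4");
            System.out.println(ring.shardFor(1)); // shard4
            System.out.println(ring.shardFor(3)); // shard1
            System.out.println(ring.shardFor(9)); // shard3
        }
    }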


Re: add shard to index

2012-10-08 Thread Michael Della Bitta
AKA Consistent Hashing: http://en.wikipedia.org/wiki/Consistent_hashing

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Mon, Oct 8, 2012 at 11:33 AM, Radim Kolar  wrote:
> Do it as it is done in the Cassandra database. Adding a new node and
> redistributing data can be done in a live system without problems. It works
> like this:
>
> Every Cassandra node has a key range assigned. Instead of assigning keys to
> nodes like hash(key) mod nodes, every node owns a portion of the hash
> keyspace. The portions do not need to be equal; some node can have a larger
> portion of the keyspace than another.
>
> Say the hash function's max possible value is 12:
>
> shard1 - 1-4
> shard2 - 5-8
> shard3 - 9-12
>
> Now let's add a new shard. In Cassandra, adding a new shard by default cuts
> an existing one in half, so you will have:
> shard1 - 1-2
> shard2 - 3-4
> shard3 - 5-8
> shard4 - 9-12
>
> See? You only needed to move documents from the old shard1. Usually you are
> adding more than 1 shard during a reorganization, so you do not need to
> rebalance the cluster by moving every node to a different position in the hash
> keyspace that much.


Re: Wildcards and fuzzy/phonetic query

2012-10-08 Thread Hågen Pihlstrøm Hasle

I understand that I'm quickly reaching the boundaries of my Solr-competence 
when I'm supposed to read about "Expert Level" concepts.. :)  I had already 
read it once, but now I read it again. Twice.  And I'm not sure if I understand 
it correctly..  So let me ask a follow-up question:
If I define an analyzer of type multiterm, will every filter I include for that 
analyzer be applied, even if it's not MultiTermAware?

To complicate this further, I'm not really sure if phonetic filters are a good 
match for our needs.  We search for names, and these names can come from all 
over the world.  We use DoubleMetaphone, and Wikipedia says it "tries to 
account for myriad irregularities in English of Slavic, Germanic, Celtic, 
Greek, French, Italian, Spanish, Chinese, and other origin".  So I guess it's 
quite good.  But how about names from the middle east, Pakistan or India?  Is 
DoubleMetaphone a good match also for names from these countries?  Are there 
any better algorithms?  

How about fuzzy-searches and wildcards, are they impossible to combine?

We actually do three queries for every search, one fuzzy, one phonetic and one 
using ngram.  Because I don't have too much confidence in the phonetic 
algorithm, I would really like to be able to combine fuzzy queries with 
wildcards.. :)


Regards, Hågen


On Oct 8, 2012, at 6:09 PM, Erick Erickson wrote:

> whether phonetic filters can be multiterm aware:
> 
> I'd be leery of this, as I basically don't quite know how that would
> behave. You'd have to ensure that the algorithms changed the
> first parts of the words uniformly, regardless of what followed. I'm
> pretty sure that _some_ phonetic algorithms do not follow this
> pattern, i.e. eric wouldn't necessarily have the same beginning
> as erickson. That said, some of the algorithms _may_ follow this
> rule and might be OK candidates for being MultiTermAware
> 
> But, you don't need this in order to try it out. See the "Expert Level
> Schema Possibilities"
> at:
> http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> 
> You can define your own analysis chain for wildcards as part of your
> <fieldType>
> definition and include whatever you want, whether or not it's
> MultiTermAware and it
> will be applied at query time. Use the <analyzer type="multiterm"> entry
> as a basis. _But_ you shouldn't include anything in this section that
> produces more than one output per input token. Note, "token", not
> "field". I.e. a really bad candidate for this section is
> WordDelimiterFilterFactory
> if you use the admin/analysis page (which you'll get to know intimately) and
> look at a type that has WordDelimiterFilterFactory in its chain and
> put something
> like erickErickson1234, you'll see what I mean.. Make sure and check the
> "verbose" box
> 
> If you can determine that some of the phonetic algorithms _should_ be
> MultiTermAware, please feel free to raise a JIRA and we can discuss... I 
> suspect
> it'll be on a case-by-case basis.
> 
> Best
> Erick
> 
> On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
>  wrote:
>> Hi!
>> 
>> I'm quite new to Solr, I was recently asked to help out on a project where 
>> the previous "Solr-person" quit quite suddenly.  I've noticed that some of 
>> our searches don't return the expected result, and I'm hoping you guys can 
>> help me out.
>> 
>> We've indexed a lot of names, and would like to search for a person in our 
>> system using these names.  We previously used Oracle Text for this, and we 
>> experience that Solr is much faster.  So far so good! :)  But when we try to 
>> use wildcards things start to go wrong.
>> 
>> We're using Solr 3.4, and I see that some of our problems are solved in 3.6. 
>>  Ref SOLR-2438:
>> https://issues.apache.org/jira/browse/SOLR-2438
>> 
>> But we would also like to be able to combine wildcards with fuzzy searches, 
>> and wildcards with a phonetic filter.  I don't see anything about phonetic 
>> filters in SOLR-2438 or SOLR-2921.  
>> (https://issues.apache.org/jira/browse/SOLR-2921)
>> Is it possible to make the phonetic filters MultiTermAware?
>> 
>> Regarding fuzzy queries, in Oracle Text I can search for "chr%" ("chr*" in 
>> Solr..) and find both christian and kristian.  As far as I understand, this 
>> is not possible in Solr, WildcardQuery and FuzzyQuery cannot be combined.  
>> Is this correct, or have I misunderstood anything?  Are there any 
>> workarounds or filter-combinations I can use to achieve the same result?  
>> I've seen people suggest using a boolean query to combine the two, but I 
>> don't really see how that would solve my "chr*"-problem.
>> 
>> As I mentioned earlier I'm quite new to this, so I apologize if what I'm 
>> asking about only shows my ignorance..
>> 
>> 
>> Regards, Hågen



Re: Fallout from the deprecation of setQueryType

2012-10-08 Thread Shawn Heisey

On 9/28/2012 9:09 AM, Shawn Heisey wrote:
I am planning and building up a test system with Solr 4.0, for my 
eventual upgrade.  I have not made a lot of progress so far, but I 
have come across a potential problem.


It's been over a week with no response to this.  Please see the original 
email for full details.


I have all but decided that I will allow the default /select handler to 
receive queries currently assigned to my lbcheck handler, and use a new 
handler called /search for everything on which I want to track statistics.


There is still a possible problem.  I have a broker core that has the 
shards parameter included in the standard request handler, so this would 
migrate to the new /search request handler.  In the past, you could 
change the handler used on those shards with a shards.qt parameter, but 
if the qt parameter is no longer allowed to have a slash, this isn't 
going to work in the future.  I will instead need an alternate config 
option that makes it use a new handler instead of /select.  Does that 
option already exist?


Thanks,
Shawn



Re: Wildcards and fuzzy/phonetic query

2012-10-08 Thread Erick Erickson
To answer your first question, yes, you've got it right. If you define
a multiterm section in your fieldType, whatever you put in that section
gets applied whether the underlying class is MultiTermAware or not.
Which means you can shoot yourself in the foot really bad ...

Well, you have 6 or so possibilities out of the box...and all of them will
fail at times. Fuzzy searches will also fail at times. And so will most
anything else you try. The problem is that these are algorithmic in nature
and there are just too many cases that don't fit; human language is
so endlessly variable...

Whether Middle Eastern names will work well with phonetic filters, well,
what's the input language? Are you indexing English (or Norwegian or...)
translations? In that case things should work OK since the phonetic variations
should be accounted for in the translations.

If you're indexing in different languages, you can apply different
phonetic filters
on different fields, so you might be able to work it that way. But if you're
indexing multiple languages in to a _single_ field, you'll have a lot of other
problems to solve before you start worrying about phonetics...

All I can really say is give it a try and see how well it works since "good"
search results are so domain dependent

Fuzzy searches + wildcards. I don't think you can do that reasonably, but
I'm not entirely sure.

Best
Erick

On Mon, Oct 8, 2012 at 2:28 PM, Hågen Pihlstrøm Hasle
 wrote:
>
> I understand that I'm quickly reaching the boundaries of my Solr-competence 
> when I'm supposed to read about "Expert Level" concepts.. :)  I had already 
> read it once, but now I read it again. Twice.  And I'm not sure if I 
> understand it correctly..  So let me ask a follow-up question:
> If I define an analyzer of type multiterm, will every filter I include for 
> that analyzer be applied, even if it's not MultiTermAware?
>
> To complicate this further, I'm not really sure if phonetic filters are a good 
> match for our needs.  We search for names, and these names can come from all 
> over the world.  We use DoubleMetaphone, and Wikipedia says it "tries to 
> account for myriad irregularities in English of Slavic, Germanic, Celtic, 
> Greek, French, Italian, Spanish, Chinese, and other origin".  So I guess it's 
> quite good.  But how about names from the middle east, Pakistan or India?  Is 
> DoubleMetaphone a good match also for names from these countries?  Are there 
> any better algorithms?
>
> How about fuzzy-searches and wildcards, are they impossible to combine?
>
> We actually do three queries for every search, one fuzzy, one phonetic and 
> one using ngram.  Because I don't have too much confidence in the phonetic 
> algorithm, I would really like to be able to combine fuzzy queries with 
> wildcards.. :)
>
>
> Regards, Hågen
>
>
> On Oct 8, 2012, at 6:09 PM, Erick Erickson wrote:
>
>> whether phonetic filters can be multiterm aware:
>>
>> I'd be leery of this, as I basically don't quite know how that would
>> behave. You'd have to ensure that the algorithms changed the
>> first parts of the words uniformly, regardless of what followed. I'm
>> pretty sure that _some_ phonetic algorithms do not follow this
>> pattern, i.e. eric wouldn't necessarily have the same beginning
>> as erickson. That said, some of the algorithms _may_ follow this
>> rule and might be OK candidates for being MultiTermAware
>>
>> But, you don't need this in order to try it out. See the "Expert Level
>> Schema Possibilities"
>> at:
>> http://searchhub.org/dev/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
>>
>> You can define your own analysis chain for wildcards as part of your
>> <fieldType>
>> definition and include whatever you want, whether or not it's
>> MultiTermAware and it
>> will be applied at query time. Use the <analyzer type="multiterm"> entry
>> as a basis. _But_ you shouldn't include anything in this section that
>> produces more than one output per input token. Note, "token", not
>> "field". I.e. a really bad candidate for this section is
>> WordDelimiterFilterFactory
>> if you use the admin/analysis page (which you'll get to know intimately) and
>> look at a type that has WordDelimiterFilterFactory in its chain and
>> put something
>> like erickErickson1234, you'll see what I mean.. Make sure and check the
>> "verbose" box
>>
>> If you can determine that some of the phonetic algorithms _should_ be
>> MultiTermAware, please feel free to raise a JIRA and we can discuss... I 
>> suspect
>> it'll be on a case-by-case basis.
>>
>> Best
>> Erick
>>
>> On Mon, Oct 8, 2012 at 11:21 AM, Hågen Pihlstrøm Hasle
>>  wrote:
>>> Hi!
>>>
>>> I'm quite new to Solr, I was recently asked to help out on a project where 
>>> the previous "Solr-person" quit quite suddenly.  I've noticed that some of 
>>> our searches don't return the expected result, and I'm hoping you guys can 
>>> help me out.
>>>
>>> We've indexed a lot of names, and would like to search for a person in our 
>>> system using these 

Re: Reloading ExternalFileField blocks Solr

2012-10-08 Thread Martin Koch
Sure: We're boosting search results based on user actions which could be
e.g. the number of times a particular document has been read. In future,
we'd also like to boost by e.g. impressions (the number of times a document
has been displayed) and other values.

/Martin

On Mon, Oct 8, 2012 at 7:02 PM, Mikhail Khludnev  wrote:

> Martin,
>
> Can you tell me what's the content of that field, and how it should affect
> search result?
>
> On Mon, Oct 8, 2012 at 12:55 PM, Martin Koch  wrote:
>
> > Hi List
> >
> > We're using Solr-4.0.0-Beta with a 7M document index running on a single
> > host with 16 shards. We'd like to use an ExternalFileField to hold a
> value
> > that changes often. However, we've discovered that the file is apparently
> > re-read by every shard/core on *every commit*; the index is unresponsive
> in
> > this period (around 20s on the host we're running on). This is
> unacceptable
> > for our needs. In the future, we'd like to add other values as
> > ExternalFileFields, and this will make the problem worse.
> >
> > It would be better if the external file were instead read in in the
> > background, updating previously read relevant values for each shard as
> they
> > are read in.
> >
> > I guess a change in the ExternalFileField code would be required to
> achieve
> > this, but I have no experience here, so suggestions are very welcome.
> >
> > Thanks,
> > /Martin Koch - Issuu - Senior Systems Architect.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  
>


How to efficiently find documents that have a specific value for a field OR the field does not exist at all

2012-10-08 Thread Artem Shnayder
I'm trying to find documents using this query:

field:"value" OR (*:* AND NOT field:[* TO *])

Which means, either field is set to "value" or the field does not exist in
the document.

I'm running this for ~20 fields in a single query strung together with
ANDs. The query time is high, averaging around 3.5s. Does anyone have
suggestions on how to optimize this query? As a last resort, using
technologies outside of Solr is a possibility.

All suggestions are greatly appreciated!


Thanks for your time and efforts,
Artem



PS. For the record, a colleague and I have brainstormed some idea of our
own:

* Adding a meta field to each document that consists of 1s and 0s, where
each character represents a field's existence (1 yes, 0 no). In this case
the query would look like: field:"value" OR signature:???0???   
So we are looking for a certain field (the 0) that definitely does not
exist and all the others we do not care about (wildcard). Note that this
would have to be a leading wildcard query or we could prepend a dummy
character to the beginning. A bit of a hack.

* Using bitwise operations to find all documents whose set of fields is a
subset of the query's set of fields. This would be more work and would
require writing a custom query parser or search handler.
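
For what it's worth, a hypothetical SolrJ-side sketch of building such a
signature at index time (the tracked field names, their order, and the dummy
leading character are all assumptions):

    import org.apache.solr.common.SolrInputDocument;

    public class SignatureBuilder {
        // fixed, schema-wide order of the tracked fields (hypothetical names)
        private static final String[] TRACKED = {"fieldA", "fieldB", "fieldC"};

        public static void addSignature(SolrInputDocument doc) {
            // dummy leading char so queries never need a leading wildcard
            StringBuilder sig = new StringBuilder("s");
            for (String f : TRACKED) {
                sig.append(doc.getFieldValue(f) != null ? '1' : '0');
            }
            doc.setField("signature", sig.toString());
        }
    }

With the three tracked fields above, "fieldB must be absent" becomes
signature:s?0? - no leading wildcard needed.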




Funny behavior in facet query on large dataset

2012-10-08 Thread kevinlieb
I am doing a facet query in Solr (3.4) and getting very bad performance. 
This is in a solr shard with 22 million records, but I am specifically doing
a small time slice.  However even if I take the time slice query out it
takes the same amount of time, so it seems to be searching the entire data
set.

I am trying to find all documents that contain the word "dude" or "thedude"
or "anotherdude" and count how many of these were written by "eldudearino"
(of course names are changed here to protect the innocent...).

My query is like this: 

http://myserver:8080/solr/select/?fq=created_at:NOW-5MINUTES&q=(+(text:(%22dude%22+%22thedude%22+%22%23anotherdude%22))+)&facet=true&indent=on&facet.mincount=1&wt=xml&version=2.2&rows=0&fl=author_username,author_id&facet.field=author_username&fq=author_username:(%22@eldudearino%22)

Any ideas what I could be doing wrong?

Thanks in advance!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Funny behavior in facet query on large dataset

2012-10-08 Thread Erik Hatcher
Faceting at that scale takes time to "warm up".  If you've got your caches and 
such configured appropriately, then successive searches will be very fast, 
however you'll still need to do the cache warming (depends on the faceting 
implementation you're using, in this case you're probably using the FieldCache).

Faceting performance doesn't depend on the filters or query - the caches that 
need to be built are indeed across the entire index.

Erik

On Oct 8, 2012, at 16:26 , kevinlieb wrote:

> I am doing a facet query in Solr (3.4) and getting very bad performance. 
> This is in a solr shard with 22 million records, but I am specifically doing
> a small time slice.  However even if I take the time slice query out it
> takes the same amount of time, so it seems to be searching the entire data
> set.
> 
> I am trying to find all documents that contain the word "dude" or "thedude"
> or "anotherdude" and count how many of these were written by "eldudearino"
> (of course names are changed here to protect the innocent...).
> 
> My query is like this: 
> 
> http://myserver:8080/solr/select/?fq=created_at:NOW-5MINUTES&q=(+(text:(%22dude%22+%22thedude%22+%22%23anotherdude%22))+)&facet=true&indent=on&facet.mincount=1&wt=xml&version=2.2&rows=0&fl=author_username,author_id&facet.field=author_username&fq=author_username:(%22@eldudearino%22)
> 
> Any ideas what I could be doing wrong?
> 
> Thanks in advance!
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584.html
> Sent from the Solr - User mailing list archive at Nabble.com.



RE: Funny behavior in facet query on large dataset

2012-10-08 Thread Michael Ryan
Facets are only really useful if you want the counts for multiple values (e.g., 
"eldudearino", "ladudearina"). I'd suggest just leaving all the facet 
parameters off of that query - the numFound that is returned should give you 
what you want.

The slowness may be due to the facet cache needing to be regenerated (which 
should only happen if you've done a commit since the last time you ran the 
query). Regardless of what time slice you use, behind the scenes Solr has to 
basically get the author_username value for every document in the index and put 
them in an in-memory data structure. This can be quite slow, especially if 
there are many distinct values for that field.

-Michael

-Original Message-
From: kevinlieb [mailto:ke...@politear.com] 
Sent: Monday, October 08, 2012 4:27 PM
To: solr-user@lucene.apache.org
Subject: Funny behavior in facet query on large dataset

I am doing a facet query in Solr (3.4) and getting very bad performance. 
This is in a solr shard with 22 million records, but I am specifically doing a 
small time slice.  However even if I take the time slice query out it takes the 
same amount of time, so it seems to be searching the entire data set.

I am trying to find all documents that contain the word "dude" or "thedude"
or "anotherdude" and count how many of these were written by "eldudearino"
(of course names are changed here to protect the innocent...).

My query is like this: 

http://myserver:8080/solr/select/?fq=created_at:NOW-5MINUTES&q=(+(text:(%22dude%22+%22thedude%22+%22%23anotherdude%22))+)&facet=true&indent=on&facet.mincount=1&wt=xml&version=2.2&rows=0&fl=author_username,author_id&facet.field=author_username&fq=author_username:(%22@eldudearino%22)

Any ideas what I could be doing wrong?

Thanks in advance!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reloading ExternalFileField blocks Solr

2012-10-08 Thread Mikhail Khludnev
Martin,

I have a kind of hack approach in mind regarding hiding documents from search,
so it's a little bit easier than your task. I'm going to deliver a talk
about it: http://www.apachecon.eu/schedule/presentation/89/ .
Frankly speaking, there is no reliable out-of-the-box solution for it. I
saw that DocValues has been integrated with FunctionQueries already, but
DocValues updates, which sound like a doable thing, have not been delivered
yet.

Regards

On Mon, Oct 8, 2012 at 11:54 PM, Martin Koch  wrote:

> Sure: We're boosting search results based on user actions which could be
> e.g. the number of times a particular document has been read. In future,
> we'd also like to boost by e.g. impressions (the number of times a document
> has been displayed) and other values.
>
> /Martin
>
> On Mon, Oct 8, 2012 at 7:02 PM, Mikhail Khludnev <
> mkhlud...@griddynamics.com
> > wrote:
>
> > Martin,
> >
> > Can you tell me what's the content of that field, and how it should
> affect
> > search result?
> >
> > On Mon, Oct 8, 2012 at 12:55 PM, Martin Koch  wrote:
> >
> > > Hi List
> > >
> > > We're using Solr-4.0.0-Beta with a 7M document index running on a
> single
> > > host with 16 shards. We'd like to use an ExternalFileField to hold a
> > value
> > > that changes often. However, we've discovered that the file is
> apparently
> > > re-read by every shard/core on *every commit*; the index is
> unresponsive
> > in
> > > this period (around 20s on the host we're running on). This is
> > unacceptable
> > > for our needs. In the future, we'd like to add other values as
> > > ExternalFileFields, and this will make the problem worse.
> > >
> > > It would be better if the external file were instead read in in the
> > > background, updating previously read relevant values for each shard as
> > they
> > > are read in.
> > >
> > > I guess a change in the ExternalFileField code would be required to
> > achieve
> > > this, but I have no experience here, so suggestions are very welcome.
> > >
> > > Thanks,
> > > /Martin Koch - Issuu - Senior Systems Architect.
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Tech Lead
> > Grid Dynamics
> >
> > 
> >  
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Funny behavior in facet query on large dataset

2012-10-08 Thread Chris Hostetter

: a small time slice.  However even if I take the time slice query out it
: takes the same amount of time, so it seems to be searching the entire data
: set.

a) you might try using facet.method=enum - in some special cases it may be 
faster than the default (facet.method=fc).

: I am trying to find all documents that contain the word "dude" or "thedude"
: or "anotherdude" and count how many of these were written by "eldudearino"
: (of course names are changed here to protect the innocent...).

b) field faceting isn't really designed for this type of problem.  field 
faceting is very suitable for questions like "find all docs matching 
QUERY, and for all of those docs, give me a list of the top N authors and 
how many docs were written by those authors."

c) If you just want to query for the docs written by a single author, 
you can use an "fq" like you do in your example, and then look at the 
numFound to know the total-- but in that case the faceting is just making 
extra work to generate counts of "0" for all of the other authors.

d) if you want to query for an arbitrary set of documents, and then know 
how many of those documents were written by a particular author (or each 
of a particular set of authors) try "facet.query" instead.

...&facet=true&facet.query=author_username:(%22@eldudearino%22)


-Hoss


Re: How to efficiently find documents that have a specific value for a field OR the field does not exist at all

2012-10-08 Thread Ahmet Arslan
> field:"value" OR (*:* AND NOT field:[* TO *])
> 
> Which means, either field is set to "value" or the field
> does not exist in
> the document.

Instead of field:[* TO *], you can define a default value in schema.xml, or 
use DefaultValueUpdateProcessorFactory in solrconfig.

With this, the "the field does not exist in the document" part becomes 
field:MySpecialDefaultValue
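
For example, each per-field clause could then collapse to something like:

    field:("value" OR MySpecialDefaultValue)

which avoids both the negated clause and the [* TO *] range entirely.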


Re: Funny behavior in facet query on large dataset

2012-10-08 Thread kevinlieb
Thanks for all the replies. 

I oversimplified the problem for the purposes of making my post small and
concise.  I am really trying to find the counts of documents by a list of 10
different authors that match those keywords.  Of course on looking up a
single author there is no reason to do a facet query.  To be clearer:
Find all documents that contain the word "dude" or "thedude" or
"anotherdude" and count how many of these were written by "eldudearino" and
"zeedudearino" and "adudearino" and "beedudearino"

I tried facet.query as well as facet.method=fc and neither really helped.

We are constantly adding documents to the solr index and committing, every
few seconds, which is probably why this is not working well.

Seems we need to re-architect the way we are doing this... 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584p4012610.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: long query response time in shards search

2012-10-08 Thread Jason
Hi,

We're using Solr 4.0 and servicing patent search.
Patent search involves very complex queries, including wildcards.
I think an Ngram or EdgeNgram filter is an alternative,
but not every term included in a query has a wildcard,
so we can't use that filter.

If I make an empty core and use it as a main core that just merges search
results, is that helpful?
Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-query-response-time-in-shards-search-tp4012366p4012628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Funny behavior in facet query on large dataset

2012-10-08 Thread Shawn Heisey

On 10/8/2012 4:09 PM, kevinlieb wrote:

Thanks for all the replies.

I oversimplified the problem for the purposes of making my post small and
concise.  I am really trying to find the counts of documents by a list of 10
different authors that match those keywords.  Of course on looking up a
single author there is no reason to do a facet query.  To be clearer:
Find all documents that contain the word "dude" or "thedude" or
"anotherdude" and count how many of these were written by "eldudearino" and
"zeedudearino" and "adudearino" and "beedudearino"

I tried facet.query as well as facet.method=fc and neither really helped.

We are constantly adding documents to the solr index and committing, every
few seconds, which is probably why this is not working well.

Seems we need to re-architect the way we are doing this...


I would definitely consider increasing the amount of time between 
commits.  You can add documents at whatever interval you want, but if 
you only do commits every minute or two, your caches will be much more 
useful.


Your time slice filter query (NOW-5MINUTES) will never be cached, 
because NOW is measured in milliseconds and will therefore be different 
for every query.  You might consider doing NOW/MINUTE-5MINUTES instead 
.. or even [NOW/MINUTE-5MINUTES TO *] so that you actually are dealing 
with a range.  For the space of that minute (at least until the cache 
gets invalidated by a commit), the filter cache entry will be valid.
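
For illustration, the rounded-range version of that filter (field name taken 
from your query) would look something like:

    fq=created_at:[NOW/MINUTE-5MINUTES TO *]

Because the date math rounds to the minute, the fq string stays identical 
for a full minute, so the filter cache entry can actually be reused.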


Some general questions that may matter: How big are all your index 
directories on this server, how much RAM is in the server, and how much 
RAM are you giving to Java?  I'm also curious how big your Solr caches 
are, what the autowarm counts are, and how long it is taking for your 
caches to warm up after each commit.  You can get the warm times from 
the cache statistics in the admin interface.


Thanks,
Shawn



Re: Funny behavior in facet query on large dataset

2012-10-08 Thread Otis Gospodnetic
Hi Kevin,

Right, it's the very frequent commits, most likely.  Change commits
to, say, every 60 or 120 seconds and compare the performance.  I think
you guys use SPM, so check the Cache graphs (hit % specifically)
before and after the above change.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Oct 8, 2012 at 6:09 PM, kevinlieb  wrote:
> Thanks for all the replies.
>
> I oversimplified the problem for the purposes of making my post small and
> concise.  I am really trying to find the counts of documents by a list of 10
> different authors that match those keywords.  Of course on looking up a
> single author there is no reason to do a facet query.  To be clearer:
> Find all documents that contain the word "dude" or "thedude" or
> "anotherdude" and count how many of these were written by "eldudearino" and
> "zeedudearino" and "adudearino" and "beedudearino"
>
> I tried facet.query as well as facet.method=fc and neither really helped.
>
> We are constantly adding documents to the solr index and committing, every
> few seconds, which is probably why this is not working well.
>
> Seems we need to re-architect the way we are doing this...
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Funny-behavior-in-facet-query-on-large-dataset-tp4012584p4012610.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-08 Thread Otis Gospodnetic
Hi,

Qs:
* Have you tried StreamingUpdateSolrServer?
* A newer version of Solr(J)?

When things hang, jstack your app that uses SolrJ and Solr a few times
and you should be able to see where they are stuck.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Oct 8, 2012 at 9:52 PM, Briggs Thompson
 wrote:
> I am running into an issue of a multithreaded SolrJ client application used
> for indexing is getting into a hung state. I responded to a separate thread
> earlier today with someone that had the same error, see
> http://lucene.472066.n3.nabble.com/SolrJ-IOException-td4010026.html
>
> I did some digging and experimentation and found something interesting.
> When starting up the application, I see the following in Solr logs:
> Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
>
> The way I instantiate the HttpSolrServer through SolrJ is like the
> following
>
> HttpSolrServer solrServer = new HttpSolrServer(serverUrl);
> solrServer.setConnectionTimeout(1000);
> solrServer.setDefaultMaxConnectionsPerHost(100);
> solrServer.setMaxTotalConnections(100);
> solrServer.setParser(new BinaryResponseParser());
> solrServer.setRequestWriter(new BinaryRequestWriter());
>
> It seems as though the maxConnections and maxConnectionsPerHost are not
> actually getting set. Anyone seen this problem or have an idea how to
> resolve?
>
> Thanks,
> Briggs Thompson


Re: long query response time in shards search

2012-10-08 Thread Otis Gospodnetic
Hi,

We've explored this with a few clients a while back.  If I remember
correctly, this doesn't make much difference and I don't expect it
will make any noticeable difference for you since all your cores are on
that same 1 server.  If you had 1 server with more CPU cores you would
see better numbers.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Oct 8, 2012 at 9:43 PM, Jason  wrote:
> Hi,
>
> We're using Solr 4.0 and servicing patent search.
> Patent search involves very complex queries, including wildcards.
> I think an Ngram or EdgeNgram filter is an alternative,
> but not every term included in a query has a wildcard,
> so we can't use that filter.
>
> If I make an empty core and use it as a main core that just merges search
> results, is that helpful?
> Thanks.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/long-query-response-time-in-shards-search-tp4012366p4012628.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Reloading ExternalFileField blocks Solr

2012-10-08 Thread Otis Gospodnetic
Hi Martin,

Perhaps you could make a small change in Solr to add "don't reload EFF
if it hasn't been modified since it was last opened".  I assume you
commit pretty often, but don't modify EFF files that often, so this
could save you some needless loading.  That said, I'd be surprised EFF
doesn't already do this... I didn't check.
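
A minimal sketch of that guard (purely illustrative, not actual Solr code;
the key=value line format is the EFF convention, everything else here is
made up):

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    public class EffCache {
        private long lastStamp = -1L;
        private Map<String, Float> values = new HashMap<String, Float>();

        // Re-read the external file only when its mtime has changed.
        public synchronized Map<String, Float> loadIfModified(File f) throws IOException {
            long mtime = f.lastModified();
            if (mtime == lastStamp) {
                return values; // unchanged since the last commit: skip the re-read
            }
            Map<String, Float> fresh = new HashMap<String, Float>();
            BufferedReader r = new BufferedReader(new FileReader(f));
            try {
                String line;
                while ((line = r.readLine()) != null) {
                    int eq = line.indexOf('=');
                    if (eq > 0) { // EFF lines look like "docKey=value"
                        fresh.put(line.substring(0, eq),
                                  Float.valueOf(line.substring(eq + 1)));
                    }
                }
            } finally {
                r.close();
            }
            values = fresh;
            lastStamp = mtime;
            return fresh;
        }
    }

The obvious caveat: mtime granularity is platform dependent, so two writes
within the same second could be missed.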

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Mon, Oct 8, 2012 at 4:55 AM, Martin Koch  wrote:
> Hi List
>
> We're using Solr-4.0.0-Beta with a 7M document index running on a single
> host with 16 shards. We'd like to use an ExternalFileField to hold a value
> that changes often. However, we've discovered that the file is apparently
> re-read by every shard/core on *every commit*; the index is unresponsive in
> this period (around 20s on the host we're running on). This is unacceptable
> for our needs. In the future, we'd like to add other values as
> ExternalFileFields, and this will make the problem worse.
>
> It would be better if the external file were instead read in in the
> background, updating previously read relevant values for each shard as they
> are read in.
>
> I guess a change in the ExternalFileField code would be required to achieve
> this, but I have no experience here, so suggestions are very welcome.
>
> Thanks,
> /Martin Koch - Issuu - Senior Systems Architect.


Problem with dataimporter.request

2012-10-08 Thread Zakka Fauzan
I'm quite new to SOLR, and I have a question regarding the request for the data
importer.

In my data-config.xml, I have something like this:

<entity name="..."
        deltaQuery="SELECT max(id) AS id FROM ${dataimporter.request.dataView}"
        ... >

However, every time I execute a delta-import (/dataimport?command=delta-import),
it always gives me an exception like this:

Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query:
SELECT max(id) AS id FROM  Processing Document # 1

I believe this error exists because the system didn't recognize
${dataimporter.request.dataView}, but I don't know how to make it
recognized.

*I also asked the very same question in
http://stackoverflow.com/questions/12793025/cannot-get-anything-from-dataimporter-request-on-updating-index,
if you want to get some reputation there too, you can answer there. Thank
you!

--
Zakka Fauzan


Re: Help with Velocity in SolrItas

2012-10-08 Thread Paul Libbrecht
Lance,

this is the kind of fun that happens with Velocity all day long...

In general, when it outputs the variable name, it's that the variable is 
null; this can happen when a method is missing, for example. 
There are actually effective uses of this brain-dead-debugger-oriented practice!

I would suppose that the class of your $sentence is not something that has a 
get(String) method.
With normal debugging, this should be shown in a console.
This is strengthened by the fact that your output of $sentence is not exactly 
the same as the output of a java.util.HashMap, for example.

When in this situation, I generally make
   Raw data: $sentence of class $sentence.getClass()
(note: class is not a bean property, you need the method call)

Hope it helps.

Paul

PS: to stop this hell, I have a JSP counterpart to the VelocityResponseWriter; is 
this something of interest for someone, so that I contribute it?



Le 9 oct. 2012 à 04:39, Lance Norskog a écrit :

> I am adding something to Solaritas, under /browse. One bit of Velocity
> code does not unpack the result structure the way I think it should.
> Please look at this- there is something I am missing about
> tree-walking.
> 
> Here is the XML from a search result:
> 
>   <arr name="...">
>     <lst>
>       <int name="index">0</int>
>       <str name="text">A bunch of words</str>
>     </lst>
>     <lst>
>       ... more sentences ...
>     </lst>
>   </arr>
> 
> Here is my Velocity code:
> #foreach($sentence in $outer)
>Raw data: $sentence
>
>#set($index = $sentence.get('index'))
>#set($text = $sentence.get('text'))
>
>  Index: $index
>  
>  Text: $text
>  
>
> #end
> 
> Here is the output:
> 
>Raw data: sentence={index=0,text= A bunch of words}
>Index: $index
>Text: $text



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-08 Thread Sami Siren
On Tue, Oct 9, 2012 at 4:52 AM, Briggs Thompson
 wrote:
> I am running into an issue of a multithreaded SolrJ client application used
> for indexing is getting into a hung state. I responded to a separate thread
> earlier today with someone that had the same error, see
> http://lucene.472066.n3.nabble.com/SolrJ-IOException-td4010026.html
>
> I did some digging and experimentation and found something interesting.
> When starting up the application, I see the following in Solr logs:
> Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8


Do you see this in the Solr log or at the client end? The default
parameters for the http client are the following:

max total: 128
max per host: 32

> The way I instantiate the HttpSolrServer through SolrJ is like the
> following
>
> HttpSolrServer solrServer = new HttpSolrServer(serverUrl);
> solrServer.setConnectionTimeout(1000);
> solrServer.setDefaultMaxConnectionsPerHost(100);
> solrServer.setMaxTotalConnections(100);
> solrServer.setParser(new BinaryResponseParser());
> solrServer.setRequestWriter(new BinaryRequestWriter());

Do you instantiate the client just once and not for every request you
make? If you, for some reason, need to recreate the client again and
again you must call shutdown() when you're done with it to release the
resources.
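
Something like this sketch is the usual pattern - one shared instance for
the whole application, shut down exactly once at the end (the URL and class
name are made up):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class SolrHolder {
        // one shared, thread-safe instance for the whole application
        private static final HttpSolrServer SERVER = create();

        private static HttpSolrServer create() {
            HttpSolrServer s = new HttpSolrServer("http://localhost:8983/solr");
            s.setDefaultMaxConnectionsPerHost(100);
            s.setMaxTotalConnections(100);
            return s;
        }

        public static HttpSolrServer get() { return SERVER; }

        // call once on application shutdown to release pooled connections
        public static void close() { SERVER.shutdown(); }
    }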

> It seems as though the maxConnections and maxConnectionsPerHost are not
> actually getting set. Anyone seen this problem or have an idea how to
> resolve?

I did some experiments with the solrj and from what it looked like it
seems to respect the values that you set.

--
 Sami Siren