Re: random record from solr server

2011-07-18 Thread Ahmet Arslan
> How can I get random 100 record from last two days record
> from solr server.
> 
> I am using solr 3.1

Hello, add this random field definition to your schema.xml:

<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
Generate some seed value (e.g. 125) at query time,

and issue a query something like this:

q=add_date:[NOW-2DAYS TO *]&sort=random_125 asc&start=0&rows=100

If you use a different seed value each time, you will get a random 100 records in 
each request. I assume you have a date field that stores the add date or similar.
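
A minimal SolrJ sketch of the same request (the server URL, seed range, and
the add_date field name are illustrative assumptions):

    import java.util.Random;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // throws MalformedURLException
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // A fresh seed per request yields a different random sample each time.
    int seed = new Random().nextInt(100000);
    SolrQuery query = new SolrQuery("add_date:[NOW-2DAYS TO *]");
    // Sorting on random_<seed> resolves against the random_* dynamic field above.
    query.setSortField("random_" + seed, SolrQuery.ORDER.asc);
    query.setStart(0).setRows(100);
    QueryResponse rsp = server.query(query);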


Re: SolrJ Collapsable Query Fails

2011-07-18 Thread Kurt Sultana
Hi,

Thanks for the code snippet, it was very useful. However, can you please
include a very small description of certain unknown variables such as
'groupedInfo', 'ResultItem', 'searcher', 'fields' and the method
'solrDocumentToResultItem'?

Thanks

On Sat, Jul 16, 2011 at 3:36 PM, Kurt Sultana  wrote:

> > Thanks for the information. However, I still have one more
> > problem. I am
> > iterating over the values of the NamedList. I have 2
> > values, one
> > being 'responseHeader' and the other one being 'grouped'. I
> > would like to
> > access some information stored within the grouped section,
> > which has
> > data structured like so:
> >
> >
> grouped={attr_directory={matches=4,groups=[{groupValue=C:\Users\rvassallo\Desktop\Index,doclist={numFound=2,start=0,docs=[SolrDocument[{attr_meta=[Author,
> > kcook, Last-Modified, 2011-03-02T14:14:18Z...
> >
> > With the 'get("group")' method I am only able to access the
> > entire
> > '{attr_directory={matches=4,g...' section. Is there some
> > functionality which
> > allows me to get other data? Something like this for
> > instance:
> > 'get("group.matches")' or maybe
> > 'get(group.attr_directory.matches)' (which
> > will yield the value of 4), or do I need to process the
> > String that the
> > 'get("...")' returns to get what I need?
> >
> > Thanks :)
>
> I think accessing the relevant portion in a NamedList is troublesome. I
> suggest you look at the existing code in SolrJ, e.g. how facet info is
> extracted from a NamedList.
>
> I am sending you the piece of code that I used to access the grouped info.
> Hopefully it can give you some idea.
>
>  NamedList signature = (NamedList) groupedInfo.get("attr_directory");
>
>    if (signature == null) return new ArrayList<ResultItem>(0);
>
>matches.append(signature.get("matches"));
>
>
>    @SuppressWarnings("unchecked")
>    ArrayList<NamedList<Object>> groups =
>        (ArrayList<NamedList<Object>>) signature.get("groups");
>
>    ArrayList<ResultItem> resultItems = new ArrayList<ResultItem>(groups.size());
>
>StringBuilder builder = new StringBuilder();
>
>
>for (NamedList res : groups) {
>
>  ResultItem resultItem = null;
>
>  String hash = null;
>  Integer found = null;
>  for (int i = 0; i < res.size(); i++) {
>String n = res.getName(i);
>
>Object o = res.getVal(i);
>
>if ("groupValue".equals(n)) {
>  hash = (String) o;
>} else if ("doclist".equals(n)) {
>  DocList docList = (DocList) o;
>  found = docList.matches();
>
>  try {
>final SolrDocumentList list =
> SolrPluginUtils.docListToSolrDocumentList(docList, searcher, fields, null);
>builder.setLength(0);
>
>if (list.size() > 0)
>  resultItem = solrDocumentToResultItem(list.get(0), debug);
>
>for (final SolrDocument document : list)
>  builder.append(document.getFieldValue("id")).append(',');
>
>
>  } catch (final IOException e) {
>LOG.error("Unexpected Error", e);
>  }
>}
>
>
>  }
>
>  if (found != null && found > 1 && resultItem != null) {
>resultItem.setHash(hash);
>resultItem.setFound(found);
>builder.setLength(builder.length() - 1);
>resultItem.setId(builder.toString());
>  }
>
>  // debug
>
>
>  resultItems.add(resultItem);
>}
>
>return resultItems;
>
>
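
For the dotted-path question quoted above: as far as I know there is no
get("grouped.attr_directory.matches") style accessor in SolrJ 3.x; you walk
the nesting one get() at a time. A minimal sketch against the dump quoted
earlier (solrServer and query are assumed to exist):

    NamedList<Object> response = solrServer.query(query).getResponse();
    NamedList<Object> grouped = (NamedList<Object>) response.get("grouped");
    NamedList<Object> attrDir = (NamedList<Object>) grouped.get("attr_directory");
    Integer matches = (Integer) attrDir.get("matches"); // 4 in the dump above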


Re: Fuzzy Query Param

2011-07-18 Thread steffen_kmt

entdeveloper wrote:
> 
> I'm using Solr trunk. 
> 

Hi!

I'm using solr 3.1.0 and the feature is not implemented there. 
When I search for a word with e.g. ~2, the "~2" is interpreted as part of the
search string. 
Where can I get the trunk version? Is it a stable version or just for
testing purposes?

thanks a lot,

steffen



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fuzzy-Query-Param-tp3120235p3178565.html
Sent from the Solr - User mailing list archive at Nabble.com.


LockObtainFailedException and open finalizing IndexWriters

2011-07-18 Thread Michael Kuhlmann
Hi,

we are running Solr 3.2.0 on Jetty for a web application. Since we just
went online and are still in beta tests, we don't have very much load on
our servers (indeed, they're currently much oversized for the current
usage), and our index size on the file system is just 1.1 MB.

We have one dedicated Solr instance for updates, and two replicated
read-only servers for requests. The update server gets filled by three
different Java web servers, each has a distinct Quartz job for its
updates. Every such Quartz job takes all collected updates, sends them
via Solrj's addBeans() method, and from time to time, they send an
additional commit() after that. Each update job has a
CommonsHttpSolrServer instance, which is a Spring-controlled singleton.

We have had LockObtainFailedExceptions before, arising every few
days. Sometimes, they were preceded by an exception like this:
org.apache.solr.common.SolrException: java.io.IOException: directory
'/data/solr/data/index' exists and is a directory, but cannot be listed:
list() returned null

This looks as if there were no more file handles from the operating
system. This is strange, since the only index directory never had more
than 100 files, if ever. However, we raised ulimit -n from 1024 to 4096,
and reduced mergeFactor from 10 to 5, which at first helped us with our
problem. Until yesterday.

Again, we had this:
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out: SimpleFSLock@solr/main/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1114)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        ...


When we deleted the write.lock file without restarting Solr, several
hours later we had 441 identical log entries:

Jul 18, 2011 7:20:29 AM org.apache.solr.update.SolrIndexWriter finalize
SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a
bug -- POSSIBLE RESOURCE LEAK!!!

Wow, if there really were 441 open IndexWriters trying to access the
index directory, it's no wonder that there will be Lock timeouts sooner
or later! However, I have no clue why there are so many IndexWriters
opened and never closed. The only accessing Solr instances are pure Java
applications using Solrj. Each application only has one SolrServer
instance - and even if not, this shouldn't harm, AFAIK. The update job
is started every five seconds. The installation is a pure 3.2.0 Solr,
without additional jars. And all jars are of the correct revision. The
solrconfig.xml is based on the example configuration, with nothing
special. We currently don't have any own extensions running. There is
absolutely only one jetty instance running on the machine. And I checked
the solr.xml, it's only one core defined, and we don't do any additional
core administration.

I'm using Solr since the beginning of 2010, but never had such a
problem. Any help is welcome.

Greetings,
Kuli


Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-18 Thread Grijesh
optimize ensures that deleted docs and terms will not be displayed.

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178670.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Start parameter messes with rows

2011-07-18 Thread pravesh
>i just wanna be clear in the concepts of core and shard ?
>a single core is an index with same schema, is this what core really is ?
>can a single core contain two separate indexes with different schema in it ?
>Is a shard refers to a collection of index in a single physical machine ?
>can a single core be presented in different shards ?

You might look into following thread:

http://lucene.472066.n3.nabble.com/difference-between-shard-and-core-in-solr-td3178214.html
 


Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Start-parameter-messes-with-rows-tp3174637p3178678.html
Sent from the Solr - User mailing list archive at Nabble.com.


Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread steffen_kmt
Hi!
I would like to implement a fuzzy search with the Dismax Request Handler.
I noticed that there are some discussions about that, but all from the year
2009. (adding ~ in solrconfig)

Is it still in the same state or maybe already implemented?

Is there another option to do fuzzy search with a dismax requestHandler?

thanks in advance!






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-RequestHandler-adn-Fuzzy-Search-tp3178747p3178747.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread Ahmet Arslan
> Hi!
> I would like to implement a fuzzy search with the Dismax
> Request Handler.
> I noticed that there are some discussions about that, but
> all from the year
> 2009. (adding ~ in solrconfig)
> 
> Is it still at the same state or may be already
> implemented?

It is already implemented. https://issues.apache.org/jira/browse/SOLR-1553
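
For example, a request of this form (field names hypothetical) applies fuzzy
matching to the term across the qf fields:

    /select?defType=edismax&qf=title+body&q=word~0.8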


I found a sorting bug in solr/lucene

2011-07-18 Thread Jason Toy
Hi all,  I found a bug that exists in 3.1 and in trunk, but not in 1.4.1

When I try to sort by a column with a colon in it, like
"scores:rails_f", Solr cuts off the column name from the colon
forward, so "scores:rails_f" becomes "scores".

To test, in 1.4.1 I inserted a doc of type User (id 14914457, San Francisco,
jtoy, description "life hacker", scores:rails_f = 0.05).


And then I can run the query:

http://localhost:8983/solr/select?q=life&qf=description_text&defType=dismax&sort=scores:rails_f+desc

On 1.4.1 the query runs fine and returns the expected results.

If I insert the same document into solr 3.1 or trunk and run the same query
I get the error:

Problem accessing /solr/select. Reason:

undefined field scores

I can see in the lucene index that the data for scores:rails_f is in the
document. So solr/lucene is allowing me to store docs with fields that have
colons in it, but then I am not able to sort on it.

Can anyone else confirm this is a bug? Is this in Lucene or Solr? I believe
the issue resides in Solr.




-- 
- sent from my mobile
6176064373


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread steffen_kmt
thanks for your answer!

I tried to add the following line to the solrconfig.xml file:
<str name="pf">fieldName~0.8</str>

Before adding the line I got 14 results for a request. After adding the line
(and restarting Solr) I did the same request with just one letter of the
search string changed. I was expecting to get more or less the same
results, but I get no results.
What might be the reason?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-RequestHandler-adn-Fuzzy-Search-tp3178747p3178976.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread Ahmet Arslan
> thanks for your answer!
> 
> I tried to add the following line to the solrconfig.xml
> file:
> fieldName~0.8
> 
> Before adding the line I got 14 results for a request.
> After adding the line
> (and restarting solr) I did the same request and changed
> just one letter of
> the string. I was expecting that I have to get more or less
> the same
> results, but what I get are no results.
> What might by the reason?

edismax enables fuzzy search, but you should use that tilde sign in the q parameter. 
What is your purpose in using it in the pf parameter?


Re: I found a sorting bug in solr/lucene

2011-07-18 Thread Nicholas Chase
Seems to me that you wouldn't want to use a colon in a field name, since 
the search syntax uses it (ie, to find a document with foo = bar, you 
use foo:bar).  I don't know whether that's actually prohibited, but that 
could be your problem.


  Nick

On 7/18/2011 8:10 AM, Jason Toy wrote:

Hi all,  I found a bug that exists in the 3.1 and in trunk, but not in 1.4.1

When I try to sort by a column with a colon in it like
"scores:rails_f",  solr has cutoff the column name from the colon
forward so "scores:rails_f" becomes "scores"


Re: I found a sorting bug in solr/lucene

2011-07-18 Thread Jason Toy
I am using a fairly popular library (sunspot-solr for Ruby) on top of Solr
that introduces the use of a colon, so I will modify the library, but I
think there is still a bug, as this stopped working in recent versions of
Solr. Solr should also not allow the data into the doc in the first place if
it can't sort by that column name.

On Mon, Jul 18, 2011 at 9:47 AM, Nicholas Chase wrote:

> Seems to me that you wouldn't want to use a colon in a field name, since
> the search syntax uses it (ie, to find a document with foo = bar, you use
> foo:bar).  I don't know whether that's actually prohibited, but that could
> be your problem.
>
>   Nick
>
>
> On 7/18/2011 8:10 AM, Jason Toy wrote:
>
>> Hi all,  I found a bug that exists in the 3.1 and in trunk, but not in
>> 1.4.1
>>
>> When I try to sort by a column with a colon in it like
>> "scores:rails_f",  solr has cutoff the column name from the colon
>> forward so "scores:rails_f" becomes "scores"
>>
>


-- 
- sent from my mobile
6176064373


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread steffen_kmt

iorixxx wrote:
> 
> What is your purpose of using it in pf parameter?
> 
I don't know. I have seen it somewhere and I thought it had to be in the pf
parameter.


iorixxx wrote:
> 
> edismax enables fuzzy search, but you should use that tilde sign in the q
> parameter. 
> 
I tried this in the qf parameter (fieldname~0.8^2) but still have the same
problem: no results.

How is the syntax when I do this in the q parameter?

here is my requesthandler. Maybe it helps:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="bf">population^0.0005</str>
    <str name="sort">score desc</str>
    <str name="qf">country^1 country_exact^1.5 city~0.8^2 city_exact^2.5
        street^2 street_exact^2.5 poi_name^1.5 housenumber^2 poi_name_exact^2</str>
  </lst>
</requestHandler>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-RequestHandler-adn-Fuzzy-Search-tp3178747p3179261.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XInclude Multiple Elements

2011-07-18 Thread Stephen Duncan Jr
Does anyone use XInclude?  I'd like to hear about any successful usage at all.

Stephen Duncan Jr
www.stephenduncanjr.com


[Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-18 Thread Nagendra Nagarajayya

Hi!

I would like to announce the availability of Solr 3.3 with 
RankingAlgorithm and Near Real Time (NRT) search capability now. The NRT 
performance is very high, 10,000 documents/sec with the MBArtists 390k 
index. The NRT functionality allows you to add documents without the 
IndexSearchers being closed or caches being cleared. A commit is also 
not needed with the document update. Searches can run concurrently with 
document updates. No changes are needed except for enabling the NRT 
through solrconfig.xml.


RankingAlgorithm query performance is now 3x faster than before 
and is exposed through the Lucene API. This release also adds support for 
the last document with a unique id to be searchable and visible in 
search results in case of multiple updates of the document.


I have a wiki page that describes NRT performance in detail and can be 
accessed from here:


http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x

You can download Solr 3.3 with RankingAlgorithm (NRT version) from here:

http://solr-ra.tgels.org

I would like to invite you to give this version a try as the performance 
is very high.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org





Re: SOLR Shard failover Query

2011-07-18 Thread Shawn Heisey

On 7/17/2011 11:03 PM, pravesh wrote:

Hi,

SOLR has sharding feature, where we can distribute single search request
across shards; the results are collected,scored, and, then response is
generated.

Wanted to know: what happens in case of failure of specific shard(s),
suppose one particular shard machine is down? Does the request fail, or
is this handled gracefully by SOLR?


The request will fail.  There were two patches that I knew of for 
dealing with this, both of which are very old.  It looks like there has 
been another one since then, much more recent.


Originally available:
https://issues.apache.org/jira/browse/SOLR-1143
https://issues.apache.org/jira/browse/SOLR-1537 (incorporates 
functionality of SOLR-1143)


Available since I last looked:
https://issues.apache.org/jira/browse/SOLR-2253

That said ... in a production setting, you are better off having a fully 
redundant chain of servers than relying on a stopgap measure like this.  
IMHO, and the HO of many others, if a server failure does not leave you 
fully functional (including access to your full index), you haven't done 
enough.  Most of the time, temporary reduced performance is acceptable, 
reduced functionality is not.


When I first set things up, I was using SOLR-1537 on Solr 1.5-dev.  By 
the time I went into production, I had abandoned that idea and rolled 
out a stock 1.4.1 index with two complete server chains, each with 7 
shards.  After asking this mailing list and internally discussing it, we 
decided that partial index access on machine failure was not good 
enough.  If it takes a little longer than normal to find things, users 
may still stick around.  If they cannot find what they are looking for 
at all, they'll go somewhere else.


Hope this helps!

Shawn



NRT and commit behavior

2011-07-18 Thread Nicholas Chase
Very glad to hear that NRT is finally here!  But my question is this: 
will things still come to a standstill during a commit?


Thanks...

  Nick


Re: difference between shard and core in solr

2011-07-18 Thread Briggs Thompson
I think everything you said is correct for static schemas, but a single core
does not necessarily have a unique schema since you can have dynamic
fields.

With dynamic fields, you can have multiple types of documents in the same
index (core), and multiple types of indexed fields specific to individual
document types - all in the same core.
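
For example, a single stock dynamic-field rule such as

    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

(suffix hypothetical) lets product documents carry a color_s field while
article documents carry an author_s field, all in the same core.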

Briggs Thompson



On Mon, Jul 18, 2011 at 2:22 AM, pravesh  wrote:

> >a single core is an index with same schema  , is this wat core really is ?
>
>  YES. A single core is an independent index with its own unique schema. You
> go with a new core for cases where your schema/analysis/search requirements
> are completely different from your existing core(s).
>
> >can a single core contain two separate indexes with different schema in it
> ?
>
> NO (for same reason as explained above).
>
> >Is a shard  refers to a collection of index in a single physical machine
> >?can a single core be presented in different shards ?
>
> You can think of a Shard as a big index distributed across a cluster of
> machines. So all shards belonging to a single core share same
> schema/analysis/search requirements. You go with sharding when index is not
> scalable on a single machine, or, when your index grows really big in size.
>
>
> Thanx
> Pravesh
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/difference-between-shard-and-core-in-solr-tp3178214p3178249.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: NRT and commit behavior

2011-07-18 Thread Yonik Seeley
On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase  wrote:
> Very glad to hear that NRT is finally here!  But my question is this: will
> things still come to a standstill during a commit?

New updates can now proceed in parallel with a commit, and
searches have always been completely asynchronous w.r.t. commits.

-Yonik
http://www.lucidimagination.com


Re: NRT and commit behavior

2011-07-18 Thread Jonathan Rochkind
In practice, in my experience at least, a very 'expensive' commit can 
still slow down searches significantly, I think just due to CPU (or 
i/o?) starvation. Not sure anything can be done about that.  That's my 
experience in Solr 1.4.1, but since searches have always been async with 
commits, it probably is the same situation even in more recent versions, 
I'd guess.


On 7/18/2011 11:07 AM, Yonik Seeley wrote:

On Mon, Jul 18, 2011 at 10:53 AM, Nicholas Chase  wrote:

Very glad to hear that NRT is finally here!  But my question is this: will
things still come to a standstill during a commit?

New updates can now proceed in parallel with a commit, and
searches have always been completely asynchronous w.r.t. commits.

-Yonik
http://www.lucidimagination.com



Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread Ahmet Arslan
> I tried this in qf parameter (fieldname~0.8^2) but have
> still the same
> problem: no results

Okay, it is neither qf nor pf. Just the plain q parameter: q=test~0.8


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread steffen_kmt

iorixxx wrote:
> 
> q=test~0.8
> 

do you add ~0.8 in the query (http) or in the solrconfig.xml (like <str name="q">field~0.8</str>)?

is "test" the fieldName or a search string?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-RequestHandler-adn-Fuzzy-Search-tp3178747p3179643.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best practices for Calculationg QPS

2011-07-18 Thread Koji Sekiguchi

(11/07/18 23:35), Siddhesh Shirode wrote:

Hi Everyone,

I would like to know the best practices or best tools for calculating QPS in 
Solr. Thanks.


Just an FYI:

Admin GUI > STATISTICS > QUERY gives you avgRequestsPerSecond for each request 
handler.

koji
--
http://www.rondhuit.com/en/


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread Ahmet Arslan
> > 
> > q=test~0.8
> > 
> 
> do you add ~0.8 in the query (http) or in the
> solrconfig.xml (like <str name="q">field~0.8</str>)?

mostly in the http.

> is "test" the fieldName or a search string?

search string. Do you have another use case?




Re: NRT and commit behavior

2011-07-18 Thread Mark Miller
I've written a blog post on some of the recent improvements that explains 
things a bit:

http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%E2%80%98near-realtime%E2%80%99-improvements/

On Jul 18, 2011, at 10:53 AM, Nicholas Chase wrote:

> Very glad to hear that NRT is finally here!  But my question is this: will 
> things still come to a standstill during a commit?
> 
> Thanks...
> 
>   Nick

- Mark Miller
lucidimagination.com










Re: [Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-18 Thread Mark Miller
Hey Nagendra - I don't mind seeing these external project announces here 
(though you might keep Solr related announces off the Lucene user list), but 
please word these announces so that users are not confused that this is an 
Apache release, and that it is an external project built on top of Apache Solr.

Thanks,

- Mark

On Jul 18, 2011, at 10:43 AM, Nagendra Nagarajayya wrote:

> Hi!
> 
> I would like to announce the availability of Solr 3.3 with RankingAlgorithm 
> and Near Real Time (NRT) search capability now. The NRT performance is very 
> high, 10,000 documents/sec with the MBArtists 390k index. The NRT 
> functionality allows you to add documents without the IndexSearchers being 
> closed or caches being cleared. A commit is also not needed with the document 
> update. Searches can run concurrently with document updates. No changes are 
> needed except for enabling the NRT through solrconfig.xml.
> 
> RankingAlgorithm query performance is now 3x faster than before and is 
> exposed through the Lucene API. This release also adds support for the last 
> document with a unique id to be searchable and visible in search results in 
> case of multiple updates of the document.
> 
> I have a wiki page that describes NRT performance in detail and can be 
> accessed from here:
> 
> http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x
> 
> You can download Solr 3.3 with RankingAlgorithm (NRT version) from here:
> 
> http://solr-ra.tgels.org
> 
> I would like to invite you to give this version a try as the performance is 
> very high.
> 
> Regards,
> 
> - Nagendra Nagarajayya
> http://solr-ra.tgels.org
> http://rankingalgorithm.tgels.org
> 
> 
> 

- Mark Miller
lucidimagination.com










Join performance?

2011-07-18 Thread Kanduru, Ajay (NIH/NLM/LHC) [C]
I am trying to optimize performance of solr with our collection. The collection 
has 208M records with index size of about 80GB. The machine has 16GB and I am 
allocating about 14GB to solr.

I am using self join statement in filter query like this:
q=(general search term)
fq={!join from=join_field to=join_field}(field1:(field1 search term) AND 
field2:(field2 search term) AND field3:(field3 search term))
...

Field definitions:
join_field: string type (Has ~27K terms)
field1: text type
field2: double type
field3: string type

The response time of the fq with join is about ten times that of the fq without 
join (~10 sec vs ~1 sec). Is this expected? In general, what 
parameters, if any, can be tweaked? The intention is to use multiple such 
filter queries, hence the need for optimization. Sharding and more horsepower 
are obvious solutions, but I am more interested in optimizing for a given host 
and a given data collection.

Appreciate any insight in this regard.

-Ajay


Re: I found a sorting bug in solr/lucene

2011-07-18 Thread Chris Hostetter

: When I try to sort by a column with a colon in it like
: "scores:rails_f",  solr has cutoff the column name from the colon
: forward so "scores:rails_f" becomes "scores"

Yes, this bug was recently reported against the 3.x line, but no fix has 
yet been identified...

https://issues.apache.org/jira/browse/SOLR-2606

: Can anyone else confirm this is a bug. Is this in lucene or solr? I believe
: the issue resides in solr.

it's specific to the param parsing, likely due to the addition of 
supporting functions in the sort param.


-Hoss


Re: Join performance?

2011-07-18 Thread Yonik Seeley
On Mon, Jul 18, 2011 at 12:48 PM, Kanduru, Ajay (NIH/NLM/LHC) [C]
 wrote:
> I am trying to optimize performance of solr with our collection. The 
> collection has 208M records with index size of about 80GB. The machine has 
> 16GB and I am allocating about 14GB to solr.
>
> I am using self join statement in filter query like this:
> q=(general search term)
> fq={!join from=join_field to=join_field}(field1:(field1 search term) AND 
> field2:(field2 search term) AND field3:(field3 search term))
> ...
>
> Field definitions:
> join_field: string type (Has ~27K terms)
> field1: text type
> field2: double type
> field3: string type
>
> The response time of qf with join is about ten times compared to qf without 
> join (~10 sec vs ~1 sec). Is this something on expected lines?

Yep... the initial join implementation is O(nterms), so it's expected
to be slow when the number of unique terms is high.
Given your index size, I would almost have expected it to be slower!

As with faceting, I expect there to be other implementations in the
future, but nothing right now...

-Yonik
http://www.lucidimagination.com

> In general what parameters, if any, can be tweaked? The intention is to use 
> such multiple filter queries, hence the need for optimization. Sharding and 
> more horse power are obvious solutions, but more interested in optimizing for 
> a given host and a given data collection.
>
> Appreciate any insight in this regard.
>
> -Ajay
>


Solr and External Fields

2011-07-18 Thread Jamie Johnson
I recently modified the DefaultSolrHighlighter to support external
fields, but is there a way to do this for solr itself?  I'm looking to
store a field in an external store and give Solr access to that field.
 Where in Solr would I do this?


Re: Extending Solr Highlighter to pull information from external source

2011-07-18 Thread Jamie Johnson
I haven't seen any interest in this, but for anyone following, I
updated the alternateField logic to support pulling from the external
field if available.  Would be useful to know how to get solr to use
this external field provider in general so we wouldn't have to modify
the highlighter at all, just whatever was building the document.

On Fri, Jul 15, 2011 at 5:08 PM, Jamie Johnson  wrote:
> I tried the patch at SOLR-1397 but it didn't work as I'd expect.
>
> 
>    
>        
>            Test subject message
>        
>        0
>        29
>    
> 
> The start position is right, but the end position seems to be the
> length of the field.
>
>
> On Fri, Jul 15, 2011 at 4:25 PM, Jamie Johnson  wrote:
>> I added the highlighting code I am using to this JIRA
>> (https://issues.apache.org/jira/browse/SOLR-1397).  Afterwards I
>> noticed this JIRA (https://issues.apache.org/jira/browse/SOLR-1954)
>> which talks about another solution.  I think David's patch would have
>> worked equally well for my problem, just would require later doing the
>> highlighting on the clients end.  I'll have to give this a whirl over
>> the weekend.
>>
>> On Fri, Jul 15, 2011 at 3:55 PM, Jamie Johnson  wrote:
>>> Boy it's been a long time since I first wrote this, sorry for the delay
>>>
>>> I think I have this working as I expect with a test implementation.  I
>>> created the following interface
>>>
>>> public interface SolrExternalFieldProvider extends 
>>> NamedListInitializedPlugin {
>>>        public String[] getFieldContent(String key, SchemaField field,
>>> SolrQueryRequest request);
>>> }
>>>
>>> I then added to DefaultSolrHighlighter the following:
>>>
>>> in init()
>>>
>>> SolrExternalFieldProvider defaultProvider =
>>> solrCore.initPlugins(info.getChildren("externalFieldProvider") ,
>>> externalFieldProviders,SolrExternalFieldProvider.class,null);
>>>            if(defaultProvider != null){
>>>                externalFieldProviders.put("", defaultProvider);
>>>                externalFieldProviders.put(null, defaultProvider);
>>>            }
>>> then in doHighlightByHighlighter I added the following
>>>
>>> if(schemaField != null && !schemaField.stored()){
>>>                        SolrExternalFieldProvider externalFieldProvider =
>>> this.getExternalFieldProvider(fieldName, params);
>>>                        if(externalFieldProvider != null){
>>>                    SchemaField keyField = schema.getUniqueKeyField();
>>>                    String key = doc.getValues(keyField.getName())[0];  //I
>>> know this field exists and is not multivalued
>>>                    if(key != null && key.length() > 0){
>>>                        docTexts = externalFieldProvider.getFieldContent(key,
>>> schemaField, req);
>>>                    }
>>>                        } else {
>>>                                docTexts = new String[]{};
>>>                        }
>>>                }
>>>
>>>                else {
>>>                docTexts = doc.getValues(fieldName);
>>>        }
>>>
>>>
>>> This worked for me.  I needed to include the req because there are
>>> some additional things that I need to have from it; I figure this is
>>> probably something else folks will need as well.  I tried to follow
>>> the pattern used for the other highlighter pieces in that you can have
>>> different externalFieldProviders for each field.  I'm more than happy
>>> to share the actual classes with the community or add them to one of
>>> the JIRA issues mentioned below, I haven't done so yet because I don't
>>> know how to build patches.
>>>
>>> On Mon, Jun 20, 2011 at 11:47 PM, Michael Sokolov  
>>> wrote:
 I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not
 much going on there

 LUCENE-1522 has a lot of
 fascinating discussion on this topic though


> There is a couple of long lived issues in jira for this (I'd like to try
> to search
> them, but I couldn't access jira now).
>
> For FVH, it is needed to be modified at Lucene level to use external data.
>
> koji

Koji - is that really so?  It appears to me that one could extend
 BaseFragmentsBuilder and override

 createFragments(IndexReader reader, int docId,
      String fieldName, FieldFragList fieldFragList, int maxNumFragments,
      String[] preTags, String[] postTags, Encoder encoder )

 providing a version that retrieves text from some external source rather
 than from Lucene fields.

 It sounds to me like a really useful modification in Lucene core would be 
 to
 retain match points that have already been computed during scoring so the
 highlighter doesn't have to attempt to reinvent all that logic!  This has
 all been discussed at length in LUCENE-1522 already, but is there is any
 recent activity?

 My hope is that since (at least in my test) search code seems to spend 80%
>>

Specify the length for returned highlighted fields

2011-07-18 Thread Jamie Johnson
Is there a way to specify the length of the text that should come back
from the highlighter?  For instance I have a field that is 500k, I
want only the first 100 characters.  I don't see anything like this
now, does it exist?


Solr search starting with 1 character spin endlessly

2011-07-18 Thread Timothy Tagge
Solr version:  1.4.1

I'm having some trouble with certain queries run against my Solr
index.  When a query starts with a single letter followed by a space,
followed by another search term, the query runs endlessly and never
comes back.  An example problem query string...

/customer/select/?q=name%3At+j+reynolds&version=2.2&start=0&rows=10&indent=on


However, if I switch the order of the search values, putting the
longer search term before the single character, I get quick, accurate
results

/customer/select/?q=name%3AReynolds+T+J&version=2.2&start=0&rows=10&indent=on

I've defined my name field as text:

<field name="name" type="text" indexed="true" stored="true" required="true" />

Where text is defined as:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="customer-synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="customer-synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

Am I making a simple mistake somewhere?

Thanks for your help.

Tim T.


Re: Solr search starting with 1 character spin endlessly

2011-07-18 Thread Yonik Seeley
On Mon, Jul 18, 2011 at 3:44 PM, Timothy Tagge  wrote:
> Solr version:  1.4.1
>
> I'm having some trouble with certain queries run against my Solr
> index.  When a query starts with a single letter followed by a space,
> followed by another search term, the query runs endlessly and never
> comes back.  An example problem query string...
>
> /customer/select/?q=name%3At+j+reynolds&version=2.2&start=0&rows=10&indent=on
>
>
> However, if I switch the order of the search values, putting the
> longer search term before the single character, I get quick, accurate
> results
>
> /customer/select/?q=name%3AReynolds+T+J&version=2.2&start=0&rows=10&indent=on


Note that a query of name:t j reynolds
is actually equivalent to name:t default_field:j default_field:reynolds

You probably want a query of name:"t j reynolds"
or name:(t j reynolds)

The query probably doesn't hang, but may just take a long time if you
have a big index, or if you don't have enough RAM and the default
field isn't one that is normally searched (causing much real disk IO
to satisfy the query).
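
In the original URL form, the suggested query would look like this (same
collection path as above):

    /customer/select/?q=name%3A%28t+j+reynolds%29&version=2.2&start=0&rows=10&indent=on

which parses every term against the name field, subject to the default operator.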

-Yonik
http://www.lucidimagination.com


> I've defined my name field as text:
> <field name="name" type="text" indexed="true" stored="true" required="true" />
>
> Where text is defined as:
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="customer-synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="customer-synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> Am I making a simple mistake somewhere?
>
> Thanks for your help.
>
> Tim T.
>


Searching for strings

2011-07-18 Thread Chip Calhoun
Is there a way to search for a specific string using Solr, either by putting it 
in quotes or by some other means?  I haven't been able to do this, but I may be 
missing something.

Thanks,
Chip


RE: Analysis page output vs. actually getting search matches, a discrepency?

2011-07-18 Thread Robert Petersen
OK, I did what Hoss said; it only confirms that I don't get a match when I
should and that the query parser is doing what's expected. Here are the
details for one test sku.

My analysis page output is shown in my email starting this thread and
here is my query debug output.  This absolutely should match but
doesn't.  Both the indexing side and the query side are splitting on
case changes.  This actually isn't a problem for any of our other
content, for instance there is no issue searching for 'VideoSecu'.
Their products come up fine in our searches regardless of casing in the
query.  Only SterlingTek's products seem to be causing us issues.

Indexed content has camel case, stored in the text field 'moreWords':
"SterlingTek's NB-2LH 2 Pack Batteries + Charger Combo for Canon DC301"
Search term not matching with camel case: "SterlingTek's"
Search term matching if no case changes: "Sterlingtek's"

Indexing:
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        splitOnCaseChange="1"
        preserveOriginal="0"
/>
Searching:
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="1"
        preserveOriginal="0"
/>

Thanks

http://ssdevrh01.buy.com:8983/solr/1/select?indent=on&version=2.2&q=SterlingTek%27s&fq=&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=sku%3A216473417&hl=on&hl.fl=&echoHandler=true&adf



<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">4</int>
 <str name="handler">org.apache.solr.handler.component.SearchHandler</str>
 <lst name="params">
  <str name="explainOther">sku:216473417</str>
  <str name="hl">on</str>
  <str name="echoHandler">true</str>
  <str name="hl.fl"/>
  <str name="qt">standard</str>
  <str name="indent">on</str>
  <str name="rows">1</str>
  <str name="version">2.2</str>
  <str name="fl">*,score</str>
  <str name="debugQuery">on</str>
  <str name="start">0</str>
  <str name="q">SterlingTek's</str>
  <str name="wt">standard</str>
  <str name="fq"/>
 </lst>
</lst>
...
<lst name="debug">
 <str name="rawquerystring">SterlingTek's</str>
 <str name="querystring">SterlingTek's</str>
 <str name="parsedquery">PhraseQuery(moreWords:"sterling tek")</str>
 <str name="parsedquery_toString">moreWords:"sterling tek"</str>
 <str name="otherQuery">sku:216473417</str>
 <lst name="explainOther">
  <str name="216473417">
0.0 = fieldWeight(moreWords:"sterling tek" in 76351), product of:
  0.0 = tf(phraseFreq=0.0)
  19.502613 = idf(moreWords: sterling=1 tek=72)
  0.15625 = fieldNorm(field=moreWords, doc=76351)
  </str>
 </lst>
 <str name="QParser">LuceneQParser</str>
</lst>
</response>







-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, July 15, 2011 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Analysis page output vs. actually getting search matches, a
discrepency?


: Subject: Analysis page output vs. actually getting search matches,
: a discrepency?

99% of the time when people ask questions like this, it's because of 
confusion about how/when QueryParsing comes into play (as opposed to 
analysis) -- analysis.jsp only shows you part of the equation, it
doesn't 
know what query parser you are using.

you mentioned that you aren't getting matches when you expect them, and 
you provided the analysis.jsp output, but you didn't mention anything 
about the request you are making, the query parser used, etc.  It would 
be good to know the full query URL, along with the debugQuery output 
showing the final query toString info.

if that info doesn't clear up the discrepancy, you should also take a look 
at the explainOther info for the doc that you expect to match that isn't 

-- if you still aren't sure what's going on, post all of that info to 
solr-user and folks can probably help you make sense of it.

(all that said: in some instances this type of problem is simply that 
someone changed the schema and didn't reindex everything, so the indexed

terms don't really match what you think they do)


-Hoss


Re: Searching for strings

2011-07-18 Thread Rob Casson
chip,

gonna need more information about your particular analysis chain,
content, and example searches to give a better answer, but phrase
queries (using quotes) are supported in both the standard and dismax
query parsers

that being said, lots of things may not match a person's idea of an
exact string...stopwords, synonyms, slop, etc.
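
for example, with the standard query parser (field name hypothetical):

    q=title:"random access memory"

matches the quoted phrase after analysis; for a byte-for-byte exact match,
the field would need to be indexed as a string type instead.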

cheers,
rob

On Mon, Jul 18, 2011 at 5:25 PM, Chip Calhoun  wrote:
> Is there a way to search for a specific string using Solr, either by putting 
> it in quotes or by some other means?  I haven't been able to do this, but I 
> may be missing something.
>
> Thanks,
> Chip
>


Re: Payload doesn't apply to WordDelimiterFilterFactory-generated tokens

2011-07-18 Thread Chris Hostetter

: It seems that the payloads are applied only to the original word that I
: index and the WordDelimiterFilter doesn't apply the payloads to the tokens
: it generates.

I believe you are correct.  I think the general rule for most TokenFilters 
that you will find in Lucene/Solr is that they don't typically "clone" 
attributes (like payloads) when generating new Tokens -- it may be what 
you want in your use case, but there's no hard & fast rule that it would 
always make sense to do so.

If you'd like to open a jira (or submit a patch) i suspect a new 
"clonePayload" attribute could be added to the WDF Factory to drive this 
kind of behavior so people with use cases where it made sense could enable 
this -- but i haven't looked at that code (or the current TokenStream API) 
enough to have any idea how hard it would be.



-Hoss


Re: Max Rows

2011-07-18 Thread Chris Hostetter

: Works like a charm, but there is one problem: the maxrows attribute is set
: to 10, this means 10 results per page, but when you put several collections
: in the attribute, what it does is add 10 results per page per
: collection, so if I have 4 comma-separated collections the max rows forces
: itself to 40 results per page instead of the 10 total I'm aiming for.
: 
: My question is: Is there a way to fix this? Is there a way to make the
: maxrow attribute global and prevent it to add more rows per collection?

I don't know anything about the cold fusion client you are asking about, 
but this sounds like it is probably a bug there.

if you can post some details about what the actual requests being made 
to Solr are, perhaps someone can spot a cause for the problem -- but i 
would start by asking about this in whatever user forum is available for 
cold fusion.



-Hoss

defType argument weirdness

2011-07-18 Thread Naomi Dushay
I found a weird behavior with the Solr  defType argument, perhaps with  
respect to default queries?


 defType=dismax&q=*:*  no hits

 q={!defType=dismax}*:* hits

 defType=dismax hits


Here is the request handler, which I explicitly indicate:



lucene


has_model_s
AND


 2<-1 5<-2 6<90% 
*:*
<str name="qf_dismax">id^0.8 id_t^0.8 title_t^0.3 mods_t^0.2 text</str>
<str name="pf_dismax">id^0.9 id_t^0.9 title_t^0.5 mods_t^0.2 text</str>

100
0.01



Solr Specification Version: 1.4.0
Solr Implementation Version: 1.4.0 833479 - grantingersoll -  
2009-11-06 12:33:40

Lucene Specification Version: 2.9.1
Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25

- Naomi


Re: solr scale on trie fields

2011-07-18 Thread Chris Hostetter
There are a few things here that i think you might be missunderstanding...

: function, but i read in solr book (Solr 1.4 enterprise search server by Eric
: Pugh and David Smiley) that "*scale will traverse the entire document set
: and evaluate the function to determine the smallest and largest values for
: each query invocation, and it is not cached " *. What makes me ask two
: questions:
: 
:1. Is this also true for TrieFields (such as solr.TrieIntField), because
:as far as I understand it suppose to have the values sorted in some manner,
:so checking for the min and max val should happen in constant time
:complexity.

Trie fields are encoded such that the "min" numeric value gets the "min"
Term value, and the "max" numeric value gets the "max" Term value, but  
they are still just Terms, so finding the "max" Term value does require a  
scan of the TermEnumerator.

but that's not what we're talking about with the "scale" function.  

scale(...) is generic -- it can be used the scale the output of *any* 
function, not just field values, so it can't use generic Term seeking 
code, because a client could specify "scale(map(myTrieField,0,0,5),1,10)" 
just as easily as they could write "scale(myTrieField,1,10)"

:2. why are the results are not cached?!?! is there any way to defined
:them to be cached?

In the general case, it's not clear how/when/where this information could 
be cached -- in your use case it may seem straightforward: you are 
scaling the values of a single field, so you think the min/max value for 
that field should be cached, but as i mentioned functions in solr are 
entirely general purpose.  caching the min/max values for every arbitrary 
function that might ever be used as the input to the scale function isn't 
really a good idea.

That said: there would likely be some definite value in adding new 
"minterm" and "maxterm" functions that would take as argument explicit 
field names (not general functions) which would likely be able to more 
efficiently compute those values (and then be more efficient when scaling) 
but as mentioned there is still the issue of finding the "max" term value 
requiring iteration.

some work is being done at a lower level to better encode these kinds of 
field/term stats in the index, and i suspect you'll see people more eager 
to add functions like that when that underlying work is done.
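
For reference, the function-query form of scale looks like this (field name
hypothetical):

    q={!func}scale(popularity,1,10)

and any nested function, e.g. scale(map(popularity,0,0,5),1,10), can stand in
for the bare field.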



-Hoss


Re: difference between shard and core in solr

2011-07-18 Thread Chris Hostetter

http://wiki.apache.org/solr/SolrTerminology



: hi ,
: 
: i just wanna be clear in the concepts of core and shard ?
: 
: a single core is an index with same schema  , is this wat core really is ?
: 
: can a single core contain two separate indexes with different schema in it ?
: 
: Is a shard  refers to a collection of index in a single physical machine
: ?can a single core be presented in different shards ?
: 
: 
: 
: 
: 
: 
: -- 
: 
: -JAME
: 

-Hoss


Re: Stored Field

2011-07-18 Thread Erick Erickson
Well, it all depends upon what you mean by "size" ...

This page http://lucene.apache.org/java/3_0_2/fileformats.html#file-names
explains what goes where in the files created by Lucene. The
point is that the raw text (i.e. *stored* data) is put in separate files
from the indexed (i.e. searched) data. So search times won't be
affected. I'm pretty sure (but not 100%) that the verbatim text is
just stored, with no reference to other possible usages in other
docs.

But this doesn't really affect searching. The primary impact will be
on replicating the index since you're copying more bytes.

Best
Erick

On Thu, Jul 14, 2011 at 7:36 AM, lee carroll
 wrote:
> Hi
> Do Stored field values get added to the index for each document field
> combination literally or is a pointer used ?
> I've been reading http://lucene.apache.org/java/2_4_0/fileformats.pdf
> and I think thats the case but not 100% so thought I'd ask.
>
> In logical terms for stored fields do we get this sort of storage:
>
> doc0 field0 > "xxx xx xx xx xx xx xx xx xx xxx"
> doc0 field1 > "yyy yy yy yy yy yy yy yy yyy"
> doc1 field0 > "xxx xx xx xx xx xx xx xx xx xxx"
> doc1 field1 > "yyy yy yy yy yy yy yy yy yyy"
>
> or this:
>
> doc0 field0 > {1}
> doc0 field1 > {2}
> doc1 field0 > {1}
> doc1 field1 > {2}
>
> val1 > "xxx xx xx xx xx xx xx xx xx xxx"
> val2 > "yyy yy yy yy yy yy yy yy yyy"
>
> I'm trying to understand the possible impact of storing fields which have
> a small set of repeating values, hoping it would not have an impact on
> file size. But I now think it will?
>
> thanks in advance
>


Re: Data Import from a Queue

2011-07-18 Thread Erick Erickson
This is a really cryptic problem statement.

you might want to review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fish  wrote:
> Does anyone know of any existing examples of importing data from a queue
> into Solr?
>
> Thank you.
>


Re: Index rows with NULL value

2011-07-18 Thread Erick Erickson
Please provide some more context here, there's nothing
really to go on. It might help to review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jul 15, 2011 at 9:58 PM, Ruixiang Zhang  wrote:
> Hi
>
> It seems that solr does not index a row when some column of this row has
> NULL value.
> How can I make solr index these rows?
>
> Thanks
> Ruixiang
>


RE: ' invisible ' words

2011-07-18 Thread Robert Petersen
Read my thread " RE: Analysis page output vs. actually getting search
matches, a discrepancy?" and see if it is not somewhat like your
problem... even if not, there might be something to help as to how to
figure out what is going on in your case...

-Original Message-
From: deniz [mailto:denizdurmu...@gmail.com] 
Sent: Sunday, July 17, 2011 6:24 PM
To: solr-user@lucene.apache.org
Subject: RE: ' invisible ' words

Hi Jagdish,

thank you very much for the tool that you have sent... It is really
useful
for this problem... 

After using the tool, I just got interesting results... for some words,
when i use the tool it returns the matched docs; on the other hand, when i
use the solr admin page to make a search i can't get any matches... with the
same words... now i am more confused and honestly have no idea about what to
do... 

anyone has ever faced such a problem?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context:
http://lucene.472066.n3.nabble.com/invisible-words-tp3158060p3177907.htm
l
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ' invisible ' words

2011-07-18 Thread Erick Erickson
Deniz:

Can you create a self-contained test case that illustrates the problem?

In reality, I suspect that you're doing something ever-so-slightly different,
didn't remove your index between tests, whatever (I know I've gotten
myself completely messed up after working on something for hours!).
Making a junit test that illustrated the problem very often forces one to
focus on the individual steps and Presto! the problem solves itself.

Of course, if you can create the test it might illustrate a bug that needs to
be fixed.

Best
Erick

On Sun, Jul 17, 2011 at 9:23 PM, deniz  wrote:
> Hi Jagdish,
>
> thank oyu very much for the tool that you have sent... It is really useful
> for this problem...
>
> After using the tool, I just got interesting results... for some words; when
> i use the tool. it returns the matched docs, on the other hand when i use
> solr admin page to make a search i cant get any matches... with the same
> words... now i am more confused and honestly have no idea about what to
> do...
>
> anyone has ever faced such a problem?
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/invisible-words-tp3158060p3177907.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: XInclude Multiple Elements

2011-07-18 Thread Chris Hostetter

: I see this post:
: 
http://lucene.472066.n3.nabble.com/including-external-files-in-config-by-corename-td698324.html
: that implies you can use #xpointer(/*/node()) to get all elements of
: the root node (like if I changed my example to only have one include,
: and just used multiple files, which is fine if it works), however my
: testing gave this error: ERROR org.apache.solr.core.CoreContainer -
: org.xml.sax.SAXParseException: Fragment identifiers must not be used.
: The 'href' attribute value
: '../../conf/solrconfigIncludes.xml#xpointer(root/node())' is not
: permitted.  I tried several other variations of trying to come up with
: pointers using node() or *, none of which worked.

Can you post the details of your JVM / ServletContainer and the full stack 
trace of the exception?  My understanding is that fragment identifiers are 
a mandatory part of the xinclude/xpointer specs.

It would also be good to know if you tried the explicit "xpointer" 
attribute approach on the xinclude syntax also mentioned in that thread...

I think it would be something like...

<xi:include href="../../conf/solrconfigIncludes.xml"
            xpointer="xpointer(/*/node())"
            xmlns:xi="http://www.w3.org/2001/XInclude"/>

In general, Solr really isn't doing anything special with XInclude ... 
it's all just delegated to the XML Libraries.  You might want to start by 
ignoring solr, and reading up on XInclude/XPointer tutorials in general, 
and experimenting with command line xml tools to figure out the syntax you 
need to get the "final" xml structures you want -- then apply that 
knowledge to the solr config files.
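
For example, xmllint can expand the includes from the command line, so you can
inspect exactly the document Solr will end up parsing:

    xmllint --xinclude solrconfig.xml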


-Hoss


Re: Best practices for Calculationg QPS

2011-07-18 Thread Erick Erickson
Measure. On an index with your real data with real queries ...

Smart-aleck answer aside, using something like jMeter is useful. The
basic idea is that you can use such a tool to fire queries at a Solr
index, configuring it with some number of threads that all run
in parallel, and keep upping the number of threads until the server
falls over.

But it's critical that you use your real data. All of it (i.e. don't run with
a partial set of data and expect the results to hold when you add the
rest of the data). It's equally critical that you use real queries that
reflect what the users actually send at your index.

Of course, with a new app, getting "real" user queries isn't possible,
and you're forced to guess. Which is much better than nothing, but
you need to monitor what happens when real users do start using your
system...

Do be aware that what I have seen when doing this is that your
QPS will plateau, but the response time for each query will
increase at some threshold...

FWIW
Erick

On Mon, Jul 18, 2011 at 10:35 AM, Siddhesh Shirode
 wrote:
> Hi Everyone,
>
> I would like to know the best practices or  best tools for Calculating QPS  
> in Solr. Thanks.
>
> Thanks,
> SIDDHESH SHIRODE
> Technical Consultant
>
> M +1 240 274 5183
>
> SEARCH TECHNOLOGIES
> THE EXPERT IN THE SEARCH SPACE
> www.searchtechnologies.com
>
>


Re: Specify the length for returned highlighted fields

2011-07-18 Thread Erick Erickson
Does hl.fragsize work in your case?
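
For instance (field name hypothetical):

    /select?q=foo&hl=on&hl.fl=description&hl.fragsize=100

hl.fragsize sets the approximate fragment size in characters.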

Best
Erick

On Mon, Jul 18, 2011 at 3:16 PM, Jamie Johnson  wrote:
> Is there a way to specify the length of the text that should come back
> from the highlighter?  For instance I have a field that is 500k, I
> want only the first 100 characters.  I don't see anything like this
> now, does it exist?
>


Re: Question about optimization

2011-07-18 Thread Chris Hostetter

: I saw this in the Solr wiki : "An un-optimized index is going to be *at
: least* 10% slower for un-cached queries."
: Is this still true? I read somewhere that recent versions of Lucene were
: less sensitive to un-optimized indexes than in the past...

correct.  I've removed that specific statement ... definitely misleading.

: Having 50 000 new (or updated) documents coming to my index every day, would
: a once-a-day optimization be sufficient?

how often you optimize doesn't really matter -- what matters is that you 
optimize *if* you know you aren't going to be getting more changes soon.  
If you get one update every minute, 24 hours a day, then optimizing once a 
night isn't going to help you out any more than optimizing once a week or 
once a year.

if you do 50K updates a day divided into two batches, one at midnight and 
one at noon, then optimizing immediately after each of those batches might 
give you some noticeable search speed improvements for the rest of the day.

-Hoss


Re: Analysis page output vs. actually getting search matches, a discrepency?

2011-07-18 Thread Erick Erickson
Hmmm, is there any chance that you're stemming one place and
not the other?
And I infer from your output that your default search field is
"moreWords", is that true and expected?

You might use luke or the TermsComponent to see what's actually in
the index, I'm going to guess that you'll find "sterl" but not "sterling" as
an indexed term and your problem is stemming, but that's
a shot in the dark.
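
For example, the TermsComponent (handler path may differ in your config) can
list what was actually indexed:

    /solr/terms?terms.fl=moreWords&terms.prefix=sterl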

Best
Erick

On Mon, Jul 18, 2011 at 5:37 PM, Robert Petersen  wrote:
> OK I did what Hoss said, it only confirms I don't get a match when I
> should and that the query parser is doing the expected.  Here are the
> details for one test sku.
>
> My analysis page output is shown in my email starting this thread and
> here is my query debug output.  This absolutely should match but
> doesn't.  Both the indexing side and the query side are splitting on
> case changes.  This actually isn't a problem for any of our other
> content, for instance there is no issue searching for 'VideoSecu'.
> Their products come up fine in our searches regardless of casing in the
> query.  Only SterlingTek's products seem to be causing us issues.
>
> Indexed content has camel case, stored in the text field 'moreWords':
> "SterlingTek's NB-2LH 2 Pack Batteries + Charger Combo for Canon DC301"
> Search term not matching with camel case: "SterlingTek's"
> Search term matching if no case changes: "Sterlingtek's"
>
> Indexing:
> <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1"
>         generateNumberParts="1"
>         catenateWords="1"
>         catenateNumbers="1"
>         catenateAll="0"
>         splitOnCaseChange="1"
>         preserveOriginal="0"
> />
> Searching:
> <filter class="solr.WordDelimiterFilterFactory"
>         generateWordParts="1"
>         generateNumberParts="1"
>         catenateWords="0"
>         catenateNumbers="0"
>         catenateAll="0"
>         splitOnCaseChange="1"
>         preserveOriginal="0"
> />
>
> Thanks
>
> http://ssdevrh01.buy.com:8983/solr/1/select?indent=on&version=2.2&q=
> SterlingTek%27s&fq=&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&
> debugQuery=on&explainOther=sku%3A216473417&hl=on&hl.fl=&echoHandler=true
> &adf
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>  <int name="status">0</int>
>  <int name="QTime">4</int>
>  <str name="handler">org.apache.solr.handler.component.SearchHandler</str>
>  <lst name="params">
>   <str name="explainOther">sku:216473417</str>
>   <str name="hl">on</str>
>   <str name="echoHandler">true</str>
>   <str name="hl.fl"/>
>   <str name="wt">standard</str>
>   <str name="indent">on</str>
>   <str name="rows">1</str>
>   <str name="version">2.2</str>
>   <str name="fl">*,score</str>
>   <str name="debugQuery">on</str>
>   <str name="start">0</str>
>   <str name="q">SterlingTek's</str>
>   <str name="qt">standard</str>
>   <str name="fq"/>
>  </lst>
> </lst>
> <result name="response" numFound="0" start="0"/>
> <lst name="debug">
>  <str name="rawquerystring">SterlingTek's</str>
>  <str name="querystring">SterlingTek's</str>
>  <str name="parsedquery">PhraseQuery(moreWords:"sterling tek")</str>
>  <str name="parsedquery_toString">moreWords:"sterling tek"</str>
>  <lst name="explain"/>
>  <str name="otherQuery">sku:216473417</str>
>  <lst name="explainOther">
>   <str name="216473417">
> 0.0 = fieldWeight(moreWords:"sterling tek" in 76351), product of:
>   0.0 = tf(phraseFreq=0.0)
>   19.502613 = idf(moreWords: sterling=1 tek=72)
>   0.15625 = fieldNorm(field=moreWords, doc=76351)
>   </str>
>  </lst>
>  <str name="QParser">LuceneQParser</str>
> </lst>
> </response>
>
>
>
> -Original Message-
> From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> Sent: Friday, July 15, 2011 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Analysis page output vs. actually getting search matches, a
> discrepancy?
>
>
> : Subject: Analysis page output vs. actually getting search matches,
> :     a discrepancy?
>
> 99% of the time when people ask questions like this, it's because of
> confusion about how/when QueryParsing comes into play (as opposed to
> analysis) -- analysis.jsp only shows you part of the equation, it
> doesn't
> know what query parser you are using.
>
> you mentioned that you aren't getting matches when you expect them, and
> you provided the analysis.jsp output, but you didn't mention anything
> about the request you are making, the query parser used, etc.  It
> would
> be good to know the full query URL, along with the debugQuery output
> showing the final query toString info.
>
> if that info doesn't clear up the discrepancy, you should also take a
> look
> at the explainOther info for the doc that you expect to match that isn't
>
> -- if you still aren't sure what's going on, post all of that info to
> solr-user and folks can probably help you make sense of it.
>
> (all that said: in some instances this type of problem is simply that
> someone changed the schema and didn't reindex everything, so the indexed
>
> terms don't really match what you think they do)
>
>
> -Hoss
>


Re: defType argument weirdness

2011-07-18 Thread Erick Erickson
What are qf_dismax and pf_dismax? They are meaningless to
Solr. Try adding &debugQuery=on to your URL and you'll
see the parsed query, which helps a lot here

If you change these to the proper dismax values (qf and pf)
you'll get better results. As it is, I think you'll see output like:

+() ()

showing that your query isn't actually going against
any fields

Best
Erick

On Mon, Jul 18, 2011 at 7:15 PM, Naomi Dushay  wrote:
> I found a weird behavior with the Solr  defType argument, perhaps with
> respect to default queries?
>
>  defType=dismax&q=*:*      no hits
>
>  q={!defType=dismax}*:*     hits
>
>  defType=dismax         hits
>
>
> Here is the request handler, which I explicitly indicate:
>
> <requestHandler name="search" class="solr.SearchHandler" default="true">
>        <lst name="defaults">
>                <str name="defType">lucene</str>
>
>                <str name="…">has_model_s</str>
>                <str name="q.op">AND</str>
>
>                <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
>                <str name="q.alt">*:*</str>
>                <str name="qf_dismax">id^0.8 id_t^0.8 title_t^0.3 mods_t^0.2 text</str>
>                <str name="pf_dismax">id^0.9  id_t^0.9 title_t^0.5 mods_t^0.2 text</str>
>                <int name="ps">100</int>
>                <float name="tie">0.01</float>
>        </lst>
> </requestHandler>
>
>
> Solr Specification Version: 1.4.0
> Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
> 12:33:40
> Lucene Specification Version: 2.9.1
> Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25
>
> - Naomi
>


Re: Best practices for Calculationg QPS

2011-07-18 Thread Lance Norskog
Easiest way to count QPS:

Take one Solr log file. Make sure date stamps and log entries are on
the same line.
Grab all lines containing 'QTime='.
Strip these lines of all text after the timestamp.
Run this Unix program to get a count of how many times each
timestamp appears in a row:
uniq -c

Works a treat. After this I make charts in Excel. Use the "X-Y" or
"Scatter plot" chart. Make the timestamp the X dimension, and the
count the Y dimension. This gets you a plot of QPS.
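
(The same counting as a standalone Java sketch, for anyone not on Unix;
the timestamp pattern is an assumption about your log format and will
need adjusting. Run it as: java QpsCounter solr.log)

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QpsCounter {
    public static void main(String[] args) throws Exception {
        // Assumes each log line starts with a timestamp in its first three tokens.
        Pattern ts = Pattern.compile("^(\\S+\\s+\\S+\\s+\\S+)");
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.contains("QTime=")) continue;   // only count query entries
            Matcher m = ts.matcher(line);
            if (!m.find()) continue;
            String stamp = m.group(1);
            Integer c = counts.get(stamp);
            counts.put(stamp, c == null ? 1 : c + 1);
        }
        in.close();
        // Same output shape as `uniq -c`: count, then timestamp.
        for (Map.Entry<String, Integer> e : counts.entrySet())
            System.out.println(e.getValue() + " " + e.getKey());
    }
}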

On Mon, Jul 18, 2011 at 5:08 PM, Erick Erickson  wrote:
> Measure. On an index with your real data with real queries ...
>
> Being a smart-aleck aside, using something like jMeter is useful. The
> basic idea is that you can use such a tool to fire queries at a Solr
> index, configuring it with some number of threads that all run
> in parallel, and keep upping the number of threads until the server
> falls over.
>
> But it's critical that you use your real data. All of it (i.e. don't run with
> a partial set of data and expect the results to hold when you add the
> rest of the data). It's equally critical that you use real queries that
> reflect what the users actually send at your index.
>
> Of course, with a new app, getting "real" user queries isn't possible,
> and you're forced to guess. Which is much better than nothing, but
> you need to monitor what happens when real users do start using your
> system...
>
> Do be aware that what I have seen when doing this is that your
> QPS will plateau, but the response time for each query will
> increase at some threshold...
>
> FWIW
> Erick
>
> On Mon, Jul 18, 2011 at 10:35 AM, Siddhesh Shirode
>  wrote:
>> Hi Everyone,
>>
>> I would like to know the best practices or  best tools for Calculating QPS  
>> in Solr. Thanks.
>>
>> Thanks,
>> SIDDHESH SHIRODE
>> Technical Consultant
>>
>> M +1 240 274 5183
>>
>> SEARCH TECHNOLOGIES
>> THE EXPERT IN THE SEARCH SPACE
>> www.searchtechnologies.com
>>
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-18 Thread Nagendra Nagarajayya
Thanks Pravesh! But this is NRT related, so commit is not called to 
update a document. The documents added are available for searches 
immediately after the update, and a commit is not needed. A commit may be 
scheduled about once every 15 mins, or as needed.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org  
http://rankingalgorithm.tgels.org  



On 7/17/2011 10:12 PM, pravesh wrote:

commit would be the safest way for making sure the deleted content doesn't
show up.

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178179.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: NRT and commit behavior

2011-07-18 Thread Nagendra Nagarajayya
One of the users of NRT reported that their system was freezing with 
commits at about 1.5 million docs due to the frequency of commits. With 
NRT (Solr with RankingAlgorithm) document-update performance and a 
commit interval of about 15 mins, they no longer have the freeze problem.


Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org  
http://rankingalgorithm.tgels.org  



On 7/18/2011 7:53 AM, Nicholas Chase wrote:
Very glad to hear that NRT is finally here!  But my question is this: 
will things still come to a standstill during a commit?


Thanks...

  Nick






Re: [Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-18 Thread Nagendra Nagarajayya
Thanks Mark! I made the earlier implementation of NRT with 1.4.1 
available to Solr through a JIRA issue:


 https://issues.apache.org/jira/browse/SOLR-2568
(I had made available the implementation details through a paper 
published at 
http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf which 
includes the source, modifications, etc.)


I plan to make available the current implementation of NRT with Solr 
3.2/3.3 and RankingAlgorithm as a patch. This implementation has very 
high performance (10,000 docs/sec) and in fact on my system is faster 
than the normal update/commit.
There are some issues not yet resolved as to when to invalidate/update 
the cache, but this does not seem to be a very easy problem.


Regarding the Lucene list: I thought both Solr and Lucene were now 
shared projects. I can add a message to my emails to make it clear that 
Solr with RankingAlgorithm is an external implementation. I also plan to 
file an RFE to allow plugin/API support for external text search 
libraries in Solr.


- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org


On 7/18/2011 9:45 AM, Mark Miller wrote:

Hey Nagendra - I don't mind seeing these external project announces here 
(though you might keep Solr related announces off the Lucene user list), but 
please word these announces so that users are not confused that this is an 
Apache release, and that it is an external project built on top of Apache Solr.

Thanks,

- Mark

On Jul 18, 2011, at 10:43 AM, Nagendra Nagarajayya wrote:


Hi!

I would like to announce the availability of Solr 3.3 with RankingAlgorithm and 
Near Real Time (NRT) search capability now. The NRT performance is very high, 
10,000 documents/sec with the MBArtists 390k index. The NRT functionality 
allows you to add documents without the IndexSearchers being closed or caches 
being cleared. A commit is also not needed with the document update. Searches 
can run concurrently with document updates. No changes are needed except for 
enabling the NRT through solrconfig.xml.

RankingAlgorithm query performance is now 3x faster than before and is 
exposed as the Lucene API. This release also adds support for the last 
document with a unique id to be searchable and visible in search results in 
case of multiple updates of the document.

I have a wiki page that describes NRT performance in detail and can be accessed 
from here:

http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x

You can download Solr 3.3 with RankingAlgorithm (NRT version) from here:

http://solr-ra.tgels.org

I would like to invite you to give this version a try as the performance is 
very high.

Regards,

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org




- Mark Miller
lucidimagination.com


Re: Dismax RequestHandler adn Fuzzy Search

2011-07-18 Thread 虞冰
maybe you can read http://wiki.apache.org/solr/DisMaxQParserPlugin

2011/7/19 Ahmet Arslan 

> > >
> > > q=test~0.8
> > >
> >
> > do you add ~0.8 in the query (http) or in the
> > solrconfig.xml (like <str name="q">field~0.8</str>)?
>
> mostly in the http.
>
> > is "test" the fieldName or a search string?
>
> search string. Do you have another use case?
>
>
>


searching for google+

2011-07-18 Thread Jason Toy
How does one search for the term "google+" with solr? I noticed on twitter I
can search for google+: http://search.twitter.com/search?q=google%2B (which
uses lucene, not sure about solr) but searching on my copy of solr, I can't
search for google+

-- 
- sent from my mobile
6176064373


Re:searching for google+

2011-07-18 Thread 方振鹏
"google+" or google\+
 
--
 Best wishes 
  
 James Bond Fang
  
 方 振鹏 
  
 Dept. Software Engineering
  
 Xiamen University



-- Original --
From: "Jason Toy"; 
Date: July 19, 2011 (Tuesday) 10:28 AM
To: "solr-user"; 
Subject: searching for google+

 
How does one search for the term "google+" with solr? I noticed on twitter I
can search for google+: http://search.twitter.com/search?q=google%2B (which
uses lucene, not sure about solr) but searching on my copy of solr, I can't
search for google+

-- 
- sent from my mobile
6176064373

How could I monitor solr cache

2011-07-18 Thread kun xiong
Hi,

I am wondering how I could get the Solr cache's running status. I know there is
a JMX MBean containing this information.

Just want to know what tool or method you make use of to monitor the cache,
in order to enhance performance or detect issues.

Thanks a lot

Kun


Re: SOLR Shard failover Query

2011-07-18 Thread pravesh
Thanx Shawn,

>When I first set things up, I was using SOLR-1537 on Solr 1.5-dev.  By
>the time I went into production, I had abandoned that idea and rolled
>out a stock 1.4.1 index with two complete server chains, each with 7
>shards.

  So, both chains were configured in the cluster in a load-balanced manner?

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-Shard-failover-Query-tp3178175p3181400.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How could I monitor solr cache

2011-07-18 Thread pravesh
This might be of some help:

http://wiki.apache.org/solr/SolrJmx 

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-could-I-monitor-solr-cache-tp3181317p3181407.html
Sent from the Solr - User mailing list archive at Nabble.com.
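
(A hedged Java sketch of reading the cache MBeans over JMX; it assumes
<jmx/> is enabled in solrconfig.xml, the JVM exposes remote JMX on port
9999, and that the attribute names mirror the stats shown on the admin
stats page. jconsole pointed at the same port shows the identical MBeans
interactively.)

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CacheStats {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = jmxc.getMBeanServerConnection();
        // Solr registers its MBeans under the "solr" domain by default.
        Set<ObjectName> names = conn.queryNames(new ObjectName("solr*:*"), null);
        for (ObjectName name : names) {
            if (!name.getCanonicalName().contains("Cache")) continue;
            System.out.println(name);
            System.out.println("  hitratio  = " + conn.getAttribute(name, "hitratio"));
            System.out.println("  lookups   = " + conn.getAttribute(name, "lookups"));
            System.out.println("  evictions = " + conn.getAttribute(name, "evictions"));
        }
        jmxc.close();
    }
}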


Re: defType argument weirdness

2011-07-18 Thread William Bell
dismax does not work with q=*:*


 defType=dismax&q=*:*  no hits

You need to switch this to:


 defType=dismax&q.alt=*:*      hits
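
(The same fix in SolrJ, as a hedged sketch:)

SolrQuery q = new SolrQuery();
q.set("defType", "dismax");
q.set("q.alt", "*:*");   // parsed by the standard parser when q is empty, so it can match all docs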

On Mon, Jul 18, 2011 at 8:44 PM, Erick Erickson  wrote:
> What are qf_dismax and pf_dismax? They are meaningless to
> Solr. Try adding &debugQuery=on to your URL and you'll
> see the parsed query, which helps a lot here
>
> If you change these to the proper dismax values (qf and pf)
> you'll get better results. As it is, I think you'll see output like:
>
> +() ()
>
> showing that your query isn't actually going against
> any fields
>
> Best
> Erick
>
> On Mon, Jul 18, 2011 at 7:15 PM, Naomi Dushay  wrote:
>> I found a weird behavior with the Solr  defType argument, perhaps with
>> respect to default queries?
>>
>>  defType=dismax&q=*:*      no hits
>>
>>  q={!defType=dismax}*:*     hits
>>
>>  defType=dismax         hits
>>
>>
>> Here is the request handler, which I explicitly indicate:
>>
>> <requestHandler name="search" class="solr.SearchHandler" default="true">
>>        <lst name="defaults">
>>                <str name="defType">lucene</str>
>>
>>                <str name="…">has_model_s</str>
>>                <str name="q.op">AND</str>
>>
>>                <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
>>                <str name="q.alt">*:*</str>
>>                <str name="qf_dismax">id^0.8 id_t^0.8 title_t^0.3 mods_t^0.2 text</str>
>>                <str name="pf_dismax">id^0.9  id_t^0.9 title_t^0.5 mods_t^0.2 text</str>
>>                <int name="ps">100</int>
>>                <float name="tie">0.01</float>
>>        </lst>
>> </requestHandler>
>>
>>
>> Solr Specification Version: 1.4.0
>> Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06
>> 12:33:40
>> Lucene Specification Version: 2.9.1
>> Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25
>>
>> - Naomi
>>
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: How could I monitor solr cache

2011-07-18 Thread Ahmet Arslan

> I am wondering how I could get the Solr cache's running status. I
> know there is a
> JMX MBean containing this information.
> 
> Just want to know what tool or method you make use of to
> monitor the cache,
> in order to enhance performance or detect issues.

You might find this interesting:

http://sematext.com/spm/solr-performance-monitoring/index.html
http://sematext.com/spm/index.html


solr slave's performance issue after replicate the optimized index

2011-07-18 Thread 虞冰
Hi all,

I have a performance issue~

I do a optimize on solr master every night.
But since about a month ago, every time the slaves fetch the newly
optimized index, system CPU usage rises from 0.3-0.5% to 7-10%
(daily average), and the servers' load average also becomes more than
twice normal. The load average remains high even if I restart
Tomcat.

After many days of testing, I found four ways to bring the slaves back
to a normal load average:

1. reboot linux server
2. shutdown tomcat, manually rm the index data and do replicate again
3. shutdown tomcat, copy indexdata as indexdata2, rm indexdata, mv
indexdata2 to indexdata, start tomcat
4. shutdown tomcat, use C to alloc 20G memory and free it, start server.

I can only guess it has some relationship with the memory or the system cache.

Is this a Solr bug, a Lucene bug, or just a system issue?


My System:

CentOS 5.6 x64   Tomcat 7.0   JRockit 6
Intel E5620 *2   24GB DDR3
Solr 3.1
Index size 7G (after optimize) / 8G (before optimize)


Many thanks~