Re: how to get all the docIds in the search result?

2009-07-23 Thread Toby Cole

That's the only way I can think of doing it through Solr.
What is the configuration of the handler you're calling? It could be  
that highlighting or faceting are turned on and slowing down your query.
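Failing that, if you really do need every id, it may be quicker to page
through the results in batches rather than asking for Integer.MAX_VALUE
rows in one go. A rough SolrJ sketch (the page size and the 'server'
variable are assumptions, not from this thread):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

List<String> fetchAllIds(SolrServer server) throws SolrServerException {
    SolrQuery query = new SolrQuery("*:*");
    query.setFields("id");           // only pull back the uniqueKey
    query.setRows(1000);             // page size, tune to taste
    List<String> ids = new ArrayList<String>();
    int start = 0;
    while (true) {
        query.setStart(start);
        QueryResponse rsp = server.query(query);
        SolrDocumentList page = rsp.getResults();
        for (SolrDocument doc : page) {
            ids.add((String) doc.getFieldValue("id"));
        }
        start += page.size();
        if (page.isEmpty() || start >= page.getNumFound()) {
            break;                   // walked past numFound, no more pages
        }
    }
    return ids;
}

Deep paging still slows down as 'start' grows, but it avoids building one
enormous response in memory.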

Toby.

On 23 Jul 2009, at 10:35, shb wrote:


I have tried the following code:
query.setRows(Integer.MAX_VALUE);
query.setFields("id");

when it returns 1,000,000 records, it takes about 22s.
This is very slow. Is there any other way?


2009/7/23 Toby Cole 

Have you tried limiting the fields that you're requesting to just  
the ID?

Something along the line of:

query.setRows(Integer.MAX_VALUE);
query.setFields("id");

Might speed the query up a little.


On 23 Jul 2009, at 09:11, shb wrote:

Here id is indeed the uniqueKey of a document.

I want to get all the ids for some other usage.


2009/7/23 Shalin Shekhar Mangar 

On Thu, Jul 23, 2009 at 1:09 PM, shb  wrote:


if I use query.setRows(Integer.MAX_VALUE);

the query will become very slow, because the searcher will
fetch the field values from the index for all the returned
documents.

So if I set query.setRows(10), is there any other ways to
get all the ids? thanks


You should fetch as many rows as you need and not more. Why do you need all
the ids? I'm assuming that by id you mean the uniqueKey of a document.


--
Regards,
Shalin Shekhar Mangar.



--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/





solr-user@lucene.apache.org

2009-07-30 Thread Toby Cole

Any chance of getting that stack trace as more than one line? :)
Also, where are you posting your documents from? (e.g. Java, PHP,  
command line etc).


It sounds like you're not using 'entities' for your '&' characters
(ampersands) in your XML.
These should be converted to "&amp;". This should look familiar if
you've ever written any HTML.
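For example (an illustrative field, not from your documents), this is not
well-formed XML:

<field name="title">Fish & Chips</field>

whereas this is:

<field name="title">Fish &amp; Chips</field>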



On 30 Jul 2009, at 09:44, Jörg Agatz wrote:


Good morning Solr :-) it's morning in Germany!

I have a problem with the indexing...

I often get an error.

I think it is because the character "&" appears in the XML.
I need the character; what can I do?


SimplePostTool: FATAL: Solr returned an error:
com.ctc.wstx.exc.WstxLazyException: Unexpected character ' ' (code 32; missing name?) at [row,col {unknown-source}]: [1,465]
com.ctc.wstx.exc.WstxLazyException: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character ' ' (code 32; missing name?) at [row,col {unknown-source}]: [1,465]
	at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
	at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:729)
	at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3659)
	at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
	at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:278)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.Ht...




--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



solr-user@lucene.apache.org

2009-07-30 Thread Toby Cole

On 30 Jul 2009, at 11:17, Jörg Agatz wrote:

It sounds like you're not using 'entities' for your '&' characters
(ampersands) in your XML.
These should be converted to "&amp;". This should look familiar if you've
ever written any HTML.

I don't understand this.

I must change every & to &amp;?



Yes, '&' characters aren't allowed in XML unless they are either in a  
CDATA section or part of an 'entity'.

A good place to read up on this is: 
http://www.xml.com/pub/a/2001/01/31/qanda.html

In short, replace all your & with &amp;

--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Delete solr data from disk space

2009-08-04 Thread Toby Cole

Hi Anish,
Have you optimized your index?
When you delete documents in lucene they are simply marked as
'deleted'; they aren't physically removed from the disk.
To get the disk space back you must run an optimize, which re-writes
the index out to disk without the deleted documents, then deletes the
original.
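An optimize can be sent as a plain update command over HTTP, along these
lines (host and port assumed to be the defaults):

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize/>'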


Toby

On 4 Aug 2009, at 14:41, Ashish Kumar Srivastava wrote:



Hi ,


Sorry!! But this solution will not work, because I deleted the data with a
certain query.
How can I know which files should be deleted? I can't delete the whole
data directory.




Markus Jelsma - Buyways B.V. wrote:


Hello,


A rigorous but quite effective method is manually deleting the files in
your SOLR_HOME/data directory and reindexing the documents you want. This
will surely free some disk space.


Cheers,

-
Markus Jelsma          Buyways B.V.            Tel. 050-3118123
Technisch Architect    Friesestraatweg 215c    Fax. 050-3118124
http://www.buyways.nl  9743 AD Groningen       KvK  01074105


On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote:


I am facing a problem deleting Solr data from disk space.
I had 80GB of Solr data. I deleted 30% of this data using a query in the
solr-php client and committed.
The deleted data is no longer visible from the Solr UI, but the disk space
used by the Solr data is still 80GB.
Please reply if you have any solution to free the disk space after
deleting some Solr data.

Thanks in advance.





--
View this message in context: 
http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808883.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Proximity Search

2009-08-18 Thread Toby Cole

See the Lucene query parser syntax documentation:

http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Proximity%20Searches

basically... "shell petroleum"~10 should do the trick (if you're using
the standard request handler; I can't remember if dismax supports
proximity).
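For example, against the standard handler something like this should work
(host, port and field name are assumptions):

http://localhost:8983/solr/select?q=content:%22shell%20petroleum%22~10

or in SolrJ:

query.setQuery("content:\"shell petroleum\"~10");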


On 18 Aug 2009, at 13:28, Ninad Raut wrote:


Hi,
I want to count the words between two significant words like "shell" and
"petroleum". Or rather, I want to write a query to find all the documents
where the content has "shell" and "petroleum" in close proximity, with
fewer than 10 words between them.
Can such queries be created in Solr?
Regards,
Ninad Raut.


--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Can I search for a term in any field or a list of fields?

2009-08-18 Thread Toby Cole
I would consider using the dismax query handler. This allows you to  
send a list of keywords or phrases along with the fields to search over.

e.g., you could use ?qt=dismax&q=foo&qf=title+text+keywords+concept

More details here: http://wiki.apache.org/solr/DisMaxRequestHandler
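In SolrJ the same request could be put together like this (parameter
values copied from the URL above; 'server' is assumed):

SolrQuery query = new SolrQuery("foo");
query.set("qt", "dismax");
query.set("qf", "title text keywords concept");
QueryResponse rsp = server.query(query);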


On 18 Aug 2009, at 15:56, Paul Tomblin wrote:


So if I want to make it so that the default search always searches
three specific fields, I can make another field multi-valued that they
are all copied into?

On Tue, Aug 18, 2009 at 10:46 AM, Marco Westermann  
wrote:

I would say you should use the copyField tag in the schema, e.g. as in
the snippet below.

The text field has to be defined as multiValued="true". When you now do an
unqualified search, it will search every field which is copied to the
text field.
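A copyField setup along these lines would do it (source field names are
illustrative):

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="text"/>
<copyField source="keywords" dest="text"/>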




--
http://www.linkedin.com/in/paultomblin


--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Status of Spelt integration

2009-11-30 Thread Toby Cole

Hi Andrew,
	We ended up abandoning the Spelt integration as the built-in Solr
spellchecking improved so much during our project. Also, if you did go
the route of using Spelt, I'd implement it as a spellcheck plugin
(which didn't exist as a concept when we started trying to shoehorn
Spelt into Solr).

Regards, Toby.

On 30 Nov 2009, at 11:29, Andrey Klochkov wrote:


Hi all

I searched through the mail-list archives and saw that some time ago Toby
Cole was going to integrate a spellchecker named Spelt into Solr. Does
anyone know what the status of this is? Has anyone tried to use it with
Solr? Does it make sense to try it instead of the standard spell checker?

Some links on the subject:
http://markmail.org/message/cqt4qtzzwyceltqu#query:+page:1+mid:cqt4qtzzwyceltqu+state:results
http://markmail.org/search/?q=spelt#query:spelt+page:1+mid:krzofzojhg7hmms7+state:results
http://groups.google.com/group/spelt

--
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics


Re: Status of Spelt integration

2009-12-07 Thread Toby Cole

I'm pretty sure this isn't a Solr related question.
Have you tried asking on the eGroupware mailing lists? 
http://sourceforge.net/mail/?group_id=78745
Toby.

On 7 Dec 2009, at 08:52, freerk55 wrote:



The standard spell checker of Thunderbird works in eGroupware.
But not in Felamimail!!?? Why not?
How can I get it working as it does in the rest of eGroupware?

Freerk Jongsma



Toby Cole-2 wrote:


Hi Andrew,
We ended up abandoning the Spelt integration as the built-in Solr
spellchecking improved so much during our project. Also, if you did go
the route of using Spelt, I'd implement it as a spellcheck plugin
(which didn't exist as a concept when we started trying to shoehorn
Spelt into Solr).
Regards, Toby.

On 30 Nov 2009, at 11:29, Andrey Klochkov wrote:


Hi all

I searched through the mail-list archives and saw that some time ago
Toby Cole was going to integrate a spellchecker named Spelt into Solr.
Does anyone know what the status of this is? Has anyone tried to use it
with Solr? Does it make sense to try it instead of the standard spell
checker?

Some links on the subject:
http://markmail.org/message/cqt4qtzzwyceltqu#query:+page:1+mid:cqt4qtzzwyceltqu+state:results
http://markmail.org/search/?q=spelt#query:spelt+page:1+mid:krzofzojhg7hmms7+state:results
http://groups.google.com/group/spelt

--
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics





--
View this message in context: 
http://old.nabble.com/Status-of-Spelt-integration-tp26573196p26674324.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Field Collapsing - disable cache

2009-12-22 Thread Toby Cole
If you take out the fieldCollapsing/fieldCollapseCache element in your
config, the fieldcollapse component will not use a cache.



From http://wiki.apache.org/solr/FieldCollapsing#line-63
	"If the field collapse cache is not configured then the field  
collapse logic will not be cached."
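For reference, the element in question lives in solrconfig.xml and looks
something like this (cache sizes are illustrative):

<fieldCollapsing>
  <fieldCollapseCache
      class="solr.FastLRUCache"
      size="512"
      initialSize="512"
      autowarmCount="128"/>
</fieldCollapsing>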


Regards, Toby.

On 22 Dec 2009, at 10:56, r...@intelcompute.com wrote:


my solrconfig can be seen at http://www.intelcompute.com/solrconfig.xml

On Tue 22/12/09 10:51, r...@intelcompute.com wrote:

Is it possible to disable the field collapsing cache?  I'm trying to
perform some speed tests, and have managed to comment out the filter,
queryResult, and document caches successfully.

on 1.5
...
collapse

facet

tvComponent
...

-
Message sent via Atmail Open - http://atmail.org/


--
Toby Cole
Senior Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Field Collapsing - disable cache

2009-12-22 Thread Toby Cole
Which elements did you comment out? It could be the case that you need  
to get rid of the entire fieldCollapsing element, not just the  
fieldCollapsingCache element.

(Disclaimer: I've not used field collapsing in anger before :)
Toby.

On 22 Dec 2009, at 11:09, r...@intelcompute.com wrote:


That's what I assumed, but I'm getting the following error with it
commented out
MESSAGE null java.lang.NullPointerException
	at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.createDocumentCollapseResult(AbstractDocumentCollapser.java:276)
	at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.executeCollapse(AbstractDocumentCollapser.java:249)
	at org.apache.solr.search.fieldcollapse.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:172)
	at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:173)
	at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:127)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
	at java.lang.Thread.run(Thread.java:636)
On Tue 22/12/09 11:02, Toby Cole wrote:

If you take out the fieldCollapsing/fieldCollapseCache element in your
config the fieldcollapse component will not use a cache.

From http://wiki.apache.org/solr/FieldCollapsing#line-63
"If the field collapse cache is not configured then the field
collapse logic will not be cached."

Regards, Toby.

On 22 Dec 2009, at 10:56, r...@intelcompute.com wrote:

my solrconfig can be seen at http://www.intelcompute.com/solrconfig.xml

On Tue 22/12/09 10:51, r...@intelcompute.com wrote:

Is it possible to disable the field collapsing cache?  I'm trying to
perform some speed tests, and have managed to comment out the filter,
queryResult, and document caches successfully.

on 1.5
...
collapse

facet

tvComponent
...

-
Message sent via Atmail Open - http://atmail.org/

--

Toby Cole
Senior Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/

Deadlock with DirectUpdateHandler2

2008-11-18 Thread Toby Cole
Has anyone else experienced a deadlock when the DirectUpdateHandler2
does an autocommit?
I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21),
and quite often when I'm loading data the server (tomcat 6) gets stuck at
line 469 of DirectUpdateHandler2:


  // Check if there is a commit already scheduled for longer then this time
  if( pending != null &&
      pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )

Anyone got any enlightening tips?
Cheers,

Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: [EMAIL PROTECTED]
W: www.semantico.com



Re: Deadlock with DirectUpdateHandler2

2008-11-18 Thread Toby Cole

On 18 Nov 2008, at 20:18, Mark Miller wrote:


Mike Klaas wrote:



autoCommitCount is written in a CommitTracker.synchronized block  
only.  It is read to print stats in an unsynchronized fashion,  
which perhaps could be fixed, though I can't see how it could cause  
a problem


lastAddedTime is only written in a call path within a  
DirectUpdateHandler2.synchronized block.  It is only read in a  
CommitTracker.synchronized block.  It could read the wrong value,  
but I also don't see this causing a problem (a commit might fail to  
be scheduled).  This could probably also be improved, but doesn't  
seem important.
Right. I don't see these as causing a deadlock either, but whatever
happens, it's pretty much JVM-undefined, right? Hence 'who
knows' (I'll go with pretty doubtful). I am not so sure it's safe
to read a value from an unsynced method whether you care about the
result or not, though. It's probably safe for atomic types and volatiles,
but I'm fairly sure you're playing with fire doing read/write in and
out of sync. I don't think it's just about stale values. But then
again, it probably works 99.9% of the time or something.


pending seems to be the issue.  As long as commits are only
triggered by autocommit, there is no issue, as manipulation of
pending is always performed inside CommitTracker.synchronized.  But
didCommit()/didRollback() could be called via a manual commit, and
pending is directly manipulated during DUH2.close().  I'm having
trouble coming up with a plausible deadlock scenario, but this
needs to be fixed.  It isn't as easy as synchronizing
didCommit/didRollback, though--this would introduce definite deadlock
scenarios.


Mark, is there any chance you could post the thread dump for the  
deadlocked process?  Do you issue manual commits during insertion?

Toby reported it. Thread dump Toby?


-Mike


I'll try and post a thread dump when I get to work, can't remote in  
from here.
I don't mind helping out with the fix, I've been getting to know  
solr's internals quite intimately recently after writing a few  
handlers/components for internal projects.


T


Re: disappearing index

2008-12-03 Thread Toby Cole

Could be that all your documents have not yet been committed.
Have you tried running a commit?
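If you don't have a client handy, a commit can be issued directly over
HTTP, along these lines (default example URL assumed):

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<commit/>'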

On 3 Dec 2008, at 15:00, Justin wrote:


I built up two indexes using a multicore configuration,
one containing 52,000+ documents and the other over 10 million; the entire
indexing process showed no errors.

The server crashed overnight, well after the indexing had completed, and
now no documents are reported for either index.

This despite the fact that the cores both have huge /data folders (one is
1.5GB, the other is 8.5GB).

Any ideas?


Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: [EMAIL PROTECTED]
W: www.semantico.com



Re: Nightly build - 2008-12-17.tgz - build error - java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main

2008-12-17 Thread Toby Cole
I came across this too earlier; I just deleted the contrib/javascript
directory.
Of course, if you need the javascript library then you'll have to get it
building.


Sorry, probably not that helpful. :)
Toby.

On 17 Dec 2008, at 17:03, Kay Kay wrote:


I downloaded the latest .tgz and ran

$ ant dist


docs:

    [mkdir] Created dir: /opt/src/apache-solr-nightly/contrib/javascript/dist/doc
     [java] Exception in thread "main" java.lang.NoClassDefFoundError: org/mozilla/javascript/tools/shell/Main
     [java] 	at JsRun.main(Unknown Source)
     [java] Caused by: java.lang.ClassNotFoundException: org.mozilla.javascript.tools.shell.Main
     [java] 	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
     [java] 	at java.security.AccessController.doPrivileged(Native Method)
     [java] 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
     [java] 	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     [java] 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     [java] 	at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
     [java] 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
     [java] 	... 1 more

BUILD FAILED
/opt/src/apache-solr-nightly/common-build.xml:335: The following error occurred while executing this line:
/opt/src/apache-solr-nightly/common-build.xml:212: The following error occurred while executing this line:
/opt/src/apache-solr-nightly/contrib/javascript/build.xml:74: Java returned: 1



and came across the above-mentioned error.

The class seems to be from the rhino (mozilla js) library. Is it
supposed to be packaged by default, or is there a license restriction
that prevents it from being so?




Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com



Re: How to select *actual* match from a multi-valued field

2009-01-20 Thread Toby Cole
We came across this problem; unfortunately we gave up and did our
hit-highlighting for multi-valued fields on the frontend. :-/
One approach would be to extend solr to return every value of a
multi-valued field in the highlighting, regardless of whether that
particular value matched.
Just an idea, don't know if it's feasible or not. If anyone can point
me in the right direction I could probably bash together a plugin and
some tests.

Toby.

On 20 Jan 2009, at 16:31, Feak, Todd wrote:


Anyone that can shed some insight?

-Todd

-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Friday, January 16, 2009 9:55 AM
To: solr-user@lucene.apache.org
Subject: How to select *actual* match from a multi-valued field

At a high level, I'm trying to do some more intelligent searching  
using

an app that will send multiple queries to Solr. My current issue is
around multi-valued fields and determining which entry actually
generated the "hit" for a particular query.



For example, let's say that I have a multi-valued field containing
people's names, associated with the document (trying to be non-specific
on purpose). In one document, I have the following names:
Jane Smith, Bob Smith, Roger Smith, Jane Doe. If the user performs a
search for Bob Smith, this document is returned. What I want to know is
that this document was returned because of "Bob Smith", not because of
Jane or Roger. I've tried using the highlighting settings. They do
provide some help, as the Jane Doe entry doesn't come back highlighted,
but both Jane and Roger do. I've tried using hl.requireFieldMatch, but
that seems to pertain only to fields, not entries within a multi-valued
field.



Using Solr, is there a way to get the information I am looking for?
Specifically, that "Bob Smith" is the value in the multi-valued field
that triggered the hit?



-Todd Feak



Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com



Re: what crawler do you use for Solr indexing?

2009-03-06 Thread Toby Cole

Hi Tony,
	Strangely I started looking into the Solr/Nutch integration yesterday  
so I might be able to help :)


The documentation for it is very sparse, but the trunk of nutch does  
have the solr integration committed.

If I remember correctly, what I had to do was...

I went through one of the nutch setup guides and set it up as if I  
wasn't going to use solr. (Can't remember which one, sorry).


Copy the crawl script from http://www.foofactory.fi/files/nutch-solr/crawl.sh
into my nutch directory.

I was running this under the soy-latte JVM on OSX, and I had to modify
the crawler a little to pick up filenames instead of permissions strings.
This line was changed (note the 'cut' command):

	SEGMENT=`bin/hadoop dfs -ls $BASEDIR/segments|grep $BASEDIR|cut -d\  -f17|sort|tail -1`

I also changed the second-to-last line to match the required
parameters for the new solr indexer:

	bin/nutch org.apache.nutch.indexer.solr.SolrIndexer http://localhost:8983/solr/ $BASEDIR/crawldb $BASEDIR/linkdb $SEGMENT


Copy the schema.xml from the nutch config directory into a fresh solr
install and start it up.
Run crawl.sh, and you should end up with content in your solr instance.


I probably wont' be able to answer many nutch-related questions, but  
that's how I managed to get it up and running.


Toby.

On 6 Mar 2009, at 11:27, Andrzej Bialecki wrote:


Tony Wang wrote:

Hi Hoss,
But I cannot find documents about the integration of Nutch and Solr  
in

anywhere. Could you give me some clue? thanks


Tony, I suggest that you follow Hoss's advice and ask these  
questions on nutch-user. This integration is built into Nutch, and  
not Solr, so it's less likely that people on this list know what you  
are talking about.


This integration is quite fresh, too, so there are almost no docs  
except on the mailing list. Eventually someone is going to create  
some docs, and if you keep asking questions on nutch-user you will  
contribute to the creation of such docs ;)



--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com



Re: Solr: ERRORs at Startup

2009-03-13 Thread Toby Cole
10:51:22,564 ERROR [STDERR] Mar 13, 2009 10:51:22 AM org.apache.solr.core.SolrCore parseListener
INFO: Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=fast_warm,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]}


What am I missing? :-(

Any idea?

thanks in advance.

Giovanni




Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: Storing "map" in Field

2009-03-13 Thread Toby Cole
I don't think anything _quite_ like that exists, however you could use
wildcard (dynamic) fields to achieve pretty much the same thing.

You could use a post like this (field names are illustrative):

<add>
  <doc>
    <field name="id">SKU001</field>
    <field name="name">A Sample Product</field>
    <field name="price_list1">119.99</field>
    <field name="price_list2">109.99</field>
  </doc>
</add>

if you have a field definition in your schema.xml like:

<dynamicField name="price_*" type="float" indexed="true" stored="true"/>


Regards, Toby.

On 13 Mar 2009, at 14:01, Jeff Crowder wrote:


All,

I'm working with the sample schema, and have a scenario where I would like
to store multiple prices in a "map" of some sort. This would be used for a
scenario where a single "product" has different "prices" based on a price
list. For instance:

  SKU001
  A Sample Product
  119.99
  109.99

Is something like this possible?

Regards,
-Jeff


Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: why is query so slow

2009-03-17 Thread Toby Cole

Peter,
	If possible try running a 1.4-snapshot of Solr; the faceting
improvements are quite remarkable.
However, if you can't run unreleased code, it might be an idea to try
reducing the number of unique terms (try indexing surnames only?).

Toby.

On 17 Mar 2009, at 10:01, pcurila wrote:



I am using 1.3


How many terms are in the wasCreatedBy_fct field?   How is that field
and its type configured?

The field contains author names and there are lots of them.

Here is the type configuration:

<fieldType name="..." class="..." positionIncrementGap="100">
  ...
</fieldType>

<field name="wasCreatedBy_fct" type="..." ... stored="true" multiValued="true"/>




--
View this message in context: 
http://www.nabble.com/why-is-query-so-slow-tp22554340p22555842.html
Sent from the Solr - User mailing list archive at Nabble.com.



Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: Index Creation Exception in solr

2009-03-18 Thread Toby Cole
If you're using a recent 1.4-snapshot you should be able to do a  
rollback: https://issues.apache.org/jira/browse/SOLR-670
Otherwise, if you have unique IDs in your index, you can just post new  
documents over the top of the old ones then commit.
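With a recent enough snapshot the rollback is just another update command,
so something like this should work (default example URL assumed):

curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<rollback/>'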

Toby.

On 18 Mar 2009, at 10:19, dabboo wrote:



But if I already have some indexes in the index folder, then these old
indexes will also get deleted. Is there any way to roll back the
operation?
will also get deleted. Is there any way to roll back the operation.



Shalin Shekhar Mangar wrote:


On Wed, Mar 18, 2009 at 3:15 PM, dabboo  wrote:



Hi,

I am creating indexes in Solr and facing an unusual issue.

I am creating 5 indexes and the XML file of the 4th index is malformed.
So, while creating the indexes it properly submits indexes #1, 2 & 3 and
throws an exception after submission of index 4.



I think you mean documents not indexes. Each document goes into the
Lucene/Solr index.




Now, if I look for documents #1, 2 & 3, they don't show up, which I think
is happening because the operation is not committed yet. But these
documents must be lying somewhere temporarily in Solr and I am not able
to delete them.



Just delete the 

--
View this message in context: 
http://www.nabble.com/Index-Creation-Exception-in-solr-tp22575618p22576093.html
Sent from the Solr - User mailing list archive at Nabble.com.



Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: Problem encoding ':' char in a solr query

2009-03-18 Thread Toby Cole

You'll need to escape the colon with a backslash, e.g.
	fileAbsolutePath:file\:///Volumes/spare/ts/ford/schema/data/news/fdw2008/jn71796.xml


see the lucene query parser syntax page:

http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Escaping%20Special%20Characters
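Note that in the URL the backslash itself also needs percent-encoding
(%5C), so the query from the original message would become something like:

http://localhost:8080/apache-solr-1.4-dev/select?q=fileAbsolutePath:file%5C%3A///Volumes/spare/ts/ford/schema/data/news/fdw2008/jn71796.xml&wt=xml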

Toby.

On 18 Mar 2009, at 11:28, Fergus McMenemie wrote:


Hello

I have a solr field:-

   <field name="fileAbsolutePath" ... stored="true" multiValued="false"/>


which an unrelated query reveals is populated with:-


file:///Volumes/spare/ts/ford/schema/data/news/fdw2008/jn71796.xml


however when I try and query for that exact document explicitly:-

http://localhost:8080/apache-solr-1.4-dev/select?q=fileAbsolutePath:file%3a///Volumes/spare/ts/ford/schema/data/news/fdw2008/jn71796.xml&wt=xml

it fails.

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse
'fileAbsolutePath:file:///Volumes/spare/ts/ford/schema/data/news/fdw2008/jn71796.xml':
Encountered " ":" ": "" at line 1, column 21. Was expecting one of:
... "+" ... "-" ... "(" ... "*" ... "^" ... "[" ... "{" ...


My encoding did not work! Help!
--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Toby Cole
Software Engineer

Semantico
E: toby.c...@semantico.com
W: www.semantico.com



Re: UK Solr users meeting?

2009-05-18 Thread Toby Cole
I know of a few people who'd be interested, we've got quite a few  
projects using Solr down here in Brighton.


On 14 May 2009, at 10:41, Fergus McMenemie wrote:

I was wondering if there is an interest in a UK (South East) solr  
user

group meeting

Please let me know if you are interested.  I am happy to organize.

Regards,

Colin


Yes Very interested. I am in lincolnshire.
--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.c...@semantico.com
W: www.semantico.com



Re: 1.4 Replication

2009-05-27 Thread Toby Cole
I've not figured out a way to use basic auth with replication. We
ended up using IP-based auth; it shouldn't be too tricky to add
basic-auth support as, IIRC, the replication is based on the commons
httpclient library.



On 27 May 2009, at 15:17, Matthew Gregg wrote:

On Wed, 2009-05-27 at 19:06 +0530, Noble Paul നോബിള്‍  
नोब्ळ् wrote:
On Wed, May 27, 2009 at 6:48 PM, Matthew Gregg > wrote:
Does replication in 1.4 support passing credentials/basic auth?   
If not

what is the best option to protect replication?

do you mean protecting the url /replication ?
Yes I would like to put /replication behind basic auth, which I can  
do,

but replication fails.  I naively tried the obvious
http://user:p...@host/replication, but that fails.



ideally Solr is expected to run in an unprotected environment. If you
wish to introduce some security it has to be built by you.



I guess you meant Solr is expected to run in a "protected"  
environment?

It's pretty easy to put up a basic auth in front of Solr, but the
replication infra. in 1.4 doesn't seem to support it. Or does it,  
and I

just don't know how?

--
Matthew Gregg 



Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
W: www.semantico.com



Re: support for Payload Feature of lucene in solr

2009-07-14 Thread Toby Cole
I am new to Solr and trying to explore payloads, but I haven't had any
success so far. In one of the threads Grant mentioned that Solr has a
DelimitedPayloadTokenFilter which can store payloads at index time. But to
search on it we would need an implementation of BoostingTermQuery extending
SpanTermQuery, and possibly other things as well.


This looks about the same as the approach I'm about to use for our  
research.
We're looking into using payloads to improve relevance for stemmed  
terms, using the payload to store the unstemmed term, boosting the  
term if there's an exact match with the payloads.
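On the index side, the DelimitedPayloadTokenFilter mentioned above can be
wired into a field type along these lines (a sketch; the delimiter and
encoder values are assumptions):

<fieldType name="payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- whatever follows the '|' on each token is stored as that token's payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="identity"/>
  </analyzer>
</fieldType>

The query side (something BoostingTermQuery-based) is the part that still
needs custom code.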



My questions:
1. What will I have to do for this?
2. How will I do it? Even if it means adding some classes and rebuilding
the Solr jars, how do I prepare a document for indexing so it stores
payloads, and how do I build my search query to do a payload search? Do we
need to add a new RequestHandler for such custom searches? Please provide
sample code if you have any...

--
Cheers
Sumit



I'm starting work on this in the next few days, I'll let you know how  
I get on.
If anyone else has any experience with payloads in solr please chip  
in :)



--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: how to get all the docIds in the search result?

2009-07-23 Thread Toby Cole
Have you tried limiting the fields that you're requesting to just the  
ID?

Something along the line of:

query.setRows(Integer.MAX_VALUE);
query.setFields("id");

Might speed the query up a little.

On 23 Jul 2009, at 09:11, shb wrote:


Here id is indeed the uniqueKey of a document.
I want to get all the ids for some other usage.


2009/7/23 Shalin Shekhar Mangar 


On Thu, Jul 23, 2009 at 1:09 PM, shb  wrote:


if I use query.setRows(Integer.MAX_VALUE);
the query will become very slow, because the searcher will
fetch the field values from the index for all the returned
documents.

So if I set query.setRows(10), is there any other ways to
get all the ids? thanks



You should fetch as many rows as you need and not more. Why do you need all
the ids? I'm assuming that by id you mean the uniqueKey of a document.


--
Regards,
Shalin Shekhar Mangar.



--

Toby Cole
Software Engineer, Semantico Limited
 
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Re: Multiple Cores Vs. Single Core for the following use case

2010-01-27 Thread Toby Cole
I've not looked at the filtering for quite a while, but if you're  
getting lots of similar queries, the filter's caching can play a huge  
part in speeding up queries, so even if the first query for "paris"  
was slow, subsequent queries from different users for the same terms  
will be sped up considerably (especially if you're using the  
FastLRUCache).


If filtering is slow for your queries, why not try simply using a
boolean query (i.e., for the example below: "paris AND userId:123")?
This would remove the cross-user usefulness of the caches, if I
understand them correctly, but may speed up uncached searches.
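To make the two options concrete (illustrative request parameters):

?q=paris&fq=userId:123      (filter query: the userId:123 filter result is cached and reused)
?q=paris+AND+userId:123     (plain boolean query: bypasses the filter cache)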


Toby.


On 27 Jan 2010, at 15:48, Matthieu Labour wrote:

@Marc: Thank you, Marc. This is logic we had to implement in the client
application. We will look into applying the patch to replace our own
home-grown logic.

@Trey: I have 1000 users per machine. 1 core / user. Each core is  
35000 documents. Documents are small...each core goes from 100MB to  
1.3GB at most. There are 7 types of documents.
What I am trying to understand is the search/filter algorithm. If I
have 1 core with all documents and I search for "Paris" for
userId="123", is lucene going to first search for all Paris
documents and then apply a filter on the userId? If this is the
case, then I am better off having a specific index for the
user="123", because this will be faster.






--- On Wed, 1/27/10, Marc Sturlese  wrote:

From: Marc Sturlese 
Subject: Re: Multiple Cores Vs. Single Core for the following use case
To: solr-user@lucene.apache.org
Date: Wednesday, January 27, 2010, 2:22 AM


In case you are going to use core per user take a look to this patch:
http://wiki.apache.org/solr/LotsOfCores

Trey-13 wrote:


Hi Matt,

In most cases you are going to be better off going with the userid  
method
unless you have a very small number of users and a very large  
number of
docs/user. The userid method will likely be much easier to manage,  
as you
won't have to spin up a new core every time you add a new user.  I  
would
start here and see if the performance is good enough for your  
requirements

before you start worrying about it not being efficient.

That being said, I really don't have any idea what your data looks  
like.

How many users do you have?  How many documents per user?  Are any
documents
shared by multiple users?

-Trey



On Tue, Jan 26, 2010 at 7:27 PM, Matthieu Labour
wrote:


Hi,

Shall I set up multiple cores or a single core for the following use case:

I have X number of users.

When I do a search, I always know for which user I am doing a search.

Shall I set up X cores, 1 for each user? Or shall I set up 1 core and add
a userId field to each document?

If I choose the 1-core solution then I am concerned with performance.
Let's say I search for "New York"... If lucene returns all "New York"
matches for all users and then filters based on the userId, then this
is going to be less efficient than if I have sharded per user and send
the request for "New York" to the user's core.

Thank you for your help

matt












--
View this message in context: 
http://old.nabble.com/Multiple-Cores-Vs.-Single-Core-for-the-following-use-case-tp27332288p27335403.html
Sent from the Solr - User mailing list archive at Nabble.com.







--
Toby Cole
Senior Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/