Re: Error 400 - By search with exclamation mark ... ?! PatternReplaceFilterFactory ?

2010-03-08 Thread stocki

oh okay, thx a lot ;)

can I escape all possible operators with a requesthandler?
or can I escape these operators automatically when the syntax is wrong?

I use Solr with a PHP client ^^



MitchK wrote:
> 
> According to Ahmet Arslan's Post:
> Solr is expecting a word after the "!", because it is an operator.
> If you escape it, it is part of the queried string.
> 

-- 
View this message in context: 
http://old.nabble.com/Error-400---By-search-with-exclamation-mark-...--%21-PatternReplaceFilterFactory---tp27778918p27818984.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-03-08 Thread stocki

I have the same problem ...

I wrote an email ... -->

Jonas, did you set the country correctly? If you set it to the US it will
validate against US number formats and not recognize your number in Germany.

but I did not find any option to set my country =(




Janne Majaranta wrote:
> 
> Do I need a U.S. phone number to view the recording / download the slides
> ?
> The registration form whines about invalid area code..
> 
> -Janne
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Re%3A-Free-Webinar%3A-Mastering-Solr-1.4-with-Yonik-Seeley-tp27720526p27819123.html
Sent from the Solr - User mailing list archive at Nabble.com.



More contextual information in analyzers

2010-03-08 Thread dbejean

Hello,

If I write a custom analyzer that accepts a specific attribute in its
constructor:

public MyCustomAnalyzer(String myAttribute);

is there a way to dynamically send a value for this attribute from Solr at
index time, in the XML update message?


  


Obviously, in Solr's schema.xml, the "content" field is associated with my
custom analyzer.

Thank you.

Dominique

-- 
View this message in context: 
http://old.nabble.com/More-contextual-information-in-anlyzers-tp27819298p27819298.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: CoreAdmin

2010-03-08 Thread Suram



Shalin Shekhar Mangar wrote:
> 
> On Sat, Feb 27, 2010 at 5:22 PM, Suram  wrote:
> 
>>
>> Hi all,
>>
>> How can I configure CoreAdmin under the Tomcat server? Kindly
>> could anyone tell me?
>>
>>
> There's nothing to configure. If you are using multiple cores in Solr 1.4
> then CoreAdmin is available.
> 
> See http://wiki.apache.org/solr/CoreAdmin
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 


Hi all,

I configured CoreAdmin, with a core at http://localhost:8080/solr/core0/

but when I try to index a file, it is not accepted and throws the error below:

\solr\example\exampledocs>java -Ddata=args -Dcommit=yes -Durl=http://localhost:8080/solr/core0/update -jar post.jar Example.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing args to http://localhost:8080/solr/update..
SimplePostTool: FATAL: Solr returned an error: Bad Request

and I tried

\solr\example\exampledocs>java -jar post.jar Example.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8080/solr/update..
SimplePostTool: POSTing file Example.xml
SimplePostTool: FATAL: Solr returned an error: Bad Request

I can't find any error in Tomcat.

-- 
View this message in context: 
http://old.nabble.com/CoreAdmin-tp27727352p27819657.html
Sent from the Solr - User mailing list archive at Nabble.com.



question related to coord() [might be expert level]

2010-03-08 Thread Smith G
Hello,
 I came to know that the coord() value is calculated for each
sub-query (BooleanQuery) present in the main query.
For example: f = field, k = keyword

(( f1:k1 OR f2:k2) OR f3:k3) OR f4:k4

Here, if I am correct, coord() is calculated a total of 3 times. My
goal is to boost (or edit the formula of) the coord() value only "for
the last time". It may seem strange until you know why it is needed.

   We are expanding the query using a QueryParser plugin. It adds
synonym terms for each field.
For example: town:lausanne ---> is expanded to: (town:lausanne OR
city:lausanne).
Consider a big query: let us assume that f1s1 -> first synonym of f1,
f1s2 -> second synonym of f1, and so on.
So, the query mentioned above is expanded to ..

(((f1:k1 or f1s1:k1 or f1s2:k1) OR (f2:k2 or f2s1:k2)) OR (f3:k3 or
f3s1:k3))  OR  f4:k4  [assume no synonyms for f4] .

So, here it makes sense to edit the coord formula for the last "coord"
value, but not for every sub-boolean query, because there could be 10
synonyms in some cases, etc.
My questions..
My questions..

1) Is there any way of finding out inside Similarity whether the
current call is the last coord()?

2) Or is there any other place where we can make this change and reach our
goal?

3) I have found the use of "Coordinator" inside "BooleanScorer2",
which suggests there could be a way to boost the last element of
coordFactors[], but I do not know whether there could be a
plugin for that, or even what the effect would be.
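
For reference, a minimal sketch of where coord() plugs in. This is not from
the thread: the class name is made up, and it only restates the default
formula. Note that stock Lucene gives the Similarity no signal about which
BooleanQuery level it is scoring, which is exactly what question 1 runs into:

import org.apache.lucene.search.DefaultSimilarity;

public class TopLevelSimilarity extends DefaultSimilarity {
    // Called once per BooleanQuery being scored; overlap = number of
    // matching clauses, maxOverlap = total number of clauses. There is
    // no argument identifying the outermost (last) BooleanQuery.
    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;  // the stock formula, unchanged
    }
}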

  This seems really expert level [for my knowledge], so I am seeking some help.

Thanks.


Re: Error 400 - By search with exclamation mark ... ?! PatternReplaceFilterFactory ?

2010-03-08 Thread Ahmet Arslan

> can i escape all possible operators with a requesthandler?

With a custom one, yes. You can use the static method
org.apache.lucene.queryParser.QueryParser.escape(String s).

import org.apache.lucene.queryParser.QueryParser;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

public class EscapingSearchHandler extends SearchHandler {

  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    // Escape every query-syntax character before the parser sees it.
    String q = req.getParams().get(CommonParams.Q);
    ModifiableSolrParams solrParams = new ModifiableSolrParams(req.getParams());
    solrParams.set(CommonParams.Q, QueryParser.escape(q));
    req.setParams(solrParams);
    super.handleRequestBody(req, rsp);
  }
}

With this solution users cannot use the Solr query syntax anymore. For example,
range and wildcard queries won't work.

> or can i escape these operators automatically when the syntax
> is wrong ?

Maybe you can try to parse the value of the q parameter with a new QueryParser
in a try/catch block. If an exception occurs, you can escape the special
characters. However, I would prefer to do it on the client side.
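
A minimal sketch of that try/catch approach (illustrative only; the class
name, default field and analyzer are assumptions, not from the thread):

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class LenientEscape {
  // Returns q unchanged when it parses cleanly, otherwise a fully
  // escaped copy of it.
  public static String escapeIfInvalid(String q) {
    QueryParser parser =
        new QueryParser(Version.LUCENE_29, "text", new WhitespaceAnalyzer());
    try {
      parser.parse(q);               // valid syntax: keep operators as typed
      return q;
    } catch (ParseException e) {
      return QueryParser.escape(q);  // broken syntax: escape everything
    }
  }
}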


  


RE: Handling and sorting email addresses

2010-03-08 Thread Ian Battersby
Thanks Mitch, using the analysis page has been a real eye-opener and given
me better insight into how Solr was applying the filters (and more
importantly in which order). I've ironically ended up with a charFilter
mapping file, as this seemed the only route to replacing characters before
the tokenizer kicked in; unfortunately Solr just refused to allow sorting on
anything tokenized with characters other than whitespace.

Cheers, Ian.

-Original Message-
From: MitchK [mailto:mitc...@web.de] 
Sent: 07 March 2010 22:44
To: solr-user@lucene.apache.org
Subject: Re: Handling and sorting email addresses


Ian,

did you have a look at Solr's admin analysis.jsp?
If everything on the analysis page looks fine, then you have misunderstood
Solr's schema.xml file.

You've set two attributes in your schema.xml:
stored = true
indexed = true

What you get as a response is the stored field value.
The stored field value is the original field value, without any
modifications.
However, Solr uses the indexed field value to query your data.

Kind regards
- Mitch
 

Ian Battersby wrote:
> 
> Forgive what might seem like a newbie question but am struggling
> desperately
> with this. 
> 
> We have a dynamic field that holds email address and we'd like to be able
> to
> sort by it, obviously when trying to do this we get an error as it thinks
> the email address is a tokenized field. We've tried a custom field type
> using PatternReplaceFilterFactory to specify that @ and . should be
> replaced
> with " AT " and " DOT " but we just can't seem to get it to work, all the
> field still contain the unparsed email.
> 
> We used an example found on the mailing-list for the field type:
> 
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="\."
> replacement=" DOT " replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="@"
> replacement=" AT " replace="all" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
>   </analyzer>
> </fieldType>
> 
> .. our dynamic field looks like ..
> 
> <dynamicField name="..." type="..." indexed="true"
> stored="true" multiValued="true" />
> 
> When writing a document to Solr it still seems to write the original email
> address (e.g. this.u...@somewhere.com) as opposed to its parsed version (e.g.
> this DOT user AT somewhere DOT com). Can anyone help?
> 
> We are running version 1.4 but have even tried the nightly build in an
> attempt to solve this problem.
> 
> Thanks.
> 
> 
> 

-- 
View this message in context:
http://old.nabble.com/Handling-and-sorting-email-addresses-tp27813111p278152
39.html
Sent from the Solr - User mailing list archive at Nabble.com.




question about mergeFactor

2010-03-08 Thread Marc Des Garets
Hello,

 

On the solr wiki, here:
http://wiki.apache.org/solr/SolrPerformanceFactors

 

It is written:

mergeFactor Tradeoffs

 

High value merge factor (e.g., 25):

Pro: Generally improves indexing speed

Con: Less frequent merges, resulting in a collection with more index
files which may slow searching

Low value merge factor (e.g., 2):

 

Pro: Smaller number of index files, which speeds up searching.

Con: More segment merges slow down indexing.

 

If I have a mergeFactor of 50 when I build the index and then I optimize
the index, I end up with 1 index file so I have a small number of index
files, and having used a mergeFactor of 50 won't slow searching? Or is my
supposition wrong, and the mergeFactor used when building the index
has an impact on search speed anyway?

 

 

Thanks.

Re: Question about fieldNorms

2010-03-08 Thread Siddhant Goel
Wonderful! That explains it. Thanks a lot!

Regards,

On Mon, Mar 8, 2010 at 6:39 AM, Jay Hill  wrote:

> Yes, if omitNorms=true, then no lengthNorm calculation will be done, and
> the
> fieldNorm value will be 1.0, and lengths of the field in question will not
> be a factor in the score.
>
> To see an example of this you can do a quick test. Add two "text" fields,
> and on one omitNorms:
>
>   <field name="foo" type="text" indexed="true" stored="true"/>
>   <field name="bar" type="text" indexed="true" stored="true"
> omitNorms="true"/>
>
> Index a doc with the same value for both fields:
>   <field name="foo">1 2 3 4 5</field>
>   <field name="bar">1 2 3 4 5</field>
>
> Set &debugQuery=true and do two queries: &q=foo:5   &q=bar:5
>
> in the "explain" section of the debug output note that the fieldNorm value
> for the "foo" query is this:
>
>0.4375 = fieldNorm(field=foo, doc=1)
>
> and the value for the "bar" query is this:
>
>1.0 = fieldNorm(field=bar, doc=1)
>
> A simplified description of how the fieldNorm value is computed: fieldNorm =
> lengthNorm * documentBoost * documentFieldBoosts
>
> and the lengthNorm is calculated like this: lengthNorm =
> 1/sqrt(numTermsInField)
> [note that the value is encoded as a single byte, so there is some
> precision loss]
> For the five-term field above, 1/sqrt(5) is about 0.447, which the
> single-byte encoding rounds down to the 0.4375 seen in the explain output.
>
> When omitNorms=true no norm calculation is done, so fieldNorm will always
> be
> one on those fields.
>
> You can also use the Luke utility to view the document in the index, and it
> will show that there is a norm value for the foo field, but not the bar
> field.
>
> -Jay
> http://www.lucidimagination.com
>
>
> On Sun, Mar 7, 2010 at 5:55 AM, Siddhant Goel  >wrote:
>
> > Hi everyone,
> >
> > Is the fieldNorm calculation altered by the omitNorms factor? I saw on
> this
> > page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html)
> the
> > formula for calculation of fieldNorms (fieldNorm =
> > fieldBoost/sqrt(numTermsForField)).
> >
> > Does this mean that for a document containing a string like "A B C D E"
> in
> > its field, its fieldNorm would be boost/sqrt(5), and for another document
> > containing the string "A B C" in the same field, its fieldNorm would be
> > boost/sqrt(3). Is that correct?
> >
> > If yes, then is *this* what omitNorms affects?
> >
> > Thanks,
> >
> > --
> > - Siddhant
> >
>



-- 
- Siddhant


Re: index merge

2010-03-08 Thread Shalin Shekhar Mangar
Hi Mark,

On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher
wrote:

>
> I have created 2  identical cores coreX and coreY (both have different
> dataDir values, but their index is same).
> coreX - always serves the request when a user performs a search.
> coreY - the updates will happen to this core and then I need to synchronize
> it with coreX after the update process, so that coreX also has the
>   latest data in it.  After coreX and coreY are synchronized, both
> should again be identical again.
>
> For this purpose I tried core merging of coreX and coreY once coreY is
> updated with the latest set of data. But I find coreX to be containing
> double the record count as in coreY.
> (coreX = coreX+coreY)
>
> Is there a problem in using MERGE concept here. If it is wrong can some one
> pls suggest the best approach. I tried the various merges explained in my
> previous mail.
>
>
Index merge happens at the Lucene level which has no idea about uniqueKeys.
Therefore when you merge two indexes containing exactly the same documents
(by uniqueKey), you get double the document count.

Looking at your scenario, it seems to me that what you want to do is a swap
operation. coreX is serving the requests, coreY is updated and now you can
swap coreX with coreY so that new requests hit the updated index. I suggest
you look at the swap operation instead of index merge.
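
For reference, a swap is a single CoreAdmin call; a sketch, with host, port
and core names assumed:

http://localhost:8080/solr/admin/cores?action=SWAP&core=coreX&other=coreY

After it returns, requests addressed to coreX are served by what was coreY's
index, and vice versa.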

-- 
Regards,
Shalin Shekhar Mangar.


Re: which links do i have to follow to understand location based search concepts?

2010-03-08 Thread KshamaPai

Hi,
Thank you for explaining it in a simple way.
The article really helped me to understand the concepts better.

My question is: does the data being indexed in the spatial example have to be
in the OSM format, using the facts files?
In my case, I am trying to index data that has just latitude, longitude and a
related news item (just text), in an XML file which looks like this:







I have slightly modified driver.java and the other .java files in the
src/main/java folder so that these fields are considered for indexing (but
have retained geohash, lat_rad, lng_rad as done in the spatial example).

But when I do ant index, I am getting:

Buildfile: build.xml

init:

compile:

index:
 [echo] Indexing ./data/ 
 [java] ./data/   http://localhost:8983/solr
 [java] Num args: 2
 [java] Starting indexing
 [java] Indexing: ./data/final.xml
 [java] Mar 8, 2010 4:40:35 AM
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
 [java] INFO: I/O exception (java.net.ConnectException) caught when
processing request: Connection refused
 [java] Mar 8, 2010 4:40:35 AM
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
 [java] INFO: Retrying request
 [java] Mar 8, 2010 4:40:35 AM
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
 [java] INFO: I/O exception (java.net.ConnectException) caught when
processing request: Connection refused
 [java] Mar 8, 2010 4:40:35 AM
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
 [java] INFO: Retrying request
 [java] Mar 8, 2010 4:40:35 AM
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
 [java] INFO: I/O exception (java.net.ConnectException) caught when
processing request: Connection refused
 [java] Mar 8, 2010 4:40:35 AM
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
 [java] INFO: Retrying request
 [java] org.apache.solr.client.solrj.SolrServerException:
java.net.ConnectException: Connection refused
 [java] at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
 [java] at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
 [java] at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [java] at
org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
 [java] at OSMHandler.endElement(OSMHandler.java:127)
 [java] at
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
 [java] at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1774)
 [java] at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2930)
 [java] at
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
 [java] at
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
 [java] at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
 [java] at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
 [java] at
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
 [java] at
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
 [java] at
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
 [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
 [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
 [java] at OSM2Solr.process(OSM2Solr.java:44)
 [java] at Driver.main(Driver.java:80)
 [java] Caused by: java.net.ConnectException: Connection refused
 [java] at java.net.PlainSocketImpl.socketConnect(Native Method)
 [java] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
 [java] at
java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
 [java] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
 [java] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 [java] at java.net.Socket.connect(Socket.java:519)
 [java] at java.net.Socket.connect(Socket.java:469)
 [java] at java.net.Socket.<init>(Socket.java:366)
 [java] at java.net.Socket.<init>(Socket.java:240)
 [java] at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
 [java] at
org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
 [java] at
org.a

Re: which links do i have to follow to understand location based search concepts?

2010-03-08 Thread Shalin Shekhar Mangar
On Mon, Mar 8, 2010 at 6:21 PM, KshamaPai  wrote:

>
> Hi,
> Thank You for explaining it in a simple way.
> The article really helped me to understand the concepts better.
>
> My question is: does the data being indexed in the spatial example have to
> be in the OSM format, using the facts files?
> In my case, I am trying to index data that has just latitude, longitude and
> a related news item (just text), in an XML file which looks like this:
>
> 
> 
>
> 
> 
>
> I have slightly modified driver.java and other .java files in the
> src/main/java folder, so that these fields are considered for indexing (but
> have retained geohash, lat_rad, lng_rad as done in the spatial example)
>
> But when i do ant index , am getting
>
> Buildfile: build.xml
>
> init:
>
> compile:
>
> index:
> [echo] Indexing ./data/
> [java] ./data/   http://localhost:8983/solr
> [java] Num args: 2
> [java] Starting indexing
> [java] Indexing: ./data/final.xml
> [java] Mar 8, 2010 4:40:35 AM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> [java] INFO: I/O exception (java.net.ConnectException) caught when
> processing request: Connection refused
>

The "Connection refused" message suggests that your Solr instance is either
not running or you have given the wrong host/port in your driver.
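
A quick sanity check from the shell (assuming the default example port;
adjust host and port to your setup):

curl "http://localhost:8983/solr/admin/ping"

If this does not return a response, the indexer has nothing to connect to.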

-- 
Regards,
Shalin Shekhar Mangar.


Re: question about mergeFactor

2010-03-08 Thread Shalin Shekhar Mangar
On Mon, Mar 8, 2010 at 5:31 PM, Marc Des Garets wrote:

>
> If I have a mergeFactor of 50 when I build the index and then I optimize
> the index, I end up with 1 index file so I have a small number of index
> files and having used mergeFactor of 50 won't slow searching? Or my
> supposition is wrong and the mergeFactor used when building the index
> has an impact on speed searching anyway?
>
>
If you optimize then mergeFactor does not matter and your searching speed
will not be slowed down. On the other hand, the optimize may take the bulk
of the indexing time, so you won't get any benefit from using a mergeFactor
of 50.
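
For reference, a sketch of the two pieces discussed here, following the stock
Solr 1.4 example configuration (the value is illustrative):

<!-- solrconfig.xml -->
<indexDefaults>
  <mergeFactor>50</mergeFactor>
</indexDefaults>

and an optimize can be sent with the bundled post tool:

java -Ddata=args -jar post.jar "<optimize/>"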

-- 
Regards,
Shalin Shekhar Mangar.


LocalSolr,Apache-solr-1.4.0

2010-03-08 Thread mamathahl

hi,
I am interested in spatial search. I am using Apache-solr 1.4.0 and
LocalSolr.
I have followed the instructions given on the following website:
http://gissearch.com/localsolr
A query of the following format
/solr/select?&qt=geo&lat=xx.xx&long=yy.yy&q=abc&radius=zz
(after substituting valid values), as given on the website, is not producing
any results.
I am also not understanding the significance of the field "long" in the
query: there is no field specified as "long"; I can find only "lng".
I want the results to be produced with respect to the radius, and not the
results of a mere full-text search.
Is there any need to use LocalLucene as well?

Any help regarding this will be appreciated.
Thanks in advance
-- 
View this message in context: 
http://old.nabble.com/LocalSolr%2CApache-solr-1.4.0-tp27819867p27819867.html
Sent from the Solr - User mailing list archive at Nabble.com.



Position of snippet within highlighted field

2010-03-08 Thread Mark Roberts
Does anyone know if it's possible to get the position of the highlighted 
snippet within the field that's being highlighted?

It would be really useful for me to know if the snippet is at the beginning or 
at the end of the text field that it comes from.

Thanks, Mark.


Re: example solr xml working fine but my own xml files not working

2010-03-08 Thread Erick Erickson
Have you looked in your SOLR log file to see what it says?

Check the editor you use for your XML: is it using UTF-8? (Although you
don't appear to be using any odd characters, so probably not a problem.)

Think about taking the XML file that *does* work, copying it and editing
*that* one.

Erick

On Mon, Mar 8, 2010 at 12:13 AM, venkatesh uruti
wrote:

>
> Dear Erick,
>
> Please find below the necessary steps that I executed.
>
> I am following the same structure as mentioned by you, and checked results
> in the admin page by clicking the search button; the samples are working
> fine.
>
> Ex: Added monitor.xml and searched for video, and it displays results - the
> search content is displayed properly.
>
> Let me explain the problem which I am facing:
>
> step 1: I started Apache Tomcat
>
> step 2: Indexing data:
>   java -jar post.jar myfile.xml
>
> Here is my XML content:
>
> <add>
> <doc>
>   <field name="...">1</field>
>   <field name="...">Youth to Elder</field>
>   <field name="...">Integrated Research Program</field>
>   <field name="...">2009</field>
>   <field name="...">First Nation</field>
> </doc>
> <doc>
>   <field name="...">2</field>
>   <field name="...">Strategies</field>
>   <field name="...">Implementation Committee</field>
>   <field name="...">2001</field>
>   <field name="...">Policy</field>
> </doc>
> </add>
> step 3: I did
>
> java -jar post.jar myfile.xml
>
> Output of the above:
>
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8,
> other encodings are not currently supported
> SimplePostTool: POSTing files to http://localhost:8983/solr/update..
> SimplePostTool: POSTing file curnew.xml
> SimplePostTool: FATAL: Solr returned an error: Bad Request
>
> Requesting your help on this.
>
> --
> View this message in context:
> http://old.nabble.com/example-solr-xml-working-fine-but-my-own-xml-files-not-working-tp27793958p27817161.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


HTML encode extracted docs

2010-03-08 Thread Mark Roberts
I'm uploading .htm files to be extracted. Some of these files are "include"
files that have snippets of HTML rather than fully formed HTML documents.

solr-cell stores the raw HTML for these items, rather than extracting the
text. Is there any way I can get Solr to encode this content prior to storing
it?

At the moment, I have the problem that when the highlighted snippets are
retrieved via search, I need to parse the snippet and HTML-encode the bits of
HTML that were indexed, whilst *not* encoding the bits that were added by the
highlighter, which is messy and time consuming.

Thanks! Mark.


Re: Handling and sorting email addresses

2010-03-08 Thread Erick Erickson
Well, it's not unfortunate. What would it mean to sort
on a tokenized field? Let's say I index "is testing fun". Removing
stopwords and stemming probably indexes "test" "fun". How
in the world would meaningful sorts happen now? Even if
it was "in order", since the first token was stopped out, this
document wouldn't even be in the right part of the alphabet.
The usual solution is to use copyfield and index your field
untokenized in that second field, then sort on *that* field.

HTH
Erick

On Mon, Mar 8, 2010 at 6:56 AM, Ian Battersby wrote:

> Thanks Mitch, using the analysis page has been a real eye-opener and given
> me a better insight into how Solr was applying the filters (and more
> importantly in which order). I've ironically ended up with a charFilter
> mapping file, as this seemed the only route to replacing characters before
> the tokenizer kicked in; unfortunately Solr just refused to allow sorting
> on anything tokenized with characters other than whitespace.
>
> Cheers, Ian.
>
> -Original Message-
> From: MitchK [mailto:mitc...@web.de]
> Sent: 07 March 2010 22:44
> To: solr-user@lucene.apache.org
> Subject: Re: Handling and sorting email addresses
>
>
> Ian,
>
> did you have a look at Solr's admin analysis.jsp?
> If everything on the analysis page looks fine, then you have misunderstood
> Solr's schema.xml file.
>
> You've set two attributes in your schema.xml:
> stored = true
> indexed = true
>
> What you get as a response is the stored field value.
> The stored field value is the original field value, without any
> modifications.
> However, Solr is using the indexed field value to query your data.
>
> Kind regards
> - Mitch
>
>
> Ian Battersby wrote:
> >
> > Forgive what might seem like a newbie question but am struggling
> > desperately
> > with this.
> >
> > We have a dynamic field that holds email address and we'd like to be able
> > to
> > sort by it, obviously when trying to do this we get an error as it thinks
> > the email address is a tokenized field. We've tried a custom field type
> > using PatternReplaceFilterFactory to specify that @ and . should be
> > replaced
> > with " AT " and " DOT " but we just can't seem to get it to work, all the
> > field still contain the unparsed email.
> >
> > We used an example found on the mailing-list for the field type:
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="..."/>
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="\."
> > replacement=" DOT " replace="all" />
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="@"
> > replacement=" AT " replace="all" />
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="0"/>
> >   </analyzer>
> > </fieldType>
> >
> > .. our dynamic field looks like ..
> >
> > <dynamicField name="..." type="..." indexed="true"
> > stored="true" multiValued="true" />
> >
> > When writing a document to Solr it still seems to write the original email
> > address (e.g. this.u...@somewhere.com) as opposed to its parsed version
> > (e.g. this DOT user AT somewhere DOT com). Can anyone help?
> >
> > We are running version 1.4 but have even tried the nightly build in an
> > attempt to solve this problem.
> >
> > Thanks.
> >
> >
> >
>
> --
> View this message in context:
>
> http://old.nabble.com/Handling-and-sorting-email-addresses-tp27813111p278152
> 39.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>


RE: question about mergeFactor

2010-03-08 Thread Marc Des Garets
Perfect. Thank you for your help.

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: 08 March 2010 12:57
To: solr-user@lucene.apache.org
Subject: Re: question about mergeFactor

On Mon, Mar 8, 2010 at 5:31 PM, Marc Des Garets
wrote:

>
> If I have a mergeFactor of 50 when I build the index and then I optimize
> the index, I end up with 1 index file so I have a small number of index
> files and having used mergeFactor of 50 won't slow searching? Or my
> supposition is wrong and the mergeFactor used when building the index
> has an impact on speed searching anyway?
>
>
If you optimize then mergeFactor does not matter and your searching speed
will not be slowed down. On the other hand, the optimize may take the bulk
of the indexing time, so you won't get any benefit from using a mergeFactor
of 50.

-- 
Regards,
Shalin Shekhar Mangar.

Re: index merge

2010-03-08 Thread Mark Fletcher
Hi Shalin,

Thank you for the reply.

I got your point. So I understand merge will just duplicate things.

I ran the SWAP command. Now:-
COREX has the dataDir pointing to the updated dataDir of COREY. So COREX has
the latest.
Again, COREY (on which the update regularly runs) is pointing to the old
index of COREX. So this now doesn't have the most updated index.

Now shouldn't I update the index of COREY (now pointing to the old COREX) so
that it has the latest footprint as in COREX (which has the latest COREY
index), so that when the update again happens to COREY, it has the latest and
I again do the SWAP?

Is a physical copy of the index named COREY (the latest, and now the dataDir
of COREX after the SWAP) onto the index COREX (now the dataDir of COREY, the
original non-updated index of COREX) the best way to do this, or is there any
other better option?

Once again, later when COREY is again updated with the latest, I will run
the SWAP again, and it will be fine, with COREX again pointing to its original
dataDir (now the updated one). So every even-numbered SWAP run will point
COREX back to its original dataDir (same case with COREY).

My only concern is, after the SWAP is done, updating the old index (which was
serving previously and is now replaced by the new index). What is the best
way to do that? Physically copy the latest index over the old one, so that by
the time it is to get the latest updates it has the latest in it, the new
ones can be added to it, and it becomes the latest and is again swapped?

Please share your opinion. Once again your help is appreciated. I am kind of
going in circles with multiple indexes for some days!

Thanks and Rgds,
Mark.

On Mon, Mar 8, 2010 at 7:45 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Mark,
>
> On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher
> wrote:
>
> >
> > I have created 2  identical cores coreX and coreY (both have different
> > dataDir values, but their index is same).
> > coreX - always serves the request when a user performs a search.
> > coreY - the updates will happen to this core and then I need to
> synchronize
> > it with coreX after the update process, so that coreX also has the
> >   latest data in it.  After coreX and coreY are synchronized,
> both
> > should again be identical again.
> >
> > For this purpose I tried core merging of coreX and coreY once coreY is
> > updated with the latest set of data. But I find coreX to be containing
> > double the record count as in coreY.
> > (coreX = coreX+coreY)
> >
> > Is there a problem in using MERGE concept here. If it is wrong can some
> one
> > pls suggest the best approach. I tried the various merges explained in my
> > previous mail.
> >
> >
> Index merge happens at the Lucene level which has no idea about uniqueKeys.
> Therefore when you merge two indexes containing exactly the same documents
> (by uniqueKey), you get double the document count.
>
> Looking at your scenario, it seems to me that what you want to do is a swap
> operation. coreX is serving the requests, coreY is updated and now you can
> swap coreX with coreY so that new requests hit the updated index. I suggest
> you look at the swap operation instead of index merge.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Tomcat save my Index temp ...

2010-03-08 Thread stocki

Hello.

I use 2 cores for Solr.

When I restart my Tomcat on Debian, Tomcat deletes my index.

I set data.dir to
<dataDir>${solr.data.dir:./suggest/data}</dataDir>
and
<dataDir>${solr.data.dir:./search/data}</dataDir>

and in solr.xml:

<cores ...>
  <core ... dataDir="/search/data/index"/>
  <core ... dataDir="/suggest/data/index"/>
</cores>

So, why is my index only temporary?

Solr saves my index to: /var/lib/tomcat5.5/temp

I tested my Solr setup on XP with Tomcat, and all is okay =(
-- 
View this message in context: 
http://old.nabble.com/Tomcat-save-my-Index-temp-...-tp27819967p27819967.html
Sent from the Solr - User mailing list archive at Nabble.com.



Import database

2010-03-08 Thread Quan Nguyen Anh

Hi,
I have started using Solr. I had a problem when importing a database with
2 million rows.

The server encounters an error: java.lang.OutOfMemoryError: Java heap space
I searched around but can't find the solution.
Any help regarding this will be appreciated.
Thanks in advance





Re: Tomcat save my Index temp ...

2010-03-08 Thread Erick Erickson
You're probably hitting the difference between *nix file
handling and Windows. When you delete a file on a
Unix variant, if some other program has the file open
the file doesn't go away until that other program closes
it.

HTH
Erick

On Mon, Mar 8, 2010 at 9:08 AM, stocki  wrote:

>
> Hello.
>
> I use 2 cores for Solr.
>
> When I restart my Tomcat on Debian, Tomcat deletes my index.
>
> I set data.dir to
> <dataDir>${solr.data.dir:./suggest/data}</dataDir>
> and
> <dataDir>${solr.data.dir:./search/data}</dataDir>
>
> <cores ...>
>   <core ... dataDir="/search/data/index"/>
>   <core ... dataDir="/suggest/data/index"/>
> </cores>
>
> So, why is my index only temporary?
>
> Solr saves my index to: /var/lib/tomcat5.5/temp
>
> I tested my Solr setup on XP with Tomcat, and all is okay =(
> --
> View this message in context:
> http://old.nabble.com/Tomcat-save-my-Index-temp-...-tp27819967p27819967.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Import database

2010-03-08 Thread Lee Smith
I had the same issue with Jetty.

Adding extra memory resolved my issue, i.e.: java -Xms512m -Xmx1024m -jar
start.jar

It's in the manual, but I can't seem to find the link.
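
If Solr runs under Tomcat instead, the equivalent (a sketch, values
illustrative) is to put the same flags into JAVA_OPTS before starting the
container:

export JAVA_OPTS="-Xms512m -Xmx1024m"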


On 8 Mar 2010, at 14:09, Quan Nguyen Anh wrote:

> Hi,
> I have started using Solr. I had a problem when importing a database with
> 2 million rows.
> The server encounters an error: java.lang.OutOfMemoryError: Java heap space
> I searched around  but can't find the solution.
> Any hep regarding this will be appreciated.
> Thanks in advance
> 
> 
> 



Re: Tomcat save my Index temp ...

2010-03-08 Thread Jens Kapitza

On 08.03.2010 15:08, stocki wrote:

> Hello.
>
> I use 2 cores for Solr.
>
> When I restart my Tomcat on Debian, Tomcat deletes my index.

you should check your tomcat setup.

> I set data.dir to
> <dataDir>${solr.data.dir:./suggest/data}</dataDir>
> and
> <dataDir>${solr.data.dir:./search/data}</dataDir>

use an absolute path [you have not set the solr.home path]; this is the
working/temp dir of Tomcat per default.
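
For example, a sketch with absolute paths (the locations themselves are
illustrative):

<dataDir>/var/solr/suggest/data</dataDir>
<dataDir>/var/solr/search/data</dataDir>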


> <cores ...>
>   <core ... dataDir="/search/data/index"/>
>   <core ... dataDir="/suggest/data/index"/>
> </cores>

is ok, but this is relative to solr.home.

> So, why is my index only temporary?

try to set up Solr again:
http://wiki.apache.org/solr/SolrTomcat

try to set it up with a Context fragment.

Create a Tomcat Context fragment to point docBase to the
$SOLR_HOME/apache-solr-1.3.0.war file and solr/home to $SOLR_HOME:

<Context docBase="$SOLR_HOME/apache-solr-1.3.0.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true"/>
</Context>

and avoid storing the data in .../tmp/



Re: index merge

2010-03-08 Thread Shalin Shekhar Mangar
Hi Mark,

On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher
wrote:

>
> I ran the SWAP command. Now:-
> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX
> has the latest.
> Again, COREY (on which the update regularly runs) is pointing to the old
> index of COREX. So this now doesnt have the most updated index.
>
> Now shouldn't I update the index of COREY (now pointing to the old COREX)
> so that it has the latest footprint as in COREX (having the latest COREY
> index)so that when the update again happens to COREY, it has the latest and
> I again do the SWAP.
>
> Is a physical copying of the index  named COREY (the latest and now datDir
> of COREX after SWAP) to the index COREX  (now the dataDir of COREY.. the
> orginal non-updated index of COREX) the best way for this or is there any
> other better option.
>
> Once again, later when COREY is again updated with the latest, I will run
> the SWAP again and it will be fine with COREX again pointing to its original
> dataDir (now the updated one).So every even SWAP command run will point
> COREX back to its original dataDir. (same case with COREY).
>
> My only concern is after the SWAP is done, updating the old index (which
> was serving previously and now replaced by the new index). What is the best
> way to do that? Physically copy the latest index to the old one and make it
> in sync with the latest one so that by the time it is to get the latest
> updates it has the latest in it so that the new ones can be added to this
> and it becomes the latest and is again swapped?
>

Perhaps it is best if we take a step back and understand why you need two
identical cores?

-- 
Regards,
Shalin Shekhar Mangar.


Extracting content from mailman managed mail list archive

2010-03-08 Thread Lukáš Vlček
Hi,

is anybody willing to share experience about how to extract content from
mailing list archives in order to have it indexed by Lucene or Solr?

Imagine that we have access to the archive of some mailing list (e.g.
http://www.mail-archive.com/mailman-users%40python.org/) and we would like
to index the individual emails. Is there any easy way to extract just the
text content produced by the sender of each email? I am interested in the
content generated by a particular sender, omitting the original quoted text.
We can either access individual emails via the web or download a monthly
archive in plain text format (but the content of individual emails depends
on the email client of the author, i.e. plain text, HTML, HTML mixed with
plain text ... etc ... it is very messy).

I would prefer information about mailing lists managed by mailman but I
don't want to limit the scope of this question so any general ideas are
welcome.
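
As a starting point, a naive sketch of stripping quoted text (illustrative
only: it drops ">"-prefixed lines and "... wrote:" attribution lines, and
will miss top-posting clients and other quoting styles):

import java.util.regex.Pattern;

public class QuoteStripper {
    private static final Pattern QUOTED = Pattern.compile("^\\s*>");
    private static final Pattern ATTRIBUTION = Pattern.compile(".*wrote:\\s*$");

    // Keep only the lines the sender wrote themselves.
    public static String stripQuotes(String body) {
        StringBuilder out = new StringBuilder();
        for (String line : body.split("\n")) {
            if (QUOTED.matcher(line).find()) continue;
            if (ATTRIBUTION.matcher(line).matches()) continue;
            out.append(line).append('\n');
        }
        return out.toString();
    }
}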

Regards,
Lukas


Re: Extracting content from mailman managed mail list archive

2010-03-08 Thread Lukáš Vlček
I just checked popular search services, and it seems that neither
lucidimagination search nor search-lucene supports this:
http://www.lucidimagination.com/search/document/954e8589ebbc4b16/terminating_slashes_in_url_normalization
http://www.search-lucene.com/m?id=510143ac0608042241k49f4afe7wcd25df3fbacc7...@mail.gmail.com||mailman

Markmail does not support this either:
http://markmail.org/message/papbjx3aoz3uvbhh

Hmmm
I think it would be useful to extract just the *NEW* content without all the
quotes, because this influences Lucene scoring.

Regards,
Lukas

On Mon, Mar 8, 2010 at 3:55 PM, Lukáš Vlček  wrote:

> Hi,
>
> is anybody willing to share experience about how to extract content from
> mailing list archives in order to have it indexed by Lucene or Solr?
>
> Imagine that we have access to archive of some mailling list (e.g.
> http://www.mail-archive.com/mailman-users%40python.org/) and we would like
> to index individual emails. Is there any easy way how to extract just the
> text content produced by sender individual emails? I am interested in
> content generated by particular sender omitting the original quoted text. We
> can either access individual emails via web or we can download monthly
> archive in plain text format (but the content of individual emails depends
> on the email client of the author, i.e. plain text, html, html mixed with
> plain text in  ... etc... it is very messy).
>
> I would prefer information about mailing lists managed by mailman but I
> don't want to limit the scope of this question so any general ideas are
> welcome.
>
> Regards,
> Lukas
>


Child entities in document not loading

2010-03-08 Thread John Ament
All,

So I think I have my first issue figured out: I need to add terms to the
default search.  That's fine.

The new issue is that I'm trying to load child entities in with my entity.

I added the appropriate fields to solrconfig.xml:

<field name="sizes" ... multiValued="true"/>
<field name="colors" ... multiValued="true"/>
<field name="sections" ... multiValued="true"/>
And I updated my document to match









So my expectation is that there will be 3 new fields associated with it that
are multivalued: sizes, colors, and sections.

The full-import seems to work correctly.  I get the appropriate number of
documents in my searches.  However, sizes, colors and sections all come up
null (well, I should say they don't come up when I search for them).

Any ideas on why it won't load these 3 child entities?

Thanks!

John


Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-03-08 Thread stocki

just delete your browser cache ;)



stocki wrote:
> 
> i have the same problem ...
> 
> i wrote an email...-->
> 
> Jonas, did you set the country correctly? If you set it to the US it will
> validate against US number formats and not recognize your number in
> Germany.
> 
> but i did not find any option to set my country =(
> 
> 
> 
> 
> Janne Majaranta wrote:
>> 
>> Do I need a U.S. phone number to view the recording / download the slides
>> ?
>> The registration form whines about invalid area code..
>> 
>> -Janne
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Re%3A-Free-Webinar%3A-Mastering-Solr-1.4-with-Yonik-Seeley-tp27720526p27820239.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Child entities in document not loading

2010-03-08 Thread Erick Erickson
What does the solr admin page show you is actually in your index?

Luke will also help.

Erick

On Mon, Mar 8, 2010 at 10:06 AM, John Ament  wrote:

> All,
>
> So I think I have my first issue figured out, need to add terms to the
> default search.  That's fine.
>
> New issue is that I'm trying to load child entities in with my entity.
>
> I added the appropriate fields to solrconfig.xml
>
> <field name="sizes" ... multiValued="true"/>
> <field name="colors" ... multiValued="true"/>
> <field name="sections" ... multiValued="true"/>
>
> And I updated my document to match
>
>
>
>
>
>
>
>
>
> So my expectation is that there will be 3 new fields associated with it
> that
> are multivalued: sizes, colors, and sections.
>
> The full-import seems to work correctly.  I get the appropriate number of
> documents in my searches.  However, sizes, colors and sections all come up
> null (well, I should say they don't come up when I search for them).
>
> Any ideas on why it won't load these 3 child entities?
>
> Thanks!
>
> John
>


Re: Child entities in document not loading

2010-03-08 Thread John Ament
Where would I see this? I do believe the fields are not ending up in the
index.

Thanks

John

On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson wrote:

> What does the solr admin page show you is actually in your index?
>
> Luke will also help.
>
> Erick
>
> On Mon, Mar 8, 2010 at 10:06 AM, John Ament  wrote:
>
> > All,
> >
> > So I think I have my first issue figured out, need to add terms to the
> > default search.  That's fine.
> >
> > New issue is that I'm trying to load child entities in with my entity.
> >
> > I added the appropriate fields to solrconfig.xml
> >
> > > multiValued="true"/>
> > > multiValued="true"/>
> > > multiValued="true"/>
> >
> > And I updated my document to match
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > So my expectation is that there will be 3 new fields associated with it
> > that
> > are multivalued: sizes, colors, and sections.
> >
> > The full-import seems to work correctly.  I get the appropriate number of
> > documents in my searches.  However, sizes, colors and sections all come
> up
> > null (well, I should say they don't come up when I search for them).
> >
> > Any ideas on why it won't load these 3 child entities?
> >
> > Thanks!
> >
> > John
> >
>


Re: index merge

2010-03-08 Thread Mark Fletcher
Hi Shalin,

Thank you for the mail.
My main purpose of having 2 identical cores
COREX - always serves user request
COREY - every day once, takes the updates/latest data and passess it on to
COREX.
is:-

Suppose say I have only one COREY and suppose a request comes to COREY while
the update of the latest data is happening on to it. Wouldn't it degrade
performance of the requests at that point of time?

So I was planning to keep COREX and COREY always identical. Once COREY has
the latest it should somehow sync with COREX so that COREX also now has the
latest. COREY keeps on getting the updates at a particular time of day and
it will again pass it on to COREX. This process continues everyday.

What is the best possible way to implement this?

Thanks,

Mark.


On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Hi Mark,
>
>  On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher <
> mark.fletcher2...@gmail.com> wrote:
>
>>
>> I ran the SWAP command. Now:-
>> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX
>> has the latest.
>> Again, COREY (on which the update regularly runs) is pointing to the old
>> index of COREX. So this now doesnt have the most updated index.
>>
>> Now shouldn't I update the index of COREY (now pointing to the old COREX)
>> so that it has the latest footprint as in COREX (having the latest COREY
>> index)so that when the update again happens to COREY, it has the latest and
>> I again do the SWAP.
>>
>> Is a physical copying of the index  named COREY (the latest and now datDir
>> of COREX after SWAP) to the index COREX  (now the dataDir of COREY.. the
>> orginal non-updated index of COREX) the best way for this or is there any
>> other better option.
>>
>> Once again, later when COREY is again updated with the latest, I will run
>> the SWAP again and it will be fine with COREX again pointing to its original
>> dataDir (now the updated one).So every even SWAP command run will point
>> COREX back to its original dataDir. (same case with COREY).
>>
>> My only concern is after the SWAP is done, updating the old index (which
>> was serving previously and now replaced by the new index). What is the best
>> way to do that? Physically copy the latest index to the old one and make it
>> in sync with the latest one so that by the time it is to get the latest
>> updates it has the latest in it so that the new ones can be added to this
>> and it becomes the latest and is again swapped?
>>
>
> Perhaps it is best if we take a step back and understand why you need two
> identical cores?
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Tomcat save my Index temp ...

2010-03-08 Thread stocki


okay, I installed my Solr just like the wiki says, and made a new try. Here
is one of my two files:


   






Jens Kapitza-2 wrote:
> 
> Am 08.03.2010 15:08, schrieb stocki:
>> Hello.
>>
>> is use 2 cores for solr.
>>
>> when is restart my tomcat on debian, tomcat delete my index.
>>
> you should check your tomcat-setup.
>> is set data.dir to
>> ${solr.data.dir:./suggest/data}
>> and
>> ${solr.data.dir:./search/data}
>>
>>
> use an absolute path [you have not set the solr.home path] this is 
> working/tmp dir from tomcat per default.
>> 
>>  > dataDir="/search/data/index"/>
>>  > dataDir="/suggest/data/index"/>
>> 
>>
>>
> is ok. but this is relative from solr.home.
>> so. why is my index only temp ?
>>
>>
> try to setup solr again.
> http://wiki.apache.org/solr/SolrTomcat
> 
> try to setup with Context fragment.
> 
> Create a Tomcat Context fragment to point /docBase/ to the 
> /$SOLR_HOME/apache-solr-1.3.0.war/ file and /solr/home/ to /$SOLR_HOME/:
> 
> 
> and avoid storing the data in .../tmp/
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Tomcat-save-my-Index-temp-...-tp27819967p27823287.html
Sent from the Solr - User mailing list archive at Nabble.com.



Wildcard question -- case issue

2010-03-08 Thread cjkadakia

I'm encountering a potential bug in Solr regarding wildcards. I have two
fields defined thusly:



  





  
  





  



and 


  








  
  






  


When searching with wildcards I get the following behavior.

Two Documents in the index are named "CMJ foo bar" and "CME foo bar"

The name field has been indexed twice as "name" and "namesimple"

query:

spell?q=name:(cm*) OR namesimple:(cm*)

returns:
CMJ foo bar
CME foo bar

spell?q=name:(CM*) OR namesimple:(CM*)
returns
No results.

I added a equivalent synonym for "cmj,CMJ" and re-indexed

spell?q=name:(CM*) OR namesimple:(CM*)
returns
CMJ foo bar

Naturally I can't see the value or practical use of adding a synonym for each
of these as they get reported by users. From the documentation I've read (as
well as feedback I received on these forums), I've found stemming can
interfere with wildcards during query and indexing, which is why the
namesimple field is of type "textgen." This solved other wildcard/case
issues, but this one remains.

Any suggestions would be appreciated. Thanks!
-- 
View this message in context: 
http://old.nabble.com/Wildcard-questioncase-issue-tp27823332p27823332.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Import database

2010-03-08 Thread Shawn Heisey
What database are you using?  Many of the JDBC drivers try to pull the
entire resultset into RAM before feeding it to the application that
requested the data.  If it's MySQL, I can show you how to fix it.  The
batchSize parameter below tells it to stream the data rather than buffer
it.  With other databases, I don't know how to do this.

<dataSource type="JdbcDataSource"
  driver="com.mysql.jdbc.Driver"
  url="jdbc:mysql://[SERVER]:3306/[SCHEMA]?zeroDateTimeBehavior=convertToNull"
  batchSize="-1"
  user="[REMOVED]"
  password="[REMOVED]"/>
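
Once the data source streams, the import can be re-run; a sketch, assuming
the DataImportHandler is registered at /dataimport as in the example
solrconfig.xml:

http://localhost:8983/solr/dataimport?command=full-import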






Shawn


On 3/8/2010 7:09 AM, Quan Nguyen Anh wrote:

Hi,
I have started using Solr. I had a problem when importing a database
with 2 million rows.

The server encounters an error: java.lang.OutOfMemoryError: Java heap space
I searched around  but can't find the solution.
Any hep regarding this will be appreciated.
Thanks in advance







Re: index merge

2010-03-08 Thread Shalin Shekhar Mangar
Hi Mark,

On Mon, Mar 8, 2010 at 9:23 PM, Mark Fletcher
wrote:

>
> My main purpose of having 2 identical cores
> COREX - always serves user request
> COREY - every day once, takes the updates/latest data and passess it on to
> COREX.
> is:-
>
> Suppose say I have only one COREY and suppose a request comes to COREY
> while the update of the latest data is happening on to it. Wouldn't it
> degrade performance of the requests at that point of time?
>

The thing to note is that both reads and writes are happening on the same
box. So when you swap cores, the OS has to cache the hot segments of the new
(inactive) index. If you were just re-opening the same (active) index, at
least some of the existing files could remain in the OS's file cache. I
think that may just degrade performance further so you should definitely
benchmark before going through with this.

The best practice is to use a master/slave architecture and separate the
writes and reads.


> So I was planning to keep COREX and COREY always identical. Once COREY has
> the latest it should somehow sync with COREX so that COREX also now has the
> latest. COREY keeps on getting the updates at a particular time of day and
> it will again pass it on to COREX. This process continues everyday.
>

You could use the same approach that Solr 1.3's snapinstaller script used.
It deletes the files and creates hard links to the new index files.
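
Roughly, a sketch of that approach from the shell (paths are illustrative;
cp -l creates hard links instead of copying the data):

rm -rf /path/to/coreX/data/index
cp -lr /path/to/coreY/data/index /path/to/coreX/data/index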

-- 
Regards,
Shalin Shekhar Mangar.


Re: index merge

2010-03-08 Thread Mark Miller

On 03/08/2010 10:53 AM, Mark Fletcher wrote:

Hi Shalin,

Thank you for the mail.
My main purpose of having 2 identical cores
COREX - always serves user request
COREY - every day once, takes the updates/latest data and passess it on to
COREX.
is:-

Suppose say I have only one COREY and suppose a request comes to COREY while
the update of the latest data is happening on to it. Wouldn't it degrade
performance of the requests at that point of time?
   
Yes - but you're not going to help anything by using two indexes; the best
you can do is use two boxes. Two indexes on the same box will actually
be worse than one if they are identical and you are swapping between
them. Writes on an index will not affect reads in the way you are
thinking, only in that they use IO and CPU that the read process can't.
That's going to happen with 2 indexes on the same box too, except now
you have way more data to cache and flip between, and you can't take any
advantage of things just being written possibly being in the cache for
reads.


Lucene indexes use a write-once strategy: when writing new segments,
you are not touching the segments being read from. Lucene is already
doing the index juggling for you at the segment level.



So I was planning to keep COREX and COREY always identical. Once COREY has
the latest it should somehow sync with COREX so that COREX also now has the
latest. COREY keeps on getting the updates at a particular time of day and
it will again pass it on to COREX. This process continues everyday.

What is the best possible way to implement this?

Thanks,

Mark.


On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar<
shalinman...@gmail.com>  wrote:

   

Hi Mark,

  On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher<
mark.fletcher2...@gmail.com>  wrote:

 

I ran the SWAP command. Now:-
COREX has the dataDir pointing to the updated dataDir of COREY. So COREX
has the latest.
Again, COREY (on which the update regularly runs) is pointing to the old
index of COREX. So this now doesnt have the most updated index.

Now shouldn't I update the index of COREY (now pointing to the old COREX)
so that it has the latest footprint as in COREX (having the latest COREY
index)so that when the update again happens to COREY, it has the latest and
I again do the SWAP.

Is a physical copying of the index  named COREY (the latest and now datDir
of COREX after SWAP) to the index COREX  (now the dataDir of COREY.. the
orginal non-updated index of COREX) the best way for this or is there any
other better option.

Once again, later when COREY is again updated with the latest, I will run
the SWAP again and it will be fine with COREX again pointing to its original
dataDir (now the updated one).So every even SWAP command run will point
COREX back to its original dataDir. (same case with COREY).

My only concern is after the SWAP is done, updating the old index (which
was serving previously and now replaced by the new index). What is the best
way to do that? Physically copy the latest index to the old one and make it
in sync with the latest one so that by the time it is to get the latest
updates it has the latest in it so that the new ones can be added to this
and it becomes the latest and is again swapped?

   

Perhaps it is best if we take a step back and understand why you need two
identical cores?

--
Regards,
Shalin Shekhar Mangar.

 
   



--
- Mark

http://www.lucidimagination.com





Re: Child entities in document not loading

2010-03-08 Thread Erick Erickson
Try http:///solr/admin. You'll see a bunch
of links that'll allow you to examine many aspects of your installation.

Additionally, get a copy of Luke (Google Lucene Luke) and point it at
your index for a detailed look at the index.

Finally, the SOLR log file might give you some clues...

HTH
Erick

On Mon, Mar 8, 2010 at 10:49 AM, John Ament  wrote:

> Where would I see this? I do believe the fields are not ending up in the
> index.
>
> Thanks
>
> John
>
> On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson  >wrote:
>
> > What does the solr admin page show you is actually in your index?
> >
> > Luke will also help.
> >
> > Erick
> >
> > On Mon, Mar 8, 2010 at 10:06 AM, John Ament 
> wrote:
> >
> > > All,
> > >
> > > So I think I have my first issue figured out, need to add terms to the
> > > default search.  That's fine.
> > >
> > > New issue is that I'm trying to load child entities in with my entity.
> > >
> > > I added the appropriate fields to solrconfig.xml
> > >
> > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> > >
> > > And I updated my document to match
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > So my expectation is that there will be 3 new fields associated with it
> > > that
> > > are multivalued: sizes, colors, and sections.
> > >
> > > The full-import seems to work correctly.  I get the appropriate number
> of
> > > documents in my searches.  However, sizes, colors and sections all come
> > up
> > > null (well, I should say they don't come up when I search for them).
> > >
> > > Any ideas on why it won't load these 3 child entities?
> > >
> > > Thanks!
> > >
> > > John
> > >
> >
>


Re: Child entities in document not loading

2010-03-08 Thread John Ament
Erick,

I'm sorry, but it's not helping much.  I don't see anything on the admin
screen that allows me to browse my index.  Even using Luke, my assumption is
that it's not loading correctly in the index.  What parameters can I change
in the logs to make it print out more information? I want to see what the
query is returning I guess.

Thanks,

John

On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson wrote:

> Try http:///solr/admin. You'll see a bunch
> of links that'll allow you to examine many aspects of your installation.
>
> Additionally, get a copy of Luke (Google Lucene Luke) and point it at
> your index for a detailed look at the index.
>
> Finally, the SOLR log file might give you some clues...
>
> HTH
> Erick
>
> On Mon, Mar 8, 2010 at 10:49 AM, John Ament  wrote:
>
> > Where would I see this? I do believe the fields are not ending up in the
> > index.
> >
> > Thanks
> >
> > John
> >
> > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson  > >wrote:
> >
> > > What does the solr admin page show you is actually in your index?
> > >
> > > Luke will also help.
> > >
> > > Erick
> > >
> > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament 
> > wrote:
> > >
> > > > All,
> > > >
> > > > So I think I have my first issue figured out, need to add terms to
> the
> > > > default search.  That's fine.
> > > >
> > > > New issue is that I'm trying to load child entities in with my
> entity.
> > > >
> > > > I added the appropriate fields to solrconfig.xml
> > > >
> > > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> > > >
> > > > And I updated my document to match
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > So my expectation is that there will be 3 new fields associated with
> it
> > > > that
> > > > are multivalued: sizes, colors, and sections.
> > > >
> > > > The full-import seems to work correctly.  I get the appropriate
> number
> > of
> > > > documents in my searches.  However, sizes, colors and sections all
> come
> > > up
> > > > null (well, I should say they don't come up when I search for them).
> > > >
> > > > Any ideas on why it won't load these 3 child entities?
> > > >
> > > > Thanks!
> > > >
> > > > John
> > > >
> > >
> >
>


Re: Child entities in document not loading

2010-03-08 Thread Erick Erickson
Sorry, won't be able to really look till tonight. Did you try Luke? What did
it
show?

One thing I did notice though...

field name="sections" type="string" indexed="true" stored="true"
multiValued="true"/>

string types are not analyzed, so the entire input is indexed as
a single token. You might want "text" here

Erick

On Mon, Mar 8, 2010 at 11:37 AM, John Ament  wrote:

> Erick,
>
> I'm sorry, but it's not helping much.  I don't see anything on the admin
> screen that allows me to browse my index.  Even using Luke, my assumption
> is
> that it's not loading correctly in the index.  What parameters can I change
> in the logs to make it print out more information? I want to see what the
> query is returning I guess.
>
> Thanks,
>
> John
>
> On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson  >wrote:
>
> > Try http:///solr/admin. You'll see a bunch
> > of links that'll allow you to examine many aspects of your installation.
> >
> > Additionally, get a copy of Luke (Google Lucene Luke) and point it at
> > your index for a detailed look at the index.
> >
> > Finally, the SOLR log file might give you some clues...
> >
> > HTH
> > Erick
> >
> > On Mon, Mar 8, 2010 at 10:49 AM, John Ament 
> wrote:
> >
> > > Where would I see this? I do believe the fields are not ending up in
> the
> > > index.
> > >
> > > Thanks
> > >
> > > John
> > >
> > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson <
> erickerick...@gmail.com
> > > >wrote:
> > >
> > > > What does the solr admin page show you is actually in your index?
> > > >
> > > > Luke will also help.
> > > >
> > > > Erick
> > > >
> > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament 
> > > wrote:
> > > >
> > > > > All,
> > > > >
> > > > > So I think I have my first issue figured out, need to add terms to
> > the
> > > > > default search.  That's fine.
> > > > >
> > > > > New issue is that I'm trying to load child entities in with my
> > entity.
> > > > >
> > > > > I added the appropriate fields to solrconfig.xml
> > > > >
> > > > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > >
> > > > > And I updated my document to match
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > So my expectation is that there will be 3 new fields associated
> with
> > it
> > > > > that
> > > > > are multivalued: sizes, colors, and sections.
> > > > >
> > > > > The full-import seems to work correctly.  I get the appropriate
> > number
> > > of
> > > > > documents in my searches.  However, sizes, colors and sections all
> > come
> > > > up
> > > > > null (well, I should say they don't come up when I search for
> them).
> > > > >
> > > > > Any ideas on why it won't load these 3 child entities?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > John
> > > > >
> > > >
> > >
> >
>


Re: Wildcard question -- case issue

2010-03-08 Thread Ahmet Arslan
> query:
> 
> spell?q=name:(cm*) OR namesimple:(cm*)
> 
> returns:
> CMJ foo bar
> CME foo bar
> 
> spell?q=name:(CM*) OR namesimple:(CM*)
> returns
> No results.

"Wildcard queries are not analyzed by Lucene and hence the behavior. [1]
[1]http://www.search-lucene.com/m?id=4a8ce9b2.2070...@ait.co.at||wildcard%20not%20analyzed


  


Re: Child entities in document not loading

2010-03-08 Thread John Ament
Another thing I don't get.  The system feels like it's doing the extra
queries.  I put the LogTransformer expecting to see additional output on one
of the child entities



And yet there is no additional output.
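
For comparison, a LogTransformer is normally declared on the entity itself;
a sketch with hypothetical entity and column names:

  <entity name="size" transformer="LogTransformer"
          logTemplate="size row: ${size.size}" logLevel="info"
          query="select size from sizes where product_id='${product.id}'">
    <field column="size" name="sizes"/>
  </entity>

Note that if logLevel is below the threshold of the logging configuration,
nothing is printed.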

Thanks,

John

On Mon, Mar 8, 2010 at 11:37 AM, John Ament  wrote:

> Erick,
>
> I'm sorry, but it's not helping much.  I don't see anything on the admin
> screen that allows me to browse my index.  Even using Luke, my assumption is
> that it's not loading correctly in the index.  What parameters can I change
> in the logs to make it print out more information? I want to see what the
> query is returning I guess.
>
> Thanks,
>
> John
>
>
> On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson 
> wrote:
>
>> Try http:///solr/admin. You'll see a bunch
>> of links that'll allow you to examine many aspects of your installation.
>>
>> Additionally, get a copy of Luke (Google Lucene Luke) and point it at
>> your index for a detailed look at the index.
>>
>> Finally, the SOLR log file might give you some clues...
>>
>> HTH
>> Erick
>>
>> On Mon, Mar 8, 2010 at 10:49 AM, John Ament  wrote:
>>
>> > Where would I see this? I do believe the fields are not ending up in the
>> > index.
>> >
>> > Thanks
>> >
>> > John
>> >
>> > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson <
>> erickerick...@gmail.com
>> > >wrote:
>> >
>> > > What does the solr admin page show you is actually in your index?
>> > >
>> > > Luke will also help.
>> > >
>> > > Erick
>> > >
>> > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament 
>> > wrote:
>> > >
>> > > > All,
>> > > >
>> > > > So I think I have my first issue figured out, need to add terms to
>> the
>> > > > default search.  That's fine.
>> > > >
>> > > > New issue is that I'm trying to load child entities in with my
>> entity.
>> > > >
>> > > > I added the appropriate fields to solrconfig.xml
>> > > >
>> > > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
>> > > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
>> > > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
>> > > >
>> > > > And I updated my document to match
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > So my expectation is that there will be 3 new fields associated with
>> it
>> > > > that
>> > > > are multivalued: sizes, colors, and sections.
>> > > >
>> > > > The full-import seems to work correctly.  I get the appropriate
>> number
>> > of
>> > > > documents in my searches.  However, sizes, colors and sections all
>> come
>> > > up
>> > > > null (well, I should say they don't come up when I search for them).
>> > > >
>> > > > Any ideas on why it won't load these 3 child entities?
>> > > >
>> > > > Thanks!
>> > > >
>> > > > John
>> > > >
>> > >
>> >
>>
>
>


Re: Child entities in document not loading

2010-03-08 Thread John Ament
The issue's not about indexing, the issue's about storage.  It seems like
the fields (sections, colors, sizes) are all not being stored, even though
stored="true" is set.

I could not get Luke to work, no.  The webstart just hangs at downloading
0%.

Thanks,

John

On Mon, Mar 8, 2010 at 12:06 PM, Erick Erickson wrote:

> Sorry, won't be able to really look till tonight. Did you try Luke? What
> did
> it
> show?
>
> One thing I did notice though...
>
> <field name="sections" type="string" indexed="true" stored="true"
> multiValued="true"/>
>
> string types are not analyzed, so the entire input is indexed as
> a single token. You might want "text" here
>
> Erick
>
> On Mon, Mar 8, 2010 at 11:37 AM, John Ament  wrote:
>
> > Erick,
> >
> > I'm sorry, but it's not helping much.  I don't see anything on the admin
> > screen that allows me to browse my index.  Even using Luke, my assumption
> > is
> > that it's not loading correctly in the index.  What parameters can I
> change
> > in the logs to make it print out more information? I want to see what the
> > query is returning I guess.
> >
> > Thanks,
> >
> > John
> >
> > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson  > >wrote:
> >
> > > Try http:///solr/admin. You'll see a bunch
> > > of links that'll allow you to examine many aspects of your
> installation.
> > >
> > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at
> > > your index for a detailed look at the index.
> > >
> > > Finally, the SOLR log file might give you some clues...
> > >
> > > HTH
> > > Erick
> > >
> > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament 
> > wrote:
> > >
> > > > Where would I see this? I do believe the fields are not ending up in
> > the
> > > > index.
> > > >
> > > > Thanks
> > > >
> > > > John
> > > >
> > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson <
> > erickerick...@gmail.com
> > > > >wrote:
> > > >
> > > > > What does the solr admin page show you is actually in your index?
> > > > >
> > > > > Luke will also help.
> > > > >
> > > > > Erick
> > > > >
> > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament 
> > > > wrote:
> > > > >
> > > > > > All,
> > > > > >
> > > > > > So I think I have my first issue figured out, need to add terms
> to
> > > the
> > > > > > default search.  That's fine.
> > > > > >
> > > > > > New issue is that I'm trying to load child entities in with my
> > > entity.
> > > > > >
> > > > > > I added the appropriate fields to solrconfig.xml
> > > > > >
> > > > > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> > > > > >
> > > > > > And I updated my document to match
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > So my expectation is that there will be 3 new fields associated
> > with
> > > it
> > > > > > that
> > > > > > are multivalued: sizes, colors, and sections.
> > > > > >
> > > > > > The full-import seems to work correctly.  I get the appropriate
> > > number
> > > > of
> > > > > > documents in my searches.  However, sizes, colors and sections
> all
> > > come
> > > > > up
> > > > > > null (well, I should say they don't come up when I search for
> > them).
> > > > > >
> > > > > > Any ideas on why it won't load these 3 child entities?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > John
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Child entities in document not loading

2010-03-08 Thread John Ament
Ok - downloaded the binary off of google code and it's loading.  The 3 child
entities do not appear as I had suspected.

Thanks,

John

On Mon, Mar 8, 2010 at 12:12 PM, John Ament  wrote:

> The issue's not about indexing, the issue's about storage.  It seems like
> the fields (sections, colors, sizes) are all not being stored, even though
> store=true.
>
> I could not get Luke to work, no.  The webstart just hangs at downloading
> 0%.
>
> Thanks,
>
> John
>
>
> On Mon, Mar 8, 2010 at 12:06 PM, Erick Erickson 
> wrote:
>
>> Sorry, won't be able to really look till tonight. Did you try Luke? What
>> did
>> it
>> show?
>>
>> One thing I did notice though...
>>
>> <field name="sections" type="string" indexed="true" stored="true"
>> multiValued="true"/>
>>
>> string types are not analyzed, so the entire input is indexed as
>> a single token. You might want "text" here
>>
>> Erick
>>
>> On Mon, Mar 8, 2010 at 11:37 AM, John Ament  wrote:
>>
>> > Erick,
>> >
>> > I'm sorry, but it's not helping much.  I don't see anything on the admin
>> > screen that allows me to browse my index.  Even using Luke, my
>> assumption
>> > is
>> > that it's not loading correctly in the index.  What parameters can I
>> change
>> > in the logs to make it print out more information? I want to see what
>> the
>> > query is returning I guess.
>> >
>> > Thanks,
>> >
>> > John
>> >
>> > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson <
>> erickerick...@gmail.com
>> > >wrote:
>> >
>> > > Try http:///solr/admin. You'll see a
>> bunch
>> > > of links that'll allow you to examine many aspects of your
>> installation.
>> > >
>> > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at
>> > > your index for a detailed look at the index.
>> > >
>> > > Finally, the SOLR log file might give you some clues...
>> > >
>> > > HTH
>> > > Erick
>> > >
>> > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament 
>> > wrote:
>> > >
>> > > > Where would I see this? I do believe the fields are not ending up in
>> > the
>> > > > index.
>> > > >
>> > > > Thanks
>> > > >
>> > > > John
>> > > >
>> > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson <
>> > erickerick...@gmail.com
>> > > > >wrote:
>> > > >
>> > > > > What does the solr admin page show you is actually in your index?
>> > > > >
>> > > > > Luke will also help.
>> > > > >
>> > > > > Erick
>> > > > >
>> > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament > >
>> > > > wrote:
>> > > > >
>> > > > > > All,
>> > > > > >
>> > > > > > So I think I have my first issue figured out, need to add terms
>> to
>> > > the
>> > > > > > default search.  That's fine.
>> > > > > >
>> > > > > > New issue is that I'm trying to load child entities in with my
>> > > entity.
>> > > > > >
>> > > > > > I added the appropriate fields to solrconfig.xml
>> > > > > >
>> > > > > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
>> > > > > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
>> > > > > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
>> > > > > >
>> > > > > > And I updated my document to match
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > So my expectation is that there will be 3 new fields associated
>> > with
>> > > it
>> > > > > > that
>> > > > > > are multivalued: sizes, colors, and sections.
>> > > > > >
>> > > > > > The full-import seems to work correctly.  I get the appropriate
>> > > number
>> > > > of
>> > > > > > documents in my searches.  However, sizes, colors and sections
>> all
>> > > come
>> > > > > up
>> > > > > > null (well, I should say they don't come up when I search for
>> > them).
>> > > > > >
>> > > > > > Any ideas on why it won't load these 3 child entities?
>> > > > > >
>> > > > > > Thanks!
>> > > > > >
>> > > > > > John
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


DataInputHandlers and dynamic fields

2010-03-08 Thread Kevin Osborn
If my query were something like this: "select col1, col2 from table", my 
dynamic field would be something like "fld_${col1}". But I could not find any 
information on how to setup the DIH with dynamic fields. I saw that dynamic 
fields should be supported with SOLR-742, but am not sure how to proceed. Does 
anyone have an example or information on how to set it up?
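
A minimal sketch of the kind of mapping SOLR-742 is meant to enable,
assuming the DIH variable resolver is applied to the field's name attribute
(the entity, table, and column names below are placeholders):

  <entity name="t" query="select col1, col2 from table">
    <!-- the name is resolved per row, producing e.g. fld_size, fld_color -->
    <field column="col2" name="fld_${t.col1}"/>
  </entity>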



  

Re: SOLR takes more than 9 hours to index 300000 rows

2010-03-08 Thread JavaGuy84

Shawn,

Increasing the fetch size and increasing my heap based on that did the
trick. Thanks a lot for your help; your suggestions helped me a lot.

Hope these suggestions will be helpful to others too who are facing a
similar kind of issue.
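
For anyone hitting this later, the cached-entity pattern looks roughly like
this (a sketch; the table and column names are made up):

  <entity name="product" query="select id, name from product">
    <!-- the child table is read once and cached, so lookups happen in
         memory instead of one SQL query per parent row -->
    <entity name="price" processor="CachedSqlEntityProcessor"
            query="select product_id, price from price"
            where="product_id=product.id"/>
  </entity>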

Thanks,
Barani

Shawn Heisey-4 wrote:
> 
> Do keep looking into the batchSize, but I think I might have found the 
> issue.  If I understand things correctly, you will need to add 
> processor="CachedSqlEntityProcessor" to your first entity.  It's only 
> specified on the other two.  Assuming you have enough RAM and heap space 
> available in your JVM to load the results of all three queries, that 
> ought to make it work very quickly.
> 
> If I'm right, basically what it's doing is issuing a real SQL query 
> against your first table for every entry it has read for the other two 
> tables.
> 
> Shawn
> 
> On 3/6/2010 11:58 AM, JavaGuy84 wrote:
>> Shawn,
>>
>> Thanks a lot for your response,
>>
>> Yes, still the DB connection is active.. It is still fetching the data
>> from
>> the DB.
>>
>> I am using Redhat MetaMatrix DB as backend and I am trying to find out
>> the
>> parameter for setting the JDBC fetch size..
>>
>> Do you think that this problem will be mostly due to fetch size?
>>
>> Thanks,
>> Barani
>>
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/SOLR-takes-more-than-9-hours-to-index-30-rows-tp27805403p27825172.html
Sent from the Solr - User mailing list archive at Nabble.com.



Search on dynamic fields which contains spaces /special characters

2010-03-08 Thread JavaGuy84

Hi,

We have some dynamic fields getting indexed using SOLR. Some of the dynamic
fields contain spaces / special characters (something like: short name, Full
Name, etc.). Is there a way to search on these fields (which contain the
spaces etc.)? Can someone let me know the filter I need to pass to do this
type of search?

I tried with short name:name1 --> this didn't work.

Thanks,
Barani
-- 
View this message in context: 
http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Search on dynamic fields which contains spaces /special characters

2010-03-08 Thread Israel Ekpo
I do not believe the SOLR or LUCENE syntax allows this

You need to get rid of all the spaces in the field name

If not, then you will be searching for "short" in the default field and then
"name1" in the "name" field.

http://wiki.apache.org/solr/SolrQuerySyntax

http://lucene.apache.org/java/2_9_2/queryparsersyntax.html


On Mon, Mar 8, 2010 at 2:17 PM, JavaGuy84  wrote:

>
> Hi,
>
> We have some dynamic fields getting indexed using SOLR. Some of the dynamic
> fields contains spaces / special character (something like: short name,
> Full
> Name etc...). Is there a way to search on these fields (which contains the
> spaces etc..). Can someone let me know the filter I need to pass to do this
> type of search?
>
> I tried with short name:name1 --> this didnt work..
>
> Thanks,
> Barani
> --
> View this message in context:
> http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Child entities in document not loading

2010-03-08 Thread John Ament
All

It seems like my issue is simply about the concept of child entities.

I had to add a second table to my query to pull pricing info.  At first, I
was putting it in a separate entity.  Didn't work, even though I added the
fields.

When I rewrote my query as



It loaded.

I'm wondering if there's something I have to activate to make child entities
work?
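
For the record, nothing needs to be switched on for child entities; a nested
entity just has to map its own columns. A sketch using this thread's field
names (the SQL is made up):

  <entity name="product" query="select id, name from products">
    <entity name="size"
            query="select size from sizes where product_id='${product.id}'">
      <field column="size" name="sizes"/>
    </entity>
  </entity>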

Thanks,

John

On Mon, Mar 8, 2010 at 12:17 PM, John Ament  wrote:

> Ok - downloaded the binary off of google code and it's loading.  The 3
> child entities do not appear as I had suspected.
>
> Thanks,
>
> John
>
>
> On Mon, Mar 8, 2010 at 12:12 PM, John Ament  wrote:
>
>> The issue's not about indexing, the issue's about storage.  It seems like
>> the fields (sections, colors, sizes) are all not being stored, even though
>> store=true.
>>
>> I could not get Luke to work, no.  The webstart just hangs at downloading
>> 0%.
>>
>> Thanks,
>>
>> John
>>
>>
>> On Mon, Mar 8, 2010 at 12:06 PM, Erick Erickson 
>> wrote:
>>
>>> Sorry, won't be able to really look till tonight. Did you try Luke? What
>>> did
>>> it
>>> show?
>>>
>>> One thing I did notice though...
>>>
>>> <field name="sections" type="string" indexed="true" stored="true"
>>> multiValued="true"/>
>>>
>>> string types are not analyzed, so the entire input is indexed as
>>> a single token. You might want "text" here
>>>
>>> Erick
>>>
>>> On Mon, Mar 8, 2010 at 11:37 AM, John Ament 
>>> wrote:
>>>
>>> > Erick,
>>> >
>>> > I'm sorry, but it's not helping much.  I don't see anything on the
>>> admin
>>> > screen that allows me to browse my index.  Even using Luke, my
>>> assumption
>>> > is
>>> > that it's not loading correctly in the index.  What parameters can I
>>> change
>>> > in the logs to make it print out more information? I want to see what
>>> the
>>> > query is returning I guess.
>>> >
>>> > Thanks,
>>> >
>>> > John
>>> >
>>> > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson <
>>> erickerick...@gmail.com
>>> > >wrote:
>>> >
>>> > > Try http:///solr/admin. You'll see a
>>> bunch
>>> > > of links that'll allow you to examine many aspects of your
>>> installation.
>>> > >
>>> > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at
>>> > > your index for a detailed look at the index.
>>> > >
>>> > > Finally, the SOLR log file might give you some clues...
>>> > >
>>> > > HTH
>>> > > Erick
>>> > >
>>> > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament 
>>> > wrote:
>>> > >
>>> > > > Where would I see this? I do believe the fields are not ending up
>>> in
>>> > the
>>> > > > index.
>>> > > >
>>> > > > Thanks
>>> > > >
>>> > > > John
>>> > > >
>>> > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson <
>>> > erickerick...@gmail.com
>>> > > > >wrote:
>>> > > >
>>> > > > > What does the solr admin page show you is actually in your index?
>>> > > > >
>>> > > > > Luke will also help.
>>> > > > >
>>> > > > > Erick
>>> > > > >
>>> > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament <
>>> my.repr...@gmail.com>
>>> > > > wrote:
>>> > > > >
>>> > > > > > All,
>>> > > > > >
>>> > > > > > So I think I have my first issue figured out, need to add terms
>>> to
>>> > > the
>>> > > > > > default search.  That's fine.
>>> > > > > >
>>> > > > > > New issue is that I'm trying to load child entities in with my
>>> > > entity.
>>> > > > > >
>>> > > > > > I added the appropriate fields to solrconfig.xml
>>> > > > > >
>>> > > > > > <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
>>> > > > > > <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
>>> > > > > > <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
>>> > > > > >
>>> > > > > > And I updated my document to match
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > So my expectation is that there will be 3 new fields associated
>>> > with
>>> > > it
>>> > > > > > that
>>> > > > > > are multivalued: sizes, colors, and sections.
>>> > > > > >
>>> > > > > > The full-import seems to work correctly.  I get the appropriate
>>> > > number
>>> > > > of
>>> > > > > > documents in my searches.  However, sizes, colors and sections
>>> all
>>> > > come
>>> > > > > up
>>> > > > > > null (well, I should say they don't come up when I search for
>>> > them).
>>> > > > > >
>>> > > > > > Any ideas on why it won't load these 3 child entities?
>>> > > > > >
>>> > > > > > Thanks!
>>> > > > > >
>>> > > > > > John
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Solr Startup CPU Spike

2010-03-08 Thread John Williams
Good afternoon.

We have been experiencing an odd issue with one of our Solr nodes. Upon startup 
or when bringing in a new index we get a CPU spike for 5 minutes or so. I have 
attached a graph of this spike. During this time simple queries return without 
a problem but more complex queries do not return. Here are some more details 
about the instance:

Index Size: ~16G
Max Heap: 6144M
GC Option: -XX:+UseConcMarkSweepGC
System Memory: 16G

We have a very similar instance to this one, but with a much larger index,
where we are not seeing this sort of issue.

Your help is greatly appreciated. Let me know if you need any additional 
information.

Thanks,
John

--
John Williams
System Administrator
37signals




Re: Solr Startup CPU Spike

2010-03-08 Thread Yonik Seeley
Is this just autowarming?
Check your autowarmCount parameters in solrconfig.xml
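
The settings in question live on the caches in solrconfig.xml; a sketch,
with arbitrary sizes and counts:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>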

-Yonik
http://www.lucidimagination.com

On Mon, Mar 8, 2010 at 5:37 PM, John Williams  wrote:
> Good afternoon.
>
> We have been experiencing an odd issue with one of our Solr nodes. Upon 
> startup or when bringing in a new index we get a CPU spike for 5 minutes or 
> so. I have attached a graph of this spike. During this time simple queries 
> return without a problem but more complex queries do not return. Here are 
> some more details about the instance:
>
> Index Size: ~16G
> Max Heap: 6144M
> GC Option: -XX:+UseConcMarkSweepGC
> System Memory: 16G
>
> We have a very similar instance to this one but with a much larger index that 
> we are not seeing this sort of issue.
>
> Your help is greatly appreciated. Let me know if you need any additional 
> information.
>
> Thanks,
> John
>
> --
> John Williams
> System Administrator
> 37signals
>
>
>


Re: Solr Startup CPU Spike

2010-03-08 Thread John Williams
Yonik,

In all cases our "autowarmCount" is set to 0. Also, here is a link to our 
config. http://pastebin.com/iUgruqPd

Thanks,
John

--
John Williams
System Administrator
37signals

On Mar 8, 2010, at 4:44 PM, Yonik Seeley wrote:

> Is this just autowarming?
> Check your autowarmCount parameters in solrconfig.xml
> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Mon, Mar 8, 2010 at 5:37 PM, John Williams  wrote:
>> Good afternoon.
>> 
>> We have been experiencing an odd issue with one of our Solr nodes. Upon 
>> startup or when bringing in a new index we get a CPU spike for 5 minutes or 
>> so. I have attached a graph of this spike. During this time simple queries 
>> return without a problem but more complex queries do not return. Here are 
>> some more details about the instance:
>> 
>> Index Size: ~16G
>> Max Heap: 6144M
>> GC Option: -XX:+UseConcMarkSweepGC
>> System Memory: 16G
>> 
>> We have a very similar instance to this one but with a much larger index 
>> that we are not seeing this sort of issue.
>> 
>> Your help is greatly appreciated. Let me know if you need any additional 
>> information.
>> 
>> Thanks,
>> John
>> 
>> --
>> John Williams
>> System Administrator
>> 37signals
>> 
>> 
>> 





Re: Solr Startup CPU Spike

2010-03-08 Thread Yonik Seeley
On Mon, Mar 8, 2010 at 6:07 PM, John Williams  wrote:
> Yonik,
>
> In all cases our "autowarmCount" is set to 0. Also, here is a link to our 
> config. http://pastebin.com/iUgruqPd

Weird... on a quick glance, I don't see anything in your config that
would cause work to be done on a commit.
I expected something like autowarming, or rebuilding a spellcheck
index, etc.  I assume this is happening even w/o any requests hitting
the server?

Could it be GC?  You could use -verbose:gc or jconsole to check if
this corresponds to a big GC (which could naturally hit on an index
change).  5 minutes is really excessive though, and I wouldn't expect
it on startup.

If it's not GC, perhaps the next step is to get some stack traces
during the spike (or use a profiler) to figure out where the time is
being spent.  And verify that the solrconfig.xml shown actually still
matches the one you provided.
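
For the archive, the sort of commands meant here (HotSpot JVM flags; the
heap size is the one from this thread):

  # log GC activity so a pause can be correlated with the spike
  java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xmx6144m -jar start.jar

  # dump stack traces of the running JVM while the spike is happening
  jstack <solr-jvm-pid>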

-Yonik
http://www.lucidimagination.com



> Thanks,
> John
>
> --
> John Williams
> System Administrator
> 37signals
>
> On Mar 8, 2010, at 4:44 PM, Yonik Seeley wrote:
>
>> Is this just autowarming?
>> Check your autowarmCount parameters in solrconfig.xml
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Mon, Mar 8, 2010 at 5:37 PM, John Williams  wrote:
>>> Good afternoon.
>>>
>>> We have been experiencing an odd issue with one of our Solr nodes. Upon 
>>> startup or when bringing in a new index we get a CPU spike for 5 minutes or 
>>> so. I have attached a graph of this spike. During this time simple queries 
>>> return without a problem but more complex queries do not return. Here are 
>>> some more details about the instance:
>>>
>>> Index Size: ~16G
>>> Max Heap: 6144M
>>> GC Option: -XX:+UseConcMarkSweepGC
>>> System Memory: 16G
>>>
>>> We have a very similar instance to this one but with a much larger index 
>>> that we are not seeing this sort of issue.
>>>
>>> Your help is greatly appreciated. Let me know if you need any additional 
>>> information.
>>>
>>> Thanks,
>>> John
>>>
>>> --
>>> John Williams
>>> System Administrator
>>> 37signals
>>>
>>>
>>>
>
>


PDF extraction leads to reversed words

2010-03-08 Thread Abdelhamid ABID
Hi,
When I post Arabic PDF files to Solr using a web form (to solr/update/extract),
the extracted text comes back with each word in reverse direction (instead
of right to left).
When I search against these texts with (always) reversed keywords, I get
results, but reversed.
This problem doesn't occur when posting MS Word documents.
I think the problem comes from Tika!

Any clue ?

-- 
elsadek
Software Engineer- J2EE / WEB / ESB MULE


Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-08 Thread Don Werve
Too bad it requires integer (long) primary keys... :/

2010/3/8 Ian Holsman 

>
> I just saw this on twitter, and thought you guys would be interested.. I
> haven't tried it, but it looks interesting.
>
> http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin
>
> Thanks for the RT Shalin!
>


Re: PDF extraction leads to reversed words

2010-03-08 Thread Robert Muir
I think the problem is that Solr does not include the ICU4J jar, so it
won't work with Arabic PDF files.

Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your classpath.
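
For the bundled Jetty example that is something like the following (a
sketch; the jar name and lib directory depend on the download and on how
Solr is deployed):

  cp icu4j-3.8.jar example/solr/lib/
  # or, for a war deployment, into the webapp's WEB-INF/lib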

On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid  ABID  wrote:
> Hi,
> Posting arabic pdf files to Solr using a web form (to solr/update/extract)
> get extracted texts and each words displayed in reverse direction(instead of
> right to left).
> When perform search against these texts with -always- reversed key-words I
> get results but reversed.
> This problem doesn't occur when posting MsWord document.
> I think the problem come from Tika !
>
> Any clue ?
>
> --
> elsadek
> Software Engineer- J2EE / WEB / ESB MULE
>



-- 
Robert Muir
rcm...@gmail.com


Re: which links do i have to follow to understand location based search concepts?

2010-03-08 Thread KshamaPai

Hi,
During indexing it's taking localhost and port 8983,
index: 
 [echo] Indexing ./data/ 
 [java] ./data/   http://localhost:8983/solr

As for the other case, where the Solr instance is not running: what may be
the reason that Solr is not running? (I am new to Solr.)

You mean it has nothing to do with the XML, since it's giving

at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
 [java] at
com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
 [java] at
com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
 [java] at
com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
 [java] at
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
 [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
 [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
 [java] at OSM2Solr.process(OSM2Solr.java:44)
 [java] at Driver.main(Driver.java:79)
 [java] Caused by: java.net.ConnectException: Connection refused
 [java] at java.net.PlainSocketImpl.socketConnect(Native Method)
 [java] at
java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)


and line 79 in Driver.java is
 numIndexed += o2s.process(new FileInputStream(file)); // where it's taking my
xml file.
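
The usual first check for "Connection refused" is whether Solr is actually
listening on that port; with the example distribution it is started with:

  cd apache-solr-1.4.0/example
  java -jar start.jar

and http://localhost:8983/solr/admin should then load in a browser.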




Shalin Shekhar Mangar wrote:
> 
> On Mon, Mar 8, 2010 at 6:21 PM, KshamaPai  wrote:
> 
>>
>> Hi,
>> Thank You for explaining it in a simple way.
>> The article really helped me to understand the concepts better.
>>
>> My question is: is it necessary that the data you are indexing in the
>> spatial example be in the OSM format, using the facts files?
>>  In my case, I am trying to index data that has just lat, longitude and a
>> related
>> news item (just text) in an XML file which looks like this
>>
>> 
>> 
>>
>> 
>> 
>>
>> I have slightly modified Driver.java and other .java files in
>> src/main/java
>> folder, so that these fields are considered for indexing (but have
>> retained
>> geohash, lat_rad, lng_rad as done in the spatial example).
>>
>> But when i do ant index , am getting
>>
>> Buildfile: build.xml
>>
>> init:
>>
>> compile:
>>
>> index:
>> [echo] Indexing ./data/
>> [java] ./data/   http://localhost:8983/solr
>> [java] Num args: 2
>> [java] Starting indexing
>> [java] Indexing: ./data/final.xml
>> [java] Mar 8, 2010 4:40:35 AM
>> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> [java] INFO: I/O exception (java.net.ConnectException) caught when
>> processing request: Connection refused
>>
> 
> The "Connection refused" message suggests that your Solr instance is
> either
> not running or you have given the wrong host/port in your driver.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/which-links-do-i-have-to-follow-to-understand-location-based-search-concepts--tp27811139p27830412.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrJ commit options

2010-03-08 Thread Lance Norskog
waitFlush=true means that the commit HTTP call waits until everything
is sent to disk before it returns.
waitSearcher=true means that the commit HTTP call waits until Solr has
reloaded the index and is ready to search against it. (For more, study
Solr warming up.)

Both of these mean that the HTTP call (or curl program or Solrj
program) that started the commit, waits until it is done. Other
processes doing searches against the index are not blocked. However,
the commit may have so much disk activity that the other searches do
not proceeed very fast. They are not completely blocked.

The commit will take as long as it takes, and your results will appear
after that. If you want to time that, use
waitFlush=true&waitSearcher=true.
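
Over plain HTTP the same options look like this; SolrJ's
commit(waitFlush, waitSearcher) maps to the same parameters:

  curl 'http://localhost:8983/solr/update?commit=true&waitFlush=true&waitSearcher=true'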

On Fri, Mar 5, 2010 at 9:39 PM, gunjan_versata  wrote:
>
> But can anyone explain me the use of these parameters.. I have read upon it..
> what i could  not understand was.. if can i set both the params to false,
> after how much time will my changes start reflecting?
>
> --
> View this message in context: 
> http://old.nabble.com/SolrJ-commit-options-tp27714405p27802041.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Import database

2010-03-08 Thread Quan Nguyen Anh

On 3/8/2010 9:21 PM, Lee Smith wrote:

I had the same issue with Jetty.

Adding extra memory resolved my issue, i.e.: java -Xms512M -Xmx1024M -jar
start.jar

It's in the manual, but I can't seem to find the link.


On 8 Mar 2010, at 14:09, Quan Nguyen Anh wrote:

   

Hi,
I have started using Solr. I had a problem when I imported a database with 2
million rows.
The server encounters an error: java.lang.OutOfMemoryError: Java heap space
I searched around but can't find the solution.
Any help regarding this will be appreciated.
Thanks in advance



 


  
I tried this solution but the server throws the same error. I think that
the heap size is not large enough to hold the data from MySQL.




Re: Import database

2010-03-08 Thread Quan Nguyen Anh

On 3/8/2010 11:05 PM, Shawn Heisey wrote:
What database are you using?  Many of the JDBC drivers try to pull the 
entire resultset into RAM before feeding it to the application that 
requested the data.  If it's MySQL, I can show you how to fix it.  The 
batchSize parameter below tells it to stream the data rather than 
buffer it.  With other databases, I don't know how to do this.



  
url="jdbc:mysql://[SERVER]:3306/[SCHEMA]?zeroDateTimeBehavior=convertToNull" 


  batchSize="-1"
  user="[REMOVED]"
  password="[REMOVED]"/>






Shawn


On 3/8/2010 7:09 AM, Quan Nguyen Anh wrote:

Hi,
I have started using Solr. I had a problem when I imported a database
with 2 million rows.

The server encounters an error: java.lang.OutOfMemoryError: Java heap space
I searched around but can't find the solution.
Any help regarding this will be appreciated.
Thanks in advance 

I'm using MySQL.
This solution fixed my problem. Thanks for your help.


Re: Can't delete from curl

2010-03-08 Thread Lance Norskog
... curl http://xen1.xcski.com:8080/solrChunk/nutch/select

that should be /update, not /select
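
In other words, the working sequence against the URL from this thread is:

  curl http://xen1.xcski.com:8080/solrChunk/nutch/update -H 'Content-Type: text/xml' \
       --data-binary '<delete><query>category:Banks</query></delete>'
  curl http://xen1.xcski.com:8080/solrChunk/nutch/update -H 'Content-Type: text/xml' \
       --data-binary '<commit/>'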

On Sun, Mar 7, 2010 at 4:32 PM, Paul Tomblin  wrote:
> On Tue, Mar 2, 2010 at 1:22 AM, Lance Norskog  wrote:
>
>> On Mon, Mar 1, 2010 at 4:02 PM, Paul Tomblin  wrote:
>> > I have a schema with a field named "category" (<field name="category" type="string" stored="true" indexed="true"/>).  I'm trying to delete
>> > everything with a certain value of category with curl:...
>> >
>> > I send:
>> >
>> > curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type:
>> > text/xml" --data-binary '<delete><query>category:Banks</query></delete>'
>> >
>> > Response is:
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <response>
>> > <lst name="responseHeader"><int name="status">0</int><int name="QTime">23</int></lst>
>> > </response>
>> >
>> > I send
>> >
>> > curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type:
>> > text/xml" --data-binary '<commit/>'
>> >
>> > Response is:
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <response>
>> > <lst name="responseHeader"><int name="status">0</int><int name="QTime">1914</int></lst>
>> > </response>
>> >
>> > but when I go back and query, it shows all the same results as before.
>> >
>> > Why isn't it deleting?
>>
>> Do you query with curl also? If you use a web browser, Solr by default
>> uses http caching, so your browser will show you the old result of the
>> query.
>>
>>
> I think you're right about that.  I tried using curl, and it did go to zero.
>  But now I've got a different problem: sometimes when I try to commit, I get
> a NullPointerException:
>
>
> curl http://xen1.xcski.com:8080/solrChunk/nutch/select -H "Content-Type:
> text/xml" --data-binary '<commit/>'
> Apache Tomcat/6.0.20 - Error report
> HTTP Status 500 - null
>
> java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:33)
> at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173)
> at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
> at org.apache.solr.search.QParser.getQuery(QParser.java:131)
> at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> at
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> at java.lang.Thread.run(Thread.java:619)
> type: Status report
> message: null
>
> java.lang.NullPointerException
> at java.io.StringReader.<init>(StringReader.java:33)
> at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173)
> at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
> at org.apache.solr.search.QParser.getQuery(QParser.java:131)
> at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>
>
> --
> http://www.linkedin.com/in/paultomblin
> http://careers.stackoverflow.com/ptomblin
>



-- 
Lance Norskog
goks...@gmail.com


Re: Search on dynamic fields which contains spaces /special characters

2010-03-08 Thread Dennis Gearon
I'm starting to learn Solr/Lucene. I'm working on a shared server and have to
use a stand-alone Java install. Can anyone tell me how to install OpenJDK for
a shared server account?


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Mon, 3/8/10, Israel Ekpo  wrote:

> From: Israel Ekpo 
> Subject: Re: Search on dynamic fields which contains spaces /special  
> characters
> To: solr-user@lucene.apache.org
> Date: Monday, March 8, 2010, 12:44 PM
> I do not believe the SOLR or LUCENE
> syntax allows this
> 
> You need to get rid of all the spaces in the field name
> 
> If not, then you will be searching for "short" in the
> default field and then
> "name1" in the "name" field.
> 
> http://wiki.apache.org/solr/SolrQuerySyntax
> 
> http://lucene.apache.org/java/2_9_2/queryparsersyntax.html
> 
> 
> On Mon, Mar 8, 2010 at 2:17 PM, JavaGuy84 
> wrote:
> 
> >
> > Hi,
> >
> > We have some dynamic fields getting indexed using
> SOLR. Some of the dynamic
> > fields contains spaces / special character (something
> like: short name,
> > Full
> > Name etc...). Is there a way to search on these fields
> (which contains the
> > spaces etc..). Can someone let me know the filter I
> need to pass to do this
> > type of search?
> >
> > I tried with short name:name1 --> this didnt
> work..
> >
> > Thanks,
> > Barani
> > --
> > View this message in context:
> > http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html
> > Sent from the Solr - User mailing list archive at
> Nabble.com.
> >
> >
> 
> 
> -- 
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the
> gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
> 


Re: More contextual information in analyser

2010-03-08 Thread Lance Norskog
This is an interesting idea. There are other projects to make the
analyzer/filter chain more "porous", or open to outside interaction.

A big problem is that queries are analyzed, too. If you want to give
the same metadata to the analyzer when doing a query against the
field, things get tough. You would need a special query parser to
implement your own syntax to do that. However, the analyzer chain in
the query phase does not receive the parsed query, so you would have to
change this in some way.

On Mon, Mar 8, 2010 at 2:14 AM, dbejean  wrote:
>
> Hello,
>
> If I write a custom analyser that accept a specific attribut in the
> constructor
>
> public MyCustomAnalyzer(String myAttribute);
>
> Is there a way to dynamically send a value for this attribute from Solr at
> index time in the XML Message ?
>
> 
>  
>    .
>
>
> Obviously, in Sorl shema.xml, the "content" field is associated to my custom
> Analyser.
>
> Thank you.
>
> Dominique
>
> --
> View this message in context: 
> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: More contextual information in analyser

2010-03-08 Thread Jon Baer
Isn't this what Lucene/Solr payloads are theoretically for?

ie: 
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

- Jon

On Mar 8, 2010, at 11:15 PM, Lance Norskog wrote:

> This is an interesting idea. There are other projects to make the
> analyzer/filter chain more "porous", or open to outside interaction.
> 
> A big problem is that queries are analyzed, too. If you want to give
> the same metadata to the analyzer when doing a query against the
> field, things get tough. You would need a special query parser to
> implement your own syntax to do that. However, the analyzer chain in
> the query phase does not receive the parsed query, so you have to in
> some way change this.
> 
> On Mon, Mar 8, 2010 at 2:14 AM, dbejean  wrote:
>> 
>> Hello,
>> 
>> If I write a custom analyser that accept a specific attribut in the
>> constructor
>> 
>> public MyCustomAnalyzer(String myAttribute);
>> 
>> Is there a way to dynamically send a value for this attribute from Solr at
>> index time in the XML Message ?
>> 
>> 
>>  
>>.
>> 
>> 
>> Obviously, in Sorl shema.xml, the "content" field is associated to my custom
>> Analyser.
>> 
>> Thank you.
>> 
>> Dominique
>> 
>> --
>> View this message in context: 
>> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com



Re: HTML encode extracted docs

2010-03-08 Thread Lance Norskog
A Tika integration with the DataImportHandler is in the Solr trunk.
With this, you can copy the raw HTML into different fields and process
one copy with Tika.

If it's just straight HTML, would the HTMLStripCharFilter be good enough?

http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2
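
A minimal sketch of a field type using it (Solr 1.4 syntax). Note that a
char filter only affects the indexed tokens; the stored value is kept
verbatim, so this does not HTML-encode what highlighting returns:

  <fieldType name="html_text" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>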

On Mon, Mar 8, 2010 at 5:50 AM, Mark Roberts  wrote:
> I'm uploading .htm files to be extracted - some of these files are "include" 
> files that have snippets of HTML rather than fully formed html documents.
>
> solr-cell stores the raw HTML for these items, rather than extracting the 
> text. Is there any way I can get solr to encode this content prior to storing 
> it?
>
> At the moment, I have the problem that when the highlighted snippets are
> retrieved via search, I need to parse the snippet and HTML-encode the bits of
> HTML that were indexed, whilst *not* encoding the bits that were added by
> the highlighter, which is messy and time consuming.
>
> Thanks! Mark,
>



-- 
Lance Norskog
goks...@gmail.com


Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+

2010-03-08 Thread Lance Norskog
Solr unique ids can be any type. The QueryElevateComponent complains
if the unique id is not a string, but you can comment out the QEC.  I
have one benchmark test with 2 billion documents with an integer id.
Works great.

On Mon, Mar 8, 2010 at 5:06 PM, Don Werve  wrote:
> Too bad it requires integer (long) primary keys... :/
>
> 2010/3/8 Ian Holsman 
>
>>
>> I just saw this on twitter, and thought you guys would be interested.. I
>> haven't tried it, but it looks interesting.
>>
>> http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin
>>
>> Thanks for the RT Shalin!
>>
>



-- 
Lance Norskog
goks...@gmail.com


Re: More contextual information in analyser

2010-03-08 Thread Lance Norskog
Yes, payloads should do this.

On Mon, Mar 8, 2010 at 8:29 PM, Jon Baer  wrote:
> Isn't this what Lucene/Solr payloads are theoretically for?
>
> ie: 
> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
>
> - Jon
>
> On Mar 8, 2010, at 11:15 PM, Lance Norskog wrote:
>
>> This is an interesting idea. There are other projects to make the
>> analyzer/filter chain more "porous", or open to outside interaction.
>>
>> A big problem is that queries are analyzed, too. If you want to give
>> the same metadata to the analyzer when doing a query against the
>> field, things get tough. You would need a special query parser to
>> implement your own syntax to do that. However, the analyzer chain in
>> the query phase does not receive the parsed query, so you have to in
>> some way change this.
>>
>> On Mon, Mar 8, 2010 at 2:14 AM, dbejean  wrote:
>>>
>>> Hello,
>>>
>>> If I write a custom analyser that accept a specific attribut in the
>>> constructor
>>>
>>> public MyCustomAnalyzer(String myAttribute);
>>>
>>> Is there a way to dynamically send a value for this attribute from Solr at
>>> index time in the XML Message ?
>>>
>>> 
>>>  
>>>    .
>>>
>>>
>>> Obviously, in Sorl shema.xml, the "content" field is associated to my custom
>>> Analyser.
>>>
>>> Thank you.
>>>
>>> Dominique
>>>
>>> --
>>> View this message in context: 
>>> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: PDF extraction leads to reversed words

2010-03-08 Thread Lance Norskog
Is this a mistake in the Tika library collection in the Solr trunk?

On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir  wrote:
> I think the problem is that Solr does not include the ICU4J jar, so it
> won't work with Arabic PDF files.
>
> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your 
> classpath.
>
> On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid  ABID  wrote:
>> Hi,
>> Posting arabic pdf files to Solr using a web form (to solr/update/extract)
>> get extracted texts and each words displayed in reverse direction(instead of
>> right to left).
>> When perform search against these texts with -always- reversed key-words I
>> get results but reversed.
>> This problem doesn't occur when posting MsWord document.
>> I think the problem come from Tika !
>>
>> Any clue ?
>>
>> --
>> elsadek
>> Software Engineer- J2EE / WEB / ESB MULE
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: PDF extraction leads to reversed words

2010-03-08 Thread Robert Muir
it is an optional dependency of PDFBox. If ICU is available, then it
is capable of processing Arabic PDF files.

The problem is that Arabic "text" in PDF files is really glyphs
(encoded in visual order) and needs to be 'unshaped' with some stuff
that isn't in the JDK.

If the size of the default ICU jar file is the issue here, we can
consider an alternative: The default ICU jar is very large as it
includes everything, yet it can be customized to only include what is
needed: http://apps.icu-project.org/datacustom/

We did this in lucene for the collation contrib, to shrink the jar
about 2MB: http://issues.apache.org/jira/browse/LUCENE-1867

For this use-case, it could be even smaller, as most of the huge size
of ICU comes from large CJK collation tables (needed for collation,
but not for this Arabic PDF extraction).

In reality I don't really like doing this as it might confuse users
(e.g. people that want collation, too), and ICU is useful for other
things, but if thats what we have to do, we should do it so that
Arabic PDF files will work.

On Mon, Mar 8, 2010 at 11:53 PM, Lance Norskog  wrote:
> Is this a mistake in the Tika library collection in the Solr trunk?
>
> On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir  wrote:
>> I think the problem is that Solr does not include the ICU4J jar, so it
>> won't work with Arabic PDF files.
>>
>> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your 
>> classpath.
>>
>> On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid  ABID  wrote:
>>> Hi,
>>> Posting arabic pdf files to Solr using a web form (to solr/update/extract)
>>> get extracted texts and each words displayed in reverse direction(instead of
>>> right to left).
>>> When perform search against these texts with -always- reversed key-words I
>>> get results but reversed.
>>> This problem doesn't occur when posting MsWord document.
>>> I think the problem come from Tika !
>>>
>>> Any clue ?
>>>
>>> --
>>> elsadek
>>> Software Engineer- J2EE / WEB / ESB MULE
>>>
>>
>>
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com


QueryElevationComponent blues

2010-03-08 Thread Ryan Grange

Using Solr 1.4.
Was using the standard query handler, but needed the boost by field 
functionality of qf from dismax.

So we altered the query to boost certain phrases against a given field.
We were using QueryElevationComponent ("elevator" from solrconfig.xml) 
for one particular entry we wanted at the top, but because we aren't 
using a pure q value, elevator never finds a match to boost.  We didn't 
realize it at the time because the record we were elevating eventually 
became the top response anyway.
Recently added a _val_:formula to the q value to juice records based on 
a value in the record.
Now we have need to push a few other records to the top, but we've lost 
the ability to use elevate.xml to do it.


Tried switching to dismax using qf, pf, qs, ps, and bf with a "pure" q 
value, and debug showed queryBoost with a match and records, but they 
weren't moved to the top of the result set.


What would really help would be something for elevator akin to
spellcheck.q, say an elevation.q, so I could pass in the actual user phrase
while still performing all the other field score boosts in the q
parameter. Alternatively, if anyone can explain why I'm running into
problems getting QueryElevationComponent to move the results in a dismax
query, I'd be very thankful.


--
Ryan T. Grange



Re: QueryElevationComponent blues

2010-03-08 Thread Jon Baer
Maybe some things to try:

* make sure your uniqueKey is a string field type (if it is an int, it will
not work)
* set forceElevation to true (if sorting); see the sketch below
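
Something like this, i.e. a string uniqueKey in schema.xml plus
forceElevation on the request (a sketch, field names assumed):

  <!-- schema.xml: elevation needs the uniqueKey to be a plain string -->
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <uniqueKey>id</uniqueKey>

  /solr/select?q=...&enableElevation=true&forceElevation=true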

- Jon

On Mar 9, 2010, at 12:34 AM, Ryan Grange wrote:

> Using Solr 1.4.
> Was using the standard query handler, but needed the boost by field 
> functionality of qf from dismax.
> So we altered the query to boost certain phrases against a given field.
> We were using QueryElevationComponent ("elevator" from solrconfig.xml) for 
> one particular entry we wanted at the top, but because we aren't using a pure 
> q value, elevator never finds a match to boost.  We didn't realize it at the 
> time because the record we were elevating eventually became the top response 
> anyway.
> Recently added a _val_:formula to the q value to juice records based on a 
> value in the record.
> Now we have need to push a few other records to the top, but we've lost the 
> ability to use elevate.xml to do it.
> 
> Tried switching to dismax using qf, pf, qs, ps, and bf with a "pure" q value, 
> and debug showed queryBoost with a match and records, but they weren't moved 
> to the top of the result set.
> 
> What would really help is if there was something for elevator akin to 
> spellcheck.q like elevation.q so I could pass in the actual user phrase while 
> still performing all the other field score boosts in the q parameter. 
> Alternatively, if anyone can explain why I'm running into problems getting 
> QueryElevationComponent to move the results in a dismax query, I'd be very 
> thankful.
> 
> -- 
> Ryan T. Grange
> 



Re: More contextual information in analyser

2010-03-08 Thread dbejean

It is true that I also need this metadata at query time. For the moment, I
put this extra information at the beginning of the data to be indexed and
at the beginning of the query. It works, but I really don't like this. In
my case, I need the language of the data to be indexed and the language of
the query.

The goal is to dynamically use the correct chain of tokenizers and filters
according to the language, and so use only one field in my index for all
languages.
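
To make it concrete, here is a rough sketch of the kind of analyzer I have
in mind (the class and the ThreadLocal wiring are hypothetical; getting the
language into the analyzer is exactly the open problem):

  import java.io.Reader;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.util.Version;

  // Sketch only: one field, one analyzer, per-language delegate chains.
  public class LanguageSwitchingAnalyzer extends Analyzer {

      // Hypothetical wiring: something outside (a custom update processor
      // or query parser) would have to set this before analysis runs.
      public static final ThreadLocal<String> LANGUAGE = new ThreadLocal<String>();

      private final Map<String, Analyzer> byLanguage = new HashMap<String, Analyzer>();
      private final Analyzer fallback = new StandardAnalyzer(Version.LUCENE_29);

      public LanguageSwitchingAnalyzer() {
          byLanguage.put("en", new StandardAnalyzer(Version.LUCENE_29));
          // add e.g. new FrenchAnalyzer(Version.LUCENE_29) from contrib for "fr"
      }

      public TokenStream tokenStream(String fieldName, Reader reader) {
          Analyzer delegate = byLanguage.get(LANGUAGE.get());
          if (delegate == null) {
              delegate = fallback;   // unknown language: fall back
          }
          return delegate.tokenStream(fieldName, reader);
      }
  }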



Lance Norskog-2 wrote:
> 
> This is an interesting idea. There are other projects to make the
> analyzer/filter chain more "porous", or open to outside interaction.
> 
> A big problem is that queries are analyzed, too. If you want to give
> the same metadata to the analyzer when doing a query against the
> field, things get tough. You would need a special query parser to
> implement your own syntax to do that. However, the analyzer chain in
> the query phase does not receive the parsed query, so you would have to
> change this in some way.
> 
> On Mon, Mar 8, 2010 at 2:14 AM, dbejean wrote:
>>
>> Hello,
>>
>> If I write a custom analyser that accepts a specific attribute in the
>> constructor
>>
>> public MyCustomAnalyzer(String myAttribute);
>>
>> Is there a way to dynamically send a value for this attribute from Solr
>> at index time in the XML message?
>>
>> [XML add/doc example stripped by the list archive]
>>
>> Obviously, in Solr's schema.xml, the "content" field is associated with
>> my custom Analyser.
>>
>> Thank you.
>>
>> Dominique
>>
>> --
>> View this message in context:
>> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27831948.html
Sent from the Solr - User mailing list archive at Nabble.com.