Re: Error 400 when searching with an exclamation mark ...?! PatternReplaceFilterFactory?
Oh okay, thanks a lot ;) Can I escape all possible operators with a request handler? Or can I escape these operators automatically when the syntax is wrong? I use Solr with a PHP client ^^

MitchK wrote:
> According to Ahmet Arslan's post: Solr is expecting a word after the "!",
> because it is an operator. If you escape it, it is part of the queried
> string.
Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley
I have the same problem ... I wrote an email and got this reply:

"Jonas, did you set the country correctly? If you set it to the US it will validate against US number formats and not recognize your number in Germany."

But I did not find any option to set my country =(

Janne Majaranta wrote:
> Do I need a U.S. phone number to view the recording / download the slides?
> The registration form whines about an invalid area code..
>
> -Janne
More contextual information in analyzers
Hello,

If I write a custom analyzer that accepts a specific attribute in its constructor:

    public MyCustomAnalyzer(String myAttribute);

is there a way to dynamically send a value for this attribute from Solr at index time in the XML message? Obviously, in Solr's schema.xml, the "content" field is associated with my custom analyzer.

Thank you.

Dominique
Re: CoreAdmin
Shalin Shekhar Mangar wrote:
> On Sat, Feb 27, 2010 at 5:22 PM, Suram wrote:
>>
>> Hi all,
>>
>> How can I configure CoreAdmin under the Tomcat server? Could anyone
>> tell me?
>
> There's nothing to configure. If you are using multiple cores in Solr 1.4
> then CoreAdmin is available.
>
> See http://wiki.apache.org/solr/CoreAdmin
>
> --
> Regards,
> Shalin Shekhar Mangar.

Hi all,

I configured CoreAdmin like http://localhost:8080/solr/core0/ but when I try to index a file it is not accepted, throwing the error below:

    \solr\example\exampledocs>java -Ddata=args -Dcommit=yes -Durl=http://localhost:8080/solr/core0/update -jar post.jar Example.xml
    SimplePostTool: version 1.2
    SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
    SimplePostTool: POSTing args to http://localhost:8080/solr/update..
    SimplePostTool: FATAL: Solr returned an error: Bad Request

and I tried:

    \solr\example\exampledocs>java -jar post.jar Example.xml
    SimplePostTool: version 1.2
    SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
    SimplePostTool: POSTing files to http://localhost:8080/solr/update..
    SimplePostTool: POSTing file Example.xml
    SimplePostTool: FATAL: Solr returned an error: Bad Request

I can't find any error in Tomcat.
question related to coord() [might be expert level]
Hello,

I have learned that the coord() value is calculated for each sub-query (BooleanQuery) present in the main query. For example (f = field, k = keyword):

    ((f1:k1 OR f2:k2) OR f3:k3) OR f4:k4

Here, if I am correct, coord() is calculated 3 times in total. My goal is to boost (or edit the formula of) the coord() value "for the last time" only. It may seem strange until you know why it is needed.

We are expanding queries using a QueryParser plugin. It adds synonym terms for each field. For example, town:lausanne is expanded to (town:lausanne OR city:lausanne). Consider a big query, and assume that f1s1 is the first synonym of f1, f1s2 the second synonym of f1, and so on. The query mentioned above is expanded to:

    (((f1:k1 OR f1s1:k1 OR f1s2:k1) OR (f2:k2 OR f2s1:k2)) OR (f3:k3 OR f3s1:k3)) OR f4:k4   [assume no synonyms for f4]

So here it makes sense to edit the coord formula for the last "coord" value, but not for every sub-BooleanQuery, because there could be 10 synonyms in some cases, etc.

My questions:

1) Is there any chance of finding out inside Similarity whether the current one is the last coord()?
2) Or is there any other place where we can edit it and reach our goal?
3) I have found the usage of "Coordinator" inside "BooleanScorer2", which suggests there could be a way to boost the last element of the coordFactors[] array, but I do not know whether there could be a plugin for that, or even what the effect would be.

This seems really expert level [for my knowledge], so I am seeking some help. Thanks.
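For question 2, the usual extension point is a custom Similarity registered in schema.xml. A minimal sketch, assuming the Lucene 2.9 Similarity API shipped with Solr 1.4; note that coord() is invoked for every BooleanQuery, inner synonym groups included, so it cannot by itself single out the "last" coord:

    import org.apache.lucene.search.DefaultSimilarity;

    public class CustomCoordSimilarity extends DefaultSimilarity {
        @Override
        public float coord(int overlap, int maxOverlap) {
            // overlap = number of matching clauses, maxOverlap = total clauses.
            // This reproduces the default; adjust the formula to experiment.
            return (float) overlap / (float) maxOverlap;
        }
    }

It would be enabled in schema.xml with <similarity class="com.example.CustomCoordSimilarity"/> (class name hypothetical).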
Re: Error 400 when searching with an exclamation mark ...?! PatternReplaceFilterFactory?
> Can I escape all possible operators with a request handler?

With a custom one, yes. You can use the static method org.apache.lucene.queryParser.QueryParser.escape(String s):

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {
        String q = req.getParams().get(CommonParams.Q);
        ModifiableSolrParams solrParams = new ModifiableSolrParams(req.getParams());
        // Escape every query-syntax character before handing off to the parent handler.
        solrParams.set(CommonParams.Q, QueryParser.escape(q));
        req.setParams(solrParams);
        super.handleRequestBody(req, rsp);
    }

With this solution users cannot use Solr query syntax anymore. For example, range and wildcard queries won't work.

> Or can I escape these operators automatically when the syntax is wrong?

Maybe you can try to parse the value of the q parameter with a new QueryParser in a try/catch block. If an exception occurs, you can escape the special characters. However, I would prefer to do it on the client side.
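That parse-then-escape idea might look like the following sketch (client side or inside a handler); it assumes the Lucene 2.9 QueryParser bundled with Solr 1.4, and the field name "text" is just a placeholder:

    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.util.Version;

    // Try the raw query first; only escape when the syntax is invalid.
    String sanitize(String q) {
        QueryParser parser = new QueryParser(Version.LUCENE_29, "text", new WhitespaceAnalyzer());
        try {
            parser.parse(q);
            return q;                      // valid syntax, keep operators working
        } catch (ParseException e) {
            return QueryParser.escape(q);  // broken syntax, escape everything
        }
    }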
RE: Handling and sorting email addresses
Thanks Mitch, using the analysis page has been a real eye-opener and given me a better insight into how Solr was applying the filters (and, more importantly, in which order). I've ironically ended up with a charFilter mapping file, as this seemed the only route to replacing characters before the tokenizer kicked in; unfortunately Solr just refused to allow sorting on anything tokenized with characters other than whitespace.

Cheers, Ian.

-----Original Message-----
From: MitchK [mailto:mitc...@web.de]
Sent: 07 March 2010 22:44
To: solr-user@lucene.apache.org
Subject: Re: Handling and sorting email addresses

Ian,

did you have a look at Solr's admin analysis.jsp? If everything on the analysis page is fine, you have misunderstood Solr's schema.xml file.

You've set two attributes in your schema.xml:

    stored = true
    indexed = true

What you get as a response is the stored field value. The stored field value is the original field value, without any modifications. However, Solr uses the indexed field value to query your data.

Kind regards
- Mitch

Ian Battersby wrote:
>
> Forgive what might seem like a newbie question but am struggling
> desperately with this.
>
> We have a dynamic field that holds email addresses and we'd like to be
> able to sort by it; obviously when trying to do this we get an error as
> it thinks the email address is a tokenized field. We've tried a custom
> field type using PatternReplaceFilterFactory to specify that @ and .
> should be replaced with " AT " and " DOT ", but we just can't seem to get
> it to work; all the fields still contain the unparsed email.
>
> We used an example found on the mailing-list for the field type:
>
>     <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="..."/>
>         <filter class="solr.PatternReplaceFilterFactory" pattern="\."
>                 replacement=" DOT " replace="all" />
>         <filter class="solr.PatternReplaceFilterFactory" pattern="@"
>                 replacement=" AT " replace="all" />
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>                 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>                 catenateAll="0" splitOnCaseChange="0"/>
>       </analyzer>
>     </fieldType>
>
> .. our dynamic field looks like ..
>
>     <dynamicField name="..." type="..." stored="true" multiValued="true" />
>
> When writing a document to Solr it still seems to write the original
> email address (e.g. this.u...@somewhere.com) as opposed to its parsed
> version (e.g. this DOT user AT somewhere DOT com). Can anyone help?
>
> We are running version 1.4 but have even tried the nightly build in an
> attempt to solve this problem.
>
> Thanks.
question about mergeFactor
Hello,

On the Solr wiki, here: http://wiki.apache.org/solr/SolrPerformanceFactors it is written:

    mergeFactor Tradeoffs

    High value merge factor (e.g., 25):
      Pro: Generally improves indexing speed
      Con: Less frequent merges, resulting in a collection with more index
           files which may slow searching

    Low value merge factor (e.g., 2):
      Pro: Smaller number of index files, which speeds up searching.
      Con: More segment merges slow down indexing.

If I have a mergeFactor of 50 when I build the index and then I optimize the index, I end up with 1 index file, so I have a small number of index files, and having used a mergeFactor of 50 won't slow searching? Or is my supposition wrong, and the mergeFactor used when building the index has an impact on search speed anyway?

Thanks.
Re: Question about fieldNorms
Wonderful! That explains it. Thanks a lot!

Regards,

On Mon, Mar 8, 2010 at 6:39 AM, Jay Hill wrote:
> Yes, if omitNorms=true, then no lengthNorm calculation will be done, the
> fieldNorm value will be 1.0, and lengths of the field in question will
> not be a factor in the score.
>
> To see an example of this you can do a quick test. Add two "text" fields,
> and omit norms on one:
>
>     <field name="foo" type="text" indexed="true" stored="true"/>
>     <field name="bar" type="text" indexed="true" stored="true" omitNorms="true"/>
>
> Index a doc with the same value for both fields:
>
>     <field name="foo">1 2 3 4 5</field>
>     <field name="bar">1 2 3 4 5</field>
>
> Set &debugQuery=true and do two queries: &q=foo:5 &q=bar:5
>
> In the "explain" section of the debug output note that the fieldNorm
> value for the "foo" query is this:
>
>     0.4375 = fieldNorm(field=foo, doc=1)
>
> and the value for the "bar" query is this:
>
>     1.0 = fieldNorm(field=bar, doc=1)
>
> A simplified description of how the fieldNorm value is calculated:
>
>     fieldNorm = lengthNorm * documentBoost * documentFieldBoosts
>
> and the lengthNorm is calculated like this:
>
>     lengthNorm = 1 / sqrt(numTermsInField)
>
> [note that the value is encoded as a single byte, so there is some
> precision loss]
>
> When omitNorms=true no norm calculation is done, so fieldNorm will always
> be 1.0 on those fields.
>
> You can also use the Luke utility to view the document in the index, and
> it will show that there is a norm value for the foo field, but not the
> bar field.
>
> -Jay
> http://www.lucidimagination.com
>
> On Sun, Mar 7, 2010 at 5:55 AM, Siddhant Goel wrote:
> > Hi everyone,
> >
> > Is the fieldNorm calculation altered by the omitNorms factor? I saw on
> > this page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html)
> > the formula for calculation of fieldNorms (fieldNorm =
> > fieldBoost/sqrt(numTermsForField)).
> >
> > Does this mean that for a document containing a string like "A B C D E"
> > in its field, its fieldNorm would be boost/sqrt(5), and for another
> > document containing the string "A B C" in the same field, its fieldNorm
> > would be boost/sqrt(3). Is that correct?
> >
> > If yes, then is *this* what omitNorms affects?
> >
> > Thanks,
> >
> > --
> > - Siddhant

--
- Siddhant
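The precision loss Jay mentions can be seen by round-tripping a norm through the single-byte encoding; a small sketch, assuming the static encode/decode helpers on Lucene 2.9's Similarity (the version bundled with Solr 1.4):

    import org.apache.lucene.search.Similarity;

    public class NormDemo {
        public static void main(String[] args) {
            // 5 terms in the field -> lengthNorm = 1/sqrt(5) ~= 0.4472
            float lengthNorm = (float) (1.0 / Math.sqrt(5));
            byte oneByte = Similarity.encodeNorm(lengthNorm); // what gets stored
            float decoded = Similarity.decodeNorm(oneByte);   // quantized value
            System.out.println(lengthNorm + " -> " + decoded); // 0.4472136 -> 0.4375
        }
    }

The decoded 0.4375 is exactly the fieldNorm shown in the debug output above.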
Re: index merge
Hi Mark,

On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher wrote:
>
> I have created 2 identical cores coreX and coreY (both have different
> dataDir values, but their index is the same).
> coreX - always serves the request when a user performs a search.
> coreY - the updates happen on this core, and then I need to synchronize
> it with coreX after the update process, so that coreX also has the latest
> data in it. After coreX and coreY are synchronized, both should be
> identical again.
>
> For this purpose I tried core merging of coreX and coreY once coreY was
> updated with the latest set of data. But I find coreX contains double the
> record count of coreY (coreX = coreX + coreY).
>
> Is there a problem with using the MERGE concept here? If it is wrong, can
> someone please suggest the best approach? I tried the various merges
> explained in my previous mail.

Index merge happens at the Lucene level, which has no idea about uniqueKeys. Therefore when you merge two indexes containing exactly the same documents (by uniqueKey), you get double the document count.

Looking at your scenario, it seems to me that what you want is a swap operation. coreX is serving the requests, coreY is updated, and now you can swap coreX with coreY so that new requests hit the updated index. I suggest you look at the swap operation instead of index merge.

--
Regards,
Shalin Shekhar Mangar.
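For reference, a swap is a single CoreAdmin HTTP call; something like this, using the poster's core names (adjust host and port to your setup):

    http://localhost:8080/solr/admin/cores?action=SWAP&core=coreX&other=coreY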
Re: which links do I have to follow to understand location-based search concepts?
Hi,

Thank you for explaining it in a simple way. The article really helped me to understand the concepts better.

My question is: is it necessary that the data you are indexing in the spatial example be in the OSM format, using the facts files? In my case, I am trying to index data that has just latitude, longitude and a related news item (just text) in an XML file.

I have slightly modified Driver.java and the other .java files in the src/main/java folder so that these fields are considered for indexing (but I have retained geohash, lat_rad, lng_rad as done in the spatial example).

But when I do ant index, I am getting:

    Buildfile: build.xml

    init:

    compile:

    index:
    [echo] Indexing ./data/
    [java] ./data/ http://localhost:8983/solr
    [java] Num args: 2
    [java] Starting indexing
    [java] Indexing: ./data/final.xml
    [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
    [java] INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
    [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
    [java] INFO: Retrying request
    [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
    [java] INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
    [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
    [java] INFO: Retrying request
    [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
    [java] INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
    [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
    [java] INFO: Retrying request
    [java] org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused
    [java] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
    [java] at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    [java] at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    [java] at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    [java] at OSMHandler.endElement(OSMHandler.java:127)
    [java] at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
    [java] at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1774)
    [java] at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2930)
    [java] at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    [java] at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    [java] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    [java] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    [java] at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    [java] at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
    [java] at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
    [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
    [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    [java] at OSM2Solr.process(OSM2Solr.java:44)
    [java] at Driver.main(Driver.java:80)
    [java] Caused by: java.net.ConnectException: Connection refused
    [java] at java.net.PlainSocketImpl.socketConnect(Native Method)
    [java] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    [java] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    [java] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    [java] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    [java] at java.net.Socket.connect(Socket.java:519)
    [java] at java.net.Socket.connect(Socket.java:469)
    [java] at java.net.Socket.<init>(Socket.java:366)
    [java] at java.net.Socket.<init>(Socket.java:240)
    [java] at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
    [java] at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
    [java] at org.a
Re: which links do I have to follow to understand location-based search concepts?
On Mon, Mar 8, 2010 at 6:21 PM, KshamaPai wrote:
>
> Hi,
> Thank you for explaining it in a simple way. The article really helped me
> to understand the concepts better.
>
> My question is: is it necessary that the data you are indexing in the
> spatial example be in the OSM format, using the facts files? In my case,
> I am trying to index data that has just latitude, longitude and a related
> news item (just text) in an XML file.
>
> I have slightly modified Driver.java and the other .java files in the
> src/main/java folder so that these fields are considered for indexing
> (but I have retained geohash, lat_rad, lng_rad as done in the spatial
> example).
>
> But when I do ant index, I am getting:
>
>     Buildfile: build.xml
>     init:
>     compile:
>     index:
>     [echo] Indexing ./data/
>     [java] ./data/ http://localhost:8983/solr
>     [java] Num args: 2
>     [java] Starting indexing
>     [java] Indexing: ./data/final.xml
>     [java] Mar 8, 2010 4:40:35 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>     [java] INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused

The "Connection refused" message suggests that your Solr instance is either not running or you have given the wrong host/port in your driver.

--
Regards,
Shalin Shekhar Mangar.
Re: question about mergeFactor
On Mon, Mar 8, 2010 at 5:31 PM, Marc Des Garets wrote:
>
> If I have a mergeFactor of 50 when I build the index and then I optimize
> the index, I end up with 1 index file, so I have a small number of index
> files, and having used a mergeFactor of 50 won't slow searching? Or is my
> supposition wrong, and the mergeFactor used when building the index has
> an impact on search speed anyway?

If you optimize then mergeFactor does not matter and your searching speed will not be slowed down. On the other hand, the optimize may take the bulk of the indexing time, so you won't get any benefit from using a mergeFactor of 50.

--
Regards,
Shalin Shekhar Mangar.
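If it helps, an optimize is just an update message; with the example Jetty port it can be triggered like this (adjust the URL for other setups):

    curl http://localhost:8983/solr/update --data-binary '<optimize/>' -H 'Content-Type: text/xml'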
LocalSolr, apache-solr-1.4.0
hi,

I am interested in spatial search. I am using apache-solr-1.4.0 and LocalSolr, and have followed the instructions given on the following website: http://gissearch.com/localsolr

A query of the format given on the website,

    /solr/select?&qt=geo&lat=xx.xx&long=yy.yy&q=abc&radius=zz

(after substituting valid values) is not producing any results. I am also not understanding the significance of the field "long" in the query, since there is no field specified as "long"; I can find only "lng". I want the results to be produced with respect to the radius, and not the results of a mere full-text search. Is there any need to use LocalLucene as well?

Any help regarding this will be appreciated. Thanks in advance.
Position of snippet within highlighted field
Does anyone know if it's possible to get the position of the highlighted snippet within the field that's being highlighted? It would be really useful for me to know if the snippet is at the beginning or at the end of the text field that it comes from. Thanks, Mark.
Re: example solr xml working fine but my own xml files not working
Have you looked in your Solr log file to see what it says?

Check the editor you use for your XML: is it using UTF-8? (Although you don't appear to be using any odd characters, so probably not a problem.)

Think about taking the xml file that *does* work, copying it and editing *that* one.

Erick

On Mon, Mar 8, 2010 at 12:13 AM, venkatesh uruti wrote:
>
> Dear Erick,
>
> Please find below the steps that I executed.
>
> I am following the same structure as mentioned by you, and checked the
> results in the admin page by clicking the search button; the samples are
> working fine.
>
> Ex: Added monitor.xml and searched for video; it displays results - the
> search content is displayed properly.
>
> Let me explain the problem which I am facing:
>
> Step 1: I started Apache Tomcat.
>
> Step 2: Indexing data:
>
>     java -jar post.jar myfile.xml
>
> Here is my XML content:
>
>     1
>     Youth to Elder
>     Integrated Research Program
>     2009
>     First Nation
>
>     2
>     Strategies
>     Implementation Committee
>     2001
>     Policy
>
> Step 4: I did
>
>     java -jar post.jar myfile.xml
>
> and the output was:
>
>     SimplePostTool: version 1.2
>     SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
>     SimplePostTool: POSTing files to http://localhost:8983/solr/update..
>     SimplePostTool: POSTing file curnew.xml
>     SimplePostTool: FATAL: Solr returned an error: Bad Request
>
> Please help me with this.
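For reference, Solr's XML update format wraps each document like the sketch below; the field names here are hypothetical and must match ones declared in your schema.xml (a field-name mismatch is a common cause of "Bad Request"):

    <add>
      <doc>
        <field name="id">1</field>
        <field name="title">Youth to Elder</field>
        <field name="program">Integrated Research Program</field>
        <field name="year">2009</field>
        <field name="category">First Nation</field>
      </doc>
    </add>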
HTML encode extracted docs
I'm uploading .htm files to be extracted - some of these files are "include" files that have snippets of HTML rather than fully formed HTML documents. solr-cell stores the raw HTML for these items, rather than extracting the text. Is there any way I can get Solr to encode this content prior to storing it?

At the moment I have the problem that when the highlighted snippets are retrieved via search, I need to parse the snippet and HTML-encode the bits of HTML that were indexed, whilst *not* encoding the bits that were added by the highlighter, which is messy and time consuming.

Thanks!

Mark.
Re: Handling and sorting email addresses
Well, it's not unfortunate. What would it mean to sort on a tokenized field? Let's say I index "is testing fun". Removing stopwords and stemming probably indexes "test" and "fun". How in the world would meaningful sorts happen now? Even if it were "in order", since the first token was stopped out, this document wouldn't even be in the right part of the alphabet.

The usual solution is to use copyField and index your field untokenized in that second field, then sort on *that* field.

HTH
Erick

On Mon, Mar 8, 2010 at 6:56 AM, Ian Battersby wrote:
> Thanks Mitch, using the analysis page has been a real eye-opener and
> given me a better insight into how Solr was applying the filters (and,
> more importantly, in which order). I've ironically ended up with a
> charFilter mapping file, as this seemed the only route to replacing
> characters before the tokenizer kicked in; unfortunately Solr just
> refused to allow sorting on anything tokenized with characters other
> than whitespace.
>
> Cheers, Ian.
>
> -----Original Message-----
> From: MitchK [mailto:mitc...@web.de]
> Sent: 07 March 2010 22:44
> To: solr-user@lucene.apache.org
> Subject: Re: Handling and sorting email addresses
>
> Ian,
>
> did you have a look at Solr's admin analysis.jsp? If everything on the
> analysis page is fine, you have misunderstood Solr's schema.xml file.
>
> You've set two attributes in your schema.xml:
>
>     stored = true
>     indexed = true
>
> What you get as a response is the stored field value. The stored field
> value is the original field value, without any modifications. However,
> Solr uses the indexed field value to query your data.
>
> Kind regards
> - Mitch
>
> Ian Battersby wrote:
> >
> > Forgive what might seem like a newbie question but am struggling
> > desperately with this.
> >
> > We have a dynamic field that holds email addresses and we'd like to be
> > able to sort by it; obviously when trying to do this we get an error as
> > it thinks the email address is a tokenized field. We've tried a custom
> > field type using PatternReplaceFilterFactory to specify that @ and .
> > should be replaced with " AT " and " DOT ", but we just can't seem to
> > get it to work; all the fields still contain the unparsed email.
> >
> > We used an example found on the mailing-list for the field type:
> >
> >     <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >       <analyzer>
> >         <tokenizer class="..."/>
> >         <filter class="solr.PatternReplaceFilterFactory" pattern="\."
> >                 replacement=" DOT " replace="all" />
> >         <filter class="solr.PatternReplaceFilterFactory" pattern="@"
> >                 replacement=" AT " replace="all" />
> >         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >                 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >                 catenateAll="0" splitOnCaseChange="0"/>
> >       </analyzer>
> >     </fieldType>
> >
> > .. our dynamic field looks like ..
> >
> >     <dynamicField name="..." type="..." stored="true" multiValued="true" />
> >
> > When writing a document to Solr it still seems to write the original
> > email address (e.g. this.u...@somewhere.com) as opposed to its parsed
> > version (e.g. this DOT user AT somewhere DOT com). Can anyone help?
> >
> > We are running version 1.4 but have even tried the nightly build in an
> > attempt to solve this problem.
> >
> > Thanks.
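A minimal sketch of that copyField approach in schema.xml; the field and type names are illustrative, and "string" keeps the value as one untokenized term so it sorts cleanly:

    <field name="email" type="text" indexed="true" stored="true"/>
    <field name="email_sort" type="string" indexed="true" stored="false"/>
    <copyField source="email" dest="email_sort"/>

Queries would then sort with &sort=email_sort asc while still searching the tokenized email field.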
RE: question about mergeFactor
Perfect. Thank you for your help.

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: 08 March 2010 12:57
To: solr-user@lucene.apache.org
Subject: Re: question about mergeFactor

On Mon, Mar 8, 2010 at 5:31 PM, Marc Des Garets wrote:
>
> If I have a mergeFactor of 50 when I build the index and then I optimize
> the index, I end up with 1 index file, so I have a small number of index
> files, and having used a mergeFactor of 50 won't slow searching? Or is my
> supposition wrong, and the mergeFactor used when building the index has
> an impact on search speed anyway?

If you optimize then mergeFactor does not matter and your searching speed will not be slowed down. On the other hand, the optimize may take the bulk of the indexing time, so you won't get any benefit from using a mergeFactor of 50.

--
Regards,
Shalin Shekhar Mangar.
Re: index merge
Hi Shalin,

Thank you for the reply. I got your point. So I understand merge will just duplicate things.

I ran the SWAP command. Now COREX has its dataDir pointing to the updated dataDir of COREY, so COREX has the latest. Again, COREY (on which the update regularly runs) is pointing to the old index of COREX, so it no longer has the most updated index.

Now shouldn't I update the index of COREY (now pointing to the old COREX) so that it has the same footprint as COREX (which now has the latest COREY index), so that when the update again happens to COREY, it has the latest, and I again do the SWAP?

Is physically copying the index named COREY (the latest, and now the dataDir of COREX after the SWAP) over the index named COREX (now the dataDir of COREY, i.e. the original non-updated index of COREX) the best way to do this, or is there a better option?

Later, when COREY is again updated with the latest, I will run the SWAP again and COREX will again point to its original dataDir (now the updated one). So every even-numbered SWAP will point COREX back to its original dataDir (and the same for COREY).

My only concern is, after the SWAP is done, updating the old index (which was serving previously and is now replaced by the new index). What is the best way to do that? Physically copy the latest index over the old one to bring it in sync, so that by the time it is due to receive the latest updates it already has the latest, the new documents can be added to it, it becomes the latest, and it is swapped again?

Please share your opinion. Once again your help is appreciated. I have been going in circles with multiple indexes for some days!

Thanks and Rgds,
Mark.

On Mon, Mar 8, 2010 at 7:45 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> Hi Mark,
>
> On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher wrote:
> >
> > I have created 2 identical cores coreX and coreY (both have different
> > dataDir values, but their index is the same).
> > coreX - always serves the request when a user performs a search.
> > coreY - the updates happen on this core, and then I need to synchronize
> > it with coreX after the update process, so that coreX also has the
> > latest data in it. After coreX and coreY are synchronized, both should
> > be identical again.
> >
> > For this purpose I tried core merging of coreX and coreY once coreY was
> > updated with the latest set of data. But I find coreX contains double
> > the record count of coreY (coreX = coreX + coreY).
> >
> > Is there a problem with using the MERGE concept here? If it is wrong,
> > can someone please suggest the best approach? I tried the various
> > merges explained in my previous mail.
>
> Index merge happens at the Lucene level, which has no idea about
> uniqueKeys. Therefore when you merge two indexes containing exactly the
> same documents (by uniqueKey), you get double the document count.
>
> Looking at your scenario, it seems to me that what you want is a swap
> operation. coreX is serving the requests, coreY is updated, and now you
> can swap coreX with coreY so that new requests hit the updated index. I
> suggest you look at the swap operation instead of index merge.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Tomcat saves my index to temp ...
Hello.

I use 2 cores for Solr. When I restart my Tomcat on Debian, Tomcat deletes my index.

I set data.dir to ${solr.data.dir:./suggest/data} and ${solr.data.dir:./search/data}:

    <core name="..." instanceDir="..." dataDir="/search/data/index"/>
    <core name="..." instanceDir="..." dataDir="/suggest/data/index"/>

So why is my index only temporary? Solr saves my index to /var/lib/tomcat5.5/temp.

I tested my Solr setup on XP with Tomcat, and everything is okay =(
Import database
Hi,

I have started using Solr and hit a problem when importing a database with 2 million rows. The server encounters this error:

    java.lang.OutOfMemoryError: Java heap space

I searched around but can't find the solution. Any help regarding this will be appreciated.

Thanks in advance
Re: Tomcat saves my index to temp ...
You're probably hitting the difference between *nix file handling and Windows. When you delete a file on a Unix variant, if some other program has the file open, the file doesn't go away until that other program closes it.

HTH
Erick

On Mon, Mar 8, 2010 at 9:08 AM, stocki wrote:
>
> Hello.
>
> I use 2 cores for Solr. When I restart my Tomcat on Debian, Tomcat
> deletes my index.
>
> I set data.dir to ${solr.data.dir:./suggest/data} and
> ${solr.data.dir:./search/data}:
>
>     <core name="..." instanceDir="..." dataDir="/search/data/index"/>
>     <core name="..." instanceDir="..." dataDir="/suggest/data/index"/>
>
> So why is my index only temporary? Solr saves my index to
> /var/lib/tomcat5.5/temp.
>
> I tested my Solr setup on XP with Tomcat, and everything is okay =(
Re: Import database
I had the same issue with Jetty. Adding extra memory resolved it, i.e.:

    java -Xms512M -Xmx1024M -jar start.jar

(Note the heap flags take no "=".) It's in the manual, but I can't seem to find the link.

On 8 Mar 2010, at 14:09, Quan Nguyen Anh wrote:

> Hi,
> I have started using Solr and hit a problem when importing a database
> with 2 million rows. The server encounters this error:
> java.lang.OutOfMemoryError: Java heap space
> I searched around but can't find the solution.
> Any help regarding this will be appreciated.
> Thanks in advance
Re: Tomcat saves my index to temp ...
On 08.03.2010 15:08, stocki wrote:
> Hello.
>
> I use 2 cores for Solr. When I restart my Tomcat on Debian, Tomcat
> deletes my index.

You should check your Tomcat setup.

> I set data.dir to ${solr.data.dir:./suggest/data} and
> ${solr.data.dir:./search/data}

Use an absolute path [you have not set the solr.home path]; this is the working/tmp dir of Tomcat by default.

>     <core name="..." instanceDir="..." dataDir="/search/data/index"/>
>     <core name="..." instanceDir="..." dataDir="/suggest/data/index"/>

That is OK, but it is relative to solr.home.

> So why is my index only temporary?

Try to set up Solr again: http://wiki.apache.org/solr/SolrTomcat

Try the setup with a Context fragment: create a Tomcat Context fragment to point docBase to the $SOLR_HOME/apache-solr-1.3.0.war file and solr/home to $SOLR_HOME, and avoid storing the data in .../tmp/.
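The Context fragment on that wiki page looks roughly like this; the paths below are examples, so substitute your own $SOLR_HOME:

    <Context docBase="/opt/solr/apache-solr-1.3.0.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String" value="/opt/solr" override="true"/>
    </Context>

It is saved as, e.g., $CATALINA_HOME/conf/Catalina/localhost/solr.xml.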
Re: index merge
Hi Mark,

On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher wrote:
>
> I ran the SWAP command. Now COREX has its dataDir pointing to the updated
> dataDir of COREY, so COREX has the latest. Again, COREY (on which the
> update regularly runs) is pointing to the old index of COREX, so it no
> longer has the most updated index.
>
> Now shouldn't I update the index of COREY (now pointing to the old COREX)
> so that it has the same footprint as COREX (which now has the latest
> COREY index), so that when the update again happens to COREY, it has the
> latest, and I again do the SWAP?
>
> Is physically copying the index named COREY (the latest, and now the
> dataDir of COREX after the SWAP) over the index named COREX (now the
> dataDir of COREY, i.e. the original non-updated index of COREX) the best
> way to do this, or is there a better option?
>
> Later, when COREY is again updated with the latest, I will run the SWAP
> again and COREX will again point to its original dataDir (now the updated
> one). So every even-numbered SWAP will point COREX back to its original
> dataDir (and the same for COREY).
>
> My only concern is, after the SWAP is done, updating the old index (which
> was serving previously and is now replaced by the new index). What is the
> best way to do that? Physically copy the latest index over the old one to
> bring it in sync, so that by the time it is due to receive the latest
> updates it already has the latest, the new documents can be added to it,
> it becomes the latest, and it is swapped again?

Perhaps it is best if we take a step back and understand why you need two identical cores?

--
Regards,
Shalin Shekhar Mangar.
Extracting content from a mailman-managed mailing list archive
Hi,

Is anybody willing to share experience about how to extract content from mailing list archives in order to have it indexed by Lucene or Solr?

Imagine that we have access to the archive of some mailing list (e.g. http://www.mail-archive.com/mailman-users%40python.org/) and we would like to index individual emails. Is there any easy way to extract just the text content produced by the sender of each individual email? I am interested in the content generated by a particular sender, omitting the original quoted text. We can either access individual emails via the web or download a monthly archive in plain-text format (but the content of individual emails depends on the email client of the author, i.e. plain text, HTML, HTML mixed with plain text, etc. ... it is very messy).

I would prefer information about mailing lists managed by mailman, but I don't want to limit the scope of this question, so any general ideas are welcome.

Regards,
Lukas
Re: Extracting content from a mailman-managed mailing list archive
I just checked the popular search services, and it seems that neither the Lucid Imagination search nor search-lucene supports this:

http://www.lucidimagination.com/search/document/954e8589ebbc4b16/terminating_slashes_in_url_normalization
http://www.search-lucene.com/m?id=510143ac0608042241k49f4afe7wcd25df3fbacc7...@mail.gmail.com||mailman

MarkMail does not support this either: http://markmail.org/message/papbjx3aoz3uvbhh

Hmmm... I think it would be useful to extract just the *NEW* content without all the quotes, because this influences Lucene scoring.

Regards,
Lukas

On Mon, Mar 8, 2010 at 3:55 PM, Lukáš Vlček wrote:
> Hi,
>
> Is anybody willing to share experience about how to extract content from
> mailing list archives in order to have it indexed by Lucene or Solr?
>
> Imagine that we have access to the archive of some mailing list (e.g.
> http://www.mail-archive.com/mailman-users%40python.org/) and we would
> like to index individual emails. Is there any easy way to extract just
> the text content produced by the sender of each individual email? I am
> interested in the content generated by a particular sender, omitting the
> original quoted text. We can either access individual emails via the web
> or download a monthly archive in plain-text format (but the content of
> individual emails depends on the email client of the author, i.e. plain
> text, HTML, HTML mixed with plain text, etc. ... it is very messy).
>
> I would prefer information about mailing lists managed by mailman, but I
> don't want to limit the scope of this question, so any general ideas are
> welcome.
>
> Regards,
> Lukas
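A crude first pass at that idea is a heuristic filter over the plain-text body; the sketch below (plain Java, no Solr dependency) drops ">"-quoted lines and the "X wrote:" attribution lines that introduce them, which is only a rough approximation for messy real-world mail:

    import java.util.regex.Pattern;

    public class QuoteStripper {
        // Reply quotes start with one or more ">" markers.
        private static final Pattern QUOTE = Pattern.compile("^\\s*>");
        // Attribution lines such as "On Mon ..., Foo Bar wrote:".
        private static final Pattern ATTRIBUTION = Pattern.compile(".*wrote:\\s*$");

        public static String stripQuotes(String body) {
            StringBuilder out = new StringBuilder();
            for (String line : body.split("\r?\n")) {
                if (QUOTE.matcher(line).find()) continue;          // skip quoted text
                if (ATTRIBUTION.matcher(line).matches()) continue; // skip attribution
                out.append(line).append('\n');
            }
            return out.toString();
        }
    }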
Child entities in document not loading
All,

So I think I have my first issue figured out: I need to add terms to the default search. That's fine.

The new issue is that I'm trying to load child entities in with my entity.

I added the appropriate fields to solrconfig.xml:

    <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>

And I updated my document to match.

So my expectation is that there will be 3 new fields associated with it that are multivalued: sizes, colors, and sections.

The full-import seems to work correctly: I get the appropriate number of documents in my searches. However, sizes, colors and sections all come up null (well, I should say they don't come up when I search for them).

Any ideas on why it won't load these 3 child entities?

Thanks!

John
Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley
You only need to delete your browser cache ;)

stocki wrote:
>
> I have the same problem ...
>
> I wrote an email and got this reply:
>
> "Jonas, did you set the country correctly? If you set it to the US it
> will validate against US number formats and not recognize your number in
> Germany."
>
> But I did not find any option to set my country =(
>
> Janne Majaranta wrote:
>>
>> Do I need a U.S. phone number to view the recording / download the
>> slides? The registration form whines about an invalid area code..
>>
>> -Janne
Re: Child entities in document not loading
What does the Solr admin page show you is actually in your index?

Luke will also help.

Erick

On Mon, Mar 8, 2010 at 10:06 AM, John Ament wrote:
> All,
>
> So I think I have my first issue figured out: I need to add terms to the
> default search. That's fine.
>
> The new issue is that I'm trying to load child entities in with my
> entity.
>
> I added the appropriate fields to solrconfig.xml:
>
>     <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
>     <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
>     <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
>
> And I updated my document to match.
>
> So my expectation is that there will be 3 new fields associated with it
> that are multivalued: sizes, colors, and sections.
>
> The full-import seems to work correctly: I get the appropriate number of
> documents in my searches. However, sizes, colors and sections all come
> up null (well, I should say they don't come up when I search for them).
>
> Any ideas on why it won't load these 3 child entities?
>
> Thanks!
>
> John
Re: Child entities in document not loading
Where would I see this? I do believe the fields are not ending up in the index.

Thanks

John

On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson wrote:
> What does the Solr admin page show you is actually in your index?
>
> Luke will also help.
>
> Erick
>
> On Mon, Mar 8, 2010 at 10:06 AM, John Ament wrote:
> > All,
> >
> > So I think I have my first issue figured out: I need to add terms to
> > the default search. That's fine.
> >
> > The new issue is that I'm trying to load child entities in with my
> > entity.
> >
> > I added the appropriate fields to solrconfig.xml:
> >
> >     <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> >     <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> >     <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> >
> > And I updated my document to match.
> >
> > So my expectation is that there will be 3 new fields associated with it
> > that are multivalued: sizes, colors, and sections.
> >
> > The full-import seems to work correctly: I get the appropriate number
> > of documents in my searches. However, sizes, colors and sections all
> > come up null (well, I should say they don't come up when I search for
> > them).
> >
> > Any ideas on why it won't load these 3 child entities?
> >
> > Thanks!
> >
> > John
Re: index merge
Hi Shalin,

Thank you for the mail. My main purpose in having 2 identical cores -

COREX - always serves user requests.
COREY - once every day, takes the updates/latest data and passes it on to COREX.

- is this: suppose I have only one core, COREY, and a request comes to COREY while the update of the latest data is happening on it. Wouldn't that degrade the performance of the requests at that point in time?

So I was planning to keep COREX and COREY always identical. Once COREY has the latest data it should somehow sync with COREX so that COREX also has the latest. COREY keeps getting the updates at a particular time of day and passes them on to COREX. This process continues every day.

What is the best possible way to implement this?

Thanks,
Mark.

On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> Hi Mark,
>
> On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher wrote:
> >
> > I ran the SWAP command. Now COREX has its dataDir pointing to the
> > updated dataDir of COREY, so COREX has the latest. Again, COREY (on
> > which the update regularly runs) is pointing to the old index of COREX,
> > so it no longer has the most updated index.
> >
> > Now shouldn't I update the index of COREY (now pointing to the old
> > COREX) so that it has the same footprint as COREX (which now has the
> > latest COREY index), so that when the update again happens to COREY, it
> > has the latest, and I again do the SWAP?
> >
> > Is physically copying the index named COREY (the latest, and now the
> > dataDir of COREX after the SWAP) over the index named COREX (now the
> > dataDir of COREY, i.e. the original non-updated index of COREX) the
> > best way to do this, or is there a better option?
> >
> > Later, when COREY is again updated with the latest, I will run the SWAP
> > again and COREX will again point to its original dataDir (now the
> > updated one). So every even-numbered SWAP will point COREX back to its
> > original dataDir (and the same for COREY).
> >
> > My only concern is, after the SWAP is done, updating the old index
> > (which was serving previously and is now replaced by the new index).
> > What is the best way to do that? Physically copy the latest index over
> > the old one to bring it in sync, so that by the time it is due to
> > receive the latest updates it already has the latest, the new documents
> > can be added to it, it becomes the latest, and it is swapped again?
>
> Perhaps it is best if we take a step back and understand why you need two
> identical cores?
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Tomcat save my Index temp ...
Okay, I installed my Solr the way the wiki says, and made a new try. Here is one of my two files:

Jens Kapitza-2 wrote:
>
> On 08.03.2010 15:08, stocki wrote:
>> Hello.
>>
>> I use 2 cores for Solr. When I restart my Tomcat on Debian, Tomcat
>> deletes my index.
>
> You should check your Tomcat setup.
>
>> I set data.dir to ${solr.data.dir:./suggest/data} and
>> ${solr.data.dir:./search/data}
>
> Use an absolute path [you have not set the solr.home path]; this is the
> working/tmp dir of Tomcat by default.
>
>>     <core name="..." instanceDir="..." dataDir="/search/data/index"/>
>>     <core name="..." instanceDir="..." dataDir="/suggest/data/index"/>
>
> That is OK, but it is relative to solr.home.
>
>> So why is my index only temporary?
>
> Try to set up Solr again: http://wiki.apache.org/solr/SolrTomcat
>
> Try the setup with a Context fragment: create a Tomcat Context fragment
> to point docBase to the $SOLR_HOME/apache-solr-1.3.0.war file and
> solr/home to $SOLR_HOME, and avoid storing the data in .../tmp/.
Wildcard question -- case issue
I'm encountering a potential bug in Solr regarding wildcards. I have two fields defined thusly:

    <field name="name" ... />
    <field name="namesimple" type="textgen" ... />

When searching with wildcards I get the following behavior. Two documents in the index are named "CMJ foo bar" and "CME foo bar". The name field has been indexed twice, as "name" and "namesimple".

The query

    spell?q=name:(cm*) OR namesimple:(cm*)

returns:

    CMJ foo bar
    CME foo bar

but

    spell?q=name:(CM*) OR namesimple:(CM*)

returns no results. I added an equivalent synonym for "cmj,CMJ" and re-indexed;

    spell?q=name:(CM*) OR namesimple:(CM*)

now returns

    CMJ foo bar

Naturally I can't see the value or practical use of adding a synonym for each of these terms as they get reported by users. From the documentation I've read (as well as feedback I received on these forums) I've found stemming can interfere with wildcards during querying and indexing, which is why the namesimple field is of type "textgen". This solved other wildcard/case issues, but this one remains.

Any suggestions would be appreciated. Thanks!
Re: Import database
What database are you using? Many of the JDBC drivers try to pull the entire resultset into RAM before feeding it to the application that requested the data.

If it's MySQL, I can show you how to fix it. The batchSize parameter below tells it to stream the data rather than buffer it. With other databases, I don't know how to do this.

    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://[SERVER]:3306/[SCHEMA]?zeroDateTimeBehavior=convertToNull"
        batchSize="-1" user="[REMOVED]" password="[REMOVED]"/>

Shawn

On 3/8/2010 7:09 AM, Quan Nguyen Anh wrote:
> Hi,
> I have started using Solr and hit a problem when importing a database
> with 2 million rows. The server encounters this error:
> java.lang.OutOfMemoryError: Java heap space
> I searched around but can't find the solution.
> Any help regarding this will be appreciated.
> Thanks in advance
Re: index merge
Hi Mark,

On Mon, Mar 8, 2010 at 9:23 PM, Mark Fletcher wrote:
>
> My main purpose in having 2 identical cores
> COREX - always serves user requests
> COREY - once every day, takes the updates/latest data and passes it on
> to COREX
> is this: suppose I have only one core, COREY, and a request comes to
> COREY while the update of the latest data is happening on it. Wouldn't
> that degrade the performance of the requests at that point in time?

The thing to note is that both reads and writes are happening on the same box. So when you swap cores, the OS has to cache the hot segments of the new (previously inactive) index. If you were just re-opening the same (active) index, at least some of the existing files could remain in the OS's file cache. I think swapping may just degrade performance further, so you should definitely benchmark before going through with this. The best practice is to use a master/slave architecture and separate the writes from the reads.

> So I was planning to keep COREX and COREY always identical. Once COREY
> has the latest data it should somehow sync with COREX so that COREX also
> has the latest. COREY keeps getting the updates at a particular time of
> day and passes them on to COREX. This process continues every day.

You could use the same approach that Solr 1.3's snapinstaller script used: it deletes the files and creates hard links to the new index files.

--
Regards,
Shalin Shekhar Mangar.
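For the master/slave route, Solr 1.4 ships HTTP replication configured in solrconfig.xml, roughly like the following; the master host name is a placeholder:

    <!-- on the master -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
    </requestHandler>

    <!-- on the slave -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>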
Re: index merge
On 03/08/2010 10:53 AM, Mark Fletcher wrote:
> Hi Shalin,
>
> Thank you for the mail. My main purpose in having 2 identical cores
> COREX - always serves user requests
> COREY - once every day, takes the updates/latest data and passes it on
> to COREX
> is this: suppose I have only one core, COREY, and a request comes to
> COREY while the update of the latest data is happening on it. Wouldn't
> that degrade the performance of the requests at that point in time?

Yes - but you're not going to help anything by using two indexes - the best you can do is use two boxes. Two indexes on the same box will actually be worse than one if they are identical and you are swapping between them.

Writes on an index will not affect reads in the way you are thinking - only in that they use IO and CPU that the read process can't. That's going to happen with 2 indexes on the same box too - except now you have way more data to cache and flip between, and you can't take advantage of freshly written data possibly being in the cache for reads.

Lucene indexes use a write-once strategy - when writing new segments, you are not touching the segments being read from. Lucene is already doing the index juggling for you at the segment level.

> So I was planning to keep COREX and COREY always identical. Once COREY
> has the latest data it should somehow sync with COREX so that COREX also
> has the latest. COREY keeps getting the updates at a particular time of
> day and passes them on to COREX. This process continues every day.
>
> What is the best possible way to implement this?
>
> Thanks,
> Mark.

--
- Mark

http://www.lucidimagination.com
Re: Child entities in document not loading
Try http://<host:port>/solr/admin. You'll see a bunch of links that'll allow you to examine many aspects of your installation.

Additionally, get a copy of Luke (Google "Lucene Luke") and point it at your index for a detailed look at the index.

Finally, the Solr log file might give you some clues...

HTH
Erick

On Mon, Mar 8, 2010 at 10:49 AM, John Ament wrote:
> Where would I see this? I do believe the fields are not ending up in the
> index.
>
> Thanks
>
> John
>
> On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson wrote:
> > What does the Solr admin page show you is actually in your index?
> >
> > Luke will also help.
> >
> > Erick
> >
> > On Mon, Mar 8, 2010 at 10:06 AM, John Ament wrote:
> > > All,
> > >
> > > So I think I have my first issue figured out: I need to add terms to
> > > the default search. That's fine.
> > >
> > > The new issue is that I'm trying to load child entities in with my
> > > entity.
> > >
> > > I added the appropriate fields to solrconfig.xml:
> > >
> > >     <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> > >     <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> > >     <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> > >
> > > And I updated my document to match.
> > >
> > > So my expectation is that there will be 3 new fields associated with
> > > it that are multivalued: sizes, colors, and sections.
> > >
> > > The full-import seems to work correctly: I get the appropriate number
> > > of documents in my searches. However, sizes, colors and sections all
> > > come up null (well, I should say they don't come up when I search for
> > > them).
> > >
> > > Any ideas on why it won't load these 3 child entities?
> > >
> > > Thanks!
> > >
> > > John
Re: Child entities in document not loading
Erick,

I'm sorry, but it's not helping much. I don't see anything on the admin screen that allows me to browse my index. Even using Luke, my assumption is that it's not loading correctly into the index. What parameters can I change in the logs to make it print out more information? I want to see what the query is returning, I guess.

Thanks,

John

On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson wrote:
> Try http://<host:port>/solr/admin. You'll see a bunch of links that'll
> allow you to examine many aspects of your installation.
>
> Additionally, get a copy of Luke (Google "Lucene Luke") and point it at
> your index for a detailed look at the index.
>
> Finally, the Solr log file might give you some clues...
>
> HTH
> Erick
>
> On Mon, Mar 8, 2010 at 10:49 AM, John Ament wrote:
> > Where would I see this? I do believe the fields are not ending up in
> > the index.
> >
> > Thanks
> >
> > John
> >
> > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson wrote:
> > > What does the Solr admin page show you is actually in your index?
> > >
> > > Luke will also help.
> > >
> > > Erick
> > >
> > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament wrote:
> > > > All,
> > > >
> > > > So I think I have my first issue figured out: I need to add terms
> > > > to the default search. That's fine.
> > > >
> > > > The new issue is that I'm trying to load child entities in with my
> > > > entity.
> > > >
> > > > I added the appropriate fields to solrconfig.xml:
> > > >
> > > >     <field name="sizes" type="string" indexed="true" stored="true" multiValued="true"/>
> > > >     <field name="colors" type="string" indexed="true" stored="true" multiValued="true"/>
> > > >     <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>
> > > >
> > > > And I updated my document to match.
> > > >
> > > > So my expectation is that there will be 3 new fields associated
> > > > with it that are multivalued: sizes, colors, and sections.
> > > >
> > > > The full-import seems to work correctly: I get the appropriate
> > > > number of documents in my searches. However, sizes, colors and
> > > > sections all come up null (well, I should say they don't come up
> > > > when I search for them).
> > > >
> > > > Any ideas on why it won't load these 3 child entities?
> > > >
> > > > Thanks!
> > > >
> > > > John
Re: Child entities in document not loading
Sorry, won't be able to really look till tonight. Did you try Luke? What did it show? One thing I did notice though... field name="sections" type="string" indexed="true" stored="true" multiValued="true"/> string types are not analyzed, so the entire input is indexed as a single token. You might want "text" here Erick On Mon, Mar 8, 2010 at 11:37 AM, John Ament wrote: > Erick, > > I'm sorry, but it's not helping much. I don't see anything on the admin > screen that allows me to browse my index. Even using Luke, my assumption > is > that it's not loading correctly in the index. What parameters can I change > in the logs to make it print out more information? I want to see what the > query is returning I guess. > > Thanks, > > John > > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson >wrote: > > > Try http:///solr/admin. You'll see a bunch > > of links that'll allow you to examine many aspects of your installation. > > > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at > > your index for a detailed look at the index. > > > > Finally, the SOLR log file might give you some clues... > > > > HTH > > Erick > > > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament > wrote: > > > > > Where would I see this? I do believe the fields are not ending up in > the > > > index. > > > > > > Thanks > > > > > > John > > > > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson < > erickerick...@gmail.com > > > >wrote: > > > > > > > What does the solr admin page show you is actually in your index? > > > > > > > > Luke will also help. > > > > > > > > Erick > > > > > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament > > > wrote: > > > > > > > > > All, > > > > > > > > > > So I think I have my first issue figured out, need to add terms to > > the > > > > > default search. That's fine. > > > > > > > > > > New issue is that I'm trying to load child entities in with my > > entity. > > > > > > > > > > I added the appropriate fields to solrconfig.xml > > > > > > > > > > stored="true" > > > > > multiValued="true"/> > > > > > > > > > multiValued="true"/> > > > > > > > > > multiValued="true"/> > > > > > > > > > > And I updated my document to match > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So my expectation is that there will be 3 new fields associated > with > > it > > > > > that > > > > > are multivalued: sizes, colors, and sections. > > > > > > > > > > The full-import seems to work correctly. I get the appropriate > > number > > > of > > > > > documents in my searches. However, sizes, colors and sections all > > come > > > > up > > > > > null (well, I should say they don't come up when I search for > them). > > > > > > > > > > Any ideas on why it won't load these 3 child entities? > > > > > > > > > > Thanks! > > > > > > > > > > John > > > > > > > > > > > > > > >
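For reference, the contrast Erick is drawing looks like this in schema.xml (a minimal sketch based on the Solr 1.4 example schema; "text" here is the analyzed type that the example schema defines — the two lines are shown together only for comparison, a real schema would pick one):

    <!-- StrField: the whole value is indexed as a single token; only exact matches hit -->
    <field name="sections" type="string" indexed="true" stored="true" multiValued="true"/>

    <!-- TextField: the value is tokenized and lowercased, so individual words match -->
    <field name="sections" type="text" indexed="true" stored="true" multiValued="true"/>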
Re: Wildcard question -- case issue
> query: > > spell?q=name:(cm*) OR namesimple:(cm*) > > returns: > CMJ foo bar > CME foo bar > > spell?q=name:(CM*) OR namesimple:(CM*) > returns > No results. Wildcard queries are not analyzed by Lucene, hence the behavior. [1] [1] http://www.search-lucene.com/m?id=4a8ce9b2.2070...@ait.co.at||wildcard%20not%20analyzed
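Since wildcard terms bypass the analyzer entirely, the usual workaround is to lowercase the term on the client before building the query. A hypothetical sketch (the variable names are invented; this assumes the field's analyzer lowercased the terms at index time):

    // wildcard terms are never run through the analyzer, so match
    // the index's case manually before sending the query
    String term = userInput.toLowerCase();
    String q = "name:(" + term + "*) OR namesimple:(" + term + "*)";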
Re: Child entities in document not loading
Another thing I don't get. The system feels like it's doing the extra queries. I put the LogTransformer expecting to see additional output on one of the child entities And yet there is no additional output. Thanks, John On Mon, Mar 8, 2010 at 11:37 AM, John Ament wrote: > Erick, > > I'm sorry, but it's not helping much. I don't see anything on the admin > screen that allows me to browse my index. Even using Luke, my assumption is > that it's not loading correctly in the index. What parameters can I change > in the logs to make it print out more information? I want to see what the > query is returning I guess. > > Thanks, > > John > > > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson > wrote: > >> Try http:///solr/admin. You'll see a bunch >> of links that'll allow you to examine many aspects of your installation. >> >> Additionally, get a copy of Luke (Google Lucene Luke) and point it at >> your index for a detailed look at the index. >> >> Finally, the SOLR log file might give you some clues... >> >> HTH >> Erick >> >> On Mon, Mar 8, 2010 at 10:49 AM, John Ament wrote: >> >> > Where would I see this? I do believe the fields are not ending up in the >> > index. >> > >> > Thanks >> > >> > John >> > >> > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson < >> erickerick...@gmail.com >> > >wrote: >> > >> > > What does the solr admin page show you is actually in your index? >> > > >> > > Luke will also help. >> > > >> > > Erick >> > > >> > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament >> > wrote: >> > > >> > > > All, >> > > > >> > > > So I think I have my first issue figured out, need to add terms to >> the >> > > > default search. That's fine. >> > > > >> > > > New issue is that I'm trying to load child entities in with my >> entity. >> > > > >> > > > I added the appropriate fields to solrconfig.xml >> > > > >> > > >> > > > multiValued="true"/> >> > > >> > > > multiValued="true"/> >> > > >> > > > multiValued="true"/> >> > > > >> > > > And I updated my document to match >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > So my expectation is that there will be 3 new fields associated with >> it >> > > > that >> > > > are multivalued: sizes, colors, and sections. >> > > > >> > > > The full-import seems to work correctly. I get the appropriate >> number >> > of >> > > > documents in my searches. However, sizes, colors and sections all >> come >> > > up >> > > > null (well, I should say they don't come up when I search for them). >> > > > >> > > > Any ideas on why it won't load these 3 child entities? >> > > > >> > > > Thanks! >> > > > >> > > > John >> > > > >> > > >> > >> > >
Re: Child entities in document not loading
The issue's not about indexing, the issue's about storage. It seems like the fields (sections, colors, sizes) are all not being stored, even though store=true. I could not get Luke to work, no. The webstart just hangs at downloading 0%. Thanks, John On Mon, Mar 8, 2010 at 12:06 PM, Erick Erickson wrote: > Sorry, won't be able to really look till tonight. Did you try Luke? What > did > it > show? > > One thing I did notice though... > > field name="sections" type="string" indexed="true" stored="true" > multiValued="true"/> > > string types are not analyzed, so the entire input is indexed as > a single token. You might want "text" here > > Erick > > On Mon, Mar 8, 2010 at 11:37 AM, John Ament wrote: > > > Erick, > > > > I'm sorry, but it's not helping much. I don't see anything on the admin > > screen that allows me to browse my index. Even using Luke, my assumption > > is > > that it's not loading correctly in the index. What parameters can I > change > > in the logs to make it print out more information? I want to see what the > > query is returning I guess. > > > > Thanks, > > > > John > > > > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson > >wrote: > > > > > Try http:///solr/admin. You'll see a bunch > > > of links that'll allow you to examine many aspects of your > installation. > > > > > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at > > > your index for a detailed look at the index. > > > > > > Finally, the SOLR log file might give you some clues... > > > > > > HTH > > > Erick > > > > > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament > > wrote: > > > > > > > Where would I see this? I do believe the fields are not ending up in > > the > > > > index. > > > > > > > > Thanks > > > > > > > > John > > > > > > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson < > > erickerick...@gmail.com > > > > >wrote: > > > > > > > > > What does the solr admin page show you is actually in your index? > > > > > > > > > > Luke will also help. > > > > > > > > > > Erick > > > > > > > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament > > > > wrote: > > > > > > > > > > > All, > > > > > > > > > > > > So I think I have my first issue figured out, need to add terms > to > > > the > > > > > > default search. That's fine. > > > > > > > > > > > > New issue is that I'm trying to load child entities in with my > > > entity. > > > > > > > > > > > > I added the appropriate fields to solrconfig.xml > > > > > > > > > > > > > stored="true" > > > > > > multiValued="true"/> > > > > > > stored="true" > > > > > > multiValued="true"/> > > > > > > > > > > > multiValued="true"/> > > > > > > > > > > > > And I updated my document to match > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So my expectation is that there will be 3 new fields associated > > with > > > it > > > > > > that > > > > > > are multivalued: sizes, colors, and sections. > > > > > > > > > > > > The full-import seems to work correctly. I get the appropriate > > > number > > > > of > > > > > > documents in my searches. However, sizes, colors and sections > all > > > come > > > > > up > > > > > > null (well, I should say they don't come up when I search for > > them). > > > > > > > > > > > > Any ideas on why it won't load these 3 child entities? > > > > > > > > > > > > Thanks! > > > > > > > > > > > > John > > > > > > > > > > > > > > > > > > > > >
Re: Child entities in document not loading
Ok - downloaded the binary off of google code and it's loading. The 3 child entities do not appear as I had suspected. Thanks, John On Mon, Mar 8, 2010 at 12:12 PM, John Ament wrote: > The issue's not about indexing, the issue's about storage. It seems like > the fields (sections, colors, sizes) are all not being stored, even though > store=true. > > I could not get Luke to work, no. The webstart just hangs at downloading > 0%. > > Thanks, > > John > > > On Mon, Mar 8, 2010 at 12:06 PM, Erick Erickson > wrote: > >> Sorry, won't be able to really look till tonight. Did you try Luke? What >> did >> it >> show? >> >> One thing I did notice though... >> >> field name="sections" type="string" indexed="true" stored="true" >> multiValued="true"/> >> >> string types are not analyzed, so the entire input is indexed as >> a single token. You might want "text" here >> >> Erick >> >> On Mon, Mar 8, 2010 at 11:37 AM, John Ament wrote: >> >> > Erick, >> > >> > I'm sorry, but it's not helping much. I don't see anything on the admin >> > screen that allows me to browse my index. Even using Luke, my >> assumption >> > is >> > that it's not loading correctly in the index. What parameters can I >> change >> > in the logs to make it print out more information? I want to see what >> the >> > query is returning I guess. >> > >> > Thanks, >> > >> > John >> > >> > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson < >> erickerick...@gmail.com >> > >wrote: >> > >> > > Try http:///solr/admin. You'll see a >> bunch >> > > of links that'll allow you to examine many aspects of your >> installation. >> > > >> > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at >> > > your index for a detailed look at the index. >> > > >> > > Finally, the SOLR log file might give you some clues... >> > > >> > > HTH >> > > Erick >> > > >> > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament >> > wrote: >> > > >> > > > Where would I see this? I do believe the fields are not ending up in >> > the >> > > > index. >> > > > >> > > > Thanks >> > > > >> > > > John >> > > > >> > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson < >> > erickerick...@gmail.com >> > > > >wrote: >> > > > >> > > > > What does the solr admin page show you is actually in your index? >> > > > > >> > > > > Luke will also help. >> > > > > >> > > > > Erick >> > > > > >> > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament > > >> > > > wrote: >> > > > > >> > > > > > All, >> > > > > > >> > > > > > So I think I have my first issue figured out, need to add terms >> to >> > > the >> > > > > > default search. That's fine. >> > > > > > >> > > > > > New issue is that I'm trying to load child entities in with my >> > > entity. >> > > > > > >> > > > > > I added the appropriate fields to solrconfig.xml >> > > > > > >> > > > > >> > stored="true" >> > > > > > multiValued="true"/> >> > > > > >> stored="true" >> > > > > > multiValued="true"/> >> > > > > >> stored="true" >> > > > > > multiValued="true"/> >> > > > > > >> > > > > > And I updated my document to match >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > So my expectation is that there will be 3 new fields associated >> > with >> > > it >> > > > > > that >> > > > > > are multivalued: sizes, colors, and sections. >> > > > > > >> > > > > > The full-import seems to work correctly. I get the appropriate >> > > number >> > > > of >> > > > > > documents in my searches. 
However, sizes, colors and sections >> all >> > > come >> > > > > up >> > > > > > null (well, I should say they don't come up when I search for >> > them). >> > > > > > >> > > > > > Any ideas on why it won't load these 3 child entities? >> > > > > > >> > > > > > Thanks! >> > > > > > >> > > > > > John >> > > > > > >> > > > > >> > > > >> > > >> > >> > >
DataImportHandler and dynamic fields
If my query were something like this: "select col1, col2 from table", my dynamic field would be something like "fld_${col1}". But I could not find any information on how to set up the DIH with dynamic fields. I saw that dynamic fields should be supported as of SOLR-742, but I am not sure how to proceed. Does anyone have an example or information on how to set it up?
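One pattern that has been reported to work for this in data-config.xml is template resolution in the field's name attribute (a sketch only — the entity name "t" is invented, and whether ${...} is resolved per row inside the name attribute should be verified against your DIH version):

    <entity name="t" query="select col1, col2 from table">
      <!-- col1's value becomes part of the field name; col2 supplies the value -->
      <field column="col2" name="fld_${t.col1}"/>
    </entity>

with a matching dynamic field declared in schema.xml, e.g. <dynamicField name="fld_*" type="string" indexed="true" stored="true"/>.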
Re: SOLR takes more than 9 hours to index 300000 rows
Shawn, Increasing the fetch size and increasing my heap based on that did the trick. Thanks a lot for your help; your suggestions helped me a lot. Hope these suggestions will be helpful to others who are facing a similar kind of issue. Thanks, Barani Shawn Heisey-4 wrote: > > Do keep looking into the batchSize, but I think I might have found the > issue. If I understand things correctly, you will need to add > processor="CachedSqlEntityProcessor" to your first entity. It's only > specified on the other two. Assuming you have enough RAM and heap space > available in your JVM to load the results of all three queries, that > ought to make it work very quickly. > > If I'm right, basically what it's doing is issuing a real SQL query > against your first table for every entry it has read for the other two > tables. > > Shawn > > On 3/6/2010 11:58 AM, JavaGuy84 wrote: >> Shawn, >> >> Thanks a lot for your response, >> >> Yes, still the DB connection is active.. It is still fetching the data >> from >> the DB. >> >> I am using Redhat MetaMatrix DB as backend and I am trying to find out >> the >> parameter for setting the JDBC fetch size.. >> >> Do you think that this problem will be mostly due to fetch size? >> >> Thanks, >> Barani >> > > > -- View this message in context: http://old.nabble.com/SOLR-takes-more-than-9-hours-to-index-30-rows-tp27805403p27825172.html Sent from the Solr - User mailing list archive at Nabble.com.
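For readers hitting the same slowdown, the documented shape of Shawn's suggestion looks roughly like this in data-config.xml (a sketch; the table, column, and field names here are invented):

    <entity name="parent" query="select id, name from parent">
      <!-- CachedSqlEntityProcessor reads the child result set once and serves
           it from memory, instead of issuing one SQL query per parent row -->
      <entity name="child" query="select parent_id, size from child"
              processor="CachedSqlEntityProcessor" where="parent_id=parent.id">
        <field column="size" name="sizes"/>
      </entity>
    </entity>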
Search on dynamic fields which contains spaces /special characters
Hi, We have some dynamic fields getting indexed using Solr. Some of the dynamic fields contain spaces / special characters (something like: short name, Full Name, etc.). Is there a way to search on these fields (which contain the spaces, etc.)? Can someone let me know the filter I need to pass to do this type of search? I tried short name:name1 --> this didn't work. Thanks, Barani -- View this message in context: http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search on dynamic fields which contains spaces /special characters
I do not believe the Solr or Lucene syntax allows this. You need to get rid of all the spaces in the field name. If not, you will be searching for "short" in the default field and then for "name1" in the "name" field. http://wiki.apache.org/solr/SolrQuerySyntax http://lucene.apache.org/java/2_9_2/queryparsersyntax.html On Mon, Mar 8, 2010 at 2:17 PM, JavaGuy84 wrote: > > Hi, > > We have some dynamic fields getting indexed using SOLR. Some of the dynamic > fields contains spaces / special character (something like: short name, > Full > Name etc...). Is there a way to search on these fields (which contains the > spaces etc..). Can someone let me know the filter I need to pass to do this > type of search? > > I tried with short name:name1 --> this didnt work.. > > Thanks, > Barani > -- > View this message in context: > http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
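A practical workaround is to normalize the names before indexing, mapping each space-containing attribute onto an underscore-separated dynamic field (a sketch; the attr_* pattern is an assumption):

    <!-- schema.xml -->
    <dynamicField name="attr_*" type="string" indexed="true" stored="true"/>

Index "short name" as attr_short_name, then query with q=attr_short_name:name1.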
Re: Child entities in document not loading
All It seems like my issue is simply on the concept of child entities. I had to add a second table to my query to pull pricing info. At first, I was putting it in a separate entity. Didn't work, even though I added the fields. When I rewrote my query as It loaded. I'm wondering if there's something I have to activate to make child entities work? Thanks, John On Mon, Mar 8, 2010 at 12:17 PM, John Ament wrote: > Ok - downloaded the binary off of google code and it's loading. The 3 > child entities do not appear as I had suspected. > > Thanks, > > John > > > On Mon, Mar 8, 2010 at 12:12 PM, John Ament wrote: > >> The issue's not about indexing, the issue's about storage. It seems like >> the fields (sections, colors, sizes) are all not being stored, even though >> store=true. >> >> I could not get Luke to work, no. The webstart just hangs at downloading >> 0%. >> >> Thanks, >> >> John >> >> >> On Mon, Mar 8, 2010 at 12:06 PM, Erick Erickson >> wrote: >> >>> Sorry, won't be able to really look till tonight. Did you try Luke? What >>> did >>> it >>> show? >>> >>> One thing I did notice though... >>> >>> field name="sections" type="string" indexed="true" stored="true" >>> multiValued="true"/> >>> >>> string types are not analyzed, so the entire input is indexed as >>> a single token. You might want "text" here >>> >>> Erick >>> >>> On Mon, Mar 8, 2010 at 11:37 AM, John Ament >>> wrote: >>> >>> > Erick, >>> > >>> > I'm sorry, but it's not helping much. I don't see anything on the >>> admin >>> > screen that allows me to browse my index. Even using Luke, my >>> assumption >>> > is >>> > that it's not loading correctly in the index. What parameters can I >>> change >>> > in the logs to make it print out more information? I want to see what >>> the >>> > query is returning I guess. >>> > >>> > Thanks, >>> > >>> > John >>> > >>> > On Mon, Mar 8, 2010 at 11:23 AM, Erick Erickson < >>> erickerick...@gmail.com >>> > >wrote: >>> > >>> > > Try http:///solr/admin. You'll see a >>> bunch >>> > > of links that'll allow you to examine many aspects of your >>> installation. >>> > > >>> > > Additionally, get a copy of Luke (Google Lucene Luke) and point it at >>> > > your index for a detailed look at the index. >>> > > >>> > > Finally, the SOLR log file might give you some clues... >>> > > >>> > > HTH >>> > > Erick >>> > > >>> > > On Mon, Mar 8, 2010 at 10:49 AM, John Ament >>> > wrote: >>> > > >>> > > > Where would I see this? I do believe the fields are not ending up >>> in >>> > the >>> > > > index. >>> > > > >>> > > > Thanks >>> > > > >>> > > > John >>> > > > >>> > > > On Mon, Mar 8, 2010 at 10:34 AM, Erick Erickson < >>> > erickerick...@gmail.com >>> > > > >wrote: >>> > > > >>> > > > > What does the solr admin page show you is actually in your index? >>> > > > > >>> > > > > Luke will also help. >>> > > > > >>> > > > > Erick >>> > > > > >>> > > > > On Mon, Mar 8, 2010 at 10:06 AM, John Ament < >>> my.repr...@gmail.com> >>> > > > wrote: >>> > > > > >>> > > > > > All, >>> > > > > > >>> > > > > > So I think I have my first issue figured out, need to add terms >>> to >>> > > the >>> > > > > > default search. That's fine. >>> > > > > > >>> > > > > > New issue is that I'm trying to load child entities in with my >>> > > entity. 
>>> > > > > > >>> > > > > > I added the appropriate fields to solrconfig.xml >>> > > > > > >>> > > > > >>> > stored="true" >>> > > > > > multiValued="true"/> >>> > > > > >>> stored="true" >>> > > > > > multiValued="true"/> >>> > > > > >>> stored="true" >>> > > > > > multiValued="true"/> >>> > > > > > >>> > > > > > And I updated my document to match >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > So my expectation is that there will be 3 new fields associated >>> > with >>> > > it >>> > > > > > that >>> > > > > > are multivalued: sizes, colors, and sections. >>> > > > > > >>> > > > > > The full-import seems to work correctly. I get the appropriate >>> > > number >>> > > > of >>> > > > > > documents in my searches. However, sizes, colors and sections >>> all >>> > > come >>> > > > > up >>> > > > > > null (well, I should say they don't come up when I search for >>> > them). >>> > > > > > >>> > > > > > Any ideas on why it won't load these 3 child entities? >>> > > > > > >>> > > > > > Thanks! >>> > > > > > >>> > > > > > John >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> >> >> >
Solr Startup CPU Spike
Good afternoon. We have been experiencing an odd issue with one of our Solr nodes. Upon startup or when bringing in a new index we get a CPU spike for 5 minutes or so. I have attached a graph of this spike. During this time simple queries return without a problem but more complex queries do not return. Here are some more details about the instance: Index Size: ~16G Max Heap: 6144M GC Option: -XX:+UseConcMarkSweepGC System Memory: 16G We have a very similar instance to this one but with a much larger index where we are not seeing this sort of issue. Your help is greatly appreciated. Let me know if you need any additional information. Thanks, John -- John Williams System Administrator 37signals
Re: Solr Startup CPU Spike
Is this just autowarming? Check your autowarmCount parameters in solrconfig.xml -Yonik http://www.lucidimagination.com On Mon, Mar 8, 2010 at 5:37 PM, John Williams wrote: > Good afternoon. > > We have been experiencing an odd issue with one of our Solr nodes. Upon > startup or when bringing in a new index we get a CPU spike for 5 minutes or > so. I have attached a graph of this spike. During this time simple queries > return without a problem but more complex queries do not return. Here are > some more details about the instance: > > Index Size: ~16G > Max Heap: 6144M > GC Option: -XX:+UseConcMarkSweepGC > System Memory: 16G > > We have a very similar instance to this one but with a much larger index that > we are not seeing this sort of issue. > > Your help is greatly appreciated. Let me know if you need any additional > information. > > Thanks, > John > > -- > John Williams > System Administrator > 37signals > > >
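The parameters Yonik is referring to sit on the cache declarations in solrconfig.xml; the values below are illustrative only:

    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <!-- autowarmCount="0" disables repopulating the cache from the old searcher on commit -->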
Re: Solr Startup CPU Spike
Yonik, In all cases our "autowarmCount" is set to 0. Also, here is a link to our config. http://pastebin.com/iUgruqPd Thanks, John -- John Williams System Administrator 37signals On Mar 8, 2010, at 4:44 PM, Yonik Seeley wrote: > Is this just autowarming? > Check your autowarmCount parameters in solrconfig.xml > > -Yonik > http://www.lucidimagination.com > > On Mon, Mar 8, 2010 at 5:37 PM, John Williams wrote: >> Good afternoon. >> >> We have been experiencing an odd issue with one of our Solr nodes. Upon >> startup or when bringing in a new index we get a CPU spike for 5 minutes or >> so. I have attached a graph of this spike. During this time simple queries >> return without a problem but more complex queries do not return. Here are >> some more details about the instance: >> >> Index Size: ~16G >> Max Heap: 6144M >> GC Option: -XX:+UseConcMarkSweepGC >> System Memory: 16G >> >> We have a very similar instance to this one but with a much larger index >> that we are not seeing this sort of issue. >> >> Your help is greatly appreciated. Let me know if you need any additional >> information. >> >> Thanks, >> John >> >> -- >> John Williams >> System Administrator >> 37signals
Re: Solr Startup CPU Spike
On Mon, Mar 8, 2010 at 6:07 PM, John Williams wrote: > Yonik, > > In all cases our "autowarmCount" is set to 0. Also, here is a link to our > config. http://pastebin.com/iUgruqPd Weird... on a quick glance, I don't see anything in your config that would cause work to be done on a commit. I expected something like autowarming, or rebuilding a spellcheck index, etc. I assume this is happening even w/o any requests hitting the server? Could it be GC? You could use -verbose:gc or jconsole to check if this corresponds to a big GC (which could naturally hit on an index change). 5 minutes is really excessive though, and I wouldn't expect it on startup. If it's not GC, perhaps the next step is to get some stack traces during the spike (or use a profiler) to figure out where the time is being spent. And verify that the solrconfig.xml shown actually still matches the one you provided. -Yonik http://www.lucidimagination.com > Thanks, > John > > -- > John Williams > System Administrator > 37signals > > On Mar 8, 2010, at 4:44 PM, Yonik Seeley wrote: > >> Is this just autowarming? >> Check your autowarmCount parameters in solrconfig.xml >> >> -Yonik >> http://www.lucidimagination.com >> >> On Mon, Mar 8, 2010 at 5:37 PM, John Williams wrote: >>> Good afternoon. >>> >>> We have been experiencing an odd issue with one of our Solr nodes. Upon >>> startup or when bringing in a new index we get a CPU spike for 5 minutes or >>> so. I have attached a graph of this spike. During this time simple queries >>> return without a problem but more complex queries do not return. Here are >>> some more details about the instance: >>> >>> Index Size: ~16G >>> Max Heap: 6144M >>> GC Option: -XX:+UseConcMarkSweepGC >>> System Memory: 16G >>> >>> We have a very similar instance to this one but with a much larger index >>> that we are not seeing this sort of issue. >>> >>> Your help is greatly appreciated. Let me know if you need any additional >>> information. >>> >>> Thanks, >>> John >>> >>> -- >>> John Williams >>> System Administrator >>> 37signals >>> >>> >>> > >
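To act on Yonik's suggestion, something along these lines would show whether GC lines up with the spike (standard HotSpot flags; the start command assumes the Jetty example setup plus the heap and GC settings quoted in the thread):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xmx6144M -XX:+UseConcMarkSweepGC -jar start.jar

    # during the spike, take a few thread dumps some seconds apart:
    jstack <solr-pid>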
PDF extraction leads to reversed words
Hi, Posting Arabic PDF files to Solr using a web form (to solr/update/extract), I get extracted texts with each word displayed in reverse direction (instead of right to left). When performing a search against these texts with -always- reversed keywords, I get results, but reversed. This problem doesn't occur when posting MS Word documents. I think the problem comes from Tika! Any clue? -- elsadek Software Engineer - J2EE / WEB / ESB MULE
Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+
Too bad it requires integer (long) primary keys... :/ 2010/3/8 Ian Holsman > > I just saw this on twitter, and thought you guys would be interested.. I > haven't tried it, but it looks interesting. > > http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin > > Thanks for the RT Shalin! >
Re: PDF extraction leads to reversed words
I think the problem is that Solr does not include the ICU4J jar, so it won't work with Arabic PDF files. Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your classpath. On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid ABID wrote: > Hi, > Posting arabic pdf files to Solr using a web form (to solr/update/extract) > get extracted texts and each words displayed in reverse direction(instead of > right to left). > When perform search against these texts with -always- reversed key-words I > get results but reversed. > This problem doesn't occur when posting MsWord document. > I think the problem come from Tika ! > > Any clue ? > > -- > elsadek > Software Engineer- J2EE / WEB / ESB MULE > -- Robert Muir rcm...@gmail.com
Re: which links do i have to follow to understand location based search concepts?
Hi, During indexing it's taking localhost and port 8983: index: [echo] Indexing ./data/ [java] ./data/ http://localhost:8983/solr So in the other case, where the Solr instance is not running, what may be the reason that Solr is not running? (I'm new to Solr.) You mean it has nothing to do with the XML, since it's giving: at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807) [java] at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) [java] at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107) [java] at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205) [java] at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:395) [java] at javax.xml.parsers.SAXParser.parse(SAXParser.java:198) [java] at OSM2Solr.process(OSM2Solr.java:44) [java] at Driver.main(Driver.java:79) [java] Caused by: java.net.ConnectException: Connection refused [java] at java.net.PlainSocketImpl.socketConnect(Native Method) [java] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) and line 79 in Driver.java is numIndexed += o2s.process(new FileInputStream(file)); // where it's taking my xml file Shalin Shekhar Mangar wrote: > > On Mon, Mar 8, 2010 at 6:21 PM, KshamaPai wrote: >> >> Hi, >> Thank You for explaining it in a simple way. >> The article really helped me to understand the concepts better. >> >> My question is ,Is it necessary that the data what you are indexing in >> spatial example, is to be in the osm format and using facts files? >> In my case,am trying to index data ,that has just lat,longitude and >> related >> news item(just text) in a xml file which looks like this >> >> >> >> >> >> >> I have silghtly modified driver.java and other .java files in >> src/main/java >> folder, so that these fields are considered for indexing.(but have >> retained >> geohash,lat_rad,lng_rad as done in spatial example) >> >> But when i do ant index , am getting >> >> Buildfile: build.xml >> >> init: >> >> compile: >> >> index: >> [echo] Indexing ./data/ >> [java] ./data/ http://localhost:8983/solr >> [java] Num args: 2 >> [java] Starting indexing >> [java] Indexing: ./data/final.xml >> [java] Mar 8, 2010 4:40:35 AM >> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry >> [java] INFO: I/O exception (java.net.ConnectException) caught when >> processing request: Connection refused >> > > The "Connection refused" message suggests that your Solr instance is > either > not running or you have given the wrong host/port in your driver. > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://old.nabble.com/which-links-do-i-have-to-follow-to-understand-location-based-search-concepts--tp27811139p27830412.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ commit options
waitFlush=true means that the commit HTTP call waits until everything is sent to disk before it returns. waitSearcher=true means that the commit HTTP call waits until Solr has reloaded the index and is ready to search against it. (For more, study Solr warming up.) Both of these mean that the HTTP call (or curl program or SolrJ program) that started the commit waits until it is done. Other processes doing searches against the index are not blocked. However, the commit may have so much disk activity that the other searches do not proceed very fast. They are not completely blocked. The commit will take as long as it takes, and your results will appear after that. If you want to time that, use waitFlush=true&waitSearcher=true. On Fri, Mar 5, 2010 at 9:39 PM, gunjan_versata wrote: > > But can anyone explain me the use of these parameters.. I have read upon it.. > what i could not understand was.. if can i set both the params to false, > after how much time will my changes start reflecting? > > -- > View this message in context: > http://old.nabble.com/SolrJ-commit-options-tp27714405p27802041.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Lance Norskog goks...@gmail.com
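In SolrJ 1.4 the two flags map directly onto an overload of commit(). A minimal sketch (the URL is a placeholder):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class CommitDemo {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // returns as soon as the commit is accepted; the new searcher
        // becomes visible to queries some time later
        server.commit(false, false);  // waitFlush=false, waitSearcher=false
        // blocks until the flush completes and the new searcher is registered
        server.commit(true, true);
      }
    }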
Re: Import database
On 3/8/2010 9:21 PM, Lee Smith wrote: I had the same issue with Jetty. Adding extra memory resolved my issue, i.e.: java -Xms512M -Xmx1024M -jar start.jar It's in the manual, but I can't seem to find the link. On 8 Mar 2010, at 14:09, Quan Nguyen Anh wrote: Hi, I have started using Solr. I had a problem when I inserted a database with 2 million rows. The server encounters the error: java.lang.OutOfMemoryError: Java heap space I searched around but can't find the solution. Any help regarding this will be appreciated. Thanks in advance. I tried this solution but the server has the same error. I think that the heap size is not large enough to contain the data from MySQL.
Re: Import database
On 3/8/2010 11:05 PM, Shawn Heisey wrote: What database are you using? Many of the JDBC drivers try to pull the entire resultset into RAM before feeding it to the application that requested the data. If it's MySQL, I can show you how to fix it. The batchSize parameter below tells it to stream the data rather than buffer it. With other databases, I don't know how to do this. <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://[SERVER]:3306/[SCHEMA]?zeroDateTimeBehavior=convertToNull" batchSize="-1" user="[REMOVED]" password="[REMOVED]"/> Shawn On 3/8/2010 7:09 AM, Quan Nguyen Anh wrote: Hi, I have started using Solr. I had a problem when I inserted a database with 2 million rows. The server encounters the error: java.lang.OutOfMemoryError: Java heap space I searched around but can't find the solution. Any help regarding this will be appreciated. Thanks in advance. I'm using MySQL. This solution fixed my problem. Thanks for your help.
Re: Can't delete from curl
... curl http://xen1.xcski.com:8080/solrChunk/nutch/select that should be /update, not /select On Sun, Mar 7, 2010 at 4:32 PM, Paul Tomblin wrote: > On Tue, Mar 2, 2010 at 1:22 AM, Lance Norskog wrote: > >> On Mon, Mar 1, 2010 at 4:02 PM, Paul Tomblin wrote: >> > I have a schema with a field name "category" (> > type="string" stored="true" indexed="true"/>). I'm trying to delete >> > everything with a certain value of category with curl:... >> > >> > I send: >> > >> > curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type: >> > text/xml" --data-binary 'category:Banks' >> > >> > Response is: >> > >> > >> > >> > 0> > name="QTime">23 >> > >> > >> > I send >> > >> > curl http://localhost:8080/solrChunk/nutch/update -H "Content-Type: >> > text/xml" --data-binary '' >> > >> > Response is: >> > >> > >> > >> > 0> > name="QTime">1914 >> > >> > >> > but when I go back and query, it shows all the same results as before. >> > >> > Why isn't it deleting? >> >> Do you query with curl also? If you use a web browser, Solr by default >> uses http caching, so your browser will show you the old result of the >> query. >> >> > I think you're right about that. I tried using curl, and it did go to zero. > But now I've got a different problem: sometimes when I try to commit, I get > a NullPointerException: > > > curl http://xen1.xcski.com:8080/solrChunk/nutch/select -H "Content-Type: > text/xml" --data-binary ''Apache Tomcat/6.0.20 - > Error report > HTTP Status 500 - null > > java.lang.NullPointerException > at java.io.StringReader.(StringReader.java:33) > at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173) > at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) > at org.apache.solr.search.QParser.getQuery(QParser.java:131) > at > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) > at > org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) > at > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) > at java.lang.Thread.run(Thread.java:619) > type Status > reportmessage null > > java.lang.NullPointerException > at java.io.StringReader. 
(StringReader.java:33) > at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173) > at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) > at org.apache.solr.search.QParser.getQuery(QParser.java:131) > at > org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) > > > -- > http://www.linkedin.com/in/paultomblin > http://careers.stackoverflow.com/ptomblin > -- Lance Norskog goks...@gmail.com
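The XML bodies in the messages above were stripped by the mail archive. The usual delete-then-commit sequence, with the /update endpoint Lance points out, looks like this (a sketch using the URLs from the thread):

    curl http://localhost:8080/solrChunk/nutch/update -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>category:Banks</query></delete>'
    curl http://localhost:8080/solrChunk/nutch/update -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'

Posting the commit to /select instead of /update is what produced the NullPointerException: the select handler tried to parse a query string that was never supplied.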
Re: Search on dynamic fields which contains spaces /special characters
I'm starting to learn Solr/Lucene. I'm working on a shared server and have to use a stand-alone Java install. Anyone tell me how to install OpenJDK for a shared server account? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Mon, 3/8/10, Israel Ekpo wrote: > From: Israel Ekpo > Subject: Re: Search on dynamic fields which contains spaces /special > characters > To: solr-user@lucene.apache.org > Date: Monday, March 8, 2010, 12:44 PM > I do not believe the SOLR or LUCENE > syntax allows this > > You need to get rid of all the spaces in the field name > > If not, then you will be searching for "short" in the > default field and then > "name1" in the "name" field. > > http://wiki.apache.org/solr/SolrQuerySyntax > > http://lucene.apache.org/java/2_9_2/queryparsersyntax.html > > > On Mon, Mar 8, 2010 at 2:17 PM, JavaGuy84 > wrote: > > > > > Hi, > > > > We have some dynamic fields getting indexed using > SOLR. Some of the dynamic > > fields contains spaces / special character (something > like: short name, > > Full > > Name etc...). Is there a way to search on these fields > (which contains the > > spaces etc..). Can someone let me know the filter I > need to pass to do this > > type of search? > > > > I tried with short name:name1 --> this didnt > work.. > > > > Thanks, > > Barani > > -- > > View this message in context: > > http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html > > Sent from the Solr - User mailing list archive at > Nabble.com. > > > > > > > -- > "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. > http://www.israelekpo.com/ >
Re: More contextual information in analyser
This is an interesting idea. There are other projects to make the analyzer/filter chain more "porous", or open to outside interaction. A big problem is that queries are analyzed, too. If you want to give the same metadata to the analyzer when doing a query against the field, things get tough. You would need a special query parser to implement your own syntax to do that. However, the analyzer chain in the query phase does not receive the parsed query, so you have to in some way change this. On Mon, Mar 8, 2010 at 2:14 AM, dbejean wrote: > > Hello, > > If I write a custom analyser that accept a specific attribut in the > constructor > > public MyCustomAnalyzer(String myAttribute); > > Is there a way to dynamically send a value for this attribute from Solr at > index time in the XML Message ? > > > > . > > > Obviously, in Sorl shema.xml, the "content" field is associated to my custom > Analyser. > > Thank you. > > Dominique > > -- > View this message in context: > http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Lance Norskog goks...@gmail.com
Re: More contextual information in analyser
Isn't this what Lucene/Solr payloads are theoretically for? ie: http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ - Jon On Mar 8, 2010, at 11:15 PM, Lance Norskog wrote: > This is an interesting idea. There are other projects to make the > analyzer/filter chain more "porous", or open to outside interaction. > > A big problem is that queries are analyzed, too. If you want to give > the same metadata to the analyzer when doing a query against the > field, things get tough. You would need a special query parser to > implement your own syntax to do that. However, the analyzer chain in > the query phase does not receive the parsed query, so you have to in > some way change this. > > On Mon, Mar 8, 2010 at 2:14 AM, dbejean wrote: >> >> Hello, >> >> If I write a custom analyser that accept a specific attribut in the >> constructor >> >> public MyCustomAnalyzer(String myAttribute); >> >> Is there a way to dynamically send a value for this attribute from Solr at >> index time in the XML Message ? >> >> >> >>. >> >> >> Obviously, in Sorl shema.xml, the "content" field is associated to my custom >> Analyser. >> >> Thank you. >> >> Dominique >> >> -- >> View this message in context: >> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > Lance Norskog > goks...@gmail.com
Re: HTML encode extracted docs
A Tika integration with the DataImportHandler is in the Solr trunk. With this, you can copy the raw HTML into different fields and process one copy with Tika. If it's just straight HTML, would the HTMLStripCharFilter be good enough? http://www.lucidimagination.com/search/document/CDRG_ch05_5.7.2 On Mon, Mar 8, 2010 at 5:50 AM, Mark Roberts wrote: > I'm uploading .htm files to be extracted - some of these files are "include" > files that have snippets of HTML rather than fully formed HTML documents. > > solr-cell stores the raw HTML for these items, rather than extracting the > text. Is there any way I can get solr to encode this content prior to storing > it? > > At the moment, I have the problem that when the highlighted snippets are > retrieved via search, I need to parse the snippet and HTML encode the bits of > HTML that were indexed, whilst *not* encoding the bits that were added by > the highlighter, which is messy and time consuming. > > Thanks! Mark, > -- Lance Norskog goks...@gmail.com
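A sketch of the char filter approach in schema.xml (the field type name and analyzer chain here are illustrative): tags are stripped before tokenizing, so searching and highlight offsets work on the text — but note the stored value still keeps the raw HTML, which is exactly the encoding problem Mark describes:

    <fieldType name="html_text" class="solr.TextField">
      <analyzer>
        <!-- removes markup from the token stream; does not change the stored value -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>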
Re: [ANN] Zoie Solr Plugin - Zoie Solr Plugin enables real-time update functionality for Apache Solr 1.4+
Solr unique ids can be any type. The QueryElevateComponent complains if the unique id is not a string, but you can comment out the QEC. I have one benchmark test with 2 billion documents with an integer id. Works great. On Mon, Mar 8, 2010 at 5:06 PM, Don Werve wrote: > Too bad it requires integer (long) primary keys... :/ > > 2010/3/8 Ian Holsman > >> >> I just saw this on twitter, and thought you guys would be interested.. I >> haven't tried it, but it looks interesting. >> >> http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin >> >> Thanks for the RT Shalin! >> > -- Lance Norskog goks...@gmail.com
Re: More contextual information in analyser
Yes, payloads should do this. On Mon, Mar 8, 2010 at 8:29 PM, Jon Baer wrote: > Isn't this what Lucene/Solr payloads are theoretically for? > > ie: > http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ > > - Jon > > On Mar 8, 2010, at 11:15 PM, Lance Norskog wrote: > >> This is an interesting idea. There are other projects to make the >> analyzer/filter chain more "porous", or open to outside interaction. >> >> A big problem is that queries are analyzed, too. If you want to give >> the same metadata to the analyzer when doing a query against the >> field, things get tough. You would need a special query parser to >> implement your own syntax to do that. However, the analyzer chain in >> the query phase does not receive the parsed query, so you have to in >> some way change this. >> >> On Mon, Mar 8, 2010 at 2:14 AM, dbejean wrote: >>> >>> Hello, >>> >>> If I write a custom analyser that accept a specific attribut in the >>> constructor >>> >>> public MyCustomAnalyzer(String myAttribute); >>> >>> Is there a way to dynamically send a value for this attribute from Solr at >>> index time in the XML Message ? >>> >>> >>> >>> . >>> >>> >>> Obviously, in Sorl shema.xml, the "content" field is associated to my custom >>> Analyser. >>> >>> Thank you. >>> >>> Dominique >>> >>> -- >>> View this message in context: >>> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> >> -- >> Lance Norskog >> goks...@gmail.com > > -- Lance Norskog goks...@gmail.com
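For anyone trying the payload route in Solr 1.4, the index-time side can be declared like this (a sketch; reading the payload back at query time still requires custom code, e.g. your own QParser around a payload-aware query):

    <fieldType name="payloads" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- input like "lausanne|1.0" indexes "lausanne" with 1.0 as its payload -->
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldType>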
Re: PDF extraction leads to reversed words
Is this a mistake in the Tika library collection in the Solr trunk? On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir wrote: > I think the problem is that Solr does not include the ICU4J jar, so it > won't work with Arabic PDF files. > > Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your > classpath. > > On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid ABID wrote: >> Hi, >> Posting arabic pdf files to Solr using a web form (to solr/update/extract) >> get extracted texts and each words displayed in reverse direction(instead of >> right to left). >> When perform search against these texts with -always- reversed key-words I >> get results but reversed. >> This problem doesn't occur when posting MsWord document. >> I think the problem come from Tika ! >> >> Any clue ? >> >> -- >> elsadek >> Software Engineer- J2EE / WEB / ESB MULE >> > > > > -- > Robert Muir > rcm...@gmail.com > -- Lance Norskog goks...@gmail.com
Re: PDF extraction leads to reversed words
it is an optional dependency of PDFBox. If ICU is available, then it is capable of processing Arabic PDF files. The problem is that Arabic "text" in PDF files is really glyphs (encoded in visual order) and needs to be 'unshaped' with some stuff that isn't in the JDK. If the size of the default ICU jar file is the issue here, we can consider an alternative: The default ICU jar is very large as it includes everything, yet it can be customized to only include what is needed: http://apps.icu-project.org/datacustom/ We did this in lucene for the collation contrib, to shrink the jar about 2MB: http://issues.apache.org/jira/browse/LUCENE-1867 For this use-case, it could be even smaller, as most of the huge size of ICU comes from large CJK collation tables (needed for collation, but not for this Arabic PDF extraction). In reality I don't really like doing this as it might confuse users (e.g. people that want collation, too), and ICU is useful for other things, but if thats what we have to do, we should do it so that Arabic PDF files will work. On Mon, Mar 8, 2010 at 11:53 PM, Lance Norskog wrote: > Is this a mistake in the Tika library collection in the Solr trunk? > > On Mon, Mar 8, 2010 at 5:15 PM, Robert Muir wrote: >> I think the problem is that Solr does not include the ICU4J jar, so it >> won't work with Arabic PDF files. >> >> Try putting ICU4J 3.8 (http://site.icu-project.org/download) in your >> classpath. >> >> On Mon, Mar 8, 2010 at 6:30 PM, Abdelhamid ABID wrote: >>> Hi, >>> Posting arabic pdf files to Solr using a web form (to solr/update/extract) >>> get extracted texts and each words displayed in reverse direction(instead of >>> right to left). >>> When perform search against these texts with -always- reversed key-words I >>> get results but reversed. >>> This problem doesn't occur when posting MsWord document. >>> I think the problem come from Tika ! >>> >>> Any clue ? >>> >>> -- >>> elsadek >>> Software Engineer- J2EE / WEB / ESB MULE >>> >> >> >> >> -- >> Robert Muir >> rcm...@gmail.com >> > > > > -- > Lance Norskog > goks...@gmail.com > -- Robert Muir rcm...@gmail.com
QueryElevationComponent blues
Using Solr 1.4. Was using the standard query handler, but needed the boost by field functionality of qf from dismax. So we altered the query to boost certain phrases against a given field. We were using QueryElevationComponent ("elevator" from solrconfig.xml) for one particular entry we wanted at the top, but because we aren't using a pure q value, elevator never finds a match to boost. We didn't realize it at the time because the record we were elevating eventually became the top response anyway. Recently added a _val_:formula to the q value to juice records based on a value in the record. Now we have need to push a few other records to the top, but we've lost the ability to use elevate.xml to do it. Tried switching to dismax using qf, pf, qs, ps, and bf with a "pure" q value, and debug showed queryBoost with a match and records, but they weren't moved to the top of the result set. What would really help is if there was something for elevator akin to spellcheck.q like elevation.q so I could pass in the actual user phrase while still performing all the other field score boosts in the q parameter. Alternatively, if anyone can explain why I'm running into problems getting QueryElevationComponent to move the results in a dismax query, I'd be very thankful. -- Ryan T. Grange
Re: QueryElevationComponent blues
Maybe some things to try: * make sure your uniqueKey is string field type (ie if using int it will not work) * forceElevation to true (if sorting) - Jon On Mar 9, 2010, at 12:34 AM, Ryan Grange wrote: > Using Solr 1.4. > Was using the standard query handler, but needed the boost by field > functionality of qf from dismax. > So we altered the query to boost certain phrases against a given field. > We were using QueryElevationComponent ("elevator" from solrconfig.xml) for > one particular entry we wanted at the top, but because we aren't using a pure > q value, elevator never finds a match to boost. We didn't realize it at the > time because the record we were elevating eventually became the top response > anyway. > Recently added a _val_:formula to the q value to juice records based on a > value in the record. > Now we have need to push a few other records to the top, but we've lost the > ability to use elevate.xml to do it. > > Tried switching to dismax using qf, pf, qs, ps, and bf with a "pure" q value, > and debug showed queryBoost with a match and records, but they weren't moved > to the top of the result set. > > What would really help is if there was something for elevator akin to > spellcheck.q like elevation.q so I could pass in the actual user phrase while > still performing all the other field score boosts in the q parameter. > Alternatively, if anyone can explain why I'm running into problems getting > QueryElevationComponent to move the results in a dismax query, I'd be very > thankful. > > -- > Ryan T. Grange >
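For reference, elevation is driven by elevate.xml and only fires when the raw q matches the configured query text exactly — which is why adding _val_ boosts to q breaks it. A sketch (the id is invented):

    <elevate>
      <query text="ipod">
        <doc id="A102"/>   <!-- forced to the top for this exact query text -->
      </query>
    </elevate>

and on the request, ...&enableElevation=true&forceElevation=true (forceElevation keeps elevated docs on top even when sorting by something other than score).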
Re: More contextual information in analyser
It is true that I also need this metadata at query time. For the moment, I put this extra information at the beginning of the data to be indexed and at the beginning of the query. It works, but I really don't like this. In my case, I need the language of the data to be indexed and the language of the query. The goal is to dynamically use the correct chain of tokenizers and filters according to the language, and so use only one field in my index for all languages. Lance Norskog-2 wrote: > > This is an interesting idea. There are other projects to make the > analyzer/filter chain more "porous", or open to outside interaction. > > A big problem is that queries are analyzed, too. If you want to give > the same metadata to the analyzer when doing a query against the > field, things get tough. You would need a special query parser to > implement your own syntax to do that. However, the analyzer chain in > the query phase does not receive the parsed query, so you have to in > some way change this. > > On Mon, Mar 8, 2010 at 2:14 AM, dbejean wrote: >> >> Hello, >> >> If I write a custom analyser that accept a specific attribut in the >> constructor >> >> public MyCustomAnalyzer(String myAttribute); >> >> Is there a way to dynamically send a value for this attribute from Solr >> at >> index time in the XML Message ? >> >> >> >> . >> >> >> Obviously, in Sorl shema.xml, the "content" field is associated to my >> custom >> Analyser. >> >> Thank you. >> >> Dominique >> >> -- >> View this message in context: >> http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27819298.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > Lance Norskog > goks...@gmail.com > > -- View this message in context: http://old.nabble.com/More-contextual-information-in-analyser-tp27819298p27831948.html Sent from the Solr - User mailing list archive at Nabble.com.