Re: problem with facets - out of memory exception
Have you tried reindexing using DocValues? Fields used for faceting are then stored on disk rather than held in RAM by the FieldCache. If you have enough memory they will be loaded into the OS cache, but not onto the Java heap. This is also good for GC when committing. http://wiki.apache.org/solr/DocValues -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-facets-out-of-memory-exception-tp4107390p4107407.html Sent from the Solr - User mailing list archive at Nabble.com.
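To give an idea, enabling DocValues for a facet field in a Solr 4.x schema.xml looks roughly like this (field and type names here are placeholders, not taken from the original thread):

  <field name="category" type="string" indexed="true" stored="false" docValues="true" />

The field type has to support DocValues (e.g. StrField or the Trie types), and the collection needs a full reindex after adding the attribute.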
Re: Solr 3.6 optimize and field cache question
Not a short-term solution, but this sounds like a good use case for migrating to Solr 4.x and using DocValues instead of the FieldCache for faceting. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-optimize-and-field-cache-question-tp4076398p4076822.html Sent from the Solr - User mailing list archive at Nabble.com.
Listeners, cores and Similarity
Hey there, I'm testing a custom similarity which loads data from an external file located in solr_home/core_name/conf/. I load the data from the file into a Map in the init method of the SimilarityFactory. I would like to reload that Map every time a commit happens, or every X hours. To do that I've thought of implementing a custom listener which repopulates a custom cache (acting as the Map) every time a new searcher is opened. The problem is that from the SimilarityFactory or Similarity class I can't access the Solr caches; I just have access to the SolrParams. The only way I see to populate the Map outside the Similarity class is making it static, but I would like to avoid that. Any advice? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Listeners-cores-and-Similarity-tp4085083.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tweaking boosts for more search results variety
This is totally deprecated, but it may be helpful if you want to re-sort some documents: https://issues.apache.org/jira/browse/SOLR-1311 -- View this message in context: http://lucene.472066.n3.nabble.com/Tweaking-boosts-for-more-search-results-variety-tp4088302p4089044.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.3. Grouping vs DeDuplication and Deduplication Use Case
Deduplication uses Lucene's indexWriter.updateDocument with the signature term. I don't think it's possible, as a default feature, to choose which document to index; the "original" will always be the last one indexed. From the IndexWriter.updateDocument javadoc: "Updates a document by first deleting the document(s) containing term and then adding the new document. The delete and then add are atomic as seen by a reader on the same index (flush may happen only after the add)." With grouping you keep all your documents indexed, so it gives you more flexibility. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-Grouping-vs-DeDuplication-and-Deduplication-Use-Case-tp3294711p3295023.html Sent from the Solr - User mailing list archive at Nabble.com.
Adding a DocSet as a filter from a custom search component
Hey there, I'm wondering if there's a cleaner way to do this: I've written a SearchComponent that runs as a last-component. In the prepare method I build a DocSet (SortedIntDocSet) based on whether some values from the FieldCache of a given field satisfy some rules (if they do, the docId is added to the DocSet). I want to use this DocSet as a filter for the main query. Right now I'm cloning the existing filters of the request (if there are any) into a filter list and adding mine there, then putting the result in the request context:

... build myDocSet
DocSet ds = rb.req.getSearcher().getDocSet(filtersCloned).andNot(myDocSet);
rb.setFilters(null); // you'll see why
rb.req.getContext().put("newFilters", ds);

Then, to apply the DocSet containing all the filters, in the QueryCommand process method I do:

SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
if (rb.req.getContext().containsKey("newFilters")) {
  cmd.setFilter((DocSet) rb.req.getContext().get("newFilters"));
}

As I've set rb.setFilters(null) I won't get exceptions and it will work. This looks definitely nasty; I would prefer not to touch the QueryCommand. Any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-DocSet-as-a-filter-from-a-custom-search-component-tp3452449p3452449.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: changing omitNorms on an already built index
As far as I know there's no problem with this. You have to reindex and that's it. For which kind of field are you changing the norms? (You will only see changes in text fields.) Using debugQuery=true you can see how norms affect the score (in case you haven't omitted them). -- View this message in context: http://lucene.472066.n3.nabble.com/changing-omitNorms-on-an-already-built-index-tp3459132p3459169.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Collection Distribution vs Replication in Solr
Replication is easier to manage and a bit faster. See the performance numbers: http://wiki.apache.org/solr/SolrReplication -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-Distribution-vs-Replication-in-Solr-tp3458724p3459178.html Sent from the Solr - User mailing list archive at Nabble.com.
performance sorting multivalued field
hey there! Can someone explain to me what the impact of multivalued fields is when sorting? I have read in other threads how they affect faceting, but couldn't find any info about the impact on sorting. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p905943.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: performance sorting multivalued field
I mean sorting the query results, not facets. I am asking because I have added a multivalued field that has at most 10 values, and 70% of the docs have just 1 or 2 values in this multiValued field. I am not doing faceting. Since I added the multiValued field, the Java old generation seems to fill up more quickly and GCs are happening more often. I don't see why a multiValued field would use more memory when querying by normal relevance; that's why I think it may be the sort queries' fault... Any explanation or advice? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p906115.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: performance sorting multivalued field
Hey Erik, I am currently sorting by a multiValued field. A side effect is that you may not know which of the values of the multiValued field makes the document end up in that position. This is fine for me; I don't care for my tests. What I need to know is whether there is any performance issue in all of this. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p907502.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can query boosting be used with a custom request handlers?
Maybe this helps: http://wiki.apache.org/solr/SolrPlugins#QParserPlugin -- View this message in context: http://lucene.472066.n3.nabble.com/Can-query-boosting-be-used-with-a-custom-request-handlers-tp884499p912691.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: performance sorting multivalued field
>>Well, sorting requires that all the unique values in the target field >>get loaded into memory That's what I thought, thanks. >>But a larger question is whether what your doing is worthwhile >>even as just a measurement. You say >>"This is good for me, I don't care for my tests". I claim that >>you do care I just like to play with things. First I checked the behavior of sorting on a multiValued field, and what I noticed was: say you have docs with a field called 'num': doc1->num:2; doc2->num:1,num:4; doc3->num:5. Sorting asc by the field num I get: doc2,doc1,doc3. The behavior seems to be always the same (I am not saying it works like that, but it's what I've seen in my examples). After seeing that I just decided to check the performance. The point is simply curiosity. -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p913626.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: anyone use hadoop+solr?
I think there are people using this patch in production: https://issues.apache.org/jira/browse/SOLR-1301 I have tested it myself, indexing data from CSV and from HBase, and it works properly. -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr with hadoop
I think a good solution could be to use Hadoop with SOLR-1301 to build the Solr shards and then use Solr distributed search against those shards (you will have to copy them from HDFS to local disk to search against them). -- View this message in context: http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p914576.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: anyone use hadoop+solr?
Well, the patch consumes the data from a CSV. You have to modify the input to use TableInputFormat (I don't remember if it's called exactly like that) and it will work. Once you've done that, you have to specify as many reducers as shards you want. I know 2 ways to index using Hadoop. Method 1 (SOLR-1301 & Nutch): the map phase just reads data from the source and creates key-value pairs; the reduce phase does the analysis and indexes the data. So the index is built on the reducer side. Method 2 (the Hadoop Lucene index contrib): the map phase does the analysis and opens an IndexWriter to add docs; the reducer merges the small indexes built by the maps. So the indexes are built on the map side. Method 2 has no good integration with Solr at the moment. In the JIRA issue (SOLR-1301) there's a good explanation of the advantages and disadvantages of indexing on the map or the reduce side. I recommend you read all the comments on the issue in detail to know exactly how it works. -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914625.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: anyone use hadoop+solr?
Hi Otis, just out of curiosity, which strategy do you use? Do you index on the map or the reduce side? And do you use it to build shards or a single monolithic index? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p919335.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: performance sorting multivalued field
Thanks, that's very useful info. However, I can't reproduce the error. I've created an index where all documents have a multivalued date field and each document has a minimum of one value in that field (most of the docs have 2 or 3). So the number of un-inverted term instances is greater than the number of documents. *There are lots of docs with the same value; I mention that because I suppose having the same value has nothing to do with the number of un-inverted term instances. I never get the error explained here: http://lucene.472066.n3.nabble.com/Different-sort-behavior-on-same-code-td503761.html Could it be that Solr 1.4 or Lucene 2.9.1 handle this and avoid the error? -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p920464.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: performance sorting multivalued field
>>*There are lot's of docs with the same value, I mention that because I supose that same value has nothing to do with the number of un-inverted term instances. It does have to do with it: I've been able to reproduce the error by setting a different value for each field: HTTP Status 500 - there are more terms than documents in field "date", but it's impossible to sort on tokenized fields java.lang.RuntimeException: there are more terms than documents in field "id", but it's impossible to sort on tokenized fields at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:706)... But it's already fixed for the Lucene 2.9.4, 3.0.3, 3.1 and 4.0 versions: https://issues.apache.org/jira/browse/LUCENE-2142 -- View this message in context: http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p921752.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recommended MySQL JDBC driver
I suppose you use batchSize=-1 to index that amount of data. From connector version 5.1.7 onward there's this param: netTimeoutForStreamingResults. The default value is 600 (seconds). Increasing it may help (2400, for example?). -- View this message in context: http://lucene.472066.n3.nabble.com/Recommended-MySQL-JDBC-driver-tp817458p924107.html Sent from the Solr - User mailing list archive at Nabble.com.
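As an illustration, the parameter can be appended to the JDBC URL in the DIH data-config.xml; the host, db and credentials below are placeholders, only netTimeoutForStreamingResults and batchSize are the relevant bits:

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb?netTimeoutForStreamingResults=2400"
              batchSize="-1" user="user" password="pass" />

batchSize="-1" is what makes the MySQL driver stream rows instead of buffering the whole result set in memory.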
ending an app that uses EmbeddedSolrServer
Hey there, I've done some tests with a custom Java app using EmbeddedSolrServer to create an index. It works fine and I am able to build the index, but I've noticed that after the commit and optimize are done, the app never terminates. How should I end it? Is there any way to tell the EmbeddedSolrServer to close? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/ending-an-app-taht-uses-EmbeddedSolrServer-tp963573p963573.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ending a java app that uses EmbeddedSolrServer
Seems that coreContainer.shutdown() solves the problem. Is anyone doing it a different way? -- View this message in context: http://lucene.472066.n3.nabble.com/ending-a-java-app-that-uses-EmbeddedSolrServer-tp963573p964013.html Sent from the Solr - User mailing list archive at Nabble.com.
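For reference, a minimal sketch of the typical embedded setup and shutdown (Solr 1.4-era SolrJ API; the Solr home path and core name are placeholders):

  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  System.setProperty("solr.solr.home", "/path/to/solr/home");
  CoreContainer.Initializer initializer = new CoreContainer.Initializer();
  CoreContainer coreContainer = initializer.initialize();
  EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "coreA");

  // ... add documents ...
  server.commit();
  server.optimize();

  // releases the index locks and stops background threads so the JVM can exit
  coreContainer.shutdown();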
Re: maxMergeDocs and performance tuning
As far as I know, the higher you set the value, the faster the indexing process will be (because more things are kept in memory). But depending on your needs, it may not be the best option. If you set a high mergeFactor and you want to optimize the index once the process is done, the optimization will take longer than if the mergeFactor was very low. This is because the optimization process compacts many segment files; with a lower mergeFactor there will be fewer files, so the optimize will be faster. -- View this message in context: http://lucene.472066.n3.nabble.com/maxMergeDocs-and-performance-tuning-tp1162695p1168480.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JVM GC is very frequent.
http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/ -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-GC-is-very-frequent-tp1345760p1348065.html Sent from the Solr - User mailing list archive at Nabble.com.
FieldCache.DEFAULT.getInts vs FieldCache.DEFAULT.getStringIndex. Memory usage
I need to load a FieldCache for a field which is a Solr "integer" type and has at most 3 digits. Let's say my index has 10M docs. I am wondering which is more optimal and less memory consuming: loading a FieldCache.DEFAULT.getInts or a FieldCache.DEFAULT.getStringIndex. The second one will have an int[] with as many entries as the index has docs, plus a String[] with as many entries as there are unique terms; as I am dealing with numbers, I would have to parse the values of the String[] to work with them. If I load a FieldCache.DEFAULT.getInts I will just have an int[] with the value of the doc's field at each array position, and I can work with the ints directly... in that case, would it be more optimal to use this? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCache-DEFAULT-getInts-vs-FieldCache-DEFAULT-getStringIndex-Memory-usage-tp1348480p1348480.html Sent from the Solr - User mailing list archive at Nabble.com.
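A rough sketch of the two options on the Lucene 2.9/3.x FieldCache API ("price" is just an invented field name; reader is the top-level IndexReader):

  // one int per document, values usable directly
  int[] values = FieldCache.DEFAULT.getInts(reader, "price");
  int v = values[docId];

  // one ord per document plus a String[] of the unique terms
  FieldCache.StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, "price");
  int ord = idx.order[docId];                                    // 0 means the doc has no value
  int parsed = (ord == 0) ? 0 : Integer.parseInt(idx.lookup[ord]);

So for a purely numeric field where only the values are needed, getInts avoids both the String[] array and the parsing step.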
Re: Null pointer exception when mixing highlighter & shards & q.alt
I noticed that long ago. I fixed it by doing this in HighlightComponent.finishStage:

@Override
public void finishStage(ResponseBuilder rb) {
  boolean hasHighlighting = true;
  if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
    Map.Entry[] arr = new NamedList.NamedListEntry[rb.resultIds.size()];
    // TODO: make a generic routine to do automatic merging of id keyed data
    for (ShardRequest sreq : rb.finished) {
      if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0) continue;
      for (ShardResponse srsp : sreq.responses) {
        NamedList hl = (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
        // patch bug: skip shards that return no highlighting section
        if (hl != null) {
          for (int i = 0; i < hl.size(); i++) {
            ...

-- View this message in context: http://lucene.472066.n3.nabble.com/Null-pointer-exception-when-mixing-highlighter-shards-q-alt-tp1430353p1431253.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: what differents between SolrCloud and Solr+Hadoop
Well, these are pretty different things. SolrCloud is meant to handle distributed search in an easier way than "raw" Solr distributed search; you still have to build the shards on your own. Solr+Hadoop is a way to build those shards/indexes in parallel. -- View this message in context: http://lucene.472066.n3.nabble.com/what-differents-between-SolrCloud-and-Solr-Hadoop-tp1463809p1464106.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do you programatically create new cores?
You have to create the core's folder, with its conf inside, in the Solr home. Once that's done you can call the CREATE action of the admin handler: http://wiki.apache.org/solr/CoreAdmin#CREATE If you need to dynamically create, start and stop lots of cores there's this patch, but I don't know its current state: http://wiki.apache.org/solr/LotsOfCores -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-you-programatically-create-new-cores-tp1706487p1718648.html Sent from the Solr - User mailing list archive at Nabble.com.
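A hedged example of what that CREATE call could look like (host, port and names are placeholders):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=newCore&instanceDir=newCore&config=solrconfig.xml&schema=schema.xml&dataDir=data

The instanceDir with its conf/ folder must already exist on disk before the call is made.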
Re: Dynamically create new core
To create the core, the folder with the confs must already exist and has to be placed in the proper place (inside the Solr home). Once you run the create-core action, the core will be added to solr.xml and dynamically loaded. -- View this message in context: http://lucene.472066.n3.nabble.com/Dynamically-create-new-core-tp1827097p1828560.html Sent from the Solr - User mailing list archive at Nabble.com.
Core status uptime and startTime
As far as I know, in the core admin page you can find out when the index was last modified and committed by checking lastModified. But what do startTime and uptime mean? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Core-status-uptime-and-startTime-tp1834806p1834806.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding new field after data is already indexed
>> and i index data on the basis of these fields. Now, incase i need to add a new field, is there a way i can >> add the field without corrupting the previous data. Is there any feature which adds a new field with a >> default value to the existing records. You just have to add the new field in schema.xml so that Solr knows about it. Already-indexed documents won't have any value in this field, but that doesn't break anything. If you want them to get a default value you will have to rebuild your index. >> Is there any security mechanism/authorization check to prevent url like >> /admin and /update to only a few users. As far as I know there's no out-of-the-box feature for that. -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p1862722.html Sent from the Solr - User mailing list archive at Nabble.com.
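A sketch of such an addition to schema.xml (field name, type and default value are invented for the example):

  <field name="status" type="string" indexed="true" stored="true" default="active" />

Note that the default attribute only applies to documents indexed after the change; documents already in the index simply have no value for the field until they are reindexed.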
about NRTCachingDirectory
I have a doubt about how NRTCachingDirectory works. As far as I've seen, it receives a delegate Directory and caches newly created segments. So, given that MMapDirectory tends to be the default: 1.- Does NRTCachingDirectory work as a sort of wrapper over MMap, caching the new segments? 2.- If I have a master/slave setup and deploy a fully optimized index with a single segment, and the slave is configured with NRTCachingDirectory, will it try to cache that segment (I suppose not)? And let's say I remove the replication and start adding docs to that slave, creating small segments every 10 minutes: will NRTCachingDirectory by default start caching these new small segments? And finally, if I set up the replication again and a full new single-segment index is deployed, how would NRTCachingDirectory behave? I know it's not a typical use case, but I would like to know how it behaves in those different situations. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/about-NRTCachingDirectory-tp4025665.html Sent from the Solr - User mailing list archive at Nabble.com.
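For context, enabling it in a Solr 4.x solrconfig.xml is roughly the standard factory declaration below (not a config taken from this thread):

  <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory" />

As far as I know it delegates to the standard (MMap-based) directory and only keeps small, newly flushed segments in RAM, so a large replicated segment would go straight to the delegate.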
Re: Shard timeouts on large (1B docs) Solr cluster
timeAllowed can be used outside distributed search. It is used by the TimeLimitingCollector: when the search time reaches timeAllowed it stops searching and returns the results found up to that point. This can be a problem when using incremental indexing: Lucene starts searching from "the bottom" and new docs are inserted at the top, so timeAllowed could mean that new docs never appear in the search results. -- View this message in context: http://lucene.472066.n3.nabble.com/Shard-timeouts-on-large-1B-docs-Solr-cluster-tp3691229p3713263.html Sent from the Solr - User mailing list archive at Nabble.com.
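As a reminder of how it is passed, timeAllowed is a per-request parameter in milliseconds, e.g.:

  http://localhost:8983/solr/select?q=*:*&timeAllowed=1000

If the limit is hit, the responseHeader should contain partialResults=true so clients can tell the result set may be incomplete.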
Re: how to ignore indexing of duplicated documents?
http://wiki.apache.org/solr/Deduplication -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-ignore-indexing-of-duplicated-documents-tp3814858p3818973.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on a date field multiple times
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html -- View this message in context: http://lucene.472066.n3.nabble.com/Faceting-on-a-date-field-multiple-times-tp3961282p3961865.html Sent from the Solr - User mailing list archive at Nabble.com.
latest patches and big picture of search grouping
I need to dive into search grouping / field collapsing again. I've seen there are lots of issues about it now. Can someone point me to the minimum set of patches I need to run this feature on trunk? I want to see the code of the most optimised version and what's being done for distributed search. I think I need these: https://issues.apache.org/jira/browse/SOLR-2068 https://issues.apache.org/jira/browse/SOLR-2205 https://issues.apache.org/jira/browse/SOLR-2066 But I'm not sure if I am missing anything else. By the way, I think the current implementation of grouped search is totally different from what it was before, when you could choose normal or adjacent collapse. Can someone give me a quick big picture of the current implementation (I will trace the code anyway, it's just to get an idea)? Is there still a double trip? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/latest-patches-and-big-picture-of-search-grouping-tp2271383p2271383.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Need to create dyanamic indexies base on different document workspaces
In case you need to create lots of indexes and register/unregister them fast, there is work under way: http://wiki.apache.org/solr/LotsOfCores -- View this message in context: http://lucene.472066.n3.nabble.com/Need-to-create-dyanamic-indexies-base-on-different-document-workspaces-tp2845919p2852410.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange performance behaviour when concurrent requests are done
Any suggestion about this issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-performance-behaviour-when-concurrent-requests-are-done-tp505478p2878758.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strange performance behaviour when concurrent requests are done
That's true. But the degradation is very big. If you launch concurrent requests against a web app that doesn't use Solr, the time per request won't degrade that much. To me it looks more like a synchronized block somewhere in Solr or Lucene is causing this. -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-performance-behaviour-when-concurrent-requests-are-done-tp505478p2878856.html Sent from the Solr - User mailing list archive at Nabble.com.
problem with the new IndexSearcher when snapinstaller (and commit script) happen
Hey there, I've noticed a very odd behaviour with the snapinstaller and commit (using the collectionDistribution scripts). The first time I install a new index everything works fine. But when installing the next one, I can't see the new documents. The status page of the core tells me that the index version has changed but numDocs and maxDocs are the same. I have a simple script that gets the version from an IndexReader, and it confirms that that's not true: numDocs and maxDocs are different in the two indexes. The index I'm trying to install is a whole new index, generated with mergeFactor=2 and optimized, with no compound file. I've also tried manually mv-ing index to index.old and the snapshot.x to index (while Tomcat is up) and manually executing: curl http://localhost:8080/trovit_solr/coreA/update?commit=true -H "Content-Type: text/xml" But the same thing happens. Checking the logs, everything apparently looks fine: a new searcher is registered and warming is properly done on it. I would think the problem is some reference held when opening the new index searcher, but the fact that the indexVersion changes while numDocs and maxDocs don't leaves me puzzled. If I reload the core, numDocs and maxDocs change and everything is fine. Any idea what could be happening here? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3066902.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen
Tests are done on Solr 1.4. The simplest way to reproduce my problem is having 2 indexes and a Solr box with just one core. Both indexes must have been created with the same schema. 1- Remove the index dir of the core and start the server (the core is up with an empty index). 2- Check the status page of the core (version, numDocs, maxDocs: version should be X and numDocs, maxDocs zero). 3- mv index to index.old 4- mv your folderA (which contains an index) to index 5- Execute curl http://localhost:8080/solr/coreA/update?commit=true -H "Content-Type: text/xml" * Here the log shows me that the commit has been executed, a new IndexSearcher has been registered and proper warming has been done. 6- Check the core status page (here everything has changed: version, numDocs, maxDocs). If I now repeat steps 3, 4 and 5 (this time using folderB with another index), when I do step 6 the indexVersion has changed but numDocs and maxDocs stay the same, which I can't understand in any way (opening the indexes with my script shows me that they are not the same). I ended up doing this test after noticing the problem with the snapinstaller and commit. -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3067042.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen
I don't know if this could have something to do with the problem, but some of the files of the indexes have the same size and name (in all the indexes except the empty one). I have also realized that when moving back to the empty index and committing, numDocs and maxDocs do change. Once I'm on the empty index, if I move to another one it works too. The problem happens when moving from a non-empty index to another non-empty index. That's why I think the name and size of some files could have something to do with the problem:

./index.1: total 702024
drwxr-sr-x  12 marc admin        408 15 Jun 11:01 .
drwxr-xr-x  10 marc admin        340 15 Jun 16:04 ..
-rw-r--r--   1 marc admin  269347737 15 Jun 10:57 _3.fdt
-rw-r--r--   1 marc admin    2067804 15 Jun 10:57 _3.fdx
-rw-r--r--   1 marc admin        463 15 Jun 10:57 _3.fnm
-rw-r--r--   1 marc admin   40372030 15 Jun 10:59 _3.frq
-rw-r--r--   1 marc admin    1033904 15 Jun 10:59 _3.nrm
-rw-r--r--   1 marc admin   27021337 15 Jun 11:00 _3.prx
-rw-r--r--   1 marc admin     234891 15 Jun 11:00 _3.tii
-rw-r--r--   1 marc admin   19330416 15 Jun 11:01 _3.tis
-rw-r--r--   1 marc admin         20 15 Jun 11:01 segments.gen
-rw-r--r--   1 marc admin        298 15 Jun 11:01 segments_2

./index.2: total 701296
drwxr-sr-x  12 marc admin        408 15 Jun 11:11 .
drwxr-xr-x  10 marc admin        340 15 Jun 16:04 ..
-rw-r--r--   1 marc admin  269044254 15 Jun 11:09 _3.fdt
-rw-r--r--   1 marc admin    2068116 15 Jun 11:09 _3.fdx
-rw-r--r--   1 marc admin        463 15 Jun 11:09 _3.fnm
-rw-r--r--   1 marc admin   40320465 15 Jun 11:10 _3.frq
-rw-r--r--   1 marc admin    1034060 15 Jun 11:10 _3.nrm
-rw-r--r--   1 marc admin   26967519 15 Jun 11:11 _3.prx
-rw-r--r--   1 marc admin     235895 15 Jun 11:11 _3.tii
-rw-r--r--   1 marc admin   19372446 15 Jun 11:11 _3.tis
-rw-r--r--   1 marc admin         20 15 Jun 11:11 segments.gen
-rw-r--r--   1 marc admin        298 15 Jun 11:11 segments_2

./index.empty: total 16
drwxr-xr-x   4 marc admin  136 15 Jun 10:45 .
drwxr-xr-x  10 marc admin  340 15 Jun 16:04 ..
-rw-r--r--   1 marc admin   20 15 Jun 10:45 segments.gen
-rw-r--r--   1 marc admin   32 15 Jun 10:45 segments_1

-- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3067466.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen
I have some more info! I've built another index, bigger than the others, so the names of the files are not the same. This way, if I move from any of the other indexes to the bigger one or vice versa, it works (I can see the changes in the version, numDocs and maxDocs)! So I think it is related to the names of the files. Maybe the server gets confused with the pointers to the older index files or something like that? The bigger index looks like:

./index.big: total 4181088
drwxr-xr-x  12 marc admin         408 15 Jun 16:46 .
drwxr-xr-x  11 marc admin         374 15 Jun 16:48 ..
-rw-r--r--   1 marc admin  1666038160 15 Jun 16:43 _4.fdt
-rw-r--r--   1 marc admin     9178780 15 Jun 16:43 _4.fdx
-rw-r--r--   1 marc admin         477 15 Jun 16:43 _4.fnm
-rw-r--r--   1 marc admin   232687972 15 Jun 16:44 _4.frq
-rw-r--r--   1 marc admin     4589392 15 Jun 16:44 _4.nrm
-rw-r--r--   1 marc admin   161931683 15 Jun 16:45 _4.prx
-rw-r--r--   1 marc admin      824985 15 Jun 16:45 _4.tii
-rw-r--r--   1 marc admin    65438631 15 Jun 16:45 _4.tis
-rw-r--r--   1 marc admin          20 15 Jun 16:45 segments.gen
-rw-r--r--   1 marc admin         298 15 Jun 16:45 segments_2

-- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3067657.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen [SOLVED]
I've found the problem, in case someone is interested. It's because of indexReader.reopen(). If it is enabled, when opening a new searcher due to the commit this code is executed (in SolrCore.getSearcher(boolean forceNew, boolean returnSearcher, final Future[] waitSearcher)):

...
if (newestSearcher != null && solrConfig.reopenReaders && indexDirFile.equals(newIndexDirFile)) {
  IndexReader currentReader = newestSearcher.get().getReader();
  IndexReader newReader = currentReader.reopen();
  if (newReader == currentReader) {
    currentReader.incRef();
  }
  tmp = new SolrIndexSearcher(this, schema, "main", newReader, true, true);
} else {
  IndexReader reader = getIndexReaderFactory().newReader(getDirectoryFactory().open(newIndexDir), true);
  tmp = new SolrIndexSearcher(this, schema, "main", reader, true, true);
}
...

If the names of the segment files haven't changed, IndexReader.reopen() assumes the segments haven't actually changed (but in my case they have: the index files have the same names while containing different docs), so instead of opening new readers for the segments it returns the same one, and the changes can't be seen by the new IndexSearcher. Accepting that performance gets worse, disabling reopenReaders in solrconfig.xml solves the problem (and it still performs better than reloading the whole core). Does someone know whether this still happens with Lucene 3.2? -- View this message in context: http://lucene.472066.n3.nabble.com/problem-with-the-new-IndexSearcher-when-snpainstaller-and-commit-script-happen-tp3066902p3068956.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: embeded solrj doesn't refresh index
Are you indexing with a full-import? If so, and the resulting index has a similar number of docs to the one you had before, try setting reopenReaders to false in solrconfig.xml. * You have to send the commit, of course. -- View this message in context: http://lucene.472066.n3.nabble.com/embeded-solrj-doesn-t-refresh-index-tp3184321p3190892.html Sent from the Solr - User mailing list archive at Nabble.com.
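In Solr 1.4/3.x that setting lives in the <mainIndex> section of solrconfig.xml, roughly:

  <reopenReaders>false</reopenReaders>

With false, every commit opens a brand-new IndexReader instead of reopening the current one, which sidesteps the reopen() same-segment-names issue at the cost of slower commits.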
Re: Boost documents based on the number of their fields
You have different options here. You can give more boost at indexing time to the documents that have the fields you want set. For this to take effect you will have to reindex and set omitNorms="false" on the fields you are going to search. The same concept can be applied to boost single fields instead of the whole document. Another option is to use boost queries at search time, such as: bq=video:[* TO *]^100 (this gives more boost to the documents that have any value in the video field). The second option is much easier to play with, as you don't have to reindex every time you change a value; on the other hand, you pay the performance penalty of running one extra query. -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-documents-based-on-the-number-of-their-fields-tp3266875p3267628.html Sent from the Solr - User mailing list archive at Nabble.com.
offsets issues with multiword synonyms since LUCENE_33
Has someone noticed this problem and solved it somehow? (without using LUCENE_33 in the solrconfig.xml) https://issues.apache.org/jira/browse/LUCENE-3668 Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: offsets issues with multiword synonyms since LUCENE_33
Well, an example would be: synonyms.txt: huge,big size Then I have the docs: 1- The huge fox attacks first 2- The big size fox attacks first Then if I query for huge, the highlights for each document are: 1- The huge fox attacks first 2- The big size fox attacks first The analyzer looks like this: fieldType name="sy_text" class="solr.TextField" positionIncrementGap="100"> This was working with a previous version of Solr (I couldn't make it work with 3.6, 4-alpha nor 4-beta). -- View this message in context: http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195p4001213.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FieldCollapsing: Two response elements returned?
That's probably because you are using both the CollapseComponent and the QueryComponent. I think the 2 or 3 latest patches allow full replacement of the QueryComponent: you just replace the "query" searchComponent declaration in solrconfig.xml with the collapse one. This will sort out your problem and make response times faster. Jay Hill wrote: > > I'm doing some testing with field collapsing, and early results look good. > One thing seems odd to me however. I would expect to get back one block of > results, but I get two - the first one contains the collapsed results, the > second one contains the full non-collapsed results: > > ... > ... > > This seems somewhat confusing. Is this intended or is this a bug? > > Thanks, > -Jay > > -- View this message in context: http://www.nabble.com/FieldCollapsing%3A-Two-response-elements-returned--tp24690426p24693960.html Sent from the Solr - User mailing list archive at Nabble.com.
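The replacement presumably looks something like this in solrconfig.xml (class names as used by the SOLR-236 field-collapsing patches; treat it as an illustration rather than the exact snippet from the original mail):

  <!-- instead of the stock component -->
  <searchComponent name="query" class="org.apache.solr.handler.component.QueryComponent" />
  <!-- declare the collapsing one under the same name -->
  <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent" />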
update some index documents after indexing process is done with DIH
Hey there, I would like to be able to do something like this: after the indexing process is done with DIH, I would like to open an IndexReader, iterate over all docs, modify some of them depending on others and delete some others. I can easily do this coding directly against Lucene, but I would like to know if there's a way to do it with Solr using the SolrDocument or SolrInputDocument classes. I have thought of using SolrJ or the DIH onImportEnd listener, but I'm not sure if I can get an IndexReader in there. Any advice? Thanks in advance -- View this message in context: http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: update some index documents after indexing process is done with DIH
Ok, but if I handle it in a newSearcher listener it will be executed every time I reload a core, isn't it? The thing is that I want to use an IndexReader to load in a HashMap some doc fields of the index and depending of the values of some field docs modify other docs. Its very memory consuming (I have tested it with a simple lucene script). Thats why I wanted to do it just after the indexing process. My ideal case would be to do it in the commit function of DirectUpdatehandler2.java just before writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want to mess that code... so trying to find out the best way to do that as a plugin instead of a hack as possible. Thanks in advance Noble Paul നോബിള് नोब्ळ्-2 wrote: > > It is best handled as a 'newSearcher' listener in solrconfig.xml. > onImportEnd is invoked before committing > > On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese > wrote: >> >> Hey there, >> I would like to be able to do something like: After the indexing process >> is >> done with DIH I would like to open an indexreader, iterate over all docs, >> modify some of them depending on others and delete some others. I can >> easy >> do this directly coding with lucene but would like to know if there's a >> way >> to do it with Solr using SolrDocument or SolrInputDocument classes. >> I have thougth in using SolrJ or DIH listener onImportEnd but not sure if >> I >> can get an IndexReader in there. >> Any advice? >> Thanks in advance >> -- >> View this message in context: >> http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24696872.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: update some index documents after indexing process is done with DIH
That really sounds the best way to reach my goal. How could I invoque a listener from the newSearcher?Would be something like: solr 0 10 rocks 0 10 static newSearcher warming query from solrconfig.xml And MyCustomListener would be the class who open the reader: RefCounted searchHolder = null; try { searchHolder = dataImporter.getCore().getSearcher(); IndexReader reader = searchHolder.get().getReader(); //Here I iterate over the reader doing docuemnt modifications } finally { if (searchHolder != null) searchHolder.decref(); } } catch (Exception ex) { LOG.info("error"); } Finally, to access to documents and add fields to some of them, I have thought in using SolrDocument classes. Can you please point me where something similar is done in solr source (I mean creation of SolrDocuemnts and conversion of them to proper lucene docuements). Does this way for reaching the goal makes sense? Thanks in advance Noble Paul നോബിള് नोब्ळ्-2 wrote: > > when a core is reloaded the event fired is firstSearcher. newSearcher > is fired when a commit happens > > > On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese > wrote: >> >> Ok, but if I handle it in a newSearcher listener it will be executed >> every >> time I reload a core, isn't it? The thing is that I want to use an >> IndexReader to load in a HashMap some doc fields of the index and >> depending >> of the values of some field docs modify other docs. Its very memory >> consuming (I have tested it with a simple lucene script). Thats why I >> wanted >> to do it just after the indexing process. >> >> My ideal case would be to do it in the commit function of >> DirectUpdatehandler2.java just before >> writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want >> to >> mess that code... so trying to find out the best way to do that as a >> plugin >> instead of a hack as possible. >> >> Thanks in advance >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> It is best handled as a 'newSearcher' listener in solrconfig.xml. >>> onImportEnd is invoked before committing >>> >>> On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese >>> wrote: >>>> >>>> Hey there, >>>> I would like to be able to do something like: After the indexing >>>> process >>>> is >>>> done with DIH I would like to open an indexreader, iterate over all >>>> docs, >>>> modify some of them depending on others and delete some others. I can >>>> easy >>>> do this directly coding with lucene but would like to know if there's a >>>> way >>>> to do it with Solr using SolrDocument or SolrInputDocument classes. >>>> I have thougth in using SolrJ or DIH listener onImportEnd but not sure >>>> if >>>> I >>>> can get an IndexReader in there. >>>> Any advice? >>>> Thanks in advance >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. >>>> >>>> >>> >>> >>> >>> -- >>> - >>> Noble Paul | Principal Engineer| AOL | http://aol.com >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24696872.html >> Sent from the Solr - User mailing list archive at Nabble.com. 
>> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24697751.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: update some index documents after indexing process is done with DIH
>From the newSearcher(..) of a CustomEventListener which extends of AbstractSolrEventListener can access to SolrIndexSearcher and all core properties but can't get a SolrIndexWriter. Do you now how can I get from there a SolrIndexWriter? This way I would be able to modify the documents (I need to modify them depending on values of other documents, that's why I can't do it with DIH delta-import). Thanks in advance Noble Paul നോബിള് नोब्ळ्-2 wrote: > > On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese > wrote: >> >> That really sounds the best way to reach my goal. How could I invoque a >> listener from the newSearcher?Would be something like: >> >> >> solr 0 > name="rows">10 >> rocks 0 > name="rows">10 >> static newSearcher warming query from >> solrconfig.xml >> >> >> >> >> And MyCustomListener would be the class who open the reader: >> >> RefCounted searchHolder = null; >> try { >> searchHolder = dataImporter.getCore().getSearcher(); >> IndexReader reader = searchHolder.get().getReader(); >> >> //Here I iterate over the reader doing docuemnt modifications >> >> } finally { >> if (searchHolder != null) searchHolder.decref(); >> } >> } catch (Exception ex) { >> LOG.info("error"); >> } > > you may not be able to access the DIH API from a newSearcher event . > But the API would give you the searcher directly as a method > parameter. >> >> Finally, to access to documents and add fields to some of them, I have >> thought in using SolrDocument classes. Can you please point me where >> something similar is done in solr source (I mean creation of >> SolrDocuemnts >> and conversion of them to proper lucene docuements). >> >> Does this way for reaching the goal makes sense? >> >> Thanks in advance >> >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> when a core is reloaded the event fired is firstSearcher. newSearcher >>> is fired when a commit happens >>> >>> >>> On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese >>> wrote: >>>> >>>> Ok, but if I handle it in a newSearcher listener it will be executed >>>> every >>>> time I reload a core, isn't it? The thing is that I want to use an >>>> IndexReader to load in a HashMap some doc fields of the index and >>>> depending >>>> of the values of some field docs modify other docs. Its very memory >>>> consuming (I have tested it with a simple lucene script). Thats why I >>>> wanted >>>> to do it just after the indexing process. >>>> >>>> My ideal case would be to do it in the commit function of >>>> DirectUpdatehandler2.java just before >>>> writer.optimize(cmd.maxOptimizeSegments); is executed. But I don't want >>>> to >>>> mess that code... so trying to find out the best way to do that as a >>>> plugin >>>> instead of a hack as possible. >>>> >>>> Thanks in advance >>>> >>>> >>>> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>>>> >>>>> It is best handled as a 'newSearcher' listener in solrconfig.xml. >>>>> onImportEnd is invoked before committing >>>>> >>>>> On Tue, Jul 28, 2009 at 3:13 PM, Marc >>>>> Sturlese >>>>> wrote: >>>>>> >>>>>> Hey there, >>>>>> I would like to be able to do something like: After the indexing >>>>>> process >>>>>> is >>>>>> done with DIH I would like to open an indexreader, iterate over all >>>>>> docs, >>>>>> modify some of them depending on others and delete some others. I can >>>>>> easy >>>>>> do this directly coding with lucene but would like to know if there's >>>>>> a >>>>>> way >>>>>> to do it with Solr using SolrDocument or SolrInputDocument classes. 
>>>>>> I have thougth in using SolrJ or DIH listener onImportEnd but not >>>>>> sure >>>>>> if >>>>>> I >>>>>> can get an IndexReader in there. >>>>>> Any advice? >>>>>> Thanks in advance >>>>>> -
Re: update some index documents after indexing process is done with DIH
Hoss I see what you mean. I am trying to implement a CustomUpdateProcessor checking out here: http://wiki.apache.org/solr/UpdateRequestProcessor What is confusing me now is that I have to implement my logic in processComit as you said: >>you'll still need the "double commit" (once so you can see the >>main changes, and once so the rest of the world can see your >>modifications) but you can execute them both directly in your >>processCommit(CommitUpdateCommand) I have noticed that in the processAdd you have acces to the concrete SolrInpuntDocument you are going to add: SolrInputDocument doc = cmd.getSolrInputDocument(); But in processCommit, having access to the core I can get the IndexReader but I still don't know how to get the IndexWriter and SolrInputDocuments in there. My idea is to do something like: @Override public void processCommit(CommitUpdateCommand cmd) throws IOException { //first commit that show me modification //open and iterate over the reader and create solrDocuments list //close reader //openwriter and update the docs in the list //close writer and second commit that shows my changes to the world if (next != null) next.processCommit(cmd); } As I understood the process, the commitCommand will be sent to the DirectUpdateHandler2. that will proper do the commit via UpdateRequestProcessor. Am I in the right way? I haven't dealed with CustomUpdateProcessor for doing something after a commit is executed so I am a bit confused... Thanks in advance. hossman wrote: > > > This thread all sounds really kludgy ... among other things the > newSearcher listener is going to need to some how keep track of when it > was called as a result of a "real" commit, vs when it was called as the > result of a commit it itself triggered to make changes. > > wouldn't an easier place to implement this logic be in an UpdateProcessor? > you'll still need the "double commit" (once so you can see the > main changes, and once so the rest of the world can see your > modifications) but you can execute them both directly in your > processCommit(CommitUpdateCommand) method (so you don't have to worry > about being able to tell them apart) > > : Date: Thu, 30 Jul 2009 10:14:16 +0530 > : From: > : > =?UTF-8?B?Tm9ibGUgUGF1bCDgtKjgtYvgtKzgtL/gtLPgtY3igI0gIOCkqOCli+CkrOCljeCk > : s+CljQ==?= > : Reply-To: solr-user@lucene.apache.org, noble.p...@gmail.com > : To: solr-user@lucene.apache.org > : Subject: Re: update some index documents after indexing process is done > with > : DIH > : > : If you make your EventListener implements SolrCoreAware you can get > : hold of the core on inform. use that to get hold of the > : SolrIndexWriter > : > : On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlese > wrote: > : > > : > From the newSearcher(..) of a CustomEventListener which extends of > : > AbstractSolrEventListener can access to SolrIndexSearcher and all > core > : > properties but can't get a SolrIndexWriter. Do you now how can I get > from > : > there a SolrIndexWriter? This way I would be able to modify the > documents (I > : > need to modify them depending on values of other documents, that's why > I > : > can't do it with DIH delta-import). > : > Thanks in advance > : > > : > > : > Noble Paul നോബിള് नोब्ळ्-2 wrote: > : >> > : >> On Tue, Jul 28, 2009 at 5:17 PM, Marc > Sturlese > : >> wrote: > : >>> > : >>> That really sounds the best way to reach my goal. 
How could I > invoque a > : >>> listener from the newSearcher?Would be something like: > : >>> > : >>> > : >>> solr 0 : >>> name="rows">10 > : >>> rocks 0 > : >>> name="rows">10 > : >>> static newSearcher warming query from > : >>> solrconfig.xml > : >>> > : >>> > : >>> > : >>> > : >>> And MyCustomListener would be the class who open the reader: > : >>> > : >>> RefCounted searchHolder = null; > : >>> try { > : >>> searchHolder = dataImporter.getCore().getSearcher(); > : >>> IndexReader reader = searchHolder.get().getReader(); > : >>> > : >>> //Here I iterate over the reader doing docuemnt > modifications > : >>> > : >>> } finally { > : >>> if (searchHolder != null) searchHolder.decref(); > : >>> } > : >>>
Re: update some index documents after indexing process is done with DIH
: If you make your EventListener implements SolrCoreAware you can get : hold of the core on inform. use that to get hold of the : SolrIndexWriter Implementing SolrCoreAware I can get hold of the core and easily get hold of a SolrIndexSearcher, and so a reader. But I can't see a way to get hold of a SolrIndexWriter just from the core... Marc Sturlese wrote: > > Hey there, > I would like to be able to do something like: After the indexing process > is done with DIH I would like to open an indexreader, iterate over all > docs, modify some of them depending on others and delete some others. I > can easy do this directly coding with lucene but would like to know if > there's a way to do it with Solr using SolrDocument or SolrInputDocument > classes. > I have thougth in using SolrJ or DIH listener onImportEnd but not sure if > I can get an IndexReader in there. > Any advice? > Thanks in advance > -- View this message in context: http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24755320.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is negative boost possible?
:>the only way to "negative boost" is to "positively boost" the inverse... :> :> (*:* -field1:value_to_penalize)^10 This will do the job aswell as bq supports pure negative queries (at least in trunk): bq=-field1:value_to_penalize^10 http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53db8c5fd31133dc3566318d1aad2bb23e07e hossman wrote: > > > : Use decimal figure less than 1, e.g. 0.5, to express less importance. > > but that's stil la positive boost ... it still increases the scores of > documents that match. > > the only way to "negative boost" is to "positively boost" the inverse... > > (*:* -field1:value_to_penalize)^10 > > : > I am looking for a way to assign negative boost to a term in Solr > query. > : > Our use scenario is that we want to boost matching documents that are > : > updated recently and penalize those that have not been updated for a > long > : > time. There are other terms in the query that would affect the scores > as > : > well. For example we construct a query similar to this: > : > > : > *:* field1:value1^2 field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO > *]^5 > : > lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3 > : > > : > I notice it's not possible to simply use a negative boosting factor in > the > : > query. Is there any way to achieve such result? > : > > : > Regards, > : > Shi Quan He > : > > : > > > > > -Hoss > > > -- View this message in context: http://www.nabble.com/Is-negative-boost-possible--tp25025775p25039059.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Remove data from index
As far as I know you can not do that with DIH. What size is your index? Probably the best you can do is index from scratch again with full-import. clico wrote: > > I hope it could be a solution. > > But I think I understood that u can use deletePkQuery like this > > "select document_id from table_document where statusDeleted= 'Y'" > > In my case I have no status like "statusDeleted". > > The request I would like to write is > > "Delete from my solr Index the id that are no longer present in my > table_document" > > With Lucene I had a way to do that : > open IndexReader, > for each lucene document : check in table_document and remove in lucene > index if document is no longer present in the table > > > > > -- View this message in context: http://www.nabble.com/Remove-data-from-index-tp25063736p25063986.html Sent from the Solr - User mailing list archive at Nabble.com.
Optimizing a query to sort results alphabetically for a determinated field
Hey there, I need to sort my query results alphabetically on a particular field called "town". This field is analyzed with a KeywordAnalyzer and isn't multiValued. Note also that some docs don't have this field. Doing just: http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town asc gives me back the results sorted alphabetically, but puts the docs that don't have the town field at the beginning. I want them at the end, or I want them not to appear. This query solves the problem: http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town asc&fq=town:[a TO z] But applying the filter fq=town:[a TO z] is definitely not good in terms of memory, speed and clauses... Is there any way to do something similar but with a more optimized query? Thanks in advance! -- View this message in context: http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113379.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimizing a query to sort results alphabetically for a determinated field
Yes but I thought it was just for sortable fields: sint,sfloat,sdouble,slong. Can I apply "sortMissingLast"to text fields analyzed with KeywordAnalyzer? Constantijn Visinescu wrote: > > There's a "sortMissingLast" true/false property that you can set on your > fielType definitions in the schema. > > On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese > wrote: > >> >> Hey there, I need to sort my query results alphabetically for a >> determinated >> field called "town". This field is analyzed with a KeywordAnalyzer and >> isn't >> multiValued. Add that some docs doesn't doesn'h have this field. >> Doing just: >> >> >> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town >> asc >> >> Will give me back the results sorted alphabetically but will put the docs >> that doesn't have this field (town) at the begining. >> I want them at the end or I want them not to apear. This query solves the >> problem: >> >> >> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town >> asc&fq=town:[a<http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town%0Aasc&fq=town:%5Ba>TO >> z] >> >> But applying this filter: fq=town:[a TO z] is definitely not good in >> terms >> of memory, speed and clauses... >> Is there any way to do something similar but with a more optimized query? >> Thanks in advance! >> >> -- >> View this message in context: >> http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113379.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113637.html Sent from the Solr - User mailing list archive at Nabble.com.
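For reference, the line from the example schema being referred to is presumably the string field type, something like:

  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />

sortMissingLast/sortMissingFirst are attributes of the fieldType (or field) declaration, so they are not restricted to the s* sortable numeric types.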
Re: Optimizing a query to sort results alphabetically for a particular field
It just worked. Thanks a lot! Good to know sortMissingLast works not just in sortable fields Constantijn Visinescu wrote: > > not 100% sure but the example schema has: > omitNorms="true"/> > > So i'd say give it a go and see what happens ;) > > On Mon, Aug 24, 2009 at 12:24 PM, Marc Sturlese > wrote: > >> >> Yes but I thought it was just for sortable fields: >> sint,sfloat,sdouble,slong. >> Can I apply "sortMissingLast"to text fields analyzed with >> KeywordAnalyzer? >> >> Constantijn Visinescu wrote: >> > >> > There's a "sortMissingLast" true/false property that you can set on >> your >> > fielType definitions in the schema. >> > >> > On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese >> > wrote: >> > >> >> >> >> Hey there, I need to sort my query results alphabetically for a >> >> determinated >> >> field called "town". This field is analyzed with a KeywordAnalyzer and >> >> isn't >> >> multiValued. Add that some docs doesn't doesn'h have this field. >> >> Doing just: >> >> >> >> >> >> >> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town >> >> asc >> >> >> >> Will give me back the results sorted alphabetically but will put the >> docs >> >> that doesn't have this field (town) at the begining. >> >> I want them at the end or I want them not to apear. This query solves >> the >> >> problem: >> >> >> >> >> >> >> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town >> >> asc&fq=town:[a< >> http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows=10&indent=on&sort=town%0Aasc&fq=town:%5Ba >> >TO >> >> z] >> >> >> >> But applying this filter: fq=town:[a TO z] is definitely not good in >> >> terms >> >> of memory, speed and clauses... >> >> Is there any way to do something similar but with a more optimized >> query? >> >> Thanks in advance! >> >> >> >> -- >> >> View this message in context: >> >> >> http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113379.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25113637.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > -- View this message in context: http://www.nabble.com/Optimizing-a-query-to-sort-results-alphabetically-for-a-determinated-field-tp25113379p25114941.html Sent from the Solr - User mailing list archive at Nabble.com.
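For reference, a field type along these lines (an illustrative sketch, not the exact schema used in this thread) combines the KeywordTokenizer-based analysis discussed above with sortMissingLast, so documents without the town field sort last:

<fieldType name="town_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- keep the whole value as a single token, lowercased for consistent sorting -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="town" type="town_sort" indexed="true" stored="true"/>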
Best way to do a lucene matchAllDocs not using q.alt=*:*
Hey there, I need a query to get the total number of documents in my index. I can get it if I do this using the DismaxRequestHandler: q.alt=*:*&facet=false&hl=false&rows=0 I have noticed this query is very memory consuming. Is there a more optimized way in trunk to get the total number of documents in my index? Thanks in advance -- View this message in context: http://www.nabble.com/Best-way-to-do-a-lucene-matchAllDocs-not-using-q.alt%3D*%3A*-tp25277585p25277585.html Sent from the Solr - User mailing list archive at Nabble.com.
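If a custom plugin is an option, the cheapest way to get the count is to read it straight off the index reader rather than running any query at all. A minimal sketch, assuming the Solr 1.4-era SearchComponent API (class name is made up; wiring it into a handler in solrconfig.xml is left out):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class NumDocsComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // numDocs() counts the non-deleted documents; no query is executed at all
    rb.rsp.add("totalDocs", rb.req.getSearcher().getReader().numDocs());
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time
  }

  @Override
  public String getDescription() { return "Adds the total document count to the response"; }

  @Override
  public String getVersion() { return "1.0"; }

  @Override
  public String getSourceId() { return ""; }

  @Override
  public String getSource() { return ""; }
}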
DIH applying various transformers to a field
Hey there, I am using DIH to import a db table and have written a custom transformer following the example:

package foo;
public class CustomTransformer1 {
    public Object transformRow(Map<String, Object> row) {
        String artist = (String) row.get("artist");
        if (artist != null)
            row.put("ar", artist.trim());
        return row;
    }
}

I'm wondering: if I write a second transformer and put it in data-config.xml after CustomTransformer1, will the input row seen by the second transformer be the row already transformed by CustomTransformer1, or will it be the original row? I would just need to index the result of transformer2 (whose input would be the output of transformer1), so the config would declare both transformers on the entity. (I have seen https://issues.apache.org/jira/browse/SOLR-1033 but am not sure if it's what I'm asking for.) Thanks in advance -- View this message in context: http://www.nabble.com/DIH-applying-variosu-transformers-to-a-field-tp25342449p25342449.html Sent from the Solr - User mailing list archive at Nabble.com.
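For what it's worth, transformers listed in an entity's transformer attribute are applied in the order they are declared, each one receiving the row as left by the previous one. A sketch of a second transformer (class name and logic are made up for illustration) that works on the value produced by CustomTransformer1:

package foo;

import java.util.Map;

// Declared as transformer="foo.CustomTransformer1,foo.CustomTransformer2" on the entity,
// so it runs after CustomTransformer1 and sees the "ar" value that one added.
public class CustomTransformer2 {
    public Object transformRow(Map<String, Object> row) {
        Object trimmed = row.get("ar");              // already trimmed by CustomTransformer1
        if (trimmed != null) {
            row.put("ar", trimmed.toString().toUpperCase()); // hypothetical second step
        }
        return row;
    }
}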
Re: Solr Trunk Heap Space Issues
Doing this you will send the dump where you want:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump

Then you can open the dump with jhat:

jhat /path/to/the/dump/your_stack.bin

It will probably give you an out-of-memory exception itself due to the large size of the dump. In case you can give your JVM more memory, do:

jhat -J-mx2000m my_stack.bin

Then you can analyze the heap at the out-of-memory moment at http://localhost:7000 Let me know if you find something, please. I experienced the same a while ago and couldn't fix the problem. Jeff Newburn wrote: > > Added the parameter and it didn't seem to dump when it hit the gc limit > error. Any other thoughts? > > -- > Jeff Newburn > Software Engineer, Zappos.com > jnewb...@zappos.com - 702-943-7562 > > >> From: Bill Au >> Reply-To: >> Date: Thu, 1 Oct 2009 12:16:53 -0400 >> To: >> Subject: Re: Solr Trunk Heap Space Issues >> >> You probably want to add the following command line option to java to >> produce a heap dump: >> >> -XX:+HeapDumpOnOutOfMemoryError >> >> Then you can use jhat to see what's taking up all the space in the heap. >> >> Bill >> >> On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller >> wrote: >> >>> Jeff Newburn wrote: I am trying to update to the newest version of solr from trunk as of May 5th. I updated and compiled from trunk as of yesterday (09/30/2009). >>> When I try to do a full import I am receiving a GC heap error after changing nothing in the configuration files. Why would this happen in the most recent versions but not in the version from a few months ago. >>> Good question. The error means its spending too much time trying to >>> garbage collect without making much progress. >>> Why so much more garbage to collect just by updating? Not sure... >>> The stack trace is below. Oct 1, 2009 8:34:32 AM >>> org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, >>> 167353, ...(83 more)]} 0 35991 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOfRange(Arrays.java:3209) at java.lang.String.(String.java:215) at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384) at >>> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280) at >>> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at >>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt reamHandlerBase.java:54) at >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at >>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3 38) at >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 241) at >>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application FilterChain.java:235) at >>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh ain.java:206) at >>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja va:233) at >>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja va:175) at >>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128 ) at >>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102 ) at >>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java :109) at >>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at >>> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java: 879) at >>> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H ttp11NioProtocol.java:719) at >>> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java: 2080) at >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja va:886) at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9 08) at java.lang.Thread.run(Thread.java:619) Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute INFO: [zeta-main] webapp=/solr path=/update params={} status=500 >>> QTime=5265 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryEr
Re: Solr Trunk Heap Space Issues
I think it doesn't make sense to enable warming if your solr instance is just for indexing pourposes (it changes if you use it for search aswell). You could comment the caches aswell from solrconfig.xml Setting queryResultWindowSize and queryResultMaxDocsCached to sero maybe could help... (but if caches and warming are removed from solrconfig.xml I think these two parameters do nothing) Jeffery Newburn wrote: > > Ah yes we do have some warming queries which would look like a search. > Did > that side change enough to push up the memory limits where we would run > out > like this? Also, would FastLRU cache make a difference? > -- > Jeff Newburn > Software Engineer, Zappos.com > jnewb...@zappos.com - 702-943-7562 > > >> From: Yonik Seeley >> Reply-To: >> Date: Fri, 2 Oct 2009 00:53:46 -0400 >> To: >> Subject: Re: Solr Trunk Heap Space Issues >> >> On Thu, Oct 1, 2009 at 8:45 PM, Jeffery Newburn >> wrote: >>> I loaded the jvm and started indexing. It is a test server so unless >>> some >>> errant query came in then no searching. Our instance has only 512mb but >>> my >>> concern is the obvious memory requirement leap since it worked before. >>> What >>> other data would be helpful with this? >> >> Interesting... not too much should have changed for memory >> requirements on the indexing side. >> TokenStreams are now reused (and hence cached) per thread... but that >> normally wouldn't amount to much. >> >> There was recently another bug where compound file format was being >> used regardless of the config settings... but I think that was fixed >> on the 29th. >> >> Maybe you were already close to the limit required? >> Also, your heap dump did show LRUCache taking up 170MB, and only >> searches populate that (perhaps you have warming searches configured >> on this server?) >> >> -Yonik >> http://www.lucidimagination.com >> >> >> >> >> >>> >>> >>> On Oct 1, 2009, at 5:14 PM, "Mark Miller" wrote: >>> Jeff Newburn wrote: > > Ok I was able to get a heap dump from the GC Limit error. > > 1 instance of LRUCache is taking 170mb > 1 instance of SchemaIndex is taking 56Mb > 4 instances of SynonymMap is taking 112mb > > There is no searching going on during this index update process. > > Any ideas what on earth is going on? Like I said my May version did > this > without any problems whatsoever. > > Had any searching gone on though? Even if its not occurring during the indexing, you will still have the data structure loaded if searches had occurred. What heap size do you have - that doesn't look like much data to me ... -- - Mark http://www.lucidimagination.com >>> > > > -- View this message in context: http://www.nabble.com/Solr-Trunk-Heap-Space-Issues-tp25701422p25752521.html Sent from the Solr - User mailing list archive at Nabble.com.
SOLR-1395 integration with katta. Question about Katta's ranking among shards and IDF's
Hey there, I am trying to set up the Katta integration plugin. I would like to know if Katta's ranking algorithm is used when searching among shards. If so, would it mean it solves the problem with IDFs in distributed Solr? -- View this message in context: http://www.nabble.com/SOLR-1395-integration-with-katta.-Question-about-Katta%27s-ranking-among-shards-and-IDF%27s-tp25819241p25819241.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: number of Solr indexes per Tomcat instance
Are you using one single solr instance with multicore or multiple solr instances with one index each? Erik_l wrote: > > Hi, > > Currently we're running 10 Solr indexes inside a single Tomcat6 instance. > In the near future we would like to add another 30-40 indexes to every > Tomcat instance we host. What are the factors we have to take into account > when planning for such deployments? Obviously we do know the sizes of the > indexes but for example how much memory does Solr need to be allocated > given that each index is treated as a webapp in Tomcat. Also, do you know > if Tomcat has got a limit in number of apps that can be deployed (maybe I > should ask this questions in a Tomcat forum). > > Thanks > E > -- View this message in context: http://www.nabble.com/number-of-Solr-indexes-per-Tomcat-instance-tp26027238p26027304.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: number of Solr indexes per Tomcat instance
Probably multicore would give you better performance... I think the most important factors to take into account are the size of the indexes and the traffic you have to handle. With enough RAM you can hold 40 cores in a single solr instance (or even more), but depending on the traffic you have to handle you may suffer from slow response times. Erik_l wrote: > > We're not using multicore. Today, one Tomcat instance host a number of > indexes in form of 10 Solr indexes (10 individual war files). > > > Marc Sturlese wrote: >> >> Are you using one single solr instance with multicore or multiple solr >> instances with one index each? >> >> Erik_l wrote: >>> >>> Hi, >>> >>> Currently we're running 10 Solr indexes inside a single Tomcat6 >>> instance. In the near future we would like to add another 30-40 indexes >>> to every Tomcat instance we host. What are the factors we have to take >>> into account when planning for such deployments? Obviously we do know >>> the sizes of the indexes but for example how much memory does Solr need >>> to be allocated given that each index is treated as a webapp in Tomcat. >>> Also, do you know if Tomcat has got a limit in number of apps that can >>> be deployed (maybe I should ask this questions in a Tomcat forum). >>> >>> Thanks >>> E >>> >> >> > > -- View this message in context: http://www.nabble.com/number-of-Solr-indexes-per-Tomcat-instance-tp26027238p26028437.html Sent from the Solr - User mailing list archive at Nabble.com.
keep index in production and snapshots in separate physical disks
Is there any way to make snapinstaller install the index in snapshot20091023124543 (for example) from another disk? I am asking this because I would rather not optimize the index on the master (if I do that, it takes a long time to send it via rsync because it is so big); this way I would just have to send the new segments. On the slave I would have 2 physical disks. Snappuller would pull the snapshot onto one disk (here the index would not be optimized). Snapinstaller would install the snapshot on the other disk, optimize it and open the new IndexReader. The optimization should be done on the disk which contains the "not in production" index so it does not affect search request speed. Any idea what I should hack to reach this goal, in case it is possible? -- View this message in context: http://www.nabble.com/keep-index-in-production-and-snapshots-in-separate-phisical-disks-tp26029666p26029666.html Sent from the Solr - User mailing list archive at Nabble.com.
distributed facet dates
Hey there, I am thinking of developing date faceting for distributed search but I don't know exactly where to start. I am familiar with the facet dates source code, and I think that if I could understand how distributed facet queries work it shouldn't be that difficult. I have read http://wiki.apache.org/solr/WritingDistributedSearchComponents but I am missing some info. Could anyone point me to how I could start? Thanks in advance -- View this message in context: http://old.nabble.com/distributed-facet-dates-tp26282343p26282343.html Sent from the Solr - User mailing list archive at Nabble.com.
error with multicore CREATE action
Hey there, I am using Solr 1.4 out of the box and am trying to create a core at runtime using the CREATE action. I am getting this error when executing: http://localhost:8983/solr/admin/cores?action=CREATE&name=x&instanceDir=x&persist=true&config=solrconfig.xml&schema=schema.xml&dataDir=data Nov 23, 2009 6:18:44 PM org.apache.solr.core.SolrResourceLoader INFO: Solr home set to 'solr/x/' Nov 23, 2009 6:18:44 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error executing default implementation of CREATE at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:250) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:111) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/x/conf/', cwd=/home/smack/Desktop/apache-solr-1.4.0/example at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:260) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:228) at org.apache.solr.core.Config.(Config.java:101) at org.apache.solr.core.SolrConfig.(SolrConfig.java:130) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:405) at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:245) ... 21 more I don't know if I am missing something. Should I create manually de folders and schema and solconfig files? -- View this message in context: http://old.nabble.com/error-with-multicore-CREATE-action-tp26482255p26482255.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr+jetty logging to syslog?
With 1.4:
- Add the log4j jars to Solr
- Configure the SyslogAppender with something like:

log4j.appender.solrLog=org.apache.log4j.net.SyslogAppender
log4j.appender.solrLog.Facility=LOCAL0
log4j.appender.solrLog.SyslogHost=127.0.0.1
log4j.appender.solrLog.layout=org.apache.log4j.PatternLayout
log4j.appender.solrLog.layout.ConversionPattern=solr: %-4r [%t] %-5p %c - %m%n

- Install syslog-ng and let syslog accept udp packets. To do that, uncomment the line udp(); in the "# all known message sources" source s_all { } block of syslog-ng.conf.

Otis Gospodnetic wrote: > > Not many people do that, judging from > http://www.google.com/search?&q=+solr%20+syslogd . > > But I think this is really not a Solr-specific question. Isn't the > question really "how do I configure log4j to log to syslogd?". Oh, and > then "how do I configure slf4j to use log4j?" > > The answer to the first one is "by using SyslogAppender" (google says so) > The answer to the second one might be on > http://fernandoribeiro.eti.br/2006/05/24/how-to-use-slf4j-with-log4j/ > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: Steve Conover >> To: solr-user@lucene.apache.org >> Sent: Sat, November 21, 2009 4:09:57 PM >> Subject: Re: solr+jetty logging to syslog? >> >> Does no one send solr logging to syslog? >> >> On Thu, Nov 19, 2009 at 5:54 PM, Steve Conover wrote: >> > The solution involves slf4j to log4j to syslog (at least, for solr), >> > but I'm having some trouble stringing all the parts together. If >> > anyone is doing this, would you mind posting how you use slf4j-log4j >> > jar, what your log4j.properties looks like, what your java system >> > properties settings are, and anything else you think is relevant? >> > >> > Much appreciated >> > >> > -Steve >> > > > > -- View this message in context: http://old.nabble.com/solr%2Bjetty-logging-to-syslog--tp26437295p26531505.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sanity check on numeric types and which of them to use
And what about bcdint? What is the difference between it and the other int types? Is bcdint always better? Thanks in advance Yonik Seeley-2 wrote: > > On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill wrote: >> 1) Is there any benefit to using the "int" type as a TrieIntField w/ >> precisionStep=0 over the "pint" type for simple ints that won't be sorted >> or >> range queried? > > No. But given that people could throw in a random range query and > have it work correctly with a trie based int (vs a plain int), seems > reason enough to prefer it. > >> 2) In 1.4, what type is now most efficient for sorting? > > trie and plain should be pretty equivalent (trie might be slightly > faster to uninvert the first time). Both take up less memory in the > field cache than sint. > >> 3) The only reason to use a "sint" field is for backward compatibility >> and/or to use sortMissingFirst/SortMissingLast, correct? > > I believe so. > > -Yonik > http://www.lucidimagination.com > > -- View this message in context: http://old.nabble.com/Sanity-check-on-numeric-types-and-which-of-them-to-use-tp26651725p26655009.html Sent from the Solr - User mailing list archive at Nabble.com.
About fsv (sort field values)
I am tracing QueryComponent.java and would like to know the purpose of the doFSV function. I don't understand what fsv (sort field values) are for. I have tried some queries with fsv=true and some extra info appears in the response, but I don't know what it is for and can't find much info out there. I read: // The query cache doesn't currently store sort field values, and SolrIndexSearcher doesn't // currently have an option to return sort field values. Because of this, we // take the documents given and re-derive the sort values. Is it for caching purposes? Thanks in advance! -- View this message in context: http://old.nabble.com/About-fsv-%28sort-field-falues%29-tp26700729p26700729.html Sent from the Solr - User mailing list archive at Nabble.com.
UpdateRequestProcessor to prevent documents from being indexed
Hey there, I need to be able to decide, once a document has been created, whether I want it to be indexed or not. I have thought of implementing an UpdateRequestProcessor to do that, but I don't know how to tell Solr in the processAdd method to skip the document. If I delete all the fields, would it be skipped, or is there a better way to reach this goal? Thanks in advance. -- View this message in context: http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: UpdateRequestProcessor to prevent documents from being indexed
Do you mean something like this?

@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
    boolean addDocToIndex = dealWithSolrDocFields(cmd.getSolrInputDocument());
    if (next != null && addDocToIndex) {
        next.processAdd(cmd);
    } else {
        LOG.debug("Doc skipped!");
    }
}

Thanks in advance Chris Male wrote: > > Hi, > > If your UpdateRequestProcessor does not forward the AddUpdateCommand onto > the RunUpdateProcessor, I believe the document will not be indexed. > > Cheers > > On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese > wrote: > >> >> Hey there, >> I need that once a document has been created be able to decide if I want >> it >> to be indexed or not. I have thought in implement an >> UpdateRequestProcessor >> to do that but don't know how to tell Solr in the processAdd void to skip >> the document. >> If I delete all the field would it be skiped or is there a better way to >> reach this goal? >> Thanks in advance. >> -- >> View this message in context: >> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Chris Male | Software Developer | JTeam BV.| T: +31-(0)6-14344438 | > www.jteam.nl > > -- View this message in context: http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725698.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: UpdateRequestProcessor to prevent documents from being indexed
Yes, it did Cheers Chris Male wrote: > > Hi, > > Yeah thats what I was suggesting. Did that work? > > On Thu, Dec 10, 2009 at 12:24 PM, Marc Sturlese > wrote: > >> >> Do you mean something like?: >> >>@Override >>public void processAdd(AddUpdateCommand cmd) throws IOException { >>boolean addDocToIndex >> =dealWithSolrDocFields(cmd.getSolrInputDocument()) ; >>if (next != null && addDocToIndex) { >>next.processAdd(cmd); >>} else { >> LOG.debug("Doc skipped!") ; >>} >>} >> >> Thanks in advance >> >> >> >> Chris Male wrote: >> > >> > Hi, >> > >> > If your UpdateRequestProcessor does not forward the AddUpdateCommand >> onto >> > the RunUpdateProcessor, I believe the document will not be indexed. >> > >> > Cheers >> > >> > On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese >> > wrote: >> > >> >> >> >> Hey there, >> >> I need that once a document has been created be able to decide if I >> want >> >> it >> >> to be indexed or not. I have thought in implement an >> >> UpdateRequestProcessor >> >> to do that but don't know how to tell Solr in the processAdd void to >> skip >> >> the document. >> >> If I delete all the field would it be skiped or is there a better way >> to >> >> reach this goal? >> >> Thanks in advance. >> >> -- >> >> View this message in context: >> >> >> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725534.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> > -- >> > Chris Male | Software Developer | JTeam BV.| T: +31-(0)6-14344438 | >> > www.jteam.nl >> > >> > >> >> -- >> View this message in context: >> http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26725698.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Chris Male | Software Developer | JTeam BV.| www.jteam.nl > > -- View this message in context: http://old.nabble.com/UpdateRequestProcessor-to-avoid-documents-of-being-indexed-tp26725534p26726566.html Sent from the Solr - User mailing list archive at Nabble.com.
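Putting the thread together, a self-contained sketch of a factory that wires this in. The class name and the shouldIndex rule are made up for illustration, and the imports follow the Solr 1.4-era package layout (SolrQueryResponse moved to org.apache.solr.response in later versions):

package com.example.solr;  // hypothetical package

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SkipDocUpdateProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        if (shouldIndex(doc)) {
          super.processAdd(cmd);   // forward down the chain: the document gets indexed
        }
        // otherwise the command is swallowed and the document is skipped
      }
    };
  }

  // hypothetical business rule, standing in for dealWithSolrDocFields() above
  private boolean shouldIndex(SolrInputDocument doc) {
    return doc.getFieldValue("title") != null;
  }
}

The factory then goes into an updateRequestProcessorChain in solrconfig.xml, in front of the run-update processor.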
trie fields and sortMissingLast
Should the sortMissingLast param work on trie fields? -- View this message in context: http://old.nabble.com/tire-fields-and-sortMissingLast-tp26873134p26873134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: suggestions for DIH batchSize
If you want to retrieve a huge volume of rows you will end up with an OutOfMemoryException because of the jdbc driver. Setting batchSize to -1 in your data-config.xml (which internally sets it to Integer.MIN_VALUE) will make the query execute in streaming mode, avoiding the memory exception. Joel Nylund wrote: > > Hi, > > it looks like from looking at the code the default is 500, is the > recommended setting for this? > > Has anyone notice any significant performance/memory tradeoffs by > making this much bigger? > > thanks > Joel > > > -- View this message in context: http://old.nabble.com/suggestions-for-DIH-batchSize-tp26894539p26897636.html Sent from the Solr - User mailing list archive at Nabble.com.
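An illustrative dataSource declaration with that setting (driver, url and credentials here are placeholders, not taken from this thread):

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb" user="user" password="pass"
            batchSize="-1"/>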
Customize solr query
Hey there, I would like to customize my query this way: I want to give a higher boost to the results that match the query executed as a quoted phrase, and a lower boost to the results that match without the quotes. I have this done in my own lucene app (coding straight against lucene). I am trying to migrate to Solr but don't know how to do this quoting stuff. I would like to do something like this: ...title:"+query_string+" (with boost 3) and title:+query_string+ (with boost 2)... I suppose I have to add something to the solrconfig.xml but couldn't find what. Any advice? Thanks in advance Marc Sturlese -- View this message in context: http://www.nabble.com/Customize-solr-query-tp20293029p20293029.html Sent from the Solr - User mailing list archive at Nabble.com.
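With the standard query parser, one way to express this (the query text is only an example) is a boosted phrase clause combined with a boosted non-phrase clause:

q=title:"second hand bikes"^3 OR title:(second hand bikes)^2

The dismax handler gets at the same idea declaratively: qf=title^2 scores the individual terms, and pf=title^3 adds an extra boost when the terms also match as a phrase in the title field.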
Re: Getting a document by primary key
Hey there, I am doing the same and I am experiencing some trouble. I get the document data by searching by term. The problem is that when I do it several times (inside a huge for loop) the app keeps increasing its memory use until almost all the memory is used... Did you find any other way to do that? Jonathan Ariel wrote: > > I'm developing my own request handler and given a document primary key I > would like to get it from the index. > Which is the best and fastest way to do this? I will execute this request > handler several times and this should work really fast. > Sorry if it's a basic question. > > Thanks! > > Jonathan > > -- View this message in context: http://www.nabble.com/Getting-a-document-by-primary-key-tp20072108p20295436.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting a document by primary key
Hey there, I never run out of memory but I think the app always runs close to the limit... The problem seems to be in here (searching by term):

try {
    indexSearcher = new IndexSearcher(path_index);
    QueryParser queryParser = new QueryParser("id_field", getAnalyzer(stopWordsFile));
    Query query = queryParser.parse(query_string);
    Hits hits = indexSearcher.search(query);
    if (hits.length() > 0) {
        doc = hits.doc(0);
    }
} catch (Exception ex) {
} finally {
    if (indexSearcher != null) {
        try { indexSearcher.close(); } catch (Exception e) {}
        indexSearcher = null;
    }
}

As Hits is deprecated I tried to use termdocs and top docs... but the memory problem never disappeared... If I call the garbage collector every time I run the code above, the memory doesn't increase indefinitely, but... the app works very slowly. Any suggestion? Thanks for replying! Yonik Seeley wrote: > > On Sun, Nov 2, 2008 at 8:09 PM, Marc Sturlese <[EMAIL PROTECTED]> > wrote: >> I am doing the same and I am experimenting some trouble. I get the >> document >> data searching by term. The problem is that when I do it several times >> (inside a huge for) the app starts increasing the memory use until I use >> almost the whole memory... > > That just sounds like the way Java's garbage collection tends to > work... do you ever run out of memory (and get an exception)? > > -Yonik > > -- View this message in context: http://www.nabble.com/Getting-a-document-by-primary-key-tp20072108p20309245.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting a document by primary key
Hey, you are right, I'm trying to migrate my app to solr. For the moment I am using solr for the searching part of the app, but I am using my own lucene app for indexing. I should have posted to the lucene forum for this problem. Sorry about that. I am trying to use termdocs properly now. Thanks for your advice. Marc Yonik Seeley wrote: > > On Mon, Nov 3, 2008 at 2:49 PM, Otis Gospodnetic > <[EMAIL PROTECTED]> wrote: >> Is this your code or something from Solr? >> That indexSearcher = new IndexSearcher(path_index) ; is very suspicious >> looking. > > Good point... if this is a Solr plugin, then get the SolrIndexSearcher > from the request object. > If it's not Solr, then use termenum/termdocs (and post to the right list > ;-) > > -Yonik > > -- View this message in context: http://www.nabble.com/Getting-a-document-by-primary-key-tp20072108p20310224.html Sent from the Solr - User mailing list archive at Nabble.com.
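Along the lines of the advice above, a minimal sketch (hypothetical class and method names, Lucene 2.x-era API) of a primary-key lookup that reuses a long-lived IndexReader and TermDocs instead of opening a new IndexSearcher and parsing a query on every call; inside a Solr plugin the reader would come from the request's SolrIndexSearcher:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class PrimaryKeyLookup {

    private final IndexReader reader;   // opened once and reused across lookups

    public PrimaryKeyLookup(IndexReader reader) {
        this.reader = reader;
    }

    public Document getByPrimaryKey(String idField, String idValue) throws IOException {
        TermDocs termDocs = reader.termDocs(new Term(idField, idValue));
        try {
            if (termDocs.next()) {
                return reader.document(termDocs.doc());   // first (and only) match for a unique key
            }
            return null;   // no document with that key
        } finally {
            termDocs.close();
        }
    }
}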
Using DataImportHandler with mysql database
Hey there, I am trying to use the DataImportHandler to index data from a mysql database. I am getting the same error all the time, right when I start tomcat: Nov 10, 2008 7:39:49 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig INFO: Data Configuration loaded successfully Nov 10, 2008 7:39:49 PM org.apache.solr.handler.dataimport.DataImportHandler inform SEVERE: Exception while loading DataImporter java.lang.NullPointerException at org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:95) at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) ... I am using the official release of Solr 1.3. First I tried to add the compiled DataImportHandler jar. As that didn't work, what I did was: I downloaded the package org/apache/solr/handler/dataimport from a nightly build, added it to my official Solr 1.3 source release and compiled it. This way I have my Solr 1.3 release with the DataImportHandler. In solrconfig.xml I have created a request handler to do the import, with its config parameter pointing at /path_to_/data-config.xml. To connect to the database, in data-config.xml I declare the JDBC dataSource with the MySQL url, user and password, and then I do the select and the mapping db_field -> index_field. *The mysql connector is correctly added to the classpath. I think I must be missing something in my configuration but can't find what... Can anyone give me a hand? I am a bit lost with this problem... Thanks in advance Marc Sturlese -- View this message in context: http://www.nabble.com/Using-DataImportHandler-with-mysql-database-tp20425791p20425791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using DataImportHandler with mysql database
That worked! I was writing the <document> tag in a bad way. > It seems like your data-config does not have any <document> tag. The > following is the correct structure: > > > > > > > > On Tue, Nov 11, 2008 at 12:31 AM, Marc Sturlese > <[EMAIL PROTECTED]>wrote: > >> >> Hey there, >> I am trying to use the DataImportHandler to index data from a mysql >> database. I am having the same error all the time just when I start >> tomcat: >> >> Nov 10, 2008 7:39:49 PM >> org.apache.solr.handler.dataimport.DataImportHandler >> processConfiguration >> INFO: Processing configuration from solrconfig.xml: >> {config=/path_to/data-config.xml} >> Nov 10, 2008 7:39:49 PM org.apache.solr.handler.dataimport.DataImporter >> loadDataConfig >> INFO: Data Configuration loaded successfully >> Nov 10, 2008 7:39:49 PM >> org.apache.solr.handler.dataimport.DataImportHandler >> inform >> SEVERE: Exception while loading DataImporter >> java.lang.NullPointerException >>at >> >> org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:95) >>at >> >> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106) >> ... >> >> I am using the oficial release of solar 1.3. First I tried to add the >> compiled DataImportHandler jar. As it didn't work what I did was: >> I downloaded the package org/apache/solr/handler/dataimport from a >> nightly >> build and have added it and compiled to my solr 1.3 oficial source >> release. >> This way I have my solr1.3 release with the DataImporthandler >> >> In solrconfig.xml I have created a request handler to make the import: >> >> > class="org.apache.solr.handler.dataimport.DataImportHandler"> >> >> >>/path_to_/data-config.xml >> >> >> >> To connect to the database , in data-config.xml I am doing: >> >> > url="jdbc:mysql://localhost/db_name" user="root" password=""/> ...and >> here >> I >> do the select and the mapping db_field - index_field >> >> *The mysql connector is correctly added in the classpath >> >> I think I must be missing something in my configuration but can't find >> what... >> Anyone can give me a hand? I am a bit lost with this problem... >> Thanks in advanced >> >> Marc Sturlese >> >> >> >> -- >> View this message in context: >> http://www.nabble.com/Using-DataImportHandler-with-mysql-database-tp20425791p20425791.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Using-DataImportHandler-with-mysql-database-tp20425791p20435463.html Sent from the Solr - User mailing list archive at Nabble.com.
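The structure being referred to (the XML itself was stripped from the archived message) follows the usual dataConfig layout; an illustrative sketch with placeholder table, column and connection values:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db_name" user="root" password=""/>
  <document>
    <entity name="item" query="SELECT id, title FROM my_table">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>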
deduplication & dataimporthandler
Hey there, Is there any way to use the DataImportHandler together with deduplication just by doing xml configuration? I have read that deduplication (http://wiki.apache.org/solr/Deduplication) is meant to be used with the handler named /update (which uses the solr.XmlUpdateRequestHandler class). If there's no other way I will go into the DataImportHandler source, but I would like to know first if it can be done via conf... I am thinking of adding the dedupe settings (enabled=true, fields=field1,field2, signatureClass=org.apache.solr.update.processor.TextProfileSignature, signatureField=signatureField) inside my requestHandler called /dataimport (which uses the org.apache.solr.handler.dataimport.DataImportHandler class). Has anyone done something similar? Marc Sturlese -- View this message in context: http://www.nabble.com/deduplication---dataimporthandler-tp20437553p20437553.html Sent from the Solr - User mailing list archive at Nabble.com.
indexing data and deleting from index and database
Hey there, For a few weeks I have been trying to migrate my lucene core app to Solr and many questions are coming to my mind... Before being at ApacheCon I thought that my Lucene index would work fine with my Solr search engine, but after my conversation with Erik at the Solr BootCamp I understood that the structure of the fields in the Solr index is different, especially regarding analysis. Now I want to use Solr to index too, and I have some questions: The first thing I do when I launch the indexer is delete a lot of documents that I have marked in a db with a field delete=1 and that I have indexed before in the Lucene index. Once that is done, I also delete the documents from the DB. After that, I index some docs from the same DB (the 100,000 newest docs and some other modified ones). To do the migration I have started using DataImportHandler (with JdbcDataSource) with delta import to add new documents. The thing is that I cannot find a way to delete the rows from the DB nor the docs from my index with DataImportHandler. Is implementing my own DataSource the best way to do this task? Is there a better way? Thanks for everything!!! -- View this message in context: http://www.nabble.com/indexing-data-and-deleting-from-index-and-database-tp20466411p20466411.html Sent from the Solr - User mailing list archive at Nabble.com.
troubles with delta import
Hey there, I am using dataimport with full-import successfully but there's no way to make it work with delta-import. Apparently solr doesn't show any error but it does not do what it is supposed to. I think the problem is with dataimport.properties, because it is never updated. I have it placed in the same folder as solrconfig.xml and schema.xml, and the write permissions are set properly. What makes me doubt is that I couldn't find anywhere to tell solr the path of this file. I don't know if solr is supposed to find it automatically. My data-config.xml looks like this: (*I have in the rows of the table a timestamp field called dt_last_modified.) Another thing I can't exactly understand is why I have to put both the query and the deltaQuery... why isn't just the deltaQuery (with more fields in the select) enough? After the execution everything seems to go ok (even with debug and verbose mode) but no docs have changed and dataimport.properties is not updated... Any suggestion? I have done many tests but no way... -- View this message in context: http://www.nabble.com/troubles-with-delta-import-tp20498449p20498449.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: troubles with delta import
Hey Shalin, I have tried 2 methods: 1 - First doing a full-import and afterwards a delta-import. 2 - Starting directly with the delta-import. In neither case is the date of the import.properties file updated. I have it placed in the same folder as schema.xml, data-config.xml and solrconfig.xml (which is where I think it must be placed according to what I understand from the wiki). Is that correct? It is the only thing that I think maybe I am missing... Thanks in advance Shalin Shekhar Mangar wrote: > > Hi Marc, > > Did you do a full-import first? If not, no value for last import time is > written and the delta query may fail. We should fix this to use a sane > default so that people do not need to full import first. > > You need to put both because we support both full and delta, both of which > need different kinds of queries and we cannot decide what you are going to > use. > > On Fri, Nov 14, 2008 at 4:35 PM, Marc Sturlese > <[EMAIL PROTECTED]>wrote: > >> >> Hey there, I am using dataimport with full-import successfully but >> there's >> no >> way do make it work with delta-import. Aparently solr doesn't show any >> error >> but it does not do what it is supose to. >> I thing the problme is with dataimport.properties because it is never >> updated. I have it placed in the same folder as solrconfig.xml and >> schema.xml and the writing permissions are set propertly. What makes me >> doubt is that couldn't find anywhere to tell solr the path of this file. >> Don't know if solr is suposed to find it automatically. >> >> My data-config.xml looks like this: >> >>> url="jdbc:mysql://path_db" user="user" password="pwd"/> >> >> >> >> >> >> >> >> *I have in the rows of the table a timestamp field called >> dt_last_modified >> >> Other thing that can't exactly understant is why i have to put the query >> and >> delta-query... why just with deltaquery (with more fields in the select) >> is >> not enough? >> >> After the ejecution everything seems to go ok (even with the debug and >> verbose mode) but no docs have changed and dataimport.properties is not >> updated... >> >> Any suggestion? Have done many tests but no way... >> >> -- >> View this message in context: >> http://www.nabble.com/troubles-with-delta-import-tp20498449p20498449.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/troubles-with-delta-import-tp20498449p20500510.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: troubles with delta import
Hey, That's the weird thing... in the log everything seems to work fine: Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration INFO: Processing configuration from solrconfig.xml: {config=/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml} Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter loadDataConfig INFO: Data Configuration loaded successfully Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity homes_tbl_ads with URL: jdbc:mysql://localhost/path_db Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 11 Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:0:0.47 Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr_web path=/dataimport params={verbose=true&command=delta-import&debug=on} status=0 QTime=130 Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr_web path=/dataimport params={command=show-config} status=0 QTime=0 I am calling the dataimport this way: http://...dataimport?command=full-import&debug=on&verbose=true http://...dataimport?command=delta-import&debug=on&verbose=true In delta-import I am getting this aoutput with the verbose debug: ... delta-import debug ... lst name="statusMessages"> 1 10 0 2008-11-14 15:12:46 0:0:0.47 It also shows the changes in the rows in the output of the verbose debug but nothing change in the index when I check it with Luke. I keep thinking that something is wrong coz the import.properties it is not being created... but can't find why :( solrconfig.xml: /opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml data-config.xml: Thanks a lot -- View this message in context: http://www.nabble.com/troubles-with-delta-import-tp20498449p20501450.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: troubles with delta import
Hey Shalin! Now at least I am getting some errors in the log file :D... Hope now I will be able to find the problem. Thanks for everything! Shalin Shekhar Mangar wrote: > > Ok I found the problem. > > In debug mode, DataImportHandler does not commit documents since it is > meant > for debugging only. If you want to do a commit, add commit=true as a > request > parameter. > > On Fri, Nov 14, 2008 at 7:56 PM, Marc Sturlese > <[EMAIL PROTECTED]>wrote: > >> >> Hey, >> That's the weird thing... in the log everything seems to work fine: >> >> Nov 14, 2008 3:12:46 PM >> org.apache.solr.handler.dataimport.DataImportHandler >> processConfiguration >> INFO: Processing configuration from solrconfig.xml: >> >> {config=/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml} >> Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter >> loadDataConfig >> INFO: Data Configuration loaded successfully >> Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImporter >> doDeltaImport >> INFO: Starting Delta Import >> Nov 14, 2008 3:12:46 PM >> org.apache.solr.handler.dataimport.JdbcDataSource$1 >> call >> INFO: Creating a connection for entity homes_tbl_ads with URL: >> jdbc:mysql://localhost/path_db >> Nov 14, 2008 3:12:46 PM >> org.apache.solr.handler.dataimport.JdbcDataSource$1 >> call >> INFO: Time taken for getConnection(): 11 >> Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DocBuilder >> execute >> INFO: Time taken = 0:0:0.47 >> Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute >> INFO: [] webapp=/solr_web path=/dataimport >> params={verbose=true&command=delta-import&debug=on} status=0 QTime=130 >> Nov 14, 2008 3:12:46 PM org.apache.solr.core.SolrCore execute >> INFO: [] webapp=/solr_web path=/dataimport params={command=show-config} >> status=0 QTime=0 >> >> I am calling the dataimport this way: >> http://...dataimport?command=full-import&debug=on&verbose=true >> http://...dataimport?command=delta-import&debug=on&verbose=true >> >> In delta-import I am getting this aoutput with the verbose debug: >> >> ... >> delta-import >> debug >> >> ... >> lst name="statusMessages"> >> 1 >> 10 >> 0 >> 2008-11-14 15:12:46 >> 0:0:0.47 >> >> >> It also shows the changes in the rows in the output of the verbose debug >> but >> nothing change in the index when I check it with Luke. >> I keep thinking that something is wrong coz the import.properties it is >> not >> being created... but can't find why :( >> >> solrconfig.xml: >> > class="org.apache.solr.handler.dataimport.DataImportHandler" >> default="false"> >> >> >>> >> name="config">/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf/data-config.xml >> >> >> >> data-config.xml: >> >> >>> url="jdbc:mysql://localhost/trovit_es" user="root" password=""/> >> >> >> >> >> >> >> >> >> Thanks a lot >> >> >> >> -- >> View this message in context: >> http://www.nabble.com/troubles-with-delta-import-tp20498449p20501450.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/troubles-with-delta-import-tp20498449p20502269.html Sent from the Solr - User mailing list archive at Nabble.com.
using deduplication with dataimporthandler
Hey there, I have posted before about my situation but I think my explanation was a bit confusing... I am using DataImportHandler with delta-import and it's working perfectly. I have also coded my own SqlEntityProcessor to delete expired rows from the index and database. Now I need to do duplication control at indexing time. In my old lucene core I made my own duplication control but it was very slow, as it worked by comparing strings... I have been investigating solr deduplication (http://wiki.apache.org/solr/Deduplication) and it seems very nice, as it works with hashes instead of strings. I have learned how to use deduplication with the /update requestHandler, as the wiki says, by pointing it at the dedupe update processor chain. But the thing is that I want to use it with the /dataimport requestHandler (the one used by the DataImportHandler). I don't know if there is a possible xml configuration to add deduplication to the DataImportHandler or if I should code a plugin... and in that case, I don't exactly know where. I hope my explanation is clearer now... Thanks in advance! -- View this message in context: http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20536053.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: using deduplication with dataimporthandler
Thank you so much. I have it sorted. I am wondering now if there is any more stable way to use deduplication than adding to the solr source project this patch: https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (SOLR-799.patch 2008-11-12 05:10 PM this one exactly). I have downloaded the last nightly-build source code and couldn't see the needed classes in there. Anyones knows something?Should I ask this in the developers forum? Thanks in advanced Marc Sturlese wrote: > > Hey there, > > I have posted before telling about my situation but I thing my explanation > was a bit confusing... > I am using dataImportHanlder and delta-import and it's working perfectly. > I have also coded my own SqlEntityProcesor to delete from the index and > database expired rows. > > Now I need to do duplication control at indexing time. In my old lucene > core I made my own duplication control but it was so slow as it worked > comparing strings... I have been investigating solr deduplication > (http://wiki.apache.org/solr/Deduplication) and it seems so cool as it > works with hashes instead of strings. > > I have learned how to use deduplication using the /update requestHandler > as the wiki says: > > > dedupe > > > > But the thing is that I want to use it with the /dataimport requestHanlder > (the one used by dataimporthandler). I don't know if there's a possible > xml configuration to add deduplication to dataimportHandler or I should > code a plugin... in that case, I don't exacly now where. > > Hope my explanation is more clear now... > Thank's in advanced! > > > -- View this message in context: http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20538008.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: using deduplication with dataimporthandler
Marc Sturlese wrote: > > Thank you so much. I have it sorted. > I am wondering now if there is any more stable way to use deduplication > than adding to the solr source project this patch: > https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > (SOLR-799.patch 2008-11-12 05:10 PM this one exactly). > > I have downloaded the last nightly-build source code and couldn't see the > needed classes in there. > Anyones knows something?Should I ask this in the developers forum? > > The thing is I can't find the class > org.apache.solr.update.processor.DeduplicateUpdateProcessorFactory > anywhere... > > Thanks in advanced > > -- View this message in context: http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20538077.html Sent from the Solr - User mailing list archive at Nabble.com.
TextProfileSignature using deduplication
Hey there, I've been testing and checking the source of TextProfileSignature.java to avoid similar entries at indexing time. What I understood is that it is useful for huge texts, where the frequency of the tokens (the words in lowercase, keeping just numbers and letters in that case) is important. If you want to detect duplicates in text that is not huge, without giving a lot of importance to the frequencies, it doesn't work... The hash will be made just with the terms whose frequency is higher than a QUANTUM (whose value is computed from the max frequency among all the terms). So it will say that:

aaa sss ddd fff ggg hhh aaa kkk lll ooo
aaa xxx iii www qqq aaa jjj eee zzz nnn

are duplicates, because quantum here would be 2 and the frequency of aaa would be 2 as well. So, to make the hash, just the term aaa would be used. In this case:

aaa sss ddd fff ggg hhh kkk lll ooo
apa sss ddd fff ggg hhh kkk lll ooo

quantum would be 1 and the frequencies of all terms would be 1, so all terms would be used for the hash. It will consider these two strings not similar. As I understood the algorithm, there's no way to make it understand that in my second case both strings are similar. I wish I were wrong... I have my own duplication system to detect that, but I use String comparison so it works really slowly... I would like to know if there is any tuning possibility to do that with TextProfileSignature. I don't know if I should post this here or in the developers forum... Thanks in advance -- View this message in context: http://www.nabble.com/TextProfileSigature-using-deduplication-tp20559155p20559155.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: TextProfileSignature using deduplication
>> >> I have my own duplication system to detect that but I use String >> comparison >> so it works really slow... >> What are you doing for the String comparison? Not exact right? Hey, my comparison method looks for similar matches (not just exact ones)... What I do is compare two texts word to word and then decide on a % of similarity, for example:

aaa sss ddd fff ggg hhh jjj kkk lll ooo
bbb rrr ddd fff ggg hhh jjj kkk lll ooo

Deciding on 80% similarity and comparing word to word, these two strings would be considered similar. (I split the texts into tokens and count how many similar ones I have.) (I use some stopwords and rules as well.) I am going to try more tuning of the parameters of TextProfileSignature as you say. I don't know if you remember, but I asked you about this at ApacheCon and you told me about this 799 JIRA. If I make it work it is definitely much faster than my system... About deduplication... I couldn't find anywhere the class that appears in the wiki, org.apache.solr.update.processor.DeduplicateUpdateProcessorFactory, so I downloaded the patch and plugged it into my solr source (I use org.apache.solr.update.processor.TextProfileSignature instead of the one written in the wiki). I would appreciate any advice about the tuning params of TextProfileSignature. Thank you for your time markrmiller wrote: > > >>> >>> I have my own duplication system to detect that but I use String >>> comparison >>> so it works really slow... >>> > What are you doing for the String comparison? Not exact right? > > -- View this message in context: http://www.nabble.com/TextProfileSigature-using-deduplication-tp20559155p20560828.html Sent from the Solr - User mailing list archive at Nabble.com.
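A minimal sketch of the word-to-word comparison described above (class and method names are illustrative): tokenize both texts, count shared tokens, and call the pair near-duplicates above a chosen threshold such as 80%.

import java.util.HashSet;
import java.util.Set;

public class TokenOverlapSimilarity {

    public static boolean nearDuplicate(String a, String b, double threshold) {
        Set<String> tokensA = tokenize(a);
        Set<String> tokensB = tokenize(b);
        if (tokensA.isEmpty() || tokensB.isEmpty()) {
            return false;
        }
        int shared = 0;
        for (String t : tokensA) {
            if (tokensB.contains(t)) {
                shared++;
            }
        }
        // fraction of the smaller text's tokens that also appear in the other text
        double overlap = (double) shared / Math.min(tokensA.size(), tokensB.size());
        return overlap >= threshold;
    }

    // crude tokenizer for the sketch; a real version would reuse the analyzer,
    // strip accents and drop stopwords as described in the thread
    private static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<String>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (t.length() > 0) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}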
Re: TextProfileSignature using deduplication
Hey there, I found couple of solutions that work fine for my case (is not exacly what I was looking for at the begining but I could adapt it). First one: Use always quantum=1 and minTokenLen=2. Instead of order the tokens by frequency, I order them alphabetically, doing this I am a little more permissive (there will be more duplicates that ordering by freq). Use minTokenLen=3 could be usefull in here, depending on the use case Second one Oreder tokens alphabetically Use minToken=2 quantum 2 for all fields less one. In this one I use quantum=1. This filed that uses quantum=1 will make me be more restrictive but I know that if it doesn't match the docs can't be considered as duplicated. These are 2 ways to detect duplicates in not huge texts. It is specific for my case but the idea of giving different quantum to different fields could be helpful in other cases aswell Ken Krugler wrote: > >>Marc Sturlese wrote: >>>Hey there, I've been testing and checking the source of the >>>TextProfileSignature.java to avoid similar entries at indexing time. >>>What I understood is that it is useful for huge text where the frequency of >>>the tokens (the words in lowercase just with number and leters in taht case) >>>is important. If you want to detect duplicates in not huge text and not >>>giving a lot of importance to the frequencies it doesn't work... >>>The hash will be made just with the terms wich frequency is higher than a >>>QUANTUM (which value is given in function of the max freq between all the >>>terms). So it will say that: >>> >>>aaa sss ddd fff ggg hhh aaa kkk lll ooo >>>aaa xxx iii www qqq aaa jjj eee zzz nnn >>> >>>are duplicates because quantum here wolud be 2 and the frequency of aaa >>>would be 2 aswell. So, to make the hash just the term aaa would be used. >>> >>>In this case: >>>aaa sss ddd fff ggg hhh kkk lll ooo >>>apa sss ddd fff ggg hhh kkk lll ooo >>> >>>Here quantum would be 1 and the frequencies of all terms would be 1 so all >>>terms would be use for the hash. It will consider this two strings not >>>similar. >>> >>>As I understood the algorithm there's no way to make it understand that in >>>my second case both strings are similar. I wish i were wrong... >>> >>>I have my own duplication system to detect that but I use String comparison >>>so it works really slow... Would like to know if there is any tuning >>>possibility to do that with TextProfileSignature >>>Don't know if I should pot this here or in the developers forum... >> >>Hi Marc, >> >>TextProfileSignature is a rather crude >>implementation of approximate similarity, and as >>you pointed out it's best suited for large >>texts. The original purpose of this Signature >>was to deduplicate web pages in large amounts of >>crawled pages (in Nutch), where it worked >>reasonably well. Its advantage is also that it's >>easy to compute and doesn't require multiple >>passes over the corpus. >> >>As it is implemented now, it breaks badly in the >>case you describe. You could modify this >>implementation to include also word-level >>ngrams, i.e. sequences of more than 1 word, up >>to N (e.g. 5) - this should work in your case. >> >>Ultimately, what you are probably looking for is >>a shingle-based algorithm, but it's relatively >>costly and requires multiple passes. > > There's an intermediate approach we use... 
> > * Generate separate hashes for each of the quantized bands > * Create additional fingerprint values (depends on the nature of the data) > * Find potentially similar files using the above > * Then apply an accurate but slower comparison to determine true > similarity > > From our data, it's common to get files where > (due to small text changes) the frequency of a > term moves between quantized bands. This then > changes the über hash that you get from combining > all terms, but with 10 or so bands we still get > some matches on the hashes from the individual > bands. > > The "find potentially similar files" uses a > simple Lucene scoring function, based on the > number of matching fingerprint values. > > -- Ken > -- > Ken Krugler > Krugle, Inc. > +1 530-210-6378 > "If you can't find it, you can't fix it" > > -- View this message in context: http://www.nabble.com/TextProfileSigature-using-deduplication-tp20559155p20600118.html Sent from the Solr - User mailing list archive at Nabble.com.
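A compressed illustration of the banded-hash idea described above (the number of bands, the hashing and all names are assumptions, not the actual Krugle code): terms are grouped into frequency bands, each band gets its own fingerprint, and two documents become candidate duplicates when enough band fingerprints match, even if a term drifts from one band to a neighbouring one.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class BandedFingerprints {

    // Split terms into numBands frequency bands and hash each band separately.
    public static List<Integer> bandHashes(Map<String, Integer> termFreqs, int numBands) {
        int maxFreq = termFreqs.values().stream().max(Integer::compare).orElse(1);
        List<SortedSet<String>> bands = new ArrayList<>();
        for (int i = 0; i < numBands; i++) {
            bands.add(new TreeSet<>());
        }
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int band = Math.min(numBands - 1, (e.getValue() * numBands) / (maxFreq + 1));
            bands.get(band).add(e.getKey());
        }
        List<Integer> hashes = new ArrayList<>();
        for (SortedSet<String> band : bands) {
            hashes.add(band.toString().hashCode()); // one fingerprint value per band
        }
        return hashes;
    }

    // Candidate check: count how many band fingerprints two documents share.
    public static int matchingBands(List<Integer> a, List<Integer> b) {
        int matches = 0;
        for (int i = 0; i < a.size() && i < b.size(); i++) {
            if (a.get(i).equals(b.get(i))) {
                matches++;
            }
        }
        return matches;
    }
}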
not string or text fields and shards
Hey there, I have started working with an index divided into 3 shards. When I did a distributed search I got an error with the fields that were not string or text. I read that the error was due to BinaryResponseWriter and empty non-string/text fields. I found the solution in an old thread of this forum: http://www.nabble.com/best-way-to-debug-shard-format-errors-td19087854.html The thing is I had to change some source code and rebuild Solr. That old thread said this problem would be solved in Solr 1.3. That is the version I am using, but I still hit the problem. Maybe there is a solution that doesn't require changing the source that I don't know about. Has anyone found any other solution? Thanks in advance. -- View this message in context: http://www.nabble.com/not-string-or-text-fields-and-shards-tp20600353p20600353.html Sent from the Solr - User mailing list archive at Nabble.com.
idea about faceting
Hey there, I am facing a problem doing field facets and I don't know if there is any solution in Solr for it. I want to facet on a field that contains very small texts. To do that I am using KeywordTokenizerFactory to keep all the words of the text in just one token. I use LowerCaseFilterFactory so as not to miss matches because of case differences, and ISOLatin1AccentFilterFactory so as not to miss matches because of accents. The problem appears here: I would like to show the facets with their accents and original case. In my old Lucene system (not using Solr) I used to create my facet fields with accents, but at search time I removed the accents and uppercase manually with Java. So I did the search without accents and uppercase, but I was still able to show them later. I have been playing with the Solr facet source code but can't find a way to solve my problem... Does anyone have an idea about how I could reach this goal? Thanks in advance -- View this message in context: http://www.nabble.com/idea-about-faceting-tp20638850p20638850.html Sent from the Solr - User mailing list archive at Nabble.com.
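One way to mirror the old Lucene approach in plain Java is to keep the facet field with its original accents and case for display, and normalize values with java.text.Normalizer whenever they have to be compared or turned into a filter. A small, hypothetical sketch (client-side helper, not Solr code):

import java.text.Normalizer;

public class FacetFilterNormalizer {

    // Lowercase and strip accents from a facet value before comparing or filtering on it,
    // while the stored facet value keeps the original accents and case for display.
    public static String normalize(String facetValue) {
        String lower = facetValue.toLowerCase();
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", ""); // remove combining accent marks
    }

    public static void main(String[] args) {
        // Prints "categoria musica": the normalized form used for matching,
        // while the displayed facet label keeps its accents.
        System.out.println(normalize("Categoría Música"));
    }
}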
data import handler - going deeper...
Hey there, After developing my own classes extending SqlEntityProcessor, JdbcDataSource and Transformer, I have my customized DataImportHandler almost working. I have to reach one more goal. On the one hand, I don't always have to index all the fields from my DB row. For example, fields from the DB that have a null value should not be indexed. Checking the source code, I see I could do that by modifying the addFields function of DocBuilder.java. On the other hand, I need to give boosts to fields of a doc at indexing time (set the boost not on the whole doc but on a few fields). I see I can do that in the addFieldValue function of DocBuilder.java. The thing is, I would like not to modify core classes but to write a plugin instead. Is there any way to apply those changes using plugins, like I did with Transformers or EntityProcessors? Thanks in advance -- View this message in context: http://www.nabble.com/data-import-handler---going-deeper...-tp20731715p20731715.html Sent from the Solr - User mailing list archive at Nabble.com.
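For the first goal (skipping null DB values), a custom DIH Transformer can usually drop the null entries from the row map without touching DocBuilder. A minimal sketch, with an invented class name:

import java.util.Iterator;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class SkipNullFieldsTransformer extends Transformer {

    // Remove columns whose value is null so they are never added to the Solr document.
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        Iterator<Map.Entry<String, Object>> it = row.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() == null) {
                it.remove();
            }
        }
        return row;
    }
}

Per-field boost is trickier; as far as I know it is not exposed through the Transformer API in this version, so that part may still require a change in the handler code itself.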