Re: problem with facets - out of memory exception

2013-12-19 Thread Marc Sturlese
Have you tried reindexing using DocValues? Fields used for faceting are stored on disk and not in RAM using the FieldCache. If you have enough memory they will be loaded into the OS cache but not onto the Java heap. This is good for GC too when committing. http://wiki.apache.org/solr/DocValues
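As a minimal sketch, enabling this (Solr 4.2+) is a per-field flag in schema.xml — the field name and type here are assumptions, not from the original thread:

```xml
<!-- schema.xml sketch: docValues stores the faceting data in a column-oriented,
     disk-backed structure instead of the heap-resident FieldCache. -->
<field name="category" type="string" indexed="true" stored="false" docValues="true"/>
```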

Re: Solr 3.6 optimize and field cache question

2013-07-10 Thread Marc Sturlese
Not a solution for the short term, but this sounds like a good use case to migrate to Solr 4.X and use DocValues instead of the FieldCache for faceting. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-optimize-and-field-cache-question-tp4076398p4076822.html Sent from the Solr

Listeners, cores and Similarity

2013-08-16 Thread Marc Sturlese
Hey there, I'm testing a custom similarity which loads data from an external file located in solr_home/core_name/conf/. I load data from the file into a Map in the init method of the SimilarityFactory. I would like to reload that Map every time a commit happens or every X hours. To do that I've th

Re: Tweaking boosts for more search results variety

2013-09-10 Thread Marc Sturlese
This is totally deprecated, but maybe it can be helpful if you want to re-sort some documents: https://issues.apache.org/jira/browse/SOLR-1311

Re: Solr 3.3. Grouping vs DeDuplication and Deduplication Use Case

2011-08-30 Thread Marc Sturlese
Deduplication uses the Lucene indexWriter.updateDocument call with the signature term. I don't think it's possible as a default feature to choose which document to index; the "original" should always be the last to be indexed. /IndexWriter.updateDocument Updates a document by first deleting the document(s)

Adding a DocSet as a filter from a custom search component

2011-10-25 Thread Marc Sturlese
Hey there, I'm wondering if there's a cleaner way to do this: I've written a SearchComponent that runs as last-component. In the prepare method I build a DocSet (SortedIntDocSet) based on whether some values in the FieldCache of a given field satisfy some rules (if the rules are satisfied, se

Re: changing omitNorms on an already built index

2011-10-27 Thread Marc Sturlese
As far as I know there's no issue about this. You have to reindex and that's it. In which kind of field are you changing the norms? (You will only see changes in text fields.) Using debugQuery=true you can see how norms affect the score (in case you have not omitted them)

Re: Collection Distribution vs Replication in Solr

2011-10-27 Thread Marc Sturlese
Replication is easier to manage and a bit faster. See the performance numbers: http://wiki.apache.org/solr/SolrReplication

performance sorting multivalued field

2010-06-18 Thread Marc Sturlese
hey there! Can someone explain to me the impact of having multivalued fields when sorting? I have read in other threads how it affects faceting but couldn't find any info on the impact when sorting. Thanks in advance

Re: performance sorting multivalued field

2010-06-18 Thread Marc Sturlese
I mean sorting the query results, not facets. I am asking because I have added a multivalued field that has at most 10 values. But 70% of the docs have just 1 or 2 values in this multiValued field. I am not doing faceting. Since I added the multiValued field, "java old gen" seems to get full m

Re: performance sorting multivalued field

2010-06-19 Thread Marc Sturlese
Hey Erik, I am currently sorting by a multiValued field. It appears that you may not know which of the values of the multiValued field makes the document end up in that position. This is fine for me, I don't care for my tests. What I need to know is whether there is any performance issue in all of this. Th

Re: Can query boosting be used with a custom request handler?

2010-06-21 Thread Marc Sturlese
Maybe this helps: http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

Re: performance sorting multivalued field

2010-06-22 Thread Marc Sturlese
>>Well, sorting requires that all the unique values in the target field >>get loaded into memory That's what I thought, thanks. >>But a larger question is whether what your doing is worthwhile >>even as just a measurement. You say >>"This is good for me, I don't care for my tests". I claim that >>

Re: anyone use hadoop+solr?

2010-06-22 Thread Marc Sturlese
I think there are people using this patch in production: https://issues.apache.org/jira/browse/SOLR-1301 I have tested it myself indexing data from CSV and from HBase and it works properly

Re: solr with hadoop

2010-06-22 Thread Marc Sturlese
I think a good solution could be to use hadoop with SOLR-1301 to build solr shards and then use solr distributed search against these shards (you will have to copy them from HDFS to local disk to search against them)

Re: anyone use hadoop+solr?

2010-06-22 Thread Marc Sturlese
Well, the patch consumes the data from a csv. You have to modify the input to use TableInputFormat (I don't remember if it's called exactly like that) and it will work. Once you've done that, you have to specify as many reducers as shards you want. I know 2 ways to index using hadoop: method 1 (so

Re: anyone use hadoop+solr?

2010-06-24 Thread Marc Sturlese
Hi Otis, just out of curiosity, which strategy do you use? Index on the map or reduce side? Do you use it to build shards or a single monolithic index? Thanks

Re: performance sorting multivalued field

2010-06-24 Thread Marc Sturlese
Thanks, that's very useful info. However I can't reproduce the error. I've created an index where all documents have a multivalued date field and each document has a minimum of one value in that field (most of the docs have 2 or 3). So, the number of un-inverted term instances is greater than the

Re: performance sorting multivalued field

2010-06-25 Thread Marc Sturlese
>>*There are lots of docs with the same value. I mention that because I suppose that same value has nothing to do with the number of un-inverted term instances. It does: I've been able to reproduce the error by setting different values for each field: HTTP Status 500 - there are more terms th

Re: Recommended MySQL JDBC driver

2010-06-26 Thread Marc Sturlese
I suppose you use batchSize=-1 to index that amount of data. From connector 5.1.7 onwards there's this param: netTimeoutForStreamingResults The default value is 600. Increasing it maybe can help (2400 for example?)
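As a sketch, the parameter can be passed on the JDBC URL in the DIH dataSource — the URL, database name, and credentials below are placeholders, not values from the thread:

```xml
<!-- data-config.xml sketch: netTimeoutForStreamingResults is a Connector/J
     (5.1.7+) parameter; batchSize="-1" enables streaming result sets. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb?netTimeoutForStreamingResults=2400"
            user="user" password="pass"
            batchSize="-1"/>
```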

ending an app that uses EmbeddedSolrServer

2010-07-13 Thread Marc Sturlese
Hey there, I've done some tests with a custom java app using EmbeddedSolrServer to create an index. It works ok and I am able to build the index, but I've noticed that after the commit and optimize are done, the app never terminates. How should I end it? Is there any way to tell the EmbeddedSolrServer to

Re: ending a java app that uses EmbeddedSolrServer

2010-07-13 Thread Marc Sturlese
Seems that coreContainer.shutdown() solves the problem. Anyone doing it in a different way?

Re: maxMergeDocs and performance tuning

2010-08-16 Thread Marc Sturlese
As far as I know, the higher you set the value, the faster the indexing process will be (because more things are kept in memory). But depending on what your needs are, it may not be the best option. If you set a high mergeFactor and you want to optimize the index once the process is done, this op

Re: JVM GC is very frequent.

2010-08-26 Thread Marc Sturlese
http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/

FieldCache.DEFAULT.getInts vs FieldCache.DEFAULT.getStringIndex. Memory usage

2010-08-26 Thread Marc Sturlese
I need to load a FieldCache for a field which is a solr "integer" type and has at most 3 digits. Let's say my index has 10M docs. I am wondering which is more optimal and less memory consuming: loading a FieldCache.DEFAULT.getInts or a FieldCache.DEFAULT.getStringIndex. The second one will have a

Re: Null pointer exception when mixing highlighter & shards & q.alt

2010-09-07 Thread Marc Sturlese
I noticed that long ago. Fixed it by doing this in HighlightComponent's finishStage: @Override public void finishStage(ResponseBuilder rb) { boolean hasHighlighting = true; if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) { Map.Entry[] arr = new NamedList.NamedListEnt

Re: what's different between SolrCloud and Solr+Hadoop

2010-09-13 Thread Marc Sturlese
Well, these are pretty different things. SolrCloud is meant to handle distributed search in an easier way than "raw" solr distributed search, where you have to build the shards on your own. Solr+hadoop is a way to build these shards/indexes in parallel.

Re: How do you programmatically create new cores?

2010-10-17 Thread Marc Sturlese
You have to create the core's folder, with its conf inside, in the Solr home. Once done you can call the create action of the admin handler: http://wiki.apache.org/solr/CoreAdmin#CREATE If you need to dynamically create, start and stop lots of cores there's this patch, but I don't know about its curren

Re: Dynamically create new core

2010-11-02 Thread Marc Sturlese
To create the core, the folder with the confs must already exist and has to be placed in the proper place (inside the solr home). Once you run the create core action, this core will be added to solr.xml and dynamically loaded.

Core status uptime and startTime

2010-11-03 Thread Marc Sturlese
As far as I know, in the core admin page you can find out when an index was last modified and committed by checking lastModified. But what do startTime and uptime mean? Thanks in advance

Re: Adding new field after data is already indexed

2010-11-08 Thread Marc Sturlese
>> and i index data on the basis of these fields. Now, in case i need to add a new field, is there a way i can >> add the field without corrupting the previous data. Is there any feature which adds a new field with a >> default value to the existing records. You just have to add the new field in

about NRTCachingDirectory

2012-12-10 Thread Marc Sturlese
I have a doubt about how NRTCachingDirectory works. As far as I've seen, it receives a delegate Directory and caches newly created segments. So, if MMapDirectory tends to be the default: 1.- Does NRTCachingDirectory work acting sort of as a wrapper of MMap, caching the new segments? 2.- If I have
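For reference, this is how it is typically wired in solrconfig.xml (Solr 4.x) — a minimal sketch, assuming the stock factory class name:

```xml
<!-- solrconfig.xml sketch: NRTCachingDirectoryFactory delegates to the default
     directory implementation (often MMapDirectory) and keeps small, newly
     flushed segments in a RAM cache until they are merged or grow too large. -->
<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>
```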

Re: Shard timeouts on large (1B docs) Solr cluster

2012-02-03 Thread Marc Sturlese
timeAllowed can be used outside distributed search. It is used by the TimeLimitingCollector. When the search time is equal to timeAllowed it will stop searching and will return the results it could find until then. This can be a problem when using incremental indexing. Lucene starts searching from

Re: how to ignore indexing of duplicated documents?

2012-03-12 Thread Marc Sturlese
http://wiki.apache.org/solr/Deduplication

Re: Faceting on a date field multiple times

2012-05-04 Thread Marc Sturlese
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html

latest patches and big picture of search grouping

2011-01-17 Thread Marc Sturlese
I need to dive into search grouping / field collapsing again. I've seen there are lots of issues about it now. Can someone point me to the minimum patches I need to run this feature on trunk? I want to see the code of the most optimised version and what's being done in distributed search. I think

Re: Need to create dynamic indexes based on different document workspaces

2011-04-22 Thread Marc Sturlese
In case you need to create lots of indexes and register/unregister them fast, there is work on the way: http://wiki.apache.org/solr/LotsOfCores

Re: Strange performance behaviour when concurrent requests are done

2011-04-29 Thread Marc Sturlese
Any suggestion about this issue?

Re: Strange performance behaviour when concurrent requests are done

2011-04-29 Thread Marc Sturlese
That's true. But the degradation is so big. If you launch concurrent requests to a web app that doesn't use Solr, the time per request won't degrade that much. For me, it looks more like a synchronized block is being hit somewhere in Solr or Lucene and is causing this.

problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
Hey there, I've noticed a very odd behaviour with the snapinstaller and commit (using the collectionDistribution scripts). The first time I install a new index everything works fine. But when installing a new one, I can't see the new documents. Checking the status page of the core tells me that the ind

Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
Tests are done on Solr 1.4. The simplest way to reproduce my problem is having 2 indexes and a Solr box with just one core. Both indexes must have been created with the same schema. 1- Remove the index dir of the core and start the server (core is up with an empty index) 2- check status page of the co

Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
I don't know if this could have something to do with the problem, but some of the files of the indexes have the same size and name (in all the indexes but not in the empty one). I have also realized that when moving back to the empty index and committing, numDocs and maxDocs change. Once I'm with the empt

Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen

2011-06-15 Thread Marc Sturlese
I have some more info! I've built another index bigger than the others so the names of the files are not the same. This way, if I move from any of the other indexes to the bigger one or vice versa it works (I can see the changes in the version, numDocs and maxDocs)! So, I think it is related to the name of

Re: problem with the new IndexSearcher when snapinstaller (and commit script) happen [SOLVED]

2011-06-15 Thread Marc Sturlese
I've found the problem, in case someone is interested. It's because of the indexReader.reopen(). If it is enabled, when opening a new searcher due to the commit, this code is executed (in SolrCore.getSearcher(boolean forceNew, boolean returnSearcher, final Future[] waitSearcher)): ... if

RE: embedded solrj doesn't refresh index

2011-07-22 Thread Marc Sturlese
Are you indexing with full-import? In case yes, and the resultant index has a similar number of docs (to the one you had before), try setting reopenReaders to false in solrconfig.xml. * You have to send the commit, of course.
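As a sketch, the setting lives in the indexing section of solrconfig.xml (the `<mainIndex>` block in 1.4/3.x):

```xml
<!-- solrconfig.xml sketch: with reopenReaders=false, a commit opens a brand
     new IndexReader instead of calling IndexReader.reopen() on the old one,
     which avoids reusing readers when index files are swapped underneath. -->
<reopenReaders>false</reopenReaders>
```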

Re: Boost documents based on the number of their fields

2011-08-19 Thread Marc Sturlese
You have different options here. You can give more boost at indexing time to the documents that have the fields you want set. For this to take effect you will have to reindex and set omitNorms="false" on the fields you are going to search. This same concept can be applied to boost single fields ins

offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Marc Sturlese
Has anyone noticed this problem and solved it somehow (without using LUCENE_33 in the solrconfig.xml)? https://issues.apache.org/jira/browse/LUCENE-3668 Thanks in advance

Re: offsets issues with multiword synonyms since LUCENE_33

2012-08-14 Thread Marc Sturlese
Well, an example would be: synonyms.txt: huge,big size Then I have the docs: 1- The huge fox attacks first 2- The big size fox attacks first Then if I query for huge, the highlights for each document are: 1- The huge fox attacks first 2- The big size fox attacks first The analyzer looks like this

Re: FieldCollapsing: Two response elements returned?

2009-07-28 Thread Marc Sturlese
That's probably because you are using both the CollapseComponent and the QueryComponent. I think the 2 or 3 last patches allow full replacement of the QueryComponent. You should just replace: for: This will solve your problem and make response times faster. Jay Hill wrote: > > I'm doing some test

update some index documents after indexing process is done with DIH

2009-07-28 Thread Marc Sturlese
Hey there, I would like to be able to do something like: after the indexing process is done with DIH I would like to open an indexreader, iterate over all docs, modify some of them depending on others and delete some others. I can easily do this coding directly with lucene but would like to know if

Re: update some index documents after indexing process is done with DIH

2009-07-28 Thread Marc Sturlese
code... so trying to find out the best way to do that as a plugin instead of a hack as possible. Thanks in advance Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > It is best handled as a 'newSearcher' listener in solrconfig.xml. > onImportEnd is invoked before committing > > On

Re: update some index documents after indexing process is done with DIH

2009-07-28 Thread Marc Sturlese
e event fired is firstSearcher. newSearcher > is fired when a commit happens > > > On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese > wrote: >> >> Ok, but if I handle it in a newSearcher listener it will be executed >> every >> time I reload a core, isn't it? Th

Re: update some index documents after indexing process is done with DIH

2009-07-29 Thread Marc Sturlese
(I need to modify them depending on values of other documents, that's why I can't do it with DIH delta-import). Thanks in advance Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: > > On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese > wrote: >> >> That really sounds the best way

Re: update some index documents after indexing process is done with DIH

2009-07-30 Thread Marc Sturlese
uments after indexing process is done > with > : DIH > : > : If you make your EventListener implements SolrCoreAware you can get > : hold of the core on inform. use that to get hold of the > : SolrIndexWriter > : > : On Wed, Jul 29, 2009 at 9:20 PM, Marc St

Re: update some index documents after indexing process is done with DIH

2009-07-31 Thread Marc Sturlese
hold of SolrIndexWriter just holding core... Marc Sturlese wrote: > > Hey there, > I would like to be able to do something like: After the indexing process > is done with DIH I would like to open an indexreader, iterate over all > docs, modify some of them depending on others and d

Re: Is negative boost possible?

2009-08-19 Thread Marc Sturlese
:>the only way to "negative boost" is to "positively boost" the inverse... :> :> (*:* -field1:value_to_penalize)^10 This will do the job as well, as bq supports pure negative queries (at least in trunk): bq=-field1:value_to_penalize^10 http://wiki.apache.org/solr/SolrRelevancyFAQ#head-76e53d

Re: Remove data from index

2009-08-20 Thread Marc Sturlese
As far as I know you cannot do that with DIH. What size is your index? Probably the best you can do is index from scratch again with a full-import. clico wrote: > > I hope it could be a solution. > > But I think I understood that u can use deletePkQuery like this > > "select document_id from ta

Optimizing a query to sort results alphabetically for a determined field

2009-08-24 Thread Marc Sturlese
Hey there, I need to sort my query results alphabetically on a particular field called "town". This field is analyzed with a KeywordAnalyzer and isn't multiValued. Add to that, some docs don't have this field. Doing just: http://localhost/solr//select/?q=whatever&version=2.2&start=0&rows

Re: Optimizing a query to sort results alphabetically for a determined field

2009-08-24 Thread Marc Sturlese
r > fielType definitions in the schema. > > On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese > wrote: > >> >> Hey there, I need to sort my query results alphabetically for a >> determinated >> field called "town". This field is analyzed with a KeywordAn

Re: Optimizing a query to sort results alphabetically for a determined field

2009-08-24 Thread Marc Sturlese
jn Visinescu wrote: >> > >> > There's a "sortMissingLast" true/false property that you can set on >> your >> > fielType definitions in the schema. >> > >> > On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese >> > wrote: >

Best way to do a lucene matchAllDocs not using q.alt=*:*

2009-09-03 Thread Marc Sturlese
Hey there, I need a query to get the total number of documents in my index. I can get it if I do this using the DismaxRequestHandler: q.alt=*:*&facet=false&hl=false&rows=0 I have noticed this query is very memory consuming. Is there any more optimized way in trunk to get the total number of documents of

DIH applying various transformers to a field

2009-09-08 Thread Marc Sturlese
Hey there, I am using DIH to import a db table and have written a custom transformer following the example: package foo; public class CustomTransformer1 { public Object transformRow(Map row) { String artist = row.get("artist"); if (artist != null)
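The truncated transformer above, completed as a minimal sketch. DIH discovers transformRow via reflection, so no Solr imports are needed; the "artist" field comes from the snippet, but the normalization applied (trim + lowercase) is an assumption for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a DIH custom transformer: DIH calls transformRow(Map) reflectively
// on the class named in the entity's transformer attribute.
public class CustomTransformer1 {
    public Object transformRow(Map<String, Object> row) {
        String artist = (String) row.get("artist");
        if (artist != null) {
            // Hypothetical normalization; replace with the real transformation.
            row.put("artist", artist.trim().toLowerCase());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("artist", "  Miles Davis ");
        new CustomTransformer1().transformRow(row);
        System.out.println(row.get("artist")); // prints "miles davis"
    }
}
```

The entity in data-config.xml would then reference it as transformer="foo.CustomTransformer1".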

Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Marc Sturlese
Doing this you will send the dump where you want: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump Then you can open the dump with jhat: jhat /path/to/the/dump/your_stack.bin It will probably give you an OutOfMemoryException due to the large size of the dump. In case you can give

Re: Solr Trunk Heap Space Issues

2009-10-05 Thread Marc Sturlese
I think it doesn't make sense to enable warming if your solr instance is just for indexing purposes (it changes if you use it for search as well). You could comment out the caches as well in solrconfig.xml. Setting queryResultWindowSize and queryResultMaxDocsCached to zero maybe could help... (but if

SOLR-1395 integration with katta. Question about Katta's ranking among shards and IDF's

2009-10-09 Thread Marc Sturlese
Hey there, I am trying to set up the Katta integration plugin. I would like to know if Katta's ranking algorithm is used when searching among shards. In case yes, would it mean it solves the problem with IDFs of distributed Solr?

Re: number of Solr indexes per Tomcat instance

2009-10-23 Thread Marc Sturlese
Are you using one single solr instance with multicore or multiple solr instances with one index each? Erik_l wrote: > > Hi, > > Currently we're running 10 Solr indexes inside a single Tomcat6 instance. > In the near future we would like to add another 30-40 indexes to every > Tomcat instance we

Re: number of Solr indexes per Tomcat instance

2009-10-23 Thread Marc Sturlese
hold you will suffer from slow response times. Erik_l wrote: > > We're not using multicore. Today, one Tomcat instance hosts a number of > indexes in form of 10 Solr indexes (10 individual war files). > > > Marc Sturlese wrote: >> >> Are you using one single solr

keep index in production and snapshots in separate physical disks

2009-10-23 Thread Marc Sturlese
Is there any way to make snapinstaller install the index in snapshot20091023124543 (for example) from another disk? I am asking this because I would like to avoid optimizing the index on the master (if I do that it takes a long time to send it via rsync because it is so big). This way I would just have to

distributed facet dates

2009-11-10 Thread Marc Sturlese
Hey there, I am thinking of developing facet dates for distributed search but I don't know exactly where to start. I am familiar with the facet dates source code and I think if I could understand how distributed facet queries work it shouldn't be that difficult. I have read http://wiki.apache.org/solr/Writ

error with multicore CREATE action

2009-11-23 Thread Marc Sturlese
Hey there, I am using Solr 1.4 out of the box and am trying to create a core at runtime using the CREATE action. I am getting this error when executing: http://localhost:8983/solr/admin/cores?action=CREATE&name=x&instanceDir=x&persist=true&config=solrconfig.xml&schema=schema.xml&dataDir=da

Re: solr+jetty logging to syslog?

2009-11-26 Thread Marc Sturlese
With 1.4: -Add the log4j jars to Solr -Configure the SyslogAppender with something like: log4j.appender.solrLog=org.apache.log4j.net.SyslogAppender log4j.appender.solrLog.Facility=LOCAL0 log4j.appender.solrLog.SyslogHost=127.0.0.1 log4j.appender.solrLog.layout=org.apache.log4j.PatternLayout log4j.appe
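The truncated appender config above, completed as a sketch — the root logger level and the conversion pattern are assumptions:

```properties
# log4j.properties sketch: route Solr logging to a local syslog daemon.
log4j.rootLogger=INFO, solrLog
log4j.appender.solrLog=org.apache.log4j.net.SyslogAppender
log4j.appender.solrLog.Facility=LOCAL0
log4j.appender.solrLog.SyslogHost=127.0.0.1
log4j.appender.solrLog.layout=org.apache.log4j.PatternLayout
log4j.appender.solrLog.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```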

Re: Sanity check on numeric types and which of them to use

2009-12-05 Thread Marc Sturlese
And what about: vs. Which is the difference between the two? Is bcdint always better? Thanks in advance Yonik Seeley-2 wrote: > > On Fri, Dec 4, 2009 at 7:38 PM, Jay Hill wrote: >> 1) Is there any benefit to using the "int" type as a TrieIntField w/ >> precisionStep=0 over the "pint" ty

About fsv (sort field values)

2009-12-08 Thread Marc Sturlese
I am tracing QueryComponent.java and would like to know the purpose of the doFSV function. I don't understand what fsv are for. I have tried some queries with fsv=true and some extra info appears in the response. But I don't know what it is for and can't find much info out there. I read: // The query

UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Marc Sturlese
Hey there, I need to be able to decide, once a document has been created, whether I want it to be indexed or not. I have thought of implementing an UpdateRequestProcessor to do that but don't know how to tell Solr in the processAdd method to skip the document. If I delete all the fields would it be skipped or

Re: UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Marc Sturlese
eers > > On Thu, Dec 10, 2009 at 12:09 PM, Marc Sturlese > wrote: > >> >> Hey there, >> I need that once a document has been created be able to decide if I want >> it >> to be indexed or not. I have thought in implement an >> UpdateRequestProces

Re: UpdateRequestProcessor to avoid documents of being indexed

2009-12-10 Thread Marc Sturlese
Yes, it did Cheers Chris Male wrote: > > Hi, > > Yeah thats what I was suggesting. Did that work? > > On Thu, Dec 10, 2009 at 12:24 PM, Marc Sturlese > wrote: > >> >> Do you mean something like?: >> >>@Override >>public vo

trie fields and sortMissingLast

2009-12-21 Thread Marc Sturlese
Should the sortMissingLast param be working on trie fields?

Re: suggestions for DIH batchSize

2009-12-23 Thread Marc Sturlese
If you want to retrieve a huge volume of rows you will end up with an OutOfMemoryException due to the jdbc driver. Setting batchSize to -1 in your data-config.xml (which internally will set it to Integer.MIN_VALUE) will make the query be executed in streaming mode, avoiding the memory exception. Joe
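A minimal sketch of where that attribute goes — driver class and URL shown are placeholders for a MySQL setup:

```xml
<!-- data-config.xml sketch: batchSize="-1" maps to Statement.setFetchSize(
     Integer.MIN_VALUE), the MySQL driver's streaming-result-set mode. -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            batchSize="-1"/>
```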

Customize solr query

2008-11-02 Thread Marc Sturlese
Solr but don't know how to do this commas stuff. I would like to do something like this: ...title:"+query_string+" (setting boost 3) and title:+query_string+ (setting boost 2)... I suppose I have to add something to the solrconfig.xml but couldn't find what. Any advice? Than

Re: Getting a document by primary key

2008-11-02 Thread Marc Sturlese
Hey there, I am doing the same and I am experiencing some trouble. I get the document data searching by term. The problem is that when I do it several times (inside a huge for loop) the app starts increasing its memory use until I use almost the whole memory... Did you find any other way to do that? J

Re: Getting a document by primary key

2008-11-03 Thread Marc Sturlese
cs... but the memory problem never disappeared... If I call the garbage collector every time I use the code above, the memory doesn't increase indefinitely but... the app works too slowly. Any suggestion? Thanks for replying! Yonik Seeley wrote: > > On Sun, Nov 2, 2008 at 8:09 PM, Marc St

Re: Getting a document by primary key

2008-11-03 Thread Marc Sturlese
Hey, you are right, I'm trying to migrate my app to solr. For the moment I am using solr for the searching part of the app but I am using my own lucene app for indexing. Should have posted in the lucene forum for this trouble. Sorry about that. I am trying to use termdocs properly now. Thanks for your a

Using DataImportHandler with mysql database

2008-11-10 Thread Marc Sturlese
I do the select and the mapping db_field - index_field *The mysql connector is correctly added to the classpath I think I must be missing something in my configuration but can't find what... Can anyone give me a hand? I am a bit lost with this problem... Thanks in advance Marc Sturlese

Re: Using DataImportHandler with mysql database

2008-11-11 Thread Marc Sturlese
That worked! I was writing in a bad way the > It seems like your data-config does not have any tag. The > following is the correct structure: > > > > > > > > On Tue, Nov 11, 2008 at 12:31 AM, Marc Sturlese > <[EMAIL PROTECTED]>wrote: >

deduplication & dataimporthandler

2008-11-11 Thread Marc Sturlese
Field Inside my requesthandler called /dataimport (which uses the org.apache.solr.handler.dataimport.DataImportHandler class) Has anyone done something similar? Marc Sturlese

indexing data and deleting from index and database

2008-11-12 Thread Marc Sturlese
Hey there, For a few weeks I have been trying to migrate my Lucene core app to Solr and many questions have come to my mind... Before attending ApacheCon I thought that my Lucene index worked fine with my Solr search engine but after my conversation with Erik in the Solr BootCamp I understood that the

troubles with delta import

2008-11-14 Thread Marc Sturlese
Hey there, I am using dataimport with full-import successfully but there's no way to make it work with delta-import. Apparently Solr doesn't show any error but it does not do what it is supposed to. I think the problem is with dataimport.properties because it is never updated. I have it placed in the

Re: troubles with delta import

2008-11-14 Thread Marc Sturlese
> use. > > On Fri, Nov 14, 2008 at 4:35 PM, Marc Sturlese > <[EMAIL PROTECTED]>wrote: > >> >> Hey there, I am using dataimport with full-import successfully but >> there's >> no >> way do make it work with delta-import. Aparently solr doesn

Re: troubles with delta import

2008-11-14 Thread Marc Sturlese
Hey, That's the weird thing... in the log everything seems to work fine: Nov 14, 2008 3:12:46 PM org.apache.solr.handler.dataimport.DataImportHandler processConfiguration INFO: Processing configuration from solrconfig.xml: {config=/opt/netbeans-5.5.1/enterprise3/apache-tomcat-5.5.17/bin/solr/conf

Re: troubles with delta import

2008-11-14 Thread Marc Sturlese
> meant for debugging only. If you want to do a commit, add commit=true as a > request parameter. > > On Fri, Nov 14, 2008 at 7:56 PM, Marc Sturlese > <[EMAIL PROTECTED]>wrote: >> >> Hey, >> That's the weird thing... in the

using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese
Hey there, I have posted before telling about my situation but I think my explanation was a bit confusing... I am using DataImportHandler and delta-import and it's working perfectly. I have also coded my own SqlEntityProcessor to delete from the index and database expired rows. Now I need to do d

Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese
2008-11-12 05:10 PM this one exactly). I have downloaded the last nightly-build source code and couldn't see the needed classes in there. Does anyone know something? Should I ask this in the developers forum? Thanks in advance Marc Sturlese wrote: > > Hey there, > > I have post

Re: using deduplication with dataimporthandler

2008-11-17 Thread Marc Sturlese
Marc Sturlese wrote: > > Thank you so much. I have it sorted. > I am wondering now if there is any more stable way to use deduplication > than adding to the solr source project this patch: > https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.iss
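For readers landing on this thread later: once SOLR-799 was merged, deduplication no longer required patching the source — it is configured as an update processor chain in solrconfig.xml. A sketch along the lines of the wiki example follows; the chain name, signature field, and source fields are placeholders, and TextProfileSignature can be swapped in for Lookup3Signature for fuzzy matching:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- index field that receives the computed signature -->
    <str name="signatureField">signature</str>
    <!-- true: overwrite earlier docs with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- source fields the signature is computed from -->
    <str name="fields">name,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain is then referenced from the update (or DataImportHandler) request handler via the update.processor / update.chain parameter, depending on the Solr version.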

TextProfileSigature using deduplication

2008-11-18 Thread Marc Sturlese
Hey there, I've been testing and checking the source of TextProfileSignature.java to avoid similar entries at indexing time. What I understood is that it is useful for huge texts where the frequency of the tokens (the words in lowercase just with numbers and letters in that case) is important. If
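The quantized token-frequency idea behind TextProfileSignature can be sketched in Python. This is a rough illustration of the algorithm as described above, not the actual Solr/Nutch implementation; the default quant rate and the exact rounding rule are assumptions:

```python
import hashlib
import re
from collections import Counter

def text_profile_signature(text, quant_rate=0.01):
    """Signature that ignores low-frequency token differences (sketch)."""
    # Tokenize: lowercase runs of letters and digits, as described in the post.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return hashlib.md5(b"").hexdigest()
    max_freq = max(counts.values())
    # Quantization step: frequencies are rounded down to multiples of quant,
    # so tokens that occur rarely (relative to the most frequent one) drop out.
    quant = max(1, round(max_freq * quant_rate)) if max_freq > 1 else 1
    profile = []
    for token, freq in counts.items():
        q = (freq // quant) * quant
        if q > 0:
            profile.append((q, token))
    # Deterministic ordering: by quantized frequency desc, then alphabetically.
    profile.sort(key=lambda p: (-p[0], p[1]))
    joined = " ".join(f"{t} {q}" for q, t in profile)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

With a high enough quant rate, two long texts that differ only in a few rare words produce the same signature, which is exactly the "similar, not just exact" behavior discussed in this thread.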

Re: TextProfileSigature using deduplication

2008-11-18 Thread Marc Sturlese
>> I have my own duplication system to detect that but I use String comparison, so it works really slow... >> What are you doing for the String comparison? Not exact, right? Hey, my comparison method looks for similar (not just exact) matches... what I do is compare two texts word by word. Wh
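A cheaper alternative to the slow pairwise word-by-word comparison mentioned here is a set-based measure such as Jaccard similarity over token sets. This is my own illustration, not what the poster used, and the 0.8 threshold is an arbitrary assumption:

```python
import re

def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the token sets of two texts (0.0 to 1.0)."""
    tokens_a = set(re.findall(r"\w+", text_a.lower()))
    tokens_b = set(re.findall(r"\w+", text_b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def is_near_duplicate(a, b, threshold=0.8):
    """Treat documents as near-duplicates above an assumed similarity threshold."""
    return jaccard_similarity(a, b) >= threshold
```

Because it compares sets rather than aligned word sequences, it runs in roughly linear time in the text length, though it loses word-order information.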

Re: TextProfileSigature using deduplication

2008-11-20 Thread Marc Sturlese
Ken Krugler wrote: > >>Marc Sturlese wrote: >>>Hey there, I've been testing and checking the source of the >>>TextProfileSignature.java to avoid similar entries at indexing time. >>>What I understood is that it is useful for huge text where the frequency

not string or text fields and shards

2008-11-20 Thread Marc Sturlese
Hey there, I have started working with an index divided into 3 shards. When I did a distributed search I got an error with the fields that were not string or text. I read that the error was due to BinaryResponseWriter and not string/text empty fields. I found the solution in an old thread of this f

idea about faceting

2008-11-22 Thread Marc Sturlese
Hey there, I am facing a problem doing field facets and I don't know if there exists any solution in Solr to solve my problem. I want to do facets on a field that is very small text. To do that I am using the KeywordTokenizerFactory to keep all the words of the text in just one token. I use Low
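The analysis chain described (KeywordTokenizerFactory so the whole value becomes a single token, plus lowercasing) corresponds to a schema.xml fieldType roughly like the following sketch; the type name is made up:

```xml
<fieldType name="facet_text" class="solr.TextField">
  <analyzer>
    <!-- emit the entire field value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- normalize case so "Foo Bar" and "foo bar" facet together -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Fields of this type then facet on the whole (lowercased) phrase rather than on individual words.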

data import handler - going deeper...

2008-11-28 Thread Marc Sturlese
Hey there, After developing my own classes extending SqlEntityProcessor, JdbcDataSource and Transformer I have my customized DataImportHandler almost working. I have to reach one more goal. On one hand, I don't always have to index all the fields from my db row. For example fields from db that
